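### Time series
* Time series data
    - Data recorded over time, used to analyze and forecast how values change
    - Timestamp : a data type that represents a specific point in time
        * Features
            - to_datetime : converts data to a datetime type
            - to_period : converts datetime values to periods (year, month, day, and so on) at a given frequency
            - Options
                * freq selects the resolution of the resulting period
                * freq = A (year), M (month), D (day); some newer pandas releases deprecate A in favor of Y
                * D gives 2023-01-01, M gives 2023-01, and A gives 2023.
            - datetime64 Series and DatetimeIndex
                * A Series of datetime values exposes the dt accessor; a DatetimeIndex provides the same attributes directly
                * Available attributes include dt.year, dt.month, dt.day, dt.hour, dt.minute, and dt.second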
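
The notes above describe `to_period` and its `freq` option, so a short sketch may help. It assumes the same three sample dates used in the cells that follow, and it uses `'Y'` for the yearly frequency on the assumption that the installed pandas accepts it (older notes and versions use `'A'`). The variable names are illustrative only.

```python
import pandas as pd

# Sample dates (same values as the notebook cells below)
dates = pd.to_datetime(pd.Series(['2023-01-01', '2023-03-01', '2023-05-01']))

# Convert each timestamp to a period at day, month, and year resolution
by_day = dates.dt.to_period('D')
by_month = dates.dt.to_period('M')
by_year = dates.dt.to_period('Y')  # assumption: 'Y' is accepted; older code uses 'A'

print(by_day.astype(str).tolist())    # ['2023-01-01', '2023-03-01', '2023-05-01']
print(by_month.astype(str).tolist())  # ['2023-01', '2023-03', '2023-05']
print(by_year.astype(str).tolist())   # ['2023', '2023', '2023']
```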
                

In [1]:
import pandas as pd
dates = ['2023-01-01','2023-03-01','2023-05-01']
df = pd.DataFrame(dates, columns=['date'])
df

Unnamed: 0,date
0,2023-01-01
1,2023-03-01
2,2023-05-01


In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   date    3 non-null      object
dtypes: object(1)
memory usage: 152.0+ bytes


In [3]:
df['new_date'] = pd.to_datetime(df['date'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   date      3 non-null      object        
 1   new_date  3 non-null      datetime64[ns]
dtypes: datetime64[ns](1), object(1)
memory usage: 176.0+ bytes
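
The conversion above works because every string is a clean ISO date. As a hedged aside, the sketch below shows two `pd.to_datetime` parameters that are often useful with messier input: `errors='coerce'` and `format`. The sample strings are made up for illustration and are not part of the notebook data.

```python
import pandas as pd

# Hypothetical input containing one value that cannot be parsed
raw = pd.Series(['2023-01-01', 'not a date'])

# errors='coerce' turns unparseable entries into NaT instead of raising
parsed = pd.to_datetime(raw, errors='coerce')

# format= states the layout explicitly (here day/month/year)
explicit = pd.to_datetime(pd.Series(['01/03/2023']), format='%d/%m/%Y')

print(parsed)
print(explicit)
```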


In [4]:
# Extract date components through the .dt accessor
df['year'] = df['new_date'].dt.year
df['month'] = df['new_date'].dt.month
df['day'] = df['new_date'].dt.day
df.info()  # info() prints its report and returns None, so it is not wrapped in print()
df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   date      3 non-null      object        
 1   new_date  3 non-null      datetime64[ns]
 2   year      3 non-null      int64         
 3   month     3 non-null      int64         
 4   day       3 non-null      int64         
dtypes: datetime64[ns](1), int64(3), object(1)
memory usage: 248.0+ bytes


Unnamed: 0,date,new_date,year,month,day
0,2023-01-01,2023-01-01,2023,1,1
1,2023-03-01,2023-03-01,2023,3,1
2,2023-05-01,2023-05-01,2023,5,1
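
The cell above reads components from a datetime column through `.dt`. For comparison, here is a minimal sketch of the same attributes on a `DatetimeIndex`, which exposes them directly without `.dt`. The variable names are illustrative.

```python
import pandas as pd

# A DatetimeIndex built from the same sample dates
idx = pd.DatetimeIndex(['2023-01-01', '2023-03-01', '2023-05-01'])

# Attributes are available on the index itself
years = idx.year.tolist()
months = idx.month.tolist()

# Wrapping the index in a Series brings back the .dt accessor
days = pd.Series(idx).dt.day.tolist()

print(years, months, days)
```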


## apply and lambda

In [5]:
# dates = ['2023-01-01','2023-03-01','2023-05-01']
dates = [1,2,3]
df = pd.DataFrame(dates, columns=['date'])
df

Unnamed: 0,date
0,1
1,2
2,3


In [6]:
df['date'].apply(lambda x : x*10)
#df['date'].apply(lambda x : str(x) + 'a')

0    10
1    20
2    30
Name: date, dtype: int64

In [7]:
for i in df['date']:
    print(i*10)

10
20
30
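
Both `apply` with a lambda and the explicit loop visit the values one at a time. For simple arithmetic like this, pandas can usually operate on the whole column at once. A minimal sketch, using a hypothetical `numbers` frame with the same values:

```python
import pandas as pd

numbers = pd.DataFrame({'date': [1, 2, 3]})

# Element-wise function call
via_apply = numbers['date'].apply(lambda x: x * 10)

# Vectorized arithmetic on the whole column
vectorized = numbers['date'] * 10

print(via_apply.tolist(), vectorized.tolist())
```

The two results hold the same values; `apply` remains useful when the per-element logic has no vectorized equivalent.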


In [8]:
dates = ['2023-01-01','2023-03-01','2023-05-01']
df = pd.DataFrame(dates, columns=['date'])
df


Unnamed: 0,date
0,2023-01-01
1,2023-03-01
2,2023-05-01


In [9]:
datetimer = df['date'].apply(lambda x : pd.to_datetime(x) )
datetimer.dt.month

0    1
1    3
2    5
Name: date, dtype: int64

In [10]:
df['year'] = datetimer.dt.year
df['month'] = datetimer.apply(lambda x : x.month)
df['hour'] = datetimer.apply(lambda x : x.hour)
df['dayofweek'] = datetimer.apply(lambda x : x.dayofweek )
df

Unnamed: 0,date,year,month,hour,dayofweek
0,2023-01-01,2023,1,0,6
1,2023-03-01,2023,3,0,2
2,2023-05-01,2023,5,0,0
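
The `apply` calls in In [9] and In [10] can likely be replaced by column-level operations: `pd.to_datetime` accepts a whole Series, and the `.dt` accessor provides `dayofweek` and `day_name()`. A sketch under that assumption, with illustrative variable names:

```python
import pandas as pd

# Convert the whole column in one call instead of apply + lambda
ts = pd.to_datetime(pd.Series(['2023-01-01', '2023-03-01', '2023-05-01']))

# Per-element access versus the .dt accessor
by_apply = ts.apply(lambda x: x.dayofweek)
by_accessor = ts.dt.dayofweek

# dayofweek counts from Monday = 0, so 6 is Sunday
names = ts.dt.day_name()

print(by_apply.tolist(), by_accessor.tolist(), names.tolist())
```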


In [11]:
df['month'] = df['month'].apply( lambda x : 1111 if x > 2 else x)
df

Unnamed: 0,date,year,month,hour,dayofweek
0,2023-01-01,2023,1,0,6
1,2023-03-01,2023,1111,0,2
2,2023-05-01,2023,1111,0,0
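
The conditional lambda in In [11] keeps values up to 2 and replaces the rest with 1111. One possible vectorized alternative is `Series.where`, which keeps entries where the condition holds and substitutes the given value elsewhere. A minimal sketch on a standalone Series:

```python
import pandas as pd

month = pd.Series([1, 3, 5], name='month')

# Keep values where month <= 2, replace all others with 1111
capped = month.where(month <= 2, 1111)

print(capped.tolist())  # [1, 1111, 1111]
```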


In [12]:
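# Chained indexing: df[mask] returns a copy, so this assignment does not modify df
# and pandas emits the warning shown in the output.
df[(df['year']==2023) & (df['month']==1111)]['dayofweek'] = \
df[(df['year']==2023) & (df['month']==1111)]['dayofweek'].apply(lambda x : 1111 if x > 1 else x)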

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[(df['year']==2023) & (df['month']==1111)]['dayofweek'] = \


In [13]:
df

Unnamed: 0,date,year,month,hour,dayofweek
0,2023-01-01,2023,1,0,6
1,2023-03-01,2023,1111,0,2
2,2023-05-01,2023,1111,0,0
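
The output of In [13] shows that `df` is unchanged: the assignment in In [12] wrote to a temporary copy. Following the hint in the warning, a sketch of the same update with `.loc` might look like this. The frame is rebuilt here from the values printed above so the example runs on its own, and `mask` is an illustrative name.

```python
import pandas as pd

# Rebuild the frame from the values shown in the In [13] output
df = pd.DataFrame({
    'date': ['2023-01-01', '2023-03-01', '2023-05-01'],
    'year': [2023, 2023, 2023],
    'month': [1, 1111, 1111],
    'hour': [0, 0, 0],
    'dayofweek': [6, 2, 0],
})

# Select rows and the target column in a single .loc call
mask = (df['year'] == 2023) & (df['month'] == 1111)
df.loc[mask, 'dayofweek'] = df.loc[mask, 'dayofweek'].apply(
    lambda x: 1111 if x > 1 else x
)

print(df['dayofweek'].tolist())  # [6, 1111, 0]
```

Only the row with `dayofweek` 2 changes; the first row is excluded by the mask and the last row fails the `x > 1` test.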


### Converting a DataFrame to a dictionary
* to_dict('list') : returns each column as a list of values, keyed by column name

In [14]:
df_test = df[['date', 'year']]
df_test

Unnamed: 0,date,year
0,2023-01-01,2023
1,2023-03-01,2023
2,2023-05-01,2023


In [15]:
d = df_test.to_dict('list')
d

{'date': ['2023-01-01', '2023-03-01', '2023-05-01'],
 'year': [2023, 2023, 2023]}
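
`'list'` is one of several `orient` values that `to_dict` accepts. As a brief comparison, the sketch below shows `'records'` (one dictionary per row) and the default orientation (column name to index-keyed dictionary). The frame is rebuilt from the output above so the example is self-contained.

```python
import pandas as pd

df_test = pd.DataFrame({
    'date': ['2023-01-01', '2023-03-01', '2023-05-01'],
    'year': [2023, 2023, 2023],
})

# One dict per row
as_records = df_test.to_dict('records')

# Default orientation: {column: {index: value}}
as_columns = df_test.to_dict()

print(as_records[0])       # {'date': '2023-01-01', 'year': 2023}
print(as_columns['year'])  # {0: 2023, 1: 2023, 2: 2023}
```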