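### Time series data
* Time series data
    - Data recorded over time, used to analyze and forecast how values change
    - Timestamp: a data type that represents a single point in time
* Features
    - to_datetime: a function that converts data to a datetime type
    - to_period: converts dates to periods at a chosen resolution (year, month, day, and so on)
        * Options
            - The freq argument selects the resolution
            - freq = A (year; newer pandas versions prefer Y), M (month), D (day)
            - D gives 2020-01-01, M gives 2020-01, and A gives 2020
    - Datetime columns and DatetimeIndex
        * A datetime Series provides the dt accessor (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second, ...); a DatetimeIndex exposes the same fields directly as attributes

* See the following page for details
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html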
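
The cells that follow exercise `to_datetime` and the `dt` accessor but not `to_period`. As a minimal sketch, the three frequencies behave roughly as shown here. The variable names (`s`, `daily`, `monthly`, `yearly`, `idx`) are illustrative and not from the notebook, and `"Y"` is used for the yearly frequency on the assumption that a reasonably recent pandas is installed, where `"A"` is the older alias.

```python
import pandas as pd

# Illustrative data; these variable names are not part of the notebook.
s = pd.to_datetime(pd.Series(["2023-01-01", "2023-03-02", "2023-09-05"]))

daily = s.dt.to_period("D")    # keeps year-month-day
monthly = s.dt.to_period("M")  # keeps year-month
yearly = s.dt.to_period("Y")   # keeps the year only ("A" is the older alias)

print(daily.astype(str).tolist())
print(monthly.astype(str).tolist())
print(yearly.astype(str).tolist())

# A DatetimeIndex exposes the date fields directly, without the .dt accessor.
idx = pd.DatetimeIndex(s)
print(list(idx.year), list(idx.month), list(idx.day))
```

The `.dt` accessor applies to a Series, while the index form reads the same fields as plain attributes.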

In [2]:
import pandas as pd
dates = ["2023-01-01", "2023-03-02", "2023-09-05"]
df = pd.DataFrame(dates, columns=['date'])
df

Unnamed: 0,date
0,2023-01-01
1,2023-03-02
2,2023-09-05


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   date    3 non-null      object
dtypes: object(1)
memory usage: 156.0+ bytes


In [4]:
df['new_date'] = pd.to_datetime(df['date'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   date      3 non-null      object        
 1   new_date  3 non-null      datetime64[ns]
dtypes: datetime64[ns](1), object(1)
memory usage: 180.0+ bytes


In [6]:
df

Unnamed: 0,date,new_date
0,2023-01-01,2023-01-01
1,2023-03-02,2023-03-02
2,2023-09-05,2023-09-05


In [8]:
df['year'] = df['new_date'].dt.year
df['month'] = df['new_date'].dt.month
df['day'] = df['new_date'].dt.day
print(df.info())
df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   date      3 non-null      object        
 1   new_date  3 non-null      datetime64[ns]
 2   year      3 non-null      int32         
 3   month     3 non-null      int32         
 4   day       3 non-null      int32         
dtypes: datetime64[ns](1), int32(3), object(1)
memory usage: 216.0+ bytes
None


Unnamed: 0,date,new_date,year,month,day
0,2023-01-01,2023-01-01,2023,1,1
1,2023-03-02,2023-03-02,2023,3,2
2,2023-09-05,2023-09-05,2023,9,5


- The apply function

In [9]:
def test(num):
    return str(num)

In [10]:
number = 100
type(number)

int

In [11]:
type(test(number))  # type conversion

str

- Using a lambda function

In [13]:
lb = lambda x : x+1000

In [14]:
type(lb(1000))

int

In [15]:
lb(1000)

2000

In [20]:
dates = [1,2,3]
df = pd.DataFrame(dates, columns=['date'])
df

Unnamed: 0,date
0,1
1,2
2,3


In [21]:
df['date'] = df['date'].apply(lambda x : x+10)
df

Unnamed: 0,date
0,11
1,12
2,13


- A for loop can be used instead of apply

In [25]:
def test(num):
    return num * 100

In [26]:
for idx, value in df['date'].items():
    print(value)
    # Assign by label with .loc so each result lands in the row it was read from
    df.loc[idx, 'date'] = test(value)

11
12
13


In [27]:
df

Unnamed: 0,date
0,1100
1,1200
2,1300
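
A for loop can stand in for `apply`, but writing into a column while iterating over it is easy to get wrong. The sketch below shows three ways to multiply each value by 100 that should give the same result, assuming the intent is a plain element-wise transformation. The helper name `times_100` and the variables `via_apply`, `via_loop`, and `via_vector` are illustrative.

```python
import pandas as pd

def times_100(num):
    return num * 100

df = pd.DataFrame({"date": [11, 12, 13]})

# 1) apply: calls the function once per element
via_apply = df["date"].apply(times_100)

# 2) for loop: collect results in a separate list instead of
#    mutating the column that is being iterated
results = []
for value in df["date"]:
    results.append(times_100(value))
via_loop = pd.Series(results, index=df.index)

# 3) vectorized arithmetic on the whole column at once
via_vector = df["date"] * 100

print(via_apply.tolist())
print(via_loop.tolist())
print(via_vector.tolist())
```

For simple arithmetic like this, the vectorized form is usually the shortest and avoids per-element Python calls.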


In [28]:
dates = ["2023-01-01", "2023-03-02", "2023-09-05"]
df = pd.DataFrame(dates, columns=['date'])
df

Unnamed: 0,date
0,2023-01-01
1,2023-03-02
2,2023-09-05


In [30]:
print("Before: ", type(df['date'][0]))
datetimer = df['date'].apply(lambda x: pd.to_datetime(x))
print("After: ", type(datetimer[0]))
datetimer

Before:  <class 'str'>
After:  <class 'pandas._libs.tslibs.timestamps.Timestamp'>


0   2023-01-01
1   2023-03-02
2   2023-09-05
Name: date, dtype: datetime64[ns]

In [44]:
df['year'] = datetimer.apply( lambda x : x.year )
df['month'] = datetimer.apply( lambda x : x.month )
df['day'] = datetimer.apply( lambda x : x.day )
df['dayofweek'] = datetimer.apply( lambda x : x.dayofweek)
df

Unnamed: 0,date,year,month,day,dayofweek
0,2023-01-01,2023,1,1,6
1,2023-03-02,2023,3,2,3
2,2023-09-05,2023,9,5,1
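
The per-element `apply` calls above produce the same values as the vectorized `dt` accessor used earlier in the notebook. A short sketch that checks the two approaches agree for `dayofweek` (variable names `via_apply`, `via_dt`, and `names` are illustrative):

```python
import pandas as pd

datetimer = pd.to_datetime(pd.Series(["2023-01-01", "2023-03-02", "2023-09-05"]))

# Element by element: each x is a pandas Timestamp
via_apply = datetimer.apply(lambda x: x.dayofweek)

# Vectorized: the dt accessor works on the whole Series
via_dt = datetimer.dt.dayofweek   # Monday=0 ... Sunday=6

# day_name() makes the numeric codes easier to read
names = datetimer.dt.day_name()

print(via_apply.tolist())
print(via_dt.tolist())
print(names.tolist())
```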


- Converting to a dictionary

In [35]:
df_test = df[['date', 'year']]
df_test

Unnamed: 0,date,year
0,2023-01-01,2023
1,2023-03-02,2023
2,2023-09-05,2023


In [36]:
df_test = df_test.to_dict('list')
df_test

{'date': ['2023-01-01', '2023-03-02', '2023-09-05'],
 'year': [2023, 2023, 2023]}

In [37]:
df_test['date']

['2023-01-01', '2023-03-02', '2023-09-05']

In [38]:
df_test['date'][0]

'2023-01-01'

In [39]:
len(df_test['date'])

3

In [41]:
for i in range(len(df_test['date'])):
    print(i)
    print(df_test['date'][i], ":", df_test['year'][i])

0
2023-01-01 : 2023
1
2023-03-02 : 2023
2
2023-09-05 : 2023
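
The index-based loop above can also be written with `zip`, which pairs the two lists directly. The sketch below additionally shows the `'records'` orientation of `to_dict`, which yields one dictionary per row. The names `as_lists`, `as_records`, and `pairs` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2023-01-01", "2023-03-02", "2023-09-05"],
                   "year": [2023, 2023, 2023]})

as_lists = df.to_dict("list")       # {column: [values, ...]}
as_records = df.to_dict("records")  # [{column: value, ...}, ...]

# zip walks both lists in step, so no index arithmetic is needed
pairs = [f"{d} : {y}" for d, y in zip(as_lists["date"], as_lists["year"])]

print(pairs)
print(as_records[0])
```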


- Using lambda

In [47]:
df['year'] = datetimer.apply( lambda x : x.year )
df['month'] = datetimer.apply( lambda x : x.month )
df['day'] = datetimer.apply( lambda x : x.day )
df['dayofweek'] = datetimer.apply( lambda x : x.dayofweek)
df

Unnamed: 0,date,year,month,day,dayofweek
0,2023-01-01,2023,1,1,6
1,2023-03-02,2023,3,2,3
2,2023-09-05,2023,9,5,1


In [48]:
df['month'] = df['month'].apply( lambda x : 1234 if x > 2 else 333)

In [49]:
df

Unnamed: 0,date,year,month,day,dayofweek
0,2023-01-01,2023,333,1,6
1,2023-03-02,2023,1234,2,3
2,2023-09-05,2023,1234,5,1


In [50]:
df['month'] == 1234

0    False
1     True
2     True
Name: month, dtype: bool

In [51]:
(df['month'] == 1234 ) & (df['year']==2023)

0    False
1     True
2     True
dtype: bool

In [52]:
df[(df['month'] == 1234 ) & (df['year']==2023)]

Unnamed: 0,date,year,month,day,dayofweek
1,2023-03-02,2023,1234,2,3
2,2023-09-05,2023,1234,5,1


In [53]:
df[(df['month'] == 1234 ) & (df['year']==2023)]['dayofweek']

1    3
2    1
Name: dayofweek, dtype: int64

In [55]:
df[(df['month'] == 1234 ) & (df['year']==2023)]['dayofweek'] = df[(df['month'] == 1234 ) & (df['year']==2023)]['dayofweek'].apply( lambda x : 123 if x>2 else 333)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[(df['month'] == 1234 ) & (df['year']==2023)]['dayofweek'] = df[(df['month'] == 1234 ) & (df['year']==2023)]['dayofweek'].apply( lambda x : 123 if x>2 else 333)


In [56]:
df

Unnamed: 0,date,year,month,day,dayofweek
0,2023-01-01,2023,333,1,6
1,2023-03-02,2023,1234,2,3
2,2023-09-05,2023,1234,5,1


- The warning above recommends using .loc

In [57]:
mask = (df['month'] == 1234) & (df['year'] == 2023)
df.loc[mask, 'dayofweek'] = df.loc[mask, 'dayofweek'].apply(lambda x: 123 if x > 2 else 333)

In [58]:
df

Unnamed: 0,date,year,month,day,dayofweek
0,2023-01-01,2023,333,1,6
1,2023-03-02,2023,1234,2,123
2,2023-09-05,2023,1234,5,333
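
The first attempt left `df` unchanged because `df[condition][column] = ...` assigns into a temporary copy, whereas `.loc` selects rows and a column in a single step on the original frame. A minimal sketch on a made-up three-row frame holding the same values (the name `mask` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"year": [2023, 2023, 2023],
                   "month": [333, 1234, 1234],
                   "dayofweek": [6, 3, 1]})

# Build the boolean condition once and reuse it on both sides
mask = (df["month"] == 1234) & (df["year"] == 2023)

# Row selection and column selection happen in one .loc call,
# so the assignment reaches the original frame.
df.loc[mask, "dayofweek"] = df.loc[mask, "dayofweek"].apply(
    lambda x: 123 if x > 2 else 333
)

print(df["dayofweek"].tolist())
```

Rows where the mask is False keep their original values.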
