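# Chungnam Science High School Data Analysis Special Lecture (3H / 6H)
## Learning Data Analysis by Example
## Beginner: 3H, Intermediate: 6H (the first 3H are shared by both levels)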

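## Case 6: Analyzing Daejeon Metropolitan City's Public Bicycle (Tashu) Data - Open Data Analysis
### Question: Analysis of Tashu usage routes and stations
##### Data source 1: [Public Data Portal (공공데이터포털)](https://www.data.go.kr/data/15062798/fileData.do)
##### Data source 2: [Daejeon Metropolitan City Facilities Management Corporation (대전광역시시설관리공단)](https://www.djsiseol.or.kr/portal/sub050201.asp)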

### Step 1. Question - Preprocessing the Tashu rental record data

#### Importing the tools

In [11]:
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
# %matplotlib widget
%matplotlib inline

#### Raw rental record data

In [12]:
data_path = ['datasets/tashu/2016.csv',
             'datasets/tashu/2017.csv',
             'datasets/tashu/2018.csv',
             'datasets/tashu/2019.csv',
             'datasets/tashu/2020.csv',
            ]
data_path

['datasets/tashu/2016.csv',
 'datasets/tashu/2017.csv',
 'datasets/tashu/2018.csv',
 'datasets/tashu/2019.csv',
 'datasets/tashu/2020.csv']

#### Loading the data

In [13]:
# Read every yearly CSV and concatenate once, instead of growing df inside a loop
df = pd.concat([pd.read_csv(path) for path in data_path], ignore_index=True)
# Drop every row that has at least one missing value (index labels are kept)
df = df.dropna()
df

| | 대여스테이션 | 대여일시 | 반납스테이션 | 반납일시 | 이동거리 | 회원구분 |
|---|---|---|---|---|---|---|
| 0 | 46.0 | 20160101050015 | 17.0 | 20160101050451 | 380.0 | 0 |
| 1 | 152.0 | 20160101050629 | 82.0 | 20160101053753 | 3190.0 | 2 |
| 2 | 133.0 | 20160101052416 | 172.0 | 20160101055647 | 2070.0 | 2 |
| 3 | 133.0 | 20160101052919 | 172.0 | 20160101055734 | 2080.0 | 2 |
| 4 | 39.0 | 20160101053244 | 57.0 | 20160101054033 | 860.0 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 2791301 | 87.0 | 20201231233850 | 118.0 | 20210101002139 | 0.0 | 2 |
| 2791302 | 182.0 | 20201231233856 | 182.0 | 20201231233937 | 0.0 | 2 |
| 2791303 | 42.0 | 20201231234559 | 83.0 | 20210101001215 | 2470.0 | 0 |
| 2791304 | 115.0 | 20201231235132 | 196.0 | 20210101001502 | 1730.0 | 0 |
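
The loading step above reads one CSV per year, stacks the results, and discards incomplete rows. A minimal sketch of the same pattern on two tiny in-memory CSVs follows; the sample values and the variable names (`csv_2016`, `csv_2017`, `combined`, `cleaned`) are made up for illustration and do not come from the Tashu files.

```python
import io
import pandas as pd

# Two hypothetical yearly files, held in memory instead of on disk.
csv_2016 = "대여스테이션,반납스테이션,이동거리\n46,17,380\n152,,3190\n"
csv_2017 = "대여스테이션,반납스테이션,이동거리\n133,172,2070\n"

frames = [pd.read_csv(io.StringIO(text)) for text in (csv_2016, csv_2017)]
combined = pd.concat(frames, ignore_index=True)  # fresh 0..n-1 index across files

# dropna() removes every row that contains at least one missing value.
cleaned = combined.dropna()

print(len(combined), len(cleaned))
print(cleaned.index.tolist())
```

Because `dropna()` keeps the original index labels, the index has gaps after the drop. Calling `reset_index(drop=True)` would renumber it if a contiguous index is needed.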


#### Converting data types (dtypes)

In [14]:
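# Columns: 대여스테이션 = rental station, 대여일시 = rental time, 반납스테이션 = return station,
# 반납일시 = return time, 이동거리 = distance travelled, 회원구분 = member category
df = df.astype({'대여스테이션': 'int16', '대여일시': 'str',
                '반납스테이션': 'int16', '반납일시': 'str',
                '이동거리': 'float32', '회원구분': 'int8'})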
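
The cell above stores `대여일시` and `반납일시` as strings, and the sample rows suggest a 14-digit `YYYYMMDDHHMMSS` layout. If a later analysis needs ride durations, one possible follow-up (not part of the original notebook) is to parse them with `pd.to_datetime`. The two rows below are copied from the rental table shown earlier; the `ride_seconds` column name is a hypothetical addition.

```python
import pandas as pd

# Two rental/return timestamps taken from the first rows of the rental table.
rides = pd.DataFrame({
    "대여일시": ["20160101050015", "20160101050629"],
    "반납일시": ["20160101050451", "20160101053753"],
})

# ASSUMPTION: every timestamp follows the YYYYMMDDHHMMSS layout.
fmt = "%Y%m%d%H%M%S"
start = pd.to_datetime(rides["대여일시"], format=fmt)
end = pd.to_datetime(rides["반납일시"], format=fmt)

# Ride duration in seconds (hypothetical column name).
rides["ride_seconds"] = (end - start).dt.total_seconds()
print(rides["ride_seconds"].tolist())  # [276.0, 1884.0]
```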

#### Saving the data: pickle

In [16]:
df.to_pickle('datasets/tashu/tashu_dataset-rental_history.pkl')
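
Saving as pickle rather than CSV keeps the dtypes chosen in the conversion step. The sketch below illustrates this on a small made-up frame written to a temporary directory; the file names `sample.pkl` and `sample.csv` are placeholders, not files from this course.

```python
import os
import tempfile
import pandas as pd

# Small illustrative frame with the narrow dtypes used in the notebook.
df_small = pd.DataFrame({
    "대여스테이션": pd.Series([46, 152], dtype="int16"),
    "이동거리": pd.Series([380.0, 3190.0], dtype="float32"),
})

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "sample.pkl")
    csv_path = os.path.join(tmp, "sample.csv")
    df_small.to_pickle(pkl_path)
    df_small.to_csv(csv_path, index=False)
    from_pickle = pd.read_pickle(pkl_path)  # dtypes restored as saved
    from_csv = pd.read_csv(csv_path)        # dtypes inferred again on load

print(from_pickle.dtypes.astype(str).tolist())
print(from_csv.dtypes.astype(str).tolist())
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.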

### Step 2. Preprocessing the Tashu station information data

#### Data path

In [17]:
path = 'datasets/tashu/loc_20200801.csv'
path

'datasets/tashu/loc_20200801.csv'

In [18]:
df = pd.read_csv(path, index_col='연번')
df

| 연번 | Station 스테이션/성명 | 위치 | 광역시도코드 | 광역시도명 | 시군구코드 | 시군구명 | 법정동코드 | 법정동명 | 행정동코드 | 행정동명 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 무역전시관입구(택시승강장) | 대전광역시 유성구 도룡동 3-8 | 30 | 대전광역시 | 30200 | 유성구 | 3020012700 | 도룡동 | 3020055000 | 신성동 |
| 2 | 대전컨벤션센터 | 대전광역시 유성구 도룡동 4-19 | 30 | 대전광역시 | 30200 | 유성구 | 3020012700 | 도룡동 | 3020055000 | 신성동 |
| 3 | 한밭수목원1 | 대전광역시 서구 만년동 396 | 30 | 대전광역시 | 30170 | 서구 | 3017012800 | 만년동 | 3017065000 | 만년동 |
| 4 | 초원아파트(104동 버스정류장) | 대전광역시 서구 만년동 401 | 30 | 대전광역시 | 30170 | 서구 | 3017012800 | 만년동 | 3017065000 | 만년동 |
| 5 | 둔산대공원 입구(버스정류장) | 대전광역시 서구 둔산2동 1521-10 | 30 | 대전광역시 | 30170 | 서구 | 3017011200 | 둔산동 | 3017064000 | 둔산2동 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 258 | 천문대입구 | 대전광역시 유성구 신성동 458 | 30 | 대전광역시 | 30200 | 유성구 | 3020012500 | 신성동 | 3020055000 | 신성동 |
| 259 | 대덕대학교 | 대전광역시 유성구 장동 48 | 30 | 대전광역시 | 30200 | 유성구 | 3020012800 | 장동 | 3020055000 | 신성동 |
| 260 | 오정농수산물 도매시장 | 대전광역시 대덕구 오정동 45-1 | 30 | 대전광역시 | 30230 | 대덕구 | 3023010100 | 오정동 | 3023051000 | 오정동 |
| 261 | 도로교통공단(건너편 라도무스) | 대전광역시 유성구 원신흥동 608 | 30 | 대전광역시 | 30200 | 유성구 | 3020011400 | 원신흥동 | 3020061000 | 원신흥동 |


In [19]:
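# Columns: station name, 위치 = address, 광역시도 = metropolitan city/province,
# 시군구 = city/county/district, 법정동 = legal dong, 행정동 = administrative dong
# (코드 = code, 명 = name). The 10-digit dong codes need int64.
df = df.astype({'Station 스테이션/성명': 'str',
                '위치': 'str', '광역시도코드': 'int8',
                '광역시도명': 'str', '시군구코드': 'int32',
                '시군구명': 'str', '법정동코드': 'int64',
                '법정동명': 'str', '행정동코드': 'int64',
                '행정동명': 'str'})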
df

| 연번 | Station 스테이션/성명 | 위치 | 광역시도코드 | 광역시도명 | 시군구코드 | 시군구명 | 법정동코드 | 법정동명 | 행정동코드 | 행정동명 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 무역전시관입구(택시승강장) | 대전광역시 유성구 도룡동 3-8 | 30 | 대전광역시 | 30200 | 유성구 | 3020012700 | 도룡동 | 3020055000 | 신성동 |
| 2 | 대전컨벤션센터 | 대전광역시 유성구 도룡동 4-19 | 30 | 대전광역시 | 30200 | 유성구 | 3020012700 | 도룡동 | 3020055000 | 신성동 |
| 3 | 한밭수목원1 | 대전광역시 서구 만년동 396 | 30 | 대전광역시 | 30170 | 서구 | 3017012800 | 만년동 | 3017065000 | 만년동 |
| 4 | 초원아파트(104동 버스정류장) | 대전광역시 서구 만년동 401 | 30 | 대전광역시 | 30170 | 서구 | 3017012800 | 만년동 | 3017065000 | 만년동 |
| 5 | 둔산대공원 입구(버스정류장) | 대전광역시 서구 둔산2동 1521-10 | 30 | 대전광역시 | 30170 | 서구 | 3017011200 | 둔산동 | 3017064000 | 둔산2동 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 258 | 천문대입구 | 대전광역시 유성구 신성동 458 | 30 | 대전광역시 | 30200 | 유성구 | 3020012500 | 신성동 | 3020055000 | 신성동 |
| 259 | 대덕대학교 | 대전광역시 유성구 장동 48 | 30 | 대전광역시 | 30200 | 유성구 | 3020012800 | 장동 | 3020055000 | 신성동 |
| 260 | 오정농수산물 도매시장 | 대전광역시 대덕구 오정동 45-1 | 30 | 대전광역시 | 30230 | 대덕구 | 3023010100 | 오정동 | 3023051000 | 오정동 |
| 261 | 도로교통공단(건너편 라도무스) | 대전광역시 유성구 원신흥동 608 | 30 | 대전광역시 | 30200 | 유성구 | 3020011400 | 원신흥동 | 3020061000 | 원신흥동 |
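
The conversion above picks a different integer width per column (`int8`, `int32`, `int64`). A narrow type only works if every value fits its range, so it can help to check before casting. The sketch below uses `np.iinfo` for that check on code values copied from the station table; the helper name `fits` is hypothetical.

```python
import numpy as np
import pandas as pd

# Code values copied from the first and third rows of the station table.
codes = pd.DataFrame({
    "광역시도코드": [30, 30],
    "시군구코드": [30200, 30170],
    "법정동코드": [3020012700, 3017012800],
})

def fits(series, dtype):
    """Return True if every value in `series` lies within the range of `dtype`."""
    info = np.iinfo(dtype)
    return bool(series.min() >= info.min and series.max() <= info.max)

print(fits(codes["광역시도코드"], np.int8))   # True: 30 is within -128..127
print(fits(codes["시군구코드"], np.int32))    # True
print(fits(codes["법정동코드"], np.int32))    # False: 10-digit codes exceed the int32 maximum
```

This is consistent with the notebook's choice of `int64` for the 10-digit dong codes.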


In [20]:
df.to_pickle('datasets/tashu/tashu_dataset-station_information.pkl')
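
With station information saved, a later analysis step might attach station names to rental records. The sketch below shows one way this could be done with `Series.map`. It assumes that the station numbers in `대여스테이션` correspond to `연번` in the station table, which the source does not confirm. The station names are copied from the table above; the rental rows and the `rental_station_name` column are hypothetical.

```python
import pandas as pd

# Station names copied from the station table, indexed by 연번.
stations = pd.DataFrame(
    {"Station 스테이션/성명": ["무역전시관입구(택시승강장)", "대전컨벤션센터", "한밭수목원1"]},
    index=pd.Index([1, 2, 3], name="연번"),
)

# Hypothetical rental records (made-up values, for illustration only).
rentals = pd.DataFrame({"대여스테이션": [3, 1, 3], "이동거리": [500.0, 1200.0, 800.0]})

# ASSUMPTION: 대여스테이션 numbers match 연번 in the station table.
rentals["rental_station_name"] = rentals["대여스테이션"].map(
    stations["Station 스테이션/성명"]
)
print(rentals["rental_station_name"].tolist())
```

Station numbers with no match in the station table would come back as `NaN`, which makes unmatched IDs easy to spot.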

### Step 3. Preprocessing the weather data

#### Data paths

In [21]:
data_path = ['datasets/tashu/weather2016.csv',
             'datasets/tashu/weather2017.csv',
             'datasets/tashu/weather2018.csv',
             'datasets/tashu/weather2019.csv',
             'datasets/tashu/weather2020.csv',
            ]
data_path

['datasets/tashu/weather2016.csv',
 'datasets/tashu/weather2017.csv',
 'datasets/tashu/weather2018.csv',
 'datasets/tashu/weather2019.csv',
 'datasets/tashu/weather2020.csv']

In [22]:
# Read every yearly weather CSV and concatenate once
df = pd.concat([pd.read_csv(path) for path in data_path], ignore_index=True)
# Replace missing values with 0 (this applies to every column, temperature included)
df = df.fillna(0)
df

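| | 지점 | 지점명 | 일시 | 평균기온(°C) | 일강수량(mm) |
|---|---|---|---|---|---|
| 0 | 133 | 대전 | 2016-01-01 | 1.6 | 0.0 |
| 1 | 133 | 대전 | 2016-01-02 | 6.6 | 0.0 |
| 2 | 133 | 대전 | 2016-01-03 | 6.9 | 0.0 |
| 3 | 133 | 대전 | 2016-01-04 | 5.1 | 0.0 |
| 4 | 133 | 대전 | 2016-01-05 | -0.6 | 0.0 |
| ... | ... | ... | ... | ... | ... |
| 1822 | 133 | 대전 | 2020-12-27 | 3.0 | 0.0 |
| 1823 | 133 | 대전 | 2020-12-28 | 5.4 | 0.0 |
| 1824 | 133 | 대전 | 2020-12-29 | 1.3 | 2.5 |
| 1825 | 133 | 대전 | 2020-12-30 | -7.5 | 0.0 |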
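
`fillna(0)` in the cell above fills every column. Zero is a natural stand-in for a missing precipitation value, but it would be misleading for a missing temperature. Whether the weather files actually contain missing temperatures is not stated here, so the following is only a sketch of a more selective variant: passing a dict to `fillna` restricts the fill to the named columns. The sample values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical weather rows with one gap in each numeric column.
weather = pd.DataFrame({
    "일시": ["2016-01-01", "2016-01-02", "2016-01-03"],
    "평균기온(°C)": [1.6, np.nan, 6.9],
    "일강수량(mm)": [np.nan, 0.0, 2.5],
})

# Fill only the precipitation column; temperature gaps stay as NaN.
filled = weather.fillna({"일강수량(mm)": 0})

print(filled["일강수량(mm)"].tolist())          # [0.0, 0.0, 2.5]
print(int(filled["평균기온(°C)"].isna().sum()))  # 1
```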


In [23]:
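# Columns: 지점 = station number, 지점명 = station name, 일시 = date,
# 평균기온(°C) = mean temperature, 일강수량(mm) = daily precipitation
df = df.astype({'지점': 'int16', '지점명': 'str', '일시': 'str',
                '평균기온(°C)': 'float32', '일강수량(mm)': 'float32'})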
df

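| | 지점 | 지점명 | 일시 | 평균기온(°C) | 일강수량(mm) |
|---|---|---|---|---|---|
| 0 | 133 | 대전 | 2016-01-01 | 1.6 | 0.0 |
| 1 | 133 | 대전 | 2016-01-02 | 6.6 | 0.0 |
| 2 | 133 | 대전 | 2016-01-03 | 6.9 | 0.0 |
| 3 | 133 | 대전 | 2016-01-04 | 5.1 | 0.0 |
| 4 | 133 | 대전 | 2016-01-05 | -0.6 | 0.0 |
| ... | ... | ... | ... | ... | ... |
| 1822 | 133 | 대전 | 2020-12-27 | 3.0 | 0.0 |
| 1823 | 133 | 대전 | 2020-12-28 | 5.4 | 0.0 |
| 1824 | 133 | 대전 | 2020-12-29 | 1.3 | 2.5 |
| 1825 | 133 | 대전 | 2020-12-30 | -7.5 | 0.0 |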


In [24]:
df.to_pickle('datasets/tashu/weather.pkl')
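
Once the rental and weather data are both saved, a later step could line them up by calendar date. The sketch below is one possible approach and not part of the original notebook: it counts rentals per day and joins the daily mean temperature. The first two rental timestamps and both weather rows are copied from the tables above; the third timestamp and the `date` and `rentals` column names are hypothetical.

```python
import pandas as pd

rides = pd.DataFrame(
    {"대여일시": ["20160101050015", "20160101050629", "20160102081500"]}
)
weather = pd.DataFrame({
    "일시": ["2016-01-01", "2016-01-02"],
    "평균기온(°C)": [1.6, 6.6],
})

# Derive a calendar date from the 14-digit rental timestamp.
rides["date"] = pd.to_datetime(
    rides["대여일시"], format="%Y%m%d%H%M%S"
).dt.normalize()

# Count rentals per day.
daily = rides.groupby("date").size().rename("rentals").reset_index()

# Parse the weather date and join on the shared date column.
weather["date"] = pd.to_datetime(weather["일시"], format="%Y-%m-%d")
joined = daily.merge(weather[["date", "평균기온(°C)"]], on="date", how="left")

print(joined["rentals"].tolist())  # [2, 1]
```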