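## 1. Purpose
	- Analyze daily new-member sign-ups for Seoul's public bicycle service to check whether sign-ups are related to the weather

## 2. Overview
	- Question: how the number of new public-bicycle members relates to the weather
	- Data
		- Seoul Open Data Plaza (http://data.seoul.go.kr/): Seoul public bicycle new-member information (daily) (open API and Excel files)
		- KMA Weather Data Open Portal (https://data.kma.go.kr/cmmn/main.do): KMA weather observation data (synoptic, daily) (open API)
	- Tool: Jupyter Notebook (Python)
	- Analyses:
		- Number of new public-bicycle members
		- Weather data: minimum/maximum temperature, mean total cloud cover, mean wind speed, precipitation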

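## 3. Analysis process

 I assumed that people want to ride a bicycle on clear days that are neither too hot nor too cold, and therefore that more people sign up on days with good weather.

### Loading daily new-member sign-up data for public bicycles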

In [4]:
from selenium import webdriver
from bs4 import BeautifulSoup
import time
import codecs
import pandas as pd

- rent_dt : rental date
- member_cd : user code
- sex_cd : sex
- year_cd : age-group code
- mem_cnt : number of new members

In [5]:
data_list = {}
driver = webdriver.Chrome("chromedriver.exe")
startcnt = 1
endcnt = 1000

while endcnt <= 9000:
    time.sleep(1)
    
    driver.get(
    'http://openapi.seoul.go.kr:8088/57454f474b67677837337173794a4f/xml/cycleNewMemberRentInfoDay/'+str(startcnt)+'/'+str(endcnt)+'/')

    source = driver.page_source

    s1 = BeautifulSoup(source, "html.parser")

    find_list = ['rent_dt', 'member_cd', 'sex_cd', 'year_cd', 'mem_cnt']
    
    for each in find_list:
        find =  s1.find_all(each)
        #data_list.append(find)
        if data_list.get(each) is None:
          data_list[each]=[x.text for x in find]
        else:
          data_list[each]=data_list[each]+[x.text for x in find]
        
    startcnt += 1000
    endcnt += 1000
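
The loop above drives a Chrome browser only to read an XML response. As a minimal sketch, assuming the endpoint returns plain XML with one `<row>` element per record, the same fields could be collected with the standard library alone. The `<row>` tag, the wrapper element, and the sample payload are assumptions for illustration, not taken from the actual response.

```python
import xml.etree.ElementTree as ET

FIELDS = ['rent_dt', 'member_cd', 'sex_cd', 'year_cd', 'mem_cnt']

# Hypothetical payload shaped like the fields listed above; the real
# response layout (wrapper and <row> tag names) is an assumption.
SAMPLE_XML = """<cycleNewMemberRentInfoDay>
  <row><rent_dt>2018-07-19</rent_dt><member_cd>회원-내국인</member_cd><sex_cd></sex_cd><year_cd>70대~</year_cd><mem_cnt>2194</mem_cnt></row>
  <row><rent_dt>2017-01-01</rent_dt><member_cd>회원-내국인</member_cd><sex_cd>F</sex_cd><year_cd>30대</year_cd><mem_cnt>16</mem_cnt></row>
</cycleNewMemberRentInfoDay>"""


def parse_rows(xml_text, fields=FIELDS):
    """Collect the text of each field from every <row> into column lists."""
    root = ET.fromstring(xml_text)
    columns = {name: [] for name in fields}
    for row in root.iter('row'):
        for name in fields:
            # Missing or empty tags are stored as an empty string.
            columns[name].append(row.findtext(name) or '')
    return columns


columns = parse_rows(SAMPLE_XML)
print(columns['mem_cnt'])
```

The resulting dictionary has the same shape as `data_list`, so it could be passed to `pd.DataFrame` in the same way.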


In [6]:
data = pd.DataFrame(data_list)

In [7]:
data

Unnamed: 0,rent_dt,member_cd,sex_cd,year_cd,mem_cnt
0,2018-07-19,회원-내국인,,70대~,2194
1,2018-07-18,회원-내국인,,70대~,2123
2,2018-07-17,회원-내국인,,70대~,2133
3,2018-07-16,회원-내국인,,70대~,2161
4,2018-07-15,회원-내국인,,70대~,2937
...,...,...,...,...,...
8090,2017-01-01,회원-내국인,F,~10대,3
8091,2017-01-01,회원-내국인,F,50대,5
8092,2017-01-01,회원-내국인,F,40대,11
8093,2017-01-01,회원-내국인,F,30대,16


In [8]:
data.rename(columns = {"rent_dt": "가입일자"}, inplace = True)

In [9]:
data.rename(columns = {
    "member_cd": "사용자코드",
    "sex_cd": "성별",
    "year_cd": "연령대코드",
    "mem_cnt": "가입 수",
}, inplace = True)

In [10]:
data

Unnamed: 0,가입일자,사용자코드,성별,연령대코드,가입 수
0,2018-07-19,회원-내국인,,70대~,2194
1,2018-07-18,회원-내국인,,70대~,2123
2,2018-07-17,회원-내국인,,70대~,2133
3,2018-07-16,회원-내국인,,70대~,2161
4,2018-07-15,회원-내국인,,70대~,2937
...,...,...,...,...,...
8090,2017-01-01,회원-내국인,F,~10대,3
8091,2017-01-01,회원-내국인,F,50대,5
8092,2017-01-01,회원-내국인,F,40대,11
8093,2017-01-01,회원-내국인,F,30대,16


In [11]:
data.dtypes

가입일자     object
사용자코드    object
성별       object
연령대코드    object
가입 수     object
dtype: object

The open API only provides data through 2018-07-19, so the later period is loaded from xlsx files.

In [12]:
data1 = pd.read_excel("서울특별시 공공자전거 신규가입자 정보(일별)_20180720_20181231.xlsx")

In [13]:
data1

Unnamed: 0,가입일자,사용자코드,연령대코드,성별,가입 수
0,2018-07-20,회원-내국인,AGE_001,,221
1,2018-07-20,회원-내국인,AGE_002,,1189
2,2018-07-20,회원-내국인,AGE_003,,348
3,2018-07-20,회원-내국인,AGE_004,,201
4,2018-07-20,회원-내국인,AGE_005,,91
...,...,...,...,...,...
1126,2018-12-31,회원-내국인,AGE_003,,49
1127,2018-12-31,회원-내국인,AGE_004,,22
1128,2018-12-31,회원-내국인,AGE_005,,10
1129,2018-12-31,회원-내국인,AGE_006,,2


In [14]:
data1.dtypes

가입일자     datetime64[ns]
사용자코드            object
연령대코드            object
성별              float64
가입 수              int64
dtype: object

In [15]:
import datetime

data1['가입일자'] = data1['가입일자'].dt.strftime("%Y-%m-%d")

In [16]:
data1

Unnamed: 0,가입일자,사용자코드,연령대코드,성별,가입 수
0,2018-07-20,회원-내국인,AGE_001,,221
1,2018-07-20,회원-내국인,AGE_002,,1189
2,2018-07-20,회원-내국인,AGE_003,,348
3,2018-07-20,회원-내국인,AGE_004,,201
4,2018-07-20,회원-내국인,AGE_005,,91
...,...,...,...,...,...
1126,2018-12-31,회원-내국인,AGE_003,,49
1127,2018-12-31,회원-내국인,AGE_004,,22
1128,2018-12-31,회원-내국인,AGE_005,,10
1129,2018-12-31,회원-내국인,AGE_006,,2


In [17]:
data2 = pd.read_excel("서울특별시 공공자전거 신규가입자 정보(일별)_20190101_20191130.xlsx")

In [18]:
data2

Unnamed: 0,가입일자,사용자코드,연령대코드,성별,가입 수
0,2019-01-01,회원-내국인,AGE_001,,64
1,2019-01-01,회원-내국인,AGE_002,,196
2,2019-01-01,회원-내국인,AGE_003,,51
3,2019-01-01,회원-내국인,AGE_004,,37
4,2019-01-01,회원-내국인,AGE_005,,17
...,...,...,...,...,...
2503,2019-11-30,회원-내국인,AGE_005,F,23
2504,2019-11-30,회원-내국인,AGE_005,M,41
2505,2019-11-30,회원-내국인,AGE_006,F,7
2506,2019-11-30,회원-내국인,AGE_006,M,8


In [19]:
data2.dtypes

가입일자     datetime64[ns]
사용자코드            object
연령대코드            object
성별               object
가입 수              int64
dtype: object

In [20]:
data2['가입일자'] = data2['가입일자'].dt.strftime("%Y-%m-%d")

In [21]:
data2.dtypes

가입일자     object
사용자코드    object
연령대코드    object
성별       object
가입 수      int64
dtype: object

In [22]:
data2

Unnamed: 0,가입일자,사용자코드,연령대코드,성별,가입 수
0,2019-01-01,회원-내국인,AGE_001,,64
1,2019-01-01,회원-내국인,AGE_002,,196
2,2019-01-01,회원-내국인,AGE_003,,51
3,2019-01-01,회원-내국인,AGE_004,,37
4,2019-01-01,회원-내국인,AGE_005,,17
...,...,...,...,...,...
2503,2019-11-30,회원-내국인,AGE_005,F,23
2504,2019-11-30,회원-내국인,AGE_005,M,41
2505,2019-11-30,회원-내국인,AGE_006,F,7
2506,2019-11-30,회원-내국인,AGE_006,M,8


In [23]:
data.shape

(8095, 5)

In [24]:
type(data1)

pandas.core.frame.DataFrame

In [25]:
merge = pd.concat([data, data1, data2])



In [26]:
merge

Unnamed: 0,가입 수,가입일자,사용자코드,성별,연령대코드
0,2194,2018-07-19,회원-내국인,,70대~
1,2123,2018-07-18,회원-내국인,,70대~
2,2133,2018-07-17,회원-내국인,,70대~
3,2161,2018-07-16,회원-내국인,,70대~
4,2937,2018-07-15,회원-내국인,,70대~
...,...,...,...,...,...
2503,23,2019-11-30,회원-내국인,F,AGE_005
2504,41,2019-11-30,회원-내국인,M,AGE_005
2505,7,2019-11-30,회원-내국인,F,AGE_006
2506,8,2019-11-30,회원-내국인,M,AGE_006


In [27]:
merge.shape

(11734, 5)
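
The concatenation above stacks three frames whose columns come in different orders and whose `가입 수` values have different types (strings from the API, integers from the xlsx files). A minimal sketch of the same step on tiny stand-in frames, with illustrative values, shows how `sort=False` and `ignore_index=True` keep the column order and produce a unique index:

```python
import pandas as pd

# Tiny stand-ins for `data` and `data1`; the values are illustrative.
api_part = pd.DataFrame({'가입일자': ['2018-07-19'], '성별': ['F'], '가입 수': ['2194']})
xlsx_part = pd.DataFrame({'가입일자': ['2018-07-20'], '가입 수': [221], '성별': ['M']})

# sort=False keeps the column order of the first frame instead of sorting
# column names; ignore_index=True gives the result a fresh 0..n-1 index.
combined = pd.concat([api_part, xlsx_part], sort=False, ignore_index=True)

# The API values are strings, so convert before summing.
combined['가입 수'] = pd.to_numeric(combined['가입 수'])
print(list(combined.columns), combined['가입 수'].sum())
```

A unique index matters later, because label-based lookups such as `.loc[...]` return several rows when labels repeat.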

In [28]:
bikejoin = merge

In [29]:
bikejoin

Unnamed: 0,가입 수,가입일자,사용자코드,성별,연령대코드
0,2194,2018-07-19,회원-내국인,,70대~
1,2123,2018-07-18,회원-내국인,,70대~
2,2133,2018-07-17,회원-내국인,,70대~
3,2161,2018-07-16,회원-내국인,,70대~
4,2937,2018-07-15,회원-내국인,,70대~
...,...,...,...,...,...
2503,23,2019-11-30,회원-내국인,F,AGE_005
2504,41,2019-11-30,회원-내국인,M,AGE_005
2505,7,2019-11-30,회원-내국인,F,AGE_006
2506,8,2019-11-30,회원-내국인,M,AGE_006


In [30]:
bikejoin.dtypes

가입 수     object
가입일자     object
사용자코드    object
성별       object
연령대코드    object
dtype: object

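#### Checking missing values in the public-bicycle data

Only the number of new members will be extracted, so the missing values in the `성별` (sex) column are left as they are.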

In [31]:
bikejoin.isnull().sum()

가입 수        0
가입일자        0
사용자코드       0
성별       3231
연령대코드       0
dtype: int64

### Preprocessing the public-bicycle data

#### Extracting only the number of new members

In [32]:
bikejoin.dtypes

가입 수     object
가입일자     object
사용자코드    object
성별       object
연령대코드    object
dtype: object

In [33]:
bikejoin['가입 수'] = pd.to_numeric(bikejoin['가입 수'])

In [34]:
bikejoin.dtypes

가입 수      int64
가입일자     object
사용자코드    object
성별       object
연령대코드    object
dtype: object

In [35]:
bikejoin = bikejoin.groupby('가입일자')['가입 수'].sum()

In [36]:
bikejoin

가입일자
2017-01-01     221
2017-01-02     182
2017-01-03     198
2017-01-04     189
2017-01-05     190
              ... 
2019-11-26     920
2019-11-27     688
2019-11-28     712
2019-11-29     671
2019-11-30    1100
Name: 가입 수, Length: 1062, dtype: int64
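
The daily total can also be produced as a DataFrame in one step. The following is a sketch on illustrative rows, not the notebook's own cell: `as_index=False` keeps the date as an ordinary column, which stands in for the later conversion to a DataFrame, the index reset, and the sort.

```python
import pandas as pd

# Illustrative rows in the same shape as the merged sign-up table.
rows = pd.DataFrame({
    '가입일자': ['2017-01-02', '2017-01-01', '2017-01-01'],
    '가입 수': [182, 16, 11],
})

# as_index=False keeps the grouping key as a regular column, and groupby
# sorts by key by default, so the result is already in date order.
daily = (rows.groupby('가입일자', as_index=False)['가입 수'].sum()
             .rename(columns={'가입일자': '날짜'}))
print(daily)
```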

In [37]:
type(bikejoin)

pandas.core.series.Series

In [38]:
bikejoin = pd.DataFrame(bikejoin)

In [39]:
bikejoin

Unnamed: 0_level_0,가입 수
가입일자,Unnamed: 1_level_1
2017-01-01,221
2017-01-02,182
2017-01-03,198
2017-01-04,189
2017-01-05,190
...,...
2019-11-26,920
2019-11-27,688
2019-11-28,712
2019-11-29,671


In [40]:
bikejoin.reset_index(inplace=True)

In [41]:
bikejoin

Unnamed: 0,가입일자,가입 수
0,2017-01-01,221
1,2017-01-02,182
2,2017-01-03,198
3,2017-01-04,189
4,2017-01-05,190
...,...,...
1057,2019-11-26,920
1058,2019-11-27,688
1059,2019-11-28,712
1060,2019-11-29,671


In [42]:
bikejoin = bikejoin.sort_values(by=['가입일자'])

In [43]:
bikejoin

Unnamed: 0,가입일자,가입 수
0,2017-01-01,221
1,2017-01-02,182
2,2017-01-03,198
3,2017-01-04,189
4,2017-01-05,190
...,...,...
1057,2019-11-26,920
1058,2019-11-27,688
1059,2019-11-28,712
1060,2019-11-29,671


In [44]:
bikejoin.dtypes

가입일자    object
가입 수     int64
dtype: object

In [45]:
bikejoin.rename(columns = {"가입일자": "날짜"}, inplace = True)

In [46]:
bikejoin

Unnamed: 0,날짜,가입 수
0,2017-01-01,221
1,2017-01-02,182
2,2017-01-03,198
3,2017-01-04,189
4,2017-01-05,190
...,...,...
1057,2019-11-26,920
1058,2019-11-27,688
1059,2019-11-28,712
1060,2019-11-29,671


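### Loading the weather data

- tm : date
- min_ta : minimum temperature
- max_ta : maximum temperature
- avg_tca : mean total cloud cover -> clear (0-2), partly cloudy (3-5), mostly cloudy (6-8), overcast (9-10 and above)
- sum_rn : daily precipitation
- iscs : weather phenomena
- avg_ws : mean wind speed
- stn_nm : station name (region)
- stn_id : station number

### Beaufort wind scale
#### Source: https://bestnara.tistory.com/10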
<img src="풍력계급.jfif" width=500 height=1000 align="left">

In [47]:
from urllib.request import urlopen
from urllib.parse import urlencode, unquote, quote_plus
import urllib
import requests
import json
import pandas as pd

In [48]:
url = 'http://data.kma.go.kr/apiData/getData'

In [49]:
params = '?' + urlencode({
    quote_plus("type"): "json",
    quote_plus("dataCd"): "ASOS",
    quote_plus("dateCd"): "DAY",
    quote_plus("startDt"): "20170101",
    quote_plus("endDt"): "20191130",
    quote_plus("stnIds"): "108",
    quote_plus("schListCnt"): "800",
    quote_plus("pageIndex"): "1",
    quote_plus("apiKey"): "XfV3VGyAZTRP/zPdbyN%2BRl5NpGdAjhBndaCbr9QuWgK5%2BFkvhTuv7XTDZPvaCg3f"
})
    
req = urllib.request.Request(url + unquote(params))

In [50]:
params1 = '?' + urlencode({
    quote_plus("type"): "json",
    quote_plus("dataCd"): "ASOS",
    quote_plus("dateCd"): "DAY",
    quote_plus("startDt"): "20170101",
    quote_plus("endDt"): "20191130",
    quote_plus("stnIds"): "108",
    quote_plus("schListCnt"): "800",
    quote_plus("pageIndex"): "2",
    quote_plus("apiKey"): "XfV3VGyAZTRP/zPdbyN%2BRl5NpGdAjhBndaCbr9QuWgK5%2BFkvhTuv7XTDZPvaCg3f"
})
    
req1 = urllib.request.Request(url + unquote(params1))
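
The two request cells differ only in `pageIndex`. As a sketch, a small helper could build the URL for any page. The function name `build_url` and the placeholder key are hypothetical; the parameter names are the ones used above.

```python
from urllib.parse import urlencode

BASE_URL = 'http://data.kma.go.kr/apiData/getData'


def build_url(page_index, api_key, start='20170101', end='20191130'):
    """Return the request URL for one result page (parameter names as above)."""
    query = {
        'type': 'json', 'dataCd': 'ASOS', 'dateCd': 'DAY',
        'startDt': start, 'endDt': end, 'stnIds': '108',
        'schListCnt': '800', 'pageIndex': str(page_index),
    }
    # The key in the notebook is already percent-encoded, so it is appended
    # verbatim rather than passed through urlencode a second time.
    return BASE_URL + '?' + urlencode(query) + '&apiKey=' + api_key


# 'YOUR_API_KEY' is a placeholder, not a working credential.
urls = [build_url(page, 'YOUR_API_KEY') for page in (1, 2)]
print(urls[1])
```

Appending the key unchanged has the same net effect as the `urlencode` followed by `unquote` combination in the cells above.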

In [2]:
response_body = urlopen(req, timeout=60).read() # get bytes data
weather_data1 = json.loads(response_body)	# convert bytes data to json data


In [52]:
res1 = pd.DataFrame(weather_data1[3]['info'])

In [53]:
res1

Unnamed: 0,AVG_M1_5_TE,N9_9_RN,SUM_LRG_EV,MIN_RHM_HRMT,STN_ID,MAX_INS_WS_WD,MAX_PS_HRMT,MIN_RHM,SS_DUR,AVG_CM5_TE,...,DD_MEFS,SUM_DPTH_FHSC,DD_MES_HRMT,DD_MES,DD_MEFS_HRMT,SUM_FOG_DUR,HR1_MAX_RN_HRMT,MI10_MAX_RN,MI10_MAX_RN_HRMT,HR1_MAX_RN
0,10.2,0.3,0.7,1342.0,108,20.0,928,59.0,9.6,0.5,...,,,,,,,,,,
1,10.1,,0.9,1355.0,108,230.0,1,57.0,9.6,3.9,...,,,,,,,,,,
2,9.9,,1.3,1254.0,108,180.0,858,38.0,9.7,2.6,...,,,,,,,,,,
3,9.8,,1.5,1503.0,108,290.0,2358,31.0,9.7,3.4,...,,,,,,,,,,
4,9.7,,1.6,943.0,108,50.0,2132,42.0,9.7,2.7,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
795,6.7,,2.9,1603.0,108,320.0,2358,26.0,11.6,7.5,...,,,,,,,,,,
796,6.9,,3.4,1708.0,108,200.0,929,17.0,11.6,6.5,...,,,,,,,,,,
797,7.0,,2.9,1650.0,108,290.0,1,10.0,11.7,5.9,...,,,,,,,,,,
798,7.2,,2.2,1142.0,108,70.0,1,12.0,11.7,8.0,...,,,,,,,,,,


In [54]:
response_body1 = urlopen(req1, timeout=60).read() # get bytes data
weather_data2 = json.loads(response_body1)	# convert bytes data to json data

[{'status': 200}, {'msg': 'success'}, {'stnIds': '108'}, {'info': [{'AVG_M1_5_TE': 7.4, 'N9_9_RN': 0, 'SUM_LRG_EV': 1.7, 'MIN_RHM_HRMT': 1705, 'STN_ID': 108, 'MAX_INS_WS_WD': 270, 'MAX_PS_HRMT': 2357, 'MIN_RHM': 21, 'SS_DUR': 11.8, 'AVG_CM5_TE': 4.9, 'MAX_WS_WD': 270, 'AVG_TCA': 4.9, 'HR1_MAX_ICSR': 1.16, 'RNUM': 801, 'MAX_PS': 1017, 'AVG_M5_0_TE': 14.1, 'STN_NM': '서울', 'MAX_TA': 7.9, 'HR24_SUM_RWS': 2369, 'AVG_M3_0_TE': 10.5, 'AVG_CM10_TE': 5.8, 'HR1_MAX_ICSR_HRMT': 1200, 'MAX_INS_WS': 11, 'ISCS': '-{박무}-{박무}{강도1}0300-{박무}{강도1}0600-{박무}{강도2}0900-{박무}{강도2}1200-1230. {비}1045-1140. {연무}1225-1330.', 'AVG_CM30_TE': 6.4, 'MAX_WS_HRMT': 2031, 'SUM_RN_DUR': 0.92, 'MAX_WD': 270, 'MIN_PS': 1008.8, 'TM': '2019-03-12', 'AVG_RHM': 55.1, 'MAX_INS_WS_HRMT': 1328, 'AVG_TS': 3.3, 'AVG_PV': 4.6, 'SUM_SS_HR': 1.8, 'AVG_PS': 1011.4, 'MAX_WS': 6, 'MAX_TA_HRMT': '1314', 'MIN_TG': -4.8, 'SUM_SML_EV': 2.4, 'AVG_TD': -5.8, 'AVG_TA': 3.9, 'AVG_CM20_TE': 6.3, 'MIN_TA': -0.3, 'MIN_PS_HRMT': 1324, 'AVG_M0_5_TE': 

In [55]:
res2 = pd.DataFrame(weather_data2[3]['info'])

In [56]:
res2

Unnamed: 0,AVG_M1_5_TE,N9_9_RN,SUM_LRG_EV,MIN_RHM_HRMT,STN_ID,MAX_INS_WS_WD,MAX_PS_HRMT,MIN_RHM,SS_DUR,AVG_CM5_TE,...,AVG_LMAC,SUM_GSR,AVG_M1_0_TE,AVG_PA,AVG_WS,SUM_FOG_DUR,MI10_MAX_RN,HR1_MAX_RN,HR1_MAX_RN_HRMT,MI10_MAX_RN_HRMT
0,7.4,0.0,1.7,1705,108,270,2357,21,11.8,4.9,...,4.8,5.15,6.8,1000.8,2.7,,,,,
1,7.5,0.0,3.2,1345,108,340,2351,14,11.8,4.3,...,1.8,17.87,6.8,1009.3,3.3,,,,,
2,7.5,0.0,2.2,1335,108,200,206,18,11.9,5.0,...,4.0,14.02,6.8,1011.9,1.9,,,,,
3,7.6,3.5,1.5,1505,108,270,6,24,11.9,6.7,...,5.3,9.93,6.7,1006.8,1.6,,,,,
4,7.7,,1.6,1323,108,200,1010,38,12.0,5.5,...,4.9,11.76,6.8,1009.9,1.8,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
259,15.0,,1.7,1439,108,270,103,42,9.9,6.6,...,3.3,9.78,12.5,1018.6,2.0,,,,,
260,14.8,,1.8,1427,108,290,850,33,9.9,5.8,...,0.4,10.43,12.4,1017.7,2.7,,,,,
261,14.7,,1.9,1152,108,320,907,35,9.8,5.4,...,1.8,11.21,12.2,1019.1,2.0,,,,,
262,14.5,,2.0,1409,108,320,1019,21,9.8,4.1,...,0.0,11.40,12.0,1020.9,2.0,,,,,


In [57]:
weather1 = res1[['TM', 'MIN_TA', 'MAX_TA', 'AVG_TCA', 'SUM_RN', 'AVG_WS', 'STN_NM', 'STN_ID']]

In [58]:
weather2 = res2[['TM', 'MIN_TA', 'MAX_TA', 'AVG_TCA', 'SUM_RN', 'AVG_WS', 'STN_NM', 'STN_ID']]

In [59]:
weather1

Unnamed: 0,TM,MIN_TA,MAX_TA,AVG_TCA,SUM_RN,AVG_WS,STN_NM,STN_ID
0,2017-01-01,-1.6,6.9,7.0,,1.5,서울,108
1,2017-01-02,1.8,9.2,7.6,0.3,2.1,서울,108
2,2017-01-03,-2.3,7.7,1.1,,1.8,서울,108
3,2017-01-04,1.0,8.9,2.6,,1.7,서울,108
4,2017-01-05,-0.1,7.3,8.4,,3.1,서울,108
...,...,...,...,...,...,...,...,...
795,2019-03-07,3.1,12.7,1.5,,2.2,서울,108
796,2019-03-08,0.6,13.4,0.1,,1.9,서울,108
797,2019-03-09,-0.4,14.9,5.4,,1.3,서울,108
798,2019-03-10,6.7,15.3,7.1,,1.2,서울,108


In [60]:
weather2

Unnamed: 0,TM,MIN_TA,MAX_TA,AVG_TCA,SUM_RN,AVG_WS,STN_NM,STN_ID
0,2019-03-12,-0.3,7.9,4.9,0.0,2.7,서울,108
1,2019-03-13,-0.4,8.5,1.8,0.0,3.3,서울,108
2,2019-03-14,-1.7,10.0,5.5,0.0,1.9,서울,108
3,2019-03-15,1.6,11.3,7.4,3.5,1.6,서울,108
4,2019-03-16,0.2,9.2,5.5,,1.8,서울,108
...,...,...,...,...,...,...,...,...
259,2019-11-26,1.2,13.7,3.9,,2.0,서울,108
260,2019-11-27,0.3,6.8,4.0,,2.7,서울,108
261,2019-11-28,-0.6,10.4,1.8,,2.0,서울,108
262,2019-11-29,-3.4,6.9,0.9,,2.0,서울,108


In [61]:
weather = pd.concat([weather1, weather2])

In [62]:
weather

Unnamed: 0,TM,MIN_TA,MAX_TA,AVG_TCA,SUM_RN,AVG_WS,STN_NM,STN_ID
0,2017-01-01,-1.6,6.9,7.0,,1.5,서울,108
1,2017-01-02,1.8,9.2,7.6,0.3,2.1,서울,108
2,2017-01-03,-2.3,7.7,1.1,,1.8,서울,108
3,2017-01-04,1.0,8.9,2.6,,1.7,서울,108
4,2017-01-05,-0.1,7.3,8.4,,3.1,서울,108
...,...,...,...,...,...,...,...,...
259,2019-11-26,1.2,13.7,3.9,,2.0,서울,108
260,2019-11-27,0.3,6.8,4.0,,2.7,서울,108
261,2019-11-28,-0.6,10.4,1.8,,2.0,서울,108
262,2019-11-29,-3.4,6.9,0.9,,2.0,서울,108


In [63]:
weather.rename(columns = {
    'TM' : '날짜',
    'MIN_TA' : '최저기온',
    'MAX_TA' : '최고기온',
    'AVG_TCA' : '평균전운량',
    'SUM_RN' : '일강수량',
    'AVG_WS' : '평균풍속',
    'STN_NM' : '지역',
    'STN_ID' : '지점번호',
}, inplace = True)

In [64]:
weather

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,지역,지점번호
0,2017-01-01,-1.6,6.9,7.0,,1.5,서울,108
1,2017-01-02,1.8,9.2,7.6,0.3,2.1,서울,108
2,2017-01-03,-2.3,7.7,1.1,,1.8,서울,108
3,2017-01-04,1.0,8.9,2.6,,1.7,서울,108
4,2017-01-05,-0.1,7.3,8.4,,3.1,서울,108
...,...,...,...,...,...,...,...,...
259,2019-11-26,1.2,13.7,3.9,,2.0,서울,108
260,2019-11-27,0.3,6.8,4.0,,2.7,서울,108
261,2019-11-28,-0.6,10.4,1.8,,2.0,서울,108
262,2019-11-29,-3.4,6.9,0.9,,2.0,서울,108


In [65]:
weather.isnull().sum()

날짜         0
최저기온       0
최고기온       1
평균전운량      0
일강수량     672
평균풍속       3
지역         0
지점번호       0
dtype: int64

In [66]:
weather[weather['평균풍속'].isnull()]

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,지역,지점번호
286,2017-10-14,9.0,20.5,4.9,,,서울,108
338,2017-12-05,-8.2,-0.4,1.6,0.1,,서울,108
339,2017-12-06,-4.5,6.0,7.0,1.2,,서울,108


In [67]:
weather[weather['최고기온'].isnull()]

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,지역,지점번호
284,2017-10-12,8.8,,7.1,,2.0,서울,108


In [68]:
weather[255:310]

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,지역,지점번호
255,2017-09-13,15.9,27.5,0.0,,1.9,서울,108
256,2017-09-14,14.8,29.5,1.5,,1.7,서울,108
257,2017-09-15,17.8,27.9,3.9,,3.6,서울,108
258,2017-09-16,16.7,27.6,3.1,,3.6,서울,108
259,2017-09-17,19.0,29.2,3.5,,2.7,서울,108
260,2017-09-18,16.4,27.6,1.8,,2.1,서울,108
261,2017-09-19,20.1,25.8,7.9,8.0,3.0,서울,108
262,2017-09-20,14.1,24.9,0.8,,2.4,서울,108
263,2017-09-21,14.2,27.0,0.5,,1.6,서울,108
264,2017-09-22,15.4,26.8,3.5,,2.1,서울,108


 The missing maximum temperature on 2017-10-12 is replaced with the mean maximum temperature of nearby dates that have a similar minimum temperature.

In [69]:
weather[285:288]

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,지역,지점번호
285,2017-10-13,6.1,18.9,3.4,,3.2,서울,108
286,2017-10-14,9.0,20.5,4.9,,,서울,108
287,2017-10-15,9.0,23.0,5.4,,1.7,서울,108


In [70]:
weather['최고기온'][285:288].mean()

20.8

In [71]:
weather['최고기온'] = weather['최고기온'].fillna(20.8)

In [72]:
weather.isnull().sum()

날짜         0
최저기온       0
최고기온       0
평균전운량      0
일강수량     672
평균풍속       3
지역         0
지점번호       0
dtype: int64

In [73]:
weather[weather['최고기온'].isnull()]

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,지역,지점번호


In [74]:
weather.loc[284]

날짜       2017-10-12
최저기온            8.8
최고기온           20.8
평균전운량           7.1
일강수량            NaN
평균풍속              2
지역               서울
지점번호            108
Name: 284, dtype: object

The missing `평균풍속` value is replaced with the mean of the ten nearest dates.

In [75]:
weather['평균풍속'][281:292].mean()

1.9799999999999998

In [76]:
weather.loc[286]

날짜       2017-10-14
최저기온              9
최고기온           20.5
평균전운량           4.9
일강수량            NaN
평균풍속            NaN
지역               서울
지점번호            108
Name: 286, dtype: object

In [77]:
weather.loc[286, '평균풍속'] = 1.98


In [78]:
weather.loc[286]

날짜       2017-10-14
최저기온              9
최고기온           20.5
평균전운량           4.9
일강수량            NaN
평균풍속           1.98
지역               서울
지점번호            108
Name: 286, dtype: object

In [79]:
weather['평균풍속'][333:345]

333    3.3
334    1.6
335    1.7
336    1.5
337    3.4
338    NaN
339    NaN
340    1.7
341    1.8
342    1.0
343    2.0
344    3.5
Name: 평균풍속, dtype: float64

In [80]:
weather['평균풍속'][333:345].mean()

2.15

In [81]:
weather.loc[338:339]

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,지역,지점번호
338,2017-12-05,-8.2,-0.4,1.6,0.1,,서울,108
339,2017-12-06,-4.5,6.0,7.0,1.2,,서울,108


In [82]:
weather['평균풍속'] = weather['평균풍속'].fillna(2.15)

In [83]:
weather.loc[338:339]

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,지역,지점번호
338,2017-12-05,-8.2,-0.4,1.6,0.1,2.15,서울,108
339,2017-12-06,-4.5,6.0,7.0,1.2,2.15,서울,108


In [84]:
weather.isnull().sum()

날짜         0
최저기온       0
최고기온       0
평균전운량      0
일강수량     672
평균풍속       0
지역         0
지점번호       0
dtype: int64
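
The manual steps above fill each gap with the mean of nearby days. A minimal sketch of the same idea with a rolling window follows; it is an alternative, not the notebook's method, and uses illustrative values. It assumes a unique index, so the notebook's `weather` frame would first need something like `reset_index(drop=True)`.

```python
import pandas as pd

# Illustrative wind-speed values with two gaps; the index must be unique
# for fillna to align the replacement values correctly.
wind = pd.Series([2.0, 1.0, None, 3.0, 2.0, None, 4.0])

# Mean of a centered 3-value window around each position, skipping NaN.
# The window size is a free choice; the notebook used roughly ten neighbours.
window_mean = wind.rolling(window=3, center=True, min_periods=1).mean()

# Only the missing positions take the window mean; observed values stay.
filled = wind.fillna(window_mean)
print(filled.tolist())
```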

The remaining missing values are in `일강수량`. More than half of its records are missing, so they cannot be assessed precisely; they are replaced with 0.

In [85]:
weather['일강수량'] = weather['일강수량'].fillna(0)

In [86]:
weather.isnull().sum()

날짜       0
최저기온     0
최고기온     0
평균전운량    0
일강수량     0
평균풍속     0
지역       0
지점번호     0
dtype: int64

### Preprocessing the weather data

 The region and station-number columns were extracted only for convenience, so they are dropped.

In [87]:
weather = weather.drop(['지역', '지점번호'], axis = 1)

In [88]:
weather

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속
0,2017-01-01,-1.6,6.9,7.0,0.0,1.5
1,2017-01-02,1.8,9.2,7.6,0.3,2.1
2,2017-01-03,-2.3,7.7,1.1,0.0,1.8
3,2017-01-04,1.0,8.9,2.6,0.0,1.7
4,2017-01-05,-0.1,7.3,8.4,0.0,3.1
...,...,...,...,...,...,...
259,2019-11-26,1.2,13.7,3.9,0.0,2.0
260,2019-11-27,0.3,6.8,4.0,0.0,2.7
261,2019-11-28,-0.6,10.4,1.8,0.0,2.0
262,2019-11-29,-3.4,6.9,0.9,0.0,2.0


### Merging the data

In [89]:
weather.dtypes

날짜        object
최저기온     float64
최고기온     float64
평균전운량    float64
일강수량     float64
평균풍속     float64
dtype: object

In [90]:
bikejoin

Unnamed: 0,날짜,가입 수
0,2017-01-01,221
1,2017-01-02,182
2,2017-01-03,198
3,2017-01-04,189
4,2017-01-05,190
...,...,...
1057,2019-11-26,920
1058,2019-11-27,688
1059,2019-11-28,712
1060,2019-11-29,671


In [91]:
df = pd.merge(weather, bikejoin, on="날짜")

In [92]:
df

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,가입 수
0,2017-01-01,-1.6,6.9,7.0,0.0,1.5,221
1,2017-01-02,1.8,9.2,7.6,0.3,2.1,182
2,2017-01-03,-2.3,7.7,1.1,0.0,1.8,198
3,2017-01-04,1.0,8.9,2.6,0.0,1.7,189
4,2017-01-05,-0.1,7.3,8.4,0.0,3.1,190
...,...,...,...,...,...,...,...
1057,2019-11-26,1.2,13.7,3.9,0.0,2.0,920
1058,2019-11-27,0.3,6.8,4.0,0.0,2.7,688
1059,2019-11-28,-0.6,10.4,1.8,0.0,2.0,712
1060,2019-11-29,-3.4,6.9,0.9,0.0,2.0,671


In [103]:
df.isnull().sum()

날짜       0
최저기온     0
최고기온     0
평균전운량    0
일강수량     0
평균풍속     0
가입 수     0
dtype: int64

In [108]:
df.dtypes

날짜        object
최저기온     float64
최고기온     float64
평균전운량    float64
일강수량     float64
평균풍속     float64
가입 수       int64
dtype: object
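
`pd.merge` defaults to an inner join, which silently drops dates present in only one table. As a sketch on illustrative frames, `indicator=True` and `validate` can make such mismatches visible before the analysis:

```python
import pandas as pd

weather_demo = pd.DataFrame({'날짜': ['2017-01-01', '2017-01-02', '2017-01-03'],
                             '최고기온': [6.9, 9.2, 7.7]})
joins_demo = pd.DataFrame({'날짜': ['2017-01-01', '2017-01-02'],
                           '가입 수': [221, 182]})

# how='outer' keeps dates found in only one table, indicator=True adds a
# `_merge` column saying where each row came from, and validate raises if a
# date appears more than once on either side.
checked = pd.merge(weather_demo, joins_demo, on='날짜', how='outer',
                   indicator=True, validate='one_to_one')
unmatched = checked[checked['_merge'] != 'both']
print(unmatched['날짜'].tolist())
```

An empty `unmatched` frame would confirm that every date has both weather and sign-up values.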

## 4. Results
 A day was judged to have very many sign-ups at 5,000 or more, and very few at 100 or fewer.

In [122]:
cloud_high = df.sort_values(by='평균전운량', ascending=False).head(50)

In [132]:
cloud_high

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,가입 수
498,2018-05-16,21.8,24.4,10.0,45.0,1.9,634
674,2018-11-08,10.5,14.7,10.0,64.0,1.8,281
866,2019-05-19,17.9,21.6,10.0,22.0,1.2,1331
603,2018-08-29,22.7,27.4,10.0,42.0,2.4,557
190,2017-07-10,22.6,25.4,10.0,144.5,2.3,255
195,2017-07-15,23.4,27.3,10.0,42.5,2.2,423
621,2018-09-16,21.0,23.9,10.0,2.0,1.7,2127
976,2019-09-06,21.6,29.1,10.0,2.4,2.1,675
226,2017-08-15,20.8,24.0,10.0,93.5,3.6,865
495,2018-05-12,11.8,16.1,10.0,32.0,0.9,536


In [123]:
cloud_high['가입 수'].mean()

785.5

In [124]:
cloud_low = df.sort_values(by='평균전운량', ascending=False).tail(50)

In [133]:
cloud_low

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,가입 수
647,2018-10-12,5.2,17.4,0.0,0.0,1.3,2032
638,2018-10-03,11.2,24.2,0.0,0.0,1.3,6966
481,2018-04-28,10.5,21.7,0.0,0.0,2.0,3379
292,2017-10-21,11.0,25.2,0.0,0.0,1.3,2566
294,2017-10-23,8.6,20.6,0.0,0.0,1.4,1358
300,2017-10-29,6.3,17.2,0.0,0.0,3.3,1522
591,2018-08-17,21.7,33.8,0.0,0.0,1.8,2799
301,2017-10-30,2.5,13.2,0.0,0.0,2.4,625
369,2018-01-06,-6.8,2.9,0.0,0.0,1.5,192
311,2017-11-09,3.7,16.8,0.0,0.0,1.2,695


In [125]:
cloud_low['가입 수'].mean()

1522.16

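A high mean total cloud cover means a chance of rain, and overcast days feel more humid than clear ones, so I expected fewer sign-ups on such days. The gap between the two groups was not dramatic, but the 50 days with the lowest cloud cover did have a higher mean number of sign-ups (1,522.16) than the 50 days with the highest (785.5).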
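
The same high-versus-low comparison is repeated for each weather variable. A minimal sketch of a reusable helper follows; the name `compare_extremes` and the sample values are hypothetical. When many days tie at the cut-off, for example a cloud cover of exactly 10.0 or 0.0, which tied days land in the top or bottom group is arbitrary, so the means should be read with some caution.

```python
import pandas as pd


def compare_extremes(frame, by, target='가입 수', n=50):
    """Mean of `target` over the n rows with the highest and lowest `by`."""
    high = frame.nlargest(n, by)[target].mean()
    low = frame.nsmallest(n, by)[target].mean()
    return high, low


# Illustrative values only; in the notebook `frame` would be `df`.
demo = pd.DataFrame({
    '평균전운량': [10.0, 9.5, 5.0, 0.5, 0.0],
    '가입 수': [600, 400, 1500, 2000, 3000],
})
high_mean, low_mean = compare_extremes(demo, '평균전운량', n=2)
print(high_mean, low_mean)
```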

In [127]:
wind_high = df.sort_values(by='평균풍속', ascending=False).head(50)

In [134]:
wind_high

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,가입 수
977,2019-09-07,22.6,29.5,10.0,2.8,6.0,424
50,2017-02-20,-5.2,5.4,2.5,0.5,4.2,125
40,2017-02-10,-9.3,-2.5,0.9,0.0,4.1,64
460,2018-04-07,1.4,8.6,5.1,0.2,4.1,949
64,2017-03-06,-2.8,4.0,0.5,0.0,4.0,219
235,2017-08-24,23.9,28.1,7.9,14.0,4.0,794
65,2017-03-07,-4.0,2.9,1.0,0.0,3.9,205
1049,2019-11-18,0.4,12.7,7.1,14.7,3.9,538
47,2017-02-17,-5.2,9.0,3.1,4.0,3.9,171
397,2018-02-03,-10.4,-2.1,1.3,0.5,3.8,118


In [129]:
wind_high['가입 수'].mean()

854.06

In [130]:
wind_low = df.sort_values(by='평균풍속', ascending=False).tail(50)

In [135]:
wind_low

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,가입 수
504,2018-05-22,15.4,23.2,9.0,12.5,1.1,1966
596,2018-08-22,23.4,37.6,4.5,0.0,1.1,1817
527,2018-06-14,18.8,25.1,9.5,29.0,1.1,1147
923,2019-07-15,22.1,27.4,7.5,7.7,1.1,1803
541,2018-06-28,20.6,27.3,9.5,26.5,1.1,1238
403,2018-02-09,-5.3,6.1,9.1,0.0,1.1,147
462,2018-04-09,0.1,15.4,2.6,0.0,1.1,1411
656,2018-10-21,7.2,21.1,2.4,0.0,1.1,3960
658,2018-10-23,9.8,16.9,5.4,5.0,1.1,1304
347,2017-12-15,-7.5,0.9,7.3,0.0,1.0,113


In [131]:
wind_low['가입 수'].mean()

1291.02

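When the mean wind speed is high, walking can be difficult and it can be hard to keep one's balance, which makes cycling hard, so I expected fewer sign-ups. Again the difference is not dramatic, but the mean number of sign-ups is higher for the 50 calmest days (1,291.02) than for the 50 windiest (854.06).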

In [136]:
rain_high = df.sort_values(by='일강수량', ascending=False).head(50)

In [137]:
rain_high

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,가입 수
190,2017-07-10,22.6,25.4,10.0,144.5,2.3,255
203,2017-07-23,26.1,28.1,9.8,133.5,2.1,627
231,2017-08-20,22.0,25.8,10.0,124.5,2.9,309
602,2018-08-28,22.5,26.1,9.0,96.5,0.9,611
226,2017-08-15,20.8,24.0,10.0,93.5,3.6,865
182,2017-07-02,21.9,26.8,10.0,92.0,2.5,513
544,2018-07-01,21.0,22.8,9.9,83.5,0.8,689
499,2018-05-17,18.8,23.5,10.0,83.0,1.4,312
539,2018-06-26,20.2,26.9,10.0,71.5,2.4,635
183,2017-07-03,21.9,27.4,9.9,67.5,3.5,641


In [138]:
rain_high['가입 수'].mean()

890.78

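Daily precipitation had so many missing values, all replaced with 0, that it could not be analyzed properly. A comparison with the low-precipitation side is not possible, but the 50 wettest days averaged 890.78 sign-ups, below 1,000, which is on the low side.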

In [142]:
hightemp_high = df.sort_values(by='최고기온', ascending=False).head(50)

In [143]:
hightemp_high

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,가입 수
575,2018-08-01,27.8,39.6,0.8,0.0,1.7,1153
574,2018-07-31,27.8,38.3,2.8,0.0,1.5,1463
589,2018-08-15,28.3,38.0,6.4,1.0,1.2,1792
565,2018-07-22,25.3,38.0,3.5,0.0,1.4,1949
577,2018-08-03,30.0,37.9,1.0,0.0,1.9,1192
576,2018-08-02,30.3,37.9,4.0,0.0,1.6,1173
596,2018-08-22,23.4,37.6,4.5,0.0,1.1,1817
588,2018-08-14,27.7,37.2,6.9,0.0,1.3,1581
573,2018-07-30,26.2,36.9,3.1,0.0,1.2,1696
564,2018-07-21,24.9,36.9,0.3,0.0,1.5,2270


In [144]:
hightemp_high['가입 수'].mean()

1678.02

In [152]:
lowtemp_high = df.sort_values(by='최저기온', ascending=False).head(50)

In [153]:
lowtemp_high

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,가입 수
576,2018-08-02,30.3,37.9,4.0,0.0,1.6,1173
577,2018-08-03,30.0,37.9,1.0,0.0,1.9,1192
566,2018-07-23,29.2,35.7,7.9,0.0,1.8,1643
589,2018-08-15,28.3,38.0,6.4,1.0,1.2,1792
580,2018-08-06,28.3,35.3,7.6,6.5,1.6,1161
217,2017-08-06,28.2,34.0,6.9,5.0,1.9,987
571,2018-07-28,28.0,35.2,5.5,7.5,1.2,1417
578,2018-08-04,28.0,34.9,3.4,0.0,2.4,1613
945,2019-08-06,27.9,36.8,6.8,0.0,1.5,1672
575,2018-08-01,27.8,39.6,0.8,0.0,1.7,1153


In [154]:
lowtemp_high['가입 수'].mean()

1404.5

In [145]:
hightemp_low = df.sort_values(by='최고기온', ascending=False).tail(50)

In [146]:
hightemp_low

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,가입 수
726,2018-12-30,-11.3,-0.9,0.8,0.0,1.2,223
29,2017-01-30,-7.9,-0.9,2.9,0.1,3.3,67
20,2017-01-21,-10.0,-0.9,5.3,2.1,1.6,32
14,2017-01-15,-11.5,-0.9,0.0,0.0,2.3,89
351,2017-12-19,-6.5,-1.0,1.0,0.0,2.1,73
358,2017-12-26,-7.9,-1.1,0.3,0.0,2.9,134
703,2018-12-07,-9.6,-1.1,1.1,0.0,3.8,175
743,2019-01-16,-10.1,-1.1,4.1,0.0,2.3,317
391,2018-01-28,-9.3,-1.2,3.3,0.0,2.6,120
372,2018-01-09,-6.1,-1.2,3.5,0.5,2.9,147


In [147]:
hightemp_low['가입 수'].mean()

118.5

In [155]:
lowtemp_low = df.sort_values(by='최저기온', ascending=False).tail(50)

In [156]:
lowtemp_low

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,가입 수
15,2017-01-16,-8.9,4.0,0.1,0.0,1.4,105
768,2019-02-10,-9.1,-1.9,7.5,0.0,1.7,415
41,2017-02-11,-9.2,0.8,0.3,0.0,2.8,90
32,2017-02-02,-9.2,3.3,1.3,0.0,2.7,119
391,2018-01-28,-9.3,-1.2,3.3,0.0,2.6,120
40,2017-02-10,-9.3,-2.5,0.9,0.0,4.1,64
10,2017-01-11,-9.4,1.5,1.9,0.0,2.1,88
736,2019-01-09,-9.4,1.3,4.9,0.0,1.3,305
405,2018-02-11,-9.5,-2.2,3.5,0.0,2.7,111
406,2018-02-12,-9.6,-1.8,4.1,0.0,2.7,109


In [157]:
lowtemp_low['가입 수'].mean()

120.68

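I expected few sign-ups when the maximum and minimum temperatures are very high, but sign-ups on those days are fairly numerous (means of 1,678.02 and 1,404.5). When the maximum and minimum temperatures are very low, sign-ups are few (means of 118.5 and 120.68).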
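
Comparing only the two extremes hides what happens at moderate temperatures. As a sketch with illustrative values and arbitrarily chosen bin edges, `pd.cut` can group days into temperature bands so that the mean sign-ups per band can be compared directly:

```python
import pandas as pd

# Illustrative daily records; in the notebook these columns come from `df`.
demo = pd.DataFrame({
    '최고기온': [-3.0, 2.0, 18.0, 24.0, 36.0, 38.0],
    '가입 수': [90, 150, 3000, 5000, 1200, 1800],
})

# Bin edges are arbitrary choices for this sketch, in degrees Celsius.
edges = [-20, 5, 15, 30, 45]
labels = ['cold (<=5)', 'cool (5-15]', 'mild (15-30]', 'hot (>30)']
demo['기온구간'] = pd.cut(demo['최고기온'], bins=edges, labels=labels)

# observed=True lists only the bins that actually contain rows.
by_bin = demo.groupby('기온구간', observed=True)['가입 수'].mean()
print(by_bin)
```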

In [106]:
head = df.sort_values(by='가입 수', ascending=False).head(43)

In [113]:
head

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,가입 수
853,2019-05-06,9.8,19.8,0.3,0.0,3.1,8525
852,2019-05-05,13.6,26.5,5.1,0.0,2.2,8006
859,2019-05-12,14.7,28.9,4.3,0.0,1.5,7396
858,2019-05-11,13.2,28.3,2.0,0.0,1.8,7220
634,2018-09-29,14.6,27.0,0.4,0.0,1.0,7170
631,2018-09-26,16.2,24.8,5.0,0.0,1.7,7149
851,2019-05-04,12.0,27.4,4.9,0.0,1.7,7015
630,2018-09-25,10.8,24.3,2.1,0.0,1.1,6973
638,2018-10-03,11.2,24.2,0.0,0.0,1.3,6966
848,2019-05-01,11.2,22.2,1.9,0.0,2.3,6760


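The days with the most sign-ups are mostly weekends or public holidays, and they mostly fall in April, May, September, and October, when it is neither too hot nor too cold.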
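
The weekend observation can be checked from the date strings themselves. A minimal sketch using four of the dates listed above; the column names `요일` and `주말` are introduced here only for illustration:

```python
import pandas as pd

# Four of the high sign-up dates from the table above.
demo = pd.DataFrame({'날짜': ['2019-05-06', '2019-05-05', '2018-09-29', '2018-09-26']})

dates = pd.to_datetime(demo['날짜'])
demo['요일'] = dates.dt.day_name()
# dayofweek counts Monday as 0, so 5 and 6 are Saturday and Sunday.
demo['주말'] = dates.dt.dayofweek >= 5
print(demo)
```

A weekday flag alone does not detect public holidays, which would need a separate holiday list.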

In [148]:
head[head['평균전운량'] > 6]

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,가입 수
865,2019-05-18,19.1,25.9,8.5,0.0,2.1,6520
971,2019-09-01,19.2,27.8,6.1,0.0,1.4,6256
872,2019-05-25,18.4,28.6,7.9,0.0,1.6,5715
551,2018-07-08,17.7,28.1,7.0,0.0,1.5,5706
838,2019-04-21,12.2,20.2,7.9,0.3,1.9,5605
644,2018-10-09,10.7,19.8,9.0,0.0,1.1,5278
837,2019-04-20,9.7,19.1,7.1,1.2,1.9,5178
1003,2019-10-03,19.8,27.3,6.3,4.6,2.8,5045
845,2019-04-28,9.9,16.6,8.1,0.0,1.6,5006


In [158]:
head['평균전운량'].mean()

4.013953488372093

In [149]:
head['평균풍속'].mean()

1.7325581395348837

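Meanwhile, mean total cloud cover, that is, how overcast the day is, does not appear to be closely related to the number of sign-ups. I treated a mean total cloud cover above 6 as cloudy; such days are not especially common, but 9 of the 43 days qualify. Daily precipitation is mostly 0, and the mean wind speed averages about 1.7, which is low.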

In [111]:
tail = df.sort_values(by='가입 수', ascending=False).tail(43)

In [150]:
tail

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,가입 수
390,2018-01-27,-15.9,-3.5,1.8,0.0,1.4,99
35,2017-02-05,-1.5,2.3,7.3,0.5,2.1,93
373,2018-01-10,-10.3,-4.8,0.9,0.3,3.0,93
401,2018-02-07,-13.4,-1.6,1.0,0.0,1.5,92
9,2017-01-10,-7.4,1.1,0.8,0.0,3.2,92
41,2017-02-11,-9.2,0.8,0.3,0.0,2.8,90
399,2018-02-05,-11.8,-5.1,0.1,0.0,3.0,89
14,2017-01-15,-11.5,-0.9,0.0,0.0,2.3,89
376,2018-01-13,-6.6,-1.2,8.3,0.4,0.8,89
343,2017-12-11,-11.0,-2.5,0.3,0.0,3.5,88


The days with the fewest sign-ups are mostly in the winter months of December, January, and February, plus one day in July with heavy precipitation.

In [112]:
tail[tail['평균전운량'] > 6]

Unnamed: 0,날짜,최저기온,최고기온,평균전운량,일강수량,평균풍속,가입 수
35,2017-02-05,-1.5,2.3,7.3,0.5,2.1,93
376,2018-01-13,-6.6,-1.2,8.3,0.4,0.8,89
352,2017-12-20,-8.5,-0.1,7.1,0.2,1.1,86
52,2017-02-22,-0.9,2.9,9.9,5.1,2.7,77
350,2017-12-18,-5.9,3.3,6.8,3.4,1.2,76
25,2017-01-26,-7.9,3.4,6.9,0.3,2.2,70
28,2017-01-29,-0.9,2.5,9.6,4.5,1.8,51
186,2017-07-06,23.9,34.6,7.1,34.0,1.7,38


In [151]:
tail['평균전운량'].mean()

2.5953488372093014

In [159]:
tail['평균풍속'].mean()

2.4255813953488374

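Here too, mean total cloud cover does not appear to be closely related to the number of sign-ups: its average is low, about 2.6, whereas the days with the most sign-ups average about 4. Mean wind speed also differs little: about 1.7 on the days with the most sign-ups against 2.4 on the days with the fewest, so it is slightly higher on the low-sign-up days.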
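
Judgements such as "not closely related" can be backed by a number. The following sketch computes Pearson correlations on illustrative values; this is an additional technique, not something the notebook ran. Pearson correlation only captures linear association, so a relationship that peaks at mild temperatures would be understated.

```python
import pandas as pd

# Illustrative values only; in the notebook the frame would be `df`.
demo = pd.DataFrame({
    '최고기온': [0.0, 10.0, 20.0, 30.0],
    '평균풍속': [3.0, 1.0, 2.5, 1.5],
    '가입 수': [100, 1100, 2100, 3100],
})

# Pearson correlation of every weather column with the sign-up count.
corr = demo.corr()['가입 수'].drop('가입 수')
print(corr.round(2))
```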

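## 5. Conclusion

Mean total cloud cover and mean wind speed do not strongly affect the number of new public-bicycle members, but they have some effect. Heavy daily precipitation also has some effect.
The weather factor with the largest effect is temperature. Sign-ups drop sharply on very cold days, and on very hot days they are lower than in spring and autumn weather.
In short, mean total cloud cover, mean wind speed, and daily precipitation have only a modest effect, while temperature has a large one: sign-ups are highest at spring and autumn temperatures.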