# Before the class
* Sign up for the Public Data Portal (http://data.go.kr)
* Install the libraries
    * pip install lxml html5lib xlrd requests pysimplesoap

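# Exploring data
## Open API
* API (Application Programming Interface)
    * A contract for the inputs and outputs of a piece of functionality
* Web API
    * An API for functionality that can be used over the web
* Open API: a Web API that is published so that anyone can use it
  * An authentication key is sometimes required
  * Some services also provide an SDK (Software Development Kit) for third-party application development
* Uses of Open APIs
  * Data collection
  * Mashups (mash-ups) with other data
  * Service development
      * Daum Maps API + real-estate listing data = the Zigbang map service
* Example
  * http://openapi.airport.co.kr/service/rest/AirportCodeList/getAirportCodeList?serviceKey=<authentication key>&schLineType=I&_type=json&pageNo=2
* With an Open API, you do not need to implement the functionality or build the dataset yourself.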
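
The example URL above can be taken apart to see which part names the function and which parts are its inputs. The following is a minimal sketch in Python 3 using only the standard library; `YOUR_KEY` is a placeholder rather than a real key, and nothing is sent over the network.

```python
from urllib.parse import urlsplit, parse_qs

# Example URL from the list above; YOUR_KEY is a placeholder, not a real key.
url = ('http://openapi.airport.co.kr/service/rest/AirportCodeList/getAirportCodeList'
       '?serviceKey=YOUR_KEY&schLineType=I&_type=json&pageNo=2')

parts = urlsplit(url)
query = parse_qs(parts.query)  # each value is a list, because a key may repeat

print(parts.scheme)   # protocol
print(parts.netloc)   # host that serves the API
print(parts.path)     # which function of the API is called
print(query)          # inputs passed to that function
```

Everything before the `?` identifies the function, and everything after it is a set of `key=value` inputs joined with `&`.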


## Crawling
  * Collecting resources that are published online
  * Content whose structure or style repeats can be parsed and turned into data
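
As a rough illustration of parsing a repeated structure, the sketch below collects the text of every `<li>` element in a page fragment. The HTML fragment and the class name `ListItemCollector` are hypothetical, and the code uses Python 3 with the standard library's `html.parser` so that it runs without extra packages.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: the same <li> structure repeats for every entry.
HTML = """
<div class="rankup">
  <ul>
    <li>first keyword</li>
    <li>second keyword</li>
    <li>third keyword</li>
  </ul>
</div>
"""

class ListItemCollector(HTMLParser):
    """Collects the text of every <li> element it sees."""

    def __init__(self):
        super().__init__()
        self.in_li = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self.in_li = True
            self.items.append('')   # start a new entry

    def handle_endtag(self, tag):
        if tag == 'li':
            self.in_li = False

    def handle_data(self, data):
        if self.in_li:
            self.items[-1] += data  # text inside the current <li>

collector = ListItemCollector()
collector.feed(HTML)
keywords = [text.strip() for text in collector.items]
print(keywords)
```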

# Open API data formats
## XML (eXtensible Markup Language)
* A markup language that describes the structure of a document or data with <tag> elements
```
<employees>
    <employee>
        <firstName>John</firstName> <lastName>Doe</lastName>
    </employee>
    <employee>
        <firstName>Anna</firstName> <lastName>Smith</lastName>
    </employee>
    <employee>
        <firstName>Peter</firstName> <lastName>Jones</lastName>
    </employee>
</employees>
```
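
To show how such a document becomes data in a program, here is a small sketch (Python 3, standard library only) that reads a shortened copy of the XML above and produces one dictionary per `<employee>`.

```python
import xml.etree.ElementTree as ET

xml_text = """
<employees>
    <employee>
        <firstName>John</firstName> <lastName>Doe</lastName>
    </employee>
    <employee>
        <firstName>Anna</firstName> <lastName>Smith</lastName>
    </employee>
</employees>
"""

root = ET.fromstring(xml_text.strip())  # root is the <employees> element
employees = [
    {child.tag: child.text for child in employee}  # one dict per <employee>
    for employee in root.findall('employee')
]
print(employees)
```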

## JSON (JavaScript Object Notation)
* A data representation format derived from JavaScript object syntax
```
{"employees":[
    {"firstName":"John", "lastName":"Doe"},
    {"firstName":"Anna", "lastName":"Smith"},
    {"firstName":"Peter", "lastName":"Jones"}
]}
```
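
The same records can be read in Python with the standard `json` module. This is a short Python 3 sketch that parses the text above and pulls out one field.

```python
import json

json_text = '''
{"employees":[
    {"firstName":"John", "lastName":"Doe"},
    {"firstName":"Anna", "lastName":"Smith"},
    {"firstName":"Peter", "lastName":"Jones"}
]}
'''

data = json.loads(json_text)  # JSON object -> dict, JSON array -> list
first_names = [e['firstName'] for e in data['employees']]
print(first_names)
```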

# Types of Open API
## SOAP (Simple Object Access Protocol)
* XML-based message exchange
* The request message must be composed in XML
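
To give an idea of what composing the request in XML involves, the sketch below builds a SOAP 1.1 envelope by hand in Python 3. The operation name and its parameters are borrowed from the SOAP example later in this document; the service namespace and the helper name `build_request` are assumptions made for illustration. In practice a client library such as pysimplesoap generates this message for you.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = 'http://schemas.xmlsoap.org/soap/envelope/'  # SOAP 1.1 envelope namespace
SERVICE_NS = 'http://www.webserviceX.NET/'              # assumed service namespace

def build_request(operation, **arguments):
    """Return a SOAP 1.1 request message as a string (hypothetical helper)."""
    envelope = ET.Element('{%s}Envelope' % SOAP_ENV)
    body = ET.SubElement(envelope, '{%s}Body' % SOAP_ENV)
    call = ET.SubElement(body, '{%s}%s' % (SERVICE_NS, operation))
    for name, value in arguments.items():
        # each argument becomes a child element of the operation element
        ET.SubElement(call, '{%s}%s' % (SERVICE_NS, name)).text = str(value)
    return ET.tostring(envelope, encoding='unicode')

message = build_request('ConversionRate', FromCurrency='KPW', ToCurrency='USD')
print(message)
```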

## REST (Representational State Transfer)
* A request is a URL plus a method
* Method
    * **GET** - the method we will use
    * POST
    * DELETE / PUT
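
The pairing of URL and method can be seen without contacting any server. In the Python 3 sketch below the requests are only constructed, never sent; the POST request against this URL is purely hypothetical and is there to show that the same URL can be combined with a different method.

```python
from urllib.request import Request

url = 'http://openapi.airport.co.kr/service/rest/AirportCodeList/getAirportCodeList'

read_request = Request(url, method='GET')                # ask for a resource
write_request = Request(url, data=b'{}', method='POST')  # send data to the server

# Same URL, different method: the server decides what to do from the pair.
print(read_request.get_method(), read_request.full_url)
print(write_request.get_method(), write_request.full_url)
```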

# Fetching public data
## Public data
* Data produced by the government that has been made public
* Provided as Open API, File, Link, or Service

* Public Data Portal: http://data.go.kr

* Seoul Open Data Plaza: http://data.seoul.go.kr

In [1]:
import requests
import json

In [2]:
servicekey = 'oVe6oF+10r9lo1LeluHJsTCvWRr19wnrvlp0NKaZmrVUTBwevUP04ENshg23yrmpUGxuQewqLnxWKW0Gr0CArw=='
endpoint = 'http://openapi.airport.co.kr/service'

rest = '/rest/AirportCodeList/getAirportCodeList'
params = { 'serviceKey' : servicekey, 'schLineType' : 'I', '_type' : 'json', 'pageNo' : 2 }

In [3]:
response = requests.get(endpoint+rest, params = params)
response

<Response [200]>

endpoint+rest
* In Python, the '+' operator concatenates strings.
* http://openapi.airport.co.kr/service/rest/AirportCodeList/getAirportCodeList

params
* Used to pass the information defined by the API after the URL, as a query string
* getAirportCodeList?_type=json&schLineType=I&serviceKey=<authentication key>&pageNo=2
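
How a dictionary of parameters turns into the part after `?` can be reproduced with the standard library. This is a Python 3 sketch under the assumption that `requests` performs a comparable encoding step when it receives `params`; the key `abc+123==` is made up and only imitates the shape of a real key.

```python
from urllib.parse import urlencode

endpoint = 'http://openapi.airport.co.kr/service'
rest = '/rest/AirportCodeList/getAirportCodeList'
# 'abc+123==' is a made-up key that only imitates the shape of a real one.
params = {'serviceKey': 'abc+123==', 'schLineType': 'I', '_type': 'json', 'pageNo': 2}

query = urlencode(params)            # key=value pairs joined with '&'
url = endpoint + rest + '?' + query  # '?' separates the path from the parameters
print(url)
```

Characters with a special meaning in URLs, such as `+` and `=`, are percent-encoded (`%2B`, `%3D`), which matters for keys that contain them.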

response (HTTP status codes: http://ko.wikipedia.org/wiki/HTTP_%EC%83%81%ED%83%9C_%EC%BD%94%EB%93%9C)
* 200 - OK
* 404 - Not Found
* 500 - Internal Server Error
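
The standard library knows the official phrase for each status code, which can be handy when checking a response. A small Python 3 sketch; `is_success` is a hypothetical helper, not part of any library.

```python
from http import HTTPStatus

for code in (200, 404, 500):
    status = HTTPStatus(code)
    # 2xx means success, 4xx a problem with the request, 5xx a problem on the server
    print(code, status.phrase)

def is_success(code):
    """True for any 2xx status code (hypothetical helper)."""
    return 200 <= code < 300

print(is_success(200), is_success(404))
```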

In [4]:
response.text

u'{"response":{"header":{"resultCode":"00","resultMsg":"NORMAL SERVICE."},"body":{"items":{"item":[{"cityCode":"KUV","cityEng":"GUNSAN","cityJpn":"\u30af\u30f3\u30b5\u30f3","cityKor":"\uad70\uc0b0"},{"cityCode":"KWJ","cityEng":"GWANGJU","cityJpn":"\u30af\u30a2\u30f3\u30b8\u30e5","cityKor":"\uad11\uc8fc"},{"cityChn":"\uf91f\u5dde","cityCode":"LHW","cityEng":"LANZHOU","cityJpn":"\uf91f\u5dde","cityKor":"\ub780\uc800\uc6b0"},{"cityChn":"\u6fb3\u9580","cityCode":"MFM","cityEng":"MACAU","cityJpn":"\u6fb3\u9580","cityKor":"\ub9c8\uce74\uc624"},{"cityCode":"MKO","cityEng":"MAKAU","cityKor":"\ub9c8\uce74\uc624"},{"cityCode":"MMB","cityEng":"MEMANBETSU","cityJpn":"\u30df\u30de\u30f3\u30c6\u30b9","cityKor":"\uba54\ub9cc\ubca0\uce20"},{"cityCode":"MNL","cityEng":"MANILA","cityJpn":"\u30de\u30cb\u30e9","cityKor":"\ub9c8\ub2d0\ub77c"},{"cityCode":"MPK","cityEng":"MOKPO","cityKor":"\ubaa9\ud3ec"},{"cityCode":"MRO","cityEng":"MORIOKA","cityKor":"\ubaa8\ub9ac\uc624\uce74"},{"cityCode":"MWX","cityEng":

In [5]:
result = response.json()

In [6]:
result

{u'response': {u'body': {u'items': {u'item': [{u'cityCode': u'KUV',
      u'cityEng': u'GUNSAN',
      u'cityJpn': u'\u30af\u30f3\u30b5\u30f3',
      u'cityKor': u'\uad70\uc0b0'},
     {u'cityCode': u'KWJ',
      u'cityEng': u'GWANGJU',
      u'cityJpn': u'\u30af\u30a2\u30f3\u30b8\u30e5',
      u'cityKor': u'\uad11\uc8fc'},
     {u'cityChn': u'\uf91f\u5dde',
      u'cityCode': u'LHW',
      u'cityEng': u'LANZHOU',
      u'cityJpn': u'\uf91f\u5dde',
      u'cityKor': u'\ub780\uc800\uc6b0'},
     {u'cityChn': u'\u6fb3\u9580',
      u'cityCode': u'MFM',
      u'cityEng': u'MACAU',
      u'cityJpn': u'\u6fb3\u9580',
      u'cityKor': u'\ub9c8\uce74\uc624'},
     {u'cityCode': u'MKO',
      u'cityEng': u'MAKAU',
      u'cityKor': u'\ub9c8\uce74\uc624'},
     {u'cityCode': u'MMB',
      u'cityEng': u'MEMANBETSU',
      u'cityJpn': u'\u30df\u30de\u30f3\u30c6\u30b9',
      u'cityKor': u'\uba54\ub9cc\ubca0\uce20'},
     {u'cityCode': u'MNL',
      u'cityEng': u'MANILA',
      u'cityJpn': u'\u30

'response' -> 'body' -> 'items' -> 'item'

In [7]:
result['response']['body']['items']['item']

[{u'cityCode': u'KUV',
  u'cityEng': u'GUNSAN',
  u'cityJpn': u'\u30af\u30f3\u30b5\u30f3',
  u'cityKor': u'\uad70\uc0b0'},
 {u'cityCode': u'KWJ',
  u'cityEng': u'GWANGJU',
  u'cityJpn': u'\u30af\u30a2\u30f3\u30b8\u30e5',
  u'cityKor': u'\uad11\uc8fc'},
 {u'cityChn': u'\uf91f\u5dde',
  u'cityCode': u'LHW',
  u'cityEng': u'LANZHOU',
  u'cityJpn': u'\uf91f\u5dde',
  u'cityKor': u'\ub780\uc800\uc6b0'},
 {u'cityChn': u'\u6fb3\u9580',
  u'cityCode': u'MFM',
  u'cityEng': u'MACAU',
  u'cityJpn': u'\u6fb3\u9580',
  u'cityKor': u'\ub9c8\uce74\uc624'},
 {u'cityCode': u'MKO',
  u'cityEng': u'MAKAU',
  u'cityKor': u'\ub9c8\uce74\uc624'},
 {u'cityCode': u'MMB',
  u'cityEng': u'MEMANBETSU',
  u'cityJpn': u'\u30df\u30de\u30f3\u30c6\u30b9',
  u'cityKor': u'\uba54\ub9cc\ubca0\uce20'},
 {u'cityCode': u'MNL',
  u'cityEng': u'MANILA',
  u'cityJpn': u'\u30de\u30cb\u30e9',
  u'cityKor': u'\ub9c8\ub2d0\ub77c'},
 {u'cityCode': u'MPK', u'cityEng': u'MOKPO', u'cityKor': u'\ubaa9\ud3ec'},
 {u'cityCode': u'MRO',
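
Chained indexing such as `result['response']['body']['items']['item']` raises a `KeyError` as soon as one level is missing, for example when the API returns an error body. The Python 3 sketch below walks the same path one key at a time; `dig` is a hypothetical helper and the dictionary is a shortened copy of the structure shown above.

```python
# A shortened copy of the response structure shown above.
result = {'response': {'header': {'resultCode': '00'},
                       'body': {'items': {'item': [
                           {'cityCode': 'KUV', 'cityEng': 'GUNSAN'},
                           {'cityCode': 'KWJ', 'cityEng': 'GWANGJU'},
                       ]}}}}

def dig(data, *keys, default=None):
    """Follow keys one level at a time; return default if a level is missing."""
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return default
        data = data[key]
    return data

items = dig(result, 'response', 'body', 'items', 'item', default=[])
codes = [item['cityCode'] for item in items]
print(codes)
print(dig(result, 'response', 'body', 'missing', default=[]))
```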


In [8]:
import pandas as pd
res = json.loads(response.content, encoding='utf-8')
df = pd.DataFrame(res['response']['body']['items']['item'])
print df

  cityChn cityCode     cityEng cityJpn cityKor
0     NaN      KUV      GUNSAN    クンサン      군산
1     NaN      KWJ     GWANGJU   クアンジュ      광주
2      蘭州      LHW     LANZHOU      蘭州     란저우
3      澳門      MFM       MACAU      澳門     마카오
4     NaN      MKO       MAKAU     NaN     마카오
5     NaN      MMB  MEMANBETSU   ミマンテス    메만베츠
6     NaN      MNL      MANILA     マニラ     마닐라
7     NaN      MPK       MOKPO     NaN      목포
8     NaN      MRO     MORIOKA     NaN    모리오카
9     NaN      MWX        MUAN     NaN      무안


In [None]:
import sys
# reload(sys)
sys.getdefaultencoding()

In [None]:
sys.stdout

In [9]:
print '[[International airport code list]]'
for item in result['response']['body']['items']['item']:
    item = item.items() # rebuild the dictionary as (key, value) tuples so it is easier to print
    print item
    for k,v in item:
        print k,v, v
    print '--------------'

[[International airport code list]]
[(u'cityEng', u'GUNSAN'), (u'cityKor', u'\uad70\uc0b0'), (u'cityCode', u'KUV'), (u'cityJpn', u'\u30af\u30f3\u30b5\u30f3')]
cityEng GUNSAN GUNSAN
cityKor 군산 군산
cityCode KUV KUV
cityJpn クンサン クンサン
--------------
[(u'cityEng', u'GWANGJU'), (u'cityKor', u'\uad11\uc8fc'), (u'cityCode', u'KWJ'), (u'cityJpn', u'\u30af\u30a2\u30f3\u30b8\u30e5')]
cityEng GWANGJU GWANGJU
cityKor 광주 광주
cityCode KWJ KWJ
cityJpn クアンジュ クアンジュ
--------------
[(u'cityEng', u'LANZHOU'), (u'cityKor', u'\ub780\uc800\uc6b0'), (u'cityCode', u'LHW'), (u'cityJpn', u'\uf91f\u5dde'), (u'cityChn', u'\uf91f\u5dde')]
cityEng LANZHOU LANZHOU
cityKor 란저우 란저우
cityCode LHW LHW
cityJpn 蘭州 蘭州
cityChn 蘭州 蘭州
--------------
[(u'cityEng', u'MACAU'), (u'cityKor', u'\ub9c8\uce74\uc624'), (u'cityCode', u'MFM'), (u'cityJpn', u'\u6fb3\u9580'), (u'cityChn', u'\u6fb3\u9580')]
cityEng MACAU MACAU
cityKor 마카오 마카오
cityCode MFM MFM
cityJpn 澳門 澳門
cityChn 澳門 澳門
--------------
[(u'cityEng', u'MAKAU'), (u'cityCode', u'MKO'), (u'cityKor', u

print k,v
* Values separated by commas are printed with a space between them

In [10]:
rest2 = '/rest/FlightScheduleList/getIflightScheduleList'

In [11]:
data = requests.get(endpoint+rest2, params = params)
result = data.json()
print result

{u'response': {u'body': {u'items': {u'item': [{u'internationalSat': u'Y', u'city': u'\ud64d\ucf69', u'airlineKorean': u'\uc5d0\uc5b4\ubd80\uc0b0', u'internationalStdate': u'2016-10-30T00:00:00+09:00', u'internationalMon': u'Y', u'internationalFri': u'Y', u'internationalTime': u'0610', u'internationalWed': u'Y', u'airport': u'\uae40\ud574', u'airlineHomepageUrl': u'www.flyairbusan.com', u'internationalNum': u'BX392', u'internationalThu': u'Y', u'internationalSun': u'Y', u'internationalEddate': u'2017-03-25T00:00:00+09:00', u'internationalIoType': u'IN', u'internationalTue': u'Y'}, {u'internationalSat': u'Y', u'city': u'\ub09c\ub2dd', u'airlineKorean': u'\ud2f0\uc6e8\uc774\ud56d\uacf5', u'internationalStdate': u'2016-10-31T00:00:00+09:00', u'internationalMon': u'N', u'internationalFri': u'N', u'internationalTime': u'0610', u'internationalWed': u'N', u'airport': u'\uc81c\uc8fc', u'airlineHomepageUrl': u'www.twayair.com', u'internationalNum': u'TW632', u'internationalThu': u'Y', u'internat

In [12]:
for item in result['response']['body']['items']['item']:
    item = item.items()
    for k,v in item:
        print k,v
    print '--------------'

internationalSat Y
city 홍콩
airlineKorean 에어부산
internationalStdate 2016-10-30T00:00:00+09:00
internationalMon Y
internationalFri Y
internationalTime 0610
internationalWed Y
airport 김해
airlineHomepageUrl www.flyairbusan.com
internationalNum BX392
internationalThu Y
internationalSun Y
internationalEddate 2017-03-25T00:00:00+09:00
internationalIoType IN
internationalTue Y
--------------
internationalSat Y
city 난닝
airlineKorean 티웨이항공
internationalStdate 2016-10-31T00:00:00+09:00
internationalMon N
internationalFri N
internationalTime 0610
internationalWed N
airport 제주
airlineHomepageUrl www.twayair.com
internationalNum TW632
internationalThu Y
internationalSun N
internationalEddate 2017-03-26T00:00:00+09:00
internationalIoType IN
internationalTue Y
--------------
internationalSat Y
city 코타키나발루
airlineKorean 이스타항공
internationalStdate 2016-10-30T00:00:00+09:00
internationalMon Y
internationalFri Y
internationalTime 0610
internationalWed Y
airport 김해
airlineHomepageUrl www.eastarjet.com
intern

In [13]:
import pandas as pd
df2 = pd.DataFrame(result['response']['body']['items']['item'])
df2

,airlineHomepageUrl,airlineKorean,airport,city,internationalEddate,internationalFri,internationalIoType,internationalMon,internationalNum,internationalSat,internationalStdate,internationalSun,internationalThu,internationalTime,internationalTue,internationalWed
0,www.flyairbusan.com,에어부산,김해,홍콩,2017-03-25T00:00:00+09:00,Y,IN,Y,BX392,Y,2016-10-30T00:00:00+09:00,Y,Y,610,Y,Y
1,www.twayair.com,티웨이항공,제주,난닝,2017-03-26T00:00:00+09:00,N,IN,N,TW632,Y,2016-10-31T00:00:00+09:00,N,Y,610,Y,N
2,www.eastarjet.com,이스타항공,김해,코타키나발루,2017-03-25T00:00:00+09:00,Y,IN,Y,ZE942,Y,2016-10-30T00:00:00+09:00,Y,Y,610,Y,Y
3,www.jejuair.net,제주항공,김해,사이판,2017-03-25T00:00:00+09:00,Y,IN,Y,7C3451,N,2016-10-30T00:00:00+09:00,Y,Y,610,N,N
4,www.jejuair.net,제주항공,김해,타이페이(타오위안),2017-03-25T00:00:00+09:00,Y,IN,Y,7C2654,Y,2016-10-30T00:00:00+09:00,Y,Y,610,Y,Y
5,www.flyairbusan.com,에어부산,김해,마카오,2017-03-25T00:00:00+09:00,N,IN,Y,BX382,Y,2016-10-30T00:00:00+09:00,N,Y,615,N,N
6,www.flyairbusan.com,에어부산,김해,시안,2017-03-25T00:00:00+09:00,Y,IN,N,BX342,N,2016-10-30T00:00:00+09:00,Y,Y,615,Y,N
7,www.koreanair.co.kr,대한항공,김해,홍콩,2017-03-25T00:00:00+09:00,Y,IN,Y,KE618,Y,2016-10-30T00:00:00+09:00,Y,Y,620,Y,Y
8,www.flyasiana.com,아시아나항공,김해,사이판,2017-03-25T00:00:00+09:00,N,IN,N,OZ608,N,2016-10-30T00:00:00+09:00,Y,Y,620,N,N
9,www.flyasiana.com,아시아나항공,김해,광쪄우,2017-03-25T00:00:00+09:00,Y,IN,Y,OZ306,N,2016-10-30T00:00:00+09:00,N,N,620,N,N


### XML data

In [14]:
import requests
import json
import xml.etree.ElementTree as et
import pandas as pd

res_subway = requests.get('http://openapi.seoul.go.kr:8088/7163686f486c75693436464f707148/xml/CardSubwayTime/1/5/201503/3호선/신사/')
# print res_subway.content
# print res_subway.text
xml_subway = et.fromstring(res_subway.content)

if xml_subway.find('row') is not None:    
    rows = xml_subway.iter('row')
    df_subway = pd.DataFrame(columns=['name', 'value'])
    # count = 0
    for row in rows.next():
#         print zip(['name', 'value'],[row.tag, row.text])
        r = dict(zip(['name', 'value'],[row.tag, row.text]))
        r_s = pd.Series(r)
    #     r_s.name = count
        df_subway = df_subway.append(r_s, ignore_index=True)
    #     count += 1


    df_subway = df_subway.replace({'RIDE_NUM' : '승차인원'}, regex=True)
    df_subway = df_subway.replace({'ALIGHT_NUM' : '하차인원'}, regex=True)

    timewords = [ "MIDNIGHT", "ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT", "NINE", "TEN", 
             "ELEVEN", "TWELVE", "THIRTEEN", "FOURTEEN", "FIFTEEN", "SIXTEEN", "SEVENTEEN", "EIGHTEEN", "NINETEEN",
             "TWENTY", "TWENTY_ONE", "TWENTY_TWO", "TWENTY_THREE"
            ]
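    timewords.reverse()  # later hours first, so compound names (e.g. FOURTEEN) are replaced before the shorter names they contain (FOUR, TEN)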

    numwords = []
    for idx, word in enumerate(timewords):    
        numwords.append(str(idx) + '시')

    numwords.reverse()

    df_subway = df_subway.replace(timewords, numwords, regex=True)

df_subway

,name,value
0,USE_MON,201503
1,LINE_NUM,3호선
2,SUB_STA_NM,신사
3,4시_승차인원,62
4,4시_하차인원,4
5,5시_승차인원,5120
6,5시_하차인원,1414
7,6시_승차인원,8747
8,6시_하차인원,19326
9,7시_승차인원,18975
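
The order of replacement matters because some hour names contain others (`FOURTEEN` contains `FOUR`, `TWENTY_THREE` contains `TWENTY`). As an alternative sketch of the same idea in Python 3, the code below renames tags with a single regular expression whose alternatives are sorted longest first. The tag names such as `FOUR_RIDE_NUM` are inferred from the cell above and are an assumption about the API's output; `rename` is a hypothetical helper.

```python
import re

HOUR_WORDS = ["MIDNIGHT", "ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN",
              "EIGHT", "NINE", "TEN", "ELEVEN", "TWELVE", "THIRTEEN", "FOURTEEN",
              "FIFTEEN", "SIXTEEN", "SEVENTEEN", "EIGHTEEN", "NINETEEN", "TWENTY",
              "TWENTY_ONE", "TWENTY_TWO", "TWENTY_THREE"]
HOUR_OF = {word: hour for hour, word in enumerate(HOUR_WORDS)}

# Longest alternatives first, so TWENTY_THREE wins over TWENTY.
PATTERN = re.compile('^(%s)_' % '|'.join(sorted(HOUR_WORDS, key=len, reverse=True)))

def rename(tag):
    """Replace a leading hour word with its number; leave other tags alone."""
    return PATTERN.sub(lambda m: '%d_' % HOUR_OF[m.group(1)], tag)

for tag in ['USE_MON', 'FOUR_RIDE_NUM', 'FOURTEEN_ALIGHT_NUM', 'TWENTY_THREE_RIDE_NUM']:
    print(tag, '->', rename(tag))
```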


# SOAP Example

In [1]:
from pysimplesoap.client import SoapClient
client = SoapClient(wsdl='http://www.webservicex.com/currencyconvertor.asmx?WSDL')

No handlers could be found for logger "pysimplesoap.helpers"


In [2]:
client.services

{u'CurrencyConvertor': {u'documentation': u'',
  u'ports': {u'CurrencyConvertorHttpGet': {u'location': None,
    u'name': u'CurrencyConvertorHttpGet',
    u'operations': {u'ConversionRate': {u'documentation': u"<br><b>Get conversion rate from one currency to another currency <b><br><p><b><font color='#000080' size='1' face='Verdana'><u>Differenct currency Code and Names around the world</u></font></b></p><blockquote><p><font face='Verdana' size='1'>AFA-Afghanistan Afghani<br>ALL-Albanian Lek<br>DZD-Algerian Dinar<br>ARS-Argentine Peso<br>AWG-Aruba Florin<br>AUD-Australian Dollar<br>BSD-Bahamian Dollar<br>BHD-Bahraini Dinar<br>BDT-Bangladesh Taka<br>BBD-Barbados Dollar<br>BZD-Belize Dollar<br>BMD-Bermuda Dollar<br>BTN-Bhutan Ngultrum<br>BOB-Bolivian Boliviano<br>BWP-Botswana Pula<br>BRL-Brazilian Real<br>GBP-British Pound<br>BND-Brunei Dollar<br>BIF-Burundi Franc<br>XOF-CFA Franc (BCEAO)<br>XAF-CFA Franc (BEAC)<br>KHR-Cambodia Riel<br>CAD-Canadian Dollar<br>CVE-Cape Verde Escudo<br>KYD-

In [3]:
client.get_operation('ConversionRate')

{u'action': u'http://www.webserviceX.NET/ConversionRate',
 u'documentation': u"<br><b>Get conversion rate from one currency to another currency <b><br><p><b><font color='#000080' size='1' face='Verdana'><u>Differenct currency Code and Names around the world</u></font></b></p><blockquote><p><font face='Verdana' size='1'>AFA-Afghanistan Afghani<br>ALL-Albanian Lek<br>DZD-Algerian Dinar<br>ARS-Argentine Peso<br>AWG-Aruba Florin<br>AUD-Australian Dollar<br>BSD-Bahamian Dollar<br>BHD-Bahraini Dinar<br>BDT-Bangladesh Taka<br>BBD-Barbados Dollar<br>BZD-Belize Dollar<br>BMD-Bermuda Dollar<br>BTN-Bhutan Ngultrum<br>BOB-Bolivian Boliviano<br>BWP-Botswana Pula<br>BRL-Brazilian Real<br>GBP-British Pound<br>BND-Brunei Dollar<br>BIF-Burundi Franc<br>XOF-CFA Franc (BCEAO)<br>XAF-CFA Franc (BEAC)<br>KHR-Cambodia Riel<br>CAD-Canadian Dollar<br>CVE-Cape Verde Escudo<br>KYD-Cayman Islands Dollar<br>CLP-Chilean Peso<br>CNY-Chinese Yuan<br>COP-Colombian Peso<br>KMF-Comoros Franc<br>CRC-Costa Rica Colon<br>

In [4]:
client.ConversionRate(FromCurrency='KPW', ToCurrency='USD')

{'ConversionRateResult': -1.0}

![soap_request](soap_request.png)
![soap_response](soap_response.png)

# JSON

In [10]:
import requests
res = requests.get('http://www.w3schools.com/js/customers_mysql.php')

In [11]:
res

<Response [200]>

In [12]:
res.json()

[{u'City': u'Berlin', u'Country': u'Germany', u'Name': u'Alfreds Futterkiste'},
 {u'City': u'Lule\xe5',
  u'Country': u'Sweden',
  u'Name': u'Berglunds snabbk\xf6p'},
 {u'City': u'M\xe9xico D.F.',
  u'Country': u'Mexico',
  u'Name': u'Centro comercial Moctezuma'},
 {u'City': u'Graz', u'Country': u'Austria', u'Name': u'Ernst Handel'},
 {u'City': u'Madrid',
  u'Country': u'Spain',
  u'Name': u'FISSA Fabrica Inter. Salchichas S.A.'},
 {u'City': u'Barcelona',
  u'Country': u'Spain',
  u'Name': u'Galer\xeda del gastr\xf3nomo'},
 {u'City': u'Cowes', u'Country': u'UK', u'Name': u'Island Trading'},
 {u'City': u'Brandenburg',
  u'Country': u'Germany',
  u'Name': u'K\xf6niglich Essen'},
 {u'City': u'Vancouver',
  u'Country': u'Canada',
  u'Name': u'Laughing Bacchus Wine Cellars'},
 {u'City': u'Bergamo',
  u'Country': u'Italy',
  u'Name': u'Magazzini Alimentari Riuniti'},
 {u'City': u'London', u'Country': u'UK', u'Name': u'North/South'},
 {u'City': u'Paris',
  u'Country': u'France',
  u'Name': u'

# JSON -> DataFrame

In [13]:
import pandas as pd

In [14]:
df = pd.DataFrame(res.json())

In [15]:
df

,City,Country,Name
0,Berlin,Germany,Alfreds Futterkiste
1,Luleå,Sweden,Berglunds snabbköp
2,México D.F.,Mexico,Centro comercial Moctezuma
3,Graz,Austria,Ernst Handel
4,Madrid,Spain,FISSA Fabrica Inter. Salchichas S.A.
5,Barcelona,Spain,Galería del gastrónomo
6,Cowes,UK,Island Trading
7,Brandenburg,Germany,Königlich Essen
8,Vancouver,Canada,Laughing Bacchus Wine Cellars
9,Bergamo,Italy,Magazzini Alimentari Riuniti


# Appendix

In [16]:
import requests
import bs4 as soup
import pandas as pd

In [22]:
zb = requests.get('https://api.zigbang.com/v2/items?lat_south=37.469666050690506&lat_north=37.610794673200786&lng_west=127.03400360696608&lng_east=127.045846503341&room=01;02;03;04;05')
zbdatalist = soup.BeautifulSoup(zb.text)
json_zb = zb.json()
json_zb

{u'list_items': [{u'position': 0,
   u'section_type': u'premium_recommand',
   u'simple_item': {u'item_id': 6243828}},
  {u'position': 1,
   u'section_type': u'premium_recommand',
   u'simple_item': {u'item_id': 6092255}},
  {u'position': 2,
   u'section_type': u'premium_recommand',
   u'simple_item': {u'item_id': 6234041}},
  {u'position': 3,
   u'section_type': u'premium_recommand',
   u'simple_item': {u'item_id': 6131076}},
  {u'position': 4,
   u'section_type': u'premium_recommand',
   u'simple_item': {u'item_id': 6247948}},
  {u'position': 5,
   u'section_type': u'premium_recommand',
   u'simple_item': {u'item_id': 6095930}},
  {u'position': 6,
   u'section_type': u'premium_recommand',
   u'simple_item': {u'item_id': 6335700}},
  {u'position': 7,
   u'section_type': u'premium_recommand',
   u'simple_item': {u'item_id': 6288279}},
  {u'position': 8,
   u'section_type': u'premium_recommand',
   u'simple_item': {u'item_id': 6307000}},
  {u'position': 9,
   u'section_type': u'premium_

In [19]:
zbdatalist

<html><body><p>{"list_items":[{"simple_item":{"item_id":6257496},"section_type":"premium_recommand","position":0},{"simple_item":{"item_id":6265892},"section_type":"premium_recommand","position":1},{"simple_item":{"item_id":6166604},"section_type":"premium_recommand","position":2},{"simple_item":{"item_id":6295405},"section_type":"premium_recommand","position":3},{"simple_item":{"item_id":6245604},"section_type":"premium_recommand","position":4},{"simple_item":{"item_id":6277188},"section_type":"premium_recommand","position":5},{"simple_item":{"item_id":6122565},"section_type":"premium_recommand","position":6},{"simple_item":{"item_id":6320896},"section_type":"premium_recommand","position":7},{"simple_item":{"item_id":6327855},"section_type":"premium_recommand","position":8},{"simple_item":{"item_id":6314971},"section_type":"premium_recommand","position":9},{"simple_item":{"item_id":6334520},"section_type":"premium_recommand","position":10},{"simple_item":{"item_id":6272571},"section_t

In [26]:
pd.DataFrame(json_zb['list_items'])

,position,section_type,simple_item
0,0,premium_recommand,{u'item_id': 6243828}
1,1,premium_recommand,{u'item_id': 6092255}
2,2,premium_recommand,{u'item_id': 6234041}
3,3,premium_recommand,{u'item_id': 6131076}
4,4,premium_recommand,{u'item_id': 6247948}
5,5,premium_recommand,{u'item_id': 6095930}
6,6,premium_recommand,{u'item_id': 6335700}
7,7,premium_recommand,{u'item_id': 6288279}
8,8,premium_recommand,{u'item_id': 6307000}
9,9,premium_recommand,{u'item_id': 6310347}


In [33]:
df = pd.DataFrame(json_zb['list_items'])
pd.DataFrame(df.simple_item[3:15:2])

,simple_item
3,{u'item_id': 6131076}
5,{u'item_id': 6095930}
7,{u'item_id': 6288279}
9,{u'item_id': 6310347}
11,{u'item_id': 6321776}
13,{u'item_id': 6198915}


df.simple_item[3:15:2]
* Slicing
    * With a numeric index, you can cut out and use just part of the data (start:stop:step)
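
The same `start:stop:step` notation works on plain Python lists, which makes it easy to try out. In this Python 3 sketch, `values` is a stand-in for a column with the index 0 to 19.

```python
values = list(range(20))  # stand-in for a column with index 0..19

print(values[3:15])       # positions 3 up to, but not including, 15
print(values[3:15:2])     # same range, taking every second element
print(values[:3])         # the first three elements
print(values[-2:])        # the last two elements
```

The second line selects positions 3, 5, 7, 9, 11 and 13, matching the index of the table above.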

In [34]:
import xml.etree.ElementTree as et
import requests
import pandas as pd

weather_rss_url = 'http://www.kma.go.kr/wid/queryDFSRSS.jsp?zone=1159068000'
response = requests.get(weather_rss_url)
root = et.fromstring(response.content)
root_data = root.find('channel').find('item').find('description').find('body')
li_data = root_data.findall('data')

# to get the column names, collect just the tags of the first data element into a list
f_data = li_data[0]
items = f_data.iter()
list_col_name = []
for item in items.next():
    list_col_name.append(item.tag)
print list_col_name

# create a dataframe whose column names are the tag list
df = pd.DataFrame(columns=list_col_name)

# loop over the data elements and append one row for each
for data in li_data:
    row = {} # collect the values for this row in a dictionary
    items = data.iter()
    for item in items.next():
        row[item.tag]=item.text # {item.tag:item.text}
    s = pd.Series(row, name=data.attrib['seq'])
    df = df.append(s)
# df = df.convert_objects(convert_numeric=True) 
df

['hour', 'day', 'temp', 'tmx', 'tmn', 'sky', 'pty', 'wfKor', 'wfEn', 'pop', 'r12', 's12', 'ws', 'wd', 'wdKor', 'wdEn', 'reh', 'r06', 's06']


,hour,day,temp,tmx,tmn,sky,pty,wfKor,wfEn,pop,r12,s12,ws,wd,wdKor,wdEn,reh,r06,s06
0,9,0,8.7,9.5,-999.0,4,1,비,Rain,61,0.0,0.0,1.0,4,남,S,75,2.0120292,0.0
1,12,0,8.8,9.5,-999.0,4,0,흐림,Cloudy,30,0.0,0.0,3.2,7,북서,NW,65,2.0120292,0.0
2,15,0,8.6,9.5,-999.0,4,0,흐림,Cloudy,30,0.0,0.0,3.8,7,북서,NW,52,0.0,0.0
3,18,0,7.7,9.5,-999.0,3,0,구름 많음,Mostly Cloudy,20,0.0,0.0,4.0,7,북서,NW,47,0.0,0.0
4,21,0,5.4,9.5,-999.0,2,0,구름 조금,Partly Cloudy,10,0.0,0.0,4.800000000000001,7,북서,NW,48,0.0,0.0
5,24,0,3.0,9.5,-999.0,2,0,구름 조금,Partly Cloudy,10,0.0,0.0,5.2,7,북서,NW,47,0.0,0.0
6,3,1,0.2,8.2,-1.8,1,0,맑음,Clear,0,0.0,0.0,5.0,7,북서,NW,46,0.0,0.0
7,6,1,-0.9,8.2,-1.8,1,0,맑음,Clear,0,0.0,0.0,4.4,7,북서,NW,49,0.0,0.0
8,9,1,1.5,8.2,-1.8,1,0,맑음,Clear,0,0.0,0.0,4.0,7,북서,NW,42,0.0,0.0
9,12,1,5.6,8.2,-1.8,1,0,맑음,Clear,0,0.0,0.0,4.0,7,북서,NW,33,0.0,0.0


In [35]:
import bs4 as soup
import requests
response = requests.get('http://naver.com')
response
response.text
root = soup.BeautifulSoup(response.text)
rankup = root.find("div", "rankup")
rankup
ranks = rankup.findAll("li")
ranks
for rank in ranks:
    print rank.text

박효신상승78
이재명상승51
판타스틱 듀오상승105
서지혜상승183
우병우상승39
소사이어티 게..상승69
날씨동일0
방탄소년단상승33
롤드컵상승30
로또상승18
박효신상승78


# Breaktime

# Reading data with Python

In [36]:
file = open('newsjelly_obesity_BMI.csv', 'r')
file.read()

'\xc3\xbc\xc1\xfa\xb7\xae\xc1\xf6\xbc\xf6(BMI)\xb0\xa1 25\xc0\xcc\xbb\xf3\xc0\xc7 \xba\xf1\xc0\xb2,,,,\r\n,2009,2010,2011,2012\r\n\xbc\xad\xbf\xef\xbd\xc3,21.4,21.7,22.7,23.7\r\n\xb0\xad\xb3\xb2\xb1\xb8,18.6,16.3,16.1,19.2\r\n\xb0\xad\xb5\xbf\xb1\xb8,21.7,21.1,23.7,23.8\r\n\xb0\xad\xba\xcf\xb1\xb8,23.4,22.1,25.3,23.8\r\n\xb0\xad\xbc\xad\xb1\xb8,20.7,21.2,23.9,24.5\r\n\xb0\xfc\xbe\xc7\xb1\xb8,22.2,20.5,24.1,23.2\r\n\xb1\xa4\xc1\xf8\xb1\xb8,20.2,21,20.6,21.3\r\n\xb1\xb8\xb7\xce\xb1\xb8,25.4,21.3,24.8,22.2\r\n\xb1\xdd\xc3\xb5\xb1\xb8,23.5,22.4,21.6,25.7\r\n\xb3\xeb\xbf\xf8\xb1\xb8,20.2,25.1,21,25.3\r\n\xb5\xb5\xba\xc0\xb1\xb8,21.5,26.6,24.7,25.1\r\n\xb5\xbf\xb4\xeb\xb9\xae\xb1\xb8,20,22.7,26.3,27.4\r\n\xb5\xbf\xc0\xdb\xb1\xb8,20.9,24.7,22.5,23.7\r\n\xb8\xb6\xc6\xf7\xb1\xb8,19.9,23,18.5,23.3\r\n\xbc\xad\xb4\xeb\xb9\xae\xb1\xb8,19.5,22.6,25.6,25.7\r\n\xbc\xad\xc3\xca\xb1\xb8,22.5,19.4,20.9,20.9\r\n\xbc\xba\xb5\xbf\xb1\xb8,22.7,21.6,22.1,21.7\r\n\xbc\xba\xba\xcf\xb1\xb8,22.5,23.2,23.6,26.6\r

In [37]:
file.close()

In [38]:
import codecs
with codecs.open('newsjelly_obesity_BMI.csv','r','cp949') as file:
    for line in file.readlines():
        print line

체질량지수(BMI)가 25이상의 비율,,,,

,2009,2010,2011,2012

서울시,21.4,21.7,22.7,23.7

강남구,18.6,16.3,16.1,19.2

강동구,21.7,21.1,23.7,23.8

강북구,23.4,22.1,25.3,23.8

강서구,20.7,21.2,23.9,24.5

관악구,22.2,20.5,24.1,23.2

광진구,20.2,21,20.6,21.3

구로구,25.4,21.3,24.8,22.2

금천구,23.5,22.4,21.6,25.7

노원구,20.2,25.1,21,25.3

도봉구,21.5,26.6,24.7,25.1

동대문구,20,22.7,26.3,27.4

동작구,20.9,24.7,22.5,23.7

마포구,19.9,23,18.5,23.3

서대문구,19.5,22.6,25.6,25.7

서초구,22.5,19.4,20.9,20.9

성동구,22.7,21.6,22.1,21.7

성북구,22.5,23.2,23.6,26.6

송파구,19.9,17.9,22.7,24.6

양천구,25.4,19.4,23.1,24.3

영등포구,19.2,22.6,22.6,22.3

용산구,19.5,17.7,22.6,27.7

은평구,23.5,23.8,23.5,21.4

종로구,21.1,22.7,22,22.7

중구,23.5,19.7,20.1,20.9

중랑구,18.2,26.6,26.5,24.9
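
The unreadable `\xbc\xad...` output of the first attempt and the readable output of the second differ only in whether the bytes were decoded with the right codec. A small Python 3 sketch of that step, using a few bytes copied from the raw output above:

```python
# Bytes copied from the raw output of file.read() above (one line of the CSV).
raw = b'\xbc\xad\xbf\xef\xbd\xc3,21.4,21.7,22.7,23.7\r\n'

line = raw.decode('cp949')  # bytes -> text, using the encoding the file was saved in
fields = line.strip().split(',')
print(fields)

# Decoding with the wrong codec fails here, because the bytes are not valid UTF-8.
try:
    raw.decode('utf-8')
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False
print(decoded_as_utf8)
```

In Python 3 the same effect is available directly through `open(path, encoding='cp949')`.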



## CSV (Comma-Separated Values)
* Text data delimited by commas (,)
* Similar formats
    * TSV (Tab-Separated Values)
    * SSV (Space-Separated Values)

### Using csv library

In [39]:
import csv

In [40]:
file = open('coffees.csv','r')

open('filename', 'mode')
mode
* r - read
* w - write
* a - append
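
The difference between the three modes is easiest to see on a throwaway file. A minimal Python 3 sketch; the file name `modes_demo.txt` is arbitrary and the file lives in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')  # throwaway file

with open(path, 'w') as f:  # w: create the file, or empty it if it exists
    f.write('first\n')
with open(path, 'a') as f:  # a: keep the content and add to the end
    f.write('second\n')
with open(path, 'r') as f:  # r: read only; the file must already exist
    lines = f.read().splitlines()

print(lines)
```

Using `with` closes the file automatically, so no explicit `close()` call is needed.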

In [41]:
reader = csv.reader(file, delimiter=',', quotechar='"')

In [42]:
for row in reader:
    print row

['mericano', 'caffe latte', 'dutch coffee', 'espresso', 'cappuccino']
['europicano', 'greentea latte', 'england coffee', 'nesopresso', 'italiano']


In [43]:
file.close()

In [44]:
file = open('coffees_using_csv.csv', 'w')

In [45]:
writer = csv.writer(file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

In [46]:
writer.writerow(['americano', 'caffe latte', 'dutch, coffee', 'espresso', 'cappuccino'])

In [47]:
writer.writerow(['europicano', 'greentea latte', 'england coffee', 'nesopresso', 'italiano'])

In [48]:
file.close()
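
The field `'dutch, coffee'` contains the delimiter itself, which is where `quotechar` and `csv.QUOTE_MINIMAL` come into play. The Python 3 sketch below writes one row into an in-memory buffer (a stand-in for the file) so that the resulting text can be inspected.

```python
import csv
import io

buffer = io.StringIO()  # in-memory stand-in for a file opened with 'w'
writer = csv.writer(buffer, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
writer.writerow(['americano', 'caffe latte', 'dutch, coffee'])

written = buffer.getvalue()
print(repr(written))  # only the field that contains a comma is quoted

# Reading the text back restores the original three fields.
row = next(csv.reader(io.StringIO(written), delimiter=',', quotechar='"'))
print(row)
```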

### Using pandas

In [49]:
import pandas as pd

In [50]:
df = pd.read_csv('coffees.csv', names=['a','b','c','d','e','f'])

In [51]:
df

,a,b,c,d,e,f
0,mericano,caffe latte,dutch coffee,espresso,cappuccino,
1,europicano,greentea latte,england coffee,nesopresso,italiano,


In [52]:
df.to_csv('/Users/luinhon/Downloads/coffees_using_pandas.csv')

## JSON (JavaScript Object Notation)
* Also stored and used as a text file format

### Using json library

In [53]:
import json

In [54]:
file = open('donuts.json', 'r')

In [55]:
decoded = json.load(file)

In [56]:
decoded

{u'batters': {u'batter': [{u'id': u'1001', u'type': u'Regular'},
   {u'id': u'1002', u'type': u'Chocolate'},
   {u'id': u'1003', u'type': u'Blueberry'},
   {u'id': u'1004', u'type': u"Devil's Food"}]},
 u'id': u'0001',
 u'name': u'Cake',
 u'ppu': 0.55,
 u'topping': [{u'id': u'5001', u'type': u'None'},
  {u'id': u'5002', u'type': u'Glazed'},
  {u'id': u'5005', u'type': u'Sugar'},
  {u'id': u'5007', u'type': u'Powdered Sugar'},
  {u'id': u'5006', u'type': u'Chocolate with Sprinkles'},
  {u'id': u'5003', u'type': u'Chocolate'},
  {u'id': u'5004', u'type': u'Maple'}],
 u'type': u'donut'}

In [57]:
print decoded['type']

donut


In [58]:
print decoded['name']

Cake


In [59]:
decoded['batters']

{u'batter': [{u'id': u'1001', u'type': u'Regular'},
  {u'id': u'1002', u'type': u'Chocolate'},
  {u'id': u'1003', u'type': u'Blueberry'},
  {u'id': u'1004', u'type': u"Devil's Food"}]}

In [60]:
decoded['topping']

[{u'id': u'5001', u'type': u'None'},
 {u'id': u'5002', u'type': u'Glazed'},
 {u'id': u'5005', u'type': u'Sugar'},
 {u'id': u'5007', u'type': u'Powdered Sugar'},
 {u'id': u'5006', u'type': u'Chocolate with Sprinkles'},
 {u'id': u'5003', u'type': u'Chocolate'},
 {u'id': u'5004', u'type': u'Maple'}]

In [61]:
file.close()

In [62]:
file = open('donut_using_json.json', 'w')

In [63]:
england_league_income = 9995
england_teams = ['liverpool','arsenal','manchester']
spain_league_income = 12985
spain_teams = ['barcelona','madrid','atlantico']
italy_league_income = 7128
italy_teams = ['juventus','ac milan','forgot']

soccer_dic = { 'leagues': [
        {'country': 'england', 'income': england_league_income, 'teams': england_teams},
        {'country': 'spain', 'income': spain_league_income, 'teams': spain_teams },
        {'country': 'italy', 'income': italy_league_income, 'teams': italy_teams },
    ]}

soccer_dic


{'leagues': [{'country': 'england',
   'income': 9995,
   'teams': ['liverpool', 'arsenal', 'manchester']},
  {'country': 'spain',
   'income': 12985,
   'teams': ['barcelona', 'madrid', 'atlantico']},
  {'country': 'italy',
   'income': 7128,
   'teams': ['juventus', 'ac milan', 'forgot']}]}

In [64]:
file.write(json.dumps(soccer_dic))

In [65]:
file.close()
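
Writing and reading are symmetric: whatever `json.dumps` or `json.dump` produces can be loaded back into the same Python structure. A Python 3 sketch using an in-memory buffer in place of the file and a shortened copy of `soccer_dic`:

```python
import io
import json

soccer_dic = {'leagues': [
    {'country': 'england', 'income': 9995, 'teams': ['liverpool', 'arsenal', 'manchester']},
]}

buffer = io.StringIO()                   # stand-in for a file opened with 'w'
json.dump(soccer_dic, buffer, indent=2)  # writes JSON text straight to the file object
text = buffer.getvalue()
print(text)

restored = json.loads(text)              # text -> Python objects again
print(restored == soccer_dic)
```

The `indent` argument is optional; it only makes the saved file easier to read.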

### Using pandas

In [66]:
import pandas as pd

In [71]:
pd.read_json('donuts.json')

ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.

This JSON file cannot be read as a table.

In [72]:
df = pd.read_json('donut_using_json.json')

In [73]:
for record in df['leagues']:
    print record

{u'country': u'england', u'income': 9995, u'teams': [u'liverpool', u'arsenal', u'manchester']}
{u'country': u'spain', u'income': 12985, u'teams': [u'barcelona', u'madrid', u'atlantico']}
{u'country': u'italy', u'income': 7128, u'teams': [u'juventus', u'ac milan', u'forgot']}


In [74]:
df.leagues[0]

{u'country': u'england',
 u'income': 9995,
 u'teams': [u'liverpool', u'arsenal', u'manchester']}

In [75]:
pd.DataFrame(df.leagues[0])

,country,income,teams
0,england,9995,liverpool
1,england,9995,arsenal
2,england,9995,manchester
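
The table above has three rows because the scalar values (`country`, `income`) are repeated for each element of the `teams` list. Written by hand in plain Python 3, without pandas, the same expansion looks roughly like this:

```python
league = {'country': 'england', 'income': 9995,
          'teams': ['liverpool', 'arsenal', 'manchester']}

# One row per team; the scalar fields are copied into every row.
rows = [{'country': league['country'], 'income': league['income'], 'teams': team}
        for team in league['teams']]

for row in rows:
    print(row)
```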


# Saving data fetched from an Open API to a file

In [76]:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
import requests
import json
servicekey = 'oVe6oF+10r9lo1LeluHJsTCvWRr19wnrvlp0NKaZmrVUTBwevUP04ENshg23yrmpUGxuQewqLnxWKW0Gr0CArw=='
endpoint = 'http://openapi.airport.co.kr/service'

rest = '/rest/AirportCodeList/getAirportCodeList'
params = { 'serviceKey' : servicekey, 'schLineType' : 'I', '_type' : 'json', 'pageNo' : 2 }

In [77]:
response = requests.get(endpoint+rest, params = params)
response

<Response [200]>

In [78]:
result = response.json()

In [79]:
import pandas as pd
df = pd.DataFrame(result['response']['body']['items']['item'])
df

,cityChn,cityCode,cityEng,cityJpn,cityKor
0,,KUV,GUNSAN,クンサン,군산
1,,KWJ,GWANGJU,クアンジュ,광주
2,蘭州,LHW,LANZHOU,蘭州,란저우
3,澳門,MFM,MACAU,澳門,마카오
4,,MKO,MAKAU,,마카오
5,,MMB,MEMANBETSU,ミマンテス,메만베츠
6,,MNL,MANILA,マニラ,마닐라
7,,MPK,MOKPO,,목포
8,,MRO,MORIOKA,,모리오카
9,,MWX,MUAN,,무안


In [80]:
df.to_csv('AirportCodeList.csv')

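# Saving CSV as UTF-8

"\xEF\xBB\xBF" is the UTF-8 byte order mark (BOM). Writing it at the start of the file lets programs such as Excel recognize the file as UTF-8.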

In [None]:
file = open('AirportCodeList_utf.csv','w')
file.write('\xEF\xBB\xBF')

In [None]:
df.to_csv(file)

In [None]:
file.close()
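
In Python 3 the marker does not have to be written by hand, because the `utf-8-sig` codec adds it when encoding and removes it when decoding. A short sketch of what that codec does to a sample line:

```python
import codecs

text = 'cityKor,군산\n'

plain = text.encode('utf-8')         # no marker at the start
with_bom = text.encode('utf-8-sig')  # same bytes, prefixed with the BOM

print(with_bom[:3] == codecs.BOM_UTF8)  # the three bytes EF BB BF
print(with_bom[3:] == plain)

# Reading with 'utf-8-sig' strips the marker again.
print(with_bom.decode('utf-8-sig') == text)
```

The same codec name can be passed as `encoding='utf-8-sig'` to `open()`.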

# EXCEL

In [None]:
import pandas as pd
xlsdata = pd.read_excel('20150427_085237.xlsx',u'RAW데이터', index_col=None, na_values=['NA'])

In [None]:
xlsdata

# HTML
Requires the lxml and html5lib libraries to be installed.

In [None]:
import pandas as pd
url = 'http://www.fdic.gov/bank/individual/failed/banklist.html'
dfs = pd.read_html(url)

In [None]:
dfs[0]

# cp949

In [None]:
import pandas as pd
df = pd.read_csv('newsjelly_obesity_BMI.csv', encoding='cp949', skiprows=1, index_col=0)

In [None]:
df

In [None]:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')