# Practice with `pandas` tables

## About `pandas`

`pandas` is a Python library for handling and analyzing data.

In [None]:
import pandas as pd



## Importing data

In [None]:
domestic_url = 'https://finance.naver.com/sise/sise_quant.nhn'



In [None]:
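# https://stackoverflow.com/a/25669685
def percent(percent_str):
    """Convert a percent string such as '10%' to a fraction (0.1)."""
    return float(percent_str.strip('%')) * 0.01


# Compare with a tolerance; exact float equality is fragile
assert abs(percent('10%') - 0.1) < 1e-12
assert abs(percent('-10%') + 0.1) < 1e-12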



In [None]:
table_list = pd.read_html(
    domestic_url,
    encoding='cp949',
    converters={'등락률': percent},
    header=0
)
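
To see what the `converters` argument does without touching the network, the sketch below applies the same `percent` function through `pd.read_csv` on a small inline table. The sample rows and the column names `name` and `change` are made up for illustration; `read_csv` is used only because it accepts `converters` in the same way and needs no HTML parser.

```python
import io

import pandas as pd


def percent(percent_str):
    """Convert a string such as '-10%' to the fraction -0.1."""
    return float(percent_str.strip('%')) * 0.01


# Hypothetical sample data standing in for the downloaded table
csv_text = "name,change\nA,+1.50%\nB,-10%\n"

# Each raw string in the 'change' column is passed through percent()
sample = pd.read_csv(io.StringIO(csv_text), converters={'change': percent})
print(sample)
```

Because the converter runs on the raw text, the column holds numbers as soon as the table is loaded, so no cleanup pass is needed afterwards.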



In [None]:
len(table_list)



In [None]:
table_list[0]



In [None]:
table_list[1]



In [None]:
table = table_list[1]
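
Picking `table_list[1]` by position works as long as the page layout does not change. As a possible alternative, the sketch below selects the first table that contains the expected columns. The two small frames and the name `table_list_demo` are hypothetical stand-ins for what `pd.read_html()` returns.

```python
import pandas as pd

# Hypothetical stand-ins for the list returned by pd.read_html()
table_list_demo = [
    pd.DataFrame({'menu': ['a', 'b']}),
    pd.DataFrame({'N': [1.0, 2.0], '종목명': ['X', 'Y']}),
]

# Take the first table that has every expected column
wanted = {'N', '종목명'}
picked = next(t for t in table_list_demo if wanted.issubset(t.columns))
print(picked)
```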



### Setting the index and dropping empty rows: `set_index()` and `dropna()`



In [None]:
table.dropna(how='all', inplace=True)



In [None]:
assert 'N' in table.columns, table.columns



In [None]:
table.N = table.N.astype(int)
table.set_index('N', inplace=True)
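
The same three steps can be tried on a tiny table. This is a minimal sketch with made-up values; the frame `demo` and its `name` column are assumptions for illustration, not part of the downloaded data.

```python
import numpy as np
import pandas as pd

# Hypothetical table with a separator row that is entirely NaN
demo = pd.DataFrame({
    'N': [1.0, np.nan, 2.0],
    'name': ['A', np.nan, 'B'],
    'ROE': [12.5, np.nan, np.nan],
})

demo = (
    demo.dropna(how='all')      # drops only the all-NaN row
        .astype({'N': int})     # safe now: no NaN left in N
        .set_index('N')
)
print(demo)
```

Note that `how='all'` keeps the last row even though its `ROE` is missing, because that row still holds other values.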



In [None]:
table.dtypes



In [None]:
table



### Storing data

#### CSV: Comma-separated values



In [None]:
csv_filename = 'sample.csv'

table.to_csv(csv_filename)
!ls -l *.csv
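
To check that nothing is lost on the way, a CSV round trip can be sketched as follows. An in-memory buffer stands in for `sample.csv`, and the frame `demo` contains made-up values.

```python
import io

import pandas as pd

demo = pd.DataFrame(
    {'name': ['A', 'B'], 'ROE': [12.5, 8.0]},
    index=pd.Index([1, 2], name='N'),
)

buffer = io.StringIO()          # in-memory stand-in for 'sample.csv'
demo.to_csv(buffer)             # the index is written as the first column
buffer.seek(0)

# Tell read_csv which column to turn back into the index
restored = pd.read_csv(buffer, index_col='N')
print(restored)
```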



#### Excel



In [None]:
# Recent pandas releases can no longer write legacy .xls files; .xlsx needs openpyxl
xls_filename = 'sample.xlsx'

table.to_excel(xls_filename)
!ls -l sample.*



## Filtering rows



### Condition on a column



In [None]:
table.ROE.dtypes



In [None]:
df_roe_10 = table.loc[table.ROE > 10]



In [None]:
df_roe_10
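
The comparison `table.ROE > 10` produces a boolean Series, and `.loc` keeps the rows where it is `True`. A minimal sketch with made-up values (the frame `demo` is hypothetical) shows that rows with a missing `ROE` are left out:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({'name': ['A', 'B', 'C'], 'ROE': [15.0, 3.2, np.nan]})

mask = demo['ROE'] > 10     # NaN > 10 evaluates to False
print(mask.tolist())
print(demo.loc[mask])
```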



### Membership of a column's values in a set

#### Getting another table



In [None]:
machinary_url = 'https://finance.naver.com/sise/sise_group_detail.nhn?type=upjong&no=138'



In [None]:
machinary_df_list = pd.read_html(machinary_url, encoding='cp949', header=0)



In [None]:
len(machinary_df_list)



In [None]:
machinary_df_list[-1]



In [None]:
machinary_df = machinary_df_list[-1]



In [None]:
machinary_df.dropna(how='all', inplace=True)



In [None]:
machinary_df



#### Applying the membership test

In [None]:
machinary_table = table.loc[
    table['종목명'].isin(machinary_df['종목명'])
]



In [None]:
machinary_table
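
`isin()` marks each row whose value appears in the other collection. The sketch below uses two small hypothetical frames, `quotes` and `sector`, in place of the two downloaded tables.

```python
import pandas as pd

# Made-up stand-ins for the volume table and the sector table
quotes = pd.DataFrame({'종목명': ['A', 'B', 'C'], '거래량': [100, 200, 300]})
sector = pd.DataFrame({'종목명': ['B', 'C', 'D']})

# True where the name in quotes also appears in sector
in_sector = quotes['종목명'].isin(sector['종목명'])
print(quotes.loc[in_sector])
```

Names that exist only in `sector` (here `'D'`) are simply ignored, so the result is never longer than the table being filtered.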



### Pattern



Select the rows whose string value in a given column starts with 'KODEX'.



In [None]:
# na=False treats missing names as non-matching instead of propagating NaN into the mask
kodex_table = table.loc[table['종목명'].str.startswith('KODEX', na=False)]



In [None]:
kodex_table
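
The prefix test can be tried on a small Series. The names below are sample values chosen for illustration, and the missing entry shows why passing `na=False` can be useful.

```python
import numpy as np
import pandas as pd

names = pd.Series(['KODEX 200', 'TIGER 200', np.nan, 'KODEX 인버스'])

# Missing values count as "does not start with KODEX"
mask = names.str.startswith('KODEX', na=False)
print(names[mask].tolist())
```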



## Visualization

### `seaborn`

In [None]:
import seaborn as sns



#### One-dimensional histogram



In [None]:
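x = table['거래량']
# distplot() is deprecated and utils.axlabel() is gone in recent seaborn releases
ax = sns.histplot(x, bins=20, kde=True)
ax.set(xlabel="volume", ylabel="frequency")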
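
With `bins=20`, the value range is split into 20 equal-width intervals and the rows falling into each interval are counted. A minimal sketch of that counting step, using NumPy directly and a made-up `volume` Series, might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical trading volumes
volume = pd.Series([1, 2, 2, 3, 10, 50, 100], name='volume')

# 20 bins need 21 edges; every value lands in exactly one bin
counts, edges = np.histogram(volume, bins=20)
print(counts.sum(), len(edges))
```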



#### Histogram and scatter plot



In [None]:
# rename() returns a new frame, so nothing is modified in place on a slice of table
table_xy = table[['거래량', '등락률', 'PER', 'ROE']].rename(
    columns={
        '거래량': 'volume',
        '등락률': 'change',
    },
)



In [None]:
table_xy['EPR'] = 1.0 / table['PER']
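
One caveat with taking the reciprocal: a `PER` of zero turns into infinity, which can upset the plots that follow. The sketch below, on a made-up `per` Series, shows one way to treat such values as missing. Whether the real table contains zeros is an assumption that would need checking.

```python
import numpy as np
import pandas as pd

per = pd.Series([10.0, 0.0, np.nan, -4.0], name='PER')

epr = 1.0 / per                                   # 0.0 becomes inf
epr = epr.replace([np.inf, -np.inf], np.nan)      # treat inf as missing
print(epr.tolist())
```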



In [None]:
table_xy.dtypes



In [None]:
table_xy.shape



In [None]:
sns.jointplot(x="ROE", y="EPR", data=table_xy, kind="reg")
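
With `kind="reg"`, the joint plot overlays a linear regression line on the scatter plot. As a rough sketch of the underlying idea, an ordinary least-squares line can be fitted with `np.polyfit`. The frame `demo` below is made up, and this is not claimed to be seaborn's exact procedure.

```python
import numpy as np
import pandas as pd

# Hypothetical data lying exactly on EPR = 2 * ROE + 1, plus one row with a missing ROE
demo = pd.DataFrame({
    'ROE': [1.0, 2.0, 3.0, np.nan, 5.0],
    'EPR': [3.0, 5.0, 7.0, 9.0, 11.0],
})

clean = demo.dropna(subset=['ROE', 'EPR'])        # the fit cannot use NaN
slope, intercept = np.polyfit(clean['ROE'], clean['EPR'], deg=1)
print(slope, intercept)
```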



#### Pairwise scatter plots: `pairplot()`

In [None]:
sns.pairplot(table_xy.loc[:, ['volume', 'PER', 'EPR', 'ROE']])



## Wes McKinney, Data science without borders, 2017

[![video](https://i.ytimg.com/vi/wdmf1msbtVs/hqdefault.jpg)](https://youtu.be/wdmf1msbtVs)



## References



* Wes McKinney, Python for Data Analysis, 2nd Ed., O'Reilly, 2017. Korean edition: 파이썬 라이브러리를 활용한 데이터 분석, translated by 김영근, 한빛미디어, 2013.
* Brownley, Foundations for Analytics with Python, O'Reilly, 2016. Korean edition: 파이썬 데이터 분석 입문, translated by 한창진 et al., 한빛미디어, 2017.
* Excelsior-JH, Pandas를 이용한 Naver금융에서 주식데이터 가져오기 (Fetching stock data from Naver Finance with Pandas), Tistory, 2017. [Online] Available: https://excelsior-cjh.tistory.com/109.
* Wikipedia contributors, Pandas (software), Wikipedia. [Online] Available: https://en.wikipedia.org/wiki/Pandas_(software).
* Pandas contributors, Pandas Documentation. [Online] Available: https://pandas.pydata.org/pandas-docs/stable/.
* Wikipedia contributors, Comma-separated values, Wikipedia. [Online] Available: https://en.wikipedia.org/wiki/Comma-separated_values.



## Final Bell



In [None]:
# https://stackoverflow.com/a/24634221
import os
os.system("printf '\a'");

