# BigData, the Center of Industry 4.0

<div align='right'><font size=2 color='gray'>Data Processing Based Python @ <font color='blue'><a href='https://www.facebook.com/jskim.kr'>FB / jskim.kr</a></font>, [김진수](bigpycraft@gmail.com)</font></div>
<hr>

# Pandas Basic 6

In [1]:
from pandas import Series, DataFrame
import pandas as pd
import numpy as np

## 6. Other pandas topics
> Other topics related to pandas

### <font color='brown'> Integer indexing </font>
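> Integer indexing
- Indexing a pandas object with integers does not mean quite the same thing as indexing built-in Python data structures such as lists and tuples, so it is an easy place to make mistakes. Take care!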
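The pitfall described above can be seen in a minimal sketch. It assumes a Series with the default integer index; the variable names are illustrative only. With integer labels, plain `[]` treats an integer key as a label, so `-1` does not mean "last element" the way it does for a list.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(3.))  # default integer labels 0, 1, 2

# Plain [] with an integer key on an integer index is a label lookup,
# so -1 is searched for as a label and is not found.
try:
    ser[-1]
    lookup_failed = False
except KeyError:
    lookup_failed = True

last_by_position = ser.iloc[-1]    # position-based: the last element
by_label_slice = ser.loc[:1]       # label-based slice: end label is included
by_position_slice = ser.iloc[:1]   # position-based slice: end is excluded

print(lookup_failed, last_by_position, len(by_label_slice), len(by_position_slice))
```

Using `.loc` for labels and `.iloc` for positions removes the ambiguity regardless of the index type.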

In [241]:
ser = Series(np.arange(3.))
ser

0    0.0
1    1.0
2    2.0
dtype: float64

In [242]:
ser.iloc[-1]

2.0

In [243]:
ser2 = Series(np.arange(3.), index=['a', 'b', 'c'])
ser2

a    0.0
b    1.0
c    2.0
dtype: float64

In [244]:
ser2[-1]  # positional fallback on a non-integer index; deprecated since pandas 2.1, prefer ser2.iloc[-1]

2.0

In [245]:
ser.loc[:1]

0    0.0
1    1.0
dtype: float64

In [246]:
ser3 = Series(range(3), index=[-5, 1, 3])
ser3

-5    0
 1    1
 3    2
dtype: int32

In [247]:
ser3.iloc[2]

2

In [248]:
frame = DataFrame(np.arange(6).reshape((3, 2)), index=[2, 0, 1])
frame

| | 0 | 1 |
|---|---|---|
| 2 | 0 | 1 |
| 0 | 2 | 3 |
| 1 | 4 | 5 |


In [249]:
frame.iloc[0]

0    0
1    1
Name: 2, dtype: int32
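For a DataFrame whose row labels are themselves integers in a shuffled order, position and label select different rows. The following is a small sketch that rebuilds the same `frame` as above to contrast the two accessors; the result names are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(6).reshape((3, 2)), index=[2, 0, 1])

row_at_position_0 = frame.iloc[0]   # first row, whose label happens to be 2
row_with_label_0 = frame.loc[0]     # row labelled 0, which is the second row
second_column = frame.iloc[:, 1]    # second column by position

print(row_at_position_0.name, row_with_label_0.name)
```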

### <font color='brown'> Panel data </font>
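> Panel data
- Panel: can be created from a dict of DataFrame objects or from a three-dimensional ndarray.
- <font color='blue'> Think of a Panel as a three-dimensional version of a DataFrame. </font>
- pandas development focuses on spreadsheet-style (tabular) data, and with hierarchical indexing, N-dimensional arrays are unnecessary in most cases.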
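`Panel` was removed in pandas 0.25, so on current versions the same three-way structure is usually held in a DataFrame with a hierarchical index. Below is a minimal sketch of that idea, assuming made-up numbers and hypothetical tickers `AAA` and `BBB` (no real market data is downloaded).

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2012-05-30", periods=3, freq="D")
tickers = ["AAA", "BBB"]  # hypothetical tickers, not real symbols

# One small DataFrame per ticker, filled with made-up numbers.
frames = {
    tkr: pd.DataFrame(
        {"Close": np.arange(3.) + 10 * i, "Volume": np.arange(3) + 100 * i},
        index=dates,
    )
    for i, tkr in enumerate(tickers)
}

# Combine the per-ticker frames into one DataFrame with a two-level row index.
stacked = pd.concat(frames, names=["ticker", "Date"])

# All rows for one ticker (comparable to selecting one Panel item).
one_ticker = stacked.loc["AAA"]

# One date across all tickers (comparable to slicing the major axis).
one_day = stacked.xs(pd.Timestamp("2012-05-31"), level="Date")

# One field as a Date x ticker table (comparable to slicing the minor axis).
close_wide = stacked["Close"].unstack("ticker")

print(stacked)
```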

In [250]:
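from pandas_datareader import data as web  # pandas.io.data was removed from pandas; the pandas-datareader package replaces it

pdata = pd.Panel(dict((stk, web.get_data_yahoo(stk))
                       for stk in ['AAPL', 'GOOG', 'MSFT', 'DELL']))  # pd.Panel requires pandas < 0.25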

In [251]:
pdata

<class 'pandas.core.panel.Panel'>
Dimensions: 4 (items) x 1795 (major_axis) x 6 (minor_axis)
Items axis: AAPL to MSFT
Major_axis axis: 2010-01-04 00:00:00 to 2017-01-20 00:00:00
Minor_axis axis: Open to Adj Close

In [252]:
pdata = pdata.swapaxes('items', 'minor')
pdata['Adj Close']

| Date | AAPL | DELL | GOOG | MSFT |
|---|---|---|---|---|
| 2010-01-04 | 27.847252 | 14.06528 | 313.062468 | 25.710416 |
| 2010-01-05 | 27.895396 | 14.38450 | 311.683844 | 25.718722 |
| 2010-01-06 | 27.451683 | 14.10397 | 303.826685 | 25.560888 |
| 2010-01-07 | 27.400936 | 14.23940 | 296.753749 | 25.295062 |
| 2010-01-08 | 27.583106 | 14.36516 | 300.709808 | 25.469510 |
| 2010-01-11 | 27.339779 | 14.37483 | 300.255255 | 25.145534 |
| 2010-01-12 | 27.028789 | 14.56830 | 294.945572 | 24.979392 |
| 2010-01-13 | 27.410045 | 14.57797 | 293.252243 | 25.211991 |
| 2010-01-14 | 27.251297 | 14.22005 | 294.630868 | 25.718722 |
| 2010-01-15 | 26.795872 | 13.92985 | 289.710772 | 25.635652 |


In [253]:
pdata.loc[:, '6/1/2012', :]

| | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|
| AAPL | 569.159996 | 572.650009 | 560.520012 | 560.989983 | 130246900.0 | 72.996726 |
| DELL | 12.15 | 12.3 | 12.045 | 12.07 | 19397600.0 | 11.67592 |
| GOOG | 571.790972 | 572.650996 | 568.350996 | 570.981 | 6138700.0 | 285.205295 |
| MSFT | 28.76 | 28.959999 | 28.440001 | 28.450001 | 56634300.0 | 25.093451 |


In [254]:
pdata.loc['Adj Close', '5/22/2012':, :]

| Date | AAPL | DELL | GOOG | MSFT |
|---|---|---|---|---|
| 2012-05-22 | 72.473644 | 14.58765 | 300.100412 | 26.248896 |
| 2012-05-23 | 74.241986 | 12.08221 | 304.426106 | 25.675584 |
| 2012-05-24 | 73.560156 | 12.04351 | 301.528978 | 25.640302 |
| 2012-05-25 | 73.165884 | 12.05319 | 295.470050 | 25.631482 |
| 2012-05-28 | NaN | 12.05319 | NaN | NaN |
| 2012-05-29 | 74.464493 | 12.24666 | 296.873645 | 26.072491 |
| 2012-05-30 | 75.362333 | 12.14992 | 293.821674 | 25.878448 |
| 2012-05-31 | 75.174961 | 11.92743 | 290.140354 | 25.746145 |
| 2012-06-01 | 72.996726 | 11.67592 | 285.205295 | 25.093451 |
| 2012-06-04 | 73.426126 | 11.60821 | 289.006480 | 25.181652 |


In [255]:
stacked = pdata.loc[:, '5/30/2012':, :].to_frame()
stacked

| Date | minor | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 2012-05-30 | AAPL | 569.199997 | 579.989990 | 566.559990 | 579.169998 | 132357400.0 | 75.362333 |
| 2012-05-30 | DELL | 12.590000 | 12.700000 | 12.460000 | 12.560000 | 19787800.0 | 12.149920 |
| 2012-05-30 | GOOG | 588.161028 | 591.901014 | 583.530999 | 588.230992 | 3827600.0 | 293.821674 |
| 2012-05-30 | MSFT | 29.350000 | 29.480000 | 29.120001 | 29.340000 | 41585500.0 | 25.878448 |
| 2012-05-31 | AAPL | 580.740021 | 581.499985 | 571.460022 | 577.730019 | 122918600.0 | 75.174961 |
| 2012-05-31 | DELL | 12.530000 | 12.540000 | 12.330000 | 12.330000 | 19955600.0 | 11.927430 |
| 2012-05-31 | GOOG | 588.720982 | 590.001032 | 579.001013 | 580.860990 | 5958800.0 | 290.140354 |
| 2012-05-31 | MSFT | 29.299999 | 29.420000 | 28.940001 | 29.190001 | 39134000.0 | 25.746145 |
| 2012-06-01 | AAPL | 569.159996 | 572.650009 | 560.520012 | 560.989983 | 130246900.0 | 72.996726 |
| 2012-06-01 | DELL | 12.150000 | 12.300000 | 12.045000 | 12.070000 | 19397600.0 | 11.675920 |


In [256]:
stacked.to_panel()

<class 'pandas.core.panel.Panel'>
Dimensions: 6 (items) x 1182 (major_axis) x 4 (minor_axis)
Items axis: Open to Adj Close
Major_axis axis: 2012-05-30 00:00:00 to 2017-01-20 00:00:00
Minor_axis axis: AAPL to MSFT
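On pandas versions without `Panel`, a round trip similar to `to_frame()` / `to_panel()` can be approximated with `stack` and `unstack` on a DataFrame whose columns form a two-level index. This is a rough sketch under that assumption; the tickers `AAA` and `BBB`, the level names, and the values are made up for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2012-05-30", periods=2, freq="D", name="Date")
columns = pd.MultiIndex.from_product(
    [["Close", "Volume"], ["AAA", "BBB"]],  # hypothetical fields and tickers
    names=["field", "ticker"],
)
wide = pd.DataFrame(np.arange(8.).reshape(2, 4), index=dates, columns=columns)

# Move the ticker level from the columns into the rows:
# rows become (Date, ticker) pairs, much like the stacked frame above.
long = wide.stack("ticker")

# Move the ticker level back into the columns to recover the wide layout.
back = long.unstack("ticker")

print(long)
```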

<hr>
<marquee><font size=3 color='brown'>BigpyCraft finds the information to design a valuable society with Technology & Craft.</font></marquee>
<div align='right'><font size=2 color='gray'> &lt; The End &gt; </font></div>