In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

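## Motivation: Python for Data Analysis
- **Goals**
    - Build the ability to read and write data in Python, preprocess it, and carry out data analysis.
    - Learn basic Python programming and how to use analysis libraries such as Pandas and Numpy.


- **About Python**
    - Concise yet powerful programming features (compared with many other languages, short code can accomplish a lot)
    - Highly general-purpose (works well with other programming languages and with data processing frameworks such as Spark/Hadoop)
    - A compact set of libraries that are essential for analysis: Pandas, Numpy, Matplotlib, Scikit-learn, and others
    - Fast computation and efficient resource use are possible (largely through vectorized libraries such as Numpy)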

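### 1. Installing the required programs and libraries

- Anaconda, Python 2.7 version (installs Python, Jupyter, and other required programs as one package) [Download Anaconda](https://www.continuum.io/downloads)
- Required libraries: **Numpy, Pandas, Matplotlib, Seaborn, SciPy, Statsmodels, Scikit-learn**

    - Loading and exploring datasets (number of rows and columns, variable types, etc.) → **Pandas**
    - Transforming datasets (type conversion, derived variables, dropping variables, outlier removal, filtering, sorting, etc.) → **Pandas, Numpy**
    - Descriptive statistics (mean, standard deviation, frequencies, proportions, correlation, etc.) → **Pandas, Numpy**
    - Inferential statistics (hypothesis testing, distribution estimation, confidence intervals, etc.) → **SciPy, Statsmodels**
    - Machine learning (regression, decision trees, association analysis, etc.) → **Scikit-learn, Statsmodels**
    - Visualization → **Matplotlib, Seaborn**
    - Code editor → **Jupyter Notebook**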
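
Before starting, it can help to confirm which of the libraries listed above are actually importable. The following is a minimal sketch, not part of the original notebook; the helper name `check_libraries` is hypothetical, and the import names (for example `sklearn` for Scikit-learn) differ from the display names used in the list.

```python
import importlib

# Import names differ from the display names above
# (e.g. Scikit-learn is imported as `sklearn`).
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn",
            "scipy", "statsmodels", "sklearn"]


def check_libraries(names):
    """Return {import_name: version string, or None if not installed}."""
    status = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            status[name] = None
        else:
            status[name] = getattr(module, "__version__", "unknown")
    return status


status = check_libraries(REQUIRED)
for name in REQUIRED:
    print("{:<12} {}".format(name, status[name] or "NOT INSTALLED"))
```

Any library reported as not installed can usually be added through the Anaconda package manager.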

### 2. How to use Jupyter Notebook
- Goal: Learn and practice the main features of Jupyter Notebook
- Exercise: Press the H key to view the keyboard shortcuts.
- Exercise: Memorize the shortcuts you use most often (Shift + Enter, etc.)
- Exercise: Take a quick look at the Markdown features

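### 3. Basic Python programming
- Goal: Learn and practice Python data types, functions, conditionals, and related basics
- This class covers only the basics; for details, see → [점프투파이썬 (Jump to Python)](https://wikidocs.net/book/1)


- **Python scalar types**
    - None: the null value
    - str: string
    - unicode: Unicode string (Python 2)
    - float: floating-point number
    - bool: True or False
    - int: integer
    - long: long integer (Python 2)


- **Python data types**
    - Numbers
    - Strings
    - Lists
    - ~~Tuples~~
    - ~~Dictionaries~~
    - ~~Sets~~
    - ~~Booleans~~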
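
The scalar types listed above can be inspected with the built-in `type()` function. This is a small illustrative sketch rather than part of the original notebook; `unicode` and `long` are left out because they exist only in Python 2, so the snippet runs unchanged on Python 2 and 3.

```python
# Example values, one per scalar type.
values = [None, "text", 4.53, True, 23423]

# type(v).__name__ gives the name of each value's type.
type_names = [type(v).__name__ for v in values]
print(type_names)

# bool is a subclass of int, so True behaves like 1 in arithmetic.
print(isinstance(True, int))
print(True + 1)
```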

In [2]:
i = 23423
i1 = i * 6
print(i1)
print(i)

140538
23423


In [4]:
a = 3
b = 4
a ** b

81

In [8]:
a = "this is a string"
a

'this is a string'

In [9]:
a[3]

's'

In [10]:
b = a.replace('a', 'the')
b

'this is the string'

In [12]:
x = 4.53
type(x)

float

In [13]:
s = 'python'
list(s)
#s[:2]

['p', 'y', 't', 'h', 'o', 'n']

In [15]:
a = [1,2,3,4,5]
a

[1, 2, 3, 4, 5]

In [18]:
print(a[0] + a[3])
print(a[0:3])

5
[1, 2, 3]


In [19]:
b = [1,2,3] *3 
b

[1, 2, 3, 1, 2, 3, 1, 2, 3]

In [20]:
b.append(4)
b

[1, 2, 3, 1, 2, 3, 1, 2, 3, 4]

In [21]:
b.append([7,7])
b

[1, 2, 3, 1, 2, 3, 1, 2, 3, 4, [7, 7]]

In [22]:
b.sort()
b

[1, 1, 1, 2, 2, 2, 3, 3, 3, 4, [7, 7]]
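
The `b.sort()` call above works because Python 2 allows integers and lists to be compared with each other. In Python 3 the same call raises a `TypeError`. One possible workaround, sketched here under the assumption that numbers should come before nested lists, is a sort key; the function name `ints_before_lists` is hypothetical.

```python
b = [1, 2, 3, 1, 2, 3, 1, 2, 3, 4, [7, 7]]


def ints_before_lists(value):
    # Sort key: the boolean groups plain numbers ahead of lists, and the
    # value itself orders items within each group. Numbers and lists are
    # never compared directly because their booleans differ.
    return (isinstance(value, list), value)


b_sorted = sorted(b, key=ints_before_lists)
print(b_sorted)
```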

In [23]:
b.count(2)

3

In [24]:
b.extend([8,8])
b

[1, 1, 1, 2, 2, 2, 3, 3, 3, 4, [7, 7], 8, 8]
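
The cells above use both `append` and `extend`, and the difference is easy to miss: `append` adds its argument as a single element, while `extend` adds each element of its argument. A short sketch on a smaller list:

```python
b = [1, 2, 3]

b.append([7, 7])        # adds ONE element, which is itself a list
after_append = list(b)  # copy the current state for comparison

b.extend([8, 8])        # adds each element of the argument separately
after_extend = list(b)

print(after_append)
print(after_extend)
```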

In [25]:
x = 7
n = 8

if x > 7:
    print("Wrong")
else:
    print("Correct")

Correct


In [26]:
mylist = [2,3,4,5]

if 1 in mylist:
    print("YES")
else:
    print("NO")

NO


In [27]:
t = 0
while t < 10:
    t = t + 1
    print(t)
    if t == 10:
        print("The end")

1
2
3
4
5
6
7
8
9
10
The end


In [29]:
k = ['a', 'b', 'c']

for x in k:
    t = x + 'yyy' 
    print(t)

ayyy
byyy
cyyy


In [30]:
m = [30, 40, 22, 34, 123, 555]

i = 0
for x in m:
    i = i + 1
    if x <= 50:
        print("Low Performance")
    else:
        print("High Performance")

Low Performance
Low Performance
Low Performance
Low Performance
High Performance
High Performance


In [31]:
res = [x * 10 for x in m]
res

[300, 400, 220, 340, 1230, 5550]

In [32]:
res1 = [x * 10 for x in m if x <= 50]
res1

[300, 400, 220, 340]

In [35]:
def merong(a, x):
    res = a * x / 2
    return res    

merong(3,4)

6
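
The result `6` above is an integer because Python 2 performs integer division when both operands of `/` are integers; in Python 3 the same expression returns `6.0`. The sketch below shows two variants that behave the same in both versions. The function names `merong_true` and `merong_floor` are hypothetical.

```python
def merong_true(a, x):
    # Dividing by a float forces true division in Python 2 and 3.
    return a * x / 2.0


def merong_floor(a, x):
    # // is floor division in both Python 2 and 3.
    return a * x // 2


print(merong_true(3, 5))
print(merong_floor(3, 5))
```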

In [36]:
range(5)

[0, 1, 2, 3, 4]

In [37]:
map(lambda x: x ** 2, range(5))

[0, 1, 4, 9, 16]

In [38]:
filter(lambda x: x < 5, range(10))

[0, 1, 2, 3, 4]
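
The list outputs shown for `range`, `map`, and `filter` are Python 2 behavior. In Python 3 these return lazy objects, so wrapping them in `list()` gives the same result in both versions. The list comprehensions shown earlier express the same operations; a brief comparison:

```python
# map/filter wrapped in list() so the result is a list in Python 2 and 3.
squares = list(map(lambda x: x ** 2, range(5)))
small = list(filter(lambda x: x < 5, range(10)))

# Equivalent list comprehensions.
squares_lc = [x ** 2 for x in range(5)]
small_lc = [x for x in range(10) if x < 5]

print(squares)
print(small)
```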

### 4. Pandas 
- Goal: Learn and practice the basics of Pandas, a library specialized for data analysis
- Key features of Pandas, by Wes McKinney
    - A set of <font color="red">labeled</font> array data structures
    - <code>Series</code> (1D labeled homogeneously-typed array) and <code>DataFrame</code> (general 2D labeled, size-mutable tabular structure with potentially heterogeneously-typed columns)
    - Integrated group by engine 
    - Date range generation (<code>date_range</code>)
    - Input/Output tools
    - Moving window statistics (rolling mean, rolling standard deviation, etc.)
   
- For the full manual, see → [PDF Download](http://pandas.pydata.org/pandas-docs/stable/pandas.pdf) [HTML Docs](http://pandas.pydata.org/pandas-docs/stable/)

In [40]:
import pandas as pd
import numpy as np

In [41]:
s = pd.Series([1,3,5,np.nan,6,8])
s

0     1
1     3
2     5
3   NaN
4     6
5     8
dtype: float64
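
The Series above is reported as `float64` even though most of its values are integers. This is because `np.nan` is a floating-point value, so the whole Series is stored as floats. A small sketch of that behavior (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 3, 5, 6, 8])
with_nan = pd.Series([1, 3, 5, np.nan, 6, 8])

print(ints.dtype)               # an integer dtype
print(with_nan.dtype)           # float64, because NaN is a float
print(with_nan.isnull().sum())  # number of missing entries
print(with_nan.mean())          # NaN is skipped by default
```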

In [42]:
dates = pd.date_range('20161220', periods=6)
dates

DatetimeIndex(['2016-12-20', '2016-12-21', '2016-12-22', '2016-12-23',
               '2016-12-24', '2016-12-25'],
              dtype='datetime64[ns]', freq='D')

In [46]:
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD')).round(2)
df

,A,B,C,D
2016-12-20,-0.38,2.96,0.8,0.63
2016-12-21,-0.41,-0.8,2.23,0.06
2016-12-22,1.7,-1.3,-0.07,0.75
2016-12-23,0.14,-0.19,0.13,1.07
2016-12-24,-0.19,-0.08,-0.97,-0.57
2016-12-25,-1.61,-0.26,-0.52,0.32


In [48]:
df2 = pd.DataFrame({'A' : 1., 
                    'B' : pd.Timestamp('20161220'),
                    'C' : pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D' : np.array([3] * 4, dtype='int32'),
                    'E' : pd.Categorical(["test", "train", "test", "train"]),
                    'F' : 'foo'
                   })
df2

,A,B,C,D,E,F
0,1,2016-12-20,1,3,test,foo
1,1,2016-12-20,1,3,train,foo
2,1,2016-12-20,1,3,test,foo
3,1,2016-12-20,1,3,train,foo


In [50]:
df.head(3)

,A,B,C,D
2016-12-20,-0.38,2.96,0.8,0.63
2016-12-21,-0.41,-0.8,2.23,0.06
2016-12-22,1.7,-1.3,-0.07,0.75


In [51]:
df.tail(2)

,A,B,C,D
2016-12-24,-0.19,-0.08,-0.97,-0.57
2016-12-25,-1.61,-0.26,-0.52,0.32


In [52]:
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6 entries, 2016-12-20 to 2016-12-25
Freq: D
Data columns (total 4 columns):
A    6 non-null float64
B    6 non-null float64
C    6 non-null float64
D    6 non-null float64
dtypes: float64(4)
memory usage: 240.0 bytes


In [53]:
df.index

DatetimeIndex(['2016-12-20', '2016-12-21', '2016-12-22', '2016-12-23',
               '2016-12-24', '2016-12-25'],
              dtype='datetime64[ns]', freq='D')

In [54]:
df.columns

Index([u'A', u'B', u'C', u'D'], dtype='object')

In [55]:
df.describe()

,A,B,C,D
count,6.0,6.0,6.0,6.0
mean,-0.125,0.055,0.266667,0.376667
std,1.072991,1.495563,1.133078,0.580333
min,-1.61,-1.3,-0.97,-0.57
25%,-0.4025,-0.665,-0.4075,0.125
50%,-0.285,-0.225,0.03,0.475
75%,0.0575,-0.1075,0.6325,0.72
max,1.7,2.96,2.23,1.07


In [57]:
df

,A,B,C,D
2016-12-20,-0.38,2.96,0.8,0.63
2016-12-21,-0.41,-0.8,2.23,0.06
2016-12-22,1.7,-1.3,-0.07,0.75
2016-12-23,0.14,-0.19,0.13,1.07
2016-12-24,-0.19,-0.08,-0.97,-0.57
2016-12-25,-1.61,-0.26,-0.52,0.32


In [61]:
df.sort_index(axis=0, ascending=False)

,A,B,C,D
2016-12-25,-1.61,-0.26,-0.52,0.32
2016-12-24,-0.19,-0.08,-0.97,-0.57
2016-12-23,0.14,-0.19,0.13,1.07
2016-12-22,1.7,-1.3,-0.07,0.75
2016-12-21,-0.41,-0.8,2.23,0.06
2016-12-20,-0.38,2.96,0.8,0.63


In [62]:
df.sort_values(by='B')

,A,B,C,D
2016-12-22,1.7,-1.3,-0.07,0.75
2016-12-21,-0.41,-0.8,2.23,0.06
2016-12-25,-1.61,-0.26,-0.52,0.32
2016-12-23,0.14,-0.19,0.13,1.07
2016-12-24,-0.19,-0.08,-0.97,-0.57
2016-12-20,-0.38,2.96,0.8,0.63


In [68]:
df

,A,B,C,D
2016-12-20,-0.38,2.96,0.8,0.63
2016-12-21,-0.41,-0.8,2.23,0.06
2016-12-22,1.7,-1.3,-0.07,0.75
2016-12-23,0.14,-0.19,0.13,1.07
2016-12-24,-0.19,-0.08,-0.97,-0.57
2016-12-25,-1.61,-0.26,-0.52,0.32


In [67]:
type(df[['A', 'C']])

pandas.core.frame.DataFrame

In [None]:
type(df['A'])
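
The `type(df['A'])` cell above has no recorded output. Selecting with a single label is expected to return a `Series`, whereas selecting with a list of labels returns a `DataFrame`, as the `df[['A', 'C']]` cell shows. A minimal sketch on a small illustrative frame, not the notebook's random data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))

one_col = df["A"]      # single label -> Series
sub_frame = df[["A"]]  # list of labels -> DataFrame, even with one column

print(type(one_col).__name__)
print(type(sub_frame).__name__)
print(sub_frame.shape)
```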

In [69]:
df[0:3]

,A,B,C,D
2016-12-20,-0.38,2.96,0.8,0.63
2016-12-21,-0.41,-0.8,2.23,0.06
2016-12-22,1.7,-1.3,-0.07,0.75


In [71]:
dates[0]

Timestamp('2016-12-20 00:00:00', offset='D')

In [72]:
df.loc[dates[0]]

A   -0.38
B    2.96
C    0.80
D    0.63
Name: 2016-12-20 00:00:00, dtype: float64

In [74]:
df.loc[:, ['A','C']]

,A,C
2016-12-20,-0.38,0.8
2016-12-21,-0.41,2.23
2016-12-22,1.7,-0.07
2016-12-23,0.14,0.13
2016-12-24,-0.19,-0.97
2016-12-25,-1.61,-0.52


In [75]:
df.loc['20161220':'20161223', ['A', 'B']]

,A,B
2016-12-20,-0.38,2.96
2016-12-21,-0.41,-0.8
2016-12-22,1.7,-1.3
2016-12-23,0.14,-0.19


In [76]:
df.loc['20161220', ['A', 'B']]

A   -0.38
B    2.96
Name: 2016-12-20 00:00:00, dtype: float64

In [77]:
df.loc[dates[0], 'A']

-0.38

In [78]:
df.iloc[3]

A    0.14
B   -0.19
C    0.13
D    1.07
Name: 2016-12-23 00:00:00, dtype: float64

In [79]:
df.iloc[3:5, 0:2]

,A,B
2016-12-23,0.14,-0.19
2016-12-24,-0.19,-0.08


In [80]:
df.iloc[1:3,:]

,A,B,C,D
2016-12-21,-0.41,-0.8,2.23,0.06
2016-12-22,1.7,-1.3,-0.07,0.75


In [81]:
df.iloc[1,1]

-0.80000000000000004
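
The cells above mix `.loc` and `.iloc`. `.loc` selects by label and includes the end of a slice, while `.iloc` selects by integer position and excludes the end, like ordinary Python slicing. The sketch below uses a small deterministic frame (an assumption for illustration) so the shapes are easy to verify.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20161220", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# Label-based: both end points of the slice are included (3 rows).
by_label = df.loc["20161220":"20161222", ["A", "B"]]

# Position-based: the end position is excluded (2 rows).
by_position = df.iloc[0:2, 0:2]

print(by_label.shape)
print(by_position.shape)

# The same cell addressed by label and by position.
print(df.loc[dates[1], "B"] == df.iloc[1, 1])
```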

In [82]:
df[df.A > 0]

,A,B,C,D
2016-12-22,1.7,-1.3,-0.07,0.75
2016-12-23,0.14,-0.19,0.13,1.07


In [83]:
df2 = df.copy(deep=True)

In [84]:
df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three']
df2

,A,B,C,D,E
2016-12-20,-0.38,2.96,0.8,0.63,one
2016-12-21,-0.41,-0.8,2.23,0.06,one
2016-12-22,1.7,-1.3,-0.07,0.75,two
2016-12-23,0.14,-0.19,0.13,1.07,three
2016-12-24,-0.19,-0.08,-0.97,-0.57,four
2016-12-25,-1.61,-0.26,-0.52,0.32,three


In [85]:
df2[df2['E'].isin(['two', 'four'])]

,A,B,C,D,E
2016-12-22,1.7,-1.3,-0.07,0.75,two
2016-12-24,-0.19,-0.08,-0.97,-0.57,four


In [89]:
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])
df1.loc[dates[0]:dates[1], 'E'] = 1
df1

,A,B,C,D,E
2016-12-20,-0.38,2.96,0.8,0.63,1.0
2016-12-21,-0.41,-0.8,2.23,0.06,1.0
2016-12-22,1.7,-1.3,-0.07,0.75,
2016-12-23,0.14,-0.19,0.13,1.07,


In [90]:
df1.dropna(how='any')

,A,B,C,D,E
2016-12-20,-0.38,2.96,0.8,0.63,1
2016-12-21,-0.41,-0.8,2.23,0.06,1


In [91]:
df1.dropna(how='all')

,A,B,C,D,E
2016-12-20,-0.38,2.96,0.8,0.63,1.0
2016-12-21,-0.41,-0.8,2.23,0.06,1.0
2016-12-22,1.7,-1.3,-0.07,0.75,
2016-12-23,0.14,-0.19,0.13,1.07,


In [92]:
df1.fillna(5)

,A,B,C,D,E
2016-12-20,-0.38,2.96,0.8,0.63,1
2016-12-21,-0.41,-0.8,2.23,0.06,1
2016-12-22,1.7,-1.3,-0.07,0.75,5
2016-12-23,0.14,-0.19,0.13,1.07,5


In [94]:
pd.isnull(df1).sum()

A    0
B    0
C    0
D    0
E    2
dtype: int64
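
The missing-data cells above use `dropna(how='any')`, `dropna(how='all')`, `fillna`, and `isnull`. With the notebook's data the `how='all'` case drops nothing, so the difference is hard to see. The sketch below uses a small hand-made frame (an assumption for illustration) in which one row is entirely missing.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "A": [1.0, 2.0, np.nan],
    "B": [np.nan, 5.0, np.nan],
})

any_dropped = df1.dropna(how="any")  # keeps only rows with no missing values
all_dropped = df1.dropna(how="all")  # drops only rows where every value is missing
filled = df1.fillna(0)               # replaces each missing value with 0
missing_per_column = df1.isnull().sum()

print(any_dropped)
print(all_dropped)
print(missing_per_column)
```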

In [95]:
df.mean()

A   -0.125000
B    0.055000
C    0.266667
D    0.376667
dtype: float64

In [96]:
df.apply(np.cumsum)

,A,B,C,D
2016-12-20,-0.38,2.96,0.8,0.63
2016-12-21,-0.79,2.16,3.03,0.69
2016-12-22,0.91,0.86,2.96,1.44
2016-12-23,1.05,0.67,3.09,2.51
2016-12-24,0.86,0.59,2.12,1.94
2016-12-25,-0.75,0.33,1.6,2.26


In [97]:
df.apply(lambda x: x.max() - x.min())

A    3.31
B    4.26
C    3.20
D    1.64
dtype: float64
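
`DataFrame.apply` calls the given function once per column by default; passing `axis=1` calls it once per row instead. A short sketch with small hand-picked values (an assumption, chosen so the results can be checked by eye):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 4.0, 2.0], "B": [10.0, 30.0, 20.0]})

# Range (max - min) of each column.
col_range = df.apply(lambda x: x.max() - x.min())

# Range of each row.
row_range = df.apply(lambda x: x.max() - x.min(), axis=1)

# Cumulative sum down each column, as in df.apply(np.cumsum) above.
running = df.apply(np.cumsum)

print(col_range)
print(row_range)
print(running)
```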

In [98]:
df

,A,B,C,D
2016-12-20,-0.38,2.96,0.8,0.63
2016-12-21,-0.41,-0.8,2.23,0.06
2016-12-22,1.7,-1.3,-0.07,0.75
2016-12-23,0.14,-0.19,0.13,1.07
2016-12-24,-0.19,-0.08,-0.97,-0.57
2016-12-25,-1.61,-0.26,-0.52,0.32


In [99]:
s = pd.Series(np.random.randint(0, 7, size=10))
s

0    2
1    3
2    2
3    3
4    4
5    4
6    4
7    2
8    5
9    1
dtype: int32

In [100]:
s.value_counts()

4    3
2    3
3    2
5    1
1    1
dtype: int64

In [101]:
s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
s

0       A
1       B
2       C
3    Aaba
4    Baca
5     NaN
6    CABA
7     dog
8     cat
dtype: object

In [102]:
s.str.lower()

0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object

In [105]:
df = pd.DataFrame(np.random.randn(10, 4))
df

,0,1,2,3
0,-0.700626,0.052017,0.522772,0.627879
1,-0.008108,-0.342504,0.528208,-1.144252
2,0.808317,1.666355,1.240376,-0.064106
3,0.288669,0.699611,-1.156847,-0.500837
4,0.679061,0.101091,2.0632,2.405293
5,-0.084759,-0.393185,-1.316562,-1.108606
6,-0.383322,-0.243338,-0.79788,-1.59958
7,1.904239,-0.889455,0.144801,-0.596677
8,-2.371958,0.984856,-0.204652,0.242562
9,1.618237,-0.062098,-0.515375,0.332189


In [106]:
pieces = [df[:3], df[3:7], df[7:]]

In [107]:
pieces

[          0         1         2         3
 0 -0.700626  0.052017  0.522772  0.627879
 1 -0.008108 -0.342504  0.528208 -1.144252
 2  0.808317  1.666355  1.240376 -0.064106,
           0         1         2         3
 3  0.288669  0.699611 -1.156847 -0.500837
 4  0.679061  0.101091  2.063200  2.405293
 5 -0.084759 -0.393185 -1.316562 -1.108606
 6 -0.383322 -0.243338 -0.797880 -1.599580,
           0         1         2         3
 7  1.904239 -0.889455  0.144801 -0.596677
 8 -2.371958  0.984856 -0.204652  0.242562
 9  1.618237 -0.062098 -0.515375  0.332189]

In [108]:
pd.concat(pieces)

,0,1,2,3
0,-0.700626,0.052017,0.522772,0.627879
1,-0.008108,-0.342504,0.528208,-1.144252
2,0.808317,1.666355,1.240376,-0.064106
3,0.288669,0.699611,-1.156847,-0.500837
4,0.679061,0.101091,2.0632,2.405293
5,-0.084759,-0.393185,-1.316562,-1.108606
6,-0.383322,-0.243338,-0.79788,-1.59958
7,1.904239,-0.889455,0.144801,-0.596677
8,-2.371958,0.984856,-0.204652,0.242562
9,1.618237,-0.062098,-0.515375,0.332189


In [109]:
df = pd.DataFrame(np.random.randn(8,4), columns=['A', 'B', 'C', 'D'])

In [110]:
df

,A,B,C,D
0,-1.183525,0.348037,0.145489,1.399147
1,-1.402336,0.617362,1.522548,-0.439809
2,-0.554423,-1.352599,0.51044,-1.079695
3,0.777236,0.353661,-1.228728,0.38173
4,0.567574,0.4945,0.558873,-1.673652
5,1.128059,-0.301401,1.149936,0.295041
6,-0.484048,1.606068,-1.465498,0.359282
7,-1.536716,0.294077,1.251234,-1.044281


In [111]:
s = df.iloc[3]
s

A    0.777236
B    0.353661
C   -1.228728
D    0.381730
Name: 3, dtype: float64

In [112]:
df.append(s, ignore_index=True)

,A,B,C,D
0,-1.183525,0.348037,0.145489,1.399147
1,-1.402336,0.617362,1.522548,-0.439809
2,-0.554423,-1.352599,0.51044,-1.079695
3,0.777236,0.353661,-1.228728,0.38173
4,0.567574,0.4945,0.558873,-1.673652
5,1.128059,-0.301401,1.149936,0.295041
6,-0.484048,1.606068,-1.465498,0.359282
7,-1.536716,0.294077,1.251234,-1.044281
8,0.777236,0.353661,-1.228728,0.38173
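
`DataFrame.append`, used in the cell above, was deprecated in pandas 1.4 and removed in pandas 2.0. On recent versions the same result can be obtained with `pd.concat`. A minimal sketch on a small illustrative frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
s = df.iloc[1]  # one row, returned as a Series

# Turn the row into a one-row DataFrame, then concatenate.
# ignore_index=True renumbers the rows 0..n-1, as in the append call above.
appended = pd.concat([df, s.to_frame().T], ignore_index=True)
print(appended)
```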


In [113]:
df = pd.DataFrame( {
        'A': ['foo', 'bar', 'foo', 'bar',
              'foo', 'bar', 'foo', 'foo'],
        'B': ['one', 'one', 'two', 'three',
              'two', 'two', 'one', 'three'],
        'C': np.random.randn(8),
        'D': np.random.randn(8)
    })
df

,A,B,C,D
0,foo,one,1.732998,-0.400479
1,bar,one,-1.294822,-0.505049
2,foo,two,-0.787982,1.168336
3,bar,three,0.155048,0.965374
4,foo,two,-0.042929,-0.983226
5,bar,two,-0.121534,0.854191
6,foo,one,-0.506143,-0.711523
7,foo,three,-1.232048,-0.258396


In [114]:
df.groupby('A').sum()

A,C,D
bar,-1.261308,1.314516
foo,-0.836105,-1.185289


In [115]:
df.groupby(['A', 'B']).sum()

A,B,C,D
bar,one,-1.294822,-0.505049
bar,three,0.155048,0.965374
bar,two,-0.121534,0.854191
foo,one,1.226855,-1.112002
foo,three,-1.232048,-0.258396
foo,two,-0.830911,0.18511
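
The `groupby` cells above sum every numeric column per group. How non-numeric columns such as `B` are handled by `sum()` can differ between pandas versions, so selecting the columns explicitly is a safer pattern. The sketch below uses small fixed values (an assumption for illustration) and also shows `agg` for computing several statistics at once.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "B": ["one", "one", "two", "two"],
    "C": [1.0, 2.0, 3.0, 4.0],
    "D": [10.0, 20.0, 30.0, 40.0],
})

# Select the numeric columns explicitly before aggregating.
sums = df.groupby("A")[["C", "D"]].sum()

# Several aggregations of one column at once.
stats = df.groupby("A")["C"].agg(["sum", "mean", "count"])

# Grouping by two keys produces a two-level (hierarchical) index.
by_two_keys = df.groupby(["A", "B"])[["C", "D"]].sum()

print(sums)
print(stats)
print(by_two_keys)
```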


In [116]:
tuples = list(zip(*[['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
                    ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]))
tuples

[('bar', 'one'),
 ('bar', 'two'),
 ('baz', 'one'),
 ('baz', 'two'),
 ('foo', 'one'),
 ('foo', 'two'),
 ('qux', 'one'),
 ('qux', 'two')]

In [117]:
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8,2), index=index, columns=['A', 'B'])
df

first,second,A,B
bar,one,0.066585,0.18276
bar,two,0.114205,-0.143403
baz,one,0.086324,-1.023093
baz,two,1.659871,-0.231346
foo,one,-0.219273,-1.715725
foo,two,0.065758,0.725521
qux,one,2.287742,-1.04645
qux,two,0.246487,1.687499


In [118]:
df2 = df[:4]
df2

first,second,A,B
bar,one,0.066585,0.18276
bar,two,0.114205,-0.143403
baz,one,0.086324,-1.023093
baz,two,1.659871,-0.231346


In [119]:
stacked = df2.stack()
stacked

first  second   
bar    one     A    0.066585
               B    0.182760
       two     A    0.114205
               B   -0.143403
baz    one     A    0.086324
               B   -1.023093
       two     A    1.659871
               B   -0.231346
dtype: float64

In [120]:
stacked.unstack()

first,second,A,B
bar,one,0.066585,0.18276
bar,two,0.114205,-0.143403
baz,one,0.086324,-1.023093
baz,two,1.659871,-0.231346


In [121]:
stacked.unstack(0)

,first,bar,baz
second,,,
one,A,0.066585,0.086324
one,B,0.18276,-1.023093
two,A,0.114205,1.659871
two,B,-0.143403,-0.231346
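
`stack` moves the column labels into a new innermost index level, and `unstack` moves an index level back to the columns (the innermost level by default, or the level given as an argument). The sketch below repeats the cells above with small integer values (an assumption for illustration) so the reshaping is easy to follow.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two")],
    names=["first", "second"],
)
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]}, index=index)

stacked = df.stack()           # columns A/B become the innermost index level
restored = stacked.unstack()   # default: move the innermost level back to columns
by_first = stacked.unstack(0)  # move the "first" level to the columns instead

print(stacked)
print(restored)
print(by_first)
```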


In [122]:
df = pd.DataFrame( {
        'A' : ['one', 'one', 'two', 'three'] * 3,
        'B' : ['A', 'B', 'C'] * 4,
        'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
        'D' : np.random.randn(12),
        'E' : np.random.randn(12)
    })
df

,A,B,C,D,E
0,one,A,foo,-0.318187,-0.233583
1,one,B,foo,0.621465,1.353752
2,two,C,foo,0.21772,-0.378379
3,three,A,bar,-1.571943,-0.526865
4,one,B,bar,-0.326984,-0.992293
5,one,C,bar,-1.054827,-2.121914
6,two,A,foo,-1.001647,0.268391
7,three,B,foo,-1.881123,-1.150764
8,one,C,foo,1.633913,0.958695
9,one,A,bar,1.609164,-0.099815
