# Pandas
**Pandas** is an open-source, BSD-licensed library that provides high-performance, easy-to-use data structures and data analysis tools for the **Python** programming language.

Library documentation: http://pandas.pydata.org/

In [1]:
import numpy as np
import pandas as pd

s = pd.Series([1,3,5,np.nan,6,8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

In [4]:
dates = pd.date_range('2019-03-25', periods=6)
df = pd.DataFrame(data=np.random.randn(6,4), index=dates, columns=list('ABCD') )
df

                   A         B         C         D
2019-03-25 -1.483572  0.152796  1.541834  0.603256
2019-03-26 -0.451880  0.647186 -0.230129 -0.172269
2019-03-27  0.461545 -0.682124  0.347189  1.377008
2019-03-28  0.788202  0.611103  0.718637  0.018182
2019-03-29  1.596652 -0.514131 -0.450169  0.587770
2019-03-30  0.629764 -0.318142 -1.285447  0.648359


In [6]:
df2 = pd.DataFrame(
    { 'A':1,
      'B':pd.Timestamp('20190325'),
      'C':pd.Series(1,index=range(4), dtype='float32'),
      'D':np.array([3] * 4, dtype='int32'),
      'E': 'foo'
    }
)
df2

   A          B    C  D    E
0  1 2019-03-25  1.0  3  foo
1  1 2019-03-25  1.0  3  foo
2  1 2019-03-25  1.0  3  foo
3  1 2019-03-25  1.0  3  foo


In [7]:
df2.dtypes

A             int64
B    datetime64[ns]
C           float32
D             int32
E            object
dtype: object

In [8]:
df2.head(3)

   A          B    C  D    E
0  1 2019-03-25  1.0  3  foo
1  1 2019-03-25  1.0  3  foo
2  1 2019-03-25  1.0  3  foo


In [9]:
df.head(3)

                   A         B         C         D
2019-03-25 -1.483572  0.152796  1.541834  0.603256
2019-03-26 -0.451880  0.647186 -0.230129 -0.172269
2019-03-27  0.461545 -0.682124  0.347189  1.377008


In [10]:
df.index

DatetimeIndex(['2019-03-25', '2019-03-26', '2019-03-27', '2019-03-28',
               '2019-03-29', '2019-03-30'],
              dtype='datetime64[ns]', freq='D')

In [11]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

In [12]:
df.values

array([[-1.48357169,  0.15279621,  1.54183381,  0.60325558],
       [-0.45187957,  0.64718612, -0.23012938, -0.17226885],
       [ 0.46154529, -0.6821244 ,  0.34718932,  1.37700789],
       [ 0.7882023 ,  0.61110269,  0.71863672,  0.01818206],
       [ 1.5966523 , -0.51413105, -0.45016878,  0.58777028],
       [ 0.62976407, -0.31814244, -1.28544703,  0.64835932]])

In [13]:
df.describe()

              A         B         C         D
count  6.000000  6.000000  6.000000  6.000000
mean   0.256785 -0.017219  0.106986  0.510384
std    1.076080  0.573533  0.985245  0.546437
min   -1.483572 -0.682124 -1.285447 -0.172269
25%   -0.223523 -0.465134 -0.395159  0.160579
50%    0.545655 -0.082673  0.058530  0.595513
75%    0.748593  0.496526  0.625775  0.637083
max    1.596652  0.647186  1.541834  1.377008


In [14]:
df.T

   2019-03-25  2019-03-26  2019-03-27  2019-03-28  2019-03-29  2019-03-30
A   -1.483572   -0.451880    0.461545    0.788202    1.596652    0.629764
B    0.152796    0.647186   -0.682124    0.611103   -0.514131   -0.318142
C    1.541834   -0.230129    0.347189    0.718637   -0.450169   -1.285447
D    0.603256   -0.172269    1.377008    0.018182    0.587770    0.648359


In [16]:
df.sort_index(axis=1, ascending=False)

                   D         C         B         A
2019-03-25  0.603256  1.541834  0.152796 -1.483572
2019-03-26 -0.172269 -0.230129  0.647186 -0.451880
2019-03-27  1.377008  0.347189 -0.682124  0.461545
2019-03-28  0.018182  0.718637  0.611103  0.788202
2019-03-29  0.587770 -0.450169 -0.514131  1.596652
2019-03-30  0.648359 -1.285447 -0.318142  0.629764


In [17]:
df.sort_values(by='B')

                   A         B         C         D
2019-03-27  0.461545 -0.682124  0.347189  1.377008
2019-03-29  1.596652 -0.514131 -0.450169  0.587770
2019-03-30  0.629764 -0.318142 -1.285447  0.648359
2019-03-25 -1.483572  0.152796  1.541834  0.603256
2019-03-28  0.788202  0.611103  0.718637  0.018182
2019-03-26 -0.451880  0.647186 -0.230129 -0.172269


## Selection

In [18]:
df['A']

2019-03-25   -1.483572
2019-03-26   -0.451880
2019-03-27    0.461545
2019-03-28    0.788202
2019-03-29    1.596652
2019-03-30    0.629764
Freq: D, Name: A, dtype: float64

In [19]:
df.A

2019-03-25   -1.483572
2019-03-26   -0.451880
2019-03-27    0.461545
2019-03-28    0.788202
2019-03-29    1.596652
2019-03-30    0.629764
Freq: D, Name: A, dtype: float64

In [20]:
df[0:3]

                   A         B         C         D
2019-03-25 -1.483572  0.152796  1.541834  0.603256
2019-03-26 -0.451880  0.647186 -0.230129 -0.172269
2019-03-27  0.461545 -0.682124  0.347189  1.377008


In [23]:
df['20190326':'20190328']

                   A         B         C         D
2019-03-26 -0.451880  0.647186 -0.230129 -0.172269
2019-03-27  0.461545 -0.682124  0.347189  1.377008
2019-03-28  0.788202  0.611103  0.718637  0.018182


In [24]:
df.loc[dates[0]]

A   -1.483572
B    0.152796
C    1.541834
D    0.603256
Name: 2019-03-25 00:00:00, dtype: float64

In [25]:
df.loc[dates[0],'A']

-1.4835716856492422

In [26]:
df.iloc[3]

A    0.788202
B    0.611103
C    0.718637
D    0.018182
Name: 2019-03-28 00:00:00, dtype: float64

In [27]:
df.iloc[3:5, 0:2]

                   A         B
2019-03-28  0.788202  0.611103
2019-03-29  1.596652 -0.514131
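
The selections above mix label-based access (`df.loc`, date-string slices) with position-based access (`df.iloc`, `df[0:3]`). The sketch below contrasts the two on a small illustrative frame; `demo`, `by_label`, `by_position`, and the seeded random values are assumptions made for this example and are not the data shown above. It is meant only to highlight that label slices include both endpoints while positional slices exclude the stop position.

```python
import numpy as np
import pandas as pd

# Illustrative frame (assumed data, seeded for repeatability).
rng = np.random.default_rng(0)
dates = pd.date_range('2019-03-25', periods=6)
demo = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list('ABCD'))

# Label-based slicing with .loc includes BOTH endpoints.
by_label = demo.loc['2019-03-26':'2019-03-28', ['A', 'B']]

# Position-based slicing with .iloc excludes the stop position.
by_position = demo.iloc[1:3, 0:2]

print(by_label.shape)     # (3, 2)
print(by_position.shape)  # (2, 2)
```

Using `.loc` and `.iloc` explicitly, rather than bare `df[...]` slicing, makes it clear to a reader which of the two rules applies.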


In [28]:
df

                   A         B         C         D
2019-03-25 -1.483572  0.152796  1.541834  0.603256
2019-03-26 -0.451880  0.647186 -0.230129 -0.172269
2019-03-27  0.461545 -0.682124  0.347189  1.377008
2019-03-28  0.788202  0.611103  0.718637  0.018182
2019-03-29  1.596652 -0.514131 -0.450169  0.587770
2019-03-30  0.629764 -0.318142 -1.285447  0.648359


In [29]:
df[df.A > 0]

                   A         B         C         D
2019-03-27  0.461545 -0.682124  0.347189  1.377008
2019-03-28  0.788202  0.611103  0.718637  0.018182
2019-03-29  1.596652 -0.514131 -0.450169  0.587770
2019-03-30  0.629764 -0.318142 -1.285447  0.648359


In [31]:
df[df > 0]

                   A         B         C         D
2019-03-25       NaN  0.152796  1.541834  0.603256
2019-03-26       NaN  0.647186       NaN       NaN
2019-03-27  0.461545       NaN  0.347189  1.377008
2019-03-28  0.788202  0.611103  0.718637  0.018182
2019-03-29  1.596652       NaN       NaN  0.587770
2019-03-30  0.629764       NaN       NaN  0.648359
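
The two boolean selections above behave differently: a boolean Series such as `df.A > 0` filters rows, while a boolean DataFrame such as `df > 0` keeps the shape and replaces non-matching cells with NaN. A minimal sketch of that difference follows; the frame `demo` and the variable names are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Small assumed frame so the effect is easy to check by eye.
demo = pd.DataFrame({'A': [-1.0, 2.0, 3.0], 'B': [4.0, -5.0, 6.0]})

# A boolean Series selects whole rows.
rows_kept = demo[demo['A'] > 0]

# A boolean DataFrame masks individual cells; the shape is unchanged.
cells_masked = demo[demo > 0]

# Cell masking gives the same result as DataFrame.where with that condition.
same_as_where = cells_masked.equals(demo.where(demo > 0))

print(len(rows_kept))                        # 2
print(int(cells_masked.isna().sum().sum()))  # 2
print(same_as_where)                         # True
```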


In [34]:
df3 = df.copy()
df3['E'] = ['one', 'two', 'three', 'four', 'five', 'six']
df3[df3['E'].isin(['two','four'])]

                   A         B         C         D     E
2019-03-26 -0.451880  0.647186 -0.230129 -0.172269   two
2019-03-28  0.788202  0.611103  0.718637  0.018182  four


In [35]:
df.at[dates[0],'A'] = 0
df.iat[0,1]=0
df.loc[:,'D'] = np.array([5] * len(df))
df

                   A         B         C  D
2019-03-25  0.000000  0.000000  1.541834  5
2019-03-26 -0.451880  0.647186 -0.230129  5
2019-03-27  0.461545 -0.682124  0.347189  5
2019-03-28  0.788202  0.611103  0.718637  5
2019-03-29  1.596652 -0.514131 -0.450169  5
2019-03-30  0.629764 -0.318142 -1.285447  5


In [18]:
import numpy as np
X = np.matrix([[1,2,3], [4,5,6], [4,5,6], [4,5,6], [4,5,6]])
X = X.T
X.shape

(3, 5)

In [22]:
y = np.matrix([[1,1,0,0,1]])
y=y.T
y.shape

(5, 1)

In [23]:
theta  = np.matrix([[1,1,1]])
theta = theta.T
theta.shape

(3, 1)

In [24]:
np.power(((X.T*theta) - y),2)

matrix([[ 25],
        [196],
        [225],
        [225],
        [196]], dtype=int32)
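
The cell above computes the element-wise squared residuals of `X.T * theta` against `y` using `np.matrix`. NumPy's documentation no longer recommends `np.matrix`, so here is a hedged sketch of the same computation with plain arrays and the `@` operator. The names `residuals` and `squared` are introduced for illustration only.

```python
import numpy as np

# Same values as the cell above, stored as ordinary 2-D arrays.
X = np.array([[1, 2, 3], [4, 5, 6], [4, 5, 6], [4, 5, 6], [4, 5, 6]]).T  # shape (3, 5)
y = np.array([[1, 1, 0, 0, 1]]).T                                        # shape (5, 1)
theta = np.ones((3, 1), dtype=int)                                       # shape (3, 1)

# With arrays, @ is matrix multiplication and ** is element-wise power.
residuals = X.T @ theta - y
squared = residuals ** 2

print(squared.ravel())  # [ 25 196 225 225 196]
```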

In [26]:
len(y)

5

In [15]:
import numpy as np
A = np.matrix([[1,2,3], [4,5,6]])
B = np.matrix([[1,2], [3,4], [5,6]])

In [16]:
A.T + B

matrix([[ 2,  6],
        [ 5,  9],
        [ 8, 12]])

In [17]:
B * A

matrix([[ 9, 12, 15],
        [19, 26, 33],
        [29, 40, 51]])

In [18]:
A + B

ValueError: operands could not be broadcast together with shapes (2,3) (3,2) 

In [19]:
A * B

matrix([[22, 28],
        [49, 64]])
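
With `np.matrix`, `*` means matrix multiplication, which is why `A * B` succeeds above while `A + B` fails on mismatched shapes. With ordinary arrays the roles differ: `*` is element-wise and `@` is the matrix product. The following sketch, which assumes the same values for `A` and `B` but stored as `np.array`, shows both behaviours; `product` and `elementwise_error` are names made up for this example.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])    # shape (2, 3)
B = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)

# Matrix product: (2, 3) @ (3, 2) -> (2, 2)
product = A @ B

# Element-wise multiply needs broadcast-compatible shapes, so this raises.
try:
    A * B
    elementwise_error = None
except ValueError as exc:
    elementwise_error = str(exc)

print(product)
print(elementwise_error)
```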

In [22]:
A = np.reshape(np.arange(100), (10,10))

In [25]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [26]:
v = np.zeros(len(x))

In [31]:
v = np.matrix(A) * np.matrix(x).T

In [35]:
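# Note: v already holds the matrix-vector product from In [31], so this loop
# adds A·x to it a second time. The output below is therefore 2 * (A·x).
# Reset v to zeros before the loop to obtain A·x itself.
for i in range(10):
    for j in range(10):
        v[i] = v[i] + A[i,j]*x[j]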
v

matrix([[ 570],
        [1470],
        [2370],
        [3270],
        [4170],
        [5070],
        [5970],
        [6870],
        [7770],
        [8670]])
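
As a sketch of the comparison these cells appear to be building toward, the block below computes the same matrix-vector product twice, once with explicit loops and once vectorized, resetting the accumulator to zeros before the loop so the two results agree. The names `v_loop` and `v_vec` are assumptions introduced here.

```python
import numpy as np

A = np.arange(100).reshape(10, 10)
x = np.arange(10)

# Explicit double loop, starting from a zeroed accumulator.
v_loop = np.zeros(len(x))
for i in range(10):
    for j in range(10):
        v_loop[i] += A[i, j] * x[j]

# Vectorized equivalent.
v_vec = A @ x

print(v_vec)                      # [ 285  735 1185 1635 2085 2535 2985 3435 3885 4335]
print(np.allclose(v_loop, v_vec)) # True
```

Each entry is half of the corresponding value in the output above, which is consistent with the loop there having been run on top of an already computed product.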