### Importing Modules

In [1]:
import numpy as np
import pandas as pd

### Series

```s = pd.Series(data, index=index)```
`data` can be one of several types, including:
- a Python dictionary
- an ndarray
- a scalar value (such as 5)

#### Data from an ndarray

In [3]:
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
s

a    0.715224
b    0.767742
c   -0.291969
d    0.798352
e    1.117603
dtype: float64

In [4]:
s.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [5]:
pd.Series(np.random.randn(5))

0   -1.091094
1   -1.290440
2   -0.475149
3   -0.654700
4   -1.127114
dtype: float64

#### Data from a dictionary

In [6]:
d = {'b' : 1, 'a' : 0, 'c' : 2}
pd.Series(d)

b    1
a    0
c    2
dtype: int64

In [7]:
d = {'a' : 0., 'b' : 1., 'c' : 2.}
pd.Series(d)

a    0.0
b    1.0
c    2.0
dtype: float64

In [8]:
pd.Series(d, index=['b', 'c', 'd', 'a'])

b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64

#### Data from a scalar value

In [9]:
pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])

a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64

#### Series is ndarray-like

In [10]:
s.iloc[0]

0.7152237168723361

In [11]:
s[:3]

a    0.715224
b    0.767742
c   -0.291969
dtype: float64

In [12]:
s[s > s.median()]

d    0.798352
e    1.117603
dtype: float64

In [13]:
s.iloc[[4, 3, 1]]

e    1.117603
d    0.798352
b    0.767742
dtype: float64

In [14]:
np.exp(s)

a    2.044644
b    2.154894
c    0.746792
d    2.221877
e    3.057515
dtype: float64

#### Series is dict-like

In [16]:
s['a']

0.7152237168723361

In [15]:
s['e'] = 12.
s

a     0.715224
b     0.767742
c    -0.291969
d     0.798352
e    12.000000
dtype: float64

In [17]:
'e' in s

True

In [18]:
'f' in s

False

If the label is not present in the index, a `KeyError` is raised:

In [19]:
s['f']

KeyError: 'f'

To avoid the error and get `None` (or a default you specify) for a missing label, use the `get` method:

In [20]:
s.get('f')
s.get('f', np.nan)

nan

#### Vectorized operations and label alignment with Series

In [21]:
s + s

a     1.430447
b     1.535483
c    -0.583937
d     1.596704
e    24.000000
dtype: float64

In [22]:
s * 2

a     1.430447
b     1.535483
c    -0.583937
d     1.596704
e    24.000000
dtype: float64

In [23]:
np.exp(s)

a         2.044644
b         2.154894
c         0.746792
d         2.221877
e    162754.791419
dtype: float64

In [24]:
s[1:] + s[:-1]

a         NaN
b    1.535483
c   -0.583937
d    1.596704
e         NaN
dtype: float64
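
The `NaN` entries above appear because the labels `a` and `e` exist in only one of the two operands, so the aligned result has no value for them. The following is a minimal sketch of two ways to handle this, using a hypothetical Series `s_demo` with fixed values (not the random data from the cells above): dropping the missing labels with `dropna`, or treating a missing operand as 0 with `Series.add` and `fill_value`.

```python
import pandas as pd

# Hypothetical Series with fixed values, for illustration only
s_demo = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=list('abcde'))

# Operands cover different labels, so 'a' and 'e' become NaN
total = s_demo.iloc[1:] + s_demo.iloc[:-1]

# Option 1: discard labels that were missing from either operand
only_shared = total.dropna()

# Option 2: treat a missing operand as 0 before adding
filled = s_demo.iloc[1:].add(s_demo.iloc[:-1], fill_value=0)

print(total)
print(only_shared)
print(filled)
```

Which option fits depends on whether a missing label should count as "no data" or as zero.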

#### The name attribute

In [25]:
s = pd.Series(np.random.randn(5), name='something')
s

0    0.001918
1   -1.740851
2    0.543277
3    1.772499
4    0.204512
Name: something, dtype: float64

In [26]:
s.name

'something'

A Series can be renamed with the `rename` method, which returns a new Series:

In [27]:
s2 = s.rename("different")
s2.name

'different'

### DataFrame
A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. Like Series, DataFrame accepts many kinds of input:

- Dict of 1D ndarrays, lists, dicts, or Series
- 2-D numpy.ndarray
- Structured or record ndarray
- A Series
- Another DataFrame
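
Of the inputs listed above, the 2-D `numpy.ndarray` case can be sketched as follows. The array contents and the labels `x`, `y`, `r0` to `r2` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 array: [[0, 1], [2, 3], [4, 5]]
arr = np.arange(6).reshape(3, 2)

# Rows of the array become rows of the DataFrame;
# index and columns are optional labels for each axis
df_arr = pd.DataFrame(arr, index=['r0', 'r1', 'r2'], columns=['x', 'y'])
print(df_arr)
```

If `index` and `columns` are omitted, pandas falls back to default integer labels on both axes.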

#### Data from a dict of Series or dicts

In [28]:
d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
df

Unnamed: 0,one,two
a,1.0,1.0
b,2.0,2.0
c,3.0,3.0
d,,4.0


In [29]:
pd.DataFrame(d, index=['d', 'b', 'a'])

Unnamed: 0,one,two
d,,4.0
b,2.0,2.0
a,1.0,1.0


In [30]:
pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])

Unnamed: 0,two,three
d,4.0,
b,2.0,
a,1.0,


In [31]:
df.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [32]:
df.columns

Index(['one', 'two'], dtype='object')

#### Data from a dict of ndarrays / lists

In [33]:
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}

In [34]:
pd.DataFrame(d)
pd.DataFrame(d, index=['a', 'b', 'c', 'd'])

Unnamed: 0,one,two
a,1.0,4.0
b,2.0,3.0
c,3.0,2.0
d,4.0,1.0


#### Data from a structured or record array

In [36]:
data = np.zeros((2,), dtype=[('A', 'i4'),('B', 'f4'),('C', 'S10')])
data[:] = [(1,2.,'Hello'), (2,3.,"World")]
pd.DataFrame(data)

Unnamed: 0,A,B,C
0,1,2.0,b'Hello'
1,2,3.0,b'World'


In [37]:
pd.DataFrame(data, index=['first', 'second'])

Unnamed: 0,A,B,C
first,1,2.0,b'Hello'
second,2,3.0,b'World'


In [38]:
pd.DataFrame(data, columns=['C', 'A', 'B'])

Unnamed: 0,C,A,B
0,b'Hello',1,2.0
1,b'World',2,3.0


#### From a list of dicts

In [39]:
data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
pd.DataFrame(data2)

Unnamed: 0,a,b,c
0,1,2,
1,5,10,20.0


In [40]:
pd.DataFrame(data2, index=['first', 'second'])

Unnamed: 0,a,b,c
first,1,2,
second,5,10,20.0


In [41]:
pd.DataFrame(data2, columns=['a', 'b'])

Unnamed: 0,a,b
0,1,2
1,5,10


#### From a dict of tuples

In [42]:
pd.DataFrame({('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
              ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
              ('a', 'c'): {('A', 'B'): 5, ('A', 'C'): 6},
              ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 8},
              ('b', 'b'): {('A', 'D'): 9, ('A', 'B'): 10}})

       a              b
       b    a    c    a     b
A B  1.0  4.0  5.0  8.0  10.0
  C  2.0  3.0  6.0  7.0   NaN
  D  NaN  NaN  NaN  NaN   9.0


#### From a Series
The result is a DataFrame with the same index as the input Series and a single column whose name is the original name of the Series (only if no other column name is provided).
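
A minimal sketch of this behavior, using a hypothetical Series named `values`. `Series.to_frame` is shown as one way to supply a different column name.

```python
import pandas as pd

# Hypothetical named Series, for illustration only
ser = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'], name='values')

# The Series name becomes the single column name; the index is preserved
df_from_series = pd.DataFrame(ser)
print(df_from_series)

# to_frame lets you choose a different column name
df_renamed = ser.to_frame(name='renamed')
print(df_renamed)
```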

#### Alternate Constructors
`DataFrame.from_dict` takes a dict of dicts or a dict of array-like sequences and returns a DataFrame.

In [43]:
pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]))

Unnamed: 0,A,B
0,1,4
1,2,5
2,3,6


In [44]:
pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]),
                       orient='index', columns=['one', 'two', 'three'])

Unnamed: 0,one,two,three
A,1,2,3
B,4,5,6


`DataFrame.from_records` takes a list of tuples or an ndarray with a structured dtype.

In [45]:
data

array([(1, 2., b'Hello'), (2, 3., b'World')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])

In [46]:
pd.DataFrame.from_records(data, index='C')

          A    B
C
b'Hello'  1  2.0
b'World'  2  3.0


#### Column selection, addition, deletion

In [47]:
df['one']

a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64

In [48]:
df['three'] = df['one'] * df['two']
df['flag'] = df['one'] > 2
df

Unnamed: 0,one,two,three,flag
a,1.0,1.0,1.0,False
b,2.0,2.0,4.0,False
c,3.0,3.0,9.0,True
d,,4.0,,False


Columns can be deleted or popped, as with a dict:

In [49]:
del df['two']
three = df.pop('three')
df

Unnamed: 0,one,flag
a,1.0,False
b,2.0,False
c,3.0,True
d,,False


When a scalar value is assigned, it is propagated to fill the whole column:

In [51]:
df['foo'] = 'bar'
df

Unnamed: 0,one,flag,foo
a,1.0,False,bar
b,2.0,False,bar
c,3.0,True,bar
d,,False,bar


In [52]:
df['one_trunc'] = df['one'][:2]
df

Unnamed: 0,one,flag,foo,one_trunc
a,1.0,False,bar,1.0
b,2.0,False,bar,2.0
c,3.0,True,bar,
d,,False,bar,


In [53]:
df.insert(1, 'bar', df['one'])
df

Unnamed: 0,one,bar,flag,foo,one_trunc
a,1.0,1.0,False,bar,1.0
b,2.0,2.0,False,bar,2.0
c,3.0,3.0,True,bar,
d,,,False,bar,


#### Indexing / Selection

| Operation                      | Syntax          | Result     |
|--------------------------------|-----------------|------------|
| Select column                  | `df[col]`       | Series     |
| Select row by label            | `df.loc[label]` | Series     |
| Select row by integer location | `df.iloc[loc]`  | Series     |
| Slice rows                     | `df[5:10]`      | DataFrame  |
| Select rows by boolean vector  | `df[bool_vec]`  | DataFrame  |

Row selection, for example, returns a Series whose index is the columns of the DataFrame:

In [55]:
df.loc['b']

one              2
bar              2
flag         False
foo            bar
one_trunc        2
Name: b, dtype: object

In [56]:
df.iloc[2]

one             3
bar             3
flag         True
foo           bar
one_trunc     NaN
Name: c, dtype: object
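
The last two rows of the table above, row slicing and selection by a boolean vector, both return a DataFrame rather than a Series. A minimal sketch, using a small hypothetical frame with fixed values:

```python
import pandas as pd

# Hypothetical frame with fixed values, for illustration only
frame = pd.DataFrame({'one': [1.0, 2.0, 3.0, 4.0],
                      'two': [4.0, 3.0, 2.0, 1.0]},
                     index=['a', 'b', 'c', 'd'])

# Slice rows by position: rows at positions 1 and 2
sliced = frame[1:3]

# Select rows where the boolean vector is True
mask = frame['one'] > 2
selected = frame[mask]

print(sliced)
print(selected)
```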

#### Data alignment and arithmetic

In [57]:
df = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])
df + df2

Unnamed: 0,A,B,C,D
0,1.640102,-1.579391,1.438475,
1,-0.528026,2.029622,1.30588,
2,-1.234257,-0.205642,-1.498467,
3,-1.065475,-0.793135,0.28145,
4,-0.891015,-1.316124,-0.722498,
5,-0.356689,-0.818944,0.338891,
6,1.174099,-1.276321,-0.070123,
7,,,,
8,,,,
9,,,,


In [58]:
df - df.iloc[0]

Unnamed: 0,A,B,C,D
0,0.0,0.0,0.0,0.0
1,-1.773932,1.230308,-0.928119,-0.73098
2,-1.052575,-1.411526,-1.79171,-0.123784
3,-1.318154,0.077537,-0.230048,1.063831
4,-1.626764,0.590126,-1.445795,0.433144
5,-2.068756,-0.43494,-0.70587,0.260809
6,0.707491,-0.52735,-1.390303,-0.795887
7,0.35441,-0.39386,0.32388,-1.552373
8,-1.149997,-0.029797,0.007593,0.496321
9,-1.836038,0.677921,0.429839,-0.229944


In [59]:
index = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 3), index=index, columns=list('ABC'))
df

Unnamed: 0,A,B,C
2000-01-01,0.717362,-0.47378,0.726707
2000-01-02,1.349179,-0.934736,0.808596
2000-01-03,0.236053,0.100838,-1.81525
2000-01-04,0.015168,-0.103056,-0.71237
2000-01-05,-0.080777,1.614723,0.180478
2000-01-06,-0.10004,-0.152784,0.616512
2000-01-07,-0.079604,-0.171881,0.045603
2000-01-08,0.047679,0.227655,0.244437


In [60]:
type(df['A'])

pandas.core.series.Series

In [61]:
df - df['A']

Unnamed: 0,2000-01-01 00:00:00,2000-01-02 00:00:00,2000-01-03 00:00:00,2000-01-04 00:00:00,2000-01-05 00:00:00,2000-01-06 00:00:00,2000-01-07 00:00:00,2000-01-08 00:00:00,A,B,C
2000-01-01,,,,,,,,,,,
2000-01-02,,,,,,,,,,,
2000-01-03,,,,,,,,,,,
2000-01-04,,,,,,,,,,,
2000-01-05,,,,,,,,,,,
2000-01-06,,,,,,,,,,,
2000-01-07,,,,,,,,,,,
2000-01-08,,,,,,,,,,,
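
In the cell above, `df - df['A']` matches the index of the Series (the dates) against the columns of the DataFrame (`A`, `B`, `C`). No labels overlap, so every entry is NaN. To subtract one column from every column row by row, one option is `DataFrame.sub` with `axis=0`. A minimal sketch, using a hypothetical frame `demo` that is seeded for reproducibility:

```python
import numpy as np
import pandas as pd

# Hypothetical data, seeded so the example is reproducible
rng = np.random.default_rng(0)
dates = pd.date_range('2000-01-01', periods=8)
demo = pd.DataFrame(rng.standard_normal((8, 3)), index=dates, columns=list('ABC'))

# Align the Series on the row index (axis=0) instead of on the columns
centered = demo.sub(demo['A'], axis=0)
print(centered.head())
```

Column `A` of the result is all zeros, and the other columns hold each row's difference from `A`.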


Operations with scalars work as expected:

In [62]:
df * 5 + 2

Unnamed: 0,A,B,C
2000-01-01,5.586809,-0.368902,5.633534
2000-01-02,8.745897,-2.673678,6.04298
2000-01-03,3.180263,2.504188,-7.076248
2000-01-04,2.075841,1.484719,-1.561849
2000-01-05,1.596116,10.073616,2.902391
2000-01-06,1.499799,1.236082,5.082561
2000-01-07,1.601979,1.140596,2.228017
2000-01-08,2.238395,3.138274,3.222187


In [63]:
1 / df

Unnamed: 0,A,B,C
2000-01-01,1.393997,-2.110683,1.376071
2000-01-02,0.741191,-1.069821,1.236711
2000-01-03,4.236345,9.91694,-0.550888
2000-01-04,65.927143,-9.703452,-1.403765
2000-01-05,-12.379806,0.619301,5.540834
2000-01-06,-9.995981,-6.545201,1.622028
2000-01-07,-12.562138,-5.817986,21.928234
2000-01-08,20.973586,4.392615,4.091027


In [64]:
df ** 4

Unnamed: 0,A,B,C
2000-01-01,0.2648213,0.050386,0.278893
2000-01-02,3.313437,0.763405,0.427491
2000-01-03,0.003104808,0.000103,10.857887
2000-01-04,5.293501e-08,0.000113,0.257527
2000-01-05,4.257402e-05,6.798176,0.001061
2000-01-06,0.0001001609,0.000545,0.144466
2000-01-07,4.015557e-05,0.000873,4e-06
2000-01-08,5.167842e-06,0.002686,0.00357


Boolean operators work as well:

In [65]:
df1 = pd.DataFrame({'a' : [1, 0, 1], 'b' : [0, 1, 1] }, dtype=bool)
df2 = pd.DataFrame({'a' : [0, 1, 1], 'b' : [1, 1, 0] }, dtype=bool)
df1 & df2

Unnamed: 0,a,b
0,False,False
1,False,True
2,True,False


In [66]:
df1 | df2

Unnamed: 0,a,b
0,True,True
1,True,True
2,True,True


In [67]:
df1 ^ df2

Unnamed: 0,a,b
0,True,True
1,True,False
2,False,True


In [68]:
~df1

Unnamed: 0,a,b
0,False,True
1,True,False
2,False,False
