# Pandas - Week 5

***

## Resampling time series data

### 1. Import modules

In [1]:
import pandas as pd
import numpy as np

### 2. Preparing the DataFrame

In [4]:
n_rows = 365 * 24  # 365 days of hourly rows
n_cols = 2
cols = ['col1', 'col2']

df = pd.DataFrame(np.random.randint(1, 20, size=(n_rows, n_cols)), columns=cols)

# Replace the default index with an hourly DatetimeIndex starting at 2000-01-01.
# pd.date_range is used here because pd.util.testing.makeDateIndex is deprecated.
df.index = pd.date_range('2000-01-01', periods=n_rows, freq='h')
df


| | col1 | col2 |
|---|---|---|
| 2000-01-01 00:00:00 | 14 | 8 |
| 2000-01-01 01:00:00 | 3 | 8 |
| 2000-01-01 02:00:00 | 5 | 9 |
| 2000-01-01 03:00:00 | 17 | 17 |
| 2000-01-01 04:00:00 | 2 | 1 |
| ... | ... | ... |
| 2000-12-30 19:00:00 | 3 | 1 |
| 2000-12-30 20:00:00 | 7 | 19 |
| 2000-12-30 21:00:00 | 3 | 6 |
| 2000-12-30 22:00:00 | 14 | 19 |


### 3. Resampling data at a monthly interval

In [6]:
df.resample('M')['col1'].sum().to_frame()  # to_frame() makes the output easier to read; pandas 2.2+ prefers 'ME' over 'M'

| | col1 |
|---|---|
| 2000-01-31 | 7523 |
| 2000-02-29 | 7038 |
| 2000-03-31 | 7569 |
| 2000-04-30 | 7179 |
| 2000-05-31 | 6885 |
| 2000-06-30 | 7218 |
| 2000-07-31 | 7289 |
| 2000-08-31 | 7166 |
| 2000-09-30 | 7227 |
| 2000-10-31 | 7436 |


Data can be resampled with the `resample()` method. In the example above, `col1` is resampled at a monthly interval and the hourly values within each month are summed.
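As a minimal sketch of what `resample()` does, the example below replaces the random values with a column of ones, so each monthly sum simply counts the hours in that month. The `'MS'` (month start) alias is an assumption made here for portability: it groups by the same calendar months as `'M'` but labels each group with the first day of the month, and it sidesteps the rename of `'M'` to `'ME'` in pandas 2.2.

```python
import numpy as np
import pandas as pd

# Hourly index covering 365 days, starting 2000-01-01 (as in the data above).
idx = pd.date_range("2000-01-01", periods=365 * 24, freq="h")

# Every value is 1, so a sum per month equals the number of hours in that month.
demo = pd.DataFrame({"col1": np.ones(len(idx), dtype=int)}, index=idx)

# 'MS' groups by calendar month and labels each group with the month's first day.
monthly = demo.resample("MS")["col1"].sum()

print(monthly.head(3))
```

January has 31 days, so its sum should be 31 * 24; February 2000 has 29 days because 2000 is a leap year.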

### 4. Resampling data at a daily interval

In [7]:
df.resample('D')['col1'].sum().to_frame()

| | col1 |
|---|---|
| 2000-01-01 | 217 |
| 2000-01-02 | 275 |
| 2000-01-03 | 217 |
| 2000-01-04 | 195 |
| 2000-01-05 | 213 |
| ... | ... |
| 2000-12-26 | 234 |
| 2000-12-27 | 215 |
| 2000-12-28 | 252 |
| 2000-12-29 | 255 |


To resample at a daily interval, simply change the first argument of `resample()` to `'D'`.
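Because the rule string is the only thing that changes, the same pattern extends to other aggregations as well. The sketch below uses a small deterministic series (the values 0 to 47, chosen here so the results are easy to verify by hand) and applies several aggregations per day with `agg()`.

```python
import numpy as np
import pandas as pd

# Two days of hourly data holding the values 0..47.
idx = pd.date_range("2000-01-01", periods=48, freq="h")
hourly = pd.DataFrame({"col1": np.arange(48)}, index=idx)

# One row per day, one column per aggregation.
daily = hourly.resample("D")["col1"].agg(["sum", "mean", "max"])

print(daily)
```

The first day covers the values 0 to 23 and the second day covers 24 to 47, so each row summarizes exactly 24 hourly observations.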

***

## Creating a dummy DataFrame

### 1. Import modules

In [8]:
import pandas as pd
import numpy as np

### 2. Creating a DataFrame from a dictionary

In [10]:
pd.DataFrame({'col1':[1, 2, 3, 4], 
            'col2':[5, 6, 7, 8]})

| | col1 | col2 |
|---|---|---|
| 0 | 1 | 5 |
| 1 | 2 | 6 |
| 2 | 3 | 7 |
| 3 | 4 | 8 |


### 3. Creating a DataFrame from a NumPy array

In [11]:
n_rows = 5
n_cols = 3

arr = np.random.randint(1, 20, size=(n_rows, n_cols))  # size is given as (rows, columns)
arr

array([[15, 16, 11],
       [ 6, 15,  4],
       [14,  8,  2],
       [12,  3,  9],
       [ 3, 19, 17]])

In [12]:
pd.DataFrame(arr, columns=tuple('ABC'))

| | A | B | C |
|---|---|---|---|
| 0 | 15 | 16 | 11 |
| 1 | 6 | 15 | 4 |
| 2 | 14 | 8 | 2 |
| 3 | 12 | 3 | 9 |
| 4 | 3 | 19 | 17 |
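`np.random.randint` draws new values on every run, so the output above will differ from run to run. If a repeatable dummy DataFrame is needed, one option is a seeded generator. The seed value `42` and the helper name `make_random_frame` below are illustrative choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def make_random_frame(n_rows, n_cols, seed=42):
    """Return a DataFrame of random integers in [1, 20) that is repeatable for a given seed."""
    rng = np.random.default_rng(seed)
    # size is (rows, columns), the same order used with np.random.randint above.
    arr = rng.integers(1, 20, size=(n_rows, n_cols))
    # Label columns A, B, C, ... (assumes n_cols <= 26).
    columns = [chr(ord("A") + i) for i in range(n_cols)]
    return pd.DataFrame(arr, columns=columns)


df_a = make_random_frame(5, 3)
df_b = make_random_frame(5, 3)
print(df_a.equals(df_b))  # same seed, same values
```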


### 4. Creating a DataFrame with `pandas.util.testing`

In [13]:
pd.util.testing.makeDataFrame().head()

| | A | B | C | D |
|---|---|---|---|---|
| hK7tNc2xBY | -0.668787 | 0.03769 | -1.028029 | 2.110742 |
| 6Wb8tAk7X2 | -1.410519 | 0.371529 | 1.575606 | 0.863319 |
| wGc8RT609J | -0.261254 | -0.602149 | -1.781854 | 0.703448 |
| KncOdCpvlS | -0.83406 | -0.708964 | 0.801738 | -1.457584 |
| EpAgWD0tto | 0.265344 | 0.145769 | 0.302411 | -1.927308 |


In [14]:
pd.util.testing.makeMixedDataFrame().head()

| | A | B | C | D |
|---|---|---|---|---|
| 0 | 0.0 | 0.0 | foo1 | 2009-01-01 |
| 1 | 1.0 | 1.0 | foo2 | 2009-01-02 |
| 2 | 2.0 | 0.0 | foo3 | 2009-01-05 |
| 3 | 3.0 | 1.0 | foo4 | 2009-01-06 |
| 4 | 4.0 | 0.0 | foo5 | 2009-01-07 |


In [15]:
pd.util.testing.makeTimeDataFrame().head()

| | A | B | C | D |
|---|---|---|---|---|
| 2000-01-03 | -0.413797 | 0.286358 | -0.699941 | 0.566653 |
| 2000-01-04 | 0.2198 | -1.23656 | 0.595615 | 0.622749 |
| 2000-01-05 | 2.554396 | -0.739455 | 0.253961 | -1.962896 |
| 2000-01-06 | 0.540103 | -0.126271 | 0.628143 | 0.331385 |
| 2000-01-07 | -0.218356 | -0.801479 | -0.02294 | 0.071158 |


In [17]:
pd.util.testing.makeMissingDataframe().head()

| | A | B | C | D |
|---|---|---|---|---|
| ZCj7xLdPis | -0.726671 | -0.57212 | -0.941654 | -0.681205 |
| iKzgiwXkPa | -0.118115 | -0.302482 | 0.297718 | 0.956102 |
| H54ZDeROWF | -0.58272 | -0.273962 | -0.090109 | NaN |
| as9TZJC0eq | -1.294293 | 0.359205 | 0.595162 | 1.730785 |
| TycrXplx2E | -0.610943 | 0.699148 | -0.90837 | 0.559663 |


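`pandas.util.testing` provides several helper functions for building dummy DataFrames:
* `makeDataFrame()` - creates a plain dummy DataFrame of random floats.
* `makeMixedDataFrame()` - creates a DataFrame with mixed data types.
* `makeTimeDataFrame()` - creates a DataFrame with a datetime index.
* `makeMissingDataframe()` - creates a DataFrame that contains missing values.

Note that `pandas.util.testing` has been deprecated since pandas 1.0 and was removed in pandas 2.0, so these calls only work on older versions.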
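
For environments where `pandas.util.testing` is not available, similar dummy frames can be assembled from public pandas and NumPy calls. The following is a rough sketch under that assumption. The helper names `make_time_frame` and `make_missing_frame`, the seed, and the 10% missing rate are hypothetical choices, and the helpers do not reproduce the exact output of the originals.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)


def make_time_frame(n_rows=30):
    """Random floats in columns A-D, indexed by business days from 2000-01-03."""
    index = pd.date_range("2000-01-03", periods=n_rows, freq="B")
    return pd.DataFrame(rng.standard_normal((n_rows, 4)), index=index, columns=list("ABCD"))


def make_missing_frame(n_rows=30, missing_rate=0.1):
    """Random floats in columns A-D with roughly `missing_rate` of the cells set to NaN."""
    frame = pd.DataFrame(rng.standard_normal((n_rows, 4)), columns=list("ABCD"))
    mask = rng.random(frame.shape) < missing_rate
    # DataFrame.mask replaces the cells where the condition is True with NaN.
    return frame.mask(mask)


print(make_time_frame().head())
print(make_missing_frame().isna().sum())
```

Building the frames explicitly also makes the shape, index frequency, and share of missing values easy to adjust for a particular exercise.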