### Object Series

The pandas module has two main objects: Series and DataFrame. A Series holds one-dimensional data; in other words, it has no columns, only an index.

In [None]:
import pandas as pd
import numpy as np

We can take a Python list and convert it to a Series.

In [None]:
#convert a list to a Series
data= [0.5, 1, 2, 4]
data= pd.Series(data)
data

0    0.5
1    1.0
2    2.0
3    4.0
dtype: float64

To display the values, use the `values` attribute:

In [None]:
data.values

array([0.5, 1. , 2. , 4. ])

To display the index, use the `index` attribute:

In [None]:
data.index

RangeIndex(start=0, stop=4, step=1)

We can redefine the index; this is called an explicit index. The number of index labels must match the number of values.

In [None]:
#redefine index
data = pd.Series([0.25, 0.50, 0.75, 1], index=['a', 'b', 'c', 'd'])
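The length rule above can be checked directly. The following is a minimal sketch, assuming the four values and the labels `'a'` to `'d'` used later in this notebook; the names `values`, `labelled`, and `mismatch_error` are illustrative only.

```python
import pandas as pd

values = [0.25, 0.50, 0.75, 1]

# Four values with four labels: valid.
labelled = pd.Series(values, index=['a', 'b', 'c', 'd'])

# Four values with only three labels: pandas rejects the mismatch.
try:
    pd.Series(values, index=['a', 'b', 'c'])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(labelled.index.tolist())
print(type(mismatch_error).__name__)
```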

### Selecting

To select a value, index the Series with its label.

In [None]:
#explicit index
data['a']

0.25

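The above selects data by its explicit index.

Even though we have defined an explicit index, we can still select by the implicit (positional) index. Note that recent pandas versions deprecate this positional fallback for integer keys on a non-integer index, so it may raise a warning.

In [None]:
#implicit index
data[3]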

1.0

When the explicit index is itself made of integers, an integer key is matched against the explicit index only, not against positions.

In [None]:
data_2 = pd.Series([0.25, 0.50, 0.75, 1], index=[2, 5, 3, 7])
data_2

2    0.25
5    0.50
3    0.75
7    1.00
dtype: float64

In [None]:
#explicitly called
data_2[2]

0.25

In [None]:
data_2[0]
#error: the explicit index has no label 0

KeyError: 0
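One way to avoid the `KeyError` above is to test whether a label exists before using it. This is a small sketch, not part of the original notebook; the names `has_label_2`, `has_label_0`, and `value` are illustrative.

```python
import pandas as pd

data_2 = pd.Series([0.25, 0.50, 0.75, 1], index=[2, 5, 3, 7])

# `in` tests membership in the explicit index, not in the values.
has_label_2 = 2 in data_2
has_label_0 = 0 in data_2

# Guard the lookup so a missing label does not raise KeyError.
value = data_2[0] if 0 in data_2 else None

print(has_label_2, has_label_0, value)
```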

Next, we will try slicing the data.

In [None]:
data = pd.Series([0.25, 0.50, 0.75, 1], index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

For example, we select from label b through label c.

In [None]:
data['b':'c'] #explicit index

b    0.50
c    0.75
dtype: float64

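But if we slice by the implicit index, the stop position is excluded, so only the starting element appears here. Implicit slicing follows Python's half-open range convention, while explicit (label) slicing includes both endpoints.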

In [None]:
data[1:2]

b    0.5
dtype: float64

In [None]:
data[:2]

a    0.25
b    0.50
dtype: float64
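The difference between the two slicing styles can be placed side by side. This is a minimal sketch using the same Series as above; the names `by_label` and `by_position` are illustrative.

```python
import pandas as pd

data = pd.Series([0.25, 0.50, 0.75, 1], index=['a', 'b', 'c', 'd'])

# Label slice: both endpoints 'b' and 'c' are included.
by_label = data['b':'c']

# Position slice: starts at position 1 and stops before position 2.
by_position = data[1:2]

print(by_label.index.tolist())
print(by_position.index.tolist())
```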

### loc and iloc

`loc` (location) and `iloc` (integer location)

In [None]:
import pandas as pd
data_2 = pd.Series([0.25, 0.50, 0.75, 1], index=[2, 5, 3, 7])
data_2

2    0.25
5    0.50
3    0.75
7    1.00
dtype: float64

When we select with a single integer, it is matched against the explicit index.

In [None]:
data_2[2]

0.25

When we slice from 1 to 2, the result comes from the implicit index: position 1, which holds label 5.

In [None]:
data_2[1:2] #implicit index: slicing

5    0.5
dtype: float64

This can cause inconsistency, which is why we use loc and iloc. `loc` always selects by the explicit index, while `iloc` always selects by the implicit index.

In [None]:
data_2

2    0.25
5    0.50
3    0.75
7    1.00
dtype: float64

In [None]:
#loc
data_2.loc[3] #select by explicit index

0.75

In [None]:
data_2.loc[2:3]

2    0.25
5    0.50
3    0.75
dtype: float64

In [None]:
#iloc
data_2.iloc[3] #select by implicit index

1.0

In [None]:
data_2.iloc[1:3]

5    0.50
3    0.75
dtype: float64
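As a compact recap of the cells above, the sketch below puts `loc` and `iloc` next to each other on the same Series. The variable names are illustrative; the point is that a label slice includes its stop label while a position slice excludes its stop position.

```python
import pandas as pd

data_2 = pd.Series([0.25, 0.50, 0.75, 1], index=[2, 5, 3, 7])

# Scalar lookups: loc uses the label, iloc uses the position.
label_lookup = data_2.loc[5]       # label 5
position_lookup = data_2.iloc[0]   # first element

# Slices: loc includes the stop label, iloc excludes the stop position.
label_slice = data_2.loc[5:3]      # labels 5 and 3
position_slice = data_2.iloc[1:3]  # positions 1 and 2

print(label_slice.equals(position_slice))
```

Both slices here return the same two rows, even though one is written with labels and the other with positions.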

**DataFrame**

A DataFrame is a collection of one or more Series that share the same index.

In [None]:
dict_populasi = {'Jakarta':750,
                 'Bogor':490,
                 'Depok':350,
                 'Tanggerang':270,
                 'Bekasi':670}

In [None]:
dict_populasi

{'Bekasi': 670, 'Bogor': 490, 'Depok': 350, 'Jakarta': 750, 'Tanggerang': 270}

In [None]:
#transform dictionary to series
populasi = pd.Series(dict_populasi)

In [None]:
populasi

Jakarta       750
Bogor         490
Depok         350
Tanggerang    270
Bekasi        670
dtype: int64

In [None]:
dict_luas = {'Jakarta':737,
             'Bogor':325,
             'Depok':247,
             'Tanggerang':302,
             'Bekasi':355}

In [None]:
luas = pd.Series(dict_luas)

In [None]:
luas

Jakarta       737
Bogor         325
Depok         247
Tanggerang    302
Bekasi        355
dtype: int64

In [None]:
daerah = pd.DataFrame({'pop':populasi, 'luas':luas})

In [None]:
daerah

            pop  luas
Jakarta     750   737
Bogor       490   325
Depok       350   247
Tanggerang  270   302
Bekasi      670   355
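To see that a DataFrame really is built from Series, we can pull one column back out. This is a small sketch with the same data; `column`, `is_series`, and `shares_index` are illustrative names. Note that the column is accessed with brackets, because `daerah.pop` refers to the built-in `DataFrame.pop` method rather than to the column.

```python
import pandas as pd

populasi = pd.Series({'Jakarta': 750, 'Bogor': 490, 'Depok': 350,
                      'Tanggerang': 270, 'Bekasi': 670})
luas = pd.Series({'Jakarta': 737, 'Bogor': 325, 'Depok': 247,
                  'Tanggerang': 302, 'Bekasi': 355})
daerah = pd.DataFrame({'pop': populasi, 'luas': luas})

# Each column of the DataFrame is itself a Series.
column = daerah['pop']
is_series = isinstance(column, pd.Series)

# The column carries the same index as the whole DataFrame.
shares_index = column.index.equals(daerah.index)

print(is_series, shares_index)
```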


In [None]:
#add new column

daerah['pop_per_area']=daerah['pop']/daerah['luas']

In [None]:
daerah

            pop  luas  pop_per_area
Jakarta     750   737      1.017639
Bogor       490   325      1.507692
Depok       350   247      1.417004
Tanggerang  270   302      0.894040
Bekasi      670   355      1.887324
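An alternative to assigning the column in place is `DataFrame.assign`, which returns a new DataFrame and leaves the original unchanged. This is a sketch on a two-row subset of the data, not the notebook's own method; `with_density` is an illustrative name.

```python
import pandas as pd

daerah = pd.DataFrame(
    {'pop': [750, 490], 'luas': [737, 325]},
    index=['Jakarta', 'Bogor'],
)

# assign() builds a new DataFrame with the extra column.
with_density = daerah.assign(pop_per_area=daerah['pop'] / daerah['luas'])

print(with_density.columns.tolist())
print(daerah.columns.tolist())
```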


In [None]:
#add new row

daerah_tambahan=pd.DataFrame({'Bandung':[151, 148,0.18]})

In [None]:
daerah_tambahan

   Bandung
0   151.00
1   148.00
2     0.18


In [None]:
daerah_tambahan=daerah_tambahan.T

In [None]:
daerah_tambahan

             0      1     2
Bandung  151.0  148.0  0.18


In [None]:
daerah_tambahan.columns=daerah.columns

In [None]:
daerah_tambahan

           pop   luas  pop_per_area
Bandung  151.0  148.0          0.18


In [None]:
#combine daerah and daerah_tambahan data with concat

pd.concat([daerah, daerah_tambahan])

              pop   luas  pop_per_area
Jakarta     750.0  737.0      1.017639
Bogor       490.0  325.0      1.507692
Depok       350.0  247.0      1.417004
Tanggerang  270.0  302.0      0.894040
Bekasi      670.0  355.0      1.887324
Bandung     151.0  148.0      0.180000
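For a single row, a shorter route than building, transposing, and concatenating a second DataFrame is to assign to a new label with `loc`. This is a sketch on a two-row subset, offered as an alternative rather than as the notebook's method; `extended` is an illustrative name, and the Bandung values are the ones used above.

```python
import pandas as pd

daerah = pd.DataFrame(
    {'pop': [750, 490], 'luas': [737, 325]},
    index=['Jakarta', 'Bogor'],
)
daerah['pop_per_area'] = daerah['pop'] / daerah['luas']

# Work on a copy so the original frame is left untouched.
extended = daerah.copy()

# Assigning to a label that does not exist yet appends a new row.
# The values must follow the column order: pop, luas, pop_per_area.
extended.loc['Bandung'] = [151, 148, 0.18]

print(extended.index.tolist())
```

`pd.concat` remains the better fit when several rows, or a whole second DataFrame, need to be appended at once.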
