<a href="https://colab.research.google.com/github/nurimammasri/Wooky-Pandas/blob/master/08.%20Grouping%20Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Introduction**

`Grouping` is a technique for splitting data according to some criterion. In pandas, grouping is done with the `groupby()` method. In the abstract, grouping means mapping each piece of data to a group.

Grouping involves several steps that happen in order:

1. **Splitting**: separate the data into groups based on some criterion.
2. **Applying**: perform an operation on the data in each group.
3. **Combining**: merge the results into a new data structure.
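The three steps can be written out by hand to see what `groupby()` does in a single expression. This is a minimal sketch on a hypothetical toy DataFrame (the names `toy`, `pieces`, `sums`, `manual`, and `builtin` are illustrative assumptions, not part of the notebook's data):

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
toy = pd.DataFrame({"group": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})

# 1. Splitting: one sub-DataFrame per distinct value of "group".
pieces = {key: part for key, part in toy.groupby("group")}

# 2. Applying: reduce each piece to a single number.
sums = {key: part["value"].sum() for key, part in pieces.items()}

# 3. Combining: assemble the per-group results into a new Series.
manual = pd.Series(sums, name="value").rename_axis("group")

# groupby() performs the same three steps in one expression.
builtin = toy.groupby("group")["value"].sum()

print(manual)
print(builtin)
```

Both Series hold one sum per group, which is why the shorter `groupby(...).sum()` form is normally preferred.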

```
df.groupby(by=None, axis=0, level=None, as_index: bool=True, sort: bool=True, group_keys: bool=True, squeeze: bool=False, observed: bool=False)
```
## **Parameters**

* `by` : mapping, function, label, or list of labels

    Used to determine the groups for the groupby.

    If ``by`` is a function, it's called on each value of the object's index. 
    
    If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series' values are first aligned; see ``.align()`` method). 

    If an ndarray is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in ``self``. 
    
    Notice that a tuple is interpreted as a (single) key.

* `axis` : {0 or 'index', 1 or 'columns'}, default 0

    Split along rows (0) or columns (1).

* `level` : int, level name, or sequence of such, default None

    If the axis is a MultiIndex (hierarchical), group by a particular level or levels.

* `as_index` : bool, default True

    For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively "SQL-style" grouped output.

* `sort` : bool, default True

    Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.

* `group_keys` : bool, default True

    When calling apply, add group keys to index to identify pieces.

* `squeeze` : bool, default False

    Reduce the dimensionality of the return type if possible, otherwise return a consistent type.

* `observed` : bool, default False

    This only applies if any of the groupers are Categoricals.

    If True: only show observed values for categorical groupers.

    If False: show all values for categorical groupers.
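To make two of these parameters concrete, the sketch below compares the default behaviour with `as_index=False` and `sort=False`. The DataFrame `toy` and its column names are hypothetical and chosen only for illustration:

```python
import pandas as pd

# Hypothetical toy data: the keys appear in the order b, a, b, a.
toy = pd.DataFrame({"shop": ["b", "a", "b", "a"], "sales": [10, 20, 30, 40]})

# Default (as_index=True, sort=True): group keys become the sorted index.
indexed = toy.groupby("shop")["sales"].sum()

# as_index=False: group keys stay as an ordinary column ("SQL-style" output).
flat = toy.groupby("shop", as_index=False).sum()

# sort=False: groups keep the order in which their keys first appear.
unsorted = toy.groupby("shop", sort=False)["sales"].sum()

print(indexed)
print(flat)
print(unsorted)
```

With `as_index=False` the result has a plain integer index, which is often convenient when the output is merged or exported afterwards.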

```
df.groupby('class')

df.groupby('class').mean()

df.groupby(['class1','class2']).mean()

df.groupby('class').agg('mean')

df.groupby(['class1','class2'])[['column1', 'column2']].agg(['mean', 'sum', 'std'])

# ----- custom aggregation function -----

def diff_min_max(x):
  return x.max()-x.min()

df.groupby(['class1','class2'])[['column1', 'column2']].agg(['sum', diff_min_max])

# ----- different aggregation per column -----

df.groupby(['class1','class2'])[['column1', 'column2']].agg({'column1':'sum', 'column2':'mean'})

```

In [1]:
import pandas as pd
df = pd.DataFrame()

In [2]:
df.groupby?

## Grouping Data

In [3]:
import numpy as np
import pandas as pd
df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                           ('bird', 'Psittaciformes', 24.0),
                           ('mammal', 'Carnivora', 80.2),
                           ('mammal', 'Primates', np.nan),
                           ('mammal', 'Carnivora', 58)],
                          index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                             columns=('class', 'order', 'max_speed'))

df

Unnamed: 0,class,order,max_speed
falcon,bird,Falconiformes,389.0
parrot,bird,Psittaciformes,24.0
lion,mammal,Carnivora,80.2
monkey,mammal,Primates,
leopard,mammal,Carnivora,58.0


In [4]:
df.groupby('class')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f62813ad450>

Notice that at this stage we have only reached the **splitting** step, based on the 'class' category. The result of this groupby is just a GroupBy object.

This GroupBy object represents the data grouped by class, but it produces no values until we apply an operation to each group.
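Although the GroupBy object shows no values on its own, it can be inspected. The sketch below rebuilds a reduced version of the animal table (only the `class` and `max_speed` columns; the variable names `animals` and `grouped` are assumptions for illustration) and looks inside the groups:

```python
import numpy as np
import pandas as pd

# Reduced copy of the animal table, for illustration only.
animals = pd.DataFrame(
    {"class": ["bird", "bird", "mammal", "mammal", "mammal"],
     "max_speed": [389.0, 24.0, 80.2, np.nan, 58.0]},
    index=["falcon", "parrot", "lion", "monkey", "leopard"],
)
grouped = animals.groupby("class")

print(grouped.ngroups)            # number of groups
print(grouped.size())             # number of rows in each group
print(grouped.get_group("bird"))  # the rows that belong to one group

# Iterating yields (group key, sub-DataFrame) pairs.
for name, part in grouped:
    print(name, list(part.index))
```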

Performing an operation on the grouped data is the **applying** step. Many kinds of operations can be used on the GroupBy object.

Some are built into pandas, such as mean(), min(), max(), and so on.

We can also use a custom operation, depending on what we need. Let's try one of these operations on the GroupBy object.

In [5]:
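# numeric_only=True skips the non-numeric 'order' column; pandas >= 2.0 raises an error without it
df.groupby('class').mean(numeric_only=True)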

Unnamed: 0_level_0,max_speed
class,Unnamed: 1_level_1
bird,206.5
mammal,69.1


Notice that groupby has now produced a result. The operation applied to each group is the mean.

The results are then **combined** into a new data structure. This gives us a simple analysis: animals in class bird have an average maximum speed of 206.5, and animals in class mammal 69.1.
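The mammal average deserves a closer look: the monkey row has a missing `max_speed`, and `mean()` skips missing values by default, so the result is (80.2 + 58) / 2 rather than a sum divided by three. A minimal sketch using only the three mammal speeds (the Series name `speeds` is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# The three mammal speeds from the table; the monkey's value is missing.
speeds = pd.Series([80.2, np.nan, 58.0], index=["lion", "monkey", "leopard"])

print(speeds.mean())              # NaN is skipped: average of the two known values
print(speeds.mean(skipna=False))  # NaN, because the missing value is not skipped
print(speeds.count(), speeds.size)  # non-missing values vs. total rows
```

When the number of missing values matters, comparing `count()` with `size` per group is a quick way to check how many rows actually contributed to each mean.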

....................................................................................................................................................................

In [6]:
import pandas as pd
df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                    'Max Speed': [380., 370., 24., 26.]})
df

Unnamed: 0,Animal,Max Speed
0,Falcon,380.0
1,Falcon,370.0
2,Parrot,24.0
3,Parrot,26.0


In [7]:
df.groupby(['Animal']).mean()

Unnamed: 0_level_0,Max Speed
Animal,Unnamed: 1_level_1
Falcon,375.0
Parrot,25.0


In [8]:
df.groupby('Animal').agg('mean')

Unnamed: 0_level_0,Max Speed
Animal,Unnamed: 1_level_1
Falcon,375.0
Parrot,25.0


**Hierarchical Indexes**

We can group by different levels of a hierarchical index
using the `level` parameter:

In [9]:
arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
          ['Captive', 'Wild', 'Captive', 'Wild']]
index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
df = pd.DataFrame({'Max Speed': [390., 350., 30., 20.]}, index=index)
df

Unnamed: 0_level_0,Unnamed: 1_level_0,Max Speed
Animal,Type,Unnamed: 2_level_1
Falcon,Captive,390.0
Falcon,Wild,350.0
Parrot,Captive,30.0
Parrot,Wild,20.0


In [10]:
df.groupby(level=0).mean()

Unnamed: 0_level_0,Max Speed
Animal,Unnamed: 1_level_1
Falcon,370.0
Parrot,25.0


In [11]:
df.groupby(level="Animal").mean()

Unnamed: 0_level_0,Max Speed
Animal,Unnamed: 1_level_1
Falcon,370.0
Parrot,25.0


In [12]:
df.groupby(level="Type").mean()

Unnamed: 0_level_0,Max Speed
Type,Unnamed: 1_level_1
Captive,210.0
Wild,185.0
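Grouping by an index level gives the same numbers as moving that level into a column and grouping by the column. The sketch below rebuilds the same MultiIndex table (under the assumed name `birds`) and compares the two routes:

```python
import pandas as pd

arrays = [["Falcon", "Falcon", "Parrot", "Parrot"],
          ["Captive", "Wild", "Captive", "Wild"]]
index = pd.MultiIndex.from_arrays(arrays, names=("Animal", "Type"))
birds = pd.DataFrame({"Max Speed": [390.0, 350.0, 30.0, 20.0]}, index=index)

# Route 1: group directly on the index level.
by_level = birds.groupby(level="Type")["Max Speed"].mean()

# Route 2: turn the index levels into columns, then group by the column.
by_column = birds.reset_index().groupby("Type")["Max Speed"].mean()

print(by_level)
print(by_column)
```

The `level` form avoids the intermediate `reset_index()` copy, which is why it is usually the more direct choice for data that is already indexed hierarchically.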


In [13]:
import pandas as pd
penjualan = {
    'Hari':['Hari-1']*6+['Hari-2']*6,
    'toko':['Bandung', 'Garut', 'Jakarta', 'Garut', 'Bandung', 'Jakarta', 'Bandung', 'Garut', 'Jakarta', 'Garut', 'Bandung', 'Jakarta'],
    'Barang':['Telur', 'Telur', 'Telur', 'Minyak', 'Minyak', 'Minyak', 'Telur', 'Telur', 'Telur', 'Minyak', 'Minyak', 'Minyak'],
    'Pendapatan_Kotor':[200000, 300000, 150000, 100000, 120000, 400000, 240000, 320000, 200000, 300000, 150000, 200000],
    'Pendapatan_Bersih':[80000, 100000, 50000, 50000, 50000, 200000, 120000, 150000, 120000, 100000, 110000, 90000]
}
df = pd.DataFrame(penjualan)
df

Unnamed: 0,Hari,toko,Barang,Pendapatan_Kotor,Pendapatan_Bersih
0,Hari-1,Bandung,Telur,200000,80000
1,Hari-1,Garut,Telur,300000,100000
2,Hari-1,Jakarta,Telur,150000,50000
3,Hari-1,Garut,Minyak,100000,50000
4,Hari-1,Bandung,Minyak,120000,50000
5,Hari-1,Jakarta,Minyak,400000,200000
6,Hari-2,Bandung,Telur,240000,120000
7,Hari-2,Garut,Telur,320000,150000
8,Hari-2,Jakarta,Telur,200000,120000
9,Hari-2,Garut,Minyak,300000,100000


In [14]:
# numeric_only=True keeps only the numeric columns, matching the output below
df.groupby('Hari').sum(numeric_only=True)

Unnamed: 0_level_0,Pendapatan_Kotor,Pendapatan_Bersih
Hari,Unnamed: 1_level_1,Unnamed: 2_level_1
Hari-1,1270000,530000
Hari-2,1410000,690000


In [15]:
df.groupby(['Hari', 'toko', 'Barang']).sum()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Pendapatan_Kotor,Pendapatan_Bersih
Hari,toko,Barang,Unnamed: 3_level_1,Unnamed: 4_level_1
Hari-1,Bandung,Minyak,120000,50000
Hari-1,Bandung,Telur,200000,80000
Hari-1,Garut,Minyak,100000,50000
Hari-1,Garut,Telur,300000,100000
Hari-1,Jakarta,Minyak,400000,200000
Hari-1,Jakarta,Telur,150000,50000
Hari-2,Bandung,Minyak,150000,110000
Hari-2,Bandung,Telur,240000,120000
Hari-2,Garut,Minyak,300000,100000
Hari-2,Garut,Telur,320000,150000


## **Group by Multiple Columns**

In [16]:
df

Unnamed: 0,Hari,toko,Barang,Pendapatan_Kotor,Pendapatan_Bersih
0,Hari-1,Bandung,Telur,200000,80000
1,Hari-1,Garut,Telur,300000,100000
2,Hari-1,Jakarta,Telur,150000,50000
3,Hari-1,Garut,Minyak,100000,50000
4,Hari-1,Bandung,Minyak,120000,50000
5,Hari-1,Jakarta,Minyak,400000,200000
6,Hari-2,Bandung,Telur,240000,120000
7,Hari-2,Garut,Telur,320000,150000
8,Hari-2,Jakarta,Telur,200000,120000
9,Hari-2,Garut,Minyak,300000,100000


In [17]:
df.groupby(['Hari', 'toko'])[['Pendapatan_Kotor']].mean()

Unnamed: 0_level_0,Unnamed: 1_level_0,Pendapatan_Kotor
Hari,toko,Unnamed: 2_level_1
Hari-1,Bandung,160000
Hari-1,Garut,200000
Hari-1,Jakarta,275000
Hari-2,Bandung,195000
Hari-2,Garut,310000
Hari-2,Jakarta,200000


In [18]:
df.groupby(['Hari', 'toko'])[['Pendapatan_Kotor', 'Pendapatan_Bersih']].mean()

Unnamed: 0_level_0,Unnamed: 1_level_0,Pendapatan_Kotor,Pendapatan_Bersih
Hari,toko,Unnamed: 2_level_1,Unnamed: 3_level_1
Hari-1,Bandung,160000,65000
Hari-1,Garut,200000,75000
Hari-1,Jakarta,275000,125000
Hari-2,Bandung,195000,115000
Hari-2,Garut,310000,125000
Hari-2,Jakarta,200000,105000


## **Group by Multiple Aggregation**

In [19]:
df

Unnamed: 0,Hari,toko,Barang,Pendapatan_Kotor,Pendapatan_Bersih
0,Hari-1,Bandung,Telur,200000,80000
1,Hari-1,Garut,Telur,300000,100000
2,Hari-1,Jakarta,Telur,150000,50000
3,Hari-1,Garut,Minyak,100000,50000
4,Hari-1,Bandung,Minyak,120000,50000
5,Hari-1,Jakarta,Minyak,400000,200000
6,Hari-2,Bandung,Telur,240000,120000
7,Hari-2,Garut,Telur,320000,150000
8,Hari-2,Jakarta,Telur,200000,120000
9,Hari-2,Garut,Minyak,300000,100000


In [20]:
df.groupby(['Hari', 'toko'])[['Pendapatan_Kotor', 'Pendapatan_Bersih']].agg(['mean', 'sum', 'std'])

Unnamed: 0_level_0,Unnamed: 1_level_0,Pendapatan_Kotor,Pendapatan_Kotor,Pendapatan_Kotor,Pendapatan_Bersih,Pendapatan_Bersih,Pendapatan_Bersih
Unnamed: 0_level_1,Unnamed: 1_level_1,mean,sum,std,mean,sum,std
Hari,toko,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2
Hari-1,Bandung,160000,320000,56568.542495,65000,130000,21213.203436
Hari-1,Garut,200000,400000,141421.356237,75000,150000,35355.339059
Hari-1,Jakarta,275000,550000,176776.695297,125000,250000,106066.017178
Hari-2,Bandung,195000,390000,63639.610307,115000,230000,7071.067812
Hari-2,Garut,310000,620000,14142.135624,125000,250000,35355.339059
Hari-2,Jakarta,200000,400000,0.0,105000,210000,21213.203436
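Passing a list of functions to `agg()` produces two-level column labels, as in the output above. If flat column names are easier to work with, the levels can be joined. This is a sketch on a hypothetical four-row subset of the sales data (the names `sales` and `stats` are assumptions):

```python
import pandas as pd

# Hypothetical subset of the sales table, for illustration only.
sales = pd.DataFrame({
    "toko": ["Bandung", "Bandung", "Garut", "Garut"],
    "Pendapatan_Kotor": [200000, 120000, 300000, 100000],
})

stats = sales.groupby("toko")[["Pendapatan_Kotor"]].agg(["mean", "sum"])
print(stats.columns.tolist())  # list of (column, aggregation) tuples

# Join the two column levels into single names such as "Pendapatan_Kotor_sum".
stats.columns = ["_".join(col) for col in stats.columns]
print(stats)
```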


## **Group by with Custom Function**

In [21]:
df

Unnamed: 0,Hari,toko,Barang,Pendapatan_Kotor,Pendapatan_Bersih
0,Hari-1,Bandung,Telur,200000,80000
1,Hari-1,Garut,Telur,300000,100000
2,Hari-1,Jakarta,Telur,150000,50000
3,Hari-1,Garut,Minyak,100000,50000
4,Hari-1,Bandung,Minyak,120000,50000
5,Hari-1,Jakarta,Minyak,400000,200000
6,Hari-2,Bandung,Telur,240000,120000
7,Hari-2,Garut,Telur,320000,150000
8,Hari-2,Jakarta,Telur,200000,120000
9,Hari-2,Garut,Minyak,300000,100000


In [22]:
def diff_min_max(x):
  return x.max()-x.min()

df.groupby(['Hari', 'toko'])[['Pendapatan_Kotor', 'Pendapatan_Bersih']].agg(['sum', diff_min_max])

Unnamed: 0_level_0,Unnamed: 1_level_0,Pendapatan_Kotor,Pendapatan_Kotor,Pendapatan_Bersih,Pendapatan_Bersih
Unnamed: 0_level_1,Unnamed: 1_level_1,sum,diff_min_max,sum,diff_min_max
Hari,toko,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
Hari-1,Bandung,320000,80000,130000,30000
Hari-1,Garut,400000,200000,150000,50000
Hari-1,Jakarta,550000,250000,250000,150000
Hari-2,Bandung,390000,90000,230000,10000
Hari-2,Garut,620000,20000,250000,50000
Hari-2,Jakarta,400000,0,210000,30000


## **Custom Aggregation with Dictionary**

In [23]:
df.groupby(['Hari', 'toko'])[['Pendapatan_Kotor', 'Pendapatan_Bersih']].agg({'Pendapatan_Kotor':'sum', 'Pendapatan_Bersih':'mean'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Pendapatan_Kotor,Pendapatan_Bersih
Hari,toko,Unnamed: 2_level_1,Unnamed: 3_level_1
Hari-1,Bandung,320000,65000
Hari-1,Garut,400000,75000
Hari-1,Jakarta,550000,125000
Hari-2,Bandung,390000,115000
Hari-2,Garut,620000,125000
Hari-2,Jakarta,400000,105000
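An alternative to the dictionary form is named aggregation, where each output column is given as `new_name=(column, function)`. The sketch below uses a hypothetical four-row subset of the sales data; the output names `gross_total` and `net_mean` are illustrative assumptions:

```python
import pandas as pd

# Hypothetical subset of the sales table, for illustration only.
sales = pd.DataFrame({
    "Hari": ["Hari-1", "Hari-1", "Hari-2", "Hari-2"],
    "Pendapatan_Kotor": [200000, 120000, 240000, 150000],
    "Pendapatan_Bersih": [80000, 50000, 120000, 110000],
})

# Each keyword names an output column: (source column, aggregation).
summary = sales.groupby("Hari").agg(
    gross_total=("Pendapatan_Kotor", "sum"),
    net_mean=("Pendapatan_Bersih", "mean"),
)
print(summary)
```

Compared with the dictionary form, this makes it possible to rename the result columns and to aggregate the same source column more than once.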



## **Task**

In [25]:
# '/content/drive/My Drive/Colab Notebooks/Wooky Pandas/diamonds.csv'
# import pandas as pd
# df = pd.read_csv('https://drive.google.com/uc?id=1NShkvOY-bxIulZZ8iWczg0-f8g3l1QvI')
# df

In [26]:
# '/content/drive/My Drive/Colab Notebooks/Wooky Pandas/diamonds.csv'
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/nurimammasri/Wooky-Pandas/master/Data/diamonds.csv')
df

Unnamed: 0.1,Unnamed: 0,carat,cut,color,clarity,depth,table,price,x,y,z
0,1,0.23,Ideal,E,SI2,61.5,55.0,326,3.95,3.98,2.43
1,2,0.21,Premium,E,SI1,59.8,61.0,326,3.89,3.84,2.31
2,3,0.23,Good,E,VS1,56.9,65.0,327,4.05,4.07,2.31
3,4,0.29,Premium,I,VS2,62.4,58.0,334,4.20,4.23,2.63
4,5,0.31,Good,J,SI2,63.3,58.0,335,4.34,4.35,2.75
...,...,...,...,...,...,...,...,...,...,...,...
53935,53936,0.72,Ideal,D,SI1,60.8,57.0,2757,5.75,5.76,3.50
53936,53937,0.72,Good,D,SI1,63.1,55.0,2757,5.69,5.75,3.61
53937,53938,0.70,Very Good,D,SI1,62.8,60.0,2757,5.66,5.68,3.56
53938,53939,0.86,Premium,H,SI2,61.0,58.0,2757,6.15,6.12,3.74


In [27]:
df.head()

Unnamed: 0.1,Unnamed: 0,carat,cut,color,clarity,depth,table,price,x,y,z
0,1,0.23,Ideal,E,SI2,61.5,55.0,326,3.95,3.98,2.43
1,2,0.21,Premium,E,SI1,59.8,61.0,326,3.89,3.84,2.31
2,3,0.23,Good,E,VS1,56.9,65.0,327,4.05,4.07,2.31
3,4,0.29,Premium,I,VS2,62.4,58.0,334,4.2,4.23,2.63
4,5,0.31,Good,J,SI2,63.3,58.0,335,4.34,4.35,2.75


**Columns Metadata:**

`carat`: weight of the diamond

`cut`: quality of the cut

`color`: color of the diamond

`clarity`: clarity level of the diamond

`depth`: total depth percentage of the diamond

`table`: width of the diamond's table (top facet) relative to its widest point

`price`: price of the diamond (US$)

`x`: length (mm)

`y`: width (mm)

`z`: depth (mm)

<h2>Question 1: Grouping Process</h2>

Explain the process that takes place when grouping data with the pandas <b>groupby()</b> method.

Answer:

Grouping is a technique for splitting data according to some criterion. In pandas, grouping is done with the groupby() method. In the abstract, grouping means mapping each piece of data to a group.

Grouping involves several steps that happen in order:

Splitting, Applying, Combining

When grouping, we can also apply aggregation functions such as sum, min, max, and mean (average).

<h2>Question 2: Grouping All Columns</h2>

Group the diamond data by cut (quality) and aggregate with the mean.

Then answer the following question.
- What is the average price of Premium-quality diamonds?

Expected Output:

![grouping_data01.png](https://raw.githubusercontent.com/nurimammasri/Wooky-Pandas/master/Images/grouping_data01.png)

In [28]:
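# numeric_only=True skips the text columns 'color' and 'clarity'; pandas >= 2.0 raises an error without it
df.groupby('cut').mean(numeric_only=True)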

Unnamed: 0_level_0,Unnamed: 0,carat,depth,table,price,x,y,z
cut,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Fair,24147.357764,1.046137,64.041677,59.053789,4358.757764,6.246894,6.182652,3.98277
Good,24774.931309,0.849185,62.365879,58.694639,3928.864452,5.838785,5.850744,3.639507
Ideal,29047.630736,0.702837,61.709401,55.951668,3457.54197,5.507451,5.52008,3.401448
Premium,25600.209049,0.891955,61.264673,58.746095,4584.257704,5.973887,5.944879,3.647124
Very Good,26097.313193,0.806381,61.818275,57.95615,3981.759891,5.740696,5.770026,3.559801


In [29]:
# Select the price column first so only numeric data is averaged
df.groupby('cut')['price'].mean().loc['Premium']

4584.2577042999055

Answer: **4584.2577042999055**

<h2>Question 3: Grouping Some Columns Based on Multiple Criteria</h2>

Group the data by cut (quality) and then color, in that order, and aggregate only the carat, table, and price columns using the mean.

Then answer the following questions:
- What is the average price of diamonds with cut Premium and color J?
- What is the average carat of diamonds with cut Very Good and color H?

Expected Output:

Note: the image shows only part of the full data

![grouping_data02.png](https://raw.githubusercontent.com/nurimammasri/Wooky-Pandas/master/Images/grouping_data02.png)

In [30]:
# code here
data = df.groupby(['cut','color'])[['carat', 'table', 'price']].mean()
data

Unnamed: 0_level_0,Unnamed: 1_level_0,carat,table,price
cut,color,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Fair,D,0.920123,58.969325,4291.06135
Fair,E,0.856607,59.364732,3682.3125
Fair,F,0.904712,59.453205,3827.003205
Fair,G,1.023822,58.773248,4239.254777
Fair,H,1.219175,58.69637,5135.683168
Fair,I,1.198057,59.237143,4685.445714
Fair,J,1.341176,58.917647,4975.655462
Good,D,0.744517,58.541541,3405.382175
Good,E,0.745134,58.779957,3423.644159
Good,F,0.77593,58.910891,3495.750275


In [31]:
# average price of diamonds with cut Premium and color J
data.loc[('Premium', 'J'), 'price']

6294.591584158416

In [32]:
# average carat of diamonds with cut Very Good and color H
data.loc[('Very Good', 'H'), 'carat']

0.9159484649122761

Answer:
*   Average price of diamonds with cut Premium and color J = 6294.591584158416
*   Average carat of diamonds with cut Very Good and color H = 0.9159484649122761





<h2>Question 4: Group By with Multiple Aggregations Based on Multiple Criteria</h2>

Group the data by cut (quality) and then color, in that order, then aggregate the GroupBy object with both the mean and the median.

Then answer the following question:

- What is the median price of diamonds with cut Good and color F?

Expected Output:

Note: the image shows only part of the full data

![grouping_data03.png](https://raw.githubusercontent.com/nurimammasri/Wooky-Pandas/master/Images/grouping_data03.png)

In [33]:
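# Aggregate only the numeric columns; the text column 'clarity' cannot be averaged
numeric_cols = df.select_dtypes('number').columns.tolist()
dataagg = df.groupby(['cut', 'color'])[numeric_cols].agg(['mean', 'median'])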
dataagg

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 0,Unnamed: 0,carat,carat,depth,depth,table,table,price,price,x,x,y,y,z,z
Unnamed: 0_level_1,Unnamed: 1_level_1,mean,median,mean,median,mean,median,mean,median,mean,median,mean,median,mean,median,mean,median
cut,color,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2
Fair,D,22993.699387,15921.0,0.920123,0.9,64.048466,64.8,58.969325,58.0,4291.06135,3730.0,6.018344,6.08,5.96319,6.01,3.839877,3.9
Fair,E,24044.59375,19368.0,0.856607,0.9,63.319643,64.8,59.364732,59.0,3682.3125,2956.0,5.909063,6.07,5.858214,6.02,3.722143,3.76
Fair,F,24414.75641,19651.5,0.904712,0.9,63.508013,64.8,59.453205,59.0,3827.003205,3035.0,5.990513,6.06,5.931122,5.99,3.787821,3.815
Fair,G,26670.949045,24338.0,1.023822,0.98,64.339809,65.0,58.773248,58.0,4239.254777,3057.0,6.173822,6.125,6.114076,6.04,3.963153,3.97
Fair,H,22296.056106,18202.0,1.219175,1.01,64.585149,65.1,58.69637,58.0,5135.683168,3816.0,6.579373,6.33,6.497393,6.25,4.219373,4.09
Fair,I,24912.96,20569.0,1.198057,1.01,64.220571,65.1,59.237143,58.0,4685.445714,3246.0,6.564457,6.34,6.493486,6.27,4.193486,4.08
Fair,J,22148.983193,17416.0,1.341176,1.03,64.357143,65.0,58.917647,58.0,4975.655462,3302.0,6.747311,6.49,6.675882,6.46,4.319664,4.15
Good,D,23869.592145,19696.0,0.744517,0.7,62.36571,63.4,58.541541,58.0,3405.382175,2728.5,5.620076,5.69,5.633897,5.73,3.504864,3.54
Good,E,26368.867095,26054.0,0.745134,0.7,62.203751,63.3,58.779957,58.0,3423.644159,2420.0,5.617889,5.68,5.632454,5.71,3.496066,3.48
Good,F,25879.342134,24988.0,0.77593,0.71,62.20231,63.2,58.910891,59.0,3495.750275,2647.0,5.693443,5.76,5.709659,5.79,3.544609,3.58


In [34]:
# median price of diamonds with cut Good and color F
dataagg.loc[('Good', 'F'), ('price', 'median')]


2647.0

Answer:

2647.0

<h2>Question 5: Group By with a Different Aggregation for Specific Columns</h2>

Group the data by cut (quality), then aggregate with max on the price column and min on the carat column.

Then answer the following question:

- What is the maximum price of Ideal-quality diamonds?

Expected Output:

![grouping_data04.png](https://raw.githubusercontent.com/nurimammasri/Wooky-Pandas/master/Images/grouping_data04.png)

In [35]:
# code here
dataagg2 = df.groupby('cut')[['price', 'carat']].agg({'price':'max', 'carat':'min'})
dataagg2

Unnamed: 0_level_0,price,carat
cut,Unnamed: 1_level_1,Unnamed: 2_level_1
Fair,18574,0.22
Good,18788,0.23
Ideal,18806,0.2
Premium,18823,0.2
Very Good,18818,0.2


In [36]:
# maximum price of Ideal-quality diamonds
dataagg2.loc['Ideal', 'price']

18806

Answer:

18806

<h2>Question 6: Group By with a Custom Function</h2>

Group the data by cut (quality), then aggregate with the standard deviation on the carat column and the range (max minus min) on the price column.

Then answer the following question:
- What is the difference between the highest and lowest price of Premium-quality diamonds?

Expected Output:

![grouping_data05.png](https://raw.githubusercontent.com/nurimammasri/Wooky-Pandas/master/Images/grouping_data05.png)

In [37]:
# code here
def intrvl(x):
    # Range of a group: largest value minus smallest value
    return x.max() - x.min()

dataagg3 = df.groupby('cut')[['price', 'carat']].agg({'price': intrvl, 'carat': 'std'})
dataagg3

Unnamed: 0_level_0,price,carat
cut,Unnamed: 1_level_1,Unnamed: 2_level_1
Fair,18237,0.516404
Good,18461,0.454054
Ideal,18480,0.432876
Premium,18497,0.515262
Very Good,18482,0.459435


In [38]:
# difference between the highest and lowest price of Premium-quality diamonds
dataagg3.loc['Premium', 'price']

18497

Answer:

18497