****Penerapan Algoritma _Clustering_ untuk Pengelompokkan Saham IDX Berdasarkan Indikator-indikator Fundamental - Submission Machine Learning Terapan Dicoding**** 

oleh: Fikri Septrian Anggara (fikri_anggara_2c3r)

### Import library yang diperlukan

In [2]:
# untuk pengolahan data
import pandas as pd

# untuk visualisasi data
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()

# untuk pembangunan klaster
from sklearn.cluster import KMeans
from sklearn.cluster import AgglomerativeClustering

# untuk visualisasi klaster
from scipy.cluster.hierarchy import dendrogram, linkage

# untuk scaling data
from sklearn.preprocessing import MinMaxScaler

# untuk menghitung jarak
from scipy.spatial.distance import pdist, cdist

# untuk menghitung evaluasi klaster
from sklearn.metrics import silhouette_score

# untuk melakukan reduksi dimensi
from sklearn.decomposition import PCA

# 2. Data Understanding

### 2.1. Menyiapkan Dataset
dataset diperoleh dari kaggle, terdapat dua dataset yang digunakan yaitu:
 - [financial statement idx stocks](https://www.kaggle.com/datasets/greegtitan/financial-statement-idx-stocks?resource=download) (kaggle). terakhir diupdate pada Oktober 2022.
 - [daftar saham](https://www.kaggle.com/datasets/muamkh/ihsgstockdata?select=DaftarSaham.csv) (kaggle). terakhir diupdate pada Januari 2023

In [3]:
# load data
stockQuarter = pd.read_csv('data/quarter.csv')
masterStock = pd.read_csv('data/DaftarSaham.csv')

### 2.2. Overview Data
pada tahap ini dilakukan _overview_ pada data stockQuarter dan masterStock

In [4]:
# banyaknya baris dan kolom (baris, kolom)
print(stockQuarter.shape)
print(masterStock.shape)

(208691, 8)
(829, 14)


In [5]:
# kolom dan tipe data
stockQuarter.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 208691 entries, 0 to 208690
Data columns (total 8 columns):
 #   Column      Non-Null Count   Dtype  
---  ------      --------------   -----  
 0   symbol      208691 non-null  object 
 1   account     208691 non-null  object 
 2   type        208149 non-null  object 
 3   2021-09-30  51722 non-null   float64
 4   2021-12-31  51538 non-null   float64
 5   2022-03-31  53449 non-null   float64
 6   2022-06-30  50375 non-null   float64
 7   2022-09-30  17013 non-null   float64
dtypes: float64(5), object(3)
memory usage: 12.7+ MB


In [6]:
# kolom dan tipe data
masterStock.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 829 entries, 0 to 828
Data columns (total 14 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Code                829 non-null    object 
 1   Name                829 non-null    object 
 2   ListingDate         829 non-null    object 
 3   Shares              829 non-null    float64
 4   ListingBoard        829 non-null    object 
 5   Sector              829 non-null    object 
 6   LastPrice           824 non-null    float64
 7   MarketCap           824 non-null    float64
 8   MinutesFirstAdded   787 non-null    object 
 9   MinutesLastUpdated  787 non-null    object 
 10  HourlyFirstAdded    806 non-null    object 
 11  HourlyLastUpdated   806 non-null    object 
 12  DailyFirstAdded     824 non-null    object 
 13  DailyLastUpdated    824 non-null    object 
dtypes: float64(3), object(11)
memory usage: 90.8+ KB


In [7]:
# ambil sampel random sebanyak 5 data
stockQuarter.sample(n=5, random_state=1)


Unnamed: 0,symbol,account,type,2021-09-30,2021-12-31,2022-03-31,2022-06-30,2022-09-30
181194,SQMI,CF,Cash From Discontinued Investing Activities,,,,,
137781,MSKY,CF,Paymentson Behalfof Employees,-84213000000.0,-77478000000.0,-62499000000.0,-66432000000.0,
22706,BBHI,BS,Long Term Debt And Capital Lease Obligation,4114605000.0,3291684000.0,0.0,45750370000.0,41356130000.0
143195,NETV,IS,Credit Card,,,,,
184286,SUPR,BS,Securities And Investments,,,,,


In [8]:
stockQuarter.isnull().sum()

symbol             0
account            0
type             542
2021-09-30    156969
2021-12-31    157153
2022-03-31    155242
2022-06-30    158316
2022-09-30    191678
dtype: int64

- terdapat banyak ****null value**** pada data saham perkuarter

In [10]:
# ambil sampel random sebanyak 5 data
masterStock.sample(n=5, random_state=1)

Unnamed: 0,Code,Name,ListingDate,Shares,ListingBoard,Sector,LastPrice,MarketCap,MinutesFirstAdded,MinutesLastUpdated,HourlyFirstAdded,HourlyLastUpdated,DailyFirstAdded,DailyLastUpdated
370,INTA,Intraco Penta Tbk.,1993-08-23,3343935000.0,Pengembangan,Industrials,74.0,247451200000.0,2021-11-01 09:00:00,2022-06-03 15:59:00,2020-04-16 09:00:00,2022-06-03 16:00:00,2001-04-16,2023-01-06
449,LION,Lion Metal Works Tbk.,1993-08-20,520160000.0,Utama,Industrials,735.0,382317600000.0,2021-11-01 09:00:00,2023-01-06 15:59:00,2020-04-16 09:00:00,2023-01-06 15:00:00,2001-04-16,2023-01-06
810,WINR,Winner Nusantara Jaya Tbk.,2022-04-25,5235200000.0,Utama,Properties & Real Estate,50.0,261760000000.0,2022-04-25 09:00:00,2023-01-06 15:59:00,2022-04-25 09:00:00,2023-01-06 15:00:00,2022-04-25,2023-01-06
578,PDPP,Primadaya Plastisindo Tbk.,2022-11-09 00:00:00,2500000000.0,Pengembangan,Basic Materials,202.0,505000000000.0,2022-11-09 09:00:00,2023-01-06 15:59:00,2022-11-09 09:00:00,2023-01-06 15:00:00,2022-11-10,2023-01-06
700,SMRU,SMR Utama Tbk.,2011-10-10,12499390000.0,Pengembangan,Energy,50.0,624969300000.0,,,,,2011-10-10,2023-01-06


In [11]:
# 5 number summary stock quarter
stockQuarter.describe(include='all').T

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
symbol,208691.0,544.0,AALI,391.0,,,,,,,
account,208691.0,3.0,BS,73461.0,,,,,,,
type,208149.0,388.0,Interest Received,1080.0,,,,,,,
2021-09-30,51722.0,,,,1918202605901.322,24798786684018.977,-175313000000000.0,0.0,5837091260.0,233651664187.5,1637950171000000.0
2021-12-31,51538.0,,,,2042450921973.7883,26106014954333.766,-438813037000000.0,0.0,7402772575.0,257243562462.0,1725611128000000.0
2022-03-31,53449.0,,,,1940242135220.246,25585023607145.355,-223695701000000.0,0.0,5526650874.0,214030861495.0,1734074740000000.0
2022-06-30,50375.0,,,,2139261720062.0688,26835599887835.516,-194169000000000.0,0.0,7726820540.0,265889690000.0,1785706841000000.0
2022-09-30,17013.0,,,,3359261415683.1665,33720778198892.17,-197134246000000.0,79886.0,20220521000.0,557420000000.0,1839336498000000.0


- terdapat **544** buah saham
- terdapat **388** variabel pada laporan keuangan
- terdapat **3** kategori akun yaitu balance sheet, cash-flow dan income statement

In [12]:
# 5 number summary master stock
masterStock.describe(include='all').T

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
Code,829.0,829.0,AALI,1.0,,,,,,,
Name,829.0,829.0,Astra Agro Lestari Tbk.,1.0,,,,,,,
ListingDate,829.0,713.0,2022-11-08 00:00:00,6.0,,,,,,,
Shares,829.0,,,,10946944360.291918,45395931792.26726,3600000.0,1230839821.0,3128090000.0,9327465018.0,1184363929502.0
ListingBoard,829.0,3.0,Pengembangan,422.0,,,,,,,
Sector,829.0,11.0,Consumer Cyclicals,142.0,,,,,,,
LastPrice,824.0,,,,1363.453883,3733.432446,25.0,101.75,287.0,975.0,38000.0
MarketCap,824.0,,,,10076631834955.723,50531421356601.73,9000000000.0,297528301025.0,1111304868160.0,4236952973400.0,1012951085850000.0
MinutesFirstAdded,787.0,60.0,2021-11-01 09:00:00,698.0,,,,,,,
MinutesLastUpdated,787.0,18.0,2023-01-06 15:59:00,759.0,,,,,,,


- Terdapat 11 sektor pada master stok dengan **Consumer Cyclicals** adalah sektor yang paling banyak emitennya
- Terdapat 829 buah emiten

In [13]:
# variabel pada laporan keuangan
pd.DataFrame(pd.unique(stockQuarter['type']))

Unnamed: 0,0
0,Long Term Equity Investment
1,Additional Paid In Capital
2,Long Term Debt
3,Ordinary Shares Number
4,Total Assets
...,...
384,Interest Income From Deposits
385,Credit Losses Provision
386,Other Interest Income
387,Interest Expense For Federal Funds Sold And Se...


_dataframe_ **stockQuarter** memiliki 208691 baris dan 8 kolom. ke delapan kolom yaitu :
 - **symbol**: Kode saham IDX seperti BBRI, BBCA, BMRI, dst.
 - **account**: Akun laporan keuangan. nilainya meliputi **BS** untuk _Balance Sheet_, **IS** untuk akun _Income Statement_, dan **CF** untuk akun _Cash Flow_.
 - **type**: Tipe/variabel laporan keuangan seperti data total aset, kredit, dividen yang dibayarkan, dst. memiliki 388 tipe.
 - Kolom data variabel laporan keuangan perkuarter. meliputi tanggal **2021-09-30**, **2021-12-31**, **2022-03-31**, **2022-06-30**, **2022-09-30**.
 
_dataframe_ **masterStock** memiliki 829 baris dan 14 kolom. ke empat belas kolom tersebut yaitu:
 - **Code**: Kode saham IDX
 - **Name**: Nama saham
 - **ListingDate**: Tanggal pendaftaran saham
 - **Shares**: Total saham beredar
 - **ListingBoard**: Tingkat pasar saham, meliputi tingkat akselerasi, pengembangan dan utama
 - **Sector**: Sektor perusahaan
 - **LastPrice**: Harga terakhir saham
 - **MarketCap**: Total nilai perusahaan
 - **MinutesFirstAdded**: Menit pertama data ditambahkan
 - **MinutesLastUpdated**: Menit terakhir data diperbarui
 - **HourlyFirstAdded**: Jam pertama data ditambahkan
 - **HourlyLastUpdated**: Jam terakhir data diperbarui
 - **DailyFirstAdded**: Tanggal Pertama data ditambahkan
 - **DailyLastUpdated**: Tanggal Terakhir data diperbarui

terdapat 389 variabel pada laporan keuangan

Berdasarkan _Overview Data_, diketahui :
- Terdapat banyak **null value** pada data saham perkuarter
- Terdapat **544** buah saham yang tercatat laporan keuangannya
- Terdapat **388** variabel pada laporan keuangan
- Terdapat **3** kategori akun yaitu balance sheet, cash-flow dan income statement
- Terdapat **11** sektor pada master stok dengan **Consumer Cyclicals** adalah sektor yang paling banyak emitennya
- Terdapat total **829** buah saham
- Terdapat **389** variabel pada laporan keuangan


# 3. Data Preparation

Data stockQuarter masih belum memiliki struktur yang bisa digunakan untuk pembuatan model dan belum digabung dengan masterStock untuk memperoleh data sektor. 
Maka pada ada tahap _Data Preparation_, penulis :
1. merubah struktur data stockQuarter agar cocok untuk pembangunan model klaster
2. menggabungkan data stockQuarter dan masterStock
3. melakukan feature engineering untuk memperoleh indikator fundamental yang digunakan pada paper Examining the effectiveness of fundamental analysis
in a long-term stock portfolio
4. menghilangkan fitur yang tidak digunakan
5. melakukan analisis data eksploratif
6. melakukan imputasi pada fitur hasil feature engineering
7. melakukan reduksi dimensi 

karena data stockQuarter terbaru (2022-09-30) memiliki paling banyak _null value_, maka penulis menggunakan data kuarter sebelumnya, yaitu data kuarter kedua tahun 2022.

In [14]:
# buat kolom baru, gabungan antara account dan type untuk menyimpan nilai tipe akun
stockQuarter['account_type'] = stockQuarter['account']+'_'+stockQuarter['type']
stockQuarter.head()

Unnamed: 0,symbol,account,type,2021-09-30,2021-12-31,2022-03-31,2022-06-30,2022-09-30,account_type
0,AALI,BS,Long Term Equity Investment,323520000000.0,330904000000.0,327580000000.0,285069000000.0,471463000000.0,BS_Long Term Equity Investment
1,AALI,BS,Additional Paid In Capital,3878995000000.0,3878995000000.0,3878995000000.0,3878995000000.0,3878995000000.0,BS_Additional Paid In Capital
2,AALI,BS,Long Term Debt,5709887000000.0,2131944000000.0,2144732000000.0,2220370000000.0,3281008000000.0,BS_Long Term Debt
3,AALI,BS,Ordinary Shares Number,1924688000.0,1924688000.0,1924688000.0,1924688000.0,1924688000.0,BS_Ordinary Shares Number
4,AALI,BS,Total Assets,29694010000000.0,30399910000000.0,31232780000000.0,30233990000000.0,32638650000000.0,BS_Total Assets


### 3.1 Mengubah Struktur Data

In [15]:
# ambil data 2022 kuarter dua saja
data2022 = stockQuarter[['symbol', 'account_type', '2022-06-30']].copy()

# reshaping data, kolom" data yang baru merupakan value dari 'type' dari df yang lama
dataReshaped = pd.DataFrame(data2022.pivot_table(
    index='symbol',
    columns='account_type',
    values='2022-06-30'
).reset_index())
dataReshaped.sample(n=5, random_state=1)

account_type,symbol,BS_Accounts Payable,BS_Accounts Receivable,BS_Accumulated Depreciation,BS_Additional Paid In Capital,BS_Allowance For Doubtful Accounts Receivable,BS_Allowance For Loans And Lease Losses,BS_Assets Held For Sale,BS_Available For Sale Securities,BS_Buildings And Improvements,...,IS_Selling General And Administration,IS_Special Income Charges,IS_Tax Effect Of Unusual Items,IS_Tax Provision,IS_Tax Rate For Calcs,IS_Total Premiums Earned,IS_Total Revenue,IS_Total Unusual Items,IS_Total Unusual Items Excluding Goodwill,IS_Write Off
306,META,21693200000.0,26296710000.0,-155059700000.0,469155800000.0,-87700000.0,,,,130187000000.0,...,9808912000.0,,0.0,10885950000.0,0.0,,204358000000.0,,,
342,NFCX,233929400000.0,199382100000.0,-52618320000.0,283429000000.0,-320225284.0,,,33590860000.0,141863300000.0,...,6519324000.0,,0.0,1711056000.0,0.0,,2428054000000.0,,,
47,BACA,40884000000.0,,-306580000000.0,8488000000.0,,10732000000.0,,2928573000000.0,407267000000.0,...,21620000000.0,,0.0,957000000.0,0.0,,119528000000.0,,,
67,BEBS,4705185000.0,83772030000.0,-11487690000.0,91904750000.0,-357552323.0,,,,38580190000.0,...,2794760000.0,,0.0,10363650000.0,0.0,,160796400000.0,,,
376,PRIM,27165110000.0,67008290000.0,-145382100000.0,461057000000.0,,,,,419849500000.0,...,5785167000.0,,0.0,1650462000.0,0.0,,67972550000.0,,,


In [16]:
# saham yang tersedia
pd.DataFrame(pd.unique(dataReshaped['symbol']))

Unnamed: 0,0
0,AALI
1,ABBA
2,ABDA
3,ABMM
4,ACES
...,...
498,WSKT
499,WTON
500,YPAS
501,ZBRA


In [17]:
# cek null value dari masing masing kolom
nulltable = pd.DataFrame(dataReshaped.isnull().sum().reset_index().iloc[1:])
nulltable.columns = ['financial statement', 'sum of null']
print(nulltable)
print('total null: ', dataReshaped.isnull().sum().sum())

                               financial statement  sum of null
1                              BS_Accounts Payable           17
2                           BS_Accounts Receivable           70
3                      BS_Accumulated Depreciation           10
4                    BS_Additional Paid In Capital            5
5    BS_Allowance For Doubtful Accounts Receivable          162
..                                             ...          ...
222                       IS_Total Premiums Earned          482
223                               IS_Total Revenue           14
224                         IS_Total Unusual Items          182
225      IS_Total Unusual Items Excluding Goodwill          182
226                                   IS_Write Off          398

[226 rows x 2 columns]
total null:  63737


In [18]:
# 5 data teratas
dataReshaped.head()

account_type,symbol,BS_Accounts Payable,BS_Accounts Receivable,BS_Accumulated Depreciation,BS_Additional Paid In Capital,BS_Allowance For Doubtful Accounts Receivable,BS_Allowance For Loans And Lease Losses,BS_Assets Held For Sale,BS_Available For Sale Securities,BS_Buildings And Improvements,...,IS_Selling General And Administration,IS_Special Income Charges,IS_Tax Effect Of Unusual Items,IS_Tax Provision,IS_Tax Rate For Calcs,IS_Total Premiums Earned,IS_Total Revenue,IS_Total Unusual Items,IS_Total Unusual Items Excluding Goodwill,IS_Write Off
0,AALI,825934000000.0,430528000000.0,-12762150000000.0,3878995000000.0,-25539000000.0,,,,4876408000000.0,...,181832000000.0,2423000000.0,797509600.0,162846000000.0,0.0,,4383457000000.0,2423000000.0,2423000000.0,
1,ABBA,45659250000.0,19274530000.0,-161680700000.0,-45553710000.0,-63646410000.0,,,165986700000.0,23617470000.0,...,4808846000.0,0.0,0.0,335640100.0,0.0,,32701530000.0,0.0,0.0,0.0
2,ABDA,17261380000.0,74720960000.0,-105139400000.0,8109426000.0,,,,-2732850000.0,91033650000.0,...,25730740000.0,5790000.0,2316000.0,-735207000.0,0.0,157010000000.0,193318800000.0,5790000.0,5790000.0,
3,ABMM,161779200.0,199422500.0,-737851600.0,115087200.0,-10312220.0,,,23463080.0,86296130.0,...,9630067.0,,0.0,25695920.0,0.0,,365508400.0,,,
4,ACES,172883100000.0,76683450000.0,-1918547000000.0,440574900000.0,5312285.0,,,,2333466000000.0,...,87514840000.0,-6539966000.0,-1060284000.0,17408370000.0,0.0,,1681350000000.0,-6539966000.0,-6539966000.0,-5312285.0


### 3.2. Menggabungkan Data 

In [21]:
# filter kolom master data
filteredMasterStock = masterStock[['Code', 'Name', 'Shares', 'Sector', 'LastPrice']]

# ubah nama kolom agar bisa dijoin
renamedData = dataReshaped.rename(columns={'symbol':'Code'})

# join data stockQuarter dan masterStock
joinedData = pd.merge(left=renamedData, right=filteredMasterStock, on='Code', how='inner')

In [28]:
joinedData

Unnamed: 0,Code,BS_Accounts Payable,BS_Accounts Receivable,BS_Accumulated Depreciation,BS_Additional Paid In Capital,BS_Allowance For Doubtful Accounts Receivable,BS_Allowance For Loans And Lease Losses,BS_Assets Held For Sale,BS_Available For Sale Securities,BS_Buildings And Improvements,...,IS_Total Revenue,IS_Total Unusual Items,IS_Total Unusual Items Excluding Goodwill,IS_Write Off,Name,Shares,Sector,LastPrice,Net_Profit_Margin,Debt_to_Equity_Ratio
0,AALI,8.259340e+11,4.305280e+11,-1.276215e+13,3.878995e+12,-2.553900e+10,,,,4.876408e+12,...,4.383457e+12,2.423000e+09,2.423000e+09,,Astra Agro Lestari Tbk.,1.924688e+09,Consumer Non-Cyclicals,8000.0,0.074339,2.675966e+13
1,ABBA,4.565925e+10,1.927453e+10,-1.616807e+11,-4.555371e+10,-6.364641e+10,,,1.659867e+11,2.361747e+10,...,3.270153e+10,0.000000e+00,0.000000e+00,0.000000e+00,Mahaka Media Tbk.,3.935893e+09,Consumer Cyclicals,142.0,-0.423696,3.067343e+11
2,ABDA,1.726138e+10,7.472096e+10,-1.051394e+11,8.109426e+09,,,,-2.732850e+09,9.103365e+10,...,1.933188e+11,5.790000e+06,5.790000e+06,,Asuransi Bina Dana Arta Tbk.,6.208067e+08,Financials,6700.0,0.061643,
3,ABMM,1.617792e+08,1.994225e+08,-7.378516e+08,1.150872e+08,-1.031222e+07,,,2.346308e+07,8.629613e+07,...,3.655084e+08,,,,ABM Investama Tbk.,2.753165e+09,Industrials,3050.0,0.174272,8.540152e+08
4,ACES,1.728831e+11,7.668345e+10,-1.918547e+12,4.405749e+11,5.312285e+06,,,,2.333466e+12,...,1.681350e+12,-6.539966e+09,-6.539966e+09,-5.312285e+06,Ace Hardware Indonesia Tbk.,1.715000e+10,Consumer Cyclicals,490.0,0.052873,6.149952e+12
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
498,WSKT,7.007696e+12,1.501147e+13,-4.096478e+12,1.379202e+13,-2.903400e+12,,,8.829152e+11,1.591137e+12,...,3.342124e+12,2.463070e+10,2.463070e+10,,Waskita Karya (Persero) Tbk.,2.880681e+10,Infrastructures,350.0,0.177768,7.156834e+13
499,WTON,2.239787e+12,1.119072e+12,-1.791698e+12,9.886334e+11,-9.098793e+10,,,,3.351810e+11,...,1.079902e+12,-5.005578e+09,-5.005578e+09,5.005578e+09,Wijaya Karya Beton Tbk.,8.715467e+09,Basic Materials,184.0,0.039546,4.595704e+12
500,YPAS,1.161044e+10,8.208517e+10,-1.646578e+11,2.805402e+10,-8.574100e+08,,,,6.509308e+10,...,8.282857e+10,0.000000e+00,0.000000e+00,,Yanaprima Hastapersada Tbk,6.680001e+08,Basic Materials,650.0,-0.066328,2.643906e+11
501,ZBRA,5.400567e+11,6.963487e+11,-3.866211e+11,1.167524e+12,-3.877312e+09,,,,1.430180e+11,...,6.813522e+11,-1.928984e+08,-1.928984e+08,2.563082e+08,Dosni Roha Indonesia Tbk.,2.510706e+09,Industrials,540.0,-0.053200,2.496182e+12


### 3.3. _Feature Engineering_ indikator fundamental
berdasarkan paper [[5]](fdsafdsavfdsa), terdapat 5 indikator keuangan yang mampu mewakili indikator lain, yaitu:
- **Net Profit Margin**: perbandingan antara net profit/income dengan total revenue. Melihat apakah pengelolaan perusahaan menghasilkan cukup laba dan apakah biaya operasional dan apakah terdapat biaya yang berlebihan.
- **Debt to equity ratio (D/E)** : perbandingan antara total kewajiban (_liabilities_) dengan ekuitas pemegang saham (_shareholder equity_)
- **Current Ratio**: perbandingan antara aset yang dimiliki dengan kewajiban. menunjukkan kemampuan perusahaan melunasi utang jangka pendek dengan aset lancarnya.
- **Earning per share (EPS)**: perbandingan laba/profit (net income) setelah dikurangi pajak dengan jumlah saham yang beredar (outstanding shares). Digunakan untuk melihat profitabilitas perusahaan. Outstanding Shares diperoleh dari Share Issued - Treasury Shares Number.
- **P/E Ratio**: perbandingan antara harga perlembar saham dengan laba tahunan perlembar (EPS). Untuk membandingkan nilai relatif antar perusahaan.

In [29]:
# Net profit margin = net income / total revenue
joinedData['Net_Profit_Margin'] = joinedData['IS_Net Income']/joinedData['IS_Total Revenue']

# debt to equity ratio (D/E)= total debt / stockholder equity
joinedData['Debt_to_Equity_Ratio'] = joinedData['BS_Total Debt']+joinedData['BS_Stockholders Equity']

# Current Ratio = current Assets (cash dan cash equivalents, accounts receivables, Available For Sale Securities)
# / Current liability ( di kasus ini hanya account payable, Current Notes Payable, Income Tax Payable, Trading Liabilities)
joinedData['Curent_Ratio'] = (joinedData['BS_Cash And Cash Equivalents']+joinedData['BS_Accounts Receivable']+joinedData['BS_Available For Sale Securities'])/(joinedData['BS_Accounts Payable']+joinedData['BS_Current Notes Payable']+joinedData['BS_Income Tax Payable']+joinedData['BS_Trading Liabilities'])

# EPS = Net Income / (share issued - treasury shares number)
joinedData['Earning_per_Shares'] = joinedData['IS_Net Income']/(joinedData['BS_Share Issued']-joinedData['BS_Treasury Shares Number'])

# P/E ratio = share price/EPS
joinedData['Price_to_EPS'] = joinedData['LastPrice']/joinedData['Earning_per_Shares']


KeyError: 'BS_Trading Liabilities'

In [20]:
# usedColumns = [k for k in dataReshaped.columns.tolist() if 'BS_' in k][:15]


In [22]:
# dataFinal = dataReshaped[usedColumns]
# dataFinal

account_type,BS_Accounts Payable,BS_Accounts Receivable,BS_Accumulated Depreciation,BS_Additional Paid In Capital,BS_Allowance For Doubtful Accounts Receivable,BS_Allowance For Loans And Lease Losses,BS_Assets Held For Sale,BS_Available For Sale Securities,BS_Buildings And Improvements,BS_Capital Lease Obligations,BS_Capital Stock,BS_Cash And Cash Equivalents,BS_Cash And Due From Banks,BS_Cash Cash Equivalents And Federal Funds Sold,BS_Cash Equivalents
0,8.259340e+11,4.305280e+11,-1.276215e+13,3.878995e+12,-2.553900e+10,,,,4.876408e+12,,9.623440e+11,1.956801e+12,,,1.761361e+12
1,4.565925e+10,1.927453e+10,-1.616807e+11,-4.555371e+10,-6.364641e+10,,,1.659867e+11,2.361747e+10,6.395203e+09,3.935893e+11,1.694638e+11,,,5.499452e+10
2,1.726138e+10,7.472096e+10,-1.051394e+11,8.109426e+09,,,,-2.732850e+09,9.103365e+10,,1.933167e+11,1.163526e+11,,,
3,1.617792e+08,1.994225e+08,-7.378516e+08,1.150872e+08,-1.031222e+07,,,2.346308e+07,8.629613e+07,6.676648e+07,1.465549e+08,2.917540e+08,,,5.879837e+07
4,1.728831e+11,7.668345e+10,-1.918547e+12,4.405749e+11,5.312285e+06,,,,2.333466e+12,7.785399e+11,1.715000e+11,2.475947e+12,,,3.473473e+11
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
498,7.007696e+12,1.501147e+13,-4.096478e+12,1.379202e+13,-2.903400e+12,,,8.829152e+11,1.591137e+12,5.338342e+10,2.880681e+12,1.110274e+13,,,4.580050e+11
499,2.239787e+12,1.119072e+12,-1.791698e+12,9.886334e+11,-9.098793e+10,,,,3.351810e+11,5.064611e+10,8.715467e+11,7.552673e+11,,,2.620000e+11
500,1.161044e+10,8.208517e+10,-1.646578e+11,2.805402e+10,-8.574100e+08,,,,6.509308e+10,1.123992e+08,6.680001e+10,8.831136e+08,,,
501,5.400567e+11,6.963487e+11,-3.866211e+11,1.167524e+12,-3.877312e+09,,,,1.430180e+11,2.815831e+10,2.671036e+11,2.758749e+10,,,
