## Figma Link

**The dummy data creation flow for the KasBaik application is shown in [this Figma file](https://www.figma.com/file/pO2JAHoveQKoLUiSn5HHOQ/Schema-Generating-Data-for-KasBaik-Apps?node-id=5%3A2).**

### Import Libraries

In [1]:
import pandas as pd
import numpy as np

## Step 1

*Prepare the necessary data*

![](https://raw.githubusercontent.com/imamfirdaus-if/kasbaik/main/machine-learning/assets/scoring%20equation%20-%20step%201.png?token=GHSAT0AAAAAABSMKRBXQ4D4HXHX4A3FSV52YVIDOFA)

### Open Dataset

In [4]:
with open("./Dummy Datasets FICO Scoring - V4 - Clean Data.csv", 'r') as csvfile:
    print(f"First line (header) looks like this:\n\n{csvfile.readline()}")
    print(f"Each data point looks like this:\n\n{csvfile.readline()}")

First line (header) looks like this:

usia,pinjaman,tenor,pemasukan,tanggungan,pekerjaan,pinjaman ke-,telat bayar,donasi

Each data point looks like this:

25,500000,20,4800000,0,Buruh,4,0,8



In [6]:
data = pd.read_csv('./Dummy Datasets FICO Scoring - V4 - Clean Data.csv', delimiter=',')
data.head()

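| | usia | pinjaman | tenor | pemasukan | tanggungan | pekerjaan | pinjaman ke- | telat bayar | donasi |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | 500000 | 20 | 4800000 | 0 | Buruh | 4 | 0 | 8 |
| 1 | 70 | 3000000 | 4 | 1500000 | 5 | Pedagang | 1 | 6 | 0 |
| 2 | 95 | 1600000 | 4 | 1500000 | 3 | Pedagang | 3 | 3 | 6 |
| 3 | 37 | 1000000 | 12 | 5000000 | 3 | Pekerja Lepas | 2 | 1 | 5 |
| 4 | 26 | 1900000 | 6 | 4400000 | 2 | Guru/Dosen | 2 | 4 | 1 |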


## Step 2

*Convert Data Into Categories*

![](https://raw.githubusercontent.com/imamfirdaus-if/kasbaik/main/machine-learning/assets/scoring%20equation%20-%20step%202.png?token=GHSAT0AAAAAABSMKRBXLO3JASM7NP6OMXJEYVIDO4Q)

### Convert Categorical to Numeric Value

In [7]:
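def Cat_to_Num(features):
    """Replace each categorical column of `data` with integer codes, in place."""
    for feature in features:
        # np.unique returns the distinct labels in sorted order,
        # so the codes follow that order.
        feature_list = list(np.unique(data[feature]))
        feature_dict = {label: code for code, label in enumerate(feature_list)}
        data.replace({feature: feature_dict}, inplace=True)
        print(feature, '-->', feature_dict)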

In [8]:
categorical_features = ['pekerjaan']
Cat_to_Num(categorical_features)

pekerjaan --> {'Buruh': 0, 'Guru/Dosen': 1, 'Pedagang': 2, 'Pekerja Lepas': 3, 'Wirausaha': 4}
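
The same label-to-code mapping can also be produced with `pandas.factorize`. The sketch below is only an illustration, not the notebook's method: the five-row `sample` frame is made up from the `pekerjaan` values shown in `data.head()`, so `Wirausaha` does not appear in its mapping.

```python
import pandas as pd

# Hypothetical sample; only the column name and labels come from the notebook.
sample = pd.DataFrame(
    {"pekerjaan": ["Buruh", "Pedagang", "Pedagang", "Pekerja Lepas", "Guru/Dosen"]}
)

# sort=True assigns codes in sorted label order, matching np.unique above.
codes, labels = pd.factorize(sample["pekerjaan"], sort=True)
mapping = {label: code for code, label in enumerate(labels)}

sample["pekerjaan"] = codes
print(mapping)
print(sample["pekerjaan"].tolist())
```

Keeping `mapping` around makes it possible to decode the integer codes back to labels later.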


In [9]:
data.head()

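| | usia | pinjaman | tenor | pemasukan | tanggungan | pekerjaan | pinjaman ke- | telat bayar | donasi |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | 500000 | 20 | 4800000 | 0 | 0 | 4 | 0 | 8 |
| 1 | 70 | 3000000 | 4 | 1500000 | 5 | 2 | 1 | 6 | 0 |
| 2 | 95 | 1600000 | 4 | 1500000 | 3 | 2 | 3 | 3 | 6 |
| 3 | 37 | 1000000 | 12 | 5000000 | 3 | 3 | 2 | 1 | 5 |
| 4 | 26 | 1900000 | 6 | 4400000 | 2 | 1 | 2 | 4 | 1 |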


In [10]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   usia          10000 non-null  int64
 1   pinjaman      10000 non-null  int64
 2   tenor         10000 non-null  int64
 3   pemasukan     10000 non-null  int64
 4   tanggungan    10000 non-null  int64
 5   pekerjaan     10000 non-null  int64
 6   pinjaman ke-  10000 non-null  int64
 7   telat bayar   10000 non-null  int64
 8   donasi        10000 non-null  int64
dtypes: int64(9)
memory usage: 703.2 KB


### Convert Dataframe to List

In [11]:
x1 = data["usia"].tolist()
x2 = data["pinjaman"].tolist()
x3 = data["tenor"].tolist()
x4 = data["pemasukan"].tolist()
x5 = data["tanggungan"].tolist()
x6 = data["pekerjaan"].tolist()
x7 = data["pinjaman ke-"].tolist()
x8 = data["telat bayar"].tolist()
x9 = data["donasi"].tolist()

### Adjust value to the predefined category

In [12]:
# X10 - Economic condition (econ)
x10 = []
for i in range(len(x4)):
    # Income (pemasukan) minus the installment, pinjaman / (tenor / 4),
    # and 150000 for each of (tanggungan + 2)
    x = x4[i] - ((x2[i] / (x3[i] / 4)) + 150000 * (x5[i] + 2))
    if x >= 2000000:
        x10.append(5)
    elif 1500000 <= x < 2000000:
        x10.append(4)
    elif 1000000 <= x < 1500000:
        x10.append(3)
    elif 500000 <= x < 1000000:
        x10.append(2)
    elif 100000 <= x < 500000:
        x10.append(1)
    else:
        x10.append(0)
print(len(x10))

10000
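
The threshold ladder above can also be written without a Python loop. The following is a minimal sketch, assuming the same formula and cut-offs. The first two rows are copied from `data.head()`; the third row is hypothetical and was added only to show a middle category. `np.digitize` returns, for each value, how many bin edges it has reached, which is the category number used here.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "pinjaman":   [500000, 3000000, 1000000],   # third row is hypothetical
    "tenor":      [20, 4, 8],
    "pemasukan":  [4800000, 1500000, 2000000],
    "tanggungan": [0, 5, 1],
})

# Same formula as the loop, applied to whole columns at once.
remaining = sample["pemasukan"] - (
    sample["pinjaman"] / (sample["tenor"] / 4)
    + 150000 * (sample["tanggungan"] + 2)
)

# Lower bounds of categories 1..5; values below the first edge get 0.
edges = [100000, 500000, 1000000, 1500000, 2000000]
econ = np.digitize(remaining.to_numpy(), edges)

print(econ.tolist())
```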


In [13]:
# X11 - Age (usia): older than 64 -> 1, otherwise 2
x11 = []
for i in range(len(x1)):
    if x1[i] > 64:
        x11.append(1)
    else:
        x11.append(2)
print(len(x11))

10000


In [14]:
# X12 - Occupation (pekerjaan): codes 0 (Buruh) and 1 (Guru/Dosen) -> 2, otherwise 1
x12 = []
for i in range(len(x6)):
    if x6[i] == 0 or x6[i] == 1:
        x12.append(2)
    else:
        x12.append(1)
print(len(x12))

10000
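
Both two-level features (`x11` and `x12`) follow an if/else pattern that `np.where` expresses directly. This is a sketch only, using the five `usia` values and `pekerjaan` codes from the `data.head()` output above; the variable names are illustrative.

```python
import numpy as np

usia = np.array([25, 70, 95, 37, 26])
pekerjaan = np.array([0, 2, 2, 3, 1])  # codes from Cat_to_Num

# Age: older than 64 -> 1, otherwise 2
x11_sample = np.where(usia > 64, 1, 2)

# Occupation: codes 0 and 1 -> 2, every other code -> 1
x12_sample = np.where(np.isin(pekerjaan, [0, 1]), 2, 1)

print(x11_sample.tolist())
print(x12_sample.tolist())
```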


In [15]:
# X13 - Late payments (telat bayar): 0 -> 0, 1 -> 1, 2 to 4 -> 2, otherwise 3
x13 = []
for i in range(len(x8)):
    if x8[i] == 0:
        x13.append(0)
    elif x8[i] == 1:
        x13.append(1)
    elif 2 <= x8[i] <= 4:
        x13.append(2)
    else:
        x13.append(3)
print(len(x13))

10000


In [16]:
# X14 - Loan number (pinjaman ke-): 1 -> 1, 2 or 3 -> 2, otherwise 3
x14 = []
for i in range(len(x7)):
    if x7[i] == 1:
        x14.append(1)
    elif x7[i] == 2 or x7[i] == 3:
        x14.append(2)
    else:
        x14.append(3)
print(len(x14))

10000


In [17]:
# X15 - Donations (donasi): 0 -> 0, 1 to 3 -> 1, 4 to 5 -> 2, otherwise 3
x15 = []
for i in range(len(x9)):
    if x9[i] == 0:
        x15.append(0)
    elif 1 <= x9[i] <= 3:
        x15.append(1)
    elif 3 < x9[i] <= 5:
        x15.append(2)
    else:
        x15.append(3)
print(max(x15))

3


In [18]:
print(len(x11))
print(len(x10))
print(len(x12))
print(len(x14))
print(len(x13))
print(len(x15))

10000
10000
10000
10000
10000
10000
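
The three count-based features (`x13`, `x14`, `x15`) are interval lookups, so `pandas.cut` can reproduce them. The sketch below assumes the counts are non-negative integers and that `pinjaman ke-` is at least 1; outside those ranges it would not match the loops above. The sample values are the five rows from `data.head()`.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "telat bayar":  [0, 6, 3, 1, 4],
    "pinjaman ke-": [4, 1, 3, 2, 2],
    "donasi":       [8, 0, 6, 5, 1],
})

def bucket(series, bins, labels):
    """Map each value to the label of the right-closed interval it falls in."""
    return pd.cut(series, bins=bins, labels=labels).astype(int)

inf = np.inf
telat = bucket(sample["telat bayar"], [-inf, 0, 1, 4, inf], [0, 1, 2, 3])
pinjaman_ke = bucket(sample["pinjaman ke-"], [-inf, 1, 3, inf], [1, 2, 3])
donasi = bucket(sample["donasi"], [-inf, 0, 3, 5, inf], [0, 1, 2, 3])

print(telat.tolist(), pinjaman_ke.tolist(), donasi.tolist())
```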


## Step 3

*Create a scoring equation*

![](https://raw.githubusercontent.com/imamfirdaus-if/kasbaik/main/machine-learning/assets/scoring%20equation%20-%20step%203.png?token=GHSAT0AAAAAABSMKRBX4A4GHJZU4VIHWN2QYVIEHQQ)

### Determine Constant Value for Each Variable

In [19]:
# Each constant is weight * (850 / largest category value of that variable)
print(f'constant x10: {0.4*(850/max(x10))}')
print(f'constant x11: {0.05*(850/max(x11))}')
print(f'constant x12: {0.15*(850/max(x12))}')
print(f'constant x14: {0.25*(850/max(x14))}')
print(f'constant x15: {0.15*(850/max(x15))}')

constant x10: 68.0
constant x11: 21.25
constant x12: 63.75
constant x14: 70.83333333333333
constant x15: 42.49999999999999
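
The five weights sum to 1, so a borrower in the top category of every variable reaches 850. The sketch below recomputes the constants from those weights. It also shows one possible origin of the `x13` coefficient (18.61) used in the scoring equation: it is an assumption, not something the notebook states, that this penalty was chosen so the lowest possible score lands at about 100. The names `MIN_SCORE`, `max_cat`, and `min_cat` are illustrative.

```python
MAX_SCORE = 850
MIN_SCORE = 100  # assumed floor, inferred from the reported minimum score

weights = {"x10": 0.40, "x11": 0.05, "x12": 0.15, "x14": 0.25, "x15": 0.15}
max_cat = {"x10": 5, "x11": 2, "x12": 2, "x14": 3, "x15": 3}
min_cat = {"x10": 0, "x11": 1, "x12": 1, "x14": 1, "x15": 0}

# Constant per variable: its share of 850 spread over its category range.
constants = {k: weights[k] * MAX_SCORE / max_cat[k] for k in weights}

best = sum(constants[k] * max_cat[k] for k in weights)
worst_before_penalty = sum(constants[k] * min_cat[k] for k in weights)

# Penalty per x13 level so that the worst case (x13 = 3) ends at MIN_SCORE.
penalty = (worst_before_penalty - MIN_SCORE) / 3

print(constants)
print(best, round(penalty, 2))
```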


### Write down the Scoring Equation

In [38]:
y_list = []

for i in range(len(x10)):
    # Weighted sum of the categories; late payments (x13) are subtracted
    y = 68*x10[i] + 21.25*x11[i] + 63.75*x12[i] - 18.61*x13[i] + 70.833*x14[i] + 42.49*x15[i]
    y_list.append(y)

print(max(y_list))
print(min(y_list))


849.969
100.003
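
The reported maximum and minimum can be checked by hand by scoring the best and worst category combinations. The sketch below uses the coefficients from the equation as written and evaluates it as a matrix product; the `score` function and `COEF` array are illustrative names, not part of the notebook.

```python
import numpy as np

# Coefficients in the order x10, x11, x12, x13, x14, x15 (x13 is a penalty).
COEF = np.array([68, 21.25, 63.75, -18.61, 70.833, 42.49])

def score(x10, x11, x12, x13, x14, x15):
    """Return one score per row for equal-length sequences of categories."""
    features = np.column_stack([x10, x11, x12, x13, x14, x15])
    return features @ COEF

# Best case: top category everywhere, no late payments.
best = score([5], [2], [2], [0], [3], [3])
# Worst case: lowest category everywhere, highest late-payment level.
worst = score([0], [1], [1], [3], [1], [0])

print(round(float(best[0]), 3), round(float(worst[0]), 3))
```

The maximum falls slightly short of 850 because the equation uses the rounded coefficients 70.833 and 42.49.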


### Save to DataFrame

In [40]:
# Stack the category lists and the score as columns of one DataFrame
datasets2 = pd.DataFrame(
    np.column_stack([x11, x10, x12, x14, x13, x15, y_list]),
    columns=['usia', 'econ', 'pekerjaan',
             'pinjaman ke', 'telat bayar', 'donasi', 'score'],
)

datasets2.head()

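| | usia | econ | pekerjaan | pinjaman ke | telat bayar | donasi | score |
|---|---|---|---|---|---|---|---|
| 0 | 2.0 | 5.0 | 2.0 | 3.0 | 0.0 | 3.0 | 849.969 |
| 1 | 1.0 | 0.0 | 1.0 | 1.0 | 3.0 | 0.0 | 100.003 |
| 2 | 1.0 | 0.0 | 1.0 | 2.0 | 2.0 | 3.0 | 316.916 |
| 3 | 2.0 | 5.0 | 1.0 | 2.0 | 1.0 | 2.0 | 654.286 |
| 4 | 2.0 | 5.0 | 2.0 | 2.0 | 2.0 | 1.0 | 656.936 |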


In [41]:
datasets2.to_excel('./Dummy Datasets FICO Scoring - V4 - Final.xlsx')

In [42]:
datasets2.to_csv('./Dummy Datasets FICO Scoring - V4 - Final.csv')
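
By default `to_csv` also writes the row index, which comes back as an extra `Unnamed: 0` column when the file is read again. If that column is not wanted, `index=False` leaves it out. The sketch below shows the difference on a hypothetical two-row frame, writing to in-memory buffers instead of files.

```python
import io
import pandas as pd

# Hypothetical stand-in for datasets2.
df = pd.DataFrame({"usia": [2, 1], "score": [849.969, 100.003]})

# Default: the index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```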