# CLIENT CREDIT SCORE EVALUATION WITH AI 

This project builds an AI model that evaluates the credit score of the customers of a payment institution (PI) that aims to become a full-fledged financial institution. The model's output will provide essential data for the feasibility analysis of that transformation.

Because the company is a payment institution, it cannot grant credit to its current clients and therefore has no data on their credit behavior. Instead, we use a dataframe with information about the PI clients who are also clients of a bank in the same corporate holding.

Their behavior as customers of the bank should help us to predict their score when the PI becomes a true financial institution. 

Let's start our analysis:

In [1]:
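import pandas as pd

# Load the bank records of the customers who are also PI clients
df = pd.read_csv('clientes.csv')

# Preview the dataframe (display is available in Jupyter/IPython)
display(df)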

| | id_cliente | mes | idade | profissao | salario_anual | num_contas | num_cartoes | juros_emprestimo | num_emprestimos | dias_atraso | ... | idade_historico_credito | investimento_mensal | comportamento_pagamento | saldo_final_mes | score_credito | emprestimo_carro | emprestimo_casa | emprestimo_pessoal | emprestimo_credito | emprestimo_estudantil |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3392 | 1 | 23.0 | cientista | 19114.12 | 3.0 | 4.0 | 3.0 | 4.0 | 3.0 | ... | 265.0 | 21.465380 | alto_gasto_pagamento_baixos | 312.494089 | Good | 1 | 1 | 1 | 1 | 0 |
| 1 | 3392 | 2 | 23.0 | cientista | 19114.12 | 3.0 | 4.0 | 3.0 | 4.0 | 3.0 | ... | 266.0 | 21.465380 | baixo_gasto_pagamento_alto | 284.629162 | Good | 1 | 1 | 1 | 1 | 0 |
| 2 | 3392 | 3 | 23.0 | cientista | 19114.12 | 3.0 | 4.0 | 3.0 | 4.0 | 3.0 | ... | 267.0 | 21.465380 | baixo_gasto_pagamento_medio | 331.209863 | Good | 1 | 1 | 1 | 1 | 0 |
| 3 | 3392 | 4 | 23.0 | cientista | 19114.12 | 3.0 | 4.0 | 3.0 | 4.0 | 5.0 | ... | 268.0 | 21.465380 | baixo_gasto_pagamento_baixo | 223.451310 | Good | 1 | 1 | 1 | 1 | 0 |
| 4 | 3392 | 5 | 23.0 | cientista | 19114.12 | 3.0 | 4.0 | 3.0 | 4.0 | 6.0 | ... | 269.0 | 21.465380 | alto_gasto_pagamento_medio | 341.489231 | Good | 1 | 1 | 1 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 99995 | 37932 | 4 | 25.0 | mecanico | 39628.99 | 4.0 | 6.0 | 7.0 | 2.0 | 23.0 | ... | 378.0 | 24.028477 | alto_gasto_pagamento_alto | 479.866228 | Poor | 1 | 0 | 0 | 0 | 1 |
| 99996 | 37932 | 5 | 25.0 | mecanico | 39628.99 | 4.0 | 6.0 | 7.0 | 2.0 | 18.0 | ... | 379.0 | 24.028477 | alto_gasto_pagamento_medio | 496.651610 | Poor | 1 | 0 | 0 | 0 | 1 |
| 99997 | 37932 | 6 | 25.0 | mecanico | 39628.99 | 4.0 | 6.0 | 7.0 | 2.0 | 27.0 | ... | 380.0 | 24.028477 | alto_gasto_pagamento_alto | 516.809083 | Poor | 1 | 0 | 0 | 0 | 1 |
| 99998 | 37932 | 7 | 25.0 | mecanico | 39628.99 | 4.0 | 6.0 | 7.0 | 2.0 | 20.0 | ... | 381.0 | 24.028477 | baixo_gasto_pagamento_alto | 319.164979 | Standard | 1 | 0 | 0 | 0 | 1 |


We can see that these 100,000 records (one row per customer per month) already carry a credit score, so it will be easier to train our AI model on this data.
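Since the existing scores will serve as training labels, it helps to look at how they are distributed before modeling. The snippet below is a minimal sketch on a hypothetical four-row stand-in (`sample`) built from values visible in the preview; with the real data, the same calls would be applied to `df`.

```python
import pandas as pd

# Hypothetical stand-in for df, using a few values from the preview above.
sample = pd.DataFrame({
    "id_cliente": [3392, 3392, 37932, 37932],
    "mes": [1, 2, 6, 7],
    "score_credito": ["Good", "Good", "Poor", "Standard"],
})

# Number of records in each credit score class, as counts and as shares.
class_counts = sample["score_credito"].value_counts()
class_shares = sample["score_credito"].value_counts(normalize=True)

# Records versus distinct customers: the same customer appears in several rows.
n_records = len(sample)
n_customers = sample["id_cliente"].nunique()

print(class_counts)
print(class_shares.round(2))
print(f"{n_records} records, {n_customers} distinct customers")
```

A strongly unbalanced distribution would be worth knowing about before choosing an evaluation metric.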

### Understanding the Columns

Now let's see what each column of our dataframe means:

1. id_cliente: An ID number that is given to a customer when registered on the bank system
2. mes: Number of months the person has been a bank's customer
3. idade: Customer's Age
4. profissao: Customer's profession
5. salario_anual: Customer's annual earnings
6. num_contas: Number of accounts held by the customer
7. num_cartoes: Number of credit cards held by the customer
8. juros_emprestimo: Interest rate on loans taken by the customer
9. num_emprestimos: Number of loans taken by the customer
10. dias_atraso: Number of days the customer is overdue on payments
11. num_pagamentos_atrasados: Number of payments overdue by the customer
12. num_verificacoes_credito: Number of credit checks performed on the customer
13. mix_credito: Variety of credit accounts held by the customer
14. divida_total: Total debt owed by the customer
15. taxa_uso_credito: Credit utilization rate of the customer
16. idade_historico_credito: Length of credit history of the customer
17. investimento_mensal: Monthly investment made by the customer
18. comportamento_pagamento: Payment behavior of the customer
19. saldo_final_mes: Final balance at the end of the month for the customer
20. score_credito: Credit score of the customer
21. emprestimo_carro: Number of car loans taken by the customer
22. emprestimo_casa: Number of home loans taken by the customer
23. emprestimo_pessoal: Number of personal loans taken by the customer
24. emprestimo_credito: Number of credit loans taken by the customer
25. emprestimo_estudantil: Number of student loans taken by the customer
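The preview above hides the middle columns behind `...`, so it can be useful to list every column name explicitly. This is a small sketch on a hypothetical one-row frame (`sample`); on the real data the same calls would be made on `df`.

```python
import pandas as pd

# Hypothetical miniature frame; with the real data, use df instead.
sample = pd.DataFrame({
    "id_cliente": [3392],
    "profissao": ["cientista"],
    "salario_anual": [19114.12],
    "score_credito": ["Good"],
})

# Full list of column names, no matter how wide the frame is.
column_names = sample.columns.tolist()
print(column_names)

# Temporarily lift the column limit so printing shows every column.
with pd.option_context("display.max_columns", None):
    print(sample.head())
```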

Our dataframe has too much information to analyze row by row, and we already have the bank's records for the PI customers, so we can train our own AI model to predict the credit score.

## Processing our Data

First, we need to check whether any rows or values must be removed from our dataframe:

In [2]:
# info() prints its summary directly and returns None, so no display() wrapper is needed
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 25 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   id_cliente                100000 non-null  int64  
 1   mes                       100000 non-null  int64  
 2   idade                     100000 non-null  float64
 3   profissao                 100000 non-null  object 
 4   salario_anual             100000 non-null  float64
 5   num_contas                100000 non-null  float64
 6   num_cartoes               100000 non-null  float64
 7   juros_emprestimo          100000 non-null  float64
 8   num_emprestimos           100000 non-null  float64
 9   dias_atraso               100000 non-null  float64
 10  num_pagamentos_atrasados  100000 non-null  float64
 11  num_verificacoes_credito  100000 non-null  float64
 12  mix_credito               100000 non-null  object 
 13  divida_total              100000 non-null  fl
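The non-null counts reported by `info()` can also be checked directly. The following is a minimal sketch on a hypothetical three-row stand-in (`sample`); the duplicate check is an extra step assumed here, not something the original analysis performs.

```python
import pandas as pd

# Hypothetical stand-in for df with a few values from the preview.
sample = pd.DataFrame({
    "id_cliente": [3392, 3392, 37932],
    "mes": [1, 2, 4],
    "idade": [23.0, 23.0, 25.0],
    "profissao": ["cientista", "cientista", "mecanico"],
})

# Missing values per column and in total.
missing_per_column = sample.isna().sum()
total_missing = int(missing_per_column.sum())

# Fully duplicated rows (assumed extra check).
duplicate_rows = int(sample.duplicated().sum())

print(missing_per_column)
print(f"missing values: {total_missing}, duplicate rows: {duplicate_rows}")
```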

There are no empty values in our dataframe, so it's not necessary to remove any rows. But the dtypes show that some columns are filled with strings (dtype `object`), and our AI model cannot work with raw text, only with numbers. So we'll need to convert these strings into numbers.
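One possible way to do this conversion is integer encoding with `pd.factorize`, sketched below on a hypothetical four-row stand-in (`sample`). The choice of `pd.factorize`, the `target` variable, and the decision to leave `score_credito` as text are assumptions made for illustration, not necessarily the approach used later in this project.

```python
import pandas as pd

# Hypothetical stand-in for df with two text features, one numeric feature, and the label.
sample = pd.DataFrame({
    "profissao": ["cientista", "mecanico", "cientista", "mecanico"],
    "salario_anual": [19114.12, 39628.99, 19114.12, 39628.99],
    "comportamento_pagamento": [
        "alto_gasto_pagamento_medio",
        "alto_gasto_pagamento_alto",
        "baixo_gasto_pagamento_alto",
        "alto_gasto_pagamento_alto",
    ],
    "score_credito": ["Good", "Poor", "Good", "Standard"],
})

target = "score_credito"  # assumption: the label column stays as text

encoded = sample.copy()
mappings = {}  # column name -> {integer code: original string}
for column in encoded.columns:
    # Skip the label and every column that is already numeric.
    if column == target or pd.api.types.is_numeric_dtype(encoded[column]):
        continue
    # sort=True assigns codes in alphabetical order of the categories.
    codes, categories = pd.factorize(encoded[column], sort=True)
    encoded[column] = codes
    mappings[column] = dict(enumerate(categories))

print(encoded)
print(mappings)
```

Keeping `mappings` makes it possible to translate codes back to the original strings. Note that integer codes impose an arbitrary order on the categories, which tree-based models tolerate better than linear ones.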