<p style="background-color:#FDFEFE; font-family:arial; color:#09042b; font-size:400%; text-align:center; border-radius:10px 10px;"> Credit Score Friendly</p>

<p style="background-color:#FDFEFE; font-family:arial; color:#09042b; font-size:350%; text-align:center; border-radius:10px 10px;"> Exploratory Data Analysis Part 2 </p>


<img src="https://media.istockphoto.com/photos/credit-score-concept-picture-id1333701057?k=20&m=1333701057&s=170667a&w=0&h=wPQona6Oa_kwNj-NWz73qeHA0JErXzIyfy_z05Ze7yE=" align="center"/>

<a id="toc"></a>

## <p style="background-color:#262222; font-family:arial; color:#d0fc08; font-size:175%; text-align:center; border-radius:10px 10px;">Content</p>


* [Handling With Outliers](#6)
* [Final Evaluation of Data via Graphs After Handling With Outliers](#7)
* [Other Specific Analysis Questions](#8)
* [Final Step to Make Ready Dataset for ML Models](#9)
* [The End of the Project](#10)

In [None]:
# import data analysis and visualisation libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import patches
import seaborn as sns

# import warnings to suppress warnings
import warnings
warnings.filterwarnings("ignore")

# Statistics functions
from scipy.stats import norm
from scipy import stats
from scipy.stats import chi2_contingency
from scipy.stats import chi2

# Changing the figure size of a seaborn axes 
sns.set(rc={"figure.figsize": (10, 6)})

# The style parameters control properties
sns.set_style("whitegrid")

# To display maximum columns
pd.set_option('display.max_columns', None)

# To display maximum rows
pd.set_option('display.max_rows', 100)

# To set float format
pd.set_option('display.float_format','{:.2f}'.format)

**Having explored and cleaned the data, handled missing values and handled outliers for the numerical features in the previous notebook (https://www.kaggle.com/code/lknurzelik/credit-score-friendly-exploratory-data-analysis-1/notebook), we will now handle outliers for the categorical features, perform a final evaluation and carry out some specific analyses of the data in this notebook.**

In [None]:
df= pd.read_csv("../input/credit-score-cleaned/df_cleaned.csv") # reading the df_cleaned.csv

In [None]:
# Creating a copy from df named df_copy

df_copy = df.copy()

## <p style="background-color:#262222; font-family:arial; color:#d0fc08; font-size:175%; text-align:center; border-radius:10px 10px;">Handling With Outliers</p>

<a id="6"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:#d0fc08; background-color:#262222" data-toggle="popover">Content</a>

## Handling with outliers for categorical features

In [None]:
df_categorical = df[['Occupation', 'Credit_Mix', 'Payment_of_Min_Amount', 'Credit_Score']]

In [None]:
# checking the descriptive values of categorical features

df_categorical.describe()

### Countplot for categorical features

In [None]:
fig, axes = plt.subplots(len(df_categorical.columns), 1, figsize=(10, 24))

for col, ax in zip(df_categorical.columns, fig.axes):
    # plot the counts of each feature, split by Credit_Score
    sns.countplot(x=col, hue="Credit_Score", data=df_categorical, ax=ax, palette="Set1")
    # rotate the tick labels after plotting so the category names are kept
    ax.tick_params(axis="x", rotation=90)
    for container in ax.containers:
        ax.bar_label(container)
fig.tight_layout();

### Payment_of_Min_Amount feature

In [None]:
# Checking the value counts of Payment_of_Min_Amount

df.Payment_of_Min_Amount.value_counts()

In [None]:
# Replacing "NM" with np.nan (assigning back instead of a chained in-place call)
df["Payment_of_Min_Amount"] = df["Payment_of_Min_Amount"].replace("NM", np.nan)

In [None]:
df.Payment_of_Min_Amount.value_counts(dropna=False)

In [None]:
# Checking the value counts of Payment_of_Min_Amount column by grouping Customer_ID

df.groupby("Customer_ID")["Payment_of_Min_Amount"].value_counts(dropna=False).head()

In [None]:
# Filling the null values in the Payment_of_Min_Amount column with forward fill, then backward fill, within each customer

df["Payment_of_Min_Amount"] = df.groupby("Customer_ID")["Payment_of_Min_Amount"].transform(lambda s: s.ffill().bfill())
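
As a minimal sketch of how a per-customer fill behaves, the toy frame below (hypothetical values, not taken from the dataset) fills each gap only from rows with the same Customer_ID. Using `transform` with `ffill().bfill()` is one way to keep both fill directions inside each group; it is an assumption about the intended behaviour, not the only option.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two customers, each with missing values
toy = pd.DataFrame({
    "Customer_ID": ["A", "A", "A", "B", "B"],
    "Payment_of_Min_Amount": [np.nan, "Yes", np.nan, np.nan, "No"],
})

# Forward-fill, then back-fill, strictly within each customer
toy["filled"] = (
    toy.groupby("Customer_ID")["Payment_of_Min_Amount"]
       .transform(lambda s: s.ffill().bfill())
)
print(toy)
```

Customer A's leading gap is filled from A's own later row, never from a neighbouring customer.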

In [None]:
df.Payment_of_Min_Amount.value_counts(dropna=False)

In [None]:
# countplot of Payment_of_Min_Amount column for different Credit_Scores

ax = sns.countplot(x=df.Payment_of_Min_Amount, hue=df.Credit_Score)
for i in ax.containers:
    ax.bar_label(i);

## Can we check the correlation between Credit_Mix and Credit_Score columns using Chi square test?

In machine learning, correlation tests can be used for feature selection. In classification problems where the output variable is categorical and input variables are also categorical, a chi-squared test can be used to know if the input variables are even relevant to the output variable. Therefore we will use chi-squared test to find the relation between Credit_Mix and Credit_Score features.

* Null hypothesis H₀: Credit_Mix and Credit_Score are independent
* Alternative hypothesis H₁: Credit_Mix and Credit_Score are dependent
* α = 0.05
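
For reference, the statistic used in this test is the standard Pearson chi-squared statistic. The symbols are introduced here for illustration only: $O_{ij}$ is the observed count in row $i$ and column $j$ of the contingency table, $R_i$ and $C_j$ are the row and column totals, $N$ is the grand total, and $r$ and $c$ are the numbers of categories of Credit_Mix and Credit_Score.

$$
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\text{dof} = (r - 1)(c - 1)
$$

H₀ is rejected when the p-value of $\chi^2$ under a chi-squared distribution with $(r-1)(c-1)$ degrees of freedom is at most α.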

In [None]:
# create contingency table
data_crosstab = pd.crosstab(df['Credit_Mix'],
                            df['Credit_Score'],
                           margins=True, margins_name="Total")

# significance level
alpha = 0.05

# Calculation of the chi-square statistic
chi_square = 0
rows = df['Credit_Mix'].unique()
columns = df['Credit_Score'].unique()
for i in columns:
    for j in rows:
        O = data_crosstab[i][j]
        E = data_crosstab[i]['Total'] * data_crosstab['Total'][j] / data_crosstab['Total']['Total']
        chi_square += (O-E)**2/E

# The p-value approach
print("Approach 1: The p-value approach to hypothesis testing in the decision rule")
p_value = stats.chi2.sf(chi_square, (len(rows)-1)*(len(columns)-1))  # survival function = 1 - cdf, more precise for tiny p-values
conclusion = "Failed to reject the null hypothesis."
if p_value <= alpha:
    conclusion = "Null Hypothesis is rejected."
        
print("chisquare-score is:", chi_square, " and p value is:", p_value)
print(conclusion)
    
# The critical value approach
print("\n--------------------------------------------------------------------------------------")
print("Approach 2: The critical value approach to hypothesis testing in the decision rule")
critical_value = stats.chi2.ppf(1-alpha, (len(rows)-1)*(len(columns)-1))
conclusion = "Failed to reject the null hypothesis."
if chi_square > critical_value:
    conclusion = "Null Hypothesis is rejected."
        
print("chisquare-score is:", chi_square, " and critical value is:", critical_value)
print(conclusion)
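
The manual calculation above can be cross-checked against `scipy.stats.chi2_contingency`, which is imported at the top of the notebook. The sketch below uses a hypothetical 3x3 table of counts rather than the real crosstab. On the real data, the table would need to be built without the `Total` margins.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical observed counts (rows: Credit_Mix, columns: Credit_Score)
observed = np.array([
    [30, 50, 20],
    [20, 60, 20],
    [10, 40, 50],
])

# scipy returns the statistic, p-value, degrees of freedom and expected counts
chi2_stat, p_val, dof, expected = chi2_contingency(observed)

# Manual statistic, mirroring the loop-based calculation
manual_stat = ((observed - expected) ** 2 / expected).sum()

print(f"chi2 = {chi2_stat:.3f}, p = {p_val:.4g}, dof = {dof}")
print("matches manual calculation:", np.isclose(chi2_stat, manual_stat))
```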

After performing the chi-squared test for Credit_Mix and Credit_Score, the null hypothesis of independence is rejected: the two variables are strongly dependent. Following the reasoning that a highly dependent/correlated variable adds little new information with regard to the value of the target feature, we will drop the Credit_Mix feature from the dataset.
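
A chi-squared test on a large sample shows that two variables are dependent, but not how strong the dependence is. Cramér's V is one common effect-size measure for this. The sketch below is an illustration on hypothetical toy tables. `cramers_v` is a helper introduced here, not part of the original analysis.

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramér's V for a contingency table of counts (hypothetical helper)."""
    table = np.asarray(table)
    chi2_stat = chi2_contingency(table, correction=False)[0]
    n = table.sum()
    k = min(table.shape) - 1  # smaller table dimension minus one
    return np.sqrt(chi2_stat / (n * k))

# Toy tables: no association vs. perfect association
independent = np.array([[20, 20], [20, 20]])
dependent = np.array([[40, 0], [0, 40]])

print(cramers_v(independent))
print(cramers_v(dependent))
```

Values close to 0 indicate a weak association, and values close to 1 a strong one.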

In [None]:
df.drop(columns="Credit_Mix", inplace=True)

## Correlation between numerical features

In [None]:
# Checking the correlation between numerical features
df.corr(numeric_only=True)

In [None]:
# Checking the correlation between numerical features by grouping Credit_Score

df.drop(columns=["ID","Customer_ID","Age"]).groupby("Credit_Score").corr(numeric_only=True)

In [None]:
# Heatmap for the correlation between numerical features for all Credit_Score

plt.figure(figsize=(14,14),dpi=200)
sns.heatmap(df.drop(columns=["ID","Customer_ID","Age"]).corr(numeric_only=True), square=True, annot=True, fmt=".2f");

In [None]:
# Heatmap for the correlation between numerical features for Poor Credit_Score

plt.figure(figsize=(14,14),dpi=200)
sns.heatmap(df.drop(columns=["ID","Customer_ID","Age"])[df.Credit_Score=="Poor"].corr(numeric_only=True), square=True, annot=True, fmt=".2f");
plt.title("Credit_Score : Poor", fontsize=20);

In [None]:
# Heatmap for the correlation between numerical features for Standard Credit_Score

plt.figure(figsize=(14,14),dpi=200)
sns.heatmap(df.drop(columns=["ID","Customer_ID","Age"])[df.Credit_Score=="Standard"].corr(numeric_only=True), square=True, annot=True, fmt=".2f");
plt.title("Credit_Score : Standard", fontsize=20);

In [None]:
# Heatmap for the correlation between numerical features for Good Credit_Score

plt.figure(figsize=(14,14),dpi=200)
sns.heatmap(df.drop(columns=["ID","Customer_ID","Age"])[df.Credit_Score=="Good"].corr(numeric_only=True), square=True, annot=True, fmt=".2f");
plt.title("Credit_Score : Good", fontsize=20);

### Conclusion about correlation between numerical features

Comparing the correlations between numerical features for the different Credit_Score values, we can conclude the following:

* There is a strong positive correlation between Annual_Income and Monthly_Inhand_Salary for all Credit_Score values. Therefore, one of them can be dropped for further analysis.
* The correlations of Num_Bank_Accounts, Num_Credit_Card, Interest_Rate, Num_of_Loan, Delay_from_due_date, Num_of_Delayed_Payment, Num_Credit_Inquiries and Outstanding_Debt with Annual_Income/Monthly_Inhand_Salary are weakly negative. They also decrease as Credit_Score changes from Poor to Good.
* The correlation between Credit_Utilization_Ratio and Annual_Income/Monthly_Inhand_Salary is weakly positive, and it increases as Credit_Score changes from Poor to Good.
* The correlation between Credit_History_Age and Annual_Income/Monthly_Inhand_Salary is weakly positive, and it decreases as Credit_Score changes from Poor to Good.
* The correlation between Total_EMI_per_month and Annual_Income/Monthly_Inhand_Salary is moderately positive, and it decreases as Credit_Score changes from Poor to Good.
* The correlation between Amount_invested_monthly and Annual_Income is moderately positive. It is the same for the Poor and Standard Credit_Scores, but lower for the Good Credit_Score.
* The correlation between Amount_invested_monthly and Monthly_Inhand_Salary is moderately positive, and it is almost the same for the Poor, Standard and Good Credit_Scores.
* The correlation between Monthly_Balance and Annual_Income/Monthly_Inhand_Salary is moderately positive, and it decreases from the Poor to the Good Credit_Score.
* The correlations of Num_Credit_Card, Interest_Rate, Num_of_Loan, Delay_from_due_date, Num_of_Delayed_Payment, Num_Credit_Inquiries and Outstanding_Debt with Num_Bank_Accounts are moderately positive, and they decrease from the Poor to the Good Credit_Score.
* The correlations of Interest_Rate, Num_of_Loan, Delay_from_due_date, Num_of_Delayed_Payment, Num_Credit_Inquiries and Outstanding_Debt with Num_Credit_Card are moderately positive. They increase from the Poor to the Standard Credit_Score but decrease from the Standard to the Good Credit_Score.
* The correlations of Num_Bank_Accounts, Num_of_Delayed_Payment and Num_Credit_Inquiries with Interest_Rate are moderately positive, and they decrease from the Poor to the Good Credit_Score.
* The correlation between Outstanding_Debt and Interest_Rate is moderately positive and decreases overall from the Poor to the Good Credit_Score: although it increases slightly when Credit_Score changes from Poor to Standard, it decreases significantly when Credit_Score changes from Standard to Good.
* The correlations of Num_Bank_Accounts, Num_Credit_Card, Interest_Rate, Delay_from_due_date, Num_of_Delayed_Payment, Changed_Credit_Limit, Num_Credit_Inquiries and Outstanding_Debt with Num_of_Loan are moderately positive, and they decrease when Credit_Score changes from Standard to Good.
* The correlations of Delay_from_due_date and Num_of_Delayed_Payment with Num_Bank_Accounts are moderately positive, and they decrease when Credit_Score changes from Poor to Good.
* The correlation between Delay_from_due_date and Num_of_Delayed_Payment is moderately positive, and it is almost the same for all Credit_Scores.
* The correlations of Num_of_Loan, Delay_from_due_date and Num_of_Delayed_Payment with Outstanding_Debt are moderately positive, and they decrease when Credit_Score changes from Poor to Good.
* The correlation between Num_of_Loan and Total_EMI_per_month is moderately positive, and it increases when Credit_Score changes from Poor to Good.
* There is a moderately positive correlation between Annual_Income/Monthly_Inhand_Salary and Total_EMI_per_month, and it decreases when Credit_Score changes from Poor to Good.
* There is a moderately positive correlation between Annual_Income/Monthly_Inhand_Salary and Amount_invested_monthly, and it is almost the same for all Credit_Scores.
* There is a moderately positive correlation between Annual_Income/Monthly_Inhand_Salary and Monthly_Balance, and it decreases when Credit_Score changes from Poor to Good.
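
Reading individual pairs off three large heatmaps is error-prone, so a single pair can also be compared across classes directly. The sketch below uses hypothetical toy columns `x` and `y` rather than the real features. It computes the Pearson correlation of one pair within each Credit_Score group.

```python
import pandas as pd

# Hypothetical toy data: the relationship between x and y differs by class
toy = pd.DataFrame({
    "Credit_Score": ["Poor"] * 4 + ["Good"] * 4,
    "x": [1, 2, 3, 4, 1, 2, 3, 4],
    "y": [2, 4, 6, 8, 8, 6, 4, 2],
})

# Pearson correlation of one pair of features within each class
pair_corr = {
    label: group["x"].corr(group["y"])
    for label, group in toy.groupby("Credit_Score")
}
print(pair_corr)
```

On the real data, `x` and `y` would be replaced by a pair such as Outstanding_Debt and Interest_Rate.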

In [None]:
df.drop(columns="Monthly_Inhand_Salary", inplace=True)

## Dropping unnecessary features

**Since the ID, Customer_ID, Month, Name and SSN columns will not add much relevant new information with regard to the value of the target feature, we will drop these features from the dataset.**

In [None]:
df.drop(columns=["ID", "Customer_ID", "Month", "Name", "SSN"], inplace=True)

In [None]:
df.shape

## <p style="background-color:#262222; font-family:arial; color:#d0fc08; font-size:175%; text-align:center; border-radius:10px 10px;">Final Evaluation of Data via Graphs After Handling With Outliers</p>

<a id="7"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:#d0fc08; background-color:#262222" data-toggle="popover">Content</a>

## Let's check the distribution of numerical features

In [None]:
df_numeric = df.select_dtypes(exclude="O")

In [None]:
fig_ = df_numeric.hist(figsize=(12, 36), layout=(10,2), bins=30, edgecolor="black");

## Let's check the boxplots for numerical features for different Credit Scores

In [None]:
fig = plt.figure(figsize=(10,36), dpi=200)
for i, col in enumerate(df.select_dtypes(exclude="O").columns):
        plt.subplot(10,2,i+1)
        sns.boxplot(x= "Credit_Score", y=col, data=df,
                    showmeans=True,
                    meanprops={"marker":"o",
                       "markerfacecolor":"white", 
                       "markeredgecolor":"black",
                      "markersize":"10"})
plt.tight_layout();

**Conclusion about boxplots:** Although we have worked on outliers, the boxplots suggest that some features still contain extreme values. So, how can we check that?

**``Boxplots``** are a great way to summarize the distribution of a dataset, but they become less informative as the dataset grows: with many observations, a large number of points fall beyond the whiskers and are drawn as outliers even when they simply belong to the tail of the distribution. **``Letter-Value Plots``** (or boxenplots) were developed to overcome this inaccurate representation of outliers in boxplots.

## Let's check the boxenplots for numerical features for different Credit Scores

In [None]:
fig = plt.figure(figsize=(10,36), dpi=200)
for i, col in enumerate(df.select_dtypes(exclude="O").columns):
        plt.subplot(10,2,i+1)
        sns.boxenplot(x= "Credit_Score", y=col, data=df)
plt.tight_layout();

**Conclusion about boxenplots:**

We can see that the Boxenplot gives us much more information about the tails of our dataset’s distribution. In the boxplot above, we can’t tell what the data looks like beyond some points of numerical features:

For example, are outstanding debts greater than around 2500 extreme values/outliers❓ 🤔

Judging from the boxplot alone, the values beyond the whiskers look like extreme values, and some of them are candidates for being outliers. However, it is quite hard to decide what they really are, since there is a pretty big gap between the 75th percentile and the maximum value of Outstanding_Debt.

The boxenplot, on the other hand, provides more insight into how Outstanding_Debt is distributed beyond the quartiles. Contrary to the impression given by the boxplot, it can be assumed that there are no extreme values.

To wrap up, interpreting boxenplots can be more straightforward. The concept of thicker boxes representing a bigger part of the total population is easier to comprehend and facilitates discussions.
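
To make the idea behind the nested boxes concrete, the sketch below computes pairs of quantiles at tail fractions 1/4, 1/8, 1/16 and so on, which is the principle behind letter values. `letter_values` is a hypothetical helper and the skewed sample is synthetic. seaborn chooses the number of levels itself (see its `k_depth` parameter), so this is an approximation of the concept, not of seaborn's exact output.

```python
import numpy as np

def letter_values(values, depth=4):
    """Return (lower, upper) quantile pairs at tail fractions 1/4, 1/8, ... (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    pairs = []
    for k in range(2, depth + 2):
        tail = 0.5 ** k  # 1/4, 1/8, 1/16, ...
        pairs.append((np.quantile(values, tail), np.quantile(values, 1 - tail)))
    return pairs

# Synthetic right-skewed sample standing in for a feature such as Outstanding_Debt
rng = np.random.default_rng(0)
sample = rng.exponential(scale=1000, size=10_000)

for lower, upper in letter_values(sample):
    print(f"{lower:10.1f} {upper:10.1f}")
```

Each successive pair covers a larger share of the data, so a long tail shows up as a sequence of progressively wider intervals instead of a cloud of individual outlier points.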

## Let's check the counts of categorical features for different Credit Scores

In [None]:
df_categorical = df.select_dtypes(include="O")

In [None]:
fig, axes = plt.subplots(len(df_categorical.columns), 1, figsize=(8, 16))

for col, ax in zip(df_categorical.columns, fig.axes):
    # plot the counts of each feature, split by Credit_Score
    sns.countplot(x=col, hue="Credit_Score", data=df_categorical, ax=ax, palette="Set1")
    # rotate the tick labels after plotting so the category names are kept
    ax.tick_params(axis="x", rotation=90)
    for container in ax.containers:
        ax.bar_label(container)
fig.tight_layout();

## Conclusion about features for different Credit_Scores

* Comparing the age values for the different Credit_Scores, the credit score improves as the mean and median age increase.
* Most customers with Standard and Good credit scores have higher Annual_Income values, and the credit score improves as the mean/median of Annual_Income increases.
* The mean/median values of Num_Bank_Accounts, Num_Credit_Card, Interest_Rate, Num_of_Loan, Delay_from_due_date, Num_of_Delayed_Payment, Changed_Credit_Limit, Num_Credit_Inquiries, Outstanding_Debt and Total_EMI_per_month increase as the credit score changes from Good to Poor.
* The mean/median values of Credit_Utilization_Ratio are almost the same for all credit scores.
* The mean/median values of Credit_History_Age decrease as the credit score changes from Good to Poor.
* The mean/median values of Total_EMI_per_month increase slightly as the credit score changes from Good to Poor.
* The mean/median values of Amount_invested_monthly and Monthly_Balance decrease slightly as the credit score changes from Good to Poor.
* Customers are distributed almost equally across occupations within each credit score. The number of customers is lowest for the Good credit score and highest for the Standard credit score.
* The number of customers with a Yes value for Payment_of_Min_Amount is higher for the Poor and Standard credit scores, while the number of customers with a No value is higher for the Good credit score.

## <p style="background-color:#262222; font-family:arial; color:#d0fc08; font-size:175%; text-align:center; border-radius:10px 10px;">Other Specific Analysis Questions</p>

<a id="8"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:#d0fc08; background-color:#262222" data-toggle="popover">Content</a>

### 1. What is the average age and credit history age of customers by credit score?

In [None]:
df.groupby("Credit_Score")[["Age","Credit_History_Age"]].mean().sort_values(by="Age")

In [None]:
ax = df.groupby("Credit_Score")[["Age","Credit_History_Age"]].mean().sort_values(by="Age").plot.bar()
plt.xticks(rotation=0)

for container in ax.containers:
    ax.bar_label(container, fmt="%.1f", fontsize=12);

**Conclusion about Age and Credit_History_Age features by Credit_Score:**

Comparing the Age and Credit_History_Age features for the different credit scores, both are lower for the Poor credit score, and the credit score improves as age and credit history age increase.

### 2. Is there any effect of occupation on credit score?

In [None]:
ax = sns.countplot(x=df.Occupation, hue=df.Credit_Score, data=df, palette = "tab10")
ax.set_xticklabels(ax.xaxis.get_majorticklabels(), rotation=90)
ax.legend(bbox_to_anchor=(1.02, 0.6))

for i in ax.containers:
    ax.bar_label(i);


**Conclusion about the effect of occupation on credit score:**

As seen from the chart above, occupation does not have any significant effect on credit score.
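
Raw counts can hide differences when groups have different sizes, so the same comparison can also be made on shares. The sketch below uses a hypothetical toy frame in place of the real Occupation and Credit_Score columns. It normalizes a crosstab by row so that each occupation's credit score shares sum to 1.

```python
import pandas as pd

# Hypothetical toy data standing in for df[["Occupation", "Credit_Score"]]
toy = pd.DataFrame({
    "Occupation": ["Doctor"] * 4 + ["Lawyer"] * 4,
    "Credit_Score": ["Good", "Poor", "Standard", "Standard",
                     "Good", "Poor", "Standard", "Standard"],
})

# Share of each credit score within every occupation (each row sums to 1)
shares = pd.crosstab(toy["Occupation"], toy["Credit_Score"], normalize="index")
print(shares)
```

Near-identical rows, as in this toy example, are what one would expect if occupation had no effect on credit score.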

### 3. What is the average Annual_Income of customers by credit score?

In [None]:
df.groupby("Credit_Score")[["Annual_Income"]].mean().sort_values(by="Annual_Income")

In [None]:
ax = df.groupby("Credit_Score")[["Annual_Income"]].mean().sort_values(by="Annual_Income").plot.bar()
plt.xticks(rotation=0)

for container in ax.containers:
    ax.bar_label(container, fmt="%.1f", fontsize=12);

**Conclusion about Annual_Income by credit score:**

As seen from the chart above, the credit score improves as Annual_Income increases.

### 4. What is the effect of Num_Credit_Inquiries, Num_Bank_Accounts, Num_Credit_Card and Num_of_Loan of customers on credit score?

In [None]:
df.groupby("Credit_Score")[["Num_Credit_Inquiries", "Num_Bank_Accounts", "Num_Credit_Card", "Num_of_Loan"]].mean().sort_values(by="Num_Bank_Accounts")

In [None]:
ax = df.groupby("Credit_Score")[["Num_Credit_Inquiries", "Num_Bank_Accounts", "Num_Credit_Card", "Num_of_Loan"]].mean().sort_values(by="Num_Bank_Accounts", ascending=False).plot.bar()
plt.xticks(rotation=0)

for container in ax.containers:
    ax.bar_label(container, fmt="%.1f", fontsize=12);


**Conclusion about Num_Credit_Inquiries, Num_Bank_Accounts, Num_Credit_Card and Num_of_Loan by credit score:**

As seen from the chart above, the credit score improves as the number of credit inquiries, number of bank accounts, number of credit cards and number of loans decrease.

### 5. What is the effect of Delay_from_due_date and  Num_of_Delayed_Payment of customers on credit score?

In [None]:
df.groupby("Credit_Score")[["Delay_from_due_date", "Num_of_Delayed_Payment"]].mean().sort_values(by="Num_of_Delayed_Payment")

In [None]:
ax = df.groupby("Credit_Score")[["Delay_from_due_date", "Num_of_Delayed_Payment"]].mean().sort_values(by="Num_of_Delayed_Payment").plot.bar()
plt.xticks(rotation=0)

for container in ax.containers:
    ax.bar_label(container, fmt="%.1f", fontsize=12);

**Conclusion about Delay_from_due_date and Num_of_Delayed_Payment by credit score:**

As seen from the chart above, the credit score gets worse as the delay from due date and the number of delayed payments increase.

### 6. What is the effect of Interest_Rate, Changed_Credit_Limit  and  Credit_Utilization_Ratio of customers on credit score?

In [None]:
df.groupby("Credit_Score")[["Interest_Rate", "Changed_Credit_Limit", "Credit_Utilization_Ratio"]].mean().sort_values(by="Interest_Rate")

In [None]:
ax = df.groupby("Credit_Score")[["Interest_Rate", "Changed_Credit_Limit", "Credit_Utilization_Ratio"]].mean().sort_values(by="Interest_Rate").plot.bar()
plt.xticks(rotation=0)

for container in ax.containers:
    ax.bar_label(container, fmt="%.1f", fontsize=12);

**Conclusion about Interest_Rate, Changed_Credit_Limit and Credit_Utilization_Ratio by credit score:**

As seen from the graph above, the credit score gets worse as the interest rate and changed credit limit increase. The credit utilization ratio is almost the same for all credit scores.

### 7. What is the effect of Outstanding_Debt of customers on credit score?

In [None]:
df.groupby("Credit_Score")[["Outstanding_Debt"]].mean().sort_values(by="Outstanding_Debt")

In [None]:
ax = df.groupby("Credit_Score")[["Outstanding_Debt"]].mean().sort_values(by="Outstanding_Debt").plot.bar()
plt.xticks(rotation=0)

for container in ax.containers:
    ax.bar_label(container, fmt="%.1f", fontsize=12);

**Conclusion about Outstanding_Debt by credit score:**

As it is seen from the graph above, the credit score gets worse as the outstanding debt increases.

### 8. What is the effect of average Total_EMI_per_month,  Amount_invested_monthly and Monthly_Balance of customers on credit score?

In [None]:
df.groupby("Credit_Score")[["Total_EMI_per_month", "Amount_invested_monthly", "Monthly_Balance"]].mean().sort_values(by="Total_EMI_per_month")

In [None]:
ax = df.groupby("Credit_Score")[["Total_EMI_per_month", "Amount_invested_monthly", "Monthly_Balance"]].mean().sort_values(by="Total_EMI_per_month").plot.bar()
plt.xticks(rotation=0)

for container in ax.containers:
    ax.bar_label(container, fmt="%.1f", fontsize=12);

**Conclusion about Total_EMI_per_month, Amount_invested_monthly and Monthly_Balance by credit score:**

As seen from the graph above, the credit score gets worse as the total EMI per month and the amount invested monthly increase, while the credit score improves as the monthly balance increases.

## <p style="background-color:#262222; font-family:arial; color:#d0fc08; font-size:175%; text-align:center; border-radius:10px 10px;">Final Step to Make Ready Dataset for ML Models</p>

<a id="9"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:#d0fc08; background-color:#262222" data-toggle="popover">Content</a>

### Convert all features (except for Credit_Score) to numeric by using get_dummies function

In [None]:
df_dummy = pd.get_dummies(df.drop(columns="Credit_Score"), drop_first=True)
df_dummy

In [None]:
# Adding Credit_Score column to the df_dummy

df_dummy["Credit_Score"] = df["Credit_Score"]
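
As a small illustration of what `pd.get_dummies` with `drop_first=True` produces, the sketch below encodes a hypothetical toy frame with made-up values. Numeric columns pass through unchanged, and each categorical column becomes one indicator column per category except the first.

```python
import pandas as pd

# Hypothetical toy frame with one numeric and one categorical feature
toy = pd.DataFrame({
    "Age": [25, 40, 31],
    "Occupation": ["Doctor", "Lawyer", "Teacher"],
})

# drop_first=True drops the first category ("Doctor"), which is then
# represented by all remaining indicator columns being 0/False
encoded = pd.get_dummies(toy, drop_first=True)
print(encoded.columns.tolist())
# → ['Age', 'Occupation_Lawyer', 'Occupation_Teacher']
```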

In [None]:
df_dummy.head()

In [None]:
df_dummy.shape

In [None]:
df.shape

In [None]:
df_dummy.Credit_Score.value_counts(dropna=False)

## <p style="background-color:#262222; font-family:arial; color:#d0fc08; font-size:175%; text-align:center; border-radius:10px 10px;">The End of the Project</p>

<a id="10"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:#d0fc08; background-color:#262222" data-toggle="popover">Content</a>

In [None]:
# Saving the df_dummy to use it in machine learning algorithms

df_dummy.to_csv("credit_score_dummy.csv", index=False)