### Objective:
The objective of this assignment is to read the data, perform Exploratory Data Analysis (EDA), and clean the data.

We use the German Credit Risk dataset to answer the questions given in this notebook.

### German Credit Risk Data

**About dataset**\
The dataset consists of the following columns:
1. **checking_balance**           : Checking account balance, recorded as a range in DM
2. **months_loan_duration**       : Duration of the loan in months
3. **credit_history**             : Credit history of the customer
4. **purpose**                    : Purpose for which the loan was taken
5. **amount**                     : Amount of the loan
6. **savings_balance**            : Savings account balance, recorded as a range in DM
7. **employment_duration**        : Duration of employment, recorded as a range in years
8. **percent_of_income**          : Installment rate as a percentage of income
9. **years_at_residence**         : Years at the current residence
10. **age**                       : Age of the customer
11. **other_credit**              : Any other credit taken by the customer
12. **housing**                   : Type of housing: rent, own, or other
13. **existing_loans_count**      : Number of existing loans
14. **job**                       : Job type
15. **dependents**                : Number of dependents of the customer
16. **phone**                     : Whether the customer has a phone
17. **default**                   : Default status (target column)

#### Import Libraries

In [17]:
# Import the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Encoders for categorical columns
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import OrdinalEncoder
from sklearn.compose import ColumnTransformer
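
The encoders imported above are not used elsewhere in this section. As a minimal sketch of how they might be combined, the example below encodes a tiny hand-made frame. The choice of columns, the ordering of the `employment_duration` levels, and the variable names `sample`, `employment_order`, and `encoder` are assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Tiny hand-made frame using column names and values that appear in the dataset
sample = pd.DataFrame({
    "employment_duration": ["> 7 years", "1 - 4 years", "4 - 7 years"],
    "housing": ["own", "other", "own"],
    "amount": [1169, 5951, 2096],
})

# Assumed ordering, covering only the three levels present in this sample
employment_order = [["1 - 4 years", "4 - 7 years", "> 7 years"]]

encoder = ColumnTransformer(
    transformers=[
        # Ordered categories become integer ranks 0, 1, 2
        ("ordinal", OrdinalEncoder(categories=employment_order), ["employment_duration"]),
        # Unordered categories become one indicator column per level
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["housing"]),
    ],
    remainder="passthrough",  # keep numeric columns such as amount unchanged
    sparse_threshold=0,       # always return a dense NumPy array
)

encoded = encoder.fit_transform(sample)
print(encoded)
```

Ordinal encoding suits columns whose levels have a natural order, while one-hot encoding avoids implying an order where none exists.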

#### Read German Credit Risk Dataset and load it into dataframe

In [44]:
def read_data(file_path):
    """Load the CSV file at file_path into a pandas DataFrame."""
    df = pd.read_csv(file_path)
    return df

In [27]:
# Raw string (r'...') so the backslashes in the Windows path are not treated as escapes
#df = read_data(r'C:\Jasmine\GreatLearning\ML\project\GermanBankLoan\german-bank-loan-defaults\src\data\credit.csv')

#### Explore Data : Exploratory Data Analysis

In [40]:
def explore_data(df):
    """Return the shape, dtypes, numeric summary, and categorical value counts of df."""
    # Summary statistics for numeric columns
    numeric_columns = df.select_dtypes(include=['number']).columns
    numeric_summary = df[numeric_columns].describe()

    # Value counts for each categorical column
    categorical_value_counts = {}
    categorical_columns = df.select_dtypes(include=['object']).columns
    for column in categorical_columns:
        categorical_value_counts[column] = df[column].value_counts()

    summary = {
        'shape': df.shape,
        'types': df.dtypes,
        'numeric_summary': numeric_summary,
        'categorical_summary': categorical_value_counts,
    }
    return summary
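
Printing the whole dictionary at once produces output that is hard to read. As a sketch of one alternative, the hypothetical helper `print_summary` below prints each part separately. It is not part of the original notebook, and the toy frame `toy` and the dictionary `toy_summary` are built by hand here only so the example runs on its own, using the same keys that `explore_data` returns.

```python
import pandas as pd

def print_summary(summary):
    """Hypothetical helper: print each part of the summary dictionary separately."""
    print("Shape:", summary["shape"])
    print("\nColumn types:")
    print(summary["types"].to_string())
    print("\nNumeric summary:")
    print(summary["numeric_summary"].to_string())
    for column, counts in summary["categorical_summary"].items():
        print(f"\nValue counts for {column}:")
        print(counts.to_string())

# Toy frame with one numeric and one categorical column (values taken from the dataset)
toy = pd.DataFrame({
    "amount": [1169, 5951, 2096],
    "housing": ["own", "own", "other"],
})

# Hand-built dictionary with the same keys as the explore_data return value
toy_summary = {
    "shape": toy.shape,
    "types": toy.dtypes,
    "numeric_summary": toy.select_dtypes(include=["number"]).describe(),
    "categorical_summary": {"housing": toy["housing"].value_counts()},
}

print_summary(toy_summary)
```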

In [41]:
#summary = explore_data(df)
#print(summary)

{'shape': (1000, 17), 'types': checking_balance        object
months_loan_duration     int64
credit_history          object
purpose                 object
amount                   int64
savings_balance         object
employment_duration     object
percent_of_income        int64
years_at_residence       int64
age                      int64
other_credit            object
housing                 object
existing_loans_count     int64
job                     object
dependents               int64
phone                   object
default                 object
dtype: object, 'numeric_summary':        months_loan_duration        amount  percent_of_income   
count           1000.000000   1000.000000        1000.000000  \
mean              20.903000   3271.258000           2.973000   
std               12.058814   2822.736876           1.118715   
min                4.000000    250.000000           1.000000   
25%               12.000000   1365.500000           2.000000   
50%               18.000

In [42]:
def clean_data(df):
    """Replace "unknown" entries with NaN (modifies df in place) and return df."""
    df.replace("unknown", np.nan, inplace=True)
    return df
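
Once "unknown" has been replaced with NaN, the missing values can be counted per column. The sketch below shows this on a small hand-made frame. The names `toy`, `cleaned`, `missing_counts`, and `missing_share` are illustrative, and this version works on a copy rather than modifying the frame in place.

```python
import numpy as np
import pandas as pd

# Small hand-made frame using values that appear in the dataset
toy = pd.DataFrame({
    "checking_balance": ["< 0 DM", "1 - 200 DM", "unknown", "< 0 DM"],
    "savings_balance": ["unknown", "< 100 DM", "< 100 DM", "< 100 DM"],
    "amount": [1169, 5951, 2096, 7882],
})

# Same replacement as clean_data, but returning a new frame instead of mutating toy
cleaned = toy.replace("unknown", np.nan)

# Number and share of missing values in each column
missing_counts = cleaned.isna().sum()
missing_share = cleaned.isna().mean().round(2)

print(pd.DataFrame({"missing": missing_counts, "share": missing_share}))
```

A per-column count like this helps decide whether to impute, drop, or keep "unknown" as its own category.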
    

In [43]:
#clean_data(df)

Unnamed: 0,checking_balance,months_loan_duration,credit_history,purpose,amount,savings_balance,employment_duration,percent_of_income,years_at_residence,age,other_credit,housing,existing_loans_count,job,dependents,phone,default
0,< 0 DM,6,critical,furniture/appliances,1169,,> 7 years,4,4,67,none,own,2,skilled,1,yes,no
1,1 - 200 DM,48,good,furniture/appliances,5951,< 100 DM,1 - 4 years,2,2,22,none,own,1,skilled,1,no,yes
2,,12,critical,education,2096,< 100 DM,4 - 7 years,2,3,49,none,own,1,unskilled,2,no,no
3,< 0 DM,42,good,furniture/appliances,7882,< 100 DM,4 - 7 years,2,4,45,none,other,1,skilled,2,no,no
4,< 0 DM,24,poor,car,4870,< 100 DM,1 - 4 years,3,4,53,none,other,2,skilled,2,no,yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,,12,good,furniture/appliances,1736,< 100 DM,4 - 7 years,3,4,31,none,own,1,unskilled,1,no,no
996,< 0 DM,30,good,car,3857,< 100 DM,1 - 4 years,4,4,40,none,own,1,management,1,yes,no
997,,12,good,furniture/appliances,804,< 100 DM,> 7 years,4,4,38,none,own,1,skilled,1,no,no
998,< 0 DM,45,good,furniture/appliances,1845,< 100 DM,1 - 4 years,4,4,23,none,other,1,skilled,1,yes,yes
