### Statistical Loan Analysis for Risk Mitigation and Client Solvency
Statistical analysis helps lending organizations manage risk. It applies rigorous quantitative methods to financial data in order to uncover useful insights and patterns. Among the many metrics used for risk assessment, one critical indicator stands out: the likelihood of loan repayment.

Predicting a client's creditworthiness lets financial institutions tailor loan terms with precision. Well-matched terms help keep default rates low, which benefits both clients and the institution, and they improve lending practices and financial risk management as a whole.

### Module 1 - Task 1

Load the Loans data.

In [1]:
# import pandas
import pandas as pd

# Load the dataset
df = pd.read_csv('C:\\Users\\roopm\\Downloads\\Loans.csv')


# Inspect data
df

Unnamed: 0,ListingNumber,Term,LoanStatus,BorrowerRate,EstimatedEffectiveYield,EstimatedLoss,EstimatedReturn,ProsperRating (Alpha),Occupation,EmploymentStatus,IsBorrowerHomeowner,LoanOriginalAmount,MonthlyLoanPayment,Investors
0,193129,36,Completed,0.1580,,,,,Other,Self-employed,True,9425,330.43,258
1,1209647,36,Current,0.0920,0.07960,0.0249,0.05470,A,Professional,Employed,False,10000,318.93,1
2,81716,36,Completed,0.2750,,,,,Other,Not available,False,3001,123.32,41
3,658116,36,Current,0.0974,0.08490,0.0249,0.06000,A,Skilled Labor,Employed,True,10000,321.45,158
4,909464,36,Current,0.2085,0.18316,0.0925,0.09066,D,Executive,Employed,True,15000,563.97,20
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
113932,753087,36,Current,0.1864,0.16490,0.0699,0.09500,C,Food Service Management,Employed,True,10000,364.74,1
113933,537216,36,FinalPaymentInProgress,0.1110,0.10070,0.0200,0.08070,A,Professional,Employed,True,2000,65.57,22
113934,1069178,60,Current,0.2150,0.18828,0.1025,0.08578,D,Other,Employed,True,10000,273.35,119
113935,539056,60,Completed,0.2605,0.24450,0.0850,0.15950,C,Food Service,Full-time,True,15000,449.55,274
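Beyond displaying the frame, a quick structural check can confirm that the file was parsed as expected. The sketch below is only illustrative: it uses a hypothetical three-row excerpt (`sample_csv`) built from a few of the columns shown above rather than the real `Loans.csv`, and the same calls could be run on `df`.

```python
import io
import pandas as pd

# Hypothetical three-row excerpt mirroring a few columns from the output above
sample_csv = io.StringIO(
    "ListingNumber,Term,LoanStatus,BorrowerRate,EstimatedLoss\n"
    "193129,36,Completed,0.1580,\n"
    "1209647,36,Current,0.0920,0.0249\n"
    "81716,36,Completed,0.2750,\n"
)
sample = pd.read_csv(sample_csv)

print(sample.shape)    # (number of rows, number of columns)
print(sample.dtypes)   # type pandas inferred for each column
sample.info()          # non-null counts per column and memory usage
```

Empty fields in the CSV are read as `NaN`, which is why `EstimatedLoss` shows fewer non-null entries than the other columns in this excerpt.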


### Module 1 - Task 2
#### Description
Count the duplicate rows in the DataFrame 'df' as a first step toward ensuring data integrity.

In [2]:
# Finding Duplicates
duplicates = df.duplicated().sum()

# Displaying the total number of duplicate rows
print(f'Total number of duplicate rows: {duplicates}')
duplicates

Total number of duplicate rows: 871


871
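A count alone does not show which rows are repeated. As a minimal sketch on made-up data (the `toy` frame and its values are hypothetical), `duplicated(keep=False)` flags every member of a duplicate group, so the copies can be viewed side by side:

```python
import pandas as pd

# Hypothetical data: listing 102 appears twice with identical values
toy = pd.DataFrame({
    "ListingNumber": [101, 102, 102, 103],
    "BorrowerRate": [0.10, 0.20, 0.20, 0.15],
})

# By default duplicated() flags only the second and later copies of a row
n_extra_copies = toy.duplicated().sum()

# keep=False flags every row that belongs to a duplicate group
all_copies = toy[toy.duplicated(keep=False)]

print(n_extra_copies)
print(all_copies)
```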

### Module 1 - Task 3
#### Description
Remove duplicate rows by applying the drop_duplicates method to 'df'.

In [3]:
# Remove duplicate rows; drop_duplicates returns a new DataFrame, so assign it back
df = df.drop_duplicates()

# Display the updated DataFrame without duplicate rows
print("Updated DataFrame after removing duplicates:")
print(df)

Updated DataFrame after removing duplicates:
        ListingNumber  Term              LoanStatus  BorrowerRate  \
0              193129    36               Completed        0.1580   
1             1209647    36                 Current        0.0920   
2               81716    36               Completed        0.2750   
3              658116    36                 Current        0.0974   
4              909464    36                 Current        0.2085   
...               ...   ...                     ...           ...   
113932         753087    36                 Current        0.1864   
113933         537216    36  FinalPaymentInProgress        0.1110   
113934        1069178    60                 Current        0.2150   
113935         539056    60               Completed        0.2605   
113936        1140093    36                 Current        0.1039   

        EstimatedEffectiveYield  EstimatedLoss  EstimatedReturn  \
0                           NaN            NaN             
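Note that `drop_duplicates` returns a new DataFrame by default and leaves the original untouched unless the result is assigned back or `inplace=True` is passed. A small sketch on hypothetical data (the `toy` frame is made up for illustration) shows the difference:

```python
import pandas as pd

# Hypothetical data with one repeated row
toy = pd.DataFrame({"ListingNumber": [101, 102, 102], "Term": [36, 60, 60]})

toy.drop_duplicates()                # result is discarded, toy is unchanged
rows_without_assignment = len(toy)

deduped = toy.drop_duplicates()      # keep the returned frame
rows_after_assignment = len(deduped)

print(rows_without_assignment, rows_after_assignment)
```

Comparing the row count before and after the call is a cheap way to confirm that the removal actually took effect.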

### Module 1 - Task 4
#### Description
Count the null values in each column of the DataFrame 'df' by applying the .isnull() method followed by the .sum() method.


In [4]:
# Identify and mark null values
null_values_marked = df.isnull()

# Count the number of null values in each column
null_values = null_values_marked.sum()

# Display the count of null values in each column
print("Null Values in Each Column are:")
print(null_values)

Null Values in Each Column are:
ListingNumber                  0
Term                           0
LoanStatus                     0
BorrowerRate                   0
EstimatedEffectiveYield    29084
EstimatedLoss              29084
EstimatedReturn            29084
ProsperRating (Alpha)      29084
Occupation                  3588
EmploymentStatus            2255
IsBorrowerHomeowner            0
LoanOriginalAmount             0
MonthlyLoanPayment             0
Investors                      0
dtype: int64
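Raw counts are easier to judge as a share of all rows. One possible way to get that, sketched here on a hypothetical `toy` frame rather than the real data, is to take the mean of the boolean mask instead of its sum:

```python
import pandas as pd

# Hypothetical data with nulls in two columns
toy = pd.DataFrame({
    "rate": [0.10, 0.20, 0.30, 0.40],
    "loss": [None, 0.02, None, 0.05],
    "occupation": ["Other", None, "Professional", "Other"],
})

# The mean of a True/False mask is the fraction of True values per column
null_share = toy.isnull().mean().sort_values(ascending=False)

print(null_share)
```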


### Module 1 - Task 5

Remove rows with null values from the DataFrame 'df' using the dropna method.

In [5]:
# Removing Rows with Null Values
df.dropna(inplace=True)

# Display the updated DataFrame after removing rows with null values
print("DataFrame after removing rows with null values:")
print(df)

DataFrame after removing rows with null values:
        ListingNumber  Term              LoanStatus  BorrowerRate  \
1             1209647    36                 Current        0.0920   
3              658116    36                 Current        0.0974   
4              909464    36                 Current        0.2085   
5             1074836    60                 Current        0.1314   
6              750899    36                 Current        0.2712   
...               ...   ...                     ...           ...   
113932         753087    36                 Current        0.1864   
113933         537216    36  FinalPaymentInProgress        0.1110   
113934        1069178    60                 Current        0.2150   
113935         539056    60               Completed        0.2605   
113936        1140093    36                 Current        0.1039   

        EstimatedEffectiveYield  EstimatedLoss  EstimatedReturn  \
1                       0.07960         0.0249          
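Calling `dropna()` with no arguments discards a row if any column is null. When only some columns matter for the analysis, the `subset` parameter is a gentler alternative. The sketch below uses a hypothetical `toy` frame to contrast the two; which columns to require is an assumption that depends on the analysis.

```python
import pandas as pd

# Hypothetical data: each of the first two rows is missing a different field
toy = pd.DataFrame({
    "EstimatedLoss": [None, 0.0249, 0.0925],
    "Occupation": ["Other", None, "Executive"],
})

strict = toy.dropna()                            # drops rows with any null
partial = toy.dropna(subset=["EstimatedLoss"])   # only EstimatedLoss must be present

print(len(strict), len(partial))
```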

### Module 2 - Task 1

Rename Columns for Clarity.

In [6]:
# Renaming Columns
namer = {
    "ListingNumber": "id",
    "Term": "duration",
    "LoanStatus": "status",
    "BorrowerRate": "rate",
    "EstimatedEffectiveYield": "yield",
    "EstimatedLoss": "loss",
    "EstimatedReturn": "return",
    "ProsperRating (Alpha)": "prosper",
    "Occupation": "occupation",
    "EmploymentStatus": "employment",
    "IsBorrowerHomeowner": "home_owner",
    "LoanOriginalAmount": "loan_amount",
    "MonthlyLoanPayment": "payment",
    "Investors": "investors"
}

# Applying the rename method to 'df' with inplace=True
df.rename(columns=namer, inplace=True)

# Display the DataFrame with renamed columns
print("DataFrame with Renamed Columns:")
print(df)

DataFrame with Renamed Columns:
             id  duration                  status    rate    yield    loss  \
1       1209647        36                 Current  0.0920  0.07960  0.0249   
3        658116        36                 Current  0.0974  0.08490  0.0249   
4        909464        36                 Current  0.2085  0.18316  0.0925   
5       1074836        60                 Current  0.1314  0.11567  0.0449   
6        750899        36                 Current  0.2712  0.23820  0.1275   
...         ...       ...                     ...     ...      ...     ...   
113932   753087        36                 Current  0.1864  0.16490  0.0699   
113933   537216        36  FinalPaymentInProgress  0.1110  0.10070  0.0200   
113934  1069178        60                 Current  0.2150  0.18828  0.1025   
113935   539056        60               Completed  0.2605  0.24450  0.0850   
113936  1140093        36                 Current  0.1039  0.09071  0.0299   

         return prosper        
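By default `rename` silently ignores mapping keys that do not match any column, so a misspelled key goes unnoticed. As a sketch on a hypothetical one-row frame, passing `errors="raise"` turns such a mismatch into a `KeyError`:

```python
import pandas as pd

# Hypothetical one-row frame with two of the original column names
toy = pd.DataFrame({"ListingNumber": [1209647], "Term": [36]})

mapping = {"ListingNumber": "id", "Term": "duration"}
renamed = toy.rename(columns=mapping, errors="raise")

# A misspelled key ("Listingnumber") raises instead of being ignored
typo_error = None
try:
    toy.rename(columns={"Listingnumber": "id"}, errors="raise")
except KeyError as exc:
    typo_error = exc

print(list(renamed.columns))
print(type(typo_error).__name__)
```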

### Module 2 - Task 2
#### Description
Convert columns to the categorical data type.

Define a list called 'categories' that contains the names of the columns to convert, then loop through the list and convert each column.

In [7]:
# Converting Columns to Categorical Data Type

# list of columns to be converted to categorical data type
categories = ['status', 'prosper', 'occupation', 'employment']

# Loop through the 'categories' list to process each column
for column in categories:
    df[column] = df[column].astype('category')

# Display the DataFrame with converted categorical columns
print("DataFrame with Categorical Columns:")
print(df)

DataFrame with Categorical Columns:
             id  duration                  status    rate    yield    loss  \
1       1209647        36                 Current  0.0920  0.07960  0.0249   
3        658116        36                 Current  0.0974  0.08490  0.0249   
4        909464        36                 Current  0.2085  0.18316  0.0925   
5       1074836        60                 Current  0.1314  0.11567  0.0449   
6        750899        36                 Current  0.2712  0.23820  0.1275   
...         ...       ...                     ...     ...      ...     ...   
113932   753087        36                 Current  0.1864  0.16490  0.0699   
113933   537216        36  FinalPaymentInProgress  0.1110  0.10070  0.0200   
113934  1069178        60                 Current  0.2150  0.18828  0.1025   
113935   539056        60               Completed  0.2605  0.24450  0.0850   
113936  1140093        36                 Current  0.1039  0.09071  0.0299   

         return prosper     
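Printing the frame does not reveal whether the conversion worked, since categorical values look the same as strings; checking `dtypes` does. For a rating column, an ordered categorical can additionally support comparisons. The sketch below is a hedged illustration on hypothetical data: the `rating_order` list (riskiest to safest) is an assumption about how Prosper ratings rank and should be verified against the data source.

```python
import pandas as pd

# Hypothetical ratings
toy = pd.DataFrame({"prosper": ["A", "D", "C", "A", "AA"]})

# Assumed ordering from riskiest to safest; verify before relying on it
rating_order = ["HR", "E", "D", "C", "B", "A", "AA"]
toy["prosper"] = pd.Categorical(
    toy["prosper"], categories=rating_order, ordered=True
)

print(toy.dtypes)                        # prosper is now 'category'

safest = toy["prosper"].max()            # highest rating under the assumed order
at_least_c = toy[toy["prosper"] >= "C"]  # rows rated C or better

print(safest, len(at_least_c))
```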

### Module 2 - Task 3 

Export the Cleaned Pandas DataFrame to a CSV File.



In [8]:
# Exporting a Pandas DataFrame to a CSV File
csv_filename = 'loans_cleaned_data.csv'

# Save the DataFrame to a CSV file
df.to_csv(csv_filename, index=False)

print(f"DataFrame exported successfully to {csv_filename}.")

DataFrame exported successfully to loans_cleaned_data.csv.
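One caveat worth keeping in mind: CSV stores only text, so the categorical dtypes set earlier are not saved with the file. The following sketch round-trips a hypothetical `toy` frame through an in-memory buffer (standing in for the exported file) and shows one way to restore the dtype on reload via the `dtype` argument of `read_csv`:

```python
import io
import pandas as pd

# Hypothetical frame with one categorical column
toy = pd.DataFrame({"status": ["Current", "Completed"], "rate": [0.092, 0.2605]})
toy["status"] = toy["status"].astype("category")

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

buffer.seek(0)
plain = pd.read_csv(buffer)       # status comes back as a plain text column

buffer.seek(0)
restored = pd.read_csv(buffer, dtype={"status": "category"})

print(plain["status"].dtype, restored["status"].dtype)
```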
