# Basic Statistics Case Study

## BUSINESS PROBLEM - 1

Using Lending Club loans data, the team would like to test the hypotheses below about how 
different factors affect each other. (Hint: you may use statistical hypothesis tests.)

a. Interest rate varies with loan amount (less interest is charged for higher loan 
amounts).

b. Loan length directly affects interest rate.

c. Interest rate varies by loan purpose.

d. There is a relationship between FICO scores and home ownership; that is, people 
who own their homes tend to have higher FICO scores.

### Import necessary libraries

In [23]:
import pandas as pd
import scipy.stats as stats

### Import the data set

In [21]:
loans = pd.read_csv('LoansData.csv')
loans.head()

Unnamed: 0,Amount.Requested,Amount.Funded.By.Investors,Interest.Rate,Loan.Length,Loan.Purpose,Debt.To.Income.Ratio,State,Home.Ownership,Monthly.Income,FICO.Range,Open.CREDIT.Lines,Revolving.CREDIT.Balance,Inquiries.in.the.Last.6.Months,Employment.Length
0,20000.0,20000.0,8.90%,36 months,debt_consolidation,14.90%,SC,MORTGAGE,6541.67,735-739,14.0,14272.0,2.0,< 1 year
1,19200.0,19200.0,12.12%,36 months,debt_consolidation,28.36%,TX,MORTGAGE,4583.33,715-719,12.0,11140.0,1.0,2 years
2,35000.0,35000.0,21.98%,60 months,debt_consolidation,23.81%,CA,MORTGAGE,11500.0,690-694,14.0,21977.0,1.0,2 years
3,10000.0,9975.0,9.99%,36 months,debt_consolidation,14.30%,KS,MORTGAGE,3833.33,695-699,10.0,9346.0,0.0,5 years
4,12000.0,12000.0,11.71%,36 months,credit_card,18.78%,NJ,RENT,3195.0,695-699,11.0,14469.0,0.0,9 years


### Exploratory Data Analysis

In [5]:
print(loans.shape)

(2500, 14)


In [6]:
loans.dtypes

Amount.Requested                  float64
Amount.Funded.By.Investors        float64
Interest.Rate                      object
Loan.Length                        object
Loan.Purpose                       object
Debt.To.Income.Ratio               object
State                              object
Home.Ownership                     object
Monthly.Income                    float64
FICO.Range                         object
Open.CREDIT.Lines                 float64
Revolving.CREDIT.Balance          float64
Inquiries.in.the.Last.6.Months    float64
Employment.Length                  object
dtype: object

In [22]:
# Strip the '%' and ' months' suffixes so these columns can be treated as numeric.
loans['Interest.Rate'] = loans['Interest.Rate'].str.replace('%', '', regex=False).astype(float)
loans['Loan.Length'] = loans['Loan.Length'].str.replace(' months', '', regex=False).astype(float)
loans['Debt.To.Income.Ratio'] = loans['Debt.To.Income.Ratio'].str.replace('%', '', regex=False).astype(float)
loans.dtypes

Amount.Requested                  float64
Amount.Funded.By.Investors        float64
Interest.Rate                     float64
Loan.Length                       float64
Loan.Purpose                       object
Debt.To.Income.Ratio              float64
State                              object
Home.Ownership                     object
Monthly.Income                    float64
FICO.Range                         object
Open.CREDIT.Lines                 float64
Revolving.CREDIT.Balance          float64
Inquiries.in.the.Last.6.Months    float64
Employment.Length                  object
dtype: object
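
`FICO.Range` is still an `object` column after this conversion. If a numeric score is needed later (for example, for hypothesis d), one possible approach is to split each range into its lower and upper bounds. The sketch below is only an illustration on a few values taken from the rows shown earlier; the names `FICO.Lower` and `FICO.Mid` are hypothetical and do not exist in the data set.

```python
import pandas as pd

# A few range strings from the rows shown above, plus one missing value.
fico_range = pd.Series(['735-739', '715-719', '690-694', None], name='FICO.Range')

# Split "low-high" into two columns and convert each to a number.
# Missing ranges stay missing (NaN) instead of raising an error.
bounds = fico_range.str.split('-', expand=True).apply(pd.to_numeric)

fico_lower = bounds[0].rename('FICO.Lower')      # hypothetical column name
fico_mid = bounds.mean(axis=1).rename('FICO.Mid')  # hypothetical column name

print(pd.concat([fico_range, fico_lower, fico_mid], axis=1))
```

Either the lower bound or the midpoint could serve as the numeric score; which one is appropriate is a modelling choice that the source does not specify.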

In [25]:
loans.isna().sum()

Amount.Requested                   1
Amount.Funded.By.Investors         1
Interest.Rate                      2
Loan.Length                        0
Loan.Purpose                       0
Debt.To.Income.Ratio               1
State                              0
Home.Ownership                     1
Monthly.Income                     1
FICO.Range                         2
Open.CREDIT.Lines                  3
Revolving.CREDIT.Balance           3
Inquiries.in.the.Last.6.Months     3
Employment.Length                 77
dtype: int64
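
Most columns have only a handful of missing values, while `Employment.Length` has 77. One way to avoid discarding rows unnecessarily is to drop missing values only in the columns a given test actually uses. The sketch below shows the idea on a small made-up frame; `toy` and `complete_cases` are hypothetical names used for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up frame that mimics the pattern of scattered missing values.
toy = pd.DataFrame({
    'Interest.Rate': [8.90, 12.12, np.nan, 9.99],
    'Amount.Requested': [20000.0, 19200.0, 35000.0, np.nan],
    'Employment.Length': ['< 1 year', None, '2 years', '5 years'],
})

def complete_cases(df, columns):
    """Return only the requested columns, keeping rows where all of them are present."""
    return df[columns].dropna().copy()

# Rows usable for a test on these two columns only.
pair = complete_cases(toy, ['Interest.Rate', 'Amount.Requested'])

# Rows left if every column had to be complete.
all_columns = toy.dropna()

print(len(pair), len(all_columns))
```

Here the pairwise subset keeps more rows than dropping on every column, because a missing `Employment.Length` does not affect a test that never uses that column.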

### (a) Interest rate varies with loan amount (less interest is charged for higher loan amounts).

In [32]:
# H0: Interest rate and loan amount are not correlated.
# Ha: Interest rate and loan amount are correlated.

# Confidence level: 95% (significance level alpha = 0.05)

# Keep only the rows where both columns are present.
loans_ = loans[['Interest.Rate', 'Amount.Requested']].dropna()

# Pearson correlation; the p-value returned by pearsonr is discarded here.
corr, _ = stats.pearsonr(loans_['Interest.Rate'], loans_['Amount.Requested'])
print('Correlation coefficient:', corr)

# Decision rule used here: treat |r| >= 0.5 as a strong correlation.
# Note: this thresholds the strength of r; it does not compare the p-value with alpha.
if abs(corr) >= 0.5:
    print('Reject Null Hypothesis. There is a strong correlation between Interest.Rate & Amount.Requested')
else:
    print('Retain Null Hypothesis. There is no strong correlation between Interest.Rate & Amount.Requested')

Correlation coefficient: 0.3324537662008248
Retain Null Hypothesis. There is no strong correlation between Interest.Rate & Amount.Requested
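
The cell above bases its decision on the size of the correlation coefficient and discards the p-value that `stats.pearsonr` returns. A significance test at the 5% level would instead compare that p-value with alpha. The sketch below illustrates this rule on synthetic data only (it does not use `LoansData.csv`); the variables `amount` and `rate` are made-up stand-ins, so the numbers it prints say nothing about the loans data.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for loan amount and interest rate (assumption: not real data).
rng = np.random.default_rng(0)
amount = rng.normal(size=500)
rate = 0.3 * amount + rng.normal(size=500)

alpha = 0.05  # significance level for a 95% confidence level

# pearsonr returns the coefficient and the two-sided p-value for H0: no correlation.
r, p_value = stats.pearsonr(rate, amount)
reject_null = p_value < alpha

print('Correlation coefficient:', r)
print('p-value:', p_value)
print('Reject H0' if reject_null else 'Fail to reject H0')
```

With this rule, the sign of `r` still matters for hypothesis (a): "less interest charged for high loan amounts" would require a negative coefficient, so a significant positive correlation would count against that claim rather than for it.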


### (b) Loan length directly affects interest rate.

In [26]:
loans

Unnamed: 0,Amount.Requested,Amount.Funded.By.Investors,Interest.Rate,Loan.Length,Loan.Purpose,Debt.To.Income.Ratio,State,Home.Ownership,Monthly.Income,FICO.Range,Open.CREDIT.Lines,Revolving.CREDIT.Balance,Inquiries.in.the.Last.6.Months,Employment.Length
0,20000.0,20000.00,8.90,36.0,debt_consolidation,14.90,SC,MORTGAGE,6541.67,735-739,14.0,14272.0,2.0,< 1 year
1,19200.0,19200.00,12.12,36.0,debt_consolidation,28.36,TX,MORTGAGE,4583.33,715-719,12.0,11140.0,1.0,2 years
2,35000.0,35000.00,21.98,60.0,debt_consolidation,23.81,CA,MORTGAGE,11500.00,690-694,14.0,21977.0,1.0,2 years
3,10000.0,9975.00,9.99,36.0,debt_consolidation,14.30,KS,MORTGAGE,3833.33,695-699,10.0,9346.0,0.0,5 years
4,12000.0,12000.00,11.71,36.0,credit_card,18.78,NJ,RENT,3195.00,695-699,11.0,14469.0,0.0,9 years
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2495,30000.0,29950.00,16.77,60.0,debt_consolidation,19.23,NY,MORTGAGE,9250.00,705-709,15.0,45880.0,1.0,8 years
2496,16000.0,16000.00,14.09,60.0,home_improvement,21.54,MD,OWN,8903.25,740-744,18.0,18898.0,1.0,10+ years
2497,10000.0,10000.00,13.99,36.0,debt_consolidation,4.89,PA,MORTGAGE,2166.67,680-684,4.0,4544.0,0.0,10+ years
2498,6000.0,6000.00,12.42,36.0,major_purchase,16.66,NJ,RENT,3500.00,675-679,8.0,7753.0,0.0,5 years
