# Indexing Loans

Read the loan data in from the CSV file and utilize it to answer various questions about the individuls taking our 3 year loans. 

Instructions:

1. Using the Pandas `read_csv` function and the Path module, read in the `loans.csv` file and create the Pandas DataFrame. Review the first five rows of the resulting DataFrame.

2. Using `iloc[]`, show the first `10` records of the data.

3. Generate the summary statistics for all of the `loans.csv` data.

4. Using `iloc[]`, create a subset DataFrame `filtered_df` by selecting only the following columns:

    * `loan_amnt`
    * `term`
    * `int_rate`
    * `emp_title`
    * `annual_inc`
    * `purpose`

5. Using `loc[]`, filter `filtered_df` by row values where `term` is equal to `36 months` in order to focus on only three-year loan records.

6. Modify rows with `term` values equal to `36 months` to be `3 years`.

7. Use the `isnull` and `sum` function to evaluate the number of missing values in the `term_df` DataFrame. Use the `fillna` function to replace the NaN values with 'Unknown'. Review the first five rows of the new DataFrame

8. Generate the summary statistics for `term_df` after all modifications.

9. Use the `value_counts()` function on the `emp_title` column of the `term_df` DataFrame to see the unique value counts for employee titles of three-year loan customers.

10. Use the `value_counts()` function on the `purpose` column of the `term_df` DataFrame to see the unique value counts for loan purposes of three-year loan customers.

11. Filter `term_df` by rows with `annual_inc` greater than `80000`. Use the `describe` function to see the mean `int_rate` of three-year loan customers with annual incomes greater than $80,000.

12. Filter `term_df` by rows with `annual_inc` less than `80000`. Use the `describe` function to see the average `int_rate` of three-year loan customers with annual incomes less than $80,000.

13. Answer the following questions about individuals taking out 3 year loans:

    * What kind of customers (employee title) seem to ask for three-year loans most frequently?

    * What are three-year loans generally used for?

    * What is the difference between counts of three-year loan customers with annual incomes greater than 80,000, compared to those with annual incomes less than 80,000?

    * What is the difference between interest rates for customers with annual incomes greater than 80,000 compared to those with annual incomes less than 80,000?

In [1]:
# Import libraries and dependencies
import pandas as pd
from pathlib import Path

## Using the Pandas `read_csv` function and the Path module, read in in the `loans.csv` file and create the Pandas DataFrame. Review the first five rows of the resulting DataFrame.

In [6]:
# Read in the CSV as a DataFrame
# YOUR CODE HERE
loans_df = pd.read_csv(
   
    Path("../Resources/loans.csv"),
)
# Review the first five rows of the DataFrame
# YOUR CODE HERE
loans_df.head

<bound method NDFrame.head of         loan_amnt  funded_amnt  funded_amnt_inv       term  int_rate  \
0           10000        10000          10000.0  36 months    0.1033   
1            4000         4000           4000.0  36 months    0.2340   
2            5000         5000           5000.0  36 months    0.1797   
3            9600         9600           9600.0  36 months    0.1298   
4            2500         2500           2500.0  36 months    0.1356   
...           ...          ...              ...        ...       ...   
128407      23000        23000          23000.0  36 months    0.1502   
128408      10000        10000          10000.0  36 months    0.1502   
128409       5000         5000           5000.0  36 months    0.1356   
128410      10000        10000           9750.0  36 months    0.1106   
128411      10000        10000          10000.0  36 months    0.1691   

        installment grade sub_grade             emp_title emp_length  \
0            324.23     B        

## Using the `iloc[]` function, show the first `10` records of the data.

In [29]:
# Retrieve rows with index 0 up to 10 (not including)
# YOUR CODE HERE
loans_df.iloc[1:10]
loans_df

Unnamed: 0,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,purpose
0,10000,10000,10000.0,36 months,0.1033,324.23,B,B1,,< 1 year,MORTGAGE,280000.0,Not Verified,Dec-18,Current,n,debt_consolidation
1,4000,4000,4000.0,36 months,0.2340,155.68,E,E1,Security,3 years,RENT,90000.0,Source Verified,Dec-18,Current,n,debt_consolidation
2,5000,5000,5000.0,36 months,0.1797,180.69,D,D1,Administrative,6 years,MORTGAGE,59280.0,Source Verified,Dec-18,Current,n,debt_consolidation
3,9600,9600,9600.0,36 months,0.1298,323.37,B,B5,,,MORTGAGE,35704.0,Not Verified,Dec-18,Current,n,home_improvement
4,2500,2500,2500.0,36 months,0.1356,84.92,C,C1,Chef,10+ years,RENT,55000.0,Not Verified,Dec-18,Current,n,debt_consolidation
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
128407,23000,23000,23000.0,36 months,0.1502,797.53,C,C3,Tax Consultant,10+ years,MORTGAGE,75000.0,Source Verified,Oct-18,Charged Off,n,debt_consolidation
128408,10000,10000,10000.0,36 months,0.1502,346.76,C,C3,security guard,5 years,MORTGAGE,38000.0,Not Verified,Oct-18,Current,n,debt_consolidation
128409,5000,5000,5000.0,36 months,0.1356,169.83,C,C1,Payoff Clerk,10+ years,MORTGAGE,35360.0,Not Verified,Oct-18,Current,n,debt_consolidation
128410,10000,10000,9750.0,36 months,0.1106,327.68,B,B3,,,RENT,44400.0,Source Verified,Oct-18,Current,n,credit_card


## Generate the summary statistics for all of the `loans_df` DataFrame.

In [18]:
# Generate the summary statistics for the DataFrame
# Be sure to include all the columns
# YOUR CODE HERE
loans_df.describe()
loans_df.columns

Index(['loan_amnt', 'funded_amnt', 'funded_amnt_inv', 'term', 'int_rate',
       'installment', 'grade', 'sub_grade', 'emp_title', 'emp_length',
       'home_ownership', 'annual_inc', 'verification_status', 'issue_d',
       'loan_status', 'pymnt_plan', 'purpose'],
      dtype='object')

## Using `iloc[]`, create a subset DataFrame `filtered_df` by selecting only the following columns:

* `loan_amnt`
* `term`
* `int_rate`
* `emp_title`
* `annual_inc`
* `purpose`

In [23]:
# Using the `iloc` function, create a DataFrame that consists of all rows of the columns:
# loan_amnt, term, int_rate, emp_title, annual_inc and purpose
# YOUR CODE HERE
filtered_df = loans_df.iloc[:,[0,3,4,8,11,-1]]

# Review the first five rows of the filtered DataFrame
# YOUR CODE HERE
filtered_df

Unnamed: 0,loan_amnt,term,int_rate,emp_title,annual_inc,purpose
0,10000,36 months,0.1033,,280000.0,debt_consolidation
1,4000,36 months,0.2340,Security,90000.0,debt_consolidation
2,5000,36 months,0.1797,Administrative,59280.0,debt_consolidation
3,9600,36 months,0.1298,,35704.0,home_improvement
4,2500,36 months,0.1356,Chef,55000.0,debt_consolidation
...,...,...,...,...,...,...
128407,23000,36 months,0.1502,Tax Consultant,75000.0,debt_consolidation
128408,10000,36 months,0.1502,security guard,38000.0,debt_consolidation
128409,5000,36 months,0.1356,Payoff Clerk,35360.0,debt_consolidation
128410,10000,36 months,0.1106,,44400.0,credit_card


## Using `loc[]`, filter `filtered_df` by row values where `term` is equal to `36 months` in order to focus on only three-year loan records.

In [30]:
# Conditional indexing to filter DataFrame where 'term' is equal to '36 months'
# YOUR CODE HERE
mask = filtered_df['term'] == '36 months'
term_df = filtered_df[mask]
# Review the first five rows of the term_df DataFrame
# YOUR CODE HERE
term_df.head()

Unnamed: 0,loan_amnt,term,int_rate,emp_title,annual_inc,purpose
0,10000,36 months,0.1033,,280000.0,debt_consolidation
1,4000,36 months,0.234,Security,90000.0,debt_consolidation
2,5000,36 months,0.1797,Administrative,59280.0,debt_consolidation
3,9600,36 months,0.1298,,35704.0,home_improvement
4,2500,36 months,0.1356,Chef,55000.0,debt_consolidation


## Modify rows with `term` values equal to `36 months` to be `3 years`.

In [28]:
# Change row values within the 'term' column from '36 months' to '3 Years'
# YOUR CODE HERE
term_df['term'] == '3 years' 

# Review the first five rows of the term_df DataFrame
# YOUR CODE HERE
term_df

Unnamed: 0,loan_amnt,term,int_rate,emp_title,annual_inc,purpose
0,10000,36 months,0.1033,,280000.0,debt_consolidation
1,4000,36 months,0.2340,Security,90000.0,debt_consolidation
2,5000,36 months,0.1797,Administrative,59280.0,debt_consolidation
3,9600,36 months,0.1298,,35704.0,home_improvement
4,2500,36 months,0.1356,Chef,55000.0,debt_consolidation
...,...,...,...,...,...,...
128407,23000,36 months,0.1502,Tax Consultant,75000.0,debt_consolidation
128408,10000,36 months,0.1502,security guard,38000.0,debt_consolidation
128409,5000,36 months,0.1356,Payoff Clerk,35360.0,debt_consolidation
128410,10000,36 months,0.1106,,44400.0,credit_card


## Use the `isnull` and `sum` function to evaluate the number of missing values in the `term_df` DataFramerame. Use the `fillna` function to replace the NaN values with 'Unknown'. Review the first five rows of the new DataFrame 

In [None]:
# Use the isnaull and sum functions to evaluate the number of missing values in the `term_df` DataFramerame.
# YOUR CODE HERE


In [None]:
#  Use the `fillna` function to replace the NaN values with 'Unknown'
# YOUR CODE HERE

# Review the first five rows of the cleaned term_df DataFrame
# YOUR CODE HERE


### Generate the summary statistics for `term_df` after all modifications.

In [None]:
# Describe summary statistics for three-year loans
# YOUR CODE HERE


## Use the `value_counts()` function on the `emp_title` column of the `term_df` DataFrame to see the unique value counts for employee titles of three-year loan customers.

In [None]:
# Calculate unique values and counts for employee titles of 3 year customer loans
# YOUR CODE HERE


## Use the `value_counts()` function on the `purpose` column of the `term_df` DataFrame to see the unique value counts for loan purposes of three-year loan customers.

In [None]:
# Calculate unique values and counts for loan purposes of 3 year customer loans
# YOUR CODE HERE


## Filter `term_df` by rows with `annual_inc` greater than `80000`. Use the `describe` function to see the mean `int_rate` of three-year loan customers with annual incomes greater than $80,000.

In [None]:
# Display summary statistics where annual income is greater than $80,000 to find count and mean
# YOUR CODE HERE


## Filter `term_df` by rows with `annual_inc` less than `80000`. Use the `describe` function to see the average `int_rate` of three-year loan customers with annual incomes less than $80,000.

In [None]:
# Display summary statistics where annual income is less than $80,000 to find count and mean
# YOUR CODE HERE


## Answer the following questions about individuals taking out 3 year loans:

1. What kind of customers (employee title) seem to ask for three-year loans most frequently?
2. What are three-year loans generally used for?
3. What is the difference in count of three-year loan customers with annual incomes greater than 80,000 compared to those with annual incomes less than 80,000?
4. What is the difference in interest rates of three-year loan customers with annual incomes greater than 80,000 compared to those with annual incomes less than 80,000? 

# YOUR ANSWERS HERE