# Part I - Exploring the Prosper Loan Dataset
## By Sondos Aabed

> My Github: [@sondosaabed](https://github.com/sondosaabed) 

> My LinkedIn: [@sondosaabed](https://www.linkedin.com/in/sondosaabed/)

<hr/>

## Abstract


<hr/>

## Table of Contents
- Introduction
    - Data Dictionary
- Objectives
- Preliminary Wrangling
- Univariate Exploration
- Bivariate Exploration
- Multivariate Exploration
- Conclusions

<hr/>

## Introduction

> Introduce the dataset

### Data Dictionary
The following data dictionary shows each variable of the dataset and the corresponding description:

| Variable                                | Description                                                                                                                                                                   |
|-----------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ListingKey                              | Unique key for each listing, same value as the 'key' used in the listing object in the API.                                                                                   |
| ListingNumber                           | The number that uniquely identifies the listing to the public as displayed on the website.                                                                                    |
| ListingCreationDate                     | The date the listing was created.                                                                                                                                             |
| CreditGrade                             | The Credit rating that was assigned at the time the listing went live. Applicable for listings pre-2009 period and will only be populated for those listings.                 |
| Term                                    | The length of the loan expressed in months.                                                                                                                                   |
| LoanStatus                              | The current status of the loan: Cancelled, Chargedoff, Completed, Current, Defaulted, FinalPaymentInProgress, PastDue. The PastDue status will be accompanied by a delinquency bucket. |
| ClosedDate                              | Closed date is applicable for Cancelled, Completed, Chargedoff and Defaulted loan statuses.                                                                                   |
| BorrowerAPR                             | The Borrower's Annual Percentage Rate (APR) for the loan.                                                                                                                     |
| BorrowerRate                            | The Borrower's interest rate for this loan.                                                                                                                                   |
| LenderYield                             | The Lender yield on the loan. Lender yield is equal to the interest rate on the loan less the servicing fee.                                                                  |
| EstimatedEffectiveYield                 | Effective yield is equal to the borrower interest rate (i) minus the servicing fee rate, (ii) minus estimated uncollected interest on charge-offs, (iii) plus estimated collected late fees.  Applicable for loans originated after July 2009. |
| EstimatedLoss                           | Estimated loss is the estimated principal loss on charge-offs. Applicable for loans originated after July 2009.                                                                |
| EstimatedReturn                         | The estimated return assigned to the listing at the time it was created. Estimated return is the difference between the Estimated Effective Yield and the Estimated Loss Rate. Applicable for loans originated after July 2009. |
| ProsperRating (numeric)                 | The Prosper Rating assigned at the time the listing was created: 0 - N/A, 1 - HR, 2 - E, 3 - D, 4 - C, 5 - B, 6 - A, 7 - AA.  Applicable for loans originated after July 2009.  |
| ProsperRating (Alpha)                   | The Prosper Rating assigned at the time the listing was created between AA - HR.  Applicable for loans originated after July 2009.                                            |
| ProsperScore                            | A custom risk score built using historical Prosper data. The score ranges from 1-10, with 10 being the best, or lowest risk score.  Applicable for loans originated after July 2009. |
| ListingCategory                         | The category of the listing that the borrower selected when posting their listing: 0 - Not Available, 1 - Debt Consolidation, 2 - Home Improvement, 3 - Business, 4 - Personal Loan, 5 - Student Use, 6 - Auto, 7 - Other, 8 - Baby&Adoption, 9 - Boat, 10 - Cosmetic Procedure, 11 - Engagement Ring, 12 - Green Loans, 13 - Household Expenses, 14 - Large Purchases, 15 - Medical/Dental, 16 - Motorcycle, 17 - RV, 18 - Taxes, 19 - Vacation, 20 - Wedding Loans |
| BorrowerState                           | The two letter abbreviation of the state of the address of the borrower at the time the Listing was created.                                                                   |
| Occupation                              | The Occupation selected by the Borrower at the time they created the listing.                                                                                                 |
| EmploymentStatus                        | The employment status of the borrower at the time they posted the listing.                                                                                                    |
| EmploymentStatusDuration                | The length in months of the employment status at the time the listing was created.                                                                                            |
| IsBorrowerHomeowner                     | A Borrower will be classified as a homeowner if they have a mortgage on their credit profile or provide documentation confirming they are a homeowner.                        |
| CurrentlyInGroup                        | Specifies whether or not the Borrower was in a group at the time the listing was created.                                                                                     |
| GroupKey                                | The Key of the group in which the Borrower is a member of. Value will be null if the borrower does not have a group affiliation.                                              |
| DateCreditPulled                        | The date the credit profile was pulled.                                                                                                                                       |
| CreditScoreRangeLower                   | The lower value representing the range of the borrower's credit score as provided by a consumer credit rating agency.                                                         |
| CreditScoreRangeUpper                   | The upper value representing the range of the borrower's credit score as provided by a consumer credit rating agency.                                                         |
| FirstRecordedCreditLine                 | The date the first credit line was opened.                                                                                                                                    |
| CurrentCreditLines                      | Number of current credit lines at the time the credit profile was pulled.                                                                                                     |
| OpenCreditLines                         | Number of open credit lines at the time the credit profile was pulled.                                                                                                        |
| TotalCreditLinespast7years              | Number of credit lines in the past seven years at the time the credit profile was pulled.                                                                                     |
| OpenRevolvingAccounts                   | Number of open revolving accounts at the time the credit profile was pulled.                                                                                                  |
| OpenRevolvingMonthlyPayment             | Monthly payment on revolving accounts at the time the credit profile was pulled.                                                                                              |
| InquiriesLast6Months                    | Number of inquiries in the past six months at the time the credit profile was pulled.                                                                                         |
| TotalInquiries                          | Total number of inquiries at the time the credit profile was pulled.                                                                                                          |
| CurrentDelinquencies                    | Number of accounts delinquent at the time the credit profile was pulled.                                                                                                      |
| AmountDelinquent                        | Dollars delinquent at the time the credit profile was pulled.                                                                                                                 |
| DelinquenciesLast7Years                 | Number of delinquencies in the past 7 years at the time the credit profile was pulled.                                                                                        |
| PublicRecordsLast10Years                | Number of public records in the past 10 years at the time the credit profile was pulled.                                                                                      |
| PublicRecordsLast12Months               | Number of public records in the past 12 months at the time the credit profile was pulled.                                                                                     |
| RevolvingCreditBalance                  | Dollars of revolving credit at the time the credit profile was pulled.                                                                                                        |
| BankcardUtilization                     | The percentage of available revolving credit that is utilized at the time the credit profile was pulled.                                                                      |
| AvailableBankcardCredit                 | The total available credit via bank card at the time the credit profile was pulled.                                                                                           |
| TotalTrades                             | Number of trade lines ever opened at the time the credit profile was pulled.                                                                                                  |
| TradesNeverDelinquent                   | Number of trades that have never been delinquent at the time the credit profile was pulled.                                                                                   |
| TradesOpenedLast6Months                 | Number of trades opened in the last 6 months at the time the credit profile was pulled.                                                                                       |
| DebtToIncomeRatio                       | The debt to income ratio of the borrower at the time the credit profile was pulled. This value is Null if the debt to income ratio is not available. This value is capped at 10.01 (any debt to income ratio larger than 1000% will be returned as 1001%). |
| IncomeRange                             | The income range of the borrower at the time the listing was created.                                                                                                         |
| IncomeVerifiable                        | The borrower indicated they have the required documentation to support their income.                                                                                          |
| StatedMonthlyIncome                     | The monthly income the borrower stated at the time the listing was created.                                                                                                   |
| LoanKey                                 | Unique key for each loan. This is the same key that is used in the API.                                                                                                       |
| TotalProsperLoans                       | Number of Prosper loans the borrower had at the time they created this listing. This value will be null if the borrower had no prior loans.                                   |
| TotalProsperPaymentsBilled              | Number of on time payments the borrower made on Prosper loans at the time they created this listing. This value will be null if the borrower had no prior loans.             |
| OnTimeProsperPayments                   | Number of on time payments the borrower had made on Prosper loans at the time they created this listing. This value will be null if the borrower has no prior loans.         |
| ProsperPaymentsLessThanOneMonthLate     | Number of payments the borrower made on Prosper loans that were less than one month late at the time they created this listing. This value will be null if the borrower had no prior loans. |
| ProsperPaymentsOneMonthPlusLate         | Number of payments the borrower made on Prosper loans that were greater than one month late at the time they created this listing. This value will be null if the borrower had no prior loans. |
| ProsperPrincipalBorrowed                | Total principal borrowed on Prosper loans at the time the listing was created. This value will be null if the borrower had no prior loans.                                     |
| ProsperPrincipalOutstanding             | Principal outstanding on Prosper loans at the time the listing was created. This value will be null if the borrower had no prior loans.                                        |
| ScorexChangeAtTimeOfListing             | Borrower's credit score change at the time the credit profile was pulled. This will be the change relative to the borrower's last Prosper loan. This value will be null if the borrower had no prior loans. |
| LoanCurrentDaysDelinquent               | The number of days delinquent.                                                                                                                                               |
| LoanFirstDefaultedCycleNumber           | The cycle the loan was charged off. If the loan has not charged off the value will be null.                                                                                   |
| LoanMonthsSinceOrigination              | Months since the loan originated.                                                                                                                                            |
| LoanNumber                              | The number that uniquely identifies the loan to the public as displayed on the website.                                                                                      |
| LoanOriginalAmount                      | The original amount of the loan.                                                                                                                                             |
| LoanOriginationDate                     | The date the loan originated.                                                                                                                                                |
| LoanOriginationQuarter                  | The quarter in which the loan originated.                                                                                                                                    |
| MemberKey                               | Unique key for each member. This is the same key that is used in the API.                                                                                                     |
| MonthlyLoanPayment                      | The monthly payment (principal and interest) the borrower is required to make for this loan.                                                                                 |
| LP_CustomerPayments                     | The total payments (principal + interest) that have been made on the loan by the borrower.                                                                                   |
| LP_CustomerPrincipalPayments            | The total principal payments that have been made on the loan by the borrower.                                                                                                |
| LP_InterestandFees                      | Interest and fees paid by the borrower.                                                                                                                                      |
| LP_ServiceFees                          | The servicing fees paid by the borrower.                                                                                                                                    |
| LP_CollectionFees                       | The collection fees paid by the borrower.                                                                                                                                   |
| LP_GrossPrincipalLoss                   | Gross principal loss on the loan.                                                                                                                                           |
| LP_NetPrincipalLoss                     | Net principal loss on the loan.                                                                                                                                             |
| LP_NonPrincipalRecoverypayments         | Non-principal recovery payments on the loan.                                                                                                                                |
| PercentFunded                           | The percentage of the loan that was funded.                                                                                                                                 |
| Recommendations                         | Number of recommendations the borrower had at the time they created the listing.                                                                                            |
| InvestmentFromFriendsCount              | Number of investments that were made by friends at the time the listing was created.                                                                                        |
| InvestmentFromFriendsAmount             | The dollar amount of investments that were made by friends at the time the listing was created.                                                                              |
| Investors                               | The number of investors that funded the loan.                                                                                                                               |
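
Several variables in the dictionary are coded or ordinal. As a minimal sketch, the `ProsperRating (numeric)` codes listed above can be mapped to their letter labels and stored as an ordered categorical, so that later sorting and plotting respect the rating order. The sample codes and the names `rating_labels`, `rating_order`, and `alpha` are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Code-to-label mapping copied from the data dictionary entry for ProsperRating (numeric).
rating_labels = {0: "N/A", 1: "HR", 2: "E", 3: "D", 4: "C", 5: "B", 6: "A", 7: "AA"}

# Hypothetical sample of numeric ratings (not real rows from the dataset).
codes = pd.Series([7, 1, 4, 6], name="ProsperRating (numeric)")

# Order the labels from worst (HR) to best (AA); code 0 (N/A) is left out of the ordering.
rating_order = [rating_labels[k] for k in sorted(rating_labels) if k != 0]

alpha = pd.Series(
    pd.Categorical(codes.map(rating_labels), categories=rating_order, ordered=True)
)
print(alpha.sort_values().tolist())
```

Because the categorical is ordered, `sort_values()` returns the ratings from `HR` up to `AA` rather than alphabetically.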


<hr/>

## Objectives

**1.  Loan Performance Analysis**

**2.  Credit Score and Borrower Analysis**

**3.  Geographic and Demographic Analysis**

<hr/>

## Preliminary Wrangling

In this section, preliminary data wrangling is performed on the dataset: loading it, inspecting its structure, and assessing and cleaning its data types and missing values.

In [1]:
## import all packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns



### Loading the dataset

> Let's load the dataset into a pandas DataFrame:

In [2]:
df = pd.read_csv("./data/prosperLoanData.csv") ## Load the csv into pandas dataframe
df.sample(10) ## Looking at a random sample of 10 rows.

Unnamed: 0,ListingKey,ListingNumber,ListingCreationDate,CreditGrade,Term,LoanStatus,ClosedDate,BorrowerAPR,BorrowerRate,LenderYield,...,LP_ServiceFees,LP_CollectionFees,LP_GrossPrincipalLoss,LP_NetPrincipalLoss,LP_NonPrincipalRecoverypayments,PercentFunded,Recommendations,InvestmentFromFriendsCount,InvestmentFromFriendsAmount,Investors
51080,EE133382453730991394AE6,101462,2007-02-19 14:47:48.407000000,C,36,Chargedoff,2008-04-25 00:00:00,0.16516,0.158,0.143,...,-61.12,0.0,7126.01,7126.0,0.0,1.0,0,0,0.0,117
51805,EFDE3602563345227AFAB56,1201557,2014-02-12 11:51:27.263000000,,60,Current,,0.1858,0.162,0.152,...,0.0,0.0,0.0,0.0,0.0,1.0,0,0,0.0,250
54120,E6F435580200677027BFB82,634417,2012-09-05 13:05:54.413000000,,60,Current,,0.15936,0.1364,0.1264,...,-153.32,0.0,0.0,0.0,0.0,1.0,0,0,0.0,26
105984,B68D3591588407896813095,950637,2013-10-02 15:57:32.433000000,,36,Completed,2014-03-04 00:00:00,0.33973,0.2999,0.2899,...,-15.67,0.0,0.0,0.0,0.0,1.0,0,0,0.0,1
24469,7BB53552243264578555FBF,612378,2012-07-16 13:13:05.020000000,,60,Current,,0.27554,0.2498,0.2398,...,-108.55,0.0,0.0,0.0,0.0,1.0,0,0,0.0,35
85150,CECA3602903020425670D1F,1206861,2014-02-13 20:38:59.030000000,,60,Current,,0.29594,0.2694,0.2594,...,0.0,0.0,0.0,0.0,0.0,1.0,0,0,0.0,15
89053,6A1C3383373935763251F7D,107392,2007-03-06 11:41:54.070000000,E,36,Defaulted,2008-11-16 00:00:00,0.22479,0.2099,0.1899,...,-48.93,-33.15,3310.39,3310.39,500.52,1.0,0,0,0.0,124
4080,678335436659049480426B0,578275,2012-04-13 05:23:51.167000000,,60,Completed,2013-11-19 00:00:00,0.31375,0.287,0.277,...,-72.61,0.0,0.0,0.0,0.0,1.0,0,0,0.0,9
94886,7C7935362525067699026FA,550852,2012-01-11 19:45:21.950000000,,36,Chargedoff,2013-01-20 00:00:00,0.35797,0.3177,0.3077,...,-22.11,-1.7,3485.89,3485.89,0.0,1.0,0,0,0.0,1
16018,583533899449152684870E7,141985,2007-05-23 15:10:17.637000000,D,36,Completed,2007-12-14 00:00:00,0.17711,0.1699,0.1599,...,-12.74,0.0,0.0,0.0,0.0,1.0,0,0,0.0,85


### Dataset Structure

In [3]:
df.shape ## showing the shape of the dataset

(113937, 81)

> This dataset has 113,937 rows and 81 columns, which makes it a relatively large dataset.

### Dataset Assessment and Cleaning

#### Data types Validity

- Assessment: Let's look at the data types of these variables and assess them using `.info()`:

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 113937 entries, 0 to 113936
Data columns (total 81 columns):
 #   Column                               Non-Null Count   Dtype  
---  ------                               --------------   -----  
 0   ListingKey                           113937 non-null  object 
 1   ListingNumber                        113937 non-null  int64  
 2   ListingCreationDate                  113937 non-null  object 
 3   CreditGrade                          28953 non-null   object 
 4   Term                                 113937 non-null  int64  
 5   LoanStatus                           113937 non-null  object 
 6   ClosedDate                           55089 non-null   object 
 7   BorrowerAPR                          113912 non-null  float64
 8   BorrowerRate                         113937 non-null  float64
 9   LenderYield                          113937 non-null  float64
 10  EstimatedEffectiveYield              84853 non-null   float64
 11  EstimatedLoss

> The columns `ClosedDate`, `LoanOriginationDate`, `DateCreditPulled`, and `ListingCreationDate` have the `object` type, but they should have a datetime type.

In [5]:
## List the date columns
date_columns = ['ClosedDate', 'LoanOriginationDate', 'DateCreditPulled', 'ListingCreationDate']

## Loop through the list and convert each column to the datetime data type
for col in date_columns:
    ## format='mixed' (available in pandas 2.0 and later) infers the format for each element individually
    df[col] = pd.to_datetime(df[col], format='mixed')

df.info() ## Check that the conversion was successful

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 113937 entries, 0 to 113936
Data columns (total 81 columns):
 #   Column                               Non-Null Count   Dtype         
---  ------                               --------------   -----         
 0   ListingKey                           113937 non-null  object        
 1   ListingNumber                        113937 non-null  int64         
 2   ListingCreationDate                  113937 non-null  datetime64[ns]
 3   CreditGrade                          28953 non-null   object        
 4   Term                                 113937 non-null  int64         
 5   LoanStatus                           113937 non-null  object        
 6   ClosedDate                           55089 non-null   datetime64[ns]
 7   BorrowerAPR                          113912 non-null  float64       
 8   BorrowerRate                         113937 non-null  float64       
 9   LenderYield                          113937 non-null  float64       
 

> Now the rest of the data types of the variables are valid.
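
Scanning the full `.info()` output for four columns is easy to get wrong, so the conversion can also be checked programmatically. The following is a minimal sketch on a toy frame: the timestamp strings only mimic the format seen in the sample rows, and `toy` and `not_converted` are hypothetical names. It parses without `format='mixed'` because each toy column is internally consistent.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Toy frame standing in for the loan data (illustrative values only).
toy = pd.DataFrame({
    "ListingCreationDate": ["2007-02-19 14:47:48.407000000",
                            "2014-02-12 11:51:27.263000000"],
    "ClosedDate": ["2008-04-25 00:00:00", None],  # missing dates become NaT
    "Term": [36, 60],
})

date_columns = ["ListingCreationDate", "ClosedDate"]
for col in date_columns:
    toy[col] = pd.to_datetime(toy[col])

# Collect the columns that did NOT end up as datetime; an empty list means success.
not_converted = [col for col in date_columns if not is_datetime64_any_dtype(toy[col])]
print(not_converted)
```

An empty list indicates that every column in `date_columns` now has a datetime type.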

#### Data Completeness

In [6]:
def get_percent_null(df):
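    """
    Return the percentage of null values for each column that has at least one,
    sorted in ascending order.
    """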
    null_counts = df.isnull().sum()
    null_counts = null_counts[null_counts > 0].sort_values()
    return (null_counts/df.shape[0])*100

In [7]:
get_percent_null(df)

BorrowerAPR                             0.021942
CreditScoreRangeUpper                   0.518708
CreditScoreRangeLower                   0.518708
PublicRecordsLast10Years                0.611742
CurrentDelinquencies                    0.611742
InquiriesLast6Months                    0.611742
TotalCreditLinespast7years              0.611742
FirstRecordedCreditLine                 0.611742
DelinquenciesLast7Years                 0.868901
TotalInquiries                          1.017229
EmploymentStatus                        1.979164
Occupation                              3.149109
BorrowerState                           4.840394
AvailableBankcardCredit                 6.621203
TradesOpenedLast6Months                 6.621203
TradesNeverDelinquent (percentage)      6.621203
TotalTrades                             6.621203
CurrentCreditLines                      6.673864
OpenCreditLines                         6.673864
PublicRecordsLast12Months               6.673864
RevolvingCreditBalan

> Many columns have null values, and some have a very high share of them. The following columns have the highest percentages of nulls (more than 50%), so they are dropped.

In [8]:
high_missing_percent = ['CreditGrade', 'ScorexChangeAtTimeOfListing', 'LoanFirstDefaultedCycleNumber', 'ClosedDate', 'TotalProsperLoans',
 'TotalProsperPaymentsBilled', 'OnTimeProsperPayments', 'ProsperPaymentsLessThanOneMonthLate', 'ProsperPaymentsOneMonthPlusLate', 
 'ProsperPrincipalBorrowed', 'ProsperPrincipalOutstanding']

In [9]:
df.drop(columns=high_missing_percent, inplace=True)

In [10]:
df.shape

(113937, 70)

> The drop is confirmed: the number of variables went from 81 to 70.
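
The list of columns to drop was written out by hand above. As an alternative sketch, it could be derived from the null percentages with a threshold. The toy frame, the name `high_missing`, and the reuse of a 50% cutoff are assumptions for illustration; `get_percent_null` is repeated so the block runs on its own.

```python
import numpy as np
import pandas as pd

def get_percent_null(df):
    """Return the percentage of nulls per column (only columns with nulls), ascending."""
    null_counts = df.isnull().sum()
    null_counts = null_counts[null_counts > 0].sort_values()
    return (null_counts / df.shape[0]) * 100

# Toy frame: CreditGrade is 75% null, BorrowerAPR is 25% null, Term is complete.
toy = pd.DataFrame({
    "CreditGrade": ["C", None, None, None],
    "BorrowerAPR": [0.165, 0.186, np.nan, 0.340],
    "Term": [36, 60, 60, 36],
})

percent_null = get_percent_null(toy)

# Keep only the column names whose null percentage exceeds the 50% threshold.
high_missing = percent_null[percent_null > 50].index.tolist()

reduced = toy.drop(columns=high_missing)
print(high_missing, reduced.shape)
```

Deriving the list this way keeps the drop in sync with the threshold if the data or the cutoff changes.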

Next, the remaining null values are handled. Here are the null percentages after the drop:

In [11]:
get_percent_null(df)

BorrowerAPR                            0.021942
CreditScoreRangeUpper                  0.518708
CreditScoreRangeLower                  0.518708
TotalCreditLinespast7years             0.611742
CurrentDelinquencies                   0.611742
FirstRecordedCreditLine                0.611742
PublicRecordsLast10Years               0.611742
InquiriesLast6Months                   0.611742
DelinquenciesLast7Years                0.868901
TotalInquiries                         1.017229
EmploymentStatus                       1.979164
Occupation                             3.149109
BorrowerState                          4.840394
AvailableBankcardCredit                6.621203
TradesOpenedLast6Months                6.621203
TotalTrades                            6.621203
TradesNeverDelinquent (percentage)     6.621203
PublicRecordsLast12Months              6.673864
RevolvingCreditBalance                 6.673864
BankcardUtilization                    6.673864
CurrentCreditLines                     6

> The following columns are identifiers, mostly unique to each loan. They are irrelevant to the analysis, so they are dropped too.

In [12]:
df.drop(columns=["ListingKey", "ListingNumber", "GroupKey", "LoanKey", "LoanNumber", "MemberKey", "DateCreditPulled"], inplace=True)

In [13]:
df.shape

(113937, 63)

> The drop is confirmed: the column count went down to 63.
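
The claim that identifier columns are mostly unique per loan can be checked by comparing the number of distinct values with the number of rows. This is a minimal sketch on hypothetical toy values; the 0.9 cutoff and the names `unique_ratio` and `id_like` are assumptions, not something the notebook defines.

```python
import pandas as pd

# Toy frame: one identifier-like column and one ordinary feature (illustrative values).
toy = pd.DataFrame({
    "ListingKey": ["EE13", "EFDE", "E6F4", "B68D"],
    "Term": [36, 60, 60, 36],
})

# Share of distinct values per column; a ratio near 1.0 suggests an identifier.
unique_ratio = toy.nunique() / len(toy)

# Assumed cutoff of 0.9 for flagging identifier-like columns.
id_like = unique_ratio[unique_ratio > 0.9].index.tolist()
print(id_like)
```

Columns flagged this way still deserve a manual look, since a continuous feature can also have a high share of distinct values.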

In [14]:
get_percent_null(df)

BorrowerAPR                            0.021942
CreditScoreRangeUpper                  0.518708
CreditScoreRangeLower                  0.518708
InquiriesLast6Months                   0.611742
TotalCreditLinespast7years             0.611742
FirstRecordedCreditLine                0.611742
PublicRecordsLast10Years               0.611742
CurrentDelinquencies                   0.611742
DelinquenciesLast7Years                0.868901
TotalInquiries                         1.017229
EmploymentStatus                       1.979164
Occupation                             3.149109
BorrowerState                          4.840394
AvailableBankcardCredit                6.621203
TradesOpenedLast6Months                6.621203
TotalTrades                            6.621203
TradesNeverDelinquent (percentage)     6.621203
PublicRecordsLast12Months              6.673864
RevolvingCreditBalance                 6.673864
BankcardUtilization                    6.673864
OpenCreditLines                        6

> Let's handle the columns with lower null percentages by dropping only the rows that contain NA values instead of dropping whole columns.

In [15]:
df.dropna(inplace=True)

In [16]:
get_percent_null(df)

Series([], dtype: float64)

> No missing values are left.

In [17]:
df.shape

(76216, 63)

> After handling missing values and dropping unnecessary columns, the data has 76,216 rows and 63 columns.
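
Dropping every row with a missing value is a sizeable cut, so it is worth quantifying. The short calculation below uses only the two row counts reported by `df.shape` before and after `dropna`; the variable names are illustrative.

```python
# Row counts taken from the df.shape outputs before and after dropna.
rows_before = 113937
rows_after = 76216

rows_dropped = rows_before - rows_after
share_dropped = rows_dropped / rows_before

print(f"{rows_dropped} rows dropped ({share_dropped:.1%})")
# prints: 37721 rows dropped (33.1%)
```

Roughly a third of the loans are removed by this step, which should be kept in mind when interpreting the distributions in the following sections.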

### Main Features of interest

In [18]:
df.columns

Index(['ListingCreationDate', 'Term', 'LoanStatus', 'BorrowerAPR',
       'BorrowerRate', 'LenderYield', 'EstimatedEffectiveYield',
       'EstimatedLoss', 'EstimatedReturn', 'ProsperRating (numeric)',
       'ProsperRating (Alpha)', 'ProsperScore', 'ListingCategory (numeric)',
       'BorrowerState', 'Occupation', 'EmploymentStatus',
       'EmploymentStatusDuration', 'IsBorrowerHomeowner', 'CurrentlyInGroup',
       'CreditScoreRangeLower', 'CreditScoreRangeUpper',
       'FirstRecordedCreditLine', 'CurrentCreditLines', 'OpenCreditLines',
       'TotalCreditLinespast7years', 'OpenRevolvingAccounts',
       'OpenRevolvingMonthlyPayment', 'InquiriesLast6Months', 'TotalInquiries',
       'CurrentDelinquencies', 'AmountDelinquent', 'DelinquenciesLast7Years',
       'PublicRecordsLast10Years', 'PublicRecordsLast12Months',
       'RevolvingCreditBalance', 'BankcardUtilization',
       'AvailableBankcardCredit', 'TotalTrades',
       'TradesNeverDelinquent (percentage)', 'TradesOpenedLast6M

In [None]:
df.shape

### Features for investigation 

> The following features

## Univariate Exploration

> In this section, investigate distributions of individual variables. If you see unusual points or outliers, take a deeper look to clean things up and prepare yourself to look at relationships between variables.

>**Rubric Tip**: Use the "Question-Visualization-Observations" framework  throughout the exploration. This framework involves **asking a question from the data, creating a visualization to find answers, and then recording observations after each visualisation.** 

> **Rubric Tip**: This part (Univariate Exploration) should include at least one histogram, and either a bar chart or count plot.

>**Rubric Tip**: Visualizations should depict the data appropriately so that the plots are easily interpretable. You should choose an appropriate plot type, data encodings, and formatting as needed. The formatting may include setting/adding the title, labels, legend, and comments. Also, do not overplot or incorrectly plot ordinal data.

### Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?

> Your answer here!

### Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

> Your answer here!

## Bivariate Exploration

> In this section, investigate relationships between pairs of variables in your data. Make sure the variables that you cover here have been introduced in some fashion in the previous section (univariate exploration).

> **Rubric Tip**: This part (Bivariate Exploration) should include at least one scatter plot, one box plot, and at least one clustered bar chart or heat map.

### Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

> Your answer here!

### Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

> Your answer here!

## Multivariate Exploration

> Create plots of three or more variables to investigate your data even further. Make sure that your investigations are justified, and follow from your work in the previous sections.

> **Rubric Tip**: This part (Multivariate Exploration) should include at least one Facet Plot, and one Plot Matrix or Scatterplot with multiple encodings.

>**Rubric Tip**: Think carefully about how you encode variables. Choose appropriate color schemes, markers, or even how Facets are chosen. Also, do not overplot or incorrectly plot ordinal data.

### Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

> Your answer here!

### Were there any interesting or surprising interactions between features?

> Your answer here!

## Conclusions
>You can write a summary of the main findings and reflect on the steps taken during the data exploration.

> **Rubric Tip**: Create a list of summary findings to make it easy to review.

> Remove all Tips mentioned above, before you convert this notebook to PDF/HTML.


> At the end of your report, make sure that you export the notebook as an html file from the `File > Download as... > HTML or PDF` menu. Make sure you keep track of where the exported file goes, so you can put it in the same folder as this notebook for project submission. Also, make sure you remove all of the quote-formatted guide notes like this one before you finish your report!

