
## Example 1: Peer-to-Peer Lending (Finance)

### The Business Model

Peer-to-peer (P2P) lending occurs when investors lend money directly to individuals or businesses through an online service. The online service provider matches lenders with borrowers and conducts the analysis required to determine the interest rate charged to the borrower and the risk incurred by the lender. Peer-to-peer lending usually has lower operating costs, so investors tend to get higher returns and borrowers lower loan rates, although this is not always the case.

### Company: Lending Club

**Lending Club** is a peer-to-peer lending company based in the US. It matches people looking to invest money with people looking to borrow money. When investors invest through Lending Club, their money is passed on to borrowers, and when borrowers repay their loans, the capital plus the interest passes back to the investors. The product is known as an unsecured personal loan. To learn more about Lending Club, visit their [website](https://www.lendingclub.com/).

### The Dataset

The Lending Club dataset contains complete loan data for all loans issued from 2007 through 2015, including the current loan status (Current, Late, Fully Paid, etc.) and the latest payment information. Features include credit scores, number of finance inquiries, address (including zip code and state), and collections, among others. The collections feature indicates whether the customer has missed one or more payments and the team is trying to recover the money.

The dataset contains about 890 thousand observations and 75 variables. More detail on this dataset can be found on [Kaggle's website](https://www.kaggle.com/wendykan/lending-club-loan-data).

### Download and Save

To download the dataset:

- Go to the [Kaggle Website](https://www.kaggle.com/wendykan/lending-club-loan-data)
- Scroll down and click on the file "loan.csv"
- Click the "Download" button at the top of the screen
- Unzip the file
- Keep the dataset name as "loan.csv"
- Save the file in the parent directory of the folder where you store your notebooks

**Note the following:**
- You need to be logged in to Kaggle to download the dataset.
- You may need to accept the terms and conditions.
- Save the dataset where it is indicated by the "SAVE_DATASETS_HERE.txt" file in the Jupyter Notebooks folder
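
Once the file is in place, a quick load confirms that the notebooks can find it. The following is a minimal sketch, assuming the notebook runs from the notebooks folder so that the dataset sits at `../loan.csv`; the helper name `load_loans` is hypothetical and not part of the course code.

```python
from pathlib import Path

import pandas as pd


def load_loans(path="../loan.csv", nrows=None):
    """Load the Lending Club CSV; nrows limits the rows read for a quick look."""
    path = Path(path)
    if not path.exists():
        # Fail early with the resolved location so a misplaced file is easy to spot
        raise FileNotFoundError(f"Expected the dataset at {path.resolve()}")
    # low_memory=False parses each column in one pass, avoiding mixed-type inference
    return pd.read_csv(path, nrows=nrows, low_memory=False)
```

Calling `load_loans(nrows=1000).shape` reads only the first thousand rows, which is enough to check the column count against the 75 variables mentioned above without loading the full file.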

## Example 2: Predicting Survival on the Titanic

### History
Perhaps one of the most infamous shipwrecks in history, the Titanic sank after colliding with an iceberg, killing 1502 of the 2224 people on board. Interestingly, by analysing the probability of survival based on a few attributes such as gender, age, and social status, we can make very accurate predictions about which passengers would survive. Some groups of people, such as women, children, and the upper class, were more likely to survive than others. We can therefore learn about society's priorities and privileges at the time.

### Dataset


### Download and Save

In [1]:
import pandas as pd
import numpy as np

In [2]:
data = pd.read_csv('https://www.openml.org/data/get_csv/16826755/phpMYEkMl')
data.head()

| | pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0 | 0 | 0 | 24160 | 211.3375 | B5 | S | 2 | ? | St Louis, MO |
| 1 | 1 | 1 | Allison, Master. Hudson Trevor | male | 0.9167 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | 11 | ? | Montreal, PQ / Chesterville, ON |
| 2 | 1 | 0 | Allison, Miss. Helen Loraine | female | 2.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | ? | ? | Montreal, PQ / Chesterville, ON |
| 3 | 1 | 0 | Allison, Mr. Hudson Joshua Creighton | male | 30.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | ? | 135 | Montreal, PQ / Chesterville, ON |
| 4 | 1 | 0 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 25.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | ? | ? | Montreal, PQ / Chesterville, ON |
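
With the data loaded, the claim that some groups were more likely to survive can be checked directly. The sketch below is one possible way to do so, assuming a DataFrame with a 0/1 `survived` column as shown above; the helper name `survival_rate_by` is hypothetical.

```python
import pandas as pd


def survival_rate_by(df, columns):
    """Return the share of survivors and the group size for each value of `columns`."""
    return (
        df.groupby(columns)["survived"]
        # The mean of a 0/1 column is the fraction of ones, i.e. the survival rate
        .agg(survival_rate="mean", passengers="count")
        .sort_values("survival_rate", ascending=False)
    )
```

For example, `survival_rate_by(data, "sex")` compares women and men, and `survival_rate_by(data, ["pclass", "sex"])` breaks the rate down by passenger class as well.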


In [3]:
data = data.replace('?', np.nan)
data.isnull().sum()

pclass          0
survived        0
name            0
sex             0
age           263
sibsp           0
parch           0
ticket          0
fare            1
cabin        1014
embarked        2
boat          823
body         1188
home.dest     564
dtype: int64
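
Because the file marks missing values with `'?'`, numeric columns such as `age` and `fare` are likely to be read as text, and replacing `'?'` afterwards does not change their type. An alternative, which is not what the notebook does, is to declare the marker when reading through the `na_values` parameter of `pd.read_csv`. The sketch below uses a two-row toy sample (the second row is made up for illustration) rather than the real file.

```python
import io

import pandas as pd

# Toy sample in the same layout as the Titanic file, with '?' marking missing values
sample = io.StringIO(
    "pclass,survived,age,fare,cabin\n"
    "1,1,29.0,211.3375,B5\n"
    "3,0,?,?,?\n"
)

# na_values tells the parser to treat '?' as missing while reading
df = pd.read_csv(sample, na_values="?")

print(df.dtypes)          # age and fare are parsed as floats rather than text
print(df.isnull().sum())  # one missing value each in age, fare and cabin
```

If the notebook's approach is kept, `pd.to_numeric(data['age'])` is one way to convert a column after the replacement.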

In [4]:
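def get_first_cabin(row):
    # Keep only the first cabin when several are listed (e.g. 'C22 C26' -> 'C22')
    try:
        return row.split()[0]
    except (AttributeError, IndexError):
        # Missing values (NaN) are floats and have no .split(); keep them as NaN
        return np.nan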

In [5]:
data['cabin'] = data['cabin'].apply(get_first_cabin)
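
The same result can probably be obtained without a helper function by using pandas string methods, which pass missing values through untouched. This is an alternative sketch on a toy Series, not the notebook's own approach.

```python
import numpy as np
import pandas as pd

# Toy cabin values: several cabins, a missing value, and a single cabin
cabins = pd.Series(["C22 C26", np.nan, "B5"])

# .str methods skip missing values, so no try/except is needed
first_cabin = cabins.str.split().str[0]
print(first_cabin.tolist())
```

Applied to the notebook's data, the equivalent line would be `data['cabin'] = data['cabin'].str.split().str[0]`.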

In [6]:
data.to_csv('../titanic.csv', index=False)
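
A quick sanity check after saving is to read the file back and compare it with the DataFrame in memory. The sketch below assumes nothing beyond pandas and the standard library; the helper name `save_and_check` is hypothetical, and the example writes a toy frame to a temporary directory instead of `../titanic.csv`.

```python
import os
import tempfile

import pandas as pd


def save_and_check(df, path):
    """Write df to CSV without the index, read it back, and compare shape and columns."""
    # index=False keeps the row index from being written as an extra column
    df.to_csv(path, index=False)
    reloaded = pd.read_csv(path)
    assert reloaded.shape == df.shape, "row or column count changed on reload"
    assert list(reloaded.columns) == list(df.columns), "column names changed on reload"
    return reloaded


# Toy frame written to a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    toy = pd.DataFrame({"pclass": [1, 3], "cabin": ["C22", None]})
    reloaded = save_and_check(toy, os.path.join(tmp, "titanic.csv"))
```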