# Python Data Science & Analysis 
### Project: Credit Risk Assessment 

# Abstract

You have been hired as part of a data science team at a fintech start-up. The start-up has offered loans of £1,000 each to 1,000 customers in various groups of interest in order to collect data on their likelihood of repayment.

Your role is to build an explanatory account or predictive model of the factors that lead to loan default, and so advise the new company on its loan strategy.

The company has collected the following data:

```
 "ID",         Customer ID
 "Income",     Annual Pre-Tax Income on application
 "Term",       Short or Long Term (6mo or 12mo)
 "Balance",    Current Account Balance on application
 "Debt",       Outstanding Debt on application
 "Score",      Credit Score (from referencing agency)
 "Default"     Observed Default (True = Default, False = Settle)
```
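The field list above maps naturally onto a named record. As a minimal sketch (the `Loan` class and its lowercase field names are illustrative assumptions, not part of the company's data), a `NamedTuple` lets code refer to fields by name while still behaving like a plain tuple:

```python
from typing import NamedTuple, Optional


class Loan(NamedTuple):
    """Hypothetical record type mirroring the columns listed above."""
    id: int
    income: float
    term: str                # "Short Term" or "Long Term"
    balance: float
    debt: float
    score: Optional[float]   # assumption: a score may be missing
    default: bool            # True = default, False = settle


# Example values taken from the single customer used in Part 1.
example = Loan(690, 14300., "Short Term", 1190., 87., 63., False)

print(example.score, example.default)   # access by field name
print(example[1])                        # positional access still works
```

Because a `NamedTuple` is still a tuple, the indexing and unpacking used in the rest of this notebook would work on it unchanged.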

# Part 1: Rules

Your first project is to consider the application of a single customer and prototype rules which could predict whether they defaulted or not. 

```python

customer = (690, 14300., 'Short Term', 1190., 87., 63., False)

```

### Q. Define a variable `columns` to hold the column names

Goal: your code should contain a variable which lists the names of the columns as strings.

In [1]:
columns = ["ID", "Income", "Term", "Balance", "Debt", "Score", "Default"]
print(columns)

['ID', 'Income', 'Term', 'Balance', 'Debt', 'Score', 'Default']


### Q. Define `customer` as above

Goal: include the customer variable defined above. 

In [2]:
customer = (690, 14300., "Short Term", 1190., 87., 63., False)
print(customer)

(690, 14300.0, 'Short Term', 1190.0, 87.0, 63.0, False)


### Q. Print the customer details out (the field name and value)

Goal: your code should `print()` details of the customer's loan. 

```
SAMPLE OUTPUT:

ID 690
Income 14300.0
Term Short Term
Balance 1190.0
Debt 87.0
Score 63.0
Default False
```

In [3]:
# Pair each column name with the matching customer value
for name, value in zip(columns, customer):
    print(f"{name}\t{value}")

ID	690
Income	14300.0
Term	Short Term
Balance	1190.0
Debt	87.0
Score	63.0
Default	False


### Q. Print a prediction and observation

1. Goal: print the observed default 
---
2. Goal: compute and print a prediction
    * e.g., use an `if` to test whether the customer's score is $< 200$
        * print `True` for the prediction
        * otherwise, `False`
    
    

```
SAMPLE OUTPUT:

observation: False
prediction: True
```

In [4]:
print(f"observation: {customer[-1]}")

print(f"prediction: {customer[-2] < 200}")

observation: False
prediction: True


### Q. Improve the prototype rule: consider income

* Goal: Include a condition on the customer income

```
SAMPLE OUTPUT:

observation: False
prediction: True
```

In [5]:
print(f"observation: {customer[-1]}")

print(f"prediction: {customer[1] < 25000 or customer[-2] < 200}")

observation: False
prediction: True


### Q. Improve the prototype rule: consider term

* Goal: Include a condition on the customer term

```
SAMPLE OUTPUT:

observation: False
prediction: True
```

In [6]:
print(f"observation: {customer[-1]}")

print(f"prediction: {customer[1] < 25000 or 'Long' in customer[2]}")

observation: False
prediction: True
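
The three prototype rules above can also be folded into one function. The sketch below is one possible combination, not the required answer: the name `predict_default` is hypothetical, the thresholds are the illustrative values already used in the cells above, and treating a missing score as "not flagged" is an assumption.

```python
def predict_default(income, term, score):
    """Return True if any prototype rule flags the applicant as a likely default.

    Thresholds (score < 200, income < 25000) are the illustrative values
    from the cells above, not tuned values.
    """
    low_score = score is not None and score < 200   # assumption: no score, no flag
    low_income = income < 25000
    long_term = "Long" in term
    return low_score or low_income or long_term


customer = (690, 14300., "Short Term", 1190., 87., 63., False)
_, income, term, _, _, score, observed = customer

print(f"observation: {observed}")
print(f"prediction: {predict_default(income, term, score)}")
```

For this customer the function prints the same observation and prediction as the cells above. Keeping the rule in a function means it can be reused unchanged once there is more than one customer.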


# Part 2: Datasets

The company now provides a partial dataset. 

Your task is to apply your rules above and estimate your prediction error using them. 

In [7]:
loans = [
    (690, 14300., "Short Term", 1190., 87., 63., False),
    (75, 30000., "Short Term", 5000., 0., 250., False),
    (167, 25000., "Long Term", 2439., 200., 102., True),
    (279, 19000., "Long Term", 500., 1000., 200., True),
    (397, 40000., "Short Term", 2000., 0., None, False),
    (827, 23050., "Long Term", 100., 1500., 270., True),
]

### Q. Print out customer details

Goal: your code should `print()` all loan details. 

```
SAMPLE OUTPUT:

ID: 215  	 Income: 37900.0
ID: 442  	 Income: 78700.0
ID: 22  	 Income: 41900.0
ID: 711  	 Income: 24600.0
ID: 113  	 Income: 33900.0
ID: 91  	 Income: 23200.0
ID: 268  	 Income: 17700.0
ID: 735  	 Income: 37100.0
ID: 971  	 Income: 35300.0
ID: 858  	 Income: 16700.0
```

In [8]:
for cid, income, term, balance, debt, score, default in loans:
    print(f"ID: {cid}  \t  Income: {income}")

ID: 690  	  Income: 14300.0
ID: 75  	  Income: 30000.0
ID: 167  	  Income: 25000.0
ID: 279  	  Income: 19000.0
ID: 397  	  Income: 40000.0
ID: 827  	  Income: 23050.0


### Q. Print the score

Goal: print each customer's score if it exists.

```
SAMPLE OUTPUT:

Income: 37900	ID: 215  	 Score: 595
Income: 78700	ID: 442  	 Score: 1000
Income: 41900	ID: 22  	 Score: 372
Income: 24600	ID: 711  	 Score: 385
Income: 33900	ID: 113  	 Score: 456
Income: 23200	ID: 91  	 Score: 264
Income: 17700	ID: 268  	 Score: 289
Income: 37100	ID: 735  	 Score: 661
Income: 16700	ID: 858  	 Score: 201
```

In [9]:
for cid, income, term, balance, debt, score, default in loans:
    if score is None:
        continue  # skip customers without a credit score
    print(f"Income: {income:.0f} \t ID: {cid} \t Score: {score:.0f}")

Income: 14300 	 ID: 690 	 Score: 63
Income: 30000 	 ID: 75 	 Score: 250
Income: 25000 	 ID: 167 	 Score: 102
Income: 19000 	 ID: 279 	 Score: 200
Income: 23050 	 ID: 827 	 Score: 270


### Q. Include predictions

* Goal: Compute a prediction using a rule above
---
* Goal: print all predictions.
---
* `print()` whether the prediction matches the observation

```
SAMPLE OUTPUT:

Income: 37900		Score: 595	Error: True
Income: 78700		Score: 1000	Error: True
Income: 41900		Score: 372	Error: True
Income: 24600		Score: 385	Error: True
Income: 33900		Score: 456	Error: True
Income: 23200		Score: 264	Error: True
Income: 17700		Score: 289	Error: False
Income: 37100		Score: 661	Error: True
Income: 16700		Score: 201	Error: False
```

In [10]:
for cid, income, term, balance, debt, score, default in loans:
    if score is None:
        continue  # no score, so the rule cannot be applied
    prediction = score < 200
    # As in the sample output, the "Error" column is True when prediction and observation agree
    print(f"Income: {int(income)} \t Score: {int(score)} \t Default: {default}  \t Prediction: {prediction} \t Error: {prediction == default}")

Income: 14300 	 Score: 63 	 Default: False  	 Prediction: True 	 Error: False
Income: 30000 	 Score: 250 	 Default: False  	 Prediction: False 	 Error: True
Income: 25000 	 Score: 102 	 Default: True  	 Prediction: True 	 Error: True
Income: 19000 	 Score: 200 	 Default: True  	 Prediction: False 	 Error: False
Income: 23050 	 Score: 270 	 Default: True  	 Prediction: False 	 Error: False


### Q. Include an error

Goal: Start a running error total from `0` before the loop.

---
Goal: Modify the loop so that the error total increases by one whenever the prediction does not match the observation.


```
SAMPLE OUTPUT:

Income: 37900		Score: 595	Error: True
Income: 78700		Score: 1000	Error: True
Income: 41900		Score: 372	Error: True
Income: 24600		Score: 385	Error: True
Income: 33900		Score: 456	Error: True
Income: 23200		Score: 264	Error: True
Income: 17700		Score: 289	Error: False
Income: 37100		Score: 661	Error: True
Income: 16700		Score: 201	Error: False

Total Error: 2
```

In [11]:
error = 0
for cid, income, term, balance, debt, score, default in loans:
    if score is None:
        continue  # unscored loans are skipped and add no error
    prediction = score < 200
    print(f"Income: {int(income)}    Score: {int(score)} \t Error: {prediction == default}")
    if prediction != default:
        error += 1

print()
print(f"Total Error: {error}")

Income: 14300    Score: 63 	 Error: False
Income: 30000    Score: 250 	 Error: True
Income: 25000    Score: 102 	 Error: True
Income: 19000    Score: 200 	 Error: False
Income: 23050    Score: 270 	 Error: False

Total Error: 3
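
The running total counts every mistake alike. As an optional sketch, the same loop can also record which way the rule was wrong. The labels `"false alarm"` and `"missed default"` are names chosen here for illustration, and the `score < 200` rule is the one used above.

```python
from collections import Counter

loans = [
    (690, 14300., "Short Term", 1190., 87., 63., False),
    (75, 30000., "Short Term", 5000., 0., 250., False),
    (167, 25000., "Long Term", 2439., 200., 102., True),
    (279, 19000., "Long Term", 500., 1000., 200., True),
    (397, 40000., "Short Term", 2000., 0., None, False),
    (827, 23050., "Long Term", 100., 1500., 270., True),
]

outcomes = Counter()
for cid, income, term, balance, debt, score, default in loans:
    if score is None:
        continue  # same choice as above: unscored loans are skipped
    prediction = score < 200
    if prediction == default:
        outcomes["correct"] += 1
    elif prediction:
        outcomes["false alarm"] += 1      # predicted default, customer settled
    else:
        outcomes["missed default"] += 1   # predicted settle, customer defaulted

for label, count in outcomes.items():
    print(f"{label}: {count}")

total_error = outcomes["false alarm"] + outcomes["missed default"]
print(f"Total Error: {total_error}")
```

The two kinds of mistake add up to the same total error as the running count, so this is a finer view of the same result rather than a different measure.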


### Q. Report an accuracy score

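Goal: `print()` an accuracy score as a percentage.

Compute the accuracy as $1 - \frac{error}{N_{loans}}$, where $error$ is the running total from the previous question and $N_{loans}$ is the number of loans in the dataset.

Scale the result to a score out of $100\%$.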


```
SAMPLE OUTPUT:

Score: 80 %
```

In [12]:
n = len(loans)
accuracy = 1 - (error/n)
print(f"Score: {int(accuracy*100)} %")

Score: 50 %
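
Part 2 asks for the error of the rules from Part 1, so it may help to score each rule with the same formula. This is a sketch under stated assumptions: the function names `score_rule`, `income_rule`, `term_rule` and `accuracy` are hypothetical, the thresholds are the illustrative ones from Part 1, and a rule returns `None` when it cannot make a prediction (here, when the score is missing).

```python
loans = [
    (690, 14300., "Short Term", 1190., 87., 63., False),
    (75, 30000., "Short Term", 5000., 0., 250., False),
    (167, 25000., "Long Term", 2439., 200., 102., True),
    (279, 19000., "Long Term", 500., 1000., 200., True),
    (397, 40000., "Short Term", 2000., 0., None, False),
    (827, 23050., "Long Term", 100., 1500., 270., True),
]


def score_rule(loan):
    score = loan[5]
    return None if score is None else score < 200


def income_rule(loan):
    return loan[1] < 25000


def term_rule(loan):
    return "Long" in loan[2]


def accuracy(rule, loans):
    """Compute 1 - error / N_loans; a loan the rule cannot predict adds no error."""
    error = 0
    for loan in loans:
        prediction = rule(loan)
        if prediction is not None and prediction != loan[-1]:
            error += 1
    return 1 - error / len(loans)


for rule in (score_rule, income_rule, term_rule):
    print(f"{rule.__name__}: {round(accuracy(rule, loans) * 100)} %")
```

As in the cell above, the unscored loan still counts towards $N_{loans}$, which matches the formula as stated. Dividing by the number of scored loans instead is an alternative choice and would lower the figure for the score rule. With only six loans, any difference between the rules should be read as provisional.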
