
# Student Loan Risk with Deep Learning

In [None]:
# Imports
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Sequential
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report
from pathlib import Path
import sklearn as skl
from tensorflow.keras.optimizers import Adam, RMSprop, SGD, Adagrad

---

## Prepare the data to be used on a neural network model

### Step 1: Read the `student-loans.csv` file into a Pandas DataFrame. Review the DataFrame, looking for columns that could eventually define your features and target variables.   

In [None]:
# Read the csv into a Pandas DataFrame
file_path = "https://static.bc-edx.com/ai/ail-v-1-0/m18/lms/datasets/student-loans.csv"
loans_df = pd.read_csv(file_path)

# Review the DataFrame
loans_df.head()

Unnamed: 0,payment_history,location_parameter,stem_degree_score,gpa_ranking,alumni_success,study_major_code,time_to_completion,finance_workshop_score,cohort_ranking,total_loan_score,financial_aid_score,credit_ranking
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,0
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,0
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,0
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,1
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,0


In [None]:
# Review the data types associated with the columns
loans_df.dtypes

Unnamed: 0,0
payment_history,float64
location_parameter,float64
stem_degree_score,float64
gpa_ranking,float64
alumni_success,float64
study_major_code,float64
time_to_completion,float64
finance_workshop_score,float64
cohort_ranking,float64
total_loan_score,float64


In [None]:
# Check the credit_ranking value counts
loans_df["credit_ranking"].value_counts()

credit_ranking,count
1,855
0,744
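
The class counts above give a useful reference point for judging the model later: a classifier that always predicts the majority class sets the accuracy floor. The sketch below works that out using only the two counts shown in the output; the variable names are illustrative and not part of the notebook.

```python
# Class counts copied from the value_counts() output above
counts = {1: 855, 0: 744}

total = sum(counts.values())
majority_class = max(counts, key=counts.get)

# Accuracy of a model that always predicts the majority class
baseline_accuracy = counts[majority_class] / total

print(f"rows: {total}")
print(f"majority class: {majority_class}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.4f}")
```

Any trained model should beat this baseline by a clear margin to be considered useful.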


### Step 2: Using the preprocessed data, create the features (`X`) and target (`y`) datasets. The target dataset should be defined by the preprocessed DataFrame column “credit_ranking”. The remaining columns should define the features dataset.

In [None]:
# Find the number of unique values in each column
unique_counts = loans_df.nunique()

# Print the number of unique values for each column
print("Number of unique values in each column:")
print(unique_counts)


Number of unique values in each column:
payment_history            96
location_parameter        143
stem_degree_score          80
gpa_ranking                91
alumni_success            153
study_major_code           60
time_to_completion        144
finance_workshop_score    436
cohort_ranking             89
total_loan_score           96
financial_aid_score        65
credit_ranking              2
dtype: int64


In [None]:
# Define the target set y using the credit_ranking column
y = loans_df["credit_ranking"]

# Define the features set X by selecting all columns except credit_ranking
X = loans_df.drop(columns="credit_ranking")

# Review the features DataFrame
X.head()

Unnamed: 0,payment_history,location_parameter,stem_degree_score,gpa_ranking,alumni_success,study_major_code,time_to_completion,finance_workshop_score,cohort_ranking,total_loan_score,financial_aid_score
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4


### Step 3: Split the features and target sets into training and testing datasets.


In [None]:
# Split the preprocessed data into a training and testing dataset
# Assign the function a random_state equal to 1
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
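
Because no `test_size` is passed, `train_test_split` falls back to its default of 25% for the test set. The arithmetic below is a sketch of how the split sizes follow from the 1,599 rows (855 + 744) reported by the value counts, and how they relate to the `38/38` and `13/13` step counts in the training and evaluation logs, assuming Keras' default batch size of 32. The variable names are illustrative.

```python
import math

n_samples = 855 + 744   # total rows, from the credit_ranking value counts
test_size = 0.25        # scikit-learn default when no size is specified
batch_size = 32         # Keras default batch size (assumed, not set in the notebook)

# scikit-learn rounds the test set up and gives the remainder to training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# Number of batches per epoch (training) and per evaluation pass (testing)
train_steps = math.ceil(n_train / batch_size)
test_steps = math.ceil(n_test / batch_size)

print(n_train, n_test, train_steps, test_steps)
```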

### Step 4: Use scikit-learn's `StandardScaler` to scale the features data.

In [None]:
# Create a StandardScaler instance
X_scaler = StandardScaler()

# Fit the scaler to the features training dataset
X_scaler.fit(X_train)

# Scale the training and testing features with the fitted scaler
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
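
To make the fit-then-transform order concrete, here is a minimal pure-Python sketch of the same standardization for a single column. The helper names and sample values are hypothetical; the key point is that the mean and standard deviation are computed from the training values only and then reused on the test values, so no information from the test set leaks into the scaling.

```python
import math

def fit_standardizer(values):
    """Return the mean and population standard deviation of one column."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)

def standardize(values, mean, std):
    """Scale values to z-scores using previously fitted statistics."""
    return [(v - mean) / std for v in values]

train_col = [7.4, 7.8, 7.8, 11.2]  # hypothetical training values
test_col = [7.4, 9.0]              # hypothetical test values

# Fit on the training column only, then apply to both columns
mean, std = fit_standardizer(train_col)
train_scaled = standardize(train_col, mean, std)
test_scaled = standardize(test_col, mean, std)

print(train_scaled)
print(test_scaled)
```

The scaled training column has mean 0 and unit variance by construction; the scaled test column generally does not, which is expected.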

---

## Compile and Evaluate a Model Using a Neural Network

### Step 1: Create a deep neural network by assigning the number of input features, the number of layers, and the number of neurons on each layer using TensorFlow’s Keras.

> **Hint** You can start with a two-layer deep neural network model that uses the `relu` activation function for both layers.


In [None]:
len(X_train.columns)

11

In [None]:
# Define the number of inputs (features) to the model
num_input_features = len(X_train.columns)

# Review the number of features
num_input_features


11

In [None]:
# Define the number of hidden nodes for the first hidden layer
hidden_layer_1 = num_input_features//2

# Define the number of hidden nodes for the second hidden layer
hidden_layer_2 = hidden_layer_1//2


# Define the number of neurons in the output layer
num_output_layer = 1

In [None]:
# Create the Sequential model instance
model = Sequential()

model.add(Input(shape=(num_input_features,)))

# Add the first hidden layer
model.add(Dense(units=hidden_layer_1, activation="relu"))

# Add the second hidden layer
model.add(Dense(units=hidden_layer_2, activation="relu"))

# Add the output layer to the model specifying the number of output neurons and activation function
model.add(Dense(units=num_output_layer, activation="sigmoid"))

In [None]:

# Display the Sequential model summary
model.summary()
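
The summary output is not reproduced here, but the trainable parameter count can be worked out by hand as a sanity check. Each `Dense` layer has one weight per input-output pair plus one bias per output. The sketch below applies that to the layer sizes defined earlier (11 inputs, then 11 // 2 = 5, 5 // 2 = 2, and 1 output); `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) for one Dense layer."""
    return n_in * n_out + n_out

num_input_features = 11
hidden_layer_1 = num_input_features // 2  # 5
hidden_layer_2 = hidden_layer_1 // 2      # 2
num_output_layer = 1

layer_sizes = [num_input_features, hidden_layer_1, hidden_layer_2, num_output_layer]

# Pair each layer's input size with its output size
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer, total_params)
```

With only 75 trainable parameters this is a very small network, which is worth keeping in mind when interpreting its accuracy.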

### Step 2: Compile and fit the model using the `binary_crossentropy` loss function, the `adam` optimizer, and the `accuracy` evaluation metric.


In [None]:
# Compile the Sequential model
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
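
For intuition about the loss being minimized, the following is a minimal pure-Python sketch of binary cross-entropy averaged over a batch. It is an illustration under simplifying assumptions (the clipping constant `eps` and the sample values are hypothetical), not Keras' actual implementation.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0, 1]
confident = [0.9, 0.1, 0.8]  # hypothetical well-separated predictions
uncertain = [0.5, 0.5, 0.5]  # hypothetical uninformed predictions

print(binary_crossentropy(labels, confident))
print(binary_crossentropy(labels, uncertain))
```

A model that always outputs 0.5 scores ln 2 ≈ 0.693, so a training loss near that value indicates the network has not yet learned much.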

In [None]:

# Fit the model using 50 epochs and the training data
model_fit = model.fit(X_train_scaled, y_train, epochs=50)

Epoch 1/50
38/38 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.5531 - loss: 0.7320
Epoch 2/50
38/38 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5940 - loss: 0.6963
Epoch 3/50
38/38 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6345 - loss: 0.6911
Epoch 4/50
38/38 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6682 - loss: 0.6775
Epoch 5/50
38/38 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6675 - loss: 0.6711
Epoch 6/50
38/38 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6880 - loss: 0.6637
Epoch 7/50
38/38 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7038 - loss: 0.6621
Epoch 8/50
38/38 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7131 - loss: 0.6540
Epoch 9/50
38/38 ━━━━━━━━━━━━━━━━━━━━

### Step 3: Evaluate the model using the test data to determine the model’s loss and accuracy.


In [None]:
# Evaluate the model loss and accuracy metrics using the evaluate method and the test data
model_loss, model_accuracy = model.evaluate(X_test_scaled, y_test)

# Display the model loss and accuracy results
display(model_loss)
display(model_accuracy)

13/13 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - accuracy: 0.7685 - loss: 0.5236


0.5544812679290771

0.7425000071525574

### Step 4: Save and export your model to a keras file, and name the file `student_loans.keras`.


In [None]:
# Set the model's file path
filepath = Path("student_loans.keras")

# Export your model to a keras file
model.save(filepath)


---
## Predict Loan Repayment Success by Using your Neural Network Model

### Step 1: Reload your saved model.

In [None]:
# Set the model's file path
filepath = Path("student_loans.keras")


# Load the model to a new object
model_imported = tf.keras.models.load_model(filepath)


### Step 2: Make predictions on the testing data and save the predictions to a DataFrame.

In [None]:
# Make predictions with the test data
predictions = model_imported.predict(X_test_scaled)


# Display a sample of the predictions
predictions[:7]


13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step


array([[0.7223599 ],
       [0.3708474 ],
       [0.6338515 ],
       [0.73946565],
       [0.73946565],
       [0.73946565],
       [0.73946565]], dtype=float32)

In [None]:
# Save the predictions to a DataFrame and round the predictions to binary results
predictions_df = pd.DataFrame(columns=["predictions"], data=predictions)
predictions_df["predictions"] = predictions_df["predictions"].round(0)
predictions_df

Unnamed: 0,predictions
0,1.0
1,0.0
2,1.0
3,1.0
4,1.0
...,...
395,1.0
396,0.0
397,1.0
398,0.0
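
Rounding the sigmoid output is equivalent to applying a 0.5 decision threshold, except at exactly 0.5, where round-half-to-even yields 0. An explicit threshold makes the cutoff visible and easy to adjust. The sketch below uses a hypothetical helper, `to_labels`, on the first few probabilities shown above.

```python
def to_labels(probabilities, threshold=0.5):
    """Convert predicted probabilities to 0/1 labels with an explicit cutoff."""
    return [1 if p >= threshold else 0 for p in probabilities]

# First four probabilities from the prediction sample above
sample_probs = [0.7223599, 0.3708474, 0.6338515, 0.73946565]

print(to_labels(sample_probs))
print(to_labels(sample_probs, threshold=0.7))  # a stricter, illustrative cutoff
```

Raising the threshold trades recall on class 1 for precision, which may matter if approving a risky loan is costlier than rejecting a safe one.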


### Step 3: Display a classification report with the y test data and predictions

In [None]:
# Print the classification report with the y test data and predictions
print(classification_report(y_test, predictions_df["predictions"].values))


              precision    recall  f1-score   support

           0       0.73      0.72      0.72       188
           1       0.75      0.76      0.76       212

    accuracy                           0.74       400
   macro avg       0.74      0.74      0.74       400
weighted avg       0.74      0.74      0.74       400
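
To show how the figures in the report relate to one another, the sketch below recomputes precision, recall, and F1 from a set of confusion-matrix counts. The counts are hypothetical: they were chosen to be consistent with the rounded values and supports in the report, and were not produced by the notebook. `prf` is an illustrative helper, not a scikit-learn function.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) for one class from its confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts consistent with the rounded report (assumption, not notebook output)
tn, fp, fn, tp = 135, 53, 50, 162

class_1 = prf(tp=tp, fp=fp, fn=fn)  # class 1 treated as the positive class
class_0 = prf(tp=tn, fp=fn, fn=fp)  # roles swap when class 0 is the positive class
accuracy = (tp + tn) / (tn + fp + fn + tp)

print([round(v, 2) for v in class_0])
print([round(v, 2) for v in class_1])
print(accuracy)
```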



---
## Discuss creating a recommendation system for student loans

Briefly answer the following questions in the space provided:

1. Describe the data that you would need to collect to build a recommendation system to recommend student loan options for students. Explain why this data would be relevant and appropriate.

Basic information such as:

1. Age
2. Gender
3. Geographic Location
4. Academic History
5. Financial Background


---




2. Based on the data you chose to use in this recommendation system, would your model be using collaborative filtering, content-based filtering, or context-based filtering? Justify why the data you selected would be suitable for your choice of filtering method.

---

3. Describe two real-world challenges that you would take into consideration while building a recommendation system for student loans. Explain why these challenges would be of concern for a student loan recommendation system.

Personalization: Recommendations must reflect each student's individual needs and circumstances. Analyzing users' financial situations and preferences is a high priority for the financial institution, because a well-matched loan is a win-win for both the student and the institution. Incorporating feedback mechanisms allows the personalization to be refined and improved over time.



