<a href="https://colab.research.google.com/github/fintech-lex/deep-learning-funding/blob/main/Deep_Learning_Venture_Fund.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Venture Funding with Deep Learning

### Data Preparation for a Neural Network Model

The dataset is preprocessed with pandas and scikit-learn’s `StandardScaler()` so that it can be used to compile and evaluate the neural network model later.

### Compiling and Evaluating a Binary Classification Model Using a Neural Network

TensorFlow is used to design a binary classification deep neural network model. The model uses the dataset’s features to predict whether a startup funded by the company will be successful. The number of input features informs the choice of how many layers the model contains and how many neurons each layer has; it does not fix either value. The model is then compiled and fit, and finally evaluated to calculate its loss and accuracy.

### Optimization of the Neural Network Model

Using TensorFlow and Keras, we can optimize the model to improve its accuracy.

In [1]:
# Imports
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder


---

## Prepare the data for use in a neural network model

In [None]:
# Import applicants_data.csv to Google Colab
from google.colab import files
# uploaded = files.upload()

In [None]:
# Read the applicants_data.csv file from the working directory into a pandas DataFrame
applicant_data_df = pd.read_csv("applicants_data.csv")

# Review the DataFrame
applicant_data_df.head()

In [None]:
# Review the data types associated with the columns
applicant_data_df.dtypes

### Drop the “EIN” (Employer Identification Number) and “NAME” columns from the DataFrame, because they are not relevant to the binary classification model.

In [None]:
# Drop the 'EIN' and 'NAME' columns from the DataFrame
applicant_data_df = applicant_data_df.drop(columns=['EIN', 'NAME'])

# Review the DataFrame
applicant_data_df

### Encode the dataset’s categorical variables using `OneHotEncoder`, then place the encoded variables into a new DataFrame.

In [None]:
# Create a list of categorical variables
categorical_variables = list(applicant_data_df.dtypes[applicant_data_df.dtypes == "object"].index)

# Display the categorical variables list
categorical_variables

In [None]:
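# Create a OneHotEncoder instance that returns a dense array
# (scikit-learn 1.2+ names this parameter sparse_output; older releases use sparse)
enc = OneHotEncoder(sparse_output=False)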

In [None]:
# Encode the categorical variables using OneHotEncoder
encoded_data = enc.fit_transform(applicant_data_df[categorical_variables])

# Review the shape of the encoded array to confirm the encoding worked
encoded_data.shape

In [None]:
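# Create a DataFrame with the encoded variables,
# labelling each column with its source column and category
encoded_df = pd.DataFrame(
    encoded_data,
    columns=enc.get_feature_names_out(categorical_variables)
)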

# Review the DataFrame
encoded_df.head()

### Add the original DataFrame’s numerical variables to the DataFrame containing the encoded variables.

In [None]:
# Select the numerical columns from the original DataFrame.
# They are kept as-is: one-hot encoding applies only to the categorical columns.
numerical_variables = list(applicant_data_df.select_dtypes(include="number").columns)

num_var_df = applicant_data_df[numerical_variables]

In [None]:
# Add the numerical variables from the original DataFrame to the one-hot encoding DataFrame
encoded_df = pd.concat([num_var_df, encoded_df], axis=1)

# Review the DataFrame
encoded_df.head()
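
Column-wise concatenation aligns rows by index label rather than by position, which is worth checking before trusting the combined DataFrame. The sketch below illustrates this with hypothetical toy frames; the column names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy frames standing in for the numerical and encoded data
toy_numeric = pd.DataFrame({"AMOUNT": [5000, 10000, 7500]})
toy_onehot = pd.DataFrame({"TYPE_A": [1.0, 0.0, 1.0], "TYPE_B": [0.0, 1.0, 0.0]})

# Both frames share the default index 0..2, so rows line up one-to-one
aligned = pd.concat([toy_numeric, toy_onehot], axis=1)
print(aligned.shape)

# A filtered frame keeps its old index labels (here 1 and 2),
# so concat fills the unmatched row with NaN instead of raising an error
filtered = toy_numeric[toy_numeric["AMOUNT"] > 6000]
misaligned = pd.concat([filtered, toy_onehot], axis=1)
print(misaligned["AMOUNT"].isna().sum())
```

In this notebook both frames come from the same unfiltered DataFrame, so the default indexes should match. If rows are dropped earlier, calling `reset_index(drop=True)` on both frames before concatenating is one way to avoid silent NaN values.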

### Using the preprocessed data, create the features (`X`) and target (`y`) datasets. The target dataset should be defined by the preprocessed DataFrame column “IS_SUCCESSFUL”. The remaining columns should define the features dataset.



In [None]:
# Define the target set y using the IS_SUCCESSFUL column
y = encoded_df['IS_SUCCESSFUL']

# Display a sample of y
display(y)

In [None]:
# Define features set X by selecting all columns but IS_SUCCESSFUL
X = encoded_df.drop(columns=['IS_SUCCESSFUL'])

# Review the features DataFrame
X

### Split the features and target sets into training and testing datasets.


In [None]:
# Split the preprocessed data into a training and testing dataset
# Assign the function a random_state equal to 1
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
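
As a quick illustration of what this call does by default, the following sketch splits hypothetical toy arrays. The names `toy_X` and `toy_y` and their contents are assumptions and do not come from the applicant data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows, 3 features, alternating binary labels
toy_X = np.arange(300).reshape(100, 3)
toy_y = np.arange(100) % 2

# With no test_size given, scikit-learn holds out 25% of the rows for testing
a_train, a_test, ya_train, ya_test = train_test_split(toy_X, toy_y, random_state=1)
b_train, b_test, yb_train, yb_test = train_test_split(toy_X, toy_y, random_state=1)

print(a_train.shape, a_test.shape)

# The same random_state gives the same shuffle, so both calls return identical splits
print(np.array_equal(a_test, b_test))
```

Fixing `random_state` makes the split reproducible across runs, which keeps later accuracy comparisons between model variants on the same test rows.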

### Use scikit-learn's `StandardScaler` to scale the features data.

In [None]:
# Create a StandardScaler instance
scaler = StandardScaler()

# Convert feature names to strings so every column label has the same type
X_train.columns = X_train.columns.astype(str)
X_test.columns = X_test.columns.astype(str)

# Fit the scaler to the features training dataset
X_scaler = scaler.fit(X_train)

# Scale the training and testing features with the fitted scaler
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
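
The fit-then-transform pattern can be checked on a small example. This is a minimal sketch with hypothetical toy arrays, showing that the scaler learns its mean and standard deviation from the training rows only and then applies those same values to the test rows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales
toy_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
toy_test = np.array([[2.0, 400.0]])

# Learn the per-column mean and standard deviation from the training rows only
toy_scaler = StandardScaler().fit(toy_train)

# Apply the training statistics to both sets
toy_train_scaled = toy_scaler.transform(toy_train)
toy_test_scaled = toy_scaler.transform(toy_test)

# Scaled training columns have mean 0 and standard deviation 1
print(toy_train_scaled.mean(axis=0))
print(toy_train_scaled.std(axis=0))

# Test rows are scaled with the training statistics, so they need not centre on 0
print(toy_test_scaled)
```

Fitting on the training data alone keeps information about the test set out of preprocessing, so the later evaluation better reflects performance on unseen data.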