# **Optimize the previously created neural net model**
**Challenge requirements:**
Adjust the input data to ensure that no variables or outliers are causing confusion in the model, such as:
* Dropping more or fewer columns. ([done](https://colab.research.google.com/drive/1HYhLTKx2PZI4Jt2pB2wPX4oQohdxMwYC#scrollTo=DqoKh54538Ti&line=1&uniqifier=1))
* Creating more bins for rare occurrences in columns.
* Increasing or decreasing the number of values for each bin.
* Add more neurons to a hidden layer.
* Add more hidden layers.
* Use different activation functions for the hidden layers.
* Add or reduce the number of epochs to the training regimen.
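
As a rough illustration of the "more bins for rare occurrences" idea above, the sketch below collapses infrequent category values into a single catch-all label. The helper name `bin_rare_values`, the `min_count` threshold, and the `"Other"` label are assumptions made for this example, and the sample values are a toy series, not the real column.

```python
import pandas as pd


def bin_rare_values(series, min_count, other_label="Other"):
    """Replace values that occur fewer than min_count times with other_label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # keep values that are not rare, replace the rest with the catch-all label
    return series.where(~series.isin(rare), other_label)


# toy data for illustration only
sample = pd.Series(["T3", "T3", "T3", "T4", "T4", "T19", "T29"])
binned = bin_rare_values(sample, min_count=2)
print(binned.value_counts())
```

Raising or lowering `min_count` is one way to move between more bins and fewer bins without editing the column by hand.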

## Begin by setting up the environment and acquiring model & data

In [None]:
# install interactive tables & viz tools
!pip install itables
!pip install pygwalker
!pip install keras_tuner

In [16]:
# import dependencies
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
import sklearn as skl

# ignore warning messages
import warnings
warnings.simplefilter('ignore')

# make all dataframes interactive
import itables
itables.init_notebook_mode(all_interactive=True)

In [9]:
# attach to Google Drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [10]:
# import the saved model
model=tf.keras.models.load_model('/content/drive/MyDrive/charity_model.h5')

# import the csv as df
df=pd.read_csv('https://static.bc-edx.com/data/dl-1-2/m21/lms/starter/charity_data.csv')

In [11]:
# verify df & explore content
df

EIN,NAME,APPLICATION_TYPE,AFFILIATION,CLASSIFICATION,USE_CASE,ORGANIZATION,STATUS,INCOME_AMT,SPECIAL_CONSIDERATIONS,ASK_AMT,IS_SUCCESSFUL


## Explore the dataset

In [12]:
# print the unique values in each column
for col in df.columns:
    print(df[col].unique())

[ 10520599  10531628  10547893 ... 996012607 996015768 996086871]
['BLUE KNIGHTS MOTORCYCLE CLUB' 'AMERICAN CHESAPEAKE CLUB CHARITABLE TR'
 'ST CLOUD PROFESSIONAL FIREFIGHTERS' ...
 'THE LIONS CLUB OF HONOLULU KAMEHAMEHA'
 'AMERICAN FEDERATION OF GOVERNMENT EMPLOYEES LOCAL 2886'
 'WATERHOUSE CHARITABLE TR']
['T10' 'T3' 'T5' 'T7' 'T4' 'T6' 'T2' 'T9' 'T19' 'T8' 'T13' 'T12' 'T29'
 'T25' 'T14' 'T17' 'T15']
['Independent' 'CompanySponsored' 'Family/Parent' 'National' 'Regional'
 'Other']
['C1000' 'C2000' 'C3000' 'C1200' 'C2700' 'C7000' 'C7200' 'C1700' 'C4000'
 'C7100' 'C2800' 'C6000' 'C2100' 'C1238' 'C5000' 'C7120' 'C1800' 'C4100'
 'C1400' 'C1270' 'C2300' 'C8200' 'C1500' 'C7210' 'C1300' 'C1230' 'C1280'
 'C1240' 'C2710' 'C2561' 'C1250' 'C8000' 'C1245' 'C1260' 'C1235' 'C1720'
 'C1257' 'C4500' 'C2400' 'C8210' 'C1600' 'C1278' 'C1237' 'C4120' 'C2170'
 'C1728' 'C1732' 'C2380' 'C1283' 'C1570' 'C2500' 'C1267' 'C3700' 'C1580'
 'C2570' 'C1256' 'C1236' 'C1234' 'C1246' 'C2190' 'C4200' 'C0' 'C3200'
 'C5
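
Printing every unique value is hard to scan for high-cardinality columns. A possible complement, sketched here on a small toy frame rather than the real data, is to count the distinct values per column first and only then inspect the columns worth binning. The helper name `summarize_cardinality` and the sample rows are assumptions for this example.

```python
import pandas as pd


def summarize_cardinality(frame):
    """Return the number of distinct values per column, largest first."""
    return frame.nunique().sort_values(ascending=False)


# toy rows for illustration only
toy = pd.DataFrame({
    "APPLICATION_TYPE": ["T10", "T3", "T5", "T3", "T3"],
    "AFFILIATION": ["Independent", "CompanySponsored", "Independent",
                    "Independent", "Independent"],
    "STATUS": [1, 1, 1, 1, 1],
})
summary = summarize_cardinality(toy)
print(summary)
```

On the full dataset the same call would be `summarize_cardinality(df)`.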

In [13]:
# create a new df with only the columns of interest
clean = df[['USE_CASE', 'ORGANIZATION', 'INCOME_AMT', 'ASK_AMT', 'IS_SUCCESSFUL']].copy()

# group by 'IS_SUCCESSFUL' and calculate the mean 'ASK_AMT' for each outcome
cleangroup = df.groupby('IS_SUCCESSFUL')['ASK_AMT'].mean()


In [14]:
print('-------Explore the clean df---------')
print('First 5 rows')
print(clean.head())
print('\nShape')
print(clean.shape)
print('\nInfo')
print(clean.info())
print('\nDescribe')
print(clean.describe(include='all'))
print('\nMean asks by Success')
print(cleangroup)
print('\nCount success/fail')
print(clean['IS_SUCCESSFUL'].value_counts())
print('\nCount use cases')
print(clean['USE_CASE'].value_counts())
print('\nCount orgs')
print(clean['ORGANIZATION'].value_counts())

-------Explore the clean df---------
First 5 rows
       USE_CASE  ORGANIZATION     INCOME_AMT  ASK_AMT  IS_SUCCESSFUL
0    ProductDev   Association              0     5000              1
1  Preservation  Co-operative         1-9999   108590              1
2    ProductDev   Association              0     5000              0
3  Preservation         Trust    10000-24999     6692              1
4     Heathcare         Trust  100000-499999   142590              1

Shape
(34299, 5)

Info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 34299 entries, 0 to 34298
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   USE_CASE       34299 non-null  object
 1   ORGANIZATION   34299 non-null  object
 2   INCOME_AMT     34299 non-null  object
 3   ASK_AMT        34299 non-null  int64 
 4   IS_SUCCESSFUL  34299 non-null  int64 
dtypes: int64(2), object(3)
memory usage: 1.3+ MB
None

Describe
            USE_CASE ORGANIZATION INCO

## Tune the old model ("model")

In [20]:
# create a function that builds a sequential model w/ hyperparameter options
def create_model(hp):
    nn_model = tf.keras.models.Sequential()

    # Allow kerastuner to decide which activation function to use in hidden layers
    activation = hp.Choice('activation', ['relu', 'tanh'])

    # Allow kerastuner to decide number of neurons in first layer
    # (no input_dim is hard-coded: Keras infers the input width from the data at fit time)
    nn_model.add(tf.keras.layers.Dense(
        units=hp.Int('first_units', min_value=1, max_value=30, step=5),
        activation=activation))

    # Allow kerastuner to decide number of hidden layers and neurons in hidden layers
    for i in range(hp.Int('num_layers', 1, 5)):
        nn_model.add(tf.keras.layers.Dense(
            units=hp.Int(f'units_{i}', min_value=1, max_value=30, step=5),
            activation=activation))

    # Output layer: a single sigmoid unit for binary classification
    nn_model.add(tf.keras.layers.Dense(units=1, activation="sigmoid"))

    # Compile the model
    nn_model.compile(loss="binary_crossentropy", optimizer='adam', metrics=["accuracy"])

    return nn_model
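
As a quick check on the search space defined above: assuming `hp.Int` with its default linear sampling starts at `min_value` and advances by `step` without exceeding `max_value`, the layer widths the tuner can try are easy to enumerate in plain Python. The helper `int_candidates` is hypothetical and is not part of KerasTuner.

```python
def int_candidates(min_value, max_value, step):
    """List the values min_value, min_value + step, ... that do not exceed max_value."""
    return list(range(min_value, max_value + 1, step))


candidates = int_candidates(1, 30, 5)
print(candidates)  # [1, 6, 11, 16, 21, 26]
```

Under that assumption a width of 30 is never tried; starting from `min_value=5` instead would give 5, 10, ..., 30.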

In [21]:
# import keras_tuner and configure a Hyperband search
import keras_tuner as kt
tuner=kt.Hyperband(
    create_model,
    objective="val_accuracy",
    max_epochs=20,
    hyperband_iterations=2)

In [24]:
from sklearn.model_selection import train_test_split

# Split our preprocessed data into our features and target arrays
y = clean['IS_SUCCESSFUL'].values
X = clean.drop(columns='IS_SUCCESSFUL').values

# Split the preprocessed data into a training and testing dataset
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [28]:
from sklearn.preprocessing import StandardScaler

# create scaler instance
scaler=StandardScaler()

# fit the scaler
X_scaler=scaler.fit(X_train)

# scale the data
X_train_scaled=X_scaler.transform(X_train)
X_test_scaled=X_scaler.transform(X_test)

ValueError: could not convert string to float: 'Preservation'
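
The `ValueError` above is raised because `X` still holds the string columns (`USE_CASE`, `ORGANIZATION`, `INCOME_AMT`), and `StandardScaler` only accepts numeric input. One possible fix is to one-hot encode those columns with `pd.get_dummies` before splitting and scaling. The sketch below applies it to the five rows printed by `clean.head()` earlier, not to the full dataset; the names `toy` and `encoded` are assumptions for this example.

```python
import pandas as pd

# the five rows shown by clean.head(), retyped here so the sketch is self-contained
toy = pd.DataFrame({
    "USE_CASE": ["ProductDev", "Preservation", "ProductDev", "Preservation", "Heathcare"],
    "ORGANIZATION": ["Association", "Co-operative", "Association", "Trust", "Trust"],
    "INCOME_AMT": ["0", "1-9999", "0", "10000-24999", "100000-499999"],
    "ASK_AMT": [5000, 108590, 5000, 6692, 142590],
    "IS_SUCCESSFUL": [1, 1, 0, 1, 1],
})

# turn each categorical column into one numeric indicator column per category
encoded = pd.get_dummies(
    toy, columns=["USE_CASE", "ORGANIZATION", "INCOME_AMT"], dtype=float)

# features and target are now fully numeric
y = encoded["IS_SUCCESSFUL"].values
X = encoded.drop(columns="IS_SUCCESSFUL").values.astype(float)
print(X.shape)
```

With the real data, the same call on `clean` would come before `train_test_split`. The number of columns in the encoded `X` (`X.shape[1]`) is the input width the first `Dense` layer receives, so any hard-coded `input_dim` would have to match it.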

## Report
1. Overview
  * This analysis uses standard machine learning methods to predict the success or failure of philanthropic funding based on the records of 34,299 past applicants.
  * Specifically, we created a binary classifier utilizing key applicant metadata -- organization type, use case, and ask amount -- to predict future applicant success likelihood.
  * Our current model achieves 73% accuracy on held-out test data from past applicants.
  * Future iterations will fine-tune the model to achieve greater prediction success.
2. Results
  *   Data Preprocessing
    *   What variables are the target(s) for your model?
        * IS_SUCCESSFUL
    *   What variable(s) are the features for your model?
        * USE_CASE, ORGANIZATION, ASK_AMT  
    *   What variable(s) should be removed from the input data because they are neither targets nor features?
        * INCOME_AMT, in addition to the columns dropped earlier during cleaning (EIN, NAME, APPLICATION_TYPE, AFFILIATION, CLASSIFICATION, STATUS, SPECIAL_CONSIDERATIONS)
  *   Compiling, training, and evaluating the model
    *   How many neurons, layers, and activation functions did you select for your neural network model, and why?
        * Neurons = 8
        * Layers = 3
        * Activation functions = relu and sigmoid
    *   Were you able to achieve the target model performance?
        * No, but close! 73%
    *   What steps did you take in your attempts to increase model performance?
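        * I attempted to use KerasTuner but was unsuccessful: the run stopped at the scaling step with a `ValueError` because the categorical columns had not yet been encoded as numbers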
        * I intended to write a function to automate neuron, layer, and activation function selection but did not get that far
3. Summary
    * The overall effort was partially successful, achieving 73% prediction accuracy. However, our team wishes to continue working and is highly confident that much higher prediction accuracy is achievable with two (2) more weeks of effort.