## Preprocessing

In [48]:
# Import our dependencies
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd
import tensorflow as tf
import keras_tuner as kt

# Import and read charity_data.csv.
application_df = pd.read_csv("https://static.bc-edx.com/data/dl-1-2/m21/lms/starter/charity_data.csv")
application_df.head()

,EIN,NAME,APPLICATION_TYPE,AFFILIATION,CLASSIFICATION,USE_CASE,ORGANIZATION,STATUS,INCOME_AMT,SPECIAL_CONSIDERATIONS,ASK_AMT,IS_SUCCESSFUL
0,10520599,BLUE KNIGHTS MOTORCYCLE CLUB,T10,Independent,C1000,ProductDev,Association,1,0,N,5000,1
1,10531628,AMERICAN CHESAPEAKE CLUB CHARITABLE TR,T3,Independent,C2000,Preservation,Co-operative,1,1-9999,N,108590,1
2,10547893,ST CLOUD PROFESSIONAL FIREFIGHTERS,T5,CompanySponsored,C3000,ProductDev,Association,1,0,N,5000,0
3,10553066,SOUTHSIDE ATHLETIC ASSOCIATION,T3,CompanySponsored,C2000,Preservation,Trust,1,10000-24999,N,6692,1
4,10556103,GENETIC RESEARCH INSTITUTE OF THE DESERT,T3,Independent,C1000,Heathcare,Trust,1,100000-499999,N,142590,1


In [49]:
application_df.columns

Index(['EIN', 'NAME', 'APPLICATION_TYPE', 'AFFILIATION', 'CLASSIFICATION',
       'USE_CASE', 'ORGANIZATION', 'STATUS', 'INCOME_AMT',
       'SPECIAL_CONSIDERATIONS', 'ASK_AMT', 'IS_SUCCESSFUL'],
      dtype='object')

The target variable of the model is:\
"IS_SUCCESSFUL"

The feature variables are:\
'APPLICATION_TYPE', 'AFFILIATION', 'CLASSIFICATION',\
'USE_CASE', 'ORGANIZATION', 'STATUS', 'INCOME_AMT',\
'SPECIAL_CONSIDERATIONS', 'ASK_AMT'


The goal is to create a binary classification model that can predict if an Alphabet Soup-funded organization will be successful based on the features in the dataset.

In [50]:
# Drop the non-beneficial ID columns, 'EIN' and 'NAME'.
# After conferring with a classmate, I left 'NAME' in place. This actually seems to have a significant effect on model accuracy.
application_df.drop(['EIN'], axis=1, inplace=True)

In [51]:
# Determine the number of unique values in each column.
nunique_column = application_df.nunique()

print(nunique_column)

NAME                      19568
APPLICATION_TYPE             17
AFFILIATION                   6
CLASSIFICATION               71
USE_CASE                      5
ORGANIZATION                  4
STATUS                        2
INCOME_AMT                    9
SPECIAL_CONSIDERATIONS        2
ASK_AMT                    8747
IS_SUCCESSFUL                 2
dtype: int64


In [52]:
# Look at APPLICATION_TYPE value counts for binning
application_counts = application_df['APPLICATION_TYPE'].value_counts()

print(application_counts)

T3     27037
T4      1542
T6      1216
T5      1173
T19     1065
T8       737
T7       725
T10      528
T9       156
T13       66
T12       27
T2        16
T25        3
T14        3
T29        2
T15        2
T17        1
Name: APPLICATION_TYPE, dtype: int64


In [53]:
# Choose a cutoff value and create a list of application types to be replaced
# use the variable name `application_types_to_replace`
application_cutoff_value = 250
application_types_to_replace = application_counts[application_counts < application_cutoff_value].index.tolist()

# Replace in dataframe
for app in application_types_to_replace:
    application_df['APPLICATION_TYPE'] = application_df['APPLICATION_TYPE'].replace(app,"Other")

# Check to make sure binning was successful
application_df['APPLICATION_TYPE'].value_counts()

T3       27037
T4        1542
T6        1216
T5        1173
T19       1065
T8         737
T7         725
T10        528
Other      276
Name: APPLICATION_TYPE, dtype: int64
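The loop above calls `replace` once per rare category. The same binning can be done in one vectorized step; the sketch below wraps it in a small helper (`bin_rare_categories` is a hypothetical name, not a pandas function) that keeps every category whose count reaches the cutoff and maps the rest to `"Other"`. The toy series is made-up data standing in for `APPLICATION_TYPE`.

```python
import pandas as pd


def bin_rare_categories(series: pd.Series, cutoff: int, other: str = "Other") -> pd.Series:
    """Replace every category that occurs fewer than `cutoff` times with `other`."""
    counts = series.value_counts()
    rare = counts[counts < cutoff].index
    # Series.where keeps values where the condition is True and substitutes `other` elsewhere.
    return series.where(~series.isin(rare), other)


# Hypothetical toy data: two frequent types and two rare ones.
toy = pd.Series(["T3"] * 5 + ["T4"] * 3 + ["T9", "T17"])
binned = bin_rare_categories(toy, cutoff=2)
print(binned.value_counts().to_dict())  # {'T3': 5, 'T4': 3, 'Other': 2}
```

Applied to this notebook, `bin_rare_categories(application_df['APPLICATION_TYPE'], 250)` should produce the same counts as the loop, including the 276 rows in `Other`.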

In [54]:
# Look at CLASSIFICATION value counts for binning
classification_counts = application_df['CLASSIFICATION'].value_counts()
print(classification_counts)

C1000    17326
C2000     6074
C1200     4837
C3000     1918
C2100     1883
         ...  
C4120        1
C8210        1
C2561        1
C4500        1
C2150        1
Name: CLASSIFICATION, Length: 71, dtype: int64


In [55]:
# You may find it helpful to look at CLASSIFICATION value counts >1
print(classification_counts[classification_counts > 1])

C1000    17326
C2000     6074
C1200     4837
C3000     1918
C2100     1883
C7000      777
C1700      287
C4000      194
C5000      116
C1270      114
C2700      104
C2800       95
C7100       75
C1300       58
C1280       50
C1230       36
C1400       34
C7200       32
C2300       32
C1240       30
C8000       20
C7120       18
C1500       16
C1800       15
C6000       15
C1250       14
C8200       11
C1238       10
C1278       10
C1235        9
C1237        9
C7210        7
C2400        6
C1720        6
C4100        6
C1257        5
C1600        5
C1260        3
C2710        3
C0           3
C3200        2
C1234        2
C1246        2
C1267        2
C1256        2
Name: CLASSIFICATION, dtype: int64


In [56]:
# Choose a cutoff value and create a list of classifications to be replaced
# use the variable name `classifications_to_replace`
classification_cutoff_value = 1750
classifications_to_replace = classification_counts[classification_counts < classification_cutoff_value].index.tolist()

# Replace in dataframe
for cls in classifications_to_replace:
    application_df['CLASSIFICATION'] = application_df['CLASSIFICATION'].replace(cls,"Other")
    
# Check to make sure binning was successful
application_df['CLASSIFICATION'].value_counts()

C1000    17326
C2000     6074
C1200     4837
Other     2261
C3000     1918
C2100     1883
Name: CLASSIFICATION, dtype: int64

In [57]:
application_df.head(10)

,NAME,APPLICATION_TYPE,AFFILIATION,CLASSIFICATION,USE_CASE,ORGANIZATION,STATUS,INCOME_AMT,SPECIAL_CONSIDERATIONS,ASK_AMT,IS_SUCCESSFUL
0,BLUE KNIGHTS MOTORCYCLE CLUB,T10,Independent,C1000,ProductDev,Association,1,0,N,5000,1
1,AMERICAN CHESAPEAKE CLUB CHARITABLE TR,T3,Independent,C2000,Preservation,Co-operative,1,1-9999,N,108590,1
2,ST CLOUD PROFESSIONAL FIREFIGHTERS,T5,CompanySponsored,C3000,ProductDev,Association,1,0,N,5000,0
3,SOUTHSIDE ATHLETIC ASSOCIATION,T3,CompanySponsored,C2000,Preservation,Trust,1,10000-24999,N,6692,1
4,GENETIC RESEARCH INSTITUTE OF THE DESERT,T3,Independent,C1000,Heathcare,Trust,1,100000-499999,N,142590,1
5,MINORITY ORGAN & TISSUE TRANSPLANT & EDUCATION...,T3,Independent,C1200,Preservation,Trust,1,0,N,5000,1
6,FRIENDS OF ARTS COUNCIL OF GREATER DENHAM SPRI...,T3,Independent,C1000,Preservation,Trust,1,100000-499999,N,31452,1
7,ISRAEL EMERGENCY ALLIANCE,T3,Independent,C2000,Preservation,Trust,1,10M-50M,N,7508025,1
8,ARAMCO BRATS INC,T7,Independent,C1000,ProductDev,Trust,1,1-9999,N,94389,1
9,INTERNATIONAL ASSOCIATION OF FIRE FIGHTERS,T5,CompanySponsored,C3000,ProductDev,Association,1,0,N,5000,0


In [58]:
# Convert categorical data to numeric with `pd.get_dummies`
application_df_dummies = pd.get_dummies(application_df)
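`pd.get_dummies` turns each text column into one 0/1 column per distinct value and passes numeric columns through unchanged, so the encoded width is the sum of the text columns' unique counts plus the number of numeric columns. The sketch below checks that rule on a hypothetical toy frame; by the same rule, keeping `NAME` (19,568 unique values) adds 19,568 columns on its own.

```python
import pandas as pd

# Hypothetical toy frame with two text columns and one numeric column.
toy = pd.DataFrame({
    "NAME": ["A", "B", "C", "A"],
    "USE_CASE": ["ProductDev", "Preservation", "ProductDev", "ProductDev"],
    "ASK_AMT": [5000, 108590, 5000, 6692],
})

encoded = pd.get_dummies(toy)

# One dummy column per distinct text value, plus the numeric columns unchanged.
text_cols = toy.select_dtypes(exclude="number").columns
expected_width = int(toy[text_cols].nunique().sum()) + (toy.shape[1] - len(text_cols))
print(encoded.shape[1], expected_width)  # 6 6
```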

In [59]:
# Split our preprocessed data into our features and target arrays
y = application_df_dummies['IS_SUCCESSFUL']
X = application_df_dummies.drop(columns='IS_SUCCESSFUL')

# Split the preprocessed data into a training and testing dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

In [60]:
# Create a StandardScaler instance
scaler = StandardScaler()

# Fit the StandardScaler
X_scaler = scaler.fit(X_train)

# Scale the data
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
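`StandardScaler` learns each column's mean and standard deviation from the training split only, then applies z = (x - mean) / std to both splits, which keeps test-set statistics out of the preprocessing. The NumPy sketch below reproduces that computation on hypothetical toy arrays; like scikit-learn, it uses the population standard deviation (`ddof=0`) and leaves constant columns unscaled. The helper names are illustrative only.

```python
import numpy as np


def fit_standardizer(train: np.ndarray):
    """Learn per-column mean and standard deviation from the training data only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)      # population std (ddof=0)
    std[std == 0] = 1.0          # avoid dividing by zero for constant columns
    return mean, std


def standardize(data: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    return (data - mean) / std


# Hypothetical toy data: a STATUS-like column and an ASK_AMT-like column.
toy_train = np.array([[1.0, 5000.0], [0.0, 108590.0], [1.0, 6692.0]])
toy_test = np.array([[1.0, 142590.0]])

mean, std = fit_standardizer(toy_train)
toy_train_scaled = standardize(toy_train, mean, std)
toy_test_scaled = standardize(toy_test, mean, std)   # reuses the training statistics
print(toy_train_scaled.mean(axis=0), toy_train_scaled.std(axis=0))
```

The scaled training columns have mean 0 and standard deviation 1; the test row is expressed in training units, so it can fall outside that range.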

## Compile, Train and Evaluate the Model

In [65]:
# Define the model: a deep neural net, i.e., the number of input features and hidden nodes for each layer.
# The layer sizes below are based on the results of the tuning coded below.
# Initial manual adjustments yielded a model with 73% accuracy. Below, the tuner has configured models
# with 79% and 80% accuracy.
# The commented-out layers hold the remaining units_* values reported by the tuner; only the
# first two hidden layers (15 and 11 units) are kept here.

input_dim = X.shape[1]

nn_model = tf.keras.models.Sequential()

# First hidden layer
nn_model.add(tf.keras.layers.Dense(units=15, activation="linear", input_dim=input_dim))

# Second hidden layer
nn_model.add(tf.keras.layers.Dense(units=11, activation="linear"))

# # Third hidden layer
# nn_model.add(tf.keras.layers.Dense(units=97, activation="linear"))

# # Fourth hidden layer
# nn_model.add(tf.keras.layers.Dense(units=57, activation="linear"))

# # Fifth hidden layer
# nn_model.add(tf.keras.layers.Dense(units=15, activation="linear"))

# # Sixth hidden layer
# nn_model.add(tf.keras.layers.Dense(units=87, activation="linear"))

# Output layer
nn_model.add(tf.keras.layers.Dense(units=1, activation="sigmoid"))

# Check the structure of the model
nn_model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_7 (Dense)             (None, 15)                294180    
                                                                 
 dense_8 (Dense)             (None, 11)                176       
                                                                 
 dense_9 (Dense)             (None, 1)                 12        
                                                                 
Total params: 294368 (1.12 MB)
Trainable params: 294368 (1.12 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
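The `Param #` column follows directly from the layer shapes: a `Dense` layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases. Working backward, the first layer's 294,180 parameters equal 15 * (n_in + 1), so the encoded feature matrix has 19,611 columns. The sketch below (`dense_param_count` is an illustrative helper, not a Keras function) reproduces the summary.

```python
def dense_param_count(layer_sizes):
    """Parameters per Dense layer: (inputs * units) weights plus `units` biases."""
    per_layer = [n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
    return per_layer, sum(per_layer)


# 19,611 input columns, hidden layers of 15 and 11 units, one sigmoid output unit.
per_layer, total = dense_param_count([19611, 15, 11, 1])
print(per_layer, total)  # [294180, 176, 12] 294368
```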


In [66]:
# Compile the model
nn_model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

In [None]:
# Train the model. More than 6 epochs seems to overfit the data, yielding high training accuracy
# but poorer testing accuracy.
fit_model = nn_model.fit(X_train_scaled, y_train, epochs=6)

Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6
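Instead of fixing `epochs=6` by hand, the stopping point can be chosen from a validation curve. In Keras this would look something like passing `validation_split=0.2` and `callbacks=[tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2, restore_best_weights=True)]` to `nn_model.fit`. The plain-Python sketch below only illustrates the patience rule on a made-up loss history; `early_stopping_epoch` is a hypothetical helper, not the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stopped_epoch), both 1-based, using a simple patience rule.

    Training stops once `patience` consecutive epochs fail to improve the best
    validation loss by more than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)


# Hypothetical validation losses: improving until epoch 4, then rising as the model overfits.
history = [0.60, 0.52, 0.48, 0.46, 0.47, 0.49, 0.51]
print(early_stopping_epoch(history, patience=2))  # (4, 6)
```

With `restore_best_weights=True`, Keras would roll back to the weights from the best epoch (epoch 4 in this toy history) rather than keep the last ones.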


In [68]:
# Evaluate the model using the test data
model_loss, model_accuracy = nn_model.evaluate(X_test_scaled,y_test,verbose=2)
print(f"Loss: {model_loss}, Accuracy: {model_accuracy}")

268/268 - 1s - loss: 0.4531 - accuracy: 0.7948 - 628ms/epoch - 2ms/step
Loss: 0.4531373083591461, Accuracy: 0.7947521805763245


In [83]:
# Define a function that builds a new Sequential model with hyperparameter options for the number of layers,
# the neurons per layer, and the activation function, so the tuner can search for an accurate model.

def create_model(hp):
    
    nn_model2 = tf.keras.models.Sequential()

    # Allow kerastuner to decide which activation function to use in hidden layers
    activation = hp.Choice('activation',['relu','linear','softmax'])
    
    # Allow kerastuner to decide number of neurons in first layer
    nn_model2.add(tf.keras.layers.Dense(units=hp.Int('first_units',
        min_value=5,
        max_value=100,
        step=5), activation=activation, input_dim=input_dim))

    # Allow kerastuner to decide number of hidden layers and neurons in hidden layers
    for i in range(hp.Int('num_layers', 1, 4)):
        nn_model2.add(tf.keras.layers.Dense(units=hp.Int('units_' + str(i),
            min_value=5,
            max_value=100,
            step=2),
            activation=activation))
    
    nn_model2.add(tf.keras.layers.Dense(units=1, activation="sigmoid"))

    # Compile the model
    nn_model2.compile(loss="binary_crossentropy", optimizer='adam', metrics=["accuracy"])
    
    return nn_model2

In [84]:
tuner = kt.Hyperband(
    create_model,
    objective="val_accuracy",
    max_epochs=10,
    hyperband_iterations=3,
    project_name='tuner1_results')

In [85]:
tuner.search(X_train_scaled,y_train,epochs=1,validation_data=(X_test_scaled,y_test))

Trial 90 Complete [00h 01m 11s]
val_accuracy: 0.7916035056114197

Best val_accuracy So Far: 0.7995335459709167
Total elapsed time: 01h 49m 31s
INFO:tensorflow:Oracle triggered exit
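This search passes the test split as `validation_data`, so the tuner selects hyperparameters by test accuracy. That is why the test accuracy printed for the best model matches the best `val_accuracy` (0.7995) exactly, and why that score is not an independent estimate. One option is to carve a validation set out of the training data for the search and keep the test set for a single final evaluation. The NumPy sketch below does the index bookkeeping (`three_way_split` is a hypothetical helper; calling `train_test_split` twice achieves the same thing).

```python
import numpy as np


def three_way_split(n_rows, val_frac=0.2, test_frac=0.25, seed=1):
    """Shuffle row indices and split them into train / validation / test index arrays.

    `test_frac` is taken from all rows; `val_frac` is taken from what remains.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_rows)
    n_test = int(round(n_rows * test_frac))
    n_val = int(round((n_rows - n_test) * val_frac))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return train_idx, val_idx, test_idx


# 34,299 rows, as in charity_data.csv.
train_idx, val_idx, test_idx = three_way_split(34299)
print(len(train_idx), len(val_idx), len(test_idx))  # 20579 5145 8575
```

The scaler would then be fit on the training rows only, the tuner searched with `validation_data` built from the validation rows, and the test rows used once in the final `evaluate`.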


In [86]:
# Get top model hyperparameters and print the values
top_hyper = tuner.get_best_hyperparameters(1)
for param in top_hyper:
    print(param.values)

{'activation': 'linear', 'first_units': 65, 'num_layers': 3, 'units_0': 95, 'units_1': 47, 'units_2': 51, 'units_3': 15, 'tuner/epochs': 4, 'tuner/initial_epoch': 2, 'tuner/bracket': 2, 'tuner/round': 1, 'tuner/trial_id': '0010'}
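The `tuner/epochs`, `tuner/initial_epoch`, and `tuner/bracket` entries come from Hyperband's schedule. In KerasTuner's `Hyperband` these values appear to set each trial's `epochs` and `initial_epoch`, so the `epochs` argument passed to `search` should not change how long trials train. Assuming the default `factor=3` and a schedule of ceil(max_epochs / factor^(bracket - round)) epochs per round, the sketch below reproduces the values printed above. It also shows that a search with `max_epochs=50` has brackets numbered 0 to 3. The function names are illustrative, not KerasTuner API.

```python
import math


def hyperband_epochs(max_epochs, bracket, round_num, factor=3):
    """Epochs a trial trains to in a given bracket and round (assumed Hyperband schedule)."""
    return math.ceil(max_epochs / factor ** (bracket - round_num))


def num_brackets(max_epochs, factor=3, min_epochs=1):
    """Count brackets by dividing max_epochs by `factor` until it drops below min_epochs."""
    epochs, brackets = max_epochs, 0
    while epochs >= min_epochs:
        epochs /= factor
        brackets += 1
    return brackets


# First tuner (max_epochs=10): bracket 2, round 1 resumes at epoch 2 and trains to epoch 4.
print(hyperband_epochs(10, bracket=2, round_num=0), hyperband_epochs(10, bracket=2, round_num=1))  # 2 4
# 13-hour search (max_epochs=50): 4 brackets (0..3); bracket 3, round 1 trains to epoch 6.
print(num_brackets(50), hyperband_epochs(50, bracket=3, round_num=1))  # 4 6
```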


Prior to the above results, a tuner search set to max_epochs=50\
and hyperband_iterations=10 returned the best model below\
after 13 hours of processing: 

* activation: 'linear'
* first_units: 15
* num_layers: 2
* units_0: 11
* units_1: 97
* units_2: 57
* units_3: 15
* units_4: 87
* tuner/epochs: 6
* tuner/initial_epoch: 2
* tuner/bracket: 3
* tuner/round: 1
* tuner/trial_id: '0383'
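In `create_model`, `first_units` sizes the first hidden layer and the loop adds `num_layers` more, using `units_0` up to `units_{num_layers-1}`. Any other `units_*` values in the dictionary are recorded in the search space but not used by that trial's model. The hypothetical helper below reads off the hidden-layer widths a configuration actually builds.

```python
def hidden_layer_sizes(values):
    """Hidden-layer widths that create_model builds for a given hyperparameter dict."""
    sizes = [values["first_units"]]
    sizes += [values[f"units_{i}"] for i in range(values["num_layers"])]
    return sizes


first_search = {"first_units": 65, "num_layers": 3, "units_0": 95, "units_1": 47,
                "units_2": 51, "units_3": 15}
long_search = {"first_units": 15, "num_layers": 2, "units_0": 11, "units_1": 97,
               "units_2": 57, "units_3": 15, "units_4": 87}

print(hidden_layer_sizes(first_search))  # [65, 95, 47, 51]
print(hidden_layer_sizes(long_search))   # [15, 11, 97]
```

By this reading, the 13-hour configuration builds three hidden layers (15, 11, and 97 units), while the manually defined `nn_model` keeps only the 15- and 11-unit layers.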

In [87]:
# Evaluate the top models against the test dataset
top_model = tuner.get_best_models(1)
for model in top_model:
    model_loss, model_accuracy = model.evaluate(X_test_scaled,y_test,verbose=2)
    print(f"Loss: {model_loss}, Accuracy: {model_accuracy}")

268/268 - 1s - loss: 0.4603 - accuracy: 0.7995 - 912ms/epoch - 3ms/step
Loss: 0.4603234529495239, Accuracy: 0.7995335459709167


Here are the results for the model tuning with max_epochs=50 and hyperband_iterations=10:
```
268/268 - 1s - loss: 0.4470 - accuracy: 0.8003 - 673ms/epoch - 3ms/step
Loss: 0.44697320461273193, Accuracy: 0.8003498315811157
```

Though the model has exceeded the target accuracy, I am going to run a second tuner, \
limiting the values based on the results of the first iteration. 

In [91]:
# Attempting a second tuner, limiting parameters to those close to the initial tuner's values. 

def create_model2(hp):
    
    nn_model3 = tf.keras.models.Sequential()

    # Restrict the hidden-layer activation to 'linear', the activation favored by the first tuner
    activation = hp.Choice('activation',['linear'])
    
    # Allow kerastuner to decide number of neurons in first layer
    nn_model3.add(tf.keras.layers.Dense(units=hp.Int('first_units',
        min_value=1,
        max_value=100,
        step=5), activation=activation, input_dim=input_dim))

    # Allow kerastuner to decide number of hidden layers and neurons in hidden layers
    for i in range(hp.Int('num_layers', 1, 2)):
        nn_model3.add(tf.keras.layers.Dense(units=hp.Int('units_' + str(i),
            min_value=1,
            max_value=100,
            step=5),
            activation=activation))
    
    nn_model3.add(tf.keras.layers.Dense(units=1, activation="sigmoid"))

    # Compile the model
    nn_model3.compile(loss="binary_crossentropy", optimizer='adam', metrics=["accuracy"])
    
    return nn_model3

In [92]:
tuner2 = kt.Hyperband(
    create_model2,
    objective="val_accuracy",
    max_epochs=10,
    hyperband_iterations=5,
    project_name='tuner2_results')

In [93]:
tuner2.search(X_train_scaled,y_train,epochs=10,validation_data=(X_test_scaled,y_test))

Trial 150 Complete [00h 02m 22s]
val_accuracy: 0.7970845699310303

Best val_accuracy So Far: 0.7995335459709167
Total elapsed time: 02h 55m 49s
INFO:tensorflow:Oracle triggered exit


In [94]:
top_hyper2 = tuner2.get_best_hyperparameters(1)
for param in top_hyper2:
    print(param.values)

{'activation': 'linear', 'first_units': 36, 'num_layers': 1, 'units_0': 26, 'units_1': 76, 'tuner/epochs': 4, 'tuner/initial_epoch': 0, 'tuner/bracket': 1, 'tuner/round': 0}


In [95]:
top_model2 = tuner2.get_best_models(1)
for model in top_model2:
    model_loss, model_accuracy = model.evaluate(X_test_scaled,y_test,verbose=2)
    print(f"Loss: {model_loss}, Accuracy: {model_accuracy}")

268/268 - 1s - loss: 0.4457 - accuracy: 0.7995 - 705ms/epoch - 3ms/step
Loss: 0.4457036256790161, Accuracy: 0.7995335459709167


In [96]:
# Export our model to HDF5 file
top_model2[0].save("top_model2.h5")



### Second tuning attempt results: 

```
Best val_accuracy So Far: 0.7995335459709167
Total elapsed time: 01h 39m 27s

Search: Running Trial #90

Value             |Best Value So Far |Hyperparameter
linear            |linear            |activation
96                |36                |first_units
1                 |1                 |num_layers
61                |26                |units_0
21                |76                |units_1
10                |4                 |tuner/epochs
0                 |0                 |tuner/initial_epoch
0                 |1                 |tuner/bracket
0                 |0                 |tuner/round

```

### REPORT

The purpose of this analysis is to use the provided dataset to create a binary classifier that can predict whether applicants will be successful if funded by Alphabet Soup. I am working with a CSV, provided by Alphabet Soup's business team, containing more than 34,000 organizations that have received funding from Alphabet Soup over the years.

The first step in my analysis was data preprocessing. In this step, I identified the target variable of my model, the "IS_SUCCESSFUL" column in the dataset. This column is binary and indicates whether the money applicants received was used effectively.

The remainder of the dataset constituted the features of our model:

* NAME — Identification column
* APPLICATION_TYPE — Alphabet Soup application type
* AFFILIATION — Affiliated sector of industry
* CLASSIFICATION — Government organization classification
* USE_CASE — Use case for funding
* ORGANIZATION — Organization type
* STATUS — Active status
* INCOME_AMT — Income classification
* SPECIAL_CONSIDERATIONS — Special considerations for application
* ASK_AMT — Funding amount requested

I removed "EIN" from our analysis but kept the identifier column, 'NAME'. Initial testing suggests that including NAME improves accuracy by as much as 7%. 

I decided to keep the data as complete as possible to avoid any bias in my assumptions regarding the importance of my features. The drawback of this approach is that the dataset remains large: one-hot encoding 'NAME' alone produces 19,568 columns, so the model has 19,611 input features and requires additional computing power. 

### Summary

I initialized a Keras Sequential model for evaluating the dataset and began experimenting with a variety of configurations manually. I expected that our model would perform best with ReLU (rectified linear), softmax, or linear activation functions in the hidden layers and a sigmoid output layer. A sigmoid output layer is well suited to a binary target, because it maps the model's output to a probability between 0 and 1. Given the type of data, it did not seem appropriate to employ other activations better suited to non-linear data. These assumptions seem to be merited given the results below.

After various iterations of manually changing these values, I was achieving a prediction accuracy of approximately 73%. After including 'NAME' in the analysis, I was able to manually achieve an accuracy of 75%. However, I was keen to see the accuracy improve, so I initialized a Keras Tuner search over a wide range of neurons, layers, and activation functions. After 13 hours of processing, this is the current best model, with an accuracy of 80.03%: 

* activation: 'linear'
* first_units: 15
* num_layers: 2
* units_0: 11
* units_1: 97
* units_2: 57
* units_3: 15
* units_4: 87
* tuner/epochs: 6
* tuner/initial_epoch: 2
* tuner/bracket: 3
* tuner/round: 1
* tuner/trial_id: '0383'

With this model, I have exceeded the target accuracy of 75% by about 5 percentage points.
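Every hidden layer in this configuration uses `activation: 'linear'`. Because a composition of affine maps is itself affine, stacked linear `Dense` layers followed by a sigmoid output compute the same family of functions as logistic regression on the encoded features; the extra layers change how the weights are parameterized, not what the model can represent. The NumPy sketch below checks this numerically with small random weights. The layer sizes (20 inputs, 15 and 11 hidden units) are illustrative, not the notebook's 19,611 inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, h1, h2 = 20, 15, 11

# Random weights and biases for two linear hidden layers and a sigmoid output unit.
W1, b1 = rng.normal(scale=0.1, size=(n_in, h1)), rng.normal(scale=0.1, size=h1)
W2, b2 = rng.normal(scale=0.1, size=(h1, h2)), rng.normal(scale=0.1, size=h2)
W3, b3 = rng.normal(scale=0.1, size=(h2, 1)), rng.normal(scale=0.1, size=1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def stacked_linear(x):
    """Two linear hidden layers followed by a sigmoid output."""
    hidden = (x @ W1 + b1) @ W2 + b2
    return sigmoid(hidden @ W3 + b3)


# Fold the three affine layers into a single weight vector and bias.
W_eff = W1 @ W2 @ W3
b_eff = (b1 @ W2 + b2) @ W3 + b3


def logistic_regression(x):
    return sigmoid(x @ W_eff + b_eff)


x = rng.normal(size=(5, n_in))
print(np.allclose(stacked_linear(x), logistic_regression(x)))  # True
```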

Due to some impatience on my part, I interrupted the model tuning after 13 hours and cooked an egg on my motherboard. Allowing the tuner additional time to run to completion may yet yield a model with better performance. I recommend that the first tuner, with 'max_epochs=50' and 'hyperband_iterations=10', be initialized again and run to completion.

The limitation of this approach is that the tuner does not blend different types of activations: `create_model` draws a single `activation` choice and applies it to every hidden layer, so each trial uses one activation throughout. Additional manual manipulation of the 'best model' may also push performance beyond 80% accuracy. For example, all the hidden layers in our 'best model' are linear, so tweaking them, or adding softmax or ReLU layers, may increase performance. However, manual tweaking has yet to achieve results greater than 80%. 

An attempt was made to find a better model within the parameters suggested by the first tuner, but it has yet to yield better results. The best result thus far for this second tuner is 79.95%, quite close to the performance of the initial tuner.

Lastly, a combined approach that starts with PCA to determine the most important principal components may well reduce computational time. By keeping only the most impactful components, the size of the dataset and the number of input features could be reduced. I recommend that PCA be completed before training the binary classifier, giving a measure of which components to use in the subsequent model; since PCA requires numeric input, it would run on the encoded and scaled features. However, initial attempts at manually reducing the features in the dataset have not yielded any improvement in model accuracy. 
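scikit-learn's `PCA(n_components=0.95)` keeps the fewest components that together explain 95% of the variance. The NumPy sketch below shows the same idea via an SVD of the centered data, on hypothetical toy data whose four columns are driven by only two independent signals; in this notebook the input would be an already numeric matrix such as `X_train_scaled`. `pca_reduce` is an illustrative helper, not a library function.

```python
import numpy as np


def pca_reduce(data, variance_to_keep=0.95):
    """Project data onto the fewest principal components explaining `variance_to_keep` of the variance."""
    centered = data - data.mean(axis=0)
    # Rows of Vt are the principal directions; s**2 is proportional to the variance along each.
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    explained_ratio = s**2 / np.sum(s**2)
    # First index where the cumulative ratio reaches the target, converted to a component count.
    k = int(np.searchsorted(np.cumsum(explained_ratio), variance_to_keep) + 1)
    return centered @ Vt[:k].T, explained_ratio


# Hypothetical toy data: four columns built from two independent signals a and b.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 200))
X_toy = np.column_stack([a, b, a + b, 2 * a])

X_reduced, ratio = pca_reduce(X_toy)
print(X_reduced.shape, np.round(ratio, 3))
```

Here two components carry essentially all of the variance, so the four columns reduce to two.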