# TensorFlow and TensorBoard with Regularization



## Purpose

The purpose of this lab is threefold.  

1.   to review using `TensorFlow` and `TensorBoard` for modeling and evaluation with neural networks
2.   to review using data science pipelines and cross-validation with neural networks
3.   to review using `TensorFlow` for neural network regularization

We'll be continuing our investigation of the canonical [Titanic Data Set](https://www.kaggle.com/competitions/titanic/overview) that we began [previously](https://github.com/learn-co-curriculum/enterprise-paired-nn-eval).

## The Titanic

### The Titanic and its data



RMS Titanic was a British passenger liner built by Harland and Wolff and operated by the White Star Line. It sank in the North Atlantic Ocean in the early morning hours of 15 April 1912, after striking an iceberg during her maiden voyage from Southampton, England to New York City, USA.

Of the estimated 2,224 passengers and crew aboard, more than 1,500 died, making the sinking one of modern history's deadliest peacetime commercial marine disasters. 

Although about 2,224 passengers and crew were aboard, we are given data for only about 1,300 passengers. Of these, about 900 records are used for training and the remaining 400 for testing. The test data has had the survived column removed, and we'll use neural networks to predict whether the passengers in the test data survived. Neither the training nor the test data is perfectly clean, as we'll see.

Below is a picture of the Titanic Museum in Belfast, Northern Ireland.

### Data Dictionary

*   *Survival* : 0 = No, 1 = Yes
*   *Pclass* : A proxy for socio-economic status (SES)
  *   1st = Upper
  *   2nd = Middle
  *   3rd = Lower
*   *sibsp* : The number of siblings / spouses aboard the Titanic
  *   Sibling = brother, sister, stepbrother, stepsister
  *   Spouse = husband, wife (mistresses and fiancés were ignored)
*   *parch* : The number of parents / children aboard the Titanic
  *   Parent = mother, father
  *   Child = daughter, son, stepdaughter, stepson
  *   Some children travelled only with a nanny, therefore *parch*=0 for them.
*   *Ticket* : Ticket number
*   *Fare* : Passenger fare (British pounds)
*   *Cabin* : Cabin number
*   *Embarked* : Port of Embarkation
  *   C = Cherbourg (now Cherbourg-en-Cotentin), France
  *   Q = Queenstown (now Cobh), Ireland
  *   S = Southampton, England
*   *Name*, *Sex*, *Age* (years) are all self-explanatory

## Libraries and the Data



### Importing libraries

### Loading the data

## EDA and Preprocessing

### Exploratory Data Analysis

You have already performed EDA on this data set. Look back on what you did before or see [here](https://github.com/learn-co-curriculum/enterprise-paired-nn-eval).

Of course, feel free to re-run what you have done before or try out some other EDA as you find useful.

### Preprocessing

Let's do the same preprocessing as before.

## Neural Network Model

### Building the model

#### Define the model as a pipeline

Let's use the data science pipeline for our neural network model.

Because you are now using regularization to guard against high variance (i.e., overfitting the data), include *dropout* and/or *l2* regularization in the definition of the model below. Also, feel free to experiment with different activation functions.
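As a sketch of what the *l2* option does: a kernel regularizer adds a penalty on the squared weights to the loss being minimized. With binary cross-entropy, the regularized objective can be written as below. The notation is ours, not the lab's: $N$ is the number of training rows, $\hat{y}_i$ is the predicted survival probability, $W^{(\ell)}$ is the weight matrix (kernel) of layer $\ell$, and $\lambda$ is the regularization strength.

$$
J = -\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\Big] + \lambda \sum_{\ell}\sum_{j,k}\big(W^{(\ell)}_{jk}\big)^2
$$

A larger $\lambda$ pushes the weights toward zero, trading some fit on the training data for lower variance. When the regularizer is given as the string `'l2'`, Keras uses its default strength (0.01, to our knowledge); to tune it, pass a regularizer object such as `tensorflow.keras.regularizers.l2(0.001)` instead.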

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# It will help to define our model in terms of a pipeline
def build_classifier(optimizer):
    classifier = Sequential()
    # input_dim=14 must match the number of preprocessed feature columns
    classifier.add(Dense(units=10, kernel_regularizer='l2', activation='relu', input_dim=14))
    classifier.add(Dropout(rate=0.2))
    classifier.add(Dense(units=128, kernel_regularizer='l2', activation='relu'))
    classifier.add(Dropout(rate=0.2))
    # A single sigmoid unit outputs the predicted probability of survival
    classifier.add(Dense(units=1, kernel_regularizer='l2', activation='sigmoid'))
    classifier.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return classifier
```
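To make the `Dropout(rate = 0.2)` layers in `build_classifier` more concrete, here is a minimal NumPy sketch of "inverted" dropout as it is applied during training: each activation is zeroed with probability `rate`, and the survivors are scaled by `1 / (1 - rate)` so the expected value is unchanged. The function name and the all-ones input are illustrative assumptions, and this is not Keras's actual implementation.

```python
import numpy as np


def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the rest."""
    keep_prob = 1.0 - rate
    # Boolean mask: True where the unit is kept for this training step
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation the same
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
x = np.ones((1000, 10))  # illustrative activations, all equal to 1
dropped = inverted_dropout(x, rate=0.2, rng=rng)

print("fraction zeroed:", np.mean(dropped == 0))
print("mean after dropout:", dropped.mean())
```

With `rate=0.2`, roughly a fifth of the entries are zeroed and the rest become 1.25, so the mean stays close to 1. At prediction time dropout is switched off and activations pass through unchanged.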

#### Use grid search to help you tune the parameters

You can play with optimizers, epochs, and batch sizes. The ones that we're suggesting are not necessarily the best.
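Before launching a search, it can help to see how quickly a grid grows. The sketch below enumerates every combination in a small grid and counts the model fits that cross-validation would require. The parameter values and the fold count are hypothetical placeholders, not the lab's suggested settings.

```python
from itertools import product

# Hypothetical search space; replace with the values you want to try
param_grid = {
    "optimizer": ["adam", "rmsprop"],
    "batch_size": [16, 32],
    "epochs": [50, 100],
}
cv_folds = 5  # assumed number of cross-validation folds

# Every combination of one value per parameter, as a list of dicts
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]
total_fits = len(combinations) * cv_folds

for combo in combinations:
    print(combo)
print(f"{len(combinations)} combinations x {cv_folds} folds = {total_fits} model fits")
```

Each added value multiplies the number of fits, so with neural networks it is usually worth starting with a coarse grid and refining around the best region.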

#### `TensorBoard`

`TensorBoard` is `TensorFlow`'s visualization toolkit. It is a dashboard that provides the visualization and tooling needed for machine learning experimentation. The code immediately below will allow us to use TensorBoard.

N.B. When we loaded the libraries, we loaded the TensorBoard notebook extension. (It is the last line of code in the first code chunk.)
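A minimal sketch of one common setup, assuming logs are written under a `logs/fit` folder (our choice, not something the lab specifies): each run gets its own timestamped directory so that runs can be compared side by side in the dashboard. The helper names are hypothetical; `TensorBoard` is the Keras callback class.

```python
import os
from datetime import datetime


def make_log_dir(root="logs/fit", now=None):
    """Build a timestamped log directory path so each run gets its own folder."""
    now = now or datetime.now()
    return os.path.join(root, now.strftime("%Y%m%d-%H%M%S"))


def make_tensorboard_callback(log_dir):
    """Create a Keras callback that writes TensorBoard logs to log_dir."""
    # Imported here so the path helper above works without TensorFlow installed
    from tensorflow.keras.callbacks import TensorBoard
    # histogram_freq=1 records weight histograms every epoch
    return TensorBoard(log_dir=log_dir, histogram_freq=1)


log_dir = make_log_dir()
print(log_dir)
# In a notebook, after fitting with callbacks=[make_tensorboard_callback(log_dir)]:
# %tensorboard --logdir logs/fit
```

Pointing `%tensorboard` at the parent folder rather than a single run lets the dashboard overlay the curves from every run stored beneath it.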

#### Fitting the optimal model and evaluating with `TensorBoard`

Define the early stopping callback. Use your best values from grid search with `KerasClassifier`, and finally fit the model.
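In Keras, early stopping is typically set up with `tensorflow.keras.callbacks.EarlyStopping`, which watches a metric such as `val_loss` and stops once it has not improved for `patience` epochs. To show how `patience` behaves, here is a simplified pure-Python version of that rule. It is a sketch of the idea and not Keras's implementation, and the validation losses are made-up numbers.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # never ran out of patience


# Made-up validation losses, one per epoch
val_losses = [0.70, 0.61, 0.58, 0.59, 0.57, 0.58, 0.60, 0.59]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
best_epoch = val_losses.index(min(val_losses)) + 1

print("stopped at epoch:", stop_epoch)
print("best epoch:", best_epoch)
```

Here training stops at epoch 8, three epochs after the best validation loss at epoch 5. Because the final weights come from a later and worse epoch than the best one, `EarlyStopping` offers a `restore_best_weights=True` option.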

#### Results and Predictions

Calculate the predictions, save them as a csv, and print them.

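```python
# Results
# predict returns shape (n, 1) for a Keras model; flatten to one value per passenger
preds = classifier.predict(test).ravel()
results = ids.assign(Survived=preds)
# Threshold at 0.5 to turn predicted probabilities into 0/1 survival labels
results['Survived'] = results['Survived'].apply(lambda p: 1 if p > 0.5 else 0)
results.to_csv('titanic_submission.csv', index=False)
results.head(20)
```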




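|   | PassengerId | Survived |
|---|-------------|----------|
| 0 | 892 | 0 |
| 1 | 893 | 0 |
| 2 | 894 | 0 |
| 3 | 895 | 0 |
| 4 | 896 | 0 |
| 5 | 897 | 0 |
| 6 | 898 | 1 |
| 7 | 899 | 0 |
| 8 | 900 | 1 |
| 9 | 901 | 0 |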


Continue to tweak your model until you are happy with the results based on model evaluation.

## Conclusion

Now that you have `TensorBoard` to help you inspect your model, you can better understand how to tweak it.

How do your predictions compare to what you did last time?

Remember that your "fancier" model may be less accurate. That is okay, since we're trying to guard against variance with regularization techniques.