# Case Study 6 - Neural Networks

__Team Members:__ Amber Clark, Andrew Leppla, Jorge Olmos, Paritosh Rai

# Content
* [Business Understanding](#business-understanding)
    - [Scope](#scope)
    - [Introduction](#introduction)
    - [Methods](#methods)
    - [Results](#results)
* [Data Evaluation](#data-evaluation)
    - [Loading Data](#loading-data) 
    - [Data Summary](#data-summary)
    - [Missing Values](#missing-values)
    - [Feature Removal](#feature-removal)
    - [Exploratory Data Analysis (EDA)](#eda)
* [Model Preparations](#model-preparations)
    - [Sampling & Scaling Data](#sampling-scaling-data)
    - [Proposed Method](#proposed-metrics)
    - [Evaluation Metrics](#evaluation-metrics)
    - [Feature Selection](#feature-selection)
* [Model Building & Evaluations](#model-building)
    - [Sampling Methodology](#sampling-methodology)
    - [Model](#model)
    - [Performance Analysis](#performance-analysis)
* [Model Interpretability & Explainability](#model-explanation)
    - [Examining Feature Importance](#examining-feature-importance)
* [Conclusion](#conclusion)
    - [Final Model Proposal](#final-model-proposal)
    - [Future Considerations, Model Enhancements and Alternative Modeling Approaches](#model-enhancements)

# Business Understanding & Executive Summary <a id='business-understanding'/>

## Objective:

The objective of this case study is to predict the detection of a new subatomic particle with high accuracy from a dataset with 7 million records.  

## Introduction:
No information regarding the data in the case study was provided; the only stipulation given was to classify a binary variable representing "the existence of a particle" using a neural network. In the binary target, 1 represents detection and 0 represents non-detection. The client has advised that this is a massive amount of data best modeled with neural networks, and that a high level of accuracy is critical.

### Artificial Neural Networks

Neural networks are modeled on brain biology and simulate the brain's function. In the brain, neurons are connected to other neurons by axons. This concept is applied in an Artificial Neural Network (ANN). An ANN comprises groups of "neurons" called layers. These layers are connected in a network to take inputs from the dataset, fit model weights to the inputs, and eventually produce outputs that can be used to classify a target variable. The layers between the inputs and the target outputs in a neural network are called hidden layers.

#### TODO: Add image

Physiologically, neurons work by firing signals only when a certain signal "threshold" is reached. ANNs mimic this behavior. Any input signal below the threshold produces no output from the neuron, while any signal at or above the threshold produces a constant output. Various activation functions are used to approximate mathematically how a neuron works. An activation function is an equation that determines the output of a neuron from its weighted inputs.

Some of the common activation functions are discussed below:
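As a minimal sketch of how such functions behave (not the team's code), the following NumPy snippet compares a hard threshold, matching the biological description above, with the sigmoid and ReLU activations used later in this report. The function names and the sample inputs are illustrative assumptions.

```python
import numpy as np


def step(z, threshold=0.0):
    """Hard threshold: constant output of 1 at or above the threshold, else 0."""
    return (z >= threshold).astype(float)


def sigmoid(z):
    """Smooth approximation of the threshold; squashes any input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, z)


# Illustrative pre-activation values (assumed, not taken from the dataset).
z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

step_out = step(z)
sigmoid_out = sigmoid(z)
relu_out = relu(z)
```

The step function reproduces the all-or-nothing firing described above, while sigmoid and ReLU are differentiable almost everywhere, which is what makes them usable with gradient-based training.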

##### TODO: Second Image
Each neuron in the network acts as a regression that calculates an output from its inputs. A neural network is therefore a large ensemble of regressors, each of which takes the outputs of previous regressors as its inputs.
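The "ensemble of regressions" view can be sketched in a few lines of NumPy. This is an illustration under stated assumptions: each neuron is taken to be a linear regression passed through a sigmoid, and the weights, biases, and input data are random placeholders rather than fitted values.

```python
import numpy as np


def neuron(x, w, b):
    """One neuron: a linear regression on its inputs, passed through a sigmoid."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))  # 5 observations, 3 input features (made up)

# Hidden layer: two neurons, each a separate regression on the raw inputs.
h1 = neuron(x, rng.normal(size=3), 0.1)
h2 = neuron(x, rng.normal(size=3), -0.2)
hidden = np.column_stack([h1, h2])  # shape (5, 2)

# Output neuron: a regression whose inputs are the hidden neurons' outputs.
y_hat = neuron(hidden, rng.normal(size=2), 0.0)  # shape (5,)
```

Training a network amounts to fitting all of these regressions' weights jointly rather than one at a time.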


## Modeling:

### Training and Test Split
70/30 split ---> batch and epochs?


### Key Metrics
talk about accuracy/auc and why that's our metric 


### Results



### Feature Importance
Add linear feature importances 


## Conclusion



## Future Considerations



# Data Evaluation <a id='data-evaluation'>
    

Summarize the data being used in the case using appropriate mediums (charts, graphs, tables); address questions such as: Are there missing values? Which variables are needed (which ones are not)? What assumptions or conclusions are you drawing that need to be relayed to your audience?

## Loading Data <a id='loading-data'>

## Data Summary <a id='data-summary'>
    
### Data Exploration and Manipulation:
    
The provided data, although not described in detail, is a large dataset consisting of 28 features and a binary class. The column `mass` is the only named feature; all others are arbitrarily numbered, and all features are numeric. There are seven million observations. There are no known missing values in the data, with the caveat that it is unknown whether zeros could constitute missing data.
Only two manipulations were required to prepare this data for use in a neural network model: changing the target class type to Boolean, which saves a small amount of space, and normalizing the range of the features, which was performed after the data was split into training and test sets. In addition, the target classes are very well balanced in the dataset.
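The space saved by the Boolean conversion can be estimated with a short sketch. It assumes the target was originally stored as 64-bit integers, which the source does not state; the array below is a stand-in of the same length as the dataset, not the real target column.

```python
import numpy as np

N_RECORDS = 7_000_000  # number of observations reported above

# Stand-in target column; the int64 starting dtype is an assumption.
target_int = np.zeros(N_RECORDS, dtype=np.int64)
target_bool = target_int.astype(bool)

bytes_before = target_int.nbytes   # 8 bytes per record
bytes_after = target_bool.nbytes   # 1 byte per record
saved_megabytes = (bytes_before - bytes_after) / 1e6
```

Under that assumption the saving is 49 MB, which is small next to the 28 numeric feature columns.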


## Missing Values <a id='missing-values'>
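There are no known missing values in the data, with the caveat noted in the Data Summary that zeros could represent missing data. TODO: elaborate on this point later.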


## Exploratory Data Analysis (EDA) <a id='eda'>

    
### Target Variable Class Distribution

### todo: add image

Correlations were observed among features f6, f10, f14, f18, and f26, and between these features and the target variable. However, all variables will be included in model fitting, because there is no domain knowledge of the features with which to judge whether some of them could be excluded in favor of others.

    
    
### todo: add correlation image
   

# Model Preparations <a id='model-preparations'/>

Which methods are you proposing to utilize to solve the problem?  Why is this method appropriate given the business objective? How will you determine if your approach is useful (or how will you differentiate which approach is more useful than another)?  More specifically, what evaluation metrics are most useful given that the problem is a classification one (ex., Accuracy, F1-score, Precision, Recall, AUC, etc.)?

## Sampling & Scaling Data <a id='sampling-scaling-data' />


Training and test sets were created from the data using stratified sampling to maintain the ratio of the binary outcome. This was done out of an abundance of caution, since the classes are already almost perfectly balanced. 30% of the data was withheld for the test set, and the predictor features were normalized.
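The split-then-scale order can be sketched as follows. This is not the team's code: the helper `stratified_split` is hypothetical, the data is synthetic, and min-max scaling is an assumed reading of "normalized" (standardization would be an equally plausible one). The point it illustrates is that the scaling parameters come from the training set only.

```python
import numpy as np


def stratified_split(y, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), keeping each class's share in both sets."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_test = int(round(test_fraction * idx.size))
        test_idx.append(idx[:n_test])
        train_idx.append(idx[n_test:])
    return np.concatenate(train_idx), np.concatenate(test_idx)


# Synthetic stand-in for the real data: 28 numeric features, balanced target.
rng = np.random.default_rng(42)
X = rng.normal(loc=5.0, scale=3.0, size=(1000, 28))
y = rng.permutation(np.repeat([False, True], 500))

train_idx, test_idx = stratified_split(y, test_fraction=0.3)
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Learn the scaling from the training set only, then apply it to both sets,
# so that no information from the test set leaks into training.
col_min = X_train.min(axis=0)
col_range = X_train.max(axis=0) - col_min
X_train_scaled = (X_train - col_min) / col_range
X_test_scaled = (X_test - col_min) / col_range
```

Because the test set is scaled with the training minimum and range, a few test values may fall slightly outside [0, 1]; that is expected.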


## Proposed Method <a id='proposed-metrics' />

The stakeholders wanted our team to focus above all on creating a model that predicts the existence of a new particle with high accuracy; model interpretability was not a priority. With this in mind, our team decided to use an Artificial Neural Network to achieve a high-accuracy model. Our network has an input layer of 28 neurons, one for each feature, followed by X hidden layers and a single-neuron output layer with a sigmoid activation function and a BinaryCrossentropy loss, since our target variable is binary. The hidden layers use a ReLU activation, chosen because its non-linearity helps the network approximate non-linear functions. The team decided to have X neurons and X neurons...for each hidden layer respectively.
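The shape of this architecture can be illustrated with a plain NumPy forward pass. This is a sketch, not the team's model: the hidden-layer count and widths in `HIDDEN_SIZES` are hypothetical placeholders (the text above leaves them as X), the weights are randomly initialized rather than trained, and the inputs are synthetic.

```python
import numpy as np

N_FEATURES = 28
HIDDEN_SIZES = [64, 32]  # hypothetical; the report does not specify these


def relu(z):
    return np.maximum(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(sizes, seed=0):
    """Random (weights, bias) pair for each pair of consecutive layers."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, X):
    """ReLU on every hidden layer, sigmoid on the single output neuron."""
    a = X
    for W, b in params[:-1]:
        a = relu(a @ W + b)
    W, b = params[-1]
    return sigmoid(a @ W + b).ravel()


def binary_cross_entropy(y_true, p, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


params = init_params([N_FEATURES, *HIDDEN_SIZES, 1])

rng = np.random.default_rng(1)
X_demo = rng.random((8, N_FEATURES))   # 8 synthetic, already-scaled rows
y_demo = rng.integers(0, 2, size=8)    # synthetic binary labels

probs = forward(params, X_demo)
loss = binary_cross_entropy(y_demo, probs)
```

The sigmoid output can be read as the predicted probability of detection, which is why it pairs naturally with a binary cross-entropy loss.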

In our experiments, the best results were achieved with a batch size of 1000. Batches of this size are large enough to limit unnecessary fluctuations during training, which gave the model the right balance between variance and bias. Additionally, the batches were small enough to compute in memory, but not so small that they would increase processing time dramatically. The team ran 100 epochs; however, no further improvement was observed after 30 epochs.
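The arithmetic behind these settings, and one way to detect the kind of plateau described above, can be sketched in plain Python. The record count, split, and batch size come from this report; the loss curve is made up for illustration, and the patience-based stopping rule in `first_stalled_epoch` is a hypothetical helper, not something the team is stated to have used.

```python
import math

N_RECORDS = 7_000_000
BATCH_SIZE = 1_000

# 70/30 split, computed with integers to avoid floating-point rounding.
n_test = N_RECORDS * 30 // 100
n_train = N_RECORDS - n_test
steps_per_epoch = math.ceil(n_train / BATCH_SIZE)  # weight updates per epoch


def first_stalled_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None.

    Stops once `patience` consecutive epochs fail to improve on the best
    loss seen so far by more than `min_delta`.
    """
    best = float("inf")
    since_best = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            since_best = 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return None


# Made-up loss curve: improves for 30 epochs, then stays flat for 70 more.
fake_losses = [1.0 / e for e in range(1, 31)] + [1.0 / 30] * 70
stop_epoch = first_stalled_epoch(fake_losses, patience=5)
```

With a rule like this, a 100-epoch run whose loss flattens after epoch 30 would end at epoch 35 instead of spending the remaining epochs on no improvement.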

## Evaluation Metrics <a id='evaluation-metrics' />

### Baseline Model

For our baseline, the team decided to run a logistic regression model. The model used a 70/30 stratified split, an L1 penalty, and the `saga` solver. Logistic regression was chosen because it is a simple, quick, and interpretable model. It gave the team a benchmark accuracy against which to compare the accuracy of the proposed artificial neural network.

## Feature Selection <a id='feature-selection' />

All the features were used in the proposed neural network model. The team chose not to use regularization because the evaluation metrics on the training and test sets aligned, which indicates that the neural network model was not overfitting.


# Model Building & Evaluations <a id='model-building'/>

In this case, your primary task is to construct a neural network to detect the existence of new particles and will involve the following steps:

- Construct your neural network's architecture
- Fit your neural network to your training data
- Analyze your model's performance - referencing your chosen evaluation metric (including supplemental visuals and analysis where appropriate)


The team initially fit a Logistic Regression model to the data set to get a baseline accuracy rate for the prediction. Then, a neural network model was fit to assess the improvement in the accuracy rate.


## Sampling Methodology <a id='sampling-methodology'/>

## Modeling

### Final Model


## Model's Performance Analysis <a id='performance-analysis'/>

## Model Interpretability & Explainability <a id='model-explanation'>

### Final Model Proposal <a id='final-model-proposal'/>

### Examining Feature Importance <a id='examining-feature-importance'/>


# Conclusion <a id='conclusion'>

After all of your technical analysis and modeling; what are you proposing to your audience and why?  How should they view your results and what should they consider when moving forward?  Are there other approaches you'd recommend exploring?  This is where you "bring it all home" in language they understand.

### Future Considerations, Model Enhancements and Alternative Modeling Approaches <a id='model-enhancements'/>

## References