***
## ***Introducing Neural Networks***
***

Neural networks, also called **Artificial Neural Networks** (though it seems, in recent years, we’ve dropped the **“artificial”** part), are a type of ***machine learning*** often conflated with ***deep learning***. The defining characteristic of a deep neural network is having two or more hidden layers, a concept that will be explained shortly; these hidden layers are ones that the neural network controls. It’s reasonably safe to say that most neural networks in use are a form of deep learning.

***
## ***A Brief History***
***

Since the advent of computers, scientists have been formulating ways to enable machines to take input and produce desired output for tasks like **classification** and **regression**. Additionally, in general, there’s **supervised** and **unsupervised** machine learning. **Supervised machine learning** is used when you have pre-established and labeled data that can be used for training. Suppose you have sensor data for a server with metrics such as upload/download rates, temperature, and humidity, all organized by time at 10-minute intervals. Normally, this server operates as intended and has no outages, but sometimes parts fail and cause an outage. We might collect data and then divide it into two classes: one class for times/observations when the server is operating normally, and another class for times/observations when the server is experiencing an outage. When the server is failing, we want to label the sensor data leading up to the failure as data that preceded a failure. When the server is operating normally, we simply label that data as “normal.”

What each sensor measures in this example is called a **feature**. A group of features makes up a **feature set** (represented as vectors/arrays), and the values of a feature set can be referred to as a **sample**. Samples are fed into neural network models to train them to fit desired outputs from these inputs or to predict based on them during the inference phase.
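To make the terminology concrete, here is a minimal sketch of how the server example might look as data. The feature names, their order, and every numeric value are made up for illustration; only the idea of features, samples, and two classes comes from the text above.

```python
# Hypothetical feature set for the server example; names and order are assumptions.
feature_names = ["upload_mbps", "download_mbps", "temperature_c", "humidity_pct"]

# Each inner list is one sample: the values of the feature set at one
# 10-minute observation. The numbers are invented for illustration.
samples = [
    [48.2, 95.1, 41.0, 33.5],
    [47.9, 96.3, 41.4, 33.1],
    [12.4, 20.7, 68.9, 52.0],
]

# One label per sample: 0 = "normal", 1 = "preceded a failure".
class_names = ["normal", "preceded a failure"]
labels = [0, 0, 1]

for sample, label in zip(samples, labels):
    # Pair each value with its feature name to show what a sample contains.
    print(dict(zip(feature_names, sample)), "->", class_names[label])
```

In supervised learning, the `samples` are the inputs and the `labels` are the desired outputs the model is trained to fit.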

**Neural networks** were conceived in the 1940s, but figuring out how to train them remained a mystery for 20 years. The concept of backpropagation (explained later) came in the 1960s, but neural networks still did not receive much attention until they started winning competitions in 2010. Since then, neural networks have been on a meteoric rise due to their sometimes seemingly magical ability to solve problems previously deemed unsolvable, such as **image captioning**, **language translation**, **audio** and **video synthesis**, and more.

Currently, **neural networks** are the primary solution to most competitions and challenging technological problems like **self-driving cars**, **calculating risk**, **detecting fraud**, and early cancer detection, to name a few.

***
## ***What is a Neural Network?***
***

**Artificial neural networks** are inspired by the organic brain, translated to the computer. It’s not a perfect comparison, but there are **neurons**, **activations**, and **lots of interconnectivity**, even if the underlying processes are quite different. 

A typical neural network has thousands or even up to millions of adjustable parameters **(weights and biases)**. In this way, neural networks act as enormous functions with vast numbers of parameters. The concept of a long function with millions of variables that could be used to solve a problem isn’t all too difficult to grasp. With that many variables related to neurons, arranged as interconnected layers, we can imagine there exist some combinations of values for these variables that will yield desired outputs. Finding that combination of parameter (weight and bias) values is the challenging part.
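The idea of a network as one large function of its weights and biases can be sketched in a few lines. This is an illustrative toy, not a definitive implementation: the layer sizes (4 inputs, two hidden layers of 8 neurons, 2 outputs) and the helper names are assumptions, and activation functions are deliberately left out to keep the focus on the parameters.

```python
import random

def init_layer(n_inputs, n_neurons, rng):
    # One weight per (neuron, input) pair and one bias per neuron.
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases

def layer_forward(inputs, weights, biases):
    # Each neuron outputs the weighted sum of its inputs plus its bias.
    return [sum(x * w for x, w in zip(inputs, neuron_weights)) + b
            for neuron_weights, b in zip(weights, biases)]

def count_parameters(layers):
    # Total number of adjustable values (weights and biases) in the network.
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

rng = random.Random(0)
layer_sizes = [4, 8, 8, 2]  # hypothetical sizes chosen for this sketch
layers = [init_layer(n_in, n_out, rng)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

def network(inputs):
    # The whole network is a single function: the output of each layer
    # becomes the input of the next.
    outputs = inputs
    for weights, biases in layers:
        outputs = layer_forward(outputs, weights, biases)
    return outputs

print(count_parameters(layers))  # (4*8 + 8) + (8*8 + 8) + (8*2 + 2) = 130
print(network([1.0, 2.0, 3.0, 4.0]))
```

Even this tiny network has 130 adjustable values; widening the layers or adding more of them is how the count climbs into the thousands and millions.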

The end goal for neural networks is to adjust their **weights** and **biases** (the parameters), so when applied to a yet unseen example in the input, they produce the desired output. When supervised machine learning algorithms are trained, we show the algorithm examples of inputs and their associated desired outputs. One major issue with this concept is overfitting: when the algorithm only learns to fit the training data but doesn’t actually “understand” anything about underlying input-output dependencies. The network basically just **memorizes** the training data.
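Memorization can be illustrated with a deliberately extreme stand-in for a model. The sketch below is not a neural network; it is a hypothetical lookup table that stores every training example, and the labeling rule (label 1 when the single feature exceeds 0.5) is invented for this example.

```python
import random

rng = random.Random(0)

def make_data(n):
    # Hypothetical rule to be learned: the label is 1 when the feature exceeds 0.5.
    xs = [rng.random() for _ in range(n)]
    ys = [int(x > 0.5) for x in xs]
    return xs, ys

train_x, train_y = make_data(200)
unseen_x, unseen_y = make_data(200)

# "Training" here just stores every (input, label) pair verbatim.
memory = dict(zip(train_x, train_y))

def memorizer(x):
    # Recalls the stored label for an exact match; otherwise falls back to 0,
    # because nothing about the underlying rule was learned.
    return memory.get(x, 0)

def accuracy(model, xs, ys):
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)

print("accuracy on training data:", accuracy(memorizer, train_x, train_y))
print("accuracy on unseen data:  ", accuracy(memorizer, unseen_x, unseen_y))
```

The memorizer is perfect on the examples it stored, yet on new inputs it is only right when the true label happens to match its fallback, which is the gap between fitting the training data and capturing the input-output dependency.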

Thus, we tend to use **in-sample** data to train a model and then use **out-of-sample** data to validate an algorithm (or a neural network model in our case). The data is partitioned by setting aside a certain percentage for each of the two datasets.

For example, if there is a dataset of 100,000 samples of data and labels, you will immediately take 10,000 and set them aside to be your **out-of-sample** or **validation** data. You will then train your model with the other 90,000 in-sample or “training” samples and finally validate your model with the 10,000 out-of-sample samples that the model hasn’t yet seen. The goal is for the model to not only accurately predict on the training data, but also to be similarly accurate while predicting on the withheld out-of-sample validation data.
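The 90,000/10,000 partition described above might be sketched as follows. The function name `train_validation_split`, the placeholder dataset, and the choice to shuffle before splitting are assumptions made for this example; for time-ordered data such as the server readings, a split by time may be more appropriate than a random one.

```python
import random

def train_validation_split(samples, labels, validation_size, seed=0):
    # Shuffle the indices so the validation set is a random subset,
    # then carve off the first `validation_size` of them.
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    val_idx = indices[:validation_size]
    train_idx = indices[validation_size:]
    train = ([samples[i] for i in train_idx], [labels[i] for i in train_idx])
    val = ([samples[i] for i in val_idx], [labels[i] for i in val_idx])
    return train, val

# Placeholder dataset of 100,000 samples with one feature each (illustrative only).
samples = [[float(i)] for i in range(100_000)]
labels = [i % 2 for i in range(100_000)]

(train_x, train_y), (val_x, val_y) = train_validation_split(
    samples, labels, validation_size=10_000
)

print(len(train_x), len(val_x))  # 90000 10000
```

The model would be trained only on `train_x` and `train_y`; `val_x` and `val_y` stay untouched until it is time to check how well the model performs on data it has not seen.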