# Welcome


Let's start by making a simple synthetic dataset using the make_moons function from scikit-learn.


For this first task, we will focus on classification. For those who forgot, classification simply means predicting the correct class for a given input.


A neural net pays off when the data is not linearly separable; otherwise, there would be little reason to use a neural network for this task.


To be clear, a neural net does not require non-linearly separable data, but on linearly separable data you wouldn't be using it to its full potential. Its strength is the capacity to model non-linear boundaries.


Linearly separable data benefit from cheaper classification models like logistic regression, linear SVMs, and perceptrons. 

## Identifying linearly separable data

You can tell that data is linearly separable if you train one of these models and it reaches near 100% accuracy on the training data, since these models are limited to drawing linear boundaries for their predictions.


Another way of checking is to graph your data. For 2-dimensional data, if you can draw a straight line separating your classes, then your data is linearly separable.


Now which method is better? That depends completely on you. You can visualize the data and try to draw a line, or you can train a logistic regression model and check its predictions.
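
Here is a minimal sketch of the "train a linear model" check. The helper name `linear_train_accuracy` and the two toy datasets are assumptions made for illustration; they are not part of the notebook's pipeline. It fits a logistic regression and reports accuracy on the same data it was trained on.

```python
from sklearn.datasets import make_blobs, make_moons
from sklearn.linear_model import LogisticRegression


def linear_train_accuracy(X, y):
    """Fit a logistic regression and return its accuracy on the same data.

    Hypothetical helper for illustration only.
    """
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)
    return model.score(X, y)


# Two well-separated Gaussian blobs: a straight line can split them.
X_blobs, y_blobs = make_blobs(
    n_samples=500, centers=[[-5, -5], [5, 5]], cluster_std=1.0, random_state=0
)

# Two interleaving noisy crescents: no straight line splits them cleanly.
X_moons, y_moons = make_moons(n_samples=500, noise=0.25, random_state=0)

acc_blobs = linear_train_accuracy(X_blobs, y_blobs)
acc_moons = linear_train_accuracy(X_moons, y_moons)

print(f"blobs: training accuracy = {acc_blobs:.3f}")
print(f"moons: training accuracy = {acc_moons:.3f}")
```

Treat the result as evidence rather than proof: scikit-learn's logistic regression is regularized by default, so a score slightly below 100% does not rule out separability, while a score far below it (as on the crescents) is a strong hint that a linear boundary is not enough.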

## The better model for linearly separable data

Do not underestimate the neural network. It can still learn the linear boundary. However, it can also learn an unnecessarily curved boundary separating the classes.

Now which model is best for dealing with linearly separable data?
- A deep network with many layers (or even a small one)?
- Or a single linear decision boundary?


You'd want to choose the simpler model. The two will often give similar results, though not always. The simpler model is less computationally expensive, and you avoid unnecessarily complex decision surfaces. Because of the neural net's capacity, it may learn small bumps in the boundary that do not generalize. This is called overfitting. Overfitting, for those who may have forgotten, is when the model fits the training data too closely. If the net draws a non-linear boundary through linear data, it may perform poorly when a new dataset is used to test it after training.


This relates to something called the bias-variance trade-off. Bias (related to underfitting) stems from overly simple assumptions, and variance (related to overfitting) arises from excessive sensitivity to fluctuations in the training data. With too much bias, your model may be too simple to capture the real relationship between the features (inputs) and the target (output). With too much variance, your model may be so complex that it is perfect on the training data but useless on new test data, because it memorized the training points instead of learning the real relationship between features and target.
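
One way to look at this in practice is to compare training and test accuracy for a simple and a flexible model on the same roughly linear data. The sketch below is only an illustration: the dataset settings, the network size, and the variable names (`X_lin`, `results`, and so on) are assumptions, and the exact numbers will change with the random seed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Two classes, one cluster each, with 10% of labels reassigned at random
# so that no model can be perfect without memorizing noise.
X_lin, y_lin = make_classification(
    n_samples=400, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, flip_y=0.10, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_lin, y_lin, test_size=0.5, random_state=0, stratify=y_lin
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    # alpha=0.0 switches off L2 regularization to leave the net unconstrained.
    "MLP (2 x 64 units)": MLPClassifier(
        hidden_layer_sizes=(64, 64), alpha=0.0, max_iter=2000, random_state=0
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    train_acc = model.score(X_tr, y_tr)
    test_acc = model.score(X_te, y_te)
    results[name] = (train_acc, test_acc)
    print(f"{name:20s} train={train_acc:.3f} test={test_acc:.3f} "
          f"gap={train_acc - test_acc:+.3f}")
```

A training score well above the test score is the usual symptom of high variance, while low scores on both point to high bias. Whether the network actually shows a larger gap here depends on the seed and settings, so read the printed numbers rather than assuming the outcome.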


We will go deeper into this soon, along with the layers that compose a neural net, so sit tight.


Key takeaway: just because you can doesn't mean you should. Your goal should be the simplest model that still gives good results.




# The data

The make_moons function generates data in crescent shapes, where each class forms its own crescent. It produces two classes, which is what we will focus on for now.

We also will just focus on 2 features.

Remember, if we have n features, then our data exists in an n-dimensional space.

We can visualize up to 3 features in a 3-dimensional space, but not beyond that. The geometry still exists, don't worry, we just cannot see it.


Things may look scary beyond 2 or 3 dimensions, where you cannot physically graph the data. Do not fret: the models can still work with the data, and so can you. If you want visual representations, you can plot 2 or 3 features against each other in different combinations to explore the relationships between them. Keep in mind that the data itself still lives in the full n-dimensional space.


One thing you need to know about classification in any number of dimensions is what a linear separator looks like. For n-dimensional data, a linear separator has n - 1 dimensions: with 2 features it is a line, with 3 features it is a plane, and with 4 or more features it is a hyperplane. Whether such a separator can split the classes perfectly depends on the data.
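
In symbols, a linear separator over n features can be written as below. The notation (weights $w_1, \dots, w_n$ and offset $b$) is an assumption introduced here for illustration, not something defined earlier in this notebook.

$$
w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b = 0
$$

A point is assigned to one class when the left-hand side is positive and to the other class when it is negative. With n = 2 the equation $w_1 x_1 + w_2 x_2 + b = 0$ describes a line, and with n = 3 it describes a plane.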


However, in our case there is no linear separator, because the make_moons dataset arranges 2-dimensional data in interleaving crescents. In multi-feature datasets, we may see a linear separator when plotting 2 or 3 of the features against each other, but keep in mind that this may not give the full picture of what's going on.


One more thing before we move on: in order for both features to contribute to identifying a data point's correct class, they need to have similar magnitudes. If one feature ranges in the millions and the second in the tens, the first feature will have a much bigger impact on the result. To prevent this, we use standard scaling, which rescales each feature to a mean of 0 and a standard deviation of 1, instead of a raw range such as -15002008 to 3000200. (Squeezing values into a fixed 0 to 1 range is a different technique, min-max scaling.) Do not worry, the data is not lost, just transformed.

This transformation is useful for many algorithms, not just neural nets. They can perform better because the features start out contributing on an equal footing, as they should. Of course, a model can determine during training that some features are irrelevant, but here at least we give every feature an equal chance to show whether it matters.
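
To see what the scaler does, here is a small sketch on made-up data with one feature in the millions and one in the tens. The ranges and variable names (`X_raw`, `X_scaled`, `X_back`) are illustrative assumptions. It checks that each scaled column ends up with mean 0 and standard deviation 1, and that the original values can be recovered.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Made-up data: column 0 spans millions, column 1 spans tens.
X_raw = np.column_stack([
    rng.uniform(-15_000_000, 3_000_000, size=500),
    rng.uniform(0, 50, size=500),
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_raw)

# Per-column statistics after scaling.
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)
print("means after scaling:", col_means)
print("stds after scaling :", col_stds)

# The transformation is reversible, so no information is lost.
X_back = scaler.inverse_transform(X_scaled)
print("recovered original :", np.allclose(X_back, X_raw, rtol=1e-6, atol=1e-3))
```

Note that the scaled values are not confined to any fixed interval; they are centered on 0 and measured in units of standard deviation.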

The standard scaler will also come back soon in the neural net. (For those who already know, it matters for gradient descent: with standardized features, convergence can happen much faster than without.)

In [None]:
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1) Create synthetic 2D data (two interleaving crescents)
X, y = make_moons(n_samples=2000, noise=0.25, random_state=42)

# 2) Split into train/validation/test (70% / 15% / 15%)
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.50, random_state=42, stratify=y_temp
)

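# 3) Scale features (VERY important for neural nets).
#    Fit the scaler on the training split only, then reuse its statistics
#    for validation and test so no information leaks from those splits.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_val_s = scaler.transform(X_val)
X_test_s = scaler.transform(X_test)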

print("Shapes:")
print("X_train:", X_train_s.shape, "y_train:", y_train.shape)
print("X_val  :", X_val_s.shape, "y_val  :", y_val.shape)
print("X_test :", X_test_s.shape, "y_test :", y_test.shape)