# Topic: Neural Network for Multiple Linear Regression
## Objectives for this template:

1. Introduce participants to the fundamental concepts of multiple linear regression.
2. Use TensorFlow to build a simple sequential neural network regression model that accepts multiple features.
3. Demonstrate the process of inspecting attribute relationships as well as training and evaluating the performance of the model.
4. Allow participants to practice adjusting various parameters of the model to improve performance.

Designed By: _Rodolfo C. Raga Jr._  __Copyright @2019__

__Permission granted to use template for educational purposes so long as this heading is not removed.__

---

**Step 1**:

- Import the TensorFlow library as tf.
- Import the keras module from TensorFlow.
- Import the layers module from TensorFlow's Keras API.
- Import the MinMaxScaler class from scikit-learn.
- Import the train_test_split function from scikit-learn.
- Import the r2_score function from scikit-learn.
- Import the pyplot module from matplotlib as plt.
- Import the pandas library as pd.
- Import the seaborn library as sns.
- Import the io module.
- Print the current version of TensorFlow.
- Print a message asking the user to select a dataset to load.
- Import the files module from google.colab.
- Use files.upload() to prompt the user to upload a dataset and store the result in the uploaded variable.

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import io

print("Done with library declaration. Current version of Tensorflow is :",tf.__version__)
print("Select dataset to load...")
from google.colab import files
uploaded=files.upload()

Done with library declaration. Current version of Tensorflow is : 2.13.0
Select dataset to load...


**Step 2** :
dataset_raw = pd.read_csv(io.BytesIO(uploaded['auto-mpg.csv'])): This line reads data from the 'auto-mpg.csv' file, which was previously uploaded, and loads it into a Pandas DataFrame. It uses the Pandas library (imported as pd) and the io module to accomplish this.

print("Done with loading data to dataframes..."): This line prints a message to indicate that the data has been successfully loaded into DataFrames. It serves as a progress or status update.

dataset_raw.head(): This code displays the first few rows of the dataset_raw DataFrame. It provides a quick preview of the loaded data to ensure it was read correctly.


In [None]:
dataset_raw = pd.read_csv(io.BytesIO(uploaded['auto-mpg.csv']))
print("Done with loading data to dataframes...")

dataset_raw.head()

KeyError: ignored
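
A KeyError at this step usually means the uploaded file was not named exactly 'auto-mpg.csv', because files.upload() returns a dictionary keyed by file name. The following is a minimal sketch of one way to avoid hard-coding the name. The `uploaded` dictionary and its CSV contents are made-up stand-ins so that the example runs outside Colab.

```python
import io
import pandas as pd

# Stand-in for the dict returned by files.upload(): {file name: raw bytes}.
# The file name and contents are invented for illustration.
uploaded = {"my-cars.csv": b"mpg,cylinders\n18.0,8\n15.0,8\n"}

# Use whichever file was uploaded instead of a fixed key.
filename = next(iter(uploaded))
dataset_raw = pd.read_csv(io.BytesIO(uploaded[filename]))
print(filename, dataset_raw.shape)
```

This only picks the first uploaded file, so it assumes a single CSV was selected in the upload dialog.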

**Step 3** :
dataset_raw.pop("car name"):
This line removes the column named "car name" from the dataset_raw DataFrame.

origin = dataset_raw.pop('origin'):
This line removes the column named "origin" from the dataset_raw DataFrame and assigns its values to the variable origin.

dataset_raw['USA'] = (origin == 1)*1.0:
This line creates a new column 'USA' in the dataset_raw DataFrame. It assigns a value of 1.0 to each row where the 'origin' column has a value of 1 (indicating cars from the USA) and 0.0 to all other rows.

dataset_raw['Europe'] = (origin == 2)*1.0:
Similar to the previous line, this one creates a new column 'Europe' in the dataset_raw DataFrame. It assigns a value of 1.0 to each row where the 'origin' column has a value of 2 (indicating cars from Europe) and 0.0 to all other rows.

dataset_raw['Japan'] = (origin == 3)*1.0:
This line creates a new column 'Japan' in the dataset_raw DataFrame. It assigns a value of 1.0 to each row where the 'origin' column has a value of 3 (indicating cars from Japan) and 0.0 to all other rows.

dataset_raw.head():
This code displays the first few rows of the modified dataset_raw DataFrame, which now includes the 'USA', 'Europe', and 'Japan' columns derived from the 'origin' values.


In [None]:
dataset_raw.pop("car name")
#dataset_raw.isna().sum()
#dataset_raw = dataset_raw.dropna()
origin = dataset_raw.pop('origin')
dataset_raw['USA'] = (origin == 1)*1.0
dataset_raw['Europe'] = (origin == 2)*1.0
dataset_raw['Japan'] = (origin == 3)*1.0
dataset_raw.head()
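
The encoding used in this step can be tried on a small frame first. The sketch below uses four made-up rows (not the real dataset) to show that multiplying the boolean comparison by 1.0 produces 1.0/0.0 indicator columns, with exactly one indicator set per row.

```python
import pandas as pd

# Toy origin codes (1 = USA, 2 = Europe, 3 = Japan); values are invented.
toy = pd.DataFrame({"mpg": [18.0, 26.0, 31.0, 15.0], "origin": [1, 2, 3, 1]})

origin = toy.pop("origin")           # removes the column and returns it
toy["USA"] = (origin == 1) * 1.0     # boolean mask times 1.0 gives 1.0 / 0.0
toy["Europe"] = (origin == 2) * 1.0
toy["Japan"] = (origin == 3) * 1.0
print(toy)
```

Encoding the codes as separate columns keeps the model from treating 1, 2, and 3 as an ordered quantity.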

**Step 4** :
sns.pairplot(dataset_raw[["mpg", "cylinders", "displacement", "weight", "acceleration"]], diag_kind="kde"):
This line uses Seaborn (imported as sns) to create a pairplot. A pairplot is a grid of scatterplots showing relationships between pairs of variables. In this case, it's plotting the variables "mpg," "cylinders," "displacement," "weight," and "acceleration" from the dataset_raw DataFrame. The `diag_kind="kde"` argument specifies that the diagonal plots should be kernel density estimate (KDE) plots instead of histograms.

train_stats = dataset_raw.describe():
This line computes summary statistics for the dataset_raw DataFrame using the describe() method. It calculates statistics like count, mean, standard deviation, minimum, and maximum for each numerical column in the DataFrame. The results are stored in the train_stats variable.


In [None]:
sns.pairplot(dataset_raw[["mpg", "cylinders", "displacement", "weight", "acceleration"]], diag_kind="kde")

train_stats = dataset_raw.describe()
#train_stats.pop("mpg")
#train_stats = train_stats.transpose()
#train_stats
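
A pairplot shows relationships visually. A correlation matrix can serve as a numeric companion to it. The sketch below is one possible way to do that with pandas, using made-up values in which mpg falls linearly as weight rises. The column names mirror the dataset, but the numbers do not come from it.

```python
import pandas as pd

# Invented values chosen so that heavier cars have lower mpg.
toy = pd.DataFrame({
    "mpg": [30.0, 25.0, 20.0, 15.0],
    "weight": [2000.0, 2500.0, 3000.0, 3500.0],
    "acceleration": [16.0, 15.0, 17.0, 14.0],
})

corr = toy.corr()   # Pearson correlation between every pair of columns
print(corr.round(2))
```

Values near -1 or 1 indicate a strong linear relationship, and values near 0 indicate a weak one.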

**Step 5** :
scaler = MinMaxScaler(feature_range=(0, 1)):
This line creates an instance of the MinMaxScaler class (imported from scikit-learn). The `feature_range=(0, 1)` argument specifies that the scaler should scale features to a range between 0 and 1.

rescaledDS = scaler.fit_transform(dataset_raw):
This line fits the scaler to the dataset_raw DataFrame and applies the MinMax transformation in one call. Every column, including the mpg target, is scaled independently to the range 0 to 1, and the result is stored in rescaledDS as a NumPy array.

rescaledDF = pd.DataFrame(rescaledDS):
This line creates a new Pandas DataFrame, rescaledDF, from the rescaledDS NumPy array so the scaled data stays in DataFrame format. Because the array carries no column names, the new DataFrame's columns are labeled with integers (0, 1, 2, ...) in the original column order.

print(rescaledDF.head()):
This code prints the first few rows of the rescaledDF DataFrame, showing the scaled values of the dataset.




In [None]:
scaler = MinMaxScaler(feature_range=(0, 1))
rescaledDS = scaler.fit_transform(dataset_raw)
rescaledDF = pd.DataFrame(rescaledDS)
print(rescaledDF.head())
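
For a feature range of 0 to 1, MinMax scaling computes (x - min) / (max - min) separately for each column. The NumPy sketch below reproduces that arithmetic on two made-up columns with very different scales. It is an illustration of the formula, not the scikit-learn implementation.

```python
import numpy as np

# Two invented feature columns on very different scales.
X = np.array([[8.0, 3500.0],
              [4.0, 2000.0],
              [6.0, 2750.0]])

col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_scaled = (X - col_min) / (col_max - col_min)   # each column now spans [0, 1]
print(X_scaled)
```

After scaling, both columns contribute on a comparable numeric range, which tends to make gradient-based training better behaved.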

**Step 6** :
predictor_dataset = rescaledDF.iloc[:, 1:10]:
This line creates a new DataFrame, predictor_dataset, by selecting all rows and the columns at positions 1 through 9 of rescaledDF (the stop position 10 is excluded). These columns are the predictor variables for the model.

target_dataset = rescaledDF.iloc[:, 0]:
This line selects all rows of the column at position 0 (mpg in the standard auto-mpg column order) as the target. The comma matters: rescaledDF.iloc[:0] without it slices rows up to index 0 and returns an empty DataFrame, which makes train_test_split raise an error because the predictors and the target no longer have the same number of rows.

X_train, X_test, y_train, y_test = train_test_split(predictor_dataset, target_dataset, random_state=42, test_size=0.3):
This line uses the train_test_split function from scikit-learn to split predictor_dataset and target_dataset into training and testing subsets. The random_state parameter is set to 42 to ensure reproducibility, and the test_size parameter specifies that 30% of the data should be used for testing. The resulting subsets are assigned to the X_train, X_test, y_train, and y_test variables.

print("Done with data separation..."):
This line prints a message to indicate that the data separation into training and testing sets has been completed.

print(y_train.tail()):
This code prints the last few rows of y_train. It is a useful step for inspecting part of the target variable for the training set.

X_train.tail():
This displays the last few rows of X_train, which contains the predictor variables for the training set. As the last expression in the cell, its result is rendered by the notebook without an explicit print().


In [None]:
predictor_dataset = rescaledDF.iloc[:, 1:10]
target_dataset = rescaledDF.iloc[:, 0]

X_train, X_test, y_train, y_test = train_test_split(predictor_dataset, target_dataset, random_state=42, test_size=0.3)
print("Done with data separation...")

print(y_train.tail())
X_train.tail()
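
Position-based slicing with iloc is easy to get subtly wrong, so it can help to check shapes on a toy frame. The sketch below uses a made-up 3x3 DataFrame to contrast a row slice (`iloc[:0]`) with a column selection (`iloc[:, 0]`).

```python
import pandas as pd

# Toy frame: column 0 plays the role of the target, columns 1-2 the predictors.
toy = pd.DataFrame([[0.1, 0.5, 0.9],
                    [0.2, 0.6, 1.0],
                    [0.3, 0.7, 0.8]])

empty = toy.iloc[:0]            # row slice up to 0: zero rows, all columns
target = toy.iloc[:, 0]         # all rows, first column: a Series
predictors = toy.iloc[:, 1:3]   # all rows, columns 1 and 2 (stop is exclusive)
print(empty.shape, target.shape, predictors.shape)
```

Only the second form yields one target value per row, which is what train_test_split needs to pair targets with predictors.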


**Step 7**:
This line of code initializes a new neural network model using TensorFlow and Keras. The tf.keras.Sequential() function creates an empty sequential model, which is a linear stack of layers. In this model, you can add layers one after the other to build a neural network architecture. The sequential model is suitable for building feedforward neural networks where data flows sequentially from one layer to the next. You can add layers to this model using the .add() method, specifying the type and configuration of each layer as you build your neural network.

In [None]:
model =  tf.keras.Sequential()

**Step 8**:

Set 1 (with activation functions):
1. layer_0 = tf.keras.layers.Dense(units=40, activation="relu", input_shape=[len(X_train.keys())]): This line defines the first dense layer with 40 units (neurons) and the ReLU (Rectified Linear Unit) activation function. Its input_shape is set to the number of feature columns in the training data, so it is the layer that receives the input.

2. layer_1 = tf.keras.layers.Dense(units=40, activation="relu"): This line defines the second hidden layer with 40 units and a ReLU activation function.

3. layer_2 = tf.keras.layers.Dense(units=10, activation="relu"): This line defines the third hidden layer with 10 units and a ReLU activation function.

4. layer_3 = tf.keras.layers.Dense(units=1): This line defines the output layer with 1 unit (typical for regression tasks with a single target) and no activation function, which means it uses a linear activation by default.

Set 2 (without activation functions):
The commented-out lines in the cell define the same layers without specifying activation functions. In that case every layer uses the default linear activation.


In [None]:
layer_0 = tf.keras.layers.Dense(units=40, activation="relu", input_shape=[len(X_train.keys())])
layer_1 = tf.keras.layers.Dense(units=40, activation="relu")
layer_2 = tf.keras.layers.Dense(units=10, activation="relu")
layer_3 = tf.keras.layers.Dense(units=1)

#layer_0 = tf.keras.layers.Dense(units=40, input_shape=[len(X_train.keys())])
#layer_1 = tf.keras.layers.Dense(units=40)
#layer_2 = tf.keras.layers.Dense(units=10)
#layer_3 = tf.keras.layers.Dense(units=1)
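
The choice between the two sets matters because stacked layers with only linear activations are mathematically equivalent to a single linear layer. The NumPy sketch below illustrates this with made-up random weights. The shapes are arbitrary and much smaller than the layers in this template.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                          # 5 samples, 3 features
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)

# Two linear layers applied one after the other...
two_layers = (x @ W1 + b1) @ W2 + b2
# ...equal one linear layer with combined weights and bias.
one_layer = x @ (W1 @ W2) + (b1 @ W2 + b2)

# With ReLU in between, the equivalence no longer holds in general.
with_relu = np.maximum(x @ W1 + b1, 0.0) @ W2 + b2
print(np.allclose(two_layers, one_layer), np.allclose(with_relu, one_layer))
```

So a network built from Set 2 can only represent a plain multiple linear regression, while Set 1 can also fit nonlinear relationships.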

**Step 9**:
#model = tf.keras.Sequential([layer_0, layer_1, layer_2]):
This commented-out line shows an alternative way to build the model: passing a list of layers to the Sequential constructor adds them all in a single step. As written, the list omits layer_3 (the output layer), so it would need to be included to match the model built below.

model.add(layer_0):
This line adds layer_0, the first dense layer, to the model.

model.add(layer_1):
This line adds layer_1, the second hidden layer, after layer_0.

model.add(layer_2):
This line adds layer_2, the third hidden layer, after layer_1.

model.add(layer_3):
This line adds layer_3, the output layer, which completes the network.

model.summary():
This code prints a summary of the neural network model, including information about the layers, the number of parameters in each layer, and the total number of parameters in the model. It's a useful way to inspect the architecture and size of your neural network.


In [None]:
#model = tf.keras.Sequential([layer_0,layer_1,layer_2])
model.add(layer_0)
model.add(layer_1)
model.add(layer_2)
model.add(layer_3)
model.summary()
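
The parameter counts that model.summary() reports can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The sketch below does this arithmetic in plain Python, assuming 9 input features (matching the `iloc[:, 1:10]` slice used in Step 6) and the 40-40-10-1 layer sizes from Step 8.

```python
# Assumption: 9 input features, as selected by iloc[:, 1:10] in Step 6.
n_features = 9
units_per_layer = [40, 40, 10, 1]

total = 0
n_inputs = n_features
for units in units_per_layer:
    params = n_inputs * units + units   # weights plus one bias per unit
    print(f"Dense({units}): {params} parameters")
    total += params
    n_inputs = units                    # this layer's outputs feed the next
print("Total:", total)
```

If the dataset has a different number of predictor columns, only the first layer's count changes.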

**Step 10**:
The provided code compiles a neural network model with the following configuration:

- Loss function: Mean squared error (MSE), commonly used for regression tasks.
- Optimizer: Adam optimizer, a popular choice for training neural networks.
- Evaluation metrics: Mean absolute error (MAE) and mean squared error (MSE) for assessing model performance.

After compiling the model, it prints a message indicating that the compilation is complete. Compiling a model is a crucial step in setting up its training process.

In [None]:
model.compile(loss='mean_squared_error',
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['mean_absolute_error', 'mean_squared_error'])
print("Done with compile")
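
The two quantities named in compile() are simple averages over the prediction errors. The NumPy sketch below computes them by hand on three made-up target and prediction pairs, to show what the numbers reported during training mean.

```python
import numpy as np

y_true = np.array([0.2, 0.5, 0.9])   # invented scaled targets
y_hat = np.array([0.1, 0.5, 0.7])    # invented predictions

errors = y_true - y_hat
mse = np.mean(errors ** 2)      # squares the errors, so large misses dominate
mae = np.mean(np.abs(errors))   # average miss, in the target's own units
print(mse, mae)
```

Because the targets in this template are MinMax scaled, both values are expressed in scaled units rather than in miles per gallon.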

**Step 11**:
The provided code trains a neural network model with the following configuration:

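- Training data: It uses `X_train` as the input features and `y_train` as the target values.
- Number of epochs: The model is trained for 100 epochs, where one epoch is one full pass over the training data.
- Verbosity: With `verbose=1`, it displays progress information for each epoch.
- Return value: `fit()` returns a History object, stored here in `trained_model`, whose `history` dictionary records the loss and metric values for each epoch.
- After training, it prints a message indicating the completion of model training and displays a summary of the neural network model, including its architecture and the number of parameters in each layer.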

In [None]:
trained_model = model.fit(X_train, y_train, epochs=100, verbose=1)
print("Done with model training")
model.summary()
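
To make the idea of an epoch concrete, the sketch below trains a plain linear model with full-batch gradient descent in NumPy and records the loss once per epoch. It is a simplified stand-in, not what Keras does internally (Keras uses mini-batches and the Adam optimizer here). The data, learning rate, and variable names are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(size=(50, 2))              # invented scaled features
y = X @ np.array([0.6, -0.3]) + 0.1        # invented linear target

w, b, lr = np.zeros(2), 0.0, 0.3
history = []                               # plays the role of history['loss']
for epoch in range(100):
    err = X @ w + b - y
    history.append(np.mean(err ** 2))      # MSE before this epoch's update
    w -= lr * 2 * X.T @ err / len(y)       # gradient of MSE w.r.t. weights
    b -= lr * 2 * err.mean()               # gradient of MSE w.r.t. bias
print(history[0], history[-1])
```

A loss curve that is still falling at the last epoch suggests more epochs may help, while a flat curve suggests further epochs add little.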

**Step 12**:
The provided code:

- Uses the trained neural network model to make predictions (y_pred) on the test dataset (X_test).
- Prints the sizes (number of elements) of the predicted values (y_pred) and the actual target values (y_test).
- Prints the actual target values (y_test) under the header "Actual Values".
- Prints the predicted values (y_pred) under the header "Predicted Values". The predicted values are reshaped to be displayed in a single row.
- Calculates the R-squared (R2) score to assess the goodness of fit of the model's predictions compared to the actual target values in the test dataset.
- Prints the overall R2 score multiplied by 100, so that it reads as a percentage, to evaluate the model's performance in predicting the target values.

In [None]:
y_pred = model.predict(X_test)
print(y_pred.size)
print(y_test.size)
print('Actual Values')
print(y_test)

print('Predicted Values')
print(y_pred.reshape(1,-1))

score = r2_score(y_test, y_pred)
print("Overall score: {}".format(score*100))
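
The R2 score compares the model's squared errors with those of a baseline that always predicts the mean of the targets: R2 = 1 - SS_res / SS_tot. The sketch below implements that formula directly in NumPy on made-up values. The helper name `r2` is hypothetical and is not the scikit-learn function.

```python
import numpy as np

def r2(y_true, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_hat = np.asarray(y_true, float), np.asarray(y_hat, float)
    ss_res = np.sum((y_true - y_hat) ** 2)           # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return 1.0 - ss_res / ss_tot

y_true = [0.2, 0.4, 0.6, 0.8]                 # invented targets
print(r2(y_true, y_true))                     # perfect predictions
print(r2(y_true, [0.5, 0.5, 0.5, 0.5]))       # always predicting the mean
```

A score of 1 means a perfect fit, a score near 0 means the model does no better than predicting the mean, and the score becomes negative when the model does worse than that baseline.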

Other things we can do:
1. Analyze training statistics

In [None]:
plt.xlabel('Epoch Number')
plt.ylabel("Loss Magnitude")
plt.plot(trained_model.history['loss'])
