<a href="https://colab.research.google.com/github/mhuzaifadev/IBM-AI-Engineering/blob/master/RegressionwithKeras.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Linear Regression using Keras**

Linear regression is a common statistical data analysis technique. It is used to determine the extent to which there is a linear relationship between a dependent variable and one or more independent variables.

Here we build a regression model with Keras and train it on the concrete_data.csv dataset to predict the Strength column.
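To make the definition concrete, here is a minimal sketch of ordinary linear regression fitted by least squares with NumPy. The data is hypothetical and synthetic: the true weights 3 and -2 and the intercept 5 are made up for illustration. The Keras model built later in this notebook has hidden ReLU layers, so unlike this sketch it can also capture non-linear relationships.

```python
import numpy as np

# Hypothetical synthetic data: y = 3*x1 - 2*x2 + 5 plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(scale=0.1, size=200)

# Append a column of ones so the intercept is estimated along with the weights
X_design = np.column_stack([X, np.ones(len(X))])

# Ordinary least squares: minimise ||X_design @ coef - y||^2
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
weights, intercept = coef[:2], coef[2]
print(weights, intercept)  # estimates close to [3, -2] and 5
```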

By [Muhammad Huzaifa Shahbaz](https://www.linkedin.com/in/mhuzaifadev)

## **Importing Libraries**

We import Keras, NumPy as `np`, pandas as `pd`, `Sequential` from `keras.models`, and `Dense` from `keras.layers`.

In [0]:
import keras
from keras.models import Sequential
from keras.layers import Dense
import pandas as pd
import numpy as np

## **Loading the Data into a DataFrame**

Read the .csv data into a pandas DataFrame.

In [5]:
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


## **Cleaning Data**

Check the shape of the data:

In [6]:
concrete_data.shape

(1030, 9)

So there are 1030 samples, each with 8 predictor columns plus the Strength target, to train our model on. Because there are relatively few samples, we have to be careful not to overfit the training data.

Let's check the dataset for any missing values.

In [7]:
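concrete_data.describe()        # summary statistics (not displayed: a cell only shows its last expression)
concrete_data.isnull().sum()    # number of missing values in each column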

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64
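Every count is 0, so the concrete data needs no gap filling. For comparison, a minimal sketch on a hypothetical toy DataFrame shows what the same check reports when values are missing, and one simple way to handle them (dropping incomplete rows). The toy values are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two missing entries
df = pd.DataFrame({
    "Cement": [540.0, np.nan, 332.5],
    "Water": [162.0, 162.0, np.nan],
    "Strength": [79.99, 61.89, 40.27],
})

# isnull() marks missing cells; sum() counts them per column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# One common option when incomplete rows are rare: drop them
df_clean = df.dropna()
print(df_clean.shape)  # (1, 3): only the first row is complete
```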

In [0]:
concrete_data_columns = concrete_data.columns

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column
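The boolean-mask selection above works, but `DataFrame.drop(columns=...)` states the intent more directly. A minimal sketch, using a hypothetical two-row stand-in for `concrete_data`, confirms the two forms give the same predictors:

```python
import pandas as pd

# Hypothetical two-row stand-in for concrete_data
df = pd.DataFrame({
    "Cement": [540.0, 540.0],
    "Age": [28, 28],
    "Strength": [79.99, 61.89],
})

# Boolean-mask selection, as in the cell above
predictors_mask = df[df.columns[df.columns != "Strength"]]

# Equivalent, more direct form
predictors_drop = df.drop(columns="Strength")
target = df["Strength"]

print(predictors_drop.equals(predictors_mask))  # True
```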

In [9]:
predictors.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [10]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

Normalize the predictors by subtracting each column's mean and dividing by its standard deviation (z-score normalization):

In [11]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
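A quick way to sanity-check z-score normalization is to confirm that every normalized column ends up with mean about 0 and standard deviation about 1. A minimal sketch on a hypothetical three-row frame, using the same formula as the cell above; note that pandas' `std()` defaults to `ddof=1` (the sample standard deviation), so the check uses the same convention:

```python
import pandas as pd

# Hypothetical small predictor frame
df = pd.DataFrame({"Cement": [540.0, 332.5, 198.6], "Age": [28, 270, 365]})

# Same z-score formula as above: (x - mean) / std, column by column
df_norm = (df - df.mean()) / df.std()

# Each normalized column should have mean ~0 and sample std (ddof=1) ~1
print(df_norm.mean().round(12))
print(df_norm.std().round(12))
```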


In [0]:
n_cols = predictors_norm.shape[1] # number of predictors

## **Building a Neural Network**

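Define a function that builds and compiles the regression model, so that we can conveniently call it whenever we need a fresh model. The network has two hidden layers of 50 ReLU units each and a single linear output unit, and it is compiled with the Adam optimizer and a mean squared error loss.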

In [0]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(50, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
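To show what the three `Dense` layers compute, here is a minimal NumPy sketch of the same architecture's forward pass. The random weights are hypothetical stand-ins for the ones Keras learns during `fit`. The sketch also counts the trainable parameters (weights plus biases), which should match what `model.summary()` reports for 8 inputs.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, params):
    """Forward pass mirroring regression_model() for x of shape (batch, n_cols)."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(x @ W1 + b1)   # Dense(50, activation='relu')
    h2 = relu(h1 @ W2 + b2)  # Dense(50, activation='relu')
    return h2 @ W3 + b3      # Dense(1): linear output for regression

n_cols = 8  # 8 predictors in the concrete data
rng = np.random.default_rng(0)
shapes = [(n_cols, 50), (50, 50), (50, 1)]
# Hypothetical random weights and zero biases; Keras would learn these
params = [(rng.normal(size=s) * 0.1, np.zeros(s[1])) for s in shapes]

y_hat = forward(rng.normal(size=(4, n_cols)), params)
print(y_hat.shape)  # (4, 1)

# Trainable parameters: (8*50 + 50) + (50*50 + 50) + (50*1 + 1)
n_params = sum(W.size + b.size for W, b in params)
print(n_params)  # 3051
```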

In [0]:
# build the model
model = regression_model()

## **Train and test the Neural Network**

We train the model with the `fit` method and evaluate it on held-out data at the same time: `validation_split=0.3` leaves out 30% of the data for validation, and we train for 250 epochs.
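Keras documents that `validation_split` takes the validation samples from the *last* part of the data, before any shuffling, so the held-out rows are not a random sample. The sketch below approximates that behavior with plain slicing; the helper name `split_like_validation_split` is hypothetical and the exact rounding Keras uses internally is an assumption.

```python
import numpy as np

def split_like_validation_split(x, y, validation_split):
    """Hypothetical helper approximating Keras' validation_split: hold out the
    LAST fraction of samples, taken before shuffling (rounding is an assumption)."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

x = np.arange(1030, dtype=float).reshape(-1, 1)  # stand-in for the 1030 rows
y = np.arange(1030, dtype=float)
(x_tr, y_tr), (x_val, y_val) = split_like_validation_split(x, y, 0.3)
print(len(x_tr), len(x_val))
```

Because the split is positional, shuffling the DataFrame before calling `fit` is one way to make the validation rows representative of the whole dataset.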

In [28]:
# fit the model
model.fit(predictors_norm, target, validation_split=0.3, epochs=250, verbose=2)

Train on 721 samples, validate on 309 samples
Epoch 1/250
 - 0s - loss: 9.6792 - val_loss: 90.9219
Epoch 2/250
 - 0s - loss: 9.0791 - val_loss: 93.2151
Epoch 3/250
 - 0s - loss: 9.1407 - val_loss: 96.4908
Epoch 4/250
 - 0s - loss: 9.4566 - val_loss: 85.7645
Epoch 5/250
 - 0s - loss: 9.0421 - val_loss: 87.3612
Epoch 6/250
 - 0s - loss: 8.5146 - val_loss: 88.7310
Epoch 7/250
 - 0s - loss: 8.7531 - val_loss: 87.7249
Epoch 8/250
 - 0s - loss: 8.8220 - val_loss: 85.0742
Epoch 9/250
 - 0s - loss: 8.7036 - val_loss: 92.9390
Epoch 10/250
 - 0s - loss: 8.7434 - val_loss: 91.8916
Epoch 11/250
 - 0s - loss: 8.6369 - val_loss: 90.3870
Epoch 12/250
 - 0s - loss: 8.9282 - val_loss: 77.7300
Epoch 13/250
 - 0s - loss: 9.4286 - val_loss: 93.4739
Epoch 14/250
 - 0s - loss: 9.1417 - val_loss: 84.8057
Epoch 15/250
 - 0s - loss: 9.1122 - val_loss: 88.3719
Epoch 16/250
 - 0s - loss: 8.4834 - val_loss: 93.2775
Epoch 17/250
 - 0s - loss: 9.2937 - val_loss: 90.6500
Epoch 18/250
 - 0s - loss: 8.4113 - val_loss:

<keras.callbacks.callbacks.History at 0x7fa9accc4ac8>
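The `loss` and `val_loss` values in the log are mean squared errors, because the model was compiled with `loss='mean_squared_error'`. Taking the square root (RMSE) puts the error back in the units of Strength, which is easier to read: a `val_loss` around 90 corresponds to an RMSE of roughly 9.5, versus roughly 3 for a training loss around 9, a gap consistent with the overfitting risk noted earlier. A minimal sketch of both metrics, with hypothetical predictions:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Mean of squared residuals, the quantity reported as loss / val_loss."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# First three Strength values from the data, with hypothetical predictions
y_true = [79.99, 61.89, 40.27]
y_pred = [75.99, 64.89, 40.27]
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # same units as the target
print(mse, rmse)
```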