**This is my first try at deep neural networks using TensorFlow Keras. Mainly as a learning exercise, I will test a linear regression and a DNN regression on a dataset that describes the components of different concrete mixes, with the goal of predicting the strength of the concrete.**

In [None]:

import numpy as np 
import pandas as pd 


import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing

print(tf.__version__)

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))



Let's have a look at the dataset and check whether it has any missing values:

In [None]:
db = pd.read_csv('/kaggle/input/yeh-concret-data/Concrete_Data_Yeh.csv')
db.head()

In [None]:
db.isnull().sum()

The target variable is csMPa. All variables are numeric and, fortunately, there are no missing values.
Below we can see how the variables differ in mean and standard deviation:

In [None]:
db.describe() 

Let's split the data into train and test sets by random sampling, and then separate the features from the target variable in both sets:

In [None]:
train_db = db.sample(frac=0.8, random_state= 10)
test_db = db.drop(train_db.index)
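
As a quick sanity check on the sample-then-drop pattern, here is a minimal sketch on a made-up toy frame (the `toy` data and its column names are hypothetical, not the concrete dataset). It assumes the index is unique, which is what makes `drop` remove exactly the sampled rows:

```python
import pandas as pd

# Toy frame standing in for the real data (values are made up).
toy = pd.DataFrame({"x": range(10), "y": range(10, 20)})

# Same pattern as above: sample 80% of the rows, then drop them to get the rest.
toy_train = toy.sample(frac=0.8, random_state=10)
toy_test = toy.drop(toy_train.index)

# The two parts should share no rows and together cover the whole frame.
overlap = toy_train.index.intersection(toy_test.index)
print(len(toy_train), len(toy_test), len(overlap))
```

If the index had duplicate labels, `drop` would remove every row carrying a sampled label, so the test set could end up smaller than expected.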


In [None]:
train_features = train_db.copy()
test_features = test_db.copy()

train_labels = train_features.pop('csMPa')
test_labels = test_features.pop('csMPa')

print(train_labels.head(), test_labels.head(), train_features.head(), test_features.head())

As can be seen below, the features differ widely in mean and scale, so I'll normalize them with Keras' Normalization preprocessing layer, adapted on the training data:

In [None]:
train_features.describe().transpose()[['mean', 'std']]

In [None]:
normalizer = preprocessing.Normalization()
normalizer.adapt(np.array(train_features))
print(normalizer.mean.numpy())
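
To make the normalization step less of a black box, here is a rough NumPy sketch of what I understand the layer to do: store a per-feature mean and variance during `adapt`, then apply `(x - mean) / sqrt(variance)`. The `toy_features` array is made up, and the use of the population variance (`ddof=0`) is my assumption about the layer's behavior:

```python
import numpy as np

# Made-up feature matrix: 3 examples, 2 features on very different scales.
toy_features = np.array([[100.0, 1.0],
                         [200.0, 2.0],
                         [300.0, 6.0]])

# Per-feature statistics, analogous to what adapt() computes.
mean = toy_features.mean(axis=0)
var = toy_features.var(axis=0)  # population variance (ddof=0), assumed

# Standardize each column to zero mean and unit variance.
normalized = (toy_features - mean) / np.sqrt(var)
print(normalized)
```

Note that pandas' `describe()` reports the sample standard deviation (`ddof=1`), so its `std` column will differ slightly from the square root of the variance computed here.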

The first two rows before and after normalization:

In [None]:
print(train_features[:2])
print(normalizer(np.array(train_features[:2])).numpy())

# Linear Regression

I will use a Sequential model with two layers, the normalizer and a Dense layer:

In [None]:
linear_model = tf.keras.Sequential([
    normalizer,
    layers.Dense(units=1)
]) # it produces units=1 outputs for each example
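
A `Dense` layer with one unit and no activation is just a linear map, y = x·w + b, which is why this model amounts to a linear regression. A minimal NumPy sketch of that computation, with a hypothetical feature count of 3 and random weights standing in for the layer's untrained kernel:

```python
import numpy as np

rng = np.random.default_rng(0)

n_examples, n_features = 5, 3  # hypothetical sizes for illustration

x = rng.normal(size=(n_examples, n_features))  # already-normalized inputs
w = rng.normal(size=(n_features, 1))           # kernel: one weight per feature
b = np.zeros(1)                                # bias, initialized to zero

# One output per example, the same shape a Dense(units=1) layer returns.
y_hat = x @ w + b
print(y_hat.shape)
```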

In [None]:
linear_model.predict(train_features) 

Using Adam optimization algorithm to update network weights:

In [None]:
linear_model.compile(
    optimizer=tf.optimizers.Adam(learning_rate=0.1),
    loss='mean_absolute_error')
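
For reference, the mean absolute error is simply the average of |y - ŷ| over the examples. A small sketch with made-up strength values (not taken from the dataset):

```python
import numpy as np

# Made-up true and predicted values, in the same units as the target.
y_true = np.array([30.0, 40.0, 50.0])
y_pred = np.array([28.0, 45.0, 50.0])

# Absolute errors are 2, 5 and 0, so the mean is 7/3.
mae = np.mean(np.abs(y_true - y_pred))
print(mae)
```

Because the loss is in the units of the target, it can be read directly as "the predictions are off by this much on average".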

In [None]:
history = linear_model.fit(
    train_features, train_labels, 
    epochs=100,
    verbose=0,
    # Calculate validation results on 20% of the training data
    validation_split = 0.2)
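
As far as I understand from the Keras documentation, `validation_split` does not sample at random: it holds out the last fraction of the rows passed to `fit`, before any shuffling. A sketch of that slicing on a made-up array (the variable names are hypothetical):

```python
import numpy as np

rows = np.arange(10)   # stand-in for 10 training examples, in order
val_fraction = 0.2

# Hold out the last 20% of the rows; the rest is used for fitting.
split_at = len(rows) - int(len(rows) * val_fraction)
fit_rows, val_rows = rows[:split_at], rows[split_at:]
print(fit_rows, val_rows)
```

Here this is harmless because `train_db` was already drawn with `sample`, so its row order is random; with sorted data, the last 20% could be unrepresentative.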

In [None]:
test_results = {}
test_results['linear_model'] = linear_model.evaluate(
    test_features, test_labels, verbose=0) 

In [None]:
test_results

For comparison, the mean of the target variable is about 35:

In [None]:
db['csMPa'].describe()
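
One way to put the errors in context is to express each MAE as a fraction of the target mean. The sketch below uses the MAE values reported in this notebook and the rounded mean of 35; the dictionary and its keys are just illustrative names:

```python
# MAE values as reported in this notebook; the mean is rounded to 35.
target_mean = 35.0
reported_mae = {
    "linear_model": 8.33,
    "dnn_model_100_units": 4.09,
}

# Relative error: average miss as a share of the typical target value.
relative_error = {name: mae / target_mean for name, mae in reported_mae.items()}

for name, rel in relative_error.items():
    print(f"{name}: {rel:.1%} of the mean")
```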

# DNN Regression

In [None]:
model = keras.Sequential([
    normalizer,
    layers.Dense(100, activation='relu'),
    layers.Dense(100, activation='relu'),
    layers.Dense(1)
])

model.compile(loss='mean_absolute_error',
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.001))

In [None]:
model.summary()
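
The trainable parameter counts in the summary can be checked by hand: a Dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below assumes 8 input features, which is my assumption about this dataset and is not stated above; `dense_params` is a hypothetical helper:

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

n_features = 8  # assumed number of input columns after removing the target

# Layer sizes as in the model above: 100, 100, 1.
layer_sizes = [n_features, 100, 100, 1]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts, sum(counts))
```

The normalization layer's stored statistics are not included here, since they are not updated by training.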

In [None]:
history = model.fit(
    train_features, train_labels,
    validation_split=0.2,
    verbose=0, epochs=100)

In [None]:
test_results['dnn_model'] = model.evaluate(test_features, test_labels, verbose=0)

The mean absolute error on the test set is 8.33 for the linear regression; for the DNN regression it is 4.35 with 64 units per hidden layer and 4.09 with 100 units:

In [None]:
test_results 