<a href="https://colab.research.google.com/github/shreydan/learning-tensorflow/blob/main/core_learning_algo_linregr.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **TensorFlow Core Learning Algorithms - Linear Regression**

---


### **Linear Regression**

1. One of the most basic machine learning algorithms.
2. In an n-dimensional feature space, a line of best fit is drawn that follows the data points as closely as possible.
3. Works well when the data **correlates linearly**.
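
To make the "line of best fit" idea concrete, here is a minimal sketch that fits a line to a few made-up points using ordinary least squares in plain Python. The points and the helper name `fit_line` are assumptions for illustration only and are not part of the Titanic example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# made-up points that lie roughly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

Once the slope and intercept are known, a prediction for a new `x` is `slope * x + intercept`.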

In [47]:
!pip install -q scikit-learn

In [48]:
from __future__ import absolute_import, division, print_function, unicode_literals

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import clear_output
from six.moves import urllib

import tensorflow.compat.v2.feature_column as fc

import tensorflow as tf

## Dataset

### Titanic Model

Columns: survived, sex, age, n_siblings_spouses, parch, fare, class, deck, embark_town, alone

In [49]:
# loading dataset:

# training data
dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')

# testing data
dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')

y_train = dftrain.pop('survived')
y_eval = dfeval.pop('survived')


In [None]:
# print one specific data row

print(dftrain.loc[0], y_train.loc[0])

# get specific column

print(dftrain['age'])

# statistical summary of the dataframe

print(dftrain.describe())

# shape of the data frames - (entries, columns)

print(dftrain.shape, dfeval.shape)

### SOME GRAPHS

In [None]:
# age histogram - (age, count)
dftrain.age.hist(bins=20)


In [None]:
# sex values
dftrain['sex'].value_counts().plot(kind='barh')

In [None]:
# class values
dftrain['class'].value_counts().plot(kind='bar')

In [None]:
# survival rate by sex

pd.concat([dftrain, y_train], axis=1).groupby('sex').survived.mean().plot(kind='bar').set_ylabel('% survive')



---



### **Training and Testing Data**

The training set is typically much larger than the testing set (training data size >> testing data size).

In [61]:
CATEGORICAL_COLUMNS = ['sex',
                       'n_siblings_spouses',
                       'parch',
                       'class',
                       'deck',
                       'embark_town',
                       'alone']
NUMERIC_COLUMNS = ['age','fare']

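**categorical data:** data that takes one of a limited set of discrete values, usually non-numeric (for example `sex` or `class`).

**numeric data:** data which has continuous numeric values (for example `age` or `fare`).

**feature data:** the data that is actually fed to the model for training.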
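
A linear model can only do arithmetic on numbers, so categorical values have to be turned into numbers first. The sketch below shows one common way to think about it: build a vocabulary of the unique values, then encode each value as a one-hot vector. This is a conceptual illustration in plain Python with made-up values; the helper names `build_vocabulary` and `one_hot` are hypothetical, and TensorFlow's feature columns handle this step internally and may do it differently.

```python
def build_vocabulary(values):
    """Collect the unique values of a column, in order of first appearance."""
    vocabulary = []
    for value in values:
        if value not in vocabulary:
            vocabulary.append(value)
    return vocabulary


def one_hot(value, vocabulary):
    """Encode a value as a vector with a 1 at its vocabulary position."""
    return [1 if value == entry else 0 for entry in vocabulary]


# a few made-up values for the 'class' column
class_column = ['Third', 'First', 'Third', 'Second']

vocabulary = build_vocabulary(class_column)
encoded = [one_hot(value, vocabulary) for value in class_column]

print(vocabulary)
print(encoded)
```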



In [66]:
feature_columns = []

for feature_name in CATEGORICAL_COLUMNS:
    
    # get a list of unique values from feature col
    vocabulary = dftrain[feature_name].unique()

    feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name,vocabulary))



for feature_name in NUMERIC_COLUMNS:
    feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))


### TRAINING PROCESS

-> Data is loaded in batches because a large dataset may not fit in RAM all at once. A batch size of 32 is a common choice.


-> **Epochs:** the number of times the model sees the entire training dataset

-> **Overfitting:** can happen when the model sees the same data too many times and memorizes it instead of learning patterns that generalize to new data


To handle all of the above, `input_function()` takes our data, converts it into a `tf.data.Dataset`, shuffles it if requested, splits it into batches, and repeats it for the given number of `epochs`.

`input_function()` is wrapped inside `make_input_fn()`, which takes all the arguments needed for these steps and returns the inner function for the estimator to call.
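
To see how batches and epochs interact, here is a rough sketch in plain Python that mimics the idea with a generator. It is not how `tf.data` is implemented; the function `iterate_batches`, its `seed` parameter, and the 10-row stand-in dataset are assumptions made for illustration.

```python
import random


def iterate_batches(rows, batch_size, num_epochs, shuffle, seed=0):
    """Yield lists of rows, batch by batch, for the given number of epochs."""
    rng = random.Random(seed)
    for _ in range(num_epochs):
        epoch_rows = list(rows)
        if shuffle:
            rng.shuffle(epoch_rows)  # new order in every epoch
        for start in range(0, len(epoch_rows), batch_size):
            yield epoch_rows[start:start + batch_size]


rows = list(range(10))  # stand-in for 10 passengers
batches = list(iterate_batches(rows, batch_size=4, num_epochs=2, shuffle=True))

for batch in batches:
    print(batch)
```

With 10 rows and a batch size of 4, each epoch produces three batches (the last one holds the 2 leftover rows), so two epochs yield six batches in total.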



In [70]:
def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
  def input_function():
    # create a tf.data.Dataset of (features dict, label) pairs
    ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))
    if shuffle:
      ds = ds.shuffle(1000)  # shuffle with a buffer of 1000 elements
    # split into batches, then repeat the dataset num_epochs times
    ds = ds.batch(batch_size).repeat(num_epochs)
    return ds  # return the batched dataset
  return input_function  # return the function object for the estimator to call


# create an input function for training data
train_input_fn = make_input_fn(dftrain, y_train)

# create an input function for testing data
eval_input_fn = make_input_fn(dfeval, y_eval, num_epochs=1, shuffle=False)


### **Estimator:** an implementation of the core linear classifier algorithm.

In [None]:
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)


### **Training the estimator model**

In [74]:
# pass the train_input_fn function to train it
linear_est.train(train_input_fn)
clear_output()

### **Testing the model**

In [75]:
result = linear_est.evaluate(eval_input_fn)
clear_output()

### **Getting the result of the testing:**

In [76]:
print(result['accuracy'])

0.7651515


In [79]:
# complete result dict:

for col in result:
    print(f'{col} : {result[col]}')

accuracy : 0.7651515007019043
accuracy_baseline : 0.625
auc : 0.8308846354484558
auc_precision_recall : 0.7920464873313904
average_loss : 0.4723973274230957
label/mean : 0.375
loss : 0.46124327182769775
precision : 0.698924720287323
prediction/mean : 0.37945160269737244
recall : 0.6565656661987305
global_step : 400
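
Several of these metrics are simple ratios of correct and incorrect predictions. The sketch below computes accuracy, precision, and recall from made-up labels and predictions (not from the Titanic run), using a hypothetical helper `classification_metrics`. It also reproduces the baseline value, assuming `accuracy_baseline` is the accuracy obtained by always predicting the majority class, which is consistent with `label/mean : 0.375` above.

```python
def classification_metrics(labels, predictions):
    """Return accuracy, precision and recall for binary labels (0 or 1)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# made-up labels and predictions, not taken from the Titanic run
labels = [1, 0, 1, 1, 0, 0, 0, 1]
predictions = [1, 0, 0, 1, 0, 1, 0, 1]

accuracy, precision, recall = classification_metrics(labels, predictions)
print(accuracy, precision, recall)

# baseline: always predict the majority class
label_mean = 0.375  # 'label/mean' from the result above
baseline = max(label_mean, 1 - label_mean)
print(baseline)
```

A model is only useful if its accuracy is clearly above this baseline.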


### **Predict**

We want to predict the probability of survival for each passenger.



In [90]:
predictions = list(linear_est.predict(eval_input_fn))
clear_output()
for i in range(1):  # print only the first passenger's prediction; use len(predictions) to print all
    print(dfeval.loc[i])
    print(f'did they survive: {y_eval.loc[i]}') # what actually happened
    print(f"[not_surviving, surviving] : {predictions[i]['probabilities']}")


sex                          male
age                            35
n_siblings_spouses              0
parch                           0
fare                         8.05
class                       Third
deck                      unknown
embark_town           Southampton
alone                           y
Name: 0, dtype: object
did they survive: 0
[not_surviving, surviving] : [0.9245796 0.0754204]
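
The two probabilities can be turned into a yes/no prediction by picking the class with the higher value. The sketch below does that for the output above, and also shows the relationship between a probability and the raw linear score (logit), assuming the classifier maps its score to a survival probability with a sigmoid, as logistic regression does. The helper `sigmoid` is written here for illustration and is not taken from the notebook.

```python
import math


def sigmoid(logit):
    """Map a raw model score (logit) to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-logit))


# probabilities printed above for the first passenger
probabilities = [0.9245796, 0.0754204]

# the predicted class is the index with the highest probability
predicted_class = probabilities.index(max(probabilities))
print(predicted_class)  # index 0 is "not_surviving", index 1 is "surviving"

# the logit that would produce this survival probability under a sigmoid
p_survive = probabilities[1]
logit = math.log(p_survive / (1.0 - p_survive))
print(logit, sigmoid(logit))
```

A negative logit corresponds to a survival probability below 0.5, which matches what actually happened to this passenger.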
