# Load CSV Data

This tutorial provides examples of how to use CSV data with TensorFlow.

There are two main parts to this:
1. Loading the data off disk
2. Pre-processing it into a form suitable for training

This tutorial focuses on loading the data and gives some examples of preprocessing.

## Setup

In [1]:
import pandas as pd
import numpy as np

# Makes numpy values easier to read
np.set_printoptions(precision=3, suppress=True)

import tensorflow as tf
from tensorflow.keras import layers

## In Memory Data

For any small CSV dataset, the simplest way to train a TensorFlow model on it is to load it into memory as a pandas DataFrame or a NumPy array.

A relatively simple example is the abalone dataset:
- The dataset is small.
- All the input features are limited-range floating point values.

Next, download the data into a pandas DataFrame:

In [2]:
abalone_train = pd.read_csv(
    "https://storage.googleapis.com/download.tensorflow.org/data/abalone_train.csv",
    names=["Length", "Diameter", "Height", "Whole weight", "Shucked weight",
           "Viscera weight", "Shell weight", "Age"])

abalone_train.head()

,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Age
0,0.435,0.335,0.11,0.334,0.1355,0.0775,0.0965,7
1,0.585,0.45,0.125,0.874,0.3545,0.2075,0.225,6
2,0.655,0.51,0.16,1.092,0.396,0.2825,0.37,14
3,0.545,0.425,0.125,0.768,0.294,0.1495,0.26,16
4,0.545,0.42,0.13,0.879,0.374,0.1695,0.23,13
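To see what the `names` argument does without downloading anything, the sketch below reads a small in-memory CSV. The three rows are copied from the output above; the assumption (not stated in the original) is that the file has no header row, which is why column names are supplied by hand.

```python
import io

import pandas as pd

# Hypothetical three-row sample laid out like the abalone file,
# assuming the file itself contains no header row.
sample_csv = io.StringIO(
    "0.435,0.335,0.11,0.334,0.1355,0.0775,0.0965,7\n"
    "0.585,0.45,0.125,0.874,0.3545,0.2075,0.225,6\n"
    "0.655,0.51,0.16,1.092,0.396,0.2825,0.37,14\n"
)

column_names = ["Length", "Diameter", "Height", "Whole weight",
                "Shucked weight", "Viscera weight", "Shell weight", "Age"]

# Passing names= tells pandas to treat every line as data, not as a header.
sample_df = pd.read_csv(sample_csv, names=column_names)

print(sample_df.shape)
print(sample_df.dtypes)
```

If `names` were omitted here, pandas would consume the first data row as the header, so the frame would silently lose one example.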


The typical task for this dataset is to predict the age from the other measurements, so separate the features and labels for training:

In [4]:
abalone_features = abalone_train.copy()
abalone_labels = abalone_features.pop('Age')
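The copy-then-pop pattern can be checked on a tiny frame. This is a minimal sketch with a hypothetical two-row DataFrame (`toy_train` and its values are illustrative only); it shows that `pop` removes the column from the copy in place while the original frame stays intact.

```python
import pandas as pd

# Hypothetical miniature of the training frame.
toy_train = pd.DataFrame({
    "Length": [0.435, 0.585],
    "Height": [0.11, 0.125],
    "Age": [7, 6],
})

# Work on a copy so the original frame keeps its label column.
toy_features = toy_train.copy()

# pop() returns the column as a Series and drops it from toy_features.
toy_labels = toy_features.pop("Age")

print(list(toy_features.columns))
print(toy_labels.tolist())
print("Age" in toy_train.columns)
```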

For this dataset you will treat all features identically.

Pack the features into a single NumPy array:

In [5]:
abalone_features = np.array(abalone_features)

In [6]:
abalone_features

array([[0.435, 0.335, 0.11 , ..., 0.136, 0.077, 0.097],
       [0.585, 0.45 , 0.125, ..., 0.354, 0.207, 0.225],
       [0.655, 0.51 , 0.16 , ..., 0.396, 0.282, 0.37 ],
       ...,
       [0.53 , 0.42 , 0.13 , ..., 0.374, 0.167, 0.249],
       [0.395, 0.315, 0.105, ..., 0.118, 0.091, 0.119],
       [0.45 , 0.355, 0.12 , ..., 0.115, 0.067, 0.16 ]])
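A quick way to confirm what the conversion produces is to inspect the shape and dtype of the result. The sketch below uses a hypothetical two-column frame (the `Rings` column is made up for illustration) to show that `np.array` yields one 2-D array with a single common dtype, even when the columns start out with different types.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing a float column and an integer column.
toy_frame = pd.DataFrame({
    "Length": [0.435, 0.585],
    "Rings": [7, 6],
})

# The conversion packs all columns into one (rows, columns) array.
toy_array = np.array(toy_frame)

print(toy_array.shape)
print(toy_array.dtype)
```

Here the integer column is upcast so that every value fits one floating point dtype, which is what a single dense input tensor needs.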

Next, make a regression model to predict the age.

Since there is only a single input tensor, a `tf.keras.Sequential` model is sufficient here.

In [8]:
abalone_model = tf.keras.Sequential([
    layers.Dense(64),
    layers.Dense(1)
])

abalone_model.compile(loss=tf.keras.losses.MeanSquaredError(),
                      optimizer=tf.keras.optimizers.Adam())
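For intuition about the loss chosen above, mean squared error is the average of the squared differences between the true and predicted values. The following is a NumPy sketch of that formula with made-up predictions, not the Keras implementation itself.

```python
import numpy as np

# Three ages taken from the sample rows, and hypothetical model predictions.
true_ages = np.array([7.0, 6.0, 14.0])
predicted_ages = np.array([8.0, 6.0, 12.0])

# Squared errors are 1, 0 and 4, so the mean is 5 / 3.
mse = np.mean((true_ages - predicted_ages) ** 2)

print(mse)
```

Because the errors are squared, the prediction that is off by two years contributes four times as much as the one that is off by one.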

To train the model, pass the features and labels to `Model.fit`:

In [9]:
abalone_model.fit(abalone_features, abalone_labels, epochs=10)

Epoch 1/10




Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x2819e6a90>
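The `History` object returned by `fit` records the loss for each epoch, which is handy when the printed progress is terse. The sketch below trains the same architecture on hypothetical random data (the `toy_*` names and values are illustrative only, and the numbers it prints will differ from the abalone run).

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical stand-in data: 32 rows of 7 random features and a random target.
rng = np.random.default_rng(0)
toy_features = rng.random((32, 7)).astype("float32")
toy_labels = rng.random((32, 1)).astype("float32")

toy_model = tf.keras.Sequential([
    layers.Dense(64),
    layers.Dense(1)
])
toy_model.compile(loss=tf.keras.losses.MeanSquaredError(),
                  optimizer=tf.keras.optimizers.Adam())

# verbose=0 silences the progress bar; the per-epoch loss is still recorded.
history = toy_model.fit(toy_features, toy_labels, epochs=3, verbose=0)

# One loss value per epoch.
print(history.history["loss"])
```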

## Basic Preprocessing

It's good practice to normalize the inputs to your model. Keras preprocessing layers provide a convenient way to build this normalization into the model.
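As a rough picture of what normalization means here, each feature column is shifted to zero mean and scaled to unit variance using statistics computed from the data. The following is a NumPy sketch of that idea on a hypothetical three-row feature array, not the layer's actual implementation.

```python
import numpy as np

# Hypothetical features: three rows, three columns on different scales.
toy_features = np.array([
    [0.435, 0.335, 0.334],
    [0.585, 0.450, 0.874],
    [0.655, 0.510, 1.092],
])

# Per-column statistics, computed once from the data.
feature_mean = toy_features.mean(axis=0)
feature_var = toy_features.var(axis=0)

# Shift and scale every column with those statistics.
normalized = (toy_features - feature_mean) / np.sqrt(feature_var)

print(normalized.mean(axis=0))
print(normalized.std(axis=0))
```

After this step every column has mean close to 0 and standard deviation close to 1, so no single feature dominates purely because of its units.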



First, create the layer:

In [10]:
normalize = layers.Normalization()

Then use the `Normalization.adapt` method to adapt the normalization layer to your data:

In [11]:
normalize.adapt(abalone_features)



Then use the normalization layer as the first layer of your model:

In [12]:
norm_abalone_model = tf.keras.Sequential([
    normalize,
    layers.Dense(64),
    layers.Dense(1)
])

norm_abalone_model.compile(loss=tf.keras.losses.MeanSquaredError(),
                           optimizer=tf.keras.optimizers.Adam())

In [13]:
norm_abalone_model.fit(abalone_features, abalone_labels, epochs=10)

Epoch 1/10


Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x281588d30>