[Reading the Dataset](#reading_the_dataset)<br>
[Handling Missing Data](#handling_missing_data)<br>
[Conversion to the Tensor Format](#tensor_format_conversion)<br>
[Exercises](#exercises)

---

### Reading the Dataset
<a id='reading_the_dataset'></a>

In [1]:
import os
import tensorflow as tf

def mkdir_if_not_exist(path):
    """Make a directory if it does not exist."""
    if not isinstance(path, str):
        path = os.path.join(*path)
    if not os.path.exists(path):
        os.makedirs(path)

In [2]:
data_file = '../data/house_tiny.csv'
mkdir_if_not_exist('../data')
with open(data_file, 'w') as f:
    f.write('NumRooms,Alley,Price\n')  # column names, without stray spaces
    f.write('NA,Pave,127500\n')
    f.write('2,NA,106000\n')
    f.write('4,NA,178100\n')
    f.write('NA,NA,140000')

In [8]:
import pandas as pd

data = pd.read_csv(data_file)
print(data.shape)

(4, 3)


---

### Handling Missing Data
<a id='handling_missing_data'></a>

In [4]:
inputs, outputs = data.iloc[:, 0:2], data.iloc[:, 2]
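# Fill missing numeric values with the column mean; numeric_only=True skips
# the string column Alley, which recent pandas versions refuse to average.
inputs = inputs.fillna(inputs.mean(numeric_only=True))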
print(inputs)

   NumRooms Alley
0       3.0  Pave
1       2.0   NaN
2       4.0   NaN
3       3.0   NaN
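
The fill step above replaces missing numeric entries with the column mean, while the categorical `Alley` column is left untouched. A minimal sketch with the same feature values, built in memory for illustration (the names `column_means` and `filled` are not from the original notebook):

```python
import numpy as np
import pandas as pd

# Same feature values as the inputs above, built in memory for illustration.
inputs = pd.DataFrame({
    'NumRooms': [np.nan, 2.0, 4.0, np.nan],
    'Alley': ['Pave', np.nan, np.nan, np.nan],
})

# Means are computed over the non-missing numeric entries only: (2 + 4) / 2 = 3.
column_means = inputs.mean(numeric_only=True)

# fillna with a Series fills each column named in the Series index;
# 'Alley' is not in that index, so its NaN entries stay as they are.
filled = inputs.fillna(column_means)
print(filled)
```

Because only `NumRooms` appears in `column_means`, the `Alley` gaps remain and have to be handled separately.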


In [5]:
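# For categorical columns such as Alley, treat NaN as a category of its own:
# dummy_na=True adds an indicator column for missing values next to the one
# for each observed value. dtype=int keeps the indicators as 0/1.
inputs = pd.get_dummies(inputs, dummy_na=True, dtype=int)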
print(inputs)

   NumRooms  Alley_Pave  Alley_nan
0       3.0           1          0
1       2.0           0          1
2       4.0           0          1
3       3.0           0          1
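
To see what `dummy_na` changes, the following sketch encodes the `Alley` values alone, once without and once with the flag. It assumes `dtype=int` so the indicators print as 0/1; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

alley = pd.DataFrame({'Alley': ['Pave', np.nan, np.nan, np.nan]})

# Without dummy_na, rows with a missing value are encoded as all zeros.
without_na = pd.get_dummies(alley, dtype=int)

# With dummy_na=True, missing values get an indicator column of their own.
with_na = pd.get_dummies(alley, dummy_na=True, dtype=int)

print(list(without_na.columns))
print(list(with_na.columns))
```

With the flag set, every row has exactly one indicator equal to 1, so a missing alley is distinguishable from any observed category.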


---

### Conversion to the Tensor Format
<a id='tensor_format_conversion'></a>

In [6]:
X, y = tf.constant(inputs.values), tf.constant(outputs.values)
X, y

(<tf.Tensor: shape=(4, 3), dtype=float64, numpy=
 array([[3., 1., 0.],
        [2., 0., 1.],
        [4., 0., 1.],
        [3., 0., 1.]])>,
 <tf.Tensor: shape=(4,), dtype=int64, numpy=array([127500, 106000, 178100, 140000], dtype=int64)>)

---

### Exercises
<a id='exercises'></a>

Create a raw dataset with more rows and columns.

Delete the column with the most missing values.

Convert the preprocessed dataset to the tensor format.

In [7]:
data_file = '../data/oscars_tiny.csv'
mkdir_if_not_exist('../data')
with open(data_file, 'w') as f:
    f.write('movie_name,year_released,Price\n')  # column names, without stray spaces
    f.write('NA,Pave,127500\n')
    f.write('2,NA,106000\n')
    f.write('4,NA,178100\n')
    f.write('NA,NA,140000')
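

For the second exercise, one possible approach is to count the missing values per column with `isna().sum()` and drop the column with the largest count. A minimal sketch, assuming the `house_tiny.csv` values from earlier rebuilt in memory (`missing_counts`, `worst_column`, and `reduced` are illustrative names):

```python
import numpy as np
import pandas as pd

# Same values as house_tiny.csv, built in memory for illustration.
data = pd.DataFrame({
    'NumRooms': [np.nan, 2.0, 4.0, np.nan],
    'Alley': ['Pave', np.nan, np.nan, np.nan],
    'Price': [127500, 106000, 178100, 140000],
})

# Count missing values per column and pick the column with the most.
missing_counts = data.isna().sum()
worst_column = missing_counts.idxmax()

# Drop that column; the other columns are kept unchanged.
reduced = data.drop(columns=worst_column)

print(missing_counts)
print(reduced.columns.tolist())
```

The remaining columns can then be imputed and passed to `tf.constant` in the same way as in the earlier cells.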