# TensorFlow Developer Certificate Preparation
___
## Introduction to TensorFlow in Python - DataCamp - ML-Scientist-Career-Track - by Isaiah Hull
___
## Chapter 2.1 - Linear Models

### 1. Input data
In the previous chapter, we learned how to perform core TensorFlow operations. In this chapter, we will work towards training a linear model with TensorFlow.

### 2. Using data in TensorFlow
- So far, we've only generated data using functions like ``tf.ones()`` and ``tf.random.uniform()``; however, when we train a machine learning model, we will want to import data from an external source.
- This may include numeric, image, or text data. Beyond simply importing the data, numeric data must be assigned a type, and text and image data must be converted to a usable format.
![Figure](./figures/2.1.1.PNG)

### 3. Importing data for use in TensorFlow
- External datasets can be imported using TensorFlow itself.
- While this is useful for complex data pipelines, it is unnecessarily complicated for what we do in this chapter.
- **For that reason, we will use simpler options to import data**.
    - We will import data using ``pandas``.
    - We will then convert the data into a ``NumPy`` array,
    - which we can use without further modification in ``TensorFlow``.
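
The three steps above can be sketched as follows. This is a minimal illustration rather than the course's code: the DataFrame `sample` is a hypothetical two-row stand-in for an imported dataset, and `tf.multiply` is just one arbitrary operation, chosen to show that TensorFlow accepts the NumPy array directly.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# Hypothetical stand-in for a dataset loaded with pandas
sample = pd.DataFrame({
    "price": [221900.0, 538000.0],
    "sqft_living": [1180.0, 2570.0],
})

# Step 2: convert the DataFrame into a NumPy array
features = np.array(sample, dtype=np.float32)

# Step 3: pass the NumPy array straight into a TensorFlow operation
halved = tf.multiply(features, 0.5)
print(halved.shape, halved.dtype)
```

The result of the TensorFlow operation is a tensor, even though the input was a plain NumPy array.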

### 4. How to import and convert data
- Let's start by importing ``numpy`` under the alias ``np`` and ``pandas`` under the alias ``pd``.
- We will then read housing transaction data from ``kc_house_data.csv`` using the ``pandas`` function ``read_csv()`` and assign it to a DataFrame called ``housing``.
- When you are ready to train a model, you will want to convert the data into a ``numpy`` array by passing the pandas DataFrame, ``housing``, to ``np.array()``.
- We will focus on loading data from CSV files in this chapter, but pandas can also load other formats with functions such as ``read_json()``, ``read_html()``, and ``read_excel()``.
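
The other readers follow the same pattern as ``read_csv()``. As a small sketch, assuming JSON input held in memory (the `json_text` string is a made-up example, not a file from the course), ``read_json()`` could be used like this:

```python
import io
import pandas as pd

# Hypothetical JSON records standing in for an external file
json_text = '[{"price": 221900.0, "bedrooms": 3}, {"price": 538000.0, "bedrooms": 3}]'

# read_json accepts a path or a file-like object, much like read_csv
records = pd.read_json(io.StringIO(json_text))
print(records.shape)
print(records.dtypes)
```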

In [7]:
# import numpy and pandas
import numpy as np
import pandas as pd

# Load data from csv
housing = pd.read_csv('./datasets/kc_house_data.csv')
housing.head()

| | id | date | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | ... | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 |
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
| 0 | 7129300520 | 20141013T000000 | 221900.0 | 3 | 1.0 | 1180 | 5650 | 1.0 | 0 | 0 | ... | 7 | 1180 | 0 | 1955 | 0 | 98178 | 47.5112 | -122.257 | 1340 | 5650 |
| 1 | 6414100192 | 20141209T000000 | 538000.0 | 3 | 2.25 | 2570 | 7242 | 2.0 | 0 | 0 | ... | 7 | 2170 | 400 | 1951 | 1991 | 98125 | 47.721 | -122.319 | 1690 | 7639 |
| 2 | 5631500400 | 20150225T000000 | 180000.0 | 2 | 1.0 | 770 | 10000 | 1.0 | 0 | 0 | ... | 6 | 770 | 0 | 1933 | 0 | 98028 | 47.7379 | -122.233 | 2720 | 8062 |
| 3 | 2487200875 | 20141209T000000 | 604000.0 | 4 | 3.0 | 1960 | 5000 | 1.0 | 0 | 0 | ... | 7 | 1050 | 910 | 1965 | 0 | 98136 | 47.5208 | -122.393 | 1360 | 5000 |
| 4 | 1954400510 | 20150218T000000 | 510000.0 | 3 | 2.0 | 1680 | 8080 | 1.0 | 0 | 0 | ... | 8 | 1680 | 0 | 1987 | 0 | 98074 | 47.6168 | -122.045 | 1800 | 7503 |


In [9]:
# Convert to numpy array
housing = np.array(housing)
housing

array([[7129300520, '20141013T000000', 221900.0, ..., -122.257, 1340,
        5650],
       [6414100192, '20141209T000000', 538000.0, ..., -122.319, 1690,
        7639],
       [5631500400, '20150225T000000', 180000.0, ..., -122.233, 2720,
        8062],
       ...,
       [1523300141, '20140623T000000', 402101.0, ..., -122.299, 1020,
        2007],
       [291310100, '20150116T000000', 400000.0, ..., -122.069, 1410,
        1287],
       [1523300157, '20141015T000000', 325000.0, ..., -122.299, 1020,
        1357]], dtype=object)

In [10]:
# Check the number of axes (the rank) of the array
housing.ndim

2

In [11]:
# Checking shape of the tensor
housing.shape

(21613, 21)

In [12]:
# Checking data type of the tensor
housing.dtype

dtype('O')
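
The ``dtype('O')`` result means NumPy stored every element as a generic Python object, because the DataFrame mixes strings (the ``date`` column) with numbers. The sketch below shows one possible way to obtain a numeric array instead, assuming only the numeric columns are needed. Both the small `sample` DataFrame and the use of ``select_dtypes()`` are choices made for this illustration, not steps from the course.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one string column and two numeric columns
sample = pd.DataFrame({
    "date": ["20141013T000000", "20141209T000000"],
    "price": [221900.0, 538000.0],
    "bedrooms": [3, 3],
})

# Converting everything at once falls back to the generic object dtype
mixed = np.array(sample)
print(mixed.dtype)

# Keeping only numeric columns allows a single numeric dtype
numeric = np.array(sample.select_dtypes(include="number"), dtype=np.float32)
print(numeric.dtype, numeric.shape)
```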

### 5. Parameters of read_csv()
| Parameter | Description | Default |
|-|-|-|
|``filepath_or_buffer``|  File path or URL to read from  |  required (no default)  |
|``sep``|  Delimiter between columns  |  ``,``  |
|``delim_whitespace``|  Boolean: whether to use whitespace as the delimiter  |  ``False``  |
|``encoding``|  Encoding to use when reading the file, if any  |  ``None``  |
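
A brief sketch of these parameters in use, reading from in-memory buffers rather than real files (the buffer contents are made up for illustration). For whitespace-delimited input it uses ``sep=r"\s+"``, the regular-expression separator that the pandas documentation describes as equivalent to ``delim_whitespace=True``.

```python
import io
import pandas as pd

# Semicolon-delimited bytes, decoded with an explicit encoding
semicolon_bytes = "price;waterfront\n221900.0;0\n538000.0;0\n".encode("utf-8")
semi = pd.read_csv(io.BytesIO(semicolon_bytes), sep=";", encoding="utf-8")

# Whitespace-delimited text, split on runs of whitespace
whitespace_text = "price waterfront\n221900.0   0\n538000.0   0\n"
ws = pd.read_csv(io.StringIO(whitespace_text), sep=r"\s+")

print(semi)
print(ws)
```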


### 6. Using mixed type datasets
Finally, we end this lesson by looking at how to transform imported data for use in TensorFlow, using housing data from King County, Washington as an example. The dataset contains columns of different types. The ``price`` column holds house prices in floating point format. The ``waterfront`` column is effectively a boolean variable: 1 (true) indicates that a property is located on the waterfront, and 0 (false) indicates that it is not.
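
One way to see these column types is to inspect the DataFrame's ``dtypes`` attribute. The sketch below uses a hypothetical three-row sample; the ``waterfront`` values are illustrative only and not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample mimicking two columns of the housing data
sample = pd.DataFrame({
    "price": [221900.0, 538000.0, 180000.0],
    "waterfront": [0, 1, 0],
})

# price is stored as a float, while waterfront is stored as an integer 0/1 flag
print(sample.dtypes)
```

Note that pandas reads a 0/1 flag as an integer column, so it has to be converted explicitly if a true boolean type is required.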

### 7. Setting the data type
- Let's say we want to perform TensorFlow operations that require ``price`` to be a 32-bit floating point number and ``waterfront`` to be a boolean.
- We can do this in two ways:
    - The **first approach** uses the ``np.array()`` function from NumPy.
        - We select the relevant column in the DataFrame,
        - provide it as the first argument to ``np.array()``,
        - and then provide the data type as the second argument.

In [13]:
# Load data from csv
housing = pd.read_csv('./datasets/kc_house_data.csv')

# Convert price column to float32
price = np.array(housing['price'], np.float32)
print(price.dtype)

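# Convert waterfront column to Boolean
# (np.bool_ is NumPy's Boolean type; the plain np.bool alias is unavailable in some NumPy versions)
waterfront = np.array(housing['waterfront'], np.bool_)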
print(waterfront.dtype)

float32
bool


    - The **second approach** uses the ``tf.cast()`` operation from TensorFlow.
        - Again, we supply the data first and the data type second.
        - While either ``tf.cast()`` or ``np.array()`` will work, ``waterfront`` will be a ``tf.Tensor`` under the former option and a NumPy array under the latter.

In [15]:
# Import TensorFlow
import tensorflow as tf

# Load data from csv
housing = pd.read_csv('./datasets/kc_house_data.csv')

# Convert price column to float32
price = tf.cast(housing['price'], tf.float32)
print(price.dtype)

# Convert waterfront column to Boolean
waterfront = tf.cast(housing['waterfront'], tf.bool)
print(waterfront.dtype)

<dtype: 'float32'>
<dtype: 'bool'>
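
The difference between the two approaches lies in the type of object returned. The sketch below compares them on a hypothetical three-row sample (the ``waterfront`` values are illustrative only). It passes ``.to_numpy()`` to ``tf.cast()`` and ends with ``.numpy()`` to convert the tensor back; both calls are additions for this illustration and not part of the course code.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# Hypothetical sample; values are illustrative only
sample = pd.DataFrame({"waterfront": [0, 1, 0]})

# First approach: the result is a NumPy array
waterfront_np = np.array(sample["waterfront"], np.bool_)

# Second approach: the result is a TensorFlow tensor
waterfront_tf = tf.cast(sample["waterfront"].to_numpy(), tf.bool)

print(isinstance(waterfront_np, np.ndarray))
print(isinstance(waterfront_tf, tf.Tensor))

# A tensor can be converted back to a NumPy array when needed
back_to_np = waterfront_tf.numpy()
print(back_to_np.dtype)
```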
