[tensorflow submodule](https://www.tensorflow.org/get_started/premade_estimators)

### [Datasets Quick Start](https://www.tensorflow.org/get_started/datasets_quickstart)
1. Reading in-memory data from NumPy arrays
2. Reading lines from a CSV file

## Basic input

In [1]:
import tensorflow as tf

In [10]:
from sklearn.datasets import load_iris
iris = load_iris()
iris_feature = iris.data
iris_target = iris.target
feature_names = iris.feature_names
target_names = iris.target_names

In [17]:
import pandas as pd
train = pd.DataFrame(iris_feature)
train.columns = feature_names

In [31]:
def train_input_fn(features, labels, batch_size):
    """An input function for training"""
    # Convert the inputs to a Dataset
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    
    # Shuffle, repeat, and batch the examples
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    
    # Build the Iterator, and return the read end of the pipeline
    return dataset.make_one_shot_iterator().get_next()

In [32]:
batch_size = 100
train_input_fn(train, iris_target, batch_size)

({'petal length (cm)': <tf.Tensor 'IteratorGetNext_1:0' shape=(?,) dtype=float64>,
  'petal width (cm)': <tf.Tensor 'IteratorGetNext_1:1' shape=(?,) dtype=float64>,
  'sepal length (cm)': <tf.Tensor 'IteratorGetNext_1:2' shape=(?,) dtype=float64>,
  'sepal width (cm)': <tf.Tensor 'IteratorGetNext_1:3' shape=(?,) dtype=float64>},
 <tf.Tensor 'IteratorGetNext_1:4' shape=(?,) dtype=int64>)

### Slices
The tf.data.Dataset.from_tensor_slices function takes an array and returns a tf.data.Dataset representing slices of the array along its first dimension.
> For example, an array containing the MNIST training data has a shape of (60000, 28, 28). Passing this to from_tensor_slices returns a Dataset object containing 60000 slices, each one a 28x28 image.

In [37]:
dataset = tf.data.Dataset.from_tensor_slices((dict(train), iris_target))
print(dataset)

<TensorSliceDataset shapes: ({sepal width (cm): (), sepal length (cm): (), petal width (cm): (), petal length (cm): ()}, ()), types: ({sepal width (cm): tf.float64, sepal length (cm): tf.float64, petal width (cm): tf.float64, petal length (cm): tf.float64}, tf.int64)>


In [40]:
dataset.__doc__

'A `Dataset` of slices from a nested structure of tensors.'
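
To make the slicing idea concrete without TensorFlow, here is a minimal plain-Python sketch. The helper `tensor_slices` and the sample values are hypothetical and only mimic how a dict of columns plus a label array is cut into one (features, label) pair per row; it is not how `from_tensor_slices` is implemented.

```python
def tensor_slices(features, labels):
    """Yield one (feature_dict, label) pair per row.

    Hypothetical helper: a plain-Python analogue of slicing along the
    first dimension, not TensorFlow's implementation.
    """
    n = len(labels)
    # Every column must provide exactly one value per label
    for name, column in features.items():
        if len(column) != n:
            raise ValueError(
                "column %r has %d rows, expected %d" % (name, len(column), n)
            )
    for i in range(n):
        # Row i: the i-th entry of every column, plus the i-th label
        yield {name: column[i] for name, column in features.items()}, labels[i]


# Made-up measurements standing in for two of the iris columns
features = {
    "sepal length (cm)": [5.1, 4.9, 6.3],
    "petal length (cm)": [1.4, 1.4, 6.0],
}
labels = [0, 0, 2]

slices = list(tensor_slices(features, labels))
print(len(slices))  # one slice per row
print(slices[2])
```

Each slice has scalar entries, which matches the empty shapes `()` printed for the TensorSliceDataset above.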

### Manipulation
* The shuffle() method randomly reorders all elements of the sequence.
> The shuffle method uses a fixed-size buffer to shuffle the items as they pass through. Setting a buffer_size greater than the number of examples in the Dataset ensures that the data is completely shuffled.
* The repeat method has the Dataset restart when it reaches the end. Called with no arguments, as here, it repeats indefinitely.
* The batch method collects a number of examples and stacks them to create batches.
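
The three operations above can be sketched in plain Python. The functions `buffered_shuffle`, `repeat`, and `batch` below are hypothetical stand-ins written for illustration, assuming the buffer behaves as described (fill the buffer, then swap a random buffered item for each new one); they are not TensorFlow's code.

```python
import itertools
import random


def buffered_shuffle(items, buffer_size, rng):
    """Shuffle a stream using a fixed-size buffer (illustrative sketch)."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        # Buffer is full: emit a random element, store the new item in its slot
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain the remaining items in random order
    rng.shuffle(buffer)
    yield from buffer


def repeat(make_iterable):
    """Restart the source each time it is exhausted (endless stream)."""
    while True:
        yield from make_iterable()


def batch(items, batch_size):
    """Group consecutive items into lists of up to batch_size elements."""
    iterator = iter(items)
    while True:
        chunk = list(itertools.islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


rng = random.Random(0)
examples = list(range(10))

# Same order of operations as shuffle(1000).repeat().batch(...), with a batch size of 4
pipeline = batch(repeat(lambda: buffered_shuffle(examples, 1000, rng)), 4)
first_three = list(itertools.islice(pipeline, 3))
print(first_three)
```

Because the buffer (1000) is larger than the number of examples (10), each pass is a full shuffle. With a buffer size of 1 the sketch returns the items in their original order, which shows why a small buffer shuffles poorly. Batches may also straddle the boundary between two passes, since repeat comes before batch.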

In [41]:
dataset = dataset.shuffle(1000).repeat().batch(100)
print(dataset)

<BatchDataset shapes: ({sepal width (cm): (?,), petal length (cm): (?,), petal width (cm): (?,), sepal length (cm): (?,)}, (?,)), types: ({sepal width (cm): tf.float64, sepal length (cm): tf.float64, petal width (cm): tf.float64, petal length (cm): tf.float64}, tf.int64)>


In [43]:
dataset.__doc__

'A `Dataset` that batches contiguous elements from its input.'

### Return
Convert the Dataset into a (features, label) pair containing TensorFlow tensors, as required by the **train, evaluate, and predict methods**.

In [46]:
features_result, labels_result = dataset.make_one_shot_iterator().get_next()
print((features_result, labels_result))

({'sepal width (cm)': <tf.Tensor 'IteratorGetNext_4:3' shape=(?,) dtype=float64>, 'sepal length (cm)': <tf.Tensor 'IteratorGetNext_4:2' shape=(?,) dtype=float64>, 'petal width (cm)': <tf.Tensor 'IteratorGetNext_4:1' shape=(?,) dtype=float64>, 'petal length (cm)': <tf.Tensor 'IteratorGetNext_4:0' shape=(?,) dtype=float64>}, <tf.Tensor 'IteratorGetNext_4:4' shape=(?,) dtype=int64>)


## Reading a CSV File

### Build the Dataset

In [50]:
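# train_path (the path to the training CSV) must be defined beforehand; skip(1) drops the header line
ds = tf.data.TextLineDataset(train_path).skip(1)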

### Build a csv line parser

In [None]:
# Metadata describing the text columns
COLUMNS = ['SepalLength', 'SepalWidth',
          'PetalLength', 'PetalWidth',
          'label']
# Per-column defaults; they also set each column's dtype (float features, int label)
FIELD_DEFAULTS = [[0.0], [0.0], [0.0], [0.0], [0]]

def _parse_line(line):
    # Decode the line into its fields
    fields = tf.decode_csv(line, FIELD_DEFAULTS)
    
    # Pack the result into a dictionary
    features = dict(zip(COLUMNS, fields))
    
    # Separate the label from the features
    label = features.pop('label')
    
    return features, label

### Parse the lines
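
With tf.data, this step is usually a call such as `ds.map(_parse_line)`, which applies the parser to every line of the Dataset. The plain-Python sketch below mirrors that flow on an in-memory list so it can run without TensorFlow. The helper `parse_line` and the sample rows are hypothetical, and the defaults are written as plain scalars rather than the one-element lists `tf.decode_csv` expects.

```python
COLUMNS = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'label']
FIELD_DEFAULTS = [0.0, 0.0, 0.0, 0.0, 0]


def parse_line(line):
    """Plain-Python stand-in for _parse_line (hypothetical helper)."""
    raw_fields = line.strip().split(',')
    if len(raw_fields) != len(COLUMNS):
        raise ValueError("expected %d fields, got %d"
                         % (len(COLUMNS), len(raw_fields)))
    fields = []
    for raw, default in zip(raw_fields, FIELD_DEFAULTS):
        # An empty field falls back to the default;
        # otherwise the text is cast to the default's type
        fields.append(type(default)(raw) if raw else default)

    # Pack the result into a dictionary, then split off the label
    features = dict(zip(COLUMNS, fields))
    label = features.pop('label')
    return features, label


# Illustrative CSV content (header plus made-up rows), not the real training file
lines = [
    "SepalLength,SepalWidth,PetalLength,PetalWidth,label",
    "6.4,2.8,5.6,2.2,2",
    "5.0,,3.3,1.0,1",
]

# Analogue of skip(1) followed by map: drop the header, parse the rest
parsed = [parse_line(line) for line in lines[1:]]
for features, label in parsed:
    print(features, label)
```

The second sample row leaves SepalWidth empty, so the parser fills in the default 0.0 for it.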