# Tutorial 2: Using In-Disk Data in FastEstimator

In Tutorial 1, we introduced our three main APIs and the general workflow of a deep learning task: `Pipeline` -> `Network` -> `Estimator`, and we trained on in-memory data. But what if the dataset is too big to fit in memory, say, a dataset the size of ImageNet?

The short answer is that you use one more API for data on disk, `RecordWriter`, so that the overall workflow becomes: `RecordWriter` -> `Pipeline` -> `Network` -> `Estimator`.

In this tutorial, we will show you how to train on in-disk data in FastEstimator.

## Before we start:

Two things are required for in-disk data:
* Data files, obviously :)
* A csv file that describes the data (prepare two csv files if you have a separate validation set)

In the csv file, each row represents one example and each column represents one feature of that example. For a classification task, the csv may look like:

| image  | label  |
|---|---|
|/data/image1.png   | 0  |
|/data/image2.png   |  1 |
|/data/image3.png | 0  |
|... | ...  |
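
A csv with this layout can be produced in many ways. The following is a minimal sketch using only the Python standard library; the image paths and labels are hypothetical values copied from the table above, and the output location is a temporary directory chosen for illustration.

```python
import csv
import os
import tempfile

# Hypothetical image paths and class labels, for illustration only.
labels = {
    "/data/image1.png": 0,
    "/data/image2.png": 1,
    "/data/image3.png": 0,
}

# Write the csv into a temporary directory (an assumption for this sketch).
csv_path = os.path.join(tempfile.mkdtemp(), "train.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["image", "label"])  # header: one column per feature
    for image, label in labels.items():
        writer.writerow([image, label])  # one row per example

# Read the file back to confirm its structure.
with open(csv_path, newline="") as f:
    rows = list(csv.DictReader(f))
print(rows[0])
```

Note that values read back from a csv are strings, so a numeric label needs to be converted before use.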

The csv of a multi-mask segmentation task may look like:

| img  | msk1  | msk2  |
|---|---|---|
|/data/image1.png   | /maska/mask1.png  |/maskb/mask1.png|
|/data/image2.png   |  /maska/mask2.png |/maskb/mask2.png|
|/data/image3.png | /maska/mask3.png  |/maskb/mask3.png|
|... | ...  |...|
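
When each example has several files, the rows can be derived from the image names. The sketch below assumes a hypothetical naming convention in which `imageN.png` corresponds to `maskN.png` in every mask folder, matching the table above; the helper `segmentation_rows` is not part of FastEstimator.

```python
import csv
import io


def segmentation_rows(image_names, image_dir, mask_dirs):
    """Build one row per image, with one mask path per mask column.

    Assumes (hypothetically) that imageN.png pairs with maskN.png.
    """
    rows = []
    for name in image_names:
        mask_name = name.replace("image", "mask", 1)
        row = {"img": f"{image_dir}/{name}"}
        for column, mask_dir in mask_dirs.items():
            row[column] = f"{mask_dir}/{mask_name}"
        rows.append(row)
    return rows


rows = segmentation_rows(
    ["image1.png", "image2.png", "image3.png"],
    image_dir="/data",
    mask_dirs={"msk1": "/maska", "msk2": "/maskb"},
)

# Serialize to csv text in memory; writing to a file works the same way.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["img", "msk1", "msk2"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```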


Please keep in mind that there is no restriction on the data folder structure, the number of features, or the feature names.

Now, let's generate some in-disk data for this tutorial:

In [6]:
from fastestimator.dataset.mnist import load_data

train_csv, eval_csv, image_path = load_data()
print("training csv path is {}".format(train_csv))
print("evaluation csv path is {}".format(eval_csv))
print("mnist image path is {}".format(image_path))

writing image data to /var/folders/5g/d_ny7h211cj3zqkzrtq01s480000gn/T/.fe/Mnist/image
training csv path is /var/folders/5g/d_ny7h211cj3zqkzrtq01s480000gn/T/.fe/Mnist/train.csv
evaluation csv path is /var/folders/5g/d_ny7h211cj3zqkzrtq01s480000gn/T/.fe/Mnist/eval.csv
mnist image path is /var/folders/5g/d_ny7h211cj3zqkzrtq01s480000gn/T/.fe/Mnist/image


Let's take a look at the csv file:

In [7]:
import pandas as pd
import matplotlib.image as mpimg
import matplotlib.pyplot as plt


df_train = pd.read_csv(train_csv)
print(df_train.head())

   Unnamed: 0            x  y
0           0  train_0.png  5
1           1  train_1.png  0
2           2  train_2.png  4
3           3  train_3.png  1
4           4  train_4.png  9
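
The `x` column holds bare file names rather than full paths, so each one has to be combined with the image directory before the file can be opened. The sketch below illustrates this with the standard library, under two assumptions: the csv text is a hypothetical three-row excerpt that mirrors the printed layout, and `image_dir` is a made-up stand-in for the `image_path` returned by `load_data()`. The first header field is empty, which is what pandas displays as `Unnamed: 0`.

```python
import csv
import io
import os

# Hypothetical csv text mirroring the layout printed above: an unnamed index
# column followed by "x" (image file name) and "y" (label).
csv_text = ",x,y\n0,train_0.png,5\n1,train_1.png,0\n2,train_2.png,4\n"

# Hypothetical image directory; in the tutorial this would be `image_path`.
image_dir = "/tmp/.fe/Mnist/image"

examples = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # Resolve the file name against the image directory; parse the label.
    examples.append((os.path.join(image_dir, row["x"]), int(row["y"])))

print(examples[0])
```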

## Step 0: RecordWriter

