# Loading general data

In previous tutorials, e.g. [linear regression](3a-linear-regression-scratch.ipynb#Data-iterators), we showed how to use [NDArrayIter](http://mxnet.io/api/python/io.html#mxnet.io.NDArrayIter). In this tutorial we introduce more iterators.

## What is an iterator

A data iterator scans a dataset and returns *k* examples at a time, where *k* is called the batch size. The following example creates a *5-by-2* matrix and iterates over it with batch size 3.

In [2]:
from mxnet import nd, io
data = nd.arange(10).reshape((5,2))
it = io.NDArrayIter(data, batch_size=3)
batch = it.next()
print(batch)
print('data[0]:', batch.data[0])

DataBatch: data shapes: [(3, 2)] label shapes: []
data[0]: 
[[ 0.  1.]
 [ 2.  3.]
 [ 4.  5.]]
<NDArray 3x2 @cpu(0)>


In [None]:
it.reset()  
for i, b in enumerate(it): 
    print('=== batch {} === {}\n'.format(i, b.data[0]))
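Conceptually, an iterator only needs to remember its position: `reset()` rewinds it, and each call to `next()` returns the following slice of examples. The class below is a hypothetical pure-Python sketch of that protocol (the name `SimpleBatchIter` is made up, and this is not MXNet's implementation). For simplicity it returns a shorter final batch instead of padding it the way `NDArrayIter` does.

```python
class SimpleBatchIter:
    """Hypothetical minimal batch iterator over a Python list (not MXNet code)."""

    def __init__(self, examples, batch_size):
        self.examples = list(examples)
        self.batch_size = batch_size
        self.cursor = 0  # index of the first example of the next batch

    def reset(self):
        # Rewind to the first example so the data can be traversed again.
        self.cursor = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.cursor >= len(self.examples):
            raise StopIteration
        # Take the next consecutive slice; the final one may be shorter.
        batch = self.examples[self.cursor:self.cursor + self.batch_size]
        self.cursor += self.batch_size
        return batch

    next = __next__  # mirror the it.next() call used with NDArrayIter


rows = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
it_sketch = SimpleBatchIter(rows, batch_size=3)
first_pass = list(it_sketch)
it_sketch.reset()
second_pass = list(it_sketch)
print(first_pass)  # [[[0, 1], [2, 3], [4, 5]], [[6, 7], [8, 9]]]
```

Without the `reset()` call, the second `list(it_sketch)` would be empty, which is why the notebook cell above calls `it.reset()` before looping again.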

The remainder of this tutorial covers the other iterators that the `io` module provides.

## Iterating over NDArrays

`NDArrayIter` is meant for data that is already loaded into memory.

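Because the number of examples (5) is not divisible by the batch size (3), the last batch is padded with examples from the beginning of the dataset. The `last_batch_handle` argument controls this behavior: `'pad'` (the default) pads the last batch and records the number of padded examples in the batch's `pad` attribute, while `'discard'` drops the incomplete last batch.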

In [None]:
it = io.NDArrayIter(data, batch_size=3, last_batch_handle='pad')
for i, b in enumerate(it): 
    print('=== batch {} === {}\n'.format(i, b.data[0]))
    print('pad = {}\n'.format(b.pad))

In [None]:
it = io.NDArrayIter(data, batch_size=3, last_batch_handle='discard')
for i, b in enumerate(it): 
    print('=== batch {} === {}\n'.format(i, b.data[0]))
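The two `last_batch_handle` modes can be modeled as plain index arithmetic. The helper `batch_indices` below is hypothetical and only reproduces the behavior described above for `'pad'` and `'discard'`; it is a sketch, not how `NDArrayIter` is implemented internally.

```python
import math


def batch_indices(num_examples, batch_size, last_batch_handle="pad"):
    """Hypothetical helper: list of (indices, pad) pairs, one per batch."""
    if last_batch_handle == "pad":
        # Keep the incomplete last batch and top it up.
        num_batches = math.ceil(num_examples / batch_size)
    elif last_batch_handle == "discard":
        # Drop the incomplete last batch entirely.
        num_batches = num_examples // batch_size
    else:
        raise ValueError("unsupported last_batch_handle: %r" % (last_batch_handle,))
    batches = []
    for b in range(num_batches):
        start = b * batch_size
        idx = list(range(start, min(start + batch_size, num_examples)))
        pad = batch_size - len(idx)
        # Fill the gap with indices wrapped around from the start of the dataset.
        idx += [i % num_examples for i in range(pad)]
        batches.append((idx, pad))
    return batches


print(batch_indices(5, 3, "pad"))      # [([0, 1, 2], 0), ([3, 4, 0], 1)]
print(batch_indices(5, 3, "discard"))  # [([0, 1, 2], 0)]
```

With *N* examples and batch size *k*, `'pad'` yields ⌈*N*/*k*⌉ batches and `'discard'` yields ⌊*N*/*k*⌋, so 2 and 1 batches here. The `pad` count tells downstream code how many trailing outputs of the last batch come from duplicated examples and can be ignored.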

Labels are passed as the second argument; each batch then exposes them through `label`, alongside `data`:

In [None]:
label = nd.arange(5)
it = io.NDArrayIter(data, label, batch_size=3)
for i, b in enumerate(it): 
    print('=== batch {} ===\ndata ={}\n\nlabel ={}\n'.format(i, b.data[0], b.label[0]))
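The key property with labels is that data and labels are sliced with the same indices, so example *i* always arrives together with label *i*, including in a padded last batch. A hypothetical pure-Python sketch of that pairing (the function name `labeled_batches` is made up for illustration):

```python
def labeled_batches(data, labels, batch_size):
    """Hypothetical sketch: slice data and labels with identical indices."""
    if len(data) != len(labels):
        raise ValueError("data and labels must have the same length")
    n = len(data)
    batches = []
    for start in range(0, n, batch_size):
        # One index list for both arrays; indices past the end wrap around
        # to the beginning, which pads the last batch.
        idx = [(start + j) % n for j in range(batch_size)]
        batches.append(([data[i] for i in idx], [labels[i] for i in idx]))
    return batches


rows = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
targets = [0, 1, 2, 3, 4]
for x, y in labeled_batches(rows, targets, batch_size=3):
    print(x, y)
```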

Setting `shuffle=True` randomly shuffles the examples before they are split into batches:

In [None]:
it = io.NDArrayIter(data, batch_size=3, shuffle=True)
for i, b in enumerate(it): 
    print('=== batch {} === {}\n'.format(i, b.data[0]))
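Shuffling has to apply one permutation to data and labels together; otherwise examples would lose their labels. In this sketch (hypothetical `shuffled_batches`, plain Python, not MXNet's code) the order comes from `random.Random(seed)` so the result is reproducible, and the final batch is simply left short rather than padded.

```python
import random


def shuffled_batches(data, labels, batch_size, seed=None):
    """Hypothetical sketch: permute examples once, then batch them in order."""
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)
    # Apply the same permutation to both arrays so each example keeps its label.
    shuffled_data = [data[i] for i in order]
    shuffled_labels = [labels[i] for i in order]
    return [
        (shuffled_data[s:s + batch_size], shuffled_labels[s:s + batch_size])
        for s in range(0, len(shuffled_data), batch_size)
    ]


rows = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
targets = [0, 1, 2, 3, 4]
batches = shuffled_batches(rows, targets, batch_size=3, seed=0)
for x, y in batches:
    print(x, y)
```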

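## CSV iterator

When data is stored in a CSV file, `io.CSVIter` reads it in batches. The following cell writes a small file with four rows of two values each:

In [1]:
csv_content = """1,2
2,3
3,4
4,5
"""
with open('data.csv', 'w') as f:
    f.write(csv_content)

`data_shape` gives the shape of a single example, here one row of the file:

In [3]:
it = io.CSVIter(data_csv='data.csv', data_shape=(2,), batch_size=3)
for i, b in enumerate(it): 
    print('=== batch {} === {}\n'.format(i, b.data[0]))

=== batch 0 === 
[[ 1.  2.]
 [ 2.  3.]
 [ 3.  4.]]
<NDArray 3x2 @cpu(0)>

=== batch 1 === 
[[ 4.  5.]
 [ 1.  2.]
 [ 2.  3.]]
<NDArray 3x2 @cpu(0)>

As with `NDArrayIter`, the second batch is filled by wrapping around to the first rows of the file.

## Write your own customized data iterator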

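One way to write your own iterator is to keep the `reset()`/`next()` shape used throughout this tutorial and produce batches lazily from a data source. In MXNet itself, a custom iterator would typically subclass `mx.io.DataIter` and also describe its outputs through `provide_data` and `provide_label`. The sketch below stays in plain Python so it runs without MXNet: the class `CSVBatchIter` is hypothetical, parses CSV rows only as batches are requested, and, unlike `CSVIter`, leaves the last batch short instead of padding it.

```python
import csv
from io import StringIO  # import the class only, so `io` (mxnet.io) is not rebound


class CSVBatchIter:
    """Hypothetical custom iterator: parse CSV rows lazily, batch_size at a time."""

    def __init__(self, open_source, batch_size):
        self.open_source = open_source  # callable returning a fresh text stream
        self.batch_size = batch_size
        self.stream = None
        self.reset()

    def reset(self):
        # Close the previous stream (if any) and start reading from the top.
        if self.stream is not None:
            self.stream.close()
        self.stream = self.open_source()
        self.reader = csv.reader(self.stream)

    def __iter__(self):
        return self

    def __next__(self):
        batch = []
        for row in self.reader:
            if not row:  # skip blank lines
                continue
            batch.append([float(v) for v in row])
            if len(batch) == self.batch_size:
                break
        if not batch:
            raise StopIteration
        return batch

    next = __next__


csv_text = "1,2\n2,3\n3,4\n4,5\n"
custom_it = CSVBatchIter(lambda: StringIO(csv_text), batch_size=3)
batches = list(custom_it)
print(batches)  # [[[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]], [[4.0, 5.0]]]
```

Passing a callable rather than an open stream lets `reset()` reopen the source; for a file on disk it could be `lambda: open('data.csv')`.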


## Record iterator

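Record iterators read data packed into large binary RecordIO (`.rec`) files. For example, `io.ImageRecordIter` reads images stored in this format.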
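To illustrate why a record format suits large binary files: records are stored back to back, each behind a small header, so a reader can stream them sequentially without loading the whole file into memory. The sketch below uses a made-up layout (a 4-byte little-endian length followed by the payload) and hypothetical helpers `write_records`/`read_records`; MXNet's actual `.rec` layout is different and is handled by its `mxnet.recordio` module and record iterators.

```python
import struct
from io import BytesIO  # import the class only, so `io` (mxnet.io) is not rebound


def write_records(stream, records):
    """Write each bytes payload as a 4-byte little-endian length + payload."""
    for payload in records:
        stream.write(struct.pack("<I", len(payload)))
        stream.write(payload)


def read_records(stream):
    """Yield payloads back in order, reading one length header at a time."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<I", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated record payload")
        yield payload


buf = BytesIO()
write_records(buf, [b"1,2", b"2,3", b"image-bytes"])
buf.seek(0)
records_back = list(read_records(buf))
print(records_back)  # [b'1,2', b'2,3', b'image-bytes']
```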
