# Split a DataFrame into chunks

Use [`numpy.array_split()`](https://numpy.org/doc/stable/reference/generated/numpy.array_split.html) to split a DataFrame into chunks. This is useful, for example, for batch geocoding, where the geocoding API accepts only a limited number of addresses per request.
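
As a rough illustration of the geocoding use case, here is a minimal sketch. `geocode_batch()` and `MAX_ADDRESSES_PER_REQUEST` are hypothetical stand-ins, not a real service's API, and the addresses are made up. The loop splits an array of row positions and selects each chunk with `iloc`. This produces the same row groups as passing the DataFrame straight to `np.array_split()`, and it also avoids the `FutureWarning` that some recent pandas versions may raise when `np.array_split()` receives a DataFrame.

```python
import math

import numpy as np
import pandas as pd

MAX_ADDRESSES_PER_REQUEST = 100  # hypothetical per-request limit


def geocode_batch(addresses):
    """Hypothetical stand-in for a batch geocoding API call.

    A real client would send `addresses` to a geocoding service; this stub
    only enforces the limit and returns one placeholder (lat, lon) per address.
    """
    if len(addresses) > MAX_ADDRESSES_PER_REQUEST:
        raise ValueError("too many addresses in one request")
    return [(0.0, 0.0)] * len(addresses)


# Toy input: 250 made-up addresses.
addresses = pd.DataFrame({"address": [f"{i} Main St" for i in range(250)]})

# Smallest number of chunks that keeps every chunk within the limit.
n_chunks = math.ceil(len(addresses) / MAX_ADDRESSES_PER_REQUEST)

results = []
# Split row positions rather than the DataFrame itself, then select rows with iloc.
for positions in np.array_split(np.arange(len(addresses)), n_chunks):
    chunk = addresses.iloc[positions]
    results.extend(geocode_batch(chunk["address"].tolist()))

print(n_chunks, len(results))  # 3 250
```

Because the stub raises on oversized requests, finishing the loop confirms that every chunk (84, 83 and 83 rows here) stayed within the assumed limit.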

```python
# Setup

import math

import numpy as np
import pandas as pd
```

## Create some toy data

Use [`np.random.randn`](https://docs.scipy.org/doc//numpy-1.14.0/reference/generated/numpy.random.randn.html#numpy.random.randn) to generate a DataFrame with 1,090 rows and three columns. `randn()` returns [samples from the “standard normal” distribution](http://amsi.org.au/ESA_Senior_Years/SeniorTopic4/4b/4b_2content_6.html).

```python
df = pd.DataFrame(
    np.random.randn(1090, 3),
    columns=["column_1", "column_2", "column_3"]
)

df
```

|      | column_1  | column_2  | column_3  |
|-----:|----------:|----------:|----------:|
| 0    | -0.522229 | 0.227516  | -0.700995 |
| 1    | 0.376896  | 0.951010  | 1.231711  |
| 2    | -0.717082 | 0.319611  | 0.006255  |
| 3    | 0.514478  | -0.525063 | 0.423066  |
| 4    | -0.929870 | 0.864430  | -0.618780 |
| ...  | ...       | ...       | ...       |
| 1085 | -0.109836 | -0.480967 | -0.179000 |
| 1086 | 2.121514  | -0.728259 | -1.568832 |
| 1087 | 0.514900  | -0.240476 | 0.039966  |
| 1088 | -0.124603 | -0.950393 | 1.695722  |
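
No seed is set, so `np.random.randn` produces different values on every run and the table above will not match your output. If reproducible toy data helps, one option is a seeded `numpy.random.Generator`. The sketch below assumes NumPy 1.17 or later (where `default_rng` exists); the seed `42` is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# A seeded Generator makes the "random" toy data identical on every run.
# The seed value 42 is arbitrary.
rng = np.random.default_rng(42)

df = pd.DataFrame(
    rng.standard_normal((1090, 3)),  # standard normal samples, like np.random.randn(1090, 3)
    columns=["column_1", "column_2", "column_3"],
)

print(df.shape)  # (1090, 3)
```

The values differ from the legacy `np.random.randn` stream even with a seed, because `Generator` uses a different bit generator. Only the distribution is the same.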


## Determine the number of chunks using `math.ceil()`

With a chunk size of 100 and 1,090 rows, we need 11 chunks: 1,090 / 100 = 10.9, which `math.ceil()` rounds up to 11 so that the last 90 rows still get a chunk.

```python
chunk_size = 100
n_chunks = math.ceil(df.shape[0] / chunk_size)

n_chunks
```

```
11
```
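
The same ceiling can be computed with integers only, by negating, floor-dividing and negating back. This sketch compares both forms. The integer form avoids converting to float, which can lose precision for very large integers, although that never matters at this row count.

```python
import math

n_rows = 1090
chunk_size = 100

# Floating-point ceiling: 1090 / 100 = 10.9, rounded up to 11.
n_chunks_float = math.ceil(n_rows / chunk_size)

# Integer-only ceiling division: -(-1090 // 100) = -(-11) = 11.
n_chunks_int = -(-n_rows // chunk_size)

print(n_chunks_float, n_chunks_int)  # 11 11
```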

## Use `numpy.array_split()` to split the dataframe into chunks

We use [`numpy.array_split()`](https://numpy.org/doc/stable/reference/generated/numpy.array_split.html) rather than [`numpy.split()`](https://numpy.org/doc/stable/reference/generated/numpy.split.html) because 1,090 rows can't be divided evenly into 11 chunks. `numpy.split()` raises a `ValueError` when the sections would be unequal, whereas `numpy.array_split()` allows them to differ in size.
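
A small sketch of that difference on a 10-element array, where 3 sections don't divide the length evenly:

```python
import numpy as np

values = np.arange(10)

# array_split accepts a section count that does not divide the length evenly.
parts = np.array_split(values, 3)
print([len(p) for p in parts])  # [4, 3, 3]

# split insists on equal sections and raises ValueError otherwise.
split_error = None
try:
    np.split(values, 3)
except ValueError as err:
    split_error = err
print(type(split_error).__name__)  # ValueError
```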

```python
chunks = np.array_split(df, n_chunks)

for chunk in chunks:
    print(chunk.shape[0])
```

```
100
99
99
99
99
99
99
99
99
99
99
```
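
Note that the chunks hold 99 or 100 rows rather than exactly `chunk_size` rows. `np.array_split()` spreads the rows as evenly as possible: with `divmod(1090, 11) == (99, 1)`, the first section gets 100 rows and the other ten get 99. The sketch below checks that rule. It then shows one alternative, assumed here rather than taken from the original: slicing with `iloc` in steps of `chunk_size`, which gives ten chunks of exactly 100 rows and a final chunk of 90. Both approaches keep every chunk at or below 100 rows. The all-zeros DataFrame is just a placeholder with the same shape.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((1090, 3)), columns=["column_1", "column_2", "column_3"])
chunk_size = 100
n_chunks = 11

# Rule used by np.array_split: with divmod(len, n) == (base, extra),
# the first `extra` sections get base + 1 rows and the rest get base rows.
base, extra = divmod(len(df), n_chunks)  # (99, 1)
predicted = [base + 1] * extra + [base] * (n_chunks - extra)

# Check the rule against np.array_split applied to the row positions.
actual = [len(p) for p in np.array_split(np.arange(len(df)), n_chunks)]
print(predicted == actual)  # True

# Alternative: exactly chunk_size rows per chunk; only the last may be shorter.
fixed_chunks = [df.iloc[start:start + chunk_size] for start in range(0, len(df), chunk_size)]
print([len(c) for c in fixed_chunks])
# [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 90]
```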
