When processing very large files, or when figuring out the right set of arguments to correctly process a large file, you may want to read only a small piece of the file or iterate through it in smaller chunks.

In [3]:
import pandas as pd

In [4]:
# Before we look at a large file, we make the pandas display settings more compact:
pd.options.display.max_rows = 10

In [11]:
# To read only a small number of rows (instead of the entire file),
# specify that with nrows:
df = pd.read_csv('ex6.csv', nrows=5)
df.columns

Index(['VIN (1-10)', 'County', 'City', 'State', 'Postal Code', 'Model Year',
       'Make', 'Model', 'Electric Vehicle Type',
       'Clean Alternative Fuel Vehicle (CAFV) Eligibility', 'Electric Range',
       'Base MSRP', 'Legislative District', 'DOL Vehicle ID',
       'Vehicle Location', 'Electric Utility', '2020 Census Tract'],
      dtype='object')

To read a file in pieces, specify a chunksize as a number of rows:

In [8]:
chunker = pd.read_csv('ex6.csv', chunksize=1000)
chunker

<pandas.io.parsers.readers.TextFileReader at 0x258886bac90>

The TextFileReader object returned by read_csv lets you iterate over the file in pieces of chunksize rows each.
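To make the iteration concrete, here is a minimal sketch that does not depend on ex6.csv. The in-memory CSV, its column names, and the chunk size of 4 are made up for illustration, and the `with` form assumes a pandas version in which the reader can be used as a context manager (1.2 or later).

```python
import io
import pandas as pd

# Hypothetical 10-row CSV held in memory, standing in for a large file.
csv_text = "key,value\n" + "\n".join(f"k{i % 3},{i}" for i in range(10))

chunk_lengths = []
with pd.read_csv(io.StringIO(csv_text), chunksize=4) as reader:
    for piece in reader:
        # Each piece is an ordinary DataFrame with at most 4 rows.
        chunk_lengths.append(len(piece))

print(chunk_lengths)  # [4, 4, 2]
```

The last piece is shorter because only two rows remain after the first two chunks.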

For example, we can iterate over ex6.csv, aggregating the value counts in the 'VIN (1-10)' column like so:

In [12]:
chunker = pd.read_csv('ex6.csv', chunksize=1000)
tot = pd.Series([], dtype='int64')
for piece in chunker:
    # Add this chunk's counts to the running total
    tot = tot.add(piece['VIN (1-10)'].value_counts(), fill_value=0)

# Sort once, after every chunk has been counted
tot = tot.sort_values(ascending=False)

In [13]:
tot[:10]

VIN (1-10)
7SAYGDEE7P    572.0
7SAYGDEE2P    559.0
7SAYGDEE6P    557.0
7SAYGDEEXP    556.0
7SAYGDEE8P    552.0
7SAYGDEE0P    547.0
7SAYGDEE5P    540.0
7SAYGDEE1P    533.0
7SAYGDEE9P    532.0
7SAYGDEE4P    526.0
dtype: float64

The TextFileReader is also equipped with a get_chunk method that enables you to read pieces of an arbitrary size.
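As a rough illustration of get_chunk, the sketch below opens a reader with iterator=True and pulls two pieces of different sizes. The in-memory CSV and its column names are hypothetical stand-ins for a real file, and the `with` form again assumes pandas 1.2 or later.

```python
import io
import pandas as pd

# Hypothetical 10-row CSV held in memory, standing in for a large file.
csv_text = "key,value\n" + "\n".join(f"k{i % 3},{i}" for i in range(10))

# iterator=True returns a reader without fixing a chunk size up front.
with pd.read_csv(io.StringIO(csv_text), iterator=True) as reader:
    first = reader.get_chunk(3)   # the first 3 rows
    second = reader.get_chunk(5)  # the next 5 rows, continuing where we left off

print(len(first), len(second))   # 3 5
print(second["value"].tolist())  # [3, 4, 5, 6, 7]
```

Each call resumes from where the previous one stopped, so the sizes can vary from call to call.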