Pandas read_excel: only read first few lines #16645

vfridkin · 2017-06-09T04:54:43Z

Code Sample, a copy-pastable example if possible

import pandas as pd
workbook_dataframe = pd.read_excel(workbook_filename, nrows=10)

Problem description

I'm using pandas read_excel on about 100 Excel files, some of them large, and I want to read only the first few lines of each (the header and the first few rows of data).

The code above doesn't work, but it illustrates the goal (here, reading the header plus 10 data rows).

chris-b1 · 2017-06-09T10:57:25Z

Sure, we could support this, although xlrd (the library we use to read Excel files) always reads the whole file into memory, so it wouldn't be as fast as you'd hope.

alysivji · 2017-06-10T17:18:27Z

I'm looking into adding this functionality. Thanks.

ppritish51 · 2017-06-25T14:08:19Z

You can add one line after the line that reads your file. For example, to keep only the first 10 rows:

workbook_dataframe = pd.read_excel(workbook_filename)
workbook_dataframe = workbook_dataframe.iloc[:10]

Or, more simply:
workbook_dataframe = pd.read_excel(workbook_filename).iloc[:10]

Either way, the DataFrame contains only the first 10 rows. Note that the whole sheet is still read and parsed before slicing.
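Because read_excel returns an ordinary DataFrame, the slicing step can be checked on an in-memory frame standing in for a parsed sheet. A minimal sketch; the column names and values are made up for illustration. It also shows that head(n) selects the same rows as iloc[:n].

```python
import pandas as pd

# made-up stand-in for a parsed sheet: the header row becomes the columns,
# followed by 100 data rows
sheet = pd.DataFrame({"id": range(100), "value": [i * 2 for i in range(100)]})

first10 = sheet.iloc[:10]  # positional slice of the first 10 data rows
assert first10.equals(sheet.head(10))  # head(n) selects the same rows
print(first10.shape)  # (10, 2)
```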

gmlander · 2017-09-25T20:51:59Z

To get nrows without reading the entire worksheet:

import pandas as pd

workbook = pd.ExcelFile(workbook_filename)

# get the total number of rows, header included (assuming you're dealing with the first sheet;
# workbook.book is an xlrd Book when pandas uses the xlrd engine)
rows = workbook.book.sheet_by_index(0).nrows

# define how many data rows to read
nrows = 10

# skip every row after the first nrows data rows: total rows minus the rows to keep,
# minus 1 for the header (clamped at 0 for sheets shorter than nrows)
workbook_dataframe = pd.read_excel(workbook, skipfooter=max(rows - nrows - 1, 0))
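The footer arithmetic above can be checked without opening a workbook. A minimal sketch, assuming the row count includes exactly one header row by default; `footer_rows_to_skip` is a hypothetical helper, not a pandas function. It clamps the result at 0, because a sheet with fewer than nrows data rows would otherwise produce a negative footer count, which is not a valid value to skip.

```python
def footer_rows_to_skip(total_rows: int, nrows: int, header_rows: int = 1) -> int:
    """Return the footer size that leaves only the first `nrows` data rows.

    Hypothetical helper: `total_rows` is the sheet's row count including
    the header row(s), e.g. the value of xlrd's Sheet.nrows.
    """
    if nrows < 0 or header_rows < 0:
        raise ValueError("nrows and header_rows must be non-negative")
    data_rows = total_rows - header_rows
    # every data row past the first nrows is treated as "footer"
    return max(data_rows - nrows, 0)


# 1 header row + 100 data rows, keep 10 -> skip the last 90
print(footer_rows_to_skip(101, 10))  # 90
# sheet shorter than requested: nothing to skip
print(footer_rows_to_skip(5, 10))  # 0
```

With the default header_rows=1 this is the same `rows - nrows - 1` formula as in the comment, plus the clamp.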

elllot · 2017-11-12T04:07:46Z

Could I take a crack at this issue? I'm new to open source and would really like to start contributing. If the last line in @gmlander's code is valid, I'd just have to identify where to place the code, right? Thanks in advance for any suggestions!

chris-b1 added the Difficulty Novice and IO Excel (read_excel, to_excel) labels Jun 9, 2017

chris-b1 added this to the Next Major Release milestone Jun 9, 2017

alysivji mentioned this issue Jun 11, 2017

Read excel nrows #16672

Closed

TomAugspurger added the good first issue label Oct 11, 2017

alysivji mentioned this issue Nov 26, 2017

Add nrows parameter to pandas.read_excel() #18507

Merged

jreback modified the milestones: Next Major Release, 0.22.0 Dec 3, 2017

jreback closed this as completed in #18507 Dec 9, 2017
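#18507 added an nrows parameter to pandas.read_excel(), so in pandas versions that include that change the original request can be written directly. A minimal sketch of the original use case (peeking at many workbooks), assuming such a pandas version and an installed Excel engine (e.g. openpyxl for .xlsx files); the helper name `read_heads` and the `workbooks` directory are hypothetical.

```python
from pathlib import Path

import pandas as pd


def read_heads(paths, nrows=10, sheet_name=0):
    """Read the header plus the first `nrows` data rows of each workbook.

    Hypothetical helper built on the `nrows` argument of pd.read_excel.
    Returns a dict mapping each path to its (small) DataFrame.
    """
    heads = {}
    for path in paths:
        # nrows counts data rows; the header row is parsed separately
        heads[path] = pd.read_excel(path, sheet_name=sheet_name, nrows=nrows)
    return heads


if __name__ == "__main__":
    # hypothetical directory holding the ~100 workbooks
    files = sorted(Path("workbooks").glob("*.xlsx"))
    for path, df in read_heads(files).items():
        print(path.name, df.shape)
```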
