Pandas read_excel: only read first few lines #16645

Closed
vfridkin opened this issue Jun 9, 2017 · 5 comments · Fixed by #18507
Labels
good first issue IO Excel read_excel, to_excel

Comments

@vfridkin

vfridkin commented Jun 9, 2017

Code Sample, a copy-pastable example if possible

workbook_dataframe = pd.read_excel(workbook_filename, nrows=10)

Problem description

I am using pandas read_excel on about 100 Excel files, some of them large, and I want to read only the first few lines of each (the header and the first few rows of data).

The above doesn't work, but it illustrates the goal (in this example, reading 10 data rows).

@chris-b1
Contributor

chris-b1 commented Jun 9, 2017

Sure, could support this, although xlrd (library we use to read Excel files) always reads the whole file into memory, so it wouldn't be as fast as you'd hope.
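
To illustrate the memory point, here is a rough sketch that streams only the first rows of a sheet. It swaps in a different technique from the one discussed in this thread: it uses openpyxl's read-only mode rather than xlrd, so it assumes an .xlsx file and that openpyxl is installed. The helper name read_first_rows is made up for this example and is not part of pandas.

```python
import pandas as pd
from openpyxl import load_workbook


def read_first_rows(source, nrows=10):
    """Hypothetical helper: header plus the first `nrows` data rows of the first sheet."""
    # read_only=True makes openpyxl yield rows lazily instead of building
    # the whole worksheet in memory.
    workbook = load_workbook(source, read_only=True, data_only=True)
    try:
        sheet = workbook.worksheets[0]
        # Row 1 is assumed to be the header, so fetch nrows + 1 rows in total.
        rows = list(sheet.iter_rows(min_row=1, max_row=nrows + 1, values_only=True))
    finally:
        workbook.close()

    if not rows:
        return pd.DataFrame()

    header, *data = rows
    return pd.DataFrame(data, columns=list(header))
```

Unlike read_excel, this sketch does no type inference or date parsing beyond what openpyxl returns for each cell, so treat it as a quick preview rather than a drop-in replacement.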

@chris-b1 chris-b1 added Difficulty Novice IO Excel read_excel, to_excel labels Jun 9, 2017
@chris-b1 chris-b1 added this to the Next Major Release milestone Jun 9, 2017
@alysivji
Contributor

alysivji commented Jun 10, 2017

I'm looking into adding this functionality. Thanks.

@alysivji alysivji mentioned this issue Jun 11, 2017
4 tasks
@ppritish51

ppritish51 commented Jun 25, 2017

You can add one line to your code just below where you read the file. For example, to keep only the first 10 rows:

workbook_dataframe = pd.read_excel(workbook_filename)
workbook_dataframe = workbook_dataframe.iloc[:10]

Or do it in a single line:

workbook_dataframe = pd.read_excel(workbook_filename).iloc[:10]

Either way, the data frame then contains only the first 10 rows. Note that the whole file is still read before the slice is taken.

@gmlander

To get nrows without parsing the entire worksheet into a DataFrame:

import pandas as pd

workbook = pd.ExcelFile(workbook_filename)

# get the total number of rows in the first sheet
# (relies on the xlrd workbook object exposed as workbook.book)
rows = workbook.book.sheet_by_index(0).nrows

# define how many data rows to read
nrows = 10

# skip everything after the header row and the first nrows data rows;
# max() guards against sheets with fewer than nrows + 1 rows
# (older pandas releases spelled this argument skip_footer)
workbook_dataframe = pd.read_excel(workbook, skipfooter=max(0, rows - nrows - 1))
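
The subtraction in that last line can be checked on its own. Below is a small sketch of just the arithmetic, assuming a single header row. The function name footer_rows_to_skip and the header_rows parameter are invented for this example, and the clamp to zero is an addition for sheets shorter than the requested preview.

```python
def footer_rows_to_skip(total_rows, nrows, header_rows=1):
    """Hypothetical helper: how many trailing rows to skip so that only
    the header and the first `nrows` data rows remain."""
    # Rows to keep = header_rows + nrows; everything after that is "footer".
    # Clamp at zero so a short sheet never produces a negative count.
    return max(0, total_rows - nrows - header_rows)


# A sheet with 1000 rows in total (1 header + 999 data rows), keeping 10 data rows:
print(footer_rows_to_skip(1000, 10))  # 989

# A sheet shorter than the requested preview: nothing to skip.
print(footer_rows_to_skip(5, 10))  # 0
```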

@elllot

elllot commented Nov 12, 2017

Could I take a crack at this issue? I'm new to open source and would really like to start contributing. If the last line in @gmlander's code is valid, I'd just have to identify where to place that logic, right? Thanks in advance for any suggestions!

@jreback jreback modified the milestones: Next Major Release, 0.22.0 Dec 3, 2017
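
For readers landing here later: recent pandas releases accept nrows in read_excel, so the call from the original report works as written. A minimal sketch follows, assuming such a release and an installed Excel engine (openpyxl for .xlsx files). The wrapper name preview_excel is hypothetical and exists only to keep the example self-contained.

```python
import pandas as pd


def preview_excel(source, nrows=10):
    """Hypothetical wrapper: header plus the first `nrows` data rows of the first sheet."""
    # nrows counts data rows; the header row is read in addition to them.
    return pd.read_excel(source, nrows=nrows)
```

For the roughly 100 files mentioned in the report, something like {name: preview_excel(name) for name in workbook_filenames} would collect one preview per file, where workbook_filenames is an assumed list of paths. As noted earlier in the thread, the underlying reader may still load the whole file, so this limits what ends up in the DataFrame rather than guaranteeing a faster read.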