pd.read_sas with chunksize option raises IndexError #31385

@jicky94

Good morning,

Using Python 3.6.

My problem seems close to issue #14734, although the error type is different. Please forgive my lack of expertise, but I cannot tell (1) whether my issue is really the same, or (2) whether the problem that affected some SAS files has been fixed or may still occur with certain SAS files (which I cannot provide, for the reasons detailed below).

I have read the posting guidelines, but I cannot attach a sample of my data or reproduce the full error message, because the data I am working with is on a server without internet access. I apologize for the inconvenience. I will try to provide as much of the requested information as I can below.

I am working with very large SAS files (one record per job, hence millions of rows) and got a memory error when simply trying to read them (strangely, they open fine in R or Stata). I therefore searched and found the chunksize option of pandas.read_sas, which reads the data in chunks. My code is now the following:

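import pandas as pd

# With chunksize set, read_sas returns an iterator over DataFrames
df_chunk = pd.read_sas(r'file.sas7bdat', chunksize=500)

chunk_list = []  # collects each chunk as it is read
for chunk in df_chunk:
    chunk_list.append(chunk)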
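
One thing worth keeping in mind with the loop above: appending every chunk to a list still ends up holding the whole file in memory, so it may run into the same memory error once the reader gets past the first chunks. A minimal sketch of the alternative, reducing each chunk as it arrives, is shown below. The helper name `summarize_chunks` is hypothetical, and plain lists stand in for the DataFrame chunks so the example runs without a SAS file.

```python
def summarize_chunks(chunks):
    """Count rows and chunks from an iterable without keeping the chunks.

    Works with any chunk type that supports len(), which for a pandas
    DataFrame is the number of rows.
    """
    n_rows = 0
    n_chunks = 0
    for chunk in chunks:
        n_rows += len(chunk)  # only the running total is kept
        n_chunks += 1
    return n_rows, n_chunks


# Stand-in data: lists of records play the role of DataFrame chunks.
demo_chunks = ([0] * 500, [0] * 500, [0] * 120)
totals = summarize_chunks(iter(demo_chunks))
print(totals)
```

With a real file, the iterator returned by pd.read_sas(r'file.sas7bdat', chunksize=500) would be passed in place of the stand-in data, and the row count would be replaced by whatever per-chunk aggregation or write-out is actually needed.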

Running this loop raises the following error (reproduced here by hand, since I cannot copy and paste from the server):

line 660, in _chunk_to_dataframe
if self.column_formats[j] in const.sas_date_formats:
IndexError: list index out of range
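
The traceback suggests that the list of column formats is shorter than the number of columns the reader iterates over. A rough way to check this, without sharing the data, might be to compare the lengths of the reader's per-column metadata. The sketch below is only an assumption about how that could be done: the attribute names `column_count`, `column_names` and `column_formats` are guesses at internals of the pandas SAS7BDAT reader, are not public API, and may differ between pandas versions, which is why they are looked up defensively. The helper name `check_column_metadata` is hypothetical, and a stand-in object is used so the example runs without a SAS file.

```python
from types import SimpleNamespace


def check_column_metadata(reader):
    """Compare the column count a reader reports with its per-column lists.

    The attribute names are assumptions about pandas internals, so each
    one is fetched with getattr and skipped if it is missing.
    """
    count = getattr(reader, "column_count", None)
    report = {"column_count": count}
    mismatched = []
    for name in ("column_names", "column_formats"):
        values = getattr(reader, name, None)
        length = None if values is None else len(values)
        report[name] = length
        if count is not None and length is not None and length != count:
            mismatched.append(name)
    report["mismatched"] = mismatched
    return report


# Stand-in object that mimics the suspected inconsistent state:
# three columns, but only two format entries.
fake_reader = SimpleNamespace(
    column_count=3,
    column_names=["a", "b", "c"],
    column_formats=["DATE9.", ""],
)
result = check_column_metadata(fake_reader)
print(result)
```

On the server, the object returned by pd.read_sas(r'file.sas7bdat', chunksize=500) could be passed to the helper before iterating. If a list comes back shorter than the column count, that would be consistent with the IndexError above and could be reported here without attaching the file.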

I am aware that this description of my issue is truncated and probably incomplete, but many thanks for any help you can provide,
Axelle

Metadata

    Labels

    IO SAS (SAS: read_sas), Needs Info (Clarification about behavior needed to assess issue)
