Pandas DataFrames with type == 'object' cannot be saved/restored #2330

Open
jfoster17 opened this issue Oct 11, 2022 · 0 comments

Describe the bug
Pandas DataFrames created within glue and added to the data_collection may have columns of type 'object', which means they cannot be saved and restored by glue (glue.core.state._load_numpy calls np.load() without allow_pickle=True). This is generally not a problem when reading files using the Pandas data_factory (which converts columns), but it does cause problems for, for instance, datasets retrieved from external sources within a glue session.

To Reproduce
Steps to reproduce the behavior such as:

  1. Create a Pandas DataFrame within glue and add it to the data_collection. For instance, one might use the process described in the documentation:
     from pandas import DataFrame
     df1 = DataFrame()
     df1['a'] = [1.2, 3.4, 2.9]   # numeric column
     df1['g'] = ['r', 'q', 's']   # string column, stored with type 'object'
     dc['dataframe'] = df1        # dc is the data_collection in the glue session
  2. Save Session (this new Data object will be stored as a numpy array within the session file since it did not come from an external file)
  3. Restore Session
  4. Get the following error:

ValueError: Object arrays cannot be loaded when allow_pickle=False
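
The same error can be triggered outside glue with NumPy alone, which may help isolate the cause. This is only a minimal sketch: it does not call glue.core.state._load_numpy, and the `roundtrip` helper and in-memory buffer are made up for illustration. It assumes the string column ends up as an object-dtype array when saved.

```python
import io

import numpy as np


def roundtrip(array):
    """Save an array to an in-memory .npy buffer and load it back with defaults."""
    buffer = io.BytesIO()
    np.save(buffer, array)   # np.save allows pickling by default
    buffer.seek(0)
    return np.load(buffer)   # np.load defaults to allow_pickle=False


# A float column round-trips without trouble.
floats = roundtrip(np.array([1.2, 3.4, 2.9]))

# An object-dtype array, standing in for the 'g' column, does not.
object_column = np.array(['r', 'q', 's'], dtype=object)
try:
    roundtrip(object_column)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Saving succeeds because np.save pickles object arrays by default; only the load step fails, which matches the session saving fine and then breaking on restore.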

Expected behavior
Pandas objects created within glue should not break session files.

We could simply pass allow_pickle=True to np.load(), but perhaps this has undesired side effects?
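
One possible alternative that avoids pickle entirely, sketched under assumptions: convert object columns to fixed-width string arrays before they are written. The helper name `to_picklefree_arrays` is hypothetical and not part of glue, and where such a conversion would belong in glue's save path is not something this sketch settles.

```python
import io

import numpy as np
import pandas as pd


def to_picklefree_arrays(df):
    """Map each column name to an array that np.load can read without pickle.

    Hypothetical helper, not glue code. Object columns are cast to
    fixed-width unicode, so non-string entries (None, NaN, mixed types)
    are turned into their string form, which is lossy.
    """
    arrays = {}
    for name in df.columns:
        values = df[name].to_numpy()
        if values.dtype.kind == 'O':
            values = values.astype(str)
        arrays[name] = values
    return arrays


df1 = pd.DataFrame({'a': [1.2, 3.4, 2.9], 'g': ['r', 'q', 's']})
arrays = to_picklefree_arrays(df1)

# Round-trip every column with pickling disabled on both sides.
restored = {}
for name, values in arrays.items():
    buffer = io.BytesIO()
    np.save(buffer, values, allow_pickle=False)
    buffer.seek(0)
    restored[name] = np.load(buffer)
```

The trade-off is that this only suits columns that are really strings; columns holding arbitrary Python objects would still need allow_pickle=True or some other serialization.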

Details:

  • Operating System: macOS 12.6
  • Python version: 3.9
  • Glue version: 1.6
  • How you installed glue: conda

Additional context
Sample session file attached:
pandas_dataframe_session.glu.gz
