Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable feather and parquet in S3 #361

Merged
merged 4 commits into from
Apr 3, 2024
Merged

Conversation

mamo3gr
Copy link
Contributor

@mamo3gr mamo3gr commented Mar 22, 2024

Problem

When I use .feather or .parquet with S3, got error

AttributeError: 'ReadableS3File' object has no attribute 'name'

ToDO

  • add tests for securing behavior with GCS

Copy link
Collaborator

@hirosassa hirosassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! except for todo.


def load(self, file):
loaded_df = pd.read_feather(file.name)
loaded_df = pd.read_feather(BytesIO(file.read()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mamo3gr

Can we avoid file.read() here? I think it will read all content onto memory.

I hope BytesIO accept file (reader-like) itself

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope BytesIO accept file (reader-like) itself

Unfortunately it seems not to do so.

https://docs.python.org/3/library/io.html#binary-i-o

Binary I/O (also called buffered I/O) expects bytes-like objects and produces bytes objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yokomotod

It is a little bit heavy to enable streaming read because we need to modify codes in luigi. I leave FIXME comment there in 7abbc33. Can we merge this PR with these comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally I added a conditional branching with respect to whether the passed file is reader-like or not in 65bb5b0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted the way of loading except S3 in 0a6880f.

@mamo3gr
Copy link
Contributor Author

mamo3gr commented Apr 3, 2024

add tests for securing behavior with GCS

I tried it and found to need the time. I wrote down the details to #363 and would proceed to merge. May I do so?

Copy link
Collaborator

@yokomotod yokomotod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mamo3gr mamo3gr merged commit 624200b into master Apr 3, 2024
5 checks passed
@mamo3gr mamo3gr deleted the feat/feather-and-parquet branch April 3, 2024 08:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants