PandasCursor interfering with pandas.DataFrame writing to s3 #465
Comments
https://github.com/laughingman7743/PyAthena/blob/master/pyathena/filesystem/s3.py#L403-L416 |
Thanks for the swift response. Totally understand that your package doesn't have support to write to s3. The issue is that once we import the |
https://github.com/laughingman7743/PyAthena/blob/master/pyathena/pandas/__init__.py#L4-L5 |
Hmm, this seems like quite a problematic requirement for using the pandas cursor. It means you can't really use PyAthena for data pipelines, because you can't read data from Athena and then write back to S3. I've thought of a workaround: be explicit about which S3 filesystem I want when writing to S3:
    s3 = s3fs.S3FileSystem()
    with s3.open(f"{s3_bucket}/my_prefix/file.csv", "w") as file_in:
        my_df.to_csv(file_in, index=False)
But would addressing this issue be something you are willing to think about? Either by using s3fs, by not registering your S3 implementation globally with fsspec (not sure if this is possible), or by extending your implementation to support writing? |
I used to use S3fs, but stopped using it because S3fs depends on aiobotocore, and aiobotocore sets only certain versions of botocore as dependencies, making it difficult to install dependencies. |
Yes this is fair enough - it's also a problem we've been encountering in projects.
Touché. Given that you don't want to include s3fs as a dependency, do you think the best fix is to implement an S3 writer in this package? That seems a slightly odd thing to be implementing in this package. The alternative would be to avoid registering your S3 implementation globally with fsspec. I'll admit I've not looked at the fsspec code at all and don't know whether things can be implemented without the global registration. |
I am not sure what is odd about it. |
As a workaround, you can convert query results to pandas yourself instead of using the E.g. This succeeds.
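A minimal sketch of converting query results to pandas by hand, assuming only a DB-API 2.0 style cursor that exposes `description` and `fetchall()`. The helper name `cursor_to_dataframe` is hypothetical, and `sqlite3` is used purely as a stand-in so the snippet runs without AWS credentials; it is not the code from the comment above.

```python
import sqlite3

import pandas as pd


def cursor_to_dataframe(cursor):
    """Build a DataFrame from a DB-API 2.0 cursor that has already executed a query."""
    # The first element of each description entry is the column name.
    columns = [col[0] for col in cursor.description]
    return pd.DataFrame.from_records(cursor.fetchall(), columns=columns)


# sqlite3 is only a stand-in here; any DB-API 2.0 cursor should work the same way.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])
cur.execute("SELECT id, name FROM t ORDER BY id")
df = cursor_to_dataframe(cur)
conn.close()
```

Because the helper touches nothing but the cursor, it should not depend on any pandas-specific cursor class. The trade-off is that all rows are fetched into Python objects first, which may be slower for large result sets.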
I'll suggest that it would be preferred for this library to not support pandas conversion at all, rather than to support it in a way that breaks typical pandas usage in other contexts. |
Many users will need to solve the dependency problems surrounding s3fs/aiobotocore anyway, so I would not consider this a sufficient reason to take the decision from the user's hands. The usual way of handling such a choice would be to offer an extra like pyathena[replace_s3fs] pulling your s3fs implementation as a separate package. |
* Support for writing with s3 filesystem (fix #465)
* Fix test cases
* Apply fmt
* Remove unnecessary vars
* Fix fixture args
* Add test cases for file writing
* Add type hints
I've found that importing PandasCursor interferes with pandas using s3fs/fsspec to write to S3.
Minimal viable example:
and I get the following error:
If you remove the PandasCursor import, this works fine: