ENH: support storage_options argument in read_parquet
#2071
Comments
I am guessing a bit here, but you may be able to read directly. Our parquet IO uses
s3 = fs.S3FileSystem(anonymous=True)
geopandas.read_parquet("s3://spatial-ucr/census/administrative/counties.parquet", filesystem=s3)
However, in my env this doesn't work on
Can you pin down the specification of the environment in which this fails?
Since our parquet IO is different from pandas under the hood, I am not sure to what degree we can reasonably mirror the pandas API here.
I'll keep looking, but i think i've narrowed it to
sorry, yeah i wrote that a little backwards. pinning
Well, the latest
i'll keep digging. thanks again. as you can imagine, things usually work with conda, but when i need to resort to pip, this issue pops up and it's hard to diagnose which combination of packages is responsible
One guess: it might be that if you pass an explicit filesystem object, you need to leave out the
And on the original topic: I think it's a good idea to add support for the
Although it's in theory superfluous with passing an actual filesystem object (and you can create an
Implementation-wise, I think we can do something like:
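For illustration only, here is a minimal sketch of how a storage_options argument could be turned into a filesystem object before the parquet file is read. It is not the maintainers' implementation, and the helper name resolve_filesystem, its signature, and the rule that filesystem and storage_options are mutually exclusive are all assumptions made for this example. It relies on fsspec.core.url_to_fs, which builds a filesystem from a URL plus keyword options.

```python
def resolve_filesystem(path, filesystem=None, storage_options=None):
    """Hypothetical helper: return a (filesystem, path) pair for a parquet reader.

    Assumption for this sketch: an explicit filesystem object and
    storage_options are treated as mutually exclusive.
    """
    if filesystem is not None:
        if storage_options:
            raise ValueError(
                "Pass either 'filesystem' or 'storage_options', not both"
            )
        # An explicit filesystem object is used unchanged.
        return filesystem, path

    is_url = isinstance(path, str) and "://" in path
    if not is_url and not storage_options:
        # Plain local path: let the reader handle it without a filesystem.
        return None, path

    # Imported lazily so that local reads do not require fsspec.
    from fsspec.core import url_to_fs

    # url_to_fs returns the filesystem and the path with the protocol stripped.
    fs, fs_path = url_to_fs(path, **(storage_options or {}))
    return fs, fs_path


# Possible usage (needs s3fs and pyarrow installed; the bucket name is a placeholder):
# import pyarrow.parquet as pq
# fs, fs_path = resolve_filesystem(
#     "s3://some-bucket/data.parquet", storage_options={"anon": True}
# )
# table = pq.read_table(fs_path, filesystem=fs)
```

With something along these lines, storage_options={"anon": True} would end up as keyword arguments to the s3fs filesystem constructor, while an explicitly passed filesystem object would keep working as before.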
Is your feature request related to a problem?
I store lots of data in a quilt bucket (i.e. S3 storage) and use s3fs with geopandas to read data directly from the wire, like
Often, that works perfectly. But depending on the combination of botocore/s3fs/aiobotocore/fsspec versions, it can throw
botocore.exceptions.NoCredentialsError: Unable to locate credentials.
Describe the solution you'd like
The pandas version of read_parquet supports passing storage_options={"anon": True}, which I believe will get around that particular error, but in geopandas that argument fails with TypeError: read_table() got an unexpected keyword argument 'storage_options'. It would be great if gpd.read_parquet would allow me to pass that arg as well.
API breaking implications
None
Describe alternatives you've considered
I could probably read the file directly with pandas, then convert the serialized geometry column myself, but that would skirt the nice efficient implementation already in the geopandas version of read_parquet :)