Query on filesize? #359

Closed
ghost opened this issue Apr 17, 2020 · 7 comments

Comments

@ghost

ghost commented Apr 17, 2020

Is it possible to query on file size?
E.g. what if I would like to download only full tiles, which are over 750 MB?

@j08lue
Contributor

j08lue commented Apr 17, 2020

> E.g. what if I would like to download only full tiles, which are over 750 MB?

For that I would use the footprint area. Something like this (not tested), in the case of Sentinel-2:

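import geopandas as gpd
import sentinelsat
import shapely.wkt

api = sentinelsat.SentinelAPI(**credentials)
aoi_geom = shapely.wkt.loads(aoi)  # aoi is a WKT string in lon/lat (EPSG:4326)
products = api.query(area=aoi, **query_parameters)
# Reproject footprints and AOI to the same metric CRS so areas come out in m^2.
# Note: EPSG:3857 is Web Mercator, not UTM, and inflates areas away from the equator.
gdf = api.to_geodataframe(products).to_crs("EPSG:3857")
aoi_proj = gpd.GeoSeries([aoi_geom], crs="EPSG:4326").to_crs("EPSG:3857").iloc[0]
cond = gdf.intersection(aoi_proj).area > 0.5e10  # only take scenes covering more than half a tile
api.download_all(gdf[cond].index)  # download_all expects product IDs, which are the index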
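
The snippet above projects to EPSG:3857, which is Web Mercator rather than a UTM zone, so the area threshold will behave differently at different latitudes. If areas in square metres matter, one option is to pick the UTM zone that contains the AOI. Below is a minimal sketch of that lookup. The helper name `utm_epsg` is hypothetical (it is not part of sentinelsat), and it ignores the special zone exceptions around Norway and Svalbard.

```python
import math

def utm_epsg(lon, lat):
    """EPSG code of the standard WGS84 UTM zone containing (lon, lat) in degrees."""
    # Zones are 6 degrees wide, numbered 1..60 starting at 180 degrees west
    zone = min(int(math.floor((lon + 180.0) / 6.0)) + 1, 60)
    # 326xx codes are northern hemisphere zones, 327xx are southern
    return (32600 if lat >= 0 else 32700) + zone
```

The result could then be passed on as something like `gdf.to_crs(epsg=utm_epsg(aoi_geom.centroid.x, aoi_geom.centroid.y))`, assuming the AOI is small enough to fit reasonably within a single zone.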

@j08lue
Contributor

j08lue commented Apr 17, 2020

But to answer your original question:

> Is it possible to query on file size?

Have a look at the metadata that the query returns. Probably the file size is in there?

@ghost
Author

ghost commented Apr 17, 2020

@j08lue Hmm, I like your implementation.
There was a comment here about the OData example (https://sentinelsat.readthedocs.io/en/stable/api.html#odata-example).
I'm going to try to fetch the file size from that.

@valgur
Member

valgur commented Apr 17, 2020

You can filter by file size, but it is not implemented well on the server side. The values are strings like 1.5 GB and 653 MB and they are handled as strings when trying to filter by this field as well.
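
As a small illustration of why string handling gets in the way, here is a plain-Python sketch. The sample values are made up, and it assumes that "handled as strings" amounts to character-by-character comparison. The actual server behaviour may differ.

```python
sizes = ["1.5 GB", "653 MB", "98.2 MB", "772.39 MB"]

# Lexicographic ordering compares character by character and ignores the unit
lexical = sorted(sizes)

UNIT_TO_MB = {"KB": 1e-3, "MB": 1.0, "GB": 1e3}

def to_mb(text):
    """Convert a 'value unit' string to megabytes."""
    value, unit = text.split()
    return float(value) * UNIT_TO_MB[unit]

# Ordering by the parsed value gives the order one would actually expect
numeric = sorted(sizes, key=to_mb)

print(lexical)
print(numeric)
```

In the lexical ordering the 1.5 GB product sorts first and the 98.2 MB product sorts last, which is why range filters on this field are unreliable.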

The closest thing to what you are trying to achieve would be

api.query(..., size="*MB")

The proper file size is only available from the OData API, unfortunately, and you will have to query it separately for each product.
See https://sentinelsat.readthedocs.io/en/stable/api.html#odata-example

Further filtering the returned product metadata by the size field in Pandas, similar to what @j08lue suggested, would probably be the best approach.
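
For the OData route, a rough sketch of the per-product lookup could look like the following. It assumes that `api.get_product_odata(product_id)` returns a dict whose `size` entry is the size in bytes, as in the linked OData example. The helper name `filter_by_odata_size` is hypothetical, and since it issues one request per product it can be slow for large result sets.

```python
def filter_by_odata_size(api, product_ids, min_bytes):
    """Return the product IDs whose OData-reported size is at least min_bytes."""
    selected = []
    for product_id in product_ids:
        # One OData request per product; 'size' is assumed to be a byte count
        info = api.get_product_odata(product_id)
        if info["size"] >= min_bytes:
            selected.append(product_id)
    return selected
```

With a real `SentinelAPI` instance this might be called as `filter_by_odata_size(api, list(products), 750 * 1000**2)`, and the returned IDs passed to `api.download_all`.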

@ghost
Author

ghost commented Apr 17, 2020

@valgur I'm trying it the OData way you described.
But there are other problems as well:

products_df = api.to_dataframe(products)
# The Pandas way does not work, because 'size' holds strings rather than numbers
filtered_products = products_df[products_df['size'] >= 400]
# The values are strings such as '772.39 MB' and need a lot of converting first
filesize = products_df['size'].iloc[0]

@valgur
Member

valgur commented Apr 17, 2020

You can do

products_df['size'] = products_df['size'].str.split(' ').apply(lambda x: float(x[0]) * {'GB': 1e3, 'MB': 1, 'KB': 1e-3}[x[1]])
filtered_products = products_df.query('size >= 400')

to get only products >= 400 MB in size, for example.
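
If the one-liner turns out to be too brittle (it raises on any unit missing from the dict and overwrites the original column), a slightly more defensive parser is sketched below. The function name `parse_size_mb` is hypothetical, and the `B` and `TB` units are assumptions. Only `KB`, `MB` and `GB` appear in this thread.

```python
import math
import re

# Conversion factors to megabytes; 'B' and 'TB' are assumed, not confirmed
_UNIT_TO_MB = {"B": 1e-6, "KB": 1e-3, "MB": 1.0, "GB": 1e3, "TB": 1e6}
_SIZE_RE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]+)\s*$")

def parse_size_mb(text):
    """Convert a string such as '772.39 MB' to megabytes, or NaN if unparseable."""
    match = _SIZE_RE.match(str(text))
    if match is None:
        return math.nan
    value, unit = match.groups()
    factor = _UNIT_TO_MB.get(unit.upper())
    return math.nan if factor is None else float(value) * factor
```

It could be applied with something like `products_df['size_mb'] = products_df['size'].map(parse_size_mb)` followed by `products_df[products_df['size_mb'] >= 400]`. Rows with an unparseable size become NaN and drop out of the comparison.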

@ghost
Author

ghost commented Apr 17, 2020

@valgur Cool, best workaround :)

kr-stn closed this as completed Apr 18, 2020