How to download only the enhanced_measurement.nc file from a Sentinel-3 SRAL product using a node filter? #566

Closed
usamarehan557 opened this issue Jul 3, 2022 · 13 comments

@usamarehan557

Hi,

I'm trying to download only the enhanced_measurement.nc file from a Sentinel-3 SRAL product using a node filter, but I get this error every time:

Unexpected nav segment Navigation Property: org.apache.olingo.odata2.core...

I think I'm not building my node_filter correctly, so kindly help me write a node_filter for my case.

Thanks a lot.

@avalentino
Contributor

Hi @usamarehan557, could you please provide more information?
A complete example reproducing the issue and a complete error traceback would surely help.

@usamarehan557
Author

from sentinelsat import SentinelAPI, geojson_to_wkt, make_path_filter, read_geojson


def print_products():
    # username and password are assumed to be defined elsewhere
    api = SentinelAPI(username, password, 'https://apihub.copernicus.eu/apihub')
    sukkur_footprint = geojson_to_wkt(read_geojson('sukkur_search_polygon.geojson'))

    products = api.query(
        sukkur_footprint,
        relativeorbitnumber=105,
        date=('20220201', '20220228'),
        platformname='Sentinel-3',
        filename='S3A_*',
        producttype='SR_2_LAN___',
        timeliness='Non Time Critical',
        instrumentshortname='SRAL',
        productlevel='L2')

    # print(products)

    if products:
        geojson = api.to_geojson(products)
        # geodataframe = api.to_geodataframe(products)
        # Only the first product returned by the query is handled
        properties = geojson.features[0].properties
        uuid = properties['uuid']
        title = properties['title']
        file_path = '*measurement/*'
        path_filter = make_path_filter(file_path)
        product_info = api.get_product_odata(uuid)
        is_online = product_info['Online']

        if is_online:
            print(f'Product {uuid} is online. Starting download. Title : {title}.SEN3')
            # print(geodataframe)
            try:
                api.download(uuid, nodefilter=path_filter)
                # requests.get('https://scihub.copernicus.eu/dhus/odata/v1/Products('{uuid}')/Nodes('{title}')/Nodes(
                # 'enhanced_measurement.nc')/$value', headers={})
            except Exception as e:
                print(e)
        else:
            print(f'Product {uuid} is not online.')
            # api.trigger_offline_retrieval(uuid)
    else:
        print('Products not found')

Error Log:

Product a8cf9e3f-f91e-473e-9067-811969feda7f is online. Starting download. Title : S3A_SR_2_LAN____20220217T052540_20220217T061609_20220314T222335_3029_082_105______LN3_O_NT_004.SEN3
HTTP status 500 Internal Server Error: Unexpected nav segment Navigation Property: org.apache.olingo.odata2.core.edm.provider.EdmNavigationPropertyImplProv@65a4a2db, Target Entity Set: org.apache.olingo.odata2.core.edm.provider.EdmEntitySetImplProv@7c40a3ab, Key Predicates: [KeyPredicate: literal=manifest.safe, propertyName=org.apache.olingo.odata2.core.edm.provider.EdmSimplePropertyImplProv@6069d536]

Process finished with exit code 0
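
For reference, a node filter is a callable that receives a node-info dictionary and returns True for the files to keep. The sketch below uses only the standard library to check which glob patterns would select enhanced_measurement.nc. It assumes that node paths look like "./enhanced_measurement.nc" (relative to the product folder, in the same style as the "./manifest.safe" node path quoted later in this thread) and that make_path_filter applies fnmatch-style matching to them; neither assumption is confirmed here, and keep_enhanced_measurement is a hypothetical name.

```python
from fnmatch import fnmatch

# Assumed node path of the wanted file, relative to the product folder.
NODE_PATH = "./enhanced_measurement.nc"


def keep_enhanced_measurement(node_info):
    """Hypothetical node filter: keep only enhanced_measurement.nc."""
    return fnmatch(node_info["node_path"].lower(), "*/enhanced_measurement.nc")


# The pattern from the report expects a "measurement/" directory,
# which the assumed path does not contain.
print(fnmatch(NODE_PATH, "*measurement/*"))
# A pattern that ends in the file name itself does match.
print(keep_enhanced_measurement({"node_path": NODE_PATH}))
```

Changing the pattern alone would probably not avoid the HTTP 500 error above: the key predicate in the error message is manifest.safe, which suggests the request fails while fetching the manifest, before any filter is applied.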

@avalentino
Contributor

It seems that in S3 products the manifest file is named xfdumanifest.xml (instead of manifest.safe).
IMHO the current implementation does not support this.

@valpamp
Contributor

valpamp commented Sep 22, 2022

I encountered the same error while trying to download individual bands of a Sentinel 3 SY_2_SYN___ product. Could anyone point me to the code that handles partial downloads? Would a PR be welcome on this issue?

@avalentino
Contributor

IMHO it would be indeed appreciated.
IMHO the key problem is in the sentinelsat.SentinelAPI._get_manifest method:

def _get_manifest(self, product_info, path=None):

@valpamp
Contributor

valpamp commented Sep 22, 2022

Alright, after delving into the code a little (it was my first time looking at the API), I managed to make it work.

The issue lies with the fact that the code always assumes that the folder path of the product ends with '.SAFE', while Sentinel 3 paths end with '.SEN3', and that the manifest file is named 'manifest.safe', while in Sentinel 3 products it is named 'xfdumanifest.xml'.

More in detail, the portions of the code that prevent node filtering from working with Sentinel-3 are the following:

  1. The _get_manifest function always sets the "node_path" field to "manifest.safe"
    node_info["node_path"] = "./manifest.safe"
    and also appends "manifest.safe" to the return value of the _path_to_url function
    url = self._path_to_url(product_info, "manifest.safe", "json")
  2. The _path_to_url function always assumes that the folder path ends with ".SAFE"
    return self._get_odata_url(id, f"/Nodes('{title}.SAFE')/{path}{urltype}")
    url = self._path_to_url(product_info, "manifest.safe", "value")
  3. Finally, the _download_with_node_filter function always assumes that the folder path ends in '.SAFE' and that the manifest file is named 'manifest.safe'
    product_path = Path(directory) / (product_info["title"] + ".SAFE")
    product_info["node_path"] = "./" + product_info["title"] + ".SAFE"
    manifest_path = product_path / "manifest.safe"

Simply replacing '.SAFE' with '.SEN3' and 'manifest.safe' with 'xfdumanifest.xml' allows the _filter_nodes function to correctly find the manifest file and to correctly return the filtered sub-products.

A simple fix would be to rewrite these portions of code so that they properly handle products with a differently named path and manifest file. Is there a recommended way to go about this? Could one infer the product type from the OData output and set the manifest filename and path ending accordingly?

Furthermore, are there any other Sentinel products that have differently ending paths and differently named manifest files and that should be handled accordingly?
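
To make the difference concrete, here is a minimal sketch of how the manifest URL changes between the two layouts. It mirrors the f-string quoted in item 2 but is not the library's code: node_url is a hypothetical helper, and the base URL and the trailing "/$value" segment are taken from the commented-out request in the original report.

```python
# Base OData endpoint, as it appears in the commented-out request in the report.
ODATA_BASE = "https://scihub.copernicus.eu/dhus/odata/v1"


def node_url(uuid, title, extension, node_name):
    """Hypothetical helper: URL of one top-level file inside a product."""
    return (
        f"{ODATA_BASE}/Products('{uuid}')"
        f"/Nodes('{title}{extension}')/Nodes('{node_name}')/$value"
    )


# What the hard-coded names produce for any product ...
print(node_url("<uuid>", "<title>", ".SAFE", "manifest.safe"))
# ... and what a Sentinel-3 product needs instead.
print(node_url("<uuid>", "<title>", ".SEN3", "xfdumanifest.xml"))
```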

@avalentino
Contributor

IMHO the better solution would be to add a couple of private methods for getting the "manifest" filename and the "product directory" name, and to use them consistently across the code.
Of course, I leave it to @valgur and the other developers to provide better hints about the implementation details.

@valpamp
Contributor

valpamp commented Sep 23, 2022

Querying the full detailed product metadata when calling the get_product_odata function allows us to get the product directory using the 'Filename' key:

values = _parse_odata_response(response.json()["d"])

Unfortunately there is no mention of the manifest filename, and I see no way to retrieve it even going through the raw scihub response. However, the issue could very easily be fixed by

  1. calling the get_product_odata function with full=True to retrieve the detailed metadata
  2. using the 'Filename' key to get the product path
  3. creating a 'Manifest name' key using a simple conditional statement:
    if values['Filename'].endswith('.SAFE'):
        values['Manifest name'] = 'manifest.safe'
    elif values['Filename'].endswith('.SEN3'):
        values['Manifest name'] = 'xfdumanifest.xml'
  4. using the 'Filename' and 'Manifest name' keys of the product_info and node_info to make sure that the functions I mentioned in my previous comment create the path and the filename correctly

I tried this in my local anaconda environment and tested it with Sentinel-1, 2 and 3, and everything seems to work fine.
Since the current code simply assumes that all product paths end in .SAFE and all manifest files are named manifest.safe, I think this fix should not break anything. I will soon fork the code and submit a PR in case the repo owners are interested.
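
The conditional from step 3 could be wrapped in a small function along these lines. This is only a sketch of the idea, not the code from the fork: add_manifest_name is a hypothetical name, the 'Manifest name' key is the one proposed above, and the 'Filename' value in the example is a placeholder.

```python
# Manifest file name for each product-folder extension mentioned in this thread.
MANIFEST_BY_EXTENSION = {
    ".SAFE": "manifest.safe",
    ".SEN3": "xfdumanifest.xml",
}


def add_manifest_name(values):
    """Hypothetical helper: return a copy of `values` with a 'Manifest name' key."""
    for extension, manifest in MANIFEST_BY_EXTENSION.items():
        if values["Filename"].endswith(extension):
            return {**values, "Manifest name": manifest}
    # Fail loudly instead of silently assuming manifest.safe.
    raise ValueError(f"Unrecognised product folder: {values['Filename']!r}")


print(add_manifest_name({"Filename": "EXAMPLE_PRODUCT.SEN3"})["Manifest name"])
```

Raising on an unknown extension is a design choice made for this sketch; falling back to manifest.safe would instead preserve the current behaviour for any product type not listed.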

@kr-stn
Member

kr-stn commented Sep 23, 2022

PRs are always encouraged to fix bugs.

Until now there was no need to change the naming, since it was designed when the communication from Copernicus was that all files would end in .SAFE and have a manifest.xml. They obviously changed the data format and we should account for this now.

One consideration that I can see becoming an issue is the performance of the query. If I am understanding you correctly, the proposal is to use full=True for every query? If that is the case, we are increasing the amount of data that needs to be transferred many times over and will likely run into performance issues / increased query time / timeout issues.

If that is the case we might have to fork this earlier in the application logic to get a list of the .SEN3 data and only treat those differently. I don't think using full=True for all queries will be performant enough.

@valpamp
Contributor

valpamp commented Sep 23, 2022

> If I am understanding you correctly, the proposal is to use full=True for every query? If that is the case, we are increasing the amount of data that needs to be transferred many times over and will likely run into performance issues / increased query time / timeout issues.

Yes. I am running the fork locally and I'm not seeing any noticeable slowdown, but I understand that on the server side one may want to limit the traffic as much as possible. It would be ideal if the folder path and manifest filename were passed along with the basic query, but the latter is not included even in the full version, and I have no idea where to look to try and approach this.

@avalentino
Contributor

@valpamp IMHO from the "title" you should be able to identify the sensor, because Sentinel products have a fixed naming convention. The product name starts with S1 for Sentinel-1, S2 for Sentinel-2 and so on.
This information should be enough to switch to the proper extension ".SAFE" vs ".SEN3" and manifest file name.

@valpamp
Contributor

valpamp commented Sep 23, 2022

> @valpamp IMHO from the "title" you should be able to identify the sensor, because Sentinel products have a fixed naming convention. The product name starts with S1 for Sentinel-1, S2 for Sentinel-2 and so on. This information should be enough to switch to the proper extension ".SAFE" vs ".SEN3" and manifest file name.

If this is enough I can just check if the "title" field starts with 'S3' and set the paths accordingly. It will only work for Sentinel-3 products, but I don't think there are any others that need this treatment at the moment.
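
A sketch of that title-based switch, in the spirit of the two private helpers suggested earlier in the thread. The function names are hypothetical, and the only facts used are the ones stated in this discussion: Sentinel-3 titles start with S3, their folders end in .SEN3, and their manifest is named xfdumanifest.xml. The example title is the one from the error log.

```python
def product_dir_name(title):
    """Hypothetical helper: folder name of a product, derived from its title."""
    return title + (".SEN3" if title.startswith("S3") else ".SAFE")


def manifest_filename(title):
    """Hypothetical helper: manifest file name, derived from the product title."""
    return "xfdumanifest.xml" if title.startswith("S3") else "manifest.safe"


title = (
    "S3A_SR_2_LAN____20220217T052540_20220217T061609_"
    "20220314T222335_3029_082_105______LN3_O_NT_004"
)
print(product_dir_name(title))
print(manifest_filename(title))
```

Because the title is already part of the basic query result, this variant would not need the extra full=True request discussed above.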

@ngilles

ngilles commented Sep 27, 2022

Yeah, "data formatting" in Sentinel products is a mess... and slow bureaucracy didn't help in getting things aligned. Technically there is a spec for "Sentinel-SAFE", which was a sort of extension (loosening in some cases) of the "SAFE" packaging format, which is why the groups working on Sentinel-3 opted for the .SEN3 extension (Sentinel-2 peeps stuck to .SAFE). This spec allows the manifest to be called any of (IIRC):

  • manifest.safe
  • manifest.xml
  • xfdumanifest.xml
  • or any of those 3 in capital letters (because you know, just in case someone decided to store these on an OpenVMS system 😉)

I worked on the Sentinel-3 processors (the software that makes the images), and when we were asked to review/comment on the spec, I pushed for manifest.safe, because it's the one with the least generic extension, meaning one could pick the .safe extension to associate it with their preferred product manipulation tool. Somehow for Sentinel-3, xfdumanifest.xml was chosen and remained, so now some tech support people have to answer questions like "why does my product open a web browser?"

I guess I didn't shout loud enough, I will totally accept the blame for this one 🤣

All this to say that yes, the product name (title) is pretty well defined (though there are some revisions), they will always be formatted this way, and Sentinel-3 products will always start with S3?_ (here's the official spec: https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-3-olci/naming-convention)
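
If a product type turned up with yet another of the permitted manifest names, one defensive option would be to look for any of them in the product's top-level file listing rather than hard-coding a single name. The sketch below assumes such a listing is available as a plain list of names, which this thread does not confirm; find_manifest is a hypothetical helper, and the candidate list is the one recalled above.

```python
# Candidate manifest names as recalled in the comment above (lower-case forms).
MANIFEST_CANDIDATES = ("manifest.safe", "manifest.xml", "xfdumanifest.xml")


def find_manifest(node_names):
    """Hypothetical helper: first node whose name is a known manifest name.

    The comparison is case-insensitive to cover the all-capitals variants.
    """
    for name in node_names:
        if name.lower() in MANIFEST_CANDIDATES:
            return name
    return None


print(find_manifest(["enhanced_measurement.nc", "xfdumanifest.xml"]))
print(find_manifest(["MANIFEST.SAFE", "measurement"]))
```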

@j08lue closed this as completed Jan 22, 2024