Fix nodefilter not working with Sentinel-3 products #573

valpamp · 2022-09-23T14:35:29Z

Fixes issue #566, which was caused by the code not handling the product folder path and manifest filename based on the type of product. Since most Sentinel products have a folder path ending in .SAFE and a manifest file named manifest.safe, this was an issue only with products such as Sentinel-3, whose folder path ends in .SEN3 and whose manifest file is named xfdumanifest.xml.
By fetching the full odata metadata of the queried product, it is possible to get the correct folder path, and sets the manifest filename accordingly.

If there are products whose folder path does not end in '.SAFE' or '.SEN3', they should be handled accordingly.

Introduced the 'manifest_name' key in the odata dictionary in order to properly handle the product path and manifest filename depending on the type of product.

The _download_with_node_filter function now uses the 'Filename' and 'manifest_name' dictionary keys to create the product folder and manifest filename depending on the type of product queried by the user. The _dataobj_to_node_info function sets the "node_path" dictionary key using the path variable rather than recalling the "href" key of the dataobj_info object, so that the initial './' characters are removed if they were present in the original object. Not doing so causes the failure of the nodefilter function for Sentinel-3 products. Partial downloads of other products work normally.

Now the product folder path and the manifest filename are set based on the 'title' key of the odata dictionary. This ensures that Sentinel-3 paths and manifest filenames are set correctly, allowing nodefilters to function also with these products. The previously introduced 'Filename' key was removed in favour of the newly introduced 'manifest_name', which is then used to build the paths throughout the code.

Build the product folder path without querying the full metadata through get_product_odata, making use of the new 'folder_path' key of the product_info dictionary.

avalentino

LGTM

j08lue · 2022-09-26T13:22:58Z

I agree, looks good and is an improvement in terms of avoiding hard-coded names.

CI shows you need to Black-format, please, then we'll know how the rest of the tests go.

valpamp · 2022-09-26T14:38:57Z

CI shows you need to Black-format, please, then we'll know how the rest of the tests go.

I used the black linter included in Spyder and the number of changes in the code is much higher than I expected. From what I can see basically all of them seem to be breaking down lines that are longer than 79 characters. Is this ok? Or did I use the wrong code formatter?

avalentino · 2022-09-26T15:45:39Z

@valpamp the configuration for black is in the pyproject.toml file

sentinelsat/pyproject.toml

Lines 1 to 16 in 3682e43

    
           [tool.black] 
        
           line-length = 100 
        
           target_version = ['py34'] # required for backwards compatability with py27 and trailing commas 
        
           include = '\.pyi?$' 
        
           exclude = ''' 
        
           ( 
        
             /( 
        
               \.cache 
        
               \.git 
        
               \.sentinelsat.egg-info 
        
               _build 
        
               docs 
        
             )/ 
        
           ) 
        
           '''

The line-length is supposed to be 100.

valpamp · 2022-09-27T09:14:15Z

I finally managed to run black with using the proper configuration file, it should be ok now.

j08lue · 2022-09-27T13:29:38Z

Great, now Black is happy, but one test is failing because the product info we expect (from recordings) is different from what we have now, where the folder_path and manifest_name properties are added. https://github.com/sentinelsat/sentinelsat/actions/runs/3134363937/jobs/5090260771#step:8:80

Would you try to re-record that cassette, @valpamp? There are some instructions on that here: https://github.com/sentinelsat/sentinelsat/#tests If it does not work for you, we can help.

Before you do, I am not super happy with the naming of folder_path. It is more like a filename - that, for unzipped products, is more like a folder name. You dub it filename yourself here:
https://github.com/sentinelsat/sentinelsat/pull/573/files#diff-8539d03b31a9a42eeab364c8eafc747a91bef19778c6cd4fc3630488c1fc7439R1075

So should we still consider renaming it to target_filename or product_name or something? I think folder_path is perhaps a bit too generic for what this property is.

valpamp · 2022-09-28T17:21:26Z

product_name sounds good to me, I will make sure to rename it. Regarding the test, I have skimmed through the vcrpy documentation, and if I understand correctly I should just run pytest on the failing test using the --vcr-record all argument to force the cassette to be re-recorded, and if everything goes right I should see the associated yaml file change in the vcr_cassettes folder.

j08lue · 2022-09-29T09:33:01Z

Great, thanks, @valpamp! Sounds right with --vcr-record all.

Removed the initial './' to match the actual content of the product_info set.

valpamp · 2022-11-21T16:34:56Z

Hey everyone, sorry this took so long, I had to turn in my PhD thesis and I have been completely absorbed in that for a while.

Today I finally ran pytest in my conda environment, and the test failure seems to be caused by a mismatch between the content of the product_node set and the second string:

sentinelsat/tests/test_download.py

Line 376 in 7fd4122

    
           assert set(product_info["nodes"]) == {"./manifest.safe", "./preview/map-overlay.kml"}

The product_node set contains the string 'preview/map-overlay.kml', but it is being checked against './preview/map-overlay.kml'. Removing the initial './' from the control string and running pytest with --vcr-record all apparently solves the issues, and the cassettes are re-recorded successfully.

However, if I now run pytest without the --vcr-record all option, test_download_nodes fails because of an xml parsing error. The culprit is the manifest.SAFE file of the S1A_IW_GRDH_1SDV_20210722T182826_20210722T182851_038894_0496E2_5BD9 product being downloaded during the test: checking the manifest.SAFE file, I can clearly see that it ends abruptly at the line where the xml parsing library complains, but I have no idea why. Any suggestions?

j08lue · 2022-11-21T21:37:39Z

I can see the test is using the first of these recent Sentinel-1 products:

sentinelsat/tests/conftest.py

Line 293 in 7fd4122

    
           products = api.query(date=("NOW-1MONTH", None), identifier="*IW_GRDH*", limit=3)

sentinelsat/tests/test_download.py

Line 366 in 7fd4122

uuid = node_test_products[0]["id"]

Have you tried using a different one, e.g. the second from that query? Or try again tomorrow, when the query hopefully returns some different products. If the problem persists, it might be more systematic.

Thanks for your efforts - and good luck with the thesis!

valpamp · 2022-11-23T10:33:02Z

I ran more tests with the other two products and I am getting different errors. If I run the test with --vcr-record all everything works fine in both cases, but then if I re-run the test without the option I get a CannotOverwriteExistingCassette error, which does not make much sense to me since I just re-recorded the cassette while pointing to the same product. But maybe it's just me having no idea how VCR works 😅
I also tried seeing if something changed with the first product, but the parsing error persists for now, the xml still appears to be truncated.

Finally, yesterday I found and solved another weird issue with the node filter on Sentinel-2 products from 2018. But I will open another issue/PR for that.

j08lue · 2022-11-25T13:23:31Z

I can reproduce the issue with the truncated manifest.safe even on current main:

Run pytest -v --record-vcr all tests/test_download.py::test_download_product_nodes --> test passes
Run pytest -v tests/test_download.py::test_download_product_nodes --> test fails

______________________________________________________________________________________________________________________________________________________________________ test_download_product_nodes _______________________________________________________________________________________________________________________________________________________________________

api = <sentinelsat.sentinel.SentinelAPI object at 0x11012bdc0>, tmpdir = local('/private/var/folders/fj/h7p8zmrs5pv538jqx4b0z7c80000gn/T/pytest-of-jonassolvsteen/pytest-8/test_download_product_nodes0')
node_test_products = [{'Creation Date': datetime.datetime(2021, 7, 22, 19, 35, 3, 413000), 'Ingestion Date': datetime.datetime(2021, 7, 22,...ime(2021, 7, 22, 19, 24, 14, 449000), 'Online': True, 'date': datetime.datetime(2021, 7, 22, 17, 43, 56, 232000), ...}]

    @pytest.mark.vcr
    @pytest.mark.scihub
    def test_download_product_nodes(api, tmpdir, node_test_products):
        uuid = node_test_products[0]["id"]
        product_dir = node_test_products[0]["title"] + ".SAFE"
        expected_path = tmpdir.join(product_dir)
    
        nodefilter = make_path_filter("*preview/*.kml")
>       product_info = api.download(uuid, str(tmpdir), checksum=True, nodefilter=nodefilter)

tests/test_download.py:371: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
sentinelsat/sentinel.py:590: in download
    return downloader.download(id, directory_path)
sentinelsat/download.py:133: in download
    return self._download_with_node_filter(id, directory, stop_event)
sentinelsat/download.py:164: in _download_with_node_filter
    node_infos = self._filter_nodes(manifest_path, product_info, self.node_filter)
sentinelsat/download.py:779: in _filter_nodes
    xmldoc = etree.parse(manifest)
/opt/homebrew/Cellar/python@3.10/3.10.6_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/xml/etree/ElementTree.py:1222: in parse
    tree.parse(source, parser)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <xml.etree.ElementTree.ElementTree object at 0x1101293c0>, source = <_io.BufferedReader name='/private/var/folders/fj/h7p8zmrs5pv538jqx4b0z7c80000gn/T/pytest-of-jonassolvsteen/pytest-8/test_download_product_nodes0/S1A_IW_GRDH_1SDV_20210722T182826_20210722T182851_038894_0496E2_5BD9.SAFE/manifest.safe'>
parser = <xml.etree.ElementTree.XMLParser object at 0x10773f520>

    def parse(self, source, parser=None):
        """Load external XML document into element tree.
    
        *source* is a file name or file object, *parser* is an optional parser
        instance that defaults to XMLParser.
    
        ParseError is raised if the parser fails to parse the document.
    
        Returns the root element of the given source document.
    
        """
        close_source = False
        if not hasattr(source, "read"):
            source = open(source, "rb")
            close_source = True
        try:
            if parser is None:
                # If no parser was specified, create a default XMLParser
                parser = XMLParser()
                if hasattr(parser, '_parse_whole'):
                    # The default XMLParser, when it comes from an accelerator,
                    # can define an internal _parse_whole API for efficiency.
                    # It can be used to parse the whole source without feeding
                    # it with chunks.
>                   self._root = parser._parse_whole(source)
E                   xml.etree.ElementTree.ParseError: unclosed token: line 292, column 69

/opt/homebrew/Cellar/python@3.10/3.10.6_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/xml/etree/ElementTree.py:580: ParseError

The copy of the file S1A_IW_GRDH_1SDV_20210722T182826_20210722T182851_038894_0496E2_5BD9.SAFE/manifest.safe that pytest picks up from its temp dir is truncated after 292 lines.

j08lue · 2022-11-25T13:30:10Z

The recorded fixture tests/fixtures/vcr_cassettes/data/manifest.safe appears to be complete (332 lines). It is the copy that pytest creates in its temporary directory that is truncated for some reason.

j08lue · 2022-11-25T13:49:01Z

Btw, why is the manifest.safe even downloaded when the node filter pattern in the test does not seem to include it: "*preview/*.kml"?

sentinelsat/tests/test_download.py

Lines 370 to 371 in 8211224

    
           nodefilter = make_path_filter("*preview/*.kml") 
        
           product_info = api.download(uuid, str(tmpdir), checksum=True, nodefilter=nodefilter)

avalentino · 2022-11-25T14:52:16Z

@j08lue the manifest contains the full list of files included in the product. It is always downloaded and parsed to extract the file list (and associate metadata), then filters use this information to decide if a file shall be actually downloaded.

To keep things consistent since the rest of the code uses "directory" instead of "folder".

codecov · 2023-03-10T11:11:46Z

Codecov Report

Merging #573 (4e13ae5) into main (7fd4122) will increase coverage by 0.07%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #573      +/-   ##
==========================================
+ Coverage   84.44%   84.51%   +0.07%     
==========================================
  Files           7        7              
  Lines        1292     1298       +6     
  Branches      335      322      -13     
==========================================
+ Hits         1091     1097       +6     
  Misses        145      145              
  Partials       56       56

Impacted Files	Coverage Δ
sentinelsat/download.py	`73.15% <100.00%> (ø)`
sentinelsat/sentinel.py	`91.25% <100.00%> (+0.07%)`	⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

No longer supported by the GitHub action.

Windows and MacOS started to require a token for some reason. Might be a bug.

j08lue

Thanks for completing this long-standing PR, @valgur.

valpamp added 2 commits September 23, 2022 16:12

Update sentinel.py

0e260d6

Introduced the 'manifest_name' key in the odata dictionary in order to properly handle the product path and manifest filename depending on the type of product.

kr-stn requested a review from valgur September 23, 2022 15:25

kr-stn added hacktoberfest-accepted bug labels Sep 23, 2022

valpamp added 2 commits September 23, 2022 19:07

Avoid calling full=True in get_product_odata

e0f5498

Build the product folder path without querying the full metadata through get_product_odata, making use of the new 'folder_path' key of the product_info dictionary.

avalentino approved these changes Sep 23, 2022

View reviewed changes

valpamp added 4 commits September 27, 2022 10:47

Run black code-formatter with line lenght 100

3c30326

Run black code formatter with line length 100

2eca799

Ran black formatter with correct configuration file

ed050ec

Ran black formatter with correct configuration file

0b301c0

Corrected test_download_product_nodes assertion

bcc5f09

Removed the initial './' to match the actual content of the product_info set.

Valerio Pampanoni added 2 commits November 21, 2022 17:44

Re-recorded test_download cassettes

7ace32a

Merge branch 'main' of https://github.com/valpamp/sentinelsat

8211224

Merge branch 'main' into main

f5f91b8

valgur added 3 commits March 10, 2023 11:20

Use product_root_dir for the new custom OData field

99b4aeb

To keep things consistent since the rest of the code uses "directory" instead of "folder".

Update a broken test

882472d

Re-record all cassettes

d6ccc1b

Drop Python 3.6 from CI

c2a53e8

No longer supported by the GitHub action.

valgur approved these changes Mar 10, 2023

View reviewed changes

valgur requested a review from j08lue March 10, 2023 11:34

valgur force-pushed the main branch from 14e1f63 to 9dbd8ba Compare March 10, 2023 11:38

CI: Upload Codecov report only on Ubuntu

4e13ae5

Windows and MacOS started to require a token for some reason. Might be a bug.

valgur force-pushed the main branch from 9dbd8ba to 4e13ae5 Compare March 10, 2023 12:04

j08lue approved these changes Mar 10, 2023

View reviewed changes

valgur merged commit aa169a8 into sentinelsat:main Mar 10, 2023
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix nodefilter not working with Sentinel-3 products #573

Fix nodefilter not working with Sentinel-3 products #573

valpamp commented Sep 23, 2022

avalentino left a comment

j08lue commented Sep 26, 2022

valpamp commented Sep 26, 2022

avalentino commented Sep 26, 2022

valpamp commented Sep 27, 2022

j08lue commented Sep 27, 2022 •

edited

valpamp commented Sep 28, 2022

j08lue commented Sep 29, 2022

valpamp commented Nov 21, 2022

j08lue commented Nov 21, 2022

valpamp commented Nov 23, 2022

j08lue commented Nov 25, 2022 •

edited

j08lue commented Nov 25, 2022 •

edited

j08lue commented Nov 25, 2022

avalentino commented Nov 25, 2022

codecov bot commented Mar 10, 2023 •

edited

j08lue left a comment

Fix nodefilter not working with Sentinel-3 products #573

Fix nodefilter not working with Sentinel-3 products #573

Conversation

valpamp commented Sep 23, 2022

avalentino left a comment

Choose a reason for hiding this comment

j08lue commented Sep 26, 2022

valpamp commented Sep 26, 2022

avalentino commented Sep 26, 2022

valpamp commented Sep 27, 2022

j08lue commented Sep 27, 2022 • edited

valpamp commented Sep 28, 2022

j08lue commented Sep 29, 2022

valpamp commented Nov 21, 2022

j08lue commented Nov 21, 2022

valpamp commented Nov 23, 2022

j08lue commented Nov 25, 2022 • edited

j08lue commented Nov 25, 2022 • edited

j08lue commented Nov 25, 2022

avalentino commented Nov 25, 2022

codecov bot commented Mar 10, 2023 • edited

Codecov Report

j08lue left a comment

Choose a reason for hiding this comment

j08lue commented Sep 27, 2022 •

edited

j08lue commented Nov 25, 2022 •

edited

j08lue commented Nov 25, 2022 •

edited

codecov bot commented Mar 10, 2023 •

edited