Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix SAFE SAR azimuth noise array construction #1941

Merged
merged 5 commits into from Dec 15, 2021

Conversation

mraspaud
Copy link
Member

@mraspaud mraspaud commented Dec 15, 2021

The implementation of the azimuth noise array in the SAR-C SAFE reader was assuming a nicer arangement of data blocks than what can happen in reality.

In this PR, we fix this issue by filling in missing block that are between existing data blocks, so that each horizontal slice of the azimuth noise array has the correct number of columns.

Another thing this PR fixes is the use of the insecure xml.etree library by replacing it by defusedxml as recommended by bandit. This adds a new dependency for the SAR reader, and also for the satpy tests in general.

  • Tests added
  • Fully documented

@codecov
Copy link

codecov bot commented Dec 15, 2021

Codecov Report

Merging #1941 (0953fd9) into main (ebc57ff) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1941   +/-   ##
=======================================
  Coverage   93.52%   93.52%           
=======================================
  Files         277      277           
  Lines       41226    41249   +23     
=======================================
+ Hits        38556    38579   +23     
  Misses       2670     2670           
Flag Coverage Δ
behaviourtests 4.78% <0.00%> (-0.01%) ⬇️
unittests 94.06% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
satpy/readers/sar_c_safe.py 97.23% <100.00%> (+0.11%) ⬆️
satpy/tests/reader_tests/test_sar_c_safe.py 100.00% <100.00%> (ø)
satpy/tests/compositor_tests/test_viirs.py 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ebc57ff...0953fd9. Read the comment docs.

@coveralls
Copy link

coveralls commented Dec 15, 2021

Coverage Status

Coverage increased (+0.003%) to 93.989% when pulling 0953fd9 on mraspaud:fix-sar-azimuth-noise into ebc57ff on pytroll:main.

@djhoese
Copy link
Member

djhoese commented Dec 15, 2021

I saw that bandit issue and it talks about parsing untrusted XML. I considered doing a # nosec comment on the cases where XML parsing happens in satpy because isn't all our XML trusted XML?

@mraspaud
Copy link
Member Author

At least at SMHI, we do download data automatically and process is without manual intervention. The sources we download from we trust, but through a man in the middle attack, the satellite data/xml could be tempered with.
Another scenario which I have not heard of yet, but could happen, would be for example to have P2G as a service, where a user would upload their data through a web UI and get a batch of images back. There it would be even easier to attack a server running satpy.

In any case, as I said earlier, since all servers are now connected to the internet in one way or another, we should strive to make satpy as secure as possible. One obvious way to perform an attack is to feed satpy malicious satellite data, and I think we should therefore minimise the risks we take, eg by parsing XML files with defused libraries.

@mraspaud mraspaud closed this Dec 15, 2021
@mraspaud mraspaud reopened this Dec 15, 2021
@mraspaud
Copy link
Member Author

Here is what the azimuth noise array looks like that would crash the reader.

Figure_1

Copy link
Member

@djhoese djhoese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying the security of the source of the XML files. Makes sense. This looks good to me except I want to make sure that all of this new code is either working with numpy arrays or dask arrays. I know the docstrings say that it creates dask slices, but I only see da.full. Could/should this be replaced with a map_blocks call that generates these? Or since it is XML do we not know that until we've read the entire file?

Forgive me if I'm way wrong about this. I'm not familiar with what this code is trying to do.

@mraspaud
Copy link
Member Author

As you can see in the picture, the chunks are unfortunately not aligned with each other, and sometimes of varying width, so map_blocks isn't an alternative here. That's why it's a bit of a mess.

The only thing we are generating is nan-dask array to pad the full array to it's final dimensions. We are not using numpy because I think concatenating numpy arrays with dask array would trigger computations...

@mraspaud mraspaud merged commit cd435db into pytroll:main Dec 15, 2021
@mraspaud mraspaud deleted the fix-sar-azimuth-noise branch December 15, 2021 21:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants