Fix SAFE SAR azimuth noise array construction #1941
Codecov Report
@@ Coverage Diff @@
## main #1941 +/- ##
=======================================
Coverage 93.52% 93.52%
=======================================
Files 277 277
Lines 41226 41249 +23
=======================================
+ Hits 38556 38579 +23
Misses 2670 2670
I saw that bandit issue and it talks about parsing untrusted XML. I considered doing a |
At least at SMHI, we do download data automatically and process it without manual intervention. We trust the sources we download from, but through a man-in-the-middle attack the satellite data/XML could be tampered with. In any case, as I said earlier, since all servers are now connected to the internet in one way or another, we should strive to make satpy as secure as possible. One obvious way to perform an attack is to feed satpy malicious satellite data, and I think we should therefore minimise the risks we take, e.g. by parsing XML files with defused libraries.
Thanks for clarifying the security of the source of the XML files. Makes sense. This looks good to me except I want to make sure that all of this new code is either working with numpy arrays or dask arrays. I know the docstrings say that it creates dask slices, but I only see `da.full`. Could/should this be replaced with a `map_blocks` call that generates these? Or since it is XML do we not know that until we've read the entire file?
Forgive me if I'm way wrong about this. I'm not familiar with what this code is trying to do.
As you can see in the picture, the chunks are unfortunately not aligned with each other, and are sometimes of varying width, so the only thing we are generating is a NaN-filled dask array to pad the full array to its final dimensions. We are not using numpy because I think concatenating numpy arrays with dask arrays would trigger computations...
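To make the padding idea concrete, here is a minimal sketch (not the code from this PR) of surrounding one noise block with lazy NaN padding via `da.full` and `da.concatenate`. The block size, the starting column, and the row width are made-up values for illustration.

```python
import dask.array as da
import numpy as np

# Hypothetical noise block: 4 lines by 3 columns, starting at column 2
# of a row that should end up 10 columns wide.
block = da.ones((4, 3), chunks=(4, 3))
first_col, total_width = 2, 10

# NaN padding on each side, created as dask arrays so nothing is computed yet.
left = da.full((4, first_col), np.nan, dtype=block.dtype, chunks=(4, first_col))
right_width = total_width - first_col - block.shape[1]
right = da.full((4, right_width), np.nan, dtype=block.dtype, chunks=(4, right_width))

# Concatenating along the column axis keeps the result lazy.
row = da.concatenate([left, block, right], axis=1)
```

Here `row` is still a dask array; values are only produced when `row.compute()` is called.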
The implementation of the azimuth noise array in the SAR-C SAFE reader assumed a nicer arrangement of data blocks than what can happen in reality.
In this PR, we fix this issue by filling in the missing blocks that lie between existing data blocks, so that each horizontal slice of the azimuth noise array has the correct number of columns.
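As a rough illustration of the bookkeeping this involves, the sketch below finds the column ranges that are not covered by any block in one horizontal slice. The function name `missing_intervals` and the half-open `(start, stop)` convention are assumptions made for this example, not names from the reader.

```python
def missing_intervals(present, total_width):
    """Return the half-open column ranges not covered by any block.

    ``present`` is an iterable of (start, stop) column ranges of the
    blocks that exist in one horizontal slice (hypothetical layout).
    """
    gaps = []
    cursor = 0
    for start, stop in sorted(present):
        if start > cursor:
            # Columns cursor..start-1 have no data and need a filler block.
            gaps.append((cursor, start))
        cursor = max(cursor, stop)
    if cursor < total_width:
        # Trailing columns after the last block.
        gaps.append((cursor, total_width))
    return gaps


gaps = missing_intervals([(2, 5), (7, 8)], total_width=10)
```

Each returned range would then be filled with a NaN block of matching width before the pieces of the slice are joined.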
Another thing this PR fixes is the use of the insecure `xml.etree` library, replacing it with `defusedxml` as recommended by bandit. This adds a new dependency for the SAR reader, and also for the satpy tests in general.
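For readers unfamiliar with `defusedxml`, the following sketch shows the kind of drop-in replacement involved. The XML fragments are simplified, invented examples rather than real SAFE annotation files, and the variable names are only for illustration.

```python
from defusedxml import EntitiesForbidden
from defusedxml.ElementTree import fromstring

# Simplified, made-up fragment; real SAFE noise files are more elaborate.
SAFE_XML = (
    "<noise>"
    "<firstRangeSample>2</firstRangeSample>"
    "<lastRangeSample>4</lastRangeSample>"
    "</noise>"
)
root = fromstring(SAFE_XML)
first = int(root.find("firstRangeSample").text)
last = int(root.find("lastRangeSample").text)

# A document declaring entities (the basis of "billion laughs" style
# attacks) is rejected by defusedxml's default settings.
MALICIOUS_XML = '<!DOCTYPE noise [<!ENTITY a "aaaa">]><noise>&a;</noise>'
try:
    fromstring(MALICIOUS_XML)
    rejected = False
except EntitiesForbidden:
    rejected = True
```

The parsing calls mirror those of `xml.etree.ElementTree`, so switching is mostly a matter of changing the import.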