Add MODIS L2 reader #611
(force-pushed from 44842e7 to c9da526)
I'm not sure which part was done by you or by @BENR0, so I'm not sure who to ask: does the interpolation for the cloud mask need to be any different from the other products? I've had code for reading cloud masks in the past (Polar2Grid project) and I don't remember needing fancy interpolation like this. Or maybe I'm used to dealing with a different cloud mask. Nice work.
@djhoese you are probably talking about the 250m cloud mask? The mod35_l2 product, in addition to the 1000m resolution product, has a 250m product which is encoded in the last two bytes, where each bit is one subpixel.
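A 1 km cell corresponds to a 4×4 block of 250 m subpixels, so two bytes carry one mask bit per subpixel. A minimal NumPy sketch of unpacking one such byte (the sample value is made up):

```python
import numpy as np

# Hypothetical packed byte from the 250 m cloud mask: one bit per subpixel.
# (A 1 km cell maps to a 4x4 block of 250 m subpixels, so two such bytes
# cover the 16 subpixels; the sample value here is made up.)
byte_250m = np.array([0b10110001], dtype=np.uint8)

# unpackbits expands the byte into its 8 bits, most significant bit first.
subpixels = np.unpackbits(byte_250m)
print(subpixels.tolist())  # [1, 0, 1, 1, 0, 0, 0, 1]
```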
Yes, I meant the mod35 cloud mask. I wasn't aware of a 250m resolution version of the product. I'm also confused by the YAML file for the new reader, since the longitude/latitude datasets have 1000m and 500m versions but the cloud mask has 1000m and 250m. How does that work?
The file contains latitude and longitude datasets at 5km resolution. I assumed the YAML reader should list all the available resolutions. Hence:
The resolutions in the YAML should match between the datasets. So if it is possible to produce lat/lon/cloud_mask at 5km, 1km, and 250m, then all those resolutions should be listed in the YAML. It is then up to the file handler to do the interpolation to produce that level of data. Let me know if I'm missing something that makes this impossible.
There is no issue producing the cloud mask at all these resolutions.
@mraspaud didn't someone (you?) add this to geotiepoints recently? Do we need to make another release? |
I was talking about this with Kathy Strabala (Project Manager of IMAPP) and she brought up (from what she remembers) that a QA flag should also be used when generating the 250m version of the product to know if the mask was valid. Her recollection was that without checking the QA flag you can't be sure if the mask is "cloud" or invalid, because the default pixel value is "cloud" or something like that. Does this sound familiar? Are you handling a quality mask like this at all?
No, currently the QA flags are not used. I only implemented the 250m mask for completeness' sake at first, because in my experience the 250m cloud mask is not really usable. I checked the user guide again and came to the conclusion that, from my point of view, the cloud mask reader itself should make no decisions based on the QA flags, because different users have different requirements. The final decision should be up to the user. To this end the reader should, next to the cloud mask SDS, also read the quality assurance SDS. Then the user has all the information needed to decide which pixels to trust or not.
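Leaving the decision to the user could look like this on the user's side. A minimal sketch with made-up arrays, assuming (this is an assumption, not taken from the PR) that bit 0 of the QA byte is a usefulness flag (1 = useful):

```python
import numpy as np

# Made-up cloud mask values and quality assurance bytes for four pixels.
cloud_mask = np.array([0, 1, 2, 3], dtype=np.uint8)
qa = np.array([1, 0, 1, 1], dtype=np.uint8)

# Assumed layout: bit 0 of the QA byte flags whether the mask is useful.
useful = (qa & 0b1).astype(bool)

# The user decides what to do: here, replace not-useful pixels with a fill value.
filtered = np.where(useful, cloud_mask, 255)
print(filtered.tolist())  # [0, 255, 2, 3]
```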
(force-pushed from 347a865 to 5532354, then to b5deba6)
So far this PR creates a reader for MODIS L2 products that only manages the cloud mask, latitude, and longitude datasets. Is this enough? One could provide new datasets by adding them on top of this new reader. Test https://github.com/pytroll/satpy/blob/master/satpy/tests/reader_tests/test_modis_l1b.py#L625-L628 introduced by c322412 fails because the logic to parse MODIS HDF file metadata changed. See discussion in #626 and commit dc4c87d. To close this PR we should solve this logic duplication first.
Actually I think this is kind of misleading. As far as I can see, this is not a MODIS Level 2 reader itself but a reader for the second and third bits of the Level 2 cloud mask. I don't know how many other MODIS Level 2 products are bit-encoded like the cloud mask. Ideally there would be a base reader for the Level 2 products which also holds the functions for bit stripping, and then there could be specific readers for each product. We could then add a reader for the cloud mask where there could be datasets for all relevant bits in the reader's YAML file. I hope that's not too confusing?
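The layering proposed here could be sketched roughly as follows; all class and method names are hypothetical, not actual satpy classes:

```python
class ModisL2Base:
    """Base for bit-encoded MODIS L2 products: holds the bit-stripping helpers."""

    @staticmethod
    def strip_bits(value, start, width):
        # Select `width` bits starting at LSB position `start`.
        return (value >> start) & ((1 << width) - 1)


class Mod35CloudMaskReader(ModisL2Base):
    """Product-specific reader built on the shared bit helpers."""

    def cloud_mask_flag(self, byte):
        # Bits 1-2 of the first cloud-mask byte (assumed layout, per the
        # MOD35 user guide) hold the cloudiness flag.
        return self.strip_bits(byte, 1, 2)


print(Mod35CloudMaskReader().cloud_mask_flag(0b110))  # 3
```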
Indeed, the PR title is misleading: it is a reader for the MODIS 35 L2 product rather than a generic reader for MODIS L2 products.
Agreed. There are different files for different products, so I guess it makes sense to have different readers too. To validate the current PR, I suggest:
Thanks for all the info!
```yaml
datasets:
  cloud_mask:
    name: cloud_mask
    resolution: [5000, 1000]
    file_key: Cloud_Mask
    bits: [3, 4, 5]
    file_type: hdf
```

and the reader would know that if the
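A `bits` entry like `[3, 4, 5]` could be applied with a simple shift-and-mask. A minimal sketch (the helper name is hypothetical, and it assumes the listed positions form one contiguous LSB-first field):

```python
import numpy as np

def extract_bits(packed, bits):
    """Extract the field named by a hypothetical YAML `bits` entry.

    Assumes `bits` lists contiguous LSB-first positions, e.g. [3, 4, 5]
    selects a 3-bit field starting at bit 3.
    """
    start, width = bits[0], len(bits)
    return (packed >> start) & ((1 << width) - 1)

mask = np.array([0b00111000], dtype=np.uint8)
print(extract_bits(mask, [3, 4, 5]).tolist())  # [7]
```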
I am not sure I understand the purpose of adding the |
Ok, I don't know the details of the format, sorry. The point was making the reader generic enough so that dataset specificities wouldn't have to be hardcoded. So maybe in this case, something like the following would be more appropriate:

```yaml
datasets:
  cloud_mask:
    name: cloud_mask
    resolution: [5000, 1000]
    file_key: Cloud_Mask
    bits:
      - [1000, 5000]:
          [1, 2]
      - [250, 500]:
          [4, 5]
    file_type: hdf
```

Or even better, if there was a description of this in the file itself that was machine-readable, maybe?
A brief description of the bit information is contained in the dataset metadata description. That could be parsed, I guess; I haven't really thought about that, but it is a good idea. I agree with @mraspaud that the bit flags shouldn't be hard-coded. It's just a peculiarity of the MODIS files that the different datasets are encoded in the bits instead of as separate datasets, as in other satellite data formats. So it would be consistent with the other readers if the datasets were laid out in the YAML file; that's why I put them there in the first PR. I am not sure about the different resolutions, though. Yes, the mod35_l2 files only have the 5km geolocations, but the data itself is at 1km resolution (and 250m, but that is a special case), so I would think an end user would expect to get the data at 1km resolution. My opinion is that the geolocation should be interpolated by default when a dataset gets loaded; if the user wants less resolution, he can resample/slice the data afterwards.
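As a rough illustration of parsing such a description string: the exact wording of the attribute varies by product, so the sample text and the regex below are assumptions, not the actual MOD35 metadata format.

```python
import re

# Hypothetical metadata text, loosely modeled on a MODIS SDS description.
description = """
bit field 0: Cloud Mask Flag
bit field 1-2: Unobstructed FOV Quality Flag
"""

# Map each field name to its (start_bit, end_bit) range.
fields = {}
for match in re.finditer(r"bit field (\d+)(?:-(\d+))?: (.+)", description):
    start = int(match.group(1))
    end = int(match.group(2) or start)
    fields[match.group(3).strip()] = (start, end)

print(fields)
# {'Cloud Mask Flag': (0, 0), 'Unobstructed FOV Quality Flag': (1, 2)}
```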
Indeed, the cloud mask dataset in the MODIS file has an attribute
Just so I understand clearly: instead of hard-coding the bit flags, they should rather be parsed from the dataset attributes?
It makes sense. However, could we still provide the 5km resolution geolocation dataset, since it is still the only raw dataset available?
Out of curiosity, what would you use the 5km geolocation data for?
"Safety" measure, I would say: if the user wants to interpolate the data himself instead of relying on
I suppose if you load the lons and lats by hand and say the resolution should be 5000, it won't interpolate.
So location data are loaded with interpolation at 1km resolution by default, and are only accessible at 5km resolution by giving the specific resolution:
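The real interpolation is delegated to geotiepoints, which accounts for MODIS scan geometry. As a much simpler illustration of the idea of densifying 5km tie points to a 1km grid (made-up values, plain linear interpolation only):

```python
import numpy as np

# Made-up 5 km tie-point longitudes along one line.
lons_5km = np.array([10.0, 10.5, 11.0, 11.5])

# Positions of the tie points on a 1 km grid (every 5th sample).
x_coarse = np.arange(lons_5km.size) * 5
x_fine = np.arange(x_coarse[-1] + 1)

# Plain linear interpolation; geotiepoints does something smarter.
lons_1km = np.interp(x_fine, x_coarse, lons_5km)
print(lons_1km.size)  # 16
```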
Introduce bits and byte key values in the YAML to read elements within HDF-EOS datasets. Only include the cloud_mask dataset so far. Add basic tests.
Remove unnecessary comment.
Add tests for the dimensions of the QA dataset
- Add parameters to the YAML to trigger specific bit parsing.
- Add quality assurance filter.
I found a related issue in the pyhdf repository: fhs/pyhdf#15. It looks like you are creating INT8 fields but setting a fill value of 255. If INT8 is a signed 8-bit integer (which I would guess it is), then the maximum valid value is 127. You could try changing the data type to UINT8.
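The mismatch is easy to demonstrate with NumPy:

```python
import numpy as np

# A signed 8-bit integer cannot represent 255:
print(np.iinfo(np.int8).max)   # 127
print(np.iinfo(np.uint8).max)  # 255

# Forcing 255 into int8 wraps around to -1, silently corrupting the fill value.
fill = np.array([255], dtype=np.uint8).astype(np.int8)
print(fill.tolist())  # [-1]
```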
(force-pushed from c27170b to 159c6de)
It makes total sense. I've changed the fill value accordingly. Now another error is raised. I believe it is related to the version of geotiepoints: it must use pytroll/python-geotiepoints#15 for correct interpolation.
@mraspaud Any ideas/updates ^? |
If the geotiepoints patch works, I'll merge it and make a release, so we can finish this one up :) |
Codecov Report
@@ Coverage Diff @@
## master #611 +/- ##
==========================================
+ Coverage 80.67% 81.07% +0.39%
==========================================
Files 149 152 +3
Lines 21661 21857 +196
==========================================
+ Hits 17475 17720 +245
+ Misses 4186 4137 -49
Thank you @mraspaud. Not all the datasets are managed by the reader yet; however, I think they can be added later and the current PR can be merged.
Sounds good, we'll merge this for the 0.15 release (planned in 2 weeks) |
@LTMeyer This pull request is causing major issues with loading L1B data now. Do you have some Level 2 data files that I can access to verify some things? The main question now is whether the L2 files' geolocation is at 1km or 5km resolution.
I'm sorry to read that. L2 files come with geolocation datasets at 5km resolution. It has been decided to load the interpolated 1km resolution by default, and the 5km resolution only on request (cf. #611 (comment)).
I made some progress and downloaded a MOD35 file from LAADS. Another problem with this reader is that it didn't have any coordinates provided for the cloud mask, so no
This must have gotten lost in the refactoring. In the original PR #540, coordinates were included, even though at that time, due to the bug with the scan width in geotiepoints, geolocation files needed to be supplied (#540 (comment)). I hoped that the 250m could get interpolated from the interpolated 1km coordinates.
Maybe I wasn't using the newest geotiepoints when I first tested this. |
This PR is a continuation of #540.

- HDFEOSBaseFileReader: the parsing of file metadata has been modified to avoid dictionaries with too many levels.
- Rewrite and rename hdfeos_mod35.yaml because the YAML file reader was unable to process it.
- Location data are interpolated by geotiepoints functions introduced in PR #15.
- Passes git diff origin/master -- "*py" | flake8 --diff

Some work must still be done:

- Rewrite hdfeos_l1b.yaml to meet the standards of the other reader YAML files;
- Add tests for L1B products;