
Add MODIS L2 reader #611

Merged — 14 commits merged into pytroll:master on May 7, 2019

Conversation

@LTMeyer (Contributor) commented Feb 11, 2019

This PR is a continuation of #540

  • It factorizes common functions for L1b and L2 MODIS products through HDFEOSBaseFileReader.
  • It adds features and tests for the MODIS L2 cloud mask product, based on @BENR0's previous work.

The parsing of file metadata has been modified to avoid dictionaries with too many levels.
hdfeos_mod35.yaml has been rewritten and renamed because the YAML file reader was unable to process it.
Location data are interpolated by the geotiepoints functions introduced in python-geotiepoints PR #15. A minimal usage sketch follows the checklist below.

  • Tests added: Test MODIS L2 reader for longitude and cloud mask.
  • Tests passed
  • Passes git diff origin/master -- "*py" | flake8 --diff

Some work must still be done:

  • Rewrite hdfeos_l1b.yaml to meet standards of other reader yaml files;
  • Add tests for L1b products;
  • Verify that cloud mask reader output is consistent. So far, only the shape of the output array is checked, not its value;
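A minimal usage sketch of the new reader (hedged: the file name is hypothetical, and this assumes the reader is registered as modis_l2, matching the satpy/readers/modis_l2.py module added here):

from satpy import Scene

# Load the cloud mask from a MOD35 L2 granule (hypothetical file name).
scn = Scene(reader='modis_l2', filenames=['MOD35_L2.A2019042.1220.061.hdf'])
scn.load(['cloud_mask'])
print(scn['cloud_mask'].shape)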

@djhoese (Member) commented Feb 14, 2019

I'm not sure which parts were done by you and which by @BENR0, so I'm not sure who to ask: does the interpolation for the cloud mask need to be any different from the other products? I've had code for reading cloud masks in the past (Polar2Grid project) and I don't remember needing fancy interpolation like this. Or maybe I'm used to dealing with a different cloud mask.

You could also remove get_area_def for this reader since (unless I'm mistaken) it is a swath product.

Also, maybe modis_l2 instead of modis_l2_hdf? How many l2 products aren't HDF4 for MODIS?

Nice work.

@BENR0 (Collaborator) commented Feb 14, 2019

@djhoese you are probably talking about the 250m cloud mask? The mod35_l2 product, in addition to the 1000m resolution product, has a 250m product, which is encoded in the last two bytes where each bit is one subpixel.
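A hedged sketch of how those sub-pixel bits could be unpacked with numpy, assuming the MOD35 layout of 6 bytes per 1 km pixel with the 16 sub-pixel flags (4 × 4 at 250 m) packed into the last two bytes; the byte order and the bit-to-subpixel mapping below are assumptions to be checked against the MOD35 user guide:

import numpy as np

def unpack_250m_mask(cloud_mask_bytes):
    """cloud_mask_bytes: uint8 array of shape (6, rows, cols)."""
    # Combine bytes 4 and 5 into one 16-bit word per 1 km pixel
    # (byte order here is an assumption).
    word = (cloud_mask_bytes[4].astype(np.uint16) << 8) | cloud_mask_bytes[5]
    # Pull out the 16 individual bits: shape (rows, cols, 16).
    bits = (word[..., None] >> np.arange(16)) & 1
    rows, cols = word.shape
    # Rearrange each 16-bit group into a 4 x 4 block of 250 m sub-pixels.
    blocks = bits.reshape(rows, cols, 4, 4)
    return blocks.transpose(0, 2, 1, 3).reshape(rows * 4, cols * 4)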

@djhoese (Member) commented Feb 14, 2019

Yes I meant the mod35 cloud mask. I wasn't aware of a 250m resolution version of the product. I'm also confused by the YAML file for the new reader since the longitude/latitude datasets have 1000m and 500m versions, but the cloud mask has 1000m and 250m. How does that work?

@LTMeyer (Contributor, Author) commented Feb 14, 2019

The file contains datasets for latitude and longitude at 5km resolution. We use geotiepoints to interpolate to a finer resolution, i.e. 1km. Then the goal is to interpolate again to get an even finer resolution of 250m. This way we get the same resolutions for the location as for the cloud mask.

I assumed the YAML reader should list all the available resolutions. Hence:

  • for the location 5km (original data), 1km (interpolated), and 250m (interpolation function not yet checked)
  • for the cloud mask 1km and 250m (original data)
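A sketch of that two-step interpolation using the public geotiepoints helpers; lons_5km/lats_5km stand for the raw tie-point arrays read from the file, and whether chaining 5km → 1km → 250m is actually valid is exactly the question raised below:

from geotiepoints import modis5kmto1km, modis1kmto250m

# lons_5km / lats_5km: raw 5 km tie-point arrays from the L2 file.
lons_1km, lats_1km = modis5kmto1km(lons_5km, lats_5km)
# Second hop, 1 km -> 250 m, on the already-interpolated coordinates.
lons_250m, lats_250m = modis1kmto250m(lons_1km, lats_1km)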

@djhoese (Member) commented Feb 14, 2019

The resolutions in the YAML should match between the datasets. So if it is possible to produce lat/lon/cloud_mask at 5km, 1km, 250m then all those resolutions should be listed in the YAML. It is then up to the file handler to do the interpolation to produce that level of data. Let me know if I'm missing something that makes this impossible.

@LTMeyer (Contributor, Author) commented Feb 14, 2019

There is no issue producing the cloud mask at all these resolutions.
However, for the location I have to check if it's possible. I believe geotiepoints so far only offers interpolation from 5km to 1km and from 1km to 500m or 250m. I don't know whether it can easily interpolate successively from 5km to 1km and then to 250m, because the interpolation algorithm uses sensor zenith data that are not interpolated themselves.

@djhoese (Member) commented Feb 14, 2019

@mraspaud didn't someone (you?) add this to geotiepoints recently? Do we need to make another release?

@djhoese (Member) commented Feb 14, 2019

I was talking about this with Kathy Strabala (Project Manager of IMAPP) and she brought up (from what she remembers) that a QA flag should also be used when generating the 250m version of the product to know if the mask was valid. Her recollection was that without checking the QA flag you can't be sure if the mask is "cloud" or invalid because the default pixel value is "cloud" or something like that.

Does this sound familiar? Are you handling a quality mask like this at all?

@BENR0 (Collaborator) commented Feb 15, 2019

No, currently the QA flags are not used. I only implemented the 250m mask for completeness' sake at first, because from my experience the 250m cloud mask is not really usable. But you are probably right: if the 250m mask is included as a possible dataset in the reader, it should also check the QA flag if needed.

I checked the user guide again and concluded that, from my point of view, the cloud mask reader itself should make no decisions based on the QA flags, because different users have different requirements. The final decision should be up to the user. To this end the reader should, next to the cloud mask SDS, also read the quality assurance SDS. Then the user has all the information needed to decide which pixels to trust.
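A hedged sketch of the user-side filtering this enables, assuming (per my reading of the MOD35 user guide, to be verified) that bit 0 of the first quality assurance byte is the usefulness flag (1 = useful); the array names are illustrative, not the reader's actual dataset names:

import numpy as np

# cloud_mask: decoded mask, shape (rows, cols)
# quality_assurance: raw QA SDS, shape (rows, cols, n_qa_bytes)
useful = (quality_assurance[..., 0] & 1).astype(bool)
filtered = np.where(useful, cloud_mask, np.nan)  # drop untrusted pixels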

@mraspaud (Member) commented

@LTMeyer @BENR0 Any updates on this? How far are we from completion?

@LTMeyer (Contributor, Author) commented Mar 13, 2019

So far this PR creates a reader for MODIS L2 products that only manages the cloud mask, latitude, and longitude datasets. Is this enough? One could support new datasets by adding them on top of this new reader.
It also factorizes reader functions for MODIS L1 and L2 products.

Test https://github.com/pytroll/satpy/blob/master/satpy/tests/reader_tests/test_modis_l1b.py#L625-L628 introduced by c322412 fails because the logic to parse MODIS HDF file metadata changed. See discussion in #626 and commit dc4c87d.

To close this PR we should solve this logic duplication first.

@BENR0 (Collaborator) commented Mar 13, 2019

Actually, I think this is kind of misleading. As far as I can see, this is not a MODIS Level 2 reader itself but a reader for the second and third bits of the Level 2 cloud mask. I don't know how many other MODIS Level 2 products are bit-encoded like the cloud mask. Ideally there would be a base reader for the Level 2 products which also holds the functions for bit-stripping, and then there could be specific readers for each product.
Documentation could then be added to show people how to use that MODIS Level 2 base reader and its functions to build readers for other products.

Right now we could then add a reader for the cloud mask, with datasets for all relevant bits in the reader's YAML file; one possible shape for the bit-stripping helper is sketched below.

I hope what I mean is not too confusing.
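One possible shape for the shared bit-stripping helper such a base reader could expose — a sketch under assumed semantics, not the code merged in this PR:

def bits_strip(bit_start, bit_count, value):
    """Extract `bit_count` bits of `value`, starting at `bit_start`.

    Works on plain integers and on numpy integer arrays alike.
    """
    bit_mask = (1 << bit_count) - 1
    return (value >> bit_start) & bit_mask

# e.g. the two cloudiness bits of the first cloud mask byte:
# cloudiness = bits_strip(1, 2, cloud_mask_byte0)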

@LTMeyer (Contributor, Author) commented Mar 13, 2019

As far as I can see this is not a Modis Level 2 reader itself but a reader for the second and third bit of the level 2 cloud mask.

Indeed, the PR title is misleading: it is a reader for MODIS 35 L2 rather than a generic reader for MODIS L2 products.

Ideally there would be a base reader for the level 2 products which also holds the functions for bitstripping and then there could be specific readers for each product.

Agreed. There are different files for different products, so I guess it makes sense to have different readers too.

To validate the current PR, I suggest to:

@mraspaud (Member) commented

Thanks for all the info!

  1. Regarding the metadata, I would rather continue loading everything into a dictionary and attach it to the dataset at the end, so that users can access whatever they like without having to modify the satpy code.
  2. As for the specificity of the cloud mask, I don't think we need a reader for each product. We could just have something like this in the YAML file:
datasets:
  cloud_mask:
    name: cloud_mask
    resolution: [5000, 1000]
    file_key: Cloud_Mask
    bits: [3, 4, 5]
    file_type: hdf

and the reader would know that, if the bits item is present, it should just read the given bits from the file dataset. Would that work?

@LTMeyer (Contributor, Author) commented Mar 14, 2019

I am not sure I understand the purpose of adding the bits field to the YAML file.
Take the cloud mask dataset as an example: which bits to read depends on the resolution. For 5km and 1km, the reader will read bits 1 and 2 of the first byte. However, for 250m resolution, it should read the 4th and 5th bytes for the data, plus the 6th byte for the cloud mask quality assurance.
So the requested dataset together with the resolution should determine which bits to read; an illustrative lookup is sketched below.
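Purely illustrative: the kind of per-resolution lookup table the file handler could hold, using the 1-based byte/bit numbering from the comment above (a hypothetical structure, with indices to be verified against the MOD35 user guide):

# Hypothetical layout table; not part of the actual reader code.
CLOUD_MASK_LAYOUT = {
    5000: {'byte': 1, 'bits': (1, 2)},      # cloudiness flag, first byte
    1000: {'byte': 1, 'bits': (1, 2)},
    250:  {'bytes': (4, 5), 'qa_byte': 6},  # sub-pixel mask + its QA byte
}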

@mraspaud (Member) commented Mar 14, 2019

Ok, I don't know the details of the format, sorry. The point was to make the reader generic enough that dataset specificities wouldn't have to be hardcoded. So maybe in this case something like the following would be more appropriate:

datasets:
  cloud_mask:
    name: cloud_mask
    resolution: [5000, 1000]
    file_key: Cloud_Mask
    bits:
    - [1000, 5000]:
        [1, 2]
    - [250, 500]:
        [4, 5]
    file_type: hdf

Or even better: if there were a machine-readable description of this in the file itself, maybe we could use that?

@BENR0 (Collaborator) commented Mar 14, 2019

A brief description of the bit information is contained in the dataset metadata description. That could be parsed, I guess; I haven't really thought about it, but it is a good idea.

I agree with @mraspaud that the bit flags shouldn't be hard-coded. It's just a peculiarity of the MODIS format that the different datasets are encoded in bits instead of as separate datasets, as in other satellite data formats. So it would be consistent with the other readers if the datasets were laid out in the YAML file. That's why I put them there when I did the first PR.

I am not sure about the different resolutions though. Yes, the mod35_l2 files only have the 5km geolocation, but the data itself is at 1km resolution (and 250m, but that is a special case), so I would think an end user expects to get the data at 1km resolution. My opinion is that the geolocation should be interpolated by default when a dataset gets loaded. If the user wants a lower resolution, they can resample/slice the data afterwards.

@LTMeyer (Contributor, Author) commented Mar 14, 2019

A brief description of the bit information is contained in the dataset metadata description. That could be parsed I guess, haven't really thought about that but is a good idea.

Indeed, the cloud mask dataset in a MODIS file has an attribute Cloud_Mask:description that gives a description of each bit in plain text; that could be parsed.
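Reading that attribute with pyhdf would look roughly like this (hypothetical file name):

from pyhdf.SD import SD, SDC

hdf = SD('MOD35_L2.A2019042.1220.061.hdf', SDC.READ)
# Plain-text table describing the meaning of every bit.
description = hdf.select('Cloud_Mask').attributes()['description']
print(description)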

I agree with @mraspaud that the bit flags shouldn't be hard coded.

Just so I understand clearly: instead of hard-coding the bit flags, they should rather be parsed from the dataset attributes?

my opinion is the geolocation should be interpolated by default if a dataset gets loaded. If the user wants less resolution he can resample/slice the data afterwards.

That makes sense. However, could we still provide the location dataset at 5km resolution, since it is the only raw location data available?

@mraspaud (Member) commented

Out of curiosity, what would you use the 5km geolocation data for?

@LTMeyer (Contributor, Author) commented Mar 14, 2019

"Safety" measure I would say, if the use wants to interpolate the data himself instead of relaying trustfully on geotiepoints. So basically to maintain access to the raw data. But now I'm not sure it actually makes sense. Let's restrain resolution to 1km then.

@mraspaud (Member) commented

I suppose if you load the lons and lats by hand and say the resolution should be 5000, it won't interpolate.

@LTMeyer (Contributor, Author) commented Mar 14, 2019

So location data are loaded with interpolation at 1km resolution by default, and are only accessible at 5km resolution by explicitly giving the resolution: scene.load(['latitudes'], resolution=5000). Is that right?

Commits pushed to the branch:

  • Introduce bits and byte key values in the YAML to read elements within HDF EOS datasets; only include the cloud_mask dataset so far; add basic tests.
  • Add tests for the dimensions of the QA dataset.
  • Add parameters to the YAML to trigger specific bit parsing.
  • Add quality assurance filter.
@djhoese (Member) commented Apr 24, 2019

I found a related issue in the pyhdf repository: fhs/pyhdf#15
What version of pyhdf are you using, @LTMeyer?

It looks like you are creating INT8 fields but setting a fill value of 255. If INT8 is a signed 8-bit integer (which I would guess it is), then the maximum valid value is 127. You could try changing the data type to UINT8.
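One way to apply that suggestion in the fake-file writer used by the tests — a sketch, not the actual test code:

from pyhdf.SD import SD, SDC

h = SD('test.hdf', SDC.WRITE | SDC.CREATE)
# UINT8 ranges over 0..255, so 255 is a representable fill value;
# with INT8 (-128..127) it would overflow.
sds = h.create('Cloud_Mask', SDC.UINT8, (8, 8))
sds.setfillvalue(255)
sds.endaccess()
h.end()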

@LTMeyer (Contributor, Author) commented Apr 24, 2019

I found a related issue in the pyhdf repository: fhs/pyhdf#15

It looks like you are creating INT8 fields but setting a fill value of 255. If INT8 is a signed 8-bit integer (which I would guess it is), then the maximum valid value is 127. You could try changing the data type to UINT8.

That makes total sense. I've changed the fill value accordingly.

Now another error is raised. I believe it is related to the version of geotiepoints: it must use pytroll/python-geotiepoints#15 for a correct interpolation.

@djhoese (Member) commented Apr 24, 2019

@mraspaud Any ideas/updates ^?

@mraspaud (Member) commented

If the geotiepoints patch works, I'll merge it and make a release, so we can finish this one up :)

@coveralls commented Apr 24, 2019

Coverage Status

Coverage increased (+0.4%) to 81.068% when pulling 5ead31e on LTMeyer:modis_l2_reader into 97c3451 on pytroll:master.

@codecov (bot) commented Apr 24, 2019

Codecov Report

Merging #611 into master will increase coverage by 0.39%.
The diff coverage is 89.28%.


@@            Coverage Diff             @@
##           master     #611      +/-   ##
==========================================
+ Coverage   80.67%   81.07%   +0.39%     
==========================================
  Files         149      152       +3     
  Lines       21661    21857     +196     
==========================================
+ Hits        17475    17720     +245     
+ Misses       4186     4137      -49
Impacted Files                                   Coverage Δ
satpy/tests/reader_tests/test_hdfeos_base.py     95.65% <100%> (ø)
satpy/tests/reader_tests/__init__.py             97.82% <100%> (+0.04%) ⬆️
satpy/readers/modis_l1b.py                       20% <50%> (-10.91%) ⬇️
satpy/readers/hdfeos_base.py                     79.59% <79.59%> (ø)
satpy/tests/reader_tests/test_modis_l2.py        97.16% <97.16%> (ø)
satpy/readers/modis_l2.py                        98.5% <98.5%> (ø)
satpy/composites/__init__.py                     69.28% <0%> (+0.27%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@mraspaud (Member) commented

@LTMeyer the new geotiepoints release is out, and this PR seems to pass the tests now, so do you consider this ready to merge?

Thanks for all your work @LTMeyer and @BENR0 !

@LTMeyer changed the title from "WIP: Modis l2 reader" to "Modis l2 reader" on Apr 25, 2019
@LTMeyer (Contributor, Author) commented Apr 25, 2019

Thank you @mraspaud. Not all datasets are managed by the reader yet; however, I think they can be added later and the current PR can be merged.

@mraspaud (Member) commented

Sounds good, we'll merge this for the 0.15 release (planned in 2 weeks)

@mraspaud added this to the v0.15 milestone on Apr 25, 2019
@mraspaud merged commit 2687cc2 into pytroll:master on May 7, 2019
@djhoese changed the title from "Modis l2 reader" to "Add MODIS L2 reader" on May 8, 2019
@djhoese (Member) commented May 8, 2019

@LTMeyer This pull request is causing major issues with loading L1B data now. Do you have some Level 2 data files that I can access to verify some things?

The main question right now: is the L2 files' geolocation at 1km or 5km resolution?

@LTMeyer (Contributor, Author) commented May 8, 2019

I'm sorry to read that.
L2 data files can be found in the MODIS archive here; it is sorted by year and day of the year.

L2 files come with geolocation datasets at 5km resolution. It was decided to load the interpolated 1km resolution by default, and to load the 5km resolution only on request (cf. #611 (comment)).

@djhoese (Member) commented May 8, 2019

I made some progress and downloaded a MOD35 file from LAADS. Another problem with this reader is that it didn't provide any coordinates for the cloud mask, so no SwathDefinition was being created for .attrs['area']. Additionally, the 250m cloud mask can never get its geolocation this way, because the 5km geolocation can't be interpolated to 250m resolution. We'll need to make sure the geolocation file (MOD03) is also included in the YAML. I'll do that now.
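With that change, loading would look roughly like this — a sketch assuming the updated YAML accepts MOD03 granules as a geolocation file type (file names hypothetical):

from satpy import Scene

scn = Scene(reader='modis_l2', filenames=[
    'MOD35_L2.A2019128.1845.061.hdf',  # cloud mask
    'MOD03.A2019128.1845.061.hdf',     # high-resolution geolocation
])
scn.load(['cloud_mask'], resolution=1000)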

@BENR0 (Collaborator) commented May 9, 2019

This must have gotten lost in the refactoring. In the original PR #540, coordinates were included, even though at that time, due to the bug with the scan width in geotiepoints, geolocation files needed to be supplied (#540 (comment)). I had hoped that the 250m coordinates could be interpolated from the interpolated 1km coordinates.

@djhoese (Member) commented May 9, 2019

Maybe I wasn't using the newest geotiepoints when I first tested this.

Labels: component:readers, enhancement (code enhancements, features, improvements)