
Add MODIS L2 reader #611

Merged — 14 commits merged into pytroll:master on May 7, 2019

Conversation

@LTMeyer (Contributor) commented Feb 11, 2019

This PR is a continuation of #540

  • It factorizes common functions for L1b and L2 MODIS products through HDFEOSBaseFileReader.
  • It adds features and tests for the MODIS L2 cloud mask product, based on @BENR0's previous work.

The parsing of file metadata has been modified to avoid dictionaries with too many levels.
hdfeos_mod35.yaml has been rewritten and renamed because the YAML file reader was unable to process it.
Location data are interpolated by the geotiepoints functions introduced in python-geotiepoints PR #15. A minimal usage sketch follows the checklist below.

  • Tests added: Test MODIS L2 reader for longitude and cloud mask.
  • Tests passed
  • Passes git diff origin/master -- "*py" | flake8 --diff

Some work must still be done:

  • Rewrite hdfeos_l1b.yaml to meet standards of other reader yaml files;
  • Add tests for L1b products;
  • Verify that cloud mask reader output is consistent. So far, only the shape of the output array is checked, not its value;
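A minimal usage sketch of the new reader (hedged: the file name is hypothetical, and this assumes the reader is registered as modis_l2, matching the satpy/readers/modis_l2.py module added here):

from satpy import Scene

# Load the cloud mask from a MOD35 L2 granule (hypothetical file name).
scn = Scene(reader='modis_l2', filenames=['MOD35_L2.A2019042.1220.061.hdf'])
scn.load(['cloud_mask'])
print(scn['cloud_mask'].shape)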

@djhoese (Member) commented Feb 14, 2019

I'm not sure which parts were done by you and which by @BENR0, so I'm not sure who to ask: does the interpolation for the cloud mask need to be any different from the other products? I've had code for reading cloud masks in the past (Polar2Grid project) and I don't remember needing fancy interpolation like this. Or maybe I'm used to dealing with a different cloud mask.

You could also remove get_area_def for this reader since (unless I'm mistaken) it is a swath product.

Also, maybe modis_l2 instead of modis_l2_hdf? How many l2 products aren't HDF4 for MODIS?

Nice work.

@BENR0 (Collaborator) commented Feb 14, 2019

@djhoese you are probably talking about the 250m cloud mask? The mod35_l2 product, in addition to the 1000m resolution product, has a 250m product, which is encoded in the last two bytes where each bit is one subpixel.
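A hedged sketch of how those sub-pixel bits could be unpacked with numpy, assuming the MOD35 layout of 6 bytes per 1 km pixel with the 16 sub-pixel flags (4 × 4 at 250 m) packed into the last two bytes; the byte order and the bit-to-subpixel mapping below are assumptions to be checked against the MOD35 user guide:

import numpy as np

def unpack_250m_mask(cloud_mask_bytes):
    """cloud_mask_bytes: uint8 array of shape (6, rows, cols)."""
    # Combine bytes 4 and 5 into one 16-bit word per 1 km pixel
    # (byte order here is an assumption).
    word = (cloud_mask_bytes[4].astype(np.uint16) << 8) | cloud_mask_bytes[5]
    # Pull out the 16 individual bits: shape (rows, cols, 16).
    bits = (word[..., None] >> np.arange(16)) & 1
    rows, cols = word.shape
    # Rearrange each 16-bit group into a 4 x 4 block of 250 m sub-pixels.
    blocks = bits.reshape(rows, cols, 4, 4)
    return blocks.transpose(0, 2, 1, 3).reshape(rows * 4, cols * 4)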

@djhoese (Member) commented Feb 14, 2019

Yes I meant the mod35 cloud mask. I wasn't aware of a 250m resolution version of the product. I'm also confused by the YAML file for the new reader since the longitude/latitude datasets have 1000m and 500m versions, but the cloud mask has 1000m and 250m. How does that work?

@LTMeyer (Contributor, Author) commented Feb 14, 2019

The file contains datasets for latitude and longitude at 5km resolution. We use geotiepoints to interpolate to a finer resolution, i.e. 1km. Then the goal is to interpolate again to get an even finer resolution of 250m. This way we get the same resolutions for the location as for the cloud mask.

I assumed the YAML reader should list all the available resolutions. Hence:

  • for the location 5km (original data), 1km (interpolated), and 250m (interpolation function not yet checked)
  • for the cloud mask 1km and 250m (original data)
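A sketch of that two-step interpolation using the public geotiepoints helpers; lons_5km/lats_5km stand for the raw tie-point arrays read from the file, and whether chaining 5km → 1km → 250m is actually valid is exactly the question raised below:

from geotiepoints import modis5kmto1km, modis1kmto250m

# lons_5km / lats_5km: raw 5 km tie-point arrays from the L2 file.
lons_1km, lats_1km = modis5kmto1km(lons_5km, lats_5km)
# Second hop, 1 km -> 250 m, on the already-interpolated coordinates.
lons_250m, lats_250m = modis1kmto250m(lons_1km, lats_1km)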

@djhoese (Member) commented Feb 14, 2019

The resolutions in the YAML should match between the datasets. So if it is possible to produce lat/lon/cloud_mask at 5km, 1km, 250m then all those resolutions should be listed in the YAML. It is then up to the file handler to do the interpolation to produce that level of data. Let me know if I'm missing something that makes this impossible.

@LTMeyer (Contributor, Author) commented Feb 14, 2019

There is no issue producing the cloud mask at all these resolutions.
However, for the location I have to check if it's possible. I believe geotiepoints so far only offers interpolation from 5km to 1km and from 1km to 500m or 250m. I don't know whether it can easily interpolate successively from 5km to 1km and then to 250m, because the interpolation algorithm uses sensor zenith data that are not interpolated themselves.

@djhoese (Member) commented Feb 14, 2019

@mraspaud didn't someone (you?) add this to geotiepoints recently? Do we need to make another release?

@djhoese (Member) commented Feb 14, 2019

I was talking about this with Kathy Strabala (Project Manager of IMAPP) and she brought up (from what she remembers) that a QA flag should also be used when generating the 250m version of the product to know if the mask was valid. Her recollection was that without checking the QA flag you can't be sure if the mask is "cloud" or invalid because the default pixel value is "cloud" or something like that.

Does this sound familiar? Are you handling a quality mask like this at all?

@BENR0 (Collaborator) commented Feb 15, 2019

No, currently the QA flags are not used. I only implemented the 250m mask for completeness' sake at first, because from my experience the 250m cloud mask is not really usable. But you are probably right: if the 250m mask is included as a possible dataset in the reader, it should also check the QA flag if needed.

I checked the user guide again and concluded that, from my point of view, the cloud mask reader itself should make no decisions based on the QA flags, because different users have different requirements. The final decision should be up to the user. To this end the reader should, next to the cloud mask SDS, also read the quality assurance SDS. Then the user has all the information needed to decide which pixels to trust.
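A hedged sketch of the user-side filtering this enables, assuming (per my reading of the MOD35 user guide, to be verified) that bit 0 of the first quality assurance byte is the usefulness flag (1 = useful); the array names are illustrative, not the reader's actual dataset names:

import numpy as np

# cloud_mask: decoded mask, shape (rows, cols)
# quality_assurance: raw QA SDS, shape (rows, cols, n_qa_bytes)
useful = (quality_assurance[..., 0] & 1).astype(bool)
filtered = np.where(useful, cloud_mask, np.nan)  # drop untrusted pixels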

@mraspaud (Member) commented

@LTMeyer @BENR0 Any updates on this? How far are we from completion?

@LTMeyer (Contributor, Author) commented Mar 13, 2019

So far this PR creates a reader for MODIS L2 products that only manages the cloud mask, latitude, and longitude datasets. Is this enough? One could support new datasets by adding them on top of this new reader.
It also factorizes reader functions for MODIS L1 and L2 products.

Test https://github.com/pytroll/satpy/blob/master/satpy/tests/reader_tests/test_modis_l1b.py#L625-L628 introduced by c322412 fails because the logic to parse MODIS HDF file metadata changed. See discussion in #626 and commit dc4c87d.

To close this PR we should solve this logic duplication first.

@BENR0 (Collaborator) commented Mar 13, 2019

Actually, I think this is kind of misleading. As far as I can see, this is not a MODIS Level 2 reader itself but a reader for the second and third bits of the Level 2 cloud mask. I don't know how many other MODIS Level 2 products are bit-encoded like the cloud mask. Ideally there would be a base reader for the Level 2 products which also holds the functions for bit-stripping, and then there could be specific readers for each product.
Documentation could then be added to show people how to use that MODIS Level 2 base reader and its functions to build readers for other products.

Right now we could then add a reader for the cloud mask, with datasets for all relevant bits in the reader's YAML file; one possible shape for the bit-stripping helper is sketched below.

I hope what I mean is not too confusing.
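One possible shape for the shared bit-stripping helper such a base reader could expose — a sketch under assumed semantics, not the code merged in this PR:

def bits_strip(bit_start, bit_count, value):
    """Extract `bit_count` bits of `value`, starting at `bit_start`.

    Works on plain integers and on numpy integer arrays alike.
    """
    bit_mask = (1 << bit_count) - 1
    return (value >> bit_start) & bit_mask

# e.g. the two cloudiness bits of the first cloud mask byte:
# cloudiness = bits_strip(1, 2, cloud_mask_byte0)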

@LTMeyer (Contributor, Author) commented Mar 13, 2019

As far as I can see this is not a Modis Level 2 reader itself but a reader for the second and third bit of the level 2 cloud mask.

Indeed, the PR title is misleading: it is a reader for MODIS 35 L2 rather than a generic reader for MODIS L2 products.

Ideally there would be a base reader for the level 2 products which also holds the functions for bitstripping and then there could be specific readers for each product.

Agreed. There are different files for different products, so I guess it makes sense to have different readers too.

To validate the current PR, I suggest to:

@mraspaud (Member) commented

Thanks for all the info!

  1. Regarding the metadata, I would rather continue loading everything into a dictionary and attach it to the dataset at the end, so that users can access whatever they like without having to modify the satpy code.
  2. As for the specificity of the cloud mask, I don't think we need a reader for each product. We could just have something like this in the YAML file:
datasets:
  cloud_mask:
    name: cloud_mask
    resolution: [5000, 1000]
    file_key: Cloud_Mask
    bits: [3, 4, 5]
    file_type: hdf

and the reader would know that, if the bits item is present, it should just read the given bits from the file dataset. Would that work?

@LTMeyer (Contributor, Author) commented Mar 14, 2019

I am not sure I understand the purpose of adding the bits field to the YAML file.
Take the cloud mask dataset as an example: which bits to read depends on the resolution. For 5km and 1km, the reader will read bits 1 and 2 of the first byte. However, for 250m resolution, it should read the 4th and 5th bytes for the data, plus the 6th byte for the cloud mask quality assurance.
So the requested dataset together with the resolution should determine which bits to read; an illustrative lookup is sketched below.
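Purely illustrative: the kind of per-resolution lookup table the file handler could hold, using the 1-based byte/bit numbering from the comment above (a hypothetical structure, with indices to be verified against the MOD35 user guide):

# Hypothetical layout table; not part of the actual reader code.
CLOUD_MASK_LAYOUT = {
    5000: {'byte': 1, 'bits': (1, 2)},      # cloudiness flag, first byte
    1000: {'byte': 1, 'bits': (1, 2)},
    250:  {'bytes': (4, 5), 'qa_byte': 6},  # sub-pixel mask + its QA byte
}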

@mraspaud (Member) commented Mar 14, 2019

Ok, I don't know the details of the format, sorry. The point was to make the reader generic enough that dataset specificities wouldn't have to be hardcoded. So maybe in this case something like the following would be more appropriate:

datasets:
  cloud_mask:
    name: cloud_mask
    resolution: [5000, 1000]
    file_key: Cloud_Mask
    bits:
    - [1000, 5000]:
        [1, 2]
    - [250, 500]:
        [4, 5]
    file_type: hdf

Or even better: if there were a machine-readable description of this in the file itself, maybe we could use that?

@BENR0 (Collaborator) commented Mar 14, 2019

A brief description of the bit information is contained in the dataset metadata description. That could be parsed, I guess; I haven't really thought about it, but it is a good idea.

I agree with @mraspaud that the bit flags shouldn't be hard-coded. It's just a peculiarity of the MODIS format that the different datasets are encoded in bits instead of as separate datasets, as in other satellite data formats. So it would be consistent with the other readers if the datasets were laid out in the YAML file. That's why I put them there when I did the first PR.

I am not sure about the different resolutions though. Yes, the mod35_l2 files only have the 5km geolocation, but the data itself is at 1km resolution (and 250m, but that is a special case), so I would think an end user expects to get the data at 1km resolution. My opinion is that the geolocation should be interpolated by default when a dataset gets loaded. If the user wants a lower resolution, they can resample/slice the data afterwards.

@LTMeyer (Contributor, Author) commented Mar 14, 2019

A brief description of the bit information is contained in the dataset metadata description. That could be parsed I guess, haven't really thought about that but is a good idea.

Indeed, the cloud mask dataset in a MODIS file has an attribute Cloud_Mask:description that gives a description of each bit in plain text; that could be parsed.
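Reading that attribute with pyhdf would look roughly like this (hypothetical file name):

from pyhdf.SD import SD, SDC

hdf = SD('MOD35_L2.A2019042.1220.061.hdf', SDC.READ)
# Plain-text table describing the meaning of every bit.
description = hdf.select('Cloud_Mask').attributes()['description']
print(description)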

I agree with @mraspaud that the bit flags shouldn't be hard coded.

Just so I understand clearly: instead of hard-coding the bit flags, they should rather be parsed from the dataset attributes?

my opinion is the geolocation should be interpolated by default if a dataset gets loaded. If the user wants less resolution he can resample/slice the data afterwards.

That makes sense. However, could we still provide the location dataset at 5km resolution, since it is the only raw location data available?

@mraspaud (Member) commented

Out of curiosity, what would you use the 5km geolocation data for?

@LTMeyer (Contributor, Author) commented Mar 14, 2019

"Safety" measure I would say, if the use wants to interpolate the data himself instead of relaying trustfully on geotiepoints. So basically to maintain access to the raw data. But now I'm not sure it actually makes sense. Let's restrain resolution to 1km then.

@mraspaud (Member) commented

I suppose if you load the lons and lats by hand and say the resolution should be 5000, it won't interpolate.

@LTMeyer (Contributor, Author) commented Mar 14, 2019

So location data are loaded with interpolation at 1km resolution by default, and are only accessible at 5km resolution by explicitly giving the resolution: scene.load(['latitudes'], resolution=5000). Is that right?

Commits pushed to the branch:

  • Introduce bits and byte key values in the YAML to read elements within HDF EOS datasets; only include the cloud_mask dataset so far; add basic tests.
  • Add tests for the dimensions of the QA dataset.
  • Add parameters to the YAML to trigger specific bit parsing.
  • Add quality assurance filter.
@djhoese (Member) commented Apr 24, 2019

I found a related issue in the pyhdf repository: fhs/pyhdf#15
What version of pyhdf are you using, @LTMeyer?

It looks like you are creating INT8 fields but setting a fill value of 255. If INT8 is a signed 8-bit integer (which I would guess it is), then the maximum valid value is 127. You could try changing the data type to UINT8.
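One way to apply that suggestion in the fake-file writer used by the tests — a sketch, not the actual test code:

from pyhdf.SD import SD, SDC

h = SD('test.hdf', SDC.WRITE | SDC.CREATE)
# UINT8 ranges over 0..255, so 255 is a representable fill value;
# with INT8 (-128..127) it would overflow.
sds = h.create('Cloud_Mask', SDC.UINT8, (8, 8))
sds.setfillvalue(255)
sds.endaccess()
h.end()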

@LTMeyer (Contributor, Author) commented Apr 24, 2019

I found a related issue in the pyhdf repository: fhs/pyhdf#15

It looks like you are creating INT8 fields but setting a fill value of 255. If INT8 is a signed 8-bit integer (which I would guess it is), then the maximum valid value is 127. You could try changing the data type to UINT8.

That makes total sense. I've changed the fill value accordingly.

Now another error is raised. I believe it is related to the version of geotiepoints: it must use pytroll/python-geotiepoints#15 for a correct interpolation.

@djhoese (Member) commented Apr 24, 2019

@mraspaud Any ideas/updates ^?

@mraspaud (Member) commented

If the geotiepoints patch works, I'll merge it and make a release, so we can finish this one up :)

@coveralls commented Apr 24, 2019

Coverage Status

Coverage increased (+0.4%) to 81.068% when pulling 5ead31e on LTMeyer:modis_l2_reader into 97c3451 on pytroll:master.

@codecov (bot) commented Apr 24, 2019

Codecov Report

Merging #611 into master will increase coverage by 0.39%.
The diff coverage is 89.28%.


@@            Coverage Diff             @@
##           master     #611      +/-   ##
==========================================
+ Coverage   80.67%   81.07%   +0.39%     
==========================================
  Files         149      152       +3     
  Lines       21661    21857     +196     
==========================================
+ Hits        17475    17720     +245     
+ Misses       4186     4137      -49
Impacted Files                                   Coverage Δ
satpy/tests/reader_tests/test_hdfeos_base.py     95.65% <100%> (ø)
satpy/tests/reader_tests/__init__.py             97.82% <100%> (+0.04%) ⬆️
satpy/readers/modis_l1b.py                       20% <50%> (-10.91%) ⬇️
satpy/readers/hdfeos_base.py                     79.59% <79.59%> (ø)
satpy/tests/reader_tests/test_modis_l2.py        97.16% <97.16%> (ø)
satpy/readers/modis_l2.py                        98.5% <98.5%> (ø)
satpy/composites/__init__.py                     69.28% <0%> (+0.27%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@mraspaud (Member) commented

@LTMeyer the new geotiepoints release is out, and this PR seems to pass the tests now, so do you consider this ready to merge?

Thanks for all your work @LTMeyer and @BENR0 !

@LTMeyer changed the title from "WIP: Modis l2 reader" to "Modis l2 reader" on Apr 25, 2019
@LTMeyer (Contributor, Author) commented Apr 25, 2019

Thank you @mraspaud. Not all datasets are managed by the reader yet; however, I think they can be added later and the current PR can be merged.

@mraspaud (Member) commented

Sounds good, we'll merge this for the 0.15 release (planned in 2 weeks)

@mraspaud added this to the v0.15 milestone on Apr 25, 2019
@mraspaud merged commit 2687cc2 into pytroll:master on May 7, 2019
@djhoese changed the title from "Modis l2 reader" to "Add MODIS L2 reader" on May 8, 2019
@djhoese (Member) commented May 8, 2019

@LTMeyer This pull request is causing major issues with loading L1B data now. Do you have some Level 2 data files that I can access to verify some things?

The main question right now: is the L2 files' geolocation at 1km or 5km resolution?

@LTMeyer (Contributor, Author) commented May 8, 2019

I'm sorry to read that.
L2 data files can be found in the MODIS archive here; it is sorted by year and day of the year.

L2 files come with geolocation datasets at 5km resolution. It was decided to load the interpolated 1km resolution by default, and to load the 5km resolution only on request (cf. #611 (comment)).

@djhoese (Member) commented May 8, 2019

I made some progress and downloaded a MOD35 file from LAADS. Another problem with this reader is that it didn't provide any coordinates for the cloud mask, so no SwathDefinition was being created for .attrs['area']. Additionally, the 250m cloud mask can never get its geolocation this way, because the 5km geolocation can't be interpolated to 250m resolution. We'll need to make sure the geolocation file (MOD03) is also included in the YAML. I'll do that now.
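With that change, loading would look roughly like this — a sketch assuming the updated YAML accepts MOD03 granules as a geolocation file type (file names hypothetical):

from satpy import Scene

scn = Scene(reader='modis_l2', filenames=[
    'MOD35_L2.A2019128.1845.061.hdf',  # cloud mask
    'MOD03.A2019128.1845.061.hdf',     # high-resolution geolocation
])
scn.load(['cloud_mask'], resolution=1000)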

@BENR0 (Collaborator) commented May 9, 2019

This must have gotten lost in the refactoring. In the original PR #540, coordinates were included, even though at that time, due to the bug with the scan width in geotiepoints, geolocation files needed to be supplied (#540 (comment)). I had hoped that the 250m coordinates could be interpolated from the interpolated 1km coordinates.

@djhoese (Member) commented May 9, 2019

Maybe I wasn't using the newest geotiepoints when I first tested this.

Labels: component:readers, enhancement (code enhancements, features, improvements)