Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDF5 OPERA IMPORTER FAILS READING QUALITY DATA #172

Merged
merged 9 commits into from
Oct 8, 2020

Conversation

chiara-arpae
Copy link
Contributor

Hi!

I changed the code of the implementation file for pysteps importers in the data importer from OPERA HDF5 files, 'opera_hdf5'.
Actually the path where quality dataset is placed in OPERA HDF5 files by DPC didn't correspond to the path used in the previous code.

Consequently quality data couldn't be read and a list of None, long as the number of timesteps wanted to read, was returned instead of the expected nd-array of shape (t,x,y), where t is the number of timesteps, x and y the grid points coordinates of the domain.
Before the quality dataset was searched looking if the quantity defined in the group attributes was 'QIND', but looking in a higher level of the hierarchy. Instead quality data are in the dataset inside a group in the underlying level respect to the group containing rainrate data.

In this way, I search for quality data, looking for the group containing 'quality' in its name and then it is verified it contains 'QIND' as the quantity attribute.

I hope you appreciate the issue and receive your ouput soon.

Fix #171

@codecov
Copy link

codecov bot commented Sep 1, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@a70c4ef). Click here to learn what that means.
The diff coverage is 70.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #172   +/-   ##
=========================================
  Coverage          ?   74.05%           
=========================================
  Files             ?      106           
  Lines             ?     8718           
  Branches          ?        0           
=========================================
  Hits              ?     6456           
  Misses            ?     2262           
  Partials          ?        0           
Impacted Files Coverage Δ
pysteps/io/importers.py 71.75% <70.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a70c4ef...42a16ac. Read the comment docs.

@aperezhortal
Copy link
Member

Hi @chiara-arpae, welcome to pysteps! Thanks a lot for the PR.

I'll take a look at it shortly. Are the DPC's HDF files available online? They would be helpful to test the code and to compare them with the other OPERA files we have in the pysteps-data repo that are being used to test the library.

@dnerini
Copy link
Member

dnerini commented Sep 1, 2020

Hi @chiara-arpae, thanks a lot for the PR!!

This week unfortunately I won't be able to look into it, but I see that @aperezhortal is already on it, (thanks!). Perhaps also @pulkkins would like to have a look, since he wrote the opera importer in the first place?

We might want to add a test for reading the quality data, as I don't think we currently test for them.

@chiara-arpae
Copy link
Contributor Author

chiara-arpae commented Sep 2, 2020 via email

@aperezhortal
Copy link
Member

@chiara-arpae That is more enough. Thanks!

The modifications work well with the DPC and the OPERA files that we have in the pysteps-data repo (PAAH opera product I think). As you pointed out, the files in the pysteps-data repo organize the data in a slightly different way.

Below there is an overview of the context of each opera files for reference (blue square: the PAAH opera product in the pysteps-data repo, below in red: the files from the DPC).

image

Right now, the importer looks for the quality data in two places: inside the "data" group, and in a separate dataset. However, if the quality data is found in both places, it returns the quality data from the separate dataset in the case this one is placed after the "precipitation" dataset. I am wondering if we should leave this as the default behavior, or add a keyword to control which quality flag is prioritized? Although this is not strictly a problem for the PAAH and the DPC files, I don't know if there are other Opera files using a different data layout. What do you think @chiara-arpae, @pulkkins, @dnerini?

@chiara-arpae
Copy link
Contributor Author

chiara-arpae commented Sep 4, 2020 via email

@dnerini
Copy link
Member

dnerini commented Sep 6, 2020

In pysteps, so far, there was no real ambition to provide general data importers: each importer was implemented to read a specific radar product and format only. The existing importers are simply a consequence of the datasets that are used by pysteps contributors (e.g., I work with MeteoSwiss radar composites and therefore implemented the importers for those data).

If we continue to follow this same logic, the import_opera_hdf5 importer needs to be able to read OPERA composites (i.e. the pan-european composite) in the HDF5 format only, while the DPC HDF5 files, which use a slightly different structure, would be using a separate importer called import_dpc_hdf5 or even import_arpae_hdf5, for example. This means that this PR should not change the existing ìmport_opera_hdf5`, but instead implement a new one.

Please note that this is only true if the contribution by @chiara-arpae is a consequence of a legit difference between the OPERA and DPC datasets, and not because of a bug in the import_opera_hdf5 function.

Finally, I would be in favor of implementing a more general importer for datasets following the ODIM specifications, i.e. a import_odim_hdf5 function.

To sum up:

  • if there is nothing wrong with the current import_opera_hdf5 importer, I suggest not modifying it (again, here we consider that "opera" refers to the pan-european composite product only, while "odim" is the data model used by opera).
  • instead, we can either include a new import_dpc_hdf5 importer or a more general import_odim_hdf5 one, if this proves to be possible.

@chiara-arpae what do you think? @aperezhortal @pulkkins any thoughts?

@chiara-arpae
Copy link
Contributor Author

chiara-arpae commented Sep 7, 2020 via email

@dnerini
Copy link
Member

dnerini commented Sep 9, 2020

Good, thanks for the feedback, @chiara-arpae.
What you suggest for this PR? Do you want to refactor you changes into the new ODIM importer?

@chiara-arpae
Copy link
Contributor Author

chiara-arpae commented Sep 9, 2020 via email

@pulkkins
Copy link
Member

pulkkins commented Sep 11, 2020

My suggestion is to replace the old importer (import_opera_hdf5) with the new one and rename it as import_odim_hdf5 (and modify the docstring accordingly). When I wrote import_opera_hdf5, I tried to make it as general as possible and not to include anything that is specific to the HDF5 files provided by OPERA.

I totally agree that someone needs to implement an importer that fully supports the ODIM specification. Now it only supports a small subset of it. So, the restrictions should be clearly described in the docstring if we decide to rename it as import_odim_hdf5. The main restriction is that it currently reads only one dataset from the file.

@chiara-arpae
Copy link
Contributor Author

chiara-arpae commented Sep 11, 2020 via email

@aperezhortal
Copy link
Member

To sum up:

  • if there is nothing wrong with the current import_opera_hdf5 importer, I suggest not modifying it (again, here we consider that "opera" refers to the pan-european composite product only, while "odim" is the data model used by opera).
  • instead, we can either include a new import_dpc_hdf5 importer or a more general import_odim_hdf5 one, if this proves to be possible.

Along these lines, to maintain backward compatibility, we should leave the import_opera_hdf5 function for a while, and add also in parallel a general odim importer. Since the pan-european composite is a particular case of the ODIM standard, the general odim imported could have keywords that control which particular type of product is read (pan-european, DPC, generic, etc.).

@chiara-arpae, do you want to push the changes? We can use that implementation as a starting point for the more general implementation. I have the impression that a more general ODIM implementation will take a considerable effort and it may be outside the scope of this PR.

Perhaps we can start small, merging this PR with an odim reader that deals with the pan-european and DPC format, and then, incrementally make it more general.

@chiara-arpae
Copy link
Contributor Author

chiara-arpae commented Sep 11, 2020 via email

@dnerini
Copy link
Member

dnerini commented Oct 3, 2020

Hi @chiara-arpae

My apologies for this lengthy process, unfortunately I had little time to dedicate to pysteps lately.

Can we all try to quickly summarize the point at which we are now? Also, I believe @aperezhortal had an idea on how to proceed with this PR?

From what I gathered so far, we want to

  • implement first version of general ODIM importer named import_odim_hdf5 which can deal with both OPERA and DCP -> how? optional keywords?
  • we keep the old import_opera_hdf5 for backward compatibility -> make it a wrapper to import_odim_hdf5?

Agreed ?

pysteps/io/importers.py Outdated Show resolved Hide resolved
@aperezhortal
Copy link
Member

aperezhortal commented Oct 3, 2020

Hi @chiara-arpae, sorry for stalling on this.

I agree with @dnerini comments.

The original goal of this PR was to fix the problem with the opera importer reading the quality data from the DPC files. Looking backward, commit b5aa2fe solved that issue and import_opera_hdf5 deals with the OPERA and DCP files.

Although we all agreed that a general ODIM importer is a nice addition to pysteps, I think that it is better to maintain it outside the scope of this PR.

Following on @Daniele comments, a possible way to proceed with this PR is:

implement first version of general ODIM importer named import_odim_hdf5, which can deal with both OPERA and DCP.

The following steps can address this point:

  1. Revert to commit b5aa2fe
  2. Rename import_opera_hdf5 to import_odim_hdf5
  3. Update the doc-strings to mention that the OPERA and the DPC files are correctly supported at the moment. Also make clear that other ODIM-compliant files may not be read correctly.

we keep the old import_opera_hdf5 for backward compatibility

This is addressed by adding import_opera_hdf5=import_odim_hdf5 in the importers module .

After this PR is merged, we can incrementally work on the more general ODIM reader by adding functionalities to import_odim_hdf5.

What do you think?

@chiara-arpae
Copy link
Contributor Author

chiara-arpae commented Oct 5, 2020 via email

@aperezhortal
Copy link
Member

Thanks @chiara-arpae! I left two additional suggestions. Besides that, it looks ready to merge for me.

@dnerini
Copy link
Member

dnerini commented Oct 7, 2020

Hi @chiara-arpae (and colleagues), thanks a lot for the nice contribution.
Thanks to @aperezhortal, too, for the support.

This looks good to be merged!

edigiacomo and others added 2 commits October 7, 2020 07:35
Co-authored-by: Andres Perez Hortal <andresperezcba@gmail.com>
Co-authored-by: Andres Perez Hortal <andresperezcba@gmail.com>
pysteps/io/importers.py Outdated Show resolved Hide resolved
@dnerini dnerini self-requested a review October 7, 2020 06:58
Co-authored-by: Daniele Nerini <daniele.nerini@gmail.com>
Copy link
Member

@dnerini dnerini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last detail, we forgot to update the information in io/interface.py. This will only take few more lines:

- [ ] add the new importer to the _importer_methods dictionary: odim_hdf5=importers.import_odim_hdf5
- [ ] update table in the get_method docstrings.

Edit: Well, I suppose this is something I can myself after merging, too ;-)

@dnerini dnerini merged commit 14a9d94 into pySTEPS:master Oct 8, 2020
@dnerini dnerini added this to the release v1.4 milestone Oct 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Impossible importing quality data by importer_opera_hdf5
5 participants