Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When saving to CF prepend datasets starting with a digit by CHANNEL_ #1525

Merged
merged 41 commits into from Mar 2, 2021

Conversation

TAlonglong
Copy link
Collaborator

@TAlonglong TAlonglong commented Feb 1, 2021

In NetCDF CF variables should not start with a digit.

Some channels, like AVHRR, are just named by a number and this number is used as the dataset name. When saving to NetCDF cf the PR prepend CHANNEL_ to each dataset starting with a digit so the variables in the resulting NetCDF CF file does not start with a digit, but instead CHANNEL_<original_dataset_name>

@TAlonglong
Copy link
Collaborator Author

I think this does what I want.

Even if this is a draft please comment if this will not work.

@codecov
Copy link

codecov bot commented Feb 1, 2021

Codecov Report

Merging #1525 (5ad1ecb) into master (d556c80) will increase coverage by 0.02%.
The diff coverage is 96.95%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1525      +/-   ##
==========================================
+ Coverage   92.54%   92.57%   +0.02%     
==========================================
  Files         251      251              
  Lines       36761    36969     +208     
==========================================
+ Hits        34022    34225     +203     
- Misses       2739     2744       +5     
Flag Coverage Δ
behaviourtests 4.47% <2.28%> (-0.02%) ⬇️
unittests 92.71% <96.95%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
satpy/resample.py 89.50% <ø> (ø)
satpy/scene.py 91.65% <50.00%> (-0.61%) ⬇️
satpy/readers/satpy_cf_nc.py 97.56% <95.23%> (-1.04%) ⬇️
satpy/composites/__init__.py 88.28% <100.00%> (+0.01%) ⬆️
satpy/composites/sar.py 67.21% <100.00%> (+8.02%) ⬆️
satpy/tests/compositor_tests/test_sar.py 100.00% <100.00%> (ø)
satpy/tests/reader_tests/test_satpy_cf_nc.py 100.00% <100.00%> (ø)
satpy/tests/test_composites.py 99.87% <100.00%> (+<0.01%) ⬆️
satpy/tests/test_regressions.py 100.00% <100.00%> (ø)
satpy/tests/test_scene.py 99.72% <100.00%> (ø)
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d556c80...3e9d293. Read the comment docs.

@TAlonglong
Copy link
Collaborator Author

This also needs a corresponding PR in the netcdf cf reader. Should I add that in this PR? Or should I make a separate PR for that?

@mraspaud
Copy link
Member

mraspaud commented Feb 1, 2021

I think it makes sense to have the two parts in the same PR

@ghost
Copy link

ghost commented Feb 12, 2021

Congratulations 🎉. DeepCode analyzed your code in 3.019 seconds and we found no issues. Enjoy a moment of no bugs ☀️.

👉 View analysis in DeepCode’s Dashboard | Configure the bot

@TAlonglong
Copy link
Collaborator Author

Oh no, this is messed up.

@TAlonglong TAlonglong force-pushed the dataset_names_when_saving_to_cf branch from 152fb6d to 8de0bb9 Compare February 13, 2021 08:17
@TAlonglong
Copy link
Collaborator Author

OK I think I got it right now

@TAlonglong
Copy link
Collaborator Author

I messed some update from another PR, I think I got it right reverting, but the coverage with all those files does not look correct.

@TAlonglong TAlonglong marked this pull request as ready for review February 14, 2021 18:48
@TAlonglong
Copy link
Collaborator Author

Now that I think about it, maybe a flag needs to be given to turn this on. Else this will apply to all products with dataset names starting with a digit without notifying the user.

@mraspaud
Copy link
Member

One could argue that a variable name starting with a number isn't legal netcdf anyway...
How about adding a flag, but make sure a warning is still issued in that case?

@TAlonglong
Copy link
Collaborator Author

Hm, adding a flag resulted in higher complexity, so codebeat did not like it.

The problem is to pass the flag to the da2cf function.

@TAlonglong
Copy link
Collaborator Author

Do you @mraspaud or @djhoese have any comments on this?

Copy link
Member

@djhoese djhoese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions...and a complete redesign option.

@@ -694,6 +716,9 @@ def save_datasets(self, datasets, filename=None, groups=None, header_attrs=None,
for kwarg in satpy_kwargs:
to_netcdf_kwargs.pop(kwarg, None)

# Allow to prepend CHANNEL_ to datasets name staring with digit
valid_cf_dataset_name = to_netcdf_kwargs.pop('valid_cf_dataset_name', False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not a defined keyword argument in the method definition?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not sure how I should do it. Adding it as a keyword would increase the complexity if the method definition, but better the readability.

"""
if exclude_attrs is None:
exclude_attrs = []

new_data = dataarray.copy()
if 'name' in new_data.attrs:
name = new_data.attrs.pop('name')
if valid_cf_dataset_name and name[0].isdigit():
_orig_name = name
name = 'CHANNEL_' + name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about renaming this keyword argument numeric_name_prefix and if it is None, don't do anything. If it is specified then it is assumed to be a string. So someone could do scn.save_datasets(writer='cf', numeric_name_prefix='ch') if they wanted to?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good idea.

I just have not figured out how to read this back if we are to skip the extra attribute I have added for now.

_orig_name = name
name = 'CHANNEL_' + name
warnings.warn('Rename dataset {} to {}.'.format(_orig_name, name))
new_data.attrs['satpy_dataset_name'] = _orig_name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this attribute show up in the resulting NetCDF file? Does @mraspaud recent work with wavelength range and all that change how this could/should work?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this show up in the netcdf file.

I would need to dig into this investigating if this can be skipped.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I did not understand that with the wavelength work by @mraspaud you actually don't need the dataset or variable name.

What happened, I messed up my code. I thought I added renaming of the dataset back to the original when reading back the netcdf file. But I messed up my git branch, and removed by accident this when resetting to a previous commit. So I thought my code worked, but it was the work of @mraspaud fixing this. If this make any sense. Anyway this makes it a lot simpler I think. Will try to clean up this tomorrow.

@TAlonglong
Copy link
Collaborator Author

Ah ok. Now I understand. But the writer does not include the orig_name attribute.

So you suggest to add a test for that anyway? Even if the cf writer does not support it?

@mraspaud
Copy link
Member

oh, ok, I thought you added it to the writer. Should we do it?

@TAlonglong
Copy link
Collaborator Author

I was hoping to avoid it. Adding an extra attribute to each variable in the resulting netcdf file.

But it will make things easier, in that way the reader will know exactly what to rename to and we can skip the parameter name numeric_name_prefix to the satpy_cf_nc reader. It will be more self describing in a way, more the netcdf way, than kind of guessing what to strip of.

But at the cost of the extra attribute for each variable starting with a digit in the netcdf file.

@mraspaud
Copy link
Member

mraspaud commented Mar 1, 2021

@TAlonglong I think I'm in favour. It's just one attribute with a little text, and we find a good name for it, I think it makes sense in the "self-describing" way you mention.
The mechanism you added for adding and removing the prefix should stay though, I think it adds more possibilities to the user.
About the name, maybe original_name is fine? or official_channel_name or official_band_name?

@TAlonglong
Copy link
Collaborator Author

Thanks for your patience in the PR @mraspaud .

I think your suggestion to allow both specified by user is a good idea. I will look into that.

For the attribute name I suggest original_name.

But what should take precedence? I think the attribute name first, then user specified. For reading the data that is.

@mraspaud
Copy link
Member

mraspaud commented Mar 1, 2021

For the attribute name I suggest original_name.

Sounds good

But what should take precedence? I think the attribute name first, then user specified. For reading the data that is.

I'm good with that.

@TAlonglong
Copy link
Collaborator Author

TAlonglong commented Mar 1, 2021

OK, so I did add the option to store original_name in the netcdf @mraspaud. Is it possible you can have a look when you find time?

There are some codebeat issues. But I don't see how I can make codebeat any more happy

@mraspaud
Copy link
Member

mraspaud commented Mar 2, 2021

Ok, last comments: codebeat complains about too many arguments, but the way satpy works at the moment, there isn't really to avoid that. Regarding the nesting and complexity, this can be addressed by extracting smaller methods from the culprit function.
So for example this part https://github.com/pytroll/satpy/pull/1525/files#diff-7820bda1cf98874e35903ebe24e7463f427af6734e72e8ea3ccb600a71e6037bR254-R265 could be its own method.
If this isn't enough, the name part could be put in its own function too: https://github.com/pytroll/satpy/pull/1525/files#diff-7820bda1cf98874e35903ebe24e7463f427af6734e72e8ea3ccb600a71e6037bR256-R260

@TAlonglong
Copy link
Collaborator Author

Hm, codebeat is happy? OK, I'm fine. What do you think @mraspaud ?

Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, almost there!

satpy/tests/reader_tests/test_satpy_cf_nc.py Outdated Show resolved Hide resolved
satpy/readers/satpy_cf_nc.py Outdated Show resolved Hide resolved
satpy/writers/cf_writer.py Outdated Show resolved Hide resolved
@TAlonglong
Copy link
Collaborator Author

When looking into the np.testing.assert_array_equal I got a problem here https://github.com/pytroll/satpy/blob/master/satpy/tests/writer_tests/test_cf.py#L127. I had to change the code to make it pass. Looks like the test was wrong all the time but passed due to use of assertEqual(np.all

@mraspaud
Copy link
Member

mraspaud commented Mar 2, 2021

I guess that because of the dimensions, np.all is probably more forgiving.

@TAlonglong
Copy link
Collaborator Author

OK, I made the corrections you suggested @mraspaud. Please let me know what you think.

Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the hard work!

@mraspaud mraspaud merged commit 3c2cdc1 into pytroll:master Mar 2, 2021
@TAlonglong TAlonglong deleted the dataset_names_when_saving_to_cf branch March 4, 2021 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:writers enhancement code enhancements, features, improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

satpy_cf_nc reader fails to read satpy cf writer generated netcdf files where variables start with a number.
5 participants