When saving to CF prepend datasets starting with a digit by CHANNEL_ #1525

TAlonglong · 2021-02-01T18:19:35Z

In NetCDF CF, variable names should not start with a digit.

Some channels, such as those of AVHRR, are named only by a number, and that number is used as the dataset name. When saving to NetCDF CF, this PR prepends CHANNEL_ to each dataset name that starts with a digit, so that the variables in the resulting NetCDF CF file are named CHANNEL_<original_dataset_name> instead of starting with a digit.

Closes satpy_cf_nc reader fails to read satpy cf writer generated netcdf files where variables start with a number. #1518
Tests added
Passes flake8 satpy
Fully documented
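
The naming problem described above can be illustrated with a small check. This is a minimal sketch, not Satpy code: the helper `is_cf_friendly_name` and its regular expression are assumptions made for illustration, based on the CF recommendation that variable names begin with a letter and consist of letters, digits, and underscores.

```python
import re

# Hypothetical helper, not part of Satpy: follows the CF recommendation that
# names start with a letter and contain only letters, digits and underscores.
_CF_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_cf_friendly_name(name):
    """Return True if ``name`` follows the recommended CF naming pattern."""
    return _CF_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    # AVHRR-style names such as "1" or "3a" fail; the prefixed names pass.
    for name in ("1", "3a", "CHANNEL_1", "CHANNEL_3a"):
        print(name, is_cf_friendly_name(name))
```

With a check like this, a purely numeric channel name is rejected while the same name with the CHANNEL_ prefix is accepted.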

satpy/writers/cf_writer.py

TAlonglong · 2021-02-01T18:22:24Z

I think this does what I want.

Even if this is a draft please comment if this will not work.

codecov · 2021-02-01T18:27:41Z

Codecov Report

Merging #1525 (5ad1ecb) into master (d556c80) will increase coverage by 0.02%.
The diff coverage is 96.95%.

@@            Coverage Diff             @@
##           master    #1525      +/-   ##
==========================================
+ Coverage   92.54%   92.57%   +0.02%     
==========================================
  Files         251      251              
  Lines       36761    36969     +208     
==========================================
+ Hits        34022    34225     +203     
- Misses       2739     2744       +5

Flag	Coverage Δ
behaviourtests	`4.47% <2.28%> (-0.02%)`	⬇️
unittests	`92.71% <96.95%> (+0.02%)`	⬆️

Impacted Files	Coverage Δ
satpy/resample.py	`89.50% <ø> (ø)`
satpy/scene.py	`91.65% <50.00%> (-0.61%)`	⬇️
satpy/readers/satpy_cf_nc.py	`97.56% <95.23%> (-1.04%)`	⬇️
satpy/composites/__init__.py	`88.28% <100.00%> (+0.01%)`	⬆️
satpy/composites/sar.py	`67.21% <100.00%> (+8.02%)`	⬆️
satpy/tests/compositor_tests/test_sar.py	`100.00% <100.00%> (ø)`
satpy/tests/reader_tests/test_satpy_cf_nc.py	`100.00% <100.00%> (ø)`
satpy/tests/test_composites.py	`99.87% <100.00%> (+<0.01%)`	⬆️
satpy/tests/test_regressions.py	`100.00% <100.00%> (ø)`
satpy/tests/test_scene.py	`99.72% <100.00%> (ø)`
... and 2 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d556c80...3e9d293.

TAlonglong · 2021-02-01T19:30:16Z

This also needs a corresponding PR in the netcdf cf reader. Should I add that in this PR? Or should I make a separate PR for that?

mraspaud · 2021-02-01T19:45:19Z

I think it makes sense to have the two parts in the same PR

ghost · 2021-02-12T17:29:48Z

Congratulations 🎉. DeepCode analyzed your code in 3.019 seconds and we found no issues. Enjoy a moment of no bugs ☀️.

TAlonglong · 2021-02-13T08:15:02Z

Oh no, this is messed up.

TAlonglong · 2021-02-13T08:19:01Z

OK I think I got it right now

TAlonglong · 2021-02-14T18:47:31Z

I messed up some updates from another PR. I think I got it right by reverting, but the coverage report listing all those files does not look correct.

TAlonglong · 2021-02-15T09:33:15Z

Now that I think about it, maybe a flag needs to be given to turn this on. Otherwise this will apply to all products with dataset names starting with a digit, without notifying the user.

mraspaud · 2021-02-15T09:46:34Z

One could argue that a variable name starting with a number isn't legal netcdf anyway...
How about adding a flag, but make sure a warning is still issued in that case?

TAlonglong · 2021-02-15T13:10:14Z

Hm, adding a flag resulted in higher complexity, so codebeat did not like it.

The problem is to pass the flag to the da2cf function.

TAlonglong · 2021-02-16T18:54:00Z

Do you @mraspaud or @djhoese have any comments on this?

djhoese

I have some questions...and a complete redesign option.

djhoese · 2021-02-16T19:01:04Z

satpy/writers/cf_writer.py

@@ -694,6 +716,9 @@ def save_datasets(self, datasets, filename=None, groups=None, header_attrs=None,
        for kwarg in satpy_kwargs:
            to_netcdf_kwargs.pop(kwarg, None)

+        # Allow to prepend CHANNEL_ to datasets name staring with digit
+        valid_cf_dataset_name = to_netcdf_kwargs.pop('valid_cf_dataset_name', False)


Why is this not a defined keyword argument in the method definition?

I was not sure how I should do it. Adding it as a keyword would increase the complexity of the method definition, but improve the readability.
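
The trade-off in this exchange can be sketched as follows. Both functions are hypothetical stand-ins, not the real `save_datasets` signature: one pops the option out of the forwarded keyword arguments, as in the diff above, and the other declares it in the signature, as the reviewer asks.

```python
import inspect


def save_with_pop(datasets, **to_netcdf_kwargs):
    """Hide the option inside **kwargs and pop it before forwarding."""
    valid_cf_dataset_name = to_netcdf_kwargs.pop("valid_cf_dataset_name", False)
    return valid_cf_dataset_name, to_netcdf_kwargs


def save_with_keyword(datasets, valid_cf_dataset_name=False, **to_netcdf_kwargs):
    """Declare the option in the signature so it is visible and documented."""
    return valid_cf_dataset_name, to_netcdf_kwargs


if __name__ == "__main__":
    # Both variants behave the same at call time ...
    print(save_with_pop([], valid_cf_dataset_name=True, engine="netcdf4"))
    print(save_with_keyword([], valid_cf_dataset_name=True, engine="netcdf4"))
    # ... but only the second one advertises the option in its signature.
    print(inspect.signature(save_with_pop))
    print(inspect.signature(save_with_keyword))
```

The explicit keyword adds one more parameter to the signature, which is the complexity cost mentioned above, but it can be found through introspection and described in the docstring.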

djhoese · 2021-02-16T19:03:00Z

satpy/writers/cf_writer.py

        """
        if exclude_attrs is None:
            exclude_attrs = []

        new_data = dataarray.copy()
        if 'name' in new_data.attrs:
            name = new_data.attrs.pop('name')
+            if valid_cf_dataset_name and name[0].isdigit():
+                _orig_name = name
+                name = 'CHANNEL_' + name


What about renaming this keyword argument numeric_name_prefix and if it is None, don't do anything. If it is specified then it is assumed to be a string. So someone could do scn.save_datasets(writer='cf', numeric_name_prefix='ch') if they wanted to?

I second that.

Sounds like a good idea.

I just have not figured out how to read this back if we are to skip the extra attribute I have added for now.
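
The numeric_name_prefix suggestion can be sketched roughly as below. The helper name `apply_numeric_name_prefix` is hypothetical and this is not the Satpy implementation; it only illustrates the proposed behaviour: None means no change, any string is used as the prefix, and a warning is issued on rename, as in the diff above.

```python
import warnings


def apply_numeric_name_prefix(name, numeric_name_prefix=None):
    """Prefix ``name`` if it starts with a digit and a prefix is given.

    Hypothetical helper for illustration; not the Satpy implementation.
    """
    # No prefix requested, or the name does not start with a digit: keep it.
    if numeric_name_prefix is None or not name[:1].isdigit():
        return name
    new_name = numeric_name_prefix + name
    warnings.warn("Rename dataset {} to {}.".format(name, new_name))
    return new_name


if __name__ == "__main__":
    print(apply_numeric_name_prefix("1"))             # unchanged, no prefix given
    print(apply_numeric_name_prefix("1", "ch"))       # renamed, with a warning
    print(apply_numeric_name_prefix("VIS006", "ch"))  # unchanged, no leading digit
```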

djhoese · 2021-02-16T19:03:39Z

satpy/writers/cf_writer.py

+                _orig_name = name
+                name = 'CHANNEL_' + name
+                warnings.warn('Rename dataset {} to {}.'.format(_orig_name, name))
+                new_data.attrs['satpy_dataset_name'] = _orig_name


Does this attribute show up in the resulting NetCDF file? Does @mraspaud's recent work with wavelength ranges and all that change how this could/should work?

Yes, this shows up in the NetCDF file.

I would need to dig into this to investigate whether it can be skipped.

So, I had not understood that with the wavelength work by @mraspaud you actually don't need the dataset or variable name.

What happened is that I messed up my code. I thought I had added renaming of the dataset back to the original name when reading the NetCDF file back. But I messed up my git branch and accidentally removed this when resetting to a previous commit. So I thought my code worked, but it was actually @mraspaud's work that fixed this, if that makes any sense. Anyway, this makes it a lot simpler, I think. I will try to clean this up tomorrow.

TAlonglong · 2021-02-26T10:00:27Z

Ah ok. Now I understand. But the writer does not include the orig_name attribute.

So you suggest adding a test for that anyway, even if the CF writer does not support it?

mraspaud · 2021-02-26T10:27:58Z

oh, ok, I thought you added it to the writer. Should we do it?

TAlonglong · 2021-02-26T10:49:42Z

I was hoping to avoid adding an extra attribute to each variable in the resulting NetCDF file.

But it would make things easier: the reader would know exactly what to rename to, and we could skip passing the numeric_name_prefix parameter to the satpy_cf_nc reader. It would be more self-describing, more the NetCDF way, than guessing what to strip off.

It comes at the cost of an extra attribute for each variable starting with a digit in the NetCDF file.

mraspaud · 2021-03-01T08:22:35Z

@TAlonglong I think I'm in favour. It's just one attribute with a little text, and if we find a good name for it, I think it makes sense in the "self-describing" way you mention.
The mechanism you added for adding and removing the prefix should stay though; I think it gives the user more possibilities.
About the name, maybe original_name is fine? or official_channel_name or official_band_name?

TAlonglong · 2021-03-01T14:52:24Z

Thanks for your patience in the PR, @mraspaud.

I think your suggestion to support both the attribute and a user-specified prefix is a good idea. I will look into that.

For the attribute name I suggest original_name.

But what should take precedence? I think the attribute name first, then user specified. For reading the data that is.

mraspaud · 2021-03-01T14:53:56Z

For the attribute name I suggest original_name.

Sounds good

But what should take precedence? I think the attribute name first, then user specified. For reading the data that is.

I'm good with that.
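
The agreed behaviour can be sketched as a small round trip. This is an illustration under stated assumptions, not the merged code: the function names, the plain-dict attributes, and the default values are hypothetical; only the names original_name, numeric_name_prefix, and include_orig_name come from this thread and its commits.

```python
def to_cf_variable(name, attrs, numeric_name_prefix="CHANNEL_", include_orig_name=True):
    """Writer side: prefix digit-leading names and optionally record the original.

    Hypothetical helper; the defaults are assumptions for this sketch.
    """
    attrs = dict(attrs)  # do not modify the caller's attributes
    if numeric_name_prefix and name[:1].isdigit():
        # The attribute is only added when the prefix is actually applied.
        if include_orig_name:
            attrs["original_name"] = name
        name = numeric_name_prefix + name
    return name, attrs


def from_cf_variable(var_name, attrs, numeric_name_prefix=None):
    """Reader side: the attribute takes precedence over the user-given prefix."""
    if "original_name" in attrs:
        return attrs["original_name"]
    if numeric_name_prefix and var_name.startswith(numeric_name_prefix):
        return var_name[len(numeric_name_prefix):]
    return var_name


if __name__ == "__main__":
    var_name, var_attrs = to_cf_variable("1", {"units": "%"})
    print(var_name, var_attrs)
    print(from_cf_variable(var_name, var_attrs))
```

Checking the attribute first means a file written with original_name restores the right dataset name even if the reader is given no prefix, or a different one.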

TAlonglong · 2021-03-01T20:12:59Z

OK, so I did add the option to store original_name in the netcdf @mraspaud. Is it possible you can have a look when you find time?

There are some codebeat issues, but I don't see how I can make codebeat any happier.

satpy/readers/satpy_cf_nc.py

mraspaud · 2021-03-02T08:00:15Z

Ok, last comments: codebeat complains about too many arguments, but the way satpy works at the moment, there isn't really a way to avoid that. Regarding the nesting and complexity, this can be addressed by extracting smaller methods from the culprit function.
So for example this part https://github.com/pytroll/satpy/pull/1525/files#diff-7820bda1cf98874e35903ebe24e7463f427af6734e72e8ea3ccb600a71e6037bR254-R265 could be its own method.
If this isn't enough, the name part could be put in its own function too: https://github.com/pytroll/satpy/pull/1525/files#diff-7820bda1cf98874e35903ebe24e7463f427af6734e72e8ea3ccb600a71e6037bR256-R260

TAlonglong · 2021-03-02T09:48:01Z

Hm, codebeat is happy? OK, I'm fine. What do you think @mraspaud ?

mraspaud

Ok, almost there!

satpy/tests/reader_tests/test_satpy_cf_nc.py

satpy/readers/satpy_cf_nc.py

satpy/writers/cf_writer.py

TAlonglong · 2021-03-02T11:33:17Z

When looking into np.testing.assert_array_equal I ran into a problem here https://github.com/pytroll/satpy/blob/master/satpy/tests/writer_tests/test_cf.py#L127. I had to change the code to make it pass. It looks like the test was wrong all along but passed because of the use of assertEqual(np.all

mraspaud · 2021-03-02T11:40:44Z

I guess that because of the dimensions, np.all is probably more forgiving.
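
One way a check based on np.all can pass while np.testing.assert_array_equal fails is a shape mismatch hidden by broadcasting. The sketch below assumes that this was the cause, which the thread only guesses at; the arrays are made up for illustration.

```python
import numpy as np

expected = np.array([1, 2, 3])   # shape (3,)
actual = np.array([[1, 2, 3]])   # shape (1, 3)

# Broadcasting makes the element-wise comparison succeed despite the
# differing shapes, so np.all(...) reports True.
lenient_result = bool(np.all(expected == actual))

# assert_array_equal also checks that the shapes agree, so it raises here.
try:
    np.testing.assert_array_equal(expected, actual)
    strict_passed = True
except AssertionError:
    strict_passed = False

print(lenient_result, strict_passed)
```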

TAlonglong · 2021-03-02T12:50:37Z

OK, I made the corrections you suggested @mraspaud. Please let me know what you think.

mraspaud

LGTM! Thanks for the hard work!

Trygve Aspenes added 2 commits February 1, 2021 19:12

Prepend datasets begining with a digit by CHANNEL_

5af9f73

Fix newline

bda9f1c

stickler-ci reviewed Feb 1, 2021

satpy/writers/cf_writer.py

Trygve Aspenes added 7 commits February 12, 2021 16:03

Merge remote-tracking branch 'origin/master' into dataset_names_when_…

9bc7821

…saving_to_cf

parsing of wavelength from netcdf variable attribute

c537672

stickler long line

1c1898a

flake8 lint b007

9d55746

codebeat nesting to deep

9dc8d27

lint

02a3ce2

More propper name

8de0bb9

TAlonglong force-pushed the dataset_names_when_saving_to_cf branch from 152fb6d to 8de0bb9 Compare February 13, 2021 08:17

Trygve Aspenes added 2 commits February 13, 2021 09:28

Remove not needed code. Some test

a0d916c

Add test to cf writer dataset starting with a digit

7e8b651

TAlonglong marked this pull request as ready for review February 14, 2021 18:48

TAlonglong requested review from djhoese and mraspaud as code owners February 14, 2021 18:48

Allow passing flag to turn on valid cf dataset name

c0859bd

djhoese reviewed Feb 16, 2021

Trygve Aspenes added 3 commits March 1, 2021 17:42

Possible to include original_name in nc var attrs

e5b8c17

Fix and add test

db34a63

Try fix test again

5a39dbc

mraspaud reviewed Mar 2, 2021

satpy/readers/satpy_cf_nc.py

Trygve Aspenes added 2 commits March 2, 2021 08:16

fix ds_id vis ds_info

88d8320

deepcode

0351ba4

Trygve Aspenes added 2 commits March 2, 2021 09:19

Deepcode2

19f9397

codebeat nesting to deep, complexity

6d66cf9

mraspaud reviewed Mar 2, 2021

satpy/tests/reader_tests/test_satpy_cf_nc.py

satpy/readers/satpy_cf_nc.py

satpy/writers/cf_writer.py

Trygve Aspenes added 5 commits March 2, 2021 11:46

Fix assign_ds_info

4b4fcd1

include_orig_name=True

ebdce35

use np.testing.assert_array_equal. Fix one test

0042192

Only add original_name attr when prefix is used

8eaeedd

replace assertequal with np.testing.assert_array_equal

3e9d293

mraspaud approved these changes Mar 2, 2021

mraspaud merged commit 3c2cdc1 into pytroll:master Mar 2, 2021

mraspaud assigned TAlonglong Mar 2, 2021

TAlonglong deleted the dataset_names_when_saving_to_cf branch March 4, 2021 15:08
