When saving to CF prepend datasets starting with a digit by CHANNEL_ #1525
I think this does what I want. Even if this is a draft, please comment if this will not work.
Codecov Report
@@ Coverage Diff @@
## master #1525 +/- ##
==========================================
+ Coverage 92.54% 92.57% +0.02%
==========================================
Files 251 251
Lines 36761 36969 +208
==========================================
+ Hits 34022 34225 +203
- Misses 2739 2744 +5
This also needs a corresponding PR in the netcdf cf reader. Should I add that in this PR, or make a separate PR for it?
I think it makes sense to have the two parts in the same PR.
Oh no, this is messed up.
152fb6d
to
8de0bb9
Compare
OK, I think I got it right now.
I messed up some update from another PR. I think I got it right by reverting, but the coverage with all those files does not look correct.
Now that I think about it, maybe a flag needs to be given to turn this on. Otherwise this will apply to all products with dataset names starting with a digit, without notifying the user.
One could argue that a variable name starting with a number isn't legal netcdf anyway...
Hm, adding a flag resulted in higher complexity, so codebeat did not like it. The problem is to pass the flag to the
I have some questions...and a complete redesign option.
satpy/writers/cf_writer.py
Outdated
@@ -694,6 +716,9 @@ def save_datasets(self, datasets, filename=None, groups=None, header_attrs=None,
for kwarg in satpy_kwargs:
    to_netcdf_kwargs.pop(kwarg, None)

# Allow to prepend CHANNEL_ to datasets name staring with digit
valid_cf_dataset_name = to_netcdf_kwargs.pop('valid_cf_dataset_name', False)
Why is this not a defined keyword argument in the method definition?
I was not sure how I should do it. Adding it as a keyword would increase the complexity of the method definition, but improve the readability.
satpy/writers/cf_writer.py
Outdated
""" | ||
if exclude_attrs is None:
    exclude_attrs = []

new_data = dataarray.copy()
if 'name' in new_data.attrs:
    name = new_data.attrs.pop('name')
    if valid_cf_dataset_name and name[0].isdigit():
        _orig_name = name
        name = 'CHANNEL_' + name
What about renaming this keyword argument to numeric_name_prefix, and if it is None, don't do anything? If it is specified then it is assumed to be a string. So someone could do scn.save_datasets(writer='cf', numeric_name_prefix='ch') if they wanted to.
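The suggestion above can be sketched as a small helper. This is only an illustration under assumptions, not the code that ended up in satpy: the function name prefix_numeric_name is hypothetical, and only the numeric_name_prefix keyword and the "starts with a digit" rule are taken from this thread.

```python
def prefix_numeric_name(name, numeric_name_prefix=None):
    """Return name, prefixed when it starts with a digit and a prefix is given.

    Hypothetical helper for illustration only; not part of the satpy API.
    """
    # No prefix configured (None or empty string): leave every name untouched.
    if numeric_name_prefix and name and name[0].isdigit():
        return numeric_name_prefix + name
    return name


print(prefix_numeric_name("1", numeric_name_prefix="CHANNEL_"))
print(prefix_numeric_name("1"))
print(prefix_numeric_name("VIS006", numeric_name_prefix="ch"))
```

With a design like this, the renaming is opt-in and the caller picks the prefix, which addresses the earlier concern about silently renaming every product.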
I second that.
Sounds like a good idea.
I just have not figured out how to read this back if we are to skip the extra attribute I have added for now.
satpy/writers/cf_writer.py
Outdated
_orig_name = name
name = 'CHANNEL_' + name
warnings.warn('Rename dataset {} to {}.'.format(_orig_name, name))
new_data.attrs['satpy_dataset_name'] = _orig_name
Does this attribute show up in the resulting NetCDF file? Does @mraspaud's recent work with wavelength ranges and all that change how this could/should work?
Yes, this shows up in the netcdf file.
I would need to investigate whether this can be skipped.
So, I did not understand that with the wavelength work by @mraspaud you actually don't need the dataset or variable name.
What happened is that I messed up my code. I thought I had added renaming of the dataset back to the original name when reading the netcdf file back. But I messed up my git branch and accidentally removed this when resetting to a previous commit. So I thought my code worked, but it was actually the work of @mraspaud that fixed this, if that makes any sense. Anyway, this makes it a lot simpler, I think. I will try to clean this up tomorrow.
Ah ok. Now I understand. But the writer does not include the So you suggest to add a test for that anyway? Even if the cf writer does not support it? |
Oh, OK, I thought you added it to the writer. Should we do it?
I was hoping to avoid it. Adding an extra attribute to each variable in the resulting netcdf file. But it will make things easier, in that way the reader will know exactly what to rename to and we can skip the parameter name But at the cost of the extra attribute for each variable starting with a digit in the netcdf file. |
@TAlonglong I think I'm in favour. It's just one attribute with a little text, and we find a good name for it, I think it makes sense in the "self-describing" way you mention. |
Thanks for your patience in the PR @mraspaud . I think your suggestion to allow both specified by user is a good idea. I will look into that. For the attribute name I suggest But what should take precedence? I think the attribute name first, then user specified. For reading the data that is. |
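The precedence proposed above (attribute stored in the file first, then the user-specified prefix) could look roughly like the sketch below. Everything in it is an assumption made for illustration: the function restore_name is hypothetical, and the attribute key is a placeholder because the name actually chosen is not visible in this conversation.

```python
# Placeholder key: the attribute name agreed on in the PR is not shown above.
ORIGINAL_NAME_ATTR = "hypothetical_original_name"


def restore_name(var_name, attrs, numeric_name_prefix=None):
    """Work out which dataset name to use when reading a variable back.

    Hypothetical reader-side helper for illustration only.
    """
    # 1. An attribute stored in the file takes precedence.
    if ORIGINAL_NAME_ATTR in attrs:
        return attrs[ORIGINAL_NAME_ATTR]
    # 2. Otherwise strip the user-specified prefix, but only when the
    #    remainder starts with a digit (i.e. the prefix was plausibly added).
    if numeric_name_prefix and var_name.startswith(numeric_name_prefix):
        stripped = var_name[len(numeric_name_prefix):]
        if stripped and stripped[0].isdigit():
            return stripped
    # 3. Nothing to undo.
    return var_name


print(restore_name("CHANNEL_1", {ORIGINAL_NAME_ATTR: "1"}))
print(restore_name("ch4", {}, numeric_name_prefix="ch"))
print(restore_name("chlorophyll", {}, numeric_name_prefix="ch"))
```

The digit check in step 2 is a design choice of this sketch, so that an ordinary variable whose name merely begins with the prefix is not renamed by accident.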
Sounds good
I'm good with that. |
OK, so I did add the option to store There are some codebeat issues. But I don't see how I can make codebeat any more happy |
Ok, last comments: codebeat complains about too many arguments, but the way satpy works at the moment, there isn't really a way to avoid that. Regarding the nesting and complexity, this can be addressed by extracting smaller methods from the culprit function.
Hm, codebeat is happy? OK, I'm fine. What do you think @mraspaud ? |
Ok, almost there!
When looking into the np.testing.assert_array_equal I got a problem here https://github.com/pytroll/satpy/blob/master/satpy/tests/writer_tests/test_cf.py#L127. I had to change the code to make it pass. Looks like the test was wrong all the time but passed due to use of assertEqual(np.all |
I guess that because of the dimensions, |
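To illustrate why a check built around np.all can pass when the arrays differ, here is a small sketch. The exact assertion used in the old test is cut off above, so the "weak" check below is an assumed pattern, not a quote from test_cf.py; the arrays are made up.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])  # clearly different from a

# Assumed weak pattern: this compares two booleans ("are all elements
# truthy?"), not the arrays themselves, so it holds for any non-zero data.
weak_check_passes = bool(np.all(a) == np.all(b))

# Element-wise check: raises AssertionError on any value mismatch.
try:
    np.testing.assert_array_equal(a, b)
    strict_check_passes = True
except AssertionError:
    strict_check_passes = False

print(weak_check_passes, strict_check_passes)
```

np.testing.assert_array_equal also reports a shape mismatch between two non-scalar arrays, which a broadcasting comparison wrapped in np.all can hide.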
OK, I made the corrections you suggested @mraspaud. Please let me know what you think. |
LGTM! Thanks for the hard work!
In NetCDF CF, variable names should not start with a digit.
Some channels, for example those of AVHRR, are named only by a number, and this number is used as the dataset name. When saving to NetCDF CF, this PR prepends CHANNEL_ to each dataset name starting with a digit, so that the variables in the resulting NetCDF CF file do not start with a digit but are instead named
CHANNEL_<original_dataset_name>
flake8 satpy