Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getting an extra layer when reading geotop from cache #218

Closed
OnnoEbbens opened this issue Aug 14, 2023 · 8 comments · Fixed by #238
Closed

getting an extra layer when reading geotop from cache #218

OnnoEbbens opened this issue Aug 14, 2023 · 8 comments · Fixed by #238
Assignees

Comments

@OnnoEbbens
Copy link
Collaborator

When I run this code the first time I get 37 layers. When I run it the second time I get 38 layers. It has something to do with the -127 layer in geotop but I have no idea yet what.


import flopy as fp
import matplotlib.pyplot as plt
import nlmod
import numpy as np
import os
import pandas as pd

extent =  (120000, 121000, 520000, 521000)
model_name = 'test'
model_ws = os.path.join("../test", model_name)
figdir, cachedir = nlmod.util.get_model_dirs(model_ws)

# %% lagenmodel
regis_ds = nlmod.read.regis.get_regis(
    extent, cachedir=cachedir, cachename="regis")
geotop_ds = nlmod.read.geotop.get_geotop(
    extent, cachedir=cachedir, cachename="geotop")

# split regis layer
split_dict = {"PZWAz3": (1, 1, 1, 2, 2, 2)}
regis_split_ds, split_reindexer = nlmod.layers.split_layers_ds(
    regis_ds, split_dict, return_reindexer=True
)

regis_geotop_ds = nlmod.read.regis.add_geotop_to_regis_layers(
    regis_ds,
    geotop_ds,
    layers="HLc",
)

ds = nlmod.to_model_ds(
    regis_geotop_ds,
    # regis_ds,
    model_name,
    model_ws,
    delr=500.0,
    delc=500.0,
    extrapolate=False,
    transport=True,
)
@OnnoEbbens
Copy link
Collaborator Author

When geotop is loaded from cache two extra attributes ('_FillValue' and 'missing_value') appear in de 'strat' DataArray. These variables are not there if geotop is not loaded from cache.

@OnnoEbbens
Copy link
Collaborator Author

Probably related to #91

@OnnoEbbens
Copy link
Collaborator Author

OnnoEbbens commented Aug 14, 2023

The cache function does removes the _FillValue if the function is called with a dataset using:

if "_FillValue" in ds2[coord].attrs and "_FillValue" not in ds[coord].attrs:
    del ds2[coord].attrs["_FillValue"]

Unfortunately the nlmod.read.geotop.get_geotop is not called with a dataset.

@OnnoEbbens
Copy link
Collaborator Author

The _FillValue is only added because we read the cache using mask_and_scale=False.

@OnnoEbbens
Copy link
Collaborator Author

I think the best solution is to remove mask_and_scale=False when reading the cache and solve the problems with icvert changing from int to float after.

@OnnoEbbens OnnoEbbens self-assigned this Aug 24, 2023
@OnnoEbbens
Copy link
Collaborator Author

The _FillValue is used in xarray and netcdf to efficiently store float data as integers on disk. Also described in the documentation. In our code we use the _FillValue to store the nodata values of an integer DataArray together with the array. These two different uses are the cause of conflicts (#91, #128 and this issue). My proposal is to use a different attribute name e.g. nodata to store the nodata value of an integer array. Some more details of the conflicts below.

When you write a DataArray with the attribute _FillValue to a netcdf file and then load it using mask_and_scale=True (which is the default setting) in xr.open_dataset it will use the _FillValue to replace all these values with NaN values. If the DataArray was of type 'int' replacing the _FillValue with NaN values will also change the dtype to type 'float'. The previous solution was to use mask_and_scale=False. This happened with the icvert variable, see #91.

When you save a DataArray with NaN values to a netcdf file the attributes _FillValue and missing_value are automatically added to store the data more efficient. When such a DataArray is loaded from a netcdf file the _FillValue and missing_value are automatically interpreted if mask_and_scale=True and not added as attributes to the DataArray. When mask_and_scale=False the _FillValue and missing_value are not interpreted and added as attributes to the DataArray. This can also change the dtype from float to int. This happened with the strat variable, see comments in this issue.

@bdestombe
Copy link
Collaborator

pydata/xarray#7654

@dbrakenhoff
Copy link
Collaborator

Fixed by #238

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants