- Status: This bug is currently active and awaits the completion of @timcera's `iomanager` branch to be fully functional - see code here
- Location: See Add update_uci mode in CLI and test UCI files #200
- Problem: `readUCI()` renames columns before setting defaults, but does not flush the rename to the hdf5, which creates duplicate columns (see the minimal sketch after this list)
- Outcome:
  - If `readUCI()` is called with the 3rd parameter `overwrite=False`, duplicate columns get written to the hdf5.
  - This appears to cause an index error when doing multiple UCI re-reads with `overwrite=False`.
- Use case: This matters because the `import_uci` step can be very time consuming with large WDM inputs, so I am working on a command line function that re-imports only the UCI parameters while leaving the WDM/timeseries data untouched in the original `h5` file.
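To make the failure mode concrete, here is a minimal, self-contained pandas sketch (illustrative only - not the `readUCI()` code, though the column names come from its SNOW rename map) of how a rename pass plus an old-name defaults pass yields duplicate labels on a second read:

```python
import pandas as pd

# First readUCI() pass: the parsed table arrives with the legacy column name.
df = pd.DataFrame({"PKSNOW": [0.0]}, index=["P001"])
df = df.rename(columns={"PKSNOW": "PACKF"})  # the renaming step

# Defaults pass: the defaults still carry the old name, so the deprecated
# column is added back next to the renamed one and written to the hdf5.
df["PKSNOW"] = 0.0

# Second readUCI() pass with overwrite=False re-reads that table and renames
# again, producing a duplicate PACKF column.
df = df.rename(columns={"PKSNOW": "PACKF"})
print(df.columns.tolist())  # ['PACKF', 'PACKF']

# The subsequent df.to_hdf(store, key=path, data_columns=True) then raises:
# ValueError: cannot reindex on an axis with duplicate labels
```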
Testing:
Get the branch (see #200):
First Import and Run
```
cd tests/testcbp/HSP2Results
hsp2 import_uci PL3_5250_0001eq.uci PL3_5250_0001eq.h5
hsp2 run PL3_5250_0001eq.h5
2026-01-02 20:35:10.69 Simulation Start: 2001-01-01 00:00:00, Stop: 2002-01-01 00:00:00
2026-01-02 20:35:10.69 RCHRES R001 DELT(minutes): 60
2026-01-02 20:35:18.66 HYDR
2026-01-02 20:35:58.46 ADCALC
2026-01-02 20:35:59.41 Done; Run time is about 01:37.8 (mm:ss)
```
First Update with longer sim and Run
```
2026-01-02 20:39:48.09 Simulation Start: 1984-01-01 00:00:00, Stop: 2020-01-01 00:00:00
2026-01-02 20:39:48.09 RCHRES R001 DELT(minutes): 60
2026-01-02 20:39:55.57 HYDR
2026-01-02 20:42:14.50 ADCALC
2026-01-02 20:42:17.27 Done; Run time is about 03:24.5 (mm:ss)
```
2nd Update (back to shorter duration) and Run
```
hsp2 update_uci PL3_5250_0001eq.uci PL3_5250_0001eq.h5
Traceback (most recent call last):
  File "/usr/local/share/venv/hsp2dev_py10/bin/hsp2", line 7, in <module>
    sys.exit(main())
<snip>
  File "/usr/local/share/venv/hsp2dev_py10/lib/python3.10/site-packages/hsp2/hsp2tools/readUCI.py", line 318, in readUCI
    df.to_hdf(store, key=path, data_columns=True)
<snip>
ValueError: cannot reindex on an axis with duplicate labels
```
Analysis
- Line 318 is part of the renaming of columns in the PERLND/SNOW space.
- I suspect the cause is the renaming of the PERLND SNOW parameters via `{"PKSNOW": "PACKF", "PKICE": "PACKI", "PKWATR": "PACKW"}`, which produces duplicate columns.
```python
import h5py
import pandas as pd

store = pd.HDFStore("./tests/testcbp/HSP2results/PL3_5250_0001eq.h5", mode="r")
pd.read_hdf(store, "/PERLND/SNOW/STATES")
```
```
Traceback (most recent call last):
  File "/usr/local/share/venv/hsp2dev_py10/lib/python3.10/site-packages/pandas/io/pytables.py", line 1819, in _create_storer
    cls = _TABLE_MAP[tt]
KeyError: None
```
- But the h5 itself is valid, as evidenced by other tables being read just fine:

```python
pd.read_hdf(store, "/RCHRES/ACIDPH/STATES")
```
```
     OPNID  ACCONC1  ACCONC2  ACCONC3  ACCONC4  ACCONC5  ACCONC6  ACCONC7
R001   NaN      0.0      0.0      0.0      0.0      0.0      0.0      0.0
```
- Lines 312-318 in readUCI.py rename some columns in the `/PERLND/SNOW/STATES` table.
- In line 466, each existing hsp path in the hdf is loaded into the var `df`.
- In line 477, the defaults for the matching table are loaded into `dct_params`.
  - But for at least some, like `/PERLND/SNOW/STATES`, the default columns still have their old names.
- In lines 470-483, any missing default columns are added to an updated df for that table, which is then pushed back to the hdf5.
- For some, like the `/PERLND/SNOW/STATES` table, this adds deprecated column names back into the df, in addition to the recently renamed versions of those old columns.
- Then, the next time `readUCI()` is run with `overwrite=False`, the renaming line adds duplicate copies of the renamed columns. I think this is the issue, or at least one of the issues (a possible guard is sketched after this list).
- Below is some slightly edited code from `readUCI()` that allows one to step through this. It is not as efficient as a command line debugger, but my VSCode has been partially broken since a recent update, and I can't step into code on Windows for the time being.
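If that is indeed the mechanism, one possible guard is to make the rename idempotent by dropping a deprecated source column whenever its renamed target already exists. This is purely an illustrative sketch (the `rename_idempotent` helper is hypothetical, not part of readUCI.py):

```python
import pandas as pd

RENAMES = {"PKSNOW": "PACKF", "PKICE": "PACKI", "PKWATR": "PACKW"}

def rename_idempotent(df: pd.DataFrame, renames: dict) -> pd.DataFrame:
    """Rename old -> new without ever creating duplicate labels.

    Hypothetical helper for illustration only. If both the old and the new
    name are present (e.g. the defaults pass re-added PKSNOW next to PACKF),
    drop the stale old column instead of renaming it onto the existing one.
    """
    stale = [old for old, new in renames.items()
             if old in df.columns and new in df.columns]
    return df.drop(columns=stale).rename(columns=renames)

# Second-pass table as described above: renamed column plus re-added default.
df = pd.DataFrame({"PACKF": [1.0], "PKSNOW": [0.0]}, index=["P001"])
df = rename_idempotent(df, RENAMES)
print(df.columns.tolist())  # ['PACKF'] - no duplicates on re-read
```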
The Entire Code as a command-line runnable test
```python
"""
Read data from a UCI file and create an HDF file with the data.
Parameters
----------
uciname : str
The name of the UCI file to read.
hdfname : str
The name of the HDF file to store the data.
overwrite : bool, optional
Whether to overwrite existing data in the HDF file. Defaults to True.
Returns
-------
None
"""
# Explicit imports (os, defaultdict, hsp2tools) so the script does not rely on
# names leaking through the star imports; the star imports supply the parser
# helpers used below (reader, getlines, global_, opn, network, schematic, ...).
import os
from collections import defaultdict

import h5py
import pandas as pd

from hsp2 import hsp2tools
from hsp2.hsp2tools.readUCI import *
from hsp2.hsp2io import *
# Hard code inputs for example testing
uciname = "./tests/testcbp/HSP2results/PL3_5250_0001eq.uci"
hdfname = "./tests/testcbp/HSP2results/PL3_5250_0001eq.h5"
overwrite = False
# Type converters for the ParseTable DEFAULT column
convert = {"C": str, "I": int, "R": float}
if overwrite is True and os.path.exists(hdfname):
os.remove(hdfname)
# create lookup dictionaries from 'ParseTable.csv' and 'rename.csv'
parse = defaultdict(list)
defaults = {}
cat = {}
path = {}
hsp_paths = {}
datapath = os.path.join(hsp2tools.__path__[0], "data", "ParseTable.csv")
for row in pd.read_csv(datapath).itertuples():
parse[row.OP, row.TABLE].append(
(row.NAME, row.TYPE, row.START, row.STOP, row.DEFAULT)
)
defaults[row.OP, row.SAVE, row.NAME] = convert[row.TYPE](row.DEFAULT)
cat[row.OP, row.TABLE] = row.CAT
path[row.OP, row.TABLE] = row.SAVE
# store paths for checking defaults:
hsp_path = f"/{row.OP}/{row.SAVE}/{row.CAT}"
    if hsp_path not in hsp_paths:
hsp_paths[hsp_path] = {}
hsp_paths[hsp_path][row.NAME] = defaults[row.OP, row.SAVE, row.NAME]
rename = {}
extendlen = {}
datapath = os.path.join(hsp2tools.__path__[0], "data", "rename.csv")
for row in pd.read_csv(datapath).itertuples():
if row.LENGTH != 1:
extendlen[row.OPERATION, row.TABLE] = row.LENGTH
rename[row.OPERATION, row.TABLE] = row.RENAME
net = None
sc = None
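# The store is opened in append mode; with overwrite=False the existing file
# (including any columns already renamed by a previous pass) is kept in place.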
store = pd.HDFStore(hdfname, mode="a")
info = (store, parse, path, defaults, cat, rename, extendlen)
f = reader(uciname)
for line in f:
if line.startswith("GLOBAL"):
global_(info, getlines(f))
elif line.startswith("OPN"):
opn(info, getlines(f))
elif line.startswith("NETWORK"):
net = network(info, getlines(f))
elif line.startswith("SCHEMATIC"):
sc = schematic(info, getlines(f))
elif line.startswith("MASS-LINK"):
masslink(info, getlines(f))
elif line.startswith("FTABLES"):
ftables(info, getlines(f))
elif line.startswith("EXT"):
ext(info, getlines(f))
elif line.startswith("GENER"):
gener(info, getlines(f))
elif line.startswith("PERLND"):
operation(info, getlines(f), "PERLND")
elif line.startswith("IMPLND"):
operation(info, getlines(f), "IMPLND")
elif line.startswith("RCHRES"):
operation(info, getlines(f), "RCHRES")
elif line.startswith("MONTH-DATA"):
monthdata(info, getlines(f))
elif line.startswith("SPEC-ACTIONS"):
specactions(info, getlines(f))
colnames = (
"AFACTR",
"MFACTOR",
"MLNO",
"SGRPN",
"SMEMN",
"SMEMSB",
"SVOL",
"SVOLNO",
"TGRPN",
"TMEMN",
"TMEMSB",
"TRAN",
"TVOL",
"TVOLNO",
"COMMENTS",
)
if not ((net is None) and (sc is None)):
linkage = pd.concat((net, sc), ignore_index=True, sort=True)
for cname in colnames:
if cname not in linkage.columns:
linkage[cname] = ""
linkage = linkage.sort_values(by=["TVOLNO"]).replace("na", "")
linkage.to_hdf(store, key="/CONTROL/LINKS", data_columns=True)
Lapse.to_hdf(store, key="TIMESERIES/LAPSE_Table")
Seasons.to_hdf(store, key="TIMESERIES/SEASONS_Table")
Svp.to_hdf(store, key="TIMESERIES/Saturated_Vapor_Pressure_Table")
keys = set(store.keys())
# rename needed for restart. NOTE issue with line 157 in PERLND SNOW HSPF
# where PKSNOW = PKSNOW + PKICE at start - ONLY
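# Suspected culprit (see Analysis above): on a re-read with overwrite=False the
# stored table may already hold PACKF/PACKI/PACKW plus the old names re-added by
# the defaults pass further down, so this rename can create duplicate columns.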
path = "/PERLND/SNOW/STATES"
if path in keys:
df = pd.read_hdf(store, path)
df = df.rename(
columns={"PKSNOW": "PACKF", "PKICE": "PACKI", "PKWATR": "PACKW"}
)
df.to_hdf(store, key=path, data_columns=True)
path = "/IMPLND/SNOW/STATES"
if path in keys:
df = pd.read_hdf(store, path)
df = df.rename(
columns={"PKSNOW": "PACKF", "PKICE": "PACKI", "PKWATR": "PACKW"}
)
df.to_hdf(store, key=path, data_columns=True)
path = "/PERLND/SNOW/FLAGS"
if path in keys:
df = pd.read_hdf(store, path)
if "SNOPFG" not in df.columns: # didn't read SNOW-FLAGS table
df["SNOPFG"] = 0
df.to_hdf(store, key=path, data_columns=True)
path = "/IMPLND/SNOW/FLAGS"
if path in keys:
df = pd.read_hdf(store, path)
if "SNOPFG" not in df.columns: # didn't read SNOW-FLAGS table
df["SNOPFG"] = 0
df.to_hdf(store, key=path, data_columns=True)
# Need to fixup missing data
path = "/IMPLND/IWATER/PARAMETERS"
if path in keys:
df = pd.read_hdf(store, path)
if "PETMIN" not in df.columns: # didn't read IWAT-PARM2 table
df["PETMIN"] = 0.35
df["PETMAX"] = 40.0
df.to_hdf(store, key=path, data_columns=True)
path = "/IMPLND/IWTGAS/PARAMETERS"
if path in keys:
df = pd.read_hdf(store, path)
if "SDLFAC" not in df.columns: # didn't read LAT-FACTOR table
df["SDLFAC"] = 0.0
df["SLIFAC"] = 0.0
df.to_hdf(store, key=path, data_columns=True)
path = "/IMPLND/IQUAL/PARAMETERS"
if path in keys:
df = pd.read_hdf(store, path)
if "SDLFAC" not in df.columns: # didn't read LAT-FACTOR table
df["SDLFAC"] = 0.0
df["SLIFAC"] = 0.0
df.to_hdf(store, key=path, data_columns=True)
path = "/PERLND/PWTGAS/PARAMETERS"
if path in keys:
df = pd.read_hdf(store, path)
if "SDLFAC" not in df.columns: # didn't read LAT-FACTOR table
df["SDLFAC"] = 0.0
df["SLIFAC"] = 0.0
df["ILIFAC"] = 0.0
df["ALIFAC"] = 0.0
df.to_hdf(store, key=path, data_columns=True)
if "SOTMP" not in df.columns: # didn't read PWT-TEMPS table
df["SOTMP"] = 60.0
df["IOTMP"] = 60.0
df["AOTMP"] = 60.0
df.to_hdf(store, key=path, data_columns=True)
if "SODOX" not in df.columns: # didn't read PWT-GASES table
df["SODOX"] = 0.0
df["SOCO2"] = 0.0
df["IODOX"] = 0.0
df["IOCO2"] = 0.0
df["AODOX"] = 0.0
df["AOCO2"] = 0.0
df.to_hdf(store, key=path, data_columns=True)
path = "/PERLND/PWATER/PARAMETERS"
if path in keys:
df = pd.read_hdf(store, path)
if "FZG" not in df.columns: # didn't read PWAT-PARM5 table
df["FZG"] = 1.0
df["FZGL"] = 0.1
df.to_hdf(store, key=path, data_columns=True)
path = "/PERLND/PQUAL/PARAMETERS"
if path in keys:
df = pd.read_hdf(store, path)
if "SDLFAC" not in df.columns: # didn't read LAT-FACTOR table
df["SDLFAC"] = 0.0
df["SLIFAC"] = 0.0
df["ILIFAC"] = 0.0
df["ALIFAC"] = 0.0
df.to_hdf(store, key=path, data_columns=True)
path = "/RCHRES/GENERAL/INFO"
if path in keys:
dfinfo = pd.read_hdf(store, path)
path = "/RCHRES/HYDR/PARAMETERS"
if path in keys:
df = pd.read_hdf(store, path)
df["NEXITS"] = dfinfo["NEXITS"]
df["LKFG"] = dfinfo["LKFG"]
if "IREXIT" not in df.columns: # didn't read HYDR-IRRIG table
df["IREXIT"] = 0
df["IRMINV"] = 0.0
df["FTBUCI"] = df["FTBUCI"].map(lambda x: f"FT{int(x):03d}")
df.to_hdf(store, key=path, data_columns=True)
del dfinfo["NEXITS"]
del dfinfo["LKFG"]
dfinfo.to_hdf(store, key="RCHRES/GENERAL/INFO", data_columns=True)
path = "/RCHRES/HTRCH/FLAGS"
if path in keys:
df = pd.read_hdf(store, path)
if "BEDFLG" not in df.columns: # didn't read HT-BED-FLAGS table
df["BEDFLG"] = 0
df["TGFLG"] = 2
df["TSTOP"] = 55
df.to_hdf(store, key=path, data_columns=True)
path = "/RCHRES/HTRCH/PARAMETERS"
if path in keys:
df = pd.read_hdf(store, path)
if "ELEV" not in df.columns: # didn't read HEAT-PARM table
df["ELEV"] = 0.0
df["ELDAT"] = 0.0
df["CFSAEX"] = 1.0
df["KATRAD"] = 9.37
df["KCOND"] = 6.12
df["KEVAP"] = 2.24
df.to_hdf(store, key=path, data_columns=True)
path = "/RCHRES/HTRCH/PARAMETERS"
if path in keys:
df = pd.read_hdf(store, path)
if "MUDDEP" not in df.columns: # didn't read HT-BED-PARM table
df["MUDDEP"] = 0.33
df["TGRND"] = 59.0
df["KMUD"] = 50.0
df["KGRND"] = 1.4
df.to_hdf(store, key=path, data_columns=True)
path = "/RCHRES/HTRCH/STATES"
if path in keys:
df = pd.read_hdf(store, path)
# if 'TW' not in df.columns: # didn't read HEAT-INIT table
# df['TW'] = 60.0
# df['AIRTMP']= 60.0
# apply defaults:
# JUST FOR TESTING we overwrite hsp_paths to ONLY contain the PERLND SNOW states
path = "/PERLND/SNOW/STATES"
hsp_paths = {path: hsp_paths[path]}
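# NOTE: the defaults applied below come from ParseTable.csv, which still uses
# the pre-rename names (PKSNOW/PKICE/PKWATR); for /PERLND/SNOW/STATES this
# re-adds deprecated columns next to the renamed ones, setting up the duplicate.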
for path in hsp_paths:
if path in keys:
df = pd.read_hdf(store, path)
dct_params = hsp_paths[path]
new_columns = {}
for par_name in dct_params:
            if par_name == "CFOREA":
                ichk = 0  # no-op; convenient spot for a conditional breakpoint
if par_name not in df.columns: # missing value in HDF5 path
def_val = dct_params[par_name]
if def_val != "None":
# df[par_name] = def_val
new_columns[par_name] = def_val
new_columns = pd.DataFrame(new_columns, index=df.index)
df1 = pd.concat([df, new_columns], axis=1)
df1.to_hdf(store, key=path, data_columns=True)
else:
if path[-6:] == "STATES":
# need to add states if it doesn't already exist to save initial state variables
# such as the case where entire IWAT-STATE1 table is being defaulted
            if "df" not in locals():
                x = 1  # sometimes when debugging, keys gets creamed - seems like an IDE bug
for column in df.columns: # clear out existing data frame columns
df = df.drop([column], axis=1)
dct_params = hsp_paths[path]
for par_name in dct_params:
def_val = dct_params[par_name]
if def_val != "None":
df[par_name] = def_val
df.to_hdf(store, key=path, data_columns=True)
# Now show what the hdf ended up with for the last path processed
print(pd.read_hdf(store, path))
store.close()
```
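Note that after a failed second pass, the problem node may not be readable through pandas at all (the `KeyError: None` above). One way to peek at what was actually stored is to open the file with h5py directly; a minimal sketch, assuming the node is an HDF5 group as pandas writes normally create:

```python
import h5py

# Inspect the raw layout of the problem node, since pd.read_hdf fails on it.
with h5py.File("./tests/testcbp/HSP2results/PL3_5250_0001eq.h5", "r") as f:
    grp = f["/PERLND/SNOW/STATES"]
    print(dict(grp.attrs))   # pandas metadata (pandas_type, etc.)
    print(list(grp.keys()))  # datasets stored under the node
```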