
Bug: readUCI() renames columns before setting defaults, which creates duplicate columns #201


Description

  • Status: This bug is currently active and is awaiting completion of @timcera's iomanager branch to be fully functional - see code here
  • Location: See Add update_uci mode in CLI and test UCI files #200
  • Problem: readUCI() renames columns before setting defaults, but does not flush to hdf5, which creates duplicate columns
  • Outcome:
    • if readUCI() is called with the 3rd parameter overwrite=False, duplicate columns get written to the hdf5
    • this appears to cause an index error when doing multiple UCI re-reads with overwrite=False
  • Use case: this matters because the import_uci step can be very time consuming with large WDM inputs, so I am working on a command line function that re-imports only the UCI parameters, leaving the WDM/timeseries data in the original h5 file untouched.
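
To see the mechanism in isolation, here is a minimal pandas-only sketch (hypothetical one-column data, not the real UCI tables) of how the rename-then-default cycle produces duplicate labels:

import pandas as pd

# hypothetical stand-in for /PERLND/SNOW/STATES
df = pd.DataFrame({"PKSNOW": [0.0]}, index=["P001"])

# pass 1: readUCI() renames the legacy column ...
df = df.rename(columns={"PKSNOW": "PACKF"})
# ... but the defaults step still keys on the old name, so it re-adds it
df["PKSNOW"] = 0.0

# pass 2 (re-running readUCI() with overwrite=False): the same rename
# now yields two columns named PACKF
df = df.rename(columns={"PKSNOW": "PACKF"})
print(df.columns.tolist())        # ['PACKF', 'PACKF']
print(df.columns.has_duplicates)  # True; a later to_hdf()/reindex raises
                                  # "cannot reindex on an axis with duplicate labels"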

Testing:

Get the branch (see #200):

First Import and Run

cd tests/testcbp/HSP2Results
hsp2 import_uci PL3_5250_0001eq.uci PL3_5250_0001eq.h5
hsp2 run PL3_5250_0001eq.h5

2026-01-02 20:35:10.69   Simulation Start: 2001-01-01 00:00:00, Stop: 2002-01-01 00:00:00
2026-01-02 20:35:10.69      RCHRES R001 DELT(minutes): 60
2026-01-02 20:35:18.66         HYDR
2026-01-02 20:35:58.46         ADCALC
2026-01-02 20:35:59.41   Done; Run time is about 01:37.8 (mm:ss)

First Update with longer sim and Run

2026-01-02 20:39:48.09   Simulation Start: 1984-01-01 00:00:00, Stop: 2020-01-01 00:00:00
2026-01-02 20:39:48.09      RCHRES R001 DELT(minutes): 60
2026-01-02 20:39:55.57         HYDR
2026-01-02 20:42:14.50         ADCALC
2026-01-02 20:42:17.27   Done; Run time is about 03:24.5 (mm:ss)

2nd Update (back to shorter duration) and Run

hsp2 update_uci PL3_5250_0001eq.uci PL3_5250_0001eq.h5

Traceback (most recent call last):
  File "/usr/local/share/venv/hsp2dev_py10/bin/hsp2", line 7, in <module>
    sys.exit(main())
<snip>
File "/usr/local/share/venv/hsp2dev_py10/lib/python3.10/site-packages/hsp2/hsp2tools/readUCI.py", line 318, in readUCI
    df.to_hdf(store, key=path, data_columns=True)
<snip>
ValueError: cannot reindex on an axis with duplicate labels

Analysis

  • Line 318 is the write-back at the end of the block (lines 312-318) that renames columns in the PERLND/SNOW space.
  • I suspect the cause is the renaming of the PERLND SNOW parameters {"PKSNOW": "PACKF", "PKICE": "PACKI", "PKWATR": "PACKW"}, which produces duplicate columns:
import h5py
import pandas as pd

store = pd.HDFStore("./tests/testcbp/HSP2results/PL3_5250_0001eq.h5", mode="r")
pd.read_hdf(store, "/PERLND/SNOW/STATES")
Traceback (most recent call last):
  File "/usr/local/share/venv/hsp2dev_py10/lib/python3.10/site-packages/pandas/io/pytables.py", line 1819, in _create_storer
    cls = _TABLE_MAP[tt]
KeyError: None
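
One way to dig further (a sketch, assuming the failed to_hdf() left the node half-written) is to inspect the group's attributes directly with h5py; the KeyError: None above comes from pandas looking up a table_type attribute it cannot find:

import h5py

with h5py.File("./tests/testcbp/HSP2results/PL3_5250_0001eq.h5", "r") as f:
    grp = f["/PERLND/SNOW/STATES"]
    print(dict(grp.attrs))   # pandas_type/table_type metadata pandas expects
    print(list(grp.keys()))  # child datasets actually present under the node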
  • But the h5 file itself is valid, as evidenced by other tables reading just fine:
pd.read_hdf(store, "/RCHRES/ACIDPH/STATES")

     OPNID  ACCONC1  ACCONC2  ACCONC3  ACCONC4  ACCONC5  ACCONC6  ACCONC7
R001   NaN      0.0      0.0      0.0      0.0      0.0      0.0      0.0
  • Lines 312-318 in readUCI.py rename some columns in the /PERLND/SNOW/STATES table
  • At line 466, each existing hsp path in the hdf is loaded into the variable df
  • At line 477, the defaults for the matching table are loaded into dct_params
    • But for at least some tables, like /PERLND/SNOW/STATES, the default columns still have their old names
  • In lines 470-483, any missing default columns are added to an updated df for that table, which is then pushed back to the hdf5
  • For some tables, like /PERLND/SNOW/STATES, this adds the deprecated column names back into the df, alongside the recently renamed versions of those same columns.
  • Then, the next time readUCI() is run with overwrite=False, the renaming line adds duplicate copies of the renamed columns.
  • I think this is the issue, or at least one issue; a possible mitigation is sketched after this list.
  • Below is some slightly edited code from readUCI() that lets one step through this. It is not as efficient as a command line debugger, but my VSCode has been partially broken since a recent update and I can't step into code on Windows for the time being.
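
If that diagnosis is right, one possible mitigation (only an illustration, not the iomanager fix) is to map the default column names through the same rename dict before adding them, so deprecated names can never be re-introduced:

# hypothetical guard for the defaults loop; SNOW_RENAMES mirrors the
# rename dict applied at lines 312-318 of readUCI.py
SNOW_RENAMES = {"PKSNOW": "PACKF", "PKICE": "PACKI", "PKWATR": "PACKW"}

def missing_defaults(df, dct_params, renames=SNOW_RENAMES):
    """Return only the defaults whose (possibly renamed) column is absent."""
    out = {}
    for par_name, def_val in dct_params.items():
        current = renames.get(par_name, par_name)  # map old name -> new name
        if current not in df.columns and def_val != "None":
            out[current] = def_val  # add under the current name only
    return out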

The entire code as a command-line runnable test

"""
Read data from a UCI file and create an HDF file with the data.

Parameters
----------
uciname : str
    The name of the UCI file to read.
hdfname : str
    The name of the HDF file to store the data.
overwrite : bool, optional
    Whether to overwrite existing data in the HDF file. Defaults to True.

Returns
-------
None
"""
import os
from collections import defaultdict

import h5py
import pandas as pd

from hsp2 import hsp2tools
from hsp2.hsp2tools.readUCI import *
from hsp2.hsp2io import *


# Hard code inputs for example testing
uciname = "./tests/testcbp/HSP2results/PL3_5250_0001eq.uci"
hdfname = "./tests/testcbp/HSP2results/PL3_5250_0001eq.h5"
overwrite = False

# type converters for the DEFAULT column in ParseTable.csv
convert = {"C": str, "I": int, "R": float}


if overwrite is True and os.path.exists(hdfname):
    os.remove(hdfname)

# create lookup dictionaries from 'ParseTable.csv' and 'rename.csv'
parse = defaultdict(list)
defaults = {}
cat = {}
path = {}
hsp_paths = {}
datapath = os.path.join(hsp2tools.__path__[0], "data", "ParseTable.csv")
for row in pd.read_csv(datapath).itertuples():
    parse[row.OP, row.TABLE].append(
        (row.NAME, row.TYPE, row.START, row.STOP, row.DEFAULT)
    )
    defaults[row.OP, row.SAVE, row.NAME] = convert[row.TYPE](row.DEFAULT)
    cat[row.OP, row.TABLE] = row.CAT
    path[row.OP, row.TABLE] = row.SAVE
    # store paths for checking defaults:
    hsp_path = f"/{row.OP}/{row.SAVE}/{row.CAT}"
    if hsp_path not in hsp_paths:
        hsp_paths[hsp_path] = {}
    hsp_paths[hsp_path][row.NAME] = defaults[row.OP, row.SAVE, row.NAME]

rename = {}
extendlen = {}
datapath = os.path.join(hsp2tools.__path__[0], "data", "rename.csv")
for row in pd.read_csv(datapath).itertuples():
    if row.LENGTH != 1:
        extendlen[row.OPERATION, row.TABLE] = row.LENGTH
    rename[row.OPERATION, row.TABLE] = row.RENAME

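# parse the UCI file section by section, dispatching on each block header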
net = None
sc = None
store = pd.HDFStore(hdfname, mode="a")
info = (store, parse, path, defaults, cat, rename, extendlen)
f = reader(uciname)
for line in f:
    if line.startswith("GLOBAL"):
        global_(info, getlines(f))
    elif line.startswith("OPN"):
        opn(info, getlines(f))
    elif line.startswith("NETWORK"):
        net = network(info, getlines(f))
    elif line.startswith("SCHEMATIC"):
        sc = schematic(info, getlines(f))
    elif line.startswith("MASS-LINK"):
        masslink(info, getlines(f))
    elif line.startswith("FTABLES"):
        ftables(info, getlines(f))
    elif line.startswith("EXT"):
        ext(info, getlines(f))
    elif line.startswith("GENER"):
        gener(info, getlines(f))
    elif line.startswith("PERLND"):
        operation(info, getlines(f), "PERLND")
    elif line.startswith("IMPLND"):
        operation(info, getlines(f), "IMPLND")
    elif line.startswith("RCHRES"):
        operation(info, getlines(f), "RCHRES")
    elif line.startswith("MONTH-DATA"):
        monthdata(info, getlines(f))
    elif line.startswith("SPEC-ACTIONS"):
        specactions(info, getlines(f))

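# make sure the NETWORK/SCHEMATIC linkage table has every expected column
# before writing /CONTROL/LINKS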
colnames = (
    "AFACTR",
    "MFACTOR",
    "MLNO",
    "SGRPN",
    "SMEMN",
    "SMEMSB",
    "SVOL",
    "SVOLNO",
    "TGRPN",
    "TMEMN",
    "TMEMSB",
    "TRAN",
    "TVOL",
    "TVOLNO",
    "COMMENTS",
)
if net is not None or sc is not None:
    linkage = pd.concat((net, sc), ignore_index=True, sort=True)
    for cname in colnames:
        if cname not in linkage.columns:
            linkage[cname] = ""
    linkage = linkage.sort_values(by=["TVOLNO"]).replace("na", "")
    linkage.to_hdf(store, key="/CONTROL/LINKS", data_columns=True)

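# write the reference tables (lapse rates, seasons, saturated vapor pressure)
# that ship with hsp2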
Lapse.to_hdf(store, key="TIMESERIES/LAPSE_Table")
Seasons.to_hdf(store, key="TIMESERIES/SEASONS_Table")
Svp.to_hdf(store, key="TIMESERIES/Saturated_Vapor_Pressure_Table")
keys = set(store.keys())
# rename needed for restart. NOTE issue with line 157 in PERLND SNOW HSPF
# where PKSNOW = PKSNOW + PKICE at start - ONLY
path = "/PERLND/SNOW/STATES"
if path in keys:
    df = pd.read_hdf(store, path)
    df = df.rename(
        columns={"PKSNOW": "PACKF", "PKICE": "PACKI", "PKWATR": "PACKW"}
    )
    df.to_hdf(store, key=path, data_columns=True)

path = "/IMPLND/SNOW/STATES"
if path in keys:
    df = pd.read_hdf(store, path)
    df = df.rename(
        columns={"PKSNOW": "PACKF", "PKICE": "PACKI", "PKWATR": "PACKW"}
    )
    df.to_hdf(store, key=path, data_columns=True)

path = "/PERLND/SNOW/FLAGS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "SNOPFG" not in df.columns:  # didn't read SNOW-FLAGS table
        df["SNOPFG"] = 0
        df.to_hdf(store, key=path, data_columns=True)

path = "/IMPLND/SNOW/FLAGS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "SNOPFG" not in df.columns:  # didn't read SNOW-FLAGS table
        df["SNOPFG"] = 0
        df.to_hdf(store, key=path, data_columns=True)

# Need to fixup missing data
path = "/IMPLND/IWATER/PARAMETERS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "PETMIN" not in df.columns:  # didn't read IWAT-PARM2 table
        df["PETMIN"] = 0.35
        df["PETMAX"] = 40.0
        df.to_hdf(store, key=path, data_columns=True)

path = "/IMPLND/IWTGAS/PARAMETERS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "SDLFAC" not in df.columns:  # didn't read LAT-FACTOR table
        df["SDLFAC"] = 0.0
        df["SLIFAC"] = 0.0
        df.to_hdf(store, key=path, data_columns=True)

path = "/IMPLND/IQUAL/PARAMETERS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "SDLFAC" not in df.columns:  # didn't read LAT-FACTOR table
        df["SDLFAC"] = 0.0
        df["SLIFAC"] = 0.0
        df.to_hdf(store, key=path, data_columns=True)

path = "/PERLND/PWTGAS/PARAMETERS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "SDLFAC" not in df.columns:  # didn't read LAT-FACTOR table
        df["SDLFAC"] = 0.0
        df["SLIFAC"] = 0.0
        df["ILIFAC"] = 0.0
        df["ALIFAC"] = 0.0
        df.to_hdf(store, key=path, data_columns=True)
    if "SOTMP" not in df.columns:  # didn't read PWT-TEMPS table
        df["SOTMP"] = 60.0
        df["IOTMP"] = 60.0
        df["AOTMP"] = 60.0
        df.to_hdf(store, key=path, data_columns=True)
    if "SODOX" not in df.columns:  # didn't read PWT-GASES table
        df["SODOX"] = 0.0
        df["SOCO2"] = 0.0
        df["IODOX"] = 0.0
        df["IOCO2"] = 0.0
        df["AODOX"] = 0.0
        df["AOCO2"] = 0.0
        df.to_hdf(store, key=path, data_columns=True)

path = "/PERLND/PWATER/PARAMETERS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "FZG" not in df.columns:  # didn't read PWAT-PARM5 table
        df["FZG"] = 1.0
        df["FZGL"] = 0.1
        df.to_hdf(store, key=path, data_columns=True)

path = "/PERLND/PQUAL/PARAMETERS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "SDLFAC" not in df.columns:  # didn't read LAT-FACTOR table
        df["SDLFAC"] = 0.0
        df["SLIFAC"] = 0.0
        df["ILIFAC"] = 0.0
        df["ALIFAC"] = 0.0
        df.to_hdf(store, key=path, data_columns=True)

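# NEXITS and LKFG are read into /RCHRES/GENERAL/INFO but belong with the
# HYDR parameters, so copy them over and delete them from INFO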
path = "/RCHRES/GENERAL/INFO"
if path in keys:
    dfinfo = pd.read_hdf(store, path)
    path = "/RCHRES/HYDR/PARAMETERS"
    if path in keys:
        df = pd.read_hdf(store, path)
        df["NEXITS"] = dfinfo["NEXITS"]
        df["LKFG"] = dfinfo["LKFG"]
        if "IREXIT" not in df.columns:  # didn't read HYDR-IRRIG table
            df["IREXIT"] = 0
            df["IRMINV"] = 0.0
        df["FTBUCI"] = df["FTBUCI"].map(lambda x: f"FT{int(x):03d}")
        df.to_hdf(store, key=path, data_columns=True)
    del dfinfo["NEXITS"]
    del dfinfo["LKFG"]
    dfinfo.to_hdf(store, key="RCHRES/GENERAL/INFO", data_columns=True)

path = "/RCHRES/HTRCH/FLAGS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "BEDFLG" not in df.columns:  # didn't read HT-BED-FLAGS table
        df["BEDFLG"] = 0
        df["TGFLG"] = 2
        df["TSTOP"] = 55
        df.to_hdf(store, key=path, data_columns=True)

path = "/RCHRES/HTRCH/PARAMETERS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "ELEV" not in df.columns:  # didn't read HEAT-PARM table
        df["ELEV"] = 0.0
        df["ELDAT"] = 0.0
        df["CFSAEX"] = 1.0
        df["KATRAD"] = 9.37
        df["KCOND"] = 6.12
        df["KEVAP"] = 2.24
        df.to_hdf(store, key=path, data_columns=True)

path = "/RCHRES/HTRCH/PARAMETERS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "MUDDEP" not in df.columns:  # didn't read HT-BED-PARM table
        df["MUDDEP"] = 0.33
        df["TGRND"] = 59.0
        df["KMUD"] = 50.0
        df["KGRND"] = 1.4
        df.to_hdf(store, key=path, data_columns=True)

path = "/RCHRES/HTRCH/STATES"
if path in keys:
    df = pd.read_hdf(store, path)
    # if 'TW' not in df.columns:  # didn't read HEAT-INIT table
    #    df['TW']    = 60.0
    #    df['AIRTMP']= 60.0

# apply defaults:
# JUST FOR TESTING: overwrite hsp_paths so it contains ONLY the PERLND SNOW STATES path
path = "/PERLND/SNOW/STATES"
hsp_paths = {path: hsp_paths[path]}

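# walk each known hsp path and add any default columns missing from its table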
for path in hsp_paths:
    if path in keys:
        df = pd.read_hdf(store, path)
        dct_params = hsp_paths[path]
        
        new_columns = {}
        for par_name in dct_params:
            if par_name == "CFOREA":
                ichk = 0  # no-op debugging hook, just a place to set a breakpoint
            
            if par_name not in df.columns:  # missing value in HDF5 path
                def_val = dct_params[par_name]
                if def_val != "None":
                    # df[par_name] = def_val
                    new_columns[par_name] = def_val
        
        new_columns = pd.DataFrame(new_columns, index=df.index)
        df1 = pd.concat([df, new_columns], axis=1)
        
        df1.to_hdf(store, key=path, data_columns=True)
    else:
        if path.endswith("STATES"):
            # STATES tables need to exist to save initial state variables, e.g.
            # when the entire IWAT-STATE1 table is being defaulted, so build one
            # here (reusing the previous iteration's df purely for its index)
            if "df" not in locals():
                x = 1  # sometimes when debugging, keys gets clobbered; seems like an IDE bug
            df = df.drop(columns=list(df.columns))  # clear out existing data frame columns
            dct_params = hsp_paths[path]
            for par_name in dct_params:
                def_val = dct_params[par_name]
                if def_val != "None":
                    df[par_name] = def_val
            df.to_hdf(store, key=path, data_columns=True)

# now see what the hdf has at the last path iterated (/PERLND/SNOW/STATES)
pd.read_hdf(store, path)
store.close()
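
If the diagnosis above is correct, running this against an h5 produced by a previous overwrite=False import should show the deprecated PKSNOW/PKICE/PKWATR columns sitting alongside PACKF/PACKI/PACKW, and the next pass through the rename block then produces the duplicate labels that break to_hdf().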
