Running recipe with resampling "year" returns error #175

@SarahAlidoost

Description

The recipe, `thermaline.yaml`:

```yaml
datasets:
  npn_obs:
    dataset: RNPN
    species_ids:
      functional_type: "Deciduous broadleaf" # multiple species
    phenophase_ids:
      name: breaking leaf buds
    years: [2015, 2020]
    area:
      name: Washington # 500 km bounding box centered at latitude: 47.751076 and longitude: -120.740135
      bbox:
        [
          -124.08406940413612,
          45.50277198520317,
          -117.39620059586387,
          49.99938001479683,
        ]
  daymet:
    dataset: daymet_multiple_points
    points:
      source: npn_obs
    years: [2015, 2020] # TODO don't duplicate
    variables:
      - tmin
    resample:
      frequency: year
      operator: mean
preparation:
  dropna: True
```

Running it with `Workflow.from_recipe(recipe).execute()` fails with the following output:

```
Dataset npn_obs loaded with 241 rows
Dataset npn_obs resampled to 241 rows
Downloading dataset:  npn_obs
/tmp/data/rnpn/rnpn_npn_data_y_2015_Deciduous broadleaf_breaking leaf buds_Washington.csv already exists, skipping
/tmp/data/rnpn/rnpn_npn_data_y_2016_Deciduous broadleaf_breaking leaf buds_Washington.csv already exists, skipping
/tmp/data/rnpn/rnpn_npn_data_y_2017_Deciduous broadleaf_breaking leaf buds_Washington.csv already exists, skipping
/tmp/data/rnpn/rnpn_npn_data_y_2018_Deciduous broadleaf_breaking leaf buds_Washington.csv already exists, skipping
/tmp/data/rnpn/rnpn_npn_data_y_2019_Deciduous broadleaf_breaking leaf buds_Washington.csv already exists, skipping
/tmp/data/rnpn/rnpn_npn_data_y_2020_Deciduous broadleaf_breaking leaf buds_Washington.csv already exists, skipping
Downloading dataset:  daymet
Dataset daymet loaded with 326310 rows
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_32781/1198658873.py in ?()
      1 recipe = "thermaline.yaml"
----> 2 Workflow.from_recipe(recipe).execute()

~/GitHub/springtime/src/springtime/main.py in ?(self)
    111             ds = dataset.load()
    112             logger.warning(f"Dataset {dataset_name} loaded with {len(ds)} rows")
    113             if dataset.resample:
    114                 if issubclass(ds.__class__, pd.DataFrame):
--> 115                     ds = resample(
    116                         ds,
    117                         freq=dataset.resample.frequency,
    118                         operator=dataset.resample.operator,

~/GitHub/springtime/src/springtime/utils.py in ?(df, freq, operator, column)
    254     ]
    255 
    256     # Can't sort when grouping on geometry
    257     new_df = (
--> 258         df.groupby(groups, sort=False).agg(operator, numeric_only=True).reset_index()
    259     )
    260 
    261     return gpd.GeoDataFrame(new_df)

~/mambaforge/envs/springtime/lib/python3.10/site-packages/pandas/util/_decorators.py in ?(*args, **kwargs)
    327                     msg.format(arguments=_format_argument_list(allow_args)),
    328                     FutureWarning,
    329                     stacklevel=find_stack_level(),
    330                 )
--> 331             return func(*args, **kwargs)

~/mambaforge/envs/springtime/lib/python3.10/site-packages/pandas/core/frame.py in ?(self, level, drop, inplace, col_level, col_fill, allow_duplicates, names)
   6357                     level_values = algorithms.take(
   6358                         level_values, lab, allow_fill=True, fill_value=lev._na_value
   6359                     )
   6360 
-> 6361                 new_obj.insert(
   6362                     0,
   6363                     name,
   6364                     level_values,

~/mambaforge/envs/springtime/lib/python3.10/site-packages/pandas/core/frame.py in ?(self, loc, column, value, allow_duplicates)
   4813                 "'self.flags.allows_duplicate_labels' is False."
   4814             )
   4815         if not allow_duplicates and column in self.columns:
   4816             # Should this be a different kind of error??
-> 4817             raise ValueError(f"cannot insert {column}, already exists")
   4818         if not isinstance(loc, int):
   4819             raise TypeError("loc must be int")
   4820 

ValueError: cannot insert year, already exists
```
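
The traceback suggests that the daymet frame already carries a numeric `year` column, which survives `agg(operator, numeric_only=True)`, while one of the grouping keys built inside `resample()` is also named `year`; `reset_index()` then cannot move that key back into the columns. The grouping keys themselves are not visible in the traceback, so the sketch below only reproduces that assumed situation with plain pandas. The toy frame, `make_daymet_like`, `resample_year` and `resample_year_fixed` are all hypothetical stand-ins, not springtime code:

```python
import pandas as pd


def make_daymet_like() -> pd.DataFrame:
    """Toy stand-in for the daymet frame (hypothetical values and columns)."""
    return pd.DataFrame(
        {
            "geometry": ["p1", "p1", "p2", "p2"],
            "datetime": pd.to_datetime(
                ["2015-01-01", "2015-06-01", "2015-01-01", "2016-01-01"]
            ),
            "year": [2015, 2015, 2015, 2016],  # pre-existing numeric column
            "tmin": [1.0, 3.0, 2.0, 4.0],
        }
    )


def resample_year(df: pd.DataFrame, operator: str = "mean") -> pd.DataFrame:
    """Hypothetical stand-in for springtime's resample() with freq="year".

    Assumes one grouping key is a Series derived from "datetime" and named "year".
    """
    year_key = df["datetime"].dt.year.rename("year")
    return (
        df.groupby(["geometry", year_key], sort=False)
        .agg(operator, numeric_only=True)  # a numeric "year" column is aggregated too
        .reset_index()  # raises if a "year" column survived the aggregation
    )


def resample_year_fixed(df: pd.DataFrame, operator: str = "mean") -> pd.DataFrame:
    """Drop the column that clashes with the grouping key, then resample."""
    return resample_year(df.drop(columns=["year"], errors="ignore"), operator)


if __name__ == "__main__":
    df = make_daymet_like()
    try:
        resample_year(df)
    except ValueError as err:
        print(err)  # cannot insert year, already exists
    print(resample_year_fixed(df))
```

Dropping the clashing column is only one possible workaround; renaming the grouping key, or excluding key names from the aggregated columns inside `resample()`, would avoid the collision as well.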

Metadata

- Assignees: no one assigned
- Labels: bug (Something isn't working)
- Type: no type
- Projects: no projects
- Relationships: none yet
- Development: no branches or pull requests
