KeyError while sampling using freshly trained PAR model #943

DamianUS · 2022-08-09T10:11:33Z

Environment Details


SDV version: 0.16.0
Python version: 3.8.13 (default, May 8 2022, 17:48:02) [Clang 13.1.6 (clang-1316.0.21.2)]
Operating System: MacBook Pro M1, macOS 12.0.1

Error description

The KeyError is also raised when sampling from a freshly trained PAR model in v0.16.0.

I tried both with and without passing the field types metadata, and neither helps.

I printed the model metadata to check whether the model inferred the data types properly, and everything looks correct.

Here is the code I used, in case it helps (this is the latest version, in which the model infers the field types itself):

```python
import pandas as pd
from sdv.timeseries import PAR
from sdv.metrics.timeseries import TSFClassifierEfficacy

data = pd.read_csv("data/micro_batch_task.csv")
sequence_index = 'start_time'
field_types = {
    "instance_num": {
        "type": "numerical",
        'subtype': 'integer'
    },
    "start_time": {
        "type": "numerical",
        'subtype': 'integer'
    },
    "plan_cpu": {
        "type": "numerical",
        'subtype': 'float'
    },
    "plan_mem": {
        "type": "numerical",
        'subtype': 'float'
    },
    "makespan": {
        "type": "numerical",
        'subtype': 'integer'
    },
}
model = PAR(
    sequence_index=sequence_index,
    segment_size=10,
    epochs=1,
    verbose=True
)
model.fit(data)
print(model.get_metadata().to_dict())
new_data = model.sample(1)
print(new_data)
print(TSFClassifierEfficacy.compute(data, new_data, field_types, target='makespan'))
```

When trying to sample, I get:

```
PARModel(epochs=1, sample_size=1, cuda='cpu', verbose=True) instance created
Epoch 1 | Loss 0.001459105173125863: 100%|██████████| 1/1 [00:51<00:00, 51.42s/it]
{'fields': {'instance_num': {'type': 'numerical', 'subtype': 'float', 'transformer': None}, 'start_time': {'type': 'numerical', 'subtype': 'integer', 'transformer': None}, 'plan_cpu': {'type': 'numerical', 'subtype': 'float', 'transformer': None}, 'plan_mem': {'type': 'numerical', 'subtype': 'float', 'transformer': None}, 'makespan': {'type': 'numerical', 'subtype': 'integer', 'transformer': None}}, 'constraints': [], 'model_kwargs': {}, 'name': None, 'primary_key': None, 'sequence_index': 'start_time', 'entity_columns': [], 'context_columns': []}
100%|██████████| 1/1 [00:00<00:00, 85.72it/s]
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'start_time'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/damianfernandez/PycharmProjects/sdv/main.py", line 46, in <module>
    new_data = model.sample(1)
  File "/opt/homebrew/lib/python3.8/site-packages/sdv/timeseries/base.py", line 268, in sample
    return self._metadata.reverse_transform(sampled)
  File "/opt/homebrew/lib/python3.8/site-packages/sdv/metadata/table.py", line 700, in reverse_transform
    field_data = reversed_data[name]
  File "/opt/homebrew/lib/python3.8/site-packages/pandas/core/frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/opt/homebrew/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'start_time'

Process finished with exit code 1
```

Maybe I'm doing something wrong; I'm new to the library!


yamidibarra · 2022-08-10T08:30:00Z

Dear @npatki, thank you in advance for your support! I'm having a similar issue, described below:

Environment Details

SDV version: 0.16.0
Python version: 3.8.13
Operating System: Windows 10

Error:

```
Exception has occurred: KeyError
'Time'

The above exception was the direct cause of the following exception:
  File "C:\Users\Data_Augmentation\PAR_Model.py", line 13, in <module>
    new_data = model.sample(1)
```

```python
import pandas as pd
from sdv.timeseries import PAR

data = pd.read_pickle('df_PAR.pkl')
context_columns = ['POM', 'Mold Temperature [°C]', 'Injection velocity [cmm/s]', 'Holding pressure [bar]']
entity_columns = ['id']
sequence_index = 'Time'

model = PAR(entity_columns=entity_columns, context_columns=context_columns, sequence_index=sequence_index)

model.fit(data)
new_data = model.sample(1)

model.save('Timeseries_synthetic_model.pkl')
```

Attached you will find the .py file and the .pkl file with the data.
PS: I tried to reproduce the example shown here: https://sdv.dev/SDV/user_guides/timeseries/par.html but I can't access the file. I wanted to check the data types of the variables.

yamidibarra · 2022-08-10T09:22:20Z

#808 (comment)

I understand what's going on: my Time column is a float, but PAR only allows a datetime type for it...

dharmesh1007 · 2022-08-10T13:08:23Z

@yamidibarra, I'm having the same issue. The Time column needs to be in datetime format.

npatki · 2022-08-10T16:07:07Z

Hi everyone,

Yes @yamidibarra, I agree with you. Issue #808 is likely the root cause for all these errors: It is a known issue that the PAR model currently produces a sampling error when sequence_index is numerical (float, int). The error should go away if you express sequence_index as a datetime or if you remove it altogether.

Does this accurately describe everyone's scenario? If so, I can close this issue in favor of #808 for tracking.
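A quick way to check whether a fitted model falls into the case described above is to inspect the metadata dictionary it prints. The sketch below assumes the dictionary layout shown in the output earlier in this thread (a top-level "sequence_index" key and a "fields" mapping, as returned by `model.get_metadata().to_dict()` in SDV 0.16.0); `has_numerical_sequence_index` is a hypothetical helper, not part of SDV, and other SDV versions may lay out their metadata differently.

```python
from typing import Any, Dict


def has_numerical_sequence_index(metadata: Dict[str, Any]) -> bool:
    """Return True if the metadata declares a numerical sequence_index.

    Hypothetical helper: assumes the dict layout printed by
    model.get_metadata().to_dict() in SDV 0.16.0, i.e. a top-level
    'sequence_index' key and a 'fields' mapping of column -> field spec.
    """
    sequence_index = metadata.get('sequence_index')
    if sequence_index is None:
        # No sequence index at all, so the numerical-index sampling error cannot apply.
        return False
    field = metadata.get('fields', {}).get(sequence_index, {})
    return field.get('type') == 'numerical'


# Metadata as printed for the start_time model in this issue (trimmed to two fields).
metadata = {
    'fields': {
        'start_time': {'type': 'numerical', 'subtype': 'integer', 'transformer': None},
        'makespan': {'type': 'numerical', 'subtype': 'integer', 'transformer': None},
    },
    'sequence_index': 'start_time',
    'entity_columns': [],
    'context_columns': [],
}

if has_numerical_sequence_index(metadata):
    print('sequence_index is numerical: convert it to datetime or drop it before fitting')
```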

npatki · 2022-08-10T16:08:38Z

BTW --

@DamianUS, thanks for filing this issue! I will delete your comments in #935 since you copied them over here.

@yamidibarra, re the link:

> PS: I tried to reproduce the example shown here: https://sdv.dev/SDV/user_guides/timeseries/par.html but I can´t access the file. I wanted to check the type of data variables.

The text of the link is correct, but the hyperlink points to a different URL. You should be able to open the page if you click on this: https://sdv.dev/SDV/user_guides/timeseries/par.html.

yamidibarra · 2022-08-11T06:53:11Z

Hi everyone,

> Yes @yamidibarra, I agree with you. Issue #808 is likely the root cause for all these errors: It is a known issue that the PAR model currently produces a sampling error when sequence_index is numerical (float, int). The error should go away if you express sequence_index as a datetime or if you remove it altogether.
>
> Does this accurately describe everyone's scenario? If so, I can close this issue in favor of #808 for tracking.

Yes, it resolves this specific issue. Here is my workaround. I'll open another issue regarding the synthetic data; I have some questions and would appreciate your opinion, dear @npatki.

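```python
import pandas as pd
from sdv.timeseries import PAR

data = pd.read_pickle('df_PAR.pkl')
# Work around #808: turn the float Time column (seconds) into datetimes
data['Time'] = data['Time'].multiply(1E9)    # seconds -> nanoseconds
data['Time'] = pd.to_datetime(data['Time'])  # nanoseconds since the epoch -> datetime

context_columns = ['POM', 'Mold Temperature [°C]', 'Injection velocity [cmm/s]', 'Holding pressure [bar]']
entity_columns = ['id']
sequence_index = 'Time'
model = PAR(entity_columns=entity_columns, context_columns=context_columns, sequence_index=sequence_index)

model.fit(data)
new_data = model.sample(1)

# Convert back to seconds; unlike formatting only x.second and x.microsecond,
# this keeps whole minutes and hours for sequences longer than 60 s
new_data['Time'] = (new_data['Time'] - pd.Timestamp(0)).dt.total_seconds()
```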
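With a workaround like this, it is worth confirming that the forward and reverse conversions are inverses (up to floating-point rounding), in particular for values of 60 seconds or more, where reading only a timestamp's `second` attribute would wrap around. A minimal pandas sketch, assuming the Time column holds elapsed seconds as floats; `seconds_to_datetime` and `datetime_to_seconds` are hypothetical helper names:

```python
import pandas as pd


def seconds_to_datetime(seconds: pd.Series) -> pd.Series:
    # Interpret elapsed seconds as offsets from the Unix epoch (1970-01-01).
    return pd.to_datetime(seconds, unit='s')


def datetime_to_seconds(timestamps: pd.Series) -> pd.Series:
    # Subtract the epoch and turn the resulting timedeltas back into float seconds.
    return (timestamps - pd.Timestamp(0)).dt.total_seconds()


time = pd.Series([0.0, 0.5, 59.25, 61.5, 3600.125], name='Time')
as_datetime = seconds_to_datetime(time)
round_trip = datetime_to_seconds(as_datetime)

# 61.5 s and 3600.125 s come back intact because minutes and hours are preserved.
print(round_trip.tolist())
```

Using `unit='s'` here is equivalent to multiplying by 1E9 and parsing nanoseconds; it just states the unit explicitly.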

npatki · 2022-08-11T20:11:00Z

Great, thanks for confirming! I'll close this issue in favor of #808.

Please feel free to reply if you continue to see a KeyError on the PAR model even if you have a datetime sequence_index and I can reopen this issue for discussion.

mohammedsabiya · 2023-07-20T09:13:31Z

Hi, I am facing the same KeyError in PARSynthesizer, even though my sequence_index is a datetime. Please see issue #1510.

P.S. The KeyError I get refers to the context_columns.

> Great, thanks for confirming! I'll close this issue in favor of #808.
>
> Please feel free to reply if you continue to see a KeyError on the PAR model even if you have a datetime sequence_index and I can reopen this issue for discussion.

npatki · 2023-07-21T03:09:25Z

@mohammedsabiya Thanks for filing! We'll follow up in the new issue, as it's been some time since this original one was resolved.

DamianUS added the "bug" and "new" labels on Aug 9, 2022

npatki added the "data:sequential" and "under discussion" labels and removed the "new" label on Aug 10, 2022

npatki closed this as completed on Aug 11, 2022

npatki added the "resolution:duplicate" label and removed the "under discussion" label on Aug 11, 2022

npatki mentioned this issue on Aug 11, 2022 in #808, "PAR model sampling error when there is a numerical sequence_index (float, int)" (Closed)
