PAR model sampling error when there is a numerical sequence_index (float, int) #808
Comments
Here is the traceback when I run the above:
Thanks for filing @doolingdavidrs21. I can replicate this issue. For SDV developers: I did some digging and found the following --
sequence_index (float, int)
Potential Workarounds
data = data.drop([sequence_index], axis=1)
import pandas as pd
sequence_index = 'my_sequence_index_column_name' # name of column
data[sequence_index] = pd.to_datetime(data[sequence_index])
Remember to cast the synthetic data back to an int at the end:
synthetic_data[sequence_index] = synthetic_data[sequence_index].astype(int)
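The datetime-cast workaround can be sketched end to end on a toy frame. This is only an illustration under stated assumptions: the column names and the `synthetic_data` stand-in are hypothetical, no model is fitted, and it uses an explicit `unit="D"` with an epoch subtraction instead of the plain `pd.to_datetime(...)` / `.astype(int)` pair above, so that the round trip does not depend on the datetime resolution.

```python
import pandas as pd

sequence_index = "step"  # hypothetical column name
data = pd.DataFrame({
    "entity": ["a", "a", "a", "b", "b"],
    "step": [0, 1, 2, 0, 1],
})

epoch = pd.Timestamp("1970-01-01")
one_day = pd.Timedelta(days=1)

# Integer step i becomes the i-th day after the epoch.
data[sequence_index] = pd.to_datetime(data[sequence_index], unit="D")

# Fitting and sampling would happen here; as a stand-in (assumption),
# the "synthetic" frame is just a copy of the converted input.
synthetic_data = data.copy()

# Convert the datetime column back to integer steps.
synthetic_data[sequence_index] = (synthetic_data[sequence_index] - epoch) // one_day
```

Choosing a unit explicitly makes it clear what one integer step means once the column is a datetime, which matters when the steps are later mapped to a real calendar.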
This is because in the Lines 74 to 80 in f822903
However, in sampling we add the SDV/sdv/timeseries/deepecho.py Lines 141 to 143 in f822903
This is a bug.
Great news! This issue has now been resolved in our new SDV 1.0 (Beta!) release. Check it out and let us know if you're still encountering any problems. Resources:
Environment Details
Please indicate the following details about the environment in which you found the bug:
Error Description
I am unable to use an integer field as the sequence_index parameter of the PAR model.
I would like to do this so that a PAR model trained on an integer sequence index produces increasing integer values in that field, which can then be mapped back to a datetime field whose frequency is something other than days.
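To make the intent concrete, here is a minimal sketch of the round trip between timestamps and integer steps, using only the standard library. The hourly timestamps and all names are made up for illustration; they are not taken from the demo dataset.

```python
from datetime import datetime, timedelta

# Hypothetical hourly timestamps standing in for a real datetime column.
start = datetime(2023, 1, 1)
step = timedelta(hours=1)
timestamps = [start + h * step for h in (0, 1, 2, 1, 0, 3)]

# Forward map: each distinct timestamp gets its rank in sorted order.
forward = {ts: i for i, ts in enumerate(sorted(set(timestamps)))}
# Inverse map: rank back to the original timestamp.
inverse = {i: ts for ts, i in forward.items()}

encoded = [forward[ts] for ts in timestamps]
decoded = [inverse[i] for i in encoded]


def step_to_timestamp(i):
    """Map any integer step to a timestamp, assuming a regular hourly grid."""
    return start + i * step
```

The dictionary lookup only covers integers seen in the original data. When the timestamps lie on a regular grid, a formula such as `step_to_timestamp` also handles sampled integers that fall outside the original range.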
Below is an example where setting the sequence_index parameter to an integer column allows the model to train, but the model's methods then fail and it cannot sample:
Steps to reproduce
import pandas as pd
from sdv.demo import load_timeseries_demo
from sdv.timeseries import PAR

data = load_timeseries_demo()

# Replace each date with its rank among the sorted unique dates (0, 1, 2, ...)
sequence_map = {
    date: i for i, date in enumerate(sorted(data["Date"].unique()))
}
data["Date"] = data["Date"].map(sequence_map)

entity_columns = ["Symbol"]
context_columns = ["MarketCap", "Sector", "Industry"]
sequence_index = "Date"

model = PAR(
    entity_columns=entity_columns,
    context_columns=context_columns,
    sequence_index=sequence_index,
    verbose=True,
    epochs=45,
)
model.fit(data)
# This call throws the error
new_data = model.sample(num_sequences=1, sequence_length=10)