Description
pycaret version checks
- I have checked that this issue has not already been reported here.
- I have confirmed this bug exists on the latest version of pycaret.
- I have confirmed this bug exists on the master branch of pycaret (pip install -U git+https://github.com/pycaret/pycaret.git@master).
Issue Description
Hi there, I have a dataset with the following columns:
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PricePerUnit 146758 non-null float64
1 LeaseContractLength 146758 non-null int64
2 PurchaseOption 146758 non-null object
3 OfferingClass 146758 non-null object
4 Product Family 146758 non-null object
5 Location 146758 non-null object
6 Current Generation 146758 non-null object
7 vCPU 146758 non-null int64
8 Memory 146758 non-null int64
9 Tenancy 146758 non-null object
10 Operating System 146758 non-null object
11 License Model 146758 non-null object
12 year 146758 non-null int64
13 Network Performance 146758 non-null float64
14 EffectiveDate 146758 non-null object
15 DiskType 146758 non-null object
16 StorageSize 146758 non-null int64
17 dateTime 146758 non-null datetime64[ns]
I am trying to use the time series module of pycaret, version 3.0.0.rc3.
My setup call is as follows:
s = setup(df, target='PricePerUnit', ignore_features=['Network Performance', 'OfferingClass', 'Current Generation'], index= 'dateTime' )
The index column I use, dateTime, has dtype datetime64[ns], as listed above.
I get the following error:
1169 freq = self.freqstr or self.inferred_freq
1171 if freq is None:
-> 1172 raise ValueError(
1173 "You must pass a freq argument as current index has none."
1174 )
1176 res = get_period_alias(freq)
1178 # https://github.com/pandas-dev/pandas/issues/33358
ValueError: You must pass a freq argument as current index has none.
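For context, the same message can be reproduced with pandas alone, which suggests it comes from converting a DatetimeIndex that has no frequency into periods. The following is a minimal sketch with made-up timestamps (not the dataset above); that pycaret reaches this conversion internally is an assumption based on the traceback, not something confirmed here.

```python
import pandas as pd

# Hypothetical, irregularly spaced timestamps (not taken from the real dataset).
idx = pd.DatetimeIndex(["2021-01-01", "2021-01-02", "2021-01-05", "2021-01-11"])

# Neither an explicit nor an inferable frequency exists for this index.
print(idx.freq, idx.inferred_freq)

# Converting to periods without a frequency raises the ValueError seen above.
try:
    idx.to_period()
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If the index were evenly spaced (for example one row per day), `inferred_freq` would be set and the conversion would go through.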
You may find the dataset here:
https://dagshub.com/gfragi/PriceIndex/raw/ff0f7754da688915232f98c2a3fbbf6202e3b2b5/data/amazon_unique_dates.csv
Reproducible Example
from pycaret.time_series import *
import pandas as pd

# Load the dataset linked above (saved locally under data/).
df = pd.read_csv('data/amazon_unique_dates.csv')

# Time series setup with 'dateTime' as the index; this call raises the ValueError.
s = setup(df, target='PricePerUnit', ignore_features=['Network Performance', 'OfferingClass', 'Current Generation'], index='dateTime')
Expected Behavior
I assumed that an index column of type datetime64[ns] would be sufficient for setup to run.
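A datetime64[ns] dtype by itself does not give the index a frequency; the timestamps also have to be regularly spaced. Below is a minimal sketch of one way to obtain such an index with pandas, assuming daily granularity is acceptable, that duplicate timestamps can be averaged, and that gaps can be forward-filled. The toy values are hypothetical, and whether this is the right preprocessing for the real data is an open question.

```python
import pandas as pd

# Hypothetical toy data with a duplicate date and a gap (not the real dataset).
raw = pd.DataFrame(
    {
        "dateTime": pd.to_datetime(
            ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-05"]
        ),
        "PricePerUnit": [1.0, 3.0, 2.0, 5.0],
    }
)

daily = (
    raw.set_index("dateTime")["PricePerUnit"]
    .resample("D")  # one bucket per calendar day (assumed granularity)
    .mean()         # duplicates within a day are averaged
    .ffill()        # days without observations reuse the last known value
)

print(daily.index.freqstr)
# With a frequency set, conversion to periods no longer raises.
periods = daily.index.to_period()
```

The resulting series has an explicit daily frequency, so frequency-dependent operations on the index can proceed.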
Actual Results
1169 freq = self.freqstr or self.inferred_freq
1171 if freq is None:
-> 1172 raise ValueError(
1173 "You must pass a freq argument as current index has none."
1174 )
1176 res = get_period_alias(freq)
1178 # https://github.com/pandas-dev/pandas/issues/33358