
[PoC] AutoGluon TimeSeries Prototype #494

Merged
merged 5 commits into openml:master on Oct 10, 2022

Conversation

Collaborator

@Innixma Innixma commented Sep 13, 2022

[Don't merge this PR]

This PR is a proof of concept of time series data and framework support in AutoMLBenchmark.

To run, follow the instructions in the newly added frameworks/AutoGluonTS/README.md.

Collaborator Author

Innixma commented Sep 13, 2022

@sebhrusen @PGijsbers

Some questions I have:

  1. [Solved in AutoMLBenchmark TimeSeries Prototype. Innixma/automlbenchmark#6] Is there a way to specify information such as prediction_length=5 on a per-dataset basis? prediction_length is the look-ahead requirement for prediction and dictates the difficulty of the task. I'm wondering if I can specify it as part of the yaml file definition of the dataset in ts.yaml. Ditto for a couple of other things like timestamp_column="Date" and item_id="name".

  2. [Solved in AutoMLBenchmark TimeSeries Prototype. Innixma/automlbenchmark#6] How can I update and specify the logic that does the final scoring based on predictions and truth? It needs to be altered to work with TimeSeries. Additionally, it may take a different form, such as when the metric requires quantile predictions; a sketch of such a scorer follows below.
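For reference, a minimal sketch of what such a scorer could look like, built on the `predictions`, `truth`, and per-item `y_past_period_error` columns that later commits in this thread add to the saved predictions file; the function name and aggregation are illustrative assumptions, not the final AMLB implementation:

    import numpy as np
    import pandas as pd

    def neg_mase(df: pd.DataFrame) -> float:
        # df holds one row per forecast step, as in predictions.csv:
        # `predictions`, `truth`, and `y_past_period_error` (each item's
        # in-sample naive one-step error, repeated for its forecast steps)
        abs_err = np.abs(df["truth"] - df["predictions"])
        # scale each step's error by its item's naive error, then average;
        # negated so that higher is better, matching the reported `neg_mase`
        return -float((abs_err / df["y_past_period_error"]).mean())

    # e.g. neg_mase(pd.read_csv("predictions.csv"))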

@PGijsbers PGijsbers marked this pull request as draft September 14, 2022 09:58
@PGijsbers
Collaborator

@sebhrusen I'd appreciate it if you can have a look, I have very limited availability due to a paper deadline.

@sebhrusen
Collaborator

@PGijsbers sunny holidays right now: will look at it when I'm back next week.

@sebhrusen sebhrusen self-requested a review September 19, 2022 15:38
* fixed loading test & train, changed pred.-l. 5->30

* ignore launch.json of vscode

* ensuring timestamp parsing

* pass config, save pred, add results

* remove unused code

* add readability, remove slice from timer

* ensure autogluonts has required info

* add comments for readability

* setting defaults for timeseries task

* remove outer context manipulation

* corrected spelling error for quantiles

* adding mape, correct available metrics

* beautify config options

* fixed config for public access
Collaborator Author

Innixma commented Sep 21, 2022

Update: Several of the TODO / FIXME comments have been addressed by @limpbot in Innixma#6

Collaborator Author

Innixma commented Sep 21, 2022

Code example:

python3 runbenchmark.py autogluonts ts test

Log output:

Running benchmark `autogluonts` on `ts` framework in `local` mode.
Loading frameworks definitions from ['/Users/neerick/workspace/code/automlbenchmark/resources/frameworks.yaml'].
Loading benchmark constraint definitions from ['/Users/neerick/workspace/code/automlbenchmark/resources/constraints.yaml'].
Loading benchmark definitions from /Users/neerick/workspace/code/automlbenchmark/resources/benchmarks/ts.yaml.
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 21.7%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 64.3%

-----------------------------------------------
Starting job local.ts.test.covid.0.AutoGluonTS.
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 55.8%
Assigning 4 cores (total=12) for new task covid.
Assigning 3803 MB (total=16384 MB) for new covid task.
Using training set /Users/neerick/.openml/train.csv with test set /Users/neerick/.openml/test.csv.
Running task covid on framework AutoGluonTS with config:
TaskConfig({'framework': 'AutoGluonTS', 'framework_params': {}, 'framework_version': '0.5.2', 'type': 'timeseries', 'name': 'covid', 'fold': 0, 'metric': 'mase', 'metrics': ['mase', 'mape', 'smape', 'rmse', 'mse', 'nrmse', 'wape', 'ncrps'], 'seed': 949238273, 'job_timeout_seconds': 1200, 'max_runtime_seconds': 600, 'cores': 4, 'max_mem_size_mb': 3803, 'min_vol_size_mb': -1, 'input_dir': '/Users/neerick/.openml', 'output_dir': '/Users/neerick/workspace/code/tmp_amlb/results/autogluonts.ts.test.local.20220921T162514', 'output_predictions_file': '/Users/neerick/workspace/code/tmp_amlb/results/autogluonts.ts.test.local.20220921T162514/predictions/covid/0/predictions.csv', 'ext': {}, 'type_': 'timeseries', 'output_metadata_file': '/Users/neerick/workspace/code/tmp_amlb/results/autogluonts.ts.test.local.20220921T162514/predictions/covid/0/metadata.json'})
Running cmd `/Users/neerick/workspace/code/automlbenchmark/frameworks/AutoGluonTS/venv/bin/python -W ignore /Users/neerick/workspace/code/automlbenchmark/frameworks/AutoGluonTS/exec.py`

**** AutoGluon TimeSeries [v0.5.2] ****

Warning: path already exists! This predictor may overwrite an existing predictor! path="/var/folders/cn/t0r03w5d3nldq9n5h65wd29c0000gs/T/tmpsbe45bwi/"
Learner random seed set to 0
================ TimeSeriesPredictor ================
TimeSeriesPredictor.fit() called
Fitting with arguments:
{'evaluation_metric': 'MASE',
 'hyperparameter_tune_kwargs': None,
 'hyperparameters': 'default',
 'prediction_length': 30,
 'target_column': 'ConfirmedCases',
 'time_limit': 600}
Provided training data set with 22536 rows, 313 items. Average time series length is 72.0.
Training artifacts will be saved to: /private/var/folders/cn/t0r03w5d3nldq9n5h65wd29c0000gs/T/tmpsbe45bwi
=====================================================
Validation data is None, will hold the last prediction_length 30 time steps out to use as validation set.
AutoGluon will save models to /var/folders/cn/t0r03w5d3nldq9n5h65wd29c0000gs/T/tmpsbe45bwi/

Starting training. Start time is 2022-09-21 09:25:33
Models that will be trained: ['AutoETS', 'ARIMA', 'SimpleFeedForward', 'DeepAR', 'Transformer']
Training timeseries model AutoETS. Training for up to 599.36s of the 599.36s of remaining time.
        -4261.6502    = Validation score (-MASE)
        7.06    s     = Training runtime
        23.90   s     = Validation (prediction) runtime
Training timeseries model ARIMA. Training for up to 568.20s of the 568.20s of remaining time.
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 22.6%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 64.1%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 55.8%
        -4291.2952    = Validation score (-MASE)
        36.87   s     = Training runtime
        49.88   s     = Validation (prediction) runtime
Training timeseries model SimpleFeedForward. Training for up to 480.49s of the 480.49s of remaining time.
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 29.3%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 66.6%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 55.8%
        -4319.9065    = Validation score (-MASE)
        100.00  s     = Training runtime
        2.43    s     = Validation (prediction) runtime
Training timeseries model DeepAR. Training for up to 378.04s of the 378.04s of remaining time.
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 19.6%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 66.3%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 55.8%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 14.5%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 65.1%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 56.0%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 12.8%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 65.2%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 56.0%
        -4332.0235    = Validation score (-MASE)
        380.97  s     = Training runtime
        10.45   s     = Validation (prediction) runtime
Stopping training due to lack of time remaining. Time left: -13.39 seconds
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 12.6%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 68.7%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 56.0%
Fitting simple weighted ensemble.
        -4261.6502    = Validation score (-MASE)
        138.62  s     = Training runtime
        23.90   s     = Validation (prediction) runtime
Training complete. Models trained: ['AutoETS', 'ARIMA', 'SimpleFeedForward', 'DeepAR', 'WeightedEnsemble']
Total runtime: 816.54 s
Best model: AutoETS
Best model score: -4261.6502
Model not specified in predict, will default to the model with the best validation score: AutoETS
Different set of items than those provided during training were provided for prediction. The model AutoETS will be re-trained on newly provided data
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 14.2%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 60.5%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 56.0%
                              mean        0.1  ...        0.8        0.9
item_id      timestamp                         ...                      
Afghanistan_ 2020-03-23  43.673204  40.929207  ...  45.475244  46.417202
             2020-03-24  47.477861  43.269943  ...  50.241288  51.685780
             2020-03-25  51.282519  45.705039  ...  54.945364  56.859998
             2020-03-26  55.087176  48.146160  ...  59.645483  62.028192
             2020-03-27  58.891833  50.563691  ...  64.361095  67.219975
...                            ...        ...  ...        ...        ...
Zimbabwe_    2020-04-17  16.572826   8.359642  ...  21.966592  24.786010
             2020-04-18  17.094855   8.458588  ...  22.766468  25.731121
             2020-04-19  17.616884   8.550552  ...  23.570930  26.683216
             2020-04-20  18.138913   8.635642  ...  24.379906  27.642183
             2020-04-21  18.660942   8.713965  ...  25.193326  28.607919

[9390 rows x 10 columns]
[43.67320426 47.47786141 51.28251855 ... 17.61688379 18.13891286
 18.66094192]
[40. 74. 84. ... 25. 25. 28.]
Additional data provided, testing on additional data. Resulting leaderboard will be sorted according to test score (`score_test`).
Different set of items than those provided during training were provided for prediction. The model AutoETS will be re-trained on newly provided data
Different set of items than those provided during training were provided for prediction. The model ARIMA will be re-trained on newly provided data
Different set of items than those provided during training were provided for prediction. The model AutoETS will be re-trained on newly provided data
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 12.4%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 62.7%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 56.0%
               model  score_test    score_val  pred_time_test  pred_time_val  fit_time_marginal  fit_order
0   WeightedEnsemble -444.037098 -4261.650234       29.051365      23.899191         138.624613          5
1            AutoETS -444.037098 -4261.650234       25.333331      23.899191           7.057499          1
2              ARIMA -475.878400 -4291.295201       51.673333      49.880237          36.868394          2
3  SimpleFeedForward -526.892250 -4319.906528        1.442273       2.432864          99.998205          3
4             DeepAR -591.905430 -4332.023525        9.755238      10.447713         380.970382          4
Terminating process psutil.Process(pid=25939, name='Python', status='running', started='09:27:32').
Killing process psutil.Process(pid=25939, name='Python', status='running', started='09:27:32').
Early stopping based on learning rate scheduler callback (min_lr was reached).
Traceback (most recent call last):

  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/resource_tracker.py", line 201, in main

    cache[rtype].remove(name)

KeyError: '/loky-25767-o7csihc6'



Predictions preview:
     predictions  truth        0.1        0.2         0.3         0.4         0.5         0.6         0.7         0.8         0.9  y_past_period_error
0     43.673204   40.0  40.929207  41.871165   42.550383   43.130749   43.673204   44.215659   44.796026   45.475244   46.417202             0.666667
1     47.477861   74.0  43.269943  44.714435   45.756015   46.646007   47.477861   48.309715   49.199707   50.241288   51.685780             0.666667
2     51.282519   84.0  45.705039  47.619673   49.000259   50.179919   51.282519   52.385118   53.564778   54.945364   56.859998             0.666667
3     55.087176   94.0  48.146160  50.528868   52.246968   53.715022   55.087176   56.459330   57.927383   59.645483   62.028192             0.666667
4     58.891833  110.0  50.563691  53.422571   55.484025   57.245461   58.891833   60.538205   62.299641   64.361095   67.219975             0.666667
5     62.696490  110.0  52.944972  56.292468   58.706248   60.768734   62.696490   64.624246   66.686732   69.100512   72.448007             0.666667
6     66.501147  120.0  55.284052  59.134650   61.911203   64.283664   66.501147   68.718630   71.091092   73.867644   77.718243             0.666667
7     70.305804  170.0  57.578101  61.947260   65.097731   67.789693   70.305804   72.821916   75.513877   78.664349   83.033508             0.666667
8     74.110461  174.0  59.825896  64.729494   68.265333   71.286577   74.110461   76.934346   79.955590   83.491429   88.395027             0.666667
9     77.915119  237.0  62.027091  67.481124   71.413866   74.774249   77.915119   81.055988   84.416371   88.349113   93.803146             0.666667
10    81.719776  273.0  64.181835  70.202250   74.543392   78.252739   81.719776   85.186813   88.896159   93.237302   99.257717             0.666667
11    85.524433  281.0  66.290563  72.893156   77.654089   81.722132   85.524433   89.326734   93.394776   98.155710  104.758302             0.666667
12    89.329090  299.0  68.353876  75.554236   80.746202   85.182546   89.329090   93.475634   97.911978  103.103944  110.304304             0.666667
13    93.133747  349.0  70.372463  78.185944   83.820014   88.634119   93.133747   97.633375  102.447480  108.081550  115.895032             0.666667
14    96.938404  367.0  72.347060  80.788763   86.875826   92.076996   96.938404  101.799813  107.000983  113.088045  121.529748             0.666667
15   100.743061  423.0  74.278423  83.363190   89.913946   95.511325  100.743061  105.974797  111.572177  118.122933  127.207700             0.666667
16   104.547719  444.0  76.167305  85.909718   92.934683   98.937257  104.547719  110.158180  116.160754  123.185719  132.928132             0.666667
17   108.352376  484.0  78.014453  88.428839   95.938344  102.354939  108.352376  114.349813  120.766408  128.275912  138.690298             0.666667
18   112.157033  521.0  79.820595  90.921030   98.925224  105.764514  112.157033  118.549552  125.388841  133.393036  144.493471             0.666667
19   115.961690  555.0  81.586437  93.386755  101.895615  109.166122  115.961690  122.757258  130.027765  138.536625  150.336943             0.666667

Predictions saved to `/Users/neerick/workspace/code/tmp_amlb/results/autogluonts.ts.test.local.20220921T162514/predictions/covid/0/predictions.csv`.
Loading metadata from `/Users/neerick/workspace/code/tmp_amlb/results/autogluonts.ts.test.local.20220921T162514/predictions/covid/0/metadata.json`.
fatal: not a git repository (or any of the parent directories): .git

Loading predictions from `/Users/neerick/workspace/code/tmp_amlb/results/autogluonts.ts.test.local.20220921T162514/predictions/covid/0/predictions.csv`.
Metric scores: { 'app_version': 'dev [NA, NA, NA]',
  'constraint': 'test',
  'duration': nan,
  'fold': 0,
  'framework': 'AutoGluonTS',
  'id': 'covid',
  'info': None,
  'mape': 0.47176599878084985,
  'mase': 444.03709806992947,
  'metric': 'neg_mase',
  'mode': 'local',
  'models_count': 5,
  'mse': 66512955.519554704,
  'ncrps': 3.7180833818575727,
  'nrmse': 1.8264137433841354,
  'params': '',
  'predict_duration': 24.022034168243408,
  'result': -444.03709806992947,
  'rmse': 8155.547530335085,
  'seed': 949238273,
  'smape': 0.6795078347334532,
  'task': 'covid',
  'training_duration': 817.2716138362885,
  'type': 'timeseries',
  'utc': '2022-09-21T16:41:38',
  'version': '0.5.2',
  'wape': 0.4395118445505013}
Job `local.ts.test.covid.0.AutoGluonTS` executed in 984.191 seconds.
All jobs executed in 984.232 seconds.
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 16.7%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 59.6%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 56.0%
Processing results for autogluonts.ts.test.local.20220921T162514
Scores saved to `/Users/neerick/workspace/code/tmp_amlb/results/autogluonts.ts.test.local.20220921T162514/scores/AutoGluonTS.benchmark_ts.csv`.
Scores saved to `/Users/neerick/workspace/code/tmp_amlb/results/autogluonts.ts.test.local.20220921T162514/scores/results.csv`.
Scores saved to `/Users/neerick/workspace/code/tmp_amlb/results/results.csv`.
Summing up scores for current run:
   id  task fold   framework constraint   result   metric  duration      seed
covid covid    0 AutoGluonTS       test -444.037 neg_mase     984.2 949238273

Collaborator Author

Innixma commented Sep 29, 2022

@sebhrusen Sorry to ping but would you be interested in reviewing this PR? A large chunk of the logic was written by @limpbot who is interning with us currently, and it would be great if he received feedback so as not to block his time-series benchmarking efforts.

Collaborator

sebhrusen commented Sep 30, 2022

@Innixma I'm looking at it now and will make a full review before Monday.
Outside implementation details/modularity, I mainly want to be sure that it is not designed primarily to satisfy AG's timeseries implementation and that it can be generalized to other implementations (it would be nice to have an alternative implementation). For now, to satisfy your needs, I'll mainly ensure that the changes are limited to data loading + the AG implementation as much as possible.

Collaborator Author

Innixma commented Sep 30, 2022

Sounds good! I agree that we should make sure the input/output/scoring definitions are generic and not AG-specific. Perhaps the AutoPyTorch-TimeSeries folks (@dengdifan) would be interested in reviewing / trying to add their AutoML system as a framework extension to this logic?

Collaborator

@sebhrusen sebhrusen left a comment


thanks for this contribution, this is a first feedback.
I'm good with most of the file loading logic and the added metrics.
For the "middle" layers like benchmark and results transformations, I'd like to avoid changes there as much as possible, as they look more ad hoc.

Also, please use the AutoGluon framework instead of this new one; they don't seem to be different enough to require a completely different setup.

Comment on lines 13 to 14
    if hasattr(dataset, 'timestamp_column') is False:
        dataset.timestamp_column = None
Collaborator

for this one and below

Suggested change
-    if hasattr(dataset, 'timestamp_column') is False:
-        dataset.timestamp_column = None
+    if not hasattr(dataset, 'timestamp_column'):
+        dataset.timestamp_column = None

amlb/results.py Outdated
@@ -255,7 +259,8 @@ def save_predictions(dataset: Dataset, output_file: str,
                      predictions: Union[A, DF, S] = None, truth: Union[A, DF, S] = None,
                      probabilities: Union[A, DF] = None, probabilities_labels: Union[list, A] = None,
                      target_is_encoded: bool = False,
-                     preview: bool = True):
+                     preview: bool = True,
+                     quantiles: Union[A, DF] = None):
Collaborator

nitpick: let's try to group the params functionally; it makes them easier to read and understand. Here quantiles has a function similar to probabilities.

Contributor

got it

Comment on lines +231 to +232
    if 'y_past_period_error' in df.columns:
        return TimeSeriesResult(df)
Collaborator

please don't bypass test mode by adding your own test block: it should remain the first check and also be applied for time series. I'm not asking you to add the test dataset to our workflow right now, but we will need to add this soon after your changes.

Contributor

got it

@@ -0,0 +1,15 @@
---
Collaborator

please rename the file to timeseries.yaml: explicit is good

Contributor

got it

# s3://autogluon-ts-bench/data/covid_deaths/csv/test.csv | https://autogluon.s3-us-west-2.amazonaws.com/datasets/CovidTimeSeries/test.csv
target: ConfirmedCases # target | ConfirmedCases
type: timeseries
prediction_length: 30
Collaborator

what is the length unit?
30 entries? days? hours?
if this is the number of entries, then please rename it to num_predictions to avoid confusion.
otherwise, please allow a unit:
prediction_length: 30d  # provide predictions over the next 30 days; accept d (days), m (months), y (years)...
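As an aside, a small sketch of how such a unit-suffixed value could be parsed (a hypothetical helper; as the replies below show, the field was ultimately renamed to a plain step count, forecast_range_in_steps, instead):

    import re

    def parse_prediction_length(value):
        # accept plain step counts ("30") or unit-suffixed strings ("30d", "6m", "1y")
        m = re.fullmatch(r"(\d+)([dmy]?)", str(value).strip())
        if m is None:
            raise ValueError(f"invalid prediction_length: {value!r}")
        return int(m.group(1)), (m.group(2) or None)  # (amount, optional unit)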

Contributor

it is the number of predictions per sequence. So 'num_predictions_per_id' sounds good?

Contributor

So, as "predictions" is a more general term, I suppose forecast_range_in_steps is sufficiently explicit?

amlb/results.py Outdated
Comment on lines 317 to 334

    if dataset.type == DatasetType.timeseries:
        if quantiles is not None:
            quantiles = quantiles.reset_index(drop=True)
            df = pd.concat([df, quantiles], axis=1)

        period_length = 1  # TODO: This period length could be adapted to the Dataset, but then we need to pass this information as well. As of now this works.

        # we aim to calculate the mean period error from the past for each sequence: 1/N sum_{i=1}^N |x(t_i) - x(t_i - T)|
        # 1. retrieve item_ids for each sequence/item
        item_ids, inverse_item_ids = np.unique(dataset.test.X[dataset.id_column].squeeze().to_numpy(), return_index=False, return_inverse=True)
        # 2. capture sequences in a list
        y_past = [dataset.test.y.squeeze().to_numpy()[inverse_item_ids == i][:-dataset.prediction_length] for i in range(len(item_ids))]
        # 3. calculate period error per sequence
        y_past_period_error = [np.abs(y_past_item[period_length:] - y_past_item[:-period_length]).mean() for y_past_item in y_past]
        # 4. repeat period error for each sequence, to save one for each element
        y_past_period_error_rep = np.repeat(y_past_period_error, dataset.prediction_length)
        df = df.assign(y_past_period_error=y_past_period_error_rep)
Collaborator

I'd rather not have this here; it looks like a lot of calculations + assumptions (apparently you can't have time series without an id_column) for a method that is just supposed to save predictions in a standard format. Even more so as this y_past_period_error seems to be useful only for the mase metric; therefore, either you compute it with the metric or you compute it beforehand (in the AG framework integration).

For now, I'd move your computations to the __init__.py or exec.py file, and simply ensure that we can customize the result by adding optional columns (in this case, both the quantiles and your additional results).

Suggestion:
change signature to

    def save_predictions(dataset: Dataset, output_file: str,
                         predictions: Union[A, DF, S] = None, truth: Union[A, DF, S] = None,
                         probabilities: Union[A, DF] = None, probabilities_labels: Union[list, A] = None,
                         optional_columns: Union[A, DF] = None,
                         target_is_encoded: bool = False,
                         preview: bool = True):

and automatically concatenate the optional_columns to the predictions if provided. For now, you should be able to generate those in exec.py.
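For illustration, a minimal sketch of both halves of that suggestion; quantile_predictions is a hypothetical DataFrame of quantile columns built in exec.py, and y_past_period_error_rep is the array computed in the reviewed block above:

    import pandas as pd

    # in exec.py: bundle the extra outputs into a single optional frame
    optional_columns = pd.concat(
        [quantile_predictions.reset_index(drop=True),  # e.g. columns 0.1 ... 0.9
         pd.Series(y_past_period_error_rep, name="y_past_period_error")],
        axis=1,
    )

    # in save_predictions: append the optional columns to the standard frame
    if optional_columns is not None:
        df = pd.concat([df.reset_index(drop=True),
                        pd.DataFrame(optional_columns).reset_index(drop=True)],
                       axis=1)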

Contributor

got it

@@ -0,0 +1,36 @@
#!/usr/bin/env bash
Collaborator

@sebhrusen sebhrusen Oct 3, 2022


it is the same setup as for default AutoGluon, right?
Why create another framework then? It adds a lot of complexity regarding testing, distribution, docker images and so on...
In the AutoGluon.__init__.py you could just fork the logic like this:

    exec_file = "exec_ts.py" if dataset.type is DatasetType.timeseries else "exec.py"
    return run_in_venv(__file__, exec_file,
                       input_data=data, dataset=dataset, config=config)

Contributor

true

Collaborator Author

One difference, @limpbot: I install MXNet in addition to the other normal dependencies for TimeSeries, since it isn't a default install. We can simply install MXNet by default for now; hopefully it won't cause issues.

@sebhrusen The one concern is that if AutoGluon becomes too monolithic an install, we may want to consider having separate install logic for the different submodules that are unrelated to each other (for example, timeseries doesn't need the vision and text modules, and tabular doesn't need the timeseries module). Probably not needed now, but something to keep in mind, since AutoGluon covers more data types/domains than most AutoML systems and that comes with many dependencies.

Collaborator

@sebhrusen sebhrusen Oct 4, 2022


@Innixma I understand the concern.
For better encapsulation, and to allow you to reuse code easily in benchmarks when using different submodules, I'd still advise keeping a single AutoGluon folder.
You can then provide different flavors for the setup just in the framework definition using the setup_env syntax:

AutoGluon_TimeSeries:
  extends: AutoGluon
  setup_env:
    MODULE: timeseries
    VAR: string

this makes the 2 variables MODULE and VAR directly available in setup.sh (right after the call to . ${HERE}/../shared/setup.sh ${HERE} true) and allows you to customize the setup: you may already be using this for the dev environment.

Also, thanks to the definitions hierarchy (extends: AutoGluon), maybe we can later tweak the results to make it appear as just AutoGluon, or we can add a notion of group, whatever...

This may not be perfect when you switch frequently between definitions, but for now, I'd like to keep the framework folders to a minimum.

I agree with you that we probably need to start thinking about distinguishing the setup/exec for different kinds of tasks. Ideally it should be smooth and not even require the additional definition above: for a given type of task, the framework should be able to tell early whether it can handle it; if not, it could try to apply some additional setup before replying, and if it's ready to handle it, it continues as before. But all of this is much easier to change if there's already one single folder in the first place.
I can create an issue for this, although I don't have much time to dedicate to AMLB lately; that may change in a couple of months.

Contributor

Sounds good to me; how can I access the variables MODULE and VAR?

Contributor

got it :)

Comment on lines 194 to 202
####################################
### TimeSeries AutoML frameworks ###
####################################

AutoGluonTS:
  version: "stable"
  description: |
    AutoGluon-TimeSeries
  project: https://auto.gluon.ai
Collaborator

Ideally, we don't want to have a new framework if it supports only a specific kind of dataset. See my comment above.
If we start to have one framework for regression, one for classification, one for time series, one for anomaly detection and so on, then it becomes hard to compare "AutoML" frameworks.

Contributor

got it

Comment on lines 492 to 505
    if self._task_def.dataset['type'] == 'timeseries' and self._task_def.dataset['timestamp_column'] is None:
        log.warning("Warning: For timeseries task setting undefined timestamp column to `timestamp`.")
        self._task_def.dataset['timestamp_column'] = "timestamp"
    self._dataset = Benchmark.data_loader.load(DataSourceType.file, dataset=self._task_def.dataset, fold=self.fold, timestamp_column=self._task_def.dataset['timestamp_column'])
    if self._dataset.type == DatasetType.timeseries:
        if self._task_def.dataset['id_column'] is None:
            log.warning("Warning: For timeseries task setting undefined itemid column to `item_id`.")
            self._task_def.dataset['id_column'] = "item_id"
        if self._task_def.dataset['prediction_length'] is None:
            log.warning("Warning: For timeseries task setting undefined prediction length to `1`.")
            self._task_def.dataset['prediction_length'] = "1"
        self._dataset.timestamp_column = self._task_def.dataset['timestamp_column']
        self._dataset.id_column = self._task_def.dataset['id_column']
        self._dataset.prediction_length = self._task_def.dataset['prediction_length']
Collaborator

@sebhrusen sebhrusen Oct 3, 2022


looks like most of this logic could reside in the loading logic itself, as it deals with information available in self._task_def.dataset, which is directly available to the file loader.
I'd move the logic to dataset/file.py for now to minimize the scope of changes.

Contributor
The reason will be displayed to describe this comment to others. Learn more.

Okay, so you want me to extend the FileDataset or the CsvDataset?

Collaborator

I think you can extract this logic into a dedicated method in file.py for clarity (it's just mutating the dataset, after all), and if you only support CSV right now, then please apply it only there.

Contributor

so I added it to a dedicated method in file.py inside the FileLoader class.
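For context, a rough sketch of the kind of dedicated mutation method being discussed; the method name and the dict-style dataset_config access are illustrative assumptions based on the defaults in the reviewed diff, not the exact code that landed in datasets/file.py:

    def _extend_dataset_with_timeseries_config(dataset, dataset_config):
        # fill in the timeseries-specific fields, falling back to the same
        # defaults the benchmark layer applied above when they are missing
        dataset.timestamp_column = dataset_config.get('timestamp_column') or 'timestamp'
        dataset.id_column = dataset_config.get('id_column') or 'item_id'
        dataset.prediction_length = int(dataset_config.get('prediction_length') or 1)
        return dataset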

@@ -30,7 +30,7 @@ def __init__(self, cache_dir=None):
         self._cache_dir = cache_dir if cache_dir else tempfile.mkdtemp(prefix='amlb_cache')

     @profile(logger=log)
-    def load(self, dataset, fold=0):
+    def load(self, dataset, fold=0, timestamp_column=None):
Collaborator

you obtained this new column using

timestamp_column=self._task_def.dataset['timestamp_column']

so you already have the information in the dataset object

Contributor

true
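Spelled out, the reviewer's point is that the extra parameter can simply be dropped again; a minimal sketch, assuming the dict-style access to the task's dataset definition shown above:

    @profile(logger=log)
    def load(self, dataset, fold=0):
        # the timestamp column already travels with the task's dataset
        # definition, so it does not need to be passed in separately
        timestamp_column = dataset['timestamp_column']
        ...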

Collaborator Author

Innixma commented Oct 3, 2022

Thanks @sebhrusen for the detailed review!

@limpbot would you like to have a go at addressing some of the comments? Feel free to send a PR to my branch as you did in your prior update.


* no outer context manipulation, add dataset subdir

* add more datasets

* include error raising for too large pred. length.

* mergin AutoGluonTS framework folder into AutoGluon

* renaming ts.yaml to timeseries.yaml, plus ext.

* removing presets, correct latest config for AGTS

* move dataset timeseries ext to datasets/file.py

* dont bypass test mode

* move quantiles and y_past_period_error to opt_cols

* remove whitespaces

* deleting merge artifacts

* delete merge artifacts

* renaming prediction_length to forecast_range_in_steps

* use public dataset, reduced range to maximum

* fix format string works

* fix key error bug, remove magic time limit
Collaborator Author

Innixma commented Oct 6, 2022

I merged @limpbot's changes into this branch via his PR: Innixma#7

@sebhrusen The branch should be ready for 2nd round of review.

Collaborator

@sebhrusen sebhrusen left a comment


The changes to the core logic are much smaller now, which is what I mostly care about for this feature in its current state and scope, as I don't want to prevent you from moving forward.

I think it will be interesting for us (cc: @PGijsbers) to start thinking about supporting new kinds of tasks, and to see how we can integrate this smoothly (mixins after restructuring the code?). Maybe even some kind of plugin logic (I have a PoC PR allowing the user to plug custom code in various places, mainly thinking about data loading, result metrics, and whatever the framework may need).
If you have any ideas on your side, feel free to make suggestions in https://github.com/openml/automlbenchmark/discussions or contribute directly.

Thanks a lot for this @limpbot and @Innixma

Comment on lines 40 to 46
if [[ ${MODULE} == "timeseries" ]]; then
    PY -c "from autogluon.tabular.version import __version__; print(__version__)" >> "${HERE}/.setup/installed"
    # TODO: GPU version install
    PIP install "mxnet<2.0"
else
    PY -c "from autogluon.timeseries.version import __version__; print(__version__)" >> "${HERE}/.setup/installed"
fi
Collaborator

I think you swapped tabular.version and timeseries.version here.
Can the versions actually be different?

Contributor

Oh yes, that got mixed up. I don't think the versions should ever differ, but to be safe I will correct it and add it in a future pull request. Thank you for the reviews and merge, @sebhrusen!

Collaborator

@sebhrusen sebhrusen Oct 7, 2022


default autogluon setup looks broken: see https://github.com/openml/automlbenchmark/actions/runs/3199648554/jobs/5225661120

apparently, since the forecasting/timeseries module is currently always installed, the mxnet dependency is always required

Collaborator

I can merge only once the default setup works

Contributor

Okay, thanks for the pointer; it's most likely because of the mixed-up version call. I am taking a look at it.

@sebhrusen sebhrusen marked this pull request as ready for review October 7, 2022 10:43
Addressed minor comments, and fixed version call for tabular and timeseries modularities (#8)

* swapped timeseries and tabular to set version

* make warning message more explicit

* remove outer context manipulation

* split timeseries / tabular into functions
Collaborator Author

Innixma commented Oct 7, 2022

Thanks @sebhrusen for the detailed review! @limpbot has addressed some final comments in the latest update, which should also fix the autogluon.tabular error you mentioned.

@sebhrusen sebhrusen merged commit 4029472 into openml:master Oct 10, 2022
@PGijsbers
Collaborator

I think it will be interesting for us (cc: @PGijsbers) to start thinking about supporting new kinds of tasks

I missed the "mention" ping (I just thought it said "subscribed"); sorry I didn't check earlier. Definitely. I want to first wait for the JMLR reviews and finish "that part of the project", but creating a more flexible environment for people to add new types of tasks would be a great next step that invites more people to use (and extend) the benchmark tool.

Thanks Innixma and Limpbot for your contribution 🎉

limpbot added a commit to limpbot/automlbenchmark that referenced this pull request Nov 15, 2022
* Add AutoGluon TimeSeries Prototype

* AutoMLBenchmark TimeSeries Prototype. (#6)

* Update readme

* Autogluon timeseries, addressed comments by sebhrusen (#7)

* Addressed minor comments, and fixed version call for tabular and timeseries modularities (#8)

Co-authored-by: Leo <LeonhardSommer96@gmail.com>
PGijsbers added a commit that referenced this pull request Jun 20, 2023
* Add a workflow to tag latest `v*` release as `stable` (#399)

Currently limited to alphabetical ordering, which means that no single number in the version can exceed one digit.

* Bump auto-sklearn to 0.14.0 (#400)

* Update version to 2.0

* Revert "Update version to 2.0"

This reverts commit 9e0791a.

* Fix/docker tag (#404)

* Add the version tag to the image name if present

* Fix casing for MLNet framework definition

* Sync stable-v2 and master (#407)

* Update version to 2.0.2

* Revert version change

* Add support for the OpenML test server (#423)

* Add support for the OpenML test server

* change domain from openmltestserver to test.openml

* update error message

* Apply suggestions from code review

Co-authored-by: seb. <sebastien@h2o.ai>

* fix syntax error due to online merging

Co-authored-by: seb. <sebastien@h2o.ai>

* Switch from release:created to release:published (#429)

* Added support for dataset files stored on s3 (#420)

* s3 functionality

* Update amlb/datasets/fileutils.py

Co-authored-by: Pieter Gijsbers <p.gijsbers@tue.nl>

* OOD

* add s3n

* move boto3 import

Co-authored-by: Weisu Yin <weisuyin96@gmail.com>
Co-authored-by: Pieter Gijsbers <p.gijsbers@tue.nl>

* Respect TMP, TMPDIR, TEMP (#442)

* Respect tmpdir

* Fixed submodule

* feat: retain environment vars for framework venv

* minor fix on compatibility (#454)

Co-authored-by: Qingyun Wu <qxw5138@psu.edu>

* Ignore decoding errors on Windows (#459)

By default it can use cp1252 decoding which sometimes raises an error
and halts the process.

* Fix a typo (#462)

will used -> will be used

* Merge back stable-v2 to master (#472)

* Add `stable` tag workflow, bump auto-sklearn (#401)

* Add a workflow to tag latest `v*` release as `stable` (#399)

Currently limited to alphabetical ordering, which means that no single number in the version can exceed one digit.

* Bump auto-sklearn to 0.14.0 (#400)

* Fix/docker tag (#404)

* Add the version tag to the image name if present

* Fix casing for MLNet framework definition

* Changed latest from master to main

* Update version to 2.0.1

* Improv/aws meta (#413)

* Add volume meta data to aws meta info

* Add constraints for v2 benchmark (#415)

* Add constraints for v2 benchmark

For ease of reproducibility, we want to include our experimental setup
in the constraints file. For our experiments we increase the volume size
to 100gb and require gp3 volumes (general purpose SSD).

* Update version to 2.0.2

* Fix AWS random cancel issue (#422)

* let the job runner handle the rescheduling logic, ensuring that the job can no longer be acted upon by the current worker after being rescheduled

* remove commented code

* Add a GAMA configuration intended for benchmarking (#426)

Made the previous version abstract to avoid accidentally running the
wrong version of GAMA for the benchmark.

* Unsparsify target variables for (Tuned)RF (#425)

* Unsparsify target variables for (Tuned)RF

Sparse targets are not supported in scikit-learn 0.24.2, and are used
with tasks 360932 and 360933 (QSAR) in the benchmark.

* cosmetic change to make de/serialization easier to debug

Co-authored-by: Sebastien Poirier <sebastien@h2o.ai>

* ensure that openml is configured when loading the tasks (#427)

* Expect a possible `NoSuchProcess` error (#428)

Since it's entirely possible that the processes were already
terminating, but only completed termination between the process.children
call and the proc.terminate/kill calls.

* Reset version for versioning workflow

* Update version to 2.0.3

* ensure that the docker images can be built from linux (#437)

* Avoid querying terminated instance with CloudWatch (#438)

* fixes #432 add precision to runtimes in results.csv (#433)

* fixes #432 add precision to runtimes in results.csv

* Update amlb/results.py

Co-authored-by: seb. <sebastien@h2o.ai>

Co-authored-by: seb. <sebastien@h2o.ai>

* Iteratively build the forest to honor constraints (#439)

* Iteratively build the forest to honor constraints

In particular depending on the dataset size either memory or time
constraints can become a problem which makes it unreliable as a
baseline. Gradually growing the forest sidesteps both issues.

* Make iterative fit default, parameterize execution

* Step_size as script parameter, safer check if done

When final_forest_size is not an exact multiple of step_size,
randomforest should still terminate. Additionally step_size is escaped
with an underscore as it is not a RandomForestEstimator hyperparameter.

* Iterative fit for TunedRandomForest to meet memory and time constraints (#441)

* Iterative fit to meet memory and time constraints

Specifically for each value of `max_features` to try, an equal time
budget is alloted, with one additional budget being reserved for the
final fit. This does mean that different `max_features` can lead to
different number of trees, but it keeps it simple.

* Abort tuning when close to total time budget

The first fit of each iterative fit for a `max_features` value was not
guarded, which can lead to exceeding the total time budget. This adds a
check before the first fit to estimate whether the budget will be
exceeded, and if so aborting further tuning and continue with the final
fit.

* Make k_folds configurable

* Add scikit-learn code with explanation

* Modify cross_validate, allow 1 estimator per split

This is useful when we maintain a warm_started model for each individual
split.

* Use custom cv function to allow warm-start

By default estimators are cloned in any scikit-learn cross_validate
function (which stops warm-start) and it is not possible to specify a
specific estimator-object per fold (which stops warm-start). The added
custom_validate module makes changes to the scikit-learn code to allow
warm-starting to work in conjunction with the cross-validate
functionality. For more info see scikit-learn#22044 and
scikit-learn#22087.

* Add parameter to set tune time, rest is for fit

The previous iteration where the final fit was treated as an equivalent
budget to any other optimization sometimes left too little time to train
the final forest, in particular when the last fit took longer than
expected. This would often lead to very small forests for the final
model. The new system guarantees roughly 10% of budget for the final
forest, guaranteeing a better final fit.

* Revert version to _dev_version to prepare release (#444)

* Update version to 2.0.4

* Signal to encode predictions as proba now works (#447)

In a previous iteration it was encoded as a numpy file, but now it's
serialized to JSON which means that results.probabilities is simply a
string if imputation is required.

* Monkeypatch openml to keep whitespace in features (#446)

Technically monkeypatch xmltodict function used by openml when reading the features xml

* fix for mlr3automl (#443)

* Reset version for Github workflow (#448)

* Update version to 2.0.5

* Update mlr3automl to latest

Was supposed to be included with #443

* Update MLR3 (#461)

* Reset version for version bump

* Updated version because GA failed

* Issue 416: fixing versioning workflow for releases and merges to master (#468)

* change workflow to correctly modify the app version on releases and when forcing merged version back to master

* protect main branch from accidental releases

* fix stress test

Co-authored-by: PGijsbers <p.gijsbers@tue.nl>
Co-authored-by: eddiebergman <eddiebergmanhs@gmail.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Erin LeDell <erin@h2o.ai>
Co-authored-by: Stefan Coors <stefan.coors@gmx.net>

* useless workflow reintroduced during merge (#475)

* tag all AWS entities (#469)

* fixed parsing of int targets when loading file in CSV format (#467)

* Avoid root owned files from docker (#464)

* New site (#479)

* First draft of new website

* Add framework descriptions, papers and logos

* Update footer with Github link

* Remove under construction banner

* Add redirect from old page to new one

* Update page title

* Add text links to new paper to be added later

* Move static site to /docs

* Whitelist documentation images

* Remove temporary work directory

* Add documentation images

* Place holder for mobile

* Move old notebooks and visualizations

To make sure they are not confusing for new users, as these will no longer work out-of-the-box.
New notebooks will be added soon but I don't have the files available right now.

* Tell github this is not Jekyll

* Update minimal responsiveness (#480)

* Make results responsive (hacky)

* Make Frameworks page more responsive

* Make Home more responsive

* Bare minimum mobile navbar

* Make sure phones report fake width

* Link to arxiv paper (#481)

* Update to support AutoGluon v0.4 (#455)

* Update to support AutoGluon v0.4

* Address comments

* Updated setup.py for `hyperoptsklearn` as it no longer uses PyPi (also now accepts shas) (#410)

* Updated hyper opt not to use PyPi and accept shas

* case-sensitive PIP command in setup

Co-authored-by: Sebastien Poirier <sebastien@h2o.ai>

* AutoGluon TimeSeries Support (first version) (#494)

* Add AutoGluon TimeSeries Prototype

* AutoMLBenchmark TimeSeries Prototype. (#6)

* Update readme

* Autogluon timeseries, addressed comments by sebhrusen (#7)

* Addressed minor comments, and fixed version call for tabular and timeseries modularities (#8)

Co-authored-by: Leo <LeonhardSommer96@gmail.com>

* Add workflow to manually run `runbenchmark.py` on Github Actions (#516)

* Add workflow for manually running a test benchmark

* Use built-in context for getting the branch

* Add more info to step names

* Add ability to specify options

* Fixed user and sudo under docker (#495)

* Fixed user and sudo under docker

* Reverted format

* Update docker.py

* Addressing #497

#497

* Keep wget quiet

* Use :, . is deprecated

Co-authored-by: seb. <sebastien@h2o.ai>

* Set username and userid in Dockerfile generation

* Install HDF5 to Docker for tables

* Avoid using unix-specific workarounds on Windows

* Re-enable caching for building docker images

---------

Co-authored-by: seb. <sebastien@h2o.ai>
Co-authored-by: PGijsbers <p.gijsbers@tue.nl>

* [no-ci] Fix broken link (#514)

* Remove autoxgboost, add `removed` field for frameworks (#519)

* Add redirect for dataset page (#521)

* Upgrade Python version and dependencies (#520)

* Remove usage of np.float alias and just use float

* Bump to Py3.9

* Update requirements for March 2023, Py3.9

* Pin packaging, since LegacyVersion was removed.

Also remove scipy pin, since later autosklearn needs higher scipy.

* Install packages to ranger/lib

* Set secret PAT used when installing with R remotes

Specifically for mlr3automl integration

* Update usage for oct 21 release

* Disable custom installed packages

* Remove installation of reqiurements altogether

* Insert oboe example

* Add monkeypatch

* Make error matrix numpy array

* Upgrade to Ubuntu 22.04 from 18.04

* Update pip cache to look at 3.9 directory

* Add Github PAT to run_all_frameworks script

* bump github action versions

* Adding tarfile member sanitization to extractall() (#508)

* Included lightautoml in frameworks_stable (#412)

* Included lightautoml in frameworks_stable

* Added MLNet to frameworks_latest

* Added mlr3 to both stable and latest

* copy/paste fix

* Remove travis file (#529)

* Remove travis file since it is not used

* Update readme to reflect Python 3.9 support

* Add github action workflow to replace old travis file

* Add job id, improve name

* Fix bug where task inference would lead to KeyError

* Update type data for new openml/pandas

Probably ought to remove the specific check if we don't enforce it.

* Write numeric categories as str, see renatopp/liac-arff/issues/126

* [Open for review] Store results after each job completion (#526)

* ensure that results are saved progressively in all situations instead of only when all jobs are completed

* rename config flag

* don't forget to cleanup job runner exec thread

* Improve type hints

* Adding file lock on global results file (#453)

* adding file lock on global results file

* fix imports

* fix amlb.utils export

* cosmetic

* cleanup util imports (also magic strings) + remove ruamel dependency in subprocesses

---------

Co-authored-by: Sebastien Poirier <sebastien@h2o.ai>

* Update the requirements files to exclude yaml and include filelock

The remainder of dependencies are not re-generated to avoid
additional changes in the PR.

* Add missing import

* Add fallback for when job is not started

* Return an empty dataframe if dataframe is empty

This avoids a bug where an empty dataframe is indexed.

* Inform the user result summary is not available in AWS mode

As results are processed in a different manner (files are directly
copied over from S3). This avoids a bug where a benchmark
results.csv file tries to be accessed.

* Separate scoreboard generation to two lines instead

Which makes it easier to tell which part of the generation generates
an error, if any.

* re-enable logging

* Provide a warning and return early if no process output is detected

This avoids potentially crashing if the logging is configured incorrectly.
In the future, we should expand this to first check how logging is
configured in order to see whether or not the issue should be reported
and possibly give a more detailed warning if it is likely the cause
of an error.

---------

Co-authored-by: Sebastien Poirier <sebastien@h2o.ai>
Co-authored-by: seb <sebastien.poirier@h2o.ai>

* maint: upgrade AMI to Ubuntu 22.04 #512 (#525)

* Add `flaml_benchmark` (#528)

* dont discard setup_args if it already is a list

* Add flaml and flaml_benchmark

It is not added to latest since install from latest seems to be broken

* Set up alternative way for benchmark mode of flaml

This is only temporarily allowed - we expect an easily configurable
algorithm, instead of having to carefully install specific
dependencies.

* limit install, since >2 incompatible

* Measure inference time (#532)

Add the option to measure inference time (disabled by default) for most frameworks.
For those frameworks, inference time is measured capturing both the data loading and the inference.
This is done to make things more equal between the different frameworks (as some _need_ to read the file if they don't operate in Python). Inference time is measured multiple times for different batch sizes (configurable). By default, the median is reported in the results file (as it is less sensitive to e.g. cold starts), but all measured inference times are stored in the predictions folder of a run.
For Python frameworks, inference time for in-memory single row predictions is also measured.

* Upload to OpenML (#523)

Adds a script that allows uploading run results to openml.
Additional metadata is stored in the task information to be able to provide a complete description for openml upload.
Additional parameters are added to `run_benchmark` to allow runs to automatically be tagged, and to connect to the test server.
Also fixes TPOT integration for newer versions, where if a model has no `predict_proba` an `AttributeError` is raised instead of a `RuntimeError`.

* Fix a race condition of checking vs adding results (#535)

Specifically, adding results was queued in a job executor, while
checking results was directly called by the worker threads.
If the worker thread checks before the executor had added results,
it is possible to get into a deadlock condition. The deadlock
arises from the fact that the `stop` condition is never called
and the main thread will continue to wait for its END_Q signal.

* Add scikit_safe inference time measurement files (#537)

* Add scikit_safe inference time measurement files

These files have categorical values numerically encoded and missing
values imputed, which makes them usable for any scikit-learn algo.

* Only generate inference measurement files if enabled

* Optionally limit inference time measurements by dataset size (#538)

* Add versions 2023 q2 (#539)

* Fix versions for June 2023 benchmark

* Add 2023Q2 framework tag

* Use encoded values for inference

* Add us-east-2 AMI

* Run docker as root on AWS

* Add option to add build options for docker build command

* Remove 'infer_speed' artifact as it is not supported in main repo

* Fix pandas 2 not compatible with autosklearn 2 see askl#1672

---------

Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de>
Co-authored-by: seb. <sebastien@h2o.ai>
Co-authored-by: Weisu Yin <weisy@amazon.com>
Co-authored-by: Weisu Yin <weisuyin96@gmail.com>
Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com>
Co-authored-by: Qingyun Wu <qw2ky@virginia.edu>
Co-authored-by: Qingyun Wu <qxw5138@psu.edu>
Co-authored-by: Robinnibor <robinksskss@gmail.com>
Co-authored-by: Erin LeDell <erin@h2o.ai>
Co-authored-by: Stefan Coors <stefan.coors@gmx.net>
Co-authored-by: Alan Silva <3899850+alanwilter@users.noreply.github.com>
Co-authored-by: Nick Erickson <neerick@amazon.com>
Co-authored-by: Leo <LeonhardSommer96@gmail.com>
Co-authored-by: TrellixVulnTeam <112716341+TrellixVulnTeam@users.noreply.github.com>
Co-authored-by: seb <sebastien.poirier@h2o.ai>