Merge unaligned-data into master (#282)
d-a-bunin committed Mar 26, 2024
1 parent 9903da1 commit a0e4cb2
Showing 143 changed files with 12,569 additions and 3,566 deletions.
17 changes: 10 additions & 7 deletions .github/workflows/test.yml
@@ -19,12 +19,15 @@ jobs:
with:
python-version: "3.10"

- name: Install Dependencies
- name: Install Poetry
uses: snok/install-poetry@v1
with:
virtualenvs-create: true
virtualenvs-in-project: true

- name: Install dependencies
run: |
pip install poetry==1.4.0 # TODO: remove after poetry fix
poetry --version
poetry config virtualenvs.in-project true
poetry install -E style --no-root
poetry install -E style -vv
- name: Static Analysis
run: poetry run make lint
@@ -62,7 +65,7 @@ jobs:
- name: Install dependencies
if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
run: |
poetry install -E "all tests" -vv
poetry install -E "all jupyter tests" -vv
- name: PyTest with sharding
run: |
@@ -101,7 +104,7 @@ jobs:

- name: Install dependencies
run: |
poetry install -E "all tests" -vv
poetry install -E "all jupyter tests" -vv
poetry run pip install "pandas${{ matrix.pandas-version }}"
- name: PyTest ("tsdataset transforms")
51 changes: 51 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,19 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
-
-
-
- Add warning on trying to pass numeric timestamp if freq is not None and add `_cast_index_to_datetime` ([#214](https://github.com/etna-team/etna/pull/214))
-
-
-
-
-
-
- Add `infer_alignment`, `apply_alignment`, `make_timestamp_df` into `etna.dataset.utils` ([#256](https://github.com/etna-team/etna/pull/256))
-
-
-
- Add `TSDataset.create_from_misaligned` constructor ([#269](https://github.com/etna-team/etna/pull/269))
-
-
-
-
@@ -35,6 +48,38 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
-
-
-
- Add ignoring of integer timestamp as a feature into native DL models ([#210](https://github.com/etna-team/etna/pull/210))
- Update `pytorch_forecasting` models to handle integer timestamp ([#208](https://github.com/etna-team/etna/pull/208))
- Update `datasets` module to work with integer timestamp ([#146](https://github.com/etna-team/etna/pull/146))
-
-
- Add tests for `transform` on data with integer timestamp ([#153](https://github.com/etna-team/etna/pull/153))
- Add tests for `models` on data with integer timestamp ([#188](https://github.com/etna-team/etna/pull/188))
-
- Update `DateFlagsTransform`, `TimeFlagsTransform`, `HolidayTransform`, `SpecialDaysTransform`, `FourierTransform` to work with external timestamp ([#169](https://github.com/etna-team/etna/pull/169))
- Update `analysis` module to work with integer timestamp ([#161](https://github.com/etna-team/etna/pull/161))
-
- Update `StatsForecastARIMAModel`, `StatsForecastAutoARIMAModel`, `StatsForecastAutoCESModel`, `StatsForecastAutoETSModel`, `StatsForecastAutoThetaModel` to handle integer timestamp ([#197](https://github.com/etna-team/etna/pull/197))
- Update `MRMRFeatureSelectionTransform` to handle integer timestamp ([#164](https://github.com/etna-team/etna/pull/164))
-
- Update deseasonality transforms (`STLTransform`, `DeseasonalityTransform`) to handle integer timestamp ([#174](https://github.com/etna-team/etna/pull/174))
- Update `HoltModel`, `HoltWintersModel`, `SimpleExpSmoothingModel`, `SARIMAXModel`, `AutoARIMAModel` to handle integer timestamp ([#200](https://github.com/etna-team/etna/pull/200))
- Update detrend transforms (`LinearTrendTransform`, `TheilSenTrendTransform`) to handle integer timestamp ([#163](https://github.com/etna-team/etna/pull/163))
- Update `ResampleWithDistributionTransform` to work with integer timestamp ([#165](https://github.com/etna-team/etna/pull/165))
-
- Update change point transforms (`ChangePointsSegmentationTransform`, `ChangePointsTrendTransform`, `ChangePointsLevelTransform`, `TrendTransform`) to handle integer timestamp ([#176](https://github.com/etna-team/etna/pull/176))
- Update `BATSModel`, `TBATSModel` models to work with integer timestamp ([#195](https://github.com/etna-team/etna/pull/195))
- Update `ProphetModel` to handle external timestamp ([#203](https://github.com/etna-team/etna/pull/203))
- Remove checking frequency in `timestamp_column` of `ProphetModel` ([#222](https://github.com/etna-team/etna/pull/222))
- Update `FourierTransform` to handle external datetime timestamp ([#223](https://github.com/etna-team/etna/pull/223))
- Update `FoldMask` to work with integer timestamp, in `validate_on_dataset` method add validation on presence of `FoldMask` parameters in `ts.index`, add tests for `FoldMask` ([#226](https://github.com/etna-team/etna/pull/226))
- Fix `FourierTransform` on integer index, add inference tests ([#230](https://github.com/etna-team/etna/pull/230))
- Update outliers transforms to handle integer timestamp ([#229](https://github.com/etna-team/etna/pull/229))
- Update pipelines to handle integer timestamp ([#241](https://github.com/etna-team/etna/pull/241))
- Add `timestamp_range` and refactor code with it ([#244](https://github.com/etna-team/etna/pull/244))
- Update CLI to handle integer timestamp ([#246](https://github.com/etna-team/etna/pull/246))
- Update `ExogShiftTransform` to handle integer timestamp ([#254](https://github.com/etna-team/etna/pull/254))
- Extend base `TSDataset` constructor to handle long format dataframes, update documentation and tutorials with this change ([#266](https://github.com/etna-team/etna/pull/266))
-
-
-
@@ -54,7 +99,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
-
-
-
- Prohibit empty list value and duplication of `target_timestamps` parameter in `FoldMask` ([#226](https://github.com/etna-team/etna/pull/226))
-
-
- Fix `DeseasonalityTransform` failing to inverse transform short series ([#174](https://github.com/etna-team/etna/pull/174))
-
- Fix indexing in `stl_plot`, `plot_periodogram`, `plot_holidays`, `plot_backtest`, `plot_backtest_interactive`, `ResampleWithDistributionTransform` ([#244](https://github.com/etna-team/etna/pull/244))
- Fix `DifferencingTransform` to handle integer timestamp on test ([#244](https://github.com/etna-team/etna/pull/244))
-
-
-
1 change: 0 additions & 1 deletion README.md
@@ -55,7 +55,6 @@ from etna.datasets import TSDataset
df = pd.read_csv("examples/data/example_dataset.csv")

# Create a TSDataset
df = TSDataset.to_dataset(df)
ts = TSDataset(df, freq="D")

# Choose a horizon
1 change: 1 addition & 0 deletions docs/source/api_reference/datasets.rst
@@ -19,6 +19,7 @@ Basic structures:
:template: class.rst

TSDataset
DataFrameFormat
HierarchicalStructure

Utilities for dataset generation:
48 changes: 36 additions & 12 deletions etna/analysis/decomposition/plots.py
@@ -6,7 +6,9 @@
from typing import List
from typing import Optional
from typing import Tuple
from typing import Type
from typing import Union
from typing import cast

import matplotlib.pyplot as plt
import numpy as np
@@ -84,12 +86,12 @@ def plot_trend(

def plot_time_series_with_change_points(
ts: "TSDataset",
change_points: Dict[str, List[pd.Timestamp]],
change_points: Dict[str, List[Union[pd.Timestamp, int]]],
segments: Optional[List[str]] = None,
columns_num: int = 2,
figsize: Tuple[int, int] = (10, 5),
start: Optional[str] = None,
end: Optional[str] = None,
start: Optional[Union[pd.Timestamp, int, str]] = None,
end: Optional[Union[pd.Timestamp, int, str]] = None,
):
"""Plot segments with their trend change points.
@@ -110,6 +112,11 @@ def plot_time_series_with_change_points(
start timestamp for plot
end:
end timestamp for plot
Raises
------
ValueError:
Incorrect type of ``start`` or ``end`` is used according to ``ts.freq``.
"""
start, end = _get_borders_ts(ts, start, end)
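
The new ``Raises`` entry says that ``start`` and ``end`` must match the kind of timestamp implied by ``ts.freq``. The diff does not show how `_get_borders_ts` enforces this, so the following is only a minimal sketch of one way such a check could look. The helper name `check_border` is hypothetical, `datetime` stands in for `pd.Timestamp`, and the rule "``freq is None`` means integer timestamp" is an assumption drawn from the changelog wording.

```python
from datetime import datetime
from typing import Optional, Union

Border = Optional[Union[datetime, int, str]]


def check_border(value: Border, freq: Optional[str], name: str) -> Border:
    """Hypothetical helper: validate a plot border against the dataset frequency."""
    if value is None:
        # No border requested; the caller falls back to the dataset limits.
        return None
    if freq is None:
        # Assumed convention: no frequency means an integer timestamp,
        # so only plain integers are accepted (bool is rejected explicitly).
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"'{name}' must be an integer when freq is None")
        return value
    # Datetime timestamp: integers make no sense as borders.
    if isinstance(value, int):
        raise ValueError(f"'{name}' must be a datetime or string when freq is set")
    if isinstance(value, str):
        # Parse ISO-formatted strings such as "2021-08-01".
        return datetime.fromisoformat(value)
    return value
```

With this sketch, `check_border("2021-08-01", "D", "start")` yields a `datetime`, while `check_border(5, "D", "start")` raises ``ValueError``.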

@@ -147,17 +154,17 @@ def plot_time_series_with_change_points(

def plot_change_points_interactive(
ts,
change_point_model: BaseEstimator,
model: BaseCost,
change_point_model: Type[BaseEstimator],
model: Union[str, BaseCost],
params_bounds: Dict[str, Tuple[Union[int, float], Union[int, float], Union[int, float]]],
model_params: List[str],
predict_params: List[str],
in_column: str = "target",
segments: Optional[List[str]] = None,
columns_num: int = 2,
figsize: Tuple[int, int] = (10, 5),
start: Optional[str] = None,
end: Optional[str] = None,
start: Optional[Union[pd.Timestamp, int, str]] = None,
end: Optional[Union[pd.Timestamp, int, str]] = None,
):
"""Plot a time series with indicated change points.
@@ -196,14 +203,18 @@ def plot_change_points_interactive(
Jupyter notebook might display the results incorrectly,
in this case try to use ``!jupyter nbextension enable --py widgetsnbextension``.
Raises
------
ValueError:
Incorrect type of ``start`` or ``end`` is used according to ``ts.freq``.
Examples
--------
>>> from etna.datasets import TSDataset
>>> from etna.datasets import generate_ar_df
>>> from etna.analysis import plot_change_points_interactive
>>> from ruptures.detection import Binseg
>>> classic_df = generate_ar_df(periods=1000, start_time="2021-08-01", n_segments=2)
>>> df = TSDataset.to_dataset(classic_df)
>>> df = generate_ar_df(periods=1000, start_time="2021-08-01", n_segments=2)
>>> ts = TSDataset(df, "D")
>>> params_bounds = {"n_bkps": [0, 5, 1], "min_size":[1,10,3]}
>>> plot_change_points_interactive(ts=ts, change_point_model=Binseg, model="l2", params_bounds=params_bounds, model_params=["min_size"], predict_params=["n_bkps"], figsize=(20, 10)) # doctest: +SKIP
@@ -212,6 +223,8 @@ def plot_change_points_interactive(
from ipywidgets import IntSlider
from ipywidgets import interact

start, end = _get_borders_ts(ts, start, end)

if segments is None:
segments = sorted(ts.segments)

@@ -329,7 +342,7 @@ def stl_plot(
df = ts.to_pandas()
for i, segment in enumerate(segments):
segment_df = df.loc[:, pd.IndexSlice[segment, :]][segment]
segment_df = segment_df[segment_df.first_valid_index() : segment_df.last_valid_index()]
segment_df = segment_df.loc[segment_df.first_valid_index() : segment_df.last_valid_index()]
decompose_result = STL(endog=segment_df[in_column], period=period, **stl_kwargs).fit()

# start plotting
@@ -360,7 +373,7 @@ def stl_plot(

def seasonal_plot(
ts: "TSDataset",
freq: Optional[str] = None,
freq: Union[Optional[str], Literal["not_given"]] = "not_given",
cycle: Union[
Literal["hour"], Literal["day"], Literal["week"], Literal["month"], Literal["quarter"], Literal["year"], int
] = "year",
@@ -386,6 +399,7 @@ def seasonal_plot(
* if set, resampling will be made using ``aggregation`` parameter.
If given frequency is too low, then the frequency of ``ts`` will be used.
This option isn't supported for data with integer timestamp.
cycle:
period of seasonality to capture (see :class:`~etna.analysis.decomposition.utils.SeasonalPlotCycle`)
Expand All @@ -406,11 +420,21 @@ def seasonal_plot(
number of columns in subplots
figsize:
size of the figure per subplot with one segment in inches
Raises
------
ValueError:
Resampling isn't supported for data with integer timestamp
ValueError:
Setting non-integer cycle isn't supported for data with integer timestamp
ValueError:
Value None for freq parameter isn't supported for data with datetime timestamp
"""
if plot_params is None:
plot_params = {}
if freq is None:
if freq == "not_given":
freq = ts.freq
freq = cast(Optional[str], freq)
if segments is None:
segments = sorted(ts.segments)
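
The change to `seasonal_plot` replaces the ``None`` default of ``freq`` with the string sentinel ``"not_given"``. This presumably lets an explicit ``None`` carry its own meaning instead of "use the dataset frequency". The pattern can be sketched in isolation as follows; `resolve_freq` and its ``ts_freq`` parameter are hypothetical names introduced only for illustration and are not part of the library.

```python
from typing import Literal, Optional, Union, cast


def resolve_freq(
    freq: Union[Optional[str], Literal["not_given"]] = "not_given",
    ts_freq: Optional[str] = "D",
) -> Optional[str]:
    """Hypothetical helper: pick the frequency to use for plotting."""
    if freq == "not_given":
        # The argument was omitted, so inherit the dataset frequency.
        freq = ts_freq
    # Only Optional[str] can remain at this point; cast records that
    # for the type checker and does nothing at runtime.
    return cast(Optional[str], freq)
```

Here `resolve_freq()` falls back to ``ts_freq``, `resolve_freq("W")` keeps the explicit value, and `resolve_freq(None)` returns ``None`` unchanged, which the previous ``freq is None`` check could not distinguish from an omitted argument.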

3 changes: 2 additions & 1 deletion etna/analysis/decomposition/search.py
@@ -1,5 +1,6 @@
from typing import Dict
from typing import List
from typing import Union

import pandas as pd
from ruptures.base import BaseEstimator
@@ -9,7 +10,7 @@

def find_change_points(
ts: TSDataset, in_column: str, change_point_model: BaseEstimator, **model_predict_params
) -> Dict[str, List[pd.Timestamp]]:
) -> Dict[str, List[Union[int, pd.Timestamp]]]:
"""Find trend change points using ruptures models.
Parameters
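
The widened return type ``Dict[str, List[Union[int, pd.Timestamp]]]`` means that change points are reported in whatever timestamp type the dataset index uses. The body of `find_change_points` is not shown in this diff, so the following is only a sketch of the idea. It relies on one known fact: ruptures estimators return positional breakpoints whose last element is the series length. The helper name `breakpoints_to_timestamps` is hypothetical, and `datetime` stands in for `pd.Timestamp`.

```python
from datetime import datetime
from typing import List, Sequence, Union

Timestamp = Union[int, datetime]


def breakpoints_to_timestamps(
    index: Sequence[Timestamp], breakpoints: Sequence[int]
) -> List[Timestamp]:
    """Hypothetical helper: map positional breakpoints to index labels."""
    # A ruptures-style result ends with len(index), which marks the end of
    # the series rather than a real change point, so it is filtered out.
    return [index[position] for position in breakpoints if position < len(index)]
```

The same function then works for an integer index (`list(range(10))`) and for a datetime index without any branching on the timestamp type.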
