Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion docs/api_doc/datetime/DatetimeOrdinal.rst
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
DatetimeOrdinal
===============

.. automodule:: feature_engine.datetime.datetime_ordinal
.. autoclass:: feature_engine.datetime.DatetimeOrdinal
:members:

68 changes: 46 additions & 22 deletions docs/user_guide/datetime/DatetimeOrdinal.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2,19 +2,30 @@

.. currentmodule:: feature_engine.datetime


DatetimeOrdinal
================
===============

:class:`DatetimeOrdinal()` converts datetime variables into ordinal numbers, thereby
providing a numerical representation of the date. By default, it returns the proleptic
Gregorian ordinal of the date, where 1st January of year 1 has ordinal 1.

:class:`DatetimeOrdinal()` converts datetime variables into ordinal numbers, that is, a numerical representation of the date.
If 1st January of year 1 has ordinal number 1 then, 2nd January of year 1 will have ordinal
number 2, and so on.

By default, :class:`DatetimeOrdinal()` returns the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1.
Optionally, :class:`DatetimeOrdinal()` can compute the number of days relative to a
user-defined `start_date`. This can be useful for reducing the magnitude of the ordinal
values and for aligning them to a specific project timeline.

Ordinal numbers preserve the relative distances between dates (e.g., the number of days
between events), allowing algorithms to capture linear trends and temporal distances.

Optionally, :class:`DatetimeOrdinal()` can compute the number of days relative to a user-defined `start_date`.

Datetime ordinals with pandas
-----------------------------

In Python, we can get the Gregorian ordinal of a date using the `toordinal()` method from a datetime object.
In Python, we can get the Gregorian ordinal of a date using the `toordinal()` method
from a datetime object as follows:

.. code:: python

Expand All @@ -34,13 +45,18 @@ The output shows the new ordinal feature:
0 2023-01-01 738521
1 2023-01-10 738530

In the variable `ordinal`, the value `738521` means that `2023-01-01` is 738521 days
*after* the 1st of January of the year 1.

Datetime ordinals with Feature-engine
-------------------------------------
Datetime ordinal with Feature-engine
------------------------------------

:class:`DatetimeOrdinal()` automatically converts one or more datetime variables into ordinal numbers. It works with variables whose dtype is datetime, as well as with object-type variables, provided that they can be parsed into datetime format.
:class:`DatetimeOrdinal()` automatically converts one or more datetime variables into
ordinal numbers. It works with variables whose dtype is datetime, as well as with
object-type variables, provided that they can be parsed into datetime format.

:class:`DatetimeOrdinal()` uses pandas `toordinal()` under the hood. The main functionalities are:
:class:`DatetimeOrdinal()` uses pandas `toordinal()` under the hood. The main
functionalities are:

- It can convert multiple datetime variables at once.
- It can compute the ordinal number relative to a `start_date`.
Expand Down Expand Up @@ -82,12 +98,14 @@ We see the new ordinal feature in the output:
2 Jan-1999 3 733987
3 Feb-2002 4 737729

By default, :class:`DatetimeOrdinal()` drops the original datetime variable. To keep it, you can set `drop_original=False`.
By default, :class:`DatetimeOrdinal()` drops the original datetime variable. To keep
it, you can set `drop_original=False`.

Calculate days from a start date
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:class:`DatetimeOrdinal()` can also calculate the number of days elapsed since a specific `start_date`.
:class:`DatetimeOrdinal()` can also calculate the number of days elapsed since a
specific `start_date`.

.. code:: python

Expand All @@ -100,7 +118,8 @@ Calculate days from a start date

df_transf

The new feature now represents the number of days between `var_date2` and January 1st, 2010. Note that dates before the `start_date` will result in negative numbers.
The new feature now represents the number of days between `var_date2` and January 1st,
2010. Note that dates before the `start_date` will result in negative numbers.

.. code:: python

Expand All @@ -114,17 +133,14 @@ The new feature now represents the number of days between `var_date2` and Januar
Missing timestamps
------------------

:class:`DatetimeOrdinal()` handles missing values (NaT) in datetime variables through the `missing_values` parameter, which can be set to `"raise"` or `"ignore"`.

If `missing_values="raise"`, the transformer will raise an error if NaT values are found in the datetime variables during `fit()` or `transform()`.
:class:`DatetimeOrdinal()` handles missing values (NaT) in datetime variables through
the `missing_values` parameter, which can be set to `"raise"` or `"ignore"`.

If `missing_values="ignore"`, the transformer will ignore NaT values, and the resulting ordinal feature will contain `NaN` (or `pd.NA`) in their place.
If `missing_values="raise"`, the transformer will raise an error if NaT values are
found in the datetime variables during `fit()` or `transform()`.


.. autoclass:: DatetimeOrdinal
:members:
:undoc-members:
:show-inheritance:
If `missing_values="ignore"`, the transformer will ignore NaT values, and the resulting
ordinal feature will contain `NaN` (or `pd.NA`) in their place.


Additional resources
Expand All @@ -143,7 +159,7 @@ For tutorials on how to create and use features from datetime columns, check the
.. figure:: ../../images/fetsf.png
:width: 300
:figclass: align-center
:align: right
:align: left
:target: https://www.trainindata.com/p/feature-engineering-for-forecasting

Feature Engineering for Time Series Forecasting
Expand All @@ -158,6 +174,14 @@ For tutorials on how to create and use features from datetime columns, check the
|
|
|
|
|
|
|
|
|
|
|

Or read our book:

Expand Down
40 changes: 21 additions & 19 deletions feature_engine/datetime/datetime_ordinal.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,38 +40,39 @@ class DatetimeOrdinal(TransformerMixin, BaseEstimator, GetFeatureNamesOutMixin):
since January 1, 0001 in the Gregorian calendar.

Optionally, a `start_date` can be provided to set a custom reference point,
making the ordinal values relative to this date (starting from 1). This can be
useful for reducing the magnitude of the ordinal values and for aligning them
to a specific project timeline.
making the ordinal values relative to this date (starting from 1).

More details in the :ref:`User Guide <datetime_ordinal>`.

Parameters
----------
variables: str, list, default=None
List with the variables from which date and time information will be extracted.
If None, the transformer will find and select all datetime variables,
including variables of type object that can be converted to datetime.
List of the variables to convert into ordinal. If None, the transformer will
find and select all datetime variables, including variables of type object that
can be converted to datetime.

missing_values: string, default='raise'
Indicates if missing values should be ignored or raised. If 'raise' the
transformer will return an error if the datasets to `fit` or `transform`
transformer will return an error if the datasets passed to `fit` or `transform`
contain missing values. If 'ignore', missing data will be ignored when
performing the feature extraction.
performing the transformation.

start_date: str, datetime.datetime, default=None
A reference date from which the ordinal values will be calculated.
If provided, the ordinal value of `start_date` will be subtracted from
each datetime variable's ordinal value, and 1 will be added, so the
`start_date` itself corresponds to an ordinal value of 1.
If None, the standard `datetime.toordinal()` value will be used.
The `start_date` can be a string (e.g., "YYYY-MM-DD") or a datetime object.
If provided, the ordinal value of `start_date` will be 1, the day after will be
2, and so on. Days before `start_date` will take negative values.
If None, the transformation will represent the number of days since
January 1, 0001. `start_date` can be a string (e.g., "YYYY-MM-DD")
or a datetime object.

drop_original: bool, default=True
If True, the original datetime variables will be dropped from the dataframe.
If True, the original datetime variables will be dropped from the dataframe
after the transformation.

Attributes
----------
variables_:
List of variables from which date and time features will be extracted.
List of variables to convert into ordinals.

start_date_ordinal_:
The ordinal value of the provided `start_date`, if applicable.
Expand All @@ -91,8 +92,8 @@ class DatetimeOrdinal(TransformerMixin, BaseEstimator, GetFeatureNamesOutMixin):

See also
--------
pandas.to_datetime
datetime.toordinal
feature_engine.datetime.DatetimeFeatures
feature_engine.datetime.DatetimeSubtraction

Examples
--------
Expand Down Expand Up @@ -186,7 +187,8 @@ def fit(self, X: pd.DataFrame, y: Optional[pd.Series] = None):

def transform(self, X: pd.DataFrame) -> pd.DataFrame:
"""
Extract the ordinal datetime features and add them to the dataframe.
Calculate ordinal representation of datetime features and add them to the
dataframe.

Parameters
----------
Expand All @@ -196,7 +198,7 @@ def transform(self, X: pd.DataFrame) -> pd.DataFrame:
Returns
-------
X_new: Pandas dataframe, shape = [n_samples, n_features x n_df_features]
The dataframe with the original variables plus the new variables.
The dataframe with the original variables plus the new features.
"""

# Check method fit has been called
Expand Down