Pivot missing categories breaks FeatureSet/AggregatedFeatureSet
Summary
When defining a feature set, it's expected that `pivot` will produce all categories and that, as a consequence, the resulting `Source` dataframe will be suitable for transformation. When this is not the case, `FeatureSet` and `AggregatedFeatureSet` break.
Feature related:
Age: legacy
Estimated cost: investigation_needed
Type: documentation, coding and testing.
Description 📋
If we have a `pivot` transformation defined in a reader, it's straightforward to define the expected categories as features during `FeatureSet` or `AggregatedFeatureSet` instantiation. If, for some reason, not all categories are found in the resulting `Source` dataframe (this could happen if we use a smaller time window, for instance), then our feature set will break because it cannot find the expected column.
In order to illustrate what's happening, suppose we have the following resulting dataframe from the `Source`:
As a result, a possible `AggregatedFeatureSet` could be:
Now, if we take a different time window and, for some reason, there is no information regarding the `pool` amenity, we'd have a resulting `Source` dataframe like this:
Therefore, the `pool_amenity` feature would break, since there's no `pool` column anymore.
Impact 💣
We'll not be able to use the pivot operation for incremental loads (since we can't be sure that all categories will be available).
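
The scenario from the description can be reproduced without Spark. The sketch below is only an illustration in plain Python, not Butterfree code: the `pivot_amenities` helper, the row layout (`id`, `ts`, `amenity`, `value`) and the `gym` category are all hypothetical. It shows that the set of pivoted columns depends on which categories happen to appear in the selected time window.

```python
def pivot_amenities(rows):
    """Pivot (id, amenity, value) rows into one record per id.

    Hypothetical helper: columns are derived only from the categories
    that actually appear in `rows`, mimicking a pivot without a fixed
    list of expected values.
    """
    columns = sorted({row["amenity"] for row in rows})
    table = {}
    for row in rows:
        record = table.setdefault(row["id"], dict.fromkeys(columns))
        record[row["amenity"]] = row["value"]
    return columns, table


# Hypothetical source rows; "pool" only shows up at ts=1.
rows = [
    {"id": 1, "ts": 1, "amenity": "pool", "value": 1},
    {"id": 1, "ts": 2, "amenity": "gym", "value": 1},
    {"id": 2, "ts": 3, "amenity": "gym", "value": 1},
]

full_columns, _ = pivot_amenities(rows)

# A smaller (incremental) window that no longer contains any "pool" row.
recent_rows = [row for row in rows if row["ts"] >= 2]
recent_columns, _ = pivot_amenities(recent_rows)

print(full_columns)    # columns for the full window
print(recent_columns)  # columns for the smaller window
```

With the full window both categories become columns; with the smaller window the `pool` column is simply absent, which is the situation a feature depending on it cannot handle.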
Solution Hints
We could have a parameter for making a given feature `optional`. The expected behavior would then be the following: if the column that this feature depends on exists, we perform the transformations; otherwise, we simply treat the feature as `null` (we could raise a warning in these cases).
Observations 🤔
We should take care, when implementing this solution, to avoid hiding errors.
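
A minimal sketch of the behavior hinted at above, again in plain Python rather than the actual Butterfree API. The `compute_feature` function and its `optional` flag are assumptions made for illustration. To avoid hiding errors, a missing column still raises by default, and the null fallback only applies when the feature is explicitly marked optional, in which case a warning is emitted.

```python
import warnings


def compute_feature(rows, name, source_column, transform, optional=False):
    """Return one value per row for feature `name`.

    Hypothetical sketch: `rows` is a list of dicts standing in for the
    Source dataframe, and `source_column` is the pivoted column the
    feature depends on.
    """
    has_column = any(source_column in row for row in rows)
    if has_column:
        return [transform(row.get(source_column)) for row in rows]
    if not optional:
        # Default stays strict so genuine mistakes are not silenced.
        raise KeyError(
            f"feature {name!r} depends on missing column {source_column!r}"
        )
    warnings.warn(
        f"column {source_column!r} not found; feature {name!r} filled with None"
    )
    return [None for _ in rows]


# Hypothetical pivoted rows from a window with no "pool" category.
rows_without_pool = [{"id": 1, "gym": 1}, {"id": 2, "gym": 1}]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pool_amenity = compute_feature(
        rows_without_pool, "pool_amenity", "pool", float, optional=True
    )

try:
    compute_feature(rows_without_pool, "pool_amenity", "pool", float)
    strict_failed = False
except KeyError:
    strict_failed = True
```

Keeping `optional=False` as the default means existing feature sets keep failing loudly on a genuinely wrong column name, while incremental loads can opt in per feature.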