## <span style="color:#ff5f27">📝 Imports </span>

In [1]:
import datetime
from features.price import generate_historical_data, to_wide_format, plot_historical_id
from features.averages import calculate_second_order_features

import great_expectations as ge
from great_expectations.core import ExpectationSuite, ExpectationConfiguration

import warnings
warnings.filterwarnings('ignore')

## <span style="color:#ff5f27">⚙️ Data Generation </span>

Let's define the `START_DATE` variable (format: %Y-%m-%d) which will indicate the start date for data generation.

In [2]:
# Define a constant START_DATE with a specific date (September 1, 2023)
START_DATE = datetime.date(2023, 9, 1)
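
If the start date arrives as a string in the `%Y-%m-%d` format mentioned above (for example, from a configuration file), it can be parsed into a `datetime.date` first. A minimal sketch; the `start_date_str` and `parsed_start_date` names are hypothetical and not part of the notebook:

```python
import datetime

# Hypothetical input string in the %Y-%m-%d format
start_date_str = "2023-09-01"

# Parse the string, then keep only the date part
parsed_start_date = datetime.datetime.strptime(start_date_str, "%Y-%m-%d").date()

# The parsed value equals the date constructed directly
print(parsed_start_date == datetime.date(2023, 9, 1))
```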

In [3]:
# Generate synthetic historical data from START_DATE until the current date
data_generated = generate_historical_data(
    START_DATE,  # Start date for data generation (September 1, 2023)
)

# Display the first 3 rows of the generated data
data_generated.head(3)

Generating Data: 100%|██████████| 313/313 [00:13<00:00, 23.22it/s]


|   | date       | id   | price |
|---|------------|------|-------|
| 0 | 2023-09-01 | 4941 | 200.0 |
| 1 | 2023-09-01 | 300  | 200.0 |
| 2 | 2023-09-01 | 4622 | 200.0 |


Plot the historical values for IDs 1 and 2.

In [4]:
plot_historical_id([1,2], data_generated)

## <span style="color:#ff5f27"> 👮🏻‍♂️ Great Expectations </span>

In [5]:
# Convert the generated historical data DataFrame to a Great Expectations DataFrame
ge_price_df = ge.from_pandas(data_generated)

# Retrieve the expectation suite associated with the ge DataFrame
expectation_suite_price = ge_price_df.get_expectation_suite()

# Set the expectation suite name to "price_suite"
expectation_suite_price.expectation_suite_name = "price_suite"

In [6]:
# Add expectation for the 'id' column values to be between 0 and 5000
expectation_suite_price.add_expectation(
    ExpectationConfiguration(
        expectation_type="expect_column_values_to_be_between",
        kwargs={
            "column": "id",
            "min_value": 0,
            "max_value": 5000,
        }
    )
)

# Add expectation for the 'price' column values to be between 0 and 1000
expectation_suite_price.add_expectation(
    ExpectationConfiguration(
        expectation_type="expect_column_values_to_be_between",
        kwargs={
            "column": "price",
            "min_value": 0,
            "max_value": 1000,
        }
    )
)

# Loop through the 'date', 'id' and 'price' columns and require that they contain no null values
for column in ['date', 'id', 'price']:
    expectation_suite_price.add_expectation(
        ExpectationConfiguration(
            expectation_type="expect_column_values_to_not_be_null",
            kwargs={
                "column": column,
            }
        )
    )
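
To make the checks above concrete, here is a plain-Python sketch of what a range expectation and a null check measure. It is not how Great Expectations implements them; the helper names and sample values are hypothetical. In Great Expectations, the optional `mostly` argument sets the minimum fraction of values that must satisfy an expectation for it to succeed.

```python
from typing import Optional, Sequence


def fraction_between(
    values: Sequence[Optional[float]], min_value: float, max_value: float
) -> float:
    """Share of non-null values inside the inclusive [min_value, max_value] range."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 1.0  # no values, so nothing violates the range
    in_range = sum(1 for v in non_null if min_value <= v <= max_value)
    return in_range / len(non_null)


def fraction_null(values: Sequence[Optional[float]]) -> float:
    """Share of values that are null (None)."""
    if not values:
        return 0.0
    return sum(1 for v in values if v is None) / len(values)


# Hypothetical sample columns (not taken from the generated data)
ids = [4941, 300, 4622]
prices = [200.0, 200.0, None, 1500.0]

print(fraction_between(ids, 0, 5000))     # every id is within range
print(fraction_between(prices, 0, 1000))  # one non-null price is out of range
print(fraction_null(prices))              # one of four prices is missing
```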


## <span style="color:#ff5f27">🔮 Connect to Hopsworks Feature Store </span>

In [7]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store() 

Connected. Call `.close()` to terminate connection gracefully.

Multiple projects found. 

	 (1) Car_Prices
	 (2) rixdemo
	 (3) GraphEmbeddingsDemo
	 (4) BeerVolumePrediction

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/189590
Connected. Call `.close()` to terminate connection gracefully.


## <span style="color:#ff5f27">🪄 Feature Group Creation </span>

In [8]:
# Get or create the 'price' feature group
price_fg = fs.get_or_create_feature_group(
    name='price',
    description='Price Data',
    version=1,
    primary_key=['id'],
    event_time='date',
    online_enabled=True,
    expectation_suite=expectation_suite_price,
)    
# Insert data
price_fg.insert(data_generated)

Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/189590/fs/189509/fg/982387
Validation succeeded.
Validation Report saved successfully, explore a summary at https://c.app.hopsworks.ai:443/p/189590/fs/189509/fg/982387


Uploading Dataframe: 0.00% |          | Rows 0/1541738 | Elapsed Time: 00:00 | Remaining Time: ?

Launching job: price_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/189590/jobs/named/price_1_offline_fg_materialization/executions


(<hsfs.core.job.Job at 0x306c8f490>,
 {
   "results": [
     {
       "meta": {
         "ingestionResult": "INGESTED",
         "validationTime": "2024-07-09T09:05:24.000441Z"
       },
       "exception_info": {
         "raised_exception": false,
         "exception_message": null,
         "exception_traceback": null
       },
       "result": {
         "element_count": 1541738,
         "missing_count": 0,
         "missing_percent": 0.0,
         "unexpected_count": 0,
         "unexpected_percent": 0.0,
         "unexpected_percent_total": 0.0,
         "unexpected_percent_nonmissing": 0.0,
         "partial_unexpected_list": []
       },
       "success": true,
       "expectation_config": {
         "meta": {
           "expectationId": 585732
         },
         "expectation_type": "expect_column_values_to_be_between",
         "kwargs": {
           "column": "id",
           "min_value": 0,
           "max_value": 5000
         }
       }
     },
     {
       "meta": {
 

## <span style="color:#ff5f27">⚙️ Feature Engineering  </span>

We will engineer the following features:

- `ma_7`: This feature represents the 7-day moving average of the 'price' data, providing a smoothed representation of short-term price trends.

- `ma_14`: This feature represents the 14-day moving average of the 'price' data, offering a slightly longer-term smoothed price trend.

- `ma_30`: This feature represents the 30-day moving average of the 'price' data, providing a longer-term smoothed representation of price trends.

- `daily_rate_of_change`: This feature calculates the daily rate of change in prices as a percentage change, indicating how much the price has changed from the previous day.

- `volatility_30_day`: This feature measures the volatility of prices over a 30-day window using the standard deviation. Higher values indicate greater price fluctuations.

- `ema_02`: This feature calculates the exponential moving average (EMA) of 'price' with a smoothing factor of 0.2, giving more weight to recent data points in the calculation.

- `ema_05`: Similar to `ema_02`, this feature calculates the EMA of 'price' with a smoothing factor of 0.5, providing a different degree of responsiveness to recent data.

- `rsi`: The Relative Strength Index (RSI) is a momentum oscillator that measures the speed and change of price movements. It ranges from 0 to 100, with values above 70 indicating overbought conditions and values below 30 indicating oversold conditions.
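
The features listed above can be illustrated with a dependency-free sketch for a single price series. This is not the implementation of `calculate_second_order_features`, whose details are not shown here. It assumes a recursive EMA seeded with the first price and a simple-average RSI variant, and all function names are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Optional


def moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until a full window is available."""
    return [
        mean(prices[i - window + 1 : i + 1]) if i + 1 >= window else None
        for i in range(len(prices))
    ]


def rate_of_change(prices: List[float]) -> List[Optional[float]]:
    """Percentage change from the previous value (assumes non-zero prices)."""
    if not prices:
        return []
    return [None] + [(cur - prev) / prev * 100 for prev, cur in zip(prices, prices[1:])]


def rolling_volatility(prices: List[float], window: int) -> List[Optional[float]]:
    """Sample standard deviation over a rolling window (window must be >= 2)."""
    return [
        stdev(prices[i - window + 1 : i + 1]) if i + 1 >= window else None
        for i in range(len(prices))
    ]


def ema(prices: List[float], alpha: float) -> List[float]:
    """Recursive EMA: alpha * price + (1 - alpha) * previous EMA."""
    result: List[float] = []
    for price in prices:
        prev = result[-1] if result else price  # seed with the first price
        result.append(alpha * price + (1 - alpha) * prev)
    return result


def rsi(prices: List[float], window: int = 14) -> List[Optional[float]]:
    """RSI from simple averages of gains and losses over the last `window` changes."""
    changes = [cur - prev for prev, cur in zip(prices, prices[1:])]
    result: List[Optional[float]] = [None] * len(prices)
    for i in range(window, len(prices)):
        recent = changes[i - window : i]
        avg_gain = sum(c for c in recent if c > 0) / window
        avg_loss = sum(-c for c in recent if c < 0) / window
        if avg_loss == 0:
            result[i] = 100.0  # no losses in the window
        else:
            result[i] = 100 - 100 / (1 + avg_gain / avg_loss)
    return result


# Hypothetical prices for one id, ordered by date
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0]

print(moving_average(prices, 3))
print(rate_of_change(prices))
print(rolling_volatility(prices, 3))
print(ema(prices, 0.5))
print(rsi(prices, window=3))
```

In the notebook's data these calculations would be applied per `id`, with rows ordered by `date`; the short windows here only keep the example small.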

In [None]:
# Calculate second-order features
averages_df = calculate_second_order_features(data_generated)

# Display the first 3 rows of the resulting DataFrame
averages_df.head(3)

## <span style="color:#ff5f27">🪄 Feature Group Creation </span>

In [None]:
# Get or create the 'averages' feature group
averages_fg = fs.get_or_create_feature_group(
    name='averages',
    description='Calculated second order features',
    version=1,
    primary_key=['id'],
    event_time='date',
    online_enabled=True,
    parents=[price_fg],
)
# Insert data
averages_fg.insert(averages_df)

---