<a href="https://colab.research.google.com/github/rvanbruggen/rix-hopsworks-demos/blob/main/Beerconsumption_Colab/1_beervolume_feature_backfill.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# BeerVolume Prediction Demo
This is a short demonstration of how to use Hopsworks to build a machine learning system that generates predictions. The hypothetical use case is a bar owner who wants to predict how much beer will be consumed in their bar, based on past trends and behaviour.

![](https://lh3.googleusercontent.com/blogger_img_proxy/ALY8t1uqu0YUTdfoFJYGV2r9a2iHEewpP3daVa9J3qzCzV3rZm8EX8YyhHhOHbfG450AhHYQXu6Hgf8pj2fTpSzg4uio4X_qv9TTEfMnEtO6rYLevnGBxF6sO97tGeYyzaAkGSyVBnw8WtWS1P_2RLY=s0-d)

## <span style="color:#ff5f27">📝 Code Library Imports </span>
Before we can get started with our machine learning pipelines, we have to import a number of libraries.

### Colab / Gdrive integration


In [None]:
from google.colab import drive
drive.mount('/content/drive')
!cp '/content/drive/MyDrive/Colab Notebooks/Beervolume predictions/beervolume.py' .
!cp '/content/drive/MyDrive/Colab Notebooks/Beervolume predictions/averages.py' .
!cp '/content/drive/MyDrive/Colab Notebooks/Beervolume predictions/functions.py' .

!pip install -U hopsworks --quiet

In [None]:
import datetime
from beervolume import generate_historical_data, to_wide_format, plot_historical_id
from averages import calculate_second_order_features

import great_expectations as ge
from great_expectations.core import ExpectationSuite, ExpectationConfiguration

import warnings
warnings.filterwarnings('ignore')

## <span style="color:#ff5f27">⚙️ Beer Consumption Data Import </span>

Let's define the `START_DATE` variable (format: `%Y-%m-%d`), which indicates the start date for data generation.
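As a small illustration of the `%Y-%m-%d` format, the sketch below parses a date string into a `datetime.date` and formats it back. The string literal and the variable names are illustrative only and are not part of the notebook.

```python
import datetime

# Hypothetical input string in %Y-%m-%d format (year-month-day)
start_date_str = "2022-09-01"

# Parse the string into a datetime, then keep only the date part
parsed_start_date = datetime.datetime.strptime(start_date_str, "%Y-%m-%d").date()

# Format the date back into a %Y-%m-%d string
round_trip = parsed_start_date.strftime("%Y-%m-%d")

print(parsed_start_date, round_trip)
```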

In [None]:
# Define a constant START_DATE with a specific date (September 1, 2022)
START_DATE = datetime.date(2022, 9, 1)

In [None]:
# Generate synthetic historical data using the generate_historical_data function from START_DATE till current date
data_generated = generate_historical_data(
    START_DATE,  # Start date for data generation (September 1, 2022)
)

# Display the first 3 rows of the generated data
data_generated.head(3)

Plot the historical values for IDs 1 and 2.

In [None]:
plot_historical_id([1,2], data_generated)

## <span style="color:#ff5f27"> 👮🏻‍♂️ Great Expectations </span>

In [None]:
# Convert the generated historical data DataFrame to a Great Expectations DataFrame
ge_beervolume_df = ge.from_pandas(data_generated)

# Retrieve the expectation suite associated with the ge DataFrame
expectation_suite_beervolume = ge_beervolume_df.get_expectation_suite()

# Set the expectation suite name to "beervolume_suite"
expectation_suite_beervolume.expectation_suite_name = "beervolume_suite"

In [None]:
# Add expectation for the 'id' column values to be between 0 and 5000
expectation_suite_beervolume.add_expectation(
    ExpectationConfiguration(
        expectation_type="expect_column_values_to_be_between",
        kwargs={
            "column": "id",
            "min_value": 0,
            "max_value": 5000,
        }
    )
)

# Add expectation for the 'beervolume' column values to be between 0 and 1000
expectation_suite_beervolume.add_expectation(
    ExpectationConfiguration(
        expectation_type="expect_column_values_to_be_between",
        kwargs={
            "column": "beervolume",
            "min_value": 0,
            "max_value": 1000,
        }
    )
)

# Loop through the 'date', 'id' and 'beervolume' columns and expect no null values in any of them
# (an "expect_column_values_to_be_null" check with mostly=0.0 would always pass)
for column in ['date', 'id', 'beervolume']:
    expectation_suite_beervolume.add_expectation(
        ExpectationConfiguration(
            expectation_type="expect_column_values_to_not_be_null",
            kwargs={
                "column": column,
            }
        )
    )
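For intuition, the kind of checks this suite deals with (range bounds on `id` and `beervolume`, plus missing values per column) can be approximated in plain pandas. This is only a sketch on a tiny hand-made DataFrame and does not use the Great Expectations API. The helper `check_beervolume_frame` and the sample rows are hypothetical, and the column names are assumed to match `data_generated`.

```python
import pandas as pd

def check_beervolume_frame(df: pd.DataFrame) -> dict:
    """Return simple results for range and missing-value checks (hypothetical helper)."""
    null_counts = df[["date", "id", "beervolume"]].isna().sum()
    return {
        # Series.between is inclusive on both ends by default
        "id_in_range": bool(df["id"].between(0, 5000).all()),
        "beervolume_in_range": bool(df["beervolume"].between(0, 1000).all()),
        "null_counts": {col: int(n) for col, n in null_counts.items()},
    }

# Hand-made sample rows; the last beervolume value is deliberately out of range
sample_df = pd.DataFrame({
    "date": pd.to_datetime(["2022-09-01", "2022-09-02", "2022-09-03"]),
    "id": [1, 1, 2],
    "beervolume": [120.5, 98.0, 1500.0],
})

report = check_beervolume_frame(sample_df)
print(report)
```

Attaching the suite to the feature group, as done later in the notebook, is what lets these checks be evaluated on inserts rather than by hand.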


## <span style="color:#ff5f27">🔮 Connect to Hopsworks Feature Store </span>

In [None]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()

## <span style="color:#ff5f27">🪄 Feature Group Creation </span>

In [None]:
# Get or create the 'beervolume' feature group
beervolume_fg = fs.get_or_create_feature_group(
    name='beervolume',
    description='Beer Volume Consumption Data',
    version=1,
    primary_key=['id'],
    event_time='date',
    online_enabled=True,
    expectation_suite=expectation_suite_beervolume,
)
# Insert data
beervolume_fg.insert(data_generated)

## <span style="color:#ff5f27">⚙️ Feature Engineering  </span>

We will engineer the following features:

- `ma_7`: This feature represents the 7-day moving average of the 'beervolume' data, providing a smoothed representation of short-term beervolume trends.

- `ma_14`: This feature represents the 14-day moving average of the 'beervolume' data, offering a slightly longer-term smoothed beervolume trend.

- `ma_30`: This feature represents the 30-day moving average of the 'beervolume' data, providing a longer-term smoothed representation of beervolume trends.

- `daily_rate_of_change`: This feature calculates the daily rate of change in beer volumes as a percentage change, indicating how much the beer volume has changed from the previous day.

- `volatility_30_day`: This feature measures the volatility of beer volume over a 30-day window using the standard deviation. Higher values indicate greater beer volume fluctuations.

- `ema_02`: This feature calculates the exponential moving average (EMA) of 'beervolume' with a smoothing factor of 0.2, giving more weight to recent data points in the calculation.

- `ema_05`: Similar to `ema_02`, this feature calculates the EMA of 'beervolume' with a smoothing factor of 0.5, which makes it respond more quickly to recent data.

- `rsi`: The Relative Strength Index (RSI) is a momentum oscillator, borrowed from financial analysis, that compares the size of recent increases in beer volume with the size of recent decreases. It ranges from 0 to 100; values above 70 suggest that volume has mostly been rising over the lookback window, and values below 30 suggest that it has mostly been falling.
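The features listed above can be sketched with pandas as follows. This is a hypothetical stand-in, not the actual `calculate_second_order_features` from `averages.py`, whose implementation is not shown here. The function name `sketch_second_order_features`, the 14-day RSI window, the simple-moving-average form of RSI, and the synthetic sample data are all assumptions made for illustration.

```python
import pandas as pd

def sketch_second_order_features(df: pd.DataFrame, rsi_window: int = 14) -> pd.DataFrame:
    """Hypothetical sketch: compute the listed features per id, ordered by date."""
    out = df.sort_values(["id", "date"]).copy()
    ids = out["id"]
    grouped = out["beervolume"].groupby(ids)

    # Simple moving averages over 7, 14 and 30 days
    for window in (7, 14, 30):
        out[f"ma_{window}"] = grouped.transform(lambda s, w=window: s.rolling(w).mean())

    # Percentage change from the previous day within each id
    out["daily_rate_of_change"] = grouped.diff() / grouped.shift() * 100

    # 30-day rolling standard deviation
    out["volatility_30_day"] = grouped.transform(lambda s: s.rolling(30).std())

    # Exponential moving averages with smoothing factors 0.2 and 0.5
    out["ema_02"] = grouped.transform(lambda s: s.ewm(alpha=0.2, adjust=False).mean())
    out["ema_05"] = grouped.transform(lambda s: s.ewm(alpha=0.5, adjust=False).mean())

    # RSI using simple rolling means of gains and losses (assumed variant)
    delta = grouped.diff()
    gain = delta.clip(lower=0)
    loss = (-delta).clip(lower=0)
    avg_gain = gain.groupby(ids).transform(lambda s: s.rolling(rsi_window).mean())
    avg_loss = loss.groupby(ids).transform(lambda s: s.rolling(rsi_window).mean())
    out["rsi"] = 100 - 100 / (1 + avg_gain / avg_loss)
    return out

# Synthetic sample: id 1 rises steadily, id 2 follows a repeating 5-day pattern
dates = pd.date_range("2022-09-01", periods=40, freq="D")
sample_df = pd.DataFrame({
    "date": list(dates) * 2,
    "id": [1] * 40 + [2] * 40,
    "beervolume": [100.0 + i for i in range(40)] + [200.0 - 2 * (i % 5) for i in range(40)],
})

features_df = sketch_second_order_features(sample_df)
print(features_df.tail(3))
```

Grouping by `id` before applying each window keeps one bar's history from leaking into another's; the first rows of each group are `NaN` until the window fills.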

In [None]:
# Read the beer volume data from the 'beervolume' feature group
beervolume_df = beervolume_fg.read()
beervolume_df.head(3)

In [None]:
# Calculate second-order features
beervolume_averages_df = calculate_second_order_features(beervolume_df)

# Display the first 3 rows of the resulting DataFrame
beervolume_averages_df.head(3)

## <span style="color:#ff5f27">🪄 Feature Group Creation </span>

In [None]:
# Get or create the 'beervolume averages' feature group
beervolume_averages_fg = fs.get_or_create_feature_group(
    name='beervolume_averages',
    description='Calculated second order beervolume features',
    version=1,
    primary_key=['id'],
    event_time='date',
    online_enabled=True,
    parents=[beervolume_fg],
)
# Insert data
beervolume_averages_fg.insert(beervolume_averages_df)

---