# Feature Backfill for Google Trends
**Goal of this notebook**

This notebook backfills the feature groups that hold the Google Trends data and the flight data. The pipeline:
* Supports backfill
* Produces daily features
* Is point-in-time correct
* Uploads features to the Hopsworks Feature Store

**Imports & setup**

In [2]:
# Core
import pandas as pd
import numpy as np
from datetime import datetime, timedelta, date
import asyncio

# Google Trends
from pytrends.request import TrendReq

# Hopsworks
import hopsworks


## Google Trends Feature Backfill

### Feature Config

In [3]:
# Search terms used as predictors
KEYWORDS = [
    "vikings",
    "fika",
    "stockholm",
    "ikea",
    "abba"
]

# Country code for Sweden
COUNTRY = "SE"

### Backfill Dates

In [4]:
# Backfill range (used only in backfill mode)
start_date = date(2019, 12, 31)
end_date   = date(2025, 12, 29)

print(f"Running feature pipeline from {start_date} to {end_date}")

Running feature pipeline from 2019-12-31 to 2025-12-29


### Fetch Google Trends Data

In [6]:
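def fetch_google_trends_daily(keywords, start_date, end_date):
    """Fetch Google Trends interest for `keywords` in consecutive windows of at most 90 days."""
    pytrends = TrendReq(hl="en-US", tz=360)
    all_data = []

    start_date = pd.to_datetime(start_date)
    end_date = pd.to_datetime(end_date)

    while start_date < end_date:
        # Each window covers at most 90 days (start day plus 89 more)
        window_end = min(start_date + pd.Timedelta(days=89), end_date)

        pytrends.build_payload(
            kw_list=keywords,
            timeframe=f"{start_date:%Y-%m-%d} {window_end:%Y-%m-%d}",
            geo=COUNTRY
        )

        df = pytrends.interest_over_time()
        if not df.empty:
            # Drop rows that Google flags as incomplete
            df = df[~df["isPartial"].astype(bool)]
            all_data.append(df)

        # The next window starts the day after this one ends, so windows never overlap
        start_date = window_end + pd.Timedelta(days=1)

    # Fail with a clear message instead of pd.concat's generic error on an empty list
    if not all_data:
        raise ValueError("Google Trends returned no data for the requested range")

    return pd.concat(all_data)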
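
The windowing logic in `fetch_google_trends_daily` can be checked without calling Google at all. The following is a minimal sketch, assuming a hypothetical helper named `iter_windows` (it is not part of the notebook or of pytrends) that yields consecutive, non-overlapping windows of at most 90 days. Unlike the loop above, it uses `<=` so that a trailing single-day window is not dropped.

```python
from datetime import date, timedelta


def iter_windows(start, end, max_days=90):
    """Yield (window_start, window_end) pairs covering [start, end] inclusive.

    Hypothetical helper for illustration only.
    """
    current = start
    while current <= end:
        # A window holds at most max_days days, counting both endpoints
        window_end = min(current + timedelta(days=max_days - 1), end)
        yield current, window_end
        # Continue on the day after the window ends (no overlap, no gap)
        current = window_end + timedelta(days=1)


windows = list(iter_windows(date(2019, 12, 31), date(2025, 12, 29)))
print(len(windows), windows[0], windows[-1])
```

One thing to keep in mind when concatenating the chunks: Google Trends scales each request to its own 0 to 100 range, so values from different windows are not directly comparable unless they are rescaled.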

### Clean and resample to Daily Data

In [8]:
# Fetch raw data
raw_trends = fetch_google_trends_daily(KEYWORDS, start_date, end_date)

# Remove partial rows (important!). The fetch function already filters them
# per window, so this acts as a safety net.
raw_trends = raw_trends[~raw_trends["isPartial"].astype(bool)]

# Drop metadata column
raw_trends = raw_trends.drop(columns=["isPartial"])

# Convert to daily frequency using forward-fill
daily_trends = (
    raw_trends
    .resample("D")
    .ffill()
    .reset_index()
)

daily_trends["date"] = pd.to_datetime(daily_trends["date"]).dt.date

daily_trends.head()

TooManyRequestsError: The request failed: Google returned a response with code 429
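
The 429 response above means Google rate-limited the requests; a backfill over several years issues one request per window in quick succession. One common mitigation is to retry with exponential backoff. The sketch below assumes a hypothetical `call_with_backoff` helper (not part of pytrends) and uses a stand-in function so it runs without network access. The delay values are placeholders, not tested limits.

```python
import time


def call_with_backoff(func, max_retries=5, base_delay=60.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a retryable error wait base_delay * 2**attempt, then retry.

    Hypothetical helper for illustration. Re-raises the last error once
    max_retries attempts have failed.
    """
    for attempt in range(max_retries):
        try:
            return func()
        except retry_on:
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Stand-in for a rate-limited request: fails twice, then succeeds.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("simulated 429")
    return "ok"


waits = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky, base_delay=1.0, sleep=waits.append)
print(result, waits)
```

In the notebook, `func` would wrap the `build_payload` and `interest_over_time` calls for one window, and `retry_on` would be narrowed to the rate-limit exception shown in the traceback (assuming it can be imported from `pytrends.exceptions` in the installed version).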

### Connect to Hopsworks

In [9]:
project = hopsworks.login()
fs = project.get_feature_store()

2025-12-31 12:36:55,802 INFO: Initializing external client
2025-12-31 12:36:55,804 INFO: Base URL: https://c.app.hopsworks.ai:443




To ensure compatibility please install the latest bug fix release matching the minor version of your backend (4.2) by running 'pip install hopsworks==4.2.*'


2025-12-31 12:36:57,857 INFO: Python Engine initialized.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/1296539


### Create/get Feature Group

In [12]:
feature_group = fs.get_or_create_feature_group(
    name="google_trends_daily",
    version=1,
    primary_key=["date"],
    description="Daily Google Trends features for Sweden tourism prediction",
    online_enabled=False
)

### Write features to Hopsworks

In [13]:
feature_group.insert(
    daily_trends,
    write_options={"wait_for_job": True}
)

NameError: name 'daily_trends' is not defined
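
The `NameError` above is a knock-on effect of the failed fetch cell: `daily_trends` was never created, so there was nothing to insert. A small guard before the insert makes this kind of failure explicit. This is a minimal sketch, assuming a hypothetical `validate_daily_frame` helper; the checks mirror the feature group definition, where `date` is the primary key.

```python
import pandas as pd


def validate_daily_frame(df, key="date"):
    """Raise ValueError if df is unfit to insert; otherwise return df unchanged.

    Hypothetical helper for illustration only.
    """
    if df.empty:
        raise ValueError("no rows to insert")
    if key not in df.columns:
        raise ValueError(f"missing primary key column: {key}")
    if df[key].isna().any():
        raise ValueError(f"null values in {key}")
    if df[key].duplicated().any():
        raise ValueError(f"duplicate values in {key}")
    return df


# Small made-up frame with the same shape as daily_trends
sample = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=3, freq="D").date,
    "ikea": [40, 42, 41],
})
checked = validate_daily_frame(sample)
print(len(checked))
```

In the notebook this would be called as `feature_group.insert(validate_daily_frame(daily_trends), ...)`, so a broken upstream cell stops the run before anything is written.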

In [None]:
"""
Removed the feature engineering for now because I am not sure where it should live. If it lives here, it also has to run in the daily feature pipeline.
And the daily feature pipeline needs data from previous days to compute the new fields.

features_df = daily_trends.copy()

for kw in KEYWORDS:
    # Rolling averages
    for window in ROLLING_WINDOWS:
        features_df[f"{kw}_{window}d_avg"] = (
            features_df[kw]
            .rolling(window=window, min_periods=1)
            .mean()
        )

    # Weekly change
    features_df[f"{kw}_7d_delta"] = (
        features_df[kw] - features_df[kw].shift(7)
    )

features_df.head()

"""
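
The note above raises a real constraint: rolling averages and the 7-day delta for a new day depend on earlier days. One way to share the logic between backfill and the daily pipeline is a function that takes a lookback frame plus the new rows and returns features for the new rows only. This is a sketch under assumptions: `add_rolling_features` is a hypothetical name, and the `ROLLING_WINDOWS` values are placeholders because the notebook does not define them.

```python
import pandas as pd

ROLLING_WINDOWS = (7, 30)  # assumed values; not defined in the notebook


def add_rolling_features(history, new_rows, keywords, windows=ROLLING_WINDOWS):
    """Compute rolling features for new_rows, using history as lookback context.

    Both frames need a 'date' column and one column per keyword.
    """
    combined = (
        pd.concat([history, new_rows], ignore_index=True)
        .sort_values("date")
        .reset_index(drop=True)
    )
    for kw in keywords:
        # Rolling averages over trailing windows (past and current rows only)
        for window in windows:
            combined[f"{kw}_{window}d_avg"] = (
                combined[kw].rolling(window=window, min_periods=1).mean()
            )
        # Weekly change
        combined[f"{kw}_7d_delta"] = combined[kw] - combined[kw].shift(7)
    # History was context only; return just the new rows
    is_new = combined["date"].isin(new_rows["date"])
    return combined[is_new].reset_index(drop=True)


# Made-up data: 30 days of history plus one new day
history = pd.DataFrame({
    "date": pd.date_range("2025-01-01", periods=30, freq="D"),
    "ikea": list(range(30)),
})
today = pd.DataFrame({"date": [pd.Timestamp("2025-01-31")], "ikea": [30]})
features = add_rolling_features(history, today, ["ikea"])
print(features)
```

For the results to match a full backfill, the lookback frame needs at least `max(windows) - 1` prior days, and at least 7 for the delta. In the daily pipeline that history would have to be read back from the feature group before computing the new row.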