# Feature Backfill for Google Trends
**Goal of this notebook**

This notebook will backfill the feature groups containing google trends data and the flight data
* Supports backfill
* Produces daily features
* Is point-in-time correct
* Uploads features to Hopsworks Feature Store

**Imports & setup**

In [1]:
# Core
import pandas as pd
import numpy as np
from datetime import datetime, timedelta, date
import asyncio

# Google Trends
from pytrends.request import TrendReq

# Hopsworks
import hopsworks


## Google Trends Feature Backfilll

### Feature Config

In [2]:
# Search terms used as predictors
KEYWORDS = [
    "vikings",
    "fika",
    "stockholm",
    "ikea",
    "abba"
]

# Country code for Sweden
COUNTRY = "SE"

### Backfill Dates

In [3]:
# Backfill range (used only in backfill mode)
start_date = date(2019, 12, 31)
end_date   = date(2025, 12, 27)

print(f"Running feature pipeline from {start_date} to {end_date}")

Running feature pipeline from 2019-12-31 to 2025-12-27


### Fetch Google Trends Data

In [4]:
def fetch_google_trends_daily(keywords, start_date, end_date):
    pytrends = TrendReq(hl="en-US", tz=360)
    all_data = []

    start_date = pd.to_datetime(start_date)
    end_date = pd.to_datetime(end_date)

    while start_date < end_date:
        window_end = min(start_date + pd.Timedelta(days=89), end_date)

        pytrends.build_payload(
            kw_list=keywords,
            timeframe=f"{start_date:%Y-%m-%d} {window_end:%Y-%m-%d}",
            geo=COUNTRY
        )

        df = pytrends.interest_over_time()
        if not df.empty:
            df = df[~df["isPartial"]]
            all_data.append(df)

        start_date = window_end + pd.Timedelta(days=1)

    return pd.concat(all_data)

### Clean and resample to Daily Data

In [5]:
# Fetch raw data
raw_trends = fetch_google_trends_daily(KEYWORDS, start_date, end_date)

# Remove partial rows (important!)
raw_trends = raw_trends[raw_trends["isPartial"] == False]

# Drop metadata column
raw_trends = raw_trends.drop(columns=["isPartial"])

# Convert to daily frequency using forward-fill
daily_trends = (
    raw_trends
    .resample("D")
    .ffill()
    .reset_index()
)

# Ensure proper datetime + ordering BEFORE rolling
daily_trends["date"] = pd.to_datetime(daily_trends["date"], errors="coerce").dt.normalize()
daily_trends = daily_trends.sort_values("date")

daily_trends


TooManyRequestsError: The request failed: Google returned a response with code 429

In [6]:
features_df['city'] = "Märsta"

missing_days = pd.to_datetime([
    '2023-12-02',
    '2023-12-03',
    '2023-12-04'
])

features_df = features_df[~features_df['date'].isin(missing_days)].copy()

In [18]:
features_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2179 entries, 7 to 2188
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   date              2179 non-null   datetime64[ns]
 1   vikings           2179 non-null   int64         
 2   fika              2179 non-null   int64         
 3   stockholm         2179 non-null   int64         
 4   ikea              2179 non-null   int64         
 5   abba              2179 non-null   int64         
 6   vikings_7d_avg    2179 non-null   float64       
 7   fika_7d_avg       2179 non-null   float64       
 8   stockholm_7d_avg  2179 non-null   float64       
 9   ikea_7d_avg       2179 non-null   float64       
 10  abba_7d_avg       2179 non-null   float64       
 11  city              2179 non-null   object        
dtypes: datetime64[ns](1), float64(5), int64(5), object(1)
memory usage: 221.3+ KB


### Connect to Hopsworks

In [8]:
project = hopsworks.login()
fs = project.get_feature_store()

2026-01-06 09:26:51,296 INFO: Initializing external client
2026-01-06 09:26:51,298 INFO: Base URL: https://c.app.hopsworks.ai:443
2026-01-06 09:26:53,142 INFO: Python Engine initialized.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/1286325


### Create/get Feature Group

In [10]:
feature_group = fs.get_or_create_feature_group(
    name="google_trends_daily",
    version=1,
    primary_key=['city'],
    event_time="date",
    description="Daily Google Trends features for Sweden tourism prediction",
    online_enabled=False
)

### Write features to Hopsworks

In [11]:
feature_group.insert(
    features_df,
    write_options={"wait_for_job": True}
)

Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/1286325/fs/1265794/fg/1893830


Uploading Dataframe: 100.00% |█| Rows 2179/2179 | Elapsed Time: 00:00 | Remainin


Launching job: google_trends_daily_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai:443/p/1286325/jobs/named/google_trends_daily_1_offline_fg_materialization/executions
2026-01-06 09:27:55,890 INFO: Waiting for execution to finish. Current state: SUBMITTED. Final status: UNDEFINED
2026-01-06 09:27:59,087 INFO: Waiting for execution to finish. Current state: RUNNING. Final status: UNDEFINED
2026-01-06 09:29:51,289 INFO: Waiting for log aggregation to finish.
2026-01-06 09:30:00,011 INFO: Execution finished successfully.


(Job('google_trends_daily_1_offline_fg_materialization', 'SPARK'), None)

In [None]:
"""
Tog bort feature engineeringen nu för vet inte var den ska ligga någonstans. Om den ligger här måste man också köra den i daily feature pipeline. 
Och i daily feature pipeline behövs data från tidigare dagar för att beräkna dom ny fieldsen. 

features_df = daily_trends.copy()

for kw in KEYWORDS:
    # Rolling averages
    for window in ROLLING_WINDOWS:
        features_df[f"{kw}_{window}d_avg"] = (
            features_df[kw]
            .rolling(window=window, min_periods=1)
            .mean()
        )

    # Weekly change
    features_df[f"{kw}_7d_delta"] = (
        features_df[kw] - features_df[kw].shift(7)
    )

features_df.head()

"""