## Google Trends “Previous-Day” Rule

1. **Fetch window**  
   * **Once per day** (e.g. around 00:10 local time) we pull *hourly* Google Trends data for the **entire previous day** (00:00 – 23:59).  
   * We average the 24 hourly points and round the result to a single **integer in the range 0-100**.

2. **Missing data**  
   * If Google returns *no* values for that day (keyword too low to sample), we save **0**.  
   * `0` therefore means “search volume negligible”.

3. **Output (CSV)**  

   | date (d-1) | keyword | interest |
   |------------|---------|----------|
   | 2025-06-27 | Times Square | 32 |
   | … | … | … |

4. **Alignment rule**  
   * The `date` column is always the **previous day (d-1)**.  
   * When you build today’s (d) feature vector, treat `interest` as a **lag-1 feature**.  
     * **Online**: just read the value and drop it straight into today’s payload.  
     * **Offline merge**: `date += 1 day` before joining to today’s master table.
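
The offline merge in step 4 can be sketched with pandas as follows. This is a minimal illustration, not the project's actual pipeline: the `master` table, the `trend_prev_day` column name, and the choice to fill unmatched keywords with 0 are assumptions made for the example. The only value taken from the table above is the Times Square row.

```python
import pandas as pd

# Trends rows are stamped with the day they describe (d-1).
trends = pd.DataFrame({
    "date": pd.to_datetime(["2025-06-27"]),
    "keyword": ["Times Square"],
    "interest": [32],
})

# Hypothetical master table keyed by prediction day (d) and keyword.
master = pd.DataFrame({
    "date": pd.to_datetime(["2025-06-28", "2025-06-28"]),
    "keyword": ["Times Square", "Central Park"],
})

# Shift the trends date forward one day so it lines up with day d.
lagged = trends.assign(date=trends["date"] + pd.Timedelta(days=1))
lagged = lagged.rename(columns={"interest": "trend_prev_day"})

# Left join keeps every master row, even when no trends row matches.
features = master.merge(lagged, on=["date", "keyword"], how="left")

# Assumption: a keyword with no trends row is treated as 0.
# Keep the NaN instead if fetch failures should stay distinguishable.
features["trend_prev_day"] = features["trend_prev_day"].fillna(0).astype(int)
```

Shifting the trends table rather than the master table keeps the master's `date` column meaning "prediction day" throughout, which makes the lag explicit in the feature name instead of in the join key.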

---
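
Steps 1 and 2 above (average the hourly points, fall back to 0 when nothing is returned) reduce to a small pure function. The sketch below is illustrative only; `daily_interest` is a hypothetical helper name, and the sample hourly values are made up to produce the 32 shown in the table.

```python
import math

import pandas as pd


def daily_interest(hourly: pd.Series) -> int:
    """Collapse one day's hourly scores into a single integer (hypothetical helper)."""
    if hourly.empty:
        return 0  # no samples at all: treat as negligible volume
    mean = hourly.mean()  # NaN entries are skipped by default
    if math.isnan(mean):
        return 0  # every hourly point was missing
    # Note: Python's round() rounds halves to the nearest even integer.
    return int(round(mean))


# Made-up sample: 12 hours at 30 and 12 hours at 34 average to 32.
sample_day = pd.Series([30] * 12 + [34] * 12)
score = daily_interest(sample_day)

# A day with no returned values collapses to 0.
empty_score = daily_interest(pd.Series([], dtype=float))
```

Keeping the rule in one function makes it easy to unit-test the edge cases (empty input, all-NaN input) separately from the network call.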

### API example 
```python
from google_trends_fetch import fetch_prev_day_trends

zone_keywords = ["Times Square", "Central Park"]  # ... plus the remaining zone keywords
df_trend = fetch_prev_day_trends(zone_keywords)   # DataFrame as above

# ---- Real-time prediction sample ----
today_features = {
    "predict_date": "2025-06-28",
    "trend_prev_day": int(
        # .iloc[0] extracts the scalar; calling int() on a Series is not supported
        df_trend.loc[df_trend["keyword"] == "Times Square", "interest"].iloc[0]
    ),
    # other features ...
}

pred = model.predict(today_features)
```

### Fetch script (`google_trends_fetch.py`)

```python
"""
Fetch the previous day (00:00–23:59 local) average hourly Google Trends score
for a list of keywords.  Missing or below‑threshold values are treated as 0.
Outputs a CSV named e.g. google_trends_prev_day_20250628.csv.
"""

import math
import random
import time
from datetime import datetime, timedelta

import pandas as pd
from pytrends.request import TrendReq


def fetch_prev_day_trends(kw_list, *, geo: str = "US-NY", tz: int = 360,
                          batch_size: int = 5, sleep_range: tuple = (1, 2)) -> pd.DataFrame:
    """Return a DataFrame with columns: date, keyword, interest.

    `tz` is the UTC offset in minutes that pytrends expects (360 is US Central;
    New York would be 300, or 240 during daylight saving time).
    """
    pytrends = TrendReq(hl="en-US", tz=tz)

    # Define yesterday's 24-hour window (naive timestamps from the machine-local clock)
    yesterday = datetime.now() - timedelta(days=1)
    y_start = yesterday.replace(hour=0, minute=0, second=0, microsecond=0)
    y_end = yesterday.replace(hour=23, minute=59, second=59, microsecond=0)

    records: list[dict] = []

    # Google Trends allows at most 5 keywords per request
    for i in range(0, len(kw_list), batch_size):
        batch = kw_list[i : i + batch_size]
        try:
            # The "now 7-d" timeframe returns hourly points
            pytrends.build_payload(batch, timeframe="now 7-d", geo=geo)
            df = pytrends.interest_over_time()

            # An empty frame means no keyword in the batch was sampled;
            # every keyword then falls through to 0 below.
            if df.empty:
                df_y = df
            else:
                df_y = df.loc[(df.index >= y_start) & (df.index <= y_end)]

            for kw in batch:
                val = df_y[kw].mean() if kw in df_y else None
                if val is None or math.isnan(val):
                    val = 0  # missing-data rule: negligible search volume
                else:
                    val = int(round(val))
                records.append({"date": yesterday.date(), "keyword": kw, "interest": val})
        except Exception as exc:
            # Request failures are logged and the batch is skipped (no rows written)
            print(f"[ERROR] {batch}: {exc}")

        # Pause between requests to reduce the chance of being rate-limited
        time.sleep(random.uniform(*sleep_range))

    return pd.DataFrame(records, columns=["date", "keyword", "interest"])


if __name__ == "__main__":
    zone_keywords = [
        "Times Square", "Central Park", "Empire State Building",
        "Brooklyn Bridge", "Statue of Liberty Ferry", "Rockefeller Center",
        "One World Trade Center", "Metropolitan Museum of Art",
        "Grand Central Terminal", "MoMA",
        "Roosevelt Island Tram", "Hudson River Kayaking", "The High Line",
    ]

    df_out = fetch_prev_day_trends(zone_keywords, geo="US-NY")
    filename = f"google_trends_prev_day_{datetime.now():%Y%m%d}.csv"
    df_out.to_csv(filename, index=False)
    print(f"Saved {len(df_out)} rows to {filename}")
```

Example output:

```
Saved 13 rows to google_trends_prev_day_20250628.csv
```
