This notebook demonstrates how to retrieve relevant forecast-ready features for your locations at scale, using the [Beam API](https://docs.predicthq.com/api/beam) together with the [Suggested Radius API](https://docs.predicthq.com/api/suggested-radius).

Make sure you have a predefined set of locations with corresponding demand data ready before running this notebook. The output is a list of important features for each of your locations.

# Steps

* [Setup](#setup)
* [Step 1. Prepare Data](#step-1-prepare-data)
* [Step 2. Beam: Upload Data](#step-2-beam-upload-data)
* [Step 3. Beam: Get Feature Importance](#step-3-beam-get-feature-importance)

# Setup

Complete the following steps before proceeding:

1. Install the packages listed in `requirements.txt`
2. Update `DATA_DIR` and `OUTPUT_DIR` as necessary
3. Replace `ACCESS_TOKEN` with a valid token (for help creating an access token, see [the API Quickstart](https://docs.predicthq.com/getting-started/api-quickstart))
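
Hardcoding a token in a notebook makes it easy to leak. One option, sketched below, is to read it from an environment variable instead. The variable name `PHQ_ACCESS_TOKEN` and the helper `load_access_token` are hypothetical and are not part of the PredictHQ SDK.

```python
import os

# Placeholder value used in this notebook; treated as "no token configured"
PLACEHOLDER = "REPLACE_WITH_ACCESS_TOKEN"


def load_access_token(env_var="PHQ_ACCESS_TOKEN", fallback=PLACEHOLDER):
    """Return the token from the environment, or raise if none is configured."""
    token = os.environ.get(env_var, fallback).strip()
    if not token or token == PLACEHOLDER:
        raise ValueError(f"Set {env_var} or replace the placeholder token")
    return token
```

With the variable set in your shell, `ACCESS_TOKEN = load_access_token()` could replace the hardcoded assignment in the setup cell.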

In [None]:
# install requirements
# %pip install --user -r requirements.txt

In [1]:
import pandas as pd
import json
import os
import requests
from predicthq import Client

import beam_api_utils as bau

In [2]:
DATA_DIR = "data"
OUTPUT_DIR = "output"

ACCESS_TOKEN = "REPLACE_WITH_ACCESS_TOKEN"

In [3]:
phq = Client(access_token=ACCESS_TOKEN)

# Step 1. Prepare Data

**New Analyses**

To create new Analyses in Beam, the following are required:

1. Demand data

    a. One CSV file with columns for `location`, `date` and `demand` (see the [upload demand data request body](https://docs.predicthq.com/api/beam/upload-demand-data#request-body) for more details)

2. Config file with the following information per `location`:

    a. `lat`/`lon`: the coordinates of the location 

    b. `analysis_name`: a user-created free-form string to reference the Analysis in Beam

    c. `industry`: (optional) the industry relevant to your locations (for example `accommodation`, `parking`, `restaurants` or `retail`; the example config below uses `restaurants`), default is `other`

    d. `min_phq_rank`: (optional) the PHQ Rank threshold for capturing relevant events around the location, default is set by industry

See the example config below for how this should look.
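
Before calling the API, it can help to check that every location has the required fields. The following is a minimal sketch under the requirements listed above; `validate_config` and `REQUIRED_KEYS` are hypothetical names, not part of `beam_api_utils`.

```python
REQUIRED_KEYS = ("lat", "lon", "analysis_name")


def validate_config(config):
    """Return a copy of config with defaults applied; raise on missing or invalid fields."""
    validated = {}
    for location, info in config.items():
        missing = [key for key in REQUIRED_KEYS if key not in info]
        if missing:
            raise ValueError(f"{location}: missing required keys {missing}")
        if not -90 <= info["lat"] <= 90 or not -180 <= info["lon"] <= 180:
            raise ValueError(f"{location}: lat/lon out of range")
        entry = dict(info)  # copy so the caller's config is left untouched
        # `industry` is optional; the list above gives `other` as the default
        entry.setdefault("industry", "other")
        validated[location] = entry
    return validated


example = {
    "store_0": {
        "lat": 40.74559205863674,
        "lon": -73.9945205785237,
        "analysis_name": "store_0_daily_analysis",
    }
}
checked = validate_config(example)
print(checked["store_0"]["industry"])  # other
```

`min_phq_rank` is deliberately left unset here, since its default is described above as being set by industry.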


In [4]:
# read and inspect demand file
demand_df = pd.read_csv(os.path.join(DATA_DIR, "demand.csv"))
demand_df.head()

|   | location | date | demand |
|---|----------|------|--------|
| 0 | store_0 | 2017-01-02 | 5552.019186 |
| 1 | store_0 | 2017-01-03 | 8299.941863 |
| 2 | store_0 | 2017-01-04 | 8556.730072 |
| 3 | store_0 | 2017-01-05 | 8595.100423 |
| 4 | store_0 | 2017-01-06 | 8198.941337 |
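
A quick sanity check of the demand file can catch missing columns or non-numeric values before anything is uploaded. The sketch below uses only the standard library and reuses three rows from the preview above as sample input. `summarise_demand` is a hypothetical helper; it is not how `bau.supplement_config` works, whose source is not shown in this notebook.

```python
import csv
import io

REQUIRED_COLUMNS = {"location", "date", "demand"}


def summarise_demand(csv_text):
    """Return {location: {"start", "end", "rows"}} for a demand CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    summary = {}
    for row in reader:
        float(row["demand"])  # raises ValueError if demand is not numeric
        stats = summary.setdefault(
            row["location"], {"start": row["date"], "end": row["date"], "rows": 0}
        )
        # ISO 8601 dates (YYYY-MM-DD) compare correctly as plain strings
        stats["start"] = min(stats["start"], row["date"])
        stats["end"] = max(stats["end"], row["date"])
        stats["rows"] += 1
    return summary


sample = """location,date,demand
store_0,2017-01-02,5552.019186
store_0,2017-01-03,8299.941863
store_0,2017-01-04,8556.730072
"""
print(summarise_demand(sample))
# {'store_0': {'start': '2017-01-02', 'end': '2017-01-04', 'rows': 3}}
```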


In [5]:
# load config file
with open(os.path.join(DATA_DIR, "config.json"), "r") as f:
    config = json.load(f)

config = bau.supplement_config(config, demand_df, phq)

In [6]:
config

{'store_0': {'lat': 40.74559205863674,
  'lon': -73.9945205785237,
  'timezone': 'America/New_York',
  'analysis_name': 'store_0_daily_analysis',
  'industry': 'restaurants',
  'min_phq_rank': 30,
  'start': '2017-01-02',
  'end': '2019-12-31',
  'radius': 1.27,
  'radius_unit': 'mi'},
 'store_1': {'lat': 33.971191942428334,
  'lon': -118.16436160102515,
  'timezone': 'America/Los_Angeles',
  'analysis_name': 'store_1_daily_analysis',
  'industry': 'restaurants',
  'min_phq_rank': 30,
  'start': '2017-01-02',
  'end': '2018-11-30',
  'radius': 1.56,
  'radius_unit': 'mi'},
 'store_2': {'lat': 40.751582853596915,
  'lon': -73.98155897848956,
  'timezone': 'America/New_York',
  'analysis_name': 'store_2_monday_weekly_analysis',
  'industry': 'restaurants',
  'min_phq_rank': 30,
  'start': '2017-01-09',
  'end': '2019-12-23',
  'radius': 1.21,
  'radius_unit': 'mi'},
 'store_3': {'lat': 40.74559205863674,
  'lon': -73.9945205785237,
  'timezone': 'America/New_York',
  'analysis_name': 'st

**Existing Analyses**

If you have existing Analyses in Beam, the following are required:

1. Config file

    a. `analysis_id`: the ID associated with a Beam Analysis 

    b. `analysis_name`: the user-created free-form string to reference the Analysis in Beam

If your Analyses were created before the launch of Feature Importance, you must also refresh them (see the commented-out refresh cell below) before retrieving feature importance.

Otherwise, skip to [Step 3. Beam: Get Feature Importance](#step-3-beam-get-feature-importance). 

In [7]:
# example config for existing Analyses
# config = {
#     "store_0": {
#         "analysis_id": "abc123",
#         "analysis_name": "store_0_analysis",
#     },
#     "store_1": {
#         "analysis_id": "def456",
#         "analysis_name": "store_1_analysis",
#     },
#     "store_2": {
#         "analysis_id": "ghi789",
#         "analysis_name": "store_2_analysis",
#     },
# }

In [8]:
# refresh Analyses
# for location, info in config.items():
#     print(f"Refreshing analysis for {location}...")
#     bau.refresh_analysis(analysis_id=info["analysis_id"], access_token=ACCESS_TOKEN)

# Step 2. Beam: Upload Data

**New Analyses**

This step involves using the [Beam API](https://docs.predicthq.com/api/beam) to:

1. Create an `analysis_id` for each location
2. Upload demand for each Analysis
3. Check `readiness_status` of the Analysis and make sure it is `ready` before proceeding

For more info on the Beam API and other functionality such as updating and deleting Analyses, see the [PredictHQ Docs](https://docs.predicthq.com/api/beam).
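
The `bau` helpers used below wrap HTTP calls to the Beam API, and their source is not shown in this notebook. As a rough sketch of what an authenticated request might look like, the snippet below builds requests with the standard library without sending them. The base URL and the `sink/` path are assumptions based on the public Beam API documentation and should be verified there; `build_beam_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumption: base URL taken from the public Beam API docs; verify before use
BEAM_BASE_URL = "https://api.predicthq.com/v1/beam/analysis"


def build_beam_request(access_token, method, path="", body=None):
    """Build (but do not send) an authenticated request to the Beam API."""
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/json",
    }
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        url=f"{BEAM_BASE_URL}/{path}", data=data, headers=headers, method=method
    )


# Assumed paths: "<analysis_id>/sink/" for demand upload, "<analysis_id>/" for status
upload_req = build_beam_request(
    "TOKEN", "POST", "abc123/sink/", body=[{"date": "2017-01-02", "demand": 5552.02}]
)
status_req = build_beam_request("TOKEN", "GET", "abc123/")
print(upload_req.get_method(), upload_req.full_url)
print(status_req.get_method(), status_req.full_url)
```

Passing a built request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access or a valid token.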

In [9]:
# create Analysis id and upload demand
for location, info in config.items():
    print(f"Uploading demand for {location}...")

    try:
        analysis_id = bau.create_analysis_id(info, access_token=ACCESS_TOKEN)

        config[location]["analysis_id"] = analysis_id

        location_demand_df = demand_df[demand_df["location"] == location].drop(
            columns=["location"]
        )
        location_demand_json = location_demand_df.to_json(orient="records")

        bau.upload_demand(
            demand_json=location_demand_json,
            analysis_id=analysis_id,
            access_token=ACCESS_TOKEN,
        )

    except Exception as e:
        print(f"--- an error occurred: {e}")
        continue

Uploading demand for store_0...
--- the request has been accepted for processing.
Uploading demand for store_1...
--- the request has been accepted for processing.
Uploading demand for store_2...
--- the request has been accepted for processing.
Uploading demand for store_3...
--- the request has been accepted for processing.


Processing time grows with the number of Analyses. Every Analysis must be `ready` before you proceed to the next steps, so re-run the cell below as needed to get the latest status.

In [10]:
# check readiness status is `ready`
for location, info in config.items():
    print(f"Readiness status for {location}...")
    # skip locations whose Analysis was not created (e.g. the upload step failed)
    if "analysis_id" not in info:
        print("--- no analysis_id found, skipping")
        continue
    status = bau.readiness_status(
        analysis_id=info["analysis_id"], access_token=ACCESS_TOKEN
    )
    print(f"--- {status}")
    config[location]["analysis_readiness_status"] = status
    # update the demand type once the status is ready
    if status == "ready":
        demand_type = bau.get_demand_type(
            analysis_id=info["analysis_id"], access_token=ACCESS_TOKEN
        )
        config[location]["interval"] = demand_type.get("interval", "day")
        config[location]["week_start_day"] = demand_type.get("week_start_day", None)


Readiness status for store_0...
--- ready
Readiness status for store_1...
--- ready
Readiness status for store_2...
--- ready
Readiness status for store_3...
--- ready
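
Re-running the status cell by hand becomes tedious with many locations. A polling loop is one way to automate it. The sketch below is a minimal version, assuming a status getter is passed in as a callable; `wait_until_ready` and its parameters are hypothetical, and the demo uses a fake getter so it runs without calling the API.

```python
import time


def wait_until_ready(get_status, analysis_ids, poll_seconds=30,
                     timeout_seconds=1800, sleep=time.sleep):
    """Poll get_status(analysis_id) until every Analysis is `ready` or time runs out.

    Returns a dict with the last status seen for each analysis_id.
    """
    statuses = {analysis_id: None for analysis_id in analysis_ids}
    waited = 0
    while True:
        for analysis_id in list(statuses):
            # only re-check Analyses that are not ready yet
            if statuses[analysis_id] != "ready":
                statuses[analysis_id] = get_status(analysis_id)
        pending = [a for a, s in statuses.items() if s != "ready"]
        if not pending or waited >= timeout_seconds:
            return statuses
        sleep(poll_seconds)
        waited += poll_seconds


# Fake getter for demonstration; "pending" is only a placeholder for "not ready"
fake_responses = {"a1": iter(["pending", "ready"]), "a2": iter(["ready"])}
result = wait_until_ready(
    lambda analysis_id: next(fake_responses[analysis_id]),
    ["a1", "a2"],
    poll_seconds=1,
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(result)  # {'a1': 'ready', 'a2': 'ready'}
```

In this notebook the getter could be `lambda analysis_id: bau.readiness_status(analysis_id=analysis_id, access_token=ACCESS_TOKEN)`. Any Analysis that is still not `ready` when the timeout is reached is returned with its last seen status, so the caller can decide what to do with it.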


# Step 3. Beam: Get Feature Importance

Feature Importance results can be retrieved for all Analyses via their `analysis_id`.

**All Analyses**

In [11]:
# get feature importance
for location, info in config.items():
    print(f"Getting feature importance for {location}...")

    try:
        feature_importance = bau.get_feature_importance(
            analysis_id=info["analysis_id"], access_token=ACCESS_TOKEN
        )
        important_features = [
            item
            for feature in feature_importance["feature_importance"]
            if feature["important"]
            for item in feature["features"]
        ]
        config[location]["important_features"] = important_features
        config[location]["feature_importance"] = feature_importance[
            "feature_importance"
        ]

        print("--- feature importance retrieved")

    except Exception as e:
        print(f"--- an error occurred: {e}")
        continue

# save config file (create the output directory first if it does not exist)
os.makedirs(OUTPUT_DIR, exist_ok=True)
with open(os.path.join(OUTPUT_DIR, "config_with_features.json"), "w") as f:
    json.dump(config, f, indent=4)

Getting feature importance for store_0...
--- feature importance retrieved
Getting feature importance for store_1...
--- feature importance retrieved
Getting feature importance for store_2...
--- feature importance retrieved
Getting feature importance for store_3...
--- feature importance retrieved
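
The list comprehension in the cell above flattens the feature names of every group flagged as important. The same logic is isolated below on a made-up response, so the behaviour is easy to see. The sample only mirrors the keys the cell relies on (`feature_importance`, `important`, `features`); real responses may contain additional fields. `extract_important_features` is a hypothetical name.

```python
def extract_important_features(response):
    """Flatten the features of every group flagged as important."""
    return [
        feature_name
        for group in response["feature_importance"]
        if group["important"]
        for feature_name in group["features"]
    ]


# Made-up sample in the shape the cell above relies on
sample_response = {
    "feature_importance": [
        {"important": True, "features": ["phq_rank_public_holidays"]},
        {"important": False, "features": ["phq_attendance_community"]},
        {"important": True, "features": ["phq_attendance_sports", "phq_attendance_expos"]},
    ]
}
print(extract_important_features(sample_response))
# ['phq_rank_public_holidays', 'phq_attendance_sports', 'phq_attendance_expos']
```

Groups with `important` set to `False` are dropped entirely, and the order of the remaining features follows the order of the response.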


In [12]:
config

{'store_0': {'lat': 40.74559205863674,
  'lon': -73.9945205785237,
  'timezone': 'America/New_York',
  'analysis_name': 'store_0_daily_analysis',
  'industry': 'restaurants',
  'min_phq_rank': 30,
  'start': '2017-01-02',
  'end': '2019-12-31',
  'radius': 1.27,
  'radius_unit': 'mi',
  'analysis_id': 'FwAg8FItj_g',
  'analysis_readiness_status': 'ready',
  'interval': 'day',
  'week_start_day': None,
  'important_features': ['phq_rank_public_holidays',
   'phq_attendance_conferences',
   'phq_attendance_performing_arts',
   'phq_attendance_festivals',
   'phq_attendance_school_holidays',
   'phq_attendance_expos',
   'phq_attendance_sports',
   'phq_impact_severe_weather_air_quality_retail',
   'phq_impact_severe_weather_blizzard_retail',
   'phq_impact_severe_weather_cold_wave_retail',
   'phq_impact_severe_weather_cold_wave_snow_retail',
   'phq_impact_severe_weather_cold_wave_storm_retail',
   'phq_impact_severe_weather_dust_retail',
   'phq_impact_severe_weather_dust_storm_retail'