The purpose of this notebook is to show how to retrieve forecast-ready features for your locations at scale using the [Features API](https://docs.predicthq.com/api/features).

Make sure you have a predefined set of features and locations before running this notebook. The output is a ready-to-use dataframe of features for each of your locations.

# Steps

* [Setup](#setup)
* [Step 1. Prepare Locations](#step-1-prepare-locations)
* [Step 2. Features API: Get Features](#step-2-features-api-get-features)

# Setup

Complete the following steps before proceeding:

1. Install the packages listed in `requirements.txt`
2. Update `DATA_DIR` and `OUTPUT_DIR` as necessary
3. Replace `ACCESS_TOKEN` with a valid token (for help creating an access token, see [the API Quickstart](https://docs.predicthq.com/getting-started/api-quickstart))
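Hardcoding `ACCESS_TOKEN` in the notebook makes it easy to commit a secret by accident. Below is a minimal sketch of an alternative that reads the token from an environment variable and creates the data and output directories if they are missing. The variable name `PHQ_ACCESS_TOKEN` and both helper names are hypothetical choices. They are not part of the PredictHQ SDK.

```python
import os


def load_access_token(env_var: str = "PHQ_ACCESS_TOKEN") -> str:
    """Read the PredictHQ access token from an environment variable (hypothetical helper)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to a valid access token.")
    return token


def ensure_dirs(*paths: str) -> None:
    """Create each directory (and any parents) if it does not exist yet."""
    for path in paths:
        os.makedirs(path, exist_ok=True)
```

With these helpers, the configuration cell could use `ACCESS_TOKEN = load_access_token()` followed by `ensure_dirs(DATA_DIR, OUTPUT_DIR)`.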

In [None]:
# install requirements
# %pip install --user -r requirements.txt

In [1]:
import pandas as pd
import json
import os
from predicthq import Client
import features_api_utils as fau
import beam_api_utils as bau

In [2]:
DATA_DIR = "data"
OUTPUT_DIR = "output"

ACCESS_TOKEN = "REPLACE_WITH_ACCESS_TOKEN"

In [3]:
phq = Client(access_token=ACCESS_TOKEN)

# Step 1. Prepare Locations

**Features by Location**

If each location has its own list of features, prepare a config file with the following information:

1. `features` or `important_features`: a list of features to get from the [Features API](https://docs.predicthq.com/api/features/get-features)

2. `place_id` or `lat`/`lon` for each location, with a unique location id as key

3. `interval` and `week_start_day`: the interval at which data is required (`day` or `week`) and, if `week`, the first day of the week (e.g. `sunday`, `monday`)

4. `industry`: (optional) the industry relevant to your locations (`accommodation`, `parking`, `food_and_beverage`, or `retail`); defaults to `other`

5. `min_phq_rank`: (optional) the PHQ Rank threshold for capturing relevant events around the location; the default depends on the industry

6. `start` and `end`: (optional) the date range to calculate features for; defaults to 2 years before through 3 months after today
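It can help to check each config entry against the rules listed above before calling the API. Below is a minimal sketch that assumes the config is a plain dict keyed by location ID, as in the example `config` further down. The helper names are hypothetical. `with_default_dates` only approximates the documented default date range, because the exact calculation used by the notebook's utilities is not shown here.

```python
import calendar
from datetime import date
from typing import List, Optional

VALID_INTERVALS = {"day", "week"}


def shift_months(d: date, months: int) -> date:
    """Move a date by whole months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def validate_location_config(info: dict, require_features: bool = True) -> List[str]:
    """Return the problems found in one location's config entry (empty list if none)."""
    problems = []
    if require_features and not (info.get("features") or info.get("important_features")):
        problems.append("missing `features` or `important_features`")
    if not info.get("place_id") and not ("lat" in info and "lon" in info):
        problems.append("needs `place_id` or both `lat` and `lon`")
    interval = info.get("interval")
    if interval not in VALID_INTERVALS:
        problems.append(f"`interval` must be one of {sorted(VALID_INTERVALS)}, got {interval!r}")
    elif interval == "week" and not info.get("week_start_day"):
        problems.append("`week_start_day` is required when `interval` is 'week'")
    return problems


def with_default_dates(info: dict, today: Optional[date] = None) -> dict:
    """Return a copy with `start`/`end` filled in if missing (approximate defaults, an assumption)."""
    today = today or date.today()
    filled = dict(info)
    filled.setdefault("start", shift_months(today, -24).isoformat())  # 2 years before
    filled.setdefault("end", shift_months(today, 3).isoformat())  # 3 months after
    return filled
```

Pass `require_features=False` when the features come from a group config, as described in the next section.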

**Features by Group of Locations**

If a group of locations shares one list of features, prepare the location config file described above (omitting `features` and `important_features`), then create a group config file with the following information:

1. `group_id`: the ID associated with a Beam Analysis Group

2. `locations`: the list of location IDs (the keys of the location config) for the locations that make up the group

3. `features` or `important_features`: a list of features to get from the [Features API](https://docs.predicthq.com/api/features/get-features)
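The group loop in Step 2 looks up every group member in the location config, so a group that lists an unknown location ID only surfaces as an error at fetch time. Below is a minimal sketch of an up-front consistency check. It assumes both files load as dicts shaped like the examples below, and `check_group_config` is a hypothetical helper.

```python
from typing import Dict, List


def check_group_config(group_config: Dict[str, dict], config: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each group ID to the problems found in its entry (empty list if none)."""
    report = {}
    for group, group_info in group_config.items():
        problems = []
        if not group_info.get("group_id"):
            problems.append("missing `group_id`")
        if not (group_info.get("features") or group_info.get("important_features")):
            problems.append("missing `features` or `important_features`")
        locations = group_info.get("locations", [])
        if not locations:
            problems.append("no `locations` listed")
        # every group member must have an entry in the location config
        unknown = [loc for loc in locations if loc not in config]
        if unknown:
            problems.append(f"locations not found in the location config: {unknown}")
        report[group] = problems
    return report
```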

See the example `config` and `group_config` below for the expected structure.

In [4]:
# load config file
with open(os.path.join(OUTPUT_DIR, "config_with_features.json"), "r") as f:
    config = json.load(f)

config = bau.supplement_config(config=config, phq_client=phq)

In [5]:
config

{'store_0': {'lat': 40.74559205863674,
  'lon': -73.9945205785237,
  'analysis_name': 'store_0_daily_analysis',
  'industry': 'restaurants',
  'min_phq_rank': 30,
  'start': '2017-01-02',
  'end': '2019-12-31',
  'radius': 1.27,
  'radius_unit': 'mi',
  'analysis_id': 'lgMSAce96GU',
  'analysis_readiness_status': 'ready',
  'interval': 'day',
  'week_start_day': None,
  'important_features': ['phq_rank_public_holidays',
   'phq_attendance_conferences',
   'phq_attendance_concerts',
   'phq_attendance_festivals',
   'phq_attendance_expos',
   'phq_attendance_school_holidays',
   'phq_attendance_performing_arts',
   'phq_attendance_sports',
   'phq_rank_observances'],
  'feature_importance': [{'feature_group': 'public-holidays',
    'features': ['phq_rank_public_holidays'],
    'p_value': 0.0,
    'important': True},
   {'feature_group': 'conferences',
    'features': ['phq_attendance_conferences'],
    'p_value': 0.0,
    'important': True},
   {'feature_group': 'concerts',
    ... (output truncated)

In [6]:
# load group config file
with open(os.path.join(OUTPUT_DIR, "group_config_with_features.json"), "r") as f:
    group_config = json.load(f)

In [7]:
group_config

{'group_A': {'name': 'group_A_analysis',
  'locations': ['store_0', 'store_1'],
  'analysis_ids': ['lgMSAce96GU', 'tQObpRbWpq0'],
  'group_id': 'spfEtaF4hBk',
  'group_status': {'readiness_status': 'ready',
   'feature_importance_processing_completed': True},
  'important_features': ['phq_attendance_concerts',
   'phq_attendance_conferences',
   'phq_attendance_performing_arts',
   'phq_rank_public_holidays',
   'phq_attendance_school_holidays',
   'phq_attendance_expos',
   'phq_attendance_sports',
   'phq_rank_observances',
   'phq_attendance_festivals'],
  'feature_importance': [{'feature_group': 'concerts',
    'features': ['phq_attendance_concerts'],
    'p_value': 0.0,
    'important': True},
   {'feature_group': 'conferences',
    'features': ['phq_attendance_conferences'],
    'p_value': 0.0,
    'important': True},
   {'feature_group': 'performing-arts',
    'features': ['phq_attendance_performing_arts'],
    'p_value': 0.0,
    'important': True},
   ... (output truncated)

# Step 2. Features API: Get Features

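Retrieve your features of interest for all locations using the [Features API](https://docs.predicthq.com/api/features). Each location (or group) requests only its own feature list. When the results are combined, any feature that was not requested for a given location, typically because it is not important there, appears as `NaN` in that location's rows and should be ignored.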
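The `NaN` values come from how the per-location results are combined: `pd.concat` takes the union of all columns and fills the cells a location did not request with `NaN`. Below is a small illustration with made-up values. It also includes a hypothetical helper, `features_for_location`, that keeps only the features configured for one location.

```python
import pandas as pd

# made-up results for two locations that requested different features
store_a = pd.DataFrame({
    "location": ["store_a", "store_a"],
    "date": ["2017-01-02", "2017-01-03"],
    "phq_attendance_concerts": [100.0, 250.0],
})
store_b = pd.DataFrame({
    "location": ["store_b"],
    "date": ["2017-01-02"],
    "phq_attendance_sports": [900.0],
})

# pd.concat takes the union of columns; cells a location did not request become NaN
combined = pd.concat([store_a, store_b], ignore_index=True)


def features_for_location(df: pd.DataFrame, location: str, features: list) -> pd.DataFrame:
    """Rows for one location, restricted to the features configured for it."""
    rows = df[df["location"] == location]
    return rows[["location", "date", *features]].reset_index(drop=True)


store_b_only = features_for_location(combined, "store_b", ["phq_attendance_sports"])
```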

**Features by Location**

In [8]:
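# get features data for each location
features_data = []
for location, info in config.items():
    print(f"Getting features data for {location}...")

    # prefer an explicit `features` list; fall back to `important_features`
    features_list = info.get("features", info.get("important_features", []))
    try:
        features = fau.get_features(info=info, features=features_list, phq_client=phq)
        features.insert(0, "location", location)
        features_data.append(features)

        print("--- features retrieved")

    except Exception as e:
        # skip this location and continue with the rest
        print(f"--- an error occurred: {e}")

# pd.concat raises ValueError on an empty list; fail with a clearer message instead
if not features_data:
    raise RuntimeError("No features were retrieved for any location.")

features_df = pd.concat(features_data)

# save features
features_df.to_csv(os.path.join(OUTPUT_DIR, "features.csv"), index=False)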

Getting features data for store_0...
--- features retrieved
Getting features data for store_1...
--- features retrieved
Getting features data for store_2...
--- features retrieved
Getting features data for store_3...
--- features retrieved


In [9]:
features_df

,location,date,phq_attendance_concerts,phq_attendance_conferences,phq_attendance_expos,phq_attendance_festivals,phq_attendance_performing_arts,phq_attendance_school_holidays,phq_attendance_sports,phq_rank_observances,...,phq_impact_severe_weather_dust_retail,phq_impact_severe_weather_dust_storm_retail,phq_impact_severe_weather_flood_retail,phq_impact_severe_weather_heat_wave_retail,phq_impact_severe_weather_hurricane_retail,phq_impact_severe_weather_thunderstorm_retail,phq_impact_severe_weather_tornado_retail,phq_impact_severe_weather_tropical_storm_retail,phq_rank_academic_exam,phq_rank_academic_holiday
0,store_0,2017-01-02,33356.0,0.0,0.0,0.0,8889.0,0.0,19812.0,0.0,...,,,,,,,,,,
1,store_0,2017-01-03,33469.0,0.0,0.0,0.0,16207.0,0.0,18006.0,0.0,...,,,,,,,,,,
2,store_0,2017-01-04,2790.0,0.0,0.0,0.0,25210.0,0.0,19812.0,1.0,...,,,,,,,,,,
3,store_0,2017-01-05,5991.0,0.0,0.0,0.0,17663.0,0.0,0.0,0.0,...,,,,,,,,,,
4,store_0,2017-01-06,9501.0,0.0,0.0,0.0,19981.0,0.0,19500.0,3.0,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
150,store_3,2019-11-25,,2750.0,28365.0,,242058.0,4772180.0,84080.0,,...,0.0,0.0,24.0,0.0,0.0,0.0,0.0,0.0,0.0,32.0
151,store_3,2019-12-02,,31115.0,105230.0,,231639.0,0.0,83841.0,,...,0.0,0.0,49.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
152,store_3,2019-12-09,,27295.0,27746.0,,224462.0,0.0,19500.0,,...,0.0,0.0,49.0,0.0,0.0,0.0,0.0,0.0,20.0,0.0
153,store_3,2019-12-16,,11545.0,6786.0,,222740.0,0.0,88052.0,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,28.0,36.0
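The loop above fetches one location at a time. With many locations, a thread pool can overlap the network calls. Below is a minimal sketch using the standard library's `concurrent.futures`. It assumes that the `phq` client can be shared across threads and that your plan's rate limits allow several concurrent requests. Neither assumption is confirmed here, so start with a small `max_workers`. `fetch_all` is a hypothetical helper, and `fetch` is any function you supply.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Tuple


def fetch_all(
    config: Dict[str, dict],
    fetch: Callable[[str, dict], Any],
    max_workers: int = 4,
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Run fetch(location, info) for every location concurrently.

    Returns (results, errors), both keyed by location ID, so one failed
    location does not stop the others.
    """
    results: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, loc, info): loc for loc, info in config.items()}
        for future in as_completed(futures):
            loc = futures[future]
            try:
                results[loc] = future.result()
            except Exception as e:  # keep going; report failures at the end
                errors[loc] = e
    return results, errors
```

In this notebook, `fetch` could be a small function that calls `fau.get_features(info=info, features=..., phq_client=phq)` and inserts the `location` column, as the loop above does. `as_completed` yields futures in completion order rather than config order. To rebuild the combined frame in config order, use `pd.concat([results[loc] for loc in config if loc in results])`.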


**Features by Group of Locations**

In [10]:
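# get features data for each location in each group
features_data = []
for group, group_info in group_config.items():
    print(f"Getting features data for {group}...")

    # all locations in a group share the group's feature list
    features_list = group_info.get("features", group_info.get("important_features", []))
    try:
        for location in group_info["locations"]:
            print(f"--- Getting features data for {location}...")

            try:
                # per-location settings (lat/lon or place_id, interval, dates, ...) come from the location config
                info = config[location]
                features = fau.get_features(
                    info=info, features=features_list, phq_client=phq
                )
                features.insert(0, "group", group)
                features.insert(1, "location", location)
                features_data.append(features)

                print("    --- features retrieved")

            except Exception as e:
                # skip this location (e.g. missing from config or an API error) and continue
                print(f"    --- an error occurred: {e}")

    except Exception as e:
        # e.g. the group entry has no "locations" key
        print(f"--- an error occurred: {e}")

# pd.concat raises ValueError on an empty list; fail with a clearer message instead
if not features_data:
    raise RuntimeError("No features were retrieved for any group.")

features_df = pd.concat(features_data)

# save features
features_df.to_csv(os.path.join(OUTPUT_DIR, "group_features.csv"), index=False)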

Getting features data for group_A...
--- Getting features data for store_0...
    --- features retrieved
--- Getting features data for store_1...
    --- features retrieved
Getting features data for group_B...
--- Getting features data for store_2...
    --- features retrieved
--- Getting features data for store_3...
    --- features retrieved


In [11]:
features_df

,group,location,date,phq_attendance_concerts,phq_attendance_conferences,phq_attendance_expos,phq_attendance_festivals,phq_attendance_performing_arts,phq_attendance_school_holidays,phq_attendance_sports,...,phq_impact_severe_weather_cold_wave_storm_retail,phq_impact_severe_weather_dust_retail,phq_impact_severe_weather_dust_storm_retail,phq_impact_severe_weather_flood_retail,phq_impact_severe_weather_heat_wave_retail,phq_impact_severe_weather_hurricane_retail,phq_impact_severe_weather_thunderstorm_retail,phq_impact_severe_weather_tornado_retail,phq_impact_severe_weather_tropical_storm_retail,phq_attendance_community
0,group_A,store_0,2017-01-02,33356.0,0.0,0.0,0.0,14517.0,0.0,19812.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,group_A,store_0,2017-01-03,33469.0,0.0,0.0,0.0,26858.0,0.0,18006.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,group_A,store_0,2017-01-04,2210.0,0.0,0.0,0.0,38296.0,0.0,19812.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,group_A,store_0,2017-01-05,5777.0,0.0,0.0,0.0,27215.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,group_A,store_0,2017-01-06,9941.0,0.0,0.0,0.0,31153.0,0.0,19500.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
150,group_B,store_3,2019-11-25,,2750.0,28365.0,,,,84080.0,...,,,,,,,,,,7142.0
151,group_B,store_3,2019-12-02,,30823.0,105230.0,,,,83841.0,...,,,,,,,,,,23240.0
152,group_B,store_3,2019-12-09,,27283.0,27746.0,,,,19500.0,...,,,,,,,,,,13647.0
153,group_B,store_3,2019-12-16,,11545.0,6786.0,,,,88052.0,...,,,,,,,,,,3812.0
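A quick sanity check on the combined output can catch locations that returned fewer rows or features than expected. For example, in the output above `store_0` has one row per day, while the `store_3` rows shown are a week apart. Below is a minimal sketch that assumes the `group`/`location`/`date` column layout shown above. `coverage_summary` is a hypothetical helper, and `coverage_summary(features_df)` works for either output.

```python
import pandas as pd


def coverage_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-location date range, row count and number of non-empty feature columns."""
    keys = [c for c in ("group", "location") if c in df.columns]
    feature_cols = [c for c in df.columns if c not in keys and c != "date"]
    # concatenated frames repeat index labels, so start from a clean index
    work = df.reset_index(drop=True).assign(date=lambda d: pd.to_datetime(d["date"]))

    summary = work.groupby(keys)["date"].agg(start="min", end="max", rows="size")
    # a feature column counts as present if it has at least one non-NaN value
    present = (
        work[feature_cols].notna()
        .assign(**{k: work[k] for k in keys})
        .groupby(keys)
        .any()
        .sum(axis=1)
    )
    summary["features_present"] = present
    return summary
```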
