# Feature Engineering Guide


- [Overview](#overview)
- [How to use event based features in your model](#usage)
- [The Features API features summary](#summary)
    - [Features for attendance and rank based events](#summary_attend_rank)
    - [Features for impact based events (severe weather, retail only)](#summary_weather)
- [Setup](#setup)
- [Access token](#access_token)
- [SDK parameters](#setting_params)
- [Using the Features API to query features for forecasting](#features_api)
    - [Functions for formatting data frame](#functions)
    - [Attendance based features](#attend)
    - [Rank based features](#rank)
    - [Impact based features](#impact)
- [Using a longer date range](#wide_range)
    - [Common functions](#functions_wide)
    - [Attendance based features](#attend_wide)
    - [Rank based features](#rank_wide)
    - [Impact based features](#impact_wide)

<a id='overview'></a>
## Overview

This guide shows how to create event-based features for demand forecasting using PredictHQ's Features API SDK.

<a id='usage'></a>
## How to use event based features in your model

1. Exploration of Available Event-based Features
    - Familiarize yourself with all the event-based features outlined in this guide.
2. Data Preparation
    - Select your location of interest by specifying the latitude and longitude coordinates.
    - Generate a suggested radius for your industry using the Suggested Radius API.
    - Define the time period of interest with a start and end date; these are used in the Features API query.
    - Aggregate your training data to a daily level, keeping the date as a column so that the event-based features can be joined on it later.
3. Event-based Features Evaluation
    - Integrate event-based features into your model.
    - Assess model performance and the importance of the newly incorporated features.
4. Model Selection
    - Choose your final model and prepare it for deployment in a production environment.
5. Engineering Collaboration
    - Collaborate with your engineering team to incorporate the new features into your production pipeline.
    - Use the Features API to query and retrieve these features as needed.
6. Production Deployment
    - Deploy your enhanced model, now integrated with event-based features, in a production setting.
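
As a rough illustration of the data preparation step, the sketch below aggregates transaction-level data to one row per day and joins it to a features table on the date. The `timestamp` and `sales` columns and all sample values are hypothetical; only the `date` key and the `phq_attendance_sports_stats_sum` column name follow this guide.

```python
import pandas as pd

# Hypothetical transaction-level training data (column names are assumptions).
transactions = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-09-01 09:15", "2021-09-01 17:40", "2021-09-02 11:05",
    ]),
    "sales": [120.0, 80.0, 50.0],
})

# Aggregate to one row per calendar day and keep the date as a column.
daily = (
    transactions
    .assign(date=transactions["timestamp"].dt.strftime("%Y-%m-%d"))
    .groupby("date", as_index=False)["sales"]
    .sum()
)

# Hypothetical event features keyed by the same date string.
features = pd.DataFrame({
    "date": ["2021-09-01", "2021-09-02"],
    "phq_attendance_sports_stats_sum": [0.0, 240.0],
})

# A left join keeps every training day, even if no feature row exists for it.
training = daily.merge(features, on="date", how="left")
print(training)
```

A left join is used here so that days without any feature row stay in the training set; whether missing features should then be filled with zeros is a modelling decision.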

<p>Below is a simplified outline of how to integrate event-based features into your system. For a more robust production implementation, store or cache the features retrieved from the Features API before using them in production, since online service calls inherently carry a level of risk. Refresh your cached copy of the features on a regular basis.</p>

<img src="./features-engineering-architecture-diagram.png">
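
One possible way to cache features is sketched below, using only the standard library. The `load_features` helper, the JSON file layout, the one-day refresh interval, and the `fake_fetch` stand-in are all assumptions made for illustration; in practice `fetch_fn` would wrap your own Features API query.

```python
import json
import tempfile
import time
from pathlib import Path


def load_features(cache_path, fetch_fn, max_age_seconds=24 * 3600):
    """Return cached feature rows, refreshing them when the cache is stale.

    fetch_fn is a hypothetical callable that queries the Features API and
    returns a JSON-serializable list of dicts (one per day).
    """
    path = Path(cache_path)
    if path.exists() and time.time() - path.stat().st_mtime < max_age_seconds:
        return json.loads(path.read_text())
    try:
        rows = fetch_fn()
    except Exception:
        # If the online call fails, fall back to a stale cache when one exists.
        if path.exists():
            return json.loads(path.read_text())
        raise
    path.write_text(json.dumps(rows))
    return rows


# Stand-in for a real API call, so the sketch runs offline.
calls = []

def fake_fetch():
    calls.append(1)
    return [{"date": "2021-09-01", "phq_attendance_sports_stats_sum": 0.0}]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "features.json"
    first = load_features(cache_file, fake_fetch)   # fetches and writes the cache
    second = load_features(cache_file, fake_fetch)  # served from the cache

print(len(calls))
```

Falling back to a stale cache when the call fails is one design choice among several; a pipeline that must never use outdated features would raise instead.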


<a id='summary'></a>
## The Features API features summary
<p>Below is a summary of available features from the Features API, which you may consider integrating into your models. The table shows the name of each feature, the type of statistical value from the Features API to utilize for aggregation (e.g., sum represents the total of values of PHQ attendance on a specified day), and notes instructing on the appropriate radius setting for each feature. Further down in this guide, example code and detailed instructions on utilizing these features are provided.</p>

<a id='summary_attend_rank'></a>
### Features for Attendance and Rank Based Events
 
<table class="c28">
<tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Category</strong></span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Available Features from Features API</strong></span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Aggregation Stat Type</strong></span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Radius Setting Notes</strong></span></p></td></tr>
<tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Community</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_community</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Concerts</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_concerts</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Conferences</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_conferences</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Expos</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_expos</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Festivals</span></p></td><td class="c2" colspan="1" 
rowspan="1"><p class="c5"><span class="c7">phq_attendance_festivals</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Performing Arts</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_performing_arts</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Sports</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_sports</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Observances</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_rank_observances</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">n/a</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Public Holidays</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_rank_public_holidays</span></p></td><td class="c10" colspan="1" rowspan="1"><p 
class="c5"><span class="c7">n/a</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">School Holidays</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_school_holidays</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">School Holidays</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_rank_school_holidays</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">n/a</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Academic Graduation</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_academic_graduation</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Academic Social</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_academic_social</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p 
class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Academic Session</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_rank_academic_session</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">n/a</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Academic Exam</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_rank_academic_exam</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">n/a</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Academic Holiday</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_rank_academic_holiday</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">n/a</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr>
</table>

<a id='summary_weather'></a>
### Features for Severe Weather (retail only)
<p>The features below are for the retail industry only. The severe weather features use demand impact patterns, which calculate the impact duration of a severe weather event based on industry-specific information. Our severe weather features are currently designed and tested on data for the retail segment only. If your business is in another industry segment (e.g. accommodation or travel), the features below may not work for you or may be less effective.</p><p>

<table class="c28">
<tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Category</strong></span></p></td><td class="c36" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Available Features from Features API</strong></span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Aggregation Stat Type</strong></span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Radius Setting Notes</strong></span></p></td></tr>
<tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Air quality)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_impact_severe_weather_air_quality_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter(3 feet)</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Blizzard)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_impact_severe_weather_blizzard_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter(3 feet)</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Cold wave)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_cold_wave_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter(3 feet)</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Cold wave - snow)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_cold_wave_snow_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. 
Use a radius of 1 meter(3 feet)</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Cold wave - storm)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_cold_wave_storm_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter(3 feet)</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Dust)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_dust_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter(3 feet)</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Dust - Storm)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_dust_storm_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter(3 feet)</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Flood)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_flood_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. 
Use a radius of 1 meter(3 feet)</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Heat wave)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_heat_wave_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter(3 feet)</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Hurricane)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_hurricane_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter(3 feet)</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Thunderstorm)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_thunderstorm_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter(3 feet)</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Tornado)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_tornado_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. 
Use a radius of 1 meter(3 feet)</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Tropical Storm)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_tropical_storm_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter(3 feet)</span></p></td></tr>
</table>
</p>

<a id='setup'></a>
## Setup

- If you're using Google Colab, uncomment and run the following code block.

In [13]:
# %%capture
# !git clone https://github.com/predicthq/phq-data-science-docs.git
# %cd phq-data-science-docs/feature-engineering-guide
# !pip install pandas==1.1.5 shapely==1.8.0 timezonefinder==5.2.0 predicthq==2.0.6 numpy==1.20.3

- Alternatively, if you're running this notebook on a local machine, you can set up a Python environment using the [requirements.txt](https://github.com/predicthq/phq-data-science-docs/blob/master/feature-engineering-guide/requirements.txt) file shared alongside the notebook, then install the requirements with `pip install -r requirements.txt`.

In [14]:
import pandas as pd
from predicthq import Client
import requests
import collections
import numpy as np
from datetime import datetime, date, timedelta

# To display more columns and with a larger width in the DataFrame
pd.set_option("display.max_columns", 50)

<a id='access_token'></a>
## Access token
An Access Token is required to query the API.

The following link will guide you through the process of creating an account and generating an access token. 

 - https://docs.predicthq.com/guides/quickstart/

In [15]:
# Replace Access Token with your access token.
ACCESS_TOKEN = 'aasjrd3NPHZPlPrA0TXyDZIKq73t-kCEJbUu70WV'
phq = Client(access_token=ACCESS_TOKEN)

<a id='setting_params'></a>
## SDK parameters
To initiate a search for event-based features, begin by constructing a parameter dictionary to house the SDK parameters, and incorporate the necessary filters.

In [16]:
parameters = dict()

### Location
Specifying the location is crucial as it ensures that the events utilized for calculating features are relevant to the specified area.

In this notebook, the default location is set to a point in New York, specifically at coordinates 40.7079, -74.0115, which corresponds to Wall Street. Should you be executing this notebook, you have the option to modify the latitude and longitude values to correspond to a location of your choice. 

Location can be set in two ways:  

  1) Utilizing the `location__geo` Parameter
  This parameter encompasses the latitude and longitude of the desired location, coupled with a radius and a designated unit for the radius. This option is particularly useful when targeting events in the vicinity of a specific point, such as a store or hotel.
  
    * Available Units:
        - m: meter
        - km: kilometer
        - mi: mile
  
  
  2) Employing a `place_id`
  This alternative is optimal when the objective is to retrieve events occurring within a broader area like an entire city.



When querying the Features API with latitude and longitude coordinates, you must define a radius. The Features API generates aggregate features from all events occurring within that radius; events outside it are not included. To determine a suitable radius for a particular location, you can use the Suggested Radius API.

If you use a `place_id` instead, no radius is needed. The query automatically covers all events within the area associated with the `place_id`, which gives a broader scope of event data. Choose between the two options based on how granular or how broad the geographical area of interest is.
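
The two location options can be sketched as a small helper that returns exactly one of the two filters. The `build_location_params` function is hypothetical and not part of the SDK; the `location__geo` and `location__place_id` keys are the ones used in this notebook, and the `2mi` radius is only a sample value.

```python
def build_location_params(lat=None, lon=None, radius=None, place_ids=None):
    """Return a dict holding exactly one of the two location filters."""
    has_geo = lat is not None and lon is not None and radius is not None
    has_place = bool(place_ids)
    if has_geo == has_place:
        raise ValueError(
            "Provide either lat/lon/radius or place_ids, not both or neither."
        )
    if has_geo:
        # Point-and-radius search, e.g. around a store or hotel.
        return {"location__geo": {"lat": str(lat), "lon": str(lon), "radius": radius}}
    # Area search, e.g. an entire city; no radius is involved.
    return {"location__place_id": list(place_ids)}


geo_params = build_location_params(lat="40.7079", lon="-74.0115", radius="2mi")
place_params = build_location_params(place_ids=[5128638])
print(geo_params)
print(place_params)
```

Either result can be merged into the parameter dictionary with `parameters.update(...)`.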

In [17]:
# Using latitude, longitude and a radius
# Comment out this cell if you want to use a place_id
LATTITUDE = "40.7079" # lat, lon for Wall Street, New York City
LONGITUDE = "-74.0115" 

##### Using Suggested Radius API to set radius
The Suggested Radius API is powered by a machine learning model that looks at factors like population density, the number of events around a location, the customer’s industry, and many other factors to determine the ideal radius.
The Suggested Radius API returns a radius that can be used to find attended events around a given location. When looking for events around a business location (such as a store, a hotel, or another business location) a key question is how far should you look for events. For example, should you look at events in a 0.5-mile radius, a 2-mile radius, or a 10-mile radius from your location? The Suggested Radius API answers this question by returning a radius based on a number of factors that can be used to retrieve events around a location.

If you've used the Suggested Radius API (beta) before, please note that this updated version now allows you to specify the radius unit. The previous response value was in ***meters***.

However, you now have the flexibility to choose from the following units:
- m: meters (default)
- km: kilometers
- ft: feet
- mi: miles 


For more information, please refer to our [Suggested Radius API](https://docs.predicthq.com/resources/suggested-radius) doc.



In [18]:
def get_suggested_radius(lat, lon, industry, radius_unit):
    """
    Returns the suggested radius for a given latitude and longitude.

    Args:
        lat: The latitude of the location.
        lon: The longitude of the location.
        industry: The industry of interest that the radius will be calculated for.
        radius_unit: Unit in which the suggested radius will be returned.

    Returns:
        The suggested radius in your preferred unit as a string (e.g. '2.08mi'),
        or None if the request fails.
    """
    # Set the URL for the API call
    url = "https://api.predicthq.com/v1/suggested-radius/"
    # Set the query parameters for the API call
    params = {
        "location.origin": f"{lat},{lon}",
        "industry": industry,
        "radius_unit": radius_unit,
    }
    # Set the headers for the API call (including the access token)
    headers = {
        "Authorization": "Bearer " + ACCESS_TOKEN,
        "Accept": "application/json",
    }
    # Make the API call and get the JSON response
    response = requests.get(url, params=params, headers=headers)
    if response.status_code == 200:
        data = response.json()
        return f"{data['radius']}{data['radius_unit']}"
    # On failure, report the status and response body; the function returns None
    print("Error: " + str(response.status_code))
    print(response.text)
    return None
    

In [19]:
# Modify this part if you want to use a different radius
SUGGESTED_RADIUS = get_suggested_radius(LATTITUDE, LONGITUDE, 'other','mi')
SUGGESTED_RADIUS

'2.08mi'
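
If you need to compare radii returned in different units, a string such as `'2.08mi'` can be converted to meters. The sketch below is a hypothetical helper, not part of the SDK; it assumes the radius string is a number immediately followed by one of the four units listed above.

```python
import re

# Conversion factors to meters (exact by definition).
METERS_PER_UNIT = {"m": 1.0, "km": 1000.0, "ft": 0.3048, "mi": 1609.344}


def radius_to_meters(radius):
    """Convert a radius string such as '2.08mi' to meters."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(km|mi|ft|m)\s*", radius)
    if match is None:
        raise ValueError(f"Unrecognized radius: {radius!r}")
    value, unit = match.groups()
    return float(value) * METERS_PER_UNIT[unit]


print(round(radius_to_meters("2.08mi"), 1))  # 3347.4
```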

In [20]:
# Set the parameters for the API call
parameters.update(location__geo=dict(lat=LATTITUDE, lon=LONGITUDE,radius=SUGGESTED_RADIUS))

Alternatively, we can utilize a `place_id` for our search.

In [21]:
## Keep this part commented if you want to use lat and lon
#place_ids = [5128638]
#parameters.update(location__place_id=place_ids) 

### Date "YYYY-MM-DD"

To specify the time frame for which you want events to be retrieved, you can set the `active__gte` (greater than or equal to) and `active__lte` (less than or equal to) parameters. This will filter for all Attendance-Based Events that are active within this time frame.

You may also use the following parameters to fine-tune your time frame based on your specific needs:

`gte - Greater than or equal to.` <br>
`gt - Greater than.`<br>
`lte - Less than or equal to.`<br>
`lt - Less than.`<br>


Each request can fetch data for up to a 90-day period. If you need data for a longer time frame, you will need to make multiple requests. We have provided [examples](#wide_range) on how to do this within this notebook. Please note that the Features API does not support pagination.
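
A minimal sketch of splitting a long period into request-sized windows is shown below. The `split_date_range` helper is hypothetical, and it assumes the 90-day limit counts both the start and end date; lower `max_days` if the API treats the limit differently.

```python
from datetime import date, timedelta


def split_date_range(start, end, max_days=90):
    """Yield (chunk_start, chunk_end) pairs covering start..end inclusive."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    chunk_start = start
    while chunk_start <= end:
        # Each chunk spans at most max_days calendar days, endpoints included.
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end + timedelta(days=1)


chunks = list(split_date_range(date(2021, 1, 1), date(2021, 6, 30)))
for chunk_start, chunk_end in chunks:
    print(chunk_start.isoformat(), chunk_end.isoformat())
# 2021-01-01 2021-03-31
# 2021-04-01 2021-06-29
# 2021-06-30 2021-06-30
```

Each pair can then be passed as `active__gte` and `active__lte` in a separate request, and the resulting rows concatenated.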

In [22]:
START_TIME = "2021-09-01"
END_TIME = "2021-11-28"
parameters.update(active__gte = START_TIME, active__lte = END_TIME)

<a id='features_api'></a>
## Using the Features API to query features for forecasting

<a id='functions'></a>
### Functions for formatting data frame
The default response from the Features API is in JSON format. To convert this response into a more usable data frame format, the following functions have been defined:


In [23]:
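from collections.abc import MutableMapping
from functools import reduce


def dict_value_by_flatten_key(dict_record, flatten_key):
    """Look up a nested value using a dot-separated key such as 'a.b.c'."""
    return reduce(lambda d, k: d.get(k) if isinstance(d, dict) else None,
                  flatten_key.split('.'),
                  dict_record)


def flatten_dict(d, parent_key='', sep='_'):
    """Flatten a nested dict, joining nested keys with `sep`."""
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        # collections.MutableMapping was removed in Python 3.10; use collections.abc
        if isinstance(v, MutableMapping):
            items.extend(flatten_dict(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)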
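
To make the flattening step concrete, here is a small self-contained example. The nested shape of `record` is an assumption inferred from the flattened column names used in this notebook (for example `phq_attendance_sports_stats_sum`); a real response may contain additional fields.

```python
from collections.abc import MutableMapping


def flatten_dict(d, parent_key="", sep="_"):
    """Flatten a nested dict, joining nested keys with `sep`."""
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, MutableMapping):
            items.extend(flatten_dict(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)


# Hypothetical record shaped like one day of a Features API response.
record = {
    "date": "2021-09-04",
    "phq_attendance_sports": {"stats": {"sum": 240.0}},
}

flat = flatten_dict(record)
print(flat)
# {'date': '2021-09-04', 'phq_attendance_sports_stats_sum': 240.0}
```

Each flattened record becomes one row, so a list of them can be passed straight to `pd.DataFrame`.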

<a id='attend'></a>
### Attendance-based features

This group of features is based on PHQ Attendance. The following features are supported:

- `phq_attendance_sports`
- `phq_attendance_conferences`
- `phq_attendance_expos`
- `phq_attendance_concerts`
- `phq_attendance_festivals`
- `phq_attendance_performing_arts`
- `phq_attendance_community`
- `phq_attendance_academic_graduation`
- `phq_attendance_academic_social` (For Academic features, we recommend using the three rank-based features shown in the Rank-Based Features section.)
- `phq_attendance_school_holidays`

Each of these features provides statistics. You can specify which statistics you need, or if none are specified, you will receive the default set of statistics. The supported statistics are: 

- `sum` (Recommended as a starting point)
- `count` 
- `min`
- `max`
- `avg`
- `median`
- `std_dev`

Additionally, these features allow for filtering by PHQ Rank, as demonstrated in the example below.

#### Setup SDK parameters
Specify a list of Attendance-Based Event Features to return.

In [24]:
CATEGORIIES_ATTENDED = [
    "phq_attendance_sports",
    "phq_attendance_conferences",
    "phq_attendance_expos",
    "phq_attendance_concerts",
    "phq_attendance_festivals",
    "phq_attendance_performing_arts",
    "phq_attendance_community",
    "phq_attendance_school_holidays",
]

# only return sum of the attendance
stats = ["sum"]

for i in CATEGORIIES_ATTENDED:
    parameters.update({f"{i}__stats": stats})

parameters

{'location__geo': {'lat': '40.7079', 'lon': '-74.0115', 'radius': '2.08mi'},
 'active__gte': '2021-09-01',
 'active__lte': '2021-11-28',
 'phq_attendance_sports__stats': ['sum'],
 'phq_attendance_conferences__stats': ['sum'],
 'phq_attendance_expos__stats': ['sum'],
 'phq_attendance_concerts__stats': ['sum'],
 'phq_attendance_festivals__stats': ['sum'],
 'phq_attendance_performing_arts__stats': ['sum'],
 'phq_attendance_community__stats': ['sum'],
 'phq_attendance_school_holidays__stats': ['sum']}

#### Rank filter
Events of low or high rank can be filtered out when calculating features. Set a lower bound (`gte`/`gt`: greater than or equal to / greater than) and/or an upper bound (`lte`/`lt`: less than or equal to / less than) on PHQ Rank for the features of interest. For instance, a lower bound lets you exclude smaller events if you initially want to focus on larger ones.

For more on how rank relates to attendance, please refer to [Predicted Attendance](https://docs.predicthq.com/getting-started/predicthq-data/predicted-attendance).

In [25]:
PHQ_RANK_THRESHOLD = 30

# Apply the same minimum PHQ Rank to every attendance-based feature
for i in CATEGORIIES_ATTENDED:
    parameters.update({f"{i}__phq_rank": {'gte': PHQ_RANK_THRESHOLD}})

parameters

{'location__geo': {'lat': '40.7079', 'lon': '-74.0115', 'radius': '2.08mi'},
 'active__gte': '2021-09-01',
 'active__lte': '2021-11-28',
 'phq_attendance_sports__stats': ['sum'],
 'phq_attendance_conferences__stats': ['sum'],
 'phq_attendance_expos__stats': ['sum'],
 'phq_attendance_concerts__stats': ['sum'],
 'phq_attendance_festivals__stats': ['sum'],
 'phq_attendance_performing_arts__stats': ['sum'],
 'phq_attendance_community__stats': ['sum'],
 'phq_attendance_school_holidays__stats': ['sum'],
 'phq_attendance_sports__phq_rank': {'gte': 30},
 'phq_attendance_conferences__phq_rank': {'gte': 30},
 'phq_attendance_expos__phq_rank': {'gte': 30},
 'phq_attendance_concerts__phq_rank': {'gte': 30},
 'phq_attendance_festivals__phq_rank': {'gte': 30},
 'phq_attendance_performing_arts__phq_rank': {'gte': 30},
 'phq_attendance_community__phq_rank': {'gte': 30},
 'phq_attendance_school_holidays__phq_rank': {'gte': 30}}
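
The same key pattern can carry both a lower and an upper bound. The helper below is a hypothetical sketch, not an SDK function; the `<feature>__phq_rank` key format follows the parameters shown above, and the thresholds 30 and 80 are arbitrary sample values.

```python
def rank_filter_params(feature_names, gte=None, lte=None):
    """Build `<feature>__phq_rank` filters with optional lower/upper bounds."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    if not bounds:
        raise ValueError("Set at least one of gte or lte.")
    # Copy the bounds per feature so later edits to one entry do not leak.
    return {f"{name}__phq_rank": dict(bounds) for name in feature_names}


rank_params = rank_filter_params(
    ["phq_attendance_sports", "phq_attendance_concerts"], gte=30, lte=80
)
print(rank_params)
```

The returned dictionary can be merged into the query with `parameters.update(rank_params)`.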

#### Query features

In [27]:
results = []

for feature in phq.features.obtain_features(parameters):
    results.append(flatten_dict(feature.to_dict(), '', '_'))

feature_df = pd.DataFrame(results)

feature_df.head()



Unnamed: 0,date,phq_attendance_community_stats_sum,phq_attendance_concerts_stats_sum,phq_attendance_conferences_stats_sum,phq_attendance_expos_stats_sum,phq_attendance_festivals_stats_sum,phq_attendance_performing_arts_stats_sum,phq_attendance_school_holidays_stats_sum,phq_attendance_sports_stats_sum
0,2021-09-01,360.0,2247.0,0.0,593.0,0.0,232.0,1259481.0,0.0
1,2021-09-02,540.0,948.0,0.0,510.0,0.0,232.0,1259481.0,0.0
2,2021-09-03,440.0,3288.0,0.0,791.0,0.0,632.0,1259481.0,0.0
3,2021-09-04,960.0,1177.0,0.0,780.0,45869.0,864.0,1259481.0,240.0
4,2021-09-05,520.0,1829.0,0.0,514.0,38224.0,1064.0,1259481.0,0.0


#### Additional Features for School Holidays

<b>Two additional useful features can be derived from </b> `phq_attendance_school_holidays_stats_sum`:
* `phq_school_holidays_first_day_flag`: A binary variable indicating whether the day is the first day of any school holidays within the selected radius at the specified location.
* `phq_school_holidays_last_day_flag`: A binary variable indicating whether the day is the last day of any school holidays within the selected radius at the specified location.
* Please note that the first row's `phq_school_holidays_first_day_flag` and the last row's `phq_school_holidays_last_day_flag` may be <b>NaN</b>, because both features are derived from your selected time range, which may not cover the entire school holiday period.

In [28]:
# creating shifted attendance
feature_df['temp_pre'] = feature_df['phq_attendance_school_holidays_stats_sum'].shift(1)
feature_df['temp_after'] = feature_df['phq_attendance_school_holidays_stats_sum'].shift(-1)

# first day flag: 1 if attendance is higher than on the previous day
feature_df['phq_school_holidays_first_day_flag'] = feature_df[[
    'phq_attendance_school_holidays_stats_sum','temp_pre']].apply(
        lambda x: np.nan if pd.isna(x.iloc[1]) else 1 if x.iloc[0] > x.iloc[1] else 0, axis=1)

# last day flag: 1 if attendance is higher than on the next day
feature_df['phq_school_holidays_last_day_flag'] = feature_df[[
    'phq_attendance_school_holidays_stats_sum','temp_after']].apply(
        lambda x: np.nan if pd.isna(x.iloc[1]) else 1 if x.iloc[0] > x.iloc[1] else 0, axis=1)

# remove temporary features
feature_df = feature_df.drop(['temp_pre', 'temp_after'], axis=1)

feature_df.head()

Unnamed: 0,date,phq_attendance_community_stats_sum,phq_attendance_concerts_stats_sum,phq_attendance_conferences_stats_sum,phq_attendance_expos_stats_sum,phq_attendance_festivals_stats_sum,phq_attendance_performing_arts_stats_sum,phq_attendance_school_holidays_stats_sum,phq_attendance_sports_stats_sum,phq_school_holidays_first_day_flag,phq_school_holidays_last_day_flag
0,2021-09-01,360.0,2247.0,0.0,593.0,0.0,232.0,1259481.0,0.0,,0.0
1,2021-09-02,540.0,948.0,0.0,510.0,0.0,232.0,1259481.0,0.0,0.0,0.0
2,2021-09-03,440.0,3288.0,0.0,791.0,0.0,632.0,1259481.0,0.0,0.0,0.0
3,2021-09-04,960.0,1177.0,0.0,780.0,45869.0,864.0,1259481.0,240.0,0.0,0.0
4,2021-09-05,520.0,1829.0,0.0,514.0,38224.0,1064.0,1259481.0,0.0,0.0,0.0
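
The same flags can be computed without a row-wise `apply`. The sketch below is one possible vectorized variant run on made-up data: the helper name `add_school_holiday_flags` and the toy values are assumptions for illustration, while the column names follow the ones used above. A day is flagged as a first day when attendance is higher than on the previous day, and as a last day when it is higher than on the next day.

```python
import numpy as np
import pandas as pd


def add_school_holiday_flags(df, col="phq_attendance_school_holidays_stats_sum"):
    """Return a copy of df with first-day and last-day flags (hypothetical helper)."""
    out = df.copy()
    prev_day = out[col].shift(1)
    next_day = out[col].shift(-1)
    # compare with the neighbouring day, then blank out rows with no neighbour
    first = (out[col] > prev_day).astype(float).where(prev_day.notna())
    last = (out[col] > next_day).astype(float).where(next_day.notna())
    out["phq_school_holidays_first_day_flag"] = first
    out["phq_school_holidays_last_day_flag"] = last
    return out


# toy data: a school holiday covering 2021-09-02 and 2021-09-03
toy = pd.DataFrame({
    "date": ["2021-09-01", "2021-09-02", "2021-09-03", "2021-09-04", "2021-09-05"],
    "phq_attendance_school_holidays_stats_sum": [0.0, 100.0, 100.0, 0.0, 0.0],
})
flagged = add_school_holiday_flags(toy)
print(flagged[["date", "phq_school_holidays_first_day_flag",
               "phq_school_holidays_last_day_flag"]])
```

As in the cell above, the first row's first-day flag and the last row's last-day flag stay NaN because there is no neighbouring day to compare against.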


<a id='rank'></a>
### Rank based features

This group of features is based on PHQ Rank and covers non-attendance-based events (primarily scheduled ones). The following features are supported:

- `phq_rank_public_holidays`
- `phq_rank_school_holidays` (For US and UK we recommend using `phq_attendance_school_holidays`)
- `phq_rank_observances`
- `phq_rank_academic_session`
- `phq_rank_academic_exam`
- `phq_rank_academic_holiday`

PHQ Rank is on a scale of 0 to 100 and the levels are bucketed as:

- 1 - Minor (rank between 0 and 20)
- 2 - Moderate (rank between 21 and 40)
- 3 - Important (rank between 41 and 60)
- 4 - Significant (rank between 61 and 80)
- 5 - Major (rank between 81 and 100)
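
To make the bucketing concrete, here is a small sketch that maps a PHQ Rank to its level using the boundaries listed above. The function name `phq_rank_level` is hypothetical and only illustrates the mapping; the Features API returns the per-level counts directly.

```python
def phq_rank_level(rank):
    """Map a PHQ Rank (0 to 100) to its level (1 to 5)."""
    if not 0 <= rank <= 100:
        raise ValueError(f"PHQ Rank must be between 0 and 100, got {rank}")
    # (upper bound of the bucket, level)
    upper_bounds = [(20, 1), (40, 2), (60, 3), (80, 4), (100, 5)]
    for upper, level in upper_bounds:
        if rank <= upper:
            return level


levels = [phq_rank_level(r) for r in (0, 20, 21, 60, 61, 100)]
print(levels)  # [1, 1, 2, 3, 4, 5]
```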

Additional filtering for PHQ Rank features is not currently supported.

#### Setup SDK parameters

Specify a list of Rank-Based Event Features to return.

In [29]:
CATEGORIIES_RANK = [
     "phq_rank_observances",
     "phq_rank_public_holidays",
     "phq_rank_school_holidays",
     "phq_rank_academic_session",
     "phq_rank_academic_exam",
     "phq_rank_academic_holiday",
]

for i in CATEGORIIES_RANK:
    parameters.update({f"{i}": True})

parameters

{'location__geo': {'lat': '40.7079', 'lon': '-74.0115', 'radius': '2.08mi'},
 'active__gte': '2021-09-01',
 'active__lte': '2021-11-28',
 'phq_attendance_sports__stats': ['sum'],
 'phq_attendance_conferences__stats': ['sum'],
 'phq_attendance_expos__stats': ['sum'],
 'phq_attendance_concerts__stats': ['sum'],
 'phq_attendance_festivals__stats': ['sum'],
 'phq_attendance_performing_arts__stats': ['sum'],
 'phq_attendance_community__stats': ['sum'],
 'phq_attendance_school_holidays__stats': ['sum'],
 'phq_attendance_sports__phq_rank': {'gte': 30},
 'phq_attendance_conferences__phq_rank': {'gte': 30},
 'phq_attendance_expos__phq_rank': {'gte': 30},
 'phq_attendance_concerts__phq_rank': {'gte': 30},
 'phq_attendance_festivals__phq_rank': {'gte': 30},
 'phq_attendance_performing_arts__phq_rank': {'gte': 30},
 'phq_attendance_community__phq_rank': {'gte': 30},
 'phq_attendance_school_holidays__phq_rank': {'gte': 30},
 'phq_rank_observances': True,
 'phq_rank_public_holidays': True,
 'phq_ran

#### Query features

In [30]:
results = []

for feature in phq.features.obtain_features(**parameters):
    results.append(flatten_dict(feature.to_dict(), '', '_'))

feature_df = pd.DataFrame(results)

feature_df.head()

Unnamed: 0,date,phq_attendance_community_stats_sum,phq_attendance_concerts_stats_sum,phq_attendance_conferences_stats_sum,phq_attendance_expos_stats_sum,phq_attendance_festivals_stats_sum,phq_attendance_performing_arts_stats_sum,phq_attendance_school_holidays_stats_sum,phq_attendance_sports_stats_sum,phq_rank_observances_rank_levels_1,phq_rank_observances_rank_levels_2,phq_rank_observances_rank_levels_3,phq_rank_observances_rank_levels_4,phq_rank_observances_rank_levels_5,phq_rank_public_holidays_rank_levels_1,phq_rank_public_holidays_rank_levels_2,phq_rank_public_holidays_rank_levels_3,phq_rank_public_holidays_rank_levels_4,phq_rank_public_holidays_rank_levels_5,phq_rank_school_holidays_rank_levels_1,phq_rank_school_holidays_rank_levels_2,phq_rank_school_holidays_rank_levels_3,phq_rank_school_holidays_rank_levels_4,phq_rank_school_holidays_rank_levels_5,phq_rank_academic_session_rank_levels_1,phq_rank_academic_session_rank_levels_2,phq_rank_academic_session_rank_levels_3,phq_rank_academic_session_rank_levels_4,phq_rank_academic_session_rank_levels_5,phq_rank_academic_exam_rank_levels_1,phq_rank_academic_exam_rank_levels_2,phq_rank_academic_exam_rank_levels_3,phq_rank_academic_exam_rank_levels_4,phq_rank_academic_exam_rank_levels_5,phq_rank_academic_holiday_rank_levels_1,phq_rank_academic_holiday_rank_levels_2,phq_rank_academic_holiday_rank_levels_3,phq_rank_academic_holiday_rank_levels_4,phq_rank_academic_holiday_rank_levels_5
0,2021-09-01,360.0,2247.0,0.0,593.0,0.0,232.0,1259481.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,3,0,0,0,5,0,0,0,0,0,0,0,0,0,3,0
1,2021-09-02,540.0,948.0,0.0,510.0,0.0,232.0,1259481.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,3,0,0,0,6,0,0,0,0,0,0,0,0,0,2,0
2,2021-09-03,440.0,3288.0,0.0,791.0,0.0,632.0,1259481.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,3,0,0,0,6,0,0,0,0,0,0,0,0,0,2,0
3,2021-09-04,960.0,1177.0,0.0,780.0,45869.0,864.0,1259481.0,240.0,0,0,1,0,0,0,0,0,0,0,0,0,2,0,3,0,0,0,6,0,0,0,0,0,0,0,0,0,2,0
4,2021-09-05,520.0,1829.0,0.0,514.0,38224.0,1064.0,1259481.0,0.0,0,0,1,0,0,0,0,0,0,0,0,0,2,0,3,0,0,0,6,0,0,0,0,0,0,0,0,0,2,0


#### Aggregate Rank Levels
As mentioned earlier, each feature encompasses five rank levels. These levels can be aggregated based on specific requirements. Below is an example demonstrating how to aggregate rank levels 3 and 4 of `phq_rank_observances` by calculating the level-weighted sum:

In [32]:
# Columns to be aggregated
columns_to_aggregate = ['phq_rank_observances_rank_levels_3', 'phq_rank_observances_rank_levels_4']
# Add a new column with the level-weighted sum:
# the level number (last token of the column name) multiplied by the count in that column
feature_df['phq_rank_observances_rank_agg'] = feature_df.apply(
    lambda row: sum(int(col.split('_')[-1]) * int(row[col]) for col in columns_to_aggregate), axis=1)
# Display the new column with aggregated values
feature_df['phq_rank_observances_rank_agg']

0     0
1     0
2     0
3     3
4     3
     ..
84    0
85    3
86    6
87    0
88    3
Name: phq_rank_observances_rank_agg, Length: 89, dtype: int64
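
The level-weighted sum can also be written as a reusable, vectorized helper. This is a sketch on made-up data: `weighted_rank_level_sum` is a hypothetical name, and it assumes the flattened column naming `<feature>_rank_levels_<level>` shown in the output above.

```python
import pandas as pd


def weighted_rank_level_sum(df, feature, levels):
    """Sum of level * count over the chosen rank levels (hypothetical helper)."""
    return sum(
        level * df[f"{feature}_rank_levels_{level}"].astype(int)
        for level in levels
    )


# toy data: counts of observances at levels 3 and 4 on three days
toy = pd.DataFrame({
    "phq_rank_observances_rank_levels_3": [0, 1, 2],
    "phq_rank_observances_rank_levels_4": [0, 0, 1],
})
agg = weighted_rank_level_sum(toy, "phq_rank_observances", levels=(3, 4))
print(agg.tolist())  # [0, 3, 10]
```

Passing a different `levels` tuple, for example `(4, 5)`, aggregates only the higher-impact buckets.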

<a id='impact'></a>
### Impact-Based Features (Severe Weather)

<b>Severe weather</b> is currently the only category encompassing impact based features. These features utilize demand impact patterns to gauge the duration of impact caused by severe weather events, drawing upon industry-specific information. At present, our severe weather features are tailored and validated solely for the retail sector. For instance, during a flood event, the impact pattern might indicate a typical effect on retail businesses 1 day prior and 2 days post-event. This impact pattern information is employed in the features outlined below. If your business operates within a different industry segment such as Accommodation or Travel, the following features may not suit your needs or may exhibit reduced effectiveness.

For more details on severe weather, please refer to [Data Science Guide](https://docs.predicthq.com/datascience/severe-weather-events).

There are 13 features available for the retail industry from the Features API:

- `phq_impact_severe_weather_air_quality_retail`
- `phq_impact_severe_weather_blizzard_retail`
- `phq_impact_severe_weather_cold_wave_retail`
- `phq_impact_severe_weather_cold_wave_snow_retail`
- `phq_impact_severe_weather_cold_wave_storm_retail`
- `phq_impact_severe_weather_dust_retail`
- `phq_impact_severe_weather_dust_storm_retail`
- `phq_impact_severe_weather_flood_retail`
- `phq_impact_severe_weather_heat_wave_retail`
- `phq_impact_severe_weather_hurricane_retail`
- `phq_impact_severe_weather_thunderstorm_retail`
- `phq_impact_severe_weather_tornado_retail`
- `phq_impact_severe_weather_tropical_storm_retail`

Similar to the attendance-based features, each of these features supports 7 statistics:
- `sum` 
- `count` 
- `min`
- `max` (Recommended as a starting point)
- `avg`
- `median`
- `std_dev`
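
Because only these seven statistics are listed, it can help to validate the requested names before building a query. The sketch below is illustrative only: `SUPPORTED_STATS` and `build_stats_params` are hypothetical names, and the `__stats` key suffix is the one used elsewhere in this guide. No API call is made.

```python
# the seven statistics listed above
SUPPORTED_STATS = {"sum", "count", "min", "max", "avg", "median", "std_dev"}


def build_stats_params(features, stats=("max",)):
    """Build '<feature>__stats' parameters after checking the statistic names."""
    unknown = set(stats) - SUPPORTED_STATS
    if unknown:
        raise ValueError(f"Unsupported statistics: {sorted(unknown)}")
    return {f"{feature}__stats": list(stats) for feature in features}


impact_params = build_stats_params(
    ["phq_impact_severe_weather_flood_retail"], stats=("max", "avg")
)
print(impact_params)
```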

#### Radius setting for Severe Weather

The distance between a store and an event is determined by the shortest distance between the store and the points forming the event's polygon. If the store is located within the event's polygon, the distance is considered to be 0 km. By default, the radius is set to 0 km, implying that the events utilized for aggregation and feature engineering have polygons that overlap with the store's location.

<b>Note: When using the `location__geo` parameter in the Features API to query features within a specified radius, select a radius of 1 meter (about 3 feet), because `location__geo` does not support a radius of 0.</b>

#### Setup SDK parameters

In [33]:
parameters = dict()
# update the radius for impact-based features
RADIUS_SERVE_WEATHER = "1m" # radius in meters 
parameters.update(location__geo=dict(lat=LATTITUDE, lon=LONGITUDE,radius=RADIUS_SERVE_WEATHER)) 

parameters.update(active__gte = START_TIME)
parameters.update(active__lte = END_TIME)

parameters

{'location__geo': {'lat': '40.7079', 'lon': '-74.0115', 'radius': '1m'},
 'active__gte': '2021-09-01',
 'active__lte': '2021-11-28'}

In [34]:
CATEGORIIES_IMPACT = [
     "phq_impact_severe_weather_air_quality_retail",
     "phq_impact_severe_weather_blizzard_retail",
     "phq_impact_severe_weather_cold_wave_retail",
     "phq_impact_severe_weather_cold_wave_snow_retail",
     "phq_impact_severe_weather_cold_wave_storm_retail",
     "phq_impact_severe_weather_dust_retail",
     "phq_impact_severe_weather_dust_storm_retail",
     "phq_impact_severe_weather_flood_retail",
     "phq_impact_severe_weather_heat_wave_retail",
     "phq_impact_severe_weather_hurricane_retail",
     "phq_impact_severe_weather_thunderstorm_retail",
     "phq_impact_severe_weather_tornado_retail",
     "phq_impact_severe_weather_tropical_storm_retail",
]

# only return max of the impact-based features
stats = ["max"]
for i in CATEGORIIES_IMPACT:
    parameters.update({f"{i}": {'stats': stats}})

parameters

{'location__geo': {'lat': '40.7079', 'lon': '-74.0115', 'radius': '1m'},
 'active__gte': '2021-09-01',
 'active__lte': '2021-11-28',
 'phq_impact_severe_weather_air_quality_retail': {'stats': ['max']},
 'phq_impact_severe_weather_blizzard_retail': {'stats': ['max']},
 'phq_impact_severe_weather_cold_wave_retail': {'stats': ['max']},
 'phq_impact_severe_weather_cold_wave_snow_retail': {'stats': ['max']},
 'phq_impact_severe_weather_cold_wave_storm_retail': {'stats': ['max']},
 'phq_impact_severe_weather_dust_retail': {'stats': ['max']},
 'phq_impact_severe_weather_dust_storm_retail': {'stats': ['max']},
 'phq_impact_severe_weather_flood_retail': {'stats': ['max']},
 'phq_impact_severe_weather_heat_wave_retail': {'stats': ['max']},
 'phq_impact_severe_weather_hurricane_retail': {'stats': ['max']},
 'phq_impact_severe_weather_thunderstorm_retail': {'stats': ['max']},
 'phq_impact_severe_weather_tornado_retail': {'stats': ['max']},
 'phq_impact_severe_weather_tropical_storm_retail': {'stat

#### Query features

In [35]:
results = []

for feature in phq.features.obtain_features(**parameters):
    results.append(flatten_dict(feature.to_dict(), '', '_'))

feature_df = pd.DataFrame(results)

feature_df.head()

Unnamed: 0,date,phq_impact_severe_weather_air_quality_retail_stats_max,phq_impact_severe_weather_blizzard_retail_stats_max,phq_impact_severe_weather_cold_wave_retail_stats_max,phq_impact_severe_weather_cold_wave_snow_retail_stats_max,phq_impact_severe_weather_cold_wave_storm_retail_stats_max,phq_impact_severe_weather_dust_retail_stats_max,phq_impact_severe_weather_dust_storm_retail_stats_max,phq_impact_severe_weather_flood_retail_stats_max,phq_impact_severe_weather_heat_wave_retail_stats_max,phq_impact_severe_weather_hurricane_retail_stats_max,phq_impact_severe_weather_thunderstorm_retail_stats_max,phq_impact_severe_weather_tornado_retail_stats_max,phq_impact_severe_weather_tropical_storm_retail_stats_max
0,2021-09-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,82.0,0.0,0.0,86.0,60.0,0.0
1,2021-09-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,80.0,0.0,0.0,34.0,0.0,0.0
2,2021-09-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,28.0,0.0,0.0,0.0,0.0,0.0
3,2021-09-04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2021-09-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


<a id='wide_range'></a>
## Using a longer date range
As previously mentioned, the Features API permits a date range of up to 90 days per request. To retrieve data for a longer range, multiple requests are necessary. In this section, we will demonstrate how to extract features for periods exceeding 90 days. 

This approach may be beneficial if you aim to download a historical data set for training your model.

<a id='functions_wide'></a>
### Common functions
Below are functions designed to split a wider date range into multiple 90-day intervals:

In [36]:
DATE_FORMAT = "%Y-%m-%d"
FEATURES_API_URL = "https://api.predicthq.com/v1/features"

phq = Client(access_token=ACCESS_TOKEN)

def get_date_groups(start, end, max_days=90):
    """
    The Features API allows a range of up to 90 days per request, so a longer
    range is split into consecutive, non-overlapping (gte, lte) pairs of date strings.
    """
    chunk_start = start
    while chunk_start <= end:
        # each chunk covers at most max_days days, both ends inclusive
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        yield chunk_start.strftime(DATE_FORMAT), chunk_end.strftime(DATE_FORMAT)
        chunk_start = chunk_end + timedelta(days=1)
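
When splitting a long range, it is worth checking that the resulting intervals are contiguous, do not overlap, and stay within the 90-day limit, since a duplicated or missing day would distort the training data. The following is a sketch of such a check; `validate_date_groups` is a hypothetical helper, and the example intervals are written by hand rather than produced by the API.

```python
from datetime import date, datetime, timedelta

DATE_FORMAT = "%Y-%m-%d"


def validate_date_groups(groups, start, end, max_days=90):
    """Return True if groups tile [start, end] in chunks of at most max_days days."""
    expected_start = start
    for gte, lte in groups:
        d1 = datetime.strptime(gte, DATE_FORMAT).date()
        d2 = datetime.strptime(lte, DATE_FORMAT).date()
        if d1 != expected_start:
            raise ValueError(f"gap or overlap before {gte}")
        if d2 < d1:
            raise ValueError(f"interval {gte}..{lte} ends before it starts")
        if (d2 - d1).days + 1 > max_days:
            raise ValueError(f"interval {gte}..{lte} is longer than {max_days} days")
        expected_start = d2 + timedelta(days=1)
    if expected_start != end + timedelta(days=1):
        raise ValueError("intervals do not finish on the requested end date")
    return True


groups = [
    ("2021-01-01", "2021-03-31"),  # 90 days
    ("2021-04-01", "2021-06-29"),  # 90 days
    ("2021-06-30", "2021-07-15"),  # remaining 16 days
]
is_valid = validate_date_groups(groups, date(2021, 1, 1), date(2021, 7, 15))
print(is_valid)
```

The same check can be applied to `list(get_date_groups(start, end))` before sending any requests.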

<a id='attend_wide'></a>
### Attendance-based features

In [40]:
def get_features_api_attendance_data(lat, lon, start, end, radius_filter, rank_threshold):
    """
    Retrieves attendance-based event features from the Features API for a given location and date range, keeping only events at or above a minimum PHQ Rank.

    Parameters:
    lat: Latitude of the location.
    lon: Longitude of the location.
    start: Start date of the range.
    end: End date of the range.
    radius_filter: The radius filter for geo-location query, e.g., "10km".
    rank_threshold: The minimum PHQ rank threshold for filtering events.

    Returns:
    list: A list of dictionaries, one per date, each containing the attendance sum of every attendance-based feature.
    """
    
    start = datetime.strptime(start, DATE_FORMAT).date()
    end = datetime.strptime(end, DATE_FORMAT).date()

    result = []
    for gte, lte in get_date_groups(start, end):
        query = {
            "active__gte": gte,
            "active__lte": lte,
            "location__geo": {"lat": lat, "lon": lon, "radius": radius_filter},
        }

        query.update({f"{f}__stats": ["sum"] for f in CATEGORIIES_ATTENDED})
        query.update(
            {f"{f}__phq_rank": {"gte": rank_threshold} for f in CATEGORIIES_ATTENDED}
        )

        features = phq.features.obtain_features(**query)

        for feature in features:
            record = {}
            for k, v in feature.to_dict().items():
                if k == "date":
                    record[k] = v.strftime("%Y-%m-%d")
                elif k in CATEGORIIES_ATTENDED:
                    record[k] = v.get("stats", {}).get("sum")
            result.append(record)

    return result

res = get_features_api_attendance_data(LATTITUDE, LONGITUDE, START_TIME, END_TIME, SUGGESTED_RADIUS, PHQ_RANK_THRESHOLD)
df_attended = pd.DataFrame(res)

df_attended.head()

Unnamed: 0,date,phq_attendance_community,phq_attendance_concerts,phq_attendance_conferences,phq_attendance_expos,phq_attendance_festivals,phq_attendance_performing_arts,phq_attendance_school_holidays,phq_attendance_sports
0,2021-09-01,360.0,2247.0,0.0,593.0,0.0,232.0,1259481.0,0.0
1,2021-09-02,540.0,948.0,0.0,510.0,0.0,232.0,1259481.0,0.0
2,2021-09-03,440.0,3288.0,0.0,791.0,0.0,632.0,1259481.0,0.0
3,2021-09-04,960.0,1177.0,0.0,780.0,45869.0,864.0,1259481.0,240.0
4,2021-09-05,520.0,1829.0,0.0,514.0,38224.0,1064.0,1259481.0,0.0


<a id='rank_wide'></a>
### Rank based features

In [41]:
def get_features_api_rank_data(lat, lon, start, end, radius_filter):
    """
    Retrieves rank-based event features from the Features API within a specified date range and location

    Parameters:
    lat: Latitude of the location.
    lon: Longitude of the location.
    start: Start date of the range.
    end: End date of the range.
    radius_filter: The radius filter for geo-location query, e.g., "10km".

    Returns:
    list: A list of dictionaries, one per date, each containing the rank-level counts of every rank-based feature.
    """

    start = datetime.strptime(start, DATE_FORMAT).date()
    end = datetime.strptime(end, DATE_FORMAT).date()

    result = []
    for gte, lte in get_date_groups(start, end):
        query = {
            "active__gte": gte,
            "active__lte": lte,
            "location__geo": {"lat": lat, "lon": lon, "radius": radius_filter},
        }

        query.update({f"{f}": True for f in CATEGORIIES_RANK})

        features = phq.features.obtain_features(**query)

        for feature in features:
            record = {}
            for k, v in feature.to_dict().items():
                if k == "date":
                    record[k] = v.strftime("%Y-%m-%d")
                elif k in CATEGORIIES_RANK:
                    for rank_level, level_count in v.get("rank_levels", {}).items():
                        record[f"{k}_level_{rank_level}"] = float(level_count)

            result.append(record)

    return result

res = get_features_api_rank_data(LATTITUDE, LONGITUDE, START_TIME, END_TIME, SUGGESTED_RADIUS)
df_rank = pd.DataFrame(res)
df_rank.head()

Unnamed: 0,date,phq_rank_observances_level_1,phq_rank_observances_level_2,phq_rank_observances_level_3,phq_rank_observances_level_4,phq_rank_observances_level_5,phq_rank_public_holidays_level_1,phq_rank_public_holidays_level_2,phq_rank_public_holidays_level_3,phq_rank_public_holidays_level_4,phq_rank_public_holidays_level_5,phq_rank_school_holidays_level_1,phq_rank_school_holidays_level_2,phq_rank_school_holidays_level_3,phq_rank_school_holidays_level_4,phq_rank_school_holidays_level_5,phq_rank_academic_session_level_1,phq_rank_academic_session_level_2,phq_rank_academic_session_level_3,phq_rank_academic_session_level_4,phq_rank_academic_session_level_5,phq_rank_academic_exam_level_1,phq_rank_academic_exam_level_2,phq_rank_academic_exam_level_3,phq_rank_academic_exam_level_4,phq_rank_academic_exam_level_5,phq_rank_academic_holiday_level_1,phq_rank_academic_holiday_level_2,phq_rank_academic_holiday_level_3,phq_rank_academic_holiday_level_4,phq_rank_academic_holiday_level_5
0,2021-09-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,3.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0
1,2021-09-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,3.0,0.0,0.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0
2,2021-09-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,3.0,0.0,0.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0
3,2021-09-04,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,3.0,0.0,0.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0
4,2021-09-05,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,3.0,0.0,0.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0


<a id='impact_wide'></a>
### Impact-based features
<b> Severe Weather Features </b>

In [43]:
def get_features_api_impact_events(lat, lon, start, end, rank_threshold, radius_filter = RADIUS_SERVE_WEATHER):
    """
    Retrieves impact-based event features from the Features API within a specified date range and location

    Parameters:
    lat: Latitude of the location.
    lon: Longitude of the location.
    start: Start date of the range in 'YYYY-MM-DD' format.
    end: End date of the range in 'YYYY-MM-DD' format.
    rank_threshold: The minimum PHQ rank threshold for filtering events.
    radius_filter: The radius filter for geo-location query, '1m' is the default for impact-based features.
    
    Returns:
    list: A list of dictionaries, one per date, each containing the max impact of every impact-based feature.
    """
    start = datetime.strptime(start, DATE_FORMAT).date()
    end = datetime.strptime(end, DATE_FORMAT).date()

    result = []
    for gte, lte in get_date_groups(start, end):
        query = {
            "active__gte": gte,
            "active__lte": lte,
            "location__geo": {"lat": lat, "lon": lon, "radius": radius_filter},
        }

        query.update({f"{f}__stats": ["max"] for f in CATEGORIIES_IMPACT})
        query.update(
            {f"{f}__phq_rank": {"gte": rank_threshold} for f in CATEGORIIES_IMPACT}
        )

        features = phq.features.obtain_features(**query)

        for feature in features:
            record = {}
            for k, v in feature.to_dict().items():
                if k == "date":
                    record[k] = v.strftime("%Y-%m-%d")
                elif k in CATEGORIIES_IMPACT:
                    record[k] = v.get("stats", {}).get("max")

            result.append(record)

    return result



res = get_features_api_impact_events(LATTITUDE, LONGITUDE, START_TIME, END_TIME, PHQ_RANK_THRESHOLD, RADIUS_SERVE_WEATHER)
df_impact_features = pd.DataFrame(res)

# drop features that only contain 0s
feature_columns = df_impact_features.columns.drop("date")
columns_constant = [
    col
    for col in feature_columns
    if df_impact_features[col].sum() == 0
]
df_impact_features.drop(columns=columns_constant, inplace=True)

df_impact_features

Unnamed: 0,date,phq_impact_severe_weather_cold_wave_retail,phq_impact_severe_weather_flood_retail,phq_impact_severe_weather_thunderstorm_retail,phq_impact_severe_weather_tornado_retail
0,2021-09-01,0.0,82.0,86.0,60.0
1,2021-09-02,0.0,80.0,34.0,0.0
2,2021-09-03,0.0,28.0,0.0,0.0
3,2021-09-04,0.0,0.0,0.0,0.0
4,2021-09-05,0.0,0.0,0.0,0.0
...,...,...,...,...,...
84,2021-11-24,0.0,0.0,0.0,0.0
85,2021-11-25,0.0,0.0,0.0,0.0
86,2021-11-26,0.0,0.0,0.0,0.0
87,2021-11-27,0.0,0.0,0.0,0.0
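
Since each of the three data frames carries a `date` column, they can be combined into a single feature table and later joined to daily training data. The sketch below uses made-up values and a hypothetical helper name, `merge_feature_frames`; it assumes one row per date in each frame and uses an outer join so that no date is lost.

```python
from functools import reduce

import pandas as pd


def merge_feature_frames(frames, on="date"):
    """Outer-join a list of feature data frames on the date column."""
    return reduce(lambda left, right: left.merge(right, on=on, how="outer"), frames)


# toy stand-ins for the attendance, rank, and impact data frames
df_a = pd.DataFrame({"date": ["2021-09-01", "2021-09-02"],
                     "phq_attendance_sports": [0.0, 240.0]})
df_r = pd.DataFrame({"date": ["2021-09-01", "2021-09-02"],
                     "phq_rank_observances_level_3": [0.0, 1.0]})
df_i = pd.DataFrame({"date": ["2021-09-01", "2021-09-02"],
                     "phq_impact_severe_weather_flood_retail": [82.0, 80.0]})

features = (
    merge_feature_frames([df_a, df_r, df_i])
    .sort_values("date")
    .reset_index(drop=True)
)
print(features)
```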

