# Feature Engineering Guide


<b>A how-to guide with details on how to create machine learning features using PredictHQ's Features API</b>. <br>
<b>The aim of this notebook is to showcase how the Features API can be used to create features for a location and date range of your choice</b>.

- [Overview](#overview)
- [How to use event based features in your models](#usage)
- [The Features API features summary](#summary)
    - [Features for attendance and rank based events](#summary_attend_rank)
    - [Features for severe weather (retail only)](#summary_weather)
- [Setup](#setup)
- [Access token](#access_token)
- [SDK parameters](#setting_params)
- [Using the Features API to query features for forecasting](#features_api)
    - [Functions for formatting data frames](#functions)
    - [Attendance based features](#attend)
    - [Rank based features](#rank)
    - [Impact based features](#impact)
- [Using a longer date range](#wide_range)
    - [Common functions](#functions_wide)
    - [Attendance based features](#attend_wide)
    - [Rank based features](#rank_wide)
    - [Impact based features](#impact_wide)

<a id='overview'></a>
## Overview
This guide shows how to create event-based features for demand forecasting using PredictHQ's Features API through the SDK.

<a id='usage'></a>
## How to use event based features in your models

1. Exploration of Available Event-based Features
   - Familiarize yourself with all the event-based features outlined in this guide.
2. Data Preparation
   - Select your location of interest by specifying the latitude and longitude coordinates.
   - Generate a suggested radius for your industry using the Suggested Radius API.
   - Define the time period of interest with a start and end date, which will be used for the Features API query.
   - Aggregate your training data on a daily basis, and keep the date as a column so the event-based features can be joined to it later.
3. Event-based Features Evaluation
   - Integrate event-based features into your model.
   - Assess model performance and the importance of the newly incorporated features.
4. Model Selection
   - Choose your final model and prepare it for deployment in a production environment.
5. Engineering Collaboration
   - Collaborate with your engineering team to incorporate the new features into your production pipeline.
   - Utilize the Features API for querying and retrieving these features as needed.
6. Production Deployment
   - Deploy your enhanced model, now integrated with event-based features, in a production setting.

<p>Below is a simplified outline of how to integrate event-based features into your system. For a more robust production implementation, store or cache the features retrieved from the Features API before using them in production, because online service calls always carry some risk of failure. Then refresh your cached copy of the features on a regular basis.</p>

<img src="./features-engineering-architecture-diagram.png">
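
The caching advice above can be sketched as follows. This is a minimal illustration, not part of the PredictHQ SDK: `load_features`, the JSON file cache, and the one-day refresh interval are all assumptions, and `fetch_fn` stands for any hypothetical zero-argument function that wraps your own Features API query.

```python
import json
import time
from pathlib import Path


def load_features(cache_path, fetch_fn, max_age_seconds=24 * 60 * 60):
    """Return cached features while they are fresh; otherwise refresh them.

    fetch_fn is a hypothetical zero-argument callable that wraps your
    Features API query and returns JSON-serialisable data.
    """
    path = Path(cache_path)
    if path.exists():
        age = time.time() - path.stat().st_mtime
        if age < max_age_seconds:
            # The cached copy is recent enough, so skip the online call.
            return json.loads(path.read_text())
    try:
        features = fetch_fn()
    except Exception:
        # The online call failed: fall back to a stale copy if one exists.
        if path.exists():
            return json.loads(path.read_text())
        raise
    # Store the fresh result so later runs can reuse it.
    path.write_text(json.dumps(features))
    return features
```

With this shape, a production job reads from the local copy and only depends on the online service when the copy is older than the chosen refresh interval.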


<a id='summary'></a>
## The Features API features summary
<p>Below is a summary of available features from the Features API, which you may consider integrating into your models. The table shows the name of each feature, the type of statistical value from the Features API to utilize for aggregation (e.g., sum represents the total of values of PHQ attendance on a specified day), and notes instructing on the appropriate radius setting for each feature. Further down in this guide, example code and detailed instructions on utilizing these features are provided.</p>


<a id='summary_attend_rank'></a>
### Features for Attendance and Rank Based Events
 
<table class="c28">
<tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Category</strong></span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Available Features from Features API</strong></span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Aggregation Stat Type</strong></span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Radius Setting Notes</strong></span></p></td></tr>
<tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Community</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_community</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Concerts</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_concerts</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Conferences</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_conferences</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Expos</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_expos</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Festivals</span></p></td><td class="c2" colspan="1" 
rowspan="1"><p class="c5"><span class="c7">phq_attendance_festivals</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Performing Arts</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_performing_arts</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Sports</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_sports</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use suggested radius or choose a radius around your location</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Observances</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_rank_observances</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">n/a</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use a radius of 1 mile</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Public Holidays</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_rank_public_holidays</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span 
class="c7">n/a</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use a radius of 1 mile</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">School Holidays</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_school_holidays</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use a radius of 1 mile</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">School Holidays</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_rank_school_holidays</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">n/a</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use a radius of 1 mile</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Academic Graduation</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_academic_graduation</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use a radius of 1 mile</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Academic Social</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_academic_social</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use a radius of 1 mile</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Academic 
Session</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_rank_academic_session</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">n/a</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use a radius of 1 mile</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Academic Exam</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_rank_academic_exam</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">n/a</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use a radius of 1 mile</span></p></td></tr><tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c7">Academic Holiday</span></p></td><td class="c2" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_rank_academic_holiday</span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c7">n/a</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Use a radius of 1 mile</span></p></td></tr>
</table>
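
As a rough sketch of how the table above might translate into request parameters: the feature names are copied from the table, while the `__stats` suffix for attendance features and the boolean flag for rank features are assumptions about the SDK's naming convention, and `build_feature_parameters` is a hypothetical helper. Features with different radius notes are kept in separate parameter sets, since a radius applies to a whole request.

```python
# Attendance features that use the suggested radius (from the table above).
SUGGESTED_RADIUS_ATTENDANCE = [
    "phq_attendance_community",
    "phq_attendance_concerts",
    "phq_attendance_conferences",
    "phq_attendance_expos",
    "phq_attendance_festivals",
    "phq_attendance_performing_arts",
    "phq_attendance_sports",
]

# Features that the table lists with a fixed radius instead.
FIXED_RADIUS_ATTENDANCE = [
    "phq_attendance_school_holidays",
    "phq_attendance_academic_graduation",
    "phq_attendance_academic_social",
]
FIXED_RADIUS_RANK = [
    "phq_rank_observances",
    "phq_rank_public_holidays",
    "phq_rank_school_holidays",
    "phq_rank_academic_session",
    "phq_rank_academic_exam",
    "phq_rank_academic_holiday",
]


def build_feature_parameters(attendance_features=(), rank_features=(), stats=("sum",)):
    """Build a parameter dict requesting the given features.

    Assumption: attendance features take a "<name>__stats" list and rank
    features are switched on with True.
    """
    params = {}
    for feature in attendance_features:
        params[f"{feature}__stats"] = list(stats)
    for feature in rank_features:
        params[feature] = True
    return params


suggested_radius_parameters = build_feature_parameters(SUGGESTED_RADIUS_ATTENDANCE)
fixed_radius_parameters = build_feature_parameters(FIXED_RADIUS_ATTENDANCE, FIXED_RADIUS_RANK)
```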

<a id='summary_weather'></a>
### Features for severe weather (retail only)
<p>The features below are for the retail industry only. The severe weather features use demand impact patterns. Demand impact patterns calculate the impact duration of a severe weather event and are based on industry-specific information. Our severe weather features are currently designed and tested on data for the retail segment only. If your business is in an industry segment other than retail (e.g. accommodation or travel), the features below may not work for you or may be less effective.</p><p>

<table class="c28">
<tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Category</strong></span></p></td><td class="c36" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Features API feature</strong></span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Stats</strong></span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Notes</strong></span></p></td></tr>
<tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Air quality)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_impact_severe_weather_air_quality_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Blizzard)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_impact_severe_weather_blizzard_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Cold wave)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_cold_wave_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Cold wave - snow)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_cold_wave_snow_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. 
Use a radius of 1 meter</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Cold wave - storm)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_cold_wave_storm_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Dust)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_dust_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Dust - Storm)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_dust_storm_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Flood)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_flood_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. 
Use a radius of 1 meter</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Heat wave)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_heat_wave_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Hurricane)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_hurricane_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Thunderstorm)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_thunderstorm_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Tornado)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_tornado_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. 
Use a radius of 1 meter</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Severe Weather (Tropical Storm)</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_impact_severe_weather_tropical_storm_retail</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">max</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Retail only. Use a radius of 1 meter</span></p></td></tr>
</table>
</p>

<a id='summary_weather'></a>
### Demand Impact Patterns for Attendance Based Events
<p>Demand Impact Patterns (DIP) describe the leading and lagging effects of an event on demand, which can vary by event category and industry vertical.</p><p>

#### Accommodation
<p>For the accommodation vertical, the following impact features apply.</p><p>
<table class="c28">
<tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Category</strong></span></p></td><td class="c36" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Features API feature</strong></span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Stats</strong></span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Notes</strong></span></p></td></tr>
<tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Concerts</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_concerts_accommodation</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Radius comes from Suggested Radius API.</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Conferences</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_conferences_accommodation</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Radius comes from Suggested Radius API.</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Expos</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_attendance_expos_accommodation</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Radius comes from Suggested Radius API.</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Festivals</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_attendance_festivals_accommodation</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Radius comes from Suggested Radius API.</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Sports</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span 
class="c7">phq_attendance_sports_accommodation</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Radius comes from Suggested Radius API.</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Performing Arts</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_attendance_performing_arts_accommodation</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Radius comes from Suggested Radius API.</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Community</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_attendance_community_accommodation</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Radius comes from Suggested Radius API.</span></p></td></tr>
</table>
</p>

#### Hospitality (Food and Beverage)
<p>For the hospitality vertical, the following impact features apply.</p><p>
<table class="c28">
<tr class="c23"><td class="c22" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Category</strong></span></p></td><td class="c36" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Features API feature</strong></span></p></td><td class="c10" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Stats</strong></span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c11"><strong>Notes</strong></span></p></td></tr>
<tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Concerts</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_concerts_hospitality</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Radius comes from Suggested Radius API.</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Conferences</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c5"><span class="c7">phq_attendance_conferences_hospitality</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Radius comes from Suggested Radius API.</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Expos</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_attendance_expos_hospitality</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Radius comes from Suggested Radius API.</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Festivals</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_attendance_festivals_hospitality</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Radius comes from Suggested Radius API.</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Sports</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span 
class="c7">phq_attendance_sports_hospitality</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Radius comes from Suggested Radius API.</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Performing Arts</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_attendance_performing_arts_hospitality</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Radius comes from Suggested Radius API.</span></p></td></tr><tr class="c15"><td class="c25" colspan="1" rowspan="1"><p class="c5"><span class="c7">Community</span></p></td><td class="c19" colspan="1" rowspan="1"><p class="c14"><span class="c7">phq_attendance_community_hospitality</span></p></td><td class="c16" colspan="1" rowspan="1"><p class="c5"><span class="c7">sum</span></p></td><td class="c12" colspan="1" rowspan="1"><p class="c5"><span class="c7">Radius comes from Suggested Radius API.</span></p></td></tr>
</table>
</p>

<a id='setup'></a>
## Setup

- If you're using Google Colab, uncomment and run the following code block.

In [1]:
# %%capture
# !git clone https://github.com/predicthq/phq-data-science-docs.git
# %cd phq-data-science-docs/feature-engineering-guide
# !pip install pandas==1.1.5 shapely==1.8.0 timezonefinder==5.2.0 predicthq==3.1.0 numpy==1.20.3

- Alternatively, if you're running this notebook on a local machine, set up a Python environment using the [requirements.txt](https://github.com/predicthq/phq-data-science-docs/blob/master/feature-engineering-guide/requirements.txt) file shared alongside the notebook.
These requirements can be installed by running the command `pip install -r requirements.txt`.

In [2]:
import pandas as pd
from predicthq import Client
import requests
import collections
import numpy as np
from datetime import datetime, date, timedelta

# To display more columns and with a larger width in the DataFrame
pd.set_option("display.max_columns", 50)

<a id='access_token'></a>
## Access token
An Access Token is required to query the API.

The following link will guide you through creating an account and an access token. 

 - https://docs.predicthq.com/guides/quickstart/

In [3]:
# Replace the placeholder below with your own access token.
ACCESS_TOKEN = 'REPLACE_WITH_ACCESS_TOKEN'
phq = Client(access_token=ACCESS_TOKEN)
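
If you prefer not to paste the token into the notebook, one option is to read it from an environment variable. The sketch below is only an illustration: `read_access_token` is a hypothetical helper and `PHQ_ACCESS_TOKEN` is an assumed variable name, not something the SDK defines.

```python
import os


def read_access_token(env_var="PHQ_ACCESS_TOKEN"):
    """Read the access token from an environment variable.

    The variable name is an assumption; use whatever name suits your setup.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your access token.")
    return token
```

The returned value could then be assigned to `ACCESS_TOKEN` before the client is created.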

<a id='setting_params'></a>
## SDK parameters
To search for event-based features, start with a dictionary that holds the SDK parameters, then add the filters you need.

In [4]:
parameters = dict()

### Location
Specifying the location is crucial as it ensures that the events utilized for calculating features are relevant to the specified area.

In this notebook, the default location is a point in New York at coordinates 40.7079, -74.0115, which corresponds to Wall Street. When running this notebook, you can change the latitude and longitude values to a location of your choice.

Location can be set in two ways:  

  1) Utilizing the `location__geo` Parameter
  This parameter encompasses the latitude and longitude of the desired location, coupled with a radius and a designated unit for the radius. This option is particularly useful when targeting events in the vicinity of a specific point, such as a store or hotel.
  
    * Available Units:
        - m: meter
        - km: kilometer
        - mi: mile
  
  
  2) Employing a `place_id`
  This alternative is optimal when the objective is to retrieve events occurring within a broader area like an entire city.



When leveraging the Features API with specified `latitude and longitude` coordinates, it's imperative to define a radius for the query. The Features API will help generate aggregate features representing all events occurring within that defined radius. Events situated outside this radius will not be encompassed in the generated features. To ascertain a suitable radius for a particular location, you may utilize the Suggested Radius API.

On the other hand, if you opt to use a `place_id`, the necessity to set a radius is obviated. This option automatically fetches all events within the designated area associated with the place_id, thus providing a broader scope of event data. This distinction allows for flexibility in data retrieval based on the granularity or expansiveness of the geographical area you are interested in examining.
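
The two location options can be sketched as a small helper that returns exactly one of them. The parameter names `location__geo` and `location__place_id` are the ones used in this notebook; `build_location_parameters` itself is a hypothetical function, and the `2mi` radius is only an example value.

```python
def build_location_parameters(lat=None, lon=None, radius=None, place_ids=None):
    """Return either a location__geo or a location__place_id parameter."""
    if place_ids is not None:
        if lat is not None or lon is not None or radius is not None:
            raise ValueError("Use either place_ids or lat/lon/radius, not both.")
        # A place_id covers a whole area, so no radius is needed.
        return {"location__place_id": list(place_ids)}
    if lat is None or lon is None or radius is None:
        raise ValueError("lat, lon and radius are all required for location__geo.")
    # The radius is a string made of a value and a unit, such as "2mi".
    return {"location__geo": {"lat": lat, "lon": lon, "radius": radius}}


# Example: a point-and-radius search around Wall Street.
example_parameters = {}
example_parameters.update(build_location_parameters(lat=40.7079, lon=-74.0115, radius="2mi"))
```

Rejecting a mix of both options keeps a request from silently using a different area than the one you intended.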

In [5]:
# Example of using latitude and longitude for location
# Comment out this cell if you want to use a place_id
latitude, longitude = (40.7079, -74.0115)  # LAT, LON for Wall Street, New York City

##### Using Suggested Radius API to set radius
The Suggested Radius API is powered by a machine learning model that looks at factors like population density, the number of events around a location, the customer’s industry, and many other factors to determine the ideal radius.
The Suggested Radius API returns a radius that can be used to find attended events around a given location. When looking for events around a business location (such as a store, a hotel, or another business location) a key question is how far should you look for events. For example, should you look at events in a 0.5-mile radius, a 2-mile radius, or a 10-mile radius from your location? The Suggested Radius API answers this question by returning a radius based on a number of factors that can be used to retrieve events around a location.

If you've used the Suggested Radius API (beta) before, please note that this updated version now allows you to specify the radius unit. The previous response value was in ***meters***.

However, you now have the flexibility to choose from the following units:
- m: meters (default)
- km: kilometers
- ft: feet
- mi: miles 


For more information, please refer to our [Suggested Radius API](https://docs.predicthq.com/resources/suggested-radius) doc.
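
If you stored radii in meters from the earlier version, a small conversion helper can bring them into the unit you now use. This is a sketch using standard conversion factors; `convert_radius` is a hypothetical helper and not part of the API.

```python
# Length of each supported unit expressed in meters.
METERS_PER_UNIT = {"m": 1.0, "km": 1000.0, "ft": 0.3048, "mi": 1609.344}


def convert_radius(value, from_unit, to_unit):
    """Convert a radius between the units m, km, ft and mi."""
    if from_unit not in METERS_PER_UNIT or to_unit not in METERS_PER_UNIT:
        raise ValueError(f"Units must be one of {sorted(METERS_PER_UNIT)}.")
    # Go through meters so every pair of units is handled the same way.
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


# Example: a radius previously returned in meters, expressed in miles.
radius_in_miles = convert_radius(3347.0, "m", "mi")
```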

In [6]:
def get_suggested_radius(lat, lon, industry, radius_unit):
    """
    Returns the suggested radius for a given latitude and longitude.

    Args:
        lat: The latitude of the location.
        lon: The longitude of the location.
        industry: The industry of interest that the radius will be calculated for.
        radius_unit: Unit in which the suggested radius will be returned.

    Returns:
        The suggested radius in your preferred unit as a string such as '2.08mi',
        or None if the request fails.
    """
    # Set the URL for the API call
    url = "https://api.predicthq.com/v1/suggested-radius/"
    # Set the query parameters for the API call
    params = {
        "location.origin": f"{lat},{lon}",
        "industry": industry,
        "radius_unit": radius_unit,
    }
    # Set the headers for the API call (including the access token)
    headers = {
        "Authorization": "Bearer " + ACCESS_TOKEN,
        "Accept": "application/json",
    }
    # Make the API call and parse the JSON response once
    response = requests.get(url, params=params, headers=headers)
    if response.status_code == 200:
        data = response.json()
        return f"{data['radius']}{data['radius_unit']}"
    # Report the failure; the function then returns None
    print("Error: " + str(response.status_code))
    print(response.text)
    return None

In [7]:
# get the suggested radius for the given latitude and longitude
suggested_radius = get_suggested_radius(latitude, longitude, 'other','mi')
suggested_radius

# if you would like to set your own radius, uncomment the following code
#suggested_radius = '5mi'
#suggested_radius

'2.08mi'
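
Because the radius comes back as a string with its unit attached, such as `'2.08mi'`, it can be handy to convert it to a common unit when comparing radii. The following is a minimal sketch only: `parse_radius` and `radius_in_meters` are hypothetical helper names and are not part of the PredictHQ SDK. The conversion factors are the standard definitions of each unit.

```python
import re

# Meters per unit for the four units listed above.
METERS_PER_UNIT = {"m": 1.0, "km": 1000.0, "ft": 0.3048, "mi": 1609.344}

def parse_radius(radius):
    """Split a radius string such as '2.08mi' into (value, unit)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(km|mi|ft|m)\s*", radius)
    if match is None:
        raise ValueError(f"Unrecognised radius: {radius!r}")
    return float(match.group(1)), match.group(2)

def radius_in_meters(radius):
    """Convert a radius string to meters (hypothetical helper)."""
    value, unit = parse_radius(radius)
    return value * METERS_PER_UNIT[unit]

print(round(radius_in_meters("2.08mi"), 1))
```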

In [8]:
# update the parameters for the API call
parameters.update(location__geo=dict(lat=latitude, lon=longitude, radius=suggested_radius))

Alternatively, we could have used a `place_id` for our search (See our [Appendix on place_ids](#appendix) for detailed explanation).

In [9]:
## Keep commented if you want to use lat and lon
#place_ids = [5128638]
#parameters.update(location__place_id=place_ids) 
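
The two location options can also be wrapped in one small helper so that only one of them is ever set. This is a sketch under the assumption that the parameter names `location__geo` and `location__place_id` are used exactly as in the cells above; `build_location_params` is a hypothetical name introduced for illustration.

```python
def build_location_params(lat=None, lon=None, radius=None, place_ids=None):
    """Return either a geo-based or a place_id-based location parameter."""
    if place_ids:
        # A place_id already defines an area, so no radius is needed.
        return {"location__place_id": list(place_ids)}
    if lat is None or lon is None or radius is None:
        raise ValueError("lat, lon and radius are all required without place_ids")
    return {"location__geo": {"lat": lat, "lon": lon, "radius": radius}}

# Example values taken from the cells above.
geo_params = build_location_params(lat=40.7079, lon=-74.0115, radius="2.08mi")
place_params = build_location_params(place_ids=[5128638])
print(geo_params)
print(place_params)
```

The result can be passed to `parameters.update(...)` in place of the direct calls.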

### Date "YYYY-MM-DD"

To define the period of time for which events will be returned, set the greater than or equal (`active__gte`) and less than or equal (`active__lte`) parameters. This will select all Attendance Based Events that are active within this period.

You could also use these parameters depending on your time period of interest:

`gte - Greater than or equal.` <br>
`gt - Greater than.`<br>
`lte - Less than or equal.`<br>
`lt - Less than.`<br>


Each request can currently fetch up to 90 days of data. For longer date ranges, multiple requests must be made; this notebook includes examples of how to do that. There is no pagination in this API.

In [10]:
start_time = "2021-09-01"
end_time = "2021-11-28"
parameters.update(active__gte = start_time, active__lte = end_time)
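
Before sending a request, it can be useful to check that the chosen dates fit within the 90-day limit. The sketch below uses only the standard library. `days_in_range` and `fits_single_request` are hypothetical helpers, and counting both the start and end date towards the limit is an assumption rather than documented behaviour.

```python
from datetime import date

# Limit stated above; counting both end dates inclusively is an assumption.
MAX_DAYS_PER_REQUEST = 90

def days_in_range(start, end):
    """Number of calendar days from start to end, counting both dates."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days + 1

def fits_single_request(start, end):
    """True if the range is non-empty and within the per-request limit."""
    return 1 <= days_in_range(start, end) <= MAX_DAYS_PER_REQUEST

print(days_in_range("2021-09-01", "2021-11-28"))
print(fits_single_request("2021-09-01", "2021-11-28"))
```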

<a id='features_api'></a>
## Using the Features API to query features for forecasting

<a id='functions'></a>
### Functions for formatting data frame
The default response from the Features API is in JSON format. To convert it to a more usable data frame format, the following functions are defined and used.

In [11]:
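from collections.abc import MutableMapping
from functools import reduce

def dict_value_by_flatten_key(dict_record, flatten_key):
    # Walk a nested dict along a dot-separated key such as "a.b.c"
    return reduce(lambda d, k: d.get(k) if isinstance(d, dict) else None,
                  flatten_key.split('.'),
                  dict_record)

def flatten_dict(d, parent_key='', sep='_'):
    # Flatten nested dicts into a single level, joining keys with `sep`
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        # collections.MutableMapping was removed in Python 3.10; use collections.abc
        if isinstance(v, MutableMapping):
            items.extend(flatten_dict(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)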
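
To make the effect of the flattening concrete, here is a small self-contained illustration. The function is restated so the snippet runs on its own, and `sample_record` is a hypothetical record whose nesting is inferred from the column names that appear in the outputs of this notebook; the real response may contain more fields.

```python
from collections.abc import MutableMapping

def flatten_dict(d, parent_key='', sep='_'):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, MutableMapping):
            items.extend(flatten_dict(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)

# Hypothetical record, shaped like one day of a Features API response.
sample_record = {
    "date": "2021-09-01",
    "phq_attendance_sports": {"stats": {"sum": 0.0}},
    "phq_attendance_concerts": {"stats": {"sum": 1416.0}},
}

flat = flatten_dict(sample_record)
print(flat)
```

Each nested key path becomes one column name, which is why the data frames below have columns such as `phq_attendance_concerts_stats_sum`.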

<a id='attend'></a>
### Attendance based features

This group of features is based on PHQ Attendance. The following features are supported:

- `phq_attendance_sports`
- `phq_attendance_conferences`
- `phq_attendance_expos`
- `phq_attendance_concerts`
- `phq_attendance_festivals`
- `phq_attendance_performing_arts`
- `phq_attendance_community`
- `phq_attendance_academic_graduation` 
- `phq_attendance_academic_social` (for Academic features, we recommend using the 3 rank based features described in later sections)


Each of these features includes stats. You define which stats you need (or don't define any and receive the default set of stats). Supported stats are:

- `sum` (Recommended to start with)
- `count` 
- `min`
- `max`
- `avg`
- `median`
- `std_dev`

These features also support filtering by PHQ Rank as you'll see in the example below.
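
If you want more than one stat per feature, the `<feature>__stats` parameters can be generated in one step. The following is a sketch: `build_stats_params` is a hypothetical helper, and the set of supported stats is copied from the list above.

```python
# Stats listed above as supported for attendance based features.
SUPPORTED_STATS = {"sum", "count", "min", "max", "avg", "median", "std_dev"}

def build_stats_params(features, stats):
    """Map each feature to a `<feature>__stats` parameter after validating the stats."""
    unknown = set(stats) - SUPPORTED_STATS
    if unknown:
        raise ValueError(f"Unsupported stats: {sorted(unknown)}")
    return {f"{feature}__stats": list(stats) for feature in features}

stats_params = build_stats_params(
    ["phq_attendance_sports", "phq_attendance_concerts"],
    ["sum", "count"],
)
print(stats_params)
```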

#### Setup SDK parameters
Specify a list of Attendance Based Events categories to return.

In [12]:
categories_attended = [
    "phq_attendance_sports",
    "phq_attendance_conferences",
    "phq_attendance_expos",
    "phq_attendance_concerts",
    "phq_attendance_festivals",
    "phq_attendance_performing_arts",
    "phq_attendance_community",
]

# only return sum of the attendance
stats = ["sum"]
#parameters.update(phq_attendance_school_holidays__stats=stats)
for i in categories_attended:
    parameters.update({f"{i}__stats": stats})

parameters

{'location__geo': {'lat': 40.7079, 'lon': -74.0115, 'radius': '2.08mi'},
 'active__gte': '2021-09-01',
 'active__lte': '2021-11-28',
 'phq_attendance_sports__stats': ['sum'],
 'phq_attendance_conferences__stats': ['sum'],
 'phq_attendance_expos__stats': ['sum'],
 'phq_attendance_concerts__stats': ['sum'],
 'phq_attendance_festivals__stats': ['sum'],
 'phq_attendance_performing_arts__stats': ['sum'],
 'phq_attendance_community__stats': ['sum']}

#### Rank filter
Low rank or high rank events can be filtered out when calculating features. To do so, set the greater than or equal / greater than (`gte`/`gt`) and less than or equal / less than (`lte`/`lt`) parameters for the desired features. For example, this allows you to filter out smaller events if you want to initially concentrate on larger events.

See PHQ Attendance under [General Category Information](https://docs.predicthq.com/categoryinfo/general-category-information) in the documentation for more information on how rank maps to attendance.

In [13]:
phq_rank_filter = 50

# Example 1, set rank filter for a single feature
parameters.update(phq_attendance_sports__phq_rank=dict(gte = phq_rank_filter))

# Example 2, set rank filter for a batch of features
for i in categories_attended:
    parameters.update({f"{i}__phq_rank":{'gte': phq_rank_filter}})

parameters

{'location__geo': {'lat': 40.7079, 'lon': -74.0115, 'radius': '2.08mi'},
 'active__gte': '2021-09-01',
 'active__lte': '2021-11-28',
 'phq_attendance_sports__stats': ['sum'],
 'phq_attendance_conferences__stats': ['sum'],
 'phq_attendance_expos__stats': ['sum'],
 'phq_attendance_concerts__stats': ['sum'],
 'phq_attendance_festivals__stats': ['sum'],
 'phq_attendance_performing_arts__stats': ['sum'],
 'phq_attendance_community__stats': ['sum'],
 'phq_attendance_sports__phq_rank': {'gte': 50},
 'phq_attendance_conferences__phq_rank': {'gte': 50},
 'phq_attendance_expos__phq_rank': {'gte': 50},
 'phq_attendance_concerts__phq_rank': {'gte': 50},
 'phq_attendance_festivals__phq_rank': {'gte': 50},
 'phq_attendance_performing_arts__phq_rank': {'gte': 50},
 'phq_attendance_community__phq_rank': {'gte': 50}}
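
The examples above set only a lower bound. Since the filter accepts `gte`/`gt` and `lte`/`lt`, a rank band can presumably be expressed by supplying both a lower and an upper operator. The sketch below builds such parameters; `rank_filter_params` is a hypothetical helper, and combining two operators in one filter is an assumption based on the description above.

```python
# Operators named in the rank filter description above.
ALLOWED_OPERATORS = {"gte", "gt", "lte", "lt"}

def rank_filter_params(features, **bounds):
    """Build `<feature>__phq_rank` filters from keyword bounds such as gte=50."""
    unknown = set(bounds) - ALLOWED_OPERATORS
    if unknown:
        raise ValueError(f"Unsupported operators: {sorted(unknown)}")
    return {f"{feature}__phq_rank": dict(bounds) for feature in features}

# Keep only events with a rank from 50 to 80 inclusive.
band = rank_filter_params(["phq_attendance_sports"], gte=50, lte=80)
print(band)
```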

#### Query features

In [14]:
results = []

for feature in phq.features.obtain_features(**parameters):
    results.append(flatten_dict(feature.model_dump(exclude_unset=True, exclude_none=True), '', '_'))

feature_df = pd.DataFrame(results)

feature_df.head()


Unnamed: 0,date,phq_attendance_community_stats_sum,phq_attendance_concerts_stats_sum,phq_attendance_conferences_stats_sum,phq_attendance_expos_stats_sum,phq_attendance_festivals_stats_sum,phq_attendance_performing_arts_stats_sum,phq_attendance_sports_stats_sum
0,2021-09-01,0.0,1416.0,0.0,0.0,0.0,0.0,0.0
1,2021-09-02,0.0,0.0,0.0,74.0,0.0,0.0,0.0
2,2021-09-03,0.0,2784.0,0.0,89.0,0.0,0.0,0.0
3,2021-09-04,0.0,1090.0,0.0,89.0,45869.0,0.0,0.0
4,2021-09-05,0.0,1616.0,0.0,89.0,38224.0,0.0,0.0
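
Once the features are in a data frame, they can be joined to your daily training data on the date column. The sketch below is illustrative only: `demand_df` and its `units_sold` column are hypothetical stand-ins for your own data, and `features_sample` reuses a few values from the output above in place of the full `feature_df`.

```python
import pandas as pd

# Hypothetical daily demand data; replace with your own aggregated training set.
demand_df = pd.DataFrame({
    "date": ["2021-09-01", "2021-09-02", "2021-09-03"],
    "units_sold": [120, 95, 143],
})

# A small stand-in for feature_df, using values from the output above.
features_sample = pd.DataFrame({
    "date": ["2021-09-01", "2021-09-02", "2021-09-03"],
    "phq_attendance_concerts_stats_sum": [1416.0, 0.0, 2784.0],
})

# Convert both date columns to datetime so the join keys share a dtype.
demand_df["date"] = pd.to_datetime(demand_df["date"])
features_sample["date"] = pd.to_datetime(features_sample["date"])

# A left join keeps every training row, even if a feature row is missing.
training_df = demand_df.merge(features_sample, on="date", how="left")
print(training_df)
```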


<a id='impact'></a>
#### Demand Impact Pattern - Accommodation

If you are interested in the accommodation industry, there are 7 features available for the impact pattern from the Features API:

- `phq_attendance_concerts_accommodation`
- `phq_attendance_conferences_accommodation`
- `phq_attendance_festivals_accommodation`
- `phq_attendance_expos_accommodation`
- `phq_attendance_sports_accommodation`
- `phq_attendance_performing_arts_accommodation`
- `phq_attendance_community_accommodation`

Similar to Attendance Based Events, each of these features includes 7 stats:
- `sum` (Recommended to start with)
- `count` 
- `min`
- `max` 
- `avg`
- `median`
- `std_dev`


#### Setup SDK parameters

The Demand Impact Pattern (DIP) is completely separate from the Attendance Based Events. You can specify all parameters separately.

Please note that for the Accommodation vertical:
- the maximum length of `leading` demand impact is `5 days`
- the maximum length of `lagging` demand impact is `3 days`

If you would like to query the DIP related to the Attendance Based Events, you need to make sure that:
- `start_time` of DIP = `start_time` of attendance based event - 5 days
- `end_time` of DIP = `end_time` of attendance based event + 3 days

Also keep the following parameters the same as for the Attendance Based Events:
- `latitude`
- `longitude`
- `radius`
- `event categories`
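
The date arithmetic described above can be sketched with the standard library. `dip_query_window` is a hypothetical helper name; the leading and lagging lengths are the Accommodation values stated above.

```python
from datetime import date, timedelta

def dip_query_window(start, end, leading_days, lagging_days):
    """Widen an attendance date range to cover the DIP leading and lagging days."""
    dip_start = date.fromisoformat(start) - timedelta(days=leading_days)
    dip_end = date.fromisoformat(end) + timedelta(days=lagging_days)
    return dip_start.isoformat(), dip_end.isoformat()

# Accommodation: up to 5 leading days and 3 lagging days.
dip_start_time, dip_end_time = dip_query_window("2021-09-01", "2021-11-28", 5, 3)
print(dip_start_time, dip_end_time)
```

Note that widening the 89-day example range by 8 days gives 97 days, which is more than a single request can return, so a widened range may need to be split across multiple requests.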

In [15]:
parameters = dict()

latitude, longitude = (40.7079, -74.0115) # LAT, LONG for centre of New York City

loc_radius = dict()
loc_radius.update(lat=latitude, lon=longitude)

# get the suggested radius for the given latitude and longitude
suggested_radius = get_suggested_radius(latitude, longitude, 'other','mi')


# uncomment if you want to use your own radius
# default radius is 5km for DIP
# suggested_radius = "5000m"

parameters.update(location__geo=dict(lat=latitude, lon=longitude, radius=suggested_radius))

start_time = "2021-09-01"
end_time = "2021-11-28"
parameters.update(active__gte = start_time)
parameters.update(active__lte = end_time)

categories_attended_dip_accommodation = [
    "phq_attendance_concerts_accommodation",
    "phq_attendance_conferences_accommodation",
    "phq_attendance_festivals_accommodation",
    "phq_attendance_expos_accommodation",
    "phq_attendance_sports_accommodation",
    "phq_attendance_performing_arts_accommodation",
    "phq_attendance_community_accommodation"
]

# only return sum of the attendance
stats = ["sum"]

for i in categories_attended_dip_accommodation:
    parameters.update({f"{i}__stats": stats})

parameters

{'location__geo': {'lat': 40.7079, 'lon': -74.0115, 'radius': '2.08mi'},
 'active__gte': '2021-09-01',
 'active__lte': '2021-11-28',
 'phq_attendance_concerts_accommodation__stats': ['sum'],
 'phq_attendance_conferences_accommodation__stats': ['sum'],
 'phq_attendance_festivals_accommodation__stats': ['sum'],
 'phq_attendance_expos_accommodation__stats': ['sum'],
 'phq_attendance_sports_accommodation__stats': ['sum'],
 'phq_attendance_performing_arts_accommodation__stats': ['sum'],
 'phq_attendance_community_accommodation__stats': ['sum']}

#### Rank filter
For the rank filter, the Demand Impact Pattern (DIP) is also separate from the Attendance Based Events.

In the Accommodation vertical, the minimum rank filters for the different event categories are listed below:
- `Concerts`: 50
- `Conferences`: 40
- `Festivals`: 50
- `Expos`: 50
- `Sports`: 50
- `Performing Arts`: 50
- `Community`: 50

Events with a rank below the listed threshold have no Demand Impact Pattern.
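
Because the minimum differs by category, one option is to keep the thresholds in a dictionary and build each filter from it. This is a sketch only: `ACCOMMODATION_MIN_RANK` restates the list above, and `dip_rank_params` is a hypothetical helper that never sets a filter below the category minimum.

```python
# Minimum PHQ Rank per category for the Accommodation vertical, as listed above.
ACCOMMODATION_MIN_RANK = {
    "phq_attendance_concerts_accommodation": 50,
    "phq_attendance_conferences_accommodation": 40,
    "phq_attendance_festivals_accommodation": 50,
    "phq_attendance_expos_accommodation": 50,
    "phq_attendance_sports_accommodation": 50,
    "phq_attendance_performing_arts_accommodation": 50,
    "phq_attendance_community_accommodation": 50,
}

def dip_rank_params(min_rank_by_feature, requested_rank=None):
    """Build `<feature>__phq_rank` filters that never go below the category minimum."""
    params = {}
    for feature, min_rank in min_rank_by_feature.items():
        rank = min_rank if requested_rank is None else max(min_rank, requested_rank)
        params[f"{feature}__phq_rank"] = {"gte": rank}
    return params

rank_params = dip_rank_params(ACCOMMODATION_MIN_RANK)
print(rank_params["phq_attendance_conferences_accommodation__phq_rank"])
```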

In [16]:
phq_rank_filter = 50

# # Example 1, set rank filter for a single feature
# parameters.update(phq_attendance_sports__phq_rank=dict(gte = phq_rank_filter))

# Example 2, set rank filter for a batch of features
for i in categories_attended_dip_accommodation:
    parameters.update({f"{i}__phq_rank":{'gte': phq_rank_filter}})

parameters

{'location__geo': {'lat': 40.7079, 'lon': -74.0115, 'radius': '2.08mi'},
 'active__gte': '2021-09-01',
 'active__lte': '2021-11-28',
 'phq_attendance_concerts_accommodation__stats': ['sum'],
 'phq_attendance_conferences_accommodation__stats': ['sum'],
 'phq_attendance_festivals_accommodation__stats': ['sum'],
 'phq_attendance_expos_accommodation__stats': ['sum'],
 'phq_attendance_sports_accommodation__stats': ['sum'],
 'phq_attendance_performing_arts_accommodation__stats': ['sum'],
 'phq_attendance_community_accommodation__stats': ['sum'],
 'phq_attendance_concerts_accommodation__phq_rank': {'gte': 50},
 'phq_attendance_conferences_accommodation__phq_rank': {'gte': 50},
 'phq_attendance_festivals_accommodation__phq_rank': {'gte': 50},
 'phq_attendance_expos_accommodation__phq_rank': {'gte': 50},
 'phq_attendance_sports_accommodation__phq_rank': {'gte': 50},
 'phq_attendance_performing_arts_accommodation__phq_rank': {'gte': 50},
 'phq_attendance_community_accommodation__phq_rank': {'gte': 50}}

#### Query features

In [17]:
results = []

for feature in phq.features.obtain_features(**parameters):
    results.append(flatten_dict(feature.model_dump(exclude_unset=True, exclude_none=True), '', '_'))

feature_df = pd.DataFrame(results)

feature_df.head()

Unnamed: 0,date,phq_attendance_community_accommodation_stats_sum,phq_attendance_concerts_accommodation_stats_sum,phq_attendance_conferences_accommodation_stats_sum,phq_attendance_expos_accommodation_stats_sum,phq_attendance_festivals_accommodation_stats_sum,phq_attendance_performing_arts_accommodation_stats_sum,phq_attendance_sports_accommodation_stats_sum
0,2021-09-01,0.0,1615.0,0.0,19.0,13091.0,0.0,0.0
1,2021-09-02,0.0,142.0,0.0,74.0,19637.0,0.0,0.0
2,2021-09-03,0.0,0.0,0.0,89.0,37091.0,0.0,0.0
3,2021-09-04,0.0,324.0,0.0,89.0,43636.0,0.0,0.0
4,2021-09-05,0.0,1616.0,0.0,89.0,36364.0,0.0,0.0


<a id='impact'></a>
#### Demand Impact Pattern - Hospitality

If you are interested in the hospitality industry, there are 7 features available for the impact pattern from the Features API:

- `phq_attendance_concerts_hospitality`
- `phq_attendance_conferences_hospitality`
- `phq_attendance_festivals_hospitality`
- `phq_attendance_expos_hospitality`
- `phq_attendance_sports_hospitality`
- `phq_attendance_performing_arts_hospitality`
- `phq_attendance_community_hospitality`

Similar to Attendance Based Events, each of these features includes 7 stats:
- `sum` (Recommended to start with)
- `count` 
- `min`
- `max` 
- `avg`
- `median`
- `std_dev`


#### Setup SDK parameters

The Demand Impact Pattern (DIP) is completely separate from the Attendance Based Events. You can specify all parameters separately.

Please note that for the Hospitality vertical:
- the maximum length of `leading` demand impact is `3 days`
- the maximum length of `lagging` demand impact is `1 day`

If you would like to query the DIP related to the Attendance Based Events, you need to make sure that:
- `start_time` of DIP = `start_time` of attendance based event - 3 days
- `end_time` of DIP = `end_time` of attendance based event + 1 day

Also keep the following parameters the same as for the Attendance Based Events:
- `latitude`
- `longitude`
- `radius`
- `event categories`

In [18]:
parameters = dict()

latitude, longitude = (40.7079, -74.0115) # LAT, LONG for centre of New York City

loc_radius = dict()
loc_radius.update(lat=latitude, lon=longitude)

# get the suggested radius for the given latitude and longitude
suggested_radius = get_suggested_radius(latitude, longitude, 'other','mi')


# uncomment if you want to use your own radius
# default radius is 5km for DIP
# suggested_radius = "5000m"

parameters.update(location__geo=dict(lat=latitude, lon=longitude, radius=suggested_radius))

start_time = "2021-09-01"
end_time = "2021-11-28"
parameters.update(active__gte = start_time)
parameters.update(active__lte = end_time)

categories_attended_dip_hospiality = [
    "phq_attendance_concerts_hospitality",
    "phq_attendance_conferences_hospitality",
    "phq_attendance_festivals_hospitality",
    "phq_attendance_expos_hospitality",
    "phq_attendance_sports_hospitality",
    "phq_attendance_performing_arts_hospitality",
    "phq_attendance_community_hospitality"
]

# only return sum of the attendance
stats = ["sum"]

for i in categories_attended_dip_hospiality:
    parameters.update({f"{i}__stats": stats})

parameters

{'location__geo': {'lat': 40.7079, 'lon': -74.0115, 'radius': '2.08mi'},
 'active__gte': '2021-09-01',
 'active__lte': '2021-11-28',
 'phq_attendance_concerts_hospitality__stats': ['sum'],
 'phq_attendance_conferences_hospitality__stats': ['sum'],
 'phq_attendance_festivals_hospitality__stats': ['sum'],
 'phq_attendance_expos_hospitality__stats': ['sum'],
 'phq_attendance_sports_hospitality__stats': ['sum'],
 'phq_attendance_performing_arts_hospitality__stats': ['sum'],
 'phq_attendance_community_hospitality__stats': ['sum']}

#### Rank filter
For the rank filter, the Demand Impact Pattern (DIP) is also separate from the Attendance Based Events.

In the Hospitality vertical, the minimum rank filters for the different event categories are listed below:
- `Concerts`: 40
- `Conferences`: 40
- `Festivals`: 40
- `Expos`: 40
- `Sports`: 40
- `Performing Arts`: 40
- `Community`: 40

Events with a rank below the listed threshold have no Demand Impact Pattern.

In [19]:
phq_rank_filter = 50

# # Example 1, set rank filter for a single feature
# parameters.update(phq_attendance_sports__phq_rank=dict(gte = phq_rank_filter))

# Example 2, set rank filter for a batch of features
for i in categories_attended_dip_hospiality:
    parameters.update({f"{i}__phq_rank":{'gte': phq_rank_filter}})

parameters

{'location__geo': {'lat': 40.7079, 'lon': -74.0115, 'radius': '2.08mi'},
 'active__gte': '2021-09-01',
 'active__lte': '2021-11-28',
 'phq_attendance_concerts_hospitality__stats': ['sum'],
 'phq_attendance_conferences_hospitality__stats': ['sum'],
 'phq_attendance_festivals_hospitality__stats': ['sum'],
 'phq_attendance_expos_hospitality__stats': ['sum'],
 'phq_attendance_sports_hospitality__stats': ['sum'],
 'phq_attendance_performing_arts_hospitality__stats': ['sum'],
 'phq_attendance_community_hospitality__stats': ['sum'],
 'phq_attendance_concerts_hospitality__phq_rank': {'gte': 50},
 'phq_attendance_conferences_hospitality__phq_rank': {'gte': 50},
 'phq_attendance_festivals_hospitality__phq_rank': {'gte': 50},
 'phq_attendance_expos_hospitality__phq_rank': {'gte': 50},
 'phq_attendance_sports_hospitality__phq_rank': {'gte': 50},
 'phq_attendance_performing_arts_hospitality__phq_rank': {'gte': 50},
 'phq_attendance_community_hospitality__phq_rank': {'gte': 50}}

#### Query features

In [20]:
results = []

for feature in phq.features.obtain_features(**parameters):
    results.append(flatten_dict(feature.model_dump(exclude_unset=True, exclude_none=True), '', '_'))

feature_df = pd.DataFrame(results)

feature_df.head()

Unnamed: 0,date,phq_attendance_community_hospitality_stats_sum,phq_attendance_concerts_hospitality_stats_sum,phq_attendance_conferences_hospitality_stats_sum,phq_attendance_expos_hospitality_stats_sum,phq_attendance_festivals_hospitality_stats_sum,phq_attendance_performing_arts_hospitality_stats_sum,phq_attendance_sports_hospitality_stats_sum
0,2021-09-01,0.0,1476.0,0.0,6.0,0.0,0.0,0.0
1,2021-09-02,0.0,43.0,0.0,74.0,0.0,0.0,0.0
2,2021-09-03,0.0,0.0,0.0,89.0,21818.0,0.0,0.0
3,2021-09-04,0.0,243.0,0.0,89.0,43636.0,0.0,0.0
4,2021-09-05,0.0,1616.0,0.0,89.0,36364.0,0.0,0.0


#### Features for School Holidays
`phq_attendance_school_holidays` is also one of the attendance based features, but it requires a special radius setting. School holidays are currently detailed at the district level for the US, so we recommend setting a radius of 1 mile for US locations.

In [21]:
# Reset SDK parameters
parameters = dict()

latitude, longitude = (40.7079, -74.0115) # LAT, LONG for centre of New York City

radius_filter = "1mi"

parameters.update(location__geo=dict(lat=latitude, lon=longitude,radius=radius_filter)) 

start_time = "2021-09-01"
end_time = "2021-11-28"
parameters.update(active__gte = start_time)
parameters.update(active__lte = end_time)

In [22]:
# only return sum of the attendance
stats = ["sum"]
parameters.update({'phq_attendance_school_holidays__stats': stats})

# add rank filter if required
phq_rank_filter = 50
parameters.update(phq_attendance_school_holidays__phq_rank=dict(gt = phq_rank_filter))
parameters

{'location__geo': {'lat': 40.7079, 'lon': -74.0115, 'radius': '1mi'},
 'active__gte': '2021-09-01',
 'active__lte': '2021-11-28',
 'phq_attendance_school_holidays__stats': ['sum'],
 'phq_attendance_school_holidays__phq_rank': {'gt': 50}}

In [23]:
# Query Features
results = []

for feature in phq.features.obtain_features(**parameters):
    results.append(flatten_dict(feature.model_dump(exclude_unset=True, exclude_none=True), '', '_'))

feature_df = pd.DataFrame(results)
feature_df.head()

Unnamed: 0,date,phq_attendance_school_holidays_stats_sum
0,2021-09-01,1182931.0
1,2021-09-02,1182931.0
2,2021-09-03,1182931.0
3,2021-09-04,1182931.0
4,2021-09-05,1182931.0


<b>Another two useful features can be derived from</b> `phq_attendance_school_holidays_stats_sum`:
* `phq_school_holidays_first_day_flag`: a binary variable which indicates whether that day is the first day of any school holiday within the selected radius at the selected location.
* `phq_school_holidays_last_day_flag`: a binary variable which indicates whether that day is the last day of any school holiday within the selected radius at the selected location.
* Please note that the first row's `phq_school_holidays_first_day_flag` and the last row's `phq_school_holidays_last_day_flag` will be <b>NaN</b>, because these two features are derived from the selected time range, which is not guaranteed to cover an entire school holiday.

In [24]:
# creating shifted attendance
feature_df['temp_pre'] = feature_df['phq_attendance_school_holidays_stats_sum'].shift(1)
feature_df['temp_after'] = feature_df['phq_attendance_school_holidays_stats_sum'].shift(-1)

# first day flag: attendance is higher than on the previous day
feature_df['phq_school_holidays_first_day_flag'] = feature_df[[
    'phq_attendance_school_holidays_stats_sum','temp_pre']].apply(
        lambda x: np.nan if pd.isna(x.iloc[1]) else 1 if x.iloc[0] > x.iloc[1] else 0, axis=1)

# last day flag: attendance is higher than on the next day
feature_df['phq_school_holidays_last_day_flag'] = feature_df[[
    'phq_attendance_school_holidays_stats_sum','temp_after']].apply(
        lambda x: np.nan if pd.isna(x.iloc[1]) else 1 if x.iloc[0] > x.iloc[1] else 0, axis=1)

# remove temporary features
feature_df = feature_df.drop(['temp_pre', 'temp_after'], axis=1)

feature_df.head()

Unnamed: 0,date,phq_attendance_school_holidays_stats_sum,phq_school_holidays_first_day_flag,phq_school_holidays_last_day_flag
0,2021-09-01,1182931.0,,0.0
1,2021-09-02,1182931.0,0.0,0.0
2,2021-09-03,1182931.0,0.0,0.0
3,2021-09-04,1182931.0,0.0,0.0
4,2021-09-05,1182931.0,0.0,0.0


<a id='rank'></a>
### Rank based features

This group of features is based on PHQ Rank for non-attendance based events (mostly scheduled ones). The following features are supported:

- `phq_rank_public_holidays`
- `phq_rank_school_holidays` (for the US and UK we recommend using `phq_attendance_school_holidays`)
- `phq_rank_observances`
- `phq_rank_academic_session`
- `phq_rank_academic_exam`
- `phq_rank_academic_holiday`


Results are broken down by PHQ Rank Level (1 to 5). Rank Levels are groupings of Rank and are grouped as follows:

- 1 = between 0 and 20
- 2 = between 21 and 40
- 3 = between 41 and 60
- 4 = between 61 and 80
- 5 = between 81 and 100
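
The grouping above can be written as a short function. This is an illustrative sketch: `rank_level` is a hypothetical helper, not an SDK function, and it assumes integer ranks from 0 to 100.

```python
import math

def rank_level(phq_rank):
    """Map a PHQ Rank (0 to 100) to its Rank Level (1 to 5) using the grouping above."""
    if not 0 <= phq_rank <= 100:
        raise ValueError("PHQ Rank must be between 0 and 100")
    # Ranks 0-20 fall in level 1, 21-40 in level 2, and so on in steps of 20.
    return max(1, math.ceil(phq_rank / 20))

for rank in (0, 20, 21, 60, 61, 100):
    print(rank, rank_level(rank))
```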

Additional filtering for PHQ Rank features is not currently supported.

#### Setup SDK parameters

In [25]:
parameters = dict()

latitude, longitude = (40.7079, -74.0115) # LAT, LONG for centre of New York City

radius_filter = "1mi"

parameters.update(location__geo=dict(lat=latitude, lon=longitude,radius=radius_filter)) 

start_time = "2021-09-01"
end_time = "2021-11-28"
parameters.update(active__gte = start_time)
parameters.update(active__lte = end_time)

Specify a list of Rank Based Events categories to return.

In [26]:
categories_rank = [
     "phq_rank_observances",
     "phq_rank_public_holidays",
     "phq_rank_school_holidays",
     "phq_rank_academic_session",
     "phq_rank_academic_exam",
     "phq_rank_academic_holiday",

]


for i in categories_rank:
    parameters.update({f"{i}": True})

parameters

{'location__geo': {'lat': 40.7079, 'lon': -74.0115, 'radius': '1mi'},
 'active__gte': '2021-09-01',
 'active__lte': '2021-11-28',
 'phq_rank_observances': True,
 'phq_rank_public_holidays': True,
 'phq_rank_school_holidays': True,
 'phq_rank_academic_session': True,
 'phq_rank_academic_exam': True,
 'phq_rank_academic_holiday': True}

#### Query features

In [27]:
results = []

for feature in phq.features.obtain_features(**parameters):
    results.append(flatten_dict(feature.model_dump(exclude_unset=True, exclude_none=True), '', '_'))

feature_df = pd.DataFrame(results)
# Select the observances rank-level columns
col = [c for c in feature_df.columns if "rank_observances_rank_levels" in c]
# Cast the counts in those columns to integers
feature_df = feature_df.astype({c: 'int' for c in col})
# Weight each count by its rank level (the trailing number in the column name)
for c in col:
    multiply = int(c.split("_")[-1])
    feature_df[c] = feature_df[c]*multiply

feature_df.head()

Unnamed: 0,date,phq_rank_observances_rank_levels_1,phq_rank_observances_rank_levels_2,phq_rank_observances_rank_levels_3,phq_rank_observances_rank_levels_4,phq_rank_observances_rank_levels_5,phq_rank_public_holidays_rank_levels_1,phq_rank_public_holidays_rank_levels_2,phq_rank_public_holidays_rank_levels_3,phq_rank_public_holidays_rank_levels_4,phq_rank_public_holidays_rank_levels_5,phq_rank_school_holidays_rank_levels_1,phq_rank_school_holidays_rank_levels_2,phq_rank_school_holidays_rank_levels_3,phq_rank_school_holidays_rank_levels_4,phq_rank_school_holidays_rank_levels_5,phq_rank_academic_session_rank_levels_1,phq_rank_academic_session_rank_levels_2,phq_rank_academic_session_rank_levels_3,phq_rank_academic_session_rank_levels_4,phq_rank_academic_session_rank_levels_5,phq_rank_academic_exam_rank_levels_1,phq_rank_academic_exam_rank_levels_2,phq_rank_academic_exam_rank_levels_3,phq_rank_academic_exam_rank_levels_4,phq_rank_academic_exam_rank_levels_5,phq_rank_academic_holiday_rank_levels_1,phq_rank_academic_holiday_rank_levels_2,phq_rank_academic_holiday_rank_levels_3,phq_rank_academic_holiday_rank_levels_4,phq_rank_academic_holiday_rank_levels_5
0,2021-09-01,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,5,0,0,0,0,0,0,0,0,0,3,0
1,2021-09-02,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,6,0,0,0,0,0,0,0,0,0,2,0
2,2021-09-03,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,6,0,0,0,0,0,0,0,0,0,2,0
3,2021-09-04,0,0,3,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,6,0,0,0,0,0,0,0,0,0,2,0
4,2021-09-05,0,0,3,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,6,0,0,0,0,0,0,0,0,0,2,0


#### Aggregate rank levels
As explained at the beginning of this section, each feature has 5 rank levels, so those 5 rank levels can be aggregated based on various requirements. Here we provide an example of aggregating rank level 3 and rank level 4 of `phq_rank_observances`:

In [28]:
feature_df['phq_rank_observances_rank_agg'] = feature_df[
    'phq_rank_observances_rank_levels_3'] + feature_df['phq_rank_observances_rank_levels_4']

feature_df['phq_rank_observances_rank_agg']

0     0
1     0
2     0
3     3
4     3
     ..
84    0
85    3
86    6
87    0
88    3
Name: phq_rank_observances_rank_agg, Length: 89, dtype: int64

<a id='impact'></a>
### Severe Weather Demand Impact Pattern - Retail

Our severe weather features are currently designed and tested on data for the <b>Retail segment</b> only. For example, for a flood event the impact pattern may show that it typically impacts retail businesses 1 day before and 2 days after the event. That impact pattern information is used in the features below. If your business is in an industry segment other than retail (e.g. Accommodation or Travel), the features below may not work for you or may be less effective.

For more details on severe weather see the [data science guide](https://docs.predicthq.com/datascience/severe-weather-events).

There are 13 features available for the retail industry from the Features API:

- `phq_impact_severe_weather_air_quality_retail`
- `phq_impact_severe_weather_blizzard_retail`
- `phq_impact_severe_weather_cold_wave_retail`
- `phq_impact_severe_weather_cold_wave_snow_retail`
- `phq_impact_severe_weather_cold_wave_storm_retail`
- `phq_impact_severe_weather_dust_retail`
- `phq_impact_severe_weather_dust_storm_retail`
- `phq_impact_severe_weather_flood_retail`
- `phq_impact_severe_weather_heat_wave_retail`
- `phq_impact_severe_weather_hurricane_retail`
- `phq_impact_severe_weather_thunderstorm_retail`
- `phq_impact_severe_weather_tornado_retail`
- `phq_impact_severe_weather_tropical_storm_retail`

Similar to Attendance Based Events, each of these features includes 7 stats:
- `sum` 
- `count` 
- `min`
- `max` (Recommended to start with)
- `avg`
- `median`
- `std_dev`

For severe weather, we recommend starting with `max`, as this notebook does.

#### Radius for Severe Weather

The distance between a store and an event is defined as the minimum distance between the store and the points of the event polygon. When the store is inside the polygon, the distance between the store and the event is 0km. The default radius is set to 0km, i.e., the events used for aggregation and feature engineering are those whose polygons overlap with the store.

<b>Note: when using the `location__geo` parameter in the Features API to query for features around a location, choose a radius of 1 meter (the `location__geo` parameter doesn’t support a radius of 0).</b>

#### Setup SDK parameters

In [29]:
parameters = dict()

latitude, longitude = (40.7079, -74.0115) # LAT, LONG for centre of New York City
radius_filter = "1m"
parameters.update(location__geo=dict(lat=latitude, lon=longitude,radius=radius_filter)) 

start_time = "2021-09-01"
end_time = "2021-11-28"
parameters.update(active__gte = start_time)
parameters.update(active__lte = end_time)

parameters

{'location__geo': {'lat': 40.7079, 'lon': -74.0115, 'radius': '1m'},
 'active__gte': '2021-09-01',
 'active__lte': '2021-11-28'}

In [30]:
categories_impact = [
     "phq_impact_severe_weather_air_quality_retail",
     "phq_impact_severe_weather_blizzard_retail",
     "phq_impact_severe_weather_cold_wave_retail",
     "phq_impact_severe_weather_cold_wave_snow_retail",
     "phq_impact_severe_weather_cold_wave_storm_retail",
     "phq_impact_severe_weather_dust_retail",
     "phq_impact_severe_weather_dust_storm_retail",
     "phq_impact_severe_weather_flood_retail",
     "phq_impact_severe_weather_heat_wave_retail",
     "phq_impact_severe_weather_hurricane_retail",
     "phq_impact_severe_weather_thunderstorm_retail",
     "phq_impact_severe_weather_tornado_retail",
     "phq_impact_severe_weather_tropical_storm_retail",
]

# only return the max of the severe weather impact
stats = ["max"]
#parameters.update(phq_attendance_school_holidays__stats=stats)
for i in categories_impact:
    parameters.update({f"{i}": {'stats': stats}})

# # Similar to Attend Events, low/high rank events can be excluded from calculating features.
# phq_rank_filter = 30

# for i in categories_impact:
#     parameters.update({f"{i}__phq_rank":{'gte':phq_rank_filter}})

parameters

{'location__geo': {'lat': 40.7079, 'lon': -74.0115, 'radius': '1m'},
 'active__gte': '2021-09-01',
 'active__lte': '2021-11-28',
 'phq_impact_severe_weather_air_quality_retail': {'stats': ['max']},
 'phq_impact_severe_weather_blizzard_retail': {'stats': ['max']},
 'phq_impact_severe_weather_cold_wave_retail': {'stats': ['max']},
 'phq_impact_severe_weather_cold_wave_snow_retail': {'stats': ['max']},
 'phq_impact_severe_weather_cold_wave_storm_retail': {'stats': ['max']},
 'phq_impact_severe_weather_dust_retail': {'stats': ['max']},
 'phq_impact_severe_weather_dust_storm_retail': {'stats': ['max']},
 'phq_impact_severe_weather_flood_retail': {'stats': ['max']},
 'phq_impact_severe_weather_heat_wave_retail': {'stats': ['max']},
 'phq_impact_severe_weather_hurricane_retail': {'stats': ['max']},
 'phq_impact_severe_weather_thunderstorm_retail': {'stats': ['max']},
 'phq_impact_severe_weather_tornado_retail': {'stats': ['max']},
 'phq_impact_severe_weather_tropical_storm_retail': {'stats': ['max']}}

#### Query features

In [31]:
results = []

for feature in phq.features.obtain_features(**parameters):
    results.append(flatten_dict(feature.model_dump(exclude_unset=True, exclude_none=True), '', '_'))
feature_df = pd.DataFrame(results)

feature_df.head()

Unnamed: 0,date,phq_impact_severe_weather_air_quality_retail_stats_max,phq_impact_severe_weather_blizzard_retail_stats_max,phq_impact_severe_weather_cold_wave_retail_stats_max,phq_impact_severe_weather_cold_wave_snow_retail_stats_max,phq_impact_severe_weather_cold_wave_storm_retail_stats_max,phq_impact_severe_weather_dust_retail_stats_max,phq_impact_severe_weather_dust_storm_retail_stats_max,phq_impact_severe_weather_flood_retail_stats_max,phq_impact_severe_weather_heat_wave_retail_stats_max,phq_impact_severe_weather_hurricane_retail_stats_max,phq_impact_severe_weather_thunderstorm_retail_stats_max,phq_impact_severe_weather_tornado_retail_stats_max,phq_impact_severe_weather_tropical_storm_retail_stats_max
0,2021-09-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,82.0,0.0,0.0,86.0,60.0,0.0
1,2021-09-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,80.0,0.0,0.0,34.0,0.0,0.0
2,2021-09-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,28.0,0.0,0.0,0.0,0.0,0.0
3,2021-09-04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2021-09-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


<a id='wide_range'></a>
## Using a longer date range
As mentioned earlier, the Features API only allows a date range of up to 90 days per request, so a longer range requires multiple requests. This section shows how to extract features for more than 90 days by splitting the range into chunks of at most 90 days and querying each chunk in turn. The examples below call `phq.features.obtain_features` once per chunk.

You may want to use this approach to download a data set of historic data to train your model.

<a id='functions_wide'></a>
### Common functions
Functions to split a wide date range into multiple ranges of up to 90 days each.

In [32]:
DATE_FORMAT = "%Y-%m-%d"
FEATURES_API_URL = "https://api.predicthq.com/v1/features"

phq = Client(access_token=ACCESS_TOKEN)

def get_date_groups(start, end):
    """
    The Features API allows a range of up to 90 days per request, so a longer
    range is split into consecutive chunks of at most 90 days (inclusive).
    Yields (start, end) pairs formatted with DATE_FORMAT.
    """
    capacity = timedelta(days=90)
    chunk_start = start
    while chunk_start <= end:
        # the last chunk is cut short so that it never runs past `end`
        chunk_end = min(chunk_start + capacity - timedelta(days=1), end)
        yield chunk_start.strftime(DATE_FORMAT), chunk_end.strftime(DATE_FORMAT)
        chunk_start = chunk_end + timedelta(days=1)
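
As a quick sanity check of the chunking idea, the sketch below splits the example range used later in this section (2021-06-01 to 2022-07-04) and prints each chunk with its length in days. It is a standalone illustration that uses only the standard library; `split_date_range` and `MAX_DAYS` are hypothetical names chosen for this example and are not part of the SDK.

```python
from datetime import date, timedelta

MAX_DAYS = 90  # maximum range per Features API request, as stated above


def split_date_range(start, end, max_days=MAX_DAYS):
    """Yield (chunk_start, chunk_end) pairs covering start..end inclusive."""
    chunk_start = start
    while chunk_start <= end:
        # each chunk spans at most max_days dates, counting both endpoints
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end + timedelta(days=1)


chunks = list(split_date_range(date(2021, 6, 1), date(2022, 7, 4)))
for chunk_start, chunk_end in chunks:
    n_days = (chunk_end - chunk_start).days + 1
    print(chunk_start.isoformat(), chunk_end.isoformat(), n_days)
```

Each chunk starts the day after the previous one ends, so no date is requested twice and none is skipped.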

<a id='attend_wide'></a>
### Attendance based features

In [33]:
categories_attended = [
    "phq_attendance_sports",
    "phq_attendance_conferences",
    "phq_attendance_expos",
    "phq_attendance_concerts",
    "phq_attendance_festivals",
    "phq_attendance_performing_arts",
    "phq_attendance_community",
    "phq_attendance_school_holidays",
]

def get_features_api_attended_data(lat, lon, start, end, radius, rank_threshold):
    """
    Retrieves attendance-based event features from the Features API for a given location and date range, keeping only events at or above a rank threshold.

    Parameters:
    lat: Latitude of the location.
    lon: Longitude of the location.
    start: Start date of the range in 'YYYY-MM-DD' format.
    end: End date of the range in 'YYYY-MM-DD' format.
    radius: The radius in miles for the geo-location query; it is recommended to use the value suggested by the Suggested Radius API.
    rank_threshold: The minimum PHQ rank threshold for filtering events.

    Returns:
    list: A list of dictionaries, one per date, each containing the attendance-based event features for that date.
    """
    start = datetime.strptime(start, DATE_FORMAT).date()
    end = datetime.strptime(end, DATE_FORMAT).date()

    result = []
    for gte, lte in get_date_groups(start, end):
        query = {
            "active__gte": gte,
            "active__lte": lte,
            "location__geo": {"lat": lat, "lon": lon, "radius": f"{radius}mi"},
        }

        query.update({f"{f}__stats": ["sum"] for f in categories_attended})
        query.update(
            {f"{f}__phq_rank": {"gte": rank_threshold} for f in categories_attended}
        )

        features = phq.features.obtain_features(**query)

        for feature in features:
            record = {}
            for k, v in feature.model_dump(exclude_unset=True, exclude_none=True).items():
                if k == "date":
                    record[k] = v.strftime("%Y-%m-%d")
                elif k in categories_attended:
                    record[k] = v.get("stats", {}).get("sum")
            result.append(record)

    return result


res = get_features_api_attended_data(40.7079, -74.0115, "2021-06-01", "2022-07-04", 5, 50)
df_attended = pd.DataFrame(res)

df_attended.head()

Unnamed: 0,date,phq_attendance_community,phq_attendance_concerts,phq_attendance_conferences,phq_attendance_expos,phq_attendance_festivals,phq_attendance_performing_arts,phq_attendance_school_holidays,phq_attendance_sports
0,2021-06-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4195.0
1,2021-06-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,14898.0
2,2021-06-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1746.0
3,2021-06-04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2021-06-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4386.0


<a id='attend_wide'></a>
### DIP for Attendance based features - Accommodation

In [34]:
categories_attended_dip_accommodation = [
    "phq_attendance_concerts_accommodation",
    "phq_attendance_conferences_accommodation",
    "phq_attendance_festivals_accommodation",
    "phq_attendance_expos_accommodation",
    "phq_attendance_sports_accommodation",
    "phq_attendance_performing_arts_accommodation",
    "phq_attendance_community_accommodation"
]

def get_features_api_attended_dip_accommodation_data(lat, lon, start, end, radius, rank_threshold):
    """
    Retrieves Accommodation Demand Impact Pattern (DIP) attendance features from the Features API for a given location and date range.

    Parameters:
    lat: Latitude of the location.
    lon: Longitude of the location.
    start: Start date of the range in 'YYYY-MM-DD' format.
    end: End date of the range in 'YYYY-MM-DD' format.
    radius: The radius in miles for the geo-location query; it is recommended to use the value suggested by the Suggested Radius API.
    rank_threshold: The minimum PHQ rank threshold for filtering events.

    Returns:
    list: A list of dictionaries, one per date, each containing the accommodation DIP attendance features for that date.
    """
    start = datetime.strptime(start, DATE_FORMAT).date()
    end = datetime.strptime(end, DATE_FORMAT).date()

    result = []
    for gte, lte in get_date_groups(start, end):
        query = {
            "active__gte": gte,
            "active__lte": lte,
            "location__geo": {"lat": lat, "lon": lon, "radius": f"{radius}mi"},
        }

        query.update({f"{f}__stats": ["sum"] for f in categories_attended_dip_accommodation})
        query.update(
            {f"{f}__phq_rank": {"gte": rank_threshold} for f in categories_attended_dip_accommodation}
        )

        features = phq.features.obtain_features(**query)

        for feature in features:
            record = {}
            for k, v in feature.model_dump(exclude_unset=True, exclude_none=True).items():
                if k == "date":
                    record[k] = v.strftime("%Y-%m-%d")
                elif k in categories_attended_dip_accommodation:
                    record[k] = v.get("stats", {}).get("sum")
            result.append(record)

    return result


res = get_features_api_attended_dip_accommodation_data(40.7079, -74.0115, "2021-06-01", "2022-07-04", 5, 50)
df_attended_accommodation = pd.DataFrame(res)

df_attended_accommodation.head()

Unnamed: 0,date,phq_attendance_community_accommodation,phq_attendance_concerts_accommodation,phq_attendance_conferences_accommodation,phq_attendance_expos_accommodation,phq_attendance_festivals_accommodation,phq_attendance_performing_arts_accommodation,phq_attendance_sports_accommodation
0,2021-06-01,0.0,0.0,0.0,0.0,0.0,0.0,14624.0
1,2021-06-02,0.0,0.0,0.0,0.0,0.0,0.0,15318.0
2,2021-06-03,0.0,0.0,0.0,0.0,0.0,0.0,4470.0
3,2021-06-04,0.0,0.0,0.0,0.0,0.0,0.0,878.0
4,2021-06-05,0.0,0.0,0.0,0.0,0.0,0.0,4386.0


<a id='attend_wide'></a>
### DIP for Attendance based features - Hospitality

In [35]:
categories_attended_dip_hospiality = [
    "phq_attendance_concerts_hospitality",
    "phq_attendance_conferences_hospitality",
    "phq_attendance_festivals_hospitality",
    "phq_attendance_expos_hospitality",
    "phq_attendance_sports_hospitality",
    "phq_attendance_performing_arts_hospitality",
    "phq_attendance_community_hospitality"
]

def get_features_api_attended_dip_hospitality_data(lat, lon, start, end, radius, rank_threshold):
    """
    Retrieves Hospitality Demand Impact Pattern (DIP) attendance features from the Features API for a given location and date range.

    Parameters:
    lat: Latitude of the location.
    lon: Longitude of the location.
    start: Start date of the range in 'YYYY-MM-DD' format.
    end: End date of the range in 'YYYY-MM-DD' format.
    radius: The radius in miles for the geo-location query; it is recommended to use the value suggested by the Suggested Radius API.
    rank_threshold: The minimum PHQ rank threshold for filtering events.

    Returns:
    list: A list of dictionaries, one per date, each containing the hospitality DIP attendance features for that date.
    """
    start = datetime.strptime(start, DATE_FORMAT).date()
    end = datetime.strptime(end, DATE_FORMAT).date()

    result = []
    for gte, lte in get_date_groups(start, end):
        query = {
            "active__gte": gte,
            "active__lte": lte,
            "location__geo": {"lat": lat, "lon": lon, "radius": f"{radius}mi"},
        }

        query.update({f"{f}__stats": ["sum"] for f in categories_attended_dip_hospiality})
        query.update(
            {f"{f}__phq_rank": {"gte": rank_threshold} for f in categories_attended_dip_hospiality}
        )

        features = phq.features.obtain_features(**query)

        for feature in features:
            record = {}
            for k, v in feature.model_dump(exclude_unset=True, exclude_none=True).items():
                if k == "date":
                    record[k] = v.strftime("%Y-%m-%d")
                elif k in categories_attended_dip_hospiality:
                    record[k] = v.get("stats", {}).get("sum")
            result.append(record)

    return result


res = get_features_api_attended_dip_hospitality_data(40.7079, -74.0115, "2021-06-01", "2022-07-04", 5, 50)
df_attended_hospitality = pd.DataFrame(res)

df_attended_hospitality.head()

Unnamed: 0,date,phq_attendance_community_hospitality,phq_attendance_concerts_hospitality,phq_attendance_conferences_hospitality,phq_attendance_expos_hospitality,phq_attendance_festivals_hospitality,phq_attendance_performing_arts_hospitality,phq_attendance_sports_hospitality
0,2021-06-01,0.0,0.0,0.0,0.0,0.0,0.0,9410.0
1,2021-06-02,0.0,0.0,0.0,0.0,0.0,0.0,16367.0
2,2021-06-03,0.0,0.0,0.0,0.0,0.0,0.0,5215.0
3,2021-06-04,0.0,0.0,0.0,0.0,0.0,0.0,1536.0
4,2021-06-05,0.0,0.0,0.0,0.0,0.0,0.0,4386.0
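
The three attendance functions above differ only in the list of feature names they request, so the query construction can be shared. The sketch below is one possible way to do that under the same key conventions used above (`<feature>__stats` and `<feature>__phq_rank`); `build_features_query` is a hypothetical helper name, and the sketch only builds the dictionary without calling the API.

```python
def build_features_query(lat, lon, gte, lte, radius, categories,
                         stat="sum", rank_threshold=None):
    """Build the keyword arguments for one Features API request (hypothetical helper)."""
    query = {
        "active__gte": gte,
        "active__lte": lte,
        "location__geo": {"lat": lat, "lon": lon, "radius": radius},
    }
    for category in categories:
        # request a single statistic per feature, as in the functions above
        query[f"{category}__stats"] = [stat]
        if rank_threshold is not None:
            # only count events at or above the given PHQ rank
            query[f"{category}__phq_rank"] = {"gte": rank_threshold}
    return query


query = build_features_query(
    40.7079, -74.0115, "2021-06-01", "2021-08-29", "5mi",
    ["phq_attendance_sports_hospitality", "phq_attendance_concerts_hospitality"],
    rank_threshold=50,
)
for key, value in query.items():
    print(key, value)
```

The resulting dictionary could then be passed as `phq.features.obtain_features(**query)` inside the loop over date chunks, in the same way as the functions above.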


<a id='rank_wide'></a>
### Rank based features

In [36]:
categories_rank = [
     "phq_rank_health_warnings",
     "phq_rank_observances",
     "phq_rank_public_holidays",
     "phq_rank_school_holidays",
     "phq_rank_academic_session",
     "phq_rank_academic_exam",
     "phq_rank_academic_holiday",
]

def get_features_api_data(lat, lon, start, end):
    """
    Retrieves rank-based event features from the Features API for a given location and date range.
    The query radius is fixed at 1 mile inside the function.

    Parameters:
    lat: Latitude of the location.
    lon: Longitude of the location.
    start: Start date of the range in 'YYYY-MM-DD' format.
    end: End date of the range in 'YYYY-MM-DD' format.

    Returns:
    list: A list of dictionaries, one per date, each containing the rank-based event features for that date.
    """

    start = datetime.strptime(start, DATE_FORMAT).date()
    end = datetime.strptime(end, DATE_FORMAT).date()

    result = []
    for gte, lte in get_date_groups(start, end):
        query = {
            "active__gte": gte,
            "active__lte": lte,
            "location__geo": {"lat": lat, "lon": lon, "radius": "1mi"},
        }

        query.update({f"{f}": True for f in categories_rank})

        features = phq.features.obtain_features(**query)

        for feature in features:
            record = {}
            for k, v in feature.model_dump(exclude_unset=True, exclude_none=True).items():
                if k == "date":
                    record[k] = v.strftime("%Y-%m-%d")
                elif k in categories_rank:
                    for rank_level, level_count in v.get("rank_levels", {}).items():
                        record[f"{k}_level_{rank_level}"] = float(level_count)

            result.append(record)

    return result

res = get_features_api_data(40.7079, -74.0115, "2021-06-01", "2022-07-04")
df_rank = pd.DataFrame(res)
df_rank.head()

Unnamed: 0,date,phq_rank_health_warnings_level_1,phq_rank_health_warnings_level_2,phq_rank_health_warnings_level_3,phq_rank_health_warnings_level_4,phq_rank_health_warnings_level_5,phq_rank_observances_level_1,phq_rank_observances_level_2,phq_rank_observances_level_3,phq_rank_observances_level_4,phq_rank_observances_level_5,phq_rank_public_holidays_level_1,phq_rank_public_holidays_level_2,phq_rank_public_holidays_level_3,phq_rank_public_holidays_level_4,phq_rank_public_holidays_level_5,phq_rank_school_holidays_level_1,phq_rank_school_holidays_level_2,phq_rank_school_holidays_level_3,phq_rank_school_holidays_level_4,phq_rank_school_holidays_level_5,phq_rank_academic_session_level_1,phq_rank_academic_session_level_2,phq_rank_academic_session_level_3,phq_rank_academic_session_level_4,phq_rank_academic_session_level_5,phq_rank_academic_exam_level_1,phq_rank_academic_exam_level_2,phq_rank_academic_exam_level_3,phq_rank_academic_exam_level_4,phq_rank_academic_exam_level_5,phq_rank_academic_holiday_level_1,phq_rank_academic_holiday_level_2,phq_rank_academic_holiday_level_3,phq_rank_academic_holiday_level_4,phq_rank_academic_holiday_level_5
0,2021-06-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,0.0
1,2021-06-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,0.0
2,2021-06-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,0.0
3,2021-06-04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,0.0
4,2021-06-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,0.0
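
Each rank-based feature is expanded into one column per rank level, which is why the table above has five `_level_N` columns per feature. The sketch below isolates that flattening step on a single hand-written value. The sample dictionary is illustrative only: its shape follows the `rank_levels` lookup in the function above, and the string keys are an assumption about how the levels are keyed. `flatten_rank_levels` is a hypothetical helper name.

```python
def flatten_rank_levels(feature_name, feature_value):
    """Turn {"rank_levels": {level: count}} into flat column -> float pairs."""
    levels = feature_value.get("rank_levels", {})
    return {
        f"{feature_name}_level_{level}": float(count)
        for level, count in levels.items()
    }


# illustrative value shaped like one rank-based feature for a single date
sample = {"rank_levels": {"1": 0, "2": 0, "3": 3, "4": 0, "5": 0}}

row = {"date": "2021-06-01"}
row.update(flatten_rank_levels("phq_rank_observances", sample))
print(row)
```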


<a id='impact_wide'></a>
### Impact based features
<b>Severe weather features</b>

In [37]:
categories_impact = {
    "phq_impact_severe_weather_air_quality_retail",
    "phq_impact_severe_weather_blizzard_retail",
    "phq_impact_severe_weather_cold_wave_retail",
    "phq_impact_severe_weather_cold_wave_snow_retail",
    "phq_impact_severe_weather_cold_wave_storm_retail",
    "phq_impact_severe_weather_dust_retail",
    "phq_impact_severe_weather_dust_storm_retail",
    "phq_impact_severe_weather_flood_retail",
    "phq_impact_severe_weather_heat_wave_retail",
    "phq_impact_severe_weather_hurricane_retail",
    "phq_impact_severe_weather_thunderstorm_retail",
    "phq_impact_severe_weather_tornado_retail",
    "phq_impact_severe_weather_tropical_storm_retail",
}


def get_features_api_impact_events(lat, lon, start, end, rank_threshold):
    """
    Retrieves impact-based (severe weather) features from the Features API for a given location and date range.
    The query radius is fixed at 1 meter inside the function.

    Parameters:
    lat: Latitude of the location.
    lon: Longitude of the location.
    start: Start date of the range in 'YYYY-MM-DD' format.
    end: End date of the range in 'YYYY-MM-DD' format.
    rank_threshold: The minimum PHQ rank threshold for filtering events.

    Returns:
    list: A list of dictionaries, one per date, each containing the impact-based event features for that date.
    """
    start = datetime.strptime(start, DATE_FORMAT).date()
    end = datetime.strptime(end, DATE_FORMAT).date()

    result = []
    for gte, lte in get_date_groups(start, end):
        query = {
            "active__gte": gte,
            "active__lte": lte,
            "location__geo": {"lat": lat, "lon": lon, "radius": "1m"},
        }

        query.update({f"{f}__stats": ["max"] for f in categories_impact})
        query.update(
            {f"{f}__phq_rank": {"gte": rank_threshold} for f in categories_impact}
        )

        features = phq.features.obtain_features(**query)

        for feature in features:
            record = {}
            for k, v in feature.model_dump(exclude_unset=True, exclude_none=True).items():
                if k == "date":
                    record[k] = v.strftime("%Y-%m-%d")
                elif k in categories_impact:
                    record[k] = v.get("stats", {}).get("max")

            result.append(record)

    return result



res = get_features_api_impact_events(
    40.7079, -74.0115, "2021-06-01", "2022-07-04", 50
)
df_impact_features = pd.DataFrame(res)

# drop features that contain only 0s
columns_constant = [
    col
    for col in df_impact_features.columns
    if col != "date" and df_impact_features[col].sum() == 0
]
df_impact_features = df_impact_features.drop(columns=columns_constant)

df_impact_features

Unnamed: 0,date,phq_impact_severe_weather_cold_wave_retail,phq_impact_severe_weather_cold_wave_storm_retail,phq_impact_severe_weather_flood_retail,phq_impact_severe_weather_heat_wave_retail,phq_impact_severe_weather_thunderstorm_retail,phq_impact_severe_weather_tornado_retail,phq_impact_severe_weather_tropical_storm_retail
0,2021-06-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2021-06-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2021-06-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2021-06-04,0.0,0.0,0.0,0.0,56.0,0.0,0.0
4,2021-06-05,0.0,0.0,0.0,0.0,25.0,0.0,0.0
...,...,...,...,...,...,...,...,...
394,2022-06-30,0.0,0.0,0.0,0.0,0.0,0.0,0.0
395,2022-07-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0
396,2022-07-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0
397,2022-07-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0
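
Because every function in this section returns one dictionary per date, the attendance, rank, and impact results can be consolidated into a single row per date before they are joined with your own daily data. The sketch below shows one way to do this with plain Python. `merge_records_by_date` is a hypothetical helper name, and the two short input lists reuse a few values from the outputs above purely for illustration.

```python
from collections import defaultdict


def merge_records_by_date(*record_lists):
    """Combine lists of per-date dictionaries into one dictionary per date."""
    merged = defaultdict(dict)
    for records in record_lists:
        for record in records:
            merged[record["date"]].update(record)
    # ISO-formatted date strings sort chronologically
    return [merged[d] for d in sorted(merged)]


attended = [
    {"date": "2021-06-04", "phq_attendance_sports": 0.0},
    {"date": "2021-06-05", "phq_attendance_sports": 4386.0},
]
impact = [
    {"date": "2021-06-04", "phq_impact_severe_weather_thunderstorm_retail": 56.0},
    {"date": "2021-06-05", "phq_impact_severe_weather_thunderstorm_retail": 25.0},
]

combined = merge_records_by_date(attended, impact)
for record in combined:
    print(record)
```

The combined list can be passed to `pd.DataFrame` in the same way as the individual results above.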

