To call the feature store through the API, we need the project name and the API key. I am using ALL CAPS for Hopsworks variables.

In [1]:
# project name used in feature store API calls
HOPSWORKS_PROJECT_NAME = 'taxi_demand_rs'

#### Loading the API Key from the .env File

The **dotenv** library loads variables from an external file into the environment. Once they are environment variables, we can read them with the **os** module.

A **.env** file is a plain text file that stores API keys and other sensitive information. This file is created in our project's parent directory (not in the notebooks or src folders). We store the key in this file because hardcoding an API key in a notebook or script is a serious security risk.

In [2]:
import os
from dotenv import load_dotenv
from src.paths import PARENT_DIR

# specify the path where the file is
load_dotenv(PARENT_DIR / '.env')

HOPSWORKS_API_KEY = os.environ['HOPSWORKS_API_KEY']
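
If the **.env** file is missing or the variable name is misspelled, `os.environ[...]` raises a bare `KeyError`. A minimal sketch of a helper that fails with a more descriptive message is shown here; the name `require_env` is hypothetical and not part of this project or of **dotenv**.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise a descriptive error.

    Hypothetical helper for illustration; not part of the project code.
    """
    value = os.environ.get(name)
    if not value:
        # covers both a missing variable and an empty value
        raise RuntimeError(
            f"{name} is not set. Add a line like {name}=<your key> to the .env file."
        )
    return value
```

After `load_dotenv(...)` has run, this could be used as `HOPSWORKS_API_KEY = require_env('HOPSWORKS_API_KEY')`.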

##### Committing to Git

Never commit the API key to a GitHub repository. For this reason, we create a **.gitignore** file that lists **.env**, so Git leaves it out of commits. This file is located in the parent directory (not src, data, notebooks).
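
As a small sketch of the idea, the snippet below makes sure a given entry (by default `.env`) is listed in a **.gitignore** file and appends it if it is not. The function name `ensure_ignored` is hypothetical, and the check is a plain line-by-line comparison that does not interpret Git's pattern syntax.

```python
from pathlib import Path


def ensure_ignored(gitignore_path: Path, entry: str = ".env") -> bool:
    """Append `entry` to the .gitignore file unless it is already listed.

    Returns True if the file was changed, False if the entry was present.
    Hypothetical helper for illustration; compares whole lines only.
    """
    text = gitignore_path.read_text() if gitignore_path.exists() else ""
    if entry in (line.strip() for line in text.splitlines()):
        return False
    # start on a fresh line in case the file lacks a trailing newline
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    gitignore_path.write_text(text + prefix + entry + "\n")
    return True
```

In this project it could be called as `ensure_ignored(PARENT_DIR / '.gitignore')`. Note that a **.gitignore** entry only affects untracked files; a file that was already committed stays in the repository history.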

#### Fetching Raw Data

Now we can use the **load_raw_data** function from the **src.data** module to load raw data from 2022 through the current year.

In [4]:
from datetime import datetime
import pandas as pd
from src.data import load_raw_data

# use load_raw_data
# starting year of data fetching
start_year = 2022
# ending year will be the current year
end_year = datetime.now().year   
print(f'Downloading files from {start_year} to {end_year}.')

# collect one dataframe per year, then concatenate once at the end
# (cheaper than growing a dataframe inside the loop)
yearly_rides = []

# loop to download all wanted data
for year in range(start_year, end_year + 1):
    # download data for the year
    yearly_rides.append(load_raw_data(year))

# combine all years into a single dataframe
rides = pd.concat(yearly_rides)



Downloading files from 2022 to 2024.
File 2022-01 was already in local storage
File 2022-02 was already in local storage
File 2022-03 was already in local storage
File 2022-04 was already in local storage
File 2022-05 was already in local storage
File 2022-06 was already in local storage
File 2022-07 was already in local storage
File 2022-08 was already in local storage
File 2022-09 was already in local storage
File 2022-10 was already in local storage
File 2022-11 was already in local storage
File 2022-12 was already in local storage
File 2023-01 was already in local storage
File 2023-02 was already in local storage
File 2023-03 was already in local storage
File 2023-04 was already in local storage
File 2023-05 was already in local storage
File 2023-06 was already in local storage
File 2023-07 was already in local storage
File 2023-08 was already in local storage
File 2023-09 was already in local storage
File 2023-10 was already in local storage
File 2023-11 was already in local stora

In [5]:
print(len(rides))

101372930


#### Transform the Data into Time Series Data

Next, the raw rides need to be transformed into time series data. We can use the **transform_raw_data_into_ts_data** function from the **src.data** module for this.

In [6]:
from src.data import transform_raw_data_into_ts_data

ts_data = transform_raw_data_into_ts_data(rides)

  full_range = pd.date_range(ts_data['pickup_hour'].min(),
100%|██████████| 265/265 [00:08<00:00, 31.27it/s]
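
The internals of **transform_raw_data_into_ts_data** are not shown in this notebook. As a rough sketch of what a transformation of this kind typically involves, the example below counts rides per hour and pickup location and fills hours without rides with zero. The function name `to_hourly_counts` and the column names `pickup_datetime` and `pickup_location_id` are assumptions made for illustration; the project's actual implementation may differ.

```python
import pandas as pd


def to_hourly_counts(rides: pd.DataFrame) -> pd.DataFrame:
    """Aggregate individual rides into hourly counts per pickup location.

    Illustrative sketch; column names are assumed, not taken from src.data.
    """
    rides = rides.copy()
    # truncate each pickup timestamp to the start of its hour
    rides['pickup_hour'] = rides['pickup_datetime'].dt.floor('h')

    # count rides for each (hour, location) pair that occurs in the data
    counts = (
        rides.groupby(['pickup_hour', 'pickup_location_id'])
        .size()
        .rename('rides')
    )

    # build the full hour x location grid so that empty hours show up as 0
    full_range = pd.date_range(
        rides['pickup_hour'].min(), rides['pickup_hour'].max(), freq='h'
    )
    locations = rides['pickup_location_id'].unique()
    grid = pd.MultiIndex.from_product(
        [full_range, locations],
        names=['pickup_hour', 'pickup_location_id'],
    )
    return counts.reindex(grid, fill_value=0).reset_index()
```

The result has one row per hour and location, with columns `pickup_hour`, `pickup_location_id`, and `rides`. Filling the gaps matters because a model trained on the series would otherwise never see hours with zero demand.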


In [8]:
import hopsworks