# Integrating the SmartSpeed API to access team data
This guide explains how to interact with the [External SmartSpeed API](https://prd-use-api-extsmartspeed.valdperformance.com/index.html). The accompanying Python class pulls data from the SmartSpeed and Profiles API endpoints and formats it into CSV files like the ones you would export from ValdHub.

To use this class, you must first retrieve your ClientID, ClientSecret, and TenantID. In the `vald_smartspeed` Python file, set `self.client_id`, `self.client_secret`, and `self.tenant_id` within the `Vald()` class to your retrieved values. As described below, this notebook expects them to come from a `.env` file rather than being hard-coded.


### First, import the required Python packages.

In [1]:
import pandas as pd
import vald_smartspeed
from vald_smartspeed import Vald
from dotenv import load_dotenv
import os
import numpy as np
import time
import logging
import importlib
load_dotenv()
logging.basicConfig(level=logging.INFO)
importlib.reload(vald_smartspeed) #used to ensure any changes in the vald_smartspeed python file are reflected in this notebook

<module 'vald_smartspeed' from 'C:\\Users\\cquin\\OneDrive\\Documents\\vald\\vald_api_pulls\\smartspeed\\vald_smartspeed.py'>

### Our class is called `Vald`. Let's create an instance of it and inspect its attributes and methods
The attributes defined in this class are:

| Attribute                 | Description                                                                                             |
|----------------------------|---------------------------------------------------------------------------------------------------------|
| `client_id`                | The unique identifier for the client application, used to authenticate Vald API requests.                    |
| `client_secret`            | A secret key associated with the `client_id`, used to securely authenticate Vald API requests.                |
| `tenant_id`                | The identifier for the specific tenant or organization within the Vald API system.                          |
| `smartspeed_api_url`       | The URL endpoint for accessing the [External Smartspeed API](https://prd-use-api-extsmartspeed.valdperformance.com/index.html), used to retrieve performance metrics and data.        |
| `groupnames_api_url`       | The URL endpoint for accessing the [External Tenants API](https://prd-use-api-externaltenants.valdperformance.com/swagger/index.html) to retrieve group (team) names related to the athletes or tests.        |
| `profiles_api_url`         | The URL endpoint for accessing the [External Profiles API](https://prd-use-api-externalprofile.valdperformance.com/swagger/index.html) to retrieve athlete profile information.                         |
| `vald_master_file_path`    | The file path to the master file containing all smartspeed data.                                        |
| `base_directory`           | The base directory on the local system where files and data related to the Vald system are stored.       |

`client_id`, `client_secret`, and `tenant_id` will be stored in a `.env` file for security purposes. You can retrieve these credentials from ValdHub or by reaching out to your Vald support representative. These values remain consistent across your organization, meaning you'll use the same credentials to interact with all of Vald's external APIs.
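
As a rough illustration of the `.env` approach, the sketch below reads the three credentials from an environment-style mapping and fails early if one is missing. The key names `CLIENT_ID`, `CLIENT_SECRET`, and `TENANT_ID` and the helper `load_credentials` are assumptions made for this example; use whatever names your own `.env` file and `Vald()` class expect.

```python
import os

# Hypothetical variable names; match them to the keys in your own .env file.
REQUIRED_KEYS = ("CLIENT_ID", "CLIENT_SECRET", "TENANT_ID")


def load_credentials(env=os.environ):
    """Return the three credentials, failing early if any is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing credentials in environment: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_KEYS}


# Example with an in-memory mapping instead of the real environment.
example_env = {"CLIENT_ID": "abc", "CLIENT_SECRET": "s3cret", "TENANT_ID": "t-1"}
credentials = load_credentials(example_env)
print(sorted(credentials))
```

Because the notebook already calls `load_dotenv()`, values defined in a `.env` file are available through `os.environ`, so calling `load_credentials()` with no argument would pick them up.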


### Now let's look at the methods of this class

| Method                  | Description                                                                                                               |
|-------------------------|---------------------------------------------------------------------------------------------------------------------------|
| `get_last_update`        | Retrieves the last test date from the MasterFile and adds a 1-millisecond increment to ensure uniqueness for API requests. |
| `sanitize_filename`      | Replaces any special characters in a filename with underscores for safe file saving.                                       |
| `sanitize_foldername`    | Replaces any special characters in a folder name with underscores or spaces to ensure safe folder creation.                |
| `get_access_token`       | Requests an access token using `client_id` and `client_secret` to authenticate API requests.                               |
| `fetch_data`             | Fetches data from a given API URL using provided headers, returning JSON data if the response is successful.               |
| `get_tests`              | Retrieves test data from the [SmartSpeed API](https://prd-use-api-extsmartspeed.valdperformance.com/index.html), athlete information from the Profiles API, and group (team) information from the Tenants API, fetching athlete profiles in parallel and attaching names and groups to each test.  |
| `modify_df`              | Modifies and reformats a DataFrame by adding UTC dates/times and renaming key columns for clarity.                         |
| `update_smartspeed`      | Updates the master data file with the latest test data from the Smartspeed system.                                         |
| `update_master_file`     | Appends new data to the existing master file or creates a new file if it does not exist.                                    |
| `save_dataframes`        | Saves team-specific test data in individual files, updating existing files if necessary.                                    |
| `save_master_file`       | Saves the master DataFrame to the specified file, creating necessary directories if they don't exist.                      |
| `data_to_groups`         | Organizes the retrieved test data into teams/groups and separates it by test type.                                         |
| `get_data_until_today`   | Fetches test data from the API until today’s date and saves it after processing and filtering for duplicates.              |
| `populate_folders`       | Sets up the folder structure and updates the data by calling the relevant methods, then saves the team/group data.         |


### `get_tests`

The `get_tests` method accepts two parameters: `start_date` and `pageno`. 

- The **`start_date`** parameter is passed as the `TestFromUtc` query parameter of the `/tests` endpoint, so the function retrieves tests starting from the specified date.
- The **`pageno`** parameter indicates which page of data to fetch from the endpoint.

This structure provides an intuitive way to interact with the API, as it retrieves tests starting from a certain date and paginates through the results. However, you could modify the function to include additional parameters like:

- **`TestToUtc`**: Specifies the end date for filtering tests.
- **`ModifiedFromUtc`**: Filters tests based on the date they were last modified.
- **`GroupUnderTestId`**: Filters tests by specific group IDs.

These parameters can enhance flexibility depending on your specific implementation needs.
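
One way to add such optional filters is to build the query string from a dictionary and skip anything left unset. This is only a sketch: `build_tests_url` is a hypothetical helper, and the `TenantId` and `Page` parameter names are assumptions, so check the API documentation for the exact names before relying on it.

```python
from urllib.parse import urlencode

# Base URL taken from the API link in this guide; "/tests" is the endpoint named above.
SMARTSPEED_API_URL = "https://prd-use-api-extsmartspeed.valdperformance.com"


def build_tests_url(tenant_id, start_date, pageno, **optional_filters):
    """Build a /tests query URL, skipping any optional filter left as None."""
    params = {
        "TenantId": tenant_id,      # assumed parameter name
        "TestFromUtc": start_date,
        "Page": pageno,             # assumed parameter name
    }
    # Only include filters such as TestToUtc or GroupUnderTestId when a value is given.
    params.update({k: v for k, v in optional_filters.items() if v is not None})
    return f"{SMARTSPEED_API_URL}/tests?{urlencode(params)}"


print(build_tests_url("tenant-123", "2024-10-01T00:00:00Z", 1,
                      TestToUtc="2024-10-31T23:59:59Z"))
```

`urlencode` percent-encodes the colons in the timestamps, which keeps the URL valid without manual string formatting.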


#### Functionality

1. **Access Token Retrieval**:
   - The method starts by attempting to get the access token using the `get_access_token` function. If it fails to retrieve the token, it prints an error message and exits the function.

2. **API URL Construction**:
   - An API URL, `api_url`, is constructed from the provided `start_date` and `pageno`, which are formatted as query-string parameters.

3. **Fetching Tests Data**:
   - The method calls the `fetch_data` function with the constructed API URL and authorization headers to retrieve the tests data. If the response is empty (i.e., `None`), it returns an empty DataFrame.

4. **Group Names Retrieval**:
   - The method then constructs a second API URL to fetch group names associated with the tenant. It creates a mapping of group IDs to group names for later use.

5. **Concurrent Data Fetching for Profiles**:
   - Using a `ThreadPoolExecutor`, it concurrently fetches profile data for each test in the `tests_data` using the `fetch_data` method. This is done to improve efficiency by making multiple requests in parallel.
   - As each profile is retrieved, the method adds the `Name` (composed of the given and family names) and the associated `Groups` (derived from group IDs) to the corresponding test record.

6. **Flattening Nested JSON**:
   - The method includes a nested function, `flatten_json`, which is responsible for transforming nested JSON structures into a flat dictionary format. This is useful for converting complex data into a more manageable format.
   - The method calls this function on each record in the `tests_data` to create a list of flattened records.

7. **DataFrame Creation**:
   - Finally, it converts the flattened data into a pandas DataFrame and prints a completion message before returning the DataFrame.
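
The parallel profile lookup in steps 4 and 5 can be sketched as follows. This is not the code in `vald_smartspeed.py`: `fetch_profile` is a stand-in that reads from an in-memory dictionary instead of calling the Profiles API, and the field names `givenName`, `familyName`, and `groupIds`, as well as the `|` separator, are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Stand-ins for real API responses; every value here is made up for illustration.
FAKE_PROFILES = {
    "p1": {"givenName": "Ada", "familyName": "Lovelace", "groupIds": ["g1"]},
    "p2": {"givenName": "Alan", "familyName": "Turing", "groupIds": ["g1", "g2"]},
}
GROUP_NAMES = {"g1": "SB", "g2": "Performance"}  # group ID -> group name (step 4)


def fetch_profile(profile_id):
    """Hypothetical stand-in for an HTTP call to the Profiles API."""
    return FAKE_PROFILES.get(profile_id)


def attach_profiles(tests_data, max_workers=8):
    """Fetch each test's profile in parallel and add Name and Groups to the record."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the test record it belongs to.
        future_to_test = {
            executor.submit(fetch_profile, test["profileId"]): test
            for test in tests_data
        }
        for future in as_completed(future_to_test):
            test = future_to_test[future]
            profile = future.result()
            if profile is None:
                continue  # leave the record untouched if the lookup failed
            test["Name"] = f"{profile['givenName']} {profile['familyName']}"
            test["Groups"] = "|".join(GROUP_NAMES.get(g, g) for g in profile["groupIds"])
    return tests_data


print(attach_profiles([{"id": "t1", "profileId": "p1"}]))
```

Threads suit this step because each lookup spends most of its time waiting on the network, so several requests can be in flight at once.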

In [2]:
smartspeed = Vald()
start_date = '2024-10-01T00:00:00Z'
#Store the data in october_data
october_data = smartspeed.get_tests(start_date, 1)
#If you are seeing "Failed to retrieve access token", ensure you have properly set up your .env file.

Getting tests starting from 2024-10-01T00:00:00Z on page number 1
Data retrieval complete.


### JSON Data Retrieved from the SmartSpeed API

The following JSON is the example value given in the schema of the `/tests` endpoint:

```json
[
  {
    "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "testResultId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "profileId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "groupUnderTestId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "testName": "string",
    "testTypeName": "TrafficLightSprint",
    "repCount": 0,
    "deviceCount": 0,
    "testDateUtc": "2024-10-07T20:11:10.186Z",
    "additionalOptionsFields": {
      "startType": "Standard",
      "direction": "Left",
      "cutDirectionChoice": "Random",
      "reactiveDelayEnabled": true,
      "reactiveDelayMinimumInSeconds": 0,
      "reactiveDelayMaximumInSeconds": 0,
      "events": 0,
      "durationInSeconds": 0,
      "lapCount": 0,
      "intervalType": "FixedDuration",
      "testStandardType": "Standard",
      "dropHeight": 0,
      "dropHeightEnabled": true,
      "weightKg": 0
    },
    "runningSummaryFields": {
      "totalTimeSeconds": 0,
      "bestSplitSeconds": 0,
      "splitAverageSeconds": 0,
      "velocityFields": {
        "peakVelocityMetersPerSecond": 0,
        "meanVelocityMetersPerSecond": 0,
        "distance": 0,
        "fvpSummaryDto": {
          "maxVelocity": 0,
          "maxForce": 0,
          "maxForceNormalised": 0,
          "maxPower": 0,
          "maxPowerNormalised": 0,
          "forceVelocityCurve": 0,
          "drf": 0,
          "rfMax": 0,
          "tau": 0,
          "vMax": 0
        }
      },
      "gateSummaryFields": {
        "splitOne": 0,
        "splitTwo": 0,
        "splitThree": 0,
        "splitFour": 0,
        "cumulativeOne": 0,
        "cumulativeTwo": 0,
        "cumulativeThree": 0,
        "cumulativeFour": 0
      }
    },
    "jumpingSummaryFields": {
      "flightTimeSeconds": 0,
      "contactTimeSeconds": 0,
      "heightMeters": 0,
      "rsi": 0,
      "flightTimeOverContractionTime": 0,
      "peakPowerOutput": 0,
      "legStiffness": 0,
      "impulse": 0,
      "flightTimePlusContractionTime": 0,
      "peakPowerOutputOverTotalMass": 0
    },
    "isValid": true,
    "allGroups": [
      "3fa85f64-5717-4562-b3fc-2c963f66afa6"
    ]
  }
]
```
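
A nested record like the one above maps naturally onto dot-separated column names such as `runningSummaryFields.gateSummaryFields.splitOne` or `allGroups.0`. The function below is a minimal sketch of that idea, written independently of the `flatten_json` inside `get_tests`, which may differ in its details.

```python
def flatten_json(value, prefix=""):
    """Flatten nested dicts and lists into a single dict with dot-separated keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten_json(child, f"{prefix}{key}."))
    elif isinstance(value, list):
        # List items are keyed by their position, e.g. "allGroups.0".
        for index, child in enumerate(value):
            flat.update(flatten_json(child, f"{prefix}{index}."))
    else:
        flat[prefix[:-1]] = value  # drop the trailing dot on leaf keys
    return flat


record = {
    "testName": "20m Sprint",
    "runningSummaryFields": {
        "totalTimeSeconds": 3.551,
        "gateSummaryFields": {"splitOne": 3.551},
    },
    "jumpingSummaryFields": None,
    "allGroups": ["b2e3f63a-bcb2-456d-b9bf-4e290ff128b3"],
}
print(flatten_json(record))
```

A list of such flattened dictionaries can be passed straight to `pd.DataFrame(...)` to get one column per leaf value.
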
The `get_tests` method includes a function that flattens this JSON, turning each nested value into its own column of a pandas DataFrame, which makes the data easier to access and analyze. It also attaches the athlete's name and group (team) to each row by matching the `profileId` with the corresponding athlete profile: the name comes from the Profiles API and the group (team) from the Tenants API. The resulting DataFrame looks like this:

In [3]:
pd.set_option('display.max_columns', None)
#Remove the next line when running on your own data. Names and IDs are de-identified here for privacy reasons
october_data[['Name', 'id', 'profileId']] = np.nan
october_data.head()

Unnamed: 0,id,testResultId,profileId,groupUnderTestId,testName,testTypeName,repCount,deviceCount,testDateUtc,additionalOptionsFields.startType,additionalOptionsFields.direction,additionalOptionsFields.cutDirectionChoice,additionalOptionsFields.reactiveDelayEnabled,additionalOptionsFields.reactiveDelayMinimumInSeconds,additionalOptionsFields.reactiveDelayMaximumInSeconds,additionalOptionsFields.events,additionalOptionsFields.durationInSeconds,additionalOptionsFields.lapCount,additionalOptionsFields.intervalType,additionalOptionsFields.testStandardType,additionalOptionsFields.dropHeight,additionalOptionsFields.dropHeightEnabled,additionalOptionsFields.weightKg,runningSummaryFields.totalTimeSeconds,runningSummaryFields.bestSplitSeconds,runningSummaryFields.splitAverageSeconds,runningSummaryFields.velocityFields.peakVelocityMetersPerSecond,runningSummaryFields.velocityFields.meanVelocityMetersPerSecond,runningSummaryFields.velocityFields.distance,runningSummaryFields.velocityFields.fvpSummaryDto.maxVelocity,runningSummaryFields.velocityFields.fvpSummaryDto.maxForce,runningSummaryFields.velocityFields.fvpSummaryDto.maxForceNormalised,runningSummaryFields.velocityFields.fvpSummaryDto.maxPower,runningSummaryFields.velocityFields.fvpSummaryDto.maxPowerNormalised,runningSummaryFields.velocityFields.fvpSummaryDto.forceVelocityCurve,runningSummaryFields.velocityFields.fvpSummaryDto.drf,runningSummaryFields.velocityFields.fvpSummaryDto.rfMax,runningSummaryFields.velocityFields.fvpSummaryDto.tau,runningSummaryFields.velocityFields.fvpSummaryDto.vMax,runningSummaryFields.gateSummaryFields.splitOne,runningSummaryFields.gateSummaryFields.splitTwo,runningSummaryFields.gateSummaryFields.splitThree,runningSummaryFields.gateSummaryFields.splitFour,runningSummaryFields.gateSummaryFields.cumulativeOne,runningSummaryFields.gateSummaryFields.cumulativeTwo,runningSummaryFields.gateSummaryFields.cumulativeThree,runningSummaryFields.gateSummaryFields.cumulativeFour,jumpingSummaryFields,isValid,allGroups.0,Name,Groups
0,,00000000-0000-0000-0000-000000000000,,b2e3f63a-bcb2-456d-b9bf-4e290ff128b3,20m Sprint,OneWay,1,2,2024-10-01T15:38:48,Standard,Left,,False,0,0,,,1,,Standard,0,False,0,3.551,3.551,3.551,5.632,5.632,20,0,0,0,0,0,0,0,0,0,0,3.551,0,0,0,3.551,0,0,0,,,b2e3f63a-bcb2-456d-b9bf-4e290ff128b3,,SB
1,,00000000-0000-0000-0000-000000000000,,b2e3f63a-bcb2-456d-b9bf-4e290ff128b3,20m Sprint,OneWay,1,2,2024-10-01T15:41:22,Standard,Left,,False,0,0,,,1,,Standard,0,False,0,3.603,3.603,3.603,5.551,5.551,20,0,0,0,0,0,0,0,0,0,0,3.603,0,0,0,3.603,0,0,0,,,b2e3f63a-bcb2-456d-b9bf-4e290ff128b3,,SB
2,,00000000-0000-0000-0000-000000000000,,b2e3f63a-bcb2-456d-b9bf-4e290ff128b3,20m Sprint,OneWay,1,2,2024-10-01T15:37:03,Standard,Left,,False,0,0,,,1,,Standard,0,False,0,3.69,3.69,3.69,5.42,5.42,20,0,0,0,0,0,0,0,0,0,0,3.69,0,0,0,3.69,0,0,0,,,b2e3f63a-bcb2-456d-b9bf-4e290ff128b3,,SB
3,,00000000-0000-0000-0000-000000000000,,b2e3f63a-bcb2-456d-b9bf-4e290ff128b3,20m Sprint,OneWay,1,2,2024-10-01T15:42:44,Standard,Left,,False,0,0,,,1,,Standard,0,False,0,3.298,3.298,3.298,6.064,6.064,20,0,0,0,0,0,0,0,0,0,0,3.298,0,0,0,3.298,0,0,0,,,b2e3f63a-bcb2-456d-b9bf-4e290ff128b3,,SB
4,,00000000-0000-0000-0000-000000000000,,b2e3f63a-bcb2-456d-b9bf-4e290ff128b3,20m Sprint,OneWay,1,2,2024-10-01T15:44:02,Standard,Left,,False,0,0,,,1,,Standard,0,False,0,3.425,3.425,3.425,5.839,5.839,20,0,0,0,0,0,0,0,0,0,0,3.425,0,0,0,3.425,0,0,0,,,b2e3f63a-bcb2-456d-b9bf-4e290ff128b3,,SB


The resulting DataFrame contains all the information present in the JSON structure; however, this raw data may not be readily useful for other analysts, such as strength performance coaches. The `modify_df` function will transform this raw data into a more useful format, resembling the structure of a CSV file exported directly from the ValdHub user interface. Feel free to modify the `modify_df` function to fit your specific needs.

In [4]:
october_data_cleaned = smartspeed.modify_df(october_data)
october_data_cleaned.head(5)

Unnamed: 0,ExternalId,Name,Groups,Date UTC,Time UTC,testName,testTypeName,Rep Count,Device Count,Distance,Split 1,Split 2,Split 3,Split 4,Cumulative 1,Cumulative 2,Cumulative 3,Cumulative 4,Peak Velocity,Mean Velocity,Best Split,Total Time,Average Split,testDateUtc,id
0,,,SB,10/01/2024,03:38 PM,20m Sprint,OneWay,1,2,20,3.55,0,0,0,3.55,0,0,0,5.63,5.63,3.55,3.55,3.55,2024-10-01T15:38:48,
1,,,SB,10/01/2024,03:41 PM,20m Sprint,OneWay,1,2,20,3.6,0,0,0,3.6,0,0,0,5.55,5.55,3.6,3.6,3.6,2024-10-01T15:41:22,
2,,,SB,10/01/2024,03:37 PM,20m Sprint,OneWay,1,2,20,3.69,0,0,0,3.69,0,0,0,5.42,5.42,3.69,3.69,3.69,2024-10-01T15:37:03,
3,,,SB,10/01/2024,03:42 PM,20m Sprint,OneWay,1,2,20,3.3,0,0,0,3.3,0,0,0,6.06,6.06,3.3,3.3,3.3,2024-10-01T15:42:44,
4,,,SB,10/01/2024,03:44 PM,20m Sprint,OneWay,1,2,20,3.42,0,0,0,3.42,0,0,0,5.84,5.84,3.42,3.42,3.42,2024-10-01T15:44:02,
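
If you want to adapt `modify_df`, the sketch below shows the kind of reshaping involved: splitting `testDateUtc` into date and time columns, renaming a few flattened columns, and rounding the metrics. It is an illustration rather than the actual `modify_df`; the helper name `tidy_tests` is hypothetical, and the rename map covers only a handful of the columns visible in the outputs above.

```python
import pandas as pd

# Subset of the renames visible in the cleaned output; extend as needed.
COLUMN_RENAMES = {
    "repCount": "Rep Count",
    "runningSummaryFields.totalTimeSeconds": "Total Time",
    "runningSummaryFields.velocityFields.peakVelocityMetersPerSecond": "Peak Velocity",
    "runningSummaryFields.gateSummaryFields.splitOne": "Split 1",
}
METRIC_COLUMNS = ["Total Time", "Peak Velocity", "Split 1"]


def tidy_tests(raw_df):
    """Add 'Date UTC' / 'Time UTC' columns, rename key metrics, and round them."""
    df = raw_df.copy()
    timestamps = pd.to_datetime(df["testDateUtc"])
    df["Date UTC"] = timestamps.dt.strftime("%m/%d/%Y")
    df["Time UTC"] = timestamps.dt.strftime("%I:%M %p")
    df = df.rename(columns=COLUMN_RENAMES)
    df[METRIC_COLUMNS] = df[METRIC_COLUMNS].round(2)
    return df


raw = pd.DataFrame({
    "testDateUtc": ["2024-10-01T15:38:48"],
    "repCount": [1],
    "runningSummaryFields.totalTimeSeconds": [3.551],
    "runningSummaryFields.velocityFields.peakVelocityMetersPerSecond": [5.632],
    "runningSummaryFields.gateSummaryFields.splitOne": [3.551],
})
print(tidy_tests(raw))
```

Keeping the rename map in one dictionary makes it easy to add or relabel columns without touching the rest of the function.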


### Saving the Data to .csv Using `save_master_file`, `data_to_groups` and `save_dataframes`

### `save_master_file`
- The `save_master_file` function consolidates all sports data into a single master CSV file.
- It checks if the master file already exists; if it does, new data is appended to ensure no information is lost.
- This master file provides a comprehensive overview of all recorded metrics, making it easier for analysts to access a unified dataset for broader analysis.

### `data_to_groups` 
- The function organizes a given DataFrame into a nested dictionary (`teams_data`) based on unique groups found in the `Groups` column, allowing for structured data management.
- For each group, it further categorizes the data by unique test names found in the `testName` column, storing the corresponding test data in the nested dictionary structure.
- The result is a comprehensive dictionary where each key represents a group, and each value is another dictionary containing test names as keys and their associated test data as values, facilitating easy access to specific datasets.
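
The nested dictionary described above can be built with a single `groupby` over both columns. This is a sketch under the assumption that `Groups` holds one group name per row; `group_by_team_and_test` is a hypothetical name, not the method in `vald_smartspeed.py`.

```python
import pandas as pd


def group_by_team_and_test(df):
    """Return {group: {test name: DataFrame}} built from the Groups and testName columns."""
    teams_data = {}
    for (group, test_name), subset in df.groupby(["Groups", "testName"]):
        teams_data.setdefault(group, {})[test_name] = subset.reset_index(drop=True)
    return teams_data


example = pd.DataFrame({
    "Groups": ["SB", "SB", "Base"],
    "testName": ["20m Sprint", "100m Sprint", "30m Sprint"],
    "Total Time": [3.55, 13.2, 4.8],
})
nested = group_by_team_and_test(example)
print({group: list(tests) for group, tests in nested.items()})
```

Note that `groupby` drops rows whose `Groups` or `testName` value is missing by default, so athletes without a group would need separate handling.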


### `save_dataframes`
- The `save_dataframes` function organizes the data into distinct folders based on each sport, enhancing data accessibility.
- Inside each sport's folder, it creates separate CSV files for each test type, allowing for targeted analysis by strength performance coaches.
- This structured organization facilitates efficient data retrieval and analysis, enabling coaches to quickly find and utilize the information specific to their needs.
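
The folder layout can be sketched like this, writing one CSV per group and test under `<base>/<group>/SmartSpeed/`. The helper names and the sanitizing rule are assumptions chosen to mirror file names such as `sb_20m_sprint.csv`; unlike the description of `save_dataframes`, this version simply overwrites an existing file instead of updating it.

```python
import re
import tempfile
from pathlib import Path

import pandas as pd


def sanitize_filename(name):
    """Lower-case a name and replace runs of other characters with underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")


def save_group_files(teams_data, base_directory):
    """Write one CSV per group and test: <base>/<group>/SmartSpeed/<group>_<test>.csv."""
    saved = []
    for group, tests in teams_data.items():
        group_slug = sanitize_filename(group)
        folder = Path(base_directory) / group_slug / "SmartSpeed"
        folder.mkdir(parents=True, exist_ok=True)
        for test_name, df in tests.items():
            path = folder / f"{group_slug}_{sanitize_filename(test_name)}.csv"
            df.to_csv(path, index=False)  # overwrites any existing file
            saved.append(path)
    return saved


# Example: write into a throwaway directory so nothing on disk is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    demo = {"SB": {"20m Sprint": pd.DataFrame({"Total Time": [3.55]})}}
    written = save_group_files(demo, tmp)
    print(written[0].relative_to(tmp))
```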

You can customize these functions, as well as the `vald_master_file_path` and `base_directory`, to better suit your requirements. One suggestion is to set the base directory to your organization's shared OneDrive. This way, the DataFrames will be saved in the cloud, making them accessible to whoever may need to access the data.

### Execute the following code block to save the `october_data_cleaned` DataFrame to your local machine as multiple .csv files, organized by sport. This will help you visualize the directory structure.

In [5]:
smartspeed.save_master_file(october_data_cleaned)
october_data_cleaned_by_sport = smartspeed.data_to_groups(october_data_cleaned)
smartspeed.save_dataframes(october_data_cleaned_by_sport)

Saved master file data\master_files\smartspeed_allsports.csv
Saved data\sb\SmartSpeed\sb_20m_sprint.csv
Saved data\performance\SmartSpeed\performance_20m_sprint.csv


The previous example was simplified to focus on illustrating how each function operates independently. However, it is important to note that the `october_data_cleaned` DataFrame only contains a single page of data, comprising 50 records, which is the limit imposed by the API's `/tests` endpoint. To retrieve more than one page of data, we will need to utilize the `get_data_until_today` function.


### `get_data_until_today`
- The function retrieves test data from a specified `start_date` until today. It fetches the data in batches, where each batch is one 50-record page, and aggregates them into a single DataFrame (`new_data`).
- It checks for existing data in the master file, and if any duplicates are found based on the 'id' column, these duplicates are removed from the new data.
- After formatting the new data with the `modify_df` function, it saves this updated DataFrame to the master file and then processes the data into groups using the `data_to_groups` and `save_dataframes` functions, ultimately saving the organized team data to appropriate files.
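
The paging and de-duplication logic can be sketched as a loop that requests pages until an empty one comes back and then drops any `id` already stored. `collect_new_tests` and `fake_fetch` are hypothetical names; in practice the page fetcher would wrap `get_tests`, and the stop condition assumes an empty DataFrame marks the end of the data.

```python
import pandas as pd


def collect_new_tests(fetch_page, existing_ids, max_pages=1000):
    """Request pages until an empty one comes back, then drop already-saved ids."""
    pages = []
    for pageno in range(1, max_pages + 1):
        page = fetch_page(pageno)  # expected to return a DataFrame
        if page.empty:
            break
        pages.append(page)
    if not pages:
        return pd.DataFrame()
    new_data = pd.concat(pages, ignore_index=True)
    # Keep only the rows whose id is not already in the master file.
    return new_data[~new_data["id"].isin(existing_ids)].reset_index(drop=True)


# Fake two-page API used only for this example.
fake_pages = {1: pd.DataFrame({"id": ["a", "b"]}), 2: pd.DataFrame({"id": ["c"]})}


def fake_fetch(pageno):
    return fake_pages.get(pageno, pd.DataFrame())


print(collect_new_tests(fake_fetch, existing_ids={"b"}))
```

The `max_pages` cap is a safety net so that an unexpected API response cannot cause an endless loop.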

Execute the following code block to retrieve and save DataFrames from the specified `start_date` to the most recent available record. You can adjust the `start_date` as needed. Keep in mind that if your `start_date` is earlier than the data already saved from `october_data_cleaned`, the overlapping records are treated as duplicates and dropped from the new data, so your master and group data files do not contain repeated tests.

In [6]:
start_date = '2024-10-01T00:00:00Z'
smartspeed.get_data_until_today(start_date)
#Name and id will be printed as NaN until you remove line 268 from vald_smartspeed.py

Getting tests starting from 2024-10-01T00:00:00Z on page number 1
Data retrieval complete.
Getting tests starting from 2024-10-01T00:00:00Z on page number 2
Data retrieval complete.
Getting tests starting from 2024-10-01T00:00:00Z on page number 3
Data retrieval complete.
Getting tests starting from 2024-10-01T00:00:00Z on page number 4
Data retrieval complete.
No new tests were found, the master file is up to date.
Saved master file data\master_files\smartspeed_allsports.csv
Saved data\sb\SmartSpeed\sb_20m_sprint.csv
Saved data\sb\SmartSpeed\sb_100m_sprint.csv
Saved data\performance\SmartSpeed\performance_20m_sprint.csv
Saved data\base\SmartSpeed\base_100m_sprint.csv
Saved data\base\SmartSpeed\base_30m_sprint.csv
Saved data\SmartSpeed\_100m_sprint.csv
Saved data\SmartSpeed\_30m_sprint.csv


Finally, the `update_smartspeed` function will identify the most recent date recorded in your master file. It will then call `get_data_until_today`, using this latest date as the parameter, effectively retrieving any new tests that are not already present in your .csv files.
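
The date handling behind this update, which the methods table attributes to `get_last_update`, amounts to taking the latest test timestamp and nudging it forward by one millisecond so the last saved test is not fetched again. The sketch below uses only the standard library; `next_start_date` is a hypothetical name, and it assumes timestamps are stored without a trailing `Z`, as in the `testDateUtc` column shown earlier.

```python
from datetime import datetime, timedelta


def next_start_date(test_dates):
    """Return the latest timestamp plus one millisecond, formatted for TestFromUtc."""
    latest = max(datetime.fromisoformat(value) for value in test_dates)
    bumped = latest + timedelta(milliseconds=1)
    return bumped.isoformat(timespec="milliseconds") + "Z"


print(next_start_date(["2024-10-01T15:38:48", "2024-10-01T15:44:02"]))
# prints 2024-10-01T15:44:02.001Z
```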


In [7]:
smartspeed.update_smartspeed()
#The parentheses are required. Without them the cell only displays the bound method object and no data is updated.