# Example notebook to get started with the taxonomy API
## Overview
The taxonomy API can be used to classify and categorize any text into a predefined taxonomy or categories.  
It is a powerful API that helps you to resolve any classification problem in which you need to match an textual item to a category.

## Objective
In this tutorial, you will learn how to use the taxonomy API of <a href="https://howsustainabledataservices.com/" target="_blank">How Sustainable Data Services</a>.

## Dataset
The dataset you will use in this notebook consists out of dummy data that is available in the code cells.

## API documentation
API documentation can be found at: <a href="https://api.howsustainabledataservices.com/docs" target="_blank">How Sustainable Data Services - API docs</a>.

## Questions?
Contact us at: <a href="https://howsustainabledataservices.com/contact/" target="_blank">How Sustainable Data Services - Contact</a>

## Installation
Install the packages required for executing this notebook.

In [2]:
%pip install pandas

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


## Import dependencies
Import the dependencies and define the API server URL

In [3]:
import requests
import pandas as pd
from requests.exceptions import HTTPError, JSONDecodeError
import json

API_SERVER = "https://api.howsustainabledataservices.com/latest"

## Obtain API token
Before we can use the API, you need to obtain a free API token at: <a href="https://howsustainabledataservices.com/auth/register/" target="_blank">How Sustainable Data Services - Register for API</a>.  
You can simply fill in the form on the website and you will directly receive an API token.  
Once you have received this token, you can paste it in the cell below. 

In [4]:
API_TOKEN = "INSERT_YOUR_API_TOKEN_HERE"

## Define helper functions
To call the API we need to define a helper function that makes a POST requests to the API endpoint that is authorized using the API token 

In [6]:
# Helper function to call the API
def make_api_call(endpoint, json_body):
    url = API_SERVER + endpoint
    headers = {"Authorization": f"Bearer {API_TOKEN}"}

    try:
        response = requests.post(url, json=json_body, headers=headers)
        response.raise_for_status()  # Raise an exception for non-2xx status codes
        result = response.json()
        return result
    except HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
    except JSONDecodeError as json_err:
        print(f"JSON decoding error occurred: {json_err}")
    except Exception as err:
        print(f"An error occurred: {err}")

    return None

## Upload your categories to the API
The simplest form of a taxonomy is a flat taxonomy, all the items or categories are listed at the same hierarchical level without any further subcategories or levels of nesting.  
When you have a list of categories this is a flat taxonomy, which can be uploaded as a list to the API.  
In our example we have two categories that are uploaded to the API as a list.  
When uploaded successfully the API will return an taxonomy identifier, this identifier will be used in the rest of the notebook to reference back to our uploaded taxonomy.  

In [7]:
your_category_list = ["Renewable energy", "Non-renewable energy"] # replace with a list of your categories

def upload_category_list(category_list):
    endpoint = "/taxonomies/upload_list"
    response = make_api_call(endpoint, category_list)
    taxonomy_id = response["taxonomy_id"]

    return taxonomy_id

taxonomy_id = upload_category_list(your_category_list)
print("Identifier for your uploaded taxonomy:", taxonomy_id)

Identifier for your uploaded taxonomy: NQBFzDvigBSg


## Classify items using the API
Now that the API knows our taxonomy, we can ask it to classify our items.  
In this example we want to classify each item (in our case: each energy source) into our flat taxonomy.  
The API will return for each item a dictionary with the classification results.  

In [8]:
your_item_list = ["Solar panels", "Wind turbines", "Hydropower plant", "Coal power plant", "Natural gas power plant"] # replace with a list of items that you want to classify into categories, in this case we use products

def classify_items(taxonomy_id, product_list):
    endpoint = f"/taxonomies/{taxonomy_id}/classify_items_list"
    response = make_api_call(endpoint, product_list)

    return response

results = classify_items(taxonomy_id, your_item_list)
results[0]

{'gpt_product_name': 'Solar panels',
 'gpt_product_category': 'Renewable energy',
 'gpt_confidence': 1.0,
 'gpt_notebook_version': 'v1'}

## Check results
These dictionaries can easily be transformed into a Pandas DataFrame, to check the results.  
The column 'product_category' contains the predicted result of the API

In [9]:
pd.DataFrame(results)

Unnamed: 0,gpt_product_name,gpt_product_category,gpt_confidence,gpt_notebook_version
0,Solar panels,Renewable energy,1.0,v1
1,Wind turbines,Renewable energy,1.0,v1
2,Hydropower plant,Renewable energy,1.0,v1
3,Coal power plant,Non-renewable energy,1.0,v1
4,Natural gas power plant,Non-renewable energy,1.0,v1


## Upload a hierarchical taxonomy to the API
If you have a hierarchical taxonomy that consists out of multiple levels, this is supported by the API as well.  
You can upload this taxonomy as a JSON to the API, which should adhere to same schema as the example below.  

In [10]:
def upload_taxonomy_json(taxonomy_json):
    endpoint = "/taxonomy/taxonomies/upload_json"
    response = make_api_call(endpoint, taxonomy_json)
    taxonomy_id = response["taxonomy_id"]

    return taxonomy_id

your_taxonomy_json = {
  "categories": [
    {
      "category_name": "Renewable energy",
      "category_description": "Power generation based on renewable energy sources",
      "subcategories": [
        {
          "category_name": "Wind energy",
          "category_description": "Power generation based on wind",
          "subcategories": []
        },
        {
          "category_name": "Solar energy",
          "category_description": "Power generation based on sun",
          "subcategories": []
        }
      ]
    },
    {
      "category_name": "Fossil fuel based energy	",
      "category_description": "Power generation based on fossil fuel energy sources",
    }
  ]
}


taxonomy_id = upload_category_list(your_category_list)
print("Identifier for your uploaded taxonomy:", taxonomy_id)

Identifier for your uploaded taxonomy: 7EhOSwN9V19S


## Check results of hierarchical taxonomy
As you can see the "product_category" column now also contains categories which where subcategories in the taxonomy.  

In [11]:
results = classify_items(taxonomy_id, your_item_list) # same item list as before, based on new taxonomy
pd.DataFrame(results)

Unnamed: 0,gpt_product_name,gpt_product_category,gpt_confidence,gpt_notebook_version
0,Solar panels,Solar energy,1.0,v1
1,Wind turbines,Wind energy,1.0,v1
2,Hydropower plant,Renewable energy,1.0,v1
3,Coal power plant,Non-renewable energy,1.0,v1
4,Natural gas power plant,Non-renewable energy,1.0,v1
