# Automatically classify consumer goods
# Notebook 3 - Data collection from an API

# Objective of the notebook

To expand the company's product range in the delicatessen sector, the collection of champagne-based products from an online API (RapidAPI) must be tested. A CSV file containing the first 10 champagne-based products returned is expected at the end of the process.

*NB: Introduction to APIs (definitions, use cases, and how to use them with Python): https://www.dataquest.io/blog/python-api-tutorial/*

*How to Use the Edamam Food and Grocery Database API with Python: https://rapidapi.com/blog/edamam-food-and-grocery-database-api-with-python-php-ruby-javascript-examples/*

*API doc: https://developer.edamam.com/food-database-api-docs*

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Automatically-classify-consumer-goods" data-toc-modified-id="Automatically-classify-consumer-goods-1">Automatically classify consumer goods</a></span></li><li><span><a href="#Notebook-3---Data-collection-from-an-API" data-toc-modified-id="Notebook-3---Data-collection-from-an-API-2">Notebook 3 - Data collection from an API</a></span></li><li><span><a href="#Objective-of-the-notebook" data-toc-modified-id="Objective-of-the-notebook-3">Objective of the notebook</a></span></li><li><span><a href="#I)-Importation-of-libraries" data-toc-modified-id="I)-Importation-of-libraries-4">I) Importation of libraries</a></span></li><li><span><a href="#II)-Functions" data-toc-modified-id="II)-Functions-5">II) Functions</a></span></li><li><span><a href="#III)-Processing" data-toc-modified-id="III)-Processing-6">III) Processing</a></span><ul class="toc-item"><li><span><a href="#1)-Request" data-toc-modified-id="1)-Request-6.1">1) Request</a></span></li><li><span><a href="#2)-Explore-the-response's-form" data-toc-modified-id="2)-Explore-the-response's-form-6.2">2) Explore the response's form</a></span></li><li><span><a href="#3)-Pick-and-store-in-a-dataframe-the-required-data" data-toc-modified-id="3)-Pick-and-store-in-a-dataframe-the-required-data-6.3">3) Pick and store in a dataframe the required data</a></span><ul class="toc-item"><li><span><a href="#a)-Method-1-(2-steps):-Prepares-the-dictionary-list,-then-use-of-the-pd.DataFrame.from_dict()-method" data-toc-modified-id="a)-Method-1-(2-steps):-Prepares-the-dictionary-list,-then-use-of-the-pd.DataFrame.from_dict()-method-6.3.1">a) Method 1 (2 steps): Prepares the dictionary list, then use of the pd.DataFrame.from_dict() method</a></span></li><li><span><a href="#b)-Method-2-(1-step):-Selects-directly-the-data-to-store" data-toc-modified-id="b)-Method-2-(1-step):-Selects-directly-the-data-to-store-6.3.2">b) Method 2 (1 step): Selects directly the data to 
store</a></span></li></ul></li><li><span><a href="#4)-Conlusion" data-toc-modified-id="4)-Conlusion-6.4">4) Conclusion</a></span></li></ul></li><li><span><a href="#IV)-Exportation-of-the-collected-10-first-products-data" data-toc-modified-id="IV)-Exportation-of-the-collected-10-first-products-data-7">IV) Exportation of the collected 10 first products data</a></span></li><li><span><a href="#V)-Removal-of-the-loaded-data-following-the-RGPD-principles" data-toc-modified-id="V)-Removal-of-the-loaded-data-following-the-RGPD-principles-8">V) Removal of the loaded data following the GDPR principles</a></span></li></ul></div>

# I) Importation of libraries

In [1]:
# Common libraries.
import os.path
import pandas as pd

# Python's library to be able to make API requests.
import requests

# Python's library to be able to read JSON file formats.
import json

# Global exportation folder path.
CSV_EXPORT_PATH = "Exports"

# II) Functions

In [2]:
def jprint(json_text, indent=4, sort_keys=False):
    
    """ Print a Python dictionary as an indented, more readable JSON string. """
    
    # Creates a better formatted string than the default Python representation.
    # NB: sort_keys=True sorts the keys of every dictionary alphabetically.
    json_text_printable = json.dumps(json_text, indent=indent, sort_keys=sort_keys)
    print(json_text_printable)
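
As an illustration of what `jprint` relies on, the sketch below calls `json.dumps` with and without `sort_keys` on a small dictionary. The sample is a hypothetical extract shaped like one product of the response, not real output of this notebook.

```python
import json

# Hypothetical sample shaped like one product of the API response.
sample = {
    "label": "Champagne",
    "foodId": "food_a656mk2a5dmqb2adiamu6beihduu",
    "nutrients": {"FAT": 0.0, "ENERC_KCAL": 82.0},
}

# Keeps the insertion order of the keys.
unsorted_text = json.dumps(sample, indent=4)

# Sorts the keys alphabetically at every nesting level.
sorted_text = json.dumps(sample, indent=4, sort_keys=True)

print(sorted_text)
```

Sorting only changes the order in which the keys are printed; the data itself is unchanged.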

# III) Processing

To test the API, the query filters on champagne-based products. The response is then reduced to the fields of interest, which appear as the column headers of the dataframes shown below.
To remain GDPR compliant, the variables containing the data are deleted once the first 10 products have been exported in CSV format.
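
To make the first filter concrete, here is a minimal sketch of how a parameter dictionary becomes the query string of the request URL. It uses only the standard library; `requests` performs an equivalent encoding when the dictionary is passed through its `params` argument. The names `base_url`, `params` and `full_url` are chosen for this illustration.

```python
from urllib.parse import urlencode

# Endpoint used in this notebook.
base_url = "https://edamam-food-and-grocery-database.p.rapidapi.com/api/food-database/v2/parser"

# The keyword filter: only products matching the ingredient "champagne".
params = {"ingr": "champagne"}

# Appends the encoded parameters to the endpoint.
full_url = f"{base_url}?{urlencode(params)}"
print(full_url)
```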

## 1) Request

In [3]:
# Stores the endpoint path (URL).
# NB: Old address: "https://edamam-food-and-grocery-database.p.rapidapi.com/parser"
url = "https://edamam-food-and-grocery-database.p.rapidapi.com/api/food-database/v2/parser"

# Sets the query parameters sent to the server (keyword filter on the ingredient).
querystring = {'ingr': 'champagne'}

# Sets the connection parameters of the rapidapi.com account.
# NB: The key is read from an environment variable instead of being hardcoded in the notebook
# (the variable name RAPIDAPI_KEY is an assumption, adapt it to your setup).
headers = {
    "X-RapidAPI-Key": os.environ["RAPIDAPI_KEY"],
    "X-RapidAPI-Host": "edamam-food-and-grocery-database.p.rapidapi.com"
}

# Sends the GET request and shows the server response with its status code.
response = requests.request("GET", url, headers=headers, params=querystring)
print(response)

<Response [200]>


Status code 200 means that the request succeeded and that a result was returned.
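
Other status codes signal problems such as a rejected key or an exhausted quota. The sketch below maps a few codes to their standard reason phrase with the standard library; `describe_status` is a hypothetical helper written for this illustration, not part of `requests`.

```python
from http import HTTPStatus


def describe_status(status_code):
    """Return a short description of an HTTP status code (hypothetical helper)."""
    status = HTTPStatus(status_code)
    # Codes in the 2xx range indicate success.
    outcome = "success" if 200 <= status_code < 300 else "failure"
    return f"{status_code} {status.phrase} ({outcome})"


for code in (200, 401, 404, 429):
    print(describe_status(code))
```

With a real response, `response.status_code` holds the code, and `response.raise_for_status()` raises an exception for 4xx and 5xx codes.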

## 2) Explore the response's form

In [4]:
# Shows the response result.
print(response.text)

{"text":"champagne","parsed":[{"food":{"foodId":"food_a656mk2a5dmqb2adiamu6beihduu","uri":"http://www.edamam.com/ontologies/edamam.owl#Food_table_white_wine","label":"Champagne","knownAs":"dry white wine","nutrients":{"ENERC_KCAL":82.0,"PROCNT":0.07,"FAT":0.0,"CHOCDF":2.6,"FIBTG":0.0},"category":"Generic foods","categoryLabel":"food","image":"https://www.edamam.com/food-img/a71/a718cf3c52add522128929f1f324d2ab.jpg"}}],"hints":[{"food":{"foodId":"food_a656mk2a5dmqb2adiamu6beihduu","uri":"http://www.edamam.com/ontologies/edamam.owl#Food_table_white_wine","label":"Champagne","knownAs":"dry white wine","nutrients":{"ENERC_KCAL":82.0,"PROCNT":0.07,"FAT":0.0,"CHOCDF":2.6,"FIBTG":0.0},"category":"Generic foods","categoryLabel":"food","image":"https://www.edamam.com/food-img/a71/a718cf3c52add522128929f1f324d2ab.jpg"},"measures":[{"uri":"http://www.edamam.com/ontologies/edamam.owl#Measure_unit","label":"Whole","weight":750.0},{"uri":"http://www.edamam.com/ontologies/edamam.owl#Measure_serving",

The text appears to be in JSON format.<br>
=> To extract the first 10 returned products to a CSV file, the response can be parsed into a Python dictionary, converted to a pandas dataframe, and then exported as CSV following the specification requested by Linda (the first 10 products must be stored with the following fields: foodId, label, category, foodContentsLabel, image).
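
The target file layout can be sketched with the standard library alone, before involving pandas. The rows below are hypothetical placeholders, and writing to an in-memory buffer is a choice made for this illustration; the notebook itself exports through pandas.

```python
import csv
import io

# Fields requested in the specification.
l_keys = ["foodId", "label", "category", "foodContentsLabel", "image"]

# Hypothetical rows: 12 products, none of which has an ingredient list or an image.
rows = [
    {"foodId": f"food_{i}", "label": f"Product {i}", "category": "Generic foods"}
    for i in range(12)
]

# Writes the header and the first 10 rows; missing fields are left empty.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=l_keys, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(rows[:10])

lines = buffer.getvalue().splitlines()
print(lines[0])
print(lines[1])
```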

In [5]:
response_json = response.json()
print(type(response_json))
jprint(response_json)

<class 'dict'>
{
    "text": "champagne",
    "parsed": [
        {
            "food": {
                "foodId": "food_a656mk2a5dmqb2adiamu6beihduu",
                "uri": "http://www.edamam.com/ontologies/edamam.owl#Food_table_white_wine",
                "label": "Champagne",
                "knownAs": "dry white wine",
                "nutrients": {
                    "ENERC_KCAL": 82.0,
                    "PROCNT": 0.07,
                    "FAT": 0.0,
                    "CHOCDF": 2.6,
                    "FIBTG": 0.0
                },
                "category": "Generic foods",
                "categoryLabel": "food",
                "image": "https://www.edamam.com/food-img/a71/a718cf3c52add522128929f1f324d2ab.jpg"
            }
        }
    ],
    "hints": [
        {
            "food": {
                "foodId": "food_a656mk2a5dmqb2adiamu6beihduu",
                "uri": "http://www.edamam.com/ontologies/edamam.owl#Food_table_white_wine",
                "label": "C

A first look shows that all the required fields can be found under "hints" > "food" for every product.
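
The "hints" > "food" path can be walked as in the sketch below. `sample_response` is a shortened, hand-built stand-in for `response_json` that reuses two products visible in this notebook; `dict.get` returns `None` when a product lacks a field.

```python
# Shortened stand-in for response_json (two products only).
sample_response = {
    "text": "champagne",
    "hints": [
        {"food": {
            "foodId": "food_a656mk2a5dmqb2adiamu6beihduu",
            "label": "Champagne",
            "knownAs": "dry white wine",
            "category": "Generic foods",
            "image": "https://www.edamam.com/food-img/a71/a718cf3c52add522128929f1f324d2ab.jpg",
        }},
        {"food": {
            "foodId": "food_b753ithamdb8psbt0w2k9aquo06c",
            "label": "Champagne Vinaigrette, Champagne",
            "category": "Packaged foods",
            "foodContentsLabel": "OLIVE OIL; BALSAMIC VINEGAR; CHAMPAGNE VINEGAR; GARLIC; DIJON MUSTARD; SEA SALT.",
        }},
    ],
}

l_keys = ["foodId", "label", "category", "foodContentsLabel", "image"]

# One dictionary per product, restricted to the required fields.
rows = [
    {key: hint["food"].get(key) for key in l_keys}
    for hint in sample_response["hints"]
]

for row in rows:
    print(row["label"], "| image available:", row["image"] is not None)
```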

## 3) Pick and store in a dataframe the required data

### a) Method 1 (2 steps): Prepares the dictionary list, then use of the pd.DataFrame.from_dict() method

In [6]:
%%time

# Sets the list of fields to keep.
l_keys = ['foodId', 'label', 'category', 'foodContentsLabel', 'image']

# Number of products returned by the request.
n_products = len(response_json['hints'])

# Picks the part of the JSON response storing the required data.
l_dicts = []
for i in range(n_products):
    # NB: Without .copy(), the original dictionaries of response_json would be appended and then modified in place.
    l_dicts.append(response_json['hints'][i]['food'].copy())

# Removes the unnecessary fields of each product.
for i in range(n_products):
    l_dict_keys = list(l_dicts[i].keys())
    for key in l_dict_keys:
        if key not in l_keys:
            del l_dicts[i][key]

# Wraps each value in a list so that pd.DataFrame.from_dict() can build a one-row dataframe.
for dictionary in l_dicts:
    for key in dictionary.keys():
        dictionary[key] = [dictionary[key]]

# Shows the prepared products' dictionaries.
print("Prepared dictionaries:")
display(l_dicts)

# Turns the list of dictionaries into a pandas dataframe.
df_products = pd.DataFrame(columns=l_keys)
for dictionary in l_dicts:
    df_products = pd.concat([df_products, pd.DataFrame.from_dict(dictionary)], axis=0)
df_products = df_products.reset_index(drop=True)

# Shows the generated dataframe of the request products with the right fields.
df_products

Prepared dictionaries:


[{'foodId': ['food_a656mk2a5dmqb2adiamu6beihduu'],
  'label': ['Champagne'],
  'category': ['Generic foods'],
  'image': ['https://www.edamam.com/food-img/a71/a718cf3c52add522128929f1f324d2ab.jpg']},
 {'foodId': ['food_b753ithamdb8psbt0w2k9aquo06c'],
  'label': ['Champagne Vinaigrette, Champagne'],
  'category': ['Packaged foods'],
  'foodContentsLabel': ['OLIVE OIL; BALSAMIC VINEGAR; CHAMPAGNE VINEGAR; GARLIC; DIJON MUSTARD; SEA SALT.']},
 {'foodId': ['food_b3dyababjo54xobm6r8jzbghjgqe'],
  'label': ['Champagne Vinaigrette, Champagne'],
  'category': ['Packaged foods'],
  'foodContentsLabel': ['INGREDIENTS: WATER; CANOLA OIL; CHAMPAGNE VINEGAR; SUGAR; OLIVE OIL; SALT; DRIED GARLIC; DRED SHALLOTS; BLACK PEPPER; XANTHAN GUM; SPICE'],
  'image': ['https://www.edamam.com/food-img/d88/d88b64d97349ed062368972113124e35.jpg']},
 {'foodId': ['food_a9e0ghsamvoc45bwa2ybsa3gken9'],
  'label': ['Champagne Vinaigrette, Champagne'],
  'category': ['Packaged foods'],
  'foodContentsLabel': ['CANOLA A

CPU times: total: 15.6 ms
Wall time: 18 ms


| | foodId | label | category | foodContentsLabel | image |
|---|---|---|---|---|---|
| 0 | food_a656mk2a5dmqb2adiamu6beihduu | Champagne | Generic foods | | https://www.edamam.com/food-img/a71/a718cf3c52... |
| 1 | food_b753ithamdb8psbt0w2k9aquo06c | Champagne Vinaigrette, Champagne | Packaged foods | OLIVE OIL; BALSAMIC VINEGAR; CHAMPAGNE VINEGAR... | |
| 2 | food_b3dyababjo54xobm6r8jzbghjgqe | Champagne Vinaigrette, Champagne | Packaged foods | INGREDIENTS: WATER; CANOLA OIL; CHAMPAGNE VINE... | https://www.edamam.com/food-img/d88/d88b64d973... |
| 3 | food_a9e0ghsamvoc45bwa2ybsa3gken9 | Champagne Vinaigrette, Champagne | Packaged foods | CANOLA AND SOYBEAN OIL; WHITE WINE (CONTAINS S... | |
| 4 | food_an4jjueaucpus2a3u1ni8auhe7q9 | Champagne Vinaigrette, Champagne | Packaged foods | WATER; CANOLA AND SOYBEAN OIL; WHITE WINE (CON... | |
| 5 | food_bmu5dmkazwuvpaa5prh1daa8jxs0 | Champagne Dressing, Champagne | Packaged foods | SOYBEAN OIL; WHITE WINE (PRESERVED WITH SULFIT... | https://www.edamam.com/food-img/ab2/ab2459fc2a... |
| 6 | food_alpl44taoyv11ra0lic1qa8xculi | Champagne Buttercream | Generic meals | sugar; butter; shortening; vanilla; champagne;... | |
| 7 | food_byap67hab6evc3a0f9w1oag3s0qf | Champagne Sorbet | Generic meals | Sugar; Lemon juice; brandy; Champagne; Peach | |
| 8 | food_am5egz6aq3fpjlaf8xpkdbc2asis | Champagne Truffles | Generic meals | butter; cocoa; sweetened condensed milk; vanil... | |
| 9 | food_bcz8rhiajk1fuva0vkfmeakbouc0 | Champagne Vinaigrette | Generic meals | champagne vinegar; olive oil; Dijon mustard; s... | |


### b) Method 2 (1 step): Selects directly the data to store

In [7]:
%%time

# Sets the list of fields to keep.
l_keys = ['foodId', 'label', 'category', 'foodContentsLabel', 'image']

# Number of products returned by the request.
n_products = len(response_json['hints'])

# Sets the dataframe which will store the products and their fields.
df_products = pd.DataFrame(index=range(n_products), columns=l_keys)

# Gets the products with the required fields and stores them within the dataframe.
for i in range(n_products):
    food = response_json['hints'][i]['food']
    for key in l_keys:
        if key in food:
            # NB: .loc replaces the chained assignment df_products[key].iloc[i] = ..., which pandas may apply to a copy.
            df_products.loc[i, key] = food[key]
            
# Shows the generated dataframe of the request products with the right fields.
df_products

CPU times: total: 0 ns
Wall time: 8.01 ms


| | foodId | label | category | foodContentsLabel | image |
|---|---|---|---|---|---|
| 0 | food_a656mk2a5dmqb2adiamu6beihduu | Champagne | Generic foods | | https://www.edamam.com/food-img/a71/a718cf3c52... |
| 1 | food_b753ithamdb8psbt0w2k9aquo06c | Champagne Vinaigrette, Champagne | Packaged foods | OLIVE OIL; BALSAMIC VINEGAR; CHAMPAGNE VINEGAR... | |
| 2 | food_b3dyababjo54xobm6r8jzbghjgqe | Champagne Vinaigrette, Champagne | Packaged foods | INGREDIENTS: WATER; CANOLA OIL; CHAMPAGNE VINE... | https://www.edamam.com/food-img/d88/d88b64d973... |
| 3 | food_a9e0ghsamvoc45bwa2ybsa3gken9 | Champagne Vinaigrette, Champagne | Packaged foods | CANOLA AND SOYBEAN OIL; WHITE WINE (CONTAINS S... | |
| 4 | food_an4jjueaucpus2a3u1ni8auhe7q9 | Champagne Vinaigrette, Champagne | Packaged foods | WATER; CANOLA AND SOYBEAN OIL; WHITE WINE (CON... | |
| 5 | food_bmu5dmkazwuvpaa5prh1daa8jxs0 | Champagne Dressing, Champagne | Packaged foods | SOYBEAN OIL; WHITE WINE (PRESERVED WITH SULFIT... | https://www.edamam.com/food-img/ab2/ab2459fc2a... |
| 6 | food_alpl44taoyv11ra0lic1qa8xculi | Champagne Buttercream | Generic meals | sugar; butter; shortening; vanilla; champagne;... | |
| 7 | food_byap67hab6evc3a0f9w1oag3s0qf | Champagne Sorbet | Generic meals | Sugar; Lemon juice; brandy; Champagne; Peach | |
| 8 | food_am5egz6aq3fpjlaf8xpkdbc2asis | Champagne Truffles | Generic meals | butter; cocoa; sweetened condensed milk; vanil... | |
| 9 | food_bcz8rhiajk1fuva0vkfmeakbouc0 | Champagne Vinaigrette | Generic meals | champagne vinegar; olive oil; Dijon mustard; s... | |


The table shows the first products returned by the API on RapidAPI when a query with the keyword "champagne" is sent to it.  
Two issues are already visible: the "foodContentsLabel" field does not follow a homogeneous nomenclature (mixed letter case, an occasional "INGREDIENTS:" prefix), and the description image is missing for most of these products.
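
The share of missing values can be quantified rather than read off the table. The sketch below rebuilds a three-product sample from values shown in this notebook and counts the missing entries per field. Passing a list of dictionaries to `pd.DataFrame` with `columns=` is a shortcut used for this illustration, not one of the two methods timed above.

```python
import pandas as pd

l_keys = ["foodId", "label", "category", "foodContentsLabel", "image"]

# First three products shown in this notebook ("knownAs" stands in for the fields that are not kept).
records = [
    {"foodId": "food_a656mk2a5dmqb2adiamu6beihduu", "label": "Champagne",
     "knownAs": "dry white wine", "category": "Generic foods",
     "image": "https://www.edamam.com/food-img/a71/a718cf3c52add522128929f1f324d2ab.jpg"},
    {"foodId": "food_b753ithamdb8psbt0w2k9aquo06c", "label": "Champagne Vinaigrette, Champagne",
     "category": "Packaged foods",
     "foodContentsLabel": "OLIVE OIL; BALSAMIC VINEGAR; CHAMPAGNE VINEGAR; GARLIC; DIJON MUSTARD; SEA SALT."},
    {"foodId": "food_b3dyababjo54xobm6r8jzbghjgqe", "label": "Champagne Vinaigrette, Champagne",
     "category": "Packaged foods",
     "foodContentsLabel": "INGREDIENTS: WATER; CANOLA OIL; CHAMPAGNE VINEGAR; SUGAR; OLIVE OIL; SALT; DRIED GARLIC; DRED SHALLOTS; BLACK PEPPER; XANTHAN GUM; SPICE",
     "image": "https://www.edamam.com/food-img/d88/d88b64d97349ed062368972113124e35.jpg"},
]

# Keeps only the required columns; absent fields become NaN.
df_sample = pd.DataFrame(records, columns=l_keys)

# Number of missing values per field.
missing_counts = df_sample.isna().sum()
print(missing_counts)
```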

## 4) Conclusion

Consequently, using this API to enrich our dataset and perform classification may not be possible if we rely solely on description images. Furthermore, if it turns out that images are provided for other products and that this API can be used, we'll first need to ensure that the nomenclature of each value present in the features is homogenized.
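
As one possible starting point for that homogenization, the sketch below normalizes a "foodContentsLabel" value into a list of lower-cased ingredient names. `normalize_contents_label` and its rules (dropping an "INGREDIENTS:" prefix, splitting on semicolons, trimming trailing periods) are assumptions based on the few values visible in this notebook, not a validated cleaning procedure.

```python
def normalize_contents_label(label):
    """Return a list of lower-cased ingredient names (hypothetical normalization rule)."""
    text = label.strip()

    # Drops the optional "INGREDIENTS:" prefix seen on some packaged foods.
    prefix = "INGREDIENTS:"
    if text.upper().startswith(prefix):
        text = text[len(prefix):]

    # Splits on semicolons, trims spaces and periods, and lower-cases each name.
    ingredients = [part.strip(" .").lower() for part in text.split(";")]

    # Discards empty fragments left by trailing separators.
    return [item for item in ingredients if item]


print(normalize_contents_label("INGREDIENTS: WATER; CANOLA OIL; CHAMPAGNE VINEGAR"))
print(normalize_contents_label("Sugar; Lemon juice; brandy; Champagne; Peach"))
```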

# IV) Exportation of the collected 10 first products data

In [8]:
# Creates the export folder if it does not exist yet.
os.makedirs(CSV_EXPORT_PATH, exist_ok=True)

# Exports the first 10 products of the dataset in CSV format.
df_products.iloc[:10].to_csv(
    os.path.join(CSV_EXPORT_PATH, 'RapidAPI-Champagne_request-10_first_products.csv'),
    index=False
)

# V) Removal of the loaded data following the GDPR principles

In [9]:
del response, response_json, l_dicts, df_products

All the variables containing the collected data are deleted, and only the required fields of the table were exported in CSV format, in order to comply with the GDPR rules on the processing and storage of external data.