### 1. Introduction

This notebook shows how to download additional metadata for birdsong recordings so that you can enrich your training data.  
As an example application, we download data for specific countries. You can also modify the scripts to download data per family or species.  
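
As a minimal sketch of how the query could be varied, the helper below builds the search URL from a tag and a value. `build_search_url` and `API_ROOT` are hypothetical names that are not part of this notebook. Only the `cnt` tag is used by the scripts here; the `gen` genus tag in the second call is an assumption and should be checked against the xeno-canto API documentation before use.

```python
from urllib.parse import quote

API_ROOT = "https://www.xeno-canto.org/api/2/recordings"

def build_search_url(tag, value, page=None):
    """Build a search URL for one `tag:value` query (hypothetical helper)."""
    # Percent-encode the value so spaces and special characters are URL-safe.
    query = f"{tag}:{quote(str(value))}"
    url = f"{API_ROOT}?query={query}"
    if page is not None:
        url += f"&page={page}"
    return url

# Same URL pattern that the country-based functions in this notebook request.
print(build_search_url("cnt", "france"))
# Assumption: a genus tag named `gen` exists; verify it before relying on it.
print(build_search_url("gen", "larus", page=2))
```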

We send requests to the public **xeno-canto** API and use the responses to download the data specified in each request.  
The scripts download the first page and then, based on the number of pages reported in that first response, request each remaining page in turn until the whole result set is downloaded.
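
The paging logic can be sketched independently of the network. In the example below, `collect_all_pages` and `fake_fetch` are hypothetical names used only for illustration; the sketch assumes each page is a dictionary with `numPages` and `recordings` keys, as in the responses handled by this notebook, and that a failed request is reported as `None`.

```python
def collect_all_pages(fetch_page):
    """Collect the `recordings` lists from every page of a paginated response.

    fetch_page(page) must return a dict with `numPages` and `recordings` keys,
    or None if the request failed.
    """
    first = fetch_page(1)
    if first is None:
        return []
    recordings = list(first["recordings"])
    # Pages are 1-based; page 1 is already stored, so continue from page 2.
    for page in range(2, int(first["numPages"]) + 1):
        payload = fetch_page(page)
        if payload is None:
            continue  # skip a failed page instead of stopping
        recordings.extend(payload["recordings"])
    return recordings

# Stand-in for the API: three fake pages with two records each.
def fake_fetch(page):
    return {"numPages": 3, "recordings": [f"rec-{page}-a", f"rec-{page}-b"]}

print(len(collect_all_pages(fake_fetch)))
```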



### 2. Load Packages

We need the Python packages **json** (to parse the response data) and **requests** (to send the requests to the **xeno-canto** API), plus **pandas** (to store and export the results). We also use the **tqdm** package to show the progress of the page downloads.

In [None]:
import numpy as np
import pandas as pd
import json
import requests
from tqdm import tqdm

### 2. Functions

The following functions download content from **xeno-canto**:  

* **get_first_page_per_country** - gets the first page of the response for a specific country, together with the metadata needed to request the following pages;  
* **get_page_per_country** - gets a specific page for a country; it is called for each subsequent page once the first page has been downloaded;  
* **inspect_json** - prints the metadata of the first response;  
* **get_recordings** - retrieves the payload (the recordings) from a downloaded page;   
* **download_suite_from_country** - end-to-end routine for downloading all content for a specific country; it calls the functions described above.


In [None]:
def get_first_page_per_country(country):
    """
    @country: the country for which we download metadata content 
    @returns: the content downloaded
    """
    api_search = f"https://www.xeno-canto.org/api/2/recordings?query=cnt:{country}"
    response = requests.get(api_search)
    if response.status_code == 200:
        response_payload = json.loads(response.content)
        return response_payload
    else:
        return None

def get_page_per_country(country, page):
    """
    @country: the country for which we download metadata content 
    @page: the current page to be downloaded
    @returns: the content downloaded
    """
    api_search = f"https://www.xeno-canto.org/api/2/recordings?query=cnt:{country}&page={page}"
    response = requests.get(api_search)
    if response.status_code == 200:
        response_payload = json.loads(response.content)
        return response_payload
    else:
        return None

def inspect_json(json_data):
    """
    @json_data: json data to be inspected
    """
    print(f"recordings: {json_data['numRecordings']}")
    print(f"species: {json_data['numSpecies']}")
    print(f"page: {json_data['page']}")
    print(f"number pages: {json_data['numPages']}")

def get_recordings(payload):
    """
    @payload: json data from which we extract the bird recordings metadata collection
    @returns: birds recordings metadata collection
    """
    return payload["recordings"]

def download_suite_from_country(country, country_initial_payload):
    """
    @country: the country for which we download metadata content 
    @country_initial_payload: the initial downloaded payload for the country (1st page). We download all the other pages.
    @returns: the content recordings (all pages, including the original one)
    """
    pages = country_initial_payload["numPages"]
    
    # start with the recordings of the first page (copied, so the payload is not modified)
    all_recordings = list(get_recordings(country_initial_payload))
    for page in tqdm(range(2, pages + 1)):
        payload = get_page_per_country(country, page)
        if payload is None:
            # the request for this page failed; skip it instead of raising a TypeError
            continue
        all_recordings.extend(get_recordings(payload))
    
    return all_recordings
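
Because the page functions return `None` for any response other than status 200, a transient failure silently loses a page. One possible mitigation, sketched here under stated assumptions, is to retry with an increasing delay. `fetch_with_retry` is a hypothetical helper that is not part of the notebook, and the default retry count and delay are arbitrary illustrative values.

```python
import time

def fetch_with_retry(fetch, *args, retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch(*args) until it returns a non-None payload or attempts run out."""
    for attempt in range(1, retries + 1):
        payload = fetch(*args)
        if payload is not None:
            return payload
        if attempt < retries:
            # Wait longer after each failed attempt: delay, 2*delay, 4*delay, ...
            sleep(delay * 2 ** (attempt - 1))
    return None
```

With the functions defined above, a call could look like `fetch_with_retry(get_page_per_country, "france", 2)`.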

### 3. Application: download all metadata of recordings from a country

We use the utility functions to download and save the metadata of birdsong recordings for a specific country.

In [None]:
def download_save_all_meta_for_country(country):
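    # download the first batch; from it we extract the number of pages
    birds = get_first_page_per_country(country)
    if birds is None:
        # the first request failed, so there is nothing to inspect or download
        print(f"could not download the first page for {country}")
        return pd.DataFrame()
    # let's inspect the first batch
    inspect_json(birds)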
    print(f"recordings in first batch: {len(get_recordings(birds))}")
    # download entire suite (all pages)
    suite = download_suite_from_country(country, birds)
    # convert the collection into a DataFrame
    data_df = pd.DataFrame.from_records(suite)
    # export the DataFrame as a CSV file
    data_df.to_csv(f"birds_{country}.csv", index=False)
    print(f"suite length: {data_df.shape[0]}")
    return data_df

#### 3.1. Download France data

Download and save all data for France.

In [None]:
data_df = download_save_all_meta_for_country('france')

In [None]:
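# use the full option names; the short form 'max_columns' is ambiguous in recent pandas versions
pd.set_option('display.max_columns', 30)
pd.set_option('display.max_colwidth', 100)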
data_df.head()

#### 3.2. Download Romania data

Download and save all data for Romania.

In [None]:
data_df = download_save_all_meta_for_country('romania')

In [None]:
data_df.head()

#### 3.3. Download Bulgaria data

Download and save all data for Bulgaria.

In [None]:
data_df = download_save_all_meta_for_country('bulgaria')

In [None]:
data_df.head()

#### 3.4. Download Italy data

Download and save all data for Italy.

In [None]:
data_df = download_save_all_meta_for_country('italy')

In [None]:
data_df.head()

#### 3.5. Download India data

Download and save all data for India.

In [None]:
data_df = download_save_all_meta_for_country('india')

In [None]:
data_df.head()

#### 3.6. Download Brazil data

Download and save all data for Brazil. Brazil is one of the countries with the largest collections of recordings.

In [None]:
data_df = download_save_all_meta_for_country('brazil')

In [None]:
data_df.head()
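
The per-country cells above repeat the same call, so they could be collapsed into a loop. The sketch below assumes a hypothetical helper, `download_for_countries`, that takes the download function as an argument, so that a failure for one country does not stop the others.

```python
def download_for_countries(countries, download):
    """Run download(country) for each country; return results and failures."""
    results, failed = {}, []
    for country in countries:
        try:
            results[country] = download(country)
        except Exception as error:  # keep going if one country fails
            print(f"{country}: download failed ({error})")
            failed.append(country)
    return results, failed
```

In this notebook it could be called as `frames, failed = download_for_countries(['france', 'romania', 'bulgaria'], download_save_all_meta_for_country)`, after which `frames['france']` holds the France DataFrame.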