# Create the table resources for the website

Purposes of this script:
* Create a main, toggleable HTML table (table.html) with the following information:
    * Species names, information, and spectrograms
    * Oldbird link references for all species that have them
* Create a secondary HTML table (needed.html) with the following information:
    * The species that we have fewer than 30 recordings for
* Create a tertiary information sheet showing the variables used for extracting and cleaning up each species's sound clips

This notebook uses several resources to create the HTML code for the NFC website's table of spectrograms.

This notebook uses the following inputs:
* `../media/` - a folder containing spectrograms and audio for nocturnal flight calls. Top level folders are alpha codes, and within each alpha code folder is a folder of audio (`audio/`) and spectrograms (`spectrograms/`)
* `ibp-alpha-codes_2021.csv` - Downloaded from Goldeneye: https://github.com/rhine3/goldeneye
* `NACC_list_species.csv` - Downloaded from AOS: http://checklist.americanornithology.org/taxa/
* `bioacoustic_groups_mennill_evans.csv` - Cobbled together from OldBird (especially http://www.oldbird.org/Library.htm), with supplementary info from https://academic.oup.com/condor/article/116/3/371/5153144

To create the following outputs:
* `table.html` - the final main HTML table for the website, which contains spectrograms and a pared-down set of information from table_source.csv
* `needed.html` - the secondary HTML table, which lists the OldBird species for which we have fewer than 30 recordings (or none at all)
* `table_source.csv` - a table containing the following information about all North American species and the codes for each file:
    * common_name - English common name
    * taxonomic_index - numerical index for sorting taxonomically
    * alphabetic_index - numerical index for sorting alphabetically
    * scientific_name - Latin name (genus and species)
    * alpha_code - 4-letter alpha code
    * order - taxonomic order of species
    * family - taxonomic family of species
    * bioacoustic_category - Zeep, double-banded upsweep, etc. info from Mennill and Oldbird
    * description - description coming from Mennill site
    * high_or_low - whether the call is high, medium, low, or multiple
    * bandpass_low_freq - the low frequency used for bandpassing
    * bandpass_high_freq - the high frequency used for bandpassing
    * typical_duration_ms - typical duration from Mennill's guide
    * median_duration - median call duration determined from our sample
    * approx_duration - approx duration determined using our scripts
    * oldbird_spectrogram - whether the Oldbird guide contains a spectrogram for the species
    * oldbird_nocturnal_spectrogram - whether the Oldbird guide contains a *nocturnally recorded* spectrogram for the species
    * oldbird_link - the link to the species on OldBird, if there is one
    * display_images - a list of display-sized spectrogram images (`*_display.jpg`)
    * full_images - a list of the corresponding full-size spectrogram images
    * audio - a list of audio files that go with the spectrogram images

In [1]:
from pathlib import Path
import pandas as pd

## Load resources

### Taxonomic information

Load a table containing family, order, and genus for each species.

In [2]:
species_df = pd.read_csv("NACC_list_species.csv", index_col='common_name')

### Alpha code information

Load a table of alpha codes and common names, which will help us translate from the `bird_folders`' alpha codes to the `species_df` records. The rows of this file are already in taxonomic order, so we reset the index (keeping the row position as a column) and then index by alpha code. That way we can look up a species by alpha code and still recover its position in the taxonomic order.


In [3]:
alpha_df = pd.read_csv("ibp-alpha-codes_2021.csv")
alpha_df = alpha_df.reset_index().set_index("true_alpha")
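
As a rough illustration of what the `reset_index().set_index(...)` step provides, here is a minimal sketch on toy data. Only the `true_alpha` column name comes from the cell above; the `common_name` column and the three-row CSV are assumptions made up for the example and may not match the real file's layout.

```python
import io
import pandas as pd

# Toy stand-in for the alpha code file (column "common_name" is hypothetical)
csv_text = """true_alpha,common_name
BBWD,Black-bellied Whistling-Duck
SNGO,Snow Goose
ROGO,Ross's Goose
"""

toy_alpha_df = pd.read_csv(io.StringIO(csv_text))

# reset_index() stores the original row position in a column named "index";
# set_index() then makes the alpha code the lookup key
toy_alpha_df = toy_alpha_df.reset_index().set_index("true_alpha")

position = toy_alpha_df.loc["SNGO", "index"]     # row position in the file
name = toy_alpha_df.loc["SNGO", "common_name"]   # lookup by alpha code
print(position, name)
```

The `index` column carries the file order, which is the taxonomic order as long as the source file is sorted that way.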

### Manually created table of spec time limits, durations

`freqs_and_durations_manual_edits.csv` created here: https://docs.google.com/spreadsheets/d/1n4y7yxoter0clf9wcvyTU1R-CR4zzeRkSsYB_tsV3n8/edit#gid=1396987416

Contains columns:
* taxonomic_index - just an ordering variable
* code - alpha code
* low_freq - the low frequency for the bandpass
* high_freq - the high frequency for the bandpass
* median_duration - the median duration of calls as determined from our sample
* approx_duration - the rough expected duration of the call (very rough, so spectrogram lengths are somewhat consistent--lengths are either 0.05, 0.1, 0.15, 0.2, 0.25, or 0.5 seconds)
* frequency_modified - whether the frequency limits were modified from my first guess
* duration_modified - whether the duration was modified from my first algorithmic estimation
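
The `approx_duration` values above were settled by hand in the spreadsheet, so the exact rule is not recorded here. As a purely hypothetical sketch of how a measured median could be snapped to one of the six allowed lengths, one option is to round up to the next allowed value. The function name `snap_duration` and the round-up rule are assumptions, not the procedure actually used.

```python
from bisect import bisect_left

# The six spectrogram lengths (seconds) listed for approx_duration
ALLOWED_DURATIONS = [0.05, 0.1, 0.15, 0.2, 0.25, 0.5]

def snap_duration(median_duration):
    """Hypothetical helper: smallest allowed length that fits the median duration."""
    i = bisect_left(ALLOWED_DURATIONS, median_duration)
    # Anything longer than the largest allowed length is clamped to it
    i = min(i, len(ALLOWED_DURATIONS) - 1)
    return ALLOWED_DURATIONS[i]

print(snap_duration(0.12))
print(snap_duration(0.3))
```

Rounding up rather than to the nearest value keeps the whole call inside the spectrogram window, at the cost of some empty space.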

In [4]:
freqs_durs_df = pd.read_csv("../freqs_and_durations_manual_edits.csv")

### Manually created table of bioacoustic groups (and some other incomplete information)

Contains columns:
* species
* bioacoustic_group - mostly compiled from the supporting information of this paper - https://academic.oup.com/condor/article/116/3/371/5153144 and from OldBird's "cheat sheet" http://www.oldbird.org/Library.htm. Complete to the best of my knowledge for the group of species contained in this table.
* typical_length_ms - From the Mennill information. Incomplete.
* description - from the Mennill information. Incomplete.

In [5]:
bioacoustic_groups_df = pd.read_csv("bioacoustic_groups_mennill_evans.csv")

### Table 3 - All OldBird links that we have for individual species

`OldBird Comparison - All OldBird.csv` - created by Lauren here: https://docs.google.com/spreadsheets/d/1pBZLtxtXK3-SYkQT5I8V9ISCDgmovkbBhCyaS1lf8D8/edit#gid=1146517336

Contains columns:
* Common Name 
* Alpha Code
* OldBird Spectrogram - Y/N. whether OldBird has any spectrogram for this species. (Some species, like Blue-headed Vireo, only contain info about the migration habits of the species but no flight calls)
* OldBird Nocturnal Spectrogram - Y/N. Whether OldBird has a specifically nocturnal spectrogram for this species. (Some species only have diurnally recorded sounds, which could be different from the sounds predominantly given at night)
* Link - Link to OldBird page

In [6]:
oldbird_df = pd.read_csv("oldbird_comparison.csv")
oldbird_df.head()

Unnamed: 0,Common Name,Alpha Code,Oldbird Spectrogram,Oldbird Nocturnal Spectrogram,Link
0,White-crowned Pigeon,WCPI,N,N,http://oldbird.org/pubs/fcmb/species/doves/wcp...
1,Eurasion Collared-dove,EUCD,Y,N,http://oldbird.org/pubs/fcmb/species/doves/cod...
2,White-winged Dove,WWDO,N,N,http://oldbird.org/pubs/fcmb/species/doves/wwd...
3,Mourning Dove,MODO,N,N,http://oldbird.org/pubs/fcmb/species/doves/mod...
4,Black-billed Cuckoo,BBCU,Y,Y,http://oldbird.org/pubs/fcmb/species/cuckoos/b...


### Table 4 - All regularly occurring North American species

The ABA Checklist downloaded from here: https://www.aba.org/aba-checklist/

In [7]:
all_nabirds_df = pd.read_csv("ABA_Checklist-8.1.csv", skiprows=2)
regular_spp_df = all_nabirds_df[all_nabirds_df['Unnamed: 5'].isin([1,2])]
regular_spp_df.columns = ["group", "common_name", "french_name", "sci_name", "alpha_code", "aba_code"]
regular_spp_df = regular_spp_df.drop(["group", "aba_code", "french_name"], axis=1)
regular_spp_df.head()

Unnamed: 0,common_name,sci_name,alpha_code
0,Black-bellied Whistling-Duck,Dendrocygna autumnalis,BBWD
1,Fulvous Whistling-Duck,Dendrocygna bicolor,FUWD
2,Emperor Goose,Anser canagicus,EMGO
3,Snow Goose,Anser caerulescens,SNGO
4,Ross's Goose,Anser rossii,ROGO


## Create a CSV containing information about each species

List the media folders (spectrograms and audio) that we have for bird sounds. Each folder is named after the alpha code of its species.

In [8]:
bird_folders = [folder for folder in Path("../../media/").glob("*") if folder.is_dir()]
print("Number of species:",len(bird_folders))
bird_folder_dict = {folder.name:folder for folder in bird_folders}

Number of species: 129


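Define a helper that summarizes a bandpass range as low, middle, and/or high frequency. It is used below, when we use the ABA list to get info for each species.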

In [9]:
def approx_freq_range(f1, f2):
    if f1 < 10: f1 = 0
    if f2 > 11000: f2 = 11000
    f1 = int(f1)
    f2 = int(f2)
    
    low = range(0, 5000, 1000)
    mid = range(3000, 7000, 1000)
    high = range(5000, 10000, 1000)
    
    this_range = range(f1, f2, 1000)
    
    low_intersection = len(set(low).intersection(this_range))
    mid_intersection = len(set(mid).intersection(this_range))
    high_intersection = len(set(high).intersection(this_range))
    
    intersections = []
    if low_intersection >= 4:
        intersections.append('Low')
    if mid_intersection >= 4:
        intersections.append('Middle')
    if high_intersection >= 4:
        intersections.append('High')

    return ', '.join(intersections) 
    
    
approx_freq_range(0.1, 11024.0)

'Low, Middle, High'
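
To make the band logic easier to follow, here is a compact restatement of the same rule with a few worked calls. The name `approx_freq_range_compact` is made up for this sketch and it is not meant to replace the cell above. A band is reported when the 1000 Hz grid of the bandpass range shares at least four points with the band's grid.

```python
def approx_freq_range_compact(f1, f2):
    """Compact restatement of the band rule (hypothetical name)."""
    f1 = 0 if f1 < 10 else int(f1)
    f2 = min(int(f2), 11000)
    this_range = set(range(f1, f2, 1000))
    bands = {
        "Low": range(0, 5000, 1000),
        "Middle": range(3000, 7000, 1000),
        "High": range(5000, 10000, 1000),
    }
    # Keep each band whose grid overlaps the bandpass grid in >= 4 points
    hits = [name for name, band in bands.items()
            if len(this_range.intersection(band)) >= 4]
    return ", ".join(hits)

print(approx_freq_range_compact(0, 5000))      # Low
print(approx_freq_range_compact(2000, 8000))   # Middle
print(approx_freq_range_compact(6000, 11000))  # High
print(repr(approx_freq_range_compact(2500, 9500)))  # '' (see note)
```

Note that the grid starts at `f1`, so a lower limit that is not a multiple of 1000 Hz (such as 2500) never lines up with the band grids and yields an empty string.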

We will use the `regular_spp_df` below as a source for the species list for the table. 

The next cell checks that this df contains the alpha codes of all species we have media folders for. Any alpha code that is missing from the ABA list is printed.

In [10]:
known_codes = set(regular_spp_df.alpha_code.values)
for key in bird_folder_dict.keys():
    # Print any folder whose alpha code is not in the ABA list
    if key not in known_codes:
        print(key)

Create a dictionary associating each species contained in the bird folders list with information about the species. Dictionary keys are common names and values are lists containing information about each species.

In [11]:
table_dict = {}
names = regular_spp_df.common_name.to_list()
names.sort()

for taxonomic_index, (common_name, scientific_name, alpha_code) in regular_spp_df.iterrows():    
    # Taxonomic info from the AOS list
    order = species_df.loc[common_name].order
    family = species_df.loc[common_name].family
    alphabetic_index = names.index(common_name)

    # Info from the bioacoustic category table from Mennill and Evans
    # Not all species have this information
    group = bioacoustic_groups_df.query("species==@common_name")[['bioacoustic_group', 'typical_length_ms', 'description']]
    if len(group) == 0:
        bioacoustic_category = pd.NA
        typical_duration_ms = pd.NA
        description = pd.NA
    elif len(group) == 1:
        bioacoustic_category, typical_duration_ms, description = group.values[0]
    else:
        print(common_name)
        raise ValueError
    
    # Frequency and duration    
    if alpha_code in freqs_durs_df.code.unique():
        vals = freqs_durs_df.query("code==@alpha_code")[['low_freq', 'high_freq', 'median_duration', 'approx_duration']].values[0]
        bandpass_low_freq, bandpass_high_freq, median_duration, approx_duration = vals
        high_or_low = approx_freq_range(bandpass_low_freq, bandpass_high_freq)
    else:
        bandpass_low_freq = pd.NA
        bandpass_high_freq = pd.NA
        high_or_low = pd.NA
        approx_duration = pd.NA
        median_duration = pd.NA

    # Get Oldbird info
    if alpha_code in oldbird_df['Alpha Code'].unique():
        oldbird_spectrogram, oldbird_nocturnal_spectrogram, oldbird_link = oldbird_df[oldbird_df['Alpha Code'] == alpha_code][['Oldbird Spectrogram', 'Oldbird Nocturnal Spectrogram', 'Link']].values[0]
    else:
        oldbird_spectrogram = pd.NA
        oldbird_nocturnal_spectrogram = pd.NA
        oldbird_link = ""
    
    # Get lists of images
    if alpha_code in bird_folder_dict.keys():
        bird_folder = bird_folder_dict[alpha_code]
        jpgs = list(bird_folder.joinpath("spectrograms").glob("*.jpg"))
        display_images = ['assets'+str(f)[5:] for f in list(bird_folder.joinpath("spectrograms").glob("*_display.jpg"))]
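        # Slice off the "_display.jpg" suffix; str.strip() would remove a *set of characters* from both ends
        full_images = [display_image[:-len("_display.jpg")]+".jpg" for display_image in display_images]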
        audio = ['assets/media/'+alpha_code+'/audio/'+Path(f).stem+'.wav' for f in full_images]

    else:
        jpgs = pd.NA
        display_images = pd.NA
        full_images = pd.NA
        audio = pd.NA
    
    table_dict[common_name] = [
        taxonomic_index, alphabetic_index, # For sorting taxonomically or alphabetically
        scientific_name, alpha_code, order, family, # Taxonomic info
        bioacoustic_category, description, # Zeep, etc. info from Mennill and Oldbird
        high_or_low, bandpass_low_freq, bandpass_high_freq, # Frequency info
        typical_duration_ms, median_duration, approx_duration, # Duration info
        oldbird_spectrogram, oldbird_nocturnal_spectrogram, oldbird_link, #Oldbird info
        display_images, full_images, audio
    ]
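
A note on deriving the full-size image path from a `*_display.jpg` path: `str.strip` treats its argument as a set of characters to remove from both ends, not as a suffix, so slicing (or `str.removesuffix` on Python 3.9+) is the safer route. A minimal sketch with a hypothetical file name:

```python
suffix = "_display.jpg"
display_image = "assets/media/AMRE/spectrograms/clip1_display.jpg"  # hypothetical path

# str.strip removes any of the characters in "_display.jpg" from BOTH ends,
# which also eats the leading "ass" of "assets" and drops the extension
stripped = display_image.strip(suffix)

# Slicing removes exactly the suffix and nothing else
sliced = display_image[:-len(suffix)] + ".jpg"

print(stripped)
print(sliced)
```

Here `stripped` comes out as `ets/media/AMRE/spectrograms/clip1`, while `sliced` is the intended `assets/media/AMRE/spectrograms/clip1.jpg`.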


Turn the dictionary into a dataframe.

In [12]:
table = pd.DataFrame(
    table_dict,
    index=[
        "taxonomic_index", "alphabetic_index", # For sorting taxonomically or alphabetically
        "scientific_name", "alpha_code", "order", "family", # Taxonomic info
        "bioacoustic_category", "description", # Zeep, etc. info from Mennill and Oldbird
        "high_or_low", "bandpass_low_freq", "bandpass_high_freq", # Frequency info
        "typical_duration_ms", "median_duration", "approx_duration", # Duration info
        "oldbird_spectrogram", "oldbird_nocturnal_spectrogram", "oldbird_link", #Oldbird info
        "display_images", "full_images", "audio"]
)
table = table.transpose()
table.index = table.index.rename("common_name")
table = table.reset_index()
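
For reference, the same dictionary-to-dataframe step can be sketched with `pd.DataFrame.from_dict(..., orient="index")`, which treats each dictionary key as a row and so avoids the transpose. The toy dictionary below uses only four of the columns and is an illustration, not a drop-in replacement for the cell above.

```python
import pandas as pd

# Toy version of table_dict: common name -> list of values (subset of columns)
toy_dict = {
    "Snow Goose": [3, 1, "Anser caerulescens", "SNGO"],
    "Emperor Goose": [2, 0, "Anser canagicus", "EMGO"],
}
columns = ["taxonomic_index", "alphabetic_index", "scientific_name", "alpha_code"]

# orient="index" makes each key a row label; `columns` names the list positions
toy_table = pd.DataFrame.from_dict(toy_dict, orient="index", columns=columns)
toy_table.index.name = "common_name"
toy_table = toy_table.reset_index()

print(toy_table)
```

Either route ends with one row per species and `common_name` as the first column.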

In [13]:
table.to_csv("table_source.csv", index=False)

## Convert the CSV into an HTML table.

In [14]:
import pandas as pd
from pathlib import Path
import ast # Used to transform string lists from CSV into actual lists
from PIL import Image # Used to check image dimensions

These columns of the table will be hidden to start.

In [15]:
hidden_tds = {
    'alpha_code':"Code",
    'order':"Order",
    'family':"Family",
    'bioacoustic_category':"Category",
    'oldbird_link':"OldBird link",
    'high_or_low':"Frequency"
}

### Create the header information for the table

We will create a string called `table_str` that we'll eventually write to the HTML file. 

First, give the table some buttons that allow users to toggle hidden columns on and off.

In [16]:
def create_header():
    # Allow users to toggle columns on/off
    table_str = """
<form class="form-block">
<div class="form-row">
    <label for="colButtons" class="col-lg-2 col-form-label">Display columns</label>
    <div class="col btn-group btn-group-toggle" data-toggle="buttons" id="colButtons">
        <label class="btn btn-primary active" id="species_button">
            <input type="checkbox" name="options" autocomplete="off" onchange="toggleHiddenColumn('species', 'species_button')"> Species
        </label>"""

    # Selectors for the hidden columns
    for col_class, col_title in hidden_tds.items():
        table_str += f"""
    <label class="btn btn-primary" id="{col_class}_button">
        <input type="checkbox" name="options" autocomplete="off" onchange="toggleHiddenColumn('{col_class}', '{col_class}_button')"> {col_title}
    </label>"""
    table_str += """
    <label class="btn btn-primary active" id="spectrograms_button">
        <input type="checkbox" name="options" autocomplete="off" checked onchange="toggleHiddenColumn('spectrograms', 'spectrograms_button')"> Spectrograms
    </label>
</div>
</form>"""
    
    
    # Header of the table with hidden columns hidden by default
    table_str +="""
<table>
  <thead>
      <tr>
        <th class="species">Species</th>"""
    for col_class, col_title in hidden_tds.items():
        table_str += f"""
        <th class="{col_class}" style="display:none;">{col_title}</th>"""
    table_str += """
        <th class="spectrograms" width="60%">Spectrograms</th>
      </tr>
  </thead>
  <tbody id="nfcTable">"""

    return table_str

### Add the spectrogram information to the table

Get the CSV created in the first section above.

In [17]:
full_df = pd.read_csv("table_source.csv")

In [18]:
df = full_df[~full_df.display_images.isna()]
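
The list-valued columns do not survive the CSV round trip as lists: they come back as strings such as `"['a.wav', 'b.wav']"`, and missing values come back as NaN. That is why the rows are filtered with `isna()` here and parsed with `ast.literal_eval` further down. A minimal sketch on toy data (the file names are made up):

```python
import ast
import io
import pandas as pd

toy = pd.DataFrame({
    "alpha_code": ["SNGO", "EMGO"],
    "audio": [["a.wav", "b.wav"], pd.NA],  # one species with media, one without
})

# Write to an in-memory CSV and read it back, as the notebook does via table_source.csv
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# Missing lists are NaN after reloading; present lists are plain strings
with_media = reloaded[~reloaded.audio.isna()]
raw_value = with_media.audio.iloc[0]
audio_list = ast.literal_eval(raw_value)  # back to a real Python list

print(type(raw_value).__name__, audio_list)
```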

Now we use the CSV that we just created to add a row for each species, where we encode formatting like italics for the scientific name and clickable links for the spectrograms. 

In [19]:
# How many spectrograms to show before toggling
#images_to_show = 5
table_str = create_header()

#max_num_images = 30
max_num_images = None
include_full_display_image = False

overlay_str = ""

for idx, row in df.iterrows():
    
    display_image_urls = ast.literal_eval(row.display_images)
    full_image_urls = ast.literal_eval(row.full_images)
    audio_urls = ast.literal_eval(row.audio)
    if len(display_image_urls) == 0:
        print(f"{row.alpha_code} has no images. Skipping")
        continue
    
    # Get common and scientific name and format them nicely in the string
    common_name = row['common_name']
    scientific_name = row['scientific_name']
    table_str += f"""
    <tr class="species_row">
        <td class="species">{common_name} (<i>{scientific_name}</i>)</td>"""

    # Get list of other searchable categories that won't display (as of yet)
    hidden_col_classes = list(hidden_tds.keys())  # a list, so it can be used to index the row
    for col_class, data in zip(hidden_col_classes, row[hidden_col_classes]):
        if str(data) == 'nan':
            data = ''
        elif col_class == 'oldbird_link':
            data = f'<a href="{data}" target="_blank">Link</a>'
        table_str += f"""
        <td class="{col_class}" style="display:none;">{data}</td>"""

    # Get list of images with links to audio files and put them all in a string, img_str
    alpha_code = row['alpha_code']
    img_str = ""
    idx = 0
    height = 100
    for audio_url, display_image_url, full_image_url in zip(audio_urls[:max_num_images], display_image_urls[:max_num_images], full_image_urls[:max_num_images]):
        idx += 1
        media_id = f"{alpha_code}{idx}"
        #img_str += f'<a href="{audio_url}" target="_blank" rel="noopener noreferrer"><img src="{display_image_url}" height="{height}"></a>\n'
        img_str += f"""
            <img 
                src="{display_image_url}" style="margin: 2px 2px 2px 2px; float: left;"
                height="{height}" onclick="overlayOn('{media_id}')">\n"""
        
        if include_full_display_image:
            overlay_str += f"""
    <div class="overlay" id="{media_id}" onclick="overlayOff(this)">
        <div class="audiobox">
            <img width="50%"
                data-src="{full_image_url}"></img>
            <audio controls
                data-src="{audio_url}">
            </audio>
        </div>
    </div>\n"""
        else:
            overlay_str += f"""
    <div class="overlay" id="{media_id}" onclick="overlayOff(this)">
        <div class="audiobox">
            <audio controls
                data-src="{audio_url}">
            </audio>
        </div>
    </div>\n"""
            

        

    #for audio_url, image_url in zip(audio_urls[:images_to_show], image_urls[:images_to_show]): # Non-hidden images
    #    img_str += f'<a href="{Path("assets").joinpath(audio_url)}" target="_blank"><img src="{Path("assets").joinpath(image_url)}"  height="{height}"></a>\n'
    #for audio_url, image_url in zip(audio_urls[images_to_show:max_num_images], image_urls[images_to_show:max_num_images]): # Hidden, toggle-able images
    #    img_str += f'<a href="{Path("assets").joinpath(audio_url)}" target="_blank"><img src="{Path("assets").joinpath(image_url)}"  height="{height}" class="image_toggle{idx}" style="display:none"></a>\n'
    
    
    # Add the image string to the table
    table_str += f"""
        <td class="spectrograms">
            <div class="spectrogramContainer">
                {img_str}
            </div>
        </td>
    </tr>"""

table_str += f"""
    </tbody>
</table>

{overlay_str}
"""

#print(table_str)
with open("/Users/tessa/Code/nfcs2/table.html", "w+") as f:
    f.write(table_str)

## The "needed" species

All the species in OldBird for which we have fewer than 30 recordings.

In [20]:
completed_species = df[df['audio'].apply(lambda x: len(ast.literal_eval(x))) >= 30].alpha_code
needed_spp = oldbird_df[~oldbird_df['Alpha Code'].isin(completed_species)]
needed_spp.head()

Unnamed: 0,Common Name,Alpha Code,Oldbird Spectrogram,Oldbird Nocturnal Spectrogram,Link
0,White-crowned Pigeon,WCPI,N,N,http://oldbird.org/pubs/fcmb/species/doves/wcp...
1,Eurasion Collared-dove,EUCD,Y,N,http://oldbird.org/pubs/fcmb/species/doves/cod...
2,White-winged Dove,WWDO,N,N,http://oldbird.org/pubs/fcmb/species/doves/wwd...
3,Mourning Dove,MODO,N,N,http://oldbird.org/pubs/fcmb/species/doves/mod...
4,Black-billed Cuckoo,BBCU,Y,Y,http://oldbird.org/pubs/fcmb/species/cuckoos/b...
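
The selection above can be sketched on toy data to show which species end up in the "needed" list. The alpha codes `AAAA`, `BBBB`, and `CCCC` are placeholders, not real species codes.

```python
import ast
import pandas as pd

# Toy version of df: audio lists are stored as strings, as they are after reading the CSV
toy_df = pd.DataFrame({
    "alpha_code": ["AAAA", "BBBB"],
    "audio": [str([f"{i}.wav" for i in range(30)]), str(["0.wav"])],
})
# Toy version of oldbird_df: CCCC has an OldBird page but no media folder at all
toy_oldbird = pd.DataFrame({"Alpha Code": ["AAAA", "BBBB", "CCCC"]})

# Count recordings per species, then keep the codes with at least 30
n_recordings = toy_df["audio"].apply(lambda s: len(ast.literal_eval(s)))
completed = toy_df[n_recordings >= 30].alpha_code

# Everything on the OldBird list that is not completed is still needed
needed = toy_oldbird[~toy_oldbird["Alpha Code"].isin(completed)]
print(needed["Alpha Code"].to_list())
```

Species with too few recordings (`BBBB`) and species with no recordings at all (`CCCC`) are both kept, since neither appears in `completed`.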


In [45]:
def create_header_needed_spp():    
    # Header of the table with hidden columns hidden by default
    table_str = """
<table class='table'>
  <thead>
      <tr>
        <th class="species">Species</th>
        <th class="oldbird_link">Oldbird Link</th>
      </tr>
    </thead>
    <tbody id="neededTable">
    """
    
    return table_str

In [46]:
table_str = create_header_needed_spp()
extra_commonnames = {'SHCO':'Shiny Cowbird'}
extra_scinames = {'SHCO':'Molothrus bonariensis'}

for idx, row in needed_spp.iterrows():
    # Get common and scientific name and format them nicely in the string
    alpha_code = row['Alpha Code']
    if alpha_code in regular_spp_df.alpha_code.to_list():
        this_sp = regular_spp_df[regular_spp_df['alpha_code'] == alpha_code]
        common_name = this_sp.common_name.values[0]
        scientific_name = this_sp.sci_name.values[0]
    elif alpha_code in extra_commonnames.keys():
        scientific_name = extra_scinames[alpha_code]
        common_name = extra_commonnames[alpha_code]
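    else:
        # No name is known for this code: fall back to the alpha code so that
        # common_name is never stale (or undefined) when the row is written
        common_name = alpha_code
        scientific_name = ''
        print("NEED SCINAME FOR", alpha_code)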
    table_str += f"""
    <tr class="species_row">
        <td class="species">{common_name} (<i>{scientific_name}</i>)</td>
        <td class="oldbird_link"><a href="{row['Link']}" target="_blank">Link</a></td>
    </tr>
    """

table_str += f"""
    </tbody>
</table>
"""

#print(table_str)
with open("/Users/tessa/Code/nfcs2/needed.html", "w+") as f:
    f.write(table_str)