## Download needed documents
Institute for Bird Populations list: updated 2020, downloaded in June 2021. The following code can be run in a terminal or with the `%%bash` cell magic in a Jupyter notebook.

In [1]:
import pandas as pd

In [2]:
%%bash
# IBP zipped alpha codes in taxonomic order
wget https://www.birdpop.org/docs/misc/IBPAOU.zip
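# -o overwrites existing files without prompting, so the commands below are not consumed as prompt responses
unzip -o IBPAOU.zip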
mv IBP-AOS-LIST21.csv ibp-alpha-codes_2021.csv
rm IBPAOU.zip

Archive:  IBPAOU.zip
  inflating: .zip                    


--2023-01-27 08:23:14--  https://www.birdpop.org/docs/misc/IBPAOU.zip
Resolving www.birdpop.org (www.birdpop.org)... 204.44.192.44
Connecting to www.birdpop.org (www.birdpop.org)|204.44.192.44|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 47860 (47K) [application/zip]
Saving to: ‘IBPAOU.zip’

     0K .......... .......... .......... .......... ......    100%  310K=0.2s

2023-01-27 08:23:14 (310 KB/s) - ‘IBPAOU.zip’ saved [47860/47860]

replace IBP-AOS-LIST22.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: error:  invalid response [mv IBP-AO]
replace IBP-AOS-LIST22.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: error:  invalid response [S-LIST21.]
replace IBP-AOS-LIST22.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: error:  invalid response [csv ibp-a]
replace IBP-AOS-LIST22.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: error:  invalid response [lpha-code]
replace IBP-AOS-LIST22.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: error:  invalid response [s_2021.cs]
replace IBP-AOS-L

Bird Banding Laboratory list: read directly from the website in June 2021; it is unclear when it was last updated. Reading it requires `pandas` and `lxml` to be installed.

In [3]:
ibp_table = pd.read_csv("ibp-alpha-codes_2021.csv")
ibp_table.columns = ['non_species', 'true_alpha', 'conflict', 'common_name', 'scientific_name', 'true_alpha_6', 'conflict_6']
ibp_table.to_csv("ibp-alpha-codes_2021.csv", index=False)

In [4]:
[bbl_table] = pd.read_html("https://www.pwrc.usgs.gov/bbl/manual/speclist.cfm")

In [5]:
bbl_table = bbl_table.drop(['Species Number', 'Band Size', 'French Name', 'Taxonomic Order'], axis=1)
bbl_table.columns = ['true_alpha', 'common_name', 'scientific_name', 't_and_e', 'comments']

In [6]:
bbl_table.to_csv("bbl-alpha-codes_2021.csv", index=False)

## Get expected alpha codes

In [7]:
bbl_table = pd.read_csv("bbl-alpha-codes_2021.csv")
ibp_table = pd.read_csv("ibp-alpha-codes_2021.csv")

Produce expected alpha codes from the common names. Rules adapted from: https://sora.unm.edu/sites/default/files/journals/nabb/v028n02/p0064-p0079.pdf

Mostly works. Doesn't work for:

Poo-uli: 
* Desired output: POUL
* Actual output: POO-

Chuck-will's-Widow:
* Desired output: CWWI
* Actual output: CHUC
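
One possible workaround for these two cases, as a minimal sketch: it assumes that a one-word name made of hyphenated parts can be coded as if the hyphens were spaces. The helper name `hyphen_only_alpha` is hypothetical and is not used elsewhere in this notebook; it reproduces the two desired outputs listed here and has not been checked against the rest of the list.

```python
def hyphen_only_alpha(common_name):
    """Alpha code for a one-word name whose parts are joined by hyphens.

    Assumption: the hyphenated parts are treated like separate words.
    """
    parts = [p for p in common_name.split('-') if p]

    if len(parts) == 2:
        # Poo-uli --> POUL (first two letters of each part)
        code = parts[0][:2] + parts[1][:2]
    elif len(parts) == 3:
        # Chuck-will's-widow --> CWWI (1 + 1 + 2 letters)
        code = parts[0][0] + parts[1][0] + parts[2][:2]
    else:
        # Fall back to the first four letters, as for Canvasback --> CANV
        code = common_name[:4]

    return code.upper()


print(hyphen_only_alpha("Poo-uli"))             # POUL
print(hyphen_only_alpha("Chuck-will's-widow"))  # CWWI
```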

In [8]:
def expected_alpha(common_name):
    split_spaces = common_name.split(' ')
    split_spaces_hyphens = common_name.replace('-', ' ').replace('/', ' ').split(' ')
    
    # Canvasback --> CANV
    # Also: Chuck-will's-Widow --> CHUC, an error :(
    if len(split_spaces) == 1:
        return split_spaces[0][:4]
    
    if len(split_spaces) == 2:
        
        # Eastern Towhee --> EATO
        if len(split_spaces_hyphens) == 2:
            return split_spaces[0][:2] + split_spaces[1][:2]
    
        elif len(split_spaces_hyphens) == 3:
            
            # Yellow-rumped Warbler --> YRWA
            if ('-' in split_spaces[0]) or ('/' in split_spaces[0]):
                return split_spaces_hyphens[0][0] + split_spaces_hyphens[1][0] + split_spaces_hyphens[2][:2]
            
            # Eastern Screech-Owl --> EASO
            else:
                return split_spaces_hyphens[0][:2] + split_spaces_hyphens[1][0] + split_spaces_hyphens[2][0]
        
        # Black-and-white Warbler --> BAWW
        # Band-rumped Storm-Petrel --> BRSP [this produces a conflict!]
        else:
            return split_spaces_hyphens[0][0] + split_spaces_hyphens[1][0] + split_spaces_hyphens[2][0] + split_spaces_hyphens[3][0]
        
    elif len(split_spaces) == 3:
        # American Tree Sparrow --> ATSP
        if ('-' not in common_name) and ('/' not in common_name):
            return split_spaces[0][0] + split_spaces[1][0] + split_spaces[2][:2]
        
        # Great Black-backed Gull --> GBBG
        else:
            return split_spaces_hyphens[0][0] + split_spaces_hyphens[1][0] + split_spaces_hyphens[2][0] + split_spaces_hyphens[3][0]
    
    # Puget Sound White-crowned Sparrow --> PSWS
    else:
        return split_spaces[0][0] + split_spaces[1][0] + split_spaces[2][0] + split_spaces[3][0]
        


In [9]:
# Put expected codes into IBP table
ibp_table['expected_alpha'] = ibp_table['common_name'].apply(expected_alpha).str.upper()
ibp_table[ibp_table['expected_alpha'] != ibp_table['true_alpha']].head()

Unnamed: 0,non_species,true_alpha,conflict,common_name,scientific_name,true_alpha_6,conflict_6,expected_alpha
18,+,SRGH,,Snow X Ross's Goose Hybrid,Anser caerulescens x rossii,ANSCAR,,SXRG
31,,BARG,*,Barnacle Goose,Branta leucopsis,BRALEU,,BAGO
32,,CACG,*,Cackling Goose,Branta hutchinsii,BRAHUT,,CAGO
36,,CANG,*,Canada Goose,Branta canadensis,BRACAN,,CAGO
42,,TRUS,*,Trumpeter Swan,Cygnus buccinator,CYGBUC,,TRSW


In [10]:
# Put expected codes into BBL data
bbl_table['expected_alpha'] = bbl_table['common_name'].apply(expected_alpha).str.upper()
bbl_table[bbl_table['expected_alpha'] != bbl_table['true_alpha']].head()

Unnamed: 0,true_alpha,common_name,scientific_name,t_and_e,comments,expected_alpha
56,HERG,Herring Gull,Larus argentatus,,,HEGU
64,SHOG,Short-billed Gull,Larus brachyrhynchus,,"from Mew Gull Split, AOS 62nd Supplement (2021)",SBGU
65,HEEG,Heermann's Gull,Larus heermanni,,,HEGU
74,ROYT,Royal Tern,Thalasseus maximus,,,ROTE
78,CAYT,Cayenne Tern,Thalasseus acuflavidus eurygnathus,,,CATE


Sanity check: every row where the expected code does not equal the true code should have either a star in the `conflict` column or a plus in the `non_species` column.
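
The same check can be written with boolean masks instead of a row loop. This is a sketch on a toy frame, not the notebook's own code: `toy` is a hypothetical stand-in for `ibp_table`, with three rows copied from outputs shown in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for ibp_table (values taken from outputs in this notebook)
toy = pd.DataFrame({
    'non_species':    ['+', None, None],
    'conflict':       [None, '*', None],
    'true_alpha':     ['SRGH', 'BARG', 'CWWI'],
    'expected_alpha': ['SXRG', 'BAGO', 'CHUC'],
})

# Rows where the rule-based code differs from the code actually in use
mismatch = toy['expected_alpha'] != toy['true_alpha']

# Rows whose mismatch is explained by a hybrid/non-species flag or a conflict flag
explained = (toy['non_species'] == '+') | (toy['conflict'] == '*')

# Whatever is left needs a closer look
unexplained = toy[mismatch & ~explained]
print(unexplained)
```

Only the Chuck-will's-widow row (CWWI) remains in `unexplained` for this toy input.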

In [11]:
# Expected != True
unmatched = ibp_table[ibp_table['expected_alpha'] != ibp_table['true_alpha']]

# If expected != true and the row has neither a "+" in non_species nor a "*" in conflict, print the row
for idx, row in unmatched.iterrows():
    try:
        assert (row['non_species'] == '+') or (row['conflict'] == '*')
    except AssertionError:
        print(row)

non_species                             NaN
true_alpha                             CWWI
conflict                                NaN
common_name              Chuck-will's-widow
scientific_name    Antrostomus carolinensis
true_alpha_6                         ANTCAR
conflict_6                              NaN
expected_alpha                         CHUC
Name: 266, dtype: object
non_species                       NaN
true_alpha                       SBAG
conflict                          NaN
common_name         Slaty-backed Gull
scientific_name    Larus schistisagus
true_alpha_6                   LARSCH
conflict_6                        NaN
expected_alpha                   SBGU
Name: 635, dtype: object
non_species                          NaN
true_alpha                          RTHE
conflict                             NaN
common_name        Rufescent Tiger-Heron
scientific_name       Tigrisoma lineatum
true_alpha_6                      TIGLIN
conflict_6                           NaN
expecte

Most of these are understandable. Failures:

* Chuck-will's-widow (CWWI) and Poo-uli: hyphen problems
* Rufescent Tiger-Heron and Fasciated Tiger-Heron: whoever made these alpha codes disregarded the fact that Tiger-Heron is hyphenated
* LeConte's Thrasher, LeConte's Sparrow, McKay's Bunting, and MacGillivray's Warbler: apparently patronymics beginning with "Mc", "Mac", and "Le" follow a different naming scheme
* Puerto Rican Owl: should have a conflict `*` but it does not
* Mayan Antthrush: seems like someone just got cute when they made this alpha code

## Create "display" common names / comparison names

Common names that need adjusting will be fixed up in a `display_name` column.

### Fix display names

* How hybrids are written
* How slashes and unidentified birds are written


In [12]:
bbl_table['display_name'] = bbl_table['common_name'].str.replace(' X ', ' x ')

In [13]:
ibp_table['display_name'] = ibp_table['common_name'].str.replace(' X ', ' x ')

### What are the discrepancies between BBL and IBP common names?

Potential discrepancies:
* Hyphenation differences
* Capitalization differences
* European vs. Eurasian Green-winged Teal; American Green-winged Teal


First, look through all of IBP and print the codes where BBL name is not equal to IBP name:

In [14]:
ibp_tc = ibp_table[['true_alpha', 'display_name', 'scientific_name', 'common_name']]
bbl_tc = bbl_table[['true_alpha', 'display_name', 'scientific_name', 'common_name']]

# Look through all the IBP species
ibp_idxes_checked = []
for idx, row in ibp_tc.iterrows():
    
    # Record which IBP entries we've looked through
    ibp_idxes_checked.append(idx)
    
    # Find the BBL name for this IBP species
    bbl_for_this = bbl_tc[bbl_tc['true_alpha'] == row['true_alpha']]
    
    # If BBL uses this code and its display name differs from the IBP display name
    if len(bbl_for_this) > 0 and bbl_for_this['display_name'].values[0] != row['display_name']: 
        print('IBP SPECIES:', row)
        print()
        print('BBL SPECIES:', bbl_for_this)
        print()
        print()
        print()

IBP SPECIES: true_alpha                     BARG
display_name         Barnacle Goose
scientific_name    Branta leucopsis
common_name          Barnacle Goose
Name: 31, dtype: object

BBL SPECIES:     true_alpha       display_name   scientific_name        common_name
344       BARG  Bar-tailed Godwit  Limosa lapponica  Bar-tailed Godwit



IBP SPECIES: true_alpha                                  ACGO
display_name             Aleutian Cackling Goose
scientific_name    Branta hutchinsii leucopareia
common_name              Aleutian Cackling Goose
Name: 33, dtype: object

BBL SPECIES:     true_alpha           display_name                scientific_name  \
243       ACGO  Aleutian Canada Goose  Branta hutchinsii leucopareia   

               common_name  
243  Aleutian Canada Goose  



IBP SPECIES: true_alpha                               AGWT
display_name       American Green-winged Teal
scientific_name      Anas crecca carolinensis
common_name        American Green-winged Teal
Name: 76, 
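
The loop above can also be expressed as a join on the alpha code. The following is a sketch under assumptions: `ibp_toy` and `bbl_toy` are hypothetical stand-ins for the two tables, filled with rows seen in this notebook's outputs.

```python
import pandas as pd

# Hypothetical stand-ins for ibp_table and bbl_table
ibp_toy = pd.DataFrame({
    'true_alpha': ['BARG', 'ACGO', 'YEWA'],
    'display_name': ['Barnacle Goose', 'Aleutian Cackling Goose', 'Yellow Warbler'],
})
bbl_toy = pd.DataFrame({
    'true_alpha': ['BARG', 'ACGO', 'YEWA'],
    'display_name': ['Bar-tailed Godwit', 'Aleutian Canada Goose', 'Yellow Warbler'],
})

# Inner join keeps only the codes that both authorities use
shared = ibp_toy.merge(bbl_toy, on='true_alpha', suffixes=('_ibp', '_bbl'))

# Keep the codes whose display names disagree
mismatched = shared[shared['display_name_ibp'] != shared['display_name_bbl']]
print(mismatched)
```

One difference from the loop: the loop compares only the first BBL row for a code (`values[0]`), while the join compares every pairing, so the two could differ if a code appeared more than once in either table.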

Now check through BBL as well, skipping any entries that already came up above.

In [15]:
for idx, row in bbl_tc.iterrows():
    ibp_for_this = ibp_tc[ibp_tc['true_alpha'] == row['true_alpha']]
    if (len(ibp_for_this) > 0) and (ibp_for_this['common_name'].values[0] != row['common_name']) and (ibp_for_this.index[0] not in ibp_idxes_checked):
        print('BBL SPECIES:', row)
        print()
        print('IBP SPECIES:', ibp_for_this)
        print()
        print()
        print()

Okay, we got them all in the previous round.

Now check the remaining IBP entries, the ones with no BBL match. Which of them have an expected alpha code that differs from the true code?

In [16]:
ibp_tc = ibp_table
bbl_tc = bbl_table

for idx, row in ibp_tc.iterrows():
    bbl_for_this = bbl_tc[bbl_tc['true_alpha'] == row['true_alpha']]
    if len(bbl_for_this) == 0 and row['expected_alpha'] != row['true_alpha']:
        print('IBP SPECIES:', row)
        print()
        print('BBL SPECIES:', bbl_for_this)
        print()
        print()
        print()

IBP SPECIES: non_species                      NaN
true_alpha                      CANG
conflict                           *
common_name             Canada Goose
scientific_name    Branta canadensis
true_alpha_6                  BRACAN
conflict_6                       NaN
expected_alpha                  CAGO
display_name            Canada Goose
Name: 36, dtype: object

BBL SPECIES: Empty DataFrame
Columns: [true_alpha, common_name, scientific_name, t_and_e, comments, expected_alpha, display_name]
Index: []



IBP SPECIES: non_species                    NaN
true_alpha                    COMS
conflict                         *
common_name        Common Shelduck
scientific_name    Tadorna tadorna
true_alpha_6                TADTAD
conflict_6                     NaN
expected_alpha                COSH
display_name       Common Shelduck
Name: 50, dtype: object

BBL SPECIES: Empty DataFrame
Columns: [true_alpha, common_name, scientific_name, t_and_e, comments, expected_alpha, display_name]
Ind

### Fix observed discrepancies

Above, I noticed that some of the entries that really represent the same taxon were written differently in the two tables. Change the display names in one of the tables to match the other table to resolve these discrepancies.

Common name to display name changes:
```
BBL 'Aleutian Canada Goose' --> 'Aleutian Cackling Goose'
BBL 'Green-winged Teal' --> 'American Green-winged Teal'
BBL 'European Green-winged Teal' --> 'Eurasian Green-winged Teal'
IBP 'Gray-cheeked/Bicknell's Thrush' --> 'Unidentified Gray-cheeked/Bicknell's Thrush'
IBP 'Cape Sable Seaside-Sparrow' --> 'Cape Sable Seaside Sparrow'
IBP 'Sharp-tailed Sparrow' --> 'Unidentified Sharp-tailed Sparrow'
IBP 'Bullock's x Baltimore Oriole Hybrid' --> 'Baltimore x Bullock's Oriole Hybrid'
```
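
The renames listed above could also be applied from a single lookup table rather than one cell per change. This is a sketch only: `bbl_renames` and `bbl_toy` are hypothetical names, and the sketch starts from `common_name` directly rather than from the existing `display_name` column.

```python
import pandas as pd

# BBL renames keyed by the original common name (taken from the list above)
bbl_renames = {
    'Aleutian Canada Goose': 'Aleutian Cackling Goose',
    'Green-winged Teal': 'American Green-winged Teal',
    'European Green-winged Teal': 'Eurasian Green-winged Teal',
}

# Hypothetical stand-in for bbl_table
bbl_toy = pd.DataFrame({
    'common_name': ['Aleutian Canada Goose', 'Green-winged Teal', 'Herring Gull'],
})

# Series.replace with a dict swaps exact matches and leaves other names untouched
bbl_toy['display_name'] = bbl_toy['common_name'].replace(bbl_renames)
print(bbl_toy)
```

Keeping the renames in one dictionary makes it easier to see at a glance which names were changed and to add new ones later.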

TODO: Should we add "GWTE"? DEJU? Etc.

In [17]:
bbl_table.loc[bbl_table[bbl_table['common_name'] == 'Aleutian Canada Goose'].index, 'display_name'] = 'Aleutian Cackling Goose'

In [18]:
bbl_table.loc[bbl_table[bbl_table['common_name'] == 'Green-winged Teal'].index, 'display_name'] = 'American Green-winged Teal'

In [19]:
bbl_table[bbl_table['display_name'] == 'American Green-winged Teal']

Unnamed: 0,true_alpha,common_name,scientific_name,t_and_e,comments,expected_alpha,display_name
196,AGWT,Green-winged Teal,Anas crecca,,,GWTE,American Green-winged Teal


In [20]:
bbl_table[bbl_table['common_name'] == 'American Green-winged Teal']

Unnamed: 0,true_alpha,common_name,scientific_name,t_and_e,comments,expected_alpha,display_name


In [21]:
bbl_table[bbl_table['display_name'] == 'Green-winged Teal']

Unnamed: 0,true_alpha,common_name,scientific_name,t_and_e,comments,expected_alpha,display_name


In [22]:
bbl_table.loc[bbl_table[bbl_table['common_name'] == 'European Green-winged Teal'].index, 'display_name'] = 'Eurasian Green-winged Teal'

In [23]:
ibp_table.loc[ibp_table[ibp_table['common_name'] == "Gray-cheeked/Bicknell's Thrush"].index, 'display_name'] = "Unidentified Gray-cheeked/Bicknell's Thrush"

In [24]:
ibp_table.loc[ibp_table[ibp_table['common_name'] == "Cape Sable Seaside-Sparrow"].index, 'display_name'] = "Cape Sable Seaside Sparrow"

In [25]:
ibp_table.loc[ibp_table[ibp_table['common_name'] == "Sharp-tailed Sparrow"].index, 'display_name'] = "Unidentified Sharp-tailed Sparrow"

In [26]:
ibp_table.loc[ibp_table[ibp_table['common_name'] == "Bullock's x Baltimore Oriole Hybrid"].index, 'display_name'] = "Baltimore x Bullock's Oriole Hybrid"

Now rerun the code above to make sure the above discrepancies were fixed.

We should only see discrepancies where the two authorities have the same alpha code for entirely different species, not ones where the common names are just a little different.

In [27]:
ibp_tc = ibp_table[['true_alpha', 'display_name', 'scientific_name', 'common_name']]
bbl_tc = bbl_table[['true_alpha', 'display_name', 'scientific_name', 'common_name']]

ibp_idxes_checked = []
for idx, row in ibp_tc.iterrows():
    ibp_idxes_checked.append(idx)
    bbl_for_this = bbl_tc[bbl_tc['true_alpha'] == row['true_alpha']]
    if len(bbl_for_this) > 0 and bbl_for_this['display_name'].values[0] != row['display_name']:
        print('IBP SPECIES:', row)
        print()
        print('BBL SPECIES:', bbl_for_this)
        print()
        print()
        print()

IBP SPECIES: true_alpha                     BARG
display_name         Barnacle Goose
scientific_name    Branta leucopsis
common_name          Barnacle Goose
Name: 31, dtype: object

BBL SPECIES:     true_alpha       display_name   scientific_name        common_name
344       BARG  Bar-tailed Godwit  Limosa lapponica  Bar-tailed Godwit



IBP SPECIES: true_alpha                         BLAG
display_name                 Black Guan
scientific_name    Chamaepetes unicolor
common_name                  Black Guan
Name: 115, dtype: object

BBL SPECIES:     true_alpha         display_name scientific_name          common_name
346       BLAG  Black-tailed Godwit   Limosa limosa  Black-tailed Godwit



IBP SPECIES: true_alpha                           PAPI
display_name             Passenger Pigeon
scientific_name    Ectopistes migratorius
common_name              Passenger Pigeon
Name: 196, dtype: object

BBL SPECIES:      true_alpha    display_name  scientific_name     common_name
1172       PAP

# Create a table with alpha codes and display names, to be searchable

Desired behavior if I type in...

The alpha code for a species with no conflicts and the same alpha codes: 
```
CODE: Species Name.
```

The conflicting alpha code for a species with a conflict, code not in use:
```
CODE: Not in use. 

Confusion species:
Species1 - IBPCODE - BBLCODE
Species2 - IBPCODE - BBLCODE
```

The conflicting alpha code for a species with a conflict, code in use by both for same species:

```
CODE: Species Name.

Confusion species:
Species1 - IBPCODE - BBLCODE
```


The conflicting alpha code for a species with a conflict, other scenario not described:

```
CODE: Conflict.

Confusion species:
Species1 - IBCO (IBP) - BBLCODE (BBL)
Species2 - IBPCODE - BBLCODE
```

The non-conflicting alpha code for a species with a conflict:

```
CODE: Species Name.

Confusion species:
Species1 - IBPCODE - BBLCODE
```

In [28]:
def code_in_use(code, row):
    # True if any entry in the 'true_alpha' column equals this code
    return code in row['true_alpha'].values

In [29]:
def expected_code_in_use(row):
    # Asserts that the expected code is unique
    [expected_code] = row['expected_alpha'].unique()

    return code_in_use(expected_code, row)



One (not quite foolproof) way to find out if the logic below is sound: return from the function at every print, run it on every possible alpha code, and make sure it always returns something.

Also, check the conflicts in the IBP document for good fodder for tests.
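
A small harness for that kind of check might look like the sketch below. Both `find_silent_codes` and `toy_search` are hypothetical names introduced for illustration; `toy_search` only imitates the shape of a search function that returns a tuple whose first element is the result text.

```python
def find_silent_codes(search_fn, codes):
    """Return the codes for which search_fn gives no usable result.

    A result counts as unusable if the call raises, returns None,
    or returns an empty/False first element.
    """
    failures = []
    for code in codes:
        try:
            result = search_fn(code)
        except Exception as exc:  # record the failure and keep going
            failures.append((code, repr(exc)))
            continue
        if result is None or not result[0]:
            failures.append((code, 'no result'))
    return failures


# Hypothetical stand-in for the search function, for illustration only
def toy_search(code):
    known = {'YEWA': 'Yellow Warbler'}
    if code not in known:
        return False, False
    return f"{code}: {known[code]}", code


print(find_silent_codes(toy_search, ['YEWA', 'ZZZZ']))
```

An empty list from the harness means every code produced some output; it does not prove that the output is correct.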

In [30]:
vc = ibp_table['expected_alpha'].value_counts()

In [31]:
vc[vc > 1]

CAWR    5
LESP    4
SBWO    4
COPO    4
BRWA    4
       ..
GHTA    2
BLGU    2
SCWO    2
NOBO    2
GCSP    2
Name: expected_alpha, Length: 205, dtype: int64

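Are there any cases where the code used to resolve a conflict is the expected code of another species? Yes, one: Prothonotary Warbler (its code, PROW, is the expected code for Puerto Rican Owl). The check also prints MacGillivray's Warbler, which has no conflict flag.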

In [32]:
expected_codes_ibp = ibp_table['expected_alpha'].unique()
for a in expected_codes_ibp:
    used = ibp_table[ibp_table['true_alpha'] == a]
    if used.shape[0] > 0:
        if used['expected_alpha'].values[0] != a:
            print(used)
        

     non_species true_alpha conflict           common_name  \
2079         NaN       PROW        *  Prothonotary Warbler   

          scientific_name true_alpha_6 conflict_6 expected_alpha  \
2079  Protonotaria citrea       PROCIT        NaN           PRWA   

              display_name  
2079  Prothonotary Warbler  
     non_species true_alpha conflict             common_name  \
2092         NaN       MGWA      NaN  MacGillivray's Warbler   

         scientific_name true_alpha_6 conflict_6 expected_alpha  \
2092  Geothlypis tolmiei       GEOTOL        NaN           MAWA   

                display_name  
2092  MacGillivray's Warbler  


In [33]:
ibp_table[ibp_table['expected_alpha'] == 'PROW']

Unnamed: 0,non_species,true_alpha,conflict,common_name,scientific_name,true_alpha_6,conflict_6,expected_alpha,display_name
884,,PRIO,,Puerto Rican Owl,Gymnasio nudipes,GYSNUD,*,PROW,Puerto Rican Owl


In [34]:
expected_codes_bbl = bbl_table['expected_alpha'].unique()
for a in expected_codes_bbl:
    used = bbl_table[bbl_table['true_alpha'] == a]
    if used.shape[0] > 0:
        if used['expected_alpha'].values[0] != a:
            print(used)
        

Create some mini tables.

In [35]:
# .copy() gives independent frames, so adding a column does not trigger SettingWithCopyWarning
ibp_df = ibp_table[['true_alpha', 'display_name', 'scientific_name', 'expected_alpha']].copy()
ibp_df['authority'] = 'ibp'
bbl_df = bbl_table[['true_alpha', 'display_name', 'scientific_name', 'expected_alpha']].copy()
bbl_df['authority'] = 'bbl'


In [36]:
def code_search(term, print_result=True, sep='<br>'):
    #sep="\n" is another possibility
    print_string = ""
    
    # Get all species that either have this expected alpha or use it as actual alpha
    term = term.upper()
    all_ibp = ibp_df[(ibp_df['expected_alpha'] == term) | (ibp_df['true_alpha'] == term)]
    all_bbl = bbl_df[(bbl_df['expected_alpha'] == term) | (bbl_df['true_alpha'] == term)]
    all_all = pd.concat([all_ibp, all_bbl])
    
    # Get all confusion species for anything with this expected alpha
    all_all = pd.concat([all_all, ibp_df[
        (ibp_df.expected_alpha.isin(all_all.expected_alpha))
        
    ]])
    all_all = pd.concat([all_all, bbl_df[
        (bbl_df.expected_alpha.isin(all_all.expected_alpha))
        
    ]])
    all_all = all_all.drop_duplicates()

    where_used = all_all[all_all.true_alpha == term]
    
    # Put IBP first
    where_used = where_used.sort_values('authority', ascending=True)
    
    ## STEP 1: PRINT WHETHER SOMEONE ACTUALLY USES THE CODE, AND IF SO, WHAT IT REFERS TO
    # If both use the code
    if where_used.shape[0] == 2:
        # e.g. YEWA
        # If both refer to the same species
        if len(where_used.display_name.unique()) == 1:
            print_string += f"{term}: {where_used.display_name.values[0]} - {where_used.scientific_name.values[0]}{sep}"
            species = where_used.display_name.values[0]
            scientific_name = where_used.scientific_name.values[0]
    
        # If it differs between the two
        else:
            # e.g. KEPE
            print_string += (f"{term}: Code refers to different taxa in BBL and IBP.{sep}")
            print_string += f"{term}: {where_used.display_name.values[0]} - {where_used.scientific_name.values[0]} ({where_used.authority.values[0].upper()}){sep}"
            print_string += f"{term}: {where_used.display_name.values[1]} - {where_used.scientific_name.values[1]} ({where_used.authority.values[1].upper()}){sep}"
            species = where_used.display_name.values[0], where_used.display_name.values[1]
            scientific_name = where_used.scientific_name.values[0], where_used.scientific_name.values[1]
    
    # If only one authority uses the code for one species
    elif where_used.shape[0] == 1:
        different_code = all_all[
            (all_all.display_name == where_used.display_name.values[0]) &
            (all_all.true_alpha != term)
        ]
        
        # If that species does not appear in the other authority under a different code
        # e.g. WTPT
        if different_code.shape[0] == 0:
            print_string += f"{term}: {where_used.display_name.values[0]} ({where_used.authority.values[0].upper()}){sep}"
            species = where_used.display_name.values[0]
            scientific_name = where_used.scientific_name.values[0]
        
        # If that species does appear in the other authority under a different code
        # e.g. CANG
        else:
            print_string += f"{term}: {where_used.display_name.values[0]} - {where_used.scientific_name.values[0]} ({different_code.authority.values[0].upper()}: {different_code.true_alpha.values[0].upper()}){sep}"
            species = where_used.display_name.values[0]
            scientific_name = where_used.scientific_name.values[0]
            #print_string += f"{term}: {where_used.display_name.values[0]} ({where_used.authority.values[0].upper()}){sep}"
            #print_string += f"[{different_code.true_alpha.values[0]}: {where_used.display_name.values[0]} ({different_code.authority.values[0].upper()})]{sep}"
            #print_string += f"({different_code.authority.values[0].upper()} code for {where_used.display_name.values[0]}: {different_code.true_alpha.values[0]}){sep}"
            
    # If neither use the code
    # E.g. PRWA
    else:
        assert where_used.shape[0] == 0
        if all_all.shape[0] > 0:
            print_string += f"{term}: Code not in use.{sep}"
            species = ''
            scientific_name = ''
        else:
            print_string += f"{term}: Code not found.{sep}"
            return False, False
        
    ## STEP 2: PRINT CONFUSION SPECIES.
    confusions = all_all[
        (all_all.true_alpha != term) &
        (~all_all.display_name.isin(where_used.display_name.values))
    ]
    if confusions.shape[0] > 0:
        print_string += f"{sep}Confusion species:{sep}"
    for confusion_species in confusions.display_name.unique():
        instances = confusions[confusions.display_name == confusion_species]
        assert instances.shape[0] in [1, 2]
        
        # If the confusion species appears in both authorities
        if instances.shape[0] == 2:
            
            # If the authorities have different codes for the confusion species
            # E.g. LESP --> different codes for Lesser Sand-Plover and Least Storm-Petrel
            if len(instances.true_alpha.unique()) == 2:
            #    print_string += f"{instances.true_alpha.values[0]}: {confusion_species} ({instances.authority.values[0].upper()}){sep}"
            #    print_string += f"{instances.true_alpha.values[1]}: {confusion_species} ({instances.authority.values[1].upper()}){sep}"
                print_string += f"{instances.true_alpha.values[0]}/{instances.true_alpha.values[1]}: {confusion_species}{sep}"

            # If the authorities have the same code for the confusion species
            # E.g. MAWA --> MacGillivray's Warbler uses MGWA for both authorities
            else:
                print_string += f"{instances.true_alpha.values[0]}: {confusion_species}{sep}"
        
        # If the confusion species appears in only one authority
        # E.g. MAWA --> Mangrove Warbler appears only in IBP
        else:
            print_string += f"{instances.true_alpha.values[0]}: {confusion_species} ({instances.authority.values[0].upper()}){sep}"
    
    #if print_result:
    #    print(print_string) 
        
    return print_string, term, species, scientific_name
code_search('yewa')

('YEWA: Yellow Warbler - Setophaga petechia<br>',
 'YEWA',
 'Yellow Warbler',
 'Setophaga petechia')

In [37]:
print(code_search("LCTH"))
print(code_search("leth"))
print(code_search("KEPE"))
print(code_search("SBOH"))
print(code_search("AGWT"))
print(code_search("gwte"))
print(code_search("grej"))
print(code_search("cang"))
print(code_search("hard"))
print(code_search("lesp"))
print(code_search("prow"))


("LCTH: LeConte's Thrasher - Toxostoma lecontei<br>", 'LCTH', "LeConte's Thrasher", 'Toxostoma lecontei')
("LETH: Code not in use.<br><br>Confusion species:<br>LCTH: LeConte's Thrasher<br>", 'LETH', '', '')
('KEPE: Code refers to different taxa in BBL and IBP.<br>KEPE: Kerguelen Petrel - Aphrodroma brevirostris (BBL)<br>KEPE: Kermadec Petrel - Pterodroma neglecta (IBP)<br>', 'KEPE', ('Kerguelen Petrel', 'Kermadec Petrel'), ('Aphrodroma brevirostris', 'Pterodroma neglecta'))
('SBOH: Spotted x Barred Owl Hybrid - nan<br>', 'SBOH', 'Spotted x Barred Owl Hybrid', nan)
('AGWT: American Green-winged Teal - Anas crecca<br><br>Confusion species:<br>GWTE: Green-winged Teal (IBP)<br>', 'AGWT', 'American Green-winged Teal', 'Anas crecca')
('GWTE: Green-winged Teal (IBP)<br><br>Confusion species:<br>AGWT: American Green-winged Teal (BBL)<br>', 'GWTE', 'Green-winged Teal', 'Anas crecca')
('GREJ: Great Jacamar (IBP)<br><br>Confusion species:<br>GRJA: Green Jay<br>', 'GREJ', 'Great Jacamar', 'Jacamer

In [38]:

all_codes_appearing = []
all_codes_appearing.extend(ibp_table.true_alpha.unique())
all_codes_appearing.extend(ibp_table.expected_alpha.unique())
all_codes_appearing.extend(bbl_table.true_alpha.unique())
all_codes_appearing.extend(bbl_table.expected_alpha.unique())
all_codes = sorted(set(all_codes_appearing))

print_column = []
code_column = []
species_column = []
sci_name_column = []
for code in all_codes:
    ret = code_search(code, print_result=False)
    print_column.append(ret[0])
    code_column.append(ret[1])
    species_column.append(ret[2])
    sci_name_column.append(ret[3])



In [39]:
df = pd.DataFrame({"Result":print_column, "code":code_column, "species":species_column, "scientific_name": sci_name_column})
df.head()

Unnamed: 0,Result,code,species,scientific_name
0,ABDD: Code not in use.<br><br>Confusion specie...,ABDD,,
1,ABDU: American Black Duck - Anas rubripes<br>,ABDU,American Black Duck,Anas rubripes
2,ABDX: American Black Duck Dominant x Mallard H...,ABDX,American Black Duck Dominant x Mallard Hybrid,
3,ABFL: Asian Brown Flycatcher (IBP)<br>,ABFL,Asian Brown Flycatcher,Muscicapa dauurica
4,ABMH: American Black Duck x Mallard Hybrid (IB...,ABMH,American Black Duck x Mallard Hybrid,Anas rubripes x platyrhynchos


# Now make a searchable HTML table.

### Remove hybrids and intergrades

Hybrids:

In [40]:
df = df[~df['Result'].str.lower().str.contains("hybrid", na=False)]

Intergrades:

In [41]:
df = df[~df['Result'].str.lower().str.contains("intergrade", na=False)]


### Put search terms in a single column

In [42]:
df

Unnamed: 0,Result,code,species,scientific_name
1,ABDU: American Black Duck - Anas rubripes<br>,ABDU,American Black Duck,Anas rubripes
3,ABFL: Asian Brown Flycatcher (IBP)<br>,ABFL,Asian Brown Flycatcher,Muscicapa dauurica
5,ABTO: Abert's Towhee - Melozone aberti<br>,ABTO,Abert's Towhee,Melozone aberti
6,ACCA: Audubon's Crested Caracara (BBL)<br>,ACCA,Audubon's Crested Caracara,Polyborus plancus audubonii
7,ACFL: Acadian Flycatcher - Empidonax virescens...,ACFL,Acadian Flycatcher,Empidonax virescens
...,...,...,...,...
2642,ZEBD: Zebra Dove - Geopelia striata<br><br>Con...,ZEBD,Zebra Dove,Geopelia striata
2643,ZEDO: Code not in use.<br><br>Confusion specie...,ZEDO,,
2644,ZEND: Zenaida Dove - Zenaida aurita<br><br>Con...,ZEND,Zenaida Dove,Zenaida aurita
2645,ZIPE: Zino's Petrel (IBP)<br>,ZIPE,Zino's Petrel,Pterodroma madeira


In [43]:
df['search'] = df['code'].astype(str) + ' ' + df['species'].astype(str) + ' ' + df['scientific_name'].astype(str)
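
Note that `.astype(str)` turns a missing value into the text `nan` and a tuple of names into its printed form, so some search strings may contain stray text. One way to avoid that is sketched below; `build_search_text` is a hypothetical helper, not part of this notebook.

```python
import math


def build_search_text(code, species, scientific_name):
    """Join the searchable fields with spaces, skipping missing values."""
    parts = []
    for value in (code, species, scientific_name):
        # A code used for two taxa carries a tuple of names
        items = value if isinstance(value, tuple) else (value,)
        for item in items:
            if isinstance(item, float) and math.isnan(item):
                continue  # missing scientific name (NaN)
            if item:
                parts.append(str(item))
    return ' '.join(parts)


print(build_search_text('ABDU', 'American Black Duck', 'Anas rubripes'))
print(build_search_text('SBOH', 'Spotted x Barred Owl Hybrid', float('nan')))
```

Such a helper could be applied row by row, for example with `df.apply(..., axis=1)`, at the cost of being slower than the vectorized string concatenation.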

In [45]:
df#.head()

Unnamed: 0,Result,code,species,scientific_name,search
1,ABDU: American Black Duck - Anas rubripes<br>,ABDU,American Black Duck,Anas rubripes,ABDU American Black Duck Anas rubripes
3,ABFL: Asian Brown Flycatcher (IBP)<br>,ABFL,Asian Brown Flycatcher,Muscicapa dauurica,ABFL Asian Brown Flycatcher Muscicapa dauurica
5,ABTO: Abert's Towhee - Melozone aberti<br>,ABTO,Abert's Towhee,Melozone aberti,ABTO Abert's Towhee Melozone aberti
6,ACCA: Audubon's Crested Caracara (BBL)<br>,ACCA,Audubon's Crested Caracara,Polyborus plancus audubonii,ACCA Audubon's Crested Caracara Polyborus plan...
7,ACFL: Acadian Flycatcher - Empidonax virescens...,ACFL,Acadian Flycatcher,Empidonax virescens,ACFL Acadian Flycatcher Empidonax virescens
...,...,...,...,...,...
2642,ZEBD: Zebra Dove - Geopelia striata<br><br>Con...,ZEBD,Zebra Dove,Geopelia striata,ZEBD Zebra Dove Geopelia striata
2643,ZEDO: Code not in use.<br><br>Confusion specie...,ZEDO,,,ZEDO
2644,ZEND: Zenaida Dove - Zenaida aurita<br><br>Con...,ZEND,Zenaida Dove,Zenaida aurita,ZEND Zenaida Dove Zenaida aurita
2645,ZIPE: Zino's Petrel (IBP)<br>,ZIPE,Zino's Petrel,Pterodroma madeira,ZIPE Zino's Petrel Pterodroma madeira


# Write as a JSON variable in a file

In [46]:
with open("../docs/resources/alpha_data.html", "w+") as f:
    j = df[['search', 'Result']].set_index("search").transpose().to_json()
    f.write(f"var search_data = {j};")
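
For reference, the file written above holds a single JavaScript assignment whose value maps each search string to an object with a `Result` key. The sketch below builds the same shape with the standard `json` module on a toy mapping; the temporary path and the name `alpha_data_example.js` are assumptions made for illustration.

```python
import json
import os
import tempfile

# Toy stand-in for the 'search' -> 'Result' mapping
results = {
    "ABDU American Black Duck Anas rubripes":
        "ABDU: American Black Duck - Anas rubripes<br>",
}

# Same shape as the DataFrame route: {search: {"Result": text}}
payload = {search: {"Result": text} for search, text in results.items()}

# Write to a temporary location so the real output file is left alone
path = os.path.join(tempfile.mkdtemp(), "alpha_data_example.js")
with open(path, "w", encoding="utf-8") as f:
    f.write(f"var search_data = {json.dumps(payload)};")

with open(path, encoding="utf-8") as f:
    print(f.read())
```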
