# Update species seen file

Creates an aggregate life list by tallying, for each species, how many of the supplied eBird life lists include it.

Before running this file:
* Use `initialize_list.ipynb` to generate a base list of all scientific and English (United States) common names
* Ensure your eBird "Species name display" preference is set to one of the following:
    * "Both"
    * "Scientific name"
    * "Common name" with the English (United States) translation option selected
* Obtain your eBird life list from [this link](https://ebird.org/MyEBird?cmd=list&rtype=custom&r=world&time=life&fmt=csv) while logged into your eBird account
* Place all `.csv` files to be processed in the directory named by `csv_dir_new` (`lists_new/` by default). Each file is moved to the `csv_dir_processed` directory (`lists_processed/` by default) once it has been processed

In [1]:
import os
import pandas as pd

In [2]:
# Where lists to be processed are stored
csv_dir_new = 'lists_new/'
csv_dir_processed = 'lists_processed/'

# Where the completed seen_list is stored
current_list = 'seen_list.csv'
name_of_seen_column = 'number of people seeing species'

Load current list of species and get total number of species possible

In [3]:
# Use scientific name as index for easy lookup by scientific names later
all_species = pd.read_csv(current_list, index_col = 'scientific name')
total_spp_possible = all_species.shape[0]
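
As a rough illustration of the layout this notebook expects from `seen_list.csv`, the sketch below builds a tiny stand-in table in memory. The column names are the ones used in this notebook; the three species rows and the zero starting counts are made-up sample data, not output from `initialize_list.ipynb`.

```python
import io
import pandas as pd

# Hypothetical miniature of seen_list.csv (sample rows for illustration only)
sample_csv = io.StringIO(
    "scientific name,English name,number of people seeing species\n"
    "Struthio camelus,Common Ostrich,0\n"
    "Anas platyrhynchos,Mallard,0\n"
    "Turdus migratorius,American Robin,0\n"
)

sample_species = pd.read_csv(sample_csv, index_col='scientific name')

# Lookup by scientific name goes straight through the index
sample_species.loc['Anas platyrhynchos', 'number of people seeing species'] += 1

# Lookup by English name needs a boolean mask over the column
is_robin = sample_species['English name'] == 'American Robin'
sample_species.loc[is_robin, 'number of people seeing species'] += 1

print(sample_species)
```

Indexing on the scientific name keeps the most common lookup a single label access, while English names remain available as an ordinary column.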

For a demonstration, run the cell below, which points both directories at the test lists in `lists_test/`. To process your own lists, comment out these lines.

In [4]:
# Test lists for demonstration
csv_dir_new = 'lists_test/'
csv_dir_processed = 'lists_test/'

In [5]:
csvs_to_process = [file for file in os.listdir(csv_dir_new) if file.endswith('.csv')]


# For all new csvs
for csv_name in csvs_to_process:
    csv_path = os.path.join(csv_dir_new, csv_name)
    life_list = pd.read_csv(csv_path)
    
    # Increment rows of all_species DataFrame, using different row
    # locator algorithm depending on format of .csv
    
    # If 'Species' column of .csv is in "English name - scientific name" format:
    if ' - ' in life_list.Species[0]:
        print(f'{csv_name}: splitting species column & using scientific name')
        
        # The method below doesn't work with Hawaiian species names, which
        # contain two separators (Hawaiian name - English common name - scientific name)
        #life_list[['Common Name', 'Scientific Name']] = life_list.Species.str.split(' - ', expand=True) 
        #for seen_species in life_list['Scientific Name'].values:
        #    assert seen_species in all_species.index
        #    all_species.loc[seen_species, name_of_seen_column] += 1
        
        for entry in life_list['Species'].values:
            scientific_name = entry.split(' - ')[-1]
            all_species.loc[scientific_name, name_of_seen_column] += 1
        
    
    # If 'Species' column of .csv is in English name only format
    elif life_list.Species[0] in all_species['English name'].values:
        print(f'{csv_name}: using English name')
        for seen_species in life_list['Species'].values:
            assert seen_species in all_species['English name'].values
            all_species.loc[all_species['English name'] == seen_species, name_of_seen_column] += 1
            
    # If 'Species' column of .csv is in scientific name only format
    elif life_list.Species[0] in all_species.index:
        print(f'{csv_name}: using scientific name')
        for seen_species in life_list['Species'].values:
            assert seen_species in all_species.index
            all_species.loc[seen_species, name_of_seen_column] += 1
    
    else:
        print(f'Failure for {csv_name}')
        continue
    
    # Move file to 'processed' folder
    os.rename(csv_path, os.path.join(csv_dir_processed, csv_name))

test_ebird_world_life_list_scinames.csv: using scientific name
test_ebird_world_life_list_commonnames.csv: using English name
test_ebird_world_life_list_common-and-scinames.csv: splitting species column & using scientific name
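
The loop above keeps only the last ` - `-separated piece of each entry, so names with an extra leading part still resolve to the scientific name. Below is a minimal sketch of that idea on made-up entries; the three-part string uses a placeholder for the local name and may not match eBird's exact rendering.

```python
import pandas as pd

# Illustrative 'Species' values; the second mimics a three-part name
species = pd.Series([
    'Mallard - Anas platyrhynchos',
    'Local name - Hawaiian Goose - Branta sandvicensis',
])

# Per-entry approach used in the loop: keep whatever follows the final separator
last_parts = [entry.split(' - ')[-1] for entry in species.values]

# Vectorized equivalent: split once from the right, then take the last piece
vectorized = species.str.rsplit(' - ', n=1).str[-1]

print(last_parts)
print(vectorized.tolist())
```

By contrast, `str.split(' - ', expand=True)` would produce three columns for the second entry, which no longer fit the two target column names.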


Ensure we haven't added any extra species!

In [6]:
assert all_species.shape[0] == total_spp_possible
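
One way a row count could change is pandas' setting-with-enlargement behavior: plain assignment through `.loc` to a label that is not in the index appends a new row rather than raising. The sketch below shows this on a toy table (the single species and the misspelled label are invented for illustration). The in-place `+=` used in the loop reads the value first, so it raises `KeyError` instead.

```python
import pandas as pd

col = 'number of people seeing species'
toy = pd.DataFrame(
    {'English name': ['Mallard'], col: [0]},
    index=pd.Index(['Anas platyrhynchos'], name='scientific name'),
)
rows_before = toy.shape[0]

# Augmented assignment looks the label up first, so a bad label raises
try:
    toy.loc['Anas platyrhynchus', col] += 1  # deliberately misspelled
    raised = False
except KeyError:
    raised = True

# Plain assignment to the same bad label silently adds a row
toy.loc['Anas platyrhynchus', col] = 1

print(raised, rows_before, toy.shape[0])
```

A row-count check like the one above is a cheap guard against this kind of silent growth if the update logic is ever changed to use plain assignment.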

Save to `.csv`

In [7]:
all_species.to_csv(current_list)