# The One County Curse
Greg Majewski asks: if you could only bird in counties sharing the same name, which name could get you the biggest list? We can use the eBird data to figure this out. 

In [1]:
import os
import pandas as pd

I have already downloaded the eBird data for every county in the United States using `rebird`. (For instructions on how to do this, see [here](https://github.com/rhine3/chaser/blob/master/download_freqs.Rmd).) I am using the 2018 taxonomy, so this doesn't reflect current answers, but it probably comes close.

Each county is in a `.csv` in the directory `county_results/`. The files are named by county codes (e.g. `'US-AK-013.csv'`); let's first get a list of all the county codes we have. 

In [2]:
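# Strip the '.csv' extension from each US county file to get its county code
county_codes = [os.path.splitext(fname)[0] for fname in os.listdir('county_results/') if fname.startswith('US')]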

Next, I have downloaded a file from eBird (`'list.csv'`) which provides a mapping from county codes to county names. You can download it by clicking on this link: [county code csv](https://web.archive.org/web/20170823200026/http://ebird.org/ws1.1/ref/location/list?rtype=subnational2). We'll use this to get all the county codes associated with a single county name.

In [3]:
# Get all the entries from US counties in 'list.csv'
county_df = pd.read_csv('list.csv')
usa_county_df = county_df[county_df['SUBNATIONAL2_CODE'].isin(county_codes)]

# Get unique list of county names
usa_county_names = set(usa_county_df['SUBNATIONAL2_NAME'])

# Create a dictionary mapping each county name to county code(s)
county_dict = {}
for name in usa_county_names:
    county_dict[name] = list(usa_county_df[usa_county_df['SUBNATIONAL2_NAME'] == name].SUBNATIONAL2_CODE)
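
The loop above re-filters the whole DataFrame once per county name. As a rough sketch of an alternative, the same kind of mapping can be built in a single pass over (code, name) pairs. The `rows` list and the `build_name_to_codes` helper below are hypothetical stand-ins for illustration, not part of the original notebook.

```python
from collections import defaultdict

# Toy stand-in for the (SUBNATIONAL2_CODE, SUBNATIONAL2_NAME) columns;
# these rows are illustrative and not read from 'list.csv'.
rows = [
    ('US-CA-059', 'Orange'),
    ('US-FL-095', 'Orange'),
    ('US-CA-037', 'Los Angeles'),
]

def build_name_to_codes(pairs):
    """Map each county name to the list of codes sharing it, in one pass."""
    name_to_codes = defaultdict(list)
    for code, name in pairs:
        name_to_codes[name].append(code)
    return dict(name_to_codes)

toy_county_dict = build_name_to_codes(rows)
print(toy_county_dict)
```

With the real data, the pairs could presumably come from `zip(usa_county_df['SUBNATIONAL2_CODE'], usa_county_df['SUBNATIONAL2_NAME'])`.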

Now, use the `rebird` data to create a species list for every county name. We have to remove the non-species taxa from each county's list, including hybrids, "spuhs," and "slashes." In the future, this operation would be faster if I just downloaded a list of all eBird species, created a `set` of spuhs/slashes called `non_spp`, and did this set operation in the loop: `spp_only = all_taxa - non_spp`

In [4]:
spp_dict = {}
for name in usa_county_names:
    all_codes = county_dict[name]
    number_of_counties = len(all_codes)
    county_list = []

    # Add all species to the name's list
    for code in all_codes:
        all_taxa = set(pd.read_csv('county_results/' + code + '.csv')['comName'])
        
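        # Remove any non-species level records: spuhs (names containing 'sp.'),
        # slashes (names containing '/'), and hybrids (names containing 'hybrid')
        spp_only = [x for x in all_taxa if not (('.' in x) or ('/' in x) or ('hybrid' in x))]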
        
        # Add all species to list
        county_list.extend(spp_only)
        
    # For this name, make an entry: the number of counties 
    # that have this name, and the unique set of species
    spp_dict[name] = (number_of_counties, set(county_list))
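
The set-based alternative mentioned before this cell might look roughly like the sketch below. Both sets are hand-made toy data: in practice `non_spp` would have to be built from a full eBird taxonomy download, which this notebook does not include.

```python
# Hypothetical set of non-species taxa (a spuh, a slash, and a hybrid).
# In a real run this would come from the full eBird taxonomy.
non_spp = {
    'gull sp.',
    'Greater/Lesser Scaup',
    'Mallard x American Black Duck (hybrid)',
}

# Toy stand-in for the 'comName' column of one county file
all_taxa = {
    'Mallard',
    'American Black Duck',
    'gull sp.',
    'Greater/Lesser Scaup',
    'Mallard x American Black Duck (hybrid)',
}

# A single set difference replaces the per-name substring checks
spp_only = all_taxa - non_spp
print(sorted(spp_only))
```

Because `spp_only` is already a set here, it could be merged into a running per-name set with `|=` instead of extending a list and deduplicating at the end.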

Sort the results by the number of species recorded under each county name, and print them out.

In [5]:
for name in sorted(spp_dict, key=lambda name: len(spp_dict[name][1]), reverse=True):
    num_counties = spp_dict[name][0]
    if num_counties > 1:
        county_string = 'counties'
    else:
        county_string = 'county'
    print(f'{name}: {len(spp_dict[name][1])} species in {num_counties} {county_string}')

Orange: 615 species in 8 counties
Lincoln: 594 species in 24 counties
Washington: 586 species in 31 counties
Jefferson: 586 species in 26 counties
Santa Cruz: 569 species in 2 counties
Los Angeles: 551 species in 1 county
San Diego: 547 species in 1 county
Lake: 536 species in 12 counties
Jackson: 531 species in 24 counties
Hidalgo: 530 species in 2 counties
Clark: 529 species in 12 counties
Grant: 526 species in 15 counties
Douglas: 526 species in 12 counties
Cameron: 523 species in 3 counties
Santa Barbara: 515 species in 1 county
Monterey: 505 species in 1 county
Marin: 505 species in 1 county
Franklin: 501 species in 25 counties
Humboldt: 501 species in 3 counties
Monroe: 497 species in 17 counties
Polk: 486 species in 12 counties
Ventura: 485 species in 1 county
San Francisco: 482 species in 1 county
Riverside: 479 species in 1 county
Union: 474 species in 18 counties
San Bernardino: 473 species in 1 county
Cochise: 472 species in 1 county
Pima: 472 species in 1 county
Marion: 469

## Let's get personal

If I were forced to bird in only one county name from now on, which name should I choose, given my current life list?

I have already downloaded my eBird life list (`'csvs/my_ebird_world_life_list.csv'`). Its `'Species'` column lists each species I have seen in the following format: `Common Name - Scientific Name`. I'll first get a set of just the common names:

In [6]:
# Each entry looks like 'Common Name - Scientific Name'; keep only the common name
my_life_list_entries = pd.read_csv('csvs/my_ebird_world_life_list.csv')['Species']
my_life_list = {sp.split(' - ')[0] for sp in my_life_list_entries}
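
To make the parsing step concrete, here is a small sketch on two made-up entries in the `Common Name - Scientific Name` format. The `common_name` helper is hypothetical, and splitting only on the first `' - '` (the `maxsplit` argument of 1) is an extra precaution of this sketch, not something the original cell does.

```python
# Toy entries in the same format as the 'Species' column
entries = [
    'Mallard - Anas platyrhynchos',
    'Blue Jay - Cyanocitta cristata',
]

def common_name(entry):
    """Return the part of the entry before the first ' - ' separator."""
    return entry.split(' - ', 1)[0].strip()

toy_life_list = {common_name(entry) for entry in entries}
print(sorted(toy_life_list))
```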

And now, for each county name, remove from its list every species that I have already seen.

In [7]:
# Create a dictionary associating county names with my lifers
my_spp_dict = {}

for name in usa_county_names:
    spp_in_counties = set(spp_dict[name][1])
    num_counties = spp_dict[name][0]
    
    # For each county list, subtract sets to find the list of lifers
    lifer_spp = spp_in_counties - my_life_list
    my_spp_dict[name] = (num_counties, lifer_spp)
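
As a worked toy example of the subtraction and the ranking that follows it, the sketch below uses invented species sets and two hypothetical helpers, `lifers_by_name` and `rank_names`. It assumes the same `(number_of_counties, species_set)` tuple layout as `spp_dict`.

```python
# Toy data in the same shape as spp_dict: name -> (county count, species set)
toy_spp_dict = {
    'Orange': (2, {'Mallard', 'Blue Jay', 'Wrentit'}),
    'Nome': (1, {'Mallard', 'Bluethroat', 'Arctic Warbler'}),
}
toy_life_list = {'Mallard', 'Blue Jay'}

def lifers_by_name(spp_dict, life_list):
    """Subtract the life list from each name's species set."""
    return {
        name: (num_counties, species - life_list)
        for name, (num_counties, species) in spp_dict.items()
    }

def rank_names(results):
    """Order names by the size of their species set, largest first."""
    return sorted(results, key=lambda name: len(results[name][1]), reverse=True)

toy_lifers = lifers_by_name(toy_spp_dict, toy_life_list)
for name in rank_names(toy_lifers):
    num_counties, lifers = toy_lifers[name]
    county_string = 'counties' if num_counties > 1 else 'county'
    print(f'{name}: {len(lifers)} lifers in {num_counties} {county_string}')
```

A helper like `rank_names` would also work unchanged on the full `spp_dict`, since both dictionaries share the same tuple layout.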

Organize and print the results!

In [8]:
for name in sorted(my_spp_dict, key=lambda name: len(my_spp_dict[name][1]), reverse=True):  
    num_counties = my_spp_dict[name][0]
    if num_counties > 1:
        county_string = 'counties'
    else:
        county_string = 'county'
    print(f'{name}: {len(my_spp_dict[name][1])} lifers in {num_counties} {county_string}')

Santa Cruz: 231 lifers in 2 counties
Orange: 228 lifers in 8 counties
Lincoln: 216 lifers in 24 counties
Los Angeles: 214 lifers in 1 county
Hidalgo: 212 lifers in 2 counties
San Diego: 209 lifers in 1 county
Washington: 206 lifers in 31 counties
Jefferson: 205 lifers in 26 counties
Aleutians West: 191 lifers in 1 county
Cameron: 183 lifers in 3 counties
Santa Barbara: 180 lifers in 1 county
Monterey: 175 lifers in 1 county
Grant: 174 lifers in 15 counties
Clark: 172 lifers in 12 counties
Pima: 170 lifers in 1 county
Cochise: 167 lifers in 1 county
Marin: 166 lifers in 1 county
Ventura: 165 lifers in 1 county
Humboldt: 163 lifers in 3 counties
Riverside: 162 lifers in 1 county
Douglas: 161 lifers in 12 counties
Lake: 161 lifers in 12 counties
Jackson: 159 lifers in 24 counties
San Francisco: 158 lifers in 1 county
Monroe: 155 lifers in 17 counties
San Bernardino: 155 lifers in 1 county
Nome: 151 lifers in 1 county
San Mateo: 146 lifers in 1 county
Maricopa: 145 lifers in 1 county
San L