# Catching Last Chance Critters in Animal Crossing: New Horizons

**by Gerard Tieng** (http://www.linkedin.com/in/gerardtieng)

One of your main goals in the new Animal Crossing: New Horizons for Nintendo Switch is to complete your Critterpedia by catching every type of insect and fish on your own desert island. Many of these creatures are seasonal, so it's a good idea to know what you need to catch each month before it's too late and you have to wait months for them to become available again.

Big thanks to Polygon for their complete [insect](https://www.polygon.com/animal-crossing-new-horizons-switch-acnh-guide/2020/3/24/21191276/insect-bug-locations-times-month-day-list-critterpedia) and [fish](https://www.polygon.com/animal-crossing-new-horizons-switch-acnh-guide/2020/3/23/21190775/fish-locations-times-month-day-list-critterpedia) guides, which provided the data for this project, and to Nintendo for making what is already one of the best games of 2020.

## Step 1: Loading in the Data

The data for this project was copied and saved into CSV format. We'll use the pandas library for most of the exploration and cleanup.

```python
import pandas as pd
```

```python
bugs = pd.read_csv("ac_bugs.csv")
fish = pd.read_csv("ac_fish.csv")
```

While there are some null values in our dataset, the most important columns are completely intact: "Name" (the creature names) and "Date" (when each one is available to catch).

```python
bugs.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80 entries, 0 to 79
Data columns (total 5 columns):
Name        80 non-null object
Location    79 non-null object
Value       54 non-null object
Time        79 non-null object
Date        80 non-null object
dtypes: object(5)
memory usage: 3.2+ KB
```
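
`info()` reports non-null counts. To count the missing values in each column directly, `DataFrame.isna().sum()` is a handy alternative. A minimal sketch on a tiny made-up frame (the critter names and values below are hypothetical, not rows from the Polygon data):

```python
import pandas as pd

# Hypothetical rows shaped like ac_bugs.csv; the values are illustrative only
sample = pd.DataFrame({
    "Name": ["Critter A", "Critter B", "Critter C"],
    "Location": ["Flying", None, "On ground"],
    "Value": ["160", None, "8,000"],
})

# isna() marks each missing cell as True; summing the booleans counts them per column
missing = sample.isna().sum()
print(missing)

# Names of the columns that have at least one gap
print(sample.columns[sample.isna().any()].tolist())
```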


Here we see that the "Date" column holds some fairly long strings, which we can simplify in our final dataframe.

```python
bugs.head()
```

|   | Name | Location | Value | Time | Date |
|---|------|----------|-------|------|------|
| 0 | Common Butterfly | Flying | 160 | 4 a.m. - 7 p.m. | September-June (Northern) / March-December (So... |
| 1 | Yellow Butterfly | Flying | 160 | 4 a.m. - 7 p.m. | September-June (Northern) / March-April, Septe... |
| 2 | Tiger Butterfly | Flying | 240 | 4 a.m. - 7 p.m. | March-September (Northern) / September-March (... |
| 3 | Peacock Butterfly | Flying | 2500 | 4 a.m. - 7 p.m. | March-June (Northern) / September-December (So... |
| 4 | Common Bluebottle | Flying | 300 | 4 a.m. - 7 p.m. | April-August (Northern) / October-February (So... |


The fish dataset has far fewer missing values than the insect dataset (77 of 80 "Value" entries are filled in, versus 54 of 80 for insects) and also includes a "Size" column, which the insect dataset lacks.

```python
fish.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80 entries, 0 to 79
Data columns (total 6 columns):
Name        80 non-null object
Location    80 non-null object
Size        77 non-null object
Value       77 non-null object
Time        80 non-null object
Date        80 non-null object
dtypes: object(6)
memory usage: 3.9+ KB
```


The values in the fish dataset are formatted very similarly to the values in the insect dataset.

```python
fish.head()
```

|   | Name | Location | Size | Value | Time | Date |
|---|------|----------|------|-------|------|------|
| 0 | Bitterling | River | Smallest | 900 | All day | November-March (Northern) / May-September (Sou... |
| 1 | Pale Chub | River | Smallest | 160 | 9 a.m. - 4 p.m. | Year-round (Northern and Southern) |
| 2 | Crucian Carp | River | Small | 160 | All day | Year-round (Northern and Southern) |
| 3 | Dace | River | Medium | 240 | 4 p.m. - 9 a.m. | Year-round (Northern and Southern) |
| 4 | Carp | Pond | Large | 300 | All day | Year-round (Northern and Southern) |


## Step 2: Reformatting DataFrames

The first thing we'll do is drop the "Size" column from the fish dataset. This leaves the two datasets with the same column names, so we can combine them easily. Some of the size information is still loosely reflected in the "Value" column, since bigger fish generally sell for higher prices.

```python
fish = fish.drop("Size", axis=1)
```
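
Strictly speaking, **pd.concat** does not require matching columns: it keeps the union of both sets of columns and fills any gaps with NaN. Dropping "Size" first avoids a mostly empty column. A small sketch with two one-row frames (the rows reuse the Carp and Spider values shown in this notebook; the frames themselves are illustrative):

```python
import pandas as pd

# Illustrative one-row frames; only the fish frame has a "Size" column
fish_demo = pd.DataFrame({"Name": ["Carp"], "Size": ["Large"], "Value": ["300"]})
bugs_demo = pd.DataFrame({"Name": ["Spider"], "Value": ["480"]})

# Concatenating mismatched columns keeps all of them; the bug row gets NaN for "Size"
combined = pd.concat([fish_demo, bugs_demo], ignore_index=True)
print(combined)

# Dropping "Size" first gives both frames identical columns and no gaps
aligned = pd.concat([fish_demo.drop("Size", axis=1), bugs_demo], ignore_index=True)
print(list(aligned.columns))
```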

We'll also add a "type" column to each dataframe to keep track of where each record came from.

```python
fish["type"] = "Fish"
bugs["type"] = "Bug"
```

With the columns of both dataframes now identical, we'll use **pd.concat** to stack the rows of one on top of the other in a new combined dataframe.

```python
critters = pd.concat([fish, bugs], ignore_index=True)
critters
```

|   | Name | Location | Value | Time | Date | type |
|---|------|----------|-------|------|------|------|
| 0 | Bitterling | River | 900 | All day | November-March (Northern) / May-September (Sou... | Fish |
| 1 | Pale Chub | River | 160 | 9 a.m. - 4 p.m. | Year-round (Northern and Southern) | Fish |
| 2 | Crucian Carp | River | 160 | All day | Year-round (Northern and Southern) | Fish |
| 3 | Dace | River | 240 | 4 p.m. - 9 a.m. | Year-round (Northern and Southern) | Fish |
| 4 | Carp | Pond | 300 | All day | Year-round (Northern and Southern) | Fish |
| ... | ... | ... | ... | ... | ... | ... |
| 155 | Pill Bug | Hit rocks | 250 | 11 p.m. - 4 p.m. | September-June (Northern) / March-December (So... | Bug |
| 156 | Centipede | Hit rocks | 300 | 4 p.m. - 11 p.m. | September-June (Northern) / March-December (So... | Bug |
| 157 | Spider | Falls from shaking trees | 480 | 7 p.m. - 8 a.m. | Year-round (Northern and Southern) | Bug |
| 158 | Tarantula | On ground | 8000 | 7 p.m. - 4 a.m. | November-April (Northern) / May-October (South... | Bug |
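
The `ignore_index=True` argument matters for the later steps, which look rows up by their index labels. Without it, each frame keeps its own 0-based labels and the combined frame ends up with duplicates. A quick sketch on illustrative two-row frames:

```python
import pandas as pd

# Illustrative frames; the scalar "type" value is broadcast to every row
fish_demo = pd.DataFrame({"Name": ["Bitterling", "Pale Chub"], "type": "Fish"})
bugs_demo = pd.DataFrame({"Name": ["Spider", "Tarantula"], "type": "Bug"})

# Without ignore_index, each frame keeps its own labels, so 0 and 1 appear twice
kept = pd.concat([fish_demo, bugs_demo])
print(kept.index.tolist())

# With ignore_index=True, the rows are relabelled 0..n-1
renumbered = pd.concat([fish_demo, bugs_demo], ignore_index=True)
print(renumbered.index.tolist())
```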


For the sake of usability, we'll rename the columns to lowercase by passing a dictionary of old-to-new names to **DataFrame.rename**, and then reorder the columns of the combined dataframe.

```python
new_cols = {"Name": "name",
            "Location": "location",
            "Value": "value",
            "Time": "time",
            "Date": "date"}

critters = critters.rename(new_cols, axis=1)
critters = critters[["name", "type", "value", "location", "time", "date"]]
```
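
Since every new name here is just the lower-cased original, **DataFrame.rename** can also take a function in place of the dictionary. A sketch of that alternative on an empty frame with the same column names (the `demo` frame is hypothetical):

```python
import pandas as pd

# Hypothetical empty frame with the combined dataframe's original column names
demo = pd.DataFrame(columns=["Name", "Location", "Value", "Time", "Date", "type"])

# rename() applies the callable to every column label
demo = demo.rename(columns=str.lower)

# Selecting a list of columns returns them in the listed order
demo = demo[["name", "type", "value", "location", "time", "date"]]
print(list(demo.columns))
```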

```python
critters
```

|   | name | type | value | location | time | date |
|---|------|------|-------|----------|------|------|
| 0 | Bitterling | Fish | 900 | River | All day | November-March (Northern) / May-September (Sou... |
| 1 | Pale Chub | Fish | 160 | River | 9 a.m. - 4 p.m. | Year-round (Northern and Southern) |
| 2 | Crucian Carp | Fish | 160 | River | All day | Year-round (Northern and Southern) |
| 3 | Dace | Fish | 240 | River | 4 p.m. - 9 a.m. | Year-round (Northern and Southern) |
| 4 | Carp | Fish | 300 | Pond | All day | Year-round (Northern and Southern) |
| ... | ... | ... | ... | ... | ... | ... |
| 155 | Pill Bug | Bug | 250 | Hit rocks | 11 p.m. - 4 p.m. | September-June (Northern) / March-December (So... |
| 156 | Centipede | Bug | 300 | Hit rocks | 4 p.m. - 11 p.m. | September-June (Northern) / March-December (So... |
| 157 | Spider | Bug | 480 | Falls from shaking trees | 7 p.m. - 8 a.m. | Year-round (Northern and Southern) |
| 158 | Tarantula | Bug | 8000 | On ground | 7 p.m. - 4 a.m. | November-April (Northern) / May-October (South... |


## Step 3: Reformatting Values

The first thing we'll do here is convert the "value" column to a numeric type for later analysis. That's easily done by removing the thousands-separator comma from each value in the series and then using **Series.astype()** to cast it to float.

```python
critters["value"] = critters["value"].str.replace(",", "", regex=False)
critters["value"] = critters["value"].astype(float)
```
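
**Series.astype(float)** raises a `ValueError` if any cell still holds text that isn't a number once the commas are gone. If the CSV could contain such entries, `pd.to_numeric(..., errors="coerce")` is a more forgiving option that turns them into NaN. A sketch on made-up values (the "Unknown" entry is hypothetical):

```python
import numpy as np
import pandas as pd

# Made-up raw values: a comma-formatted number, a plain number, a gap, and stray text
raw = pd.Series(["15,000", "160", np.nan, "Unknown"])

# Strip thousands separators (a literal match, not a regex); NaN passes through
cleaned = raw.str.replace(",", "", regex=False)

# Anything that still isn't numeric becomes NaN instead of raising an error
values = pd.to_numeric(cleaned, errors="coerce")
print(values)
```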

Inspecting the "date" column shows that it holds both the Northern and Southern Hemisphere windows. We'll simplify this to just the Northern Hemisphere window, which is the one that applies to players in the U.S.

```python
critters["date"].value_counts()
```

```
Year-round (Northern and Southern)                                                        29
July-August (Northern) / January-February (Southern)                                      18
June-September (Northern) / December-March (Southern)                                     13
May-October (Northern) / November-April (Southern)                                         8
April-September (Northern) / October-March (Southern)                                      7
May-September (Northern) / November-March (Southern)                                       5
April-November (Northern) / October-May (Southern)                                         5
March-November (Northern) / September-May (Southern)                                       5
July-September (Northern) / January-March (Southern)                                       5
August-November (Northern) / February-May (Southern)                                       3
March-June, September-November (Northern) / March-May, September-Decem
```

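A single regular expression that removes everything from the first " (" onward takes care of most of the values in one line of code. There are a few exceptions we'll need to address, though: mainly the records where a creature appears in two separate windows during the year.

```python
# Drop the " (Northern) / ... (Southern)" and " (Northern and Southern)" suffixes;
# regex=True is required in pandas 2.0+, where the default became a literal match
critters["date"] = critters["date"].str.replace(r" \(.+", "", regex=True)
```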
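
The pattern ` \(.+` matches a space, a literal opening parenthesis, and everything after it, so replacing it with an empty string leaves only the Northern Hemisphere window at the front of each string. The `regex=True` argument matters: since pandas 2.0, **Series.str.replace** treats the pattern as a literal string by default, and the substitution would silently do nothing. A quick check on three values from the first `value_counts()` output:

```python
import pandas as pd

dates = pd.Series([
    "Year-round (Northern and Southern)",
    "July-August (Northern) / January-February (Southern)",
    "June-September (Northern) / December-March (Southern)",
])

# " \(.+" = a space, a literal "(", then everything up to the end of the string
northern = dates.str.replace(r" \(.+", "", regex=True)
print(northern.tolist())

# As a literal string the pattern never occurs, so nothing changes
unchanged = dates.str.replace(r" \(.+", "", regex=False)
```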

```python
critters["date"].value_counts()
```

```
Year-round                            29
July-August                           19
June-September                        13
April-September                        8
May-October                            8
April-November                         6
March-November                         5
May-September                          5
July-September                         5
September-June                         4
April-August                           3
September-November                     3
August-November                        3
March-June, September-November         3
June-August                            2
December-March                         2
December-February                      2
May-August                             2
November-March                         2
November-February                      2
June-October                           2
November-April                         2
August-September                       2
September-October                      2
September       
```

We'll use the following loop to collect the index labels of the rows whose dates fall into two separate windows.

```python
double_date = ["March-June, October", "March-June, September-November",
               "July-September, November-April", "June-September, December-March",
               "April-September, December-February", "May-June, September-November"]
dd_index = []

for months in double_date:
    # Slice the dataframe and get the index labels of rows whose date matches this two-window value
    hits = critters[critters["date"] == months].index
    # Add every matching label to the list (tolist() gives plain Python ints)
    dd_index.extend(hits.tolist())

print(dd_index)
```

```
[122, 26, 27, 28, 66, 88, 90, 125]
```
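
The hard-coded `double_date` list has to be kept in sync with the data by hand. Since every two-window value contains a comma (and none of the single-window values listed above do), a more general sketch is to filter on the comma itself. This returns labels in index order rather than in the order of `double_date`, so the duplicated rows would be appended in a different order. The `demo` frame and "Critter X" are hypothetical:

```python
import pandas as pd

# Hypothetical rows as they look after the regex cleanup
demo = pd.DataFrame({
    "name": ["Bitterling", "Critter X", "Carp"],
    "date": ["November-March", "March-June, September-November", "Year-round"],
})

# Plain substring test for a comma (regex=False skips regex parsing)
two_window = demo["date"].str.contains(",", regex=False)
dd_index = demo[two_window].index.tolist()
print(dd_index)
```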


From here, we'll split each two-window record into two records, one per window, with the following loop.

```python
for index in dd_index:
    # One-row DataFrame copy of the record (double brackets keep the column dtypes)
    new_row = critters.loc[[index]].copy()
    # Split the two-window date, e.g. "March-June, September-November", at the comma
    first, second = (part.strip() for part in critters.at[index, "date"].split(",", 1))
    # The original record keeps the first window
    critters.at[index, "date"] = first
    # The copy gets the second window
    new_row["date"] = second
    # DataFrame.append was removed in pandas 2.0, so add the copy to the end with concat
    critters = pd.concat([critters, new_row], ignore_index=True)
```
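
pandas can also do this split-and-duplicate step without an explicit loop: split each date into a list, then let **DataFrame.explode** give every list element its own row. This is a sketch of an alternative rather than what the notebook ran. The copies land right after their original rows instead of at the end, so the index labels would differ from those in the `last_chance()` results later on, and the `ignore_index` argument of `explode` needs pandas 1.1 or later. The `demo` rows are hypothetical:

```python
import pandas as pd

# Hypothetical rows: one two-window creature and one year-round creature
demo = pd.DataFrame({
    "name": ["Critter X", "Carp"],
    "value": [100.0, 300.0],
    "date": ["March-June, September-November", "Year-round"],
})

# Split on ", " so each two-window date becomes a two-element list
split = demo.assign(date=demo["date"].str.split(", "))

# explode() gives every list element its own row and copies the other columns
exploded = split.explode("date", ignore_index=True)
print(exploded)
```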

```python
critters["date"].value_counts()
```

```
Year-round            29
July-August           19
June-September        14
April-September        9
May-October            8
September-November     7
April-November         6
July-September         6
May-September          5
March-June             5
March-November         5
September-June         4
November-April         3
August-November        3
December-March         3
April-August           3
December-February      3
August-September       2
November-February      2
November-March         2
April-October          2
September-October      2
March-July             2
June-October           2
May-August             2
September              2
June-August            2
December-August        1
October                1
September-March        1
November-May           1
March-September        1
October-April          1
February-November      1
July-November          1
March-May              1
August-October         1
December-May           1
July-October           1
October-March          1
```


To handle the remaining exceptions (single-month windows and "Year-round"), we rewrite them into the same "Start-End" shape as the rest of the values.

```python
# Series.replace swaps whole cell values only, so values like "March-June" are left alone
critters["date"] = critters["date"].replace({
    "June": "June-June",
    "September": "September-September",
    "October": "October-October",
    "Year-round": "Always-Always",
})
```
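
A detail worth knowing at this step: **Series.str.replace** rewrites every occurrence of a substring inside each value, while **Series.replace** only swaps cells whose entire value matches. With a substring replacement, "March-June" also becomes "March-June-June". The start/end extraction in the next step only reads the first and last month, so that happens not to change the result, but it is easy to trip over. A sketch on four values from the list above:

```python
import pandas as pd

dates = pd.Series(["June-September", "March-June", "October", "Year-round"])

# Substring replacement rewrites "June" wherever it appears in a value
substring = dates.str.replace("June", "June-June", regex=False)
print(substring.tolist())

# Whole-value replacement only touches cells that equal a key exactly
whole = dates.replace({"October": "October-October", "Year-round": "Always-Always"})
print(whole.tolist())
```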

Next, we'll pull the first and last month out of each hyphenated range to create the start and end of each creature's season.

```python
critters["season_start"] = critters["date"].str.extract(r"(^\w+)", expand=False)
critters["season_end"] = critters["date"].str.extract(r"(\w+$)", expand=False)
```
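
`^\w+` grabs the run of word characters at the start of the string and `\w+$` the run at the end. Because `\w` doesn't match a hyphen, those are the first and last month of a "Start-End" range. A single month like "September" comes back as both, while an untouched "Year-round" would split into "Year" and "round", which is why it gets a placeholder. Passing `expand=False` makes **Series.str.extract** return a Series instead of a one-column DataFrame. A quick check:

```python
import pandas as pd

dates = pd.Series(["November-March", "Always-Always", "September", "Year-round"])

# First and last runs of word characters; "-" is not a word character
start = dates.str.extract(r"(^\w+)", expand=False)
end = dates.str.extract(r"(\w+$)", expand=False)
print(list(zip(start, end)))
```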

```python
critters = critters.drop("date", axis=1)
```

```python
critters.head()
```

|   | name | type | value | location | time | season_start | season_end |
|---|------|------|-------|----------|------|--------------|------------|
| 0 | Bitterling | Fish | 900.0 | River | All day | November | March |
| 1 | Pale Chub | Fish | 160.0 | River | 9 a.m. - 4 p.m. | Always | Always |
| 2 | Crucian Carp | Fish | 160.0 | River | All day | Always | Always |
| 3 | Dace | Fish | 240.0 | River | 4 p.m. - 9 a.m. | Always | Always |
| 4 | Carp | Fish | 300.0 | Pond | All day | Always | Always |


## Step 4: Data Selection

Now that we have a clean, combined dataframe, we can write a simple function that shows which creatures will be waving goodbye at the end of a given month.

```python
def last_chance(month):
    # Accept "march", "MARCH", etc. by normalizing to "March"
    month = str(month).capitalize()
    # Creatures whose season ends this month, most valuable first
    window = critters[critters["season_end"] == month].sort_values("value", ascending=False)
    return window

last_chance("March")
```

|   | name | type | value | location | time | season_start | season_end |
|---|------|------|-------|----------|------|--------------|------------|
| 29 | Stringfish | Fish | 15000.0 | River (Clifftop) | 4 p.m. - 9 a.m. | December | March |
| 45 | Sturgeon | Fish | 10000.0 | River (mouth) | All day | September | March |
| 165 | Emperor Butterfly | Bug | 4000.0 | Flying | 5 p.m. - 8 a.m. | December | March |
| 76 | Football Fish | Fish | 2500.0 | Sea | 4 p.m. - 9 a.m. | November | March |
| 46 | Sea Butterfly | Fish | 1000.0 | Sea | All day | December | March |
| 0 | Bitterling | Fish | 900.0 | River | All day | November | March |
| 20 | Yellow Perch | Fish | 300.0 | River | All day | October | March |


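```python
last_chance("April")
```

|   | name | type | value | location | time | season_start | season_end |
|---|------|------|-------|----------|------|--------------|------------|
| 164 | Blue Marlin | Fish | 10000.0 | Pier | All day | November | April |
| 158 | Tarantula | Bug | 8000.0 | On ground | 7 p.m. - 4 a.m. | November | April |
| 65 | Tuna | Fish | 7000.0 | Pier | All day | November | April |
| 60 | Dab | Fish | 300.0 | Sea | All day | October | April |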

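```python
last_chance("may")
```

|   | name | type | value | location | time | season_start | season_end |
|---|------|------|-------|----------|------|--------------|------------|
| 77 | Oarfish | Fish | 9000.0 | Sea | All day | December | May |
| 116 | Mole Cricket | Bug | 500.0 | Underground | All day | November | May |
| 16 | Loach | Fish | 400.0 | River | All day | March | May |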
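
Thanks to `capitalize()`, `last_chance("may")` works, but a misspelled month quietly returns an empty dataframe. A hedged variant that first validates the input against the standard library's `calendar.month_name` is sketched below. The name `last_chance_checked` and the three-row demo frame (values copied from the tables above) are illustrative, and `calendar.month_name` follows the current locale, which is English by default.

```python
import calendar

import pandas as pd

# Illustrative rows in the cleaned column layout (values taken from the tables above)
critters_demo = pd.DataFrame({
    "name": ["Bitterling", "Sturgeon", "Tarantula"],
    "type": ["Fish", "Fish", "Bug"],
    "value": [900.0, 10000.0, 8000.0],
    "season_start": ["November", "September", "November"],
    "season_end": ["March", "March", "April"],
})

# calendar.month_name[0] is an empty string, so skip it
VALID_MONTHS = set(calendar.month_name[1:])


def last_chance_checked(df, month):
    """Hypothetical variant of last_chance() that rejects unknown month names."""
    month = str(month).strip().capitalize()
    if month not in VALID_MONTHS:
        raise ValueError(f"Unknown month: {month!r}")
    window = df[df["season_end"] == month]
    return window.sort_values("value", ascending=False)


print(last_chance_checked(critters_demo, "march"))
```

Raising an error instead of returning an empty frame makes a typo like "Marhc" obvious right away.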
