# 🐶 Animal Shelter Data Analysis 🐱

This project analyzes animal shelter data by combining two public datasets: data from Louisville Metro Animal Services and from Sonoma County Animal Services. The Analysis portion has "Questions for Further Study" that go beyond the scope of this project but are meant for critical thinking and inspiration for further work.

## 1. Load and Clean Data

First, let’s create dataframes by reading in the two CSV files. We’ll remove columns that won’t be used, and since many of the columns have similar but not identical names, we’ll rename them, too. We’ll also remove rows with blank values.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator


# forward slashes work on every OS and avoid invalid escape sequences such as '\L' and '\S'
lou_df = pd.read_csv('Data/Louisville_Data.csv')
sonoma_df = pd.read_csv('Data/Sonoma_County_Data.csv')

# delete unused columns
lou_df = lou_df.drop(columns=['kennel', 'surreason', 'bites', 'sourcezipcode', 'ObjectId'])
sonoma_df = sonoma_df.drop(columns=['Name', 'Date Of Birth', 'Impound Number', 'Kennel Number', 
                        'Intake Condition', 'Outcome Condition', 'Intake Jurisdiction', 
                        'Outcome Jurisdiction', 'Location', 'Count'])

# rename columns so we can merge with Sonoma later
lou_df = lou_df.rename(columns={'animaltype': 'Type', 'breed': 'Breed', 'color': 'Color', 
                                'sex': 'Sex', 'petsize': 'Size', 'animalid': 'Animal ID',
                                'indate': 'Intake Date', 'outdate': 'Outcome Date', 'intype': 'Intake Type',
                                'insubtype': 'Intake Subtype', 'outtype': 'Outcome Type', 'outsubtype': 'Outcome Subtype',
                                'jurisdiction': 'Outcome Zip Code'})

# remove rows with blank values
lou_df = lou_df.dropna()
sonoma_df = sonoma_df.dropna()

lou_df.head()
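
Because `dropna()` discards a row if *any* of its columns is blank, it can be worth checking how many blanks each column holds before dropping. Here is a minimal sketch on a made-up three-row frame; `toy_df` and its values are hypothetical and do not come from either dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for a shelter extract
toy_df = pd.DataFrame({
    'Type': ['DOG', 'CAT', None],
    'Breed': ['BEAGLE', None, 'SIAMESE'],
    'Color': ['BROWN', 'BLACK', 'WHITE'],
})

# Count blanks per column before removing anything
missing_per_column = toy_df.isna().sum()
print(missing_per_column)

# dropna() keeps only the rows that have no blanks in any column
rows_before = len(toy_df)
rows_after = len(toy_df.dropna())
print("Rows removed by dropna():", rows_before - rows_after)
```

If one column accounts for most of the blanks, dropping that column first (when it isn't needed) would preserve more rows.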


Next, let’s look at the datatypes for each dataset to make sure they line up with what we need. We’ll start with Louisville’s data.

In [None]:
lou_df.dtypes

Everything is currently being treated as an object (text), which is good for most of our data, but not for our date columns. We need to change the *Intake Date* and *Outcome Date* columns to the **datetime** datatype. This will also allow us to create a new *Days in Shelter* column, which the Sonoma dataset already has.

In [None]:
lou_df['Intake Date'] = pd.to_datetime(lou_df['Intake Date'], errors='coerce')
lou_df['Outcome Date'] = pd.to_datetime(lou_df['Outcome Date'], errors='coerce')

# Create a new 'Days in Shelter' column: the number of whole days between Intake Date and Outcome Date
lou_df['Days in Shelter'] = (lou_df['Outcome Date'] - lou_df['Intake Date']).dt.days

lou_df.head()
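
One thing to keep in mind with `errors='coerce'`: any value that can't be parsed silently becomes `NaT`, and the resulting *Days in Shelter* becomes `NaN`. The sketch below illustrates this on hypothetical data (`toy_df` and its dates are made up), so the same check could be run on the real columns.

```python
import pandas as pd

# Hypothetical toy data: one valid row and one row with an unparseable intake date
toy_df = pd.DataFrame({
    'Intake Date': ['2021-03-01', 'not a date'],
    'Outcome Date': ['2021-03-05', '2021-03-06'],
})

toy_df['Intake Date'] = pd.to_datetime(toy_df['Intake Date'], errors='coerce')
toy_df['Outcome Date'] = pd.to_datetime(toy_df['Outcome Date'], errors='coerce')
toy_df['Days in Shelter'] = (toy_df['Outcome Date'] - toy_df['Intake Date']).dt.days

# Unparseable dates become NaT, so the day count for that row is NaN
unparsed = toy_df['Intake Date'].isna().sum()
print("Unparsed intake dates:", unparsed)
print(toy_df)
```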

Louisville’s data is looking much better now. Let’s move to Sonoma’s dataset, starting with checking their datatypes.

In [None]:
sonoma_df.dtypes

Nearly everything is an object, so we’ll once again need to convert the date columns to **datetime**. Let’s also change the *Outcome Zip Code* from a float into an **integer** (since zip codes don’t use decimals). Sonoma’s data goes back to 2013, but Louisville’s only goes back to 2019. Let’s remove the Sonoma rows that precede 2019 to make the data merge cleaner.

In [None]:
sonoma_df['Intake Date'] = pd.to_datetime(sonoma_df['Intake Date'], errors='coerce')
sonoma_df['Outcome Date'] = pd.to_datetime(sonoma_df['Outcome Date'], errors='coerce')
sonoma_df['Outcome Zip Code'] = sonoma_df['Outcome Zip Code'].astype('int')

# remove rows with dates before 2019 (because Louisville's data only goes back to 2019)
sonoma_df = sonoma_df.loc[sonoma_df['Intake Date'] >= pd.to_datetime('2019-01-01')]

sonoma_df.head()

Speaking of merging, both datasets have items that mean the same thing but are written differently (e.g., Louisville says “OTC” while Sonoma says “Over the Counter”). Since Louisville’s data is generally more abbreviated, let’s change Louisville’s items to match Sonoma’s so that the merged data is clearer and easier to read.

In [None]:
lou_df['Sex'] = lou_df['Sex'].replace({
    'M' : 'Male',
    'F' : 'Female',
    'N' : 'Neutered',
    'S' : 'Spayed',
    'U' : 'Unknown'
    })

lou_df['Intake Type'] = lou_df['Intake Type'].replace({
    'OWNER SUR' : 'OWNER SURRENDER'
})

lou_df['Intake Subtype'] = lou_df['Intake Subtype'].replace({
    'OTC' : 'OVER THE COUNTER'
})

lou_df['Outcome Type'] = lou_df['Outcome Type'].replace({
    'RTO' : 'RETURN TO OWNER',
    'EUTH' : 'EUTHANIZE',
})

lou_df.head()
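
To find which labels still differ between the two datasets, one option is to compare the sets of unique values in a shared column. This is only a sketch: the two Series below are hypothetical stand-ins for a column such as *Intake Type*, and the real columns hold many more categories.

```python
import pandas as pd

# Hypothetical values standing in for the same column in each dataset
lou_values = pd.Series(['STRAY', 'OWNER SUR', 'CONFISCATE'])
sonoma_values = pd.Series(['STRAY', 'OWNER SURRENDER', 'CONFISCATE'])

# Labels that appear in one dataset but not the other are candidates for replace()
only_in_lou = sorted(set(lou_values.unique()) - set(sonoma_values.unique()))
only_in_sonoma = sorted(set(sonoma_values.unique()) - set(lou_values.unique()))

print("Only in Louisville:", only_in_lou)
print("Only in Sonoma:", only_in_sonoma)
```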

## 🐦 2. Merge and Analysis 🐇

Now we’re ready to merge the data using pandas' **concat** (concatenate) function, which stacks the rows of one dataframe on top of the other.

In [None]:
df = pd.concat([lou_df, sonoma_df], ignore_index=True)

df
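
The renaming step earlier matters because `pd.concat` lines columns up by name. The sketch below uses two hypothetical one-row frames (`left` and `right` are made up) to show what would happen if a column had been left with its original name.

```python
import pandas as pd

# Hypothetical frames: 'breed' in the second one was never renamed to 'Breed'
left = pd.DataFrame({'Type': ['DOG'], 'Breed': ['BEAGLE']})
right = pd.DataFrame({'Type': ['CAT'], 'breed': ['SIAMESE']})

combined = pd.concat([left, right], ignore_index=True)

# Mismatched names produce two separate columns, each padded with NaN
print(combined)
print("Blank 'Breed' values:", combined['Breed'].isna().sum())
```

Comparing `set(lou_df.columns)` with `set(sonoma_df.columns)` before merging is a quick way to catch this kind of mismatch.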

After all that cleaning, we still have over 41,000 rows of data. But does that mean 41,000 animals were served?

In [None]:
print("Unique animals:", df['Animal ID'].nunique())

By counting the number of unique Animal IDs, we see that roughly 33,000 animals were actually served. This means that about 8,000 rows (41,000 - 33,000) are repeat visits by animals that were served more than once.
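
To separate "extra rows" from "animals that came back," we can count visits per ID. This is a minimal sketch on hypothetical IDs (`toy_df` is made up); the same calls should work on `df['Animal ID']`.

```python
import pandas as pd

# Hypothetical IDs: A1 visits three times, A2 twice, A3 once
toy_df = pd.DataFrame({'Animal ID': ['A1', 'A2', 'A1', 'A3', 'A1', 'A2']})

visits = toy_df['Animal ID'].value_counts()  # number of rows per animal

total_rows = len(toy_df)
unique_animals = visits.size
repeat_rows = total_rows - unique_animals      # rows beyond each animal's first visit
returning_animals = int((visits > 1).sum())    # animals seen more than once

print("Rows:", total_rows)
print("Unique animals:", unique_animals)
print("Repeat-visit rows:", repeat_rows)
print("Animals with more than one visit:", returning_animals)
```

Note that the number of repeat-visit rows and the number of returning animals are different figures: an animal with three visits contributes two extra rows but counts as one returning animal.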

Since we looked at unique animals, let’s look at which animal types and breeds were served the most.


In [None]:
print(df.value_counts('Type').head()) 
print(df.value_counts('Breed').head())

The significant majority were dogs and cats. But although there were more dogs, the most common breed is the Domestic Shorthair cat.

For our first visualization, let’s show how many animals were taken in per year by looking at the Intake Date column. To make the comparison easier, let’s create a new Intake Year column that extracts the year from the Intake Date (and a matching Outcome Year column from the Outcome Date).


In [None]:
df.insert(5, 'Intake Year', pd.to_datetime(df['Intake Date']).dt.year)
df.insert(9, 'Outcome Year', pd.to_datetime(df['Outcome Date']).dt.year)

df.head()

Before making the visualization, we still need to get value counts for each year. Let’s save that as a new dataframe called indate_df. Then, we can make the line plot.

In [None]:
indate_df = df.value_counts('Intake Year').reset_index().sort_values(['Intake Year'])
indate_df.columns = ['Year', 'Count']

#change X axis to only show integers
ax = plt.figure().gca()
ax.xaxis.set_major_locator(MaxNLocator(integer=True))

plt.plot(indate_df['Year'], indate_df['Count'])
plt.xlabel('Year')
plt.ylabel('Number of Animals')
plt.title('Intakes by Year')
plt.show()

There was a significant (and understandable) decrease in 2020 due to COVID. The number for 2023 also seems lower than expected: the difference between 2022 and 2023 looks to be about 2,000. The most recent dates in the data (at the time of creating this study) go to mid-November 2023, so, as far as we can tell, the data is up to date.

Questions for further study:
- Why are numbers lower for 2023? Is there a huge backlog that will be entered before the end of the year? How does animal population affect this data? How does *human* population affect this data?
- Should we expect 2024’s numbers to increase or to continue declining?
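
One possible starting point for the first question: since the 2023 data stops in mid-November, each year could be compared over the same January-to-mid-November window. The sketch below is an assumption-laden illustration, with hypothetical dates in `toy_df` and an assumed cutoff of November 15.

```python
import pandas as pd

# Hypothetical intake dates; the real column is df['Intake Date']
toy_df = pd.DataFrame({'Intake Date': pd.to_datetime([
    '2022-02-10', '2022-06-01', '2022-12-20',
    '2023-03-15', '2023-11-01',
])})

cutoff_month, cutoff_day = 11, 15  # assumed cutoff (mid-November)

dates = toy_df['Intake Date']
# True for dates on or before the cutoff within their own year
before_cutoff = (dates.dt.month < cutoff_month) | (
    (dates.dt.month == cutoff_month) & (dates.dt.day <= cutoff_day)
)

# Count intakes per year using only the comparable part of each year
ytd_counts = dates[before_cutoff].dt.year.value_counts().sort_index()
print(ytd_counts)
```

In this toy example the December 2022 intake is excluded, so both years are counted over the same portion of the calendar.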

---

For our second visual, let’s see how long the animals typically stayed by using the Days in Shelter column.


In [None]:
day_shelter_df = df.value_counts('Days in Shelter').reset_index().sort_values(['Days in Shelter'])
day_shelter_df.columns = ['Days', 'Count']

plt.figure(figsize=(20, 3))  # wide, short figure; created before plotting so the size applies to this chart
plt.scatter(day_shelter_df['Days'], day_shelter_df['Count'])
plt.axis((0, 20, 0, 9000))
plt.xticks(range(0, 20))
plt.xlabel('Days in Shelter')
plt.ylabel('Number of Animals')
plt.title('Number of Animals per Length of Stay')
plt.show()

The largest group of animals (about 8,500) were taken in and released on the same day (stayed for 0 days). Most of the data follows a pattern: as the number of days increases, the number of animals staying that long usually decreases. From day 6 onward, the counts move in steps: they hold roughly steady for a day or two, drop a bit, hold again, and so on.

Questions for further study: 
- Why are so many animals taken in and out on the same day? Are they being treated for minor injuries or receiving shots/vaccinations?
- Would most euthanized animals be put in this category?
- What other outcomes typically occur on the same day as the intake?
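
A first pass at the last two questions could filter for same-day stays and look at the share of each outcome. The sketch below runs on hypothetical rows (`toy_df` is made up); on the real data, the same filter would be applied to `df`.

```python
import pandas as pd

# Hypothetical rows; the real columns are df['Days in Shelter'] and df['Outcome Type']
toy_df = pd.DataFrame({
    'Days in Shelter': [0, 0, 0, 3, 7],
    'Outcome Type': ['RETURN TO OWNER', 'EUTHANIZE', 'RETURN TO OWNER',
                     'ADOPTION', 'TRANSFER'],
})

# Keep only animals whose intake and outcome fell on the same day
same_day = toy_df.loc[toy_df['Days in Shelter'] == 0]

# normalize=True returns proportions instead of raw counts
same_day_share = same_day['Outcome Type'].value_counts(normalize=True)
print(same_day_share)
```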

---

A value count of Intake Type shows that the significant majority of animals are taken in because they’re stray. What happens to them? Let’s make a visual to show that.


In [None]:
stray_df = df.loc[df['Intake Type'] == 'STRAY']

stray_outcomes = stray_df['Outcome Type'].value_counts().reset_index().head()
stray_outcomes.columns = ['Outcome Type', 'Count']

plt.rcParams["figure.figsize"] = (10, 3)
plt.bar(stray_outcomes['Outcome Type'], stray_outcomes['Count'])
plt.xlabel('Outcome')
plt.ylabel('Number of Animals')
plt.title('Outcome for Stray Animals')
plt.show()

Most stray animals are adopted or returned to their owners. There's some room for faith in humanity after all.

Questions for further study: 
- The third most likely outcome for stray animals is that they’re transferred to another shelter, so what happens to them *after* they're transferred?
- Are transferred animals more/less likely to be adopted or returned to owners?
- How long does it take for stray animals to be adopted vs. returned to their owners?
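
The last question could be approached by grouping stray animals by outcome and comparing typical stay lengths. This is a sketch under stated assumptions: `toy_df` and its numbers are hypothetical, and the median is used here (rather than the mean) because a few very long stays could skew an average.

```python
import pandas as pd

# Hypothetical rows using the merged dataset's column names
toy_df = pd.DataFrame({
    'Intake Type': ['STRAY', 'STRAY', 'STRAY', 'STRAY', 'OWNER SURRENDER'],
    'Outcome Type': ['ADOPTION', 'ADOPTION', 'RETURN TO OWNER',
                     'RETURN TO OWNER', 'ADOPTION'],
    'Days in Shelter': [10, 20, 1, 3, 40],
})

strays = toy_df.loc[toy_df['Intake Type'] == 'STRAY']

# Median length of stay for each outcome among strays only
median_stay = strays.groupby('Outcome Type')['Days in Shelter'].median()
print(median_stay)
```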

## 🐈 Conclusion 🐕

In this project, I learned how the number of intakes has significantly changed over the past few years. I also learned that most animals at these shelters stay fewer than four or five days. Although we have more work to do in helping these animals, we should pat ourselves on the back for adopting the majority of stray animals or returning them to their owners. 🎉

The category I've grown most interested in throughout this project is the transferred animals. I wonder how being transferred affects their well-being, physically as well as mentally and emotionally. Moving to an unfamiliar place with unfamiliar faces can be hard for us humans. Growing up in school, I remember new classmates suddenly materializing in the middle of the school year. They were called "transfers."

I imagine transferring is difficult for our animal friends, probably for some of the same reasons we humans struggle with it. Do I know anyone here? Will I be okay? Can I be *happy* here?

What happens to them after they're transferred? What policies are in place to care for and help them through that adjustment? What research could be done to find the most effective ways of transferring animals and making them more likely to be adopted? There's plenty more work to be done. 🙂