# Project: Long Beach Animal Shelter Intakes and Outcomes

## Description

**Objective**: \
\
Answer questions for various stakeholders in the city of Long Beach, CA, concerning intakes and outcomes at the local animal shelter.

**Dataset**: \
\
This dataset was pulled from the [Long Beach Open Data Portal](https://data.longbeach.gov/explore/dataset/animal-shelter-intakes-and-outcomes/). \
It is a 7.8 MB CSV file containing intake and outcome data for animals captured by or surrendered to the city.

**Tools Used**:

- pandas
- Matplotlib
- Seaborn

# Introduction

For any city with at least one animal shelter, various stakeholders have an interest in how that shelter is run and what happens to the animals that pass through its doors.\
\
This analysis aims to answer questions for the following parties in Long Beach, CA:
- **Shelter managers:**
  - How long do animals typically stay in the shelter by species or intake condition?
  - What intake reasons are most strongly correlated with negative outcomes (e.g., euthanasia)?
  - Are there seasonal trends in animal intakes or outcomes?
- **Animal welfare advocates:**
  - What percentage of animals are adopted vs. euthanized, and how does that vary by type, sex, or condition?
  - Are there disparities in outcomes for specific breeds or geographic areas?
  - How many animals are returned to owners vs. adopted?
- **Local government officials:**
  - Is there a correlation between specific neighborhoods and high intake rates?
  - Has the shelter’s performance improved over time (e.g., reduced euthanasia rates)?
  - What’s the annual intake/outcome volume and trend?
- **Local citizenry:**
  - When is the best time of year to adopt (e.g., more animals available)?
  - What types of animals are most commonly available for adoption?
  - Can geographic patterns inform community outreach for fostering or adoption?
- **Internal analysts:**
  - What features best predict positive outcomes using logistic regression or clustering?
  - Can intake condition be used to forecast outcome types?

# Data handling

## Preview

In [29]:
# Import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re

# Import helper functions and variables
from utilities.config import get_path_obj, raw_data_path, processed_data_path, products_dir, images_dir, data_dir

In [11]:
# Load data
df = pd.read_csv(raw_data_path, parse_dates=['DOB', 'Intake Date', 'Outcome Date'])

# Preview
df.head()

Unnamed: 0,Animal ID,Animal Name,Animal Type,Primary Color,Secondary Color,Sex,DOB,Intake Date,Intake Condition,Intake Type,...,Outcome Type,Outcome Subtype,latitude,longitude,intake_is_dead,outcome_is_dead,was_outcome_alive,geopoint,intake_duration,is_current_month
0,A594350,*HEAVY CREAM,CAT,BLACK,,Neutered,2014-07-28,2017-07-28,NORMAL,STRAY,...,ADOPTION,REPEAT ADT,33.79976,-118.126388,Alive on Intake,False,1,"33.7997598, -118.1263884",81.0,0
1,A347815,DUKE,DOG,BLACK,TAN,Neutered,2005-04-14,2018-11-30,NORMAL,OWNER SURRENDER,...,RESCUE,LIVELOVE,33.79976,-118.126388,Alive on Intake,False,1,"33.7997598, -118.1263884",27.0,0
2,A707449,*TABITHA,DOG,BLACK,WHITE,Spayed,2022-10-23,2023-09-23,NORMAL,STRAY,...,ADOPTION,,33.798953,-118.167334,Alive on Intake,False,1,"33.7989532, -118.167334",18.0,0
3,A712850,*KIWI,DOG,BLONDE,GOLD,Spayed,2022-07-06,2024-02-03,NORMAL,RETURN,...,ADOPTION,WEB,33.798936,-118.195889,Alive on Intake,False,1,"33.7989357, -118.1958891",0.0,0
4,A738972,KITTEN 2,CAT,BLACK,,Unknown,2025-03-28,2025-04-04,NORMAL,STRAY,...,RESCUE,LITTLELION,33.798936,-118.195889,Alive on Intake,False,1,"33.7989357, -118.1958891",0.0,0


### Structure

In [12]:
# Structure and summary
display(df.dtypes)
display(df.columns)
display(df.describe(include='all'))

Animal ID                    object
Animal Name                  object
Animal Type                  object
Primary Color                object
Secondary Color              object
Sex                          object
DOB                  datetime64[ns]
Intake Date          datetime64[ns]
Intake Condition             object
Intake Type                  object
Intake Subtype               object
Reason for Intake            object
Outcome Date         datetime64[ns]
Crossing                     object
Jurisdiction                 object
Outcome Type                 object
Outcome Subtype              object
latitude                    float64
longitude                   float64
intake_is_dead               object
outcome_is_dead                bool
was_outcome_alive             int64
geopoint                     object
intake_duration             float64
is_current_month              int64
dtype: object

Index(['Animal ID', 'Animal Name', 'Animal Type', 'Primary Color',
       'Secondary Color', 'Sex', 'DOB', 'Intake Date', 'Intake Condition',
       'Intake Type', 'Intake Subtype', 'Reason for Intake', 'Outcome Date',
       'Crossing', 'Jurisdiction', 'Outcome Type', 'Outcome Subtype',
       'latitude', 'longitude', 'intake_is_dead', 'outcome_is_dead',
       'was_outcome_alive', 'geopoint', 'intake_duration', 'is_current_month'],
      dtype='object')

Unnamed: 0,Animal ID,Animal Name,Animal Type,Primary Color,Secondary Color,Sex,DOB,Intake Date,Intake Condition,Intake Type,...,Outcome Type,Outcome Subtype,latitude,longitude,intake_is_dead,outcome_is_dead,was_outcome_alive,geopoint,intake_duration,is_current_month
count,33707,19956,33707,33707,15964,33707,29433,33707,33707,33707,...,33374,29842,33707.0,33707.0,33707,33707,33707.0,33707,33381.0,33707.0
unique,32557,9996,10,80,44,5,,,16,12,...,18,240,,,1,2,,10154,,
top,A637086,*,CAT,BLACK,WHITE,Male,,,NORMAL,STRAY,...,RESCUE,SPCALA,,,Alive on Intake,False,,"33.8096122, -118.0826161",,
freq,8,104,16083,8548,9380,7739,,,15297,23719,...,7842,4074,,,33707,26766,,570,,
mean,,,,,,,2018-11-03 22:44:42.295383040,2021-02-04 00:22:07.771679488,,,...,,,33.815444,-118.149526,,,0.794078,,18.741949,0.012075
min,,,,,,,1993-09-15 00:00:00,2017-01-01 00:00:00,,,...,,,19.297815,-122.695911,,,0.0,,0.0,0.0
25%,,,,,,,2016-09-16 00:00:00,2018-09-29 00:00:00,,,...,,,33.78399,-118.190865,,,1.0,,0.0,0.0
50%,,,,,,,2019-03-28 00:00:00,2021-01-02 00:00:00,,,...,,,33.806783,-118.173175,,,1.0,,5.0,0.0
75%,,,,,,,2022-04-06 00:00:00,2023-05-26 00:00:00,,,...,,,33.85121,-118.128915,,,1.0,,16.0,0.0
max,,,,,,,2025-07-06 00:00:00,2025-07-15 00:00:00,,,...,,,45.521885,-73.99236,,,1.0,,1410.0,1.0
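
Two things stand out in the summary above: `Animal ID` has 32,557 unique values across 33,707 rows (the most frequent ID appears 8 times), and the coordinates range from latitude 19.297815 to 45.521885 and up to longitude -73.99236, far outside Long Beach. The sketch below shows one way such rows could be flagged. It runs on a small invented frame, and the bounding box is a rough assumption rather than an official city boundary.

```python
import pandas as pd

# Toy rows invented for illustration; the last two mimic the extreme
# coordinate values seen in the summary, not actual records.
toy = pd.DataFrame({
    "animal_id": ["A1", "A2", "A2", "A3", "A4"],
    "latitude": [33.79976, 33.798953, 33.798936, 19.297815, 45.521885],
    "longitude": [-118.126388, -118.167334, -118.195889, -118.15, -73.99236],
})

# Assumed, deliberately generous box around the Long Beach area.
LAT_RANGE = (33.5, 34.1)
LON_RANGE = (-118.5, -117.8)

in_area = (
    toy["latitude"].between(*LAT_RANGE)
    & toy["longitude"].between(*LON_RANGE)
)
out_of_area = toy[~in_area]

# IDs that appear more than once, i.e. animals with repeat intakes.
id_counts = toy["animal_id"].value_counts()
repeat_ids = id_counts[id_counts > 1]

print(out_of_area)
print(repeat_ids)
```

Whether out-of-area rows should be dropped or kept depends on the question being asked, so flagging them first keeps that choice open.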


In [34]:
# Rename columns

def rename(name: str) -> str:
    """Formats "name" by replacing spaces with underscores and lowercasing it; "dob" is expanded to "date_of_birth"

    Args:
        name (str): the name to be formatted

    Returns:
        str: the formatted name
    """
    name = name.replace(' ', '_')
    name = name.lower()
    if name == 'dob':
        name = 'date_of_birth'
    return name

df = df.rename(columns=rename)
df.columns

Index(['animal_id', 'animal_name', 'animal_type', 'primary_color',
       'secondary_color', 'sex', 'date_of_birth', 'intake_date',
       'intake_condition', 'intake_type', 'intake_subtype',
       'reason_for_intake', 'outcome_date', 'crossing', 'jurisdiction',
       'outcome_type', 'outcome_subtype', 'latitude', 'longitude',
       'intake_is_dead', 'outcome_is_dead', 'was_outcome_alive', 'geopoint',
       'intake_duration', 'is_current_month'],
      dtype='object')

### Variables (columns)

In [37]:
# Organize variables by attributes: animal, intake, outcome, datetime

def check_type_date(name: str):
    """Checks if a column holds date or duration data by searching its name for keywords (the dtype itself is not inspected)

    Args:
        name (str): The column name to be checked

    Returns:
        Match|None: A Match object if a keyword is found, otherwise None
    """
    # re.search scans the whole string, so no leading or trailing '.*' is needed
    return re.search(r'date|month|duration', name)

animal_vars = [
    'animal_type',
    'primary_color',
    'secondary_color',
    'sex',
]
intake_vars = [x for x in df.columns if 'intake' in x and not check_type_date(x)]
outcome_vars = [x for x in df.columns if 'outcome' in x and not check_type_date(x)]
datetime_vars = [x for x in df.columns if check_type_date(x)]
geography_vars = [
    'latitude',
    'longitude',
    'geopoint',
    'crossing',
    'jurisdiction'
]
print(animal_vars, intake_vars, outcome_vars, datetime_vars, geography_vars, sep='\n')
vars_dict = {
    'animal': animal_vars,
    'intake': intake_vars,
    'outcome': outcome_vars,
    'datetime': datetime_vars,
    'geography': geography_vars
}

['animal_type', 'primary_color', 'secondary_color', 'sex']
['intake_condition', 'intake_type', 'intake_subtype', 'reason_for_intake', 'intake_is_dead']
['outcome_type', 'outcome_subtype', 'outcome_is_dead', 'was_outcome_alive']
['date_of_birth', 'intake_date', 'outcome_date', 'intake_duration', 'is_current_month']
['latitude', 'longitude', 'geopoint', 'crossing', 'jurisdiction']


#### Inspection

In [39]:
# Get counts for each variable and write them to CSV files for visual inspection
# ("var_group" avoids shadowing the built-in vars())
for var_group in vars_dict.values():
    for var in var_group:
        df[var].value_counts().to_csv(get_path_obj(data_dir, 'variable counts', f'{var}.csv'))
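
One caveat when reading those count files: `Series.value_counts()` drops missing values by default, so columns with many blanks (the summary shows only 15,964 non-null `Secondary Color` values out of 33,707 rows) will look more complete than they are. A minimal sketch on invented data of how missing values could be surfaced alongside the counts:

```python
import numpy as np
import pandas as pd

# Toy frame invented for illustration.
toy = pd.DataFrame({
    "secondary_color": ["TAN", np.nan, "WHITE", np.nan],
    "sex": ["Neutered", "Spayed", "Unknown", "Spayed"],
})

# Default behaviour: NaN is silently excluded from the counts.
counts_default = toy["secondary_color"].value_counts()

# dropna=False adds a NaN row so blanks are visible in the output.
counts_with_nan = toy["secondary_color"].value_counts(dropna=False)

# Share of missing values per column, largest first.
missing_share = toy.isna().mean().sort_values(ascending=False)

print(counts_with_nan)
print(missing_share)
```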

## Cleaning and preparation

In [None]:
# Remove superfluous columns
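
This cell is still empty. As a possible starting point, the summary shows that `intake_is_dead` has a single unique value across all 33,707 rows, and `geopoint` repeats `latitude` and `longitude` as a string. The sketch below, on an invented frame, finds constant columns automatically and drops them together with a hand-picked list. Which columns actually count as superfluous is an assumption here, not a decision taken in this analysis.

```python
import pandas as pd

# Toy frame invented for illustration, using the renamed column names.
toy = pd.DataFrame({
    "animal_type": ["CAT", "DOG", "DOG"],
    "intake_is_dead": ["Alive on Intake"] * 3,
    "latitude": [33.79976, 33.79976, 33.798953],
    "longitude": [-118.126388, -118.126388, -118.167334],
    "geopoint": [
        "33.7997598, -118.1263884",
        "33.7997598, -118.1263884",
        "33.7989532, -118.167334",
    ],
})

# Columns with at most one distinct value carry no information.
constant_cols = [c for c in toy.columns if toy[c].nunique(dropna=False) <= 1]

# Assumption: geopoint is redundant because latitude/longitude are kept.
redundant_cols = ["geopoint"]

trimmed = toy.drop(columns=constant_cols + redundant_cols)
print(trimmed.columns.tolist())
```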

In [None]:
# Add an age column
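
This cell is also empty. One plausible approach, sketched here on invented rows, is to compute age at intake from `date_of_birth` and `intake_date`. The column name `age_at_intake_years` and the 365.25-day year are assumptions; rows without a date of birth (the summary shows 29,433 non-null values out of 33,707) simply end up with a missing age.

```python
import pandas as pd

# Toy rows invented for illustration; the third has no date of birth.
toy = pd.DataFrame({
    "date_of_birth": pd.to_datetime(["2014-07-28", "2025-03-28", None]),
    "intake_date": pd.to_datetime(["2017-07-28", "2025-04-04", "2023-09-23"]),
})

# Subtracting two datetime columns gives a timedelta; NaT propagates.
age_days = (toy["intake_date"] - toy["date_of_birth"]).dt.days

# Hypothetical column name; 365.25 approximates leap years.
toy["age_at_intake_years"] = (age_days / 365.25).round(2)

print(toy)
```

A negative age would indicate a date of birth recorded after the intake date, which may be worth checking before the column is used.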

## Exploratory Data Analysis (EDA)
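
This section has no content yet. As a hedged sketch of how two of the questions from the introduction might be approached (typical length of stay by species, and how outcomes split by species), the example below uses an invented frame with the renamed columns. The `EUTHANASIA` label is an assumption, since the preview only shows `ADOPTION` and `RESCUE` as outcome types.

```python
import pandas as pd

# Toy rows invented for illustration, not taken from the dataset.
toy = pd.DataFrame({
    "animal_type": ["CAT", "CAT", "DOG", "DOG", "DOG"],
    "intake_duration": [81.0, 0.0, 27.0, 18.0, 0.0],
    "outcome_type": ["ADOPTION", "RESCUE", "RESCUE", "ADOPTION", "EUTHANASIA"],
})

# Median days in the shelter per species; the median is less sensitive
# than the mean to a few very long stays.
median_stay = toy.groupby("animal_type")["intake_duration"].median()

# Share of each outcome type within each species (rows sum to 1).
outcome_share = (
    toy.groupby("animal_type")["outcome_type"]
    .value_counts(normalize=True)
    .unstack(fill_value=0.0)
)

print(median_stay)
print(outcome_share)
```

The same grouping pattern extends to `intake_condition`, `sex`, or the intake year for the seasonal and trend questions.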

## Deeper analysis and modeling

# Analysis

## Insights and recommendations

### Insights

### Recommendations

## Summary

*This report can also be found [here](../products/report.md).*

## Appendix