# Austin Animal Center Outcome Prediction
## CS 363M Machine Learning Project - Riya Mittal

This notebook builds a classifier that predicts the outcome type (for example, Return to Owner or Transfer) for each animal taken in by the Austin Animal Center. It is organized as follows:

1. Data Loading and Exploration
2. Data Preprocessing
   - Handling Missing Values
   - Feature Engineering
   - Encoding Categorical Variables
3. Model Development
   - Base Model
   - Model Optimization
   - Model Evaluation
4. Final Predictions and Submission

In [4]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import classification_report, confusion_matrix
import xgboost as xgb
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
import matplotlib.pyplot as plt
import seaborn as sns

# Set random seed for reproducibility
np.random.seed(42)

## 1. Data Loading and Exploration
First, I load the training data and run an initial exploratory analysis to understand the dataset's size, column types, and missing values.

In [6]:
# Load the training data
train_data = pd.read_csv('train.csv')

# Display basic information about the dataset
print("Dataset Shape:", train_data.shape)
print("\nDataset Info:")
train_data.info()

# Display first few rows
print("\nFirst few rows:")
train_data.head()

Dataset Shape: (111157, 14)

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 111157 entries, 0 to 111156
Data columns (total 14 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   Id                111157 non-null  object
 1   Name              79774 non-null   object
 2   Intake Time       111157 non-null  object
 3   Found Location    111157 non-null  object
 4   Intake Type       111157 non-null  object
 5   Intake Condition  111157 non-null  object
 6   Animal Type       111157 non-null  object
 7   Sex upon Intake   111155 non-null  object
 8   Age upon Intake   111156 non-null  object
 9   Breed             111157 non-null  object
 10  Color             111157 non-null  object
 11  Outcome Time      111157 non-null  object
 12  Date of Birth     111157 non-null  object
 13  Outcome Type      111157 non-null  object
dtypes: object(14)
memory usage: 11.9+ MB

First few rows:


| | Id | Name | Intake Time | Found Location | Intake Type | Intake Condition | Animal Type | Sex upon Intake | Age upon Intake | Breed | Color | Outcome Time | Date of Birth | Outcome Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A706918 | Belle | 07/05/2015 12:59:00 PM | 9409 Bluegrass Dr in Austin (TX) | Stray | Normal | Dog | Spayed Female | 8 years | English Springer Spaniel | White/Liver | 07/05/2015 03:13:00 PM | 07/05/2007 | Return to Owner |
| 1 | A724273 | Runster | 04/14/2016 06:43:00 PM | 2818 Palomino Trail in Austin (TX) | Stray | Normal | Dog | Intact Male | 11 months | Basenji Mix | Sable/White | 04/21/2016 05:17:00 PM | 04/17/2015 | Return to Owner |
| 2 | A857105 | Johnny Ringo | 05/12/2022 12:23:00 AM | 4404 Sarasota Drive in Austin (TX) | Public Assist | Normal | Cat | Neutered Male | 2 years | Domestic Shorthair | Orange Tabby | 05/12/2022 02:35:00 PM | 05/12/2020 | Transfer |
| 3 | A743852 | Odin | 02/18/2017 12:46:00 PM | Austin (TX) | Owner Surrender | Normal | Dog | Neutered Male | 2 years | Labrador Retriever Mix | Chocolate | 02/21/2017 05:44:00 PM | 02/18/2015 | Return to Owner |
| 4 | A635072 | Beowulf | 04/16/2019 09:53:00 AM | 415 East Mary Street in Austin (TX) | Public Assist | Normal | Dog | Neutered Male | 6 years | Great Dane Mix | Black | 04/18/2019 01:45:00 PM | 06/03/2012 | Return to Owner |
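
The `info()` output above shows that only three columns have missing values: `Name`, `Sex upon Intake`, and `Age upon Intake`. As a rough sketch of how to quantify this, the snippet below recomputes the missing counts and percentages from the non-null counts printed above. The helper name `missing_summary` is hypothetical; on the loaded DataFrame, `train_data.isna().sum()` should give the same counts directly.

```python
# Non-null counts copied from the train_data.info() output above.
TOTAL_ROWS = 111157
non_null_counts = {
    "Name": 79774,
    "Sex upon Intake": 111155,
    "Age upon Intake": 111156,
}


def missing_summary(counts, total):
    """Return {column: (missing_count, missing_percent)} for each column."""
    return {
        col: (total - n, round(100 * (total - n) / total, 2))
        for col, n in counts.items()
    }


summary = missing_summary(non_null_counts, TOTAL_ROWS)
for col, (n_missing, pct) in summary.items():
    print(f"{col}: {n_missing} missing ({pct}%)")
```

`Name` is missing in more than a quarter of the rows, so one option is to turn it into a has-name indicator rather than imputing it. The other two columns are missing in only a handful of rows.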


## Next Steps:
1. Analyze the loaded data in more depth (missing values and outcome distribution)
2. Implement data preprocessing steps
3. Create feature engineering pipeline
4. Implement XGBoost model with proper cross-validation
5. Optimize model performance
6. Generate predictions for submission
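
Steps 2 and 3 could start from the raw string columns shown in the first few rows. The following is a minimal sketch, not the final pipeline. The function names and the conversion factors (30 days per month, 365 days per year) are assumptions made for illustration. It deliberately uses only intake-side columns, on the assumption that `Outcome Time` would not be available at prediction time.

```python
from datetime import datetime

# Assumed conversion factors; months and years are approximations.
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}


def age_to_days(age):
    """Convert strings like '8 years' or '11 months' to approximate days.

    Returns None for missing or unparseable values.
    """
    if not isinstance(age, str):
        return None
    parts = age.strip().lower().split()
    if len(parts) != 2:
        return None
    number, unit = parts
    unit = unit.rstrip("s")  # 'years' -> 'year'
    try:
        return int(number) * DAYS_PER_UNIT[unit]
    except (ValueError, KeyError):
        return None


def intake_time_features(timestamp):
    """Split an intake timestamp into hour, weekday (Monday=0), and month."""
    dt = datetime.strptime(timestamp, "%m/%d/%Y %I:%M:%S %p")
    return {"hour": dt.hour, "weekday": dt.weekday(), "month": dt.month}


def split_sex(value):
    """Split 'Spayed Female' into (is_fixed, sex); (None, None) if unknown."""
    if not isinstance(value, str) or len(value.split()) != 2:
        return None, None
    status, sex = value.split()
    return status in ("Spayed", "Neutered"), sex


# Values taken from the first row of the training data shown above.
print(age_to_days("8 years"))
print(intake_time_features("07/05/2015 12:59:00 PM"))
print(split_sex("Spayed Female"))
```

On the DataFrame, these helpers could be applied column-wise, for example `train_data["Age upon Intake"].map(age_to_days)`.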

Note: This notebook will be expanded as each step of the modeling process is completed.