# Brief overview of this study
The data comes from the Austin Animal Center, a shelter, and spans from October 1, 2013, to March 2016.

### Objective

The task is to predict the fate of each animal from the available information, which makes this a multiclass classification problem. The classes are: Adoption, Died, Euthanasia, Return to owner, Transfer.

We consider all classes equally important regardless of their representation in the dataset. Therefore, the prediction quality is assessed using the macro-averaged F1 score.
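To make the metric concrete, here is a small sketch of how macro F1 is computed. Each class gets its own F1 score and the scores are averaged with equal weight, so a rare class such as Died counts as much as Adoption. The toy labels are made up, and `macro_f1` is a hypothetical helper that is cross-checked against scikit-learn's `f1_score(..., average="macro")`.

```python
import numpy as np
from sklearn.metrics import f1_score


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (hypothetical helper for illustration)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for label in labels:
        tp = np.sum((y_pred == label) & (y_true == label))
        fp = np.sum((y_pred == label) & (y_true != label))
        fn = np.sum((y_pred != label) & (y_true == label))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); treated as 0 when the class is absent everywhere
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


# Toy labels (0 = Adoption, 1 = Transfer, 2 = Return_to_owner, 3 = Euthanasia, 4 = Died)
y_true = [0, 0, 0, 1, 1, 2, 3, 4]
y_pred = [0, 0, 1, 1, 1, 2, 0, 4]
labels = [0, 1, 2, 3, 4]

print(macro_f1(y_true, y_pred, labels))
print(f1_score(y_true, y_pred, labels=labels, average="macro", zero_division=0))
```

Both lines print roughly 0.693: the Euthanasia sample is never predicted correctly, so that class contributes an F1 of 0 and pulls the average down by a full fifth, even though it is a single sample.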

---

**Assignment**

Using the exact scheme proposed in this template is optional, but within this notebook, you should develop:

- Clear and clean,
- Well-commented,
- Reproducible (fix all possible random seeds),
- Motivated

**code** that **generates your best solution**. See the competition rules for further information.


### Methods

`TODO: Describe your feature preprocessing techniques`

`TODO: List the models and parameters you have tried`


### Results

`TODO: Share observations, success stories, and futile efforts; what interesting things can you say about the dataset? what conclusions can you draw?`

---


```python
# Import necessary libraries
import pandas as pd
import numpy as np
import random
import os

from sklearn.dummy import DummyClassifier
from sklearn.metrics import classification_report

# Set fixed random seeds for reproducibility
SEED = 42
np.random.seed(SEED)
random.seed(SEED)
```
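The cell above seeds Python's `random` module and NumPy's global generator. A slightly more complete sketch, assuming these are the only sources of randomness besides scikit-learn, wraps the seeding in one hypothetical `seed_everything` function and also returns a local `np.random.Generator`. Passing `random_state=SEED` explicitly to scikit-learn estimators and splitters remains more robust than relying on NumPy's global state.

```python
import os
import random

import numpy as np


def seed_everything(seed: int) -> np.random.Generator:
    """Seed the global RNGs used in this notebook and return a local generator."""
    # Only affects Python subprocesses started afterwards (e.g. parallel workers);
    # the hash seed of the already-running interpreter cannot be changed this way.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    # A local Generator avoids hidden coupling through NumPy's global state
    return np.random.default_rng(seed)


rng = seed_everything(42)
first = rng.integers(0, 100, size=3)

rng = seed_everything(42)
second = rng.integers(0, 100, size=3)
print(first, second)  # identical arrays
```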

## Configurations and Constants
(avoid magic numbers in your code)

```python
# Mapping between outcome names and the integer labels stored in the Outcome column
OUTCOME2LABEL = {
    "Adoption": 0,
    "Transfer": 1,
    "Return_to_owner": 2,
    "Euthanasia": 3,
    "Died": 4,
}
LABEL2OUTCOME = {v: k for k, v in OUTCOME2LABEL.items()}
```

## Libraries
(all imports should ideally be placed here)

```python
import sklearn
import pandas as pd
import numpy as np
```

Let's load and examine the data.

#### Questions to Consider:
- What kind of data transformations might we need?
- What are the potential pitfalls in the data preprocessing stage?

```python
# Load the data
df_train = pd.read_csv("/kaggle/input/animal-shelter-log/train.csv", encoding="utf-8")
df_test = pd.read_csv("/kaggle/input/animal-shelter-log/test.csv", encoding="utf-8")
df_train.head(5)
```

| Unnamed: 0 | Name | SexuponOutcome | AnimalType | AgeuponOutcome | Breed | Color | DateTime | Outcome | ID |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Socks | Neutered Male | Cat | 2 months | Domestic Shorthair Mix | Black/White | 2014-06-11 14:36:00 | 0 | 0 |
| 1 | Vera | Intact Female | Cat | 1 month | Domestic Shorthair Mix | Tortie/White | 2014-07-18 08:10:00 | 3 | 1 |
| 2 | Biscuit | Neutered Male | Dog | 3 months | Chihuahua Shorthair Mix | Yellow | 2016-01-02 17:28:00 | 2 | 2 |
| 3 | Kitten | Spayed Female | Cat | 2 years | Domestic Shorthair Mix | Calico | 2014-02-19 17:27:00 | 0 | 3 |
| 4 | | Neutered Male | Cat | 2 months | Domestic Shorthair Mix | Orange Tabby | 2014-07-21 17:34:00 | 0 | 4 |


### Exploratory Data Analysis

```python
# TODO: explore the data, plot graphs, seek valuable insights, ...
```
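A possible starting point for the EDA, sketched under the assumption that the columns match those in `df_train.head()` above: check the class balance of `Outcome` (the imbalance is what motivates macro F1) and the share of missing values per column. `summarize_frame` is a hypothetical helper, and the toy frame only mimics the real schema.

```python
import pandas as pd


def summarize_frame(df: pd.DataFrame, label_names: dict):
    """Return outcome class shares and the fraction of missing values per column."""
    # Share of each outcome, with integer labels replaced by readable names
    class_share = df["Outcome"].map(label_names).value_counts(normalize=True)
    # Fraction of NaN entries per column, most incomplete columns first
    missing_share = df.isna().mean().sort_values(ascending=False)
    return class_share, missing_share


# Toy frame with the same kind of columns as df_train (values are made up)
label_names = {0: "Adoption", 1: "Transfer", 2: "Return_to_owner", 3: "Euthanasia", 4: "Died"}
toy = pd.DataFrame(
    {
        "Name": ["Socks", None, "Biscuit", None],
        "AnimalType": ["Cat", "Cat", "Dog", "Dog"],
        "Outcome": [0, 0, 2, 4],
    }
)
class_share, missing_share = summarize_frame(toy, label_names)
print(class_share)
print(missing_share)
```

In the notebook this would be called as `summarize_frame(df_train, LABEL2OUTCOME)`; the missing `Name` in row 4 of the sample above hints that name presence may be worth checking.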

## Feature Preparation

#### Dates

Convert date columns into a numerical format. What format is most suitable and why?

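```python
def pandas_dates2number(date_series: pd.Series) -> np.ndarray:
    """Convert date strings to Unix timestamps (whole seconds since 1970-01-01)."""
    # Assumes datetime64[ns] values, so integer division by 10**9 gives seconds
    return pd.to_datetime(date_series).values.astype(np.int64) // 10**9


pandas_dates2number(pd.Series(["2020-12-10"]))
```

```
array([1607558400])
```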
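Unix seconds preserve ordering, which suits long-term trends over the 2013 to 2016 period, but a single timestamp hides daily and weekly cycles from a linear model. One option, sketched here as an assumption rather than the template's intended answer, is to add calendar components next to the timestamp; `date_parts` is a hypothetical helper.

```python
import pandas as pd


def date_parts(date_series: pd.Series) -> pd.DataFrame:
    """Split timestamps into calendar components that can capture periodic effects."""
    dt = pd.to_datetime(date_series)
    return pd.DataFrame(
        {
            "year": dt.dt.year,
            "month": dt.dt.month,
            "weekday": dt.dt.dayofweek,  # Monday = 0, Sunday = 6
            "hour": dt.dt.hour,
        },
        index=date_series.index,
    )


parts = date_parts(pd.Series(["2014-06-11 14:36:00", "2016-01-02 17:28:00"]))
print(parts)
```

Hour and weekday could then be one-hot encoded or mapped to sine/cosine pairs; which encoding helps more is something to verify on validation data.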

#### Other Features

Based on your EDA, preprocess other features.

```python
# TODO
```
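Two columns in the sample above are stored as text and need parsing before an encoder can use them: `AgeuponOutcome` (e.g. "2 months") and `SexuponOutcome` (e.g. "Neutered Male"). The sketch below assumes those formats hold across the whole file and approximates a month as 30 days and a year as 365; `age_to_days` and `split_sex` are hypothetical helpers. Handling of an "Unknown" value and missing entries is a defensive assumption, not something confirmed by the rows shown.

```python
import numpy as np
import pandas as pd

# Approximate unit lengths in days (an assumption: months and years are not exact)
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}


def age_to_days(age: pd.Series) -> pd.Series:
    """Convert strings like '2 months' or '1 year' to an approximate age in days."""
    # "months" matches the unit "month", "years" matches "year", and so on
    parts = age.str.extract(r"(?P<num>\d+)\s*(?P<unit>day|week|month|year)")
    # NaN where the string is missing or does not match the pattern
    return parts["num"].astype(float) * parts["unit"].map(DAYS_PER_UNIT)


def split_sex(sex: pd.Series) -> pd.DataFrame:
    """Split 'Neutered Male'-style values into sex and sterilization status."""
    s = sex.fillna("Unknown")
    sex_part = s.str.extract(r"(Male|Female)", expand=False).fillna("Unknown")
    status = np.where(
        s.str.contains("Neutered|Spayed"),
        "sterilized",
        np.where(s.str.contains("Intact"), "intact", "unknown"),
    )
    return pd.DataFrame({"Sex": sex_part, "Sterilization": status}, index=sex.index)


ages = pd.Series(["2 months", "1 month", "3 weeks", "2 years", None])
print(age_to_days(ages).tolist())

sexes = pd.Series(["Neutered Male", "Intact Female", "Spayed Female", "Unknown", None])
print(split_sex(sexes))
```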

Combine everything and make sure the train and test sets go through exactly the same preprocessing.

*HINT: use an sklearn Pipeline of OneHotEncoder, ...*
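Following the hint, one way to guarantee identical preprocessing is to put the `ColumnTransformer` and the classifier into a single scikit-learn `Pipeline`: `fit` learns the scaling and encoding statistics from the training data only, and `predict` reuses them on new data. The column names match the template; the imputer, the `LogisticRegression` settings, and the toy data are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numerical_features = ["DateTime"]
categorical_features = ["AnimalType"]

preprocess = ColumnTransformer(
    transformers=[
        # Fill missing numbers with the training median, then standardize
        ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), numerical_features),
        # Categories unseen during fit are encoded as all zeros
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
    ]
)

pipeline = Pipeline(
    steps=[
        ("preprocess", preprocess),
        # class_weight="balanced" upweights rare classes, in line with the macro-F1 objective
        ("model", LogisticRegression(max_iter=1000, class_weight="balanced")),
    ]
)

# Toy data with the template's two features (DateTime already in Unix seconds; values made up)
toy_train = pd.DataFrame(
    {
        "DateTime": [1.40e9, 1.41e9, 1.45e9, 1.43e9, 1.39e9, 1.44e9],
        "AnimalType": ["Cat", "Cat", "Dog", "Dog", "Cat", "Dog"],
    }
)
toy_y = np.array([0, 0, 2, 2, 0, 2])
toy_test = pd.DataFrame({"DateTime": [1.42e9], "AnimalType": ["Dog"]})

pipeline.fit(toy_train, toy_y)
print(pipeline.predict(toy_test))
```

With this design a single `pipeline.fit(...)` on the training frame replaces separate `fit_transform`/`transform` calls, so the test set cannot accidentally be fitted on its own.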

```python
# don't be silly, why would one place imports in the middle of his code
from sklearn.preprocessing import LabelEncoder, StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
```

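```python
def prepare_features(df, preprocessor=None):
    """Turn a raw dataframe into a feature matrix.

    Fits a new preprocessor when none is given (training data) and reuses the
    fitted one otherwise (test data), so both splits are transformed identically.
    """
    # work on a copy so the caller's dataframe is not modified in place
    df = df.copy()

    # apply feature extraction
    df["DateTime"] = pandas_dates2number(df["DateTime"])
    ...

    # keep only the columns used as features (everything else is dropped)
    categorical_features = ["AnimalType"]
    numerical_features = ["DateTime"]
    target = None
    if "Outcome" in df.columns:
        target = df["Outcome"]
    features = df[categorical_features + numerical_features]

    # preprocess: fit on the training data only, reuse the fitted transformer otherwise
    if preprocessor is None:
        preprocessor = ColumnTransformer(
            transformers=[
                ("num", StandardScaler(), numerical_features),
                ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
            ]
        )
        features = preprocessor.fit_transform(features)
    else:
        features = preprocessor.transform(features)

    return features, target, preprocessor


X_train, y_train, preprocessor = prepare_features(df_train)
# pass the fitted preprocessor so the test set is scaled and encoded with training statistics
X_test, _, _ = prepare_features(df_test, preprocessor)
```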

```python
X_train
```

```
array([[-0.77214914,  1.        ,  0.        ],
       [-0.62379129,  1.        ,  0.        ],
       [ 1.53052486,  0.        ,  1.        ],
       ...,
       [-0.05715149,  0.        ,  1.        ],
       [-0.69902202,  0.        ,  1.        ],
       [-1.45437534,  1.        ,  0.        ]])
```

# Model

**TODO:**
* train/validation split
* cross-validation
* advanced models and ensembling
* hyperparameter tuning
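For the first two items, stratified folds keep the rare Died and Euthanasia classes represented in every split, and a macro-F1 scorer makes cross-validation measure the competition metric. The sketch below uses synthetic data from `make_classification` as a stand-in for `X_train`/`y_train` and compares a most-frequent baseline with a class-balanced logistic regression; the models and class weights are illustrative assumptions. The scorer built with `make_scorer` is equivalent to `scoring="f1_macro"` but reports 0 instead of warning for never-predicted classes.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_val_score

SEED = 42

# Imbalanced 5-class stand-in for X_train / y_train (synthetic, not the shelter data)
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=6,
    n_classes=5,
    weights=[0.4, 0.35, 0.15, 0.07, 0.03],
    random_state=SEED,
)

# Shuffled, stratified folds with a fixed seed for reproducibility
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
macro_f1 = make_scorer(f1_score, average="macro", zero_division=0)

candidates = {
    "dummy_most_frequent": DummyClassifier(strategy="most_frequent"),
    "logreg_balanced": LogisticRegression(max_iter=1000, class_weight="balanced"),
}
results = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X, y, cv=cv, scoring=macro_f1)
    results[name] = scores
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```

Reporting the fold standard deviation alongside the mean helps judge whether a difference between models is larger than the noise between folds.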

```python
model = DummyClassifier(
    strategy="constant",
    constant=OUTCOME2LABEL["Died"],  # memento mori
)
model.fit(X_train, y_train)
```

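```python
# Evaluate on the training data (a sanity check only; use a held-out split for real estimates).
# zero_division=0 reports 0.0 for classes that are never predicted instead of warning.
y_pred = model.predict(X_train)
print(classification_report(y_train, y_pred, zero_division=0))
```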

```
              precision    recall  f1-score   support

           0       0.00      0.00      0.00      7538
           1       0.00      0.00      0.00      6595
           2       0.00      0.00      0.00      3350
           3       0.00      0.00      0.00      1089
           4       0.01      1.00      0.01       138

    accuracy                           0.01     18710
   macro avg       0.00      0.20      0.00     18710
weighted avg       0.00      0.01      0.00     18710
```





Finally, save the test predictions of the best model to a CSV file.

```python
preds = model.predict(X_test)
```

```python
# Create a submission using constant predictions
submission = pd.DataFrame({"ID": df_test["ID"], "Outcome": preds})

# Save the submission
submission.to_csv("submission.csv", index=False)
```
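Before uploading, a few cheap checks can catch malformed files: the expected columns, one row per test ID with no duplicates, and only labels that exist in `OUTCOME2LABEL`. `check_submission` is a hypothetical helper; the `ID`/`Outcome` columns come from the cell above, while the exact format Kaggle requires should be confirmed in the competition rules.

```python
import pandas as pd


def check_submission(submission: pd.DataFrame, test_ids: pd.Series, allowed_labels) -> None:
    """Raise ValueError if the submission does not line up with the test set."""
    if list(submission.columns) != ["ID", "Outcome"]:
        raise ValueError(f"unexpected columns: {list(submission.columns)}")
    if len(submission) != len(test_ids):
        raise ValueError("row count differs from the test set")
    if submission["ID"].duplicated().any():
        raise ValueError("duplicate IDs in submission")
    if set(submission["ID"]) != set(test_ids):
        raise ValueError("submission IDs do not match test IDs")
    unknown = set(submission["Outcome"]) - set(allowed_labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")


# Toy check with made-up IDs
toy_sub = pd.DataFrame({"ID": [10, 11, 12], "Outcome": [0, 4, 2]})
check_submission(toy_sub, pd.Series([10, 11, 12]), allowed_labels=range(5))
print("submission looks consistent")
```

In the notebook the call would be `check_submission(submission, df_test["ID"], OUTCOME2LABEL.values())`.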

### Place for feedback or a meme

...