<h2>Enter your computer's home directory</h2>

In [None]:
home_folder = r"/Users/wrngnfreeman"

<h2>Importing required modules</h2>

In [None]:
import sys

# Make the project's src directory importable before loading its modules
sys.path.append(home_folder + r"/Github/Shelter-Animal-Outcomes-by-kaggle.com/src")
import data_processing, model_training

<h2>Data preparation</h2>

<ol type="1">
    <li><b>Age</b>: Cleans the <code>AgeuponOutcome</code> column, converts age to days, and groups ages into categories.</li>
    <li><b>Sex</b>: Cleans the <code>SexuponOutcome</code> column by removing unwanted spaces and unknown values, then splits it into two columns: the animal's sex (<code>SexuponOutcome</code>) and its <code>Sterilization</code> status.</li>
    <li><b>Breed</b>:
        <ol type="i">
            <li>Standardizes text in the <code>Breed</code> column using regular expressions to handle spaces, unknowns, and specific terms.</li>
            <li>Splits the term 'Mix' off breed names that contain it, creating a new <code>Mix</code> column that indicates mixed-breed status.</li>
            <li>Separates multiple breeds listed in the same entry of the <code>Breed</code> column into individual rows.</li>
            <li>Maps each breed to its respective type (e.g., Terrier, Working) using a predefined dictionary and assigns a <code>nan</code> category if no match is found.</li>
            <li>Counts how often each animal occurs and updates the <code>Mix</code> status based on these counts.</li>
            <li>Ensures that breeds are properly categorized and mixed status is accurately reflected across all related DataFrames.</li>
        </ol>
    </li>
    <li><b>Coat</b>:
        <ol type="i">
            <li>Coat Color Standardization: Adjusts the <code>Color</code> attribute according to the <code>AnimalType</code> ('Dog', 'Cat') for consistency in color naming.</li>
            <li>Pattern Extraction: Identifies and extracts coat patterns from colors.</li>
            <li>Pattern Removal: Strips out recognized pattern indicators from the <code>Color</code> string.</li>
            <li>Data Merging: Combines the original data with processed color information into <code>coat_color</code>.</li>
            <li>List Separation: Separates multiple colors listed in the same entry of the <code>Color</code> column into individual rows.</li>
        </ol>
    </li>
</ol>
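
The age and sex steps above can be sketched in plain Python. This is only an illustration, not the code in <code>data_processing</code>: the raw value formats ("2 years", "Neutered Male"), the conversion factors, the bin edges, and the helper names <code>age_to_days</code>, <code>age_category</code>, and <code>split_sex</code> are all assumptions, inferred in part from the category labels that appear in the outputs further down.

```python
import re

# Assumed conversion factors; the source does not state which ones it uses.
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}

# Assumed bin edges, inferred only from labels visible in the notebook outputs.
AGE_BINS = [
    (30, "<1 month"),
    (182, "<6 months"),
    (365, "<1 year"),
    (5 * 365, "<5 years"),
    (10 * 365, "<10 years"),
]

# Assumed raw prefixes and the Sterilization value each one maps to.
STERILIZATION = {"neutered": "Sterilized", "spayed": "Sterilized", "intact": "Intact"}


def age_to_days(raw):
    """Convert a string such as '2 years' or '3 weeks' to days (None if unparseable)."""
    if not raw:
        return None
    match = re.fullmatch(r"\s*(\d+)\s+(day|week|month|year)s?\s*", raw.lower())
    if match is None:
        return None
    return int(match.group(1)) * DAYS_PER_UNIT[match.group(2)]


def age_category(days):
    """Return the label of the first bin whose upper edge exceeds `days`."""
    if days is None:
        return "Unknown"
    for upper, label in AGE_BINS:
        if days < upper:
            return label
    return "10+ years"  # hypothetical label for ages beyond the last bin


def split_sex(raw):
    """Split e.g. 'Neutered Male' into (sex, sterilization)."""
    parts = (raw or "").split()  # split() also drops stray whitespace
    if len(parts) != 2 or parts[0].lower() not in STERILIZATION:
        return ("Unknown", "Unknown")
    return (parts[1].capitalize(), STERILIZATION[parts[0].lower()])


print(age_category(age_to_days("2 years")))
print(split_sex(" Spayed  Female "))
```

Keeping the bin edges in one list makes it easy to swap in the real thresholds once they are known.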

<h3>The train dataset</h3>

In [None]:
train_df = data_processing.process_data(
    file_path=home_folder + r"/Library/CloudStorage/OneDrive-Personal/shared_projects/Shelter Animal Outcomes/raw_data/train.csv",
    AnimalID=r"AnimalID",
    dep_var=r"OutcomeType"
)
display(train_df)

Unnamed: 0,AnimalID,OutcomeType,Name,DateTime,AnimalType,AgeuponOutcome,SexuponOutcome,Sterilization,BreedType,Mix,CoatColor,CoatPattern
0,A671945,Return_to_owner,Hambone,2014-02-12 18:22:00,Dog,<5 years,Male,Sterilized,Herding,Mix,Brown,
1,A671945,Return_to_owner,Hambone,2014-02-12 18:22:00,Dog,<5 years,Male,Sterilized,Herding,Mix,White,
2,A656520,Euthanasia,Emily,2013-10-13 12:44:00,Cat,<5 years,Female,Sterilized,Unknown,Mix,Cream,Tabby
3,A686464,Adoption,Pearce,2015-01-31 12:28:00,Dog,<5 years,Male,Sterilized,Unknown,Mix,Gray,
4,A686464,Adoption,Pearce,2015-01-31 12:28:00,Dog,<5 years,Male,Sterilized,Unknown,Mix,White,
...,...,...,...,...,...,...,...,...,...,...,...,...
45725,A698128,Adoption,Zeus,2015-03-09 13:33:00,Dog,<5 years,Male,Sterilized,Unknown,Mix,White,
45726,A698128,Adoption,Zeus,2015-03-09 13:33:00,Dog,<5 years,Male,Sterilized,Unknown,Mix,Cream,
45727,A677478,Transfer,,2014-04-27 12:22:00,Cat,<1 month,Male,Intact,Unknown,Mix,Black,
45728,A706629,Transfer,,2015-07-02 09:00:00,Cat,<5 years,Male,Intact,Unknown,Mix,Brown,Tabby
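
Because multiple breeds and colors are separated into individual rows, one animal can occupy several rows of the processed table (A671945 appears twice above, once per coat color). The sketch below shows that effect on two animals taken from the table. The helper <code>explode_record</code> is hypothetical, and pairing every breed type with every color is an assumption about how the rows are combined.

```python
from collections import Counter
from itertools import product


def explode_record(animal_id, breed_types, colors):
    """Return one row per (breed type, color) combination for a single animal."""
    return [
        {"AnimalID": animal_id, "BreedType": breed, "CoatColor": color}
        for breed, color in product(breed_types, colors)
    ]


# Values copied from the first rows of the processed train table.
rows = (
    explode_record("A671945", ["Herding"], ["Brown", "White"])
    + explode_record("A656520", ["Unknown"], ["Cream"])
)

# Row count and animal count differ once records are exploded.
rows_per_animal = Counter(row["AnimalID"] for row in rows)
print(len(rows), "rows for", len(rows_per_animal), "animals")
print(rows_per_animal)
```

The practical consequence is that the row count of the processed table is larger than the number of distinct animals, which matters when reading supports and sample counts later on.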


<h3>The scoring dataset</h3>

In [None]:
test_df = data_processing.process_data(
    file_path=home_folder + r"/Library/CloudStorage/OneDrive-Personal/shared_projects/Shelter Animal Outcomes/raw_data/test.csv",
    AnimalID=r"ID"
)
display(test_df)

Unnamed: 0,ID,Name,DateTime,AnimalType,AgeuponOutcome,SexuponOutcome,Sterilization,BreedType,Mix,CoatColor,CoatPattern
0,1,Summer,2015-10-12 12:15:00,Dog,<1 year,Female,Intact,Sporting,Mix,Red,
1,1,Summer,2015-10-12 12:15:00,Dog,<1 year,Female,Intact,Sporting,Mix,White,
2,2,Cheyenne,2014-07-26 17:59:00,Dog,<5 years,Female,Sterilized,Herding,Mix,Black,
3,2,Cheyenne,2014-07-26 17:59:00,Dog,<5 years,Female,Sterilized,Herding,Mix,Cream,
4,2,Cheyenne,2014-07-26 17:59:00,Dog,<5 years,Female,Sterilized,Working,Mix,Black,
...,...,...,...,...,...,...,...,...,...,...,...
19562,11453,,2014-10-21 12:57:00,Cat,<1 month,Female,Intact,Unknown,Mix,Gray,
19563,11454,,2014-09-29 09:00:00,Cat,<5 years,Female,Intact,Unknown,Mix,Calico,
19564,11455,Rambo,2015-09-05 17:16:00,Dog,<10 years,Male,Sterilized,Herding,Mix,Black,
19565,11455,Rambo,2015-09-05 17:16:00,Dog,<10 years,Male,Sterilized,Herding,Mix,Cream,


<h2>Model training</h2>

<h3>Random Forest Model</h3>

In [None]:
file_path = home_folder + r"/Library/CloudStorage/OneDrive-Personal/shared_projects/Shelter Animal Outcomes/raw_data/train.csv"
code_modules_path = home_folder + r"/Github/Shelter-Animal-Outcomes-by-kaggle.com/src"
export_model_path = home_folder + r"/Github/Shelter-Animal-Outcomes-by-kaggle.com/pickle_files/rf_model.pkl"

rf_model = model_training.train_model(
    file_path=file_path,
    AnimalID="AnimalID",
    dep_var='OutcomeType',
    code_modules_path=code_modules_path,
    export_model_path=export_model_path
)

Classification Report
              precision    recall  f1-score   support

           1       0.63      0.84      0.72      3865
           2       0.44      0.33      0.38      1718
           3       0.70      0.59      0.64      2981
           4       0.00      0.00      0.00        56
           5       0.34      0.10      0.16       526

    accuracy                           0.61      9146
   macro avg       0.42      0.37      0.38      9146
weighted avg       0.59      0.61      0.59      9146

Accuracy: 0.6143669363656243
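
As a quick check on how to read the report, the macro averages are the unweighted means of the per-class scores, so the class with 0.00 across the board pulls them well below the accuracy. The sketch below recomputes them from the rounded values printed above. It also computes the share of the largest class, which is the accuracy a constant prediction of that class would reach on this evaluation set. The helper name <code>macro_average</code> is illustrative.

```python
# Per-class scores and supports copied from the classification report (classes 1 to 5).
precision = [0.63, 0.44, 0.70, 0.00, 0.34]
recall = [0.84, 0.33, 0.59, 0.00, 0.10]
f1 = [0.72, 0.38, 0.64, 0.00, 0.16]
support = [3865, 1718, 2981, 56, 526]


def macro_average(scores):
    """Unweighted mean: every class counts equally, regardless of its support."""
    return sum(scores) / len(scores)


macro = {
    "precision": round(macro_average(precision), 2),
    "recall": round(macro_average(recall), 2),
    "f1": round(macro_average(f1), 2),
}

# Accuracy of always predicting the most frequent class.
majority_share = round(max(support) / sum(support), 2)

print(macro)
print(majority_share)
```

The recomputed macro averages match the 0.42, 0.37, and 0.38 in the report, and the model's 0.61 accuracy compares against a majority-class share of about 0.42.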

Feature Importances


Unnamed: 0,feature,importance
10,Sterilization_Sterilized,0.361081
9,Age_<6 months,0.091196
3,Age_<1 month,0.050990
0,AnimalType_Cat,0.050344
1,Sex_Female,0.038495
...,...,...
21,CoatColor_Apricot,0.000044
32,CoatColor_Chocolate,0.000031
26,CoatColor_Blue Cream,0.000011
20,CoatColor_Agouti,0.000008
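
The training call above is given an <code>export_model_path</code> ending in <code>rf_model.pkl</code>. Assuming the model is written with Python's standard <code>pickle</code> module (the source does not confirm this), it could be reloaded later as sketched below. The helpers <code>save_model</code> and <code>load_model</code> are hypothetical, and a plain dictionary stands in for the fitted model so that the example runs on its own.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize any picklable object to `path`."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load an object previously written by `save_model`."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in object; in the notebook this would be the fitted rf_model.
stand_in = {"name": "stand-in model"}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "rf_model.pkl"
    save_model(stand_in, demo_path)
    restored = load_model(demo_path)

print(restored == stand_in)
```

Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.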
