# Machine Learning Project #2 - Recommending Dogs


### Data Information

##### About the Data
Our client requested a recommender system that can recommend an optimal dog breed to new dog owners. Although data on this topic is limited, this assignment uses a dataset of 194 unique dog breeds, each described by 16 attributes. Most attributes are rated on a scale of 1-5. For instance, a breed rated 5 for "Affectionate With Family" is among the most affectionate with family members. The remaining two attributes, Coat Type and Coat Length, are categorical.


##### Loading the Libraries / Data

To begin the assignment, we used pandas and numpy to load and manipulate the data. We then imported MinMaxScaler from sklearn's preprocessing module, which normalizes our features by rescaling each one to a common range (0 to 1 by default). Next, we imported LabelEncoder, which is needed to convert the Coat Type and Coat Length columns from categorical to numerical values so they can be passed to the algorithm. We used sklearn's cosine_similarity function to build a content-based recommender system. This function computes the cosine similarity between feature vectors, which indicates how similar they are. Lastly, we imported Google Colab's drive module to load our dataset from Google Drive. Below are the last 5 rows of the dataframe.

In [None]:
# Load Packages
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler, LabelEncoder
from sklearn.metrics.pairwise import cosine_similarity
from google.colab import drive

drive.mount('/content/drive')
data = pd.read_excel("/content/drive/My Drive/ColabNotebooks/DSC3344/Data/breed_traits.xlsx")
data.tail()

Mounted at /content/drive


Unnamed: 0,Breed,Affectionate With Family,Good With Young Children,Good With Other Dogs,Shedding Level,Coat Grooming Frequency,Drooling Level,Coat Type,Coat Length,Openness To Strangers,Playfulness Level,Watchdog/Protective Nature,Adaptability Level,Trainability Level,Energy Level,Barking Level,Mental Stimulation Needs
190,CeskyÂ Terriers,4,5,3,2,2,1,Wavy,Medium,4,3,3,4,3,3,3,3
191,AmericanÂ Foxhounds,3,5,5,3,1,1,Smooth,Short,3,3,3,3,3,4,5,3
192,Azawakhs,3,3,3,2,2,1,Smooth,Short,1,3,3,3,2,3,1,3
193,EnglishÂ Foxhounds,5,5,5,3,1,2,Double,Short,4,4,3,4,4,4,5,4
194,NorwegianÂ Lundehunds,3,3,3,3,2,1,Double,Short,3,3,3,3,3,3,3,3
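
To illustrate the cosine similarity measure described above, the short sketch below compares a manual calculation with sklearn's cosine_similarity on two made-up rating vectors. The vectors and variable names are hypothetical and are not taken from the breed dataset.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two made-up rating vectors (hypothetical values, not from the dataset)
a = np.array([5.0, 1.0, 3.0])
b = np.array([4.0, 2.0, 3.0])

# Cosine similarity = dot product divided by the product of the vector lengths
manual = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# sklearn expects 2D arrays (one row per sample) and returns a matrix
from_sklearn = cosine_similarity(a.reshape(1, -1), b.reshape(1, -1))[0, 0]

# Vectors pointing in the same direction score 1 regardless of their length
same_direction = cosine_similarity([[1.0, 1.0, 1.0]], [[5.0, 5.0, 5.0]])[0, 0]

print(manual, from_sklearn, same_direction)
```

Because only the direction of the vectors matters, two sets of ratings with the same proportions receive the same score, which is worth keeping in mind when interpreting the results.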


##### Preparing the Data

We begin preparing the data for analysis by cleaning the column names. During our initial attempt at computing cosine similarity, some column names could not be matched because of stray whitespace and encoding characters, which is why the first two lines of code are needed. We then label encoded the 'Coat Type' and 'Coat Length' columns, since they were categorical and cosine similarity requires numerical input. Third, we dropped the 'Breed' column, as it is just an identifier and is not used in the model. To finish this chunk of code, we scaled all of the features so that they share a common range before being used in the algorithm.

In [None]:
# Clean column names
data.columns = data.columns.str.strip()
data.columns = data.columns.str.replace('Â', '')

# Label Encoding for Coat Type and Coat Length
label_encoder = LabelEncoder()
data['Coat Type'] = label_encoder.fit_transform(data['Coat Type'])
data['Coat Length'] = label_encoder.fit_transform(data['Coat Length'])

# Drop 'Breed' column before scaling the features
features = data.drop(columns=['Breed'])

# Define and fit the MinMaxScaler
scaler = MinMaxScaler()
data_scaled = scaler.fit_transform(features)

data.tail()

Unnamed: 0,Breed,Affectionate With Family,Good With Young Children,Good With Other Dogs,Shedding Level,Coat Grooming Frequency,Drooling Level,Coat Type,Coat Length,Openness To Strangers,Playfulness Level,Watchdog/Protective Nature,Adaptability Level,Trainability Level,Energy Level,Barking Level,Mental Stimulation Needs
190,CeskyÂ Terriers,4,5,3,2,2,1,8,1,4,3,3,4,3,3,3,3
191,AmericanÂ Foxhounds,3,5,5,3,1,1,7,3,3,3,3,3,3,4,5,3
192,Azawakhs,3,3,3,2,2,1,7,3,1,3,3,3,2,3,1,3
193,EnglishÂ Foxhounds,5,5,5,3,1,2,2,3,4,4,3,4,4,4,5,4
194,NorwegianÂ Lundehunds,3,3,3,3,2,1,2,3,3,3,3,3,3,3,3,3
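
As a small illustration of the two preprocessing steps above, the sketch below applies LabelEncoder and MinMaxScaler to made-up values. The sample lists are hypothetical and stand in for the real columns.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Made-up coat types (hypothetical sample, not the full column)
coat_types = ['Wavy', 'Smooth', 'Double', 'Smooth']
encoder = LabelEncoder()
encoded = encoder.fit_transform(coat_types)

# Classes are sorted alphabetically and numbered starting from 0
print(list(encoder.classes_), encoded)

# A made-up 1-5 rating column; MinMaxScaler maps the minimum to 0 and the maximum to 1
ratings = np.array([[1], [3], [5]])
scaled = MinMaxScaler().fit_transform(ratings)
print(scaled.ravel())
```

Note that the integer codes reflect alphabetical order only, so they say nothing about how similar two coat types actually are.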


#### Define the "get_user_preferences" Function
Now that our data has been prepared, we define a function that collects the user input that ultimately determines which dog breeds are suggested. The function asks one question at a time, and the user answers each by entering an integer from 1 to 5 based on their lifestyle and preferences. Each answer is recorded in the dictionary "user_preferences" for use in the next function, "scale_user_preferences". The questions were chosen based on the features in the data that describe the different dog breeds, so each answer can change which breeds are recommended.

In [None]:
# Function to collect user information
def get_user_preferences():
    print("Please rate the following on a scale of 1 to 5 (1 being least important and 5 being most important):")
    user_preferences = {
        'Affectionate With Family': int(input("How affectionate do you want the dog to be with your family? (1-5): ")),
        'Good With Young Children': int(input("How important is it that the dog gets along with young children? (1-5): ")),
        'Good With Other Dogs': int(input("How important is it that the dog gets along with other dogs? (1-5): ")),
        'Shedding Level': int(input("How much shedding are you willing to tolerate? (1-5, where 1 is least shedding): ")),
        'Coat Grooming Frequency': int(input("How much grooming are you willing to do? (1-5, where 1 is least frequent grooming): ")),
        'Drooling Level': int(input("How much drooling are you comfortable with? (1-5, where 1 is least drooling): ")),
        'Openness To Strangers': int(input("How open do you want the dog to be to strangers? (1-5): ")),
        'Playfulness Level': int(input("How playful would you like the dog to be? (1-5): ")),
        'Watchdog/Protective Nature': int(input("How protective do you want the dog to be? (1-5): ")),
        'Adaptability Level': int(input("How often will you keep your dog outside? (1-5): ")),
        'Trainability Level': int(input("How trainable will you need the dog to be? (1-5): ")),
        'Energy Level': int(input("How energetic do you want the dog to be? (1-5): ")),
        'Barking Level': int(input("How much barking are you comfortable with? (1-5): ")),
        'Mental Stimulation Needs': int(input("How much time will you devote to caring for and playing with your dog? (1-5): "))
    }
    return user_preferences
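
The function above assumes every answer is a valid integer from 1 to 5, so a typo would raise an error or produce an out-of-range value. One possible way to guard against this is sketched below. The helper name ask_rating and its input_fn parameter are our own assumptions for illustration and are not part of the original notebook.

```python
def ask_rating(prompt, low=1, high=5, input_fn=input):
    """Keep asking until the user enters a whole number between low and high."""
    while True:
        answer = input_fn(prompt)
        try:
            value = int(answer)
        except ValueError:
            print(f"Please enter a whole number from {low} to {high}.")
            continue
        if low <= value <= high:
            return value
        print(f"Please enter a whole number from {low} to {high}.")


# Demonstration with scripted answers instead of live keyboard input
scripted_answers = iter(["abc", "7", "3"])
rating = ask_rating("How playful would you like the dog to be? (1-5): ",
                    input_fn=lambda prompt: next(scripted_answers))
print(rating)
```

Each int(input(...)) call in get_user_preferences could be swapped for a call to such a helper. The input_fn argument exists only so the helper can be tried out without typing answers by hand.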

#### Define the "scale_user_preferences" Function
This function takes two parameters: the "user_preferences" dictionary created in "get_user_preferences" and the scaler we fit on the dog breed data. It starts by creating a dictionary, "default_preferences", that contains a key for every feature column, including 'Coat Type' and 'Coat Length', which the user is not asked about, with every value defaulting to 0. This dictionary is then updated with the values from "user_preferences" and converted into a one-row pandas dataframe whose columns follow the same order as the breed features. The resulting row is scaled using the scaler we fit previously on the dog breed data. In short, the function converts the user's input into a form that can be compared with each breed's entry using cosine similarity.

In [None]:
# Convert dictionary of user information to pandas dataframe and scale using MinMaxScaler
def scale_user_preferences(user_preferences, scaler):
    default_preferences = {
        'Affectionate With Family': 0,
        'Good With Young Children': 0,
        'Good With Other Dogs': 0,
        'Shedding Level': 0,
        'Coat Grooming Frequency': 0,
        'Drooling Level': 0,
        'Openness To Strangers': 0,
        'Playfulness Level': 0,
        'Watchdog/Protective Nature': 0,
        'Adaptability Level': 0,
        'Trainability Level': 0,
        'Energy Level': 0,
        'Barking Level': 0,
        'Mental Stimulation Needs': 0,
        'Coat Type': 0,
        'Coat Length': 0
    }

    # Update the user_preferences
    default_preferences.update(user_preferences)

    # Create new dataframe with user preferences
    user_vector = pd.DataFrame([default_preferences], columns=features.columns)

    # Scale the user data
    return scaler.transform(user_vector)
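
To make the steps above concrete, here is a minimal sketch on a made-up three-column feature table. The names toy_features, toy_scaler, and toy_prefs are hypothetical, and the values are not from the breed dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up feature table standing in for the breed features
toy_features = pd.DataFrame({
    'Energy Level': [1, 3, 5],
    'Barking Level': [1, 2, 5],
    'Coat Type': [0, 1, 2],
})
toy_scaler = MinMaxScaler().fit(toy_features)

# The user answers only some of the columns, in any order
toy_prefs = {'Barking Level': 3, 'Energy Level': 5}

# Start every column at 0, then overwrite with the user's answers
row = {col: 0 for col in toy_features.columns}
row.update(toy_prefs)

# columns= enforces the same column order the scaler was fit on
user_vector = pd.DataFrame([row], columns=toy_features.columns)
user_scaled = toy_scaler.transform(user_vector)
print(user_scaled)
```

The unanswered 'Coat Type' column keeps its default of 0, which the scaler treats the same way as the first encoded category.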

#### Call the Functions, Calculate Cosine Similarity, and Output Recommendations
This chunk of code calls the previously defined functions to create a vector that can be compared with the dog breed data using cosine similarity. The cosine similarity between the user's preferences and each breed's entry is then calculated, giving each breed a score based on how closely it matches the user's preferences. These scores are added to the dog breeds dataframe as a new column called "Similarity". A new dataframe called data_sorted is then created using only the breed names and similarity scores from the original dataframe, sorted by similarity score in descending order. This essentially ranks all of the breeds by how well they fit the user. Finally, the top five recommended dog breeds for the user are printed.

In [None]:
# Get user preferences
user_preferences = get_user_preferences()

# Scale the user preferences into the same feature space as the breed data
user_vector_scaled = scale_user_preferences(user_preferences, scaler)

# Calculate cosine similarity between the user preferences and each breed
similarities = cosine_similarity(user_vector_scaled, data_scaled)

# Add breed names and similarities to the dataframe
data['Similarity'] = similarities[0]
data_sorted = data[['Breed', 'Similarity']].sort_values(by='Similarity', ascending=False)

# Display top 5 recommended breeds
print("\nTop 5 recommended breeds based on your preferences:")
print(data_sorted.head(5))

Please rate the following on a scale of 1 to 5 (1 being least important and 5 being most important):
How affectionate do you want the dog to be with your family? (1-5): 5
How important is it that the dog gets along with young children? (1-5): 2
How important is it that the dog gets along with other dogs? (1-5): 5
How much shedding are you willing to tolerate? (1-5, where 1 is least shedding): 1
How much grooming are you willing to do? (1-5, where 1 is least frequent grooming): 1
How much drooling are you comfortable with? (1-5, where 1 is least drooling): 1
How open do you want the dog to be to strangers? (1-5): 5
How playful would you like the dog to be? (1-5): 5
How protective do you want the dog to be? (1-5): 3
How often will you keep your dog outside? (1-5): 2
How trainable will you need the dog to be? (1-5): 5
How energetic do you want the dog to be? (1-5): 3
How much barking are you comfortable with? (1-5): 1
How much time will you devote to caring for and playing with your dog? 
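
The ranking step can also be tried on a tiny made-up example. The sketch below uses three hypothetical breeds with two already-scaled features. The names and values are assumptions for illustration only and do not reproduce the output of the cell above.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Made-up scaled feature rows for three hypothetical breeds
toy_breeds = pd.DataFrame({
    'Breed': ['Breed A', 'Breed B', 'Breed C'],
    'Energy Level': [1.0, 0.0, 1.0],
    'Barking Level': [0.0, 1.0, 1.0],
})

# Hypothetical user who wants high energy and low barking
toy_user = [[1.0, 0.0]]

# One similarity score per breed, returned as a 1 x 3 matrix
breed_matrix = toy_breeds[['Energy Level', 'Barking Level']].to_numpy()
scores = cosine_similarity(toy_user, breed_matrix)

# Attach the scores and sort so the best match comes first
toy_breeds['Similarity'] = scores[0]
ranked = toy_breeds[['Breed', 'Similarity']].sort_values(by='Similarity', ascending=False)
print(ranked)
```

Here the breed whose features point in the same direction as the user's preferences ranks first, the breed that shares only one of the two traits ranks second, and the breed with no overlap ranks last.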

##### Conclusion

By using cosine similarity, we created a content-based filtering recommender system that asks a user a series of questions and recommends dog breeds based on the answers. This fulfills the client's request for a system that can recommend the best dog for new dog owners based on a number of user preferences. It has become common for people to return their dogs, send them to animal shelters, or even abandon them when they realize they cannot care for or handle them. This is a very sad outcome that can be avoided with proper preparation before getting a pet. This content-based recommender system can help people make better decisions when choosing a dog and help ensure that they can provide a loving environment that suits both the pet and the owner.