Audience and Decision Context
Many people would love to get a dog someday, but doing so is an important and often life-changing decision that requires weighing lifestyle compatibility and financial responsibility. To inform that decision, our team analyzed data on dog benefits, factors of dog ownership, and dog breeds, and developed a web app that tells users whether they should get a dog and, if so, which breeds are most and least compatible with their lifestyle. Our stakeholders are therefore potential dog owners, their families and others affected by the dog's ownership, and the dogs themselves.
Our analysis and web app aim to make such a daunting decision easier. People who are thinking of adopting a dog may worry about whether they are ready to take on the responsibility, and some may not know how hard taking care of a dog can be. However, having a dog has its benefits too. Dogs can bring happiness and companionship, ease feelings of loneliness, encourage regular exercise through daily walks, and provide many other physical and mental health benefits. Potential dog owners need to weigh these benefits against the responsibilities that come with owning a dog. Additionally, the owner must be aware that different breeds of dogs require different living arrangements and lifestyles.
The data used for our breed suggestion functionality was scraped from DogTime, a website that has profile pages for many common dog breeds. Each breed profile has a list of 26 characteristics rated on a scale of 1 to 5, where 1 means that the characteristic is not representative of the breed and 5 means that it is very representative. These characteristics fall under five categories: adaptability, all-around friendliness, health and grooming, trainability, and exercise needs.
We wrote a Python script with BeautifulSoup that scrapes the profile page of every dog breed for its characteristics and ratings, as well as information on its life span. The results are then saved to a CSV file. Our script also checks for breeds with missing information and prints to the console what was missing.
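The completeness check can be sketched roughly as follows. This is a minimal illustration, not the project's scraper: it assumes each breed has already been parsed into a dict of ratings, and the names `EXPECTED`, `find_missing`, `write_csv`, and the sample data are hypothetical. Only three characteristic names stand in for the full list of 26.

```python
import csv
import io

# Hypothetical stand-in for the full list of 26 characteristic names.
EXPECTED = ["Placeholder Trait A", "Placeholder Trait B", "Tendency To Bark Or Howl"]

def find_missing(breeds, expected=EXPECTED):
    """Return {breed name: [missing characteristic names]} for incomplete breeds."""
    missing = {}
    for name, ratings in breeds.items():
        gaps = [c for c in expected if c not in ratings]
        if gaps:
            missing[name] = gaps
    return missing

def write_csv(breeds, out, expected=EXPECTED):
    """Write one row per complete breed; incomplete breeds are skipped."""
    incomplete = find_missing(breeds, expected)
    writer = csv.writer(out)
    writer.writerow(["Breed"] + list(expected))
    for name, ratings in breeds.items():
        if name not in incomplete:
            writer.writerow([name] + [ratings[c] for c in expected])

# Made-up ratings, for illustration only.
sample = {
    "Breed A": {"Placeholder Trait A": 3, "Placeholder Trait B": 4, "Tendency To Bark Or Howl": 2},
    "Breed B": {"Placeholder Trait A": 5, "Placeholder Trait B": 2},
}

# Report what is missing, as the script does on the console.
for breed, gaps in find_missing(sample).items():
    print(f"{breed} is missing: {', '.join(gaps)}")

# Write the complete breeds to an in-memory CSV.
buffer = io.StringIO()
write_csv(sample, buffer)
print(buffer.getvalue())
```

Writing to any file-like object keeps the sketch testable; passing an opened file instead of `io.StringIO` would produce the CSV on disk.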
200 out of 201 breeds were successfully scraped. The Korean Jindo Dog was missing the characteristic “Tendency To Bark Or Howl”, so we left it out of our analysis. Although "Life Span" could have been cleaned further by extracting, for example, a minimum life span, a maximum life span, and an average, we decided to disregard this piece of the data for the purposes of our research and web app. The resulting CSV was clean and ready for further analysis in R and implementation in a Shiny app.
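Had the life span field been needed, the cleaning step mentioned above might look something like this. The input format (`"10 to 12 years"`) is an assumption about how the site words the field, and `parse_life_span` is a hypothetical helper, not part of the project's script.

```python
import re

def parse_life_span(text):
    """Return (minimum, maximum, average) in years, or None if no number is found.

    Assumes the text looks like '10 to 12 years' (an assumed format).
    """
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    if not numbers:
        return None
    low, high = min(numbers), max(numbers)
    return low, high, (low + high) / 2

print(parse_life_span("10 to 12 years"))
```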
To create a weighting system for dog ownership factors, we used research studies and meta-analyses. We came up with 9 weights: emotional/mental health, physical health, allergies, being of old age, being a child, housing, living alone, living with others, and income. These predictive variables are used for the "Should you get a dog?" outcome, which is represented as a numeric score.
These studies were:
A meta-analysis by the Department of Psychology at Northern Arizona University of five studies showed AAA (animal-assisted activities) and AAT (animal-assisted therapy) improved depression. They report a statistically significant and medium aggregate effect size.
A study conducted by two researchers from Miami University and Saint Louis University found that pet owners had greater self-esteem, greater levels of exercise and physical fitness, and tended to be less lonely than non-owners. In Table 2, they list the mean differences between pet owners and non-owners on well-being, as well as other factors, but we only considered those that fell under the categories of emotional/mental health and physical health.
A meta-analysis from Purdue University reviewed dog ownership and physical activity across 17 studies between 1996 and 2010. Its results indicated that dog owners consistently had higher levels of physical activity than non-dog owners. The authors report a statistically significant, small to medium aggregate effect size. Their point estimate from a random-effects meta-analysis was a standardized mean difference of 0.26 between dog owners and non-dog owners.
A register-based, nationwide prospective cohort study (n = 3,432,153) with up to 12 years of follow-up investigated the association of dog ownership with incident cardiovascular disease (CVD) and death. Dogs may reduce cardiovascular risk by providing a non-human form of social support and increasing physical activity. Self-reported health and lifestyle habits were available for 34,202 participants in the Swedish Twin Register.
A study exploring the differences between pet owners and non-owners. The authors briefly review the research evidence, including the hypothesized mechanisms through which pet ownership may influence health outcomes, and examine how pet owners and non-owners differ across a variety of socio-demographic and health measures. This has implications for the proper interpretation of the many correlational studies that attempt to draw causal attributions.
A meta-analysis of determinants of pet ownership in 12 European birth cohorts on asthma and allergies. Its objective was to describe determinants of cat and dog ownership in European families with and without allergies. We used this study for both allergies and housing.
A study of pet ownership and blood pressure in old age. It has been proposed that pet ownership improves cardiovascular health; this study examines the relation of pet ownership with systolic and diastolic blood pressure, pulse pressure, mean arterial pressure, and hypertension in a large sample of older men and women.
A study exploring early exposure to dogs and farm animals and the risk of childhood asthma. The association between early exposure to animals and childhood asthma is not clear, and previous studies have yielded contradictory results. The data in this study support the hypothesis that exposure to dogs and farm animals during the first year of life reduces the risk of asthma in children at age 6 years.
For our breed selection functionality, we used an R script embedded in a Shiny app to show the top three and bottom three suggested dog breeds to users in real time. The list starts with the dogs that are easiest and hardest to take care of and is then filtered by user input. Each question that a user answers filters the dog data and sorts it by the traits stored in our dog data tables. These traits are rated on a scale of 1 to 5 and include, for example, how well a breed withstands extreme weather and how easily it is trained.
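One plausible reading of this filter-then-sort step is sketched below in Python (the project itself uses R). The function `filter_and_rank`, the trait keys, and the sample ratings are all hypothetical and only illustrate the idea.

```python
def filter_and_rank(breeds, min_ratings, sort_traits, n=3):
    """Keep breeds meeting every minimum rating, then rank by summed sort traits.

    Returns (top n, bottom n). With fewer than 2 * n breeds left, the two
    lists overlap.
    """
    kept = [
        b for b in breeds
        if all(b["traits"][trait] >= floor for trait, floor in min_ratings.items())
    ]
    ranked = sorted(
        kept,
        key=lambda b: sum(b["traits"][trait] for trait in sort_traits),
        reverse=True,
    )
    return ranked[:n], ranked[-n:]

# Made-up breeds and ratings on the 1-5 scale.
breeds = [
    {"name": "Breed A", "traits": {"trainability": 5, "weather_tolerance": 2}},
    {"name": "Breed B", "traits": {"trainability": 3, "weather_tolerance": 4}},
    {"name": "Breed C", "traits": {"trainability": 1, "weather_tolerance": 5}},
    {"name": "Breed D", "traits": {"trainability": 4, "weather_tolerance": 4}},
]

# A user in a harsh climate who cares most about trainability.
top, bottom = filter_and_rank(breeds, {"weather_tolerance": 3}, ["trainability"], n=1)
print([b["name"] for b in top], [b["name"] for b in bottom])
```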
For our suggestion functionality, which tells a user whether or not they should get a dog, we created a weighting system built around a common unit for the benefits and detriments of owning a dog. This was necessary because there was no direct way to compare, for example, the benefits of adopting a dog within a certain age range with the benefits of emotional support. To accomplish this, we had to look to outside resources for a way to compare so many different units. Initially, we thought that the t/z values of the meta-analyses we used would suffice, but these test statistics meant nothing without units.
Both R scripts can be found in this GitHub Repo - file path: project-jaakt/app/assets/scripts/
Method Choice Reasoning
To answer the question “Should You Get a Dog?”, we decided to generate an overall recommendation score and breed compatibility scores for a single user.
The decision to get a dog includes many factors and varies from person to person and from dog to dog. Because of this subjectivity, we decided that a functional form would be the most appropriate modeling option. We felt that other approaches would not adequately or realistically represent our decision context. We considered using decision theory, but since every dog ownership experience is different, it would be difficult and rather inaccurate to assign probabilities to certain outcomes. We also considered using a decision tree, but there isn’t an exact set of criteria or path that leads to dog ownership, so that option did not make sense either. With a weighted sum function, we are able to include the many factors that go into dog ownership and to combine, relate, and weight them in a way that reflects the reality of the decision context. We can also customize the results, since the factors included in the function can be modified based on user inputs.
Our intent was to create a function that takes in information about a specific user, evaluates the potential dog ownership benefits, costs, and compatibility factors of their inputs, and then outputs a recommendation score for whether or not that user should get a dog. In addition to this main “Should You Get a Dog?” Recommendation Score, we felt that showing which dog breeds are most and least compatible with the user would also be useful in informing the decision, because the decision context is more than a yes-or-no question: a person's suitability for dog ownership also depends heavily on the breed of dog. For this portion of our artifact, we also used a function to calculate and rank the breed compatibility scores.
General basis of the score functions:
Score = Factor1 * weight1 + Factor2 * weight2 + .... + FactorN * weightN
Multiple factors go into dog ownership and breed compatibility. Each factor is multiplied by a weight that signifies the importance of that factor, and the weighted factors are then added together to produce the score.
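The weighted sum can be sketched as a short Python function. The encoding of factors as 0/1 indicators and the name `weighted_score` are assumptions made for illustration. The three example weights are taken from the initial domain-knowledge weights listed under Process of Generating Weights.

```python
def weighted_score(factors, weights):
    """Return the sum of factor * weight over all supplied factors."""
    return sum(value * weights[name] for name, value in factors.items())

# Three of the initial weights from this document; the 0/1 encoding is assumed.
weights = {"emotional_health": 1.5, "allergies": -5.0, "housing_house": 1.0}

no_allergies = {"emotional_health": 1, "allergies": 0, "housing_house": 1}
with_allergies = {"emotional_health": 1, "allergies": 1, "housing_house": 1}

print(weighted_score(no_allergies, weights))    # 2.5
print(weighted_score(with_allergies, weights))  # -2.5
```

Keeping the weights in a dict makes it easy to add or drop factors based on user input without changing the function.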
Process of Generating Weights
1. Decided on factors related to dog ownership
2. Assigned a weight to each factor based on domain knowledge and initial research
- Emotional/Mental Health: +1.5
- Physical Health: +1.5
- Allergies: -5
- Age (old): +2
- Age (child): 0
- Housing (house): +1
- Prior Owner (yes): +1
- Income (significantly below avg cost of dog): -5
- Income (below avg cost of dog): -3
- Income (significantly above avg cost of dog): +5
- Income (above avg cost of dog): +3
Second Approach (Meta-meta Analysis)
1. Through our research, we consolidated a list of outcome effects from various meta-analyses
2. Converted all outcome effects to a common unit (Cohen's d)
- HR = OR → Cohen's d
- We assumed that an HR (hazard ratio) can be treated as an OR (odds ratio), then converted it to Cohen's d
- OR → Cohen's d
- t-value (df) → Cohen's d
- OR → Cohen's d
3. Categorized each outcome effect (row in table) under a dog ownership factor
For example, dog ownership benefits for old people and dog ownership benefits for children both belong to the Age factor.
4. Decided on the value coefficient for each factor
These are proportionate to our initial weighting approach (based on domain knowledge)
- Allergies: 100
- Health: 50
- Income: 100
- Age: 60
- Housing: 30
- Prior Ownership: 30
5. Calculated the weights for each outcome effect (row in table)
Weight = Outcome effect * SD * Value Coefficient
Assumed a standard deviation of 1 (SD = 1)
6. Calculated a single weight for each factor
By averaging weights of rows that belong to that factor
- Emotional/Mental Benefit: +17.35419784
- Physical Health Benefit: +7.229100643
- Allergies (yes): -5.81
- Age (old): 15.816
- Age (child): 3.486
- Housing (house): 9.71
- Family (alone): 5.34
- Family (not alone): 3.845
- Income (below avg cost of dog): -27.25
For more details see 'WEIGHTS' file in JAAKT's Google Team Drive
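Steps 2, 5, and 6 can be sketched as follows. The conversion formulas are the standard textbook approximations (d = ln(OR) · √3 / π for odds ratios, and d = 2t / √df for a t statistic from two groups of similar size); the document does not say which formulas the team used, so treat them as assumptions. All function names are hypothetical.

```python
import math
from statistics import mean

def odds_ratio_to_d(odds_ratio):
    """Convert an odds ratio to Cohen's d using d = ln(OR) * sqrt(3) / pi."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi

def hazard_ratio_to_d(hazard_ratio):
    """Treat the hazard ratio as an odds ratio, as the text assumes."""
    return odds_ratio_to_d(hazard_ratio)

def t_to_d(t_value, df):
    """Convert a t statistic to Cohen's d using d = 2t / sqrt(df)."""
    return 2 * t_value / math.sqrt(df)

def row_weight(effect_d, value_coefficient, sd=1.0):
    """Step 5: Weight = Outcome effect * SD * Value Coefficient."""
    return effect_d * sd * value_coefficient

def factor_weight(rows):
    """Step 6: average the row weights belonging to one factor.

    rows is a list of (effect_d, value_coefficient) pairs.
    """
    return mean(row_weight(d, coefficient) for d, coefficient in rows)

# The physical-activity effect of 0.26 cited earlier, with the Health
# value coefficient of 50 from step 4.
print(row_weight(0.26, 50))

# Averaging two made-up rows for a single factor.
print(factor_weight([(0.26, 50), (0.10, 50)]))
```

The example does not attempt to reproduce the weights listed above, since the individual rows behind each factor are kept in the team's separate weights file.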
As discussed in our assumptions and limitations, we later realized that using 1 for the standard deviation was incorrect. To fix this, we went back to the studies and took the upper and lower bounds of the 95% confidence intervals. We converted these bounds to Cohen's d with the same method and added them to our weights. We used these values in the updated score that is presented to the user, so the score now has a lower and an upper bound. We believe that producing the score as a range is a more accurate and realistic representation than having the score be a single value.
Here are our final weights:
| Factor | Lower bound | Estimate | Upper bound |
|---|---|---|---|
| Physical Health Benefit | 6.928548586 | 8.432225643 | 9.846107687 |
| Family (not alone) | 2.7525 | 3.845 | 4.825 |
| Income (below avg cost of dog) | 19.66 | 27.25 | 33.97 |
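A score with a lower and upper bound can be sketched by carrying three weights per factor through the weighted sum. This is only an illustration: summing the bounds factor by factor is a simplifying assumption, the 0/1 factor encoding is assumed, and `score_range` is a hypothetical name. Two rows from the table above are used; the income row is left out because its sign convention is not stated here.

```python
# (lower, estimate, upper) weights copied from two rows of the table above.
FINAL_WEIGHTS = {
    "physical_health": (6.928548586, 8.432225643, 9.846107687),
    "family_not_alone": (2.7525, 3.845, 4.825),
}

def score_range(factors, weights=FINAL_WEIGHTS):
    """Return (lower, estimate, upper) scores for non-negative factor values."""
    totals = [0.0, 0.0, 0.0]
    for name, value in factors.items():
        for i, weight in enumerate(weights[name]):
            totals[i] += value * weight
    return tuple(totals)

# A user who wants physical health benefits and lives with others.
lower, estimate, upper = score_range({"physical_health": 1, "family_not_alone": 1})
print(f"{lower:.2f} to {upper:.2f} (estimate {estimate:.2f})")
```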
Breed Compatibility Scores
For our breed selection, we used vectorized functions for our calculations. First, we took the scraped data and created a matrix with one row per breed and one column per characteristic rating, giving a 200x26 matrix. Then we converted the user's answers into a 26x1 column vector of weights that changes dynamically with the inputs. Multiplying the breed matrix by the weight vector, (200x26) times (26x1), produces a 200x1 vector that holds the score for each breed. We then take the top three and bottom three scores and display them to the user. Using matrix multiplication instead of a loop gives us a faster calculation, which allows us to present breeds reactively.
For more details see 'Breed Weight Matrix' file in JAAKT's Google Team Drive
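The matrix-vector product behind the breed scores can be sketched in plain Python. The project does this with vectorized R; the explicit sums here only show what the product computes, and in NumPy the same step would be `breed_matrix @ weight_vector`. The function names, the tiny 4x3 matrix, and the weights are made up for illustration.

```python
def breed_scores(breed_matrix, weight_vector):
    """Multiply an (n_breeds x n_traits) matrix by an n_traits weight vector."""
    if any(len(row) != len(weight_vector) for row in breed_matrix):
        raise ValueError("every breed row needs one rating per weight")
    return [sum(r * w for r, w in zip(row, weight_vector)) for row in breed_matrix]

def rank_breeds(names, scores, n=3):
    """Return the n highest-scoring and n lowest-scoring (name, score) pairs."""
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:n], ranked[-n:]

# Made-up 4 x 3 example standing in for the 200 x 26 breed matrix.
names = ["Breed A", "Breed B", "Breed C", "Breed D"]
breed_matrix = [
    [5, 1, 3],
    [2, 4, 4],
    [1, 1, 1],
    [3, 5, 2],
]
weight_vector = [1.0, 0.5, 2.0]

scores = breed_scores(breed_matrix, weight_vector)
best, worst = rank_breeds(names, scores, n=1)
print(scores)
print(best, worst)
```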
Our algorithm was able to recommend getting a dog based on a centralized score and to dynamically recommend dog breeds in real time. However, we are not certain of the accuracy of this function. For example, the Chow-Chow is almost always one of the bottom three suggested dogs in our output display. This may be because the breed is hard to take care of, but it also alienates someone who would inherently have a preference for a Chow-Chow.
In terms of the most valuable weights, we found that income, age, and desire for mental/emotional health benefits were three of the strongest predictors of whether someone should get a dog.
For our final artifact, we built an interactive Dog Recommender application using Shiny, that utilizes and demonstrates the findings of our analysis.
How it works:
- User answers a series of questions about themselves and their current living situation and preferences
- Each question is related to one or more of the dog ownership and/or breed compatibility factors
- Their information is then inputted into our functions
- Then the App recommends if the user should get a dog (based on their Recommendation Score)
- Zero-based threshold (positive = yes, negative = no)
- App also recommends the top and bottom three dog breeds for the user (based on the Breed Compatibility Score)
- With this gained insight, the user can better decide whether or not dog ownership is right for them
Web App is published to: https://asrinagesh.shinyapps.io/should_you_get_a_dog/
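The zero-based threshold from the list above amounts to a sign check. In this sketch, `recommend` is a hypothetical name, and treating a score of exactly zero as "no" is an assumption, since the text only defines positive and negative scores.

```python
def recommend(score):
    """Map a Recommendation Score to yes/no using a zero threshold."""
    # Assumption: a score of exactly 0 is treated as "no".
    return "yes" if score > 0 else "no"

print(recommend(2.5))
print(recommend(-2.5))
```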
Assumptions & Limitations
Our analysis was challenging because our data on the benefits of dogs and the factors affecting dog ownership is inherently qualitative. Much of it was aggregated data and factoids, so it was difficult to encompass all the factors and weigh them properly. We assigned weights based on research studies or other sources that used data to justify the statistical difference between dog owners and non-dog owners. This information was limited, especially for factors such as how income influences whether or not to get a dog. For these, we based our weights on other sources that were not supported by research but were estimates made by professionals. Other factors in comparing dog ownership had no data to justify a weight value.
Another limitation is that even though our decision context can be informed with data, it is very subjective. One person’s experience with a dog will differ depending on a multitude of personal factors that our analysis does not cover. We attempted to use the most important personal factors in deciding whether or not to get a dog, but these vary from person to person, and each person would weight them differently. Our algorithms provide only a general recommendation.
Our limited statistical domain knowledge was another limitation. Our original weights were defined as the product of Cohen's d and a weight. We believed that this product would be a good weight because it combined both our intuition and the meta-analyses. However, after some guidance, the flaw became apparent. Effectively, our final weight was an "effect size per utility unit", which is not intuitive and does not make much sense to compare. We did not have enough knowledge of statistics to improve our study. Instead, we added distributions to our weights and included them in our report. This allows the user to see a range of scores rather than one fixed number. We assumed that these ranges could be averaged, so we took the average of the confidence intervals' lower and upper bounds for each category.
Potential Future Work
If more research is conducted, perhaps on which dog breeds are most compatible with different personality types, or if other data becomes available that lets us more closely encompass all factors of dog ownership, we would be able to provide more accurate and personalized results. Meta-analyses comparing dog owners and non-dog owners in relation to certain traits or living conditions are necessary to justify the weights we use in our algorithms.
To improve our recommendation algorithm, we would need further research on the benefits and detriments of owning a dog. We believe that while we selected some of the most important features, such as age, allergies, and living condition, numerous other factors have a high influence on an individual's life when owning a dog. These need to be identified and statistically analyzed for our algorithm to be more accurate. As long as there is a statistically significant difference between dog owners and non-dog owners in relation to a single factor, that factor can be integrated into our algorithm. However, we would only aim to use factors that would have high weights in deciding whether or not to own a dog.
To improve dog breed suggestion, we would need to either find or collect additional data on all the dog breeds analyzed in our project. By discovering additional personality and behavior trends in different breeds we can incorporate more questions into our survey module in the web application so breed recommendations are more accurate.
Greater sampling and testing of our survey would also make our algorithms stronger by letting us adjust our weights to be more accurate. Conducting several pilot studies prior to a large-scale one would allow us to narrow down problems with our algorithm and increase accuracy. Even so, the goal of our application would remain to suggest a dog breed and ownership decision based on factual and scientific reasons. Personal preference would ideally play a role in a person selecting a dog breed, but the point of our application should be to suggest a breed that fits best with their lifestyle rather than one that they personally find most attractive, to make sure that the dog will be properly cared for. Preference should play a role in the algorithm in the future, but perhaps should be weighted less than other more important factors (unless research proves that a dog will be cared for better if its breed is more greatly preferred by its owner).
A more detailed distribution of results could also be generated from data on the variety of dog ownership experiences among people who give the same answers to our survey. This would let us gauge the survey's effectiveness and identify potential issues to improve in our recommendations.