Analysis questions addressed in this notebook: 
    1. What times (months, weekdays, hour) do users complete the tests? 
    2. What are the geographical locations where most users live? What times do US users tend to complete tests?
    3. Do certain breeds of dogs complete more tests than others? 
    4. Are DNA-tested dogs or neutered dogs more likely to complete more tests? 
    
Since the database is not locally accessible, the analyses in this notebook is restricted to SQL. For an analysis using a combination of Python packages, SQL queries and Tableau charts, see "Dognition:User Retention."  

In [1]:
%load_ext sql
%sql mysql://studentuser:studentpw@localhost/dognitiondb
%sql USE dognitiondb

 * mysql://studentuser:***@localhost/dognitiondb
0 rows affected.


[]

In each months, how many tests do users complete and which tests are they? Exclude the dogs that show abnormalities from earlier investigation (dogs that are not test accounts, dogs that do not have null ID, dogs that do not have an 'exclude' mark).

In [22]:
%%sql 
SELECT COUNT(cleaned_tests.created_at) AS completed_tests, 
MONTH(cleaned_tests.created_at) AS month_n, cleaned_tests.test_name
FROM 
(SELECT c.created_at, c.test_name, d.dog_guid
FROM complete_tests c LEFT JOIN dogs d ON c.dog_guid=d.dog_guid 
WHERE (d.dog_guid NOT IN 
    (SELECT DISTINCT(dog_guid) FROM dogs 
     WHERE (weight=190 AND breed='Shih Tzu') OR exclude=1)
    AND c.dog_guid IS NOT NULL)) AS cleaned_tests 
GROUP BY MONTH(cleaned_tests.created_at), cleaned_tests.test_name 
ORDER BY MONTH(cleaned_tests.created_at)

 * mysql://studentuser:***@localhost/dognitiondb
480 rows affected.


completed_tests,month_n,test_name
24,1,1 vs 1 Game
33,1,3 vs 1 Game
56,1,5 vs 1 Game
586,1,Arm Pointing
418,1,Cover Your Eyes
328,1,Delayed Cup Game
2,1,Different Perspective
6,1,Expression Game
594,1,Eye Contact Game
676,1,Eye Contact Warm-up


In [21]:
%%sql 
SELECT COUNT(cleaned_tests.created_at) AS completed_tests, 
CASE WHEN WEEKDAY(cleaned_tests.created_at)=0 THEN 'Monday'
    WHEN WEEKDAY(cleaned_tests.created_at)=1 THEN 'Tuesday'
    WHEN WEEKDAY(cleaned_tests.created_at)=2 THEN 'Wednesday'
    WHEN WEEKDAY(cleaned_tests.created_at)=3 THEN 'Thursday'
    WHEN WEEKDAY(cleaned_tests.created_at)=4 THEN 'Friday'
    WHEN WEEKDAY(cleaned_tests.created_at)=5 THEN 'Saturday'
    WHEN WEEKDAY(cleaned_tests.created_at)=6 THEN 'Sunday' END AS Weekday,
cleaned_tests.test_name
FROM 
(SELECT c.created_at, c.test_name, d.dog_guid
FROM complete_tests c LEFT JOIN dogs d ON c.dog_guid=d.dog_guid 
WHERE d.dog_guid NOT IN 
    (SELECT DISTINCT(dog_guid) FROM dogs 
     WHERE (weight=190 AND breed='Shih Tzu') OR exclude=1)
    AND c.dog_guid IS NOT NULL) AS cleaned_tests 
GROUP BY WEEKDAY(cleaned_tests.created_at), cleaned_tests.test_name 
ORDER BY WEEKDAY(cleaned_tests.created_at)

 * mysql://studentuser:***@localhost/dognitiondb
280 rows affected.


completed_tests,Weekday,test_name
28,Monday,1 vs 1 Game
34,Monday,3 vs 1 Game
65,Monday,5 vs 1 Game
1672,Monday,Arm Pointing
979,Monday,Cover Your Eyes
642,Monday,Delayed Cup Game
12,Monday,Different Perspective
12,Monday,Expression Game
2351,Monday,Eye Contact Game
2700,Monday,Eye Contact Warm-up


In [68]:
%%sql 
# Where do most of Dognition's users live? 
SELECT COUNT(DISTINCT user_guid), country
FROM users 
GROUP BY country 
ORDER BY COUNT(user_guid)

 * mysql://studentuser:***@localhost/dognitiondb
70 rows affected.


COUNT(DISTINCT user_guid),country
1,TR
1,HR
1,BS
1,LA
1,DO
1,TT
1,LT
1,UA
1,LV
1,ME


It looks like US has the most customers. 
Looking at US customers, when in the day do they complete the most tests? 

In [3]:
%%sql 
# Note that the time zones of U.S. users are different. For the numbers by 
# adjusted time zones, see the notebook on user retention. 

SELECT COUNT(c.created_at) AS count, HOUR(c.created_at) AS hour_n 
FROM complete_tests c 
LEFT JOIN dogs d ON c.dog_guid=d.dog_guid 
LEFT JOIN 
    (SELECT DISTINCT(user_guid) FROM users 
    WHERE NOT exclude =1 
    AND country='US'
    AND user_guid NOT IN ('ce2258a6-7144-11e5-ba71-058fbc01cf0b','ce225842-7144-11e5-ba71-058fbc01cf0b')) 
    AS us_users 
    ON us_users.user_guid=d.user_guid
WHERE (d.dog_guid NOT IN 
    (SELECT DISTINCT(dog_guid) FROM dogs 
     WHERE (weight=190 AND breed='Shih Tzu') OR exclude=1)
    AND c.dog_guid IS NOT NULL) 
GROUP BY HOUR(c.created_at) 
ORDER BY HOUR(c.created_at);

 * mysql://studentuser:***@localhost/dognitiondb
24 rows affected.


count,hour_n
17263,0
16278,1
12544,2
8412,3
5068,4
2468,5
1433,6
1102,7
924,8
1088,9


In [3]:
%%sql 
SELECT COUNT(c.test_name) AS count_test, COUNT(DISTINCT dog_breeds.dog_guid) AS count_dogs, 
COUNT(c.test_name)/COUNT(DISTINCT dog_breeds.dog_guid) AS avg_n_tests,
dog_breeds.breed_group AS breed_group 
FROM complete_tests c
JOIN 
    (SELECT DISTINCT dog_guid, breed_group
     FROM dogs) AS dog_breeds
    ON dog_breeds.dog_guid=c.dog_guid 
WHERE (dog_breeds.dog_guid NOT IN 
    (SELECT DISTINCT(dog_guid) FROM dogs 
     WHERE (weight=190 AND breed='Shih Tzu') OR exclude=1)
    AND c.dog_guid IS NOT NULL) 
    AND breed_group IS NOT NULL 
    AND NOT breed_group = ''
GROUP BY breed_group 
ORDER BY count_test DESC, avg_n_tests DESC

 * mysql://studentuser:***@localhost/dognitiondb
7 rows affected.


count_test,count_dogs,avg_n_tests,breed_group
27149,2470,10.9915,Sporting
19952,1774,11.2469,Herding
9659,964,10.0197,Non-Sporting
9024,1025,8.8039,Toy
8854,865,10.2358,Working
7748,780,9.9333,Terrier
5674,564,10.0603,Hound


In [7]:
%%sql 
SELECT COUNT(c.created_at) AS count_test, COUNT(dna_fixed.dog_guid) AS n_dogs,
 COUNT(c.created_at)/COUNT(dna_fixed.dog_guid) AS avg_test_p_dog,
dna_fixed.dna_tested AS DNA_tested 
FROM complete_tests c
JOIN 
    (SELECT DISTINCT dog_guid, dna_tested, dog_fixed 
     FROM dogs 
    WHERE NOT ((weight=190 AND breed='Shih Tzu') OR exclude=1)
    AND dog_guid IS NOT NULL) AS dna_fixed 
    ON c.dog_guid = dna_fixed.dog_guid 
GROUP BY dna_fixed.dna_tested
ORDER BY count_test DESC

 * mysql://studentuser:***@localhost/dognitiondb
2 rows affected.


count_test,n_dogs,avg_test_p_dog,DNA_tested
3205,3205,1.0,0
498,498,1.0,1


In [6]:
%%sql 
SELECT COUNT(c.created_at) AS count_tests, COUNT(DISTINCT dna_fixed.dog_guid) AS count_dogs,
COUNT(c.created_at)/COUNT(DISTINCT dna_fixed.dog_guid) AS avg_tests_p_dog,
dna_fixed.dog_fixed AS neutered
FROM complete_tests c
JOIN 
    (SELECT DISTINCT dog_guid, dna_tested, dog_fixed 
     FROM dogs 
    WHERE NOT ((weight=190 AND breed='Shih Tzu') OR exclude=1)
    AND dog_guid IS NOT NULL) AS dna_fixed 
    ON c.dog_guid = dna_fixed.dog_guid 
GROUP BY neutered
ORDER BY count_tests DESC

 * mysql://studentuser:***@localhost/dognitiondb
2 rows affected.


count_tests,count_dogs,avg_tests_p_dog,neutered
3249,137,23.7153,1
454,20,22.7,0


As can be seen, the datapoints regarding DNA tested or neutered are relatively small in comparison with the whole. There are no distinction between the average number of tests completed by dogs in these two groups, but there are more dogs that are neutered or not DNA-tested. 
