# Cross Device Users

### Introduction

We have two tables. One table has all mobile actions, i.e. all visits by the users on mobile. The other table has all web actions, i.e. all page visits on web by the users.

> We can begin by upgrading our version of sqlite.

In [1]:
from time import process_time
start_time = process_time()
import subprocess
try:
    import google.colab # on Colab, install pysqlite3-binary to get a newer SQLite
    subprocess.run(['pip', 'install', 'pysqlite3-binary'], capture_output=False)
    import pysqlite3 as sqlite3

except ModuleNotFoundError:
    import sqlite3 # outside Colab, fall back to the standard-library module
end_time = process_time()

import warnings
warnings.simplefilter(action='ignore', category=UserWarning)
import pandas as pd

## Loading our Data

In [2]:
mobile_users = pd.read_csv('https://raw.githubusercontent.com/data-eng-10-21/sql-interview-questions/main/1-mobile-web-cross-users/query/query_two_mobile.csv')
web_users = pd.read_csv('https://raw.githubusercontent.com/data-eng-10-21/sql-interview-questions/main/1-mobile-web-cross-users/query/query_two_web.csv')

Next, we load both DataFrames into a SQLite database, one table each.

In [3]:
conn = sqlite3.connect('users.db')

mobile_users.to_sql('mobile_users', conn, index = False, if_exists = 'replace')
web_users.to_sql('web_users', conn, index = False, if_exists = 'replace')

2021

### Exploring our Data

Our database now has a `web_users` table and a `mobile_users` table. Each row is a single page visit, so a `user_id` can appear more than once within a table.

In [7]:
pd.read_sql("select * from web_users limit 10", conn)

|  | user_id | page |
| --- | --- | --- |
| 0 | 1210 | page_1_web |
| 1 | 1275 | page_1_web |
| 2 | 1283 | page_4_web |
| 3 | 1163 | page_4_web |
| 4 | 96 | page_2_web |
| 5 | 2000 | page_5_web |
| 6 | 908 | page_1_web |
| 7 | 180 | page_1_web |
| 8 | 361 | page_7_web |
| 9 | 333 | page_2_web |


In [9]:
pd.read_sql("select * from mobile_users order by user_id limit 10", conn)

|  | user_id | page |
| --- | --- | --- |
| 0 | 1 | page_3_mobile |
| 1 | 1 | page_2_mobile |
| 2 | 2 | page_7_mobile |
| 3 | 4 | page_2_mobile |
| 4 | 5 | page_3_mobile |
| 5 | 7 | page_4_mobile |
| 6 | 7 | page_4_mobile |
| 7 | 8 | page_3_mobile |
| 8 | 9 | page_4_mobile |
| 9 | 9 | page_8_mobile |
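
The sample rows above show the same `user_id` on several lines (users 1, 7 and 9 each appear twice in `mobile_users`), so it can help to compare the row count with the distinct-user count before joining anything. A minimal sketch on a small in-memory database; the toy rows and the `demo` connection are assumptions for illustration, not the notebook's CSV data:

```python
import sqlite3

# Hypothetical toy data: one row per page visit, so user_id values repeat.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE mobile_users (user_id INTEGER, page TEXT);
    INSERT INTO mobile_users VALUES
        (1, 'page_3_mobile'), (1, 'page_2_mobile'),
        (2, 'page_7_mobile'),
        (7, 'page_4_mobile'), (7, 'page_4_mobile');
""")

# COUNT(*) counts visits; COUNT(DISTINCT user_id) counts users.
visit_rows, distinct_users = demo.execute(
    "SELECT COUNT(*), COUNT(DISTINCT user_id) FROM mobile_users"
).fetchone()

print(visit_rows, distinct_users)  # 5 visit rows, 3 distinct users
```

If the two numbers differ, any join on `user_id` will repeat users unless the tables are de-duplicated first.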


### Finding Web Only

Find the percentage of *web users* who are not mobile users.  
> That is, find the percentage of users who have visited the website one or more times but have never visited on mobile.

> You should find that 23% of the web users are not mobile users.

> **Hint**: to display the result as a percentage, multiply the ratio by 100.

In [44]:
query = """
select
  count(distinct w.user_id)
from
  web_users w

where
  w.user_id not in (select user_id from mobile_users)
"""

query2 = """
select count(*)
from
  web_users w
full join
  mobile_users u on w.user_id = u.user_id
"""

query3 = """
SELECT count(distinct w.user_id) + count(distinct m.user_id) as total_users,
  COUNT(m.user_id) as mobile_user_count,
  count(*) - COUNT(m.user_id) as no_mobile,
  ((count(*) - COUNT(m.user_id)) * 100  / count(*))
FROM web_users w
LEFT JOIN mobile_users m ON w.user_id = m.user_id
"""

# A left outer join keeps every web user and attaches the mobile rows
# that match. Where there is no match, the mobile columns
# are NULL.
pd.read_sql(query3, conn)

# 	web_but_not_mobile
# 0	23

|  | total_users | mobile_user_count | no_mobile | `((count(*) - COUNT(m.user_id)) * 100 / count(*))` |
| --- | --- | --- | --- | --- |
| 0 | 2218 | 3038 | 456 | 13 |
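
The join in `query3` runs over raw visit rows, so a user with several web and mobile visits is counted once per matching pair. That is a likely reason the result (13) differs from the expected 23. One way around this, sketched here on toy data, is to reduce each table to distinct users before joining. The `demo` connection, the sample rows and the column aliases are assumptions for illustration only.

```python
import sqlite3

# Hypothetical toy data: one row per page visit, so user_id values repeat.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE web_users (user_id INTEGER, page TEXT);
    CREATE TABLE mobile_users (user_id INTEGER, page TEXT);
    INSERT INTO web_users VALUES
        (1, 'page_1_web'), (1, 'page_2_web'), (2, 'page_1_web'),
        (3, 'page_4_web'), (4, 'page_2_web');
    INSERT INTO mobile_users VALUES
        (1, 'page_3_mobile'), (1, 'page_2_mobile'),
        (2, 'page_7_mobile'), (5, 'page_4_mobile');
""")

web_only_pct_query = """
WITH web AS (SELECT DISTINCT user_id FROM web_users),
     mobile AS (SELECT DISTINCT user_id FROM mobile_users)
SELECT
    COUNT(*) AS web_user_count,
    -- m.user_id IS NULL evaluates to 1 when the web user has no mobile row
    SUM(m.user_id IS NULL) AS web_only_count,
    100.0 * SUM(m.user_id IS NULL) / COUNT(*) AS web_only_pct
FROM web AS w
LEFT JOIN mobile AS m ON m.user_id = w.user_id
"""

web_user_count, web_only_count, web_only_pct = demo.execute(
    web_only_pct_query
).fetchone()

print(web_user_count, web_only_count, web_only_pct)  # 4 2 50.0
```

Because both CTEs hold one row per user, the left join yields exactly one row per web user, and the NULL check on the mobile side identifies the web-only users.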


### Finding Mobile Only

In [None]:
q = """
WITH mobile_u AS (
    -- one row per distinct mobile user
    SELECT DISTINCT user_id FROM mobile_users
)
SELECT
    COUNT(*) AS mobile_user_count
FROM mobile_u;
"""

In [58]:
query4 = '''
with web_only as (select distinct * from web_users)
select
  (count(distinct w.user_id) + count(distinct m.user_id)) as total_users,
  count(distinct m.user_id) as mobile_user_count,
  (count(distinct w.user_id)) as web_user_count,
  sum(case when m.user_id is null then 1 else 0 end) as persons_without_mobile
from
  web_only as w
left outer join
  mobile_users as m on m.user_id = w.user_id
'''

query = """
select count(distinct w.user_id) as web,
count(distinct m.user_id) as mobile,
count(distinct w.user_id) - count(distinct m.user_id) as w_n_m
from web_users as w
left join mobile_users as m
  on w.user_id = m.user_id

"""

# A left outer join keeps every web user and attaches the mobile rows
# that match. Where there is no match, the mobile columns
# are NULL.
pd.read_sql(query, conn)

# 	web_but_not_mobile
# 0	23

|  | web | mobile | w_n_m |
| --- | --- | --- | --- |
| 0 | 1256 | 962 | 294 |


Now find the percentage of mobile users who are not web users. Try to come up with the solution without referring back to your query above.

> We should find that 37% of mobile users are not web users.

In [60]:
query = """
select count(distinct w.user_id) as web,
count(distinct m.user_id) as mobile,
count(distinct m.user_id) - count(distinct w.user_id) as m_n_w
from mobile_users as m
left join web_users as w
  on w.user_id = m.user_id
"""

pd.read_sql(query, conn)

# 	mobile_not_web
# 0	37

|  | web | mobile | m_n_w |
| --- | --- | --- | --- |
| 0 | 962 | 1539 | 577 |
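
As an alternative that does not reuse the join, the set operator `EXCEPT` returns the distinct `user_id` values in `mobile_users` that never appear in `web_users`. Dividing that count by the number of distinct mobile users gives the percentage directly. A sketch on toy data; the `demo` connection and its rows are hypothetical:

```python
import sqlite3

# Hypothetical toy data: one row per page visit, so user_id values repeat.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE web_users (user_id INTEGER, page TEXT);
    CREATE TABLE mobile_users (user_id INTEGER, page TEXT);
    INSERT INTO web_users VALUES
        (1, 'page_1_web'), (1, 'page_2_web'), (2, 'page_1_web'),
        (3, 'page_4_web'), (4, 'page_2_web');
    INSERT INTO mobile_users VALUES
        (1, 'page_3_mobile'), (1, 'page_2_mobile'), (2, 'page_7_mobile'),
        (5, 'page_4_mobile'), (6, 'page_8_mobile');
""")

mobile_only_pct_query = """
SELECT
    100.0 * (
        -- EXCEPT de-duplicates, so this counts users, not visits
        SELECT COUNT(*) FROM (
            SELECT user_id FROM mobile_users
            EXCEPT
            SELECT user_id FROM web_users
        )
    ) / (SELECT COUNT(DISTINCT user_id) FROM mobile_users) AS mobile_only_pct
"""

(mobile_only_pct,) = demo.execute(mobile_only_pct_query).fetchone()

print(mobile_only_pct)  # 50.0: users 5 and 6 out of mobile users 1, 2, 5, 6
```

Multiplying by `100.0` rather than `100` keeps the division in floating point; with integer operands SQLite truncates the quotient.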


### Cross device users

Write a query that returns the percentage of users who visited only on mobile, only on web, and on both. That is, the percentage of users who are only in the mobile table, only in the web table, and in both tables. The percentages should sum to 1 (that is, 100%), but 0.99 is fine too, since rounding each share can lose a point.

In [65]:
query = """
select count(distinct w.user_id) as web,
count(distinct m.user_id) as mobile,
count(distinct w.user_id) + count(distinct m.user_id) as total,
count(distinct w.user_id) - count(distinct m.user_id) as w_n_m
from web_users as w
left join mobile_users as m
  on w.user_id = m.user_id
"""
pd.read_sql(query, conn)

# WEB_ONLY	MOBILE_ONLY	BOTH
# 0	16	31	52

|  | web | mobile | total | w_n_m |
| --- | --- | --- | --- | --- |
| 0 | 1256 | 962 | 2218 | 294 |
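
The query above counts users per table but does not yet split them into the three groups. One possible approach, shown on toy data rather than the notebook's tables, builds the full set of users with `UNION` and then checks membership in each de-duplicated table. The `demo` connection, the sample rows and the aliases are assumptions for illustration.

```python
import sqlite3

# Hypothetical toy data: one row per page visit, so user_id values repeat.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE web_users (user_id INTEGER, page TEXT);
    CREATE TABLE mobile_users (user_id INTEGER, page TEXT);
    INSERT INTO web_users VALUES
        (1, 'page_1_web'), (1, 'page_2_web'), (2, 'page_1_web'),
        (3, 'page_4_web');
    INSERT INTO mobile_users VALUES
        (1, 'page_3_mobile'), (2, 'page_7_mobile'), (2, 'page_2_mobile'),
        (4, 'page_4_mobile');
""")

cross_device_query = """
WITH web AS (SELECT DISTINCT user_id FROM web_users),
     mobile AS (SELECT DISTINCT user_id FROM mobile_users),
     -- UNION (without ALL) removes duplicates: one row per user overall
     all_users AS (SELECT user_id FROM web UNION SELECT user_id FROM mobile)
SELECT
    100.0 * SUM(m.user_id IS NULL) / COUNT(*) AS web_only_pct,
    100.0 * SUM(w.user_id IS NULL) / COUNT(*) AS mobile_only_pct,
    100.0 * SUM(w.user_id IS NOT NULL AND m.user_id IS NOT NULL) / COUNT(*) AS both_pct
FROM all_users AS u
LEFT JOIN web AS w ON w.user_id = u.user_id
LEFT JOIN mobile AS m ON m.user_id = u.user_id
"""

web_only_pct, mobile_only_pct, both_pct = demo.execute(
    cross_device_query
).fetchone()

print(web_only_pct, mobile_only_pct, both_pct)  # 25.0 25.0 50.0
```

Every user in `all_users` falls into exactly one of the three groups, so the shares sum to 100 before any rounding is applied.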


[subqueries](https://www.essentialsql.com/sql-subqueries/)