# Cross Device Users

### Introduction

We have two tables. One table has all mobile actions, i.e. all visits by the users on mobile. The other table has all web actions, i.e. all page visits on web by the users.

> We begin by upgrading our version of SQLite (this step only runs on Colab).

In [1]:
import subprocess
import warnings

try:
    import google.colab  # only importable when running on Colab
    # On Colab, install pysqlite3-binary to get a newer SQLite build
    subprocess.run(['pip', 'install', 'pysqlite3-binary'], capture_output=False)
    import pysqlite3 as sqlite3
except ModuleNotFoundError:
    # Anywhere else, fall back to the standard-library module
    import sqlite3

import pandas as pd

# Silence pandas UserWarnings raised when reading through the connection
warnings.simplefilter(action='ignore', category=UserWarning)
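
To confirm which SQLite build is actually in use after this setup, a quick check along these lines can help. It is a minimal sketch that assumes the standard-library `sqlite3` module; on Colab the `pysqlite3` import from the cell above would take its place. The names `conn_check` and `engine_version` are illustrative only.

```python
import sqlite3

# Version of the SQLite library that this Python module is linked against
print(sqlite3.sqlite_version)

# The same information, asked of the database engine itself
conn_check = sqlite3.connect(":memory:")  # throwaway in-memory connection
engine_version = conn_check.execute("select sqlite_version()").fetchone()[0]
conn_check.close()
print(engine_version)
```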

## Loading our Data

In [2]:
mobile_users = pd.read_csv('https://raw.githubusercontent.com/data-eng-10-21/sql-interview-questions/main/1-mobile-web-cross-users/query/query_two_mobile.csv')
web_users = pd.read_csv('https://raw.githubusercontent.com/data-eng-10-21/sql-interview-questions/main/1-mobile-web-cross-users/query/query_two_web.csv')

Next, we load both DataFrames into a SQLite database so that we can query them with SQL.

In [3]:
conn = sqlite3.connect('users.db')

mobile_users.to_sql('mobile_users', conn, index = False, if_exists = 'replace')
web_users.to_sql('web_users', conn, index = False, if_exists = 'replace')

2021

### Exploring our Data

In our database we have a `web_users` table and a `mobile_users` table. Each row records one page visit, so the same `user_id` can appear more than once in a table.

In [4]:
pd.read_sql("select * from web_users limit 3", conn)

Unnamed: 0,user_id,page
0,1210,page_1_web
1,1275,page_1_web
2,1283,page_4_web


In [5]:
pd.read_sql("select * from mobile_users limit 3", conn)

Unnamed: 0,user_id,page
0,128,page_5_mobile
1,1324,page_2_mobile
2,1343,page_6_mobile


### Finding Web Only

Find the percentage of *web users* who are not mobile users.  
> That is, find the percentage of users who have visited the website one or more times but have never visited on mobile.

> You should find that 23% of the web users are not mobile users.

> **Hint**: to display the result as a percentage, multiply the ratio by 100.
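
The order of operations matters here, because SQLite divides two integers with integer division. The sketch below illustrates this with the counts that the queries in this section return (1256 distinct web users, 962 of whom are also mobile users). The connection name `demo` and the variable names are illustrative only.

```python
import sqlite3

demo = sqlite3.connect(":memory:")

# Counts taken from the query outputs in this section
web, both = 1256, 962

# Dividing first truncates the ratio to 0 before the multiplication happens
divide_first = demo.execute(
    "select (? - ?) / ? * 100", (web, both, web)
).fetchone()[0]

# Multiplying first keeps the whole-number percentage
multiply_first = demo.execute(
    "select (? - ?) * 100 / ?", (web, both, web)
).fetchone()[0]

# Multiplying by 100.0 switches to floating-point division
as_float = demo.execute(
    "select (? - ?) * 100.0 / ?", (web, both, web)
).fetchone()[0]

demo.close()
print(divide_first, multiply_first, round(as_float, 1))  # 0 23 23.4
```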

In [8]:
query = """
select count(distinct user_id)
FROM web_users
"""

pd.read_sql(query, conn)

# 	web_but_not_mobile
# 0	23

Unnamed: 0,count(distinct user_id)
0,1256


In [13]:
query = """
select count(distinct m.user_id) as m_w_users
FROM web_users w
LEFT OUTER JOIN mobile_users m ON w.user_id = m.user_id
"""

pd.read_sql(query, conn)

Unnamed: 0,m_w_users
0,962


In [16]:
query = """
select
  count(distinct w.user_id) as web,
  count(distinct m.user_id) as m_w_users,
  ((count(distinct w.user_id) - count(distinct m.user_id)) * 100) / count(distinct w.user_id) as percent
FROM web_users w
LEFT OUTER JOIN mobile_users m ON w.user_id = m.user_id
"""

pd.read_sql(query, conn)

Unnamed: 0,web,m_w_users,percent
0,1256,962,23
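
The same percentage can also be computed as an anti-join, counting the web users for whom no matching mobile row exists. The following is a minimal sketch on a small, made-up dataset (the rows below are hypothetical and not taken from the CSV files), so the number it prints differs from the 23% above.

```python
import sqlite3

toy = sqlite3.connect(":memory:")
toy.executescript("""
create table web_users (user_id integer, page text);
create table mobile_users (user_id integer, page text);
-- hypothetical rows: user 1 visits two web pages
insert into web_users values (1, 'page_1_web'), (1, 'page_2_web'),
                             (2, 'page_1_web'), (3, 'page_4_web'),
                             (4, 'page_1_web');
insert into mobile_users values (1, 'page_5_mobile'), (2, 'page_2_mobile'),
                                (9, 'page_6_mobile');
""")

query = """
select
  (select count(distinct w.user_id)
   from web_users w
   where not exists (
     select 1 from mobile_users m where m.user_id = w.user_id
   )
  ) * 100 / count(distinct user_id) as percent
from web_users
"""

# Web users 3 and 4 never appear on mobile: 2 of 4 distinct web users
web_only_percent = toy.execute(query).fetchone()[0]
toy.close()
print(web_only_percent)  # 50
```

Compared with the left join, `not exists` states the "never visited on mobile" condition directly, at the cost of a nested subquery.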


### Finding Mobile Only

Now find the percentage of mobile users who are not web users.  Try not to reference your above query in coming up with the solution.

> We should find that 37% of mobile users are not web users.

In [20]:
query = """
select count(distinct m.user_id) as mobile
from mobile_users m
"""

pd.read_sql(query, conn)

# 	mobile_not_web
# 0	37

Unnamed: 0,mobile
0,1539


In [23]:
query = """
select count(distinct w.user_id) as w_m_users
from mobile_users m
LEFT OUTER JOIN web_users w on m.user_id = w.user_id
"""

pd.read_sql(query, conn)

Unnamed: 0,w_m_users
0,962


In [26]:
query = """
select
  count(distinct m.user_id) as mobile,
  count(distinct w.user_id) as w_m_users,
  ((count(distinct m.user_id) - count(distinct w.user_id)) * 100) / count(distinct m.user_id) as percent
from mobile_users m
LEFT OUTER JOIN web_users w on m.user_id = w.user_id
"""

pd.read_sql(query, conn)

Unnamed: 0,mobile,w_m_users,percent
0,1539,962,37
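
One way to approach this without reusing the join is SQL's `except` set operator, which returns the distinct `user_id` values found in the first table but not in the second. The sketch below runs on a small, made-up dataset (the rows are hypothetical), so its result differs from the 37% above.

```python
import sqlite3

toy = sqlite3.connect(":memory:")
toy.executescript("""
create table web_users (user_id integer, page text);
create table mobile_users (user_id integer, page text);
-- hypothetical rows: user 5 visits two mobile pages
insert into web_users values (1, 'page_1_web'), (2, 'page_1_web'),
                             (3, 'page_4_web');
insert into mobile_users values (1, 'page_5_mobile'), (2, 'page_2_mobile'),
                                (5, 'page_6_mobile'), (5, 'page_1_mobile'),
                                (6, 'page_1_mobile'), (7, 'page_3_mobile');
""")

query = """
select
  (select count(*) from (
     select user_id from mobile_users
     except
     select user_id from web_users
  )) * 100 / count(distinct user_id) as percent
from mobile_users
"""

# Mobile users 5, 6 and 7 never appear on web: 3 of 5 distinct mobile users
mobile_only_percent = toy.execute(query).fetchone()[0]
toy.close()
print(mobile_only_percent)  # 60
```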


### Cross device users

Write a query that returns the percentage of users who visited only mobile, only web, and both. That is, the percentage of users who are only in the mobile table, only in the web table, and in both tables. The percentages should sum to 100, but 99 is OK too, since integer division truncates each value.

In [30]:
query = """
select count(distinct m.user_id), count(distinct w.user_id)
from web_users w
left outer join mobile_users m on w.user_id = m.user_id
"""
pd.read_sql(query, conn)

# WEB_ONLY	MOBILE_ONLY	BOTH
# 0	16	31	52

Unnamed: 0,count(distinct m.user_id),count(distinct w.user_id)
0,962,1256
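
The query above returns the two counts but not yet the three percentages. One possible way to finish it, sketched here on a small, made-up dataset, is to build the full list of users with `union`, flag each user's presence in either table, and then aggregate the flags. The names `all_users`, `flags`, `on_web`, `on_mobile`, and `both_devices` are illustrative, and this is only one of several workable approaches.

```python
import sqlite3

toy = sqlite3.connect(":memory:")
toy.executescript("""
create table web_users (user_id integer, page text);
create table mobile_users (user_id integer, page text);
-- hypothetical rows, not taken from the CSV files
insert into web_users values (1, 'page_1_web'), (1, 'page_2_web'),
                             (2, 'page_1_web'), (3, 'page_4_web');
insert into mobile_users values (2, 'page_5_mobile'), (3, 'page_2_mobile'),
                                (4, 'page_6_mobile'), (5, 'page_1_mobile'),
                                (6, 'page_1_mobile'), (6, 'page_3_mobile');
""")

query = """
with all_users as (
  -- union removes duplicates, giving one row per distinct user
  select user_id from web_users
  union
  select user_id from mobile_users
),
flags as (
  -- "in" evaluates to 1 or 0, so each flag can be summed later
  select
    user_id,
    user_id in (select user_id from web_users) as on_web,
    user_id in (select user_id from mobile_users) as on_mobile
  from all_users
)
select
  sum(on_web and not on_mobile) * 100 / count(*) as web_only,
  sum(on_mobile and not on_web) * 100 / count(*) as mobile_only,
  sum(on_web and on_mobile) * 100 / count(*) as both_devices
from flags
"""

# 6 distinct users: 1 web only, 3 mobile only, 2 on both
web_only, mobile_only, both_devices = toy.execute(query).fetchone()
toy.close()
print(web_only, mobile_only, both_devices)  # 16 50 33
```

The toy result sums to 99 because each percentage is truncated by integer division. Applying the same arithmetic to the counts in this section (1256 web users, 1539 mobile users, 962 on both, so 1833 distinct users) gives 16, 31, and 52, which matches the expected output noted in the cell above.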


For background, see this guide to [subqueries](https://www.essentialsql.com/sql-subqueries/).