In this notebook, I import four CSV files (posts, posts2, users, and subreddits) into a SQLite database and run several SQL queries against them. I use the pandas library to read the CSV files and write them into the database, and the sqlite3 library to open the database connection.

Some of the advanced SQL skills I used include:

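* Using LEFT JOIN and INNER JOIN to combine data from multiple tables
* Using GROUP BY and COUNT() to aggregate data
* Using ORDER BY to sort data
* Using LIMIT to limit the number of rows returned
* Using MAX() to find the maximum value of a column
* Using UNION to stack the rows of two tables
* Using WITH to define a temporary table inside a query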
 
 
#### The queries I wrote allowed me to answer questions such as:
 
 
* Which user has the highest score?
* What post has the highest score?
* What are the top 5 subreddits with the highest subscriber count?
* How many posts has each user made?

## Setup

In [2]:
import pandas as pd 

In [3]:
import sqlite3

In [4]:
df_posts = pd.read_csv("posts.csv")

In [5]:
df_posts2 = pd.read_csv("posts2.csv")

In [6]:
df_users = pd.read_csv("users.csv")

In [7]:
df_subreddits = pd.read_csv("subreddits.csv")

In [8]:
cnn = sqlite3.connect('jupyter_sql.db')

In [9]:
df_posts.to_sql('posts',cnn, if_exists='replace')
df_posts2.to_sql('posts2',cnn, if_exists='replace')
df_users.to_sql('users',cnn, if_exists='replace')
df_subreddits.to_sql('subreddits',cnn, if_exists='replace')

In [10]:
%load_ext sql

In [11]:
%sql sqlite:///jupyter_sql.db

In [12]:
%%sql 

SELECT * 
FROM users
LIMIT 5;
 

 * sqlite:///jupyter_sql.db
Done.


index,id,username,email,join_date,score
0,1,sonnynomnom,mbosence0@ycombinator.com,14/05/2008,185713.0
1,2,coler1,kmonkhouse1@indiatimes.com,11/12/2011,136965.0
2,3,lauracle,rkilfether2@independent.co.uk,14/05/2011,277721.0
3,4,kassablanca,trigard3@stanford.edu,30/10/2006,143478.0
4,5,yakovkagan,treggio4@sciencedirect.com,06/06/2009,242023.0


In [13]:
%%sql

SELECT * 
FROM posts
LIMIT 5;


 * sqlite:///jupyter_sql.db
Done.


index,id,title,user_id,subreddit_id,score,created_date
0,1,Delivery drones are being attacked by hawks,89.0,15.0,40070.0,21/10/2015
1,2,What is the best programming language to learn in 2020?,90.0,1.0,9746.0,03/02/2013
2,3,First picture of a black hole has been taken,91.0,2.0,7367.0,27/11/2013
3,4,Scientists develop waterproof shoes,27.0,15.0,38476.0,29/12/2012
4,5,Running DOOM on a toaster,86.0,1.0,143728.0,22/10/2016


In [14]:
%%sql


SELECT * 
FROM subreddits
LIMIT 5;

 * sqlite:///jupyter_sql.db
Done.


index,id,name,created_date,subscriber_count
0,1,programming,28/02/2006,2717072
1,2,science,18/10/2006,24543061
2,3,funny,25/01/2008,14926599
3,4,gaming,25/01/2008,27061546
4,5,pics,25/01/2008,25239687


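### Information about the data

* users: one row per user, with `id`, `username`, `email`, `join_date`, and `score`
* posts: one row per post, with `id`, `title`, `user_id`, `subreddit_id`, `score`, and `created_date`
* posts2: newer posts, with the same columns as posts
* subreddits: one row per subreddit, with `id`, `name`, `created_date`, and `subscriber_count`

The extra `index` column in each table is the pandas DataFrame index, which `to_sql` writes by default.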

# Starting the code

### Write a query to count how many different subreddits there are.

In [15]:
%%sql
select count(*)
from subreddits

 * sqlite:///jupyter_sql.db
Done.


count(*)
20


### Write a few more queries to figure out the following information:

* What user has the highest score?

In [16]:
%%sql
select username, max(score)
from users;

 * sqlite:///jupyter_sql.db
Done.


username,max(score)
ctills1w,300895.0
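
A note on the query above: selecting `username` next to `max(score)` without a GROUP BY relies on SQLite's handling of bare columns, where the bare column is taken from the row that holds the maximum. Other databases may reject the query or return an arbitrary username. A minimal sketch of a more portable form follows, using a small made-up `users` table (the three rows are hypothetical, not the real data):

```python
import sqlite3

# Hypothetical three-row stand-in for the users table (not the real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, username TEXT, score REAL)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "alice", 120.0), (2, "bob", 300.0), (3, "carol", 250.0)],
)

# SQLite-specific: the bare column follows the row that holds max(score).
bare = conn.execute("SELECT username, max(score) FROM users").fetchone()

# Portable form: sort by score and keep only the first row.
portable = conn.execute(
    "SELECT username, score FROM users ORDER BY score DESC LIMIT 1"
).fetchone()

print(bare, portable)
conn.close()
```

Both queries return the same row here; the ORDER BY ... LIMIT 1 form should also behave the same way in most other SQL engines.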


* What post has the highest score?

In [17]:
%%sql
select title, max(score)
from posts;

 * sqlite:///jupyter_sql.db
Done.


title,max(score)
Picture of a kitten,149176.0


* What are the top 5 subreddits with the highest subscriber_count?

In [18]:
%%sql
select name, subscriber_count
from subreddits
order by 2 desc
limit 5;

 * sqlite:///jupyter_sql.db
Done.


name,subscriber_count
AskReddit,28837356
gaming,27061546
aww,25653577
pics,25239687
science,24543061


Now let’s join the data from the different tables to find out some more information.

Use a LEFT JOIN with the users and posts tables to find out how many posts each user has made. Have the users table as the left table and order the data by the number of posts in descending order.

In [26]:
%%sql
select username, count(posts.id) as 'Number of posts'
from users
left join posts
on users.id = posts.user_id
group by users.id
order by count(posts.id) desc
limit 10;

 * sqlite:///jupyter_sql.db
Done.


username,Number of posts
nwealthall1t,7
lbenedetti2o,6
dsheaj,6
hassandri2d,5
jotaki,5
jreamesw,5
ldeshonq,5
dcarette2p,4
cambrozewicz2k,4
laskin2g,4
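
When counting over a LEFT JOIN, `count(*)` and `count(posts.id)` differ for users who have no posts. The LEFT JOIN keeps such a user as a single row with NULLs on the posts side, so `count(*)` reports 1 while `count(posts.id)` reports 0. The sketch below shows the difference on a tiny made-up dataset (the table contents are hypothetical):

```python
import sqlite3

# Hypothetical data: alice has two posts, bob has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, username TEXT);
    CREATE TABLE posts (id INTEGER, user_id INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (10, 1), (11, 1);
""")

query = """
    SELECT username,
           count(*)        AS all_rows,    -- counts the NULL-padded row too
           count(posts.id) AS post_count   -- ignores rows where posts.id is NULL
    FROM users
    LEFT JOIN posts ON users.id = posts.user_id
    GROUP BY users.id
    ORDER BY users.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

In the top-10 list above every user has at least four posts, so both counts agree there; the difference only shows up further down the ranking.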


Over time, posts may be removed and users might delete their accounts.

We only want to see existing posts where the users are still active, so use an INNER JOIN to write a query to get these posts. Have the posts table as the left table.

In [29]:
%%sql
select * 
from posts
join users 
on posts.user_id = users.id
limit 10;

 * sqlite:///jupyter_sql.db
Done.


index,id,title,user_id,subreddit_id,score,created_date,index_1,id_1,username,email,join_date,score_1
0,1,Delivery drones are being attacked by hawks,89.0,15.0,40070.0,21/10/2015,88,89,laskin2g,mmarley2g@deviantart.com,25/07/2011,268026.0
1,2,What is the best programming language to learn in 2020?,90.0,1.0,9746.0,03/02/2013,89,90,sciementini2h,lcrenshaw2h@issuu.com,09/04/2008,294170.0
2,3,First picture of a black hole has been taken,91.0,2.0,7367.0,27/11/2013,90,91,junuuki89,nsurgeoner2i@engadget.com,29/08/2013,171038.0
3,4,Scientists develop waterproof shoes,27.0,15.0,38476.0,29/12/2012,26,27,ldeshonq,bbazogeq@cbslocal.com,19/12/2006,167811.0
4,5,Running DOOM on a toaster,86.0,1.0,143728.0,22/10/2016,85,86,hassandri2d,sharvison2d@pagesperso-orange.fr,26/08/2010,256099.0
5,6,"As a kid, you're also watching your parents grow up",51.0,10.0,30249.0,22/07/2015,50,51,rneate1e,sseville1e@yandex.ru,12/08/2011,116273.0
6,7,Created some entertaining Christmas cards,64.0,3.0,18297.0,15/01/2012,63,64,kbrosini1r,jquested1r@army.mil,08/04/2007,139040.0
7,8,"I am Gill Bates, founder of Macrohard. Ask me Anything.",7.0,11.0,96731.0,28/04/2019,6,7,ttroctor6,tblowfield6@ucoz.ru,23/06/2007,111915.0
8,9,Someone reverse engineered Super Mario... with Minecraft command blocks.,17.0,1.0,9196.0,21/12/2015,16,17,jzimekg,jcorcutg@icio.us,14/05/2013,106783.0
9,10,What is the most exciting personal project you've worked on?,40.0,1.0,70951.0,09/12/2019,39,40,penguinDev,studyhard.swe.2020@gmail.com,24/05/2014,112328.0
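
To illustrate why the INNER JOIN filters out posts from deleted accounts, here is a minimal sketch with made-up rows (the tables and values are hypothetical). A post whose `user_id` has no match in `users` simply does not appear in the result:

```python
import sqlite3

# Hypothetical data: post 11 belongs to user 99, who no longer exists.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, username TEXT);
    CREATE TABLE posts (id INTEGER, title TEXT, user_id INTEGER);
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO posts VALUES (10, 'kept', 1), (11, 'orphan', 99);
""")

# Only posts with a matching user survive the inner join.
rows = conn.execute("""
    SELECT posts.title, users.username
    FROM posts
    INNER JOIN users ON posts.user_id = users.id
""").fetchall()
print(rows)
conn.close()
```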


Some new posts have been added to Reddit!

Stack the new posts2 table under the existing posts table to see them.

In [31]:
%%sql
select * 
from posts 
UNION 
select *
from posts2
limit 10;

 * sqlite:///jupyter_sql.db
Done.


index,id,title,user_id,subreddit_id,score,created_date
0,1,Delivery drones are being attacked by hawks,89.0,15.0,40070.0,21/10/2015
0,1,Engineers create a mech suit capable of lifting 1 ton,72.0,15.0,35211.0,21/06/2020
1,2,New Pokeman games Axe and Spear announced,23.0,4.0,25031.0,23/05/2020
1,2,What is the best programming language to learn in 2020?,90.0,1.0,9746.0,03/02/2013
2,3,First picture of a black hole has been taken,91.0,2.0,7367.0,27/11/2013
2,3,"If you could live in any fictional world, which one would it be?",4.0,12.0,37268.0,11/04/2020
3,4,Scientists develop waterproof shoes,27.0,15.0,38476.0,29/12/2012
3,4,Space elevator being developed,43.0,13.0,4275.0,26/04/2020
4,5,Man creates song using only rocks as the instruments,67.0,7.0,46117.0,19/07/2020
4,5,Running DOOM on a toaster,86.0,1.0,143728.0,22/10/2016
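
Note that UNION removes duplicate rows from the combined result, while UNION ALL keeps every row from both tables. The sketch below compares the two on a small made-up dataset (the rows are hypothetical) in which one row appears in both tables:

```python
import sqlite3

# Hypothetical data: the row (2, 'shared post') exists in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER, title TEXT);
    CREATE TABLE posts2 (id INTEGER, title TEXT);
    INSERT INTO posts VALUES (1, 'old post'), (2, 'shared post');
    INSERT INTO posts2 VALUES (1, 'new post'), (2, 'shared post');
""")

# UNION drops the duplicate row; UNION ALL keeps it.
union_rows = conn.execute(
    "SELECT * FROM posts UNION SELECT * FROM posts2"
).fetchall()
union_all_rows = conn.execute(
    "SELECT * FROM posts UNION ALL SELECT * FROM posts2"
).fetchall()

print(len(union_rows), len(union_all_rows))
conn.close()
```

If the goal is only to stack the two tables and duplicates are not a concern, UNION ALL is the more direct choice.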


## More advanced queries


Now you need to find out which subreddits have the most popular posts. We’ll say that a post is popular if it has a score of at least 5000. We’ll do this using a WITH and a JOIN.

First, you’ll need to create the temporary table that we’ll nest in the WITH clause by writing a query to select all the posts that have a score of at least 5000.

Next, place the previous query within a WITH clause, and alias this table as popular_posts.

Finally, utilize an INNER JOIN to join this table with the subreddits table, with subreddits as the left table. Select the subreddit name, the title and score of each post, and order the results by each popular post’s score in descending order.

In [33]:
%%sql
with popular_posts as (
    select * 
    from posts
    where score >= 5000
)
select subreddits.name, popular_posts.title, popular_posts.score
from subreddits
inner join popular_posts
on subreddits.id = popular_posts.subreddit_id
order by popular_posts.score desc
limit 10;

 * sqlite:///jupyter_sql.db
Done.


name,title,score
aww,Picture of a kitten,149176.0
programming,Running DOOM on a toaster,143728.0
news,Promising advances made toward cure for cancer,136532.0
programming,Codecademy releases their new database courses,133728.0
IAmA,"I am Gill Bates, founder of Macrohard. Ask me Anything.",96731.0
videos,Codecademy programming tutorial videos,85347.0
programming,What is the most exciting personal project you've worked on?,70951.0
science,Clean water ice found just below Mars' surface,49477.0
Music,Playlist for programmers,49129.0
science,Paleontologists have dug up the skeleton of the ancestor of all dogs,48629.0
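
The choice of join matters for this query. With an INNER JOIN, subreddits that have no popular posts are left out; with a LEFT JOIN, they stay in the result with NULL title and score. A minimal sketch on made-up data follows (the tables, rows, and the `query_template` helper are hypothetical):

```python
import sqlite3

# Hypothetical data: 'quiet' has no posts, and 'post c' is below the threshold.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subreddits (id INTEGER, name TEXT);
    CREATE TABLE posts (id INTEGER, title TEXT, subreddit_id INTEGER, score REAL);
    INSERT INTO subreddits VALUES (1, 'programming'), (2, 'science'), (3, 'quiet');
    INSERT INTO posts VALUES
        (1, 'post a', 1, 9000.0),
        (2, 'post b', 2, 5000.0),
        (3, 'post c', 2, 100.0);
""")

# Same CTE and column list for both runs; only the join type changes.
query_template = """
    WITH popular_posts AS (
        SELECT * FROM posts WHERE score >= 5000
    )
    SELECT subreddits.name, popular_posts.title, popular_posts.score
    FROM subreddits
    {join} JOIN popular_posts ON subreddits.id = popular_posts.subreddit_id
    ORDER BY popular_posts.score DESC
"""
inner_rows = conn.execute(query_template.format(join="INNER")).fetchall()
left_rows = conn.execute(query_template.format(join="LEFT")).fetchall()

print(inner_rows)
print(left_rows)
conn.close()
```

SQLite sorts NULL below every other value, so with a descending sort the NULL-padded row from the LEFT JOIN lands at the end, which is why a `limit 10` on the real data can hide the difference.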
