# `SQL` Exercises

We are going to continue querying our Twitter database. 
In this exercise, most of our work will revolve around the `tweet` table (in the practice we spent most of our time in the `hashtag` table). 
This time around, you will write all of the queries yourself. 

**Please keep in mind** that if your query is taking a very long time, 
it is probably because you are scanning the entire table. 
Put a `LIMIT` on the lowest level of your query so that results render quickly. 
Take, for example, a query structured like so:

```sql
SELECT <cols>
FROM (
    SELECT <cols>
    FROM <tables>
    LIMIT <int>) AS lowest_level_table
```

The query inside the parentheses is executed first, and the outer query then runs only on the rows that the inner query returned. 
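
The pattern above can be tried out without touching the course database. The following is a minimal sketch that assumes an in-memory SQLite table as a stand-in; the `demo_conn` name, the simplified `tweet` table, and its 100 made-up rows are illustrative assumptions, not part of the real `twitter` schema.

```python
import sqlite3

# Hypothetical stand-in for the course database: an in-memory SQLite table.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE tweet (tweet_id INTEGER, iso_language TEXT)")
demo_rows = [(i, "en" if i % 4 else "es") for i in range(1, 101)]
demo_conn.executemany("INSERT INTO tweet VALUES (?, ?)", demo_rows)

# The inner query is limited to 10 rows, so the outer COUNT(*)
# only ever sees those 10 rows, not all 100.
query = """
SELECT COUNT(*) AS n_rows
FROM (
    SELECT tweet_id, iso_language
    FROM tweet
    LIMIT 10) AS lowest_level_table
"""

n_rows = demo_conn.execute(query).fetchone()[0]
print(n_rows)
```

Because the `LIMIT` sits in the innermost query, the outer query works on a small intermediate result no matter how large the underlying table is.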

---

Before we begin, we will establish a connection...

In [1]:
import psycopg2
import pandas as pd

try:
    connect_str = "dbname='twitter' user='dsa_ro_user' host='pgsql.dsa.lan' password='readonly'"
    # use our connection values to establish a connection
    conn = psycopg2.connect(connect_str)
except psycopg2.Error as e:
    # report the actual error instead of guessing at the cause
    print("Could not connect to the database:", e)

We will start off with a simple query...

**Exercise 1**: Write a query that returns all of the attributes and **200** rows from the tweet table. Then execute the query by running `pd.read_sql_query()`.

In [7]:
# Code for Exercise 1 goes here
# --------------

q = """
SELECT * 
FROM twitter.tweet
LIMIT 200
"""

pd.read_sql_query(q, conn)



Unnamed: 0,tweet_id_str,job_id,created_at,text,from_user,from_user_id_str,from_user_name,from_user_fullname,from_user_created_at,from_user_followers,...,from_user_timezone,to_user,to_user_id_str,to_user_name,source,location_geo,location_geo_0,location_geo_1,iso_language,analysis_state
0,852322262590029824,290,2017-04-13 00:47:17,Did you really go to the concert if you didn't...,2940008511,2940008511,mullet_rain,Andy Dufresne,2014-12-24 21:22:56,220,...,,,,,"<a href=""http://twitter.com/download/iphone"" r...",,,,en,0
1,852322262652866560,273,2017-04-13 00:47:17,"RT @adamm0rgan: Five months later, ignoring ev...",196702543,196702543,dzhyde,Dara Hyde,2010-09-29 18:27:58,1665,...,Pacific Time (US & Canada),,,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",,,,en,0
2,852322262674006016,273,2017-04-13 00:47:17,Lust for life is about to scalp us all. https:...,734135355914342400,734135355914342400,virginrenegade,Kia,2016-05-21 21:34:43,81,...,,,,,"<a href=""http://twitter.com/download/android"" ...",,,,en,0
3,852322262929874945,259,2017-04-13 00:47:17,"RT @_macccy11: all boys are stupid, you just g...",261855787,261855787,kalasilver_,k sil,2011-03-06 21:06:53,1257,...,Mountain Time (US & Canada),,,,"<a href=""http://twitter.com/download/iphone"" r...",,,,en,0
4,852322262963281920,266,2017-04-13 00:47:17,Yes he does! #maga #maga @realDonaldTrump htt...,3229739982,3229739982,bakegoodsbysuz1,@memaw,2015-05-29 13:37:47,395,...,,,,,"<a href=""http://twitter.com/download/iphone"" r...",,,,en,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,852322271578341377,257,2017-04-13 00:47:19,I didn't believe Trump Putin Russian story fro...,809950081516470272,809950081516470272,155thMed,markjrobinson,2016-12-17 02:35:22,1130,...,Pacific Time (US & Canada),,,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",,,,en,0
196,852322271641219072,273,2017-04-13 00:47:19,RT @clockedlin: 리얼미터 4/12 (CBS 김...,367032454,367032454,lacknights,문재인 대통령,2011-09-03 07:52:11,1656,...,Seoul,,,,"<a href=""http://twitter.com/download/android"" ...",,,,ko,0
197,852322271641448448,221,2017-04-13 00:47:19,RT @BSmile: The 2016 Chicago #Cubs World Serie...,748407690,748407690,Ben_D_33,Ben Duncan,2012-08-10 02:21:19,836,...,,,,,"<a href=""http://twitter.com/download/iphone"" r...",,,,en,0
198,852322271746293761,266,2017-04-13 00:47:19,"RT @Perrieeele17: i don't mind,\\ni think so,\...",846336762574196737,846336762574196737,BVBAndyBlck,Andy Black,2017-03-27 12:23:03,9,...,,,,,"<a href=""https://mobile.twitter.com"" rel=""nofo...",,,,en,0


The `job_id` in this table identifies the city that the tweet was collected from. Since we all now have some tie to Columbia, MO, we can query for the tweets that are specifically tied to Columbia. Columbia's `job_id` is 261.
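
A city filter like this can also be passed to the query as a parameter instead of being typed into the SQL string. The sketch below assumes an in-memory SQLite stand-in with a simplified, hypothetical `tweet` table and made-up rows. SQLite uses `?` placeholders; `psycopg2` uses `%s` placeholders instead, and `pd.read_sql_query()` accepts the values through its `params` argument.

```python
import sqlite3

# Hypothetical stand-in table; the real tweet table has many more columns.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE tweet (tweet_id INTEGER, job_id INTEGER)")
demo_conn.executemany(
    "INSERT INTO tweet VALUES (?, ?)",
    [(1, 261), (2, 273), (3, 261), (4, 290), (5, 261)],
)

columbia_job_id = 261  # Columbia, MO

# The job id and the row limit are supplied as parameters, not string-formatted.
filtered = demo_conn.execute(
    "SELECT tweet_id, job_id FROM tweet WHERE job_id = ? LIMIT ?",
    (columbia_job_id, 2),
).fetchall()
print(filtered)
```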

**Exercise 2**: Write and execute a query that pulls all attributes and 200 rows where the job id is 261.

In [9]:
# Code for Exercise 2 goes here
# --------------

q = """
SELECT * 
FROM twitter.tweet
WHERE job_id = 261
LIMIT 200
"""

pd.read_sql_query(q, conn)

Unnamed: 0,tweet_id_str,job_id,created_at,text,from_user,from_user_id_str,from_user_name,from_user_fullname,from_user_created_at,from_user_followers,...,from_user_timezone,to_user,to_user_id_str,to_user_name,source,location_geo,location_geo_0,location_geo_1,iso_language,analysis_state
0,852326730606022656,261,2017-04-13 01:05:02,Cuanzo Martin talked with students and played ...,15533253,15533253,CoMissourian,Columbia Missourian,2008-07-22 16:39:55,34176,...,Central Time (US & Canada),,,,"<a href=""http://www.socialflow.com"" rel=""nofol...",,,,en,0
1,852326730849275904,261,2017-04-13 01:05:02,Star Wars is on my tv and I don't have to work...,16396352,16396352,Kolt27,Koltyn Tranbarger,2008-09-21 23:12:01,211,...,Central Time (US & Canada),,,,"<a href=""http://twitter.com/download/iphone"" r...",,,,en,0
2,852326746338807809,261,2017-04-13 01:05:06,Wow what a Fcking baby... Lets throw our bat b...,390960552,390960552,theoryofdeb,Deb,2011-10-14 20:35:09,394,...,America/Chicago,,,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",,,,en,0
3,852326755041923072,261,2017-04-13 01:05:08,RT @MizzouHoops: âï¸ The Return. \\n\\n#MIZ,18978731,18978731,TheWrightMyke,Mykael C. Wright,2009-01-14 15:00:01,2186,...,Arizona,,,,"<a href=""http://twitter.com/download/android"" ...",,,,en,0
4,852326756212191232,261,2017-04-13 01:05:08,RT @MizzouWBB:,557090270,557090270,MizzouAtlanta,MizzouAtlanta,2012-04-18 18:08:18,686,...,Eastern Time (US & Canada),,,,"<a href=""http://twitter.com/download/iphone"" r...",,,,und,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,852335592528650241,261,2017-04-13 01:40:15,@MURentDoctor Somebody mention more shots?,2829046412,2829046412,greglogs,Greg Logsdon,2014-09-24 01:16:30,764,...,,239594879,239594879,MURentDoctor,"<a href=""http://twitter.com/download/iphone"" r...",,,,en,0
196,852335613395087360,261,2017-04-13 01:40:20,@MarekKapolka thank you so much!!! ^^;;; it's ...,552157759,552157759,thmsbrngr,thombus,2012-04-12 20:45:05,230,...,Pacific Time (US & Canada),2593485342,2593485342,MarekKapolka,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",,,,en,0
197,852335631854428160,261,2017-04-13 01:40:24,"Bleh, I just can't imagine investing in a new ...",3073952019,3073952019,perpetuallyaud,Audrey,2015-03-06 02:21:49,190,...,Pacific Time (US & Canada),,,,"<a href=""http://twitter.com/download/iphone"" r...",,,,en,0
198,852335685021446144,261,2017-04-13 01:40:37,RT @MizzouHoops: âï¸ The Return. \\n\\n#MIZ,528144782,528144782,TylarkCoder_292,Tyler Coder,2012-03-18 04:30:05,436,...,America/Chicago,,,,"<a href=""http://twitter.com/download/iphone"" r...",,,,en,0


The database also collects the language that the tweet was written in as long as it is identifiable. This is stored in the `iso_language` column as an attribute of the `tweet`. 
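
Counting rows per language is a `GROUP BY` aggregation. The sketch below uses an in-memory SQLite table with five made-up rows as a stand-in (the `demo_conn` name and the data are assumptions for illustration). Note that `GROUP BY` already returns one row per distinct value, so an extra `DISTINCT` is not required.

```python
import sqlite3

# Hypothetical stand-in data: five tweets in three languages.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE tweet (iso_language TEXT)")
demo_conn.executemany(
    "INSERT INTO tweet VALUES (?)",
    [("en",), ("en",), ("es",), ("en",), ("und",)],
)

# One output row per language, most frequent first;
# ties are broken alphabetically by language code.
language_counts = demo_conn.execute("""
SELECT iso_language, COUNT(*) AS n
FROM tweet
GROUP BY iso_language
ORDER BY n DESC, iso_language
""").fetchall()
print(language_counts)  # [('en', 3), ('es', 1), ('und', 1)]
```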

**Exercise 3**: Write and execute a query that returns the distinct languages and the count of each language for 2000 rows in Columbia, MO. Keep in mind this will be a nested query and that at the lowest level you have a limit of 2000.

In [31]:
# Code for Exercise 3 goes here
# ------------

q = """
SELECT iso_language, COUNT(*)
FROM(
    SELECT iso_language
    FROM twitter.tweet
    WHERE job_id = 261
    LIMIT 2000) AS t1
GROUP BY iso_language
"""

pd.read_sql_query(q, conn)


Unnamed: 0,iso_language,count
0,de,4
1,en,1874
2,es,3
3,fr,3
4,in,4
5,it,1
6,pt,1
7,ro,1
8,tl,1
9,und,108


In [30]:
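# Exercise 3 again, ordered by count to show the top language(s) first.
# Without an ORDER BY in the inner query, LIMIT 2000 may return different rows on each run, so the counts can differ.
# -----------------------------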

q = """
SELECT iso_language, COUNT(*)
FROM(
    SELECT iso_language
    FROM twitter.tweet
    WHERE job_id = 261
    LIMIT 2000) AS t1
GROUP BY iso_language
ORDER BY count DESC
"""

pd.read_sql_query(q, conn)


Unnamed: 0,iso_language,count
0,en,1850
1,und,107
2,de,10
3,da,7
4,es,6
5,pt,4
6,in,3
7,no,3
8,eu,2
9,fr,2


**Exercise 4**: Now find out who tweets the most in Columbia, MO within a sample of 2000 rows. Arrange the results in descending order by count. User ids are stored in the `from_user` attribute.

In [29]:
# Code for Exercise 4 goes here
# ------------

q = """
SELECT from_user, COUNT(*)
FROM(
    SELECT from_user
    FROM twitter.tweet
    WHERE job_id = 261
    LIMIT 2000) AS t1
GROUP BY from_user
ORDER BY count DESC
"""

pd.read_sql_query(q, conn)

Unnamed: 0,from_user,count
0,786210891490471936,30
1,780487612511334400,19
2,240158989,17
3,33556854,16
4,76483392,15
...,...,...
1305,97349265,1
1306,98530631,1
1307,987426348,1
1308,995623944,1


The column `from_user_followers` stores the number of followers that the tweet's author had when the tweet was created.

**Challenge Question 1**: Write and execute a query that finds the average number of `from_user_followers` for each distinct `iso_language` from only 2000 rows of Columbia, MO tweets.

In [37]:
# Code for Challenge Question 1 goes here
# ------------

q = """
SELECT iso_language, AVG(from_user_followers)
FROM(
    SELECT iso_language, from_user_followers
    FROM twitter.tweet
    WHERE job_id = 261
    LIMIT 2000) AS t1
GROUP BY iso_language
ORDER BY avg DESC
"""

pd.read_sql_query(q, conn)

Unnamed: 0,iso_language,avg
0,no,5417.0
1,en,1788.517422
2,ht,1568.666667
3,cy,1525.0
4,in,1524.5
5,und,1375.407258
6,tl,628.0
7,ja,582.0
8,es,531.5
9,cs,491.0


# Save your notebook, then `File > Close and Halt`