# Get Started

After forking this notebook, run the code in the following cell:

In [1]:
# import package with helper functions 
import bq_helper

# create a helper object for this dataset
hacker_news = bq_helper.BigQueryHelper(active_project="bigquery-public-data",
                                       dataset_name="hacker_news")

# print the first few rows of the "comments" table
hacker_news.head("comments")

Unnamed: 0,id,by,author,time,time_ts,text,parent,deleted,dead,ranking
0,2701393,5l,5l,1309184881,2011-06-27 14:28:01+00:00,And the glazier who fixed all the broken windo...,2701243,,,0
1,5811403,99,99,1370234048,2013-06-03 04:34:08+00:00,Does canada have the equivalent of H1B/Green c...,5804452,,,0
2,21623,AF,AF,1178992400,2007-05-12 17:53:20+00:00,"Speaking of Rails, there are other options in ...",21611,,,0
3,10159727,EA,EA,1441206574,2015-09-02 15:09:34+00:00,Humans and large livestock (and maybe even pet...,10159396,,,0
4,2988424,Iv,Iv,1315853580,2011-09-12 18:53:00+00:00,I must say I reacted in the same way when I re...,2988179,,,0


In [2]:
hacker_news.list_tables()

['comments', 'full', 'full_201510', 'stories']

# Question
Using the Hacker News dataset in BigQuery, answer the following questions:

#### 1) How many stories (use the "id" column) are there of each type (in the "type" column) in the full table?

In [3]:
hacker_news.head("full")

Unnamed: 0,by,score,time,timestamp,title,type,url,text,parent,deleted,dead,descendants,id,ranking
0,marie2006,1.0,1337576542,2012-05-21 05:02:22+00:00,Get the Welcome Bar | AddThis,story,http://undefined/#.T7nMCcqMsbE.hackernews,,,,True,-1.0,4001364,
1,lpelypenko,,1513571737,2017-12-18 04:35:37+00:00,,comment,,will it work on iphone?,15949291.0,,,,15949461,
2,ahamedirshad123,,1517484916,2018-02-01 11:35:16+00:00,,comment,,"Funny thing is NRIs, who don&#x27;t have to en...",16270801.0,,,,16281095,
3,pavlik_enemy,,1476794696,2016-10-18 12:44:56+00:00,,comment,,"Of course there&#x27;s a reason, that&#x27;s b...",12727673.0,,,,12733830,
4,cwp,,1391215239,2014-02-01 00:40:39+00:00,,comment,,"Yes, it would be great if everything always ju...",7159926.0,,,,7160278,


In [4]:
# Solution: count the distinct ids of each type
query = """SELECT count(DISTINCT id), type
            FROM `bigquery-public-data.hacker_news.full`
            GROUP BY type
        """
df = hacker_news.query_to_pandas_safe(query, max_gb_scanned=0.5)
df

Unnamed: 0,f0_,type
0,14022531,comment
1,2929318,story
2,11873,pollopt
3,10527,job
4,1735,poll


In [5]:
# Alternative solution: count(id) without DISTINCT returns the same totals here
query = """SELECT count(id), type
            FROM `bigquery-public-data.hacker_news.full`
            GROUP BY type
        """
df = hacker_news.query_to_pandas_safe(query, max_gb_scanned=0.5)
df

Unnamed: 0,f0_,type
0,2929318,story
1,14022531,comment
2,10527,job
3,11873,pollopt
4,1735,poll
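
Both queries return the same totals, which suggests that every `id` in the `full` table is unique. The two forms only diverge when a column holds repeated values. The sketch below illustrates this on a made-up table, using Python's built-in `sqlite3` module as a stand-in for BigQuery. The table `items`, its rows, and the alias `n` are assumptions for illustration and are not part of the Hacker News dataset.

```python
import sqlite3

# In-memory toy table; the rows are invented, and id 2 is duplicated on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "story"), (2, "comment"), (2, "comment"), (3, "comment")],
)

# COUNT(id) counts every non-NULL id in the group, duplicates included.
count_all = dict(conn.execute(
    "SELECT type, COUNT(id) AS n FROM items GROUP BY type"
).fetchall())

# COUNT(DISTINCT id) counts each id value once per group.
count_distinct = dict(conn.execute(
    "SELECT type, COUNT(DISTINCT id) AS n FROM items GROUP BY type"
).fetchall())

print(count_all)
print(count_distinct)
conn.close()
```

Naming the aggregate with `AS` also works in BigQuery, where it replaces the auto-generated `f0_` column name seen in the outputs above.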


#### 2) How many comments have been deleted? (If a comment was deleted the "deleted" column in the comments table will have the value "True".)

In [6]:
# Solution: count deleted items of type "comment" in the full table
query = """SELECT count(id)
            FROM `bigquery-public-data.hacker_news.full`
            WHERE deleted = True AND type = "comment"
        """
df = hacker_news.query_to_pandas_safe(query, max_gb_scanned=0.5)
df

Unnamed: 0,f0_
0,381965


In [7]:
# Solution, extended: count deleted items of every type
query = """SELECT count(id), type
            FROM `bigquery-public-data.hacker_news.full`
            WHERE deleted = True
            GROUP BY type
        """
df = hacker_news.query_to_pandas_safe(query, max_gb_scanned=0.5)
df

Unnamed: 0,f0_,type
0,381965,comment
1,128106,story
2,534,job
3,290,pollopt
4,177,poll
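
In the `head("full")` output earlier, the `deleted` column is empty (NULL) for every row shown. A filter such as `WHERE deleted = True` keeps only rows where the comparison evaluates to true, so NULL rows drop out of the count. Here is a minimal sketch of that behavior, using Python's built-in `sqlite3` module with invented rows rather than BigQuery. The `items` table is hypothetical, and because SQLite stores booleans as 0/1, `deleted = 1` stands in for `deleted = True`.

```python
import sqlite3

# Toy table with invented rows: NULL means "never deleted", 1 means deleted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, type TEXT, deleted INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (1, "comment", 1),
        (2, "comment", None),
        (3, "comment", None),
        (4, "story", 1),
        (5, "story", None),
    ],
)

# Same shape as the first solution: one filter on deleted, one on type.
deleted_comments = conn.execute(
    "SELECT COUNT(id) FROM items WHERE deleted = 1 AND type = 'comment'"
).fetchone()[0]

# Same shape as the second solution: drop the type filter and group instead.
deleted_by_type = dict(conn.execute(
    "SELECT type, COUNT(id) FROM items WHERE deleted = 1 GROUP BY type"
).fetchall())

# Rows with NULL in deleted are only reachable through IS NULL.
never_deleted = conn.execute(
    "SELECT COUNT(id) FROM items WHERE deleted IS NULL"
).fetchone()[0]

print(deleted_comments, deleted_by_type, never_deleted)
conn.close()
```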


#### 3) Modify one of the queries you wrote above to use a different aggregate function.
You can read about aggregate functions other than COUNT() **[in these docs](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#aggregate-functions)**.

In [8]:
# Example solution:
# Find the most recent "time" value among deleted items of each type
query = """SELECT MAX(time), type
            FROM `bigquery-public-data.hacker_news.full`
            WHERE deleted = True
            GROUP BY type
        """
df = hacker_news.query_to_pandas_safe(query, max_gb_scanned=0.5)
df

Unnamed: 0,f0_,type
0,1525273147,comment
1,1525270600,story
2,1523402597,job
3,1519059974,pollopt
4,1519578690,poll
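
The `time` column holds Unix epoch seconds, so the `MAX(time)` values above are hard to read at a glance. Below is a minimal sketch of converting them with Python's standard library. The helper name `to_utc` is hypothetical, and the sample values are copied from the outputs shown in this notebook.

```python
from datetime import datetime, timezone

def to_utc(epoch_seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# Sanity check against the first row of head("full"),
# where "time" and "timestamp" appear side by side.
known = to_utc(1337576542)
print(known.isoformat(sep=" "))  # 2012-05-21 05:02:22+00:00

# MAX(time) for deleted comments, taken from the query result above.
latest_deleted_comment = to_utc(1525273147)
print(latest_deleted_comment.isoformat(sep=" "))
```

Since the `full` table also exposes a `timestamp` column, selecting `MAX(timestamp)` instead of `MAX(time)` would presumably return a readable value directly from the query.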


---

# Keep Going
[Click here](https://www.kaggle.com/dansbecker/order-by) to move on and learn about the ORDER BY clause.

# Feedback
Bring any questions or feedback to the [Learn Discussion Forum](https://www.kaggle.com/learn-forum).

---

*This exercise is part of the [SQL Series](https://www.kaggle.com/learn/sql) on Kaggle Learn.*