# Looking back 2019 through Hacker News and BigQuery
date: 2020-01-01T07:12:54-08:00
<!--eofm-->

This is a re-run of [Looking back 2017, through Hacker News](https://blog.8-p.info/en/2018/01/01/hacker-news-2017/), which was inspired by [Looking back at 9 years of Hacker News](http://debarghyadas.com/writes/looking-back-at-9-years-of-hacker-news/).

## Use `%%bigquery` magic

Previously I used the `%bq` magic, but it appears to be deprecated according to [Migrating from the datalab Python package](https://cloud.google.com/bigquery/docs/datalab-migration).

To use the BigQuery client library, I needed to [set up authentication by setting the `GOOGLE_APPLICATION_CREDENTIALS` environment variable](https://cloud.google.com/bigquery/docs/reference/libraries#setting_up_authentication). Without it, queries fail with `Project was not passed and could not be determined from the environment`.
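Because a missing or mistyped path tends to surface only as that project error, it can help to check the variable before running any query. This is a minimal sketch using only the standard library. `require_credentials` is a hypothetical helper, not part of google-cloud-bigquery, and it assumes the variable should point at a downloaded service-account JSON key:

```python
import os
from pathlib import Path


def require_credentials(var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> Path:
    """Hypothetical helper: return the key-file path named by `var`, or raise.

    The BigQuery client reads this variable by itself; checking it up front
    only replaces a vague error with a more specific one.
    """
    value = os.environ.get(var)
    if not value:
        raise RuntimeError(f"{var} is not set; point it at a service-account JSON key")
    path = Path(value).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"{var}={value!r} does not name an existing file")
    return path


# With a valid key in place, the client can then be created as usual:
# from google.cloud import bigquery
# client = bigquery.Client()
```

Once the check passes, `bigquery.Client()` and the `%%bigquery` magic pick up the same credentials on their own; the helper only makes a misconfiguration easier to diagnose.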

```python
# Import pandas just to set max_colwidth; otherwise pandas truncates long strings such as URLs
import pandas as pd
# None means "no limit"; pandas before 1.0 used -1 here, which is deprecated in newer versions
pd.set_option('display.max_colwidth', None)
```

```python
# Load the IPython extension that provides the %%bigquery cell magic
%load_ext google.cloud.bigquery
```

## Most Popular (Upvoted) Stories

```sql
%%bigquery
SELECT title, url, score, id, timestamp
FROM `bigquery-public-data.hacker_news.full`
WHERE type = 'story' AND EXTRACT(year FROM timestamp) = 2019
ORDER BY score DESC
LIMIT 30
```

| | title | url | score | id | timestamp |
|---|---|---|---|---|---|
| 0 | Switch from Chrome to Firefox | https://www.mozilla.org/en-US/firefox/switch/ | 3287 | 20052623 | 2019-05-30 16:09:19+00:00 |
| 1 | I Sell Onions on the Internet | https://www.deepsouthventures.com/i-sell-onions-on-the-internet/ | 3015 | 19728132 | 2019-04-23 13:00:24+00:00 |
| 2 | Announcing unlimited free private repos | https://blog.github.com/2019-01-07-new-year-new-github/ | 2867 | 18847043 | 2019-01-07 17:03:59+00:00 |
| 3 | Slack’s new WYSIWYG input box is terrible | https://quuxplusone.github.io/blog/2019/11/20/slack-rich-text-box/ | 2776 | 21589647 | 2019-11-20 23:13:09+00:00 |
| 4 | Show HN: A retro video game console I've been working on in my free time | https://internalregister.github.io/2019/03/14/Homebrew-Console.html | 2690 | 19393279 | 2019-03-14 20:25:03+00:00 |
| 5 | My Business Card Runs Linux | https://www.thirtythreeforty.net/posts/2019/12/my-business-card-runs-linux/ | 2584 | 21871026 | 2019-12-24 10:15:42+00:00 |
| 6 | Blizzard Suspends Professional Hearthstone Player for Hong Kong Comments | https://playhearthstone.com/en-us/blog/23179289/ | 2525 | 21190265 | 2019-10-08 09:23:08+00:00 |
| 7 | Raspberry Pi 4 | https://www.raspberrypi.org/blog/raspberry-pi-4-on-sale-now-from-35 | 2504 | 20260863 | 2019-06-24 06:00:28+00:00 |
| 8 | Twitter to ban political advertising | https://twitter.com/jack/status/1189634360472829952 | 2447 | 21401973 | 2019-10-30 20:07:19+00:00 |
| 9 | No Thank You, Mr. Pecker | https://medium.com/@jeffreypbezos/no-thank-you-mr-pecker-146e3922310f | 2444 | 19109474 | 2019-02-07 22:52:16+00:00 |
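To sanity-check the query's logic offline, the same filter, sort, and limit can be expressed in pandas on a small local sample. This is a sketch, not how BigQuery executes the query; `top_stories` is a hypothetical helper, and it assumes a DataFrame with the same columns as the public table, with `timestamp` holding timezone-aware datetimes:

```python
import pandas as pd


def top_stories(items: pd.DataFrame, year: int, n: int = 30) -> pd.DataFrame:
    """Hypothetical local analogue of the top-stories query.

    Expects columns: title, url, score, id, timestamp (tz-aware), type.
    """
    # WHERE type = 'story' AND EXTRACT(year FROM timestamp) = year
    stories = items[(items["type"] == "story") & (items["timestamp"].dt.year == year)]
    # ORDER BY score DESC LIMIT n, keeping only the selected columns
    return (
        stories.sort_values("score", ascending=False)
        .head(n)[["title", "url", "score", "id", "timestamp"]]
        .reset_index(drop=True)
    )
```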


## Commonly Shared/Upvoted Domains

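Note that `None` stands for items without a URL. Because this query does not filter on `type`, that row covers comments as well as text-only stories such as "Ask HN", which is why its count is so large.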

```sql
%%bigquery
SELECT
  domains_year.domain,
  COUNT(1) AS count,
  SUM(score) AS score
FROM (
  SELECT
    -- Capture the host after the scheme, dropping a leading "www."
    REGEXP_EXTRACT(url, r'^https?://(?:www\.)?([^/]*)/?(?:.*)') AS domain,
    score
  FROM
    `bigquery-public-data.hacker_news.full`
  WHERE EXTRACT(year FROM timestamp) = 2019) domains_year
GROUP BY
  domains_year.domain
ORDER BY
  count DESC
LIMIT
  30
```

| | domain | count | score |
|---|---|---|---|
| 0 | None | 2786600 | 229058 |
| 1 | medium.com | 17424 | 103059 |
| 2 | github.com | 13539 | 262248 |
| 3 | youtube.com | 8008 | 47554 |
| 4 | nytimes.com | 6643 | 174981 |
| 5 | en.wikipedia.org | 4566 | 52026 |
| 6 | theguardian.com | 3926 | 59598 |
| 7 | bloomberg.com | 3863 | 103875 |
| 8 | twitter.com | 3775 | 102478 |
| 9 | arstechnica.com | 3231 | 44255 |
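The domain extraction and aggregation can be mirrored in Python as well, which makes it easy to try the regular expression on individual URLs. This is a minimal sketch with hypothetical helper names (`extract_domain`, `domain_counts`); it assumes pandas 1.1 or later for `groupby(..., dropna=False)`, which keeps the `None` group the way SQL `GROUP BY` keeps NULLs:

```python
import re
from typing import Optional

import pandas as pd

# The query's pattern, with the dot after "www" escaped so it matches a literal "."
DOMAIN_RE = re.compile(r"^https?://(?:www\.)?([^/]*)/?(?:.*)")


def extract_domain(url: Optional[str]) -> Optional[str]:
    """Mimic REGEXP_EXTRACT: first capture group, or None for no match or missing input."""
    if not isinstance(url, str):  # None or NaN stands in for SQL NULL
        return None
    m = DOMAIN_RE.match(url)
    return m.group(1) if m else None


def domain_counts(items: pd.DataFrame) -> pd.DataFrame:
    """Count rows and sum scores per domain, most frequent domain first."""
    domains = items.assign(domain=items["url"].map(extract_domain))
    return (
        domains.groupby("domain", dropna=False)
        .agg(count=("id", "size"), score=("score", "sum"))  # COUNT(1), SUM(score)
        .sort_values("count", ascending=False)
        .reset_index()
    )
```

One difference from SQL: for a group whose scores are all missing, pandas' `sum` returns 0 where BigQuery's `SUM` returns NULL.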
