# Exploring Hacker News

In this project, we explore Hacker News, a popular technology site.
Hacker News was started by the startup incubator Y Combinator. User-submitted stories (known as "posts") are voted and commented upon, similar to Reddit. The site is extremely popular in technology and startup circles, and posts that make it to the top of its listings can get hundreds of thousands of visitors as a result.

In [1]:
from csv import reader

In [4]:
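# Read the whole CSV into a list of rows; the with-block closes the file.
with open('hacker_news.csv') as opened_file:
    read_file = reader(opened_file)
    hn = list(read_file)

hn_header = hn[:1]  # header kept as a one-row list of lists
hn = hn[1:]         # data rows only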

In [6]:
hn_header

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']]

In [7]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

In [11]:
ask_posts[:2]

[['12296411',
  'Ask HN: How to improve my personal website?',
  '',
  '2',
  '6',
  'ahmedbaracat',
  '8/16/2016 9:55'],
 ['10610020',
  'Ask HN: Am I the only one outraged by Twitter shutting down share counts?',
  '',
  '28',
  '29',
  'tkfx',
  '11/22/2015 13:43']]

In [12]:
len(ask_posts)

1744

In [13]:
len(show_posts)

1162

In [14]:
len(other_posts)

17194

In [15]:
hn_header

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']]

In [17]:
total_ask_comments = 0
for row in ask_posts:
    num = row[4]
    num = int(num)
    total_ask_comments += num

In [20]:
total_ask_comments

24483

In [19]:
avg_ask_comments = total_ask_comments / len(ask_posts)

In [21]:
avg_ask_comments

14.038417431192661

In [22]:
total_show_comments = 0
for row in show_posts:
    num = row[4]
    num = int(num)
    total_show_comments += num
total_show_comments

11988

In [23]:
avg_show_comments = total_show_comments / len(show_posts)
avg_show_comments

10.31669535283993

Posts whose titles start with 'Ask HN' received more comments on average (14.04 per post) than posts whose titles start with 'Show HN' (10.32 per post). A plausible explanation is that Ask HN posts pose questions, which invites replies and so raises the comment count.
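The two averaging loops above do the same work, so they could be folded into one helper. The following is a minimal sketch, not the notebook's original code: the function name `average_comments`, its `comments_index` parameter, and the sample rows are all hypothetical and only illustrate the calculation.

```python
def average_comments(rows, comments_index=4):
    """Return the mean comment count for rows shaped like the hn data.

    comments_index defaults to 4, the position of 'num_comments'
    in hn_header. An empty list yields 0.0 instead of raising.
    """
    if not rows:
        return 0.0
    total = sum(int(row[comments_index]) for row in rows)
    return total / len(rows)


# Hypothetical sample rows in the same column order as hn_header.
sample_rows = [
    ['1', 'Ask HN: Example question?', '', '2', '6', 'user_a', '8/16/2016 9:55'],
    ['2', 'Ask HN: Another example?', '', '28', '29', 'user_b', '11/22/2015 13:43'],
]

print(average_comments(sample_rows))
```

With the real data, calling `average_comments(ask_posts)` and `average_comments(show_posts)` should reproduce the two averages computed above.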

In [34]:
import datetime as dt

In [28]:
result_list = []
for row in ask_posts:
    created_at = row[6]
    num_comments = int(row[4])
    result_list.append([created_at, num_comments])

In [44]:
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    date = row[0]
    num_comment = row[1]  # already an int from the previous cell
    # Parse the timestamp and keep the zero-padded hour, e.g. '09'.
    date = dt.datetime.strptime(date, '%m/%d/%Y %H:%M')
    hour = date.strftime('%H')
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = num_comment
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += num_comment

In [50]:
comments_by_hour

{'09': 251,
 '13': 1253,
 '10': 793,
 '14': 1416,
 '16': 1814,
 '23': 543,
 '12': 687,
 '17': 1146,
 '15': 4477,
 '21': 1745,
 '20': 1722,
 '02': 1381,
 '18': 1439,
 '03': 421,
 '05': 464,
 '19': 1188,
 '01': 683,
 '22': 479,
 '08': 492,
 '04': 337,
 '00': 447,
 '06': 397,
 '07': 267,
 '11': 641}

In [51]:
avg_by_hour = []
for key in comments_by_hour:
    avg_by_hour.append([key, comments_by_hour[key]/counts_by_hour[key]])

In [52]:
avg_by_hour

[['09', 5.5777777777777775],
 ['13', 14.741176470588234],
 ['10', 13.440677966101696],
 ['14', 13.233644859813085],
 ['16', 16.796296296296298],
 ['23', 7.985294117647059],
 ['12', 9.41095890410959],
 ['17', 11.46],
 ['15', 38.5948275862069],
 ['21', 16.009174311926607],
 ['20', 21.525],
 ['02', 23.810344827586206],
 ['18', 13.20183486238532],
 ['03', 7.796296296296297],
 ['05', 10.08695652173913],
 ['19', 10.8],
 ['01', 11.383333333333333],
 ['22', 6.746478873239437],
 ['08', 10.25],
 ['04', 7.170212765957447],
 ['00', 8.127272727272727],
 ['06', 9.022727272727273],
 ['07', 7.852941176470588],
 ['11', 11.051724137931034]]

In [53]:
swap_avg_by_hour = []
for i in avg_by_hour:
    swap_avg_by_hour.append([i[1], i[0]])
swap_avg_by_hour

[[5.5777777777777775, '09'],
 [14.741176470588234, '13'],
 [13.440677966101696, '10'],
 [13.233644859813085, '14'],
 [16.796296296296298, '16'],
 [7.985294117647059, '23'],
 [9.41095890410959, '12'],
 [11.46, '17'],
 [38.5948275862069, '15'],
 [16.009174311926607, '21'],
 [21.525, '20'],
 [23.810344827586206, '02'],
 [13.20183486238532, '18'],
 [7.796296296296297, '03'],
 [10.08695652173913, '05'],
 [10.8, '19'],
 [11.383333333333333, '01'],
 [6.746478873239437, '22'],
 [10.25, '08'],
 [7.170212765957447, '04'],
 [8.127272727272727, '00'],
 [9.022727272727273, '06'],
 [7.852941176470588, '07'],
 [11.051724137931034, '11']]

In [55]:
sorted_swap = sorted(swap_avg_by_hour, reverse=True)

In [66]:
for i in sorted_swap[:5]:
    avg = i[0]
    time = i[1]
    time = dt.datetime.strptime(time, '%H').strftime('%H:00')
    print('{}: {:.2f} average comments per post'.format(time, avg))

15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post


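In this dataset, Ask HN posts created around 15:00 received the most comments on average (38.59 per post), so that hour looks like the best time to post an Ask HN question if the goal is more comments. The hours are those recorded in the created_at column.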
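The hourly ranking could also be computed in one pass, sorting by the average directly rather than swapping the columns first. This is a rough sketch under assumptions: the function name `top_hours_by_avg_comments`, its parameters, and the sample rows are hypothetical, and the date format is the one used earlier in this notebook.

```python
import datetime as dt
from collections import defaultdict


def top_hours_by_avg_comments(rows, n=5, date_index=6, comments_index=4):
    """Return up to n (hour, average comments) pairs, highest average first."""
    totals = defaultdict(int)  # hour -> summed comment count
    counts = defaultdict(int)  # hour -> number of posts
    for row in rows:
        created = dt.datetime.strptime(row[date_index], '%m/%d/%Y %H:%M')
        hour = created.strftime('%H')
        totals[hour] += int(row[comments_index])
        counts[hour] += 1
    averages = [(hour, totals[hour] / counts[hour]) for hour in totals]
    # Sort on the average (second element) so no column swap is needed.
    return sorted(averages, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical sample rows in the same column order as hn_header.
sample_rows = [
    ['1', 'Ask HN: A?', '', '1', '10', 'u1', '8/16/2016 9:55'],
    ['2', 'Ask HN: B?', '', '1', '20', 'u2', '8/17/2016 9:05'],
    ['3', 'Ask HN: C?', '', '1', '40', 'u3', '11/22/2015 15:43'],
]

for hour, avg in top_hours_by_avg_comments(sample_rows):
    print('{}:00: {:.2f} average comments per post'.format(hour, avg))
```

Passing `ask_posts` instead of the sample rows should give the same top five hours as the cells above.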