# Exploring Hacker News Posts
In this project, I explore Hacker News posts and try to answer questions such as:
* Do posts that ask questions receive more comments on average than posts that show the Hacker News community a project, product, or just general information?
* Do posts created at a certain time receive more comments on average?

![here](https://s3.amazonaws.com/dq-content/354/hacker_news.jpg)

Hacker News is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") are voted on and commented on, similar to Reddit. Hacker News is extremely popular in technology and startup circles, and posts that make it to the top of Hacker News' listings can get hundreds of thousands of visitors as a result. The data set used in this project can be found at https://www.kaggle.com/hacker-news/hacker-news-posts.

In [7]:
import csv

# Read the whole CSV file into a list of rows (each row is a list of strings)
with open('hacker_news.csv') as opened_file:
    hn = list(csv.reader(opened_file))
print(hn[:5])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


In [8]:
headers = hn[0]
hn = hn[1:]
print(headers)
print(hn[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


In [13]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))
print(ask_posts[:3])

1744
1162
17194
[['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'], ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'], ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14']]


In [17]:
total_ask_comments = 0

for row in ask_posts:
    total_ask_comments += int(row[4])
avg_ask_comments = total_ask_comments/len(ask_posts)
print('Average comments in ask posts: ', avg_ask_comments)

Average comments in ask posts:  14.038417431192661


In [18]:
total_show_comments = 0

for row in show_posts:
    total_show_comments += int(row[4])
avg_show_comments = total_show_comments/len(show_posts)
print('Average comments in show posts: ', avg_show_comments)

Average comments in show posts:  10.31669535283993


Ask posts receive about 14 comments per post on average, which is higher than the average for show posts, about 10 comments per post. Ask posts may draw more interaction because many users often answer a single question, and follow-up questions usually arise. For a show post, most of the comments may be suggestions or acknowledgements.
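
The two averaging loops above follow the same steps, so they could be folded into one helper. The sketch below is only an illustration: the name `average_comments` and its `comments_index` parameter are my own and are not part of the original notebook. The sample rows are the three ask posts printed earlier.

```python
def average_comments(posts, comments_index=4):
    """Return the mean comment count for a list of rows.

    Hypothetical helper: each row is a list of strings, and the comment
    count is assumed to sit at position `comments_index` (num_comments).
    """
    if not posts:
        return 0.0  # avoid dividing by zero on an empty list
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# The first three ask posts from the output above
sample_posts = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'],
]

print(average_comments(sample_posts))  # 12.0
```

With the full lists, `average_comments(ask_posts)` and `average_comments(show_posts)` should give the same two averages as the loops above.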

Since ask posts are more likely to receive comments, I'll focus my remaining analysis just on these posts. I'll determine if ask posts created at a certain *time* are more likely to attract comments.

In [20]:
import datetime as dt
result_list = []

for row in ask_posts:
    result_list.append([row[6],int(row[4])])
    
print(result_list[:3])

[['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29], ['5/2/2016 10:14', 1]]


In [34]:
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    hour = dt.datetime.strptime(row[0], '%m/%d/%Y %H:%M').strftime('%H')
    if hour in counts_by_hour:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]
    else:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]
        
print(counts_by_hour)
print(comments_by_hour)

{'23': 68, '00': 55, '22': 71, '09': 45, '21': 109, '17': 100, '07': 34, '19': 110, '02': 58, '06': 44, '11': 58, '01': 60, '08': 48, '13': 85, '10': 59, '20': 80, '04': 47, '16': 108, '18': 109, '15': 116, '03': 54, '14': 107, '05': 46, '12': 73}
{'23': 543, '00': 447, '22': 479, '09': 251, '21': 1745, '17': 1146, '07': 267, '19': 1188, '02': 1381, '06': 397, '11': 641, '01': 683, '08': 492, '13': 1253, '10': 793, '20': 1722, '04': 337, '16': 1814, '18': 1439, '15': 4477, '03': 421, '14': 1416, '05': 464, '12': 687}
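
The `if`/`else` bookkeeping in the loop above can also be written with `collections.defaultdict`, which starts every missing key at zero. This is an alternative sketch, not the approach the notebook uses. The names `sample_results`, `counts`, and `totals` are chosen here for illustration, and the sample holds only the three rows printed earlier.

```python
import datetime as dt
from collections import defaultdict

# The first three [created_at, num_comments] pairs from result_list
sample_results = [['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29], ['5/2/2016 10:14', 1]]

counts = defaultdict(int)  # number of posts per hour
totals = defaultdict(int)  # number of comments per hour

for created_at, num_comments in sample_results:
    # Parse the timestamp and keep only the zero-padded hour, e.g. '09'
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').strftime('%H')
    counts[hour] += 1
    totals[hour] += num_comments

print(dict(counts))  # {'09': 1, '13': 1, '10': 1}
print(dict(totals))  # {'09': 6, '13': 29, '10': 1}
```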


In [36]:
avg_by_hour = []

for hour in counts_by_hour:
    avg_by_hour.append([hour,comments_by_hour[hour]/counts_by_hour[hour]])
    
print(avg_by_hour)

[['23', 7.985294117647059], ['00', 8.127272727272727], ['22', 6.746478873239437], ['09', 5.5777777777777775], ['21', 16.009174311926607], ['17', 11.46], ['07', 7.852941176470588], ['19', 10.8], ['02', 23.810344827586206], ['06', 9.022727272727273], ['11', 11.051724137931034], ['01', 11.383333333333333], ['08', 10.25], ['13', 14.741176470588234], ['10', 13.440677966101696], ['20', 21.525], ['04', 7.170212765957447], ['16', 16.796296296296298], ['18', 13.20183486238532], ['15', 38.5948275862069], ['03', 7.796296296296297], ['14', 13.233644859813085], ['05', 10.08695652173913], ['12', 9.41095890410959]]


In [40]:
import pprint
pprint.pprint(sorted(avg_by_hour))

[['00', 8.127272727272727],
 ['01', 11.383333333333333],
 ['02', 23.810344827586206],
 ['03', 7.796296296296297],
 ['04', 7.170212765957447],
 ['05', 10.08695652173913],
 ['06', 9.022727272727273],
 ['07', 7.852941176470588],
 ['08', 10.25],
 ['09', 5.5777777777777775],
 ['10', 13.440677966101696],
 ['11', 11.051724137931034],
 ['12', 9.41095890410959],
 ['13', 14.741176470588234],
 ['14', 13.233644859813085],
 ['15', 38.5948275862069],
 ['16', 16.796296296296298],
 ['17', 11.46],
 ['18', 13.20183486238532],
 ['19', 10.8],
 ['20', 21.525],
 ['21', 16.009174311926607],
 ['22', 6.746478873239437],
 ['23', 7.985294117647059]]


In [44]:
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1],row[0]])

sorted_swap = sorted(swap_avg_by_hour,reverse=True)
pprint.pprint(sorted_swap)

[[38.5948275862069, '15'],
 [23.810344827586206, '02'],
 [21.525, '20'],
 [16.796296296296298, '16'],
 [16.009174311926607, '21'],
 [14.741176470588234, '13'],
 [13.440677966101696, '10'],
 [13.233644859813085, '14'],
 [13.20183486238532, '18'],
 [11.46, '17'],
 [11.383333333333333, '01'],
 [11.051724137931034, '11'],
 [10.8, '19'],
 [10.25, '08'],
 [10.08695652173913, '05'],
 [9.41095890410959, '12'],
 [9.022727272727273, '06'],
 [8.127272727272727, '00'],
 [7.985294117647059, '23'],
 [7.852941176470588, '07'],
 [7.796296296296297, '03'],
 [7.170212765957447, '04'],
 [6.746478873239437, '22'],
 [5.5777777777777775, '09']]
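
Swapping the columns is one way to sort by the average. Another option, sketched below, is to pass a `key` function to `sorted` so the `[hour, average]` pairs keep their original order. The variable names `sample_avg` and `sorted_by_avg` are illustrative, and the sample is a four-row subset of `avg_by_hour`.

```python
# A few [hour, average] pairs taken from avg_by_hour
sample_avg = [
    ['15', 38.5948275862069],
    ['02', 23.810344827586206],
    ['09', 5.5777777777777775],
    ['20', 21.525],
]

# Sort on the second element (the average), largest first
sorted_by_avg = sorted(sample_avg, key=lambda pair: pair[1], reverse=True)

for hour, avg in sorted_by_avg:
    print(hour, round(avg, 2))
```

One difference to keep in mind: if two hours had exactly the same average, the swapped version would break the tie by the hour string, while the `key` version keeps the tied rows in their original order.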


In [49]:
print('Top 5 Hours for Ask Posts Comments')

for row in sorted_swap[0:5]:
    print('{time}: {avg:.2f} average comments per post'.format(time=dt.datetime.strptime(row[1],'%H').strftime('%I %p'),avg=row[0]))

Top 5 Hours for Ask Posts Comments
03 PM: 38.59 average comments per post
02 AM: 23.81 average comments per post
08 PM: 21.52 average comments per post
04 PM: 16.80 average comments per post
09 PM: 16.01 average comments per post


## Conclusion
The hour with the highest average number of comments is 3 PM, with about 38.59 comments per ask post. To maximize the number of comments on an ask post, I recommend posting it around 3 PM.