# Hacker News posts: when can one get the best response?

This project explores various kinds of Hacker News posts. We will analyze the posts and draw some meaningful insights from them. Because the dataset is large, we restrict ourselves to user-submitted posts whose titles begin with `Ask HN` or `Show HN`.

`Ask HN` posts are submitted by users asking the `Hacker News` community for advice or tips. For example:

>`
Ask HN: How to improve my personal website?
Ask HN: Am I the only one outraged by Twitter shutting down share counts?
Ask HN: Aby recent changes to CSS that broke mobile?
`

Similarly, `Show HN` is used by users to showcase a project, product, interesting article, etc. Some examples include:

>`
Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform
Show HN: Something pointless I made
Show HN: Shanhu.io, a programming playground powered by e8vm
`

Our general approach will be to:

1. Separate header from rest of the data
2. Create separate `Ask HN`, `Show HN` posts from other posts
3. Calculate total number of comments and average comments per post for both `Ask HN` and `Show HN`
4. Dig deeper into whichever category gets more comments on average


## 1. Separate header from rest of the data

In [1]:
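from csv import reader

# Read every row into a list; the context manager closes the file afterwards
with open("hacker_news.csv") as file_open:
    hn = list(reader(file_open))

# Print the first five rows (header included) as a single list
print(hn[:5])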


[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


The dataset has a header row. We need to remove it so that we can easily loop through the dataset in the analysis that follows. We don't want to lose the header, so we assign it to a separate variable called `headers`.

In [2]:
headers = hn[0]
hn = hn[1:]
print(*hn[:5], sep="\n\n")

['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']

['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']

['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']

['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']

['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']


## 2. Create separate `Ask HN`, `Show HN` posts from other posts

Since we are concentrating only on `Ask HN` and `Show HN` posts, we need to isolate them by looking at the title. Python provides string methods such as `startswith()`, which checks whether a string starts with a given prefix. We lowercase each title so that the match is case-insensitive, check whether it starts with `ask hn` or `show hn`, and add the post to the matching list.
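As a minimal sketch of the prefix check described above, the helper below labels a single title. The name `classify_title` and the `sample_titles` list are illustrative assumptions, not part of the notebook; the titles themselves are taken from the examples earlier in this document.

```python
def classify_title(title):
    """Return 'show', 'ask', or 'other' based on the title prefix."""
    lowered = title.lower()  # lowercase so 'ASK HN' and 'Ask HN' both match
    if lowered.startswith("show hn"):
        return "show"
    if lowered.startswith("ask hn"):
        return "ask"
    return "other"


# Hypothetical sample input, reusing titles shown earlier
sample_titles = [
    "Ask HN: How to improve my personal website?",
    "Show HN: Something pointless I made",
    "Interactive Dynamic Video",
]
labels = [classify_title(title) for title in sample_titles]
print(labels)
```

The notebook cell that follows applies the same idea inline while looping over every row.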

In [3]:
ask_posts = []
show_posts = []
other_posts = []

for post in hn:
    title = post[1].lower()  # lowercase once so the prefix check is case-insensitive

    if title.startswith("show hn"):
        show_posts.append(post)
    elif title.startswith("ask hn"):
        ask_posts.append(post)
    else:
        other_posts.append(post)


print("Show HN count = {}".format(len(show_posts)))
print(*show_posts[:2], sep="\n\n")
print()

print("Ask HN count = {}".format(len(ask_posts)))
print(*ask_posts[:2], sep="\n\n")
print()

print("Other posts count = {}".format(len(other_posts)))
print(*other_posts[:2], sep="\n\n")
print()

Show HN count = 1162
['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03']

['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46']

Ask HN count = 1744
['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55']

['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43']

Other posts count = 17194
['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']

['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']



## 3. Calculate total number of comments and average comments per post for both `Ask HN` and `Show HN`

Now that we have separate lists containing Ask and Show posts, let's calculate the total number of comments and the average number of comments per post for each list. Our larger goal is to check whether `Show HN` or `Ask HN` posts attract more comments on average.
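The total and the average can also be computed in one small function. This is only a sketch: `average_comments` and `sample_posts` are hypothetical names, and the two rows are copied from the `Ask HN` output shown earlier, where index 4 holds `num_comments`.

```python
def average_comments(posts, comments_index=4):
    """Return (total, average) comments for a list of post rows."""
    # Comment counts are stored as strings in the CSV, so convert them first
    total = sum(int(post[comments_index]) for post in posts)
    return total, total / len(posts)


# Two rows from the Ask HN output above, used as sample input
sample_posts = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
]
total, average = average_comments(sample_posts)
print(total, average)
```

Note that the function would raise `ZeroDivisionError` on an empty list, which is acceptable here because both lists are known to be non-empty.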

In [4]:
total_ask_comments = 0

for ask_post in ask_posts:
    total_ask_comments += int(ask_post[4])

avg_ask_comments = total_ask_comments/len(ask_posts)
print("{} 'Ask HN:' posts have received total {} comments. Average comments per post is {:.2f}".format(len(ask_posts), total_ask_comments, avg_ask_comments))


total_show_comments = 0

for show_post in show_posts:
    total_show_comments += int(show_post[4])

avg_show_comments = total_show_comments/len(show_posts)
print("{} 'Show HN:' posts have received total {} comments. Average comments per post is {:.2f}".format(len(show_posts), total_show_comments, avg_show_comments))

1744 'Ask HN:' posts have received total 24483 comments. Average comments per post is 14.04
1162 'Show HN:' posts have received total 11988 comments. Average comments per post is 10.32


`Show HN` posts have received an average of 10.32 comments per post, while `Ask HN` posts have received about 36% more, with an average of 14.04. This suggests that the Hacker News community is more engaged when users ask questions than when they merely point out interesting projects, products or topics.
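The relative difference between the two averages can be checked with a short calculation. The sketch below uses the totals and post counts printed above; `relative_increase` is a hypothetical helper name.

```python
def relative_increase(new_value, base_value):
    """Return how much larger new_value is than base_value, in percent."""
    return (new_value - base_value) / base_value * 100


# Totals and post counts taken from the output above
avg_ask = 24483 / 1744
avg_show = 11988 / 1162

increase = relative_increase(avg_ask, avg_show)
print("Ask HN posts get {:.1f}% more comments per post".format(increase))
```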

## 4. Dig deeper into `Ask HN` posts and find out what time of the day attracts most comments

Considering that `Ask HN` posts attract more comments, we will concentrate on them for the rest of the analysis. We want to find the time of day at which ask posts attract the most comments. To get this insight, we first build two frequency tables keyed by hour of the day: one counting the posts created in that hour and one summing their comments. We parse each timestamp with the `datetime.strptime()` class method and extract the hour with `strftime()`.
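To illustrate the parsing step on a single value, the sketch below converts one `created_at` string from the dataset into a `datetime` object and extracts the hour. The variable names are illustrative only.

```python
import datetime as dt

# A created_at value copied from the Ask HN output above
sample = "8/16/2016 9:55"

# %m/%d/%Y %H:%M matches month/day/year followed by a 24-hour time
parsed = dt.datetime.strptime(sample, "%m/%d/%Y %H:%M")

# strftime("%H") yields a zero-padded two-character string
hour_label = parsed.strftime("%H")
print(parsed.hour, hour_label)
```

Using the zero-padded string as the dictionary key keeps the hours in a uniform two-character format, which is what the cell below does.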

In [5]:
import datetime as dt

counts_by_hour = {}
comments_by_hour = {}

for ask_post in ask_posts:
    date_str = ask_post[6]
    comment_count = int(ask_post[4])
    date = dt.datetime.strptime(date_str, "%m/%d/%Y %H:%M")
    hour = date.strftime("%H")
    
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comment_count
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comment_count
   
    
print(*counts_by_hour.items())
print("\n")
                                
print(*comments_by_hour.items())
print("\n")

('11', 58) ('18', 109) ('05', 46) ('13', 85) ('19', 110) ('03', 54) ('08', 48) ('04', 47) ('21', 109) ('09', 45) ('22', 71) ('20', 80) ('06', 44) ('07', 34) ('23', 68) ('17', 100) ('01', 60) ('02', 58) ('12', 73) ('00', 55) ('10', 59) ('16', 108) ('15', 116) ('14', 107)


('11', 641) ('18', 1439) ('05', 464) ('13', 1253) ('19', 1188) ('03', 421) ('08', 492) ('04', 337) ('21', 1745) ('09', 251) ('22', 479) ('20', 1722) ('06', 397) ('07', 267) ('23', 543) ('17', 1146) ('01', 683) ('02', 1381) ('12', 687) ('00', 447) ('10', 793) ('16', 1814) ('15', 4477) ('14', 1416)




Now that we know the number of posts and the total number of comments for each hour of the day, we can calculate the average number of comments per post for each hour. We will:

1. Create a list of lists in which the first element is the average number of comments for the hour and the second element is the hour itself
2. Sort this list in descending order so that the hour with the highest average comes first
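The two steps above can also be sketched without swapping the elements, by sorting on the average with a `key` function. This is an alternative to the approach used in the notebook, not a replacement for it; the two small dictionaries hold three of the hourly values from the output above, and the variable names are assumptions.

```python
# Three hours taken from the frequency tables above
counts = {"15": 116, "02": 58, "09": 45}
comments = {"15": 4477, "02": 1381, "09": 251}

# Average comments per post for each hour
avg_by_hour = {hour: comments[hour] / counts[hour] for hour in counts}

# Sort (hour, average) pairs by the average, highest first
ranked = sorted(avg_by_hour.items(), key=lambda pair: pair[1], reverse=True)

for hour, average in ranked:
    print("{}:00 {:.2f}".format(hour, average))
```

Sorting with `key` keeps each pair in `(hour, average)` order, so no swap is needed before sorting.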



In [6]:
swap_avg_by_hour = []

for hour in counts_by_hour:
    swap_avg_by_hour.append([comments_by_hour[hour]/counts_by_hour[hour], hour])

print()
print(*swap_avg_by_hour)
print()
print()
# sort in descending order

sorted_swap = sorted(swap_avg_by_hour, reverse=True)
print(*sorted_swap)



[11.051724137931034, '11'] [13.20183486238532, '18'] [10.08695652173913, '05'] [14.741176470588234, '13'] [10.8, '19'] [7.796296296296297, '03'] [10.25, '08'] [7.170212765957447, '04'] [16.009174311926607, '21'] [5.5777777777777775, '09'] [6.746478873239437, '22'] [21.525, '20'] [9.022727272727273, '06'] [7.852941176470588, '07'] [7.985294117647059, '23'] [11.46, '17'] [11.383333333333333, '01'] [23.810344827586206, '02'] [9.41095890410959, '12'] [8.127272727272727, '00'] [13.440677966101696, '10'] [16.796296296296298, '16'] [38.5948275862069, '15'] [13.233644859813085, '14']


[38.5948275862069, '15'] [23.810344827586206, '02'] [21.525, '20'] [16.796296296296298, '16'] [16.009174311926607, '21'] [14.741176470588234, '13'] [13.440677966101696, '10'] [13.233644859813085, '14'] [13.20183486238532, '18'] [11.46, '17'] [11.383333333333333, '01'] [11.051724137931034, '11'] [10.8, '19'] [10.25, '08'] [10.08695652173913, '05'] [9.41095890410959, '12'] [9.022727272727273, '06'] [8.12727272727


Let's print the top six hours in a readable format.



In [7]:
print("Top 6 Hours for Ask Posts Comments")

def formatstring(my_list):
    # my_list is [average, hour]; render the hour as HH:MM
    date = dt.datetime.strptime(my_list[1], "%H")
    hour = date.strftime("%H:%M")
    return "{}: {:.2f} average comments per post".format(hour, my_list[0])

for item in sorted_swap[:6]:
    print(formatstring(item))


Top 6 Hours for Ask Posts Comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
13:00: 14.74 average comments per post




As the analysis above shows, the best times to post an "Ask HN" on Hacker News are 15:00, 02:00 and 20:00. Posts created in each of these hours get more than 20 comments on average, and posts created in the 15:00 hour average about 38.6 comments, roughly 62% more than the next best hour, 02:00.

Hours 16:00, 21:00 and 13:00 also tend to bring a fairly high number of comments.

We hope this analysis helps you find the best time to ask the Hacker News community anything you have been itching to ask.