### Exploring Hacker News Posts

**Dataset description**  
*id*: The unique identifier from Hacker News for the post  
*title*: The title of the post  
*url*: The URL that the post links to, if the post has a URL  
*num_points*: The number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes  
*num_comments*: The number of comments that were made on the post  
*author*: The username of the person who submitted the post  
*created_at*: The date and time at which the post was submitted

We're specifically interested in posts whose titles begin with either Ask HN or Show HN. Users submit Ask HN posts to ask the Hacker News community a specific question. Likewise, users submit Show HN posts to show the Hacker News community a project, product, or just generally something interesting.

In [2]:
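from csv import reader

# Read the whole CSV into a list of rows (each row is a list of strings).
# The with-block closes the file once it has been read.
with open('hacker_news.csv', encoding="utf-8") as opened_file:
    hn = list(reader(opened_file))
print(hn[:5])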

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


In [3]:
headers = hn[0]
hn = hn[1:]
print(headers)
print(hn[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


In [4]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

1744
1162
17194


In [6]:
# Average number of comments on Ask HN posts (num_comments is column 4)
total_ask_comments = 0
for row in ask_posts:
    total_ask_comments += int(row[4])
avg_ask_comments = total_ask_comments / len(ask_posts)
print(avg_ask_comments)

# Average number of comments on Show HN posts
total_show_comments = 0
for row in show_posts:
    total_show_comments += int(row[4])
avg_show_comments = total_show_comments / len(show_posts)
print(avg_show_comments)

14.038417431192661
10.31669535283993
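
The two averaging loops above differ only in the list they iterate over, so they could be folded into a single helper. The following is a minimal sketch rather than part of the original notebook: `average_comments` is a hypothetical name, and it assumes each row stores `num_comments` as a string at index 4, as in this dataset. The two sample rows are copied from the output shown earlier.

```python
def average_comments(posts, comments_index=4):
    """Return the mean number of comments across posts (0.0 if posts is empty)."""
    if not posts:
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# Two rows from the sample output above, used only for illustration.
sample_posts = [
    ['12224879', 'Interactive Dynamic Video',
     'http://www.interactivedynamicvideo.com/',
     '386', '52', 'ne0phyte', '8/4/2016 11:52'],
    ['11919867', 'Technology ventures: From Idea to Enterprise',
     'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
     '3', '1', 'hswarna', '6/17/2016 0:01'],
]

print(average_comments(sample_posts))
```

With such a helper, the cell would reduce to calling `average_comments(ask_posts)` and `average_comments(show_posts)`. Returning 0.0 for an empty list is a design choice of this sketch; it avoids a `ZeroDivisionError` if a category has no posts.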


"Ask HN" posts get 14 comments on average than "Show HN" posts that receive 10 posts on average. Since ask posts are more likely to receive comments, we'll focus our remaining analysis just on these posts.

In [7]:
import datetime as dt

In [25]:
result_list = []  # list of [created_at, num_comments] pairs
for row in ask_posts:
    created = row[6]
    comments = int(row[4])
    result_list.append([created, comments])

counts_by_hour = {}    # hour -> number of ask posts created in that hour
comments_by_hour = {}  # hour -> total comments on those posts

for created, comments in result_list:
    # Parse the timestamp and keep only the zero-padded hour, e.g. '15'
    hour = dt.datetime.strptime(created, '%m/%d/%Y %H:%M').strftime('%H')
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comments
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comments

print(counts_by_hour)
print('\n')
print(comments_by_hour)

{'07': 34, '11': 58, '17': 100, '05': 46, '15': 116, '08': 48, '01': 60, '19': 110, '18': 109, '00': 55, '03': 54, '06': 44, '23': 68, '09': 45, '04': 47, '13': 85, '22': 71, '02': 58, '12': 73, '10': 59, '14': 107, '20': 80, '21': 109, '16': 108}


{'07': 267, '11': 641, '17': 1146, '05': 464, '15': 4477, '08': 492, '01': 683, '19': 1188, '18': 1439, '00': 447, '03': 421, '06': 397, '23': 543, '09': 251, '04': 337, '13': 1253, '22': 479, '02': 1381, '12': 687, '10': 793, '14': 1416, '20': 1722, '21': 1745, '16': 1814}
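
The per-hour tallies can also be built without the `if`/`else` branch by using `collections.defaultdict`, which supplies a starting value of 0 for any hour not seen yet. This is a sketch of an alternative, not the notebook's own method: `tally_by_hour` is a hypothetical helper, and the sample pairs are illustrative (the second timestamp is made up so that two posts share an hour).

```python
import datetime as dt
from collections import defaultdict


def tally_by_hour(pairs, fmt='%m/%d/%Y %H:%M'):
    """Return (counts, comments) dicts keyed by zero-padded hour string."""
    counts = defaultdict(int)
    comments = defaultdict(int)
    for created, num_comments in pairs:
        hour = dt.datetime.strptime(created, fmt).strftime('%H')
        counts[hour] += 1
        comments[hour] += num_comments
    # Convert back to plain dicts so missing hours raise KeyError later on.
    return dict(counts), dict(comments)


# Illustrative input only; the second pair is invented for this example.
sample_pairs = [
    ['8/4/2016 11:52', 52],
    ['8/5/2016 11:10', 8],
    ['6/23/2016 22:20', 1],
]
sample_counts, sample_comments = tally_by_hour(sample_pairs)
print(sample_counts)
print(sample_comments)
```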


In [28]:
avg_by_hour = []
for hour in counts_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour]/counts_by_hour[hour]])
    
avg_by_hour

[['07', 7.852941176470588],
 ['11', 11.051724137931034],
 ['17', 11.46],
 ['05', 10.08695652173913],
 ['15', 38.5948275862069],
 ['08', 10.25],
 ['01', 11.383333333333333],
 ['19', 10.8],
 ['18', 13.20183486238532],
 ['00', 8.127272727272727],
 ['03', 7.796296296296297],
 ['06', 9.022727272727273],
 ['23', 7.985294117647059],
 ['09', 5.5777777777777775],
 ['04', 7.170212765957447],
 ['13', 14.741176470588234],
 ['22', 6.746478873239437],
 ['02', 23.810344827586206],
 ['12', 9.41095890410959],
 ['10', 13.440677966101696],
 ['14', 13.233644859813085],
 ['20', 21.525],
 ['21', 16.009174311926607],
 ['16', 16.796296296296298]]

In [32]:
# Put the average first so that sorted() orders the rows by average, not by hour
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

sorted_swap = sorted(swap_avg_by_hour, reverse=True)
sorted_swap

[[38.5948275862069, '15'],
 [23.810344827586206, '02'],
 [21.525, '20'],
 [16.796296296296298, '16'],
 [16.009174311926607, '21'],
 [14.741176470588234, '13'],
 [13.440677966101696, '10'],
 [13.233644859813085, '14'],
 [13.20183486238532, '18'],
 [11.46, '17'],
 [11.383333333333333, '01'],
 [11.051724137931034, '11'],
 [10.8, '19'],
 [10.25, '08'],
 [10.08695652173913, '05'],
 [9.41095890410959, '12'],
 [9.022727272727273, '06'],
 [8.127272727272727, '00'],
 [7.985294117647059, '23'],
 [7.852941176470588, '07'],
 [7.796296296296297, '03'],
 [7.170212765957447, '04'],
 [6.746478873239437, '22'],
 [5.5777777777777775, '09']]
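
Swapping the columns is one way to sort by average. Another is to pass a `key` function to `sorted()`, which leaves the `[hour, average]` layout untouched. The sketch below is an alternative for illustration, using three rows copied from the `avg_by_hour` output above; the names `avg_sample` and `sorted_sample` are hypothetical.

```python
from operator import itemgetter

# Three [hour, average] rows copied from the avg_by_hour output.
avg_sample = [
    ['07', 7.852941176470588],
    ['15', 38.5948275862069],
    ['02', 23.810344827586206],
]

# Sort on the second element (the average), highest first.
sorted_sample = sorted(avg_sample, key=itemgetter(1), reverse=True)

for hour, avg in sorted_sample:
    print(hour, round(avg, 2))
```

One difference to be aware of: if two hours had exactly the same average, the swapped-list approach would break the tie by hour, whereas the `key` approach keeps tied rows in their original order.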

In [34]:
print("Top 5 Hours for Ask Posts Comments")

Top 5 Hours for Ask Posts Comments


In [38]:
for avg, hour in sorted_swap[:5]:
    # Turn the hour string (e.g. '15') into a clock time (e.g. '15:00')
    hour_label = dt.datetime.strptime(hour, '%H').strftime('%H:%M')
    print('{}: {:.2f} average comments per post'.format(hour_label, avg))

15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post


The five best hours to create an Ask HN post in order to receive comments are 3 pm, 2 am, 8 pm, 4 pm, and 9 pm. Posts created at 3 pm receive a far higher average number of comments (38.59 per post) than those created in the other four hours, possibly because the regular working day is winding down and people have more time to visit non-work-related websites. According to the [documentation](https://www.kaggle.com/hacker-news/hacker-news-posts/home), the timestamps are in US Eastern time.
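
Readers in other time zones may want to translate these hours. The following is a minimal sketch, assuming (as stated above) that `created_at` values are in US Eastern time; the helper name `eastern_to_zone` is hypothetical, and `zoneinfo` requires Python 3.9 or later with time zone data available on the system. The sample timestamp is taken from the first data row shown earlier.

```python
import datetime as dt
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# Assumption: the dataset's timestamps are US Eastern wall-clock times.
EASTERN = ZoneInfo('America/New_York')


def eastern_to_zone(created_at, target_zone, fmt='%m/%d/%Y %H:%M'):
    """Interpret created_at as US Eastern time and convert it to target_zone."""
    naive = dt.datetime.strptime(created_at, fmt)
    aware = naive.replace(tzinfo=EASTERN)
    return aware.astimezone(ZoneInfo(target_zone))


converted = eastern_to_zone('8/4/2016 11:52', 'UTC')
print(converted.strftime('%m/%d/%Y %H:%M'))
```

Because the offset between US Eastern time and other zones changes with daylight saving time, it is safer to convert each full timestamp before grouping by hour than to shift the hour labels by a fixed amount.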