# Analysis of popularity for Ask HN vs Show HN posts on Hackers News

Hacker News is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") are voted and commented upon, similar to reddit. Hacker News is extremely popular in technology and startup circles, and posts that make it to the top of Hacker News' listings can get hundreds of thousands of visitors as a result.

The data set used here was found on Kaggle and is a scrape that has been reduced from almost 300,000 rows to approximately 20,000 rows by removing all submissions that did not receive any comments, and then randomly sampling from the remaining submissions.

We're specifically interested in posts whose titles begin with either Ask HN or Show HN. Users submit Ask HN posts to ask the Hacker News community a specific question. Users submit Show HN posts to show the Hacker News community a project, product, or just generally something interesting. 

We'll compare these two types of posts to determine the following:
- Do Ask HN or Show HN receive more comments on average?
- Do posts created at a certain time receive more comments on average?

In [1]:
import statistics
from IPython.display import display, HTML

In [2]:
opened_file_HNposts = open('HN_posts_year_to_Sep_26_2016.csv')
from csv import reader
read_file_HNposts = reader(opened_file_HNposts)
hn = list(read_file_HNposts)
# android_header = android_list[0]
# android = android_list[1:]

header = hn[0:1]
hn = hn[1:]
print(header)
print("\n")
print(hn[0:5])

#print(android_list[0:2])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']]


[['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26'], ['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24'], ['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19'], ['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16'], ['12578979', 'How the Data Vault Enables the Next-Gen Data Warehouse and Data Lake', 'https://www.talend.com/blog/2016/05/12/talend-and-Â\x93the-data-vaultÂ\x94', '1', '0', 'markgainor1', '9/26/2016 3:14']]


In [8]:
#Assign title based on post type

ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith("ask hn"):
        ask_posts.append(row)
    elif title.lower().startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)

#Check lengths of lists
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

9139
10158
273822


In [4]:
#Find the total number of comments in asks posts and assign it to total_ask_comments
total_ask_comments = 0
num_ask_comments_per_post = []
for row in ask_posts:
    num_comments = row[4]
    total_ask_comments += int(num_comments)
    num_ask_comments_per_post.append(int(num_comments))
print('total_ask_comments: '+ str(total_ask_comments))

#Compute the average number of comments on ask posts and assign it to avg_ask_comments.
avg_ask_comments = "{:,.0f}".format(statistics.mean(num_ask_comments_per_post))
print('avg_ask_comments: '+str(avg_ask_comments))

#Find the total number of comments in show posts and assign it to total_show_comments.
total_show_comments = 0
num_show_comments_per_post = []
for row in show_posts:
    num_comments = row[4]
    total_show_comments += int(num_comments)
    num_show_comments_per_post.append(int(num_comments))
print('total_show_comments: '+ str(total_show_comments))

#Compute the average number of comments on show posts and assign it to avg_show_comments.
avg_show_comments = "{:,.0f}".format(statistics.mean(num_show_comments_per_post))
print('avg_show_comments: '+str(avg_show_comments))






total_ask_comments: 94986
avg_ask_comments: 10
total_show_comments: 49633
avg_show_comments: 5


## Findings:

Reviewing these numbers we see that ask posts receive 2:1 more comments on average

## Deeper dive: The amount of Asks Posts vs comments received by Hour Created

Next we examine how time of Ask Posts results in varying volumes of comments

In [12]:
#See if ask posts created at a certain time are more likely to attract comments

#Import the datetime module as dt
import datetime as dt

#Create an empty list and assign it to result_list
result_list = []

#Iterate over ask posts and append to result_list list with two elements: created_at and number of comments of the post
for row in ask_posts:
    created_at = row[6]
    post_comments = int(row[4])
    result_list.append([created_at, post_comments])
    
#display(result_list)

#Create two empty dictionaries called counts_by_hour and comments_by_hour
counts_by_hour = {}
comments_by_hour = {}

#Loop through each row of result_list. Extract hour from the date. Use datetime.strptime() method to parse the date and create a datetime object
for row in result_list:
    #print(row)
    date = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    hour = dt.datetime.strftime(date, "%H")
    if hour in counts_by_hour:
        comments_by_hour[hour] += row[1]
        counts_by_hour[hour] += 1
    else:
        comments_by_hour[hour] = row[1]
        counts_by_hour[hour] = 1
#display(result_list)
#display(comments_by_hour)

#find the average number of comments per posts for posts created during each hour of the day
avg_by_hour = []
for row in comments_by_hour:
    avg_by_hour.append([row, comments_by_hour[row]/counts_by_hour[hour]])

print('Average number of comments asks posts received per hour of day posted:')
print('#We will clean this up in the next step')
display(avg_by_hour)


Average number of comments asks posts received per hour of day posted:
#We will clean this up in the next step


[['02', 12.803418803418804],
 ['01', 8.927350427350428],
 ['22', 14.41025641025641],
 ['21', 19.23076923076923],
 ['19', 16.897435897435898],
 ['17', 23.705128205128204],
 ['15', 79.16666666666667],
 ['14', 21.247863247863247],
 ['13', 30.96153846153846],
 ['11', 11.952991452991453],
 ['10', 12.876068376068377],
 ['09', 6.311965811965812],
 ['07', 6.773504273504273],
 ['03', 9.205128205128204],
 ['23', 9.816239316239317],
 ['20', 19.068376068376068],
 ['16', 19.085470085470085],
 ['08', 10.094017094017094],
 ['00', 9.73076923076923],
 ['18', 20.84188034188034],
 ['12', 18.094017094017094],
 ['04', 10.085470085470085],
 ['06', 6.782051282051282],
 ['05', 7.854700854700854]]

In [16]:
#Create a list that equals avg_by_hour with swapped columns
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])
#print(swap_avg_by_hour)

#Use the sorted() function to sort swap_avg_by_hour in descending order
sorted_swap = sorted(swap_avg_by_hour, reverse = True)

display(sorted_swap)

[[79.16666666666667, '15'],
 [30.96153846153846, '13'],
 [23.705128205128204, '17'],
 [21.247863247863247, '14'],
 [20.84188034188034, '18'],
 [19.23076923076923, '21'],
 [19.085470085470085, '16'],
 [19.068376068376068, '20'],
 [18.094017094017094, '12'],
 [16.897435897435898, '19'],
 [14.41025641025641, '22'],
 [12.876068376068377, '10'],
 [12.803418803418804, '02'],
 [11.952991452991453, '11'],
 [10.094017094017094, '08'],
 [10.085470085470085, '04'],
 [9.816239316239317, '23'],
 [9.73076923076923, '00'],
 [9.205128205128204, '03'],
 [8.927350427350428, '01'],
 [7.854700854700854, '05'],
 [6.782051282051282, '06'],
 [6.773504273504273, '07'],
 [6.311965811965812, '09']]

In [18]:
print("The top 5 hours for Ask Posts comments")
for avg, hr in sorted_swap[:5]:
    print(
            "{}: {:.2f} average comments per post".format(
            dt.datetime.strptime(hr, "%H").strftime("%H:%M"), avg
            )
    )
    

The top 5 hours for Ask Posts comments
15:00: 79.17 average comments per post
13:00: 30.96 average comments per post
17:00: 23.71 average comments per post
14:00: 21.25 average comments per post
18:00: 20.84 average comments per post


The hour that receives the most comments per post is 3:00 pm followed by 1:00 pm. It is worthy to note that the number 1 time has nearly 60% more posts for the hour than the second nearest hour.

## Conclusion

In this project we examined posts that had received comments on the Hacker News website. We looked for popularity patterns, first in Ask vs Show posts, then for the most popular time of day to make the post. Based on our analysis we determined that Ask Posts receive more comments than Show Posts. Further, that 3:00 pm is the best hour of the day to maximize the amount of comments a post receives. One should follow both of those directions to maximize the number of comments his/her post will receive.