# Hacker News Posts - Ask HN or Show HN?

There are two specific types of posts on Hacker News - Ask HN and Show HN. Users submit Ask HN posts to ask the Hacker News community a specific question, and Show HN posts to show the Hacker News community a project, product, or just something interesting.

The goal of this project is to determine which type of post on Hacker News attracts more comments: Ask HN or Show HN.

We will examine a data set containing Hacker News posts from September 2015 to September 2016.

Source of dataset:
https://www.kaggle.com/datasets/hacker-news/hacker-news-posts

The dataset includes the following columns:
- id: the unique identifier from Hacker News for the post
- title: title of the post (self explanatory)
- url: the url of the item being linked to
- num_points: the number of upvotes the post received
- num_comments: the number of comments the post received
- author: the name of the account that made the post
- created_at: the date and time the post was made (the time zone is Eastern Time in the US)
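Because the data is read below as a list of lists, each column is accessed by position rather than by name. A minimal sketch of mapping column names to positions, assuming the CSV header follows the order listed above (the names `HEADER`, `COL`, and `sample_row` are illustrative and not part of the original notebook):

```python
# Column names in the order listed above (assumed to match the CSV header)
HEADER = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

# Map each column name to its position in a row
COL = {name: index for index, name in enumerate(HEADER)}

# One row from the dataset, used here only as an example
sample_row = ['12578908', 'Ask HN: What TLD do you use for local development?',
              '', '4', '7', 'Sevrene', '9/26/2016 2:53']

# Values are stored as strings, so numeric columns need converting
num_comments = int(sample_row[COL['num_comments']])
print(COL['num_comments'], num_comments)
```

The rest of the notebook uses the bare indices directly (for example `row[4]` for the number of comments); a lookup like this is only one possible way to make those indices easier to read.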

In [1]:
# Open the dataset and store it as a list of lists
from csv import reader # import the reader function from the csv module

# The with block closes the file automatically once it has been read
with open('HN_posts_year_to_Sep_26_2016.csv', newline='') as opened_hn:
    read_hn = reader(opened_hn)
    hn = list(read_hn)

## Snapshot of HN data set

In [2]:
# Take a look at the dataset
print("The dataset has", len(hn[1:]), "entries", "and a header",'\n')
for row in hn[:5]:
    print(row, '\n')

The dataset has 293119 entries and a header 

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'] 

['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26'] 

['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24'] 

['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19'] 

['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16'] 



In [21]:
total_posts = len(hn) - 1 # exclude the header row from the total
count_comments = 0
for row in hn[1:]:
    num_comments = int(row[4])
    if num_comments != 0:
        count_comments += 1
print(total_posts - count_comments,
      "posts (",
      round(1 - count_comments/total_posts, 2)*100,
      "% of all HN posts)",
      "received zero comments", '\n')

212718 posts ( 73.0 % of all HN posts) received zero comments 



## Data Cleaning

To better understand which posts get the most attention, we will reduce the dataset by removing all submissions that didn't receive any comments.

In [22]:
hn_short = []
for row in hn[1:]:
    num_comments = int(row[4])
    if num_comments != 0:
        hn_short.append(row)

# Take a look at the reduced dataset
print("Reduced dataset has", len(hn_short), "entries", '\n') 

for row in hn_short[:5]:
    print(row, '\n')

Reduced dataset has 80401 entries 

['12578975', 'Saving the Hassle of Shopping', 'https://blog.menswr.com/2016/09/07/whats-new-with-your-style-feed/', '1', '1', 'bdoux', '9/26/2016 3:13'] 

['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7', 'Sevrene', '9/26/2016 2:53'] 

['12578822', 'Amazons Algorithms Dont Find You the Best Deals', 'https://www.technologyreview.com/s/602442/amazons-algorithms-dont-find-you-the-best-deals/', '1', '1', 'yarapavan', '9/26/2016 2:26'] 

['12578694', 'Emergency dose of epinephrine that does not cost an arm and a leg', 'http://m.imgur.com/gallery/th6Ua', '2', '1', 'dredmorbius', '9/26/2016 1:54'] 

['12578624', 'Phone Makers Could Cut Off Drivers. So Why Dont They?', 'http://www.nytimes.com/2016/09/25/technology/phone-makers-could-cut-off-drivers-so-why-dont-they.html', '4', '1', 'danso', '9/26/2016 1:37'] 
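The same filter can also be written as a list comprehension. A minimal sketch on three rows taken from the outputs above (the names `sample_rows` and `commented` are illustrative, not part of the original notebook):

```python
# Three rows from the dataset: one without comments, two with comments
sample_rows = [
    ['12579005', 'SQLAR  the SQLite Archiver',
     'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24'],
    ['12578975', 'Saving the Hassle of Shopping',
     'https://blog.menswr.com/2016/09/07/whats-new-with-your-style-feed/', '1', '1', 'bdoux', '9/26/2016 3:13'],
    ['12578908', 'Ask HN: What TLD do you use for local development?',
     '', '4', '7', 'Sevrene', '9/26/2016 2:53'],
]

# Keep only the rows whose num_comments column (index 4) is non-zero
commented = [row for row in sample_rows if int(row[4]) != 0]

print(len(commented), "of", len(sample_rows), "sample rows have comments")
```

Both forms produce the same list; the explicit loop is kept in this notebook because it matches the style of the other cells.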



We'll compare these two types of posts to determine the following:
- Do Ask HN or Show HN posts receive more comments on average?
- Do Ask HN or Show HN posts receive more upvotes on average?
- Do posts created at a certain time receive more comments on average?

In [4]:
# Separate Ask HN, Show HN, and all other posts
ask_posts = []
show_posts = []
other_posts = []

for row in hn_short:
    title = row[1].lower() # extract the title of the post and make it lowercase for easy comparison
    if title.startswith('ask hn'): # if title starts with 'ask hn' the row goes to new ask posts dataset
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

# Take a look at the Ask HN, Show HN, and other posts datasets
print("Ask HN posts dataset has", len(ask_posts), "entries", '\n') 
print(round(len(ask_posts)/len(hn_short), 2)*100, "% of commented HN posts are Ask HN posts", '\n')

print("Show HN posts dataset has", len(show_posts), "entries", '\n') 
print(round(len(show_posts)/len(hn_short), 2)*100, "% of commented HN posts are Show HN posts", '\n')

print("Other posts dataset has", len(other_posts), "entries", '\n') 
print(round(len(other_posts)/len(hn_short), 2)*100, "% of commented HN posts are Other posts", '\n')

Ask HN posts dataset has 6911 entries 

9.0 % of commented HN posts are Ask HN posts 

Show HN posts dataset has 5059 entries 

6.0 % of commented HN posts are Show HN posts 

Other posts dataset has 68431 entries 

85.0 % of commented HN posts are Other posts 
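The prefix test used to separate the posts could also be wrapped in a small function, which makes it easy to check individual titles. This is a sketch only; the function name `post_type` and its return labels are hypothetical and do not appear in the original notebook:

```python
def post_type(title):
    """Return 'ask', 'show', or 'other' based on the title prefix."""
    lowered = title.lower()  # compare in lowercase, as in the loop above
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'

# Two titles taken from the dataset
print(post_type('Ask HN: What TLD do you use for local development?'))
print(post_type('SQLAR  the SQLite Archiver'))
```

Note that a prefix match like this only catches titles that begin with the label; a title that mentions "Ask HN" later in the text is counted as an other post.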



## Comparison of Ask HN and Show HN posts

Now we can compare which posts, Ask HN or Show HN, receive more comments and upvotes on average.

In [24]:
# Count average Ask HN comments and upvotes
total_ask_comments = 0
total_ask_upvotes = 0
for row in ask_posts:
    num_comments = int(row[4])
    num_upvotes = int(row[3])
    total_ask_comments += num_comments
    total_ask_upvotes += num_upvotes

avg_ask_comments = total_ask_comments / len(ask_posts)
avg_ask_upvotes = total_ask_upvotes / len(ask_posts)
print('Ask HN posts on average receive',
      round(avg_ask_comments, 1),
      'comments and',
      round(avg_ask_upvotes, 1),
      'upvotes')

# Count average Show HN comments and upvotes
total_show_comments = 0
total_show_upvotes = 0
for row in show_posts:
    num_comments = int(row[4])
    num_upvotes = int(row[3])
    total_show_comments += num_comments
    total_show_upvotes += num_upvotes

avg_show_comments = total_show_comments / len(show_posts)
avg_show_upvotes = total_show_upvotes / len(show_posts)
print('Show HN posts on average receive',
      round(avg_show_comments, 1),
      'comments and',
      round(avg_show_upvotes, 1),
      'upvotes')

Ask HN posts on average receive 13.7 comments and 14.4 upvotes
Show HN posts on average receive 9.8 comments and 26.6 upvotes


Among posts that received at least one comment, Ask HN posts receive more comments on average, while Show HN posts receive more upvotes. Since Ask HN posts attract more comments, we'll focus the remaining analysis on these posts.
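The two averaging loops above have the same shape, so they could be folded into one helper. A minimal sketch, assuming the column layout used throughout (index 3 for upvotes, index 4 for comments) and a non-empty list of rows; the name `average_comments_and_upvotes` is hypothetical:

```python
def average_comments_and_upvotes(posts):
    """Return (average comments, average upvotes) for a non-empty list of rows."""
    total_comments = sum(int(row[4]) for row in posts)  # num_comments column
    total_upvotes = sum(int(row[3]) for row in posts)   # num_points column
    return total_comments / len(posts), total_upvotes / len(posts)

# Two rows from the reduced dataset, used only to exercise the helper
sample_posts = [
    ['12578975', 'Saving the Hassle of Shopping',
     'https://blog.menswr.com/2016/09/07/whats-new-with-your-style-feed/', '1', '1', 'bdoux', '9/26/2016 3:13'],
    ['12578908', 'Ask HN: What TLD do you use for local development?',
     '', '4', '7', 'Sevrene', '9/26/2016 2:53'],
]

avg_comments, avg_upvotes = average_comments_and_upvotes(sample_posts)
print(avg_comments, avg_upvotes)
```

With such a helper, the same call could be made once with `ask_posts` and once with `show_posts` instead of repeating the loop.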

## What time is best to post Ask HN?

Next, we'll determine whether ask posts created at a certain time are more likely to attract comments.

In [6]:
import datetime as dt # import datetime module to work with dates and times of posts

In [7]:
result_list = []
for row in ask_posts:
    created_at = dt.datetime.strptime(row[6], '%m/%d/%Y %H:%M') # store the creation time 
                                                                # of the post as a datetime object
    num_comments = int(row[4]) # store the number of comments for the post
    result_list.append([created_at, num_comments])
print(result_list[:5])

[[datetime.datetime(2016, 9, 26, 2, 53), 7], [datetime.datetime(2016, 9, 26, 1, 17), 3], [datetime.datetime(2016, 9, 25, 22, 48), 3], [datetime.datetime(2016, 9, 25, 21, 50), 2], [datetime.datetime(2016, 9, 25, 19, 30), 1]]


In [8]:
# Create two auxiliary dictionaries to store how many posts were created at each hour of the day,
# and how many comments were given to posts created at each hour
counts_by_hour = {}
comments_by_hour = {}
for row in result_list:
    hour = row[0].strftime('%H') # extract only the hour the post was created
    comments = row[1]
    if hour in counts_by_hour:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comments
    else:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comments
print(counts_by_hour, comments_by_hour)

{'02': 227, '01': 223, '22': 287, '21': 407, '19': 420, '17': 404, '15': 467, '14': 378, '13': 326, '11': 251, '10': 219, '09': 176, '07': 157, '03': 212, '16': 415, '08': 190, '00': 231, '23': 276, '20': 392, '18': 452, '12': 274, '04': 186, '06': 176, '05': 165} {'02': 2996, '01': 2089, '22': 3372, '21': 4500, '19': 3954, '17': 5547, '15': 18525, '14': 4972, '13': 7245, '11': 2797, '10': 3013, '09': 1477, '07': 1585, '03': 2154, '16': 4466, '08': 2362, '00': 2277, '23': 2297, '20': 4462, '18': 4877, '12': 4234, '04': 2360, '06': 1587, '05': 1838}
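The counting step can also be written without the `if`/`else` branch. The sketch below is one possible alternative, not the notebook's method: it uses `collections.defaultdict` and the integer `hour` attribute of each datetime, and it runs on the first five entries of `result_list` printed earlier. The names `sample_results`, `posts_per_hour`, and `comments_per_hour` are illustrative.

```python
import datetime as dt
from collections import defaultdict

# First five entries of result_list, as printed earlier
sample_results = [
    [dt.datetime(2016, 9, 26, 2, 53), 7],
    [dt.datetime(2016, 9, 26, 1, 17), 3],
    [dt.datetime(2016, 9, 25, 22, 48), 3],
    [dt.datetime(2016, 9, 25, 21, 50), 2],
    [dt.datetime(2016, 9, 25, 19, 30), 1],
]

posts_per_hour = defaultdict(int)     # a missing hour starts at 0
comments_per_hour = defaultdict(int)

for created_at, comments in sample_results:
    hour = created_at.hour  # integer from 0 to 23 instead of a zero-padded string
    posts_per_hour[hour] += 1
    comments_per_hour[hour] += comments

print(dict(posts_per_hour))
print(dict(comments_per_hour))
```

Integer keys sort numerically, whereas the string keys used in this notebook rely on zero padding (`'02'`, `'15'`) to sort in the right order.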


In [9]:
# Create a list with average number of comments per post for each hour of the day
avg_comments_by_hour = []
for hour in counts_by_hour:
    avg_comment = round(comments_by_hour[hour] / counts_by_hour[hour], 1) # calculate average comments 
                                                                        # per post and round it to 1 decimal point
    avg_comments_by_hour.append([hour, avg_comment])
print(avg_comments_by_hour)

[['02', 13.2], ['01', 9.4], ['22', 11.7], ['21', 11.1], ['19', 9.4], ['17', 13.7], ['15', 39.7], ['14', 13.2], ['13', 22.2], ['11', 11.1], ['10', 13.8], ['09', 8.4], ['07', 10.1], ['03', 10.2], ['16', 10.8], ['08', 12.4], ['00', 9.9], ['23', 8.3], ['20', 11.4], ['18', 10.8], ['12', 15.5], ['04', 12.7], ['06', 9.0], ['05', 11.1]]


In [10]:
# Sorting the list by highest average number of comments per post
avg_comments = []
for row in avg_comments_by_hour:
    avg_comments.append([row[1], row[0]])
sorted_avg_comments = sorted(avg_comments, reverse = True)
print(sorted_avg_comments)

[[39.7, '15'], [22.2, '13'], [15.5, '12'], [13.8, '10'], [13.7, '17'], [13.2, '14'], [13.2, '02'], [12.7, '04'], [12.4, '08'], [11.7, '22'], [11.4, '20'], [11.1, '21'], [11.1, '11'], [11.1, '05'], [10.8, '18'], [10.8, '16'], [10.2, '03'], [10.1, '07'], [9.9, '00'], [9.4, '19'], [9.4, '01'], [9.0, '06'], [8.4, '09'], [8.3, '23']]
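Swapping the columns is one way to sort by the average; `sorted` also accepts a `key` function, which leaves each pair in its original `[hour, average]` order. A minimal sketch on five of the pairs computed above (the names `sample_pairs` and `top_hours` are illustrative):

```python
# Five [hour, average comments] pairs taken from avg_comments_by_hour
sample_pairs = [['02', 13.2], ['15', 39.7], ['13', 22.2], ['12', 15.5], ['10', 13.8]]

# Sort by the second element of each pair, highest average first
top_hours = sorted(sample_pairs, key=lambda pair: pair[1], reverse=True)

for hour, avg in top_hours:
    print(hour, avg)
```

One difference from the swapped-column approach: when two hours share the same average, the `key` version keeps them in their original order, while sorting the swapped lists breaks the tie by comparing the hour strings.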


In [11]:
print('Top 5 Hours for Ask Posts Comments:')
for row in sorted_avg_comments[:5]:
    hour = dt.datetime.strptime(row[1], '%H')
    hour = hour.strftime('%H:%M')
    print(hour, ' : ', row[0], 'average comments per post')

Top 5 Hours for Ask Posts Comments:
15:00  :  39.7 average comments per post
13:00  :  22.2 average comments per post
12:00  :  15.5 average comments per post
10:00  :  13.8 average comments per post
17:00  :  13.7 average comments per post


## Conclusion

The goal of the project was to determine which type of post on Hacker News attracts more comments: posts that ask the community a question (Ask HN) or posts that show something (Show HN).

We found that Ask HN posts receive more comments (and fewer upvotes) on average than Show HN posts, which suggests that HN users are more likely to respond when they are asked directly. Among Ask HN posts, those created around 15:00 US Eastern Time received the most comments on average (39.7 per post).

Finally, the comments/upvotes dynamic for Show HN posts suggests that users prefer simpler activities that require little involvement to more complex ones. The fact that 73% of all posts on HN never received a comment further supports this suggestion.