# Optimising Engagement on Hacker News: Peak Interaction Times for Posts

## Introduction

[Hacker News](https://news.ycombinator.com/) is a site, similar to Reddit, where users submit posts that receive votes and comments. It is extremely popular in technology and startup circles, and popular posts can attract hundreds of thousands of visitors.

This project explores submissions made to the site. We are specifically interested in posts whose titles begin with `'Show HN'` or `'Ask HN'`. `'Show HN'` posts show the Hacker News community a project, product, or simply something interesting, while `'Ask HN'` posts ask the community a specific question.

This project will compare these two types of posts to determine:

* Do `'Ask HN'` or `'Show HN'` posts receive more comments on average?
* Do posts created at a specific time receive more comments on average?
* Do `'Ask HN'` or `'Show HN'` posts receive more points on average?
* Do posts created at a specific time receive more points on average?

The dataset used for this project is a reduced version of this [Kaggle submission](https://www.kaggle.com/datasets/hacker-news/hacker-news-posts). It's worth noting that the dataset was reduced from ~300,000 to ~20,000 rows by removing all entries that didn't receive comments and randomly sampling from those remaining.

## Reading Data, Creating Lists of Lists, and Removing Header

First, we open the `hacker_news.csv` file, read it, and convert it into a list of lists:

In [43]:
# Importing necessary module
from csv import reader

# Open and read the file; the with-block closes it automatically
with open("hacker_news.csv") as opened_file:
    hn = list(reader(opened_file))

# Separating header from data
header = hn[0]
hn = hn[1:]

## Extracting Ask HN and Show HN Posts

Next, we separate the posts beginning with `'Ask HN'` and `'Show HN'` from the rest of our dataset:

In [44]:
# Categorising posts
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    title = title.lower()  # Controlling for case
    if title.startswith("ask hn"):
        ask_posts.append(row)
    elif title.startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)

# Printing the counts
print("Number of 'Ask HN' posts:", len(ask_posts))
print("Number of 'Show HN' posts:", len(show_posts))
print("Number of other posts:", len(other_posts))

Number of 'Ask HN' posts: 1744
Number of 'Show HN' posts: 1162
Number of other posts: 17194
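
The categorisation rule can also be wrapped in a small function so it is easy to check on individual titles. The sketch below is only an illustration: the function name `categorise_title` and the sample titles are made up and are not part of the original notebook.

```python
def categorise_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix."""
    lowered = title.lower()  # Controlling for case, as in the loop above
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


# Made-up titles, used only to illustrate the behaviour
sample_titles = [
    "Ask HN: How do you learn a new language?",
    "SHOW HN: My weekend project",
    "A post about something else",
]
categories = [categorise_title(title) for title in sample_titles]
print(categories)
```

Because the title is lowercased before the prefix check, `"SHOW HN"` and `"Show HN"` fall into the same category.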


## Calculating the Average Number of Comments for Ask HN and Show HN Posts

Now to determine if `'Ask HN'` or `'Show HN'` receive more comments on average:

In [45]:
# Finding total number of comments in ask posts
total_ask_comments = 0

for row in ask_posts:
    total_ask_comments += int(row[4])

# Computing average number of comments in ask posts
avg_ask_comments = total_ask_comments / len(ask_posts)
print("Average number of 'Ask HN' comments:", avg_ask_comments)

Average number of 'Ask HN' comments: 14.038417431192661


In [46]:
# Finding total number of comments in show posts
total_show_comments = 0

for row in show_posts:
    total_show_comments += int(row[4])

# Computing average number of comments in show posts
avg_show_comments = total_show_comments / len(show_posts)
print("Average number of 'Show HN' comments:", avg_show_comments)

Average number of 'Show HN' comments: 10.31669535283993


From the data, we can see that `'Ask HN'` submissions receive about 14 comments on average, whereas `'Show HN'` posts receive only about 10.

Since `'Ask HN'` posts receive more comments on average, we will focus the hourly comment analysis that follows on these submissions.
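
The two averaging cells above differ only in the list they loop over, so the calculation could be factored into one helper. This is a minimal sketch under assumptions: `average_of_column` is a hypothetical name, and the sample rows are invented, laid out like the dataset with the comment count at index 4.

```python
def average_of_column(rows, index):
    """Return the mean of the integer values stored at `index` in each row."""
    if not rows:
        return 0.0  # Avoid dividing by zero on an empty list
    total = sum(int(row[index]) for row in rows)
    return total / len(rows)


# Made-up rows; index 4 holds the number of comments
sample_rows = [
    ["1", "Ask HN: first", "", "5", "10", "user_a", "8/16/2016 9:55"],
    ["2", "Ask HN: second", "", "3", "20", "user_b", "8/16/2016 10:15"],
]
sample_avg = average_of_column(sample_rows, 4)
print(sample_avg)  # 15.0
```

With the notebook's own lists, the equivalent calls would be `average_of_column(ask_posts, 4)` and `average_of_column(show_posts, 4)`.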

## Finding the Number of Ask Posts and Comments by Hour Created


As stated in the introduction, our next goal is to determine if `'Ask HN'` posts created at a certain *time* are more likely to attract comments. 

The first step for this analysis will be to calculate the number of ask posts created in each hour of the day, along with the number of comments received.

In [47]:
import datetime as dt

result_list = []

# Creating a list of lists with two elements: 'created_at' and 'num_comments'
for post in ask_posts:
    result_list.append([post[6], int(post[4])])

# Initialising dictionaries to store the counts of posts and comments per hour
posts_by_hour = {}
comments_by_hour = {}

date_format = "%m/%d/%Y %H:%M"  # Defining the format for date parsing

# Iterating through the result_list to populate posts_by_hour and comments_by_hour dictionaries
for row in result_list:
    date = row[0]
    comments = row[1]  # Number of comments
    time = dt.datetime.strptime(date, date_format)  # Parsing the dates stored as strings
    hour = time.strftime("%H")  # Extracting the hour from the date object as a string
    if hour not in posts_by_hour:
        posts_by_hour[hour] = 1
        comments_by_hour[hour] = comments
    else:
        posts_by_hour[hour] += 1  # Summing the number of posts per hour
        comments_by_hour[hour] += comments  # Summing the number of comments per hour

comments_by_hour

{'09': 251,
 '13': 1253,
 '10': 793,
 '14': 1416,
 '16': 1814,
 '23': 543,
 '12': 687,
 '17': 1146,
 '15': 4477,
 '21': 1745,
 '20': 1722,
 '02': 1381,
 '18': 1439,
 '03': 421,
 '05': 464,
 '19': 1188,
 '01': 683,
 '22': 479,
 '08': 492,
 '04': 337,
 '00': 447,
 '06': 397,
 '07': 267,
 '11': 641}
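
As an alternative to the `if hour not in posts_by_hour` check, the same tallies can be kept with `collections.defaultdict`, which starts every missing key at zero. The sketch below uses made-up `[created_at, num_comments]` pairs and the hypothetical names `posts_per_hour` and `comments_per_hour`, so it does not overwrite the notebook's dictionaries.

```python
import datetime as dt
from collections import defaultdict

date_format = "%m/%d/%Y %H:%M"  # Same format as in the cell above

# Made-up [created_at, num_comments] pairs, for illustration only
sample_results = [
    ["8/16/2016 9:55", 6],
    ["8/16/2016 9:05", 4],
    ["11/22/2015 13:43", 29],
]

posts_per_hour = defaultdict(int)     # Missing hours start at 0
comments_per_hour = defaultdict(int)

for created_at, comments in sample_results:
    hour = dt.datetime.strptime(created_at, date_format).strftime("%H")
    posts_per_hour[hour] += 1
    comments_per_hour[hour] += comments

print(dict(posts_per_hour))
print(dict(comments_per_hour))
```

The behaviour matches the explicit branch in the original loop; only the bookkeeping is shorter.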

## Calculating the Average Number of Comments for Ask HN Posts by Hour

Next, we'll use these two dictionaries (`'comments_by_hour'` and `'posts_by_hour'`) to calculate the average number of comments for posts created during each hour of the day.

In [48]:
avg_by_hour = []

# Iterating over the keys of 'comments_by_hour'
for hour in comments_by_hour:
    # Appending a list with two attributes:
    # 1) The hour (key from 'comments_by_hour')
    # 2) The average number of comments per post for that hour (value corresponding to that hour key divided by the value corresponding to the same hour key in 'posts_by_hour')
    avg_by_hour.append([hour, comments_by_hour[hour] / posts_by_hour[hour]])

avg_by_hour

[['09', 5.5777777777777775],
 ['13', 14.741176470588234],
 ['10', 13.440677966101696],
 ['14', 13.233644859813085],
 ['16', 16.796296296296298],
 ['23', 7.985294117647059],
 ['12', 9.41095890410959],
 ['17', 11.46],
 ['15', 38.5948275862069],
 ['21', 16.009174311926607],
 ['20', 21.525],
 ['02', 23.810344827586206],
 ['18', 13.20183486238532],
 ['03', 7.796296296296297],
 ['05', 10.08695652173913],
 ['19', 10.8],
 ['01', 11.383333333333333],
 ['22', 6.746478873239437],
 ['08', 10.25],
 ['04', 7.170212765957447],
 ['00', 8.127272727272727],
 ['06', 9.022727272727273],
 ['07', 7.852941176470588],
 ['11', 11.051724137931034]]

## Sorting and Printing Values from a List of Lists

Though the list above shows us the results we need, we'll improve the format so that it's easier to identify the hours with the highest values:

In [49]:
# Swapping the columns in 'avg_by_hour'
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

# Sorting average number of comments in descending order
sorted_swap = sorted(swap_avg_by_hour, reverse=True)

sorted_swap

[[38.5948275862069, '15'],
 [23.810344827586206, '02'],
 [21.525, '20'],
 [16.796296296296298, '16'],
 [16.009174311926607, '21'],
 [14.741176470588234, '13'],
 [13.440677966101696, '10'],
 [13.233644859813085, '14'],
 [13.20183486238532, '18'],
 [11.46, '17'],
 [11.383333333333333, '01'],
 [11.051724137931034, '11'],
 [10.8, '19'],
 [10.25, '08'],
 [10.08695652173913, '05'],
 [9.41095890410959, '12'],
 [9.022727272727273, '06'],
 [8.127272727272727, '00'],
 [7.985294117647059, '23'],
 [7.852941176470588, '07'],
 [7.796296296296297, '03'],
 [7.170212765957447, '04'],
 [6.746478873239437, '22'],
 [5.5777777777777775, '09']]
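
The column swap above is one way to sort by the average. Another option, sketched here with a few pairs copied from the `avg_by_hour` output, is to pass a `key` function to `sorted` so that the `[hour, average]` layout stays unchanged. The names `sample_avg_by_hour` and `sorted_by_avg` are illustrative only.

```python
# A few [hour, average] pairs copied from the 'avg_by_hour' output above
sample_avg_by_hour = [
    ["09", 5.5777777777777775],
    ["15", 38.5948275862069],
    ["02", 23.810344827586206],
    ["20", 21.525],
]

# Sort on the second element (the average), largest first
sorted_by_avg = sorted(sample_avg_by_hour, key=lambda row: row[1], reverse=True)

for hour, avg in sorted_by_avg:
    print(hour, round(avg, 2))
```

One small difference: if two hours had exactly the same average, the `key` version would keep them in their original order, while the swapped version would order them by the hour string.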

In [51]:
# String formatting and displaying the top 5 hours
print("Top 5 Hours for 'Ask HN' Comments\n")

for avg, hour in sorted_swap[:5]:
    print("{hour}: {comments:.2f} average comments per post".format(hour=dt.datetime.strptime(hour, "%H").strftime("%H:%M"), comments=avg))

Top 5 Hours for 'Ask HN' Comments

15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post


The hour that receives the most comments per post, on average, is 15:00, with an average of 38.59 comments per post. 

According to the dataset [documentation](https://www.kaggle.com/datasets/hacker-news/hacker-news-posts), the time zone used is Eastern Time in the US. 
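
Readers outside that time zone may want to translate the peak hour into their own local time. The sketch below is deliberately simplified: it assumes fixed offsets (UTC-5 for Eastern Standard Time and UTC+1 for a hypothetical reader in Central European Time) and ignores daylight saving time, which the standard-library `zoneinfo` module can handle when time zone data is available.

```python
import datetime as dt

# Assumption: fixed offsets, ignoring daylight saving time
eastern_standard = dt.timezone(dt.timedelta(hours=-5), "EST")
central_european = dt.timezone(dt.timedelta(hours=1), "CET")  # Hypothetical reader's zone

# The date is arbitrary; only the 15:00 hour comes from the analysis
peak = dt.datetime(2016, 1, 15, 15, 0, tzinfo=eastern_standard)

peak_utc = peak.astimezone(dt.timezone.utc)
peak_cet = peak.astimezone(central_european)

print(peak_utc.strftime("%H:%M"))  # 20:00
print(peak_cet.strftime("%H:%M"))  # 21:00
```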

## Determining Whether Show or Ask Posts Receive More Points on Average

Using a similar method to earlier, we will now determine which type of post receives more points on average.

In [52]:
# Calculating the total points for 'Ask HN' and 'Show HN' posts
ask_points_total = 0
show_points_total = 0

for row in ask_posts:
    num_points = int(row[3])
    ask_points_total += num_points

for row in show_posts:
    num_points = int(row[3])
    show_points_total += num_points

# Computing the average number of points for 'Ask HN' and 'Show HN' posts
avg_ask_points = ask_points_total / len(ask_posts)
print("Average Number of Points for 'Ask HN' Posts:", avg_ask_points)

avg_show_points = show_points_total / len(show_posts)
print("Average Number of Points for 'Show HN' Posts:", avg_show_points)

Average Number of Points for 'Ask HN' Posts: 15.061926605504587
Average Number of Points for 'Show HN' Posts: 27.555077452667813


From the analysis we can see that `'Show HN'` posts receive more points on average (27.6) than `'Ask HN'` posts (15.1).

## Calculating the Average Number of Points for Show HN Posts by Hour

Now that we know `'Show HN'` posts receive more points, we will determine at what time of day these posts are most likely to receive them.

In [55]:
result_list = []

# Creating a list of lists with two elements: 'created_at' and 'num_points'
for post in show_posts:
    result_list.append([post[6], int(post[3])])

# Initialising dictionaries to store the counts of posts and points per hour
posts_by_hour = {}
points_by_hour = {}

date_format = "%m/%d/%Y %H:%M"  # Defining the format for date parsing

# Iterating through the result_list to populate posts_by_hour and points_by_hour dictionaries
for row in result_list:
    date = row[0]
    points = row[1]
    time = dt.datetime.strptime(date, date_format)  # Parsing the dates stored as strings
    hour = time.strftime("%H")  # Extracting the hour from the date object as a string
    if hour not in posts_by_hour:
        posts_by_hour[hour] = 1
        points_by_hour[hour] = points
    else:
        posts_by_hour[hour] += 1  # Summing the number of posts per hour
        points_by_hour[hour] += points  # Summing the number of points per hour

# Computing the average number of points per post for each hour
avg_by_hour = []

for hour in points_by_hour:
    avg_by_hour.append([hour, points_by_hour[hour] / posts_by_hour[hour]])

# Swapping the columns in 'avg_by_hour'
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

# Sorting average number of points in descending order
sorted_swap = sorted(swap_avg_by_hour, reverse=True)

# Printing the top 5 hours for 'Show HN' points
print("Top 5 Hours for 'Show HN' Points\n")

for avg, hour in sorted_swap[:5]:
    print("{hour}: {points:.2f} average points per post".format(hour=dt.datetime.strptime(hour, "%H").strftime("%H:%M"), points=avg))

Top 5 Hours for 'Show HN' Points

23:00: 42.39 average points per post
12:00: 41.69 average points per post
22:00: 40.35 average points per post
00:00: 37.84 average points per post
18:00: 36.31 average points per post


The hour that receives the most points per post, on average, is 23:00, with an average of 42.39 points per post. 
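
The hourly analysis was carried out twice with almost identical code, once for comments and once for points. As a sketch of how the steps could be combined, the hypothetical function `top_hours_by_average` below takes the column index of the value to average. The default `date_index=6` follows the column used in the cells above, and the sample rows are made up.

```python
import datetime as dt
from collections import defaultdict


def top_hours_by_average(rows, value_index, date_index=6,
                         date_format="%m/%d/%Y %H:%M", limit=5):
    """Return up to `limit` (hour, average) pairs, highest average first."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        hour = dt.datetime.strptime(row[date_index], date_format).strftime("%H")
        counts[hour] += 1
        totals[hour] += int(row[value_index])
    averages = [(hour, totals[hour] / counts[hour]) for hour in counts]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)[:limit]


# Made-up rows laid out like the dataset (points at index 3, date at index 6)
sample_rows = [
    ["1", "Show HN: a", "", "40", "2", "user_a", "8/16/2016 23:10"],
    ["2", "Show HN: b", "", "20", "1", "user_b", "8/17/2016 23:45"],
    ["3", "Show HN: c", "", "12", "5", "user_c", "8/18/2016 12:30"],
]

for hour, avg in top_hours_by_average(sample_rows, value_index=3):
    print("{}:00: {:.2f} average points per post".format(hour, avg))
```

With the notebook's lists, `top_hours_by_average(ask_posts, 4)` would cover the comment analysis and `top_hours_by_average(show_posts, 3)` the points analysis.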

## Conclusion

Our analysis aimed to understand engagement patterns on Hacker News, focusing on `'Ask HN'` and `'Show HN'` posts. Comparing comments and points received, we found that `'Ask HN'` posts garner more comments on average (about 14 versus 10), with the highest hourly average at 15:00 (3:00 PM) Eastern Time. Conversely, `'Show HN'` posts accrue more points on average (about 27.6 versus 15.1), with the highest hourly average at 23:00 (11:00 PM) Eastern Time. Because the dataset excludes posts that received no comments, these figures describe posts that attracted at least some discussion. Within that limit, they offer practical guidance for users and content creators looking to optimise their engagement on the platform.