# Hacker News Profile - Ask HN vs. Show HN

Hacker News (HN) is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") receive votes and comments, similar to Reddit. HN is extremely popular in technology and startup circles, and posts that make it to the top of the Hacker News listings can garner hundreds of thousands of visitors as a result.

As an avid reader of Hacker News, I wanted to analyze posts whose titles begin with **Ask HN** or **Show HN** to better understand engagement on this platform, which continues to be a thought catalyst with considerable influence in the tech sector writ large.

#### Users submit Ask HN posts to ask the Hacker News community a specific question. Below are a few examples:

- Ask HN: How to improve my personal website?
- Ask HN: Am I the only one outraged by Twitter shutting down share counts?
- Ask HN: Aby recent changes to CSS that broke mobile?

#### Users submit Show HN posts to show the Hacker News community a project, product, or just something interesting. Below are a few examples:

- Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform
- Show HN: Something pointless I made
- Show HN: Shanhu.io, a programming playground powered by e8vm

### Objective

I'll compare these two types of posts to determine the following:

1. Do Ask HN or Show HN receive more comments on average?
2. Do posts created at a certain time receive more comments on average?

---

## Dataset

The dataset used can be found on [Kaggle](https://www.kaggle.com/datasets/hacker-news/hacker-news-posts). The original set has ~269K entries, but I will be using a truncated version of ~20K entries to save on compute time.

**Descriptions of the columns**:

- `id`: the unique identifier from Hacker News for the post
- `title`: the title of the post
- `url`: the URL that the post links to, if the post has a URL
- `num_points`: the number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes
- `num_comments`: the number of comments on the post
- `author`: the username of the person who submitted the post
- `created_at`: the date and time of the post's submission
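
The code in this notebook reads these fields by position (for example, `row[1]` for the title and `row[4]` for the comment count). As a minimal sketch, assuming the CSV header row matches the list above, a name-to-index lookup can make those positions explicit. `COLUMNS`, `col_index`, and `sample_row` are illustrative names that are not part of the original notebook, and the sample row is invented.

```python
# Column order as described above (assumed to match the CSV header row).
COLUMNS = ['id', 'title', 'url', 'num_points',
           'num_comments', 'author', 'created_at']

# Map each column name to its position in a row.
col_index = {name: position for position, name in enumerate(COLUMNS)}

# A made-up row for illustration only; it is not taken from the dataset.
sample_row = ['1', 'Ask HN: Example question?', '', '5', '12',
              'someone', '8/16/2016 9:55']

title = sample_row[col_index['title']]
num_comments = int(sample_row[col_index['num_comments']])
print(title, num_comments)
```

With this lookup, `row[col_index['num_comments']]` reads the same field as `row[4]`.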

---

### Import libraries needed for reading the dataset

In [25]:
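from csv import reader

# Read the whole CSV into a list of rows; the context manager closes the file.
with open('hacker_news.csv', newline='') as opened_file:
    hn_data = list(reader(opened_file))

# The first row holds the column names:
# ['id', 'title', 'url', 'num_points',
#  'num_comments', 'author', 'created_at']
headers = hn_data[0]

# Every remaining row is a post.
hn = hn_data[1:]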

## Extracting Ask HN and Show HN Posts

In [26]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    
    if title.startswith('Ask HN'):
        ask_posts.append(row)
    elif title.startswith('Show HN'):
        show_posts.append(row)
    else:
        other_posts.append(row)

print('Ask posts:', len(ask_posts))
print('Show posts:', len(show_posts))
print('Other posts:', len(other_posts))

Ask posts: 1742
Show posts: 1161
Other posts: 17197
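
Note that `startswith('Ask HN')` is case-sensitive, so a title written as `ask hn: ...` would be counted under other posts. A hedged sketch of a case-insensitive variant follows. `classify_title` is a hypothetical helper, the lowercase sample title is a modified example, and running this on the real data may change the counts above slightly.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix, ignoring case."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'

# Sample titles for illustration; the second is deliberately lowercased.
sample_titles = [
    'Ask HN: How to improve my personal website?',
    'show hn: Something pointless I made',
    'An unrelated story',
]
labels = [classify_title(t) for t in sample_titles]
print(labels)
```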


## Determine if Ask posts or Show posts receive more comments on average

In [27]:
total_ask_posts = len(ask_posts)
total_show_posts = len(show_posts)
comment_freq = {'Ask': 0, 'Show': 0}

for row in ask_posts:
    comments = int(row[4])
    comment_freq['Ask'] += comments

for row in show_posts:
    comments = int(row[4])
    comment_freq['Show'] += comments
    
comment_freq['Ask'] = round(comment_freq['Ask'] / total_ask_posts, 2)
comment_freq['Show'] = round(comment_freq['Show'] / total_show_posts, 2)

print(comment_freq)

{'Ask': 14.04, 'Show': 10.32}


### Ask posts tend to receive more comments on average than Show posts

- **Ask**: 14.04 avg comments
- **Show**: 10.32 avg comments

It makes sense that posts premised on asking the community a question generate more conversation. Leaving room for debate and differing opinions on the correct answer also increases engagement.
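
Each average above is the total number of comments in a group divided by the number of posts in that group. A compact sketch of the same calculation as a reusable function is shown here. `average_comments` is a hypothetical helper, and the rows are invented rather than taken from the dataset.

```python
def average_comments(rows, comments_index=4):
    """Mean of the num_comments column, rounded to 2 decimals (0.0 if no rows)."""
    if not rows:
        return 0.0
    total = sum(int(row[comments_index]) for row in rows)
    return round(total / len(rows), 2)

# Invented rows in the same column order as the dataset.
toy_rows = [
    ['1', 'Ask HN: A?', '', '3', '10', 'a', '1/1/2016 10:00'],
    ['2', 'Ask HN: B?', '', '1', '5', 'b', '1/1/2016 11:00'],
    ['3', 'Ask HN: C?', '', '2', '0', 'c', '1/1/2016 12:00'],
]
print(average_comments(toy_rows))  # 15 comments over 3 posts -> 5.0
```

The empty-list guard avoids a `ZeroDivisionError` if a group has no posts.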

---

## Finding the Number of Ask Posts and Comments by Hour Created

In [56]:
import datetime as dt

result_list = list()

for row in ask_posts:
    created_date = row[6]
    comments = int(row[4])
    
    date = dt.datetime.strptime(created_date, '%m/%d/%Y %H:%M')
    result_list.append([date, comments])

# track counts by hour and comments by hour
counts_by_hour = dict()
comments_by_hour = dict()
    
for pair in result_list:
    date, comments = pair
    
    if date.hour not in counts_by_hour:
        counts_by_hour[date.hour] = 1
        comments_by_hour[date.hour] = comments
    else:
        counts_by_hour[date.hour] += 1
        comments_by_hour[date.hour] += comments

# print(counts_by_hour)
# print(comments_by_hour)

# build a list of [average comments, hour] pairs; the average comes
# first so that sorting the list orders the hours by average comments

avg_by_hour = list()

for hour in counts_by_hour.keys():
    total_comments = comments_by_hour[hour]
    hour_count_total = counts_by_hour[hour]
    
    avg_by_hour.append([round(total_comments / hour_count_total), hour])


for hour_comment in reversed(sorted(avg_by_hour)):
    print(hour_comment[1], 'Avg comments:', hour_comment[0])

15 Avg comments: 39
2 Avg comments: 24
20 Avg comments: 22
16 Avg comments: 17
21 Avg comments: 16
13 Avg comments: 15
18 Avg comments: 13
14 Avg comments: 13
10 Avg comments: 13
19 Avg comments: 11
17 Avg comments: 11
11 Avg comments: 11
1 Avg comments: 11
8 Avg comments: 10
5 Avg comments: 10
12 Avg comments: 9
6 Avg comments: 9
23 Avg comments: 8
7 Avg comments: 8
3 Avg comments: 8
0 Avg comments: 8
22 Avg comments: 7
4 Avg comments: 7
9 Avg comments: 6
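
An average for an hour with few posts can be dominated by a single heavily commented thread, so it may help to look at the post count next to each average. The following is a minimal sketch using `collections.defaultdict`. The `hourly_stats` function and the sample pairs are illustrative assumptions and are not part of the original analysis.

```python
import datetime as dt
from collections import defaultdict

def hourly_stats(pairs):
    """Given (created_at string, comment count) pairs, return
    {hour: (post_count, average_comments)}."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for created_at, comments in pairs:
        hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').hour
        counts[hour] += 1
        totals[hour] += comments
    return {hour: (counts[hour], totals[hour] / counts[hour]) for hour in counts}

# Invented (timestamp, comments) pairs in the dataset's date format.
toy_pairs = [
    ('8/16/2016 9:55', 6),
    ('11/22/2015 13:43', 29),
    ('5/2/2016 13:14', 1),
]
stats = hourly_stats(toy_pairs)
for hour, (post_count, average) in sorted(stats.items()):
    print(hour, post_count, average)
```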


### Best time of day to post 'Ask HN'

#### 🥇 3PM  -- Avg comments: 39
#### 🥈 2AM  -- Avg comments: 24
#### 🥉 8PM  -- Avg comments: 22
---
- 4PM -- Avg comments: 17
- 9PM -- Avg comments: 16
- 1PM -- Avg comments: 15
- 6PM -- Avg comments: 13
- 2PM -- Avg comments: 13
- 10AM -- Avg comments: 13
- 7PM -- Avg comments: 11
- 5PM -- Avg comments: 11
- 11AM -- Avg comments: 11
- 1AM -- Avg comments: 11
- 8AM -- Avg comments: 10
- 5AM -- Avg comments: 10
---
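
The 12-hour labels in the ranking above were derived from the 24-hour values printed by the code. A small sketch of that conversion, assuming hours in the range 0 to 23; `hour_label` is a hypothetical helper name.

```python
def hour_label(hour):
    """Convert an hour in 0-23 to a 12-hour label such as '3PM' or '12AM'."""
    if not 0 <= hour <= 23:
        raise ValueError('hour must be between 0 and 23')
    suffix = 'AM' if hour < 12 else 'PM'
    display = hour % 12 or 12  # 0 and 12 both display as 12
    return f'{display}{suffix}'

print([hour_label(h) for h in (15, 2, 20)])
```
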

## Analysis

Based on average comment counts in this sample, Ask HN posts created at **3pm**, **2am**, or **8pm** (hours as recorded in `created_at`) received the most engagement, so those are the hours to aim for.

If you miss the top three windows, **4pm** and **9pm** also did reasonably well, averaging 17 and 16 comments per post.