# Exploring Hacker News Posts

## Are you curious about what posts generate the most engagement on technology and startup forums? 

As a freelance data analyst, I recently completed a guided project that analyzed a dataset of submissions to the popular technology site, Hacker News. 

In this project, I focused on two types of user-submitted posts: Ask HN and Show HN.

Ask HN posts are submissions where the user asks the Hacker News community a specific question, while Show HN posts are submissions where the user showcases a project, product, or something interesting.

The goal was to determine which type of post received more comments on average and whether posts created at certain times of the day received more comments. 

Through my analysis, I discovered some fascinating insights that could help you understand the factors that contribute to engagement on online forums. 

With this knowledge, you could potentially improve your own online content and engagement strategies.

Check out my analysis of this Hacker News dataset to learn more!

## Reading in the data

Note: The data set we're working with was reduced from almost 300,000 rows to approximately 20,000 rows by removing all submissions that did not receive any comments, and then randomly sampling from the remaining submissions.
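The script that produced the reduced file is not part of this project, so the following is only a rough sketch of what such a reduction could look like, assuming a "filter, then sample" approach. The function name `reduce_rows`, the fixed seed, and the tiny made-up rows are all hypothetical and are not taken from the original data preparation.

```python
import random


def reduce_rows(rows, comments_index, sample_size, seed=0):
    """Keep rows with at least one comment, then draw a random sample.

    Hypothetical helper: the real reduction procedure is not documented.
    """
    # Drop submissions that received no comments.
    commented = [row for row in rows if int(row[comments_index]) > 0]
    # Sample without replacement; never ask for more rows than exist.
    rng = random.Random(seed)
    k = min(sample_size, len(commented))
    return rng.sample(commented, k)


# Made-up rows in the form [id, title, num_comments].
rows = [
    ['1', 'A', '0'],
    ['2', 'B', '3'],
    ['3', 'C', '0'],
    ['4', 'D', '7'],
    ['5', 'E', '1'],
]

sample = reduce_rows(rows, comments_index=2, sample_size=2)
print(len(sample))  # 2
```

Because every post without comments is dropped before sampling, any average computed later describes only posts that received at least one comment.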

In [11]:
# import modules

import pandas as pd
from csv import reader

# read the file in as a list of lists; the with-block closes the file when done
with open('hacker_news.csv', newline='') as open_file:
    hn = list(reader(open_file))


# separate the header row from the data rows

headers = hn[0]

hn = hn[1:]

In [12]:
# verify separation

print(headers)

print('\n')

print(hn[:6])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12'], ['10482257', '

In [15]:
# create a pandas dataframe from the list of lists - out of curiosity

df = pd.DataFrame(hn, columns = headers)

df.head()

| | id | title | url | num_points | num_comments | author | created_at |
|---|---|---|---|---|---|---|---|
| 0 | 12224879 | Interactive Dynamic Video | http://www.interactivedynamicvideo.com/ | 386 | 52 | ne0phyte | 8/4/2016 11:52 |
| 1 | 10975351 | How to Use Open Source and Shut the Fuck Up at... | http://hueniverse.com/2016/01/26/how-to-use-op... | 39 | 10 | josep2 | 1/26/2016 19:30 |
| 2 | 11964716 | Florida DJs May Face Felony for April Fools' W... | http://www.thewire.com/entertainment/2013/04/f... | 2 | 1 | vezycash | 6/23/2016 22:20 |
| 3 | 11919867 | Technology ventures: From Idea to Enterprise | https://www.amazon.com/Technology-Ventures-Ent... | 3 | 1 | hswarna | 6/17/2016 0:01 |
| 4 | 10301696 | Note by Note: The Making of Steinway L1037 (2007) | http://www.nytimes.com/2007/11/07/movies/07ste... | 8 | 2 | walterbell | 9/30/2015 4:12 |


## Extracting Ask HN, Show HN, and other posts

We separate the posts by checking whether each title starts with Ask HN or Show HN (ignoring case) and collect them into three lists: Ask HN posts, Show HN posts, and all other posts.

In [17]:
# collect posts into different lists

ask_posts = []
show_posts = []
other_posts = []

for row in hn: # loop through hn to separate the posts
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

# check the number of posts in each list        

print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

1744
1162
17194
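To make the matching rule easier to see in isolation, here is a minimal sketch that wraps the same prefix check in a small function. The name `classify_title` and the sample titles (other than "Interactive Dynamic Video") are made up for illustration and are not part of the original notebook.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title's prefix."""
    lowered = title.lower()  # makes the check case-insensitive
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'


# Hypothetical titles chosen to exercise each branch.
sample_titles = [
    'Ask HN: How do you learn a new language?',
    'SHOW HN: My weekend project',
    'Interactive Dynamic Video',
    'Why I ask HN for advice',
]

labels = [classify_title(title) for title in sample_titles]
print(labels)  # ['ask', 'show', 'other', 'other']
```

The last title shows why `startswith` is used rather than a substring test: a title that merely mentions "ask HN" somewhere in the middle is not counted as an Ask HN post.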


## Calculate which posts receive more comments

In [26]:
# calculate the average comments per Ask HN post:

total_ask_comments = 0

for row in ask_posts: # sum the number of all comments
    comments = int(row[4])
    total_ask_comments += comments

print(total_ask_comments)
avg_ask_comments = total_ask_comments/len(ask_posts) # calculate the average number of comments

print(round(avg_ask_comments))

24483
14


In [27]:
# calculate the average comments per Show HN post:

total_show_comments = 0

for row in show_posts: # sum the number of all comments
    comments = int(row[4])
    total_show_comments += comments

print(total_show_comments)
avg_show_comments = total_show_comments/len(show_posts) # calculate the average number of comments

print(round(avg_show_comments))

11988
10


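Ask HN posts receive more comments on average than Show HN posts: about 14 per post (24,483 / 1,744 ≈ 14.0) versus about 10 per post (11,988 / 1,162 ≈ 10.3). Keep in mind that the dataset excludes submissions with no comments, so these are averages among posts that received at least one comment.

This is not too surprising, since asking a question is a natural way to invite replies.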

We'll focus our further analysis just on the Ask HN posts.
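The two averaging loops above follow the same pattern, so they could be folded into a single helper. The sketch below is one possible refactoring, not the notebook's own code: the name `average_comments` and the two sample rows are hypothetical, and the column position (index 4 for `num_comments`) is taken from the header row printed earlier.

```python
def average_comments(posts, comments_index=4):
    """Return the mean number of comments across a list of post rows."""
    if not posts:
        return 0.0  # avoid dividing by zero on an empty list
    # Values come from the CSV as strings, so convert before summing.
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# Made-up rows using the same column order as the dataset:
# id, title, url, num_points, num_comments, author, created_at
sample_posts = [
    ['1', 'Ask HN: Example question one', '', '5', '6', 'user_a', '1/1/2016 10:00'],
    ['2', 'Ask HN: Example question two', '', '3', '9', 'user_b', '1/2/2016 15:30'],
]

print(average_comments(sample_posts))  # 7.5
```

With the real lists, `average_comments(ask_posts)` and `average_comments(show_posts)` would replace the two separate loops.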