# Exploring Hacker News Posts
Hacker News is a site that allows users to submit posts and interact with one another, similar to Reddit. The site is popular in technology circles, with top posts drawing thousands of visitors. This analysis explores how certain aspects of a post affect its success. We compare posts whose titles begin with "Ask HN" or "Show HN", tags the community uses to ask a question or show off personal work, and examine the effect these tags have on their respective posts. Two questions guide the analysis:

1.   Do "Ask HN" or "Show HN" posts receive more comments on average?
2.   Do posts created at a certain time receive more comments on average?

**Data**

The source data for this study can be found [here](https://www.kaggle.com/hacker-news/hacker-news-posts). It contains almost 300,000 rows, each representing a post. The posts date from 2015 and 2016. For this study, however, we use a version that has been reduced to approximately 20,000 rows by removing all submissions that did not receive any comments and then randomly sampling from the remaining submissions. This file was prepared by Dataquest.
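The reduction described above can be sketched roughly as follows. This is only an illustration of the idea, not the procedure Dataquest actually used: the function name `reduce_posts`, the sample size, the seed, and the toy rows are all assumptions made for the example.

```python
import random

def reduce_posts(rows, sample_size, seed=0):
    """Drop posts with zero comments, then randomly sample what remains.

    Assumes each row is a list whose element at index 4 holds the
    comment count as a string, matching the column order of hacker_news.csv.
    """
    commented = [row for row in rows if int(row[4]) > 0]
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Never ask for more rows than are available
    return rng.sample(commented, min(sample_size, len(commented)))

# Made-up rows, for illustration only
toy_rows = [
    ['1', 'Post A', '', '10', '0', 'user_a', '1/1/2016 10:00'],
    ['2', 'Post B', '', '5', '3', 'user_b', '1/2/2016 11:00'],
    ['3', 'Post C', '', '7', '12', 'user_c', '1/3/2016 12:00'],
    ['4', 'Post D', '', '1', '0', 'user_d', '1/4/2016 13:00'],
]
sample = reduce_posts(toy_rows, sample_size=2)
print(len(sample))
```

Because uncommented posts are removed before sampling, every average computed in this notebook describes posts that received at least one comment.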

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [0]:
# Import the file and display the header row plus the first four posts
from csv import reader

# The path assumes the Colab Drive mount from the previous cell
with open('/content/drive/My Drive/Colab Notebooks/Hacker News/hacker_news.csv') as opened_file:
    hn = list(reader(opened_file))
print(hn[:5])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


In [0]:
#Isolate header row
headers = hn[0]
hn = hn[1:]
print(headers)
print(hn[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


In [0]:
#Separate posts into categories
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    title = title.lower()
    
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

1744
1162
17194


In [0]:
#Determine which posts receive more comments on average

total_ask_comments = 0
for row in ask_posts:
    num_comments = int(row[4])
    total_ask_comments = total_ask_comments + num_comments

avg_ask_comments = total_ask_comments / len(ask_posts)
print(avg_ask_comments)

total_show_comments = 0
for row in show_posts:
    num_comments = int(row[4])
    total_show_comments = total_show_comments + num_comments
    
avg_show_comments = total_show_comments / len(show_posts)
print(avg_show_comments)
    

14.038417431192661
10.31669535283993


The calculation above shows that "Ask HN" posts receive more comments on average than "Show HN" posts: 14.04 comments per "Ask HN" post versus 10.32 per "Show HN" post. Because "Ask HN" posts draw more comments on average, the rest of the analysis focuses on them.
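The two loops in the cell above perform the same calculation on different lists, so they could be folded into one helper. The sketch below is one possible way to do that; the name `average_comments` and the toy rows are hypothetical and are not part of the original notebook.

```python
def average_comments(posts, comments_index=4):
    """Return the mean comment count for a list of post rows.

    Assumes the comment count is stored as a string at comments_index,
    as in the hn rows used in this notebook.
    """
    if not posts:
        return 0.0  # avoid dividing by zero on an empty category
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)

# Made-up rows, for illustration only
toy_ask = [
    ['1', 'Ask HN: Example question?', '', '3', '4', 'user_a', '1/1/2016 10:00'],
    ['2', 'Ask HN: Another question?', '', '6', '8', 'user_b', '1/2/2016 11:00'],
]
print(average_comments(toy_ask))
```

In the notebook itself, the helper would be called as `average_comments(ask_posts)` and `average_comments(show_posts)`.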

In [0]:
# Count the Ask HN posts created in each hour and total the comments they received.
import datetime as dt
result_list = []

for row in ask_posts:
    created_at = row[6]
    num_comments = int(row[4])
    result_list.append([created_at, num_comments])

counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    date = dt.datetime.strptime(row[0], '%m/%d/%Y %H:%M')
    hour = date.strftime("%H")
    
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]

# Show the total number of comments per hour
print(comments_by_hour)

{'09': 251, '10': 793, '02': 1381, '21': 1745, '23': 543, '16': 1814, '17': 1146, '22': 479, '11': 641, '19': 1188, '07': 267, '08': 492, '04': 337, '05': 464, '13': 1253, '01': 683, '06': 397, '18': 1439, '14': 1416, '03': 421, '20': 1722, '12': 687, '00': 447, '15': 4477}


In [0]:
# Compute the average number of comments per Ask HN post for each hour
avg_by_hour = []

for hour in comments_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])

print(avg_by_hour)
    

[['09', 5.5777777777777775], ['10', 13.440677966101696], ['02', 23.810344827586206], ['21', 16.009174311926607], ['23', 7.985294117647059], ['16', 16.796296296296298], ['17', 11.46], ['22', 6.746478873239437], ['11', 11.051724137931034], ['19', 10.8], ['07', 7.852941176470588], ['08', 10.25], ['04', 7.170212765957447], ['05', 10.08695652173913], ['13', 14.741176470588234], ['01', 11.383333333333333], ['06', 9.022727272727273], ['18', 13.20183486238532], ['14', 13.233644859813085], ['03', 7.796296296296297], ['20', 21.525], ['12', 9.41095890410959], ['00', 8.127272727272727], ['15', 38.5948275862069]]


In [0]:
# Swap each pair to [average, hour] so that sorting orders by average
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

sorted_swap = sorted(swap_avg_by_hour, reverse=True)
print("Top 5 Hours for Ask Posts Comments")
for row in sorted_swap[:5]:
    avg = row[0]
    # Convert the two-digit hour string (e.g. '15') to 'HH:MM' format
    hour = dt.datetime.strptime(row[1], '%H').strftime('%H:%M')
    print("{h} {a:.2f} avg comments per post".format(h=hour, a=avg))

Top 5 Hours for Ask Posts Comments
15:00 38.59 avg comments per post
02:00 23.81 avg comments per post
20:00 21.52 avg comments per post
16:00 16.80 avg comments per post
21:00 16.01 avg comments per post


According to the results shown above, the best time to make an "Ask HN" post appears to be 15:00 (3 PM EST). Posts created in that hour averaged 38.59 comments, well ahead of the next-best hour, 02:00, at 23.81. In this sample, posts created around 15:00 tended to draw the most engagement.
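The ranking above can also be produced without building a swapped list, by passing a `key` to `sorted`. The following is a minimal sketch of that alternative, not the approach the notebook takes; the function name `top_hours` and the toy data are hypothetical.

```python
import datetime as dt
from collections import defaultdict

def top_hours(posts, n=5):
    """Return the n hours with the highest average comments per post.

    posts is an iterable of (created_at, num_comments) pairs, where
    created_at uses the '%m/%d/%Y %H:%M' format found in the dataset.
    """
    counts = defaultdict(int)
    comments = defaultdict(int)
    for created_at, num_comments in posts:
        hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').hour
        counts[hour] += 1
        comments[hour] += num_comments
    averages = {hour: comments[hour] / counts[hour] for hour in counts}
    # Sort (hour, average) pairs by the average, highest first
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]

# Made-up data, for illustration only
toy_posts = [
    ('8/4/2016 15:10', 30),
    ('8/5/2016 15:45', 10),
    ('8/6/2016 2:05', 12),
    ('8/7/2016 9:30', 3),
]
for hour, avg in top_hours(toy_posts, n=2):
    print("{h:02d}:00 {a:.2f} avg comments per post".format(h=hour, a=avg))
```

Using `defaultdict(int)` removes the need for the `if hour not in counts_by_hour` branch, and sorting with a `key` keeps each hour paired with its average in the original order.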