# Exploring Hacker News Posts

Setup: Hacker News is a website where user-submitted stories are voted and commented on, similar to Reddit.

Objective: Explore the posts to determine whether Ask HN or Show HN posts receive more comments on average, and whether posts created at certain times of day receive more comments on average.

## Removing Headers from a List of Lists

In [1]:
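```python
from csv import reader
import datetime as dt

# Read the whole CSV file into a list of lists (one inner list per row);
# the with-block closes the file, and newline='' is what the csv module expects
with open('hacker_news.csv', newline='') as f:
    hn = list(reader(f))

# Separate the header row from the data rows
headers = hn[0]
hn = hn[1:]
print(hn[:10])
```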

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12'], ['10482257', 'Title II kills investment? Comcast and other ISPs are now spending more', 'http
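Indexing rows by position (`row[1]`, `row[4]`) works, but it ties the code to the column order. A minimal alternative sketch uses the standard library's `csv.DictReader`, which consumes the header row itself and returns each row as a dictionary keyed by column name. The column names other than `num_comments` (`id`, `title`, `url`, `num_points`, `author`, `created_at`) are assumptions for illustration; the two sample rows are copied from the output above, and `load_posts` is a hypothetical helper.

```python
import csv
import io

# Hypothetical sample: header names other than num_comments are assumed,
# and the two data rows are copied from the printed output above.
SAMPLE_CSV = """id,title,url,num_points,num_comments,author,created_at
12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52
11919867,Technology ventures: From Idea to Enterprise,https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429,3,1,hswarna,6/17/2016 0:01
"""

def load_posts(file_obj):
    """Read a CSV file object into a list of dicts keyed by the header row."""
    # DictReader treats the first row as field names, so no manual slicing is needed
    return list(csv.DictReader(file_obj))

# With the real file, pass the handle from open('hacker_news.csv', newline='') instead
posts = load_posts(io.StringIO(SAMPLE_CSV))
for post in posts:
    print(post['title'], int(post['num_comments']))
```

All values stay strings, exactly as with `csv.reader`, so numeric columns still need an explicit `int()` or `float()`.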

## Extracting Ask HN and Show HN Posts

In [2]:
```python
# Sort each post into Ask HN, Show HN, or other, based on the
# (lowercased) title text before the first colon
ask_posts = []
show_posts = []
other_posts = []
for row in hn:
    title = row[1]
    post_handle = title.split(':')[0].lower()
    if post_handle == 'ask hn':
        ask_posts.append(row)
    elif post_handle == 'show hn':
        show_posts.append(row)
    else:
        other_posts.append(row)

print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))
```

1738
1162
17200
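The split-on-colon test above only matches titles whose prefix is immediately followed by a colon. A common alternative is `str.startswith`, sketched below with made-up titles. It would also catch prefixes such as `Ask HN -`, but it could equally match a title beginning `Ask HNers`, so the counts above might shift slightly if the dataset were re-classified this way. The function name `classify_title` is hypothetical.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix (case-insensitive)."""
    normalized = title.lower()
    if normalized.startswith('ask hn'):
        return 'ask'
    if normalized.startswith('show hn'):
        return 'show'
    return 'other'

# Hypothetical titles chosen to show where the two approaches can disagree
titles = [
    'Ask HN: How do you stay focused?',
    'Ask HN - Best books on compilers',   # no colon, so split(':') would label it "other"
    'Show HN: A tiny static site generator',
    'Interactive Dynamic Video',
]
for title in titles:
    print(classify_title(title), '|', title)
```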


## Calculating the Average Number of Comments for Ask HN and Show HN Posts

In [3]:
```python
# Look up the comment-count column by name rather than hard-coding its position
index_num_comments = headers.index('num_comments')

ask_num_comments = []
for row in ask_posts:
    ask_num_comments.append(float(row[index_num_comments]))
avg_ask_num_comments = sum(ask_num_comments) / len(ask_num_comments)

show_num_comments = []
for row in show_posts:
    show_num_comments.append(float(row[index_num_comments]))
avg_show_num_comments = sum(show_num_comments) / len(show_num_comments)

print(avg_ask_num_comments)
print(avg_show_num_comments)
```

14.06674338319908
10.31669535283993
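The two averaging loops above repeat the same pattern. One way to factor it out, sketched here, is a small helper built on `statistics.fmean` (Python 3.8+). The name `average_comments` and the sample rows are hypothetical, and the guard against an empty list is an added assumption: the original code would raise `ZeroDivisionError` in that case.

```python
from statistics import fmean

def average_comments(rows, comments_index):
    """Mean of the comment-count column across rows (each row a list of strings)."""
    if not rows:
        raise ValueError('cannot average an empty list of posts')
    return fmean(int(row[comments_index]) for row in rows)

# Hypothetical rows in the dataset's column order (num_comments at index 4)
sample_ask = [
    ['1', 'Ask HN: A', '', '5', '10', 'alice', '8/4/2016 11:52'],
    ['2', 'Ask HN: B', '', '2', '4', 'bob', '8/4/2016 12:10'],
]
print(average_comments(sample_ask, 4))
```

With the notebook's variables, the two averages would become `average_comments(ask_posts, index_num_comments)` and `average_comments(show_posts, index_num_comments)`.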


## Finding the Number of Ask HN Posts and Comments by Hour Created

In [4]:
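```python
import itertools

posts_per_hour = {}     # hour -> number of Ask HN posts created in that hour
comments_per_hour = {}  # hour -> total comments on those posts
for row in ask_posts:
    # The creation timestamp is the last column, e.g. '8/4/2016 11:52'
    created_at = dt.datetime.strptime(row[-1], '%m/%d/%Y %H:%M')
    hour = created_at.hour
    posts_per_hour[hour] = posts_per_hour.get(hour, 0) + 1
    comments_per_hour[hour] = comments_per_hour.get(hour, 0) + float(row[index_num_comments])

# Sort both dictionaries by hour (dicts keep insertion order in Python 3.7+)
posts_per_hour = dict(sorted(posts_per_hour.items()))
comments_per_hour = dict(sorted(comments_per_hour.items()))

def glance(d, n=3):
    """Return the first n items of a dictionary, for a quick preview."""
    return dict(itertools.islice(d.items(), n))

print(glance(posts_per_hour))
print(glance(comments_per_hour))
```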

{0: 54, 1: 59, 2: 58}
{0: 443.0, 1: 662.0, 2: 1381.0}
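The per-hour bookkeeping in the cell above can also be expressed with `collections.defaultdict`, which supplies a zero for keys it has not seen yet. This is a sketch under the assumption that rows keep the dataset's column order (comment count at index 4, timestamp in the last column); the function `counts_by_hour` and the sample rows are hypothetical.

```python
import datetime as dt
from collections import defaultdict

def counts_by_hour(rows, comments_index=4, date_format='%m/%d/%Y %H:%M'):
    """Return (posts per hour, total comments per hour), both sorted by hour."""
    posts = defaultdict(int)
    comments = defaultdict(int)
    for row in rows:
        hour = dt.datetime.strptime(row[-1], date_format).hour
        posts[hour] += 1
        comments[hour] += int(row[comments_index])
    return dict(sorted(posts.items())), dict(sorted(comments.items()))

# Hypothetical rows; only the comment count (index 4) and timestamp (last column) matter
sample = [
    ['1', 'Ask HN: A', '', '5', '10', 'alice', '8/4/2016 0:15'],
    ['2', 'Ask HN: B', '', '2', '4', 'bob', '8/4/2016 0:40'],
    ['3', 'Ask HN: C', '', '1', '7', 'carol', '8/5/2016 13:05'],
]
posts_by_hour, comments_by_hour = counts_by_hour(sample)
print(posts_by_hour)     # {0: 2, 13: 1}
print(comments_by_hour)  # {0: 14, 13: 7}
```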


## Calculating the Average Number of Comments for Ask HN Posts by Hour

In [5]:
```python
# Average comments per post for each hour = total comments / number of posts
avg_comments_per_hour = {
    hour: comments_per_hour[hour] / posts_per_hour[hour]
    for hour in posts_per_hour
}

print(glance(avg_comments_per_hour))
```

{0: 8.203703703703704, 1: 11.220338983050848, 2: 23.810344827586206}
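Written out, the quantity computed above for each hour $h$ is the total number of comments $C_h$ (`comments_per_hour[h]`) divided by the number of Ask HN posts $P_h$ (`posts_per_hour[h]`) created in that hour. Plugging in the hour-0 values printed earlier reproduces the first entry:

$$
\bar{c}_h = \frac{C_h}{P_h}, \qquad \bar{c}_0 = \frac{443}{54} \approx 8.20
$$

So $\bar{c}_h$ is an average per post created in hour $h$, not a count of comments made during that hour.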


## Sorting Hours by Average Number of Comments per Post

In [6]:
```python
# Sort the (hour, average) pairs by average, highest first.
# Sorting the items directly avoids inverting the dictionary, which would
# silently drop hours that happen to share the same average.
sorted_avg_comments = sorted(avg_comments_per_hour.items(), key=lambda item: item[1], reverse=True)
for hour, avg in sorted_avg_comments:
    print(f'{hour}:00: {avg:.2f} average comments per post')
```

15:00: 38.59 average comments per post
2:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 17.08 average comments per post
21:00: 16.01 average comments per post
13:00: 14.74 average comments per post
10:00: 13.44 average comments per post
14:00: 13.23 average comments per post
18:00: 13.20 average comments per post
17:00: 11.55 average comments per post
1:00: 11.22 average comments per post
11:00: 11.05 average comments per post
19:00: 10.86 average comments per post
8:00: 10.25 average comments per post
5:00: 10.09 average comments per post
12:00: 9.41 average comments per post
6:00: 9.02 average comments per post
0:00: 8.20 average comments per post
23:00: 7.99 average comments per post
7:00: 7.85 average comments per post
3:00: 7.80 average comments per post
4:00: 7.17 average comments per post
22:00: 6.75 average comments per post
9:00: 5.58 average comments per post
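If only the best few hours are of interest, `heapq.nlargest` with a `key` function returns them directly from `dict.items()` without inverting the dictionary (which would silently drop hours that share the same average). The sketch below uses a hand-copied subset of the averages above, rounded to two decimals; `top_hours` is a hypothetical helper, and `datetime.time(hour).strftime('%H:%M')` is just one way to print the hour as a clock time.

```python
import datetime as dt
import heapq

# A subset of the hourly averages printed above (hour -> average comments per post)
avg_by_hour = {15: 38.59, 2: 23.81, 20: 21.52, 16: 17.08, 21: 16.01, 9: 5.58}

def top_hours(averages, n=5):
    """Return the n (hour, average) pairs with the highest averages, best first."""
    return heapq.nlargest(n, averages.items(), key=lambda item: item[1])

for hour, avg in top_hours(avg_by_hour, n=3):
    # time(hour) builds a time of day so the hour prints as HH:MM
    print(f"{dt.time(hour).strftime('%H:%M')}: {avg:.2f} average comments per post")
```

For a 24-entry dictionary a full `sorted` is just as practical; `nlargest` mainly makes the "top N" intent explicit.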
