### Exploring Hacker News Posts

In this project, I'll work with a dataset of submissions to Hacker News, a popular technology site started by the startup incubator Y Combinator. User-submitted stories (known as "posts") receive votes and comments, similar to Reddit.

I'm specifically interested in posts with titles that begin with either Ask HN or Show HN. Users submit Ask HN posts to ask the Hacker News community a specific question. Below are a few examples:

##### Example
    Ask HN: How to improve my personal website?
    Ask HN: Am I the only one outraged by Twitter shutting down share counts?
    Ask HN: Aby recent changes to CSS that broke mobile?

Likewise, users submit Show HN posts to show the Hacker News community a project, product, or just something interesting. Below are a few examples:

##### Example
    Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform
    Show HN: Something pointless I made
    Show HN: Shanhu.io, a programming playground powered by e8vm

I'll compare these two types of posts to determine the following:

    Do Ask HN or Show HN posts receive more comments on average?
    Do posts created at a certain time receive more comments on average?

In [1]:
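from csv import reader

# Read the dataset into a list of lists, then close the file
opened_file = open('hacker_news.csv')
read_file = reader(opened_file)
hn = list(read_file)
opened_file.close()

# Separate the header row from the data rows
headers = hn[0]
hn = hn[1:]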

print(headers)
print('\n')
print(hn[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]
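
The later cells index rows by position (`row[1]`, `post[4]`, `post[6]`). The short sketch below, built from the header row printed above, shows where those positions come from. The `col` dictionary is a hypothetical helper for illustration and is not part of the original notebook.

```python
# Header row copied from the output above.
headers = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

# Hypothetical helper: map each column name to its position in a row.
col = {name: index for index, name in enumerate(headers)}

# Positions of the three columns used in the rest of the analysis.
print(col['title'], col['num_comments'], col['created_at'])
```

Looking positions up by name is one way to avoid hard-coded indices, although the notebook itself uses the literal numbers.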


In [2]:
ask_posts = []
show_posts = []
other_posts = []

# Loop through each row in hn
for row in hn:
    title = row[1]  # Assign the title in each row to a variable named title
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
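
To make the prefix test easier to check in isolation, the same logic can be sketched as a small function. `classify_title` is a hypothetical name introduced here, and the sample titles are taken from the examples and data shown earlier.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title's prefix."""
    lowered = title.lower()  # make the comparison case-insensitive
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'


sample_titles = [
    'Ask HN: How to improve my personal website?',
    'Show HN: Something pointless I made',
    'Interactive Dynamic Video',
]

for title in sample_titles:
    print(classify_title(title), '|', title)
```

Lowercasing first means a title such as `ASK HN: ...` would still be counted as an Ask HN post, which matches the behavior of the loop above.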

In [3]:
ask_posts[:6]

[['12296411',
  'Ask HN: How to improve my personal website?',
  '',
  '2',
  '6',
  'ahmedbaracat',
  '8/16/2016 9:55'],
 ['10610020',
  'Ask HN: Am I the only one outraged by Twitter shutting down share counts?',
  '',
  '28',
  '29',
  'tkfx',
  '11/22/2015 13:43'],
 ['11610310',
  'Ask HN: Aby recent changes to CSS that broke mobile?',
  '',
  '1',
  '1',
  'polskibus',
  '5/2/2016 10:14'],
 ['12210105',
  'Ask HN: Looking for Employee #3 How do I do it?',
  '',
  '1',
  '3',
  'sph130',
  '8/2/2016 14:20'],
 ['10394168',
  'Ask HN: Someone offered to buy my browser extension from me. What now?',
  '',
  '28',
  '17',
  'roykolak',
  '10/15/2015 16:38'],
 ['10284812',
  'Ask HN: Limiting CPU, memory, and I/O usage on a program for testing',
  '',
  '2',
  '1',
  'zatkin',
  '9/26/2015 23:23']]

In [4]:
total_ask_comments = 0
for post in ask_posts:
    num_comments = int(post[4])
    total_ask_comments += num_comments
print(total_ask_comments)

24483


In [5]:
if len(ask_posts) > 0:
    avg_ask_comments = total_ask_comments/len(ask_posts)
else:
    avg_ask_comments = 0
print(avg_ask_comments)

14.038417431192661


In [6]:
total_show_comments = 0
for post in show_posts:
    num_comments = int(post[4])
    total_show_comments += num_comments
print(total_show_comments)

11988


In [7]:
if len(show_posts) > 0:
    avg_show_comments = total_show_comments / len(show_posts)
else:
    avg_show_comments = 0

print(avg_show_comments)

10.31669535283993


The averages above answer the first question: Ask HN posts receive about 14 comments on average, while Show HN posts receive about 10. Because Ask HN posts draw more comments, I'll focus the rest of the analysis on them.
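
The comparison can also be made explicit in code. This is a minimal sketch that assumes the two averages printed above; `describe_comparison` is a hypothetical helper, not a cell from the original notebook.

```python
# Averages copied from the outputs of the cells above.
avg_ask_comments = 14.038417431192661
avg_show_comments = 10.31669535283993


def describe_comparison(avg_ask, avg_show):
    """Hypothetical helper: say which post type draws more comments."""
    if avg_ask > avg_show:
        winner = 'Ask HN'
    elif avg_show > avg_ask:
        winner = 'Show HN'
    else:
        return 'Ask HN and Show HN posts receive the same number of comments on average.'
    return (f'{winner} posts receive more comments on average '
            f'(Ask HN: {avg_ask:.2f}, Show HN: {avg_show:.2f}).')


message = describe_comparison(avg_ask_comments, avg_show_comments)
print(message)
```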

In [10]:
import datetime as dt
result_list = []
for post in ask_posts:
    created_at = post[6]
    num_comments = int(post[4])
    result_list.append([created_at, num_comments])

print(result_list[:5])

[['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29], ['5/2/2016 10:14', 1], ['8/2/2016 14:20', 3], ['10/15/2015 16:38', 17]]
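
The `created_at` strings follow a month/day/year hour:minute layout, so they can be parsed with `datetime.strptime` and the format `"%m/%d/%Y %H:%M"`. The sketch below parses the first entry of `result_list` to show what the parsed object and the extracted hour look like.

```python
import datetime as dt

created_at = '8/16/2016 9:55'  # first entry in result_list above

# Parse the string into a datetime object.
date_obj = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M")

# Format only the hour as a zero-padded, two-character string.
hour = date_obj.strftime("%H")

print(date_obj)
print(hour)
```

Because `%H` zero-pads the hour, 9 a.m. becomes the string `'09'`, which is why the dictionary keys in the next cell are two-character strings.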


In [16]:
counts_by_hour = {}
comments_by_hour = {}
for row in result_list:
    created_at = row[0]
    num_comments = row[1]
    date_obj = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M")
    hour = date_obj.strftime("%H")

    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = num_comments
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += num_comments

print(counts_by_hour)
print(comments_by_hour)

{'09': 45, '13': 85, '10': 59, '14': 107, '16': 108, '23': 68, '12': 73, '17': 100, '15': 116, '21': 109, '20': 80, '02': 58, '18': 109, '03': 54, '05': 46, '19': 110, '01': 60, '22': 71, '08': 48, '04': 47, '00': 55, '06': 44, '07': 34, '11': 58}
{'09': 251, '13': 1253, '10': 793, '14': 1416, '16': 1814, '23': 543, '12': 687, '17': 1146, '15': 4477, '21': 1745, '20': 1722, '02': 1381, '18': 1439, '03': 421, '05': 464, '19': 1188, '01': 683, '22': 479, '08': 492, '04': 337, '00': 447, '06': 397, '07': 267, '11': 641}
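
With both dictionaries in hand, the second question can be approached by dividing total comments by post count for each hour. The following is a sketch of one way to do that, using the two dictionaries printed above; the names `avg_by_hour` and `top_hours` are assumptions introduced for illustration.

```python
# Dictionaries copied from the output above.
counts_by_hour = {'09': 45, '13': 85, '10': 59, '14': 107, '16': 108, '23': 68, '12': 73, '17': 100, '15': 116, '21': 109, '20': 80, '02': 58, '18': 109, '03': 54, '05': 46, '19': 110, '01': 60, '22': 71, '08': 48, '04': 47, '00': 55, '06': 44, '07': 34, '11': 58}
comments_by_hour = {'09': 251, '13': 1253, '10': 793, '14': 1416, '16': 1814, '23': 543, '12': 687, '17': 1146, '15': 4477, '21': 1745, '20': 1722, '02': 1381, '18': 1439, '03': 421, '05': 464, '19': 1188, '01': 683, '22': 479, '08': 492, '04': 337, '00': 447, '06': 397, '07': 267, '11': 641}

# Average number of comments per Ask HN post for each hour.
avg_by_hour = [
    [hour, comments_by_hour[hour] / counts_by_hour[hour]]
    for hour in counts_by_hour
]

# Sort by the average, highest first, and keep the top five hours.
top_hours = sorted(avg_by_hour, key=lambda pair: pair[1], reverse=True)[:5]

for hour, avg in top_hours:
    print(f'{hour}:00  {avg:.2f} average comments per post')
```

Sorting on the average rather than the raw total keeps busy hours with many low-comment posts from dominating the ranking.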
