# Exploring Hacker News Posts
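Hacker News is a popular website in the technology industry. Users submit posts and comment on those posts. Posts whose titles begin with "Ask HN" ask the community a question, and posts whose titles begin with "Show HN" share a project, product, or something the author made.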

Today I will explore a dataset of <a href="https://www.kaggle.com/hacker-news/hacker-news-posts">Hacker News posts</a> from Kaggle.com. I will use data analysis to answer two questions:
<ul><li>Do "Ask HN" posts or "Show HN" posts receive more comments on average?</li><li>Is there a time of day during which "Ask HN" posts receive more comments on average?</li></ul>

#### Assumptions:
I will exclude posts that received zero comments, since they generated no discussion to analyze. This cuts the dataset from 293,119 posts to 80,401 and keeps only posts that attracted at least one comment. As a result, every average reported below is computed over commented posts only.

In [1]:
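from csv import reader

# Read the whole CSV into a list of rows; the first row is the header.
# The with block closes the file once it has been read.
with open('HN_posts_year_to_Sep_26_2016.csv', newline='') as open_file:
    hn = list(reader(open_file))

hn[:5]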

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
 ['12579008',
  'You have two days to comment if you want stem cells to be classified as your own',
  'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018',
  '1',
  '0',
  'altstar',
  '9/26/2016 3:26'],
 ['12579005',
  'SQLAR  the SQLite Archiver',
  'https://www.sqlite.org/sqlar/doc/trunk/README.md',
  '1',
  '0',
  'blacksqr',
  '9/26/2016 3:24'],
 ['12578997',
  'What if we just printed a flatscreen television on the side of our boxes?',
  'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43',
  '1',
  '0',
  'pavel_lishin',
  '9/26/2016 3:19'],
 ['12578989',
  'algorithmic music',
  'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext',
  '1',
  '0',
  'poindontcare',
  '9/26/2016 3:16']]

In [2]:
len(hn)  # includes the header row, so there are 293,119 posts

293120

In [4]:
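# Despite its name, hn_no_comment collects the posts that DO have comments:
# rows whose num_comments value (column 4) is non-zero. hn[1:] skips the header.
hn_no_comment = []

for row in hn[1:]:
    if int(row[4]) != 0:
        hn_no_comment.append(row)

len(hn_no_comment)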

80401
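The same filter can be written as a single list comprehension. This is a minimal sketch: the three sample rows are made up for illustration, and `posts_with_comments` and `NUM_COMMENTS` are hypothetical names, not part of the notebook. Against the real data the equivalent line would be `[row for row in hn[1:] if int(row[4]) > 0]`.

```python
# Hypothetical sample rows in the same column order as the CSV:
# id, title, url, num_points, num_comments, author, created_at
rows = [
    ['1', 'Ask HN: Example question?', '', '3', '5', 'alice', '9/26/2016 3:00'],
    ['2', 'Some link', 'http://example.com', '1', '0', 'bob', '9/26/2016 2:00'],
    ['3', 'Show HN: My project', 'http://example.org', '7', '2', 'carol', '9/25/2016 15:30'],
]

NUM_COMMENTS = 4  # index of the num_comments column

# Keep only posts with at least one comment (same rule as the loop above).
posts_with_comments = [row for row in rows if int(row[NUM_COMMENTS]) > 0]

print(len(posts_with_comments))  # 2
```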

In [5]:
hn_no_comment[:5]

[['12578975',
  'Saving the Hassle of Shopping',
  'https://blog.menswr.com/2016/09/07/whats-new-with-your-style-feed/',
  '1',
  '1',
  'bdoux',
  '9/26/2016 3:13'],
 ['12578908',
  'Ask HN: What TLD do you use for local development?',
  '',
  '4',
  '7',
  'Sevrene',
  '9/26/2016 2:53'],
 ['12578822',
  'Amazons Algorithms Dont Find You the Best Deals',
  'https://www.technologyreview.com/s/602442/amazons-algorithms-dont-find-you-the-best-deals/',
  '1',
  '1',
  'yarapavan',
  '9/26/2016 2:26'],
 ['12578694',
  'Emergency dose of epinephrine that does not cost an arm and a leg',
  'http://m.imgur.com/gallery/th6Ua',
  '2',
  '1',
  'dredmorbius',
  '9/26/2016 1:54'],
 ['12578624',
  'Phone Makers Could Cut Off Drivers. So Why Dont They?',
  'http://www.nytimes.com/2016/09/25/technology/phone-makers-could-cut-off-drivers-so-why-dont-they.html',
  '4',
  '1',
  'danso',
  '9/26/2016 1:37']]

In [6]:
headers = hn[0]
print(headers)

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
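The cells in this notebook index columns by position (`row[1]`, `row[4]`, `row[-1]`). One way to make those lookups self-describing is to turn the header row into a name-to-index table. The `col` dictionary below is a hypothetical helper, not something the notebook defines; the sample row is copied from the output above.

```python
headers = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

# Map each column name to its position so rows can be indexed by name
# instead of by a hard-coded number such as row[4].
col = {name: i for i, name in enumerate(headers)}

row = ['12578908', 'Ask HN: What TLD do you use for local development?', '',
       '4', '7', 'Sevrene', '9/26/2016 2:53']

print(col['num_comments'])       # 4
print(row[col['num_comments']])  # 7
print(row[col['created_at']])    # 9/26/2016 2:53
```

The standard library's `csv.DictReader` is an alternative: it yields each row as a dictionary keyed by the header names, at the cost of slightly more memory per row.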


In [30]:
ask_posts = []
show_posts = []
other_posts = []

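for row in hn_no_comment:
    title = row[1].lower()
    # Look for the prefix, case-insensitively, within the first ten
    # characters of the title, e.g. "Ask HN: ..." or "Show HN: ...".
    if 'ask hn' in title[:10]:
        ask_posts.append(row)
    elif 'show hn' in title[:10]:
        show_posts.append(row)
    else:
        other_posts.append(row)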

length_ask = len(ask_posts)
length_show = len(show_posts)
length_other = len(other_posts)
print('Number of Ask HN Posts:  ', length_ask)
print('Number of Show HN Posts: ', length_show)
print('Number of Other Posts:   ', length_other)
print('\n')
if (length_ask + length_show + length_other) == len(hn_no_comment):
    print(True)

Number of Ask HN Posts:   6913
Number of Show HN Posts:  5059
Number of Other Posts:    68429


True
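The notebook's check (`'ask hn' in title[:10]`) matches the phrase anywhere in the first ten characters of a title. A stricter alternative is `str.startswith`, which only matches at position 0. The sketch below wraps that rule in a hypothetical `classify` helper; because the rule is stricter, applying it to the full dataset could give slightly different counts than the ones printed above.

```python
def classify(title):
    """Return 'ask', 'show', or 'other' for a post title (hypothetical helper).

    Uses str.startswith, so the prefix must begin the title. The notebook's
    check ('ask hn' in title[:10]) would also accept e.g. 'Re: Ask HN ...'.
    """
    t = title.lower()
    if t.startswith('ask hn'):
        return 'ask'
    if t.startswith('show hn'):
        return 'show'
    return 'other'

print(classify('Ask HN: What TLD do you use for local development?'))  # ask
print(classify('Show HN: My project'))                                  # show
print(classify('Saving the Hassle of Shopping'))                        # other
print(classify('Re: Ask HN something'))  # other (the notebook's check says ask)
```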


In [33]:
total_ask_comments = 0
total_show_comments = 0

for row in ask_posts:
    comments = row[4]
    total_ask_comments += int(comments)
    
for row in show_posts:
    comments = row[4]
    total_show_comments += int(comments)
    
avg_ask_comments = round(total_ask_comments/length_ask, 2)
avg_show_comments = round(total_show_comments/length_show, 2)

print('Average Number of Comments per Ask HN Post:  ', avg_ask_comments)
print('Average Number of Comments per Show HN Post: ', avg_show_comments)

Average Number of Comments per Ask HN Post:   13.74
Average Number of Comments per Show HN Post:  9.81
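Each average above is the total number of comments divided by the number of posts in the group. The standard library's `statistics.mean` computes the same quantity directly from the comment counts. This is a sketch with made-up rows where only column 4 (`num_comments`) is filled in; `avg_comments` is a hypothetical helper.

```python
from statistics import mean

def avg_comments(posts, num_comments_idx=4):
    """Mean of the num_comments column, rounded to 2 places (hypothetical helper)."""
    return round(mean(int(row[num_comments_idx]) for row in posts), 2)

# Made-up rows; only column 4 (num_comments) matters here.
sample_ask = [['', '', '', '', '7'], ['', '', '', '', '10'], ['', '', '', '', '5']]
sample_show = [['', '', '', '', '1'], ['', '', '', '', '2']]

print(avg_comments(sample_ask))   # 7.33
print(avg_comments(sample_show))  # 1.5
```

On the notebook's data, `avg_comments(ask_posts)` and `avg_comments(show_posts)` would replace the two accumulation loops. Note that `statistics.mean` raises `StatisticsError` on an empty list, where the manual version raises `ZeroDivisionError`.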


In [81]:
import datetime as dt

result_list = []

# Pair each Ask HN post's creation time (created_at) with its comment count.
for row in ask_posts:
    date = row[-1]
    comments = int(row[4])
    result_list.append([date, comments])

# Per-hour totals, keyed by the two-digit hour string '00' to '23'.
counts_by_hour = {}    # hour -> number of Ask HN posts
comments_by_hour = {}  # hour -> total comments on those posts

for row in result_list:
    dt_object = dt.datetime.strptime(row[0], '%m/%d/%Y %H:%M')
    hour = dt_object.strftime('%H')
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]
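The `if hour not in counts_by_hour` branch exists only to initialize a key the first time an hour appears. `collections.defaultdict(int)` starts every missing key at 0, so the loop reduces to two increments. This sketch uses three made-up `(created_at, num_comments)` pairs in the dataset's date format.

```python
import datetime as dt
from collections import defaultdict

# Hypothetical (created_at, num_comments) pairs in the dataset's date format.
result_list = [
    ['9/26/2016 15:10', 20],
    ['9/25/2016 15:45', 40],
    ['9/24/2016 9:05', 3],
]

counts_by_hour = defaultdict(int)    # hour -> number of posts
comments_by_hour = defaultdict(int)  # hour -> total comments

for created_at, comments in result_list:
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').strftime('%H')
    counts_by_hour[hour] += 1
    comments_by_hour[hour] += comments

print(dict(counts_by_hour))    # {'15': 2, '09': 1}
print(dict(comments_by_hour))  # {'15': 60, '09': 3}
```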

In [82]:
avg_by_hour = []

# Average comments per post for each hour, stored as [average, hour]
# so that sorting orders the entries by average first.
for hour in counts_by_hour:
    avg_by_hour.append([round(comments_by_hour[hour] / counts_by_hour[hour], 2), hour])

sorted_avg = sorted(avg_by_hour, reverse=True)

print('Top 5 Hours for Ask Posts Comments')
for avg, hour in sorted_avg[:5]:
    print(f'{hour}:00: {avg} average comments per post.')

Top 5 Hours for Ask Posts Comments
15:00: 39.59 average comments per post.
13:00: 22.22 average comments per post.
12:00: 15.45 average comments per post.
10:00: 13.76 average comments per post.
17:00: 13.73 average comments per post.
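Storing each entry as `[average, hour]` makes `sorted` order by the average because lists compare element by element. An alternative is to keep a plain `hour -> average` dictionary and pass a `key` function, which makes the sort criterion explicit. The values below are made up for illustration.

```python
# Hypothetical hour -> average comments mapping (values made up).
avg_by_hour = {'09': 5.5, '13': 22.0, '15': 40.0, '20': 8.25}

# Sort hours by their average, highest first, and keep the top three.
top = sorted(avg_by_hour.items(), key=lambda item: item[1], reverse=True)[:3]

for hour, avg in top:
    print(f'{hour}:00: {avg:.2f} average comments per post.')
```

For large inputs, `heapq.nlargest(3, avg_by_hour.items(), key=lambda item: item[1])` returns the same top entries without sorting the whole list.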


## Conclusion
Among posts with at least one comment, "Ask HN" posts averaged 13.74 comments per post, compared with 9.81 for "Show HN" posts.

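Among "Ask HN" posts with at least one comment, those created during the following hours received the most comments on average (each hour covers its full 60 minutes, so 3:00 pm means posts created between 3:00 and 3:59 pm):
<ol><li>3:00 pm (39.59 comments per post)</li><li>1:00 pm (22.22)</li><li>12:00 pm (15.45)</li><li>10:00 am (13.76)</li><li>5:00 pm (13.73)</li></ol>
These are historical averages from this dataset, so posting in these hours suggests, but does not guarantee, more comments.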

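All times are in US Eastern Time (ET). Because the dataset spans September 2015 to September 2016, it covers both standard time (EST) and daylight saving time (EDT).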
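The conclusion translates the 24-hour labels printed by the notebook (such as `15:00`) into 12-hour times. One way to do that conversion with the standard library is to parse the hour with `strptime` and reformat it with `%I` (12-hour clock) and `%p` (AM/PM). `to_12_hour` is a hypothetical helper; note that `%p` is locale-dependent and yields `AM`/`PM` in the default C locale.

```python
import datetime as dt

def to_12_hour(hour):
    """Convert a 24-hour label such as '15' to '3:00 pm' (hypothetical helper)."""
    t = dt.datetime.strptime(hour, '%H')
    # %I is zero-padded ('03'), so strip the leading zero before lowercasing.
    return t.strftime('%I:%M %p').lstrip('0').lower()

for hour in ['15', '13', '12', '10', '17']:
    print(to_12_hour(hour))
```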