# Exploring Hacker News Posts


This project will analyse a data set of [Hacker News](https://news.ycombinator.com/) posts to understand if the types of posts, and the time when they are posted, affect the number of comments they receive on average. 

In [2]:
from csv import reader
opened_file = open('hacker_news.csv', encoding='utf8')
read_file = reader(opened_file)
hn = list(read_file)
print(hn[:4])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26'], ['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24'], ['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19']]


In [3]:
headers = hn[0]
hn = hn[1:]
print(headers)
print('\n')
print(hn[:4])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


[['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26'], ['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24'], ['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19'], ['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16']]


In [4]:
ask_posts = []
show_posts = []
other_posts = []
for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
print(len(ask_posts))
print('\n')
print(len(show_posts))
print('\n')
print(len(other_posts))
print(ask_posts[:4])
print('\n')
print(show_posts[:4])
print('\n')
print(other_posts[:4])
print('\n')

9139


10158


273822
[['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7', 'Sevrene', '9/26/2016 2:53'], ['12578522', 'Ask HN: How do you pass on your work when you die?', '', '6', '3', 'PascLeRasc', '9/26/2016 1:17'], ['12577908', 'Ask HN: How a DNS problem can be limited to a geographic region?', '', '1', '0', 'kuon', '9/25/2016 22:57'], ['12577870', 'Ask HN: Why join a fund when you can be an angel?', '', '1', '3', 'anthony_james', '9/25/2016 22:48']]


[['12578335', 'Show HN: Finding puns computationally', 'http://puns.samueltaylor.org/', '2', '0', 'saamm', '9/26/2016 0:36'], ['12578182', 'Show HN: A simple library for complicated animations', 'https://christinecha.github.io/choreographer-js/', '1', '0', 'christinecha', '9/26/2016 0:01'], ['12578098', 'Show HN: WebGL visualization of DNA sequences', 'http://grondilu.github.io/dna.html', '1', '0', 'grondilu', '9/25/2016 23:44'], ['12577991', 'Show HN: Pomodoro-centric, heirarchical project management wit

In [14]:
total_ask_comments = 0
for post in ask_posts:
    total_ask_comments += int(post[4])
avg_ask_comments = total_ask_comments / len(ask_posts)
print('Average number of comments for "ask" posts is {:2f}'.format(avg_ask_comments))
print('\n')
total_show_comments = 0
for post in show_posts:
    total_show_comments += int(post[4])
avg_show_comments = total_show_comments / len(show_posts)
print('Average number of comments for "show" posts is {:2f}'.format(avg_show_comments))

Average number of comments for "ask" posts is 10.393478


Average number of comments for "show" posts is 4.886100


By examining ask and show posts, we can see that ask posts get, on average, more than twice the number comments that show posts receive. 

In [24]:
import datetime as dt
result_list = []
for post in ask_posts:
    created_at = post[6]
    num_comments = int(post[4])
    result_list.append([created_at, num_comments])
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    created_at = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    hour = created_at.hour
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]
    if time in counts_by_hour:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]

print(counts_by_hour)
print('\n')
print(comments_by_hour)

{2: 1, 1: 1, 22: 1, 21: 1, 19: 1, 17: 1, 15: 1, 14: 1, 13: 1, 11: 1, 10: 1, 9: 1, 7: 1, 3: 1, 23: 1, 20: 1, 16: 1, 8: 1, 0: 1, 18: 1, 12: 1, 4: 1, 6: 1, 5: 1}


{2: 7, 1: 3, 22: 0, 21: 2, 19: 1, 17: 3, 15: 0, 14: 0, 13: 2, 11: 2, 10: 0, 9: 97, 7: 4, 3: 1, 23: 0, 20: 0, 16: 0, 8: 7, 0: 2, 18: 12, 12: 6, 4: 2, 6: 1, 5: 0}


In [25]:
avg_by_hour = []
for hour in comments_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])
print(avg_by_hour)

[[2, 7.0], [1, 3.0], [22, 0.0], [21, 2.0], [19, 1.0], [17, 3.0], [15, 0.0], [14, 0.0], [13, 2.0], [11, 2.0], [10, 0.0], [9, 97.0], [7, 4.0], [3, 1.0], [23, 0.0], [20, 0.0], [16, 0.0], [8, 7.0], [0, 2.0], [18, 12.0], [12, 6.0], [4, 2.0], [6, 1.0], [5, 0.0]]


In [30]:
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])
print(swap_avg_by_hour)

sorted_swap = sorted(swap_avg_by_hour, reverse=True)

print("Top 5 Hours for Ask Posts Comments:")
output = "{}: {:.2f} average comments per post"
for row in sorted_swap[:5]:
    the_time = dt.datetime.strptime(str(row[1]), "%H")
    the_time = the_time.strftime("%H:%M")
    print(output.format(the_time, row[0]))

[[7.0, 2], [3.0, 1], [0.0, 22], [2.0, 21], [1.0, 19], [3.0, 17], [0.0, 15], [0.0, 14], [2.0, 13], [2.0, 11], [0.0, 10], [97.0, 9], [4.0, 7], [1.0, 3], [0.0, 23], [0.0, 20], [0.0, 16], [7.0, 8], [2.0, 0], [12.0, 18], [6.0, 12], [2.0, 4], [1.0, 6], [0.0, 5]]
Top 5 Hours for Ask Posts Comments:
09:00: 97.00 average comments per post
18:00: 12.00 average comments per post
08:00: 7.00 average comments per post
02:00: 7.00 average comments per post
12:00: 6.00 average comments per post
