# Exploring Hacker News posts

Hacker News is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") are voted and commented upon, similar to Reddit.

Hacker News is extremely popular in technology and startup circles, and posts that make it to the top of Hacker News' listings can get hundreds of thousands of visitors as a result.

In [1]:
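from csv import reader

# Read the whole CSV into a list of rows; the with block closes the file afterwards
with open('HN_posts_year_to_Sep_26_2016.csv') as opened_file:
    rows = list(reader(opened_file))

hn_data_header = rows[0]  # first row holds the column names
hn_data = rows[1:]        # every remaining row is one post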

print('Header:')
print(hn_data_header,'\n')
print('Data:')
for row in hn_data[0:5]:
    print(row)
print('...','\n')
print('Total:')
print(len(hn_data))

Header:
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'] 

Data:
['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26']
['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24']
['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19']
['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16']
['12578979', 'How the Data Vault Enables the Next-Gen Data Warehouse and Data Lake', 'https://www.talend.com/blog/2016/05/12/talend-and-Â\x93the-data-vaultÂ\x94', '1', '0', 'markgainor1', '9/26/2016 3:14']
... 



#### Column descriptions

| Index | Label        | Description                                                                                                         |
|:-----:|:------------:|:--------------------------------------------------------------------------------------------------------------------|
| 0     | id           | The unique identifier from Hacker News for the post                                                                 |
| 1     | title        | The title of the post                                                                                               |
| 2     | url          | The URL that the post links to, if the post has a URL                                                               |
| 3     | num_points   | The number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes |
| 4     | num_comments | The number of comments that were made on the post                                                                   |
| 5     | author       | The username of the person who submitted the post                                                                   |
| 6     | created_at   | The date and time at which the post was submitted                                                                   |
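
The positional indices in this table are easy to mix up in later cells. As a small sketch (not part of the original analysis), the header row can be turned into a lookup from column label to index. The name `column_index` is hypothetical; the header and the sample row are copied from the output above.

```python
# Header row exactly as printed in the output above
hn_data_header = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

# Hypothetical helper: map each column label to its position in a row
column_index = {label: i for i, label in enumerate(hn_data_header)}

# One sample row copied from the output above
sample_row = ['12579005', 'SQLAR  the SQLite Archiver',
              'https://www.sqlite.org/sqlar/doc/trunk/README.md',
              '1', '0', 'blacksqr', '9/26/2016 3:24']

# Look fields up by name instead of by a bare number
title = sample_row[column_index['title']]
num_comments = int(sample_row[column_index['num_comments']])
print(title, num_comments)
```

With this in place, `row[column_index['num_comments']]` would be interchangeable with `row[4]` in the cells that follow.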

#### Filtering data
Now that we've separated the header row from `hn_data`, we're ready to filter the data. Since we're only interested in posts whose titles begin with Ask HN or Show HN, we'll build a separate list of rows for each group, plus a third list for everything else. Titles are lowercased first so that the match is case-insensitive.

In [2]:
ASK_HN = 'ask hn'
SHOW_HN = 'show hn'

ask_posts = []
show_posts = []
other_posts = []

for row in hn_data:
    # Lowercase the title so 'Ask HN', 'ask hn' and 'ASK HN' all match
    title = row[1].lower()

    if title.startswith(ASK_HN):
        ask_posts.append(row)
    elif title.startswith(SHOW_HN):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print('Ask posts:', len(ask_posts))
print('Show posts:', len(show_posts))
print('Other posts:', len(other_posts))

Ask posts: 9139
Show posts: 10158
Other posts: 273822


Next, let's determine if ask posts or show posts receive more comments on average.

In [3]:
# Sum the num_comments column (index 4) over all Ask HN posts
total_ask_comments = 0
for row in ask_posts:
    total_ask_comments += int(row[4])
avg_ask_comments = total_ask_comments / len(ask_posts)
print('Average num comments in Ask posts:', avg_ask_comments)

# Same calculation for Show HN posts
total_show_comments = 0
for row in show_posts:
    total_show_comments += int(row[4])
avg_show_comments = total_show_comments / len(show_posts)
print('Average num comments in Show posts:', avg_show_comments)


Average num comments in Ask posts: 10.393478498741656
Average num comments in Show posts: 4.886099625910612


On average, Ask HN posts receive more than twice as many comments as Show HN posts (about 10.4 versus 4.9).
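
The two averages are computed by the same loop written twice. One way to avoid the duplication is a small helper, sketched here under assumptions: the function name `average_comments` is hypothetical, and the rows below are toy data invented for illustration, not rows from the dataset.

```python
def average_comments(posts, comments_index=4):
    """Return the mean of the num_comments column, or 0.0 for an empty list."""
    if not posts:
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# Toy rows invented for illustration; only index 4 (num_comments) matters here
toy_ask_posts = [
    ['1', 'Ask HN: first toy question', '', '3', '10', 'user_a', '9/26/2016 3:26'],
    ['2', 'Ask HN: second toy question', '', '1', '6', 'user_b', '9/26/2016 15:02'],
]
toy_show_posts = [
    ['3', 'Show HN: toy project', 'http://example.com', '5', '3', 'user_c', '9/25/2016 9:10'],
]

print('Average num comments in Ask posts:', average_comments(toy_ask_posts))
print('Average num comments in Show posts:', average_comments(toy_show_posts))
```

Calling `average_comments(ask_posts)` and `average_comments(show_posts)` on the real lists should give the same figures as the cell above. The empty-list guard is an addition that avoids a `ZeroDivisionError` if a filter matches nothing.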

Next, we count how many Ask HN posts were created in each hour of the day and total the comments those posts received.

In [26]:
import datetime as dt

result_list = []

for row in ask_posts:
    result_list.append([row[6], row[4]])

counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    comments = int(row[1])
    # Parse timestamps such as '9/26/2016 3:26' and keep only the hour
    created_at = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    hour_str = created_at.strftime("%H")  # zero-padded hour, e.g. '03'
    
    if hour_str in counts_by_hour:
        counts_by_hour[hour_str] += 1
        comments_by_hour[hour_str] += comments
    else:
        counts_by_hour[hour_str] = 1
        comments_by_hour[hour_str] = comments
        
for key in comments_by_hour:
    print("hour:", key, "\tcount:", counts_by_hour[key], "\tcomments:", comments_by_hour[key])


hour: 02 	count: 269 	comments: 2996
hour: 01 	count: 282 	comments: 2089
hour: 22 	count: 383 	comments: 3372
hour: 21 	count: 518 	comments: 4500
hour: 19 	count: 552 	comments: 3954
hour: 17 	count: 587 	comments: 5547
hour: 15 	count: 646 	comments: 18525
hour: 14 	count: 513 	comments: 4972
hour: 13 	count: 444 	comments: 7245
hour: 11 	count: 312 	comments: 2797
hour: 10 	count: 282 	comments: 3013
hour: 09 	count: 222 	comments: 1477
hour: 07 	count: 226 	comments: 1585
hour: 03 	count: 271 	comments: 2154
hour: 23 	count: 343 	comments: 2297
hour: 20 	count: 510 	comments: 4462
hour: 16 	count: 579 	comments: 4466
hour: 08 	count: 257 	comments: 2362
hour: 00 	count: 301 	comments: 2277
hour: 18 	count: 614 	comments: 4877
hour: 12 	count: 342 	comments: 4234
hour: 04 	count: 243 	comments: 2360
hour: 06 	count: 234 	comments: 1587
hour: 05 	count: 209 	comments: 1838
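
The hours above come out in dictionary order rather than chronological order, and the `if`/`else` branch exists only to initialise missing keys. A possible variant, sketched with `collections.defaultdict` and a sorted loop, is shown below. The input pairs are toy data invented for illustration; the variable names mirror the cell above.

```python
import datetime as dt
from collections import defaultdict

# Toy [created_at, num_comments] pairs invented for illustration
result_list = [
    ['9/26/2016 3:26', '4'],
    ['9/25/2016 15:02', '10'],
    ['9/24/2016 15:40', '2'],
    ['9/24/2016 3:05', '0'],
]

# defaultdict(int) starts every missing key at 0, so no if/else is needed
counts_by_hour = defaultdict(int)
comments_by_hour = defaultdict(int)

for created_at, num_comments in result_list:
    hour_str = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")
    counts_by_hour[hour_str] += 1
    comments_by_hour[hour_str] += int(num_comments)

# Zero-padded hour strings sort chronologically as plain strings
for hour_str in sorted(counts_by_hour):
    print("hour:", hour_str,
          "\tcount:", counts_by_hour[hour_str],
          "\tcomments:", comments_by_hour[hour_str])
```

The totals are the same as with the explicit `if`/`else`; only the initialisation and the print order differ.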


`avg_by_hour` is a list of lists in which the first element is the hour and the second element is the average number of comments per Ask HN post created in that hour.

In [34]:
avg_by_hour = []
for hour in comments_by_hour:
    average = comments_by_hour[hour]/counts_by_hour[hour]
    avg_by_hour.append([hour, average])
    
# Sort chronologically by the hour string (first element of each pair)
avg_by_hour.sort(key=lambda row: row[0])
print('avg_by_hour:')
for row in avg_by_hour:
    print('{}:00: {:.2f} average comments per post'.format(row[0], row[1]))


avg_by_hour:
00:00: 7.56 average comments per post
01:00: 7.41 average comments per post
02:00: 11.14 average comments per post
03:00: 7.95 average comments per post
04:00: 9.71 average comments per post
05:00: 8.79 average comments per post
06:00: 6.78 average comments per post
07:00: 7.01 average comments per post
08:00: 9.19 average comments per post
09:00: 6.65 average comments per post
10:00: 10.68 average comments per post
11:00: 8.96 average comments per post
12:00: 12.38 average comments per post
13:00: 16.32 average comments per post
14:00: 9.69 average comments per post
15:00: 28.68 average comments per post
16:00: 7.71 average comments per post
17:00: 9.45 average comments per post
18:00: 7.94 average comments per post
19:00: 7.16 average comments per post
20:00: 8.75 average comments per post
21:00: 8.69 average comments per post
22:00: 8.80 average comments per post
23:00: 6.70 average comments per post


In short, if you want feedback on an Ask HN post, the afternoon is the time to post. The 15:00 hour has by far the highest average:

`15:00: 28.68 average comments per post`
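
To read the best hours off without scanning all 24 lines, the list can be sorted by average instead of by hour. The sketch below hard-codes the rounded averages printed above so that it runs on its own; the name `top_hours` and the cut-off of five are arbitrary choices for illustration.

```python
# Rounded averages copied from the output above: [hour, average comments per post]
avg_by_hour = [
    ['00', 7.56], ['01', 7.41], ['02', 11.14], ['03', 7.95],
    ['04', 9.71], ['05', 8.79], ['06', 6.78], ['07', 7.01],
    ['08', 9.19], ['09', 6.65], ['10', 10.68], ['11', 8.96],
    ['12', 12.38], ['13', 16.32], ['14', 9.69], ['15', 28.68],
    ['16', 7.71], ['17', 9.45], ['18', 7.94], ['19', 7.16],
    ['20', 8.75], ['21', 8.69], ['22', 8.80], ['23', 6.70],
]

# Sort by the average (second element), highest first, and keep five rows
top_hours = sorted(avg_by_hour, key=lambda row: row[1], reverse=True)[:5]

print('Top 5 hours for Ask HN comments:')
for hour, average in top_hours:
    print('{}:00: {:.2f} average comments per post'.format(hour, average))
```

In the notebook itself, the hard-coded list would be replaced by the `avg_by_hour` built in the previous cell.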