# Analyzing Hacker News

This analysis uses Hacker News post data to compare Ask HN and Show HN posts, then looks at which hours of the day Ask HN posts receive the most comments on average.

The data comes from Kaggle: [Hacker News Posts](https://www.kaggle.com/hacker-news/hacker-news-posts).

In [1]:
from csv import reader

# Read the whole CSV into a list of rows; the context manager closes the file
with open('hacker_news.csv') as opened_file:
    hn = list(reader(opened_file))

In [2]:
headers = hn[0]
hn = hn[1:]

In [3]:
print(headers)

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


In [4]:
for row in hn[:5]:
    print(row, '\n')

['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'] 

['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'] 

['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'] 

['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'] 

['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12'] 



In [14]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1].lower()
    
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

len_ask_posts = len(ask_posts)
len_show_posts = len(show_posts)
len_other_posts = len(other_posts)

print('Number of rows in ask_posts:', len_ask_posts)
print('Number of rows in show_posts:', len_show_posts)
print('Number of rows in other_posts:', len_other_posts)

Number of rows in ask_posts: 1744
Number of rows in show_posts: 1162
Number of rows in other_posts: 17194


In [15]:
for row in ask_posts[:5]:
    print(row, '\n')

['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'] 

['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'] 

['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'] 

['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20'] 

['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38'] 



In [16]:
for row in show_posts[:5]:
    print(row, '\n')

['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'] 

['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46'] 

['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05'] 

['12178806', 'Show HN: Webscope  Easy way for web developers to communicate with Clients', 'http://webscopeapp.com', '3', '3', 'fastbrick', '7/28/2016 7:11'] 

['10872799', 'Show HN: GeoScreenshot  Easily test Geo-IP based web pages', 'https://www.geoscreenshot.com/', '1', '9', 'kpsychwave', '1/9/2016 20:45'] 



In [22]:
total_ask_comments = 0

# Column 4 holds num_comments as a string, so convert before summing
for row in ask_posts:
    total_ask_comments += int(row[4])

avg_ask_comments = total_ask_comments / len_ask_posts

print('Average number of comments per Ask HN post:', 
      round(avg_ask_comments,2))

Average number of comments per Ask HN post: 14.04


In [23]:
total_show_comments = 0

# Column 4 holds num_comments as a string, so convert before summing
for row in show_posts:
    total_show_comments += int(row[4])

avg_show_comments = total_show_comments / len_show_posts

print('Average number of comments per Show HN post:', 
      round(avg_show_comments,2))

Average number of comments per Show HN post: 10.32


### Comments per Ask HN vs Show HN

On average, an Ask HN post receives more comments than a Show HN post (14.04 vs. 10.32). One possible explanation is that Ask HN posts explicitly invite users to respond, whereas a Show HN post simply shares something interesting with the community.
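
The per-post average behind this comparison can be sketched as a small reusable function. The name `average_comments` and its `comments_index` parameter are hypothetical and not part of the original notebook; the sample rows are the first five Ask HN posts printed earlier, so the result is only an illustration and not the full-dataset average.

```python
def average_comments(posts, comments_index=4):
    """Return the mean comment count for a list of CSV rows.

    Assumes each row stores the comment count as a string at
    comments_index (column 4, 'num_comments', in this dataset).
    """
    if not posts:
        return 0.0  # avoid dividing by zero on an empty list
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# First five Ask HN rows from the output shown earlier
sample_rows = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'],
    ['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20'],
    ['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38'],
]

print(round(average_comments(sample_rows), 2))
```

Calling the same function on `ask_posts` and `show_posts` would replace the two near-identical loops above with one code path.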

In [24]:
import datetime as dt

In [44]:
result_list = []

for row in ask_posts:
    created_at = row[6]
    num_comments = int(row[4])
    
    # first column of result_list is timestamp of post
    # second column of result_list is the number of comments 
    result_list.append([created_at, num_comments])

In [54]:
# Sanity check: parse one timestamp and extract its hour as a string
dt_post = dt.datetime.strptime(result_list[20][0], '%m/%d/%Y %H:%M')
hour = dt_post.strftime('%H')
hour

'18'

In [55]:
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    dt_post = dt.datetime.strptime(row[0], '%m/%d/%Y %H:%M')
    hour = dt.datetime.strftime(dt_post, '%H')
    
    if hour in counts_by_hour:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]
    else:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]

In [65]:
for i in counts_by_hour:
    print(i, ':', counts_by_hour[i])

09 : 45
22 : 71
12 : 73
17 : 100
23 : 68
10 : 59
04 : 47
00 : 55
19 : 110
15 : 116
14 : 107
21 : 109
16 : 108
11 : 58
07 : 34
20 : 80
08 : 48
02 : 58
06 : 44
13 : 85
05 : 46
01 : 60
18 : 109
03 : 54


In [66]:
for i in comments_by_hour:
    print(i, ':', comments_by_hour[i])

09 : 251
22 : 479
12 : 687
17 : 1146
23 : 543
10 : 793
04 : 337
00 : 447
19 : 1188
15 : 4477
14 : 1416
21 : 1745
16 : 1814
11 : 641
07 : 267
20 : 1722
08 : 492
02 : 1381
06 : 397
13 : 1253
05 : 464
01 : 683
18 : 1439
03 : 421



In [81]:
avg_by_hour = []

for hour in counts_by_hour:
    # Average comments per post for this hour, rounded to 2 decimals
    avg = comments_by_hour[hour] / counts_by_hour[hour]
    avg_by_hour.append([hour, round(avg, 2)])


In [82]:
avg_by_hour

[['09', 5.58],
 ['22', 6.75],
 ['12', 9.41],
 ['17', 11.46],
 ['23', 7.99],
 ['10', 13.44],
 ['04', 7.17],
 ['00', 8.13],
 ['19', 10.8],
 ['15', 38.59],
 ['14', 13.23],
 ['21', 16.01],
 ['16', 16.8],
 ['11', 11.05],
 ['07', 7.85],
 ['20', 21.52],
 ['08', 10.25],
 ['02', 23.81],
 ['06', 9.02],
 ['13', 14.74],
 ['05', 10.09],
 ['01', 11.38],
 ['18', 13.2],
 ['03', 7.8]]

In [83]:
swap_avg_by_hour = []

for hour in avg_by_hour:
    swap_avg_by_hour.append([hour[1], hour[0]])

In [98]:
sorted_swap = sorted(swap_avg_by_hour, reverse=True)

print('Top 5 Hours for Ask Post Comments:')

for row in sorted_swap[:5]:
    # convert string for hour into datetime obj
    hour = dt.datetime.strptime(row[1], '%H')
    # convert datetime obj into string with preferred formatting 
    hour = dt.datetime.strftime(hour, '%H:%M')
    
    print("{hour}: {ave} average comments per post".format(hour=hour,
                                                    ave=row[0]))

Top 5 Hours for Ask Post Comments:
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.8 average comments per post
21:00: 16.01 average comments per post


## Conclusion 

According to our analysis, Ask HN posts created at 3:00PM EST received the most comments on average (38.59 per post). PST is three hours behind EST, so that corresponds to 12:00PM PST.

Other good times for potentially attracting more comments would be:
* 2:00AM EST (11:00PM PST the previous day)
* 8:00PM EST (5:00PM PST)
* 4:00PM EST (1:00PM PST)
* 9:00PM EST (6:00PM PST)
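
The Eastern-to-Pacific conversion is a fixed three-hour shift, which can be checked with a short sketch. The function name `eastern_to_pacific` and the `offset_hours` parameter are hypothetical, and the code assumes the timestamps are in US Eastern time, as the conclusion does. The modulo wraps hours that cross midnight into the previous day.

```python
def eastern_to_pacific(hour_str, offset_hours=3):
    """Shift a two-digit Eastern hour string to Pacific time.

    Assumes Pacific is offset_hours behind Eastern; the modulo
    wraps early-morning hours back into the previous evening.
    """
    pacific_hour = (int(hour_str) - offset_hours) % 24
    return '{:02d}:00'.format(pacific_hour)


# Top five hours from the analysis, as 24-hour Eastern strings
top_hours = ['15', '02', '20', '16', '21']

for hour in top_hours:
    print('{}:00 Eastern -> {} Pacific'.format(hour, eastern_to_pacific(hour)))
```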

