# Data Analysis to Find the Most Popular Posting Times on Hacker News

We will compare Show HN posts and Ask HN posts to determine whether one type is more popular than the other. We will use the vote count and the number of comments to reach our conclusion.

In addition, we will dig deeper into both types of post to understand whether posts made at a particular time of day attract more attention than others. We will map votes and comments against the post time for this analysis.

In [11]:
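from csv import reader

# Read every row into a list; the context manager closes the file afterwards.
with open('hacker_news.csv') as f:
    hn = list(reader(f))
# The first row is the header, so it is excluded from the record count.
print('Total Records: ', len(hn[1:]))
hn[:5]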

Total Records:  20100


[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
 ['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01']]

In [12]:
header = hn[0]
hn = hn[1:]
print(header)
hn[:5]

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


[['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01'],
 ['10301696',
  'Note by Note: The Making of Steinway L1037 (2007)',
  'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0',
  '8',
  '2',
  'walterbell',
  '9/30/2015 4:12']]

In [22]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = (row[1]).lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
print('Ask Posts: ', len(ask_posts))
print('Show Posts: ', len(show_posts))
print('Other Posts: ', len(other_posts))

Ask Posts:  1744
Show Posts:  1162
Other Posts:  17194


In [42]:
def comment_count(data_set):
    total_comments = 0
    for row in data_set:
        comment = int(row[4])
        total_comments += comment
    print('Total Comments:', total_comments)
    print('Average Comments / post:', round((total_comments/len(data_set)), 2))

In [43]:
comment_count(ask_posts)

Total Comments: 24483
Average Comments / post: 14.04


In [44]:
comment_count(show_posts)

Total Comments: 11988
Average Comments / post: 10.32
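
The introduction also mentions vote counts, which the cells above do not compute. Below is a minimal sketch of how the same averaging could be applied to `num_points` (column index 3 in the header printed earlier). The helper name `average_column` and the sample rows are illustrative assumptions, not part of the original notebook.

```python
def average_column(data_set, col_index):
    """Return the mean of an integer-valued column, or 0.0 for an empty list."""
    if not data_set:
        return 0.0
    total = sum(int(row[col_index]) for row in data_set)
    return total / len(data_set)

# Hypothetical rows in the same layout as hacker_news.csv:
# [id, title, url, num_points, num_comments, author, created_at]
sample_rows = [
    ['1', 'Ask HN: Example question?', '', '10', '4', 'user_a', '8/4/2016 11:52'],
    ['2', 'Ask HN: Another question?', '', '20', '6', 'user_b', '1/26/2016 19:30'],
]

avg_points = average_column(sample_rows, 3)    # num_points
avg_comments = average_column(sample_rows, 4)  # num_comments
print('Average points / post:', round(avg_points, 2))
print('Average comments / post:', round(avg_comments, 2))
```

In the notebook itself, calling `average_column(ask_posts, 3)` and `average_column(show_posts, 3)` would give the vote-based side of the comparison.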


In [58]:
import datetime as dt

result_list = []
for row in ask_posts:
    comments = row[4]
    post_time_str = row[-1]
    post_time = dt.datetime.strptime(post_time_str, '%m/%d/%Y %H:%M')
    result_list.append([post_time, comments])

counts_by_hour = {}
comments_by_hour = {}
for row in result_list:
    # Named n_comments so the comment_count() helper defined earlier is not overwritten.
    n_comments = int(row[1])
    post_hour = row[0].hour
    if post_hour in counts_by_hour:
        counts_by_hour[post_hour] += 1
        comments_by_hour[post_hour] += n_comments
    else:
        counts_by_hour[post_hour] = 1
        comments_by_hour[post_hour] = n_comments

print(counts_by_hour)
print(comments_by_hour)

{9: 45, 13: 85, 10: 59, 14: 107, 16: 108, 23: 68, 12: 73, 17: 100, 15: 116, 21: 109, 20: 80, 2: 58, 18: 109, 3: 54, 5: 46, 19: 110, 1: 60, 22: 71, 8: 48, 4: 47, 0: 55, 6: 44, 7: 34, 11: 58}
{9: 251, 13: 1253, 10: 793, 14: 1416, 16: 1814, 23: 543, 12: 687, 17: 1146, 15: 4477, 21: 1745, 20: 1722, 2: 1381, 18: 1439, 3: 421, 5: 464, 19: 1188, 1: 683, 22: 479, 8: 492, 4: 337, 0: 447, 6: 397, 7: 267, 11: 641}


In [60]:
avg_by_hour = {}
for key in counts_by_hour:
    comments = comments_by_hour[key]
    avg_by_hour[key] = comments / counts_by_hour[key]
print(avg_by_hour)

{9: 5.5777777777777775, 13: 14.741176470588234, 10: 13.440677966101696, 14: 13.233644859813085, 16: 16.796296296296298, 23: 7.985294117647059, 12: 9.41095890410959, 17: 11.46, 15: 38.5948275862069, 21: 16.009174311926607, 20: 21.525, 2: 23.810344827586206, 18: 13.20183486238532, 3: 7.796296296296297, 5: 10.08695652173913, 19: 10.8, 1: 11.383333333333333, 22: 6.746478873239437, 8: 10.25, 4: 7.170212765957447, 0: 8.127272727272727, 6: 9.022727272727273, 7: 7.852941176470588, 11: 11.051724137931034}


In [76]:
# Sort hours by average comments per post, highest first.
average_sort = {k: v for k, v in sorted(avg_by_hour.items(), key=lambda item: item[1], reverse=True)}
print('Times in PST')
# Subtracting 3 hours assumes created_at is recorded in US Eastern Time.
time_diff = dt.timedelta(hours=3)
for k, v in average_sort.items():
    dt_time = dt.datetime.strptime(str(k), '%H')
    print('{}:00: {:.2f} average comments per post'
          .format((dt_time - time_diff).strftime("%H"), v))

Times in PST
12:00: 38.59 average comments per post
23:00: 23.81 average comments per post
17:00: 21.52 average comments per post
13:00: 16.80 average comments per post
18:00: 16.01 average comments per post
10:00: 14.74 average comments per post
07:00: 13.44 average comments per post
11:00: 13.23 average comments per post
15:00: 13.20 average comments per post
14:00: 11.46 average comments per post
22:00: 11.38 average comments per post
08:00: 11.05 average comments per post
16:00: 10.80 average comments per post
05:00: 10.25 average comments per post
02:00: 10.09 average comments per post
09:00: 9.41 average comments per post
03:00: 9.02 average comments per post
21:00: 8.13 average comments per post
20:00: 7.99 average comments per post
04:00: 7.85 average comments per post
00:00: 7.80 average comments per post
01:00: 7.17 average comments per post
19:00: 6.75 average comments per post
06:00: 5.58 average comments per post


Among Ask HN posts, those created at 12pm (noon) Pacific Standard Time receive the most comments on average (38.59 per post), followed by those created at 11pm (23.81 per post).
The hours that attract the fewest comments on average are 6am (5.58 per post), followed by 7pm (6.75 per post) Pacific Standard Time.
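
The hour labels in the output above use a 24-hour clock, so `12:00` is noon and `00:00` is midnight. Below is a minimal sketch of one way to shift an hour by a fixed offset and print it with an explicit am/pm suffix. The function name `to_pacific_label` is illustrative, and the three-hour offset assumes, as the notebook's code does, that `created_at` is three hours ahead of Pacific time.

```python
def to_pacific_label(hour, offset_hours=3):
    """Shift a 0-23 hour back by offset_hours and format it as a 12-hour label."""
    shifted = (hour - offset_hours) % 24   # wrap around midnight
    suffix = 'am' if shifted < 12 else 'pm'
    hour_12 = shifted % 12 or 12           # 0 -> 12 (midnight), 12 -> 12 (noon)
    return '{}{}'.format(hour_12, suffix)

# Source hours for the two busiest and two quietest slots in the table above.
for source_hour in (15, 2, 9, 22):
    print(source_hour, '->', to_pacific_label(source_hour))
```

Using modular arithmetic on the integer hour avoids building a `datetime` for a date-less value, and the explicit suffix makes noon and midnight hard to confuse.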