# Q: When should I post to get a higher number of comments?

A: Post during the hours when posts are most likely to receive comments.

## Introduction

This notebook analyzes the *Ask HN* and *Show HN* posts (called *ask posts* and *show posts* below) on the popular website **Hacker News**. The goal is to show the reader at which hour of the day a post has the highest chance of receiving feedback in the form of comments. I also check whether the best posting hours differ between ask posts, which are written as questions, and show posts.

### Import libraries

In [1]:
from csv import reader
import datetime as dt

### Read the file and explore the features of every post

In [2]:
# Read every row of the CSV into a list; the with-block closes the file afterwards
with open("hacker_news.csv") as opened_file:
    hn = list(reader(opened_file))
print(hn[:5])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


In [3]:
headers = hn[0]
hn = hn[1:]
print(headers)
print(hn[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]
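As an alternative to splitting off `hn[0]` by hand, the standard library's `csv.DictReader` treats the first row as the header and yields one dictionary per post. The sketch below is only an illustration: it reads a two-row in-memory excerpt (values copied from the output above) instead of the real `hacker_news.csv`, and the name `sample` is hypothetical.

```python
import csv
import io

# Hypothetical two-row excerpt with the same columns as hacker_news.csv
# (values copied from the notebook output); in practice you would pass
# an open file object instead of this in-memory string.
sample = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52\n"
    "10301696,Note by Note: The Making of Steinway L1037 (2007),"
    "http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0,8,2,walterbell,9/30/2015 4:12\n"
)

# DictReader consumes the header row itself and yields one dict per post,
# so fields are accessed by column name rather than by position.
rows = list(csv.DictReader(sample))

for row in rows:
    # Values are still strings, exactly as with csv.reader
    print(row["title"], "->", row["num_comments"], "comments")
```

The trade-off is that every value is looked up by name, which is easier to read than `row[4]`, at the cost of slightly more memory per row.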


### Divide the data

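Split the posts into three lists (*ask posts*, *show posts*, and all other posts) based on whether the title starts with "Ask HN" or "Show HN", ignoring case, and count the posts in each list.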

In [4]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)   
    else:
        other_posts.append(row)
        
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

1744
1162
17194
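The splitting rule in the cell above can be isolated into a small function, which makes the case-insensitive prefix check easy to test on its own. This is a sketch; `classify_post` is a hypothetical helper name, not something used elsewhere in the notebook.

```python
def classify_post(title):
    """Return 'ask', 'show', or 'other' based on the title prefix.

    Same rule as the notebook cell: the title is lowercased first,
    so "Ask HN", "ASK HN" and "ask hn" are all treated alike.
    """
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


print(classify_post("Ask HN: How to improve my personal website?"))  # ask
print(classify_post("SHOW HN: Something pointless I made"))          # show
print(classify_post("Interactive Dynamic Video"))                    # other
```

Because the logic is identical to the loop above, `collections.Counter(classify_post(row[1]) for row in hn)` should reproduce the three counts printed by the cell. Note that a title with leading whitespace would land in `'other'` under this rule, just as in the original loop.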


## Which type of posts receive a higher number of comments?

According to this analysis, *ask posts* receive more feedback from other users than *show posts*: about 14.04 comments per post on average, compared with 10.32.

In [5]:
total_ask_comments = 0

for row in ask_posts:
    num_comments = int(row[4])
    total_ask_comments += num_comments
    
avg_ask_comments = total_ask_comments/len(ask_posts)
print("Average number of comments in ask posts: {:,.2f}".format(avg_ask_comments))


total_show_comments = 0

for row in show_posts:
    num_comments = int(row[4])
    total_show_comments += num_comments
    
avg_show_comments = total_show_comments/len(show_posts)
print("Average number of comments in show posts: {:,.2f}".format(avg_show_comments))


Average number of comments in ask posts: 14.04
Average number of comments in show posts: 10.32
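The two loops above are identical apart from the list they read, so a single function can compute both averages. The sketch below assumes, as the notebook does, that `num_comments` sits at index 4 of each row; `average_comments` is a hypothetical helper, and the sample rows are the example posts printed further down in this notebook.

```python
def average_comments(posts):
    """Return the mean of the num_comments column (index 4) for a list of rows."""
    # Like the cells above, this raises ZeroDivisionError for an empty list.
    total = sum(int(row[4]) for row in posts)
    return total / len(posts)


# Two ask posts and two show posts taken from the notebook output
sample_ask = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
]
sample_show = [
    ['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'],
    ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46'],
]

print("Average number of comments in ask posts: {:,.2f}".format(average_comments(sample_ask)))
print("Average number of comments in show posts: {:,.2f}".format(average_comments(sample_show)))
```

Called with the full `ask_posts` and `show_posts` lists, the same function should reproduce the 14.04 and 10.32 averages above.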


Here are two example *ask posts* and two example *show posts*:

In [6]:
print(ask_posts[:2])

[['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'], ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43']]


In [7]:
print(show_posts[:2])

[['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'], ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46']]
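The cells in this notebook access columns by hardcoded position (`row[4]` for `num_comments`, `row[6]` for `created_at`). A small sketch of deriving those positions from the header row instead, so the code keeps working if the column order changes; the constant names `COMMENTS` and `CREATED_AT` are hypothetical.

```python
headers = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

# Look up each column's position once, by name, instead of hardcoding 4 and 6
COMMENTS = headers.index('num_comments')
CREATED_AT = headers.index('created_at')

row = ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/',
       '747', '102', 'dhotson', '11/29/2015 22:46']

print(COMMENTS, CREATED_AT)                 # 4 6
print(int(row[COMMENTS]), row[CREATED_AT])  # 102 11/29/2015 22:46
```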


## Analyze the posting time with respect to the number of comments
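The analysis below groups posts by the hour in their `created_at` timestamp, which has the form `month/day/year hour:minute`. A minimal sketch of how `datetime.strptime` turns one of those strings into a `datetime` and how `.hour` picks out the grouping key, using timestamps taken from the rows shown above:

```python
import datetime as dt

# created_at values look like "8/4/2016 11:52" (month/day/year hour:minute).
# strptime's %m, %d and %H also accept values without a leading zero, such as "8" or "4".
stamp = dt.datetime.strptime("8/4/2016 11:52", "%m/%d/%Y %H:%M")
print(stamp)       # 2016-08-04 11:52:00
print(stamp.hour)  # 11

# Only the hour is kept, so a post at 0:01 falls into the 00:00 bucket
print(dt.datetime.strptime("6/17/2016 0:01", "%m/%d/%Y %H:%M").hour)  # 0
```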

In [8]:
# Pair each ask post's creation time with its number of comments
result_list = []

for row in ask_posts:
    created_at = row[6]
    num_comments = int(row[4])
    result_list.append([created_at, num_comments])

# For each hour of the day: number of posts and total comments they received
counts_by_hour = {}
comments_by_hour = {}

for date, n_com in result_list:
    date_final = dt.datetime.strptime(date, "%m/%d/%Y %H:%M")
    hour = date_final.hour

    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = n_com
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += n_com

# Average number of comments per post for each hour
avg_by_hour = []

for hour in counts_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])

# Swap each pair to [average, hour] so that sorting orders by the average
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

sorted_swap = sorted(swap_avg_by_hour, reverse=True)

print("Top hours to get an answer on an ask post")
for avg, hour in sorted_swap:
    # Format the integer hour as HH:00
    hour_format = "{:02d}:00".format(hour)
    print("{} {:.2f} average comments per post".format(hour_format, avg))

Top hours to get an answer on an ask post
15:00 38.59 average comments per post
02:00 23.81 average comments per post
20:00 21.52 average comments per post
16:00 16.80 average comments per post
21:00 16.01 average comments per post
13:00 14.74 average comments per post
10:00 13.44 average comments per post
14:00 13.23 average comments per post
18:00 13.20 average comments per post
17:00 11.46 average comments per post
01:00 11.38 average comments per post
11:00 11.05 average comments per post
19:00 10.80 average comments per post
08:00 10.25 average comments per post
05:00 10.09 average comments per post
12:00 9.41 average comments per post
06:00 9.02 average comments per post
00:00 8.13 average comments per post
23:00 7.99 average comments per post
07:00 7.85 average comments per post
03:00 7.80 average comments per post
04:00 7.17 average comments per post
22:00 6.75 average comments per post
09:00 5.58 average comments per post
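The aggregation in the cell above can be written more compactly with `collections.defaultdict`, and the swap-then-sort step can be replaced by a `key` function. This is a sketch, not the notebook's own code: `average_comments_by_hour` and `print_ranking` are hypothetical helper names, and the rows follow the same layout as above (`num_comments` at index 4, `created_at` at index 6).

```python
import datetime as dt
from collections import defaultdict


def average_comments_by_hour(posts):
    """Return {hour: average comments per post} for a list of post rows."""
    counts = defaultdict(int)    # posts created in each hour
    comments = defaultdict(int)  # total comments for posts in each hour
    for row in posts:
        hour = dt.datetime.strptime(row[6], "%m/%d/%Y %H:%M").hour
        counts[hour] += 1
        comments[hour] += int(row[4])
    return {hour: comments[hour] / counts[hour] for hour in counts}


def print_ranking(avg_by_hour, label):
    print("Top hours to get an answer on " + label)
    # Sort by (average, hour) descending, the same order as sorting the swapped pairs
    ranked = sorted(avg_by_hour.items(), key=lambda item: (item[1], item[0]), reverse=True)
    for hour, avg in ranked:
        print("{:02d}:00 {:.2f} average comments per post".format(hour, avg))


# The two example ask posts from the notebook output
sample_ask = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
]

print_ranking(average_comments_by_hour(sample_ask), "an ask post")
# Top hours to get an answer on an ask post
# 13:00 29.00 average comments per post
# 09:00 6.00 average comments per post
```

Called as `print_ranking(average_comments_by_hour(ask_posts), "an ask post")`, it should print the same table as the cell above, including the order of any ties, because the sort key is the same `(average, hour)` pair the original sorts on.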


In [9]:
# Pair each show post's creation time with its number of comments
result_list = []

for row in show_posts:
    created_at = row[6]
    num_comments = int(row[4])
    result_list.append([created_at, num_comments])

# For each hour of the day: number of posts and total comments they received
counts_by_hour = {}
comments_by_hour = {}

for date, n_com in result_list:
    date_final = dt.datetime.strptime(date, "%m/%d/%Y %H:%M")
    hour = date_final.hour

    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = n_com
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += n_com

# Average number of comments per post for each hour
avg_by_hour = []

for hour in counts_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])

# Swap each pair to [average, hour] so that sorting orders by the average
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

sorted_swap = sorted(swap_avg_by_hour, reverse=True)

print("Top hours to get an answer on a show post")
for avg, hour in sorted_swap:
    # Format the integer hour as HH:00
    hour_format = "{:02d}:00".format(hour)
    print("{} {:.2f} average comments per post".format(hour_format, avg))

Top hours to get an answer on a show post
18:00 15.77 average comments per post
00:00 15.71 average comments per post
14:00 13.44 average comments per post
23:00 12.42 average comments per post
22:00 12.39 average comments per post
12:00 11.80 average comments per post
16:00 11.66 average comments per post
07:00 11.50 average comments per post
11:00 11.16 average comments per post
03:00 10.63 average comments per post
20:00 10.20 average comments per post
19:00 9.80 average comments per post
17:00 9.80 average comments per post
09:00 9.70 average comments per post
13:00 9.56 average comments per post
04:00 9.50 average comments per post
06:00 8.88 average comments per post
01:00 8.79 average comments per post
10:00 8.25 average comments per post
15:00 8.10 average comments per post
21:00 5.79 average comments per post
08:00 4.85 average comments per post
02:00 4.23 average comments per post
05:00 3.05 average comments per post
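Both tables list all 24 hours, which makes the best hours for ask posts (15:00, 02:00, 20:00) and show posts (18:00, 00:00, 14:00) harder to compare at a glance. A small sketch that keeps only the top `n` hours with `heapq.nlargest`; `top_hours` is a hypothetical helper, and the example dictionary holds a few rounded averages copied from the show-post table above rather than the exact values.

```python
import heapq


def top_hours(avg_by_hour, n=5):
    """Return the n (hour, average) pairs with the highest averages, best first."""
    # Ties on the average are broken by the larger hour, as in the tables above
    return heapq.nlargest(n, avg_by_hour.items(), key=lambda item: (item[1], item[0]))


# A few rounded averages copied from the show-post table
show_sample = {18: 15.77, 0: 15.71, 14: 13.44, 5: 3.05}

for hour, avg in top_hours(show_sample, 3):
    print("{:02d}:00 {:.2f}".format(hour, avg))
# 18:00 15.77
# 00:00 15.71
# 14:00 13.44
```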
