# How to get the most comments on Hacker News

In this project, we will attempt to answer the following questions:

1. Do Ask HN or Show HN posts receive more comments on average?
2. Do posts created at a certain time receive more comments on average?

```python
# First, we read in the data as a list of lists (one list per CSV row)
from csv import reader

with open("hacker_news.csv", newline="") as opened_file:
    hn = list(reader(opened_file))

# Number of rows, including the header row
print(len(hn))
```

```
20101
```


```python
# Separate the header row from the data rows
headers = hn[0]
hn = hn[1:]
print(headers)
```

```
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
```
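The cells below select columns by position (`row[1]` for `title`, `row[4]` for `num_comments`, `row[6]` for `created_at`), which relies on the header order printed above. As a hedged alternative, here is a minimal sketch using `csv.DictReader`, which keys each row by the header names instead. The two rows are made-up, hypothetical examples, not entries from `hacker_news.csv`:

```python
import csv
import io

# Hypothetical two-row sample in the same column layout as hacker_news.csv
sample = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "1,Ask HN: Example question?,,5,3,alice,8/4/2016 11:52\n"
    "2,Show HN: Example project,https://example.com,10,0,bob,1/26/2016 19:30\n"
)

# DictReader takes the first row as field names, so columns are accessed by name
rows = list(csv.DictReader(sample))
titles = [row["title"] for row in rows]
comments = [int(row["num_comments"]) for row in rows]  # values are read as strings
print(titles)
print(comments)
```

Accessing `row["num_comments"]` keeps working even if the column order of the file changes, at the cost of building one dictionary per row.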


## Filtering data

Since we're only concerned with posts whose titles begin with Ask HN or Show HN, we'll sort the posts into separate lists of lists: one for Ask HN posts, one for Show HN posts and one for all other posts. The title comparison is case-insensitive.

```python
ask_posts = []
show_posts = []
other_posts = []

# Sort each post into the matching list based on its title prefix
for row in hn:
    title = row[1].lower()  # lowercase so the match is case-insensitive
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

print("# of Ask HN: ", len(ask_posts), " Show HN: ", len(show_posts),
      " Others: ", len(other_posts))
```

```
# of Ask HN:  1744  Show HN:  1162  Others:  17194
```
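The sorting loop above boils down to a case-insensitive prefix check on the title. A minimal sketch that factors it into a helper, `classify_title` (a hypothetical name, not part of the original notebook), so the rule can be checked on its own:

```python
def classify_title(title):
    """Return 'ask', 'show' or 'other' based on the post title's prefix (hypothetical helper)."""
    lowered = title.lower()  # makes "Ask HN", "ASK HN" and "ask hn" equivalent
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


# Hypothetical titles illustrating each branch
for t in ["Ask HN: How do you back up your data?",
          "SHOW HN: My weekend project",
          "A story that mentions Ask HN later in the title"]:
    print(classify_title(t), "|", t)
```

Like the original loop, this only matches titles that start with the prefix, so a title with leading spaces, such as `" Ask HN: ..."`, falls into `other`.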


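## Calculating the average number of comments

Now, it is time to find out whether Ask HN or Show HN posts receive more comments on average. The number of comments is stored as a string in the `num_comments` column (index 4), so we convert it to an integer before adding it up.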

```python
total_ask_comments = 0
ask_count = 0

total_show_comments = 0
show_count = 0

# num_comments (index 4) is read from the CSV as a string
for row in ask_posts:
    total_ask_comments += int(row[4])
    ask_count += 1

for row in show_posts:
    total_show_comments += int(row[4])
    show_count += 1

print("Average # of comments for ask:", total_ask_comments / ask_count)
print("Average # of comments for show:", total_show_comments / show_count)
```

```
Average # of comments for ask: 14.038417431192661
Average # of comments for show: 10.31669535283993
```
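Since each average is just the total number of comments divided by the number of posts, the two loops above can be condensed into one reusable function. A sketch assuming the same list-of-lists layout with `num_comments` at index 4; `average_comments` is a hypothetical helper name and the rows are made up:

```python
def average_comments(posts, comments_index=4):
    """Mean of the num_comments column (stored as strings) over a list of rows."""
    if not posts:
        return 0.0  # avoid ZeroDivisionError on an empty list
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# Hypothetical rows: only index 4 (num_comments) matters here
sample_posts = [
    ["1", "Ask HN: a", "", "3", "10", "alice", "8/4/2016 11:52"],
    ["2", "Ask HN: b", "", "1", "4", "bob", "8/4/2016 15:10"],
    ["3", "Ask HN: c", "", "7", "1", "carol", "8/5/2016 02:30"],
]
print(average_comments(sample_posts))  # (10 + 4 + 1) / 3 = 5.0
```

Applied to `ask_posts` and `show_posts`, it computes the same quantities as the cell above. Returning `0.0` for an empty list is a design choice of this sketch; raising an error would be equally reasonable.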


## Finding the Amount of Ask Posts and Comments by Hour Created

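Since Ask HN posts receive more comments on average (about 14.0 versus 10.3), we'll focus the remaining analysis on Ask HN posts. Next, we determine whether Ask HN posts created at a certain hour are more likely to attract comments. For each hour of the day, we count the Ask HN posts created in that hour and the total number of comments they received.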

```python
import datetime as dt

# Keep only the creation time and number of comments of each Ask HN post
result_list = []
for row in ask_posts:
    created_at = row[6]
    num_comments = int(row[4])
    result_list.append([created_at, num_comments])

counts_by_hour = {}    # hour -> number of Ask HN posts created in that hour
comments_by_hour = {}  # hour -> total comments on those posts
date_format = "%m/%d/%Y %H:%M"

for created_at, num_comments in result_list:
    # Parse the timestamp and keep only the two-digit hour, e.g. "15"
    hour = dt.datetime.strptime(created_at, date_format).strftime("%H")
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = num_comments
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += num_comments

comments_by_hour
```

```
{'00': 447,
 '01': 683,
 '02': 1381,
 '03': 421,
 '04': 337,
 '05': 464,
 '06': 397,
 '07': 267,
 '08': 492,
 '09': 251,
 '10': 793,
 '11': 641,
 '12': 687,
 '13': 1253,
 '14': 1416,
 '15': 4477,
 '16': 1814,
 '17': 1146,
 '18': 1439,
 '19': 1188,
 '20': 1722,
 '21': 1745,
 '22': 479,
 '23': 543}
```
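Two calls do the work in the cell above: `strptime` parses a `created_at` string (such as the hypothetical `8/4/2016 11:52`) with the format `%m/%d/%Y %H:%M`, and `strftime("%H")` turns the result back into a two-digit hour. A compact sketch of the same counting using `collections.defaultdict`, on hypothetical `[created_at, num_comments]` pairs shaped like `result_list`:

```python
import datetime as dt
from collections import defaultdict

DATE_FORMAT = "%m/%d/%Y %H:%M"

# Hypothetical [created_at, num_comments] pairs in the same shape as result_list
pairs = [
    ["8/4/2016 11:52", 6],
    ["8/4/2016 11:05", 2],
    ["1/26/2016 02:30", 9],
]

counts_by_hour = defaultdict(int)    # hour -> number of posts
comments_by_hour = defaultdict(int)  # hour -> total comments

for created_at, num_comments in pairs:
    # strptime accepts single-digit months/days such as "8/4" for %m/%d
    hour = dt.datetime.strptime(created_at, DATE_FORMAT).strftime("%H")
    counts_by_hour[hour] += 1
    comments_by_hour[hour] += num_comments

print(dict(counts_by_hour))    # {'11': 2, '02': 1}
print(dict(comments_by_hour))  # {'11': 8, '02': 9}
```

`defaultdict(int)` starts every new hour at 0, which removes the `if hour not in counts_by_hour` branch; wrapping it in `dict()` gives a plain dictionary for display.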

```python
# Average number of comments per post for each hour
avg_by_hour = []

for hr in comments_by_hour:
    avg_by_hour.append([hr, comments_by_hour[hr] / counts_by_hour[hr]])

avg_by_hour
```

```
[['23', 7.985294117647059],
 ['12', 9.41095890410959],
 ['17', 11.46],
 ['19', 10.8],
 ['16', 16.796296296296298],
 ['06', 9.022727272727273],
 ['00', 8.127272727272727],
 ['01', 11.383333333333333],
 ['04', 7.170212765957447],
 ['02', 23.810344827586206],
 ['08', 10.25],
 ['03', 7.796296296296297],
 ['10', 13.440677966101696],
 ['11', 11.051724137931034],
 ['09', 5.5777777777777775],
 ['05', 10.08695652173913],
 ['14', 13.233644859813085],
 ['13', 14.741176470588234],
 ['15', 38.5948275862069],
 ['22', 6.746478873239437],
 ['20', 21.525],
 ['07', 7.852941176470588],
 ['18', 13.20183486238532],
 ['21', 16.009174311926607]]
```
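The averaging step is a single division per hour, so it can also be written as a list comprehension. A sketch with small, hypothetical totals and counts. Because dictionaries keep insertion order (Python 3.7 and later), the result follows the order in which hours were first added, which is presumably why the list above is not in hour order:

```python
# Hypothetical totals and counts keyed by two-digit hour
comments_by_hour = {"15": 90, "02": 40, "09": 6}
counts_by_hour = {"15": 3, "02": 2, "09": 2}

# One [hour, average] pair per hour; both dicts share the same hour keys
avg_by_hour = [[hr, comments_by_hour[hr] / counts_by_hour[hr]]
               for hr in comments_by_hour]
print(avg_by_hour)  # [['15', 30.0], ['02', 20.0], ['09', 3.0]]
```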

Now, let's sort this list by the average number of comments, from highest to lowest, to see when we should post on HN.

```python
# Sort [hour, average] pairs by the average, highest first
avg_by_hour = sorted(avg_by_hour, key=lambda x: x[1], reverse=True)
avg_by_hour
```

```
[['15', 38.5948275862069],
 ['02', 23.810344827586206],
 ['20', 21.525],
 ['16', 16.796296296296298],
 ['21', 16.009174311926607],
 ['13', 14.741176470588234],
 ['10', 13.440677966101696],
 ['14', 13.233644859813085],
 ['18', 13.20183486238532],
 ['17', 11.46],
 ['01', 11.383333333333333],
 ['11', 11.051724137931034],
 ['19', 10.8],
 ['08', 10.25],
 ['05', 10.08695652173913],
 ['12', 9.41095890410959],
 ['06', 9.022727272727273],
 ['00', 8.127272727272727],
 ['23', 7.985294117647059],
 ['07', 7.852941176470588],
 ['03', 7.796296296296297],
 ['04', 7.170212765957447],
 ['22', 6.746478873239437],
 ['09', 5.5777777777777775]]
```
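To report the best hours in HH:MM form, each two-digit hour can be parsed with `strptime` and reformatted with `strftime`. A minimal sketch on the top three pairs copied from the sorted output above; rounding to two decimals is a presentation choice of this sketch, not part of the original analysis:

```python
import datetime as dt

# The three highest [hour, average] pairs from the sorted output above
top_hours = [
    ["15", 38.5948275862069],
    ["02", 23.810344827586206],
    ["20", 21.525],
]

lines = []
for hour, avg in top_hours:
    # Parse the two-digit hour string, then format it as HH:MM for display
    label = dt.datetime.strptime(hour, "%H").strftime("%H:%M")
    lines.append(f"{label}: {avg:.2f} average comments per post")

print("\n".join(lines))
```

In the notebook, the same loop could run directly over `avg_by_hour[:3]`.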

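## Conclusion

Ask HN posts created at 15:00, 02:00 and 20:00 (hours as recorded in the `created_at` column) receive the most comments on average: about 38.6, 23.8 and 21.5 comments per post, respectively.

It also pays to ask for something: Ask HN posts receive more comments on average than Show HN posts (about 14.0 versus 10.3).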