# Exploring Hacker News

This project explores "Ask HN" and "Show HN" posts on Hacker News. Does one type of post receive more comments than the other on average? Do posts created at certain times of day receive more comments than average?

In [1]:
import csv

# Read every row of the CSV into a list; `with` closes the file afterwards
with open('hacker_news.csv') as opened_file:
    hn = list(csv.reader(opened_file))



# Removing Headers from a List

In [2]:
headers = hn[0]
hn = hn[1:]
print(headers)
print(hn[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]
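Slicing off `hn[0]` works. Alternatively, the standard library's `csv.DictReader` consumes the header row itself and lets each row be accessed by column name instead of by index. The sketch below is self-contained: `io.StringIO` stands in for `hacker_news.csv` and holds two rows copied from the output above.

```python
import csv
import io

# Two rows copied from the printed output above; io.StringIO stands in for the real file.
sample = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52\n"
    "11919867,Technology ventures: From Idea to Enterprise,https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429,3,1,hswarna,6/17/2016 0:01\n"
)

reader = csv.DictReader(sample)
headers = reader.fieldnames  # DictReader reads the header row for us
rows = list(reader)          # each remaining row becomes a dict keyed by column name

print(headers)
print(rows[0]["title"], int(rows[0]["num_comments"]))
```

With this approach `row["num_comments"]` replaces `row[4]`, so the code no longer depends on the column order.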


# Extracting Ask HN and Show HN Posts

In [3]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    title = title.lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

1744
1162
17194
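The same case-insensitive prefix test can be wrapped in a small function, with rows grouped in a dictionary of lists instead of three separate lists. This is a minimal sketch: `post_type` is a hypothetical helper name, and the rows are toy data in the same column order as `hn`.

```python
def post_type(title):
    """Classify a title as 'ask', 'show', or 'other' (hypothetical helper)."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"

# Toy rows: id, title, url, num_points, num_comments, author, created_at
rows = [
    ["1", "Ask HN: How do you learn?", "", "5", "12", "a", "1/1/2016 10:00"],
    ["2", "Show HN: My side project", "", "9", "3", "b", "1/1/2016 11:00"],
    ["3", "Interactive Dynamic Video", "", "386", "52", "c", "8/4/2016 11:52"],
]

groups = {"ask": [], "show": [], "other": []}
for row in rows:
    groups[post_type(row[1])].append(row)

# Number of posts in each group
print({name: len(posts) for name, posts in groups.items()})
```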


# Calculate the Average Number of Comments for Ask HN and Show HN Posts

In [4]:
total_ask_comments = 0
for row in ask_posts:
    num_comments = row[4]
    num_comments = int(num_comments)
    total_ask_comments += num_comments
    
avg_ask_comments = total_ask_comments / len(ask_posts)
print(avg_ask_comments)

total_show_comments = 0
for row in show_posts:
    num_comments = row[4]
    num_comments = int(num_comments)
    total_show_comments += num_comments
    
avg_show_comments = total_show_comments / len(show_posts)
print(avg_show_comments)



14.038417431192661
10.31669535283993


"Ask HN" posts receive an average of about 14.04 comments per post, while "Show HN" posts receive an average of about 10.32 comments per post.

Since "Ask HN" posts receive more comments on average, we'll focus our attention on them. Next, we'll analyze whether "Ask HN" posts created at certain hours of the day receive more comments.
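The two loops in `In [4]` differ only in which list they sum over, so the calculation can be factored into one function. A minimal sketch, assuming the same row layout with `num_comments` at index 4; `average_comments` is a hypothetical name, and the rows below are toy data.

```python
def average_comments(posts, comments_index=4):
    """Return the mean of the num_comments column, or 0.0 for an empty list."""
    if not posts:
        return 0.0  # avoid ZeroDivisionError on an empty group
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)

# Toy rows: only index 4 (num_comments) is used.
ask_sample = [["", "", "", "", "52"], ["", "", "", "", "10"], ["", "", "", "", "3"]]
print(round(average_comments(ask_sample), 2))  # (52 + 10 + 3) / 3
```

With the notebook's lists, `average_comments(ask_posts)` and `average_comments(show_posts)` would replace the two loops.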

# Find the Number of Ask HN Posts and Comments Per Hour

In [5]:
import datetime as dt

# Pair each Ask HN post's creation time with its comment count
result_list = []
for row in ask_posts:
    created_at = row[6]
    comments = int(row[4])
    result_list.append([created_at, comments])

counts_by_hour = {}    # hour -> number of Ask HN posts created in that hour
comments_by_hour = {}  # hour -> total comments on those posts

for created_at, comment in result_list:
    # Parse strings like '8/4/2016 11:52' and keep the two-digit hour, e.g. '11'
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")

    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comment
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comment

print(counts_by_hour)
print(comments_by_hour)

{'00': 55, '21': 109, '09': 45, '16': 108, '05': 46, '08': 48, '11': 58, '18': 109, '07': 34, '23': 68, '14': 107, '02': 58, '12': 73, '20': 80, '15': 116, '13': 85, '17': 100, '22': 71, '01': 60, '04': 47, '03': 54, '06': 44, '19': 110, '10': 59}
{'00': 447, '21': 1745, '09': 251, '16': 1814, '05': 464, '08': 492, '11': 641, '18': 1439, '07': 267, '23': 543, '14': 1416, '02': 1381, '12': 687, '20': 1722, '15': 4477, '13': 1253, '17': 1146, '22': 479, '01': 683, '04': 337, '03': 421, '06': 397, '19': 1188, '10': 793}
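The same tallies can be built with `collections.defaultdict`, which removes the need for separate first-seen and already-seen branches. This sketch also reads the hour from the parsed `datetime` object's `.hour` attribute rather than formatting it back with `strftime("%H")`; that is a design choice, and it makes the keys integers such as `15` instead of strings such as `'15'`. The timestamps are toy values in the dataset's `%m/%d/%Y %H:%M` format.

```python
import datetime as dt
from collections import defaultdict

# Toy (created_at, num_comments) pairs in the same shape as result_list.
result_list = [
    ["8/4/2016 15:52", 40],
    ["1/26/2016 15:30", 10],
    ["6/17/2016 0:01", 1],
]

counts_by_hour = defaultdict(int)    # missing hours start at 0 posts
comments_by_hour = defaultdict(int)  # missing hours start at 0 comments

for created_at, comments in result_list:
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").hour
    counts_by_hour[hour] += 1
    comments_by_hour[hour] += comments

print(dict(counts_by_hour))
print(dict(comments_by_hour))
```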


Next, we'll create a list of lists in which each inner list has two elements: the hour in which posts were created and the average number of comments per post created in that hour.

In [6]:
avg_by_hour = []

for hr in comments_by_hour:
    avg_by_hour.append([hr, comments_by_hour[hr] / counts_by_hour[hr]])
           
print(avg_by_hour)


[['00', 8.127272727272727], ['21', 16.009174311926607], ['09', 5.5777777777777775], ['16', 16.796296296296298], ['05', 10.08695652173913], ['08', 10.25], ['11', 11.051724137931034], ['18', 13.20183486238532], ['07', 7.852941176470588], ['23', 7.985294117647059], ['14', 13.233644859813085], ['02', 23.810344827586206], ['12', 9.41095890410959], ['20', 21.525], ['15', 38.5948275862069], ['13', 14.741176470588234], ['17', 11.46], ['22', 6.746478873239437], ['01', 11.383333333333333], ['04', 7.170212765957447], ['03', 7.796296296296297], ['06', 9.022727272727273], ['19', 10.8], ['10', 13.440677966101696]]
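Because `counts_by_hour` and `comments_by_hour` share the same keys, the averages can also be computed in a single dictionary comprehension. This sketch uses three hours copied from the output above (`'15'`, `'02'` and `'09'`):

```python
counts_by_hour = {"15": 116, "02": 58, "09": 45}
comments_by_hour = {"15": 4477, "02": 1381, "09": 251}

# Average per hour: total comments divided by the number of posts in that hour.
avg_by_hour = {hr: comments_by_hour[hr] / counts_by_hour[hr] for hr in counts_by_hour}

for hr, avg in avg_by_hour.items():
    print(hr, round(avg, 2))
```

A dictionary keeps the hour-to-average lookup direct; `list(avg_by_hour.items())` recovers pairs like the list of lists above if that shape is needed.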


# Sort the List and Print the 5 Highest Values

In [11]:
# Swap each [hour, average] pair so that sorting orders by the average
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

print(swap_avg_by_hour)

sorted_swap = sorted(swap_avg_by_hour, reverse=True)

print("Top 5 Hours for Ask Posts Comments")

for avg, hr in sorted_swap[:5]:
    # Format an hour string such as '15' as '15:00'
    hour_label = dt.datetime.strptime(hr, "%H").strftime("%H:%M")
    template = "{}: {:.2f} average comments per post"
    print(template.format(hour_label, avg))

[[8.127272727272727, '00'], [16.009174311926607, '21'], [5.5777777777777775, '09'], [16.796296296296298, '16'], [10.08695652173913, '05'], [10.25, '08'], [11.051724137931034, '11'], [13.20183486238532, '18'], [7.852941176470588, '07'], [7.985294117647059, '23'], [13.233644859813085, '14'], [23.810344827586206, '02'], [9.41095890410959, '12'], [21.525, '20'], [38.5948275862069, '15'], [14.741176470588234, '13'], [11.46, '17'], [6.746478873239437, '22'], [11.383333333333333, '01'], [7.170212765957447, '04'], [7.796296296296297, '03'], [9.022727272727273, '06'], [10.8, '19'], [13.440677966101696, '10']]


Top 5 Hours for Ask Posts Comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
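Swapping each pair is one way to make `sorted` order by the average. Passing a `key` function gives the same ordering without building the swapped list: `operator.itemgetter(1)` selects the second element of each `[hour, average]` pair. This sketch reuses a subset of `avg_by_hour` copied from the output above.

```python
from operator import itemgetter

# A subset of avg_by_hour copied from the output above.
avg_by_hour = [
    ["00", 8.127272727272727],
    ["15", 38.5948275862069],
    ["02", 23.810344827586206],
    ["20", 21.525],
    ["09", 5.5777777777777775],
]

# Sort by the average (index 1), highest first; no swapped copy needed.
ranked = sorted(avg_by_hour, key=itemgetter(1), reverse=True)

for hr, avg in ranked[:3]:
    print(f"{hr}:00: {avg:.2f} average comments per post")
```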


# Conclusion

In this sample, "Ask HN" posts receive more comments on average than "Show HN" posts. Among "Ask HN" posts, those created during the 15:00 hour (Eastern Time) receive the most comments, averaging 38.59 comments per post, which suggests it is the best hour to post them.

"Show HN" posts trailed behind, averaging 10.32 comments per post compared with 14.04 for "Ask HN" posts.
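To express the gap between the two post types in relative terms, the averages printed in `In [4]` can be compared directly. The arithmetic below uses those two values as given:

```python
avg_ask_comments = 14.038417431192661
avg_show_comments = 10.31669535283993

# Percentage by which the average Ask HN post out-comments the average Show HN post.
relative_gap = (avg_ask_comments / avg_show_comments - 1) * 100
print(f"Ask HN posts average {relative_gap:.0f}% more comments than Show HN posts")
```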