## Hacker News Ask/Show Posts Analysis

In this project we examine the differences between ask and show posts on the website "Hacker News". In ask posts, users pose specific questions to the Hacker News community, while in show posts, users show the community a project, product, or just something interesting. Specifically, we examine which type of post receives more comments on average, and whether ask posts created at a certain hour of the day receive more comments on average.

In [2]:
# opening and reading hacker_news.csv

from csv import reader
with open("hacker_news.csv") as opened_file:  # the file is closed automatically after reading
    read_file=reader(opened_file)
    hn=list(read_file)

print(hn[0:5])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


In [2]:
# extracting headers (run this cell only once: re-running it would drop the first data row)

headers=hn[0]
hn=hn[1:]
print(headers)
print(hn[0:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


In [3]:
# categorizing posts into ask, show, and other posts

ask_posts=[]
show_posts=[]
other_posts=[]

for row in hn:
    title=row[1]
    if (title.lower()).startswith("ask hn"):
        ask_posts.append(row)
    elif (title.lower()).startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)

print("No. ask posts:",len(ask_posts))
print("No. show posts:",len(show_posts))
print("No. other posts:",len(other_posts))

No. ask posts: 1744
No. show posts: 1162
No. other posts: 17194


In [4]:
# calculating avg. no. comments on ask posts

total_ask_comments=0
no_ask_posts=len(ask_posts)

for row in ask_posts:
    ask_comments=int(row[4])
    total_ask_comments+=ask_comments

avg_ask_comments=total_ask_comments/no_ask_posts
print("Avg. no. comments on ask posts:",avg_ask_comments)

# calculating avg. no. comments on show posts

total_show_comments=0
no_show_posts=len(show_posts)

for row in show_posts:
    show_comments=int(row[4])
    total_show_comments+=show_comments

avg_show_comments=total_show_comments/no_show_posts
print("Avg. no. comments on show posts:",avg_show_comments)

Avg. no. comments on ask posts: 14.038417431192661
Avg. no. comments on show posts: 10.31669535283993


We can see that ask posts get ~36% more comments on average than show posts (14.04 vs. 10.32). Ask posts probably invite more discussion and debate, which naturally generates more comments, whereas show posts are geared more towards appreciation (or the opposite).
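
The ~36% figure follows directly from the two averages printed above. A quick check, with the averages hard-coded from that output (the name `pct_more` is introduced here only for illustration):

```python
# averages copied from the printed output above
avg_ask_comments = 14.038417431192661
avg_show_comments = 10.31669535283993

# relative difference: by what percentage the ask average exceeds the show average
pct_more = (avg_ask_comments / avg_show_comments - 1) * 100
print("Ask posts receive {:.0f}% more comments on average".format(pct_more))
```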

Now we will dig deeper into ask posts and analyze the number of posts and comments by the hour of day at which the posts were created.
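
The next cell extracts the hour from each `created_at` string with `datetime.strptime`. As a minimal illustration of that step, here is the parsing applied to a single timestamp taken from the first data row shown earlier (the names `sample_created_at`, `parsed` and `sample_hour` exist only in this sketch):

```python
import datetime as dt

# created_at value of the first data row printed earlier
sample_created_at = "8/4/2016 11:52"

# "%m/%d/%Y %H:%M" reads month/day/four-digit year, then 24-hour hour:minute
parsed = dt.datetime.strptime(sample_created_at, "%m/%d/%Y %H:%M")

# the .hour attribute is what the analysis groups posts by
sample_hour = parsed.hour
print(sample_hour)
```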

In [5]:
# calculating the number of ask posts created at each hour of the day and the total number of comments on those posts

import datetime as dt

result_list=[]

for row in ask_posts:
    result_list.append([row[6],int(row[4])])
    
counts_by_hour={}
comments_by_hour={}

for row in result_list:
    created_at=dt.datetime.strptime(row[0],"%m/%d/%Y %H:%M")
    created_at_hour=created_at.hour
    if created_at_hour not in counts_by_hour:
        counts_by_hour[created_at_hour]=1
        comments_by_hour[created_at_hour]=row[1]
    else:
        counts_by_hour[created_at_hour]+=1
        comments_by_hour[created_at_hour]+=row[1]
        
print(counts_by_hour)
print(comments_by_hour)

{9: 45, 13: 85, 10: 59, 14: 107, 16: 108, 23: 68, 12: 73, 17: 100, 15: 116, 21: 109, 20: 80, 2: 58, 18: 109, 3: 54, 5: 46, 19: 110, 1: 60, 22: 71, 8: 48, 4: 47, 0: 55, 6: 44, 7: 34, 11: 58}
{9: 251, 13: 1253, 10: 793, 14: 1416, 16: 1814, 23: 543, 12: 687, 17: 1146, 15: 4477, 21: 1745, 20: 1722, 2: 1381, 18: 1439, 3: 421, 5: 464, 19: 1188, 1: 683, 22: 479, 8: 492, 4: 337, 0: 447, 6: 397, 7: 267, 11: 641}


In [8]:
# calculating average comments per post by hour

avg_by_hour=[]

for hour in counts_by_hour:
    post_hour_avg=comments_by_hour[hour]/counts_by_hour[hour]
    avg_by_hour.append([hour,post_hour_avg])
    
print(avg_by_hour)

[[9, 5.5777777777777775], [13, 14.741176470588234], [10, 13.440677966101696], [14, 13.233644859813085], [16, 16.796296296296298], [23, 7.985294117647059], [12, 9.41095890410959], [17, 11.46], [15, 38.5948275862069], [21, 16.009174311926607], [20, 21.525], [2, 23.810344827586206], [18, 13.20183486238532], [3, 7.796296296296297], [5, 10.08695652173913], [19, 10.8], [1, 11.383333333333333], [22, 6.746478873239437], [8, 10.25], [4, 7.170212765957447], [0, 8.127272727272727], [6, 9.022727272727273], [7, 7.852941176470588], [11, 11.051724137931034]]


In [30]:
# printing top 5 most actively commented hours in the day

swap_avg_by_hour=[]

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1],row[0]])

sorted_swap=sorted(swap_avg_by_hour,reverse=True)
print(sorted_swap)

print("\n","Top 5 Hours for Ask Posts Comments:")

for row in sorted_swap[0:5]:
    hour_template=dt.datetime.strptime(str(row[1]),"%H")
    hour_template=hour_template.strftime("%H:%M")
    template="{}: {:.2f} average comments per post"
    print(template.format(hour_template,row[0]))

[[38.5948275862069, 15], [23.810344827586206, 2], [21.525, 20], [16.796296296296298, 16], [16.009174311926607, 21], [14.741176470588234, 13], [13.440677966101696, 10], [13.233644859813085, 14], [13.20183486238532, 18], [11.46, 17], [11.383333333333333, 1], [11.051724137931034, 11], [10.8, 19], [10.25, 8], [10.08695652173913, 5], [9.41095890410959, 12], [9.022727272727273, 6], [8.127272727272727, 0], [7.985294117647059, 23], [7.852941176470588, 7], [7.796296296296297, 3], [7.170212765957447, 4], [6.746478873239437, 22], [5.5777777777777775, 9]]

 Top 5 Hours for Ask Posts Comments:
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
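
The swap step in the cell above is one way to sort by the average. An alternative sketch, assuming the same `[hour, average]` layout, passes a `key` function to `sorted` so that the pairs keep their original order. The list `sample_avg_by_hour` is a hand-copied subset of the output above, used only for illustration:

```python
# a few [hour, average] pairs copied from the avg_by_hour output above
sample_avg_by_hour = [
    [9, 5.5777777777777775],
    [15, 38.5948275862069],
    [2, 23.810344827586206],
    [20, 21.525],
]

# sort on the second element (the average), highest first
sorted_sample = sorted(sample_avg_by_hour, key=lambda pair: pair[1], reverse=True)

for hour, avg in sorted_sample:
    print("{:02d}:00: {:.2f} average comments per post".format(hour, avg))
```

One difference to be aware of: with the swap approach, ties in the average are broken by hour, whereas `sorted` with a `key` keeps tied entries in their input order.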


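From our analysis, we can see that the hours with the most engagement (i.e. the most comments per post) are mostly in the afternoon and evening: 15:00 leads with 38.59 comments per post, and 20:00, 16:00 and 21:00 are also in the top five. The one exception is 02:00, which ranks second with 23.81. Apart from that hour, early morning posts do quite poorly: none of the hours from 03:00 to 09:00 averages more than about 10 comments per post, and 09:00 has the lowest average of all at 5.58. It is worth noting that these times are in EST.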
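
Since the hours are reported in EST, readers elsewhere need to shift them. A rough sketch of that conversion to UTC follows. It assumes EST can be treated as a fixed UTC-5 offset and ignores daylight saving time, which may not match how the dataset's timestamps were recorded. The names `EST` and `to_utc_hour` are made up for this example:

```python
import datetime as dt

# assumption: EST is treated as a fixed UTC-5 offset (no daylight saving)
EST = dt.timezone(dt.timedelta(hours=-5), "EST")

def to_utc_hour(est_hour):
    """Return the UTC hour (0-23) that matches an hour of the day in EST."""
    # the date is arbitrary; with a fixed offset only the hour matters
    local = dt.datetime(2016, 1, 1, est_hour, tzinfo=EST)
    return local.astimezone(dt.timezone.utc).hour

# top 5 hours taken from the output above
for est_hour in [15, 2, 20, 16, 21]:
    print("{:02d}:00 EST -> {:02d}:00 UTC".format(est_hour, to_utc_hour(est_hour)))
```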