# Most popular posts on Hacker News

In this project we'll analyse a sample dataset from Hacker News (HN) to determine whether `Ask HN` or `Show HN` posts get more comments on average, and whether the time at which a post is created influences the number of comments it receives.

In [1]:
from csv import reader

In [14]:
with open('hacker_news.csv') as opened_file: # The with block closes the file automatically
    read_file=reader(opened_file)
    hn=list(read_file)

headers=hn[0]
hn=hn[1:]

print(headers)

for row in hn[0:5]:
    print('\n')
    print(row)

print("\n Number of rows: ", len(hn))
    

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']


['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']


['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']


['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']


['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']

 Number of rows:  20100
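
The header/rows split above can also be wrapped in a small helper. The sketch below is only an illustration: the `load_rows` name is hypothetical, and an inline sample (built from the header and first data row shown above) stands in for `hacker_news.csv` so the snippet runs without the file.

```python
import csv
import io

def load_rows(file_obj):
    """Return (header, rows) read from an open CSV file object."""
    rows = list(csv.reader(file_obj))
    return rows[0], rows[1:]

# Inline sample standing in for hacker_news.csv (an assumption for this sketch).
sample = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,"
    "386,52,ne0phyte,8/4/2016 11:52\n"
)

header, rows = load_rows(sample)
print(header)
print("Number of rows:", len(rows))

# With the real file, pass an open file object instead:
# with open('hacker_news.csv', newline='') as f:
#     header, rows = load_rows(f)
```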

## Filtering data

Let's separate the posts whose titles start with `Ask HN` or `Show HN` from all the others:

In [12]:
ask_posts=[] # List of posts starting with "Ask HN"
show_posts=[] # List of posts starting with "Show HN"
other_posts=[] # All others

for row in hn:
    title=row[1]
    title_lower=title.lower()
    if title_lower.startswith('ask hn'):
        ask_posts.append(row)
    elif title_lower.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

print('The number of "Ask HN" posts is:\t', len(ask_posts))
print('The number of "Show HN" posts is:\t', len(show_posts))  
print('The number of other posts is:\t\t', len(other_posts))    

The number of "Ask HN" posts is:	 1744
The number of "Show HN" posts is:	 1162
The number of other posts is:		 17194
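
The same prefix test can be isolated in a function, which makes the case-insensitive matching easy to check on its own. This is a minimal sketch: `classify_title` is a hypothetical helper, and the sample titles are made up for illustration, except the last one, which comes from the rows printed earlier.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix (case-insensitive)."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'

# Illustrative titles (assumptions), plus one real title from the dataset sample.
titles = [
    'Ask HN: How do you learn a new language?',
    'ask hn: lowercase prefixes match too',
    'Show HN: My weekend project',
    'Interactive Dynamic Video',
]

groups = {'ask': [], 'show': [], 'other': []}
for title in titles:
    groups[classify_title(title)].append(title)

for name, members in groups.items():
    print(name, len(members))
```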


## Average number of comments

In [20]:
total_ask_comments=0 # Here we will sum up all "Ask HN" comments

for row in ask_posts:
    comments=int(row[4])
    total_ask_comments+=comments

total_show_comments=0 # Here we will sum up all "Show HN" comments
for row in show_posts:
    comments=int(row[4])
    total_show_comments+=comments

# Calculate average comments on each group
avg_ask_comments=round(total_ask_comments/len(ask_posts))
avg_show_comments=round(total_show_comments/len(show_posts))

# Display averages
print('"Ask HN" posts received an average of ', avg_ask_comments, " comments")
print('"Show HN" posts received an average of ', avg_show_comments, " comments")

"Ask HN" posts received an average of  14  comments
"Show HN" posts received an average of  10  comments

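Note that `round()` without a second argument returns an integer, so the averages above are rounded to whole comments. The sketch below shows the difference on a few made-up comment counts (not taken from the dataset); the `average` helper is hypothetical.

```python
def average(values):
    """Return the arithmetic mean of a list of numbers, or 0.0 if it is empty."""
    return sum(values) / len(values) if values else 0.0

# Illustrative comment counts (an assumption, not real data).
sample_counts = [3, 10, 4, 2]

mean = average(sample_counts)
print("Rounded to an integer:", round(mean))
print("With two decimals: {:.2f}".format(mean))
```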

## Does time influence the number of comments?

We will build frequency tables that count, for each hour of the day, the number of `Ask HN` posts created and the total number of comments they received.

In [21]:
import datetime as dt

In [27]:
result_list=[] #list of lists, each list will contain date and number of comments

for row in ask_posts:
    created_at=row[6]
    comments=int(row[4])
    result_list.append([created_at,comments])   

print(result_list[0:3])

[['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29], ['5/2/2016 10:14', 1]]


In [31]:
counts_by_hour={}
comments_by_hour={}

for row in result_list:
    date_and_time=row[0] #Extract the date and time as a string
    dt_object=dt.datetime.strptime(date_and_time, "%m/%d/%Y %H:%M") # Parse date and time as a datetime object
    hour=dt_object.strftime("%H") # Extract the hour as a string
    if hour not in counts_by_hour:
        counts_by_hour[hour]=1
        comments_by_hour[hour]=row[1]
    else:
        counts_by_hour[hour]+=1
        comments_by_hour[hour]+=row[1]

print(" Posts by hour:\n")
print(counts_by_hour)
print("\n Comments by hour:\n")

print(comments_by_hour)

 Posts by hour:

{'09': 45, '13': 85, '10': 59, '14': 107, '16': 108, '23': 68, '12': 73, '17': 100, '15': 116, '21': 109, '20': 80, '02': 58, '18': 109, '03': 54, '05': 46, '19': 110, '01': 60, '22': 71, '08': 48, '04': 47, '00': 55, '06': 44, '07': 34, '11': 58}

 Comments by hour:

{'09': 251, '13': 1253, '10': 793, '14': 1416, '16': 1814, '23': 543, '12': 687, '17': 1146, '15': 4477, '21': 1745, '20': 1722, '02': 1381, '18': 1439, '03': 421, '05': 464, '19': 1188, '01': 683, '22': 479, '08': 492, '04': 337, '00': 447, '06': 397, '07': 267, '11': 641}
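
As an alternative to the explicit `if`/`else` branches, the two tables could be built with `collections.defaultdict`, which starts every missing key at zero. This is only a sketch of that variant, run on the three sample rows printed earlier rather than on the full list.

```python
import datetime as dt
from collections import defaultdict

# The three [created_at, comments] pairs shown in the output of result_list[0:3].
sample = [['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29], ['5/2/2016 10:14', 1]]

counts_by_hour = defaultdict(int)    # posts created in each hour
comments_by_hour = defaultdict(int)  # comments received by posts created in each hour

for created_at, comments in sample:
    # Parse the timestamp and keep the zero-padded hour as a string, e.g. '09'.
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")
    counts_by_hour[hour] += 1
    comments_by_hour[hour] += comments

print(dict(counts_by_hour))
print(dict(comments_by_hour))
```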


## Average number of comments on a post

Let's iterate over the comments dictionary and divide each hour's total number of comments by the number of posts created in that hour, to obtain the average number of comments per post.

In [65]:
avg_by_hour=[] #average comments on a post by hour

for hour in comments_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour]/counts_by_hour[hour]])

swap_avg_by_hour=[] #swapped columns to perform the sorting based on comments
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

sorted_swap=sorted(swap_avg_by_hour, reverse=True) #it sorts based on the first column

print("Top 5 Hours for Ask Posts Comments, Eastern Standard Time:\n")

for row in sorted_swap[0:5]: #displays results
    hour_object=dt.datetime.strptime(row[1],"%H")
    hour_string=hour_object.strftime("%H:%M")
    print("{} ET: {:.2f} average comments per post".format(hour_string,row[0]))

print("\n")
print("Top 5 Hours for Ask Posts Comments, Santiago time:\n")

for row in sorted_swap[0:5]: #displays results
    hour_object=dt.datetime.strptime(row[1],"%H")
    hour_SCL=hour_object+dt.timedelta(hours=2) # Assumes Santiago is 2 hours ahead of Eastern Time; the real offset varies with daylight saving
    hour_string=hour_SCL.strftime("%H:%M")
    print("{}: {:.2f} average comments per post".format(hour_string, row[0])) 

Top 5 Hours for Ask Posts Comments, Eastern Standard Time:

15:00 ET: 38.59 average comments per post
02:00 ET: 23.81 average comments per post
20:00 ET: 21.52 average comments per post
16:00 ET: 16.80 average comments per post
21:00 ET: 16.01 average comments per post


Top 5 Hours for Ask Posts Comments, Santiago time:

17:00: 38.59 average comments per post
04:00: 23.81 average comments per post
22:00: 21.52 average comments per post
18:00: 16.80 average comments per post
23:00: 16.01 average comments per post
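
The column swap is one way to sort by the average. Another option is the `key` argument of `sorted()`, which sorts the pairs directly by their second element. The sketch below uses a subset of the per-hour totals printed earlier (six hours only, chosen for illustration).

```python
# Subset of the notebook's per-hour totals for "Ask HN" posts.
counts_by_hour = {'15': 116, '02': 58, '20': 80, '16': 108, '21': 109, '09': 45}
comments_by_hour = {'15': 4477, '02': 1381, '20': 1722, '16': 1814, '21': 1745, '09': 251}

# Average comments per post for each hour.
avg_by_hour = {
    hour: comments_by_hour[hour] / counts_by_hour[hour]
    for hour in counts_by_hour
}

# Sort (hour, average) pairs by the average, highest first, without swapping columns.
top = sorted(avg_by_hour.items(), key=lambda item: item[1], reverse=True)

for hour, avg in top[:3]:
    print("{}:00 ET: {:.2f} average comments per post".format(hour, avg))
```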


## Conclusions

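Based on these results, "Ask HN" posts receive more comments on average than "Show HN" posts (about 14 versus 10). The best time to create an "Ask HN" post in order to receive comments is 15:00 Eastern Time (17:00 Santiago time), when posts average 38.59 comments each.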