## Determining which type and timing of Hacker News posts yield the most comments and points

Among the many categories of posts on Hacker News, two are of interest here:<br>
1) Ask HN, where users ask the Hacker News community a specific question<br>
2) Show HN, where users show the Hacker News community a project, a product, or something generally interesting

This analysis aims to find which type of post and which posting time yield the highest number of comments and points per post. With this information, Hacker News users can decide what type of post to submit, and when, to get the audience engagement they want.

The documentation for this dataset can be found at: https://www.kaggle.com/hacker-news/hacker-news-posts/home

In [215]:
#Displaying the first 5 rows (header row included) of the raw data file:
from csv import reader

#The with-block closes the file once it has been read
with open('hacker_news.csv') as opened_file:
    hn = list(reader(opened_file))

for each_row in hn[:5]:
    print(each_row)

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']
['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']
['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']
['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']


### 1) Separating rows into 'Ask HN', 'Show HN' and 'Other Posts' and finding out how many posts belong in each category:

In [216]:
headers = hn[0:1]
hn = hn[1:]

#Gathering lists of all rows corresponding to Ask HN, Show HN and other posts:
ask_posts = []
show_posts = []
other_posts = []
for row in hn:
    title = row[1].lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

#Number of posts in each category:
ask_posts_count = len(ask_posts)
show_posts_count = len(show_posts)
other_posts_count = len(other_posts)
print(ask_posts_count)
print(show_posts_count)
print(other_posts_count)

1744
1162
17194
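
The split above can also be written with a small classifier function and `collections.Counter`. This is only a minimal sketch: the `categorize` helper and the sample titles are illustrative assumptions, not part of the original notebook or dataset.

```python
from collections import Counter

def categorize(title):
    """Return 'ask', 'show', or 'other' based on the title prefix (case-insensitive)."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'

# Hypothetical titles for illustration only; the real titles come from hacker_news.csv.
sample_titles = [
    'Ask HN: How do you learn a new language?',
    'Show HN: My weekend project',
    'Interactive Dynamic Video',
    'ask hn: lower-case prefix still matches',
]

# Count how many titles fall into each category.
category_counts = Counter(categorize(title) for title in sample_titles)
print(category_counts)
```

Keeping the prefix test in one function means the matching rule is defined in a single place if the categories ever need to change.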


### 2) Determining the total and average number of comments in each category of posts:

In [217]:
#Determining total number of comments in Ask Hn-type posts:
total_ask_comments = 0
for each_post in ask_posts:
    total_ask_comments += int(each_post[4])
print('Total number of comments in Ask Hn-type posts are:', total_ask_comments)

#Determining average number of comments in Ask Hn-type posts:
ask_average_num_comments = total_ask_comments/ask_posts_count
print('Average number of comments in Ask Hn-type posts are:', ask_average_num_comments)

#Determining total number of comments in Show Hn-type posts:
total_show_comments = 0
for each_post in show_posts:
    total_show_comments += int(each_post[4])
print('Total number of comments in Show Hn-type posts are:', total_show_comments)

#Determining average number of comments in Show Hn-type posts:
average_show_comments = total_show_comments/show_posts_count
print('Average number of comments in Show Hn-type posts are:', average_show_comments)

Total number of comments in Ask Hn-type posts are: 24483
Average number of comments in Ask Hn-type posts are: 14.038417431192661
Total number of comments in Show Hn-type posts are: 11988
Average number of comments in Show Hn-type posts are: 10.31669535283993


Ask HN posts generate a higher average number of comments per post than Show HN posts (about 14.04 versus 10.32), so the comment analysis that follows focuses on Ask HN posts.
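
The averaging step is the same for every category, so it could be wrapped in one helper. The sketch below assumes the column order shown in the header row (`num_points` at index 3, `num_comments` at index 4); the `average_column` function and the sample rows are hypothetical and exist only for illustration.

```python
def average_column(rows, column_index):
    """Return the mean of one numeric column across rows, or 0.0 for an empty list."""
    if not rows:
        return 0.0
    return sum(int(row[column_index]) for row in rows) / len(rows)

# Hypothetical rows in the dataset's column order:
# id, title, url, num_points, num_comments, author, created_at
sample_rows = [
    ['1', 'Ask HN: A?', '', '10', '4', 'alice', '8/4/2016 11:52'],
    ['2', 'Ask HN: B?', '', '20', '8', 'bob', '8/4/2016 12:10'],
]

NUM_POINTS, NUM_COMMENTS = 3, 4
print('Average comments per post:', average_column(sample_rows, NUM_COMMENTS))
print('Average points per post:', average_column(sample_rows, NUM_POINTS))
```

With the real data, the same call could be made on `ask_posts` and `show_posts` in turn.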

### 3) Determining the number of posts and comments by hour:

In [218]:
import datetime as dt

#List of lists where each row has 2 elements: hour, number of comments
list_of_time_and_comments = []
for row in ask_posts:
    date_and_time = dt.datetime.strptime(row[-1], '%m/%d/%Y %H:%M')
    time_in_hr = date_and_time.strftime('%H')
    list_of_time_and_comments.append([time_in_hr, float(row[4])])

#Forming 2 dictionaries of hour:number of posts and hour:number of comments
num_of_posts_per_hr = {}
num_of_comments_per_hr = {}
for hour, comments in list_of_time_and_comments:
    if hour in num_of_posts_per_hr:
        num_of_posts_per_hr[hour] += 1
        num_of_comments_per_hr[hour] += comments
    else:
        num_of_posts_per_hr[hour] = 1
        num_of_comments_per_hr[hour] = comments

#Printing the no. of posts by hour in descending order:
counts_by_hr = [(v, k) for k, v in num_of_posts_per_hr.items()] #converting dict into a list of tuples for sorting
counts_by_hr = sorted(counts_by_hr, reverse=True)
for count, hour in counts_by_hr:
    print('At', hour, '00 hours --> num of posts =', count)

print('\n')

#Printing the no. of comments by hour in descending order:
comments_by_hour = [(v, k) for k, v in num_of_comments_per_hr.items()]
comments_by_hour = sorted(comments_by_hour, reverse=True)
for comments, hour in comments_by_hour:
    print('At', hour, '00 hours --> num of comments =', comments)

At 15 00 hours --> num of posts = 116
At 19 00 hours --> num of posts = 110
At 21 00 hours --> num of posts = 109
At 18 00 hours --> num of posts = 109
At 16 00 hours --> num of posts = 108
At 14 00 hours --> num of posts = 107
At 17 00 hours --> num of posts = 100
At 13 00 hours --> num of posts = 85
At 20 00 hours --> num of posts = 80
At 12 00 hours --> num of posts = 73
At 22 00 hours --> num of posts = 71
At 23 00 hours --> num of posts = 68
At 01 00 hours --> num of posts = 60
At 10 00 hours --> num of posts = 59
At 11 00 hours --> num of posts = 58
At 02 00 hours --> num of posts = 58
At 00 00 hours --> num of posts = 55
At 03 00 hours --> num of posts = 54
At 08 00 hours --> num of posts = 48
At 04 00 hours --> num of posts = 47
At 05 00 hours --> num of posts = 46
At 09 00 hours --> num of posts = 45
At 06 00 hours --> num of posts = 44
At 07 00 hours --> num of posts = 34


At 15 00 hours --> num of comments = 4477.0
At 16 00 hours --> num of comments = 1814.0
At 21 00 hours 

### 4) Determining the hours with the highest average number of comments per post:

In [219]:
#Average comments per post for each hour (both dictionaries share the same hour keys)
average_comments_per_post_hr = {}
for hour in num_of_posts_per_hr:
    average_comments_per_post_hr[hour] = num_of_comments_per_hr[hour] / num_of_posts_per_hr[hour]

avg_comments_per_post_by_hr = [(v, k) for k, v in average_comments_per_post_hr.items()]
avg_comments_per_post_by_hr = sorted(avg_comments_per_post_by_hr, reverse=True)

for average, hour in avg_comments_per_post_by_hr:
    print('At', hour, '00 hours --> Average num of comments per post =', '{:.2f}'.format(average))

At 15 00 hours --> Average num of comments per post = 38.59
At 02 00 hours --> Average num of comments per post = 23.81
At 20 00 hours --> Average num of comments per post = 21.52
At 16 00 hours --> Average num of comments per post = 16.80
At 21 00 hours --> Average num of comments per post = 16.01
At 13 00 hours --> Average num of comments per post = 14.74
At 10 00 hours --> Average num of comments per post = 13.44
At 14 00 hours --> Average num of comments per post = 13.23
At 18 00 hours --> Average num of comments per post = 13.20
At 17 00 hours --> Average num of comments per post = 11.46
At 01 00 hours --> Average num of comments per post = 11.38
At 11 00 hours --> Average num of comments per post = 11.05
At 19 00 hours --> Average num of comments per post = 10.80
At 08 00 hours --> Average num of comments per post = 10.25
At 05 00 hours --> Average num of comments per post = 10.09
At 12 00 hours --> Average num of comments per post = 9.41
At 06 00 hours --> Average num of comment

The top 5 hours with the highest average number of comments per post are:

1) 1500<br>
2) 0200<br>
3) 2000<br>
4) 1600<br>
5) 2100

These hours are in US Eastern Time.

The same top 5 hours converted to Singapore time, which is 12 hours ahead of US Eastern Daylight Time:

1) 0300<br>
2) 1400<br>
3) 0800<br>
4) 0400<br>
5) 0900
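
The conversion above can be checked with a short sketch. It assumes fixed offsets: Eastern Daylight Time at UTC-4 and Singapore time at UTC+8, which gives the 12-hour gap used here. During Eastern Standard Time (UTC-5) the gap is 13 hours, which this sketch does not model. The function name `eastern_hour_to_singapore` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets. EDT is UTC-4 and Singapore time is UTC+8,
# so Singapore is 12 hours ahead.
EASTERN_DAYLIGHT = timezone(timedelta(hours=-4))
SINGAPORE = timezone(timedelta(hours=8))

def eastern_hour_to_singapore(hour):
    """Convert an hour of day (0-23) in EDT to the hour of day in Singapore."""
    # The date is arbitrary; only the hour matters for this conversion.
    eastern_time = datetime(2016, 8, 4, hour, tzinfo=EASTERN_DAYLIGHT)
    return eastern_time.astimezone(SINGAPORE).hour

# Top 5 hours by average comments per post, in Eastern Time.
top_eastern_hours = [15, 2, 20, 16, 21]
for hour in top_eastern_hours:
    print('{:02d}00 ET -> {:02d}00 SGT'.format(hour, eastern_hour_to_singapore(hour)))
```

Working with timezone-aware `datetime` objects avoids manual modular arithmetic and handles the wrap past midnight automatically.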

Between Ask HN and Show HN posts, Ask HN posts created at the hours listed above receive the highest average number of comments per post. Comments are not the only measure of engagement, however. Points also signal how well a post is received, which matters to a user deciding what kind of post to submit and when. The next step is therefore to determine which type of post receives more points.

### 5) Determining whether Ask HN or Show HN posts receive more points per post on average:

In [220]:
#Determining the average number of points per post for Ask Hn type posts:
ask_posts_points = 0
total_num_of_posts = 0
for row in ask_posts:
    total_num_of_posts += 1
    ask_posts_points += float(row[3])
print('Average number of points per post for Ask Hn type posts = {:.2f}'.format(ask_posts_points/total_num_of_posts))
    
#Determining the average number of points per post for Show Hn type posts:
show_posts_points = 0
total_num_of_posts = 0
for row in show_posts:
    total_num_of_posts += 1
    show_posts_points += float(row[3])
print('Average number of points per post for Show Hn type posts = {:.2f}'.format(show_posts_points/total_num_of_posts))

Average number of points per post for Ask Hn type posts = 15.06
Average number of points per post for Show Hn type posts = 27.56


Show HN posts receive more points per post despite having fewer comments per post than Ask HN posts. This could mean that readers like Show HN content more than Ask HN content but feel less compelled to comment on it.

### 6) Determining if posts created at a certain time are more likely to receive more points:

Since Show HN posts receive more points per post on average, this section focuses on Show HN posts:

In [221]:
import datetime as dt

num_of_posts_per_hr = {}
num_of_points_per_hr = {}
for row in show_posts:
    created_at = dt.datetime.strptime(row[-1], '%m/%d/%Y %H:%M')
    hour = created_at.strftime('%H')
    if hour in num_of_posts_per_hr:
        num_of_points_per_hr[hour] += float(row[3])
        num_of_posts_per_hr[hour] += 1
    else:
        num_of_points_per_hr[hour] = float(row[3])
        num_of_posts_per_hr[hour] = 1

#Average points per post for each hour (both dictionaries share the same hour keys)
average_points_per_hr = {}
for hour in num_of_posts_per_hr:
    average_points_per_hr[hour] = num_of_points_per_hr[hour] / num_of_posts_per_hr[hour]

average_points_per_hour = [(v, k) for k, v in average_points_per_hr.items()]
average_points_per_hour = sorted(average_points_per_hour, reverse=True)

for average, hour in average_points_per_hour:
    print('At', hour, '00 hours --> average points per post = ', '{:.2f}'.format(average))

At 23 00 hours --> average points per post =  42.39
At 12 00 hours --> average points per post =  41.69
At 22 00 hours --> average points per post =  40.35
At 00 00 hours --> average points per post =  37.84
At 18 00 hours --> average points per post =  36.31
At 11 00 hours --> average points per post =  33.64
At 19 00 hours --> average points per post =  30.95
At 20 00 hours --> average points per post =  30.32
At 15 00 hours --> average points per post =  28.56
At 16 00 hours --> average points per post =  28.32
At 17 00 hours --> average points per post =  27.11
At 14 00 hours --> average points per post =  25.43
At 03 00 hours --> average points per post =  25.15
At 01 00 hours --> average points per post =  25.00
At 13 00 hours --> average points per post =  24.63
At 06 00 hours --> average points per post =  23.44
At 07 00 hours --> average points per post =  19.00
At 10 00 hours --> average points per post =  18.92
At 09 00 hours --> average points per post =  18.43
At 21 00 hou

Top 5 posting hours by average number of points per post:

1) 2300 (Eastern Time) -> 1100 (Singapore time)<br>
2) 1200 (Eastern Time) -> 0000 (Singapore time)<br>
3) 2200 (Eastern Time) -> 1000 (Singapore time)<br>
4) 0000 (Eastern Time) -> 1200 (Singapore time)<br>
5) 1800 (Eastern Time) -> 0600 (Singapore time)

Hence, users who care more about posting well-received content than about attracting the most comments will probably get the engagement they want from Show HN posts created at the hours listed above.
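
The hourly analysis of comments and the hourly analysis of points follow the same pattern, so both could share one function. The following is a minimal sketch under the assumption that `created_at` is the last column and uses the `%m/%d/%Y %H:%M` format seen in the data; `average_by_hour` and the sample rows are hypothetical.

```python
import datetime as dt
from collections import defaultdict

def average_by_hour(rows, column_index, date_format='%m/%d/%Y %H:%M'):
    """Map each posting hour ('00'-'23') to the mean of a numeric column."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        # created_at is assumed to be the last column of each row.
        hour = dt.datetime.strptime(row[-1], date_format).strftime('%H')
        totals[hour] += float(row[column_index])
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in totals}

# Hypothetical rows in the dataset's column order:
# id, title, url, num_points, num_comments, author, created_at
sample_rows = [
    ['1', 'Show HN: A', '', '10', '4', 'alice', '8/4/2016 11:52'],
    ['2', 'Show HN: B', '', '30', '6', 'bob', '8/5/2016 11:05'],
    ['3', 'Show HN: C', '', '7', '1', 'carol', '8/5/2016 23:40'],
]

# Column 3 holds num_points; pass 4 instead to average num_comments.
averages = average_by_hour(sample_rows, column_index=3)
for hour, average in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print('At {}00 hours --> average points per post = {:.2f}'.format(hour, average))
```

Sorting with a `key` function on the dictionary items avoids building the intermediate list of swapped `(value, key)` tuples.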

### Comparing Ask HN and Show HN posts with other posts based on average number of comments and points per post:

In [222]:
#Determining number of comments and points per post for Other Posts:
num_of_other_posts = 0
total_comments = 0
total_points = 0
for row in other_posts:
    num_of_other_posts += 1
    total_comments += float(row[4])
    total_points += float(row[3])
    
print('For Others category, the avg num of comments per post is {:.2f}'.format(total_comments/num_of_other_posts))
print('For Others category, the avg num of points per post is {:.2f}'.format(total_points/num_of_other_posts))

For Others category, the avg num of comments per post is 26.87
For Others category, the avg num of points per post is 55.41


For the Ask HN category, the avg num of comments per post is 14.04<br>
For the Ask HN category, the avg num of points per post is 15.06

For the Show HN category, the avg num of comments per post is 10.32<br>
For the Show HN category, the avg num of points per post is 27.56

Hence, although this report focuses on comparing Ask HN and Show HN posts, posts in the other categories receive far more comments and points per post on average.
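
The three categories can be placed side by side with a short sketch. The figures are copied from the results reported in this analysis; the dictionary layout and variable names are illustrative assumptions.

```python
# Averages per post, copied from the results reported in this analysis.
category_averages = {
    'Ask HN': {'comments': 14.04, 'points': 15.06},
    'Show HN': {'comments': 10.32, 'points': 27.56},
    'Other': {'comments': 26.87, 'points': 55.41},
}

# Print a simple aligned table of the averages.
print('{:<10}{:>10}{:>10}'.format('Category', 'Comments', 'Points'))
for name, stats in category_averages.items():
    print('{:<10}{:>10.2f}{:>10.2f}'.format(name, stats['comments'], stats['points']))

# Find the category with the highest average for each measure.
most_commented = max(category_averages, key=lambda name: category_averages[name]['comments'])
most_upvoted = max(category_averages, key=lambda name: category_averages[name]['points'])
print('Most comments per post:', most_commented)
print('Most points per post:', most_upvoted)
```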

### Conclusion

Comparing Ask HN and Show HN posts, Ask HN posts are more likely to attract comments, especially when created at 0300, 1400, 0800, 0400 or 0900 hours (Singapore time). Show HN posts are more likely to earn points, especially when created at 1100, 0000, 1000, 1200 or 0600 hours (Singapore time). Users who want more comments should therefore submit Ask HN posts at the former hours, while users who want more points should submit Show HN posts at the latter hours. This analysis was limited to comparing the Ask HN and Show HN categories. A comparison with the remaining posts shows that both categories receive far fewer comments and points per post on average than posts in the 'other' category.