## Analysis of Hacker News Dataset
* Hacker News is a site similar to Reddit, where users submit posts that other users vote and comment on.
* We are going to focus on posts whose titles begin with "Ask HN" or "Show HN".
* "Ask HN" posts are those in which a user asks the HN community a question, whereas "Show HN" posts are those in which a user shows the community a project, product, or something else of interest.

[Click here to view the Hacker News Dataset](https://www.kaggle.com/hacker-news/hacker-news-posts)

---

**Note** - I have used a subset of the dataset from the link above, so your results may differ even when you run the same code.

In [6]:
#reading the CSV file into a list of rows
#and separating the header row from the data
import datetime as dt
from csv import reader

#the with-block closes the file once it has been read
with open("hacker_news.csv") as opened:
    hn=list(reader(opened))
headers=hn[0]
hn=hn[1:]
print(headers)
print(hn[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


In [37]:
#separating "Ask HN","Show HN" and Other posts as per title
ask_posts=[]
show_posts=[]
other_posts=[]
for row in hn:
    title=row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
print("Records with 'Ask HN' : ",len(ask_posts))
print("Records with 'Show HN' : ",len(show_posts))
print("Other Records : ",len(other_posts))

Records with 'Ask HN' :  1744
Records with 'Show HN' :  1162
Other Records :  17194
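
The prefix check used in the cell above can also be pulled out into a small helper, which makes the case-insensitive matching easy to try on its own. This is only a minimal sketch: the function name `classify_title` and the sample titles are hypothetical and are not part of the dataset or the original notebook.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix (case-insensitive)."""
    lowered = title.lower()  # lower-case once so both checks ignore case
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"

# Hypothetical sample titles, used only to illustrate the three outcomes.
sample_titles = [
    "Ask HN: How do you learn?",
    "show hn: My side project",
    "Interactive Dynamic Video",
]
labels = [classify_title(title) for title in sample_titles]
print(labels)
```

Lower-casing the title once and reusing the result avoids calling `lower()` separately for each prefix.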


In [4]:
#checking whether 'Ask HN' or 'Show HN' posts receive more comments on average
def avg_comments(recv_list):
    #returns the total and the average number of comments for a list of posts
    total_comments=0
    for row in recv_list:
        num_comments=int(row[4])
        total_comments+=num_comments
    #named 'average' so that it does not shadow the function name
    average=total_comments/len(recv_list)
    return total_comments,average

total_ask_comments,avg_ask_comments=avg_comments(ask_posts)
total_show_comments,avg_show_comments=avg_comments(show_posts)
print("Average comments on ASK posts : ",avg_ask_comments)
print("Average comments on SHOW posts : ",avg_show_comments)

Average comments on ASK posts :  14.038417431192661
Average comments on SHOW posts :  10.31669535283993


#### Inference from the comment calculation above
* In the above cell, we calculated the total and average number of comments received by 'Ask HN' and 'Show HN' posts.
* As per the results, 'Ask HN' posts receive more comments on average (about 14.04 per post) than 'Show HN' posts (about 10.32 per post).
* We will therefore focus on the 'Ask HN' subset for further analysis.

In [25]:
#calculating the number of 'Ask HN' posts created in each hour of the day
#and the number of comments those posts received
result_list=[]
for row in ask_posts:
    local_list=[]
    local_list.append(row[6])
    local_list.append(int(row[4]))
    result_list.append(local_list)
    
counts_by_hour={}
comments_by_hours={}

for row in result_list:
    date=dt.datetime.strptime(row[0],"%m/%d/%Y %H:%M")
    hour=date.strftime("%H")
    if hour not in counts_by_hour:
        counts_by_hour[hour]=1
        comments_by_hours[hour]=row[1]
    else:
        counts_by_hour[hour]+=1
        comments_by_hours[hour]+=row[1]

#counting avg comments per post per given hour
avg_comments_hour=[]
for hour in counts_by_hour:
    local_list=[]
    local_list.append(hour)
    avg_comments_per_post=comments_by_hours[hour]/counts_by_hour[hour]
    local_list.append(avg_comments_per_post)
    avg_comments_hour.append(local_list)

print(avg_comments_hour)

[['21', 16.009174311926607], ['08', 10.25], ['17', 11.46], ['05', 10.08695652173913], ['07', 7.852941176470588], ['14', 13.233644859813085], ['20', 21.525], ['02', 23.810344827586206], ['03', 7.796296296296297], ['19', 10.8], ['04', 7.170212765957447], ['01', 11.383333333333333], ['22', 6.746478873239437], ['06', 9.022727272727273], ['15', 38.5948275862069], ['16', 16.796296296296298], ['11', 11.051724137931034], ['23', 7.985294117647059], ['09', 5.5777777777777775], ['00', 8.127272727272727], ['10', 13.440677966101696], ['13', 14.741176470588234], ['12', 9.41095890410959], ['18', 13.20183486238532]]
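
The same per-hour grouping can be written more compactly with `collections.defaultdict`, which removes the need for the `if hour not in ...` branch. The sketch below is an alternative under stated assumptions, not the notebook's method: the `sample` pairs are hypothetical values written in the dataset's `created_at` format, and the name `avg_by_hour` is introduced here for illustration.

```python
import datetime as dt
from collections import defaultdict

# Hypothetical (created_at, num_comments) pairs in the dataset's date format.
sample = [("8/4/2016 11:52", 6), ("9/30/2015 11:05", 2), ("1/26/2016 19:30", 10)]

counts_by_hour = defaultdict(int)    # number of posts per hour
comments_by_hour = defaultdict(int)  # total comments per hour

for created_at, num_comments in sample:
    # Parse the timestamp and keep only the zero-padded hour, e.g. "11".
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")
    counts_by_hour[hour] += 1
    comments_by_hour[hour] += num_comments

# Average comments per post for each hour that has at least one post.
avg_by_hour = {
    hour: comments_by_hour[hour] / counts_by_hour[hour]
    for hour in counts_by_hour
}
print(avg_by_hour)
```

Missing keys start at 0 in a `defaultdict(int)`, so the first post in a given hour is handled the same way as every later one.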


In [36]:
#swapping each [hour, average] pair and sorting the list in descending order of average comments
swap_avg_by_hour=[]
for row in avg_comments_hour:
    swap_avg_by_hour.append([row[1],row[0]])
print(swap_avg_by_hour)
sorted_swap=sorted(swap_avg_by_hour,reverse=True)

print("Top 5 Hours for Ask Posts Comments : ")
for row in sorted_swap[:5]:
    date=dt.datetime.strptime(row[1],"%H")
    hour=date.strftime("%H:00")
    output="{0}: {1:.2f} average comments per post"
    print(output.format(hour,row[0]))

    

[[16.009174311926607, '21'], [10.25, '08'], [11.46, '17'], [10.08695652173913, '05'], [7.852941176470588, '07'], [13.233644859813085, '14'], [21.525, '20'], [23.810344827586206, '02'], [7.796296296296297, '03'], [10.8, '19'], [7.170212765957447, '04'], [11.383333333333333, '01'], [6.746478873239437, '22'], [9.022727272727273, '06'], [38.5948275862069, '15'], [16.796296296296298, '16'], [11.051724137931034, '11'], [7.985294117647059, '23'], [5.5777777777777775, '09'], [8.127272727272727, '00'], [13.440677966101696, '10'], [14.741176470588234, '13'], [9.41095890410959, '12'], [13.20183486238532, '18']]
Top 5 Hours for Ask Posts Comments : 
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
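
The ranking step can also be done without swapping the columns first, by passing a `key` function to `sorted`. This is a minimal sketch of that alternative: `sample_avg` holds hypothetical `[hour, average]` pairs in the same shape as `avg_comments_hour`, not the real results.

```python
# Hypothetical [hour, average] pairs in the same shape as avg_comments_hour.
sample_avg = [["21", 16.0], ["15", 38.5], ["02", 23.8], ["09", 5.5]]

# Sort by the second element (the average), highest first, and keep the top two.
top = sorted(sample_avg, key=lambda pair: pair[1], reverse=True)[:2]

for hour, avg in top:
    print("{0}:00: {1:.2f} average comments per post".format(hour, avg))
```

Sorting with a `key` leaves each pair in its original `[hour, average]` order, so no second list is needed.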
