#  Researching Hacker News Posts

## Introduction
[Hacker News](https://news.ycombinator.com/) is a popular website focused on computer science and entrepreneurship, where technology-related posts are voted and commented upon. Two major types of posts are Ask HN and Show HN. In an Ask HN post, a user asks the community a specific question; in a Show HN post, a user shares a project, a product, or something else of interest.

In this project, we will try to answer two questions:

Do Ask HN or Show HN posts receive more comments on average?

Do posts created at a certain time receive more comments on average?

### Summary of Results
After analyzing the data, we found that *Ask HN* posts receive more comments on average than *Show HN* posts. We also found that *Ask HN* posts created between 3:00 PM and 4:00 PM EST receive about 60% more comments on average than posts created during the next-best hour.

For more details, please refer to our analysis below.

## Data Set
The data set can be found [here](https://www.kaggle.com/hacker-news/hacker-news-posts). It has been downloaded locally and saved as *hacker_news.csv*.

## Opening and Exploring Data

Let us start by opening the data set and converting it into a list of lists.

In [1]:
#Opening the file and converting the data to a list of lists
from csv import reader

#The with block closes the file automatically once the rows are read
with open("hacker_news.csv") as opened_file:
    read_file = reader(opened_file)
    hn = list(read_file)
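
To illustrate what `csv.reader` produces, here is a minimal sketch that parses a made-up two-line CSV string held in memory instead of the real file. The text in `raw_text` is an assumption for illustration only and is not taken from the data set.

```python
import csv
import io

# Made-up two-line CSV text; the quoted title contains a comma
raw_text = 'id,title,num_comments\n1,"Ask HN: Tabs, or spaces?",12\n'

# io.StringIO lets csv.reader treat the string like an open file
rows = list(csv.reader(io.StringIO(raw_text)))

print(rows[0])  # ['id', 'title', 'num_comments']
print(rows[1])  # ['1', 'Ask HN: Tabs, or spaces?', '12']
```

Note that the comma inside the quoted title does not split the field, and that every value, including the comment count, comes back as a string.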

Now that the data set is a list of lists, let's print the first few rows to see what it looks like.

In [2]:
#Printing first 4 rows
for each_row in hn[:4]:
    print(each_row)
    print("\n")

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']


['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']


['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']




As shown above, the first row contains the column names. We can move it into a separate variable so that the data is easier to analyse.

In [3]:
#Extracting header row
header = hn[0]
hn = hn[1:]
print(header)


['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


We have extracted the first row into a variable named *header*, so the *hn* list now contains only posts.
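
The cells in this notebook refer to columns by position (for example, index 4 for `num_comments`). As an optional aid, the sketch below builds a name-to-position lookup from the header row. The name `col_index` is a hypothetical helper introduced here and is not used elsewhere in the notebook.

```python
# Column names as printed in the header row above
header = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

# Map each column name to its position (col_index is a hypothetical helper name)
col_index = {name: position for position, name in enumerate(header)}

print(col_index['title'])         # 1
print(col_index['num_comments'])  # 4
print(col_index['created_at'])    # 6
```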

In [4]:
#verifying hn list
for each_row in hn[:3]:
    print(each_row)
    print("\n")

['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']


['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']


['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']




## Extracting Ask HN and Show HN Posts
In this project we deal only with *Ask HN* and *Show HN* posts, so we will separate them from the other posts into two lists.

For this we will use the string method *startswith*, which checks whether a post's title begins with *Ask HN* or *Show HN*.

We will also use another string method, *lower*, to account for case variations.
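
As a quick illustration of how the two methods combine, here is a minimal sketch on a handful of made-up titles. Both the titles and the `classify` helper are assumptions for illustration and are not part of the data set or the analysis code.

```python
# Made-up titles for illustration only (not taken from the data set)
sample_titles = [
    "Ask HN: How do you learn a new language?",
    "ASK HN: Best books of the year?",
    "Show HN: A tiny web server",
    "Why I ask HN for advice",
]

def classify(title):
    """Return 'ask', 'show', or 'other' based on the title's prefix."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"

labels = [classify(title) for title in sample_titles]
print(labels)  # ['ask', 'ask', 'show', 'other']
```

The last title contains "ask HN" but does not begin with it, so it is not counted as an Ask HN post.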

In [5]:
#Extract Ask HN and Show HN posts
ask_posts = []
show_posts = []
other_posts = []

for each_post in hn:
    name = each_post[1]
    if name.lower().startswith("ask hn"):
        ask_posts.append(each_post)
    elif name.lower().startswith("show hn"):
        show_posts.append(each_post)
    else:
        other_posts.append(each_post)
        

Let's find out how many Ask HN, Show HN, and other posts we have.

In [6]:
#Finding number of posts in each list
print("The number of Ask HN posts is: ", len(ask_posts))
print("\n")
print("The number of Show HN posts is: ", len(show_posts))
print("\n")
print("The number of other posts is: ", len(other_posts))
print("\n")

The number of Ask HN posts is:  1744


The number of Show HN posts is:  1162


The number of other posts is:  17194




There are a total of 1744 *Ask HN* posts and 1162 *Show HN* posts. Let's print a few rows of each.

In [7]:
#Printing few Ask HN posts
print(header)
print("\n")
for each in ask_posts[:3]:
    print(each)
    print("\n")

#Printing few Show HN posts
print(header)
print("\n")
for each in show_posts[:3]:
    print(each)
    print("\n")

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55']


['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43']


['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14']


['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03']


['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46']


['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05']




## Average Comments for Ask HN and Show HN Posts

Now let us find which type of post gathers more comments. To do that, we first need to find the average number of comments for each type of post.

In [8]:
#Average number of comments for Ask HN posts
sum_ask = 0
for each_post in ask_posts:
    comment = int(each_post[4])#converting comments into int type
    sum_ask += comment
    
ask_average = sum_ask / len(ask_posts)
    
#Average number of comments for Show HN posts
sum_show = 0
for each_post in show_posts:
    comment = int(each_post[4])#converting comments into int type
    sum_show += comment
    
show_average = sum_show / len(show_posts) 

print("Average comment for Ask Posts is: ", ask_average)
print("\n")
print("Average comment for Show Posts is: ", show_average)

Average comment for Ask Posts is:  14.038417431192661


Average comment for Show Posts is:  10.31669535283993


So we have found that an *Ask HN* post gets ~14 comments on average, which is about 36% higher than the ~10 comments received by a *Show HN* post.

We will therefore proceed with only the *Ask HN* posts in the rest of our analysis.
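
The relative difference between the two averages can be checked directly from the printed values. The sketch below copies those values by hand; `relative_increase` is a name introduced here for illustration.

```python
# Averages copied from the printed output above
ask_average = 14.038417431192661
show_average = 10.31669535283993

# Relative difference of Ask HN over Show HN, as a percentage
relative_increase = (ask_average - show_average) / show_average * 100

print("Ask HN posts receive {:.0f}% more comments on average".format(relative_increase))
```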


## Finding the Number of Ask HN Posts and Comments by Hour

Our second goal was to find out whether posts created at a certain time receive more comments. We will start by counting the posts and comments for each hour, and then compute the average number of comments per post for each hour.
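
Before running this on the full list, the parsing and grouping steps can be sketched on the three *Ask HN* rows printed earlier. This is a minimal illustration that uses `collections.defaultdict` as one possible way to accumulate the totals; the names `sample`, `comments_by_hour`, and `posts_by_hour` are introduced here for the example only.

```python
import datetime as dt
from collections import defaultdict

# (num_comments, created_at) pairs copied from the three Ask HN rows printed earlier
sample = [
    ("6", "8/16/2016 9:55"),
    ("29", "11/22/2015 13:43"),
    ("1", "5/2/2016 10:14"),
]

comments_by_hour = defaultdict(int)
posts_by_hour = defaultdict(int)

for num_comments, created_at in sample:
    # Parse the month/day/year hour:minute timestamp and keep only the hour
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").hour
    comments_by_hour[hour] += int(num_comments)
    posts_by_hour[hour] += 1

print(dict(comments_by_hour))  # {9: 6, 13: 29, 10: 1}
```

With `defaultdict(int)`, a missing hour starts at 0, so no explicit check for an existing key is needed.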

In [9]:
#Extracting number of posts and comments for every hour
import datetime as dt
comment_hour = {} 
posts_hour = {}
for each_post in ask_posts:
    comment = int(each_post[4])
    time = dt.datetime.strptime(each_post[6], "%m/%d/%Y %H:%M") 
    if time.hour in comment_hour:
        comment_hour[time.hour] += comment
        posts_hour[time.hour] += 1
    else:
        comment_hour[time.hour] = comment
        posts_hour[time.hour] = 1
        
#Finding average number of comments per hour
average_comment_hour = {}
for each in comment_hour:
    average_comment_hour[each] = comment_hour[each] / posts_hour[each]
    
#Sorting the average_comment_hour in descending order
average_list = []
for hour,comments in average_comment_hour.items():
    average_list.append([comments,hour])
    
sorted_list = sorted(average_list, reverse = True)    

#Formatting for better readability
for comment, hour in sorted_list:
    print("The average comments per post during hour {}:00 is {:.2f}".format(hour, comment))

The average comments per post during hour 15:00 is 38.59
The average comments per post during hour 2:00 is 23.81
The average comments per post during hour 20:00 is 21.52
The average comments per post during hour 16:00 is 16.80
The average comments per post during hour 21:00 is 16.01
The average comments per post during hour 13:00 is 14.74
The average comments per post during hour 10:00 is 13.44
The average comments per post during hour 14:00 is 13.23
The average comments per post during hour 18:00 is 13.20
The average comments per post during hour 17:00 is 11.46
The average comments per post during hour 1:00 is 11.38
The average comments per post during hour 11:00 is 11.05
The average comments per post during hour 19:00 is 10.80
The average comments per post during hour 8:00 is 10.25
The average comments per post during hour 5:00 is 10.09
The average comments per post during hour 12:00 is 9.41
The average comments per post during hour 6:00 is 9.02
The average comments per post during h

We can see that *Ask HN* posts created between 15:00 and 15:59 receive 38.59 comments on average, which is about 62% more than the ~24 comments per post received during hour 2, the next-best hour.
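
The gap between the top two hours can be verified with a short sketch. It copies the five highest averages from the output above (rounded as printed) and ranks them with `sorted` and a `key` function, which is an alternative to swapping the pairs into a list of lists; the names `ranked` and `lead` are introduced here for illustration.

```python
# Rounded averages copied from the first five lines of the output above
average_comment_hour = {16: 16.80, 2: 23.81, 21: 16.01, 15: 38.59, 20: 21.52}

# Sort (hour, average) pairs by the average, highest first
ranked = sorted(average_comment_hour.items(), key=lambda item: item[1], reverse=True)

# Compare the best hour with the runner-up
(top_hour, top_avg), (second_hour, second_avg) = ranked[:2]
lead = (top_avg - second_avg) / second_avg * 100

print("Hour {} leads hour {} by {:.0f}%".format(top_hour, second_hour, lead))
```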

The timezone in the data set is EST. So 15:00 corresponds to 3 PM EST.
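
Readers in other time zones may want to convert this hour. The sketch below assumes EST can be treated as a fixed UTC-5 offset (daylight saving time is ignored) and uses an arbitrary example date; both are assumptions made for illustration.

```python
import datetime as dt

# Assumption: treat EST as a fixed UTC-5 offset (daylight saving time is ignored)
EST = dt.timezone(dt.timedelta(hours=-5), "EST")

# 15:00 EST on an arbitrary example date
peak_est = dt.datetime(2016, 1, 26, 15, 0, tzinfo=EST)

# Convert the same instant to UTC
peak_utc = peak_est.astimezone(dt.timezone.utc)

print(peak_utc.strftime("%H:%M %Z"))  # 20:00 UTC
```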

## Conclusion
In this project, we analysed Hacker News posts and determined that *Ask HN* posts receive more comments on average than *Show HN* posts. We also found that *Ask HN* posts receive the most comments on average when they are created between 3:00 PM and 4:00 PM EST.

So we recommend creating an *Ask HN* post between 3:00 PM and 4:00 PM EST if you are looking to maximize the number of comments your post receives.
