# Exploring Hacker News Ask HN and Show HN posts

### This notebook explores a dataset of Hacker News posts, comparing Ask HN and Show HN submissions. The goal is to determine which type receives more comments on average, and whether posts created at certain times receive more comments on average.

In [1]:
#Reading in the dataset
from csv import reader

#Using a context manager so the file is closed once it has been read
with open('hacker_news.csv', newline='') as opened_file:
    read_file = reader(opened_file)
    hn = list(read_file)

### id: The unique identifier from Hacker News for the post
### title: The title of the post
### url: The URL that the post links to, if the post has a URL
### num_points: The number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes
### num_comments: The number of comments that were made on the post
### author: The username of the person who submitted the post
### created_at: The date and time at which the post was submitted
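
### The columns listed above can also be read by name rather than by position. The following is a minimal sketch, assuming the same seven-column header; it uses `csv.DictReader` on an in-memory sample (the `sample_csv` string, a name introduced here, holds the header and the first data row of the file) instead of the real file.

```python
import csv
import io

# In-memory sample: the header plus the first data row of hacker_news.csv.
sample_csv = (
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,"
    "386,52,ne0phyte,8/4/2016 11:52\n"
)

# DictReader uses the first line as field names and yields one dict per row.
rows = list(csv.DictReader(io.StringIO(sample_csv)))

first = rows[0]
# Values are still strings, so numeric columns need an explicit int().
print(first["title"], int(first["num_comments"]), first["created_at"])
```

### Reading by name avoids having to remember that the comment count sits at index 4 and the timestamp at index 6, at the cost of a dictionary per row.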

In [5]:
for x in hn[:5]:
    print('{} \n'.format(x))


['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'] 

['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'] 

['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'] 

['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'] 

['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'] 



In [6]:
#Extracting the header row and removing it from the dataset
headers = hn[0]
hn.pop(0)
for x in hn[:5]:
    print('{} \n'.format(x))

['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'] 

['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'] 

['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'] 

['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'] 

['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12'] 



In [17]:
#Separating out different types of posts
ask_posts = []
show_posts = []
other_posts = []

#Lowercasing the title so that startswith matches the prefix regardless of capitalization
for row in hn:
    title = row[1].lower()
    if title.startswith("ask hn"):
        ask_posts.append(row)
    elif title.startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)

#Checking how many posts there are of each type
print('There are a total of {amount} posts in our database'.format(amount=len(hn)))
print("There are a total of {amount} Ask HN posts".format(amount=len(ask_posts)))
print("There are a total of {amount} Show HN posts".format(amount=len(show_posts)))
print("There are a total of {amount} other posts".format(amount=len(other_posts)))

There are a total of 20100 posts in our database
There are a total of 1744 Ask HN posts
There are a total of 1162 Show HN posts
There are a total of 17194 other posts


### Now that the posts are separated into different lists, I will determine whether Ask HN or Show HN posts receive more comments on average.

In [23]:
total_ask_comments = 0

for row in ask_posts:
    num_comments = int(row[4])
    total_ask_comments += num_comments

avg_ask_comments = total_ask_comments / len(ask_posts)

total_show_comments = 0
for row in show_posts:
    num_comments = int(row[4])
    total_show_comments += num_comments

avg_show_comments = total_show_comments / len(show_posts)

print("On average Ask HN gets {:.2f} comments on each post".format(avg_ask_comments))
print("On average Show HN gets {:.2f} comments on each post".format(avg_show_comments))

#Reporting the difference with the higher average named first
if avg_ask_comments > avg_show_comments:
    difference = avg_ask_comments - avg_show_comments
    print("On average Ask HN gets {difference:.2f} more comments on each post than Show HN".format(difference=difference))
else:
    difference = avg_show_comments - avg_ask_comments
    print("On average Show HN gets {difference:.2f} more comments on each post than Ask HN".format(difference=difference))

On average Ask HN gets 14.04 comments on each post
On average Show HN gets 10.32 comments on each post
On average Ask HN gets 3.72 more comments on each post than Show HN
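
### The two loops above perform the same calculation on different lists. As a sketch of one way to avoid the repetition, the hypothetical helper `average_comments` below (not part of the original notebook) takes a list of rows and returns the mean of the comment column. The `sample_rows` values are made up for illustration.

```python
def average_comments(posts, comments_index=4):
    """Return the mean number of comments for a list of post rows.

    Hypothetical helper; comments_index is the position of num_comments.
    """
    if not posts:
        # Avoid dividing by zero when a category has no posts.
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# Made-up rows in the dataset's column order, for illustration only.
sample_rows = [
    ["1", "Ask HN: Example one", "", "5", "10", "user_a", "1/1/2016 9:00"],
    ["2", "Ask HN: Example two", "", "3", "4", "user_b", "1/2/2016 15:30"],
]

print("{:.2f}".format(average_comments(sample_rows)))
```

### With such a helper, the averages would be obtained with calls like `average_comments(ask_posts)` and `average_comments(show_posts)`.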


### As the cell above shows, Ask HN posts get on average 3.72 more comments than Show HN posts. This is expected: Ask HN posts are where people ask for advice or help, whereas Show HN posts are where people show something they have made or discovered, or share advice and tips (a few examples of each are printed below). A post that asks a question invites replies, so it tends to collect more comments.

In [26]:
for x in ask_posts[:5]:
    print(x[1])

Ask HN: How to improve my personal website?
Ask HN: Am I the only one outraged by Twitter shutting down share counts?
Ask HN: Aby recent changes to CSS that broke mobile?
Ask HN: Looking for Employee #3 How do I do it?
Ask HN: Someone offered to buy my browser extension from me. What now?


In [25]:
for x in show_posts[:5]:
    print(x[1])

Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform
Show HN: Something pointless I made
Show HN: Shanhu.io, a programming playground powered by e8vm
Show HN: Webscope  Easy way for web developers to communicate with Clients


### I will now focus on Ask HN posts to find out whether the number of comments a post receives is related to the time of day it was created.

In [40]:
import datetime as dt

result_list = []

#Adding the time the post was created and the number of comments of each post to result_list
for row in ask_posts:
    temp_list = [row[6], int(row[4])]
    result_list.append(temp_list)

counts_by_hour = {}
comments_by_hour = {}

#created_at looks like 8/4/2016 11:52
date_format = "%m/%d/%Y %H:%M"
for row in result_list[:5]:
    created_at_dt = dt.datetime.strptime(row[0], date_format)
    #.hour is a plain int with no strftime method, so the hour string is formatted from the datetime object itself
    created_at_hour = created_at_dt.strftime("%H")

    print(created_at_hour)
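
### The `counts_by_hour` and `comments_by_hour` dictionaries are created above but not yet filled. One possible way to fill them and compute the average number of comments per hour is sketched below. The `sample_results` list is made up for illustration and stands in for `result_list`, and `avg_by_hour` is a name introduced here. Note that `strptime` returns a `datetime` object whose `hour` attribute is an `int`, so the two-digit hour string comes from calling `strftime` on the `datetime` itself.

```python
import datetime as dt

# Made-up [created_at, num_comments] pairs standing in for result_list.
sample_results = [
    ["8/4/2016 11:52", 6],
    ["1/26/2016 19:30", 10],
    ["6/23/2016 11:20", 2],
    ["6/17/2016 0:01", 1],
]

date_format = "%m/%d/%Y %H:%M"
counts_by_hour = {}
comments_by_hour = {}

for created_at, num_comments in sample_results:
    # strftime is called on the datetime, not on the int in .hour.
    hour = dt.datetime.strptime(created_at, date_format).strftime("%H")
    # dict.get supplies 0 the first time an hour is seen.
    counts_by_hour[hour] = counts_by_hour.get(hour, 0) + 1
    comments_by_hour[hour] = comments_by_hour.get(hour, 0) + num_comments

# Average comments per post for each hour that has at least one post.
avg_by_hour = {
    hour: comments_by_hour[hour] / counts_by_hour[hour]
    for hour in counts_by_hour
}

for hour in sorted(avg_by_hour):
    print("{}:00 {:.2f}".format(hour, avg_by_hour[hour]))
```

### Running the same loop over the full `result_list` would give one average per hour of the day, which could then be sorted to find the hours with the most comments per post.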