# Exploring Hacker News Posts
In this project, we'll work with a data set of submissions to the popular technology site [Hacker News](https://news.ycombinator.com/).

Hacker News is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") are voted on and commented on, similar to Reddit. Hacker News is extremely popular in technology and startup circles, and posts that make it to the top of the Hacker News listings can get hundreds of thousands of visitors as a result.

### Analysis goal
Our main goal will be to determine the **top 5 hours for posting to get the most comments**.

First, we need to separate the header row from the rest of the data.

In [4]:
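from csv import reader

# Read the whole CSV file into a list of rows (each row is a list of strings)
with open("data_sets/HN_posts_year_to_Sep_26_2016.csv", encoding='utf8') as data_file:
    hn = list(reader(data_file))

print(len(hn))
# The first row holds the column names; keep it apart from the data rows
headers = hn[0]
hn = hn[1:]
print(len(hn))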

293120
293119


Now let's take a look at the first few rows of our data set.

In [10]:
print("Headers:\n%s\n\nData:" % headers)
for row in hn[:5]:
    print(row)

Headers:
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

Data:
['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26']
['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24']
['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19']
['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16']
['12578979', 'How the Data Vault Enables the Next-Gen Data Warehouse and Data Lake', 'https://www.talend.com/blog/2016/05/12/talend-and-Â\x93the-data-vaultÂ\x94', '1', '0', 'markgainor1', '9/26/2016 3:14']
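
Note that `created_at` is stored as plain text, such as `'9/26/2016 3:26'`. Since our goal is about posting hours, we'll eventually need to pull the hour out of that string. Here's a minimal sketch of one way to do it, assuming every value follows the month/day/year hour:minute pattern seen in the rows above. The helper name `posting_hour` and the constant `CREATED_AT_FORMAT` are our own choices for illustration, not part of the data set.

```python
from datetime import datetime

# Assumed layout of the created_at column, inferred from the sample rows:
# month/day/year hour:minute, without zero padding.
CREATED_AT_FORMAT = "%m/%d/%Y %H:%M"


def posting_hour(created_at):
    """Return the hour of day (0-23) encoded in a created_at string."""
    return datetime.strptime(created_at, CREATED_AT_FORMAT).hour


# Value taken from the first data row printed above
print(posting_hour('9/26/2016 3:26'))
```

If some rows turn out to use a different date layout, `strptime` raises a `ValueError`, which makes such rows easy to spot.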


We're specifically interested in posts whose titles begin with either _Ask HN_ or _Show HN_. 

Users submit _Ask HN_ posts to ask the Hacker News community a specific question. Below are a few examples:
* Ask HN: How to improve my personal website?
* Ask HN: Am I the only one outraged by Twitter shutting down share counts?
* Ask HN: Aby recent changes to CSS that broke mobile?

Likewise, users submit _Show HN_ posts to show the Hacker News community a project, product, or just generally something interesting. Below are a few examples:
* Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform
* Show HN: Something pointless I made
* Show HN: Shanhu.io, a programming playground powered by e8vm

We'll compare these two types of posts to determine the following:

1. Do _Ask HN_ or _Show HN_ posts receive more comments on average?
2. Do posts created at a certain time receive more comments on average?

Let's separate posts beginning with _Ask HN_ and _Show HN_ (and case variations) into two different lists next.

In [None]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    # Lowercase the title once so the prefix check ignores case
    title = row[1].lower()
    if title.startswith("ask hn"):
        ask_posts.append(row)
    elif title.startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)

# Number of posts in each group
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))
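
With the posts split into groups, the first question comes down to averaging the `num_comments` column (index 4 in `headers`) over each list. Below is a minimal sketch of that calculation. The function name `average_comments` is our own, and the rows in `sample_ask` and `sample_show` are made up for illustration so the snippet runs on its own; they are not taken from the data set.

```python
def average_comments(posts, comments_index=4):
    """Return the mean number of comments over a list of rows (0.0 if empty)."""
    if not posts:
        return 0.0
    # Every field read from the CSV is a string, so convert before summing
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# Made-up rows in the same column order as `headers` (illustration only)
sample_ask = [
    ['1', 'Ask HN: Example question?', '', '2', '6', 'user_a', '9/26/2016 3:26'],
    ['2', 'Ask HN: Another question?', '', '1', '10', 'user_b', '9/26/2016 4:10'],
]
sample_show = [
    ['3', 'Show HN: Example project', 'http://example.com', '5', '3', 'user_c', '9/26/2016 5:00'],
]

print(average_comments(sample_ask))
print(average_comments(sample_show))
```

On the real data, the same function could be called as `average_comments(ask_posts)` and `average_comments(show_posts)`.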