# Exploring Hacker News Posts

## Introduction
Hacker News is a popular forum in technology and startup circles. Posts that reach the top
of the Hacker News listings can attract hundreds of thousands of visitors.
In this project, we compare two kinds of posts, Ask HN and Show HN, to see which receives more comments on average.

## Dataset
The dataset contains details of roughly 20,000 posts, including their id, title, url, num_points, num_comments, author, and created_at fields.

- id: the unique identifier of each post.
- title: the title of the post.
- url: the web address of the post.
- num_points: the post's score, calculated as upvotes minus downvotes.
- num_comments: the number of comments. 
- author: the creator of the post.
- created_at: the timestamp of the post.

In [7]:
# Read the CSV file into a list of lists (one inner list per row)
import csv
with open('hacker_news.csv') as f:
    reader = csv.reader(f)
    hn = list(reader)

print(hn[:5])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


Notice that the first row holds the column headers, which we need to separate from the data before analyzing it.

In [8]:
headers = hn[0]
hn = hn[1:]

print(headers)
print(hn[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]
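
The later cells refer to columns by position (index 1 for title, index 4 for num_comments). As a minimal sketch, those positions can be derived from the header row instead of being hard-coded. The names `sample_headers`, `sample_row`, and `column_index` are illustrative and not part of the original notebook; the values are copied from the output above.

```python
# Illustrative names only; the header row and sample row are copied from the output above.
sample_headers = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
sample_row = ['12224879', 'Interactive Dynamic Video',
              'http://www.interactivedynamicvideo.com/',
              '386', '52', 'ne0phyte', '8/4/2016 11:52']

# Map each column name to its position in a row.
column_index = {name: position for position, name in enumerate(sample_headers)}

title = sample_row[column_index['title']]
# csv.reader yields every field as a string, so numeric columns need int().
num_comments = int(sample_row[column_index['num_comments']])

print(column_index['title'], column_index['num_comments'])
print(title, num_comments)
```

Looking columns up by name keeps the code readable and guards against mistakes if the column order ever changes.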


Now that we've removed the headers from hn, we are ready to filter the data. Since we are only concerned with posts whose titles begin with Ask HN or Show HN, we will split the rows into three lists of lists: ask posts, show posts, and all other posts.

In [10]:
ask_posts = []
show_posts = []
other_posts = []

for post in hn:
    # Lower-case the title so the prefix match is case-insensitive
    title = post[1].lower()
    if title.startswith('ask hn'):
        ask_posts.append(post)
    elif title.startswith('show hn'):
        show_posts.append(post)
    else:
        other_posts.append(post)

print('The length of ask posts is:', len(ask_posts))
print('The length of show posts is:', len(show_posts))
print('The length of other posts is:', len(other_posts))

print(ask_posts[:5])
print(show_posts[:5])
print(other_posts[:5])

The length of ask posts is: 1744
The length of show posts is: 1162
The length of other posts is: 17194
[['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'], ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'], ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'], ['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20'], ['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38']]
[['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'], ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46'], ['11590
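
The filtering step can also be wrapped in a small function, which makes the case-insensitive prefix match easy to check on a handful of rows. This is only a sketch: `split_posts` and `sample_rows` are hypothetical names, and the three rows are copied from the outputs shown earlier.

```python
def split_posts(rows):
    """Group rows into ask, show, and other posts by title prefix."""
    ask, show, other = [], [], []
    for row in rows:
        title = row[1].lower()  # lower-casing makes the match case-insensitive
        if title.startswith('ask hn'):
            ask.append(row)
        elif title.startswith('show hn'):
            show.append(row)
        else:
            other.append(row)
    return ask, show, other


# Three rows taken from the outputs above, one of each kind.
sample_rows = [
    ['12296411', 'Ask HN: How to improve my personal website?', '',
     '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/',
     '747', '102', 'dhotson', '11/29/2015 22:46'],
    ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/',
     '386', '52', 'ne0phyte', '8/4/2016 11:52'],
]

ask, show, other = split_posts(sample_rows)
print(len(ask), len(show), len(other))
```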

Next, let's determine whether ask posts or show posts receive more comments on average.

In [11]:
total_ask_comments = 0

for post in ask_posts:
    total_ask_comments += int(post[4])

avg_ask_comments = total_ask_comments / len(ask_posts)
print('The average number of comments for ask posts is:', avg_ask_comments)

total_show_comments = 0

for post in show_posts:
    total_show_comments += int(post[4])

avg_show_comments = total_show_comments / len(show_posts)
print('The average number of comments for show posts is:', avg_show_comments)

The average number of comments for ask posts is: 14.038417431192661
The average number of comments for show posts is: 10.31669535283993


Ask posts receive more comments on average than show posts (about 14.0 versus 10.3). One plausible explanation is that a question directly invites replies, whereas a showcased project does not.
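
Because the two averaging loops differ only in the list they read, they can be folded into one helper. A minimal sketch, assuming num_comments sits at index 4 as in the cells above; `average_comments` is a hypothetical name, and the sample rows are made-up placeholders rather than dataset values.

```python
def average_comments(posts, comments_index=4):
    """Return the mean of the num_comments column, or 0.0 for an empty list."""
    if not posts:
        return 0.0  # avoid dividing by zero when no posts matched
    total = sum(int(post[comments_index]) for post in posts)
    return total / len(posts)


# Placeholder rows: only the comment count at index 4 matters here.
sample_posts = [
    ['1', 'Ask HN: first placeholder', '', '2', '6', 'user_a', ''],
    ['2', 'Ask HN: second placeholder', '', '28', '29', 'user_b', ''],
]

print(average_comments(sample_posts))  # (6 + 29) / 2 = 17.5
```

With the real data, the same function could be called once with ask_posts and once with show_posts.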