## Hacker News

Author: Julian Moors\
Contact: julian.moors@outlook.com

### Introduction
_This project analyses the title field of the Hacker News dataset to determine which type of post ('Ask HN', 'Show HN', or any other) attracts the larger share of comments._

### Data Dictionary

| Name     | Description                     |
| -------- | ------------------------------- |
| Column 0 | ID                              |
| Column 1 | Title                           |
| Column 2 | URL                             |
| Column 3 | Votes (upvotes minus downvotes) |
| Column 4 | Number of comments              |
| Column 5 | Username                        |
| Column 6 | Timestamp                       |

In [1]:
import pandas as pd

# load the dataset from the csv file
hn = pd.read_csv('data/hacker-news.csv')

# keep the original headers, then relabel the columns 0-6 to match the data dictionary
headers = hn.columns
hn.columns = range(hn.shape[1])

# show first 5 rows
hn[:5]

Unnamed: 0,0,1,2,3,4,5,6
0,12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52
1,10975351,How to Use Open Source and Shut the Fuck Up at...,http://hueniverse.com/2016/01/26/how-to-use-op...,39,10,josep2,1/26/2016 19:30
2,11964716,Florida DJs May Face Felony for April Fools' W...,http://www.thewire.com/entertainment/2013/04/f...,2,1,vezycash,6/23/2016 22:20
3,11919867,Technology ventures: From Idea to Enterprise,https://www.amazon.com/Technology-Ventures-Ent...,3,1,hswarna,6/17/2016 0:01
4,10301696,Note by Note: The Making of Steinway L1037 (2007),http://www.nytimes.com/2007/11/07/movies/07ste...,8,2,walterbell,9/30/2015 4:12
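
As an alternative to integer labels, the columns could be given descriptive names taken from the data dictionary. The sketch below is not part of the original notebook: it uses a one-row stand-in frame built from the first row of the output above, and the names in `COLUMN_NAMES` are hypothetical.

```python
import pandas as pd

# Hypothetical column names derived from the data dictionary (not in the dataset itself)
COLUMN_NAMES = ["id", "title", "url", "votes", "num_comments", "username", "timestamp"]

# One-row stand-in for the real frame, copied from the first row shown above
hn_demo = pd.DataFrame(
    [[12224879, "Interactive Dynamic Video", "http://www.interactivedynamicvideo.com/",
      386, 52, "ne0phyte", "8/4/2016 11:52"]]
)

# Assign descriptive labels so later cells can select by name instead of position
hn_demo.columns = COLUMN_NAMES

# Selecting by name returns the same column as selecting position 4
print(hn_demo["num_comments"].sum())
```

With names in place, `hn_demo["num_comments"]` and `hn_demo.iloc[:, 4]` refer to the same data, so the later cells would work either way.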


In [2]:
# select and display all posts whose title contains 'Ask HN:' (case-insensitive)
mask = hn.iloc[:, 1].str.contains('Ask HN:', case=False, na=False)
ask_posts = hn[mask]
ask_posts

Unnamed: 0,0,1,2,3,4,5,6
7,12296411,Ask HN: How to improve my personal website?,,2,6,ahmedbaracat,8/16/2016 9:55
17,10610020,Ask HN: Am I the only one outraged by Twitter ...,,28,29,tkfx,11/22/2015 13:43
22,11610310,Ask HN: Aby recent changes to CSS that broke m...,,1,1,polskibus,5/2/2016 10:14
30,12210105,Ask HN: Looking for Employee #3 How do I do it?,,1,3,sph130,8/2/2016 14:20
31,10394168,Ask HN: Someone offered to buy my browser exte...,,28,17,roykolak,10/15/2015 16:38
...,...,...,...,...,...,...,...
20039,10994357,Ask HN: Is it feasible to port Apple's Swift t...,,3,17,schappim,1/29/2016 9:42
20042,12241954,Ask HN: What to do when a developer goes dark?,,3,3,bittysdad,8/7/2016 12:58
20045,12029526,Ask HN: Killer app for AR?,,2,2,davidiach,7/4/2016 8:50
20048,11227969,Ask HN: How do you balance a serious relations...,,10,4,audace,3/5/2016 1:25
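
Note that `str.contains` matches the marker anywhere in the title, not only at the start. Whether that matters for this dataset is not known from the output above. The sketch below uses made-up titles to show how a substring match and a prefix match can differ.

```python
import pandas as pd

# Made-up titles for illustration only
titles = pd.Series([
    "Ask HN: Killer app for AR?",
    "Why I stopped reading Ask HN: a retrospective",
    "ask hn: lowercase prefix",
    None,
])

# Substring match, as in the cell above: matches the marker anywhere in the title
contains_mask = titles.str.contains("Ask HN:", case=False, na=False)

# Prefix match: lower-case first because str.startswith has no `case` parameter
prefix_mask = titles.str.lower().str.startswith("ask hn:", na=False)

print(contains_mask.tolist())
print(prefix_mask.tolist())
```

Only the second made-up title is treated differently by the two masks; missing titles are excluded by both because of `na=False`.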


In [3]:
# select and display all posts whose title contains 'Show HN:' (case-insensitive)
mask = hn.iloc[:, 1].str.contains('Show HN:', case=False, na=False)
show_posts = hn[mask]
show_posts

Unnamed: 0,0,1,2,3,4,5,6
13,10627194,Show HN: Wio Link ESP8266 Based Web of Things...,https://iot.seeed.cc,26,22,kfihihc,11/25/2015 14:03
39,10646440,Show HN: Something pointless I made,http://dn.ht/picklecat/,747,102,dhotson,11/29/2015 22:46
46,11590768,"Show HN: Shanhu.io, a programming playground p...",https://shanhu.io,1,1,h8liu,4/28/2016 18:05
84,12178806,Show HN: Webscope Easy way for web developers...,http://webscopeapp.com,3,3,fastbrick,7/28/2016 7:11
97,10872799,Show HN: GeoScreenshot Easily test Geo-IP bas...,https://www.geoscreenshot.com/,1,9,kpsychwave,1/9/2016 20:45
...,...,...,...,...,...,...,...
19993,11222099,Show HN: Geocoding API built with government o...,https://latlon.io,6,6,evanmarks,3/4/2016 4:50
19999,11735438,Show HN: Decorating: Animated pulsed for your ...,https://github.com/ryukinix/decorating,3,1,lerax,5/20/2016 3:48
20014,10200913,Show HN: Idea to startup,https://ideatostartup.org,14,17,nikhildaga,9/10/2015 22:17
20065,11444393,"Show HN: PhantomJsCloud, Headless Browser SaaS",https://PhantomJsCloud.com,2,1,novaleaf,4/7/2016 3:04


In [4]:
# select and display all posts whose title contains neither 'Ask HN:' nor 'Show HN:'
mask = ~(hn.iloc[:, 1].str.contains('Ask HN:', case=False, na=False) |
         hn.iloc[:, 1].str.contains('Show HN:', case=False, na=False))
other_posts = hn[mask]
other_posts

Unnamed: 0,0,1,2,3,4,5,6
0,12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52
1,10975351,How to Use Open Source and Shut the Fuck Up at...,http://hueniverse.com/2016/01/26/how-to-use-op...,39,10,josep2,1/26/2016 19:30
2,11964716,Florida DJs May Face Felony for April Fools' W...,http://www.thewire.com/entertainment/2013/04/f...,2,1,vezycash,6/23/2016 22:20
3,11919867,Technology ventures: From Idea to Enterprise,https://www.amazon.com/Technology-Ventures-Ent...,3,1,hswarna,6/17/2016 0:01
4,10301696,Note by Note: The Making of Steinway L1037 (2007),http://www.nytimes.com/2007/11/07/movies/07ste...,8,2,walterbell,9/30/2015 4:12
...,...,...,...,...,...,...,...
20095,12379592,How Purism Avoids Intels Active Management Tec...,https://puri.sm/philosophy/how-purism-avoids-i...,10,6,AdmiralAsshat,8/29/2016 2:22
20096,10339284,YC Application Translated and Broken Down,https://medium.com/@zreitano/the-yc-applicatio...,4,1,zreitano,10/6/2015 14:57
20097,10824382,Microkernels are slow and Elvis didn't do no d...,http://blog.darknedgy.net/technology/2016/01/0...,169,132,vezzy-fnord,1/2/2016 0:49
20098,10739875,How Product Hunt really works,https://medium.com/@benjiwheeler/how-product-h...,695,222,brw12,12/15/2015 19:32


In [5]:
# display the total number of comments across 'ask_posts'
num_ask = ask_posts.iloc[:, 4].sum()
num_ask

np.int64(24450)

In [6]:
# display the total number of comments across 'show_posts'
num_show = show_posts.iloc[:, 4].sum()
num_show

np.int64(11988)

In [7]:
# display the total number of comments across 'other_posts'
num_other = other_posts.iloc[:, 4].sum()
num_other

np.int64(462088)

### Data Analysis

In [8]:
# total number of comments across all three groups of posts
num_total = num_ask + num_show + num_other

In [9]:
print("Ask Posts: {:.2f}%".format(num_ask / num_total * 100))

Ask Posts: 4.90%


In [10]:
print("Show Posts: {:.2f}%".format(num_show / num_total * 100))

Show Posts: 2.40%


In [11]:
print("Other Posts: {:.2f}%".format(num_other / num_total * 100))

Other Posts: 92.69%
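
The three percentages can be reproduced from the totals reported above without loading the dataset. The following sketch only restates that arithmetic; the dictionary name `totals` is an illustrative choice.

```python
# Comment totals reported by cells 5 to 7
totals = {"Ask Posts": 24450, "Show Posts": 11988, "Other Posts": 462088}

# Total number of comments across all three groups
num_total = sum(totals.values())

# Share of all comments per group, formatted like the notebook output
shares = {name: count / num_total * 100 for name, count in totals.items()}
for name, share in shares.items():
    print("{}: {:.2f}%".format(name, share))
```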


### Conclusion
_'Ask HN' posts account for 4.90% of all comments, roughly twice the 2.40% share of 'Show HN' posts. This suggests that users engage more when a question is put to the community than when someone shows what they have built, although the comparison is based on comment totals and does not account for how many posts each category contains. Together the two categories make up about 7.3% of all comments; the remaining 92.69% belong to other posts._
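
A per-post average is one way to compare categories of different sizes. The following sketch is not part of the original notebook: the rows are made up, `label` is a hypothetical helper, and the numbers say nothing about the real dataset.

```python
import pandas as pd

# Made-up rows for illustration only: column 1 = title, column 4 = number of comments
demo = pd.DataFrame({
    1: ["Ask HN: A?", "Ask HN: B?", "Show HN: C", "Some article", "Another article"],
    4: [6, 2, 3, 10, 0],
})

def label(title):
    """Hypothetical helper: map a title to 'ask', 'show', or 'other'."""
    lowered = str(title).lower()
    if "ask hn:" in lowered:
        return "ask"
    if "show hn:" in lowered:
        return "show"
    return "other"

# One category label per row, aligned with the frame's index
categories = demo[1].map(label).rename("category")

# Post count, total comments, and mean comments per post for each category
summary = demo.groupby(categories)[4].agg(["count", "sum", "mean"])
print(summary)
```

Applied to the real frame `hn`, the same grouping would report the number of posts alongside the comment totals, which makes the category sizes visible.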