# Exploring Hacker News Posts

### A project for practicing Python (without pandas or NumPy)

In this project, we'll work with a dataset of submissions to the popular technology site [Hacker News](https://news.ycombinator.com/).

We'll use a reduced version of the dataset available on [Kaggle](https://www.kaggle.com/datasets/hacker-news/hacker-news-posts).

Below are descriptions of the columns:

+ `id`: the unique identifier from Hacker News for the post
+ `title`: the title of the post
+ `url`: the URL that the post links to, if the post has a URL
+ `num_points`: the number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes
+ `num_comments`: the number of comments on the post
+ `author`: the username of the person who submitted the post
+ `created_at`: the date and time of the post's submission

We're specifically interested in posts with titles that begin with either `Ask HN` or `Show HN`. Users submit `Ask HN` posts to ask the Hacker News community a specific question. Likewise, users submit `Show HN` posts to show the Hacker News community a project, product, or just something interesting. 

We'll compare these two types of posts to determine the following:

* Do `Ask HN` or `Show HN` posts receive more comments on average?
* Do posts created at a certain time receive more comments on average?

## Exploring the Dataset

In [1]:
from csv import reader

# Read every row into a list; the `with` block closes the file afterwards.
with open('hacker_news.csv') as f:
    hn = list(reader(f))
for row in hn[:5]:
    print(row)
    print("\n")

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']


['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']


['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']


['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']




In [2]:
headers = hn[0]
hn = hn[1:]
print(headers)
for row in hn[:5]:
    print(row)
    print("\n")

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']


['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']


['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']


['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']


['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']




In [7]:
print(len(hn))

20099


## Extracting Ask HN and Show HN Posts

In [12]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    # Compare in lowercase so that variants such as "ask hn" also match.
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
    
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

1744
1162
17193
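
The prefix check used above can be isolated in a small function, which makes the rule easy to try on individual titles. This is only a minimal sketch: the name `classify_title` and the sample titles are hypothetical and are not part of the original notebook.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix.

    Hypothetical helper for illustration; it mirrors the case-insensitive
    startswith checks used in the loop above.
    """
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'


# Sample titles used only for illustration.
sample_titles = [
    'Ask HN: How to improve my personal website?',
    'show hn: a tiny demo',
    'Interactive Dynamic Video',
]
labels = [classify_title(t) for t in sample_titles]
print(labels)  # ['ask', 'show', 'other']
```

Because the comparison is done on the lowercased title, the second sample is counted as a `Show HN` post even though it is not capitalized.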


In [13]:
print(ask_posts[:5])

[['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'], ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'], ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'], ['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20'], ['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38']]


In [14]:
print(show_posts[:5])

[['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'], ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46'], ['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05'], ['12178806', 'Show HN: Webscope  Easy way for web developers to communicate with Clients', 'http://webscopeapp.com', '3', '3', 'fastbrick', '7/28/2016 7:11'], ['10872799', 'Show HN: GeoScreenshot  Easily test Geo-IP based web pages', 'https://www.geoscreenshot.com/', '1', '9', 'kpsychwave', '1/9/2016 20:45']]


## Average Number of Comments

In [16]:
total_ask_comments = 0
for row in ask_posts:
    comments = int(row[4])
    total_ask_comments += comments
avg_ask_comments = total_ask_comments / len(ask_posts)
print('Average number of comments for "Ask HN" posts is {}'.format(avg_ask_comments))

total_show_comments = 0
for row in show_posts:
    comments = int(row[4])
    total_show_comments += comments
avg_show_comments = total_show_comments / len(show_posts)
print('Average number of comments for "Show HN" posts is {}'.format(avg_show_comments))

total_other_comments = 0
for row in other_posts:
    comments = int(row[4])
    total_other_comments += comments
avg_other_comments = total_other_comments / len(other_posts)
print('Average number of comments for other posts is {}'.format(avg_other_comments))

Average number of comments for "Ask HN" posts is 14.038417431192661
Average number of comments for "Show HN" posts is 10.31669535283993
Average number of comments for other posts is 26.871575641249347


We see that `Ask HN` posts receive more comments on average (about 14) than `Show HN` posts (about 10), but other posts receive far more than either (about 27).
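
The three loops above repeat the same calculation, so it could be factored into one function. The sketch below is one possible way to do that, not the notebook's own code: `average_comments`, its `comments_index` parameter, and the sample rows are hypothetical, and returning `0.0` for an empty list is an assumption made to avoid a division by zero.

```python
def average_comments(rows, comments_index=4):
    """Return the mean number of comments over `rows`.

    Hypothetical helper: `comments_index` is the position of the
    `num_comments` column (4 in this dataset).
    """
    if not rows:
        # Assumption: report 0.0 rather than raising on an empty list.
        return 0.0
    total = sum(int(row[comments_index]) for row in rows)
    return total / len(rows)


# Made-up rows in the same column layout as the dataset.
sample_rows = [
    ['1', 'Ask HN: first question', '', '2', '6', 'user_a', '8/16/2016 9:55'],
    ['2', 'Ask HN: second question', '', '28', '29', 'user_b', '11/22/2015 13:43'],
]
print(average_comments(sample_rows))  # (6 + 29) / 2 = 17.5
```

With such a helper, the averages for `ask_posts`, `show_posts`, and `other_posts` would each be a single call.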

## Finding the Number of Ask Posts and Comments by Hour Created

We'll determine whether `Ask HN` posts created at a certain hour are more likely to attract comments.

In [17]:
import datetime as dt

In [18]:
result_list = []

for row in ask_posts:
    created_at = row[6]
    comments = int(row[4])
    result_list.append([created_at, comments])

In [21]:
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    date = dt.datetime.strptime(row[0], '%m/%d/%Y %H:%M')
    hour = date.hour
    comments = row[1]
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comments
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comments
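
The format string `'%m/%d/%Y %H:%M'` has to match the `created_at` text exactly: month, day, and four-digit year separated by slashes, then a space and a 24-hour time. As a small sketch of the same tally on made-up data, the version below uses `dict.get` with a default of `0` instead of the `if`/`else` branch. The `sample` list and the variable names are hypothetical.

```python
import datetime as dt

# Made-up [created_at, comments] pairs in the dataset's date format.
sample = [
    ['8/16/2016 9:55', 6],
    ['11/22/2015 13:43', 29],
    ['5/2/2016 9:14', 1],
]

counts = {}
comments_total = {}
for created_at, comments in sample:
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').hour
    # dict.get supplies 0 for an hour that has not been seen yet.
    counts[hour] = counts.get(hour, 0) + 1
    comments_total[hour] = comments_total.get(hour, 0) + comments

print(counts)          # {9: 2, 13: 1}
print(comments_total)  # {9: 7, 13: 29}
```

Both approaches produce the same dictionaries. The `dict.get` form simply avoids writing the first-occurrence case separately.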

## Calculating the Average Number of Comments for Ask HN Posts by Hour

In [38]:
avg_by_hour = []

for hour in counts_by_hour:
    avg_comments = comments_by_hour[hour] / counts_by_hour[hour]
    # Store the average first so that sorting the list orders it by average.
    avg_by_hour.append([avg_comments, hour])

In [44]:
avg_by_hour_sorted = sorted(avg_by_hour, reverse=True)

In [45]:
print(avg_by_hour_sorted)

[[38.5948275862069, 15], [23.810344827586206, 2], [21.525, 20], [16.796296296296298, 16], [16.009174311926607, 21], [14.741176470588234, 13], [13.440677966101696, 10], [13.233644859813085, 14], [13.20183486238532, 18], [11.46, 17], [11.383333333333333, 1], [11.051724137931034, 11], [10.8, 19], [10.25, 8], [10.08695652173913, 5], [9.41095890410959, 12], [9.022727272727273, 6], [8.127272727272727, 0], [7.985294117647059, 23], [7.852941176470588, 7], [7.796296296296297, 3], [7.170212765957447, 4], [6.746478873239437, 22], [5.5777777777777775, 9]]
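
The sort above works because each pair stores the average before the hour, and lists are compared element by element. An alternative, sketched below under stated assumptions, keeps the pairs in `(hour, average)` order and passes a `key` function to `sorted`. The dictionary `avg_by_hour_sample` is hypothetical and holds a few of the averages above rounded to one decimal place.

```python
# Hypothetical sample: hour -> average comments, rounded from the output above.
avg_by_hour_sample = {9: 5.6, 15: 38.6, 2: 23.8, 20: 21.5}

# Sort (hour, average) pairs by the average, largest first.
ranked = sorted(
    avg_by_hour_sample.items(),
    key=lambda pair: pair[1],
    reverse=True,
)

for hour, avg in ranked[:3]:
    print('{:02d}:00: {:.1f}'.format(hour, avg))
```

This prints the hours 15, 2, and 20 in that order, matching the top three entries of the sorted list.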


In [50]:
print("Top 5 Hours for Ask Post Comments")
for row in avg_by_hour_sorted[:5]:
    date = dt.datetime.strptime(str(row[1]), "%H")
    time = date.strftime("%H:00")
    print('{}: {:.2f} average comments per post'.format(time, row[0]))

Top 5 Hours for Ask Post Comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post


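So, `Ask HN` posts created at 15:00 receive the most comments on average (38.59 per post), well ahead of posts created at 02:00 (23.81) and 20:00 (21.52). To improve the chance of receiving comments, the best times to post a question are mid-afternoon (15:00-16:00), the evening (20:00-21:00), or late at night (02:00). These hours are in whatever time zone the dataset's `created_at` column uses.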