# Exploring Hacker News Posts

Hacker News is a site where user-submitted stories are voted and commented upon. Hacker News is popular in technology and startup circles, and posts that make it to the top of the website's listing can receive thousands of visitors.

We want to explore two types of posts found on Hacker News:

 - Ask HN: Users post questions to the community on a specific topic.
 - Show HN: Users post projects, products, or anything else related to technology.
 
Some questions we may ask when comparing these two post types:

- Which post type receives more comments on average?
- Do posts created within a certain time frame receive more comments on average?

## Importing and Exploring the Data

Let's import the data and view the first 5 rows to see what kind of data we are working with.

In [9]:
from csv import reader

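# Read the Hacker News data set; the with-block closes the file when done
with open('hacker_news.csv') as opened_file:
    read_file = reader(opened_file)
    hn = list(read_file)

# Separate the header row from the data rows
header = hn[0]
hn = hn[1:]

# View the header, the first 5 rows, and the number of data rows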
print(header)
print('\n')
print(hn[:5])
print(len(hn))

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]
20100
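The cells below refer to columns by position (`row[1]` for the title, `row[4]` for the comment count, `row[6]` for the creation time). As a minimal sketch, those positions can instead be looked up from the header row printed above, which keeps the code readable if the column order ever changes. The constant names `TITLE`, `NUM_COMMENTS`, and `CREATED_AT` are hypothetical, introduced only for this example.

```python
# Header row as printed above
header = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

# Hypothetical named constants: look up column positions by name instead of hard-coding them
TITLE = header.index('title')
NUM_COMMENTS = header.index('num_comments')
CREATED_AT = header.index('created_at')

# The first data row shown in the output above
row = ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/',
       '386', '52', 'ne0phyte', '8/4/2016 11:52']

print(TITLE, NUM_COMMENTS, CREATED_AT)
# num_comments is stored as a string, so convert it before doing arithmetic
print(row[TITLE], int(row[NUM_COMMENTS]), row[CREATED_AT])
```

With this approach, `row[4]` in later cells could be written as `row[NUM_COMMENTS]`.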


## Extracting 'Ask HN' and 'Show HN' Posts

We are only interested in two types of posts: 'Ask HN' and 'Show HN'.

So let's separate each type of post into its own list.

In [10]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    # Lowercase the title so the prefix check is case-insensitive
    title = row[1].lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

print('Ask Posts:', len(ask_posts))
print('Show Posts:', len(show_posts))
print('Other Posts:', len(other_posts))

Ask Posts: 1744
Show Posts: 1162
Other Posts: 17194
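The if/elif chain above can be pulled into a small function, which makes the classification rule easy to test on individual titles. This is a sketch: `classify_post`, `sample_rows`, and `posts_by_type` are hypothetical names, and the sample rows are trimmed to the `id` and `title` columns of rows shown in this notebook.

```python
def classify_post(title):
    """Return 'ask', 'show', or 'other' based on the post title's prefix."""
    # Lowercase first so 'Ask HN', 'ASK HN' and 'ask hn' are all treated alike
    title = title.lower()
    if title.startswith('ask hn'):
        return 'ask'
    elif title.startswith('show hn'):
        return 'show'
    return 'other'


# Hypothetical sample rows, trimmed to [id, title]; the title is still at index 1
sample_rows = [
    ['12296411', 'Ask HN: How to improve my personal website?'],
    ['10646440', 'Show HN: Something pointless I made'],
    ['12224879', 'Interactive Dynamic Video'],
]

# Group rows into one list per post type
posts_by_type = {'ask': [], 'show': [], 'other': []}
for row in sample_rows:
    posts_by_type[classify_post(row[1])].append(row)

print({kind: len(rows) for kind, rows in posts_by_type.items()})
```

Run over `hn` instead of `sample_rows`, the three lists would correspond to `ask_posts`, `show_posts`, and `other_posts`.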


### First Five Rows of Ask Posts List

Let's view the first five rows of the Ask Posts list.

In [12]:
print(header)
print('\n')
print(ask_posts[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


[['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'], ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'], ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'], ['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20'], ['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38']]


### First Five Rows of Show Posts List

Let's view the first five rows of the Show Posts list.

In [13]:
print(header)
print('\n')
print(show_posts[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


[['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'], ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46'], ['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05'], ['12178806', 'Show HN: Webscope  Easy way for web developers to communicate with Clients', 'http://webscopeapp.com', '3', '3', 'fastbrick', '7/28/2016 7:11'], ['10872799', 'Show HN: GeoScreenshot  Easily test Geo-IP based web pages', 'https://www.geoscreenshot.com/', '1', '9', 'kpsychwave', '1/9/2016 20:45']]


## Calculating the Average Number of Comments

We want to find out which of the two post types receives more comments on average: Ask Posts or Show Posts.

### Ask Posts

Let's first look at the Ask Posts to see the average number of comments they receive.

In [14]:
total_ask_comments = 0

for row in ask_posts:
    num_comments = row[4]
    num_comments = int(num_comments)
    total_ask_comments += num_comments
    
print('Total of Comments in Ask Posts:', total_ask_comments)

avg_ask_comments = total_ask_comments / len(ask_posts)
print('Average of Comments in Ask Posts:', round(avg_ask_comments)) 

Total of Comments in Ask Posts: 24483
Average of Comments in Ask Posts: 14


### Show Posts

Now let's look at the Show Posts to see the average number of comments they receive.

In [15]:
total_show_comments = 0

for row in show_posts:
    num_comments = row[4]
    num_comments = int(num_comments)
    total_show_comments += num_comments
    
print('Total of Comments in Show Posts:', total_show_comments)

avg_show_comments = total_show_comments / len(show_posts)
print('Average of Comments in Show Posts:', round(avg_show_comments))

Total of Comments in Show Posts: 11988
Average of Comments in Show Posts: 10
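The Ask and Show cells above run the same loop on different lists. A minimal sketch of a shared helper follows; `average_comments` is a hypothetical name, and it assumes the comment count sits at index 4, as in the data set. Like the original cells, it raises `ZeroDivisionError` if given an empty list.

```python
def average_comments(posts):
    """Return the mean of the num_comments column (index 4) for a list of rows."""
    # num_comments is stored as a string in the CSV, so convert before summing
    total = sum(int(row[4]) for row in posts)
    return total / len(posts)


# The first two Ask HN rows shown earlier (6 and 29 comments)
sample_ask = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6',
     'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?',
     '', '28', '29', 'tkfx', '11/22/2015 13:43'],
]
print(average_comments(sample_ask))
```

Called as `average_comments(ask_posts)` and `average_comments(show_posts)`, it would reproduce the unrounded averages behind the printed values: 24483 / 1744 ≈ 14.04 and 11988 / 1162 ≈ 10.32.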


### Comparing the Ask Posts and Show Posts Results

On average, Ask Posts received about 14 comments each, whereas Show Posts received about 10. One possible reason lies in the purpose of Ask Posts: the user puts a question to the community and explicitly invites feedback, so more people may be willing to share their insights on that specific question.

## Finding the Number of Ask Posts and Comments by Hour Created

Next, we want to find out whether submitting a post at a certain hour boosts its exposure and, as a result, the number of comments it receives. To do this, we will look at the average number of comments per post for each hour of the day.

To work with the `created_at` timestamps, let's import the `datetime` module.

In [16]:
import datetime as dt
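Each `created_at` value looks like `'8/16/2016 9:55'`, which matches the format string `"%m/%d/%Y %H:%M"` used in the next cell. `datetime.strptime` accepts month, day, and hour values without leading zeros, so single-digit values parse correctly. A minimal sketch of extracting the hour from one timestamp:

```python
import datetime as dt

date_format = "%m/%d/%Y %H:%M"

# created_at value from the first Ask HN row shown earlier
created_at = '8/16/2016 9:55'
parsed = dt.datetime.strptime(created_at, date_format)

print(parsed)                 # a datetime object for 2016-08-16 09:55
print(parsed.hour)            # the hour as an int
print(parsed.strftime("%H"))  # the hour as a zero-padded string
```

`parsed.hour` gives an integer, while `strftime("%H")` gives a zero-padded string such as `'09'`, which is what the hour-counting code uses as dictionary keys.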

### Ask Posts

Let's first look at the Ask Posts.

In [32]:
# Pair each Ask post's creation time with its number of comments
result_list = []

for row in ask_posts:
    result_list.append([row[6], int(row[4])])

# Number of posts and total comments for each hour of the day
counts_by_hour = {}
comments_by_hour = {}
date_format = "%m/%d/%Y %H:%M"

for row in result_list:
    date = row[0]
    comment = row[1]
    # Extract the hour as a zero-padded string, e.g. '09'
    hour = dt.datetime.strptime(date, date_format).strftime("%H")
    if hour in counts_by_hour:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comment
    else:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comment

comments_by_hour

{'00': 447,
 '01': 683,
 '02': 1381,
 '03': 421,
 '04': 337,
 '05': 464,
 '06': 397,
 '07': 267,
 '08': 492,
 '09': 251,
 '10': 793,
 '11': 641,
 '12': 687,
 '13': 1253,
 '14': 1416,
 '15': 4477,
 '16': 1814,
 '17': 1146,
 '18': 1439,
 '19': 1188,
 '20': 1722,
 '21': 1745,
 '22': 479,
 '23': 543}
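The if/else bookkeeping in the cell above can also be written with `collections.defaultdict(int)`, which starts every missing key at 0. The sketch below runs the same counting logic on five `[created_at, num_comments]` pairs taken from the Ask HN rows shown earlier; it is an alternative style, not a change in results.

```python
import datetime as dt
from collections import defaultdict

date_format = "%m/%d/%Y %H:%M"

# [created_at, num_comments] pairs from the five Ask HN rows shown earlier
result_list = [
    ['8/16/2016 9:55', 6],
    ['11/22/2015 13:43', 29],
    ['5/2/2016 10:14', 1],
    ['8/2/2016 14:20', 3],
    ['10/15/2015 16:38', 17],
]

# Missing keys start at 0, so no if/else check is needed
counts_by_hour = defaultdict(int)
comments_by_hour = defaultdict(int)

for created_at, num_comments in result_list:
    hour = dt.datetime.strptime(created_at, date_format).strftime("%H")
    counts_by_hour[hour] += 1
    comments_by_hour[hour] += num_comments

print(dict(counts_by_hour))
print(dict(comments_by_hour))
```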

## Calculating the Average Number of Comments for Ask HN Posts by Hour

Now that we have dictionaries holding the number of Ask posts and the total number of comments for each hour, let's calculate the average number of comments per post for each hour to find out which hour attracts the most comments.

The resulting list is not sorted, so it is hard to tell at a glance which hour has the highest average. We will sort it in the next step.

In [40]:
# Average number of comments per post for each hour
avg_by_hour = []

for hour in comments_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])
                    
avg_by_hour

[['02', 23.810344827586206],
 ['23', 7.985294117647059],
 ['12', 9.41095890410959],
 ['14', 13.233644859813085],
 ['11', 11.051724137931034],
 ['13', 14.741176470588234],
 ['00', 8.127272727272727],
 ['04', 7.170212765957447],
 ['07', 7.852941176470588],
 ['03', 7.796296296296297],
 ['06', 9.022727272727273],
 ['22', 6.746478873239437],
 ['21', 16.009174311926607],
 ['16', 16.796296296296298],
 ['01', 11.383333333333333],
 ['10', 13.440677966101696],
 ['18', 13.20183486238532],
 ['08', 10.25],
 ['17', 11.46],
 ['19', 10.8],
 ['05', 10.08695652173913],
 ['15', 38.5948275862069],
 ['09', 5.5777777777777775],
 ['20', 21.525]]
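The same averages can be built as a dictionary with a comprehension. In this sketch the comment totals for hours `'09'`, `'15'`, and `'20'` are copied from `comments_by_hour` above, and the post counts are back-calculated from the printed averages (for example, 4477 / 116 ≈ 38.59). Note that the result is a dict rather than a list of lists, so a sorting step would work on `avg_by_hour.items()`.

```python
# Comment totals copied from comments_by_hour above; post counts back-calculated
# from the printed averages (e.g. 4477 / 116 ≈ 38.59)
comments_by_hour = {'09': 251, '15': 4477, '20': 1722}
counts_by_hour = {'09': 45, '15': 116, '20': 80}

# One entry per hour: total comments divided by number of posts
avg_by_hour = {hour: comments_by_hour[hour] / counts_by_hour[hour]
               for hour in comments_by_hour}

for hour, avg in avg_by_hour.items():
    print(hour, round(avg, 2))
```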

## Sorting and Printing Values from a List of Lists

To sort the list by average in descending order, we first swap the two elements of each row so that the average comes first. `sorted()` then compares the rows by their first element, the average, and `reverse=True` puts the largest values first.

In [44]:
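# Swap each row to [average, hour] so that sorted() compares the averages first
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

print(swap_avg_by_hour)

# reverse=True puts the largest average first
sorted_swap = sorted(swap_avg_by_hour, reverse=True)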

print(sorted_swap)

[[23.810344827586206, '02'], [7.985294117647059, '23'], [9.41095890410959, '12'], [13.233644859813085, '14'], [11.051724137931034, '11'], [14.741176470588234, '13'], [8.127272727272727, '00'], [7.170212765957447, '04'], [7.852941176470588, '07'], [7.796296296296297, '03'], [9.022727272727273, '06'], [6.746478873239437, '22'], [16.009174311926607, '21'], [16.796296296296298, '16'], [11.383333333333333, '01'], [13.440677966101696, '10'], [13.20183486238532, '18'], [10.25, '08'], [11.46, '17'], [10.8, '19'], [10.08695652173913, '05'], [38.5948275862069, '15'], [5.5777777777777775, '09'], [21.525, '20']]
[[38.5948275862069, '15'], [23.810344827586206, '02'], [21.525, '20'], [16.796296296296298, '16'], [16.009174311926607, '21'], [14.741176470588234, '13'], [13.440677966101696, '10'], [13.233644859813085, '14'], [13.20183486238532, '18'], [11.46, '17'], [11.383333333333333, '01'], [11.051724137931034, '11'], [10.8, '19'], [10.25, '08'], [10.08695652173913, '05'], [9.41095890410959, '12'], [
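Swapping the columns works, but `sorted()` can also sort on the second element directly through its `key` argument. The sketch below uses `operator.itemgetter(1)` (equivalent to `lambda row: row[1]`) on a few `[hour, average]` rows copied from `avg_by_hour` above.

```python
from operator import itemgetter

# A few [hour, average] rows copied from avg_by_hour above
avg_by_hour = [
    ['09', 5.5777777777777775],
    ['15', 38.5948275862069],
    ['20', 21.525],
    ['02', 23.810344827586206],
]

# Sort on the average (index 1), largest first, without swapping the columns
sorted_by_avg = sorted(avg_by_hour, key=itemgetter(1), reverse=True)

for hour, avg in sorted_by_avg[:3]:
    print(hour, avg)
```

The rows keep their original `[hour, average]` order, so no second swap is needed when printing.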

In [50]:
print("Top 5 Hours for 'Ask HN' Comments")
for avg, hr in sorted_swap[:5]:
    # Convert the zero-padded hour string (e.g. '15') into HH:MM format
    print("{}: {:.2f} average comments per post".format(
        dt.datetime.strptime(hr, "%H").strftime("%H:%M"), avg))

Top 5 Hours for 'Ask HN' Comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
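Because the hour keys are already zero-padded strings, the round trip through `strptime`/`strftime` in the last cell can be replaced by appending `':00'`. A minimal sketch with an f-string, using the first three entries of `sorted_swap` from above:

```python
# First three [average, hour] rows of sorted_swap from above
sorted_swap = [[38.5948275862069, '15'], [23.810344827586206, '02'], [21.525, '20']]

# The hour is already a zero-padded string, so appending ':00' gives HH:MM
lines = [f"{hr}:00: {avg:.2f} average comments per post" for avg, hr in sorted_swap]

print("Top Hours for 'Ask HN' Comments")
for line in lines:
    print(line)
```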


## Conclusion