# Maximizing Traffic to Your Hacker News Posts

## A look at the different posts on Hacker News and how post type and posting time affect views and comments.

You know the feeling; we all do! The rush and anxiety of posting something, then checking every few minutes to see whether anyone has responded. Some posts get a lot of attention for no obvious reason, while others that clearly deserve attention receive none, and you may ask yourself why. Are there factors other than subject matter that affect the traffic a post gets?

In this project we are going to compare posts made on Ask HN and Show HN and see if posting at certain times leads to more comments.

To start, let's import our modules and open our data set.

In [2]:
# Imports modules 
from csv import reader
import datetime as dt

In [3]:
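# Opens the CSV file and turns it into a list of lists for analysis;
# the with-block closes the file once it has been read
with open('HN_posts_year_to_Sep_26_2016.csv') as open_file:
    read_file = reader(open_file)
    hn = list(read_file)
hn_header = hn[0]  # first row: column names
hn = hn[1:]        # remaining rows: one list per post
hn[:5]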

[['12579008',
  'You have two days to comment if you want stem cells to be classified as your own',
  'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018',
  '1',
  '0',
  'altstar',
  '9/26/2016 3:26'],
 ['12579005',
  'SQLAR  the SQLite Archiver',
  'https://www.sqlite.org/sqlar/doc/trunk/README.md',
  '1',
  '0',
  'blacksqr',
  '9/26/2016 3:24'],
 ['12578997',
  'What if we just printed a flatscreen television on the side of our boxes?',
  'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43',
  '1',
  '0',
  'pavel_lishin',
  '9/26/2016 3:19'],
 ['12578989',
  'algorithmic music',
  'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext',
  '1',
  '0',
  'poindontcare',
  '9/26/2016 3:16'],
 ['12578979',
  'How the Data Vault Enables the Next-Gen Data Warehouse and Data Lake',
  'https://www.talend.com/blog/2016/05/12/talend-and-Â\x93the-data-vaultÂ\x94',
  '1',
  '0',
  'markgainor1',
  '9/26/2016 3:14']]
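
Indexing columns by position (`row[1]`, `row[4]`, `row[6]`) works, but positions are easy to mix up. Below is a minimal sketch that uses the standard library's `csv.DictReader` instead, which keys each row by its header name. The header names `id, title, url, num_points, num_comments, author, created_at` are an assumption, so check `hn_header` for the real ones. The two data rows are copied from the output above into an in-memory string so the snippet runs without the file.

```python
import csv
import io

# Two rows copied from the output above, preceded by an ASSUMED header row.
# With the real file you would pass open('HN_posts_year_to_Sep_26_2016.csv') instead of io.StringIO.
sample_csv = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12579008,You have two days to comment if you want stem cells to be classified as your own,"
    "http://www.regulations.gov/document?D=FDA-2015-D-3719-0018,1,0,altstar,9/26/2016 3:26\n"
    "12578989,algorithmic music,"
    "http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext,1,0,poindontcare,9/26/2016 3:16\n"
)

# DictReader consumes the header row and yields one dict per post
posts = list(csv.DictReader(sample_csv))

for post in posts:
    # columns are looked up by (assumed) name instead of by position
    print(post["title"], "|", post["num_comments"], "|", post["created_at"])
```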

After taking a cursory look at our data, let's separate our posts into three categories. We want to look at Ask HN and Show HN posts separately, and set aside every other post, because those will not be part of this analysis.

In [5]:
# Creates empty lists to store relevant rows from our dataset
ask_posts = []
show_posts = []
other_posts = []

# loops through the dataset identifying different posts and appending them to the appropriate list
for row in hn:
    title = row[1]
    # lower() returns a lower-case copy of the title, so "Ask HN", "ASK HN" and "ask hn"
    # all match without us having to worry about capital letters
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))


9139
10158
273822
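
You can also wrap the same classification rule in a small function, which makes it easy to try on a few titles. This is a sketch: `classify_post` is a hypothetical helper name. The first two titles are invented, and the last one comes from the sample output above.

```python
def classify_post(title):
    """Return 'ask', 'show', or 'other' based on how a post title starts."""
    normalized = title.lower()  # ignore capitalization, e.g. "ASK HN" vs "Ask HN"
    if normalized.startswith("ask hn"):
        return "ask"
    if normalized.startswith("show hn"):
        return "show"
    return "other"


# Invented titles for illustration, plus one real title from the sample rows
titles = [
    "Ask HN: How do you back up your laptop?",
    "Show HN: A tiny static site generator",
    "SQLAR  the SQLite Archiver",
]

for title in titles:
    print(classify_post(title), "<-", title)
```

Like the loop above, a bare prefix test also matches titles such as "Ask HNers: ...". A stricter check, for example one that requires a colon after the prefix, may be worth considering.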


Excellent!  Now that we have separated the Ask HN and Show HN posts, let's look at the average number of comments each post gets.

In [6]:
total_ask_comments = 0

# Sums the total number of comments for Ask HN posts
for row in ask_posts:
    num_comments = int(row[4]) 
    total_ask_comments += num_comments

# Computes the average number of comments for Ask HN posts    
ave_ask_comments = total_ask_comments / len(ask_posts)
print('The average number of comments for ask HN posts is:', ave_ask_comments)


total_show_comments = 0

# Sums the total number of comments for Show HN posts
for row in show_posts:
    num_comments = int(row[4])
    total_show_comments += num_comments
    
# Computes the average number of comments on Show HN posts
ave_show_comments = total_show_comments / len(show_posts)
print('The average number of comments for show HN posts is:', ave_show_comments)


    

The average number of comments for ask HN posts is: 10.393478498741656
The average number of comments for show HN posts is: 4.886099625910612
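
The two loops above repeat the same sum-then-divide pattern. Here is a minimal sketch that factors it into one function. Like the cells above, it assumes the comment count is stored as a string in column index 4. `average_comments` and the sample rows are hypothetical.

```python
def average_comments(posts, comments_col=4):
    """Average of the comment counts stored (as strings) in column comments_col."""
    if not posts:
        return 0.0  # avoid dividing by zero for an empty category
    return sum(int(row[comments_col]) for row in posts) / len(posts)


# Hypothetical rows in the data set's layout: id, title, url, points, comments, author, created_at
sample_ask = [
    ["1", "Ask HN: First question", "", "5", "12", "alice", "9/25/2016 15:10"],
    ["2", "Ask HN: Second question", "", "3", "4", "bob", "9/25/2016 13:45"],
]
print(average_comments(sample_ask))  # (12 + 4) / 2 = 8.0
```

With the real lists, `average_comments(ask_posts)` and `average_comments(show_posts)` compute the same two averages as the cell above.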


On average, Ask HN posts receive about 2.1 times as many comments as Show HN posts (roughly 113% more). This makes sense: Ask HN posts are created to start a discussion, while Show HN posts do not necessarily invite a comment. Because Ask HN posts receive considerably more comments, we will restrict the remainder of the analysis to Ask HN posts.
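
Comparing the two averages is simple arithmetic, but "N times as many" and "N% more" are easy to confuse. The snippet below plugs in the averages printed above and computes both.

```python
# Averages printed by the earlier cell
ave_ask_comments = 10.393478498741656
ave_show_comments = 4.886099625910612

ratio = ave_ask_comments / ave_show_comments
percent_more = (ratio - 1) * 100  # "X% more" is measured relative to the smaller value

print("Ask HN / Show HN ratio: {:.2f}".format(ratio))             # 2.13
print("Ask HN posts get {:.0f}% more comments".format(percent_more))  # 113
```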

Now we want to know: will posting a question at a specific time garner more comments?

To answer this, let's create a two-column list that holds the date and time each post was created and the number of comments it received.

In [7]:
result_list = []
for row in ask_posts:
    result_list.append([row[6], int(row[4])])
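
You can also write the loop that builds `result_list` as a list comprehension. The sketch below goes one step further and parses each timestamp while building the list, so later steps could work with `datetime` objects directly. The cells in this notebook do not do that. The change is an illustrative alternative, and the sample rows are made up.

```python
import datetime as dt

# Hypothetical Ask HN rows in the data set's layout (created_at is column 6, comments column 4)
ask_posts = [
    ["1", "Ask HN: First question", "", "5", "12", "alice", "9/25/2016 15:10"],
    ["2", "Ask HN: Second question", "", "3", "4", "bob", "9/25/2016 13:45"],
]

date_format = "%m/%d/%Y %H:%M"  # same format string the notebook uses for created_at

# One [created_at, num_comments] pair per post, with the timestamp parsed up front
result_list = [
    [dt.datetime.strptime(row[6], date_format), int(row[4])]
    for row in ask_posts
]
print(result_list)
```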
    

In [9]:
# Creates two dictionaries that act as frequency tables: the number of posts made in each hour
# and the total number of comments received by posts made in each hour
counts_by_hour = {}
comments_by_hour = {}
date_format = "%m/%d/%Y %H:%M" # this specifies the format the date time is written in to allow python to parse 
                               # through it and extract the necessary data

for row in result_list:
    date = row[0]
    date = dt.datetime.strptime(date, date_format) 
    hour = date.strftime("%H") # extracts the hour information from the date-time
    # updates both frequency tables: one more post in this hour, plus that post's comments
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]
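
The `if hour not in ...` branch is the classic way to build a frequency table. Below is a sketch of the same cell that uses `collections.defaultdict` instead, which supplies a default of 0 for keys it has not seen yet. The sample pairs are made up for illustration.

```python
from collections import defaultdict
import datetime as dt

# Hypothetical [created_at, num_comments] pairs, in the same shape as result_list
result_list = [
    ["9/25/2016 15:10", 12],
    ["9/25/2016 15:40", 30],
    ["9/26/2016 3:05", 2],
]

date_format = "%m/%d/%Y %H:%M"
counts_by_hour = defaultdict(int)    # number of posts made in each hour
comments_by_hour = defaultdict(int)  # total comments on posts made in each hour

for created_at, num_comments in result_list:
    hour = dt.datetime.strptime(created_at, date_format).strftime("%H")  # e.g. "03", "15"
    # defaultdict starts missing keys at 0, so no "if hour not in ..." branch is needed
    counts_by_hour[hour] += 1
    comments_by_hour[hour] += num_comments

print(dict(counts_by_hour))    # {'15': 2, '03': 1}
print(dict(comments_by_hour))  # {'15': 42, '03': 2}
```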


Now that we have those two dictionaries, let's calculate the average number of comments per post for each hour.

In [12]:
ave_comments_hour = []
# uses our frequency tables to compute the average comments per post every hour
for hour in counts_by_hour:
    ave_comments_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])
    
ave_comments_hour

[['02', 11.137546468401487],
 ['01', 7.407801418439717],
 ['22', 8.804177545691905],
 ['21', 8.687258687258687],
 ['19', 7.163043478260869],
 ['17', 9.449744463373083],
 ['15', 28.676470588235293],
 ['14', 9.692007797270955],
 ['13', 16.31756756756757],
 ['11', 8.96474358974359],
 ['10', 10.684397163120567],
 ['09', 6.653153153153153],
 ['07', 7.013274336283186],
 ['03', 7.948339483394834],
 ['23', 6.696793002915452],
 ['20', 8.749019607843136],
 ['16', 7.713298791018998],
 ['08', 9.190661478599221],
 ['00', 7.5647840531561465],
 ['18', 7.94299674267101],
 ['12', 12.380116959064328],
 ['04', 9.7119341563786],
 ['06', 6.782051282051282],
 ['05', 8.794258373205741]]
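
Because both tables share the same keys, you can also keep the per-hour averages in a dictionary. That turns a question like "what is the average for 15:00?" into a single lookup. Here is a sketch with made-up counts.

```python
# Hypothetical frequency tables in the shape built above (hour -> posts, hour -> comments)
counts_by_hour = {"15": 2, "03": 1}
comments_by_hour = {"15": 42, "03": 2}

# Average comments per post for each hour, keyed by the same two-digit hour string
ave_by_hour = {
    hour: comments_by_hour[hour] / counts_by_hour[hour]
    for hour in counts_by_hour
}
print(ave_by_hour)  # {'15': 21.0, '03': 2.0}
```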

Perfect! To improve readability, let's sort this list by average comments to see which hours of the day get the most comments.
First, let's swap the columns so that the average number of comments comes first and becomes the sort key.

In [14]:
swap_ave_hour = []

for row in ave_comments_hour:
    swap_ave_hour.append([row[1], row[0]]) # switches the columns
swap_ave_hour

[[11.137546468401487, '02'],
 [7.407801418439717, '01'],
 [8.804177545691905, '22'],
 [8.687258687258687, '21'],
 [7.163043478260869, '19'],
 [9.449744463373083, '17'],
 [28.676470588235293, '15'],
 [9.692007797270955, '14'],
 [16.31756756756757, '13'],
 [8.96474358974359, '11'],
 [10.684397163120567, '10'],
 [6.653153153153153, '09'],
 [7.013274336283186, '07'],
 [7.948339483394834, '03'],
 [6.696793002915452, '23'],
 [8.749019607843136, '20'],
 [7.713298791018998, '16'],
 [9.190661478599221, '08'],
 [7.5647840531561465, '00'],
 [7.94299674267101, '18'],
 [12.380116959064328, '12'],
 [9.7119341563786, '04'],
 [6.782051282051282, '06'],
 [8.794258373205741, '05']]

In [15]:
# this sorts our average number of comments in descending order
sorted_swap = sorted(swap_ave_hour, reverse = True)
sorted_swap

[[28.676470588235293, '15'],
 [16.31756756756757, '13'],
 [12.380116959064328, '12'],
 [11.137546468401487, '02'],
 [10.684397163120567, '10'],
 [9.7119341563786, '04'],
 [9.692007797270955, '14'],
 [9.449744463373083, '17'],
 [9.190661478599221, '08'],
 [8.96474358974359, '11'],
 [8.804177545691905, '22'],
 [8.794258373205741, '05'],
 [8.749019607843136, '20'],
 [8.687258687258687, '21'],
 [7.948339483394834, '03'],
 [7.94299674267101, '18'],
 [7.713298791018998, '16'],
 [7.5647840531561465, '00'],
 [7.407801418439717, '01'],
 [7.163043478260869, '19'],
 [7.013274336283186, '07'],
 [6.782051282051282, '06'],
 [6.696793002915452, '23'],
 [6.653153153153153, '09']]
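
Swapping the columns works because Python compares lists element by element, so `sorted` orders the rows by their first item. An alternative that skips the swap is to pass a `key` function. The sketch below uses four of the rows printed earlier.

```python
# Four [hour, average comments] pairs copied from the ave_comments_hour output above
ave_comments_hour = [
    ["02", 11.137546468401487],
    ["15", 28.676470588235293],
    ["09", 6.653153153153153],
    ["13", 16.31756756756757],
]

# Sort by the second column (the average) from highest to lowest, keeping [hour, average] order
sorted_by_average = sorted(ave_comments_hour, key=lambda row: row[1], reverse=True)
print(sorted_by_average)
```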

Excellent! Now that we have sorted our list, let's print the top five results in a more readable format.

In [16]:
# This loop prints our results in an "hour:00 : average comments" format so we can easily see the times
# when posts get the highest average number of comments
for i in range(0,5):
    num, hour = sorted_swap[i][0], sorted_swap[i][1] 
    hour = dt.datetime.strptime(str(hour), "%H")
    hour = hour.strftime("%H")
    print("{hour}:00 : {num:.2f}".format(hour = hour, num = num))


15:00 : 28.68
13:00 : 16.32
12:00 : 12.38
02:00 : 11.14
10:00 : 10.68
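
If you only need the top few rows, the standard library's `heapq.nlargest` does the job. It is equivalent to sorting in descending order and slicing. The sketch below runs it on several unsorted [average, hour] pairs copied from the `swap_ave_hour` output. It formats the hour string directly, because that string is already zero-padded.

```python
import heapq

# Several [average comments, hour] pairs copied from the swap_ave_hour output above (unsorted)
swap_ave_hour = [
    [11.137546468401487, "02"],
    [7.407801418439717, "01"],
    [28.676470588235293, "15"],
    [9.692007797270955, "14"],
    [16.31756756756757, "13"],
    [10.684397163120567, "10"],
    [12.380116959064328, "12"],
]

# nlargest(5, ...) returns the same items as sorted(swap_ave_hour, reverse=True)[:5]
top_five = heapq.nlargest(5, swap_ave_hour)

for num, hour in top_five:
    # hour is already a two-digit string such as "02", so it can be formatted directly
    print("{hour}:00 : {num:.2f}".format(hour=hour, num=num))
```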


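As we can see, Ask HN posts created at 15:00 (3 pm) received the highest average number of comments in this data set, about 28.7 per post. Posts created at 13:00 (1 pm) came next, at about 16.3. These hours are as recorded in the data set's timestamps. So if you want more discussion on your question, it may pay to be strategic about when you post to Ask HN. Happy posting!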