# The Key to Posting on Hacker News

This project uses a small sample of Kaggle's [Hacker News Posts](https://www.kaggle.com/hacker-news/hacker-news-posts) data to determine whether I should post an Ask HN (where I ask the Hacker News community a question) or a Show HN (where I show a project I am working on). I will also determine the best time to post in order to receive the most comments.

The data has been cleaned to remove posts with 0 comments, as well as sampled randomly from the original data to build an 

In [1]:
# reading in the file as a list of lists

from csv import reader

# the with block closes the file once it has been read
with open('hacker_news.csv') as opened_file:
    hn = list(reader(opened_file))
hn[0:5]

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
 ['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01']]

In [2]:
# removing the header row from the dataset

headers = hn[0]
hn = hn[1:]
print(headers)
hn[0:5]

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

[['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01'],
 ['10301696',
  'Note by Note: The Making of Steinway L1037 (2007)',
  'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0',
  '8',
  '2',
  'walterbell',
  '9/30/2015 4:12']]

## Which type of post is best?

I'll calculate the average number of comments per type of post by:
* Making a list of ask posts, show posts, and other posts and counting the length of each
* Adding up the total number of comments for ask and show posts
* Dividing total number of comments by the number of posts

In [3]:
# determining the number of ask, show, and other posts

ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    title = title.lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print('ASK:', len(ask_posts))
print('SHOW:', len(show_posts))
print('OTHER:', len(other_posts))

ASK: 1744
SHOW: 1162
OTHER: 17194


In [4]:
# finding the total number of comments for ask and show posts

total_ask_comments = 0
for row in ask_posts:
    num_comments = int(row[4])
    total_ask_comments += num_comments
    
avg_ask_comments = total_ask_comments/len(ask_posts)
print('ASK:', avg_ask_comments)

total_show_comments = 0
for row in show_posts:
    num_comments = int(row[4])
    total_show_comments += num_comments
    
avg_show_comments = total_show_comments/len(show_posts)
print('SHOW:', avg_show_comments)

ASK: 14.038417431192661
SHOW: 10.31669535283993
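
The two loops above repeat the same steps, so they could be folded into one helper. The following is a minimal sketch, not part of the original analysis: `average_comments` and `sample_posts` are hypothetical names, and the sample rows are the first four rows from the data preview rather than real Ask HN or Show HN posts.

```python
def average_comments(posts, comments_index=4):
    """Return the mean comment count for rows that store it as a string."""
    if not posts:
        # avoid dividing by zero when a category has no posts
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# illustrative rows copied from the data preview (comment counts: 52, 10, 1, 1)
sample_posts = [
    ['12224879', 'Interactive Dynamic Video',
     'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'],
    ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time',
     'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
     '39', '10', 'josep2', '1/26/2016 19:30'],
    ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke",
     'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
     '2', '1', 'vezycash', '6/23/2016 22:20'],
    ['11919867', 'Technology ventures: From Idea to Enterprise',
     'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
     '3', '1', 'hswarna', '6/17/2016 0:01'],
]

sample_average = average_comments(sample_posts)
print(sample_average)  # 16.0
```

With the real lists, `average_comments(ask_posts)` and `average_comments(show_posts)` would stand in for the two loops.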


Ask HN posts receive **about four more comments** on average, or **36 percent more comments**, than Show HN posts on Hacker News.
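
As a quick check of those two figures, the arithmetic can be reproduced from the averages printed above. This is only a sketch; the variable names are illustrative.

```python
# averages copied from the output of the previous cell
avg_ask_comments = 14.038417431192661
avg_show_comments = 10.31669535283993

# absolute and relative gap between the two averages
difference = avg_ask_comments - avg_show_comments
percent_more = difference / avg_show_comments * 100

print(f"{difference:.2f} more comments, {percent_more:.0f}% more")
# 3.72 more comments, 36% more
```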

## When should I post?

So now I know that I should post an Ask HN! To find the hour that maximizes the number of comments I receive, I will:
* Create a list of lists with just the date/time information and number of comments for each post
* Create two dictionaries: one with the total number of comments per hour and one with the total number of posts per hour
* Divide the number of total comments by total posts for each hour and create a new list of lists with this information

In [5]:
import datetime as dt

# creating a list with just the relevant information for each post

result_list = []

for post in ask_posts:
    result_list.append([post[6], int(post[4])])

# creating dictionaries for the total number of comments and posts by hour

comments_by_hour = {}
counts_by_hour = {}

for row in result_list:
    date = row[0]
    comments = row[1]
    date = dt.datetime.strptime(date, "%m/%d/%Y %H:%M")
    hour = date.strftime("%H")
    
    if hour in counts_by_hour:
        comments_by_hour[hour] += comments
        counts_by_hour[hour] += 1
    else:
        comments_by_hour[hour] = comments
        counts_by_hour[hour] = 1
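
To see what the parsing step does to a single timestamp, here is a small sketch. It also shows one possible alternative for the tallying, using `collections.defaultdict` in place of the `if`/`else` branch. The names `hour_of`, `totals_by_hour`, and `sample_rows` are hypothetical, and the sample timestamps come from the data preview, not from the Ask HN subset.

```python
import datetime as dt
from collections import defaultdict


def hour_of(timestamp):
    """Return the zero-padded hour of a 'month/day/year hour:minute' string."""
    parsed = dt.datetime.strptime(timestamp, "%m/%d/%Y %H:%M")
    return parsed.strftime("%H")


def totals_by_hour(rows):
    """Tally total comments and post counts per hour for [timestamp, comments] rows."""
    comments = defaultdict(int)  # missing hours start at 0
    counts = defaultdict(int)
    for created_at, num_comments in rows:
        hour = hour_of(created_at)
        comments[hour] += num_comments
        counts[hour] += 1
    return dict(comments), dict(counts)


# illustrative [created_at, num_comments] pairs taken from the data preview
sample_rows = [["8/4/2016 11:52", 52], ["6/17/2016 0:01", 1], ["6/23/2016 0:20", 3]]

print(hour_of("8/4/2016 11:52"))  # 11
sample_comments, sample_counts = totals_by_hour(sample_rows)
print(sample_comments)  # {'11': 52, '00': 4}
print(sample_counts)    # {'11': 1, '00': 2}
```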

In [6]:
# creating a list with the new average number of comments per hour

avg_by_hour = []
for hour in comments_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour]/counts_by_hour[hour]])

avg_by_hour

[['02', 23.810344827586206],
 ['13', 14.741176470588234],
 ['06', 9.022727272727273],
 ['03', 7.796296296296297],
 ['16', 16.796296296296298],
 ['15', 38.5948275862069],
 ['11', 11.051724137931034],
 ['01', 11.383333333333333],
 ['07', 7.852941176470588],
 ['23', 7.985294117647059],
 ['19', 10.8],
 ['05', 10.08695652173913],
 ['10', 13.440677966101696],
 ['09', 5.5777777777777775],
 ['12', 9.41095890410959],
 ['17', 11.46],
 ['20', 21.525],
 ['22', 6.746478873239437],
 ['21', 16.009174311926607],
 ['08', 10.25],
 ['18', 13.20183486238532],
 ['14', 13.233644859813085],
 ['04', 7.170212765957447],
 ['00', 8.127272727272727]]

Since this output isn't in any particular order, I'll sort and format the results below.

`sorted()` compares lists element by element, so to order by number of comments with a plain `sorted()` call, the average needs to come first in each pair. The hour and average are therefore swapped below.

In [7]:
# switching and sorting in a new list

swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

sorted_swap = sorted(swap_avg_by_hour, reverse=True)

# formatting the information for easy readability

print("Top 5 Hours for 'Ask HN' Comments")
template = "{}: {:.2f} average comments per post"
for avg, hr in sorted_swap[0:5]:
    print(template.format(dt.datetime.strptime(hr, "%H").strftime("%H:%M"),avg))

Top 5 Hours for 'Ask HN' Comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
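
Swapping the columns is one way to get this ordering. As an alternative sketch, `sorted()` also accepts a `key` function, which allows sorting on the average without building a second list. The `sample_avg_by_hour` list below is a hand-copied subset of the averages shown earlier, used only for illustration.

```python
# a subset of the [hour, average] pairs from the earlier output
sample_avg_by_hour = [
    ['02', 23.810344827586206],
    ['13', 14.741176470588234],
    ['16', 16.796296296296298],
    ['15', 38.5948275862069],
    ['20', 21.525],
    ['21', 16.009174311926607],
]

# sort on the second element (the average), largest first, and keep five
top_five = sorted(sample_avg_by_hour, key=lambda pair: pair[1], reverse=True)[:5]

for hour, avg in top_five:
    print(f"{hour}:00: {avg:.2f} average comments per post")
```

On this subset the hours come out in the same order as above: 15, 02, 20, 16, 21.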


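The output above shows that Ask HN posts created at 15:00 receive the most comments on average. Assuming the `created_at` timestamps are in U.S. Eastern Time, the **best time for me (U.S. Central Time) to post** an Ask HN is 14:00, or **2 p.m.** According to the data, this is most likely to maximize the number of comments I receive.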
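
The conversion from the 15:00 in the output to my local 14:00 can be sketched as below. It rests on two assumptions that the data itself does not confirm: that the timestamps are in U.S. Eastern Time, and that Central Time is treated as a fixed one hour behind Eastern. The function name `eastern_hour_to_central` is hypothetical.

```python
# Assumption: dataset hours are U.S. Eastern Time and Central is one hour behind.
HOURS_BEHIND_EASTERN = 1


def eastern_hour_to_central(hour):
    """Convert a zero-padded Eastern hour string such as '15' to 'HH:00' Central."""
    central_hour = (int(hour) - HOURS_BEHIND_EASTERN) % 24  # wrap past midnight
    return f"{central_hour:02d}:00"


print(eastern_hour_to_central("15"))  # 14:00
print(eastern_hour_to_central("00"))  # 23:00
```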