# Hacker News Project

## An investigation of Hacker News post data to find the best time of day to post, so that a question I put to the Hacker News community gains the most traction.

### First, import `reader` and read the CSV file

The data set can be found here: https://www.kaggle.com/hacker-news/hacker-news-posts

In [1]:
from csv import reader

# Read every row into a list of lists; the with-block closes the file when done.
with open("hacker_news.csv", newline="") as opened_file:
    hn = list(reader(opened_file))


### Separate the header row from the data set and remove it from the main list

In [2]:
headers = hn[0]
print(headers)

hn = hn[1:]
print(hn[1:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


### Separate the "asking" posts and the "showing" posts (and case variations) into two different lists and check the number of posts for each.

In [3]:
ask_posts = []
show_posts = []
other_posts = []

print(hn[1])

for row in hn:
    title = row[1]
    lower_title = title.lower()
    
    if lower_title.startswith('ask hn'):
        ask_posts.append(row)
    elif lower_title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']
1744
1162
17194
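
The prefix check in the loop above can be factored into a small function, which makes the case-insensitive matching easy to verify in isolation. This is a minimal sketch: the name `classify_post` and the sample titles are hypothetical and do not come from the data set.

```python
def classify_post(title):
    """Return 'ask', 'show', or 'other' based on the title prefix (case-insensitive)."""
    lower_title = title.lower()
    if lower_title.startswith("ask hn"):
        return "ask"
    if lower_title.startswith("show hn"):
        return "show"
    return "other"


# Hypothetical titles used only to exercise the function.
sample_titles = ["Ask HN: How do you learn?", "SHOW HN: My side project", "A regular link"]
labels = [classify_post(t) for t in sample_titles]
print(labels)
```

Applied to the real data, the same function could replace the `if`/`elif`/`else` chain, with each row appended to the list that matches its label.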


### Determine if ask posts or show posts receive more comments on average.

In [4]:
total_ask_comments = 0

for row in ask_posts:
    num_comms = row[4]
    num_comms_int = int(num_comms)
    total_ask_comments = total_ask_comments + num_comms_int
    
avg_ask_comments = round(total_ask_comments / len(ask_posts),2)
print(avg_ask_comments)

14.04


In [5]:
total_show_comments = 0

for row in show_posts:
    num_comms = row[4]
    num_comms_int = int(num_comms)
    total_show_comments = total_show_comments + num_comms_int
    
avg_show_comments = round(total_show_comments / len(show_posts),2)
print(avg_show_comments)

10.32


Ask posts receive more comments on average than show posts (14.04 vs. 10.32).
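
The two cells above repeat the same loop for each list. One way to avoid the duplication is a helper that takes any list of rows. This is a sketch only: `average_comments` and the sample rows are hypothetical, and the helper assumes the comment count sits at index 4, as it does in this data set.

```python
def average_comments(posts, comments_index=4):
    """Average of the comment counts, which are stored as strings in each row."""
    if not posts:
        return 0.0  # avoid ZeroDivisionError on an empty list
    total = sum(int(row[comments_index]) for row in posts)
    return round(total / len(posts), 2)


# Hypothetical rows in the same column order as the data set.
sample_posts = [
    ["1", "Ask HN: A", "", "5", "10", "alice", "1/1/2016 10:00"],
    ["2", "Ask HN: B", "", "3", "4", "bob", "1/1/2016 11:00"],
]
sample_avg = average_comments(sample_posts)
print(sample_avg)
```

With the real lists, `average_comments(ask_posts)` and `average_comments(show_posts)` would stand in for cells 4 and 5.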


### 1. Calculate the number of ask posts created in each hour of the day, along with the number of comments received.

In [6]:
import datetime as dt

result_list = []

for row in ask_posts:
    created_at = row[6]
    num_comms_int = int(row[4])
    result = (created_at,num_comms_int)
    result_list.append(result)
    
print(result_list[:5])

[('8/16/2016 9:55', 6), ('11/22/2015 13:43', 29), ('5/2/2016 10:14', 1), ('8/2/2016 14:20', 3), ('10/15/2015 16:38', 17)]


In [7]:
counts_by_hour = {}
comments_by_hour = {}


for row in result_list:
    time_stamp = row[0]
    dt_obj = dt.datetime.strptime(time_stamp, "%m/%d/%Y %H:%M")
    hour = dt_obj.hour
    
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]
        

print(counts_by_hour)
print('              ')
print(comments_by_hour)
    

    

{0: 55, 1: 60, 2: 58, 3: 54, 4: 47, 5: 46, 6: 44, 7: 34, 8: 48, 9: 45, 10: 59, 11: 58, 12: 73, 13: 85, 14: 107, 15: 116, 16: 108, 17: 100, 18: 109, 19: 110, 20: 80, 21: 109, 22: 71, 23: 68}
              
{0: 447, 1: 683, 2: 1381, 3: 421, 4: 337, 5: 464, 6: 397, 7: 267, 8: 492, 9: 251, 10: 793, 11: 641, 12: 687, 13: 1253, 14: 1416, 15: 4477, 16: 1814, 17: 1146, 18: 1439, 19: 1188, 20: 1722, 21: 1745, 22: 479, 23: 543}
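
The same tally can be written without the `if`/`else` branch by using `collections.defaultdict`, which starts every missing hour at zero. This is a sketch, not the notebook's method; the function name `aggregate_by_hour` is hypothetical, and the sample pairs are the first five entries of `result_list` printed earlier.

```python
import datetime as dt
from collections import defaultdict


def aggregate_by_hour(pairs, fmt="%m/%d/%Y %H:%M"):
    """Return (counts, comments) dicts keyed by the hour of each timestamp."""
    counts = defaultdict(int)
    comments = defaultdict(int)
    for time_stamp, num_comments in pairs:
        hour = dt.datetime.strptime(time_stamp, fmt).hour
        counts[hour] += 1
        comments[hour] += num_comments
    return dict(counts), dict(comments)


# The first five (created_at, comments) pairs printed earlier in the notebook.
sample = [
    ("8/16/2016 9:55", 6),
    ("11/22/2015 13:43", 29),
    ("5/2/2016 10:14", 1),
    ("8/2/2016 14:20", 3),
    ("10/15/2015 16:38", 17),
]
sample_counts, sample_comments = aggregate_by_hour(sample)
print(sample_counts)
print(sample_comments)
```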


### 2. Calculate the average number of comments ask posts receive by hour created.

In [8]:
ave_comm_by_hour = []

for hour in counts_by_hour:
    ave_comm_by_hour.append([hour,comments_by_hour[hour]/counts_by_hour[hour]])
    
for hour in ave_comm_by_hour:
    ave = hour[1]
    ave = round(ave,2)
    hour[1] = ave
    
print(ave_comm_by_hour)

[[0, 8.13], [1, 11.38], [2, 23.81], [3, 7.8], [4, 7.17], [5, 10.09], [6, 9.02], [7, 7.85], [8, 10.25], [9, 5.58], [10, 13.44], [11, 11.05], [12, 9.41], [13, 14.74], [14, 13.23], [15, 38.59], [16, 16.8], [17, 11.46], [18, 13.2], [19, 10.8], [20, 21.52], [21, 16.01], [22, 6.75], [23, 7.99]]


### Sort the list of lists and print the five highest values in a format that's easier to read.

In [9]:
flipped_acbh = []

for x in ave_comm_by_hour:
    ave = x[1]
    hr = x[0]
    flip = [ave,hr]
    flipped_acbh.append(flip)


print(flipped_acbh)
print('      ')

sorted_ave = sorted(flipped_acbh,reverse=True)
print(sorted_ave)
print('      ')


print("Top 5 Hours for Ask Posts Comments")

for item in sorted_ave[0:5]:
    av = item[0]
    hr = item[1]
    print("{hour}: {average} average comments per post".format(hour = hr, average = av))
    
    
    

[[8.13, 0], [11.38, 1], [23.81, 2], [7.8, 3], [7.17, 4], [10.09, 5], [9.02, 6], [7.85, 7], [10.25, 8], [5.58, 9], [13.44, 10], [11.05, 11], [9.41, 12], [14.74, 13], [13.23, 14], [38.59, 15], [16.8, 16], [11.46, 17], [13.2, 18], [10.8, 19], [21.52, 20], [16.01, 21], [6.75, 22], [7.99, 23]]
      
[[38.59, 15], [23.81, 2], [21.52, 20], [16.8, 16], [16.01, 21], [14.74, 13], [13.44, 10], [13.23, 14], [13.2, 18], [11.46, 17], [11.38, 1], [11.05, 11], [10.8, 19], [10.25, 8], [10.09, 5], [9.41, 12], [9.02, 6], [8.13, 0], [7.99, 23], [7.85, 7], [7.8, 3], [7.17, 4], [6.75, 22], [5.58, 9]]
      
Top 5 Hours for Ask Posts Comments
15: 38.59 average comments per post
2: 23.81 average comments per post
20: 21.52 average comments per post
16: 16.8 average comments per post
21: 16.01 average comments per post
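
Flipping each pair is one way to sort by the average. An alternative sketch passes a `key` function to `sorted`, which leaves the `[hour, average]` order intact. The variable names below are hypothetical, and the sample pairs are a subset copied from the output above. Note that ties are handled differently: the flipped-list sort breaks them by hour, while a `key` sort keeps the original order.

```python
# A few [hour, average] pairs copied from the output above.
sample_averages = [[0, 8.13], [2, 23.81], [15, 38.59], [16, 16.8], [20, 21.52], [21, 16.01]]

# Sort by the average (index 1), highest first, without building a flipped copy.
top_five = sorted(sample_averages, key=lambda pair: pair[1], reverse=True)[:5]

lines = []
for hour, average in top_five:
    # Zero-pad the hour and show two decimals so the rows line up.
    lines.append(f"{hour:02d}:00: {average:.2f} average comments per post")
print("\n".join(lines))
```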


## Ask posts created between 3pm and 4pm EST receive the most comments on average. This corresponds to 8pm to 9pm GMT (7pm to 8pm GMT when daylight saving time, EDT, is in effect), so any questions of my own should be posted then.
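
As a check on the time-zone arithmetic, the sketch below converts hour 15 to UTC (which matches GMT) using fixed offsets from the standard library. It assumes `created_at` is recorded in US Eastern time, as the conclusion states; the helper name `eastern_hour_to_utc` is hypothetical.

```python
import datetime as dt

# Assumption: created_at is US Eastern time, as the conclusion states.
EST = dt.timezone(dt.timedelta(hours=-5), "EST")  # standard time
EDT = dt.timezone(dt.timedelta(hours=-4), "EDT")  # daylight saving time


def eastern_hour_to_utc(hour, tz):
    """Convert an hour of the day in a fixed-offset zone to the UTC hour."""
    # The date is arbitrary; only the hour matters for a fixed offset.
    local = dt.datetime(2016, 1, 26, hour, 0, tzinfo=tz)
    return local.astimezone(dt.timezone.utc).hour


print(eastern_hour_to_utc(15, EST))
print(eastern_hour_to_utc(15, EDT))
```

Fixed offsets keep the example free of any time-zone database dependency; for real scheduling, a zone-aware library that tracks daylight saving transitions would be the safer choice.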