# Hacker News Project

## Introduction

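This project analyzes posts from Hacker News, a website run by the startup incubator Y Combinator. Users submit stories to the site and receive votes and comments on them, much like on Reddit.

Hacker News is extremely popular in technology and startup circles. The analysis below compares Ask HN and Show HN posts by their average number of comments, then looks at which hours of the day Ask HN posts attract the most comments.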

In [1]:
from csv import reader
import datetime as dt

In [2]:
# Read the whole file into a list of rows; the with block closes the file afterwards
with open('hacker_news.csv') as f:
    hn = list(reader(f))
hn[0:4]

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
 ['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20']]

## Removing Headers from a List

In [3]:
headers = hn[0]
del hn[0]
print(headers)
print(hn[0:4])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]
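The header row can also be split off in a single step with sequence unpacking. The following is a minimal sketch rather than the notebook's own method; `sample_csv` is a hypothetical in-memory stand-in for `hacker_news.csv`, built from two rows shown above.

```python
from csv import reader
from io import StringIO

# Hypothetical in-memory stand-in for hacker_news.csv (header plus two rows).
sample_csv = StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,"
    "http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52\n"
    "11964716,Florida DJs May Face Felony for April Fools' Water Joke,"
    "http://www.thewire.com/entertainment/2013/04/"
    "florida-djs-april-fools-water-joke/63798/,2,1,vezycash,6/23/2016 22:20\n"
)

# The first row goes to `headers`; every remaining row is collected in `rows`.
headers, *rows = reader(sample_csv)

print(headers)
print(len(rows))
```

Unlike `del hn[0]`, this form cannot remove a data row by accident if the cell is run twice.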


## Extracting Ask HN and Show HN Posts

In [4]:
ask_posts, show_posts, other_posts = [], [], []

In [5]:
for row in hn:
    title = row[1].lower()
    if title.startswith('ask hn:'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

In [6]:
len(hn) == len(ask_posts) + len(show_posts) + len(other_posts)

True

In [7]:
print(len(hn), len(ask_posts), len(show_posts), len(other_posts))

20100 1738 1162 17200
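The prefix test can be pulled into a small helper so the rule is easy to check on individual titles. This is a sketch only; `classify_title` is a hypothetical name, and the prefixes (`'ask hn:'` with a colon, `'show hn'` without) are assumed to mirror the ones used in the loop above.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title's prefix."""
    lowered = title.lower()
    if lowered.startswith('ask hn:'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'


# Titles taken from the sample rows shown in this notebook.
titles = [
    'Ask HN: How to improve my personal website?',
    'Show HN: Something pointless I made',
    'Interactive Dynamic Video',
]
labels = [classify_title(t) for t in titles]
print(labels)
```

Because the Ask HN prefix includes the colon, a title such as `'Ask HN - ...'` would fall into the `'other'` group under this rule.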


In [8]:
ask_posts[0:3]

[['12296411',
  'Ask HN: How to improve my personal website?',
  '',
  '2',
  '6',
  'ahmedbaracat',
  '8/16/2016 9:55'],
 ['10610020',
  'Ask HN: Am I the only one outraged by Twitter shutting down share counts?',
  '',
  '28',
  '29',
  'tkfx',
  '11/22/2015 13:43'],
 ['11610310',
  'Ask HN: Aby recent changes to CSS that broke mobile?',
  '',
  '1',
  '1',
  'polskibus',
  '5/2/2016 10:14']]

## Calculating the Average Number of Comments for Ask HN and Show HN Posts

In [9]:
total_ask_comments = 0
for row in ask_posts:
    total_ask_comments += int(row[4])
print(f'Total number of Ask HN comments: {total_ask_comments}')

Total number of Ask HN comments: 24448


In [10]:
avg_ask_comments = total_ask_comments/len(ask_posts)
print(f'Average number of comments per Ask HN post: {avg_ask_comments}')

Average number of comments per Ask HN post: 14.06674338319908


In [11]:
show_posts[0:3]

[['10627194',
  'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform',
  'https://iot.seeed.cc',
  '26',
  '22',
  'kfihihc',
  '11/25/2015 14:03'],
 ['10646440',
  'Show HN: Something pointless I made',
  'http://dn.ht/picklecat/',
  '747',
  '102',
  'dhotson',
  '11/29/2015 22:46'],
 ['11590768',
  'Show HN: Shanhu.io, a programming playground powered by e8vm',
  'https://shanhu.io',
  '1',
  '1',
  'h8liu',
  '4/28/2016 18:05']]

In [12]:
total_show_comments = 0
for row in show_posts:
    total_show_comments += int(row[4])
print(f'Total number of Show HN comments: {total_show_comments}')

Total number of Show HN comments: 11988


In [13]:
avg_show_comments = total_show_comments/len(show_posts)
print(f'Average number of comments per Show HN post: {avg_show_comments}')

Average number of comments per Show HN post: 10.31669535283993


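On average, Ask HN posts receive more comments than Show HN posts (about 14.07 versus 10.32 per post), so the rest of the analysis focuses on Ask HN posts.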
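The total-then-divide steps for the two groups follow the same pattern, so they could be folded into one function. A minimal sketch, assuming the comment count sits at index 4 as in this dataset; `average_comments` is a hypothetical helper, not part of the notebook.

```python
def average_comments(posts, comments_index=4):
    """Average the comment counts of `posts`; return 0.0 for an empty list."""
    if not posts:
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# The first three Ask HN rows shown earlier (6, 29, and 1 comments).
sample_ask = [
    ['12296411', 'Ask HN: How to improve my personal website?', '',
     '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020',
     'Ask HN: Am I the only one outraged by Twitter shutting down share counts?',
     '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '',
     '1', '1', 'polskibus', '5/2/2016 10:14'],
]
print(average_comments(sample_ask))  # (6 + 29 + 1) / 3 = 12.0
```

The empty-list guard avoids a `ZeroDivisionError` if a group happens to contain no posts.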

## Finding the Number of Ask Posts and Comments by Hour Created

In [14]:
result_list = []
for row in ask_posts:
    num_comments = int(row[4])
    result_list.append([row[6], num_comments])
    
counts_by_hour, comments_by_hour = {}, {}
for row in result_list:
    time = dt.datetime.strptime(row[0], '%m/%d/%Y %H:%M')
    if time.hour not in counts_by_hour:
        counts_by_hour[time.hour] = 1
        comments_by_hour[time.hour] = row[1]
    else: 
        counts_by_hour[time.hour] += 1
        comments_by_hour[time.hour] += row[1]
print(f'counts by hour is: {counts_by_hour}\n')
print(f'comments by hour is: {comments_by_hour}')

counts by hour is: {9: 45, 13: 85, 10: 59, 14: 107, 16: 106, 23: 68, 12: 73, 17: 99, 15: 116, 21: 109, 20: 80, 2: 58, 18: 109, 3: 54, 5: 46, 19: 109, 1: 59, 22: 71, 8: 48, 4: 47, 0: 54, 6: 44, 7: 34, 11: 58}

comments by hour is: {9: 251, 13: 1253, 10: 793, 14: 1416, 16: 1811, 23: 543, 12: 687, 17: 1143, 15: 4477, 21: 1745, 20: 1722, 2: 1381, 18: 1439, 3: 421, 5: 464, 19: 1184, 1: 662, 22: 479, 8: 492, 4: 337, 0: 443, 6: 397, 7: 267, 11: 641}
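The "first time seen or not" branching in the loop above can be avoided with `collections.defaultdict`, which starts every missing key at zero. This is a sketch of that alternative on a tiny sample; the fourth row is hypothetical and is included only so that one hour appears twice.

```python
import datetime as dt
from collections import defaultdict

# [created_at, num_comments] pairs; the first three come from the Ask HN rows
# shown earlier, the last one is a hypothetical extra row in the 13:00 hour.
sample = [
    ['8/16/2016 9:55', 6],
    ['11/22/2015 13:43', 29],
    ['5/2/2016 10:14', 1],
    ['5/3/2016 13:05', 4],
]

counts_by_hour = defaultdict(int)    # number of posts created in each hour
comments_by_hour = defaultdict(int)  # total comments on those posts

for created_at, num_comments in sample:
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').hour
    counts_by_hour[hour] += 1
    comments_by_hour[hour] += num_comments

print(dict(counts_by_hour))
print(dict(comments_by_hour))
```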


## Calculating the Average Number of Comments for Ask HN Posts by Hour

In [15]:
avg_by_hour = []
for hour in comments_by_hour:
    avg_by_hour.append([str(hour), (comments_by_hour[hour]/counts_by_hour[hour])])
avg_by_hour

[['9', 5.5777777777777775],
 ['13', 14.741176470588234],
 ['10', 13.440677966101696],
 ['14', 13.233644859813085],
 ['16', 17.08490566037736],
 ['23', 7.985294117647059],
 ['12', 9.41095890410959],
 ['17', 11.545454545454545],
 ['15', 38.5948275862069],
 ['21', 16.009174311926607],
 ['20', 21.525],
 ['2', 23.810344827586206],
 ['18', 13.20183486238532],
 ['3', 7.796296296296297],
 ['5', 10.08695652173913],
 ['19', 10.862385321100918],
 ['1', 11.220338983050848],
 ['22', 6.746478873239437],
 ['8', 10.25],
 ['4', 7.170212765957447],
 ['0', 8.203703703703704],
 ['6', 9.022727272727273],
 ['7', 7.852941176470588],
 ['11', 11.051724137931034]]
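The same per-hour averages can be kept in a dictionary keyed by hour, which makes single lookups straightforward. A minimal sketch using three of the hourly totals printed above; the variable name `avg_comments_by_hour` is an assumption for illustration.

```python
# Three entries copied from the counts and comment totals printed earlier.
counts_by_hour = {15: 116, 2: 58, 9: 45}
comments_by_hour = {15: 4477, 2: 1381, 9: 251}

# Divide total comments by post count for each hour.
avg_comments_by_hour = {
    hour: comments_by_hour[hour] / counts_by_hour[hour]
    for hour in comments_by_hour
}

print(round(avg_comments_by_hour[15], 2))
```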

We now have the averages we need, but in this format it is hard to identify the hours with the highest values. Swapping the two columns puts the average first, so the list can then be sorted by it.

In [16]:
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])
swap_avg_by_hour

[[5.5777777777777775, '9'],
 [14.741176470588234, '13'],
 [13.440677966101696, '10'],
 [13.233644859813085, '14'],
 [17.08490566037736, '16'],
 [7.985294117647059, '23'],
 [9.41095890410959, '12'],
 [11.545454545454545, '17'],
 [38.5948275862069, '15'],
 [16.009174311926607, '21'],
 [21.525, '20'],
 [23.810344827586206, '2'],
 [13.20183486238532, '18'],
 [7.796296296296297, '3'],
 [10.08695652173913, '5'],
 [10.862385321100918, '19'],
 [11.220338983050848, '1'],
 [6.746478873239437, '22'],
 [10.25, '8'],
 [7.170212765957447, '4'],
 [8.203703703703704, '0'],
 [9.022727272727273, '6'],
 [7.852941176470588, '7'],
 [11.051724137931034, '11']]

## Sorting and Printing Values from a List of Lists

In [17]:
sorted_swap = sorted(swap_avg_by_hour, reverse=True)
sorted_swap

[[38.5948275862069, '15'],
 [23.810344827586206, '2'],
 [21.525, '20'],
 [17.08490566037736, '16'],
 [16.009174311926607, '21'],
 [14.741176470588234, '13'],
 [13.440677966101696, '10'],
 [13.233644859813085, '14'],
 [13.20183486238532, '18'],
 [11.545454545454545, '17'],
 [11.220338983050848, '1'],
 [11.051724137931034, '11'],
 [10.862385321100918, '19'],
 [10.25, '8'],
 [10.08695652173913, '5'],
 [9.41095890410959, '12'],
 [9.022727272727273, '6'],
 [8.203703703703704, '0'],
 [7.985294117647059, '23'],
 [7.852941176470588, '7'],
 [7.796296296296297, '3'],
 [7.170212765957447, '4'],
 [6.746478873239437, '22'],
 [5.5777777777777775, '9']]
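Swapping the columns is one way to sort by the average; another is to pass a `key` function to `sorted`, which leaves the `[hour, average]` layout untouched. A short sketch on three of the rows from `avg_by_hour`:

```python
# Three [hour, average] rows copied from avg_by_hour above.
avg_by_hour = [
    ['9', 5.5777777777777775],
    ['15', 38.5948275862069],
    ['2', 23.810344827586206],
]

# Sort on the second element (the average), highest first.
sorted_by_avg = sorted(avg_by_hour, key=lambda row: row[1], reverse=True)

print([row[0] for row in sorted_by_avg])
```

With `key`, ties between equal averages keep their original relative order, whereas sorting the swapped lists would fall back to comparing the hour strings.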

In [18]:
print("Top 5 Hours for Ask HN Comments")
for row in sorted_swap[0:5]:
    # row[0] is the average, row[1] is the hour as a string
    print(f'{row[1]}:00: {row[0]:.2f} average comments per post')

Top 5 Hours for Ask HN Comments
15:00: 38.59 average comments per post
2:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 17.08 average comments per post
21:00: 16.01 average comments per post
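The hour strings can also be passed through `datetime` so that every hour is printed with two digits. This is a sketch under the assumption that the rows keep the `[average, hour]` layout of `sorted_swap`; the two sample rows are copied from its top entries.

```python
import datetime as dt

# Top two [average, hour] rows copied from sorted_swap above.
top_rows = [
    [38.5948275862069, '15'],
    [23.810344827586206, '2'],
]

lines = []
for avg, hour in top_rows:
    # Parse the hour string, then format it as a zero-padded HH:MM time.
    hour_label = dt.datetime.strptime(hour, '%H').strftime('%H:%M')
    lines.append(f'{hour_label}: {avg:.2f} average comments per post')

print('\n'.join(lines))
```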


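According to these results, Ask HN posts created during the 3 PM, 2 AM, 8 PM, 4 PM, and 9 PM hours received the most comments on average, with 3 PM well ahead at about 38.59 comments per post. Posting an Ask HN question during one of these hours may therefore give it a better chance of receiving comments.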