Hacker News is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") are voted and commented upon, similar to Reddit. Posts that make it to the top of the Hacker News listings can get hundreds of thousands of visitors as a result.

This project analyzes two kinds of posts: those whose titles begin with Ask HN and those whose titles begin with Show HN. Users submit Ask HN posts to ask the Hacker News community a specific question. Likewise, users submit Show HN posts to show the community a project, a product, or simply something interesting.

The two types of posts are compared to answer two questions: do Ask HN or Show HN posts receive more comments on average, and do posts created at a certain time of day receive more comments on average?

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
from csv import reader

# Read the whole CSV into a list of rows; the with block closes the file afterwards
with open('../input/hacker-news-posts/HN_posts_year_to_Sep_26_2016.csv') as opened_file:
    read_file = reader(opened_file)
    hn = list(read_file)
hn[:5]

In [None]:
# Separate the header row from the data rows
headers = hn[0]
hn = hn[1:]
print(headers)
print(hn[:5])

### **Extracting Ask HN and Show HN Posts**

In [None]:
ask_posts = []
show_posts = []
other_posts = []
for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

print('No. of posts starting with ask hn: ', len(ask_posts))
print('No. of posts starting with show hn: ', len(show_posts))
print('Other posts: ', len(other_posts))

### **Which posts receive more comments?**

In [None]:
total_ask_comments = 0
for row in ask_posts:
    num_comments = int(row[4])
    total_ask_comments += num_comments
avg_ask_comments = total_ask_comments/len(ask_posts)
print('Average no. of comments for ask_posts: ',avg_ask_comments)

In [None]:
total_show_comments = 0
for row in show_posts:
    num_comments = int(row[4])
    total_show_comments += num_comments
avg_show_comments = total_show_comments/len(show_posts)
print('Average no. of comments for show_posts: ',avg_show_comments)

On average, Ask HN posts receive about 10 comments per post, while Show HN posts receive only about 4.
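The two averaging loops above follow the same pattern, so they could be folded into one helper. The following is a minimal sketch, not part of the original analysis: the function name `average_comments` and the sample rows are hypothetical, and it assumes the comment count sits at index 4, as in the cells above.

```python
def average_comments(posts, comments_index=4):
    """Return the mean comment count of `posts`, or 0.0 for an empty list."""
    if not posts:
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)

# Hypothetical sample rows in the same column layout as the dataset
# (id, title, url, num_points, num_comments, author, created_at)
sample_posts = [
    ['1', 'Ask HN: Example question?', '', '5', '10', 'user_a', '9/26/2016 3:26'],
    ['2', 'Ask HN: Another question?', '', '2', '4', 'user_b', '9/26/2016 15:10'],
]
print(average_comments(sample_posts))
```

With the real data, the same helper could be called as `average_comments(ask_posts)` and `average_comments(show_posts)`. Returning 0.0 for an empty list is a design choice that avoids a `ZeroDivisionError` when no titles match.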

### **Finding the number of Ask Posts and Comments by Hour Created**

In [None]:
import datetime as dt
result_list = []
for row in ask_posts:
    created_at = row[6]
    num_comments = int(row[4])
    new_list = [created_at, num_comments]
    result_list.append(new_list)
    
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    date_1 = dt.datetime.strptime(row[0], '%m/%d/%Y %H:%M')
    hour_1 = date_1.strftime('%H')
    if hour_1 not in counts_by_hour:
        counts_by_hour[hour_1] = 1
        comments_by_hour[hour_1] = row[1]
    else:
        counts_by_hour[hour_1] += 1
        comments_by_hour[hour_1] += row[1]

### **Average no. of comments for Ask HN posts by hour**

In [None]:
avg_by_hour = []
for hr in counts_by_hour:
    avg = comments_by_hour[hr]/counts_by_hour[hr]
    avg_by_hour.append([hr,avg])
print(avg_by_hour)    
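The counting and averaging steps above can also be sketched as a single function built on `collections.defaultdict`, which removes the need for the `if hour_1 not in counts_by_hour` branch. This is an illustrative alternative rather than the notebook's method; the function name `average_comments_by_hour` and the sample rows are hypothetical, and the date format is assumed to match the one used above.

```python
import datetime as dt
from collections import defaultdict

def average_comments_by_hour(rows, date_format='%m/%d/%Y %H:%M'):
    """Map each hour ('00' to '23') to the mean comment count of its posts.

    `rows` is a list of [created_at, num_comments] pairs, like result_list.
    """
    counts = defaultdict(int)    # posts created in each hour
    comments = defaultdict(int)  # total comments on posts created in each hour
    for created_at, num_comments in rows:
        hour = dt.datetime.strptime(created_at, date_format).strftime('%H')
        counts[hour] += 1
        comments[hour] += num_comments
    return {hour: comments[hour] / counts[hour] for hour in counts}

# Hypothetical sample rows, not taken from the dataset
sample_rows = [['9/26/2016 3:26', 10], ['9/26/2016 3:50', 4], ['9/25/2016 15:10', 6]]
hourly_averages = average_comments_by_hour(sample_rows)
print(hourly_averages)
```

With the real data, the function could be called as `average_comments_by_hour(result_list)`.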

In [None]:
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])
print(swap_avg_by_hour)
sorted_swap = sorted(swap_avg_by_hour, reverse = True)
print("\nTop 5 Hours for Ask HN Post Comments\n")
for row in sorted_swap[:5]:
    hour_2 = dt.datetime.strptime(row[1], '%H')
    hour_2 = hour_2.strftime('%H:%M')
    avg = '{:.2f}'.format(row[0])
    print("{}: {} average comments per post ".format(hour_2, avg))
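The swap step above exists only so that `sorted` compares the averages first. A possible alternative, sketched here with hypothetical sample values rather than the notebook's actual results, is to pass a `key` function to `sorted` and leave the `[hour, average]` pairs in their original order.

```python
# Hypothetical [hour, average] pairs in the same shape as avg_by_hour
avg_by_hour_sample = [['09', 5.5], ['15', 38.6], ['02', 23.8]]

# Sort by the average (second element) in descending order, no swapping needed
top_hours = sorted(avg_by_hour_sample, key=lambda pair: pair[1], reverse=True)

for hour, avg in top_hours[:2]:
    print('{}:00: {:.2f} average comments per post'.format(hour, avg))
```

With the real data, `avg_by_hour` would take the place of `avg_by_hour_sample`, and the slice would be `[:5]`.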

### **Result**
From the analysis above, Ask HN posts created at 15:00 (3:00 pm) receive the most comments per post on average, so posting at that hour gives the best chance of receiving comments.