# Analysing the number of posts and questions on Hacker News

* Hacker News, like Reddit, is an online platform where users submit stories and other posts that are commented on and rated. Posts rated highly enough to reach the top of Hacker News can attract hundreds of thousands of visitors.

* The project's objective is to determine whether Ask HN or Show HN posts receive more comments on average. The project will also determine at what times of day posts receive the most comments on average. Ask HN submissions are used to ask the Hacker News community questions, while Show HN submissions are used to share projects, products or other interesting information. The project's data set is available [here](https://www.kaggle.com/hacker-news/hacker-news-posts).

In [1]:
# opening the data file and reading it into a list of lists
from csv import reader

# the with block closes the file once it has been read
with open('hacker_news.csv') as open_data:
    hn = list(reader(open_data))
print(hn[:5])


[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


In [2]:
# removing the header row of the data set
headers = hn[0]
hn = hn[1:]
print(headers)
print()
print(hn[:5])


['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


In [3]:
# separating ask posts, show posts and other posts
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

print('Number of ask_posts: ', len(ask_posts))
print('Number of show_posts: ', len(show_posts))
print('Number of other_posts: ', len(other_posts))
        


Number of ask_posts:  1744
Number of show_posts:  1162
Number of other_posts:  17194


In [11]:
# finding the average number of comments for ask and show posts
total_ask_comments = 0
for row in ask_posts:
    num_comments = int(row[4])
    total_ask_comments += num_comments
avg_ask_comments = total_ask_comments / len(ask_posts)
print('Average number of ask post comments: ', avg_ask_comments)


total_show_comments = 0
for row in show_posts:
    num_comments = int(row[4])
    total_show_comments += num_comments
avg_show_comments = total_show_comments / len(show_posts)
print('Average number of show post comments: ', avg_show_comments)

Average number of ask post comments:  14.038417431192661
Average number of show post comments:  10.31669535283993


On average, ask posts receive more comments than show posts: about 14 comments per post versus about 10. Because each figure is a per-post average (total comments divided by the number of posts), the gap is not simply a consequence of the data set containing more ask posts than show posts.
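The two loops above perform the same calculation, so it could be factored into one small helper. The following is a minimal sketch under some assumptions: the function name `average_comments`, its `comments_index` parameter, and the two sample rows are hypothetical and are not taken from the data set.

```python
def average_comments(posts, comments_index=4):
    """Return the mean number of comments per post, or 0.0 for an empty list."""
    if not posts:
        return 0.0
    # the comment counts are stored as strings, so convert before summing
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# hypothetical rows in the same column order as the data set
sample_posts = [
    ['1', 'Ask HN: example one', '', '5', '10', 'user_a', '8/4/2016 11:52'],
    ['2', 'Ask HN: example two', '', '3', '4', 'user_b', '8/4/2016 12:10'],
]
sample_average = average_comments(sample_posts)
print('Average number of comments: ', sample_average)
```

With this helper, `average_comments(ask_posts)` and `average_comments(show_posts)` should give the same figures as the two loops.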

In [21]:
# counting ask posts and totalling their comments by hour of creation
import datetime as dt

# pair each ask post's creation time with its number of comments
result_list = []
for row in ask_posts:
    result_list.append([row[6], int(row[4])])

counts_by_hour = {}
comments_by_hour = {}
date_format = "%m/%d/%Y %H:%M"

for row in result_list:
    create = row[0]
    comment = row[1]
    date = dt.datetime.strptime(create, date_format)
    time = date.strftime("%H")
    if time not in counts_by_hour:
        counts_by_hour[time] = 1
        comments_by_hour[time] = comment
    else:
        counts_by_hour[time] += 1
        comments_by_hour[time] += comment
        
# only the last expression in a cell is displayed, so this shows the comment totals
comments_by_hour

    

{'00': 447,
 '01': 683,
 '02': 1381,
 '03': 421,
 '04': 337,
 '05': 464,
 '06': 397,
 '07': 267,
 '08': 492,
 '09': 251,
 '10': 793,
 '11': 641,
 '12': 687,
 '13': 1253,
 '14': 1416,
 '15': 4477,
 '16': 1814,
 '17': 1146,
 '18': 1439,
 '19': 1188,
 '20': 1722,
 '21': 1745,
 '22': 479,
 '23': 543}
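
The dictionary above holds the total number of comments per hour, not the average. To compare hours fairly, each total has to be divided by the number of posts created in that hour. The following is a minimal sketch of that step, assuming two dictionaries shaped like `comments_by_hour` and `counts_by_hour`. The function name `average_by_hour` and the sample values are hypothetical and are not results from the data set.

```python
def average_by_hour(comments_by_hour, counts_by_hour):
    """Return [hour, average comments per post] pairs, highest average first."""
    averages = [
        [hour, comments_by_hour[hour] / counts_by_hour[hour]]
        for hour in comments_by_hour
    ]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)


# made-up totals and post counts, keyed by two-digit hour strings
sample_comments = {'15': 30, '02': 8, '09': 5}
sample_counts = {'15': 3, '02': 2, '09': 5}

ranked = average_by_hour(sample_comments, sample_counts)
for hour, average in ranked:
    print('{}:00  {:.2f} average comments per post'.format(hour, average))
```

Calling `average_by_hour(comments_by_hour, counts_by_hour)` on the real dictionaries would rank the hours by average comments per ask post.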