# Exploring Hacker News Posts
### Hemanth Soni, June 2020

---

## Introduction and Overview

The goal of this project is to analyze a dataset of posts from [Hacker News](https://news.ycombinator.com/) to answer two questions:
1. Do Ask HN or Show HN receive more comments on average?
2. Do posts created at a certain time receive more comments on average?

## Importing data

I will start by importing the data: a subset of the [full data set from Kaggle](https://www.kaggle.com/hacker-news/hacker-news-posts). The Dataquest team simplified this subset by removing all submissions that didn't receive any comments and then randomly sampling the remaining submissions down to a more manageable 20K rows (the original dataset has ~300K).

In [12]:
# Open the CSV file and read every row into a list of lists
from csv import reader

# The with block closes the file once reading is done
with open('hacker_news_posts/hacker_news.csv') as opened_file:
    read_file = reader(opened_file)
    hn = list(read_file)

# Split the header row from the data rows
headers = hn[0]
hn = hn[1:]

## Splitting lists into sub-lists

I'll now split out the list into three separate groups: one for Ask posts, one for Show posts, and one for everything else. This will make it easier to conduct the analysis on each type of post and compare them.

In [14]:
ask_posts = []
show_posts = []
other_posts = []

for each in hn:
    title = each[1]
    
    if title.lower().startswith('ask hn'):
        ask_posts.append(each)
    elif title.lower().startswith('show hn'):
        show_posts.append(each)
    else:
        other_posts.append(each)
        
print('Total Ask posts:',len(ask_posts))
print('Total Show posts:',len(show_posts))
print('Other posts:',len(other_posts))

Total Ask posts: 1744
Total Show posts: 1162
Other posts: 17194


## Calculating average comments per type of post

Now that each type of post is in its own list, I can determine whether Ask or Show posts receive more comments on average. The function below totals the comments for a list of posts and divides by the number of posts.

In [20]:
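def commentCalc(dataset, index):
    """Print the total and average number of comments for a list of posts.

    dataset: list of rows (each row is a list of strings)
    index: position of the comment count within each row
    """
    total_comments = 0

    # Comment counts are read from the CSV as strings, so convert before summing
    for each in dataset:
        comments = int(each[index])
        total_comments += comments

    avg_comments = total_comments / len(dataset)

    print('Total comments:',total_comments)
    print('Total posts:',len(dataset))
    print('Average comments:',avg_comments)
    print('')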
    
commentCalc(ask_posts,4)
commentCalc(show_posts,4)

Total comments: 24483
Total posts: 1744
Average comments: 14.038417431192661

Total comments: 11988
Total posts: 1162
Average comments: 10.31669535283993



From this quick calculation, Ask posts receive more comments on average than Show posts (~14.0 vs. ~10.3). If I want to maximize my interactions on the platform, I will likely want to make more Ask posts. For the remainder of the analysis, I'll focus only on Ask posts.
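
As a quick sanity check on that comparison, the averages can be recomputed from the printed totals. This is a minimal sketch and not part of the original notebook: `average` and `relative_gap` are hypothetical helper names, and the totals are copied from the output above.

```python
# Totals copied from the notebook output above
ask_total, ask_count = 24483, 1744
show_total, show_count = 11988, 1162

def average(total, count):
    """Return the mean, or 0.0 for an empty group (avoids dividing by zero)."""
    return total / count if count else 0.0

def relative_gap(a, b):
    """Return how much larger a is than b, as a fraction of b."""
    return (a - b) / b

ask_avg = average(ask_total, ask_count)
show_avg = average(show_total, show_count)
gap = relative_gap(ask_avg, show_avg)

print('Ask average:', round(ask_avg, 2))
print('Show average:', round(show_avg, 2))
print('Ask posts receive about {:.0%} more comments than Show posts'.format(gap))
```

The guard in `average` is only there so that an empty list of posts would not raise a `ZeroDivisionError`; with the counts above it never triggers.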

## Identifying the best time of day to post

By examining the dataset of Ask posts, I can begin to understand the best time of day to make a post. I'll start by calculating the number of posts and comments by hour created.
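
One way to build those per-hour counts is sketched below. It rests on assumptions this section does not confirm: the creation timestamp is assumed to sit at index 6 in the format `%m/%d/%Y %H:%M`, and the comment count at index 4 (the index used earlier). `counts_by_hour` is a hypothetical helper name, and the sample rows are made up for illustration rather than taken from the dataset.

```python
from datetime import datetime

def counts_by_hour(rows, created_index=6, comments_index=4,
                   date_format='%m/%d/%Y %H:%M'):
    """Return two dicts keyed by two-digit hour: post counts and comment totals."""
    posts_by_hour = {}
    comments_by_hour = {}
    for row in rows:
        # Parse the timestamp string and keep only the hour, e.g. '09'
        created = datetime.strptime(row[created_index], date_format)
        hour = created.strftime('%H')
        posts_by_hour[hour] = posts_by_hour.get(hour, 0) + 1
        comments_by_hour[hour] = comments_by_hour.get(hour, 0) + int(row[comments_index])
    return posts_by_hour, comments_by_hour

# Made-up rows for illustration only; not taken from the dataset
sample_rows = [
    ['1', 'Ask HN: example one', '', '2', '6', 'user_a', '8/16/2016 9:55'],
    ['2', 'Ask HN: example two', '', '1', '4', 'user_b', '11/22/2015 9:10'],
    ['3', 'Ask HN: example three', '', '3', '10', 'user_c', '5/2/2016 15:30'],
]

posts_by_hour, comments_by_hour = counts_by_hour(sample_rows)
for hour in sorted(posts_by_hour):
    avg = comments_by_hour[hour] / posts_by_hour[hour]
    print(hour, posts_by_hour[hour], comments_by_hour[hour], avg)
```

Keeping the post counts and comment totals in separate dictionaries makes it straightforward to divide one by the other and get the average comments per post for each hour.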