# Guided Project: Exploring Hacker News Posts

This guided project brings the following skills together for some real-world practice:

- How to work with strings
- Object-oriented programming
- Dates and times

In this project, I'll work with a dataset of submissions to the popular technology site [Hacker News](https://news.ycombinator.com/).

In [1]:
# Importing reader function from csv module
from csv import reader

# Read the `hacker_news.csv` file and close it once the rows are loaded
with open("hacker_news.csv") as opened_file:
    read_file = reader(opened_file)
    hn = list(read_file)

In [2]:
# Displaying first five rows of hn list
print(hn[:5])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


In [3]:
# Extracting the header row into `headers` and removing it from `hn`
headers = hn[0]
print(headers)
print("---")
hn = hn[1:]
print(hn[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
---
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


## Separating Ask HN and Show HN Posts (Including Case Variations) into Separate Lists

In [4]:
# Creating three empty lists
ask_posts = []
show_posts = []
other_posts = []

# Assigning each row to a list based on how its lowercased title begins
for row in hn:
    title = row[1].lower()
    if title.startswith("ask hn"):
        ask_posts.append(row)
    elif title.startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)

# Checking number of posts in each list
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

1744
1162
17194
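
The case-insensitive prefix check above can also be wrapped in a small helper. This is only a sketch: `classify_title` is a hypothetical name that does not appear in the notebook, and the sample titles are made up for illustration.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on how the title begins."""
    lowered = title.lower()  # makes the prefix check case-insensitive
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"

# Made-up sample titles, for illustration only
sample_titles = [
    "Ask HN: How do you learn?",
    "SHOW HN: My side project",
    "Interactive Dynamic Video",
]
labels = [classify_title(t) for t in sample_titles]
print(labels)
```

Keeping the check in one function means the lowercasing rule only has to be changed in one place if the matching logic is ever tightened.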


## Calculating the Average Number of Comments for Ask HN and Show HN Posts

In [5]:
# Finding the total number of comments in ask posts
total_ask_comments = 0

for row in ask_posts:
    comments = int(row[4])
    total_ask_comments = total_ask_comments + comments
    
# Calculating average number of comments on ask posts
avg_ask_comments = total_ask_comments / len(ask_posts)
print(avg_ask_comments)

14.038417431192661


In [6]:
# Finding the total number of comments in show posts
total_show_comments = 0

for row in show_posts:
    comments = int(row[4])
    total_show_comments = total_show_comments + comments
    
# Calculating average number of comments on show posts
avg_show_comments = total_show_comments / len(show_posts)
print(avg_show_comments)

10.31669535283993


On average, "ask" posts receive more comments than "show" posts (about 14.0 vs. 10.3).
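
The two averaging cells repeat the same steps, so they could share one function. A minimal sketch, assuming the comment count sits at index 4 as in this dataset; `average_comments` and the sample rows are hypothetical and not part of the notebook.

```python
def average_comments(rows, comments_index=4):
    """Return the mean of the integer values stored at `comments_index`."""
    if not rows:
        return 0.0  # avoid dividing by zero for an empty list
    total = sum(int(row[comments_index]) for row in rows)
    return total / len(rows)

# Made-up rows with the same column layout as the dataset
sample_rows = [
    ["1", "Ask HN: A", "", "3", "10", "user_a", "8/4/2016 11:52"],
    ["2", "Ask HN: B", "", "5", "20", "user_b", "8/4/2016 12:10"],
]
sample_avg = average_comments(sample_rows)
print(sample_avg)  # 15.0
```

With the notebook's lists, this would be called as `average_comments(ask_posts)` and `average_comments(show_posts)`.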

## Finding the Number of Ask Posts and Comments by Hour Created

In [7]:
# Importing datetime module as dt
import datetime as dt

# Pairing each ask post's creation time with its number of comments
result_list = []

for row in ask_posts:
    created_at = row[6]
    comments = int(row[4])
    result_list.append([created_at, comments])

# counts_by_hour: number of ask posts created during each hour of the day
counts_by_hour = {}

# comments_by_hour: total number of comments received by ask posts created during each hour
comments_by_hour = {}

for created_at, comments in result_list:
    # Parsing the timestamp and extracting the hour as a two-digit string
    created_at_dt = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M")
    hour = created_at_dt.strftime("%H")
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comments
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comments
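
The `if`/`else` bookkeeping above can be shortened with `collections.defaultdict`, which starts missing keys at zero. This is a sketch on made-up data rather than the notebook's method; `sample_posts`, `counts`, and `comment_totals` are hypothetical names.

```python
import datetime as dt
from collections import defaultdict

# Made-up [created_at, comments] pairs, for illustration only
sample_posts = [
    ["8/4/2016 11:52", 10],
    ["1/26/2016 11:30", 4],
    ["6/23/2016 22:20", 1],
]

counts = defaultdict(int)          # posts per hour
comment_totals = defaultdict(int)  # comments per hour

for created_at, comments in sample_posts:
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")
    counts[hour] += 1
    comment_totals[hour] += comments

print(dict(counts))          # {'11': 2, '22': 1}
print(dict(comment_totals))  # {'11': 14, '22': 1}
```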

## Calculating the Average Number of Comments for Ask HN Posts by Hour

In [8]:
# Calculating the average number of comments per post for posts created during each hour of the day
avg_by_hour = []

for hour in counts_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])

## Sorting and Printing Values from a List of Lists

Next, I sort the list of lists and print the five highest averages in an easier-to-read format.

In [9]:
# Swapping the columns so that sorting orders the rows by average first
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

sorted_swap = sorted(swap_avg_by_hour, reverse=True)
print("Top 5 Hours (GMT) for Ask Posts Comments")

for average, hour in sorted_swap[:5]:
    # Shifting each hour back by 5 hours before formatting it
    hour_dt = dt.datetime.strptime(hour, "%H") - dt.timedelta(hours=5)
    hour = dt.datetime.strftime(hour_dt, "%H:%M")
    print(
        "{hr}: {avg:.2f} average comments per post".format(hr=hour, avg=average)
    )

Top 5 Hours (GMT) for Ask Posts Comments
10:00: 38.59 average comments per post
21:00: 23.81 average comments per post
15:00: 21.52 average comments per post
11:00: 16.80 average comments per post
16:00: 16.01 average comments per post
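
The column swap is one way to sort by the average. An alternative sketch passes a `key` function to `sorted`, so the `[hour, average]` pairs can stay in their original order. The sample values below are made up and are not results from the dataset.

```python
# Made-up [hour, average] pairs, for illustration only
sample_avg_by_hour = [["09", 5.5], ["15", 38.5], ["02", 23.0]]

# Sort by the second element (the average), highest first
top = sorted(sample_avg_by_hour, key=lambda pair: pair[1], reverse=True)

for hour, average in top[:2]:
    print("{}:00: {:.2f} average comments per post".format(hour, average))
```

One difference worth noting: the swapped version breaks ties between equal averages by comparing the hour strings, while the `key` version keeps tied rows in their original order.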


It looks like 10am, 9pm, and 3pm are the best times to create an ask post for the most engagement. 10am leads by far, with almost 15 more comments per post on average than 9pm.
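
The hour shift in the last cell subtracts a `timedelta` from a naive datetime. The sketch below shows how a similar conversion could be done with time-zone-aware datetimes instead. It assumes, purely for illustration, a timestamp recorded at a fixed UTC-5 offset; the notebook does not state which time zone the dataset uses, so `source_tz` is an assumption.

```python
import datetime as dt

# Assumed source offset (UTC-5); not confirmed by the notebook
source_tz = dt.timezone(dt.timedelta(hours=-5))

naive = dt.datetime.strptime("8/4/2016 11:52", "%m/%d/%Y %H:%M")
aware = naive.replace(tzinfo=source_tz)     # attach the assumed offset
as_utc = aware.astimezone(dt.timezone.utc)  # convert to UTC

print(as_utc.strftime("%H:%M"))  # 16:52
```

Attaching an explicit offset makes the direction of the conversion visible in the code, which is easy to get backwards when adding or subtracting hours by hand.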