# Exploring Hacker News Posts

[Dataset and Documentation](https://www.kaggle.com/datasets/hacker-news/hacker-news-posts)

This project uses the Hacker News Posts dataset available on Kaggle.
The aim of this project is to:
1) Analyze whether posts tagged with 'Ask HN' (posts asking the Hacker News community a specific question) or 'Show HN' (posts showing the Hacker News community a project, product, or something interesting) receive more comments on average

2) Analyze whether posts created at a certain time receive more comments on average

In [1]:
import csv

with open('hacker_news.csv') as file: 
    reader = csv.reader(file)
    hn = list(reader)
    
headers = hn[0]
hn = hn[1:]

In [2]:
# print the headers
print(headers)
print('\n')

# print the first six rows (indices 0 through 5)
for row in hn[:6]:
    print(row)
    print('\n')

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']


['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']


['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']


['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']


['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']


['10482257

In [3]:
ask_posts = []
show_posts = []
other_posts = []
for row in hn: 
    title = row[1].lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else: 
        other_posts.append(row)
        
print(f"The length of posts starting with 'ask hn' is: {len(ask_posts)}.")
print(f"The length of posts starting with 'show hn' is: {len(show_posts)}.")
print(f"The length of other post types is: {len(other_posts)}.")
        
    
    

The length of posts starting with 'ask hn' is: 1744.
The length of posts starting with 'show hn' is: 1162.
The length of other post types is: 17194.


In [4]:
print(ask_posts)

[['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'], ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'], ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'], ['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20'], ['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38'], ['10284812', 'Ask HN: Limiting CPU, memory, and I/O usage on a program for testing', '', '2', '1', 'zatkin', '9/26/2015 23:23'], ['11548576', 'Ask HN: Which framework for a CRUD app in 2016?', '', '4', '4', 'deafcalculus', '4/22/2016 12:24'], ['10573430', 'Ask HN: Enter market with a well-funded competitor?', '', '2', '1', 'sparkling', '11/16/2015 9:22'], ['11168708', 'Ask HN: Do you use any re

On average, 'Ask HN' posts receive more comments than 'Show HN' posts.
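
The comparison between the two post types comes down to an average of the `num_comments` column (index 4 in the header shown earlier). The following is a minimal sketch of that calculation, not necessarily the exact code used in this notebook: the helper `average_comments` and the sample rows are hypothetical and exist only so the block runs on its own.

```python
def average_comments(posts, comments_index=4):
    """Return the mean number of comments per post, or 0.0 for an empty list."""
    if not posts:
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# Hypothetical sample rows in the same column order as the dataset:
# id, title, url, num_points, num_comments, author, created_at
sample_ask = [
    ['1', 'Ask HN: Example question?', '', '2', '6', 'user_a', '8/16/2016 9:55'],
    ['2', 'Ask HN: Another question?', '', '5', '10', 'user_b', '8/16/2016 10:05'],
]
sample_show = [
    ['3', 'Show HN: Example project', 'http://example.com', '4', '3', 'user_c', '8/17/2016 11:00'],
]

avg_ask = average_comments(sample_ask)
avg_show = average_comments(sample_show)
print(f"Average comments on ask posts: {avg_ask:.2f}")
print(f"Average comments on show posts: {avg_show:.2f}")
```

In the notebook itself, the same helper would be called as `average_comments(ask_posts)` and `average_comments(show_posts)`.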

Now, let's determine whether ask posts created at a certain time are more likely to attract comments.
1) Calculate the number of ask posts created in each hour of the day, along with the number of comments received.

2) Calculate the average number of comments that ask posts receive for each hour created.

In [10]:
import datetime as dt

# Pair each ask post's creation time with its comment count
result_list = []
for row in ask_posts:
    result_list.append([row[6], int(row[4])])

# example format '8/16/2016 9:55'
date_format = "%m/%d/%Y %H:%M"

counts_by_hour = {}    # hour -> number of ask posts created in that hour
comments_by_hour = {}  # hour -> total comments on those posts

for created_at, comments_no in result_list:
    hour_check = dt.datetime.strptime(created_at, date_format).hour
    if hour_check not in counts_by_hour:
        counts_by_hour[hour_check] = 1
        comments_by_hour[hour_check] = comments_no
    else:
        counts_by_hour[hour_check] += 1
        comments_by_hour[hour_check] += comments_no

print('Checking counts by hour: ')
print(counts_by_hour)
print('\n')
print('Checking comments by hour: ')
print(comments_by_hour)
    
    
    

Checking counts by hour: 
{9: 45, 13: 85, 10: 59, 14: 107, 16: 108, 23: 68, 12: 73, 17: 100, 15: 116, 21: 109, 20: 80, 2: 58, 18: 109, 3: 54, 5: 46, 19: 110, 1: 60, 22: 71, 8: 48, 4: 47, 0: 55, 6: 44, 7: 34, 11: 58}


Checking comments by hour: 
{9: 251, 13: 1253, 10: 793, 14: 1416, 16: 1814, 23: 543, 12: 687, 17: 1146, 15: 4477, 21: 1745, 20: 1722, 2: 1381, 18: 1439, 3: 421, 5: 464, 19: 1188, 1: 683, 22: 479, 8: 492, 4: 337, 0: 447, 6: 397, 7: 267, 11: 641}
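
The second step, the average number of comments per ask post for each hour, follows from dividing the two dictionaries printed above. Here is a minimal sketch, with both dictionaries copied from that output so the block runs on its own; the names `avg_by_hour` and `top_hours` are assumptions introduced for illustration.

```python
# Values copied from the output printed above
counts_by_hour = {9: 45, 13: 85, 10: 59, 14: 107, 16: 108, 23: 68, 12: 73,
                  17: 100, 15: 116, 21: 109, 20: 80, 2: 58, 18: 109, 3: 54,
                  5: 46, 19: 110, 1: 60, 22: 71, 8: 48, 4: 47, 0: 55, 6: 44,
                  7: 34, 11: 58}
comments_by_hour = {9: 251, 13: 1253, 10: 793, 14: 1416, 16: 1814, 23: 543,
                    12: 687, 17: 1146, 15: 4477, 21: 1745, 20: 1722, 2: 1381,
                    18: 1439, 3: 421, 5: 464, 19: 1188, 1: 683, 22: 479,
                    8: 492, 4: 337, 0: 447, 6: 397, 7: 267, 11: 641}

# Average comments per ask post for each hour of the day
avg_by_hour = {
    hour: comments_by_hour[hour] / counts_by_hour[hour]
    for hour in counts_by_hour
}

# The five hours with the highest average, in descending order
top_hours = sorted(avg_by_hour.items(), key=lambda item: item[1], reverse=True)[:5]

for hour, avg in top_hours:
    print(f"{hour:02d}:00  {avg:.2f} average comments per post")
```

With the figures printed above, the hour beginning at 15:00 has the highest average, at roughly 38.6 comments per ask post.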
