# Analyzing Comment Engagement on Hacker News Posts Without Pandas

This project analyzes posts from Hacker News, the community site run by Y Combinator, to determine whether "Ask HN" or "Show HN" posts receive more comments on average and to identify the hours of the day at which Ask HN posts attract the most comments.

Using Python (without the pandas library), the analysis calculates average comment counts, aggregates engagement by posting hour, and compares interaction patterns across post types. The goal is to uncover behavioral trends in community participation and content timing.


In [73]:
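import csv
import datetime as dt

# Read the whole CSV into a list of rows (each row is a list of strings).
# newline='' is what the csv module documentation recommends for file objects.
with open('hacker_news.csv', newline='') as f:
    hn = list(csv.reader(f))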

In [74]:
hn[:5]


[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
 ['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01']]

In [75]:
headers = hn[0]  # column names
hn = hn[1:]      # data rows only
headers

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

In [76]:
headers[1]

'title'

In [77]:
# Split posts into Ask HN, Show HN, and everything else.
# elif keeps Ask HN posts from also being appended to other_posts.
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1].lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

1744
1162
17194


In [78]:
total_ask_comments = 0 

for row in ask_posts:
    comment = int(row[4])
    total_ask_comments += comment

avg_ask_comments = total_ask_comments / len(ask_posts)
avg_ask_comments

14.038417431192661

In [79]:
total_show_comments = 0 

for row in show_posts:
    comment = int(row[4])
    total_show_comments += comment

avg_show_comments = total_show_comments / len(show_posts)
avg_show_comments

10.31669535283993

Ask HN posts receive more comments on average (about 14.04) than Show HN posts (about 10.32), which suggests that question-based posts draw more discussion.
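
The two averaging cells above repeat the same loop. As a minimal sketch, the calculation could be wrapped in a helper, assuming rows in the same column order as `hn` with `num_comments` at index 4. The function name `average_comments` and the sample rows are hypothetical and used only for illustration.

```python
def average_comments(rows, comments_index=4):
    """Return the mean comment count for a list of rows (0.0 if the list is empty)."""
    if not rows:
        return 0.0
    total = sum(int(row[comments_index]) for row in rows)
    return total / len(rows)

# Made-up sample rows in the dataset's column order (not real data)
sample_rows = [
    ['1', 'Ask HN: example A', '', '2', '6', 'user_a', '8/16/2016 9:55'],
    ['2', 'Ask HN: example B', '', '28', '29', 'user_b', '11/22/2015 13:43'],
]
print(average_comments(sample_rows))  # 17.5
```

With the notebook's lists, `average_comments(ask_posts)` and `average_comments(show_posts)` would be expected to reproduce the two averages computed above.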

**Next, we check whether Ask HN posts created at certain hours are more likely to attract comments.**
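
Grouping by hour depends on parsing the `created_at` strings, which look like `8/16/2016 9:55` in this dataset. The sketch below shows that step in isolation; the helper name `posting_hour` is hypothetical, and the format string assumes every timestamp follows the month/day/year hour:minute layout seen in the sample rows.

```python
import datetime as dt

def posting_hour(created_at):
    """Parse a 'month/day/year hour:minute' string and return the hour as 'HH'."""
    parsed = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M")
    return parsed.strftime("%H")

print(posting_hour("8/16/2016 9:55"))    # 09
print(posting_hour("11/22/2015 13:43"))  # 13
```

Returning a zero-padded string keeps the hours sortable as text and consistent as dictionary keys.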

In [80]:
ask_posts[:5]

[['12296411',
  'Ask HN: How to improve my personal website?',
  '',
  '2',
  '6',
  'ahmedbaracat',
  '8/16/2016 9:55'],
 ['10610020',
  'Ask HN: Am I the only one outraged by Twitter shutting down share counts?',
  '',
  '28',
  '29',
  'tkfx',
  '11/22/2015 13:43'],
 ['11610310',
  'Ask HN: Aby recent changes to CSS that broke mobile?',
  '',
  '1',
  '1',
  'polskibus',
  '5/2/2016 10:14'],
 ['12210105',
  'Ask HN: Looking for Employee #3 How do I do it?',
  '',
  '1',
  '3',
  'sph130',
  '8/2/2016 14:20'],
 ['10394168',
  'Ask HN: Someone offered to buy my browser extension from me. What now?',
  '',
  '28',
  '17',
  'roykolak',
  '10/15/2015 16:38']]

In [83]:
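# Pair each Ask HN post's creation time with its comment count
result_list = []
for row in ask_posts:
    created_at = row[6]
    result_list.append([created_at, int(row[4])])
result_list[0][1]  # comment count of the first Ask HN post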


6

In [84]:
# Tally the number of posts and the total comments for each posting hour
counts_by_hour = {}
comments_by_hour = {}
for row in result_list:
    date = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    hour = date.strftime("%H")  # zero-padded hour, e.g. '09'
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]

In [89]:
avg_by_hour = []

for hour in comments_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])
    
avg_by_hour

[['09', 5.5777777777777775],
 ['13', 14.741176470588234],
 ['10', 13.440677966101696],
 ['14', 13.233644859813085],
 ['16', 16.796296296296298],
 ['23', 7.985294117647059],
 ['12', 9.41095890410959],
 ['17', 11.46],
 ['15', 38.5948275862069],
 ['21', 16.009174311926607],
 ['20', 21.525],
 ['02', 23.810344827586206],
 ['18', 13.20183486238532],
 ['03', 7.796296296296297],
 ['05', 10.08695652173913],
 ['19', 10.8],
 ['01', 11.383333333333333],
 ['22', 6.746478873239437],
 ['08', 10.25],
 ['04', 7.170212765957447],
 ['00', 8.127272727272727],
 ['06', 9.022727272727273],
 ['07', 7.852941176470588],
 ['11', 11.051724137931034]]

In [92]:
# Put the average first so that sorting orders the rows by average comments
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])
sorted_swap = sorted(swap_avg_by_hour, reverse=True)

In [97]:
print("Top 5 Hours for Ask HN Post Comments")

Top 5 Hours for Ask HN Post Comments


In [107]:
for row in sorted_swap[:5]:
    hour = dt.datetime.strptime(row[1], "%H")
    hour = dt.datetime.strftime(hour, "%H:00")
    average = "{:.2f}".format(row[0])
    print(f" {hour} {average} average comments per post")
    

 15:00 38.59 average comments per post
 02:00 23.81 average comments per post
 20:00 21.52 average comments per post
 16:00 16.80 average comments per post
 21:00 16.01 average comments per post


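Ask HN comment activity peaks at 15:00 (3 PM), which averages 38.59 comments per post. The next-highest hours are 02:00 (2 AM) at 23.81 and 20:00 (8 PM) at 21.52, followed by 16:00 (4 PM) at 16.80 and 21:00 (9 PM) at 16.01. In this dataset, the afternoon (3 to 5 PM) and the evening (8 to 10 PM) are therefore the strongest posting windows for Ask HN discussion, with 2 AM standing out as an isolated late-night peak. All hours are as recorded in the `created_at` column.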
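
The hourly ranking above is built with two dictionaries and a swapped list. As an alternative sketch rather than the notebook's own method, the same aggregation can be written with `collections.defaultdict` and `sorted` with a `key` function, which avoids the swap step. The function name `top_hours` and the sample pairs are hypothetical; the input is assumed to be `(created_at, num_comments)` pairs like those in `result_list`.

```python
from collections import defaultdict
import datetime as dt

def top_hours(pairs, n=5):
    """Return up to n (hour, average_comments) tuples, highest average first."""
    totals = defaultdict(int)  # total comments per hour
    counts = defaultdict(int)  # number of posts per hour
    for created_at, comments in pairs:
        hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")
        totals[hour] += comments
        counts[hour] += 1
    averages = [(hour, totals[hour] / counts[hour]) for hour in totals]
    # Sort by the average (second element) in descending order
    return sorted(averages, key=lambda item: item[1], reverse=True)[:n]

# Made-up sample pairs (not real data)
sample = [("8/16/2016 9:55", 6), ("11/22/2015 13:43", 29), ("5/2/2016 9:14", 2)]
for hour, avg in top_hours(sample):
    print(f"{hour}:00 {avg:.2f} average comments per post")
```

Returning the post counts alongside the averages would also make it easier to see whether a high average rests on only a handful of posts.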

