Title: Unveiling Hacker News Post Engagement

Introduction:
Hacker News is a bustling platform for tech enthusiasts. The mission here is to extract useful insights from a curated dataset of Hacker News submissions. By blending string manipulation, object-oriented programming, and time analysis, this notebook examines how users engage with posts.

Hacker News, started by Y Combinator, hosts a wide range of posts, from questions ("Ask HN") to project showcases ("Show HN"). The focus is on comparing these two categories by average number of comments and on investigating whether post creation time influences engagement.

Comparing Engagement: Discern if "Ask HN" or "Show HN" posts attract more comments on average, shedding light on user preferences.

Time's Impact: Explore whether post creation time correlates with the average comments received, revealing temporal engagement trends.

In [40]:
pip install pandas



In [41]:
import pandas as pd

In [42]:
hn_df = pd.read_csv('hacker_news_dataset.csv')

In [43]:
hn = hn_df.values.tolist()

In [44]:
for row in hn[:5]:
    print(row)

[12224879, 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', 386, 52, 'ne0phyte', '8/4/2016 11:52']
[10975351, 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', 39, 10, 'josep2', '1/26/2016 19:30']
[11964716, "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', 2, 1, 'vezycash', '6/23/2016 22:20']
[11919867, 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', 3, 1, 'hswarna', '6/17/2016 0:01']
[10301696, 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', 8, 2, 'walterbell', '9/30/2015 4:12']


In [45]:
headers = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

In [46]:
headers

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

In [47]:
ask_posts=[]
show_posts=[]
other_posts=[]

In [48]:
for i in hn:
    title=i[1]

    if title.lower().startswith('ask hn'):
        ask_posts.append(i)
    elif title.lower().startswith('show hn'):
        show_posts.append(i)
    else:
        other_posts.append(i)
    

In [49]:
len_ask_posts=len(ask_posts)
len_show_posts=len(show_posts)
len_other_posts=len(other_posts)

In [50]:
print(len_ask_posts, len_show_posts, len_other_posts)

1744 1162 17194
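The prefix check used above can also be wrapped in a small helper, which makes the rule easier to test on its own. This is a minimal sketch: `classify_title` and `sample_titles` are hypothetical names, and the titles are made up for illustration rather than taken from the dataset. The `isinstance` guard is an added assumption to cope with missing titles that pandas would load as `nan`.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix."""
    # Non-string titles (for example NaN loaded by pandas) fall into 'other'.
    if not isinstance(title, str):
        return "other"
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


# Illustrative titles, not rows from hacker_news_dataset.csv.
sample_titles = [
    "Ask HN: Example question?",
    "SHOW HN: Example project",
    "Plain link title",
    float("nan"),
]
labels = [classify_title(t) for t in sample_titles]
print(labels)
```

Lower-casing before the comparison keeps the match case-insensitive, which is the same behaviour as the loop in the cell above.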


In [51]:
for i in ask_posts[:5]:
    print(i)

[12296411, 'Ask HN: How to improve my personal website?', nan, 2, 6, 'ahmedbaracat', '8/16/2016 9:55']
[10610020, 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', nan, 28, 29, 'tkfx', '11/22/2015 13:43']
[11610310, 'Ask HN: Aby recent changes to CSS that broke mobile?', nan, 1, 1, 'polskibus', '5/2/2016 10:14']
[12210105, 'Ask HN: Looking for Employee #3 How do I do it?', nan, 1, 3, 'sph130', '8/2/2016 14:20']
[10394168, 'Ask HN: Someone offered to buy my browser extension from me. What now?', nan, 28, 17, 'roykolak', '10/15/2015 16:38']


In [52]:
for i in show_posts[:5]:
    print(i)

[10627194, 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', 26, 22, 'kfihihc', '11/25/2015 14:03']
[10646440, 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', 747, 102, 'dhotson', '11/29/2015 22:46']
[11590768, 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', 1, 1, 'h8liu', '4/28/2016 18:05']
[12178806, 'Show HN: Webscope  Easy way for web developers to communicate with Clients', 'http://webscopeapp.com', 3, 3, 'fastbrick', '7/28/2016 7:11']
[10872799, 'Show HN: GeoScreenshot  Easily test Geo-IP based web pages', 'https://www.geoscreenshot.com/', 1, 9, 'kpsychwave', '1/9/2016 20:45']


In [53]:
sum_comments_ask=0
for i in ask_posts:
    comment=i[4]
    sum_comments_ask +=comment

In [54]:
sum_comments_ask

24483

In [55]:
sum_comments_show=0
for i in show_posts:
    comment=i[4]
    sum_comments_show +=comment

In [56]:
sum_comments_show

11988

In [57]:
avg_show_comments=0
avg_ask_comments=0

In [58]:
avg_show_comments = sum_comments_show/len(show_posts)
avg_ask_comments = sum_comments_ask/len(ask_posts)

In [59]:
# Compare the two averages (the original compared avg_show_comments with itself)
if avg_show_comments > avg_ask_comments:
    print(f"The Show Posts with an average number of comments of {avg_show_comments:.2f} have received more comments than the Ask Posts with an average number of comments of {avg_ask_comments:.2f}")
else:
    print(f"The Ask Posts with an average number of comments of {avg_ask_comments:.2f} have received more comments than the Show Posts with an average number of comments of {avg_show_comments:.2f}")

The Ask Posts with an average number of comments of 14.04 have received more comments than the Show Posts with an average number of comments of 10.32


**Analysis: Do Ask HN or Show HN Posts Receive More Comments?**

An analysis of the Hacker News dataset shows that "Ask HN" posts receive more comments on average than "Show HN" posts.

The average number of comments on "Ask HN" posts is 14.04, while the average number of comments on "Show HN" posts is 10.32. This difference in engagement suggests that the Hacker News community is more inclined to participate in discussions and provide feedback on posts that seek input or answers, as is often the case with "Ask HN" posts.

This disparity highlights the interactive and collaborative nature of the "Ask HN" posts, where users actively contribute their insights, experiences, and opinions. On the other hand, "Show HN" posts, while still garnering attention, might have a different objective, such as showcasing projects or sharing interesting findings, leading to comparatively fewer comments.

In conclusion, "Ask HN" posts attract a higher average number of comments in this dataset (14.04 versus 10.32), which is consistent with a community that readily engages with direct questions.
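The averages quoted above follow directly from the totals and post counts printed earlier in the notebook (24483 comments over 1744 Ask HN posts, 11988 comments over 1162 Show HN posts). The sketch below re-derives them and shows one possible way to fold the sum-and-divide steps into a helper. `average_comments` and `toy_rows` are hypothetical names, and the toy rows are invented for illustration.

```python
from statistics import mean


def average_comments(posts, comments_index=4):
    """Mean of the num_comments column for a list of row lists."""
    if not posts:
        return 0.0  # avoid an error on an empty category
    return mean(row[comments_index] for row in posts)


# Re-derive the reported averages from the totals printed in the notebook.
avg_ask = 24483 / 1744
avg_show = 11988 / 1162
print(f"Ask HN:  {avg_ask:.2f}")   # 14.04
print(f"Show HN: {avg_show:.2f}")  # 10.32

# Made-up rows in the same column order as `headers`.
toy_rows = [
    [1, "Ask HN: A?", None, 2, 6, "user_a", "8/16/2016 9:55"],
    [2, "Ask HN: B?", None, 5, 10, "user_b", "8/17/2016 10:00"],
]
toy_avg = average_comments(toy_rows)
print(toy_avg)
```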

In [60]:
import datetime as dt

In [81]:
counts_by_hour = {}
comments_by_hour = {}
for row in ask_posts:
    date = dt.datetime.strptime(row[-1], '%m/%d/%Y %H:%M')
    hour = date.hour
    
    if hour in counts_by_hour:
        counts_by_hour[hour] +=1
        comments_by_hour[hour] += float(row[4])
        
    else:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = float(row[4])
counts_by_hour = dict(sorted(counts_by_hour.items()))
comments_by_hour = dict(sorted(comments_by_hour.items()))
        
for hour, count in counts_by_hour.items():
    print(f"Hour: {hour}, Posts: {count}, Comments: {comments_by_hour[hour]}")

Hour: 0, Posts: 55, Comments: 447.0
Hour: 1, Posts: 60, Comments: 683.0
Hour: 2, Posts: 58, Comments: 1381.0
Hour: 3, Posts: 54, Comments: 421.0
Hour: 4, Posts: 47, Comments: 337.0
Hour: 5, Posts: 46, Comments: 464.0
Hour: 6, Posts: 44, Comments: 397.0
Hour: 7, Posts: 34, Comments: 267.0
Hour: 8, Posts: 48, Comments: 492.0
Hour: 9, Posts: 45, Comments: 251.0
Hour: 10, Posts: 59, Comments: 793.0
Hour: 11, Posts: 58, Comments: 641.0
Hour: 12, Posts: 73, Comments: 687.0
Hour: 13, Posts: 85, Comments: 1253.0
Hour: 14, Posts: 107, Comments: 1416.0
Hour: 15, Posts: 116, Comments: 4477.0
Hour: 16, Posts: 108, Comments: 1814.0
Hour: 17, Posts: 100, Comments: 1146.0
Hour: 18, Posts: 109, Comments: 1439.0
Hour: 19, Posts: 110, Comments: 1188.0
Hour: 20, Posts: 80, Comments: 1722.0
Hour: 21, Posts: 109, Comments: 1745.0
Hour: 22, Posts: 71, Comments: 479.0
Hour: 23, Posts: 68, Comments: 543.0


In [82]:
counts_by_hour

{0: 55,
 1: 60,
 2: 58,
 3: 54,
 4: 47,
 5: 46,
 6: 44,
 7: 34,
 8: 48,
 9: 45,
 10: 59,
 11: 58,
 12: 73,
 13: 85,
 14: 107,
 15: 116,
 16: 108,
 17: 100,
 18: 109,
 19: 110,
 20: 80,
 21: 109,
 22: 71,
 23: 68}

In [83]:
comments_by_hour

{0: 447.0,
 1: 683.0,
 2: 1381.0,
 3: 421.0,
 4: 337.0,
 5: 464.0,
 6: 397.0,
 7: 267.0,
 8: 492.0,
 9: 251.0,
 10: 793.0,
 11: 641.0,
 12: 687.0,
 13: 1253.0,
 14: 1416.0,
 15: 4477.0,
 16: 1814.0,
 17: 1146.0,
 18: 1439.0,
 19: 1188.0,
 20: 1722.0,
 21: 1745.0,
 22: 479.0,
 23: 543.0}

In [85]:
avg_by_hour = []
for hour in comments_by_hour:  # Use the keys of the dictionary
    average_comments = comments_by_hour[hour] / counts_by_hour[hour]
    avg_by_hour.append([hour, average_comments])

# Print the average number of comments per post for each hour
for avg_hour, avg_comments in avg_by_hour:
    print(f"Hour: {avg_hour}, Average Comments: {avg_comments:.2f}")

Hour: 0, Average Comments: 8.13
Hour: 1, Average Comments: 11.38
Hour: 2, Average Comments: 23.81
Hour: 3, Average Comments: 7.80
Hour: 4, Average Comments: 7.17
Hour: 5, Average Comments: 10.09
Hour: 6, Average Comments: 9.02
Hour: 7, Average Comments: 7.85
Hour: 8, Average Comments: 10.25
Hour: 9, Average Comments: 5.58
Hour: 10, Average Comments: 13.44
Hour: 11, Average Comments: 11.05
Hour: 12, Average Comments: 9.41
Hour: 13, Average Comments: 14.74
Hour: 14, Average Comments: 13.23
Hour: 15, Average Comments: 38.59
Hour: 16, Average Comments: 16.80
Hour: 17, Average Comments: 11.46
Hour: 18, Average Comments: 13.20
Hour: 19, Average Comments: 10.80
Hour: 20, Average Comments: 21.52
Hour: 21, Average Comments: 16.01
Hour: 22, Average Comments: 6.75
Hour: 23, Average Comments: 7.99
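The counting and averaging steps above can be combined into one pass. The following is a sketch of an alternative, not the notebook's own method: it uses `collections.defaultdict` so the "first time this hour is seen" branch is no longer needed. `hourly_average_comments` and `toy_rows` are hypothetical names, and the rows are invented for illustration.

```python
import datetime as dt
from collections import defaultdict


def hourly_average_comments(rows, date_index=-1, comments_index=4):
    """Return {hour: average comments per post}, sorted by hour."""
    counts = defaultdict(int)  # posts created in each hour
    totals = defaultdict(int)  # comments received by those posts
    for row in rows:
        created = dt.datetime.strptime(row[date_index], "%m/%d/%Y %H:%M")
        counts[created.hour] += 1
        totals[created.hour] += row[comments_index]
    return {hour: totals[hour] / counts[hour] for hour in sorted(counts)}


# Made-up rows in the same column order as `headers`.
toy_rows = [
    [1, "Ask HN: A?", None, 2, 6, "user_a", "8/16/2016 9:55"],
    [2, "Ask HN: B?", None, 5, 10, "user_b", "8/17/2016 9:05"],
    [3, "Ask HN: C?", None, 1, 3, "user_c", "8/18/2016 15:30"],
]
hourly_avg = hourly_average_comments(toy_rows)
print(hourly_avg)
```

With the real data, `hourly_average_comments(ask_posts)` would be expected to give the same per-hour averages as the cells above, since it applies the same date format and the same division.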


In [86]:
swap_avg_by_hour=[]

In [87]:
for i in avg_by_hour:
    swap_avg_by_hour.append([i[1],i[0]])

In [88]:
swap_avg_by_hour

[[8.127272727272727, 0],
 [11.383333333333333, 1],
 [23.810344827586206, 2],
 [7.796296296296297, 3],
 [7.170212765957447, 4],
 [10.08695652173913, 5],
 [9.022727272727273, 6],
 [7.852941176470588, 7],
 [10.25, 8],
 [5.5777777777777775, 9],
 [13.440677966101696, 10],
 [11.051724137931034, 11],
 [9.41095890410959, 12],
 [14.741176470588234, 13],
 [13.233644859813085, 14],
 [38.5948275862069, 15],
 [16.796296296296298, 16],
 [11.46, 17],
 [13.20183486238532, 18],
 [10.8, 19],
 [21.525, 20],
 [16.009174311926607, 21],
 [6.746478873239437, 22],
 [7.985294117647059, 23]]

In [89]:
sorted_swap=sorted(swap_avg_by_hour, reverse=True)

In [90]:
sorted_swap

[[38.5948275862069, 15],
 [23.810344827586206, 2],
 [21.525, 20],
 [16.796296296296298, 16],
 [16.009174311926607, 21],
 [14.741176470588234, 13],
 [13.440677966101696, 10],
 [13.233644859813085, 14],
 [13.20183486238532, 18],
 [11.46, 17],
 [11.383333333333333, 1],
 [11.051724137931034, 11],
 [10.8, 19],
 [10.25, 8],
 [10.08695652173913, 5],
 [9.41095890410959, 12],
 [9.022727272727273, 6],
 [8.127272727272727, 0],
 [7.985294117647059, 23],
 [7.852941176470588, 7],
 [7.796296296296297, 3],
 [7.170212765957447, 4],
 [6.746478873239437, 22],
 [5.5777777777777775, 9]]

In [96]:
for i in sorted_swap[:5]:
    print(i[1])

15
2
20
16
21


In [97]:
print("Top 5 Hours for Ask Posts Comments")
for i in sorted_swap[:5]:
    avg_hour=dt.datetime.strptime(str(i[1]), '%H')
    avg_hour_formatted=avg_hour.strftime('%H:%M')
    avg_comments=i[0]
    print(f"{avg_hour_formatted}: {avg_comments:.2f} average comments per post")

Top 5 Hours for Ask Posts Comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
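Swapping the columns before sorting works, but `sorted` also accepts a `key` function, so the `[hour, average]` pairs can be ranked directly. The sketch below assumes a list shaped like `avg_by_hour`; the four sample pairs are the rounded values printed earlier in the notebook, and `top_hours` is a hypothetical name.

```python
# A few [hour, average] pairs, rounded as printed in the notebook output.
avg_by_hour_sample = [[9, 5.58], [15, 38.59], [2, 23.81], [20, 21.52]]

# Sort by the average (second element), highest first, and keep the top three.
top_hours = sorted(avg_by_hour_sample, key=lambda pair: pair[1], reverse=True)[:3]

for hour, avg_comments in top_hours:
    # :02d pads the hour to two digits, matching the 'HH:00' style used above.
    print(f"{hour:02d}:00: {avg_comments:.2f} average comments per post")
```

Formatting the integer hour with `:02d` gives the same `15:00` style as the `strptime`/`strftime` round trip without creating a `datetime` object.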


**Key Insights on Post Creation Time for Hacker News Engagement**

Optimal Posting Hours: "Ask HN" posts created in the 15:00 (3:00 PM), 02:00 (2:00 AM), and 20:00 (8:00 PM) hours received the highest average comments per post (38.59, 23.81, and 21.52 respectively). The 15:00 figure comes from 4,477 comments across 116 posts, so a few heavily commented threads may account for part of the gap. These hours could be targeted for higher interaction.

Local Time Considerations: The hours above are expressed in the time zone of the dataset's `created_at` timestamps. To apply the insights, users should convert these hours to their own or their target audience's local time zone.
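Converting an hour between time zones can be sketched with fixed UTC offsets from the standard library. The offsets below are assumptions chosen only for illustration: the notebook does not state which time zone the dataset uses, and fixed offsets ignore daylight saving time. `shift_hour`, `SOURCE_OFFSET`, and `TARGET_OFFSET` are hypothetical names.

```python
import datetime as dt


def shift_hour(hour, source_offset_hours, target_offset_hours):
    """Convert an hour of day from one fixed UTC offset to another."""
    source_tz = dt.timezone(dt.timedelta(hours=source_offset_hours))
    target_tz = dt.timezone(dt.timedelta(hours=target_offset_hours))
    # The date is arbitrary; only the hour of the converted time is used.
    moment = dt.datetime(2016, 1, 1, hour, tzinfo=source_tz)
    return moment.astimezone(target_tz).hour


SOURCE_OFFSET = -5  # assumption: dataset timestamps are at UTC-5
TARGET_OFFSET = 1   # assumption: the reader is at UTC+1

converted = {h: shift_hour(h, SOURCE_OFFSET, TARGET_OFFSET) for h in (15, 2, 20)}
for source_hour, local_hour in converted.items():
    print(f"{source_hour:02d}:00 in the dataset -> {local_hour:02d}:00 local")
```

For real use, a named zone from the `zoneinfo` module would handle daylight saving changes, at the cost of needing time zone data on the machine.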

In conclusion, posting time appears to be associated with engagement for "Ask HN" posts in this dataset. Aligning post creation with the peak hours may increase the likelihood of receiving comments and starting a discussion. User behavior still varies with individual preferences and external factors, so these hours are best treated as a starting point for tuning a posting strategy.