The latest "Ask HN: What are you working on?" thread just dropped. And to give my own answer: building structured datasets!
You can download the dataset as a CSV here: https://github.com/getomni-ai/datasets/blob/main/hn_projects_dataset.csv
Or query it directly with SQL using the connection details below. Note that this is a temporary table with read-only permissions.
HOST=aws-0-us-east-1.pooler.supabase.com
PORT=6543
DATABASE=postgres
USER=postgres.raeysmhjbudociwvbwre
PASSWORD=!HZRdGLiiFC5iRj
TABLE=hn_projects_august
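Some clients want a single connection URI rather than separate fields. The sketch below shows one way to combine the values above using only the Python standard library. The helper name `build_postgres_uri` is hypothetical, and percent-encoding the credentials is my assumption, so that the `!` in the password survives URI parsing.

```python
from urllib.parse import quote

# Connection fields copied from the block above.
SETTINGS = {
    "HOST": "aws-0-us-east-1.pooler.supabase.com",
    "PORT": "6543",
    "DATABASE": "postgres",
    "USER": "postgres.raeysmhjbudociwvbwre",
    "PASSWORD": "!HZRdGLiiFC5iRj",
}


def build_postgres_uri(settings: dict[str, str]) -> str:
    """Combine the separate fields into a postgresql:// URI (hypothetical helper)."""
    # Percent-encode credentials so characters such as "!" are safe inside a URI.
    user = quote(settings["USER"], safe="")
    password = quote(settings["PASSWORD"], safe="")
    return (
        f"postgresql://{user}:{password}"
        f"@{settings['HOST']}:{settings['PORT']}/{settings['DATABASE']}"
    )


URI = build_postgres_uri(SETTINGS)
print(URI)
```

The resulting string should work with `psql` or any driver that accepts libpq-style URIs. From there you can query the `hn_projects_august` table.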
The full list of all the open source projects is at the end.
I wrote a quick scraper for the HN comments. It just pulls every top-level comment along with its replies as a nested object.
This ended up pulling 642 top-level comments with about 458 replies. I created a Postgres DB with this original data set. I concatenated the replies together in the order they came in (with an indent field to mark what level each comment was at), then stringified the JSON array and added it to the DB.
{
id: 99,
created_at: '2024-08-25T18:19:28.756364+00:00',
hn_user: 'spuds',
comment: 'Helping others with their mental health (after my own struggles).Worked as a software dev/manager for a decade, went through workaholism, burnout, then alcoholism, depression, all that. Doing a ton better now, and taking some time off to write about what I went through and hopefully help out others going through the same thing some: https://depthsofrepair.com/',
replies: `[{"commentText":"This is helpful to me (and many others, I'm sure) and I look forward to reading more. Subscribed.","commentUser":"nowami","indent":"1","children":[]}]`,
reply_count: 1
}
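The flattening step can be sketched roughly like this in Python. This is not the actual scraper code. The input shape (`commentText`, `commentUser`, `children`) is assumed from the sample row above, the function name `flatten_replies` is hypothetical, and the thread data is made up. The sample row also keeps a `children` key on each flattened reply, which this sketch drops for brevity.

```python
import json


def flatten_replies(replies: list[dict], indent: int = 1) -> list[dict]:
    """Flatten a nested reply tree into a list, depth-first, in page order."""
    flat = []
    for reply in replies:
        flat.append(
            {
                "commentText": reply["commentText"],
                "commentUser": reply["commentUser"],
                # Stored as a string to match the sample row above.
                "indent": str(indent),
            }
        )
        # Replies to this reply follow it directly, one level deeper.
        flat.extend(flatten_replies(reply.get("children", []), indent + 1))
    return flat


# Made-up thread for illustration; not taken from the dataset.
nested = [
    {
        "commentText": "Nice project!",
        "commentUser": "alice",
        "children": [
            {"commentText": "Agreed.", "commentUser": "bob", "children": []}
        ],
    },
    {"commentText": "Is it open source?", "commentUser": "carol", "children": []},
]

flat = flatten_replies(nested)
replies_column = json.dumps(flat)  # the string that would go in `replies`
reply_count = len(flat)
print(reply_count, replies_column)
```

Reading a row back is the reverse: `json.loads` on the `replies` column returns the list, and `indent` tells you how deep each reply sat in the thread.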
I did this using my own tool of course (Omni).
I pulled out the following values:
project_category
- Enum - PERSONAL_PROJECT, STARTUP, SELF_IMPROVEMENT, OTHER
is_open_source
- Boolean
github_link
- String
project_industry
- Enum - SOFTWARE_DEVELOPMENT, HEALTHCARE, EDUCATION, TRANSPORTATION, etc.
one_liner
- String - A one line pitch for the project
tech_stack
- String[]
reply_sentiment
- Num - Sentiment between 0 and 2 for the comment replies
demo_link
- String
ai_project
- Boolean
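For anyone who wants to work with these fields in code, here is a minimal sketch of the schema as a Python dataclass. This is not how Omni defines it. The class names are hypothetical, which fields are optional is my assumption, and the example values are made up. `project_industry` is left as a plain string because the full list of industries isn't given here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ProjectCategory(Enum):
    PERSONAL_PROJECT = "PERSONAL_PROJECT"
    STARTUP = "STARTUP"
    SELF_IMPROVEMENT = "SELF_IMPROVEMENT"
    OTHER = "OTHER"


@dataclass
class ExtractedProject:
    project_category: ProjectCategory
    is_open_source: bool
    project_industry: str  # an enum in the real schema; values not fully listed
    one_liner: str
    ai_project: bool
    github_link: Optional[str] = None  # assumed optional
    demo_link: Optional[str] = None  # assumed optional
    tech_stack: list[str] = field(default_factory=list)
    reply_sentiment: Optional[float] = None  # 0 (most negative) to 2

    def __post_init__(self) -> None:
        # Reject sentiment scores outside the 0 to 2 scale.
        if self.reply_sentiment is not None and not 0 <= self.reply_sentiment <= 2:
            raise ValueError("reply_sentiment must be between 0 and 2")


# Made-up example values, not a row from the dataset.
example = ExtractedProject(
    project_category=ProjectCategory.STARTUP,
    is_open_source=True,
    project_industry="SOFTWARE_DEVELOPMENT",
    one_liner="Example Open Source Tool For Demonstration",
    ai_project=False,
    tech_stack=["Python", "Postgres"],
    reply_sentiment=1.5,
)
print(example)
```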
Example setup:
All the results are stored in a Postgres DB, so we can write SQL for all the analysis and plug into existing tools like Metabase for some visualizations.
select project_category, count(*) from hn_projects_august group by 1 order by 2 desc;
Reply sentiment was judged on a 0 to 2 scale (with 0 being the most negative). The overall result was 1.57, so largely positive.
select avg(reply_sentiment::float) from hn_projects_august
where reply_sentiment is not null;
How does that break down by project_category? Do HN commenters favor personal projects over startups?
SELECT ROUND(CAST(AVG(reply_sentiment::float) AS numeric), 2) AS avg_sentiment, project_category
FROM hn_projects_august
WHERE reply_sentiment IS NOT NULL
GROUP BY project_category
ORDER BY 1 DESC;
The answer is yes! Commenters favor self improvement posts the most, and startups the least.
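If you want to try the shape of this query without connecting to the database, here is a small sketch using Python's built-in `sqlite3` module. SQLite stands in for Postgres here, so the cast syntax differs slightly, and the rows are made up rather than taken from the dataset. Storing `reply_sentiment` as text is an assumption based on the `::float` casts in the queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hn_projects_august (project_category TEXT, reply_sentiment TEXT)"
)

# Made-up rows; a NULL sentiment stands for a comment with no scored replies.
rows = [
    ("STARTUP", "1"),
    ("STARTUP", "2"),
    ("PERSONAL_PROJECT", "2"),
    ("PERSONAL_PROJECT", "2"),
    ("SELF_IMPROVEMENT", None),
]
conn.executemany("INSERT INTO hn_projects_august VALUES (?, ?)", rows)

query = """
    SELECT ROUND(AVG(CAST(reply_sentiment AS REAL)), 2) AS avg_sentiment,
           project_category
    FROM hn_projects_august
    WHERE reply_sentiment IS NOT NULL
    GROUP BY project_category
    ORDER BY 1 DESC
"""
result = conn.execute(query).fetchall()
print(result)
```

Note what the `WHERE` clause does here: the category whose only row has a NULL sentiment drops out of the result entirely, instead of showing up with a NULL average.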
The same query can be applied to the is_open_source classification, with obvious results.
SELECT ROUND(CAST(AVG(reply_sentiment::float) AS numeric), 2) AS avg_sentiment, is_open_source
FROM hn_projects_august
WHERE reply_sentiment IS NOT NULL
GROUP BY is_open_source
ORDER BY 1 DESC;
Surprisingly for HN, AI projects were favored slightly ahead of non-AI projects:
SELECT ROUND(CAST(AVG(reply_sentiment::float) AS numeric), 2) AS avg_sentiment, ai_project
FROM hn_projects_august
WHERE reply_sentiment IS NOT NULL
GROUP BY ai_project
ORDER BY 1 DESC;
I started off by classifying the comments into ~25 different industries. This is a better classification for the STARTUP projects, as some of the PERSONAL_PROJECT comments don't really need an industry classifier. For example, one guy said he was working on his car, which got the AUTOMOTIVE tag.
The number one project type was SOFTWARE_DEVELOPMENT, which boiled down to primarily dev tools.
select project_industry, count(*) from hn_projects_august group by 1 order by 2 desc;
Software development also dominated in reply count:
select project_industry, sum(reply_count::int) reply_count from hn_projects_august group by 1 order by 2 desc;
This one is a bit hard since HN doesn't display explicit upvotes. But since popularity is roughly determined by position on the page, and because I scraped the comments sequentially, we can use the id as a proxy. Note this is a pretty fuzzy popularity score.
with total_comments as (
select count(*) as count from hn_projects_august
)
select
project_industry,
ROUND(AVG(((select count from total_comments)::int - id::int)), 0) popularity,
ROUND(avg(reply_count::int), 2) reply_count
from hn_projects_august
group by 1 order by 3 desc;
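To make the arithmetic concrete, here is the same calculation sketched in plain Python on a handful of made-up rows. It assumes ids start at 1 and follow page order, so the popularity of a comment is the total comment count minus its id, and a lower id means a higher score.

```python
from collections import defaultdict

# Made-up rows: (id, project_industry, reply_count).
rows = [
    (1, "AUTOMOTIVE", 4),
    (2, "SOFTWARE_DEVELOPMENT", 2),
    (3, "HEALTHCARE", 1),
    (4, "SOFTWARE_DEVELOPMENT", 0),
]

total_comments = len(rows)

# Collect (popularity, reply_count) pairs per industry.
by_industry = defaultdict(list)
for comment_id, industry, reply_count in rows:
    popularity = total_comments - comment_id  # higher on the page => larger score
    by_industry[industry].append((popularity, reply_count))

# Average both values per industry, mirroring the two ROUND(AVG(...)) columns.
summary = {
    industry: (
        round(sum(p for p, _ in pairs) / len(pairs)),
        round(sum(r for _, r in pairs) / len(pairs), 2),
    )
    for industry, pairs in by_industry.items()
}

# Sort by average reply count, descending, like ORDER BY 3 DESC.
for industry, (popularity, avg_replies) in sorted(
    summary.items(), key=lambda item: item[1][1], reverse=True
):
    print(industry, popularity, avg_replies)
```

With only one comment per industry the score is just that comment's position, which is part of why this is a fuzzy measure for the smaller categories.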
Looking at the one_liner column sorted by id, the top post was the DIY Bike Battery, which got placed in the AUTOMOTIVE category since there wasn't a better fit.
I've only been playing with the data for a couple hours, so still some interesting items I want to pull out. If anyone has some thoughts on new columns to add, just drop me a note! (tyler@getomni.ai)
Here's a full dump of all the open source projects from the post.
github_link | one_liner | project_category |
---|---|---|
https://github.com/mlang/mc1 | SuperCollider Reimagined Using Python and JIT Compilation | PERSONAL_PROJECT |
https://github.com/jonroig/usBabyNames.js | App Simplifies Baby Name Selection with Data Driven Filters | PERSONAL_PROJECT |
https://github.com/Pulselyre/UpbeatUI | Developing Pulselyre Touchfocused Windows App For Live Electronic Music | PERSONAL_PROJECT |
https://github.com/ziolko/eink-calendar-display | Designing Custom Hardware for Open Source SaaS Project | PERSONAL_PROJECT |
https://github.com/upvpn/upvpn-app | Modern Serverless VPN Explores App Store Publishing Process | STARTUP |
https://github.com/incidentalhq/incidental | Open Source Incident Management Platform Seeking Early Feedback | PERSONAL_PROJECT |
https://github.com/GauntletWizard/cfssl/tree/ted/constraints | Bringing Enterprise-Grade Encryption and CA Infrastructure to Small Businesses | PERSONAL_PROJECT |
https://github.com/mikewarot/Bitgrid | Exploring GitHub Projects And Developing New Skills | PERSONAL_PROJECT |
https://github.com/hsnice16/golang_learning | Transition to Full-Stack Development and Documenting GoLang Learning | SELF_IMPROVEMENT |
https://github.com/KaliedaRik/Scrawl-canvas | Maintaining and Improving My Canvas Library with New Filters | PERSONAL_PROJECT |
https://codeberg.org/treyd/ecksport | Developing a Protocol Library for Byte-Oriented Data Structures | PERSONAL_PROJECT |
https://github.com/dickeyy/passwords | Open Source Encrypted Password Manager Self Hosted And Customizable | PERSONAL_PROJECT |
https://github.com/cutestuff/FoodDepressionConundrum/blob/ma...latest | Probiotic Solution Eases Plant Digestion Challenges | PERSONAL_PROJECT |
https://github.com/MeoMix/symbiants | Colony Simulation Game in Rust for Personal Growth | PERSONAL_PROJECT |
https://github.com/anacrolix/possum | Possum: Efficient Disk-Backed Cache for File I/O and Snapshots | PERSONAL_PROJECT |
https://github.com/ubavic/mint | Development Update on Custom Markup Language Atex and Its Compiler | PERSONAL_PROJECT |
https://github.com/linkwarden/linkwarden | Self-Hostable Open-Source Collaborative Bookmark Manager Available | PERSONAL_PROJECT |
https://github.com/domino14 | Code Debugging And Scrabble Apps With LLMs And AI | STARTUP |
https://github.com/masto/LED-Marquee | Documenting Personal Project on YouTube and GitHub After Leaving Google | PERSONAL_PROJECT |
https://github.com/brendanv/lynx and https://github.com/brendanv/lynx-v2 | Personal Read It Later Service Turned SPA for Learning Go | PERSONAL_PROJECT |
https://github.com/kilroyjones/series_game_from_scratch | Learning IoUring for Fun with Rust Web Game Development | PERSONAL_PROJECT |
https://github.com/hrkck/MyApps/wiki | MyApps App of Apps with Infinite 2D Space and P2P Sync | PERSONAL_PROJECT |
https://github.com/rumca-js/Django-link-archive | Title: Developing an RSS Reader and Web Scraper | PERSONAL_PROJECT |
https://github.com/DDoS/Cadre | Exploring Enhanced E Ink Displays For Smart Picture Frames | PERSONAL_PROJECT |
https://github.com/Vija02/TheOpenPresenter | Open Source Presenter Software for Events and Digital Signage | PERSONAL_PROJECT |
https://github.com/itsOwen/CyberScraper-2077 | CyberScraper 2077 Web Scraper Powered By LLM | PERSONAL_PROJECT |
fujiapple852/trippy#860 | Forward and Backward Packet Loss Heuristics in Trippy | PERSONAL_PROJECT |
https://github.com/bytechefhq/bytechef | ByteChef Open Source API Integration And Workflow Automation Platform | STARTUP |
https://github.com/AvitalTamir/cyphernetes/ | Cyphernetes Innovative Query Language For Kubernetes API | PERSONAL_PROJECT |
https://github.com/willswire/checkd | Exploring Device Authentication with Checkd and Apple's DeviceCheck API | PERSONAL_PROJECT |
https://github.com/sdedovic/wgsltoy | WGSL Toy A WebGPU Playground for Shader Development | PERSONAL_PROJECT |
https://github.com/Trint-ai/TrintAI | TrintAI Open Source Speech to Text and Analysis Tool | STARTUP |
https://github.com/trynova/nova | Nova Data-Oriented JavaScript Engine | PERSONAL_PROJECT |
https://github.com/AmberSahdev/Open-Interface | Title: Open Source LLM Based Autopilot for Multiple OS | PERSONAL_PROJECT |
https://github.com/dvasanth/kadugu | Title: Building a Blazing Speed VPN in Minimal Lines | PERSONAL_PROJECT |
https://github.com/ptah-sh/ptah-server | Open Source Alternative To Heroku With Key Features | STARTUP |
http://github.com/CWood-sdf/banana | HTML Renderer for Neovim Plugins Called Banana | PERSONAL_PROJECT |
http://github.com/leftmove/facebook.js | Developing Facebook.js A Modern API Wrapper for Facebook | PERSONAL_PROJECT |
https://github.com/amoffat/manifest | Title: Python Library for Simplifying LLM Calls | PERSONAL_PROJECT |
https://github.com/humishum/hacker_news_keys | Creating a Hacker News Browser Extension Using LLMs | PERSONAL_PROJECT |
https://github.com/csjh/pest | Efficient Row-Based Serialization Format with TypeScript Type Safety | PERSONAL_PROJECT |
https://github.com/WillAdams/gcodepreviewbig | Creating a Python Enhanced OpenSCAD Library for CNC Projects | PERSONAL_PROJECT |
https://github.com/james-a-rob/KodaStream | Title: Interactive Video API for Shoppable Live Streams | STARTUP |
https://github.com/chaosharmonic/escapeHatch | Developing a Lightweight Job Search Tool With Custom Features | PERSONAL_PROJECT |
https://github.com/JUSTSUJAY/Django_Projects | Embracing Virtual Presence and Mastering Django for Impactful Development | SELF_IMPROVEMENT |
https://github.com/memfreeme/memfree | MemFree AI Search Engine for Instant Accurate Answers | STARTUP |
https://dot-and-box.github.io/dot-and-box/ | Dot And Box Offers Animations Visualizing Algorithms | PERSONAL_PROJECT |
https://github.com/mliezun/caddy-snake | Title: Integrating HTTP Requests for Python Apps Using Go | PERSONAL_PROJECT |
https://github.com/BigJk/end_of_eden | Terminal-Based Deck Builder Game With Mouse Support And Image | PERSONAL_PROJECT |
https://github.com/learnbyexample/TUI-apps | Updated Vim Guide Published and New Python TUI App Development | PERSONAL_PROJECT |
https://github.com/rybarix/snaptail | Exploring Single Source File Applications | PERSONAL_PROJECT |
https://github.com/dvasanth/kadugu | Building Blazing Speed VPN In Less Than 1000 Lines Of Code | PERSONAL_PROJECT |
https://github.com/claceio/clace | Clace App Server For Multi Language Containerized Applications | STARTUP |
https://github.com/leondz/garak | Contribution to LLM Vulnerability Scanner Alongside Final Year Studies | PERSONAL_PROJECT |
https://github.com/coreyp1/CTang | Developing CTang A Modern Scripting Language | PERSONAL_PROJECT |
https://github.com/ayinke-llc/malak | Title: Developing an OSS Relationship Hub for Founders and Investors | STARTUP |
https://github.com/fedi-e2ee/public-key-directory-specificat | Building an Open Source Federated Public Key Directory | PERSONAL_PROJECT |
https://github.com/laktak/chkbit | Title: Simplifying Cross-Platform Builds by Rewriting chkbit in Go | PERSONAL_PROJECT |
https://github.com/latebit/latebit-engine | Game Engine for Coders with Integrated Tools in VSCode | PERSONAL_PROJECT |
https://github.com/andrew-johnson-4/lambda-mountain | Verifiable Correctness in Compiler Agnostic Programs | PERSONAL_PROJECT |
https://github.com/spirobel/mininext | Mininext Merges Index PHP Simplicity With NPM And TypeScript | PERSONAL_PROJECT |
https://github.com/styluslabs/maps | Title: Open Source Maps Application | OTHER |
https://github.com/thebigG/GunnerIt | Title: Passion Project Scrolling Shooter Inspired By Strike Gunner STG | PERSONAL_PROJECT |
https://github.com/itissid/privyloci | Zero Trust Proposal for Mobile Location Permission Control | PERSONAL_PROJECT |
https://github.com/brainless/dwata/tree/feature/prepare_mvp_ | Open Source App for Emails with AI Features | STARTUP |
https://github.com/rprtr258/pm | Title: Simple Linux Process Manager | PERSONAL_PROJECT |
https://github.com/jasiek/webprog-anytone | Programming Anytone AT878 DMR Radios via Web Browser | PERSONAL_PROJECT |
https://github.com/moj-analytical-services/splink | Version 4 Released for Data Deduplication and Linkage Library | PERSONAL_PROJECT |
https://github.com/preludejs | Developing Standard Libraries for TypeScript and JavaScript | PERSONAL_PROJECT |
https://github.com/b00bl1k/uwan | LoRaWAN Node Device Library Development and Documentation Summary | PERSONAL_PROJECT |
https://github.com/codetiger/PowerTiger | PowerTiger Open Source Energy Monitoring Solution Using RPi Pico W | PERSONAL_PROJECT |
https://github.com/laudspeaker/laudspeaker | Techno Thriller Panopticon Explores Encryption and Espionage | PERSONAL_PROJECT |
https://github.com/certeu/morio | Morio Streamlines Observability Data for Traditional On-Premises Infrastructure | PERSONAL_PROJECT |
https://github.com/iterative/datachain | Title: Out Of Memory Dataframe For Wrangling Unstructured Data At Scale | OTHER |
https://github.com/petabyt/libui-touch | Unfinished C-Based Alternative to React Native | PERSONAL_PROJECT |
https://github.com/David-OConnor/plascad | PlasCAD Molecular Biology Plasmid Editor Seeking Feedback | PERSONAL_PROJECT |
https://github.com/tttapa/Control-Surface | Repurposing a Teensy Synth into a MIDI Controller | PERSONAL_PROJECT |
https://github.com/cmakafui/batchwizard | BatchWizard A Command Line Tool for OpenAI Batch Jobs | PERSONAL_PROJECT |
https://github.com/ruuda/rcl | Title: Enhancements to RCL Language with Float Support and Query Shorthand | PERSONAL_PROJECT |
https://github.com/Eccentric-Anomalies/Tungsten-Moon-Demo-Re | Tungsten Moon VR Desktop Spaceflight Simulator Nears Early Access Release | PERSONAL_PROJECT |
https://github.com/matry/editor | Keyboard-Driven UI Editor Inspired by Vim and Webflow | PERSONAL_PROJECT |
https://github.com/ssherman/weighted_list_rank | Title: Improving Book Ranking Algorithm Through Collaboration With Data Scientists | PERSONAL_PROJECT |
https://github.com/DefGuard | DefGuard Open Source SSO Integrating Wireguard and OIDC | STARTUP |
https://github.com/glaretechnologies/substrataCustom | Open Source Metaverse With Custom 3D Engine And Voice Chat | OTHER |
https://github.com/ibizaman/selfhostblocks | Self Host Blocks A Modular Server Management Tool | PERSONAL_PROJECT |
https://github.com/golang-malawi/qatarina | Building a User Acceptance Testing Platform with Go and Encouraging Go Adoption | PERSONAL_PROJECT |
https://github.com/curveball/a12n-server | Title: Open Source OAuth2 Server Competing with Auth0 | OTHER |
https://github.com/jaronilan/stories | Title: Finishing a Yearlong Short Story About SEO | PERSONAL_PROJECT |
https://github.com/featurevisor/featurevisor | Title: Open Source Tool for Declarative Feature Management | STARTUP |
https://github.com/patrulek/modernRX | Upgrading AVX2 to AVX512 in RandomX Algorithm Reimplementation | PERSONAL_PROJECT |
https://github.com/mseravalli/grizol | Grizol: Syncthing Compatible Client Leveraging Rclone Backends | PERSONAL_PROJECT |
https://github.com/beef331/potato | Hot Code Reloading Library for Nim Game Framework | PERSONAL_PROJECT |
https://github.com/chrisdavies/atomic-css | Zero-Dependency Bun Application with Tailwind-Like Layer | PERSONAL_PROJECT |
https://github.com/carlnewton/habitat | Developing Habitat A Self Hosted Social Platform | PERSONAL_PROJECT |
https://github.com/achristmascarl/rainfrog | Rainfrog Postgres Management Terminal with Vim-like Keybindings | PERSONAL_PROJECT |
https://github.com/aravinda0/qtile-bonsai | Qtile Bonsai Completion and Future PDF Parser Plans | PERSONAL_PROJECT |
https://github.com/emlearn/emlearn-micropython | MicroPython for Machine Learning on Microcontroller Sensors | PERSONAL_PROJECT |
https://github.com/julien040/anyquery | Building Anyquery An SQL Query Engine For Diverse Data Sources | PERSONAL_PROJECT |
https://github.com/elijah-potter/harper | Improving On-Device Grammar Checker with Minimal Resource Use | PERSONAL_PROJECT |
https://bgammon.org/code | Open Source Online Backgammon Project Inspired By Lichess | PERSONAL_PROJECT |
https://gist.github.com/skittleson/705624a8f6967187096091cbd | Bluetooth Low Energy Wall of Sheep Toy App | PERSONAL_PROJECT |
https://github.com/elixir-error-tracker/error-tracker | Elixir Based Error Reporting Solution | PERSONAL_PROJECT |
http://github.com/kviklet/kviklet | Streamlining SQL Query Reviews to Prevent Costly Errors | PERSONAL_PROJECT |
https://github.com/opslane/opslane | Building a Copilot for Oncall Engineers Reducing Grunt Work | STARTUP |
https://github.com/pierreyoda/hncli | Developing a Rust Based Hacker News TUI Reader | PERSONAL_PROJECT |
https://github.com/ibudiallo/automated-agents-book | Creating A Comprehensive Guide On Building Effective Chatbots | PERSONAL_PROJECT |
https://github.com/jhspetersson/git-task | Git Task: Local Task Manager and Bug Tracker in Git | PERSONAL_PROJECT |
https://github.com/madprops/goldie | Firefox Vertical Tabs Powerful Python Chat Client Nim Text Finder | PERSONAL_PROJECT |
https://github.com/av/harbor | Building a Toolkit to Save Time Using Local LLMs | PERSONAL_PROJECT |
https://github.com/captn3m0/ideas | Curated Events Platform For Bangalore Using Open Source Tools | STARTUP |