I'm a developer who's been exploring the world of LLMs as a hobby since 2023. My main focus is on locally run, offline LLMs, which I mostly use for even more hobby tinkering.
I generally do most of my development locally, and most of my work happens on weekends; I'm usually too tired after work to do much on weekday evenings.
I'm quite passionate about the power of workflows with LLMs, and as a developer I generally prefer manual, chat-style interfacing with LLMs, powered by workflows, over leaving a task to an automated agent. There are some exceptions, however; web searching is a great use of agents, IMO.
But as a developer, with the current tech (as of 2025-08), I feel that I can iterate faster and more cleanly sitting between the AI and my code.
UPDATE: 2025-09-27
I'm no longer on Reddit, which had been my main platform since 2023 and where most of y'all likely found and/or interacted with me. Currently, my tech blog (someoddcodeguy.dev) hosts my most important benchmark posts and will contain my future ramblings.
Additionally, I've just started using X/Twitter. I created the account way back in 2023 but never used it until a week or two ago. Since the local AI scene, which is all I really interact with, doesn't have much of a presence there, I currently have all of 0 followers lol. But if you want to see what nonsense I'm up to, you can check either of those two spots.
I started Wilmer during the Llama 2 era, based on the idea that open-weight models at the time were weak as generalists compared to big proprietary models like ChatGPT, but that individual fine-tunes within scoped domains (like coding or medicine) could often compete with them. My goal has always been to find a way, through routing or workflows, to help my local models keep pace with the big APIs.
Obviously, modern open-weight models are strong enough that they don't need that help nearly as much, but that just means the same methods can push them even further.
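To make the routing idea concrete, here's a minimal sketch, not Wilmer's actual code: the model names, keyword classifier, and local endpoint are all hypothetical stand-ins for whatever domain fine-tunes and server you're running.

```python
# Minimal sketch of prompt routing: guess an incoming prompt's domain,
# then dispatch it to a model fine-tuned for that domain. The model names,
# keywords, and OpenAI-compatible endpoint below are all hypothetical.
import requests

# Hypothetical mapping of domains to locally hosted fine-tunes.
DOMAIN_MODELS = {
    "coding": "my-local-coding-finetune",
    "medical": "my-local-medical-finetune",
    "general": "my-local-generalist",
}

DOMAIN_KEYWORDS = {
    "coding": ("code", "function", "bug", "python", "c#"),
    "medical": ("symptom", "diagnosis", "medication"),
}


def classify_domain(prompt: str) -> str:
    """Naive keyword classifier; a real router might ask a small LLM instead."""
    lowered = prompt.lower()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return domain
    return "general"


def route(prompt: str) -> str:
    """Send the prompt to whichever local model matches its domain."""
    model = DOMAIN_MODELS[classify_domain(prompt)]
    # Assumes a local OpenAI-compatible chat server (llama.cpp, KoboldCpp, etc.).
    response = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=300,
    )
    return response.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(route("Why does this Python function throw a KeyError?"))
```

The same skeleton extends naturally to workflows: instead of a single request, each domain can map to a chain of prompts against one or more models.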
I'm not a Python developer by trade; I picked it up to work on Wilmer, and I've been learning it ever since. Some of the mess in the codebases here is tech debt from my fumbling along early on while I was still getting a feel for the language. In my day job, I'm a dev manager who mostly works with C# and web tech.
- The guide is a bit older now, but it still applies. I've automated a lot of this in workflows, but when I'm somewhere like work, I still make use of these techniques.
- Many of my Wilmer workflows are in part inspired by the general workflows I describe here
- M2 Ultra Mac Studio speed tests from freshly loaded models [GitHub Mirror]
- M2 Ultra Mac Studio speed tests utilizing KoboldCpp's context shifting [GitHub Mirror]
- M3 Ultra running Command-A 111b and Llama 3.1 405b [GitHub Mirror]
- M3 Ultra Deepseek V3 Run Speeds and Memory Costs [GitHub Mirror]
- M3 Ultra R1-0528 Run Speeds and Memory Costs + MLA difference [GitHub Mirror]
- Comparison of M2 Max, M2 Ultra and RTX 4090 speeds [GitHub Mirror]
- Comparison of M2 Ultra and M3 Ultra Speeds [GitHub Mirror]