mikexcohen/Substack

I have a Substack :)

https://mikexcohen.substack.com/

My Substack posts are each 1000-3000 words (a 5-15 minute read) and cover topics in machine learning, LLMs, applied math, and related technical areas.

Each post has an accompanying code file that reproduces and extends the analyses described in the post. I wrote the code files in Google Colab, so running them in Colab is the easiest way to ensure reproducibility and correct library installations.

The technical posts are organized into two sections, one about data science and one about large language model mechanisms.

Data science, unpacked

Explore core concepts in data science and applied math through clear explanations, equations, and hands-on code. Each post unpacks a single topic, ranging from correlation and covariance to Fourier transforms, fractals, and neural simulations, translating between theory and Python implementation. Every post comes with a Python notebook so you can reproduce the results, experiment with the methods, and apply them to your own projects.

(Hint: Press ctrl or command while clicking the links to open in a new tab.)

| Post title | Code file | Brief description |
| --- | --- | --- |
| Correlation vs. cosine similarity | Correlation_vs_cosineSimilarity.ipynb | Simulate data to learn the math and implementations of correlation and cosine similarity. |
| Zipf's law in famous fiction: characters and GPT4 tokens | ZipfsLaw_charactersTokens.ipynb | Explore character and subword (GPT4 token) frequencies in famous fiction books. |
| The Fourier transform, explained with for-loops | Fourier_with_forloops.ipynb | Learn how the Fourier transform works, using for-loops in Python. |
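As a taste of the first post's theme, here is a minimal illustrative sketch (not taken from the notebook) of the key relationship: Pearson correlation is cosine similarity applied to mean-centered data.

```python
import numpy as np

def cosine_similarity(x, y):
    # Cosine similarity: dot product normalized by the vector magnitudes.
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def correlation(x, y):
    # Pearson correlation: cosine similarity of the mean-centered vectors.
    xc, yc = x - x.mean(), y - y.mean()
    return cosine_similarity(xc, yc)

# Simulate two correlated variables.
rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y = x + rng.standard_normal(100)

r = correlation(x, y)          # Pearson r
c = cosine_similarity(x, y)    # raw cosine similarity (differs unless data are centered)
```

Comparing `r` against `np.corrcoef(x, y)[0, 1]` confirms the equivalence; the notebook explores this in much more depth.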

Dissecting LLMs with ML

Understand how large language models (LLMs) really work by applying machine learning (ML) methods to their internal activations. Each post explores how LLMs process text, isolate patterns, and generate new outputs. You’ll learn how to probe, manipulate, and explain model internals. Every article includes a complete Python notebook so you can reproduce the results, visualize the mechanisms, and extend the experiments further.

(Hint: Press ctrl or command while clicking the links to open in a new tab.)

| Post title | Code file | Brief description |
| --- | --- | --- |
| Drawing text heatmaps to visualize LLM calculations | textHeatmaps_GPT2.ipynb | Learn how to create text heatmaps, and then use them to visualize GPT2 next-token predictions. |
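To illustrate the idea behind a text heatmap, here is a minimal sketch with made-up per-token scores (the actual notebook uses real GPT2 predictions): each token is drawn as text with a background color taken from a colormap.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

# Hypothetical per-token scores in [0, 1] (e.g., next-token probabilities);
# these numbers are illustrative, not from the post's notebook.
tokens = ["The", " cat", " sat", " on", " the", " mat"]
scores = np.array([0.10, 0.85, 0.40, 0.95, 0.99, 0.70])

# Map each score to an RGBA background color via a matplotlib colormap.
colors = plt.cm.viridis(scores)

# Draw each token with its colored background patch.
fig, ax = plt.subplots(figsize=(6, 1))
xpos = 0.02
for tok, col in zip(tokens, colors):
    ax.text(xpos, 0.5, tok, fontsize=12, va="center",
            bbox=dict(facecolor=col, edgecolor="none", pad=3))
    xpos += 0.16
ax.axis("off")
```

The same pattern scales to real model outputs: replace the hard-coded `scores` with per-token values extracted from a model, and the colormap does the rest.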

About

Code files that accompany my Substack posts
