mikexcohen/Substack

I have a Substack :)

https://mikexcohen.substack.com/

My Substack posts are each 1000-3000 words (a 5-15 minute read) and cover topics in machine learning, LLMs, applied math, and related technical areas.

Each post has an accompanying code file that reproduces and extends the analyses described in the post. I wrote the code files in Google Colab, so running them in Colab is the easiest way to ensure reproducibility and correct library installations.

The technical posts are organized into two sections, one about data science and one about large language model mechanisms.

Data science, unpacked

Explore core concepts in data science and applied math through clear explanations, equations, and hands-on code. Each post unpacks a single topic, ranging from correlation and covariance to Fourier transforms, fractals, and neural simulations, translating between theory and Python implementation. Every post comes with a Python notebook so you can reproduce the results, experiment with the methods, and apply them to your own projects.

(Hint: Press ctrl or command while clicking the links to open in a new tab.)

| Post title | Code file | Brief description |
| --- | --- | --- |
| Correlation vs. cosine similarity | Correlation_vs_cosineSimilarity.ipynb | Simulate data to learn the math and implementations of correlation and cosine similarity. |
| Zipf's law in famous fiction: characters and GPT4 tokens | ZipfsLaw_charactersTokens.ipynb | Explore character and subword (GPT4 token) frequencies in famous fiction books. |
| The Fourier transform, explained with for-loops | Fourier_with_forloops.ipynb | Learn how the Fourier transform works, using for-loops in Python. |
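As a taste of the first post's theme, here is a minimal illustrative sketch (not taken from the notebook) of the key relationship: Pearson correlation is cosine similarity applied to mean-centered data.

```python
import numpy as np

def cosine_similarity(x, y):
    # Cosine similarity: dot product normalized by the vector magnitudes.
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def correlation(x, y):
    # Pearson correlation: cosine similarity of the mean-centered vectors.
    xc, yc = x - x.mean(), y - y.mean()
    return cosine_similarity(xc, yc)

# Simulate two correlated variables.
rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y = x + rng.standard_normal(100)

r = correlation(x, y)          # Pearson r
c = cosine_similarity(x, y)    # raw cosine similarity (differs unless data are centered)
```

Comparing `r` against `np.corrcoef(x, y)[0, 1]` confirms the equivalence; the notebook explores this in much more depth.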

Dissecting LLMs with ML

Understand how large language models (LLMs) really work by applying machine learning (ML) methods to their internal activations. Each post explores how LLMs process text, isolate patterns, and generate new outputs. You’ll learn how to probe, manipulate, and explain model internals. Every article includes a complete Python notebook so you can reproduce the results, visualize the mechanisms, and extend the experiments further.

(Hint: Press ctrl or command while clicking the links to open in a new tab.)

| Post title | Code file | Brief description |
| --- | --- | --- |
| Drawing text heatmaps to visualize LLM calculations | textHeatmaps_GPT2.ipynb | Learn how to create text heatmaps, and then use them to visualize GPT2 next-token predictions. |
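To illustrate the idea behind a text heatmap, here is a minimal sketch with made-up per-token scores (the actual notebook uses real GPT2 predictions): each token is drawn as text with a background color taken from a colormap.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

# Hypothetical per-token scores in [0, 1] (e.g., next-token probabilities);
# these numbers are illustrative, not from the post's notebook.
tokens = ["The", " cat", " sat", " on", " the", " mat"]
scores = np.array([0.10, 0.85, 0.40, 0.95, 0.99, 0.70])

# Map each score to an RGBA background color via a matplotlib colormap.
colors = plt.cm.viridis(scores)

# Draw each token with its colored background patch.
fig, ax = plt.subplots(figsize=(6, 1))
xpos = 0.02
for tok, col in zip(tokens, colors):
    ax.text(xpos, 0.5, tok, fontsize=12, va="center",
            bbox=dict(facecolor=col, edgecolor="none", pad=3))
    xpos += 0.16
ax.axis("off")
```

The same pattern scales to real model outputs: replace the hard-coded `scores` with per-token values extracted from a model, and the colormap does the rest.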

About

Code files that accompany my Substack posts
