Project History Analyzer

This project reconstructs the development history of a coding project from zip-file snapshots. I know there are better ways to store and track coding projects over time (this one is hosted on GitHub, after all), but I still do what I have done for years: I use batch files to zip up my project directory, append an incremental number or a date to the archive name, and store it in a directory that is backed up to the cloud. This makes it a bit of a challenge to look at the state of the code at some earlier time, but at least it's possible. I also have literally decades of project snapshots stored this way, and I wanted a way to dig into those old projects and reconstruct their history.

The tool diffs consecutive snapshots locally and uses statistical analysis to classify each change by magnitude. It then calls an LLM API to generate narrative descriptions of what changed and why. Finally, it produces a report in Markdown format (I use Obsidian to view them).

How it works

The pipeline has six phases:

Discovery - Finds and sorts zip snapshots by date/version label.
Local diffing - Extracts and diffs all consecutive pairs (no API calls).
Planning - Computes change magnitudes and uses gap-based natural breaks to classify transitions into minor/moderate/major tiers.
Project understanding - Sends the first snapshot's source code to the LLM for an architectural summary.
Analysis - Analyzes each transition at a depth matching its tier. Minor changes get brief summaries. Major transitions use tool-assisted conversations where the LLM can pull specific diffs and file contents on demand.
Report generation - Assembles everything into a markdown narrative.
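To make the Planning phase more concrete, here is a rough Python sketch of gap-based natural breaks. It is not the code in change_analyzer.py: the function names and the assumption that each transition already has a single numeric magnitude are illustrative only. The sketch sorts the distinct magnitudes, treats the two largest jumps between neighboring values as tier boundaries, and assigns each transition a tier by how many boundaries it reaches.

```python
from typing import Dict, Iterable, List

TIERS = ("minor", "moderate", "major")


def natural_break_thresholds(values: Iterable[float], n_breaks: int = 2) -> List[float]:
    """Return up to n_breaks thresholds placed at the largest gaps in the data."""
    distinct = sorted(set(values))
    # Each gap is (size, index of the value just below the gap).
    gaps = [(distinct[i + 1] - distinct[i], i) for i in range(len(distinct) - 1)]
    largest = sorted(gaps, reverse=True)[:n_breaks]
    # A threshold is the first value above each of the chosen gaps.
    return sorted(distinct[i + 1] for _, i in largest)


def classify_transitions(magnitudes: Dict[str, float]) -> Dict[str, str]:
    """Map each transition label to a tier by counting the thresholds it reaches."""
    thresholds = natural_break_thresholds(magnitudes.values())
    return {
        label: TIERS[sum(1 for t in thresholds if m >= t)]
        for label, m in magnitudes.items()
    }


if __name__ == "__main__":
    # Hypothetical magnitudes for six consecutive transitions.
    example = {"v1->v2": 1, "v2->v3": 2, "v3->v4": 3,
               "v4->v5": 20, "v5->v6": 22, "v6->v7": 100}
    print(classify_transitions(example))
```

With magnitudes of 1, 2, 3, 20, 22 and 100, the two largest gaps (3 to 20 and 22 to 100) become the boundaries, so the first three transitions come out minor, the next two moderate, and the last one major. With fewer than three distinct magnitudes, this sketch simply produces fewer tiers.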

Progress is saved after each step, so interrupted runs resume where they left off.
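One common way to get this kind of resumability is a small JSON checkpoint file that records each completed step and its result. The sketch below shows that general idea; it is an assumption, not the actual progress_tracker.py, and ProgressTracker, mark_done and run_steps are hypothetical names.

```python
import json
import os
from pathlib import Path
from typing import Any, Callable, Iterable


class ProgressTracker:
    """Minimal JSON-backed record of completed steps (hypothetical API)."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        if self.path.exists():
            self.state = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.state = {"completed": {}}

    def is_done(self, step: str) -> bool:
        return step in self.state["completed"]

    def result(self, step: str) -> Any:
        return self.state["completed"].get(step)

    def mark_done(self, step: str, result: Any) -> None:
        self.state["completed"][step] = result
        # Write to a temporary file first, then swap it in, so a crash
        # mid-write leaves the previous progress file intact.
        tmp = self.path.with_name(self.path.name + ".tmp")
        tmp.write_text(json.dumps(self.state, indent=2), encoding="utf-8")
        os.replace(tmp, self.path)


def run_steps(steps: Iterable[str], tracker: ProgressTracker,
              work: Callable[[str], Any]) -> None:
    """Run each step once; steps recorded by an earlier run are skipped."""
    for step in steps:
        if tracker.is_done(step):
            continue
        tracker.mark_done(step, work(step))
```

Saving after every step means an interrupted run loses at most the step that was in progress, which matters when each step is a paid API call.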

Example output

SimpleCCompiler history report - generated from 57 snapshots of a simple C compiler project.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Requirements

Python 3.10+
anthropic and/or openai package
An API key for at least one provider

Setup

Clone the repo
Install dependencies: pip install anthropic openai

Create api_keys.py with your keys:

anthropic_key = "sk-ant-..."
openai_key = "sk-..."

Edit config.json to set zip_directory to the folder containing your zip snapshots

Snapshots should be zip files named like ProjectName_YYYYMMDD.zip, ProjectName_v2.zip, ProjectName_0003.zip, etc. The tool auto-detects several date and version naming patterns.
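As an illustration of the kind of name parsing involved, here is a sketch that handles only the three example patterns above. The regular expression, the Snapshot tuple and the rule that eight-digit labels are tried as YYYYMMDD dates first are assumptions; the real snapshot_discovery.py detects more patterns and may order them differently.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional

# Project name, an underscore, then an optional "v" and digits, then ".zip".
SNAPSHOT_RE = re.compile(r"^(?P<project>.+?)_(?P<label>v?\d+)\.zip$", re.IGNORECASE)


class Snapshot(NamedTuple):
    project: str
    label: str
    sort_key: tuple


def parse_snapshot_name(filename: str) -> Optional[Snapshot]:
    """Return a Snapshot for a recognized filename, or None otherwise."""
    m = SNAPSHOT_RE.match(filename)
    if not m:
        return None
    project, label = m.group("project"), m.group("label")
    digits = label.lstrip("vV")
    if len(digits) == 8:
        try:
            date = datetime.strptime(digits, "%Y%m%d")
            return Snapshot(project, label, (0, date.toordinal()))
        except ValueError:
            pass  # eight digits that are not a valid date: treat as a number
    return Snapshot(project, label, (1, int(digits)))


def sort_snapshots(filenames: Iterable[str]) -> List[Snapshot]:
    """Parse filenames, drop unrecognized ones, and sort chronologically."""
    parsed = [s for s in map(parse_snapshot_name, filenames) if s is not None]
    return sorted(parsed, key=lambda s: s.sort_key)
```

Sorting on the numeric key keeps v2 ahead of v10, which a plain string sort would get wrong.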

Usage

# List available projects (auto-detected from zip filenames)
python analyze_project.py --list-projects

# Analyze a project
python analyze_project.py <project_name>

# Preview the analysis plan without making API calls
python analyze_project.py <project_name> --plan-only

# Compare two specific snapshots
python analyze_project.py <project_name> --drill-down <label_A> <label_B>

# Override defaults
python analyze_project.py <project_name> --zip-dir PATH --output-dir PATH --model MODEL_NAME

Output

output/<project_name>_history.md - The generated report
output/<project_name>_progress.json - Resume state
output/api_cache.json - Cached API responses (avoids duplicate calls on re-runs)
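The cache is what makes re-runs cheap. Here is a minimal sketch of the idea, assuming responses are keyed by a hash of the model name and the request messages. The real key in utils/api_cache.py may include more, and ApiCache, get_or_call and the message format are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable, List


class ApiCache:
    """JSON file mapping a request hash to the stored response (hypothetical API)."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.entries = (json.loads(self.path.read_text(encoding="utf-8"))
                        if self.path.exists() else {})

    @staticmethod
    def key(model: str, messages: List[dict]) -> str:
        # sort_keys makes the hash independent of dict key order.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, messages: List[dict],
                    call: Callable[[str, List[dict]], Any]) -> Any:
        """Return a cached response, or make the call and persist its result."""
        k = self.key(model, messages)
        if k not in self.entries:
            self.entries[k] = call(model, messages)
            self.path.write_text(json.dumps(self.entries), encoding="utf-8")
        return self.entries[k]
```

Because the key covers the model name, switching models produces fresh responses instead of reusing ones from a different model.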

Configuration

config.json controls model selection, file paths, and binary extension filtering. CLI flags override zip_directory, output.directory, and current_engine.
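The override order can be sketched with argparse. This sketch assumes that --zip-dir, --output-dir and --model map to zip_directory, output.directory and current_engine respectively, and that output is a nested object in config.json. apply_cli_overrides and build_parser are hypothetical helpers, not the code in utils/config.py.

```python
import argparse
import copy


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser()
    p.add_argument("project_name", nargs="?")
    p.add_argument("--zip-dir")      # stored as args.zip_dir
    p.add_argument("--output-dir")   # stored as args.output_dir
    p.add_argument("--model")
    return p


def apply_cli_overrides(config: dict, args: argparse.Namespace) -> dict:
    """Return a copy of config where any flag that was given wins."""
    merged = copy.deepcopy(config)  # leave the loaded config untouched
    if args.zip_dir is not None:
        merged["zip_directory"] = args.zip_dir
    if args.output_dir is not None:
        merged.setdefault("output", {})["directory"] = args.output_dir
    if args.model is not None:
        merged["current_engine"] = args.model  # assumed mapping
    return merged
```

Flags that are not given stay None, so values from config.json survive unless the command line explicitly replaces them.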

Project structure

analyze_project.py          CLI entry point, orchestrates the pipeline
snapshot_discovery.py       Finds and sorts zip snapshots by project name
snapshot_diff.py            Extracts zips, diffs file trees, detects moves
change_analyzer.py          Magnitude calculation, breakpoint detection, tier planning
llm_analysis.py             LLM prompting for summaries and change analysis
tool_assisted_analysis.py   Multi-turn tool-calling for major transitions
progress_tracker.py         JSON-based resumability
report_generator.py         Markdown report assembly
utils/ai_client.py          Anthropic and OpenAI client abstraction
utils/api_cache.py          Local response cache
utils/config.py             Configuration loading
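For a sense of what "detects moves" can mean in snapshot_diff.py, here is a simplified sketch. It reads each zip into a path-to-bytes map and reports a file as moved when identical content disappears from one path and appears at another. This is only an assumption about the approach: exact-hash matching misses files that were moved and edited in the same step, and read_zip_tree and diff_trees are hypothetical names.

```python
import hashlib
import zipfile
from typing import Dict, List, Tuple


def read_zip_tree(zip_path) -> Dict[str, bytes]:
    """Map every file path in the archive to its contents, skipping directories."""
    with zipfile.ZipFile(zip_path) as zf:
        return {info.filename: zf.read(info) for info in zf.infolist() if not info.is_dir()}


def diff_trees(old: Dict[str, bytes], new: Dict[str, bytes]) -> dict:
    """Classify paths as added, removed, modified, or moved (exact content match)."""
    old_paths, new_paths = set(old), set(new)
    added = new_paths - old_paths
    removed = old_paths - new_paths
    modified = sorted(p for p in old_paths & new_paths if old[p] != new[p])

    # Index removed files by content hash so identical content reappearing
    # under a new path is reported as a move rather than a delete plus an add.
    removed_by_hash: Dict[str, List[str]] = {}
    for p in sorted(removed):
        removed_by_hash.setdefault(hashlib.sha256(old[p]).hexdigest(), []).append(p)

    moved: List[Tuple[str, str]] = []
    for p in sorted(added):  # iterate over a sorted copy so we can mutate the sets
        candidates = removed_by_hash.get(hashlib.sha256(new[p]).hexdigest())
        if candidates:
            src = candidates.pop(0)
            moved.append((src, p))
            removed.discard(src)
            added.discard(p)

    return {"added": sorted(added), "removed": sorted(removed),
            "modified": modified, "moved": moved}
```

Reporting moves separately keeps a directory reorganization from showing up as a large number of deletions and additions, which would otherwise inflate the change magnitude.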
