Redirx: Automated 301 Redirect Generation for Website Migrations

Redirx is a student project at the Georgia Institute of Technology.

Redirx consists of four components:

A website: the user-facing component of Redirx. It orchestrates the interaction between itself, the Python module, and the SQL database.
A SQL database: ensures that user data persists across sessions.
A Python module: implements the core logic of Redirx.
A Python script: used to interact with the website.

User uploads CSVs (URLs only, optionally with status codes)
PRUNING PHASE #1 (no scraping yet)
• Exact URL matches (after normalization)
• Obvious URL pattern matches
• Exclude blog posts (/blog/, /YYYY/, etc.)
• Exclude static assets and admin URLs
• Exclude URLs that return 3xx or 4xx status codes on the old site
Remaining URLs → queued for scraping
SCRAPE PHASE (unmatched URLs only)
• Scrape old-site URLs
• Scrape new-site URLs
• Extract the title, h1s, meta description, and main content
• Store the raw content and the cleaned text
PRUNING PHASE #2 (basic content scraping)
• Exact HTML matches
EMBEDDING PHASE
• Generate vector embeddings from the content
• Store them in pgvector
MATCHING PHASE
• Nearest-neighbor search (cosine similarity)
• Calculate confidence scores
• Flag ambiguous cases
Human review interface
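The first pruning phase (normalized exact URL matches plus exclusion rules, with no scraping) can be sketched as below. This is a minimal illustration under stated assumptions, not the Redirx implementation: the function names `path_key`, `is_excluded`, and `prune_phase_one` and the regex rules in `EXCLUDE_PATTERNS` are hypothetical. "Normalized" is assumed here to mean comparing lowercase paths without trailing slashes and ignoring the host, query, and fragment. The "obvious URL patterns" step is not modeled.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical exclusion rules mirroring the pipeline list; the real Redirx rules may differ.
EXCLUDE_PATTERNS = [
    re.compile(r"/blog/"),                                    # blog posts
    re.compile(r"/\d{4}/"),                                   # date-based paths such as /2021/
    re.compile(r"\.(css|js|png|jpe?g|gif|svg|ico|woff2?)$"),  # static assets
    re.compile(r"/(admin|wp-admin)(/|$)"),                    # admin URLs
]


def path_key(url: str) -> str:
    """Normalize a URL to a comparable key: lowercase path, no trailing slash.

    The host is ignored because the old and new sites usually live on different
    domains; query strings and fragments are dropped (an assumption).
    """
    path = urlsplit(url.strip()).path.lower().rstrip("/")
    return path or "/"


def is_excluded(url: str, status: Optional[int] = None) -> bool:
    """Return True if an old-site URL should be skipped entirely."""
    if status is not None and 300 <= status < 500:  # 3xx/4xx on the old site
        return True
    path = urlsplit(url).path.lower()
    return any(p.search(path) for p in EXCLUDE_PATTERNS)


def prune_phase_one(old_rows, new_urls):
    """old_rows: iterable of (url, status_or_None); new_urls: iterable of URLs.

    Returns (matches, remaining): exact normalized-path matches, and the old
    URLs that still need to be scraped.
    """
    new_by_key = {path_key(u): u for u in new_urls}
    matches, remaining = {}, []
    for url, status in old_rows:
        if is_excluded(url, status):
            continue
        target = new_by_key.get(path_key(url))
        if target is not None:
            matches[url] = target
        else:
            remaining.append(url)
    return matches, remaining


old = [
    ("https://old.example.com/About/", 200),
    ("https://old.example.com/blog/hello", 200),
    ("https://old.example.com/services", 200),
    ("https://old.example.com/gone", 404),
]
new = ["https://new.example.com/about", "https://new.example.com/what-we-do"]
matches, remaining = prune_phase_one(old, new)
print(matches)    # {'https://old.example.com/About/': 'https://new.example.com/about'}
print(remaining)  # ['https://old.example.com/services']
```

In this example, the blog post and the 404 page are dropped. /About/ matches /about exactly after normalization, and only /services is passed on to the scrape phase. Doing this cheap pass first keeps the number of pages that must be scraped and embedded as small as possible.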
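The second pruning phase (exact HTML matches on the scraped pages) could be implemented by hashing each page and joining on the hash. The sketch below is hypothetical. The names `content_fingerprint` and `match_exact_html` are illustrative. Collapsing whitespace before hashing is an added assumption, so that formatting-only differences do not defeat an otherwise exact match. A page is only matched when exactly one new page shares its fingerprint. Any duplicates are left for the embedding phase.

```python
import hashlib
import re


def content_fingerprint(html: str) -> str:
    """SHA-256 of the HTML with runs of whitespace collapsed (hypothetical normalization)."""
    normalized = re.sub(r"\s+", " ", html).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def match_exact_html(old_pages, new_pages):
    """old_pages/new_pages: dict of url -> scraped HTML.

    Returns (matches, unmatched_old). An old page is matched only when exactly
    one new page shares its fingerprint; ambiguous duplicates stay unmatched.
    """
    by_hash = {}
    for url, html in new_pages.items():
        by_hash.setdefault(content_fingerprint(html), []).append(url)
    matches, unmatched = {}, []
    for url, html in old_pages.items():
        candidates = by_hash.get(content_fingerprint(html), [])
        if len(candidates) == 1:
            matches[url] = candidates[0]
        else:
            unmatched.append(url)
    return matches, unmatched


old_pages = {"/team": "<h1>Our Team</h1>\n<p>Hi</p>", "/misc": "<p>Odds and ends</p>"}
new_pages = {"/about/team": "<h1>Our Team</h1> <p>Hi</p>", "/contact": "<p>Email us</p>"}
matches, unmatched = match_exact_html(old_pages, new_pages)
print(matches)    # {'/team': '/about/team'}
print(unmatched)  # ['/misc']
```

Hashing turns the comparison into a dictionary lookup. The phase therefore stays roughly linear in the number of pages instead of comparing every old page against every new one.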
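The matching phase can be illustrated with an exhaustive in-memory nearest-neighbor search. In the actual pipeline, the vectors live in pgvector, whose `<=>` operator computes cosine distance (1 minus cosine similarity). The sketch below only shows the idea and is hypothetical. The names `cosine` and `match_embeddings` are illustrative. Using the top similarity as the confidence score is an assumption. The ambiguity rule is also assumed: a match is flagged when the top score falls below `min_score` or the runner-up falls within `min_margin` of it, and both default thresholds are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_embeddings(old_vecs, new_vecs, min_score=0.8, min_margin=0.05):
    """old_vecs/new_vecs: dict of url -> embedding vector.

    For each old URL, pick the most similar new URL (exhaustive nearest-neighbor
    search). The confidence is the top cosine similarity; the match is flagged
    ambiguous when that score is below min_score or the runner-up is within
    min_margin of it. Both thresholds are hypothetical defaults.
    """
    results = {}
    for old_url, old_vec in old_vecs.items():
        scored = sorted(
            ((cosine(old_vec, new_vec), new_url) for new_url, new_vec in new_vecs.items()),
            reverse=True,
        )
        if not scored:
            continue
        best_score, best_url = scored[0]
        runner_up = scored[1][0] if len(scored) > 1 else -1.0
        results[old_url] = {
            "target": best_url,
            "confidence": best_score,
            "ambiguous": best_score < min_score or best_score - runner_up < min_margin,
        }
    return results


old_vecs = {"/team": [1.0, 0.0, 0.0], "/misc": [0.5, 0.5, 0.0]}
new_vecs = {"/about/team": [0.9, 0.1, 0.0], "/contact": [0.0, 1.0, 0.0]}
for url, r in match_embeddings(old_vecs, new_vecs).items():
    status = "ambiguous" if r["ambiguous"] else "ok"
    print(url, "->", r["target"], round(r["confidence"], 3), status)
```

Here /team maps confidently to /about/team (similarity about 0.99). /misc also maps to /about/team, but it is flagged for human review because its best score (about 0.78) is below the 0.8 threshold. With real embedding dimensions and site sizes, the exhaustive loop would typically be replaced by a pgvector query ordered by `<=>`.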

Test sites can be spun up locally with tests/mock_sites/start_servers.py.

