Generate a redirect map from two sitemaps for website migration.


Takes two lists of URLs and outputs a mapping that assigns each entry in list 1 an item from list 2 along with a score that indicates how likely the two refer to the same thing.

Use case

This script was created to automatically generate a map of redirects when migrating a website. The input lists would be the sitemaps of the old and the new website, both plain text files containing one URL per line. The URLs are required to be "pretty", meaning not just /post.php?id=123 but rather something like /blog/why-wordpress-sucks, and should ideally have their protocol and domain parts removed.
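Removing the protocol and domain parts can be done before running the script. The following is a minimal sketch using only the standard library; `to_path` is a hypothetical helper name and the example.com URLs are placeholders, neither is part of this repository.

```python
from urllib.parse import urlsplit


def to_path(url):
    """Return only the path of a URL (hypothetical helper, not part of the repo)."""
    parts = urlsplit(url.strip())
    # A bare domain has an empty path; represent it as the site root.
    return parts.path or "/"


# Placeholder URLs for illustration only.
urls = [
    "https://example.com/blog/why-wordpress-sucks",
    "http://example.com/about/",
    "https://example.com",
]
paths = [to_path(u) for u in urls]
```

Lines that already consist of a path only pass through unchanged, so the helper can be applied to mixed input.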

It can of course be used as a generic tool to fuzzy match two sets of strings. It uses the Levenshtein distance metric as implemented by python-Levenshtein.
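To illustrate the idea, here is a rough pure-Python sketch of matching by edit distance. It is a stand-in, not the script's code: the script relies on python-Levenshtein, and the normalization into a 0 to 1 score shown here is an assumption that may differ from the scores the script reports. The names `levenshtein`, `similarity` and `best_match` are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Score in [0, 1], 1.0 meaning identical (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def best_match(item, candidates):
    """Return the (candidate, score) pair with the highest score."""
    scored = ((c, similarity(item, c)) for c in candidates)
    return max(scored, key=lambda pair: pair[1])
```

Note that `best_match` always returns something, even when every candidate scores poorly, which mirrors the behaviour the warning below describes.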

Warning: Always check the results manually. Never trust the output of the script blindly. It will assign each item in list 1 one item from list 2, even if it's a really bad match.

Usage

  1. Clone this repository: git clone
  2. Enter it: cd redirect-mapper
  3. Install dependencies: python install
  4. Use it:
$ python [-h] [-t VALUE] [-c PATH] [-d] list1 list2

Generates a redirect map from two sitemaps for website migration.

By default, all matches are dumped on the standard output. If an item
from list1 is exactly contained in list2, it will be assigned right
away, without calculating distance or checking for ambiguity.

Issues & Documentation:

positional arguments:
  list1                 List of target items for which to find matches. (1 item per line)
  list2                 List of search items on which to search for matches. (1 item per line)

optional arguments:
  -h, --help            show this help message and exit
  -t VALUE, --threshold VALUE
                        Range within which two scores are considered equal. (default: 0.05)
  -c PATH, --csv PATH   If specified, the output will be formatted as CSV and written to PATH
  -d, --drop-exact      If specified, exact matches will be omitted from the output


Generate a list of redirects

Say you want to know where to redirect all the URLs from old_sitemap.txt. Pass it as the first argument like so:

python old_sitemap.txt new_sitemap.txt

Adjust ambiguity threshold

To influence the level at which two matches are considered equally good, use the -t VALUE argument.

python -t 0.1 old_sitemap.txt new_sitemap.txt
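One way to picture the threshold: candidates whose score lies within VALUE of the best score are treated as equally good, so more than one of them means the match is ambiguous. The sketch below assumes that interpretation; `ambiguous_candidates` and the sample scores are hypothetical and not taken from the script.

```python
def ambiguous_candidates(scored, threshold=0.05):
    """Return every (candidate, score) pair within `threshold` of the best score."""
    best = max(score for _, score in scored)
    return [(c, s) for c, s in scored if best - s <= threshold]


# Made-up scores for one item of list1 against three items of list2.
scored = [("/blog/a", 0.90), ("/blog/b", 0.87), ("/about", 0.40)]

strict = ambiguous_candidates(scored, threshold=0.01)  # only the top candidate
loose = ambiguous_candidates(scored, threshold=0.05)   # the top two are tied
```

A larger threshold flags more matches as ambiguous, a smaller one fewer.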

Omit exact matches

If the results are used to set up 301 redirects on the new website to catch all traffic arriving at old URLs, exact matches can be omitted. They will be handled by actual pages existing on the new site (list2). Use the -d flag here.

python -d old_sitemap.txt new_sitemap.txt
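Conceptually, dropping exact matches means separating the items of list1 that appear verbatim in list2 from those that still need fuzzy matching. A small sketch under that assumption (`split_exact` is a hypothetical name, not a function of the script):

```python
def split_exact(list1, list2):
    """Split list1 into items found verbatim in list2 and the remaining items."""
    targets = set(list2)  # set lookup keeps the membership test fast
    exact = [item for item in list1 if item in targets]
    remaining = [item for item in list1 if item not in targets]
    return exact, remaining


old = ["/about", "/blog/old-post", "/contact"]
new = ["/about", "/blog/new-post", "/contact"]
exact, remaining = split_exact(old, new)
```

Only the entries in `remaining` would need a redirect rule; the others resolve to real pages on the new site.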

Save output to CSV file

Specify the output filename with -c PATH.

python -c results.csv old_sitemap.txt new_sitemap.txt
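For readers who want to post-process matches themselves, writing such a table with the standard csv module could look like the sketch below. The column names and their order are assumptions for illustration; check the file the script actually produces before relying on them.

```python
import csv
import io


def write_matches(rows, handle):
    """Write (source, target, score) rows as CSV to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["source", "target", "score"])  # hypothetical column names
    for source, target, score in rows:
        writer.writerow([source, target, f"{score:.2f}"])


# An in-memory buffer keeps the example free of side effects;
# open("results.csv", "w", newline="") would work the same way.
buffer = io.StringIO()
write_matches([("/old/about-us", "/about", 0.46)], buffer)
csv_text = buffer.getvalue()
```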

Aggregating URLs from an XML sitemap

A helper exists that crawls an XML sitemap and outputs a flat list of URLs, in the input format the mapper expects. Together with that tool, the whole process of generating a redirect map could look like the following. Afterwards, you would of course manually check results.csv, taking special care of matches with a low score (≤0.8).

python > old.txt
python > new.txt
python --drop-exact --csv results.csv old.txt new.txt

Helper usage

$ python [-h] URL/PATH

Aggregates URLs from a set of XML sitemaps listed under the entry path.

This script processes the XML file at given path, opens all sitemaps
listed inside, and prints all URLs inside those maps to stdout.
It should support most sitemaps that comply with the spec at

It was tested with sitemaps generated by the following WP plugins:
 - Google XML Sitemaps
 - XML Sitemap & Google News feeds
 - Yoast SEO

Issues & Documentation:

positional arguments:
  URL/PATH    Path or URL of the root sitemap.

optional arguments:
  -h, --help  show this help message and exit
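The aggregation described above can be sketched with the standard library's ElementTree. This is not the helper's implementation: `collect_urls`, `child_locs` and the `fetch` callable are hypothetical, and the in-memory `documents` dictionary stands in for reading files or downloading URLs.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def child_locs(root, entry_tag):
    """Collect <loc> values of direct <entry_tag> children of the root."""
    locs = []
    for entry in root:
        if local_name(entry.tag) != entry_tag:
            continue
        for child in entry:
            if local_name(child.tag) == "loc" and child.text:
                locs.append(child.text.strip())
    return locs


def collect_urls(entry, fetch):
    """Return all page URLs reachable from `entry`.

    `fetch` is a hypothetical callable that maps a path or URL to XML text.
    """
    root = ET.fromstring(fetch(entry))
    if local_name(root.tag) == "sitemapindex":
        urls = []
        for sitemap in child_locs(root, "sitemap"):
            urls.extend(collect_urls(sitemap, fetch))  # follow nested sitemaps
        return urls
    return child_locs(root, "url")


NS = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
# Tiny made-up documents standing in for real sitemap files.
documents = {
    "index.xml": f"<sitemapindex {NS}>"
                 "<sitemap><loc>posts.xml</loc></sitemap>"
                 "<sitemap><loc>pages.xml</loc></sitemap>"
                 "</sitemapindex>",
    "posts.xml": f"<urlset {NS}><url><loc>/blog/first-post</loc></url></urlset>",
    "pages.xml": f"<urlset {NS}><url><loc>/about</loc></url>"
                 "<url><loc>/contact</loc></url></urlset>",
}
found = collect_urls("index.xml", documents.__getitem__)
```

Restricting the lookup to direct `sitemap` and `url` entries avoids picking up `loc` elements from extensions such as image sitemaps.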