xml-diff Compare XML Files

Compare XML files after sorting attributes and fields.

A common database vendor's XML dump utility will regularly produce output where the order of the attributes changes. This makes using the command line diff utility completely useless in comparing two XML files.

xml-diff will sort the attributes and values and then perform a diff on the results.

Install

go get -u github.com/pschlump/xml-diff

To Build

The command line utility is in the top level directory.

cd ~/.../github.com/pschlump/xml-diff
go build
cp xml-diff ~/bin

You can run tests on the command line with

make test

Importing Package

The command line is in this directory. The package that performs most of the work is xmllib.

import "github.com/pschlump/xml-diff/xmllib"

Example

For XML inputs

<?xml version="1.0" encoding="UTF-8"?>
<ConnectedApp xmlns="http://soap.sforce.com/2006/04/metadata">
	<contactEmail>foo@example.org</contactEmail>
	<label>WooCommerce</label>
	<oauthConfig>
		<scopes>Basic</scopes>
		<scopes>Api</scopes>
		<scopes>Web</scopes>
		<scopes>Full</scopes>
		<callbackUrl>https://login.salesforce.com/services/oauth2/callback</callbackUrl>
		<consumerKey>CLIENTID</consumerKey>
	</oauthConfig>
</ConnectedApp>

and

<?xml version="1.0" encoding="UTF-8"?>
<ConnectedApp xmlns="http://soap.sforce.com/2006/04/metadata">
	<contactEmail>foo@example.org</contactEmail>
	<label>WooCommerce</label>
	<oauthConfig>
		<callbackUrl>https://login.salesforce.com/services/oauth2/callback</callbackUrl>
		<consumerKey>OTHER</consumerKey>
		<scopes>Full</scopes>
		<scopes>Basic</scopes>
	</oauthConfig>
</ConnectedApp>

You can run:

./xml-diff -l ./testdata/left.xml -r ./testdata/right.xml

The output is:

If you add the -byLine flag the diff will be shown by lines.

./xml-diff -l ./testdata/left.xml -r ./testdata/right.xml -byLine

The output is:

Algorithm

The diff is based on the Myers algorithm. This is the most common approach to comparing differences between files.

An alternative approach would be to perform the difference on the XML node-tree in memory. Because of my plan to be able to move attributes to values and back this is an undesirable way to express the differences.

Performance

Memory

It takes about 4.2 times the heap size as the size of the XML file to run. This means that if you have a 128 Mb of memory you should be able to xml-diff files of up to 30 Mb in size.

Performance

The XML read/parse and generate will run about 10 MB of XML in a second. 100ms should compare about 1 MB of XML. Performance is heavily dependent on how much data has to be sorted. If there are lots of XML nodes that have to be built and then sorted it will take longer to process.

TODO

Support is in place for moving attributes to values or values back to attributes. The same database vendor seems to arbitrarily swap these in its XML dump. I am testing this and will add documentation for it.
Better documentation.

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
cfgLib		cfgLib
out		out
ref		ref
testdata		testdata
xmllib		xmllib
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
go.mod		go.mod
go.sum		go.sum
main-cli.go		main-cli.go
planned-changes.md		planned-changes.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

xml-diff Compare XML Files

Install

To Build

Importing Package

Example

Algorithm

Performance

Memory

Performance

TODO

About

Releases

Packages

Languages

License

pschlump/xml-diff

Folders and files

Latest commit

History

Repository files navigation

xml-diff Compare XML Files

Install

To Build

Importing Package

Example

Algorithm

Performance

Memory

Performance

TODO

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages