Calculate unique taxonomy version #32

Closed
fungs opened this issue Mar 31, 2015 · 4 comments

Comments

@fungs
Owner

fungs commented Mar 31, 2015

Derive a tree hash which can be used to identify a taxonomy solely based on its structure (tree + node identifiers).

@fungs
Owner Author

fungs commented Mar 31, 2015

Example: create (parent, child) tuples, deduplicate and sort them, concatenate the result, and hash it with MD5.

@fungs
Owner Author

fungs commented Mar 31, 2015

Alternatively, read the version from a file in the taxonomy folder and let the versioning be handled externally.
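A minimal sketch of what reading an externally maintained version could look like. The helper name `read_taxonomy_version` and the file name `version` are assumptions made for illustration only; the issue does not specify either.

```shell
#!/bin/sh
# Hypothetical helper: print the version string recorded for a taxonomy folder.
# Assumes the folder contains a plain-text file named "version" (invented name)
# whose first line holds the version identifier.
read_taxonomy_version() {
    version_file="$1/version"
    if [ ! -r "$version_file" ]; then
        echo "no readable version file in $1" >&2
        return 1
    fi
    # Only the first line is used; anything after it is ignored.
    head -n 1 "$version_file"
}
```

It would be called as `read_taxonomy_version /path/to/taxonomy`, with whatever tool produces the taxonomy being responsible for writing the file.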

@fungs
Owner Author

fungs commented Mar 31, 2015

Preliminary external Bourne shell code:

taxonomy_version() { cat "$@" | awk -F '\t' '{if($1 != $3) print $1 "\t" $3}' | LC_COLLATE=C sort -u | md5sum | cut -d ' ' -f 1; }

taxonomy_version nodes.dmp
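To illustrate what this computes, here is a small sketch on made-up data. It assumes an NCBI-style `nodes.dmp` layout in which, when splitting on tabs, field 1 is the node ID and field 3 is the parent ID. The three sample files and their IDs are hypothetical. The expectation is that row order and duplicate rows do not affect the hash, while a changed edge does.

```shell
#!/bin/sh
# Function copied from the comment above.
taxonomy_version() { cat "$@" | awk -F '\t' '{if($1 != $3) print $1 "\t" $3}' | LC_COLLATE=C sort -u | md5sum | cut -d ' ' -f 1; }

tmp=$(mktemp -d)

# Hypothetical miniature taxonomy: root 1, with 2 below 1 and 3 below 2.
printf '1\t|\t1\t|\tno rank\t|\n2\t|\t1\t|\tsuperkingdom\t|\n3\t|\t2\t|\tphylum\t|\n' > "$tmp/a.dmp"
# Same edges in a different order, with one duplicated row.
printf '3\t|\t2\t|\tphylum\t|\n2\t|\t1\t|\tsuperkingdom\t|\n1\t|\t1\t|\tno rank\t|\n3\t|\t2\t|\tphylum\t|\n' > "$tmp/b.dmp"
# Node 3 reattached directly to the root: a structural change.
printf '1\t|\t1\t|\tno rank\t|\n2\t|\t1\t|\tsuperkingdom\t|\n3\t|\t1\t|\tphylum\t|\n' > "$tmp/c.dmp"

va=$(taxonomy_version "$tmp/a.dmp")
vb=$(taxonomy_version "$tmp/b.dmp")
vc=$(taxonomy_version "$tmp/c.dmp")

[ "$va" = "$vb" ] && echo "same structure, same version"
[ "$va" != "$vc" ] && echo "changed edge, different version"

rm -rf "$tmp"
```

Note that the rank column is ignored, so two files that differ only in ranks would get the same version under this scheme. The self-referencing root row is dropped by the `$1 != $3` filter.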

@fungs
Owner Author

fungs commented Mar 31, 2015

Closing this because versioning is now done externally. That leaves the most freedom to use whatever data and version you like.

@fungs fungs closed this as completed Mar 31, 2015