Skip to content
/ pybkt Public

Python3 Implementation of Burkhard-Keller tree.

License

Notifications You must be signed in to change notification settings

epinal/pybkt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 

Repository files navigation

pybkt

pybkt is a pure python3 implementation of Burkhard-Keller tree data structure.

This implementation is based on:

Both lookup methods from above implementations are included:

  • query(): recursive
  • find(): not recursive

Example usage:

Create the tree using Linux English words dictionary and levenshtein distance function.

    >>> tree = BKTree(levenshtein, dict_words())    
    >>> tree.query('book', 1)  # Query the tree to find words at most 1 distance from 'book'
    [(0, 'book'), (1, 'Cook'), (1, 'boo'), (1, 'boob'), (1, 'books'), (1, 'boom'), (1, 'boon'), (1, 'boor'), (1, 'boos'), (1, 'boot'), (1, 'brook'), (1, 'cook'), (1, 'gook'), (1, 'hook'), (1, 'kook'), (1, 'look'), (1, 'nook'), (1, 'rook'), (1, 'took')]
    
    # calling ``find()`` which is not recursive gives the same results
    >>> tree.find('book', 1)
    [(0, 'book'), (1, 'Cook'), (1, 'boo'), (1, 'boob'), (1, 'books'), (1, 'boom'), (1, 'boon'), (1, 'boor'), (1, 'boos'), (1, 'boot'), (1, 'brook'), (1, 'cook'), (1, 'gook'), (1, 'hook'), (1, 'kook'), (1, 'look'), (1, 'nook'), (1, 'rook'), (1, 'took')]

Creating the BkTree data structure using big amounts of data take some time so you can save it into a file and load it after.

    >>> tree = BKTree(levenshtein, dict_words())    
    >>> tree.safe_to_file('tree.json')  # Safe the tree structure in a file, default 'tree.json' 
    
    # Load the tree
    >>> tree1 = BKTree(levenshtein)
    >>> tree1.load('tree.json')
    >>> tree1.find('book', 1)
    [(0, 'book'), (1, 'Cook'), (1, 'boo'), (1, 'boob'), (1, 'books'), (1, 'boom'), (1, 'boon'), (1, 'boor'), (1, 'boos'), (1, 'boot'), (1, 'brook'), (1, 'cook'), (1, 'gook'), (1, 'hook'), (1, 'kook'), (1, 'look'), (1, 'nook'), (1, 'rook'), (1, 'took')]

Measure BKTree time vs brute force

    # Load the saved structure
    >>> tree = BKTree(levenshtein)
    >>> tree.load('tree.json')    
    >>> time_of(brute_query, 'book', list(dict_words()), levenshtein, 1)    
    Time:  5.613326072692871
    >>> time_of(tree.query, 'book', 1)  
    Time:  0.056838035583496094

TODO: Add python2 support.

About

Python3 Implementation of Burkhard-Keller tree.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages