Skip to content

python3 package supporting efficient storage and querying of sets of sets using the trie data structure. Supports finding all the supersets/subsets of a given set from a collection of sets. Also includes a trie-based mapping container where the keys are sets.

License

Notifications You must be signed in to change notification settings

mmihaltz/pysettrie

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pysettrie

Build Status

https://github.com/mmihaltz/pysettrie

pysettrie is a python3 package that provides support for efficient storage and querying of sets of sets using the trie data structure, supporting operations like finding all the supersets/subsets of a given set from a collection of sets. The original motivation for this module was to provide efficient search for supersets of sets of feature-value pairs in our natural language parser project (e.g. matching nouns against verb argument positions).

The following classes are included:

  • SetTrie: set-trie container for sets; supports efficient supersets/subsets of a given search set calculations.
  • SetTrieMap: mapping container using sets as keys; supports efficient operations like SetTrie but also stores values associated to the key sets.
  • SetTrieMultiMap: like SetTrieMap, but supports multiple values associated to each key.

For further information, please see documentation

Module test_settrie.py contains unittests for all the containers.

Author: Márton Miháltz https://sites.google.com/site/mmihaltz/

This package depends on the sortedcollection module. One recommended way to install (tested on Ubuntu):

sudo pip3 install sortedcontainers

If you don't have pip3:

sudo apt-get install python3-setuptools
sudo easy_install3 pip

pysettrie is partly based on: I.Savnik: Index data structure for fast subset and superset queries. CD-ARES, IFIP LNCS, 2013. http://osebje.famnit.upr.si/~savnik/papers/cdares13.pdf Remarks on paper:

  • Algorithm 1. does not mention to sort children (or do sorted insert) in insert operation (line 5)
  • Algorithm 4. is wrong, will always return false, line 7 should be: "for (each child of node labeled l: word.currentElement <= l) & (while not found) do"
  • the descriptions of getAllSubSets and getAllSuperSets operations are wrong, would not produce all sub/supersets

See also:

Changes:

  • Version 0.1.3:
    • SetTrieMultiMap.assign() returns number of values associated to key after assignment.

Licensed under the GNU LESSER GENERAL PUBLIC LICENSE, Version 3.

About

python3 package supporting efficient storage and querying of sets of sets using the trie data structure. Supports finding all the supersets/subsets of a given set from a collection of sets. Also includes a trie-based mapping container where the keys are sets.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages