Include `version` attribute in main module #28

Erotemic · 2019-06-18T16:24:04Z

It would be helpful if the main module has a __version__ attribute like most modules. This might be tricky to reconcile with the current way of versioning this lib (i.e. via version.py), but I think it would be an overall improvement.

The text was updated successfully, but these errors were encountered:

mina86 · 2019-07-22T10:30:47Z

Do you have particular use case that you're needing this for? In general I recommend tisting if a feature exist rather than parsing __version__ though I'm not opposed to adding that variable.

version.py could be replaced with a release script which in addition to picking up version from git could also do all the other boring stuff (run all tests, commit version-history.rst changes, tag commit, sdist and upload).

Erotemic · 2019-07-22T16:16:17Z

A very common way to determine the version of a package you have is: python -c "import pkgname; print(pkgname.__version__)". Most packages include a __version__ variable, so this usually works.

I find testing for a feature to be insufficient for two reasons. First, it makes it difficult to help someone who is installing this software as a dependency and might not be familiar with the features of the library, whereas asking them to check a version number of relatively simple. Second, sometimes new features don't expose themselves as top-level functions, which may make it more complicated and package specific to test the version.

I agree that changing version.py to a release script (I usually call mine publish.py), would be an improvement.

Erotemic · 2019-08-08T14:47:07Z

Anther use case for __version__ (aside from the fact that its a defacto standard) is dynamically checking versions at runtime. The most recent update to 2.3.2 broke my code. Instead of simply being able to blindly do:

from distutils.version import LooseVersion
import pygtrie
if LooseVersion(pygtrie.__version__) >= LooseVersion('2.3.2'):
    do_new_thing()
else:
    do_old_thing()

I had to search for an attribute that the new version had and the old one didn't and then use a hasattr test, which doesn't give the code any readability when it comes to what condition corresponds to the new version and which corresponds to the old.

mina86 · 2019-08-08T16:22:11Z

I've been working on a new release procedure which will include __version__ so despite lack of any updates to this bug report, there is progress being made.

On Thu, Aug 8, 2019, 15:48 Jon Crall ***@***.***> wrote: Anther use case for __version__ is dynamically checking versions at runtime. The most recent update to 2.3.2 broke my code.

Could you point out the failure? That might be something that needs to be fixed regardless of adding __version__.

…

Erotemic · 2019-08-09T14:35:48Z

Awesome, I'm glad to hear __version__ is coming.

The breakage was due to the non-standard way I was using the library and not due to any mistakes in the update. I was using non-public variables to gain access to the raw node data structure. The change from using dictionaries to using the _OneChild / _NoChildren / _Children objects was causing me issues because the .values method didn't exist anymore.

Here is the code in question:

def _trie_iternodes(self):
    """
    Generates all nodes in the trie

    # Hack into the internal structure and insert frequencies at each node
    """
    from collections import deque
    stack = deque([[self._root]])
    while stack:
        for node in stack.pop():
            yield node
            try:
                # only works in pygtrie-2.2 broken in pygtrie-2.3.2
                stack.append(node.children.values())
            except AttributeError:
                stack.append([v for k, v in node.children.iteritems()])

And I only use this to hack the values of all nodes to zero:

    # Set the value (frequency) of all nodes to zero.
    for node in _trie_iternodes(trie):
        node.value = 0

For the full context, this is used to implement the shortest_unique_prefix algorithm.

def shortest_unique_prefixes(items, sep=None, allow_simple=True, min_length=0, allow_end=False):
    r"""
    The shortest unique prefix algorithm.

    Args:
        items (list of str): returned prefixes will be unique wrt this set
        sep (str): if specified, all characters between separators are treated
            as a single symbol. Makes the algo much faster.
        allow_simple (bool): if True tries to construct a simple feasible
            solution before resorting to the optimal trie algorithm.
        min_length (int): minimum length each prefix can be
        allow_end (bool): if True allows for string terminators to be
            considered in the prefix

    Returns:
        list of str: a prefix for each item that uniquely identifies it
           wrt to the original items.

    References:
        http://www.geeksforgeeks.org/find-all-shortest-unique-prefixes-to-represent-each-word-in-a-given-list/
        https://github.com/Briaares/InterviewBit/blob/master/Level6/Shortest%20Unique%20Prefix.cpp

    Requires:
        pip install pygtrie

    Example:
        >>> # xdoctest: +REQUIRES(--pygtrie)
        >>> items = ["zebra", "dog", "duck", "dove"]
        >>> shortest_unique_prefixes(items)
        ['z', 'dog', 'du', 'dov']

    Timeing:
        >>> # DISABLE_DOCTEST
        >>> # make numbers larger to stress test
        >>> # L = max length of a string, N = number of strings,
        >>> # C = smallest gaurenteed common length
        >>> # (the setting N=10000, L=100, C=20 is feasible we are good)
        >>> import ubelt as ub
        >>> import random
        >>> def make_data(N, L, C):
        >>>     rng = random.Random(0)
        >>>     return [''.join(['a' if i < C else chr(rng.randint(97, 122))
        >>>                      for i in range(L)]) for _ in range(N)]
        >>> items = make_data(N=1000, L=10, C=0)
        >>> ub.Timerit(3).call(shortest_unique_prefixes, items).print()
        Timed for: 3 loops, best of 3
            time per loop: best=24.54 ms, mean=24.54 ± 0.0 ms
        >>> items = make_data(N=1000, L=100, C=0)
        >>> ub.Timerit(3).call(shortest_unique_prefixes, items).print()
        Timed for: 3 loops, best of 3
            time per loop: best=155.4 ms, mean=155.4 ± 0.0 ms
        >>> items = make_data(N=1000, L=100, C=70)
        >>> ub.Timerit(3).call(shortest_unique_prefixes, items).print()
        Timed for: 3 loops, best of 3
            time per loop: best=232.8 ms, mean=232.8 ± 0.0 ms
        >>> items = make_data(N=10000, L=250, C=20)
        >>> ub.Timerit(3).call(shortest_unique_prefixes, items).print()
        Timed for: 3 loops, best of 3
            time per loop: best=4.063 s, mean=4.063 ± 0.0 s
    """
    import pygtrie
    if len(set(items)) != len(items):
        raise ValueError('inputs must be unique')

    # construct trie
    if sep is None:
        trie = pygtrie.CharTrie.fromkeys(items, value=0)
    else:
        # In some simple cases we can avoid constructing a trie
        if allow_simple:
            tokens = [item.split(sep) for item in items]
            simple_solution = [t[0] for t in tokens]
            if len(simple_solution) == len(set(simple_solution)):
                return simple_solution
            for i in range(2, 10):
                # print('return simple solution at i = {!r}'.format(i))
                simple_solution = ['-'.join(t[:i]) for t in tokens]
                if len(simple_solution) == len(set(simple_solution)):
                    return simple_solution

        trie = pygtrie.StringTrie.fromkeys(items, value=0, separator=sep)

    # Set the value (frequency) of all nodes to zero.
    for node in _trie_iternodes(trie):
        node.value = 0

    # For each item trace its path and increment frequencies
    for item in items:
        final_node, trace = trie._get_node(item)
        for key, node in trace:
            node.value += 1

    # if not isinstance(node.value, int):
    #     node.value = 0

    # Query for the first prefix with frequency 1 for each item.
    # This is the shortest unique prefix over all items.
    unique = []
    for item in items:
        freq = None
        for prefix, freq in trie.prefixes(item):
            if freq == 1 and len(prefix) >= min_length:
                break
        if not allow_end:
            assert freq == 1, 'item={} has no unique prefix. freq={}'.format(item, freq)
        # print('items = {!r}'.format(items))
        unique.append(prefix)
    return unique

Honestly, I haven't looked at this code in so long, I'm don't remember why access to the raw nodes was necessary. When I took a quick look at the new code, it looked like there might have been a new better way of doing this, but opted for the quick solution rather than spending effort to investigate if there was a new public API that gave me what I needed. If the new update has a better way of doing this, let me know.

mina86 · 2019-08-09T15:33:23Z

Yeah, lack of ability to traverse the structure of the trie is something pygtrie would benefit from greatly. Nonetheless, you can get away without using any of the internal methods: ``` import pygtrie if sep is None: trie = pygtrie.CharTrie() else: if allow_simple: # ... trie = pygtrie.StringTrie(separator=sep) for item in items: if trie.setdefault(item, 0) != 0: raise ValueError('inputs must be unique') for step in trie.walk_towards(item): step.set(step.get(0) + 1) unique = [] for item in items: freq = None for prefix, freq in trie.prefixes(item): if freq == 1 and len(prefix) >= min_length: break if not allow_end: assert freq == 1, 'item={} has no unique prefix. freq={}'.format(item, freq) unique.append(prefix) return unique ``` This requires pygtrie 2.3 since that’s when `walk_towards` was added.

mina86 · 2020-04-04T19:27:46Z

2.3.3 now includes __version__ and hopefully it’ll stay that way going forward.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Include `version` attribute in main module #28

Include `version` attribute in main module #28

Erotemic commented Jun 18, 2019

mina86 commented Jul 22, 2019

Erotemic commented Jul 22, 2019

Erotemic commented Aug 8, 2019 •

edited

Loading

mina86 commented Aug 8, 2019 via email

Erotemic commented Aug 9, 2019

mina86 commented Aug 9, 2019 via email

mina86 commented Apr 4, 2020

Include __version__ attribute in main module #28

Include __version__ attribute in main module #28

Comments

Erotemic commented Jun 18, 2019

mina86 commented Jul 22, 2019

Erotemic commented Jul 22, 2019

Erotemic commented Aug 8, 2019 • edited Loading

mina86 commented Aug 8, 2019 via email

Erotemic commented Aug 9, 2019

mina86 commented Aug 9, 2019 via email

mina86 commented Apr 4, 2020

Include `version` attribute in main module #28

Include `version` attribute in main module #28

Erotemic commented Aug 8, 2019 •

edited

Loading