Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading nexus files from strings > 220bytes fails on windows with Python 3 #258

Open
davidhwyllie opened this issue Feb 17, 2017 · 6 comments

Comments

@davidhwyllie
Copy link
Contributor

Hi

A minor issue; in the Newick parser read_newick function (parsers/newick.py, line 220) called when reading newick files, as described here:
http://etetoolkit.org/docs/latest/tutorial/tutorial_trees.html#reading-newick-trees

does a kind of 'duck typing' approach to deciding whether the string passed is a file path or a newick string.
On windows this fails if the string is >220bytes because os.stat returns an error from the operating system.

I guess there's various ways of dealing with this, & I've attached a modified module below which implements one such.

Best wishes
David


def read_newick(newick, root_node=None, format=0):
    """ Reads a newick tree from either a string or a file, and returns
    an ETE tree structure.

    A previously existent node object can be passed as the root of the
    tree, which means that all its new children will belong to the same
    class as the root(This allows to work with custom TreeNode
    objects).

    You can also take advantage from this behaviour to concatenate
    several tree structures.
    """
   
    if root_node is None:
        from ..coretype.tree import TreeNode
        root_node = TreeNode()

    if isinstance(newick, six.string_types):
        # we duck-type the string newick.
        # if it is a path, as identified by stat, we assume it is a file
        # otherwise, we assume it is a string.
        # On windows, the maximal permitted path length is 220 and therefore
        # passing large strings raises a ValueError from stat,
        # which states 'String is too long for Windows'
        # as a workaround, one can test whether newick is too long to be a
        # windows string.
        if len(newick)<220:     # then it could be either a path or a tree
            if os.path.exists(newick):
                if newick.endswith('.gz'):
                    import gzip
                    nw = gzip.open(newick).read()
                else:
                    # note: this raises a deprecated error on  Win 7 with python 3.5.2
                    nw = open(newick, 'rU').read()
            else:
                nw = newick
        else:
            nw = newick

        matcher = compile_matchers(formatcode=format)
        nw = nw.strip()        
        if not nw.startswith('(') and nw.endswith(';'):
            return _read_node_data(nw[:-1], root_node, "single", matcher, format)

        elif not nw.startswith('(') or not nw.endswith(';'):
            raise NewickError('Unexisting tree file or Malformed newick tree structure.')
        else:
            return _read_newick_from_string(nw, root_node, matcher, format)

    else:
        raise NewickError("'newick' argument must be either a filename or a newick string.")

@unode
Copy link
Member

unode commented Mar 3, 2017

@davidhwyllie Can you paste a traceback of the actual error?
Depending on the content I'm tempted to report this as an upstream (i.e. python) problem.

@unode
Copy link
Member

unode commented Mar 3, 2017

Also which version of Python 3?

@davidhwyllie
Copy link
Contributor Author

Hi

at the moment I am having trouble producing a unit test to reproduce this. It isn't as simple as the 'strings longer than 220 chars' described above. When I have a reproducible, failing example I will write further.

In the meantime, I have attached a unit test which passes which loads small and large files from strings and files on windows using unpatched ete3.

It produces a deprecation warning related to reading files in 'rU' mode, but this is a python 3 issue not a windows one.

ete3_issue258.zip

@jhcepas
Copy link
Member

jhcepas commented Mar 15, 2017 via email

@davidhwyllie
Copy link
Contributor Author

Hi

Thanks. Sorry for the delay in replying.

This issue can be reproduced on the following system using the latest release of ete3.

System details
OS: Windows-7-6.1.7601-SP1
Python version: 3.5.3 (v3.5.3:1880cb95a742, Jan 16 2017, 16:02:32) [MSC v.1900 64 bit (AMD64)]
Ete3: 3.1.1

How to reproduce
A self contained unit test is attached

E:\dwyllie\GitHub\SPIRALTREE\src>python issue258.py

Running testIssue258 
system: Windows-7-6.1.7601-SP1
python version: 3.5.3 (v3.5.3:1880cb95a742, Jan 16 2017,
 16:02:32) [MSC v.1900 64 bit (AMD64)]
ete3 version: 3.1.1
Loading small tree
There are 3 nodes
Loading large tree
E
======================================================================
ERROR: runTest (__main__.testIssue258)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "issue258.py", line 37, in runTest
    largeTree=Tree(largeTreeText)
  File "C:\Program Files\Python35\lib\site-packages\ete3_dev_git\ete3\coretype\t
ree.py", line 211, in __init__
    quoted_names=quoted_node_names)
  File "C:\Program Files\Python35\lib\site-packages\ete3_dev_git\ete3\parser\new
ick.py", line 234, in read_newick
    if os.path.exists(newick):
  File "C:\Program Files\Python35\lib\genericpath.py", line 19, in exists
    os.stat(path)
ValueError: stat: path too long for Windows

----------------------------------------------------------------------
Ran 1 test in 0.019s

FAILED (errors=1)

E:\dwyllie\GitHub\SPIRALTREE\src>

How to fix

Modify the read_newick function as shown

def read_newick(newick, root_node=None, format=0, quoted_names=False):
    """ Reads a newick tree from either a string or a file, and returns
    an ETE tree structure.

    A previously existent node object can be passed as the root of the
    tree, which means that all its new children will belong to the same
    class as the root(This allows to work with custom TreeNode
    objects).

    You can also take advantage from this behaviour to concatenate
    several tree structures.
    """
   
    if root_node is None:
        from ..coretype.tree import TreeNode
        root_node = TreeNode()

    if isinstance(newick, six.string_types):
        # we duck-type the string newick.
        # if it is a path, as identified by stat, we assume it is a file
        # otherwise, we assume it is a string.
        # On windows, the maximal permitted path length is 220 and therefore
        # passing large strings raises a ValueError from stat,
        # which states 'String is too long for Windows'
        # as a workaround, one can test whether newick is too long to be a
        # windows string.
        if len(newick)<220:     # then it could be either a path or a tree
            if os.path.exists(newick):
                if newick.endswith('.gz'):
                    import gzip
                    nw = gzip.open(newick).read()
                else:
                    nw = open(newick, 'rU').read()
            else:
                nw = newick
        else:
            nw = newick

        matcher = compile_matchers(formatcode=format)
        nw = nw.strip()        
        if not nw.startswith('(') and nw.endswith(';'):
            return _read_node_data(nw[:-1], root_node, "single", matcher, format)

        elif not nw.startswith('(') or not nw.endswith(';'):
            raise NewickError('Unexisting tree file or Malformed newick tree structure.')
        else:
            return _read_newick_from_string(nw, root_node, matcher, format, quoted_names)

    else:
        raise NewickError("'newick' argument must be either a filename or a newick string.")

Post fix:

E:\dwyllie\GitHub\SPIRALTREE\src>python issue258.py
60430
Reproducing issue #258
Platform: Windows-7-6.1.7601-SP1
python version: 3.5.3 (v3.5.3:1880cb95a742, Jan 16 2017, 16:02:32) [MSC v.1900 6
4 bit (AMD64)]
ete3 version: 3.1.1
Loading small tree
There are 3 nodes
Loading large tree
There are 1026 nodes
.
----------------------------------------------------------------------
Ran 1 test in 0.068s

OK

E:\dwyllie\GitHub\SPIRALTREE\src>

I don't think I have permission to create a PR (sorry, I've not done this before) but I have committed my local change to a copy of your repository [here] (https://github.com/davidhwyllie/ETE3_ISSUE258)

If you would like me to do something different, please let me know.

@davidhwyllie
Copy link
Contributor Author

Thank you for your assistance. I will submit a pull request as advised.
Issue reproduced and resolution tested on the following system:
Windows-10-10.0.17134-SP0
3.6.6rc1 (v3.6.6rc1:1015e38be4, Jun 12 2018, 08:38:06) [MSC v.1900 64 bit (AMD64)]

Unit test and commit fixing issue
https://github.com/davidhwyllie/ete/commit/1752154f361f57144ce4b09f3993f6b8374b2993

jhcepas added a commit that referenced this issue Jun 29, 2019
Resolve long tree string problem on Windows #258
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants