Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Encoding problems #311

Merged
merged 32 commits into from
Dec 18, 2017
Merged

[MRG] Encoding problems #311

merged 32 commits into from
Dec 18, 2017

Conversation

Titan-C
Copy link
Member

@Titan-C Titan-C commented Oct 31, 2017

The filesystem's encoding has generated quite some problems #257, #259, #260, #269
I started this branch long ago, I'll bring it back to live now. This starts the test on Travis-CI for the ASCII encoding.
It does not work yet.

@lesteve
Copy link
Member

lesteve commented Nov 10, 2017

It'd be nice to fix this indeed. Your last commit status is green, does that mean that this is ready for review?

@Titan-C
Copy link
Member Author

Titan-C commented Nov 10, 2017 via email

@lesteve
Copy link
Member

lesteve commented Nov 10, 2017

I also need a new test that verifies code blocks are correctly parsed for embedding documentation links, to avoid #259 happening again.

I think there is a test for that already:

def test_embed_links(sphinx_app):
"""Test that links are embedded properly in doc."""
out_dir = sphinx_app.outdir
examples_dir = op.join(out_dir, 'auto_examples')
assert op.isdir(examples_dir)
example_files = os.listdir(examples_dir)
assert 'plot_numpy_scipy.html' in example_files
example_file = op.join(examples_dir, 'plot_numpy_scipy.html')
with codecs.open(example_file, 'r', 'utf-8') as fid:
lines = fid.read()
# ensure we've linked properly
assert '#module-scipy.signal' in lines
assert 'scipy.signal.firwin.html' in lines
assert '#module-numpy' in lines
assert 'numpy.arange.html' in lines
# assert '#module-matplotlib.pyplot' in lines
# assert 'pyplot.html' in lines

Although for some reason the matplotlib check was not passing on Travis which is why it is commented out. I could not reproduce, see #308 (comment) for more details.

@larsoner
Copy link
Contributor

Although for some reason the matplotlib check was not passing on Travis

I suspect this was due to old sphinx being used

@Titan-C Titan-C changed the title [WIP] Encoding problems [MRG] Encoding problems Nov 19, 2017
@Titan-C
Copy link
Member Author

Titan-C commented Nov 19, 2017

This is finally working. I even found some extra problems that could be originated by different encodings, characters in use and the source file config. I have added comments explaining such events. I also have generated a new example using unicode emoji just to stress test this setup.
https://184-26191147-gh.circle-artifacts.com/0/home/ubuntu/sphinx-gallery/alabaster_html/auto_examples/plot_unicode_everywhere.html

@Titan-C
Copy link
Member Author

Titan-C commented Dec 5, 2017

Are there any comments on this? Can I merge?
@pllim @NelleV does this solve your issues #253 & #269 ?

@pllim
Copy link
Contributor

pllim commented Dec 5, 2017

@Titan-C , I don't have time to test this right now but I can give it a try in the future when a new release with this fix is in. Thanks!

@lesteve
Copy link
Member

lesteve commented Dec 5, 2017

@Titan-C I am looking at this, please don't pull the trigger yet ;-) !

Copy link
Member

@lesteve lesteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments

except SyntaxError:
return SYNTAX_ERROR_DOCSTRING, content.decode('utf-8'), 1
return SyntaxError, content.decode('utf-8')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit kludgy to return an exception class rather than a node. Why don't you SyntaxError through and catch it in get_docstring_and_rest and have an except clause at the parse_file_source call?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree is not the nicest way. I ended up using this, because I need this function in few more places and I rely those places always having access to the decoded content and to know if the parsing was successful or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw that but I would rather let the SyntaxError go through and catch it where you call get_docstring_and_rest. Arguably you could have a SyntaxError somewhere else in your code but I don't think this is that bad. If we care about that we could catch the SyntaxError in parse_source_file and raise a custom exception e.g. ParsingError (or a better name) then catch ParsingError where parse_source_file` is called.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is on the boundaries of my knowledge. But if I raise the exception how do I get access to the returned value of content?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case I am missing and the exception approach does not work, can you not return None instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course. I can return None

'e.HelloWorld': {'name': 'HelloWorld', 'module': 'd', 'module_short': 'd'}}

import locale
print(locale.getlocale())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like debugging print statements that you forgot to remove.

import locale
print(locale.getlocale())

with tempfile.NamedTemporaryFile('wb', delete=False) as f:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using pytest tmpdir fixture is a lot more neat for this kind of things.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did this of using the fixture, but all tests jobs in travis (except 1) failed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any info on this. It's recent commit f2f2d04

@@ -19,30 +19,57 @@
"""


def get_docstring_and_rest(filename):
"""Separate `filename` content between docstring and the rest
def parse_file_source(filename):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe parse_python_file or parse_source_file?



def get_docstring_and_rest(filename):
"""Separate `filename` content between docstring and the rest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to use double backticks if you want this to be fixed-width in the HTML reference documentation. Same thing for a few single backticks that you use in a few different places.

def test_get_docstring_and_rest():

docstring, rest, lineno = sg.get_docstring_and_rest(
'sphinx_gallery/tests/unicode.sample')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write the content into a temporary file instead using tmpdir? This way the test is a bit easier to understand.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. If you check on the previous file there is test_identify_names2 there I indeed write to file. And for everything to work correctly in py2 & py3 I write it as bytes. But in this case and the function test_identify_names I really want to read a file from disk, the file is identical. But is the 2 places I need to be sure the file read from disk is correctly parsed.

Copy link
Member

@lesteve lesteve Dec 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this does work:

content = b'\n'.join(
    [b'',
     b'# -*- coding: utf-8 -*-',
     b"'''",
     b'\xc3\x9anicode in header',
     b'=================',
     b'',
     b'U\xc3\xb1icode in description',
     b"'''",
     b'',
     b'# Code source: \xc3\x93scar N\xc3\xa1jera',
     b'# License: BSD 3 clause',
     b'',
     b'import os',
     b"path = os.path.join('a','b')",
     b'',
     b"a = 'hei\xc3\x9f'  # Unicode string",
     b'',
     b'import sphinx_gallery.back_references as br',
     b'br.identify_names',
     b'',
     b'from sphinx_gallery.back_references import identify_names',
     b'identify_names ',
     b''])

# use tmpdir to get a file here instead
open('/tmp/unicode.sample', 'wb').write(content)

Note you can simplify the file because I don't think you are using identify_names in this test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this is simpler than having a unicode heavy python file on disk? Also, I use this file twice, one to test that despite the unicode I get_docstring_and_rest and one to test that despite the unicode I identify_names of the used functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this is simpler than having a unicode heavy python file on disk?

Yes, this is a lot more clear because you see exactly what you are writing to the file. Otherwise how do we know how you generated your file. Also test files like this don't get included in the installed package which mean that you can not run the tests on an installed package to make sure that they run. For example this is a problem we already have for sphinx_gallery/tests/reference_parse.txt.

If you use the file content in two tests you can put that in a helper function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK then. I implemented a pytest fixture to generate this sample file

@Titan-C
Copy link
Member Author

Titan-C commented Dec 14, 2017

This is again ready for review.
Travis on ubuntu fails because the system version of six is too old and pytest coverage refuses to load. But this is a particular situation because
python setup.py test passes as it installs for the test a resent version of six
pytest sphinx_gallery fails because it uses the old system version.
Is everyone fine if I pip install a more resent version of six in this test environment? Other ideas?

@lesteve
Copy link
Member

lesteve commented Dec 15, 2017

I saw that a few days ago (basically we have a failure on master at the moment because of this). I just opened #325 to fix this.

I guess (I did not check in details) that this is one of the reason that @choldgraf had troubles in #244. Installing sphinx 1.5 probably pulls in a more recent version of six ...

@lesteve
Copy link
Member

lesteve commented Dec 15, 2017

I added an entry in CHANGES.rst, @Titan-C you probably understand the problem more than me, so feel free to improve/fix it!

Other than this, I think this is ready to be merged, great work on a nasty long-standing issue!

@lesteve
Copy link
Member

lesteve commented Dec 18, 2017

Merging this one, thanks a lot @Titan-C!

@ImportanceOfBeingErnest

When will a version containing this fix be released?
(Building the matplotlib docs on windows currently blocks on this.)

@choldgraf
Copy link
Contributor

Relatively soon, I believe. We are spot-checking the remaining issues before a release. @lesteve just opened a PR for one and I'll plan to hit the others later this week or next. As always, PRs are welcome ;-)

@jni
Copy link

jni commented Apr 28, 2018

I just want to note that this PR also fixed an unclosed file warning when building docs:

/home/travis/venv/lib/python3.4/site-packages/sphinx_gallery/backreferences.py:144:
ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/travis/build/scikit-image/scikit-image/doc/source/auto_examples/color_exposure/plot_ihc_color_separation.py' mode='r' encoding='UTF-8'>

See e.g. https://travis-ci.org/scikit-image/scikit-image/jobs/372237529#L6134

I searched issues and PRs for ResourceWarning and couldn't find anything, so I presumed it was unfixed and almost raised a new issue. =)

Thanks for the great work!

@jni
Copy link

jni commented Apr 28, 2018

This was the offending line, now properly context-managed elsewhere:

example_code_obj = identify_names(open(example_file).read())

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants