Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use latexmk if Sphinx > 1.6 #5656

Merged
merged 12 commits into from May 16, 2019
Merged

Use latexmk if Sphinx > 1.6 #5656

merged 12 commits into from May 16, 2019

Conversation

@humitos
Copy link
Member

@humitos humitos commented May 2, 2019

  • Checks for the Sphinx version installed in the user virtualenv to decide if latexmk or pdflatex should be called to build the PDF.
  • Removes the USE_PDF_LATEXMK feature flag since it's not needed anymore.

Closes #5655

@humitos
Copy link
Member Author

@humitos humitos commented May 2, 2019

Running the command manually from inside the Docker image I get the output that I want:

docs@f7bc3ebeec08:~$ /bin/sh -c 'cd /home/docs/readthedocs.org/user_builds/test-builds/checkouts/datetime && /home/docs/readthedocs.org/user_builds/test-builds/envs/datetime/bin/python -c "import sphinx;print(sphinx.__version__)"'
1.8.5

I don't understand why it does not work when calling it from Python.

@humitos
Copy link
Member Author

@humitos humitos commented May 2, 2019

I found the issue: our get_wrapped_command() makes the command invalid.

(Pdb++) self.get_wrapped_command()
'/bin/sh -c \'cd /home/humitos/rtfd/code/readthedocs.org/user_builds/test-builds/checkouts/datetime && /home/humitos/rtfd/code/readthedocs.org/user_builds/test-builds/envs/datetime/bin/python -c\\"import\\ sphinx\\;print\\(sphinx.__version__\\)\\"\''

(Pdb++) command = '/bin/sh -c \'cd /home/humitos/rtfd/code/readthedocs.org/user_builds/test-builds/checkouts/datetime && /home/humitos/rtfd/code/readthedocs.org/user_builds/test-builds/envs/datetime/bin/python -c"import sphinx;print(sphinx.__version__)"\''

(Pdb++) exec = client.exec_create(container=self.build_env.container_id,cmd=self.get_wrapped_command(),stderr=True)
(Pdb++) output = client.exec_start(exec_id=exec['Id'], stream=False)
(Pdb++) output
b''

(Pdb++) exec = client.exec_create(container=self.build_env.container_id,cmd=command,stderr=True)
(Pdb++) output = client.exec_start(exec_id=exec['Id'], stream=False)
(Pdb++) output
b'1.8.5\n'
(Pdb++) 

Note that I'm removing all the unneeded \ bars.

@stsewd
Copy link
Member

@stsewd stsewd commented May 2, 2019

We do that to prevent command injection. But if the code is entirely called from our code, we could add an option allow_insecure or something.

@humitos
Copy link
Member Author

@humitos humitos commented May 2, 2019

Yes! I pushed some changes adding a escape_command attribute to the DockerBuildCommand and I was able to make it work.

@humitos humitos force-pushed the humitos/use-latexmk-sphinx-16 branch from e58cb78 to fbdfa96 May 2, 2019
@humitos
Copy link
Member Author

@humitos humitos commented May 2, 2019

Works!

With 1.8.5 calls latexmk.

Screenshot_2019-05-02_16-32-15

With 1.5.2 calls pdflatex.

Screenshot_2019-05-02_16-32-21

@humitos humitos requested a review from May 2, 2019
@humitos humitos force-pushed the humitos/use-latexmk-sphinx-16 branch from bc6ecce to 3de6972 May 6, 2019
Copy link
Member

@stsewd stsewd left a comment

This is not going to work when using outside docker. Not sure how important it is to make it work outside docker anyway... Also, maybe it's easy to make it work in both 🤷‍♂️

"""
self.escape_command = escape_command
Copy link
Member

@stsewd stsewd May 7, 2019

Feels like something that can be moved to the parent class. I think (not sure) that the same behavior for this in local command is shell=True

https://docs.python.org/3.6/library/subprocess.html

On POSIX with shell=True, the shell defaults to /bin/sh. If args is a string, the string specifies the command to execute through the shell. This means that the string must be formatted exactly as it would be when typed at the shell prompt. This includes, for example, quoting or backslash escaping filenames with spaces in them. If args is a sequence, the first item specifies the command string, and any additional items will be treated as additional arguments to the shell itself...


command = [
self.python_env.venv_bin(filename='python'),
'-c'
Copy link
Member

@stsewd stsewd May 7, 2019

You are missing a comma here

command = [
self.python_env.venv_bin(filename='python'),
'-c'
'"import sphinx; print(sphinx.__version__)"',
Copy link
Member

@stsewd stsewd May 7, 2019

when DOCKER_ENABLED is false, this will break. Because of the double quotes

>>> proc = subprocess.Popen(['python3', '-c', 'import sphinx; print(sphinx.__version__)'], stdou
t=subprocess.PIPE)
>>> proc.communicate()
(b'1.8.5\n', None)
>>> proc = subprocess.Popen(['python3', '-c', '"import sphinx; print(sphinx.__version__)"'], std
out=subprocess.PIPE)
>>> proc.communicate()
(b'', None)

Copy link
Member Author

@humitos humitos May 8, 2019

Thanks for raising this.

I made it work by using shell=True and flattening the command when shell=True using the helper that we had there.

I like how the ended now. On BuildCommand it uses the shell=True attribute and flatten the command to be compatible. On DockerBuildCommand it uses escape_command=False to not escape the command.

It works in both builder backends.

@agjohnson
Copy link
Contributor

@agjohnson agjohnson commented May 7, 2019

I'd vote that we shouldn't be relying on command output in our build process. No-op commands like this are already awkward, doing a text comparison against the command output seems pretty fragile. Is there a reason we shouldn't use normal sh operators to chain? Something like:

python -c 'import sys; import sphinx; sys.exit(0 if sphinx.version_info >= (1, 7) else 1)' \
  && latexmk ... \
  || pdflatex ...

This might be slightly annoying as it probably needs to be further wrapped in a /bin/sh -c '...' just to operate within a popen subprocess.

This does make it more clear why we're executing latexmk or pdflatex to the user, rather than just having a version number in the output, and it works regardless of the builder type.

@humitos
Copy link
Member Author

@humitos humitos commented May 8, 2019

I'd vote that we shouldn't be relying on command output in our build process. No-op commands like this are already awkward, doing a text comparison against the command output seems pretty fragile.

@agjohnson I don't see this as a problem. The output of the command that we are relying here is a command that we control and users can't modify. I don't see any potential problem with it.

Also, we are not comparing text directly, but using packaging module to parse the version properly as an object and then comparing it using its helpers.

Is there a reason we shouldn't use normal sh operators to chain?

Using operators chain has some problems. We need execute ~5 different commands depending on which pdf builder we use, so we would need to add the same check on each of those commands which is repeated code in the end.

This does make it more clear why we're executing latexmk or pdflatex to the user, rather than just having a version number in the output, and it works regardless of the builder type.

Also, I don't want to log this to users in the build output. It's hard to read and confusing to users in my opinion. This should be transparent to users, similar to what happen locally --they don't care about this, Sphinx makes the decisions.

Just to be clear, the Sphinx version is not logged in the build output. That was just a test locally to see what was happening and debug it properly. The command is executed with record=False.

On the other hand, from a developer perspective, I find the code easy to read by having the different chunk of commands wrapped up into different methods.

@humitos humitos requested a review from stsewd May 8, 2019
@agjohnson
Copy link
Contributor

@agjohnson agjohnson commented May 14, 2019

@humitos If it's too much to wrap the commands with a shell and use shell pipes to execute the command we want, I still think string parsing is the wrong approach, as it is more fragile than it needs to be.

A more robust version of your implementation would use the exit code instead of coercing the Sphinx version multiple times. The command would be similar to the one i posted, but we don't use pipes and would instead use the exit code as a boolean for our decision on whether to run latexmk or not. If there are problems running the current PR outside docker, this avoids those issues as well, as we can expect an exit code from either execution strategy.

I agree that in this case, the command is a noop command that is not helpful to users and should be hidden.

@humitos
Copy link
Member Author

@humitos humitos commented May 15, 2019

If there are problems running the current PR outside docker, this avoids those issues as well, as we can expect an exit code from either execution strategy.

The problem is not running the command inside or outside docker. The issue is because the command string is sanitized in a way that breaks depending how that string is used internally to run the command. Although, this problem is fixed using shell=True in one case and escape_command=False in the other.

@humitos
Copy link
Member Author

@humitos humitos commented May 15, 2019

A more robust version of your implementation would use the exit code instead of coercing the Sphinx version multiple times. The command would be similar to the one i posted, but we don't use pipes and would instead use the exit code as a boolean for our decision on whether to run latexmk or not

I updated the PR to use the exit code instead.

@humitos humitos requested a review from agjohnson May 15, 2019
stsewd
stsewd approved these changes May 15, 2019
readthedocs/doc_builder/backends/sphinx.py Outdated Show resolved Hide resolved
# When using ``shell=True`` the command should be flatten
command = self.command
if self.shell:
command = self.get_command()
Copy link
Member

@stsewd stsewd May 15, 2019

Just a note that this would make some problems to people using paths with spaces and reserved bash symbols (not our case). Flatten commands are not escaped by default.

Copy link
Contributor

@agjohnson agjohnson left a comment

This approach seems good. We do need to be very careful about using shell=True, as not escaping correctly and allowing user input in our shell=True command could lead to a remote execution attack. It's probably fine for these kind of applications though.

import zipfile
from glob import glob
from pathlib import Path

from django.conf import settings
from django.template import loader as template_loader
from django.template.loader import render_to_string
from packaging.version import Version
Copy link
Contributor

@agjohnson agjohnson May 15, 2019

Looks unused now.

humitos and others added 2 commits May 16, 2019
Co-Authored-By: Santos Gallegos <santos_g@outlook.com>
@humitos
Copy link
Member Author

@humitos humitos commented May 16, 2019

@agjohnson @stsewd both commented about the same, and both are right. shell=True should not be used by default or exposed to the user at all. It's intended only for internal usage and purposes like this one where we write the command by ourselves and now how it will behave.

I didn't added the shell= attribute in the class, I just made it work when used (flattening the command). Although, I think this is the only place where it's used. I didn't find another one.

@humitos
Copy link
Member Author

@humitos humitos commented May 16, 2019

I'll merge the PR once the tests pass.

@humitos humitos merged commit e69167c into master May 16, 2019
2 checks passed
@delete-merged-branch delete-merged-branch bot deleted the humitos/use-latexmk-sphinx-16 branch May 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

3 participants