Build: Allow multiple PDFs #10438

benjaoming · 2023-06-14T14:50:09Z

Reviewing notes

This is intended to be a very limited change to our builders: we can enable the feature flag for a couple of projects and have their PDFs built with the new method.

The filename for single-file projects is left intact, which should mean that we can switch on the feature for a couple of live projects as well, and the generated PDFs should still be served through the current URLs and APIs.

It would be great to get some further thoughts on using the ImportedFile model.

TODO

Fixes #10424
Fixes #2045

APIs will be added in a separate PR.

Other aspects covered

File extensions are restricted for pdf, htmlzip and epub.

benjaoming · 2023-06-26T15:43:57Z

Will be following up with test case updates and some additional manual testing (both with and without the feature flag).

humitos

This looks good to me. I still need to do a deeper review and local QA, though.

In the meantime, I have some questions to understand the status of the PR and its final goal:

- What are the names of the resulting PDF files when the project has multiple PDFs using our Sphinx builder? (I guess they will be <project-slug>_<counter>.pdf. However, I'd like the user to be able to define this by using the latex_documents Sphinx config.)
- Does this PR add support for multiple HTMLZip and ePUB as well? (I guess it doesn't.)
- What are the resulting URLs when there are multiple PDFs generated?
- What's the API response for a particular version? (See https://docs.readthedocs.io/en/stable/api/v3.html#get--api-v3-projects-(string-project_slug)-versions-(string-version_slug)-)

I think these questions are important since they will tell us how we are going to move forward with this PR. These decisions will make the future integration of the backend with the front end easier.

Also, if projects generate multiple PDF files but users cannot easily access them, the feature looks broken to me, and it's something I'd prefer not to expose to users yet.

humitos · 2023-07-18T14:49:25Z

readthedocs/doc_builder/backends/sphinx.py

+        # Isolate temporary files in the _readthedocs/ folder
+        # Used for new feature ENABLE_MULTIPLE_PDFS
+        self.absolute_host_tmp_root = os.path.join(
+            self.project.checkout_path(self.version.slug),
+            "_readthedocs/tmp",
+        )


If we need a temporary directory, we should use mktemp --directory, as we are doing in other parts of this code. I prefer to keep _readthedocs/ clean since it's the directory where we output the files we are going to serve.
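
To illustrate the suggestion, here is a minimal Python sketch of the same idea. It assumes the files are moved from the host side; `move_to_scratch` and the `rtd-pdf-` prefix are hypothetical names, not part of this PR. `tempfile.mkdtemp` plays the role of `mktemp --directory`.

```python
import os
import shutil
import tempfile


def move_to_scratch(files, prefix="rtd-pdf-"):
    """Move files into a fresh temporary directory and return its path."""
    # mkdtemp creates a uniquely named directory (like `mktemp --directory`),
    # so nothing is written under the served output folder.
    scratch_dir = tempfile.mkdtemp(prefix=prefix)
    for path in files:
        shutil.move(path, os.path.join(scratch_dir, os.path.basename(path)))
    return scratch_dir
```

The caller is responsible for removing the returned directory once the files have been processed, for example with `shutil.rmtree`.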

humitos · 2023-07-18T14:53:26Z

readthedocs/builds/storage.py

@@ -132,13 +132,13 @@ def _check_suspicious_path(self, path):
    def _rclone(self):
        raise NotImplementedError

-    def rclone_sync_directory(self, source, destination):
+    def rclone_sync_directory(self, source, destination, **sync_kwargs):


Why don't we pass filter_extensions here directly, instead of a dictionary of keyword arguments whose contents we don't know exactly?
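
A rough sketch of what an explicit parameter could look like. `SketchStorage` is a hypothetical stand-in that only lists what would be synced; it does not call rclone and is not the class in readthedocs/builds/storage.py.

```python
import os


class SketchStorage:
    """Hypothetical stand-in for a storage class; it does not call rclone."""

    def rclone_sync_directory(self, source, destination, filter_extensions=None):
        """Return (source_path, destination_path) pairs that would be synced."""
        pairs = []
        for name in sorted(os.listdir(source)):
            extension = os.path.splitext(name)[1].lstrip(".")
            # With an explicit parameter the filtering rule is visible in the
            # signature instead of being hidden inside **sync_kwargs.
            if filter_extensions is not None and extension not in filter_extensions:
                continue
            pairs.append((os.path.join(source, name), f"{destination}/{name}"))
        return pairs
```

Callers would then write `storage.rclone_sync_directory(src, dst, filter_extensions=("pdf",))`, and a misspelled argument name fails immediately with a `TypeError` instead of being silently swallowed.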

humitos · 2023-07-18T14:56:36Z

readthedocs/doc_builder/backends/sphinx.py

+
+        This method mostly exists so we have a pattern that is test-friendly (can be mocked).
+        """
+        tex_files = glob(os.path.join(self.absolute_host_output_dir, f"*.{extension}"))


This line could be combined with _get_epub_files_generated() into a single generic helper.
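
One possible shape for such a generic helper, as a sketch only. `get_files_generated` is a hypothetical name, and the function takes the output directory as an argument instead of reading it from the builder instance.

```python
import glob
import os


def get_files_generated(output_dir, extension):
    """Return the top-level files in output_dir with the given extension."""
    # glob.escape keeps special characters in the directory name literal,
    # so only the "*.<extension>" part acts as a wildcard.
    pattern = os.path.join(glob.escape(output_dir), f"*.{extension}")
    return sorted(glob.glob(pattern))
```

Because the helper is a single function, tests can mock it once for PDF, ePUB, and any other format.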

humitos · 2023-07-18T14:57:11Z

readthedocs/doc_builder/backends/sphinx.py

+        """
+        tex_files = glob(os.path.join(self.absolute_host_output_dir, f"*.{extension}"))
+        if not tex_files:
+            raise BuildUserError(f"No *.{extension} files were found.")


The build error string should be an attribute like BuildUserError.NO_ARTIFACT_FILES_FOUND or similar.
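
A sketch of the pattern being suggested. The `BuildUserError` class below is a stand-in defined only for this example, and `ensure_files_found` is a hypothetical helper; the attribute name follows the suggestion above.

```python
class BuildUserError(Exception):
    """Stand-in for the project's exception class, for illustration only."""

    # Hypothetical attribute name, taken from the suggestion above.
    NO_ARTIFACT_FILES_FOUND = "No *.{extension} files were found."


def ensure_files_found(files, extension):
    """Raise a BuildUserError when a builder produced no files."""
    if not files:
        raise BuildUserError(
            BuildUserError.NO_ARTIFACT_FILES_FOUND.format(extension=extension)
        )
    return files
```

Keeping the template on the class and calling `.format()` at the raise site makes the interpolation of the extension explicit and lets tests compare against the attribute.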

humitos · 2023-07-18T14:58:05Z

readthedocs/doc_builder/backends/sphinx.py


        # Run LaTeX -> PDF conversions
        success = self._build_latexmk(self.project_path)

-        self._post_build()
+        if self.project.has_feature(Feature.ENABLE_MULTIPLE_PDFS):
+            self._post_build_multiple()


Suggested change

-            self._post_build_multiple()
+            self._post_build_multiple_pdf()

humitos · 2023-07-18T15:04:26Z

readthedocs/doc_builder/backends/sphinx.py

+            # There is only 1 PDF file. We will call it project_slug.pdf
+            # This is the old behavior.
+            if len(pdf_files) == 1:
+                os.rename(
+                    os.path.join(self.absolute_host_output_dir, pdf_files[0]),
+                    os.path.join(
+                        self.absolute_host_output_dir, f"{self.project.slug}.pdf"
+                    ),
+                )


I'm not sure I understand why we would want to keep "the old behavior" here. I think we will always want "the new behavior" for projects using this feature. However, I'm also not sure what the filename of the resulting PDF would be in the new behavior, but it should be the name the user has defined in docs/conf.py (see my other comment about -jobname).
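
For reference, a small sketch of how user-defined names could be derived from Sphinx's `latex_documents` setting, where the second item of each entry is the target .tex file name. `expected_pdf_names` is a hypothetical helper, and it assumes the PDF keeps the base name of the .tex file (that is, no -jobname override).

```python
import os


def expected_pdf_names(latex_documents):
    """Map each latex_documents entry to the PDF name LaTeX would produce."""
    names = []
    for entry in latex_documents:
        # Each entry starts with (startdocname, targetname, ...); targetname
        # is the .tex file, and the PDF shares its base name.
        target_name = entry[1]
        base_name, _ = os.path.splitext(target_name)
        names.append(f"{base_name}.pdf")
    return names


# Example configuration with two documents (made-up values).
latex_documents = [
    ("index", "user-guide.tex", "User Guide", "Docs Team", "manual"),
    ("api/index", "api.tex", "API Reference", "Docs Team", "manual"),
]
pdf_names = expected_pdf_names(latex_documents)
```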

humitos · 2023-07-18T15:09:11Z

readthedocs/doc_builder/backends/sphinx.py

+        # We cannot use '*' in commands sent to the host machine, the asterisk gets escaped.
+        # So we opt for iterating from outside the container
+        pdf_file_names = []
+        for fname in pdf_files:
+            shutil.move(
+                fname,
+                os.path.join(self.absolute_host_tmp_root, os.path.basename(fname)),
+            )
+            pdf_file_names.append(os.path.basename(fname))


We could probably use self.run(..., escape=False) or similar if we want to avoid this.
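
To show why the asterisk is lost, and one way around it, here is a sketch that assumes arguments are shell-quoted before they are executed, as the comment in the diff describes. `build_move_command` is a hypothetical helper; it expands the wildcard in Python and passes explicit file names instead.

```python
import glob
import os
import shlex


def build_move_command(output_dir, destination, extension="pdf"):
    """Build an `mv` argument list with the wildcard already expanded."""
    pattern = os.path.join(glob.escape(output_dir), f"*.{extension}")
    files = sorted(glob.glob(pattern))
    if not files:
        return []
    return ["mv", *files, destination]


# Quoting an argument turns the wildcard into a literal string, which is
# why a quoted "*.pdf" never reaches the shell as a glob.
quoted_pattern = shlex.quote("*.pdf")
```

Disabling escaping for a single command is shorter, but it is only safe when every part of that command comes from trusted input.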

humitos · 2023-07-18T15:13:58Z

readthedocs/projects/constants.py

+# Map media types to their known extensions.
+# This is used for validating and uploading artifacts.
+MEDIA_TYPES_EXTENSIONS = {
+    MEDIA_TYPE_PDF: ("pdf",),
+    MEDIA_TYPE_EPUB: ("epub",),
+    MEDIA_TYPE_HTMLZIP: ("zip",),
+    MEDIA_TYPE_JSON: ("json",),
+}


Are we using a tuple as the value here because we want to allow multiple extensions per media type?
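
If multiple extensions per media type are the goal, the tuple fits naturally with `str.endswith`, which accepts a tuple of suffixes. The sketch below redefines the constants locally with assumed values so it is self-contained; `has_valid_extension` is a hypothetical helper.

```python
# Assumed values for illustration; the real constants live in the project.
MEDIA_TYPE_PDF = "pdf"
MEDIA_TYPE_EPUB = "epub"
MEDIA_TYPE_HTMLZIP = "htmlzip"

MEDIA_TYPES_EXTENSIONS = {
    MEDIA_TYPE_PDF: ("pdf",),
    MEDIA_TYPE_EPUB: ("epub",),
    MEDIA_TYPE_HTMLZIP: ("zip",),
}


def has_valid_extension(media_type, filename):
    """Check a filename against the extensions allowed for its media type."""
    extensions = MEDIA_TYPES_EXTENSIONS.get(media_type, ())
    suffixes = tuple(f".{extension}" for extension in extensions)
    # An unknown media type yields an empty tuple, so the check fails closed.
    return filename.lower().endswith(suffixes)
```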

humitos · 2023-07-18T15:20:28Z

readthedocs/projects/tasks/search.py

+@app.task(queue="web")
+def sync_downloadable_artifacts(
+    version_pk, commit, build, artifacts_found_for_download
+):
+    """
+    Create ImportedFile objects for downloadable files.


I'm not sure why this task is part of the search tasks. As I understand it, it's not related to search itself but to the build process. So, I would probably move it to projects.tasks.utils or similar.

humitos · 2023-07-18T15:22:48Z

readthedocs/projects/tasks/search.py

+            ImportedFile.objects.create(
+                name=name,
+                project=version.project,
+                version=version,
+                path=fpath,
+                commit=commit,
+                build=build,
+                ignore=True,
+            )


We are only creating new objects here, but don't we need to delete the old ImportedFile objects for this particular version? Otherwise, wouldn't we be exposing old filenames to the user?
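
The concern can be sketched without the ORM. `FileRecord` and `replace_downloadable_files` are hypothetical stand-ins for ImportedFile rows and the sync task; a real implementation would also have to restrict the deletion to the downloadable-artifact records so that other records of the version are left alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical stand-in for an ImportedFile row."""

    version: str
    name: str
    build: int


def replace_downloadable_files(records, version, names, build):
    """Return records with the version's old entries replaced by new ones."""
    # Drop every record left over from earlier builds of this version...
    kept = [record for record in records if record.version != version]
    # ...then add one record per artifact found in the current build.
    kept.extend(FileRecord(version, name, build) for name in names)
    return kept
```

Without the first step, a PDF that was renamed or removed between builds would keep its stale record and still be listed for the version.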

benjaoming added 19 commits June 14, 2023 16:46

Adds a feature flag and some initial ability to handle multiple PDFs

3f2bd1f

WIP: Only copy top-level files

f7c3b58

Fix the test

8efd480

WIP

abea0cb

Merge branch 'main' of github.com:readthedocs/readthedocs.org into fe…

a39256d

…ature/multiple-pdfs

WIP: Task and ImportedFile creation

032c7fd

WIP: Need to hide new behavior behind Feature flag

b89dd23

Hide new behaviors behind feature flag

653a0ce

Merge branch 'main' of github.com:readthedocs/readthedocs.org into fe…

64ada46

…ature/multiple-pdfs

Melt together two test scenarios

216592e

Adds tests to model/querysets

600b948

Move constant to where other MEDIA_TYPES are handled

f452630

Reuse DOWNLOADABLE_MEDIA_TYPES, add comment about removing pdf_file_name

9bfe6f3

lint

65e714b

WIP

99ffb3b

Finalize test case for testing that multiple ImportedFile objects are…

7b11369

… created

Update failure test case

8fadd42

Add some more test comments

d91a455

comment update

c610326

benjaoming marked this pull request as ready for review June 26, 2023 15:43

benjaoming requested a review from a team as a code owner June 26, 2023 15:43

benjaoming requested a review from stsewd June 26, 2023 15:43

auto-assign bot assigned benjaoming Jun 26, 2023

benjaoming changed the title ~~Build: Allow multiple PDFs (WIP)~~ Build: Allow multiple PDFs Jun 26, 2023

benjaoming requested a review from humitos June 26, 2023 15:43

benjaoming mentioned this pull request Jun 27, 2023

Build and API: Backend changes to handle multiple PDFs #10424

Open

5 tasks

benjaoming added 3 commits June 28, 2023 15:31

Merge branch 'main' of github.com:readthedocs/readthedocs.org into fe…

d13eaa2

…ature/multiple-pdfs

Mock revoking the API key?

01e3c2a

Merge branch 'main' of github.com:readthedocs/readthedocs.org into fe…

c5622b2

…ature/multiple-pdfs

benjaoming added 4 commits June 28, 2023 15:57

Merge branch 'mock-api-revoke' into feature/multiple-pdfs

cc2231d

Update tests after we stopped mocking 'glob'

c1cc8eb

lint

c49bbc1

Add test case test_build_commands_executed_multiple_artifacts with ne…

0075280

…w pattern from feature flag

agjohnson linked an issue Jul 5, 2023 that may be closed by this pull request

Multiple pdf files for single project #2045

Open

humitos unassigned benjaoming Jul 11, 2023

humitos reviewed Jul 18, 2023

This was referenced Aug 28, 2023

use alpaka.png instead alpaka.pdf as logo in the sphix doc latex build alpaka-group/alpaka#2098

Merged

don't use pdf images in the sphinx doc pdf documentation alpaka-group/alpaka#2100

Open

New readthedocs project solves pdf build Problem #10661

Closed

humitos mentioned this pull request Oct 26, 2023

Build: don't care about the filename for the offline formats #10873

Closed
