Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Containers: have progress streamed instead of via mounted volumes (and deprecate doc_to_pixels_qubes_wrapper.py) #443

Closed
4 tasks
Tracked by #412
deeplow opened this issue Jun 13, 2023 · 1 comment · Fixed by #627
Closed
4 tasks
Tracked by #412
Labels
container development Development-focused changes stretch goal
Milestone

Comments

@deeplow
Copy link
Contributor

deeplow commented Jun 13, 2023

The current approach to sending data back from the "doc-to-pixels" container is via mounted volumes. This was practical because we had essentially two streams to send. However, the Qubes implementation is simpler and has less attacker-controlled data passing, since error messages are not sent. Only the really needed data.

Personally, I want to move all the Qubes wrapper code within doc_to_pixels.py, since we plan to use the stdio methods in containers as well. Not something that important for now, but something to keep in mind. (source)

Tasks

  • change container logic to stream pages as they are converted
  • implement client-side timeouts (see also Qubes: Client-side timeouts #446)
  • deprecate json progress information add way to receive debug information when conversion fails (probably after the conversion fails)
  • deprecate the doc_to_pixels_qubes_wrapper.py since now both containers and Qubes will be streaming pages.
@deeplow deeplow added container development Development-focused changes labels Jun 13, 2023
@deeplow deeplow changed the title Containers: have progress streamed instead of via mounted volumes Containers: have progress streamed instead of via mounted volumes (and deprecate doc_to_pixels_qubes_wrapper.py) Jun 13, 2023
@apyrgio apyrgio added this to the 0.5.1 milestone Jun 14, 2023
@apyrgio apyrgio modified the milestones: 0.5.1, 0.5.0 Aug 23, 2023
@apyrgio apyrgio modified the milestones: 0.5.0, 0.6.0 Aug 28, 2023
apyrgio added a commit that referenced this issue Sep 13, 2023
Extend the client-side capabilities of the Qubes isolation provider, by
adding client-side timeout logic.

This implementation brings the same logic that we used server-side to
the client, by taking into account the original file size and the number
of pages that the server returns.

Since the code does not have the exact same insight as the server has,
the calculated timeouts are in two places:

1. The timeout for getting the number of pages. This timeout takes into
   account:
   * the disposable qube startup time, and
   * the time it takes to convert a file type to PDF
2. The total timeout for converting the PDF into pixels, in the same way
   that we do it on the server-side.

Besides these changes, we also ensure that partial reads (e.g., due to
EOF) are detected (see exact=... argument)

Some things that are not resolved in this commit are:
* We have both client-side and server-side timeouts for the first phase
  of the conversion. Once containers can stream data back to the
  application (see #443), these server-side timeouts can be removed.
* We do not show a proper error message when a timeout occurs. This will
  be part of the error handling PR (see #430)

Fixes #446
Refs #443
Refs #430
apyrgio added a commit that referenced this issue Sep 20, 2023
Extend the client-side capabilities of the Qubes isolation provider, by
adding client-side timeout logic.

This implementation brings the same logic that we used server-side to
the client, by taking into account the original file size and the number
of pages that the server returns.

Since the code does not have the exact same insight as the server has,
the calculated timeouts are in two places:

1. The timeout for getting the number of pages. This timeout takes into
   account:
   * the disposable qube startup time, and
   * the time it takes to convert a file type to PDF
2. The total timeout for converting the PDF into pixels, in the same way
   that we do it on the server-side.

Besides these changes, we also ensure that partial reads (e.g., due to
EOF) are detected (see exact=... argument)

Some things that are not resolved in this commit are:
* We have both client-side and server-side timeouts for the first phase
  of the conversion. Once containers can stream data back to the
  application (see #443), these server-side timeouts can be removed.
* We do not show a proper error message when a timeout occurs. This will
  be part of the error handling PR (see #430)

Fixes #446
Refs #443
Refs #430
apyrgio added a commit that referenced this issue Sep 20, 2023
Extend the client-side capabilities of the Qubes isolation provider, by
adding client-side timeout logic.

This implementation brings the same logic that we used server-side to
the client, by taking into account the original file size and the number
of pages that the server returns.

Since the code does not have the exact same insight as the server has,
the calculated timeouts are in two places:

1. The timeout for getting the number of pages. This timeout takes into
   account:
   * the disposable qube startup time, and
   * the time it takes to convert a file type to PDF
2. The total timeout for converting the PDF into pixels, in the same way
   that we do it on the server-side.

Besides these changes, we also ensure that partial reads (e.g., due to
EOF) are detected (see exact=... argument)

Some things that are not resolved in this commit are:
* We have both client-side and server-side timeouts for the first phase
  of the conversion. Once containers can stream data back to the
  application (see #443), these server-side timeouts can be removed.
* We do not show a proper error message when a timeout occurs. This will
  be part of the error handling PR (see #430)

Fixes #446
Refs #443
Refs #430
deeplow added a commit that referenced this issue Sep 20, 2023
Because the server also checks the MAX_PAGES limit, the test in base
would hide the fact that the client is not enforcing the limit. This
ensures that's not the case.

When the pages in containers are streamed (#443), then this test should
be in base.py.
deeplow added a commit that referenced this issue Sep 21, 2023
Because the server also checks the MAX_PAGES limit, the test in base
would hide the fact that the client is not enforcing the limit. This
ensures that's not the case.

When the pages in containers are streamed (#443), then this test should
be in base.py.
deeplow added a commit that referenced this issue Sep 22, 2023
Because the server also checks the MAX_PAGES limit, the test in base
would hide the fact that the client is not enforcing the limit. This
ensures that's not the case.

When the pages in containers are streamed (#443), then this test should
be in base.py.
deeplow added a commit that referenced this issue Sep 26, 2023
Because the server also checks the MAX_PAGES limit, the test in base
would hide the fact that the client is not enforcing the limit. This
ensures that's not the case.

When the pages in containers are streamed (#443), then this test should
be in base.py.
deeplow added a commit that referenced this issue Sep 28, 2023
Because the server also checks the MAX_PAGES limit, the test in base
would hide the fact that the client is not enforcing the limit. This
ensures that's not the case.

When the pages in containers are streamed (#443), then this test should
be in base.py.
@deeplow
Copy link
Contributor Author

deeplow commented Nov 13, 2023

Looks like our files-based approach lead to one more issue: #620

deeplow added a commit that referenced this issue Nov 22, 2023
Merge Qubes and Containers isolation providers into a superclass called
"ProcessBasedIsolationProviders" by streaming pages in containers for
exclusively in first conversion process. The commit is rather large due
to the multiple interdependencies of the code, making it difficult to
split into various commits.

The main conversion method (_convert) now in the superclass simply calls
two methods:
  - doc_to_pixels()
  - pixels_to_pdf()

Critically, doc_to_pixels is implemented in the superclass, diverging
only in a specialized method called "get_doc_to_pixels_proc()". This
method obtains the process responsible that communicates with the
isolation provider (container / disp VM) via `podman/docker` and qrexec
on Containers and Qubes respectively.

Known regressions:
  - progress reports stopped working on containers

Fixes #443
@apyrgio apyrgio modified the milestones: 0.6.0, Bookmarks Nov 22, 2023
deeplow added a commit that referenced this issue Nov 23, 2023
Merge Qubes and Containers isolation providers into a superclass called
"ProcessBasedIsolationProviders" by streaming pages in containers for
exclusively in first conversion process. The commit is rather large due
to the multiple interdependencies of the code, making it difficult to
split into various commits.

The main conversion method (_convert) now in the superclass simply calls
two methods:
  - doc_to_pixels()
  - pixels_to_pdf()

Critically, doc_to_pixels is implemented in the superclass, diverging
only in a specialized method called "get_doc_to_pixels_proc()". This
method obtains the process responsible that communicates with the
isolation provider (container / disp VM) via `podman/docker` and qrexec
on Containers and Qubes respectively.

Known regressions:
  - progress reports stopped working on containers

Fixes #443
deeplow added a commit that referenced this issue Dec 13, 2023
Merge Qubes and Containers isolation providers into a superclass called
"ProcessBasedIsolationProviders" by streaming pages in containers for
exclusively in first conversion process. The commit is rather large due
to the multiple interdependencies of the code, making it difficult to
split into various commits.

The main conversion method (_convert) now in the superclass simply calls
two methods:
  - doc_to_pixels()
  - pixels_to_pdf()

Critically, doc_to_pixels is implemented in the superclass, diverging
only in a specialized method called "get_doc_to_pixels_proc()". This
method obtains the process responsible that communicates with the
isolation provider (container / disp VM) via `podman/docker` and qrexec
on Containers and Qubes respectively.

Known regressions:
  - progress reports stopped working on containers

Fixes #443
deeplow added a commit that referenced this issue Jan 9, 2024
Merge Qubes and Containers isolation providers into a superclass called
"ProcessBasedIsolationProviders" by streaming pages in containers for
exclusively in first conversion process. The commit is rather large due
to the multiple interdependencies of the code, making it difficult to
split into various commits.

The main conversion method (_convert) now in the superclass simply calls
two methods:
  - doc_to_pixels()
  - pixels_to_pdf()

Critically, doc_to_pixels is implemented in the superclass, diverging
only in a specialized method called "get_doc_to_pixels_proc()". This
method obtains the process responsible that communicates with the
isolation provider (container / disp VM) via `podman/docker` and qrexec
on Containers and Qubes respectively.

Known regressions:
  - progress reports stopped working on containers

Fixes #443
deeplow added a commit that referenced this issue Jan 10, 2024
Merge Qubes and Containers isolation providers into a superclass called
"ProcessBasedIsolationProviders" by streaming pages in containers for
exclusively in first conversion process. The commit is rather large due
to the multiple interdependencies of the code, making it difficult to
split into various commits.

The main conversion method (_convert) now in the superclass simply calls
two methods:
  - doc_to_pixels()
  - pixels_to_pdf()

Critically, doc_to_pixels is implemented in the superclass, diverging
only in a specialized method called "get_doc_to_pixels_proc()". This
method obtains the process responsible that communicates with the
isolation provider (container / disp VM) via `podman/docker` and qrexec
on Containers and Qubes respectively.

Known regressions:
  - progress reports stopped working on containers

Fixes #443
apyrgio pushed a commit that referenced this issue Jan 31, 2024
Merge Qubes and Containers isolation providers into a superclass called
"ProcessBasedIsolationProviders" by streaming pages in containers for
exclusively in first conversion process. The commit is rather large due
to the multiple interdependencies of the code, making it difficult to
split into various commits.

The main conversion method (_convert) now in the superclass simply calls
two methods:
  - doc_to_pixels()
  - pixels_to_pdf()

Critically, doc_to_pixels is implemented in the superclass, diverging
only in a specialized method called "get_doc_to_pixels_proc()". This
method obtains the process responsible that communicates with the
isolation provider (container / disp VM) via `podman/docker` and qrexec
on Containers and Qubes respectively.

Known regressions:
  - progress reports stopped working on containers

Fixes #443
deeplow added a commit that referenced this issue Feb 5, 2024
Merge Qubes and Containers isolation providers core code into the class
parent IsolationProviders abstract class.

This is done by streaming pages in containers for exclusively in first
conversion process. The commit is rather large due to the multiple
interdependencies of the code, making it difficult to split into various
commits.

The main conversion method (_convert) now in the superclass simply calls
two methods:
  - doc_to_pixels()
  - pixels_to_pdf()

Critically, doc_to_pixels is implemented in the superclass, diverging
only in a specialized method called "start_doc_to_pixels_proc()". This
method obtains the process responsible that communicates with the
isolation provider (container / disp VM) via `podman/docker` and qrexec
on Containers and Qubes respectively.

Known regressions:
  - progress reports stopped working on containers

Fixes #443
apyrgio pushed a commit that referenced this issue Feb 6, 2024
Merge Qubes and Containers isolation providers into a superclass called
"ProcessBasedIsolationProviders" by streaming pages in containers for
exclusively in first conversion process. The commit is rather large due
to the multiple interdependencies of the code, making it difficult to
split into various commits.

The main conversion method (_convert) now in the superclass simply calls
two methods:
  - doc_to_pixels()
  - pixels_to_pdf()

Critically, doc_to_pixels is implemented in the superclass, diverging
only in a specialized method called "get_doc_to_pixels_proc()". This
method obtains the process responsible that communicates with the
isolation provider (container / disp VM) via `podman/docker` and qrexec
on Containers and Qubes respectively.

Known regressions:
  - progress reports stopped working on containers

Fixes #443
deeplow added a commit that referenced this issue Feb 6, 2024
Merge Qubes and Containers isolation providers core code into the class
parent IsolationProviders abstract class.

This is done by streaming pages in containers for exclusively in first
conversion process. The commit is rather large due to the multiple
interdependencies of the code, making it difficult to split into various
commits.

The main conversion method (_convert) now in the superclass simply calls
two methods:
  - doc_to_pixels()
  - pixels_to_pdf()

Critically, doc_to_pixels is implemented in the superclass, diverging
only in a specialized method called "start_doc_to_pixels_proc()". This
method obtains the process responsible that communicates with the
isolation provider (container / disp VM) via `podman/docker` and qrexec
on Containers and Qubes respectively.

Known regressions:
  - progress reports stopped working on containers

Fixes #443
@deeplow deeplow closed this as completed in 0a09954 Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
container development Development-focused changes stretch goal
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants