PVF: more filesystem sandboxing #1373

Merged
merged 22 commits on Sep 28, 2023

Conversation

mrcnski
Contributor

@mrcnski mrcnski commented Sep 3, 2023

Done in this PR

  • Create a worker directory for each worker.
    • The directory contains files with known names (e.g. "socket", "artifact") so we don't have to pass these paths to the workers.
    • We only pass the worker dir path, once at the start of the worker lifetime.
    • The worker dir is cleared after each job so that malicious jobs can't access unintended data.
    • See "Worker dir" section.
  • The worker dir is sandboxed with pivot_root (a better chroot), if possible (a minimal sketch follows this list).
    • This means the workers shouldn't have access to anything outside their worker dir.
  • Instead of running Landlock once per thread, which is insecure, we run it on the whole process when a worker starts.
    • The only filesystem exception is the worker dir (mentioned above), which becomes / when pivot_root is enabled.
    • Jobs can't access any files outside the worker dir and shouldn't have any rights they don't strictly need.
  • We detect the status of each security feature once, at host startup (a sketch of such a status record follows below).
    • This lets us emit a warning as early as possible and keeps workers from duplicating it.
    • In the future we plan for this to be an early error for validators, with an optional flag to bypass the checks.
  • Refactor and simplify some Landlock code.
    • We moved landlock enablement from worker-specific code to the common worker setup function.
    • Some compatibility code for Mac turned out to be unneeded and was removed.
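
As a rough illustration of the pivot_root point above, here is a minimal sketch of the technique on Linux, assuming the libc crate. It is not the PR's exact code (the real implementation lives in the worker security module and does more, e.g. proper error reporting); names and error handling are simplified.

    use std::{ffi::CString, os::unix::ffi::OsStrExt, path::Path};

    // Minimal sketch: jail the current process inside `worker_dir` using new
    // namespaces plus the stacked pivot_root trick from pivot_root(2).
    unsafe fn change_root(worker_dir: &Path) -> Result<(), &'static str> {
        let dir = CString::new(worker_dir.as_os_str().as_bytes()).map_err(|_| "bad path")?;
        let root = CString::new("/").unwrap();
        let dot = CString::new(".").unwrap();

        // New user + mount namespaces, so an unprivileged worker may pivot_root.
        if libc::unshare(libc::CLONE_NEWUSER | libc::CLONE_NEWNS) < 0 {
            return Err("unshare");
        }
        // Don't let mount changes propagate back to the host namespace.
        if libc::mount(std::ptr::null(), root.as_ptr(), std::ptr::null(),
                       libc::MS_REC | libc::MS_PRIVATE, std::ptr::null()) < 0 {
            return Err("mount MS_PRIVATE");
        }
        // pivot_root requires the new root to be a mount point: bind-mount it onto itself.
        if libc::mount(dir.as_ptr(), dir.as_ptr(), std::ptr::null(),
                       libc::MS_BIND | libc::MS_REC, std::ptr::null()) < 0 {
            return Err("bind mount");
        }
        if libc::chdir(dir.as_ptr()) < 0 {
            return Err("chdir");
        }
        // New root and old root are the same directory; the old root ends up
        // stacked underneath and can then be lazily detached.
        if libc::syscall(libc::SYS_pivot_root, dot.as_ptr(), dot.as_ptr()) < 0 {
            return Err("pivot_root");
        }
        if libc::umount2(dot.as_ptr(), libc::MNT_DETACH) < 0 {
            return Err("umount2");
        }
        Ok(())
    }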

For even more background, see the issues and PRs in the "Related" section.
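
Relatedly, here is a minimal sketch of the "check security features once at host startup" status record mentioned above. The field names are illustrative (only can_enable_landlock appears verbatim in the diff further down), so treat this as an assumption rather than the PR's exact struct.

    /// Illustrative only: detected once at host startup, then passed to workers.
    #[derive(Clone, Debug, Default)]
    pub struct SecurityStatus {
        /// Whether the Landlock LSM is available and can be enabled.
        pub can_enable_landlock: bool,
        /// Whether we can unshare namespaces and pivot_root into the worker dir.
        pub can_unshare_and_change_root: bool,
    }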

Worker dir

Here is how the filesystem structure has changed.

Before:

+ /<cache_path>/
  - artifact-1
  - artifact-2
  - [...]
+ /tmp/
  - socket-1       (random name) (created by host)
  - socket-2       (random name) (created by host)
  - tmp-artifact-1 (random name) (created by worker) (prepare-only)
  - tmp-artifact-2 (random name) (created by worker) (prepare-only)
  - [...]

Now:

+ /<cache_path>/
  - artifact-1
  - artifact-2
  - [...]
  - worker-dir-1/  (new `/` for worker-1)
    + socket                            (created by host)
    + tmp-artifact                      (created by host) (prepare-only)
    + artifact     (link -> artifact-1) (created by host) (execute-only)
  - worker-dir-2/  (new `/` for worker-2)
    + [...]

Some more notes on the worker dir:

  • It is created when spawning a worker and attached to the IdleWorker struct.
    • As opposed to the WorkerHandle struct, IdleWorker contains data that should be used when starting a job. I know, confusing. :/
  • The host populates the worker dir before a job and clears it right after.
  • Jobs have no access to other artifacts or files from other jobs.
  • The worker dir has some attached RAII to delete it when it's dropped.
  • For execute jobs, we link the artifact into the worker dir.
    • Copying it would be too expensive.
    • Moving it wouldn't work because other workers may need it concurrently.
    • We use a hard link instead of a symlink to avoid any situation where a worker could follow a symlink out of the pivot_root jail (see the sketch below).
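
To make the fixed layout concrete, here is a hedged sketch of the kind of path helpers and artifact hard-linking described above. The function names are illustrative and may not match the PR's worker_dir module exactly.

    use std::path::{Path, PathBuf};

    // Fixed, well-known names inside each worker dir (illustrative helpers).
    pub fn socket(worker_dir: &Path) -> PathBuf { worker_dir.join("socket") }
    pub fn tmp_artifact(worker_dir: &Path) -> PathBuf { worker_dir.join("tmp-artifact") }
    pub fn artifact(worker_dir: &Path) -> PathBuf { worker_dir.join("artifact") }

    // For execute jobs, the host hard-links the cached artifact into the worker dir,
    // so the jailed worker never needs access to the artifact cache itself.
    pub fn link_artifact(cached: &Path, worker_dir: &Path) -> std::io::Result<()> {
        std::fs::hard_link(cached, artifact(worker_dir))
    }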

Considered but not done

Shared memory

Since we are already doing a big refactor, we discussed at length switching the IPC mechanism to shared memory. At the least, switching out the filesystem IPC (all the worker dir stuff) for transferring artifacts between the host and workers.

We decided the benefit isn't big enough to justify an even more drastic change. Without filesystem IPC we could remove FS access for workers entirely, but with multiple layers of sandboxing already in place (including Landlock and now pivot_root), the extra security seemed like overkill. Shared memory also comes at a performance cost: we have to write to it and read from it. I'd estimate that at 1-2ms, which, with execution taking up to 10ms total, would be a noticeable relative regression.

Shared memory was also considered in the past here with some more downsides listed. We decided to continue with the file-based architecture, albeit significantly hardened.

Follow-ups

Fork instead of spawning thread for better isolation (see #574)

There's still a potential attack where a lingering malicious job can access the worker dir state for subsequent jobs. With the isolation described in #574 this can be fixed if we apply Landlock with no exceptions in the forked process.
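
A very rough sketch of that follow-up, assuming the libc crate; enable_landlock_no_exceptions and execute_job are hypothetical placeholders, not functions from this PR.

    fn enable_landlock_no_exceptions() { /* hypothetical: fully restrict the forked job */ }
    fn execute_job() { /* hypothetical: run the actual PVF job */ }

    // Run each job in a forked child so its sandbox policy (and any lingering
    // state) dies with the job instead of persisting in a long-lived worker.
    fn run_job_in_forked_process() {
        unsafe {
            match libc::fork() {
                -1 => panic!("fork failed"),
                0 => {
                    enable_landlock_no_exceptions();
                    execute_job();
                    libc::_exit(0);
                }
                child => {
                    let mut status: libc::c_int = 0;
                    libc::waitpid(child, &mut status, 0);
                }
            }
        }
    }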

Refactor argument passing (raised #1576)

Right now, any data passed from host startup to a worker has to travel through a long chain of calls. This should be refactored somehow; for example, we could pass around a few aggregate objects so that adding a new field becomes a one-line change. It would make such changes much easier in the future.

On the worker side, we often pass the worker kind, PID, and dir around together. Perhaps these could be bundled into a single WorkerInfo struct (a hypothetical sketch follows).
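
Purely hypothetical sketch of such a bundle; WorkerKind exists in the worker code and is redeclared here only to keep the snippet self-contained.

    use std::path::PathBuf;

    // Stand-in for the real WorkerKind enum in the PVF worker code.
    pub enum WorkerKind { Prepare, Execute }

    // Hypothetical: bundle the values that are currently passed around separately.
    pub struct WorkerInfo {
        pub kind: WorkerKind,
        pub pid: u32,
        pub worker_dir_path: PathBuf,
    }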

In general I felt like the PVF code is in dire need of cleaning up. Any other ideas would be highly appreciated.

Remove puppet worker (done in #1449)

Related to the above, it would be awesome to get this issue done:

#583

It would remove some of the code duplication you may notice in this PR.

TODO

  • Versi burn-in

Related

Supersedes paritytech/polkadot#7580
Closes #584
Closes #600

@mrcnski mrcnski added T0-node This PR/Issue is related to the topic “node”. T8-parachains_engineering T8-polkadot This PR/Issue is related to/affects the Polkadot network. labels Sep 3, 2023
@mrcnski mrcnski self-assigned this Sep 3, 2023
Contributor

@s0me0ne-unkn0wn s0me0ne-unkn0wn left a comment


I'm no Landlock expert, and I've just left some nits; none of them is a blocker. I still believe it should be burned in on Versi as the change is significant. Overall, it's a great improvement to the PVF security, and I do appreciate that work, that's really cool!

Contributor

@alindima alindima left a comment


Looking good

Contributor

@alindima alindima left a comment


LGTM

/// Delete all env vars to prevent malicious code from accessing them.
pub fn remove_env_vars(worker_kind: WorkerKind) {
Contributor


Would be better to do this when spawning the process (e.g. execvpe allows you to pass a custom envp), otherwise this info will most likely still be in memory anyway.

Contributor Author

@mrcnski mrcnski Sep 12, 2023


I previously tried that here, but when testing on my Mac it inserted some env var. Since we don't care about Macs, I went back to this approach and added an exception for that one env var.
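
For reference, clearing the environment at spawn time from Rust could look roughly like this (a sketch using std's Command::env_clear rather than execvpe; the binary path and flag are placeholders, not the actual worker CLI):

    use std::process::{Child, Command};

    // Spawn the worker with an empty environment instead of scrubbing it afterwards.
    fn spawn_worker_with_clean_env(worker_bin: &str, worker_dir: &str) -> std::io::Result<Child> {
        Command::new(worker_bin)
            .env_clear()                // drop every inherited variable
            .arg("--worker-dir-path")   // placeholder flag for illustration
            .arg(worker_dir)
            .spawn()
    }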

let _ = tokio::fs::remove_file(socket_path).await;
let socket_path = worker_dir::socket(&worker_dir_path);
let stream = UnixStream::connect(&socket_path).await?;
let _ = tokio::fs::remove_file(&socket_path).await;
Member


If the above connect fails, we will never remove the socket? Isn't there some RAII primitive available?

Contributor Author

@mrcnski mrcnski Sep 14, 2023


Or defer like in Go. :P (One of the few nice things about that language.) There are some crates for defer in Rust, but I don't want to introduce a dep or extra code just for this one case.

Anyway, good point, but I think it's fine here because: 1. the socket is always created in the worker dir, which we remove on worker death as well as on host startup, and which has a random name so it can't be reused; and 2. if we fail to connect, the worker dies and we re-create everything from scratch, with a new worker dir and socket.
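
A minimal sketch of the RAII guard the reviewer had in mind (illustrative, not code from this PR):

    use std::path::PathBuf;

    // Removes the socket file when dropped, so an early return (e.g. a failed
    // connect) still cleans up.
    struct SocketGuard(PathBuf);

    impl Drop for SocketGuard {
        fn drop(&mut self) {
            let _ = std::fs::remove_file(&self.0);
        }
    }

    // Usage (hypothetical): let _guard = SocketGuard(socket_path.clone());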

#[cfg(target_os = "linux")]
if security_status.can_enable_landlock {
let landlock_status =
security::landlock::enable_for_worker(worker_kind, &worker_dir_path);
Member


Why don't we do this above already?

Contributor Author


You mean near the beginning of the function? I want to enable Landlock after the socket has been removed, so we don't need an exception for it. Any extra exception would increase the attack surface.

Contributor


Hmm... this is a good point; why not do it before spawning tokio? (:

You can still remove the socket; use normal synchronous std APIs to create and remove it, then enable landlock, and then you can use tokio's from_std to convert it to a tokio socket.

Contributor Author


Fair point. :)
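
A hedged sketch of that ordering: connect and clean up with synchronous std APIs, restrict the process, and only then hand the socket to tokio. enable_landlock_for_worker is a placeholder for the PR's Landlock call, and from_std must be called from within a tokio runtime.

    use std::os::unix::net::UnixStream as StdUnixStream;
    use std::path::Path;

    fn enable_landlock_for_worker() { /* placeholder for security::landlock::enable_for_worker */ }

    fn connect_then_sandbox(worker_dir_path: &Path) -> std::io::Result<tokio::net::UnixStream> {
        let socket_path = worker_dir_path.join("socket");
        let stream = StdUnixStream::connect(&socket_path)?; // 1. connect synchronously
        let _ = std::fs::remove_file(&socket_path);         // 2. remove the socket file
        enable_landlock_for_worker();                       // 3. no socket exception needed now
        stream.set_nonblocking(true)?;                      // from_std requires non-blocking mode
        tokio::net::UnixStream::from_std(stream)            // 4. convert for use in the runtime
    }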

Development

Successfully merging this pull request may close these issues.

  • PVF worker: apply sandboxing per-process
  • PVF: consider spawning a new process per job