Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrument using $RUSTC_WRAPPER and cargo as a subcommand #554

Merged
merged 55 commits into from
Aug 4, 2022

Conversation

kkysen
Copy link
Contributor

@kkysen kkysen commented Jul 22, 2022

Fixes #448 and should fix #488.

Overview

This reimplements c2rust-instrument by invoking cargo as a subcommand, overriding $RUSTC_WRAPPER.

Implementation

More specifically, instead of using cargo as a dependency, which is enormous (very long compile times) and causes some compilation issues (#488), we invoke cargo as a subcommand. This also has the advantage of using the correct rustup cargo wrapper so that it resolves correctly. See #448 for a more detailed explanation of why this is a better approach and why other tools like clippy and miri do it this way.

Then we set $RUSTC_WRAPPER to our own binary. We use $RUSTC_WRAPPER to detect if we're being invoked as a cargo or rustc wrapper.

If we're the cargo wrapper, we:

  • parse args using clap derive (Update to clap3 #395)
  • run cargo as a subcommand, passing:
    • the cargo args as is (so all normal cargo invocations all work)
    • our own executable as $RUSTC_WRAPPER
    • the metadata path in $C2RUST_INSTRUMENT_METADATA

If we're the rustc wrapper, we:

  • determine if we should instrument the current rustc command if $CARGO_PRIMARY_PACKAGE is set
  • run RunCompiler::new(...).run() with our MirTransformCallbacks, or TimePassesCallbacks::default() (rustc's default) if not instrumenting
  • then save the metadata to the metadata file

Note that we have to pass the --sysroot to RunCompiler as normally rustc looks this up relative to its executable's location, which works when the rustup rustc wrapper resolves to the toolchain-specific rustc, but since we're using RunCompiler directly and being rustc, we need to explicitly set the sysroot.

Usage Changes

Instead of passing c2rust_analysis_rt as an --extern and running cargo separately to compile it, we just add it as a normal, optional dependency to the crate we're instrumenting. In this way, it's similar to libc, which can be externed as it's in the sysroot,
but the preferred way is to specify it as a normal dependency. Thus, we can also drop the runtime argument that seemed odd to have to specify to c2rust instrument.

The existing c2rust-instrument binary in the c2rust package is now gone (along with its --features dynamic-instrumentation), replaced by the new c2rust-instrument binary in dynamic-instrumentation/src/bin.rs (the crate has also been renamed to c2rust-instrument). Now, c2rust_instrument no longer exposes a library crate, and is purely a binary crate. This is because there's no way to use c2rust-instrument not as a binary, as it invokes itself and thus needs control over main to function correctly.

This does cause a slight change in that c2rust no longer sees the instrument.yaml; only c2rust-instrument does. This could potentially hurt shell autocompletion in the future (not currently implemented), but the drawbacks seems slim. Instead, we don't need to remember --features dynamic-instrumentation as a simple cargo build builds everything. And we've upgraded to a superior clap 3 derive parser, not the deprecated YAML parser.

Fast!

Furthermore, now that we use cargo normally and include c2rust_analysis_rt as a normal dependency, we can delete only the main binary (with cargo clean --package), forcing it to be rebuilt and re-instrumented. Most of the time building used to be in building c2rust_analysis_rt, and on the fast donna, was ~15 seconds for c2rust instrument. Now it takes only ~0.5 seconds, 30x faster.

Also enabling this is using a separate $CARGO_TARGET_DIR: instrument.target instead of the default target. This allows builds not to conflict with normal, non-instrumented builds using cargo build, so we don't need to cargo clean everything every time.

Thus, we now also unconditionally instrument (in pdg.sh). Before, we only did when the c2rust-instrument binary changed, but this missed changes to the instrumented crate. Now that it's fast enough, we just always instrument.

Previously, I was working on getting incremental instrumentation to work, but that proved harder than I thought for the corner cases (which did pop up). With this change, we remove the largest block to fast iteration times, so I think we don't need incremental instrumentation at all until we get to iterating on small edits in large instrumented crates.

Fixes and Improvements

  • c2rust instrument should apply extra cargo args #448: the primary fix. Now we can specify any cargo args, such as build, run, test, --release, etc.
  • libiconv necessary on mac, doesn't link #488: should be fixed as the offending cargo dependency is removed, but needs to be tested by @boyland-pf.
  • We can remove pretty-instrument-err.mjs and get-binary-names-from-cargo-metadata.mjs, the temporary node scripts.
    Thus, we can remove all node dependencies.
  • With enough polishing, we may be able to entirely get rid of pdg.sh.
  • The cargo dependency is dropped, saving a lot on compile time, as it is a huge dependency.
  • The --features dynamic-instrumentation is removed, so a simple cargo build works for everything now.

…s nothing an complicates logic going forward.
…` as a subcommand, overriding `RUSTC_WRAPPER`.

More specifically, instead of using `cargo` as a dependency,
which is enormous and causes some compilation issues (#488),
we invoke `cargo` as a subcommand.
This also has the advantage of using the correct `rustup` `cargo` wrapper so that it resolves correctly.

Then we set `RUSTC_WRAPPER` to our own binary.
We use `RUSTC_WRAPPER` to detect if we're being invoked as a `cargo` or `rustc` wrapper.

If we're the `cargo` wrapper, we parse args using `clap` derive,
determine the crate target info (that we want to instrument)
using `cargo metadata` instead of the massive `cargo` dependency,
and then run `cargo` as a subcommand,
passing the `cargo` args as is (so all normal `cargo` invocations all work),
our own executable as `$RUSTC_WRAPPER`,
and the crate target info and instrumentation args serialized through `$C2RUST_INSTRUMENT_INFO`.

If we're the `rustc` wrapper, we parse the instrument info,
determine if we should instrument the current `rustc` command
if the crate being compiled matches the crate target info passed,
and based on that, either invoke `rustc`
or run `RunCompiler::new(...).run()` with our `MirTransformCallbacks`,
and then save the metadata to the metadata file.

Note that we have to pass the `--sysroot` to `RunCompiler`
as normally `rustc` looks this up relative to its executable's location,
which works when the `rustup` `rustc` wrapper resolves to the toolchain-specific `rustc`,
but since we're using `RunCompiler` directly and being `rustc`,
we need to explicitly set the sysroot.

Instead of passing `c2rust_analysis_rt` as an `--extern` and running `cargo` separately to compile it,
we just add it as a normal dependency to the crate we're instrumenting.
In this way it's similar to `libc`, which can be `extern`ed as it's in the sysroot,
but the preferred way is to specify it as a normal dependency.
…y `cargo` arguments, including `--release` and `--profile`.
…a normal dependency,

we can delete only the main binary, forcing it to be rebuilt and re-instrumented.

Most of the time building used to be in building `c2rust_analysis_rt`,
and on `donna`, was ~15 seconds for `c2rust instrument`.
Now it takes only ~0.5 seconds, 30x faster.

Thus, we now also unconditionally instrument.
Before we only did when the `c2rust-instrument` binary changed,
but this missed changes to the instrumented crate.
Now that it's fast enough, we just always instrument.
@kkysen kkysen requested review from rinon and aneksteind July 22, 2022 09:17
@kkysen kkysen mentioned this pull request Jul 22, 2022
Copy link
Contributor

@rinon rinon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think this is the right direction. Hopefully we can fix this up to not require serializing and passing the cargo target metadata to the subprocess, that's pretty cumbersome if it can be avoided (see comments).

I do have one major concern with this approach... Previously we would rebuild everything using whatever version of rustc was embedded in the dynamic instrumenter. Now, we rely on the rustc that this wrapper calls to be identical to (or at least closely compatible with) the version we linked against. We should enforce this and warn if there is a mismatch.

analysis/test/Cargo.toml Outdated Show resolved Hide resolved
dynamic_instrumentation/src/main.rs Outdated Show resolved Hide resolved
dynamic_instrumentation/src/main.rs Outdated Show resolved Hide resolved
dynamic_instrumentation/src/main.rs Outdated Show resolved Hide resolved
dynamic_instrumentation/src/main.rs Outdated Show resolved Hide resolved
dynamic_instrumentation/src/main.rs Outdated Show resolved Hide resolved
dynamic_instrumentation/src/main.rs Outdated Show resolved Hide resolved
dynamic_instrumentation/src/main.rs Outdated Show resolved Hide resolved
dynamic_instrumentation/src/main.rs Outdated Show resolved Hide resolved
dynamic_instrumentation/src/main.rs Outdated Show resolved Hide resolved
@kkysen
Copy link
Contributor Author

kkysen commented Jul 27, 2022

Overall, I think this is the right direction. Hopefully we can fix this up to not require serializing and passing the cargo target metadata to the subprocess, that's pretty cumbersome if it can be avoided (see comments).

If the CARGO_PRIMARY_PACKAGE way works, that'd be great, and we can get rid of all that code (just need to pass the metadata path then).

I do have one major concern with this approach... Previously we would rebuild everything using whatever version of rustc was embedded in the dynamic instrumenter. Now, we rely on the rustc that this wrapper calls to be identical to (or at least closely compatible with) the version we linked against. We should enforce this and warn if there is a mismatch.

That's a good point. One thing to note, though, is that we must depend on the system's rustc sysroot, as we don't ship one. So I think either way, we need rustc versions to match. I could change the code to use RunCompiler with NullCallbacks, instead, but we'll have the rustc version issue whichever way we do things. I think we should also set $RUSTUP_TOOLCHAIN, but still also check versions. Thinking about it more, I think if we use RunCompiler with NullCallbacks, the hashes in dynamic libraries should enforce the version check. The reason I used (the real) rustc as a subcommand instead was so that we get exactly correct handling, as there's a bit more code in the rustc binary than just RunCompiler::new(...).run().

This is also why I think that, in hindsight, the rpath embedding is suboptimal, as it embeds an absolute path in a binary that should be portable. It's not the worst, as we just get the old behavior if that absolute rpath doesn't exist, and if it does exist, it's highly likely it's correct, because it's a toolchain/version-specific path, and the dynamic libraries include their hash. Instead, I think it's better that all c2rust binaries that operate on a Rust crate and rustc private crates (i.e. c2rust-instrument, c2rust-pdg, c2rust-analyze, but not c2rust-transpile), should be run through cargo as a cargo-* plugin.

@kkysen
Copy link
Contributor Author

kkysen commented Jul 30, 2022

I'm now (2660946) detecting build scripts by checking if it is --crate-type bin and does not set $CARGO_BIN_NAME, because $CARGO_BIN_NAME is set for normal binary crates but not for build scripts. I'm not sure if this is the best way to detect build scripts, but I couldn't find a better way.

@kkysen
Copy link
Contributor Author

kkysen commented Jul 30, 2022

However, now I'm getting an error in the actual instrumentation code, which wasn't happening previously, but this is probably from other changes and refactors to the instrumentation code in other PRs, as it's not in the code changes from this PR.

./scripts/pdg.sh ../lighttpd/rust/:

thread 'rustc' panicked at 'operand is not of integer-castable type: op = _4, ty = src::configfile::config_t, ty.kind(): Adt(src::configfile::config_t, [])', dynamic_instrumentation/src/arg.rs:30:13
stack backtrace:
   0: rust_begin_unwind
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/core/src/panicking.rs:143:14
   2: c2rust_instrument::arg::ArgKind::from_type
             at /home/kkysen/work/rust/c2rust/dynamic_instrumentation/src/arg.rs:30:13
   3: c2rust_instrument::point::build::InstrumentationBuilder::arg_var
             at /home/kkysen/work/rust/c2rust/dynamic_instrumentation/src/point/build.rs:83:42
   4: c2rust_instrument::instrument::<impl rustc_middle::mir::visit::Visitor for c2rust_instrument::point::InstrumentationAdder>::visit_assign::{{closure}}
             at /home/kkysen/work/rust/c2rust/dynamic_instrumentation/src/instrument.rs:181:13
   5: c2rust_instrument::instrument::<impl rustc_middle::mir::visit::Visitor for c2rust_instrument::point::InstrumentationAdder>::visit_assign
             at /home/kkysen/work/rust/c2rust/dynamic_instrumentation/src/instrument.rs:190:17
   6: rustc_middle::mir::visit::Visitor::super_statement
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/compiler/rustc_middle/src/mir/visit.rs:381:25
   7: rustc_middle::mir::visit::Visitor::visit_statement
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/compiler/rustc_middle/src/mir/visit.rs:95:17
   8: rustc_middle::mir::visit::Visitor::super_basic_block_data
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/compiler/rustc_middle/src/mir/visit.rs:312:21
   9: rustc_middle::mir::visit::Visitor::visit_basic_block_data
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/compiler/rustc_middle/src/mir/visit.rs:84:17
  10: rustc_middle::mir::visit::Visitor::super_body
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/compiler/rustc_middle/src/mir/visit.rs:261:21
  11: rustc_middle::mir::visit::Visitor::visit_body
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/compiler/rustc_middle/src/mir/visit.rs:78:17
  12: c2rust_instrument::instrument::instrument_body
             at /home/kkysen/work/rust/c2rust/dynamic_instrumentation/src/instrument.rs:406:5
  13: c2rust_instrument::instrument::Instrumenter::instrument_fn
             at /home/kkysen/work/rust/c2rust/dynamic_instrumentation/src/instrument.rs:61:9
  14: c2rust_instrument::override_queries::{{closure}}
             at /home/kkysen/work/rust/c2rust/dynamic_instrumentation/src/main.rs:334:13
  15: core::ops::function::FnOnce::call_once
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/core/src/ops/function.rs:227:5
  16: <rustc_query_system::dep_graph::graph::DepGraph<rustc_middle::dep_graph::dep_node::DepKind>>::with_task::<rustc_middle::ty::context::TyCtxt, rustc_middle::ty::WithOptConstParam<rustc_span::def_id::LocalDefId>, &rustc_data_structures::steal::Steal<rustc_middle::mir::Body>>
  17: rustc_data_structures::stack::ensure_sufficient_stack::<(&rustc_data_structures::steal::Steal<rustc_middle::mir::Body>, rustc_query_system::dep_graph::graph::DepNodeIndex), rustc_query_system::query::plumbing::execute_job<rustc_query_impl::plumbing::QueryCtxt, rustc_middle::ty::WithOptConstParam<rustc_span::def_id::LocalDefId>, &rustc_data_structures::steal::Steal<rustc_middle::mir::Body>>::{closure#3}>
  18: rustc_query_system::query::plumbing::try_execute_query::<rustc_query_impl::plumbing::QueryCtxt, rustc_query_system::query::caches::DefaultCache<rustc_middle::ty::WithOptConstParam<rustc_span::def_id::LocalDefId>, &rustc_data_structures::steal::Steal<rustc_middle::mir::Body>>>
  19: <rustc_query_impl::Queries as rustc_middle::ty::query::QueryEngine>::mir_built
  20: rustc_mir_transform::check_unsafety::unsafety_check_result
  21: <rustc_mir_transform::check_unsafety::provide::{closure#0} as core::ops::function::FnOnce<(rustc_middle::ty::context::TyCtxt, rustc_span::def_id::LocalDefId)>>::call_once
  22: <rustc_query_system::dep_graph::graph::DepGraph<rustc_middle::dep_graph::dep_node::DepKind>>::with_task::<rustc_middle::ty::context::TyCtxt, rustc_span::def_id::LocalDefId, &rustc_middle::mir::query::UnsafetyCheckResult>
  23: rustc_data_structures::stack::ensure_sufficient_stack::<(&rustc_middle::mir::query::UnsafetyCheckResult, rustc_query_system::dep_graph::graph::DepNodeIndex), rustc_query_system::query::plumbing::execute_job<rustc_query_impl::plumbing::QueryCtxt, rustc_span::def_id::LocalDefId, &rustc_middle::mir::query::UnsafetyCheckResult>::{closure#3}>
  24: rustc_query_system::query::plumbing::try_execute_query::<rustc_query_impl::plumbing::QueryCtxt, rustc_query_system::query::caches::DefaultCache<rustc_span::def_id::LocalDefId, &rustc_middle::mir::query::UnsafetyCheckResult>>
  25: <rustc_query_impl::Queries as rustc_middle::ty::query::QueryEngine>::unsafety_check_result
  26: rustc_mir_transform::mir_const
  27: <rustc_query_system::dep_graph::graph::DepGraph<rustc_middle::dep_graph::dep_node::DepKind>>::with_task::<rustc_middle::ty::context::TyCtxt, rustc_middle::ty::WithOptConstParam<rustc_span::def_id::LocalDefId>, &rustc_data_structures::steal::Steal<rustc_middle::mir::Body>>
  28: rustc_data_structures::stack::ensure_sufficient_stack::<(&rustc_data_structures::steal::Steal<rustc_middle::mir::Body>, rustc_query_system::dep_graph::graph::DepNodeIndex), rustc_query_system::query::plumbing::execute_job<rustc_query_impl::plumbing::QueryCtxt, rustc_middle::ty::WithOptConstParam<rustc_span::def_id::LocalDefId>, &rustc_data_structures::steal::Steal<rustc_middle::mir::Body>>::{closure#3}>
  29: rustc_query_system::query::plumbing::try_execute_query::<rustc_query_impl::plumbing::QueryCtxt, rustc_query_system::query::caches::DefaultCache<rustc_middle::ty::WithOptConstParam<rustc_span::def_id::LocalDefId>, &rustc_data_structures::steal::Steal<rustc_middle::mir::Body>>>
  30: <rustc_query_impl::Queries as rustc_middle::ty::query::QueryEngine>::mir_const
  31: rustc_mir_transform::mir_promoted
  32: <rustc_query_system::dep_graph::graph::DepGraph<rustc_middle::dep_graph::dep_node::DepKind>>::with_task::<rustc_middle::ty::context::TyCtxt, rustc_middle::ty::WithOptConstParam<rustc_span::def_id::LocalDefId>, (&rustc_data_structures::steal::Steal<rustc_middle::mir::Body>, &rustc_data_structures::steal::Steal<rustc_index::vec::IndexVec<rustc_middle::mir::Promoted, rustc_middle::mir::Body>>)>
  33: rustc_data_structures::stack::ensure_sufficient_stack::<((&rustc_data_structures::steal::Steal<rustc_middle::mir::Body>, &rustc_data_structures::steal::Steal<rustc_index::vec::IndexVec<rustc_middle::mir::Promoted, rustc_middle::mir::Body>>), rustc_query_system::dep_graph::graph::DepNodeIndex), rustc_query_system::query::plumbing::execute_job<rustc_query_impl::plumbing::QueryCtxt, rustc_middle::ty::WithOptConstParam<rustc_span::def_id::LocalDefId>, (&rustc_data_structures::steal::Steal<rustc_middle::mir::Body>, &rustc_data_structures::steal::Steal<rustc_index::vec::IndexVec<rustc_middle::mir::Promoted, rustc_middle::mir::Body>>)>::{closure#3}>
  34: rustc_query_system::query::plumbing::try_execute_query::<rustc_query_impl::plumbing::QueryCtxt, rustc_query_system::query::caches::DefaultCache<rustc_middle::ty::WithOptConstParam<rustc_span::def_id::LocalDefId>, (&rustc_data_structures::steal::Steal<rustc_middle::mir::Body>, &rustc_data_structures::steal::Steal<rustc_index::vec::IndexVec<rustc_middle::mir::Promoted, rustc_middle::mir::Body>>)>>
  35: <rustc_query_impl::Queries as rustc_middle::ty::query::QueryEngine>::mir_promoted
  36: <rustc_borrowck::provide::{closure#0} as core::ops::function::FnOnce<(rustc_middle::ty::context::TyCtxt, rustc_span::def_id::LocalDefId)>>::call_once
  37: <rustc_query_system::dep_graph::graph::DepGraph<rustc_middle::dep_graph::dep_node::DepKind>>::with_task::<rustc_middle::ty::context::TyCtxt, rustc_span::def_id::LocalDefId, &rustc_middle::mir::query::BorrowCheckResult>
  38: rustc_data_structures::stack::ensure_sufficient_stack::<(&rustc_middle::mir::query::BorrowCheckResult, rustc_query_system::dep_graph::graph::DepNodeIndex), rustc_query_system::query::plumbing::execute_job<rustc_query_impl::plumbing::QueryCtxt, rustc_span::def_id::LocalDefId, &rustc_middle::mir::query::BorrowCheckResult>::{closure#3}>
  39: rustc_query_system::query::plumbing::try_execute_query::<rustc_query_impl::plumbing::QueryCtxt, rustc_query_system::query::caches::DefaultCache<rustc_span::def_id::LocalDefId, &rustc_middle::mir::query::BorrowCheckResult>>
  40: <rustc_query_impl::Queries as rustc_middle::ty::query::QueryEngine>::mir_borrowck
  41: <rustc_session::session::Session>::time::<(), rustc_interface::passes::analysis::{closure#2}>
  42: rustc_interface::passes::analysis
  43: <rustc_query_system::dep_graph::graph::DepGraph<rustc_middle::dep_graph::dep_node::DepKind>>::with_task::<rustc_middle::ty::context::TyCtxt, (), core::result::Result<(), rustc_errors::ErrorReported>>
  44: rustc_data_structures::stack::ensure_sufficient_stack::<(core::result::Result<(), rustc_errors::ErrorReported>, rustc_query_system::dep_graph::graph::DepNodeIndex), rustc_query_system::query::plumbing::execute_job<rustc_query_impl::plumbing::QueryCtxt, (), core::result::Result<(), rustc_errors::ErrorReported>>::{closure#3}>
  45: rustc_query_system::query::plumbing::try_execute_query::<rustc_query_impl::plumbing::QueryCtxt, rustc_query_system::query::caches::DefaultCache<(), core::result::Result<(), rustc_errors::ErrorReported>>>
  46: rustc_query_system::query::plumbing::get_query::<rustc_query_impl::queries::analysis, rustc_query_impl::plumbing::QueryCtxt>
  47: <rustc_interface::passes::QueryContext>::enter::<rustc_driver::run_compiler::{closure#1}::{closure#2}::{closure#3}, core::result::Result<(), rustc_errors::ErrorReported>>
  48: rustc_interface::interface::create_compiler_and_run::<core::result::Result<(), rustc_errors::ErrorReported>, rustc_driver::run_compiler::{closure#1}>

This is the code that panics:

pub fn from_type(op: Operand<'tcx>, ty: &ty::Ty<'tcx>) -> Self {
use ArgKind::*;
(if ty.is_unsafe_ptr() {
RawPtr
} else if ty.is_region_ptr() {
Reference
} else if ty.is_integral() {
AddressUsize
} else {
panic!("operand is not of integer-castable type: {:?}", op)
})(op)
}

@aneksteind, @fw-immunant, any thoughts? I'm not too familiar with this part of the code other than refactors.

It seems to be from the TyKind::Adt (an algebraic data-type, a struct in this case) not being handled. We don't handle most TyKinds, though. Why is that? Or should such TyKinds not be passed to ArgKind::from_type in the first place?

hint hint eyre would've been very useful for this kind of bug.

@kkysen kkysen requested a review from fw-immunant August 1, 2022 21:58
@kkysen
Copy link
Contributor Author

kkysen commented Aug 2, 2022

The above lighttpd Adt error happens on master as well, so it's definitely not caused by changes from this PR. I think there was a change in a previous PR that landed that @fw-immunant had mentioned might not handle all cases, but I'm not sure where that is.

@kkysen
Copy link
Contributor Author

kkysen commented Aug 2, 2022

Filed an issue for it: #570, so it shouldn't block this PR.

Copy link
Contributor

@rinon rinon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has grown somewhat beyond it's original goal, which was just to build via the cargo cli tool instead of the library. I'd like to see this get split into a few different changes. At the very least, move the script changes into a separate PR?

I put a few more comments, but this is really too large to properly review as it is now.

dynamic_instrumentation/src/arg.rs Outdated Show resolved Hide resolved
dynamic_instrumentation/src/lib.rs Show resolved Hide resolved
c2rust/src/main.rs Show resolved Hide resolved
@kkysen
Copy link
Contributor Author

kkysen commented Aug 2, 2022

I can move the script deletions to a separate PR, but the pdg.sh changes are in sync with the changes to the other code such that cargo test should always work.

Comment on lines -126 to -223
pub fn instrument(
metadata_file_path: &Path,
rt_path: &Path,
_args: &[String],
) -> anyhow::Result<()> {
let config = Config::default().unwrap();
config.shell().set_verbosity(Verbosity::Quiet);
let mode = CompileMode::Build;
let compile_opts = CompileOptions::new(&config, mode).unwrap();

let manifest_path = find_root_manifest_for_wd(config.cwd()).unwrap();
let ws = Workspace::new(&manifest_path, &config).unwrap();

let rt_manifest_path = find_root_manifest_for_wd(rt_path).unwrap();
let mut rt_ws = Workspace::new(&rt_manifest_path, &config).unwrap();
rt_ws.set_target_dir(ws.target_dir());

let exec = Arc::new(InstrumentationExecutor {
default: DefaultExecutor,
target_pkg: ws.current().unwrap().package_id(),
rt_crate_path: Mutex::new(String::new()),
building_rt: AtomicBool::new(true),
});
let exec_dyn: Arc<dyn Executor> = exec.clone();

let cwd = env::current_dir().unwrap();

env::set_current_dir(rt_ws.root()).unwrap();
ops::compile_with_exec(&rt_ws, &compile_opts, &exec_dyn)?;

exec.building_rt.store(false, Ordering::Relaxed);
env::set_current_dir(cwd).unwrap();
ops::compile_with_exec(&ws, &compile_opts, &exec_dyn)?;

INSTRUMENTER.finalize(metadata_file_path)
}
struct InstrumentationExecutor {
default: DefaultExecutor,
target_pkg: PackageId,
rt_crate_path: Mutex<String>,
building_rt: AtomicBool,
}

impl Executor for InstrumentationExecutor {
fn init(&self, cx: &Context<'_, '_>, unit: &Unit) {
if self.building_rt.load(Ordering::Relaxed) && cx.is_primary_package(unit) {
*self.rt_crate_path.lock().unwrap() = cx.outputs(unit).unwrap()[0]
.path
.to_str()
.unwrap()
.to_owned();
}
self.default.init(cx, unit);
}

fn exec(
&self,
cmd: &ProcessBuilder,
id: PackageId,
target: &Target,
_mode: CompileMode,
_on_stdout_line: &mut dyn FnMut(&str) -> CargoResult<()>,
_on_stderr_line: &mut dyn FnMut(&str) -> CargoResult<()>,
) -> CargoResult<()> {
let mut args: Vec<String> = cmd
.get_args()
.iter()
.map(|a| a.to_str().unwrap().to_string())
.collect();
args.insert(0, cmd.get_program().to_str().unwrap().to_string());
// We need to point the rust compiler libraries to the corresponding sysroot
args.push(format!("--sysroot={}", env!("RUST_SYSROOT")));
for (var, val) in cmd.get_envs() {
env::set_var(var, val.as_ref().unwrap_or(&OsString::new()));
}
if id == self.target_pkg && !target.for_host() {
args.push("--extern".to_string());
args.push(format!(
"c2rust_analysis_rt={}",
self.rt_crate_path.lock().unwrap()
));
let mut callbacks = MirTransformCallbacks;
// TODO: Capture stdout and pass it back to cargo
rustc_driver::RunCompiler::new(&args, &mut callbacks)
.run()
.map_err(|_| anyhow!("Compilation failed"))
} else {
let mut callbacks = NullCallbacks;
rustc_driver::RunCompiler::new(&args, &mut callbacks)
.run()
.map_err(|_| anyhow!("Compilation failed"))
}
}

fn force_rebuild(&self, unit: &Unit) -> bool {
self.building_rt.load(Ordering::Relaxed) || self.default.force_rebuild(unit)
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was deleted as it is the cargo-as-a-library code that was deleted. The rest of lib.rs was moved to the end of main.rs as-is, with only imports merged.

@kkysen
Copy link
Contributor Author

kkysen commented Aug 2, 2022

This PR has grown somewhat beyond it's original goal, which was just to build via the cargo cli tool instead of the library. I'd like to see this get split into a few different changes. At the very least, move the script changes into a separate PR?

I put a few more comments, but this is really too large to properly review as it is now.

Is it really too large of a change to review? There are ~160 new lines of code in main.rs, ~100 lines of documentation in main.rs, a few tiny additions in other files, and then a lot of code that was able to be deleted now. I don't see a good way of breaking it up, and breaking it up by putting deleted files in another PR doesn't seem very helpful, as those are just dead code and (I don't think) require significant cognitive complexity to review.

It's unfortunate that the lib.rs to main.rs move with some changes is hard to see in the diff, but I just went through and tried to highlight the differences with comments on the code. NullCallbacks and the cargo-as-a-library code was all deleted, and then the rest was moved as-is to main.rs.

Significant development-time savings are waiting on this PR with its 30-fold incremental instrumentation performance improvements, as well as the changes I'm trying to make off of this, which are hard to do while this one still changes. Splitting this PR up into more is both difficult as I'm not sure how to do so effectively, might introduce bugs that I've already fixed, and make everything waiting for this PR take much longer.

There's been a lot of development on this PR, so there are a lot of commits, but a lot of those went towards simplifying things and fixing bugs. For example, the $CARGO_PRIMARY_PACKAGE change allowed a lot of code to be removed. Again, it's very unfortunate that the lib.rs to main.rs move couldn't be properly modeled by git, but there aren't actually so many additions in this PR, just a lot of dead code removal.

@fw-immunant
Copy link
Contributor

W/r/t the lib.rs -> main.rs move, is there a way to put that code in a separate (non-lib.rs) file with pub(crate) visibility to simplify the diff?

@kkysen
Copy link
Contributor Author

kkysen commented Aug 2, 2022

Oh, that's a good idea. Let me see if that works.

@kkysen kkysen force-pushed the kkysen/instrument-rustc-wrapper branch from fe8408f to 5abd6d6 Compare August 2, 2022 23:12
@kkysen
Copy link
Contributor Author

kkysen commented Aug 2, 2022

@fw-immunant, I tried moving that previously lib.rs code to callbacks.rs, but git still doesn't recognize it as a moved file. Do you know how to get git to recognize that after the fact? (I could create callbacks.rs on master with a similar refactor, but I'm not sure if I should do that.)

@fw-immunant
Copy link
Contributor

@fw-immunant, I tried moving that previously lib.rs code to callbacks.rs, but git still doesn't recognize it as a moved file. Do you know how to get git to recognize that after the fact? (I could create callbacks.rs on master with a similar refactor, but I'm not sure if I should do that.)

git only detects moves/renames within a single commit, so the commits moving code from lib.rs into main.rs and moving back out from main.rs into callbacks.rs need to be squashed together. I tried doing this with

git show 'HEAD^{/Removed `InstrumentationExecutor::default: DefaultExecutor`}'`

but this change to the rebase plan

pick 676dee2d Deleted `c2rust-dynamic-instrumentation`'s `lib.rs` and moved it all to `bin.rs`, b/c it should only have a CLI interface.
squash d99b9a8d Move the previous `lib.rs` code in `main.rs` to `callbacks.rs` for a (hopefully) cleaner diff.

ran into a bunch of conflicts, presumably because of the merge commits between the two.

I don't really know how to work with interactive rebase in the presence of various merge commits.

@kkysen
Copy link
Contributor Author

kkysen commented Aug 3, 2022

@fw-immunant, I think it's too hard to rebase that far back now, as doing so would require a lot of manual merges. @rinon approved now, so I think I'll just merge this as is, and then fix up the c2rust subcommand logic so c2rust instrument works the same again (as well as the other subcommands).

@kkysen kkysen merged commit d03d1c7 into master Aug 4, 2022
@kkysen kkysen deleted the kkysen/instrument-rustc-wrapper branch August 12, 2022 17:29
kkysen added a commit that referenced this pull request Aug 25, 2022
…rust` just forward args to discovered subcommands

This changes `c2rust` to be a simpler wrapper around the other `c2rust-*` subcommand.

Instead of `c2rust` having to know about all the subcommands and their arguments upfront, `c2rust $subcommand $args` just runs `c2rust-$subcommand $args`.  This allows `c2rust instrument` to work as before (before #554), while also enabling `c2rust analyze` and `c2rust pdg` in the same way.  The `clap` help messages are still preserved for the most part, except for the short help messages for the subcommands.  Otherwise, `c2rust --help` works as before (while also suggesting the new subcommands), and `c2rust $subcommand --help` works by running `c2rust-$subcommand --help` (instead of `clap` intercepting the `--help`).

The way this is implemented is, first the `c2rust` binary's directory is searched for executables named `c2rust-*` to discover subcommands.  This is combined with the simple list of known subcommands (`["transpile", "instrument", "pdg", "analyze"]`) in case they're not discovered properly and we still want to suggest them.  Then we check if the first argument is one of these subcommands.  If it exists, we invoke it.  If it doesn't exist, but is known, we suggest building it, and it doesn't exist and isn't known (or there was no subcommand given), then we run the `clap` parser and let it handle arg parsing and nice error/help messages.  The reason we don't have everything go through `clap` is that I couldn't figure out a way to have `clap` just forward all arguments, even ones like `--metadata` with hyphens (`Arg::allow_hyphen_values` didn't work), without requiring a leading `--` argument.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

libiconv necessary on mac, doesn't link c2rust instrument should apply extra cargo args
4 participants