Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vmm: imporve observability to vmm-task & vmm-sandboxer #146

Merged
merged 1 commit into from
Oct 30, 2024

Conversation

Ziy1-Tan
Copy link
Contributor

@Ziy1-Tan Ziy1-Tan commented Aug 4, 2024

closes #5

async fn append_container(&mut self, id: &str, options: ContainerOption) -> Result<()> {

  • Visualize the results:

image

Main Changes

  • : Replace env_logger with crate tracing;
  • : Add #[instrument] for Sandbox and Container API;
  • : Consume tracing data with the opentelemetry backend (e.g. Jaeger).

Steps

  1. Enable tracing feature:
[sandbox]
enable_tracing = true
[hypervisor]
[hypervisor.task]
enable_tracing = true
[hypervisor.virtiofsd]
  1. Run the jaeger server:
docker run --rm --name jaeger \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 14250:14250 \
  -p 14268:14268 \
  -p 14269:14269 \
  -p 9411:9411 \
  jaegertracing/all-in-one
  1. Run the Kuasar: https://github.com/kuasar-io/kuasar#quick-start

  2. Check the result: visit the web page http://localhost:16686/

Instrumentation Scope

The following API will be instrumented:

  1. Sandbox API
pub trait Sandboxer {
    type Sandbox: Sandbox + Send + Sync;
    async fn create(&self, id: &str, s: SandboxOption) -> Result<()>;
    async fn start(&self, id: &str) -> Result<()>;
    async fn update(&self, id: &str, s: SandboxData) -> Result<()>;
    async fn sandbox(&self, id: &str) -> Result<Arc<Mutex<Self::Sandbox>>>;
    async fn stop(&self, id: &str, force: bool) -> Result<()>;
    async fn delete(&self, id: &str) -> Result<()>;
}

#[async_trait]
pub trait Sandbox: Sync + Send {
    type Container: Container + Send + Sync;
    fn status(&self) -> Result<SandboxStatus>;
    async fn ping(&self) -> Result<()>;
    async fn container(&self, id: &str) -> Result<&Self::Container>;
    async fn append_container(&mut self, id: &str, option: ContainerOption) -> Result<()>;
    async fn update_container(&mut self, id: &str, option: ContainerOption) -> Result<()>;
    async fn remove_container(&mut self, id: &str) -> Result<()>;
    async fn exit_signal(&self) -> Result<Arc<ExitSignal>>;
    fn get_data(&self) -> Result<SandboxData>;
}
  1. Task API
#[async_trait]
pub trait ContainerFactory<C> {
    async fn create(&self, ns: &str, req: &CreateTaskRequest) -> Result<C>;
    async fn cleanup(&self, ns: &str, c: &C) -> Result<()>;
}

#[async_trait]
pub trait ProcessFactory<E> {
    async fn create(&self, req: &ExecProcessRequest) -> Result<E>;
}

#[async_trait]
pub trait ProcessLifecycle<P: Process> {
    async fn start(&self, p: &mut P) -> crate::Result<()>;
    async fn kill(&self, p: &mut P, signal: u32, all: bool) -> crate::Result<()>;
    async fn delete(&self, p: &mut P) -> crate::Result<()>;
    async fn update(&self, p: &mut P, resources: &LinuxResources) -> crate::Result<()>;
    async fn stats(&self, p: &P) -> crate::Result<Metrics>;
    async fn ps(&self, p: &P) -> crate::Result<Vec<ProcessInfo>>;
}

@Ziy1-Tan Ziy1-Tan requested a review from a team as a code owner August 4, 2024 09:28
@Ziy1-Tan Ziy1-Tan marked this pull request as draft August 4, 2024 09:29
@Ziy1-Tan Ziy1-Tan changed the title vmm: Add observability to vmm-task & vmm-sandboxer vmm: imporve observability to vmm-task & vmm-sandboxer Aug 4, 2024
@Ziy1-Tan Ziy1-Tan marked this pull request as ready for review September 8, 2024 13:35
@Ziy1-Tan Ziy1-Tan force-pushed the feat-trace branch 2 times, most recently from c7bb961 to 7171a71 Compare September 16, 2024 07:02
@Ziy1-Tan Ziy1-Tan requested a review from a team as a code owner September 16, 2024 07:02
@Ziy1-Tan
Copy link
Contributor Author

cc @Burning1020

@Ziy1-Tan Ziy1-Tan force-pushed the feat-trace branch 5 times, most recently from 5a8ef7b to ac3d462 Compare September 19, 2024 15:52
Copy link
Member

@Burning1020 Burning1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job! /cc @abel-von @flyflypeng
Could you add a markdown to give more detail usages and results?
Another question, after switching from log to tracing, what changes occurred in the format and amount of log output? Was there a comparison?

Comment on lines 41 to 54
let env_filter = init_logger_filter(&args.log_level.unwrap_or(config.sandbox.log_level()))
.expect("failed to init logger filter");

let mut layers = vec![tracing_subscriber::fmt::layer().boxed()];
if config.sandbox.enable_tracing {
let tracer = init_otlp_tracer("kuasar-vmm-sandboxer-clh-tracing-service")
.expect("failed to init otlp tracer");
layers.push(tracing_opentelemetry::layer().with_tracer(tracer).boxed());
}

let subscriber = Registry::default().with(env_filter).with(layers);
tracing::subscriber::set_global_default(subscriber).expect("unable to set global subscriber");

let root_span = info_span!("kuasar-vmm-sandboxer-clh-root").entered();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you extra L44-54 in a common function and give sandboxers different service name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Maybe service name could be also configured by toml else use it default?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it’s better if the name isn’t configurable.

vmm/sandbox/src/bin/cloud_hypervisor/main.rs Outdated Show resolved Hide resolved
vmm/task/Cargo.toml Show resolved Hide resolved
vmm/task/src/main.rs Outdated Show resolved Hide resolved
@Ziy1-Tan
Copy link
Contributor Author

Ziy1-Tan commented Sep 21, 2024

Good job! /cc @abel-von @flyflypeng Could you add a markdown to give more detail usages and results? Another question, after switching from log to tracing, what changes occurred in the format and amount of log output? Was there a comparison?

There are slight differences in the log format (brackets & colons):

env_logger::Builder::from_default_env()
	.format_timestamp_micros()
	.init();
log::error!("hello world from env_logger");  // [2024-09-21T04:02:26.007041Z ERROR log_test] Hello from env_logger
tracing::error!("hello world from tracing"); // 2024-09-21T04:02:05.022320Z ERROR log_test: Hello from tracing

tracer::setup_tracing(
&args.log_level.unwrap_or(config.sandbox.log_level()),
enable_tracing,
"kuasar-vmm-sandboxer-clh-otlp-service",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use kuasar-vmm-sandboxer-clh, as "oltp" is not the key info

@@ -47,4 +52,11 @@ impl Hooks<StratoVirtVM> for StratoVirtHooks {
sandbox.sync_clock().await;
Ok(())
}

async fn post_stop(&self, _sandbox: &mut KuasarSandbox<StratoVirtVM>) -> Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

post_stop was called every time before the sandbox stopped, so it's not appropriate call it here.

To make it easy, how about not positive stopping tracing, stopping it with the whole process exits?

Comment on lines 235 to 237
if config.enable_tracing {
tracer::shutdown_tracing();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The task process also stopped tracing when it exited.

@Ziy1-Tan Ziy1-Tan force-pushed the feat-trace branch 2 times, most recently from 1014996 to 78276ac Compare October 19, 2024 16:41
ENABLED.load(Ordering::Relaxed)
}

// 设置 ENABLED 的值
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please translate it to English

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment on lines 59 to 69
if enable_tracing {
let tracer = init_otlp_tracer(otlp_service_name)
.map_err(|e| anyhow!("failed to init otlp tracer: {}", e))?;
layers.push(tracing_opentelemetry::layer().with_tracer(tracer).boxed());
} else {
layers.push(
tracing_opentelemetry::layer()
.with_tracer(NoopTracer::new())
.boxed(),
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    let tracer = if enable_tracing {
        init_otlp_tracer(otlp_service_name)
            .map_err(|e| anyhow!("failed to init otlp tracer: {}", e))?
    } else {
        NoopTracer::new()
    };
    layers.push(tracing_opentelemetry::layer().with_tracer(tracer).boxed());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tracing_opentelemetry::layer() will be added only when enable_tracing = true now.

Comment on lines 106 to 117
SIGINT | SIGTERM => {
info!("Received signal {}, stopping tracing and exiting...", sig);
shutdown_tracing();
std::process::exit(0);
}
_ => {
if let Ok(sig) = nix::sys::signal::Signal::try_from(sig) {
debug!("received {}", sig);
} else {
warn!("received invalid signal {}", sig);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think only signal SIGUSR1 should be handled, and the others should be kept same as before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

let log_level = args.log_level.unwrap_or(config.sandbox.log_level());
let service_name = "kuasar-vmm-sandboxer-clh-service";
tracer::setup_tracing(&log_level, enable_tracing, service_name).unwrap();
tracer::set_enabled(enable_tracing);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ENABLED: AtomicBool static variable is the only one to detiminate, so it's better to:

...
tracer::set_enabled(config.sandbox.enable_tracing);
tracer::setup_tracing(&log_level, service_name).unwrap();

and let setup_tracing() read ENABLED

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

global::shutdown_tracer_provider();
}

pub async fn handle_signals(log_level: &str, otlp_service_name: &str) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle_signal() is used to handle all signals of the current process, so this function should not in the tracer.rs file(e.g. signal.rs), it is a more common funtion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

vmm/task/src/main.rs Show resolved Hide resolved
Copy link
Member

@Burning1020 Burning1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks

Copy link
Member

@Burning1020 Burning1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks

@Burning1020
Copy link
Member

Please squash your commits. @Ziy1-Tan

Signed-off-by: Ziy1-Tan <ajb459684460@gmail.com>
@Burning1020 Burning1020 merged commit 6504897 into kuasar-io:main Oct 30, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add observability to the project
2 participants