Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: runtime/pprof: extend profile configuration to support perf events #53286

Open
erifan opened this issue Jun 8, 2022 · 62 comments
Open

Comments

@erifan
Copy link
Contributor

erifan commented Jun 8, 2022

#42502 Proposes configurable CPU profiling via NewCPUProfile + cpuProfile.Start and the proposal has been accepted. But the proposal does not appear to approve any exported fields, nor does it support any perf event extensions. So I propose to extend this proposal to support configuring perf events.

I propose to add the following exported struct and method to the pprof package.

type PerfProfile {  // We already have struct Profile, name it as PerfProfile
    Event string  // perf event names, such as cycles, instructions
    Rate int  // perf event period or os-timer frequency
}

func (p *PerfProfile) Start(w io.Writer) error {}

It would be better if a corresponding Stop method was exported, but it is not necessary, we can correctly stop the profiling by the cached configuration.

If we want to use PMU profiling in http/pprof and testing packages, then we need to make a few more changes.

For http/pprof, we are not necessary to add a new interface, we can add some parameters to the profiling URL, and then do the corresponding profiling according to the parameters. for example:
go tool pprof http://localhost:6060/debug/pprof/profile?event=cycles&&rate=10000

For the testing package, we need to add two options to the go test command, one is -perfprofile perf.out to specify the output file, and the other is -perfprofileconfig "conf" to specify the configuration. These two options need to exist at the same time to work. The value of -perfprofileconfig must meet the specified format: "{event=EEE rate=RRR}" Example:
go test -perfprofile perf.out -perfprofileconfig="{event=EEE rate=RRR}" -run .

The introduction of perf event helps to improve the precision of profiling, which is also beneficial for PGO in the future. I have made a preliminary implementation, see the series of patches https://go-review.googlesource.com/c/go/+/410796/5, the implementation does not make any changes to the http/pprof and testing packages.

This proposal is related to #42502 and #36821.

@erifan erifan added the Proposal label Jun 8, 2022
@gopherbot gopherbot added this to the Proposal milestone Jun 8, 2022
@prattmic
Copy link
Member

prattmic commented Jun 8, 2022

I have a few minor questions:

  1. What is the output format of these profiles? Still pprof, I assume (rather than perf.data)? In that case, are there special considerations we need to take for particular perf events that may not map nicely to the existing pprof format, or should everything fit nicely?
  2. What is the plan for non-Linux OSes? proposal: runtime/pprof: add PMU-based profiles #36821 (comment) mentions that Darwin and Windows don't have clear PMU interfaces. Is PerfProfile only exported on Linux? Or Start returns an error on non-Linux?

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Jun 8, 2022
@erifan
Copy link
Contributor Author

erifan commented Jun 9, 2022

  1. What is the output format of these profiles? Still pprof, I assume (rather than perf.data)? In that case, are there special considerations we need to take for particular perf events that may not map nicely to the existing pprof format, or should everything fit nicely?

Yes, still pprof. I didn't change the output format, just replace the original signal source (os-timer) with the PMU counter. The signal handler is also basically unchanged. So pprof is fitting well. The only change is to change the original sample type from nanoseconds to count. I'd like to convert count to nanoseconds because sometimes count doesn't seem so intuitive, but there doesn't seem to be a proper way to do this conversion.

  1. What is the plan for non-Linux OSes? proposal: runtime/pprof: add PMU-based profiles #36821 (comment) mentions that Darwin and Windows don't have clear PMU interfaces. Is PerfProfile only exported on Linux? Or Start returns an error on non-Linux?

Yes, my plan is to only support linux, if we find a appropriate method to support PMU on Windows and Darwin in the future, we will add support for them then. It may be appropriate to panic in Start before it is implemented.

@nsrip-dd
Copy link
Contributor

nsrip-dd commented Jun 9, 2022

Would it be possible to measure multiple events simultaneously by creating a PerfProfile for every event of interest? Alternatively, by giving PerfProfile a method like func (*PerfProfile) MeasureEvent(name string, period int) that can be called multiple times, and having all events go in the same profile?

@erifan
Copy link
Contributor Author

erifan commented Jun 10, 2022

Would it be possible to measure multiple events simultaneously by creating a PerfProfile for every event of interest? Alternatively, by giving PerfProfile a method like func (*PerfProfile) MeasureEvent(name string, period int) that can be called multiple times, and having all events go in the same profile?

My prototype implementation doesn't account for multiple events, but it's not hard to support multiple events. I can do it however I'm not a proposal approver. Thanks.

@prattmic
Copy link
Member

prattmic commented Jul 25, 2022

What is the plan for non-Linux OSes?

I did some research today on other OSes today to get a sense of the capabilities.

Windows

The Windows Performance Recorder CLI supports collecting some PMU events. There is an API to control profile collection, though I believe this communicates with a centralized service that actually does profile collection. As far as I can tell, there is no direct kernel interface. Additionally, it is not possible to limit collection to a specific process as far as I can tell.

Unanswered questions:

  1. Is it possible to collect stack traces / auxiliary information with events, or only counts?
  2. What permissions are required to use this interface? (Given the system-wide collection, I imagine elevated privileges are required.)

Open source libraries like PCM use a custom Windows kernel driver to access PMUs.

Visual Studio has a command capable of collecting PMU events for a specific process. It is unclear to me how this works. Does it use the WPR API?

My takeaway is that there isn't really anything feasible for us to programmatically integrate with, unless I've missed something.

Darwin

Xcode Instruments supports collecting counts from PMU events. How it does so is unclear.

PCM requires a custom kernel driver, like on Windows.

The Darwin kernel has a PMC subsystem called kpc. It looks like this is exposed to userspace, but is only usable by one "blessed" process?

I suspect that there is nothing feasible here either, but I am less confident than with Windows.

FreeBSD

FreeBSD has a complete libpmc library for accessing hardware performance counters in counting or sampling mode (includes support for sending SIGPROF). Internally this uses system calls to a kernel module. I haven't looked into this in much detail, but clearly there is something feasible here.

@erifan
Copy link
Contributor Author

erifan commented Jul 26, 2022

Thanks @prattmic for the research, I don't know much about other OSes, I'll look into how to access the PMU programmatically on other OS's. I wonder if you guys @golang/windows can provide some hints for Windows?

Also I wonder if it is really difficult to access PMU on a OS, can we not support it on this OS? Just like per thread OS timer, we also only support it on Linux.

@prattmic
Copy link
Member

prattmic commented Jul 26, 2022

I should have added more context to my previous comment. The purpose of my research was to get a sense of how hard we should try to design a platform-independent API (even if only Linux is supported initially). My initial take-away is that it is probably less important if Windows and Darwin will be impossible to support.

@prattmic
Copy link
Member

prattmic commented Jul 26, 2022

In this proposal, you are proposing a new type PerfProfile. However, your original CL introduced the previously approved (#42502) CPUProfile type and extended it with perf event support.

I actually like that approach better than adding a new top-level perf-specific profile type for a few reasons:

  • Most of the events that you’d use are CPU profiles. The existing StartCPUProfile is a CPU time profile. The standard perf cycles event would be a CPU cycles profile. Similar with instructions, branch mispredictions. Even events like L1D cache misses are attributed to specific instructions, and thus I think could be reasonably considered CPU profiles.
    • The main exceptions are much more obscure events that measure things like DRAM read/write bandwidth. I am not sure these would make much sense to use in a sampling mode anyways.
  • I think one of the most exciting parts of this proposal is the potential to automatically provide more accurate CPU profiles to everyone using StartCPUProfile/CPUProfile/-cpuprofile if the PMU is available. I think this conceptually make the most sense if the PMU options are directly in CPUProfile.
    • That said, perhaps CPUProfile could just provide options for timer (ITIMER_PROF), cycles (PMU), cycles-precise (PMU+PEBS equivalent) and any more advanced usage is in another profile type.

I also think that using Set methods rather than exported fields provides more flexibility on the struct internals plus the ability to return errors, though I don’t feel too strongly about this as it does make the API below more verbose. One such possible API:

type CPUProfile struct { ... }

func (*CPUProfile) SetRate(int) error
func (*CPUProfile) SetSource(CPUProfileSource) error

type CPUProfileSource int

const (
	CPUTimer CPUProfileSource = iota
	CPUCycles
	CPUCyclesPrecise
	Custom // Configure via Sys().
)

// Returns *LinuxCPUProfile on Linux.
func (*CPUProfile) Sys() any

// LinuxCPUProfile contains Linux perf specific advanced configuration options.
type LinuxCPUProfile struct{ ... }

// Use one of the generic PERF_TYPE_HARDWARE events.
func (*LinuxCPUProfile) SetHardwareSource(LinuxCPUProfileHardwareSource) error

// Use a raw hardware event (PERF_TYPE_RAW). Format of config, config1, and config2 are microarchitecture dependent.
func (*LinuxCPUProfile) SetRawSource(config, config1, config2 uint64) error

// Additional options. e.g., setting the “precise” flag.

The idea of hiding OS-dependent details behind a Sys() method comes from os.ProcessState.Sys() and os/exec.Cmd.SysProcAttr. I’m not the biggest fan of this pattern, but it is nice to be able to provide advanced options without needing to an OS-independent API for every feature, especially given #53286 (comment) indicating that we are unlikely to support Windows or Darwin anyways.


Another big unanswered question is how events should be described. The proposed PerfProfile API says Event string // perf event names, such as cycles, instructions. Are these names as described in the documentation for perf record -e and shown in perf list?

That would be great, but presents a big problem: these names aren’t understood directly by the kernel interface.

The simple generic events like cycles and instructions are simply stringified versions of PERF_TYPE_HARDWARE, so those are simple to support. However, another event I’ve used recently for example is cycle_activity.stalls_mem_any. This is defined here (for Skylake, other files for other microarches). This JSON file is used to generate code for the userspace perf tool that describes the raw event values for this named event. libpfm4 has a similar giant table that describes this event.

There is such a huge number of possible events here that I don’t think it is feasible for the Go standard library to be in the business of understanding the names of every single event. In the proposed API above, I’ve addressed this by directly supporting the generic PERF_TYPE_HARDWARE events and for anything beyond that just providing SetRawSource that accepts PERF_TYPE_RAW config values (which would be derived from the EventCode, UMask, etc seen in the JSON file). SetRawSource is not user-friendly, but it provides just enough API for a third-party libpfm4-style package to understand the full set of possible events to help users configure it.

@prattmic
Copy link
Member

prattmic commented Jul 26, 2022

I will also note that as a first-step proposal, we could just take the simple part of the hardware CPU profile API above. i.e.,

type CPUProfile struct { ... }

func (*CPUProfile) SetRate(int) error
func (*CPUProfile) SetSource(CPUProfileSource) error

type CPUProfileSource int

const (
	CPUTimer CPUProfileSource = iota
	CPUCycles
	CPUCyclesPrecise
)

That is, we add support for using a hardware cycle counter for CPU profiles (perhaps exposing a ‘precise’ option, perhaps not). Nothing more. Under the hood this uses the perf interface, but we don’t expose an API for using any other event types.

@erifan
Copy link
Contributor Author

erifan commented Jul 27, 2022

I will also note that as a first-step proposal, we could just take the simple part of the hardware CPU profile API above.

I don't have a particularly strong opinion on how to define this API, so I think your proposal is also good, not exposing the internal fields of CPUProfile gives more flexibility to our implementation. And I don't think it's necessary to support too much events, including raw hardware event?

Most of the events that you’d use are CPU profiles

Yes, I think we will only support a few common PERF_TYPE_HARDWARE events. Do we have to support too many perf events? Because in my opinion our main purpose is to improve the CPU profiling precise of the pprof, not to implement a complete perf .

Are these names as described in the documentation for perf record -e and shown in perf list?

Yes, they are. In contrast to exported constants, we don't need to export these strings, we won't support a lot of perf events.

const (
CPUTimer CPUProfileSource = iota
CPUCycles
CPUCyclesPrecise
)
Maybe we can support a little more events. CL 410799 supported 6 events:

cycles,
instructions,
cache-references,
cache-misses,
branch-misses,
bus-cycles.

I think at least cycles, instructions, cache-misses and branch-misses are still quite important.

My initial take-away is that it is probably less important if Windows and Darwin will be impossible to support.

So this is the basis of the above discussion. Does it make sense if we can't find a proper way to access the PMU in Darwin and Windows? As far as I know, there is currently no way to access the Arm PMU on Windows, Arm has some work in progress to provide low level profiling infrastructure and tool for Windows, but it also seems to be struggling to meet our needs. So can we support the above API only on Linux? If so then what's the behavior of SetSource on unsupported OSes? Panic or use os timer by default?

@prattmic
Copy link
Member

prattmic commented Jul 27, 2022

Yes, I think we will only support a few common PERF_TYPE_HARDWARE events. Do we have to support too many perf events? Because in my opinion our main purpose is to improve the CPU profiling precise of the pprof, not to implement a complete perf .

I completely agree. I was trying to be flexible since I've seen desire for arbitrary events (such as in #36821), but I think focusing just on cycles first makes the most sense.

Maybe we can support a little more events. I think at least cycles, instructions, cache-misses and branch-misses are still quite important.

I think it is hard to decide what is important. There was a lot of discussion on #36821 about paring down the initial proposal to focus narrowly on improving CPU profiles without adding tons of extra options, and I think that is pertinent here. If there is general agreement that just cycles is valuable, then I think we start there.

So can we support the above API only on Linux? If so then what's the behavior of SetSource on unsupported OSes? Panic or use os timer by default?

Yes, it would only be supported on Linux.

The way I envisioned this API is that NewCPUProfile() automatically selects the best available profile source. Users that know they want a specific profile source can call SetSource, which returns an error if that source is not available (either due to unsupported OS or lack of HW support). The primary use-cases I imagine for this API are:

  1. Folks that always want the timer-based profile either for more consistency across all profiles in their fleet or they have tooling that expects a cpu-s profile.
  2. Folks that definitely want a precise cycles profile and would rather not profile at all if that isn't available.

P.S. it's possible that (1) will prevent us from making CPUProfile/StartCPUProfile automatically use PMU profiles because it is too much of a behavior change. I think this needs more thought, but I hope it isn't a blocker because I'd love to magically provide better profiles for a new release rather than hiding them behind a method most folks won't use.

P.P.S. since we are introducing the first implementation of NewCPUProfile with this proposal, a halfway point would be for NewCPUProfile to automatically select the best available source, but StartCPUProfile to always use the OS timer. Existing code wouldn't automatically get better profiles, but StartCPUProfile would be effectively deprecated, so new code would at least get better profiles by default.

@prattmic
Copy link
Member

prattmic commented Jul 27, 2022

P.S. it's possible that (1) will prevent us from making CPUProfile/StartCPUProfile automatically use PMU profiles because it is too much of a behavior change. I think this needs more thought, but I hope it isn't a blocker because I'd love to magically provide better profiles for a new release rather than hiding them behind a method most folks won't use.

I will point out that https://pkg.go.dev/runtime/pprof#StartCPUProfile does not promise that profiles are measured in cpu-seconds, only that this provides a "CPU profile", without ever defining what that means. But of course Hyrum's Law doesn't care what the API says. :)

@prattmic
Copy link
Member

prattmic commented Jul 27, 2022

One concern that I believe @rhysh raised on #36821 (though I can't find the comment now) and that I share is whether or not the PMU will continue counting while executing in the kernel, even if we are not allowed access to kernel stacks. It appears that the answer is no.

I tested with an application that does some spinning in userspace as well as performs an expensive system call.

Here are the results:

pprof (CPU timers):
image

sudo perf -e cycles (kernel+user):
image

perf -e cycles:u (user only):
image

My perf_event_paranoid == 2, which prevents access to kernel sampling. Simply using perf record -e cycles has the same behavior of auto-selecting user mode only.

Perhaps there is a way to keep counting in kernel mode, but only receive user stacks, which would match the pprof behavior, but honestly I doubt it since I can imagine that even getting counts with small periods could provide interesting side-channel security insights into what the kernel is doing that they'd want to stop.

(Edit: I dug through the kernel code a bit and it doesn't look like this is possible. In fact, it turns out that at least for Intel chips the PMU hardware itself has config bits for ring 0 and ring 3 counting, so the kernel doesn't even have to turn counting off/on on entry/exit.)

I think not counting is kernel mode is perhaps OK for an opt-in API, but it may be too big of a change from pprof to automatically select a PMU profile over a CPU timer.

@rhysh
Copy link
Contributor

rhysh commented Jul 28, 2022

One concern that I believe @rhysh raised on #36821 (though I can't find the comment now) and that I share is whether or not the PMU will continue counting while executing in the kernel, even if we are not allowed access to kernel stacks. It appears that the answer is no.

I think that comment is #36821 (comment) . The design of the perf_event support at the time used a file descriptor per thread, and to prevent the profiler from using many more FDs than GOMAXPROCS it took explicit action to detach the perf_event FD from the thread when making a syscall. So my comment wasn't about a limitation of the perf_event interface (though from your update today, @prattmic , that limitation seems exist!), it was about how the proposed Go runtime change chose to spend file descriptors.

Yes, a default profiler that reports syscalls as taking no time at all seems likely to confuse.


I will point out that https://pkg.go.dev/runtime/pprof#StartCPUProfile does not promise that profiles are measured in cpu-seconds, only that this provides a "CPU profile", without ever defining what that means. But of course Hyrum's Law doesn't care what the API says. :)

For what it's worth, the feedback on one of my early attempts to change the default profiler (https://go.dev/cl/204279) included:

Right now, in practice, everybody sets the profiling rate to 100, runtime/pprof.StartCPUProfile. [...] I'm concerned that this will confuse people used to the standard rate.

and

I expect that many people look at the sample count and mentally divide by 100 to get the time spent.

But, there's a much better reason this time.

@erifan
Copy link
Contributor Author

erifan commented Jul 28, 2022

If there is general agreement that just cycles is valuable, then I think we start there.

Ok, this can be a good start. If people feel that other events are needed in the future, it will be much simpler.

I will point out that https://pkg.go.dev/runtime/pprof#StartCPUProfile does not promise that profiles are measured in cpu-seconds, only that this provides a "CPU profile", without ever defining what that means. But of course Hyrum's Law doesn't care what the API says. :)

Yes, as you mentioned before, we can keep StartCPUProfile still using the OS timer so that its behavior doesn't change. For users who don't care about the unit of the sample and only care about the profiling precision, we choose the best available source for them in NewCPUProfile. For users who wish to use a definite source can use SetSource to set the source.

One concern that I believe @rhysh raised on #36821 (though I can't find the comment now) and that I share is whether or not the PMU will continue counting while executing in the kernel, even if we are not allowed access to kernel stacks. It appears that the answer is no.

perf_event_open has an option exclude_kernel, enabling it can exclude kernel events, so my understanding is that if it is not set, the kernel events will be counted, that is, the PMU will count in the kernel execution state. I'm not sure if you are referring to this.

Perhaps there is a way to keep counting in kernel mode, but only receive user stacks, which would match the pprof behavior

If exclude_kernel is not set and the PMU keeps counting in the kernel mode, then this is feasible. As for the callchain, we collect it by stack tracing in go, so I don't think there will be any difference. Unless we want to read the FD directly to get the callchain, but there also seems to be options (exclude_callchain_kernel and exclude_callchain_user ) to control whether to include the user or kernel callchain.

In my latest code, if it's pure go code, I set this option because it helps the precision a little bit. To be honest, there is no right or wrong to count the execution of the kernel or not. For example, if a kernel operation affects multiple threads (such as allocating a new heap arena), which thread is appropriate to assign this time to? But considering that to be consistent with the behavior of the OS timer, it may be reasonable to let the PMU counting the event of the kernel.

So my comment wasn't about a limitation of the perf_event interface (though from your update today, @prattmic , that limitation seems exist!), it was about how the proposed Go runtime change chose to spend file descriptors.

Thank @rhysh for raising this problem, I think we have some discussions in CL 410798 and 36821, and there seems to be no good solution for this problem. I didn't deal with this in CL 410798 because I think such cases are really rare, especially for go, where OS thread is shared by multiple goroutines. I'd like to hear more people's thoughts on whether this is an issue that needs to be dealt with.

@erifan
Copy link
Contributor Author

erifan commented Jul 28, 2022

How does the perf_event_open profiler deal with threads that aren't created by the Go runtime?

@rhysh you asked this question in #36821 (comment), and this question really stumped me. With the current framework, non-go threads can only receive profiling signals from the process signal source created by setitimer, so I installed a PMU counter for the process. However the non-go thread rarely receive the profiling signals, I thought the process signal would randomly choose a thread to receive, but it doesn't seem to be so random. I looked at the code ( complete_signal) of the kernel, it seems that the main thread always gets the signal first. Since our main thread does not block the SIGPROF signal, most of the signals are received by the main thread, and the main thread has a per-thread PMU counter, so it ignores the signal. I don't know how to fix this problem.

@erifan
Copy link
Contributor Author

erifan commented Jul 28, 2022

@rsc Would you mind adding this proposal to active queue ? Thanks.

@prattmic
Copy link
Member

prattmic commented Jul 28, 2022

perf_event_open has an option exclude_kernel, enabling it can exclude kernel events, so my understanding is that if it is not set, the kernel events will be counted, that is, the PMU will count in the kernel execution state. I'm not sure if you are referring to this.

Yes, that is correct. However, if /proc/sys/kernel/perf_event_paranoid >= 2, then setting exclude_kernel=false requires CAP_PERFMON or CAP_SYS_ADMIN (effectively, you must be root): https://elixir.bootlin.com/linux/v5.18.14/source/include/linux/perf_event.h#L1351

(N.B., the perf tool automatically detects this case and sets exclude_kernel=true if you are unpriviledged)

As far as I know, must Linux distributions set /proc/sys/kernel/perf_event_paranoid to 2 by default. In practice this means that very few Go programs will be able to enable kernel counting.

I was brainstorming with @cherrymui earlier and one idea that came up is that we could collect both the OS timer and PMU profiles by default. The pprof format supports multiple different sample types in one file, so putting both in one file would not be a problem. I think the main downside is that we'd approximately double the overhead because we'd have SIGPROFs for both profiles.

@erifan
Copy link
Contributor Author

erifan commented Jul 29, 2022

However, if /proc/sys/kernel/perf_event_paranoid >= 2, then setting exclude_kernel=false requires CAP_PERFMON or CAP_SYS_ADMIN (effectively, you must be root)

Well, that's a bit of a hassle. I wonder if we can do a check on this before starting the PMU, if the environment has the appropriate permissions we will allow the PMU profiling, otherwise not. I think usually the development environment and debugging environment meet these conditions.

Or we don't count the execution of the kernel, which behaves a little differently from the OS timer source, but is actually more accurate for user programs. This is also the default mode of perf.

one idea that came up is that we could collect both the OS timer and PMU profiles by default.

This is a bit tricky because different types of samples have different units, so we need to differentiate between them.

@prattmic
Copy link
Member

prattmic commented Jul 29, 2022

I wonder if we can do a check on this before starting the PMU, if the environment has the appropriate permissions we will allow the PMU profiling, otherwise not. I think usually the development environment and debugging environment meet these conditions.

Certainly we can check permissions ahead of time, but I am skeptical that most development environments will have perf_event_paranoid < 2. Perhaps we should do a survey of the defaults across some common Linux distributions.

This is a bit tricky because different types of samples have different units, so we need to differentiate between them.

Yes, the pprof tool treats them as completely different, and only displays one at a time. You can see an example of this behavior by collecting a perf.data with multiple counters:

$ perf record -e cycles,instructions ./set
$ pprof perf.data                                                                                                                           
File: set
perf-version:5.17.11
perf-command:/usr/bin/perf record -e cycles,instructions ./set
Type: instructions:u_event
Duration: 5.43s, Total samples = 18125609784
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

Note that this is displaying Type: instructions:u_event.

(pprof) options
...
  sample_index              = 0                    //: [cycles:u_sample | cycles:u_event | instructions:u_sample | instructions:u_event]
...

Here are all the included types. Select cycles:u_event with pprof -sample_index=1.

@erifan
Copy link
Contributor Author

erifan commented Aug 1, 2022

but I am skeptical that most development environments will have perf_event_paranoid < 2. Perhaps we should do a survey of the defaults across some common Linux distributions.

I think you are right that the default perf_event_paranoid value is 2 and 3 or 4 on some distributions, I wonder if we can make PMU profiling a feature that requires users to give more permissions or privileged users, similar with perf tool. Or don't count kernel events.

Yes, the pprof tool treats them as completely different, and only displays one at a time. You can see an example of this behavior by collecting a perf.data with multiple counters:

But I think what we want to have is one accurate event rather than two less accurate events.

@erifan
Copy link
Contributor Author

erifan commented Aug 1, 2022

Based on the discussion above, I make a summary.
The proposal (by @prattmic) is this, I removed CPUCyclesPrecise, target for Linux only.

// 1, Default event type is os-timer.
// 2, Default period is provided.
type CPUProfile struct { ... }

// 1, The unit of the period is related to the event type. For time-based events such as os-timer, it's nanosecond,
//  for other events, it is the count of events that passed between samples.
// 2, If the parameter <= 0, reset it to 0, stop profiling.
// 3, The profiling period has a minimum threshold to prevent the profiling rate from being too high.
// When the parameter value exceeds the threshold, it will be adjusted to the minimum threshold.
// 4, If c is nil, report an error and return.
// 5, If the profiling has been started, calling this function will report an error, but will not affect the running profiling.
func (c *CPUProfile) SetPeriod(int64) error

// 1, If c is nil, report an error and return.
// 2, If the profiling has been started, we can't resest the event by call SetEvent.
// 3, If the source is not available, report and error and return.
// 4, The second parameter indicates whether to count kernel events. For the os-timer event, this parameter is always true.
// 5, For PMU to count the kernel events, the CAP_PERFORM permission or the admin permission is required, if the condition is not met and the second parameter is also specified as true, an error will be reported.
// 6, If cgo is used, only the CPUTimer event can be used.
// 7, The precise_ip of the PMU event is set to the maximum value available.
func (c *CPUProfile) SetEvent(CPUProfileEvent, bool) error

// SetSampleKernel sets whether to count the kernel events, pass in “true” to count the kernel events, otherwise not count.
// This function has no effect on os-timer based events, because os-timer events always count kernel events.
// To count the kernel events, the CAP_PERFORM permission or the admin permission is required.
func (c *CPUProfile) SetSampleKernel(bool) error

// 1, If the method receiver c is nil, report an error.
// 2, If the profiling has been started, report an error.
func (c *CPUProfile) Start(w io.Writer) error

// Stop stops a running CPUProfile and flushes any pending writes.
// It returns only after all writes for the profile have completed.
// If c is not running, Stop does nothing and returns nil.
//
// It is the caller's responsibility to close the profile's Writer if appropriate.
//
// A CPUProfile can be started again after calling Stop.
func (c *CPUProfile) Stop() error

type CPUProfileEvent int

const (
	CPUTimer CPUProfileEvent = iota
	CPUCycles
        CPUBranchMisses
        CPUCacheMisses
        CPUInstructions
)

There are three problems:
First, FD may be depleted. But that's probably not a problem as this is very rare.
Disposal method: Do not handle.

Second, for PMU to count the kernel events, perf_event_open requires the CAP_PERFORM permission or the admin permission, otherwise the behavior of the PMU will be inconsistent with the behavior of the OS timer.

Disposal method:

  1. Ask the user to give the appropriate permissions.
  2. Kernel events are not counted if the permissions are not satisfied.

Third, it is difficult for non-go threads to receive signals from the PMU.
Disposal method:
If CGO profiling is on, only os-timer events can be used.

Everyone is welcome to brainstorm to move this proposal forward, thanks.

Edit1: Add Start method.
Edit2: Added some problem and behavior descriptions that I'm not sure about.
Edit3: Change func (c *CPUProfile) SetRate(int) error to func (c *CPUProfile) SetPeriod(int64) error.
Rename CPUProfileSource to CPUProfileEvent.
Rename func (c *CPUProfile) SetSource(CPUProfileSource) error to func (c *CPUProfile) SetEvent(CPUProfileEvent) error.
Add func SetPMUEventPrecise(PMUEventPrecise ) error and type PMUEventPrecise int.
Add CPUBranchMisses, CPUCacheMisses and CPUInstructions.
Add Disposal methods for the above three problems.
Edit4: Change func SetPMUEventPrecise(PMUEventPrecise) error to func (c *CPUProfile) SetPMUEventSkid(PMUEventSkid) error
Add func (c *CPUProfile) Stop(w io.Writer) error method
Modified some behavior descriptions.
Edit5: Modified some behavior descriptions.
Changes func (c *CPUProfile) SetEvent(CPUProfileEvent) error to func (c *CPUProfile) SetEvent( CPUProfileEvent, bool) error
Edit6: Add method func (c *CPUProfile) SetSampleKernel(bool) error
Edit7: updated the behavior description of the Start method

@rsc
Copy link
Contributor

rsc commented Sep 7, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@prattmic
Copy link
Member

prattmic commented Sep 7, 2022

To add to the second issue in #53286 (comment) (no kernel events), I think that this difference is acceptable for an explicit API (i.e., users must explicitly call SetSource(CPUCycles)), but that it is too big of a difference to automatically change StartCPUProfile to using the PMU when available.

@erifan
Copy link
Contributor Author

erifan commented Sep 8, 2022

it is too big of a difference to automatically change StartCPUProfile to using the PMU when available.

If the second issue is not a problem, I personally think that this is not a big problem, it is just a behavioral rule. My opinion is the same as yours, StartCPUProfile uses os-timer by default, so we keep the original behavior. And leave the new API to users who need it.

@erifan
Copy link
Contributor Author

erifan commented Sep 8, 2022

Third, it is difficult for non-go threads to receive signals from the PMU. Currently I haven't found any possible solution.

For this problem I think we can do like this:

  1. If cgo is not used or cgo profiling is not turned on, then this is not a problem.
  2. If cgo profiling is enabled, only os-timer events can be used, treat PMU as unavailable.

@rsc
Copy link
Contributor

rsc commented Sep 21, 2022

It sounds like maybe people are happy with #53286 (comment)?

Are there any other likely CPUProfileSource values that we expect to support? Right now it's looking like a bool, and if we had a few more it might help us understand the shape of the API better.

@aclements
Copy link
Member

aclements commented Oct 5, 2022

A few nit-picks on the API:

  • "The meaning of Rate is related to the event type. When the event type is os-timer, it means frequency, and when the event type is cycles, it means period." I'm not sure about overloading one method to mean frequency in some cases and period in the other. I think I would rather this always take a period, possibly as a float64. For any events tied to real-time, you need to specify the base unit of the period. I assume "frequency" is intended to be in Hz, so the period for os-timer could be in "seconds" or maybe in "nanoseconds". This would also allow a frequency less than 1 Hz for very low-rate background profiling.
  • SetSource: The word "Source" can mean a lot of things. I think a more common term for this generally is "event", and that would match the Linux perf API.

@rsc
Copy link
Contributor

rsc commented Oct 5, 2022

runtime.SetCPUProfileRate is Hz. If we use the word Rate, it should be Hz. But we could use Period, which in time is always nanoseconds and for other events is counting how many of those events go by between samples (including one of the samples).

@aclements
Copy link
Member

aclements commented Oct 5, 2022

I proposed always using a period rather than a frequency because I think it would apply to all events, while frequency (in Hz) is only natural for time-based events. For example, while runtime.SetCPUProfileRate is in Hz, the other runtime.Set*ProfileRate functions take periods, for exactly this reason.

If the period for time-based events is in nanoseconds, then SetPeriod can take an int (perhaps int64). In that case, we should probably document how to use SetPeriod with a time.Duration.

@erifan
Copy link
Contributor Author

erifan commented Oct 8, 2022

OK, I think Russ's idea is great.

If the period for time-based events is in nanoseconds, then SetPeriod can take an int (perhaps int64). In that case, we should probably document how to use SetPeriod with a time.Duration.

Yes, I think it is better to make the unit of time-based events default to nanoseconds. And we need to document this point.

@erifan
Copy link
Contributor Author

erifan commented Oct 8, 2022

Based on the discussion above, I updated issuecomment-1200714095, please check again, thanks.

Another question is should we set default values? What's the default event, period and precise for an uninitialized CPUProfile type variable?

@felixge
Copy link
Contributor

felixge commented Oct 9, 2022

Based on the discussion above, I updated #53286 (comment), please check again, thanks.

I have a quick question about the proposed API: Why is there no Stop() method? I see that the issue description is talking about it:

It would be better if a corresponding Stop method was exported, but it is not necessary, we can correctly stop the profiling by the cached configuration.

But I don't understand what "stop the profiling by the cached configuration" means. Sorry if I'm missing something.

@erifan
Copy link
Contributor Author

erifan commented Oct 10, 2022

But I don't understand what "stop the profiling by the cached configuration" means

I meant we can stop PMU profiling correctly with the existing pprof.StopCPUProfile() function with some internal modifications. But we can also stop PMU profiling via a new exported Stop method, and I think both are fine.

@felixge
Copy link
Contributor

felixge commented Oct 10, 2022

I think it would be a little weird to call the Start() method on a *CPUProfile instance and then use a package function to stop it. So I'd be in favor of adding a Stop() method to *CPUProfile.

Another question: #42502 (comment) proposes a NewCPUProfile constructor for *CPUProfile, but I don't see it in this proposal. Was it omitted for brevity, or are you suggesting to not have a constructor? I feel less strongly about this one.

Perhaps it would help to add a full example to the proposal that demonstrates how to use the new API to produce a profile?

@erifan
Copy link
Contributor Author

erifan commented Oct 10, 2022

Was it omitted for brevity,

Yes, I think this is the case.

Perhaps it would help to add a full example to the proposal that demonstrates how to use the new API to produce a profile?

The following are two examples, one for CPUTimer event and one for CPUCycles event.

var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to `file`")

func main() {
	flag.Parse()
	if *cpuprofile != "" {
		f, err := os.Create(*cpuprofile)
		if err != nil {
			log.Fatal("could not create CPU profile: ", err)
		}
		defer f.Close()
		var cpuprof pprof.CPUProfile
		if err := cpuprof.SetEvent(pprof.CPUTimer); err != nil {
			log.Fatal("could not set CPUTimer event: ", err)
		}
		if err := cpuprof.SetPeriod(int64(1e9 /100)); err != nil { // assume profiling at 100HZ frequency
			log.Fatal("Failed to set the profiling period: ", err)
		}
		if err := cpuprof.Start(); err != nil {
			log.Fatal("could not start CPU profile: ", err)
		}
		defer pprof.StopCPUProfile()  // or defer cpuprof.Stop()
	}
	// ... rest of the program
}
var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to `file`")

func main() {
	flag.Parse()
	if *cpuprofile != "" {
		f, err := os.Create(*cpuprofile)
		if err != nil {
			log.Fatal("could not create CPU profile: ", err)
		}
		defer f.Close()
		var cpuprof pprof.CPUProfile
		if err := cpuprof.SetEvent(pprof.CPUCycles); err != nil {
			log.Fatal("could not set CPUTimer event: ", err)
		}
		if err := cpuprof.SetPeriod(10000); err != nil { // assume sampling every 10000 cycles
			log.Fatal("Failed to set the profiling period: ", err)
		}
		if err := cpuprof.SetPMUEventPrecise(pprof.REQUEST0); err != nil {
			log.Fatal("Failed to set the requested precision: ", err)
		}
		if err := cpuprof.Start(); err != nil {
			log.Fatal("could not start CPU profile: ", err)
		}
		defer pprof.StopCPUProfile() // or defer cpuprof.Stop()
	}
	// ... rest of the program
}

@aclements
Copy link
Member

aclements commented Oct 10, 2022

Given that a program could reasonably have multiple CPU profiles going at once, either for different events or even just for different rates on the same event (e.g., for background profiling), I think the API should definitely have a Stop method.

@rsc
Copy link
Contributor

rsc commented Oct 12, 2022

It sounds like everyone is in favor of:

// 1, No default event type or default to os-timer or cycles or the best available one ?
// 2, No default period or based on the default event ?
type CPUProfile struct { ... }

// 1, The unit of the period is related to the event type. For time-based events such as os-timer, it's nanosecond,
//  for other events, it is the count of events that passed between samples.
// 2, If the parameter <= 0, reset it to 0, stop profiling.
// 3, The profiling period has a maximum threshold to prevent the profiling rate from being too high.
// When the parameter value exceeds the threshold, it will be adjusted to the maximum threshold.
// 4, The event type needs to be set first, otherwise an error is reported and returned.
// 5, If c is nil, report an error and return.
// 6, If the profiling has been started, can we reset the profiling rate by call SetRate ?
func (c *CPUProfile) SetPeriod(int64) error

// 1, If c is nil, report an error and return.
// 2, If the profiling has been started, we can't resest the source by call SetSource.
// 3, If the source is not available, report and error and return.
// 4, For PMU to count the kernel events, the CAP_PERFORM permission or the admin permission is required,
// 5, If cgo is used, only the CPUTimer event can be used.
func (c *CPUProfile) SetEvent(CPUProfileEvent) error

// 1, If the method receiver c is nil, report an error, stop.
// 2, If the profiling has been started, return nil directly.
func (c *CPUProfile) Start(w io.Writer) error

type CPUProfileEvent int

const (
	CPUTimer CPUProfileEvent = iota
	CPUCycles
        CPUBranchMisses
        CPUCacheMisses
        CPUInstructions
)

What about the PMU API?

type PMUEventPrecise int

const (
	ARBITRARY PMUEventPrecise = iota // 0  SAMPLE_IP can have arbitrary skid.
	CONSTANT // 1  SAMPLE_IP must have constant skid.
        REQUEST0 // 2  SAMPLE_IP requested to have 0 skid.
        MUST0 // 3  SAMPLE_IP must have 0 skid.
)

// SetPMUEventPrecise sets the profiling precise of PMU events, which controls the amount of skid.
// 1, For other events, the function is unavailable and returns nil.
// 2, Returns an error if the requested precision is unreachable.
// 3, If this function is not explicitly called, the default precision is ARBITRARY.
func SetPMUEventPrecise(PMUEventPrecise) error

These names are not very Go-like, and I'm not sure I understand why this is a global function instead of a method on CPUProfile. Is this API still part of the proposal?

@aclements
Copy link
Member

aclements commented Oct 12, 2022

Overall that looks pretty good to me. A few comments:

// 3, The profiling period has a maximum threshold to prevent the profiling rate from being too high.
// When the parameter value exceeds the threshold, it will be adjusted to the maximum threshold.

Since this is a period, this should be a minimum threshold.

What about the PMU API?

I'm not convinced precision needs to be part of an initial API, though maybe it's worth agreeing on an API now and separately deciding "do we want precision at all?"

If we have API for precision, it should definitely be a method of CPUProfile because it controls per-profile configuration, and obviously it should use more Go-like names. I'd also be inclined to phrase it directly in terms of "skid" because that's what you're controlling.

Finally, I think the API should also include a Stop method:

// Stop stops a running CPUProfile and flushes any pending writes.
// It returns only after all writes for the profile have completed.
// If c is not running, Stop does nothing and returns nil.
//
// It is the caller's responsibility to close the profile's Writer if appropriate.
//
// A CPUProfile can be started again after calling Stop.
func (c *CPUProfile) Stop() error

Even if we don't initially support multiple simultaneous CPU profiles, this makes it possible to support them in the future. Furthermore, it makes the new API entirely self-contained, rather than depending on a package-level function for ultimately stopping the profile. To that end, I think StopCPUProfile should not stop any profiles started by the CPUProfile API. Probably, StartCPUProfile and StopCPUProfile would become trivial wrappers around a global (unexported) CPUProfile.

@prattmic
Copy link
Member

prattmic commented Oct 12, 2022

// 1, No default event type or default to os-timer or cycles or the best available one ?

I think we should default to os-timer since os-timer and cycle will have different behavior for time spent in the kernel. But given this is a new API, I could go either way as long as StartCPUProfile always uses os-timer.

// 2, No default period or based on the default event ?

IMO, providing a default period for each event would be nice. (SetEvent thus overrides previous calls to SetPeriod).

// 6, If the profiling has been started, can we reset the profiling rate by call SetRate ?

IMO, changing Rate or Event during profiling should be an error. The pprof proto can only list 1 period per sample type, plus I'm not sure why you'd want to change it?

I also agree with @aclements that there should be a Stop method and precision should be a method.

@aclements
Copy link
Member

aclements commented Oct 13, 2022

The pprof proto can only list 1 period per sample type, plus I'm not sure why you'd want to change it?

I agree that it should be an error to change Rate or Event on a running profile, but wanted to point out that you can, in effect, change the rate of a running profile by choosing a common denominator period for the profile. For example, set Profile.sample_type to a small unit like nanoseocnds and set Profile.period to a small value like 1. The account for the actual period by varying the Sample.value of each sample.

If we support starting multiple profiles for the same event but with different periods (e.g., to support background profiling), then we may need to do a trick like this internally.

@erifan
Copy link
Contributor Author

erifan commented Oct 13, 2022

I'm not sure I understand why this is a global function instead of a method on CPUProfile. Is this API still part of the proposal?

Sorry, this is a mistake, it definitely should be a method of CPUProfile.

Since this is a period, this should be a minimum threshold.

Yes, updated.

Finally, I think the API should also include a Stop method:

Ok, I have added it. And yes StartCPUProfile and StopCPUProfile would become trivial wrappers around a global (unexported) CPUProfile.

In addition, I also updated some behavior descriptions according to @prattmic 's comments.

Finally let me discuss whether to add the SetPMUEventSkid methods, and how to name it and the related constants. Personally I tend not to add this method and choose the smallest skid available in the implementation. Of course we will document this in the API. issuecomment-1200714095

@aclements
Copy link
Member

aclements commented Oct 20, 2022

Finally let me discuss whether to add the SetPMUEventSkid methods, and how to name it and the related constants. Personally I tend not to add this method and choose the smallest skid available in the implementation.

I agree with that. We can add it in the future if we want.

For choosing the smallest skid available, is that equivalent to SKIDRequest0? Or does that fail if it can't get at least SKIDConstant? (This is a property of perf, and we can create our own behavior either way, just curious.)

@aclements
Copy link
Member

aclements commented Oct 20, 2022

Or does that fail if it can't get at least SKIDConstant?

The kernel code for this is, unsurprisingly, confusing, but there are a lot of conditions that just look for precise_ip != 0, and I didn't find any obvious "downgrading" code, so I think SKIDRequest0 will fail rather than downgrade to SKIDArbitrary.

@aclements
Copy link
Member

aclements commented Oct 20, 2022

A few nits:

// 4, The event type needs to be set first, otherwise an error is reported and returned.
func (c *CPUProfile) SetPeriod(int64) error

If this is the case, it seems SetEvent should explicitly reset the period, in case you call SetEvent, then SetPeriod, then SetEvent.

// 5, If c is nil, report an error and return.
func (c *CPUProfile) SetPeriod(int64) error

We don't typically specify that a nil receiver results in an error.

// 4, For PMU to count the kernel events, the CAP_PERFORM permission or the admin permission is required,
func (c *CPUProfile) SetEvent(CPUProfileEvent) error

This should probably be specified as part of the events.

@erifan
Copy link
Contributor Author

erifan commented Oct 21, 2022

For choosing the smallest skid available, is that equivalent to SKIDRequest0? Or does that fail if it can't get at least SKIDConstant?

I meant start with the largest precise_ip value 3 (means the smallest skid) and see if we can successfully execute perf_event_open, and use that precise_ip value if it successes. If that fails, try a smaller precise_ip value (means a bigger skid), 2, 1 and 0. If both fail, then throw. Of course, the minimum precise_ip value is not necessarily to be 0. We can set a bigger precise_ip value as the threshold, such as 1, means that Go requires SAMPLE_IP must have constant skid.

If this is the case, it seems SetEvent should explicitly reset the period, in case you call SetEvent, then SetPeriod, then SetEvent.

This is because the parameters of the method have different meanings for different events. Of course we could remove this restriction and make the user responsible for setting the correct value. It seems more reasonable to remove this restriction, I will update the description.

We don't typically specify that a nil receiver results in an error.

As the receiver is a pointer, we should specify a behavior in case the method is invoked with a nil pointer. It doesn't seem very reasonable to return nil like a normal call. So we either return an error or just throw.

This should probably be specified as part of the events.

Makes sense, so that it doesn't lead to confusing behavior. Maybe we can add a parameter to SetEvent? Like,
func (c *CPUProfile) SetEvent(e CPUProfileEvent, countKernel bool) error

Updated result #53286 (comment)

@erifan
Copy link
Contributor Author

erifan commented Oct 24, 2022

Another question is, do we need to discuss here how to extend PMU profiling to testing and net/http packages? Or that this should be discussed in another proposal.

@aclements
Copy link
Member

aclements commented Nov 2, 2022

This should probably be specified as part of the events.

Makes sense, so that it doesn't lead to confusing behavior. Maybe we can add a parameter to SetEvent? Like,
func (c *CPUProfile) SetEvent(e CPUProfileEvent, countKernel bool) error

Sorry, I didn't mean that I thought this should be an API change, just a documentation change. That is, in the CPUProfileEvent const block, each event should have documentation describing what it means and also whether it requires permissions.

Another question is, do we need to discuss here how to extend PMU profiling to testing and net/http packages? Or that this should be discussed in another proposal.

I think that can be a separate proposal, just to keep this one scoped.

@erifan
Copy link
Contributor Author

erifan commented Nov 3, 2022

Sorry, I didn't mean that I thought this should be an API change, just a documentation change. That is, in the CPUProfileEvent const block, each event should have documentation describing what it means and also whether it requires permissions.

I think this works too, but needs to be clear about the behavior under different conditions. There are two options:

  1. For PMU events, permission is required, otherwise it cannot be enabled.
  2. For PMU events, if the current environment has the permission, the kernel events are counted, if not, the kernel events are not counted.
    The first option ensures that the behaviors of PMU and os-timer events are consistent. The second option may cause some confusion, but not restricting permissions makes PMU events more widely available. Which do you think is better?

@rsc
Copy link
Contributor

rsc commented Nov 9, 2022

It sounds like maybe we should start with kernel events excluded.
If we need to add them, we could add more to the CPUProfileEvent enumeration
or maybe just have a CPUKernelEvents bit that gets OR'ed in to the enumeration values.
Does it make sense to start there?

@erifan
Copy link
Contributor Author

erifan commented Nov 11, 2022

Compared to adding more CPUProfileEvent enumeration, I prefer having a CPUKernelEvents bit, but a bool type CPUKernelEvent may be simple. But instead of using a bool value as a switch, why not use a method API? Such as:
func (c *CPUProfile) CountKernelEvent() error

@aclements
Copy link
Member

aclements commented Nov 16, 2022

Given that we have other "Set" methods for setting the properties of the CPUProfile, it seems like this should be another Set method: func (c *CPUProfile) SetSampleKernel(bool) error (maybe a better name). What I'm not sure about is that different events treat kernel events differently. I think there are three possibilities:

  1. For signal-based events, if the sample lands in the kernel, we attribute it to where we exited user space. I'm not sure we can do anything differently depending on whether SampleKernel is set.
  2. For PMU-based events, if SampleKernel is set and a sample lands in the kernel, we attribute the sample to a kernel PC.
  3. For PMU-based events, if SampleKernel is not set and a sample lands in the kernel, nothing gets added to the profile.

Maybe we just document for each event whether SampleKernel has any effect and ignore it for signal-based events?

@erifan
Copy link
Contributor Author

erifan commented Nov 17, 2022

Maybe we just document for each event whether SampleKernel has any effect and ignore it for signal-based events?

Yes, I totally agree. Updated #53286 (comment)

Seems like we've agreed on all concerns? Any other comment ?

@aclements
Copy link
Member

aclements commented Nov 23, 2022

I think we're quite close!

// 2, If the parameter <= 0, reset it to 0, stop profiling.
// 3, The profiling period has a minimum threshold to prevent the profiling rate from being too high.
// When the parameter value exceeds the threshold, it will be adjusted to the minimum threshold.
...
// 5, If the profiling has been started, calling this function will report an error, but will not affect the running profiling.
func (c *CPUProfile) SetPeriod(int64) error

These three things seem inconsistent. I would expect SetPeriod to always return an error without affecting the profile if the profile is already running. I could go either way on whether a negative value gets rounded up to the minimum threshold for the event, but would lean toward a negative value being an error because it seems most likely that reflects a programmer error.

@aclements
Copy link
Member

aclements commented Nov 23, 2022

Sorry, one more:

// 1, If the method receiver c is nil, report an error, stop.
// 2, If the profiling has been started, return nil directly.
func (c *CPUProfile) Start(w io.Writer) error

pprof.StartCPUProfile says that it "returns an error if profiling is already enabled." I think that's both reasonable, and we want to be consistent. In this case, I think Start should definitely return an error if c is already running, and should be allowed to return an error if any CPU profile is running.

@aclements
Copy link
Member

aclements commented Nov 23, 2022

I posted a more fleshed-out and documented version of the base (non-PMU) API in the CPUProfile issue.

@erifan
Copy link
Contributor Author

erifan commented Nov 24, 2022

I would expect SetPeriod to always return an error without affecting the profile if the profile is already running.

Yes, it is.

2, If the parameter <= 0, reset it to 0, stop profiling.

This is for consistency with SetCPUProfileRate , If we don't care about this, then I think it's fine to report an error for a negative period.

pprof.StartCPUProfile says that it "returns an error if profiling is already enabled." I think that's both reasonable, and we want to be consistent. In this case, I think Start should definitely return an error if c is already running, and should be allowed to return an error if any CPU profile is running.

Ok, make sense. Updated #53286 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Todo
Status: Active
Development

No branches or pull requests

8 participants