Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exposed the AsyncProfiler stop method #121

Closed
wants to merge 10 commits into from
Closed

Conversation

Krithika3
Copy link

As per #119 I had requested for the ability to do on-demand profiling. I have added a PR here to expose the stop method so we could stop profiling anytime in production after collecting samples for an amount of time. Please review and provide feedback. Apologies if standards are not followed

@Krithika3
Copy link
Author

Noticed that there are not many tests, so any recommendations on adding tests. Thanks

@Krithika3
Copy link
Author

@korniltsev please review in reference to #119

@Krithika3 Krithika3 marked this pull request as ready for review October 11, 2023 14:21
Copy link
Contributor

@simonswine simonswine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for you contribution.

I think you need a different approach to achieve, stopping of the Profiler you started before. Let me know if my instructions make sense to you.

Comment on lines 36 to 47
// This is a naive implementation for the stop method
PyroscopeAgent.stop(
new PyroscopeAgent.Options.Builder(
new Config.Builder()
.setApplicationName("demo.app{qweqwe=asdasd}")
.setServerAddress("http://localhost:4040")
.setFormat(Format.JFR)
.setLogLevel(Logger.Level.DEBUG)
.setLabels(mapOf("user", "tolyan"))
.build())
.build()
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code will start a second profiler and stop it immediately

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review @simonswine , it makes sense to add AtomicReference and stop everything that was set up above. I will let you know if I run into anything

Copy link
Author

@Krithika3 Krithika3 Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also noticed again when I ran this test, it starts the profiler and then waits for sometime with Thread.sleep and stops it but the start thread is still running so we get an error there saying Error Dumping profiler

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonswine pls re-review. Thanks

* stop is used to stop profiling
* @param options
*/
public static void stop(Options options) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think rather than passing Options here you, you should go a slightly different path.

I am proposing you replace the AtomicBool with an AtomicReference<Options>, which gets set by start().

Then the stop method can stop those <Options> of the previous start call and set it back to null.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, let me give that a shot and get back to you with questions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonswine pls re-review and let me know comments

if (!options.config.agentEnabled) {
logger.log(Logger.Level.INFO, "Pyroscope agent start disabled by configuration");
return;
}
logger.log(Logger.Level.DEBUG, "Config: %s", options.config);
try {
startOptions.compareAndSet(null, options);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail, as it will always happen in line 40. You can safely remove it:

Suggested change
startOptions.compareAndSet(null, options);
startOptions.compareAndSet(null, options);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed it

logger.log(Logger.Level.DEBUG, "Config: %s", options.config);
try {
startOptions.compareAndSet(null, options);
startOptions.get().scheduler.start(startOptions.get().profiler);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simpler like that

Suggested change
startOptions.get().scheduler.start(startOptions.get().profiler);
options.scheduler.start(options.profiler);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 35 to 39

Thread.sleep(2 * 60 * 1000);

// This is a naive implementation for the stop method
PyroscopeAgent.stop();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To test this properly I suggest, you Start it, then in a separate thread: stop it after a 1 second and then start it again a second later. This will make sure it works as expected.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the test. Please re-review. Let me know if I am making any blunders as well

* stop is used to stop profiling
*/
public static void stop() {
Logger logger = startOptions.get().logger;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will crash when startOptions is null

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a null check

Comment on lines 68 to 74
try {
options.scheduler.start(options.profiler);
logger.log(Logger.Level.INFO, "Profiling started");
startOptions.get().scheduler.stop(startOptions.get().profiler);
startOptions.set(null);
logger.log(Logger.Level.INFO, "Profiling stopped");
} catch (final Throwable e) {
logger.log(Logger.Level.ERROR, "Error starting profiler %s", e);
logger.log(Logger.Level.ERROR, "Error stopping profiler %s", e);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure if we need the try catch block here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya I removed the try/catch block. Might not even be needed in start

@korniltsev
Copy link
Collaborator

@Krithika3 Do you think you can implement this logic as a custom ProfilingScheduler at your application level (not in the sdk)?

Here's a quick dirty example
https://gist.github.com/korniltsev/68afb0dd09c596b7fd0306fa8b14ba6b

@Krithika3
Copy link
Author

@korniltsev thanks for the example. The main reason we wanted to do it here at the sdk level was because we may need to do this in multiple apps (and establish a pattern for profiling across the company ) and we did not want to add a CustomScheduler in everyone of them. So if we can have something added at the java agent level it could be great. Thanks and appreciate it

@Krithika3
Copy link
Author

Do we see any concerns doing the above changes? Thank you

@korniltsev
Copy link
Collaborator

start method was safe to call from multiple threads

new stop method is not safe to call from multiple threads

I suggest you replace AtomicReference with plain reference and guard it with synchronized block.

@Krithika3 Did you run your changes? Do they work?

@Krithika3
Copy link
Author

Krithika3 commented Oct 26, 2023

@korniltsev I ran the changes locally, with pyroscope running locally and the sample java app as an example. I did see behave as expected, let me attach a screenshot. One issue I observed was the ```
Screenshot 2023-10-26 at 10 39 33 AM

@Krithika3
Copy link
Author

@korniltsev I left the AtomicReferenceOptions> as is but guarded the stop with a synchronized block. Let me know if that makes sense

@korniltsev
Copy link
Collaborator

Start method should be synchronized as well.

I dont see a reason for AtomicReference, do you?

*/
@Override
public void stop(Profiler profiler) {
profiler.stop();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this stop the this.job task somehow ?
What happens if this stop method is executed concurrently with dumpProfile runnable? Is it possible? Are they synchronized in any way? should they?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not be stopping the this.job. Let me take a look at that. Also let me see if it can be executed concurrently with dumpProfile.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the logic to stop this.job task as well. Let me look into the second part of running stop concurrently with dumpProfile runnable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if the dumProfile and stop should be synchronized. @korniltsev any insights?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should.

T1(dump) calls profiler.stop
T2(stopper) calls scheduler.stop
T1(dump) calls profiler.start

And now it profiles forever

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok let me make that change,

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the change @korniltsev

@Krithika3
Copy link
Author

Krithika3 commented Nov 1, 2023

Start method should be synchronized as well.
I have synchronized the start method as well
I dont see a reason for AtomicReference, do you?

I was not able to get a proper class to initialize Reference with

@Krithika3
Copy link
Author

The current run screenshot
Screenshot 2023-10-31 at 10 41 14 PM

This does not have the error with dumping profile data

@Krithika3
Copy link
Author

Krithika3 commented Nov 1, 2023

@korniltsev One worry I had was since we now have the need for synchronized block and adding locks are we introducing any performance issues with the Start and Stop methods?

@Krithika3
Copy link
Author

@korniltsev One worry I had was since we now have the need for synchronized block and adding locks are we introducing any performance issues with the Start and Stop methods?

Any thoughts here?

@korniltsev
Copy link
Collaborator

I see no changes for problem discussed here #121 (comment)

@korniltsev
Copy link
Collaborator

Closing this PR because of inactivity.

Will continue here #149

@korniltsev korniltsev closed this Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants