Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

posix profiling #5187

Merged
merged 37 commits into from Sep 14, 2018
Merged

posix profiling #5187

merged 37 commits into from Sep 14, 2018

Conversation

guliashvili
Copy link
Contributor

@guliashvili guliashvili commented Sep 6, 2018

Using the getrusage syscall doing additional profiling for the queries. Linux provides even more granular profiling using the RUSAGE_THREAD, which counts things only for the calling thread.

In the future query profiling code will grow more and will be tied to the specific OS. So, we should benefit by having query_profiler for each different systems/standards.

@guliashvili guliashvili self-assigned this Sep 6, 2018
@facebook-github-bot facebook-github-bot added the cla signed Automated label: Pull Request author has signed the osquery CLA label Sep 6, 2018
@guliashvili
Copy link
Contributor Author

jenkins test windows

@guliashvili guliashvili added the ready for review Pull requests that are ready to be reviewed by a maintainer label Sep 6, 2018
@akindyakov
Copy link
Contributor

There is a debug build for linux and it is failing of this PR, can I ask you to have a look? https://jenkins.osquery.io/job/osqueryMasterLinuxDebug/207/TargetSystem=aws-ubuntu-14.04/console

@guliashvili
Copy link
Contributor Author

Cool, how can people find this build logs?

This failure looks unrelated to the PR. Exact same failure has happened to me while 'fixing' integration tests.

@guliashvili
Copy link
Contributor Author

#5192 @akindyakov It does not fail anymore. 👍

Copy link
Contributor

@akindyakov akindyakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on it!

I'd like to suggest a bit different interface for your new fancy functionality.

You could implement it as a class with two methods start and stop. And your measurement in the code would be look like:

auto profiler = ResourceProfiler(monitoring_path_prefix).start();
< piece of code to watch >
profiler.stop();

It allow us to use it not only for launching queries. Also the interface going to be more clear and simple.
What do you think?


recordRusageStatDifference(start_stats.ru_stime,
end_stats.ru_stime,
monitoring_path_prefix + ".time.system");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care about system time here? IMHO, the most important time parameter of running query is a user time and full time. If this two is fine we likely don't care about system time. Otherwise we can repeat the intended query in controllable environment and figure out more parameters. Does it makes sense for you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no need of trade offs here. As an example, system time gives information about how heavy syscalls we are using.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I'm just talking about more keys to send. For some monitoring systems it could be a problem :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What system are we talking about specifically? I don't think we should lose information, without very specific/practical limitation.


const auto query_duration =
std::chrono::duration_cast<std::chrono::microseconds>(
std::chrono::steady_clock::now() - start_time_point);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it is good idea to include the time of monitoring key creating and time of all routine in recordRusageStatDifference in query_duration? Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

query monitoring is not rusage specific, that's why I don't think it should be somewhere completely separated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline. Moving steady_clock after rusage

namespace osquery {
void launchQueryWithProfiling(const std::string& name,
std::function<Status()> launchQuery) {
const auto start_time_point = std::chrono::steady_clock::now();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you calculate current time time twice if Killswitch::get().isWindowsProfilingEnabled() is false? Would you like to do something like posix implementation here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows query_profiler is almost copy paste of the old code. Code flow when killswitch is turned on is not really worth of optimizations. This should be removed soon after testing.

#include <osquery/status.h>

namespace osquery {
void launchQueryWithProfiling(const std::string& name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you like to pass the return status of launchQuery as a return status here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch

@guliashvili
Copy link
Contributor Author

guliashvili commented Sep 10, 2018

@akindyakov Your comment is definitely enlightening. For some reason I've not thought that we would need it for anything else.
I'm up for renaming launchQueryWithProfiling to something like launchWithProfiling so that, it would look okay to use profiling for any kind of code.

I kind of like that, with the current interface we can encourage developer to 1) Encapsulate the code being profiled in the single function. 2) Return Status, so that we can know how frequent failure is and also how much time failure/success take.
What do you think?

@akindyakov
Copy link
Contributor

@guliashvili , I see and kind of agree with you. Although, your interface force user to create a function with certain interface. For instance right now we are moving from Status to Expected. And after a while we need either create another copy of you function or significantly change it. My suggestion is to create it from very beginning flexible.
And there are two ways use templates of use class :)

@guliashvili
Copy link
Contributor Author

guliashvili commented Sep 14, 2018

Thats fair 👍
I kept both of the pros. Now the code will look like that.

        {
          CodeProfiler codeProfiler("monitorsomething"); //codeProfiler constructor - start monitoring
          ...
          codeProfiler.appendName("something additional depending on the code flow");
          ...
          //codeProfiler destructor - end monitoring
      }

Copy link
Contributor

@akindyakov akindyakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

@guliashvili guliashvili merged commit 53a48d8 into osquery:master Sep 14, 2018
@guliashvili guliashvili deleted the query-benchmark/linux branch September 14, 2018 15:05
jjerger pushed a commit to jjerger/osquery that referenced this pull request Sep 14, 2018
trizt pushed a commit to trizt/osquery that referenced this pull request May 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla signed Automated label: Pull Request author has signed the osquery CLA ready for review Pull requests that are ready to be reviewed by a maintainer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants