New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
posix profiling #5187
posix profiling #5187
Conversation
jenkins test windows |
…y into query-benchmark/linux
There is a debug build for linux and it is failing of this PR, can I ask you to have a look? https://jenkins.osquery.io/job/osqueryMasterLinuxDebug/207/TargetSystem=aws-ubuntu-14.04/console |
Cool, how can people find this build logs? This failure looks unrelated to the PR. Exact same failure has happened to me while 'fixing' integration tests. |
…y into query-benchmark/linux
#5192 @akindyakov It does not fail anymore. 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on it!
I'd like to suggest a bit different interface for your new fancy functionality.
You could implement it as a class with two methods start
and stop
. And your measurement in the code would be look like:
auto profiler = ResourceProfiler(monitoring_path_prefix).start();
< piece of code to watch >
profiler.stop();
It allow us to use it not only for launching queries. Also the interface going to be more clear and simple.
What do you think?
|
||
recordRusageStatDifference(start_stats.ru_stime, | ||
end_stats.ru_stime, | ||
monitoring_path_prefix + ".time.system"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we care about system time here? IMHO, the most important time parameter of running query is a user time and full time. If this two is fine we likely don't care about system time. Otherwise we can repeat the intended query in controllable environment and figure out more parameters. Does it makes sense for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no need of trade offs here. As an example, system time gives information about how heavy syscalls we are using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I'm just talking about more keys to send. For some monitoring systems it could be a problem :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What system are we talking about specifically? I don't think we should lose information, without very specific/practical limitation.
|
||
const auto query_duration = | ||
std::chrono::duration_cast<std::chrono::microseconds>( | ||
std::chrono::steady_clock::now() - start_time_point); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it is good idea to include the time of monitoring key creating and time of all routine in recordRusageStatDifference
in query_duration
? Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
query monitoring is not rusage specific, that's why I don't think it should be somewhere completely separated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline. Moving steady_clock after rusage
namespace osquery { | ||
void launchQueryWithProfiling(const std::string& name, | ||
std::function<Status()> launchQuery) { | ||
const auto start_time_point = std::chrono::steady_clock::now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you calculate current time time twice if Killswitch::get().isWindowsProfilingEnabled()
is false? Would you like to do something like posix implementation here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows query_profiler is almost copy paste of the old code. Code flow when killswitch is turned on is not really worth of optimizations. This should be removed soon after testing.
osquery/dispatcher/query_profiler.h
Outdated
#include <osquery/status.h> | ||
|
||
namespace osquery { | ||
void launchQueryWithProfiling(const std::string& name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you like to pass the return status of launchQuery
as a return status here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch
@akindyakov Your comment is definitely enlightening. For some reason I've not thought that we would need it for anything else. I kind of like that, with the current interface we can encourage developer to 1) Encapsulate the code being profiled in the single function. 2) Return Status, so that we can know how frequent failure is and also how much time failure/success take. |
@guliashvili , I see and kind of agree with you. Although, your interface force user to create a function with certain interface. For instance right now we are moving from Status to Expected. And after a while we need either create another copy of you function or significantly change it. My suggestion is to create it from very beginning flexible. |
1b0e963
to
4cf9765
Compare
2358efe
to
dcbd744
Compare
Thats fair 👍
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!
Using the
getrusage
syscall doing additional profiling for the queries. Linux provides even more granular profiling using the RUSAGE_THREAD, which counts things only for the calling thread.In the future query profiling code will grow more and will be tied to the specific OS. So, we should benefit by having query_profiler for each different systems/standards.