Skip to content
This repository

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
tree: d0baa99bcb
Fetching contributors…

Octocat-spinner-32-eaf2f5

Cannot retrieve contributors at this time

file 525 lines (523 sloc) 19.056 kb
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524
.TH mprof-report 1 ""
.SH The Mono log profiler
.PP
The Mono \f[I]log\f[] profiler can be used to collect a lot of
information about a program running in the Mono runtime.
This data can be used (both while the process is running and later)
to do analyses of the program behaviour, determine resource usage,
performance issues or even look for particular execution patterns.
.PP
This is accomplished by logging the events provided by the Mono
runtime through the profiling interface and periodically writing
them to a file which can be later inspected with the command line
\f[I]mprof-report\f[] program or with a GUI (not developed yet).
.PP
The events collected include (among others):
.IP \[bu] 2
method enter and leave
.IP \[bu] 2
object allocation
.IP \[bu] 2
garbage collection
.IP \[bu] 2
JIT compilation
.IP \[bu] 2
metadata loading
.IP \[bu] 2
lock contention
.IP \[bu] 2
exceptions
.PP
In addition, the profiler can periodically collect info about all
the objects present in the heap at the end of a garbage collection
(this is called heap shot and currently implemented only for the
sgen garbage collector).
Another available profiler mode is the \f[I]sampling\f[] or
\f[I]statistical\f[] mode: periodically the program is sampled and
the information about what the program was busy with is saved.
This allows to get information about the program behaviour without
degrading its performance too much (usually less than 10%).
.SS Basic profiler usage
.PP
The simpler way to use the profiler is the following:
.PP
\f[B]mono\ --profile=log\ program.exe\f[]
.PP
At the end of the execution the file \f[I]output.mlpd\f[] will be
found in the current directory.
A summary report of the data can be printed by running:
.PP
\f[B]mprof-report\ output.mlpd\f[]
.PP
With this invocation a huge amount of data is collected about the
program execution and collecting and saving this data can
significantly slow down program execution.
If saving the profiling data is not needed, a report can be
generated directly with:
.PP
\f[B]mono\ --profile=log:report\ program.exe\f[]
.PP
If the information about allocations is not of interest, it can be
excluded:
.PP
\f[B]mono\ --profile=log:noalloc\ program.exe\f[]
.PP
On the other hand, if method call timing is not important, while
allocations are, the needed info can be gathered with:
.PP
\f[B]mono\ --profile=log:nocalls\ program.exe\f[]
.PP
You will still be able to inspect information about the sequence of
calls that lead to each allocation because at each object
allocation a stack trace is collected if full enter/leave
information is not available.
.PP
To periodically collect heap shots (and exclude method and
allocation events) use the following options (making sure you run
with the sgen garbage collector):
.PP
\f[B]mono\ --gc=sgen\ --profile=log:heapshot\ program.exe\f[]
.PP
To perform a sampling profiler run, use the \f[I]sample\f[] option:
.PP
\f[B]mono\ --profile=log:sample\ program.exe\f[]
.SS Profiler option documentation
.PP
By default the \f[I]log\f[] profiler will gather all the events
provided by the Mono runtime and write them to a file named
\f[I]output.mlpd\f[].
When no option is specified, it is equivalent to using:
.PP
\f[B]--profile=log:calls,alloc,output=output.mlpd,maxframes=8,calldepth=100\f[]
.PP
The following options can be used to modify this default behaviour.
Each option is separated from the next by a \f[B],\f[] character,
with no spaces and all the options are included after the
\f[I]log:\f[] profile module specifier.
.IP \[bu] 2
\f[I]help\f[]: display concise help info about each available
option
.IP \[bu] 2
\f[I][no]alloc\f[]: \f[I]noalloc\f[] disables collecting object
allocation info, \f[I]alloc\f[] enables it if it was disabled by
another option like \f[I]heapshot\f[].
.IP \[bu] 2
\f[I][no]calls\f[]: \f[I]nocalls\f[] disables collecting method
enter and leave events.
When this option is used at each object allocation and at some
other events (like lock contentions and exception throws) a stack
trace is collected by default.
See the \f[I]maxframes\f[] option to control this behaviour.
\f[I]calls\f[] enables method enter/leave events if they were
disabled by another option like \f[I]heapshot\f[].
.IP \[bu] 2
\f[I]heapshot[=MODE]\f[]: collect heap shot data at each major
collection.
The frequency of the heap shots can be changed with the
\f[I]MODE\f[] parameter.
When this option is used allocation events and method enter/leave
events are not recorded by default: if they are needed, they need
to be enabled explicitly.
The optional parameter \f[I]MODE\f[] can modify the default heap
shot frequency.
heapshot can be used multiple times with different modes: in that
case a heap shot is taken if either of the conditions are met.
MODE can be one of:
.RS 2
.IP \[bu] 2
\f[I]NUM\f[]ms: perform a heap shot if at least \f[I]NUM\f[]
milliseconds passed since the last one.
.IP \[bu] 2
\f[I]NUM\f[]gc: perform a heap shot every \f[I]NUM\f[] major
garbage collections
.IP \[bu] 2
\f[I]ondemand\f[]: perform a heap shot when such a command is sent
to the control port
.RE
.IP \[bu] 2
\f[I]sample[=TYPE[/FREQ]]\f[]: collect statistical samples of the
program behaviour.
The default is to collect a 1000 times per second the instruction
pointer.
This is equivalent to the value \[lq]cycles/1000\[rq] for
\f[I]TYPE\f[].
On some systems, like with recent Linux kernels, it is possible to
cause the sampling to happen for other events provided by the
performance counters of the cpu.
In this case, \f[I]TYPE\f[] can be one of:
.RS 2
.IP \[bu] 2
\f[I]cycles\f[]: processor cycles
.IP \[bu] 2
\f[I]instr\f[]: executed instructions
.IP \[bu] 2
\f[I]cacherefs\f[]: cache references
.IP \[bu] 2
\f[I]cachemiss\f[]: cache misses
.IP \[bu] 2
\f[I]branches\f[]: executed branches
.IP \[bu] 2
\f[I]branchmiss\f[]: mispredicted branches
.RE
.IP \[bu] 2
\f[I]time=TIMER\f[]: use the TIMER timestamp mode.
TIMER can have the following values:
.RS 2
.IP \[bu] 2
\f[I]fast\f[]: a usually faster but possibly more inaccurate timer
.RE
.IP \[bu] 2
\f[I]maxframes=NUM\f[]: when a stack trace needs to be performed,
collect \f[I]NUM\f[] frames at the most.
The default is 8.
.IP \[bu] 2
\f[I]calldepth=NUM\f[]: ignore method enter/leave events when the
call chain depth is bigger than NUM.
.IP \[bu] 2
\f[I]zip\f[]: automatically compress the output data in gzip
format.
.IP \[bu] 2
\f[I]output=OUTSPEC\f[]: instead of writing the profiling data to
the output.mlpd file, substitute \f[I]%p\f[] in \f[I]OUTSPEC\f[]
with the current process id and \f[I]%t\f[] with the current date
and time, then do according to \f[I]OUTSPEC\f[]:
.RS 2
.IP \[bu] 2
if \f[I]OUTSPEC\f[] begins with a \f[I]|\f[] character, execute the
rest as a program and feed the data to its standard input
.IP \[bu] 2
if \f[I]OUTSPEC\f[] begins with a \f[I]-\f[] character, use the
rest of OUTSPEC as the filename, but force overwrite any existing
file by that name
.IP \[bu] 2
otherwise write the data the the named file: note that is a file by
that name already exists, a warning is issued and profiling is
disabled.
.RE
.IP \[bu] 2
\f[I]report\f[]: the profiling data is sent to mprof-report, which
will print a summary report.
This is equivalent to the option: \f[B]output=mprof-report\ -\f[].
If the \f[I]output\f[] option is specified as well, the report will
be written to the output file instead of the console.
.IP \[bu] 2
\f[I]port=PORT\f[]: specify the tcp/ip port to use for the
listening command server.
Currently not available for windows.
This server is started for example when heapshot=ondemand is used:
it will read commands line by line.
The following commands are available:
.RS 2
.IP \[bu] 2
\f[I]heapshot\f[]: perform a heapshot as soon as possible
.RE
.SS Analyzing the profile data
.PP
Currently there is a command line program (\f[I]mprof-report\f[])
to analyze the data produced by the profiler.
This is ran automatically when the \f[I]report\f[] profiler option
is used.
Simply run:
.PP
\f[B]mprof-report\ output.mlpd\f[]
.PP
to see a summary report of the data included in the file.
.SS Trace information for events
.PP
Often it is important for some events, like allocations, lock
contention and exception throws to know where they happened.
Or we may want to see what sequence of calls leads to a particular
method invocation.
To see this info invoke mprof-report as follows:
.PP
\f[B]mprof-report\ --traces\ output.mlpd\f[]
.PP
The maximum number of methods in each stack trace can be specified
with the \f[I]\[em]maxframes=NUM\f[] option:
.PP
\f[B]mprof-report\ --traces\ --maxframes=4\ output.mlpd\f[]
.PP
The stack trace info will be available if method enter/leave events
have been recorded or if stack trace collection wasn't explicitly
disabled with the \f[I]maxframes=0\f[] profiler option.
Note that the profiler will collect up to 8 frames by default at
specific events when the \f[I]nocalls\f[] option is used, so in
that case, if more stack frames are required in mprof-report, a
bigger value for maxframes when profiling must be used, too.
.PP
The \f[I]\[em]traces\f[] option also controls the reverse reference
feature in the heapshot report: for each class it reports how many
references to objects of that class come from other classes.
.SS Sort order for methods and allocations
.PP
When a list of methods is printed the default sort order is based
on the total time spent in the method.
This time is wall clock time (that is, it includes the time spent,
for example, in a sleep call, even if actual cpu time would be
basically 0).
Also, if the method has been ran on different threads, the time
will be a sum of the time used in each thread.
.PP
To change the sort order, use the option:
.PP
\f[B]--method-sort=MODE\f[]
.PP
where \f[I]MODE\f[] can be:
.IP \[bu] 2
\f[I]self\f[]: amount of time spent in the method itself and not in
its callees
.IP \[bu] 2
\f[I]calls\f[]: the number of method invocations
.IP \[bu] 2
\f[I]total\f[]: the total time spent in the method.
.PP
Object allocation lists are sorted by default depending on the
total amount of bytes used by each type.
.PP
To change the sort order of object allocations, use the option:
.PP
\f[B]--alloc-sort=MODE\f[]
.PP
where \f[I]MODE\f[] can be:
.IP \[bu] 2
\f[I]count\f[]: the number of allocated objects of the given type
.IP \[bu] 2
\f[I]bytes\f[]: the total number of bytes used by objects of the
given type
.SS Selecting what data to report
.PP
The profiler by default collects data about many runtime subsystems
and mprof-report prints a summary of all the subsystems that are
found in the data file.
It is possible to tell mprof-report to only show information about
some of them with the following option:
.PP
\f[B]--reports=R1[,R2...]\f[]
.PP
where the report names R1, R2 etc.
can be:
.IP \[bu] 2
\f[I]header\f[]: information about program startup and profiler
version
.IP \[bu] 2
\f[I]jit\f[]: JIT compiler information
.IP \[bu] 2
\f[I]sample\f[]: statistical sampling information
.IP \[bu] 2
\f[I]gc\f[]: garbage collection information
.IP \[bu] 2
\f[I]alloc\f[]: object allocation information
.IP \[bu] 2
\f[I]call\f[]: method profiling information
.IP \[bu] 2
\f[I]metadata\f[]: metadata events like image loads
.IP \[bu] 2
\f[I]exception\f[]: exception throw and handling information
.IP \[bu] 2
\f[I]monitor\f[]: lock contention information
.IP \[bu] 2
\f[I]thread\f[]: thread information
.IP \[bu] 2
\f[I]heapshot\f[]: live heap usage at heap shots
.PP
It is possible to limit some of the data displayed to a timeframe
of the program execution with the option:
.PP
\f[B]--time=FROM-TO\f[]
.PP
where \f[I]FROM\f[] and \f[I]TO\f[] are seconds since application
startup (they can be floating point numbers).
.PP
Another interesting option is to consider only events happening on
a particular thread with the following option:
.PP
\f[B]--thread=THREADID\f[]
.PP
where \f[I]THREADID\f[] is one of the numbers listed in the thread
summary report (or a thread name when present).
.PP
By default long lists of methods or other information like object
allocations are limited to the most important data.
To increase the amount of information printed you can use the
option:
.PP
\f[B]--verbose\f[]
.SS Track individual objects
.PP
Instead of printing the usual reports from the profiler data, it is
possible to track some interesting information about some specific
object addresses.
The objects are selected based on their address with the
\f[I]\[em]track\f[] option as follows:
.PP
\f[B]--track=0xaddr1[,0xaddr2,...]\f[]
.PP
The reported info (if available in the data file), will be class
name, size, creation time, stack trace of creation (with the
\f[I]\[em]traces\f[] option), etc.
If heapshot data is available it will be possible to also track
what other objects reference one of the listed addresses.
.PP
The object addresses can be gathered either from the profiler
report in some cases (like in the monitor lock report), from the
live application or they can be selected with the
\f[I]\[em]find=FINDSPEC\f[] option.
FINDSPEC can be one of the following:
.IP \[bu] 2
\f[I]S:SIZE\f[]: where the object is selected if it's size is at
least \f[I]SIZE\f[]
.IP \[bu] 2
\f[I]T:NAME\f[]: where the object is selected if \f[I]NAME\f[]
partially matches its class name
.PP
This option can be specified multiple times with one of the
different kinds of FINDSPEC.
For example, the following:
.PP
\f[B]--find=S:10000\ --find=T:Byte[]\f[]
.PP
will find all the byte arrays that are at least 10000 bytes in
size.
.PP
Note that with a moving garbage collector the object address can
change, so you may need to track the changed address manually.
It can also happen that multiple objects are allocated at the same
address, so the output from this option can become large.
.SS Saving a profiler report
.PP
By default mprof-report will print the summary data to the console.
To print it to a file, instead, use the option:
.PP
\f[B]--out=FILENAME\f[]
.SS Dealing with profiler slowness
.PP
If the profiler needs to collect lots of data, the execution of the
program will slow down significantly, usually 10 to 20 times
slower.
There are several ways to reduce the impact of the profiler on the
program execution.
.SS Use the statistical sampling mode
.PP
Statistical sampling allows executing a program under the profiler
with minimal performance overhead (usually less than 10%).
This mode allows checking where the program is spending most of
it's execution time without significantly perturbing its behaviour.
.SS Collect less data
.PP
Collecting method enter/leave events can be very expensive,
especially in programs that perform many millions of tiny calls.
The profiler option \f[I]nocalls\f[] can be used to avoid
collecting this data or it can be limited to only a few call levels
with the \f[I]calldepth\f[] option.
.PP
Object allocation information is expensive as well, though much
less than method enter/leave events.
If it's not needed, it can be skipped with the \f[I]noalloc\f[]
profiler option.
Note that when method enter/leave events are discarded, by default
stack traces are collected at each allocation and this can be
expensive as well.
The impact of stack trace information can be reduced by setting a
low value with the \f[I]maxframes\f[] option or by eliminating them
completely, by setting it to 0.
.PP
The other major source of data is the heapshot profiler option:
especially if the managed heap is big, since every object needs to
be inspected.
The \f[I]MODE\f[] parameter of the \f[I]heapshot\f[] option can be
used to reduce the frequency of the heap shots.
.SS Reduce the timestamp overhead
.PP
On many operating systems or architectures what actually slows down
profiling is the function provided by the system to get timestamp
information.
The \f[I]time=fast\f[] profiler option can be usually used to speed
up this operation, but, depending on the system, time accounting
may have some level of approximation (though statistically the data
should be still fairly valuable).
.SS Dealing with the size of the data files
.PP
When collecting a lot of information about a profiled program, huge
data files can be generated.
There are a few ways to minimize the amount of data, for example by
not collecting some of the more space-consuming information or by
compressing the information on the fly or by just generating a
summary report.
.SS Reducing the amount of data
.PP
Method enter/leave events can be excluded completely with the
\f[I]nocalls\f[] option or they can be limited to just a few levels
of calls with the \f[I]calldepth\f[] option.
For example, the option:
.PP
\f[B]calldepth=10\f[]
.PP
will ignore the method events when there are more than 10 managed
stack frames.
This is very useful for programs that have deep recursion or for
programs that perform many millions of tiny calls deep enough in
the call stack.
The optimal number for the calldepth option depends on the program
and it needs to be balanced between providing enough profiling
information and allowing fast execution speed.
.PP
Note that by default, if method events are not recorded at all, the
profiler will collect stack trace information at events like
allocations.
To avoid gathering this data, use the \f[I]maxframes=0\f[] profiler
option.
.PP
Allocation events can be eliminated with the \f[I]noalloc\f[]
option.
.PP
Heap shot data can also be huge: by default it is collected at each
major collection.
To reduce the frequency, you can specify a heapshot mode: for
example to collect every 5 collections (including major and minor):
.PP
\f[B]heapshot=5gc\f[]
.PP
or when at least 5 seconds passed since the last heap shot:
.PP
\f[B]heapshot=5000ms\f[]
.SS Compressing the data
.PP
To reduce the amout of disk space used by the data, the data can be
compressed either after it has been generated with the gzip
command:
.PP
\f[B]gzip\ -9\ output.mlpd\f[]
.PP
or it can be compressed automatically by using the \f[I]zip\f[]
profiler option.
Note that in this case there could be a significant slowdown of the
profiled program.
.PP
The mprof-report program will tranparently deal with either
compressed or uncompressed data files.
.SS Generating only a summary report
.PP
Often it's enough to look at the profiler summary report to
diagnose an issue and in this case it's possible to avoid saving
the profiler data file to disk.
This can be accomplished with the \f[I]report\f[] profiler option,
which will basically send the data to the mprof-report program for
display.
.PP
To have more control of what summary information is reported (or to
use a completely different program to decode the profiler data),
the \f[I]output\f[] profiler option can be used, with \f[B]|\f[] as
the first character: the rest of the output name will be executed
as a program with the data fed in on the standard input.
.PP
For example, to print only the Monitor summary with stack trace
information, you could use it like this:
.PP
\f[B]output=|mprof-report\ --reports=monitor\ --traces\ -\f[]
.SH WEB SITE
http://www.mono-project.com/Profiler
.SH SEE ALSO
.PP
mono(1)
.SH AUTHORS
Paolo Molaro.

Something went wrong with that request. Please try again.