Browse files


  • Loading branch information...
1 parent cfcfc9e commit 75b63651e4cdab0c861abeb4bafd5268da457f86 @fchecconi committed Nov 22, 2011
Showing 323 changed files with 8,922 additions and 1,906 deletions.
@@ -0,0 +1,64 @@
+The module hwlat_detector is a special purpose kernel module that is used to
+detect large system latencies induced by the behavior of certain underlying
+hardware or firmware, independent of Linux itself. The code was developed
+originally to detect SMIs (System Management Interrupts) on x86 systems,
+however there is nothing x86 specific about this patchset. It was
+originally written for use by the "RT" patch since the Real Time
+kernel is highly latency sensitive.
+SMIs are usually not serviced by the Linux kernel, which typically does not
+even know that they are occuring. SMIs are instead are set up by BIOS code
+and are serviced by BIOS code, usually for "critical" events such as
+management of thermal sensors and fans. Sometimes though, SMIs are used for
+other tasks and those tasks can spend an inordinate amount of time in the
+handler (sometimes measured in milliseconds). Obviously this is a problem if
+you are trying to keep event service latencies down in the microsecond range.
+The hardware latency detector works by hogging all of the cpus for configurable
+amounts of time (by calling stop_machine()), polling the CPU Time Stamp Counter
+for some period, then looking for gaps in the TSC data. Any gap indicates a
+time when the polling was interrupted and since the machine is stopped and
+interrupts turned off the only thing that could do that would be an SMI.
+Note that the SMI detector should *NEVER* be used in a production environment.
+It is intended to be run manually to determine if the hardware platform has a
+problem with long system firmware service routines.
+Loading the module hwlat_detector passing the parameter "enabled=1" (or by
+setting the "enable" entry in "hwlat_detector" debugfs toggled on) is the only
+step required to start the hwlat_detector. It is possible to redefine the
+threshold in microseconds (us) above which latency spikes will be taken
+into account (parameter "threshold=").
+ # modprobe hwlat_detector enabled=1 threshold=100
+After the module is loaded, it creates a directory named "hwlat_detector" under
+the debugfs mountpoint, "/debug/hwlat_detector" for this text. It is necessary
+to have debugfs mounted, which might be on /sys/debug on your system.
+The /debug/hwlat_detector interface contains the following files:
+count - number of latency spikes observed since last reset
+enable - a global enable/disable toggle (0/1), resets count
+max - maximum hardware latency actually observed (usecs)
+sample - a pipe from which to read current raw sample data
+ in the format <timestamp> <latency observed usecs>
+ (can be opened O_NONBLOCK for a single sample)
+threshold - minimum latency value to be considered (usecs)
+width - time period to sample with CPUs held (usecs)
+ must be less than the total window size (enforced)
+window - total period of sampling, width being inside (usecs)
+By default we will set width to 500,000 and window to 1,000,000, meaning that
+we will sample every 1,000,000 usecs (1s) for 500,000 usecs (0.5s). If we
+observe any latencies that exceed the threshold (initially 100 usecs),
+then we write to a global sample ring buffer of 8K samples, which is
+consumed by reading from the "sample" (pipe) debugfs file interface.
@@ -57,10 +57,17 @@ On PowerPC - Press 'ALT - Print Screen (or F13) - <command key>,
On other - If you know of the key combos for other architectures, please
let me know so I can add them to this section.
-On all - write a character to /proc/sysrq-trigger. e.g.:
+On all - write a character to /proc/sysrq-trigger, e.g.:
echo t > /proc/sysrq-trigger
+On all - Enable network SysRq by writing a cookie to icmp_echo_sysrq, e.g.
+ echo 0x01020304 >/proc/sys/net/ipv4/icmp_echo_sysrq
+ Send an ICMP echo request with this pattern plus the particular
+ SysRq command key. Example:
+ # ping -c1 -s57 -p0102030468
+ will trigger the SysRq-H (help) command.
* What are the 'command' keys?
'b' - Will immediately reboot the system without syncing or unmounting
@@ -0,0 +1,186 @@
+ Using the Linux Kernel Latency Histograms
+This document gives a short explanation how to enable, configure and use
+latency histograms. Latency histograms are primarily relevant in the
+context of real-time enabled kernels (CONFIG_PREEMPT/CONFIG_PREEMPT_RT)
+and are used in the quality management of the Linux real-time
+* Purpose of latency histograms
+A latency histogram continuously accumulates the frequencies of latency
+data. There are two types of histograms
+- potential sources of latencies
+- effective latencies
+* Potential sources of latencies
+Potential sources of latencies are code segments where interrupts,
+preemption or both are disabled (aka critical sections). To create
+histograms of potential sources of latency, the kernel stores the time
+stamp at the start of a critical section, determines the time elapsed
+when the end of the section is reached, and increments the frequency
+counter of that latency value - irrespective of whether any concurrently
+running process is affected by latency or not.
+- Configuration items (in the Kernel hacking/Tracers submenu)
+* Effective latencies
+Effective latencies are actually occuring during wakeup of a process. To
+determine effective latencies, the kernel stores the time stamp when a
+process is scheduled to be woken up, and determines the duration of the
+wakeup time shortly before control is passed over to this process. Note
+that the apparent latency in user space may be somewhat longer, since the
+process may be interrupted after control is passed over to it but before
+the execution in user space takes place. Simply measuring the interval
+between enqueuing and wakeup may also not appropriate in cases when a
+process is scheduled as a result of a timer expiration. The timer may have
+missed its deadline, e.g. due to disabled interrupts, but this latency
+would not be registered. Therefore, the offsets of missed timers are
+recorded in a separate histogram. If both wakeup latency and missed timer
+offsets are configured and enabled, a third histogram may be enabled that
+records the overall latency as a sum of the timer latency, if any, and the
+wakeup latency. This histogram is called "timerandwakeup".
+- Configuration items (in the Kernel hacking/Tracers submenu)
+* Usage
+The interface to the administration of the latency histograms is located
+in the debugfs file system. To mount it, either enter
+mount -t sysfs nodev /sys
+mount -t debugfs nodev /sys/kernel/debug
+from shell command line level, or add
+nodev /sys sysfs defaults 0 0
+nodev /sys/kernel/debug debugfs defaults 0 0
+to the file /etc/fstab. All latency histogram related files are then
+available in the directory /sys/kernel/debug/tracing/latency_hist. A
+particular histogram type is enabled by writing non-zero to the related
+variable in the /sys/kernel/debug/tracing/latency_hist/enable directory.
+Select "preemptirqsoff" for the histograms of potential sources of
+latencies and "wakeup" for histograms of effective latencies etc. The
+histogram data - one per CPU - are available in the files
+The histograms are reset by writing non-zero to the file "reset" in a
+particular latency directory. To reset all latency data, use
+if test -d $HISTDIR
+ for i in `find . | grep /reset$`
+ do
+ echo 1 >$i
+ done
+* Data format
+Latency data are stored with a resolution of one microsecond. The
+maximum latency is 10,240 microseconds. The data are only valid, if the
+overflow register is empty. Every output line contains the latency in
+microseconds in the first row and the number of samples in the second
+row. To display only lines with a positive latency count, use, for
+grep -v " 0$" /sys/kernel/debug/tracing/latency_hist/preemptoff/CPU0
+#Minimum latency: 0 microseconds.
+#Average latency: 0 microseconds.
+#Maximum latency: 25 microseconds.
+#Total samples: 3104770694
+#There are 0 samples greater or equal than 10240 microseconds
+#usecs samples
+ 0 2984486876
+ 1 49843506
+ 2 58219047
+ 3 5348126
+ 4 2187960
+ 5 3388262
+ 6 959289
+ 7 208294
+ 8 40420
+ 9 4485
+ 10 14918
+ 11 18340
+ 12 25052
+ 13 19455
+ 14 5602
+ 15 969
+ 16 47
+ 17 18
+ 18 14
+ 19 1
+ 20 3
+ 21 2
+ 22 5
+ 23 2
+ 25 1
+* Wakeup latency of a selected process
+To only collect wakeup latency data of a particular process, write the
+PID of the requested process to
+PIDs are not considered, if this variable is set to 0.
+* Details of the process with the highest wakeup latency so far
+Selected data of the process that suffered from the highest wakeup
+latency that occurred in a particular CPU are available in the file
+In addition, other relevant system data at the time when the
+latency occurred are given.
+The format of the data is (all in one line):
+<PID> <Priority> <Latency> (<Timeroffset>) <Command> \
+<- <PID> <Priority> <Command> <Timestamp>
+The value of <Timeroffset> is only relevant in the combined timer
+and wakeup latency recording. In the wakeup recording, it is
+always 0, in the missed_timer_offsets recording, it is the same
+as <Latency>.
+When retrospectively searching for the origin of a latency and
+tracing was not enabled, it may be helpful to know the name and
+some basic data of the task that (finally) was switching to the
+late real-tlme task. In addition to the victim's data, also the
+data of the possible culprit are therefore displayed after the
+"<-" symbol.
+Finally, the timestamp of the time when the latency occurred
+in <seconds>.<microseconds> after the most recent system boot
+is provided.
+These data are also reset when the wakeup histogram is reset.
@@ -3008,6 +3008,15 @@ L:
S: Odd Fixes
F: drivers/tty/hvc/
+P: Jon Masters
+S: Supported
+F: Documentation/hwlat_detector.txt
+F: drivers/misc/hwlat_detector.c
M: Jean Delvare <>
M: Guenter Roeck <>
@@ -6,6 +6,7 @@ config OPROFILE
tristate "OProfile system profiling"
depends on PROFILING
depends on HAVE_OPROFILE
+ depends on !PREEMPT_RT_FULL
@@ -107,7 +107,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
/* If we're in an interrupt context, or have no user context,
we must not take the fault. */
- if (!mm || in_atomic())
+ if (!mm || pagefault_disabled())
goto no_context;
@@ -29,6 +29,7 @@ config ARM
select CPU_PM if (SUSPEND || CPU_IDLE)
The ARM series is a line of low-power-consumption RISC chip designs
@@ -1654,7 +1655,7 @@ config HAVE_ARCH_PFN_VALID
config HIGHMEM
bool "High Memory Support"
- depends on MMU
+ depends on MMU && !PREEMPT_RT_FULL
The address space of ARM processors is only 4 Gigabytes large
and it has to accommodate user address space, kernel address
@@ -29,28 +29,17 @@ static void early_console_write(struct console *con, const char *s, unsigned n)
early_write(s, n);
-static struct console early_console = {
+static struct console early_console_dev = {
.name = "earlycon",
.write = early_console_write,
.index = -1,
-asmlinkage void early_printk(const char *fmt, ...)
- char buf[512];
- int n;
- va_list ap;
- va_start(ap, fmt);
- n = vscnprintf(buf, sizeof(buf), fmt, ap);
- early_write(buf, n);
- va_end(ap);
static int __init setup_early_printk(char *buf)
- register_console(&early_console);
+ early_console = &early_console_dev;
+ register_console(&early_console_dev);
return 0;
@@ -432,7 +432,7 @@ armpmu_reserve_hardware(struct arm_pmu *armpmu)
err = request_irq(irq, handle_irq,
"arm-pmu", armpmu);
if (err) {
pr_err("unable to request IRQ%d for ARM PMU counters\n",
Oops, something went wrong.

0 comments on commit 75b6365

Please sign in to comment.