Skip to content

Commit

Permalink
Merge branch 'devel'
Browse files Browse the repository at this point in the history
* devel: (33 commits)
  edac i5000, i5400: fix pointer math in i5000_get_mc_regs()
  edac: allow specifying the error count with fake_inject
  edac: add support for Calxeda highbank L2 cache ecc
  edac: add support for Calxeda highbank memory controller
  edac: create top-level debugfs directory
  sb_edac: properly handle error count
  i7core_edac: properly handle error count
  edac: edac_mc_handle_error(): add an error_count parameter
  edac: remove arch-specific parameter for the error handler
  amd64_edac: Don't pass driver name as an error parameter
  edac_mc: check for allocation failure in edac_mc_alloc()
  edac: Increase version to 3.0.0
  edac_mc: Cleanup per-dimm_info debug messages
  edac: Convert debugfX to edac_dbg(X,
  edac: Use more normal debugging macro style
  edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs
  Edac: Add ABI Documentation for the new device nodes
  edac: move documentation ABI to ABI/testing/sysfs-devices-edac
  i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy
  edac: change the mem allocation scheme to make Documentation/kobject.txt happy
  ...
  • Loading branch information
Mauro Carvalho Chehab committed Jul 30, 2012
2 parents 73bcc49 + f58d0de commit c2078e4
Show file tree
Hide file tree
Showing 48 changed files with 3,471 additions and 2,555 deletions.
140 changes: 140 additions & 0 deletions Documentation/ABI/testing/sysfs-devices-edac
@@ -0,0 +1,140 @@
What: /sys/devices/system/edac/mc/mc*/reset_counters
Date: January 2006
Contact: linux-edac@vger.kernel.org
Description: This write-only control file will zero all the statistical
counters for UE and CE errors on the given memory controller.
Zeroing the counters will also reset the timer indicating how
long since the last counter were reset. This is useful for
computing errors/time. Since the counters are always reset
at driver initialization time, no module/kernel parameter
is available.

What: /sys/devices/system/edac/mc/mc*/seconds_since_reset
Date: January 2006
Contact: linux-edac@vger.kernel.org
Description: This attribute file displays how many seconds have elapsed
since the last counter reset. This can be used with the error
counters to measure error rates.

What: /sys/devices/system/edac/mc/mc*/mc_name
Date: January 2006
Contact: linux-edac@vger.kernel.org
Description: This attribute file displays the type of memory controller
that is being utilized.

What: /sys/devices/system/edac/mc/mc*/size_mb
Date: January 2006
Contact: linux-edac@vger.kernel.org
Description: This attribute file displays, in count of megabytes, of memory
that this memory controller manages.

What: /sys/devices/system/edac/mc/mc*/ue_count
Date: January 2006
Contact: linux-edac@vger.kernel.org
Description: This attribute file displays the total count of uncorrectable
errors that have occurred on this memory controller. If
panic_on_ue is set, this counter will not have a chance to
increment, since EDAC will panic the system

What: /sys/devices/system/edac/mc/mc*/ue_noinfo_count
Date: January 2006
Contact: linux-edac@vger.kernel.org
Description: This attribute file displays the number of UEs that have
occurred on this memory controller with no information as to
which DIMM slot is having errors.

What: /sys/devices/system/edac/mc/mc*/ce_count
Date: January 2006
Contact: linux-edac@vger.kernel.org
Description: This attribute file displays the total count of correctable
errors that have occurred on this memory controller. This
count is very important to examine. CEs provide early
indications that a DIMM is beginning to fail. This count
field should be monitored for non-zero values and report
such information to the system administrator.

What: /sys/devices/system/edac/mc/mc*/ce_noinfo_count
Date: January 2006
Contact: linux-edac@vger.kernel.org
Description: This attribute file displays the number of CEs that
have occurred on this memory controller wherewith no
information as to which DIMM slot is having errors. Memory is
handicapped, but operational, yet no information is available
to indicate which slot the failing memory is in. This count
field should be also be monitored for non-zero values.

What: /sys/devices/system/edac/mc/mc*/sdram_scrub_rate
Date: February 2007
Contact: linux-edac@vger.kernel.org
Description: Read/Write attribute file that controls memory scrubbing.
The scrubbing rate used by the memory controller is set by
writing a minimum bandwidth in bytes/sec to the attribute file.
The rate will be translated to an internal value that gives at
least the specified rate.
Reading the file will return the actual scrubbing rate employed.
If configuration fails or memory scrubbing is not implemented,
the value of the attribute file will be -1.

What: /sys/devices/system/edac/mc/mc*/max_location
Date: April 2012
Contact: Mauro Carvalho Chehab <mchehab@redhat.com>
linux-edac@vger.kernel.org
Description: This attribute file displays the information about the last
available memory slot in this memory controller. It is used by
userspace tools in order to display the memory filling layout.

What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/size
Date: April 2012
Contact: Mauro Carvalho Chehab <mchehab@redhat.com>
linux-edac@vger.kernel.org
Description: This attribute file will display the size of dimm or rank.
For dimm*/size, this is the size, in MB of the DIMM memory
stick. For rank*/size, this is the size, in MB for one rank
of the DIMM memory stick. On single rank memories (1R), this
is also the total size of the dimm. On dual rank (2R) memories,
this is half the size of the total DIMM memories.

What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_dev_type
Date: April 2012
Contact: Mauro Carvalho Chehab <mchehab@redhat.com>
linux-edac@vger.kernel.org
Description: This attribute file will display what type of DRAM device is
being utilized on this DIMM (x1, x2, x4, x8, ...).

What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_edac_mode
Date: April 2012
Contact: Mauro Carvalho Chehab <mchehab@redhat.com>
linux-edac@vger.kernel.org
Description: This attribute file will display what type of Error detection
and correction is being utilized. For example: S4ECD4ED would
mean a Chipkill with x4 DRAM.

What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_label
Date: April 2012
Contact: Mauro Carvalho Chehab <mchehab@redhat.com>
linux-edac@vger.kernel.org
Description: This control file allows this DIMM to have a label assigned
to it. With this label in the module, when errors occur
the output can provide the DIMM label in the system log.
This becomes vital for panic events to isolate the
cause of the UE event.
DIMM Labels must be assigned after booting, with information
that correctly identifies the physical slot with its
silk screen label. This information is currently very
motherboard specific and determination of this information
must occur in userland at this time.

What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_location
Date: April 2012
Contact: Mauro Carvalho Chehab <mchehab@redhat.com>
linux-edac@vger.kernel.org
Description: This attribute file will display the location (csrow/channel,
branch/channel/slot or channel/slot) of the dimm or rank.

What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_mem_type
Date: April 2012
Contact: Mauro Carvalho Chehab <mchehab@redhat.com>
linux-edac@vger.kernel.org
Description: This attribute file will display what type of memory is
currently on this csrow. Normally, either buffered or
unbuffered memory (for example, Unbuffered-DDR3).
15 changes: 15 additions & 0 deletions Documentation/devicetree/bindings/arm/calxeda/l2ecc.txt
@@ -0,0 +1,15 @@
Calxeda Highbank L2 cache ECC

Properties:
- compatible : Should be "calxeda,hb-sregs-l2-ecc"
- reg : Address and size for ECC error interrupt clear registers.
- interrupts : Should be single bit error interrupt, then double bit error
interrupt.

Example:

sregs@fff3c200 {
compatible = "calxeda,hb-sregs-l2-ecc";
reg = <0xfff3c200 0x100>;
interrupts = <0 71 4 0 72 4>;
};
14 changes: 14 additions & 0 deletions Documentation/devicetree/bindings/arm/calxeda/mem-ctrlr.txt
@@ -0,0 +1,14 @@
Calxeda DDR memory controller

Properties:
- compatible : Should be "calxeda,hb-ddr-ctrl"
- reg : Address and size for DDR controller registers.
- interrupts : Interrupt for DDR controller.

Example:

memory-controller@fff00000 {
compatible = "calxeda,hb-ddr-ctrl";
reg = <0xfff00000 0x1000>;
interrupts = <0 91 4>;
};
112 changes: 8 additions & 104 deletions Documentation/edac.txt
Expand Up @@ -232,116 +232,20 @@ EDAC control and attribute files.


In 'mcX' directories are EDAC control and attribute files for
this 'X' instance of the memory controllers:


Counter reset control file:

'reset_counters'

This write-only control file will zero all the statistical counters
for UE and CE errors. Zeroing the counters will also reset the timer
indicating how long since the last counter zero. This is useful
for computing errors/time. Since the counters are always reset at
driver initialization time, no module/kernel parameter is available.

RUN TIME: echo "anything" >/sys/devices/system/edac/mc/mc0/counter_reset

This resets the counters on memory controller 0


Seconds since last counter reset control file:

'seconds_since_reset'

This attribute file displays how many seconds have elapsed since the
last counter reset. This can be used with the error counters to
measure error rates.



Memory Controller name attribute file:

'mc_name'

This attribute file displays the type of memory controller
that is being utilized.


Total memory managed by this memory controller attribute file:

'size_mb'

This attribute file displays, in count of megabytes, of memory
that this instance of memory controller manages.


Total Uncorrectable Errors count attribute file:

'ue_count'

This attribute file displays the total count of uncorrectable
errors that have occurred on this memory controller. If panic_on_ue
is set this counter will not have a chance to increment,
since EDAC will panic the system.


Total UE count that had no information attribute fileY:

'ue_noinfo_count'

This attribute file displays the number of UEs that have occurred
with no information as to which DIMM slot is having errors.


Total Correctable Errors count attribute file:

'ce_count'

This attribute file displays the total count of correctable
errors that have occurred on this memory controller. This
count is very important to examine. CEs provide early
indications that a DIMM is beginning to fail. This count
field should be monitored for non-zero values and report
such information to the system administrator.


Total Correctable Errors count attribute file:

'ce_noinfo_count'

This attribute file displays the number of CEs that
have occurred wherewith no information as to which DIMM slot
is having errors. Memory is handicapped, but operational,
yet no information is available to indicate which slot
the failing memory is in. This count field should be also
be monitored for non-zero values.

Device Symlink:

'device'

Symlink to the memory controller device.

Sdram memory scrubbing rate:

'sdram_scrub_rate'

Read/Write attribute file that controls memory scrubbing. The scrubbing
rate is set by writing a minimum bandwidth in bytes/sec to the attribute
file. The rate will be translated to an internal value that gives at
least the specified rate.

Reading the file will return the actual scrubbing rate employed.

If configuration fails or memory scrubbing is not implemented, accessing
that attribute will fail.
this 'X' instance of the memory controllers.

For a description of the sysfs API, please see:
Documentation/ABI/testing/sysfs/devices-edac


============================================================================
'csrowX' DIRECTORIES

When CONFIG_EDAC_LEGACY_SYSFS is enabled, the sysfs will contain the
csrowX directories. As this API doesn't work properly for Rambus, FB-DIMMs
and modern Intel Memory Controllers, this is being deprecated in favor
of dimmX directories.

In the 'csrowX' directories are EDAC control and attribute files for
this 'X' instance of csrow:

Expand Down
12 changes: 12 additions & 0 deletions arch/arm/boot/dts/highbank.dts
Expand Up @@ -118,6 +118,12 @@
interrupts = <0 90 4>;
};

memory-controller@fff00000 {
compatible = "calxeda,hb-ddr-ctrl";
reg = <0xfff00000 0x1000>;
interrupts = <0 91 4>;
};

ipc@fff20000 {
compatible = "arm,pl320", "arm,primecell";
reg = <0xfff20000 0x1000>;
Expand Down Expand Up @@ -188,6 +194,12 @@
reg = <0xfff3c000 0x1000>;
};

sregs@fff3c200 {
compatible = "calxeda,hb-sregs-l2-ecc";
reg = <0xfff3c200 0x100>;
interrupts = <0 71 4 0 72 4>;
};

dma@fff3d000 {
compatible = "arm,pl330", "arm,primecell";
reg = <0xfff3d000 0x1000>;
Expand Down
24 changes: 23 additions & 1 deletion drivers/edac/Kconfig
Expand Up @@ -7,7 +7,7 @@
menuconfig EDAC
bool "EDAC (Error Detection And Correction) reporting"
depends on HAS_IOMEM
depends on X86 || PPC || TILE
depends on X86 || PPC || TILE || ARM
help
EDAC is designed to report errors in the core system.
These are low-level errors that are reported in the CPU or
Expand All @@ -31,6 +31,14 @@ if EDAC

comment "Reporting subsystems"

config EDAC_LEGACY_SYSFS
bool "EDAC legacy sysfs"
default y
help
Enable the compatibility sysfs nodes.
Use 'Y' if your edac utilities aren't ported to work with the newer
structures.

config EDAC_DEBUG
bool "Debugging"
help
Expand Down Expand Up @@ -294,4 +302,18 @@ config EDAC_TILE
Support for error detection and correction on the
Tilera memory controller.

config EDAC_HIGHBANK_MC
tristate "Highbank Memory Controller"
depends on EDAC_MM_EDAC && ARCH_HIGHBANK
help
Support for error detection and correction on the
Calxeda Highbank memory controller.

config EDAC_HIGHBANK_L2
tristate "Highbank L2 Cache"
depends on EDAC_MM_EDAC && ARCH_HIGHBANK
help
Support for error detection and correction on the
Calxeda Highbank memory controller.

endif # EDAC
3 changes: 3 additions & 0 deletions drivers/edac/Makefile
Expand Up @@ -55,3 +55,6 @@ obj-$(CONFIG_EDAC_AMD8111) += amd8111_edac.o
obj-$(CONFIG_EDAC_AMD8131) += amd8131_edac.o

obj-$(CONFIG_EDAC_TILE) += tile_edac.o

obj-$(CONFIG_EDAC_HIGHBANK_MC) += highbank_mc_edac.o
obj-$(CONFIG_EDAC_HIGHBANK_L2) += highbank_l2_edac.o

0 comments on commit c2078e4

Please sign in to comment.