Skip to content

o2locktop - the Python scripts are used to display DLM lock usages in a OCFS2 cluster.

License

Notifications You must be signed in to change notification settings

ganghe/o2locktop

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

o2locktop - a top-like OCFS2 DLM lock monitor

Introduction

o2locktop is a top-like tool to monitor OCFS2 DLM lock usage in the cluster, and can be used to detect hot files/directories, which intensively acquire DLM locks.

The average/maximal wait time for DLM lock acquisitions likely gives hints to the administrator when concern about OCFS2 performance, for example,

  • if the workload is unbalanced among nodes.
  • if a file is too hot, then maybe need check the related applications above.
  • if a directory is too hot, then maybe split it to smaller with less number of files underneath.

For slightly more implementation details,

  • As a shared disk cluster file system, OCFS2 files and directies can be accessed from the different nodes simultaneously. To protect the data consistency, the file access is coordinated through Distributed Lock Manager(DLM). For example, "Meta DLM lock" is used to protect file(per inode) meta data change. "Write DLM lock" is used to protect file data write. "Open DLM lock" could be used for one node keeps accessing a opened file while other processes (or even from other nodes) might delete it, and eventually get deleted once all associated file descriptors are close. For more information about how OCFS2 works with DLM, please check OCFS2 Project web page.

  • o2locktop reads OCFS2 kernel debugfs statistics under /sys/kernel/debug/. That says, for all cluster nodes, OCFS2_FS_STATS kernel config option must be set(enabled). To check it out:

grep OCFS2_FS_STATS < /boot/config-`uname -r`

Installation

Note: o2locktop is Python 2 and Python 3 compatible.

  sudo zypper install <http_rpm_uri>
  or
  sudo rpm -ivh <o2locktop-1.0.0...noarch.rpm>
  • Python pip:
  sudo pip install o2locktop
  • Or, directly use o2locktop from the source code tree:
  git clone https://github.com/ganghe/o2locktop.git
  cd o2locktop 
  ~/o2locktop> python o2locktop -h

Usage

  • Check o2locktop --help in details, also availble in the below REFERENCE

  • Or, check the asciidemo here

  • Known limitations

    o2locktop can't display the file name of the inode number. The additional step is needed to translate the inode number to the file name.

  find <YOUR_OCFS2_MOUNT_POINT> -inum <INODE_NUMBER>

TODO

  • Replay o2locktop log file.
  • In SUSE-based ocfs2 cluster, o2locktop can run without any arguments.
  • unittest

Community

REFERENCE

usage: o2locktop [-h] [-n NODE_IP] [-o LOG_FILE] [-l DISPLAY_LENGTH] [-V] [-d]
                 [MOUNT_POINT]

It is a top-like tool to monitor OCFS2 DLM lock usage in the cluster, and can
be used to detect hot files/directories, which intensively acquire DLM locks.

positional arguments:
  MOUNT_POINT        OCFS2 mount point, e.g. /mnt/shared

optional arguments:
  -h, --help         show this help message and exit
  -n NODE_IP         OCFS2 node IP address for ssh
  -o LOG_FILE        log path
  -l DISPLAY_LENGTH  number of lock records to display
  -V, --version      print the current version of o2locktop and exit
  -d, --debug        show all the inode including the system inode number

The average/maximal wait time for DLM lock acquisitions likely gives hints to
the administrator when concern about OCFS2 performance, for example,
- if the workload is unbalanced among nodes.
- if a file is too hot, then maybe need check the related applications above.
- if a directory is too hot, then maybe split it to smaller with less number
  of files underneath.

OUTPUT ANNOTATION:
  - The output is refreshed every 5 seconds, and sorted by the sum of 
    DLM EX(exclusive) and PR(protected read) lock average wait time
  - One row, one inode (including the system meta files if with '-d' argument)
  - Columns:
    "TYPE" is DLM lock types,
      'M' -> Meta data lock for the inode
      'W' -> Write lock for the inode
      'O' -> Open lock for the inode

    "INO" is the inode number of the file

    "EX NUM" is the number of EX lock acquisitions
    "EX TIME" is the maximal wait time to get EX lock
    "EX AVG" is the average wait time to get EX lock

    "PR NUM" is the number of PR(read) lock acquisitions
    "PR TIME" is the maximal wait time to get PR lock
    "PR AVG" is the average wait time to get PR lock

SHORTCUTS:
  - Type "d" to display DLM lock statistics for each node
  - Type "Ctrl+C" or "q" to exit o2locktop process

PREREQUISITES:
  o2locktop reads OCFS2_FS_STATS statistics from /sys/kernel/debug/. That says,
  for all cluster nodes, the kernel option must be set(enabled). Check it out:
      grep OCFS2_FS_STATS < /boot/config-\`uname -r\`

  o2locktop uses the passwordless SSH to OCFS2 nodes as root. Set it up if not:
      ssh-keygen; ssh-copy-id root@node1

EXAMPLES:
  - At any machine within or outside of the cluster:

    o2locktop -n node1 -n node2 -n node3 /mnt/shared

    To find the absolute path according to the inode number:
    find <MOUNT_POINT> -inum <INO>

About

o2locktop - the Python scripts are used to display DLM lock usages in a OCFS2 cluster.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.9%
  • Shell 0.1%