## Lustre Metadata Weights

This notebook calculates candidate weighting factors for different Lustre metadata operations, with the aim of establishing a simple metric that captures how much metadata demand different users and projects place on the storage system.  This metric can then inform how best to balance users across MDTs in a DNE-capable Lustre file system.

The general idea is that

$ M = N_{opens} \times C_{opens} + N_{stats} \times C_{stats} + \ldots $

where

* $M$ is a scalar value that ranks how much metadata load a user consumes
* $N_{opens}$ is the number of file opens that a user performs (over a given time, for a specific job, etc.)
* $C_{opens}$ is a weighting factor that describes the relative "cost" of performing a file open in Lustre
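
As a minimal sketch of this weighted sum, the snippet below computes $M$ for two hypothetical users. The user names, operation counts, and weights are made-up illustrative values, not measurements; the actual weights are calibrated later in this notebook.

```python
# Hypothetical weights (C) and per-user operation counts (N); illustrative only
example_weights = {'open': 2.0, 'stat': 1.0, 'unlink': 5.0}
example_counts = {
    'alice': {'open': 1000, 'stat': 50000, 'unlink': 10},
    'bob': {'open': 200, 'stat': 100, 'unlink': 4000},
}

def metadata_load(counts, weights):
    """Return M, the sum over all ops of N_op * C_op."""
    return sum(n * weights.get(op, 0.0) for op, n in counts.items())

example_scores = {
    user: metadata_load(counts, example_weights)
    for user, counts in example_counts.items()
}

# rank users from heaviest to lightest metadata load
for user, score in sorted(example_scores.items(), key=lambda kv: kv[1], reverse=True):
    print("%-8s M = %.1f" % (user, score))
```

With these made-up numbers, a user issuing many cheap operations can still outrank one issuing fewer expensive ones, which is why the choice of weights matters.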

In [None]:
%matplotlib inline

In [None]:
import os
import datetime

import numpy
import pandas # relies on pytables being installed too
import matplotlib.pyplot # binds matplotlib and loads the pyplot submodule used below

import tokio

In [None]:
DATE_START = datetime.datetime(2019, 1, 1, 0, 0, 0)
DATE_END = datetime.datetime(2019, 7, 1, 0, 0, 0)
CACHE_FILE = 'mdrates_%s-%s.hdf5' % (DATE_START.strftime("%Y-%m-%d"), DATE_END.strftime("%Y-%m-%d"))

In [None]:
METADATA_OPS = [
    'open',
    'close',
    'getattr',
    'getxattr',
    'link',
    'mkdir',
    'mknod',
    'rename',
    'rmdir',
    'setattr',
    'statfs',
    'unlink',
]
NUM_OPS = len(METADATA_OPS)

Load the MDS rates for each metadata operation over the period of interest, caching them locally in HDF5 so that subsequent runs can skip the query.

In [None]:
if os.path.isfile(CACHE_FILE):
    mdrates_df = pandas.read_hdf(CACHE_FILE, 'mdrates')
    print("Loaded mdrates (%d bytes) from %s" % (os.path.getsize(CACHE_FILE), CACHE_FILE))
else:
    mdrates = {}
    for opname in METADATA_OPS:
        dataset_name = "/mdtargets/%srates" % opname
        dataframe = tokio.tools.hdf5.get_dataframe_from_time_range(
            fsname='cscratch',
            datetime_start=DATE_START,
            datetime_end=DATE_END,
            dataset_name=dataset_name
        )
        # dataframe contains one column per MDS, but only the first column
        # is populated on cscratch due to Cray LMT being DNE-incapable
        mdrates[opname] = dataframe.iloc[:, 0]
        print("Loaded %s for %s - %s" % (dataset_name, DATE_START, DATE_END))
        
    mdrates_df = pandas.DataFrame.from_dict(mdrates)
    mdrates_df.to_hdf(CACHE_FILE, 'mdrates', complevel=9)
    print("Wrote mdrates (%d bytes) to %s" % (os.path.getsize(CACHE_FILE), CACHE_FILE))

In [None]:
print("Observed metadata rates (ops/sec) for each op tracked:\n")
mdrates_df.head()

In [None]:
cpuload_file = CACHE_FILE.replace('mdrates', 'mdcpu')
if os.path.isfile(cpuload_file):
    cpuload_df = pandas.read_hdf(cpuload_file, 'mdcpu')
    print("Loaded mdcpu (%d bytes) from %s" % (os.path.getsize(cpuload_file), cpuload_file))
else:
    # as with the rates above, keep only the first MDS column (the result is a Series)
    cpuload_df = tokio.tools.hdf5.get_dataframe_from_time_range(
        fsname='cscratch',
        datetime_start=DATE_START,
        datetime_end=DATE_END,
        dataset_name='mdservers/cpuload'
    ).iloc[:, 0]
    cpuload_df.to_hdf(cpuload_file, 'mdcpu', complevel=9)
    print("Wrote mdcpu (%d bytes) to %s" % (os.path.getsize(cpuload_file), cpuload_file))

Print summary statistics (count, mean, standard deviation, min, quartiles, max) for each metadata operation.

In [None]:
for idx, opname in enumerate(METADATA_OPS):
    summary = mdrates_df[opname].describe()

    if idx == 0:
        print_str = "%8s " % ""
        for metric in summary.index:
            print_str += "%8s " % metric
        print(print_str)
        print("=" * len(print_str))

    print_str = "%8s " % opname
    for metric in summary.index:
        print_str += "%8d " % summary[metric]
    print(print_str)

In [None]:
# logarithmic bin edges for the histogram below: 0, 1, 10, 100, ..., 10^10
bins = [0, 1]
while bins[-1] < 1000000000:
    bins.append(bins[-1] * 10)
bins.append(bins[-1] * 10)

## Visualize the distributions of metadata operations

Create a histogram showing how many samples of each metadata operation fall into each order-of-magnitude rate bin, i.e., how often each operation is being completed at high and low rates.

In [None]:
fig, ax = matplotlib.pyplot.subplots(figsize=(12, 4))

xticklabels = []
for tickval in bins[1:]:
    if tickval >= 1000000000:
        xticklabels.append("%d B" % (tickval / 1000000000))
    elif tickval >= 1000000:
        xticklabels.append("%d M" % (tickval / 1000000))
    elif tickval >= 1000:
        xticklabels.append("%d K" % (tickval / 1000))
    else:
        xticklabels.append(str(tickval))

for idx, opname in enumerate(METADATA_OPS):         
    counts, _ = numpy.histogram(mdrates_df[opname].dropna(), bins=bins)
    x = range(len(counts))
    ax.bar([x_ + 0.9 / NUM_OPS * idx for x_ in x], counts, width=0.9 / NUM_OPS, label=opname)

ax.set_xticks(x)
ax.set_xticklabels(xticklabels)
ax.tick_params(axis='x', rotation=45)
ax.set_ylabel("Number of samples")
ax.set_xlabel("Op rate (Hz)")
ax.yaxis.grid()
ax.set_yscale('log')
ax.set_axisbelow(True)
ax.legend()

In [None]:
print("Cumulative distribution of total metadata operations:\n")
(mdrates_df.sum().sort_values(ascending=False) / mdrates_df.sum().sum()).cumsum()

## Calibrate weighting factors

The `quantile` parameter below defines what counts as the MDS's peak capability for a given metadata operation.  Add more nines to get closer to the absolute maximum rate observed.

The reason it is not set to the max is to avoid any spuriously high measurements that may have been made during the measurement period.

Fiddling with the `quantile` parameter and observing its effect on the correlation between the resulting load metric and the MDS CPU load is a reasonable way to arrive at a good set of weights.  The underlying premise is that the CPU load on the MDS is directly tied to metadata load, and that the load metric and the CPU load are reasonable indicators of overall MDS stress.

In [None]:
quantile = 0.9999

quantiles = {}
weights = {}
min_weight = 1.0

total_md = mdrates_df.sum().sum()

for opname in METADATA_OPS:
    quantiles[opname] = mdrates_df[opname].quantile(q=quantile)
    
    weights[opname] = 1.0 / quantiles[opname]
    
    pct_mdload = mdrates_df[opname].sum() / total_md
    if pct_mdload < 0.001:
        weights[opname] = 0.0
        print("Dropping %s due to insufficient observations" % opname)
    else:
        min_weight = min(weights[opname], min_weight)

print()
        
# normalize so that the smallest retained weight becomes 1.0
for opname in weights:
    weights[opname] /= min_weight
    
for opname in METADATA_OPS:
    print("%.4f quantile: %6d %-13s (weight=%6.1f)" % (quantile, quantiles[opname], "%ss/sec" % opname, weights[opname]))
    

print()


load_scores = mdrates_df.dot([weights[x] for x in mdrates_df.columns])

# calculate correlation between cpuload and weighted sum
print("Correlation coefficient between load score and MDS cpu load: %.4f" % 
    cpuload_df.corr(load_scores))
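
To make the quantile tuning described above less manual, the calibration can be wrapped in a function and swept over several quantiles. The following is a minimal sketch on synthetic data: `calibrate_weights`, `demo_rates`, and `demo_cpu` are hypothetical names introduced here for illustration, and the synthetic CPU load simply assumes that opens cost five times as much as getattrs. It also normalizes by the smallest nonzero weight, which can differ slightly from the cell above when every weight exceeds 1.

```python
import numpy
import pandas

def calibrate_weights(rates, quantile, min_fraction=0.001):
    """Weight each op by the inverse of its high-quantile rate.

    Ops contributing less than min_fraction of all operations get weight 0.
    The remaining weights are scaled so the smallest one equals 1.
    """
    total = rates.sum().sum()
    weights = {}
    for opname in rates.columns:
        peak = rates[opname].quantile(q=quantile)
        if peak <= 0 or rates[opname].sum() / total < min_fraction:
            weights[opname] = 0.0
        else:
            weights[opname] = 1.0 / peak
    nonzero = [w for w in weights.values() if w > 0]
    scale = min(nonzero) if nonzero else 1.0
    return {op: w / scale for op, w in weights.items()}

# Synthetic stand-ins for mdrates_df and cpuload_df (assumed data, not measurements)
rng = numpy.random.RandomState(0)
index = pandas.date_range('2019-01-01', periods=2000, freq=pandas.Timedelta(seconds=5))
demo_rates = pandas.DataFrame({
    'open': rng.exponential(1000.0, size=len(index)),
    'getattr': rng.exponential(5000.0, size=len(index)),
}, index=index)
demo_cpu = pandas.Series(
    0.005 * demo_rates['open'] + 0.001 * demo_rates['getattr']
    + rng.normal(0.0, 0.5, size=len(index)),
    index=index)

# Sweep the quantile and record the correlation between load score and CPU load
sweep = {}
for q in (0.9, 0.99, 0.999, 0.9999):
    demo_weights = calibrate_weights(demo_rates, q)
    demo_scores = demo_rates.dot(pandas.Series(demo_weights))
    sweep[q] = demo_cpu.corr(demo_scores)
    print("quantile=%.4f correlation=%.4f" % (q, sweep[q]))
```

On real data, `demo_rates` and `demo_cpu` would be replaced by `mdrates_df` and `cpuload_df`, and the quantile with the highest correlation would be a reasonable starting point.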

Plot the ratio of CPU load to load score.  Ideally this would be a constant over time (a flat line).

In [None]:
(cpuload_df.resample('1D').sum() / load_scores.resample('1D').sum()).plot(figsize=(12, 3))
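
One way to put a number on how flat this line is would be the coefficient of variation (standard deviation divided by mean) of the daily ratio, where values near zero indicate a nearly constant ratio. The sketch below uses synthetic series; `ratio_flatness` and the `flat_*`/`noisy_*` names are hypothetical and exist only for illustration.

```python
import numpy
import pandas

def ratio_flatness(cpu, scores, freq='1D'):
    """Return the coefficient of variation of the resampled cpu/score ratio."""
    ratio = cpu.resample(freq).sum() / scores.resample(freq).sum()
    # discard intervals in which the load score summed to zero
    ratio = ratio.replace([numpy.inf, -numpy.inf], numpy.nan).dropna()
    return ratio.std() / ratio.mean()

# Synthetic hourly series spanning ten days (assumed data, not measurements)
rng = numpy.random.RandomState(1)
idx = pandas.date_range('2019-01-01', periods=240, freq=pandas.Timedelta(hours=1))
flat_scores = pandas.Series(rng.uniform(100.0, 200.0, size=len(idx)), index=idx)
flat_cpu = 0.01 * flat_scores                                  # exactly proportional
noisy_cpu = flat_cpu * rng.uniform(0.5, 1.5, size=len(idx))    # proportionality broken

cv_flat = ratio_flatness(flat_cpu, flat_scores)
cv_noisy = ratio_flatness(noisy_cpu, flat_scores)
print("flat: %.3g  noisy: %.3g" % (cv_flat, cv_noisy))
```

Applied to the real data, this would be called as `ratio_flatness(cpuload_df, load_scores)`, and a smaller value would suggest a better-calibrated set of weights.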