## Sets of features used in the papers

### Machine Learning Methods for Predicting Failures in Hard Drives: A Multiple-Instance Application (University of California dataset)

The 25 attributes in the set are: GList1, PList, Servo1, Servo2, Servo3, Servo5, ReadError1, ReadError2, ReadError3, FlyHeight5, FlyHeight6, FlyHeight7, FlyHeight8, FlyHeight9, FlyHeight10, FlyHeight11, FlyHeight12, ReadError18, ReadError19, Servo7, Servo8, ReadError20, GList2, GList3, Servo10.

Single-attribute rank-sum tests were run on all 25 attributes selected in Section 3.3, with 15 samples per pattern. Of these 25, only 8 attributes (Figure 9) detected failures at sufficiently low false-alarm rates: ReadError1, ReadError2, ReadError3, ReadError18, ReadError19, Servo7, GList3 and Servo10. Confirming the observations of the feature selection process, ReadError18 was the best attribute, with 27.6% detection at 0.06% false alarms.
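
A single-attribute rank-sum test compares a drive's most recent samples of one attribute with a reference set of samples from good drives. The sketch below is a minimal illustration, assuming the Wilcoxon rank-sum statistic with the large-sample normal approximation, average ranks for ties, and a one-sided test in which larger attribute values indicate degradation. The function name, the sample values and the threshold are hypothetical, not taken from the paper.

```python
import math


def rank_sum_z(test, reference):
    """Wilcoxon rank-sum z-score of `test` samples against `reference` samples.

    Tied values share the average of the ranks they span; the variance uses
    the plain (no tie correction) formula, a simplifying assumption.
    """
    values = sorted(list(test) + list(reference))
    avg_rank = {}
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        avg_rank[values[i]] = (i + j + 2) / 2  # 1-based ranks i+1 .. j+1
        i = j + 1
    n, m = len(test), len(reference)
    w = sum(avg_rank[v] for v in test)  # rank sum of the test samples
    mean = n * (n + m + 1) / 2
    std = math.sqrt(n * m * (n + m + 1) / 12)
    return (w - mean) / std


# Hypothetical data: a good-drive reference set and 15 recent samples from one drive.
reference = [0, 0, 1, 0, 2, 1, 0, 0, 1, 0, 0, 3, 0, 1, 0, 0, 2, 0, 1, 0]
recent = [4, 5, 3, 6, 7, 4, 5, 8, 6, 5, 4, 7, 9, 6, 5]
THRESHOLD = 3.0  # hypothetical; in practice tuned to a target false-alarm rate
z = rank_sum_z(recent, reference)
flagged = z > THRESHOLD  # flag the drive when its recent values rank unusually high
```

Raising the threshold lowers the false-alarm rate at the cost of detection, which is how an operating point such as 0.06% false alarms would be chosen.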

Using combinations of attributes in the rank-sum test can improve on single-attribute classifiers (Figure 11). The best single attributes from Figure 9 were ReadError1, ReadError3, ReadError18 and ReadError19. Using these four attributes and 15 samples per pattern, the rank-sum test detected 28.1% of the failures with no measured false alarms. Higher detection rates (52.8%) can be achieved if more false alarms (0.7%) are allowed.
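
One simple way to combine attributes, assumed here rather than taken from the paper, is to run the single-attribute test on each attribute and flag a drive if any of them fires (an OR rule). Sweeping the threshold then trades detection rate against false-alarm rate, the trade-off described above. The per-drive scores below are made-up z-scores, and the function names are hypothetical.

```python
def flag_drive(scores, threshold):
    """Flag a drive if any attribute's score exceeds the threshold (OR rule)."""
    return any(s > threshold for s in scores.values())


def detection_and_false_alarm(drives, threshold):
    """Return (detection rate, false-alarm rate) over labelled drives.

    `drives` is a list of (scores_by_attribute, failed) pairs.
    """
    failed = [s for s, is_failed in drives if is_failed]
    good = [s for s, is_failed in drives if not is_failed]
    detected = sum(flag_drive(s, threshold) for s in failed)
    false_alarms = sum(flag_drive(s, threshold) for s in good)
    return detected / len(failed), false_alarms / len(good)


# Hypothetical per-attribute scores for 3 failed and 4 good drives.
drives = [
    ({"ReadError1": 4.2, "ReadError3": 0.5, "ReadError18": 1.0, "ReadError19": 0.2}, True),
    ({"ReadError1": 0.3, "ReadError3": 0.1, "ReadError18": 2.6, "ReadError19": 0.4}, True),
    ({"ReadError1": 0.1, "ReadError3": 0.2, "ReadError18": 0.3, "ReadError19": 0.5}, True),
    ({"ReadError1": 0.2, "ReadError3": 0.1, "ReadError18": 0.0, "ReadError19": 0.1}, False),
    ({"ReadError1": 2.8, "ReadError3": 0.3, "ReadError18": 0.2, "ReadError19": 0.1}, False),
    ({"ReadError1": 0.4, "ReadError3": 0.6, "ReadError18": 0.1, "ReadError19": 0.3}, False),
    ({"ReadError1": 0.0, "ReadError3": 0.2, "ReadError18": 0.4, "ReadError19": 0.2}, False),
]

for t in (3.0, 2.5):  # a lower threshold detects more failures but raises false alarms
    print(t, detection_and_false_alarm(drives, t))
```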


### Bayesian Approaches to Failure Prediction for Disk Drives (Quantum dataset)

|Abbreviation | Description|
|---|---|
| RET | read error rate |
| SUT | spinup time |
| CSS | start-stop count |
| GDC | grown defects count |
| SKE | seek errors count |
| POH | power-on hours |
| RRT | calibration retries |
| PCC | power cycles count |
| RSE | read soft errors count |
| DMC | CRC errors count |
| OSS | offline surface scan |


### Health Monitoring of Hard Disk Drive Based on Mahalanobis Distance


**Most prior works either used all of the data without selection or selected the data with the maximum distance between healthy and failed drives.** This is effectively a supervised methodology: it relies on prior knowledge of each drive's health status.

#### Applying the FMMEA (failure modes, mechanisms, and effects analysis) method


1. Potential failure mechanisms are determined from the available mechanisms corresponding to the physical, electrical, chemical and mechanical stresses that can induce failure. It was found that 60% of drive failures are mechanical, often resulting from the gradual degradation of the drive's performance. In HDDs, these mechanisms are:
   - Head disk interface (HDI, including the head and disk, also known as the air bearing): cracked head, broken head, head contamination, bad connection to the electronics module; disk scratches, defects, bad servo pattern, flying height variation and modulation.
   - Head stack assembly: off-track, deformation.
   - Motors/bearings: motor failure, worn bearing, excessive runout, no spin.
   - Electronic module: circuit/chip failure, bad connection to the drive or bus.

2. Prioritization of potential failure mechanisms:
   - The head disk interface is the dominant contributor to HDD reliability.
   - Wear-out and overstress of the magnetic head and disk, and resonance of the head assembly, are categorized as high-risk potential failure mechanisms.
   - Failure modes of the spindle motor and control board have low priority.


**Correspondence between attributes and failure mechanisms**:
- Head flying height, data throughput performance, read/write errors, reallocated sector count and drive calibration retries count can be recognized as HDI failure indicators.
- Seek error rate and seek time performance can mainly be attributed to head assembly issues.
- The servo error count is a special case: it can be induced by the failure of any component in the servo loop. Changes in spin-up time and increases in drive temperature can reflect problems with the spindle motor.
- Notably, a study published by Google suggested **very little correlation between failure rates and high temperature**, based on data from 100,000 drives [40].


**In this study, the HDI and head assembly performance attributes are selected as candidates to assess the health of the HDD.**
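
The attribute-to-mechanism correspondence above can be written down as a lookup table, and the candidate set for this study then follows by keeping only the HDI and head assembly groups. This is a minimal sketch: the dictionary keys and group labels are hypothetical names transcribed from the list above, not identifiers from the paper.

```python
# Attribute -> failure-mechanism group, transcribed from the correspondence list.
ATTRIBUTE_MECHANISM = {
    "Head flying height": "HDI",
    "Data throughput performance": "HDI",
    "Read/write errors": "HDI",
    "Reallocated sector count": "HDI",
    "Drive calibration retries count": "HDI",
    "Seek error rate": "Head assembly",
    "Seek time performance": "Head assembly",
    "Servo error count": "Servo loop (any component)",
    "Spin-up time": "Spindle motor",
    "Drive temperature": "Spindle motor",
}


def select_attributes(mapping, groups):
    """Return attributes whose mechanism group is in `groups`, in insertion order."""
    return [attr for attr, group in mapping.items() if group in groups]


# Keep the HDI and head assembly indicators, as this study does.
candidates = select_attributes(ATTRIBUTE_MECHANISM, {"HDI", "Head assembly"})
```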


Typical SMART attributes and their characteristics are:
- Head flying height -- the distance between the read/write head of a hard disk drive and the platter. Fly height variation can leave the media insufficiently magnetized, so that the data are not readable. Physical bumps or knocks while the HDD is reading or writing make the head vibrate strongly, which can induce read/write failures.
- Data throughput performance -- general throughput performance of the hard disk. Indicates problems with the motor, servo or bearings.
- Spin up time -- average time (in milliseconds or seconds) for the spindle to spin up from zero RPM (revolutions per minute) to fully operational. A low (normalized) value means the drive takes too long to reach a fully operational state.
- Re-allocated sector count -- the number of sectors that the hard drive has marked as reallocated after an error. A growing count is generally considered a bad sign and can precede hard drive failure.
- Seek error rate -- rate of positioning errors of the read/write heads. Indicates problems with the servo or heads; high temperature can also cause this problem.
- Seek time performance -- the average performance of seek operations of the hard disk's magnetic heads.
- Spin retry count -- retry count of spin start attempts. Indicates problems with the motor, bearings or power supply.
- Drive calibration retries count -- number of attempts to calibrate the drive. Indicates problems with the motor, bearings or power supply.
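
The method in this section's title scores a drive by its Mahalanobis distance from a healthy baseline. Below is a minimal pure-Python sketch, assuming the baseline mean and sample covariance are estimated from healthy-drive samples of the selected attributes and that a larger distance means poorer health. The attribute values are hypothetical, and the paper's own normalization, scaling and threshold may differ.

```python
import math


def mean_vector(rows):
    """Column means of a list of equal-length sample vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1)."""
    mu = mean_vector(rows)
    n, d = len(rows), len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, d):
            f = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * d
    for r in range(d - 1, -1, -1):  # back substitution
        x[r] = (m[r][d] - sum(m[r][c] * x[c] for c in range(r + 1, d))) / m[r][r]
    return x


def mahalanobis(x, mu, cov):
    """sqrt((x - mu)^T cov^{-1} (x - mu)), computed without an explicit inverse."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return math.sqrt(sum(di * yi for di, yi in zip(diff, solve(cov, diff))))


# Hypothetical healthy baseline: [flying-height deviation, seek error rate] per sample.
healthy = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
mu = mean_vector(healthy)
cov = covariance_matrix(healthy)
score = mahalanobis([3.0, 1.0], mu, cov)  # distance of a new reading from the baseline
```

Solving the linear system instead of inverting the covariance matrix is a standard numerical choice; the result is the same distance.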



Here is the full [list](https://hdsentinel.com/smart/smartattr.php) of SMART parameter descriptions.

### HMM, HSMM

We then ran our HMM and HSMM predictors and found that four attributes provided good failure detection: ReadError18, Servo2, Servo10 and FlyHeight7.
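
HMM-based predictors commonly score an observation sequence with the forward algorithm and compare its likelihood under a model of good drives with its likelihood under a model of failing drives. The sketch below assumes discrete (quantized) attribute observations and uses two hypothetical, hand-set models rather than trained ones. It does not model the explicit state durations that distinguish an HSMM.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """log P(obs | model) via the scaled forward algorithm.

    start[i]: initial probability of state i
    trans[i][j]: P(next state j | current state i)
    emit[i][k]: P(symbol k | state i)
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(1, len(obs)):
        scale = sum(alpha)  # rescale each step to avoid numerical underflow
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                 for j in range(n)]
    return log_lik + math.log(sum(alpha))


# Hypothetical models over two quantized symbols: 0 = low errors, 1 = high errors.
GOOD = {"start": [0.9, 0.1],
        "trans": [[0.95, 0.05], [0.2, 0.8]],
        "emit": [[0.9, 0.1], [0.5, 0.5]]}
FAILING = {"start": [0.5, 0.5],
           "trans": [[0.7, 0.3], [0.05, 0.95]],
           "emit": [[0.8, 0.2], [0.1, 0.9]]}


def classify(obs):
    """Label a sequence by whichever model assigns it the higher likelihood."""
    good = forward_log_likelihood(obs, **GOOD)
    bad = forward_log_likelihood(obs, **FAILING)
    return "failing" if bad > good else "good"
```

For example, `classify([1, 1, 1, 1])` compares a run of high-error observations against both models.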

### Autoencoders (Backblaze dataset)

SMART attributes used in the experiments:

| SMART ID | Attribute Name |
| --- | --- |
| 1 | Read Error Rate |
| 3 | Spin-Up Time |
| 4 | Start/Stop Count |
| 5 | Reallocated Sectors Count |
| 7 | Seek Error Rate |
| 9 | Power-On Hours |
| 10 | Spin Retry Count |
| 12 | Power Cycle Count |
| 183 | SATA Downshift Error Count |
| 184 | End-to-End Error / IOEDC |
| 187 | Reported Uncorrectable Errors |
| 188 | Command Timeout |
| 189 | High Fly Writes |
| 190 | Temperature Difference |
| 191 | G-Sense Error Rate |
| 192 | Unsafe Shutdown Count |
| 193 | Load Cycle Count |
| 194 | Temperature |
| 197 | Current Pending Sector Count |
| 198 | Uncorrectable Sector Count |
| 199 | UltraDMA CRC Error Count |
| 240 | Head Flying Hours |
| 241 | Total LBAs Written |
| 242 | Total LBAs Read |



For the PCA method, we performed the transformation and selected the eigenvectors whose features preserve 90% of the variance, resulting in 8 features out of the 24 described in Table I (the attributes listed above). For the autoencoders, we trained a neural network with hidden layers of sizes 15-8-15 and an output layer of size 24 (the number of input dimensions), using the ReLU activation function and backpropagation with L2 regularization.
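
The two model sizes above can be checked with a little arithmetic. The sketch below shows the usual 90%-variance rule for choosing the number of principal components from the covariance eigenvalues, and counts the weights and biases of the 24-15-8-15-24 autoencoder, assuming standard fully connected layers with one bias per unit. The eigenvalues in the usage line are made up, not the Backblaze values, and the function names are hypothetical.

```python
def components_for_variance(eigenvalues, target=0.90):
    """Smallest k whose top-k eigenvalues explain at least `target` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(ordered)


def dense_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(a * b + b for a, b in zip(layer_sizes, layer_sizes[1:]))


# 24 inputs -> hidden 15-8-15 -> 24 outputs, as described above.
AUTOENCODER_LAYERS = [24, 15, 8, 15, 24]

k = components_for_variance([0.3, 7.0, 0.2, 2.5])  # made-up eigenvalues
n_params = dense_parameter_count(AUTOENCODER_LAYERS)
```

With these made-up eigenvalues the top two components already explain 95% of the variance, so `k` is 2; the autoencoder has 375 + 128 + 135 + 384 = 1022 trainable parameters under the stated assumption.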
