
Reliability

manny kung edited this page Nov 17, 2021 · 7 revisions


Revised: 15 Nov 2021

Beginning with r4133, we have implemented a new reliability model that supplements how maintenance and malfunctions are handled in the simulation for each item resource (aka part).

The three basic metrics of reliability:

Metric                               Unit
Mean Time Between Failures (MTBF)    hours (Earth)
Failure Rate                         %
Reliability                          %

By definition, MTBF is an industry-standard metric that gives the average time between failures of a component or part.

The MTBF of a given part or component is also an adjusted value of the inverse of its failure rate.
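As a point of reference, the textbook constant-failure-rate (exponential) model ties these metrics together as MTBF = total operating hours / number of failures, failure rate = 1 / MTBF, and reliability over a period t as R(t) = exp(-t / MTBF). The sketch below assumes that model with the failure rate expressed per hour; mars-sim applies its own adjustment on top of the plain inverse, which is not reproduced here, and the class and method names are hypothetical.

```java
/**
 * Hypothetical helper illustrating the textbook exponential reliability model.
 * This is NOT mars-sim's implementation: mars-sim adjusts the plain inverse.
 */
public final class ReliabilityMath {

    private ReliabilityMath() { }

    /** MTBF in hours: total operating time divided by the number of failures. */
    public static double mtbf(double operatingHours, int failures) {
        if (failures <= 0) {
            return Double.POSITIVE_INFINITY; // no failure observed yet
        }
        return operatingHours / failures;
    }

    /** Failure rate per hour, the plain (unadjusted) inverse of MTBF. */
    public static double failureRatePerHour(double mtbfHours) {
        return 1.0 / mtbfHours; // an infinite MTBF yields a rate of 0
    }

    /** Probability of running t hours without failure: R(t) = exp(-t / MTBF). */
    public static double reliability(double hours, double mtbfHours) {
        return Math.exp(-hours / mtbfHours);
    }

    public static void main(String[] args) {
        double mtbf = mtbf(4000.0, 4);             // 4 failures in 4000 hours
        double lambda = failureRatePerHour(mtbf);  // failures per hour
        double r = reliability(24.66, mtbf);       // window of roughly one sol
        System.out.printf("MTBF = %.1f hr, rate = %.4f /hr, R(1 sol) = %.3f%%%n",
                mtbf, lambda, r * 100.0);
    }
}
```

Under this model a higher failure rate always means a lower MTBF and a lower reliability, which matches the direction of the changes in the log below even though the exact numbers come from mars-sim's adjusted formula.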

At the start of a simulation, each part is assigned a reliability of 99.999% and a standard failure rate of 3%.

For simplicity's sake, for each predefined malfunction we also derive a composite metric, the Probability of Failure, from the three metrics above.

Below is a snapshot of the command-prompt log:

00-Adir-07:662.377 (Warning) Malfunction : Class A Fire - incident #3 - an Emergency repair work order was requested.
00-Adir-07:662.377 (Warning) Malfunction : Class A Fire - incident #3 - a General repair work order was set up.
00-Adir-07:662.377 (Warning) Malfunction : Class A Fire - incident #3 - the repair requires fire extinguisher (quantity: 2).
00-Adir-07:662.377 (Warning) Malfunction : Class A Fire - incident #3 - the repair requires fiberglass cloth (quantity: 2).
00-Adir-07:662.377 (Warning) MalfunctionManager : [ 1.69° N  19.74° E] Dune Runner has Class A Fire as reported by Anaisha Kappor. Cause : Human Factors.
00-Adir-07:662.377 (Warning) MalfunctionManager :   --- Part : fiberglass cloth ---
00-Adir-07:662.377 (Warning) MalfunctionManager :  (1).   Reliability :    99.577 %  -->  99.48 %
00-Adir-07:662.377 (Warning) MalfunctionManager :  (2).  Failure Rate :     0.063 %  -->  0.078 %
00-Adir-07:662.377 (Warning) MalfunctionManager :  (3).          MTBF :   1414.5 hr  -->  1150.5 hr
00-Adir-07:662.377 (Warning) MalfunctionManager :   --- Malfunction : Class A Fire ---
00-Adir-07:662.377 (Warning) MalfunctionManager :  (4).   Probability :       3.0 %  -->  1.539 %
00-Adir-07:662.377 (Warning) MalfunctionManager : [ 1.69° N  19.74° E] Anaisha Kappor had an accident.
00-Adir-07:662.377 (Warning) MalfunctionManager : [ 1.69° N  19.74° E] A Type-I accident occurred in Dune Runner in Dune Runner.
00-Adir-07:663.600 (Info) Malfunction : Air Leak - incident #2 - Emergency repair work initiated by Pari Shan.
00-Adir-07:678.364 (Info) Malfunction : Class A Fire - incident #3 - Emergency repair work initiated by Pari Shan.

https://github.com/mars-sim/mars-sim/blob/46a8e7ebd9a0a64d4c07665829ffe9ab1351960e/mars-sim-core/src/main/java/org/mars_sim/msp/core/malfunction/MalfunctionManager.java#L482-L575
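To follow how these values change over a run, the before/after pairs can be extracted from log lines like the ones above. The sketch below is a hypothetical helper (not part of mars-sim) that assumes the exact "(n). Label : old unit --> new unit" layout shown in this snapshot; the log format may differ between builds. It uses a Java record, so it needs Java 16 or later.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for "(n). Label : old unit --> new unit" log lines. */
public final class ReliabilityLogParser {

    /** One metric change, e.g. Reliability 99.577 % --> 99.48 %. */
    public record Change(String metric, double before, double after, String unit) { }

    // Assumes the layout shown in the log snapshot; adjust if the format changes.
    private static final Pattern CHANGE = Pattern.compile(
            "\\(\\d+\\)\\.\\s+([A-Za-z ]+?)\\s*:\\s*([0-9.]+)\\s*(%|hr)\\s*-->\\s*([0-9.]+)\\s*(%|hr)");

    /** Returns the parsed change, or empty if the line is not a metric-change line. */
    public static Optional<Change> parse(String line) {
        Matcher m = CHANGE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Change(
                m.group(1).trim(),
                Double.parseDouble(m.group(2)),
                Double.parseDouble(m.group(4)),
                m.group(3)));
    }

    public static void main(String[] args) {
        String line = "00-Adir-07:662.377 (Warning) MalfunctionManager :  "
                + "(3).          MTBF :   1414.5 hr  -->  1150.5 hr";
        parse(line).ifPresent(c -> System.out.println(
                c.metric() + ": " + c.before() + " -> " + c.after() + " " + c.unit()));
    }
}
```

Lines such as "(Info) Malfunction : Air Leak ..." carry no "-->" pair and simply return an empty Optional, so the parser can be run over a whole log file line by line.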

When an incident or malfunction occurs on a part, these three metrics (as well as the probability of failure of that malfunction) are recomputed and updated dynamically.

The MalfunctionManager class is responsible for performing these updates whenever a malfunction occurs.

Conversely, on each sol in which no malfunction occurs, a part's reliability and MTBF improve.

Note: regardless of whether a malfunction occurs, the reliability data is recomputed daily.
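The two update paths described above, an immediate recomputation when a malfunction hits a part and a daily recomputation otherwise, can be sketched with a small per-part tracker. This is a hypothetical illustration built on the textbook model (MTBF = operating hours / failures), not the logic in MalfunctionManager; the class name, its fields, the assumption that a part operates for the whole sol, and the choice of one sol as the reliability window are all assumptions.

```java
/**
 * Hypothetical per-part reliability tracker (NOT mars-sim's MalfunctionManager).
 * Assumes MTBF = operating hours / failures and R = exp(-one sol / MTBF).
 */
public final class PartReliabilityTracker {

    /** Length of a sol in Earth hours (24 h 39 min 35 s). */
    public static final double HOURS_PER_SOL = 24.6597;

    private double operatingHours;
    private int failures;
    private double mtbf = Double.POSITIVE_INFINITY;
    private double failureRate;        // failures per hour
    private double reliability = 1.0;  // probability of a failure-free sol

    /** Called when a malfunction hits this part: recompute immediately. */
    public void recordMalfunction() {
        failures++;
        recompute();
    }

    /** Called once per sol, whether or not a malfunction occurred. */
    public void endOfSol() {
        operatingHours += HOURS_PER_SOL; // assumes the part runs the whole sol
        recompute();
    }

    private void recompute() {
        if (failures == 0) {
            mtbf = Double.POSITIVE_INFINITY; // no failure observed yet
            failureRate = 0.0;
            reliability = 1.0;
            return;
        }
        mtbf = operatingHours / failures;    // grows on every failure-free sol
        failureRate = 1.0 / mtbf;
        reliability = Math.exp(-HOURS_PER_SOL / mtbf);
    }

    public double mtbf() { return mtbf; }
    public double failureRate() { return failureRate; }
    public double reliability() { return reliability; }

    public static void main(String[] args) {
        PartReliabilityTracker part = new PartReliabilityTracker();
        for (int sol = 1; sol <= 10; sol++) {
            part.endOfSol();
        }
        part.recordMalfunction();   // MTBF drops from infinity to 10 sols' worth of hours
        double afterFailure = part.mtbf();
        for (int sol = 11; sol <= 20; sol++) {
            part.endOfSol();        // MTBF grows on each sol without a malfunction
        }
        System.out.printf("MTBF after failure: %.1f hr, after 10 quiet sols: %.1f hr%n",
                afterFailure, part.mtbf());
    }
}
```

Keeping the recomputation in one private method means both the malfunction path and the daily path always leave the three metrics mutually consistent.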

https://github.com/mars-sim/mars-sim/blob/46a8e7ebd9a0a64d4c07665829ffe9ab1351960e/mars-sim-core/src/main/java/org/mars_sim/msp/core/malfunction/MalfunctionFactory.java#L34-L71

The MalfunctionFactory class is responsible for storing the reliability, failure rate, and MTBF of each part.

See https://github.com/mars-sim/mars-sim/issues/507 for the work on adding the ability to display the reliability data of each part.
