## A Closer Look at the Machine Shop Model
A few key observations about this workshop:
 
- All machines work independently of each other.
- Machines have two processes: *working* and *break_machine*.
  * The machine repeats the *working* loop until it is interrupted by *break_machine*, and returns to *working* once the interruption has been **processed**. So each machine has **one and only one** of the two processes active at any time during the simulation.
  * The downtime of a machine for each *interrupt* is the wait time for the repairman plus the time to repair.
- The repairman has two states/processes: *other_job* and *repair*.
  * Like the machine, it is set up to have **one and only one** of the two processes active at any time during the simulation.
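
The downtime rule above (wait time for the repairman plus time to repair) can be illustrated without SimPy. The following is a minimal sketch, assuming a single repairman who serves failures first come, first served with a fixed repair time. The function name `downtime_per_failure` and the failure times are hypothetical and chosen only for illustration.

```python
def downtime_per_failure(fail_times, repair_time):
    """Return (fail_time, wait, downtime) for each failure.

    Assumes one repairman, first come first served, and a fixed repair time.
    """
    repairman_free_at = 0.0
    rows = []
    for fail_time in sorted(fail_times):
        repair_start = max(fail_time, repairman_free_at)
        wait = repair_start - fail_time          # time spent queuing
        repairman_free_at = repair_start + repair_time
        rows.append((fail_time, wait, wait + repair_time))
    return rows


# Hypothetical failure times in minutes; the second failure has to queue.
rows = downtime_per_failure([100.0, 250.0, 900.0], repair_time=300.0)
for fail_time, wait, downtime in rows:
    print("fail at %d: wait %d, downtime %d" % (fail_time, wait, downtime))
```

The second failure arrives while the repairman is still busy, so its downtime is longer than the bare repair time.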
  
This is already very close to an RBD simulation, but we still need a few tweaks.
- Typically with RBD we are more concerned with up/down time. The *done_in* loop is used to generate the parts count; we can modify it to keep a log of uptime instead. We don't need the loop counter, just a log of the time from each start to the next interrupt.
- All machines are independent and stand-alone; there is no "system". The *repairman* is required for any failure, and he works on *other_jobs* unless there is a component failure. We can use *other_jobs* to track the component states and generate system states. Again, *other_job* itself is not a typical concern in reliability/availability calculations. We just need to track all the times to repair. In a series system, that is the system downtime.
- All machines are stand-alone, which means system downtime does not stop the components. This is not always true: in an RBD simulation, some blocks are brought offline by system downtime. This can be implemented with an interruption generated by the *repairman*.
 
## Changes to the Machine Shop Example
- The machine class will need the following:
  * attributes: ID/SN, name/description
  * attributes: uptime, number of failures, and downtime
  * attribute: time to failure (distribution or fixed)
  * attribute: time to repair (distribution or fixed)
  * process: working, which handles interruptions (requests the repairman, generates time_to_failure)
  * process: break_machine, driven by the time to failure (distribution parameter; use exponential for the first prototype)
  * time to repair (fixed or distribution parameter; use fixed for the first prototype)
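
One way to hold the attributes listed above is a small record class. This is only a sketch: `MachineRecord`, its field names, and the `availability` property are assumptions made for illustration, and the SimPy processes are left out.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class MachineRecord:
    """Bookkeeping for one machine (hypothetical layout)."""
    sn: str                                  # ID / serial number
    name: str                                # name / description
    time_to_failure: Callable[[], float]     # sampler: distribution or fixed
    time_to_repair: Callable[[], float]      # sampler: distribution or fixed
    uptime: float = 0.0
    downtime: float = 0.0
    failures: int = 0

    @property
    def availability(self) -> float:
        """Uptime fraction; reported as 1.0 before any time has been logged."""
        total = self.uptime + self.downtime
        return self.uptime / total if total else 1.0


rng = random.Random(42)
record = MachineRecord(
    sn="M-0",
    name="dummy",
    time_to_failure=lambda: rng.expovariate(1.0 / 3000.0),  # exponential, MTTF 3000
    time_to_repair=lambda: 300.0,                           # fixed repair time
)

# Log one up/down cycle by hand.
record.uptime += 3060.0
record.downtime += 300.0
record.failures += 1
print("%s availability: %.3f" % (record.sn, record.availability))
```

Passing the samplers in as callables keeps the record independent of whether a fixed value or a distribution is used.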

In [8]:
"""
Machine shop Series System
 
Scenario:
  A workshop has *n* identical machines. Each machine runs continuously and it breaks down
  periodically. The machines go back to work after repairs are carried out by one repairman. Broken machines
  enter the queue of the repairman.
  A time-to-failure is generated and the clock advanced even if the component is already down.
"""

import random
import simpy

RANDOM_SEED = 42
MTTF = 3000.0                     # Mean time to failure in minutes
REPAIR_TIME = 300.0               # Time it takes to repair a machine in minutes
NUM_MACHINES = 1                  # Number of machines in the machine shop
WEEKS = 4                         # Simulation time in weeks
SIM_TIME = WEEKS * 7 * 24 * 60    # Simulation time in minutes


def time_to_repair():
    """Return the time interval until the repair is done and the machine is ready to run again."""
    return REPAIR_TIME

def time_to_failure():
    """Return time until next failure for a machine."""
    return random.expovariate(1.0/MTTF)

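class SystemLog(object):
    """Collects start/fail events from all machines for system-level analysis."""
    def __init__(self, name):
        self.name = name
        self.event_time = []   # simulation time of each event
        self.event_sn = []     # serial number of the machine that logged it
        self.event_type = []   # "start" or "fail"
        self.down_count = 0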

class Machine(object):
    """A machine produces parts and may break every now and then.

    If it breaks, it requests a *repairman* and continues production
    after it is repaired.

    A machine has a *name* and a number of *parts_made* thus far.

    """
    def __init__(self, env, sn, name, repairman, system_log):
        self.env = env
        self.sn = sn
        self.name = name
        self.parts_made = 0
        self.broken = False
        self.downtime = 0
        self.uptime = 0
        self.fail = 0
 
        # Start "working" and "break_machine" processes for this machine.
        self.process = env.process(self.working(repairman, system_log))
        env.process(self.break_machine())
 
    def working(self, repairman, system_log):
        """Working as long as the simulation runs.
 
        While working, the machine may break multiple times.
        Request a repairman when this happens.
        """
        while True:
            try:
                # Working
                start = self.env.now
                system_log.event_time.append(start)
                system_log.event_sn.append(self.sn)
                system_log.event_type.append("start")
                print("%s starts at %d" % (self.sn, start))
                yield self.env.timeout(SIM_TIME)
            except simpy.Interrupt:
                self.broken = True
                self.uptime += self.env.now - start  # How much uptime since last start?
                down_start = self.env.now
                system_log.event_time.append(down_start)
                system_log.event_sn.append(self.sn)
                system_log.event_type.append("fail")
                print("%s fails at %d" % (self.sn, down_start))
                # Request a repairman. This will preempt its "other_job".
                with repairman.request(priority=1) as req:
                    repair_time = time_to_repair()
                    yield req
                    yield self.env.timeout(repair_time)
 
                # Machine is back to work.
                self.broken = False
                self.downtime += self.env.now - down_start
                self.fail += 1
                # The loop logs "starts at" again on re-entry, so this line appears twice in the output.
                print("%s starts at %d" % (self.sn, self.env.now))
 
 
    def break_machine(self):
        """Break the machine every now and then. The failure clock keeps running even while the component is down."""
#        while not self.broken:
        while True:
            time_to_nextfail = time_to_failure()
            print("%s try break at %d + %d" % (self.sn, self.env.now, time_to_nextfail))   
            yield self.env.timeout(time_to_nextfail)
            if not self.broken:
                # Only break the machine if it is currently working.
                self.process.interrupt()
            else:
                print("%s skip fail at %d" % (self.sn, self.env.now))
 
 
# Setup and start the simulation
print('Machine shop Series')
random.seed(RANDOM_SEED)  # Makes the results reproducible
 
# Create an environment and start the setup process
env = simpy.Environment()
repairman = simpy.PreemptiveResource(env, capacity=1)
system_log = SystemLog("series_system")
machines = [Machine(env, 'Machine %d' % i, "dummy", repairman, system_log)
            for i in range(NUM_MACHINES)]
# Execute!
env.run(until=SIM_TIME)
 
# Analysis/results
print('Machine shop results after %s weeks' % WEEKS)
for machine in machines:
    print('%s failed %d times.' % (machine.sn, machine.fail))

Machine shop Series
Machine 0 starts at 0
Machine 0 try break at 0 + 3060
Machine 0 try break at 3060 + 75
Machine 0 fails at 3060
Machine 0 skip fail at 3136
Machine 0 try break at 3136 + 964
Machine 0 starts at 3360
Machine 0 starts at 3360
Machine 0 try break at 4101 + 757
Machine 0 fails at 4101
Machine 0 starts at 4401
Machine 0 starts at 4401
Machine 0 try break at 4858 + 4000
Machine 0 fails at 4858
Machine 0 starts at 5158
Machine 0 starts at 5158
Machine 0 try break at 8859 + 3387
Machine 0 fails at 8859
Machine 0 starts at 9159
Machine 0 starts at 9159
Machine 0 try break at 12247 + 6681
Machine 0 fails at 12247
Machine 0 starts at 12547
Machine 0 starts at 12547
Machine 0 try break at 18928 + 272
Machine 0 fails at 18928
Machine 0 skip fail at 19201
Machine 0 try break at 19201 + 1644
Machine 0 starts at 19228
Machine 0 starts at 19228
Machine 0 try break at 20845 + 90
Machine 0 fails at 20845
Machine 0 skip fail at 20936
Machine 0 try break at 20936 + 740
Machine 0 starts a

Note that in the output above, a time-to-failure is generated and the clock advanced regardless of whether the component is up or down.

Next, try generating a failure only when the system is up. The following is based on the simple car example.

In [19]:
import random
import simpy

RANDOM_SEED = 42
MTTF = 100.0                     # Mean time to failure in minutes
REPAIR_TIME = 2.0               # Time it takes to repair a machine in minutes
NUM_MACHINES = 1                  # Number of machines in the machine shop
WEEKS = 4                         # Simulation time in weeks
SIM_TIME = WEEKS * 7 * 24 * 60    # Simulation time in minutes


def time_to_repair():
    """Return the time interval until the repair is done and the machine is ready to run again."""
    return REPAIR_TIME

def time_to_failure():
    """Return time until next failure for a machine."""
    return random.expovariate(1.0/MTTF)

class Machine:
    def __init__(self, env):
        self.env = env
        # Start the working loop (the car example called this method `drive`).
        self.working_proc = env.process(self.working(env))

    def working(self, env):
        while True:
            # Up until failure
            time_to_fail = time_to_failure()
            print('time to next failure is %.2f' % time_to_fail)
            yield env.timeout(time_to_fail)

            # Repair for time_to_repair
            print("Failure Starts at %.2f" % env.now)
            repair_time = time_to_repair()
            yield env.timeout(repair_time)
            print("Machine is repaired at %.2f" % env.now)
            
env = simpy.Environment()
machine = Machine(env)
env.run(until=1000)

time to next failure is 7.17
Failure Starts at 7.17
Machine is repaired at 9.17
time to next failure is 142.96
Failure Starts at 152.14
Machine is repaired at 154.14
time to next failure is 145.17
Failure Starts at 299.31
Machine is repaired at 301.31
time to next failure is 13.74
Failure Starts at 315.05
Machine is repaired at 317.05
time to next failure is 64.49
Failure Starts at 381.54
Machine is repaired at 383.54
time to next failure is 79.81
Failure Starts at 463.35
Machine is repaired at 465.35
time to next failure is 30.80
Failure Starts at 496.14
Machine is repaired at 498.14
time to next failure is 205.91
Failure Starts at 704.05
Machine is repaired at 706.05
time to next failure is 55.02
Failure Starts at 761.07
Machine is repaired at 763.07
time to next failure is 23.80
Failure Starts at 786.87
Machine is repaired at 788.87
time to next failure is 77.50
Failure Starts at 866.37
Machine is repaired at 868.37
time to next failure is 130.91
Failure Starts at 999.28


The structure used in the car example, *sleep until woken up*, seems more suitable for RBD modeling. The time_to_failure and time_to_repair functions can easily be modified to take into account the distribution, machine age, repair efficiency, etc.
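
As a rough illustration of how the two functions might be extended, the sketch below draws the time to failure from a Weibull distribution conditioned on the machine's current age, and the time to repair from a lognormal distribution. The signatures, the parameter values, and the choice of distributions are all assumptions; repair efficiency is not modeled here.

```python
import math
import random


def time_to_failure(rng, age=0.0, scale=3000.0, shape=1.5):
    """Remaining life of a Weibull component that has already survived `age`.

    Inverts the conditional survival function S(t) / S(age).
    With shape=1.0 this reduces to the memoryless exponential case.
    """
    u = 1.0 - rng.random()   # uniform in (0, 1], avoids log(0)
    total_life = scale * ((age / scale) ** shape - math.log(u)) ** (1.0 / shape)
    return max(0.0, total_life - age)   # guard against rounding below zero


def time_to_repair(rng, median=300.0, sigma=0.5):
    """Lognormal repair time with the given median (hypothetical parameters)."""
    return rng.lognormvariate(math.log(median), sigma)


rng = random.Random(42)
print("new machine:  next failure in %.1f" % time_to_failure(rng, age=0.0))
print("aged machine: next failure in %.1f" % time_to_failure(rng, age=5000.0))
print("repair takes %.1f" % time_to_repair(rng))
```

Passing the random generator in explicitly keeps each machine's stream reproducible; a shape above 1 makes older machines fail sooner on average.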

A "system_log" object is needed to keep track of individual failures and to calculate the system state.
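
A minimal sketch of such a log is shown below, assuming a series system (the system is down whenever at least one component is down) and the same "start"/"fail" event types used in the first listing. The class name `SeriesSystemLog`, its methods, and the event data are hypothetical.

```python
class SeriesSystemLog:
    """Records component events and derives series-system downtime."""

    def __init__(self, name):
        self.name = name
        self.events = []   # (time, sn, event_type) tuples

    def record(self, time, sn, event_type):
        self.events.append((time, sn, event_type))

    def system_downtime(self, end_time):
        """Total time during which at least one component was down."""
        down = set()          # serial numbers of components currently down
        down_since = None     # start of the current system outage, if any
        downtime = 0.0
        for time, sn, event_type in sorted(self.events):
            if event_type == "fail":
                if not down:
                    down_since = time
                down.add(sn)
            elif event_type == "start":
                down.discard(sn)
                if not down and down_since is not None:
                    downtime += time - down_since
                    down_since = None
        if down_since is not None:   # outage still open at the end
            downtime += end_time - down_since
        return downtime


log = SeriesSystemLog("series_system")
for event in [(0, "M0", "start"), (0, "M1", "start"),
              (100, "M0", "fail"), (350, "M1", "fail"),
              (400, "M0", "start"), (650, "M1", "start"),
              (900, "M0", "fail")]:
    log.record(*event)

total_down = log.system_downtime(end_time=1000)
print("system downtime: %d, availability: %.2f" % (total_down, 1 - total_down / 1000))
```

Overlapping outages (M0 and M1 between 350 and 400) are counted once, which is what distinguishes system downtime from the sum of component downtimes.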