On January 28, 1986, the Space Shuttle Challenger broke apart shortly after liftoff, killing all seven astronauts on board. An investigation into the reliability of the shuttle's propulsion system followed. The accident was eventually traced to the failure of one of the field joints on the solid rocket boosters. Each field joint includes two O-rings, which fail when two phenomena, erosion and blowby, both occur. The night before the launch, a decision had to be made about launch safety. In the discussion leading to that decision, engineers and managers raised the concern that the probability of O-ring failure depended on the air temperature at launch. For the 23 shuttle launches before the accident, we know the air temperature and whether any of the O-rings failed.

In [1]:
import numpy as np
import pandas as pd
from statsmodels.stats.weightstats import _tconfint_generic


challenger = pd.read_csv('challenger.txt', sep='\t')
challenger

Unnamed: 0.1,Unnamed: 0,Temperature,Incident
0,Apr12.81,18.9,0
1,Nov12.81,21.1,1
2,Mar22.82,20.6,0
3,Nov11.82,20.0,0
4,Apr04.83,19.4,0
5,Jun18.83,22.2,0
6,Aug30.83,22.8,0
7,Nov28.83,21.1,0
8,Feb03.84,13.9,1
9,Apr06.84,17.2,1


What is the difference between the average air temperatures for launches with and without O-ring failures? Round the answer to 3 decimal places.

In [2]:
boom = challenger.query('Incident == 1')
not_boom = challenger.query('Incident == 0')

In [3]:
np.mean(boom.Temperature.values) - np.mean(not_boom.Temperature.values)

-4.666964285714283
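The same difference can also be read off a single `groupby`, which returns both group means at once. The sketch below is only an illustration: it uses the ten rows displayed above rather than the full 23-launch file, so its result differs from the value just computed, and the names `sample`, `group_means`, and `diff` are introduced here for the example.

```python
import pandas as pd

# Only the ten rows shown in the table above (the full file has 23 launches),
# so this difference will not match the full-data result.
sample = pd.DataFrame({
    "Temperature": [18.9, 21.1, 20.6, 20.0, 19.4, 22.2, 22.8, 21.1, 13.9, 17.2],
    "Incident": [0, 1, 0, 0, 0, 0, 0, 0, 1, 1],
})

# Mean temperature per group, indexed by the Incident value (0 or 1).
group_means = sample.groupby("Incident")["Temperature"].mean()

# Same sign convention as the cell above: with failures minus without failures.
diff = group_means.loc[1] - group_means.loc[0]
print(round(diff, 3))
```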

Using the percentile bootstrap, build a 95% confidence interval for the difference between the mean air temperatures for launches with and without O-ring failures. Which of its bounds is closest to 0? Round the answer to 4 decimal places. If you want to get exactly the same results as we did:

- use get_bootstrap_samples and percentile_interval functions

- set random seed = 0 before calling get_bootstrap_samples, once

- use 5000 bootstrap resamples from each sample

In [4]:
def get_bootstrap_samples(x, n_resamples):
    indices = np.random.randint(0, len(x), (n_resamples, len(x)))
    resamples = x[indices]
    return resamples


def percentile_interval(stat, alpha):
    boundaries = np.percentile(stat, [100 * alpha / 2., 100 * (1 - alpha / 2.)])
    return boundaries


np.random.seed(0)
boom_mean_scores = list(map(np.mean, get_bootstrap_samples(boom.Temperature.values, 5000)))
not_boom_mean_scores = list(map(np.mean, get_bootstrap_samples(not_boom.Temperature.values, 5000)))


# Difference of bootstrap means: launches without failures minus launches with failures
# (the opposite sign convention to In [3]).
delta_mean_scores = list(map(lambda x: x[1] - x[0], zip(boom_mean_scores, not_boom_mean_scores)))
print("95% confidence interval for the difference between mean air temperatures:",
      percentile_interval(delta_mean_scores, 0.05))

95% confidence interval for the difference between mean air temperatures: [1.27671875 7.976875  ]
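
The question asks for the bound closest to 0, which the cell above leaves to be read off by eye. A minimal sketch of that last step follows, assuming the interval is available as a two-element sequence. The helper `closest_bound_to_zero` is a hypothetical name introduced here, not part of the original notebook, and the interval values are copied from the printed output above.

```python
import numpy as np


def closest_bound_to_zero(interval):
    """Return the endpoint of `interval` with the smallest absolute value."""
    bounds = np.asarray(interval, dtype=float)
    return bounds[np.argmin(np.abs(bounds))]


# Interval endpoints copied from the printed output above.
interval = [1.27671875, 7.976875]

answer = round(float(closest_bound_to_zero(interval)), 4)
print(answer)  # 1.2767

# True when 0 lies inside the interval; here it does not.
contains_zero = interval[0] <= 0 <= interval[1]
print(contains_zero)  # False
```

Comparing absolute values makes the helper independent of the sign convention chosen for the difference, so it returns the same magnitude whether the subtraction is done as with-minus-without or the reverse.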
