## 3030 Assignment 5 O-Rings Model Solution

In [1]:
# Render plots inline
%matplotlib inline

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

# import formula api as alias smf
import statsmodels.formula.api as smf

# Variables:
# ORings: Number of O-rings at risk on a given flight
# DistressedOrings: Number experiencing thermal distress
# Temp: Launch temperature (degrees F)
# Pressure: Leak-check pressure (psi)
# TempOrderOfFlight: Temporal order of flight
cols = ['ORings', 'DistressedOrings', 'Temp', 'Pressure', 'TempOrderOfFlight']

df = pd.read_csv('o-ring-erosion-or-blowby.csv', names=cols)
df

Unnamed: 0,ORings,DistressedOrings,Temp,Pressure,TempOrderOfFlight
0,6,0,66,50,1
1,6,1,70,50,2
2,6,0,69,50,3
3,6,0,68,50,4
4,6,0,67,50,5
5,6,0,72,50,6
6,6,0,73,100,7
7,6,0,70,100,8
8,6,1,57,200,9
9,6,1,63,200,10


In [2]:
# Ordinary Least Squares Regression

# First, select relevant variables
# ORings:            This is a constant (it's always 6), so there is no point using it as a predictor.
#                    It doesn't vary, so it can't explain why different flights have different
#                    outcomes.
# DistressedOrings:  This is what we're trying to predict, so it is our target variable.
# Temp:              Our most important predictor.
# Pressure:          Might or might not be predictive. Include it and see what happens.
# TempOrderOfFlight: This is just the order of the flights (Flight #1, #2, etc.). If we were interested
#                    in whether the situation is getting better or worse over time, we would include
#                    this as a predictor. Since we are only interested in the effects of temperature
#                    (and possibly leak-check pressure), including it might lead the model to attribute
#                    the change in the number of distressed O-rings to the passage of time and mask
#                    the relationship we're really interested in.

X = df[['Temp', 'Pressure']]
y = df['DistressedOrings']

# Add a constant column so the model estimates an intercept. (Otherwise the fit is forced
# through the origin.)
X = sm.add_constant(X)

# Fit the OLS model
est = sm.OLS(y, X).fit()

# Check the results
est.summary()

Dep. Variable:,DistressedOrings,R-squared:,0.354
Model:,OLS,Adj. R-squared:,0.29
Method:,Least Squares,F-statistic:,5.49
Date:,"Sun, 05 Jul 2015",Prob (F-statistic):,0.0126
Time:,09:29:40,Log-Likelihood:,-17.408
No. Observations:,23,AIC:,40.82
Df Residuals:,20,BIC:,44.22
Df Model:,2,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[95.0% Conf. Int.]
const,3.3298,1.188,2.803,0.011,0.851 5.808
Temp,-0.0487,0.017,-2.910,0.009,-0.084 -0.014
Pressure,0.0029,0.002,1.699,0.105,-0.001 0.007

Omnibus:,19.324,Durbin-Watson:,2.39
Prob(Omnibus):,0.0,Jarque-Bera (JB):,23.471
Skew:,1.782,Prob(JB):,8e-06
Kurtosis:,6.433,Cond. No.,1840.0


In [3]:
est.params

const       3.329831
Temp       -0.048671
Pressure    0.002939
dtype: float64

In [4]:
# Intercept (look parameters up by label rather than by position)
constant = est.params['const']
# Coefficient for Temp
coef1 = est.params['Temp']
# Coefficient for Pressure
coef2 = est.params['Pressure']

# Predicted number of O-rings in distress when Temp = 31 and Pressure is 0, 50, 100, and 200
for pressure in [0, 50, 100, 200]:
    print("Temp=31 Pressure=", pressure, " Predicted # of O-Rings in distress:", constant + coef1 * 31 + coef2 * pressure)

Temp=31 Pressure= 0  Predicted # of O-Rings in distress: 1.82102695086
Temp=31 Pressure= 50  Predicted # of O-Rings in distress: 1.96799318368
Temp=31 Pressure= 100  Predicted # of O-Rings in distress: 2.1149594165
Temp=31 Pressure= 200  Predicted # of O-Rings in distress: 2.40889188214


In [5]:
# Conclusion
#
# If we assume the relationship is linear, the model predicts that approximately 2 O-rings
# will experience distress at a launch temperature of 31F (about 1.8 to 2.4, depending on
# the leak-check pressure). See the notes below for important cautions.
#
# Notes
#
# Extrapolating outside the range of the actual observations, as we are doing here, is always
# risky. It assumes the relationship stays linear beyond the data, but any smooth curve looks
# straight over a small enough range (this is why calculus works), so a good linear fit within
# the observed temperatures says little about the shape of the relationship at 31F.
# Because there were only so many actual flights, there is no better data to draw on. The
# result can provide some insight, but it should be treated as indicative only, not as a
# precise prediction.
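
The extrapolation caution can be illustrated numerically. The sketch below uses a purely hypothetical smooth curve (`true_curve`, a logistic shape chosen for illustration and not fitted to the O-ring data), fits a straight line to it over 57F to 73F (the temperature span of the ten flights displayed in the first output), and then compares the line with the curve at 31F.

```python
import numpy as np

def true_curve(temp):
    # Hypothetical non-linear relationship, for illustration only.
    return 6.0 / (1.0 + np.exp(0.2 * (temp - 50.0)))

# Fit a straight line using only temperatures inside a narrow range.
temps = np.linspace(57, 73, 20)
slope, intercept = np.polyfit(temps, true_curve(temps), 1)

# Largest gap between the line and the curve inside the fitted range.
in_range_err = np.max(np.abs(slope * temps + intercept - true_curve(temps)))

# Gap between the line and the curve when extrapolated to 31F.
extrap_err = abs(slope * 31 + intercept - true_curve(31))

print("max error inside 57-73F:", in_range_err)
print("error at 31F:           ", extrap_err)
```

Under these assumptions the line tracks the curve far more closely inside the fitted range than at 31F, which is the sense in which the linear prediction should be read as indicative only.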