Skip to content

Statistically analyzing a new vehicle's manufacturing process, then designing an independent study to analyze cost variance.

Notifications You must be signed in to change notification settings

jmoletteire/MechaCar-Statistical-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 

Repository files navigation

MechaCar Statistical Analysis

Linear Regression to Predict MPG

This multiple linear regression examines the influence five variables have on MPG (miles per gallon) for MechaCar prototypes.

The Pr(>|t|) column lists the probability that each coefficient contributes a random amount of variance to the MPG values in the dataset. Given the significantly low probabilities of random contributions from Vehicle Length and Ground Clearance, there is statistical evidence that both variables have a regular impact on miles per gallon.

However, with a significant intercept, the model leaves something to be desired, as it indicates further variability not explained by these parameters. So while two significant relationships were discovered and the null hypothesis can be rejected — now assuming a non-zero slope — three variables in the model had no impact, suggesting overfitting, and ~30% of the variance is still unexplained (based on R-squared), suggesting a good, yet incomplete model.

Summary Statistics on Suspension Coils

Total PSI Summary Statistics


PSI by Manufacturing Lot

With design specifications dictating suspension coil variance be no greater than 100 pounds per square inch, the manufacturing data indicates that two of three lots are meeting this criteria. Lots One and Two are well below the maximum, as is the total variance for all three lots, but it could be much lower if Lot Three were not 70 pounds per square inch over the limit.

Room for improvement in one lot, but so far so good everywhere else.

T-Tests on Suspension Coils

Hypotheses

For all four tests.

α = 0.05

H₀ - No statistically significant difference between the sample mean and the population mean.
Ha - There is a statistically significant difference between the sample mean and the population mean.

Lot 1 vs. Pop. Mean PSI (1500 lbs/sq. inch)

p-value - 1
Significance Level - Not significant

Lot 1 is perfect. Examining the entire sample, the average coil fails to deviate from our 1500 lbs/sq. inch metric. It matches the population mean exactly, resulting in a p-value of 1, and making Lot 1 perfectly average. Nothing noteworthy here.

Conclusion - Fail to reject the null hypothesis.

Lot 2 vs. Pop. Mean (1500 lbs/sq. inch)

p-value - 0.61
Significance Level - Not significant

Not as perfect as Lot 1, but still enough to support the null hypothesis. With a p-value of 0.61, Lot 2 slightly deviates from our hypothesized mean, but hardly enough to provide significant evidence for the alternative.

Conclusion - Fail to reject the null hypothesis.

Lot 3 vs. Pop. Mean (1500 lbs/sq. inch)

p-value - 0.04
Significance Level - Moderate

At a p-value of 0.04, Lot 3 provides the only evidence for the alternative hypothesis. The moderately low probability of its results suggests its mean differs relatively significantly from the presumed population mean. We can see that this is true at the bottom of the image (mean of x: 1496.14). At a more precise significance level, this could be a major issue, but with α = 0.05 we'll just make note of it for now.

Conclusion - Reject the null hypothesis (at α = 0.05).

All Lots vs Pop. Mean (1500 lbs/sq. inch)

p-value - 0.06
Significance Level - Low

Examining a combination of all three lots requires context, which is why we examined them individually first. With a p-value of 0.06 and a mean of 1498.78, the result for all three lots barely clears our threshold of 0.05, and deviates from our hypothesis by just a hair. It is somewhat significant, but let's consider everything we have seen so far. Lots 1 and 2 are nearly perfect, with each sample mean approximately equal to the population mean. Lot 3 is a bit of an outlier, with its results relatively unlikely under the null hypothesis, but not strong enough evidence to reject it.

The third lot clearly drags down the mean in this test, and it is reflected in the results, and a low significance level sounds about right for something caused by such an isolated issue. With hardly enough evidence to refute the null hypothesis, we fail to reject it.

Conclusion - Fail to reject the null hypothesis.

Study Design: MechaCar vs. Competition

The Question

Customers would likely be interested in a study that could show them where cost comes from for each vehicle. In other words, how do factors like fuel efficiency, safety rating, horse power, and year affect the cost of each vehicle?

Hypotheses

Significance Level
α = 0.05

H₀ - Slope of linear model is zero, m = 0.
Ha - Slope of linear model is not zero m ≠ 0.

Design and Data

I would run several multiple linear regressions, one for the MechaCar, and others for similar vehicles (probably three or four). Each regression would require data about each feature for each vehicle:

Dependent Variable:

  • Cost

Independent Variables:

  • City/Highway Fuel Efficiency
  • Safety Rating
  • Year

After acquiring this data for each vehicle, I would run separate regressions for each car, determining the impact the parameters have on total cost.

This would obviously require tinkering, as it is naïve to believe these three variables are responsible for the majority of the variance in total cost among each type of vehicle tested. I would probably have to widen the list of independent variables and narrow down to the most impactful for each car. It would be time consuming to create, but very informative for consumers.

About

Statistically analyzing a new vehicle's manufacturing process, then designing an independent study to analyze cost variance.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages