Performed a statistical analysis with t-tests and linear regression in R and RStudio on data taken from car design specs to determine whether any differences in performance between the cars were statistically significant.

mdbinger/MechaCar_Statistical_Analysis

MechaCar_Statistical_Analysis

Module 15 of Data Analytics Bootcamp

Linear Regression to Predict MPG

Which variables/coefficients provided a non-random amount of variance to the mpg values in the dataset?

  • The y-intercept, vehicle length, and ground clearance each provided a non-random amount of variance to the mpg values in the dataset. In the image below, each of these is marked with triple asterisks (***), meaning its p-value is below 0.001.

linear_regression_results_summary
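R's `summary()` output for a fitted `lm` model marks each coefficient's p-value with significance codes. Its legend reads `0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1`. The short Python sketch below maps a p-value to those same codes, to show how to read the asterisks. It treats each cutoff as an inclusive upper bound. That tie handling is an assumption, not a statement of R's exact behavior, and the sample p-values are hypothetical.

```python
def signif_code(p_value: float) -> str:
    """Map a p-value to the significance code shown in R's summary() legend.

    Each cutoff is treated as an inclusive upper bound (assumed tie handling).
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must be between 0 and 1")
    for cutoff, code in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")):
        if p_value <= cutoff:
            return code
    return " "  # R prints a blank for p-values above 0.1


# Hypothetical p-values, not taken from the regression above.
for p in (2e-16, 0.004, 0.03, 0.08, 0.5):
    print(f"{p:<8g} {signif_code(p)!r}")
```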

Is the slope of the linear model considered to be zero? Why or why not?

  • The slope of the linear model is not considered to be zero. The p-value for the model is 5.35e-11, far below our 0.05 significance threshold, so we can reject the null hypothesis that the slope of the model is zero.
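The conclusion above follows the usual decision rule: reject the null hypothesis when the p-value falls below the chosen significance level (0.05 in this analysis). Here is a minimal Python sketch of that rule, applied to the reported model p-value. The function name `reject_null` is hypothetical.

```python
ALPHA = 0.05  # significance threshold used throughout this analysis


def reject_null(p_value: float, alpha: float = ALPHA) -> bool:
    """Return True when the p-value is strictly below the significance level."""
    return p_value < alpha


model_p_value = 5.35e-11  # model p-value reported in the regression summary
if reject_null(model_p_value):
    print("Reject H0: the slope of the model is not zero.")
else:
    print("Fail to reject H0: the data are consistent with a zero slope.")
```

Failing to reject the null hypothesis is not the same as proving it true. This distinction matters for any result that lands just above the threshold.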

Does this linear model predict mpg of MechaCar prototypes effectively? Why or why not?

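  • Our R-squared value for this model is 0.7149, which means the model explains roughly 71% of the variance in mpg. According to our stats cheat sheet below, an r value above 0.7 represents a strong correlation. R-squared is the square of the (multiple) correlation coefficient r, so here r = √0.7149 ≈ 0.85, which clears that bar comfortably. As a result, we can conclude that this linear model predicts the mpg of MechaCar prototypes reasonably effectively. However, about 29% of the variance in mpg remains unexplained by the variables in the model.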

Stats_Cheat_Sheet.pdf
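This standard-library Python sketch makes the distinction between R-squared and the correlation coefficient r concrete. `r_squared` (an illustrative name) computes the coefficient of determination as 1 - SS_res / SS_tot. For a least-squares fit with an intercept, the square root of R-squared equals the multiple correlation between the observed and fitted values. Only the 0.7149 figure comes from this analysis.

```python
import math
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


# Reported R-squared of the mpg model.
r2 = 0.7149
print(f"Share of mpg variance explained: {r2:.1%}")
print(f"Multiple correlation r = sqrt(R^2): {math.sqrt(r2):.3f}")
print(f"Share of variance left unexplained: {1 - r2:.1%}")
```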

Summary Statistics on Suspension Coils

The design specifications for the MechaCar suspension coils dictate that the variance of the suspension coils must not exceed 100 pounds per square inch. Does the current manufacturing data meet this design specification for all manufacturing lots in total and each lot individually? Why or why not?

As seen below, the variance of the suspension coils across all lots combined (62.29) meets the requirement of not exceeding 100 PSI. So do the variances for Lot 1 (0.98) and Lot 2 (7.47) individually. However, Lot 3's variance was well above the 100 PSI threshold, at 170.29. The data for Lots 1 and 2 was very consistent, with most data points falling close to the mean, while Lot 3's data had a large spread. As a result, Lot 3's variance and standard deviation were far higher than those of Lots 1 and 2, and Lot 3 fails to meet the design specification. In short, the manufacturing data meets the specification for all lots in total, but not for each lot individually.

total_summary

lot_summary
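The variance check described above can be reproduced with a short Python sketch. Python's `statistics.variance`, like R's `var()`, uses the sample (n - 1) denominator. The lot names and PSI readings below are made-up placeholders, not the MechaCar manufacturing data. Only the 100 PSI limit comes from the design specification, and `variance_report` is a hypothetical helper name.

```python
from statistics import variance

MAX_VARIANCE = 100  # design limit on suspension coil variance


def variance_report(psi_by_lot):
    """Return {name: (variance, meets_spec)} for all lots combined and each lot."""
    combined = [psi for readings in psi_by_lot.values() for psi in readings]
    groups = {"All lots": combined, **psi_by_lot}
    report = {}
    for name, readings in groups.items():
        v = variance(readings)  # sample variance, n - 1 denominator
        report[name] = (v, v <= MAX_VARIANCE)  # "must not exceed" the limit
    return report


# Hypothetical readings for illustration only.
sample_lots = {
    "Lot A": [1499, 1500, 1501],
    "Lot B": [1480, 1500, 1520],
}
for name, (v, ok) in variance_report(sample_lots).items():
    print(f"{name}: variance = {v:.2f}, meets spec: {ok}")
```

As with Lot 3 in the real data, a single widely spread lot can fail the limit even though the combined variance passes.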

T-Tests on Suspension Coils

Briefly summarize your interpretation and findings for the t-test results. Include screenshots of the t-test to support your summary.

  • Essentially, only one of the t-tests performed gave statistically significant results (spoiler alert: it was Lot 3). The t-test run on all lots returned a p-value of 0.06. This is close to the 0.05 significance threshold but doesn't quite make the cut. We therefore cannot reject the null hypothesis for all lots, and the difference between the sample mean and the population mean could simply be due to random chance.

t_test_all_lots

  • The t-test for Lot 1 was interesting, but nowhere close to rejecting the null hypothesis. In fact, it was essentially the opposite: the p-value returned from the t-test on Lot 1 was equal to 1. This means the Lot 1 sample mean was essentially identical to the population mean, so there is no evidence of any difference for this lot.

t_test_lot1

  • The t-test for Lot 2 was not as dramatic as Lot 1, but the result was essentially the same. The p-value was 0.61, well above 0.05, so there is no evidence that the Lot 2 mean differs from the population mean. Any difference is consistent with random chance.

t_test_lot2

  • The t-test for Lot 3 (as I spoiled above) did come back with a p-value below 0.05. The p-value for Lot 3 was about 0.042, which allows us to reject the null hypothesis. This indicates that the mean of Lot 3 differs from the population mean by more than random chance would explain.

t_test_lot3
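In R, a one-sample t-test like the ones above is typically run with `t.test(x, mu = ...)`. The standard-library Python sketch below shows what that test computes. It forms t = (x̄ - μ) / (s / √n) with n - 1 degrees of freedom. It then obtains the two-sided p-value by numerically integrating the Student's t density with Simpson's rule. The readings and `mu` are hypothetical, and the function names are illustrative. The numerical integration is only intended for moderate |t| values; R's `t.test()` remains the better tool for real analysis.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution, computed in log space for stability."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t: float, df: float, steps: int = 2000) -> float:
    """P(|T| >= |t|) via Simpson's rule on [0, |t|]; intended for moderate |t|."""
    a = abs(t)
    if a == 0:
        return 1.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    return min(1.0, max(0.0, 1.0 - 2.0 * central))


def one_sample_t_test(sample, mu):
    """Return (t statistic, degrees of freedom, two-sided p-value)."""
    n = len(sample)
    t = (mean(sample) - mu) / (stdev(sample) / math.sqrt(n))
    df = n - 1
    return t, df, two_sided_p(t, df)


# Hypothetical PSI readings and hypothesized population mean.
readings = [1498, 1503, 1497, 1501, 1500, 1502, 1496, 1504]
t, df, p = one_sample_t_test(readings, mu=1500)
print(f"t = {t:.3f}, df = {df}, p = {p:.3f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

A two-sided p-value of exactly 1 occurs only when t = 0, that is, when the sample mean equals μ. A p-value that rounds to 1, as reported for Lot 1, therefore corresponds to a sample mean very close to the population mean.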

Study Design: MechaCar vs Competition

Write a short description of a statistical study that can quantify how the MechaCar performs against the competition. In your study design, think critically about what metrics would be of interest to a consumer: for example, cost, city or highway fuel efficiency, horsepower, maintenance cost, or safety rating. In your description, address the following questions:

  1. What metric or metrics are you going to test?
  • I would use the following metrics in this study, as I believe they are some of the purest indicators of a car's performance for the average driver: city MPG, highway MPG, maintenance cost, and safety rating.
  2. What is the null hypothesis or alternative hypothesis?
  • The null hypothesis would be that there is no difference between MechaCar and its competitors in any of the metrics listed above: city MPG, highway MPG, maintenance cost, or safety rating. The alternative hypothesis would be that MechaCar differs from its competitors in at least one of these metrics.
  3. What statistical test would you use to test the hypothesis? And why?
  • I would first look at summary statistics to get a general idea of how the competitors compare to the MechaCar. For some metrics, like MPG, differences may be apparent from the means alone. For a more in-depth look, I would run a separate multiple linear regression for each metric (city MPG, highway MPG, maintenance cost, and safety rating). Each regression would compare MechaCar's data with that of every competitor at once, rather than running a separate test for each competitor.
  4. What data is needed to run the statistical test?
  • City and highway MPG data would be needed for every vehicle in the study. Additionally, a substantial number of upkeep, repair, and replacement records for each vehicle would be needed for the maintenance cost comparison. Lastly, crash test data and official safety scores from verified organizations would provide a solid basis for comparing safety ratings across all vehicles.
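The regression proposed in answer 3 compares MechaCar with all competitors at once for a single metric. When the only predictor is a categorical manufacturer variable, the overall F-test of that regression is equivalent to a one-way ANOVA. The standard-library Python sketch below therefore computes the one-way ANOVA F statistic for one metric. The manufacturer names and city MPG values are hypothetical, as is the helper name `one_way_anova_f`. Turning F into a p-value requires the F distribution (for example through R's `aov()` or `lm()`), which this sketch leaves out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for H0: every group (manufacturer) has the same mean.

    Returns (F, between-group df, within-group df).
    """
    values = [x for g in groups.values() for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    group_means = {name: mean(g) for name, g in groups.items()}
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (group_means[name] - grand_mean) ** 2
                     for name, g in groups.items())
    # Variation of individual values around their own group mean.
    ss_within = sum((x - group_means[name]) ** 2
                    for name, g in groups.items() for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical city MPG measurements per manufacturer.
city_mpg = {
    "MechaCar": [31.2, 29.8, 30.5, 32.0],
    "Competitor A": [28.4, 27.9, 29.1, 28.0],
    "Competitor B": [30.1, 31.0, 29.5, 30.7],
}
f_stat, df_b, df_w = one_way_anova_f(city_mpg)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F relative to the F(k - 1, n - k) distribution suggests at least one manufacturer's mean differs. A follow-up comparison would then be needed to see whether MechaCar is the one that differs.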
