Exploratory Data Analysis Project
This project conducts exploratory data analysis (EDA) on a dataset of career statistics for Major League Baseball players. The goal is to understand how performance metrics relate to player salary. The R Markdown HTML code file can be found here
The analysis focuses on key variables like AtBat, Hits, and Salary. After cleaning the data and dealing with missing values, univariate analysis is performed to understand the distribution of individual variables. Bivariate analysis explores the relationships between variables through visualizations like scatterplots, boxplots, and spread-level plots.
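The cleaning and univariate steps described above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's R Markdown code: the records are made-up values, and `drop_missing` and `five_number_summary` are hypothetical helper names. It assumes missing salaries are simply dropped, which may differ from what the project does.

```python
import statistics

# Made-up records for illustration only; None marks a missing Salary.
players = [
    {"AtBat": 300, "Hits": 75, "Salary": 100.0},
    {"AtBat": 450, "Hits": 120, "Salary": 250.0},
    {"AtBat": 280, "Hits": 60, "Salary": None},
    {"AtBat": 500, "Hits": 140, "Salary": 400.0},
    {"AtBat": 320, "Hits": 85, "Salary": 150.0},
    {"AtBat": 600, "Hits": 170, "Salary": 900.0},
    {"AtBat": 200, "Hits": 40, "Salary": 200.0},
]


def drop_missing(records, key):
    """Keep only the records that have a value for `key` (assumed cleaning rule)."""
    return [r for r in records if r.get(key) is not None]


def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))


clean = drop_missing(players, "Salary")
salaries = [r["Salary"] for r in clean]
print(len(players) - len(clean), "record(s) dropped")
print("Salary summary:", five_number_summary(salaries))
```

If the upper quartile and maximum sit much farther from the median than the lower quartile and minimum do, that suggests a right-skewed variable.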
-- Exploring distributions of key variables
-- Using transformations to make distributions more symmetric
-- Fitting resistant models to understand relationships
-- Binning data and constructing rootograms
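As one example of a resistant model, the sketch below fits a three-group resistant line, which uses medians of the outer thirds of the data so that a few extreme points have little influence. It is a simplified Python illustration, not the project's own fitting routine: `resistant_line` is a hypothetical name, leftover points go to the middle group, and the intercept is taken as the median residual, which is one common choice among several.

```python
import statistics


def resistant_line(xs, ys):
    """Three-group resistant line; returns (slope, intercept)."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three points")
    third = n // 3
    left, right = pairs[:third], pairs[n - third:]

    # Summarize each outer group by the medians of its x and y values.
    x_left = statistics.median(p[0] for p in left)
    y_left = statistics.median(p[1] for p in left)
    x_right = statistics.median(p[0] for p in right)
    y_right = statistics.median(p[1] for p in right)
    if x_right == x_left:
        raise ValueError("outer groups share the same median x")

    slope = (y_right - y_left) / (x_right - x_left)
    # Simplifying assumption: intercept is the median of y - slope * x.
    intercept = statistics.median(y - slope * x for x, y in pairs)
    return slope, intercept


# Made-up data on the line y = 2x + 1, with one wild value at x = 9.
xs = list(range(1, 10))
ys = [2 * x + 1 for x in xs]
ys[-1] = 100
print(resistant_line(xs, ys))
```

A least-squares fit would be pulled toward the wild value; here the group medians ignore it.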
Key findings:
-- Salary is positively correlated with AtBat
-- Hits is right-skewed, while AtBat is left-skewed
-- Power transformations improve symmetry
-- Roots of Hits/AtBat deviate from normality
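The last two findings can be illustrated with a small sketch that re-expresses a skewed variable with power transformations, then builds rootogram residuals (root of observed bin count minus root of the count expected under a fitted normal curve). This is a Python illustration under stated assumptions, not the project's code: the data are synthetic, the function names are hypothetical, symmetry is measured with a quartile-based skewness, and the normal curve is fitted from the median and the quartile spread divided by 1.349.

```python
import math
import statistics


def power_transform(values, p):
    """Ladder-of-powers re-expression: log for p == 0, otherwise x ** p."""
    if any(v <= 0 for v in values):
        raise ValueError("values must be positive")
    return [math.log(v) if p == 0 else v ** p for v in values]


def quartiles(values):
    return statistics.quantiles(values, n=4, method="inclusive")


def quartile_skewness(values):
    """Positive for right skew, negative for left skew, near 0 if symmetric."""
    q1, med, q3 = quartiles(values)
    return ((q3 - med) - (med - q1)) / (q3 - q1)


def rootogram_residuals(values, n_bins=8):
    """sqrt(observed) - sqrt(expected) for equal-width bins over the data range."""
    q1, med, q3 = quartiles(values)
    fit = statistics.NormalDist(med, (q3 - q1) / 1.349)  # assumed resistant fit
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    observed = [0] * n_bins
    for v in values:
        observed[min(int((v - lo) / width), n_bins - 1)] += 1
    residuals = []
    for i, count in enumerate(observed):
        left, right = lo + i * width, lo + (i + 1) * width
        expected = len(values) * (fit.cdf(right) - fit.cdf(left))
        residuals.append(math.sqrt(count) - math.sqrt(expected))
    return residuals


# Synthetic right-skewed values (evenly spaced lognormal quantiles), not project data.
base = statistics.NormalDist(6.0, 0.8)
skewed = [math.exp(base.inv_cdf((i + 0.5) / 200)) for i in range(200)]

for p in (1, 0.5, 0):
    print("power", p, "skewness", round(quartile_skewness(power_transform(skewed, p)), 3))
print([round(r, 2) for r in rootogram_residuals(skewed)])
```

Skewness closer to zero after a transformation indicates improved symmetry, and large rootogram residuals mark bins where the data depart from the fitted normal curve.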