gexijin/inspect

Inspect: An R package for automated exploratory data analysis (EDA)

Written mostly by GPT-4, this R package renders an EDA report from an R Markdown file. It can generate such a report from any data set. You can also generate the report using the Shiny app RTutor. For questions or feedback, contact Steven Ge.

Install & use

library("remotes")
install_github("gexijin/inspect")
library(inspect)

eda(mtcars)   # Generate an EDA report for a data frame, e.g., mtcars
eda(iris, "Species")  # Specify a dependent/target variable

Main goal

Exploratory data analysis (EDA) is an essential first step in any data science project. Consider it the equivalent of an annual doctor’s check-up but for data science projects. I have long believed that EDA can be automated as the tasks are very general. While there are existing R packages for EDA such as DataExplorer, summarytools, tableone, and GGally, I have not found what I was looking for. Leveraging GPT-4, I was able to create an EDA script in just a few hours.

Given a data set, the report streamlines these steps:

  1. Summarize the data.
  2. Check for missing values and outliers.
  3. Plot the distribution of numerical variables using histograms and QQ plots. When excessive skewness is present, recommend a log transformation.
  4. Plot the distribution of categorical variables.
  5. Provide a general data overview with a heatmap and a correlation plot.
  6. Plot the correlation matrix (corrplot).
  7. Draw scatter plots to examine correlations between numerical variables.
  8. Use violin plots and ANOVA to study differences between groups defined by categorical variables.
  9. Test whether categorical variables are independent of each other, using Chi-squared tests and bar plots.
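The numeric side of several of these steps can be sketched in base R. This is a minimal illustration, not the package's actual code. The helper names (`count_outliers`, `skewness`), the 1.5 × IQR outlier rule, and the skewness cutoff of 1 are all assumptions chosen for the example.

```r
# Illustrative sketch only: base R, built-in data sets, hypothetical helpers.
df <- iris

# Step 1: data summary
str(df)
print(summary(df))

# Step 2: missing values per column, and outliers by the 1.5 * IQR rule
missing_counts <- colSums(is.na(df))
numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]

count_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  sum(x < q[1] - fence | x > q[2] + fence, na.rm = TRUE)
}
outlier_counts <- vapply(df[numeric_cols], count_outliers, integer(1))

# Step 3: moment-based skewness; flag columns where a log transform may help
# (assumed cutoff: |skewness| > 1, and only for strictly positive columns)
skewness <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5
}
skew <- vapply(df[numeric_cols], skewness, numeric(1))
all_positive <- vapply(df[numeric_cols], function(x) all(x > 0, na.rm = TRUE), logical(1))
suggest_log <- names(skew)[abs(skew) > 1 & all_positive]

# Steps 5-7: correlation matrix of the numeric columns
cor_matrix <- cor(df[numeric_cols], use = "pairwise.complete.obs")

# Step 8: one-way ANOVA of a numeric variable across groups
anova_fit <- aov(Sepal.Length ~ Species, data = df)
anova_p <- summary(anova_fit)[[1]][["Pr(>F)"]][1]

# Step 9: Chi-squared test of independence between two categorical variables
# (mtcars is used here because iris has only one categorical column)
chisq_result <- chisq.test(table(mtcars$am, mtcars$vs))

print(missing_counts)
print(outlier_counts)
print(round(skew, 2))
print(round(cor_matrix, 2))
print(anova_p)
print(chisq_result$p.value)
```

The plotting steps would sit alongside these calculations, for example `hist()` and `qqnorm()` for step 3 and `barplot()` for step 4.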

To use the R Markdown file directly, obtain a copy from this GitHub repository, replace the demo data file with your own, specify a target variable, and render the report.
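Rendering an R Markdown file from the R console could look like the sketch below. It does not use the repository's template: it writes a tiny stand-in `.Rmd` to a temporary file, and the `target` parameter is a hypothetical way of passing the target variable (the real template may expect you to edit the file instead). It assumes the `rmarkdown` package and pandoc are installed, and skips rendering otherwise.

```r
# Illustrative sketch only: a stand-in template, not the repository's file.
fence <- strrep("`", 3)  # code-chunk delimiter used inside the .Rmd
template <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"EDA sketch\"",
  "output: html_document",
  "params:",
  "  target: \"Species\"",   # hypothetical parameter name
  "---",
  "",
  paste0(fence, "{r}"),
  "summary(iris)",
  "table(iris[[params$target]])",
  fence
), template)

report <- NULL
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  # Render to HTML in the temporary directory, overriding the parameter
  report <- rmarkdown::render(
    input = template,
    output_dir = tempdir(),
    params = list(target = "Species"),
    envir = new.env(),
    quiet = TRUE
  )
  message("Report written to: ", report)
} else {
  message("rmarkdown or pandoc not available; skipping render.")
}
```

With the real template, `input` would be the path to your downloaded copy of the file.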

If that sounds like too much work, simply upload your data file to RTutor.ai, and click on the EDA tab. A comprehensive report will be generated in 2 minutes. The template was originally written for RTutor.

Example plots

[Figures: Missing, Correlation, Heatmap, Histogram, Barplot, Scatter plot, Boxplot, Combination]

About

A general approach to EDA
