P4: Exploratory Data Analysis

This project investigates a dataset of chemical properties and quality ratings of wine samples using exploratory data analysis techniques. The primary goal of the investigation is to identify the chemical properties that affect the quality of red wines.


Exploratory Data Analysis (EDA) is the numerical and graphical examination of data characteristics and relationships before formal, rigorous statistical analyses are applied. In this project, exploratory data analysis is conducted to explore the variables, structure, patterns, oddities, and underlying relationships of factors that affect wine quality.

The activities implemented in this project are:

  1. Choose a dataset from the provided list.

  2. Explore the dataset and plan the analysis.

  3. Perform univariate, bivariate and multivariate analysis.

  4. Document the analysis.
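
The progression in step 3 can be sketched roughly as follows. This is a minimal illustration, not the analysis in Red_Wine_Quality.rmd: it uses base R numerical summaries instead of the ggplot2 plots the project relies on, the toy data frame holds made-up values standing in for wineQualityReds.csv, and the column names (alcohol, volatile.acidity, quality) are assumptions about how the dataset is laid out.

```r
# Toy stand-in for wineQualityReds.csv. The values are made up for
# illustration, and the column names are assumed, not taken from the project.
wine <- data.frame(
  alcohol          = c(9.4, 9.8, 10.5, 11.2, 12.0, 12.8),
  volatile.acidity = c(0.70, 0.88, 0.60, 0.45, 0.36, 0.30),
  quality          = c(5, 5, 5, 6, 7, 7)
)

# Univariate: look at the distribution of one variable at a time.
alcohol_summary <- summary(wine$alcohol)
quality_counts  <- table(wine$quality)

# Bivariate: relate a single chemical property to the quality rating.
alcohol_quality_cor <- cor(wine$alcohol, wine$quality)

# Multivariate: compare several properties at once across quality levels.
by_quality <- aggregate(
  cbind(alcohol, volatile.acidity) ~ quality,
  data = wine,
  FUN  = mean
)

print(alcohol_summary)
print(quality_counts)
print(alcohol_quality_cor)
print(by_quality)
```

With the real dataset, the toy data frame would presumably be replaced by something like read.csv("wineQualityReds.csv"), and each numerical summary would be paired with a plot.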

Learning Outcome

This project helped me learn to use plots to understand the distribution of a variable and to check for patterns and relationships with other variables. I also learned to build a logical flow from single-variable analysis up to multivariate analysis.


  • wineQualityReds.csv – The dataset, publicly available for research from the UCI Machine Learning Repository.

  • Red_Wine_Quality.rmd – Main RMD project file containing the analysis.

  • Red_Wine_Quality.html – HTML file knitted from the project file.

  • Red_Wine_Quality.R – R code extract (with documentation).

  • References.txt – List of references.


This project was developed using RStudio Version 1.0.153 – © 2009-2017 RStudio, Inc (R Version 3.4.2).

The required packages are ggplot2, gridExtra, GGally, ggthemes, dplyr, knitr and memisc.
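
One possible way to check whether these packages are available before knitting the project file is sketched below. The helper name missing_packages is hypothetical and not part of the project; the install step is left commented out so the snippet does not change the local library unless that is wanted.

```r
# Packages listed as required by this project.
required_packages <- c("ggplot2", "gridExtra", "GGally", "ggthemes",
                       "dplyr", "knitr", "memisc")

# Hypothetical helper: return the subset of pkgs that cannot be loaded.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

to_install <- missing_packages(required_packages)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```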


Modified MIT License © Pranav Suri