hushunbo/boxplot

 
 

boxplot

Mary McDonagh

Fundamentals for Data Analysis

GMIT 2018

Table of Contents

1.0 Overview

2.0 Prerequisites

3.0 Github

4.0 Summary

5.0 References

1.0 Overview

This repository is intended to review the following in detail:

  • Summarise the history of the box plot and the situations in which it is used.
  • Demonstrate the use of the box plot using data of your choosing.
  • Explain any relevant terminology such as the terms quartile and percentile.
  • Compare the box plot to alternatives.

2.0 Prerequisites

Install Python 3 and all required Python dependencies.

3.0 Github

Create a repository and README in your GitHub account.

Download Data

In the command line, cd (change directory) to this repo's root directory. Execute the setup script: sh ./setup/setup.sh. Clone the GitHub repository through the command line.

Jupyter

Open the Jupyter notebook through Anaconda. Create a Python 3 .ipynb file and name it boxplot.

To begin my project I carried out an investigation of box plots. The initial steps I carried out are outlined below:

  • Research the history of the box plot.
  • Demonstrate the use of it with data.
  • Import required libraries.
  • Research and explanation of terminology.
  • Comparison of the box plot to alternatives.
  • Analyse all of the above information to summarise the dataset in detail.

After researching the history of boxplots I had a better understanding of their use. I defined the components of a boxplot and then simulated data with np.random.rand to display the various types. I displayed 2 data simulations and then proceeded to display different types of boxplots, such as:

  • a basic boxplot
  • a horizontal boxplot
  • a notched boxplot
  • a boxplot with changed outlier points
  • a boxplot without outlier points
  • a boxplot with a changed whisker length
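
The variants listed above can be sketched in a single figure. This is a minimal illustration rather than the notebook's own code: the seed, the scaling of the np.random.rand values, the three extreme points added to force outliers, and the marker style are all assumptions made for the example.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # assumed seed, only so the sketch is repeatable

# Simulated data: 50 uniform values scaled to [0, 100), plus three
# hand-picked extreme points (an assumption) so that outliers appear.
data = np.concatenate([np.random.rand(50) * 100, [190, 220, -120]])

# One entry per variant: a panel title and the boxplot keyword arguments.
variants = [
    ("Basic", {}),
    # Newer Matplotlib releases also accept orientation="horizontal".
    ("Horizontal", {"vert": False}),
    ("Notched", {"notch": True}),
    ("Changed outlier points", {"sym": "gD"}),  # green diamonds
    ("Outliers hidden", {"showfliers": False}),
    ("Longer whiskers", {"whis": 3.0}),  # default whisker reach is 1.5 x IQR
]

fig, axes = plt.subplots(2, 3, figsize=(12, 7))
for ax, (title, kwargs) in zip(axes.flat, variants):
    ax.boxplot(data, **kwargs)
    ax.set_title(title)
fig.tight_layout()
```

In a Jupyter notebook the figure renders when the cell finishes; from a script, fig.savefig with a file name of your choosing writes it to disk.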

To analyse this further, in Section 2.2 I used some random data generated with the numpy package, and also real data from the Iris dataset, to demonstrate the use of boxplots. I defined the variables in the Iris dataset and then used the describe function to see a statistical summary of it. I then created boxplots from this data, displaying the sepal and petal length and width.

In Section 2.3 I continued my analysis by explaining the different terminology associated with boxplots.

In Section 2.4 I carried out a comparison of boxplots to other alternatives.

Finally, in Section 3 I summarised my work for this project, defining boxplots and the tasks I covered throughout the project.

4.0 Summary

Boxplots capture a summary of data with a simple box and whiskers, allowing us to compare easily across groups. I have learned throughout my research that a boxplot is a type of graph that summarises a large amount of data in five numbers: the minimum, lower quartile, median, upper quartile and maximum. The lower quartile, median and upper quartile are the 25th, 50th and 75th percentiles of the sample. The advantage of comparing quartiles is that they are not strongly influenced by outliers. The median is the middle value of the ordered data, so half of the values lie below it; a quarter of the values lie below the lower quartile and a quarter lie above the upper quartile. Organising data in a box plot using these five numbers is an efficient way of dealing with data sets too large to be manageable in other graphs, such as line plots or stem-and-leaf plots.
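
As a small sketch of the five numbers described above, the snippet below computes them with numpy and then repeats the calculation after replacing the maximum with an extreme value. The helper name five_number_summary and the sample values are made up for illustration and do not come from the notebook; np.percentile is used with its default linear interpolation.

```python
import numpy as np

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    return (float(data.min()), float(q1), float(median),
            float(q3), float(data.max()))

# Illustrative values only.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 13]
with_outlier = sample[:-1] + [130]  # same data, but with one extreme maximum

summary = five_number_summary(sample)
summary_outlier = five_number_summary(with_outlier)

print("summary:             ", summary)
print("summary with outlier:", summary_outlier)
print("means:", np.mean(sample), np.mean(with_outlier))
```

Only the maximum changes between the two summaries, while the quartiles and median stay the same. The mean, by contrast, shifts noticeably, which is the sense in which quartiles resist outliers.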

Advantages of boxplots

  • they handle large amounts of data easily
  • allow you to efficiently display large amounts of data using the 5 concepts mentioned above

Disadvantages of boxplots

  • boxplots do not keep the exact values or the details of the distribution
  • they display only a simple summary of the data
  • generally boxplots need to be used in conjunction with other graphs (Reference 15).
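
To illustrate the first disadvantage, here is a hedged sketch with two made-up data sets that share the same five numbers even though their values are distributed differently. The names evenly_spread, clustered and five_numbers are hypothetical and chosen only for this example.

```python
import numpy as np

def five_numbers(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return (min(values), float(q1), float(median), float(q3), max(values))

# Illustrative data: one set spread evenly, one piled up on the values 3 and 7.
evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9]
clustered = [1, 3, 3, 3, 5, 7, 7, 7, 9]

same_summary = five_numbers(evenly_spread) == five_numbers(clustered)
print("identical five-number summary:", same_summary)

# Counting how often each value occurs exposes the difference in shape.
values, counts = np.unique(clustered, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))
```

A boxplot of either data set would look the same, which is why a histogram or similar plot is often shown alongside it.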

5.0 References
