# Visualizing Relationships and Distributions in Seaborn

### Getting into the seaborn domain

In [2]:
!pip3 install seaborn


In [4]:
import seaborn as sns

In [13]:
print(sns.__version__)

0.8.1


### What data do we work on?
Ideally, data in a tabular format, i.e. a dataframe.

#### We use the pandas library to create dataframes

In [5]:
import pandas as pd

In [14]:
print(pd.__version__)

0.22.0


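* Load the dataset from a .csv file into pandas and store it as a dataframe
* The file was downloaded from https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv
* Despite the .csv extension, the values are separated by semicolons, which is why we pass a custom `sep`
* Although the file already has a header row, we skip it (`skiprows=1`) and supply our own column names (`names=[...]`) so that the columns are labelled in the format we want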

In [7]:
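# skiprows=1 drops the file's own header row so that our labels in names= are used.
# A multi-character (regex) separator requires pandas' Python parsing engine.
wine_data = pd.read_csv('datasets/winequality-white.csv',
                        names=["Fixed Acidity", "Volatile Acidity", "Citric Acid", "Residual Sugar",
                               "Chlorides", "Free Sulfur Dioxide", "Total Sulfur Dioxide", "Density",
                               "pH", "Sulphates", "Alcohol", "Quality"],
                        skiprows=1,
                        sep=r'\s*;\s*', engine='python')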
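For comparison, here is a minimal sketch of the same load on a small inline sample, assuming the file starts with a quoted, semicolon-separated header row as the UCI file does. `SAMPLE` and `COLUMNS` are hypothetical names introduced here; the two data rows are copied from the `head()` output of this notebook. Passing `header=0` together with `names` tells pandas to discard the file's header and use our labels, which has the same effect as `skiprows=1`, and a single-character `sep=';'` lets pandas keep its default C parser.

```python
import io

import pandas as pd

# Hypothetical stand-in for datasets/winequality-white.csv: a quoted,
# semicolon-separated header row followed by two data rows from head().
SAMPLE = (
    '"fixed acidity";"volatile acidity";"citric acid";"residual sugar";'
    '"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";'
    '"pH";"sulphates";"alcohol";"quality"\n'
    "7;0.27;0.36;20.7;0.045;45;170;1.001;3;0.45;8.8;6\n"
    "6.3;0.3;0.34;1.6;0.049;14;132;0.994;3.3;0.49;9.5;6\n"
)

COLUMNS = ["Fixed Acidity", "Volatile Acidity", "Citric Acid", "Residual Sugar",
           "Chlorides", "Free Sulfur Dioxide", "Total Sulfur Dioxide", "Density",
           "pH", "Sulphates", "Alcohol", "Quality"]

# header=0 marks the first line as the header; names= then replaces those labels.
sample = pd.read_csv(io.StringIO(SAMPLE), sep=";", header=0, names=COLUMNS)

print(sample.shape)                # (2, 12)
print(sample["Quality"].tolist())  # [6, 6]
```

With the real file, replacing `io.StringIO(SAMPLE)` by the file path should give the same column layout as the cell above.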


In [8]:
wine_data.head()

| | Fixed Acidity | Volatile Acidity | Citric Acid | Residual Sugar | Chlorides | Free Sulfur Dioxide | Total Sulfur Dioxide | Density | pH | Sulphates | Alcohol | Quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.001 | 3.0 | 0.45 | 8.8 | 6 |
| 1 | 6.3 | 0.3 | 0.34 | 1.6 | 0.049 | 14.0 | 132.0 | 0.994 | 3.3 | 0.49 | 9.5 | 6 |
| 2 | 8.1 | 0.28 | 0.4 | 6.9 | 0.05 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | 10.1 | 6 |
| 3 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |
| 4 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |



In [9]:
# Display the last 5 rows
wine_data.tail()

| | Fixed Acidity | Volatile Acidity | Citric Acid | Residual Sugar | Chlorides | Free Sulfur Dioxide | Total Sulfur Dioxide | Density | pH | Sulphates | Alcohol | Quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4893 | 6.2 | 0.21 | 0.29 | 1.6 | 0.039 | 24.0 | 92.0 | 0.99114 | 3.27 | 0.5 | 11.2 | 6 |
| 4894 | 6.6 | 0.32 | 0.36 | 8.0 | 0.047 | 57.0 | 168.0 | 0.9949 | 3.15 | 0.46 | 9.6 | 5 |
| 4895 | 6.5 | 0.24 | 0.19 | 1.2 | 0.041 | 30.0 | 111.0 | 0.99254 | 2.99 | 0.46 | 9.4 | 6 |
| 4896 | 5.5 | 0.29 | 0.3 | 1.1 | 0.022 | 20.0 | 110.0 | 0.98869 | 3.34 | 0.38 | 12.8 | 7 |
| 4897 | 6.0 | 0.21 | 0.38 | 0.8 | 0.02 | 22.0 | 98.0 | 0.98941 | 3.26 | 0.32 | 11.8 | 6 |

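`head()` and `tail()` return the first and last `n` rows (5 by default) as a new dataframe and keep the original index labels, which is why the `tail()` output is labelled 4893 to 4897. A minimal sketch on a hypothetical 10-row frame (not the wine data), showing that they match positional slicing with `iloc`:

```python
import pandas as pd

# Hypothetical 10-row frame standing in for wine_data.
df = pd.DataFrame({"Quality": [6, 6, 6, 6, 6, 5, 5, 7, 6, 8]})

first = df.head()   # n defaults to 5
last3 = df.tail(3)  # n can also be passed explicitly

# head(n) and tail(n) behave like positional slices with iloc.
assert first.equals(df.iloc[:5])
assert last3.equals(df.iloc[-3:])

print(last3.index.tolist())  # [7, 8, 9]  (original labels are kept)
```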


*The dataset describes white wine samples and their quality.*
Each row records physicochemical measurements of one sample, such as its alcohol content, residual sugar and several acidity measures, together with an integer quality score.
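Because quality is stored as a small integer score, `value_counts()` is a quick way to see how many samples fall on each score. A minimal sketch on a handful of hypothetical scores (not the real data); in this notebook the series would be `wine_data["Quality"]`:

```python
import pandas as pd

# Hypothetical quality scores standing in for wine_data["Quality"].
quality = pd.Series([6, 6, 5, 7, 6, 5, 8], name="Quality")

# Count samples per score, then order by score rather than by frequency.
counts = quality.value_counts().sort_index()
print(counts.to_dict())  # {5: 2, 6: 3, 7: 1, 8: 1}
```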

In [12]:
print(len(wine_data.index))

4898
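`len(wine_data.index)` counts rows through the index; `len(wine_data)` and `wine_data.shape[0]` return the same number, and `shape` also reports the column count. A small sketch on a hypothetical 3-row frame whose values are copied from the first rows above:

```python
import pandas as pd

# Hypothetical 3-row, 2-column frame.
df = pd.DataFrame({"Alcohol": [8.8, 9.5, 10.1], "Quality": [6, 6, 6]})

n_rows = len(df.index)
assert n_rows == len(df) == df.shape[0]
print(df.shape)  # (3, 2), i.e. (rows, columns)
```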


In [11]:
wine_data.describe()

| | Fixed Acidity | Volatile Acidity | Citric Acid | Residual Sugar | Chlorides | Free Sulfur Dioxide | Total Sulfur Dioxide | Density | pH | Sulphates | Alcohol | Quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 |
| mean | 6.854788 | 0.278241 | 0.334192 | 6.391415 | 0.045772 | 35.308085 | 138.360657 | 0.994027 | 3.188267 | 0.489847 | 10.514267 | 5.877909 |
| std | 0.843868 | 0.100795 | 0.12102 | 5.072058 | 0.021848 | 17.007137 | 42.498065 | 0.002991 | 0.151001 | 0.114126 | 1.230621 | 0.885639 |
| min | 3.8 | 0.08 | 0.0 | 0.6 | 0.009 | 2.0 | 9.0 | 0.98711 | 2.72 | 0.22 | 8.0 | 3.0 |
| 25% | 6.3 | 0.21 | 0.27 | 1.7 | 0.036 | 23.0 | 108.0 | 0.991723 | 3.09 | 0.41 | 9.5 | 5.0 |
| 50% | 6.8 | 0.26 | 0.32 | 5.2 | 0.043 | 34.0 | 134.0 | 0.99374 | 3.18 | 0.47 | 10.4 | 6.0 |
| 75% | 7.3 | 0.32 | 0.39 | 9.9 | 0.05 | 46.0 | 167.0 | 0.9961 | 3.28 | 0.55 | 11.4 | 6.0 |
| max | 14.2 | 1.1 | 1.66 | 65.8 | 0.346 | 289.0 | 440.0 | 1.03898 | 3.82 | 1.08 | 14.2 | 9.0 |

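`describe()` summarizes each numeric column: `count` (non-missing values), `mean`, `std` (the sample standard deviation, `ddof=1`), `min`, `max`, and the 25th, 50th and 75th percentiles, where the 50% row is the median. For example, the table shows a median quality of 6 with observed scores running from 3 to 9. A sketch that recomputes these rows directly on a hypothetical column (values borrowed from the Alcohol entries above), as a sanity check:

```python
import math

import pandas as pd

# Hypothetical values standing in for one column of wine_data.
s = pd.Series([8.8, 9.5, 10.1, 9.9, 11.2], name="Alcohol")

summary = s.describe()

# Each row of describe() matches a direct computation on the series.
assert summary["count"] == s.count()
assert math.isclose(summary["mean"], s.mean())
assert math.isclose(summary["std"], s.std())        # sample std, ddof=1
assert math.isclose(summary["50%"], s.median())
assert math.isclose(summary["25%"], s.quantile(0.25))

print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
```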
