# Examine Feature Statistics of a Single Profile on Wine Quality

## Installing Dependencies

In [1]:
# Install these if you don't have them already
# %%sh
# pip install --upgrade pip -q
# pip install whylogs -U -q
# pip install pybars3 -U -q

## Loading the data

In [2]:
import pandas as pd

url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
wine = pd.read_csv(url,sep=";")
wine.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |


In [3]:
wine.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   fixed acidity         1599 non-null   float64
 1   volatile acidity      1599 non-null   float64
 2   citric acid           1599 non-null   float64
 3   residual sugar        1599 non-null   float64
 4   chlorides             1599 non-null   float64
 5   free sulfur dioxide   1599 non-null   float64
 6   total sulfur dioxide  1599 non-null   float64
 7   density               1599 non-null   float64
 8   pH                    1599 non-null   float64
 9   sulphates             1599 non-null   float64
 10  alcohol               1599 non-null   float64
 11  quality               1599 non-null   int64  
dtypes: float64(11), int64(1)
memory usage: 150.0 KB


Next, add some missing values to `citric acid` to see how they are reflected in the profile visualizer later on.

In [4]:
ixs = wine.iloc[100:110].index
wine.loc[ixs,'citric acid'] = None
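
To confirm that the assignment did what was intended, a quick check along these lines can help. This is a minimal sketch on a small stand-in frame: the `toy` name and its values are made up for illustration and are not the wine data. In the notebook, the same `isna().sum()` check would be applied to `wine`.

```python
import pandas as pd

# Hypothetical stand-in for the wine frame: 200 rows, one float column.
toy = pd.DataFrame({"citric acid": [0.25] * 200})

# Same pattern as the cell above: take the index labels of rows 100..109
# and overwrite that column with missing values.
ixs = toy.iloc[100:110].index
toy.loc[ixs, "citric acid"] = None

# isna() flags the overwritten rows, so the sum is the number of missing values.
missing_count = int(toy["citric acid"].isna().sum())
print(missing_count)
```

Running `wine["citric acid"].isna().sum()` after the cell above should likewise report 10, since the column had no missing values before.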

The `quality` feature is a numerical one, representing the wine's quality. For now you will leave it like that, but later on you will revisit this feature for easier comparison with a second profile.

Now you can profile your dataframe with `whylogs`. You will keep this profile in memory to use as input for the `NotebookProfileViewer`.

In [5]:
import datetime
from whylogs import get_or_create_session


session = get_or_create_session()
now = datetime.datetime.now()

with session.logger("test", dataset_timestamp=now) as logger:
    logger.log_dataframe(wine)
    target_profile = logger.profile

WARN: Missing config


Instantiate `NotebookProfileViewer` and set the target profile:

In [6]:
from whylogs.viz import NotebookProfileViewer

visualization = NotebookProfileViewer()
visualization.set_profiles(target_profile=target_profile)

## Feature Statistics

`feature_statistics` shows detailed statistics for one feature of one profile at a time. Pass the feature name and the profile name (`"target"` here, since only a target profile is set):

In [7]:
visualization.feature_statistics(feature_name="citric acid", profile="target")
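
One way to sanity-check what the viewer reports is to compute a few comparable numbers directly with pandas. The sketch below is only an assumption about which figures are worth comparing (count, missing, distinct, mean, min, max). `summarize_feature` is a hypothetical helper, not part of whylogs, and the `toy` frame stands in for `wine`. Because whylogs profiles are built from approximate sketches, some values (distinct counts, for example) may differ slightly from these exact ones.

```python
import pandas as pd


def summarize_feature(df: pd.DataFrame, feature_name: str) -> dict:
    """Exact summary statistics for one column (hypothetical helper)."""
    col = df[feature_name]
    return {
        "count": int(col.notna().sum()),   # non-missing values
        "missing": int(col.isna().sum()),  # missing values
        "distinct": int(col.nunique()),    # NaN is not counted as a value
        "mean": float(col.mean()),         # NaN is skipped by default
        "min": float(col.min()),
        "max": float(col.max()),
    }


# Made-up values for illustration only.
toy = pd.DataFrame({"citric acid": [0.0, 0.5, None, 0.5, 1.0]})
stats = summarize_feature(toy, "citric acid")
print(stats)
```

In the notebook, `summarize_feature(wine, "citric acid")` gives exact figures to compare against the rendered statistics.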

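In [11]:
import os

# Save the rendered feature statistics for `citric acid` under the current working directory
visualization.download(
    html=visualization.feature_statistics(feature_name="citric acid", profile="target"),
    html_file_name=os.path.join(os.getcwd(), "feature_stats_citric_acid"),
)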