# Romboost: Wine Findings

## Basic Information

### Wine Attributes

```python
import pandas as pd

wine_raw = pd.read_csv('../data/wine_raw.csv')
wine_raw.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 178 entries, 0 to 177
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Alcohol               178 non-null    float64
 1   Malic_Acid            178 non-null    float64
 2   Ash                   178 non-null    float64
 3   Ash_Alcanity          178 non-null    float64
 4   Magnesium             178 non-null    int64  
 5   Total_Phenols         178 non-null    float64
 6   Flavanoids            178 non-null    float64
 7   Nonflavanoid_Phenols  178 non-null    float64
 8   Proanthocyanins       178 non-null    float64
 9   Color_Intensity       178 non-null    float64
 10  Hue                   178 non-null    float64
 11  OD280                 178 non-null    float64
 12  Proline               178 non-null    int64  
dtypes: float64(11), int64(2)
memory usage: 18.2 KB
```


These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.

The attributes are:
- Alcohol: The percentage of alcohol content in the wine, which contributes to its body and mouthfeel.

- Malic Acid: A type of acid found in grapes, affecting the tartness and acidity of the wine.

- Ash: The inorganic residue left after the complete burning of organic matter, indicating the mineral content in the wine.

- Ash Alkalinity (column `Ash_Alcanity`): A measure of the total alkalinity of the ash in the wine.

- Magnesium: A mineral present in wine that can influence its taste and mouthfeel.

- Total Phenols: The total concentration of phenolic compounds in wine, which contribute to its color, flavor, and antioxidant properties.

- Flavanoids: A subgroup of phenolic compounds that contribute to the color and flavor of the wine.

- Nonflavanoid Phenols: Phenolic compounds other than flavanoids, which also contribute to the overall phenolic content.

- Proanthocyanins: A type of flavonoid that contributes to the wine's color and mouthfeel.

- Color Intensity: The depth of color in the wine, influenced by the presence of pigments and other compounds.

- Hue: The shade of color in the wine, often described in terms of red, purple, or brown.

- OD280 (Optical Density at 280 nm, measured on diluted wines): A measure of the absorbance of ultraviolet light by the wine at a specific wavelength, often used to assess protein content.

- Proline: An amino acid found in grapes, and its concentration in wine can be an indicator of grape maturity.


![distplot](../app/images/wine_distplots.png)

**Single Bell Curves:**
The distribution plots above show a single bell curve (one peak) for Ash, Ash Alkalinity, Magnesium, Nonflavanoid Phenols, Proanthocyanins and Color Intensity.

**Two Bell Curves:**
Malic Acid, Alcohol, Total Phenols, Flavanoids, Hue, OD280 and Proline show two bell curves (two peaks).

![correlations](../app/images/wine_correlations.png)

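The heatmap shows that several wine features are correlated. The scatter plots below group the correlated pairs by correlation strength: medium (0.4 to 0.6), strong (0.6 to 0.8) and very strong (0.8 to 1).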

**Medium Correlations**

![medium corr](../app/images/wine_correlated_scatters(0.4,0.6).png)

**Strong Correlations**

![strong corr](../app/images/wine_correlated_scatters(0.6,0.8).png)

**Very Strong Correlations**

![very strong corr](../app/images/wine_correlated_scatters(0.8,1).png)
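
The three groups of scatter plots above split feature pairs by correlation strength. The following is a minimal sketch of how such bands could be extracted with pandas. The helper name `correlated_pairs`, the use of the absolute Pearson coefficient, and the synthetic demo columns are assumptions for illustration, not necessarily how the plots were produced.

```python
import numpy as np
import pandas as pd


def correlated_pairs(df: pd.DataFrame, low: float, high: float) -> pd.DataFrame:
    """List feature pairs whose absolute Pearson correlation lies in [low, high)."""
    corr = df.corr(method="pearson")
    # Keep the strict upper triangle so each pair is listed once, without the diagonal.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna().reset_index()
    pairs.columns = ["feature_a", "feature_b", "r"]
    strength = pairs["r"].abs()
    in_band = (strength >= low) & (strength < high)
    return pairs[in_band].reset_index(drop=True)


# Synthetic stand-in for the wine table (hypothetical column names).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
demo = pd.DataFrame({
    "x": x,
    "y": x + 0.1 * rng.normal(size=500),  # almost a copy of x
    "z": rng.normal(size=500),            # unrelated to x and y
})

very_strong = correlated_pairs(demo, 0.8, 1.0)
medium = correlated_pairs(demo, 0.4, 0.6)
print(very_strong)
```

With the real data, calling `correlated_pairs(wine_raw, 0.6, 0.8)` would list the pairs in the "strong" band, assuming the bands are defined on the absolute value of the coefficient.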

#### **Correlations**
**Positive Correlations:**
Alcohol content is positively correlated with magnesium, color intensity, and proline. Wines with higher alcohol content tend to show higher levels of all three.

**Negative Correlations:**
Malic acid is negatively correlated with both flavanoids and hue. Wines with higher malic acid content tend to have lower flavanoid levels and lower hue values.

**Strong Relationships:**
Strong positive correlations exist between total phenols and both flavanoids and OD280. Additionally, there is a strong correlation between flavanoids and OD280. These strong relationships indicate that these pairs of attributes tend to vary together, and they may play significant roles in defining the overall chemical composition and quality of the wine.

**Trade-Offs:**
There is a moderate negative correlation between total phenols and nonflavanoid phenols, suggesting a potential trade-off between these two types of phenolic compounds. Wines with higher total phenols may have lower levels of nonflavanoid phenols.

**Quality Indicators:**
The strong correlations involving total phenols, flavanoids, and OD280 suggest that these attributes may be important indicators of wine quality. The dataset contains no quality rating, so this is a hypothesis rather than a measured result: wines with higher total phenols, flavanoids, and OD280 values might be associated with higher perceived quality.

**Color Relationships:**
Alcohol content is positively correlated with color intensity, suggesting that wines with higher alcohol content may exhibit more intense color. Additionally, there is a negative correlation between malic acid and hue, indicating that wines with higher malic acid may have a lower color hue.

### Dimensionality Reduction (PCA) and Clustering

**Number of components decision:**

![exp variance](../app/images/explained_variance_PCA.png)

**Around 70% of the variance is explained by only 3 components, so we choose 3 components for the dimensionality reduction.**
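
The cumulative explained variance behind a plot like the one above can be computed as in the sketch below. It assumes scikit-learn's `PCA` applied to standardized features; the standardization step, the helper names and the synthetic demo data are assumptions for illustration and may differ from the original analysis.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cumulative_explained_variance(X) -> np.ndarray:
    """Cumulative share of variance captured by the first 1, 2, ... components."""
    # Assumption: features are standardized first, because the wine columns
    # use very different scales (e.g. Proline vs. Hue).
    scaled = StandardScaler().fit_transform(X)
    pca = PCA().fit(scaled)
    return np.cumsum(pca.explained_variance_ratio_)


def components_needed(cumulative: np.ndarray, threshold: float) -> int:
    """Smallest number of components whose cumulative share reaches the threshold."""
    count = int(np.searchsorted(cumulative, threshold)) + 1
    return min(count, len(cumulative))


# Synthetic demo: 5 observed features driven by 2 hidden factors plus a little noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X_demo = factors @ mixing + 0.05 * rng.normal(size=(200, 5))

cumulative = cumulative_explained_variance(X_demo)
print(np.round(cumulative, 3))
print(components_needed(cumulative, 0.70))
```

Reading the number of components off a threshold, as `components_needed` does, is one common convention; the choice of threshold remains a judgment call.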

**Number of clusters:**

We use both the elbow method and the Silhouette Score to choose the number of clusters.

![cluster decision](../app/images/wine_cluster_tests-k.png)

- **Elbow Method:** The WCSS (within-cluster sum of squares) starts to decrease at a slower rate from 3 clusters onwards, which marks the elbow. Beyond that point the compactness of the clusters does not improve significantly.

- **Silhouette Score:** The Silhouette Score, which measures how well separated the clusters are, is also maximized at 3 clusters.

**We are going to work with 3 clusters for the analysis.**
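
Both criteria can be computed in a single loop over candidate cluster counts. The sketch below assumes K-means from scikit-learn (WCSS is its `inertia_` attribute); the clustering algorithm, the helper name `scan_cluster_counts`, the range of k and the synthetic blobs are assumptions, since the report does not state them.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def scan_cluster_counts(X, k_values=range(2, 9), random_state=0):
    """Fit K-means for each k and record WCSS (inertia) and silhouette score."""
    rows = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        rows.append({
            "k": k,
            "wcss": model.inertia_,  # within-cluster sum of squares
            "silhouette": silhouette_score(X, model.labels_),
        })
    return rows


# Synthetic demo: three well-separated blobs in 2D.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X_demo = np.vstack([c + 0.5 * rng.normal(size=(50, 2)) for c in centers])

scan = scan_cluster_counts(X_demo)
best_by_silhouette = max(scan, key=lambda row: row["silhouette"])["k"]
for row in scan:
    print(row)
```

The scan starts at k = 2 because the silhouette score is undefined for a single cluster. In the analysis above, `X` would be the 3-component PCA projection of the wine features.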

**Feature influence on clusters:**

![pca biplot](../app/images/PCA_biplot.png)

- **Cluster 0:** Hue, OD280 (diluted wines), Flavanoids and Proanthocyanins tend to form one cluster.
- **Cluster 1:** Ash Alkalinity, Nonflavanoid Phenols and Malic Acid tend to form a second cluster.
- **Cluster 2:** Color Intensity, Ash, Alcohol, Magnesium and Proline tend to form the third cluster.
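
The feature groupings listed above are read from the direction of the arrows in the biplot, which are typically drawn from the PCA loadings. As a rough sketch, the loadings can be tabulated as follows; standardizing the features, the helper name `pca_loadings` and the synthetic columns `f1` to `f4` are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_loadings(df: pd.DataFrame, n_components: int = 3) -> pd.DataFrame:
    """Table of loadings: one row per feature, one column per principal component."""
    scaled = StandardScaler().fit_transform(df)  # assumption: standardized inputs
    pca = PCA(n_components=n_components).fit(scaled)
    columns = [f"PC{i + 1}" for i in range(n_components)]
    # components_ has shape (n_components, n_features), so transpose it.
    return pd.DataFrame(pca.components_.T, index=df.columns, columns=columns)


# Synthetic demo: f1 and f2 move together, f3 and f4 are independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
demo = pd.DataFrame({
    "f1": base + 0.1 * rng.normal(size=200),
    "f2": base + 0.1 * rng.normal(size=200),
    "f3": rng.normal(size=200),
    "f4": rng.normal(size=200),
})

loadings = pca_loadings(demo, n_components=2)
print(loadings.round(2))
```

Features whose loadings have similar sign and magnitude on the plotted components point in similar directions in the biplot, which is the pattern the cluster descriptions rely on.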

**Cluster Characteristics:**

![mean deviations](../app/images/wine_cluster_mean_deviations.png)

The chart above shows the mean deviation of each cluster from the overall mean for each feature. It confirms the feature groupings described for the biplot.
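
The quantity plotted in the mean deviation chart can be computed with a group-by. The sketch below is one possible implementation; the helper name `cluster_mean_deviation`, the optional scaling to standard-deviation units and the toy data are assumptions, as the report does not say whether the deviations were scaled.

```python
import numpy as np
import pandas as pd


def cluster_mean_deviation(df: pd.DataFrame, labels, in_std_units: bool = False) -> pd.DataFrame:
    """Per-cluster feature means minus the overall feature means."""
    deviation = df.groupby(np.asarray(labels)).mean() - df.mean()
    if in_std_units:
        # Optional: express deviations in standard deviations so that
        # features on different scales can share one chart.
        deviation = deviation / df.std(ddof=0)
    return deviation


# Toy data with hypothetical columns and two clusters.
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 10.0, 20.0, 20.0]})
toy_labels = [0, 0, 1, 1]

raw = cluster_mean_deviation(toy, toy_labels)
scaled = cluster_mean_deviation(toy, toy_labels, in_std_units=True)
print(raw)
```

A positive value means the cluster sits above the overall average for that feature, a negative value below it.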

![cluster characteristics](../app/images/wine_cluster_characteristics.png)

![radar](../app/images/wine_cluster_radar.png)

## Cluster Presentations

### **Cluster 0: Light and Refreshing**
#### Characteristics:
- **Lower Alcohol:** Wines in this cluster have lower alcohol content, contributing to a lighter and potentially more refreshing drinking experience.
- **Lower Malic Acid:** Reduced malic acid levels may result in wines with less tartness and acidity.
- **Lower Ash:** Wines in this cluster have lower ash content, potentially influencing the mineral profile.
- **Medium Ash Alkalinity:** Moderate ash alkalinity may indicate a balanced acidity level.
- **Lower Magnesium:** Reduced magnesium levels may influence the mouthfeel and texture of the wine.
- **Medium Total Phenols:** A moderate level of total phenols suggests a balanced phenolic composition.
- **Medium Nonflavanoid Phenols:** These wines exhibit a moderate level of nonflavanoid phenols, contributing to the overall phenolic profile.
- **Medium Proanthocyanins:** A balanced level of proanthocyanins contributes to color intensity and potential health benefits.
- **Lower Color Intensity:** The wines in this cluster exhibit lower color intensity.
- **Medium OD280:** A medium OD280 value indicates a balanced protein content.
- **Lower Proline:** Reduced proline levels may contribute to a different mouthfeel.
- **Hue same as Cluster 2:** Similar hue values to Cluster 2 may suggest a shared color tone.

#### Interpretation:
- Cluster 0 wines are characterized by their lighter profile, offering a refreshing and easy-drinking experience.
- Medium levels of Flavanoids, Proanthocyanins, and OD280 hint at a medium level of wine quality.
- **Potential Appeal:** Individuals who enjoy lighter, well-balanced wines with moderate acidity and a nuanced flavor profile may find Cluster 0 wines appealing.
- **Food Pairing:** These wines may pair well with lighter dishes, salads, and seafood.


### **Cluster 1: Balanced and Versatile**
#### Characteristics:
- **Medium Alcohol:** Wines in this cluster have a moderate alcohol content, providing a balanced and versatile drinking experience.
- **Higher Malic Acid:** Elevated malic acid levels may contribute to increased tartness and acidity.
- **Medium Ash:** These wines exhibit a moderate ash content, influencing the mineral profile.
- **Higher Ash Alkalinity:** Elevated ash alkalinity may affect the acid balance of the wine.
- **Medium Magnesium:** A moderate level of magnesium influences the mouthfeel and texture of the wine.
- **Lower Total Phenols:** Wines in this cluster have lower total phenolic content, potentially resulting in a lighter flavor profile.
- **Higher Nonflavanoid Phenols:** Elevated nonflavanoid phenols contribute to a unique phenolic composition.
- **Medium Proanthocyanins:** A balanced level of proanthocyanins adds to the overall structure of the wine.
- **Higher Color Intensity:** Wines in this cluster exhibit increased color intensity.
- **Lower Hue:** Reduced hue values indicate a potentially different color tone.
- **Lower OD280:** A lower OD280 value suggests a different protein content.
- **Medium Proline:** Moderate proline levels contribute to the overall mouthfeel.

#### Interpretation:
- Cluster 1 wines offer a balanced profile with a moderate alcohol level, elevated acidity, and unique phenolic composition.
- Lower levels of Flavanoids, Proanthocyanins, and OD280 hint at a lower level of wine quality.
- **Potential Appeal:** Those seeking wines with versatility, moderate complexity, and a balanced acidity may find Cluster 1 wines appealing.
- **Food Pairing:** These wines may pair well with a variety of dishes, including poultry, pasta, and grilled vegetables.


### **Cluster 2: Bold and Complex**
#### Characteristics:
- **Higher Alcohol:** Wines in this cluster have higher alcohol content, contributing to a fuller body and potentially warmer sensation.
- **Medium Malic Acid:** A moderate level of malic acid contributes to a balanced acidity.
- **Higher Ash:** These wines exhibit higher ash content, potentially influencing the mineral profile.
- **Lower Ash Alkalinity:** Reduced ash alkalinity may affect the acid balance of the wine.
- **Higher Magnesium:** Elevated magnesium levels influence the mouthfeel and texture of the wine.
- **Higher Total Phenols:** Wines in this cluster have higher total phenolic content, suggesting a richer flavor profile.
- **Lower Nonflavanoid Phenols:** Reduced nonflavanoid phenols contribute to a distinct phenolic composition.
- **Higher Proanthocyanins:** Elevated proanthocyanin levels add to the overall structure and potential health benefits.
- **Medium Color Intensity:** The wines in this cluster exhibit a moderate level of color intensity.
- **Hue same as Cluster 0:** Similar hue values to Cluster 0 may suggest a shared color tone.
- **Higher OD280:** An elevated OD280 value indicates a potentially higher protein content.
- **Higher Proline:** Elevated proline levels contribute to the overall mouthfeel.

#### Interpretation:
- Cluster 2 wines are characterized by their boldness, richer flavor profile, and fuller body.
- Higher levels of Flavanoids, Proanthocyanins, and OD280 hint at a higher level of wine quality.
- **Potential Appeal:** Enthusiasts seeking more intense and complex wines with higher alcohol content and robust flavors may find Cluster 2 wines appealing.
- **Food Pairing:** These wines may pair well with hearty dishes, red meats, and aged cheeses.

