# Wine Quality Analysis 

# On the data set

For more information, read [Cortez et al., 2009].

## Vinho Verde
Vinho Verde is not a grape variety but a DOC (Denominação de Origem Controlada, a protected designation of origin) for wine production. The name literally means "green wine" but is understood as "young wine": the wine is released three to six months after the grapes are harvested.

## Input variables (based on physicochemical tests):
1. Fixed acidity: acids are major wine constituents and contribute greatly to a wine’s taste. Total acidity is usually divided into two groups: the volatile acids and the non-volatile or fixed acids. Fixed acids found in wine include tartaric, malic, citric, and succinic acid. This variable is expressed in g(tartaric acid)/dm³ in the datasets.
2. Volatile acidity: the acids, mainly acetic acid, that evaporate easily; at high levels the wine starts to taste like vinegar. In the U.S., the legal limits for volatile acidity are 1.2 g/L for red table wine and 1.1 g/L for white table wine. In these datasets, volatile acidity is expressed in g(acetic acid)/dm³.
3. Citric acid is one of the fixed acids found in wine. It’s expressed in g/dm³ in the two datasets.
4. Residual sugar is the sugar remaining after fermentation stops. It’s expressed in g/dm³ in the datasets.
5. Chlorides can be a significant contributor to saltiness in wine. Here, they are expressed in g(sodium chloride)/dm³.
6. Free sulfur dioxide: the portion of the sulfur dioxide (SO2) that is still available to protect the wine against oxidation and spoilage. Once it has reacted it becomes "bound"; bound SO2 remains in the wine but no longer protects it. Winemakers therefore try to keep as large a share of the SO2 as possible in free form. This variable is expressed in mg/dm³ in the data.
7. Total sulfur dioxide is the sum of the bound and the free sulfur dioxide. Here, it’s expressed in mg/dm³. There are legal limits for sulfur levels in wines: in the EU, red wines may contain at most 160 mg/L, while white and rosé wines may contain about 210 mg/L. Sweet wines are allowed to have 400 mg/L. Legal limits are set at 350 mg/L for the USA and 250 mg/L for Australia.
8. Density is commonly used to track the conversion of sugar to alcohol: dissolved sugar raises density, alcohol lowers it. Here, it’s expressed in g/cm³.
9. pH, or the potential of hydrogen, is a numeric scale that specifies the acidity or basicity of the wine. Solutions with a pH less than 7 are acidic, while solutions with a pH greater than 7 are basic. With a pH of 7, pure water is neutral. Most wines have a pH between 2.9 and 3.9 and are therefore acidic.
10. Sulphates are an additive that contributes to the sulfur dioxide levels. In this case, they are expressed in g(potassium sulphate)/dm³.
11. Alcohol: the percentage of alcohol varies from wine to wine. It’s expressed in % vol.
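
The relationship between free, bound, and total SO2 described in items 6 and 7 can be turned into simple derived quantities. The following is a minimal sketch and not part of the original analysis: the function names and the sample values are made up for illustration, and the legal limit is passed in as a parameter because limits differ by region and wine style.

```python
def bound_so2(total_mg_dm3: float, free_mg_dm3: float) -> float:
    """Bound SO2 (mg/dm3): total SO2 minus the free part."""
    if free_mg_dm3 > total_mg_dm3:
        raise ValueError("free SO2 cannot exceed total SO2")
    return total_mg_dm3 - free_mg_dm3


def free_so2_share(total_mg_dm3: float, free_mg_dm3: float) -> float:
    """Fraction of the total SO2 that is still free (0.0 to 1.0)."""
    if total_mg_dm3 <= 0:
        raise ValueError("total SO2 must be positive")
    return free_mg_dm3 / total_mg_dm3


def exceeds_limit(total_mg_dm3: float, limit_mg_l: float) -> bool:
    """1 dm3 equals 1 L, so mg/dm3 values compare directly with mg/L limits."""
    return total_mg_dm3 > limit_mg_l


# Hypothetical sample; the values are chosen only for illustration.
sample = {"free_sulfur_dioxide": 30.0, "total_sulfur_dioxide": 120.0}
bound = bound_so2(sample["total_sulfur_dioxide"], sample["free_sulfur_dioxide"])
share = free_so2_share(sample["total_sulfur_dioxide"], sample["free_sulfur_dioxide"])
print(bound, share)
```

Derived columns like these could later serve as additional model features, which is a design choice rather than something the data set prescribes.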

| Variable Name        | Role    | Type        | Description                                                                                               | Units   | Missing Values |
| -------------------- | ------- | ----------- | --------------------------------------------------------------------------------------------------------- | ------- | -------------- |
| fixed_acidity        | Feature | Continuous  | tartaric, malic, citric, and succinic acids, expressed as tartaric acid                                   | g/dm^3  | no             |
| volatile_acidity     | Feature | Continuous  |                                                                                                           | g/dm^3  | no             |
| citric_acid          | Feature | Continuous  |                                                                                                           | g/dm^3  | no             |
| residual_sugar       | Feature | Continuous  |                                                                                                           | g/dm^3  | no             |
| chlorides            | Feature | Continuous  |                                                                                                           | g/dm^3  | no             |
| free_sulfur_dioxide  | Feature | Continuous  |                                                                                                           | mg/dm^3 | no             |
| total_sulfur_dioxide | Feature | Continuous  |                                                                                                           | mg/dm^3 | no             |
| density              | Feature | Continuous  |                                                                                                           | g/cm^3  | no             |
| pH                   | Feature | Continuous  |                                                                                                           | -       | no             |
| sulphates            | Feature | Continuous  |                                                                                                           | g/dm^3  | no             |
| alcohol              | Feature | Continuous  |                                                                                                           | vol.%   | no             |
| quality              | Target  | Integer     | score between 0 and 10                                                                                    | -       | no             |
| color                | Other   | Categorical | red or white                                                                                              | -       | no             |
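
Before any analysis, it can help to check that the loaded rows actually match the table above. The sketch below uses only the Python standard library; `EXPECTED_COLUMNS` mirrors the variable names in the table, while `check_rows` and the two sample rows are hypothetical and exist only for illustration. The original files may use different column names or separators, so the header would need adjusting.

```python
import csv
import io

# Column names taken from the variable table above.
EXPECTED_COLUMNS = [
    "fixed_acidity", "volatile_acidity", "citric_acid", "residual_sugar",
    "chlorides", "free_sulfur_dioxide", "total_sulfur_dioxide", "density",
    "pH", "sulphates", "alcohol", "quality", "color",
]


def check_rows(rows):
    """Return a list of readable problems; an empty list means no issue was found."""
    problems = []
    for i, row in enumerate(rows):
        missing = [c for c in EXPECTED_COLUMNS if row.get(c) in (None, "")]
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")
            continue
        if not 0 <= int(row["quality"]) <= 10:
            problems.append(f"row {i}: quality {row['quality']} outside 0-10")
        if row["color"] not in ("red", "white"):
            problems.append(f"row {i}: unknown color {row['color']!r}")
    return problems


# Made-up rows for illustration; the second one has no alcohol value.
SAMPLE_CSV = ",".join(EXPECTED_COLUMNS) + "\n" + (
    "7.4,0.70,0.00,1.9,0.076,11,34,0.9978,3.51,0.56,9.4,5,red\n"
    "6.3,0.30,0.34,1.6,0.049,14,132,0.9940,3.30,0.49,,6,white\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
problems = check_rows(rows)
print(problems)
```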

## Output variable (based on sensory data): 

Quality: wine experts graded the wine quality between 0 (very bad) and 10 (excellent). The final score is the median of at least three evaluations. Some analysts combine these levels into low-, medium-, and high-quality wines.
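
Combining the scores into three levels could look like the sketch below. The cut-offs (`low_max=4`, `high_min=7`) are assumptions chosen for illustration, not values from the source, and the list of scores is made up.

```python
from collections import Counter


def quality_band(score: int, low_max: int = 4, high_min: int = 7) -> str:
    """Map a 0-10 quality score to Low / Medium / High (cut-offs are assumptions)."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= low_max:
        return "Low"
    if score >= high_min:
        return "High"
    return "Medium"


# Made-up scores, only to show how the bands are counted.
scores = [3, 5, 5, 6, 6, 6, 7, 8]
band_counts = Counter(quality_band(s) for s in scores)
print(band_counts)
```

Because most wines tend to fall into the middle band, the choice of cut-offs strongly affects how balanced the three classes are.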

Comments on output
* As a quality measure this might vary a lot!
* Say: "Wine experts prefer ..."

## Taste and physicochemical components

What are the parameters in wine tasting?

- **sourness**
  - acids cause sourness
  - ranges
    - low pH (2.8 - 3.2): the wine tastes sour
    - medium pH (3.2 - 3.5): fresh acidity, lively
    - high pH (3.5 - 4.0): dull, bland, flat and tired
  - The acidity in wine is an important component of its quality and taste. It adds sharpness to the flavors and is detected most readily by a prickling sensation on the sides of the tongue and a mouth-watering aftertaste. Of particular importance is the balance of acidity against the sweetness of the wine (the leftover residual sugar) and its more bitter components (most notably tannins, but also other phenolics). A wine with too much acidity will taste excessively sour and sharp. A wine with too little acidity will taste flabby and flat, with less defined flavors.
  - **Volatile acidity**
    - As a by-product of fermentation, acetic acid (volatile acid) can impair wine quality above a certain level.
    - Acetic acid is considered the main component of volatile acidity and can make a wine taste unbalanced and overly acidic. (However, small amounts of acetic acid are actually beneficial for the yeast, which uses it to synthesize lipids for the cell membrane.)
- **sweetness**
  - Sugars and alcohol enhance a wine's sweetness, which is balanced by acidity
  - In wine tasting, humans are least sensitive to the taste of sweetness (in contrast to sensitivity to bitterness or sourness) with the majority of the population being able to detect sugar or "sweetness" in wines between 1% and 2.5% residual sugar.
- **bitterness**
  - mainly determined by the tannins
  - tannins are not measured directly in this data set, but as dissolved solids they contribute to density
- The **density** of a wine refers to its consistency and viscosity. A dense wine has a higher concentration of solids, such as tannins and sugars, and feels heavier and thicker in the mouth. Density plays an important role in the structure, mouthfeel and aromas of a wine: a dense wine usually has a stronger, more powerful structure, and its more abundant tannins and other components give it a longer ageing potential. Structure, in turn, is decisive for a wine's complexity.
- **Alcohol**
- **Body**
  - alcohol
  - density
- **sulfites, sulphates**
  - preservative, additive
  - Lower pH shifts the equilibrium towards molecular (gaseous) SO2, which is the active form, while at higher pH more SO2 is found in the inactive sulfite and bisulfite forms. The molecular SO2 is active as an antimicrobial and antioxidant, and this is also the form which may be perceived as a pungent odor at high levels.
- **saltiness**
  - The level of chloride and sodium ions in wine depends essentially on the geographic, geologic and climatic conditions of vine culture. As a general rule, the levels of these ions are low. They are higher in wines from vineyards that are near the sea coast, have brackish subsoil, or have arid ground irrigated with salt water. The molar ratio of Cl⁻/Na⁺ therefore varies significantly and can even be close to one (1), which could imply that salt (NaCl) was added to the wine.
- Most of the numerous trace elements determined by chemical analysis are of little significance for the taste of the wine.
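
The pH dependence of molecular SO2 mentioned under sulfites can be estimated from two columns of the data set, free SO2 and pH. The sketch below uses the commonly cited approximation based on the first dissociation of sulfurous acid; the pKa value of 1.81 is an assumption (it shifts with temperature and alcohol content), and the function name and sample values are made up for illustration.

```python
# Assumed first dissociation constant of SO2 in wine near 20 °C.
PKA1_SO2 = 1.81


def molecular_so2(free_so2_mg_l: float, ph: float, pka1: float = PKA1_SO2) -> float:
    """Estimate molecular SO2 (mg/L) from free SO2 (mg/L) and pH.

    molecular = free / (1 + 10 ** (pH - pKa1))
    """
    if free_so2_mg_l < 0:
        raise ValueError("free SO2 cannot be negative")
    return free_so2_mg_l / (1.0 + 10.0 ** (ph - pka1))


# Same free SO2, two different pH values (illustrative numbers only).
for ph in (3.2, 3.5):
    print(ph, round(molecular_so2(30.0, ph), 2))
```

The same amount of free SO2 yields roughly half as much of the active form at pH 3.5 as at pH 3.2, which is why pH and free SO2 are better read together than separately.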

## Missing data
- input
  - origin
  - grape family
- Must
  - sugar before fermentation (Natural and added)
- Physicochemical
  - residual sugar composition (glucose vs fructose)
  - acids: tartaric vs malic, to estimate taste
  - tannin
  - higher alcohols, especially glycerin
  - lactic acid: what about that? 
- Smell

## Modeling 

Components and Processes
- Tasters 
  - tasting / expectation machine
    - have a taste profile
    - rate good if taste profile is met
  - categorization / normalization machines: wine taste (physicochemical composition) -> quality
  - group with a wide variety somewhere between tradition and modernity
    - a lot of expectation what a good wine is, because the taste is very complex
    - work like well-oiled machines: the person has tasted many wines and has found their taste profile
    - might be open to new tastes and surprises
  - What is quality? What is considered a good vinho verde? Are there certain quality characteristics to it? How strongly does the wine vary?
- Wine
  - taste profile that arises from physicochemical composition
  - physicochemical composition is very complex and we know only a few values
  - wine is created by fermentation of grapes
    - yeast turns sugar into alcohol and carbon dioxide
    - a lot of other processes take place
    - additives: sugar, sulphates
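
The fermentation step, in which yeast converts sugar into alcohol and carbon dioxide, can be put into numbers with the overall reaction C6H12O6 -> 2 C2H5OH + 2 CO2. The sketch below computes the theoretical yield only; real fermentations produce less alcohol because some sugar goes into yeast growth and by-products. The function names are made up, and the ethanol density of 0.789 g/mL (near 20 °C) is an assumption.

```python
# Overall reaction: C6H12O6 -> 2 C2H5OH + 2 CO2
M_GLUCOSE = 180.16  # g/mol
M_ETHANOL = 46.07   # g/mol
M_CO2 = 44.01       # g/mol
ETHANOL_DENSITY_G_PER_ML = 0.789  # assumed, near 20 °C


def theoretical_yields(sugar_g_per_l: float) -> tuple[float, float]:
    """Return (ethanol g/L, CO2 g/L) if all sugar were fermented."""
    mol_sugar = sugar_g_per_l / M_GLUCOSE
    return 2 * mol_sugar * M_ETHANOL, 2 * mol_sugar * M_CO2


def theoretical_alcohol_pct_vol(sugar_g_per_l: float) -> float:
    """Upper bound for alcohol in % vol., ignoring volume changes and losses."""
    ethanol_g_per_l, _ = theoretical_yields(sugar_g_per_l)
    ethanol_ml_per_l = ethanol_g_per_l / ETHANOL_DENSITY_G_PER_ML
    return ethanol_ml_per_l / 10.0  # mL per 1000 mL -> percent


print(round(theoretical_alcohol_pct_vol(200.0), 2))
```

This also shows why the sugar content of the must, listed under missing data, would be informative: together with residual sugar it largely determines the alcohol level.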


Thoughts on dos and don'ts if you want to find out what makes a good wine:
- find the values that primarily determine wine quality 
- filter out noise to get to conclusions about wine quality
- Abstraction
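
One simple way to look for the values that primarily relate to wine quality is to rank the features by the strength of their linear correlation with the quality score. The sketch below is a starting point under stated assumptions, not the method used in this project: `pearson` and `rank_features` are hypothetical helpers, the rows are made-up toy data, and correlation only captures linear, pairwise relationships.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / (spread_x * spread_y)


def rank_features(rows, target="quality"):
    """Sort features by absolute correlation with the target, strongest first."""
    features = [name for name in rows[0] if name != target]
    ys = [row[target] for row in rows]
    scores = {f: pearson([row[f] for row in rows], ys) for f in features}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up toy rows, only to show the mechanics.
TOY_ROWS = [
    {"alcohol": 9.4, "volatile_acidity": 0.70, "quality": 5},
    {"alcohol": 9.8, "volatile_acidity": 0.88, "quality": 5},
    {"alcohol": 10.5, "volatile_acidity": 0.40, "quality": 6},
    {"alcohol": 12.0, "volatile_acidity": 0.30, "quality": 7},
    {"alcohol": 11.2, "volatile_acidity": 0.35, "quality": 6},
]

ranking = rank_features(TOY_ROWS)
for name, r in ranking:
    print(f"{name}: {r:+.2f}")
```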

On Sulphates
* from the research paper: they might have a larger influence on taste because they might be related to the fermenting nutrition, which is very important to improve the wine aroma.

# The Task

BlueBerry Winery's primary goal is to sell their product at a proper price based on the quality. They requested CODE Analytics to create a Wine Quality Analytics System to help them determine the quality of the wines produced based on their composition.

Your Machine Learning model is ready, but you now need to present it to BlueBerry Winery's marketing team. Prepare some slides to showcase your model and your findings, including:

- Most meaningful insights from the exploratory analysis (EDA) that had an impact on this part of the project
- (Very) brief explanation of the steps you followed in your process. Select the most important.
- What features did you choose for your model and why?
- Is there any additional data needed in order to deepen or corroborate your findings?
- How accurate was your model?
- Name 2 or 3 business recommendations for BlueBerry Winery

Remember that you're presenting to a non-technical audience so avoid using jargon, difficult terms, or getting too much into the details of the process. What they need is an overview of how you reached your conclusions.

You already know the basic guidelines for a presentation. Don't forget to come up with a storyline to convey your message!

Your presentation should be 12-15 minutes long and consist of 6-12 slides.