# Wine

### Introduction:

This exercise is adapted from the UCI Wine dataset.
Its only purpose is to practice deleting data with pandas.

### Step 1. Import the necessary libraries

```python
import pandas as pd
import numpy as np
```

### Step 2. Import the dataset from this [address](https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data). 

### Step 3. Assign it to a variable called wine

```python
wine = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data', sep=',')
wine.head()
```

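|   | 1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | .28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 |
| 1 | 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 2 | 1 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 |
| 3 | 1 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |
| 4 | 1 | 14.2 | 1.76 | 2.45 | 15.2 | 112 | 3.27 | 3.39 | 0.34 | 1.97 | 6.75 | 1.05 | 2.85 | 1450 |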
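`wine.data` has no header row, so with the default header inference the call above turns the first sample into column labels: that is why the output's header reads `1, 14.23, 1.71, ...` and the data start at 13.2. A minimal sketch of the difference, run offline on three lines copied from the output above through `io.StringIO` (a stand-in for the URL, used here only so the example is self-contained):

```python
import io

import pandas as pd

# Three lines of wine.data, copied from the output above (the file has no header row).
SAMPLE = (
    "1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065\n"
    "1,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050\n"
    "1,13.16,2.36,2.67,18.6,101,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185\n"
)

# Default header inference: the first line becomes the column labels.
inferred = pd.read_csv(io.StringIO(SAMPLE))

# header=None: every line is data and the columns are labelled 0..13.
no_header = pd.read_csv(io.StringIO(SAMPLE), header=None)

print(inferred.shape)                  # (2, 14)
print(no_header.shape)                 # (3, 14)
print(list(inferred.columns[:3]))      # ['1', '14.23', '1.71']
print(round(no_header.iloc[0, 1], 2))  # 14.23
```

Against the real file, `pd.read_csv(url, header=None)` keeps all 178 samples. The later steps select columns by position, so they would still apply, although the outputs in this notebook were produced with the header-inferring call.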


### Step 4. Delete the first, fourth, seventh, ninth, eleventh, thirteenth and fourteenth columns

```python
wine = wine.drop(wine.columns[[0, 3, 6, 8, 11, 12, 13]], axis=1)
wine
```

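|   | 14.23 | 1.71 | 15.6 | 127 | 3.06 | 2.29 | 5.64 |
|---|---|---|---|---|---|---|---|
| 0 | 13.20 | 1.78 | 11.2 | 100 | 2.76 | 1.28 | 4.38 |
| 1 | 13.16 | 2.36 | 18.6 | 101 | 3.24 | 2.81 | 5.68 |
| 2 | 14.37 | 1.95 | 16.8 | 113 | 3.49 | 2.18 | 7.80 |
| 3 | 13.24 | 2.59 | 21.0 | 118 | 2.69 | 1.82 | 4.32 |
| 4 | 14.20 | 1.76 | 15.2 | 112 | 3.39 | 1.97 | 6.75 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 172 | 13.71 | 5.65 | 20.5 | 95 | 0.61 | 1.06 | 7.70 |
| 173 | 13.40 | 3.91 | 23.0 | 102 | 0.75 | 1.41 | 7.30 |
| 174 | 13.27 | 4.28 | 20.0 | 120 | 0.69 | 1.35 | 10.20 |
| 175 | 13.17 | 2.59 | 20.0 | 120 | 0.68 | 1.46 | 9.30 |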
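The heading counts columns from 1, but positions in `wine.columns[...]` count from 0, so the eleventh column is position 10, not 11. The list above passes 11, which drops the twelfth column and keeps the eleventh; this is visible in the output, whose last surviving column is headed `5.64` (position 10 of the first sample) while `1.04` (position 11) is gone. Assuming the attribute order given in the UCI description of the file (class label first, then alcohol through proline), position 10 is color intensity and position 11 is hue, so the column later named `hue` would actually hold color intensity. A sketch that derives the positions from the ordinals and checks which columns survive; the column labels are hypothetical names for illustration only:

```python
import pandas as pd

# Hypothetical labels following the UCI attribute order (class label first).
UCI_COLUMNS = [
    'class', 'alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium',
    'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins',
    'color_intensity', 'hue', 'od280_od315', 'proline',
]

# First sample of wine.data (the row that became the header above).
first_sample = [1, 14.23, 1.71, 2.43, 15.6, 127, 2.8, 3.06, 0.28, 2.29, 5.64, 1.04, 3.92, 1065]
demo = pd.DataFrame([first_sample], columns=UCI_COLUMNS)

# Convert the 1-based ordinals from the heading into 0-based positions.
ordinals = [1, 4, 7, 9, 11, 13, 14]
positions = [n - 1 for n in ordinals]  # [0, 3, 6, 8, 10, 12, 13]

kept = demo.drop(columns=demo.columns[positions])
as_written = demo.drop(columns=demo.columns[[0, 3, 6, 8, 11, 12, 13]])

print(list(kept.columns))
# ['alcohol', 'malic_acid', 'alcalinity_of_ash', 'magnesium', 'flavanoids', 'proanthocyanins', 'hue']
print(list(as_written.columns)[-1])    # color_intensity
```

With positions derived this way, the seven surviving columns line up with the names assigned in Step 5.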


### Step 5. Assign the columns as below:

The attributes are (donated by Riccardo Leardi, riclea '@' anchem.unige.it):  
1) alcohol  
2) malic_acid  
3) alcalinity_of_ash  
4) magnesium  
5) flavanoids  
6) proanthocyanins  
7) hue 

```python
wine.columns = ['alcohol', 'malic_acid', 'alcalinity_of_ash', 'magnesium', 'flavanoids', 'proanthocyanins', 'hue']
wine.head()
```

|   | alcohol | malic_acid | alcalinity_of_ash | magnesium | flavanoids | proanthocyanins | hue |
|---|---|---|---|---|---|---|---|
| 0 | 13.2 | 1.78 | 11.2 | 100 | 2.76 | 1.28 | 4.38 |
| 1 | 13.16 | 2.36 | 18.6 | 101 | 3.24 | 2.81 | 5.68 |
| 2 | 14.37 | 1.95 | 16.8 | 113 | 3.49 | 2.18 | 7.8 |
| 3 | 13.24 | 2.59 | 21.0 | 118 | 2.69 | 1.82 | 4.32 |
| 4 | 14.2 | 1.76 | 15.2 | 112 | 3.39 | 1.97 | 6.75 |
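Assigning to `wine.columns` relabels the frame in place and needs exactly one name per column; a list of the wrong length raises `ValueError` and leaves the labels unchanged. `DataFrame.set_axis` performs the same relabelling but returns a new frame, which can be handy when chaining calls. A small sketch on a hypothetical two-column frame:

```python
import pandas as pd

df = pd.DataFrame([[13.2, 1.78], [13.16, 2.36]])

# set_axis returns a relabelled copy and leaves df untouched.
renamed = df.set_axis(['alcohol', 'malic_acid'], axis=1)
print(list(renamed.columns))  # ['alcohol', 'malic_acid']

# Direct assignment needs one label per column.
try:
    df.columns = ['alcohol', 'malic_acid', 'extra']
    raised = False
except ValueError:
    raised = True

print(raised)                 # True
print(list(df.columns))       # [0, 1]
```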


### Step 6. Set the values of the first 3 rows from alcohol as NaN

```python
# Assign by position in a single indexing call on the DataFrame itself,
# so the change is written to `wine` rather than to an intermediate copy
wine.iloc[:3, wine.columns.get_loc('alcohol')] = np.nan
wine.head()
```

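|   | alcohol | malic_acid | alcalinity_of_ash | magnesium | flavanoids | proanthocyanins | hue |
|---|---|---|---|---|---|---|---|
| 0 | NaN | 1.78 | 11.2 | 100 | 2.76 | 1.28 | 4.38 |
| 1 | NaN | 2.36 | 18.6 | 101 | 3.24 | 2.81 | 5.68 |
| 2 | NaN | 1.95 | 16.8 | 113 | 3.49 | 2.18 | 7.8 |
| 3 | 13.24 | 2.59 | 21.0 | 118 | 2.69 | 1.82 | 4.32 |
| 4 | 14.2 | 1.76 | 15.2 | 112 | 3.39 | 1.97 | 6.75 |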
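Writing `wine.alcohol.iloc[:3] = ...` chains two indexing operations: `wine.alcohol` first produces a Series, and `.iloc` then assigns into that Series, which under Copy-on-Write behaves as a copy and leaves `wine` unchanged. One indexing call on the DataFrame avoids this. A sketch of two equivalent single-step forms, positional (`iloc`) and label-based (`loc`), on a hypothetical frame whose index does not start at 0 (as happens after rows are dropped in Step 13):

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels do not start at 0.
df = pd.DataFrame({'alcohol': [13.2, 13.16, 14.37, 13.24]}, index=[10, 11, 12, 13])

by_position = df.copy()
# iloc: rows by position, column position looked up from its name
by_position.iloc[:3, by_position.columns.get_loc('alcohol')] = np.nan

by_label = df.copy()
# loc: rows by label, so take the first three labels from the index
by_label.loc[by_label.index[:3], 'alcohol'] = np.nan

print(by_position['alcohol'].isna().tolist())  # [True, True, True, False]
print(by_position.equals(by_label))            # True
```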


### Step 7. Now set the values of rows 3 and 4 of magnesium to NaN

```python
# Rows 3 and 4 are positions 2 and 3; assign in one call on the DataFrame
wine.iloc[2:4, wine.columns.get_loc('magnesium')] = np.nan
wine.head()
```

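|   | alcohol | malic_acid | alcalinity_of_ash | magnesium | flavanoids | proanthocyanins | hue |
|---|---|---|---|---|---|---|---|
| 0 | NaN | 1.78 | 11.2 | 100.0 | 2.76 | 1.28 | 4.38 |
| 1 | NaN | 2.36 | 18.6 | 101.0 | 3.24 | 2.81 | 5.68 |
| 2 | NaN | 1.95 | 16.8 | NaN | 3.49 | 2.18 | 7.8 |
| 3 | 13.24 | 2.59 | 21.0 | NaN | 2.69 | 1.82 | 4.32 |
| 4 | 14.2 | 1.76 | 15.2 | 112.0 | 3.39 | 1.97 | 6.75 |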
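The magnesium values change from `100` to `100.0` because a NumPy `int64` column cannot hold NaN, so pandas converts it to `float64` when NaN is assigned. If the values should stay integers, pandas' nullable `Int64` dtype stores missing entries as `pd.NA` instead. A sketch on a hypothetical five-row column (the values are illustrative):

```python
import numpy as np
import pandas as pd

values = [127, 100, 101, 113, 118]

# NumPy-backed integers: assigning NaN upcasts the column to float64.
df = pd.DataFrame({'magnesium': values})
df.iloc[2:4, df.columns.get_loc('magnesium')] = np.nan
print(df['magnesium'].dtype)                # float64

# Nullable integers: missing values are stored as pd.NA and the dtype stays Int64.
nullable = pd.DataFrame({'magnesium': pd.array(values, dtype='Int64')})
nullable.iloc[2:4, nullable.columns.get_loc('magnesium')] = pd.NA
print(nullable['magnesium'].dtype)          # Int64
print(nullable['magnesium'].isna().sum())   # 2
```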


### Step 8. Fill the NaN values with 10 in alcohol and 100 in magnesium

```python
# Fill each column with its own value in a single call on the DataFrame
wine = wine.fillna({'alcohol': 10, 'magnesium': 100})
wine.head(7)
```

|   | alcohol | malic_acid | alcalinity_of_ash | magnesium | flavanoids | proanthocyanins | hue |
|---|---|---|---|---|---|---|---|
| 0 | 10.0 | 1.78 | 11.2 | 100.0 | 2.76 | 1.28 | 4.38 |
| 1 | 10.0 | 2.36 | 18.6 | 101.0 | 3.24 | 2.81 | 5.68 |
| 2 | 10.0 | 1.95 | 16.8 | 100.0 | 3.49 | 2.18 | 7.8 |
| 3 | 13.24 | 2.59 | 21.0 | 100.0 | 2.69 | 1.82 | 4.32 |
| 4 | 14.2 | 1.76 | 15.2 | 112.0 | 3.39 | 1.97 | 6.75 |
| 5 | 14.39 | 1.87 | 14.6 | 96.0 | 2.52 | 1.98 | 5.25 |
| 6 | 14.06 | 2.15 | 17.6 | 121.0 | 2.51 | 1.25 | 5.05 |
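Passing a dict to `DataFrame.fillna` maps each column to its own fill value; columns not named in the dict keep their NaN. A quick check on a hypothetical one-row-missing frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'alcohol': [np.nan, 13.24],
    'magnesium': [np.nan, 118.0],
    'hue': [np.nan, 4.32],
})

# Only the columns named in the dict are filled.
filled = df.fillna({'alcohol': 10, 'magnesium': 100})
print(filled.iloc[0].tolist())  # [10.0, 100.0, nan]
```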


### Step 9. Count the number of missing values

```python
pd.isna(wine).sum()
```

```text
alcohol              0
malic_acid           0
alcalinity_of_ash    0
magnesium            0
flavanoids           0
proanthocyanins      0
hue                  0
dtype: int64
```
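`isna().sum()` counts missing values per column. Two related counts are often useful as well: the total number of missing cells, and the number of rows that contain at least one missing value. A sketch on a hypothetical frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'alcohol': [np.nan, 13.2, np.nan],
    'magnesium': [np.nan, 100.0, 113.0],
})

per_column = df.isna().sum()                       # missing values per column
total_cells = int(per_column.sum())                # missing cells overall
rows_with_nan = int(df.isna().any(axis=1).sum())   # rows with at least one NaN

print(per_column['alcohol'], per_column['magnesium'])  # 2 1
print(total_cells, rows_with_nan)                       # 3 2
```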

### Step 10. Create an array of 10 random integers from 0 to 10

```python
# The upper bound of randint is exclusive, so 11 yields values from 0 to 10
arr = np.random.randint(0, 11, 10)
arr
```

```text
array([ 7, 10,  0,  6,  8,  6,  2,  9,  3,  3])
```
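Without a seed, every run of the cell above draws a different array, so the rows blanked in the next steps change from run to run. A sketch with NumPy's `Generator` API and a fixed seed (the value 42 is arbitrary), which makes the draw repeatable; `Generator.integers` also excludes its upper bound by default:

```python
import numpy as np


def draw(seed: int) -> np.ndarray:
    # Generator.integers excludes `high` unless endpoint=True.
    rng = np.random.default_rng(seed)
    return rng.integers(0, 11, size=10)


a = draw(42)
b = draw(42)
print(a.min() >= 0, a.max() <= 10, len(a))  # True True 10
print(np.array_equal(a, b))                 # True
```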

### Step 11. Use the random numbers you generated as row positions and set every cell in those rows to NaN

```python
# Set every cell in each randomly chosen row (selected by position) to NaN
for i in arr:
    wine.iloc[i] = np.nan

wine.head(10)
```

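|   | alcohol | malic_acid | alcalinity_of_ash | magnesium | flavanoids | proanthocyanins | hue |
|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 10.0 | 2.36 | 18.6 | 101.0 | 3.24 | 2.81 | 5.68 |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 14.2 | 1.76 | 15.2 | 112.0 | 3.39 | 1.97 | 6.75 |
| 5 | 14.39 | 1.87 | 14.6 | 96.0 | 2.52 | 1.98 | 5.25 |
| 6 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 8 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 9 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |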
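The loop assigns one row at a time, but `iloc` also accepts an array of positions, so the same rows can be blanked in one assignment. A sketch on a hypothetical all-float frame, using `np.unique` to drop repeated positions first (a repeat would only set the same row twice):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12, dtype=float).reshape(6, 2), columns=['alcohol', 'hue'])
positions = np.array([4, 0, 4, 2])  # hypothetical draw containing a repeat

# Set every cell in the selected rows to NaN in a single indexing call.
df.iloc[np.unique(positions)] = np.nan

print(df.index[df.isna().all(axis=1)].tolist())  # [0, 2, 4]
```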


### Step 12.  How many missing values do we have?

```python
wine.isnull().sum()
```

```text
alcohol              8
malic_acid           8
alcalinity_of_ash    8
magnesium            8
flavanoids           8
proanthocyanins      8
hue                  8
dtype: int64
```
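Ten positions were drawn in Step 10, yet each column has only 8 missing values: `arr` contains 6 and 3 twice each, and setting a row to NaN a second time adds nothing. Counting the distinct positions in the array printed in Step 10 confirms this:

```python
import numpy as np

# The array printed in Step 10.
arr = np.array([7, 10, 0, 6, 8, 6, 2, 9, 3, 3])

distinct = np.unique(arr)
print(distinct.tolist())  # [0, 2, 3, 6, 7, 8, 9, 10]
print(len(distinct))      # 8
```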

### Step 13. Delete the rows that contain missing values

```python
wine.dropna(how="any", axis=0, inplace=True)
wine.head()
```

|    | alcohol | malic_acid | alcalinity_of_ash | magnesium | flavanoids | proanthocyanins | hue |
|---|---|---|---|---|---|---|---|
| 1 | 10.0 | 2.36 | 18.6 | 101.0 | 3.24 | 2.81 | 5.68 |
| 4 | 14.2 | 1.76 | 15.2 | 112.0 | 3.39 | 1.97 | 6.75 |
| 5 | 14.39 | 1.87 | 14.6 | 96.0 | 2.52 | 1.98 | 5.25 |
| 11 | 13.75 | 1.73 | 16.0 | 89.0 | 2.76 | 1.81 | 5.6 |
| 12 | 14.75 | 1.73 | 11.4 | 91.0 | 3.69 | 2.81 | 5.4 |
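`dropna(how='any')` removes a row if any of its values is missing. Two other options can matter: `how='all'` removes only rows in which every value is missing, and `subset=` restricts the check to chosen columns. Because Step 11 blanked entire rows, `how='any'` and `how='all'` drop the same rows here; a hypothetical frame with a partially missing row shows where they differ:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'alcohol': [np.nan, np.nan, 14.2],
    'hue': [np.nan, 5.68, 6.75],
})

print(df.dropna(how='any').index.tolist())       # [2]
print(df.dropna(how='all').index.tolist())       # [1, 2]
print(df.dropna(subset=['hue']).index.tolist())  # [1, 2]
```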


### Step 14. Print only the non-null values in alcohol

```python
# pd.notna tests for a missing value; comparing with np.nan would not
for value in wine.alcohol:
    if pd.notna(value):
        print(value)
```

```text
10.0
14.2
14.39
13.75
14.75
14.38
13.63
14.3
13.83
14.19
13.64
14.06
12.93
13.71
12.85
13.5
13.05
13.39
13.3
13.87
14.02
13.73
13.58
13.68
13.76
13.51
13.48
13.28
13.05
13.07
14.22
13.56
13.41
13.88
13.24
13.05
14.21
14.38
13.9
14.1
13.94
13.05
13.83
13.82
13.77
13.74
13.56
14.22
13.29
13.72
12.37
12.33
12.64
13.67
12.37
12.17
12.37
13.11
12.37
13.34
12.21
12.29
13.86
13.49
12.99
11.96
11.66
13.03
11.84
12.33
12.7
12.0
12.72
12.08
13.05
11.84
12.67
12.16
11.65
11.64
12.08
12.08
12.0
12.69
12.29
11.62
12.47
11.81
12.29
12.37
12.29
12.08
12.6
12.34
11.82
12.51
12.42
12.25
12.72
12.22
11.61
11.46
12.52
11.76
11.41
12.08
11.03
11.82
12.42
12.77
12.0
11.45
11.56
12.42
13.05
11.87
12.07
12.43
11.79
12.37
12.04
12.86
12.88
12.81
12.7
12.51
12.6
12.25
12.53
13.49
12.84
12.93
13.36
13.52
13.62
12.25
13.16
13.88
12.87
13.32
13.08
13.5
12.79
13.11
13.23
12.58
13.17
13.84
12.45
14.34
13.48
12.36
13.69
12.85
12.96
13.78
13.73
13.45
12.82
13.58
13.4
12.2
12.77
14.16
13.71
13.4
13.27
13.17
14.13
```
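A check written as `value != np.nan` cannot act as a filter: NaN compares unequal to every value, itself included, so the test is `True` even for a missing value. `pd.notna` and `Series.notna` test for missingness directly, and a boolean mask selects the non-null values without a loop. A sketch on a hypothetical series:

```python
import numpy as np
import pandas as pd

alcohol = pd.Series([10.0, np.nan, 14.2])

print(np.nan != np.nan)                   # True

# Comparing with np.nan keeps everything, including the missing value.
kept_by_comparison = alcohol[alcohol != np.nan]
print(kept_by_comparison.tolist())        # [10.0, nan, 14.2]

# A notna() mask keeps only the non-null values.
print(alcohol[alcohol.notna()].tolist())  # [10.0, 14.2]
```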


### Step 15. Reset the index so it starts at 0 again

```python
wine = wine.reset_index(drop=True)
```

```python
wine.head()
```

|   | alcohol | malic_acid | alcalinity_of_ash | magnesium | flavanoids | proanthocyanins | hue |
|---|---|---|---|---|---|---|---|
| 0 | 10.0 | 2.36 | 18.6 | 101.0 | 3.24 | 2.81 | 5.68 |
| 1 | 14.2 | 1.76 | 15.2 | 112.0 | 3.39 | 1.97 | 6.75 |
| 2 | 14.39 | 1.87 | 14.6 | 96.0 | 2.52 | 1.98 | 5.25 |
| 3 | 13.75 | 1.73 | 16.0 | 89.0 | 2.76 | 1.81 | 5.6 |
| 4 | 14.75 | 1.73 | 11.4 | 91.0 | 3.69 | 2.81 | 5.4 |
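`drop=True` discards the old labels. Without it, `reset_index` keeps them as a new column named `index`, which can help trace rows back to their original positions. A sketch on a hypothetical frame with gaps in its index, like the one left by `dropna` in Step 13:

```python
import pandas as pd

df = pd.DataFrame({'alcohol': [10.0, 14.2, 14.39]}, index=[1, 4, 5])

dropped = df.reset_index(drop=True)  # old labels discarded
kept = df.reset_index()              # old labels moved into an 'index' column

print(dropped.index.tolist())        # [0, 1, 2]
print(kept.columns.tolist())         # ['index', 'alcohol']
print(kept['index'].tolist())        # [1, 4, 5]
```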
