# Wine

### Introduction:

This exercise is an adaptation of the UCI Wine dataset.
The only purpose is to practice deleting data with pandas.

### Step 1. Import the necessary libraries

In [1]:
import pandas as pd
import numpy as np
import random

### Step 2. Import the dataset from this [address](https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data). 

In [2]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data"

### Step 3. Assign it to a variable called wine

In [3]:
wine = pd.read_csv(url, header=None)
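`wine.data` has no header line, which is why Step 3 passes `header=None`. A minimal sketch of what that option does, assuming a hypothetical two-row excerpt supplied through `io.StringIO` instead of the real URL (the numbers are illustrative only):

```python
import io

import pandas as pd

# Hypothetical stand-in for wine.data: 14 comma-separated values per row and
# no header line. The numbers are illustrative, not taken from the real file.
raw = io.StringIO(
    "1,14.23,1.71,2.43,15.6,127,2.80,3.06,0.28,2.29,5.64,1.04,3.92,1065\n"
    "1,13.20,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.40,1050\n"
)

# header=None: treat the first line as data and label the columns 0, 1, ..., 13.
df = pd.read_csv(raw, header=None)

print(df.shape)              # (2, 14)
print(list(df.columns)[:3])  # [0, 1, 2]
```

Without `header=None`, pandas would promote the first data row to column names and the frame would keep only one row.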

### Step 4. Delete the first, fourth, seventh, ninth, eleventh, thirteenth and fourteenth columns

In [4]:

# Zero-based positions of the 1st, 4th, 7th, 9th, 11th, 13th and 14th columns
columns_to_drop = [0, 3, 6, 8, 10, 12, 13]
wine = wine.drop(columns=columns_to_drop)

print(wine.head())

      1     2     4    5     7     9     11
0  14.23  1.71  15.6  127  3.06  2.29  1.04
1  13.20  1.78  11.2  100  2.76  1.28  1.05
2  13.16  2.36  18.6  101  3.24  2.81  1.03
3  14.37  1.95  16.8  113  3.49  2.18  0.86
4  13.24  2.59  21.0  118  2.69  1.82  1.04
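Step 4 works because `header=None` gives integer labels that coincide with positions. A sketch, on a hypothetical one-row frame with arbitrary values, of how the same positions can be turned into labels when the columns carry names instead:

```python
import pandas as pd

# Hypothetical one-row frame with integer labels 0..13, the same labels
# read_csv(header=None) assigns; the cell values are arbitrary.
df = pd.DataFrame([list(range(100, 114))])

positions = [0, 3, 6, 8, 10, 12, 13]

# Here the labels equal the positions, so dropping by label works directly.
by_label = df.drop(columns=positions)

# df.columns[positions] converts positions to labels, which also works when
# the columns have names instead of integers.
named = df.set_axis([f"c{i}" for i in range(14)], axis=1)
by_position = named.drop(columns=named.columns[positions])

print(list(by_label.columns))     # [1, 2, 4, 5, 7, 9, 11]
print(list(by_position.columns))  # ['c1', 'c2', 'c4', 'c5', 'c7', 'c9', 'c11']
```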


### Step 5. Assign the columns as below:

The attributes are (donated by Riccardo Leardi, riclea '@' anchem.unige.it):  
1) alcohol  
2) malic_acid  
3) alcalinity_of_ash  
4) magnesium  
5) flavanoids  
6) proanthocyanins  
7) hue 

In [5]:
wine.columns = ['alcohol', 'malic_acid', 'alcalinity_of_ash', 'magnesium', 'flavanoids', 'proanthocyanins', 'hue']

print(wine.head())

   alcohol  malic_acid  alcalinity_of_ash  magnesium  flavanoids  \
0    14.23        1.71               15.6        127        3.06   
1    13.20        1.78               11.2        100        2.76   
2    13.16        2.36               18.6        101        3.24   
3    14.37        1.95               16.8        113        3.49   
4    13.24        2.59               21.0        118        2.69   

   proanthocyanins   hue  
0             2.29  1.04  
1             1.28  1.05  
2             2.81  1.03  
3             2.18  0.86  
4             1.82  1.04  
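Assigning a list to `wine.columns` depends on the columns being in exactly the order listed. An alternative sketch (not the notebook's method) uses `DataFrame.rename` with a dict keyed by the integer labels left after Step 4; the frame below is hypothetical and holds only the first row printed above:

```python
import pandas as pd

# Hypothetical frame holding the first row printed in Step 4, still under
# the integer labels left over after the drop.
df = pd.DataFrame(
    [[14.23, 1.71, 15.6, 127, 3.06, 2.29, 1.04]],
    columns=[1, 2, 4, 5, 7, 9, 11],
)

# Map each raw label to its name so the pairing does not depend on order.
names = {
    1: "alcohol",
    2: "malic_acid",
    4: "alcalinity_of_ash",
    5: "magnesium",
    7: "flavanoids",
    9: "proanthocyanins",
    11: "hue",
}
df = df.rename(columns=names)

print(list(df.columns))
```

By default `rename` silently skips keys that are not present; passing `errors="raise"` makes a typo in the mapping fail loudly.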


### Step 6. Set the values of the first 3 rows of alcohol to NaN

In [6]:
# .loc slices by label and includes both ends, so 0:2 covers rows 0, 1 and 2
wine.loc[0:2, 'alcohol'] = np.nan

print(wine.head())

   alcohol  malic_acid  alcalinity_of_ash  magnesium  flavanoids  \
0      NaN        1.71               15.6        127        3.06   
1      NaN        1.78               11.2        100        2.76   
2      NaN        2.36               18.6        101        3.24   
3    14.37        1.95               16.8        113        3.49   
4    13.24        2.59               21.0        118        2.69   

   proanthocyanins   hue  
0             2.29  1.04  
1             1.28  1.05  
2             2.81  1.03  
3             2.18  0.86  
4             1.82  1.04  
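Step 6 relies on `.loc` slices including their end label. A small sketch contrasting it with position-based `.iloc`, using the five alcohol values printed in Step 5:

```python
import numpy as np
import pandas as pd

# The five alcohol values printed in Step 5, on the default 0..4 index.
s = pd.Series([14.23, 13.20, 13.16, 14.37, 13.24])

print(len(s.loc[0:2]))   # 3: .loc slices by label and includes the end
print(len(s.iloc[0:2]))  # 2: .iloc slices by position and excludes the end

# "First three rows" by position is iloc[:3]; it matches loc[0:2] only
# because the index is the default RangeIndex.
s.iloc[:3] = np.nan
print(int(s.isna().sum()))  # 3
```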


### Step 7. Now set the values of rows 3 and 4 of magnesium to NaN

In [7]:
# Rows with index labels 3 and 4 (.loc includes both ends)
wine.loc[3:4, 'magnesium'] = np.nan

print(wine.head())

   alcohol  malic_acid  alcalinity_of_ash  magnesium  flavanoids  \
0      NaN        1.71               15.6      127.0        3.06   
1      NaN        1.78               11.2      100.0        2.76   
2      NaN        2.36               18.6      101.0        3.24   
3    14.37        1.95               16.8        NaN        3.49   
4    13.24        2.59               21.0        NaN        2.69   

   proanthocyanins   hue  
0             2.29  1.04  
1             1.28  1.05  
2             2.81  1.03  
3             2.18  0.86  
4             1.82  1.04  
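After Step 7, magnesium prints as `127.0` instead of `127`: an `int64` column cannot hold NaN, so pandas stores it as `float64`. A sketch of that dtype change, using `Series.mask` (a different call than the notebook's `.loc` assignment, chosen here only to show the effect), plus the nullable `Int64` dtype as an option when integers should be preserved:

```python
import pandas as pd

# The five magnesium values printed in Step 5; Python ints become int64.
magnesium = pd.Series([127, 100, 101, 113, 118])

# int64 has no NaN, so masking labels 3 and 4 yields a float64 Series.
as_float = magnesium.mask(magnesium.index.isin([3, 4]))
print(as_float.dtype)  # float64

# The nullable "Int64" extension dtype keeps integers and marks gaps as <NA>.
as_nullable = magnesium.astype("Int64")
as_nullable.loc[3:4] = pd.NA
print(as_nullable.dtype)  # Int64
```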


### Step 8. Fill the NaN values with 10 in alcohol and 100 in magnesium

In [8]:
wine['alcohol'] = wine['alcohol'].fillna(10)
wine['magnesium'] = wine['magnesium'].fillna(100)

print(wine.head())

   alcohol  malic_acid  alcalinity_of_ash  magnesium  flavanoids  \
0    10.00        1.71               15.6      127.0        3.06   
1    10.00        1.78               11.2      100.0        2.76   
2    10.00        2.36               18.6      101.0        3.24   
3    14.37        1.95               16.8      100.0        3.49   
4    13.24        2.59               21.0      100.0        2.69   

   proanthocyanins   hue  
0             2.29  1.04  
1             1.28  1.05  
2             2.81  1.03  
3             2.18  0.86  
4             1.82  1.04  


### Step 9. Count the number of missing values

In [9]:
missing_values_count = wine.isnull().sum()
print("Number of missing values per column:\n", missing_values_count)

Number of missing values per column:
 alcohol              0
malic_acid           0
alcalinity_of_ash    0
magnesium            0
flavanoids           0
proanthocyanins      0
hue                  0
dtype: int64
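`isnull()` is an alias of `isna()`, and summing the resulting boolean frame counts each `True` as 1 per column. `DataFrame.count()` gives the complement (non-missing values per column). A sketch on a hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one gap in alcohol.
df = pd.DataFrame({"alcohol": [np.nan, 13.20, 13.16], "hue": [1.04, 1.05, 1.03]})

missing = df.isna().sum()  # True counts as 1, summed per column
present = df.count()       # non-missing values per column

print(missing)
```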


### Step 10. Create an array of 10 random numbers up to 10

In [10]:

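# Draw 10 random row labels from the whole DataFrame (0 to len(wine) - 1; the upper bound is exclusive)
random_indices = np.random.randint(0, len(wine), size=10)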

print("10 random indices:", random_indices)

10 random indices: [125 100  69 111  28 142 175  98   0 167]
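`np.random.randint` draws with replacement, so the ten numbers are not guaranteed to be distinct; a repeat would leave fewer than ten NaN cells in Step 11 (the run above happened to produce ten distinct values). A sketch using NumPy's `Generator` API as an alternative to the notebook's call; the seed 42 is an arbitrary assumption:

```python
import numpy as np

n_rows = 178  # wine has 178 rows: 168 remain after Step 13 drops 10

rng = np.random.default_rng(seed=42)  # seeded so the draw is reproducible

# Like np.random.randint, integers() samples with replacement: repeats can occur.
with_replacement = rng.integers(0, n_rows, size=10)

# choice(..., replace=False) guarantees ten distinct row labels.
distinct = rng.choice(n_rows, size=10, replace=False)
print(sorted(distinct.tolist()))
```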


### Step 11. Use the random numbers you generated as an index and assign NaN to each of those cells.

In [11]:
wine.loc[random_indices, 'alcohol'] = np.nan

print(wine.head(15))

    alcohol  malic_acid  alcalinity_of_ash  magnesium  flavanoids  \
0       NaN        1.71               15.6      127.0        3.06   
1     10.00        1.78               11.2      100.0        2.76   
2     10.00        2.36               18.6      101.0        3.24   
3     14.37        1.95               16.8      100.0        3.49   
4     13.24        2.59               21.0      100.0        2.69   
5     14.20        1.76               15.2      112.0        3.39   
6     14.39        1.87               14.6       96.0        2.52   
7     14.06        2.15               17.6      121.0        2.51   
8     14.83        1.64               14.0       97.0        2.98   
9     13.86        1.35               16.0       98.0        3.15   
10    14.10        2.16               18.0      105.0        3.32   
11    14.12        1.48               16.8       95.0        2.43   
12    13.75        1.73               16.0       89.0        2.76   
13    14.75        1.73           
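When the index array passed to `.loc` contains a repeated label, the same cell is simply overwritten, so fewer cells become NaN than there are indices. A sketch with a hypothetical draw in which label 3 appears twice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"alcohol": [14.23, 13.20, 13.16, 14.37, 13.24]})

# Hypothetical draw in which label 3 appears twice.
indices = np.array([0, 3, 3, 4])

# The repeated label hits the same cell twice, so only 3 cells become NaN.
df.loc[indices, "alcohol"] = np.nan
print(int(df["alcohol"].isna().sum()))  # 3
```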

### Step 12. How many missing values do we have?

In [12]:
total_missing_values = wine.isnull().sum().sum()
print("Total number of missing values:", total_missing_values)

Total number of missing values: 10
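Step 12 counts missing cells, while Step 13 removes rows. The two numbers match in this notebook because, after Step 8, every NaN sits in alcohol, one per row. When a row has several gaps they diverge, as this sketch on a hypothetical frame shows:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 has two gaps, row 2 has one.
df = pd.DataFrame({
    "alcohol": [np.nan, 13.20, np.nan],
    "magnesium": [np.nan, 100.0, 101.0],
})

total_cells = df.isna().sum().sum()          # missing cells: 3
rows_with_nan = df.isna().any(axis=1).sum()  # rows dropna() removes: 2

print(total_cells, rows_with_nan)  # 3 2
```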


### Step 13. Delete the rows that contain missing values

In [15]:
wine_cleaned = wine.dropna()

print("DataFrame shape after dropping rows with NaN:", wine_cleaned.shape)

DataFrame shape after dropping rows with NaN: (168, 7)
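`dropna()` returns a new DataFrame and, by default (`how="any"`), drops a row if any of its columns is missing; `wine` itself is unchanged, which is why Step 14 can still filter it. A sketch of the other common options on a hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: rows 0 and 1 have one gap, row 2 is all NaN, row 3 is complete.
df = pd.DataFrame({
    "alcohol": [np.nan, 13.20, np.nan, 14.37],
    "hue": [1.04, np.nan, np.nan, 0.86],
})

print(df.dropna().shape)                    # (1, 2): default how="any"
print(df.dropna(how="all").shape)           # (3, 2): drop only all-NaN rows
print(df.dropna(subset=["alcohol"]).shape)  # (2, 2): only alcohol is checked
```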


### Step 14. Print only the non-null values in alcohol

In [14]:
non_null_alcohol = wine.loc[wine['alcohol'].notna(), 'alcohol']
print("Non-null values in the 'alcohol' column:\n", non_null_alcohol)

Non-null values in the 'alcohol' column:
 1      10.00
2      10.00
3      14.37
4      13.24
5      14.20
       ...  
172    14.16
173    13.71
174    13.40
176    13.17
177    14.13
Name: alcohol, Length: 168, dtype: float64
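For a single column, `Series.dropna()` gives the same result as the boolean mask in Step 14 and keeps the original index labels, which is why the output above starts at 1. A minimal sketch on a hypothetical three-row frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"alcohol": [np.nan, 10.00, 14.37], "hue": [1.04, 1.05, 0.86]})

# Row mask and column label in a single .loc call...
via_mask = df.loc[df["alcohol"].notna(), "alcohol"]
# ...or select the column first and drop its missing values.
via_dropna = df["alcohol"].dropna()

print(via_dropna.index.tolist())  # [1, 2]
```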


### Step 15. Reset the index so it starts at 0 again

In [16]:
wine_cleaned = wine_cleaned.reset_index(drop=True)

print("DataFrame after resetting the index:\n", wine_cleaned.head())

DataFrame after resetting the index:
    alcohol  malic_acid  alcalinity_of_ash  magnesium  flavanoids  \
0    10.00        1.78               11.2      100.0        2.76   
1    10.00        2.36               18.6      101.0        3.24   
2    14.37        1.95               16.8      100.0        3.49   
3    13.24        2.59               21.0      100.0        2.69   
4    14.20        1.76               15.2      112.0        3.39   

   proanthocyanins   hue  
0             1.28  1.05  
1             2.81  1.03  
2             2.18  0.86  
3             1.82  1.04  
4             1.97  1.05  
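`drop=True` discards the old labels. Without it, `reset_index` keeps them as a new column named `index`, which can help trace a row back to the original file. A sketch on a hypothetical frame whose labels have gaps, as `wine_cleaned` did before the reset:

```python
import pandas as pd

df = pd.DataFrame({"alcohol": [10.00, 14.37, 13.24]}, index=[1, 3, 4])

kept = df.reset_index()              # old labels move into an "index" column
dropped = df.reset_index(drop=True)  # old labels are discarded

print(list(kept.columns))      # ['index', 'alcohol']
print(dropped.index.tolist())  # [0, 1, 2]
```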


### BONUS: Create your own question and answer it.