# Wine

### Introduction:

This exercise is an adaptation of the UCI Wine dataset.
Its only purpose is to practice deleting data with pandas.

### Step 1. Import the necessary libraries

In [10]:
import pandas as pd
import numpy as np

### Step 2. Import the dataset from this [address](https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data). 

### Step 3. Assign it to a variable called wine

In [11]:
wine = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data', sep=',')
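A note on the header row: by default `read_csv` treats the first line of a file as column names. If the file has no header line, as appears to be the case for `wine.data`, that first record is consumed as names rather than data. The sketch below illustrates the difference on a made-up three-line sample (the values are hypothetical, not the real wine data); the outputs shown later in this exercise were produced with the default behaviour.

```python
import io
import pandas as pd

# Hypothetical three-row sample with no header line (not the real wine data).
raw = "1,14.0,1.7\n1,13.2,1.8\n2,12.4,0.9\n"

# Default behaviour: the first line is used as column names.
with_default = pd.read_csv(io.StringIO(raw))

# header=None keeps every line as data and numbers the columns 0, 1, 2.
without_header = pd.read_csv(io.StringIO(raw), header=None)

print(len(with_default), len(without_header))
```

Since the columns are renamed in Step 5 anyway, the only practical effect of the default is one fewer data row.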

### Step 4. Delete the first, fourth, seventh, ninth, eleventh, thirteenth and fourteenth columns

In [12]:
wine.head()
wine.drop(wine.columns[[0, 3, 6, 8, 10, 12, 13]], axis=1, inplace=True)
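As a minimal sketch of the same idea without `inplace=True`, the positions can be mapped to column labels and passed to `drop(columns=...)`, which returns a new frame. The frame `toy` and its column names `c0` to `c13` are hypothetical stand-ins for the wine data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 14 columns named c0..c13, standing in for the wine data.
toy = pd.DataFrame(np.zeros((2, 14)), columns=[f"c{i}" for i in range(14)])

# Zero-based positions of the 1st, 4th, 7th, 9th, 11th, 13th and 14th columns.
positions = [0, 3, 6, 8, 10, 12, 13]

# drop() returns a new frame; the original is left unchanged.
trimmed = toy.drop(columns=toy.columns[positions])

print(list(trimmed.columns))
```

Seven columns remain, which matches the seven names assigned in the next step.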

### Step 5. Assign the columns as below:

The attributes are (donated by Riccardo Leardi, riclea '@' anchem.unige.it):  
1) alcohol  
2) malic_acid  
3) alcalinity_of_ash  
4) magnesium  
5) flavanoids  
6) proanthocyanins  
7) hue 

In [14]:
wine.columns = ['alcohol', 'malic_acid', 'alcalinity_of_ash', 'magnesium', 'flavanoids', 'proanthocyanins', 'hue']

### Step 6. Set the values of the first 3 rows of alcohol to NaN

In [15]:
# .loc slices include the end label, so 0:2 covers rows 0, 1 and 2
wine.loc[0:2, 'alcohol'] = np.nan
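One thing to watch when selecting "the first 3 rows": label-based `.loc` slices include the end label, while position-based `.iloc` slices exclude the end position. The sketch below shows the difference on a small hypothetical frame `toy` with the default integer index.

```python
import pandas as pd

# Hypothetical five-row frame with the default integer index 0..4.
toy = pd.DataFrame({"alcohol": [14.0, 13.2, 13.1, 14.3, 13.2]})

# .loc slices by label and includes the end label: rows 0, 1, 2, 3.
by_label = toy.loc[0:3, "alcohol"]

# .iloc slices by position and excludes the end position: rows 0, 1, 2.
by_position = toy.iloc[0:3, 0]

print(len(by_label), len(by_position))  # 4 3
```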

### Step 7. Now set the values of rows 3 and 4 of magnesium to NaN

In [16]:
wine.loc[[2, 3], 'magnesium'] = np.nan

### Step 8. Fill the value of NaN with the number 10 in alcohol and 100 in magnesium

In [20]:
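# Assign the result back; calling fillna(inplace=True) on a selected column
# is not guaranteed to update the parent DataFrame in recent pandas versions
wine['alcohol'] = wine['alcohol'].fillna(10)
wine['magnesium'] = wine['magnesium'].fillna(100)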
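As an alternative sketch, `DataFrame.fillna` also accepts a dictionary that maps column names to fill values, so both columns can be handled in one call. The frame `toy` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values in two columns.
toy = pd.DataFrame({
    "alcohol": [np.nan, 13.2, np.nan],
    "magnesium": [127.0, np.nan, 101.0],
})

# Each key names a column; each value replaces the NaN entries in that column.
filled = toy.fillna({"alcohol": 10, "magnesium": 100})

print(filled)
```

The call returns a new frame, so `toy` itself still contains its missing values.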

### Step 9. Count the number of missing values

In [26]:
wine.isnull().sum().sum()

0
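The double `.sum()` collapses everything into a single number. When it is useful to see where the gaps are, a single `.sum()` gives the count per column, as in this minimal sketch on a hypothetical frame `toy`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one gap in alcohol, two gaps in hue.
toy = pd.DataFrame({
    "alcohol": [np.nan, 13.2, 14.0],
    "hue": [1.04, np.nan, np.nan],
})

# First sum: number of missing values in each column.
per_column = toy.isna().sum()

# Second sum: total number of missing values in the frame.
total = int(per_column.sum())

print(per_column)
print(total)  # 3
```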

### Step 10. Create an array of 10 random integers from 0 up to (but not including) 10

In [36]:
random_indices = np.random.randint(10, size=10)
random_indices

array([8, 6, 9, 1, 4, 0, 9, 5, 7, 3])
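`np.random.randint` samples with replacement, so the array can contain repeats (9 appears twice above and 2 is missing). If distinct, reproducible indices are preferred, one option is a seeded `Generator` with `choice(..., replace=False)`. This is a sketch of an alternative, not what the exercise used; the seed value is arbitrary.

```python
import numpy as np

# Seeded generator so the same indices are drawn on every run (seed is arbitrary).
rng = np.random.default_rng(seed=0)

# Draw 10 distinct integers from 0..9; replace=False rules out duplicates.
unique_indices = rng.choice(10, size=10, replace=False)

print(unique_indices)
```

Drawing 10 distinct values out of 10 amounts to a shuffle of 0 to 9; a smaller `size` would give a proper subset.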

### Step 11. Use the random numbers you generated as an index and assign NaN to every cell in those rows

In [39]:
wine.iloc[random_indices, :] = np.nan

### Step 12.  How many missing values do we have?

In [41]:
wine.isnull().sum().sum()

63
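The count can be checked by hand: the index array from Step 10 contains a duplicate, so only the distinct rows matter, and each affected row contributes one NaN per column. A short sketch of that arithmetic, assuming the frame had no other missing values beforehand (as Step 9 reported):

```python
import numpy as np

# The index array printed in Step 10.
random_indices = np.array([8, 6, 9, 1, 4, 0, 9, 5, 7, 3])

# The frame has 7 columns after Step 5.
n_columns = 7

# 9 appears twice, so only 9 distinct rows were set to NaN.
n_rows_hit = len(np.unique(random_indices))

print(n_rows_hit * n_columns)  # 63
```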

### Step 13. Delete the rows that contain missing values

In [42]:
wine.dropna(inplace=True)
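By default `dropna` removes a row if any of its values is missing. The `how` and `subset` parameters narrow that rule, as sketched below on a hypothetical frame `toy`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is entirely missing, row 2 is partly missing.
toy = pd.DataFrame({
    "alcohol": [np.nan, 13.2, np.nan],
    "hue": [np.nan, 1.05, 1.03],
})

# Default (how="any"): drop rows with at least one NaN -> keeps row 1.
any_missing = toy.dropna()

# how="all": drop only rows where every value is NaN -> keeps rows 1 and 2.
all_missing = toy.dropna(how="all")

# subset: only look at the listed columns when deciding -> keeps row 1.
alcohol_only = toy.dropna(subset=["alcohol"])

print(list(any_missing.index), list(all_missing.index), list(alcohol_only.index))
```

In this exercise the affected rows are entirely NaN, so the default and `how="all"` would remove the same rows.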

### Step 14. Print only the non-null values in alcohol

In [46]:
wine['alcohol'][~wine['alcohol'].isnull()]

# Alternative: the same selection written with notnull()
wine.alcohol[wine.alcohol.notnull()]

2      10.00
10     14.12
11     13.75
12     14.75
13     14.38
14     13.63
15     14.30
16     13.83
17     14.19
18     13.64
19     14.06
20     12.93
21     13.71
22     12.85
23     13.50
24     13.05
25     13.39
26     13.30
27     13.87
28     14.02
29     13.73
30     13.58
31     13.68
32     13.76
33     13.51
34     13.48
35     13.28
36     13.05
37     13.07
38     14.22
       ...  
147    13.32
148    13.08
149    13.50
150    12.79
151    13.11
152    13.23
153    12.58
154    13.17
155    13.84
156    12.45
157    14.34
158    13.48
159    12.36
160    13.69
161    12.85
162    12.96
163    13.78
164    13.73
165    13.45
166    12.82
167    13.58
168    13.40
169    12.20
170    12.77
171    14.16
172    13.71
173    13.40
174    13.27
175    13.17
176    14.13
Name: alcohol, Length: 168, dtype: float64

### Step 15. Reset the index so that it starts at 0 again

In [47]:
wine.reset_index()

| | index | alcohol | malic_acid | alcalinity_of_ash | magnesium | flavanoids | proanthocyanins | hue |
|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 10.00 | 1.95 | 16.8 | 100.0 | 3.49 | 2.18 | 0.86 |
| 1 | 10 | 14.12 | 1.48 | 16.8 | 95.0 | 2.43 | 1.57 | 1.17 |
| 2 | 11 | 13.75 | 1.73 | 16.0 | 89.0 | 2.76 | 1.81 | 1.15 |
| 3 | 12 | 14.75 | 1.73 | 11.4 | 91.0 | 3.69 | 2.81 | 1.25 |
| 4 | 13 | 14.38 | 1.87 | 12.0 | 102.0 | 3.64 | 2.96 | 1.20 |
| 5 | 14 | 13.63 | 1.81 | 17.2 | 112.0 | 2.91 | 1.46 | 1.28 |
| 6 | 15 | 14.30 | 1.92 | 20.0 | 120.0 | 3.14 | 1.97 | 1.07 |
| 7 | 16 | 13.83 | 1.57 | 20.0 | 115.0 | 3.40 | 1.72 | 1.13 |
| 8 | 17 | 14.19 | 1.59 | 16.5 | 108.0 | 3.93 | 1.86 | 1.23 |
| 9 | 18 | 13.64 | 3.10 | 15.2 | 116.0 | 3.03 | 1.66 | 0.96 |
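Note that `reset_index()` returns a new frame and, by default, keeps the old labels in a column named `index`. To discard the old labels and keep the result, pass `drop=True` and assign the return value. A minimal sketch on a hypothetical frame `toy`:

```python
import pandas as pd

# Hypothetical frame whose index has gaps, as after dropping rows.
toy = pd.DataFrame({"alcohol": [10.0, 14.12, 13.75]}, index=[2, 10, 11])

# Default: the old labels are preserved in a new "index" column.
kept = toy.reset_index()

# drop=True: the old labels are discarded and the index restarts at 0.
dropped = toy.reset_index(drop=True)

print(list(kept.columns))     # ['index', 'alcohol']
print(list(dropped.columns))  # ['alcohol']
```

Applied to this exercise, that would read `wine = wine.reset_index(drop=True)`.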


### BONUS: Create your own question and answer it.