# Wine

### Introduction:

This exercise is an adaptation of the UCI Wine dataset.
Its only purpose is to practice deleting data with pandas.

### Step 1. Import the necessary libraries

In [3]:
import pandas as pd
import numpy as np

### Step 2. Import the dataset from this [address](http://mlr.cs.umass.edu/ml/machine-learning-databases/wine/wine.data). 

4. Relevant Information:

   -- These data are the results of a chemical analysis of
      wines grown in the same region in Italy but derived from three
      different cultivars.
      The analysis determined the quantities of 13 constituents
      found in each of the three types of wines. 

   -- I think that the initial data set had around 30 variables, but 
      for some reason I only have the 13 dimensional version. 
      I had a list of what the 30 or so variables were, but a.) 
      I lost it, and b.), I would not know which 13 variables
      are included in the set.

   -- The attributes are (donated by Riccardo Leardi, 
	riclea@anchem.unige.it )
 	
    1) Alcohol
 	
    2) Malic acid
 	
    3) Ash
	
    4) Alcalinity of ash  
 	
    5) Magnesium
	
    6) Total phenols
 	
    7) Flavanoids
 	
    8) Nonflavanoid phenols
 	
    9) Proanthocyanins
	
    10)Color intensity
 	
    11)Hue
 	
    12)OD280/OD315 of diluted wines
 	
    13)Proline            

5. Number of Instances

    class 1 59
    
	class 2 71
    
	class 3 48

6. Number of Attributes 
	
	13

7. For Each Attribute:

	All attributes are continuous
	
	No statistics available, but suggest to standardise
	variables for certain uses (e.g. for use with classifiers
	which are NOT scale invariant)

	NOTE: 1st attribute is class identifier (1-3)

8. Missing Attribute Values:

	None

9. Class Distribution: number of instances per class

    class 1 59
    
	class 2 71
    
	class 3 48

### Step 3. Assign it to a variable called wine

In [5]:
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data'
url = 'http://mlr.cs.umass.edu/ml/machine-learning-databases/wine/wine.data'
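# The file has no header row, so pandas uses the first record as column names here;
# Step 5 replaces them.
wine = pd.read_csv(url)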

wine.head()

Unnamed: 0,1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065
0,1,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050
1,1,13.16,2.36,2.67,18.6,101,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185
2,1,14.37,1.95,2.5,16.8,113,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480
3,1,13.24,2.59,2.87,21.0,118,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735
4,1,14.2,1.76,2.45,15.2,112,3.27,3.39,0.34,1.97,6.75,1.05,2.85,1450
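The column names in the output above are really the first record of the file, because `read_csv` treats the first line as a header by default. The following is a minimal sketch of the difference `header=None` makes. It assumes an inline two-record sample (copied from the output above) in place of the remote file, so it runs without a network connection.

```python
import io

import pandas as pd

# Two records copied from the output above, standing in for the remote file.
sample = (
    "1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065\n"
    "1,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050\n"
)

# Default behaviour: the first record is consumed as the header.
with_default = pd.read_csv(io.StringIO(sample))

# header=None keeps every record as data and numbers the columns 0..13.
with_no_header = pd.read_csv(io.StringIO(sample), header=None)

print(with_default.shape)    # (1, 14)
print(with_no_header.shape)  # (2, 14)
```

The exercise keeps the default call, so the outputs in the following steps start from the second record of the file.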


### Step 4. Delete the first, fourth, seventh, ninth, twelfth, thirteenth and fourteenth columns

In [6]:
wine = wine.drop(wine.columns[[0,3,6,8,11,12,13]], axis = 1)

# Signature: wine.drop(labels, axis=0, level=None, inplace=False, errors='raise')
# Docstring:
# Return new object with labels in requested axis removed.

# Parameters
# ----------
# labels : single label or list-like
# axis : int or axis name
# level : int or level name, default None
#     For MultiIndex
# inplace : bool, default False
#     If True, do operation inplace and return None.
# errors : {'ignore', 'raise'}, default 'raise'
#     If 'ignore', suppress error and existing labels are dropped.

    
# Returns
# -------
# dropped : type of caller
    


wine.head()

Unnamed: 0,14.23,1.71,15.6,127,3.06,2.29,5.64
0,13.2,1.78,11.2,100,2.76,1.28,4.38
1,13.16,2.36,18.6,101,3.24,2.81,5.68
2,14.37,1.95,16.8,113,3.49,2.18,7.8
3,13.24,2.59,21.0,118,2.69,1.82,4.32
4,14.2,1.76,15.2,112,3.39,1.97,6.75
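`drop` works on labels, which is why the positions are first translated through `wine.columns[...]`. The sketch below shows the same idea on a small, made-up DataFrame (the column names `a` to `d` are hypothetical) and compares it with dropping by label through the `columns=` keyword.

```python
import pandas as pd

# Toy frame with hypothetical column names.
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=["a", "b", "c", "d"])

# Drop by position: look the labels up through df.columns first.
by_position = df.drop(df.columns[[0, 2]], axis=1)

# Drop by label: the columns= keyword makes axis=1 unnecessary.
by_label = df.drop(columns=["a", "c"])

print(by_position.columns.tolist())  # ['b', 'd']
print(by_position.equals(by_label))  # True
```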


### Step 5. Assign the columns as below:

The attributes are (donated by Riccardo Leardi, riclea '@' anchem.unige.it):  
1) alcohol  
2) malic_acid  
3) alcalinity_of_ash  
4) magnesium  
5) flavanoids  
6) proanthocyanins  
7) hue 

In [7]:
wine.columns = ['alcohol', 'malic_acid', 'alcalinity_of_ash', 'magnesium', 'flavanoids', 'proanthocyanins', 'hue']
wine.head()

Unnamed: 0,alcohol,malic_acid,alcalinity_of_ash,magnesium,flavanoids,proanthocyanins,hue
0,13.2,1.78,11.2,100,2.76,1.28,4.38
1,13.16,2.36,18.6,101,3.24,2.81,5.68
2,14.37,1.95,16.8,113,3.49,2.18,7.8
3,13.24,2.59,21.0,118,2.69,1.82,4.32
4,14.2,1.76,15.2,112,3.39,1.97,6.75


### Step 6. Set the values of the first 3 rows of alcohol to NaN

In [8]:
wine.iloc[0:3, 0] = np.nan
wine.head()

Unnamed: 0,alcohol,malic_acid,alcalinity_of_ash,magnesium,flavanoids,proanthocyanins,hue
0,,1.78,11.2,100,2.76,1.28,4.38
1,,2.36,18.6,101,3.24,2.81,5.68
2,,1.95,16.8,113,3.49,2.18,7.8
3,13.24,2.59,21.0,118,2.69,1.82,4.32
4,14.2,1.76,15.2,112,3.39,1.97,6.75


### Step 7. Now set the values of rows 3 and 4 of magnesium to NaN

In [9]:
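# magnesium was read as integers; cast to float first so the column can hold NaN
wine['magnesium'] = wine['magnesium'].astype(float)
wine.iloc[2:4, 3] = np.nan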
wine.head()

Unnamed: 0,alcohol,malic_acid,alcalinity_of_ash,magnesium,flavanoids,proanthocyanins,hue
0,,1.78,11.2,100.0,2.76,1.28,4.38
1,,2.36,18.6,101.0,3.24,2.81,5.68
2,,1.95,16.8,,3.49,2.18,7.8
3,13.24,2.59,21.0,,2.69,1.82,4.32
4,14.2,1.76,15.2,112.0,3.39,1.97,6.75
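The magnesium values change from `100` to `100.0` in the output above because an integer column cannot store NaN, so the column ends up as floats. A small sketch of two ways to make that conversion explicit, using a made-up five-value column: casting to `float64` before assigning NaN, or using the nullable `Int64` dtype with `pd.NA` to keep integer values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"magnesium": [100, 101, 113, 118, 112]})
dtype_before = df["magnesium"].dtype  # int64

# Option 1: cast to float explicitly, then assign NaN.
df["magnesium"] = df["magnesium"].astype("float64")
df.iloc[2:4, 0] = np.nan

# Option 2: the nullable integer dtype stores missing values as pd.NA.
nullable = pd.Series([100, 101, 113], dtype="Int64")
nullable.iloc[2] = pd.NA

print(dtype_before, "->", df["magnesium"].dtype)
print(nullable.dtype, int(nullable.isna().sum()))
```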


### Step 8. Fill the NaN values with 10 in alcohol and 100 in magnesium

In [10]:
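# Assign the result back; calling fillna(inplace=True) on a selected column
# may not update the original DataFrame in recent pandas versions.
wine['alcohol'] = wine['alcohol'].fillna(10)

wine['magnesium'] = wine['magnesium'].fillna(100)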

wine.head()

Unnamed: 0,alcohol,malic_acid,alcalinity_of_ash,magnesium,flavanoids,proanthocyanins,hue
0,10.0,1.78,11.2,100.0,2.76,1.28,4.38
1,10.0,2.36,18.6,101.0,3.24,2.81,5.68
2,10.0,1.95,16.8,100.0,3.49,2.18,7.8
3,13.24,2.59,21.0,100.0,2.69,1.82,4.32
4,14.2,1.76,15.2,112.0,3.39,1.97,6.75
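Both columns can also be filled in one call by passing `fillna` a dictionary that maps column names to fill values. The sketch below uses a small made-up frame; columns that are not in the dictionary (here `hue`) keep their NaN values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "alcohol": [np.nan, 13.24, 14.2],
    "magnesium": [np.nan, 100.0, 112.0],
    "hue": [4.38, np.nan, 6.75],
})

# One fill value per column; unlisted columns are left untouched.
filled = df.fillna({"alcohol": 10, "magnesium": 100})

print(filled)
```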


### Step 9. Count the number of missing values

In [11]:
wine.isnull().sum()

alcohol              0
malic_acid           0
alcalinity_of_ash    0
magnesium            0
flavanoids           0
proanthocyanins      0
hue                  0
dtype: int64

### Step 10.  Create an array of 10 random integers from 0 up to (but not including) 10

In [12]:
random = np.random.randint(10, size = 10)

# Docstring:
# randint(low, high=None, size=None, dtype='l')

# Return random integers from `low` (inclusive) to `high` (exclusive).

# Return random integers from the "discrete uniform" distribution of
# the specified dtype in the "half-open" interval [`low`, `high`). If
# `high` is None (the default), then results are from [0, `low`).

# Parameters
# ----------
# low : int
#     Lowest (signed) integer to be drawn from the distribution (unless
#     ``high=None``, in which case this parameter is the *highest* such
#     integer).
# high : int, optional
#     If provided, one above the largest (signed) integer to be drawn
#     from the distribution (see above for behavior if ``high=None``).
# size : int or tuple of ints, optional
#     Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
#     ``m * n * k`` samples are drawn.  Default is None, in which case a
#     single value is returned.
# dtype : dtype, optional
#     Desired dtype of the result. All dtypes are determined by their
#     name, i.e., 'int64', 'int', etc, so byteorder is not available
#     and a specific precision may have different C types depending
#     on the platform. The default value is 'np.int'.

# Returns
# -------
# out : int or ndarray of ints
#     `size`-shaped array of random integers from the appropriate
#     distribution, or a single such random int if `size` not provided.

random

array([0, 1, 3, 8, 4, 2, 1, 5, 5, 4])
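The array above will differ on every run because no seed is set. A minimal sketch of a reproducible alternative, assuming NumPy's `Generator` API (`np.random.default_rng`) is available; the seed value 42 is arbitrary.

```python
import numpy as np

# A seeded generator produces the same draws on every run.
rng = np.random.default_rng(seed=42)
draws = rng.integers(low=0, high=10, size=10)  # high is exclusive

# A second generator with the same seed repeats the sequence.
rng_again = np.random.default_rng(seed=42)
same_draws = rng_again.integers(low=0, high=10, size=10)

print(draws)
print(np.array_equal(draws, same_draws))  # True
```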

### Step 11.  Use the random numbers as row indices and set those rows of alcohol to NaN

In [13]:
# Use .loc on the DataFrame itself to avoid chained assignment
wine.loc[random, 'alcohol'] = np.nan
wine.head(10)

Unnamed: 0,alcohol,malic_acid,alcalinity_of_ash,magnesium,flavanoids,proanthocyanins,hue
0,,1.78,11.2,100.0,2.76,1.28,4.38
1,,2.36,18.6,101.0,3.24,2.81,5.68
2,,1.95,16.8,100.0,3.49,2.18,7.8
3,,2.59,21.0,100.0,2.69,1.82,4.32
4,,1.76,15.2,112.0,3.39,1.97,6.75
5,,1.87,14.6,96.0,2.52,1.98,5.25
6,14.06,2.15,17.6,121.0,2.51,1.25,5.05
7,14.83,1.64,14.0,97.0,2.98,1.98,5.2
8,,1.35,16.0,98.0,3.15,1.85,7.22
9,14.1,2.16,18.0,105.0,3.32,2.38,5.75


### Step 12.  How many missing values do we have?

In [14]:
wine.isnull().sum()

alcohol              7
malic_acid           0
alcalinity_of_ash    0
magnesium            0
flavanoids           0
proanthocyanins      0
hue                  0
dtype: int64
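Ten random numbers were drawn but only 7 values are missing, because repeated indices hit the same row. The sketch below reproduces the count with the array shown in Step 10 and a made-up ten-value Series standing in for the alcohol column.

```python
import numpy as np
import pandas as pd

# The array shown in Step 10.
random = np.array([0, 1, 3, 8, 4, 2, 1, 5, 5, 4])

# Stand-in for the alcohol column (values are made up).
alcohol = pd.Series(np.arange(10, 20, dtype="float64"))
alcohol.loc[random] = np.nan

print(np.unique(random))     # [0 1 2 3 4 5 8]
print(alcohol.isna().sum())  # 7
```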

### Step 14. Print only the non-null values in alcohol

In [16]:
mask = wine.alcohol.notnull()

# Signature: wine.alcohol.notnull()
# Docstring:
# Return a boolean same-sized object indicating if the values are
# not null.

# See Also
# --------
# isnull : boolean inverse of notnull


print(mask.head())

print(wine.alcohol[mask])

0    False
1    False
2    False
3    False
4    False
Name: alcohol, dtype: bool
6      14.06
7      14.83
9      14.10
10     14.12
11     13.75
12     14.75
13     14.38
14     13.63
15     14.30
16     13.83
17     14.19
18     13.64
19     14.06
20     12.93
21     13.71
22     12.85
23     13.50
24     13.05
25     13.39
26     13.30
27     13.87
28     14.02
29     13.73
30     13.58
31     13.68
32     13.76
33     13.51
34     13.48
35     13.28
36     13.05
       ...  
147    13.32
148    13.08
149    13.50
150    12.79
151    13.11
152    13.23
153    12.58
154    13.17
155    13.84
156    12.45
157    14.34
158    13.48
159    12.36
160    13.69
161    12.85
162    12.96
163    13.78
164    13.73
165    13.45
166    12.82
167    13.58
168    13.40
169    12.20
170    12.77
171    14.16
172    13.71
173    13.40
174    13.27
175    13.17
176    14.13
Name: alcohol, dtype: float64
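Filtering with the `notnull()` mask gives the same result as calling `dropna()` on the column. A short sketch on a made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 14.06, np.nan, 14.83], name="alcohol")

# Boolean mask: True where a value is present.
masked = s[s.notnull()]

# dropna() removes the missing entries directly.
dropped = s.dropna()

print(masked.index.tolist())   # [1, 3]
print(masked.equals(dropped))  # True
```

In both cases the surviving rows keep their original index labels, as in the output above.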


### Step 13. Delete the rows that contain missing values

In [18]:
wine = wine.dropna(axis = 0, how = "any")
wine.head(10)

Unnamed: 0,alcohol,malic_acid,alcalinity_of_ash,magnesium,flavanoids,proanthocyanins,hue
6,14.06,2.15,17.6,121.0,2.51,1.25,5.05
7,14.83,1.64,14.0,97.0,2.98,1.98,5.2
9,14.1,2.16,18.0,105.0,3.32,2.38,5.75
10,14.12,1.48,16.8,95.0,2.43,1.57,5.0
11,13.75,1.73,16.0,89.0,2.76,1.81,5.6
12,14.75,1.73,11.4,91.0,3.69,2.81,5.4
13,14.38,1.87,12.0,102.0,3.64,2.96,7.5
14,13.63,1.81,17.2,112.0,2.91,1.46,7.3
15,14.3,1.92,20.0,120.0,3.14,1.97,6.2
16,13.83,1.57,20.0,115.0,3.4,1.72,6.6
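`how="any"` removes a row if any column is missing. When only some columns matter, `dropna` also accepts a `subset` argument. The sketch below contrasts the two on a small made-up frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "alcohol": [np.nan, 14.06, 14.83],
    "hue": [4.38, np.nan, 5.2],
})

# Drop a row if any column is missing.
any_missing = df.dropna(axis=0, how="any")

# Drop a row only if alcohol is missing.
alcohol_only = df.dropna(subset=["alcohol"])

print(any_missing.index.tolist())   # [2]
print(alcohol_only.index.tolist())  # [1, 2]
```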


### Step 15.  Reset the index, so it starts with 0 again

In [20]:
wine = wine.reset_index(drop = True)
wine.head(10)

Unnamed: 0,alcohol,malic_acid,alcalinity_of_ash,magnesium,flavanoids,proanthocyanins,hue
0,14.06,2.15,17.6,121.0,2.51,1.25,5.05
1,14.83,1.64,14.0,97.0,2.98,1.98,5.2
2,14.1,2.16,18.0,105.0,3.32,2.38,5.75
3,14.12,1.48,16.8,95.0,2.43,1.57,5.0
4,13.75,1.73,16.0,89.0,2.76,1.81,5.6
5,14.75,1.73,11.4,91.0,3.69,2.81,5.4
6,14.38,1.87,12.0,102.0,3.64,2.96,7.5
7,13.63,1.81,17.2,112.0,2.91,1.46,7.3
8,14.3,1.92,20.0,120.0,3.14,1.97,6.2
9,13.83,1.57,20.0,115.0,3.4,1.72,6.6
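`drop=True` discards the old index. Without it, `reset_index` keeps the old labels as a new column named `index`. A short sketch on a made-up frame whose index has gaps like the one above:

```python
import pandas as pd

df = pd.DataFrame({"alcohol": [14.06, 14.83, 14.1]}, index=[6, 7, 9])

# drop=True: the old labels are thrown away.
dropped = df.reset_index(drop=True)

# Default: the old labels are kept in a column called "index".
kept = df.reset_index()

print(dropped.index.tolist())  # [0, 1, 2]
print(kept.columns.tolist())   # ['index', 'alcohol']
```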


### BONUS: Create your own question and answer it.