#### Guidelines

By now you are probably familiar with downloading and opening Jupyter Notebooks. 
In this Notebook, there are a series of cells divided into three categories:

- Numpy: This is a Python package to deal with numerical data of the same type
- Pandas: A Python package to deal with data organised as tables
- Matplotlib/Seaborn: Packages to deal with plotting data

Each section below has a series of cells showing very simple but common ways to manipulate data. Each cell can be run independently and may have comments. However, we expect you to follow the instructor as they progress through the Notebook. 

### Numpy

In [None]:
# A Python list with some numbers
l1 = [1, 8.9, 9.9, 4.2, 2]

# Note the list contains elements that are either integers or floating point numbers

In [None]:
# Let's sum them
l1sum = 0
for val in l1:
    l1sum += val
    
print (l1sum)

Summing all the elements of a list or array is a very common operation. It would be much more convenient to simply request the sum than to program the loop every time. 

In [None]:
import numpy as np

In [None]:
l1 = np.array(l1)

print (l1.sum())

#### Why do we bother?

In [None]:
N = 100000
xl = [np.random.rand() for i in range(N)] 

In [None]:
%%timeit 
xlsum = 0
for val in xl:
    xlsum += val
    

In [None]:
x = np.random.rand(N)

In [None]:
%timeit xsum = x.sum()

That's why. Compare the two timings: using Numpy functions is much faster than crafting computations with our own for loops.
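
The `%timeit` magics only work inside a notebook or IPython. As a rough sketch, the same comparison can be made in a plain Python script with the standard `timeit` module. The helper name `loop_sum` and the repeat count of 10 are illustrative choices, not part of the tutorial, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

N = 100_000
values = np.random.rand(N)      # NumPy array of N random numbers
values_list = values.tolist()   # the same numbers as a plain Python list


def loop_sum(seq):
    """Sum a sequence with an explicit for loop (hypothetical helper)."""
    total = 0.0
    for val in seq:
        total += val
    return total


# Time 10 repetitions of each approach
loop_time = timeit.timeit(lambda: loop_sum(values_list), number=10)
numpy_time = timeit.timeit(lambda: values.sum(), number=10)

print('Loop : {:.4f} s for 10 runs'.format(loop_time))
print('NumPy: {:.4f} s for 10 runs'.format(numpy_time))

# Both approaches give the same answer up to floating point rounding
print(np.isclose(loop_sum(values_list), values.sum()))
```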

In [None]:
x = np.array([8, 4, 16, 32, 2], dtype=int)  # An array of integer types (the np.int alias was removed in recent NumPy versions)
xmed = np.median(x).astype(int)
xmean = np.mean(x)
xstd = np.std(x)
print ('The median is: {:d}'.format(xmed))
print ('The mean is: {:f}'.format(xmean))
print ('The standard deviation is: {:2.2f}'.format(xstd))

In [None]:
# Sorting 
xs = np.sort(x)
xs

In [None]:
idxs = np.argsort(x)      # Finds the positions of x that sorts the values
print (x)
print (idxs)              
print (x[idxs])           # The elements of x that correspond to positions 'idxs'
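
One reason `argsort` is useful is that the same positions can reorder a second, related array. The sketch below uses made-up heights and names (they are not from the class data) to show the idea.

```python
import numpy as np

# Made-up example data: one height per name, in the same order
heights = np.array([172, 158, 185, 166])
names = np.array(['Ana', 'Ben', 'Cat', 'Dan'])

order = np.argsort(heights)     # positions that sort heights in ascending order
print(order)                    # [1 3 0 2]
print(heights[order])           # [158 166 172 185]
print(names[order])             # ['Ben' 'Dan' 'Ana' 'Cat']

# Reversing the positions gives descending order
print(names[order[::-1]])       # ['Cat' 'Ana' 'Dan' 'Ben']
```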

In [None]:
# Indexing
print (x)      # Array x 
print (x[0])   # Value at position zero
print (x[-1])  # Value at last position

In [None]:
# Slicing
print (x)
print (x[1:4])    # Positions 1, 2, 3
print (x[-3:-1])  # Positions 2 and 3 or (5-3 and  5-2)

In [None]:
# More slicing
print (x)
print (x[1:])      # all values from position 1 
print (x[:-1])     # all positions except last one
print (x[1:-1:2])  # every other value from position 1 up to, but not including, the last

In [None]:
# some sums
print (x)
print (x[1:] + x[:-1])  # a sum 
print (np.cumsum(x))    # a cumulative sum

In [None]:
# Some computation
w0, w1, w2 = -3, 2, 1
W = np.array([w0, w1, w2])
y = np.dot(W, np.sort(x[0:3]))
print (np.sort(x[0:3]))
print (W)
print (y)


# Note the above y value is the same as:
print (np.sum(W*np.sort(x[0:3])))

#### Working with 2D arrays

In [None]:
x2 = np.empty((2,3))          # An uninitialised array: its contents are arbitrary until we fill it
x2

In [None]:
a = np.arange(6).reshape(2,3)
b = a**2
print (a)
print (b)
np.multiply(a, b, out = x2)
print (x2)

# Note: a*b gives the same element-wise product as np.multiply
print (a*b)

In [None]:
print (x2)
print (x2[:,1])   # middle column

In [None]:
print (x2)
print (x2[1,:][1:])  # last two columns in second row

In [None]:
# Transpose
print (x2.T)

In [None]:
print (x2[0:2,0:2])
print (np.linalg.inv(x2[0:2,0:2]))  # inverse of that slice
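
A quick way to check an inverse is to multiply it by the original matrix and compare the result with the identity. The sketch below assumes the 2x2 values that `x2[0:2,0:2]` holds after the multiply cell, written out by hand so that it runs on its own.

```python
import numpy as np

# Hand-written copy of the 2x2 slice (assumed values: 0, 1, 27, 64)
A = np.array([[0.0, 1.0],
              [27.0, 64.0]])

A_inv = np.linalg.inv(A)

# A matrix times its inverse should be the identity, up to rounding
print(np.allclose(A @ A_inv, np.eye(2)))   # True

# A non-zero determinant is what makes the inverse exist: 0*64 - 1*27 = -27
print(np.linalg.det(A))
```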

In [None]:
# Extracting diagonal
x2.diagonal()

In [None]:
# Min and max of whole array
np.min(x2), np.max(x2)

In [None]:
np.random.shuffle(x2.ravel())   # ravel() returns a view here, so x2 itself is shuffled in place
print (x2)
x2.min(axis=0)   # Minimum over each column
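
The shuffle cell works because `ravel()` returns a view of a contiguous array, so changes made through it reach the original. As a small sketch with a throwaway array (`arr` is an illustrative name), compare this with `flatten()`, which always returns a copy.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

view = arr.ravel()      # a view for a contiguous array like this one
copy = arr.flatten()    # always an independent copy

print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, copy))   # False

view[0] = 100   # changes arr as well
copy[1] = 200   # leaves arr untouched
print(arr)
```

If the array were not contiguous (for example a transposed array), `ravel()` would have to return a copy, and shuffling it would leave the original unchanged.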

In [None]:
print (x2)
x2.max(axis=1)   # Maximum over each row

In [None]:
# Masking
mask = x2 >= 27
print (x2)
print (mask)

In [None]:
print (x2[mask])
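
Masks can also be combined and used to count or replace values. The following is a minimal sketch on a small hand-written array (`grid` and the thresholds are illustrative, not from the tutorial).

```python
import numpy as np

grid = np.array([[0, 1, 8],
                 [27, 64, 125]])

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
mask = (grid >= 8) & (grid < 100)

print(grid[mask])                  # [ 8 27 64]
print(mask.sum())                  # 3, since True counts as 1

# Keep the selected values and replace everything else with -1
print(np.where(mask, grid, -1))

# Assign through a mask: cap every value above 50 at 50 (on a copy)
capped = grid.copy()
capped[capped > 50] = 50
print(capped)
```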

#### Please read any documentation you find and like
https://docs.scipy.org/doc/numpy-1.15.1/user/quickstart.html

### Pandas

In [None]:
import pandas as pd

In [None]:
mal_class = pd.read_excel('Week3Tutorial.xlsx')
print (mal_class.shape[0])
mal_class.head()

In [None]:
mal_class.drop_duplicates(subset='Id', inplace=True)
print (mal_class.shape[0])
mal_class.head()
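
To see what `drop_duplicates` does without needing the Excel file, here is a small sketch on a made-up table (`toy` and its contents are invented for illustration). It keeps the first row for each repeated `Id`.

```python
import pandas as pd

# Made-up table with one repeated Id
toy = pd.DataFrame({
    'Id': [1, 2, 2, 3],
    'Name': ['Ana', 'Ben', 'Ben', 'Cat'],
})

# duplicated() flags rows whose Id has already been seen
print(toy.duplicated(subset='Id').tolist())   # [False, False, True, False]

# Without inplace=True the original is left untouched and a new table is returned
deduped = toy.drop_duplicates(subset='Id', keep='first')
print(toy.shape[0], deduped.shape[0])         # 4 3

# The surviving rows keep their old index labels; reset them if needed
print(deduped.index.tolist())                 # [0, 1, 3]
deduped = deduped.reset_index(drop=True)
print(deduped)
```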

In [None]:
mal_class.drop(columns=['Other'], inplace=True)   # columns= already names the axis, so axis=1 is not needed
mal_class.head()

In [None]:
n = mal_class.shape[0]
height = np.random.randint(150, 190, n)
weight = np.random.rand(n)*35 + 50
print (height)

In [None]:
mal_class['Height'] = height
mal_class['Weight'] = weight

In [None]:
mal_class

#### BMI

The BMI (Body Mass Index) is computed from weight in kilograms and height in metres:

$$ BMI = \frac{weight(kg)}{height(m)^2} $$

- A person is considered fit with a BMI between 18.5 and 24.9
- Underweight if BMI is under 18.5
- Overweight if BMI is over 24.9
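
As a worked example with made-up numbers, take a weight of 70 kg and a height of 175 cm. The height is first converted from centimetres to metres, which is the same conversion the calculation cell applies with the factor 0.01.

```python
# Made-up example values
weight_kg = 70.0
height_cm = 175.0

height_m = 0.01 * height_cm          # 175 cm is 1.75 m
bmi = weight_kg / height_m ** 2      # 70 / 1.75**2 = 70 / 3.0625

print('BMI: {:.1f}'.format(bmi))     # BMI: 22.9
```

A value of about 22.9 falls in the 18.5 to 24.9 range.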


In [None]:
# Let's add a calculation of the BMI
mal_class['BMI'] = mal_class['Weight']/(0.01*mal_class['Height'])**2
mal_class.head()

In [None]:
# Number of males and females in the class
mal_class['Gender'].value_counts()

In [None]:
# Fit people
mal_class[(mal_class['BMI'] >= 18.5) & (mal_class['BMI'] <= 24.9)]['Name']


In [None]:
# Underweight
mal_class[mal_class['BMI'] < 18.5].BMI

In [None]:
# Overweight
ow = mal_class[mal_class['BMI'] > 24.9].BMI

mal_class.loc[ow.index, 'Name']
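
The three filters can also be folded into a single label column. This is one possible sketch, using `np.select` on a made-up table (`toy`, the `Category` column and the BMI values are invented for illustration). It uses the same thresholds as the cells above.

```python
import numpy as np
import pandas as pd

# Made-up BMI values, including the two boundary cases
toy = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cat', 'Dan'],
    'BMI': [17.9, 18.5, 24.9, 27.3],
})

# Conditions are checked in order; the first one that is True decides the label
conditions = [toy['BMI'] < 18.5, toy['BMI'] <= 24.9]
labels = ['underweight', 'fit']
toy['Category'] = np.select(conditions, labels, default='overweight')

print(toy)
print(toy['Category'].value_counts())
```

With a label column in place, counting each group is a single `value_counts()` call instead of three separate filters.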

### Visualisation basics

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns; sns.set(color_codes=True)

%matplotlib inline

In [None]:
plt.scatter(x = mal_class['Height'], y = mal_class['BMI'])

In [None]:
plt.scatter(x = mal_class['Height'], y = mal_class['BMI'])
plt.xlabel('Height in cm')
plt.ylabel('BMI');

In [None]:
plt.scatter(x = mal_class['Height'], y = mal_class['BMI'], c='k', s = 4)
plt.xlabel('Height in cm')
plt.ylabel('BMI');

In [None]:
data1 = mal_class.groupby(['Gender'])['Weight'].mean()

data1
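
`groupby` can return several statistics at once through `agg`. The sketch below uses a made-up table (`toy` and its values are invented for illustration) so that the numbers are easy to check by hand.

```python
import pandas as pd

# Made-up weights in kg
toy = pd.DataFrame({
    'Gender': ['F', 'M', 'F', 'M'],
    'Weight': [60.0, 80.0, 64.0, 70.0],
})

# One row per group, one column per statistic
summary = toy.groupby('Gender')['Weight'].agg(['mean', 'min', 'max', 'count'])
print(summary)

# Single values can be read with .loc[row label, column label]
print(summary.loc['F', 'mean'])   # 62.0
```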

In [None]:
plt.bar(data1.index, data1.values, alpha=0.4)
plt.title('Mean weight in kg of males and females');

In [None]:
# A pie chart of underweight, fit and overweight
# Underweight
u = mal_class[mal_class['BMI'] < 18.5].BMI.count()
f = mal_class[(mal_class['BMI'] >= 18.5) & (mal_class['BMI'] <= 24.9)].BMI.count()
o = mal_class[mal_class['BMI'] > 24.9].BMI.count()

plt.pie(x = [u, f, o], labels = ['underweight', 'fit', 'overweight'])
plt.title('Distribution of BMI across the class');

In [None]:
# A density curve of the BMI
sns.kdeplot(mal_class.BMI, fill=True, color = "r")   # fill= replaces the deprecated shade= argument
plt.title('Kernel Density Estimation for BMI');

In [None]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
sns.kdeplot(mal_class.BMI, fill=True, color = "r")   # fill= replaces the deprecated shade= argument
plt.title('Kernel Density Estimation for BMI');

In [None]:
# A different plot
sns.regplot(x="Weight", y="BMI", data=mal_class);

Have a look at this, if you have some time
https://robots.thoughtbot.com/analyzing-minards-visualization-of-napoleons-1812-march

### Try this tutorial on your own time

https://www.lynda.com/Pandas-training-tutorials/7715-0.html

Use this time to try things out!

- Now you can go back and re-run all the tutorials from Lecture 1
- And from Lecture 2 too!
- There are plenty of examples of *numpy*, *pandas* and of different plots 

