
# NumPy Arrays and Pandas Series tutorial

<a href = #numpy> $\S$ 1. NumPy Array</a>

- <a href = #array_index> 1.1. NumPy Indexing</a>
    - <a href = #array_index1> 1.1.1 Array Slicing </a>
    - <a href = #array_index2> 1.1.2  Integer Array Indexing</a> 
    - <a href = #array_index3> 1.1.3 Boolean array indexing</a>
- <a href = #array_math> 1.2 Array Math </a>

<a href = #pandas> $\S$ 2. Pandas Series</a>
- <a href = #pandas_alignment>Alignment of Series objects</a>



<a id = 'numpy'></a>
## NumPy Arrays

Based on the tutorial by Justin Johnson: http://cs231n.github.io/python-numpy-tutorial/#numpy-array-indexing

#### NumPy Array

A **numpy array** is
* a _grid of values_,
* all of the _same type_,
* _indexed by a tuple of nonnegative integers_.

**Array _rank_** is the number of dimensions.

**Array _shape_** is a tuple of integers that gives the size of the array along each dimension.


A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.
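As a quick illustration of these definitions, the sketch below inspects the rank, shape, and element type of a small array. The array `m` is a made-up example, not part of the original tutorial.

```python
import numpy as np

# Hypothetical example array with 2 rows and 3 columns
m = np.array([[1, 2, 3], [4, 5, 6]])

print(m.ndim)   # rank: number of dimensions (2)
print(m.shape)  # size along each dimension: (2, 3)
print(m.dtype)  # common type of all elements (platform-dependent integer type)
print(m[1, 2])  # indexed by a tuple of nonnegative integers: 6
```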

In [2]:
import numpy as np

# Initialize array from a Python list

r1 = np.array([1,2,3]) # Create a rank 1 array
print (type(r1))
print ('r1 shape:',r1.shape)

print (r1[2],r1[1], r1[0]) # Access elements of an array by index using [] 
r1[0] = 5 # mutable
print (r1)

r2 = np.array([[1,2,3],[4,5,6]]) # Create a rank 2 array. Note that we have nested list!
print ('r2 shape:',r2.shape) # prints (2,3)
print (r2[0,1],r2[1,1])


<class 'numpy.ndarray'>
r1 shape: (3,)
3 2 1
[5 2 3]
r2 shape: (2, 3)
2 5


NumPy also provides a number of functions for creating simple arrays:

In [28]:
z = np.zeros((2,3)) # Create a 2x3 array of zeros
print (z)

[[ 0.  0.  0.]
 [ 0.  0.  0.]]


In [27]:
one = np.ones((3,3))# Create an array of ones
print(one)

[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]


In [30]:
c = np.full((2,3),7)# Create an array of constants
print(c)

[[7 7 7]
 [7 7 7]]


In [32]:
identity = np.eye(5) # Creates 5x5 identity matrix
print (identity)

[[ 1.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.]
 [ 0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  1.]]


In [34]:
rand = np.random.random((2,2)) # Creates an array filled with random numbers
print (rand)

[[ 0.87757053  0.78561909]
 [ 0.87791236  0.45503234]]


**arange()** will create arrays with regularly incrementing values

In [40]:
arr1 = np.arange(9)
print (arr1)
print (" ")
arr2 = np.arange(3,10)
print (arr2)
print (" ")
arr3 = np.arange(5,10,0.25)
print (arr3)
print (" ")

[0 1 2 3 4 5 6 7 8]
 
[3 4 5 6 7 8 9]
 
[ 5.    5.25  5.5   5.75  6.    6.25  6.5   6.75  7.    7.25  7.5   7.75
  8.    8.25  8.5   8.75  9.    9.25  9.5   9.75]
 


**linspace()** creates an array with a specified number of elements, spaced equally between the specified start and end values.

**linspace()** has an advantage over **arange()**: it guarantees the number of elements as well as the start and end points, which arange() generally will not do for arbitrary start, stop, and step values.

In [46]:
l1 = np.linspace(1,4,6) # 1 is the start, 4 is the end, 6 is the number of elements. Note that the end is included!
print(l1)

l2 = np.linspace(2,8,3)
print(l2)

[ 1.   1.6  2.2  2.8  3.4  4. ]
[ 2.  5.  8.]
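The difference between the two functions can be sketched side by side. The values below are chosen only for illustration; `endpoint=False` is a standard `np.linspace` parameter that excludes the end value.

```python
import numpy as np

# arange: you choose the step; the stop value is excluded and the
# number of elements follows from (stop - start) / step.
a = np.arange(0, 1, 0.25)
print(a, len(a))          # 4 elements: 0, 0.25, 0.5, 0.75

# linspace: you choose the number of elements; both endpoints are included.
l = np.linspace(0, 1, 5)
print(l, len(l))          # 5 elements: 0, 0.25, 0.5, 0.75, 1

# With endpoint=False, linspace excludes the end value, like arange does.
l_open = np.linspace(0, 1, 4, endpoint=False)
print(np.allclose(a, l_open))  # True
```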


In [3]:
#Exercise1: Are a and b the same?

a = np.arange(1,10,2)
b = np.linspace(1,10,2)
print(a)
print(b)

[1 3 5 7 9]
[  1.  10.]


<a id = 'array_index'></a>
### 1.1 Array indexing

There are several ways to index into arrays.


<a id = 'array_index1'></a>
#### 1.1.1 Array slicing
As with Python lists, we can slice arrays to access their elements. However, since arrays may be multidimensional, you must specify a slice for each dimension of the array.

In [23]:
import numpy as np
a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
a

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [29]:
b = a[:2,1:3] # get first two rows and 2nd and 3rd columns
b

array([[2, 3],
       [6, 7]])

**Note!** A slice of an array is a view into the same data, so modifying it will modify the original array.

In [24]:
print(a[0,1])

2


In [30]:
b[0,0]=77

In [31]:
b

array([[77,  3],
       [ 6,  7]])

In [33]:
print(a[0,1])

77
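If you need an independent subarray, one option is to copy the slice explicitly. The sketch below uses `np.shares_memory` to check whether two arrays overlap in memory; the names `view` and `copy` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]          # a slice shares memory with a
copy = a[:2, 1:3].copy()   # an explicit copy owns its own data

print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, copy))  # False

copy[0, 0] = 77            # modifies only the copy
print(a[0, 1])             # still 2
```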


In [36]:
# Exercise2
a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
b = a[:2,1:3] # get first two rows and 2nd and 3rd columns
b[0,0]=99
# What is the output of the following line?
print(a[0,1])

99


We can also **mix integer indexing with slice indexing**. However, doing so yields **an array of lower rank** than the original.

In [45]:
a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
print(a)
print('shape a:',a.shape)
row_r1 = a[1,:] # returns a rank 1 view of the second row of a
print(row_r1)
print('shape of row_r1:',row_r1.shape)
row_r2 = a[1:2,:] # returns rank 2 view of the second row of a
print(row_r2)
print('shape of row_r2:',row_r2.shape)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
shape a: (3, 4)
[5 6 7 8]
shape of row_r1: (4,)
[[5 6 7 8]]
shape of row_r2: (1, 4)


In [49]:
# Same distinction with columns
col_r1 = a[:,1] # returns a rank 1 view of the second column of a
print(col_r1)
print('shape of col_r1:',col_r1.shape)
col_r2 = a[:, 1:2] # returns a rank 2 view of the second column of a
print(col_r2)
print('shape of col_r2:',col_r2.shape)

[ 2  6 10]
shape of col_r1: (3,)
[[ 2]
 [ 6]
 [10]]
shape of col_r2: (3, 1)


<a id ='array_index2'></a>
#### 1.1.2 Integer array indexing

With slicing, you create a view that is always a subarray of the original array. In contrast, with **integer array indexing** you can construct an arbitrary array using the data from another array.

In [6]:
a = np.array([[1,2,3],[4,5,6],[7,8,9]])

b = a[[0,1,2],[0,1,2]] # First list gives the row indices, second list gives the column indices: picks a[0,0], a[1,1], a[2,2]
print(b)

# Equivalently, build the same array element by element:

c =np.array([a[0,0],a[1,1],a[2,2]])
print(c)


# Exercise: what is the output of the following code?
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(a[[0, 0], [1, 1]])


# Equivalent to the previous integer array indexing example
print(np.array([a[0,1],a[0,1]]))



[1 5 9]
[1 5 9]
[2 2]
[2 2]
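Unlike a slice, the result of integer array indexing is a new array, not a view. A minimal sketch of that difference, using `np.shares_memory` as the check (the name `diag` is illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

diag = a[[0, 1, 2], [0, 1, 2]]    # integer array indexing returns a new array
print(np.shares_memory(a, diag))  # False

diag[0] = 100                     # changes only diag
print(a[0, 0])                    # 1, the original is unchanged

# Index lists may repeat and reorder elements freely
print(a[[2, 0, 2], [0, 1, 0]])    # [7 2 7]
```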


**Useful trick**:  with integer array indexing we can select or mutate one element from each row of a matrix:

In [33]:
# Create an array from which we will select values

a = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])

# Create an array of indices

idx = np.array([0,1,2,2])

#Print one element of each row of a using idx
print(a[np.arange(4),idx])

# Mutate one element from each row of a using the indices in idx

a[np.arange(4),idx]+=10
print(a)

# The same operation on a copy leaves a unchanged
a.copy()[np.arange(4),idx]+=10
print(a)

[ 1  5  9 12]
[[11  2  3]
 [ 4 15  6]
 [ 7  8 19]
 [10 11 22]]
[[11  2  3]
 [ 4 15  6]
 [ 7  8 19]
 [10 11 22]]


In [61]:
# Exercise: what is the output of the following code?
a = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
idx = np.array([0,1,2,2])

a[np.arange(4),idx]+=np.array([1,1,1,1])
a.copy()[np.arange(4),idx]+=10
print(a)

[[ 2  2  3]
 [ 4  6  6]
 [ 7  8 10]
 [10 11 13]]


In [63]:
#Answer:
a = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
idx = np.array([0,1,2,2]) # Create an array of indices

a[np.arange(4),idx]+=np.array([99,99,99,99]) # Mutate one element from each row of a using the indices in idx.
                        # In-place assignment through an index writes back into a, so the original values change
a.copy()[np.arange(4),idx]+=10 # Mutates a temporary COPY of a, which is then discarded; a itself is unaffected
print(a)

[[100   2   3]
 [  4 104   6]
 [  7   8 108]
 [ 10  11 111]]


<a id = 'array_index3'></a>
#### 1.1.3 Boolean array indexing

Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:

In [70]:
a =  np.array([[1,2],[3,4],[5,6]])
bool_idx = (a>2)
print(bool_idx)

[[False False]
 [ True  True]
 [ True  True]]
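The mask itself is only half of the story: to pick out the matching elements, pass it back as an index. The sketch below shows selection and assignment through a mask; the copy `b` is introduced only so that `a` stays intact.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
bool_idx = (a > 2)

# Selecting with a boolean mask returns a rank 1 array of the True positions
print(a[bool_idx])   # [3 4 5 6]

# The mask can also be written inline
print(a[a > 2])      # [3 4 5 6]

# Assigning through a mask changes only the selected elements
b = a.copy()
b[b > 2] = 0
print(b)             # rows: [1 2], [0 0], [0 0]
```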


In [72]:
print(a[0][1])
print(a[0,1])
print([a[0,1]])


2
2
[2]


In [74]:
y = (a>=2)

In [76]:
y.mean()

0.83333333333333337

<a id = 'array_math'></a>
### 1.2 Array math

In [88]:
x = np.array([[1,2],[3,4]],dtype = np.float64)
y = np.array([[5,6],[7,8]],dtype = np.float64)

print(x)
print(y)
#Sum
print(x+y)
print(np.add(x,y)) #same result

#Subtraction
print(y-x)
print(np.subtract(y,x))

#Multiplication
print(y*x)
print(np.multiply(x,y))

#Division
print(y/x)
print(np.divide(x,y)) # operands swapped: this is x/y, not y/x

#Elementwise square root
print(np.sqrt(x))

[[ 1.  2.]
 [ 3.  4.]]
[[ 5.  6.]
 [ 7.  8.]]
[[  6.   8.]
 [ 10.  12.]]
[[  6.   8.]
 [ 10.  12.]]
[[ 4.  4.]
 [ 4.  4.]]
[[ 4.  4.]
 [ 4.  4.]]
[[  5.  12.]
 [ 21.  32.]]
[[  5.  12.]
 [ 21.  32.]]
[[ 5.          3.        ]
 [ 2.33333333  2.        ]]
[[ 0.2         0.33333333]
 [ 0.42857143  0.5       ]]
[[ 1.          1.41421356]
 [ 1.73205081  2.        ]]


To compute the **inner product** of vectors, or to multiply a matrix by a matrix or a vector, use ```dot``` (available both as an array method and as a NumPy function).

In [98]:
# Inner products
x = np.array([[1,2],[3,4]],dtype = np.float64)
y = np.array([[5,6],[7,8]],dtype = np.float64)

v = np.array([1,1])
w = np.array([2,2])

#Inner product of vectors
print(v.dot(w))
print(np.dot(v,w))

# Vector-matrix product (v acts as a row vector multiplying x from the left)
print(v.dot(x))
print(np.dot(v,x))

#Matrix- matrix multiplication
print(np.dot(x,y))

4
4
[ 4.  6.]
[ 4.  6.]
[[ 19.  22.]
 [ 43.  50.]]
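For these rank 1 and rank 2 arrays, Python's `@` operator gives the same results as `dot`. The sketch below also shows that the order of operands matters when a vector is involved.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)
v = np.array([1, 1])

# Matrix-matrix product: @ agrees with np.dot for 2-D arrays
print(np.array_equal(x @ y, np.dot(x, y)))  # True

# Vector on the left: v is treated as a row vector
print(v @ x)   # [4. 6.]

# Vector on the right: v is treated as a column vector
print(x @ v)   # [3. 7.]
```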


<a id = 'pandas'></a>
## Pandas Series

In [100]:
import pandas as pd

A ```pd.Series``` can be created from different kinds of data:

- scalar
    ```python
    pd.Series(99)
    ```
- NumPy array
    ```python
    s = pd.Series(np.arange(5))
    ```
- Python dict
    ```python
    d = pd.Series({'a' : 0., 'b' : 1., 'c' : 2.})
    ```

In [115]:
# Create pd.Series from NumPy array
n = np.arange(5)
print(n)
s = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])
print(s)

[0 1 2 3 4]
a    0
b    1
c    2
d    3
e    4
dtype: int64


In [117]:
# Create pd.Series from scalar
print(pd.Series(99))

0    99
dtype: int64


In [119]:
# Create pd.Series from a dictionary
print(pd.Series({'a':1.,'b':2.,'c':3.}))

a    1.0
b    2.0
c    3.0
dtype: float64


In [129]:
# Series values and indices:
mySeries = pd.Series({'a':1.,'b':2.,'c':3.}, dtype = 'int')

print(mySeries.values)
print(mySeries.index)
print(mySeries.values.dtype)
print(mySeries.index.dtype)

[1 2 3]
Index(['a', 'b', 'c'], dtype='object')
int64
object


In [133]:
# A quicker way to assign non-numeric index labels:
import string
mySeries2 = pd.Series(np.arange(5),index = list(string.ascii_lowercase[0:len(np.arange(5))]), dtype = 'float')
mySeries2

a    0.0
b    1.0
c    2.0
d    3.0
e    4.0
dtype: float64

In [174]:
#Exercise:
mySeries = pd.Series(np.arange(5),index = list(string.ascii_lowercase[0:len(np.arange(5))]), dtype = 'float')
print(mySeries[3:])
print(mySeries[:-2])
myS = mySeries[['a','c','e','f']] 
print(myS)

d    3.0
e    4.0
dtype: float64
a    0.0
b    1.0
c    2.0
dtype: float64
a    0.0
c    2.0
e    4.0
f    NaN
dtype: float64


Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  return self.loc[key]


In [177]:
#Answer:
mySeries = pd.Series(np.arange(5),index = list(string.ascii_lowercase[0:len(np.arange(5))]), dtype = 'float')
print(mySeries[3:])
print(mySeries[:-2])
myS = mySeries[['a','c','e','f']] 
print(myS) # older pandas versions do not raise an error here; they add the label 'f' with value NaN (pandas >= 1.0 raises KeyError)

d    3.0
e    4.0
dtype: float64
a    0.0
b    1.0
c    2.0
dtype: float64
a    0.0
c    2.0
e    4.0
f    NaN
dtype: float64


**Note:** this output comes from an older pandas release, which emitted the same warning as in the exercise cell. Since pandas 1.0, indexing with a list that contains a missing label (here ```'f'```) raises ```KeyError```; use ```.reindex()``` to get ```NaN``` for missing labels.
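The warning recommends `.reindex()` as the alternative. A minimal sketch of that approach, reusing the Series from the exercise (the names `labels` and `present` are illustrative):

```python
import string
import numpy as np
import pandas as pd

mySeries = pd.Series(np.arange(5), index=list(string.ascii_lowercase[:5]), dtype='float')
labels = ['a', 'c', 'e', 'f']   # 'f' is not in the index

# reindex returns NaN for labels that are missing from the index
myS = mySeries.reindex(labels)
print(myS)

# Alternatively, keep only the labels that actually exist
present = mySeries.loc[mySeries.index.intersection(labels)]
print(present)
```

`reindex` keeps the requested labels and marks the missing ones, while the intersection approach silently drops them; which one fits depends on whether the gaps matter downstream.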


With integer indices that do not start at zero, label-based and position-based lookups are easy to confuse.

In [140]:
#Example
mySeries3 = pd.Series(np.arange(6,11), index = np.arange(11,16))
print(mySeries3)
print(mySeries3[12])

# What if I don't know the index labels and just want the element at position 1?
print(mySeries3[1]) # raises KeyError: [] looks up the label 1, which does not exist

11     6
12     7
13     8
14     9
15    10
dtype: int64
7


KeyError: 1

**Hint:** use ```.iloc[]``` for a position based lookup and ```.loc[]``` for a label-based lookup

In [146]:
print(mySeries3.iloc[1]) # position-based lookup
print(mySeries3.loc[12]) # label-based lookup

7
7
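The same distinction applies to slices, with one extra detail worth knowing: `.iloc` slices exclude the end position, while `.loc` slices include the end label. A short sketch using the same Series:

```python
import numpy as np
import pandas as pd

mySeries3 = pd.Series(np.arange(6, 11), index=np.arange(11, 16))

# Position-based: the first element, whatever its label is
print(mySeries3.iloc[0])       # 6

# Position-based slice: positions 1 and 2 (end position excluded)
print(mySeries3.iloc[1:3])     # labels 12 and 13

# Label-based slice: labels 12 through 14 (end label included)
print(mySeries3.loc[12:14])    # labels 12, 13 and 14
```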


<a id = 'pandas_alignment'></a>
### Alignment of Series objects

**Notes**:
 - *Addition* of Series objects combines values with **matching** index labels
 - Labels that appear in only one of the Series (**non-matching** indices) produce ```NaN```s

In [148]:
s1 = pd.Series([1,2,3,4],index = ['alpha','beta','gamma','kappa'])
s2 = pd.Series([100,101,102,103],index = ['beta','gamma','lambda','phi'])
print(s1+s2)

alpha       NaN
beta      102.0
gamma     104.0
kappa       NaN
lambda      NaN
phi         NaN
dtype: float64
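If the `NaN`s are not what you want, two common options are to treat missing entries as zero with `Series.add(..., fill_value=0)`, or to keep only the labels present in both Series by dropping the `NaN`s. A sketch with the same data (the names `total` and `common` are illustrative):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['alpha', 'beta', 'gamma', 'kappa'])
s2 = pd.Series([100, 101, 102, 103], index=['beta', 'gamma', 'lambda', 'phi'])

# Treat a label missing from one Series as 0 instead of producing NaN
total = s1.add(s2, fill_value=0)
print(total)

# Keep only the labels that appear in both Series
common = (s1 + s2).dropna()
print(common)
```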


A Series object does not require its index to be unique! This can cause confusion when performing operations.

In [152]:
s1 = pd.Series([1,2,3,4],index = ['a','b','a','d'])
s2 = pd.Series([100,101,102,103],index = ['a','a','d','l'])
print(s1+s2) # For a duplicated label, the result contains the Cartesian product of the matching entries from both Series

a    101.0
a    102.0
a    103.0
a    104.0
b      NaN
d    106.0
l      NaN
dtype: float64
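One way to guard against this is to check the index for duplicates before combining Series. The sketch below uses `Index.is_unique` and `Index.duplicated()`, and collapses duplicates with a group-by on the index as one possible remedy (summing is an arbitrary choice made for illustration).

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'a', 'd'])

print(s1.index.is_unique)      # False
print(s1.index.duplicated())   # [False False  True False]

# A duplicated label returns every matching entry
print(s1.loc['a'])

# Collapse duplicates by aggregating per label (here: sum)
collapsed = s1.groupby(level=0).sum()
print(collapsed)
```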


In [113]:
#Exercise:
#Are the values of NumPy array and Pandas Series the same (ignoring the series index): 
n = np.arange(5)
s = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])
print(n[1:])
print(s[1:])
# How about the results of these operations?
print(n[1:]+n[:-1])
print(s[1:]+s[:-1])

[1 2 3 4]
b    1
c    2
d    3
e    4
dtype: int64
[1 3 5 7]
a    NaN
b    2.0
c    4.0
d    6.0
e    NaN
dtype: float64


In [114]:
#Answer
#Are the values of NumPy array and Pandas Series the same (ignoring the series index): 
n = np.arange(5)
s = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])
print(n[1:])
print(s[1:]) # yes, we get the same values 1,2,3,4


# How about the results of these operations?
print(n[1:]+n[:-1])
print(s[1:]+s[:-1]) # Pandas adds the values with the same index label! If a label is missing from one operand, NaN is produced

[1 2 3 4]
b    1
c    2
d    3
e    4
dtype: int64
[1 3 5 7]
a    NaN
b    2.0
c    4.0
d    6.0
e    NaN
dtype: float64
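To reproduce the NumPy result with a Series, one option is to drop the index and add the underlying arrays by position. The sketch below uses `to_numpy()` for that; reattaching the labels of the first operand at the end is an arbitrary choice made for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])

# Positional addition: strip the index before adding
pos_sum = s.iloc[1:].to_numpy() + s.iloc[:-1].to_numpy()
print(pos_sum)    # [1 3 5 7]

# Optionally reattach labels (here: those of the first operand)
labelled = pd.Series(pos_sum, index=s.index[1:])
print(labelled)
```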


**Miscellaneous**:
- Pandas reductions such as ```sum()``` and ```mean()``` *skip NaNs* by default; elementwise arithmetic on Series still propagates them
- NumPy *does not* ignore NaNs when running operations on arrays


In [160]:
s3 = s1+s2
print(s3)
s4 = pd.Series(np.arange(7),index = s3.index, dtype = 'float')
print(s4)
print(s3+s4)

a    101.0
a    102.0
a    103.0
a    104.0
b      NaN
d    106.0
l      NaN
dtype: float64
a    0.0
a    1.0
a    2.0
a    3.0
b    4.0
d    5.0
l    6.0
dtype: float64
a    101.0
a    103.0
a    105.0
a    107.0
b      NaN
d    111.0
l      NaN
dtype: float64


In [165]:
n1 = np.array([1,2,3,np.nan,5]) # np.nan is the supported spelling; the np.NaN alias was removed in NumPy 2.0
n2 = np.array([2,3,4,5,np.nan])
print(n1)
print(n2)
print(n1+n2)

[  1.   2.   3.  nan   5.]
[  2.   3.   4.   5.  nan]
[  3.   5.   7.  nan  nan]
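The difference is clearest with reductions. The sketch below compares the pandas default (`skipna=True`) with plain NumPy functions and their NaN-aware counterparts `np.nansum` and `np.nanmean`; the sample values are made up.

```python
import numpy as np
import pandas as pd

values = [1.0, 2.0, np.nan, 4.0]   # hypothetical sample data
s = pd.Series(values)
n = np.array(values)

print(s.sum())               # 7.0, NaN is skipped by default
print(s.sum(skipna=False))   # nan
print(np.sum(n))             # nan, NumPy propagates NaN
print(np.nansum(n))          # 7.0, NaN-aware variant

# The same pattern holds for the mean
print(s.mean(), np.nanmean(n))
```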


In [170]:
s_b = s1>1
print(s_b)
print(s_b.all())
print(s_b.any())

a    False
b     True
a     True
d     True
dtype: bool
False
True
