# Introduction to Python Data Analytics
# Part 1: NumPy

Author: Kang P. Lee <br>
References:
- NumPy official website (http://www.numpy.org/) 
- Python Data Science Handbook by Jake VanderPlas (http://shop.oreilly.com/product/0636920034919.do)

## ▪ Importing the NumPy Library

In [1]:
import numpy as np

## ▪ Creating NumPy Arrays from Python Lists

In [2]:
x = np.array([1, 2, 3, 4, 5])
x

array([1, 2, 3, 4, 5])

In [3]:
x = np.array([1, 2, 3, "4", "5"])
x

array(['1', '2', '3', '4', '5'], 
      dtype='<U21')

Unlike a Python list, a NumPy array cannot hold elements of different types. If the types do not match, NumPy upcasts the elements to a common type where possible (here, the integers are converted to strings).
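
The upcasting rule can be checked by inspecting the `dtype` of the resulting array. The following is a minimal sketch; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

# Integers mixed with a float are upcast to floating point.
mixed_numeric = np.array([1, 2, 3.5])
print(mixed_numeric.dtype)       # float64

# Integers mixed with a string are upcast to a Unicode string type.
mixed_text = np.array([1, 2, "3"])
print(mixed_text.dtype.kind)     # U
print(mixed_text.tolist())       # ['1', '2', '3']
```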

In [4]:
x = np.array([1, 2, 3, "4", "5"], dtype=int)
x

array([1, 2, 3, 4, 5])

In [5]:
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
x

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

## ▪ Creating NumPy Arrays from Scratch

In [6]:
x = np.zeros(10, dtype=int)         # Create a length-10 integer array filled with zeros
x

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [7]:
x = np.ones((5, 5), dtype=float)    # Create a 5x5 floating-point array filled with ones
x

array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])

In [8]:
x = np.full((5, 5), 3.14)           # Create a 5x5 array filled with 3.14
x

array([[ 3.14,  3.14,  3.14,  3.14,  3.14],
       [ 3.14,  3.14,  3.14,  3.14,  3.14],
       [ 3.14,  3.14,  3.14,  3.14,  3.14],
       [ 3.14,  3.14,  3.14,  3.14,  3.14],
       [ 3.14,  3.14,  3.14,  3.14,  3.14]])

In [9]:
x = np.arange(0, 10, 2)             # Create an array filled with a linear sequence from 0 up to (but not including) 10, stepping by 2
x

array([0, 2, 4, 6, 8])

In [10]:
x = np.linspace(0, 1, 5)            # Create an array of five values evenly spaced between 0 and 1
x

array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])

In [11]:
x = np.random.random((3, 3))        # Create a 3x3 array of uniformly distributed random values in [0, 1)
x

array([[ 0.85049507,  0.51780168,  0.57183451],
       [ 0.51340822,  0.01588431,  0.78927523],
       [ 0.95559845,  0.94629917,  0.23726285]])

In [12]:
x = np.random.normal(0, 1, (3, 3))  # Create a 3x3 array of normally distributed random values with mean 0 and std 1
x

array([[-0.40441992,  0.0610125 , -0.97439389],
       [-0.54840456, -0.64429951,  0.51636466],
       [-1.2149197 , -0.29696141, -0.23576779]])

In [13]:
x = np.random.randint(0, 10, (3, 3)) # Create a 3x3 array of random integers from 0 to 9 (the upper bound 10 is excluded)
x

array([[8, 7, 7],
       [5, 5, 1],
       [3, 6, 5]])

These NumPy functions are useful when you need to quickly generate an array whose values follow a simple rule, such as a constant, a sequence, or a random distribution.
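
One detail worth checking when generating sequences is whether the end value is included. The sketch below contrasts `np.arange`, which excludes the stop value, with `np.linspace`, which includes it by default. The variable names are illustrative.

```python
import numpy as np

half_open = np.arange(0, 10, 2)    # stop value 10 is excluded
closed = np.linspace(0, 10, 6)     # stop value 10 is included by default

print(half_open.tolist())   # [0, 2, 4, 6, 8]
print(closed.tolist())      # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
```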

## ▪ NumPy Standard Data Types

For the full list of NumPy data types, refer to https://docs.scipy.org/doc/numpy/user/basics.types.html

## ▪ NumPy Array Attributes

In [14]:
x = np.random.randint(0, 100, (3, 3))
print(x)
print(x.ndim, x.shape, x.size, x.dtype)

[[14 68 90]
 [ 0  8 25]
 [95 19 68]]
2 (3, 3) 9 int64


## ▪ Array Indexing & Slicing

In [15]:
x = np.random.randint(0, 100, 10)
x

array([70, 78, 59, 58, 20, 28, 91, 29, 60, 78])

In [16]:
x[0]

70

In [17]:
x[-1]

78

In [18]:
x[3:-3]

array([58, 20, 28, 91])

In [19]:
x[3:-3:2]

array([58, 28])

In [20]:
x[::]

array([70, 78, 59, 58, 20, 28, 91, 29, 60, 78])

In [21]:
x[::-1]

array([78, 60, 29, 91, 28, 20, 58, 59, 78, 70])

In [22]:
x = np.random.randint(0, 100, (5, 5))
x

array([[26, 69, 27, 92, 90],
       [97, 66, 21, 49, 43],
       [16, 17, 13, 30, 79],
       [80, 18, 26, 59, 47],
       [88, 87, 81, 78, 49]])

In [23]:
x[1, 2]

21

In [24]:
x[:2, :3]

array([[26, 69, 27],
       [97, 66, 21]])

## ▪ Array Concatenation and Splitting

In [25]:
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
np.concatenate([x, y])                   # Concatenate one-dimensional arrays.

array([1, 2, 3, 4, 5, 6])

In [26]:
x + y

array([5, 7, 9])

Unlike with Python lists, the + operator does not concatenate NumPy arrays; it adds them element by element.
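
The difference can be seen side by side in a short sketch. The names `a_list`, `b_list`, `a_arr`, and `b_arr` are illustrative.

```python
import numpy as np

a_list, b_list = [1, 2, 3], [4, 5, 6]
a_arr, b_arr = np.array(a_list), np.array(b_list)

print(a_list + b_list)                           # lists: concatenation -> [1, 2, 3, 4, 5, 6]
print((a_arr + b_arr).tolist())                  # arrays: element-wise sum -> [5, 7, 9]
print(np.concatenate([a_arr, b_arr]).tolist())   # arrays: concatenation -> [1, 2, 3, 4, 5, 6]
```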

In [27]:
x = np.array([[1, 2, 3], [4, 5, 6]])
np.concatenate([x, x])                   # Concatenate two-dimensional arrays along the first axis (axis 0).

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [28]:
x = np.array([[1, 2, 3], [4, 5, 6]])     
y = np.concatenate([x, x], axis=1)       # Concatenate along the second axis (axis 1).
y

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

In [29]:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x1, x2, x3 = np.split(x, [3, 5])         # 3 and 5 are the split points.
print(x1, x2, x3)

[1 2 3] [4 5] [ 6  7  8  9 10]


Splitting is the opposite of concatenation: N split points produce N + 1 subarrays.
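
As a quick check of this relationship, the sketch below splits an array and then concatenates the pieces to recover the original. The variable names are illustrative.

```python
import numpy as np

original = np.arange(1, 11)               # 1, 2, ..., 10
parts = np.split(original, [3, 5])        # pieces: original[:3], original[3:5], original[5:]
rebuilt = np.concatenate(parts)           # glue the pieces back together

print([p.tolist() for p in parts])        # [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]
print(np.array_equal(original, rebuilt))  # True
```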

## ▪ Computation on NumPy Arrays

In [30]:
x = np.array([1, 2, 3, 4, 5])
x + 5

array([ 6,  7,  8,  9, 10])

In [31]:
y = [1, 2, 3, 4, 5]
y + 5

TypeError: can only concatenate list (not "int") to list

Note that built-in Python lists do not support element-wise arithmetic: for lists, + means concatenation, so adding an integer to a list raises a TypeError.
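
To get the same result with a plain list, you have to loop over the elements yourself, for example with a list comprehension. The sketch below compares the two approaches; the variable names are illustrative.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

shifted_list = [v + 5 for v in values]    # plain Python: explicit loop over the elements
shifted_array = np.array(values) + 5      # NumPy: a single vectorized operation

print(shifted_list)               # [6, 7, 8, 9, 10]
print(shifted_array.tolist())     # [6, 7, 8, 9, 10]
```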

In [32]:
x = np.array([1, 2, 3, 4, 5])
x ** 2         # x to the power of 2

array([ 1,  4,  9, 16, 25])

In [33]:
x = np.array([1, 2, 3, 4, 5])
-x

array([-1, -2, -3, -4, -5])

In [34]:
x = np.array([-1, 2, -3, 4, -5])
np.abs(x)     # absolute value

array([1, 2, 3, 4, 5])

In [35]:
x = [1, 2, 3]
np.exp(x)      # exponential (= e^x)

array([  2.71828183,   7.3890561 ,  20.08553692])

In [36]:
x = [1, 2, 3]
np.power(3, x) # power (= 3^x)

array([ 3,  9, 27])

In [37]:
x = [1, 2, 4, 10]
np.log(x)      # ln(x)

array([ 0.        ,  0.69314718,  1.38629436,  2.30258509])

In [38]:
x = [1, 2, 4, 10]
np.log2(x)     # log2(x)

array([ 0.        ,  1.        ,  2.        ,  3.32192809])

In [39]:
x = [1, 2, 4, 10]
np.log10(x)    # log10(x)

array([ 0.        ,  0.30103   ,  0.60205999,  1.        ])

In [40]:
x = np.array([1, 2, 3])
y = np.array([1, 3, 5])
x + y

array([2, 5, 8])

In [41]:
x * y

array([ 1,  6, 15])

In [42]:
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
np.dot(x, y)

array([[19, 22],
       [43, 50]])

## ▪ Aggregations

In [43]:
x = np.random.rand(10)
print(x)
print(x.sum(), x.mean(), x.var(), x.std(), x.min(), x.max(), x.argmin(), x.argmax())

[ 0.82326426  0.06595784  0.44978259  0.97540784  0.14509169  0.19871735
  0.74472229  0.42618988  0.00179506  0.26142489]
4.09235368001 0.409235368001 0.102623892822 0.32034964152 0.00179505629075 0.975407844931 8 3


## ▪ Comparisons

In [44]:
x = np.array([1, 2, 3, 4, 5])
x

array([1, 2, 3, 4, 5])

In [45]:
x < 3          # Return a Boolean array with one result per element.

array([ True,  True, False, False, False], dtype=bool)

In [46]:
x == 3

array([False, False,  True, False, False], dtype=bool)

## ▪ Working with Boolean Arrays

In [47]:
x = np.random.randint(1, 10, [3, 3])
x

array([[3, 1, 9],
       [2, 5, 3],
       [4, 5, 4]])

In [48]:
np.count_nonzero(x < 5)     # Count the number of True values, i.e., the elements less than 5.

6

In [49]:
np.sum(x < 5)               # Interpret True as 1 and False as 0 and sum all. 

6

In [50]:
np.sum((x > 3) & (x < 8))   # Combine conditions element-wise with & (and), | (or), and ~ (not).

4
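
The sketch below shows the three element-wise operators on a fixed array (the same values as the random array above, hard-coded here so the counts are reproducible). Each comparison needs its own parentheses because `&` and `|` bind more tightly than `<` and `>`. The Python keywords `and` and `or` do not work element-wise on arrays.

```python
import numpy as np

x = np.array([[3, 1, 9],
              [2, 5, 3],
              [4, 5, 4]])

both = np.sum((x > 3) & (x < 8))      # greater than 3 AND less than 8
either = np.sum((x < 2) | (x > 8))    # less than 2 OR greater than 8
negated = np.sum(~(x < 5))            # NOT less than 5

print(both, either, negated)          # 4 2 3
```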

## ▪ Boolean Arrays as Masks 

Boolean arrays can be used as masks to select the subset of the data that satisfies a condition.

In [51]:
x = np.random.randint(1, 10, [3, 3])
x

array([[4, 5, 1],
       [3, 7, 6],
       [1, 5, 8]])

In [52]:
x < 5

array([[ True, False,  True],
       [ True, False, False],
       [ True, False, False]], dtype=bool)

In [53]:
x[x < 5]     # Select the subset of x that meets the condition.

array([4, 1, 3, 1])
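
A mask can also be stored in a variable, inverted, and used for assignment. The following is a minimal sketch using the same values as the array above, hard-coded for reproducibility. The names `mask`, `large_mean`, and `zeroed` are illustrative.

```python
import numpy as np

x = np.array([[4, 5, 1],
              [3, 7, 6],
              [1, 5, 8]])

mask = x < 5                      # Boolean array with the same shape as x
large_mean = x[~mask].mean()      # mean of the elements that are NOT less than 5

zeroed = x.copy()                 # copy so that x itself is left unchanged
zeroed[mask] = 0                  # assign to the selected elements only

print(large_mean)                 # 6.2
print(zeroed.tolist())            # [[0, 5, 0], [0, 7, 6], [0, 5, 8]]
```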

## ▪ Sorting NumPy Arrays

In [54]:
x = np.random.choice(10, 5, replace=False)
x

array([8, 0, 2, 7, 6])

In [55]:
np.sort(x)

array([0, 2, 6, 7, 8])

In [56]:
x               # x hasn't changed.

array([8, 0, 2, 7, 6])

In [57]:
x.sort()
x               # x has changed.

array([0, 2, 6, 7, 8])

In [58]:
x = np.random.choice(10, 5, replace=False)
x

array([3, 7, 8, 5, 0])

In [59]:
np.sort(x)

array([0, 3, 5, 7, 8])

In [60]:
np.argsort(x)   # Return the indices that would sort the array, instead of the sorted elements.

array([4, 0, 3, 1, 2])
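
One common use of `np.argsort` is to reorder a second array in step with the first. The sketch below assumes two parallel arrays, `scores` and `names`, which are made up for illustration and do not appear in the original notebook.

```python
import numpy as np

scores = np.array([3, 7, 8, 5, 0])
names = np.array(["a", "b", "c", "d", "e"])   # names[i] belongs to scores[i]

order = np.argsort(scores)         # indices that sort scores in ascending order

print(order.tolist())              # [4, 0, 3, 1, 2]
print(scores[order].tolist())      # [0, 3, 5, 7, 8]
print(names[order].tolist())       # ['e', 'a', 'd', 'b', 'c']
print(names[order[::-1]].tolist()) # descending order: ['c', 'b', 'd', 'a', 'e']
```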