# Chapter 2: Introduction to NumPy 
Jake VanderPlas [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html)

- NumPy is short for Numerical Python.
- NumPy arrays provide efficient storage and data operations.
- The core data structure is the ndarray object: an N-dimensional array whose elements all share one data type.

# 0. Review: Python Objects

![image.png](attachment:image.png)

### Integer:

A Python integer is more than a raw value. The variable is a pointer to an object in memory that holds all the Python object information (such as the type and reference count) in addition to the bytes that contain the integer value. This extra information is what allows Python to be coded so freely and dynamically, but it also adds overhead to every value.

In [2]:
type(9324234)

int
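
To make the overhead concrete, here is a minimal sketch using `sys.getsizeof`. The exact byte counts are an assumption tied to the interpreter (they differ between CPython builds and versions), so the code only compares sizes rather than quoting numbers.

```python
import sys

# Bytes used by a small int object, including the object header.
small = sys.getsizeof(1)

# Python ints have arbitrary precision, so a very large value needs more bytes.
large = sys.getsizeof(10 ** 100)

# A raw 64-bit C integer would need only 8 bytes.
print("small int:", small, "bytes; large int:", large, "bytes")
```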

### List:

Mutable. Can be heterogeneous.

In [3]:
L = list(range(10))
print(L,"\n"
      "objecttype:", type(L), "\n"
      "datatype:", type(L[0]))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] 
objecttype: <class 'list'> 
datatype: <class 'int'>


In [4]:
L2 = [str(c) for c in L]
print(L2,"\n"
      "objecttype:", type(L2), "\n"
      "datatype:", type(L2[0]))

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'] 
objecttype: <class 'list'> 
datatype: <class 'str'>


In [5]:
L3 = [True, "2", 3.0, 4]
[type(item) for item in L3]

[bool, str, float, int]

### Array:
Mutable. Always homogeneous: every element has the same type. The built-in array module can be used to create dense arrays of a uniform type:

In [6]:
import array
L = list(range(10))
A = array.array('i', L)
A

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
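
The uniform-type restriction can be checked directly. The sketch below assumes nothing beyond the standard library: it builds an integer array and shows that appending a float is rejected.

```python
import array

A = array.array('i', range(5))   # 'i' is the type code for a signed C int

try:
    A.append(1.5)                # a float does not match the declared type
    rejected = False
except TypeError:
    rejected = True

print(A.typecode, A.itemsize, rejected)
```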

# 1. NumPy Data Types

- bool
- int
- float
- complex
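
As a short illustration of these type families, the sketch below creates one array per family and inspects its dtype. The specific widths (int16, float64) are arbitrary choices for the example.

```python
import numpy as np

b = np.array([True, False])                  # Boolean
i = np.array([1, 2, 3], dtype=np.int16)      # 16-bit integer
f = np.array([1, 2, 3], dtype=np.float64)    # 64-bit float
c = np.array([1 + 2j, 3 - 1j])               # complex (two 64-bit floats)

print(b.dtype, i.dtype, f.dtype, c.dtype)

# Mixing integers and floats in one array upcasts everything to float.
mixed = np.array([1, 2.5])
print(mixed.dtype)
```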

# 2. NumPy Array

NumPy's ndarray stores fixed-type data efficiently and supports fast operations on the stored data.

In [7]:
import numpy as np

### NumPy Array from Python List

In [8]:
np.array([1, 2, 3, 4], dtype='float32')

array([1., 2., 3., 4.], dtype=float32)

### Multidimensional Array

In [9]:
np.array([range(i, i + 3) for i in [2, 4, 6]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

### Arrays from Scratch:  

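- np.zeros - filled with zeros
- np.ones - filled with ones
- np.full - filled with a constant value
- np.arange - evenly spaced values with a given step
- np.linspace - evenly spaced values between two endpoints
- np.random.random - uniform distribution; values between 0 and 1
- np.random.normal - normal distribution; defaults to mean 0, sd 1
- np.random.randint - random integers
- np.eye - identity matrix
- np.empty - uninitialized; holds whatever is already in memory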

e.g. np.random.randint(10, size=(3, 4, 5)) creates a 3x4x5 array of random integers from 0 to 9.
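
A minimal sketch of several of these creation routines follows. The shapes, fill value, and seed are arbitrary choices made for illustration.

```python
import numpy as np

zeros = np.zeros(5, dtype=int)       # [0 0 0 0 0]
ones = np.ones((2, 3))               # 2x3 array of 1.0
full = np.full((2, 2), 3.14)         # 2x2 array filled with 3.14
steps = np.arange(0, 10, 2)          # [0 2 4 6 8]
lin = np.linspace(0, 1, 5)           # [0.   0.25 0.5  0.75 1.  ]
eye = np.eye(3)                      # 3x3 identity matrix

np.random.seed(0)                    # seed so the run is repeatable
rand_ints = np.random.randint(10, size=(3, 4, 5))
print(rand_ints.shape)               # (3, 4, 5)
```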

### Array Attributes:

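- .ndim - number of dimensions
- .shape - size of each dimension
- .size - total number of elements
- .dtype - data type of the elements
- .itemsize - size in bytes of each element
- .nbytes - total size in bytes; equals itemsize * size

Related methods:

- .copy() - returns an independent copy of the array
- .reshape(x, y) - returns the data arranged in a new shape; x * y must equal size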
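
The attributes and methods listed above can be tried on a small array. This is a sketch with an arbitrarily chosen shape and dtype.

```python
import numpy as np

x = np.zeros((3, 4, 5), dtype=np.int64)

print(x.ndim)      # 3
print(x.shape)     # (3, 4, 5)
print(x.size)      # 60
print(x.dtype)     # int64
print(x.itemsize)  # 8
print(x.nbytes)    # 480, i.e. itemsize * size

# reshape keeps the same 60 elements in a new layout.
y = x.reshape(6, 10)

# copy() gives independent data: changing the copy leaves x untouched.
c = x.copy()
c[0, 0, 0] = 1
```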

### Arrays Joined

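- np.concatenate([array1, array2, array3]) - takes a list or tuple of arrays as its first argument
- np.concatenate([grid, grid], axis=1) - joins along the second axis (columns)
- np.vstack([x, grid]) - stacks vertically (row-wise)
- np.hstack([grid, y]) - stacks horizontally (column-wise)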

In [36]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
grid

array([[1, 2, 3],
       [4, 5, 6]])

In [37]:
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])
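
For arrays of mixed dimensions, np.vstack and np.hstack are often clearer than np.concatenate. In the sketch below, the arrays x and y are example values chosen to match the names used in the list above.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
x = np.array([9, 8, 7])          # one extra row
y = np.array([[99],
              [99]])             # one extra column

v = np.vstack([x, grid])         # x becomes the first row
h = np.hstack([grid, y])         # y becomes the last column

# Joining three 1-D arrays: pass them together in one list.
flat = np.concatenate([x, x, x])

print(v)
print(h)
print(flat)
```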

### Arrays Split

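- np.split - splits at the given indices
- np.hsplit - splits horizontally (by columns)
- np.vsplit - splits vertically (by rows)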

In [34]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [35]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]
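
The other two split functions work the same way. In this sketch, N split points produce N + 1 subarrays; the input values are arbitrary.

```python
import numpy as np

x = np.array([1, 2, 3, 99, 99, 3, 2, 1])
x1, x2, x3 = np.split(x, [3, 5])     # split before index 3 and index 5
print(x1, x2, x3)                    # [1 2 3] [99 99] [3 2 1]

grid = np.arange(16).reshape((4, 4))
left, right = np.hsplit(grid, [2])   # split before column 2
print(left)
print(right)
```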


### Array Indexing

Indexing starts at 0; negative indices count from the end of the array.
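
A brief sketch of both rules on one- and two-dimensional arrays, using arbitrary example values:

```python
import numpy as np

x = np.array([5, 0, 3, 3, 7, 9])
first = x[0]        # 5
last = x[-1]        # 9, the last element
second_last = x[-2] # 7

m = np.array([[3, 5, 2, 4],
              [7, 6, 8, 8],
              [1, 6, 7, 7]])
corner = m[2, -1]   # row 2, last column: 7

m[0, 0] = 12        # elements can be modified through the same notation
print(first, last, second_last, corner, m[0, 0])
```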

### Array Slicing

x[start:stop:step]; the defaults are start=0, stop=size of the dimension, step=1.
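
The slice notation can be sketched as follows. The last lines also show that a slice is a view of the original data rather than a copy, which is NumPy behavior worth keeping in mind.

```python
import numpy as np

x = np.arange(10)

head = x[:5]          # [0 1 2 3 4]
middle = x[4:7]       # [4 5 6]
evens = x[::2]        # [0 2 4 6 8]
backwards = x[::-1]   # [9 8 7 6 5 4 3 2 1 0]

# A slice is a view: writing through it changes the original array.
sub = x[:3]
sub[0] = 99
print(x[0])           # 99

# Use .copy() when independent data is needed.
safe = x[:3].copy()
safe[1] = -1
print(x[1])           # still 1
```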

# 3. NumPy Arrays Computation

| Operator | Equivalent ufunc | Description |
|---|---|---|
| `+` | `np.add` | Addition (e.g., 1 + 1 = 2) |
| `-` | `np.subtract` | Subtraction (e.g., 3 - 2 = 1) |
| `-` | `np.negative` | Unary negation (e.g., -2) |
| `*` | `np.multiply` | Multiplication (e.g., 2 * 3 = 6) |
| `/` | `np.divide` | Division (e.g., 3 / 2 = 1.5) |
| `//` | `np.floor_divide` | Floor division (e.g., 3 // 2 = 1) |
| `**` | `np.power` | Exponentiation (e.g., 2 ** 3 = 8) |
| `%` | `np.mod` | Modulus/remainder (e.g., 9 % 4 = 1) |
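
Each operator is applied element by element and is interchangeable with its ufunc. A minimal sketch on a small example array:

```python
import numpy as np

x = np.arange(4)                 # [0 1 2 3]

print(x + 5)                     # [5 6 7 8]
print(np.add(x, 5))              # same result through the ufunc
print(x / 2)                     # [0.  0.5 1.  1.5]
print(x // 2)                    # [0 0 1 1]
print(-x)                        # [ 0 -1 -2 -3]
print(x ** 2)                    # [0 1 4 9]
print(x % 2)                     # [0 1 0 1]

# Operators can be combined; standard precedence applies.
combined = -(0.5 * x + 1) ** 2   # [-1.   -2.25 -4.   -6.25]
print(combined)
```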

| Function Name | NaN-safe Version | Description |
|---|---|---|
| `np.sum` | `np.nansum` | Compute sum of elements |
| `np.prod` | `np.nanprod` | Compute product of elements |
| `np.mean` | `np.nanmean` | Compute mean of elements |
| `np.std` | `np.nanstd` | Compute standard deviation |
| `np.var` | `np.nanvar` | Compute variance |
| `np.min` | `np.nanmin` | Find minimum value |
| `np.max` | `np.nanmax` | Find maximum value |
| `np.argmin` | `np.nanargmin` | Find index of minimum value |
| `np.argmax` | `np.nanargmax` | Find index of maximum value |
| `np.median` | `np.nanmedian` | Compute median of elements |
| `np.percentile` | `np.nanpercentile` | Compute rank-based statistics of elements |
| `np.any` | N/A | Evaluate whether any elements are true |
| `np.all` | N/A | Evaluate whether all elements are true |
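
The difference between a regular aggregate and its NaN-safe version shows up as soon as the data contain a missing value. This sketch uses a small made-up matrix with one NaN.

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [4.0, np.nan, 6.0]])

total = np.sum(M)                 # nan: the missing value propagates
safe_total = np.nansum(M)         # 16.0: the NaN is ignored

col_min = np.nanmin(M, axis=0)    # minimum of each column: [1. 2. 3.]
row_mean = np.nanmean(M, axis=1)  # mean of each row: [2. 5.]

print(total, safe_total, col_min, row_mean)
```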

| Operator | Equivalent ufunc | Operator | Equivalent ufunc |
|---|---|---|---|
| `==` | `np.equal` | `!=` | `np.not_equal` |
| `<` | `np.less` | `<=` | `np.less_equal` |
| `>` | `np.greater` | `>=` | `np.greater_equal` |

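- np.count_nonzero - counts the True entries of a Boolean array
- np.sum(x < 6, axis=1) - counts the values less than 6 in each row
- np.any(x > 8) - are any values greater than 8?
- np.all(x < 10) - are all values less than 10?
- np.all(x == 6) - are all values equal to 6?
- x[x < 5] - Boolean mask; selects the values less than 5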
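
These expressions can be tried on a small two-dimensional array. The values below are arbitrary example data.

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

n_small = np.count_nonzero(x < 6)   # 8 values are less than 6
per_row = np.sum(x < 6, axis=1)     # [4 2 2]

any_big = np.any(x > 8)             # True, because of the 9
all_lt10 = np.all(x < 10)           # True
all_six = np.all(x == 6)            # False

# Masking returns a 1-D array of the selected values.
selected = x[x < 5]                 # [0 3 3 3 2 4]
print(n_small, per_row, any_big, all_lt10, all_six, selected)
```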

| Operator | Equivalent ufunc | Operator | Equivalent ufunc |
|---|---|---|---|
| `&` | `np.bitwise_and` | `\|` | `np.bitwise_or` |
| `^` | `np.bitwise_xor` | `~` | `np.bitwise_not` |
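
On Boolean arrays these operators act element-wise, so they can combine several conditions into one mask. A sketch with example data; note that each comparison needs its own parentheses because the bitwise operators bind more tightly than the comparisons.

```python
import numpy as np

x = np.arange(10)

mask = (x > 2) & (x < 7)            # both conditions hold
inside = x[mask]                    # [3 4 5 6]
outside = x[~mask]                  # [0 1 2 7 8 9]
edges = x[(x < 2) | (x > 7)]        # [0 1 8 9]
one_only = x[(x % 2 == 0) ^ (x > 5)]  # exactly one condition: [0 2 4 7 9]

print(inside, outside, edges, one_only)
```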

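- np.sort - returns a sorted copy of the array
- np.argsort - returns the indices of the sorted elements
- np.partition - moves the K smallest values to the left, in arbitrary order
- np.argpartition - returns the indices of the partition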
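
A short sketch of sorting and partitioning on example data. Because a partition leaves the order within each side unspecified, the check compares the left side as a set.

```python
import numpy as np

x = np.array([2, 1, 4, 3, 5])

sorted_x = np.sort(x)          # [1 2 3 4 5]; x itself is unchanged
order = np.argsort(x)          # [1 0 3 2 4]
via_index = x[order]           # fancy indexing with the order also sorts

y = np.array([7, 2, 3, 1, 6, 5, 4])
part = np.partition(y, 3)      # three smallest values end up on the left
smallest_three = set(part[:3].tolist())

print(sorted_x, order, via_index, smallest_three)
```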

# 4. Broadcasting

![image.png](attachment:image.png)

In [None]:
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 50)[:, np.newaxis]
z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
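
In the cell above, x has shape (50,) and y has shape (50, 1), so z is broadcast to shape (50, 50). The same mechanism is easier to see on tiny arrays; the sketch below uses made-up values and also shows a pair of shapes that cannot be broadcast.

```python
import numpy as np

a = np.arange(3)                   # shape (3,)
b = np.arange(3)[:, np.newaxis]    # shape (3, 1)

# Both arrays are stretched to a common (3, 3) shape.
table = a + b
print(table)

# A 1-D array is added to every row of a 2-D array.
M = np.ones((2, 3))
shifted = M + a                    # [[1. 2. 3.], [1. 2. 3.]]

# Shapes (3, 2) and (3,) disagree in the trailing dimension.
try:
    np.ones((3, 2)) + a
    incompatible = False
except ValueError:
    incompatible = True
print(incompatible)
```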

# 5. Structured Arrays

In [38]:
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

In [42]:
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
print(data.dtype)

[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]


In [43]:
data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)

[('Alice', 25, 55. ) ('Bob', 45, 85.5) ('Cathy', 37, 68. )
 ('Doug', 19, 61.5)]


In [45]:
data[data['age'] < 30]['name']

array(['Alice', 'Doug'], dtype='<U10')
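
The same compound dtype can also be written as a list of tuples, and individual records are reachable by position. The sketch below rebuilds the example data so that it runs on its own; the variable name people is a hypothetical stand-in for data.

```python
import numpy as np

# Compound dtype given as (field name, format) pairs.
people = np.zeros(4, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
people['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
people['age'] = [25, 45, 37, 19]
people['weight'] = [55.0, 85.5, 68.0, 61.5]

first_row = people[0]               # one full record
last_name = people[-1]['name']      # field of the last record: 'Doug'
mean_age = people['age'].mean()     # fields behave like ordinary arrays

print(first_row, last_name, mean_age)
```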