Book Reference: page 85- of **Python for Data Analysis Book by Wes McKinney**

## NumPy (Numerical Python)

- NumPy's array objects are the lingua franca for data exchange among most computational packages providing scientific functionality

Some of the things found in NumPy:
- `ndarray`, an efficient multidimensional array providing fast array-oriented arithmetic operations and flexible broadcasting capabilities.
- Mathematical functions for fast operations on entire arrays of data without having to write loops.
- Tools for reading/writing array data to disk and working with memory-mapped files.
- Linear algebra, random number generation, and Fourier transform capabilities.
- A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.
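
A quick tour of several items from the list above, as a minimal sketch. The matrix values, the seed, and the temporary file name `x.npy` are arbitrary choices made for illustration and do not come from the book.

```python
import os
import tempfile

import numpy as np

# Linear algebra: solve the 2x2 system A @ x = b
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)  # x is approximately [0.8, 1.4]

# Random number generation with a seeded generator (seed is arbitrary)
rng = np.random.default_rng(0)
samples = rng.standard_normal(5)

# Fourier transform of a constant signal: only the zero-frequency bin is non-zero
spectrum = np.fft.fft(np.ones(4))

# Writing an array to disk and reading it back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "x.npy")
    np.save(path, x)
    restored = np.load(path)
```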

Why is NumPy so important for numerical computations in Python?
- It is designed for efficiency on large arrays of data:
    - it stores data internally in a contiguous block of memory, independent of other built-in Python objects
    - its library of algorithms is written in C and operates on this memory without type checking or other overhead
    - its arrays use much less memory than built-in Python sequences
    - its operations perform complex computations on entire arrays without the need for Python `for` loops

To give you an idea of the performance difference, consider a NumPy array of 1,000,000 integers and the equivalent Python list:

In [3]:
import numpy as np
my_arr = np.arange(1000000)
my_list = list(range(1000000))

In [4]:
%time for _ in range(10): my_arr2 = my_arr * 2

Wall time: 34 ms


In [5]:
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

Wall time: 1.01 s


NumPy-based algorithms are generally 10 to 100 times faster (or more) than their
pure Python counterparts and use significantly less memory.
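
The memory side of that claim can be checked roughly as follows. This is only a sketch: `sys.getsizeof` reports CPython-specific object sizes, the list estimate simply adds the list's pointer table to the size of each integer object, and the dtype is pinned to `int64` so the array size does not depend on the platform default.

```python
import sys

import numpy as np

n = 1_000_000
my_arr = np.arange(n, dtype=np.int64)
my_list = list(range(n))

# The array stores raw 8-byte integers in one contiguous buffer
array_bytes = my_arr.nbytes

# The list stores pointers, and every element is a separate Python int object
list_bytes = sys.getsizeof(my_list) + sum(sys.getsizeof(x) for x in my_list)

print(f"ndarray: {array_bytes / 1e6:.1f} MB")
print(f"list:    {list_bytes / 1e6:.1f} MB (rough estimate)")
```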

## ndarray
- **N-dimensional array**
- generic multidimensional container for homogeneous data; that is, all of the elements must be the same type
- Every array has
    - a shape, a tuple indicating the size of each dimension, and
    - a dtype, an object describing the data type of the array

In [9]:
import numpy as np
    
# Generate some random data
data = np.random.randn(2, 3)
data

array([[ 0.44844059,  1.46494975, -0.56300216],
       [-0.450931  , -2.21511053,  0.09484304]])

In [10]:
data * 10

array([[  4.48440593,  14.64949751,  -5.63002159],
       [ -4.50931002, -22.15110532,   0.9484304 ]])

In [11]:
data + data

array([[ 0.89688119,  2.9298995 , -1.12600432],
       [-0.901862  , -4.43022106,  0.18968608]])

In [14]:
data.shape

(2, 3)

In [15]:
data.dtype

dtype('float64')

### Creating ndarrays

In [19]:
# list to ndarray

data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1

array([6. , 7.5, 8. , 0. , 1. ])

In [18]:
# nested seq to ndarray
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [20]:
arr2.ndim # number of dimensions

2

In [21]:
arr2.shape 

(2, 4)

In [22]:
# special arrays
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [23]:
np.zeros((3,6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [24]:
np.empty((2,3,2))

array([[[1.20833701e-311, 3.16202013e-322],
        [0.00000000e+000, 0.00000000e+000],
        [2.44029516e-312, 1.78572245e-051]],

       [[9.77595263e+165, 2.12650141e+160],
        [1.51135411e+160, 2.19284729e-076],
        [1.68715838e+160, 3.21930045e-057]]])

In [25]:
# np.empty allocates memory without initializing it, so the array may hold
# arbitrary garbage values; never assume it is filled with zeros

In [27]:
np.arange(15)
# like the range Python built-in function

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
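
A few other creation functions follow the same pattern as `np.zeros`. The sketch below uses arbitrary shapes and an arbitrary fill value of 7; the variable names are only illustrative.

```python
import numpy as np

ones = np.ones((2, 3))          # 2 x 3 array filled with 1.0
sevens = np.full((2, 2), 7)     # 2 x 2 array filled with the given value

template = np.array([[1.5, 2.5], [3.5, 4.5]])
zeros_like = np.zeros_like(template)  # same shape and dtype as template

identity = np.eye(3)            # 3 x 3 identity matrix
```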

### Data types for ndarrays

In [30]:
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr1.dtype

dtype('float64')
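
An existing array can also be converted to another dtype with `astype`, which always returns a new array. The sketch below uses made-up values to show three common cases: integer to float, float to integer (the fractional part is dropped), and numeric strings to float.

```python
import numpy as np

int_arr = np.array([1, 2, 3, 4, 5], dtype=np.int64)
float_arr = int_arr.astype(np.float64)   # integers become floats

# Casting floats to integers truncates toward zero
truncated = np.array([3.7, -1.2, 0.5, 10.9]).astype(np.int32)
print(truncated)  # [ 3 -1  0 10]

# Strings that represent numbers can be converted as well
numeric = np.array(["1.25", "-9.6", "42"]).astype(np.float64)
```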

### Arithmetic with NumPy Arrays
- Vectorization
    - batch operations on data without writing any for loops
    - Any arithmetic operation between **equal-size** arrays is applied element-wise
- Broadcasting
    - Operations between arrays of different (but compatible) shapes

In [32]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [33]:
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [34]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])

In [35]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
arr2

array([[ 0.,  4.,  1.],
       [ 7.,  2., 12.]])

In [36]:
arr2 > arr

array([[False,  True, False],
       [ True, False,  True]])
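
The cells above only combine equal-size arrays. Broadcasting can be sketched as follows, reusing the same 2 × 3 `arr`; the `row` and `col` values are arbitrary illustrations.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

# A scalar is broadcast to every element
scaled = arr * 2

# A shape (3,) array is broadcast across both rows
row = np.array([10., 20., 30.])
plus_row = arr + row   # [[11., 22., 33.], [14., 25., 36.]]

# A shape (2, 1) array is broadcast across all three columns
col = np.array([[100.], [200.]])
plus_col = arr + col   # [[101., 102., 103.], [204., 205., 206.]]
```

Shapes are compatible when, compared from the trailing dimension backward, each pair of sizes is either equal or one of them is 1.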

### Basic Indexing and Slicing

In [37]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [38]:
arr[5:8]

array([5, 6, 7])

In [39]:
arr[5:8] = 12 # the scalar 12 is propagated ("broadcast") to every element of the selection
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

In [40]:
arr_slice = arr[5:8] # a **view** on the elements at indices 5, 6, and 7 (index 8 is excluded)
arr_slice
arr_slice[1] = 12345 
arr

array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,
           9])

In [41]:
arr_slice[:] = 64
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

As NumPy has been designed to be able to work with very large arrays, **you could imagine performance
and memory problems if NumPy insisted on always copying data.**

To get an independent copy of a slice instead of a view, copy it explicitly:
```
arr[5:8].copy()
```
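
The difference between a view and a copy can be checked directly. In this small sketch the names `view` and `copied` are illustrative, and `np.shares_memory` is used to confirm whether two arrays overlap in memory.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]           # shares memory with arr
copied = arr[5:8].copy()  # owns its own data

view[:] = 64     # changes arr as well
copied[:] = -1   # leaves arr untouched

print(arr)  # [ 0  1  2  3  4 64 64 64  8  9]
print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, copied))  # False
```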

In [42]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
# 2 × 2 × 3 array
# to read the shape, count the items at each nesting level, from the outermost brackets to the innermost
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [44]:
arr3d[0] # 2 x 3 array

array([[1, 2, 3],
       [4, 5, 6]])

In [45]:
arr3d[1, 0]

array([7, 8, 9])
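
A few more things worth trying on the same `arr3d`, as a sketch: a comma-separated index is equivalent to chained indexing, a full index returns a single element, and both scalars and arrays can be assigned to a sub-array. The value 42 and the name `old_values` are arbitrary.

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

# arr3d[1, 0] is shorthand for arr3d[1][0]
same = np.array_equal(arr3d[1, 0], arr3d[1][0])

# Indexing all three axes returns a single element
element = arr3d[1, 0, 2]   # 9

# Assign a scalar to a whole 2 x 3 sub-array, then restore the saved values
old_values = arr3d[0].copy()
arr3d[0] = 42
filled = arr3d[0].copy()
arr3d[0] = old_values
```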