# 1- Data types in Python
## Python Integer is more than just an Integer
* A C int is a label for a position in memory whose bytes directly encode the integer value.
* A Python int is a pointer to a position in memory containing all the Python object information:
  * PyObject_HEAD (ob_refcnt, ob_type, ob_size)
  * ob_digit: the actual integer value of the Python variable
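
The overhead described above can be observed from Python itself. This is a minimal sketch that assumes the CPython interpreter; the exact byte counts vary with version and platform, so none are shown here.

```python
import sys

small = 1
large = 10**100

# Size in bytes of the whole int object (header + digits), as reported by the interpreter.
print("bytes for 1:", sys.getsizeof(small))
print("bytes for 10**100:", sys.getsizeof(large))

# The object also carries its type and a reference count.
print("type:", type(small))
print("refcount (approximate):", sys.getrefcount(large))
```

A raw C int would typically need only 4 bytes, whereas the Python object is several times larger and grows with the magnitude of the value.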
## Python List is more than just a List
* NumPy array:
  * PyObject_HEAD + dimensions + strides
  * data: pointer to a contiguous block of raw data (e.g. C ints)
* A Python list can hold heterogeneous data types, so its structure is more complex:
  * PyObject_HEAD + length
  * items: pointer to a block of pointers, each pointing to a full Python object (each with its own PyObject_HEAD etc...)
* so a list containing a single datatype is unnecessarily expensive in memory: the type information is repeated for every element
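
A rough way to see the memory cost is to compare a list of ints with an equivalent NumPy array. This is only a sketch: it assumes CPython, and the variable names are illustrative. The list total adds the container to each element object, while nbytes counts only the array's data buffer.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# List: container (header + n pointers) plus one full Python object per item.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)
# NumPy array: nbytes counts only the contiguous buffer of n raw 8-byte integers.
arr_bytes = np_arr.nbytes

print("list (container + int objects):", list_bytes)
print("numpy data buffer:", arr_bytes)
```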

# 2- Arrays - Basics
## Python fixed-type arrays
* The built-in array module gives efficient storage of uniformly typed data, but is less powerful than np.array (storage only, no element-wise operations)

In [1]:
import array
array.array("i", [1, 2, 3])

array('i', [1, 2, 3])
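
To illustrate what "fixed-type" and "less powerful" mean in practice, the sketch below shows that the type code is enforced on every element and that arithmetic operators behave like sequence operators rather than element-wise math.

```python
import array

ints = array.array("i", [1, 2, 3])
print(ints.typecode, ints.itemsize)  # type code and bytes per item (platform dependent)

# Every element must match the type code; a float is rejected.
try:
    ints.append(1.5)
    rejected = False
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)

# Unlike np.array, * is not element-wise: it repeats the sequence.
repeated = ints * 2
print(repeated)
```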

## NumPy arrays: datatypes
* set manually with the "dtype" arg, using either a NumPy type object (e.g. dtype=np.int16) or its string name (e.g. dtype="int16")
* similar to C types: int16, int32, float16, float32, uint8 etc...
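
As a small sketch of the two ways to specify a dtype, and of how the dtype determines the bytes used per element (the array names are illustrative only):

```python
import numpy as np

a = np.zeros(4, dtype=np.int16)        # NumPy type object
b = np.zeros(4, dtype="float32")       # equivalent string name
c = np.array([1, 2, 3], dtype=np.uint8)

for name, arr in [("a", a), ("b", b), ("c", c)]:
    print(name, arr.dtype, "itemsize:", arr.itemsize)

# astype returns a converted copy with the requested dtype
d = c.astype(np.float16)
print(d.dtype)
```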
## NumPy arrays: create from python list

In [13]:
import numpy as np
# create int array from a list; the dtype is inferred by default
arr_int = np.array([1, 5, 6, 7])
print("array from int list:", repr(arr_int))
# numpy upcasts when possible (float+int -> everything is float)
arr_upcast = np.array([1.5, 2, 2.5])
print("array from mix of int and float list:", repr(arr_upcast))
# set dtype manually
arr_float = np.array([1.5, 2.4], dtype=np.float16)
print("array from float list:", repr(arr_float))
# numpy arrays can be multidimensional (a list of 3 lists of length 3 = 2d array, 3x3 matrix)
arr_2d = np.array([range(i, i+3) for i in [2, 4, 6]])
print("array from list containing 3 lists:", repr(arr_2d))

array from int list: array([1, 5, 6, 7])
array from mix of int and float list: array([1.5, 2. , 2.5])
array from float list: array([1.5, 2.4], dtype=float16)
array from list containing 3 lists: array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])


## NumPy arrays: create from built-in methods

In [14]:
# Arrays of same numbers
np.zeros(10, dtype=int)
np.ones((3, 5), dtype=float)  # 3x5 matrix of ones
np.full((3, 5), 3.14)  # 3x5 matrix of 3.14
# Arrays defined from ranges
np.arange(0, 20, 2)  # similar to classic range but returns an array (stop value excluded)
np.linspace(0, 1, 5)  # create 5 entries evenly spaced between 0 and 1 (both included)
# Random data generators (examples for 3x3 matrix)
np.random.random((3, 3))  # uniform distribution on [0, 1)
np.random.normal(0, 1, (3, 3))  # normal distribution, mean 0, sd 1
np.random.randint(0, 10, (3, 3))  # random ints from 0 to 9 (10 excluded)

array([[2, 5, 4],
       [7, 6, 5],
       [2, 6, 9]])
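
Only the last expression of the cell above is displayed. As a sketch, the range-based constructors and the randint bounds can be checked explicitly (variable names are illustrative):

```python
import numpy as np

evens = np.arange(0, 20, 2)      # stop value 20 is excluded
grid = np.linspace(0, 1, 5)      # both endpoints are included
print(evens)
print(grid)

draws = np.random.randint(0, 10, (3, 3))  # upper bound 10 is excluded
print(draws.shape, draws.min() >= 0, draws.max() < 10)
```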

# 3- NumPy arrays - Introduction
## NumPy Array - Key attributes

In [1]:
import numpy as np
np.random.seed(0)  # fixed seed for reproducibility
# Create 3 example arrays: 1d, 2d, 3d - used across examples of point 3-
x1 = np.random.randint(10, size=6)  # 1d 6 entries
x2 = np.random.randint(10, size=(3, 4))  # 2d 12 entries
x3 = np.random.randint(10, size=(3, 4, 5))  # 3d 60 entries

# Structure of the array: ndim, shape, size
print("x3 ndim:", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size:", x3.size)
# Memory size and datatype
print("x3 itemsize:", x3.itemsize)
print("x3 nbytes:", x3.nbytes)
print("x3 size * itemsize:", x3.size * x3.itemsize)
print("x3 dtype", x3.dtype)

x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60
x3 itemsize: 8
x3 nbytes: 480
x3 size * itemsize: 480
x3 dtype int64


## NumPy Array - Indexing and Slicing
* Indexing:
  * 1d array: use traditional Python indexing
  * Multi-dim arrays: use a comma-separated tuple of indices
  * You can also modify array values with indexing
* Slicing:
  * Same as standard Python: for array x1, x1[start:stop:step], with defaults start=0, stop=size of the dimension and step=1
  * Multi-dim arrays: use a comma-separated tuple of slices, one start:stop:step per dimension
  * Sliced np arrays are views, not copies:
    * slicing a Python list returns a new list
    * slicing an np array returns a view on the source data, so modifying the slice modifies the original array
    * to get an independent copy instead, use sliced_array.copy()
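
The difference between list slices, array views and explicit copies can be sketched as follows (small standalone arrays are used here so that x1 and x2 are left untouched):

```python
import numpy as np

py_list = [1, 2, 3, 4]
list_slice = py_list[:2]      # new list
list_slice[0] = 99
print(py_list)                # original list is unchanged

arr = np.array([1, 2, 3, 4])
view = arr[:2]                # view on the same data
view[0] = 99
print(arr)                    # original array is changed

independent = arr[:2].copy()  # explicit copy with its own data
independent[1] = -1
print(arr)                    # not affected by the change to the copy
print(np.shares_memory(arr, view), np.shares_memory(arr, independent))
```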

In [5]:
print("Indexing")
print(repr(x2))
print(x2[2, 0])  # row 2, column 0
print(x2[2, -1])  # Row 2, last column
x2[0, 0] = 10  # Modify Row 0, Column 0
print(x2[0, 0])

print("Multi-dimension array slicing")
print(repr(x2))
print(repr(x2[:2, :3]))  # first 2 rows, first 3 columns
print(repr(x2[:3, ::2]))  # all 3 rows, every other column
print(repr(x2[::-1, ::-1]))  # Reverse rows and columns

print("Accessing a single vector")
print(x2[:, 0])  # all values in first column
print(x2[0, :])  # all values in first row

Indexing
array([[10,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])
1
7
10
Multi-dimension array slicing
array([[10,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])
array([[10,  5,  2],
       [ 7,  6,  8]])
array([[10,  2],
       [ 7,  8],
       [ 1,  7]])
array([[ 7,  7,  6,  1],
       [ 8,  8,  6,  7],
       [ 4,  2,  5, 10]])
Accessing a single vector
[10  7  1]
[10  5  2  4]


## NumPy Array - operations: reshape, concatenate and split
* Reshaping (returns a view of the original data rather than a copy whenever possible):
  * Use the array.reshape() method and pass the new shape (as a tuple or as separate ints)
  * Use slicing with the np.newaxis keyword (only when 1 axis is to be added)
* Concatenation:
  * use the np.concatenate() function and pass the arrays to concatenate in a list or tuple
  * 2d arrays: by default stacked along rows (axis=0), but can use axis=1 to stack side-by-side
  * mixed dim arrays:
    * use np.vstack() for explicit vertical stacking (rows)

In [4]:
print(x1)
print(x1.reshape(2, 3))  # reshape x1 (6 entries) as 2x3 matrix
print(x1[:, np.newaxis])  # reshape x1 as 6x1 matrix

[5 0 3 3 7 9]
[[5 0 3]
 [3 7 9]]
[[5]
 [0]
 [3]
 [3]
 [7]
 [9]]
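
The concatenation bullets above have no example cell, so here is a minimal sketch. It uses small standalone arrays (the names a, b and grid are illustrative) rather than x1 and x2.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
joined = np.concatenate([a, b])              # 1d result with 6 entries

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
rows = np.concatenate([grid, grid])          # default axis=0: stack on rows
cols = np.concatenate([grid, grid], axis=1)  # axis=1: stack side-by-side

# mixed dimensions: stack a 1d array on top of a 2d array
stacked = np.vstack([a, grid])

print(joined)
print(rows.shape, cols.shape, stacked.shape)
```

Unlike slicing and reshaping, concatenation always builds a new array, so later changes to joined do not affect a or b.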
