# Introduction to Array Operations in Python

## Meenal Jhajharia (she/her)

- CS and Math undergrad, University of Delhi
- PyMC core contributor | GSoC student
- Contact: [meenal@mjhajharia.com](mailto:meenal@mjhajharia.com) | [mjhajharia.com](https://mjhajharia.com)

<figure style="display: table; margin: 0 auto">
  <center>
    <img src="banner.png" width="850vmin" style="padding: 4vmin 0 0 0">
  </center>
</figure>

This banner is generated from [this code](https://raw.githubusercontent.com/pymc-devs/pymc-data-umbrella/main/banner.py), a trivial customization of the [original code](https://github.com/pymc-devs/pymcon/blob/gh-pages/assets/make_trajectories.py) by [Colin Carroll](https://colindcarroll.com/), who designed a [similar banner for PyMCon 2020](https://pymcon.com/). Colin is amazing at visualization and even has a couple of talks about it!

#### Overview

- Introduction
- Python Objects
- List Comprehension
- Basics of NumPy

#### Why Python?

- Useful for quick prototyping
- Dynamically typed and interpreted, with high-level data types
- Large ecosystem of open-source scientific software

Best place to learn more: [Official Python Tutorial](https://docs.python.org/3/tutorial/index.html)

#### Let's get started!

Here's what you need to begin:

- GitHub repo: [pymc-devs/pymc-data-umbrella](https://github.com/pymc-devs/pymc-data-umbrella)
- A working installation of Python 3
- A terminal (Windows or Unix)
- Familiarity with an object-oriented programming (OOP) language is nice to have

<figure style="display: table; margin: 0 auto">
  <center>
    <img src="data_types.png" width="850vmin" style="padding: 4vmin 0 0 0">
  </center>
</figure>

#### Numbers

Python ships with several numeric modules in its standard library, for example random

In [1]:
import random
random.random()

0.962693373774243

#### Strings

Sequence Operations

In [2]:
X = 'Data'
len(X)

4

In [3]:
X[0:-2]

'Da'

#### Immutability

Immutable objects, such as strings, cannot be changed in place. Operations like concatenation return a new object instead.

In [4]:
X = 'Data'
X + 'Umbrella'

'DataUmbrella'

In [5]:
X[0] = 'P'

TypeError: 'str' object does not support item assignment
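Because item assignment is not allowed, a common workaround is to build a new string from pieces of the old one. A minimal sketch (the names Y and Z are only illustrative):

```python
X = 'Data'

# Strings cannot be modified in place, so we build new strings instead.
Y = 'P' + X[1:]          # slicing and concatenation create a new object
Z = X.replace('D', 'P')  # str.replace also returns a new string

print(X)     # the original string is unchanged
print(Y, Z)
```

Both approaches leave X untouched, which is exactly what immutability guarantees.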

#### Polymorphism

The same operator or function can mean different things for different types of objects

In [None]:
1+2

In [6]:
'Py'+'MC'

'PyMC'

Length or size means different things for different datatypes



In [7]:
len("Python")

6

In [8]:
len(["Python", "Java", "C"])

3

In [9]:
len({"Language": "Python", "IDE": "VSCode"})

2

Related: Class Polymorphism, Method Overriding and Inheritance
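To connect this with class polymorphism, here is a small sketch of a class that decides what len() means for it, plus a subclass that overrides a method. The classes Dataset and LabelledDataset are hypothetical and exist only for this illustration.

```python
class Dataset:
    """Hypothetical container, used only for illustration."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        # len(obj) calls obj.__len__(), so the class defines its own "length"
        return len(self.rows)

    def describe(self):
        return f"Dataset with {len(self)} rows"


class LabelledDataset(Dataset):
    """Inherits from Dataset and overrides describe()."""

    def __init__(self, rows, label):
        super().__init__(rows)
        self.label = label

    def describe(self):
        # Method overriding: same name, specialised behaviour in the subclass
        return f"{self.label}: {super().describe()}"


items = [Dataset([1, 2, 3]), LabelledDataset([4, 5], "train")]
descriptions = [item.describe() for item in items]
print(descriptions)
```

The loop calls describe() on every item without checking its type; each object supplies its own behaviour.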

#### Lists

Positionally ordered collections of arbitrarily typed objects (mutable, no fixed size)

In [10]:
L = ['Python', 45, 1.23]
len(L)

3

In [11]:
L + [4, 5, 6]

['Python', 45, 1.23, 4, 5, 6]

In [12]:
L[-1]

1.23

List-specific operations

In [13]:
L.append('Aesara');L

['Python', 45, 1.23, 'Aesara']

In [14]:
L.pop(2); L

['Python', 45, 'Aesara']

More: sort(), reverse()
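A quick sketch of these two methods, using a separate list of numbers (nums is an illustrative name). Both work in place, while the built-in sorted() returns a new list. Note that a list mixing strings and numbers, like L above, cannot be sorted in Python 3.

```python
nums = [3, 1, 2]

nums.sort()      # sorts in place and returns None
nums.reverse()   # reverses in place
print(nums)

ordered = sorted([3, 1, 2])  # sorted() leaves its input alone and returns a new list
print(ordered)

# Mixed types have no defined ordering, so sorting them raises TypeError
try:
    ['Python', 45, 'Aesara'].sort()
    mixed_sort_failed = False
except TypeError:
    mixed_sort_failed = True
print(mixed_sort_failed)
```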

List indexing and slicing

In [15]:
L[99]

IndexError: list index out of range

In [16]:
X = [[1,2],[2,1]]
print(len(X), len(X[0]))

2 2


In [17]:
X[0][0]

1

In [18]:
L[:]

['Python', 45, 'Aesara']

In [19]:
L[-3:]

['Python', 45, 'Aesara']

In [20]:
L = [1,2,3,4,5,6,7,8,9,10]
L[1::2]  # L[start:stop:step]

[2, 4, 6, 8, 10]

In [21]:
L[::-1]

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

#### List Comprehension

In [22]:
List = []
 
for character in 'Python':
    List.append(character)

In [23]:
List = [character for character in 'Python']

In [24]:
M = [['OS','Percentage of Users'],['Linux', '40'],['Windows', '20'], ['OSX','40']]

In [25]:
[row[0] for row in M][1:]

['Linux', 'Windows', 'OSX']

In [26]:
[row[0] + '*' for row in M][1:]

['Linux*', 'Windows*', 'OSX*']

In [27]:
[row[0] for row in M if row[0][0]!='O']

['Linux', 'Windows']

Nested List Comprehension

In [28]:
n = 3; [[ 1 if i==j else 0 for i in range(n) ] for j in range(n)]

[[1, 0, 0], [0, 1, 0], [0, 0, 1]]

In [29]:
[x for x in range(21) if x%2==0 if x%3==0] 

[0, 6, 12, 18]
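Two if clauses in a row behave like a single condition joined with and. The sketch below spells out the same filter three ways (the variable names are illustrative):

```python
# Two chained if clauses
chained = [x for x in range(21) if x % 2 == 0 if x % 3 == 0]

# One if clause with `and`
combined = [x for x in range(21) if x % 2 == 0 and x % 3 == 0]

# The equivalent explicit loop
looped = []
for x in range(21):
    if x % 2 == 0:
        if x % 3 == 0:
            looped.append(x)

print(chained == combined == looped)
```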

Lambda Function

In [30]:
[i*10 for i in range(10)]

[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

In [31]:
list(map(lambda i: i*10, range(10)))

[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

#### NumPy

NumPy’s array class is called ndarray; arrays are usually created with np.array(). Some of its basic attributes:

- ndarray.ndim
- ndarray.shape
- ndarray.size

In [32]:
import numpy as np

a = np.arange(16).reshape(4, 4)

In [33]:
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

Simple array operation

In [34]:
2*a

array([[ 0,  2,  4,  6],
       [ 8, 10, 12, 14],
       [16, 18, 20, 22],
       [24, 26, 28, 30]])

General Properties of ndarrays

In [35]:
a.shape

(4, 4)

In [36]:
a.ndim

2

In [37]:
a.size

16
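These three attributes are related: ndim is the length of shape, and size is the product of the entries of shape. A small check, assuming Python 3.8+ for math.prod:

```python
import math

import numpy as np

a = np.arange(16).reshape(4, 4)

ndim_matches = a.ndim == len(a.shape)        # number of axes
size_matches = a.size == math.prod(a.shape)  # total number of elements

print(a.shape, a.ndim, a.size)
print(ndim_matches, size_matches)
```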

Ways to create new arrays



In [38]:
a = np.array(['PyMC', 'Arviz', 'Aesara'])

In [39]:
np.zeros((4, 4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [40]:
np.ones((4, 4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

Generate values in a certain range



In [41]:
np.arange(1, 100, 10)

array([ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91])

#### Random Number Generator

In [42]:
rg = np.random.default_rng(1)
x = rg.random(3);x

array([0.51182162, 0.9504637 , 0.14415961])

Cumulative sum along the specified axis (here the array has only one axis)



In [43]:
x.cumsum()

array([0.51182162, 1.46228532, 1.60644493])
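With more than one axis, the axis argument matters. A short sketch on a small 2-d array (m is an illustrative name):

```python
import numpy as np

m = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

down_rows = m.cumsum(axis=0)     # accumulate down each column
across_cols = m.cumsum(axis=1)   # accumulate along each row
flattened = m.cumsum()           # no axis: accumulate over the flattened array

print(down_rows)
print(across_cols)
print(flattened)
```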

Multi-dimensional arrays



In [44]:
c = np.array([[[0,  1,  2],[ 10, 12, 13]],
[[100, 101, 102],[110, 112, 113]]])

In [45]:
c.shape

(2, 2, 3)

In [46]:
for row in c:
    print(row,'-')

[[ 0  1  2]
 [10 12 13]] -
[[100 101 102]
 [110 112 113]] -


Element-wise printing



In [47]:
for element in c.flat:
    print(element)

0
1
2
10
12
13
100
101
102
110
112
113


Transpose

In [48]:
c.T

array([[[  0, 100],
        [ 10, 110]],

       [[  1, 101],
        [ 12, 112]],

       [[  2, 102],
        [ 13, 113]]])
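For arrays with more than two dimensions, .T reverses the order of the axes, so an element at index (i, j, k) moves to (k, j, i). A small check of that, using the same values as c:

```python
import numpy as np

c = np.array([[[0, 1, 2], [10, 12, 13]],
              [[100, 101, 102], [110, 112, 113]]])

t = c.T
print(c.shape, t.shape)   # the shape is reversed

# np.transpose with the axes listed in reverse order gives the same result
same_as_T = np.transpose(c, (2, 1, 0))
print(np.array_equal(t, same_as_T))

# Element (i, j, k) of c sits at (k, j, i) in c.T
print(c[0, 1, 2], t[2, 1, 0])
```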

Reshape

In [49]:
c.reshape((12,1))

array([[  0],
       [  1],
       [  2],
       [ 10],
       [ 12],
       [ 13],
       [100],
       [101],
       [102],
       [110],
       [112],
       [113]])
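Reshape only works when the total number of elements stays the same. One dimension can be given as -1, in which case NumPy infers it. A short sketch with the same values as c:

```python
import numpy as np

c = np.array([[[0, 1, 2], [10, 12, 13]],
              [[100, 101, 102], [110, 112, 113]]])

column = c.reshape(-1, 1)   # -1 asks NumPy to infer this dimension (12 here)
flat = c.reshape(-1)        # one-dimensional, 12 elements
grid = c.reshape(3, 4)      # any shape works if it holds exactly 12 elements

try:
    c.reshape(5, 2)         # 5 * 2 = 10 elements, but c has 12
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(column.shape, flat.shape, grid.shape, reshape_failed)
```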

#### Stacking

In [50]:
a = np.ones((2,2))
b = np.zeros((2,2))

In [51]:
np.vstack((a, b))

array([[1., 1.],
       [1., 1.],
       [0., 0.],
       [0., 0.]])

In [52]:
np.hstack((a, b))

array([[1., 1., 0., 0.],
       [1., 1., 0., 0.]])
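For 2-d arrays like these, vstack and hstack give the same result as np.concatenate along axis 0 and axis 1 respectively. A quick check:

```python
import numpy as np

a = np.ones((2, 2))
b = np.zeros((2, 2))

v = np.vstack((a, b))   # stack rows: shape (4, 2)
h = np.hstack((a, b))   # stack columns: shape (2, 4)

same_v = np.array_equal(v, np.concatenate((a, b), axis=0))
same_h = np.array_equal(h, np.concatenate((a, b), axis=1))

print(v.shape, h.shape, same_v, same_h)
```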

#### Broadcasting

Broadcasting lets NumPy operate on inputs that do not have exactly the same shape. It follows two rules:

- If the input arrays do not all have the same number of dimensions, a “1” is repeatedly prepended to the shapes of the smaller arrays until all the arrays have the same number of dimensions.

- Arrays with a size of 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. The value of the array element is assumed to be the same along that dimension for the “broadcast” array.
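The two rules can be sketched as a small pure-Python function that computes the resulting shape for a pair of shapes. The helper broadcast_shape is hypothetical (it is not part of NumPy) and assumes two sizes are compatible when they are equal or one of them is 1. Its output is compared with the shape NumPy itself reports through np.broadcast.

```python
from itertools import zip_longest

import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rules to two shapes."""
    result = []
    # Walk both shapes from the last dimension; a missing dimension counts as 1,
    # which has the same effect as prepending 1s to the shorter shape.
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for da, db in pairs:
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
    return tuple(reversed(result))


predicted = broadcast_shape((5, 2, 1), (2, 3))
actual = np.broadcast(np.empty((5, 2, 1)), np.empty((2, 3))).shape
print(predicted, actual)
```

Calling broadcast_shape((5, 2, 2), (2, 3)) raises ValueError, since the last dimensions (2 and 3) differ and neither is 1.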

Arrays with the same shape



In [53]:
a = np.array([1, 2, 3])
b = np.array([3, 3, 3])
a*b

array([3, 6, 9])

1-d Array and a Scalar



In [54]:
a = np.array([1, 2, 3])
b = 3
a*b

array([3, 6, 9])

Intuitively: the scalar b is "stretched" to the same shape as a.

In reality: NumPy does not make copies of b, so broadcasting moves less memory around (computationally efficient).
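One way to see that no copy is made is np.broadcast_to, which returns a read-only view of the stretched value. In this sketch the view has a stride of 0, meaning every element refers to the same spot in memory (the name stretched is illustrative):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array(3)                           # a 0-d array holding the scalar

stretched = np.broadcast_to(b, a.shape)   # looks like [3, 3, 3]
print(stretched)
print(stretched.strides)                  # (0,): no step between elements

# The arithmetic result is the same as multiplying by the scalar directly
print(np.array_equal(a * stretched, a * 3))
```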

Arrays whose shapes aren’t exactly the same, but whose trailing dimensions match

In [55]:
a = np.ones((5,2,3))
b = np.ones((2,3))
a*b

array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])

Arrays whose shapes aren’t exactly the same, but the trailing dimension of a is 1, so it works

In [56]:
a = np.ones((5,2,1))
b = np.ones((2,3))
a*b

array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])

Broadcasting fails! The trailing dimensions (2 and 3) differ and neither is 1



In [57]:
a = np.ones((5,2,2))
b = np.ones((2,3))
a*b

ValueError: operands could not be broadcast together with shapes (5,2,2) (2,3) 

NumPy compares the shapes of two given arrays element-wise.

It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when:
- they are equal, or
- one of them is 1

Arrays do not need to have the same exact number of dimensions to be compatible. Broadcasting is a convenient way of taking the outer product (or any outer operation)

Here broadcasting fails because the trailing dimensions (4 and 3) do not match

In [58]:
a = np.array([1,2,3,4])
b = np.array([1,2,3])
a*b

ValueError: operands could not be broadcast together with shapes (4,) (3,) 

We wrap a in an extra dimension and transpose it, which reshapes it along a new axis



In [59]:
a = np.asarray([a]).T  # equivalent to a[:, np.newaxis]
a.shape

(4, 1)

Now it works!

In [60]:
a*b

array([[ 1,  2,  3],
       [ 2,  4,  6],
       [ 3,  6,  9],
       [ 4,  8, 12]])
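The same pattern works for any outer operation, not only multiplication. A short sketch that uses np.newaxis to add the new axis and compares the results with np.outer and np.add.outer (the names col, outer_product and outer_sum are illustrative):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([1, 2, 3])

col = a[:, np.newaxis]    # shape (4, 1)

outer_product = col * b   # shape (4, 3), every a[i] * b[j]
outer_sum = col + b       # shape (4, 3), every a[i] + b[j]

print(np.array_equal(outer_product, np.outer(a, b)))
print(np.array_equal(outer_sum, np.add.outer(a, b)))
```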

#### Indexing

In [61]:
a = np.array([0, 6, 9, 8, 8, 6, 2, 7, 2, 8, 1, 0, 4, 6, 9, 0])
i = np.array([1, 1, 2, 3])
a[i]

array([6, 6, 9, 8])

In [62]:
j = np.array([[3, 0], [2, 1]])
a[j]

array([[8, 0],
       [9, 6]])

In [63]:
print(a.shape, i.shape, j.shape)
a[i,j]

(16,) (4,) (2, 2)


IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

In [64]:
a = a.reshape((4,4))
print(a.shape, i.shape, j.shape)
a[i,j]

(4, 4) (4,) (2, 2)


IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (4,) (2,2) 

In [65]:
i = i.reshape((2,2))
print(a.shape, i.shape, j.shape)
a[i,j]

(4, 4) (2, 2) (2, 2)


array([[7, 8],
       [1, 6]])
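When a 2-d array is indexed with two index arrays of the same shape, the result has that shape, and each entry pairs a row index from i with a column index from j. The sketch below rebuilds the result with explicit loops to make this visible (expected is an illustrative name):

```python
import numpy as np

a = np.array([0, 6, 9, 8, 8, 6, 2, 7, 2, 8, 1, 0, 4, 6, 9, 0]).reshape(4, 4)
i = np.array([[1, 1], [2, 3]])   # row indices
j = np.array([[3, 0], [2, 1]])   # column indices

result = a[i, j]

# result[m, n] is a[i[m, n], j[m, n]]
expected = np.empty_like(i)
for m in range(i.shape[0]):
    for n in range(i.shape[1]):
        expected[m, n] = a[i[m, n], j[m, n]]

print(result)
print(np.array_equal(result, expected))
```

The earlier errors follow from the same idea: the index arrays have to broadcast to a common shape, and there must not be more of them than the array has dimensions.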

Next thing to look at: https://numpy.org/doc/stable/user/basics.html

Note / Reference: Many of the examples here are original or modified versions of examples from the official Python and NumPy documentation, which remain the best sources for learning comprehensively. This is meant to be an accessible introduction!