<a href="https://colab.research.google.com/github/snares27/Calculator/blob/main/%5BS25_ML%5D_Arrays_and_numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **CS 5361/6361 Machine Learning - Arrays and Numpy**

**Author:** Olac Fuentes<br>
Computer Science Department<br>
University of Texas at El Paso<br>
**Last modified:** 1/30/2025<br>

# **Array basics**

Arrays are not part of the Python language. We will use them through the numpy library.
Arrays are similar to lists, but all elements must be of the same type.


The numpy library provides implementations of many useful operations on arrays of any dimensionality.  

The following statement imports the library and declares np as short for numpy. We can access functions and modules in the numpy library by using the dot notation.

In [None]:
import numpy as np

Arrays can be created in several ways.

## Arrays from lists

We can convert a list to a 1-D array.

In [None]:
a = np.array([1,2,3,4])
print(a)

We can convert a list of lists to a 2-D array (or a list of lists of lists to a 3D array, and so on).

In [None]:
a = np.array([[1,2,3,4],[5,6,7,8]])
print(a)

The array type is inferred from the data provided.

In [None]:
a = np.array([[1.5,2,3,4],[5,6,7,8]])
print(a)
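
As a small sketch of the inference rule (the variable names are illustrative only): a list of integers yields an integer array, a single float promotes every element to float, and the optional `dtype` argument overrides the inference.

```python
import numpy as np

ints = np.array([1, 2, 3])                  # all integers -> integer dtype
mixed = np.array([1.5, 2, 3])               # one float promotes every element to float
forced = np.array([1, 2, 3], dtype=float)   # explicit dtype overrides the inference

print(ints.dtype, mixed.dtype, forced.dtype)
```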

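Unlike Java, numpy does not allow jagged arrays. In recent versions of numpy, the following statement raises a ValueError because the rows have different lengths.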

In [None]:
a = np.array([[1,2,3,4],[5,6,7]])
print(a)
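
The sketch below assumes NumPy 1.24 or later, where building an array from rows of different lengths raises a `ValueError`; older versions instead issued a warning and created an array of Python objects. Passing `dtype=object` requests that object array explicitly, although most numerical operations are not useful on it.

```python
import numpy as np

rows = [[1, 2, 3, 4], [5, 6, 7]]    # rows of different lengths

try:
    a = np.array(rows)
    print('created array with dtype', a.dtype)   # older NumPy versions
except ValueError as err:
    print('ValueError:', err)                    # NumPy 1.24 and later

# Explicitly requesting an object array: a 1-D array holding two Python lists
ragged = np.array(rows, dtype=object)
print(ragged.shape, ragged.dtype)
```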

## Shape and type

In Java, arrays are one dimensional. 2D arrays are implemented as arrays of arrays. We use the attribute *length* to determine the size (or number of elements) of an array.

In Numpy, arrays can have any number of dimensions. For array *A*, the attribute *A.shape* is a tuple  (an immutable list) that contains the number of elements in each of the dimensions of A.  *len(A.shape)* contains the number of dimensions of A.

For array A, the attribute A.dtype contains the type of the elements of A.

For example, in the code below, array *a* has two dimensions, the size of the first dimension is 3 (the number of rows) and the size of the second dimension is 4 (the number of columns). Thus we say that *a* is a 3-by-4 array.

In [None]:
a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
print(a)
print(a.shape)
print(len(a.shape))
print(a.dtype)

## zeros, ones, full

The numpy function *zeros* (see https://numpy.org/doc/stable/reference/generated/numpy.zeros.html) creates an array of a given shape and type whose elements are all equal to zero. The parameter shape may be a tuple or an integer. If shape is an integer, a 1D array is created. By default, the data type is float.


In [None]:
a = np.zeros(5) # Create 1-D array
print(a)
print(a.shape)
print(a.dtype)

In [None]:
a = np.zeros(5,int) # Create 1-D array
print(a)
print(a.shape)
print(a.dtype)

In [None]:
a = np.zeros(5,float) # Create 1-D array
print(a)
print(a.shape)
print(a.dtype)

In [None]:
a = np.zeros((5,3),int) # Create 2-D array with 5 rows and 3 columns
print(a)
print(a.shape)
print(a.dtype)

In [None]:
a = np.zeros((5,3)) # Create 2-D array with 5 rows and 3 columns
print(a)
print(a.shape)
print(a.dtype)

The numpy function *ones* behaves the same as zeros, but it initializes every element to 1.

In [None]:
a = np.ones((2,3)) # Create 2-D array with 2 rows and 3 columns
print(a)
print(a.shape)
print(a.dtype)

The numpy function full (https://numpy.org/doc/stable/reference/generated/numpy.full.html) is similar to zeros, but it lets you specify a single value to assign to every element of the array.

In [None]:
a = np.full((4,6),2023)
print(a)
print(a.shape)
print(a.dtype)

In [None]:
a = np.full((4,6), np.pi)
print(a)
print(a.shape)
print(a.dtype)

## arange and reshape

To create sequences of numbers, NumPy provides the arange function which is analogous to the Python built-in range, but returns an array.

In [None]:
a= np.arange(16)
print(a)
print(a.shape)
print(a.dtype)

In [None]:
a= np.arange(20,100,10)
print(a)
print(a.shape)
print(a.dtype)

The reshape operation can be used to change the dimensionality of an array (but the number of elements cannot change).

In [None]:
a= np.arange(16)
print(a)
b = a.reshape(4,4)
print(b)
c = b.reshape(16)
print(c)

In [None]:
a= np.arange(16)
b = a.reshape(4,5) # Raises ValueError: 16 elements cannot fill a 4-by-5 array
print(b)

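reshape returns a new array with the requested shape, without changing the shape of the original array. When possible, the new array is a view that shares its data with the original, so assigning to its elements also changes the original.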


In [None]:
a= np.arange(16)
a.reshape(4,4)
print(a)
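
To check whether a reshaped array shares data with the original, one option is `np.shares_memory`. In this sketch, reshaping a contiguous array yields a view, so writing through the reshaped array is visible in the original, while the shape of the original stays the same.

```python
import numpy as np

a = np.arange(16)
b = a.reshape(4, 4)

print(a.shape)                  # the shape of a is still (16,)
print(np.shares_memory(a, b))   # b is a view of the same data
b[0, 0] = -1                    # writing through b ...
print(a[0])                     # ... changes a as well

c = a.reshape(4, 4).copy()      # an independent copy
c[0, 0] = 99
print(a[0])                     # a is not affected by changes to c
```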

In [None]:
a= np.arange(16)
a =a.reshape(4,2,2)
print(a)

a.shape is a tuple (an immutable list) that contains the size of each of the dimensions of array a. len(a.shape) contains the number of dimensions of a.

In [None]:
a= np.arange(16)
print(a)
print(a.shape)
print(len(a.shape))
b = a.reshape(1,16)
print(b)

In [None]:
print(a[3])
print(b[0,3])
print(b[0][3])

In [None]:
a= np.arange(16)
print(a)
print(a.shape)
print(len(a.shape))
b = a.reshape(4,4)
print(b)
print(b.shape)
print(len(b.shape))
c = a.reshape(2,2,4,1)
print(c)
print(c.shape)
print(len(c.shape))

In [None]:
a= np.arange(16)
print(a)
a.reshape(4,4)
print(a)

Arithmetic operators on arrays apply elementwise. There's no need to write for loops to perform array operations!!!

In [None]:
a= np.arange(16).reshape(4,4)
print(a)
b = a + 5
print(b)
c = a**2
print(c)
d = np.sin(a)
print(d)
print(a+b)
print(a*c)

Integer indexing works the same way as in Java, but indices are separated by commas.

In [None]:
a

In [None]:
print(a[2,3])
print(a[0,2])

As with lists, negative indices (counting from the end) are allowed.

In [None]:
print(a[0,-1])
print(a[-1,0])

Slicing works the same way as with lists, with one slice per dimension.

If A is a 1D array, *A[start:end:step]* is the subarray (or slice) of *A* starting at position *start*, ending just before position *end* (the element at *end* is not included), with increments of size *step*.

If *start* is omitted, it defaults to 0.

If *end* is omitted, it defaults to the size of A in that dimension.

If *step* is omitted, it defaults to 1.

When *step* is negative, the defaults for *start* and *end* are reversed, so *A[::-1]* traverses the whole array backwards.

In [None]:
A = np.arange(0, 20, 2)
print(A)  # Print all elements of A

In [None]:
print(A[:])  # Print all elements of A - start = 0, end = 10, step = 1

In [None]:
print(A[::])  # Print all elements of A - start = 0, end = 10, step = 1

In [None]:
print(A[1:])  # - start = 1, end = 10, step = 1

In [None]:
print(A[:4])  # - start = 0, end = 4, step = 1

In [None]:
print(A[1:6])  # - start = 1, end = 6, step = 1

In [None]:
print(A[1:6:2])  # - start = 1, end = 6, step = 2

In [None]:
A[::-1]
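
One way to read `A[start:end:step]` is as the elements at the positions produced by `range(start, end, step)`. The sketch below checks that reading for a few slices; the name `positions` is illustrative.

```python
import numpy as np

A = np.arange(0, 20, 2)          # [ 0  2  4 ... 18]

# A[1:6:2] picks positions 1, 3, and 5; position 6 is excluded
positions = list(range(1, 6, 2))
print(A[1:6:2], A[positions])

# With the defaults written out explicitly for a positive step
n = A.shape[0]
print(np.array_equal(A[:], A[0:n:1]))

# A negative step walks backwards; A[::-1] reverses the array
print(A[::-1])
```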

Some examples using 2D arrays

In [None]:
a = np.arange(16).reshape(4,4)
print(a)

In [None]:
print(a[:2,1:])  # Select rows 0 and 1 and columns 1,2, and 3

In [None]:
print(a[::2,1::2])  # Select rows 0 and 2 and columns 1 and 3

In [None]:
print(a[:,::-1])  # Reverse the order of columns

In [None]:
print(a[::-1,:])  # Reverse the order of rows

In [None]:
print(a[::-1])  # Reverse the order of rows, trailing ':' may be omitted

In [None]:
print(a)

In [None]:
print(a[2:4,1:2])

We can use lists of the same length as indices; the result is a 1D array.

In [None]:
np.arange(4)

In [None]:
a = np.random.randint(0,10,size=(4,5))
print(a)

In [None]:
def diagonal(x):
  n = min(x.shape) # The diagonal is as long as the smaller dimension
  idx = np.arange(n) # Both index arrays must have the same length
  return x[idx, idx]

print(diagonal(a))
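
For a non-square array, the two index arrays must have the same length, so a diagonal helper has to stop at the smaller dimension. The sketch below (the name `diagonal_by_indexing` is illustrative) compares that approach with NumPy's built-in `np.diagonal` on a fixed 4-by-5 array.

```python
import numpy as np

def diagonal_by_indexing(x):
    n = min(x.shape[0], x.shape[1])   # length of the main diagonal
    idx = np.arange(n)
    return x[idx, idx]                # pairs (0,0), (1,1), ..., (n-1,n-1)

x = np.arange(20).reshape(4, 5)
print(diagonal_by_indexing(x))
print(np.diagonal(x))
```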

In [None]:
c = a[[2,3,1],[1,0,3]] # Returns 1-D array containing [a[2,1], a[3,0], a[1,3]]
print(c)

In [None]:
c = a[[2,3,1],:]
print(c)

In [None]:
a

In [None]:
a[2:3,:]

In [None]:
a[3:3,:]

We can also use lists in one dimension and slices or integers in the other.

In [None]:
print(a)

In [None]:
a[:2,[2,0,3,0,0]] # Rows 0 and 1, columns 2, 0, 3, 0, and 0 (a column may be selected more than once)

In [None]:
a[2,[1,2,3]] # Row 2, columns 1,2 and 3 (this is a 1D array, since we are passing an index (not a slice) in the first dimension).

In [None]:
a[2:3,[1,2,3]] # Row 2, columns 1,2 and 3 (this is a 2D array, since we are passing a slice in the first dimension, even though it contains a single element).
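
The difference between an integer index and a one-element slice shows up in the shape of the result. A short sketch on a fixed array:

```python
import numpy as np

a = np.arange(20).reshape(4, 5)

r1 = a[2, [1, 2, 3]]     # integer row index: the row dimension is dropped
r2 = a[2:3, [1, 2, 3]]   # slice row index: the row dimension is kept

print(r1, r1.shape)
print(r2, r2.shape)
```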

In [None]:
X = np.random.randint(0,10,size=(10,5))
print(X)

In [None]:
def select_features(X,features):
  return X[:,features]

print(select_features(X,[4,0,4]))

In [None]:
def select_instances(X,instances):
  return X[instances]

print(select_instances(X,[3,5]))

We can assign values to array elements and slices.

In [None]:
a= np.arange(16).reshape(4,4)
print(a)
a[2,3] = -100
print(a)

In [None]:
a= np.arange(16).reshape(4,4)
print(a)
a[:2,1:] = -100
print(a)

You can perform elementwise operations on array slices if they are the same size.

In [None]:
a= np.arange(16).reshape(4,4)
print(a)
b = np.arange(6).reshape(2,3)
print(b)
a[:2,1:] = a[:2,1:] - b
print(a)

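Warning - arrays are passed by reference and array assignments are shallow: after *b = a*, both names refer to the same array, so a change made through one is visible through the other.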

In [None]:
a= np.arange(16).reshape(4,4)
print(a)
b = a
b[0,0] = 2302
print(a)

The function np.copy makes a deep copy of a, so changes to the copy do not affect a.

In [None]:
a= np.arange(16).reshape(4,4)
print(a)
b = np.copy(a)
b[0,0] = 2302
print(a)
print(b)
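
The same aliasing applies to slices: a basic slice is a view of the original data, whereas indexing with a list produces a copy. A small sketch of the difference (the names `row_view` and `rows_copy` are illustrative):

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

row_view = a[1, :]        # basic slicing returns a view
row_view[0] = -1          # modifies a as well
print(a[1, 0])

rows_copy = a[[1, 2], :]  # indexing with a list returns a copy
rows_copy[0, 0] = 500     # a is left unchanged
print(a[1, 0])
```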

How many times does 100 appear in a after the following code runs?<br>
a) 0<br>
b) 1<br>
c) 4

In [None]:
a= np.arange(16)
b = a.reshape(4,4)
b[0] = 100
print(a)

In [None]:
b[:,1]=200

In [None]:
print(b)

In [None]:
print(a)

Numpy provides LOTS of functions that implement mathematical operations on arrays. The functions operate elementwise on an array, producing an array as output. See
https://numpy.org/doc/stable/reference/routines.math.html

Commonly used built-in functions:


In [None]:
a = np.array([3, 7, 1, 5, 8])
b = np.array([4, 7, 1, 3, 6])
c = np.array([[3, 4, 9, 1], [8, 1, 2, 4], [7, 7, 6, 7]])
d = np.array([[7, 8, 1, 6], [3, 6, 6, 4], [4, 9, 4, 3]])
print(f'{a = }')
print(f'{b = }')
print(f'{c = }')
print(f'{d = }')

In [None]:
print(np.sum(a))

In [None]:
print(np.sum(c))

In [None]:
print(a+b)

In [None]:
print(c+d)

In [None]:
print(np.min(a))

In [None]:
print(np.min(c))

In [None]:
print(np.max(a))

In [None]:
print(np.mean(a))

In [None]:
print(np.sort(a))

Notice that the previous operation returns a sorted version of $a$, without modifying $a$.

In [None]:
print(a)

The operations argmax and argmin return the argument (or index) of the maximum and minimum elements in the array, respectively.

In [None]:
print(np.argmin(a)) # Returns the index of the minimum element in a

In [None]:
print(np.argmax(a)) # Returns the index of the maximum element in a

In [None]:
print(c)

In [None]:
print(np.argmin(c,axis=0))

Similarly, argsort returns an array of indices, where the first element is the index of the smallest element in the array, the second element is the index of the second smallest, and so on.

In [None]:
print(np.argsort(a))
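
A common use of `argsort` is to reorder one array according to the values of another. In this sketch, `labels` is a hypothetical array paired with `a`; indexing with the result of `argsort` sorts `a` and keeps the labels aligned.

```python
import numpy as np

a = np.array([3, 7, 1, 5, 8])
labels = np.array(['three', 'seven', 'one', 'five', 'eight'])   # illustrative companion array

order = np.argsort(a)      # indices that would sort a
print(order)
print(a[order])            # same result as np.sort(a)
print(labels[order])       # labels reordered to match
```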

With multidimensional arrays, we can perform operations along specific axes (or dimensions) in the array.

In [None]:
print(c)

In [None]:
print(np.max(c))  # Global maximum of c

In [None]:
print(np.max(c,axis=0)) # Maxima of every column in c; this is a 1D array of shape (c.shape[1],)

In [None]:
print(c.shape)

In [None]:
print(np.mean(c,axis=0).shape)

In [None]:
print(np.mean(c,axis=1).shape)

In [None]:
print(np.max(c,axis=1)) # Maxima of every row in c; this is a 1D array of shape (c.shape[0],)
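
One way to read the `axis` argument is that it names the dimension that the operation collapses. The sketch below checks this reading against explicit per-column and per-row computations; the `_loop` names are illustrative.

```python
import numpy as np

c = np.array([[3, 4, 9, 1], [8, 1, 2, 4], [7, 7, 6, 7]])

col_max = np.max(c, axis=0)   # collapses the rows: one value per column
row_max = np.max(c, axis=1)   # collapses the columns: one value per row

# The same results computed one column (or row) at a time
col_max_loop = np.array([np.max(c[:, j]) for j in range(c.shape[1])])
row_max_loop = np.array([np.max(c[i, :]) for i in range(c.shape[0])])

print(col_max, row_max)
```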

Since Python is an interpreted language, loops are slow. See the comparative running time of the same operation with and without loops.

In [None]:
import time

def sum_array_loops(a,b):
  c = np.zeros_like(a)
  for i in range(a.shape[0]):
    for j in range(a.shape[1]):
      c[i,j] =  a[i,j] + b[i,j]
  return c

def sum_array(a,b):
  return a + b

size = 2000

a = np.random.random((size,size))
b = np.random.random((size,size))

start = time.time()
c = sum_array_loops(a,b)
elapsed_time1 = time.time() - start
print('elapsed time using loops', elapsed_time1,'secs')

start = time.time()
c = sum_array(a,b)
elapsed_time2 = time.time() - start
print('elapsed time without loops', elapsed_time2,'secs')

print('ratio',elapsed_time1/elapsed_time2)

In [None]:
a = np.array([3, 7, 1, 5, 8]) # Redefine a; the timing example above overwrote it
print(a>5)

In [None]:
b = np.array([10,20,30,50,60])

In [None]:
print(b[a>5])

## **Broadcasting**

The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” (conceptually, it is replicated, but no actual copies are made)  across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python.

Two dimensions are compatible when they are equal, or one of them is 1.

**NumPy broadcasting  rules:**

**Dimension Alignment**

Align the shapes of the arrays by adding **leading** dimensions of size 1 to the array with fewer dimensions.
For example, if you have a 2D array and a 1D array, the 1D array is treated as a 2D array with a leading dimension of size 1.

**Dimension Compatibility**

For each dimension, the sizes must either match or one of them must be 1.
If a dimension has size 1, it is "broadcast" (by duplicating the values) to match the size of the other dimension.

**Error Condition**

If the sizes in any dimension are not compatible, numpy raises a ValueError.

Here are some example operations among compatible arrays:

In [None]:
np.random.seed(0)
a = np.random.randint(0,10,size=(4,5))
b = np.random.randint(0,10,size=(1,5))
c = np.random.randint(0,10,size=(4,1))
print('a =',a)
print('b =',b)
print('c =',c)

In [None]:
print(a+b)

In [None]:
print(a+c)

In [None]:
print(b+c)
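
The rules above can be sketched as a small function that computes the shape of the result. The name `broadcast_shape` is illustrative and not part of NumPy, which applies these rules internally whenever two arrays are combined.

```python
import numpy as np

def broadcast_shape(s1, s2):
    # Dimension alignment: pad the shorter shape with leading 1s
    n = max(len(s1), len(s2))
    s1 = (1,) * (n - len(s1)) + tuple(s1)
    s2 = (1,) * (n - len(s2)) + tuple(s2)
    result = []
    for d1, d2 in zip(s1, s2):
        # Dimension compatibility: sizes match, or one of them is 1
        if d1 == d2 or d2 == 1:
            result.append(d1)
        elif d1 == 1:
            result.append(d2)
        else:
            # Error condition
            raise ValueError(f'incompatible dimensions {d1} and {d2}')
    return tuple(result)

print(broadcast_shape((4, 5), (1, 5)))   # (4, 5)
print(broadcast_shape((1, 5), (4, 1)))   # (4, 5)
print(broadcast_shape((4, 5), (5,)))     # (4, 5)
```

The result can be compared with the shape numpy actually produces, for example `(np.zeros((1, 5)) + np.zeros((4, 1))).shape`.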

## **Boolean arrays as indices**

When we apply a boolean test to an array A, it returns a boolean array with the same shape as A.

In [None]:
A = np.random.randint(0,10,size = (4,5))
print(A)
print(A>5)

A boolean array can be used as an index to an array of the same shape. The result is a 1D array containing the elements at the positions where the index array is True.

In [None]:
ind = A>5
print(A[ind])

In [None]:
A[A>5]

In [None]:
np.random.seed(2)
B = np.random.randint(0,10,size = 6)
C = np.random.randint(0,10,size = 6)
print(B)
print(C)

In [None]:
print(B>5)

In [None]:
print(B[B>5])

In [None]:
print(C[B>5])

In [None]:
print(B[C==4])
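
Boolean arrays can also be counted and used on the left-hand side of an assignment. A brief sketch with a fixed array, so the results are reproducible (the name `mask` is illustrative):

```python
import numpy as np

B = np.array([8, 5, 6, 2, 9, 1])

mask = B > 5                 # True where the element is greater than 5
print(np.sum(mask))          # True counts as 1, so this is the number of elements > 5

B[mask] = 0                  # assign to the selected positions only
print(B)
```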

A boolean array can also be used to access a particular dimension in combination with integer indices and slices.

In [None]:
print(A[[True,False,True,False]]) # Returns a 2D array containing rows 0 and 2 of A

In [None]:
print(A[[True,False,True,False],3]) # Returns a 1D array containing the elements in rows 0 and 2 and column 3 of A

In [None]:
print(A[[True,False,True,False],:2])

In [None]:
print(A[1,[True,False,True,False,True]])

In [None]:
print(A[:,[True,False,True,False,True]]) # Returns a 2D array containing columns 0, 2, and 4 of A.

In [None]:
print(A)

In [None]:
rows = [True,False,True,False]
cols = [True,False,True,False,True]

In [None]:
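A[rows,cols] # Raises IndexError: the masks select 2 rows and 3 columns, which cannot be paired element by element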

In [None]:
print(A[rows])

In [None]:
A[rows][:,cols] # Selects rows 0 and 2, then columns 0, 2, and 4 of the result
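
An alternative to chaining two indexing operations is `np.ix_`, which builds index arrays that select every combination of the chosen rows and columns. This sketch uses a fixed array in place of the random `A`; the names `chained` and `crossed` are illustrative.

```python
import numpy as np

A = np.arange(20).reshape(4, 5)
rows = [True, False, True, False]
cols = [True, False, True, False, True]

chained = A[rows][:, cols]          # select rows first, then columns
crossed = A[np.ix_(rows, cols)]     # rows 0 and 2 crossed with columns 0, 2, and 4

print(crossed)
```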

In [None]:
print(A)

In [None]:
A[2,1]

In [None]:
A[2]

In [None]:
A[2][1]