# Lecture 4: Numpy array, September 13, 2023


## Example: compare numpy array and list

## Numpy array operation

## Some plotting

In [None]:
import numpy as np

## Example: vector sum

<span style="font-size:larger;">$\vec{A} = (a, b)$</span>

<span style="font-size:larger;">$\vec{B} = (c, d)$</span>

<span style="font-size:larger;">$\vec{C} = (a + c, b + d)$</span>


![vectorsum.png](attachment:vectorsum.png)

## How do we implement this in Python?

### the `list` solution
- each vector is a list that has two elements
- each component in the vector is an element/entry in a list
- compute the vector sum with a for loop that adds the elements at the same position in the two lists
- the element-wise sum is then appended to a new list that represents the new vector C


### the `numpy array` solution
- each list vector is converted into a numpy array
- the sum vector is the sum of two numpy array objects
    * numpy arrays apply an operation to all elements at once (element-wise), without an explicit loop

In [None]:
vectorA = [1.0, 1.0]
vectorB = [-1.0,1.0]
vectorC = vectorA + vectorB  # careful: + on lists concatenates them, it does not add element-wise

In [None]:
%%time
i = 0
vectorC = []
for i in range(len(vectorA)):
    vectorC.append( vectorA[i] + vectorB[i])


In [None]:
print(vectorC)

In [None]:
vectorA = [1.0, 1.0]
vectorB = [-1.0,1.0]
vectorA = np.array(vectorA)
vectorB = np.array(vectorB)

print(vectorA, vectorB)

In [None]:
%%time
vectorC = vectorA + vectorB

In [None]:
print(vectorC)

In [None]:
vectorA = []
vectorB = []
for val in range(10000):
    vectorA.append(val)
    vectorB.append(val+5)

In [None]:
%%time
i = 0
vectorC = []
for i in range(len(vectorA)):
    vectorC.append( vectorA[i] + vectorB[i])


In [None]:
vectorC_list = vectorC
#print(vectorC)

In [None]:
vectorA = np.array(vectorA)
vectorB = np.array(vectorB)

#print(vectorA, vectorB)

In [None]:
%%time

vectorC = vectorA + vectorB

In [None]:
vectorC_array = vectorC

### Memory 

In [None]:
import sys


# Check the memory consumption of the list
memory_consumption_list = sys.getsizeof(vectorC_list)
print("Memory consumption of the list (in bytes):", memory_consumption_list)


# Check the memory consumption of the array
memory_consumption_array = sys.getsizeof(vectorC_array)
print("Memory consumption of the array (in bytes):", memory_consumption_array)


In [None]:
vector_ensemble = []
for i in range(10):
    vector = []
    for j in range(10):
        vector.append(i*10+j)
    vector_ensemble.append(vector)

In [None]:
print(vector_ensemble)

# Check the memory consumption of the list (outer list only; the nested lists are not counted)
memory_consumption_list = sys.getsizeof(vector_ensemble)
print("Memory consumption of the list (in bytes):", memory_consumption_list)

In [None]:
array = np.linspace(0,99,100)
array = array.reshape(10,10)
print(array)

In [None]:
memory_consumption_array = sys.getsizeof(array)
print("Memory consumption of the array (in bytes):", memory_consumption_array)

### Difference in memory management between Numpy Array and List
![numpy_vs_python.png](attachment:numpy_vs_python.png)
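
One caveat on the numbers printed above: `sys.getsizeof` reports only the container itself. For a list, that is the block of references, not the integer objects they point to. The sketch below rebuilds a 10000-element example (the names `values`, `shallow`, and `deep` are made up for this illustration), adds up the element sizes by hand, and compares the result with the `nbytes` attribute of the array. The exact byte counts depend on the Python and NumPy versions and on the platform, so no expected output is shown.

```python
import sys
import numpy as np

values = list(range(10000))
arr = np.array(values)

# Size of the list object alone (the references it stores)
shallow = sys.getsizeof(values)

# Add the size of every integer object the list refers to
deep = shallow + sum(sys.getsizeof(v) for v in values)

# nbytes is the size of the array's data buffer:
# number of elements * bytes per element
print("list, container only :", shallow)
print("list, with elements  :", deep)
print("array data buffer    :", arr.nbytes)
print("array via getsizeof  :", sys.getsizeof(arr))
```

The array stores its numbers side by side in one buffer, which is why its footprint is close to `size * itemsize`, while the list pays for one separate Python object per entry.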

# Numpy array

## what are the basics?

### Dimension (or Axis):

A dimension, also referred to as an axis, represents one of the indices in the shape tuple.
For example, in a 2D array, you have two dimensions: rows and columns.
In a 1D array, there is only one dimension.
In a 3D array, there are three dimensions.
You can think of dimensions as directions along which data is organized.
The number of dimensions is also referred to as the "rank" of the array.


### Shape:

The shape of a NumPy array is a tuple that specifies the number of elements along each dimension (axis) of the array.
Each element in the shape tuple represents the size of the corresponding dimension.
For example, if you have a 2D array with a shape of (3, 4), it means the array has 3 rows and 4 columns.
The shape of an array can be obtained using the shape attribute: array.shape.

### Size:

The size of a NumPy array is the total number of elements in the array.
It is the product of all the values in the shape tuple.
For example, in a 2D array with a shape of (3, 4), the size is 3 (rows) * 4 (columns) = 12.
The size of an array can be obtained using the size attribute: array.size.
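
As a quick check of the three definitions, here is a minimal sketch using the (3, 4) example from the text. The array contents (the integers 0 to 11) are an arbitrary choice for illustration.

```python
import numpy as np

# 12 consecutive integers arranged as 3 rows and 4 columns
arr = np.arange(12).reshape(3, 4)

print(arr.ndim)    # 2
print(arr.shape)   # (3, 4)
print(arr.size)    # 12

# size is the product of the entries in the shape tuple
print(arr.size == np.prod(arr.shape))  # True
```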




![mg8O3kd.png](attachment:mg8O3kd.png)


In [None]:
import numpy as np

# Create a 1D array
arr = np.array([7,2,9,10])

# Shape of the array
shape = arr.shape  # Returns (4,)

# Number of dimensions (rank) of the array
dimension = arr.ndim  # Returns 1

# Total number of elements in the array
size = arr.size  # Returns 4

# Print the dimension, shape, and size using str.format()
print("dimension, shape, and size: {}, {}, {}".format(dimension, shape, size))


**since we'll do these operations over and over again, let's define a function**

In [None]:
# since we'll do these operations over and over again, let's define a function
def checknp(x):
    print("dimension, shape, and size: {}, {}, {}".format(x.ndim, x.shape, x.size))

In [None]:
# Let's create the 2D array 
twoD_array = np.array(  )
checknp(twoD_array)

In [None]:
# Let's create a 3D array
shape = (4,3,2) # what type of data structure is this?
threeD_array = np.random.randint(low=10, high = 50,size=shape)
print(threeD_array)

## Your time to play 
- for each cell, read the lines carefully before executing the cell
- try to understand what these lines do

In [None]:
a = np.array([1])
checknp(a)

In [None]:
a = np.array([1,2,3,4,5,6,7,8])
checknp(a)

In [None]:
# how do you access the elements?

print(a[0])
print(a[1])

# Slicing also works here
print(a[-1])
print(a[2:5])
print(a[4:-1])

In [None]:
a = np.array([[1]])
checknp(a)

In [None]:
a = np.array([1],[2])
checknp(a)

# what's wrong here?

In [None]:
# How is this cell different from the one above
a = np.array([[1],[2]])
checknp(a)

In [None]:
# what does the second input argument of .array() do here?
a = np.array([1],ndmin = 3)
checknp(a)
print(a)

In [None]:
# How do you access the only element in a THREE dimensional array?
print(a[0][0][0])


In [None]:
# Create an array with shape (3,3) and all entries are 0
a = np.zeros((3,3))
checknp(a)
print(a)

In [None]:
# what does ones do?
a = np.ones((3,3))
checknp(a)
print(a)

In [None]:
b = np.zeros((3,3))
print(b)

# what does ones_like do?
a = np.ones_like(b)
checknp(a)
print(a)

In [None]:
# what do these two input arguments do?
a = np.eye(3,3)
checknp(a)
print(a)

In [None]:
alist = ['a', 'b', 'c', 'd']
a = np.asarray(alist)
print(alist)
print(a)
# what's the difference between the two printouts?

In [None]:
alist = ['a', 'b', 'c', 45, 634.5,700]
a = np.asarray(alist)
print(alist)
print(a)

# Check the data type of array entries
dtype_of_entries = a.dtype

print("Data type of array entries:", dtype_of_entries)



# The <U32 is a NumPy data type descriptor, and it indicates that the elements
# in the NumPy array have a Unicode (string) data type with
# a maximum string length of 32 characters.
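
The reason behind the string dtype is that a numpy array holds a single data type, so mixed input is converted to one common type. The sketch below uses three small made-up lists to show the effect. It checks only the dtype `kind`, because the exact string width can differ between inputs.

```python
import numpy as np

ints = np.asarray([1, 2, 3])
mixed_numbers = np.asarray([1, 2.5, 3])
mixed_with_text = np.asarray(['a', 45, 634.5])

print(ints.dtype.kind)             # 'i' : integer
print(mixed_numbers.dtype.kind)    # 'f' : floating point
print(mixed_with_text.dtype.kind)  # 'U' : unicode string
print(mixed_with_text)             # the numbers are now the strings '45' and '634.5'
```

Once the numbers have become strings, arithmetic on them no longer works, so it is worth checking `dtype` after converting a list.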

In [None]:
# Reshaping numpy array
a = np.reshape(a,(2,3))
print(a)

## You are done! Raise your hand so that I know

## Array  slicing

Use this as a reference for all array routines provided by Numpy:
https://numpy.org/devdocs/reference/routines.html
- I encourage you to skim these pages, just to get an impression of what's available
- when you get stuck on a problem, a Google search is often enough, but it is still useful to know what methods numpy already provides


##### a short cheat sheet

In [None]:
data = np.array([1,2,3])
print(data[0])
print(data[1])
print(data[0:2])
print(data[1:])
print(data[-2:])


![image.png](attachment:image.png)
https://numpy.org/devdocs/user/absolute_beginners.html

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Create a 10x10 grid
grid_size = 10

# Create a figure and axis
fig, ax = plt.subplots()

# Loop through rows and columns to create cells
for i in range(grid_size):
    for j in range(grid_size):
        # Compute the center coordinates of the cell
        x = j + 0.5
        y = grid_size - 1 - i + 0.5  # Invert the y-coordinate to start from the top

        # Compute the value to display (ranging from 1 to 100)
        value = i * grid_size + j + 1

        # Define a color for the cell
        if 3 <= i <= 6 and 7 <= j <= 9:
            cell_color = 'lightyellow'
        else:
            cell_color = 'white'

        # Create a rectangle for the cell with the specified color
        cell = plt.Rectangle((j, grid_size - 1 - i), 1, 1, facecolor=cell_color, edgecolor='black', linewidth=1)
        ax.add_patch(cell)

        # Display the value at the center of the cell
        ax.text(x, y, str(value), va='center', ha='center', fontsize=10)

# Set the aspect ratio to make the cells square
ax.set_aspect('equal')

# Set axis limits
ax.set_xlim(0, grid_size)
ax.set_ylim(0, grid_size)

# Set axis labels
ax.set_xlabel('Column')
ax.set_ylabel('Row')

# Set the title
ax.set_title('10x10 Grid of Cells with Highlighted Region')

# Remove axis ticks and labels
ax.set_xticks([])
ax.set_yticks([])

# Show the plot
plt.show()

In [None]:
data=np.linspace(1,100,100).reshape(10,10)
print(data)

**How do I get the highlighted entries?**

In [None]:
data_subset = data[]

In [None]:
import matplotlib.pyplot as plt

# Create a 10x10 grid
grid_size = 10

# Create a figure and axis
fig, ax = plt.subplots()

# Loop through rows and columns to create cells
for i in range(grid_size):
    for j in range(grid_size):
        # Compute the center coordinates of the cell
        x = j + 0.5
        y = grid_size - 1 - i + 0.5  # Invert the y-coordinate to start from the top

        # Compute the value to display (ranging from 1 to 100)
        value = i * grid_size + j + 1

        # Define a color for the cell based on the position
        if i % 3 == 0 and j % 3 == 0:
            cell_color = 'lightyellow'  # Highlight cells at multiples of 3
        else:
            cell_color = 'white'

        # Create a rectangle for the cell with the specified color
        cell = plt.Rectangle((j, grid_size - 1 - i), 1, 1, facecolor=cell_color, edgecolor='black', linewidth=1)
        ax.add_patch(cell)

        # Display the value at the center of the cell
        ax.text(x, y, str(value), va='center', ha='center', fontsize=10)

# Set the aspect ratio to make the cells square
ax.set_aspect('equal')

# Set axis limits
ax.set_xlim(0, grid_size)
ax.set_ylim(0, grid_size)

# Set axis labels
ax.set_xlabel('Column')
ax.set_ylabel('Row')

# Set the title
ax.set_title('10x10 Grid of Cells with Highlighted Multiples of 3')

# Remove axis ticks and labels
ax.set_xticks([])
ax.set_yticks([])

# Show the plot
plt.show()

In [None]:
data_subset_2 = data[]

**How about entries that satisfy certain requirements?**

For example, 
- keep entries that are integer multiples of 7
- keep entries that are integer multiples of both 7 and 3


In [None]:
data_subset_3 = data[]

In [None]:
data_subset_4 = data[]


### Merge arrays
- add the GDP per capita to our file


for this purpose, we will use the `hstack` and `vstack` functions of numpy
- hstack --> stack two arrays `horizontally`. This requires that 1) the two arrays have the same number of dimensions and 2) they have the same number of entries along axis 0; the entries along axis 1 are `concatenated`.
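
A minimal sketch of both functions follows. The two 2x2 arrays are made up for illustration and are not the ones shown in the chart. `vstack` is the vertical counterpart: it needs the same number of entries along axis 1 and concatenates along axis 0.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

h = np.hstack((a, b))   # side by side: axis 1 grows
v = np.vstack((a, b))   # on top of each other: axis 0 grows

print(h)
print(h.shape)   # (2, 4)
print(v)
print(v.shape)   # (4, 2)
```

Note that both functions take the arrays as a single tuple argument, `(a, b)`, rather than as two separate arguments.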

![image.png](attachment:image.png)

In [None]:
# Let's code the operations illustrated in the chart



### Arrays as input to a function
- how a function interprets the input arguments
- vectorize functions 
- numpy array friendly function with np methods


In [None]:
def ratio (x,y):
    return x/y

In [None]:
a , b = 15. , 10

print(ratio(a,b))

In [None]:
data1 = np.random.randint(100, size=(4,3))
data2 = np.random.randint(100, size=(4,3))

print(data1)
print(data2)

In [None]:
ratio_array = ratio( data1 , data2)
print(ratio_array)

# by default, the operation is element wise

In [None]:
def S_over_sqrtB(x, y):
    return x/np.sqrt(y)

In [None]:
data_output = S_over_sqrtB(data1,data2)
print(data_output)

In [None]:
import math as m
def math_version_of_S_over_sqrtB(x, y):
    return x/m.sqrt(y)

In [None]:
print( math_version_of_S_over_sqrtB(data1,data2) )

In [None]:
def factorial(a):
    return m.factorial(a)

In [None]:
factorial(data1)

- `only size-1 arrays can be converted to Python scalars`
    - in this example, the math module takes care of the sqrt operation
    - math methods only take `scalars` as input arguments, i.e., each argument has to be a single value

- Two possible solutions
    - vectorize functions written with math methods
        - you've seen this in one of our workshops
    - use numpy methods to implement sqrt or other operations
    

In [None]:
# vectorization
vectorized_math_version_of_S_over_sqrtB = np.vectorize( math_version_of_S_over_sqrtB)

data_output_2 = vectorized_math_version_of_S_over_sqrtB( data1, data2)
print(data_output_2)

In [None]:
# now let's do it to factorial too
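
One possible way to do this is sketched below, reusing the `factorial` wrapper defined earlier. Two things here are assumptions made for the illustration: the small input array `small` (factorials of values up to 99, as in `data1`, are far too large for a 64-bit integer), and the choice `otypes=[object]`, which keeps the results as Python integers so they cannot overflow.

```python
import math
import numpy as np

def factorial(a):
    return math.factorial(a)

# otypes=[object] stores the results as Python integers
vectorized_factorial = np.vectorize(factorial, otypes=[object])

small = np.array([[0, 1, 2],
                  [3, 4, 5]])
print(vectorized_factorial(small))
```

Keep in mind that `np.vectorize` is a convenience: it still calls the Python function once per element, so it does not bring the speed of a true numpy operation.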

### Recall numpy has its built-in math functions
https://numpy.org/doc/stable/reference/routines.math.html

In [None]:
# btw, np.sqrt or other np math operations can also take scalars as input

print(np.sqrt(9))
print(np.sin(1.5))
print(np.cosh(1.5))
print(np.tanh(1.0))
print(np.log(10.0))
print(np.exp(10.0))


checknp(np.tanh(1.0))

# what does the printout of this line mean?

### Array elements operations
- sum 
- mean
- standard deviation


**Sum over all elements**

In [None]:
print(data.sum())

#### Mean

In [None]:
print(data.mean())

#### Standard deviation

### $\sigma = \sqrt{\frac{\sum{(x_i - \mu)^2}}{N}}$

the square root of the mean squared difference from the mean $\mu$, where $N$ is the number of entries

In [None]:
print(data.std())
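
To connect the formula with the method, the sketch below computes $\sigma$ step by step and compares it with `std()`. It rebuilds the 10x10 `data` array so that it runs on its own. With its default setting (`ddof=0`), `std()` divides by $N$, exactly as in the formula.

```python
import numpy as np

data = np.linspace(1, 100, 100).reshape(10, 10)

mu = data.mean()     # mean over all N entries
N = data.size        # number of entries
sigma = np.sqrt(((data - mu) ** 2).sum() / N)

print(sigma)
print(data.std())                      # default ddof=0, same formula
print(np.isclose(sigma, data.std()))   # True
```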

#### Median



In [None]:
print(data.median())

- when we wrote `data.median()`, we assumed that a numpy array object has a `median()` method. It does not.
- all methods provided by numpy.ndarray are listed here: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html
    - median is not one of them
- in this case we call the numpy function directly and pass the array to it
    - for statistics functions available in numpy, see https://numpy.org/doc/stable/reference/routines.statistics.html

In [None]:
median_value = np.median( data)

print( median_value )



# Some plotting

In [None]:
# Let's get some data
!wget https://portal.nersc.gov/project/m3438/physics77/week6/Zmumu.csv

In [None]:
csv_file = 'Zmumu.csv'
data = np.genfromtxt(csv_file, delimiter=',', skip_header=1)

In [None]:
print(data)
checknp(data)

# Histogram

In [None]:
import matplotlib.pyplot as plt

plt.hist(data[:,0],bins=(50), range=(-50,50))

# Scatter plot

In [None]:
plt.scatter(data[:,0],data[:,1])
plt.xlabel('p_x')
plt.ylabel('p_y')

# Graph/curve



In [None]:
x = np.linspace(0,np.pi,100)
y = np.sin(x)
y2 = np.cos(x)

plt.plot(x,y,color='green',label='f(x) = sin(x)')
plt.plot(x,y2,color='red',linestyle='dashed', label='f(x) = cos(x)')

plt.legend()

plt.xlabel('x')
plt.ylabel('y')

In [None]:
import matplotlib.pyplot as plt

# Define vectorA and vectorB as (a, b) and (c, d) respectively
vectorA = (2, 3)
vectorB = (4, 1)

# Calculate the sum of vectorA and vectorB
vectorC = (vectorA[0] + vectorB[0], vectorA[1] + vectorB[1])

# Create a new figure and axis
fig, ax = plt.subplots()

# Plot the vectors using quiver
ax.quiver(0, 0, vectorA[0], vectorA[1], angles='xy', scale_units='xy', scale=1, label='vectorA')
ax.quiver(0, 0, vectorB[0], vectorB[1], angles='xy', scale_units='xy', scale=1, label='vectorB')
ax.quiver(0, 0, vectorA[0] + vectorB[0], vectorA[1] + vectorB[1], angles='xy', scale_units='xy', scale=1, color='r', label='vectorC (Sum)')


# Add a dashed arrow connecting the endpoint of vectorB to the endpoint of vectorC
ax.annotate('', xy=vectorC, xytext=vectorB, arrowprops={'arrowstyle': '->', 'ls': '--', 'lw': 1.5}, color='b')

# Set the x and y limits
max_x = max(vectorA[0], vectorB[0], vectorC[0])
max_y = max(vectorA[1], vectorB[1], vectorC[1])
ax.set_xlim(0, max_x + 1)
ax.set_ylim(0, max_y + 1)

# Add labels for the vector endpoints
ax.text(vectorA[0], vectorA[1], f'({vectorA[0]}, {vectorA[1]})', ha='left', va='bottom')
ax.text(vectorB[0], vectorB[1], f'({vectorB[0]}, {vectorB[1]})', ha='left', va='bottom')
ax.text(vectorC[0], vectorC[1], f'({vectorC[0]}, {vectorC[1]})', ha='left', va='bottom')

# Set axis labels and title
ax.set_xlabel('X-Axis')
ax.set_ylabel('Y-Axis')
ax.set_title('Vector Addition')

# Add a legend in the upper-left corner
ax.legend(loc='upper left')

# Show the plot
plt.grid()
plt.show()



In [None]:
import matplotlib.pyplot as plt

# Create a 10x10 grid
grid_size = 10

# Create a figure and axis
fig, ax = plt.subplots()

# Loop through rows and columns to create cells
for i in range(grid_size):
    for j in range(grid_size):
        # Compute the center coordinates of the cell
        x = j + 0.5
        y = grid_size - 1 - i + 0.5  # Invert the y-coordinate to start from the top

        # Compute the value to display (ranging from 1 to 100)
        value = i * grid_size + j + 1

        # Create a rectangle for the cell filled with white color
        cell = plt.Rectangle((j, grid_size - 1 - i), 1, 1, facecolor='white', edgecolor='black', linewidth=1)
        ax.add_patch(cell)

        # Display the value at the center of the cell
        ax.text(x, y, str(value), va='center', ha='center', fontsize=10)

# Set the aspect ratio to make the cells square
ax.set_aspect('equal')

# Set axis limits
ax.set_xlim(0, grid_size)
ax.set_ylim(0, grid_size)

# Set axis labels
ax.set_xlabel('Column')
ax.set_ylabel('Row')

# Set the title
ax.set_title('10x10 Grid of Cells')

# Remove axis ticks and labels
ax.set_xticks([])
ax.set_yticks([])

# Show the plot
plt.show()


In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Create a 10x10 grid
grid_size = 10

# Create a figure and axis
fig, ax = plt.subplots()

# Loop through rows and columns to create cells
for i in range(grid_size):
    for j in range(grid_size):
        # Compute the center coordinates of the cell
        x = j + 0.5
        y = grid_size - 1 - i + 0.5  # Invert the y-coordinate to start from the top

        # Compute the value to display (ranging from 1 to 100)
        value = i * grid_size + j + 1

        # Define a color for the cell
        if 3 <= i <= 6 and 7 <= j <= 9:
            cell_color = 'lightyellow'
        else:
            cell_color = 'white'

        # Create a rectangle for the cell with the specified color
        cell = plt.Rectangle((j, grid_size - 1 - i), 1, 1, facecolor=cell_color, edgecolor='black', linewidth=1)
        ax.add_patch(cell)

        # Display the value at the center of the cell
        ax.text(x, y, str(value), va='center', ha='center', fontsize=10)

# Set the aspect ratio to make the cells square
ax.set_aspect('equal')

# Set axis limits
ax.set_xlim(0, grid_size)
ax.set_ylim(0, grid_size)

# Set axis labels
ax.set_xlabel('Column')
ax.set_ylabel('Row')

# Set the title
ax.set_title('10x10 Grid of Cells with Highlighted Region')

# Remove axis ticks and labels
ax.set_xticks([])
ax.set_yticks([])

# Show the plot
plt.show()


In [None]:
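# `data` was overwritten by the Zmumu table above, so rebuild the 10x10 grid first
data = np.linspace(1,100,100).reshape(10,10)
print(data)
print(data[3:7,7:10])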

In [None]:
import matplotlib.pyplot as plt

# Create a 10x10 grid
grid_size = 10

# Create a figure and axis
fig, ax = plt.subplots()

# Loop through rows and columns to create cells
for i in range(grid_size):
    for j in range(grid_size):
        # Compute the center coordinates of the cell
        x = j + 0.5
        y = grid_size - 1 - i + 0.5  # Invert the y-coordinate to start from the top

        # Compute the value to display (ranging from 1 to 100)
        value = i * grid_size + j + 1

        # Define a color for the cell
        if j == 5:  # Highlight the 6th column
            cell_color = 'lightyellow'
        else:
            cell_color = 'white'

        # Create a rectangle for the cell with the specified color
        cell = plt.Rectangle((j, grid_size - 1 - i), 1, 1, facecolor=cell_color, edgecolor='black', linewidth=1)
        ax.add_patch(cell)

        # Display the value at the center of the cell
        ax.text(x, y, str(value), va='center', ha='center', fontsize=10)

# Set the aspect ratio to make the cells square
ax.set_aspect('equal')

# Set axis limits
ax.set_xlim(0, grid_size)
ax.set_ylim(0, grid_size)

# Set axis labels
ax.set_xlabel('Column')
ax.set_ylabel('Row')

# Set the title
ax.set_title('10x10 Grid of Cells with Highlighted 6th Column')

# Remove axis ticks and labels
ax.set_xticks([])
ax.set_yticks([])

# Show the plot
plt.show()


In [None]:
print(data[:,5])
checknp(data[:,5])
sixth_column = data[:,5].reshape(1,10)
checknp(sixth_column)

In [None]:
import matplotlib.pyplot as plt

# Create a 10x10 grid
grid_size = 10

# Create a figure and axis
fig, ax = plt.subplots()

# Loop through rows and columns to create cells
for i in range(grid_size):
    for j in range(grid_size):
        # Compute the center coordinates of the cell
        x = j + 0.5
        y = grid_size - 1 - i + 0.5  # Invert the y-coordinate to start from the top

        # Compute the value to display (ranging from 1 to 100)
        value = i * grid_size + j + 1

        # Define a color for the cell based on the position
        if i % 3 == 0 and j % 3 == 0:
            cell_color = 'lightyellow'  # Highlight cells at multiples of 3
        else:
            cell_color = 'white'

        # Create a rectangle for the cell with the specified color
        cell = plt.Rectangle((j, grid_size - 1 - i), 1, 1, facecolor=cell_color, edgecolor='black', linewidth=1)
        ax.add_patch(cell)

        # Display the value at the center of the cell
        ax.text(x, y, str(value), va='center', ha='center', fontsize=10)

# Set the aspect ratio to make the cells square
ax.set_aspect('equal')

# Set axis limits
ax.set_xlim(0, grid_size)
ax.set_ylim(0, grid_size)

# Set axis labels
ax.set_xlabel('Column')
ax.set_ylabel('Row')

# Set the title
ax.set_title('10x10 Grid of Cells with Highlighted Multiples of 3')

# Remove axis ticks and labels
ax.set_xticks([])
ax.set_yticks([])

# Show the plot
plt.show()


In [None]:
data[::3,::3]

In [None]:
data[data%7==0]

In [None]:
print(data%7==0)

In [None]:
data[(data%7==0) & (data%3==0)]

In [None]:
A = data%7==0
B = data%3==0
print(A)
print(B)
C = A & B
print(C)
print(data[C])

In [None]:
data[np.where( (data%7==0) & (data%3 ==0) ) ]
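
The boolean-mask form and the `np.where` form select the same entries. The sketch below, which rebuilds the 10x10 grid so that it runs on its own, makes the intermediate objects explicit. The names `mask`, `rows`, and `cols` are chosen here for illustration. The parentheses around each comparison are required because `&` binds more tightly than `==`.

```python
import numpy as np

data = np.linspace(1, 100, 100).reshape(10, 10)

# Boolean array with the same shape as data
mask = (data % 7 == 0) & (data % 3 == 0)

# Row and column indices of the True entries
rows, cols = np.where(mask)

by_mask = data[mask]
by_where = data[rows, cols]

print(by_mask)                             # [21. 42. 63. 84.]
print(np.array_equal(by_mask, by_where))   # True
```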