#Numpy Arrays

**Python Objects :**

1. High-level number objects : integers, floating point

2. Containers : lists (costless insertion and append), dictionaries (fast lookup)

Numpy provides :

1. Extension package to Python for multi-dimensional arrays

2. Closer to hardware (efficiency)

3. Designed for scientific computation (convenience)

4. Supports a style known as array-oriented programming.

In [None]:
import numpy as np
a = np.array([0, 1, 2, 3])
print(a)

print(np.arange(10))

[0 1 2 3]
[0 1 2 3 4 5 6 7 8 9]


**Why is it useful :** Memory-efficient container that provides fast numerical operations.

In [None]:
#python lists
L = range(1000)
%timeit [i**2 for i in L]

64.6 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [None]:
a = np.arange(1000)
%timeit a**2

1.46 µs ± 289 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
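The `%timeit` magic only works inside IPython or Jupyter. As a rough sketch, the same comparison can be reproduced in a plain script with the standard `timeit` module. The variable names below are illustrative, and the absolute timings will differ from machine to machine.

```python
import timeit
import numpy as np

values = range(1000)
arr = np.arange(1000)

# Time 1000 repetitions of each approach
list_time = timeit.timeit(lambda: [i**2 for i in values], number=1000)
array_time = timeit.timeit(lambda: arr**2, number=1000)

print(f"list comprehension: {list_time:.4f} s")
print(f"numpy array:        {array_time:.4f} s")

# Both approaches compute the same squares
squares_list = [i**2 for i in values]
squares_array = arr**2
print(np.array_equal(squares_list, squares_array))
```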


#1. Creating Arrays

**1.1 Manual Construction of Arrays**



In [None]:
# 1-D

a = np.array([0, 1, 2, 3])

a

array([0, 1, 2, 3])

In [None]:
#number of dimensions

a.ndim

1

In [None]:
#shape

a.shape

(4,)

In [None]:
len(a)

4

In [None]:
# 2-D, 3-D...

b = np.array([[0, 1, 2], [4, 5, 6]])

b

array([[0, 1, 2],
       [4, 5, 6]])

In [None]:
b.ndim

2

In [None]:
b.shape

(2, 3)

In [None]:
len(b)      #returns size of the first dimension

2

In [None]:
c = np.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]])

c

array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])

In [None]:
c.ndim

3

In [None]:
c.shape

(2, 2, 2)

In [None]:
len(c)

2

**1.2 Functions for Creating arrays**

In [None]:
#using arange function

#arange is an array-valued version of the built-in range function

a = np.arange(10) #0, 1, ..., n-1
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
b = np.arange(1, 10, 2) #start, end (exclusive, as in not including the value), step

b

array([1, 3, 5, 7, 9])

In [None]:
#using linspace

a = np.linspace(0, 1, 6) #start, end, no. of points

a

array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])
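Unlike `arange`, `linspace` includes the end value by default. The short sketch below contrasts this with the `endpoint=False` option, which gives a half-open interval like `arange` does.

```python
import numpy as np

# Default: both end points are included
closed = np.linspace(0, 1, 6)

# endpoint=False drops the stop value, so 5 points are spaced 0.2 apart
half_open = np.linspace(0, 1, 5, endpoint=False)

print(closed)
print(half_open)
```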

In [None]:
#common arrays

a = np.ones((3, 3))

a

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [None]:
b = np.zeros((3, 3))

b

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [None]:
np.eye(3)      #returns a 2-D Identity matrix (I_3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [None]:
np.eye(3, 2)  #returns a 3 x 2 array with ones on the main diagonal (the first 2 columns of I_3)

array([[1., 0.],
       [0., 1.],
       [0., 0.]])

In [None]:
#creating array using diag function

a = np.diag([1, 2, 3, 4])    #construct a diagonal array with the elements in the diagonal

a

array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])

In [None]:
np.diag(a)            #extract diagonal

array([1, 2, 3, 4])

In [None]:
#creating an array using random

#Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1)
a = np.random.rand(4)

a

array([0.95588046, 0.10072499, 0.9389603 , 0.15272015])

In [None]:
a = np.random.randn(4)   #return a sample (or samples) from the "standard normal" distribution

a

array([0.37726964, 0.36599527, 0.54941782, 0.90680982])

**Note :**

For random samples from $N(\mu, \sigma^2)$, use :

sigma * np.random.randn(...) + mu
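A minimal sketch of the note above, checking that shifted and scaled standard normal samples have roughly the requested mean and standard deviation. The values of `mu` and `sigma` are illustrative. The sketch uses NumPy's newer `default_rng` generator rather than the legacy `np.random` functions used in this notebook, which is a choice made here and not part of the original.

```python
import numpy as np

mu, sigma = 5.0, 2.0                   # illustrative values
rng = np.random.default_rng(seed=0)    # seeded so the run is repeatable

# Scale and shift standard normal samples to get N(mu, sigma^2)
samples = mu + sigma * rng.standard_normal(100_000)

print(samples.mean())   # close to mu
print(samples.std())    # close to sigma
```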

#2. Basic Data Types

In some instances, array elements are displayed with a **trailing dot (e.g., 2. instead of 2)**. This is due to the difference in the **data type** used :

In [None]:
a = np.arange(10)

a.dtype

dtype('int64')

In [None]:
#You can explicitly specify which data-type you want:

a = np.arange(10, dtype = 'float64')
a

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [None]:
#The default data type is float for the zeros and ones functions

a = np.zeros((3, 3))
print(a)

a.dtype

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


dtype('float64')

**Other datatypes**

In [None]:
a = np.array([1+2j, 2+4j]) #Complex datatype

print(a)
a.dtype

[1.+2.j 2.+4.j]


dtype('complex128')

In [None]:
a = np.array([True, False, False, True]) #Boolean datatype

print(a)
a.dtype

[ True False False  True]


dtype('bool')

In [None]:
a = np.array(["Ram", "Shyam", "Sita", "Miril"])

print(a)
a.dtype

['Ram' 'Shyam' 'Sita' 'Miril']


dtype('<U5')

**Each data type has a character code that uniquely identifies it.**

**'b'** - Boolean

**'i'** - (Signed) Integer

**'u'** - Unsigned Integer

**'f'** - floating-point

**'c'** - complex-floating point

**'m'** - timedelta

**'M'** - datetime

**'O'** - (Python) Objects

**'S', 'a'** - (byte-)string

**'U'** - Unicode

**'V'** - Raw data (void)
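These character codes can be read back from any array through `dtype.kind`. The sketch below looks them up for the kinds of arrays created earlier in this section. The dictionary and its keys are only there for illustration.

```python
import numpy as np

arrays = {
    "integers": np.arange(3),
    "floats": np.zeros(3),
    "complex": np.array([1 + 2j]),
    "booleans": np.array([True, False]),
    "strings": np.array(["Ram", "Shyam"]),
}

# dtype.kind is the one-character code of the data type's general kind
kinds = {name: arr.dtype.kind for name, arr in arrays.items()}

for name, kind in kinds.items():
    print(f"{name}: kind={kind!r}, dtype={arrays[name].dtype}")
```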

**For more details :**

[https://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html](https://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html)



#3. Indexing and Slicing

**3.1 Indexing**

The items of an array can be accessed and assigned in the same way as other Python sequences (e.g., lists).

In [None]:
a= np.arange(10)

print(a[5])    #indices begin at 0, just like other Python sequences (and C/C++)

5


In [None]:
#For multidimensional arrays, indices are tuples of integers.

a = np.diag([1, 2, 3])

print(a[2, 2])

3


In [None]:
#assigning values
a[2, 1] = 5

a

array([[1, 0, 0],
       [0, 2, 0],
       [0, 5, 3]])

**3.2 Slicing**

In [None]:
a = np.arange(10)

In [None]:
a[1:8:2] #[startindex : endindex(exclusive): step]

array([1, 3, 5, 7])

In [None]:
#Combining assignment and slicing

a = np.arange(10)
a[5:] = 10

a

array([ 0,  1,  2,  3,  4, 10, 10, 10, 10, 10])

In [None]:
b = np.arange(5)
a[5:] = b[::-1]  #assigning

a

array([0, 1, 2, 3, 4, 4, 3, 2, 1, 0])
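Slicing extends to multidimensional arrays by giving one slice per axis, separated by commas. A small sketch on an illustrative 3 x 4 array (the name `grid` is not from the original):

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

row = grid[1]            # second row
column = grid[:, 2]      # third column
block = grid[::2, 1:3]   # rows 0 and 2, columns 1 and 2

print(row)
print(column)
print(block)
```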

#4. Copies and Views

A slicing operation creates a view on the original array, which is just a way of accessing array data. Thus, the original array is not copied in memory. You can use `np.shares_memory()` to check if two arrays share the same memory block (`np.may_share_memory()` is a faster check that can report false positives).

**When modifying the view, the original array is modified as well.**

In [None]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
b = a[::2]
b

array([0, 2, 4, 6, 8])

In [None]:
np.shares_memory(a, b)

True

In [None]:
b[0] = 10
b

array([10,  2,  4,  6,  8])

In [None]:
a       #Even though we only modified b, 'a' was updated too because both share the same memory

array([10,  1,  2,  3,  4,  5,  6,  7,  8,  9])

In [None]:
#What is the workaround?

a = np.arange(10)

c = a[::2].copy()

print(f"Array 'a' is {a}")
print(f"Array 'c' is {c}")

Array 'a' is [0 1 2 3 4 5 6 7 8 9]
Array 'c' is [0 2 4 6 8]


In [None]:
print(f"Do arrays 'a' and 'c' share memory? {np.shares_memory(a, c)}")

Do arrays 'a' and 'c' share memory? False


In [None]:
c[0] = 10

print(f"After updating, 'c' is {c} and 'a' is {a}.")
print(f"'a' remains unchanged.")


After updating, 'c' is [10  2  4  6  8] and 'a' is [0 1 2 3 4 5 6 7 8 9].
'a' remains unchanged.
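Another way to tell a view from a copy is the `base` attribute: a view keeps a reference to the array whose data it uses, while an array that owns its data has `base` set to `None`. A short sketch (the names `view` and `copy` are illustrative):

```python
import numpy as np

a = np.arange(10)
view = a[::2]            # slicing: a view on a's data
copy = a[::2].copy()     # explicit copy: independent data

print(view.base is a)      # the view refers back to a
print(copy.base is None)   # the copy owns its data
```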


#5. Fancy Indexing

NumPy arrays can be indexed with slices, but also boolean or integer arrays **(masks)**. This method is called **Fancy indexing**. It creates copies, not views.

**Using Boolean Mask**

In [None]:
a = np.random.randint(0, 20, 15)
a

array([18,  6, 16, 13,  4,  6,  4,  2, 11,  4,  8, 12,  2, 13,  5])

In [None]:
mask = (a % 2 == 0)

In [None]:
#This creates copy, not view.
extract_from_a = a[mask]

extract_from_a

array([18,  6, 16,  4,  6,  4,  2,  4,  8, 12,  2])

**Indexing with a mask can be very useful to assign a new value to a sub-array :**

In [None]:
a[mask] = -1
a

array([-1, -1, -1, 13, -1, -1, -1, -1, 11, -1, -1, -1, -1, 13,  5])

**Indexing with an array of integers**

In [None]:
a = np.arange(0, 100, 10)

a

array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [None]:
#indexing can be done with an array (or list) of integers; the same index may be repeated several times

a[[2, 3, 2, 4, 2]]

array([20, 30, 20, 40, 20])

In [None]:
#New values can be assigned

a[[9, 7]] = -200

a

array([   0,   10,   20,   30,   40,   50,   60, -200,   80, -200])
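To see the difference between the copy returned by fancy indexing and the view returned by slicing, the sketch below modifies both results and checks the original array each time. The variable names are illustrative.

```python
import numpy as np

a = np.arange(0, 100, 10)

picked = a[[1, 2]]   # fancy indexing: a new array with copied data
sliced = a[1:3]      # slicing: a view on a

picked[0] = -1                 # does not touch a
after_fancy = int(a[1])

sliced[0] = -1                 # writes through to a
after_slice = int(a[1])

print(after_fancy, after_slice)
```

Assignment through a fancy index, as in `a[[9, 7]] = -200` above, still modifies `a` directly. Only the result of reading with a fancy index is a copy.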

#Elementwise Operations

##1. Basic Operations

**with scalars**

In [None]:
a = np.array([1, 2, 3, 4])

a + 1

array([2, 3, 4, 5])

In [None]:
a ** 2

array([ 1,  4,  9, 16])

**All arithmetic operates elementwise**

In [None]:
b = np.ones(4) + 1

a - b

array([-1.,  0.,  1.,  2.])

In [None]:
a * b

array([2., 4., 6., 8.])

In [None]:
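#Elementwise product vs. matrix multiplication

c = np.diag([1, 2, 3, 4])

print(c * c)        #elementwise product, NOT matrix multiplication
print("****************")
print(c.dot(c))     #matrix product (identical here only because c is diagonal)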

[[ 1  0  0  0]
 [ 0  4  0  0]
 [ 0  0  9  0]
 [ 0  0  0 16]]
****************
[[ 1  0  0  0]
 [ 0  4  0  0]
 [ 0  0  9  0]
 [ 0  0  0 16]]
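For a diagonal matrix such as `c`, the two printed results coincide, which can hide the difference between the operations. A sketch with a non-diagonal matrix (the name `m` is illustrative) makes it visible. The `@` operator is equivalent to `dot` for 2-D arrays.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])

elementwise = m * m      # each entry multiplied by itself
matrix_product = m @ m   # rows times columns, same as m.dot(m)

print(elementwise)
print(matrix_product)
```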


**Comparisons**

In [None]:
a = np.array([1, 2, 3, 4])
b = np.array([5, 2, 2, 4])

a == b

array([False,  True, False,  True])

In [None]:
#array-wise comparisons
a = np.array([1, 2, 3, 4])
b = np.array([5, 2, 2, 4])
c = np.array([1, 2, 3, 4])

np.array_equal(a, b)

False

In [None]:
np.array_equal(b, c)

False

In [None]:
np.array_equal(a, c)

True

**Logical Operations**

In [None]:
a = np.array([1, 1, 0, 0], dtype = bool)
b = np.array([1, 0, 1, 0], dtype = bool)

np.logical_or(a, b)

array([ True,  True,  True, False])

In [None]:
np.logical_and(a, b)

array([ True, False, False, False])

**Transcendental Functions**

In [None]:
a = np.arange(5)

np.sin(a)

array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ])

In [None]:
np.log(a)      #log(0) evaluates to -inf and NumPy emits a RuntimeWarning (divide by zero)


array([      -inf, 0.        , 0.69314718, 1.09861229, 1.38629436])

In [None]:
np.exp(a)

array([ 1.        ,  2.71828183,  7.3890561 , 20.08553692, 54.59815003])

**Shape Mismatch**

In [None]:
a = np.arange(4)

a + np.array([1, 2])

ValueError: operands could not be broadcast together with shapes (4,) (2,) 

#Basic Reductions

**Computing Sums**

In [None]:
x = np.array([1, 2, 3, 4])

np.sum(x)

np.int64(10)

In [None]:
#sum by rows and columns

x = np.array([[1, 1], [2, 2]])
x

array([[1, 1],
       [2, 2]])

In [None]:
#Column-wise sums
x.sum(axis = 0)

array([3, 3])

In [None]:
#Row-wise sums
x.sum(axis = 1)

array([2, 4])
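The same idea carries over to higher dimensions: summing along an axis removes that axis from the shape. A sketch on an illustrative 3-D array, including the `keepdims` option, which keeps the reduced axis with length 1:

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

sum_axis0 = x.sum(axis=0)                  # collapses the first axis
sum_axis1 = x.sum(axis=1)                  # collapses the middle axis
sum_last = x.sum(axis=-1)                  # collapses the last axis
sum_kept = x.sum(axis=1, keepdims=True)    # reduced axis kept with length 1

print(sum_axis0.shape, sum_axis1.shape, sum_last.shape, sum_kept.shape)
```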

**Other reductions**

In [None]:
x = np.array([1, 3, 2])

In [None]:
x.min()

np.int64(1)

In [None]:
x.max()

np.int64(3)

In [None]:
#index of min element
x.argmin()

np.int64(0)

In [None]:
#index of max element
x.argmax()

np.int64(1)

**Logical Operations**

In [None]:
np.all([True, True, False])

np.False_

In [None]:
np.any([True, False, False])

np.True_

In [None]:
#Note: This can be used for array comparison
a = np.zeros((50, 50))
np.any(a != 0)

np.False_

In [None]:
np.all(a == a)

np.True_

In [None]:
#can check all arrays using just one line instead of for loops
a = np.array([1, 2, 3, 4])
b = np.array([2, 4, 4, 6])
c = np.array([4, 8, 10, 12])

((a <= b) & (b <= c)).all()

np.True_

**Statistics**

In [None]:
x = np.array([1, 2, 3, 1])
y = np.array([[1, 2, 3], [5, 6, 1]])

In [None]:
np.mean(x)

np.float64(1.75)

In [None]:
np.median(x)

np.float64(1.5)

In [None]:
#axis = -1 is the last axis; for a 2-D array this gives the median of each row
np.median(y, axis = -1)

array([2., 5.])

In [None]:
x.std()

np.float64(0.82915619758885)
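By default, `std` divides by `n` (the population formula). Passing `ddof=1` divides by `n - 1` instead, which gives the sample standard deviation. The sketch below compares the two and checks the second against a manual calculation.

```python
import numpy as np

x = np.array([1, 2, 3, 1])

population_std = x.std()        # divides by n (ddof=0, the default)
sample_std = x.std(ddof=1)      # divides by n - 1

# Manual sample standard deviation for comparison
manual = np.sqrt(((x - x.mean()) ** 2).sum() / (len(x) - 1))

print(population_std, sample_std, manual)
```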

**Example**

Data in populations.txt describes the populations of hares and lynxes (and carrots) in Northern Canada over 20 years.

In [None]:
#load dataset into numpy array object
import numpy as np      #re-import in case the kernel was restarted
data = np.loadtxt('populations.txt')

In [None]:
data

In [None]:
years, hares, lynxes, carrots = data.T    #one row per column of the file

In [None]:
#Mean population over time
populations = data[:, 1:]
populations.mean(axis = 0)

In [None]:
#standard deviation of each species over time
populations.std(axis = 0)

In [None]:
#Which species has the highest population each year?

np.argmax(populations, axis = 1)
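If `populations.txt` is not at hand, the same steps can be tried on a small in-memory table. The numbers below are made up purely for illustration and are not the contents of the real file. The assumed layout (one row per year, columns for year, hares, lynxes, carrots) follows the unpacking used above.

```python
import io
import numpy as np

# Made-up numbers for illustration only; NOT the contents of populations.txt
text = """# year hare lynx carrot
2000 10 1 5
2001 20 2 9
2002 5 8 7
"""

data = np.loadtxt(io.StringIO(text))   # lines starting with '#' are skipped
years, hares, lynxes, carrots = data.T

populations = data[:, 1:]
means = populations.mean(axis=0)                 # mean of each species
top_species = np.argmax(populations, axis=1)     # 0 = hare, 1 = lynx, 2 = carrot
peak_hare_year = years[np.argmax(hares)]         # year with the most hares

print(means)
print(top_species)
print(peak_hare_year)
```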

#Broadcasting

Basic operations on numpy arrays (addition, etc.) are elementwise.

This works on arrays of the same size. Nevertheless, it's also possible to do operations on arrays of different sizes if NumPy can transform these arrays so that they all have the same size : this conversion is called broadcasting.

In [None]:
import numpy as np
a = np.tile(np.arange(0, 40, 10), (3, 1))
print(a)

print("********************")

a = a.T
print("A : ")
print(a)

[[ 0 10 20 30]
 [ 0 10 20 30]
 [ 0 10 20 30]]
********************
A : 
[[ 0  0  0]
 [10 10 10]
 [20 20 20]
 [30 30 30]]


In [None]:
b = np.array([0, 1, 2])
b

print("********************")
print("B : ")
print(b)

********************
B : 
[0 1 2]


In [None]:
#b is broadcast to be able to add with a
print("A + B : ")
a + b

A + B : 


array([[ 0,  1,  2],
       [10, 11, 12],
       [20, 21, 22],
       [30, 31, 32]])

In [None]:
a = np.arange(0, 40, 10)

print(f"Shape of A is {a.shape} and A is : ")
print(a)

Shape of A is (4,) and A is : 
[ 0 10 20 30]


In [None]:
a = a[:, np.newaxis]        #adds a new axis -> 2-D array

print(f"Shape of A is {a.shape} and A is : ")
print(a)

Shape of A is (4, 1) and A is : 
[[ 0]
 [10]
 [20]
 [30]]


In [None]:
print(f"A + B : ")
print(a + b)

A + B : 
[[ 0  1  2]
 [10 11 12]
 [20 21 22]
 [30 31 32]]


In [None]:
#However, if a is left as a 1-D array of shape (4,), it cannot be added to b of shape (3,).
a = np.arange(0, 40, 10)

print(f"Shape of A is {a.shape} and A is : ")
print(a)
print("********************")
print(f"Shape of B is {b.shape} and B is : ")
print(b)

a + b

Shape of A is (4,) and A is : 
[ 0 10 20 30]
********************
Shape of B is (3,) and B is : 
[0 1 2]


ValueError: operands could not be broadcast together with shapes (4,) (3,) 
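Broadcasting compares shapes starting from the last dimension: two dimensions are compatible when they are equal or when one of them is 1. The sketch below shows the failing case and the fix of giving `a` a trailing axis of length 1. The `failed` flag is only there to record the outcome.

```python
import numpy as np

a = np.arange(0, 40, 10)   # shape (4,)
b = np.array([0, 1, 2])    # shape (3,)

try:
    a + b                  # last dimensions 4 and 3 do not match
    failed = False
except ValueError:
    failed = True

# With shapes (4, 1) and (3,), the 1 is stretched to 3 and b is repeated over 4 rows
total = a[:, np.newaxis] + b

print(failed)
print(total.shape)
```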

#Array Shape Manipulation

**Flattening**

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
a.ravel()

array([1, 2, 3, 4, 5, 6])

Reshaping is the inverse operation to flattening :

In [None]:
print(a.shape)
print(a)

(2, 3)
[[1 2 3]
 [4 5 6]]


In [None]:
b = a.ravel()
print(b)

[1 2 3 4 5 6]


In [None]:
b = b.reshape((2, 3))
b

array([[1, 2, 3],
       [4, 5, 6]])

In [None]:
b[0, 0] = 100
b

array([[100,   2,   3],
       [  4,   5,   6]])

Here `ravel` and `reshape` both returned views, so b still points to the same memory as a.
Hence a changes too :

In [None]:
a

array([[100,   2,   3],
       [  4,   5,   6]])

**Note and beware : reshape may also return a copy!** This happens when the new shape cannot be laid over the existing memory, for example after a transpose :

In [None]:
a = np.zeros((3, 2))
a

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [None]:
b = a.T.reshape(3, 2)
b

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [None]:
b[0] = 50
b

array([[50., 50.],
       [ 0.,  0.],
       [ 0.,  0.]])

In [None]:
a

array([[0., 0.],
       [0., 0.],
       [0., 0.]])
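Rather than modifying an element to find out, `np.shares_memory` can tell directly whether a particular `reshape` returned a view or a copy. A minimal sketch (variable names are illustrative):

```python
import numpy as np

a = np.zeros((3, 2))

same_layout = a.reshape(2, 3)         # contiguous data: a view is possible
after_transpose = a.T.reshape(3, 2)   # transposed data: NumPy has to copy

print(np.shares_memory(a, same_layout))
print(np.shares_memory(a, after_transpose))
```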

**Adding a dimension**

Indexing with np.newaxis allows us to add an axis to an array.

Each use of newaxis increases the number of dimensions of the existing array by one. Thus,

1-D arrays become 2-D, 2-D become 3-D, and so on.

In [None]:
z = np.array([1, 2, 3])
z

array([1, 2, 3])

In [None]:
z[:, np.newaxis]


array([[1],
       [2],
       [3]])
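The position of `np.newaxis` decides where the new axis goes. The sketch below builds a column and a row from the same 1-D array and combines them through broadcasting into a multiplication table, which matches `np.outer`.

```python
import numpy as np

z = np.array([1, 2, 3])

column = z[:, np.newaxis]   # shape (3, 1)
row = z[np.newaxis, :]      # shape (1, 3)

# Broadcasting a column against a row gives the outer product
table = column * row

print(column.shape, row.shape)
print(table)
```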

**Dimension Shuffling**

In [None]:
a = np.arange(4*3*2).reshape(4, 3, 2)
a

array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 6,  7],
        [ 8,  9],
        [10, 11]],

       [[12, 13],
        [14, 15],
        [16, 17]],

       [[18, 19],
        [20, 21],
        [22, 23]]])
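The cell above only builds the array. A sketch of the shuffling step itself, assuming `transpose` is the intended operation: it reorders the axes and returns a view, so writing through the result changes the original.

```python
import numpy as np

a = np.arange(4 * 3 * 2).reshape(4, 3, 2)

b = a.transpose(1, 2, 0)   # new axis order: old axes 1, 2, 0
print(b.shape)             # (3, 2, 4)

# The element at a[i, j, k] is found at b[j, k, i]
print(a[0, 2, 1], b[2, 1, 0])

# transpose returns a view, so writing to b also changes a
b[2, 1, 0] = -1
print(a[0, 2, 1])
```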

**Resizing**

In [None]:
a = np.arange(4)
a.resize((8,))
a

array([0, 1, 2, 3, 0, 0, 0, 0])

However, an array cannot be resized in place while it is referenced somewhere else :

In [None]:
b = a

In [None]:
a.resize((4,))

ValueError: cannot resize an array that references or is referenced
by another array in this way.
Use the np.resize function or refcheck=False
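The error message points to the `np.resize` function as an alternative. It returns a new array instead of changing the existing one, so other references do not matter. Note one difference in behaviour: when enlarging, `np.resize` fills the extra space with repeated copies of the data, whereas the `resize` method shown earlier pads with zeros.

```python
import numpy as np

a = np.arange(4)
b = a                          # a second reference to the same array

bigger = np.resize(a, (8,))    # new array; a and b are left untouched

print(bigger)
print(a)
```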

**Sorting Data**

In [None]:
#Sorting along an axis
a = np.array([[5, 4, 6], [2, 3, 2]])
b = np.sort(a, axis = 1)
b

array([[4, 5, 6],
       [2, 2, 3]])

In [None]:
#in-place sort
a.sort(axis = 1)
a

array([[4, 5, 6],
       [2, 2, 3]])

In [None]:
#sorting with fancy indexing: argsort returns the indices that would sort the array
a = np.array([4, 3, 1, 2])
j = np.argsort(a)
j

array([2, 3, 1, 0])

In [None]:
a[j]

array([1, 2, 3, 4])
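The indices returned by `argsort` are most useful for reordering a second array in the same way. The sketch below ranks some names by made-up scores. Both arrays are illustrative.

```python
import numpy as np

names = np.array(["Ram", "Shyam", "Sita"])
scores = np.array([70, 90, 80])     # made-up scores

order = np.argsort(scores)          # indices that sort scores in ascending order
ranked_names = names[order]         # names from lowest to highest score
top_first = names[order[::-1]]      # reversed for descending order

print(order)
print(ranked_names)
print(top_first)
```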