
# The NumPy array object

## NumPy Arrays

**Python objects:**

1. high-level number objects: integers, floating point
2. containers: lists (cheap insertion and append), dictionaries (fast lookup)

**NumPy provides:**

1. an extension package to Python for multi-dimensional arrays
2. closer to hardware (efficiency)
3. designed for scientific computation (convenience)
4. also known as array-oriented computing

In [2]:
import numpy as np
a=np.array([0,1,2,3])
print(a)

print(np.arange(10))

[0 1 2 3]
[0 1 2 3 4 5 6 7 8 9]


**Why it is useful:** Memory-efficient container that provides fast numerical operations.

In [3]:
#python list
L=range(1000)
%timeit([i**2 for i in L])

1000 loops, best of 5: 263 µs per loop


In [4]:
a=np.arange(1000)
%timeit a**2

The slowest run took 28.79 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 1.48 µs per loop
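
The timings above cover the speed half of the claim. The memory half can be checked with a rough sketch like the one below. The list estimate is only approximate (small integers are shared objects in CPython), and the `dtype` is fixed to `int64` here as an assumption so that the item size is known.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)  # dtype fixed so every item takes 8 bytes

# A list stores pointers to separate Python int objects,
# so count the list itself plus each int it refers to (rough estimate).
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(i) for i in py_list)

# An array stores the raw values in one contiguous buffer.
array_bytes = arr.nbytes  # n * arr.itemsize

print("list  :", list_bytes, "bytes (approximate)")
print("array :", array_bytes, "bytes")
```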


## Creating arrays

### Manual Construction of arrays

In [6]:
#1-D
a=np.array([0,1,2,3])
a

array([0, 1, 2, 3])

In [10]:
a.ndim

1

In [11]:
a.shape

(4,)

In [7]:
#2-D 3-D ...
b=np.array([[0,1,2],[4,5,6]])
b

array([[0, 1, 2],
       [4, 5, 6]])

In [8]:
b.ndim

2

In [9]:
b.shape

(2, 3)

In [12]:
len(b) #returns the size of the first dimension

2

In [14]:
c=np.array([[[0,1],[2,3]],[[4,5],[6,7]]])
c

array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])

In [16]:
c.ndim


3

In [17]:
c.shape

(2, 2, 2)

In [19]:
len(c)

2

### Functions for creating arrays

In [22]:
#using the arange function
a=np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [25]:
b=np.arange(1,10,2)
b

array([1, 3, 5, 7, 9])

In [27]:
#using linspace
c=np.linspace(0,1,6) #start, stop, no. of points
c

array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])

In [29]:
#common arrays

a = np.ones((3, 3))

a

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [30]:
b = np.zeros((3, 3))

b

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [31]:
c = np.eye(3)  #Return a 2-D array with ones on the diagonal and zeros elsewhere.

c

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [32]:
d = np.eye(3, 2) #3 rows, 2 columns; the main diagonal (index k=0) is filled by default

d

array([[1., 0.],
       [0., 1.],
       [0., 0.]])

In [33]:
#create array using diag function

a = np.diag([1, 2, 3, 4]) #construct a diagonal array.

a

array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])

In [34]:
np.diag(a)   #Extract diagonal

array([1, 2, 3, 4])

In [35]:
#create array using random

#Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1).
a = np.random.rand(4) 

a

array([0.47252396, 0.01581177, 0.48339496, 0.17866282])

In [36]:
a = np.random.randn(4) #Return a sample (or samples) from the "standard normal" (Gaussian) distribution

a

array([ 0.35646416, -0.084599  ,  1.33488943, -0.08891673])

**Note:**
    
For random samples from $N(\mu, \sigma^2)$, use:

sigma * np.random.randn(...) + mu
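
A minimal sketch of the formula in the note. The values of `mu` and `sigma` and the fixed seed are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

np.random.seed(0)          # fixed seed so the run is repeatable
mu, sigma = 5.0, 2.0       # illustrative mean and standard deviation

# Scale and shift standard-normal samples to get N(mu, sigma^2)
samples = sigma * np.random.randn(100000) + mu

# With this many samples the estimates should land close to mu and sigma
print(samples.mean())
print(samples.std())
```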



## Basic DataTypes

You may have noticed that, in some instances, array elements are displayed with a **trailing dot (e.g. 2. vs 2)**. This is due to a difference in the **data-type** used:

In [38]:
a = np.arange(10)

a.dtype

dtype('int64')

In [39]:
#You can explicitly specify which data-type you want:

a = np.arange(10, dtype='float64')
a

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [40]:
#The default data type is float for the zeros and ones functions

a = np.zeros((3, 3))

print(a)

a.dtype

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


dtype('float64')

### other datatypes

In [41]:
d = np.array([1+2j, 2+4j])   #Complex datatype

print(d.dtype)

complex128


In [42]:
b = np.array([True, False, True, False])  #Boolean datatype

print(b.dtype)

bool


In [44]:
s = np.array(['Ram', 'Robert', 'Rahim'])

s.dtype

dtype('<U6')

**Each built-in data type has a character code that uniquely identifies it.**

'b' − boolean

'i' − (signed) integer

'u' − unsigned integer

'f' − floating-point

'c' − complex-floating point

'm' − timedelta

'M' − datetime

'O' − (Python) objects

'S', 'a' − (byte-)string

'U' − Unicode

'V' − raw data (void)
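
The character codes listed above can be read back from any array through `dtype.kind`. The sample arrays below are made up for illustration.

```python
import numpy as np

# One small example array per kind of data
examples = {
    "boolean": np.array([True, False]),
    "signed integer": np.array([1, 2, 3]),
    "unsigned integer": np.array([1, 2, 3], dtype=np.uint8),
    "floating-point": np.array([1.0, 2.0]),
    "complex": np.array([1 + 2j]),
    "unicode": np.array(["Ram", "Robert"]),
    "byte-string": np.array([b"Ram"]),
}

# dtype.kind holds the one-character code of the data type
kinds = {name: arr.dtype.kind for name, arr in examples.items()}
for name, kind in kinds.items():
    print(name, "->", kind)

# A code followed by a byte count also names a concrete type, e.g. 'f8'
same_type = np.dtype("f8") == np.dtype("float64")
```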

**For more details**

**https://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html**

## Indexing and Slicing

### Indexing

The items of an array can be accessed and assigned to the same way as other **Python sequences (e.g. lists)**:

In [45]:
a = np.arange(10)

print(a[5])  #indices begin at 0, like other Python sequences (and C/C++)

5


In [46]:
# For multidimensional arrays, indexes are tuples of integers:

a = np.diag([1, 2, 3])

print(a[2, 2])

3


In [47]:
a[2, 1] = 5 #assigning value

a

array([[1, 0, 0],
       [0, 2, 0],
       [0, 5, 3]])

### Slicing

In [48]:
a = np.arange(10)

a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [49]:
a[1:8:2] # [startindex: endindex(exclusive) : step]

array([1, 3, 5, 7])

In [50]:
#we can also combine assignment and slicing:

a = np.arange(10)
a[5:] = 10
a

array([ 0,  1,  2,  3,  4, 10, 10, 10, 10, 10])

In [51]:
b = np.arange(5)
a[5:] = b[::-1]  #assigning

a

array([0, 1, 2, 3, 4, 4, 3, 2, 1, 0])

## Copies and Views

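A slicing operation creates a view on the original array, which is just a way of accessing array data. Thus the original array is not copied in memory. You can use **np.may_share_memory()** (a fast check that may report false positives) or **np.shares_memory()** (an exact check, used in the cells that follow) to test whether two arrays share the same memory block. 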

**When modifying the view, the original array is modified as well:**

In [52]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [53]:
b = a[::2]
b

array([0, 2, 4, 6, 8])

In [54]:
np.shares_memory(a, b)

True

In [55]:
b[0] = 10
b

array([10,  2,  4,  6,  8])

In [56]:
a  #even though we modified b, 'a' changed too because both share the same memory

array([10,  1,  2,  3,  4,  5,  6,  7,  8,  9])

In [57]:


a = np.arange(10)

c = a[::2].copy()     #force a copy
c

array([0, 2, 4, 6, 8])

In [58]:
np.shares_memory(a, c)

False

In [59]:
c[0] = 10

a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

## Fancy Indexing

NumPy arrays can be indexed with slices, but also with boolean arrays **(masks)** or integer arrays. This method is called **fancy indexing**. It creates copies, not views.
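
The copy-versus-view distinction can be checked directly. This is a small sketch using `np.shares_memory`; the variable names are illustrative.

```python
import numpy as np

a = np.arange(10)

sliced = a[2:5]           # basic slice: a view
fancy = a[[2, 3, 4]]      # integer-array index: a copy
masked = a[a % 2 == 0]    # boolean mask: a copy

print(np.shares_memory(a, sliced))
print(np.shares_memory(a, fancy))
print(np.shares_memory(a, masked))

# Writing into the fancy-indexed result leaves the original untouched
fancy[0] = 99
print(a[2])
```

Assigning through a fancy index on the left-hand side (for example `a[mask] = -1`) still modifies `a` in place; only the extracted result is a copy.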

### Using Boolean Mask

In [60]:
a = np.random.randint(0, 20, 15)
a

array([ 1,  0, 10,  1, 12,  6, 15, 19,  5,  5, 17, 16,  1, 16,  3])

In [61]:
mask = (a % 2 == 0)

In [62]:
extract_from_a = a[mask]

extract_from_a

array([ 0, 10, 12,  6, 16, 16])

**Indexing with a mask can be very useful to assign a new value to a sub-array:**

In [63]:
a[mask] = -1
a

array([ 1, -1, -1,  1, -1, -1, 15, 19,  5,  5, 17, -1,  1, -1,  3])

### Indexing with an array of integers

In [64]:
a = np.arange(0, 100, 10)

a

array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [65]:
#Indexing can be done with an array of integers, where the same index is repeated several times:

a[[2, 3, 2, 4, 2]]

array([20, 30, 20, 40, 20])

In [66]:
# New values can be assigned 

a[[9, 7]] = -200

a

array([   0,   10,   20,   30,   40,   50,   60, -200,   80, -200])

# Elementwise Operations

## Basic Operations

### with scalars

In [None]:
a = np.array([1, 2, 3, 4]) #create an array

a + 1

array([2, 3, 4, 5])

In [None]:
a ** 2

array([ 1,  4,  9, 16])

 **All arithmetic operates elementwise**

In [None]:
b = np.ones(4) + 1

a - b

array([-1.,  0.,  1.,  2.])

In [None]:
a * b

array([ 2.,  4.,  6.,  8.])

In [None]:
# Elementwise product (c * c) versus matrix product (c.dot(c)); for a diagonal matrix the two results happen to coincide

c = np.diag([1, 2, 3, 4])

print(c * c)
print("*****************")
print(c.dot(c))

[[ 1  0  0  0]
 [ 0  4  0  0]
 [ 0  0  9  0]
 [ 0  0  0 16]]
*****************
[[ 1  0  0  0]
 [ 0  4  0  0]
 [ 0  0  9  0]
 [ 0  0  0 16]]
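
For a diagonal matrix the elementwise product and the matrix product give the same result, which can hide the difference between `*` and `dot`. A non-diagonal matrix (chosen here purely for illustration) makes the difference visible.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])

elementwise = m * m          # multiplies matching entries
matrix_product = m.dot(m)    # rows times columns

print(elementwise)
print(matrix_product)

# The @ operator is another spelling of the matrix product
same_as_dot = np.array_equal(m @ m, matrix_product)
```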


### Comparisons

In [None]:
a = np.array([1, 2, 3, 4])
b = np.array([5, 2, 2, 4])
a == b

array([False,  True, False,  True], dtype=bool)

In [None]:
a > b

array([False, False,  True, False], dtype=bool)

In [None]:
#array-wise comparisons
a = np.array([1, 2, 3, 4])
b = np.array([5, 2, 2, 4])
c = np.array([1, 2, 3, 4])

np.array_equal(a, b)

False

In [None]:
np.array_equal(a, c)

True

### Logical Operations

In [None]:
a = np.array([1, 1, 0, 0], dtype=bool)
b = np.array([1, 0, 1, 0], dtype=bool)

np.logical_or(a, b)

array([ True,  True,  True, False], dtype=bool)

In [None]:
np.logical_and(a, b)

array([ True, False, False, False], dtype=bool)

### Transcendental functions:

In [None]:
a = np.arange(5)

np.sin(a)   

array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ])

In [None]:
np.log(a)   #log(0) evaluates to -inf, and NumPy emits a RuntimeWarning (divide by zero)

array([       -inf,  0.        ,  0.69314718,  1.09861229,  1.38629436])

In [None]:
np.exp(a)   #evaluates e^x for each element in a given input

array([  1.        ,   2.71828183,   7.3890561 ,  20.08553692,  54.59815003])

### Shape Mismatch

In [None]:
a = np.arange(4)

a + np.array([1, 2])

ValueError: operands could not be broadcast together with shapes (4,) (2,) 

## Basic Reductions

### computing sums

In [None]:
x = np.array([1, 2, 3, 4])
np.sum(x)

10

In [None]:
#sum by rows and by columns

x = np.array([[1, 1], [2, 2]])
x

array([[1, 1],
       [2, 2]])

In [None]:
x.sum(axis=0)   #sum along the first dimension (down the rows): one total per column

array([3, 3])

In [None]:
x.sum(axis=1)  #sum along the second dimension (across the columns): one total per row

array([2, 4])
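
With a square array it is easy to lose track of which axis was reduced. A non-square example (made up for illustration) shows that the reduced axis is the one that disappears from the shape.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

col_totals = x.sum(axis=0)       # axis 0 (length 2) is removed -> shape (3,)
row_totals = x.sum(axis=1)       # axis 1 (length 3) is removed -> shape (2,)

# keepdims=True keeps the reduced axis with length 1
row_totals_2d = x.sum(axis=1, keepdims=True)

print(col_totals, col_totals.shape)
print(row_totals, row_totals.shape)
print(row_totals_2d.shape)
```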

### Other reductions

In [None]:
x = np.array([1, 3, 2])
x.min()

1

In [None]:
x.max()

3

In [None]:
x.argmin()# index of minimum element

0

In [None]:
x.argmax()# index of maximum element

1

### Logical Operations

In [None]:
np.all([True, True, False])

False

In [None]:
np.any([True, False, False])

True

In [None]:
#Note: can be used for array comparisons
a = np.zeros((50, 50))
np.any(a != 0)

False

In [None]:
np.all(a == a)

True

In [None]:
a = np.array([1, 2, 3, 2])
b = np.array([2, 2, 3, 2])
c = np.array([6, 4, 4, 5])
((a <= b) & (b <= c)).all()

True

### Statistics

In [None]:
x = np.array([1, 2, 3, 1])
y = np.array([[1, 2, 3], [5, 6, 1]])
x.mean()

1.75

In [None]:
np.median(x)

1.5

In [None]:
np.median(y, axis=-1) # last axis

array([ 2.,  5.])

In [None]:
x.std()          # full population standard dev.

0.82915619758884995
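
By default `std` divides by the number of elements (the population formula). The `ddof` argument switches to the sample estimate, which divides by `n - 1`. A short sketch on the same `x`:

```python
import numpy as np

x = np.array([1, 2, 3, 1])

population_std = x.std()         # divides by n (ddof=0, the default)
sample_std = x.std(ddof=1)       # divides by n - 1

print(population_std)
print(sample_std)
```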

### Example:

Data in populations.txt describes the populations of hares and lynxes (and carrots) in northern Canada over 21 years (1900 to 1920).


In [None]:
#load data into numpy array object
data = np.loadtxt('populations.txt')

In [None]:
data

array([[  1900.,  30000.,   4000.,  48300.],
       [  1901.,  47200.,   6100.,  48200.],
       [  1902.,  70200.,   9800.,  41500.],
       [  1903.,  77400.,  35200.,  38200.],
       [  1904.,  36300.,  59400.,  40600.],
       [  1905.,  20600.,  41700.,  39800.],
       [  1906.,  18100.,  19000.,  38600.],
       [  1907.,  21400.,  13000.,  42300.],
       [  1908.,  22000.,   8300.,  44500.],
       [  1909.,  25400.,   9100.,  42100.],
       [  1910.,  27100.,   7400.,  46000.],
       [  1911.,  40300.,   8000.,  46800.],
       [  1912.,  57000.,  12300.,  43800.],
       [  1913.,  76600.,  19500.,  40900.],
       [  1914.,  52300.,  45700.,  39400.],
       [  1915.,  19500.,  51100.,  39000.],
       [  1916.,  11200.,  29700.,  36700.],
       [  1917.,   7600.,  15800.,  41800.],
       [  1918.,  14600.,   9700.,  43300.],
       [  1919.,  16200.,  10100.,  41300.],
       [  1920.,  24700.,   8600.,  47300.]])

In [None]:
year, hares, lynxes, carrots = data.T #columns to variables
print(year)

[ 1900.  1901.  1902.  1903.  1904.  1905.  1906.  1907.  1908.  1909.
  1910.  1911.  1912.  1913.  1914.  1915.  1916.  1917.  1918.  1919.
  1920.]


In [None]:
#keep only the population columns (drop the year column)
populations = data[:, 1:]
populations

array([[ 30000.,   4000.,  48300.],
       [ 47200.,   6100.,  48200.],
       [ 70200.,   9800.,  41500.],
       [ 77400.,  35200.,  38200.],
       [ 36300.,  59400.,  40600.],
       [ 20600.,  41700.,  39800.],
       [ 18100.,  19000.,  38600.],
       [ 21400.,  13000.,  42300.],
       [ 22000.,   8300.,  44500.],
       [ 25400.,   9100.,  42100.],
       [ 27100.,   7400.,  46000.],
       [ 40300.,   8000.,  46800.],
       [ 57000.,  12300.,  43800.],
       [ 76600.,  19500.,  40900.],
       [ 52300.,  45700.,  39400.],
       [ 19500.,  51100.,  39000.],
       [ 11200.,  29700.,  36700.],
       [  7600.,  15800.,  41800.],
       [ 14600.,   9700.,  43300.],
       [ 16200.,  10100.,  41300.],
       [ 24700.,   8600.,  47300.]])

In [None]:
#standard deviation of each column (population formula, ddof=0, the NumPy default)
populations.std(axis=0)

array([ 20897.90645809,  16254.59153691,   3322.50622558])

In [None]:
#which species has the highest population each year?

np.argmax(populations, axis=1)

array([2, 2, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 0, 0, 0, 1, 2, 2, 2, 2, 2])
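
The same reductions can be tried without the data file. The sketch below hard-codes only the first three rows of the table printed above, so its numbers describe 1900 to 1902 rather than the full series.

```python
import numpy as np

# First three rows of the table: year, hares, lynxes, carrots
sample = np.array([
    [1900., 30000., 4000., 48300.],
    [1901., 47200., 6100., 48200.],
    [1902., 70200., 9800., 41500.],
])

populations = sample[:, 1:]                  # drop the year column

mean_per_species = populations.mean(axis=0)  # average over the years
dominant = np.argmax(populations, axis=1)    # column index of the largest value per year

print(mean_per_species)
print(dominant)   # 0 = hares, 1 = lynxes, 2 = carrots
```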

## Broadcasting

Basic operations on NumPy arrays (addition, etc.) are elementwise.

This works on arrays of the same size. Nevertheless, it is also possible to do operations on arrays of different sizes if NumPy can transform these arrays so that they all have the same size: this conversion is called broadcasting.

The image below gives an example of broadcasting:

![title](broadcasting.png)

In [None]:
a = np.tile(np.arange(0, 40, 10), (3,1))
print(a)

print("*************")
a=a.T
print(a)

[[ 0 10 20 30]
 [ 0 10 20 30]
 [ 0 10 20 30]]
*************
[[ 0  0  0]
 [10 10 10]
 [20 20 20]
 [30 30 30]]


In [None]:

b = np.array([0, 1, 2])
b

array([0, 1, 2])

In [None]:

a + b

array([[ 0,  1,  2],
       [10, 11, 12],
       [20, 21, 22],
       [30, 31, 32]])

In [None]:
a = np.arange(0, 40, 10)
a.shape


(4,)

In [None]:
a = a[:, np.newaxis]  # adds a new axis -> 2D array
a.shape

(4, 1)

In [None]:
a

array([[ 0],
       [10],
       [20],
       [30]])

In [None]:
a + b

array([[ 0,  1,  2],
       [10, 11, 12],
       [20, 21, 22],
       [30, 31, 32]])
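
The result shape of a broadcast can be inspected before doing any arithmetic. This is a small sketch that uses `np.broadcast` and also reproduces the mismatch case from the Shape Mismatch section; the variable names are illustrative.

```python
import numpy as np

col = np.arange(0, 40, 10)[:, np.newaxis]   # shape (4, 1)
row = np.array([0, 1, 2])                   # shape (3,)

# np.broadcast reports the shape the operands would be stretched to
result_shape = np.broadcast(col, row).shape
total = col + row

print(result_shape)
print(total)

# Shapes (4,) and (2,) cannot be stretched to a common shape
try:
    np.arange(4) + np.array([1, 2])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```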

## Array Shape Manipulation

### Flattening

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
a.ravel() #Return a contiguous flattened array. A 1-D array, containing the elements of the input, is returned. A copy is made only if needed.

array([1, 2, 3, 4, 5, 6])

In [None]:
a.T #Transpose

array([[1, 4],
       [2, 5],
       [3, 6]])

In [None]:
a.T.ravel()

array([1, 4, 2, 5, 3, 6])

### Reshaping

The inverse operation to flattening:

In [None]:
print(a.shape)
print(a)

(2, 3)
[[1 2 3]
 [4 5 6]]


In [None]:
b = a.ravel()
print(b)

[1 2 3 4 5 6]


In [None]:
b = b.reshape((2, 3))
b

array([[1, 2, 3],
       [4, 5, 6]])

In [None]:
b[0, 0] = 100
a

array([[100,   2,   3],
       [  4,   5,   6]])

**Beware: reshape may also return a copy!**

In [None]:
a = np.zeros((3, 2))
b = a.T.reshape(3*2)
b[0] = 50
a

array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])
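
Whether a particular `reshape` returned a view or a copy can be checked with `np.shares_memory`. A minimal sketch comparing the two cases from this section:

```python
import numpy as np

a = np.zeros((3, 2))

flat_view = a.reshape(6)       # contiguous data: reshape can return a view
flat_copy = a.T.reshape(6)     # transposed data is not contiguous: a copy is made

print(np.shares_memory(a, flat_view))
print(np.shares_memory(a, flat_copy))

flat_view[0] = 50              # visible through a
flat_copy[1] = 99              # not visible through a
print(a)
```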

### Adding a Dimension

Indexing with the np.newaxis object allows us to add an axis to an array.

Each use of newaxis increases the number of dimensions of the array by one. Thus:

1D array will become 2D array

2D array will become 3D array

3D array will become 4D array and so on

In [None]:
z = np.array([1, 2, 3])
z

array([1, 2, 3])

In [None]:
z[:, np.newaxis]

array([[1],
       [2],
       [3]])
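
The position of `np.newaxis` decides where the new axis goes. The sketch below builds a column and a row from the same 1-D array and combines them by broadcasting; the multiplication table is only an illustration.

```python
import numpy as np

z = np.array([1, 2, 3])

column = z[:, np.newaxis]   # shape (3, 1)
row = z[np.newaxis, :]      # shape (1, 3)

# Broadcasting a column against a row yields a full 3x3 table
table = column * row

print(column.shape, row.shape)
print(table)
```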

### Dimension Shuffling

In [None]:
a = np.arange(4*3*2).reshape(4, 3, 2)
a.shape

(4, 3, 2)

In [None]:
a

array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 6,  7],
        [ 8,  9],
        [10, 11]],

       [[12, 13],
        [14, 15],
        [16, 17]],

       [[18, 19],
        [20, 21],
        [22, 23]]])

In [None]:
a[0, 2, 1]

5
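
The cells above only index the array. One way to actually shuffle its dimensions is `transpose` with an explicit axis order. This is a sketch, and the order `(1, 2, 0)` is an illustrative choice.

```python
import numpy as np

a = np.arange(4 * 3 * 2).reshape(4, 3, 2)

# New axis 0 is old axis 1, new axis 1 is old axis 2, new axis 2 is old axis 0
b = a.transpose(1, 2, 0)

print(b.shape)
print(b[2, 1, 0])   # same element as a[0, 2, 1]

# transpose returns a view, so no data is copied
print(np.shares_memory(a, b))
```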

### Resizing

In [None]:
a = np.arange(4)
a.resize((8,))
a

array([0, 1, 2, 3, 0, 0, 0, 0])

However, the array must not be referenced anywhere else:

In [None]:
b = a
a.resize((4,)) 

ValueError: cannot resize an array that references or is referenced
by another array in this way.  Use the resize function
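
The error message points to the `np.resize` function. It returns a new array instead of changing the original in place, so other references do not matter. Note that it fills extra space with repeated copies of the data, whereas the `resize` method pads with zeros. A minimal sketch:

```python
import numpy as np

a = np.arange(4)
b = a                        # a second reference to the same array

grown = np.resize(a, (8,))   # new array; the data is repeated to fill it
shrunk = np.resize(a, (2,))  # new array; the data is truncated

print(grown)
print(shrunk)
print(a)                     # the original is unchanged
```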

### Sorting Data

In [None]:
#Sorting along an axis:
a = np.array([[5, 4, 6], [2, 3, 2]])
b = np.sort(a, axis=1)
b

array([[4, 5, 6],
       [2, 2, 3]])

In [None]:
#in-place sort
a.sort(axis=1)
a

array([[4, 5, 6],
       [2, 2, 3]])

In [None]:
#sorting with fancy indexing
a = np.array([4, 3, 1, 2])
j = np.argsort(a)
j

array([2, 3, 1, 0])

In [None]:
a[j]

array([1, 2, 3, 4])
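
The index array from `argsort` can reorder any array of the same length, which is handy for sorting one array by the values of another. The `labels` array below is a made-up companion to the keys.

```python
import numpy as np

keys = np.array([4, 3, 1, 2])
labels = np.array(["four", "three", "one", "two"])

order = np.argsort(keys)          # indices that would sort keys ascending

sorted_labels = labels[order]     # reorder the labels by their keys
descending = keys[order[::-1]]    # reverse the order for a descending sort

print(sorted_labels)
print(descending)
```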