# NumPy

NumPy is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays. Python was not originally designed for numeric computation. As people started using Python for a wider range of tasks, the need for fast numeric computation arose, and NumPy was created in 2005 to address it. Today, much of the Python machine learning ecosystem is built on top of fast numeric libraries such as NumPy.

The following are the main reasons NumPy is fast:
* A NumPy array is a collection of elements of a single data type, densely packed in memory. A Python list can hold elements of different types, so every operation on it has to check the type of each element.
* NumPy operates on whole arrays at once (vectorization). The loop over the elements runs in compiled code, which can use the CPU's SIMD instructions and, for some linear algebra routines, multithreaded libraries.
* NumPy functions are implemented in C, which makes them much faster than equivalent loops over Python lists.

#  import numpy (never forget this)

In [2]:
import numpy as np

# to check which version we are using
np.version.version

In [4]:
 np.version.version

'1.20.1'

# to create an array in numpy

In [7]:
np.array([1,2,3,4,5])

array([1, 2, 3, 4, 5])

In [5]:
# type np. and then press tab to see all components
np

<module 'numpy' from 'C:\\Users\\jxie\\Anaconda3\\lib\\site-packages\\numpy\\__init__.py'>

In [6]:
# to view numpy documentation
np?

## A Python Integer Is More Than Just an Integer

The standard Python implementation is written in C. This means that every Python
object is simply a cleverly disguised C structure, which contains not only its value, but
other information as well. For example, when we define an integer in Python, such as
x = 10000, x is not just a “raw” integer. It’s actually a pointer to a compound C structure,
which contains several values. Looking through the Python 3.4 source code, we
find that the integer (long) type definition effectively looks like this (once the C macros
are expanded):
```C
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
```
A single integer in Python 3.4 actually contains four pieces:
* ob_refcnt, a reference count that helps Python silently handle memory allocation and deallocation
* ob_type, which encodes the type of the variable
* ob_size, which specifies the size of the following data members
* ob_digit, which contains the actual integer value that we expect the Python variable to represent
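
To make this overhead concrete, the following minimal sketch compares the size of one Python integer object with the size of one element in a NumPy array. It assumes CPython, where `sys.getsizeof` reports the size of the whole object; the exact number depends on the Python version and platform, so no specific output is shown. The variable names are only illustrative.

```python
import sys
import numpy as np

x = 10000

# Size in bytes of the full Python int object (header fields plus digits).
py_int_size = sys.getsizeof(x)

# Size in bytes of one element stored inside a NumPy int64 array.
np_item_size = np.dtype(np.int64).itemsize

print("Python int object:", py_int_size, "bytes")
print("NumPy int64 element:", np_item_size, "bytes")
```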


# difference between a python list and numpy array

In [7]:
# a list in python
l=list(range(10))
l

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [8]:
type(l)

list

In [9]:
type(l[0])

int

In [12]:
# a list of strings
l1=[str(c) for c in l]
l1

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [13]:
type(l1)

list

In [14]:
type(l1[0])

str

In [15]:
l2=[2,1,'1',bool,True]
p2=[type(c) for c in l2]

In [16]:
p2

[int, int, str, type, bool]

But this flexibility comes at a cost: to allow these flexible types, each item in the list
must contain its own type info, reference count, and other information—that is, each
item is a complete Python object. In the special case that all variables are of the same
type, much of this information is redundant: it can be much more efficient to store
data in a fixed-type array. The difference between a dynamic-type list and a fixed-type
(NumPy-style) array is illustrated in Figure 2-2.
At the implementation level, the array essentially contains a single pointer to one contiguous
block of data. The Python list, on the other hand, contains a pointer to a
block of pointers, each of which in turn points to a full Python object like the Python
integer we saw earlier. Again, the advantage of the list is flexibility: because each list
element is a full structure containing both data and type information, the list can be
filled with data of any desired type. Fixed-type NumPy-style arrays lack this flexibility,
but are much more efficient for storing and manipulating data.
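
The difference in layout can be estimated from Python itself. The sketch below is a rough illustration, not an exact measurement: it adds the size of the list's pointer block to the size of every integer object it points to, and compares that with the raw data buffer of an equivalent array. CPython shares small integer objects between references, so the list figure is an upper bound. The names `py_list` and `np_arr` are chosen for this example.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# The list stores n pointers; each int it points to is a separate object.
list_total = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# The array stores n raw 8-byte integers in one contiguous buffer.
arr_data = np_arr.nbytes

print("list (pointers + int objects):", list_total, "bytes")
print("array data buffer:", arr_data, "bytes")
print("contiguous:", np_arr.flags["C_CONTIGUOUS"])
```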

![](images/22.PNG)

# python built-in array module for fixed-type, efficient data

In [18]:
import array

In [19]:
l=list(range(10))

In [23]:
Array=array.array('i',l)
print(Array)
# the type code 'i' means the array holds integers

Array[3]

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


3

# numpy arrays

In [15]:
import numpy as np

In [16]:
# using a list to create an array
np.array([1,2,3,4,5])

array([1, 2, 3, 4, 5])

In [17]:
# all elements of a numpy array must have the same type;
# if they do not, values are upcast where possible

np.array([1.0,2,3,4,5])
# every integer is converted to floating point

array([1., 2., 3., 4., 5.])

In [18]:
np.array([1,2,3,4,'5'])

array(['1', '2', '3', '4', '5'], dtype='<U11')

In [19]:
# to set the data type of an array explicitly, use dtype
np.array([1,2,3,4,5],dtype='float32')

array([1., 2., 3., 4., 5.], dtype=float32)

In [20]:
# creating a multidimensional array in numpy
x=np.array([range(3)])
print(x)
# keep in mind that range() is half-open
np.array([range(i,i+3) for i in [2,4,6,8]])

[[0 1 2]]


array([[ 2,  3,  4],
       [ 4,  5,  6],
       [ 6,  7,  8],
       [ 8,  9, 10]])

#  create a numpy array from scratch

In [21]:
# create an array of a given size filled with zeros, np.zeros()
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [22]:
# mention data type of array explicitly
np.zeros(10,dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [23]:
# create a 3x5 array filled with ones, np.ones()
np.ones((3,5),dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [24]:
# a 3x5 matrix filled with zeros
np.zeros((3,5),dtype=int)

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

In [25]:
# create a np array filled with 3.14,np.full()
np.full(10,3.14)

array([3.14, 3.14, 3.14, 3.14, 3.14, 3.14, 3.14, 3.14, 3.14, 3.14])

In [26]:
np.full((3,5),3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [27]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
np.arange(0,20,2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [28]:
# create an array of 5 evenly spaced values from 0 to 1 (both endpoints included)
np.linspace(0,1,5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [29]:
# create a 3x5 array filled with random values between 0 and 1
np.random.random((3,5))

array([[0.30331556, 0.20156116, 0.63957366, 0.38560001, 0.25581328],
       [0.56392788, 0.60119951, 0.05535524, 0.86522106, 0.47119562],
       [0.94584845, 0.6627685 , 0.88538239, 0.04151066, 0.48889614]])

In [30]:
# create a 3x3 array of normally distributed values with mean 0 and standard deviation 1
np.random.normal(0,1,(3,3))

array([[ 0.5585787 ,  0.07284954, -0.93512739],
       [ 0.25144169, -0.88039133, -1.81741983],
       [-0.25638069, -0.94972938,  0.14843666]])

In [31]:
# mean 2 and standard deviation 2
np.random.normal(2,2,(3,3))

array([[ 5.58414565,  3.47299152,  0.17848189],
       [ 0.31655025,  0.57868655, -2.95519041],
       [ 4.38075817,  3.87099664,  2.37502837]])

In [32]:
# create a 3x3 array filled with random integers from 0 to 9 (the upper bound 10 is excluded)
np.random.randint(0,10,(3,3))

array([[5, 7, 4],
       [6, 2, 7],
       [7, 0, 4]])

In [33]:
# create a 3x3 identity matrix
np.eye(3,dtype=int)

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

In [34]:
# create an uninitialized array of the given shape; its values are whatever happens to be in memory already
np.empty((3,3))

array([[5.58414565, 3.47299152, 0.17848189],
       [0.31655025, 0.57868655, 2.95519041],
       [4.38075817, 3.87099664, 2.37502837]])

In [37]:
np.empty(5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

#  numpy standard data types

In [54]:
np.zeros(10,dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [55]:
np.zeros(10,dtype=float)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [56]:
np.zeros(10,dtype='int16')

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)

In [57]:
np.zeros(10,dtype=np.int16)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)

#  basics of numpy array

This section
will present several examples using NumPy array manipulation to access data
and subarrays, and to split, reshape, and join the arrays.

## array attributes

In [44]:
np.random.seed(0)
x1=np.random.randint(10,size=6)
x1

array([5, 0, 3, 3, 7, 9])

In [40]:
# every np array has the attributes ndim, shape and size: number of dimensions, size of each dimension, and total number of elements
x1.ndim

1

In [41]:
x1.shape

(6,)

In [42]:
x1.size

6

In [45]:
x2=np.random.randint(10,size=(2,3))
x2

array([[3, 5, 2],
       [4, 7, 6]])

In [46]:
x2.ndim

2

In [47]:
x2.shape

(2, 3)

In [48]:
x2.size

6

In [49]:
# we also have dtype to see data type of array
x2.dtype

dtype('int32')

In [50]:
# accessing array elements
x1

array([5, 0, 3, 3, 7, 9])

In [51]:
x1[0]

5

In [52]:
# to index from the end use -1 and so on
x1[-1]

9

In [53]:
x1[-2]

7

In [54]:
x2

array([[3, 5, 2],
       [4, 7, 6]])

In [55]:
# to access elements in a multidim array
print(x2[0,0])
print(x2[(0,0)])
print(x2[0][0])

3
3
3


In [56]:
print(x2[0,1])

5


In [57]:
# modify values using the above notation
x2[0,1]=10

In [58]:
x2

array([[ 3, 10,  2],
       [ 4,  7,  6]])

In [111]:
# if we assign a float value it will be silently truncated because x2 is an int array
x2[0,0]=99.15
x2

array([[99, 10,  2],
       [ 4,  7,  6]])

## array slicing,sub arrays

Just as we can use square brackets to access individual array elements, we can also use
them to access subarrays with the slice notation, marked by the colon (:) character.
The NumPy slicing syntax follows that of the standard Python list; to access a slice of
an array x, use this:

`x[start:stop:step]`

If any of these are unspecified, they default to the values start=0, stop=size of
dimension, step=1.
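
As a quick check of these defaults, the sketch below compares slices written with omitted values against the same slices written out in full. It also shows the case of a negative step, where the defaults flip so that the slice starts at the last element. The variable names are illustrative only.

```python
import numpy as np

x = np.arange(10)
n = x.shape[0]

# Omitted start, stop and step fall back to 0, n and 1.
full_default = x[::]
full_explicit = x[0:n:1]

# With a negative step the slice starts at the last element by default.
reversed_default = x[::-1]
reversed_explicit = x[n - 1::-1]

print(np.array_equal(full_default, full_explicit))
print(np.array_equal(reversed_default, reversed_explicit))
```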

In [60]:
x=np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [61]:
# get first 5 element subarray (half-open range)
x[:5]

array([0, 1, 2, 3, 4])

In [62]:
x[2:5]

array([2, 3, 4])

In [63]:
x[5:]

array([5, 6, 7, 8, 9])

In [64]:
# steps of 2
x[::2]

array([0, 2, 4, 6, 8])

In [119]:
# starting from x[1], steps of 2
x[1::2]

array([1, 3, 5, 7, 9])

In [65]:
# if the step is negative, the defaults for start and stop are swapped; this is an easy way to reverse an array
x[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [121]:
x[5::-2]

array([5, 3, 1])

### multidimensional array slicing

In [66]:
x2=np.random.randint(20,size=(3,4))

In [67]:
x2

array([[12,  1,  6,  7],
       [14, 17,  5, 13],
       [ 8,  9, 19, 16]])

In [68]:
x2[:,:]

array([[12,  1,  6,  7],
       [14, 17,  5, 13],
       [ 8,  9, 19, 16]])

In [69]:
# first two rows and first two columns
x2[:2,:2]

array([[12,  1],
       [14, 17]])

In [70]:
# all rows, every other column
x2[:,::2]

array([[12,  6],
       [14,  5],
       [ 8, 19]])

In [71]:
# reverse both rows and columns
x2[::-1,::-1]

array([[16, 19,  9,  8],
       [13,  5, 17, 14],
       [ 7,  6,  1, 12]])

In [72]:
# accessing 1st column of an array
print(x2)
print("===============")
print(x2[:,0])

[[12  1  6  7]
 [14 17  5 13]
 [ 8  9 19 16]]
===============
[12 14  8]


In [73]:
# print first row of x2
print(x2[0,:])

[12  1  6  7]


In [74]:
# x2[0] is an equivalent shorthand
print(x2[0])

[12  1  6  7]


**One important and extremely useful thing to know about array slices is that they
return views rather than copies of the array data. This is one area in which NumPy
array slicing differs from Python list slicing: in lists, slices are copies.**
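
The contrast with lists can be checked directly. This is a small sketch with made-up names (`arr`, `py_list`); it uses `np.shares_memory` to ask whether two arrays overlap in memory, then writes through both kinds of slice.

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))
py_list = list(range(12))

arr_slice = arr[:2, :2]      # shares arr's data buffer
list_slice = py_list[:2]     # a new, independent list

arr_slice[0, 0] = 99
list_slice[0] = 99

print(np.shares_memory(arr, arr_slice))  # does the slice share arr's memory?
print(arr[0, 0])                         # modified through the slice
print(py_list[0])                        # original list is untouched
```

Calling `.copy()` on the slice breaks the link, so `np.shares_memory` then reports no overlap.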

In [75]:
print(x2)

[[12  1  6  7]
 [14 17  5 13]
 [ 8  9 19 16]]


In [76]:
x2_sub=x2[:2,:2]

In [77]:
x2_sub

array([[12,  1],
       [14, 17]])

In [78]:
x2_sub[0,0]=45

In [79]:
# x2 has changed as well, even though we only modified x2_sub,
# because slicing an np array returns a view rather than a copy of the data
print(x2)

[[45  1  6  7]
 [14 17  5 13]
 [ 8  9 19 16]]


### create copies

In [80]:
# to create a copy, use copy()
x2_sub_copy=x2[:2,:2].copy()

In [81]:
x2_sub_copy

array([[45,  1],
       [14, 17]])

In [82]:
x2_sub_copy[0,0]=100

In [83]:
# x2 is unchanged in this case
x2

array([[45,  1,  6,  7],
       [14, 17,  5, 13],
       [ 8,  9, 19, 16]])

## reshaping arrays

In [84]:
np.arange(9)

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [85]:
# the easiest way to reshape an array is reshape()
np.arange(9).reshape((3,3))

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [86]:
# note that for this to work, the size of the original array must match the size of the new array
np.arange(9).reshape((3,4))

ValueError: cannot reshape array of size 9 into shape (3,4)

In [93]:
# Another common reshaping pattern is the conversion of a one-dimensional array
# into a two-dimensional row or column matrix. You can do this with the reshape
# method, or more easily by making use of np.newaxis within a slice operation:
x=np.array([1,2,3])
print(x)
print(x.shape)
# add one dimension using reshape()
y=x.reshape((1,3))
y.shape
print(y)

[1 2 3]
(3,)
[[1 2 3]]


In [94]:
# np.newaxis can also be used to add a dimension
x[np.newaxis,:]

array([[1, 2, 3]])

In [95]:
x.reshape((3,1))

array([[1],
       [2],
       [3]])

In [97]:
print(x)
y=x[:,np.newaxis]
print(y)
y.shape

[1 2 3]
[[1]
 [2]
 [3]]


(3, 1)

## array concatenation and splitting

All of the preceding routines worked on single arrays. It’s also possible to combine
multiple arrays into one, and to conversely split a single array into multiple arrays.
We’ll take a look at those operations here.

### concatenation of arrays

Concatenation, or joining of two arrays in NumPy, is primarily accomplished
through the routines np.concatenate, np.vstack, and np.hstack. np.concatenate
takes a tuple or list of arrays as its first argument, as we can see here:

In [98]:
x=np.array([1,2,3])
y=np.array([4,5,6])
np.concatenate([x,y])

array([1, 2, 3, 4, 5, 6])

In [99]:
z=np.array([7,8,9])
np.concatenate([x,y,z])

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [100]:
p=np.array([[1,2,3],[4,5,6]])
q=np.array([[7,8,9],[10,11,12]])
np.concatenate([p,q])

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [183]:
# the default is axis=0 (down the rows, |)
# axis=1 runs across the columns (-->)
# since this is a 2-d array, these are the only two axes
np.concatenate([p,q],axis=1)

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

In [186]:
# vertical stack
x=np.array([1,2,3])
y=np.array([[4,5,6],[7,8,9]])
np.vstack([x,y])

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [187]:
x=np.array([[99],[99]])
np.hstack([x,y])

array([[99,  4,  5,  6],
       [99,  7,  8,  9]])

### splitting array

The opposite of concatenation is splitting, which is implemented by the functions
np.split, np.hsplit, and np.vsplit. For each of these, we can pass a list of indices
giving the split points:

In [101]:
x=np.array([0,1,2,3,4,5,6,7,8])

In [102]:
#Notice that N split points lead to N + 1 subarrays.
# the split points [3, 5] are indices of the array x
y1,y2,y3=np.split(x,[3,5])

In [103]:
y1

array([0, 1, 2])

In [104]:
y2

array([3, 4])

In [105]:
y3

array([5, 6, 7, 8])

In [106]:
y1,y2,y3,y4=np.split(x,[3,5,8])
print(y1)
print(y2)
print(y3)
print(y4)

[0 1 2]
[3 4]
[5 6 7]
[8]


In [117]:
grid=np.arange(20).reshape([5,4])

In [118]:
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [119]:
upper,lower=np.vsplit(grid,[2])
# try changing value of 2 to see result change

In [120]:
print(upper)
print('==============')
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
==============
[[ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]


In [121]:
left,right=np.hsplit(grid,[2])
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [122]:
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]
 [16 17]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]
 [18 19]]


## computation on numpy arrays

Up until now, we have been discussing some of the basic nuts and bolts of NumPy; in
the next few sections, we will dive into the reasons that NumPy is so important in the
Python data science world. Namely, it provides an easy and flexible interface to optimized
computation with arrays of data.
Computation on NumPy arrays can be very fast, or it can be very slow. The key to
making it fast is to use vectorized operations, generally implemented through NumPy’s
universal functions (ufuncs). This section motivates the need for NumPy’s ufuncs,
which can be used to make repeated calculations on array elements much more efficient.
It then introduces many of the most common and useful arithmetic ufuncs
available in the NumPy package.

In [200]:
# let's calculate the reciprocal of each array element
l=np.random.randint(1,10,size=5)
l

array([1, 5, 6, 6, 7])

In [201]:
def reciprocal(l):
    output=np.empty(len(l))
    for i in range(len(l)):
        output[i]=1/l[i]
    return output

In [202]:
# let's time this loop for 5 elements using timeit
%timeit reciprocal(l)

2.35 µs ± 102 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [203]:
# now let's try 1000000 elements
l=np.arange(1,1000001)
print(len(l))

1000000


In [204]:
# this takes much longer to compute
%timeit reciprocal(l)

268 ms ± 20.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


It takes about a quarter of a second to compute these million operations and to store the result!
When even cell phones have processing speeds measured in Giga-FLOPS (i.e., billions
of numerical operations per second), this seems almost absurdly slow. It turns
out that the bottleneck here is not the operations themselves, but the type-checking
and function dispatches that CPython must do at each cycle of the loop. Each time
the reciprocal is computed, Python first examines the object’s type and does a
dynamic lookup of the correct function to use for that type. If we were working in
compiled code instead, this type specification would be known before the code executes
and the result could be computed much more efficiently.

In [205]:
%timeit reciprocal(l)

258 ms ± 10.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [206]:
# directly performing operation on array
%timeit (1.0/l)

2.38 ms ± 145 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Looking at the execution time for our big array, we see that it completes orders of
magnitude faster than the Python loop.

For many types of operations, NumPy provides a convenient interface into just this
kind of statically typed, compiled routine. This is known as a vectorized operation.
You can accomplish this by simply performing an operation on the array, which will
then be applied to each element. This vectorized approach is designed to push the
loop into the compiled layer that underlies NumPy, leading to much faster execution.
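
Outside a notebook, where the `%timeit` magic is not available, the same comparison can be sketched with the standard `timeit` module. The function name `reciprocal_loop` and the array size are choices made for this example, and the timings depend on the machine, so none are quoted here.

```python
import timeit
import numpy as np

def reciprocal_loop(values):
    """Element-by-element reciprocal using an explicit Python loop."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.arange(1, 10001)

loop_result = reciprocal_loop(values)
vector_result = 1.0 / values   # same computation, but the loop runs in compiled code

# Both approaches should produce the same numbers.
print(np.allclose(loop_result, vector_result))

loop_time = timeit.timeit(lambda: reciprocal_loop(values), number=5)
vector_time = timeit.timeit(lambda: 1.0 / values, number=5)
print("loop:", loop_time, "s; vectorized:", vector_time, "s")
```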

Vectorized operations in NumPy are implemented via ufuncs, whose main purpose is
to quickly execute repeated operations on values in NumPy arrays. Ufuncs are
extremely flexible—before we saw an operation between a scalar and an array, but we
can also operate between two arrays:

In [207]:
np.arange(5)/np.arange(1,6)

array([0.        , 0.5       , 0.66666667, 0.75      , 0.8       ])

And ufunc operations are not limited to one-dimensional arrays—they can act on
multidimensional arrays as well:

In [208]:
x=np.arange(9).reshape((3,3))
x

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [209]:
# 2 raised to the power of each element
2**x

array([[  1,   2,   4],
       [  8,  16,  32],
       [ 64, 128, 256]], dtype=int32)

In [210]:
# each element squared
x**2

array([[ 0,  1,  4],
       [ 9, 16, 25],
       [36, 49, 64]], dtype=int32)

## exploring numpy's ufuncs 

Ufuncs exist in two flavors: unary ufuncs, which operate on a single input, and binary
ufuncs, which operate on two inputs. We’ll see examples of both these types of functions
here.

### array arithmetic

In [211]:
x=np.arange(1,5)
x

array([1, 2, 3, 4])

In [212]:
print("x+5=",x+5)

x+5= [6 7 8 9]


In [213]:
print("x-5=",x-5)

x-5= [-4 -3 -2 -1]


In [214]:
print("x/2=",x/2)

x/2= [0.5 1.  1.5 2. ]


In [215]:
print("x*2=",x*2)

x*2= [2 4 6 8]


In [218]:
# floor-divide
print("x//2=",x//2)

x//2= [0 1 1 2]


In [219]:
# ** for exponent
# % for modulus
print(x**2)
print(x%2)

[ 1  4  9 16]
[1 0 1 0]


In [220]:
# In addition, these can be strung together however you wish, and the standard order
# of operations is respected:
-(.5*x+1)**2

array([-2.25, -4.  , -6.25, -9.  ])

In [221]:
# arithmetic operations implemented in numpy
print(x)
print(np.add(x,2)) # x+2
print(np.subtract(x,2)) #x-2
print(np.negative(x)) #-x
print(np.multiply(x,2)) #x*2
print(np.divide(x,2)) #x/2
print(np.floor_divide(x,2)) #x//2
print(np.power(x,2)) #x**2
print(np.mod(x,2)) #x%2
# we will also see boolean and bitwise operations later

[1 2 3 4]
[3 4 5 6]
[-1  0  1  2]
[-1 -2 -3 -4]
[2 4 6 8]
[0.5 1.  1.5 2. ]
[0 1 1 2]
[ 1  4  9 16]
[1 0 1 0]


In [222]:
# absolute value: Python's built-in abs() and the NumPy equivalents
y=np.array([-1,-2,-3,-4])
print(abs(y))
print(np.absolute(y))
print(np.abs(y))

[1 2 3 4]
[1 2 3 4]
[1 2 3 4]


### trigonometric functions

In [223]:
theta=np.linspace(0,np.pi,3)
theta

array([0.        , 1.57079633, 3.14159265])

In [224]:
x=np.sin(theta)
x

array([0.0000000e+00, 1.0000000e+00, 1.2246468e-16])

In [225]:
np.cos(theta)

array([ 1.000000e+00,  6.123234e-17, -1.000000e+00])

In [226]:
np.tan(theta)

array([ 0.00000000e+00,  1.63312394e+16, -1.22464680e-16])

In [227]:
#The values are computed to within machine precision, which is why values that
#should be zero do not always hit exactly zero. Inverse trigonometric functions are also
#available:

np.arcsin(x)

array([0.00000000e+00, 1.57079633e+00, 1.22464680e-16])

In [228]:
# exponential  
np.exp(x) #e^x

array([1.        , 2.71828183, 1.        ])

In [229]:
x=np.array([1,0,2])
np.exp2(x) #2**x

array([2., 1., 4.])

In [230]:
np.power(3,x)
#3**x

array([3, 1, 9], dtype=int32)

In [231]:
# log functions
x=np.array([1,2,3,4])
np.log(x) #ln(x)

array([0.        , 0.69314718, 1.09861229, 1.38629436])

In [232]:
np.log2(x) #log2(x)

array([0.       , 1.       , 1.5849625, 2.       ])

In [233]:
np.log10(x) #log10(x)

array([0.        , 0.30103   , 0.47712125, 0.60205999])

In [234]:
# when x is very small, use expm1 and log1p instead of the functions above; they give more accurate results for small values
x=np.array([0,0.1,0.01,0.001])

In [235]:
np.expm1(x) #exp(x)-1

array([0.        , 0.10517092, 0.01005017, 0.0010005 ])

In [236]:
np.log1p(x) # log(1+x)

array([0.        , 0.09531018, 0.00995033, 0.0009995 ])
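
The accuracy gain is easiest to see with a much smaller input than the ones above. The sketch below compares `np.exp(x) - 1` with `np.expm1(x)` for `x = 1e-10`, using the first two terms of the Taylor series, `x + x**2 / 2`, as the reference value. That choice of reference is an assumption made for this example; it is accurate far beyond double precision for such a small `x`.

```python
import numpy as np

x = 1e-10

naive = np.exp(x) - 1      # subtracting 1 cancels most of the significant digits
stable = np.expm1(x)       # computes exp(x) - 1 without that cancellation

# For tiny x, exp(x) - 1 is approximately x + x**2 / 2.
reference = x + x**2 / 2

naive_rel_err = abs(naive - reference) / reference
stable_rel_err = abs(stable - reference) / reference

print("naive relative error:", naive_rel_err)
print("expm1 relative error:", stable_rel_err)
```

The same reasoning applies to `np.log1p(x)` compared with `np.log(1 + x)`.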

# advanced ufunc features

### specifying output

For large calculations, it is sometimes useful to be able to specify the array where the
result of the calculation will be stored. Rather than creating a temporary array, you
can use this to write computation results directly to the memory location where you’d
like them to be. For all ufuncs, you can do this using the out argument of the
function:

In [237]:
x=np.arange(5)
print("value in x =",x)
y=np.empty(5,dtype=int)
np.multiply(x,2,out=y)
print("value of y =",y)

value in x = [0 1 2 3 4]
value of y = [0 2 4 6 8]


In [238]:
#This can even be used with array views. For example, we can write the results of a
#computation to every other element of a specified array:

y=np.zeros(10)
np.power(2,x,out=y[::2])
print(y)

[ 1.  0.  2.  0.  4.  0.  8.  0. 16.  0.]


### aggregates

For binary ufuncs, there are some interesting aggregates that can be computed
directly from the object. For example, if we’d like to reduce an array with a particular
operation, we can use the reduce method of any ufunc. A reduce repeatedly applies a
given operation to the elements of an array until only a single result remains.


For example, calling reduce on the add ufunc returns the sum of all elements in the
array:

In [239]:
x=np.arange(1,6)
print(x)
np.add.reduce(x)

[1 2 3 4 5]


15

In [240]:
np.multiply.reduce(x)

120

In [241]:
# to keep all the intermediate results, use accumulate
np.add.accumulate(x)

array([ 1,  3,  6, 10, 15], dtype=int32)

In [242]:
np.multiply.accumulate(x)

array([  1,   2,   6,  24, 120], dtype=int32)

### outer product

Finally, any ufunc can compute the output of all pairs of two different inputs using
the outer method. This allows you, in one line, to do things like create a multiplication
table:

In [243]:
x=np.arange(1,6)
np.multiply.outer(x,x)

array([[ 1,  2,  3,  4,  5],
       [ 2,  4,  6,  8, 10],
       [ 3,  6,  9, 12, 15],
       [ 4,  8, 12, 16, 20],
       [ 5, 10, 15, 20, 25]])

In [244]:
np.add.outer(x,x)

array([[ 2,  3,  4,  5,  6],
       [ 3,  4,  5,  6,  7],
       [ 4,  5,  6,  7,  8],
       [ 5,  6,  7,  8,  9],
       [ 6,  7,  8,  9, 10]])

## aggregations: min, max and everything in between

Often when you are faced with a large amount of data, a first step is to compute summary
statistics for the data in question. Perhaps the most common summary statistics
are the mean and standard deviation, which allow you to summarize the “typical” values
in a dataset, but other aggregates are useful as well (the sum, product, median,
minimum and maximum, quantiles, etc.).

NumPy has fast built-in aggregation functions for working on arrays; we’ll discuss
and demonstrate some of them here.

### summing values in array

In [245]:
#As a quick example, consider computing the sum of all values in an array. Python
#itself can do this using the built-in sum function:
x=np.random.randint(1,100,size=10000)
sum(x)

496103

In [246]:
# numpy sum function
np.sum(x)

496103

In [249]:
# However, because it executes the operation in compiled code, NumPy’s version of the
#operation is computed much more quickly:
%timeit sum(x)

583 µs ± 18.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [250]:
%timeit np.sum(x)

6.85 µs ± 57.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


Be careful, though: the sum function and the np.sum function are not identical, which
can sometimes lead to confusion! In particular, their optional arguments have different
meanings, and np.sum is aware of multiple array dimensions, as we will see in the
following section.
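
A small two-dimensional example shows where the two functions diverge. In this sketch (the name `grid` is illustrative), the built-in `sum` iterates over the first axis and treats its second argument as a start value, while `np.sum` reduces over all axes and treats its second argument as the axis.

```python
import numpy as np

grid = np.arange(6).reshape((2, 3))   # [[0, 1, 2], [3, 4, 5]]

# Built-in sum iterates over the first axis only, adding the rows together.
builtin_total = sum(grid)

# np.sum reduces over every axis unless told otherwise.
numpy_total = np.sum(grid)

# The second positional argument means different things:
# for sum it is the start value, for np.sum it is the axis.
builtin_with_start = sum(grid, 1)     # 1 + row 0 + row 1
numpy_with_axis = np.sum(grid, 1)     # one sum per row

print(builtin_total)        # [3 5 7]
print(numpy_total)          # 15
print(builtin_with_start)   # [4 6 8]
print(numpy_with_axis)      # [ 3 12]
```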

### minimum and maximum

Similarly, Python has built-in min and max functions, used to find the minimum value
and maximum value of any given array:

In [252]:
x=np.arange(50,100000)
x

array([   50,    51,    52, ..., 99997, 99998, 99999])

In [253]:
# Python's built-in min and NumPy's min; again, the NumPy implementation is much faster
print(min(x))
np.min(x)

50


50

In [254]:
print(max(x))
np.max(x)

99999


99999

In [255]:
%timeit min(x)

4.23 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [256]:
%timeit np.min(x)

54.2 µs ± 322 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [257]:
# we can also use methods of array objects
x.min(),x.max()

(50, 99999)

Whenever possible, make sure that you are using the NumPy version of these aggregates
when operating on NumPy arrays!

### multidimensional aggregates

One common type of aggregation operation is an aggregate along a row or column.
Say you have some data stored in a two-dimensional array:

In [258]:
x=np.random.random((3,4))
x

array([[0.30946286, 0.48914295, 0.69297402, 0.75867347],
       [0.40178098, 0.75560642, 0.72760401, 0.63247417],
       [0.14162976, 0.28724399, 0.53387675, 0.81393193]])

In [259]:
# by default, the sum is taken over the entire array
x.sum()

6.544401310812729

Aggregation functions take an additional argument specifying the axis along which
the aggregate is computed. For example, we can find the sum of the values within each
column by specifying axis=0:

In [260]:
# The function returns four values, corresponding to the four columns of numbers.
x.sum(axis=0)

array([0.8528736 , 1.53199336, 1.95445478, 2.20507957])

In [261]:
x.sum(axis=1)

array([2.25025329, 2.51746558, 1.77668244])

In [263]:
# to find max value within each row
x.max(axis=1)

array([0.75867347, 0.75560642, 0.81393193])

The way the axis is specified here can be confusing to users coming from other languages.
The axis keyword specifies the dimension of the array that will be collapsed,
rather than the dimension that will be returned. So specifying axis=0 means that the
first axis will be collapsed: for two-dimensional arrays, this means that values within
each column will be aggregated.
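
One way to keep this straight is to watch the shapes. In the sketch below, with an illustrative array `data`, the axis passed to `sum` is the one that disappears from the result's shape. The `keepdims=True` option keeps the collapsed axis with length 1 instead.

```python
import numpy as np

data = np.arange(12).reshape((3, 4))   # 3 rows, 4 columns

col_sums = data.sum(axis=0)   # axis 0 (length 3) is collapsed
row_sums = data.sum(axis=1)   # axis 1 (length 4) is collapsed

print(data.shape, "->", col_sums.shape)   # (3, 4) -> (4,)
print(data.shape, "->", row_sums.shape)   # (3, 4) -> (3,)

# keepdims=True keeps the collapsed axis with length 1.
kept = data.sum(axis=0, keepdims=True)
print(kept.shape)                         # (1, 4)
```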

### other aggregate functions

NumPy provides many other aggregation functions, but we won’t discuss them in
detail here. Additionally, most aggregates have a NaN-safe counterpart that computes
the result while ignoring missing values, which are marked by the special IEEE
floating-point NaN value.
Some of these NaN-safe functions were not added until
NumPy 1.8, so they will not be available in older NumPy versions.
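
The difference only shows up when the data actually contains a NaN. Here is a minimal sketch with a made-up array `readings` that has one missing value:

```python
import numpy as np

readings = np.array([1.0, 2.0, np.nan, 4.0])

plain_sum = np.sum(readings)       # NaN propagates through the sum
safe_sum = np.nansum(readings)     # NaN entries are treated as missing
safe_mean = np.nanmean(readings)   # mean of the three non-NaN values

print(plain_sum)   # nan
print(safe_sum)    # 7.0
print(safe_mean)
```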

In [264]:
np.sum(x)

6.544401310812729

In [265]:
# NaN-safe version of the sum aggregate
np.nansum(x)

6.544401310812729

| Function | NaN-safe version | Description |
|---|---|---|
| `np.sum` | `np.nansum` | Compute sum of elements |
| `np.prod` | `np.nanprod` | Compute product of elements |
| `np.mean` | `np.nanmean` | Compute mean of elements |
| `np.std` | `np.nanstd` | Compute standard deviation |
| `np.var` | `np.nanvar` | Compute variance |
| `np.min` | `np.nanmin` | Find minimum value |
| `np.max` | `np.nanmax` | Find maximum value |
| `np.argmin` | `np.nanargmin` | Find index of minimum value |
| `np.argmax` | `np.nanargmax` | Find index of maximum value |
| `np.median` | `np.nanmedian` | Compute median of elements |
| `np.percentile` | `np.nanpercentile` | Compute rank-based statistics of elements |
| `np.any` | N/A | Evaluate whether any elements are true |
| `np.all` | N/A | Evaluate whether all elements are true |

## computation on arrays : broadcasting

We saw in the previous section how NumPy’s universal functions can be used to vectorize
operations and thereby remove slow Python loops. Another means of vectorizing
operations is to use NumPy’s broadcasting functionality. Broadcasting is simply a
set of rules for applying binary ufuncs (addition, subtraction, multiplication, etc.) on
arrays of different sizes.

### introducing broadcasting

In [266]:
x=np.array([1,2,3,4])
y=np.array([5,6,7,8])

In [267]:
x+y

array([ 6,  8, 10, 12])

Broadcasting allows these types of binary operations to be performed on arrays of different
sizes. For example, we can just as easily add a scalar (think of it as a zero-dimensional
array) to an array:

In [268]:
x+5

array([6, 7, 8, 9])

We can think of this as an operation that stretches or duplicates the value 5 into the
array [5, 5, 5, 5], and adds the results. The advantage of NumPy’s broadcasting is that
this duplication of values does not actually take place, but it is a useful mental model
as we think about broadcasting.
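
The claim that no duplication takes place can be inspected with `np.broadcast_to`, which returns a read-only view of the stretched value. In this sketch the name `stretched` is illustrative. A stride of 0 means that stepping to the next element does not move in memory, so all four entries read the same stored value.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# A read-only view shaped like x; the scalar is not copied four times.
stretched = np.broadcast_to(5, x.shape)

print(stretched)           # [5 5 5 5]
print(stretched.strides)   # (0,)
print(np.array_equal(x + 5, x + stretched))
```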

We can similarly extend this to arrays of higher dimension. Observe the result when
we add a one-dimensional array to a two-dimensional array:

In [269]:
arr=np.arange(9).reshape((3,3))

In [270]:
arr

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [271]:
z=[1,2,3]
arr+z

array([[ 1,  3,  5],
       [ 4,  6,  8],
       [ 7,  9, 11]])

Here z is converted to a one-dimensional array and then stretched, or broadcast, along
the first dimension in order to match the shape of arr.

While these examples are relatively easy to understand, more complicated cases can
involve broadcasting of both arrays. Consider the following example:

In [272]:
a=np.arange(3)
a

array([0, 1, 2])

In [273]:
b=np.arange(3)[:,np.newaxis]
b

array([[0],
       [1],
       [2]])

In [274]:
a+b


array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

Rules of Broadcasting
Broadcasting in NumPy follows a strict set of rules to determine the interaction
between the two arrays:

• Rule 1: If the two arrays differ in their number of dimensions, the shape of the
one with fewer dimensions is padded with ones on its leading (left) side.

• Rule 2: If the shape of the two arrays does not match in any dimension, the array
with shape equal to 1 in that dimension is stretched to match the other shape.

• Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is
raised.
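The three rules can be written out as a short function that works on shapes alone. This is only a sketch: broadcast_shape is a hypothetical helper, not part of NumPy, and the last line cross-checks it against the shape NumPy computes.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Illustrative helper (not a NumPy function): apply the three rules to two shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with ones on its left side.
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for size_a, size_b in zip(padded_a, padded_b):
        if size_a == size_b or size_b == 1:
            # Sizes agree, or Rule 2 stretches b to size_a.
            result.append(size_a)
        elif size_a == 1:
            # Rule 2: a is stretched to size_b.
            result.append(size_b)
        else:
            # Rule 3: sizes disagree and neither is 1.
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
    return tuple(result)

print(broadcast_shape((3, 3), (3,)))   # (3, 3)
print(broadcast_shape((3,), (3, 1)))   # (3, 3)

# Cross-check against the shape NumPy itself computes.
print(np.broadcast(np.empty((3,)), np.empty((3, 1))).shape)  # (3, 3)
```

Calling broadcast_shape((3, 2), (3,)) raises a ValueError, because Rule 1 pads (3,) to (1, 3) and the last dimensions 2 and 3 then disagree.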

In [275]:
# example where one array is broadcast
m=np.ones((3,3))

In [276]:
a=np.arange(3)
# a has shape (3,), so it is broadcast across the rows of m

In [277]:
m+a

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

In [278]:
# example where both arrays are broadcast

In [279]:
m=np.array([0,1,2])

In [280]:
n=np.array([0,1,2]).reshape((3,1))

In [281]:
m+n

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

In [282]:
# example where 2 arrays are not compatible

In [283]:
m=np.ones((3,2))
n=np.arange(3)

In [284]:
m+n

ValueError: operands could not be broadcast together with shapes (3,2) (3,) 
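If the intent was to add n to each column of m, one possible fix is to give n a trailing axis so that the shapes become compatible. This is a sketch of that one option, not the only way to resolve the error.

```python
import numpy as np

m = np.ones((3, 2))
n = np.arange(3)

# Rule 1 pads n on the left: (3,) -> (1, 3), which clashes with (3, 2).
# Adding the new axis on the right gives shape (3, 1), which is compatible.
result = m + n[:, np.newaxis]
print(result.shape)  # (3, 2)
print(result)
# [[1. 1.]
#  [2. 2.]
#  [3. 3.]]
```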

## comparison operators

We saw that using +, -, *, /, and others on arrays leads to element-wise operations.
NumPy also implements comparison operators such as < (less than) and > (greater than)
as element-wise ufuncs. The result of these comparison operators is always an array
with a Boolean data type. All six of the standard comparison operations are available:

In [285]:
x=np.array([1,2,3,4])
x<3

array([ True,  True, False, False])

In [286]:
x>3

array([False, False, False,  True])

In [287]:
x<=3

array([ True,  True,  True, False])

In [288]:
x>=3

array([False, False,  True,  True])

In [289]:
x!=3

array([ True,  True, False,  True])

In [290]:
x==3

array([False, False,  True, False])

It is also possible to do an element-by-element comparison of two arrays, and to
include compound expressions:

In [291]:
x*2==x**2

array([False,  True, False, False])

As in the case of arithmetic operators, the comparison operators are implemented as
ufuncs in NumPy; for example, when you write x < 3, internally NumPy uses
np.less(x, 3). A summary of the comparison operators and their equivalent ufuncs
is shown here:

| Operator | Equivalent ufunc |
| --- | --- |
| `==` | `np.equal` |
| `!=` | `np.not_equal` |
| `<` | `np.less` |
| `<=` | `np.less_equal` |
| `>` | `np.greater` |
| `>=` | `np.greater_equal` |
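A quick check that each operator and its ufunc give the same result, using a small array chosen for illustration:

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# Each operator form and its ufunc form produce identical Boolean arrays.
print(np.array_equal(x < 3, np.less(x, 3)))             # True
print(np.array_equal(x != 3, np.not_equal(x, 3)))       # True
print(np.array_equal(x >= 3, np.greater_equal(x, 3)))   # True
```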

In [292]:
# works on 2d array
x=np.random.randint(0,10,(3,3))

In [293]:
x

array([[7, 1, 1],
       [1, 7, 4],
       [6, 5, 8]])

In [294]:
x<4

array([[False,  True,  True],
       [ True, False, False],
       [False, False, False]])

In [295]:
# To count the number of True entries in a Boolean array, np.count_nonzero is useful
np.count_nonzero(x<4)

3

Another way to get at this information is to use np.sum; in this case, False is
interpreted as 0 and True is interpreted as 1:

In [296]:
np.sum(x<4)

3

In [297]:
# The benefit of np.sum() is that, like other NumPy aggregation functions, the
# summation can be done along rows or columns as well.

# How many values are less than 4 in each row?
np.sum(x<4,axis=1)

array([2, 1, 0])

In [298]:
# If we’re interested in quickly checking whether any or all the values are true, we can
# use (you guessed it) np.any() or np.all():
np.any(x<6)

True

In [299]:
np.all(x<6)

False

In [300]:
np.any(x<6,axis=1)

array([ True,  True,  True])
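The same aggregates work along the other axis as well. Because the x above was generated randomly, this sketch types the same values in by hand so the results are reproducible.

```python
import numpy as np

# Same values as the x shown above, entered by hand.
x = np.array([[7, 1, 1],
              [1, 7, 4],
              [6, 5, 8]])

print(np.sum(x < 4, axis=0))   # values below 4 in each column: [1 1 1]
print(np.all(x < 8, axis=1))   # is every value in the row below 8? [ True  True False]
print(np.mean(x < 4))          # fraction of values below 4: 3 of the 9, about 0.33
```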

## boolean operation

In [320]:
x=np.arange(0,10,2)
x

array([0, 2, 4, 6, 8])

In [321]:
np.sum((x>3) & (x<6))

1

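The operators used to combine Boolean arrays are also implemented as ufuncs. A summary
of the bitwise Boolean operators and their equivalent ufuncs is shown here:

| Operator | Equivalent ufunc |
| --- | --- |
| `&` | `np.bitwise_and` |
| `\|` | `np.bitwise_or` |
| `^` | `np.bitwise_xor` |
| `~` | `np.bitwise_not` |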

In [322]:
np.bitwise_and(x,1)

array([0, 0, 0, 0, 0], dtype=int32)
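The remaining operators combine Boolean masks in the same way. The sketch below also shows why the parentheses around each comparison matter.

```python
import numpy as np

x = np.arange(0, 10, 2)   # [0 2 4 6 8]

print((x < 3) | (x > 6))   # [ True  True False False  True]
print(~(x > 3))            # [ True  True False False False]
print(np.array_equal((x > 3) & (x < 6), np.bitwise_and(x > 3, x < 6)))  # True

# Without parentheses, & binds more tightly than the comparisons, so the
# expression becomes the chained comparison x > (3 & x) < 6, which fails on arrays.
try:
    x > 3 & x < 6
except ValueError as err:
    print("ValueError:", err)
```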

## boolean array as masks

In the preceding section, we looked at aggregates computed directly on Boolean
arrays. A more powerful pattern is to use Boolean arrays as masks, to select particular
subsets of the data themselves. Taking a new 3-by-3 array x, suppose we
want an array of all values in the array that are less than, say, 5:

In [307]:
x=np.arange(9).reshape((3,3))
x

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [308]:
x<5

array([[ True,  True,  True],
       [ True,  True, False],
       [False, False, False]])

In [306]:
x[x<5]

array([0, 1, 2, 3, 4])
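Once a mask selects the values, they can be aggregated or overwritten. A minimal sketch, where y is an illustrative copy used so that x itself stays unchanged:

```python
import numpy as np

x = np.arange(9).reshape((3, 3))

# Aggregate only the selected values.
print(x[x < 5].mean())   # 2.0

# Assign through a mask; working on a copy keeps the original x unchanged.
y = x.copy()
y[y < 5] = -1
print(y)
# [[-1 -1 -1]
#  [-1 -1  5]
#  [ 6  7  8]]
```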

## Using the Keywords and/or Versus the Operators &/|

One common point of confusion is the difference between the keywords and and or
on one hand, and the operators & and | on the other hand. When would you use one
versus the other?

The difference is this: and and or gauge the truth or falsehood of an entire object, while &
and | refer to bits within each object.

When you use and or or, it’s equivalent to asking Python to treat the object as a single
Boolean entity. In Python, all nonzero integers will evaluate as True. Thus:

In [309]:
bool(42),bool(0)

(True, False)

In [310]:
bool(42 and 0)

False

In [311]:
42 and 0

0

In [330]:
42 and 11

11

In [323]:
1 and 2

2

In [332]:
2 and 1

1
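These results follow from how Python evaluates and and or: each returns one of its operands rather than a bare True or False. A short sketch in plain Python:

```python
# `and` returns the first falsy operand, or the last operand if all are truthy.
print(42 and 0)    # 0
print(0 and 42)    # 0
print(42 and 11)   # 11

# `or` returns the first truthy operand, or the last operand if none is truthy.
print(42 or 11)    # 42
print(0 or 11)     # 11
print(0 or 0.0)    # 0.0
```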

When you use & and | on integers, the expression operates on the bits of the element,
applying the and or the or to the individual bits making up the number:

In [315]:
42 & 20

0

In [316]:
3 & 1

1

In [317]:
1 & 3

1

In [318]:
3 | 2

3

In [319]:
4|2

6

In [325]:
bin(2)

'0b10'

In [326]:
bin(4)

'0b100'

In [327]:
bin(6)

'0b110'

In [328]:
bin(4|2)

'0b110'
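Printing the operands side by side in binary makes the bit-by-bit behaviour easier to follow. The 6-bit width is chosen only so that the columns line up.

```python
# Print the operands and results as 6-bit binary strings so the bits line up.
for a, b in [(42, 20), (4, 2)]:
    print(f"{a:06b} & {b:06b} = {a & b:06b}  ({a & b})")
    print(f"{a:06b} | {b:06b} = {a | b:06b}  ({a | b})")
# 101010 & 010100 = 000000  (0)
# 101010 | 010100 = 111110  (62)
# 000100 & 000010 = 000000  (0)
# 000100 | 000010 = 000110  (6)
```

42 and 20 have no set bits in common, which is why 42 & 20 is 0 even though both numbers are truthy.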

In [333]:
# When you have an array of Boolean values in NumPy, this can be thought of as a
# string of bits where 1 = True and 0 = False, and the result of & and | operates in a
# similar manner as before:

a=np.array([1,0,1,0,1])
b=np.array([0,1,1,1,0])
a|b

array([1, 1, 1, 1, 1], dtype=int32)

In [334]:
a=np.array([1,0,1,0,1],dtype=bool)
b=np.array([0,1,1,1,0],dtype=bool)
a|b

array([ True,  True,  True,  True,  True])

In [335]:
a=np.array([1,0,1,0,1],dtype=bool)
b=np.array([0,1,1,1,0],dtype=bool)
a&b

array([False, False,  True, False, False])

In [336]:
a |b

array([ True,  True,  True,  True,  True])

So remember this: and and or perform a single Boolean evaluation on an entire
object, while & and | perform multiple Boolean evaluations on the content (the individual
bits or bytes) of an object. For Boolean NumPy arrays, the latter is nearly
always the desired operation.
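The following sketch shows what happens when and is applied to arrays, and how & differs from np.logical_and on integer data. The arrays p and q are illustrative.

```python
import numpy as np

a = np.array([1, 0, 1, 0, 1], dtype=bool)
b = np.array([0, 1, 1, 1, 0], dtype=bool)

# `and` asks for the truth value of the whole array, which NumPy refuses to guess.
try:
    a and b
except ValueError as err:
    print("ValueError:", err)

# On integer arrays, & compares bits while np.logical_and compares truthiness.
p = np.array([1, 2])
q = np.array([2, 2])
print(p & q)                  # [0 2]
print(np.logical_and(p, q))   # [ True  True]
```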

## fancy indexing

In the previous sections, we saw how to access and modify portions of arrays using
simple indices (e.g., arr[0]), slices (e.g., arr[:5]), and Boolean masks (e.g., arr[arr > 0]).

In this section, we’ll look at another style of array indexing, known as fancy
indexing. Fancy indexing is like the simple indexing we’ve already seen, but we pass
arrays of indices in place of single scalars. This allows us to very quickly access and
modify complicated subsets of an array’s values.

Fancy indexing is conceptually simple: it means passing an array of indices to access
multiple array elements at once. For example, consider the following array:

In [337]:
x=np.random.randint(100,size=20)

In [338]:
x

array([21, 90, 61, 72,  8, 28, 14, 81, 50, 32, 55, 88,  6, 15, 15, 92, 89,
       14, 22, 66])

In [339]:
# suppose we want to access 3 different elements
[x[2],x[9],x[17]]

[61, 32, 14]

In [340]:
ind=[2,9,17]
x[ind]

array([61, 32, 14])

In [341]:
ind=np.array([[1,2],[3,4]])
x[ind]

array([[90, 61],
       [72,  8]])
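Two points are worth noting from these cells: the result takes the shape of the index array, and fancy indexing returns a copy while assignment through fancy indices changes the array. A sketch, using the first six values of the x shown above so that it is reproducible (picked and y are illustrative names):

```python
import numpy as np

x = np.array([21, 90, 61, 72, 8, 28])   # first six values of the x shown above

ind = np.array([[1, 2], [3, 4]])
print(x[ind].shape)   # (2, 2): the shape of ind, not the shape of x

# Fancy indexing returns a copy, so changing the result leaves x alone.
picked = x[[0, 2]]
picked[0] = -1
print(x[0])           # 21

# Assigning through fancy indices does modify the array itself.
y = x.copy()
y[[0, 2]] = 0
print(y)              # [ 0 90  0 72  8 28]
```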

In [342]:
# fancy indexing also works in multiple dimensions
x=np.arange(12).reshape((3,4))
x

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [343]:
row=np.array([0,1,2])
col=np.array([1,2,3])
x[row,col]
# first value is x[0,1] then x[1,2] then x[2,3]

array([ 1,  6, 11])

In [345]:
print(row[:,np.newaxis])
x[row[:,np.newaxis],col]

[[0]
 [1]
 [2]]


array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9, 10, 11]])
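The column vector of rows and the flat array of columns are combined by the broadcasting rules described earlier. The sketch below makes the broadcast explicit and compares it with np.ix_, which builds the same kind of index pair; rows_full and cols_full are illustrative names.

```python
import numpy as np

x = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
col = np.array([1, 2, 3])

# The two index arrays broadcast to a common (3, 3) shape, one entry per (row, col) pair.
rows_full, cols_full = np.broadcast_arrays(row[:, np.newaxis], col)
print(rows_full.shape, cols_full.shape)   # (3, 3) (3, 3)

# np.ix_ builds the same pair of broadcastable index arrays for us.
print(np.array_equal(x[row[:, np.newaxis], col], x[np.ix_(row, col)]))   # True
```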

### combined indexing

In [346]:
x

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [347]:
x[2,[0,1,2]]

array([ 8,  9, 10])

In [348]:
x[1:,[2,0,1]]

array([[ 6,  4,  5],
       [10,  8,  9]])

In [349]:
# we can combine fancy indexing with masking
mask=np.array([1,0,1,0],dtype=bool)
print(mask)
x[row[:,np.newaxis],mask]

[ True False  True False]


array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])
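One way to read the masked example is that the Boolean mask stands in for the integer positions where it is True. The sketch below checks that reading with np.flatnonzero; cols is an illustrative name.

```python
import numpy as np

x = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
mask = np.array([1, 0, 1, 0], dtype=bool)

# Positions where the mask is True.
cols = np.flatnonzero(mask)
print(cols)   # [0 2]

# Indexing with the integer positions gives the same result as indexing with the mask.
print(np.array_equal(x[row[:, np.newaxis], mask], x[row[:, np.newaxis], cols]))   # True

# Since row selects every row here, a plain slice with the mask is equivalent too.
print(np.array_equal(x[row[:, np.newaxis], mask], x[:, mask]))   # True
```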