
# NumPy and SciPy

In this notebook we will learn NumPy and SciPy.

### Installation

If you have installed Anaconda, you should be ready to go. If not, you will have to install the add-ons manually after installing Python: first NumPy, then SciPy. You can use the pip installer. Execute the following command to install NumPy:

 pip install numpy

Otherwise, if you are running Python via the Anaconda distribution, you can execute the following command instead:

conda install numpy

### Importing the Numpy module
There are several ways to import NumPy. The standard approach is to use a simple import statement:

In [1]:
import numpy

However, when you make many calls to NumPy functions, it becomes tedious to write numpy.X over and over again. Instead, it is common to import the module under the shorter name np:

In [1]:
import numpy as np

The code above binds the NumPy namespace to the name np, so we can access NumPy functions, methods, and attributes as np.X instead of numpy.X.

To check the installed version of NumPy, use the command:

In [2]:
print (np.__version__)

1.16.2


### Arrays

The whole NumPy library is based on one main object: **ndarray**, which stands for N-dimensional array, a fast, flexible container for large datasets in Python. This object is a multidimensional homogeneous array with a predetermined number of items: homogeneous because all the items within it are of the same type and the same size. The data type is specified by another NumPy object called dtype (data-type); each ndarray is associated with exactly one dtype.

Arrays are similar to lists in Python, except that every element of an array must be of the same type. Items in the collection can be accessed using a zero-based index.

Moreover, another peculiarity of NumPy arrays is that their size is fixed: once you define the size at creation time, it remains unchanged. This behavior is different from Python lists, which can grow or shrink in size.
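
As a small illustration of the fixed-size behavior described above (a sketch; the variable names are chosen for this example only), np.append does not grow an array in place but returns a new one, whereas list.append modifies the list itself:

```python
import numpy as np

py_list = [1, 2, 3]
py_list.append(4)             # the list grows in place

arr = np.array([1, 2, 3])
bigger = np.append(arr, 4)    # returns a NEW array; arr is unchanged

print(py_list)       # [1, 2, 3, 4]
print(arr.shape)     # (3,)
print(bigger.shape)  # (4,)
```

Because every np.append call allocates a new array, it is usually cheaper to build a list first and convert it to an array once.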

Arrays enable you to perform mathematical operations on whole blocks of data using similar syntax to the equivalent operations between scalar elements.

The number of dimensions and items in an array is defined by its shape, a tuple of N non-negative integers that specifies the size of each dimension. The dimensions are called **axes** and the number of axes is the **rank**.

### Creating ndarrays
The easiest way to create an array is to use the array function. This accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data.


In [4]:
# Syntax 
np.array

<function numpy.array>

In [5]:
x = np.array([1,2,4,5])   # NumPy infers the data type and creates a one-dimensional array

print(x)


# what is its type? 
print(" type:", type(x))

# What is stored. Type of the item can be checked with the dtype property
print("type of variable:",x.dtype)

      
# how many elements
print("size:", np.size(x))


# shape
print("shape:", x.shape)


[1 2 4 5]
 type: <class 'numpy.ndarray'>
type of variable: int64
size: 4
shape: (4,)


In [6]:
x1 = [1,2,3,4]
x11 = np.array(x1)
x11

array([1, 2, 3, 4])

In [7]:
x = np.array([1.0,2,4.0,5.0])   # NumPy infers the data type (float, because some items are floats)
print(x.dtype)

float64


### Data Types for ndarrays
The data type or dtype is a special object containing the information (or metadata, data about data) the ndarray needs to interpret a chunk of memory as a particular type of data.
In the next example, the array function takes two arguments: the list to be converted into an array and the type of each member of the list.

In [8]:
# you can use the dtype argument to force the type
x = np.array([1,2,4,5], dtype = float)
print(x)
print("type of x:", x.dtype)

[1. 2. 4. 5.]
type of x: float64


**dtypes** are a source of NumPy’s flexibility for interacting with data coming from other systems. In most cases they provide a mapping directly onto an underlying disk or memory representation, which makes it easy to read and write binary streams of data to disk and also to connect to code written in a low-level language like C or Fortran.
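
The mapping between a dtype and raw memory can be sketched with a round trip through bytes. This is only an illustration (the variable names are made up for the example); it uses the tobytes method and the np.frombuffer function:

```python
import numpy as np

original = np.array([1.5, 2.5, 3.5], dtype=np.float64)
raw = original.tobytes()          # raw memory contents: 3 items x 8 bytes each
print(len(raw))                   # 24

# Interpreting the same bytes again requires knowing the dtype
restored = np.frombuffer(raw, dtype=np.float64)
print(restored)                   # [1.5 2.5 3.5]
```

Reading the same bytes with a different dtype would produce different numbers, which is why the dtype has to travel with the data.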

In [3]:
#You can explicitly convert or cast an array from one dtype to another using 
# ndarray’s astype method:

arr = np.array([1, 2, 3, 4, 5])
print("arr type:", arr.dtype)
float_arr = arr.astype(np.float64)
print("arr_after_conversion:", float_arr.dtype)

arr type: int64
arr_after_conversion: float64


In [4]:
str_arr = np.array(["john", "jenny", "ron"], dtype = np.bytes_)  # np.string_ is an older alias that NumPy 2.0 removed
print(str_arr)
print("type:", str_arr.dtype)

[b'john' b'jenny' b'ron']
type: |S5


If casting fails for some reason (for example, a string that cannot be converted to float64), a ValueError is raised:

In [5]:
str_arr.astype(np.float64)

ValueError: could not convert string to float: 'john'

You can also use another array’s dtype attribute:

In [6]:
x1 = np.array([1,2,3,4,5,6,7,8,9,10])
print("x1 type:", x1.dtype)
x2 = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)
x1_modified = x1.astype(x2.dtype)
print("x1_modified type:", x1_modified.dtype)


x1 type: int64
x1_modified type: float64


#### ndarray.shape

The array x1 created above has one axis, so its rank is 1 and its shape is (10,). To obtain these values from an array, use the **ndim** attribute for the number of axes, the **size** attribute for the total number of elements, and the **shape** attribute for its shape.
The shape attribute returns a tuple of the array dimensions. Assigning a new tuple to it reshapes the array in place.

In [13]:
x1.ndim

1

In [7]:
# The shape tuple has one entry because the array has one dimension
print(x1.shape)

(10,)


In [8]:
x1.size

10

In [16]:
len(x1)

10

Another important attribute of ndarray objects is **itemsize**, which gives the size in bytes of each item in the array. The **data** attribute is the buffer containing the actual elements of the array.


In [17]:
x1.itemsize

8

In [18]:
x1.data

<memory at 0x106d85460>

### Intrinsic Creation of an Array
The NumPy library provides a set of functions that generate ndarrays with initial content, with values that depend on the function used. These functions are very useful: they allow a single line of code to generate large amounts of data.

In [9]:
# repeat a sequence 5 times

# print(np.array([2]*5))

a = np.zeros(5)  # creates an array of all zeros 
print(a) 



[0. 0. 0. 0. 0.]


In [10]:
b = np.ones(5)  # creates an array of all ones
print(b)


[1. 1. 1. 1. 1.]


In [11]:
c = np.full(5, 7)  # Create a constant array
print(c)

[7 7 7 7 7]


By default, np.zeros and np.ones create arrays with the float64 data type, while np.full takes its data type from the fill value. A function that is particularly useful is **arange()**. It generates NumPy arrays containing numerical sequences, following rules that depend on the arguments passed. For example, to generate the integers from 0 up to (but not including) 10, pass a single argument: the value at which the sequence should stop.

In [20]:
print(np.arange(0,10))
print(np.arange(10))

[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]


If instead of starting from zero you want to start from another value, simply specify two arguments: the first is the starting value and the second is the final value.

In [12]:
np.arange(5, 10)

array([5, 6, 7, 8, 9])

It is also possible to generate a sequence of values with precise intervals between them. If the third argument of the arange() function is specified, this will represent the gap between a value and the next one in the sequence of values.

In [13]:
# A step value can also be provided to np.arange
print(np.arange(0,10, 3))

[0 3 6 9]


Another function very similar to arange() is **linspace()**. This function still takes as its first two arguments the initial and end values of the sequence, but the third argument, instead of specifying the distance between one element and the next, defines the number of elements into which we want the interval to be split.


In [14]:
# np.linspace() function is similar to np.arange(), but generates an array of a 
# specific number of items between the specified start and stop values
print(np.linspace(0,11,11))

[ 0.   1.1  2.2  3.3  4.4  5.5  6.6  7.7  8.8  9.9 11. ]
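
The difference between the two functions can be seen side by side. The following is a minimal sketch (variable names are illustrative): arange takes a step and excludes the stop value, while linspace takes a number of points and includes the stop value by default.

```python
import numpy as np

# arange: fixed step, the stop value is excluded
by_step = np.arange(0, 1, 0.25)
print(by_step)            # [0.   0.25 0.5  0.75]

# linspace: fixed number of points, the stop value is included by default
by_count = np.linspace(0, 1, 5)
print(by_count)           # [0.   0.25 0.5  0.75 1.  ]

# retstep=True also returns the spacing that linspace computed
points, step = np.linspace(0, 1, 5, retstep=True)
print(step)               # 0.25
```

This also explains the output above: 11 points between 0 and 11 inclusive leave 10 gaps, so the spacing is 1.1 rather than 1.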


#### Find the index of value in Numpy Array using numpy.where()

In [15]:
# Create a numpy array from a list of numbers
arr = np.array([11, 12, 13, 14, 15, 16, 17, 15, 11, 12, 14, 15, 16, 17])

In [18]:
# Get the index of elements with value 15
result = np.where(arr == 15)
 
print(type(result))
print(len(result))
print('Tuple of arrays returned : ', result)
print("Elements with value 15 exists at following indices", result[0], sep='\n')
result[0][2]   # index of the last occurrence of 15

<class 'tuple'>
1
Tuple of arrays returned :  (array([ 4,  7, 11]),)
Elements with value 15 exists at following indices
[ 4  7 11]


11

In [26]:
'''np.where() accepts a condition and 2 optional arrays

np.where(condition[,x,y])

If only the condition argument is given, it returns the indices of the elements that are
True in the boolean NumPy array produced by the condition.
'''

boolarr = (arr==15)
boolarr

array([False, False, False, False,  True, False, False,  True, False,
       False, False,  True, False, False])

In [27]:
# Passing the boolean array returns a tuple of arrays (one for each axis)
# containing the indices where the value is True
result = np.where(boolarr)
result

(array([ 4,  7, 11]),)

In [21]:
'''
If element not found in array

'''
# If given element doesn't exist in the array then it will return an empty array
result = np.where(arr == 111)
 
print('Empty Array returned : ', result)
print("value 111 exists at following indices", result[0], sep='\n')

Empty Array returned :  (array([], dtype=int64),)
value 111 exists at following indices
[]
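
The docstring above mentions the optional x and y arguments, which the earlier cells do not use. A minimal sketch of the three-argument form (the array name values is chosen for this example): where the condition is True the result takes the element from x, otherwise from y.

```python
import numpy as np

values = np.array([11, 12, 13, 14, 15, 16, 17])

# Keep values greater than 14, replace everything else with 0
result = np.where(values > 14, values, 0)
print(result)   # [ 0  0  0  0 15 16 17]
```

In this form np.where returns an array of the same shape as the condition rather than a tuple of index arrays.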


### Accessing the elements of an array, indexing and slicing
NumPy array indexing is a rich topic, as there are many ways you may want to select a subset of your data or individual elements. One-dimensional arrays are simple; they act similarly to Python lists.

The elements are accessed with square brackets [] with index (integer). 

In [22]:
print(x1[0], x1[1])  # print the first two elements

1 2


Slicing is another way to access elements. Here we take the first two elements of the array x:


In [30]:
print(x[:2])

[1. 2.]


In [31]:
print(x[0:2])

[1. 2.]


In [32]:
x

array([1., 2., 4., 5.])

In [33]:
x[1:]

array([2., 4., 5.])

In [34]:
x[-1]

5.0

In [24]:
x =np.array([1,2,4,5])

Note: In a slice such as x[0:2], the second index is excluded. ndarrays are mutable; here we change an element of the array.

In [25]:
x[0] = 10
print(x)

[10  2  4  5]


We replace the first element of x with 10. 

The *in* operator can be used to test whether a value is present in an array:

In [36]:
2 in x

True

### How to create a multidimensional array. 

Two-dimensional arrays are great for representing matrices, which are often used in data science.


In [26]:
a = np.array([[10, 20, 30], [40, 50, 60]], float)
print(a)
print('Shape of a', a.shape)

[[10. 20. 30.]
 [40. 50. 60.]]
Shape of a (2, 3)


In [28]:
a.ndim

2

#### Find index of a value in 2D Numpy array 

In [29]:
# Create a 2D Numpy array from list of lists
arr2D = np.array([[11, 12, 13],
                [14, 15, 16],
                [17, 15, 11],
                [12, 14, 15]])

print(arr2D)

[[11 12 13]
 [14 15 16]
 [17 15 11]
 [12 14 15]]


In [30]:
# Let us find the indices of the elements with value 15

res2D = np.where(arr2D==15)

print('Tuple of arrays returned : ', res2D)
print(list(zip(res2D[0],res2D[1])))

Tuple of arrays returned :  (array([1, 2, 3]), array([1, 1, 2]))
[(1, 1), (2, 1), (3, 2)]


In [31]:
# Find the indices of the minimum value
res2D = np.where(arr2D==np.min(arr2D))
res2D

print(list(zip(res2D[0],res2D[1])))

[(0, 0), (2, 2)]


With higher dimensional arrays, you have many more options. In a two-dimensional array, the elements at each index are no longer scalars but rather one-dimensional arrays:

In [33]:
a = np.array([[10, 20, 30], [40, 50, 60]], float)
print(a)
a[1]

[[10. 20. 30.]
 [40. 50. 60.]]


array([40., 50., 60.])

Individual elements can be accessed recursively, as in a[0][0], but that is more work than necessary: you can pass a comma-separated list of indices to access an individual element. The first index is the row and the second is the column, both starting from zero.

In [34]:
print(a[0,0])
print(a[0][0])

10.0
10.0


In [17]:
a = np.array([[10, 20, 30], [40, 50, 60], [70,80,90]], float)
a

array([[10., 20., 30.],
       [40., 50., 60.],
       [70., 80., 90.]])

In [10]:
print(a[:,1])  # To access whole column

[20. 50. 80.]


In [11]:
print(a[1,:])  # To access whole row.


[40. 50. 60.]


In [12]:
print(a[:2])

[[10. 20. 30.]
 [40. 50. 60.]]


As you can see, it has sliced along axis 0, the first axis. A slice, therefore, selects a range of elements along an axis. It can be helpful to read the expression a[:2] as “select the first two rows of the array.”

You can pass multiple slices just like you can pass multiple indexes:

In [13]:
a[:2,1:]

array([[20., 30.],
       [50., 60.]])

In [14]:
a[0,0]; a[1,1]  # two statements on one line; only the last result is displayed

50.0

In [48]:
# Assigning to a slice expression assigns to the whole selection

In [18]:
a

array([[10., 20., 30.],
       [40., 50., 60.],
       [70., 80., 90.]])

In [None]:
# a has shape (3, 3): row indices 0, 1, 2 and column indices 0, 1, 2
# the row slice :2 selects rows 0 and 1
# the column slice 1: selects columns 1 and 2
a.shape

In [16]:
a[:2,1:] = 5  # rows 0-1, columns 1-2: sets a[0,1], a[0,2], a[1,1], a[1,2]
a

array([[10.,  5.,  5.],
       [40.,  5.,  5.],
       [70., 80., 90.]])

In [19]:
# Transpose a matrix
a.transpose()

array([[10., 40., 70.],
       [20., 50., 80.],
       [30., 60., 90.]])

In [22]:
# Transpose a matrix
print(a.transpose())
print('\n')
# Alternatively, this can also be performed with the .T property
print(a.T)

[[10. 40. 70.]
 [20. 50. 80.]
 [30. 60. 90.]]


[[10. 40. 70.]
 [20. 50. 80.]
 [30. 60. 90.]]


In [23]:
print(a)
a.swapaxes(0, 1)

[[10. 20. 30.]
 [40. 50. 60.]
 [70. 80. 90.]]


array([[10., 40., 70.],
       [20., 50., 80.],
       [30., 60., 90.]])

## Boolean Indexing

### Array indexing for changing elements

In [24]:
# create a 3x2 array
a_array = np.array([[11,12], [21, 22], [31, 32]])
print(a_array)

[[11 12]
 [21 22]
 [31 32]]


In [26]:
# create a filter which will be boolean values for whether each element meets this condition
a_filter = (a_array > 15)
print(a_filter)

[[False False]
 [ True  True]
 [ True  True]]


In [33]:
a_array[(a_array>11) & (a_array <22)]

array([12, 21])

Notice that the filter is an ndarray of the same shape as a_array, filled with True for each element whose corresponding element in a_array is greater than 15 and False for those elements whose value is less than or equal to 15.

In [27]:
# we can now select just those elements which meet that criteria
print(a_array[a_filter])

[21 22 31 32]


The Python keywords *and* and *or* do not work with boolean arrays. Use & (and) and | (or) instead, and wrap each comparison in parentheses.
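
A short sketch of combining conditions with these operators, reusing the a_array values from above (the result names are illustrative only):

```python
import numpy as np

a_array = np.array([[11, 12], [21, 22], [31, 32]])

# Each comparison needs its own parentheses because & and | bind
# more tightly than the comparison operators
either = a_array[(a_array < 12) | (a_array > 30)]
print(either)        # [11 31 32]

# ~ negates a boolean array
not_large = a_array[~(a_array > 15)]
print(not_large)     # [11 12]

# Using the keyword instead raises an error
try:
    a_array[(a_array > 11) and (a_array < 22)]
except ValueError as err:
    print("ValueError:", err)
```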

## Reshaping arrays

In [35]:
# reshape() will reshape the array into the specified shape
a = np.arange(0,9)
print(a)
m = a.reshape(3,3)  # Change the array to size 3x3
print(m)

[0 1 2 3 4 5 6 7 8]
[[0 1 2]
 [3 4 5]
 [6 7 8]]


In [36]:
# back to original 
reshaped = m.reshape(9)
print(reshaped)

[0 1 2 3 4 5 6 7 8]


In [37]:

# The .reshape() method is not the only means of reorganizing data. Another
# is the .ravel() method, which flattens an array into one dimension

raveled = m.ravel()
print(raveled)

# .flatten() is like .ravel(), but always returns a copy of the data rather than a view
m2 = np.arange(0,9).reshape(3,3)
flattened = m2.flatten()
flattened[0]= 1000
print(flattened)
print(m2)

[0 1 2 3 4 5 6 7 8]
[1000    1    2    3    4    5    6    7    8]
[[0 1 2]
 [3 4 5]
 [6 7 8]]
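
The view-versus-copy distinction can be checked directly. The following is a sketch under the assumption that the source array is contiguous in memory, which is the case for a freshly reshaped arange; the name grid is chosen for this example.

```python
import numpy as np

grid = np.arange(9).reshape(3, 3)

raveled = grid.ravel()      # a view here, because grid is contiguous in memory
flattened = grid.flatten()  # always a copy

print(np.shares_memory(grid, raveled))    # True
print(np.shares_memory(grid, flattened))  # False

raveled[0] = 1000           # writing through the view also changes grid
print(grid[0, 0])           # 1000
print(flattened[0])         # 0
```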


In [57]:
c = m.resize((3,3))
print(c) 
# the line above prints None because resize modifies the array in place,
# while reshape returns a new array object
print(m)

None
[[0 1 2]
 [3 4 5]
 [6 7 8]]


In [58]:
# The .resize() method works like .reshape(), except that .reshape() returns
# a new array object (a view of the same data where possible), while
# .resize() changes the shape of the array in place

m2.resize(1,9)
m2

array([[0, 1, 2, 3, 4, 5, 6, 7, 8]])


Arrays can be reshaped using tuples that specify new dimensions. In the following example, we turn a ten-element one-dimensional array into a two-dimensional one whose first axis has five elements and whose second axis has two elements:

In [59]:
x = np.array(range(10), float)
print(x)
x = x.reshape((5, 2))
print(x)

[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
[[0. 1.]
 [2. 3.]
 [4. 5.]
 [6. 7.]
 [8. 9.]]


Notice that the reshape function returns a new array object and does not itself modify the shape of the original array. Keep in mind that Python's name-binding approach still applies to arrays. The copy function can be used to create a new, separate copy of an array in memory if needed:

In [38]:
a = np.array([1, 2, 3], float)
b = a
c = a.copy()
a[0] = 0
print(a)
print(b)
print(c)

[0. 2. 3.]
[0. 2. 3.]
[1. 2. 3.]


In [39]:
# good time to talk about assignment vs. copy

# assignment: d is just another name for the same array, nothing is copied
d = a
a[0] = -4
print(a)
print(d)
# note that d and a have the same values even though we modified only a

[-4.  2.  3.]
[-4.  2.  3.]


In [7]:
# copy: e gets its own data, independent of a
e = a.copy()
a[0] = 3.0
print(a)
print(e)
# note that e and a will have different values

[3. 2. 3.]
[-4.  2.  3.]


Lists can also be created from arrays:

In [40]:
a = np.array([1, 2, 3], float)
print(type(a))
a1=a.tolist()
print(type(a1))

<class 'numpy.ndarray'>
<class 'list'>


One can fill an array with a single value:

In [41]:
a = np.array([1, 2, 3], float)
print(a)
a.fill(0)
print(a)


[1. 2. 3.]
[0. 0. 0.]


In [42]:
# create a 2x2 array of zeros
x1 = np.zeros((2,2))
print(x1)

[[0. 0.]
 [0. 0.]]


In [43]:
# create a 2x2 array filled with 10.0
x1 = np.full((2,2), 10.0)
print(x1)

[[10. 10.]
 [10. 10.]]


In [45]:
# create a 2x2 matrix with the diagonal 1s and the others 0s
x = np.eye(2,2)
print(x)


[[1. 0.]
 [0. 1.]]


In [44]:
x1= np.identity(2)
print(x1)

[[1. 0.]
 [0. 1.]]


The *eye* function returns a matrix with ones along the kth diagonal:

In [46]:
np.eye(3, k=1, dtype=float)  # upper diagonal

array([[0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 0.]])

In [47]:
np.eye(3, k=-1, dtype=float)  # lower diagonal

array([[0., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.]])

Two or more arrays can be concatenated together using the concatenate function with a tuple of the arrays to be joined:

In [48]:
x = np.array([1,2], float)
y = np.array([3,4,5,6], float)
z = np.array([7,8,9], float)
np.concatenate((x, y, z))

array([1., 2., 3., 4., 5., 6., 7., 8., 9.])

## Combining arrays

Arrays can be combined in various ways. This process in NumPy is referred to as stacking. Stacking can take various forms, including horizontal, vertical, and depth-wise stacking.

In [50]:
x = np.arange(9).reshape(3,3)
y = (x+1)*10
print(x)
print("\n")
print(y)
# Horizontal stacking 
np.hstack((x,y))   # equivalent to cbind in R



[[0 1 2]
 [3 4 5]
 [6 7 8]]


[[10 20 30]
 [40 50 60]
 [70 80 90]]


array([[ 0,  1,  2, 10, 20, 30],
       [ 3,  4,  5, 40, 50, 60],
       [ 6,  7,  8, 70, 80, 90]])

In [51]:
# vertical stack 
np.vstack((x,y))   # equivalent to rbind in R
# There are also column_stack and row_stack functions

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])
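
Depth-wise stacking is mentioned above but not shown. A minimal sketch with np.dstack, using the same x and y as in the horizontal and vertical examples (the name depth is chosen here):

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
y = (x + 1) * 10

depth = np.dstack((x, y))   # stacks along a new third axis
print(depth.shape)          # (3, 3, 2)
print(depth[0, 0])          # [ 0 10]
```

Each position depth[i, j] holds the pair (x[i, j], y[i, j]).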

If an array has more than one dimension, it is possible to specify the axis along which multiple arrays are concatenated. By default (without specifying the axis), NumPy concatenates along the first dimension, i.e., along the rows (axis 0):

In [52]:
x = np.array([[10, 20], [30, 40]], float)
print(x)
print("\n")
y = np.array([[50, 60], [70,80]], float)
print(y)
print("\n")
print(np.concatenate((x,y), axis=0))  # equivalent to vstack, or rbind in R
print("\n")
print(np.concatenate((x,y), axis=1))   # equivalent to hstack, or cbind in R

[[10. 20.]
 [30. 40.]]


[[50. 60.]
 [70. 80.]]


[[10. 20.]
 [30. 40.]
 [50. 60.]
 [70. 80.]]


[[10. 20. 50. 60.]
 [30. 40. 70. 80.]]


Operations between arrays of different shapes are called *broadcasting*:

In [54]:
a = np.array([[1, 2], [3, 4], [5, 6]], float)
b = np.array([-1, 3], float)
print(a)
print('\n')
print(b)
print('\n')
print(a + b)

print(np.add(a,b))   # we can also use the add function from NumPy

[[1. 2.]
 [3. 4.]
 [5. 6.]]


[-1.  3.]


[[0. 5.]
 [2. 7.]
 [4. 9.]]
[[0. 5.]
 [2. 7.]
 [4. 9.]]


In [None]:
# In a + b, b behaves as if it had been stretched to the shape of a:
np.array([[-1.,  3.],
       [-1.,  3.],
       [-1.,  3.]])

In addition to the standard operators, NumPy offers a large library of common mathematical functions that can be applied elementwise to arrays. Among these are the functions: *abs, sign, sqrt, log, log10, exp, sin, cos, tan, arcsin, arccos, arctan, sinh, cosh, tanh, arcsinh, arccosh, and arctanh*. The functions *floor, ceil, and rint* give the lower, upper, or nearest (rounded) integer:

In [57]:
np.sqrt(a)

array([[1.        , 1.41421356],
       [1.73205081, 2.        ],
       [2.23606798, 2.44948974]])

In [56]:
np.ceil(np.sqrt(a))

array([[1., 2.],
       [2., 2.],
       [3., 3.]])

In [58]:
np.floor(np.sqrt(a))

array([[1., 1.],
       [1., 2.],
       [2., 2.]])
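
The cells above show floor and ceil; the following sketch adds rint and sign on a small array of made-up sample values. Note that rint rounds exact halves to the nearest even integer.

```python
import numpy as np

samples = np.array([-2.5, -0.4, 0.5, 1.5, 2.6])

print(np.floor(samples))  # [-3. -1.  0.  1.  2.]
print(np.ceil(samples))   # [-2. -0.  1.  2.  3.]
print(np.rint(samples))   # [-2. -0.  0.  2.  3.]
print(np.sign(samples))   # [-1. -1.  1.  1.  1.]
```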

Also included in the NumPy module are two important mathematical constants: pi and e (Euler's number).


In [None]:
np.pi


In [None]:
np.e

In [59]:
print(np.exp(a))

[[  2.71828183   7.3890561 ]
 [ 20.08553692  54.59815003]
 [148.4131591  403.42879349]]


### Array iteration

It is possible to iterate over arrays in a manner similar to that of lists:

In [60]:
a = np.array([1, 4, 5])
for x in a:
    print(x)
    
    

1
4
5


For multidimensional arrays, iteration proceeds over the first axis such that each loop returns a subsection of the array:

In [62]:
a = np.array([[1, 2], [3, 4], [5, 6]])
print(a)
print('\n')
for i in a:
    print(i)

[[1 2]
 [3 4]
 [5 6]]


[1 2]
[3 4]
[5 6]


In [63]:
# Multiple assignment can also be used with array iteration:
for i, j in a:
    print(i)
    print(j)
    
        

1
2
3
4
5
6


## Array Operations/Mathematics
Arrays are important because they enable you to express batch operations on data without writing any for loops. This is known as *vectorization*. When standard mathematical operations are used with arrays, they are applied on an element-by-element basis. This means that the arrays should have the same shape (or shapes that can be broadcast together) during addition, subtraction, etc.:

In [None]:
a = np.array([1,2,3], float)
b = np.array([5,2,6], float)
print(a + b)
print()
np.add(a,b)

In [None]:
print(a - b)

print()

print(np.subtract(a,b))

In [None]:
print(a * b)

print(np.multiply(a,b))

In [None]:
print(a / b)

np.divide(a,b)

In [None]:
a**b

For two-dimensional arrays, multiplication remains elementwise and does not correspond to matrix multiplication. There are special functions for matrix math that we will cover later.

In [None]:
a = np.array([[1,2], [3,4]], float)
b = np.array([[2,0], [1,3]], float)
a * b
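
To make the contrast concrete, here is a small sketch that puts the elementwise product next to the matrix product. It uses the @ operator (available in Python 3.5 and later), which for two-dimensional arrays gives the same result as np.dot:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]], float)
b = np.array([[2, 0], [1, 3]], float)

print(a * b)   # elementwise product:  [[2, 0], [3, 12]]
print(a @ b)   # matrix product:       [[4, 6], [10, 12]]
```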


### Universal Functions: Fast element-wise array functions

A universal function, or *ufunc*, is a function that performs elementwise operations on data in ndarrays. Many other functions exist for extracting whole-array properties; for example, the items in an array can be summed or multiplied:

In [None]:
a = np.arange(10)
print("sum of elements:", a.sum())

print(" another way:")
print("sum of elements:", np.sum(a))

In [None]:
a = np.array([2, 4, 3], float)
print("product of elements:", a.prod())

print(" another way:")
print("product of elements:", np.prod(a))

In [None]:
print(a.mean())

print(" another way:")
print("mean:", np.mean(a))

In [None]:
print(a.min())

print(" another way:")
print("min:", np.min(a))

In [None]:
np.sqrt(a)

In [None]:
np.exp(a)

For multidimensional arrays, each of the functions thus far described can take an optional argument axis that will perform an operation along only the specified axis, placing the results in a return array:

In [64]:
a = np.array([[0, 2], [3, -1], [3, 5]])
print(a)

[[ 0  2]
 [ 3 -1]
 [ 3  5]]


In [65]:
print(a.min(axis = 0))   # axis=0: minimum of each column

print(" another way:")
print("min:", np.min(a, axis = 0))

[ 0 -1]
 another way:
min: [ 0 -1]


In [None]:
print(a.min(axis = 1))   # axis=1: minimum of each row

print(" another way:")
print("min:", np.min(a, axis = 1))

In [None]:
print(a.mean(axis = 1))   # axis=1: mean of each row

print(" another way:")
print("mean:", np.mean(a, axis = 1))

In multidimensional arrays, accumulation functions like cumsum return an array of the same size, but with the partial aggregates computed along the indicated axis according to each lower dimensional slice:

In [None]:
arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
arr

In [None]:
arr.cumsum(axis=0)

In [None]:
arr.cumsum(axis=1)
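
Since the cells above are shown without output, here is the same computation as a self-contained sketch with the expected results written out (the names down_columns and across_rows are chosen for this example):

```python
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

down_columns = arr.cumsum(axis=0)   # running total down each column
across_rows = arr.cumsum(axis=1)    # running total along each row

print(down_columns)
# [[ 0  1  2]
#  [ 3  5  7]
#  [ 9 12 15]]
print(across_rows)
# [[ 0  1  3]
#  [ 3  7 12]
#  [ 6 13 21]]
```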

In addition to the mean, var, and std functions, Numpy supplies several other methods for returning statistical features of arrays. The correlation coefficient for multiple variables observed at multiple instances can be found for arrays of the form [[x1, x2, ...], [y1, y2, ...], [z1, z2, ...], ...] where x, y, z are different observables and the numbers indicate the observation times:


In [None]:
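a = np.array([[1, 2, 1, 3], [5, 3, 1, 8]])
print(a)
c = np.corrcoef(a)   # each row is one variable, each column one observation
print(c)

a = np.array([6, 2, 5, -1, 0])
sorted(a)    # returns a new sorted list; a itself is unchanged
a.sort()     # sorts a in place
a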


For two dimensional arrays, the diagonal can be extracted:

In [14]:
a = np.array([[1, 2], [3, 4]], float)
a.diagonal()   # equivalently, np.diagonal(a)

array([1., 4.])

### Sorting

In [None]:
arr = np.random.randn(6)
arr

In [None]:
# NumPy arrays can be sorted in place with the sort method
arr.sort()
arr

You can sort each one-dimensional section of values in a multidimensional array in place along an axis by passing the axis number to sort:

In [None]:
x = np.random.randn(4, 5)
x

In [None]:
x.sort(1)   # sort the values within each row (along axis 1)
x

#### Broadcasting
Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this:

In [None]:
import numpy as np

# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y

x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = np.empty_like(x)   # Create an empty matrix with the same shape as x

# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v

# Now y is the following
# [[ 2  2  4]
#  [ 5  5  7]
#  [ 8  8 10]
#  [11 11 13]]
print(y)

This works; however when the matrix x is very large, computing an explicit loop in Python could be slow. Note that adding the vector v to each row of the matrix x is equivalent to forming a matrix vv by stacking multiple copies of v vertically, then performing elementwise summation of x and vv. We could implement this approach like this:

In [None]:
import numpy as np

# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
vv = np.tile(v, (4, 1))   # Stack 4 copies of v on top of each other
print(vv)                 # Prints "[[1 0 1]
                          #          [1 0 1]
                          #          [1 0 1]
                          #          [1 0 1]]"
y = x + vv  # Add x and vv elementwise
print(y)  # Prints "[[ 2  2  4]
          #          [ 5  5  7]
          #          [ 8  8 10]
          #          [11 11 13]]"

Numpy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:

In [None]:
import numpy as np

# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v  # Add v to each row of x using broadcasting
print(y)  # Prints "[[ 2  2  4]
          #          [ 5  5  7]
          #          [ 8  8 10]
          #          [11 11 13]]"

The line y = x + v works even though x has shape (4, 3) and v has shape (3,) due to broadcasting; this line works as if v actually had shape (4, 3), where each row was a copy of v, and the sum was performed elementwise.

Broadcasting two arrays together follows these rules:

1. If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
2. The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
3. The arrays can be broadcast together if they are compatible in all dimensions.
4. After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
5. In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.
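
The rules above can be checked without doing any arithmetic. The following sketch uses np.broadcast, whose shape attribute reports the shape two arrays would broadcast to; the array names are made up for the example.

```python
import numpy as np

matrix = np.ones((4, 3))
row = np.ones(3)            # shape (3,) is treated as (1, 3) (rule 1)
column = np.ones((4, 1))

print(np.broadcast(matrix, row).shape)   # (4, 3)
print(np.broadcast(column, row).shape)   # (4, 3)

# Shapes (4, 3) and (4,) are not compatible: the trailing
# dimensions are 3 and 4, and neither of them is 1 (rule 2)
try:
    matrix + np.ones(4)
except ValueError as err:
    print("ValueError:", err)
```

In the second case both arrays are stretched: the column along its second dimension and the row along its first.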

Broadcasting typically makes your code more concise and faster, so you should strive to use it where possible.

For more details about broadcasting, see the <a href="https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html">NumPy broadcasting documentation</a>.

## Dot Product on Matrices and Inner Product on Vectors:

In [2]:
# determine the dot product of two matrices
x2d = np.array([[1,1],[1,1]])
y2d = np.array([[2,2],[2,2]])

print(x2d.dot(y2d))
print()
print(np.dot(x2d, y2d))

[[4 4]
 [4 4]]

[[4 4]
 [4 4]]


In [None]:
# determine the inner product of two vectors
a1d = np.array([9 , 9 ])
b1d = np.array([10, 10])

print(a1d.dot(b1d))
print()
print(np.dot(a1d, b1d))

It is also possible to generate inner, outer, and cross products of matrices and vectors. For vectors, note that the inner product is equivalent to the dot product:

In [7]:
x1 = np.array([1, 4, 0])
y1 = np.array([2, 2, 1])

In [8]:
np.outer(x1,y1)

array([[2, 2, 1],
       [8, 8, 4],
       [0, 0, 0]])

In [None]:
np.inner(x1,y1)

In [None]:
np.cross(x1,y1)
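
The last two cells are shown without output. As a self-contained sketch, the results for the same x1 and y1 are:

```python
import numpy as np

x1 = np.array([1, 4, 0])
y1 = np.array([2, 2, 1])

print(np.inner(x1, y1))   # 10, i.e. 1*2 + 4*2 + 0*1
print(np.dot(x1, y1))     # 10, the same as the inner product for vectors
print(np.cross(x1, y1))   # [ 4 -1 -6]
```

The cross product is perpendicular to both inputs, so its dot product with x1 or y1 is zero.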

NumPy also comes with a number of built-in routines for linear algebra calculations. These can be found in the sub-module linalg. Among these are routines for dealing with matrices and their inverses. The determinant of a matrix can be found:

In [9]:
a = np.array([[4, 2, 0], [9, 3, 7], [1, 2, 1]])

np.linalg.det(a)

-48.00000000000003

In [10]:
# inverse of a matrix
b = np.linalg.inv(a)
b

array([[ 0.22916667,  0.04166667, -0.29166667],
       [ 0.04166667, -0.08333333,  0.58333333],
       [-0.3125    ,  0.125     ,  0.125     ]])

In [11]:
(np.dot(a,b))

array([[ 1.00000000e+00,  0.00000000e+00, -2.22044605e-16],
       [ 0.00000000e+00,  1.00000000e+00,  0.00000000e+00],
       [ 0.00000000e+00,  0.00000000e+00,  1.00000000e+00]])
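
The product above is the identity matrix only up to floating-point rounding (note the entry of about -2.2e-16). A minimal sketch of how such a result can be checked with a tolerance, using np.allclose:

```python
import numpy as np

a = np.array([[4, 2, 0], [9, 3, 7], [1, 2, 1]])
b = np.linalg.inv(a)

# Exact equality can fail because of rounding, so compare with a tolerance
print(np.allclose(np.dot(a, b), np.eye(3)))   # True
```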

One can find the eigenvalues and eigenvectors of a matrix:

In [12]:
vals, vecs = np.linalg.eig(a)
print("eigenvalues:", "\n")
print(vals)
print("eigenvectors:", "\n")
print(vecs)


eigenvalues: 

[ 8.85591316  1.9391628  -2.79507597]
eigenvectors: 

[[-0.3663565  -0.54736745  0.25928158]
 [-0.88949768  0.5640176  -0.88091903]
 [-0.27308752  0.61828231  0.39592263]]


Singular value decomposition (analogous to diagonalization of a nonsquare matrix) can also be performed:

In [13]:
a = np.array([[1, 3, 4], [5, 2, 3]], float)
U, s, Vh = np.linalg.svd(a)
print(U, "\n")
print(s, "\n")
print(Vh, "\n")


[[-0.6113829  -0.79133492]
 [-0.79133492  0.6113829 ]] 

[7.46791327 2.86884495] 

[[-0.61169129 -0.45753324 -0.64536587]
 [ 0.78971838 -0.40129005 -0.46401635]
 [-0.046676   -0.79349205  0.60678804]] 
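
As a quick check of the decomposition, the original matrix can be rebuilt from the three factors. This is a sketch for the 2x3 matrix used above; because there are only two singular values, only the first two rows of Vh take part in the product.

```python
import numpy as np

a = np.array([[1, 3, 4], [5, 2, 3]], float)
U, s, Vh = np.linalg.svd(a)

# U * s scales the columns of U by the singular values (same as U @ np.diag(s))
reconstructed = (U * s) @ Vh[:2, :]
print(np.allclose(reconstructed, a))   # True
```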



## Set Operations:


In [15]:
x1 = np.array(['John','Stacy','Ron'])
x2 = np.array(['Don','Mat','Ron'])
print(x1, x2)

['John' 'Stacy' 'Ron'] ['Don' 'Mat' 'Ron']


In [16]:
print( np.intersect1d(x1,x2))

['Ron']


In [17]:
print( np.union1d(x1,x2))

['Don' 'John' 'Mat' 'Ron' 'Stacy']


In [19]:
print( np.setdiff1d(x1,x2)) # elements in x1 and not in x2

['John' 'Stacy']


In [None]:
print( np.in1d(x1,x2))   # Boolean mask: True where an element of x1 also appears in x2
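The Boolean result of a membership test can be used directly as a mask. The sketch below uses `np.isin`, which performs the same element-wise test as `np.in1d` while preserving the shape of its first argument; `mask` is an illustrative name.

```python
import numpy as np

x1 = np.array(['John', 'Stacy', 'Ron'])
x2 = np.array(['Don', 'Mat', 'Ron'])

mask = np.isin(x1, x2)   # True where the element of x1 occurs in x2
print(mask)              # [False False  True]
print(x1[mask])          # ['Ron']
print(x1[~mask])         # ['John' 'Stacy'] (original order kept; setdiff1d sorts instead)
```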

It is possible to test whether values are NaN ("not a number") or finite:

In [20]:
a = np.array([1, np.nan, np.inf])
a

array([ 1., nan, inf])

In [21]:
np.isnan(a)

array([False,  True, False])

In [22]:
np.isfinite(a)

array([ True, False, False])

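NumPy also supports element-wise comparisons and logical operations. These produce Boolean arrays that can be used as masks to select elements: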

In [23]:
a = np.array([[6, 4], [5, 9]])

In [None]:
a>=5

In [24]:
a[a>=5]  # Returns the elements greater than or equal to 5


# Another way: store the Boolean mask first, then index with it
indx = (a>=5)

a[indx]


array([6, 5, 9])

In [None]:
# Logical AND: select the elements greater than 5 and less than 9

a[np.logical_and(a>5, a<9)]
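The same selection can be written with the element-wise operators `&` and `|`. A minimal sketch on the same array follows; the names `both`, `same`, and `either` are illustrative.

```python
import numpy as np

a = np.array([[6, 4], [5, 9]])

both = a[np.logical_and(a > 5, a < 9)]
same = a[(a > 5) & (a < 9)]     # & is the element-wise shorthand; the parentheses are required
either = a[(a < 5) | (a > 8)]   # element-wise OR
print(both, same, either)       # [6] [6] [4 9]
```

The parentheses matter because `&` and `|` bind more tightly than the comparison operators in Python.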

For multidimensional arrays, we pass one one-dimensional integer array per axis inside the selection brackets. The selection arrays are then traversed in parallel: the first element returned takes its first-axis index from the first member of the first selection array and its second-axis index from the first member of the second selection array, the second element uses the second members, and so on.

In [29]:
a = np.array([[1, 4], [9, 16]], float)
print(a)
indx_row = np.array([0, 1], int)
indx_col = np.array([0, 1], int)
a[indx_row, indx_col]

[[ 1.  4.]
 [ 9. 16.]]


array([ 1., 16.])

In [30]:
a[[0,0,1], [0,1,0]]  # fancy (integer-array) indexing: picks a[0,0], a[0,1], a[1,0]

array([1., 4., 9.])

Arrays that do not match in the number of dimensions are broadcast by NumPy to perform mathematical operations. This often means that the smaller array is repeated as necessary to perform the operation indicated.
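A minimal sketch of broadcasting, assuming a two-dimensional array `a` of shape (3, 2) and a one-dimensional array `b` of shape (2,); both arrays and the name `result` are made up for illustration:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]], float)   # shape (3, 2)
b = np.array([-1, 3], float)                    # shape (2,)

# b is treated as if it were repeated for each of the three rows of a
result = a + b
print(result)
# [[0. 5.]
#  [2. 7.]
#  [4. 9.]]
```

No copy of `b` is actually made; NumPy only behaves as if the smaller array had been stretched to the larger shape.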

An error is raised if the array shapes are not compatible for broadcasting:

In [None]:
a = np.array([1,2,3], float)
b = np.array([4,5], float)
b+a
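The cell above stops with an exception. A short sketch of catching it shows which exception type to expect (`ValueError`); the flag `raised` is an illustrative name.

```python
import numpy as np

a = np.array([1, 2, 3], float)
b = np.array([4, 5], float)

try:
    b + a                      # shapes (2,) and (3,) cannot be broadcast together
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```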

#### Save a NumPy array to a CSV file using numpy.savetxt()

In [None]:
'''
Save 1D and 2D NumPy arrays to a CSV file, with or without a header and footer.

numpy.savetxt()


numpy.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None)


Arguments:

fname : Filename or file handle to write to.

X : 1D or 2D NumPy array (to be saved)

fmt : A format string, or a sequence of format strings, used when writing the elements to the file.
    If a single format such as '%d' is specified, it is applied to all elements.
    For 2D arrays, a list of formats may be given, one per column. (Optional)

delimiter : String or character to be used as element separator (Optional)

newline : String or character to be used as line separator (Optional)

header : String to be written at the beginning of the file.

footer : String to be written at the end of the file.

comments : String prepended to the header and footer lines to mark them as comments; default is '# '.

'''

# Create a NumPy array from a list of numbers
arr = np.array([6, 1, 4, 2, 18, 9, 3, 4, 2, 8, 11])

np.savetxt('filenm.csv',[arr], delimiter = ',', fmt = '%d')

!ls -lh filenm.csv   # Check that the file is there

!cat filenm.csv   # To show the contents of the file


'''
NOTE: A 1D array passed directly to numpy.savetxt() is written one element per line,
so the comma delimiter never appears in the file. Wrapping the array in a list, i.e. [arr],
makes savetxt treat it as a single row, so the elements are written on one line separated by commas.

'''


'''
Save 1D Numpy array to csv file with Header and Footer
'''

np.savetxt('filenm.csv', [arr], delimiter=',', fmt='%d' , header='A Sample 1D Numpy Array :: Header', footer='This is footer')
!cat filenm.csv 

'''
Save 2D Numpy array to CSV File

'''

# Create a 2D NumPy array from a list of lists
arr2D = np.array([[11, 12, 13, 22], [21, 7, 23, 14], [31, 10, 33, 7]])

np.savetxt('filenm2.csv', arr2D, delimiter=',', fmt='%d')
!cat filenm2.csv

In [None]:
!ls -lh filenm.csv
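A saved file can be read back with `np.loadtxt`. The sketch below writes the 2D array to a temporary directory, so it does not touch `filenm.csv` or `filenm2.csv`, and checks that the round trip reproduces the data. The file name `roundtrip.csv` and the header text are hypothetical.

```python
import os
import tempfile

import numpy as np

arr2D = np.array([[11, 12, 13, 22], [21, 7, 23, 14], [31, 10, 33, 7]])

# Write to a temporary directory so no file is left behind
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'roundtrip.csv')   # hypothetical file name
    np.savetxt(path, arr2D, delimiter=',', fmt='%d',
               header='A Sample 2D Numpy Array :: Header')
    # Lines starting with the comment marker '#' are skipped by loadtxt
    loaded = np.loadtxt(path, delimiter=',', dtype=int)

print(np.array_equal(arr2D, loaded))   # True
```

Passing the same `delimiter` to both functions is what makes the round trip work; `loadtxt` splits on whitespace by default.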

NumPy also includes generators for many other distributions, including the Beta, binomial, chi-square, Dirichlet, exponential, F, Gamma, geometric, Gumbel, hypergeometric, Laplace, logistic, log-normal, logarithmic, multinomial, multivariate normal, negative binomial, noncentral chi-square, noncentral F, normal, Pareto, Poisson, power, Rayleigh, Cauchy, Student's t, triangular, von Mises, Wald, Weibull, and Zipf distributions. Here we only give examples for two of these.
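Most of these generators follow the same calling pattern: the distribution parameters come first, followed by an optional `size`. A minimal sketch with two of the listed distributions, Poisson and Gamma, chosen here only for illustration; the parameter values and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)   # seeded so the draws are repeatable

poisson_draws = rng.poisson(lam=6.0, size=5)            # non-negative integers
gamma_draws = rng.gamma(shape=2.0, scale=1.0, size=5)   # positive floats

print(poisson_draws)
print(gamma_draws)
```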


NumPy contains many other built-in functions that we have not covered here. In particular, there are routines for discrete Fourier transforms, more complex linear algebra operations, size / shape / type testing of arrays, splitting and joining arrays, histograms, creating arrays of numbers spaced in various ways, creating and evaluating functions on grid arrays, treating arrays with special (NaN, Inf) values, set operations, creating various kinds of special matrices, and evaluating special mathematical functions (e.g., Bessel functions). You are encouraged to consult the NumPy documentation at http://docs.scipy.org/doc/ for more details.