# NumPy Library

The fundamental data type is the ndarray (N-dimensional array).
<br> All elements of a NumPy array have the same type.

In [1]:
import numpy as np

The np.array() function creates a NumPy ndarray from a list or another sequence.

In [2]:
data1=[6,7.5,8,0,1]

In [4]:
arr1=np.array(data1)

In [5]:
arr1

array([ 6. ,  7.5,  8. ,  0. ,  1. ])

In [6]:
arr1.dtype

dtype('float64')

In [7]:
data2=[[1,3,5],[2,4,8]]

In [8]:
arr2=np.array(data2)

In [9]:
arr2.dtype

dtype('int64')

Unless explicitly specified (more on this later), np.array tries to infer a good data type for the array that it creates
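As a small sketch of this inference (the variable names here are only illustrative), mixing value types in the input list changes the dtype that np.array picks:

```python
import numpy as np

# All integers -> an integer dtype (the exact width depends on the platform).
ints = np.array([1, 2, 3])

# One float among the integers -> the whole array is upcast to float64.
mixed = np.array([1, 2.5, 3])

# A string among the numbers -> every element becomes a Unicode string.
text = np.array([1, "a", 3])

print(ints.dtype.kind)   # i  (signed integer)
print(mixed.dtype)       # float64
print(text.dtype.kind)   # U  (Unicode string)
```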

np.zeros() and np.ones() create arrays of 0s or 1s, respectively, with a given length or shape.

In [10]:
np.zeros(10)

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

In [12]:
np.zeros((4,4))

array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

You can define data types as you create ndarrays

In [14]:
arr1=np.array([1,2,3],dtype=float)

In [15]:
arr2=np.array([1,2,3],dtype=int)

In [16]:
arr1.dtype

dtype('float64')

In [17]:
arr2.dtype

dtype('int64')

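### Changing the data type with the astype method

You can change the type of the data if you wish. astype returns a new array with the requested type and leaves the original untouched, which is why the result is assigned back to arr below.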

In [18]:
arr=np.array([1,2,3,4,5])

In [19]:
arr.dtype

dtype('int64')

In [20]:
arr=arr.astype(float)

In [21]:
arr.dtype

dtype('float64')

### np.arange() works like Python's range function but returns an array over the given interval

In [22]:
arr=np.arange(5,10)

In [23]:
arr

array([5, 6, 7, 8, 9])
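Like range, np.arange also accepts a step. A short sketch with illustrative variable names:

```python
import numpy as np

evens = np.arange(0, 10, 2)      # start, stop (excluded), step
countdown = np.arange(5, 0, -1)  # a negative step counts down
halves = np.arange(0, 2, 0.5)    # unlike range, the step may be a float

print(evens)      # [0 2 4 6 8]
print(countdown)  # [5 4 3 2 1]
print(halves)     # [0.  0.5 1.  1.5]
```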

The ndim attribute of an array returns the number of array dimensions.
<br> The shape attribute returns a tuple with the size of the array along each dimension.

In [24]:
x=np.array([[1,2,3],[4,5,6]])

In [25]:
x.shape

(2, 3)

In [26]:
x.ndim

2

## np.append(array,values,axis)
<br> This function adds values at the end of an input array.
<br> The append operation is not in place: a new array is allocated and returned.

In [27]:
a=np.array([[1,2,3],[4,5,6]])

In [28]:
a

array([[1, 2, 3],
       [4, 5, 6]])

In [29]:
np.append(a,[[4,6,8]],axis=0)

array([[1, 2, 3],
       [4, 5, 6],
       [4, 6, 8]])

In [30]:
a

array([[1, 2, 3],
       [4, 5, 6]])

In [31]:
np.append(a,[[4],[8]],axis=1)

array([[1, 2, 3, 4],
       [4, 5, 6, 8]])

In [32]:
np.append(a,[4,6,8],axis=0)

ValueError: all the input arrays must have same number of dimensions

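We got an error for the call above because the dimensions did not match: a is two-dimensional, while [4,6,8] is one-dimensional. When axis is given, the values must have the same number of dimensions as the array.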

In [33]:
array=np.array([2,4,4])

In [34]:
array.shape

(3,)

In [35]:
array.ndim

1

In [36]:
array=np.array([[2,4,4]])

In [37]:
array.shape

(1, 3)

In [38]:
array.ndim

2
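Putting the two observations together, one possible fix for the failing append is to give the row a second dimension first. This is only a sketch; row and row_2d are names chosen for illustration:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([4, 6, 8])      # shape (3,), one-dimensional

# Reshape the row to (1, 3) so that it has as many dimensions as `a`.
row_2d = row.reshape(1, -1)
appended = np.append(a, row_2d, axis=0)

print(row_2d.shape)    # (1, 3)
print(appended.shape)  # (3, 3)
```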

## Operations between Arrays and Scalars

In [4]:
a=np.arange(10);a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [58]:
a*5

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45])

If we wanted to do the same operation using a list:

In [59]:
l=list(range(10))

In [60]:
[x*5 for x in l]

[0, 5, 10, 15, 20, 25, 30, 35, 40, 45]

In [61]:
b=np.arange(10)

In [62]:
a+b

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [10]:
mylist=[[1,2,3],[4,5,6],[7,8,9]];mylist

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Let's multiply every element of mylist by 3:

In [13]:
for i in range(len(mylist)):
    for j in range(len(mylist[0])):
        mylist[i][j]=mylist[i][j]*3

In [14]:
mylist

[[3, 6, 9], [12, 15, 18], [21, 24, 27]]

Do the same operation using NumPy.
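One way to do this, sketched here starting again from the original values of mylist: convert the nested list to an ndarray and multiply by the scalar, with no explicit loops.

```python
import numpy as np

mylist = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# The scalar multiplication is applied to every element of the array.
arr = np.array(mylist) * 3
print(arr)
```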

## Ndarray indexing with slices

In [3]:
a=np.arange(5,11);a

array([ 5,  6,  7,  8,  9, 10])

In [70]:
a[0],a[-1]

(5, 10)

In [71]:
a[0:3]

array([5, 6, 7])

In [74]:
arr2d=np.array([[1,2,3],[4,5,6],[7,8,9]])

In [75]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [76]:
arr2d[1,1]

5

In [77]:
arr2d[1:,1]

array([5, 8])

In [78]:
arr2d[1:,1:]

array([[5, 6],
       [8, 9]])

In [79]:
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [80]:
arr2d[:,1:]

array([[2, 3],
       [5, 6],
       [8, 9]])

### An important first distinction from lists is that array slices are views on the original array.

This means that the data is not copied, and any modifications to the view will be reflected in the source array.

In [15]:
arr=np.arange(10);arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [16]:
arr_slice=arr[5:8]

In [17]:
arr_slice

array([5, 6, 7])

In [18]:
arr_slice[1]=1000

In [19]:
arr_slice

array([   5, 1000,    7])

Let's check our original array:

In [20]:
arr

array([   0,    1,    2,    3,    4,    5, 1000,    7,    8,    9])

Let's set all the elements of arr_slice to 64:

In [21]:
arr_slice[:]=64

In [22]:
arr_slice

array([64, 64, 64])

In [23]:
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

### This feature is quite useful for processing large data sets.
When we work with large datasets, we can access and process pieces of these datasets
without the need to copy the underlying data buffer. 

## Creating copies of arrays 

• Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray.
<br>• This can be most easily done with the copy() method:

In [24]:
a=np.arange(20).reshape(4,5);a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [25]:
a_sub=a[:2,2:].copy();a_sub

array([[2, 3, 4],
       [7, 8, 9]])

In [26]:
a_sub[0,0]=100;a_sub

array([[100,   3,   4],
       [  7,   8,   9]])

Check the array a. It is unchanged, because a_sub is an independent copy:

In [27]:
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

Do the same without using the copy method. This time a_sub is a view, so the change shows up in a:

In [28]:
a_sub=a[:2,2:];a_sub

array([[2, 3, 4],
       [7, 8, 9]])

In [29]:
a_sub[0,0]=100;a_sub

array([[100,   3,   4],
       [  7,   8,   9]])

In [30]:
a

array([[  0,   1, 100,   3,   4],
       [  5,   6,   7,   8,   9],
       [ 10,  11,  12,  13,  14],
       [ 15,  16,  17,  18,  19]])
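If you are unsure whether an array is a view or a copy, np.shares_memory can tell you. A minimal sketch, with view and copied as illustrative names:

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]           # basic slicing returns a view
copied = arr[5:8].copy()  # an explicit, independent copy

print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, copied))  # False
print(view.base is arr)               # True: the view refers back to arr
```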

# Comparisons, Masks and Boolean Logic

Boolean masks are useful when we want to extract, modify, count, or manipulate values in an array based on some criterion (comparison).
<br> • First we will see comparison operators.

In [32]:
x=np.arange(6);x

array([0, 1, 2, 3, 4, 5])

In [33]:
x<3

array([ True,  True,  True, False, False, False], dtype=bool)

In [34]:
x>=3

array([False, False, False,  True,  True,  True], dtype=bool)

In [35]:
result=_

In [36]:
result.dtype

dtype('bool')

In [37]:
y=np.arange(20).reshape(4,5);y

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [38]:
y>5

array([[False, False, False, False, False],
       [False,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]], dtype=bool)

## Using arrays as masks to select particular subsets of the data.
• Similar operations are available in pandas as well.

In [39]:
y[y>5]

array([ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

### y[y>5] returns all the values at positions where the mask array is True
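A mask can also be used on the left-hand side of an assignment to modify the selected values. A small sketch:

```python
import numpy as np

y = np.arange(20).reshape(4, 5)

# Only the positions where the mask is True are overwritten.
y[y > 5] = 0
print(y)
```

After the assignment, only the values 0 to 5 remain; every larger value has been replaced by 0.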

### Sum is often used as a means of counting True values in a boolean array:

In [45]:
x=np.arange(20)

In [46]:
(x<5).sum()

5

In [47]:
x<5

array([ True,  True,  True,  True,  True, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False], dtype=bool)

## We can apply functions to the data selected by Boolean arrays.

np.sum(x<5) counts the True values in the mask, while np.sum(x[x<5]) adds up the selected values.

In [48]:
np.sum(x<5)

5

In [49]:
np.sum(x[x<5])

10

In [50]:
x[x<5]

array([0, 1, 2, 3, 4])

## Boolean operators compute truth values (True or False) element-wise

In [52]:
x=np.arange(6);x

array([0, 1, 2, 3, 4, 5])

In [53]:
x>1

array([False, False,  True,  True,  True,  True], dtype=bool)

In [54]:
x<4

array([ True,  True,  True,  True, False, False], dtype=bool)

In [55]:
(x>1) & (x<4)

array([False, False,  True,  True, False, False], dtype=bool)

In [56]:
x>1 & x<4

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

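### When we remove the parentheses, the & operator takes priority and the expression becomes x > (1 & x) < 4

This is a chained comparison, which Python evaluates with an implicit and between the two parts, hence the error.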

## Using the Keywords and/or Versus the Operators &/|

• The keywords and and or are often confused with the operators & and |.
<br>• The difference is this: and and or evaluate the truth or falsehood of an entire object, while & and | operate on the bits within each object (element-wise for Boolean arrays).
<br>• When you use and or or, you ask Python to treat the object as a single Boolean entity.

In [57]:
x=np.arange(10);x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [58]:
(x>4) & (x<8)

array([False, False, False, False, False,  True,  True,  True, False, False], dtype=bool)

In [59]:
(x>4) and (x<8)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
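The other element-wise operators work the same way, and any() or all() (the methods suggested by the error message) reduce a Boolean array to a single truth value. A brief sketch with illustrative names:

```python
import numpy as np

x = np.arange(10)

either = (x < 2) | (x > 7)   # element-wise OR
neither = ~either            # element-wise NOT

print(either.any())   # True: at least one element is True
print(either.all())   # False: not every element is True
print(x[neither])     # [2 3 4 5 6 7]
```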

## Fancy indexing 

To select out a subset of the rows in a particular order, you can simply pass a list or ndarray of integers specifying the desired order

In [4]:
x=np.ones((8,4));x

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [14]:
y=np.arange(1,9).reshape(-1,1);y

array([[1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8]])

In [15]:
z=x*y;z

array([[1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.],
       [8., 8., 8., 8.]])

In [16]:
z[[4,3,0]]

array([[5., 5., 5., 5.],
       [4., 4., 4., 4.],
       [1., 1., 1., 1.]])

Try to select columns as well:

In [22]:
z[[4,3,0],[1,2,3]]

array([5., 4., 1.])

Note that this returns the elements at the paired positions (4,1), (3,2) and (0,3), not a submatrix.

### We can select the submatrix formed by a list of rows and a list of columns with the np.ix_ function

In [24]:
z[np.ix_([4,3,0],[1,2,3])]

array([[5., 5., 5.],
       [4., 4., 4.],
       [1., 1., 1.]])
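To make the difference concrete, here is a sketch that compares the paired selection, np.ix_, and a two-step alternative (first rows, then columns). The names rows, cols, paired, block and chained are illustrative:

```python
import numpy as np

z = np.ones((8, 4)) * np.arange(1, 9).reshape(-1, 1)
rows, cols = [4, 3, 0], [1, 2, 3]

paired = z[rows, cols]          # elements (4,1), (3,2), (0,3)
block = z[np.ix_(rows, cols)]   # every combination of row and column
chained = z[rows][:, cols]      # same block, built in two steps

print(paired.shape)                    # (3,)
print(block.shape)                     # (3, 3)
print(np.array_equal(block, chained))  # True
```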

## Universal Functions: Fast Element-wise Array Functions

A universal function, or ufunc, is a function that performs element-wise operations on the data in an ndarray.

In [25]:
a=np.arange(10)

In [26]:
np.sqrt(a)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [27]:
np.exp(a)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

## Expressing Conditional Logic as Array Operations

The numpy.where function is a vectorized version of the expression x if condition else y.

In [28]:
a=np.random.randint(0,10,(5,5));a

array([[9, 4, 3, 0, 6],
       [8, 5, 2, 1, 2],
       [8, 9, 2, 7, 8],
       [6, 4, 8, 0, 1],
       [5, 7, 3, 7, 3]])

In [30]:
np.where(a>5,1,2)

array([[1, 2, 2, 2, 1],
       [1, 2, 2, 2, 2],
       [1, 1, 2, 1, 1],
       [1, 2, 1, 2, 2],
       [2, 1, 2, 1, 2]])
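The second and third arguments can also be arrays. The sketch below keeps the values greater than 5 and zeroes the rest, next to the equivalent pure-Python conditional expression (a smaller array is used here so the result is easy to read):

```python
import numpy as np

a = np.array([[9, 4, 3], [8, 5, 2]])

# Vectorized: take the value from `a` where the condition holds, else 0.
vectorized = np.where(a > 5, a, 0)

# The same logic with the Python expression "x if condition else y".
looped = [[v if v > 5 else 0 for v in row] for row in a]

print(vectorized)
print(vectorized.tolist() == looped)  # True
```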

## If only the condition is given, np.where returns the tuple condition.nonzero(): the indices where the condition is True.

In [41]:
a=np.arange(10)

In [42]:
result=np.where(a>7)

In [43]:
result

(array([8, 9]),)

In [44]:
a[result]

array([8, 9])

# Mathematical and Statistical Methods

A set of mathematical functions which compute statistics about an entire array or about the data along an axis are accessible as array methods.

In [49]:
x=np.ones((4,4));x

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

Let's multiply the second row by 2, the third row by 3 and the fourth row by 4.

Type x.mean() and see what happens.

In [None]:
x.mean()

x.mean(axis=0) averages along the rows (down each column), giving one mean per column.

In [None]:
x.mean(axis=0)

x.mean(axis=1) averages along the columns (across each row), giving one mean per row.

x.sum() works similarly; try x.sum(axis=0) and x.sum(axis=1).
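One possible way to work through this exercise is sketched below. The row scaling uses broadcasting with a column of factors, which is just one of several options:

```python
import numpy as np

x = np.ones((4, 4))

# Scale row i by (i + 1): a (4, 1) column broadcasts across each row.
x = x * np.arange(1, 5).reshape(-1, 1)

print(x.mean())        # 2.5
print(x.mean(axis=0))  # [2.5 2.5 2.5 2.5]  one value per column
print(x.mean(axis=1))  # [1. 2. 3. 4.]      one value per row
print(x.sum(axis=0))   # [10. 10. 10. 10.]
print(x.sum(axis=1))   # [ 4.  8. 12. 16.]
```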

# Dealing with NA values (missing data)

np.nan is a floating-point value, so it cannot be stored in an integer array.

In [51]:
a=np.arange(20).reshape(4,5);a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [52]:
a[0,0]=np.nan

ValueError: cannot convert float NaN to integer

In [53]:
a.dtype

dtype('int64')

In [54]:
a=a.astype(float)

In [55]:
a[0,0]=np.nan;a

array([[nan,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.]])

# NaN-safe methods

In [56]:
a

array([[nan,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.]])

In [58]:
np.sum(a,axis=0)

array([nan, 34., 38., 42., 46.])

In [59]:
np.nansum(a,axis=0)

array([30., 34., 38., 42., 46.])
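np.nansum is not the only NaN-safe function. As a short sketch, np.nanmean and np.nanmax behave the same way, and np.isnan can be used to locate the missing values:

```python
import numpy as np

a = np.arange(20).reshape(4, 5).astype(float)
a[0, 0] = np.nan

print(np.isnan(a).sum())          # 1     one missing value
print(np.mean(a, axis=0)[0])      # nan   the NaN propagates
print(np.nanmean(a, axis=0)[0])   # 10.0  mean of 5, 10 and 15
print(np.nanmax(a))               # 19.0
```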


# Sorting

array.sort() sorts an array in place.
<br>np.sort(array) returns a sorted copy of an array instead of modifying the array in place. <br>We can sort values along rows or columns with the axis argument.

In [70]:
x=np.array([[4, 5, 8, 4, 0],
       [3, 7, 8, 2, 5],
       [8, 1, 1, 2, 1],
       [4, 3, 1, 2, 4],
       [6, 7, 0, 2, 9]])

In [71]:
x

array([[4, 5, 8, 4, 0],
       [3, 7, 8, 2, 5],
       [8, 1, 1, 2, 1],
       [4, 3, 1, 2, 4],
       [6, 7, 0, 2, 9]])

In [72]:
x.sort(0);x

array([[3, 1, 0, 2, 0],
       [4, 3, 1, 2, 1],
       [4, 5, 1, 2, 4],
       [6, 7, 8, 2, 5],
       [8, 7, 8, 4, 9]])

In [73]:
x=np.array([[4, 5, 8, 4, 0],
       [3, 7, 8, 2, 5],
       [8, 1, 1, 2, 1],
       [4, 3, 1, 2, 4],
       [6, 7, 0, 2, 9]])

In [74]:
np.sort(x,axis=0)

array([[3, 1, 0, 2, 0],
       [4, 3, 1, 2, 1],
       [4, 5, 1, 2, 4],
       [6, 7, 8, 2, 5],
       [8, 7, 8, 4, 9]])

In [75]:
x

array([[4, 5, 8, 4, 0],
       [3, 7, 8, 2, 5],
       [8, 1, 1, 2, 1],
       [4, 3, 1, 2, 4],
       [6, 7, 0, 2, 9]])
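The examples above sort along axis 0, that is, within each column. As a sketch, sorting within each row uses axis=1, and reversing the sorted result gives descending order (only the first two rows of x are used here to keep the output short):

```python
import numpy as np

x = np.array([[4, 5, 8, 4, 0],
              [3, 7, 8, 2, 5]])

by_row = np.sort(x, axis=1)    # sort the values within each row
descending = by_row[:, ::-1]   # reverse each sorted row

print(by_row)
print(descending)
```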

## Indirect Sorts: argsort

argsort returns the indices that would sort an array.

In [76]:
v=np.array([4,0,1,5,2])

In [77]:
t=np.argsort(v);t

array([1, 2, 4, 0, 3])

In [78]:
v[t]

array([0, 1, 2, 4, 5])
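The indices are useful when a second array has to follow the same order. In this sketch the names array is a hypothetical set of labels, one for each value in scores:

```python
import numpy as np

scores = np.array([4, 0, 1, 5, 2])
names = np.array(["d", "a", "b", "e", "c"])  # hypothetical labels

order = np.argsort(scores)
print(names[order])      # ['a' 'b' 'c' 'd' 'e']

# Reversing the order gives the indices of the largest values first.
top2 = order[::-1][:2]
print(scores[top2])      # [5 4]
```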

## Saving and Loading Text Files

In [81]:
x=np.random.randint(0,100,(10,5));x

array([[93, 21, 25, 43, 98],
       [87, 53, 11, 25, 96],
       [ 7, 36, 77,  9, 16],
       [90, 96, 80, 77, 57],
       [35, 54,  4, 20, 89],
       [15, 52,  0, 85, 85],
       [80, 44, 96, 46, 76],
       [71,  6, 63,  4, 99],
       [35, 50, 50, 11,  9],
       [50, 53, 38, 96, 29]])

In [82]:
np.savetxt("random.txt",x,delimiter=",")

Check out the random.txt file in your working directory.

The default format is fmt='%.18e', so the values are written in scientific notation.

You can change the format as you wish:

In [84]:
np.savetxt("random.txt",x,delimiter=",",fmt="%s")
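To read the file back, np.loadtxt can be used with the same delimiter. The sketch below writes to a temporary directory instead of the working directory and uses fmt="%d"; both are choices made for this example only:

```python
import os
import tempfile

import numpy as np

x = np.random.randint(0, 100, (10, 5))

# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "random.txt")
    np.savetxt(path, x, delimiter=",", fmt="%d")
    loaded = np.loadtxt(path, delimiter=",", dtype=int)

print(np.array_equal(x, loaded))  # True
```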

# Example

In this example, we will investigate the sales profile of a certain product across different customers.
<br> Suppose we have 1000 customers and the sales amount of the product for 30 days.
<br> First, construct a two-dimensional array for this hypothetical example, with one row per day and one column per customer.

In [85]:
data=np.random.randint(0,10,(30,1000));data

array([[7, 8, 5, ..., 3, 1, 9],
       [9, 0, 5, ..., 8, 5, 4],
       [3, 0, 4, ..., 6, 6, 5],
       ...,
       [5, 0, 3, ..., 1, 2, 4],
       [7, 8, 0, ..., 9, 0, 5],
       [2, 1, 2, ..., 1, 2, 4]])

## Calculate the total amount of sales for each day.

In [90]:
%time totalsales=data.sum(axis=1)

CPU times: user 94 µs, sys: 19 µs, total: 113 µs
Wall time: 81.1 µs


## Let’s calculate the total sales with explicit Python loops instead of the NumPy sum:

In [87]:
def totaldata(data):
    # Sum each row (one day) element by element with Python loops.
    m,n=data.shape
    total_sales=[]
    for i in range(m):
        sales=0
        for j in range(n):
            sales += data[i,j]
        total_sales.append(sales)
    return total_sales


In [89]:
%time totalsales=totaldata(data)

CPU times: user 7.92 ms, sys: 103 µs, total: 8.02 ms
Wall time: 8.01 ms


## Let’s count the number of customers who did not buy the product on each day.
• We will simply count the zeros in every row.

In [102]:
%time zerosales=np.where(data==0,True,False).sum(axis=1)

CPU times: user 423 µs, sys: 230 µs, total: 653 µs
Wall time: 399 µs


In [103]:
zerosales

array([ 86, 119, 116, 108,  94,  92,  86,  85,  96, 104, 110, 104, 103,
       115, 119, 102,  99, 110,  99, 108, 107, 101,  99, 105,  99,  96,
       107, 100, 106,  98])
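Since data==0 is already a Boolean array, the np.where call is not strictly needed. The sketch below compares it with two shorter alternatives on a seeded random array (the seed and the generator are choices made for this example, so the counts differ from those above):

```python
import numpy as np

rng = np.random.default_rng(0)               # seeded for reproducibility
data = rng.integers(0, 10, size=(30, 1000))  # 30 days x 1000 customers

via_where = np.where(data == 0, True, False).sum(axis=1)
via_mask = (data == 0).sum(axis=1)           # sum the mask directly
via_count = np.count_nonzero(data == 0, axis=1)

print(np.array_equal(via_where, via_mask))   # True
print(np.array_equal(via_mask, via_count))   # True
```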

In [100]:
def countzeros(data):
    # Count the zero entries in each row (one day) with Python loops.
    m,n=data.shape
    zeros=[]
    for i in range(m):
        zero=0
        for j in range(n):
            if data[i,j] == 0:
                zero += 1
        zeros.append(zero)
    return zeros


In [104]:
%time zerosales=countzeros(data)

CPU times: user 11.8 ms, sys: 238 µs, total: 12 ms
Wall time: 12.2 ms


In [105]:
zerosales

[86,
 119,
 116,
 108,
 94,
 92,
 86,
 85,
 96,
 104,
 110,
 104,
 103,
 115,
 119,
 102,
 99,
 110,
 99,
 108,
 107,
 101,
 99,
 105,
 99,
 96,
 107,
 100,
 106,
 98]

## Let’s find the customer whose orders have the highest standard deviation.

In [108]:
customer_std=np.std(data,axis=0)

In [109]:
customer_std.shape

(1000,)

In [110]:
indices=np.argsort(customer_std)

In [111]:
indices[-1]

454

The customer with index 454 has the largest sales fluctuations. Note that you will get a different result, since the data array is constructed randomly!
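When only the single largest value is needed, np.argmax is an alternative to a full argsort. A sketch on a seeded random array (the seed is an assumption made for reproducibility; with ties, argmax and argsort may pick different but equally valid indices):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 10, size=(30, 1000))

customer_std = np.std(data, axis=0)

# Index of a customer with the largest standard deviation.
most_variable = np.argmax(customer_std)

print(customer_std[most_variable] == customer_std.max())  # True
```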