#### What is NumPy?

NumPy, short for "Numerical Python", is a Python library for fast, efficient operations on arrays of homogeneous data. Its central feature is the array object class, the ndarray. Arrays are similar to Python lists, except that every element of an array must have the same type (a list can hold elements of different types), typically a numeric type such as float or int. Arrays make operations on large amounts of numeric data very fast and are generally much more efficient than lists.
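A minimal sketch of the list/array difference described above. The variable names are illustrative only. It shows that an operator such as `*` means something different for lists and arrays, and that an array holds a single dtype:

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# On a list, * repeats the list; on an array, * multiplies every element
print(values * 2)   # [1, 2, 3, 1, 2, 3]
print(arr * 2)      # [2 4 6]

# A list keeps each element's own type, while an array stores one dtype,
# so mixing an int and a float upcasts every element to float
mixed = np.array([1, 2.5])
print(mixed.dtype)  # float64
```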

#### Creating NumPy arrays

The syntax for creating a NumPy array is:

    numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)

where the arguments are:

    object: an array, any object exposing the array interface, or a (nested) sequence such as a list
    dtype: desired data type of the array (optional; inferred from the data if omitted)
    copy: optional; by default (True) the data is copied
    order: memory layout: 'C' (row-major), 'F' (column-major), 'A' or 'K' (default 'K')
    subok: by default (False) the returned array is forced to be a base-class ndarray; if True, subclasses are passed through
    ndmin: specifies the minimum number of dimensions of the resulting array
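A short sketch of three of these parameters in action (`dtype`, `ndmin` and `copy`); the variable names are illustrative only:

```python
import numpy as np

# dtype: force the element type instead of letting NumPy infer it
f = np.array([1, 2, 3], dtype=float)
print(f.dtype)   # float64

# ndmin: prepend axes of length 1 until there are at least ndmin dimensions
m = np.array([1, 2, 3], ndmin=2)
print(m.shape)   # (1, 3)

# copy=True (the default) gives the new array its own memory,
# so changing it leaves the source untouched
src = np.array([1, 2, 3])
dup = np.array(src, copy=True)
dup[0] = 99
print(src)       # [1 2 3]
```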


In [1]:
#importing numpy library
import numpy as np

In [2]:
#creating 1D array
a = np.array([1,2,3,4])
#creating 2D array
b = np.array([[1,2,3,4],[5,6,7,8]])
print(a)
print(b)

[1 2 3 4]
[[1 2 3 4]
 [5 6 7 8]]


### Attributes of numpy arrays

In [3]:
# Shape: returns a tuple consisting of array dimensions
print(a.shape)
print(b.shape)

(4,)
(2, 4)


In [4]:
# ndim: the number of dimensions (axes) of the array
print(a.ndim)
print(b.ndim)

1
2


In [5]:
#Size: total number of items in the array
print(a.size)
print(b.size)

4
8


In [6]:
# dtype: the data type of the elements; the default integer dtype depends on the platform (int32 here, int64 on most Linux and macOS systems)
print(a.dtype)
print(b.dtype)

int32
int32


In [7]:
# itemsize: the size in bytes of ONE element (the total memory used by the array is given by nbytes)
print(a.itemsize)
print(b.itemsize)

4
4
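To make the distinction concrete, the sketch below compares `itemsize` with `nbytes`. It pins the dtype to `int32` (an assumption chosen to match the outputs above) so the byte counts do not depend on the platform:

```python
import numpy as np

a = np.array([1, 2, 3, 4], dtype=np.int32)

# itemsize: bytes per element; nbytes: bytes for all elements together
print(a.itemsize)           # 4
print(a.nbytes)             # 16
print(a.size * a.itemsize)  # 16, the same as nbytes
```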


In [8]:
# creating an array of the first 10 natural numbers
# np.arange works like Python's built-in range function
# np.arange(start=0, stop, step=1); the stop value is excluded
x = np.arange(1,11)
print(x)

[ 1  2  3  4  5  6  7  8  9 10]


In [9]:
# reshape returns the same data arranged in a new shape; the number of elements must stay the same (5 x 2 = 10)
x.reshape(5,2)

array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10]])
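Two details of `reshape` that are easy to miss, sketched below: one dimension can be given as `-1` and NumPy infers it, and `reshape` does not change the shape of the original array. An incompatible shape raises a `ValueError`:

```python
import numpy as np

x = np.arange(1, 11)

# -1 asks NumPy to infer that dimension from the total size (10 / 2 = 5)
print(x.reshape(2, -1).shape)  # (2, 5)

# reshape returns a new array object; x keeps its original shape
print(x.shape)                 # (10,)

# 3 x 4 = 12 does not match the 10 elements, so NumPy refuses
try:
    x.reshape(3, 4)
except ValueError as err:
    print("cannot reshape:", err)
```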

In [10]:
# a list with mixed types is converted to one common type; here every element becomes a string
lst = [1,2.3, "abc", True]
y = np.array(lst)
print(y)

['1' '2.3' 'abc' 'True']


### Techniques of creating arrays

In [11]:
# creating an "empty" array: np.empty allocates memory without initializing it, so the values are arbitrary leftovers
a1 = np.empty((3,2), dtype='int32')
print("a1:" ,a1)


a1: [[-1808162208         536]
 [          0           0]
 [          1           0]]


In [12]:
# creating an array with all zeroes
a2 = np.zeros((3,4), dtype='int32')
print("a2:" ,a2)


a2: [[0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]]


In [13]:
#creating an array with all ones
a3 = np.ones((2,4), dtype="int32")
print("a3:" ,a3)

a3: [[1 1 1 1]
 [1 1 1 1]]


In [14]:
#creating an array filled with a constant value
a4 = np.full((2,2), 7)
print("a4:" ,a4)

a4: [[7 7]
 [7 7]]


In [15]:
#creating a 2-D array with ones on diagonal and zeros elsewhere
a5 = np.eye(3)
print("a5:" ,a5)

a5: [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [16]:
# np.linspace(start, stop, num) returns num evenly spaced values from start to stop
# unlike arange, the stop value is included by default
a6 = np.linspace(1,3,5)
print("a6 : ", a6)

a6 :  [1.  1.5 2.  2.5 3. ]
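A small comparison sketch of `arange` and `linspace` over the same interval. With `arange` you choose the step and the stop is excluded; with `linspace` you choose the number of points and the stop is included unless `endpoint=False`. For non-integer steps, `linspace` is usually the safer choice, since floating-point steps in `arange` can make the number of elements hard to predict:

```python
import numpy as np

# arange: fixed step, stop excluded
by_step = np.arange(1, 3, 0.5)
print(by_step)     # [1.  1.5 2.  2.5]

# linspace: fixed number of points, stop included
by_count = np.linspace(1, 3, 5)
print(by_count)    # [1.  1.5 2.  2.5 3. ]

# endpoint=False drops the stop value, matching arange here
no_end = np.linspace(1, 3, 4, endpoint=False)
print(no_end)      # [1.  1.5 2.  2.5]
```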


#### Using random to generate arrays
We can also create new arrays by randomly sampling from an existing array, or fill an array with random numbers, using NumPy's random module.
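The cells below use the legacy `np.random.*` functions. NumPy also offers a newer Generator interface created with `np.random.default_rng`; the sketch below shows rough equivalents of the calls used in this section. The seed value 42 is an arbitrary assumption; fixing a seed is what makes the output reproducible:

```python
import numpy as np

# A seeded Generator produces the same "random" numbers on every run
rng = np.random.default_rng(seed=42)

normal = rng.standard_normal(5)                  # like np.random.randn(5)
ints = rng.integers(low=1, high=25, size=(2, 3))  # high excluded, like randint
uniform = rng.random((3, 4))                     # floats in [0, 1)
letters = rng.choice(['a', 'b'], size=6)         # like np.random.choice

# A second generator with the same seed reproduces the same sequence
rng2 = np.random.default_rng(seed=42)
print(np.array_equal(normal, rng2.standard_normal(5)))  # True
```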

In [17]:
# randn samples from the standard normal distribution; dtype='int' truncates each value toward zero
np.array(np.random.randn(5), dtype='int')

array([ 0, -1, -1,  0,  0])

In [18]:
# randn already returns an ndarray, so no np.array() wrapper is needed
np.random.randn(5)

array([ 0.83002409,  2.05216501,  1.61424837, -0.50473902,  1.08407034])

In [19]:
# random integers from low (inclusive) up to high (exclusive)
np.random.randint(high=25,low=1,size=(2,3))

array([[10, 15,  6],
       [ 6, 23, 14]])

In [20]:
np.random.randint(high=10,low=1,size=(2,))

array([6, 2])

In [21]:
# uniformly distributed floats in the half-open interval [0, 1)
np.random.random(size=(3,4))

array([[0.84910922, 0.02969037, 0.83822113, 0.94292869],
       [0.25335781, 0.4024361 , 0.43998216, 0.17139885],
       [0.1632956 , 0.38433853, 0.43934198, 0.92716153]])

In [22]:
np.random.choice(['a','b'],6)

array(['a', 'b', 'a', 'a', 'b', 'a'], dtype='<U1')

In [23]:
y=np.random.choice(['a','b'],1000)
print(y)

['b' 'b' 'b' 'b' 'b' 'b' 'b' 'b' 'a' 'b' 'a' 'a' 'b' 'a' 'b' 'b' 'a' 'b'
 'a' 'a' 'b' 'a' 'b' 'a' 'a' 'b' 'a' 'a' 'b' 'b' 'b' 'a' 'b' 'b' 'b' 'b'
 'b' 'b' 'a' 'a' 'a' 'b' 'b' 'b' 'b' 'b' 'b' 'a' 'a' 'b' 'b' 'a' 'a' 'a'
 ... (output truncated; 1000 values in total)

In [24]:
np.unique(y,return_counts=True)

(array(['a', 'b'], dtype='<U1'), array([494, 506], dtype=int64))

In [25]:
# By default, np.random.choice samples uniformly, so 'a' and 'b' each appear about half the time.
# Each individual draw is still random; only the overall proportions stay close to 50/50.
# The p parameter sets the probability of each option (the probabilities must sum to 1).
y=np.random.choice(['a','b'],1000,p=[0.8,0.2])
np.unique(y,return_counts=True)

(array(['a', 'b'], dtype='<U1'), array([796, 204], dtype=int64))

In [26]:
#creating an array with float type
np.array([1,2,3,4], dtype='float')

array([1., 2., 3., 4.])

In [27]:
#creating multi dimensional arrays
np.arange(10,20).reshape(2,5)

array([[10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [28]:
#converting 2D array into 1D
np.array([[2,3,4],[5,6,7]]).reshape(6,)

array([2, 3, 4, 5, 6, 7])

In [29]:
#creating 3D array
np.random.randn(24).reshape(2,3,4)

array([[[-0.47168734,  0.94813016, -0.63469878, -1.41932391],
        [-0.20449079, -0.70269743, -0.77716613, -0.44260528],
        [ 1.12026386,  1.09042916, -1.91152801, -0.46076283]],

       [[-0.3379365 , -1.01894509,  0.30727233,  0.11672303],
        [-0.63363799, -1.36203869,  0.1511238 , -1.25421237],
        [-0.13300937, -0.826911  ,  0.98157362, -1.44472317]]])

### Indexing and Slicing

In [30]:
# As with Python lists, array indices start at 0.
# A slice has the form start:stop:step (stop is excluded); separate the index for each axis with a comma, e.g. arr[row_slice, col_slice]
i1 = np.arange(10,20).reshape(5,2)
print(i1)

[[10 11]
 [12 13]
 [14 15]
 [16 17]
 [18 19]]


In [31]:
i1[1]

array([12, 13])

In [32]:
i1[1,1]

13

In [33]:
i1[1][1]

13

In [34]:
i1[2:4]

array([[14, 15],
       [16, 17]])

In [35]:
# rows 2 and 3 (stop index 4 is excluded), columns 0 and 1
i1[2:4,0:2]

array([[14, 15],
       [16, 17]])

In [36]:
# rows 2 and 3, column 1 only; an integer index drops that axis, so the result is 1-D
i1[2:4, 1]

array([15, 17])

In [37]:
# integer-array (fancy) indexing: select rows 0 and 2
i1[[0,2]]

array([[10, 11],
       [14, 15]])

In [38]:
# rows 0 and 2, then column 1 of each
i1[[0,2],1]

array([11, 15])

In [39]:
x = np.arange(10,40,2).reshape(3,5)

In [40]:
for i in range(len(x)):
    # x[[i], [1, 3]] picks the elements at (i, 1) and (i, 3)
    print(x[[i],[1,3]])

[12 16]
[22 26]
[32 36]


In [41]:
np.array([x[[i],[1,3]] for i in range(len(x))])

array([[12, 16],
       [22, 26],
       [32, 36]])
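The same selection can be written without a Python loop by combining a full slice for the rows with an index list for the columns. A minimal sketch:

```python
import numpy as np

x = np.arange(10, 40, 2).reshape(3, 5)

# ':' keeps every row, [1, 3] picks columns 1 and 3
cols = x[:, [1, 3]]
print(cols)
# [[12 16]
#  [22 26]
#  [32 36]]
```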

### Boolean indexing

In [42]:
#return True or False for every element
i1 > 15

array([[False, False],
       [False, False],
       [False, False],
       [ True,  True],
       [ True,  True]])

In [43]:
# keep only the elements where the condition is True (the result is 1-D)
i1[i1>15]

array([16, 17, 18, 19])

In [44]:
#array of even numbers
i1[i1 % 2 == 0]

array([10, 12, 14, 16, 18])
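Conditions can be combined with the element-wise operators `&` (and), `|` (or) and `~` (not); Python's `and`/`or` keywords do not work on arrays. A short sketch on the same `i1` array:

```python
import numpy as np

i1 = np.arange(10, 20).reshape(5, 2)

# Each condition needs its own parentheses, because & and | bind
# more tightly than comparison operators such as > and ==
even_above_12 = i1[(i1 > 12) & (i1 % 2 == 0)]
print(even_above_12)   # [14 16 18]

edges = i1[(i1 < 11) | (i1 > 18)]
print(edges)           # [10 19]
```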

### Vectorization

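Vectorization lets NumPy apply an operation to an entire array at once, rather than looping over individual elements in Python. The loop still happens, but inside NumPy's compiled code, which is much faster than a Python-level loop.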
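A minimal sketch contrasting an explicit Python loop with the vectorized expression; both produce the same result. `add_ten_loop` is a hypothetical helper written only for this comparison, and the array size is arbitrary:

```python
import numpy as np

values = np.arange(100_000)

def add_ten_loop(arr):
    """Hypothetical helper: adds 10 element by element in a Python loop."""
    out = np.empty_like(arr)
    for i in range(len(arr)):
        out[i] = arr[i] + 10
    return out

# Vectorized: one expression, the loop runs inside NumPy
vectorized = values + 10

print(np.array_equal(add_ten_loop(values), vectorized))  # True
```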

In [45]:
#addition
x = np.array([1,2,4])
y = np.array([0,1,2])

In [46]:
x+10

array([11, 12, 14])

In [47]:
np.add(x,5)

array([6, 7, 9])

In [48]:
np.add(x,y)

array([1, 3, 6])

In [49]:
#subtraction
np.subtract(x,y)

array([1, 1, 2])

In [50]:
#multiplication
np.multiply(x,y)

array([0, 2, 8])

In [51]:
#division
np.divide(x, 2)

array([0.5, 1. , 2. ])

In [52]:
# Square root transformation
np.sqrt(x)

array([1.        , 1.41421356, 2.        ])

In [53]:
#Log transformation
np.log(x)

array([0.        , 0.69314718, 1.38629436])

### Aggregate Functions

In [54]:
y = np.arange(10,20).reshape(2,5)
print(y)

[[10 11 12 13 14]
 [15 16 17 18 19]]


In [55]:
# sum of all elements in the array
y.sum()

145

In [56]:
# column-wise sum (axis=0 sums down the rows, giving one value per column)
y.sum(axis=0)

array([25, 27, 29, 31, 33])

In [57]:
# row-wise sum (axis=1 sums across the columns, giving one value per row)
y.sum(axis=1)

array([60, 85])

In [58]:
#maximum of all
y.max()

19

In [59]:
# column-wise max
y.max(axis=0)

array([15, 16, 17, 18, 19])

In [60]:
# row-wise max
y.max(axis=1)

array([14, 19])

In [61]:
#minimum of all
y.min()

10

In [62]:
#mean
y.mean()

14.5

In [63]:
#median
np.median(y)

14.5

In [64]:
#variance
y.var()

8.25

In [65]:
# standard deviation
np.std(y)

2.8722813232690143
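Note that `var` and `std` divide by N by default (population variance, `ddof=0`). For a sample estimate, pass `ddof=1` to divide by N - 1 instead. A short sketch on the same array:

```python
import numpy as np

y = np.arange(10, 20).reshape(2, 5)

population_var = y.var()        # divides by N = 10
sample_var = y.var(ddof=1)      # divides by N - 1 = 9
sample_std = y.std(ddof=1)

print(population_var)  # 8.25
print(sample_var)      # approx 9.1667
print(sample_std)      # approx 3.0277
```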

### Math functions


In [66]:
# trigonometric functions take angles in radians, so this is sin(90 radians), not sin(90 degrees)
np.sin(90)

0.8939966636005579

In [67]:
np.sqrt(144)

12.0

In [68]:
np.cos(90)

-0.4480736161291701
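To work with degrees, convert them to radians first, for example with `np.deg2rad`. A small sketch; note that floating-point rounding makes `cos(90°)` come out as a tiny number rather than exactly 0, so it is compared with a tolerance:

```python
import numpy as np

angle = np.deg2rad(90)     # same as np.pi / 2

sin_90 = np.sin(angle)
cos_90 = np.cos(angle)

print(sin_90)                    # 1.0
print(np.isclose(cos_90, 0.0))   # True
```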

In [69]:
np.log(10)

2.302585092994046

In [70]:
np.log10(10)

1.0

### Array Comparison

In [71]:
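# True only if both arrays have the same shape and the same elements (x and y here differ in shape)
np.array_equal(x,y)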

False
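`np.array_equal` returns a single True/False, while `==` compares element by element. For floating-point results, an exact comparison can fail because of rounding, so `np.allclose` (comparison within a tolerance) is usually the better choice. A short sketch:

```python
import numpy as np

p = np.array([1, 2, 3])
q = np.array([1, 2, 3])

print(p == q)                  # [ True  True  True]  (element-wise)
print(np.array_equal(p, q))    # True  (one answer for the whole array)

# 0.1 + 0.2 is not exactly 0.3 in floating point
r = np.array([0.1 + 0.2, 0.3])
print(np.array_equal(r, [0.3, 0.3]))  # False
print(np.allclose(r, [0.3, 0.3]))     # True
```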

### Broadcasting

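When two arrays have different shapes, NumPy compares their shapes starting from the last (trailing) dimension; a missing leading dimension counts as size 1. In any dimension where one array has size 1 and the other has a larger size, the array with size 1 behaves as if it were copied along that dimension. If the sizes differ and neither is 1, the operation raises an error.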

In [72]:
a1 = np.array([[1,2],[3,4]])
a2 = np.array([2,3])

In [73]:
a1

array([[1, 2],
       [3, 4]])

In [74]:
a2

array([2, 3])

In [75]:
a1 + a2

array([[3, 5],
       [5, 7]])
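In the example above, `a2` with shape (2,) is treated as a row and added to each row of `a1`. The sketch below reshapes the values into a column of shape (2, 1) so they are added across each row instead, and shows an incompatible pair of shapes. `np.broadcast_shapes` is only used to report the result shape; it requires NumPy 1.20 or later:

```python
import numpy as np

a1 = np.array([[1, 2], [3, 4]])   # shape (2, 2)
col = np.array([[2], [3]])        # shape (2, 1)

# col is stretched across the columns: 2 is added to row 0, 3 to row 1
result = a1 + col
print(result)
# [[3 4]
#  [6 7]]

print(np.broadcast_shapes((2, 2), (2, 1)))   # (2, 2)

# Trailing sizes 3 and 2 differ and neither is 1, so this fails
try:
    np.ones((2, 3)) + np.ones(2)
except ValueError as err:
    print("incompatible shapes:", err)
```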

### Element-wise product vs Dot product

In [76]:
a1 = np.array([[1,2],[3,4]])
a2 = np.array([[5,6],[7,8]])

In [77]:
a1

array([[1, 2],
       [3, 4]])

In [78]:
a2

array([[5, 6],
       [7, 8]])

In [79]:
# element-wise product: multiplies corresponding elements (this is not a cross product)
a1 * a2

array([[ 5, 12],
       [21, 32]])

In [80]:
# dot product: for 2-D arrays this is matrix multiplication
np.dot(a1, a2)

array([[19, 22],
       [43, 50]])
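Two related operations, sketched below: the `@` operator performs matrix multiplication and gives the same result as `np.dot` for 2-D arrays, while the actual vector cross product is `np.cross`, shown here for 3-element vectors:

```python
import numpy as np

a1 = np.array([[1, 2], [3, 4]])
a2 = np.array([[5, 6], [7, 8]])

matmul = a1 @ a2
print(matmul)
# [[19 22]
#  [43 50]]

# Cross product of the x and y unit vectors is the z unit vector
z = np.cross([1, 0, 0], [0, 1, 0])
print(z)   # [0 0 1]
```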

### IQR, lower whisker and upper whisker using NumPy

https://drive.google.com/open?id=1Kp1wACkgsFzHrU1GjJ9YzxI4vodtzOjz

In [81]:
a = np.array([1,1,2,2,2,2,2,3,3,3,4,4,5,6,7,8,9, 15, 20])

In [82]:
a

array([ 1,  1,  2,  2,  2,  2,  2,  3,  3,  3,  4,  4,  5,  6,  7,  8,  9,
       15, 20])

In [83]:
np.mean(a)

5.2105263157894735

In [84]:
np.median(a)

3.0

In [85]:
# calculating quantile at 25%
q1 = np.quantile(a, 0.25)

In [86]:
q1

2.0

In [87]:
# calculating quantile at 75%
q3 = np.quantile(a, 0.75)

In [88]:
q3

6.5

In [89]:
iqr = q3-q1

In [90]:
iqr

4.5

In [91]:
lower_whisker = q1 - 1.5*iqr

In [92]:
lower_whisker

-4.75

In [93]:
upper_whisker = q3 + 1.5*iqr

In [94]:
upper_whisker

13.25
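The whiskers are commonly used to flag outliers: with the 1.5 x IQR rule, points below the lower whisker or above the upper whisker are treated as outliers. A minimal sketch that repeats the calculation above and then uses boolean indexing to list those points:

```python
import numpy as np

a = np.array([1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 8, 9, 15, 20])

q1, q3 = np.quantile(a, [0.25, 0.75])
iqr = q3 - q1
lower_whisker = q1 - 1.5 * iqr   # -4.75
upper_whisker = q3 + 1.5 * iqr   # 13.25

# Keep only the points outside the whisker limits
outliers = a[(a < lower_whisker) | (a > upper_whisker)]
print(outliers)   # [15 20]
```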

### Reducing Skewness

According to Wikipedia, "In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean."

https://drive.google.com/file/d/1FXuLbFaJ5-TFfTfiFJHDBeAYT3CgyqaK/view?usp=sharing

A common rule of thumb for interpreting the skewness value:

    Approximately symmetric: skewness between -0.5 and 0.5
    Negatively (left) skewed: skewness < -0.5
    Positively (right) skewed: skewness > 0.5

In [95]:
x = np.array([1,2,2,3,3,3,4,4,16,20,25])

In [96]:
x

array([ 1,  2,  2,  3,  3,  3,  4,  4, 16, 20, 25])

In [97]:
np.mean(x)

7.545454545454546

In [98]:
np.median(x)

3.0

In [99]:
from scipy.stats import skew
skew(x)
# this data is positively skewed as the skew value > 0.5

1.1712731595847037
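With its default `bias=True`, `scipy.stats.skew` computes the Fisher-Pearson coefficient m3 / m2^(3/2), where m2 and m3 are the second and third central moments. The sketch below reproduces it with NumPy alone; `sample_skewness` is a hypothetical helper name, not a NumPy function:

```python
import numpy as np

def sample_skewness(data):
    """Hypothetical helper: biased sample skewness m3 / m2**1.5."""
    data = np.asarray(data, dtype=float)
    dev = data - data.mean()
    m2 = np.mean(dev ** 2)   # second central moment
    m3 = np.mean(dev ** 3)   # third central moment
    return m3 / m2 ** 1.5

x = np.array([1, 2, 2, 3, 3, 3, 4, 4, 16, 20, 25])
print(sample_skewness(x))          # approx 1.1713, matching skew(x) above
print(sample_skewness([1, 2, 3]))  # 0.0 for symmetric data
```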

In [100]:
# Square root transformation
x1 = np.sqrt(x)

In [101]:
skew(x1)

0.9630668329521322

In [102]:
# Cube root transformation
x2 = np.cbrt(x)

In [103]:
skew(x2)

0.8614367405966747

In [104]:
#log transformation
x3 = np.log(x)

In [105]:
skew(x3)

0.5583822828505749

### More functions

In [106]:
#The copy function can be used to create a new, separate copy of an array in memory if needed
a=np.array([1,2,3])

In [107]:
a

array([1, 2, 3])

In [108]:
# assignment does not copy: b and a refer to the same array object
b=a

In [109]:
b

array([1, 2, 3])

In [110]:
a[0] = 100

In [111]:
a

array([100,   2,   3])

In [112]:
b

array([100,   2,   3])

In [113]:
c = a.copy()

In [114]:
a[0] = 1000

In [115]:
a

array([1000,    2,    3])

In [116]:
b

array([1000,    2,    3])

In [117]:
c

array([100,   2,   3])
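A related pitfall: slicing also does not copy. A slice is a view that shares memory with the original array, so writing to the slice changes the original. A short sketch, using `np.shares_memory` to check whether two arrays overlap:

```python
import numpy as np

a = np.array([1, 2, 3])

# Slicing returns a view of the same memory
s = a[:2]
s[0] = -1
print(a)                        # [-1  2  3]
print(np.shares_memory(a, s))   # True

# .copy() gives the slice its own memory
t = a[:2].copy()
t[0] = 500
print(a)                        # [-1  2  3]  (unchanged)
print(np.shares_memory(a, t))   # False
```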

In [118]:
# transpose returns a view with the axes reversed; for a 2-D array, rows become columns (y.T is a shorthand)
y = np.arange(10,20).reshape(2,5)
y

array([[10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [119]:
y.transpose()

array([[10, 15],
       [11, 16],
       [12, 17],
       [13, 18],
       [14, 19]])

In [120]:
# flatten returns a 1-D copy of a multi-dimensional array
y.flatten()

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

In [121]:
# Two or more arrays can be concatenated together using the concatenate function with a tuple of the arrays to be joined
a = np.array([1,2])
b = np.array([3,4,5])
c = np.array([5,6,7])
# the arrays must have the same number of dimensions; their lengths along the joining axis may differ
np.concatenate((a,b,c))

array([1, 2, 3, 4, 5, 5, 6, 7])

In [122]:
# If the arrays have more than one dimension, you can specify the axis along which they are joined.
# By default (axis=0), NumPy concatenates along the first axis:
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])
np.concatenate((x,y))

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

In [123]:
np.concatenate((x,y), axis=0)

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

In [124]:
np.concatenate((x,y), axis=1)

array([[1, 2, 5, 6],
       [3, 4, 7, 8]])
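For 2-D arrays, `np.vstack` and `np.hstack` are convenient shorthands for concatenating along axis 0 and axis 1. A short sketch confirming the equivalence on the same `x` and `y`:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

stacked_rows = np.vstack((x, y))   # same as concatenate(..., axis=0)
stacked_cols = np.hstack((x, y))   # same as concatenate(..., axis=1)

print(np.array_equal(stacked_rows, np.concatenate((x, y), axis=0)))  # True
print(np.array_equal(stacked_cols, np.concatenate((x, y), axis=1)))  # True
```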

In [125]:
# sorting: x.sort() sorts the array in place, while np.sort(x) returns a sorted copy
x=np.random.randint(high=100,low=2,size=(15,))
x

array([90, 82, 63, 20, 24, 70,  4, 71, 84, 95,  4, 13, 75, 12, 37])

In [126]:
x.sort()
x

array([ 4,  4, 12, 13, 20, 24, 37, 63, 70, 71, 75, 82, 84, 90, 95])

In [127]:
x=np.random.randint(high=100,low=12,size=(2,3))
x

array([[98, 26, 98],
       [23, 73, 87]])

In [128]:
# for a 2-D array, sort() sorts each row independently (along the last axis by default)
x.sort()
x

array([[26, 98, 98],
       [23, 73, 87]])
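The `axis` argument controls the sorting direction. The sketch below uses `np.sort`, which returns a copy and leaves the input unchanged, on a fixed copy of the unsorted array above (fixed so the results are reproducible). `np.argsort` returns the indices that would sort an array; `kind='stable'` keeps tied values in their original order:

```python
import numpy as np

x = np.array([[98, 26, 98],
              [23, 73, 87]])

rows_sorted = np.sort(x)             # each row sorted (axis=-1, the default)
cols_sorted = np.sort(x, axis=0)     # each column sorted
flat_sorted = np.sort(x, axis=None)  # flattened, then sorted
order = np.argsort(x[0], kind='stable')

print(rows_sorted)   # [[26 98 98]
                     #  [23 73 87]]
print(cols_sorted)   # [[23 26 87]
                     #  [98 73 98]]
print(flat_sorted)   # [23 26 73 87 98 98]
print(order)         # [1 0 2]
```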