# Numpy
-----
- https://www.numpy.org/
- Numerical Python, or "Numpy" for short, is a `foundational package` on which many of the most common `data science packages are built`.
- Numpy provides us with `high performance multi-dimensional arrays` which we can use as vectors or matrices.

### Additional Recommended Resources:
------------
<a href="https://docs.scipy.org/doc/numpy/reference/">Numpy Documentation</a><br>


## 3. NumPy stands for Numerical Python
- Generally we will use numpy to store numbers.
- However it can be used to store any collection of `single datatype` values
- NumPy is a powerful **N-dimensional array object**


#### 3.1 How to identify whether a given array is 1-D?
- A 1-D array is simply a single column, i.e. a `vector`; its `ndim` attribute is 1.
- NumPy's `np.arange` function is the counterpart of core Python's `range` function.
- `range` produces a sequence of integers (a `range` object in Python 3, which `list()` turns into a list).
- `np.arange` returns a NumPy array.
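A minimal sketch of the difference between the two functions; the variable names are illustrative only:

```python
import numpy as np

py_range = range(5)          # lazy range object in Python 3
py_list = list(py_range)     # materialise it as a list
np_arr = np.arange(5)        # NumPy array with the same values

print(type(py_range), type(py_list), type(np_arr))
print(np_arr.ndim)    # number of dimensions: 1 for a 1-D array
print(np_arr.shape)   # a single axis holding 5 elements
```

Checking `ndim` (or the length of `shape`) is a quick way to confirm that an array is 1-D.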

## 1. Why learn `NumPy arrays` when we already have `Python lists`?
- Consider the `BMI` calculation below:
$BMI = \frac{\text{weight in kg}}{(\text{height in meters})^2}$
- BMI is one way to check whether you are at a healthy weight.
- `Underweight`: Your BMI is **less than 18.5**. 
- `Healthy weight`: Your BMI is **18.5 to 24.9**.
- `Overweight`: Your BMI is **25 to 29.9**. 
- `Obese`: Your BMI is **30 or higher**

### 1.1 Can we calculate the BMI of 4 people using plain lists?
- These are the plain lists covered in level 1: core Python.

In [1]:
import numpy as np

In [2]:
weights = [60,65,70,75]    # in Kgs
heights = [1.60, 1.65, 1.70,1.75]  # in meters

In [3]:
weights

[60, 65, 70, 75]

In [5]:
heights

[1.6, 1.65, 1.7, 1.75]

In [7]:
type(weights), type(heights)

(list, list)

In [8]:
bmi = weights/heights ** 2

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

This is the limitation of list operations: Python lists do not support element-wise arithmetic. We will solve the same problem using NumPy.
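For comparison, a plain-Python version has to pair the values up explicitly. A minimal sketch, reusing the same sample values (`bmi_list` is an illustrative name):

```python
weights = [60, 65, 70, 75]            # in kg
heights = [1.60, 1.65, 1.70, 1.75]    # in meters

# zip pairs each weight with its height; the arithmetic runs one pair at a time
bmi_list = [w / h ** 2 for w, h in zip(weights, heights)]
print(bmi_list)
```

This works, but the loop has to be written by hand for every formula.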

In [10]:
npweights = np.array(weights)     # in Kgs
npheights = np.array(heights)  # in meters
type(npweights), type(npheights)
bmi = npweights/npheights ** 2
print(bmi)

[23.4375     23.87511478 24.22145329 24.48979592]


In [11]:
type(bmi)

numpy.ndarray

In [None]:
# define an array from a list within a list (nested list)

In [12]:
alist = [  [11,22],[33,44],[55,66],[77,88],[99,00]   ]

In [13]:
type(alist)

list

In [14]:
arr = np.array(alist)

In [15]:
type(arr)

numpy.ndarray

In [16]:
arr

array([[11, 22],
       [33, 44],
       [55, 66],
       [77, 88],
       [99,  0]])

In [17]:
print(arr)

[[11 22]
 [33 44]
 [55 66]
 [77 88]
 [99  0]]


In [None]:
# How to access 99 here in this array

In [18]:
arr[4][0]

99

In [None]:
# Memory efficiency with numpy

In [25]:
py_list = list(range(200))

In [26]:
min(py_list),max(py_list),len(py_list)

(0, 199, 200)

In [27]:
np_list = np.array(list(range(200)))

In [28]:
min(np_list),max(np_list),len(np_list)

(0, 199, 200)

In [29]:
import sys
print(sys.getsizeof(py_list))
print(sys.getsizeof(np_list))

1912
896


In [30]:
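# Observe that the numpy array takes only 896 bytes compared to 1912 bytes for the list
# (and sys.getsizeof of a list does not even count the int objects the list refers to)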
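A slightly fuller comparison, as a sketch: `list_total` is an illustrative name, and the exact byte counts depend on the platform and Python version, so none are shown here.

```python
import sys
import numpy as np

py_list = list(range(200))
np_arr = np.arange(200)

# getsizeof(list) covers only the list's pointer table, so add the int objects
list_total = sys.getsizeof(py_list) + sum(sys.getsizeof(i) for i in py_list)

# nbytes is the size of the array's data buffer: number of elements * itemsize
print(list_total)
print(np_arr.nbytes, np_arr.size, np_arr.itemsize)
```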

In [32]:
# Create an array of evenly spaced numbers using the linspace function

In [33]:
lsarr = np.linspace(0,5,10)

In [34]:
lsarr.ndim

1

In [35]:
print(np.round(lsarr))
print(np.round(lsarr,1))
print(np.round(lsarr,2))
print(np.round(lsarr,3))
print(lsarr)

[0. 1. 1. 2. 2. 3. 3. 4. 4. 5.]
[0.  0.6 1.1 1.7 2.2 2.8 3.3 3.9 4.4 5. ]
[0.   0.56 1.11 1.67 2.22 2.78 3.33 3.89 4.44 5.  ]
[0.    0.556 1.111 1.667 2.222 2.778 3.333 3.889 4.444 5.   ]
[0.         0.55555556 1.11111111 1.66666667 2.22222222 2.77777778
 3.33333333 3.88888889 4.44444444 5.        ]
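The spacing used by `linspace` can be inspected directly. A short sketch using the `retstep` and `endpoint` parameters of `np.linspace`:

```python
import numpy as np

# retstep=True also returns the spacing between consecutive samples
samples, step = np.linspace(0, 5, 10, retstep=True)
print(step)              # (5 - 0) / (10 - 1), because both ends are included

# endpoint=False leaves out the stop value, the way range and arange do
open_samples = np.linspace(0, 5, 10, endpoint=False)
print(open_samples)
```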


In [36]:
# Create identity matrix

In [38]:
eyearr1 = np.eye(4)    # Identity matrix: all 1s on the diagonal; number of rows = number of columns
print(eyearr1)

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


In [39]:
eyearr1.ndim, eyearr1.shape

(2, (4, 4))

In [42]:
eyearr2 = np.eye(2)
print(eyearr2) 
print(eyearr2.ndim)
print(eyearr2.shape)

[[1. 0.]
 [0. 1.]]
2
(2, 2)


In [46]:
np.random.rand(3,4)   # generates random numbers in the interval [0, 1)

array([[0.69342031, 0.12535851, 0.58327504, 0.57231308],
       [0.64937995, 0.02077915, 0.67111473, 0.8445612 ],
       [0.22374068, 0.8806988 , 0.62215728, 0.63801231]])

In [43]:
# Generate n random integers using the random.randint function (low is inclusive, high is exclusive)

randomarray = np.random.randint(1,100,10)

In [44]:
randomarray

array([43, 65, 67, 34, 97, 11, 38, 15, 52, 75])

In [45]:
np.random.randint?

In [48]:
# Functions on np array
print(randomarray.min())   
print(randomarray.max())
print(randomarray.argmax())


11
97
4
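The link between `max` and `argmax` can be checked directly. A small sketch; the seed value 0 is an arbitrary choice that only makes the run repeatable:

```python
import numpy as np

np.random.seed(0)                          # arbitrary seed, for repeatability
values = np.random.randint(1, 100, 10)     # 10 integers with 1 <= value < 100

print(values)
print(values.min(), values.argmin())   # smallest value and its position
print(values.max(), values.argmax())   # largest value and its position

# argmax returns an index, so indexing with it gives back the maximum
print(values[values.argmax()] == values.max())
```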


# Accessing elements of an np array in 3 ways
---
- Slicing
- Fancy indexing
- Index

In [50]:
randomarray

array([43, 65, 67, 34, 97, 11, 38, 15, 52, 75])

In [52]:
randomarray[0:2]  # 2 is excluded , slice method

array([43, 65])

In [55]:
randomarray[5]             # Index method

11

In [56]:
randomarray[:6]        # elements up to index 5; index 6 is excluded

array([43, 65, 67, 34, 97, 11])

In [57]:
randomarray[6:]    # elements from index 6 onwards

array([38, 15, 52, 75])

In [59]:
randomarray[[2,5,3]]  # fancy method  

array([67, 11, 34])

In [62]:
# Lets define new array 
arr = np.arange(10)

In [63]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [64]:
arr[0:5] =100    # Assigning multiple element within an array in one go

In [65]:
arr

array([100, 100, 100, 100, 100,   5,   6,   7,   8,   9])

In [66]:
arr[:] = 200 # replace all elements using 200

In [67]:
arr

array([200, 200, 200, 200, 200, 200, 200, 200, 200, 200])

In [68]:
# Now see the trick
#define main array having 10 elements
mainarr = np.arange(1,11)
mainarr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [69]:
# slice mainarr from index 1 to index 4 (stop index 5 is excluded)
mainarrslice = mainarr[1:5]
mainarrslice

array([2, 3, 4, 5])

In [70]:
mainarrslice[:] = 100
mainarrslice

array([100, 100, 100, 100])

In [73]:
mainarr            # The original array is updated too, because the slice was taken from mainarr
## Both the main array and the slice changed: slicing returns a view on the same data, not a physical copy of the array

array([  1, 100, 100, 100, 100,   6,   7,   8,   9,  10])
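Whether two arrays share data can be checked with `np.shares_memory`. A minimal sketch with illustrative names (`main`, `view`, `fancy`):

```python
import numpy as np

main = np.arange(1, 11)
view = main[1:5]          # basic slice: a view on main's data
fancy = main[[1, 2, 3]]   # fancy indexing: a new, independent array

print(np.shares_memory(main, view))    # True
print(np.shares_memory(main, fancy))   # False
print(view.base is main)               # True: the view is backed by main

view[:] = 100     # writes through to main
print(main)       # elements 1..4 are now 100
print(fancy)      # unchanged: still [2 3 4]
```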

In [76]:
# How to create a physical copy of an array?
arrcopy = mainarr.copy()

In [77]:
arrcopy

array([  1, 100, 100, 100, 100,   6,   7,   8,   9,  10])

In [78]:
mainarr

array([  1, 100, 100, 100, 100,   6,   7,   8,   9,  10])

In [79]:
arrcopy[:] =500

In [80]:
arrcopy

array([500, 500, 500, 500, 500, 500, 500, 500, 500, 500])

In [81]:
mainarr           # The main array does not change, because arrcopy is a physical copy and not a slice

array([  1, 100, 100, 100, 100,   6,   7,   8,   9,  10])

In [82]:
# How to reverse the np array
a = np.arange(1,11)

In [83]:
a

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [84]:
a[-1::-1]   # first method to reverse

array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1])

In [85]:
a[::-1] # second method to reverse

array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1])

In [86]:
a[-1:0:-1]  # third attempt: stop index 0 is excluded, so the first element (1) is missing

array([10,  9,  8,  7,  6,  5,  4,  3,  2])

In [None]:
a[-1:-11:-1]  # fourth way to reverse: -11 lies one position before index 0, so every element is included
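The reversing slices can be compared side by side. A small sketch; `np.flip` is included only as a cross-check:

```python
import numpy as np

a = np.arange(1, 11)

full = a[::-1]            # complete reversal
partial = a[-1:0:-1]      # stop index 0 is excluded, so 1 is dropped
explicit = a[-1:-11:-1]   # stop lies before the first element, so nothing is dropped

print(full)
print(partial)
print(np.array_equal(full, explicit))     # True
print(np.array_equal(full, np.flip(a)))   # True
```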

In [None]:
# How to apply conditions on an array

In [87]:
arr = np.arange(1,10)
arr

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [88]:
arr > 5

array([False, False, False, False, False,  True,  True,  True,  True])

In [89]:
arr[arr > 5]       # To get the elements which satisfy the condition

array([6, 7, 8, 9])

In [90]:
# How to have multiple conditions using np array
arr[np.logical_and(arr>0, arr<6)]                # conditions are combined element-wise with np.logical_and; Python's plain `and` does not work on arrays

array([1, 2, 3, 4, 5])
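The operators `&`, `|` and `~` are the element-wise equivalents of `np.logical_and`, `np.logical_or` and `np.logical_not` for boolean arrays. A short sketch; note that each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` and `<`:

```python
import numpy as np

arr = np.arange(1, 10)

both = arr[(arr > 2) & (arr < 6)]      # AND: [3 4 5]
either = arr[(arr < 3) | (arr > 7)]    # OR:  [1 2 8 9]
negated = arr[~(arr > 5)]              # NOT: [1 2 3 4 5]

print(both, either, negated)
```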

# Numpy Operations

In [92]:
# Array with Array
# Array with Scalars
# Universal Array functions
# define arrays arr1, arr2 and apply operations like +, -, * and /
# the only condition is that both arrays have the same shape (or shapes that can be broadcast together)

In [93]:
arr1 = np.arange(1,11)
arr2 = np.arange(11,21)

In [94]:
arr1

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [95]:
arr2

array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

In [96]:
arr1+arr2

array([12, 14, 16, 18, 20, 22, 24, 26, 28, 30])

In [97]:
arr2-arr1

array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])

In [98]:
arr1*arr2

array([ 11,  24,  39,  56,  75,  96, 119, 144, 171, 200])

In [99]:
# divide 2 arrays
arr2/arr1

array([11.        ,  6.        ,  4.33333333,  3.5       ,  3.        ,
        2.66666667,  2.42857143,  2.25      ,  2.11111111,  2.        ])

In [100]:
# element-wise modulus (remainder) of 2 arrays
arr2%arr1

array([0, 0, 1, 2, 0, 4, 3, 2, 1, 0], dtype=int32)

In [101]:
# Add any scalar to an array
arr1+100

array([101, 102, 103, 104, 105, 106, 107, 108, 109, 110])

In [102]:
arr1-100      # can apply for *,/,%, //

array([-99, -98, -97, -96, -95, -94, -93, -92, -91, -90])

In [103]:
arr1//2 # returns integer quotient

array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5], dtype=int32)

In [105]:
arr1/2

array([0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])

In [106]:
arr1

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [107]:
# If the two arrays have different shapes that cannot be broadcast together, the above operations are not possible

In [108]:
a1=np.arange(1,10)
a2=np.arange(1,5)

In [109]:
print(a1)
print(a2)

[1 2 3 4 5 6 7 8 9]
[1 2 3 4]


In [110]:
a1+a2


ValueError: operands could not be broadcast together with shapes (9,) (4,) 
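Shapes do not have to be identical, only compatible: NumPy stretches axes of length 1 to match the other operand (broadcasting). A minimal sketch with illustrative names:

```python
import numpy as np

row = np.arange(1, 5)                  # shape (4,)
col = np.arange(1, 4).reshape(3, 1)    # shape (3, 1)

table = col * row                      # broadcast to shape (3, 4)
print(table)

# shapes (9,) and (4,) have no axis of length 1 to stretch, so this fails
try:
    np.arange(1, 10) + np.arange(1, 5)
except ValueError as err:
    print("cannot broadcast:", err)
```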

In [114]:
arr = np.arange(1,11)

In [115]:
arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [116]:
arr ** 2   # Find the square of every element

array([  1,   4,   9,  16,  25,  36,  49,  64,  81, 100], dtype=int32)

In [117]:
# Find cube of every element
arr ** 3

array([   1,    8,   27,   64,  125,  216,  343,  512,  729, 1000],
      dtype=int32)

In [118]:
# Find sqrt of every element within an array
np.sqrt(arr)

array([1.        , 1.41421356, 1.73205081, 2.        , 2.23606798,
       2.44948974, 2.64575131, 2.82842712, 3.        , 3.16227766])

In [125]:
print(np.max(arr))
print(np.min(arr))
print(np.median(arr))
print(np.std(arr))
print(np.sin(arr))
print(np.cos(arr))
print(np.tan(arr))
print(np.log(arr))
print(np.log2(arr))
print(np.log10(arr))
print(np.exp(arr))

10
1
5.5
2.8722813232690143
[ 0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427 -0.2794155
  0.6569866   0.98935825  0.41211849 -0.54402111]
[ 0.54030231 -0.41614684 -0.9899925  -0.65364362  0.28366219  0.96017029
  0.75390225 -0.14550003 -0.91113026 -0.83907153]
[ 1.55740772 -2.18503986 -0.14254654  1.15782128 -3.38051501 -0.29100619
  0.87144798 -6.79971146 -0.45231566  0.64836083]
[0.         0.69314718 1.09861229 1.38629436 1.60943791 1.79175947
 1.94591015 2.07944154 2.19722458 2.30258509]
[0.         1.         1.5849625  2.         2.32192809 2.5849625
 2.80735492 3.         3.169925   3.32192809]
[0.         0.30103    0.47712125 0.60205999 0.69897    0.77815125
 0.84509804 0.90308999 0.95424251 1.        ]
[2.71828183e+00 7.38905610e+00 2.00855369e+01 5.45981500e+01
 1.48413159e+02 4.03428793e+02 1.09663316e+03 2.98095799e+03
 8.10308393e+03 2.20264658e+04]


In [126]:
# 2D operations
# Define matrix 5,5 
mat = np.arange(1,26).reshape(5,5)
mat

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])

In [127]:
# find out last row
mat[-1,:]

array([21, 22, 23, 24, 25])

In [128]:
# find out columnwise sum
np.sum(mat, axis=0)

array([55, 60, 65, 70, 75])

In [129]:
# find out row-wise sum
np.sum(mat, axis=1)

array([ 15,  40,  65,  90, 115])

In [132]:
# Sum complete matrix
print(np.sum(mat))
print(mat.sum())

325
325
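The `axis` argument works the same way for other reductions. A short sketch on the same 5x5 matrix; `keepdims=True` keeps the reduced axis with length 1, so the result can be used directly in element-wise arithmetic with the original matrix:

```python
import numpy as np

mat = np.arange(1, 26).reshape(5, 5)

col_means = mat.mean(axis=0)    # one value per column
row_max = mat.max(axis=1)       # one value per row

row_totals = mat.sum(axis=1, keepdims=True)   # shape (5, 1) instead of (5,)
share = mat / row_totals                      # each row now sums to 1

print(col_means)
print(row_max)
print(share.sum(axis=1))
```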


In [133]:
# Statistical operations

In [177]:
np_list = np.array(range(2,21,2))

In [135]:
np_list

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18, 20])

In [179]:
np.mean(np_list)

11.0

In [180]:
np.median(np_list)

11.0

In [137]:
np.median([1,20,30,40,60])

30.0

In [138]:
np.median([1,20,30,40,60,70])

35.0

In [184]:
a_np_array= np.array([6, 3, 9, 6, 6, 3,3,5, 9, 3,6,1,1,1,1])

In [182]:
from scipy import stats

In [185]:
stats.mode(a_np_array)

ModeResult(mode=array([1]), count=array([4]))
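In this data the values 1, 3 and 6 each occur four times, and `stats.mode` reports only the smallest of them. A sketch that lists every tied value using plain NumPy (`modes` is an illustrative name):

```python
import numpy as np

data = np.array([6, 3, 9, 6, 6, 3, 3, 5, 9, 3, 6, 1, 1, 1, 1])

# unique values (sorted) and how often each one occurs
values, counts = np.unique(data, return_counts=True)

# keep every value whose count equals the highest count
modes = values[counts == counts.max()]
print(modes, counts.max())
```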

# Standard Deviation

- https://www.mathsisfun.com/data/standard-deviation.html

    $\sigma = \sqrt{ \frac{1}{N}\sum^N_{i=1}(x_{i}-\mu)^2} $

- It is useful for finding `outliers`
- The square root of the `variance` $(\sigma^2)$ is the `standard deviation` $(\sigma)$

In [143]:
np.std(np_list)

5.744562646538029
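The formula can be followed step by step. A minimal sketch on the same data; `ddof=1` is shown only to contrast the sample standard deviation (division by N - 1) with the population form used by default:

```python
import numpy as np

x = np.array(range(2, 21, 2))

mu = x.mean()                             # mean of the data
variance = np.mean((x - mu) ** 2)         # average squared deviation
sigma = np.sqrt(variance)                 # population standard deviation

print(sigma, np.std(x))       # both follow the formula with 1/N
print(np.std(x, ddof=1))      # sample version, divides by N - 1
```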

In [144]:
# Sorting

In [145]:
np.random.seed(1)
x = np.random.randn(10)

In [146]:
x

array([ 1.62434536, -0.61175641, -0.52817175, -1.07296862,  0.86540763,
       -2.3015387 ,  1.74481176, -0.7612069 ,  0.3190391 , -0.24937038])

In [149]:
np.random.seed(1)       # Same result as above because the same seed is used
x = np.random.randn(10)
x

array([ 1.62434536, -0.61175641, -0.52817175, -1.07296862,  0.86540763,
       -2.3015387 ,  1.74481176, -0.7612069 ,  0.3190391 , -0.24937038])

In [151]:
x.sort()  # default ascending order
x

array([-2.3015387 , -1.07296862, -0.7612069 , -0.61175641, -0.52817175,
       -0.24937038,  0.3190391 ,  0.86540763,  1.62434536,  1.74481176])

In [155]:
-np.sort(-x)        # descending order: negate, sort ascending, negate back

array([ 1.74481176,  1.62434536,  0.86540763,  0.3190391 , -0.24937038,
       -0.52817175, -0.61175641, -0.7612069 , -1.07296862, -2.3015387 ])
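Two related idioms, sketched on a small made-up array: reversing an ascending sort gives descending order, and `np.argsort` returns the positions that would sort the array instead of the sorted values.

```python
import numpy as np

x = np.array([3.5, -1.0, 2.0, 0.5])    # illustrative values

desc = np.sort(x)[::-1]     # sort ascending, then reverse
order = np.argsort(x)       # indices that would sort x ascending

print(desc)
print(order)
print(x[order])             # same as np.sort(x)
print(x)                    # np.sort returns a copy, so x itself is unchanged
```

Unlike `np.sort`, the method `x.sort()` sorts the array in place.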

In [156]:
# 2-D array sorting

In [1]:
aa = np.array([[1,4],[3,1]])

In [2]:
aa

array([[1, 4],
       [3, 1]])

In [163]:
np.sort(aa)                # sort along the last axis

array([[1, 4],
       [1, 3]])

In [164]:
np.sort(aa, axis=None)     # sort the flattened array


array([1, 1, 3, 4])

In [165]:
np.sort(aa, axis=0)        # sort along the first axis

array([[1, 1],
       [3, 4]])

In [189]:
# Unique
array = np.array([10,20,10,40,20,10,20,40,20])
u = np.unique(array,return_counts=False)   # return_counts=True would also return the frequency of each unique value

In [190]:
u

array([10, 20, 40])

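In [175]:
# With return_counts=True, u becomes a tuple: (unique values, counts)
u = np.unique(array,return_counts=True)
u[0]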

array([10, 20, 40])

In [176]:
len(u[0])

3

In [188]:
u[0][0]

10

In [173]:
u

(array([10, 20, 40]), array([3, 4, 2], dtype=int64))
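The two arrays returned with `return_counts=True` can be unpacked and paired up. A small sketch; `freq` is an illustrative name:

```python
import numpy as np

array = np.array([10, 20, 10, 40, 20, 10, 20, 40, 20])

# unpack the tuple into the unique values and their frequencies
values, counts = np.unique(array, return_counts=True)

# pair each value with its count in an ordinary dict
freq = dict(zip(values.tolist(), counts.tolist()))
print(freq)     # {10: 3, 20: 4, 40: 2}
```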

In [5]:
import numpy as np
a = np.array([1,2,3,'a','b','Pune',12.34,True,[10,20]], dtype=object)  # recent NumPy versions need dtype=object for mixed, ragged contents

In [6]:
a

array([1, 2, 3, 'a', 'b', 'Pune', 12.34, True, list([10, 20])],
      dtype=object)

In [7]:
type(a)

numpy.ndarray

In [9]:
type(a[3])

str

In [8]:
type(a[0])

int

In [10]:
a[0]+2

3
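An object array keeps each element as an ordinary Python object. Without a nested list in the input, NumPy instead converts all elements to one common dtype. A brief sketch of that conversion, with illustrative names:

```python
import numpy as np

ints = np.array([1, 2, 3])           # integer dtype
mixed = np.array([1, 2.5, 3])        # ints are upcast to float64
strings = np.array([1, 'a', 2.5])    # everything becomes a string

print(ints.dtype, mixed.dtype, strings.dtype)
print(strings)    # the numbers are now text, so arithmetic on them no longer works
```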