# NumPy
## A practical NumPy tutorial for Data Science by Ehsan Mokhtari
### I used 'Python Data Science Handbook by Jake VanderPlas' as a reference for this tutorial

It helps to think of all data fundamentally as arrays of numbers. No matter what the data
are, the first step in making them analyzable is to transform them into arrays of
numbers. Python has specialized tools for handling such numerical arrays: the NumPy package
and the Pandas package. NumPy (short for Numerical Python) provides
an efficient interface to store and operate on dense data buffers. In some ways,
NumPy arrays are like Python's built-in list type, but NumPy arrays provide much
more efficient storage and data operations as the arrays grow larger in size. NumPy
arrays form the core of nearly the entire ecosystem of data science tools in Python, so
time spent learning to use NumPy effectively will be valuable no matter what aspect
of data science interests you.
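
As a rough illustration of the storage point above, here is a minimal sketch (the variable name `arr` is only for this example). Every element of a NumPy array shares one fixed-size data type, so the raw data sits in one compact buffer, whereas a Python list stores references to separate Python objects.

```python
import numpy as np

# 1000 integers stored as fixed-size 64-bit values
arr = np.arange(1000, dtype=np.int64)

print(arr.dtype)     # the single data type shared by all elements
print(arr.itemsize)  # bytes used by one element (8 for int64)
print(arr.nbytes)    # bytes used by the raw data: 1000 * 8 = 8000
```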

To install NumPy, run 'pip install numpy' in your shell.
To update an existing installation, run 'pip install numpy --upgrade'.

In [1]:
import numpy as np
print(np.__version__)

1.25.0


#### Storing different data types in variables

In [2]:
a = 5
b = 'hello'
c = 54.3
d = True
e = [5,7,'yes',False]

## <font color= red>The Basics of NumPy Arrays</font>

### Creating Arrays from Python Lists


In [3]:
import numpy as np
np.array([1, 4, 2, 5, 3])

array([1, 4, 2, 5, 3])

In [4]:
np.array([[1, 4, 2, 5, 3],[2,3,4,5,6]]) #2-d array

array([[1, 4, 2, 5, 3],
       [2, 3, 4, 5, 6]])

In [5]:
# create an array with an explicitly chosen data type
np.array([1, 2, 3, 4], dtype='float32')

array([1., 2., 3., 4.], dtype=float32)
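
When no `dtype` is given, NumPy picks one type that can hold every element. The short sketch below (variable names are just for illustration) shows integers being upcast to floats, and a nested list turning into a 2-d array.

```python
import numpy as np

# One float in the list is enough to make the whole array float64
mixed = np.array([3.14, 4, 2, 3])
print(mixed.dtype)   # float64

# A list of equal-length sequences becomes a 2-d array
nested = np.array([range(i, i + 3) for i in [2, 4, 6]])
print(nested)
# [[2 3 4]
#  [4 5 6]
#  [6 7 8]]
```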

### Creating Arrays from Scratch

In [6]:
np.zeros(10, dtype=int) #numpy array with 10 elements all set to 0

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [7]:
np.ones((3, 5), dtype=np.int32) # 2-d array with 3 rows and 5 cols all set to 1
# other dtypes include np.int8 ... np.int64, np.float32, np.float64, np.bool_, np.complex128, np.bytes_

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

In [8]:
np.full((3, 5), 7) # 2-d array with the same shape, every element set to the fill value we pass (7)

array([[7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7]])

In [9]:
# Starting at 0, stopping before 10, stepping by 3
np.arange(0, 10, 3)

array([0, 3, 6, 9])

In [10]:
# Create an array of ten values evenly spaced between 0 and 1
np.linspace(0, 1, 10)

array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])

In [11]:
np.random.random((3)) # 1-d array with 3 random values in the interval [0, 1)

array([0.78402642, 0.04403597, 0.71252067])

In [12]:
np.random.randint(0, 10, (3)) # 1-d array with 3 random integers from 0 to 9 (the upper bound 10 is excluded)

array([7, 2, 6])
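
The random outputs above change on every run. If you need repeatable numbers, one option is NumPy's `Generator` interface with a fixed seed. This is a minimal sketch; the seed value 42 and the variable names are arbitrary choices for the example.

```python
import numpy as np

# A seeded generator produces the same sequence on every run
rng = np.random.default_rng(seed=42)
floats = rng.random(3)              # 3 floats in [0, 1)
ints = rng.integers(0, 10, size=3)  # 3 integers from 0 to 9

# A second generator with the same seed repeats the first draw
rng_again = np.random.default_rng(seed=42)
same_floats = rng_again.random(3)

print(np.array_equal(floats, same_floats))  # True
```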

### NumPy Array Attributes

In [13]:
x = np.random.randint(0, 10, size=(2, 3, 4)) # 3-d array: 2 blocks, each with 3 rows and 4 columns
print(x)
print("x ndim: ", x.ndim)  # number of dimensions
print("x shape:", x.shape) # size of each dimension
print("x size: ", x.size)  # total number of elements
print("dtype:", x.dtype)   # data type of the elements

[[[3 7 2 3]
  [9 5 4 9]
  [6 3 1 7]]

 [[2 0 2 8]
  [4 3 8 2]
  [4 1 7 9]]]
x ndim:  3
x shape: (2, 3, 4)
x size:  24
dtype: int32


### Array Indexing

In [14]:
x = np.arange(10)
print(x[0])        # access the first element of the array
print(x[len(x)-1]) # last element
print(x[-1])       # last element

0
9
9


In [15]:
x = np.random.randint(0, 10, (3,4))
print(x)
print(x[1, 2])  # first index is the row, second is the column
print(x[2, -1]) # last element of the third row
print(x[2][-1]) # the same element via chained indexing
x[0, 0] = 12    # modify an array element
print(x)

[[3 3 6 4]
 [6 5 9 5]
 [7 1 9 4]]
9
4
4
[[12  3  6  4]
 [ 6  5  9  5]
 [ 7  1  9  4]]


### Array slicing

In [16]:
x = np.arange(20)
print(x)
print(x[:5])     # from the start up to index 5-1 = 4, so x[0] x[1] x[2] x[3] x[4]
print(x[5:])     # from index 5 to the end
print(x[10:15])  # from index 10 up to index 15-1 = 14
print(x[2:14:3]) # from index 2 up to index 14-1 = 13, in steps of 3

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[0 1 2 3 4]
[ 5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[10 11 12 13 14]
[ 2  5  8 11]


In [17]:
x = np.random.randint(0, 10, (3,4))
print(x)
print(x[:2, :2]) # first two rows, first two columns
print(x[:, 0])   # first column of every row
x[0, 0] = 99     # modify an element

[[9 5 8 9]
 [5 2 0 1]
 [5 5 0 2]]
[[9 5]
 [5 2]]
[9 5 5]


In [18]:
y = x.copy() # create an independent copy of x in variable y
# y = x would not copy anything: both names would refer to the same array, so modifying y would change x too!
print(y)

[[99  5  8  9]
 [ 5  2  0  1]
 [ 5  5  0  2]]
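
To make the difference concrete, the sketch below compares plain assignment, a slice, and an explicit copy. Slices of a NumPy array are views on the original data, so writing through them changes the original. The variable names are only for this example.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

alias = a                     # another name for the same array object
view = a[:2, :2]              # a slice is a view on a's data
independent = a[:2, :2].copy()  # an explicit copy owns its own data

view[0, 0] = 99         # writes through to a
independent[0, 1] = -1  # does not touch a

print(alias is a)                        # True
print(a[0, 0])                           # 99
print(a[0, 1])                           # 1
print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False
```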


### Reshaping of Arrays

In [19]:
x = np.arange(1, 13)      # 1-d array from 1 to 12
print(x)
x = x.reshape(3,4)        # reshape it to a 3*4 array with 3 rows and 4 columns
print(x)                  
x = x.reshape(x.size,1)   # convert it to a column vector (12 rows, 1 column)
print(x) 
x = x.flatten()           # convert it back to the same 1-d array
print(x)
print(x)

[ 1  2  3  4  5  6  7  8  9 10 11 12]
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
[[ 1]
 [ 2]
 [ 3]
 [ 4]
 [ 5]
 [ 6]
 [ 7]
 [ 8]
 [ 9]
 [10]
 [11]
 [12]]
[ 1  2  3  4  5  6  7  8  9 10 11 12]
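
Two related shortcuts are worth knowing; this is a small sketch with illustrative variable names. Passing `-1` for one dimension lets NumPy infer its size, and `np.newaxis` inside a slice adds a dimension of length 1.

```python
import numpy as np

x = np.arange(1, 13)

grid = x.reshape(3, -1)     # -1 is inferred as 4, giving shape (3, 4)
column = x[:, np.newaxis]   # column vector, shape (12, 1)
row = x[np.newaxis, :]      # row vector, shape (1, 12)
flat = grid.ravel()         # back to 1-d; returns a view when possible

print(grid.shape, column.shape, row.shape, flat.shape)
# (3, 4) (12, 1) (1, 12) (12,)
```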


### Concatenation of arrays

In [20]:
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
z = np.concatenate([x, y])
print(z)

[1 2 3 4 5 6]


In [21]:
x = np.array([[1, 2, 3],[4,5,6]])         # 2-d array
y = np.array([[7, 8, 9],[10,11,12]])      # 2-d array
z = np.concatenate([x, y])
print(z)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [22]:
# For working with arrays of mixed dimensions, it can be clearer to use the np.vstack (vertical stack)
# and np.hstack (horizontal stack) functions
x = np.array([1, 2, 3])
y = np.array([[4, 5, 6],[7,8,9]])
z = np.vstack([x, y])
print(z)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
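
The examples above all join arrays along the first axis (rows). As a short sketch with made-up arrays, joining along the second axis (columns) can be done with `axis=1` or with `np.hstack`.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Concatenate along the second axis: the columns are placed side by side
side_by_side = np.concatenate([grid, grid], axis=1)
print(side_by_side)
# [[1 2 3 1 2 3]
#  [4 5 6 4 5 6]]

# np.hstack does the same for arrays with matching row counts
extra = np.array([[99],
                  [99]])
wide = np.hstack([grid, extra])
print(wide)
# [[ 1  2  3 99]
#  [ 4  5  6 99]]
```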


### Splitting of arrays

In [23]:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9 ,10]
x1, x2, x3 = np.split(x, [3, 5]) #x1 from x[0] to x[3-1] , x2 from x[3] to x[5-1], x3 = x[5] to the end
print(x1, x2, x3)

[1 2 3] [4 5] [ 6  7  8  9 10]
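
For 2-d arrays, `np.vsplit` and `np.hsplit` work the same way along rows and columns. A minimal sketch on an illustrative 4*4 grid:

```python
import numpy as np

grid = np.arange(16).reshape(4, 4)

# Split after row index 2: rows 0-1 and rows 2-3
upper, lower = np.vsplit(grid, [2])
# Split after column index 2: columns 0-1 and columns 2-3
left, right = np.hsplit(grid, [2])

print(upper)
# [[0 1 2 3]
#  [4 5 6 7]]
print(left)
# [[ 0  1]
#  [ 4  5]
#  [ 8  9]
#  [12 13]]
```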


## <font color = red>Computation on NumPy Arrays: Universal Functions</font>

Python's default implementation (known as CPython) does some operations very
slowly. This is partly due to the dynamic, interpreted nature of the language: because
types are flexible, sequences of operations cannot be compiled down to
efficient machine code as in languages like C and Fortran.
For many types of operations, NumPy provides a convenient interface to just this
kind of statically typed, compiled routine. This is known as a vectorized operation.
You accomplish it by simply performing an operation on the array, which is
then applied to each element, leading to much faster execution than an explicit Python loop.
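
To see what "vectorized" means in practice, the sketch below computes reciprocals twice: once with an explicit Python loop and once with a single array expression. The helper `reciprocals_loop` is a hypothetical function written for this comparison, not part of NumPy. Both give the same result; on large arrays the vectorized form is typically far faster because the loop runs in compiled code.

```python
import numpy as np

def reciprocals_loop(values):
    """Compute 1/x element by element with a Python loop (slow on large arrays)."""
    out = np.empty(len(values))
    for i in range(len(values)):
        out[i] = 1.0 / values[i]
    return out

values = np.array([1, 2, 4, 5, 8])

loop_result = reciprocals_loop(values)
vector_result = 1.0 / values   # the division ufunc is applied to every element

print(vector_result)                            # [1.    0.5   0.25  0.2   0.125]
print(np.allclose(loop_result, vector_result))  # True
```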

In [24]:
x = np.arange(4)
print("x =", x)           
print("x + 5 =", x + 5)    # ufunc + (addition)
print("x - 5 =", x - 5)    # ufunc - (subtraction)
print("x * 2 =", x * 2)    # ufunc * (multiplication)
print("x / 2 =", x / 2)    # ufunc / (true division)
print("x // 2 =", x // 2)  # ufunc // (floor division)
print("-x = ", -x)         # ufunc unary - (negation)
print("x ** 2 = ", x ** 2) # ufunc ** (power)
print("x % 2 = ", x % 2)   # ufunc % (remainder)
print(np.abs(x-4))         # absolute value

x = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]
-x =  [ 0 -1 -2 -3]
x ** 2 =  [0 1 4 9]
x % 2 =  [0 1 0 1]
[4 3 2 1]


In [25]:
# theta must be an array (or scalar) of angles in radians, for example theta = np.linspace(0, np.pi, 3)
# print("sin(theta) = ", np.sin(theta))  computes the sine
# print("cos(theta) = ", np.cos(theta))  computes the cosine
# print("tan(theta) = ", np.tan(theta))  computes the tangent, which is sin/cos
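
A runnable version of the idea above might look like the following sketch, assuming `theta` holds the three angles 0, pi/2 and pi in radians. The results are floating-point approximations, so values that should be exactly 0 come out as tiny numbers, and tan(pi/2) comes out as a very large number rather than infinity.

```python
import numpy as np

# Three angles in radians: 0, pi/2, pi
theta = np.linspace(0, np.pi, 3)

sin_theta = np.sin(theta)
cos_theta = np.cos(theta)
tan_theta = np.tan(theta)

print("theta      =", theta)
print("sin(theta) =", sin_theta)
print("cos(theta) =", cos_theta)
print("tan(theta) =", tan_theta)

# Compare against the exact values with a tolerance, never with ==
print(np.allclose(sin_theta, [0, 1, 0]))   # True
print(np.allclose(cos_theta, [1, 0, -1]))  # True
```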

In [26]:
x = [1, 2, 3]
print("x =", x)
print("e^x =", np.exp(x))             # e to power of x
print("2^x =", np.exp2(x))            # 2 to power of x
print("3^x =", np.power(3, x ))        # n to power of x

x = [1, 2, 3]
e^x = [ 2.71828183  7.3890561  20.08553692]
2^x = [2. 4. 8.]
3^x = [ 3  9 27]


In [27]:
x = [1, 2, 4, 10]
print("x =", x)
print("ln(x) =", np.log(x))         # ln x
print("log2(x) =", np.log2(x))      # log x  for base = 2
print("log10(x) =", np.log10(x))    # log x  for base 10

x = [1, 2, 4, 10]
ln(x) = [0.         0.69314718 1.38629436 2.30258509]
log2(x) = [0.         1.         2.         3.32192809]
log10(x) = [0.         0.30103    0.60205999 1.        ]


## <font color = red> Aggregations</font>
NumPy has fast built-in aggregation functions for working on arrays; we’ll discuss and demonstrate some of them here.

In [28]:
x = np.random.randint(1,100,100) 
sum_of_x = np.sum(x)       #sum function
print('sum is : ',sum_of_x)
maxim = np.max(x)
print('max is : ',maxim)
minim = np.min(x)
print('min is : ',minim)
mean = np.mean(x)
print('mean is : ', mean)
std = np.std(x)
print('standard deviation is : ', std)
variance = np.var(x)
print('variance is : ', variance)
median = np.median(x)
print('median is : ', median)

sum is :  4989
max is :  99
min is :  1
mean is :  49.89
standard deviation is :  26.47220995685853
variance is :  700.7778999999999
median is :  48.0
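
On multi-dimensional arrays the same functions accept an `axis` argument. A small sketch with a hand-written 2*3 array: `axis=0` collapses the rows and gives one value per column, while `axis=1` collapses the columns and gives one value per row.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

total = m.sum()              # 21, over every element
column_sums = m.sum(axis=0)  # [5 7 9], one value per column
row_sums = m.sum(axis=1)     # [ 6 15], one value per row
column_max = m.max(axis=0)   # [4 5 6]
row_means = m.mean(axis=1)   # [2. 5.]

print(total, column_sums, row_sums, column_max, row_means)
```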


## <font color = red> Broadcasting </font>
Broadcasting is simply a
set of rules for applying binary ufuncs (addition, subtraction, multiplication, etc.) on
arrays of different sizes.

#### Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:
##### • Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
##### • Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
##### • Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
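
The three rules can be written out as a few lines of plain Python. The function `broadcast_shape` below is a hypothetical helper for illustration only; it is not how NumPy implements broadcasting internally. Its result is compared with `np.broadcast_shapes`, assuming NumPy 1.20 or newer.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape of two shapes, following the three rules."""
    # Rule 1: pad the shorter shape with ones on its left side
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)   # Rule 2: the size-1 side is stretched
        elif dim_a == 1:
            result.append(dim_b)   # Rule 2, the other way around
        else:
            # Rule 3: sizes disagree and neither is 1
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)

print(broadcast_shape((2, 3), (3,)))     # (2, 3)
print(broadcast_shape((3, 1), (3,)))     # (3, 3)
print(np.broadcast_shapes((3, 1), (3,))) # (3, 3), NumPy's own answer
```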


![02.05-broadcasting.png](attachment:02.05-broadcasting.png)

In [29]:
#Broadcasting example for rule 1
x = np.ones((2, 3))
print('x is : \n',x,'\n ************')
y = np.arange(3)
print('y is : \n',y,'\n ************')
z = x+y
print('z is : \n',z)

x is : 
 [[1. 1. 1.]
 [1. 1. 1.]] 
 ************
y is : 
 [0 1 2] 
 ************
z is : 
 [[1. 2. 3.]
 [1. 2. 3.]]


In [30]:
#Broadcasting example for rule 2
x = np.arange(3).reshape((3, 1))
print('x is : \n',x,'\n ************')
y = np.arange(3)
print('y is : \n',y,'\n ************')
z = x+y
print('z is : \n',z)

x is : 
 [[0]
 [1]
 [2]] 
 ************
y is : 
 [0 1 2] 
 ************
z is : 
 [[0 1 2]
 [1 2 3]
 [2 3 4]]
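
Rule 3 is the case where broadcasting fails. In this sketch (array names are illustrative), a (3, 2) array and a length-3 array cannot be combined, because padding gives (1, 3) and the last dimensions 2 and 3 disagree. Reshaping the second array to (3, 1) makes the shapes compatible.

```python
import numpy as np

m = np.ones((3, 2))
a = np.arange(3)

message = None
try:
    m + a   # shapes (3, 2) and (3,) -> (1, 3): 2 vs 3 disagree
except ValueError as err:
    message = str(err)
    print("broadcast failed:", message)

# Turning a into a column of shape (3, 1) lets it stretch across the 2 columns
fixed = m + a[:, np.newaxis]
print(fixed)
# [[1. 1.]
#  [2. 2.]
#  [3. 3.]]
```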


## <font color = red> Masks</font>
Masking comes up when you want to extract, modify, count, or
otherwise manipulate values in an array based on some criterion.

In [31]:
x = np.arange(1,21)
print(x)

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20]


In [32]:
print(x[ x%2 == 0])    # == keeps the elements where the comparison is True (even numbers)
print(x[ x%2 != 0])    # != keeps the elements where the values differ (odd numbers)
print(x[ x < 5])       # <  less than
print(x[ x <= 5])      # <= less than or equal
print(x[ x > 18])      # >  greater than
print(x[ x >= 18])     # >= greater than or equal

[ 2  4  6  8 10 12 14 16 18 20]
[ 1  3  5  7  9 11 13 15 17 19]
[1 2 3 4]
[1 2 3 4 5]
[19 20]
[18 19 20]


In [33]:
# we can combine the comparisons above with & | ~ to build compound conditions
print(x[ (x%2 == 0) & (x%3 == 0)])  # and operator
print(x[ (x%2 == 0) | (x%3 == 0)])  # or operator
print(x[ ~(x%2 == 0)])              # not operator

[ 6 12 18]
[ 2  3  4  6  8  9 10 12 14 15 16 18 20]
[ 1  3  5  7  9 11 13 15 17 19]
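
Masks are also useful for counting and for modifying values, not only for extracting them. A minimal sketch on the same range 1 to 20 (the names `even` and `capped` are just for this example):

```python
import numpy as np

x = np.arange(1, 21)

even = x % 2 == 0               # Boolean array, True where x is even
print(np.count_nonzero(even))   # 10 even numbers
print(np.sum(x > 15))           # 5, because True counts as 1
print(np.any(x > 19))           # True, at least one element is above 19
print(np.all(x > 0))            # True, every element is positive

# Assigning through a mask changes only the selected elements
capped = x.copy()
capped[capped > 15] = 15
print(capped.max())             # 15
```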


## <font color = red> Sorting arrays</font>

In [34]:
x = np.random.randint(1,20,10)
print(x)

[11  9 10  6  5 19 17  4  9  6]


In [35]:
np.sort(x) # returns a sorted copy of the array; the default algorithm is quicksort

array([ 4,  5,  6,  6,  9,  9, 10, 11, 17, 19])
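
Two related tools, shown here as a small sketch on a hand-written array: `np.argsort` returns the indices that would sort the array, and the `sort` method of an array sorts it in place instead of returning a copy.

```python
import numpy as np

x = np.array([2, 1, 4, 3, 5])

order = np.argsort(x)    # indices of the elements in sorted order
print(order)             # [1 0 3 2 4]
print(x[order])          # [1 2 3 4 5]

x.sort()                 # sorts x itself, returns None
print(x)                 # [1 2 3 4 5]
```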

## <font color = red> NumPy’s Structured Arrays </font>
While often our data can be well represented by a homogeneous array of values,
sometimes this is not the case. This section demonstrates the use of NumPy's
structured arrays and record arrays, which provide efficient storage for compound,
heterogeneous data.

In [36]:
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),'formats':('U10', 'i4', 'f8')})

data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)

[('Alice', 25, 55. ) ('Bob', 45, 85.5) ('Cathy', 37, 68. )
 ('Doug', 19, 61.5)]


In [37]:
print(data['name'])

['Alice' 'Bob' 'Cathy' 'Doug']


In [38]:
data[data['age'] < 30]['name']

array(['Alice', 'Doug'], dtype='<U10')
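
Structured arrays can be indexed by position as well as by field name. The sketch below rebuilds a smaller version of the table with three people (a made-up subset for illustration) and uses the list-of-tuples form of the dtype, which is equivalent to the dictionary form used above.

```python
import numpy as np

people = np.zeros(3, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
people['name'] = ['Alice', 'Bob', 'Cathy']
people['age'] = [25, 45, 37]
people['weight'] = [55.0, 85.5, 68.0]

first = people[0]                # one whole record
last_name = people[-1]['name']   # one field of the last record
heaviest = people[np.argmax(people['weight'])]['name']

print(first)      # ('Alice', 25, 55.)
print(last_name)  # Cathy
print(heaviest)   # Bob
```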