# Basics of NumPy
- NumPy - Introduction and Installation
- NumPy - Arrays Data Structure (1D, 2D, ND arrays)
- Creating Arrays
- NumPy - Data Types
- Array Attributes
- Creating Arrays – Alternative Ways
- Sub-setting, Slicing and Indexing Arrays
- Operations on Arrays
- Array Manipulation


### NumPy – Introduction and Installation

- NumPy stands for ‘Numerical Python’
- Used for mathematical and scientific computations
- The NumPy array (`ndarray`) is the most widely used object in the NumPy library

#### Installing numpy

!pip install numpy

#### Importing numpy

In [2]:
import numpy as np

### Arrays Data Structure

An `Array` is a collection of homogeneous data objects that can be indexed across multiple dimensions

#### Arrays are –
- ordered sequences/collections of homogeneous data
- multidimensional
- mutable
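The three properties above can be seen in a short sketch. The variable names `mixed` and `grid` are illustrative only and do not appear elsewhere in these notes.

```python
import numpy as np

# Homogeneous: mixing ints and a float upcasts every element to float64
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)   # float64

# Multidimensional: a nested list becomes a 2-D array
grid = np.array([[1, 2], [3, 4]])
print(grid.ndim)     # 2

# Mutable: an element can be reassigned in place
grid[0, 1] = 99
print(grid)
```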


#### Creating Arrays – From list/tuple

- `np.array()` creates a NumPy array from a list or a tuple


#### Example on 1-D Array

In [3]:
arr = np.array([1, 2, 3, 4, 5])
arr

array([1, 2, 3, 4, 5])

#### Example on 2-D Array

In [4]:
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
arr

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

### Array Attributes

- Attributes are the features/characteristics of an object that describe it

- Some of the attributes (and one related method) of the NumPy array are:
    - **shape** - Array dimensions
    - **size** - Number of array elements
    - **dtype** - Data type of array elements
    - **ndim** - Number of array dimensions
    - **dtype.name** - Name of data type
    - **astype()** - A method, not an attribute; returns a copy of the array converted to a different type


In [5]:
arr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
arr

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [6]:
arr.shape

(3, 4)

In [7]:
arr.size

12

In [8]:
arr.ndim

2

In [9]:
arr.dtype

dtype('int64')

In [10]:
arr.astype(float)

array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12.]])
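`dtype.name` is listed above but not demonstrated, so here is a minimal sketch. It assumes the array is created with an explicit `dtype=np.int64` so the result does not depend on the platform default; `as_float` is an illustrative name.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], dtype=np.int64)
print(arr.dtype.name)          # int64

# astype() returns a new array; the original keeps its dtype
as_float = arr.astype(float)
print(as_float.dtype.name)     # float64
print(arr.dtype.name)          # int64
```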

### Indexing, Slicing and Boolean Indexing

In [11]:
arr = np.random.randint(5, 50, size = 10)
arr

array([35, 18, 43, 47, 11, 18, 25, 10, 40, 32], dtype=int32)

#### 1-D Arrays

**Indexing**

- this concept is the same for str, list, tuple and NumPy arrays

In [12]:
arr[0]  # - first element

np.int32(35)

In [14]:
arr[3]  # - fourth element

np.int32(47)

In [15]:
type(arr[3])

numpy.int32

In [16]:
type(32)

int

In [17]:
int(arr[3])

47

In [19]:
arr[-1]  # last element

np.int32(32)

**Slicing**

- this concept is the same for str, list, tuple and NumPy arrays

###### Ex. Extract first 3 elements

In [21]:
arr[0:3]

array([35, 18, 43], dtype=int32)

###### Ex. Extract elements from position 3 to the end of the array

In [22]:
arr[3 : ]  # Leave the end point empty to extract through the last element

array([47, 11, 18, 25, 10, 40, 32], dtype=int32)

###### Ex. Extract last 5 elements

In [23]:
arr[-5 : ]

array([18, 25, 10, 40, 32], dtype=int32)

**Index-array and conditional (boolean) indexing - Filtering the arrays**
- applicable only to arrays, not to str, list or tuple

###### Ex. Extract elements at index positions 9, 2 and 5.

In [25]:
arr[[9, 2, 5]]

array([32, 43, 18], dtype=int32)

###### Ex. Extract elements less than 20

In [27]:
arr < 20  # returns a boolean array

array([False,  True, False, False,  True,  True, False,  True, False,
       False])

In [29]:
arr[arr < 20] # Extracts the values where condition is True

array([18, 11, 18, 10], dtype=int32)

In [33]:
# Sum of numbers in the array
np.sum(arr)

np.int64(279)

In [34]:
# Count the number of values less than 20
np.sum(arr < 20) # arr < 20 generates a bool array; True counts as 1 and False as 0, so the sum is the number of True values

np.int64(4)

In [35]:
# Are there any numbers less than 20
np.sum(arr < 20) >= 1

np.True_

In [37]:
# Are there any numbers less than 20
np.any(arr < 20) # checks if any 1 value in the bool array is True

np.True_
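Conditions can also be combined before filtering. The sketch below re-creates the array printed above as a literal (the original came from `np.random.randint`, so your own values will differ) and uses the element-wise operators `&` and `|`. Each condition needs its own parentheses. The names `in_range` and `extremes` are illustrative.

```python
import numpy as np

arr = np.array([35, 18, 43, 47, 11, 18, 25, 10, 40, 32])

# Values between 20 and 40, both ends included
in_range = arr[(arr >= 20) & (arr <= 40)]
print(in_range)                      # [35 25 40 32]

# Values below 15 or above 45
extremes = arr[(arr < 15) | (arr > 45)]
print(extremes)                      # [47 11 10]

# np.count_nonzero is an alternative to np.sum for counting True values
print(np.count_nonzero(arr < 20))    # 4

# np.all checks whether every value satisfies the condition
print(np.all(arr >= 5))              # True
```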

### 2-D Arrays

In [38]:
arr = np.random.randint(5, 50, size = (6,4))
arr

array([[28, 19, 24, 14],
       [28, 43, 37, 20],
       [40, 37, 33, 17],
       [19, 11,  5, 20],
       [11, 11, 16, 13],
       [11, 47,  9,  9]], dtype=int32)

###### Ex. Extract first 3 rows

In [39]:
arr[0:3]

array([[28, 19, 24, 14],
       [28, 43, 37, 20],
       [40, 37, 33, 17]], dtype=int32)

###### Ex. Extract last 2 rows

In [40]:
arr[-2:]

array([[11, 11, 16, 13],
       [11, 47,  9,  9]], dtype=int32)

###### Ex. Extract the second column (counting from 1, i.e. index 1)

In [42]:
arr[:, 1]  # Selecting a single row or column with an integer index returns a 1-D array

array([19, 43, 37, 11, 11, 47], dtype=int32)
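If the column should stay two-dimensional, a slice or a one-element list can be used instead of an integer index. This is a small sketch; the array is re-created from the output printed above, and `col_1d`, `col_slice` and `col_list` are illustrative names.

```python
import numpy as np

arr = np.array([[28, 19, 24, 14],
                [28, 43, 37, 20],
                [40, 37, 33, 17],
                [19, 11,  5, 20],
                [11, 11, 16, 13],
                [11, 47,  9,  9]])

col_1d = arr[:, 1]       # integer index drops the column axis
col_slice = arr[:, 1:2]  # slice keeps the column axis
col_list = arr[:, [1]]   # one-element list also keeps it

print(col_1d.shape)      # (6,)
print(col_slice.shape)   # (6, 1)
print(col_list.shape)    # (6, 1)
```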

###### Ex. Extract rows 2 and 3 and columns 2 and 3 (counting from 1)

In [44]:
arr[1:3, 1:3]

array([[43, 37],
       [37, 33]], dtype=int32)

###### Ex. Extract values less than 25

In [46]:
arr[arr < 25]

array([19, 24, 14, 20, 17, 19, 11,  5, 20, 11, 11, 16, 13, 11,  9,  9],
      dtype=int32)

###### Ex. Identify the largest value, then extract the values less than half of it

In [47]:
arr[arr < np.max(arr)/2]

array([19, 14, 20, 17, 19, 11,  5, 20, 11, 11, 16, 13, 11,  9,  9],
      dtype=int32)

### Array Operations

#### Arithmetic operations on Arrays -
 - Addition, Subtraction, Multiplication, Division, etc.
 - Operations on array and a scalar value
 - Operations between two arrays
 - Matrix Operations - Multiplication (np.dot()), Transpose (np.transpose())
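The matrix operations named in the last bullet are not shown in the cells that follow, so here is a minimal sketch. The 2x2 arrays `a` and `b` are made-up values chosen so the results are easy to check by hand.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Element-wise multiplication: each entry times the matching entry
elementwise = a * b
print(elementwise)        # [[ 5 12]
                          #  [21 32]]

# Matrix multiplication: rows of a times columns of b
product = np.dot(a, b)
print(product)            # [[19 22]
                          #  [43 50]]

# Transpose: rows become columns
transposed = np.transpose(a)
print(transposed)         # [[1 3]
                          #  [2 4]]
```

For 2-D arrays, `a @ b` gives the same result as `np.dot(a, b)`, and `a.T` is shorthand for `np.transpose(a)`.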


#### Array and Scalar

In [48]:
arr1 = np.random.randint(1,10,size = 5)
arr1 

array([7, 6, 5, 3, 3], dtype=int32)

In [49]:
arr1 + 5 # Addition

array([12, 11, 10,  8,  8], dtype=int32)

In [50]:
arr1 - 5 # Subtraction

array([ 2,  1,  0, -2, -2], dtype=int32)

In [51]:
arr1 * 5 # Multiplication

array([35, 30, 25, 15, 15], dtype=int32)

In [52]:
arr1 / 5 # Division

array([1.4, 1.2, 1. , 0.6, 0.6])

In [53]:
arr1 // 5 # Floor Division

array([1, 1, 1, 0, 0], dtype=int32)

In [54]:
arr1 % 5 # Modulus

array([2, 1, 0, 3, 3], dtype=int32)

#### Two Arrays

In [55]:
arr1 = np.random.randint(1,10,size = 5)
arr1

array([1, 5, 6, 9, 8], dtype=int32)

In [56]:
arr2 = np.random.randint(1,10,size = 5)
arr2

array([2, 8, 8, 4, 3], dtype=int32)

In [57]:
arr1 + arr2 # Addition

array([ 3, 13, 14, 13, 11], dtype=int32)

In [58]:
arr1 - arr2 # Subtraction

array([-1, -3, -2,  5,  5], dtype=int32)

In [59]:
arr1 * arr2 # Multiplication

array([ 2, 40, 48, 36, 24], dtype=int32)

In [60]:
arr1 / arr2 # Division

array([0.5       , 0.625     , 0.75      , 2.25      , 2.66666667])

In [61]:
arr1 // arr2 # Floor Division

array([0, 0, 0, 2, 2], dtype=int32)

In [62]:
arr1 % arr2 # Modulus

array([1, 5, 6, 1, 2], dtype=int32)

#### Relational operations on Arrays -
 - ==, !=, <, >, <=, >=
 - Operations on array and a scalar value
 - Operations between two arrays

#### Array and Scalar

In [63]:
arr1 = np.random.randint(1,10,size = 5)
arr1

array([7, 5, 4, 3, 6], dtype=int32)

In [64]:
arr1 == 5

array([False,  True, False, False, False])

In [65]:
arr1 != 5

array([ True, False,  True,  True,  True])

In [66]:
arr1 < 5

array([False, False,  True,  True, False])

In [67]:
arr1 > 5

array([ True, False, False, False,  True])

In [68]:
arr1 <= 5

array([False,  True,  True,  True, False])

In [69]:
arr1 >= 5

array([ True,  True, False, False,  True])

#### Two Arrays

In [70]:
arr1 = np.random.randint(1,10,size = 5)
arr1

array([5, 1, 8, 8, 2], dtype=int32)

In [71]:
arr2 = np.random.randint(1,10,size = 5)
arr2

array([6, 6, 6, 8, 4], dtype=int32)

In [72]:
arr1 == arr2

array([False, False, False,  True, False])

In [73]:
arr1 != arr2

array([ True,  True,  True, False,  True])

In [74]:
arr1 < arr2

array([ True,  True, False, False,  True])

In [75]:
arr1 > arr2

array([False, False,  True, False, False])

In [76]:
arr1 <= arr2

array([ True,  True, False,  True,  True])

In [77]:
arr1 >= arr2

array([False, False,  True,  True, False])

#### Logical operations on Arrays -
 - np.logical_or()
 - np.logical_and()
 - np.logical_not()
 - np.logical_xor()

In [78]:
arr1 = np.random.randint(1,10,size = 5)
arr1

array([6, 6, 6, 9, 8], dtype=int32)

In [79]:
arr2 = np.random.randint(1,10,size = 5)
arr2

array([3, 2, 7, 3, 5], dtype=int32)

In [82]:
np.logical_and(arr1 > 5, arr2 > 5)

array([False, False,  True, False, False])

In [83]:
np.logical_or(arr1 > 5, arr2 > 5)

array([ True,  True,  True,  True,  True])

In [84]:
np.logical_not(arr1 > 5)

array([False, False, False, False, False])

In [85]:
np.logical_xor(arr1 > 5, arr2 > 5)

array([ True,  True, False,  True,  True])
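On boolean arrays, the operators `&`, `|`, `~` and `^` give the same results as the four functions above. The sketch below checks this on the arrays printed earlier, re-created here as literals because the originals were random.

```python
import numpy as np

arr1 = np.array([6, 6, 6, 9, 8])
arr2 = np.array([3, 2, 7, 3, 5])

m1 = arr1 > 5   # [ True  True  True  True  True]
m2 = arr2 > 5   # [False False  True False False]

print(m1 & m2)  # same as np.logical_and(m1, m2)
print(m1 | m2)  # same as np.logical_or(m1, m2)
print(~m1)      # same as np.logical_not(m1)
print(m1 ^ m2)  # same as np.logical_xor(m1, m2)
```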

#### Set Operations on Arrays

Applicable to 1-D Arrays only

- np.unique() - Find the unique elements of an array.
- np.in1d() - Test whether each element of a 1-D array is also present in a second array (deprecated since NumPy 2.0; prefer np.isin()).
- np.intersect1d() - Find the intersection of two arrays.
- np.setdiff1d() - Find the set difference of two arrays.
- np.union1d() - Find the union of two arrays.

In [87]:
arr1 = np.random.randint(1,10,size = 10)
arr1

array([8, 7, 2, 3, 7, 4, 7, 7, 3, 4], dtype=int32)

In [89]:
np.unique(arr1)

array([2, 3, 4, 7, 8], dtype=int32)

In [91]:
np.unique(arr1, return_counts=True, return_index=True)

(array([2, 3, 4, 7, 8], dtype=int32),
 array([2, 3, 5, 1, 0]),
 array([1, 2, 2, 4, 1]))

In [92]:
arr1 = np.random.randint(1,10,size = 10)
arr1

array([1, 3, 6, 3, 2, 8, 4, 1, 5, 6], dtype=int32)

In [93]:
arr2 = np.random.randint(1,10,size = 10)
arr2

array([5, 3, 3, 2, 3, 4, 5, 3, 7, 2], dtype=int32)

In [94]:
np.intersect1d(arr1, arr2) # common elements in 2 arrays

array([2, 3, 4, 5], dtype=int32)

In [96]:
np.union1d(arr1, arr2)

array([1, 2, 3, 4, 5, 6, 7, 8], dtype=int32)

In [98]:
np.setdiff1d(arr1, arr2) # elements of arr1 that are not in arr2

array([1, 6, 8], dtype=int32)

In [99]:
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([1, 3, 5, 7, 9])

# Checks whether each element of arr1 is present in arr2
np.in1d(arr1, arr2)

array([ True, False,  True, False,  True])

In [100]:
np.isin(arr1, arr2)

array([ True, False,  True, False,  True])

In [101]:
dict(zip(arr1, np.isin(arr1, arr2)))

{np.int64(1): np.True_,
 np.int64(2): np.False_,
 np.int64(3): np.True_,
 np.int64(4): np.False_,
 np.int64(5): np.True_}

### Array Functions/Methods

- np.all(), np.any()
- arr.sum()
- arr.min(), arr.max(), arr.argmin(), arr.argmax()
- np.round()
- np.mean(), np.median(), np.average(), np.percentile()


In [102]:
arr2 = np.array([1, 3, 5, 7, 9])

In [103]:
np.max(arr2)

np.int64(9)

In [104]:
arr2.max()

np.int64(9)

In [107]:
arr2.argmax() # the index position of largest element

np.int64(4)
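The remaining functions from the list can be tried on the same array. This is a sketch only; the weights passed to `np.average` and the values passed to `np.round` are made up for illustration.

```python
import numpy as np

arr2 = np.array([1, 3, 5, 7, 9])

print(arr2.min(), arr2.argmin())        # smallest value and its index: 1 0
print(np.mean(arr2))                    # 5.0
print(np.median(arr2))                  # 5.0
print(np.percentile(arr2, 25))          # 25th percentile: 3.0

# np.average accepts optional weights (hypothetical weights here)
weighted = np.average(arr2, weights=[1, 1, 1, 1, 6])
print(weighted)                         # 7.0

# np.round rounds to the given number of decimals
rounded = np.round(np.array([1.234, 5.678]), 1)
print(rounded)                          # [1.2 5.7]

print(np.all(arr2 > 0), np.any(arr2 > 8))   # True True
```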

### Array Manipulations

- **Changing Shape** – np.reshape()
- **Adding/Removing Elements** – np.append(), np.insert(), np.delete()
- **Splitting and Flattening Arrays** – np.hsplit(), np.vsplit(), arr_obj.flatten()
- **Sorting Arrays** - np.sort(), np.argsort(), arr_obj.sort(), arr_obj.argsort()

In [109]:
arr = np.arange(1, 13)  # generates sequence of numbers
arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

#### np.reshape()

In [110]:
arr = np.reshape(arr, (4, 3))
arr

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

#### np.append()

In [111]:
np.append(arr, 10) # Flattens the array

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 10])

In [112]:
np.append(arr, [13, 14, 15]) # Flattens the array

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [113]:
np.append(arr, np.reshape(np.array([13, 14, 15]), (1, 3)), axis=0) # axis = 0 adds a 2-D array row-wise

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15]])

In [114]:
np.append(arr, [[10],[20],[30],[40]], axis=1) # axis = 1 adds a 2-D array column-wise

array([[ 1,  2,  3, 10],
       [ 4,  5,  6, 20],
       [ 7,  8,  9, 30],
       [10, 11, 12, 40]])

#### np.insert()

In [116]:
arr = np.reshape(np.arange(1,13), (4,3))
arr

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [117]:
np.insert(arr, 1, 5) # Flattens the arr and inserts 5 at index 1

array([ 1,  5,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [118]:
np.insert(arr, 1, 5, axis=0) # Inserts [5, 5, 5] as row 1

array([[ 1,  2,  3],
       [ 5,  5,  5],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [119]:
np.insert(arr, 1, 5, axis=1) # Inserts [5, 5, 5, 5] as column 1

array([[ 1,  5,  2,  3],
       [ 4,  5,  5,  6],
       [ 7,  5,  8,  9],
       [10,  5, 11, 12]])

In [120]:
np.insert(arr, 1, [10, 20, 30, 40], axis=1)

array([[ 1, 10,  2,  3],
       [ 4, 20,  5,  6],
       [ 7, 30,  8,  9],
       [10, 40, 11, 12]])

In [121]:
np.insert(arr, 1, [10, 20, 30], axis=0)

array([[ 1,  2,  3],
       [10, 20, 30],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

#### np.delete()

In [122]:
arr = np.reshape(np.arange(1,13), (4,3))
arr

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [123]:
np.delete(arr, 1) # Flattens the arr and deletes element at index 1

array([ 1,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [125]:
np.delete(arr, 1, axis=0) # deletes row 1

array([[ 1,  2,  3],
       [ 7,  8,  9],
       [10, 11, 12]])

In [126]:
np.delete(arr, 1, axis=1) # deletes column 1

array([[ 1,  3],
       [ 4,  6],
       [ 7,  9],
       [10, 12]])

In [127]:
np.delete(arr,[0,2], axis=0) # deletes selected rows

array([[ 4,  5,  6],
       [10, 11, 12]])

In [128]:
np.delete(arr,[0,2], axis=1) # deletes selected columns

array([[ 2],
       [ 5],
       [ 8],
       [11]])

##### Note - np.append(), np.insert() and np.delete() each return a new array; the original array is not modified
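A quick way to confirm this is to compare the original array before and after each call. A minimal sketch, with `appended`, `inserted` and `deleted` as illustrative names:

```python
import numpy as np

arr = np.reshape(np.arange(1, 13), (4, 3))

appended = np.append(arr, [[13, 14, 15]], axis=0)  # adds one row
inserted = np.insert(arr, 1, 5, axis=1)            # adds one column
deleted = np.delete(arr, 1, axis=0)                # removes one row

print(appended.shape)   # (5, 3)
print(inserted.shape)   # (4, 4)
print(deleted.shape)    # (3, 3)
print(arr.shape)        # (4, 3), the original is untouched

# To keep a result, assign it back: arr = np.delete(arr, 1, axis=0)
```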

#### np.hsplit(), np.vsplit()

In [129]:
arr = np.reshape(np.arange(1, 25), (6,4))
arr

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16],
       [17, 18, 19, 20],
       [21, 22, 23, 24]])

In [130]:
np.vsplit(arr, 2)

[array([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]]),
 array([[13, 14, 15, 16],
        [17, 18, 19, 20],
        [21, 22, 23, 24]])]

In [131]:
np.hsplit(arr, 2)

[array([[ 1,  2],
        [ 5,  6],
        [ 9, 10],
        [13, 14],
        [17, 18],
        [21, 22]]),
 array([[ 3,  4],
        [ 7,  8],
        [11, 12],
        [15, 16],
        [19, 20],
        [23, 24]])]

#### flatten()

In [132]:
arr

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16],
       [17, 18, 19, 20],
       [21, 22, 23, 24]])

In [134]:
arr.flatten()

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24])

In [133]:
arr.flatten(order = "F") # order="F" reads the values column by column; still returns a 1-D array

array([ 1,  5,  9, 13, 17, 21,  2,  6, 10, 14, 18, 22,  3,  7, 11, 15, 19,
       23,  4,  8, 12, 16, 20, 24])

In [None]:
help(arr.flatten)

#### Sorting Arrays

In [146]:
products = np.array(["p1", "p2", "p3", "p4"])
prices = np.array([200, 400, 300, 100])

np.sort(prices) # Generates a new array

array([100, 200, 300, 400])

In [138]:
np.sort(prices)[::-1]  # DESC Sort

array([400, 300, 200, 100])

In [149]:
np.argsort(prices)  # returns the index position of the elements in sorted order of values

array([3, 0, 2, 1])

In [147]:
prices[np.argsort(prices)]

array([100, 200, 300, 400])

In [150]:
products[np.argsort(prices)] # Sort the products by prices

array(['p4', 'p1', 'p3', 'p2'], dtype='<U2')
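The cells above use `np.sort()`, which returns a new array. The method `arr_obj.sort()` sorts the array itself instead. The sketch below contrasts the two and also sorts the products from most to least expensive; `sorted_copy`, `in_place` and `desc_order` are illustrative names.

```python
import numpy as np

products = np.array(["p1", "p2", "p3", "p4"])
prices = np.array([200, 400, 300, 100])

sorted_copy = np.sort(prices)   # new array; prices is unchanged
print(prices)                   # [200 400 300 100]

in_place = prices.copy()
in_place.sort()                 # sorts in_place itself and returns None
print(in_place)                 # [100 200 300 400]

# Reverse the argsort result to order products by descending price
desc_order = np.argsort(prices)[::-1]
print(products[desc_order])     # ['p2' 'p3' 'p1' 'p4']
```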

### Examples on Coffee Shop Data Set

In [None]:
import numpy as np
products = np.array(['Caffe Latte', 'Cappuccino', 'Colombian', 'Darjeeling', 'Decaf Irish Cream', 'Earl Grey', 'Green Tea', 'Lemon', 'Mint', 'Regular Espresso'])
sales = np.array([52248.0, 14068.0, 71060.0, 60014.0, 69925.0, 27711.0, 19231.0, 24873.0, 32825.0, 44109.0])
profits = np.array([17444.0, 5041.0, 28390.0, 20459.0, 23432.0, 7691.0, -2954.0, 7159.0, 10722.0, 14902.0])
target_profits = np.array([15934.0, 4924.0, 31814.0, 19649.0, 24934.0, 8461.0, 7090.0, 7084.0, 10135.0, 16306.0])
target_sales = np.array([48909.0, 13070.0, 80916.0, 57368.0, 66906.0, 30402.0, 18212.0, 21628.0, 27336.0, 42102.0])

###### Ex. How many products are there in the dataset?

###### Ex. Which product had the highest sales?

###### Ex. Which product had a loss?

###### Ex. Which products had profit margins (profit/sales) greater than 30%?

###### Ex. Find products with low sales but high profits (sales < median, profit > median)

###### Ex. Generate new values of Sales after applying 18% Tax

###### Ex. Top 3 most profitable products?

###### Ex. Which products exceeded their sales targets?

###### Ex. How many products met or exceeded profit targets?

###### Ex. What is the average sales target achievement rate?

###### Ex. Find Products Falling Short of Profit Targets and Sort by Achievement Percentage
1. Filter products that did not meet their profit targets
2. Calculate the percentage of target profit achieved for each
3. Display these products in descending order based on their achievement percentage
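One possible way to work through the three steps above is sketched below. It is not the only approach. The names `short`, `achievement` and `order` are illustrative, and only the arrays needed for this exercise are repeated.

```python
import numpy as np

products = np.array(['Caffe Latte', 'Cappuccino', 'Colombian', 'Darjeeling',
                     'Decaf Irish Cream', 'Earl Grey', 'Green Tea', 'Lemon',
                     'Mint', 'Regular Espresso'])
profits = np.array([17444.0, 5041.0, 28390.0, 20459.0, 23432.0,
                    7691.0, -2954.0, 7159.0, 10722.0, 14902.0])
target_profits = np.array([15934.0, 4924.0, 31814.0, 19649.0, 24934.0,
                           8461.0, 7090.0, 7084.0, 10135.0, 16306.0])

# Step 1: boolean mask of products whose profit is below its target
short = profits < target_profits

# Step 2: percentage of the target profit achieved by those products
achievement = profits[short] / target_profits[short] * 100

# Step 3: indices that sort the percentages from highest to lowest
order = np.argsort(achievement)[::-1]

for name, pct in zip(products[short][order], np.round(achievement[order], 2)):
    print(name, pct)
```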