# Food Delivery Data Exploration and Analysis 2

# Agenda

### Numpy Topics

1. `np.arange`
2. Fancy Indexing
3. 2D Matrix  
   a. Reshape  
   b. Indexing and Slicing  
   c. Masking  
4. Aggregate Functions  
   a. Axis  
5. `np.all`  
6. `np.any`  
7. `np.where`  

### Array Operations

1. Sort  
2. Sort 2D Array  
3. Matrix Multiplication  
   a. `np.dot`  
   b. `np.matmul`  
   c. `@`
   
---


# Recapping EDA and Data Scientist Roles

## <span style="color: skyblue;"> The Goal: Becoming a Data Scientist </span>
- To do **Machine Learning**, the first step is **EDA (Exploratory Data Analysis)**.
- **Analogy:** Just as we check reviews and ratings before going to a movie, companies explore huge volumes of data to find patterns.

## <span style="color: skyblue;"> Industry Examples of EDA </span>
- **Amazon during Diwali:** Amazon identifies patterns (e.g., people buying lights, decorations, or artificial flowers) and offers specific discounts based on popularity.
- **Zomato Delivery Time:**
  - On the app, you see "41 minutes" or "29 minutes".
  - **The Problem:** You cannot take a mathematical average of "41 minutes" because it contains the string "minutes".
  - **The Solution:** We need to clean the data (convert to numbers like 41, 29) using libraries like **NumPy** or **Pandas**.
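
Here is a minimal sketch of that cleaning step in NumPy. It assumes every value follows the exact pattern `"<number> minutes"`, and the delivery times below are made-up sample values:

```python
import numpy as np

# Made-up delivery times, stored as text the way the app displays them
delivery_times = np.array(["41 minutes", "29 minutes", "35 minutes"])

# Remove the " minutes" suffix, then convert the remaining digits to integers
minutes = np.char.replace(delivery_times, " minutes", "").astype(int)

print(minutes)          # [41 29 35]
print(minutes.mean())   # 35.0
```

If some entries used a different format, this simple replace would not be enough. A more careful parser or Pandas string methods would be needed.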

### **The Toolkit**
To extract information from millions of records and millions of customers, we use:
1.  **NumPy**: Numerical operations.
2.  **Pandas**: Data manipulation.
3.  **Matplotlib & Seaborn**: Converting humongous data into useful charts (pie charts, bar charts, line charts).

We will use all of these tools.

---
# NumPy Basics



## <span style="color: skyblue;"> The Easiest Function: `np.arange` </span>
- **Use Case:** Creating every customer ID automatically (1, 2, 3... millions) instead of doing it manually.
- **Syntax:** `np.arange(start, stop, step)`
- **Rule:** The `stop` value is **never included** (it is excluded).

### Code:


In [None]:
import numpy as np

# Creating IDs for 1000 customers
ids = np.arange(1, 1001)
print(ids)

[   1    2    3    4    5    6    7    8    9   10   11   12   13   14
   15   16   17   18   19   20   21   22   23   24   25   26   27   28
   29   30   31   32   33   34   35   36   37   38   39   40   41   42
   43   44   45   46   47   48   49   50   51   52   53   54   55   56
   57   58   59   60   61   62   63   64   65   66   67   68   69   70
   71   72   73   74   75   76   77   78   79   80   81   82   83   84
   85   86   87   88   89   90   91   92   93   94   95   96   97   98
   99  100  101  102  103  104  105  106  107  108  109  110  111  112
  113  114  115  116  117  118  119  120  121  122  123  124  125  126
  127  128  129  130  131  132  133  134  135  136  137  138  139  140
  141  142  143  144  145  146  147  148  149  150  151  152  153  154
  155  156  157  158  159  160  161  162  163  164  165  166  167  168
  169  170  171  172  173  174  175  176  177  178  179  180  181  182
  183  184  185  186  187  188  189  190  191  192  193  194  195  196
  197 ...
(output truncated here; the printed array continues up to 1000)

## <span style="color: skyblue;"> Step Size and Floats </span>
- **Step Size:** You can skip numbers. Starting at 1 with a step size of 2 gives `1, 3, 5, ...`.
- Unlike Python's native `range`, NumPy's `arange` supports **decimal points (floats)**.

<span style="background-color: red;"> **[Ask Learners]:** </span>
> <span style="color: violet;"> Question: If I want multiples of 10 from 10 to 90, what should be the `np.arange` parameters? </span>
> Answer: `np.arange(10, 100, 10)` or `np.arange(10, 91, 10)`.

### Code:


In [None]:
np.arange(1,3,0.2)

array([1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8])
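
Float steps work, but NumPy derives the number of elements from `(stop - start) / step`. Floating-point rounding can therefore occasionally add or drop an element at the end. NumPy's documentation suggests `np.linspace` for non-integer steps, because you give it the number of points instead of the step. A quick comparison:

```python
import numpy as np

# arange with a float step: the element count comes from (stop - start) / step
stepped = np.arange(1, 3, 0.2)

# linspace takes the number of points instead of a step, so the count is exact;
# endpoint=False leaves out the stop value, mirroring arange
spaced = np.linspace(1, 3, 10, endpoint=False)

print(len(stepped), len(spaced))      # 10 10
print(np.allclose(stepped, spaced))   # True
```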

## <span style="color: violet;"> Why not use a `for` loop? </span>
- If you have 10 million customers, a `for` loop is too slow.
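- NumPy applies an operation to the whole array at once (**vectorization**). The loop runs inside optimized, compiled code instead of the Python interpreter, which makes it much faster.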
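
Below is a rough timing sketch that compares a Python `for` loop with NumPy's vectorized `sum` over one million made-up values. Exact timings depend on your machine, but the vectorized version is typically many times faster:

```python
import time
import numpy as np

# One million made-up order amounts
amounts = np.arange(1_000_000, dtype=np.int64)

# Python for loop: the interpreter handles one element at a time
start = time.perf_counter()
loop_total = 0
for value in amounts:
    loop_total += int(value)
loop_seconds = time.perf_counter() - start

# Vectorized: a single call, and the loop runs in compiled code
start = time.perf_counter()
vector_total = int(amounts.sum())
vector_seconds = time.perf_counter() - start

print(loop_total == vector_total)   # True: same answer
print(f"loop: {loop_seconds:.4f}s, vectorized: {vector_seconds:.4f}s")
```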


# Question
For `arr = np.arange(10)`, what is `arr[::-2]`?

# Choices
- [x] `[9,7,5,3,1]`
- [ ] `[8,6,4,2,0]`
- [ ] `[0,2,4,6,8]`
- [ ] Error


Explanation: A step of $-2$ starts from the last element (9) and moves backward two positions at a time. This gives the odd numbers in descending order.
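
A quick check of the answer in code, plus the slice that would produce the even numbers instead:

```python
import numpy as np

arr = np.arange(10)          # [0 1 2 3 4 5 6 7 8 9]
print(arr[::-2])             # [9 7 5 3 1]  starts at the last element
print(arr[-2::-2])           # [8 6 4 2 0]  starts one element earlier to get the evens
```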

# Fancy Indexing and Boolean Masking

<span style="background-color: red;"> **[Note to Instructor]:** </span>
> Explain the business context for each.

In Zomato restaurant analytics, we often need to filter data based on specific conditions. Examples include restaurants with votes below a certain threshold, or costs divisible by a particular value for promotional analysis. Boolean masking and fancy indexing in NumPy let analysts apply these conditions efficiently across large datasets without loops. This makes it quick to identify targeted subsets of restaurants for marketing campaigns, pricing adjustments, or customer engagement strategies.

## <span style="color: skyblue;"> Filtering Data </span>
- Every restaurant on Zomato has a vote count.
- Imagine an array of `votes` for different restaurants.
- If you want to find restaurants with more than 500 votes:

### Code:


In [None]:
votes = np.array([ 775,  787,  918,   88,  166,  286, 2556,  324,  504,  402])
costs = np.array(["'800.0'" ,"'800.0'", "'800.0'", "'300.0'", "'600.0'", "'600.0'", "'600.0'", "'700.0'" ,"'550.0'", "'500.0'"])

print(votes >= 500)

[ True  True  True False False False  True False  True False]


## <span style="color: skyblue;"> Fancy Indexing </span>
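- A Boolean array of `True/False` isn't useful on its own.
- Passing that Boolean array (a **mask**) inside the square brackets returns only the values where the mask is `True`.
- **Fancy Indexing** means passing a list of integer positions inside the square brackets to get the values at those positions.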

### Code:


In [None]:
# Get only values where votes >= 500
high_votes = votes[votes >= 500]
print(high_votes)

# Or pass a list of specific indices
print(votes[[1, -1]]) # Gets index 1 and the last element (Fancy Indexing)

[ 775  787  918 2556  504]
[787 402]


Notice that we also have a `costs` array. Assume it holds the average cost for one person at each restaurant.

- Costs and votes map one-to-one: position `i` in both arrays refers to the same restaurant.
- So the mask built from `votes` can filter `costs`:

### Code:

In [None]:
costs[votes >= 500]  # costs of restaurants with at least 500 votes

array(["'800.0'", "'800.0'", "'800.0'", "'600.0'", "'550.0'"], dtype='<U7')

- **Requirement:** This only works if arrays are **perfectly matching** in length and indexing.

## <span style="color: skyblue;"> Another Industry Use Case: Reliance/Jio Health Band </span>
- Reliance makes fitness bands that capture steps as well as the mood of users.
- We had matching arrays for **Mood** (Happy/Sad) and **Step Count**.
- By applying a filter like `steps[mood == "Happy"]`, we could see that happy people walk more on average than sad people.
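
Here is a minimal sketch of that analysis with made-up numbers (the real band data is not part of this notebook). It shows how a condition on one array filters a matching array:

```python
import numpy as np

# Made-up readings: one mood label and one step count per user-day
mood = np.array(["Happy", "Sad", "Happy", "Sad", "Happy"])
steps = np.array([9000, 4000, 7500, 5200, 8100])

happy_steps = steps[mood == "Happy"]   # [9000 7500 8100]
sad_steps = steps[mood == "Sad"]       # [4000 5200]

print(happy_steps.mean(), sad_steps.mean())   # 8200.0 4600.0
```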

# 2D Matrices and the Reshape Function

<span style="background-color: red;"> **[Note to Instructor]:** </span>
> Explain the business context for each.

In Zomato analytics, restaurants are described by multiple numeric attributes, such as votes and approximate costs. Representing this data as a 2D array (matrix) allows us to treat each restaurant as a row and each attribute as a column. This structure is essential for:
- Performing multivariate analysis (e.g., votes vs. costs).
- Feeding data into statistical models or machine learning algorithms.
- Reshaping or transforming data efficiently for segmentation, clustering, or pivot-style analyses.

Understanding the shape and dimensionality ensures that operations like aggregation, filtering, or reshaping for advanced analytics are error-free and meaningful.

## <span style="color: skyblue;"> Working with 2D Arrays </span>
- Matrices have rows and columns.
- **Mantra:** Always remember **R, C** (Row, Column). Shapes and indices always list rows first, then columns.

### Code:


In [None]:
# Take a sample of 50 restaurants
sample_votes = np.array([775, 787, 918, 88, 166, 286, 2556, 324, 504, 402, 150, 164, 424, 918, 90, 133, 144, 93, 62, 180, 62, 148, 219, 506, 172, 415, 230, 1647, 4884, 133, 286, 540, 2556, 36, 244, 804, 679, 245, 345, 618, 1047, 627, 354, 244, 163, 808, 1720, 868, 520, 299])
sample_costs = np.array([800.0, 800.0, 800.0, 300.0, 600.0, 600.0, 600.0, 700.0, 550.0, 500.0, 600.0, 500.0, 450.0, 800.0, 650.0, 800.0, 700.0, 300.0, 400.0, 500.0, 600.0, 550.0, 600.0, 500.0, 750.0, 500.0, 650.0, 600.0, 750.0, 200.0, 500.0, 800.0, 600.0, 400.0, 300.0, 450.0, 850.0, 300.0, 400.0, 750.0, 450.0, 450.0, 800.0, 800.0, 800.0, 850.0, 400.0, 1200.0, 300.0, 300.0])

# Create a 2D array: rows = restaurants, columns = [votes, costs]
restaurants_data = np.column_stack((sample_votes, sample_costs))

print("2D Array (votes, costs):\n", restaurants_data)
print("Shape:", restaurants_data.shape)
print("Dimensions:", restaurants_data.ndim)  # 2D


2D Array (votes, costs):
 [[ 775.  800.]
 [ 787.  800.]
 [ 918.  800.]
 [  88.  300.]
 [ 166.  600.]
 [ 286.  600.]
 [2556.  600.]
 [ 324.  700.]
 [ 504.  550.]
 [ 402.  500.]
 [ 150.  600.]
 [ 164.  500.]
 [ 424.  450.]
 [ 918.  800.]
 [  90.  650.]
 [ 133.  800.]
 [ 144.  700.]
 [  93.  300.]
 [  62.  400.]
 [ 180.  500.]
 [  62.  600.]
 [ 148.  550.]
 [ 219.  600.]
 [ 506.  500.]
 [ 172.  750.]
 [ 415.  500.]
 [ 230.  650.]
 [1647.  600.]
 [4884.  750.]
 [ 133.  200.]
 [ 286.  500.]
 [ 540.  800.]
 [2556.  600.]
 [  36.  400.]
 [ 244.  300.]
 [ 804.  450.]
 [ 679.  850.]
 [ 245.  300.]
 [ 345.  400.]
 [ 618.  750.]
 [1047.  450.]
 [ 627.  450.]
 [ 354.  800.]
 [ 244.  800.]
 [ 163.  800.]
 [ 808.  850.]
 [1720.  400.]
 [ 868. 1200.]
 [ 520.  300.]
 [ 299.  300.]]
Shape: (50, 2)
Dimensions: 2
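
The same **R, C** order applies when indexing, slicing, and masking a 2D array. Here is a small sketch using the first three rows of `restaurants_data`, copied in so the block runs on its own:

```python
import numpy as np

# First three rows of restaurants_data: [votes, costs]
data = np.array([[775.0, 800.0],
                 [787.0, 800.0],
                 [918.0, 800.0]])

print(data[1, 0])    # 787.0 -> row 1, column 0 (votes of the 2nd restaurant)
print(data[:, 0])    # [775. 787. 918.] -> every row, column 0 (all votes)
print(data[0, :])    # [775. 800.] -> row 0, every column
print(data[data[:, 0] > 780])   # whole rows where votes > 780 (masking in 2D)
```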


## <span style="color: skyblue;"> The `reshape` Function </span>
- Used to change the **structure** (shape) of the data; the values themselves stay the same.
- **Constraint:** The new shape must have the same number of elements as the original. (e.g., 9 elements can be 3x3, but not 2x4).
- If the count doesn't match, NumPy gives you a **"tight slap of an error"** (a `ValueError`).

**Disclaimer:**
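> Reshape can silently mess up your data! As long as the element count matches, NumPy will not complain, even if the new shape groups values into the wrong rows and columns. Use it very cautiously.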

A classic first use of reshape is turning a 1D range into a grid:

### Code:


In [None]:
# Creating a 1D array and reshaping to 2D
arr = np.arange(10, 100, 10).reshape(3, 3)
print(arr)

[[10 20 30]
 [40 50 60]
 [70 80 90]]


- If the element count does not match, reshape fails:

### Code:

In [None]:
# This is expected to fail!
np.arange(10,110,10).reshape(3,3) # 10 elements cannot fill a 3x3 grid (9 slots)

ValueError: cannot reshape array of size 10 into shape (3,3)

## <span style="color: skyblue;"> The Unknown Dimension (`-1`) </span>
- If you provide `-1` for one dimension, NumPy automatically calculates it based on the other provided value.
- *Example:* 10 elements with `reshape(5, -1)` will automatically become `5x2`.

### Code:


In [None]:
np.arange(10,110,10).reshape(5,-1) # 10 elements, 5 rows given: -1 lets NumPy work out the columns (2)

array([[ 10,  20],
       [ 30,  40],
       [ 50,  60],
       [ 70,  80],
       [ 90, 100]])

- Only one dimension can be `-1`. With two unknowns, NumPy cannot work out the shape.
    - `-1` can also be the first argument (e.g. `reshape(-1, 2)`). As long as the other dimension is given, NumPy calculates the missing one.
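
A short sketch of the `-1` rules on the same 10-element array, including the two cases that fail:

```python
import numpy as np

values = np.arange(10, 110, 10)          # 10 elements

print(values.reshape(-1, 2).shape)       # (5, 2): NumPy infers 5 rows
print(values.reshape(2, -1).shape)       # (2, 5): NumPy infers 5 columns

try:
    values.reshape(-1, -1)               # two unknowns cannot be solved
except ValueError as err:
    print("ValueError:", err)

try:
    values.reshape(3, -1)                # 10 is not divisible by 3
except ValueError as err:
    print("ValueError:", err)
```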

<span style="background-color: red;"> **[Note to Instructor]:** </span>
> Clarify the difference between **Transpose** and **Reshape**:
> - **Transpose:** Flips rows to columns (A becomes A.T).
> - **Reshape:** Re-arranges elements into a new grid entirely.
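
Here is a small sketch of that difference. Both calls turn a 2x3 array into a 3x2 array, but the contents differ. This is also why a wrong `reshape` can quietly scramble data without raising any error:

```python
import numpy as np

a = np.arange(1, 7).reshape(2, 3)
# [[1 2 3]
#  [4 5 6]]

print(a.T)
# [[1 4]
#  [2 5]
#  [3 6]]   rows become columns

print(a.reshape(3, 2))
# [[1 2]
#  [3 4]
#  [5 6]]   same elements, read in order and refilled row by row
```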

---

# Aggregate Functions and Transformation

<span style="background-color: red;"> **[Note to Instructor]:** </span>
>**SHARE THE BUSINESS CONTEXT WITH THE LEARNERS**
### Business Context

In Zomato restaurant analytics, aggregate functions allow analysts to summarize key metrics across restaurants quickly. By calculating totals, averages, minimums, maximums, and standard deviations, we can understand overall trends such as:
- Total customer engagement (votes) in a city or segment.
- Average spending per restaurant or per customer.
- The most and least popular restaurants, or the most and least expensive options.


## <span style="color: skyblue;"> Aggregate Functions </span>
- Functions like `mean()`, `sum()`, `max()`, and `min()` allow for rapid statistical analysis.

**Aggregate functions** summarize data:
- `np.sum()` to total votes
- `np.mean()` to find average cost
- `np.min()/np.max()` to find min/max rating or cost
- `np.std()` to measure spread

We’ll use these on our numeric arrays to quickly derive insights.


### Code:


In [None]:
# Total votes of these 50 restaurants
total_votes = np.sum(restaurants_data[:, 0])

# Average cost for these 50 restaurants
avg_cost = np.mean(restaurants_data[:, 1])

# Max votes, Min cost
max_votes = np.max(restaurants_data[:, 0])
min_cost = np.min(restaurants_data[:, 1])

print("Total Votes:", total_votes)
print("Average Cost:", avg_cost)
print("Max Votes:", max_votes)
print("Min Cost:", min_cost)

Total Votes: 30583.0
Average Cost: 587.0
Max Votes: 4884.0
Min Cost: 200.0


- Aggregates help us understand overall trends quickly.
- For the full dataset (not just the sample), we could apply the same functions to the entire `votes` or `costs` arrays.
- This is crucial for highlighting overall patterns (e.g., average spend in the city or total engagement through votes).

Now we can confidently summarize data, enabling high-level business insights.
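
`np.std()` is listed above but not used in the code. Here is a minimal sketch on a made-up cost array. It also shows the `ddof` parameter: by default NumPy divides by `n` (population standard deviation), while `ddof=1` divides by `n - 1` (sample standard deviation):

```python
import numpy as np

# Made-up per-person costs for eight restaurants
costs = np.array([200.0, 400.0, 400.0, 400.0, 500.0, 500.0, 700.0, 900.0])

print(np.mean(costs))          # 500.0
print(np.std(costs))           # 200.0 (population std, ddof=0 by default)
print(np.std(costs, ddof=1))   # sample std: divides by n - 1 instead of n
```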

## <span style="color: violet;"> What if we want to aggregate row-wise or column-wise? </span>

- By **setting `axis` parameter**

## <span style="color: violet;"> What will `np.sum(a, axis=0)` do? </span>

- `np.sum(a, axis=0)` adds together values from **different rows** that sit in the same column.
- `axis = 0` $\rightarrow$ **Changes will happen along the vertical axis**
- Summation of values happens **in the vertical direction**.
- Rows collapse/merge when we do `axis=0`, leaving one result per column.

### Code:


In [None]:
data = np.arange(1,10).reshape(3,3)
np.sum(data, axis=0)

array([12, 15, 18])

So to summarize:
- **Axis Concept:**
  - `axis=0`: **Column-wise** results (rows collapse, top to bottom).
  - `axis=1`: **Row-wise** results (columns collapse, left to right).
  - **Mantra:** The trick is **C, R** (Column=0, Row=1).
      - The mantra is reversed here compared to **R, C** for shapes and indexing.
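
A side-by-side sketch of `axis=0`, `axis=1`, and no axis on the same 3x3 array:

```python
import numpy as np

data = np.arange(1, 10).reshape(3, 3)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

print(np.sum(data, axis=0))   # [12 15 18] -> rows collapse, one total per column
print(np.sum(data, axis=1))   # [ 6 15 24] -> columns collapse, one total per row
print(np.sum(data))           # 45 -> no axis: the whole array collapses to one number
```

On `restaurants_data`, `np.mean(restaurants_data, axis=0)` would return the average votes and the average cost in a single call.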


<span style="background-color: red;"> **[Note to Instructor]:** </span>

>**SHARE THE BUSINESS CONTEXT WITH THE LEARNERS**

### Business Context
Logical operations in Zomato analytics enable fast filtering and validation of restaurant data. By identifying premium options, checking data quality, and creating targeted datasets, businesses can segment restaurants, run precise marketing campaigns, and make informed, data-driven decisions efficiently.


---


# Question
Suppose `arr` is a 3D array with shape `(3, 4, 5)`. What is the shape of the result of `arr.sum(axis=(1, 2))`?
# Choices
- [x] `(3,)`
- [ ] `(4, 5)`
- [ ] `(1, 4, 5)`
- [ ] Scalar

 Explanation: Collapsing axes $1$ and $2$ leaves only axis 0 → output shape `(3,)`.
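
Checking the answer in code with an array of ones (any values would give the same shape):

```python
import numpy as np

arr = np.ones((3, 4, 5))
print(arr.sum(axis=(1, 2)).shape)   # (3,)
print(arr.sum(axis=(1, 2)))         # [20. 20. 20.] -> each of the 3 blocks holds 4 * 5 ones
print(arr.sum(axis=0).shape)        # (4, 5) -> collapsing only axis 0
```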

## <span style="color: skyblue;"> Logical Operations </span>

**Logical operations** in NumPy help us filter and query data based on conditions.

- <font color="magenta">`np.where(condition)`</font>: Returns indices where the condition is True.
- <font color="magenta">`np.any(condition)`</font>: Checks if **any** elements satisfy a condition.
- <font color="magenta">`np.all(condition)`</font>: Checks if **all** elements satisfy a condition.

**Use Cases:**
- Find restaurants with certain attributes (e.g., cost above a threshold).
- Check if at least one restaurant meets a condition (`np.any()`).
- Check if all restaurants meet a certain standard (`np.all()`).
- Checking instructor ratings: if Scalar wants to verify that all sessions are rated above 4.2, it can use `np.all(ratings > 4.2)`.
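
Like the aggregate functions, `np.all` and `np.any` also accept `axis`. Here is a sketch with made-up ratings, one row per instructor and one column per session:

```python
import numpy as np

# Made-up session ratings: one row per instructor, one column per session
ratings_2d = np.array([[4.5, 4.3, 4.8],
                       [4.1, 4.6, 4.4],
                       [4.9, 4.7, 4.3]])

print(np.all(ratings_2d > 4.2))           # False: one session is rated 4.1
print(np.all(ratings_2d > 4.2, axis=1))   # [ True False  True] -> per instructor (row)
print(np.any(ratings_2d < 4.2, axis=0))   # [ True False False] -> per session (column)
```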

### Code:


In [None]:
costs = np.array([800.0 ,800.0, 800.0, 300.0, 600.0, 600.0, 600.0, 700.0 ,550.0, 500.0])
any_above_3000 = np.any(costs > 3000)
print("Any cost above 3000?", any_above_3000)

Any cost above 3000? False


### Code:


In [None]:
all_below_5000 = np.all(costs < 5000)
print("All restaurants cost below 5000?", all_below_5000)

All restaurants cost below 5000? True


## <span style="color: skyblue;"> Conditional Transformation: `np.where` </span>
- This is essentially an "If-Else" condition for arrays.
- **Syntax:** `np.where(condition, value_if_true, value_if_false)`

### Code:


In [None]:
ratings = np.array([4.9, 4.1, 4.8])
# Labeling sessions
labels = np.where(ratings >= 4.2, "Green Flag", "Red Flag")
print(labels)

['Green Flag' 'Red Flag' 'Green Flag']
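
You can nest `np.where` calls to get an if / elif / else with more than two outcomes, and the values do not have to be strings. The sketch below uses hypothetical price bands. The thresholds 500 and 1000, and the 10% discount, are illustrative choices, not Zomato rules:

```python
import numpy as np

costs = np.array([800.0, 300.0, 600.0, 1200.0, 450.0])   # made-up per-person costs

# Nested np.where: an if / elif / else applied to the whole array
segment = np.where(costs >= 1000, "Premium",
                   np.where(costs >= 500, "Mid-range", "Budget"))
print(segment)
# ['Mid-range' 'Budget' 'Mid-range' 'Premium' 'Budget']

# np.where can also return numbers, e.g. a 10% discount only on costs above 700
discounted = np.where(costs > 700, costs * 0.9, costs)
print(discounted)
```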


<span style="background-color: red;"> **[Ask Learners]:** </span>
> <span style="color: violet;"> Question: If I use `np.where(condition)` without the true/false values, what does it return? </span>
> Answer: It returns the **indexes** where the condition is True, as a tuple with one array of indices per dimension.

### Code:

In [None]:
high_cost_indices = np.where(costs > 1000)
print("Indices with cost > 1000:", high_cost_indices)

Indices with cost > 1000: (array([], dtype=int64),)


In [None]:
selected_costs = costs[np.where((costs > 500) & (costs < 1000))]
print("Costs between 500 and 1000:", selected_costs)

Costs between 500 and 1000: [800. 800. 800. 600. 600. 600. 700. 550.]
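
The Boolean mask alone selects the same values, so `np.where` is optional in that last cell. Note the parentheses around each condition. `&` (and), `|` (or), and `~` (not) bind more tightly than `>` and `<`, so leaving the parentheses out changes how Python parses the expression:

```python
import numpy as np

costs = np.array([800.0, 800.0, 800.0, 300.0, 600.0, 600.0, 600.0, 700.0, 550.0, 500.0])

# Combine two conditions with & and use the mask directly, no np.where needed
mask = (costs > 500) & (costs < 1000)
print(costs[mask])   # [800. 800. 800. 600. 600. 600. 700. 550.]

# Without parentheses, Python first tries to evaluate 500 & costs,
# which raises a TypeError for a float array
try:
    costs[costs > 500 & costs < 1000]
except TypeError as err:
    print("TypeError:", err)
```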


<span style="background-color: red;"> **[Note to Instructor]:** </span>
> Show learners the `help(np.all)` command in Colab to demonstrate how to read documentation for parameters like `axis`.

To summarize,
- `np.where()` gives us flexibility in selecting elements or their indices based on a condition.
- `np.any()` and `np.all()` quickly inform us about the existence or universality of a condition across the dataset.
- These tools are critical for filtering data before applying further analysis or visualization.

With logical operations, we can focus on the subsets of data that matter, speeding up our decision-making and insights discovery.

<span style="background-color: red;"> **[Note to Instructor]:** </span>
> Share the <span style="background-color: Blue;">[summary sheet](https://docs.google.com/spreadsheets/d/1VzddiMRWPJ3VE67UmUHu1vBOCOyXgSwcZJEHNoQW8vk/edit)</span> with learners.


---


# Question
Which statement is correct about `np.any()` and `np.all()`?
# Choices
- [ ] Both return arrays of booleans with the same shape as the input.
- [ ] `np.any()` checks if all elements are True, while `np.all()` checks if at least one element is True.
- [x] Both return a single boolean by default unless axis is specified.
- [ ] Both functions only work on 1D arrays.


Explanation: By default, they reduce the entire array to one boolean. With axis, they can work along rows/columns.

# Sort

## <span style="color: skyblue;"> Sorting a 1D Array </span>

We’ll sort a simple 1D array using:
* `np.sort()`: Returns a sorted copy.
* `np.argsort()`: Returns indices that would sort the array.

Sorting Ascending:


In [None]:
sorted_votes = np.sort(votes)       # Sort the votes
print("Sorted Votes:", sorted_votes[:10])  # Display the first 10 sorted values

Sorted Votes: [  88  166  286  324  402  504  775  787  918 2556]


Sorting Descending:

In [None]:
np.sort(votes)[::-1]

array([2556,  918,  787,  775,  504,  402,  324,  286,  166,   88])

Using index sorting:

In [None]:
sorted_indices = np.argsort(votes)  # Indices that would sort the votes
print("Indices for Sorting:", sorted_indices[:10])

Indices for Sorting: [3 4 5 7 9 8 0 1 2 6]


**Explanation:**
* `np.sort(votes)` creates a sorted copy of the votes array without altering the original.
* `np.argsort(votes)` provides the indices required to sort the array. Useful for sorting related columns.
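
Here is a sketch of the "sorting related columns" use case. `argsort` on `votes`, reversed for descending order, picks the top 3 restaurants. The same indices then pull their matching costs:

```python
import numpy as np

votes = np.array([775, 787, 918, 88, 166, 286, 2556, 324, 504, 402])
costs = np.array([800.0, 800.0, 800.0, 300.0, 600.0, 600.0, 600.0, 700.0, 550.0, 500.0])

# argsort is ascending; reversing it puts the positions of the most-voted first
order = np.argsort(votes)[::-1]
top3 = order[:3]

print(top3)          # [6 2 1]
print(votes[top3])   # [2556  918  787]
print(costs[top3])   # costs of the same three restaurants: [600. 800. 800.]
```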

<span style="background-color: red;"> **[Note to Instructor]:** </span>
> Explain the industry use cases.
> - `np.argsort` is used to get indices. For example, a classification model outputs a probability for each class. To print the most likely classes, we need their indices, which map back to the class labels.

## <span style="color: skyblue;"> Sorting a 2D Array </span>

Sorting can be done along rows or columns:
* Use the axis parameter:
    * axis=0: Sort each column.
    * axis=1: Sort each row.


In [None]:
# Example 2D array
array_2d = np.array([[34, 11, 8],
                     [7, 45, 18],
                     [9, 23, 20]])

# Sort along rows (axis=1)
sorted_rows = np.sort(array_2d, axis=1)

# Sort along columns (axis=0)
sorted_columns = np.sort(array_2d, axis=0)

print("Original 2D Array: \n", array_2d)
print("---"* 10)
print("Sorted along Rows: \n", sorted_rows)
print("---"* 10)
print("Sorted along Columns: \n", sorted_columns)

Original 2D Array: 
 [[34 11  8]
 [ 7 45 18]
 [ 9 23 20]]
------------------------------
Sorted along Rows: 
 [[ 8 11 34]
 [ 7 18 45]
 [ 9 20 23]]
------------------------------
Sorted along Columns: 
 [[ 7 11  8]
 [ 9 23 18]
 [34 45 20]]


**Explanation:**
* axis=1 sorts each row independently.
* axis=0 sorts each column independently.


## <span style="color: violet;"> Which restaurants have the top customer ratings, and what are their ratings in ascending order? </span>

In [None]:
# Sorting by ratings
sorted_indices_by_rating = np.argsort(ratings)
sorted_ratings = ratings[sorted_indices_by_rating]

print("Sorted Ratings:\n", sorted_ratings[:10])  # Show the first 10 values (lowest rating first, since argsort is ascending)

Sorted Ratings:
 [4.1 4.8 4.9]



**Explanation:**
* np.argsort() provides indices to sort the ratings.
* We can use these indices to sort the corresponding rows or another column (e.g., costs or votes).
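
Here is a sketch of sorting whole rows of a 2D array by one column. `np.sort(data, axis=0)` would sort each column independently and break the votes/cost pairs. Indexing with `argsort` instead keeps every row intact:

```python
import numpy as np

# [votes, cost] rows for four restaurants (first rows of restaurants_data)
data = np.array([[775.0, 800.0],
                 [787.0, 800.0],
                 [918.0, 800.0],
                 [ 88.0, 300.0]])

# Row order that sorts column 0 (votes) in descending order
order = np.argsort(data[:, 0])[::-1]

# Reorder whole rows, so each cost stays attached to its votes
ranked = data[order]
print(ranked)
# [[918. 800.]
#  [787. 800.]
#  [775. 800.]
#  [ 88. 300.]]
```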

### Business Context

In Zomato restaurant analytics, sorting is critical for ranking and prioritizing restaurants based on key metrics like votes, ratings, or costs. Using NumPy’s np.sort() and np.argsort():

- We can rank restaurants by popularity or customer satisfaction.
- Sort numeric attributes independently along rows or columns to compare multiple features efficiently.
- Align related columns (e.g., votes and costs) after sorting to maintain data consistency.
- Sorting enables businesses to identify top-performing restaurants, analyze trends, and make decisions about recommendations, promotions, or premium listings.


# Question
What will be the output?
```
nums = ["10", "2", "1", "20"]
print(sorted(nums))
```

# Choices
- [ ] `['1', '2', '10', '20']`
- [ ] `['10', '1', '2', '20']`
- [x] `['1', '10', '2', '20']`
- [ ] Error
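
Strings are compared character by character, so `"10"` comes before `"2"` (because `'1' < '2'`). This is one more reason to convert text to numbers during cleaning. A sketch of two fixes:

```python
import numpy as np

nums = ["10", "2", "1", "20"]

print(sorted(nums))             # ['1', '10', '2', '20']  compared character by character
print(sorted(nums, key=int))    # ['1', '2', '10', '20']  compared as numbers, still strings

# With NumPy: convert the text to integers first, then sort
print(np.sort(np.array(nums).astype(int)))   # [ 1  2 10 20]
```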

# Matrix multiplication

## <span style="color: skyblue;"> Element-wise Multiplication </span>

Element-wise multiplication multiplies corresponding elements in two arrays.
This uses the <font color="red">*</font> operator in NumPy.

We’ll perform element-wise multiplication on numeric columns like:
- `votes` and `ratings`: To calculate a **weighted vote score**.



In [None]:
# This is an array multiplied by an array of the same shape

# Element-wise multiplication
votes = np.array([ 775,  787,  918])
weighted_scores = votes * ratings

print("Weighted scores (sample):\n", weighted_scores[:5])

Weighted scores (sample):
 [3797.5 3226.7 4406.4]


**Explanation:**
- Element-wise multiplication combines ratings and votes to create a weighted metric.
- This can be used for scoring restaurants based on popularity and quality.

In [None]:
# This is an array multiplied by a number

vote_by_5 = votes * 5

print(vote_by_5)

[3875 3935 4590]


In [None]:
# This is an array multiplied by an array of a different shape
# This is expected to fail!

vote_by_array = votes * np.array([1,2,3,4])

print(vote_by_array)

ValueError: operands could not be broadcast together with shapes (3,) (4,) 

<span style="color: red;"> **Note:** </span>
* Array * Number  →  WORKS
* Array * Array (same shape)  →  WORKS
* Array * Array (different, incompatible shape)  →  DOES NOT WORK
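
The rule above is a simplification: NumPy can combine some different shapes through **broadcasting**. It compares shapes from the last dimension backward, and each pair of sizes must either match or include a 1. Here is a sketch with made-up values that scales each column of a (3, 2) array by its own factor:

```python
import numpy as np

# Two columns per restaurant: [votes, rating] (made-up values)
table = np.array([[775, 4.9],
                  [787, 4.1],
                  [918, 4.8]])        # shape (3, 2)

scale = np.array([0.001, 10])          # shape (2,): one factor per column

# The shapes differ, but (2,) is stretched across all 3 rows (broadcasting)
print((table * scale).shape)           # (3, 2)
print(table * scale)
```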


## <span style="color: skyblue;"> Matrix Multiplication </span>

We’ll apply a transformation matrix to numeric data:
* A 3x3 numeric matrix (standing in for Ratings, Votes, and Approximate Costs) multiplied by a 3x3 transformation matrix.
* Methods:
    * np.dot()
    * np.matmul()
    * @ operator.



In [None]:
# numeric_data
numeric_data = np.arange(11,20).reshape(3,3)

# Create an example transformation matrix (hand-picked values, not random)
transformation_matrix = np.array([[1.2, 0.8, 0.5],
                                   [0.5, 1.5, 1.0],
                                   [0.7, 0.6, 1.8]])

# Matrix multiplication using np.dot()
transformed_data_dot = np.dot(numeric_data, transformation_matrix)

# Matrix multiplication using @ operator
transformed_data_at = numeric_data @ transformation_matrix

# Matrix multiplication using np.matmul()
transformed_data_matmul = np.matmul(numeric_data, transformation_matrix)

print("Transformed Data (np.dot):\n", transformed_data_dot)
print("---"* 10)
print("Transformed Data (@ operator):\n", transformed_data_at)
print("---"* 10)
print("Transformed Data (np.matmul):\n", transformed_data_matmul)
print("---"* 10)

Transformed Data (np.dot):
 [[28.3 34.6 40.9]
 [35.5 43.3 50.8]
 [42.7 52.  60.7]]
------------------------------
Transformed Data (@ operator):
 [[28.3 34.6 40.9]
 [35.5 43.3 50.8]
 [42.7 52.  60.7]]
------------------------------
Transformed Data (np.matmul):
 [[28.3 34.6 40.9]
 [35.5 43.3 50.8]
 [42.7 52.  60.7]]
------------------------------



**Explanation:**
- A transformation matrix allows us to apply scaling and weighting to the original data.
- np.dot(), @, and np.matmul() produce the same results for matrix multiplication.

**Rule:** The number of columns of the first matrix should be equal to the number of rows of the second matrix.

- (A, B) @ (B, C) -> (A, C)
- (3,4) @ (4,3) -> (3,3)
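
A quick sketch of the shape rule with arrays of ones:

```python
import numpy as np

A = np.ones((3, 4))
B = np.ones((4, 3))

print((A @ B).shape)   # (3, 3): inner dimensions (4 and 4) match and disappear
print((B @ A).shape)   # (4, 4): order matters

try:
    A @ A              # (3, 4) @ (3, 4): inner dimensions 4 and 3 do not match
except ValueError as err:
    print("ValueError:", err)
```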

<span style="background-color: Blue;">[Visual Demo:](https://www.geogebra.org/m/ETHXK756)</span>

In [None]:
# This is expected to fail!
a = np.array([1,2,3,4])
a@5

ValueError: matmul: Input operand 1 does not have enough dimensions (has 0, gufunc core with signature (n?,k),(k,m?)->(n?,m?) requires 1)

In [None]:
# This is expected to fail!
np.matmul(a, 5)

ValueError: matmul: Input operand 1 does not have enough dimensions (has 0, gufunc core with signature (n?,k),(k,m?)->(n?,m?) requires 1)

In [None]:
np.dot(a, 5)

array([ 5, 10, 15, 20])

<span style="color: red;"> **Important:** </span>

- `np.dot()` accepts a scalar operand (it then simply multiplies every element), which is not possible with `np.matmul()` or `@`.
- `Vector @ Vector` works with `matmul()` (it returns the inner product), but `Vector @ Scalar` does not; use `*` for scalar multiplication.
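
For 1D vectors, all three operators agree; they only differ for scalars. A short sketch:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# Vector with vector: all three give the inner product 1*1 + 2*2 + 3*3 + 4*4
print(np.dot(a, a), np.matmul(a, a), a @ a)   # 30 30 30

# Vector with scalar: plain * works; np.dot also accepts it
print(a * 5)          # [ 5 10 15 20]
print(np.dot(a, 5))   # [ 5 10 15 20]
```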

### Business Context

Matrix operations in Zomato analytics allow efficient computation of weighted scores and composite metrics by combining votes, ratings, and costs. This helps businesses evaluate restaurant popularity and quality, create recommendation systems, prioritize listings, and make data-driven decisions for promotions and strategic initiatives.



# Question
What will be the result of matrix multiplication?

```
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[2, 0], [1, 2]])
print(np.dot(A, B))
```

# Choices
- [x] `[[4, 4], [10, 8]]`
- [ ] `[[2, 2], [3, 4]]`
- [ ] `[[1, 2], [3, 4]]`
- [ ] Error
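
Checking the answer in code, with the element-wise product for contrast:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[2, 0], [1, 2]])

# Row-by-column: [1*2 + 2*1, 1*0 + 2*2] and [3*2 + 4*1, 3*0 + 4*2]
print(np.dot(A, B))   # [[ 4  4]
                      #  [10  8]]

# Element-wise product for contrast: multiplies matching positions only
print(A * B)          # [[2 0]
                      #  [3 8]]
```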