# **`Data Science Learners Hub`**

**Module : Python**

**email** : [datasciencelearnershub@gmail.com](mailto:datasciencelearnershub@gmail.com)

### **`#2: Operations on NumPy Arrays`**

1. **Element-wise Operations:**
   - Performing basic arithmetic operations on arrays.
   - Universal functions (ufuncs) in NumPy.

2. **Aggregation and Statistics:**
   - Sum, mean, median, variance, and standard deviation.
   - Min, max, and other aggregation functions.

3. **Array Comparison and Boolean Operations:**
   - Comparing arrays element-wise.
   - Boolean indexing and masking.

4. **Array Manipulation:**
   - Reshaping arrays.
   - Concatenation and splitting arrays.

#### **`4. Array Manipulation in NumPy:`**

**1. Reshaping Arrays:**

- **Scenario:** Image Processing
- **Application:** Images are stored as multi-dimensional arrays, so reshaping is a routine step in image processing. Reshaping changes the dimensions of an image array without changing its data; the total number of elements must stay the same.

In [9]:
import numpy as np

image = np.array([[1, 2, 3], [4, 5, 6]])
reshaped_image = np.reshape(image, (3, 2))
print(reshaped_image)

[[1 2]
 [3 4]
 [5 6]]
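
A reshape only regroups the same elements in their existing row-major order, so it is not the same as a transpose even when the resulting shapes match. The short sketch below illustrates this on the same `image` array; the names `flat`, `inferred`, and `transposed` are illustrative only.

```python
import numpy as np

image = np.array([[1, 2, 3], [4, 5, 6]])

# -1 asks NumPy to infer that dimension from the total element count (6)
flat = image.reshape(-1)         # shape (6,)
inferred = image.reshape(3, -1)  # shape (3, 2), same as np.reshape(image, (3, 2))

# A transpose also has shape (3, 2), but it swaps rows and columns instead
transposed = image.T

print(flat)        # [1 2 3 4 5 6]
print(inferred)    # rows: [1 2], [3 4], [5 6]
print(transposed)  # rows: [1 4], [2 5], [3 6]
```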


**2. Concatenation:**

- **Scenario:** Combining Datasets
- **Application:** Concatenation is useful when combining datasets, especially when datasets are collected separately but need to be analyzed together.

In [12]:
import numpy as np

data1 = np.array([1, 2, 3])
data2 = np.array([3, 5, 6])
concatenated_data = np.concatenate((data1, data2))
print(concatenated_data)

[1 2 3 3 5 6]


![DSLH-Axis0andAxis1.jpeg](attachment:DSLH-Axis0andAxis1.jpeg)

In [2]:
# axis=0 is the first axis: it runs down the rows (vertically)
# axis=1 is the second axis: it runs across the columns (horizontally)
# What is the output of each call below?
import numpy as np
a1 = np.array([[1,2],[3,4]])
b1 = np.array([[5,6],[7,8]])

print("np.concatenate((a1,b1)) :\n",np.concatenate((a1,b1)))
print("np.concatenate((a1,b1), axis=1) :\n",np.concatenate((a1,b1), axis=1)) # axis =1 means operations are performed horizontally
print("np.concatenate((a1,b1), axis=0) :\n",np.concatenate((a1,b1), axis=0)) # axis =0 means operations are performed vertically
print("np.hstack((a1,b1)) :\n",np.hstack((a1,b1)))
print("np.vstack((a1,b1)) :\n",np.vstack((a1,b1)))
print("np.column_stack((a1,b1)) :\n",np.column_stack((a1,b1)))

np.concatenate((a1,b1)) :
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]
np.concatenate((a1,b1), axis=1) :
 [[1 2 5 6]
 [3 4 7 8]]
np.concatenate((a1,b1), axis=0) :
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]
np.hstack((a1,b1)) :
 [[1 2 5 6]
 [3 4 7 8]]
np.vstack((a1,b1)) :
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]
np.column_stack((a1,b1)) :
 [[1 2 5 6]
 [3 4 7 8]]


#### Observations:

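- `concatenate()` uses `axis=0` by default, so for 2D arrays it behaves like `vstack()` (it stacks vertically).
- For 2D arrays, `column_stack()` behaves like `hstack()` and like `concatenate()` with `axis=1`. For 1D arrays it differs: `column_stack()` turns each input into a column of a 2D result.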
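
The equivalences observed above hold for 2D inputs. As a small sketch of how the stacking helpers treat 1D inputs (the arrays `x` and `y` are made up for illustration):

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

side_by_side = np.hstack((x, y))       # joins end to end, shape (6,)
as_columns = np.column_stack((x, y))   # each input becomes a column, shape (3, 2)
as_rows = np.vstack((x, y))            # each input becomes a row, shape (2, 3)

print(side_by_side)  # [1 2 3 4 5 6]
print(as_columns)    # rows: [1 4], [2 5], [3 6]
print(as_rows)       # rows: [1 2 3], [4 5 6]
```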

#### What is the difference between hstack() and concatenate()?

1. `numpy.hstack()`:
    - `numpy.hstack()` stacks arrays horizontally (column-wise).
    - For arrays with two or more dimensions it concatenates along the second axis; for 1D arrays it concatenates along the first (and only) axis. The arrays must have the same shape along every axis except the one being joined.
    - For example, two 1D arrays are joined end to end into a single 1D array (they need not have the same length), and two 2D arrays with the same number of rows are joined side by side along the columns.
    - Here's an example using `numpy.hstack()`:
    
    ```python
    import numpy as np
    
    arr1 = np.array([1, 2, 3])
    arr2 = np.array([4, 5, 6])
    result = np.hstack((arr1, arr2))
    print(result)  # [1 2 3 4 5 6]
    ```
    
2. `numpy.concatenate()`:
    - `numpy.concatenate()` is the general-purpose function: it joins a sequence of arrays along any existing axis, not just the horizontal one.
    - The `axis` argument selects the axis to join along (the default is `axis=0`); `axis=None` flattens the arrays before joining them.
    - Here's an example using `numpy.concatenate()`:
    
    ```python
    import numpy as np
    
    arr1 = np.array([1, 2, 3])
    arr2 = np.array([4, 5, 6])
    result = np.concatenate((arr1, arr2), axis=0)  # Concatenate along the first axis (0)
    print(result)  # [1 2 3 4 5 6]
    ```
    

In summary, `numpy.hstack()` is a convenience function for horizontal stacking, while `numpy.concatenate()` can join arrays along any axis. Use `hstack()` when the intent is simply "side by side", and `concatenate()` when you need to choose the axis explicitly.
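
As a rough illustration of the shape rule behind both functions, the sketch below joins two 2D arrays that share a row count but not a column count. The arrays `left` and `right` are invented for this example.

```python
import numpy as np

left = np.ones((2, 3))    # 2 rows, 3 columns
right = np.zeros((2, 1))  # 2 rows, 1 column

# Joining along axis 1 works because the row counts match
wide = np.hstack((left, right))
same = np.concatenate((left, right), axis=1)
print(wide.shape)                  # (2, 4)
print(np.array_equal(wide, same))  # True

# Joining along axis 0 fails because the column counts differ (3 vs 1)
axis0_error = None
try:
    np.concatenate((left, right), axis=0)
except ValueError as exc:
    axis0_error = exc
print("axis=0 raised:", type(axis0_error).__name__)
```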

**3. Splitting:**

- **Scenario:** Separating Datasets
- **Application:** Splitting is handy when you have a single dataset that needs to be divided into multiple parts for separate analysis or processing.

In [13]:
import numpy as np

original_data = np.array([1, 2, 3, 4, 5, 6])
split_data = np.split(original_data, 2)
print(split_data)

[array([1, 2, 3]), array([4, 5, 6])]
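
With an integer argument, `np.split()` needs the array length to divide evenly and raises a `ValueError` otherwise. Two alternatives are sketched below: passing explicit cut points, and using `np.array_split()` for uneven parts. The array `data` and the result names are illustrative.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6])

# A list of indices gives the cut points: data[:2], data[2:5], data[5:]
by_index = np.split(data, [2, 5])
print([part.tolist() for part in by_index])  # [[1, 2], [3, 4, 5], [6]]

# np.array_split tolerates a length that does not divide evenly
uneven = np.array_split(np.arange(7), 3)
print([part.tolist() for part in uneven])    # [[0, 1, 2], [3, 4], [5, 6]]
```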


**Real-world Examples:**

1. **Image Processing: Reshaping Arrays**

   - **Scenario:** Color channels in an image represented as separate arrays.
   - **Application:** Stack the color channels into a single image array, then reshape it for further image processing.

In [14]:
import numpy as np

red_channel = np.array([[255, 0], [0, 255]])
green_channel = np.array([[0, 255], [255, 0]])
blue_channel = np.array([[0, 0], [255, 255]])

# Combine color channels into a single image array
image = np.stack((red_channel, green_channel, blue_channel), axis=-1)
reshaped_image = np.reshape(image, (2, 6))
print(reshaped_image)

[[255   0   0   0 255   0]
 [  0 255 255 255   0 255]]
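
One way to check how the stacked array is laid out is to inspect its shape and a single pixel, then confirm that the 2x6 reshape can be undone. This is only a sketch using the same channel values; `flat` and `restored` are illustrative names.

```python
import numpy as np

red_channel = np.array([[255, 0], [0, 255]])
green_channel = np.array([[0, 255], [255, 0]])
blue_channel = np.array([[0, 0], [255, 255]])

image = np.stack((red_channel, green_channel, blue_channel), axis=-1)
print(image.shape)  # (2, 2, 3): rows, columns, color channels
print(image[0, 0])  # [255   0   0], the RGB values of the top-left pixel

# Reshaping keeps element order, so the original layout can be recovered
flat = image.reshape(2, 6)
restored = flat.reshape(2, 2, 3)
print(np.array_equal(restored, image))  # True
```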


#### Explanation:

1. Combine color channels into a single image array using `np.stack()`:

    ```python
    image = np.stack((red_channel, green_channel, blue_channel), axis=-1)
    ```

    - `np.stack()`: Stacks the three color channels along the last axis (`axis=-1`), creating a 2x2x3 array. Each element in the array represents a pixel with three values corresponding to the intensities of red, green, and blue.

2. Reshape the image array using `np.reshape()`:

    ```python
    reshaped_image = np.reshape(image, (2, 6))
    ```

    - `np.reshape()`: Reshapes the 2x2x3 image array into a 2x6 array. Each row of the result holds the RGB values of the two pixels in that image row, laid out one after another.

3. **More about axis values**
    - `axis=0` : for a 2D array, the operation runs down the rows (vertically); for example, `sum(axis=0)` gives one result per column.
    - `axis=1` : for a 2D array, the operation runs across the columns (horizontally); for example, `sum(axis=1)` gives one result per row.
    - `axis=None` : the operation is performed on the flattened array. This is the default for reductions such as `np.sum()`, but not for `np.concatenate()`, whose default is `axis=0`.
    - `axis=-1` : negative integers count from the end, so `axis=-1` refers to the last axis.
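
The axis values above can be tried out on a small reduction. The sketch below uses `sum()` on a made-up 2x3 array `m`.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = m.sum(axis=0)    # collapses the rows: one sum per column
row_sums = m.sum(axis=1)    # collapses the columns: one sum per row
total = m.sum()             # axis=None (default): sum of the flattened array
last_axis = m.sum(axis=-1)  # last axis, which is axis 1 for a 2D array

print(col_sums)   # [5 7 9]
print(row_sums)   # [ 6 15]
print(total)      # 21
print(last_axis)  # [ 6 15]
```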


2. **Combining Financial Datasets: Concatenation**

   - **Scenario:** Quarterly financial reports stored in separate arrays.
   - **Application:** Concatenate quarterly reports into a single dataset for annual financial analysis.

In [15]:
import numpy as np

q1_data = np.array([100, 150, 120])
q2_data = np.array([130, 140, 110])

# Combine quarterly data into a single dataset
annual_data = np.concatenate((q1_data, q2_data))
print(annual_data)

[100 150 120 130 140 110]


In [4]:
import numpy as np

q1_data = np.array([[100, 150, 120],[20,30,40]])
q2_data = np.array([[130, 140, 110],[30,50,50]])

# Combine quarterly data into a single dataset
annual_data = np.concatenate((q1_data, q2_data))
print(annual_data)

[[100 150 120]
 [ 20  30  40]
 [130 140 110]
 [ 30  50  50]]


#### Observations from the two examples above
- In the first case the arrays are 1D, so they have only one axis (`axis=0`) and concatenation simply joins them end to end.
- In the second case the arrays are 2D, so `concatenate()` joins them along its default `axis=0`, i.e. it stacks the arrays vertically, like `vstack()`.
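
If each quarter should stay on its own row instead of being merged into one long array, the 1D quarterly arrays can be stacked vertically. This is a sketch of one possible approach; `quarters`, `quarter_totals`, and `annual_total` are illustrative names.

```python
import numpy as np

q1_data = np.array([100, 150, 120])
q2_data = np.array([130, 140, 110])

# vstack turns each 1D array into a row of a 2D array
quarters = np.vstack((q1_data, q2_data))
print(quarters.shape)  # (2, 3)

# Summing across the columns (axis=1) gives one total per quarter
quarter_totals = quarters.sum(axis=1)
annual_total = quarters.sum()
print(quarter_totals)  # [370 380]
print(annual_total)    # 750
```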

3. **Splitting Time Series Data: Splitting**

   - **Scenario:** Time series data collected over a year.
   - **Application:** Split the time series into monthly data for individual analysis.

In [16]:
import numpy as np

yearly_data = np.array([10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65])

# Split yearly data into monthly data
monthly_data = np.split(yearly_data, 12)
print(monthly_data)

[array([10]), array([15]), array([20]), array([25]), array([30]), array([35]), array([40]), array([45]), array([50]), array([55]), array([60]), array([65])]
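
The same array can also be split into coarser periods. Assuming the 12 values are monthly readings in calendar order, a sketch of a quarterly split and quarterly averages could look like this (`quarterly` and `quarter_means` are illustrative names):

```python
import numpy as np

yearly_data = np.array([10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65])

# Four equal parts of three months each
quarterly = np.split(yearly_data, 4)
print([q.tolist() for q in quarterly])
# [[10, 15, 20], [25, 30, 35], [40, 45, 50], [55, 60, 65]]

# Reshaping to (4, 3) puts each quarter on its own row, so the mean
# across the columns (axis=1) is the average per quarter
quarter_means = yearly_data.reshape(4, 3).mean(axis=1)
print(quarter_means)  # [15. 30. 45. 60.]
```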


**Key Takeaway:**

Array manipulation operations in NumPy, such as reshaping, concatenation, and splitting, are essential tools for organizing and preparing data for various applications. These operations find practical use in scenarios like image processing, financial analysis, and time series data management, allowing users to efficiently handle complex data structures.