# Assignment 2 - Numpy Array Operations


## NUMPY

- NumPy is a Python library used for working with arrays.
- It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
- NumPy was created in 2005 by Travis Oliphant. It is an open source project and you can use it freely.
- NumPy stands for Numerical Python.
- In Python we have lists that serve the purpose of arrays, but they are slow to process.
- NumPy aims to provide an array object that is much faster than traditional Python lists for numerical work. Speed-ups of up to 50x are often quoted, though the actual gain depends on the operation and the array size.
- The array object in NumPy is called ndarray. It comes with many supporting functions that make working with it easy.
- Arrays are very frequently used in data science, where speed and resources are very important.


Let's begin by importing NumPy and listing the functions covered in this notebook.

In [1]:
import numpy as np

*List of functions explained*
- function1 = np.array_split  
- function2 = np.sort
- function3 = np.where
- function4 = np.reshape
- function5 = np.squeeze

## Function 1 - np.array_split
```numpy.array_split``` is a function in the NumPy library that splits an array into multiple sub-arrays along a specified axis. It takes three main arguments: the array to be split, either the number of sections or a sequence of indices at which to split, and the axis along which to split (axis 0 by default). The sections are equally sized when the array divides evenly and as close to equal as possible otherwise. It returns a list of sub-arrays.

In [2]:
# Example 1 - working
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

sub_arrays = np.array_split(arr, 3)

for sub_arr in sub_arrays:
    print(sub_arr)

[1 2 3]
[4 5 6]
[7 8 9]


In the above example, we create a 1D NumPy array with 9 elements, split it into 3 equally sized sub-arrays, and print them.

In [3]:
# Example 2 - working
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

sub_arrays = np.array_split(arr, 2, axis=0)

for sub_arr in sub_arrays:
    print(sub_arr)

[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]


In the above example, we create a 2D NumPy array with 4 rows, split it along axis 0 into 2 sub-arrays of 2 rows each, and print them.

In [4]:
# Example 3 - breaking (to illustrate when it breaks)
arr = np.array([1, 2, 3, 4, 5, 6, 7])

sub_arrays = np.array_split(arr, 4)

for sub_arr in sub_arrays:
    print(sub_arr)

[1 2]
[3 4]
[5 6]
[7]


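```np.array_split``` does not raise an error when the array cannot be divided evenly along the specified axis. Instead, it returns sub-arrays of different sizes: in the example above, 7 elements split 4 ways give three sub-arrays of 2 elements and one sub-array of 1 element. This can "break" downstream code that assumes equally sized chunks. If you would rather get an error in that situation, np.split raises a ValueError when an equal division is impossible.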
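To make the difference between the two functions concrete, here is a minimal sketch that runs both on the same 7-element array. The variable names (`uneven`, `sizes`, `split_failed`) are only illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

# np.array_split tolerates an uneven division and returns unequal pieces.
uneven = np.array_split(arr, 4)
sizes = [len(part) for part in uneven]
print(sizes)  # [2, 2, 2, 1]

# np.split insists on an equal division and raises ValueError otherwise.
try:
    np.split(arr, 4)
    split_failed = False
except ValueError as err:
    split_failed = True
    print("np.split failed:", err)
```

If your code depends on equally sized chunks, np.split gives you an early, explicit failure, while np.array_split leaves the size check to you.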

You can use np.array_split in various situations when you need to split a NumPy array into multiple sub-arrays along a specified axis. Here are some common scenarios where np.array_split can be useful:

* Data Preprocessing: When you have a large dataset stored in a NumPy array, you might want to split it into smaller, more manageable chunks for processing. For example, splitting a large dataset into batches for training a machine learning model.

* Parallel Processing: When you're performing parallel processing or distributing work among multiple cores or processors, you can use np.array_split to divide your data into segments that can be processed independently by different workers.

* Array Manipulation: If you need to perform different operations on separate parts of a NumPy array, you can split it into sub-arrays to apply those operations individually.

* Visualization: In some data visualization scenarios, you may want to split your data into smaller segments to create multiple plots or visualizations based on different subsets of the data.

* Handling Uneven Data: When working with data that doesn't divide evenly into the desired number of sub-arrays, np.array_split allows you to handle this situation by distributing the remaining elements as evenly as possible.
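The batching and uneven-data scenarios above can be sketched as follows. The dataset here is made up purely for illustration (10 samples with 2 features each), and summing each batch stands in for whatever per-batch processing you actually need.

```python
import numpy as np

# Hypothetical dataset: 10 samples (rows) with 2 features (columns) each.
data = np.arange(20).reshape(10, 2)

# Split the rows (axis 0) into 3 batches. 10 is not divisible by 3,
# so the first batch receives the extra row.
batches = np.array_split(data, 3, axis=0)
batch_shapes = [batch.shape for batch in batches]
print(batch_shapes)  # [(4, 2), (3, 2), (3, 2)]

# Each batch can be processed independently; here we simply sum it.
batch_sums = [int(batch.sum()) for batch in batches]
print(batch_sums)  # [28, 63, 99]
```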

## Function 2 - np.sort

```np.sort``` is a NumPy function that is used to sort the elements of a NumPy array along a specified axis. It returns a new array containing the sorted elements without modifying the original array.

In [5]:
# Example 1 - working
arr = np.array([3, 1, 5, 2, 4])

sorted_arr = np.sort(arr)

print(sorted_arr)

[1 2 3 4 5]


In this example, we have a 1D array arr, and we use np.sort to sort its elements in ascending order. The result is a new array sorted_arr with the elements sorted.

In [6]:
# Example 2 - working
arr = np.array([[3, 1, 5],
                [2, 4, 6]])

sorted_arr = np.sort(arr, axis=1)

print(sorted_arr)

[[1 3 5]
 [2 4 6]]


In this example, we have a 2D array arr, and we use np.sort with axis=1 to sort the elements along each row in ascending order. The result is a new array sorted_arr where each row is sorted independently.

In [7]:
# Example 3 - breaking (to illustrate when it breaks)
arr = np.array([3, 1, 'apple', 2, 'banana'])

sorted_arr = np.sort(arr)

print(sorted_arr)

['1' '2' '3' 'apple' 'banana']


In this example, we build an array from both integers and strings. A NumPy array holds a single data type, so every element is converted to a string when the array is created, and np.sort then orders the elements lexicographically (as strings). This may not be the desired behavior if you intended to sort the integers numerically.

```np.sort``` doesn't "break" here in the sense of raising an error or crashing, but its result may not be what you want. np.sort is designed for homogeneous arrays, so when the input mixes data types, the sort order follows whatever common type NumPy converted the elements to, as the example above shows.
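A short sketch of why string sorting surprises people, and one possible fix. It assumes every element can be converted to an integer, which is not true of the 'apple'/'banana' array above, so the numeric part uses a separate, hypothetical array of numeric strings.

```python
import numpy as np

# Mixing integers and strings gives an array of Unicode strings.
mixed = np.array([10, 9, 'apple', 2])
print(mixed.dtype.kind)  # U

# Numeric strings sort character by character, not by value.
numeric_text = np.array(['10', '9', '2'])
as_text = np.sort(numeric_text)
print(as_text)  # ['10' '2' '9']

# Converting to integers first restores numeric ordering.
as_numbers = np.sort(numeric_text.astype(int))
print(as_numbers)  # [ 2  9 10]
```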

You should use the np.sort function in NumPy when you need to perform sorting operations on NumPy arrays. Sorting is a fundamental operation in data manipulation and analysis, and np.sort provides a convenient way to achieve this. Here are common scenarios where you would use np.sort:

* Data Analysis and Statistics: Sorting is often a preliminary step in various data analysis tasks. You might want to arrange data in ascending or descending order to identify outliers, calculate percentiles, or perform other statistical analyses.

* Data Visualization: When creating plots or visualizations, you may need data to be sorted to create meaningful representations. Sorting can help you create sorted bar charts, histograms, or other visualizations.

* Searching and Indexing: Sorted arrays allow for efficient searching and indexing operations. You can use techniques like binary search when you know your data is sorted.

* Merging and Combining Data: When combining or merging datasets, sorting can be helpful. For example, you might merge two sorted arrays efficiently.

* Ranking: In ranking tasks, you might need to assign a rank to each element based on its value. Sorting is often the first step in these tasks.
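Some of the scenarios above can be sketched in a few lines. np.sort has no descending option, so the sketch reverses the ascending result. np.searchsorted and np.argsort are companion NumPy functions that this notebook does not otherwise cover.

```python
import numpy as np

arr = np.array([3, 1, 5, 2, 4])

ascending = np.sort(arr)
descending = ascending[::-1]           # reverse the ascending result
print(descending)  # [5 4 3 2 1]

# Binary search on sorted data: index at which 4 would be inserted.
position = np.searchsorted(ascending, 4)
print(position)  # 3

# Zero-based rank of each element of arr (0 = smallest).
ranks = np.argsort(np.argsort(arr))
print(ranks)  # [2 0 4 1 3]
```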

## Function 3 - np.where

```np.where``` is a NumPy function that, when called with only a condition, returns the indices of the elements that satisfy it, as a tuple with one index array per dimension. When called as np.where(condition, x, y), it instead returns an array that takes its values from x where the condition is true and from y elsewhere.

In [8]:
# Example 1 - working
arr = np.array([1, 5, 2, 7, 3, 8, 4])

indices = np.where(arr > 4)

print(indices)

(array([1, 3, 5], dtype=int64),)


In this example, we have a 1D NumPy array arr, and we use np.where to find the indices where the elements are greater than 4. The function returns a tuple containing the indices that satisfy the condition.

In [9]:
# Example 2 - working
arr = np.array([1, 5, 2, 7, 3, 8, 4])

arr[np.where(arr > 4)] = 0

print(arr)

[1 0 2 0 3 0 4]


In this example, we use np.where to find the indices where elements in the array are greater than 4. Then, we use these indices to replace those elements with 0. This allows us to selectively modify elements in the array based on a condition.

In [10]:
# Example 3 - breaking (to illustrate when it breaks)
arr = np.array([0.1, 0.2, 0.3, 0.4, 0.5])

index = np.where(arr == 0.3)

print(index)

(array([2], dtype=int64),)



```np.where``` is a versatile function that works well in a wide range of scenarios, but exact equality checks on floating-point numbers can give unexpected results. In the example above the match succeeds because the stored value and the literal 0.3 are the same floating-point number. If the value had been computed instead (for example as 0.1 + 0.2), rounding error would make the == comparison fail and np.where would not report that element. Comparing with a tolerance, for instance with np.isclose, avoids this problem.
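Here is a minimal sketch of the floating-point pitfall, using a hypothetical array whose first element is computed rather than typed in. np.isclose uses NumPy's default tolerances here, so adjust them if your data needs a tighter or looser match.

```python
import numpy as np

# The first element is computed, the second is the literal 0.3.
computed = np.array([0.1 + 0.2, 0.3])

# Exact equality misses the computed value because of rounding error.
exact = np.where(computed == 0.3)
print(exact[0])  # [1]

# A tolerance-based comparison finds both elements.
close = np.where(np.isclose(computed, 0.3))
print(close[0])  # [0 1]
```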

You should use the np.where function in NumPy when you need to locate specific elements in a NumPy array based on a certain condition and retrieve their indices. np.where is a versatile tool that is particularly useful in various data manipulation, analysis, and filtering tasks. Here are some common scenarios where you would use np.where:

* Filtering Data: You can use np.where to filter elements in an array based on a condition. For example, you can extract all values greater than a certain threshold or within a specific range.

* Finding Indices: When you need to know the indices of elements that meet a particular condition, np.where provides these indices efficiently.

* Replacing Values: You can use np.where to selectively replace values in an array based on a condition. This is useful for data cleaning or data transformation tasks.

* Boolean Masking: np.where can turn a condition computed on one array into indices for selecting elements from another array of the same shape.

* Conditional Operations: When you want to perform different operations on elements of an array based on a condition, np.where can help you choose which operation to apply.
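The replacing and filtering scenarios above can be sketched with the three-argument form np.where(condition, x, y) and with plain Boolean indexing. Unlike Example 2, the three-argument form returns a new array and leaves the original untouched.

```python
import numpy as np

arr = np.array([1, 5, 2, 7, 3, 8, 4])

# Replace values greater than 4 with 0, keeping the rest as they are.
replaced = np.where(arr > 4, 0, arr)
print(replaced)  # [1 0 2 0 3 0 4]

# Filter: keep only the values greater than 4 (Boolean mask indexing).
filtered = arr[arr > 4]
print(filtered)  # [5 7 8]

# The original array is unchanged.
print(arr)  # [1 5 2 7 3 8 4]
```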

## Function 4 - np.reshape

```np.reshape``` is a NumPy function that allows you to change the shape or dimensions of a NumPy array without modifying its data. It's a powerful tool for rearranging data within an array to meet specific requirements.

In [11]:
# Example 1 - working
arr = np.arange(12)

reshaped_arr = np.reshape(arr, (3, 4))

print(reshaped_arr)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In this example, we start with a 1D array arr containing 12 elements. Using np.reshape, we transform it into a 3x4 2D array. The reshaped array has three rows and four columns.

In [12]:
# Example 2 - working
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

reshaped_arr = np.reshape(arr, (6,))

print(reshaped_arr)

[1 2 3 4 5 6]


In this example, we start with a 2D array arr, and we use np.reshape to flatten it into a 1D array. The reshaped array has a single dimension with all the elements from the original array.

In [13]:
# Example 3 - breaking (to illustrate when it breaks)
arr = np.arange(10)

reshaped_arr = np.reshape(arr, (3, 4))

print(reshaped_arr)

ValueError: cannot reshape array of size 10 into shape (3,4)

Running this code raises a ValueError because arr has 10 elements, while a 3x4 array requires 12. np.reshape never adds or drops elements, so incompatible shapes are rejected.

To avoid this issue, make sure that the new shape you provide to np.reshape is compatible with the number of elements in the original array. If the shapes are not compatible, you may need to reconsider the reshaping strategy or modify the data accordingly before reshaping.
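One way to reduce the risk of a mismatched shape is to pass -1 for one dimension and let NumPy infer it from the array's size. The sketch below shows this, along with catching the error when no valid size exists. The variable names are illustrative.

```python
import numpy as np

# -1 asks NumPy to infer that dimension: 12 elements / 3 rows = 4 columns.
inferred = np.reshape(np.arange(12), (3, -1))
print(inferred.shape)  # (3, 4)

# 10 elements fit into 2 rows of 5 ...
two_rows = np.reshape(np.arange(10), (2, -1))
print(two_rows.shape)  # (2, 5)

# ... but not into 3 rows, so the ValueError still occurs.
try:
    np.reshape(np.arange(10), (3, -1))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```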

```np.reshape``` is robust as long as the requested shape has the same number of elements as the original array. It "breaks" by raising a ValueError whenever the element counts differ, as in the example above.

You should use the np.reshape function in NumPy when you need to change the shape or dimensions of a NumPy array to meet specific requirements or to facilitate various data manipulation tasks. Here are common scenarios where you would use np.reshape:

* Data Preparation for Machine Learning: When preparing data for machine learning models, you often need to reshape input data to match the expected input shape of the model. For example, converting a 1D array of image pixels into a 2D or 3D array suitable for image processing tasks.

* Image Processing: When working with images, you may need to reshape or reformat the image data to match the input requirements of image processing libraries or algorithms.

* Flattening and Unflattening: Converting a multi-dimensional array into a flat 1D array (flattening) or restoring a flat array to its original shape (unflattening). This is useful when storing or transmitting data.

* Reshaping for Visualization: Rearranging data for visualization purposes, such as displaying multi-dimensional data as an image or heatmap.

* Reorganizing Data: When dealing with data analysis and manipulation, you might need to change the structure of data to fit a particular data model or analysis approach.
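The flattening and unflattening scenario can be sketched as a round trip. The 2x3x4 array below is a stand-in for real data such as a small image batch. The last line illustrates that, for a contiguous array like this one, reshaping gives a view on the same memory rather than a copy.

```python
import numpy as np

# Hypothetical 3D data: 2 blocks of 3 rows and 4 columns.
image = np.arange(24).reshape(2, 3, 4)

# Flatten to 1D, e.g. for storage or transmission.
flat = image.reshape(-1)
print(flat.shape)  # (24,)

# Restore the original shape from the flat array.
restored = flat.reshape(image.shape)
round_trip_ok = np.array_equal(restored, image)
print(round_trip_ok)  # True

# For this contiguous array, the reshaped result shares memory with it.
shares = np.shares_memory(image, flat)
print(shares)  # True
```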

## Function 5 - np.squeeze

```np.squeeze``` is a NumPy function that removes dimensions of size 1 from an array, effectively "squeezing" those dimensions out. It is a convenient tool when you want to eliminate unnecessary singleton dimensions from your array, making it more manageable and suitable for various operations.

In [14]:
# Example 1 - working
arr = np.array([[1], [2], [3]])

squeezed_arr = np.squeeze(arr)

print(squeezed_arr)

[1 2 3]


In this example, we start with a 2D array arr with a singleton dimension along the second axis. When we use np.squeeze, it removes that singleton dimension, resulting in a 1D array squeezed_arr.

In [15]:
# Example 2 - working
arr = np.array([[[1]], [[2]], [[3]]])

squeezed_arr = np.squeeze(arr)

print(squeezed_arr)

[1 2 3]


In this example, we have a 3D array arr with singleton dimensions along the second and third axes. When we use np.squeeze, it removes both of these singleton dimensions, resulting in a 1D array squeezed_arr.

In [16]:
# Example 3 - breaking (to illustrate when it breaks)
arr = np.array([1, 2, 3, 4, 5])

squeezed_arr = np.squeeze(arr)

print(squeezed_arr)

[1 2 3 4 5]


```np.squeeze``` is a straightforward and robust function, and it generally doesn't "break" in the sense of producing errors or unexpected results. It is designed to safely remove singleton dimensions (dimensions of size 1) from a NumPy array. When there are no singleton dimensions to remove, as in the example above, it simply returns the array with its shape unchanged. The one case where it raises an error is when you pass an axis argument naming a dimension whose size is not 1, which results in a ValueError.
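A minimal sketch of the `axis` argument of np.squeeze, which restricts squeezing to the named dimension and is the one situation where the function raises an error:

```python
import numpy as np

arr = np.array([[1], [2], [3]])  # shape (3, 1)

# Axis 1 has size 1, so it can be squeezed out.
squeezed = np.squeeze(arr, axis=1)
print(squeezed.shape)  # (3,)

# Axis 0 has size 3, so asking to squeeze it raises ValueError.
try:
    np.squeeze(arr, axis=0)
    squeeze_failed = False
except ValueError as err:
    squeeze_failed = True
    print("squeeze failed:", err)
```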

Note that NumPy has no in-place variant named np.squeeze_; the function is np.squeeze. It does not change the shape of the original array. Instead it returns the input array itself or a view into it, so no data is copied. Because the result shares memory with the original, writing to the squeezed array also changes the original. Here are some scenarios when you might consider using np.squeeze:

* Data Preparation for Specific Libraries: Some libraries or functions require data in a specific shape without singleton dimensions. np.squeeze lets you produce that shape before passing the data on.

* Memory Efficiency: When working with large datasets, np.squeeze does not duplicate the data, because the result is a view on the same memory.

* Performance Optimization: Since no data is copied, squeezing stays cheap even for very large arrays.

* Data Cleanup: If singleton dimensions in your array are extraneous and won't be needed for later operations, you can squeeze them out to make the array more manageable.
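A minimal sketch of how the result of np.squeeze relates to the original array's memory. It assumes an array of zeros with two singleton dimensions. np.expand_dims, a companion NumPy function, is included to show how to add a dimension back.

```python
import numpy as np

arr = np.zeros((1, 3, 1))

# Squeezing returns a view on the same data; the original keeps its shape.
squeezed = np.squeeze(arr)
print(arr.shape, squeezed.shape)  # (1, 3, 1) (3,)

shares = np.shares_memory(arr, squeezed)
print(shares)  # True

# Writing through the squeezed view changes the original array too.
squeezed[0] = 7.0
print(arr[0, 0, 0])  # 7.0

# np.expand_dims adds a singleton dimension back.
expanded = np.expand_dims(squeezed, axis=0)
print(expanded.shape)  # (1, 3)
```

If you need an independent array, call `.copy()` on the squeezed result before modifying it.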

## Conclusion

In this notebook, we looked at five NumPy array operations (np.array_split, np.sort, np.where, np.reshape, and np.squeeze), each with a few examples.

## Reference Links
References and further reading on NumPy arrays:
* NumPy official quickstart tutorial: https://numpy.org/doc/stable/user/quickstart.html
* W3schools: https://www.w3schools.com/python/numpy/default.asp

In [22]:
import jovian
jovian.commit(filename='numpy-array-operations')

<IPython.core.display.Javascript object>

[jovian] Committed successfully! https://jovian.com/mehwish67/numpy-array-operations

