# Module 6 Assignment: Data Toolkit, NumPy

## Theory Part

#### 1. Explain the purpose and advantages of NumPy in scientific computing and data analysis. How does it enhance Python's capabilities for numerical operations?

NumPy is like the backbone of scientific computing in Python. It's built specifically for working with arrays and numbers, making it much more efficient for these tasks than Python’s built-in features. If you're dealing with a lot of data or complex mathematical operations, NumPy is the go-to library.

Why NumPy is important:

- Array handling: At its core, NumPy is all about working with arrays. Arrays look like Python lists, but they are faster and more capable. You can create 1D, 2D, or higher-dimensional arrays (think grids of numbers) and operate on all elements at once instead of one by one.
- Speed and efficiency: NumPy is much faster than plain lists for numerical work because its core routines are implemented in C and run on compact, typed buffers. It also uses memory more efficiently, so you can work with bigger datasets without hogging your system's resources.
- Math powerhouse: It has built-in functions for most common numerical tasks, from basic array arithmetic to linear algebra, random number generation, and Fourier transforms. These functions work on entire datasets at once, without hand-written loops.

The advantages of using NumPy:

- Performance boost: One of the biggest perks is vectorization. In plain Python you need a loop to process each element of a list; with NumPy, a single expression operates on the whole array, which is both shorter and much faster.
- Memory efficiency: Python lists are flexible, but that flexibility comes with overhead. NumPy arrays store data compactly, so they use less memory, especially for large datasets.
- Broadcasting: When two arrays in an operation have different but compatible shapes, NumPy virtually stretches the smaller one to match the larger one, without copying data. A pair of dimensions is compatible when the sizes are equal or one of them is 1; incompatible shapes raise an error.
- Works well with other libraries: NumPy is the foundation for many other Python libraries, such as Pandas (data analysis) and SciPy (more advanced scientific routines). When you use these libraries, you are usually dealing with NumPy arrays under the hood.
- Large datasets: With millions of data points, Python lists can become slow and memory-hungry. NumPy handles that scale comfortably as long as the data fits in memory.

How NumPy enhances Python:

Python by itself isn't built for heavy numerical calculations. Lists are great for general use but slow once you start doing a lot of math with them. NumPy fills that gap with a fast array type and compiled routines, and it makes matrix operations and other numerical methods (for example, numerical differentiation and integration) far easier and faster than in plain Python.

#### 2. Compare and contrast np.mean() and np.average() functions in NumPy. When would you use one over the other?

Similarities:

- Basic function: Both compute the arithmetic mean (the sum of the elements divided by the number of elements) of the data provided.
- Data handling: Both accept NumPy arrays as well as array-like inputs such as lists and tuples.
- Axis support: Both accept the axis argument, which computes the mean along a specific axis of a multi-dimensional array.

Differences:

- Weighted average: np.mean() always computes the simple (unweighted) mean and has no weights parameter. np.average() computes a weighted average when you pass weights, so that some elements count more than others according to their importance or frequency. Without weights, np.average() gives the same result as np.mean().
- Return type: Neither function returns an integer for integer input. np.mean() computes integer input in float64 and keeps the input's floating-point type for float input (for example, float32 stays float32). np.average() without weights delegates to the mean and behaves the same way; with weights, the result type is derived from both the data and the weights, and integer data with integer weights gives float64.
- Sum of weights: np.average() can also return the sum of the weights alongside the average when called with returned=True. np.mean() has no equivalent.

Example of a weighted average using np.average():

```python
import numpy as np

data = np.array([1, 2, 3, 4])
weights = np.array([1, 2, 3, 4])
weighted_avg = np.average(data, weights=weights)  # (1*1 + 2*2 + 3*3 + 4*4) / 10 = 3.0
```

When to use one over the other:

Use np.mean() when:

- You just need the simple arithmetic mean of the data.
- You don't have weights or don't need to weight elements differently.
- You prefer clarity, since np.mean() is the most direct way to express "take the mean".

Use np.average() when:

- You need a weighted average, where certain elements should contribute more (or less) to the final result.
- You also need the sum of the weights, which np.average() returns alongside the average when called with returned=True.

In [4]:
import numpy as np

data = np.array([1, 2, 3, 4, 5])

# Using np.mean() (simple mean)
mean = np.mean(data)
print(mean)  # Output: 3.0

# Using np.average() (simple average)
average = np.average(data)
print(average)  # Output: 3.0

# Using np.average() with weights
weights = np.array([1, 2, 3, 4, 5])
weighted_avg = np.average(data, weights=weights)
print(weighted_avg)  # Output: 3.666...


3.0
3.0
3.6666666666666665
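
The axis, weights, and return-type behaviour described above can be sketched on a small 2D array. The array `scores` and the weights are made-up illustration values, not part of the assignment data.

```python
import numpy as np

# Hypothetical data: two rows of three integer scores.
scores = np.array([[80, 90, 100],
                   [60, 70, 80]])

# Integer input still gives a floating-point result in both functions.
col_means = np.mean(scores, axis=0)       # mean of each column
row_avgs = np.average(scores, axis=1)     # unweighted, same as np.mean(scores, axis=1)
print(col_means, col_means.dtype)
print(row_avgs, row_avgs.dtype)

# Weighted average along axis 1: one weight per column.
weights = np.array([1, 1, 2])
weighted, weight_sum = np.average(scores, axis=1, weights=weights, returned=True)
print(weighted)      # per-row weighted averages
print(weight_sum)    # sum of the weights used for each row
```

With `returned=True`, np.average() returns a tuple, so the sum of weights can be reused without recomputing it.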


#### 3. Describe the methods for reversing a NumPy array along different axes. Provide examples for 1D and 2D arrays.

Methods for reversing a NumPy array:

- Slicing ([::-1]): Simple and commonly used. In a multi-dimensional array, put ::-1 in the position of the axis you want to reverse; for example, arr[:, ::-1] reverses the columns.
- np.flip(): Reverses the array along the axis given by the axis argument, or along every axis if axis is omitted. Works for 1D, 2D, and higher-dimensional arrays.


In [7]:
import numpy as np

# 1D array example
arr_1d = np.array([1, 2, 3, 4, 5])

# Reverse using slicing
reversed_arr_1d = arr_1d[::-1]
print("Reversed 1D array using slicing:", reversed_arr_1d)
# Output: [5 4 3 2 1]

# Reverse using np.flip
flipped_arr_1d = np.flip(arr_1d)
print("Reversed 1D array using np.flip:", flipped_arr_1d)
# Output: [5 4 3 2 1]

# 2D array example
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Reverse rows using slicing
reversed_rows = arr_2d[::-1, :]
print("Reversed 2D array rows using slicing:\n", reversed_rows)
# Output:
# [[7 8 9]
#  [4 5 6]
#  [1 2 3]]

# Reverse columns using np.flip
flipped_cols = np.flip(arr_2d, axis=1)
print("Reversed 2D array columns using np.flip:\n", flipped_cols)
# Output:
# [[3 2 1]
#  [6 5 4]
#  [9 8 7]]


Reversed 1D array using slicing: [5 4 3 2 1]
Reversed 1D array using np.flip: [5 4 3 2 1]
Reversed 2D array rows using slicing:
 [[7 8 9]
 [4 5 6]
 [1 2 3]]
Reversed 2D array columns using np.flip:
 [[3 2 1]
 [6 5 4]
 [9 8 7]]
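
Reversing along both axes at once works with either method. The sketch below uses a small made-up array `grid` and also checks that both approaches return views of the original data rather than copies.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

both_slicing = grid[::-1, ::-1]   # reverse rows and columns with slicing
both_flip = np.flip(grid)         # axis omitted: every axis is reversed
print(both_flip)
# [[6 5 4]
#  [3 2 1]]

# Both results are views: no data is copied.
print(np.shares_memory(grid, both_slicing), np.shares_memory(grid, both_flip))
```

Because the results are views, writing into them also changes `grid`; call `.copy()` when an independent array is needed.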


#### 4. How can you determine the data type of elements in a NumPy array? Discuss the importance of data types in memory management and performance.

Determining the data type of elements in a NumPy array:

To find the data type of the elements in a NumPy array, read its .dtype attribute. Knowing the data type matters for memory management and performance, especially with large datasets.

Importance of data types:

- Memory management:
  - NumPy arrays store elements in a contiguous block of memory using a fixed data type (e.g., int32, float64). This reduces memory overhead compared to Python lists.
  - Choosing the right data type minimizes memory consumption. For instance, int8 uses 1 byte per element where int64 uses 8, so it saves significant memory when every value is known to fit in the range -128 to 127.
- Performance:
  - Fixed data types enable faster operations because NumPy knows the exact size of each element and can process the buffer in tight compiled loops.
  - A smaller data type (e.g., float32 instead of float64) halves the amount of data moved through memory, which can speed up large computations, at the cost of reduced precision.

In [11]:
import numpy as np

# Creating an array
arr = np.array([1, 2, 3])

# Determine the data type
print(arr.dtype)  # Output: int64 (or system-specific integer type)

# Example with a float array
arr_float = np.array([1.1, 2.2, 3.3])
print(arr_float.dtype)  # Output: float64

int32
float64
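
The memory effect of the data type can be checked directly with the itemsize and nbytes attributes. This is a small sketch with made-up values chosen to fit safely into int8.

```python
import numpy as np

values = np.arange(100, dtype=np.int64)    # 100 integers, 8 bytes each
print(values.dtype, values.itemsize, values.nbytes)   # int64 8 800

small = values.astype(np.int8)             # safe here: 0..99 fits in int8
print(small.dtype, small.itemsize, small.nbytes)      # int8 1 100

# np.iinfo reports the representable range, useful before downcasting.
info = np.iinfo(np.int8)
print(info.min, info.max)                  # -128 127
```

Checking the range with np.iinfo first matters because casting values outside it does not preserve them.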


#### 5. Define ndarrays in NumPy and explain their key features. How do they differ from standard Python lists?

Definition of ndarrays in NumPy:

ndarray, short for "n-dimensional array," is the core data structure in NumPy. It stores a grid of values of a single data type and supports efficient manipulation of large, multidimensional datasets.

Key features of ndarrays:

- Homogeneous data: All elements share one data type, which keeps storage compact and operations fast.
- Multidimensional: Arrays can be 1D (vectors), 2D (matrices), or higher-dimensional, allowing for complex data representation.
- Efficient memory usage: An ndarray keeps its data in a single typed memory buffer, giving lower overhead and faster access than a Python list.
- Vectorized operations: Element-wise operations apply to whole arrays without explicit loops, leading to cleaner code and better performance.
- Broadcasting: NumPy can combine arrays of different but compatible shapes in one operation, without manually reshaping or copying them.
- Extensive functionality: NumPy provides a large collection of functions for manipulating arrays, including linear algebra, Fourier transforms, and statistical operations.

Differences from standard Python lists:

- Data type homogeneity:
  - ndarrays: All elements are of the same type.
  - Python lists: Can store mixed data types (e.g., integers, strings, objects).
- Performance:
  - ndarrays: Optimized for mathematical operations and array manipulations.
  - Python lists: Slower for numerical computations, since each element is a separate Python object handled by the interpreter.
- Memory efficiency:
  - ndarrays: More memory-efficient because values are stored directly in one buffer.
  - Python lists: Store references to objects, leading to higher memory overhead.
- Functionality:
  - ndarrays: Equipped with many built-in mathematical and statistical functions.
  - Python lists: Few built-in numerical operations; element-wise math typically requires loops or comprehensions.
- Multidimensionality:
  - ndarrays: Represent multi-dimensional data directly, with a shape and per-axis indexing.
  - Python lists: Can represent multi-dimensional data using nested lists, but operations become complex and less efficient.
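
A short sketch of these differences, using made-up values: the array attributes describe its structure, mixed numeric input is coerced to one type, and the same operators mean different things for lists and arrays.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(matrix.ndim, matrix.shape, matrix.size)   # 2 (2, 3) 6

# Homogeneous storage: mixing ints and floats upcasts everything to float64.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)                              # float64

# The same operators behave differently on lists and arrays.
py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])
print(py_list * 2)          # [1, 2, 3, 1, 2, 3]  (repetition)
print(np_arr * 2)           # [2 4 6]             (element-wise)
print(py_list + py_list)    # [1, 2, 3, 1, 2, 3]  (concatenation)
print(np_arr + np_arr)      # [2 4 6]             (element-wise)
```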

#### 6. Analyze the performance benefits of NumPy arrays over Python lists for large-scale numerical operations.

Performance benefits of NumPy arrays over Python lists for large-scale numerical operations:

- Memory efficiency:
  - Contiguous memory allocation: NumPy arrays store their values in one contiguous block of memory, which gives good cache locality. A Python list instead stores references to separately allocated objects, which adds per-element overhead and scatters the data in memory.
  - Fixed data types: NumPy arrays enforce a single data type for all elements, allowing compact storage and lower memory consumption than Python lists, which can contain mixed types.
- Faster operations:
  - Vectorization: Operations apply to entire arrays without explicit loops, which gives concise and efficient code. For example, adding two arrays element-wise is a single expression in NumPy, while Python lists require looping through each element.
  - Optimized libraries: For linear algebra, NumPy calls optimized BLAS and LAPACK libraries written in C or Fortran. Its element-wise routines are compiled C loops as well, so they avoid the per-element interpreter overhead of a Python loop.
- Broadcasting: Operations on arrays of different but compatible shapes need no manual reshaping, and NumPy does not materialize expanded copies of the smaller array. This means less code and less temporary memory.
- Reduced function call overhead: NumPy performs an operation on the whole array inside a single call. The equivalent list code runs the interpreter once per element, which adds significant overhead in large-scale computations.
- Parallel processing: Some NumPy operations, mainly linear algebra routines backed by a multithreaded BLAS, can use multiple CPU cores, whereas plain list code does not. Libraries that build on NumPy, like Dask, can further improve performance by distributing computations across multiple processors or machines.

Example comparison:

Here's a brief comparison to illustrate the performance benefits:

In [13]:
import numpy as np
import time

# Create a large NumPy array
arr_np = np.random.rand(1_000_000)

# Measure time for vectorized operation
start_time = time.time()
arr_np_squared = arr_np ** 2  # Element-wise squaring
end_time = time.time()
print("NumPy operation time:", end_time - start_time)

# Create a large Python list
arr_list = list(np.random.rand(1_000_000))

# Measure time for list operation
start_time = time.time()
arr_list_squared = [x ** 2 for x in arr_list]  # Element-wise squaring with list comprehension
end_time = time.time()
print("Python list operation time:", end_time - start_time)

NumPy operation time: 0.002033233642578125
Python list operation time: 0.12540602684020996
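
The memory side of the comparison can be sketched in the same spirit. The example below assumes CPython, where sys.getsizeof reports per-object sizes; the element count `n` is an arbitrary illustration value.

```python
import sys
import numpy as np

n = 100_000
py_list = [float(i) for i in range(n)]
np_arr = np.arange(n, dtype=np.float64)

# A list holds a pointer per element plus a separate float object for each value.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# The array's data buffer holds the raw values: n * 8 bytes for float64.
array_bytes = np_arr.nbytes

print("list bytes :", list_bytes)
print("array bytes:", array_bytes)
```

The exact list figure depends on the Python version, so only the array size is fixed by the data type.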


#### 7. Compare vstack() and hstack() functions in NumPy. Provide examples demonstrating their usage and output.

Comparison of vstack() and hstack() functions in NumPy:

- vstack(): Stacks arrays in sequence vertically (row-wise), joining them along the first axis. 1D arrays of length N are first treated as rows of shape (1, N).
- hstack(): Stacks arrays in sequence horizontally (column-wise), joining them along the second axis. The exception is 1D input, which is concatenated end to end along its only axis.

Key differences:

- Direction: vstack() stacks arrays vertically, adding new rows; hstack() stacks arrays horizontally, adding new columns.
- Input requirements: The input arrays must have matching sizes along every axis except the one being stacked. For 2D arrays, vstack() needs the same number of columns and hstack() needs the same number of rows.

In [14]:
import numpy as np

# Create two 1D arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Use vstack to stack them vertically
vstack_result = np.vstack((arr1, arr2))
print("vstack result:")
print(vstack_result)


vstack result:
[[1 2 3]
 [4 5 6]]


In [15]:
import numpy as np

# Create two 2D arrays
arr3 = np.array([[1, 2, 3],
                 [4, 5, 6]])

arr4 = np.array([[7],
                 [8]])

# Use hstack to stack them horizontally
hstack_result = np.hstack((arr3, arr4))
print("hstack result:")
print(hstack_result)


hstack result:
[[1 2 3 7]
 [4 5 6 8]]
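
The shape rules can be checked on the same kind of input. This sketch uses made-up arrays to show how 1D inputs are treated and what happens when the non-stacked axis does not match.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

h = np.hstack((a, b))    # 1D inputs are joined end to end
v = np.vstack((a, b))    # 1D inputs each become one row
print(h.shape, v.shape)  # (6,) (2, 3)

top = np.array([[1, 2],
                [3, 4]])
bottom = np.array([[5, 6]])

stacked = np.vstack((top, bottom))   # column counts match (2 and 2)
print(stacked.shape)                 # (3, 2)

try:
    np.hstack((top, bottom))         # row counts differ (2 and 1)
except ValueError as exc:
    print("hstack failed:", exc)
```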


#### 8. Explain the differences between fliplr() and flipud() methods in NumPy, including their effects on various array dimensions.


Differences between fliplr() and flipud() in NumPy:

- np.fliplr(): Flips an array in the left/right direction by reversing the order of elements along axis 1 (the columns). Rows stay in place, but the entries within each row are reversed. It is equivalent to m[:, ::-1] and to np.flip(m, axis=1).
- np.flipud(): Flips an array in the up/down direction by reversing the order of elements along axis 0 (the rows). Columns stay in place, but the row order is reversed. It is equivalent to m[::-1, ...] and to np.flip(m, axis=0).

Effects on different array dimensions:

- 1D arrays: flipud() reverses the elements, because axis 0 is the only axis. fliplr() raises a ValueError, because it requires at least two dimensions.
- 2D arrays: fliplr() mirrors the matrix horizontally and flipud() mirrors it vertically.
- Higher-dimensional arrays: Each function still reverses only its own axis (axis 1 for fliplr(), axis 0 for flipud()) and leaves the remaining axes unchanged.

Both functions return a view of the input rather than a copy.

#### 9. Discuss the functionality of the array_split() method in NumPy. How does it handle uneven splits?

Functionality of the array_split() function in NumPy:

np.array_split() divides an array into multiple sub-arrays along a specified axis. It is particularly useful when you need to partition data into smaller segments for processing or analysis.

Key features of array_split():

- Splitting along an axis: You can specify the axis along which to split the array. By default, it splits along the first axis (axis=0), which is the row direction for 2D arrays.
- Uneven splits: Unlike split(), which raises an error when the array cannot be divided into equal sections, array_split() accepts any number of sections and lets some sub-arrays be one element longer than others.
- Return type: The function returns a Python list of sub-arrays.
- Parameters:
  - ary: The input array to be split.
  - indices_or_sections: The number of sections, or the specific indices at which to split the array.
  - axis: The axis along which to split (default is 0).

Handling uneven splits:

When the number of sections n does not evenly divide the length l along the chosen axis, array_split() returns l % n sub-arrays of size l // n + 1, followed by the remaining sub-arrays of size l // n. In other words, the leftover elements go one each to the first sub-arrays.

For example, splitting an array of size 10 into 3 parts yields one sub-array of size 4 followed by two sub-arrays of size 3.


In [18]:
import numpy as np

# Create a 1D array
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Split the array into 3 parts
split_result = np.array_split(arr, 3)
print("Split 1D array:")
for i, sub_array in enumerate(split_result):
    print(f"Sub-array {i+1}: {sub_array}")


Split 1D array:
Sub-array 1: [1 2 3 4]
Sub-array 2: [5 6 7]
Sub-array 3: [ 8  9 10]


In [19]:
import numpy as np

# Create a 2D array
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9],
                   [10, 11, 12]])

# Split the 2D array into 2 parts along axis 0
split_result_2d = np.array_split(arr_2d, 2, axis=0)
print("Split 2D array:")
for i, sub_array in enumerate(split_result_2d):
    print(f"Sub-array {i+1}:\n{sub_array}")


Split 2D array:
Sub-array 1:
[[1 2 3]
 [4 5 6]]
Sub-array 2:
[[ 7  8  9]
 [10 11 12]]
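
The uneven-split rule, the index form of the second argument, and the contrast with np.split() can be sketched as follows. The split points [2, 7] are arbitrary illustration values.

```python
import numpy as np

arr = np.arange(1, 11)   # 1..10

# Section count: the first 10 % 3 = 1 sub-array gets the extra element.
sizes = [len(part) for part in np.array_split(arr, 3)]
print(sizes)             # [4, 3, 3]

# Index list: split before positions 2 and 7 -> arr[:2], arr[2:7], arr[7:].
parts = [p.tolist() for p in np.array_split(arr, [2, 7])]
print(parts)             # [[1, 2], [3, 4, 5, 6, 7], [8, 9, 10]]

# More sections than elements: the trailing sub-arrays are empty.
tiny_sizes = [len(part) for part in np.array_split(np.arange(3), 5)]
print(tiny_sizes)        # [1, 1, 1, 0, 0]

# np.split insists on equal sections and raises instead.
try:
    np.split(arr, 3)
except ValueError as exc:
    print("np.split failed:", exc)
```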


#### 10. Explain the concepts of vectorization and broadcasting in NumPy. How do they contribute to efficient array operations?

Concepts of vectorization and broadcasting in NumPy:

Vectorization and broadcasting are the two features that make NumPy array operations efficient.

Vectorization:

- Definition: Vectorization expresses an operation on whole arrays instead of an explicit Python loop over the elements.
- Benefits:
  - Element-wise operations: Perform calculations on whole arrays at once.
  - Conciseness: Results in shorter and more readable code.
  - Performance: The loop runs in compiled C code rather than in the Python interpreter.

Broadcasting:

- Definition: Broadcasting lets arithmetic operations combine arrays of different shapes by virtually stretching the smaller array to the shape of the larger one, without copying data.
- Key rules:
  - Shapes are compared starting from the trailing (rightmost) dimension. If one array has fewer dimensions, its shape is padded with ones on the left.
  - Two dimensions are compatible if they are equal or if one of them is 1; a dimension of size 1 is stretched to match the other. Otherwise NumPy raises a ValueError.

In [20]:
import numpy as np

# Create two arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Vectorized addition
result = a + b
print("Vectorized result:", result)


Vectorized result: [5 7 9]


In [21]:
import numpy as np

# Create a 1D array
a = np.array([1, 2, 3])  # Shape: (3,)

# Create a 2D array
b = np.array([[10],      # Shape: (3, 1)
              [20],
              [30]])

# Broadcasting
result = a + b
print("Broadcasting result:\n", result)


Broadcasting result:
 [[11 12 13]
 [21 22 23]
 [31 32 33]]
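
A common practical use of broadcasting is centering the columns of a table. The sketch below uses made-up data and also shows an incompatible pair of shapes and one way to fix it.

```python
import numpy as np

data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])           # shape (3, 2)

col_means = data.mean(axis=0)            # shape (2,), treated as (1, 2)
centered = data - col_means              # (3, 2) - (1, 2) -> (3, 2)
print(centered)

row_offsets = np.array([1.0, 2.0, 3.0])  # shape (3,)
try:
    data + row_offsets                   # trailing dims 2 vs 3: incompatible
except ValueError as exc:
    print("broadcast failed:", exc)

# Adding an axis turns the offsets into a (3, 1) column, which is compatible.
shifted = data + row_offsets[:, np.newaxis]
print(shifted)
```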


## Practical

#### 1. Create a 3x3 NumPy array with random integers between 1 and 100 and interchange its rows and columns.

In [22]:
import numpy as np

# Create a 3x3 array with random integers between 1 and 100
array_3x3 = np.random.randint(1, 101, size=(3, 3))
print("Original 3x3 Array:\n", array_3x3)

# Interchange rows and columns (transpose)
transposed_array = array_3x3.T
print("Transposed Array:\n", transposed_array)


Original 3x3 Array:
 [[ 6 52 21]
 [30 35 74]
 [25 11 33]]
Transposed Array:
 [[ 6 30 25]
 [52 35 11]
 [21 74 33]]


#### 2. Generate a 1D NumPy array with 10 elements, reshape it into a 2x5 array, then into a 5x2 array.

In [23]:
# Generate a 1D array with 10 elements
array_1d = np.arange(10)
print("Original 1D Array:", array_1d)

# Reshape into a 2x5 array
array_2x5 = array_1d.reshape(2, 5)
print("Reshaped 2x5 Array:\n", array_2x5)

# Reshape into a 5x2 array
array_5x2 = array_1d.reshape(5, 2)
print("Reshaped 5x2 Array:\n", array_5x2)


Original 1D Array: [0 1 2 3 4 5 6 7 8 9]
Reshaped 2x5 Array:
 [[0 1 2 3 4]
 [5 6 7 8 9]]
Reshaped 5x2 Array:
 [[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]


#### 3. Create a 4x4 NumPy array with random float values and add a border of zeros.

In [24]:
# Create a 4x4 array with random float values
array_4x4 = np.random.rand(4, 4)
print("Original 4x4 Array:\n", array_4x4)

# Add a border of zeros around it
bordered_array = np.pad(array_4x4, pad_width=1, mode='constant', constant_values=0)
print("Array with Border:\n", bordered_array)


Original 4x4 Array:
 [[0.33439122 0.6792706  0.24617096 0.93796075]
 [0.83095928 0.56019368 0.41033444 0.10648798]
 [0.64449869 0.13838567 0.91596418 0.26726842]
 [0.02619796 0.09335624 0.60113926 0.88236218]]
Array with Border:
 [[0.         0.         0.         0.         0.         0.        ]
 [0.         0.33439122 0.6792706  0.24617096 0.93796075 0.        ]
 [0.         0.83095928 0.56019368 0.41033444 0.10648798 0.        ]
 [0.         0.64449869 0.13838567 0.91596418 0.26726842 0.        ]
 [0.         0.02619796 0.09335624 0.60113926 0.88236218 0.        ]
 [0.         0.         0.         0.         0.         0.        ]]


#### 4. Create an array of integers from 10 to 60 with a step of 5.

In [25]:
# Create an array of integers from 10 to 60 with a step of 5
array_step = np.arange(10, 61, 5)
print("Array with Step of 5:", array_step)


Array with Step of 5: [10 15 20 25 30 35 40 45 50 55 60]


#### 5. Create a NumPy array of strings and apply different case transformations.

In [None]:
# Create an array of strings
string_array = np.array(['python', 'numpy', 'pandas'])

# Apply different case transformations
uppercase_array = np.char.upper(string_array)
lowercase_array = np.char.lower(string_array)
titlecase_array = np.char.title(string_array)

print("Uppercase:", uppercase_array)
print("Lowercase:", lowercase_array)
print("Title Case:", titlecase_array)


#### 6. Generate a NumPy array of words and insert a space between each character.
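
One possible approach, sketched with the same example words used in the previous exercise: np.char.join() treats each string as a sequence of characters and joins them with the given separator.

```python
import numpy as np

# Example words are an assumption; any array of strings works.
words = np.array(['python', 'numpy', 'pandas'])

# Join the characters of each word with a single space.
spaced = np.char.join(' ', words)
print(spaced)
```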

#### 7. Create two 2D NumPy arrays and perform element-wise addition, subtraction, multiplication, and division.
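
A minimal sketch with two made-up 2x2 arrays; the arithmetic operators apply element by element, and division returns floats.

```python
import numpy as np

# Illustration values chosen so the division has no zero divisors.
a = np.array([[10, 20],
              [30, 40]])
b = np.array([[1, 2],
              [3, 4]])

added = a + b          # [[11 22] [33 44]]
subtracted = a - b     # [[ 9 18] [27 36]]
multiplied = a * b     # [[ 10  40] [ 90 160]]
divided = a / b        # [[10. 10.] [10. 10.]]

print("Addition:\n", added)
print("Subtraction:\n", subtracted)
print("Multiplication:\n", multiplied)
print("Division:\n", divided)
```

Note that `a * b` is element-wise multiplication, not matrix multiplication; the matrix product would be `a @ b`.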

#### 8. Use NumPy to create a 5x5 identity matrix and extract its diagonal elements.
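
A minimal sketch: np.eye() builds the identity matrix and np.diag() extracts the main diagonal of a 2D array.

```python
import numpy as np

identity = np.eye(5)               # 5x5 matrix with ones on the main diagonal
diagonal = np.diag(identity)       # 1D array of the diagonal elements

print("Identity matrix:\n", identity)
print("Diagonal elements:", diagonal)   # [1. 1. 1. 1. 1.]
```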

#### 9. Generate a NumPy array of 100 random integers between 0 and 1000 and find all prime numbers.
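
One possible approach is to build a Sieve of Eratosthenes up to 1000 and use it as a lookup table. The seed is an arbitrary choice for reproducibility, and treating the range as 0 to 1000 inclusive is an assumption about the wording of the task.

```python
import numpy as np

rng = np.random.default_rng(seed=0)          # arbitrary seed, for repeatable output
numbers = rng.integers(0, 1001, size=100)    # upper bound is exclusive -> 0..1000

# Sieve of Eratosthenes: is_prime[k] is True exactly when k is prime.
limit = 1000
is_prime = np.ones(limit + 1, dtype=bool)
is_prime[:2] = False                         # 0 and 1 are not prime
for p in range(2, int(limit ** 0.5) + 1):
    if is_prime[p]:
        is_prime[p * p::p] = False           # cross out multiples of p

# Index the lookup table with the random numbers to filter them.
primes_found = numbers[is_prime[numbers]]
print("Primes in the array:", np.unique(primes_found))
```

The sieve is computed once and then each number is checked with a single array lookup, which avoids testing every number by trial division.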

#### 10. Create a NumPy array representing daily temperatures for a month and calculate weekly averages.
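
A minimal sketch, assuming a 30-day month with randomly generated temperatures between 15 and 35 degrees (both the length and the range are illustration choices). Since 30 is not a multiple of 7, the month is cut every 7 days and the last, shorter chunk is averaged on its own.

```python
import numpy as np

rng = np.random.default_rng(seed=0)              # arbitrary seed
temps = rng.uniform(15.0, 35.0, size=30)         # hypothetical daily temperatures

# Split before days 7, 14, 21, 28 -> four full weeks plus a 2-day remainder.
week_starts = np.arange(7, len(temps), 7)
weeks = np.split(temps, week_starts)

weekly_avgs = np.array([week.mean() for week in weeks])
for i, (week, avg) in enumerate(zip(weeks, weekly_avgs), start=1):
    print(f"Week {i} ({len(week)} days): {avg:.2f}")
```

For a month of exactly 28 days, `temps.reshape(4, 7).mean(axis=1)` would give the same weekly averages in one step.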