SCT211-0848/2018  
Jany Muong

# ICS 2207 SCIENTIFIC COMPUTING  
**CAT** - Due Date: `12th April 2024`  

## a) Advantages of NumPy Arrays, and Vectorization for NumPy Efficiency:  

NumPy arrays are a better fit than Python lists for numerical computations, for the reasons listed below:  
- **Better Performance:** **NumPy** **arrays** are stored in **contiguous blocks of computer memory**, which allows efficient vectorized operations. The basic difference is this: Python lists are generic containers of object references, while NumPy arrays are homogeneous, typed arrays of fixed size. Batch and aggregate operations therefore run over the whole array in compiled code, which is significantly faster than element-by-element computation over a list in the interpreter.  
- **Computational Efficiency - from use of Appropriate Data types:** NumPy supports various data types specifically designed for numerical data (e.g., integers, floats, **complex numbers**). This ensures efficient storage and calculations compared to storing **mixed data types** in lists.  
- **Mathematical operations - Convenience:** **NumPy** offers a rich set of **built-in functions** (element-wise arithmetic, linear algebra operations, statistical functions) that work directly on arrays, eliminating the custom loops that regular Python lists require. This simplifies and streamlines code for scientific computing tasks.
- **Memory Efficiency**: NumPy arrays consume less memory than Python lists, especially for large datasets. Each element is stored as a raw value of a single data type in one contiguous buffer, so there is no per-element Python object overhead.  
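
The memory point above can be checked with a rough measurement. This is a minimal sketch, not a benchmark: the element count `n` is an arbitrary choice, and summing `sys.getsizeof` over the elements only approximates the true footprint of a list (CPython caches small integers, for example).

```python
import sys
import numpy as np

n = 1_000  # arbitrary size chosen for illustration
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list holds references to separate int objects, so count the list
# container plus every element object it points to (an approximation).
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# nbytes is the size of the raw data buffer: n elements * 8 bytes for int64.
array_bytes = np_arr.nbytes

print(f'list : about {list_bytes} bytes')
print(f'array: {array_bytes} bytes ({np_arr.itemsize} bytes per {np_arr.dtype} element)')
```

The array figure excludes the small fixed header of the array object itself, which does not grow with the number of elements.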


### What **Vectorization** Is, and Its **Efficiency in NumPy**:

Vectorization in NumPy means applying an operation to an entire array at once rather than iterating through individual elements with explicit loops. The work is delegated to optimized, low-level implementations in NumPy's underlying libraries. This minimizes the per-element overhead of the Python interpreter and improves the speed of numerical computations.
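
As a small illustration of the difference, the sketch below computes `x**2 + 1` for every element twice: once with an explicit Python loop and once as a single vectorized expression. The array contents and variable names are made up for the example.

```python
import numpy as np

values = np.arange(1, 6, dtype=np.float64)  # 1.0, 2.0, 3.0, 4.0, 5.0

# Loop version: the interpreter handles one element per iteration.
result_loop = np.empty_like(values)
for i in range(values.size):
    result_loop[i] = values[i] ** 2 + 1

# Vectorized version: one expression; the loop runs inside NumPy's compiled code.
result_vec = values ** 2 + 1

print(result_vec)
print(np.array_equal(result_loop, result_vec))
```

Both versions give the same values; the vectorized one is shorter and avoids per-element interpreter overhead, which matters more as arrays grow.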

## b) Python Function - `Dot Product`, with `Error Handling`:  

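This function takes two lists as input, converts them to NumPy arrays, checks for mismatched lengths, and then computes the dot product using `np.dot` for efficient vectorized calculation. A mismatch raises a `ValueError` inside the function; that error (like any other exception) is caught and its message is returned as a string.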

In [1]:
import numpy as np
'''Module:
A Python function that uses NumPy to calculate the dot product of two vectors.
The function accepts two lists of numbers as input, converts them into NumPy
arrays, and returns the dot product. It includes error handling for cases with
mismatched input list lengths.
'''

def dot_product_np(list_a, list_b):
    '''
    Compute the dot product of two vectors represented as lists, using np.dot.
    
    args:
      list_a: The first list of numbers.
      list_b: The second list of numbers.
    
    returns:
      dot product of the two vectors, or the error message (str) if the
      inputs are invalid.
    
    note:
      A ValueError is raised internally when the input lists have different
      lengths; it is caught below and its message is returned.
    '''
    try:
        # convert Python input lists to NumPy arrays
        np_array_a = np.array(list_a)
        np_array_b = np.array(list_b)

        # error handling and validation - check that the shapes (lengths) match
        if np_array_a.shape != np_array_b.shape:
            raise ValueError('Input lists must have the same lengths for dot product.')

        # compute dot product - vectorization
        return np.dot(np_array_a, np_array_b)
    except Exception as e:
        return str(e)
    # finally:
    #     return f'And this is how you use Numpy to compute Dot Product :)'

### Usage - Ascertaining It Works As Expected:  
We're passing in *Python lists* as **input**.  
This tests that the function computes the correct result and shows the error handling in action. We'll use 1-D arrays:

In [2]:
# matching lengths - good shape;
list_a = [7, 3, 9]
list_b = [4, 5, 6]
print('The Dot Product of the two arrays from lists is: ', end='')
print(dot_product_np(list_a, list_b))

The Dot Product of the two arrays from lists is: 97


In [3]:
# mismatch - this should produce the error message;
list_a = [1, 2, 3, 8]
list_b = [4, 5, 6]
dot_product_np(list_a, list_b)

'Input lists must have the same lengths for dot product.'
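
Returning the message as a string means a caller cannot tell a result from a failure without checking the return type. As an alternative design, here is a minimal sketch of a variant that lets the `ValueError` propagate. The name `dot_product_strict` and the extra one-dimensional check are assumptions made for this example; they are not part of the function above.

```python
import numpy as np

def dot_product_strict(list_a, list_b):
    """Hypothetical variant of dot_product_np that raises instead of returning a message."""
    a = np.asarray(list_a, dtype=float)
    b = np.asarray(list_b, dtype=float)
    # Assumed extra check: only accept flat lists of numbers.
    if a.ndim != 1 or b.ndim != 1:
        raise ValueError('Both inputs must be one-dimensional lists of numbers.')
    if a.shape != b.shape:
        raise ValueError('Input lists must have the same lengths for dot product.')
    return float(np.dot(a, b))

# The caller decides how to handle the failure.
try:
    dot_product_strict([1, 2, 3, 8], [4, 5, 6])
except ValueError as err:
    print(f'Caught: {err}')

print(dot_product_strict([7, 3, 9], [4, 5, 6]))
```

With this shape the return value is always a number, and error handling lives at the call site.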

## c) Use and Importance of Pandas in Data Analysis

Pandas is best explained through its data structures. The key ones are:

- **Series:** One-dimensional labeled arrays capable of holding data of any type (a 1-D array, or vector, of items plus an index).

- **DataFrame:** Two-dimensional, size-mutable, tabular data structure with labeled rows (index) and columns. In other words, it can be thought of as a 2-dimensional array, or a table with rows and columns. DataFrames offer efficient storage, handling of missing data, and powerful indexing capabilities.  
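
The two structures can be sketched in a few lines. The labels `'a'`, `'b'`, `'c'` and the values are made up for illustration.

```python
import pandas as pd

# A Series: a 1-D array of values plus an index of labels.
ages = pd.Series([25, 28, 35], index=['a', 'b', 'c'], name='Age')
print(ages['b'])   # label-based access to a single item

# A DataFrame: a table whose columns are Series sharing one row index.
df = pd.DataFrame({'Age': [25, 28, 35], 'Salary': [90000, 60000, 70000]},
                  index=['a', 'b', 'c'])
print(df.shape)                   # (rows, columns)
print(df.loc['b', 'Salary'])      # row label, then column label
print(type(df['Age']).__name__)   # selecting one column gives back a Series
```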

DataFrames are used for managing and analyzing large datasets, as explained below:

- **Data manipulation:** Pandas provides intuitive methods for filtering, sorting, grouping, and aggregating data based on specific criteria. Examples: *filtering* rows based on conditions (`df[df['column'] > value]`), *grouping* data for aggregation (`df.groupby('column').agg('mean')`), *merging/joining* multiple DataFrames (`pd.merge(df1, df2, on='key_column')`), and *reshaping data* (`pivot_table`, `melt`, `stack`, `unstack`).

- **More on Merging and joining:** DataFrames can be merged or joined based on shared keys, allowing for combining data from different sources.
- **Missing data handling:** Pandas offers tools for identifying and handling missing data (e.g., imputing missing values, dropping rows/columns with missing values).

- **Importance in Large Datasets**: Pandas can load data incrementally (for example, by reading a file in chunks), performs operations in memory, and lets you reduce memory usage through appropriate data types and indexing.  
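
The missing-data and incremental-loading points above can be sketched together. The CSV text, column names, and chunk size are invented for this example, and an in-memory `io.StringIO` buffer stands in for a file on disk.

```python
import io
import pandas as pd

# Made-up data: the score for 'B' is missing.
csv_text = 'name,score\nA,10\nB,\nC,30\nD,40\n'

# Missing-data handling: count, impute with the column mean, or drop.
df = pd.read_csv(io.StringIO(csv_text))
missing_count = int(df['score'].isna().sum())
filled = df['score'].fillna(df['score'].mean())
dropped = df.dropna(subset=['score'])

# Incremental loading: with chunksize, read_csv yields DataFrames of at most
# that many rows, so the whole file never has to be in memory at once.
total = 0.0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += chunk['score'].sum()   # sum() skips missing values by default

print(missing_count, len(dropped), total)
```

Whether to impute or drop depends on the analysis; the mean is used here only because it is the simplest choice to show.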

Below are examples of data manipulation using pandas in Python code:  

In [4]:
import pandas as pd

# this is a Python data structure - a dictionary
data = {'Name': ['Jany Muong', 'Satoru Gojo', 'Mr Robot'],
        'Age': [25, 28, 35],
        'Salary': [90000, 60000, 70000]}

# this is a DataFrame created from the dictionary above
df = pd.DataFrame(data)

# filtering
filtered_df = df[df['Age'] > 28]

# grouping and aggregation
grouped_df = df.groupby('Age').agg({'Salary': 'mean'})

# merging DataFrames
anime_data = {'Name': ['Panda', 'Megumi Fushiguro'], 'Age': [16, 15]}
anime_df = pd.DataFrame(anime_data)
merged_df = pd.merge(df, anime_df, on='Age', how='outer')

print(f'Filtered DataFrame: \n{filtered_df}')
print()
print(f'Grouped DataFrame: \n{grouped_df}')
print()
print(f'Merged DataFrame: \n{merged_df}')

Filtered DataFrame: 
       Name  Age  Salary
2  Mr Robot   35   70000

Grouped DataFrame: 
      Salary
Age         
25   90000.0
28   60000.0
35   70000.0

Merged DataFrame: 
        Name_x  Age   Salary            Name_y
0   Jany Muong   25  90000.0               NaN
1  Satoru Gojo   28  60000.0               NaN
2     Mr Robot   35  70000.0               NaN
3          NaN   16      NaN             Panda
4          NaN   15      NaN  Megumi Fushiguro


## d) Importance of Symbolic Computation  

Symbolic computation deals with manipulating mathematical expressions in symbolic form (letters, variables) instead of numerical values. It plays a vital role in scientific computing for several reasons:

- **Mathematical Modeling**: Symbolic computation facilitates the creation and manipulation of mathematical models in various scientific domains, such as physics, engineering, and mathematics. Researchers and practitioners can express and analyze complex systems and phenomena through symbolic representations, leading to deeper insights and predictions.  
- **Exact Representation**: Symbolic computation systems work with mathematical expressions symbolically, allowing for exact representation and manipulation of mathematical entities like equations, variables, and functions. This precision is vital for maintaining accuracy in scientific calculations and analyses.  
- **Algebraic Manipulations**: Symbolic computation systems can perform complex algebraic manipulations, including simplification, expansion, differentiation, integration, and solving equations symbolically. These capabilities are fundamental in many scientific disciplines for deriving analytical solutions and understanding mathematical relationships.
- **Analytical solutions:** Symbolic computation allows deriving analytical solutions to mathematical problems, providing insights beyond just numerical results. This can be crucial for understanding the underlying relationships between variables.  
- **Simplification:** Symbolic computation can simplify complex expressions, making them easier to interpret and manipulate.
- **Code generation:** Symbolic computation tools can generate optimized code for numerical computations based on derived symbolic expressions.  
- **Verification, Validation, and Error Analysis**: Symbolic manipulation helps analyze the behavior of, and potential errors in, mathematical models before any numerical computation is run. It also supports verification and validation of numerical algorithms: by comparing symbolic solutions with numerical results, researchers can detect errors, check algorithm correctness, and improve the reliability of scientific simulations and analyses.
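
Several of the points above can be illustrated with a short sketch. It assumes the SymPy library, which this section does not name, as one possible Python tool for symbolic computation; the polynomial is an arbitrary example.

```python
import sympy as sp

x = sp.symbols('x')
expr = x**2 + 3*x + 2   # arbitrary example expression

# Algebraic manipulation and analytical solutions
derivative = sp.diff(expr, x)             # symbolic derivative
antiderivative = sp.integrate(expr, x)    # symbolic antiderivative
roots = sp.solve(sp.Eq(expr, 0), x)       # exact roots of expr = 0
factored = sp.factor(expr)                # factored form

# Exact representation: rationals stay exact instead of becoming floats
exact = sp.Rational(1, 3) + sp.Rational(1, 6)

# Code generation: turn the symbolic derivative into a numeric function
f = sp.lambdify(x, derivative, 'numpy')

# Verification: differentiating the antiderivative should give expr back
check = sp.simplify(sp.diff(antiderivative, x) - expr)

print(derivative, roots, factored, exact, f(2.0), check)
```

Here `lambdify` stands in for the code-generation point, and the final check shows how a symbolic result can be used to validate another computation.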
