### Notes on Chapter 4: NumPy Basics: Arrays and Vectorized Computation

* NumPy, short for Numerical Python, is pivotal for numerical computing in Python.
* Numerous computational packages employ NumPy's array objects for data exchange.
* Knowledge of NumPy also helps in understanding pandas.

<b>Key Features of NumPy:</b>

1. ndarray:

* An efficient multidimensional array.
* Enables fast array-oriented arithmetic operations.
* Offers flexible broadcasting capabilities.

2. Mathematical functions:

* Enables fast operations on whole arrays.
* Avoids the need to write loops.

3. I/O Tools

* Methods to read and write array data to disk, and to work with memory-mapped files.

4. Additional Capabilities

* Linear algebra, random number generation, and Fourier transform functions.

5. C API Integration:

* Enables integration with libraries written in C, C++, or FORTRAN.
* Makes Python suitable for wrapping legacy low-level language codebases.

NumPy itself does not provide modeling or scientific functionality, but understanding NumPy arrays and array-oriented computing helps you use tools such as pandas more effectively.
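
To make the first two features in the list above concrete, here is a minimal sketch of broadcasting and a whole-array math function. The array names and values are illustrative and not from the original notes.

```python
import numpy as np

# A 2x3 array and a length-3 row (illustrative values).
matrix = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
row = np.array([10.0, 20.0, 30.0])

# Broadcasting: the row is added to every row of the matrix,
# without writing a loop or tiling the row by hand.
shifted = matrix + row
print(shifted)

# A mathematical function applied to the whole array at once.
roots = np.sqrt(matrix)
print(roots)
```

The shapes `(2, 3)` and `(3,)` are compatible because their trailing dimensions match.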

<b>NumPy's Importance</b>

* Array-based operations for:
    * Data munging and cleaning.
    * Subsetting, filtering, and transforming data.
    * Any other kind of numerical computation.


* Algorithms: Sorting, finding unique elements, and set operations.

* Data Operations: Descriptive statistics, data aggregation, and summarization.

* Data Handling: Aligning data, merging and joining datasets.

* Logic Expression: Expressing conditional logic as array expressions instead of loops with if-elif-else branches.

* Group Operations: Aggregating, transforming, and applying functions.

* pandas builds on these capabilities and adds more domain-specific functionality, such as time series manipulation.
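
A few items from the list above can be sketched in a handful of lines. This is only an illustration with made-up values: `np.where` stands in for an if/else loop, and `np.sort` and `np.unique` cover the sorting and unique-element operations.

```python
import numpy as np

# Illustrative data; the values are arbitrary.
values = np.array([3, -1, 4, -1, 5, -9, 2, 6])

# Conditional logic as an array expression:
# keep positive entries, replace everything else with 0.
clipped = np.where(values > 0, values, 0)

# Sorting and finding unique elements.
ordered = np.sort(values)
distinct = np.unique(values)

# A simple descriptive statistic over the whole array.
average = values.mean()

print(clipped)
print(ordered)
print(distinct)
print(average)
```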

<b>Why is NumPy Efficient?</b>

* Contiguous Memory Storage: Stores data in a contiguous block of memory, independent of other built-in Python objects.

* C-based Algorithms: NumPy's algorithms are written in C and operate directly on this memory, avoiding the type checking and other overhead of Python objects.

* Memory Efficiency: Uses significantly less memory than built-in Python sequences.

* Speed: Performs computations on entire arrays without needing Python loops.
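
The points above can be checked with a rough, machine-dependent comparison. This sketch assumes an array of 100,000 integers; the timings will vary between runs, and the list memory figure is only an estimate that adds up the size of the list and of each integer object it holds.

```python
import sys
import time

import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n)

# Time a Python-level loop (list comprehension).
start = time.perf_counter()
list_result = [x * 2 for x in py_list]
list_seconds = time.perf_counter() - start

# Time the equivalent vectorized operation.
start = time.perf_counter()
arr_result = np_arr * 2
array_seconds = time.perf_counter() - start

print(f"list comprehension: {list_seconds:.5f} s")
print(f"NumPy multiply:     {array_seconds:.5f} s")

# Rough memory estimate: raw array buffer vs. list plus its int objects.
array_bytes = np_arr.nbytes
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print(f"array: {array_bytes} bytes, list: about {list_bytes} bytes")
```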

#### The NumPy ndarray: A Multidimensional Array Object

NumPy's ndarray is a flexible container for datasets in Python, allowing for efficient mathematical operations on entire datasets.

In [1]:
import numpy as np

data = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])
print(data * 10)
print(data + data)

[[ 15.  -1.  30.]
 [  0. -30.  65.]]
[[ 3.  -0.2  6. ]
 [ 0.  -6.  13. ]]


Every ndarray has a `shape`, a tuple giving the size of each dimension, and a `dtype`, an object describing the data type of its elements.

In [2]:
print(data.shape)
print(data.dtype)

(2, 3)
float64


<b>1. Creating ndarrays</b>

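`np.array` converts a regular Python sequence into an ndarray and infers a suitable dtype (a mix of integers and floats becomes `float64`). Nested sequences of equal length become a multidimensional array: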

In [3]:
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
print(arr1)

# multidimensional array
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)

# Inspecting ndim and shape
print(arr2.ndim)
print(arr2.shape)

[6.  7.5 8.  0.  1. ]
2
(2, 4)


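`np.zeros` and `np.ones` create arrays filled with 0s and 1s. `np.empty` allocates an array without initializing it, so its contents are arbitrary, as the last output below shows. Pass a tuple as the shape to create a higher-dimensional array: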

In [4]:
print(np.zeros(10))
print(np.zeros((3, 6)))
print(np.empty((2, 3, 2)))

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]]
[[[1.49166815e-154 2.00389432e+000]
  [2.25633662e-314 2.26110380e-314]
  [2.25635365e-314 2.25652526e-314]]

 [[2.25699915e-314 2.25634739e-314]
  [2.25651302e-314 2.27631916e-314]
  [2.28730775e-314 8.34404953e-309]]]
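
A few related creation functions are not shown in the cell above. The sketch below uses `np.ones`, `np.full`, `np.zeros_like`, and `np.eye`, with illustrative shapes and fill values.

```python
import numpy as np

# Illustrative template array.
template = np.array([[1, 2, 3], [4, 5, 6]])

ones = np.ones((2, 3))                         # all 1s, float64 by default
sevens = np.full((2, 3), 7)                    # every element set to 7
zeros_like_template = np.zeros_like(template)  # same shape and dtype as template
identity = np.eye(3)                           # 3x3 identity matrix

print(ones)
print(sevens)
print(zeros_like_template)
print(identity)
```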


`arange` is an array-valued version of Python's built-in `range`:

In [5]:
print(np.arange(15))

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
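
Like `range`, `arange` also accepts start, stop, and step arguments, and it takes an optional `dtype`. A short sketch with illustrative values:

```python
import numpy as np

evens = np.arange(2, 11, 2)                # start, stop (exclusive), step
countdown = np.arange(5, 0, -1)            # a negative step counts down
as_float = np.arange(3, dtype=np.float64)  # explicit dtype

print(evens)      # [ 2  4  6  8 10]
print(countdown)  # [5 4 3 2 1]
print(as_float)   # [0. 1. 2.]
```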


<b>2. Data Types for ndarrays</b>

The dtype is metadata that tells NumPy how to interpret the array's block of memory as a particular type of data. You can specify it explicitly when creating an array:

In [6]:
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr2 = np.array([1, 2, 3], dtype=np.int32)

print(arr1.dtype)
print(arr2.dtype)

float64
int32


You can also cast an array from one data type to another with `astype`. Casting floating-point numbers to an integer type truncates the decimal part:

In [7]:
arr = np.array([1, 2, 3, 4, 5])
float_arr = arr.astype(np.float64)
print(float_arr.dtype)

arr_floats = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
print(arr_floats.astype(np.int32))

float64
[ 3 -1 -2  0 12 10]
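
Two further uses of `astype` can be sketched here: converting an array of numeric strings to floats, and reusing another array's `dtype`. The arrays are illustrative. With its default arguments, `astype` returns a new array even when the dtype is unchanged.

```python
import numpy as np

# Strings that represent numbers can be converted to a numeric dtype.
numeric_strings = np.array(["1.25", "-9.6", "42"])
as_floats = numeric_strings.astype(np.float64)
print(as_floats)

# Reuse the dtype of another array instead of naming it.
int_array = np.arange(5)
reference = np.array([0.22, 0.27, 0.357], dtype=np.float64)
matched = int_array.astype(reference.dtype)
print(matched.dtype)

# astype copies by default, so the result does not share memory.
same_type_copy = int_array.astype(int_array.dtype)
print(np.shares_memory(int_array, same_type_copy))
```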


<b>3. Arithmetic with NumPy Arrays</b>

Arrays let you express batch operations on data without writing explicit for loops; this is called vectorization. Arithmetic operations between equal-size arrays are applied element-wise.

In [8]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
print(arr * arr)
print(arr - arr)

[[ 1.  4.  9.]
 [16. 25. 36.]]
[[0. 0. 0.]
 [0. 0. 0.]]


Arithmetic operations with scalars propagate the scalar to each element:

In [9]:
print(1 / arr) 
print(arr ** 2)


[[1.         0.5        0.33333333]
 [0.25       0.2        0.16666667]]
[[ 1.  4.  9.]
 [16. 25. 36.]]


Comparisons between same-size arrays yield Boolean arrays:

In [10]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
print(arr2 > arr)

[[False  True False]
 [ True False  True]]


<b>4. Basic Indexing and Slicing</b>

Array indexing in NumPy is versatile, offering various ways to select data subsets.
One-dimensional arrays are straightforward, behaving similarly to Python lists:

In [14]:
arr = np.arange(10)
print(arr[5]) 
print(arr[5:8])
arr[5:8] = 12
print(arr)


5
[5 6 7]
[ 0  1  2  3  4 12 12 12  8  9]


Important points:

* Array slices are views on the original array. The data is not copied, so any change made through the view is reflected in the original array.
* To get an independent copy of a slice rather than a view, copy it explicitly, for example `arr[5:8].copy()`.
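
The difference between a view and a copy can be checked directly. This is a minimal sketch; the names `view` and `independent` are illustrative.

```python
import numpy as np

arr = np.arange(10)

# A slice is a view: writing through it changes the original array.
view = arr[5:8]
view[1] = 12345
print(arr)  # [    0     1     2     3     4     5 12345     7     8     9]

# An explicit copy is independent of the original.
independent = arr[5:8].copy()
independent[:] = 0
print(arr)  # unchanged by the line above

print(np.shares_memory(arr, view))         # True
print(np.shares_memory(arr, independent))  # False
```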

In a two-dimensional array, a single index selects a one-dimensional row rather than a scalar. Individual elements are accessed with a comma-separated list of indices:

In [15]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr2d[2])
print(arr2d[0, 2])

[7 8 9]
3


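In multidimensional arrays, if you omit later indices, the returned object is a lower-dimensional ndarray containing all the data along the remaining dimensions. Scalars and arrays can both be assigned to such a selection: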

In [16]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr3d[0])
old_values = arr3d[0].copy()
arr3d[0] = 42
print(arr3d)
arr3d[0] = old_values
print(arr3d)
print(arr3d[1, 0])

[[1 2 3]
 [4 5 6]]
[[[42 42 42]
  [42 42 42]]

 [[ 7  8  9]
  [10 11 12]]]
[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
[7 8 9]


In every case above where a subsection of an array is selected, the result is a view, so modifying it changes the original array. The comma-separated multidimensional indexing syntax does not work with regular Python objects such as lists of lists.
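
A short sketch contrasting a nested list with the equivalent array. The names are illustrative; the list raises `TypeError` for a comma-separated index, and a list slice is a copy rather than a view.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr2d = np.array(nested)

# On an ndarray, chained and comma-separated indexing are equivalent.
chained = arr2d[0][2]
comma = arr2d[0, 2]
print(chained, comma)

# A list of lists does not accept a tuple index.
list_error = None
try:
    nested[0, 2]
except TypeError as exc:
    list_error = exc
print(type(list_error).__name__)

# Slicing a list copies it; selecting an array row gives a view.
list_slice = nested[0][:]
list_slice[0] = 99   # nested is unaffected
row_view = arr2d[0]
row_view[0] = 99     # arr2d is modified
print(nested[0], arr2d[0])
```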