### Notes on 4. NumPy Basics: Arrays and Vectorized Computation

* NumPy, short for Numerical Python, is pivotal for numerical computing in Python.
* Numerous computational packages employ NumPy's array objects for data exchange.
* Knowledge of NumPy also makes it easier to understand and use pandas.

<b>Key Features of NumPy:</b>

1. ndarray:

* An efficient multidimensional array.
* Enables fast array-oriented arithmetic operations.
* Offers flexible broadcasting capabilities.

2. Mathematical functions:

* Enables fast operations on whole arrays.
* Avoids the need to write loops.

3. I/O Tools:

* Methods to read and write array data to disk, and to work with memory-mapped files.

4. Additional Capabilities:

* Linear algebra, random number generation, and Fourier transform functions.

5. C API Integration:

* Enables integration with libraries written in C, C++, or Fortran.
* Makes Python a good choice for wrapping legacy C, C++, and Fortran codebases.
* NumPy by itself doesn't provide modeling or scientific functionality, but understanding NumPy arrays and array-oriented computing helps you use tools with array semantics, such as pandas, more effectively.
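The five feature areas above can be illustrated with one short sketch. It assumes a recent NumPy (the `np.random.default_rng` generator API exists since NumPy 1.17), and it writes to a temporary directory only to demonstrate `np.save`/`np.load`; the file name `arr.npy` and the sample values are arbitrary.

```python
import os
import tempfile

import numpy as np

arr = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# 1. ndarray + broadcasting: the 1-D row is applied to each row of the 2-D array
row = np.array([10.0, 20.0, 30.0])
broadcast_sum = arr + row
print(broadcast_sum)  # [[11. 22. 33.]
                      #  [14. 25. 36.]]

# 2. Mathematical functions operate on every element without a Python loop
roots = np.sqrt(arr)
print(roots)

# 3. I/O: save to NumPy's binary .npy format, then load it back
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "arr.npy")  # arbitrary file name
    np.save(path, arr)
    loaded = np.load(path)
print(np.array_equal(arr, loaded))  # True

# 4. Linear algebra, random numbers, and Fourier transforms
product = arr @ arr.T                 # (2, 3) @ (3, 2) -> (2, 2) matrix product
print(product)                        # [[14. 32.]
                                      #  [32. 77.]]
rng = np.random.default_rng(seed=42)  # seeded generator for reproducible draws
samples = rng.standard_normal(3)
spectrum = np.fft.fft(np.array([1.0, 0.0, 0.0, 0.0]))  # FFT of a unit impulse
print(spectrum)                       # all ones (as complex numbers)
```

The seed only makes the random draws repeatable between runs; any integer works.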

<b> NumPy's Importance </b>

* Array-based operations for:
    * Data munging and cleaning.
    * Subsetting, filtering, and transforming data.
    * Any other kind of computation.


* Algorithms: Sorting, finding unique elements, and set operations.

* Data Operations: Descriptive statistics, data aggregation, and summarization.

* Data Handling: Aligning data, merging and joining datasets.

* Conditional Logic: Expressing conditional logic as array expressions instead of loops with if-elif-else branches.

* Group Operations: Group-wise data aggregation, transformation, and function application.

* pandas builds on these capabilities and adds more domain-specific functionality, such as time series manipulation, that NumPy does not provide.
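Several of these operations can be sketched on a small, made-up integer array. The sample values are arbitrary; `np.sort`, `np.unique`, `np.intersect1d`, `np.isin`, and `np.where` are standard NumPy functions.

```python
import numpy as np

values = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])  # arbitrary sample data

# Sorting and unique elements
sorted_values = np.sort(values)    # returns a sorted copy
unique_values = np.unique(values)  # sorted distinct values
print(sorted_values)  # [1 1 2 3 3 4 5 5 6 9]
print(unique_values)  # [1 2 3 4 5 6 9]

# Set operations between two arrays
other = np.array([2, 3, 7, 9])
common = np.intersect1d(values, other)  # sorted values present in both arrays
membership = np.isin(other, values)     # is each element of `other` in `values`?
print(common)      # [2 3 9]
print(membership)  # [ True  True False  True]

# Conditional logic as an array expression instead of an if/else loop
labels = np.where(values > 4, "high", "low")
print(labels)

# Descriptive statistics and aggregation
print(values.sum(), values.mean(), values.max())  # 39 3.9 9
```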

<b>Why is NumPy Efficient?</b>

* Contiguous Memory Storage: Stores data internally in a single contiguous block of memory, independent of other built-in Python objects.

* C-based Algorithms: NumPy's library of algorithms is written in C and operates directly on this memory, avoiding the type checking and other overhead of interpreted Python code.

* Memory Efficiency: Uses significantly less memory than built-in Python sequences.

* Speed: Performs computations on entire arrays without needing Python loops.
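A rough sketch of the memory and speed points above, assuming CPython and a one-million-element integer array (the size is arbitrary). The list estimate sums `sys.getsizeof` over the list and its elements, so treat it as approximate; timings vary by machine, so none are given as expected output.

```python
import sys
import time

import numpy as np

n = 1_000_000  # arbitrary size for the comparison
my_arr = np.arange(n)     # one contiguous block of fixed-size integers
my_list = list(range(n))  # a list of pointers to separate Python int objects

# Memory: raw values in the array vs. pointers plus one int object per element
array_bytes = my_arr.nbytes
list_bytes = sys.getsizeof(my_list) + sum(sys.getsizeof(x) for x in my_list)
print(f"ndarray: {array_bytes:,} bytes; list (approx.): {list_bytes:,} bytes")
print("contiguous:", my_arr.flags["C_CONTIGUOUS"])

# Speed: one vectorized operation vs. a Python-level loop over the same values
start = time.perf_counter()
vec_result = my_arr * 2
vec_time = time.perf_counter() - start

start = time.perf_counter()
loop_result = [x * 2 for x in my_list]
loop_time = time.perf_counter() - start

print(f"vectorized: {vec_time:.4f} s; list comprehension: {loop_time:.4f} s")
```

Both approaches produce the same numbers; the difference is where the loop runs (compiled C inside NumPy versus the Python interpreter).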

#### The NumPy ndarray: A Multidimensional Array Object

NumPy's ndarray is a flexible N-dimensional container for homogeneous data (every element has the same type). It lets you perform mathematical operations on entire datasets using the same syntax you would use for scalars.

```python
import numpy as np

data = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])
print(data * 10)    # multiply every element by a scalar
print(data + data)  # add two arrays element by element
```

```
[[ 15.  -1.  30.]
 [  0. -30.  65.]]
[[ 3.  -0.2  6. ]
 [ 0.  -6.  13. ]]
```


Every ndarray has a `shape` attribute (a tuple giving the size of each dimension) and a `dtype` attribute (an object describing the data type of the elements):

```python
print(data.shape)
print(data.dtype)
```

```
(2, 3)
float64
```
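A few related attributes follow directly from `shape` and `dtype`. This sketch recreates the same `data` array; `ndim`, `size`, `itemsize`, and `nbytes` are standard ndarray attributes.

```python
import numpy as np

data = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])

print(data.ndim)      # 2  -> number of dimensions, i.e. len(data.shape)
print(data.size)      # 6  -> total element count, the product of data.shape
print(data.itemsize)  # 8  -> bytes per float64 element
print(data.nbytes)    # 48 -> data.size * data.itemsize
```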


<b>1. Creating ndarrays</b>

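The easiest way to create an array is `np.array`, which converts any sequence-like object (such as a list) into a new ndarray. Nested sequences of equal length become multidimensional arrays: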

```python
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
print(arr1)

# multidimensional array
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)

# Inspecting ndim and shape
print(arr2.ndim)
print(arr2.shape)
```

```
[6.  7.5 8.  0.  1. ]
2
(2, 4)
```
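Unless a dtype is given, `np.array` infers one from the data. This sketch uses arbitrary values; the exact default integer type depends on the platform and NumPy version, so only its kind is described. It also contrasts `np.array`, which copies by default, with `np.asarray`, which returns an existing ndarray unchanged when no conversion is needed.

```python
import numpy as np

int_arr = np.array([1, 2, 3])      # all ints -> an integer dtype
mixed_arr = np.array([6, 7.5, 8])  # any float present -> float64
print(int_arr.dtype)    # the platform's default integer type (commonly int64)
print(mixed_arr.dtype)  # float64

# Nested lists of equal length become a 2-D array
nested = np.array([[1, 2], [3, 4], [5, 6]])
print(nested.ndim, nested.shape)  # 2 (3, 2)

# np.array copies by default; np.asarray returns an existing ndarray as-is
copied = np.array(nested)
same = np.asarray(nested)
print(copied is nested, same is nested)  # False True
```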


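`np.zeros` and `np.ones` create arrays filled with 0s or 1s; pass a tuple as the shape to get a multidimensional array. `np.empty` allocates an array without initializing its values, so it may contain arbitrary leftover data rather than zeros: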

```python
print(np.zeros(10))
print(np.zeros((3, 6)))
# np.empty does not initialize memory: the values printed below are arbitrary
print(np.empty((2, 3, 2)))
```

```
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]]
[[[-1.28822975e-231 -1.28822975e-231]
  [ 2.41907520e-312  2.14321575e-312]
  [ 2.46151512e-312  2.31297541e-312]]

 [[ 2.35541533e-312  2.05833592e-312]
  [ 2.22809558e-312  2.56761491e-312]
  [ 2.48273508e-312  2.05833592e-312]]]
```
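A sketch of `np.ones` alongside a few related constructors from NumPy's standard API (`np.full`, `np.zeros_like`, `np.identity`); the shapes and fill value are arbitrary. Because `np.empty` returns uninitialized memory, the last lines fill the buffer explicitly before reading it.

```python
import numpy as np

ones = np.ones((2, 3))       # float64 array of ones
sevens = np.full((2, 2), 7)  # every element set to a given value
like = np.zeros_like(ones)   # zeros with the same shape and dtype as `ones`
ident = np.identity(3)       # 3x3 identity matrix (float64)

print(ones)
print(sevens)
print(like.shape, like.dtype)  # (2, 3) float64
print(ident)

# np.empty skips initialization; fill it before reading the values
buf = np.empty(4)
buf[:] = 0.5
print(buf)  # [0.5 0.5 0.5 0.5]
```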


`np.arange` is an array-valued version of Python's built-in `range` function:

```python
print(np.arange(15))
```

```
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
```
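Like `range`, `np.arange` also accepts start, stop, and step arguments, and unlike `range` it allows a float step. The values here are arbitrary. `np.linspace`, shown for comparison, takes the number of points instead of a step and includes the endpoint by default, which avoids relying on floating-point rounding to decide how many elements a float-step `arange` produces.

```python
import numpy as np

stepped = np.arange(2, 10, 3)      # start, stop, step, as with range
print(stepped)                     # [2 5 8]
print(list(range(2, 10, 3)))       # [2, 5, 8]

float_steps = np.arange(0, 1, 0.25)
print(float_steps)                 # values 0.0, 0.25, 0.5, 0.75 (stop excluded)

evenly_spaced = np.linspace(0, 1, 5)
print(evenly_spaced)               # values 0.0, 0.25, 0.5, 0.75, 1.0 (stop included)
```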


<b>2. Data Types for ndarrays</b>

The data type, or `dtype`, tells NumPy how to interpret the array's block of memory (for example, as 64-bit floats or 32-bit integers). You can specify it explicitly when creating an array:

```python
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr2 = np.array([1, 2, 3], dtype=np.int32)

print(arr1.dtype)
print(arr2.dtype)
```

```
float64
int32
```
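Dtypes can also be given as short string codes, and each dtype has a fixed size per element. A small sketch: `'f8'` and `'i4'` are NumPy's codes for an 8-byte float and a 4-byte signed integer, and `itemsize` reports the bytes per element.

```python
import numpy as np

# String type codes name the same dtypes as the np.* objects
print(np.dtype("f8") == np.float64)  # True: 'f8' = 8-byte float
print(np.dtype("i4") == np.int32)    # True: 'i4' = 4-byte signed int
coded = np.array([1, 2, 3], dtype="i4")
print(coded.dtype)                   # int32

# itemsize: bytes used by each element of that dtype
sizes = {np.dtype(dt).name: np.dtype(dt).itemsize
         for dt in (np.float64, np.float32, np.int32, np.int8, np.bool_)}
print(sizes)  # {'float64': 8, 'float32': 4, 'int32': 4, 'int8': 1, 'bool': 1}
```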


You can also cast an array from one data type to another with `astype`. Casting floats to integers truncates the decimal part rather than rounding:

```python
arr = np.array([1, 2, 3, 4, 5])
float_arr = arr.astype(np.float64)
print(float_arr.dtype)

arr_floats = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
# float -> int: the fractional part is dropped (e.g. 3.7 -> 3, -2.6 -> -2)
print(arr_floats.astype(np.int32))
```

```
float64
[ 3 -1 -2  0 12 10]
```
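Two further properties of `astype` in a short sketch with arbitrary values: it returns a new array by default (even when the dtype is unchanged), so modifying the result leaves the original intact, and it can parse strings that represent numbers.

```python
import numpy as np

arr = np.array([1, 2, 3])
same_type = arr.astype(arr.dtype)  # astype returns a new array by default
same_type[0] = 100
print(arr[0])                      # 1: the original is unchanged

# Strings that represent numbers can be converted with astype
numeric_strings = np.array(["1.25", "-9.6", "42"])
parsed = numeric_strings.astype(np.float64)
print(parsed)                      # values 1.25, -9.6, 42.0

# Float -> int casting truncates toward zero; it does not round
truncated = np.array([-2.6, 2.6]).astype(np.int32)
print(truncated)                   # [-2  2]
```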
