# Scientific Computing in Python

## Why Python? {.hide}

Python is widely used in scientific computing for several reasons:

- A wide range of high-quality scientific libraries
- Open-source nature of both the language and its libraries
- The language's accessibility and flexibility

## Scientific libraries {.hide}

Some of the more popular scientific libraries in Python include:

- NumPy
- SciPy
- Matplotlib
- Pandas
- Scikit-learn, TensorFlow, PyTorch, etc.

## Description of libraries {.hide .smaller-90}

- **NumPy** forms the foundation by providing a basic array data type (think of
  vectors and matrices) and functions for acting on these arrays (e.g., matrix
  multiplication).  
- **SciPy** builds on NumPy by adding the kinds of numerical methods that are
  routinely used in science (interpolation, optimization, root finding, etc.).  
- **Matplotlib** is used to generate figures, with a focus on plotting data stored in NumPy arrays.  
- **Pandas** provides types and functions for empirical work (e.g., manipulating data).  
- **Scikit-learn**, **TensorFlow**, **PyTorch**, etc., provide tools for machine learning and deep learning.
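
As a minimal sketch of the array type and matrix multiplication mentioned for NumPy above (the variable names are illustrative and not taken from the slides):

```python
import numpy as np

# A 2x2 matrix and a length-2 vector stored as NumPy arrays
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([1.0, 1.0])

# Matrix-vector multiplication with the @ operator
Av = A @ v          # array([3., 7.])

# Arithmetic acts on the whole array at once (elementwise here)
doubled = 2 * A
```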


## Why use these libraries?

- It's often better to use an existing routine than to write one from scratch.
- Established libraries typically have efficient implementations and greater accuracy, as they are written by experts and optimized by a large community of users.
- Pure Python is not inherently fast; these libraries speed up Python code execution because they are often implemented in lower-level languages like C or Fortran.
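
The accuracy point can be illustrated with a small sketch. It uses `math.fsum` from the standard library as a stand-in for an expert-written routine (an assumption made here for illustration; the slides do not name a specific function) and compares it with a naive hand-written accumulation loop:

```python
import math

values = [0.1] * 10

# Hand-written accumulation: rounding error builds up at each step
naive_total = 0.0
for v in values:
    naive_total += v

# Library routine that tracks the low-order bits lost by plain addition
accurate_total = math.fsum(values)

print(naive_total == 1.0)     # False
print(accurate_total == 1.0)  # True
```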

## Why not use pure Python? {.hide}

**Why not pure Python?**  Python is faster to write, less error-prone, and easier to debug than low-level languages. However:

- Higher-level languages like Python are optimized for human readability and ease of use rather than for execution speed.
- The standard implementation of Python (CPython) is slower than compiled languages.
- Python is harder to optimize into fast machine code compared to languages like C or Fortran.
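
One reason for the last point, offered here as background rather than taken from the slides, is dynamic typing: the interpreter only learns the operand types at run time, so the same line of code can require different machine instructions on each call. A minimal sketch with a hypothetical helper `add`:

```python
def add(a, b):
    # The meaning of + is resolved at run time from the operand types
    return a + b

print(add(1, 2))          # integer addition -> 3
print(add(1.5, 2.5))      # floating-point addition -> 4.0
print(add("ab", "cd"))    # string concatenation -> abcd
print(add([1], [2]))      # list concatenation -> [1, 2]
```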

## Vectorization {.hide .smaller-85}

**Vectorization** is a method used to speed up numerical applications by sending array processing operations in batch to pre-compiled and efficient native machine code, typically compiled from optimized C or Fortran.

In Python, **NumPy** is used for vectorization to achieve significant performance improvements.

:::{.callout-important}
Vectorization is not without disadvantages; it can be highly memory-intensive and may not be suitable for all algorithms.
:::
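
The memory cost can be seen in a small sketch. The expression `x**2` allocates a full temporary array before any summation happens. The chunked alternative below is one possible workaround, not something prescribed by the slides; `chunk_size` and the seeded generator `rng` are illustrative choices.

```python
import numpy as np

n = 1_000_000
rng = np.random.default_rng(0)   # seeded only to make the sketch reproducible
x = rng.uniform(0, 1, n)

# Fully vectorized: x**2 creates a temporary array as large as x
squared = x**2
print(squared.nbytes)            # 8000000 bytes (n float64 values)
total_vectorized = squared.sum()

# Chunked: vectorize within blocks so each temporary stays small
chunk_size = 100_000
total_chunked = 0.0
for start in range(0, n, chunk_size):
    block = x[start:start + chunk_size]
    total_chunked += np.sum(block**2)
```

The chunked version trades some speed for a smaller peak memory footprint, since each temporary holds only `chunk_size` values.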

## Native Python vs. Vectorized Implementation {.hide .smaller-75}

Comparing native Python code with a vectorized implementation using NumPy:

```python
import random
import numpy as np

n = 1_000_000
```

```python
%%time

y = 0      # Will accumulate and store sum
for i in range(n):
    x = random.uniform(0, 1)
    y += x**2
```

```
CPU times: user 256 ms, sys: 606 μs, total: 257 ms
Wall time: 255 ms
```


```python
%%time

x = np.random.uniform(0, 1, n)
y = np.sum(x**2)
```

```
CPU times: user 11.4 ms, sys: 9.24 ms, total: 20.6 ms
Wall time: 18.8 ms
```


Both versions square and then sum a large number of random draws.

The second code block breaks the loop down into three basic operations, each handled in **batch**:

1. draw `n` uniforms  
1. square them  
1. sum them
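
The three steps can be written out one per line. This is a minimal sketch: the seeded generator `rng` and the smaller `n` are choices made here for reproducibility and speed, not part of the timing comparison above.

```python
import math
import numpy as np

n = 100_000
rng = np.random.default_rng(1234)

draws = rng.uniform(0, 1, n)   # 1. draw n uniforms
squares = draws**2             # 2. square them
y = squares.sum()              # 3. sum them

# The same sum computed with an explicit Python loop over the same draws
y_loop = 0.0
for d in draws:
    y_loop += d**2

# Check that the two results agree up to floating-point rounding
print(math.isclose(y, y_loop, rel_tol=1e-9))
```

Because the expected value of the square of a uniform draw on [0, 1] is 1/3, `y / n` should be close to 1/3 for large `n`.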