# NumPy basics
## Data 765 tutoring

[NumPy](https://numpy.org/) is one of the central packages of the scientific Python ecosystem. NumPy implements accelerated array operations using [optimized, low level loops](https://chelseatroy.com/2018/11/07/code-mechanic-numpy-vectorization/) as well as [BLAS](https://numpy.org/devdocs/user/building.html) libraries.

NumPy is important because array operations are central in maths. Vectors and matrices are easy to represent as arrays. Basic formulas; such as averages, standard deviation, or effect size; are all implemented with arrays/vectors/matrices. Means are calculated by summing up the elements of an array and dividing by the length. Machine or statistical learning is also just vector and matrix operations. In other words, linear algebra is central to what we do, and fast array (i.e. vector or matrix) operations greatly improve our quality of life.

[Pandas](https://pandas.pydata.org/) is an in memory columnar data frame library that is built on NumPy. Understanding NumPy helps with pandas as well as the rest of Python's science ecosystem. You will often use NumPy directly in service to some goal you have with the other scientific libraries.

# Lists, arrays, primitives, and composites

Python is popular for both enthusiasts as well as casual programmers. For enthusiasts, Python is a clean language with great FFI support which allows easy interfacing to lower level languages such as C or Rust. For newbies, Python has, well, those same two benefits without casual programmers having to think about what that means. NumPy is one of the Python ecosystem's greatest boons, and the library relies on lower level code for speed.

But why?

Python follows the principle of least astonishment. Python reasonably tries to do what you expect it to do. `list`s store any object rather than being limited to a single type as with arrays and Vectors in Rust or C and C++. Rust and the like aren't inferior to Python; they have different goals and targets. Lower level languages are closer to the hardware which engenders great speed and power. However, that also means that a programmer must code closer to how a computer works as well.

Python is a higher level language which means that it is more abstracted from the hardware. That allows certain built in and automatically enabled features that would be undesirable in a lower level language. For example, Python's base integer is [arbitrarily precise](https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic) which means you can do this:

In [3]:
big_long = 10**10000
print(big_long)

1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

BigInts are not always beneficial as an immediately enabled feature. Other languages usually require a library that implements BigInts. Instead, those languages had integer types that are fixed in size. Rust has unsigned and signed 8 bit, 16 bit, 32 bit, 64 bit, and 128 bit [integers](https://doc.rust-lang.org/std/index.html#primitives). Thus, integers in Rust have a fixed minimum and maximum size just like Python's floats. Working with integers of those sizes are very efficient for CPUs. Arbitrary precision requires extra logic for every calculation which slows down operations.

_NumPy uses [primitives](https://numpy.org/doc/stable/user/basics.types.html) to speed up calculations._

Arrays in lower level languages are "limited" to a single type because they're a block of contiguous memory. Accessing each element is fast and cheap. CPUs have special instructions such as SIMD and MIMD to speed up operations, such as math, even more on blocks of data.