# Introduction to NumPy for Working with Numerical Data Arrays

<div class="alert alert-success">
    
## This notebook covers
- NumPy multidimensional array data structure
- NumPy data types
- Selecting, slicing, and querying dataframes
- Simple calculations with summary functions
- Sorting and grouping data
- Copying and renaming dataframe columns
- Handling missing values
- Merging dataframes and writing to file
</div>

<div class="alert alert-warning">

## Reminders

Remember, you can use Jupyter's built-in table of contents (hamburger on the far left) to jump from heading to heading.

---

This notebook will run in the MSUpy conda environment, which you created in the previous lesson. To select the Jupyter kernel for the MSUpy environment, go to the Kernel tab, select Change Kernel, then select the MSUpy kernel in the pop-up window.

---

To turn on line numbers for code cells, go to the View menu and click Show Line Numbers.

</div>

# I. Importing Necessary Packages

import numpy as np

# II. Introduction to the NumPy Data Structure

The NumPy (**Num**erical **Py**thon) package is widely used in science and engineering for working with data in homogeneous (same data type) multidimensional arrays. Compared with nested Python lists, NumPy multidimensional arrays improve speed, reduce memory consumption, and offer simpler syntax for a variety of common data processing tasks.
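To make the syntax difference concrete, here is a minimal sketch comparing a Python list with a NumPy array. The temperature values and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical temperature readings in degrees Celsius (illustrative values)
temps_c_list = [12.5, 14.0, 9.8, 11.2]

# With a plain Python list, elementwise math needs a loop or comprehension
temps_f_list = [t * 9 / 5 + 32 for t in temps_c_list]

# With a NumPy array, the same arithmetic is applied to every element at once
temps_c = np.array(temps_c_list)
temps_f = temps_c * 9 / 5 + 32

print(temps_f_list)
print(temps_f)
```

Both approaches give the same numbers; the array version is shorter and avoids an explicit Python-level loop.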

The NumPy data structure is called an *ndarray*, for n-dimensional array, which we may refer to throughout this notebook simply as an "array". Below is a schematic of a 1-d, 2-d, and 3-d array. A 1-d array looks similar to a list or a single row of data values. A 2-d array looks similar to a grid or table with rows and columns. A 3-d array looks similar to a cube or cuboid. Each dimension in an ndarray is called an *axis*, and the first axis is always axis zero, consistent with Python's zero-based indexing. NumPy arrays must also be rectangular, not jagged (i.e., each row of a two-dimensional array must have the same number of columns), and all of the data values in an array must be of the same data type. You'll notice below that, like pandas DataFrames, NumPy arrays have a `shape` attribute, where each number in the shape tuple is the length of one array axis.

<img src="images/numpy_arrays.png" alt="schematic of ndarrays" width="700"/>
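As a rough sketch of the three cases in the schematic, the cell below builds a 1-d, 2-d, and 3-d array and prints the number of axes (`ndim`) and the `shape` of each. The array names and values are arbitrary.

```python
import numpy as np

a1 = np.array([1, 2, 3, 4])            # 1-d: a single row of values
a2 = np.array([[1, 2, 3],
               [4, 5, 6]])             # 2-d: 2 rows (axis 0) by 3 columns (axis 1)
a3 = np.zeros((2, 3, 4))               # 3-d: a stack of 2 grids, each 3 rows by 4 columns

# Print the number of axes and the length of each axis
for arr in (a1, a2, a3):
    print(arr.ndim, arr.shape)
```

Note that the shape of a 1-d array is a one-element tuple such as `(4,)`, not a bare number.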

It could be helpful to think of array dimensionality in terms of the data your arrays could hold (see image below). A geoscientist, for example, may use a 1-d array to hold solar radiation observations over time at a single location (a timeseries). In this case the only axis (axis 0) would represent data values at different times. A 2-d array could hold spatially gridded data, such as a map of annual mean temperature or land surface imagery with 1-meter pixel resolution. In this case axis 0 (rows) could represent data values at different latitudes and axis 1 (columns) could represent data values at different longitudes. A 3-d array could hold a timeseries of spatially gridded data, like daily gridded maximum temperature. In this case axis 0 (the stack in the image below) could represent different times, and axes 1 and 2 could represent different latitudes and longitudes.

<img src="images/numpy_geoarrays.png" alt="ndarray data examples" width="700"/>
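The geoscience examples above could be sketched as follows. The data here are random numbers standing in for real observations, and the grid sizes and variable names are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)    # random placeholder data, not real observations

n_times, n_lats, n_lons = 365, 4, 5    # hypothetical sizes

# 1-d timeseries: axis 0 is time
solar = rng.uniform(0, 300, size=n_times)

# 2-d grid: axis 0 is latitude, axis 1 is longitude
annual_mean_temp = rng.normal(15, 5, size=(n_lats, n_lons))

# 3-d timeseries of grids: axis 0 is time, axes 1 and 2 are latitude and longitude
daily_tmax = rng.normal(20, 8, size=(n_times, n_lats, n_lons))

# Averaging over axis 0 (time) collapses the stack into one lat-lon grid
time_mean = daily_tmax.mean(axis=0)

# Averaging over axes 1 and 2 (space) leaves one value per time step
domain_mean = daily_tmax.mean(axis=(1, 2))

print(solar.shape, annual_mean_temp.shape, daily_tmax.shape)
print(time_mean.shape, domain_mean.shape)
```

Keeping track of which axis means what is up to you; NumPy itself only knows the axis numbers.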

The above are examples of array data you may encounter frequently, but actually NumPy arrays can have as many axes as you can dream up and each axis can represent whatever you want it to. In this notebook though, we'll think of 1-d arrays mostly as timeseries, 2-d arrays as latitude-longitude grids, and 3-d arrays as timeseries of latitude-longitude grids. 

We won't get into higher-dimensional data arrays, and we'll only work with numerical data arrays (integer, float, boolean). Technically, NumPy arrays can contain numerical or non-numerical data (such as strings and bytes), but non-numerical data must be defined as fixed-width data types, which requires knowing or calculating the size of the longest text or byte sequence in advance. We will not cover non-numerical NumPy arrays in this course.
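As a brief aside, the short sketch below shows what "fixed-width" means in practice for a string array. The site names are hypothetical.

```python
import numpy as np

# Hypothetical site names; the longest one ("Bozeman") has 7 characters
sites = np.array(["Bozeman", "Butte", "Helena"])

# NumPy picks a fixed-width Unicode type sized to the longest string
print(sites.dtype)

# A longer string assigned later is silently cut to the fixed width
sites[1] = "Missoula, MT"
print(sites[1])
```

The silent truncation is one reason text data is usually easier to handle with other tools.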

# III. NumPy Data Types

We'll cover a few of the numerical NumPy data types here. NumPy supports a much greater variety of numerical data types than core Python does. NumPy numerical data types are also more specific than the core Python numerical data types, allowing for more efficient memory usage and faster computation. More detail can be found in the [NumPy Documentation for Data Types](https://numpy.org/doc/stable/user/basics.types.html#data-types).
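A minimal sketch of how data types show up in practice: every array has a `dtype` attribute, and you can request a specific type when creating an array. The variable names are illustrative, and the default integer type depends on your platform and NumPy version.

```python
import numpy as np

counts = np.array([1, 2, 3])             # integers: platform default integer type
values = np.array([1.0, 2.5, 3.0])       # floats: float64 by default
flags = np.array([True, False, True])    # booleans

# Request a smaller integer type explicitly with the dtype argument
small = np.array([1, 2, 3], dtype=np.int8)

for arr in (counts, values, flags, small):
    print(arr.dtype)
```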

The NumPy data types below are *concrete* types, meaning that a fixed number of bits is reserved in memory for each individual data value. A bit is the smallest unit of data in a computer (represented by a 0 or a 1), and 8 bits make up 1 byte, which is enough to store, for example, one basic text character such as a letter or digit. The concrete data types below are named by combining the basic numerical type name (e.g., integer, float) with the number of bits needed to represent a single value in memory (the bit size).
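One way to check the naming convention yourself is to look at `itemsize`, the number of bytes each value occupies, and multiply by 8. This is only a small sketch; the dictionary name is arbitrary.

```python
import numpy as np

# Map each data type name to the number of bits it reserves per value
bits_per_value = {}
for dtype in (np.int8, np.int16, np.float32, np.float64):
    dt = np.dtype(dtype)
    bits_per_value[dt.name] = dt.itemsize * 8   # itemsize is in bytes

for name, bits in bits_per_value.items():
    print(name, bits)
```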

- `np.bool_`: 8-bit Boolean (True/False, 1/0); also spelled `np.bool` in NumPy 2.0 and later
- `np.int8`: 8-bit integer (-128 to 127)
- `np.int16`: 16-bit integer (-32768 to 32767)
- `np.int32`: 32-bit integer (-2147483648 to 2147483647)
- `np.int64`: 64-bit integer (-9223372036854775808 to 9223372036854775807)
- `np.uint8`: Unsigned 8-bit integer (0 to 255)
- `np.uint16`: Unsigned 16-bit integer (0 to 65535)
- `np.uint32`: Unsigned 32-bit integer (0 to 4294967295)
- `np.uint64`: Unsigned 64-bit integer (0 to 18446744073709551615)
- `np.float16`: Half-precision float
- `np.float32`: Single-precision float
- `np.float64`: Double-precision float
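Rather than memorizing these ranges, you can look them up with `np.iinfo` (integers) and `np.finfo` (floats). The sketch below collects a few of them; the dictionary names are arbitrary.

```python
import numpy as np

# Smallest and largest value each integer type can hold
int_ranges = {}
for dtype in (np.int8, np.int16, np.uint8, np.uint16):
    info = np.iinfo(dtype)
    int_ranges[np.dtype(dtype).name] = (info.min, info.max)

# Approximate number of reliable decimal digits for each float type
float_digits = {}
for dtype in (np.float16, np.float32, np.float64):
    float_digits[np.dtype(dtype).name] = np.finfo(dtype).precision

print(int_ranges)
print(float_digits)
```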





*When you start working with larger array objects, you may need to be more aware of how much memory your objects are consuming. You may want to convert your data to a type that consumes less memory.*
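As a minimal sketch of such a conversion, the cell below uses `astype` to make a lower-precision copy of an array and compares memory use with the `nbytes` attribute. The array size and names are arbitrary, and whether the reduced precision is acceptable depends on your data.

```python
import numpy as np

# A hypothetical 1000 x 1000 grid stored as 64-bit floats
grid64 = np.ones((1000, 1000), dtype=np.float64)

# astype returns a new array with the requested data type
grid32 = grid64.astype(np.float32)

# nbytes is the total memory used by the array's data, in bytes
print(grid64.nbytes)
print(grid32.nbytes)
```

Halving the bit size halves the memory, at the cost of fewer significant digits per value.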