# Working with Numpy Arrays

In this notebook we'll cover reading data from a file into a Numpy array, separating parts of the array into their own variables, and reshaping the result. We'll also use some of the basic statistical functions and methods available for float and integer arrays, and find where the maximum and minimum values occur within an array.

## Reading in data to Python
It is hard to give a single recipe for reading data from a file into a Python program, simply because there are so many ways to do it! Depending on what the data is and what you want to do with it, different methods will suit. The Numpy module provides some simple read functions, and we are going to use one here. In the future we will use read functionality from a package called Pandas (https://pandas.pydata.org), which is in some ways built on top of the Numpy module and brings with it a lot of additional functionality. In this example we'll read in an ASCII (plain text) file. We can then use the data to make calculations or plots that answer scientific questions! Specifically, the data we are reading in is the North Atlantic Oscillation (NAO) data from 1950 to 2024.

Source: https://www.cpc.ncep.noaa.gov/products/precip/CWlink/pna/nao.shtml

In [None]:
import numpy as np

In [None]:
import os
data = np.loadtxt("data/nao.txt", skiprows=2)

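In the command above you are using a function (```loadtxt```) from the Numpy module (```np```) to read in a file (`nao.txt`). The `skiprows=2` argument tells the function to skip the first two lines of the file before it starts parsing numbers.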
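
As a minimal sketch of what `skiprows` does, the cell below reads a small in-memory stand-in for a text file. The two header lines and the two-column layout are assumptions made for illustration; they are not copied from `nao.txt`.

```python
import io
import numpy as np

# Hypothetical file contents: two header lines followed by two numeric columns.
fake_file = io.StringIO(
    "Example header line 1\n"
    "Example header line 2\n"
    "1950.00  0.50\n"
    "1950.08 -0.25\n"
    "1950.17  1.00\n"
)

# skiprows=2 ignores the two header lines; the rest is split on whitespace.
sample = np.loadtxt(fake_file, skiprows=2)
print(sample.shape)
```
Output:
```
(3, 2)
```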

How can you find out more about this function? There are many ways, but two of the easiest are to 1) search the web for `numpy loadtxt` or 2) type ```np.loadtxt?``` in a new cell below. The second option brings up the function's documentation from the Numpy module in a pane at the bottom of your browser window. You can also put your cursor in the function name and press Shift+Tab to get a pop-up that contains the documentation for that particular function.

In [None]:
# So what does the data look like?


In [None]:
# What is the shape of this array that we have read in?


## Separating Columns
With multidimensional arrays, we can isolate individual columns and rows into separate arrays. This can be useful when part of the data is not needed for your analysis, and it also lets you work with each column or row in a different way.
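
A short sketch of column slicing on a made-up array follows. The array `table` and the names `times` and `values` are hypothetical and only illustrate the indexing syntax.

```python
import numpy as np

# Hypothetical stand-in: column 0 holds a time value, column 1 a measurement.
table = np.array([[1950.00, 0.5],
                  [1950.08, -0.25],
                  [1950.17, 1.0]])

# A colon selects every row; the integer selects a single column.
times = table[:, 0]
values = table[:, 1]

print(times.shape, values.shape)
```
Output:
```
(3,) (3,)
```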

In [None]:
# Get the NAO data in its own array separate from the month column.
nao_values = 

In [None]:
# Get the months/year data in its own array separate from the NAO data.


## Reshape
We've got a single column of data, but the data has more inherent organization (for example, twelve months per year) that may be better represented as a table. We can reshape an array to match that known structure.
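
The idea can be sketched with a made-up series of 24 values standing in for two years of monthly data. The variable names here are hypothetical.

```python
import numpy as np

# Hypothetical monthly series: 24 values standing in for two years of data.
monthly = np.arange(24, dtype=float)

# One row per year, one column per month. The sizes must match (2 * 12 == 24).
by_year = monthly.reshape(2, 12)

# Passing -1 asks Numpy to infer that dimension from the array size.
by_year_inferred = monthly.reshape(-1, 12)

print(by_year.shape)
print(by_year[1, 0])  # first month of the second year
```
Output:
```
(2, 12)
12.0
```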


In [None]:
nao_five_decades = 

In [None]:
nao_by_year = 


## Common Functions in Numpy
Do you ever compute averages? Maximums? Minimums?

Numpy can help!

Statistical Functions: http://docs.scipy.org/doc/numpy/reference/routines.statistics.html
* `max`
* `min`
* `std`
* `median`
* `mean`
* `average`
* `nanmean`
* `corrcoef`

A number of these functions can also be called as methods on a Numpy data object.

```python
jan_avg = np.mean(data[::12, 1])
jan_avg_method = data[::12, 1].mean()
print(jan_avg)
print(jan_avg_method)
```
Output:
```
0.12416666666666669
0.12416666666666669
```
Note: The `average` function actually computes a weighted average. The default weight is 1 for all observations, but the function gives you the option of changing the weights. The `mean` function computes the plain arithmetic mean of the array along an axis and does not accept weights.
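
To make the difference concrete, here is a small sketch with made-up values and weights (both hypothetical).

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0])
weights = np.array([3.0, 1.0, 1.0])  # hypothetical weights

plain_mean = np.mean(values)                      # (1 + 2 + 3) / 3
unweighted = np.average(values)                   # same as mean when no weights are given
weighted = np.average(values, weights=weights)    # (3*1 + 1*2 + 1*3) / (3 + 1 + 1)

print(plain_mean)
print(unweighted)
print(weighted)
```
Output:
```
2.0
2.0
1.6
```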

Each of these Numpy functions can be computed over the entire array to return a single value, or over a particular axis to return one value for each row or column, depending on which axis was chosen. To do this, specify the keyword argument `axis` and set it to the desired axis number (e.g., 0, 1, 2, etc.).
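
A minimal sketch of the `axis` keyword on a made-up 2 x 3 table (the array `grid` is hypothetical):

```python
import numpy as np

# Hypothetical table: 2 rows by 3 columns.
grid = np.array([[1.0, 2.0, 3.0],
                 [3.0, 4.0, 5.0]])

overall = grid.mean()            # single value over every element
per_column = grid.mean(axis=0)   # collapses the rows: one value per column
per_row = grid.mean(axis=1)      # collapses the columns: one value per row

print(overall)
print(per_column)
print(per_row)
```
Output:
```
3.0
[2. 3. 4.]
[2. 4.]
```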

In [None]:
# Let's compute the average value of NAO for the month of January during the 1950s.
# January is column index 0 (we want all rows from the first column)
jan_avg = 

In [None]:
# June Average NAO for the 1950s


In [None]:
# Average over all of the months for the 1950s?

# Advanced: Average of June-July-August range for all years.

In [None]:
# Now let's compute the max NAO value from the whole dataset
max_nao_value = 

In [None]:
# Minimum NAO value from the whole dataset


## Finding Values in an Array

Sometimes you want an efficient way to search an array for a particular value or values. The `np.where` function takes a logical condition and returns the indices where that condition is True. The returned value is a tuple of arrays (one per dimension), so getting at the individual index values takes one extra indexing step.

```python
a = np.array([-2, -1, 0, 1, 2])
print(np.where(a <= -1))
```
Output:
```
(array([0, 1]),)
```
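
As a sketch of that extra step, the cell below pulls the index array out of the tuple and then uses it to look up values. It reuses the small array `a` from above; the other names are hypothetical.

```python
import numpy as np

a = np.array([-2, -1, 0, 1, 2])

# For a 1-D array the tuple has a single entry, so take element 0.
indices = np.where(a <= -1)[0]
print(indices)
print(a[indices])

# Two ways to locate the largest value in a 1-D array.
max_index_where = np.where(a == a.max())[0]
max_index_arg = a.argmax()
print(max_index_where, max_index_arg)
```
Output:
```
[0 1]
[-2 -1]
[4] 4
```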

In [None]:
# So when did the maximum value occur?


## Exercise 1
Find the minimum NAO that occurs between 1990-2010 and when it occurs.

In [None]:
# Find the minimum that occurs between 1990 - 2010 and where it occurs


## Exercise 2
Use the `np.where` function to find all of the elements where either of the following conditions is true:  
https://numpy.org/doc/stable/reference/generated/numpy.where.html  

NAO >= 1.25

NAO <= -1.25

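Note: The Python `and` and `or` keywords will not work in this case, because they cannot be applied element by element to arrays. To combine two inequalities on Numpy arrays we can exploit arithmetic on boolean arrays, where multiplication behaves like a logical AND and addition behaves like a logical OR.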

`True * True = True`

`True * False = False`

`False * False = False`

`True + True = True`

`True + False = True`

`False + False = False`
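
The rules above can be sketched on a small made-up array. The values and thresholds here are hypothetical and are not the ones from the exercise. The `|` and `&` operators are the element-wise operator forms of OR and AND for boolean arrays.

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # hypothetical values

high = x >= 1.0
low = x <= -1.0

# Adding boolean arrays acts as an element-wise OR.
either_idx = np.where(high + low)[0]

# The | operator gives the same result; the parentheses are required
# because | binds more tightly than the comparison operators.
either_idx_op = np.where((x >= 1.0) | (x <= -1.0))[0]

# Multiplying boolean arrays acts as an element-wise AND.
inside_idx = np.where((x > -1.0) * (x < 1.0))[0]

print(either_idx)
print(either_idx_op)
print(inside_idx)
```
Output:
```
[0 4]
[0 4]
[1 2 3]
```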

In [None]:
# Multiple Conditions
# Find all index values where NAO is >= 1.25 or <= -1.25
