# Lesson 3: Arrays

## Content

### Poll Review
* Lists and Dictionaries

## Libraries

## Data structure: Arrays
Arrays are the new data structure of this lesson.  They are similar to lists in that they represent a group of values.  Arrays are also similar to lists because you reference specific values using an index.  The main difference between lists and arrays is that arrays can have multiple dimensions.

Still, if your array only has 1 dimension (like a list), why not just use a list?  Because:
* `numpy` arrays are heavily optimized for speed
* `numpy` arrays have lots of built-in functions that let you do more with your data than a list does

In the end the choice between an array and a list depends on what kind of data you have and what you want to do with it.  There is also an element of personal preference.
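
As a small illustration of the built-in functionality, here is a sketch that runs the same calculation on a list and on an array.  The values and the variable names (`temps_list`, `temps_array`) are made up for this example and are not part of the lesson data.

```python
import numpy as np

# Hypothetical example values (not from the lesson data)
temps_list = [10.0, 20.0, 30.0]
temps_array = np.array(temps_list)

# With a list, doing math on every value needs a loop (here a list comprehension)
doubled_list = [t * 2 for t in temps_list]

# With an array, the math applies to every value at once
doubled_array = temps_array * 2

# Arrays also come with built-in calculations such as the mean
mean_value = temps_array.mean()

print(doubled_list)   # [20.0, 40.0, 60.0]
print(doubled_array)  # [20. 40. 60.]
print(mean_value)     # 20.0
```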

<img src="../images/L3_numpy-arrays.png" width=550>

_Image amended from [Indian AI Production](https://indianaiproduction.com/python-numpy-tutorial/)._

## `numpy` library
`numpy` is the standard Python library for dealing with arrays and matrices.  A lot of time and energy has gone into making `numpy` fast.

`conda install -n lectures numpy`

In [4]:
import numpy as np

### Side note: `xarray`
`xarray` is another array/matrix library that has gained a LOT of traction among scientists.  It is a great library and it is developed by a group of scientists in the earthdata community so it often works really well for earth science applications.

There are two big problems with `xarray` right now:
* It doesn't have good spatial referencing.  This means that there isn't good tooling for people to, for example, pick a data point based on latitude and longitude
* There isn't a great way to read `xarray` data from the cloud

These are both problems that are being actively worked on by members of the community and will, I think, be resolved in the near future.  Until then, though, and for the sake of stability, this summer we are focused on `numpy`.  If you are interested in trying `xarray`, certainly go for it or talk to me.

## Creating `numpy` arrays

Because arrays are so similar to lists, one common way to make an array is by starting with a list.

We use `np.array()` to convert a list to an array.  In that line, `np.` tells Python which library we are referencing.  Here `np` means `numpy`: it is the name we told Python to use for `numpy` when we imported it with `import numpy as np`.

In [5]:
reflectance_list = [25, 65, 43, 13, 54]

In [6]:
reflectance_array = np.array([25, 65, 43, 13, 54])

In [25]:
reflectance_array

array([25, 65, 43, 13, 54])

Nothing very special so far.

What we get with the array, though, is some extra **properties** (information) and **methods** (actions).

Some useful properties of an array are:
- **dtype**: the data type of the array (Ex. integer, float, etc.)
- **shape**: the number of items along each dimension

Properties are accessed using a period `.` after the name of the object:
* Ex. `my_array.property`

In [27]:
reflectance_array.dtype

dtype('int64')

In [26]:
reflectance_array.shape

(5,)
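
The `dtype` depends on the values you put in.  As a rough sketch (the values and the names `float_array` and `mixed_array` are made up for illustration), decimals produce a float array, and mixing integers with decimals does too:

```python
import numpy as np

# Hypothetical example values
float_array = np.array([0.25, 0.65, 0.43])

print(float_array.dtype)  # float64
print(float_array.shape)  # (3,)

# Mixing integers and decimals gives a float array
mixed_array = np.array([1, 2.5, 3])
print(mixed_array.dtype)  # float64
```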

We will look at some useful methods in a bit.

### 👀 Notice
We have seen both properties and methods before for lists.  Now that we have seen them in multiple places, you might start noticing a pattern.  
* You access properties of an object with `data_structure.property`.  They **tell** you something about the object.
* You access methods of an object with `data_structure.method()`.  They **do** something to the object.

_Starting Structures_
* `list_ex = ['co','co2','ch4']`
* `array_ex = np.array([6.5, 7.3, 2.9])`

_Examples of properties_

* `print(array_ex.shape)  # returns (3,)`

_Examples of methods_

* `list_ex.append('o3')`
* `array_ex.fill(3)`
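
Putting the pattern together, a minimal runnable sketch with the starting structures might look like this.  It uses `fill`, an array method that overwrites every value in place.

```python
import numpy as np

list_ex = ['co', 'co2', 'ch4']
array_ex = np.array([6.5, 7.3, 2.9])

# Property: no parentheses, it tells you something about the object
print(array_ex.shape)  # (3,)

# Methods: parentheses, they do something to the object
list_ex.append('o3')   # adds an item to the end of the list
array_ex.fill(3)       # replaces every value in the array with 3

print(list_ex)   # ['co', 'co2', 'ch4', 'o3']
print(array_ex)  # [3. 3. 3.]
```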

## Array Indexes

You can access elements of the array by index, just like with lists.

In [7]:
# Return the 5th value in the array
reflectance_array[4]

54

With a one-dimensional array this looks exactly the same as a list.  With higher-dimensional arrays there is a little more to keep track of.
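
To see the similarity side by side, here is a short sketch with made-up values (`values_list` and `values_array` are names chosen just for this example):

```python
import numpy as np

# Hypothetical example values
values_list = [10, 20, 30, 40]
values_array = np.array(values_list)

# The same index picks out the same value in both
print(values_list[1], values_array[1])    # 20 20

# Negative indexes count from the end, just like lists
print(values_list[-1], values_array[-1])  # 40 40

# Slices work too, and return a smaller array
print(values_array[1:3])                  # [20 30]
```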

### 📝 Checking In
Print out the shape of the array

Print the value of the 3rd item in the array

## More than one dimension
The array we made so far is a one-dimensional array.  Very often when working with arrays you will have two or more dimensions.  You can create arrays like this by putting lists inside of lists, or **nesting** them.

In [19]:
nested_list = [[1,2,3], [4,5,6]]
array_2d = np.array(nested_list)

In [21]:
print(array_2d.shape)

(2, 3)


One thing that you have to start keeping track of once your arrays have more than one dimension is their **axes**.  Each dimension is assigned an axis number, starting from 0, to keep track of it.

<img src="https://codeguru.academy/wp-content/uploads/2020/11/numpy-intro-01.png" width=450>

_Image from [Code Guru](https://codeguru.academy/?p=335)_

You might often hear/read arrays called `ndarray`s.  This is short for an n-dimensional array, which refers to the fact that it can have any, or "n", number of dimensions.
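
As a quick sketch of what "n" dimensions means in practice, the example below builds arrays with 1, 2, and 3 dimensions from nested lists.  It uses the `ndim` property, which is not covered elsewhere in this lesson; it reports the number of dimensions (axes).

```python
import numpy as np

one_d = np.array([1, 2, 3])
two_d = np.array([[1, 2, 3], [4, 5, 6]])
three_d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# ndim is the number of axes; shape is the number of items along each axis
for arr in (one_d, two_d, three_d):
    print(arr.ndim, arr.shape)
# 1 (3,)
# 2 (2, 3)
# 3 (2, 2, 2)
```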

In [16]:
reflectance_list = [[1, 2, 3], [4, 5, 6]]

In [17]:
reflectance_array = np.array(reflectance_list)

|   |   |   |
|---|---|---|
| 1  | 2  |3   |
|  4 |  5 | 6 |

### Indexing multi-dimensional arrays

In [18]:
reflectance_array[1,2]

6
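
The index order follows the axes: the row index (axis 0) comes first, then the column index (axis 1).  The sketch below repeats the lookup above and adds two common variations, selecting a whole row and a whole column.

```python
import numpy as np

reflectance_array = np.array([[1, 2, 3], [4, 5, 6]])

# One value: row index first (axis 0), then column index (axis 1)
print(reflectance_array[1, 2])  # 6

# A whole row
print(reflectance_array[0])     # [1 2 3]

# A whole column: ":" means "everything along this axis"
print(reflectance_array[:, 1])  # [2 5]
```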

## Aggregations

The NumPy beginner's guide has really nice visuals on this: https://numpy.org/doc/stable/user/absolute_beginners.html

In [21]:
reflectance_array.sum()

288

In [24]:
reflectance_array.max()

65

Keeping track of axes: passing `axis=0` aggregates down the rows, so you get one result per column.

In [22]:
reflectance_array.sum(axis=0)

array([ 50, 130, 108])

In [25]:
reflectance_array.max(axis=0)

array([25, 65, 54])
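
To make the effect of `axis` easier to check by hand, here is a sketch on a small array with made-up values (`small_array` is a name chosen for this example).  With `axis=0` you get one result per column; with `axis=1` you get one result per row.

```python
import numpy as np

small_array = np.array([[1, 2, 3], [4, 5, 6]])

print(small_array.sum())        # 21 (every value added together)
print(small_array.sum(axis=0))  # [5 7 9] (one sum per column)
print(small_array.sum(axis=1))  # [ 6 15] (one sum per row)
print(small_array.max(axis=1))  # [3 6] (largest value in each row)
```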

## Adding/subtracting arrays

Once we start getting into manipulating the values of matrices we are approaching some of the principles of linear algebra.  If you have taken linear algebra before then you'll be right at home.  If not, don't worry too much.  The operations that you are most likely to use don't get too deep.

The most important thing to remember when you start adding, subtracting, or multiplying arrays is that the shape of your array is important.  So if you have a 3x2 array and you want to add some values, you need to add another array that has the shape 3x2 OR add a single value to all of the values in that array.  This second option is called "broadcasting".  Broadcasting also works with a one-dimensional array whose length matches the last dimension of the larger array: it is applied to every row.

In [29]:
reflectance_array

array([[25, 65, 54],
       [22, 35, 12]])

In [28]:
reflectance_array + 1

array([[26, 66, 55],
       [23, 36, 13]])

In [30]:
reflectance_array + np.array([1,2,3])

array([[26, 67, 57],
       [23, 37, 15]])
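
The sketch below shows the other two cases: adding an array of exactly the same shape, and what happens when the shapes don't line up.  The names `grid`, `same_shape`, and `wrong_shape` are made up for this example.

```python
import numpy as np

grid = np.array([[25, 65, 54], [22, 35, 12]])   # shape (2, 3)

# Same shape: values are added position by position
same_shape = np.array([[1, 1, 1], [2, 2, 2]])   # shape (2, 3)
print(grid + same_shape)
# [[26 66 55]
#  [24 37 14]]

# Shapes that don't line up raise an error instead of guessing
wrong_shape = np.array([1, 2])                  # shape (2,)
try:
    grid + wrong_shape
except ValueError as err:
    print("Could not add:", err)
```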

## Array Methods (Doing more stuff to arrays)

https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html

In [7]:
array = np.array([[1,2,3], [4,5,6]])

In [5]:
array.fill(3)

In [8]:
array

array([[3, 3, 3],
       [3, 3, 3]])

## Intermediate Addon

* unique items and counts
* adding dimensions
* conceptual detail -- numpy is faster because most of its core is written in C, compiled code that Python calls into (tools like Cython are one way to write such C "bindings" for Python).  I have never delved into that world but people do it.
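
A rough sketch of the first two add-on topics, using made-up values (`labels` is a name chosen for this example).  It relies on `np.unique` with `return_counts=True` and on `np.newaxis`, neither of which is covered in the main lesson.

```python
import numpy as np

# Hypothetical example values
labels = np.array([3, 1, 3, 2, 1, 3])

# Unique items and how many times each one appears
unique_values, counts = np.unique(labels, return_counts=True)
print(unique_values)  # [1 2 3]
print(counts)         # [2 1 3]

# Adding a dimension: turn a shape (6,) array into shape (1, 6)
with_new_axis = labels[np.newaxis, :]
print(with_new_axis.shape)  # (1, 6)
```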

## Potential Practice Problems
Looking at docs to create an array of zeros. https://numpy.org/doc/stable/user/quickstart.html