# Arrays

---

As we saw with built-in types, some of Python's default behavior is surprising from a science/research point of view. Multiplying a string, for example, repeats it:



In [7]:
s = 'a string'

print(s*4)

a stringa stringa stringa string


Similar oddities happen with data structures as well.

In [8]:
a = [1,2,3]
b = [4,5,6]
c = a+b
print(c)

[1, 2, 3, 4, 5, 6]


In [9]:
a.append(b)
a

[1, 2, 3, [4, 5, 6]]
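
The difference between `+`, `*`, `append`, and `extend` on plain lists can be summarized in a minimal sketch. It uses only built-in Python, and the variable names are illustrative rather than taken from the cells above.

```python
# Plain Python lists: '+' and '*' concatenate/repeat rather than do arithmetic
a = [1, 2, 3]
b = [4, 5, 6]

joined = a + b        # concatenation, not element-wise addition
repeated = a * 2      # repetition, not element-wise multiplication

nested = list(a)      # work on a copy so 'a' itself is left untouched
nested.append(b)      # append adds b as a single (nested) element

flat = list(a)
flat.extend(b)        # extend adds the elements of b one by one

print(joined)    # [1, 2, 3, 4, 5, 6]
print(repeated)  # [1, 2, 3, 1, 2, 3]
print(nested)    # [1, 2, 3, [4, 5, 6]]
print(flat)      # [1, 2, 3, 4, 5, 6]
```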

### Scientific Data Structures

Within the Python ecosystem, several packages have been developed for numerical and scientific computing. These packages provide scientific data structures that behave as vectors or matrices, include fast built-in functions for arithmetic, vector, and matrix operations, and work seamlessly with other analysis packages (e.g., SciPy).

#### Data Structure Packages

- [Numpy](https://numpy.org/), Python's fundamental package for scientific computing.
    - The standard for working with numerical data in Python
    - Store data as an ordered vector/series or matrix/array
    - Efficiently manipulate data and perform mathematical operations
    - Multi-dimensional arrays
    - Import/export data
    - [Documentation](https://numpy.org/doc/stable/)
- [Pandas](https://pandas.pydata.org/)
    - Excellent for tabular, relational, or labeled data (e.g., data stored as columns, different observations/instruments)
    - Store data in a Series (1-dimension) or DataFrame (2-dimensional)
    - Rapidly analyze and visualize data
    - Import/export data
    - Built on Numpy
    - [Documentation](https://pandas.pydata.org/docs/user_guide/index.html)
    
Several other packages exist which also have built-in data structures (typically built off Numpy). The data structures in these packages often have additional utility that does not exist in Numpy or Pandas (e.g., metadata). Some of these include:

- [Astropy](https://docs.astropy.org/en/stable/index.html)
- [Sunpy](https://docs.sunpy.org/en/stable/guide/data_types/index.html)
- [Spacepy](https://spacepy.github.io/datamodel.html)
- [xarray](https://docs.xarray.dev/en/stable/user-guide/data-structures.html)
    
#### Tutorials

- Numpy
    - [Numpy Quick Start](https://numpy.org/doc/stable/user/quickstart.html)
    - [Problem Solving With Python - Numpy and Arrays](https://problemsolvingwithpython.com/05-NumPy-and-Arrays/05.00-Introduction/)
    - [Earth Lab - Numpy Arrays](https://problemsolvingwithpython.com/05-NumPy-and-Arrays/05.00-Introduction/)
    - [Scipy Lecture Notes - Numpy](http://scipy-lectures.org/intro/numpy/index.html)
    - [NASA Advanced Software Technology Group - Intro to Numpy](https://colab.research.google.com/github/astg606/py_materials/blob/master/numpy/introduction_numpy.ipynb)
- Pandas
    - [Pandas - 10 Minutes to Pandas](https://pandas.pydata.org/docs/user_guide/10min.html)
    - [Earth Lab - Pandas DataFrames](https://www.earthdatascience.org/courses/intro-to-earth-data-science/scientific-data-structures-python/pandas-dataframes/)
    - [NASA Advanced Software Technology Group - Intro to Pandas](https://colab.research.google.com/github/astg606/py_materials/blob/master/pandas/introduction_pandas.ipynb)
    
----

### Numpy

Numpy addresses some of the limitations in Python regarding data types, efficiency, representation of matrices, matrix manipulation, and linear algebra.

- The official [Numpy documentation](http://numpy.org/) 
- List of the [data types](https://numpy.org/doc/stable/user/basics.types.html) in Numpy
- Numpy for [IDL users](http://mathesaurus.sourceforge.net/idl-numpy.html)
- Numpy for [Matlab Users](https://numpy.org/doc/stable/user/numpy-for-matlab-users.html)
- Numpy for [R and S-Plus users](http://mathesaurus.sourceforge.net/r-numpy.html)

#### Creating arrays with Numpy

In [15]:
import numpy as np  # access all of numpy through the 'np' prefix

# 'from numpy import *' works too;
# however, this is bad coding style
# as it pollutes your namespace and
# can silently overwrite existing objects,
# variables, or functions

# a zero-dimensional (single-element) array
arr1 = np.array(4)
print(arr1)
print(arr1.dtype)
print(arr1.shape)

4
int32
()
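
One reason the `np` prefix matters is that Numpy defines functions with the same names as Python built-ins but with different argument meanings. The sketch below compares the built-in `sum` with `np.sum`. It deliberately avoids the star import and only illustrates what could go wrong if one name silently replaced the other.

```python
import numpy as np

values = [0, 1, 2, 3, 4]

# Built-in sum: the second argument is a start value added to the total
builtin_total = sum(values, -1)

# np.sum: the second argument is the axis to sum over (-1 = last axis)
numpy_total = np.sum(values, -1)

print(builtin_total)  # 9
print(numpy_total)    # 10
```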


In [17]:
# a 2x2 array
arr1 = np.array([[7,6],[5,4]])
print(arr1)
print(arr1.dtype)
print(arr1.shape)

[[7 6]
 [5 4]]
int32
(2, 2)


In [18]:
#a 3x2 array
arr1 = np.array([[1.,2],[3,4],[5,6]])
print(arr1)
print(arr1.dtype)
print(arr1.shape)

[[1. 2.]
 [3. 4.]
 [5. 6.]]
float64
(3, 2)


In [19]:
# an array of zeros
# dtype specifies the element type
# options include:
# 'np.int64' -> 64-bit integer
# 'np.uint64' -> unsigned 64-bit integer, can't take on negative values
# 'np.complex64' -> 64-bit complex (two 32-bit floats, one real and one imaginary)
          
np.zeros([3,3], dtype=np.float64)

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
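
The choice of `dtype` affects what an array can hold, not just how much memory it uses. The following is a small sketch with illustrative variable names. It shows that values assigned into an integer array are truncated, and that `astype` returns a converted copy.

```python
import numpy as np

# Integer array: an assigned float is truncated to fit the dtype
ints = np.zeros(3, dtype=np.int64)
ints[0] = 2.9
print(ints)                      # [2 0 0]

# astype returns a converted copy; the original keeps its dtype
floats = ints.astype(np.float64)
floats[0] = 2.9                  # the float array keeps the fractional part
print(floats[0])
print(ints.dtype, floats.dtype)  # int64 float64

# np.ones and np.full accept dtype in the same way as np.zeros
ones = np.ones((2, 2), dtype=np.complex64)
filled = np.full(3, 7, dtype=np.uint64)
print(ones.dtype, filled.dtype)  # complex64 uint64
```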

In [None]:
# an evenly and linearly spaced array
print(np.linspace(0,100,11)) # start, stop (inclusive), number of samples/entries

# note the difference from range
print(list(range(0,100,10))) # start, stop (exclusive), step

# np.arange() is the numpy equivalent of Python's range()
print(np.arange(0,100,10))

# logarithmically spaced array: 4 samples from 10**3 to 10**6
print(np.logspace(3,6,4))
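
The practical difference between these constructors is whether the stop value is included and whether the third argument is a count or a step. A short sketch with illustrative variable names makes that explicit:

```python
import numpy as np

lin = np.linspace(0, 100, 11)   # 11 samples, stop value included
ar = np.arange(0, 100, 10)      # step of 10, stop value excluded
log = np.logspace(3, 6, 4)      # 4 samples from 10**3 to 10**6

print(lin[-1], len(lin))   # last value is 100, 11 entries
print(ar[-1], len(ar))     # last value is 90, 10 entries
print(log)                 # 1e3, 1e4, 1e5, 1e6
```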

----
### Array Indexing

Numpy arrays are row-major by default: consecutive elements of a row reside next to each other in memory.
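
Memory layout can be inspected through an array's `flags` and `strides` attributes. The sketch below fixes the dtype to `np.int64` (8 bytes per element) so that the byte counts are predictable. The array names are illustrative.

```python
import numpy as np

a = np.arange(6, dtype=np.int64).reshape(2, 3)

print(a.flags['C_CONTIGUOUS'])  # True: row-major (C order) is the default
print(a.strides)                # (24, 8): next row is 24 bytes away, next column 8 bytes
print(a.ravel())                # [0 1 2 3 4 5]: rows are laid out one after another

# Column-major (Fortran order) copy of the same values
f = np.asfortranarray(a)
print(f.strides)                # (8, 16): now the elements of a column are adjacent
```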

Indexing starts at 0.

In [None]:
a = np.linspace(-1, 1, 6)
a[2:4] = -1        # set a[2] and a[3] equal to -1
a[-1]  = a[0]      # set last element equal to first one
a[:]   = 0         # set all elements of a equal to 0
a.fill(0)          # set all elements of a equal to 0

i = 1
j = 2
k = 2
a.shape = (2,3)    # turn a into a 2x3 matrix
print(a[0,1])       # print element (0,1)
a[i,j] = 10        # assignment to element (i,j)
a[i][j] = 10       # equivalent syntax (slower)
print(a[:,k])       # print column with index k
print(a[1,:])       # print second row
a[:,:] = 0         # set all elements of a equal to 0
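
Because the cell above sets every element to the same value, the effect of each index expression is hard to see. Here is a small sketch of the same indexing forms on an array with distinct values (the array is chosen only for illustration):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]

print(a[0, 1])    # 1: row 0, column 1
print(a[:, 2])    # [2 5]: column with index 2
print(a[1, :])    # [3 4 5]: second row
print(a[-1, -1])  # 5: negative indices count from the end

a[1, 2] = 10      # assignment to a single element
print(a[1][2])    # 10: a[i][j] reads the same element as a[i, j]
```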

More on [indexing](https://numpy.org/doc/stable/user/basics.indexing.html).

---

### Slicing

Numpy's slicing extends Python's slicing to N dimensions.

Slices are written in start:stop:step notation inside brackets (note that the stop index is not included in the slice).

Slicing in Numpy creates a view of the original array. It does not create a copy!
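
As a quick illustration of the start:stop:step notation, here is a sketch on a small 1-d array and a small 2-d array (both chosen only for this example):

```python
import numpy as np

x = np.arange(10)        # [0 1 2 3 4 5 6 7 8 9]

print(x[2:8:2])   # [2 4 6]: start at 2, stop before 8, step 2
print(x[:3])      # [0 1 2]: omitted start defaults to the beginning
print(x[-3:])     # [7 8 9]: omitted stop defaults to the end
print(x[::-1])    # a negative step reverses the array

# In N dimensions, each axis gets its own slice, separated by commas
m = np.arange(12).reshape(3, 4)
print(m[::2, 1:3])  # rows 0 and 2, columns 1 and 2
```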

In [20]:
a = np.linspace(0, 29, 30)
a.shape = (5,6)
print(a)

[[ 0.  1.  2.  3.  4.  5.]
 [ 6.  7.  8.  9. 10. 11.]
 [12. 13. 14. 15. 16. 17.]
 [18. 19. 20. 21. 22. 23.]
 [24. 25. 26. 27. 28. 29.]]


In [21]:
print(a[1:3,:-1:2])  # a[i,j] for i=1,2 and j=0,2,4

[[ 6.  8. 10.]
 [12. 14. 16.]]


In [24]:
print(a)
b = a[1,:]      # extract 2nd row of a (a view, not a copy)
b[1] = 2
print(b[1])
print(a[1,1])

[[ 0.  1.  2.  3.  4.  5.]
 [ 6.  7.  8.  9. 10. 11.]
 [12. 13. 14. 15. 16. 17.]
 [18. 19. 20. 21. 22. 23.]
 [24. 25. 26. 27. 28. 29.]]
2.0
2.0


In [26]:
# Take a copy to avoid referencing via slices:
b = a[1,:].copy()
print(a[1,1])
b[1] = 7777     # b and a are two different arrays now
print(b[1])
print(a[1,1])

2.0
7777.0
2.0
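
When it is unclear whether two arrays refer to the same data, `np.shares_memory` can be used to check. The sketch below rebuilds the array from scratch, so its values differ from the cells above. The names `view` and `copy` are illustrative.

```python
import numpy as np

a = np.linspace(0, 29, 30).reshape(5, 6)

view = a[1, :]          # slice: a view onto a's data
copy = a[1, :].copy()   # explicit copy: independent data

print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, copy))  # False

view[0] = -1.0           # writes through to a
copy[1] = 7777.0         # leaves a untouched
print(a[1, 0], a[1, 1])  # a[1, 0] is now -1, a[1, 1] is still 7
```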


More on slicing:
- [Numpy](https://numpy.org/doc/stable/user/basics.indexing.html#slicing-and-striding)
- [Problem Solving with Python - Array Slicing](https://problemsolvingwithpython.com/05-NumPy-and-Arrays/05.06-Array-Slicing/)

---

### Numpy Computations and Broadcasting

What happens when we add two Numpy arrays together?

In [31]:
a = np.array([1,2,3])
b = np.array([4,5,6])
c = a+b

print(c)

print(a+11)
print(a*3)

[5 7 9]
[12 13 14]
[3 6 9]


In [32]:
# Arrays must have compatible shapes to broadcast
arr1 = np.arange(4)
arr2 = np.arange(10, 15)

print(arr1)
print(arr2)

print(arr1+arr2)

[0 1 2 3]
[10 11 12 13 14]


ValueError: operands could not be broadcast together with shapes (4,) (5,) 

**Broadcasting** is a powerful way to efficiently manipulate arrays. When Numpy broadcasts, the smaller operand is not copied to match the larger one. In the example above, ```a+11```, the addition was carried out as if ```11``` were a 1-d array the same size as ```a```; however, no such temporary array was created. This saves memory and can improve performance, especially when arrays are large.
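
One way to picture this is with `np.broadcast_to`, which returns a read-only view that looks like the expanded array without allocating one. This is only an illustration of the idea, not a description of what Numpy does internally when evaluating `a + 11`.

```python
import numpy as np

a = np.array([1, 2, 3])

# A view of the scalar 11 that behaves like an array with a's shape
expanded = np.broadcast_to(11, a.shape)
print(expanded)          # [11 11 11]
print(expanded.strides)  # (0,): every element refers to the same value in memory

# The result of the addition itself is an ordinary new array
result = a + expanded
print(result)            # [12 13 14]
```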

**Note** that when broadcasting, shapes are compared dimension by dimension, starting from the last one: two dimensions are compatible when they are equal or when one of them is 1 (a missing dimension is treated as 1).
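
The shape rule can be tried out directly. The sketch below reuses `arr1` and `arr2` from the failing example. The other names are illustrative, and adding an axis with `np.newaxis` is just one possible way to make the shapes compatible.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,)

# (3, 1) and (4,) are compatible: the 1 stretches to 4,
# and the missing leading dimension of row is treated as 1
table = col + row
print(table.shape)   # (3, 4)

# Shapes (4,) and (5,) are not compatible
arr1 = np.arange(4)
arr2 = np.arange(10, 15)
try:
    arr1 + arr2
except ValueError as err:
    print("ValueError:", err)

# Adding a length-1 axis gives shapes (4, 1) and (5,), which broadcast
outer = arr1[:, np.newaxis] + arr2
print(outer.shape)   # (4, 5)
```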

More on [broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html).

Numpy also provides routines for:
- [Math](https://numpy.org/doc/stable/reference/routines.math.html)
- [Matrices (in place of arrays)](https://numpy.org/doc/stable/reference/routines.matlib.html)
- [Matrix operations and linear algebra for arrays](https://numpy.org/doc/stable/reference/routines.linalg.html)
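
As a brief taste of these routines, the sketch below applies an element-wise math function, a matrix-vector product, and a linear solve to a small system. The matrix `A` and vector `b` are made up for illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Element-wise math routine
roots = np.sqrt(b)
print(roots)

# Matrix-vector product with the @ operator
product = A @ b
print(product)             # [11. 18.]

# Solve the linear system A x = b
x = np.linalg.solve(A, b)
print(x)                   # approximately [0.8 1.4]

# Check the solution by substituting it back
print(np.allclose(A @ x, b))  # True
```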



---