<a href="https://colab.research.google.com/github/rkbono/GLY4451/blob/main/GLY4451_Lab_Lecture_07.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Let's get some basic modules imported
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Image 
import os

In [None]:
if 'google.colab' in str(get_ipython()):
    print('Running on CoLab')
    !git clone https://github.com/rkbono/GLY4451.git
    fpath = './GLY4451/'
else:
    print('Not running on CoLab')
    fpath = './'

# Lecture 7:

- Learn more about **NumPy** and **matplotlib**
- Learn more about **NumPy** arrays.  
 




### NumPy and N-dimensional arrays

We briefly mentioned **arrays** in the last lecture but quickly moved into plotting (because that is more fun). But **arrays** are essential to our computational happiness, so we need to bite the bullet and learn about them now. 

**Arrays** in **NumPy** are somewhat similar to lists, but there are important differences, with advantages and disadvantages.
Unlike lists, the elements of an **array** usually all share the same data type (**dtype**), usually numbers (integers or floats) and at times characters. A "feature" of arrays is that the size and type are fixed when the array is created (the shape can be rearranged later, as long as the total number of elements stays the same).

Remember, we can define a list:

L=\[ \]

then append to it as desired using the **L.append( )** method. It is more complicated (but still possible) to **extend**   arrays. 

Why use arrays when you can use lists?  Arrays are far more computationally efficient than lists, particularly for things like matrix math.  You can perform calculations on the entire array in one go instead of looping through element by element as for lists.  
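As a small sketch of the "entire array in one go" idea, the cell below converts a few temperatures from Celsius to Fahrenheit, first with a list and then with an array. The values and the names `temps_c`, `temps_f_list` and `temps_f_array` are made up for illustration.

```python
import numpy as np

temps_c = [12.5, 15.0, 9.75, 20.25]  # hypothetical temperatures in Celsius

# List version: we have to visit every element ourselves
temps_f_list = [t * 9 / 5 + 32 for t in temps_c]

# Array version: one expression acts on the whole array at once
temps_f_array = np.array(temps_c) * 9 / 5 + 32

print(temps_f_list)
print(temps_f_array)
```

Both versions give the same numbers; the array version is shorter and, for large arrays, usually much faster.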

To make things a little confusing, there are  several different data objects that are loosely called arrays, e.g., arrays, character arrays and matrices.  These are all subclasses of **ndarray** (N-dimensional array).  We will just worry about **arrays** in this course.  

Apart from reading in a data file with **NumPy**, as we did in the last lecture, there are many different ways of creating arrays.  Here are a few examples:

In [None]:
# define the values with the function array( ). For example a 3x3 array
A= np.array([[1, 2, 3],[4,2,0],[1,1,2]])
print (A)

# notice how there are no commas in arrays  

As we learned in the last lecture, **NumPy** can also generate an array using the **np.arange( )** function which works in a manner similar to **range( )** but creates an array with floats or integers.  **range( )** makes a list generator.  

This is just a reminder from Lecture 6:

In [None]:
# use list(range( )) to generate a one-dimensional (1D) list that ranges 
# from the first argument up to (but not including) the second, and that
# increments by the third.
# we learned that range( ) creates a list generator for integers
B=list(range(10)) 
print ("List made by 'range': ",B)
B_integers=np.arange(0,10,1) # arange( ) is an np function; integer arguments give an array of integers
print ("Array made by np.arange( ): ", B_integers)
B_real=np.arange(0,10,.2) #  and with floats
print ("Array with real numbers: \n",B_real) # notice the "\n"? It starts a new line in the text string.
# Notice that while "range" makes a list of integers, arange makes an array of integers 
#   or real numbers.  

There are several ways to create special arrays, for example, arrays initialized by zeroes, ones, or any other value: 

In [None]:
D=np.zeros((2,3)) # Notice the size is specified by a tuple of numbers of rows and columns.
print (D)

In [None]:
E=np.ones((2,3))
print (E)

You can also quickly create arrays with the same shape as an existing array:

In [None]:
R = np.ones_like(E)
print(R)

To get any other value, just multiply your "ones" array by whatever number you want:

In [None]:
print (E*42)

As you might have guessed, **np.arange(start, end, step)** generates numbers between two endpoints (**start** and up to but not including **end**) that are spaced by **step**. 

At times, it is useful to have __N__ numbers equally spaced between two endpoints. For this, we use the function **np.linspace(start,end,N)** which generates an array starting with **start**, going up to (and including!)  **end** with $N$ linearly spaced elements:  

In [None]:
F=np.linspace(0,10,14) # give me 14 numbers from 0 to 10, including 0 and 10.
print  (F)
print (len(F))


To summarize: 

**np.linspace( )** creates an array with $N$ evenly spaced elements starting at **start** and including the **end** value,
while **np.arange( )** creates an array with elements at **step** intervals between  **start** up to but NOT including the  **end** value.  
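A quick side-by-side check of that summary, using the interval from 0 to 1 (the names `by_step` and `by_count` are just for this example):

```python
import numpy as np

by_step = np.arange(0, 1, 0.25)   # step of 0.25; the end value 1 is left out
by_count = np.linspace(0, 1, 5)   # 5 points; the end value 1 is included

print("arange:  ", by_step, "length", len(by_step))
print("linspace:", by_count, "length", len(by_count))
```

The two arrays agree until the last element: only **np.linspace( )** reaches the end value.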

Another trick for creating arrays, is to use the **np.loadtxt( )** function.   It reads  a data file into an array.   This example uses a 'pathname' which we  learned about in Lecture 1. 

In [None]:
newarray = np.loadtxt(fpath+'Datasets/random_numbers.txt')
newarray

### A few words about array types

In the last example, **NumPy** figured out what array type was required - it decided to make a floating point array without our having to specify the type.  But what if we wanted an integer array with the numbers from 0 to 9 instead?  

There are a few solutions to this.  First, we could use integers in the **np.arange( )** call:

In [None]:
np.arange(0,10,1) # all-integer arguments give an integer array; writing 0. would give floats

Or, we could specify the array type with the _dtype_ argument, where _dtype_ can be _int_, _float_, _str_, among others. 

In [None]:
print (np.arange(0,10,1,dtype='float'))
print (np.arange(0,10,1,dtype='int'))

So, what is an _object_ array?  That would be an array that allows different data types:


In [None]:
np.array([[1, 2, 3],[4,2,0],['Ashley', 'Luther','Zayd']],dtype='object')

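But object arrays have their own limitations, e.g., arithmetic is slow and can fail outright: multiplying the array above by 2.5 raises an error because a string cannot be multiplied by a float.  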
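Here is a minimal sketch of how arithmetic behaves on an object array. The array `mixed` is a smaller, made-up version of the one above; with object arrays, NumPy hands the operation to each element in turn, so the result depends on what each element is.

```python
import numpy as np

mixed = np.array([[1, 2, 3], ['Ashley', 'Luther', 'Zayd']], dtype='object')

# Multiplying by an integer "works", but strings are repeated, not doubled
doubled = mixed * 2
print(doubled)

# Multiplying by a float fails as soon as NumPy reaches a string
try:
    mixed * 2.5
    failed = False
except TypeError as err:
    failed = True
    print('TypeError:', err)
```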

### Array attributes

Like other Python objects we have already encountered, arrays also have attributes and methods. As before, attributes do not have parentheses while methods do.

We will start by looking at array attributes which report on the state of the array.

As an example of the use of an attribute, we can find out what the data type of an array is with the attribute **array.dtype**:

In [None]:
D.dtype

As you may have already figured out, arrays have dimensions and shape. Dimensions define the number of axes, as in the illustration below. 

Remember our first array, $A$?  It had two dimensions (axis 0 and 1).   We can use the attribute **ndim** to find this out:
   



In [None]:
Image(filename=fpath+'Figures/ndim.jpg') # just ignore this - i just want to show you the pretty picture.

In [None]:
A= np.array([[1,2,3],[4,2,0],[1,1,2]]) # just to remind you
print ("the dimensions of A are: ",A.ndim)


Notice how **np.zeros( )** and **np.ones( )** used a shape tuple in order to define the arrays in the examples above.   The shape of an array tells us how many elements are along each axis.  Python returns a tuple with the shape information if we use the **shape** _attribute_:  



In [None]:
A.shape
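To see how **ndim** and **shape** relate, here is a sketch with a three-dimensional array. The name `cube` is made up for this example, and **size** (the total number of elements) is an extra attribute not discussed above.

```python
import numpy as np

cube = np.zeros((2, 3, 4))  # shape tuple with three entries, so three axes

print("ndim: ", cube.ndim)   # number of axes
print("shape:", cube.shape)  # number of elements along each axis
print("size: ", cube.size)   # total number of elements (2*3*4)
```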

### Array methods

Arrays, like lists,   have a bunch of _methods_, but the _methods_ are different than the  _methods_ we learned about for lists.  For example, you can **append** to an array, but the results may surprise you. 



In [None]:
print ('D: \n',D)
print ('\n')
print ('D after append: \n',np.append(D,[2,2,2]))

See how we now have a 1-D array?  Not exactly what you expected?  We can deal with that problem by reshaping the array, as we shall see.  But first, you can also **concatenate** arrays which may be a simpler way to extend your array: 

In [None]:
print (np.concatenate((D,E)))
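Both functions also accept an **axis** argument, which is one way to keep the result two-dimensional. The sketch below rebuilds `D` and `E` so that it runs on its own; the other names are made up for illustration.

```python
import numpy as np

D = np.zeros((2, 3))
E = np.ones((2, 3))

# Without axis, np.append flattens everything to 1-D
flat = np.append(D, [2, 2, 2])

# With axis=0, the new values must already be 2-D with a matching number of columns
with_row = np.append(D, [[2, 2, 2]], axis=0)

# np.concatenate stacks along rows by default (axis=0); axis=1 joins side by side
stacked = np.concatenate((D, E))
side_by_side = np.concatenate((D, E), axis=1)

print(flat.shape, with_row.shape, stacked.shape, side_by_side.shape)
```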

To solve the shape problem (2D versus 1D), you can  re-arrange a 1D array into a 2D array (as long as the total number of elements is the same).   To do that, we use the **array.reshape( )** _method_:

In [None]:
# we can take a 1D array with 50 elements and reshape it into, say a 5 X 10 2-D array:
B_real_2D=B_real.reshape((5,10))
print ('B_real: \n',B_real)
print ('\n B_real after reshaping: \n',B_real_2D)


You can go the other way, by taking a 2D (or more) array and turning it into one long 1D array using **array.flatten( )**.  

In [None]:
B_real_1D=B_real_2D.flatten()
print (B_real_1D)

Another super useful array method is **array.transpose( )** \[equivalent to the attribute **array.T**, which takes no parentheses\] which swaps rows and columns:

In [None]:
print ('B_real_2D: \n',B_real_2D)
print ('\n B_real_2D transposed: \n',B_real_2D.transpose())
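A compact check that the method and the attribute give the same result, using a small made-up array `M`:

```python
import numpy as np

M = np.arange(6).reshape((2, 3))  # 2 rows, 3 columns

print(M)
print(M.T)  # attribute: no parentheses

same = np.array_equal(M.T, M.transpose())
print("Same result from both:", same)
```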


### Slicing and indexing ndarrays 

The syntax for slicing an array is similar to that for a list:  

In [None]:
B=A[0:2] # access the top two rows of array A 
print (B)
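Unlike nested lists, arrays let us index several axes at once by separating the indices with a comma. The sketch below rebuilds `A` so that it runs on its own; the other names are only for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 2, 0], [1, 1, 2]])

first_row = A[0]        # a single index picks a row
element = A[1, 0]       # row 1, column 0
last_column = A[:, -1]  # every row, last column
corner = A[0:2, 0:2]    # top two rows, left two columns

print(first_row)
print(element)
print(last_column)
print(corner)
```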

### Masking Arrays

We can also 'mask' arrays. This is a handy thing a bit like doing an **if** statement for an array. For example, we could make an array of numbers,  say, times, between 0 and 10 minutes and then search just for the times greater than 5 minutes.

In [None]:
time=np.linspace(0,10,11)
lateTime=time[time>5]
print(lateTime)

If two arrays are the same shape, we can use one array to mask another array. For example, we could make an array of distances traveled at a constant speed of 20 miles per hour, and mask to show only the distances for the last 5 minutes.

In [None]:
distance=time/3 # 20 miles per hour is 1/3 mile per minute, so this is distance in miles
lateDistance=distance[time>5]
print(lateDistance)

How does this work? We can peek into this by looking at the result when we print (time>5). It turns out that this creates an array of True and False which tells the program what elements of the array to choose.

In [None]:
boolTime=time>5
print(boolTime)
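Because a mask is just an array of True and False, masks can be combined. A sketch, assuming we want more than one condition: use `&` for "and" and `|` for "or", with parentheses around each condition (the names `middle` and `edges` are made up).

```python
import numpy as np

time = np.linspace(0, 10, 11)

# times after 2 minutes AND up to (and including) 5 minutes
middle = time[(time > 2) & (time <= 5)]

# times before 1 minute OR after 9 minutes
edges = time[(time < 1) | (time > 9)]

print(middle)
print(edges)
```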



For many more methods and attributes of ndarrays, visit the NumPy Reference website:  http://docs.scipy.org/doc/numpy/reference/.   


### Converting between Data Structures

We can convert from an array to a list:

In [None]:
L=A.tolist()
print ("Original array: \t", type(A)) # the '\t' inserts a tab
print ("List form: \t\t", type(L))
print (A)
print (L)

# notice the commas, the array turned into  a list of three lists

From a list to an array:

In [None]:
AfromL=np.array(L)# from a list
print ('AfromL: ')
print (AfromL)



Or from a tuple to an array:


In [None]:
AfromT=np.array((4,2)) # from a tuple 
print ('AfromT: ')
print (AfromT)

### Saving NumPy arrays as text files

Having created, sliced and diced an array, it is often handy to save the data to a file for later use.  We can do that with the command **np.savetxt( )**.  

Let's save our **A** array to a file called _A.txt_.

In [None]:
np.savetxt(fpath+'A.txt',A)

In [None]:
#and clean up
os.remove(fpath+'A.txt')
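To convince ourselves that nothing is lost, here is a round-trip sketch: save an array, read it back with **np.loadtxt( )**, and compare. The temporary folder, the file name `A_roundtrip.txt`, and the **fmt** and **delimiter** choices are assumptions made for this example.

```python
import os
import tempfile
import numpy as np

A = np.array([[1, 2, 3], [4, 2, 0], [1, 1, 2]])

# write into a temporary folder so we do not clutter the working directory
tmp_dir = tempfile.mkdtemp()
fname = os.path.join(tmp_dir, 'A_roundtrip.txt')

np.savetxt(fname, A, fmt='%d', delimiter=',')  # integers, comma separated
A_back = np.loadtxt(fname, delimiter=',', dtype=int)

print(A_back)
print("Same as the original:", np.array_equal(A, A_back))

# and clean up
os.remove(fname)
os.rmdir(tmp_dir)
```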

# Lecture 8: Pandas


### The Joy of Pandas

**Pandas** is a relatively new package for Python.  It allows us to read in more complicated data file formats than **NumPy**, and  wrangle the data in powerful ways. It also provides many useful data analysis tools.

There are two basic data structures in **Pandas**: the **DataFrame**, which is essentially a spreadsheet with multiple columns, and the **Series**, which is a single column of data. A **Series** is like a **list** or **array** on steroids. 

The dipole_coeffs file includes column headers (strings) as the first row.  This kind of file does not play nicely with **np.loadtxt( )**,  but we can use the **Pandas** function **read_excel( )** to read in the spreadsheet.  For delimited text files, such as 'comma separated variable' files (.csv), the companion function **read_csv( )** does the same job once we tell it how the file is delimited.  

Of course we must first import **Pandas** into the notebook:

In [None]:
import pandas as pd

In [None]:
dfDipole = pd.read_excel(fpath+'Datasets/dipole_coeffs.xlsx',index_col=0,header=0)
dfDipole.head()

**dfDipole** is now a Pandas **DataFrame**.  

So what is a **DataFrame**?   It is a new data container that is more sophisticated than any we have learned about so far (**lists, tuples, sets, dictionaries, arrays**).   
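It has named columns (like an Excel spreadsheet) and identifies the rows by an _index_. By default the index is a count starting with 0; here, **index_col=0** told **Pandas** to use the first column of the file as the index instead. 

The file we read in included column headers, and the **header=0** argument told **Pandas** that they are in the first row.  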

If we want to be sure, we can use the **DataFrame.columns** attribute on the dfDipole DataFrame:


In [None]:
dfDipole.columns

Notice that the column names are stored with dtype _object_, similar to the **NumPy** object arrays with mixed data types that we briefly encountered before. Let's explore these objects with Pandas DataFrames.  

We see that the  columns of **dfDipole**  are: 
- "g10": the axial dipole component
- "g11": the equatorial dipole along prime meridian
- "h11": the  equatorial dipole perpendicular to prime meridian

Each one of these columns is a **Pandas Series**.  So to review: **DataFrames** are like Excel spreadsheets and **Series** are one column of the spreadsheet.  

DataFrames can be seen as fancy dictionaries. One feature is that columns and indices are accessed mostly by using their name.

In [None]:
dfDipole['g10']

To save a DataFrame to a file, we use the **to_excel** method: 

In [None]:
dfDipole.to_excel(fpath+'dfDipole.xlsx', index=False)

Without the argument **index=False**, the file gets an annoying extra column holding the DataFrame's index. With **index** set to False, it does not appear.  You can check the result with Excel or a similar program.  

Besides Excel spreadsheets (.xlsx), DataFrames can also be saved as delimited text files using the **to_csv** method with varying **sep** arguments.  **sep** stands for "separator".   For example, sep='\t' makes it a tab delimited (separated) file. 
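A minimal sketch of a tab-delimited save and reload. The DataFrame `df_demo` holds made-up numbers (not the real dipole coefficients), and the temporary folder and file name are assumptions for this example.

```python
import os
import tempfile
import pandas as pd

# a tiny, made-up DataFrame standing in for dfDipole
df_demo = pd.DataFrame({'g10': [-30.0, -29.5],
                        'g11': [-2.0, -1.9],
                        'h11': [5.0, 5.1]})

tmp_dir = tempfile.mkdtemp()
fname = os.path.join(tmp_dir, 'df_demo.txt')

df_demo.to_csv(fname, sep='\t', index=False)  # tab-delimited, no index column
df_back = pd.read_csv(fname, sep='\t')        # read it back the same way

print(df_back)

# and clean up
os.remove(fname)
os.rmdir(tmp_dir)
```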

## Playing with Pandas

We can also edit and transform dataframes. First thing we should do is add a column with dipole moment.

In [None]:
# adding dipole moment column, recall formula from lecture
dfDipole['moment'] = np.sqrt(dfDipole['g10']**2 + dfDipole['g11']**2 + dfDipole['h11']**2)
dfDipole.head()

We can slice and filter DataFrames similar to how we access numpy arrays. There are some nuances.<br><br>Based on my experience, using the ***.loc*** indexer (note that it takes square brackets, not parentheses) has been the most reliable, if verbose, way of filtering data.

Let's only return rows where the dipole moment is greater than 61 but less than or equal to 64 $\mu$T.

In [None]:
# note the square brackets!!, parentheses around each filtering condition, and '&' for the AND conditional
dfDipole.loc[(dfDipole['moment']>61)&(dfDipole['moment']<=64)] 

and if we only want certain columns:

In [None]:
dfDipole.loc[(dfDipole['moment']>61)&(dfDipole['moment']<=64),['g10','moment']] 
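The same pattern on a tiny, made-up DataFrame, so the result is easy to check by eye. Here `df_demo`, `mask` and `subset` are hypothetical names; storing the condition in a variable first is optional but can make long filters easier to read.

```python
import pandas as pd

df_demo = pd.DataFrame({'g10': [60.0, 62.0, 64.0, 66.0],
                        'moment': [60.5, 62.5, 64.0, 66.2]})

# a boolean Series: True where 61 < moment <= 64
mask = (df_demo['moment'] > 61) & (df_demo['moment'] <= 64)

# rows chosen by the mask, columns chosen by name
subset = df_demo.loc[mask, ['moment']]

print(mask)
print(subset)
```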

Let's add two more columns, this time with pole latitude and longitude:

First, we need a function to return pole lat. and long. given g10, g11 and h11.

In [None]:
def gauss2pole(g10,g11,h11):
    """
    Returns pole latitude and pole longitude given dipole gauss coeffs g10, g11 and h11.
    """
    # remember equation from lecture returns colat from north pole
    plat = 90 - np.degrees(np.arccos(g10/np.sqrt(g10**2+g11**2+h11**2)))
    
    plon = np.degrees(np.arctan2(h11,g11))  # note arctan2 is better than arccos at dealing with quadrant issues
    
    return plat,plon
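Before trusting a function like this on real data, it helps to try inputs where we know the answer. The sketch below repeats the function so that it runs on its own and feeds it two made-up cases, using the sign convention of the function as written: a purely axial dipole and a purely equatorial one.

```python
import numpy as np

def gauss2pole(g10, g11, h11):
    """
    Returns pole latitude and pole longitude given dipole gauss coeffs g10, g11 and h11.
    """
    plat = 90 - np.degrees(np.arccos(g10 / np.sqrt(g10**2 + g11**2 + h11**2)))
    plon = np.degrees(np.arctan2(h11, g11))
    return plat, plon

# purely axial dipole: all of the moment is in g10
axial_lat, axial_lon = gauss2pole(1.0, 0.0, 0.0)

# purely equatorial dipole: all of the moment is in h11
equat_lat, equat_lon = gauss2pole(0.0, 0.0, 1.0)

print("axial:     ", axial_lat, axial_lon)
print("equatorial:", equat_lat, equat_lon)
```

With this convention the axial case puts the pole at latitude 90, and the equatorial case puts it on the equator at longitude 90.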
    

Okay, let's run this function. Note that since numpy and pandas play nice together, numpy treats Series as arrays.

In [None]:
dfDipole['plat'],dfDipole['plon'] = gauss2pole(dfDipole['g10'],dfDipole['g11'],dfDipole['h11'])
dfDipole.head()

Also, note how we split the two returns (a tuple) into two separate columns -- we essentially specified the LHS of the assignment as its own tuple.

Let's add another column, we can make it a boolean for whether the field is normal or reverse polarity. Technically, this can be done as a one-line command, but I want to show a powerful way to loop through DataFrames.

In [None]:
"""
this loops through two returns at each step, the dataframe row's index and row contents. 
Note the "iterrows": you can loop through a DataFrame without it, but that steps through the column names rather than the rows
"""
polBool = [] # collector list for polarity assignments
for idx,row in dfDipole.iterrows():
    # each loop returns a new index (idx) and dataframe row (row)
    # rows are Pandas Series and columns can be accessed like dictionaries
    if row['plat']>=0:
        # if the pole latitude is >= 0, normal polarity
        polBool.append(True)
    else:
        polBool.append(False)
dfDipole['normal'] = polBool    

Note that you should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
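For comparison, here is a sketch of the one-line version mentioned earlier, on a made-up DataFrame `df_demo` with a few hypothetical pole latitudes. Comparing a whole column to 0 returns a column of True and False, just like masking an array.

```python
import pandas as pd

df_demo = pd.DataFrame({'plat': [85.0, -80.0, 0.0, 12.5]})  # hypothetical pole latitudes

# the comparison acts on the whole column at once, no loop needed
df_demo['normal'] = df_demo['plat'] >= 0

print(df_demo)
```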

In [None]:
dfDipole

## Plotting

Let's make some simple time series plots for now. We can try making polar wander maps when we discuss mapping modules such as cartopy.

First, a simple plot of axial dipole term 'g10'

In [None]:
plt.plot(dfDipole['g10'])

We can use markers instead of lines:

In [None]:
plt.plot(dfDipole['g10'],'o')

Okay, so far so good but it's not very useful. It needs axis labels, and maybe we can make it look nicer.

In [None]:
# using the following set up gives you more control with how you create your figures

# creates figure object, sets dimensions (for matlab people, this is what would be returned if you ran 'gcf')
fig = plt.figure(figsize=(6,4)) 
# defines first set of axes for plotting (matlab people, this is 'gca')
ax = fig.subplots(1,1) 

ax.plot(dfDipole['g10'],color='black',linewidth=0.75,linestyle='-')
ax.set_xlabel('Time (unitless)')
ax.set_ylabel(r'$g_1^0$ ($\mu$T)') # some latex for fun; the r prefix keeps the backslash in \mu literal


Let's see how adding extra lines looks:

In [None]:
# using the following set up gives you more control with how you create your figures

# creates figure object, sets dimensions (for matlab people, this is what would be returned if you ran 'gcf')
fig = plt.figure(figsize=(6,4)) 
# defines first set of axes for plotting (matlab people, this is 'gca')
ax = fig.subplots(1,1) 

ax.plot(dfDipole['g10'],color='black',linewidth=0.75,linestyle='-',label='g10')

ax.plot(dfDipole['g11'],label='g11')
ax.plot(dfDipole['h11'],label='h11')


ax.legend()
ax.set_xlabel('Time (unitless)')
ax.set_ylabel(r'Gauss coefficients ($\mu$T)'); # all three coefficients share this axis


We can also make multiple panels in the same figure. Those panels can be independent or share parts of their axes. Here, we will use the same x-axis since it represents time.

In [None]:
fig = plt.figure(figsize=(6,12)) # note the new size
ax = fig.subplots(3,1,sharex=True) # and new numbers: #rows, #columns of subfigures

# now that there are multiple subplots, ax is an array of axes and needs to be indexed to access each one
ax[0].plot(dfDipole['plat'],color='black',linewidth=0.75,linestyle='-')
ax[0].set_title('Pole latitude')

ax[1].plot(dfDipole['plon'],color='black',linewidth=0.75,linestyle='-')
ax[1].set_title('Pole longitude')

ax[2].plot(dfDipole['moment'],color='black',linewidth=0.75,linestyle='-')
ax[2].set_title('Dipole Moment')

ax[-1].set_xlabel('Time (unitless)') # regular indexing rules still apply

fig.tight_layout(); # this moves subplots around so things look neat 

Something looks interesting during the magnetic reversals -- let's directly compare pole latitude and moment. But if we plot both lines on top of each other, the different scales will make it hard to see what's going on. Let's generate twin axes to plot on different scales.

In [None]:
# using the following set up gives you more control with how you create your figures

# creates figure object, sets dimensions (for matlab people, this is what would be returned if you ran 'gcf')
fig = plt.figure(figsize=(6,4)) 
# defines first set of axes for plotting (matlab people, this is 'gca')
ax = fig.subplots(1,1) 

ax.plot(dfDipole['plat'],color='black',linewidth=0.75,linestyle='-',label='pole lat.')
ax.set_xlabel('Time (unitless)')
ax.set_ylabel('degree')

ax2 = ax.twinx()
ax2.plot(dfDipole['moment'],color='red',linewidth=0.75,linestyle='-',label='dipole moment')
ax2.set_ylabel('moment');

It looks like the weakest field strengths happen during reversals and excursions!