# Practical Python for Scientists and Engineers

Welcome!  The goal of these tutorials is to help you get familiar with basic aspects of Python that will allow you to be more productive in everyday work.  We will work on skills that will let you graph, manipulate, and manage data.  Our goal will be to take things one step at a time, learning only what is needed to accomplish a specific task.  The philosophy behind these tutorials is learning by doing, rather than learning something now so that you can use it later.  Hopefully you will start learning tools right from day one that will be useful in other settings.  By the end of these tutorials, you will be able to make complicated applications that load and save data to and from files, manipulate data, run numerical simulations, make complex visualizations and more!

## Tutorial 4: Array Indexing - Part 3
In this final part of the tutorial, we will see an example of how we might use arrays to help us work with data sets in a more efficient manner.

<u>Part 1:</u>
- Refresher on Lists and Arrays
- Indexing in a One Dimensional Array
- Special Functions for Creating One Dimensional Arrays

<u>Part 2:</u> 
- Multidimensional Arrays
- Special Array Functions

<u><b>Part 3:</b></u>
- Using Arrays to Store and Plot Data Sets

## Step 1: Using Arrays to Store Data
Given that arrays are a great way to organize numbers, and that a two-dimensional array is essentially a table, they are a natural way to store data.

Let's imagine that we performed an experiment where we collected the following data:

time | output1 | output2
-----|---------|--------
10 | 136 | 178
20 | 208 | 450
30 | 382 | 962
40 | 458 | 1654
50 | 515 | 2572

We could easily store those results as a two-dimensional array.

In [None]:
#don't forget to import numpy!
import numpy as np

#create a two dimensional array storing the data:

#data =                                               #add your code here (it is ok to delete this comment)

print(data)
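
If you get stuck, here is a minimal sketch of one possible way to fill in the cell above. It assumes you type the table in by hand as a list of rows; other approaches work just as well.

```python
import numpy as np

# One possible answer: each inner list is one row of the table
# (time, output1, output2).
data = np.array([[10, 136, 178],
                 [20, 208, 450],
                 [30, 382, 962],
                 [40, 458, 1654],
                 [50, 515, 2572]])

print(data.shape)  # 5 rows and 3 columns
print(data)
```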

In [None]:
#imagine a scenario where you happened to have the data as a single long list of values 
#(e.g., perhaps you read them in from a file)
data2 = [10, 136, 178, 20, 208, 450, 30, 382, 962, 40, 458, 1654, 50, 515, 2572]
         
#change data2 from a list to an array:
data2 = np.array(data2)

#we can easily use the array tools we learned before to change this into a two-dimensional matrix:
data2 = data2.reshape([5,3])
print(data2)

The matrix printed above should match the array you created by entering the values yourself.

In [None]:
#note that if the data were read in as columns rather than rows, we would have had the following result instead:
data2 = [10, 20, 30, 40 ,50, 136, 208, 382, 458, 515, 178, 450, 962, 1654, 2572]

#if we applied the same steps, the result would be slightly different:
data2 = np.array(data2)
data2 = data2.reshape([5,3])
print(data2)

Notice that NumPy fills the rows first, so our "time" values and data wrap around onto different rows.  You can fix this by simply reshaping the data matrix differently:

In [None]:
#return the data2 array back to its original shape:
data2 = data2.reshape(data2.size)
print(data2)

#now reshape with 3 rows and 5 columns:
data2 = data2.reshape([3,5])
print("data2 reshaped with 3 rows and 5 columns")
print(data2)

This looks better, but it is a "rotated" version of our data table.  We need to "rotate" the table back so that the columns, rather than the rows, hold our time and observation data.  "Rotating" a matrix in this way (swapping its rows and columns) is called taking the transpose, and with arrays it is done by simply adding `.T` to the end of the array name, as below:

In [None]:
data2 = data2.T
print(data2)
#now the result is the same as the table
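
As an aside, the reshape-then-transpose steps above can also be done in a single call. The sketch below uses the `order='F'` option of `reshape`, which fills the columns first instead of the rows. The variable names `values` and `table` are made up for this example.

```python
import numpy as np

# the same column-ordered values used in the cells above
values = np.array([10, 20, 30, 40, 50,
                   136, 208, 382, 458, 515,
                   178, 450, 962, 1654, 2572])

# order='F' fills the first column from top to bottom, then the next column
table = values.reshape((5, 3), order='F')
print(table)

# check that this matches reshaping to 3 rows and 5 columns, then transposing
print(np.array_equal(table, values.reshape((3, 5)).T))
```

Either approach is fine. The two-step version makes it easier to see what is happening, while `order='F'` is more compact.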

We want to compare the data values to some theoretical models so that we can see how well they match.  One of these theoretical equations is $d = 10t$ and the other is $d = t^{2}$, where $d$ is the observed data and $t$ is the time when the observation was made.

We can use our data table to calculate the values of these models:

In [None]:
#first let's calculate the values of the models that we want to include in the table:
model1 = 10*data2[:,0]  #remember that we just need the t values to use in the formula
model2 = data2[:,0]**2

print('model1 results:')
print(model1)
print('model2 results:')
print(model2)
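
One simple way to see how well the models match is to subtract each model from the observations. The sketch below assumes that `output1` is compared with the $10t$ model and `output2` with the $t^{2}$ model. The names `residual1` and `residual2` are made up for this example.

```python
import numpy as np

# the data table from above: time, output1, output2
data2 = np.array([[10, 136, 178],
                  [20, 208, 450],
                  [30, 382, 962],
                  [40, 458, 1654],
                  [50, 515, 2572]])

t = data2[:, 0]

# difference between each observation and its (assumed) matching model
residual1 = data2[:, 1] - 10 * t
residual2 = data2[:, 2] - t**2

print('output1 minus model1:')
print(residual1)
print('output2 minus model2:')
print(residual2)
```

Small differences mean the model is close to the data at that time value.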

We could also combine our original data and these model results into one table:

In [None]:
#Start by creating a new array that can hold all of the observed data and model values.
#The original data table had 5 rows and 3 columns.  We are adding 2 new columns (one for each model).
#We therefore need to make a new array with 5 rows and 5 columns to store all of the results.
dataM = np.zeros([5,5])
print('before assigning data to the table:')
print(dataM)

#next we can assign the data values to the table
dataM[:,[0,1,2]] = data2   #note: we could have replaced the list [0,1,2] with the expression 0:3.  Try it! 
print('after assigning data to the table')
print(dataM)

#now let's assign the model results to the table:
dataM[:,3] = model1
dataM[:,4] = model2
print('after assigning model output to the table')
print(dataM)
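
As a possible shortcut, NumPy's `np.column_stack` function can build the same kind of table in one step by placing arrays side by side as columns. This is only a sketch, and the name `combined` is made up for this example. Note that the result holds integers here, whereas the table made with `np.zeros` holds floating point numbers.

```python
import numpy as np

# the data table from above: time, output1, output2
data2 = np.array([[10, 136, 178],
                  [20, 208, 450],
                  [30, 382, 962],
                  [40, 458, 1654],
                  [50, 515, 2572]])

model1 = 10 * data2[:, 0]
model2 = data2[:, 0]**2

# place the 3 data columns and the 2 model columns side by side
combined = np.column_stack((data2, model1, model2))
print(combined.shape)
print(combined)
```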

Now we have all of our data in a format where we can easily access it.  In reality, we did not need to take all of these steps, but the goal in doing so was to illustrate how you might need to bring all of the data together in a format that is easy to use and share.

## Step 2: Using Arrays to Graph Data
Now that we have all of the data and model results in a table, let's plot them.

In [None]:
#remember, since we want to plot we will need to import matplotlib
import matplotlib.pyplot as plt

plt.plot(dataM[:,0], dataM[:,1],'.b')
plt.plot(dataM[:,0], dataM[:,2],'.r')
plt.plot(dataM[:,0], dataM[:,3],'b')
plt.plot(dataM[:,0], dataM[:,4],'r')

plt.xlabel('Time')
plt.ylabel('Observations')
plt.legend(['obs1','obs2','model 1','model 2'])

Hopefully this is already making you feel like using arrays is an easy way to make these plots.  
<b>But wait - there's more!</b> You don't even need to plot each line separately!  You can simply pass all of the data columns to the plot function at once!

In [None]:
plt.plot(dataM[:,0],dataM[:,1:5],'.') #note that you still need to separate x versus y values!

plt.xlabel('Time')
plt.ylabel('Observations')
plt.legend(['obs1','obs2','model 1','model 2'])

While that is super easy, we don't have the ability to make the custom line style choices we made in the first plot.  We can address this problem by creating a function, but we will have to make a few assumptions first. 

Assumptions:

1) we will always have two sets of observation data and two sets of model results

2) the observed data will always be in the second and third columns of the array and the model results will always be in the fourth and fifth columns

3) we always want the data to be plotted with points and the models to be plotted with lines


We won't always need to make these assumptions once we get some more programming tools in future tutorials, but for now we must make them.

In [None]:
#define the function:

def data_plot(data):
    #we are going to plot each data set separately so that we can set different line styles
    plt.plot(data[:,0], data[:,1],'.b')
    plt.plot(data[:,0], data[:,2],'.r')
    plt.plot(data[:,0], data[:,3],'b')
    plt.plot(data[:,0], data[:,4],'r')

    plt.xlabel('Time')
    plt.ylabel('Observations')
    plt.legend(['obs1','obs2','model 1','model 2'])

Now that we have created the `data_plot()` function, we can use it with our array.

In [None]:
data_plot(dataM)

A perfect plot created with only one command!  (At least after we had made the function.)