Lecture Notes document #2:  <span style="font-size:larger;color:blue">**Introduction to Python Programming, Part IV**</span>

This document was developed as part of a collection to support open-inquiry physical science experiments in Bachelor's level lab courses.  

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.  Everyone is free to reuse or adapt the materials under the conditions that they give appropriate attribution, do not use them nor derivatives of them for commercial purposes, and that any distributed or re-published adaptations are given the same Creative Commons License.

A list of contributors can be found in the Acknowledgements section.  Forrest Bradbury (https://orcid.org/0000-0001-8412-4091) of Amsterdam University College is responsible for this material and can be reached by email:  forrestbradbury ("AT") gmail.com
******

This is the fourth Jupyter Notebook document in a series of four which serve as a brief introduction to programming in <span style="font-size:larger;color:brown">**Python**</span>.  

<span style="color:red">**This material is not intended to substitute for a good introductory programming course, but rather gives an overview of Python coding tricks that might be encountered in, and used for, data analysis methods.**</span>

><span style="font-size:larger;color:brown">**Outline of Part IV.**</span>
>
>
>- Modules and `import`
>
>
>- Reading from files
>
>
>- Numpy for data analysis
>
>
>- Matplotlib for plotting
>
>
>- Acknowledgements
>
>
>****************
>
>(please read and work through these Jupyter Notebook lecture notes to learn some useful programming tricks and note that recommended exercises are flagged below with:  <span style="font-size:larger;color:orange">**"EXERCISE"**</span> )
>
>****************

# Modules and `import`

A module is a Python program with useful functions that are intended to be used within other Python programs. There are many Python modules, a lot of which are part of the Python standard library:

- **Math**: supports numerical constants and operations like `sin`, `log`, etc.
- **Random**: a module for random number generation
- **Pickle**: a module for saving arbitrary Python objects to a file or transmitting them over the network.

But there are also modules distributed by other parties, and some of these are very sophisticated:

- **NumPy**: a module that supports efficient linear algebra and other numerical methods
- **Matplotlib**: a module for making graphs and charts.
- **Pandas**: a module for working with R-style data frames
- **scikit-learn**: a module that contains many machine learning algorithms

In this lecture, we'll look at NumPy and Matplotlib.

You load a module by

```
import <modulename>
```

and this makes available any number of functions and variables that can be accessed using the module name followed by a dot. 

Example:

In [8]:
import math

print(math.pi)              # this and the following all return float values
print(math.sin(3))
print(math.sin(math.pi/2))  # we see the sine function takes angle in radians as input
print(math.log(10))         # notice this is the natural log function (not log base 10)

3.141592653589793
0.1411200080598672
1.0
2.302585092994046


If `<modulename>` is very long and you don't want to have to type it in all the time, you can rename it using:

```
import <modulename> as <alias>
```
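For example, here is a sketch that imports the `math` module from above under the alias `m` (an arbitrary choice for illustration). The customary `import numpy as np` used later in these notes works the same way:

```python
import math as m   # "m" is an arbitrary alias chosen for this example

print(m.pi)        # same as math.pi
print(m.sqrt(16))  # square root, returns the float 4.0
```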
If this is still too much typing, you can use a slightly different syntax that allows you to import those variables or functions into the global namespace directly:

```
from <modulename> import <name>, <name>, ...
```

In [9]:
from math import pi, sqrt, exp, atan2

# note: sqrt() is the square root function
# note: exp(x) is the natural exponential function, e raised to the power x
# note: atan2(y, x) is the arc-tangent of y/x (y comes first, then x); it uses the
#       signs of both to return the angle in the plane, in radians

pi, sqrt(36), exp(1), atan2(0,-1)   


(3.141592653589793, 6.0, 2.718281828459045, 3.141592653589793)

# File I/O (loading/saving) and `with`


Python offers the built-in function `open`, together with the `close` method of the object it returns, for low-level access to files, allowing you to read and write strings or binary (encoded) data. The structure is as follows:

- The call to `open` yields an object that is called a "file handle", that serves as an access point to file operations:  `<handle> = open(<filepath>, <mode>)`
- You can open a file in several different modes; see the table below.
- Your program does what it wants to do with the file using its handle.
- When you're done, you have to close the file by calling `<handle>.close()`, to free up system resources and let the system know that the file is no longer in use.


Modes for opening a file:

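| mode        | meaning                                                         |
| ----------- | --------------------------------------------------------------- |
| `r`         | read (the default, can be omitted)                              |
| `w`         | write (creates the file, or empties it if it already exists)    |
| `+`         | update: read and write (combined with another mode, e.g. `r+`)  |
| `t`/`b`     | text/binary (default `t` can be omitted)                        |
| `a`         | append: write at the end of the file (creating it if needed)    |
| `x`         | exclusive creation: fail if the file already exists             |

Modes can be combined; for example, `'rb'` opens a file for reading in binary mode.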

Once a file is open, there is the danger that an error occurs in the program before the file is closed, so that it never gets closed. This keeps costing system resources until the program is terminated. For that reason, Python has the keyword `with`, which closes the file automatically as soon as the indented block is left, whether normally or because of an error. It looks like this:

```
with open(<filepath>, <mode>) as <handle>:
    # do stuff with <handle>
```

Once you have a file handle, there are all kinds of things you can do with the file like reading/writing a line. See the Python tutorial (https://docs.python.org/3/tutorial/) for more information.
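A minimal sketch of the whole cycle, assuming a throwaway file in a temporary folder (created with the standard `tempfile` module, so nothing on your disk is overwritten; the file name is hypothetical). It also shows that the file is closed once the `with` block is left, and how the `a` and `x` modes behave:

```python
import tempfile
from pathlib import Path

# Work in a fresh temporary folder so that no existing files are touched
folder = Path(tempfile.mkdtemp())
path = folder / "example.txt"      # hypothetical file name, for illustration only

# 'w' creates the file (or empties an existing one) and opens it for writing
with open(path, 'w') as handle:
    handle.write("first line\n")
    handle.write("second line\n")
print(handle.closed)               # True: leaving the with-block closed the file

# 'a' opens for writing at the end of the existing file
with open(path, 'a') as handle:
    handle.write("third line\n")

# 'r' (the default mode) opens for reading; readlines() returns a list of lines
with open(path) as handle:
    lines = handle.readlines()
print(lines)                       # ['first line\n', 'second line\n', 'third line\n']

# 'x' refuses to open a file that already exists
try:
    with open(path, 'x') as handle:
        handle.write("never written\n")
except FileExistsError:
    print("mode 'x': the file already exists")
```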

**IMPORTANT NOTES:**  

Python's file path names can be "relative" paths.  Thus, using only the filename works whenever that file is in the same directory as the Jupyter Notebook that you're using.  It's often more convenient to keep your data files in a sub-folder of your code's directory, which is what we do here.

Unfortunately, file paths on different operating systems use slightly different slashes.  To make sure the following code works for different operating systems, the nifty `pathlib` module is used to automatically fix (when needed) the notation for the file paths!

While `pathlib` makes sure the notation is correct, the code that reads the novel will only find the data file if it is located at `datafolder/dorian_gray.txt`, that is, in a subfolder named `datafolder` of the directory where you are running this Jupyter Notebook. This should already be the case if you extracted the given materials in one go and kept the directory structure intact.



In [None]:
# Read the entire novel "The Picture of Dorian Gray" by Oscar Wilde into a string:
# note: text file must be in subfolder "datafolder" of this Jupyter Notebook's own directory!

from pathlib import Path  # pathlib.Path formats pathnames to suit your operating system

# declare the appropriate filename as a string:
filename = "dorian_gray.txt"   # was sent in zipfile along with this Jupyter Notebook

sub_folder = Path("datafolder/")     # set name of sub-directory where data is to be stored
file_path = sub_folder / filename    # file path is combination of filename and its folder


with open(file_path, 'r') as handle:
    dorian_gray = handle.read() # get all data from the file as a string
    
# Split up the string into its individual words:
dorian_gray_words = dorian_gray.split()

print("The number of words in the book is", len(dorian_gray_words), 
      "; the 1000th word is", dorian_gray_words[999])
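The path built in the cell above can be inspected without the file being present. A sketch of a few standard `pathlib.Path` attributes (the file does not need to exist for any line except the last to give a meaningful answer):

```python
from pathlib import Path

sub_folder = Path("datafolder/")            # the trailing slash is simply dropped
file_path = sub_folder / "dorian_gray.txt"  # "/" joins path parts

print(file_path)             # datafolder/dorian_gray.txt (with a backslash on Windows)
print(file_path.name)        # dorian_gray.txt
print(file_path.suffix)      # .txt
print(file_path.parent)      # datafolder
print(file_path.as_posix())  # datafolder/dorian_gray.txt, with forward slashes on every system
print(file_path.exists())    # True only if the file really is there
```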


# NumPy for data analysis

We will see that the NumPy module is really useful for data analysis.

There is actually already a module in the Python standard library called `array`, which allows construction of "homogeneous" arrays that have an efficient representation in memory.  A homogeneous array contains values of only a single type.  However, the functionality of Python arrays is rather limited if you want to do data analysis.
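A small sketch of the standard-library `array` module, showing both the homogeneity and one limitation for data analysis: multiplying by a number repeats the array, as for a list, instead of multiplying each entry:

```python
from array import array

# 'd' is the typecode for double-precision floats: every entry must be a number
values = array('d', [1.0, 2.5, 4.0])
values.append(7)          # an int is accepted and stored as the float 7.0
print(values)             # array('d', [1.0, 2.5, 4.0, 7.0])

try:
    values.append("hello")   # a string does not fit a homogeneous float array
except TypeError as err:
    print("TypeError:", err)

print(values * 2)         # repetition, not arithmetic: 8 entries
```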

Thanks to the good people in the open source community, we can switch to `NumPy`, which provides a more full-featured alternative that supports many methods from linear algebra and has very fast implementations.

In [15]:
import numpy as np

## ndarray: construction
The basic `NumPy` type is the `ndarray`. One way to construct it is from a Python list or tuple, or from a (nested) list or tuple of lists or tuples.

In [16]:
# Create a single dimensional array from a list or tuple.
a = np.array(["hello", 3])
a

array(['hello', '3'], dtype='<U5')

In [17]:
type(a)

numpy.ndarray

Note that all entries are converted to a single type, in this case `<U5`, which denotes Unicode strings of at most 5 characters (the `<` refers to the byte order). Every `ndarray` has a `dtype`. It is usually inferred automatically, but you can also specify it explicitly:

In [18]:
np.array( [ [1,2], [3,4] ], dtype=complex )

array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])
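A short sketch of how the `dtype` is inferred from the entries, and of how `astype` (a standard `ndarray` method, not otherwise used in these notes) converts an array to another type:

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])          # a single float makes the whole array float
words = np.array(["hello", 3])         # the example from above

print(ints.dtype)     # an integer type, e.g. int64 (the exact size depends on the platform)
print(mixed.dtype)    # float64
print(words.dtype)    # <U5, as above

# astype() returns a converted copy; the original array keeps its dtype
as_float = ints.astype(np.float64)
print(as_float)       # [1. 2. 3.]
print(ints.dtype == as_float.dtype)   # False
```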

In [198]:
# Here is a three dimensional array:
np.array( [[[1,2],[3,4]], [[5,6],[7,8]], [[9,10],[11,12]]] )

array([[[ 1,  2],
        [ 3,  4]],

       [[ 5,  6],
        [ 7,  8]],

       [[ 9, 10],
        [11, 12]]])

You can also create arrays of a given size without immediately filling them with data from regular Python objects. The fastest way to construct an array is using `empty`, which does not initialise the entries at all: the values are whatever happens to be present in the computer's memory at the time. So make sure to initialise the entries properly later, before you use any of these values!

In [None]:
# warning!  This yields random junk dependent on your recent actions!
np.empty([15, 3])

# try running it, but don't bother figuring out why it yields the given values

It is also often convenient to initialise an `ndarray` with zeros or ones.

In [21]:
np.zeros([2,3])

array([[0., 0., 0.],
       [0., 0., 0.]])

Note that they are initialised by default to some floating point type: let's find out which it is:

In [22]:
np.zeros([2,3]).dtype

dtype('float64')

But, as before, you can explicitly specify a different data type:

In [23]:
np.ones([3,2], dtype=np.int32)

array([[1, 1],
       [1, 1],
       [1, 1]])

Finally, there is an analogue of the `range` function, called `arange`, that constructs single-dimensional arrays, and a related function called `linspace`, in which you provide the number of values rather than the step size.

In [11]:
# go from 0 to 40 (excluding 40 itself) in steps of 3
np.arange(0,40,3)

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39])

With `linspace`, the first and second inputs specify the first and last values (the last value is included), and the third input specifies how many values the array should contain.

In [12]:
# go from 0 all the way to 40 (inclusive) with 3 evenly spaced values, i.e. 2 steps of 20
np.linspace(0,40,3)

array([ 0., 20., 40.])
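A sketch comparing the two functions, including the `endpoint` option of `linspace` (which leaves out the last value). For non-integer steps, the NumPy documentation recommends `linspace` over `arange`, because with `arange` floating-point rounding can make the number of values hard to predict:

```python
import numpy as np

# linspace(start, stop, num): num values, stop included by default
print(np.linspace(0, 40, 5))                  # [ 0. 10. 20. 30. 40.]
print(np.linspace(0, 40, 4, endpoint=False))  # [ 0. 10. 20. 30.]

# arange(start, stop, step): stop is excluded
print(np.arange(0, 40, 10))                   # [ 0 10 20 30]

# For fractional steps, linspace fixes the number of values exactly
steps = np.linspace(0, 1, 11)                 # 0.0, 0.1, ..., 1.0
print(len(steps))                             # 11
```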

## Views

In some object-oriented designs, and certainly in `NumPy`, you may be offered access to some data not through one object, but through a whole bunch of objects. Multiple objects providing access to the same underlying data are called *views* on the data. Many of the methods and functions that appear below (such as `reshape` and `transpose`, as well as extracting rows or columns by slicing) actually yield a different view of the same underlying data.

It is important to realise that `ndarray`s are **mutable**, and that mutations to an `ndarray` will affect *all* views on it.

The advantage of views is that they are lightweight: if you have a huge matrix, producing a different view of the matrix does not require copying all the data.

You'll see an example of several views on the same data below when we use reshape.
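First, a minimal sketch of the difference between a view and an independent copy, using a small made-up matrix `m`. The `copy` method and the `np.shares_memory` function (which only checks whether two arrays use the same data) are standard NumPy, though not otherwise used in these notes:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)   # [[0 1 2]
                                 #  [3 4 5]]
row = m[0]           # slicing out a row gives a view, not a copy
row[0] = 100         # ...so this also changes m
print(m[0, 0])       # 100

independent = m[1].copy()   # .copy() makes new, independent data
independent[0] = -1
print(m[1, 0])              # still 3

print(np.shares_memory(m, row))          # True
print(np.shares_memory(m, independent))  # False
```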

## Reshaping, joining and splitting arrays

In addition to `dtype`, `ndarray`s also have a `shape` property that determines the size and dimensionality of the array. Changing the shape changes how the data are interpreted, not the data themselves: `reshape` returns an array with the requested shape that is a second "view" of the *same* underlying data!

In [None]:
a = np.array([(1,2,3),(4,5,6)])
print("Normal a:\n",a, sep="")

b = a.reshape(3,2)
print("Shape changed:\n", b, sep="")

a[1,0]=8 # this changes the data that both a and b refer to! So:
print("After a was changed:\n", a, sep="")

b[1,0]=9 # this changes the data that both a and b refer to! So:
print("After b was changed:\n", b, sep="")

# before running this, guess what a & b will be before & after changes !

One can glue together matrices and/or vectors using `vstack` and `hstack`:

In [17]:
np.vstack((b,b,b))

array([[1, 2],
       [9, 8],
       [5, 6],
       [1, 2],
       [9, 8],
       [5, 6],
       [1, 2],
       [9, 8],
       [5, 6]])

To split an array, there are `hsplit` and `vsplit`:

In [18]:
print(a)
np.hsplit(a,3) # split into 3 chunks

[[1 2 9]
 [8 5 6]]


[array([[1],
        [8]]), array([[2],
        [5]]), array([[9],
        [6]])]
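The text above also mentions `hstack` and `vsplit`. A small sketch with two made-up arrays `p` and `q` (to be stacked side by side, they must have the same number of rows):

```python
import numpy as np

p = np.array([[1, 2], [3, 4]])
q = np.array([[5], [6]])

side_by_side = np.hstack((p, q))   # glue columns: [[1 2 5]
                                   #                [3 4 6]]
print(side_by_side)

top, bottom = np.vsplit(side_by_side, 2)   # split into 2 chunks of rows
print(top)      # [[1 2 5]]
print(bottom)   # [[3 4 6]]
```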

## Indexing and iterating

You can index an ndarray the same way you can index a list; if the ndarray is multidimensional you specify the ranges for each dimension in turn, separated by commas:

In [20]:
# here, remember that a colon on its own asks for all values:
# all rows, and all columns from the second one (index 1) onward
a[:,1:]

array([[2, 9],
       [5, 6]])
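A few more indexing patterns, sketched on a fresh array with the same values as the matrix `a` printed above:

```python
import numpy as np

a = np.array([[1, 2, 9], [8, 5, 6]])   # same values as the matrix a above

print(a[1, 2])     # 6: row index 1, column index 2
print(a[-1])       # [8 5 6]: negative indices count from the end
print(a[:, 0])     # [1 8]: all rows, first column (a 1-D array)
print(a[0, ::2])   # [1 9]: first row, every second column
```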

An `ndarray` is *Iterable*, so it can be used in a `for`-loop; it will iterate over the first dimension (the rows):

In [21]:
for i in a:
    print("Here is a row:", i)

Here is a row: [1 2 9]
Here is a row: [8 5 6]


If you want to iterate over all items in the array, the `flat` field contains an iterator over all entries:

In [22]:
for entry in a.flat:
    print("Entry: ", entry)

Entry:  1
Entry:  2
Entry:  9
Entry:  8
Entry:  5
Entry:  6


To loop over the columns of the matrix, use the transpose of the matrix, which is available as the `T` field:

In [24]:
for i in a.T:
    print("Here is a column: ", i)

Here is a column:  [1 8]
Here is a column:  [2 5]
Here is a column:  [9 6]


## Functions and operators that operate on each array entry

Regular mathematical operators are applied to each entry individually:

In [None]:
c = np.arange(a.size).reshape(a.shape)  # creates new matrix "c"
print("a:\n", a, "\nc:\n", c)
print("a+c:\n", a+c)
print("a+1:\n", a+1)
print("a*c:\n", a*c)
print("a*3:\n", a*3)

There are also lots of useful functions that work on each entry individually (these are called "universal functions"). Here are a bunch of examples:

In [138]:
b = np.arange(7,0,-2) # from 7 down to (but excluding) 0, in steps of -2
print("b =", b)

print("np.exp(b)= ", np.exp(b))
print("np.exp2(b)=", np.exp2(b))    # 2^()
print("np.log(b)= ", np.log(b))
print("np.log2(b)=", np.log2(b))    # log base 2
print("np.sin(b)= ", np.sin(b))
print("np.sqrt(b)=", np.sqrt(b))

b = [7 5 3 1]
np.exp(b)=  [1096.63315843  148.4131591    20.08553692    2.71828183]
np.exp2(b)= [128.  32.   8.   2.]
np.log(b)=  [1.94591015 1.60943791 1.09861229 0.        ]
np.log2(b)= [2.80735492 2.32192809 1.5849625  0.        ]
np.sin(b)=  [ 0.6569866  -0.95892427  0.14112001  0.84147098]
np.sqrt(b)= [2.64575131 2.23606798 1.73205081 1.        ]
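This is also where NumPy's functions differ from their namesakes in the `math` module used at the start of these notes: `math` functions accept only single numbers. A small sketch:

```python
import math
import numpy as np

b = np.array([7, 5, 3, 1])

print(np.exp(b))        # NumPy: works entry by entry on the whole array
print(math.exp(b[0]))   # math: fine for a single number (here e^7)

try:
    math.exp(b)         # math: a whole array cannot be converted to one number
except TypeError as err:
    print("TypeError:", err)
```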


## Functions that operate on matrices or vectors as a whole

In [139]:
print("b   =  ",b)
print("sum(b)=   ", np.sum(b))
print("min(b)=   ", np.min(b))
print("max(b)=   ", np.max(b))
print("median(b)=", np.median(b))
print("mean(b)=  ", np.mean(b))    # average of the entries
print("var(b)=   ", np.var(b))     # variance of the entries
print("stdev(b)= ", np.std(b))     # standard deviation of the entries
print("sort(b)=  ", np.sort(b))    # new ndarray with the entries sorted in increasing order

b   =   [7 5 3 1]
sum(b)=    16
min(b)=    1
max(b)=    7
median(b)= 4.0
mean(b)=   4.0
var(b)=    5.0
stdev(b)=  2.23606797749979
sort(b)=   [1 3 5 7]
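Two options of these functions are worth knowing for measured data, shown in the sketch below. First, `np.var` and `np.std` divide by the number of entries $N$ by default, while `ddof=1` makes them divide by $N-1$ (the usual "sample" estimate). Second, `axis` applies a function along one dimension of a matrix instead of to all entries at once:

```python
import numpy as np

b = np.array([7, 5, 3, 1])
print(np.var(b))           # 5.0: the summed squared deviations (20) divided by N = 4
print(np.var(b, ddof=1))   # about 6.67: divided by N - 1 = 3 instead
print(np.std(b, ddof=1))   # about 2.58: square root of the line above

a = np.array([[1, 2, 9], [8, 5, 6]])   # the matrix a from above
print(np.sum(a))           # 31: sum of all entries
print(np.sum(a, axis=0))   # [ 9  7 15]: sum down each column
print(np.mean(a, axis=1))  # mean of each row: 4.0 and about 6.33
```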


## Linear algebra

`NumPy` contains many common linear algebra operations and algorithms. If you have taken a linear algebra course before, you will appreciate how quickly it can take a matrix's inverse and do other operations on large matrices :)  

For a complete reference, see [the NumPy documentation](https://docs.scipy.org/doc/numpy/reference/routines.linalg.html), but here are some examples:

In [46]:
# transpose
a.T

array([[1, 8],
       [2, 5],
       [9, 6]])

In [52]:
# product of matrices
print("a =\n", a, "\n and the matrix product of a and transpose of a:")
np.dot(a,a.T)

a =
 [[1 2 9]
 [8 5 6]] 
 and the matrix product of a and transpose of a:


array([[ 86,  72],
       [ 72, 125]])
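Note that `*` between two arrays is *not* the matrix product: as in the section on operators above, it multiplies entry by entry. A sketch contrasting the two (NumPy raises a `ValueError` when the shapes don't fit a matrix product):

```python
import numpy as np

a = np.array([[1, 2, 9], [8, 5, 6]])   # the matrix a from above

print(a * a)             # entry by entry: every entry squared, shape (2, 3)
print(np.dot(a, a.T))    # matrix product (2x3)(3x2): shape (2, 2), as above

try:
    np.dot(a, a)         # (2x3)(2x3): inner sizes 3 and 2 do not match
except ValueError as err:
    print("ValueError:", err)
```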

In [None]:
# invert the matrix
g = np.array([[1,3],[2,5]])
print("g =\n", g, "\n and inverse of g:")
print(np.linalg.inv(g))
print("and the matrix product of g and g-inverse is: \n", np.dot(g,np.linalg.inv(g)))
print("the identity matrix of course! \n")
h = np.array([[1,4],[2,5]])
print("h =\n", h, "\n and inverse of h:")
print(np.linalg.inv(h))
print("and the matrix product of h and h-inverse is: \n", np.dot(h,np.linalg.inv(h)))
print("the identity matrix of course!")

In [None]:
# compute the determinant
np.linalg.det(g)
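Because computers work with floating-point numbers, the product of a matrix and its computed inverse may differ from the identity by tiny rounding errors, so a comparison with a tolerance (`np.allclose`) is safer than an exact one. The sketch below also uses `np.linalg.solve`, which solves $g\,x = \text{rhs}$ directly and is usually preferred over computing the inverse first; the right-hand side `rhs` is a made-up example:

```python
import numpy as np

g = np.array([[1, 3], [2, 5]])
g_inv = np.linalg.inv(g)

# Compare with a tolerance instead of exactly
print(np.allclose(np.dot(g, g_inv), np.eye(2)))   # True

# Solve g x = rhs without forming the inverse
rhs = np.array([1, 2])
x = np.linalg.solve(g, rhs)
print(x)                                   # approximately [1. 0.]
print(np.allclose(np.dot(g, x), rhs))      # True
```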

# Matplotlib for plotting

Matplotlib is an extensive plotting library for Python. It consists of a number of components:
- Matplotlib API: the "front end" of the package for users, providing numerous primitive functions to create plots.
- Renderers: modules that know how to draw to different devices or file types, such as postscript, svg, png, etcetera
- PyPlot: a Matlab-like plotting interface.

You will mostly use matplotlib via PyPlot. (There is also a thing called PyLab, which is the same as PyPlot but includes NumPy in the same namespace. It is cleaner to use these two modules in their own namespace, so we will not use PyLab.)

Matplotlib works together beautifully with Numpy.

To gain access to PyPlot, and to call its functions using the abbreviation `plt.` (instead of `matplotlib.pyplot.`), it is customary to import it like this:

In [59]:
import matplotlib.pyplot as plt

## Scatter and Line plots

One of the most common uses of plotting tools is to graph a bunch of $(x,y)$ coordinate pairs:

In [None]:
plt.plot(range(9),[3,5,2,8,9,7,13,11,22])
plt.show()

The `plot` function generates the plot, and `show` displays it on the default output device and blocks (suspends execution) until the output device is done plotting.  It is possible to get Python to create a separate viewing window containing the plot, but then the program's execution only resumes once the window is closed.

If plots are not embedded in the Jupyter notebook by default and you would like them to be, you can arrange this with the following code line.  (This is not part of Python, but a special "magic" command interpreted by Jupyter.)

In [136]:
%matplotlib inline

Alternatively, if you want some fancy plot-interaction tools inline, you can turn them on with this:

In [120]:
%matplotlib notebook

Using either of these options, the default output device will be the notebook itself. It does not block: output is immediately rendered and Python proceeds to execute code. Still, it's important to call `show` to signal that the plot is complete and that no more embellishments need to be drawn.

In [None]:
plt.plot(range(9),[3,5,2,8,9,7,13,11,22])
plt.show()

Assuming you're in the *matplotlib notebook mode*, after you are done interacting with the figure it's probably a good idea to hit the "stop interaction" button in the upper right of the figure to free up system resources.

In fact, for now, we keep things simpler and return to the regular inline mode WITHOUT plot interaction, so that we don't need to remember to keep stopping plots:

In [122]:
%matplotlib inline

Note that in this particular plot the $x$-coordinates of the points are not particularly exciting: they just correspond to the indices of the values in the array.  In that case the $x$-values can be omitted, because PyPlot automatically plots an array's values against their indices when only one array is specified.

Also, instead of a plain line, we can select the point and line styles with additional keyword arguments (or with a short format string such as `'g*--'`). Below is an example, but see the online documentation for more possibilities: there are a *lot* of available marker and line properties!

In [None]:
plt.plot([3,5,2,8,9,7,13,11,22], marker='*', linestyle='--', linewidth=2, color='g')
plt.show()

The supplied coordinates can describe any two dimensional path; often it's better to suppress lines altogether and use only markers to get a scatterplot (usually the data points correspond to actual measurements, and thus the connecting lines are often meaningless - and sometimes even distracting or misleading!). 

We can also specify the ranges of the axes and add labels and a legend. 

For constructing plots below, we can use the random number generating functions in NumPy to easily generate a bunch of normally distributed numbers to simulate noise in a measurement.

In [None]:
# run this code line several times and observe randomness!
np.random.randn(10)   # 10 random picks from standard normal dist: (center=0, stdev=1)

In [None]:
xs  = np.random.randn(100)*2-5
ys  = np.random.randn(100)*2-3
xs2 = np.random.randn(100)
ys2 = np.random.randn(100)

# add labels to the datasets
plt.plot(xs, ys,           color='r', marker='o', 
         linestyle='None', label='bunchapoints')
plt.plot(xs2, ys2,          color='g', marker='+',
         linestyle='None', label='very different points')

# change the axis limits using axis() 
# - axis('off') disables the axes altogether
# - can also use xlim, ylim to set the axes individually
plt.axis([-10,4,-8,4])

# change the vertical axis tick marks to only include even numbers
plt.yticks(np.arange(-8,6,2))

# add grid lines
plt.grid(True)

# add labels to the axes
plt.xlabel("The x coordinates of the points, yo!")
plt.ylabel("Large values of y over there! -->")

# add a legend listing the labels of the data sets
plt.legend()

plt.show()

If you want to plot data on a log scale, you can use `plt.yscale('log')`, and likewise `plt.xscale('log')` for the horizontal axis. A shorthand is to replace the `plot` function by `semilogx`, `semilogy` or `loglog`.

As a demo, consider a random walk, which starts at position 0 and where each step adds a standard normal random number to the position. The standard deviation of the position after $n$ steps is $\sqrt{n}$, which will look like a straight line on a double logarithmic scale.
So we can nicely visualise the size of *actual* deviations of the walk compared to the expectation.
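The $\sqrt{n}$ behaviour follows because the steps are independent, so their variances add. Writing $s_k$ for the $k$-th step and $x_n$ for the position after $n$ steps (symbols introduced here for the derivation):

$$
x_n = \sum_{k=1}^{n} s_k, \qquad \langle s_k \rangle = 0, \qquad \mathrm{Var}(s_k) = 1
$$

$$
\mathrm{Var}(x_n) = \sum_{k=1}^{n} \mathrm{Var}(s_k) = n
\quad\Longrightarrow\quad \sigma_n = \sqrt{n}
\quad\Longrightarrow\quad \log \sigma_n = \tfrac{1}{2} \log n
$$

So on a double logarithmic scale the expected deviation is a straight line with slope $1/2$.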

In [None]:
# again, since random data are generated with each running, 
# you can rerun this code and see different results each time!

# First generate a random walk and show it
n = 10000
walk = np.cumsum(np.random.randn(n))
sdev = np.sqrt(np.arange(1, n+1))   # walk[k] is the position after k+1 steps, with stdev sqrt(k+1)
plt.plot(walk)
plt.plot(sdev, 'g')
plt.plot(-sdev, 'g')
plt.show()

In [None]:
# Now plot the walk on a loglog scale 

# and rerun first the above code and then this one to plot new results
# if you're in matplotlib "notebook" mode, you must first "stop interaction" 

plt.loglog(np.abs(walk))
plt.loglog([1,n],[1,np.sqrt(n)], "g-")
plt.show()

We've now made two related plots; it's often useful to make a bunch of plots and group them together in a single figure. This is done with the `figure` and `subplot` functions. The `subplot` function is given numeric arguments that define the number of rows and columns of subplots and the index of the subplot we will proceed to define. It returns a handle to the subplot (an "axes" object), which we can use to identify which subplot we want to modify.

In [None]:
# This reuses the data above, but combines the two plots into one figure

# Create the figure and make it a bit wider
fig = plt.figure(figsize = (12,4))

# Create two subplots
p1a = plt.subplot(1,2,1)  # in figure with 1 row and 2 columns, this will be index =1
p2a = plt.subplot(1,2,2)  # in figure with 1 row and 2 columns, this will be index =2

# Plot the desired graphs in the right subplots
p1a.plot(walk)
p1a.plot(sdev, 'g')
p1a.plot(-sdev, 'g')

p2a.loglog(np.abs(walk))
p2a.loglog([1,n],[1,np.sqrt(n)], "g-")

plt.show()

## More kinds of plots

PyPlot supports many different kinds of graphs. We've covered line plots and scatter plots, but here is a brief demonstration of some of the most important kinds of graphs, where we include several plots in one figure with PyPlot's fancy `subplots()` function:

In [None]:
# create a figure and a 2x2 grid of subplots all at once
fig, ax = plt.subplots(2,2,figsize=(10,10))

# make a histogram in top-left position
ax[0][0].hist(np.random.randn(1000), bins=20)
ax[0][0].axis('off')        # can turn off the axis lines
ax[0][0].set_title("hist")


# make a bar chart in top-right position
ax[0][1].bar([2, 5, 6] ,bottom=[0,1,3], width=[1,2,1], height=[1,3,2],\
             color=['r', '#00b000', 'b']) # long code line continued with \, 
                                          # #00b000 is a hex-triplet color code
ax[0][1].set_title("bar")
#ax[0][1].axis('off')       # can turn off the axis lines


# generate a two dimensional array zs, to demonstrate contour and heatmap plots
def scalarfield(dx, dy):
    return np.sqrt(dx*dx+dy*dy)
ps = 200
zs = np.empty([ps,ps])
for x in range(ps):
    for y in range(ps):
        zs[x,y] = scalarfield(x-.1*ps, y-.3*ps) - scalarfield(x-.7*ps, y-.6*ps)

# generate contour plot: lines of constant zs, like height lines on a map
lvl = np.linspace(zs.min(), zs.max(), 12) # 12 contour lines
ax[1][0].contour(zs, levels=lvl)  # here, matrix row & column indices = independent variables
#ax[1][0].axis('off')   # can turn off the axis lines
ax[1][0].set_title('contour')

# generate heat map plot
ax[1][1].pcolor(zs)     # here, matrix row & column indices = independent variables
ax[1][1].axis('off')    # can turn off the axis lines
ax[1][1].set_title('pcolor')
        
plt.show()
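Because `scalarfield` only uses arithmetic and `np.sqrt`, which act entry by entry, the double loop that fills `zs` can also be replaced by a single array expression. A sketch using `np.indices` (a NumPy function not used elsewhere in these notes), which returns one array of row indices and one of column indices; the final line checks that both approaches agree:

```python
import numpy as np

def scalarfield(dx, dy):
    return np.sqrt(dx*dx + dy*dy)

ps = 200
# x[i, j] == i and y[i, j] == j for every entry of a (ps, ps) grid
x, y = np.indices((ps, ps))
zs_fast = scalarfield(x - .1*ps, y - .3*ps) - scalarfield(x - .7*ps, y - .6*ps)

# The double loop used above, for comparison
zs_loop = np.empty([ps, ps])
for i in range(ps):
    for j in range(ps):
        zs_loop[i, j] = scalarfield(i - .1*ps, j - .3*ps) - scalarfield(i - .7*ps, j - .6*ps)

print(np.allclose(zs_fast, zs_loop))   # True
```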

## Plotting data imported from a file

The code below reads in data from a text file and plots some of them against each other in scatter plots.

**NOTE:**  It will only find the data file if it is located at `datafolder/testdatafile.txt`, that is, in a subfolder named `datafolder` of the directory where you are running this Jupyter Notebook.

In [None]:
# this code reads in data from file and plots them

# Do you want to change whether the figure is interactive?  then one of these may help:
#%matplotlib inline
#%matplotlib notebook
#%matplotlib tk

# import relevant libraries (if not already imported)
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path  # pathlib.Path formats pathnames to suit your operating system


# declare the appropriate filename as a string:
filename = "testdatafile.txt"

# Open file for reading with handle: "f", which is then acted upon below:
sub_folder = Path("datafolder/")     # set name of sub-directory where data is to be stored
file_path = sub_folder / filename    # file path is combination of filename and its folder
f = open(file_path, 'r')             # option 'r' opens for reading only

# In case the file starts with text:  Read & then ignore any header line(s), one per line:
#header1 = f.readline()
#header1 = f.readline()
#...
#in my example of saved SERIALPORT DATA, there are NO header lines!

# Create empty lists in which to add the data from each column in the file
#in my example of saved SERIALPORT DATA, there are four columns, corresponding to:
i = []  # index of measurement
t = []  # time in microseconds of measurement
a = []  # analog reading (in 10 bits, 0-1023) of signal
d = []  # digital reading (0 or 1) of signal

# This "For Loop" continues over all the lines and extracts values from each column
for line in f:
  columns = line.split()
  i.append(int(columns[0]))  # if there are extra delimiters (eg commas) to strip, we use: 
  t.append(int(columns[1]))    # columns[1].strip(",")
  a.append(int(columns[2]))
  d.append(int(columns[3]))

# We're done reading from the file, so we close it to free up system resources:
f.close()

print('\n Your data file included '+ str(len(i)) +' data lines') 

# Instead of lists, we often want ndarrays to more easily analyze and plot
# Define a list with the four ndarrays as its four elements:
data_array_list = [np.array(i,dtype='i'), np.array(t,dtype='i'),\
                   np.array(a,dtype='i'), np.array(d,dtype='i')]   # long line continued: \ 

# This figure initialization and for loop creates 3 subplots
fig, ax = plt.subplots(1,3,figsize=(12,5))  #fig is whole figure, ax is list of subplots
axislabels = ["measurement's index",'time (microseconds)','analog reading','digital reading']
counter = 0          # counter is for referring to the three subplots (0,1,2)
for k in (0,2,3) :   # k is for referring to the three y-axis arrays & their labels: i, a, d 
    ax[counter].plot(data_array_list[1], data_array_list[k], '.')
    ax[counter].set(xlabel=axislabels[1])   # labels, etc in subplots are defined via .set()
    ax[counter].set(ylabel=axislabels[k])   # labels, etc in subplots are defined via .set()
    ax[counter].ticklabel_format(axis='x', style='sci', scilimits=(0,0), useMathText=True)
                # .ticklabel_format() formats axis labels, here giving scientific notation
    counter += 1

# Display the figure:
plt.tight_layout()      # a special function that helps avoid overlap of subplots & axis labels
plt.show()

After importing the file's data with above code, you may want to only plot the analog reading versus time graph:

In [None]:
# after importing file's data with above code, this only plots analog reading versus time:

# uncomment next line if you want an interactive figure:
#%matplotlib notebook

plt.figure(figsize=(12,5))
plt.plot(data_array_list[1], data_array_list[2], 'o-')  # 'o-' = Dots + lines
plt.xlabel(axislabels[1])       # label for the horizontal axis
plt.ylabel(axislabels[2])       # label for the vertical axis
plt.ticklabel_format(axis='x', style='sci', scilimits=(0,0), useMathText=True)
plt.show()

## Easier data import function

Finally, please note that data loading can be accomplished with fewer lines of code with the help of the function `loadtxt()` in the numpy library.  This allows us to replace the following lines of code in the previous method:

```
f = open(file_path, 'r')             # option 'r' opens for reading only

i = []  # index of measurement
t = []  # time in microseconds of measurement
a = []  # analog reading (in 10 bits, 0-1023) of signal
d = []  # digital reading (0 or 1) of signal

for line in f:
  columns = line.split()
  i.append(int(columns[0]))  # if there are extra delimiters (eg commas) to strip, we use: 
  t.append(int(columns[1]))    # columns[1].strip(",")
  a.append(int(columns[2]))
  d.append(int(columns[3]))

f.close()

data_array_list = [np.array(i,dtype='i'), np.array(t,dtype='i'),\
                   np.array(a,dtype='i'), np.array(d,dtype='i')]   # long line continued: \ 

```

with just one line:

```
array = np.loadtxt(file_path, delimiter='\t') # loads file's data (tab-separated) into "array"
                                              # for comma-separated-value data:  delimiter=','
```

Thus, in subsequent documents, we will use this easier method.  But, for some file types, the `loadtxt()` function may not work, thus it's useful to have seen the method above.
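`loadtxt` also accepts any file-like object, which makes it easy to try out without a real file. A sketch using `io.StringIO` as a stand-in for `testdatafile.txt` (the numbers are invented for illustration, in the same four-column, tab-separated layout); `dtype` and `unpack` are standard `loadtxt` parameters:

```python
import io
import numpy as np

# A stand-in for a small tab-separated data file (made-up values)
fake_file = io.StringIO("0\t120\t512\t1\n"
                        "1\t240\t530\t1\n"
                        "2\t360\t498\t0\n")

data = np.loadtxt(fake_file, delimiter='\t')
print(data.shape)   # (3, 4): one row per line, one column per field
print(data.dtype)   # float64: loadtxt returns floats unless told otherwise

# unpack=True returns the columns separately; dtype=int keeps integer values
fake_file.seek(0)   # rewind the "file" to read it again
index, times, analog, digital = np.loadtxt(fake_file, delimiter='\t', dtype=int, unpack=True)
print(analog)       # [512 530 498]
```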

In [None]:
#  here we load the same data but use the loadtxt() function in numpy:

import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path  # pathlib.Path formats pathnames to suit your operating system

### uncomment one of the following:  
### "notebook" plotting mode allows for interactive (zoom-able) plots, 
### "inline" doesn't and just shows static plots
# %matplotlib notebook   
# %matplotlib inline

# declare the appropriate filename as a string:
filename = "testdatafile.txt"   # was sent in zipfile along with this Jupyter Notebook

sub_folder = Path("datafolder/")     # set name of sub-directory where data is to be stored
file_path = sub_folder / filename    # file path is combination of filename and its folder

array = np.loadtxt(file_path, delimiter='\t') # loads file's data (tab-separated) into "array"
                                              # for comma-separated-value data:  delimiter=','
    
# "array" has 4 columns, 1st = indices, 2nd = times, 3rd = analog readings, 4th = digital readings
# here, we create new arrays for each of them:
indexarray = array[:,0]    # [:,0] means values from ":"=ALL rows and column 0 (the first column)
timearray = array[:,1]     # [:,1] means values from ":"=ALL rows and column 1 (the second column)
analogarray = array[:,2]   # [:,2] means values from ":"=ALL rows and column 2 (the third column)
digitalarray = array[:,3]  # [:,3] means values from ":"=ALL rows and column 3 (the fourth column)

plt.figure(figsize=(9, 6))
plt.plot(timearray, analogarray, ".") 
plt.xlabel("time (microseconds)")
plt.ylabel("analog signal (bits = arbitrary units)")
plt.show()

#  NOTE:  if you're in %matplotlib notebook Mode, you may have to run this twice before it works

# Acknowledgements

This document includes significant material, structure, and inspiration from documents in Professor Gary Steele's "Introduction to Python for Physicists" (https://gitlab.tudelft.nl/python-for-applied-physics/practicum-lecture-notes).

Jan Koetsier is largely to thank for the adaptation and extension of these materials for Maker Lab students.

Questions or suggestions can be sent to Forrest Bradbury (https://orcid.org/0000-0001-8412-4091) :  forrestbradbury ("AT") gmail.com