# Introduction to Python for CHEM2000

In [None]:
import cekComputerLabs as cek
cek.checkGitRepo()

Python is a very powerful object-oriented programming language, used in most fields of science for both computing and data analysis.
This document is a very simplistic introduction to Python, written by a non-Python expert, and is designed to get CHEM2000 students started quickly.


Simple online Python courses for people with little or no previous programming experience can be found on the Software Carpentry website:
* [Plotting and Programming in Python](http://swcarpentry.github.io/python-novice-gapminder/) 
* [Programming with Python](https://swcarpentry.github.io/python-novice-inflammation) 
* [The Unix Shell](https://swcarpentry.github.io/shell-novice/)

Some of those courses are also run in person at Curtin through the Curtin Institute of Computation, but they can easily be done independently and they are designed to be completed in a day (about 7h30m for the Python courses).

This document aims to provide a summary of the key concepts that you'll need to solve the numerical laboratories that are part of the CHEM2000 unit using Python, but it is far from comprehensive. There are many different ways to solve problems numerically and different libraries/packages can be used; here we provide only one (or a few) of the possible ways of doing things.

The entire document and the examples therein have been developed using a **Jupyter Notebook**, and work best there.

A Jupyter notebook consists of a series of cells that can be either _text_ or _code_. A cell can be _executed_ by pressing the **run** button or _Shift+Return_ (if you want to look like a pro). If things go pear-shaped, restarting the kernel and rerunning all the cells may help; this can be done by pressing the _fast forward_ button.

Text cells accept _Markdown_ syntax and once executed will render the text. Markdown provides a simple way of producing formatted text and it accepts _LaTeX_ commands for equations, which look much nicer than those made with the equation editor in MS Word. For a Markdown tutorial click [here](https://www.markdowntutorial.com).

The code cells contain Python 3 commands. Note that some commands in Python 3 are different from Python 2, _e.g._ the **print** command works differently.
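
As a minimal sketch of what "works differently" means in practice, the cell below shows the Python 3 form of **print** and, as a second well-known difference, the behaviour of the division operator. The variable names are illustrative only.

```python
# In Python 3, print is a function, so its arguments go inside parentheses.
# (In Python 2 one would have written:  print "Hello world"  without them.)
print("Hello world")

# Several arguments are separated by a space by default ...
print("Temperature", 300, "K")

# ... and the optional sep argument changes the separator
print("Temperature", 300, "K", sep="_")

# In Python 3 the / operator always returns a floating point number,
# while // performs integer (floor) division
half = 3 / 2
floorHalf = 3 // 2
print(half, floorHalf)
```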

## Getting help

You can ask any Python question on Campuswire, but you will probably get a much quicker answer using Google.
In fact, chances are that someone has already asked your question on Stack Overflow, _e.g._ 
* ["How to print formatted string in Python3?"](https://stackoverflow.com/questions/26862773/how-to-print-formatted-string-in-python3)
* ["Adding a legend to a plot"](https://stackoverflow.com/questions/19125722/adding-a-matplotlib-legend)

If you need help with a specific function, in a Jupyter notebook you can append "?" to its name. For example
```python
import numpy as np
np.mean?
```
will give you the help for that function.

If you know a package has the function you need, but you don't quite remember its name, you can use `tab` to ask the Jupyter notebook to prompt you with the possible functions, _e.g._ try typing
```python
import numpy as np
np.m
```
and then press ```tab``` 

## My first Jupyter Notebook
You can use the interactive buttons below to create an empty notebook for practising some Python coding. A similar interface will be present in all virtual laboratories for this unit. In particular, the last button will create a new notebook for the lab report, which contains the assignment for that week.
You can also open a notebook that you had previously created.

In [None]:
# cek.launchNotebooks()
cek.openNotebook()

In [None]:
cek.convertNotebook()

In general, the first _code_ cell is used to import all the packages that we need in the remainder of the code. We can import entire packages or parts of packages, and assign aliases to the package names, _e.g._

In [None]:
# python packages
import pandas as pd # Dataframes and reading CSV files
import numpy as np # Numerical libraries
import matplotlib.pyplot as plt # Plotting library

Using the package prefix is particularly important because it ensures that, if we define a variable that has the same name as a function belonging to that package, there is no confusion about what the code does.
```python
# This is a variable
mean = 10 

# This is the NumPy function to compute the mean of a list of numbers
np.mean([1,2,3])
```
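
The three ways of importing mentioned above can be sketched as follows. To keep the example runnable anywhere it uses the standard `math` module rather than NumPy, and the alias `m` and the variable names are arbitrary choices made for illustration.

```python
import math                 # import an entire package
from math import sqrt, pi   # import only part of a package
import math as m            # import a package and give it an alias

# The three lines below all call the same square-root function
rootA = math.sqrt(16)
rootB = sqrt(16)
rootC = m.sqrt(16)
print(rootA, rootB, rootC)

# A variable named pi now hides the constant imported without a prefix,
# whereas math.pi (with the prefix) remains unambiguous
pi = 3
print(pi, math.pi)
```

This is why the notebooks in this unit keep the prefix (for example `np.`) in front of every package function.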

Things to keep in mind when programming in Python are

1. It is Case sensitivE

```python
average = 0
```
is different from
```python
aVeRagE = 0
```

2. Variable names cannot contain spaces. This line would give you a _syntax error_
```python
number of values = 10
```

3. Spaces between operators are ignored, _1+2_ is the same as _1 + 2_

```python
print(1 + 2)
print(1+2)
```

In [None]:
print(1 + 2)
print(1+2)

4. \# is a comment and everything on its right is ignored

In [None]:
# This is a comment
print("Hello world") # This is also a comment

5. A single command can spread over multiple lines. If we split the line inside brackets, _e.g._ between the arguments of a function, no continuation character is required. Otherwise we can use the "\\" character. Note the different results of the commands below.

In [None]:
print("Hello", 
      "world")
print("Hello \
       world")
print("Hello \nWorld")

6. Indentation matters! Indentation is used to define the content of loops, functions, conditionals, etc. This will become clearer when we start using loops and functions.

## Good programming practices

1. Use meaningful names for your variables so that you know what's inside; there is no character limit.
2. Use a consistent style and convention for your variables; the code will look neater. I like to use the _camel case_ style (numberOfValues), which allows me to separate words without using spaces. Alternatively you can use the _Pascal case_ style (NumberOfValues), use the underscore character to separate words (number_of_values), or create your own style.
3. Comment your code well, it may be obvious what it does when you write it, but it won't be so obvious after a year or more.

# Python as a simple calculator

Let's start by doing some simple mathematical operations using this Jupyter Notebook; addition, multiplication, division $\dots$, just to get familiar with the Jupyter Notebook.

In [None]:
2 + 3

In [None]:
4 * 3

In [None]:
12 / 3

In [None]:
2**3

The same operations can be done using variables. We can first define two variables _a_ and _b_ and then use them in the following cells

In [None]:
a = 12
b = 3
a + b

This is not a very effective way of working because the result of the operation is not available to the rest of the code after the cell has been executed. Typically, we want to create new variables and then write the result using the _print_ command.

In [None]:
a = 10
b = 20
c = a + b
print("Result :",c)

## Python as a scientific calculator
Not all scientific operators are directly available in Python, but they can be accessed through optional packages that are loaded at the beginning of your notebook, _e.g._ **NumPy**.
NumPy is one of the most commonly used Python libraries; it contains the operators for square root, logarithm and exponential, the trigonometric functions, a suite of constants and much more. 
For more information see [https://numpy.org](https://numpy.org).

In the examples below we show how to use NumPy to access some of these functions.
* Note that we access the NumPy functions using the _np_ prefix because of the way we imported the NumPy library at the beginning of this notebook.

In [None]:
print("The approximate value of pi is                     :",np.pi)
print("The approximate value of Euler's number (e) is     :",np.e)
print("The square root of two is                          :",np.sqrt(2))
print("The natural logarithm of two is                    :",np.log(2))
print("The logarithm base 10 of two is                    :",np.log10(2))
print("The sine of pi is                                  :",np.sin(np.pi))
print("The cosine of pi is                                :",np.cos(np.pi))
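
Two details of the cell above are worth a short sketch: the trigonometric functions expect angles in radians, and results are floating point numbers, so the sine of pi is very small but not exactly zero. The variable names below are illustrative; `np.deg2rad` and `np.isclose` are standard NumPy functions.

```python
import numpy as np

# The result is of the order of 1e-16 rather than exactly zero,
# because pi cannot be represented exactly as a floating point number
sinOfPi = np.sin(np.pi)
print("sin(pi)               :", sinOfPi)

# np.isclose compares two numbers within a small tolerance
isZero = np.isclose(sinOfPi, 0.0)
print("Is sin(pi) close to 0 :", isZero)

# Angles in degrees must be converted to radians first
angleInDegrees = 90
sinOf90 = np.sin(np.deg2rad(angleInDegrees))
print("sin(90 degrees)       :", sinOf90)
```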

## Lists
Even more useful than variables are lists and arrays, which allow us to store many values in one place. Lists can be created by hand or be the output of other Python functions.
They can contain numbers, strings, other variables, or a mix of types.

In [None]:
listOfNumbers = [300, 2, 3.2]
print("One dimensional list of numbers :",listOfNumbers)
listOfStrings = ["Temperature" , "Pressure" , "Volume"]
print("One dimensional list of strings :",listOfStrings)
mixedList = ["temperature" , 300]
print("The mixed list is               :",mixedList)

## Slicing a list (or an array)
We can easily access one (or more) of the elements of the list, by specifying their location in the list.
* Note that Python starts counting from zero!

In [None]:
print("Second element of the list of numbers :",listOfNumbers[1])
print("Second element of the list of strings :",listOfStrings[1])

We can also select a range of elements using two indices separated by a colon. Note that in Python the first index is included while the second isn't. So if we want the last two elements of the list we can use

In [None]:
print("Last two elements of the list of numbers :",listOfNumbers[1:3])

or provide the indices counting from the end of list

In [None]:
print("Last two elements of the list of numbers :",listOfNumbers[-2:])
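
A few more slicing patterns, as a small sketch using the same list of numbers (redefined here so that the cell runs on its own; the names of the results are illustrative):

```python
listOfNumbers = [300, 2, 3.2]

firstElement = listOfNumbers[0]     # Python starts counting from zero
lastElement = listOfNumbers[-1]     # negative indices count from the end
firstTwo = listOfNumbers[:2]        # a missing lower limit means "from the start"
everySecond = listOfNumbers[::2]    # a third number sets the step

print("First element        :", firstElement)
print("Last element         :", lastElement)
print("First two elements   :", firstTwo)
print("Every second element :", everySecond)
```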

## For loops
We can also construct loops to cycle over the elements of the list.
If we are interested in cycling over one list only we can use the **in** operator.
* It's important to note here that the indentation of the code determines where the loop finishes

In [None]:
for value in listOfNumbers:
    print("--- This is inside the loop ----",value)
print("--- This is outside the loop ---")

Alternatively we can create a loop over the indices of the list using the **range** iterator.
* Note the use of the function **len** to compute the size of the list!
* Note that the upper limit of the **range** iterator is not included!

In [None]:
numberOfElements = len(listOfNumbers)
print("Number of elements :",numberOfElements)
for index in range(0,numberOfElements):
    print(index,                   # index
          listOfStrings[index],    # string
          listOfNumbers[index])    # number

**range** is a built-in object of Python 3 that generates its indices one at a time, as the loop asks for them, rather than storing them all in a list. This is at variance with the NumPy **arange** function, which instead produces an array that we can use normally.
Both **range** and **arange** take up to three arguments: the lower limit of the range (included), the upper limit of the range (not included) and the step. The main difference is that **range** works only with integer numbers, while **arange** can also work with floating point numbers. 
If the step is omitted, one is assumed.
Let's have a look at a couple of examples.

In [None]:
for i in range(0,4):
    print(i)

In [None]:
for i in range(0,4,2):
    print(i)

In [None]:
for i in np.arange(0,4,2):
    print(i)

In [None]:
for i in np.arange(0,4,0.5):
    print(i)
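
The difference between the two can be sketched as follows. The example also shows **np.linspace**, which is used later in this notebook and takes the number of points instead of the step; the variable names are illustrative.

```python
import numpy as np

# range only accepts integers; a floating point step raises a TypeError
try:
    list(range(0, 4, 0.5))
except TypeError as error:
    print("range failed:", error)

# np.arange takes (start, stop, step); the stop value is excluded
withStep = np.arange(0, 4, 0.5)
print(withStep)

# np.linspace takes (start, stop, number of points); the stop value is included
withCount = np.linspace(0, 4, 9)
print(withCount)
```

When the upper limit must be part of the result, **np.linspace** is usually the safer choice.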

## Arrays _vs_ lists

We can use the **range** iterator and the **np.arange** function to generate lists and arrays of equally spaced values. While this is straightforward with the NumPy function, it is slightly more complicated with the **range** iterator, whose output we need to _recast_ into a list.

In [None]:
# Values0 is an array
values0 = np.arange(0,5,2)
print(values0)

# Values1 is a list
values1 = list(range(0,5,2))
print(values1)

Although the outputs look similar the variable type is different

In [None]:
print(type(values0),type(values1))
print(type(values0[0]),type(values1[0]))

and while vector operations are allowed for numpy arrays, the same is not true for a list. This means that the first command below is allowed while the second isn't. 

```python
print(12.34 * values0)
print(12.34 * values1)
```

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[29], line 6
      4     return x*12.34
      5 print(12.34 * values0)
----> 6 print(12.34 * values1)

TypeError: can't multiply sequence by non-int of type 'float'
```

It is therefore advisable to store any numerical quantities in NumPy arrays, or to
always transform a list into a NumPy array before using it for mathematical operations or as an argument of a function. 

Moreover, while multiplying a list by a real number gives an error, multiplying a list by an integer _n_ does not multiply its elements: it repeats the content of the list _n_ times.

In [None]:
print(12.34 * values0)
print(12.34 * np.array(values1))
newList = list(12.34 * np.array(values1))
print(newList)
print(2*newList)

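Note that you can add (subtract, multiply, or divide) two arrays, element by element, as if they were variables. The same is not possible with lists: adding two lists joins them end to end, while subtracting, multiplying or dividing them gives an error.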

In [None]:
a = np.array([1.1,2.2,3.3])
b = np.array([4.4,5.5,6.6])
print("+",a+b)
print("-",b-a)
print("*",b*a)
print("/",a/b)
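
To make the contrast explicit, here is a small side-by-side sketch of the same operators applied to lists and to arrays (the names are illustrative only):

```python
import numpy as np

listA = [1, 2, 3]
listB = [4, 5, 6]
joined = listA + listB      # the two lists are joined end to end
repeated = 2 * listA        # the list is repeated twice

arrayA = np.array(listA)
arrayB = np.array(listB)
summed = arrayA + arrayB    # element by element sum
doubled = 2 * arrayA        # every element is multiplied by two

print("list + list   :", joined)
print("2 * list      :", repeated)
print("array + array :", summed)
print("2 * array     :", doubled)
```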

## Use of variables, lists and loops
As an illustrative example of using lists and variables we can now compute the sum and average of all the elements in a list of numbers. In order to do that we initialise a variable to zero, and progressively add the list elements to it.
* Note we also used a different formatted printing command
* Note how we can increment the value of the variable using the += operator. These two commands are equivalent
```python
summ += value
summ = summ + value
```
Those two lines are also equivalent to the following, where we use a temporary variable to explicitly show what the code does
```python
tmp = summ + value
summ = tmp
```
The operators -=, \*= and /= have analogous meanings.
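
As a quick sketch of these operators applied one after the other to an illustrative variable:

```python
value = 10.0
value += 5    # same as value = value + 5
print(value)
value -= 3    # same as value = value - 3
print(value)
value *= 2    # same as value = value * 2
print(value)
value /= 4    # same as value = value / 4
print(value)
```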

In [None]:
summ = 0.
print("The initial value of summ is {}".format(summ))
for i in range(0,len(listOfNumbers)):
    value = listOfNumbers[i]
    summ += value 
    print("Iteration {}, list element {}, value of summ {}".format(i+1,value,summ))
print("The final value of summ is {}".format(summ))
average = summ / len(listOfNumbers)
print("The average is {}".format(average))

The average can then be computed directly inside the print statement.

In [None]:
print("Average :",summ/len(listOfNumbers))

In [None]:
# NumPy converts listOfNumbers into an array internally before operating on it
tally = np.sum(listOfNumbers)
average = np.mean(listOfNumbers)

print("Sum :",tally)
print("Average :",average)
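
The formatted printing used in this section can be sketched in a little more detail. The example below is only illustrative (the variable names and values are made up); it shows how `format()` fills the braces, how a format specification controls the number of decimals, and the equivalent f-string syntax available in Python 3.6 and later.

```python
pressure = 1.01325
label = "Pressure"

# Empty braces are filled, in order, by the arguments of format()
plain = "{} = {} bar".format(label, pressure)
print(plain)

# A format specification after the colon controls the number of decimals
rounded = "{} = {:.2f} bar".format(label, pressure)
print(rounded)

# f-strings place the variables directly inside the braces
fString = f"{label} = {pressure:.2f} bar"
print(fString)
```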

## Conditionals
It is often useful to write statements that "fork" the code execution depending on some condition, typically the value of a variable.
The python syntax for this is
```python
if first condition:
    ...
elif second condition:
    ...
else:
    ...
```
for example

In [None]:
for i in range(0,len(listOfNumbers)):
    value = listOfNumbers[i]
    if value > 100:
        print("Element #{} is greater than 100 ({})".format(i,value))
    else:
        print("Element #{} is smaller than 100 ({})".format(i,value))

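Note that the conditions are evaluated in order, and only the block of the first one that is ```True``` is executed; the remaining conditions are skipped. In the example below the second condition is therefore never reached, because any value greater than 100 is also greater than 2.1.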

In [None]:
for i in range(0,len(listOfNumbers)):
    value = listOfNumbers[i]
    if value > 2.1:
        print("First condition is true ({})".format(value))
    elif value > 100:
        print("Second condition is true ({})".format(value))
    else:
        print("Neither condition is true ({})".format(value))
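
Because the conditions are tested in order, a common remedy is to put the most restrictive test first. A minimal sketch, with illustrative labels and the same list of numbers redefined so that the cell runs on its own:

```python
listOfNumbers = [300, 2, 3.2]

labels = []
for value in listOfNumbers:
    if value > 100:        # most restrictive condition first
        labels.append("large")
    elif value > 2.1:      # only reached if value <= 100
        labels.append("medium")
    else:                  # reached if neither condition is true
        labels.append("small")

print(labels)
```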

## Dictionaries
Python also has another class of data, called dictionaries, which shares many similarities with lists. The main difference is that the entries of a dictionary are accessed through a label (the _key_) rather than through their position. Since Python 3.7, dictionaries keep their entries in the order in which they were inserted.
Each entry of a dictionary is defined by its label, and its value can be of any kind: string, number, list, array, function...

In [None]:
student = {
    "Name" : "John",
    "Age"  : "23",
    "Mark" : 60
}

print(student)
print("Student's name: {}".format(student["Name"]))

In [None]:
students = {
    "Names" : ["John","Zak","Ashlee","Zoe"],
    "Ages" : [23,34,21,25],
    "Marks" : [60,43,87,71]
}
print(students)
print("Students' ages {}".format(students['Ages']))
print("Average mark {}".format(np.average(students['Marks'])))

Note that curly brackets containing only values, without labels, create a _set_ rather than a dictionary. A set is unordered and discards duplicates, so a loop over its elements may not follow the order in which they were written.

In [None]:
wrongDictionary={12,32,34,78,91}
print(wrongDictionary)
for i in wrongDictionary:
    print(i)
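
As a further sketch of how the entries of a dictionary can be modified and looped over (the labels and values below are illustrative only):

```python
student = {"Name": "John", "Age": 23, "Mark": 60}

# Add a new entry, or overwrite an existing one, by assigning to its label
student["Unit"] = "CHEM2000"
student["Mark"] = 65

# Loop over the labels (keys) and the corresponding values together
for key, value in student.items():
    print(key, ":", value)

# The labels can be listed, and we can test whether a label exists
listOfKeys = list(student.keys())
hasAge = "Age" in student
print(listOfKeys, hasAge)
```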

## Functions
Functions can be used to perform (complicated) operations on the input variables, and return a result that can be stored in a variable where the function was called. 

The function below illustrates how to define a function that computes the sum of two numbers
```python
def sumAB(a,b):
    total = a + b
    return total

a = 10
b = 21
c = sumAB(a,b)
print(c)
```

In [None]:
def sumAB(a,b):
    total = a + b
    return total

a = 10
b = 21
c = sumAB(a,b)
print(c)

Functions can take different variable types as input, and the returned value is automatically adjusted to match the input type (at least for the simple functions that will be used in this unit).

The code below illustrates how to define a simple function, which evaluates a mathematical expression, a parabola. 
```python
def parabola(x,a,b,c):
    y = a*x**2 + b*x +c
    return y

a = 2.5
b = -0.1
c = .128

# Evaluate the parabola on one point
xValue = 1
yValue = parabola(xValue,a,b,c)
print(xValue,yValue)
print("---")

# Evaluate the parabola on an array of points
xArray = np.arange(0,5,0.1)
yArray = parabola(xArray,a,b,c)
print(xArray)
print(yArray)
```
where _x_, _a_, _b_ and _c_ are input parameters and _y_ is the result of the function. 
In this case _a_, _b_ and _c_ are numbers, while _x_ could be a number or an array.

In [None]:
def parabola(x,a,b,c):
    y = a*x**2 + b*x +c
    return y

a = 2.5
b = -0.1
c = .128

# Evaluate the parabola on one point
xValue = 1
yValue = parabola(xValue,a,b,c)
print(xValue,yValue)
print("---")

# Evaluate the parabola on an array of points
xArray = np.arange(0,5,0.1)
yArray = parabola(xArray,a,b,c)
print(xArray)
print(yArray)

## Making a plot
Let's now make a graph with these data using the **matplotlib** library. 
As a start we can plot the parabola that we have just computed.

The **subplots** function creates two objects, the _figure_ and the _axes_ of the figure itself.
Each of those objects contains functions that we can use to customise the final plot. More on this in another tutorial.

In [None]:
# Create the figure and axes objects
fig , ax = plt.subplots()

# Add the data to the plot as circles
ax.scatter(xArray,yArray,label="circles")

# Add the data to the plot as a line
ax.plot(xArray,yArray,label="line")

# Let's add the labels to the axes
ax.set(xlabel="x")
ax.set(ylabel="y")

# Let's add the legend
plt.legend()

# Let's add a title to the plot
plt.title("Parabola")
plt.show()

## DataFrames - Creating a DataFrame for the Ideal Gas
DataFrames are powerful objects that are part of the _pandas_ package. The most simplistic description of a DataFrame is that it is a two-dimensional table whose columns can hold different types of data. DataFrames are more than that, as they also include functions that operate on the DataFrame content.
This definition of DataFrames would probably horrify a Python programmer, but it will suffice for the purpose of this course.

DataFrames can be defined by hand, or created by other functions, _e.g._ by reading a Comma Separated Values file (.csv). Let's first see how we can create an empty DataFrame with three columns named "Temperature", "Volume" and "Pressure", with their units.

In [None]:
# This list is used to define the names of the columns
header = ["Temperature (K)" , "Volume (L)" , "Pressure (bar)"] 

# This is our new dataframe
df = pd.DataFrame(data=None, columns=header)
print(df)

Let's now create the DataFrame using the ideal gas law

\begin{equation}
pV = nRT
\end{equation}

where $p$ is the pressure, $V$ the volume, $n$ the number of moles, $T$ the temperature and $R=8.314\ \mathrm{J/mol/K}$ is the ideal gas constant. The pressure, volume and temperature are expressed in the units specified in the header of the DataFrame.

Let's compute the volume of an ideal gas at different pressures and temperatures.

For simplicity we fix the number of moles to 1. Note:
* we use the **range** function to create a sequence of integers and the NumPy **arange** function to create an array of _floating point numbers_.
* the variable _index_ is used to count the number of elements that we already have in the DataFrame, and to add the next one. This works because Python starts counting from zero.
* we used the **loc** function to add a list to the DataFrame at a specific position.
* we also created a list to store all the temperatures we generate; the list is created empty using **= []** and then we append elements to it using the **.append()** function
* we defined a function to evaluate the ideal gas law

In [None]:
# function to compute the volume of an ideal gas given its temperature and pressure
R = 8.314 # Ideal gas constant in J/mol/K
n = 1     # Number of moles
conversionFactor = 0.01 # Conversion factor from J/bar to litre

def idealVolume(T,p):
    return (n * R * T / p) * conversionFactor

listOfTemperatures = []
for T in range(100,301,50):
    listOfTemperatures.append(T)
    
    for p in np.arange(0.1,1,0.02):
        V = idealVolume(T,p)
        index = len(df.index)
        df.loc[index]  = [T , V , p] # a list is added to the DataFrame as a new line
        
print(df)
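
The conversion factor used above follows from the units: $1\ \mathrm{bar} = 10^5\ \mathrm{Pa}$ and $1\ \mathrm{J/Pa} = 1\ \mathrm{m^3}$, so $1\ \mathrm{J/bar} = 10^{-5}\ \mathrm{m^3} = 0.01\ \mathrm{L}$. As a sanity check, the sketch below repeats the definitions from the previous cell (so that it runs on its own) and evaluates the molar volume at 273.15 K and 1 bar, which should come out close to the familiar value of about 22.7 L. The name `molarVolume` is illustrative.

```python
R = 8.314                 # ideal gas constant in J/mol/K
n = 1                     # number of moles
conversionFactor = 0.01   # 1 J/bar = 1e-5 m^3 = 0.01 L

def idealVolume(T, p):
    # V = nRT/p with T in K and p in bar; the result is in litres
    return (n * R * T / p) * conversionFactor

molarVolume = idealVolume(273.15, 1.0)
print("Molar volume at 273.15 K and 1 bar: {:.2f} L".format(molarVolume))
```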

An alternative way to create DataFrames is to create individual lists (or arrays) for the columns and then put them in the DataFrame.

In [None]:
# This list is used to define the names of the columns
header = ["Temperature (K)" , "Volume (L)" , "Pressure (bar)"] 

# This is our new dataframe
df2 = pd.DataFrame(data=None, columns=header)

arrayOfTemperatures = np.linspace(100,300,5)
arrayOfpressures = np.arange(0.1,1,0.02)

for T in arrayOfTemperatures:
    volume = idealVolume(T,arrayOfpressures)
    df_local = pd.DataFrame({
        "Temperature (K)" : T,
        "Volume (L)"      : volume,
        "Pressure (bar)"  : arrayOfpressures
    })
    df2 = pd.concat([df2,df_local], ignore_index=True) # renumber the rows from zero
    
print(df2)

## Slicing a DataFrame
There are many ways to access the data in a DataFrame. Here we'll show you two: one to quickly get an entire column of the DataFrame, and one to get selected chunks of data using the **iloc** function.

In [None]:
print(df["Temperature (K)"])

For the DataFrame that we have, the **iloc** function takes two arguments, the row and column indices.

In [None]:
print(df.iloc[0,1])

We can then use "**:**" to specify a range of elements that we want to use.
* Note that the lower limit of the range is included while the upper limit is not!
* Note that if one limit of the range is missing, the start/end of the row or column is assumed
* Note how we have used **.values** to cast the output data into a NumPy array.

In [None]:
# these return numpy arrays
print(df.iloc[0:3 , 1  ].values) # the first three volumes
print(df.iloc[0   , 0:3].values) # the first row
print(df.iloc[0   ,  : ].values) # the first row
print(df.iloc[0   , 1: ].values) # the last two elements of the first row
print(df.iloc[0   ,  :2].values) # the first two elements of the first row

We can also select the rows that correspond to "conditions" on the content of the DataFrame, for example, we can select the data corresponding to a given temperature or pressure range

In [None]:
df_sliced = df[ (df["Temperature (K)"]<150) & (df["Pressure (bar)"]>0.9) ]
print(df_sliced) 

We can also easily convert the values in a DataFrame column to a NumPy array

In [None]:
print(type(df["Temperature (K)"]))
array = df["Temperature (K)"].values
print(type(array))

Let's now do some data manipulation to make a better plot, using a line for each isotherm.
There are many ways of doing this, but here we'll take an educational approach and use _conditional_ statements to select parts of the DataFrame, add them to lists and plot them.

What we are going to do is to choose a temperature, _e.g._ 100 K, and create two lists (pressure, volume) with the corresponding pressures and volumes; then we will use them for plotting.
* Note that we used the **scatter** and **plot** functions to plot the data as circles with a line overlaid.
* Note that we used the **set** function to add the labels to the axes (_ax_)

In [None]:
T=100
pressure = []
volume = []
for index in range(0,len(df.index)):
    if df.iloc[index,0] == T:
        pressure.append(df.iloc[index,2])
        volume.append(df.iloc[index,1])

print("Pressure list :",pressure[0:3],"...")
print("Volume list :",volume[0:3],"...")

fig , ax = plt.subplots()
ax.scatter(pressure , volume)
ax.plot(pressure , volume)

# Let's add the labels to the axes
ax.set(xlabel="Pressure (bar)")
ax.set(ylabel="Volume (L)")

# Prevent a warning
plt.show()

If we then want to plot all the isotherms in one graph we can wrap the plotting code in a loop over all the temperatures we have created. 

* Note the indentation of the **for** loop.
* Note how in this example we select portions of the DataFrame including a conditional statement in **[]** when we call the DataFrame
* Note that we used **.values** to transform the DataFrame into an array

In [None]:
# First we have to create the figure
fig , ax = plt.subplots()
for T in listOfTemperatures:
    pressure = df[df["Temperature (K)"] == T]["Pressure (bar)"].values
    volume = df[df["Temperature (K)"] == T]["Volume (L)"].values

    # add one line to the plot for each temperature
    ax.plot(pressure , volume, label=T)

# Let's add the labels to the axes
ax.set(xlabel="Pressure (bar)")
ax.set(ylabel="Volume (L)")

# Let's also add the legend
ax.legend()

# The next line will save the figure to a file when uncommented
# plt.savefig("figure.png")
plt.show()