
Python is a very powerful object-oriented programming language that is used in most fields of science, both for computing and for data analysis.
This document is a very simplistic introduction to Python written by a non-Python expert, and is designed to be a quick introduction to Python for CHEM2000 students.
In many places the nomenclature is likely to be inaccurate and you are encouraged to consult more rigorous resources developed by real Python programmers.

This document is aimed at providing enough knowledge to solve all the numerical laboratories that are part of the CHEM2000 unit using Python, but it is far from comprehensive. There are many different ways to solve problems numerically and different libraries/packages can be used; here we are providing only one (or a few) possible ways of doing things.

The entire document and the examples therein have been developed using a **Jupyter Notebook**, and work best there.

A Jupyter notebook consists of a series of cells that can be either _text_ or _code_. A cell can be _executed_ by pressing the **run** button or _Shift+Return_ (if you want to look like a pro). If things go pear-shaped, restarting the kernel and rerunning all the cells may help; this can be done by pressing the _fast forward_ button.

Text cells accept _Markdown_ syntax and, once executed, render the text. Markdown provides a simple way of producing formatted text and it accepts _LaTeX_ commands for equations, which look much nicer than those made with the equation editor in MS Word. For a Markdown tutorial click [here](https://www.markdowntutorial.com).

The code cells contain Python 3 commands. Note that Python 3 differs from Python 2 in some respects, _e.g._ the **print** command works differently.
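
As a minimal illustration of the **print** difference mentioned above: in Python 3 **print** is a function, so its arguments must be placed inside parentheses. The messages below are arbitrary examples.

```python
# In Python 3, print is a function: its arguments go inside parentheses
print("Hello world")
print("Result :", 1 + 2)   # several arguments are separated by commas

# In Python 2 the line below was valid, but in Python 3 it is a syntax error
# print "Hello world"
```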

# My first Jupyter Notebook
In general you would use the first _code_ cell to import all the packages that you need in the remainder of the code. We can import entire packages or parts of packages, and assign aliases to the package names, _e.g._

In [None]:
# python packages
import pandas as pd # Dataframes and reading CSV files
import numpy as np # Numerical libraries
import matplotlib.pyplot as plt # Plotting library
from lmfit import Model # Least squares fitting library

Using aliases is particularly important because it ensures that, if we define a variable that has the same name as a function belonging to a package, there is no confusion about what the code does.
```python
# This is a variable
mean = 10 

# This is the NumPy function to compute the mean of an array of numbers
np.mean([1,2,3])
```

Things to keep in mind when programming in Python are

1. It is Case sensitivE

```python
average = 0
```
is different from
```python
aVeRagE = 0
```

2. Variables' names cannot have spaces. This line would give you a _syntax error_
```python
number of values = 10
```

3. Spaces between operators are ignored, _1+2_ is the same as _1 + 2_

In [None]:
print(1 + 2)
print(1+2)

4. \# starts a comment and everything to its right is ignored

In [None]:
# This is a comment
print("Hello world") # This is also a comment

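5. A single command can spread over multiple lines. If we split the line between the arguments inside a pair of parentheses, no line continuation character is required. Otherwise we can use the "\" character. Note the different results of the two commands below: in the second one, the spaces at the start of the continuation line become part of the string.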

In [None]:
print("Hello", 
      "world")
print("Hello \
       world")

6. Indentation matters! Indentation is used to define the content of loops, functions, etc. This will become clearer when we start using functions.
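
To make the last point more concrete, here is a small sketch of how indentation groups lines together. The list of temperatures and the 298 K threshold are made-up values used only for illustration.

```python
temperatures = [250, 300, 350]   # hypothetical values in kelvin
numberAboveRoom = 0

for temperature in temperatures:
    # these indented lines belong to the loop
    if temperature > 298:
        # this line is indented further, so it belongs to the if statement
        numberAboveRoom += 1

# this line is not indented, so it runs only once, after the loop
print("Temperatures above 298 K :", numberAboveRoom)
```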

# Good programming practices

1. Use meaningful names for your variables so that you know what's inside; there is no character limit.
2. Use a consistent style and convention for your variables; the code will look neater. I like to use the _camel case_ style (numberOfValues), which allows me to separate words without using spaces. Alternatively you can use the _Pascal case_ style (NumberOfValues), use the underscore character to separate words (number_of_values), or create your own style.
3. Comment your code well. It may be obvious what it does when you write it, but it won't be so obvious after a year or more.

# Python as a simple calculator

Let's start by doing some simple mathematical operations using this Jupyter Notebook; addition, multiplication, division $\dots$, just to get familiar with the Jupyter Notebook.

In [None]:
2 + 3

In [None]:
4 * 3

In [None]:
12 / 3

In [None]:
2**3
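
One detail worth knowing about division in Python 3 is that the `/` operator always returns a floating point number, even when the result is a whole number. The short sketch below also shows the related `//` and `%` operators; the variable names are arbitrary.

```python
trueDivision = 12 / 3      # the / operator always returns a floating point number
floorDivision = 13 // 3    # the // operator rounds the result down to the nearest integer
remainder = 13 % 3         # the % operator returns the remainder of the division

print("12 / 3  =", trueDivision)
print("13 // 3 =", floorDivision)
print("13 % 3  =", remainder)
```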

The same operations can be done using variables. We can first define two variables _a_ and _b_ and then use them in the following cells

In [None]:
a = 12
b = 3
a + b

This is not a very efficient way of working because the result of the operation is not available to the rest of the code after the cell has been executed. Typically, we want to store the result in a new variable and then write it using the _print_ command.

In [None]:
a = 10
b = 20
c = a + b
print("Result :",c)

# Python as a scientific calculator
Not all scientific operators are natively available in Python, but they can be accessed through optional packages that are loaded at the beginning of your notebook, _e.g._ **NumPy**.
NumPy is one of the most commonly used Python libraries; it contains operators for the square root, logarithm and exponential, the trigonometric functions, a suite of constants and much more. 
For more information see [https://numpy.org](https://numpy.org).

In the examples below we show how to use NumPy to access some of these functions.
* Note that we access the NumPy functions using the _np_ prefix because of the way we imported the NumPy library at the beginning of this notebook.

In [None]:
print("The approximate value of pi is                     :",np.pi)
print("The approximate value of Euler's number (e) is     :",np.e)
print("The square root of two is                          :",np.sqrt(2))
print("The natural logarithm of two is                    :",np.log(2))
print("The logarithm base 10 of two is                    :",np.log10(2))
print("The sine of pi is                                  :",np.sin(np.pi))
print("The cosine of pi is                                :",np.cos(np.pi))
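
Because computers store numbers with finite precision, the sine of pi printed above is a very small number rather than exactly zero. A possible way to handle this, sketched below, is to compare floating point numbers with the NumPy function `np.isclose` instead of the `==` operator.

```python
import numpy as np

sineOfPi = np.sin(np.pi)

# The exact comparison fails because of rounding errors
print("Is sin(pi) exactly zero?       :", sineOfPi == 0.0)

# np.isclose compares two numbers within a small tolerance
print("Is sin(pi) close to zero?      :", np.isclose(sineOfPi, 0.0))
```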

Even more useful than variables are arrays, which allow us to store many values in one place. Arrays can be created by hand or be the output of other Python functions.
They can contain numbers, strings, other variables, or a mix of types. Strictly speaking, an array created by hand with square brackets is what Python calls a _list_.

In [None]:
arrayOfNumbers = [300, 2, 3.2]
print("Three-element array of numbers     :",arrayOfNumbers)
arrayOfStrings = ["Temperature" , "Pressure" , "Volume"]
print("Three-element array of strings     :",arrayOfStrings)
mixedArray = ["temperature" , 300]
print("The mixed array is                 :",mixedArray)

We can easily access one of the elements of the array, by specifying its location in the array.
* Note that Python starts counting from zero!

In [None]:
print("Second element of the array of numbers :",arrayOfNumbers[1])
print("Second element of the array of strings :",arrayOfStrings[1])
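
Since counting starts from zero, the first element has index 0 and the last element has index equal to the number of elements minus one. The sketch below repeats the array of numbers so that it can run on its own; it also uses the **len** function, which returns the number of elements, and the fact that Python accepts negative indices to count from the end of the array.

```python
arrayOfNumbers = [300, 2, 3.2]
numberOfElements = len(arrayOfNumbers)

firstElement = arrayOfNumbers[0]                     # index 0 is the first element
lastElement = arrayOfNumbers[numberOfElements - 1]   # the last valid index is N - 1
alsoLastElement = arrayOfNumbers[-1]                 # -1 counts from the end of the array

print("First element :", firstElement)
print("Last element  :", lastElement, alsoLastElement)
```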

We can also construct a loop to cycle over the elements of the array.
If we are interested in cycling over one array only we can use the **in** operator.
* It's important to note here that the indentation of the code determines where the loop finishes

In [None]:
for value in arrayOfNumbers:
    print("--- This is inside the loop ----",value)
print("--- This is outside the loop ---")

Alternatively we can create a loop over the indices of the array using the **range** iterator.
* Note the use of the function **len** to compute the size of the array!
* Note that the upper limit of the **range** iterator is not included!

In [None]:
numberOfElements = len(arrayOfNumbers)
print("Number of elements :",numberOfElements)
for index in range(0,numberOfElements):
    print(index,                   # index
          arrayOfStrings[index],   # string
          arrayOfNumbers[index])   # number

**range** is a built-in object of Python 3 that generates its indices one at a time as the loop requests them, rather than storing them all in a list. This is at variance with the NumPy **arange** function, which instead produces an array that we can use normally.
Both the **range** and **arange** functions take up to three arguments: the lower limit of the range (included), the upper limit of the range (not included) and the step. The main difference is that **range** works only with integer numbers, while **arange** can also work with floating point numbers. 
If the step is omitted, one is assumed.
Let's have a look at a couple of examples.

In [None]:
for i in range(0,4):
    print(i)

In [None]:
for i in range(0,4,2):
    print(i)

In [None]:
for i in np.arange(0,4,2):
    print(i)

In [None]:
for i in np.arange(0,4,0.5):
    print(i)

Let's now see how we can use the **range** iterator and the **np.arange** function to generate arrays of equally spaced values. While this is straightforward with the NumPy function, it is more complicated with the **range** iterator, whose output we need to _recast_ into a list.

In [None]:
values = np.arange(1.1,4,0.7)
print(values)

values = list(range(0,5,2))
print(values)

# Operations with arrays
As an illustrative example of using arrays and variables we can now compute the sum and average of all the elements in an array of numbers. In order to do that we initialise a variable to zero, and progressively add the array elements to it.
* Note how we can increment the value of the variable using the += operator. These two commands are equivalent
```python
summ += value
summ = summ + value
```
Those two lines are also equivalent to the following, where we use a temporary variable to explicitly show what the code does
```python
tmp = summ + value
summ = tmp
```
The operators -=, \*= and /= have analogous meanings.
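
A short sketch of these other operators is given below. The variable name and the numbers are arbitrary; each comment shows the value stored in the variable after that line.

```python
volume = 10.0
volume -= 4    # same as volume = volume - 4, volume is now 6.0
volume *= 3    # same as volume = volume * 3, volume is now 18.0
volume /= 2    # same as volume = volume / 2, volume is now 9.0

print("Final value :", volume)
```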

In [None]:
summ = 0.
for value in arrayOfNumbers:
    summ += value 

print("Result of += :",summ)

The average can then be computed directly inside the print statement.

In [None]:
print("Average :",summ/len(arrayOfNumbers))

Many simple operations on arrays, _e.g._ summation, average and standard deviation, can however be performed more efficiently using libraries such as NumPy.
Using these functions will also make your code slimmer and easier to read.

In [None]:
tally = np.sum(arrayOfNumbers)
average = np.mean(arrayOfNumbers)
StDev = np.std(arrayOfNumbers)

print("Sum :",tally)
print("Average :",average)
print("Standard Deviation :",StDev)

Unfortunately there is no NumPy function for computing the standard error, but we can easily compute that from its definition

\begin{equation}
StdErr = \frac{\sigma}{\sqrt{N}}
\end{equation}
where $\sigma$ is the standard deviation and $N$ the number of values used in the calculation.

In [None]:
print("Standard Error :",StDev/np.sqrt(len(arrayOfNumbers)))
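
A detail to be aware of is that, by default, **np.std** divides by $N$ (the _population_ standard deviation). If the _sample_ standard deviation, which divides by $N-1$, is wanted instead, the optional argument `ddof=1` can be used. The sketch below compares the two; which one is appropriate depends on the convention adopted in your laboratory, so treat this as an illustration rather than a prescription.

```python
import numpy as np

arrayOfNumbers = [300, 2, 3.2]
numberOfValues = len(arrayOfNumbers)

populationStDev = np.std(arrayOfNumbers)       # divides by N (NumPy default)
sampleStDev = np.std(arrayOfNumbers, ddof=1)   # divides by N - 1

# Standard error computed from the sample standard deviation
standardError = sampleStDev / np.sqrt(numberOfValues)

print("Population standard deviation :", populationStDev)
print("Sample standard deviation     :", sampleStDev)
print("Standard error                :", standardError)
```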

# Using DataFrames
DataFrames are powerful objects that are part of the _pandas_ package. The most simplistic description of a DataFrame is that it is a two-dimensional mixed array (a table). DataFrames are more than that, as they also include functions that operate on the DataFrame content.
This definition of DataFrames would probably horrify a Python programmer, but it will suffice for the purpose of this course.

DataFrames can be defined by hand, or created by other functions, _e.g._ by reading a Comma Separated Values file (.csv). Let's first see how we can create an empty DataFrame with three columns named "Temperature", "Volume" and "Pressure", together with their units.

In [None]:
# This array is used to define the names of the columns
header = ["Temperature (K)" , "Volume (L)" , "Pressure (bar)"] 

# This is our new dataframe
df = pd.DataFrame(data=None, columns=header)
print(df)
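
A DataFrame defined by hand does not have to start empty. As a minimal sketch, the columns can also be passed as a dictionary, where each key is a column name and each value is the array of data for that column. The name `dfExample` and the numbers below are made up for illustration only.

```python
import pandas as pd

# Each key is a column name, each value is the content of that column
data = {"Temperature (K)": [100, 200, 300],
        "Pressure (bar)": [0.5, 1.0, 1.5]}

dfExample = pd.DataFrame(data)
print(dfExample)
```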

Let's now fill the dataframe using the ideal gas law

\begin{equation}
pV = nRT
\end{equation}

where $p$ is the pressure, $V$ the volume, $n$ the number of moles, $T$ the temperature and $R=8.314\ J/mol/K$ is the ideal gas constant. The pressure, volume and temperature are expressed in the units specified in the header of the DataFrame.

Let's compute the volume of an ideal gas at different pressures and temperatures.

For simplicity we fix the number of moles to 1.
* Note how we use the **range** function to create a sequence of integers and the NumPy **arange** function to create an array of _floating point numbers_.
* The variable _index_ is used to count the number of elements that we already have in the DataFrame, and to add the next one. This works because Python starts counting from zero.
* Note that we used the **loc** function to add an array to the DataFrame at a specific position.
* Note how we also created an array to store all the temperatures we generate; the array is created empty using **= []** and then we append elements to it using the **.append()** function.

In [None]:
R = 8.314 # J/mol/K
n = 1
# Conversion factor from J/bar to litres
conversionFactor = 0.01 

listOfTemperatures = []
for T in range(100,301,50):
    listOfTemperatures.append(T)
    
    for p in np.arange(0.1,1,0.02):
        V = (n * R * T / p) * conversionFactor      
        index = len(df.index)
        df.loc[index]  = [T , V , p] # a vector is added to the DataFrame
        
print(df)
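
As a sanity check of the unit conversion used above, we can compute the first row by hand. With the pressure in bar, $nRT/p$ has units of J/bar; since 1 bar is $10^5$ Pa, 1 J/bar corresponds to $10^{-5}$ m$^3$, that is 0.01 L. The sketch below repeats the constants so that it can run on its own; `firstVolume` is just an illustrative name.

```python
R = 8.314                  # J/mol/K
n = 1                      # number of moles
conversionFactor = 0.01    # from J/bar to litres

T = 100                    # temperature in K
p = 0.1                    # pressure in bar

# Volume in litres for the first temperature and pressure
firstVolume = (n * R * T / p) * conversionFactor
print("Volume at 100 K and 0.1 bar (L) :", firstVolume)
```

The result, about 83.14 L, should match the first volume stored in the DataFrame.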

There are many ways to access the data in a DataFrame. Here we'll show you two: one to quickly get an entire column of the DataFrame, and one to get selected chunks of data using the **iloc** function.

In [None]:
print(df["Temperature (K)"])

For the DataFrame that we have, the **iloc** function takes two arguments, the row and column indices.

In [None]:
print(df.iloc[0,1])

We can then use "**:**" to specify a range of elements that we want to use.
* Note that the lower limit of the range is included while the upper limit is not!
* Note that if one limit of the range is missing, the start/end of the array is assumed.
* Note how we have used **.values** to cast the output data into an array.

In [None]:
print(df.iloc[0:3 , 1  ].values) # the first three volumes
print(df.iloc[0   , 0:3].values) # the first row
print(df.iloc[0   ,  : ].values) # the first row
print(df.iloc[0   , 1: ].values) # the last two elements of the first row
print(df.iloc[0   ,  :2].values) # the first two elements of the first row

# Making a graph
Let's now make a graph with these data using the **matplotlib** library. 
As a start we can make a plot of the entire DataFrame.

The **subplots** function creates two objects, the _figure_ and the _axes_ of the figure itself.
Each of those objects contains functions that we can use to customise the final plot. More in another tutorial.

In [None]:
# Create the figure and axes objects
fig , ax = plt.subplots()
# Add the data to the plot from the DataFrame
ax.scatter(df["Pressure (bar)"] , df["Volume (L)"])
# Display the figure
plt.show()

Let's now do some data manipulation to make a better plot, using a line for each isotherm.
There are many ways of doing this, but here we'll take an educational approach and use _conditional_ statements to select parts of the dataframe, add them to arrays and plot them.

What we are going to do is to choose a temperature, _e.g._ 100 K, and create two arrays (p,v) with the corresponding pressures and volumes, then we will use them for plotting.
* Note that we used the **scatter** and **plot** functions to plot the data as circles with a line overlaid.
* Note that we used the **set** function to add the labels to the axes (_ax_).

In [None]:
T=100
pressure = []
volume = []
for index in range(0,len(df.index)):
    if df.iloc[index,0] == T:
        pressure.append(df.iloc[index,2])
        volume.append(df.iloc[index,1])

print("Pressure array :",pressure[0:3],"...")
print("Volume array :",volume[0:3],"...")

fig , ax = plt.subplots()
ax.scatter(pressure , volume)
ax.plot(pressure , volume)

# Let's add the labels to the axes
ax.set(xlabel="Pressure (bar)")
ax.set(ylabel="Volume (L)")

plt.show()

If we then want to plot all the isotherms in one graph, we can wrap the code above in a loop over all the temperatures we have created. 

* Note the indentation of the body of the **for** loop.
* Note how in this example we select portions of the DataFrame by including a conditional statement in **[]** when we call the DataFrame, instead of using an **if** statement.
* Note that we used **.values** to transform the selected column into an array.

In [None]:
# First we have to create the figure
fig , ax = plt.subplots()

for T in listOfTemperatures:
    pressure = df[df["Temperature (K)"] == T]["Pressure (bar)"].values
    volume = df[df["Temperature (K)"] == T]["Volume (L)"].values

    # add one line to the plot for each temperature
    ax.plot(pressure , volume, label=T)

# Let's add the labels to the axes
ax.set(xlabel="Pressure (bar)")
ax.set(ylabel="Volume (L)")

# Let's also add the legend
ax.legend()

plt.show()
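
The conditional selection used inside the loop above can be unpacked into smaller steps. In the sketch below, which uses a small made-up DataFrame called `dfExample`, the comparison first produces a column of True/False values (often called a _mask_), and the mask is then used inside **[]** to keep only the rows where it is True.

```python
import pandas as pd

# Small hypothetical DataFrame used only for this illustration
dfExample = pd.DataFrame({"Temperature (K)": [100, 100, 150],
                          "Pressure (bar)": [0.1, 0.2, 0.1]})

# Step 1: the comparison returns one True/False value per row
mask = dfExample["Temperature (K)"] == 100
print(mask)

# Step 2: the mask selects the rows where the condition is True
selectedRows = dfExample[mask]

# Step 3: extract one column of the selection as an array
pressure = selectedRows["Pressure (bar)"].values
print("Pressures at 100 K :", pressure)
```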