# Session 0: Python and its Scientific Modules
---
Questions:
- "How do I install and use Python modules?"
- "How do I interact with Python code?"
- "What are some popular scientific Python packages?"

Objectives:
- "Ensure students can access a Linux terminal and have basic requirements installed."
- "Understand how Python packages are distributed and the benefits of environment management."
- "Understand how to import Python packages or call them via the command line interface."

---

<div class="alert alert-block alert-success">
<b>Interactive Check Point:</b>
    
Throughout this notebook you will find these boxes. These checkpoints will give you a coding prompt and encourage you to think about what will happen before you try. This process will help increase your Python comprehension as you learn. </div>

---

### Instructor notes (remove before merging to CAMLC24)

I think we can use the Molecular Sciences Software Institute's free courses on [Python Scripting for Computational Molecular Science](https://education.molssi.org/python_scripting_cms/01-introduction/index.html) and [Python Package Best Practices](https://education.molssi.org/python-package-best-practices/) to build a lot of this material.

**IIA**: Maybe some of the code boxes I have included could be executable boxes, so the students can actually execute them and see the output instead of 'spoiling' the result?

## 0.1 Module and library imports

In [None]:
# It is best practice to import all modules at the top of your file

# It is also best practice to be liberal with comments, so if you come 
#  back to the code or someone else tries to read your code it is more 
#  easily understood.

# To make the code easier to read, the sentence above was split across 
#  several lines. How you decide to format things like that is personal preference.

import numpy #https://numpy.org
import matplotlib #https://matplotlib.org
import rdkit #http://www.rdkit.org/docs/index.html
import pandas #https://pandas.pydata.org/docs/getting_started/index.html

If you receive a `ModuleNotFoundError` message, try executing the following in a new cell 
> pip install \\$module 

where \\$module is the name of the module that was not found.
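Before importing, it can be handy to check which modules are actually installed. The sketch below is one possible way to do that, assuming only the standard library's `importlib.util.find_spec`; the variable names `required_modules` and `missing` are illustrative and not part of the course material.

```python
import importlib.util

# Modules used in this session (names as they appear after `import`)
required_modules = ['numpy', 'matplotlib', 'rdkit', 'pandas']

# find_spec returns None when a top-level module cannot be located
missing = [name for name in required_modules
           if importlib.util.find_spec(name) is None]

if missing:
    print('Missing modules:', ', '.join(missing))
else:
    print('All required modules are available.')
```

Any name reported as missing is a candidate for the `pip install` command above.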

## 0.2 Package installation and environment management

There are a variety of ways to install packages. You can install from source code obtained from the project's website or from a GitHub repository, or you can install from a distribution platform. pip is Python's standard package installer, and it downloads packages from the [Python Package Index (PyPI) distribution platform](https://pypi.org), where you can peruse the Python packages available for installation.

[Anaconda](https://anaconda.org) (also see [miniconda](https://docs.anaconda.com/free/miniconda/)) is also a package manager, but it is much more than that. Besides installing and updating packages, Anaconda also creates and manages virtual environments. You can think of a virtual environment as a new computer, independent of any other environment on the machine. This becomes very important if a package you want to use requires a specific version of another package (for example, maybe it requires a Python version < 3.0) while other packages you use require different versions. You can use the `conda create` command to create a virtual environment that is tailored to the dependency requirements of specific packages or systems. If the package has been distributed on Anaconda, you will be able to find conda installation instructions for it. You can also use pip to install the package; just make sure you are using the pip that belongs to your conda environment (check this by running `which pip` and confirming that the path it prints is inside your conda environment). Lastly, Anaconda can manage non-Python packages too, so it is much broader than pip/PyPI, which handle only Python packages.

When working on shared computing resources, you are often required to use virtual environments created with conda or venv. [venv](https://docs.python.org/3/library/venv.html) is solely a virtual environment manager. It can be used similarly to conda, but it only creates Python environments, into which packages are then installed with pip.
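From inside Python, you can get a rough idea of which environment is active. The following is a minimal sketch that assumes conda sets the `CONDA_DEFAULT_ENV` environment variable on activation and that a venv changes `sys.prefix`; the variable names are illustrative.

```python
import os
import sys

# Path of the Python interpreter that is running this code
print('Interpreter:', sys.executable)

# Root directory of the active environment
print('Environment prefix:', sys.prefix)

# conda sets CONDA_DEFAULT_ENV when an environment is activated (None otherwise)
conda_env = os.environ.get('CONDA_DEFAULT_ENV')
print('Active conda environment:', conda_env)

# Inside a venv, sys.prefix differs from sys.base_prefix
in_venv = sys.prefix != sys.base_prefix
print('Running inside a venv:', in_venv)
```

If the interpreter path does not point inside the environment you expected, packages installed with pip may end up somewhere else.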

## 0.3 Python basics

### Variables and print statements

We can assign different types of values to a variable like your name (*string*), age (*integer*) or your height (*float*).

```python
name = 'Pepito Jimenez'
age = 53 #years
height = 1.73 #meters
```
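If you are unsure what type a variable holds, Python's built-in `type()` function will tell you. A short illustration using the same three variables:

```python
name = 'Pepito Jimenez'
age = 53        # years
height = 1.73   # meters

print(type(name))    # <class 'str'>
print(type(age))     # <class 'int'>
print(type(height))  # <class 'float'>
```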

Keep units in mind when you are writing Python code that deals with numerical values. We recommend using comments to remind yourself (and others who may read/use your code) what units you are working with!

We can also perform mathematical operations with these variables and display the results with print statements (`print($variable)`).

In [None]:
pizza = 12.7 #EUR
white_wine = 27.8 #EUR
cheesecake = 5.3 #EUR

total_price = pizza + white_wine + cheesecake

print(f'Total price of dinner is: {total_price} EUR')

Let's say we find a half-off coupon for the pizza (yay!). We can simply re-assign the variable `pizza` to a new value.

In [None]:
pizza = 12.7/2 #EUR

<div class="alert alert-block alert-success">
<b>Interactive Check Point:</b>
    
What do you think will happen now if we execute `print(f'Total price of dinner is: {total_price} EUR')`? Try this in a new cell below. What did you learn? </div>

### Data structures

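There are also other types of data that you can store in variables:
- **lists** 
    - *Mutable*
    - ordered 
    - Defined in between `[]` or with `list()`
- **tuples**
    - *Immutable*
    - ordered 
    - Defined in between `()` or with `tuple()`
- **dictionaries** 
    - *Mutable*
    - not indexed by position (insertion order is preserved in Python 3.7 and later)
    - Defined in between `{}` or with `dict()` as `key: value` pairings (keys must be hashable, e.g. strings or numbers)
    - unique keys
- **sets** 
    - *Mutable* (elements can be added or removed; `frozenset()` gives an immutable version)
    - unordered 
    - Defined in between `{}` or with `set()` (an empty `{}` creates a dictionary, not a set)
    - unique elements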

In [None]:
list_example = ['Pepito Jimenez', 'John Smith', 'Patricia Summers']                 # List of names
tuple_example = ('Pepito Jimenez', 'John Smith', 'Patricia Summers')                # Tuple of names
dictionary_example = {'Pepito Jimenez':53, 'John Smith':27, 'Patricia Summers':32}  # Dictionary of names and their corresponding age
set_example = {'Pepito Jimenez', 'John Smith', 'Patricia Summers'}                  # Set of names
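To make the difference between mutable and immutable concrete, here is a small sketch using shortened copies of the examples above (the variable names are made up for this illustration):

```python
names_list = ['Pepito Jimenez', 'John Smith']
names_tuple = ('Pepito Jimenez', 'John Smith')
ages = {'Pepito Jimenez': 53, 'John Smith': 27}
names_set = {'Pepito Jimenez', 'John Smith'}

names_list.append('Patricia Summers')   # lists can grow
ages['Patricia Summers'] = 32           # dictionaries accept new key:value pairs
names_set.add('John Smith')             # already present, so the set is unchanged

try:
    names_tuple[0] = 'Someone Else'     # tuples cannot be changed in place
except TypeError as error:
    print('Tuples cannot be modified:', error)

print(len(names_list), len(ages), len(names_set))  # 3 3 2
```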

It is possible to refer to the values in a list using their index. 
Python is a "zero-indexed" language, meaning indices start at `0` for the first element of the list.
So, in order to print the second element of the list of names we defined, we should refer to index `1`.


We can also refer to several elements at once using a `:` and one or two indices, known as a slice:
- `list_name[1:4]` extracts the second, third and fourth elements of the list (the upper limit of the range is not included).
- `list_name[:4]` extracts everything from the beginning of the list up to and including the fourth element.
- `list_name[1:]` extracts everything from the second element to the end of the list.
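The three slices above can be tried on a short list of letters (a made-up list, chosen so the positions are easy to follow):

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

print(letters[1:4])  # ['b', 'c', 'd']
print(letters[:4])   # ['a', 'b', 'c', 'd']
print(letters[1:])   # ['b', 'c', 'd', 'e', 'f']
```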

In [None]:
print(list_example[1])

It is also possible to use a negative index in order to start counting from the end of the list. 
The last element of the list has index `-1`.

In [None]:
print(list_example[-1])

<div class="alert alert-block alert-success">
<b>Interactive Check Point:</b>

Try to print the first element in the other data structure examples: `tuple_example`, `dictionary_example`, `set_example`. What did you learn? </div>

<div class="alert alert-block alert-info">
<b>Note:</b>
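In Python 3.7 and later, dictionaries are guaranteed to remember the order in which elements were inserted (CPython 3.6 already behaved this way, but only as an implementation detail). Keep that in mind to avoid compatibility issues when running the same code on different versions of Python. </div>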


### For loops

For loops are used to repeat an action *N* times, or to iterate over a collection such as the lists above and perform an action for every element in it.

#### Basic example:

In [None]:
for name in list_example:
    print(name)
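To repeat something a fixed number of times without a list, one common option is the built-in `range()` function. A minimal sketch (the names `n_repeats` and `squares` are only for illustration):

```python
n_repeats = 3
squares = []

for i in range(n_repeats):      # i takes the values 0, 1, 2
    print('Iteration', i)
    squares.append(i ** 2)      # store the square of the counter

print(squares)  # [0, 1, 4]
```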

It is often helpful to keep track of the number of iterations the loop has gone through. Two ways to do this are shown below.

> Method 1: set a count variable and update it in the loop

In [None]:
count = 0
for name in list_example:
    print(count, name)
    count += 1

> Method 2: use Python's built-in `enumerate()` function

In [None]:
for count, name in enumerate(list_example):
    print(count, name)

Both give the same result, but `enumerate()` accomplishes the task with fewer lines of code, which is often preferable!

#### Scientific example:

In [None]:
energies_kcal = [3.24, 6.12, 9.65]          # List of energies in kcal
energies_kj = []                            # Empty list of energies in kJ
conversion_factor = 4.184                   # Conversion factor from kcal to kj

for energy in energies_kcal:
    energy_kj = energy * conversion_factor
    energies_kj.append(energy_kj)           # Append the converted energy to the kJ list of energies

print(energies_kj)

### Logic statements

Logic statements are used to compare different pairs of data and perform an action only if the statement is `True`.
Several types of logic operations can be performed:
- Equal to `==` checks if two values are equal
- Not equal to `!=` checks if two values are different
- Greater than `>`
- Less than `<`
- Greater or equal `>=`
- Less or equal `<=`

Several of these statements can be combined to create more complex conditions by using the following operators:
- `and` is true only if both conditions are true
- `or` is true if at least one of the conditions is true
- `not` inverts a single condition: `not condition` is true when `condition` is false
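As a quick illustration of the three operators, the sketch below evaluates a few conditions on two example values (the result variable names are made up for this snippet):

```python
age = 53
height = 1.73

both_true = age > 50 and height < 1.80   # both conditions hold
either_true = age < 18 or height < 1.80  # only the second condition holds
negated = not age > 50                   # inverts a condition that is True

print(both_true)    # True
print(either_true)  # True
print(negated)      # False
```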

The basic syntax of these statements is:
```python
if variable == value:             # some logic statement
    # Do something
else:                             # if statement above is false
    # Do something
```

Using the variables we created above to define Pepito Jimenez, we can check several conditions. 

In [None]:
name = 'Pepito Jimenez'
age = 53
height = 1.73

if name == 'Pepito Jimenez':
    print('This person is Pepito Jimenez')
else:
    print('This person is not Pepito Jimenez')

We can write a more complex statement to check, for example, if he is older than 50 and shorter than 1.80m:

In [None]:
if age > 50 and height < 1.80:
    print(f'{name} is older than 50 and shorter than 1.80m')
else:
    print(f'{name} is 50 or younger, 1.80m or taller, or both.')

### Functions

A Python function is a named block of code that performs a specific task. 

The common structure of a function definition is:
```python
def function_name(args):
    # Code

    return value
```

Sometimes, certain pieces of code are very common and repeated several times in our scripts. A good example of this is converting units like we did before.

It is good practice to turn repetitive code into a function to:
1. save time
2. reduce the number of lines in code
3. avoid having to modify code in multiple locations

In the energy conversion example, we can define a function that takes an energy value in kcal as an argument and returns the same energy in kJ.

In [None]:
def kcal_to_kj(energy):
    """
    Convert energy in kcal to kJ.

    Parameters
    ----------
    energy : float
        Energy value in kcal to be converted.

    Returns
    -------
    float
        Energy in kJ.
    """
    conversion_factor = 4.184 # Conversion factor from kcal to kj
    energy_kj = energy * conversion_factor

    return energy_kj

The text enclosed in the triple quotation marks (shown in red in the notebook) is called a docstring. Docstrings document what the function does, what parameters (if any) it accepts, and what values (if any) it returns. Executing the cell above produces no output because we are only defining the function. We must actually call the function to run it, as shown below.

In [None]:
energies_kj = []                                     # Reset the list so the earlier results are not duplicated
for energy in energies_kcal:
    energies_kj.append(kcal_to_kj(energy))           # Append the converted energy to the kJ list of energies
print(energies_kj)
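A function can also be called on a single value, or inside a list comprehension, which is a compact alternative to the loop-and-append pattern. The sketch below repeats a shortened version of `kcal_to_kj` only so that it runs on its own; `single_value` is an illustrative name.

```python
def kcal_to_kj(energy):
    """Convert energy in kcal to kJ (shortened copy of the function above)."""
    return energy * 4.184

# Call the function on one value
single_value = kcal_to_kj(1.0)
print(single_value)  # 4.184

# Call it on every element of a list with a list comprehension
energies_kcal = [3.24, 6.12, 9.65]
energies_kj = [kcal_to_kj(energy) for energy in energies_kcal]
print(len(energies_kj))  # 3
```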

Functions are very important for keeping our code clean and organised. They should be commented and documented well enough to explain what they do and to keep them readable for future developers (and future versions of ourselves). You can pass a function name to Python's built-in `help()` function to print its docstring.

<div class="alert alert-block alert-success">
<b>Interactive Check Point:</b>

Print the docstring of our newly created function below. </div>

### File parsing

In computational chemistry we deal with many plain text files, as most of the software we use writes the output of its calculations in this format.
For that reason, having code to read these files and automatically extract the information we need is extremely useful.

There are several ways of reading a file and storing its lines in a variable.
The two most common ones are:

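> Method 1:
> ```python
> o_file = open(filename, 'r')
> text = o_file.readlines()
> o_file.close()
> ```

> Method 2:
> ```python
> with open(filename, 'r') as o_file:
>     text = o_file.readlines()
> ```

Method 2 is generally preferred because the `with` block closes the file automatically, even if an error occurs while reading.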

The `r` argument in the `open()` function stands for read. Other options are `w` for write, which creates the file or overwrites its existing contents, and `a` for append, which adds new information at the end of the file.
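The three modes can be tried out safely in a temporary directory. This is a minimal sketch using the standard library's `tempfile` module; the file name `notes.txt` is hypothetical.

```python
import os
import tempfile

# Work in a temporary directory so no real files are touched
with tempfile.TemporaryDirectory() as tmp_dir:
    filename = os.path.join(tmp_dir, 'notes.txt')   # hypothetical file name

    with open(filename, 'w') as o_file:   # 'w' creates (or overwrites) the file
        o_file.write('first line\n')

    with open(filename, 'a') as o_file:   # 'a' adds to the end of the file
        o_file.write('second line\n')

    with open(filename, 'r') as o_file:   # 'r' reads the file
        text = o_file.readlines()

print(text)  # ['first line\n', 'second line\n']
```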

The `readlines()` method reads the file contents into a list of strings, one for each line of the file (each string keeps its trailing newline character). Notice the dot notation: `readlines` acts on the file object given right before the dot.
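To see what `readlines()` returns without needing a file on disk, the sketch below uses `io.StringIO`, which behaves like an open text file. The text content is made up for illustration.

```python
import io

# io.StringIO stands in for an open text file here
o_file = io.StringIO('line one\nline two\nline three\n')
lines = o_file.readlines()

print(lines)  # ['line one\n', 'line two\n', 'line three\n']

# strip() removes leading and trailing whitespace, including the newline
clean_lines = [line.strip() for line in lines]
print(clean_lines)  # ['line one', 'line two', 'line three']
```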

In [None]:
filename = 'struc.xyz'
with open(filename,'r') as o_file:
    lines = o_file.readlines()

print("First 4 lines: ", lines[:4])
print("Split line 3: ", lines[2].split())

<div class="alert alert-block alert-success">
<b>Interactive Check Point:</b>

In the cell below, save the atoms in `struc.xyz` to a list. </div>

There are several operations that are very common when parsing these kinds of files:
- Extracting the **first** match of the desired pattern
- Extracting the **last** match of the desired pattern
- Extracting **all** the matches of the desired pattern

For that, a function like the following is very useful:

```python
def find_all(text, target):
    data = []
    for line in text:
        if target in line:
            data.append(line)
    return data
```
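As a quick check of how `find_all` behaves, the sketch below runs it on a few made-up lines (they do not come from a real output file) and then picks the first and last match by index. The function is repeated so the snippet runs on its own.

```python
def find_all(text, target):
    """Return every line in `text` that contains `target` (same as above)."""
    data = []
    for line in text:
        if target in line:
            data.append(line)
    return data

# Hypothetical lines, invented for illustration only
text = [
    'Iteration 1\n',
    'Energy: -1.0\n',
    'Iteration 2\n',
    'Energy: -1.5\n',
]

matches = find_all(text, 'Energy')
print(matches)      # all matches: ['Energy: -1.0\n', 'Energy: -1.5\n']
print(matches[0])   # first match
print(matches[-1])  # last match
```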

`find_all` returns a list of the lines that contain the desired target.
However, these lines will still be plain text of the form `@DF-RHF SCFEnergy:  -154.09130176573018`. 
We can use the built-in string method `split` to separate this text at the desired character (by default it splits on whitespace).

```python
line = '@DF-RHF SCFEnergy:  -154.09130176573018'
elements = line.split()
print(elements)
```

```output
['@DF-RHF', 'SCFEnergy:', '-154.09130176573018']
```

If we select a different character instead of the default space, we would obtain a different partition.
For example using `:` we would obtain

```python
line = '@DF-RHF SCFEnergy:  -154.09130176573018'
elements = line.split(':')
print(elements)
```

```output
['@DF-RHF SCFEnergy', '  -154.09130176573018']
```

Now, using the indexing we learned previously, we can extract only the energy value as

```python
line = '@DF-RHF SCFEnergy:  -154.09130176573018'
elements = line.split(':')
energy = elements[-1].strip()   # strip() removes the leading spaces
print(energy)
```

```output
-154.09130176573018
```
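Note that the extracted energy is still a string. Before doing any arithmetic with it, it needs to be converted to a number, for example with the built-in `float()` function. A minimal sketch (the name `energy_text` is illustrative):

```python
line = '@DF-RHF SCFEnergy:  -154.09130176573018'

energy_text = line.split(':')[-1]   # still a string, with leading spaces
energy = float(energy_text)         # float() ignores surrounding whitespace

print(type(energy_text))  # <class 'str'>
print(type(energy))       # <class 'float'>
```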



#### Exercise 1

Using the examples above, create a function that finds only the last match of the `SCFEnergy` target in a file, extracts the energy from that line and converts it from the default units (*a.u.*) to *kcal/mol*.