# **Chapter 1. Introduction to Python Programming Language**

## **1.1. Introduction to Python and Jupyter Notebooks**

Python is a versatile and widely used programming language that is well suited to scientific computing. Jupyter Notebooks provide an interactive and flexible environment for writing and running code, making them an excellent choice for computational chemistry.

We will cover the following topics:

1. Setting up your Python environment
2. Basics of Python programming
3. Introduction to Jupyter Notebooks
4. Installing and managing Python packages

If you get stuck, try searching the web for solutions on sites such as [stackoverflow.com](https://stackoverflow.com), [W3Schools](https://www.w3schools.com/python/default.asp), or [YouTube](https://www.youtube.com), or ask [ChatGPT](https://chat.openai.com) for help.

### **1.1.1. Anaconda and Jupyter Notebook**

#### ***1.1.1.1. Installation***

You can download Anaconda from [Anaconda's website](https://www.anaconda.com/products/distribution) and follow the installation instructions for your operating system.

Once Anaconda is installed, open Anaconda Navigator or use the Anaconda command prompt to manage your Python environments.

#### ***1.1.1.2. Python Development Environment***

Anaconda creates a default Python environment (named `base`) for you. However, you can create your own environments if needed.

In Anaconda Navigator or the Anaconda command prompt, you can create a new Python environment. Here's an example of how to create an environment named "chem-env" with Python 3.11:

`conda create --name chem-env python=3.11`

You can replace "chem-env" with your preferred environment name and specify a different Python version if needed.

To activate your newly created environment, use the following command:

`conda activate chem-env`

#### ***1.1.1.3. Jupyter Notebook***

Jupyter Notebook is an interactive web-based application that allows you to create and share documents containing live code, equations, visualizations, and narrative text. It's a popular choice among scientists and researchers for conducting data analysis, running code, and creating reproducible research.

- Jupyter Notebook is based on the open-source Jupyter project and supports various programming languages, including Python, R, and Julia.
- Notebooks are divided into cells, which can contain code, markdown text, or raw text.
- It provides a user-friendly interface for combining code execution, documentation, and visualizations in a single document.

**Open Jupyter Notebook**

After installing Anaconda, you can search for the Jupyter Notebook application in your operating system and run it. Alternatively, you can open Anaconda Prompt and run the following command:

`jupyter notebook`

Jupyter notebooks can also be opened using JupyterLab. Search for the JupyterLab application in your operating system and run it, or open Anaconda Prompt and run the following command:

`jupyter lab`

**Basic Operations in Jupyter Notebook**

***Creating a New Notebook***

1. To create a new Jupyter Notebook, click on the "New" button in the Jupyter interface and select "Python 3" (or your preferred kernel) from the dropdown menu.

***Cells***

- Jupyter Notebooks are composed of cells, which are individual units that can contain code, markdown text, or raw content.

***Running Code***

- To execute code in a cell, select the cell and press "Shift + Enter" or click the "Run" button in the Jupyter interface.
- The output of the code will appear below the cell.

***Adding and Deleting Cells***

- To add a new cell, click the "+" button in the toolbar, or press "B" in command mode (press "Esc" first) to insert a cell below the currently selected cell.
- To delete a cell, select it and, in command mode, press "D" twice (i.e., press "D" followed by another "D").

***Markdown Cells***

- To add markdown text to a cell, change the cell type from "Code" to "Markdown" in the dropdown menu.
- You can use markdown to format text, create headings, lists, links, and more.

***Keyboard Shortcuts***

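Keyboard shortcuts can significantly improve your efficiency when working in Jupyter Notebook. Here are some useful shortcuts. The single-letter shortcuts work only in command mode (press "Esc" first, so that you are not typing inside the cell):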

- "Shift + Enter" to run the current cell and move to the next cell.
- "Ctrl + Enter" to run the current cell and stay in the same cell.
- "A" to insert a new cell above the current cell.
- "B" to insert a new cell below the current cell.
- "D" twice (i.e., press "D" followed by another "D") to delete the current cell.
- "M" to change the current cell type to markdown.
- "Y" to change the current cell type to code.

***Saving and Exporting***

- To save your work, click the "Save" button or press "Ctrl + S" (or "Cmd + S" on Mac).
- You can export your notebook as a PDF, HTML, or other formats by going to "File" > "Download as."

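***Jupyter Notebook uses the default Anaconda environment.***
To use your own environment, go to Kernel → Change Kernel. An environment appears in this list only after it has been registered as a Jupyter kernel, for example by installing the `ipykernel` package in it and running `python -m ipykernel install --user --name chem-env`.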

### **1.1.2. Basics of Python Programming**

Python is a beginner-friendly programming language, and you don't need to be an expert coder to use it effectively for computational chemistry. Here are some fundamental Python concepts:

#### ***1.1.2.1. Variables***

In Python, you can assign values to variables using the `=` operator:

In [None]:
x = 5
y = "Hello, World!"

You can print out the value of a variable using the `print()` function:

In [None]:
print(x)
print(y)

#### ***1.1.2.2. Data Types***

The basic data types in Python are integer (`int`), float (`float`), string (`str`), and boolean (`bool`).

In [None]:
a = 5        # Integer
b = 3.14     # Float
c = "apple"  # String (put in a pair of "" or '')
d = True     # Boolean (True or False)

print(a)
print(b)
print(c)
print(d)

The `type()` function can be used to identify the type of each variable.

In [None]:
print(type(a))
print(type(b))
print(type(c))
print(type(d))

Boolean values are often the result of comparison operators:
- `==`    Equal
- `!=`    Not equal
- `>`     Greater than
- `<`     Less than
- `>=`    Greater than or equal to
- `<=`    Less than or equal to

In [None]:
bool1 = 5 > 0
print(bool1)
bool2 = 3.14 <= 2
print(bool2)

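More complex data types in Python are list, tuple, set, and dictionary. Note that the cell below reuses the name `d`, so it no longer refers to the boolean defined earlier.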

In [None]:
# A list
l = [1,2,3.14,'dog', True]

# A tuple 
t = (1,2,3.14,'dog', True)

# A set 
s = {'apple', 'banana', 'cherry'}

# A dictionary
d = {'student id': 11010001,
     'student grade': 'A',
     'hello': 'ciao',
     'dog': 'cane',
     'cat': 'gatto'}

You can get the length of a list, tuple, set, or dictionary using the `len()` function:

In [None]:
print(len(l))
print(len(t))
print(len(s))
print(len(d))

To get the value of an item in a list or tuple, use the index of that item in square brackets. In Python, indices start from 0.

In [None]:
list_item1 = l[0] # get the first item
tuple_item3 = t[2] # get the third item

In [None]:
print(list_item1)
print(tuple_item3)

However, you cannot access the items of a set or a dictionary by position: a set is unordered, and a dictionary is accessed by key.

Each item of a dictionary is a pair of key and value. For example, in the item `'student id': 11010001`:
- `'student id'` is the key (type `str`)
- `11010001` is the value (type `int`)

To get the value of an item in a dictionary, you can use the key with the following syntax:

In [None]:
student_id = d['student id']
student_grade = d['student grade']

print(student_id)
print(student_grade)
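
The lookups above assume the key exists. As a minimal sketch (the names `fruit_set` and `translations` are hypothetical and used only for this illustration), the cell below shows what happens when you index a set, and two common ways to check for a key before reading it:

```python
fruit_set = {'apple', 'banana', 'cherry'}
translations = {'dog': 'cane', 'cat': 'gatto'}

# A set has no positions, so indexing it raises a TypeError
try:
    fruit_set[0]
except TypeError as err:
    print("Cannot index a set:", err)

# Use `in` to test whether a value is in a set
has_apple = 'apple' in fruit_set
print(has_apple)

# For a dictionary, `in` checks the keys
has_bird = 'bird' in translations
print(has_bird)

# d['bird'] would raise a KeyError; .get() returns a fallback instead
bird_word = translations.get('bird', 'unknown')
print(bird_word)
```

Using `.get()` with a default is convenient when a missing key is expected; plain square brackets are better when a missing key should be treated as an error.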

You can add elements to a list using the `append()` method (adds one item) or the `extend()` method (adds every item of another list):

In [None]:
l.append(10)
print(l)

In [None]:
l2 = ['cat', False]
l.extend(l2)
print(l)
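
The difference between the two methods is easiest to see when the argument is itself a list. The following sketch uses hypothetical lists `nested` and `flat`:

```python
nested = ['H', 'He']
nested.append(['Li', 'Be'])   # the whole list is added as ONE element
print(nested)
print(len(nested))

flat = ['H', 'He']
flat.extend(['Li', 'Be'])     # each item is added separately
print(flat)
print(len(flat))
```

Both methods modify the list in place and return `None`, so there is no need to assign their result to a variable.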

<p style="background-color: lightgreen; text-align: center; font-size: 18px; color: red; padding: 5px; border-radius: 10px;"><b>Exercise 1</b></p>

Write Python code to:
1. Create a string `my_string` with the value `'Trường Đại học Sư phạm Kỹ Thuật TPHCM'`.
2. Create a list `elements` with these elements: `Li`, `Na`, `K`, `Ca`, `Mg`
3. Retrieve the second element from the list
4. Create a dictionary `electronegativity` to store the electronegativity of elements using the symbol as key:
- B: 2.0
- C: 2.5
- N: 3.0
- O: 3.5
- F: 4.0
5. Retrieve the electronegativity of oxygen

#### ***1.1.2.3. Basic Operations***

You can perform basic operations in python, such as addition, subtraction, multiplication, and division:

In [None]:
result1 = a + b
result2 = a - b
result3 = a * b
result4 = a / b

print(result1)
print(result2)
print(result3)
print(result4)

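You can control how the output is displayed by using the `str.format()` method ([*see details*](https://www.w3schools.com/python/ref_string_format.asp)). In the examples below, `.3f` means 3 decimal places, `.2%` formats the number as a percentage with 2 decimal places, and `.4e` uses scientific notation with 4 decimal places: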

In [None]:
print("{0}".format(result1))
print("{0:.3f}".format(result2))
print("{0:.2%}".format(result3))
print("{0:.4e}".format(result4))
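
The same format codes also work in f-strings (available since Python 3.6), which many people find easier to read. This is a small sketch with made-up numbers; `value` and `ratio` are hypothetical names:

```python
value = 1234.56789   # hypothetical number, for illustration only
ratio = 0.4567

fixed = f"{value:.3f}"       # 3 decimal places
percent = f"{ratio:.2%}"     # percentage with 2 decimal places
scientific = f"{value:.4e}"  # scientific notation, 4 decimal places

print(fixed)       # 1234.568
print(percent)     # 45.67%
print(scientific)  # 1.2346e+03
```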

Other mathematical operations include integer division, modulo, power, and root:

In [None]:
# integer division
print(20 // 3)

In [None]:
# modulo
print(20 % 3) # (20 mod 3)

In [None]:
# power
print(2 ** 4) # 2 raised to the fourth power

In [None]:
# root
print(16 ** (1/2)) # square root of 16

***Sometimes you might want to leave comments directly in your code. A comment starts with the `#` sign and runs to the end of the line; Python ignores it.***

You can compute a new value for a variable and assign it using assignment operators. For example:

In [None]:
x = 5
print(x)

In [None]:
x += 3
print(x)

In [None]:
x -= 5
print(x)

In [None]:
x *= 2
print(x)

In [None]:
x /= 3
print(x)
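
As a compact recap, the sketch below runs the same sequence in a single cell and adds three more assignment operators (`//=`, `**=`, `%=`). The variable `n` is a hypothetical name introduced for the illustration:

```python
x = 5
x += 3    # same as x = x + 3  -> 8
x -= 5    # same as x = x - 5  -> 3
x *= 2    # same as x = x * 2  -> 6
x /= 3    # same as x = x / 3  -> 2.0 (division always gives a float)
print(x, type(x))

n = 17
n //= 5   # integer division   -> 3
n **= 2   # power              -> 9
n %= 4    # modulo             -> 1
print(n, type(n))
```

Note that `/=` turns an integer into a float even when the division is exact, while `//=` keeps it an integer.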

<p style="background-color: lightgreen; text-align: center; font-size: 18px; color: red; padding: 5px; border-radius: 10px;"><b>Exercise 2</b></p>

Write Python code to perform the following tasks:
1. Assign the value 10 to a variable $x$ and 20 to a variable $y$. Compute their sum $x + y$ and assign it to a variable $z$.
2. Compute $x + y^2$
3. Compute $\sqrt{z^2 + y}  - x^2$

#### ***1.1.2.4. Import Python Modules***

In Python, modules are accessed by using the `import` statement. For example, we can import the `math` module using:

In [None]:
import math

Now you can use all the functions that are available within the module `math` using the `math.<function_name>` syntax ([*see details about the module*](https://docs.python.org/3/library/math.html)). For example:

In [None]:
print(math.sqrt(9))        # square root function
print(math.sin(math.pi/2)) # sine function, math.pi = 3.141592653589793 (a constant)

Alternatively, you can import a function from a module like this:

In [None]:
from math import cos # cosine function

Now you can use the `cos` function without using `math.`:

In [None]:
print(cos(math.pi))

You can import multiple functions or other elements at the same time (separated by commas):

In [None]:
from math import cos, pi # math.cos is a function, math.pi is a constant
print(cos(pi))

You can also import all elements from a module using:

In [None]:
from math import *

Now you can use any function within that module without using `math.`:

In [None]:
print(acos(cos(pi)))
print(factorial(5))
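
Importing everything with `*` is convenient, but it can silently replace names that already exist. For instance, `math` has its own `pow` function, so after `from math import *` the bare name `pow` refers to `math.pow` instead of the built-in one. A common alternative, sketched below, is to import the module under a short alias (the alias `m` is an arbitrary choice):

```python
import math as m   # "m" is just a shorter name for the math module

root = m.sqrt(16)
print(root)

builtin_result = pow(2, 3)    # built-in pow keeps integers
math_result = m.pow(2, 3)     # math.pow always returns a float
print(builtin_result, type(builtin_result))
print(math_result, type(math_result))
```

With the alias, it stays clear which function comes from which place.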

#### ***1.1.2.5. If Statement***

An if statement is written by using the `if` keyword. The body of an if statement is indented. For example:

In [None]:
a = 33
b = 200
if b > a:
    print("b is greater than a")

The `elif` keyword is Python's way of saying "if the previous conditions were not true, then try this condition".
The `else` keyword catches anything which isn't caught by the preceding conditions.

In [None]:
a = 200
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
else:
    print("a is greater than b")
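
Python checks the conditions from top to bottom and runs only the first branch whose condition is `True`. Here is a small chemistry-flavoured sketch; the pH value is made up, and the threshold of 7 assumes an aqueous solution at 25 °C:

```python
ph = 4.2   # hypothetical measured pH

if ph < 7:
    label = "acidic"
elif ph == 7:
    label = "neutral"
else:
    label = "basic"

print("The solution is", label)
```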

#### ***1.1.2.6. While Loop***

With the `while` loop we can execute a set of statements as long as a condition is `True`. The body of a `while` loop is indented.

In [None]:
i = 1
while i < 6:
    print(i)
    i += 1

With the `break` statement we can stop the loop even if the while condition is `True`:

In [None]:
i = 1
while i < 6:
    print(i)
    if i == 3:
        break
    i += 1 

With the `continue` statement we can stop the current iteration, and continue with the next:

In [None]:
i = 0
while i < 6:
    i += 1
    if i == 3:
        continue
    print(i)
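
A `while` loop is most useful when you do not know in advance how many iterations are needed. The sketch below (with hypothetical numbers) counts how many times a concentration must be halved before it drops below a threshold:

```python
concentration = 1.0   # hypothetical starting concentration (mol/L)
threshold = 0.1       # stop once the concentration is below this value
steps = 0

while concentration >= threshold:
    concentration /= 2   # halve the concentration
    steps += 1

print(steps, concentration)
```

Make sure that something inside the loop eventually makes the condition `False`; otherwise the loop never ends. This is also why, in the `continue` example above, `i += 1` is placed before the `continue` statement.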

#### ***1.1.2.7. For Loop***

A `for` loop is used for iterating over a sequence (that is either a list, a tuple, a dictionary, a set, or a string). The body of a `for` loop is indented.

With the `for` loop we can execute a set of statements, once for each item in a list, tuple, set etc.

In [None]:
# Loop through a list
fruits = ["apple", "banana", "cherry"]
for x in fruits:
    print(x)

In [None]:
# Loop through a string
for x in "banana":
    print(x)

With the `break` statement we can stop the loop before it has looped through all the items:

In [None]:
fruits = ["apple", "banana", "cherry"]
for x in fruits:
    print(x)
    if x == "banana":
        break

With the `continue` statement we can stop the current iteration, and continue with the next:

In [None]:
fruits = ["apple", "banana", "cherry"]
for x in fruits:
    if x == "banana":
        continue
    print(x)

To loop through a range of numbers, we can use the `range()` function. The stop value is not included, so `range(1, 10)` produces the numbers 1 to 9:

In [3]:
for x in range(1, 10):
    print(x ** 2)

1
4
9
16
25
36
49
64
81
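
The `range()` function can be called with one, two, or three arguments: `range(stop)`, `range(start, stop)`, or `range(start, stop, step)`. As a short sketch, converting each range to a list makes its contents visible (the variable names are arbitrary):

```python
first_five = list(range(5))         # start defaults to 0
evens = list(range(2, 11, 2))       # step of 2; 11 is excluded, 10 is included
countdown = list(range(5, 0, -1))   # a negative step counts downwards

print(first_five)   # [0, 1, 2, 3, 4]
print(evens)        # [2, 4, 6, 8, 10]
print(countdown)    # [5, 4, 3, 2, 1]
```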


<p style="background-color: lightgreen; text-align: center; font-size: 18px; color: red; padding: 5px; border-radius: 10px;"><b>Exercise 3</b></p>

Write Python code to perform the following tasks:
1. Use the `for` loop to check for odd/even numbers from 1 to 20
2. Use the `while` loop to print out all integers that are divisible by 3 from 1 to 30

#### ***1.1.2.8. Function***

In Python, we can easily define new functions. We use the `def` keyword to define a function and its arguments, and the `return` keyword to return the result. The body of a function is indented.

For example, let's write a function that takes 2 numbers x and y, and computes $\sqrt{x^2 + y^2}$.

In [None]:
def my_function(x, y):
    result = (x ** 2 + y ** 2) ** 0.5
    return result

Now you can test this function:

In [None]:
print(my_function(3, 4))
print(my_function(6, 8))
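
Functions can also have default argument values and a docstring that documents them. The following sketch extends the idea above to three dimensions; the function name `distance` and the optional argument `z` are hypothetical choices for this illustration:

```python
def distance(x, y, z=0.0):
    """Return sqrt(x^2 + y^2 + z^2). z defaults to 0, so 2D calls still work."""
    return (x ** 2 + y ** 2 + z ** 2) ** 0.5

d2 = distance(3, 4)          # z is omitted, so the default 0.0 is used
d3 = distance(1, 2, 2)       # all three arguments given
d_kw = distance(y=4, x=3)    # keyword arguments can be given in any order

print(d2)     # 5.0
print(d3)     # 3.0
print(d_kw)   # 5.0
```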

<p style="background-color: lightgreen; text-align: center; font-size: 18px; color: red; padding: 5px; border-radius: 10px;"><b>Exercise 4</b></p>

Write Python code to perform the following tasks:
1. Define a function to compute $\sqrt{x^2 -1}$
2. Define a function to count the number of integers between 2 integers `x` and `y`
3. Define a function to print out the first `n` elements of the Fibonacci sequence

## **1.2. Common Python Modules**

In this section, we will learn the essentials of common Python modules. The following modules will be covered:

1. NumPy
2. Pandas
3. Matplotlib and Seaborn

### **1.2.1. NumPy**

#### ***1.2.1.1. Installation***

To start using NumPy, it must be installed in your Python environment. The default Anaconda environment already includes NumPy, so no installation is needed there. To install NumPy in a custom environment, activate that environment and run the following command:

`conda install numpy`

For more information about NumPy, see the [documentation](https://numpy.org/doc/).

#### ***1.2.1.2. Importing NumPy***

To use NumPy in your Python code, you need to import the library. It's a common convention to import NumPy as `np`:

In [None]:
import numpy as np

#### ***1.2.1.3. Creating NumPy Arrays***

NumPy's primary data structure is the N-dimensional array (`numpy.ndarray`), usually created with the `np.array()` function. You can create arrays in various ways. Here are a few examples:

1. Creating an Array from a List

In [None]:
my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)

2. Creating an Array of Zeros

In [None]:
zeros_array = np.zeros(5)  # Creates a 1D array with 5 zeros

3. Creating an Array of Ones

In [None]:
ones_array = np.ones((3, 3))  # Creates a 3x3 array of ones

4. Creating a Range of Values

In [None]:
range_array = np.arange(0, 10, 2)  # Creates an array [0, 2, 4, 6, 8], the point '10' is not included

In [None]:
linspace_array = np.linspace(2.0, 3.0, num=5) # Creates an array [2.  , 2.25, 2.5 , 2.75, 3.  ]
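
Every array carries information about its own shape and element type. The sketch below inspects a few attributes; the names `grid` and `mixed` are hypothetical:

```python
import numpy as np

grid = np.ones((3, 3))
print(grid.shape)   # number of elements along each dimension
print(grid.ndim)    # number of dimensions
print(grid.size)    # total number of elements
print(grid.dtype)   # data type of the elements

# All elements of an array share one type: mixing ints and floats gives floats
mixed = np.array([1, 2, 3.14])
print(mixed.dtype)
```

Unlike a Python list, a NumPy array stores elements of a single data type, which is part of what makes array operations fast.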

#### ***1.2.1.4. Basic NumPy Operations***

NumPy allows you to perform mathematical operations on arrays efficiently. Here are some examples:

**Get An Element**

As with a list, you can get an element of a NumPy array using its index:

In [None]:
array1 = np.array([1, 2, 3])
element1 = array1[0]
print(element1)

**Add Elements**

You can add new elements to a NumPy array using the `np.append()` or `np.concatenate()` function. For `np.concatenate()`, put all input arrays in a tuple. Unlike the list methods `append()` and `extend()`, these functions do not modify the original array: they return a new array, so you need to assign the output to a variable.

In [None]:
array1 = np.array([1, 2, 3])
array2 = np.append(array1, 10)
print(array1)
print(array2)

In [None]:
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
array3 = np.concatenate((array1, array2))
print(array1)
print(array2)
print(array3)

**Slicing**

NumPy slicing is a powerful and flexible way to extract and manipulate portions of a NumPy array. Slicing allows you to select a subset of elements from an array by specifying a range or a set of indices.

***Basic Slicing:***

Syntax: `array[start:stop]`
- Returns a view of the array elements from the index start (inclusive) to the index stop (exclusive).
- If `start` is not provided, it defaults to `0`. If `stop` is not provided, it defaults to the end of the array.
- You can use negative indices to count from the end of the array.

In [None]:
arr = np.array([0, 1, 2, 3, 4, 5])
sliced = arr[2:5]  # Slices from index 2 to 4
print(sliced)

In [None]:
arr = np.array([0, 1, 2, 3, 4, 5])
sliced = arr[0:-2]  # Slices from index 0 to 3 (removes the last 2 elements)
print(sliced)

***Step Slicing:***

Syntax: `array[start:stop:step]`
- The `step` argument allows you to skip elements while slicing.
- If `step` is not specified, it defaults to `1`.

In [None]:
arr = np.array([0, 1, 2, 3, 4, 5])
sliced = arr[1:5:2]  # Slices from index 1 to 4 with a step of 2
print(sliced)
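
Because a slice is a view, it shares its data with the original array: changing the slice changes the original too. The following sketch demonstrates this and shows `.copy()` as one way to get an independent array (the variable names are arbitrary):

```python
import numpy as np

original = np.array([0, 1, 2, 3, 4, 5])
view = original[2:5]      # a view: shares data with `original`
view[0] = 99
print(original)           # the element at index 2 is now 99

untouched = np.array([0, 1, 2, 3, 4, 5])
independent = untouched[2:5].copy()   # a copy: has its own data
independent[0] = -1
print(untouched)          # unchanged
```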

**Element-wise Operations**

You can perform element-wise operations like addition, subtraction, multiplication, and division:

In [None]:
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

result1 = array1 + array2  # Element-wise addition
result2 = array1 + 10  # Element-wise addition with a scalar
result3 = array1 * array2  # Element-wise multiplication
result4 = array1 * 5  # Element-wise multiplication with a scalar
print(result1)
print(result2)
print(result3)
print(result4)

**Dot Product**

You can calculate the dot product of two arrays:

In [None]:
dot_product = np.dot(array1, array2)
print(dot_product)
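
For two 1D arrays, the dot product is the sum of the element-wise products. As a quick check, the sketch below computes it in three equivalent ways (the names `v1` and `v2` are hypothetical):

```python
import numpy as np

v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])

manual = np.sum(v1 * v2)     # 1*4 + 2*5 + 3*6
with_dot = np.dot(v1, v2)    # the np.dot function
with_operator = v1 @ v2      # the @ operator

print(manual, with_dot, with_operator)   # 32 32 32
```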

**Statistical Functions**

NumPy provides various statistical functions, such as mean, median, and standard deviation:

In [None]:
data = np.array([1, 2, 3, 4, 5])
max_value = np.max(data)
min_value = np.min(data)
mean_value = np.mean(data)
median_value = np.median(data)
std_deviation = np.std(data)
print(max_value)
print(min_value)
print(mean_value)
print(median_value)
print(std_deviation)
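
By default, `np.std()` computes the population standard deviation (it divides by $N$). For the sample standard deviation (dividing by $N - 1$), which is often the one needed for experimental data, pass `ddof=1`. A short sketch with the same numbers:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

population_std = np.std(data)        # divides by N
sample_std = np.std(data, ddof=1)    # divides by N - 1

print(population_std)
print(sample_std)
```

Here the squared deviations from the mean sum to 10, so the two results are $\sqrt{10/5} = \sqrt{2}$ and $\sqrt{10/4} = \sqrt{2.5}$.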

<p style="background-color: lightgreen; text-align: center; font-size: 18px; color: red; padding: 5px; border-radius: 10px;"><b>Exercise 5</b></p>

Write Python code to perform the following tasks:
1. Create a NumPy array of shape (3, 3) filled with random integers between 1 and 100.
2. Compute the mean, standard deviation, and sum of the elements in the array.
3. Extract the second row of the array and compute its sum.

### **1.2.2. Pandas**

#### ***1.2.2.1. Installation***

To start using Pandas, it must be installed in your Python environment. The default Anaconda environment already includes Pandas, so no installation is needed there. To install Pandas in a custom environment, activate that environment and run the following command:

`conda install pandas`

For more information about Pandas, see the [documentation](https://pandas.pydata.org/docs/).

#### ***1.2.2.2. Importing Pandas***

To use Pandas in your Python code, you need to import the library:

In [None]:
import pandas as pd

#### ***1.2.2.3. Pandas Data Structures***

Pandas provides two primary data structures: Series and DataFrame.

**Series**

A Series is a one-dimensional array-like object that can hold various data types. You can think of it as a column in a spreadsheet or a single-dimensional array. Here's how you can create a Series:

In [None]:
data = pd.Series([1, 3, 5, 7, 9])
print(data)

**DataFrame**

A DataFrame is a two-dimensional tabular data structure with rows and columns, similar to a spreadsheet or a SQL table. You can create a DataFrame using dictionaries, lists, or other data structures:

In [None]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40]
}

df = pd.DataFrame(data)
print(df)
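
Once a DataFrame exists, you will usually want to inspect it and pick out parts of it. The sketch below rebuilds the same small table under the hypothetical name `people` and shows a few common operations; the new column `Age in months` is made up for the illustration:

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40]
})

n_rows, n_cols = people.shape    # (number of rows, number of columns)
print(n_rows, n_cols)

ages = people['Age']             # selecting one column gives a Series
print(ages)

first_row = people.iloc[0]       # selecting a row by position
print(first_row)

# A new column can be computed from an existing one
people['Age in months'] = people['Age'] * 12
print(people)
```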

#### ***1.2.2.4. Reading and Writing Data***

Pandas supports various file formats for reading and writing data, including CSV, Excel, SQL, and more. Here are some examples of reading and writing data:

**Reading Data**

In [None]:
# Read data from a CSV file
df = pd.read_csv('./datasets/IrisFlower.csv')
print(df.head()) # Show the top 5 rows

In [None]:
# Read data from an Excel file (reading .xlsx files requires the openpyxl package)
df = pd.read_excel('./datasets/IrisFlower.xlsx')
print(df.head(10)) # Show the top 10 rows

**Writing Data**

In [None]:
# Write data to a CSV file
df.to_csv('output.csv', index=False)

In [None]:
# Write data to an Excel file
df.to_excel('output.xlsx', index=False)

#### ***1.2.2.5. Basic Data Manipulation***

Pandas allows you to perform various data manipulation tasks, such as filtering, sorting, and aggregating data.

**Filtering Data**

You can filter data based on specific conditions:

In [None]:
filtered_df = df[df['Sepal length'] > 5.0]
print(filtered_df.head())

**Sorting Data**

You can sort data by one or more columns:

In [None]:
sorted_df = df.sort_values(by='Sepal length')
print(sorted_df.head())

**Aggregating Data**

You can perform operations like sum, mean, and count on specific columns:

In [None]:
# Mean sepal width
mean_sepal_width = df['Sepal width'].mean()
print(mean_sepal_width)

In [None]:
# Number of species
num_species = df['Species'].nunique()
print(num_species)
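
Aggregations become more useful when they are computed per group. The following sketch uses `groupby()` on a tiny made-up table (`toy`) whose column names mimic the ones used above; the numbers are invented for illustration and are not taken from the Iris dataset:

```python
import pandas as pd

# Hypothetical data, for illustration only
toy = pd.DataFrame({
    'Species': ['setosa', 'setosa', 'versicolor', 'versicolor'],
    'Sepal length': [5.0, 5.2, 6.0, 6.4]
})

# Mean sepal length for each species
mean_by_species = toy.groupby('Species')['Sepal length'].mean()
print(mean_by_species)

# Number of rows for each species
counts = toy['Species'].value_counts()
print(counts)
```

The same pattern should work on the full dataset by replacing `toy` with `df`, assuming the column names match.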

<p style="background-color: lightgreen; text-align: center; font-size: 18px; color: red; padding: 5px; border-radius: 10px;"><b>Exercise 6</b></p>

Write Python code to perform the following tasks:
1. Create a DataFrame with columns `Name`, `Age`, and `Score` using the following data:
   - `Name`: ['Alice', 'Bob', 'Charlie']
   - `Age`: [24, 27, 22]
   - `Score`: [85, 90, 88]
2. Compute the average age and score.
3. Add a new column `Passed` that contains `True` if `Score` is greater than 85, otherwise `False`.

### **1.2.3. Matplotlib and Seaborn**

#### ***1.2.3.1. Installation***

To use Matplotlib and Seaborn, you need to install them in your Python environment. You can do this with Anaconda by running the following commands:

`conda install matplotlib`

`conda install seaborn`

[Matplotlib documentation](https://matplotlib.org/stable/index.html)

[Seaborn documentation](https://seaborn.pydata.org/)

#### ***1.2.3.2. Importing Matplotlib and Seaborn***

To use Matplotlib and Seaborn in your Python code, you need to import the libraries:

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

#### ***1.2.3.3. Matplotlib***

Matplotlib is a versatile library that provides a wide range of options for creating static, animated, or interactive visualizations. It's well-suited for creating various types of plots, including line plots, bar plots, scatter plots, and more.

**Example: Line Plot**

Let's create a simple line plot to visualize a set of data points:

In [None]:
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y)
plt.title("Simple Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()

#### ***1.2.3.4. Seaborn***

Seaborn is built on top of Matplotlib and provides a high-level interface for creating attractive and informative statistical graphics. It simplifies many common tasks and offers various built-in themes and color palettes.

**Example: Scatter Plot**

Let's create a scatter plot using Seaborn to visualize the relationship between two variables:

In [None]:
# Load a sample dataset
data = pd.read_csv('./datasets/IrisFlower.csv')  # forward slashes work on all operating systems

# Create a scatter plot
sns.scatterplot(data=data, x='Sepal length', y='Sepal width')
plt.title("Scatter Plot")
plt.show()

Both Matplotlib and Seaborn allow you to customize your plots extensively. You can modify colors, labels, titles, legends, and more to make your visualizations informative and visually appealing.
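
As a minimal sketch of such customization, the cell below draws two lines with different colors, line styles, and markers, and adds a legend and a grid. The data and labels are made up for the illustration, and the figure is built with `plt.subplots()`, which returns a figure and an axes object that can be customized step by step:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y_linear = [2, 4, 6, 8, 10]
y_square = [1, 4, 9, 16, 25]

fig, ax = plt.subplots(figsize=(6, 4))   # figure size in inches

# color, marker, linestyle, and label are set per line
ax.plot(x, y_linear, color='tab:blue', marker='o', label='y = 2x')
ax.plot(x, y_square, color='tab:red', linestyle='--', label='y = x^2')

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Two customized lines")
ax.legend()      # uses the label of each line
ax.grid(True)    # light grid in the background

plt.show()
```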

<p style="background-color: lightgreen; text-align: center; font-size: 18px; color: red; padding: 5px; border-radius: 10px;"><b>Exercise 7</b></p>

Write Python code to perform the following tasks:
1. Create a line plot for the function `y = x^2` for values of `x` from -10 to 10.
2. Add appropriate labels for the x-axis and y-axis, and a title for the plot.
3. Create a bar chart showing the values of `Name` vs `Score` from the DataFrame created in Exercise 6.