# Introduction to Python and RDKit

**Topics:**

* [Python](#python)
* [RDKit](#rdkit)

## Python


### Table of Contents
1. [Introduction](#1-introduction)
2. [Setting Up Python](#2-setting-up-python)
3. [Basic Syntax](#3-basic-syntax)
4. [Variables and Data Types](#4-variables-and-data-types)
5. [Operators](#5-operators)
6. [Conditional Statements](#6-conditional-statements)
7. [Loops](#7-loops)
8. [Functions](#8-functions)
9. [Lists](#9-lists)
10. [Tuples](#10-tuples)
11. [Dictionaries](#11-dictionaries)
12. [String Manipulations](#12-string-manipulations)
13. [Pandas](#13-pandas)
14. [NumPy](#14-numpy)
15. [Plotting](#15-plotting-with-matplotlib)
16. [Practice Exercises](#16-practice-exercises)

### 1. Introduction
Welcome to the Elementary Introduction to Python! In this notebook, you will learn the basics of Python programming. Python is a powerful, easy-to-learn programming language that is widely used in various fields, including web development, data analysis, artificial intelligence, and more. By the end of this notebook, you will be familiar with Python syntax and able to write simple programs.

### 2. Setting Up Python
To write and run Python code, you need Python installed on your computer. My recommendation is to download [Visual Studio Code (VSCode)](https://code.visualstudio.com/) and install the Python and Jupyter extensions from the Extensions tab inside VSCode. You can download and install Python itself from the official website: [python.org](https://www.python.org/). Alternatively, you can use an online environment such as [Google Colab](https://colab.research.google.com/), which runs notebooks in the browser.

### 3. Basic Syntax
Python code is written in plain text and saved with a `.py` extension. Another way to write Python code is in a Jupyter Notebook, saved with an `.ipynb` extension, such as this file itself. Python uses indentation to define blocks of code. Let's look at a simple example:


In [None]:
# This is a comment
print("Hello, World!")
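
Because indentation defines blocks, the leading whitespace decides which statements belong to a loop or a condition. Here is a small sketch of that idea; the variable names are illustrative and not part of the original notebook:

```python
# Indentation decides which lines belong to the loop body
collected = []
for n in range(3):
    collected.append(n * 2)  # indented: runs on every iteration
summary = f"collected {len(collected)} values"  # not indented: runs once, after the loop

print(collected)
print(summary)
```

Indenting the `summary` line would move it into the loop, so it would be recomputed on every iteration.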


### 4. Variables and Data Types
Variables are used to store data. Python has several data types, including integers, floats, strings, and booleans. Let's see some examples:


In [None]:
# Integer
x = 5
print(f'{x} is of type {type(x).__name__}')

In [None]:
# Float
y = 3.14
print(f'{y} is of type {type(y).__name__}')

In [None]:
# String
name = "Alice"
print(f'{name} is of type {type(name).__name__}')

In [None]:
# Boolean
is_student = True
print(f'{is_student} is of type {type(is_student).__name__}')

### 5. Operators
Operators are used to perform operations on variables and values. Python supports arithmetic, comparison, logical, and assignment operators. Here are some examples:


In [None]:
# Assign variables
a = 10
b = 3

# Arithmetic operators
print('Arithmetic operators')
print(a + b)  # Addition
print(a - b)  # Subtraction
print(a * b)  # Multiplication
print(a / b)  # Division
print(a % b)  # Modulus
print(a ** b) # Exponentiation
print(a // b) # Floor division

In [None]:
# Comparison operators
print('Comparison operators')
print(a == b)  # Equal to
print(a != b)  # Not equal to
print(a > b)   # Greater than
print(a < b)   # Less than
print(a >= b)  # Greater than or equal to
print(a <= b)  # Less than or equal to

In [None]:
# Logical operators
print('Logical operators')
print(a > 5 and b < 5)  # Logical AND
print(a > 5 or b > 5)   # Logical OR
print(not(a > 5))       # Logical NOT

### 6. Conditional Statements
Conditional statements are used to perform different actions based on different conditions. The `if`, `elif`, and `else` statements are used for this purpose:


In [None]:
age = 18
if age >= 18:
    print("You are an adult.")
elif age >= 13:
    print("You are a teen.")
else:
    print("You are a child.")


### 7. Loops
Loops are used to execute a block of code repeatedly. Python has two main types of loops: `for` and `while` loops. Here are examples of each:


In [None]:
# For loop
total_sum = 0
for i in range(5):
    print(i)
    total_sum = total_sum + i
print(total_sum)

In [None]:
# While loop
count = 0
while count < 5:
    print(count)
    count += 1

### 8. Functions
Functions are blocks of code that perform a specific task. They help in organizing code and making it reusable. Here's how you define and call a function:


In [None]:
# The second parameter has a default value, so it is optional when calling
def power(number, exponent=2):
    res = number ** exponent
    return res

print(power(7))  # Output: 49
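
A parameter with a default value can be left out, passed by position, or passed by name. The sketch below shows the three call styles; `scale` is a hypothetical helper introduced only for this illustration:

```python
# Hypothetical helper used only for illustration
def scale(value, factor=2):
    """Multiply value by factor (factor defaults to 2)."""
    return value * factor

print(scale(7))             # default factor is used
print(scale(7, 3))          # positional argument overrides the default
print(scale(7, factor=10))  # keyword argument overrides the default
```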

### 9. Lists

A list is an ordered, mutable (changeable) collection of items. Lists can contain items of different data types, including other lists.


In [None]:
# Creating a list of strings
fruits = ['apple', 'banana', 'cherry']

In [None]:
# Accessing elements
print(fruits[0])  # Output: apple

In [None]:
# Modifying elements
fruits[1] = 'blueberry'
print(fruits)  # Output: ['apple', 'blueberry', 'cherry']

In [None]:
# Adding elements
fruits.append('orange')
print(fruits)  # Output: ['apple', 'blueberry', 'cherry', 'orange']

In [None]:
# Removing elements
fruits.remove('apple')
print(fruits)  # Output: ['blueberry', 'cherry', 'orange']

In [None]:
# Slicing a list
print(fruits[1:3])  # Output: ['cherry', 'orange']
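
As noted at the start of this section, a list can mix data types and hold other lists. A minimal sketch, where the `mixed` list is an illustrative name that does not appear elsewhere in the notebook:

```python
mixed = [1, 2.5, 'text', True, [10, 20, 30]]

inner = mixed[4]     # the last item is itself a list
print(inner)
print(mixed[4][1])   # index into the nested list

# Collect the type name of every item
type_names = [type(item).__name__ for item in mixed]
print(type_names)
```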

### 10. Tuples

A tuple is an ordered, immutable (unchangeable) collection of items. Once a tuple is created, you cannot modify its elements.

In [None]:
# Creating a tuple
coordinates = (10.0, 20.0)

In [None]:
# Accessing elements
print(coordinates[0])  # Output: 10.0

In [None]:
# Tuples are immutable, so they can't be changed once defined
coordinates[0] = 15.0  # this line will raise a TypeError

In [None]:
# You can also create a tuple without parentheses (tuple packing)
point = 10.0, 20.0, 30.0
print(point)  # Output: (10.0, 20.0, 30.0)

# Tuple unpacking
x, y, z = point
print(x, y, z)  # Output: 10.0 20.0 30.0

### 11. Dictionaries

A dictionary is a collection of key-value pairs. Since Python 3.7, dictionaries preserve the order in which keys were inserted. Dictionaries are mutable, which means you can change, add, or remove items.

In [None]:
# Creating a dictionary
student = {'name': 'Alice', 'age': 25, 'major': 'Physics'}

In [None]:
# Accessing values
print(student['name'])  # Output: Alice

In [None]:
# Modifying values
student['age'] = 26
print(student)  # Output: {'name': 'Alice', 'age': 26, 'major': 'Physics'}

In [None]:
# Adding a new key-value pair
student['GPA'] = 3.8
print(student)  # Output: {'name': 'Alice', 'age': 26, 'major': 'Physics', 'GPA': 3.8}

In [None]:
# Removing a key-value pair
del student['major']
print(student)  # Output: {'name': 'Alice', 'age': 26, 'GPA': 3.8}

In [None]:
# Iterating through a dictionary
for key, value in student.items():
    print(key, value)
# Output:
# name Alice
# age 26
# GPA 3.8
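
Accessing a missing key with square brackets raises a `KeyError`. The sketch below shows two common ways to avoid that, a membership test and `dict.get` with a default. The fallback value `'undeclared'` is made up for illustration:

```python
student = {'name': 'Alice', 'age': 26, 'GPA': 3.8}

has_major = 'major' in student              # membership test on the keys
major = student.get('major', 'undeclared')  # default value instead of a KeyError
print(has_major, major)
```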

### 12. String Manipulations
A string is a sequence of characters and is one of the most commonly used data types in Python. Python provides a variety of methods to manipulate strings. Here are some important string methods and examples of how to use them:


In [None]:
# String Initialization
text = "Hello, World!"

# Length of the String
print(len(text))  # Output: 13

In [None]:
# String Indexing and Slicing
print(text[0])    # Output: H
print(text[-1])   # Output: !
print(text[0:5])  # Output: Hello

In [None]:
# Changing Case
print(text.lower())  # Output: hello, world!
print(text.upper())  # Output: HELLO, WORLD!
print(text.capitalize())  # Output: Hello, world!

In [None]:
# Stripping Whitespace
text_with_whitespace = "  Hello, World!  "
print(text_with_whitespace.strip())  # Output: Hello, World!
print(text_with_whitespace.lstrip()) # Output: Hello, World!  
print(text_with_whitespace.rstrip()) # Output:   Hello, World!

In [None]:
# Finding Substrings
print(text.find("World"))  # Output: 7
print(text.find("Python"))  # Output: -1 (not found)

In [None]:
# Replacing Substrings
print(text.replace("World", "Python"))  # Output: Hello, Python!

In [None]:
# Splitting and Joining Strings
split_text = text.split(",")  # Output: ['Hello', ' World!']
print(split_text)
joined_text = " ".join(split_text)  # Output: Hello  World!
print(joined_text)

In [None]:
# Checking Start and End
print(text.startswith("Hello"))  # Output: True
print(text.endswith("!"))        # Output: True

In [None]:
# Count Substrings
print(text.count("l"))  # Output: 3

In [None]:
# String Formatting
name = "Alice"
age = 30
print(f"My name is {name} and I am {age} years old.")  # Output: My name is Alice and I am 30 years old.
print("My name is {} and I am {} years old.".format(name, age))  # Output: My name is Alice and I am 30 years old.

### 13. Pandas

Pandas is a powerful data manipulation and analysis library for Python. It provides data structures like Series and DataFrame, which are very useful for handling and analyzing structured data.
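
Before turning to DataFrames, here is a brief sketch of the other structure mentioned above, the Series, which is a one-dimensional labeled array. The names and values are illustrative:

```python
import pandas as pd

# A Series pairs each value with a label from its index
ages = pd.Series([25, 30, 35], index=['Alice', 'Bob', 'Charlie'], name='Age')

print(ages)
print(ages['Bob'])   # look up a value by its label
print(ages.mean())   # aggregate over all values
```

Each column of a DataFrame is itself a Series, so the same operations apply to individual columns.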

**Creating a DataFrame**

A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In [None]:
import pandas as pd

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df)

In [None]:
import os

# Save the data to a CSV file (create the 'data' folder first if it is missing)
os.makedirs('data', exist_ok=True)
df.to_csv('data/data.csv', index=False)

**Reading Data from a CSV File**

Pandas can read data from various file formats, including CSV, Excel, SQL databases, and more. Here, we'll demonstrate how to read data from a CSV file.

In [None]:
# Reading data from a CSV file
df = pd.read_csv('data/data.csv')
print(df)

**Basic Data Manipulations**

Pandas provides numerous functions to manipulate data. Here are some basic operations:

In [None]:
# Display the first few rows of the DataFrame
print(df.head())

In [None]:
# Display summary statistics of the DataFrame
print(df.describe())

In [None]:
# Filter rows based on a condition
adults = df[df['Age'] >= 18]
print(adults)

In [None]:
# Add a new column to the DataFrame
df['Age in 10 Years'] = df['Age'] + 10
print(df)

In [None]:
# Group data by a column and calculate the mean
mean_age_by_city = df.groupby('City')['Age'].mean()
print(mean_age_by_city)

### 14. NumPy
NumPy is a fundamental package for scientific computing with Python. It provides support for arrays, mathematical functions, and much more.

**Creating Arrays**

NumPy arrays are similar to Python lists but provide many more functionalities and are more efficient for numerical operations.

In [None]:
import numpy as np

# Creating a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
print(array_1d)

In [None]:
# Creating a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(array_2d)

In [None]:
# Creating arrays with special values
zeros_array = np.zeros((3, 3))
print(zeros_array)

ones_array = np.ones((2, 4))
print(ones_array)

fulls_array = np.full((2, 3), 7)
print(fulls_array)

In [None]:
# Creating an array with a range of values
range_array = np.arange(10)
print(range_array)


**Basic Array Operations**

NumPy provides many functions to perform operations on arrays.

In [None]:
# Element-wise addition
array_sum = array_1d + 2
print(array_sum)

In [None]:
# Element-wise multiplication
array_product = array_1d * 3
print(array_product)

In [None]:
# Element-wise square root
array_sqrt = np.sqrt(array_1d)
print(array_sqrt)
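
To see what "element-wise" means in practice, the sketch below compares a plain Python list with a NumPy array. The variable names are illustrative and not part of the original notebook:

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# Plain Python: build the result one element at a time
doubled_list = [v * 2 for v in values]

# NumPy: one expression applies the operation to every element
doubled_array = np.array(values) * 2

# Caution: multiplying a list by 2 repeats the list instead of doubling the values
repeated = values * 2

print(doubled_list)
print(doubled_array)
print(repeated)
```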

**Basic Array Manipulations**

NumPy also provides functions to manipulate arrays, such as reshaping, slicing, and indexing.

In [None]:
# Reshaping an array
reshaped_array = array_2d.reshape((3, 2))
print(reshaped_array)

In [None]:
# Slicing an array
sliced_array = array_1d[1:4]
print(sliced_array)

In [None]:
# Indexing an array
indexed_value = array_2d[1, 2]
print(indexed_value)

### 15. Plotting with Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It is particularly good for creating plots and charts.

**Line Plot**

A line plot is a simple way to visualize a sequence of data points connected by straight line segments.

In [None]:
import matplotlib.pyplot as plt

# Data for plotting
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# Creating a line plot
plt.figure(figsize=(8, 5))
plt.plot(x, y, marker='o')
plt.title('Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.grid(True)
plt.show()

**Bar Chart**

A bar chart is used to represent data with rectangular bars.

In [None]:
# Data for plotting
categories = ['A', 'B', 'C', 'D']
values = [10, 20, 15, 25]

# Creating a bar chart
plt.figure(figsize=(8, 5))
plt.bar(categories, values, color='skyblue')
plt.title('Bar Chart')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.show()


### 16. Practice Exercises
Now it's time to practice what you've learned. Below are some exercises for you to try:

1. Write a Python program to calculate the area of a rectangle.
2. Write a Python program to find the maximum of three numbers.
3. Write a Python program to find the sum of all numbers from 1 to 100.
4. Write a Python program to calculate the factorial of a number.
5. Write a Python program to check if a number is prime.
6. Write a Python function that takes two numbers as input and returns their greatest common divisor (GCD).

Here's an example of how to approach these exercises:


In [None]:
# Exercise 1: Calculate the area of a rectangle for the given length and width
length = 5
width = 3
area = ... # TODO: Calculate the area of the rectangle
print(f"The area of the rectangle is {area}")

# check answer
assert area == 15, f"Wrong answer, {area} != 15"

In [None]:
import numpy as np
# Exercise 2: Find the maximum of three numbers
numbers = np.random.randint(0, 10, 3)
if ...:
    max_number = ... # TODO: Find the maximum number
...
print(f"The maximum number is {max_number}")

# check answer
assert max_number == np.max(numbers), f"Wrong answer, {max_number} != {np.max(numbers)}"

In [None]:
# Exercise 3: Sum of numbers from 1 to 100
total_sum = 0
for i in ...: # TODO: Loop through numbers from 1 to 100
    total_sum = ... # TODO: Calculate the sum of numbers
print(f"The sum of numbers from 1 to 100 is {total_sum}")

# check answer
assert total_sum == sum(range(101)), f"Wrong answer, {total_sum} != {sum(range(101))}"

In [None]:
# Exercise 4: Write a function to calculate the factorial of a number
def factorial(n):
    if ...:
        return ...
    output = 1
    for i in ...: # TODO: Loop through numbers from 1 to n
        output = ... # TODO: Calculate the factorial of n
    return output

In [None]:
# Exercise 5: Write a function that checks if a number is prime
def is_prime(num):
    '''
    This function checks if a number is prime or not 
    and returns True if the number is prime, otherwise False.
    '''
    ... # some code here

# check answer
assert is_prime(7) == True, "Wrong answer"
assert is_prime(10) == False, "Wrong answer"
assert is_prime(1) == False, "Wrong answer"
assert is_prime(2) == True, "Wrong answer"

In [None]:
# Exercise 6: Find the GCD of two numbers
def gcd(a, b):
    '''
    This function calculates the Greatest Common Divisor (GCD) of two numbers.
    '''
    ... # some code here


# check answer
assert gcd(12, 15) == 3, "Wrong answer"
assert gcd(10, 20) == 10, "Wrong answer"
assert gcd(7, 9) == 1, "Wrong answer"
assert gcd(12, 12) == 12, "Wrong answer"

## RDKit

### Table of Contents
1. [Introduction](#1-intro)
2. [Installation](#2-installation)
3. [Basic Usage](#3-basic-usage)
4. [Molecule Visualization](#4-molecule-visualization)
5. [Molecular Descriptors](#5-molecular-descriptors)
6. [File Reading and Writing](#6-file-reading-and-writing)
7. [Practice Exercises](#7-practice-exercises)

### 1. Intro
RDKit is a collection of cheminformatics and machine learning tools. It is used for processing and analyzing chemical information. In this notebook, we will cover the basics of using RDKit, including molecule creation, visualization, calculating molecular descriptors, and reading and writing molecule files.

### 2. Installation
To use RDKit, you need to install it first, either with conda (`conda install -c conda-forge rdkit`) or with pip (`pip install rdkit`).

### 3. Basic Usage
Let's start with some basic usage of RDKit. We'll learn how to create a molecule object from a SMILES string and get basic information about the molecule. Your best friend will be the [RDKit documentation](https://www.rdkit.org/docs/GettingStartedInPython.html), where you can find RDKit functions and their explanations.

In [None]:
from rdkit import Chem

# Create a molecule from a SMILES string
smiles = 'CCO'
molecule = Chem.MolFromSmiles(smiles)

# Print the molecule object
print(molecule)

# Get the number of atoms in the molecule
num_atoms = molecule.GetNumAtoms()
print(f'The molecule has {num_atoms} atoms.')

# Get the number of bonds in the molecule
num_bonds = molecule.GetNumBonds()
print(f'The molecule has {num_bonds} bonds.')
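
`Chem.MolFromSmiles` returns `None` when it cannot parse the input, so it is worth checking the result before calling methods on it. The sketch below wraps that check in a helper; `parse_smiles` is a hypothetical name and not an RDKit function:

```python
from rdkit import Chem

def parse_smiles(smiles):
    """Return an RDKit molecule, or None when the SMILES cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        print(f'Could not parse SMILES: {smiles!r}')
    return mol

ethanol = parse_smiles('CCO')
broken = parse_smiles('C1CC')  # ring opened with "1" but never closed

if ethanol is not None:
    print(ethanol.GetNumAtoms())  # counts heavy atoms only by default
```

RDKit also prints its own parse error message for the invalid string. Calling a method such as `GetNumAtoms()` on `None` would raise an `AttributeError`.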


### 4. Molecule Visualization
RDKit provides tools for visualizing molecules. We can use the `rdkit.Chem.Draw` module to display molecules.

In [None]:
from rdkit import Chem
from rdkit.Chem import Draw

molecule = Chem.MolFromSmiles('CC1([C@@H](N2[C@H](S1)[C@@H](C2=O)NC(=O)Cc3ccccc3)C(=O)O)C')

# Draw the molecule
Draw.MolToImage(molecule)

### 5. Molecular Descriptors
Molecular descriptors are numerical values that describe the properties of molecules. RDKit provides functions to calculate various descriptors.

In [None]:
from rdkit.Chem import Descriptors

# Calculate molecular weight
mol_weight = Descriptors.MolWt(molecule)
print(f'Molecular Weight: {mol_weight}')

# Calculate the number of hydrogen bond donors
num_h_donors = Descriptors.NumHDonors(molecule)
print(f'Number of Hydrogen Bond Donors: {num_h_donors}')

# Calculate the number of hydrogen bond acceptors
num_h_acceptors = Descriptors.NumHAcceptors(molecule)
print(f'Number of Hydrogen Bond Acceptors: {num_h_acceptors}')


### 6. File Reading and Writing
RDKit allows you to read and write molecule files in various formats, such as SDF, MOL, and SMILES. This section will cover how to read and write these files using RDKit.

In [None]:
from rdkit import Chem

# Reading a molecule from a SMILES string
smiles = 'CCO'
molecule = Chem.MolFromSmiles(smiles)

# Writing the molecule to a file in SDF format
with Chem.SDWriter('molecule.sdf') as writer:
    writer.write(molecule)

# Reading a molecule from an SDF file
with Chem.SDMolSupplier('molecule.sdf') as supplier:
    mol_from_file = next(supplier)

# Confirming the molecule is read correctly
print(Chem.MolToSmiles(mol_from_file))
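
The same round trip can be sketched for the MOL format mentioned above, using `Chem.MolToMolBlock` and `Chem.MolFromMolFile`. The file name `molecule.mol` is chosen only for this illustration:

```python
from rdkit import Chem

molecule = Chem.MolFromSmiles('CCO')

# Convert the molecule to MOL-format text and write it to a file
mol_block = Chem.MolToMolBlock(molecule)
with open('molecule.mol', 'w') as handle:
    handle.write(mol_block)

# Read the MOL file back and compare canonical SMILES strings
mol_from_mol_file = Chem.MolFromMolFile('molecule.mol')
print(Chem.MolToSmiles(mol_from_mol_file))
```

A MOL file holds a single molecule, whereas an SDF file can hold many, which is why the SDF example above reads through a supplier.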


### 7. Practice Exercises
Here are some exercises to practice what you've learned:

1. Create a molecule from the SMILES string 'CCO' and display it.
2. Calculate and print the molecular weight, number of hydrogen bond donors, and number of hydrogen bond acceptors for the molecule 'CCO'.
3. Save the molecule to an SDF file, read it back, and confirm that the SMILES strings of the two molecules match.


Here's an example of how to approach these exercises:

In [None]:
# Exercise 1: Find your favorite molecule in SMILES format
smiles = ... 
molecule = ... # TODO: Create a molecule object from the SMILES string
... # TODO: Display the molecule

In [None]:
# Exercise 2: Calculate molecular weight, number of hydrogen bond donors, and number of hydrogen bond acceptors
mol_weight = ...
num_h_donors = ...
num_h_acceptors = ...
print(f'Molecular Weight: {mol_weight}')
print(f'Number of Hydrogen Bond Donors: {num_h_donors}')
print(f'Number of Hydrogen Bond Acceptors: {num_h_acceptors}')

In [None]:
# Exercise 3: Save the molecule to an SDF file, then read it back.
filename = 'molecule.sdf'
... # TODO: Write the molecule to an SDF file
... # TODO: Read the molecule from the SDF file

assert Chem.MolToSmiles(molecule) == Chem.MolToSmiles(mol_from_file), "Molecule is not read correctly"