# 1. Python Data Types

## Recap on data types

Python has several built-in data types, which are categorized into:
- Numeric: `int`, `float`, `complex`
- Sequence: `str`, `list`, `tuple`
- Mapping: `dict`
- Set: `set`
- Boolean: `bool`

### Numeric Data Types
- `int`: Integer, e.g. 1, 2, 3
- `float`: Floating point number, e.g. 1.0, -2.5, 3.14
- `complex`: Complex number, e.g. 1 + 2j, 3 - 4j

In [None]:
# Integer
int_var = 10
print(type(int_var))  # <class 'int'>

# Float
float_var = 10.5
print(type(float_var))  # <class 'float'>

# Complex
complex_var = 10 + 5j
print(type(complex_var))  # <class 'complex'>

### Sequence Data Types
Sequence data types are **ordered** collections of similar or different data types. The elements in a sequence can be accessed using **indexing**.
- `str`: String, e.g. "hello", 'world'
- `list`: List, e.g. [1, 2, 3], ['a', 'b', 'c']
- `tuple`: Tuple, e.g. (1, 2, 3), ('a', 'b', 'c')

In [None]:
# String
str_var = "Hello, Python!"
print(type(str_var))  # <class 'str'>

# List
list_var = [1, 2, 3, 4, 5]
print(type(list_var))  # <class 'list'>

# Tuple
tuple_var = (1, 2, 3, 4, 5)
print(type(tuple_var))  # <class 'tuple'>

You can put anything you want in a list: other lists, strings, tuples, and so on. Although this is allowed, mixing types in one list should be avoided in practice, as it can lead to code which behaves unpredictably and is tricky for others (and future you) to debug.

In [None]:
# You can put anything you want in a list, including other lists

elements = [["Hydrogen", "Helium", "Lithium"], ["Beryllium", "Boron", "Carbon"], ["Nitrogen", "Oxygen", "Fluorine"]]

# You can also declare whacky lists like this

whacky_list = [1, 'dog', 3.14, [4, 5, 6]]

### List Operations

You can add elements to the list using the `.append` method.

In [None]:
element_list = ["Hydrogen", "Helium", "Lithium", "Beryllium", "Boron"]
element_list.append("Lead")
print(element_list)

To add an element to a list at a specific index, you can use the `insert` method. Write code to add the missing element to the list.

In [None]:
element_list = ["Hydrogen", "Lithium", "Beryllium", "Boron"]
element_list.insert(1, 'Helium')
print(element_list)

To remove an element from a list, you can use the `remove` method, passing the value of the element you want to remove. If the value occurs more than once, only the first occurrence is removed. Write code to remove the first element from the list.

In [None]:
element_list.remove("Hydrogen")
print(element_list)

The `remove` method simply deletes the element. What if you want to retrieve the element and then delete it? You can use the `pop` method for this purpose. It takes one optional argument, the index of the element you want to remove, and returns that element. If you don't specify an index, it removes the last element from the list by default. You can also access lists in the reverse direction using negative indices, where `-1` refers to the last element, `-2` to the second-to-last element, and so on. Write code to remove the last element from the list, and then remove the second element.

In [None]:
element_list = ["Hydrogen", "Helium", "Lithium", "Beryllium", "Boron"]

a = element_list.pop(-1)
print(a)

b = element_list.pop(1)
print(b)

print(element_list)

You can delete all the elements using the `clear` method.

In [None]:
element_list.clear()
print(element_list)

What if we have two lists and want to combine them into a single list? For this we can concatenate them using the `+` operator, which creates a new list and leaves the originals unchanged.

In [None]:
element_list_1 = ["Hydrogen", "Helium", "Lithium", "Beryllium", "Boron"]
element_list_2 = ["Nitrogen", "Oxygen", "Fluorine"]
element_list_3 = element_list_1 + element_list_2
print(element_list_3)

print(len(element_list_3))

#### Indexing
- Indexing in Python starts from 0.
- Negative indexing is also possible, where -1 refers to the last element, -2 refers to the second last element, and so on.
- Slicing can be used to access a range of elements in a sequence.
    - The syntax for slicing is `sequence[start:stop:step]`.

In [None]:
# These types are ordered and can be indexed
print(str_var[0])
print(list_var[3])
print(tuple_var[-1])

In [None]:
print(str_var[0:5])
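
The `start` and `stop` parts of a slice can each be left out. The sketch below uses a short illustrative list (not one of the variables defined above) to show what the omitted bounds default to.

```python
# A short list to slice (illustrative data)
elements = ["Hydrogen", "Helium", "Lithium", "Beryllium", "Boron"]

first_two = elements[:2]   # start omitted: begins at index 0
the_rest = elements[2:]    # stop omitted: runs to the end of the list
middle = elements[1:4]     # indices 1, 2 and 3; the stop index 4 is excluded

print(first_two)  # ['Hydrogen', 'Helium']
print(the_rest)   # ['Lithium', 'Beryllium', 'Boron']
print(middle)     # ['Helium', 'Lithium', 'Beryllium']

# Slicing returns new lists; the original is unchanged
print(first_two + the_rest == elements)  # True
```

Because the stop index is excluded, `elements[:k]` and `elements[k:]` always split a list into two pieces with no overlap and no gap.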

#### Difference between `list` and `tuple`
- `list` is mutable, i.e. the elements in a list can be changed or modified.
- `tuple` is immutable, i.e. the elements in a tuple cannot be changed or modified.

In [None]:
list_var[0] = 10
print(list_var)  # [10, 2, 3, 4, 5]

In [None]:
tuple_var[0] = 10  # TypeError: 'tuple' object does not support item assignment

#### String Methods
- `str` has several built-in methods, such as `upper()`, `lower()`, `strip()`, `split()`, `join()`, `find()` etc.
- `str` is immutable, i.e. the elements in a string cannot be changed or modified.
- String concatenation can be done using the `+` operator.
- String formatting can be done using f-strings.

In [None]:
# built in string methods
print(str_var.lower())
print(str_var.upper())
print(str_var.split(","))
print(str_var.replace("Hello", "Hi"))
print(str_var.find("Python"))
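
The cell above covers `lower()`, `upper()`, `split()`, `replace()` and `find()`. As a minimal sketch of the remaining items from the list, here are `strip()`, `join()` and `+` concatenation applied to an illustrative string (`raw` is a made-up variable, not part of the notebook).

```python
raw = "  Hello, Python!  "  # illustrative string with stray whitespace

clean = raw.strip()                  # remove leading and trailing whitespace
words = clean.split(", ")            # split on the comma-space separator
rejoined = " - ".join(words)         # join the pieces with a new separator
greeting = "Hi" + ", " + "Python!"   # concatenation with +

print(clean)     # Hello, Python!
print(words)     # ['Hello', 'Python!']
print(rejoined)  # Hello - Python!
print(greeting)  # Hi, Python!

# Strings are immutable: every method returned a new string
print(raw == "  Hello, Python!  ")  # True
```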

In [None]:
# f string
molecules = 'hydrogen oxide'
atoms = 3
print(f'Water is composed of mostly {molecules} and it has {atoms} atoms')

### Set
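A set is an unordered collection of **unique** elements. A set literal is written with curly braces, e.g. `{1, 2, 3}`. Note that an empty pair of braces `{}` creates an empty dictionary, not a set; use `set()` to create an empty set.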
- `set`: Set, e.g. {1, 2, 3}, {'a', 'b', 'c'}

In [None]:
set_var = {1, 2, 3, 4, 5}
print(type(set_var))  # <class 'set'>

In [None]:
# sets are unordered and cannot be indexed
print(set_var[0])  # TypeError: 'set' object is not subscriptable

In [None]:
# showcase that unique elements are stored in set
set_var = {1, 2, 3, 4, 5, 5, 5, 5, 5}
print(set_var)

#### Usage of Sets
- To eliminate duplicate elements from a list. (*See above*)
- To perform mathematical set operations like union, intersection, difference, etc.
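
The first use case can be sketched as follows, starting from an actual list rather than a set literal. The list `samples` is made-up illustrative data.

```python
samples = ["H", "He", "H", "Li", "He", "H"]  # illustrative list with repeats

unique_samples = set(samples)  # duplicates collapse; order is not guaranteed
print(len(samples), len(unique_samples))  # 6 3

# sorted() turns the set back into a list with a predictable order
unique_sorted = sorted(unique_samples)
print(unique_sorted)  # ['H', 'He', 'Li']
```

Converting through a set discards the original order, so this approach fits best when the order of the items does not matter.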

In [None]:
# show use cases for sets
set_var1 = {1, 2, 3, 4, 5}
set_var2 = {4, 5, 6, 7, 8}

print(set_var1.union(set_var2))  # {1, 2, 3, 4, 5, 6, 7, 8}
print(set_var1.intersection(set_var2))  # {4, 5}
print(set_var1.difference(set_var2))  # {1, 2, 3}
print(set_var1.symmetric_difference(set_var2))  # {1, 2, 3, 6, 7, 8}

### Mapping Data Types
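A dictionary is a mutable collection of key-value pairs, where each value is looked up by its key rather than by position. Keys must be unique, and since Python 3.7 dictionaries preserve the order in which keys were inserted. In Python, dictionaries are written with curly brackets.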
- `dict`: Dictionary, e.g. {'amino acid': ['alanine', 'valine'], 'nucleotide': ['adenine', 'thymine']}

In [None]:
dict_var = {"halogen": "fluorine", "noble_gas": "helium", "alkali_metal": "lithium"}
print(type(dict_var))  # <class 'dict'>
print('keys:', dict_var.keys())
print('values:', dict_var.values())
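
As a minimal sketch of how a dictionary is typically used, the block below looks up a value, adds an entry and loops over the pairs. The `"chalcogen"` and `"lanthanide"` keys are illustrative additions, not part of the original cell.

```python
dict_var = {"halogen": "fluorine", "noble_gas": "helium", "alkali_metal": "lithium"}

# Look up a value by its key
print(dict_var["halogen"])  # fluorine

# Assigning to a new key adds an entry; assigning to an existing key overwrites it
dict_var["chalcogen"] = "oxygen"

# .items() yields (key, value) pairs
for group, element in dict_var.items():
    print(f"{group}: {element}")

# .get() returns a default instead of raising KeyError for a missing key
missing = dict_var.get("lanthanide", "not in dictionary")
print(missing)  # not in dictionary
```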

## Exercises

### Exercise 1.1:
What does this code return?

```python
my_string = "2cfo6njs[pwi2r3adcvy"
my_string[0:10:2]
```
What could the 2 mean in that context?

In [1]:
my_string = "2cfo6njs[pwi2r3adcvy"
my_string[0:10:2]

'2f6j['

As a final exercise on slicing, you will write code for finding the middle index of a list and then use list slicing to split the list into two sublists. Put it inside the function `split_list` and test it using the `test_split` function. When dividing a list in two, think about the edge cases you must consider. Will your code work for lists of both even and odd length? What about empty lists? Remember that lists in Python have indexes which start from `0`, so the `7th` element has index `6`. Not correctly accounting for this is an extremely common problem in programming and can be tricky to debug.

First, implement the simplest case, where the list length is an even number. Your output should be the middle index and the two equal-length halves of the list. Make sure to calculate the list slices using simple mathematical operations in the code.

In [None]:
test_even = ["Hydrogen", "Helium", "Lithium", "Beryllium", "Boron", "Carbon", "Nitrogen", "Oxygen", "Fluorine", "Sodium"]
test_odd = ["Hydrogen", "Helium", "Lithium", "Beryllium", "Boron", "Carbon", "Nitrogen", "Oxygen", "Fluorine"]


middle_index = ...
first_half = ...
second_half = ...


Now extend it to work with lists of odd length. When a list cannot be split evenly, it is good practice to put the extra element in the first (lower-index) half, so that the longer segment comes first; this is indicated in the test case.

In [None]:
test_even = ["Hydrogen", "Helium", "Lithium", "Beryllium", "Boron", "Carbon", "Nitrogen", "Oxygen", "Fluorine", "Sodium"]
test_odd = ["Hydrogen", "Helium", "Lithium", "Beryllium", "Boron", "Carbon", "Nitrogen", "Oxygen", "Fluorine"]



middle_index = ...
first_half = ...
second_half = ...



### Exercise 1.2:

How can you make this calculation work?

```python
a = 5
b = "6"
a + b
```

In [None]:
a = 5
b = "6"
# correct here


### Exercise 1.3:

Now you have seen that Python has interoperability of certain variable types. 

Next we will look at boolean variables. These hold a single value, `True` or `False`. You can perform the operations `and`, `or` and `not` on them.

First, evaluate the expressions below by hand, then check your answers with some Python code.

a = False
b = True
c = False

1. a and b        = ...
2. a or b         = ...
3. not a          = ...
4. not b and c    = ...
5. (a and b) or c = ...

In [None]:
# In Python, the boolean operators are written as their English names: and, or, not

### Your code here

Now that we have a concept of booleans, we can think about conditional statements. These are useful if you want to execute separate branches of code depending on your input. An `if` statement evaluates a boolean expression and, if the expression is `True`, enters the execution block. Blocks are marked by indentation.
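
As a minimal sketch of the `if` / `elif` / `else` structure, the block below picks one of three branches. The value of `pH` and the variable `label` are illustrative choices.

```python
pH = 8.5  # illustrative value

if pH < 7:
    label = "acidic"    # runs only when the first condition is True
elif pH > 7:
    label = "basic"     # checked only when the first condition was False
else:
    label = "neutral"   # runs when none of the conditions above were True

print(f"A solution with pH {pH} is {label}.")  # A solution with pH 8.5 is basic.
```

Exactly one of the three indented blocks runs: Python checks the conditions from top to bottom and stops at the first one that is `True`.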

First, WITHOUT running the code below, determine its output by hand. It is an important skill to be able to understand what a piece of code does without running it.

In [None]:
a = True
b = False
c = True

if a:
    if not c:
        print('Answer 1')
    elif c and b:
        print('Answer 2')
    print('Answer 3')
else:
    print('Answer 4')

### Exercise 1.4:

Sometimes we won't have the option of using booleans in our code; for example, we might want to evaluate whether a string or an integer counts as `True` or `False`. For this case, Python allows the evaluation of conditional statements on non-boolean inputs. Try out various combinations of the variables below, with the goal of finding out which values of the string and integer data types evaluate to `True` and which to `False`.

In [None]:
a = 'Hydrogen'
b = 'oxygen'
c = 1
d = 0
e = ''
f = -3
g = None

### Your code here

### Exercise 1.5:
Print out the first letter of every word in the string.

```python
sentence = 'Sober Physicists Don’t Find Giraffes Hiding In Kitchens'
```
What do you observe?

In [None]:
sentence = 'Sober Physicists Don’t Find Giraffes Hiding In Kitchens'

# print solution here

### Exercise 1.6:

1. Create a dictionary that represents the following table:
 
| Base | Acid |
|------|------|
| 'NaOH' | 'HCl' |
| 'KOH' | 'H2SO4' |
| 'Ca(OH)2' | 'HNO3' |

2. Add a new base to the dictionary: `NH4OH`.
3. Print out the categories and chemicals. 

In [None]:
# 1.

# 2.

# 3.

# 2. Control Structures - Loops

Now we will have a look at control flow in code. If you have a collection of elements like a list, you might want to iterate over each element and perform an action. First, let's look at the `while` loop. This loop checks a condition and, if the condition evaluates to `True`, executes a block of code. After the code block is executed, it returns to the condition and checks it again. The loop ends as soon as the condition evaluates to `False`.

In [None]:
pH = 2  # Assume we start the pH at 2 (which is acidic)

while pH != 7:  # while the pH is not neutral
    print(f"Current pH: {pH}")
    if pH < 7:  # if the environment is acidic
        print("Solution is too acidic. Adding a base to increase pH.")
        pH += 1  # add a base to increase the pH
    elif pH > 7:  # if the environment is basic
        print("Solution is too basic. Adding an acid to decrease pH.")
        pH -= 1  # add an acid to reduce the pH
        
print("Solution is now neutral.")

We can also use `while` loops to iterate over a sequence of numbers.

In [None]:
counter = 0
max_count = 9

# Here is the list of the first nine chemical elements:
elements = ["Hydrogen", "Helium", "Lithium", "Beryllium", "Boron", "Carbon", "Nitrogen", "Oxygen", "Fluorine"]

while counter < max_count:
    # Here we print the element at the current index
    # Note the adjustment for 0-based indexing
    print(f"Element {counter + 1}: {elements[counter]}")
    counter += 1

We can use two additional control flow statements inside loops. `break` immediately terminates the loop, and `continue` skips the rest of the current iteration, after which the loop carries on with the next one.
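
The two statements can be sketched on a plain range of numbers. The list `kept` is an illustrative name used only to record which values reach the end of the loop body.

```python
kept = []

for n in range(1, 10):
    if n % 2 == 0:
        continue      # skip even numbers and move on to the next n
    if n > 6:
        break         # leave the loop entirely once an odd n exceeds 6
    kept.append(n)    # only reached for odd numbers up to 6

print(kept)  # [1, 3, 5]
```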

Given this information, what will be the output of the program below?

In [None]:
elements = ["Iron", "Copper", "Zinc", "Gold", "Silver", "Platinum"]

for element in elements:
    if element == "Copper":
        continue
    if element == "Gold":
        break
    print(element)

### For Loops

A for loop in Python is a way to repeat code for each item in a sequence. The basic syntax looks like this:

In [None]:
iterable = ["a", "b", "c"]  # placeholder: any list, string, range, etc.
for item in iterable:
    pass  # do something with item

Iterables are objects in Python that contain a sequence of elements - they can be "iterated over" one element at a time. Common iterables include:

In [None]:
# Lists (iterate over each item)
noble_gases = ["Helium", "Neon", "Argon"]
for gas in noble_gases:
    print(gas)

# We can also iterate in reverse
for gas in reversed(noble_gases):
    print(gas)

# Strings (iterate over each character)
name = "Lithium"
for letter in name:
    print(letter)

# Range (generates a sequence of numbers)
for number in range(3):
    print(number)  # Prints 0, 1, 2

The beauty of for loops is their simplicity - you don't need to manage indexes or worry about when to stop. Python automatically handles iterating through all elements and stops when it reaches the end.

However, if you want to iterate over a list via its index using a for loop, you can do it in one of the two following ways.

In [None]:
elements = ["Hydrogen", "Helium", "Lithium", "Beryllium", "Boron", "Carbon", "Nitrogen", "Oxygen", "Fluorine"]

for idx in range(len(elements)):
    print(f"{idx}, {elements[idx]}")

for idx, element in enumerate(elements):
    print(f"{idx}, {element}")
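
`enumerate` also accepts an optional `start` argument. As a small sketch with a shortened list, starting the counter at 1 makes it line up with each element's atomic number.

```python
elements = ["Hydrogen", "Helium", "Lithium"]

# start=1 makes the counter begin at 1 instead of 0,
# which here happens to match each element's atomic number
for atomic_number, element in enumerate(elements, start=1):
    print(f"{atomic_number}: {element}")

numbered = list(enumerate(elements, start=1))
print(numbered)  # [(1, 'Hydrogen'), (2, 'Helium'), (3, 'Lithium')]
```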

It's important to know that reassigning the loop variable `element` inside the loop does not alter the original list.

In [None]:
elements = ["Hydrogen", "Helium", "Lithium", "Beryllium", "Boron", "Carbon", "Nitrogen", "Oxygen", "Fluorine"]

for idx, element in enumerate(reversed(elements)):
    element = element.lower() + ' : ' + str(idx + 1)

print(elements)
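
The difference between rebinding the loop variable and writing into the list can be sketched on a shortened list. This example only changes the case of each entry and leaves the order alone.

```python
elements = ["Hydrogen", "Helium", "Lithium"]

# Rebinding the loop variable leaves the list alone...
for element in elements:
    element = element.lower()
print(elements)  # ['Hydrogen', 'Helium', 'Lithium']

# ...whereas assigning to elements[idx] writes into the list itself
for idx, element in enumerate(elements):
    elements[idx] = element.lower()
print(elements)  # ['hydrogen', 'helium', 'lithium']
```

Replacing items by index is safe here because the length of the list never changes while it is being iterated over.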

If we wish to modify the original list, we can try the naive approach below. The code is trying to reverse the list and add atomic numbers. Before running the code, can you see what will go wrong?

In [None]:
elements = ["Hydrogen", "Helium", "Lithium", "Beryllium", "Boron", "Carbon", "Nitrogen", "Oxygen", "Fluorine"]

for idx, element in enumerate(reversed(elements)):
    elements[idx] = element.lower() + ' : ' + str(len(elements) - idx)

### Exercise 2.1:

Implement a method to reverse the list and attach the corresponding atomic number to each element. As a hint, consider creating a new list.

In [None]:
elements = ["Hydrogen", "Helium", "Lithium", "Beryllium", "Boron", "Carbon", "Nitrogen", "Oxygen", "Fluorine"]
new_elements = []

... # Your code here

print(new_elements)

### Exercise 2.2:

Can you think of a way to reverse the list *in-place*, i.e. without creating an entirely new list?

In [None]:
elements = ["Hydrogen", "Helium", "Lithium", "Beryllium", "Boron", "Carbon", "Nitrogen", "Oxygen", "Fluorine"]
new_elements = []

... # Your code here

print(elements)

### Exercise 2.3:

For the following group of problems, your task is to work out what the output of the code will be without running it. Check your answer by executing the program. If you get them wrong, try to go through the code step by step and double check your assumptions about how each line of code works.

In [None]:
numa = 11
while numa > 2.5:
    numa = numa - 1
    print(numa)

In [None]:
numb = 2.5
for i in range(0, 10, 2):
    pass
    print(i/numb)


In [None]:
numc = 10.2 
while True:
    if numc < 6.2:
        break
    print(numc)
    numc -= 1

In [None]:
collected_strings = []

for i in range(1, 5):
    if i % 2 == 0:  
        for j in range(5):
            if j == 3:
                break
                collected_strings.append(str(j))
        collected_strings.append(str('F'))
    else:  
        for j in range(5):
            if j == 3:
                continue
            elif j == 4:
                pass
            collected_strings.append(str(j))

for i in range(3):
    if i == 1:
        collected_strings.append("!")
        continue
    collected_strings.append("?")

collect_str = "".join(collected_strings)
print(f'Collected string is {collect_str}')

The code provided in this question is buggy. Do not execute it. What do you think the programmer intended this code to do? Jot down a table that shows the value of the variables at each iteration. This should give you a clear understanding of why the code is buggy. Once you have done so, modify the code so that it is no longer buggy. Note that, due to the lack of comments indicating what the code is attempting to do, there are several possible answers.

In [None]:
n = 10 
i = 10
while i > 0:
    if i % 2 == 0:
        i=i/2
    else: 
        i=i+1

# 3. Setting paths

Setting paths when coding is important. It is a good practice to set the paths to folders/data in a way that is reproducible and especially shareable. This is important when sharing code with others, or when you are working on a project that requires data from different sources. Paths also look different on different operating systems (Windows, Mac, Linux), so it is important to set paths in a way that is compatible with all operating systems. Luckily, there are libraries like `os` and `pathlib` that can help us with that. We will look into `pathlib` in this notebook.

## Introduction to Pathlib
An introduction to the pathlib module, which provides a way to handle filesystem paths.

In [None]:
# Introduction to Pathlib

# Importing the pathlib module
from pathlib import Path

# Creating a Path object
p = Path('.')

# Displaying the current directory
print(p.resolve())

# Listing all files in the current directory
for file in p.iterdir():
    print(file)

# Creating a new directory
new_dir = p / 'new_directory'
new_dir.mkdir(exist_ok=True)

# Checking if the new directory exists
print(new_dir.exists())

# Creating a new file in the new directory
new_file = new_dir / 'new_file.txt'
new_file.touch()

# Checking if the new file exists
print(new_file.exists())

# Deleting the new file
new_file.unlink()

# Checking if the new file exists
print(new_file.exists())

# Deleting the new directory
new_dir.rmdir()

# Checking if the new directory exists
print(new_dir.exists())
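
Beyond creating and deleting files, a `Path` object can be taken apart into its components without touching the filesystem. The sketch below uses a hypothetical relative path; the folder and file names are placeholders.

```python
from pathlib import Path

# Hypothetical relative path, built with / so no separator is hard-coded
data_file = Path("week_01") / "ex_folder" / "ex_file.txt"

print(data_file.name)           # ex_file.txt
print(data_file.stem)           # ex_file
print(data_file.suffix)         # .txt
print(data_file.parent.name)    # ex_folder
print(data_file.parts)          # ('week_01', 'ex_folder', 'ex_file.txt')
print(data_file.is_absolute())  # False

# Path.cwd() and Path.home() give absolute starting points that differ per machine
print(Path.cwd() / data_file)
```

Building paths from `Path.cwd()` or `Path.home()` plus relative pieces avoids hard-coding a user name or drive letter, which is what makes a path shareable between machines.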

## Exercises

### Exercise 3.1:

1. Create a directory called `ex_folder` in the current working directory. 
2. Check after creation if the directory exists.
3. Create a file called `ex_file.txt` in the `ex_folder` directory.

In [None]:
# 1.

In [None]:
# 2.

In [None]:
# 3.

### Exercise 3.2:

Correct these paths so that they work on all operating systems, if possible.
```python
path1 = 'C:\Path\to\your\working\dir\ex_file.txt'
path2 = 'Path/to/your/working/dir/ex_file.txt'
path3 = '/Users/neeser/Documents/teaching/CH-200_PracticalProgrammingChem/practical-programming-in-chemistry-exercises/week_01/ex_folder/ex_file.txt'
```

What are the issues with these paths?

In [None]:
path1 = 'C:\Path\to\your\working\dir\ex_file.txt'
path2 = 'Path/to/your/working/dir/ex_file.txt'
path3 = Path('/Users/neeser/Documents/teaching/CH-200_PracticalProgrammingChem/practical-programming-in-chemistry-exercises/week_01/ex_folder/ex_file.txt')
# correct here

print(path1.exists())
print(path2.exists())
print(path3.exists())

### Exercise 3.3:

Delete the `ex_folder` directory and its contents. Check if the directory exists after deletion.

In [None]:
# delete the directory and its contents


In [None]:
# check if ex_folder exists
