# Introduction to Python

Welcome to Python! In this notebook you'll learn some basic tenets of Python coding and how to use a Jupyter notebook. Make sure to read the instructions carefully and follow along.

### Chapters:
- Using a Jupyter Notebook
- Intro to Python Syntax
- Some Data Structures: lists, arrays, and dataframes
- Common errors

You can ignore the code below; it's just for setup:

In [None]:
# Clone the GitHub repository
!git clone https://github.com/samihat-rahman/chempython-visions-2024.git

# Navigate to the directory containing your dataset
%cd chempython-visions-2024/tutorial

# Using a Jupyter Notebook
There are two main components or "cells" in a Jupyter Notebook:
- Markdown cell
- Python/Code cell

## Markdown Cells
The cell you're reading right now is a **Markdown** cell. For the assignments, these cells will contain the instructions for running the notebook. We can also add nicely formatted equations in a **Markdown** cell:

$$y=mx+c$$

You **do not** need to know the syntax of Markdown cells; they will only be used for instructions. Whenever you need to interact with a Markdown cell, you will be given explicit instructions.

## Python/Code Cell

Below we have a python or code cell:

In [None]:
# This is a python cell
print("Hello, World!")

These cells are **interactive**: this is where we write the code. If you hover your mouse over the left side of the cell, a little play button appears. If you click that button, it should print this message right below the cell:

"Hello, World!"

Notice the `print("Hello, World!")` line: this is telling python to output the message "Hello, World!"

I've added an empty cell below with some quotation marks. Try playing with the cells and add whatever message you want **inside of the quotation marks**. Then run the cell to see if things work as expected.

In [None]:
# Add your message in the print statement below
print("REPLACE THIS WITH YOUR MESSAGE")
# Then run the python cell

There is also some basic syntax you will need to be aware of, such as comments. Most *major* instructions will be given in the Markdown cells, but some instructions will be given within the Python cell itself. This is done using comments: "non-code" text that is used to explain the code.

Comments come after a hash symbol: `#`

In [None]:
# This is a comment, this does not run any code and is only used for notes
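
A comment can also sit at the end of a line of code; Python ignores everything after the `#` on that line. A small sketch (the variable name `total` is made up for illustration):

```python
# A comment on its own line
total = 2 + 3  # A comment after some code: only "total = 2 + 3" is run
print(total)
```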

Now you should be ready to run a Jupyter Notebook!

# Python syntax: Variables

Below we'll learn some very basic Python syntax, i.e. how we store information in Python and tell it what to do.

## Printing & Strings

The first thing we learned from above was how to get python to "print" a message.

This is done by using the `print()` statement and putting a *string* inside the parentheses. A *string* is text surrounded by quotation marks, either double `""` or single `''`.

In [None]:
print("This will print a string")

Go ahead and run the cell above and look at the output

### Outputs

This brings us to outputs. We've already encountered them, but to be explicit: the output of Python code appears *below* the cell after running it. This is also where any errors will appear!

## Variables

We'll now learn about variables. Variables are very important in Python: they are how we store any sort of information.

A variable is defined by writing a variable name followed by an equals sign `=`. The data we want to store in the variable is written to the *right* of the `=`.

For example, if I want to define a variable for the number *7* and label it *seven*, the syntax will be:

`seven = 7`

And in python:

In [None]:
# A variable for the number 7
seven = 7

Notice that when you run this code (hit the play button over the cell), there is no output! This is because variables are simply stored *in memory*. If we want the variable to be outputted, we need to write a print statement and put the variable name inside the parentheses.

In [None]:
# Here we are printing the variable seven
print(seven)

### Variables can be anything

Variables can hold any sort of data, not just numbers. They can also hold strings, which we learned about above.

In [None]:
# This is a variable storing a string
greeting = "Hello, World!"
# Now let's print the variable greeting
print(greeting)

There are some rules for naming variables:
- The name can contain letters, numbers, and underscores, but it cannot start with a number: `list_1` is fine, `1_list` is not
- Variable names cannot contain spaces; if you need a space, use an underscore instead: `double_word = "Two word variable"`
- There needs to be a variable name and some sort of data on the right of the equals sign, otherwise the statement is incomplete: `incomplete_variable =`
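
One way to check whether a name is allowed is Python's built-in string method `isidentifier()`, which gives `True` for a valid name and `False` otherwise. A small sketch (the example names and the `check_` variables are made up for illustration):

```python
# Each candidate name is written as a string so that it can be tested
check_underscore = "double_word".isidentifier()   # letters and underscores are allowed
check_digit_later = "list_1".isidentifier()       # digits are allowed after the first character
check_digit_first = "1_list".isidentifier()       # a name cannot start with a digit
check_space = "double word".isidentifier()        # spaces are not allowed

print(check_underscore)
print(check_digit_later)
print(check_digit_first)
print(check_space)
```

The first two lines print `True` and the last two print `False`.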

### Doing math using variables

Now let's get comfortable using variables. We'll use python to calculate the molar mass of some hydrocarbons. The first step here is to define variables with the atomic masses of carbon and hydrogen:

In [None]:
# Below are the variables for the atomic masses of Hydrogen and Carbon
mass_H = 1.0079
mass_C = 12.0107

Make sure to run the cell above!

Now let's find the mass for methane: $CH_4$

So we need to define variables for the number of carbons and hydrogens:

In [None]:
# Defining variables for the number of atoms of each element in a molecule of methane
num_H = 4
num_C = 1

Again, run this cell!

To find the molar mass, we will do some calculations! To tell python to add two variables we simply use a plus sign `+` and for multiplication, we use a star `*`. You'll notice that this is the exact same as in Excel!

We can also use parentheses, just like we would mathematically.

In [None]:
# Now to find the molar mass of methane
molar_mass = (mass_H*num_H) + (mass_C*num_C)
# Now let's print the molar mass
print(f'Molar mass of methane: {molar_mass} g/mol')

One thing you'll notice is that the format inside the print statement is not something we've seen before! This is because I'm using something known as an *f-string*. You **do not** need to know what an f-string is, but it's a very clean way to have both strings and variables in one print statement.
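
For the curious, an f-string is a string with an `f` in front of the opening quote; anything inside curly braces `{}` is replaced by the value of that variable. Adding `:.2f` after the variable name rounds the displayed number to two decimal places. A minimal sketch, reusing the methane numbers from above (the variable name `message` is made up for illustration):

```python
# Atomic masses and atom counts for methane, as defined earlier
mass_H = 1.0079
mass_C = 12.0107
num_H = 4
num_C = 1

molar_mass = (mass_H * num_H) + (mass_C * num_C)

# {molar_mass:.2f} inserts the value rounded to two decimal places
message = f'Molar mass of methane: {molar_mass:.2f} g/mol'
print(message)
```

This prints `Molar mass of methane: 16.04 g/mol`.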

Now, your task is to find the molar mass of ethane: $C_2H_6$

Below, I've added all the variables; you need to complete the *right-hand side* of each definition.

In [None]:
# Here are the masses, make sure to define the variables
mass_H =
mass_C = 
# Now define the number of atoms of each element in a molecule of ethane
num_H = 
num_C = 

Now run this cell after defining the variables! ***Remember, we always need to run each cell, this stores the variable in python. If you miss running a cell, then python will not remember that you defined the variable.***
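
If a cell that defines a variable has not been run, using that variable produces a `NameError`. The sketch below triggers one on purpose with a made-up name, `not_yet_defined`, and uses `try`/`except` (syntax you do not need to learn for these labs) only so that the message can be printed without stopping the notebook:

```python
try:
    # This variable was never defined, so Python does not recognise the name
    print(not_yet_defined)
except NameError as error:
    error_message = str(error)
    print(f'NameError: {error_message}')
```

If you see a message like this in a lab, go back and run the cell where the variable is defined.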

Now, below I've added an incomplete statement for the molar mass. Use the syntax you learned above to add the computation for the molar mass. The answer should be approximately 30.07 g/mol.

In [None]:
# Below is the molar mass calculation for ethane
molar_mass = 
# Now let's print the molar mass: no need to add anything to this line
print(f'Molar mass of ethane: {molar_mass} g/mol')

### Some extra math syntax
I'll list all the math syntax you'll need. Most of it is the same as in Excel, except for exponents, which Excel writes as `^`!
- Adding: `+`
- Subtracting: `-`
- Multiplication: `*`
- Division: `/`
- Exponent: `**`
- Parentheses: `()`

In [None]:
# Some examples of mathematical operations: feel free to play with the numbers
addition = 2 + 2
subtraction = 3 - 1
multiplication = 2 * 3
division = 8 / 4
exponentiation = 2 ** 3
# An example that combines several operations (order of operations)
order_of_operations = (2 + 2) * 4 / 2

# Now let's print the results
print(f'Addition: {addition}')
print(f'Subtraction: {subtraction}')
print(f'Multiplication: {multiplication}')
print(f'Division: {division}')
print(f'Exponentiation: {exponentiation}')
print(f'Order of Operations: {order_of_operations}')
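
Python follows the usual order of operations: exponents first, then multiplication and division, then addition and subtraction, with parentheses overriding that order. Note also that `/` always gives a decimal number (a *float*), even when the result is a whole number. A short sketch (the variable names are made up for illustration):

```python
with_parentheses = (2 + 2) * 4 / 2     # the sum is done first: 4 * 4 / 2
without_parentheses = 2 + 2 * 4 / 2    # multiplication and division first: 2 + 4.0

print(f'With parentheses: {with_parentheses}')
print(f'Without parentheses: {without_parentheses}')
print(f'8 / 4 gives: {8 / 4}')
```

The three results are `8.0`, `6.0`, and `2.0`.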

# Data Structures: lists, arrays, and dataframes

## Lists

One data structure that we won't use often in these labs, but that you should know about, is the *list*. We need to learn about lists because they are the basis of the more complicated structures we'll need.

A list is, as you might expect, a list of variables, numbers, or strings! Lists are defined by placing *elements* inside square brackets `[]` and separating each element with a comma. So, if we wanted a list of the numbers 1 to 5, we would do this:

In [None]:
# Printing a list of numbers from 1 to 5
print([1, 2, 3, 4, 5])

When you run the cell you'll notice that it prints out the list exactly as we wrote it in the syntax.

A list can be stored in a variable and can hold any type of data, such as numbers or strings:

In [None]:
# Here is a list with random elements
random_list = [1, 'hello', 3.14, 'world', 5]
# Now let's print the list
print(random_list)

Now let's make two lists with some random numbers and see what happens when we try to add them.

In [None]:
# Two lists with random numbers of the same length
list_1 = [1, 2, 3, 4, 5]
list_2 = [6, 7, 8, 9, 10]

# Adding the two lists
add_list = list_1 + list_2

# Now let's print the added list
print(add_list)

Run the cell above, and what do you see?

The elements were not added mathematically! Instead, we see something else known as *concatenation*, where the elements of the lists were combined to form a new, longer list. This is why, to do math, we need a new type of data structure.
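
For comparison, adding two plain lists element by element is possible without any extra module, but it takes more work. A sketch using the built-in `zip` function and a list comprehension (neither is needed for these labs; the name `elementwise_sum` is made up for illustration):

```python
list_1 = [1, 2, 3, 4, 5]
list_2 = [6, 7, 8, 9, 10]

# zip pairs up the elements: (1, 6), (2, 7), ... and each pair is summed
elementwise_sum = [a + b for a, b in zip(list_1, list_2)]
print(elementwise_sum)
```

This prints `[7, 9, 11, 13, 15]`. Arrays give the same result with a plain `+`.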

## Arrays

The best way to do math with lists is to convert them into arrays. A good way to picture an array is as a vector: one column of data with as many rows as needed. To create arrays we need to import a module known as `numpy`.

Run the cell below, and what do you see?

In [None]:
# Let's import numpy
import numpy as np

# let's create two lists of numbers
list_1 = [1, 2, 3, 4, 5]
list_2 = [6, 7, 8, 9, 10]

# Now let's convert the lists to arrays
array_1 = np.array(list_1)
array_2 = np.array(list_2)

# Now let's add the two arrays
add_array = array_1 + array_2

# Now let's print the added array
print(add_array)


As expected, the data is now added properly! We can do this for any two arrays *as long as they are the same size*, and the other arithmetic operations work the same way, element by element.

In [None]:
# Now let's try doing other mathematical operations with the arrays
mult_array = array_1 * array_2
div_array = array_2 / array_1
exp_array = array_1 ** array_2

# Now let's print the results
print(f'Multiplication: {mult_array}')
print(f'Division: {div_array}')
print(f'Exponentiation: {exp_array}')

We can also do math between arrays and single numbers (scalars).

In [None]:
# We can add a number to an array
add_num_array = array_1 + 5
# We can also multiply an array by a number
mult_num_array = array_1 * 2
# Let's square the array
squared_array = array_1 ** 2

# Now let's print the results
print(f'Addition: {add_num_array}')
print(f'Multiplication: {mult_num_array}')
print(f'Squared: {squared_array}')

We can also have multidimensional arrays. The best way to think about a 2D array is as a matrix. The examples above are one-dimensional arrays, i.e. vectors:

$$ \text{1D Array} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ \vdots \end{bmatrix} $$

So a 2D array is like a 2D matrix:

$$ \text{2D Array} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ \vdots & \vdots \end{bmatrix} $$

In [None]:
# here's an example of a 2D numpy array
array_2D = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Now let's print the 2D array
print(array_2D)
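
Every array records its dimensions in its `shape` attribute and its number of dimensions in `ndim`. A short sketch comparing a 1D and a 2D array (the variable names `vector` and `matrix` are made up for illustration):

```python
import numpy as np

vector = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# ndim is the number of dimensions; shape lists the size along each one
print(vector.ndim, vector.shape)
print(matrix.ndim, matrix.shape)
```

The first line prints `1 (3,)` and the second prints `2 (3, 3)`. The same `(3,)` style of notation appears in array error messages.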

We will very rarely encounter 2D arrays, even though our datasets will often have more than one column. Instead, we'll use a data structure that makes analyzing multi-column data easier: dataframes.

## Dataframes

Dataframes are more complicated: think of them as an Excel sheet that you control using Python. Like Excel sheets, they have column headings and the data is arranged in tabular form. The numeric columns also behave like `numpy` arrays, so you can perform arithmetic operations between columns.

For this tutorial we will be using a faux data sheet (these numbers are all made up!), which is an Excel file located in the same folder as this Jupyter notebook. We will first open this data using a module known as `pandas`.

Run the cell below and see what happens

In [None]:
# Importing pandas
import pandas as pd

# Opening the excel file as a pandas dataframe and storing it in the variable dataframe
dataframe = pd.read_excel('emissions_data.xlsx', engine='openpyxl')

# Now let's print the first few rows of the dataframe
print(dataframe.head())

You should see some climate and energy data organized by year. When we have large datasets, we need to be able to extract one or two rows or columns quickly for comparison. This is where dataframes are very powerful. Unlike the numeric `numpy` arrays we used above, a dataframe can mix numbers with text and has *labelled column headings*. This lets us very easily extract a column using its label.

The syntax is as follows:

- First identify the variable name of your dataframe
- Copy the *exact* name of the column heading
- Combine the two like this: `dataframe_variable["Column Heading"]`. The column heading has to be in between quotes!

In [None]:
# Let's print out just the temperature column
print(dataframe["Temperature (°C)"])

We can also look at two columns at the same time by separating the column names with a comma and adding an extra pair of square brackets:

`dataframe_variable[["Column 1", "Column 2"]]`

In [None]:
# Let's look at the year and emissions columns
print(dataframe[["Year", "CO2 Emissions (ppm)"]])

Again, we need to make sure that the column name is *exact*. If it's not exact we will get an error!
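
Because dataframe columns behave like arrays, arithmetic between two columns works row by row. A minimal sketch using a small made-up dataframe (the column names and values here are hypothetical and are not taken from `emissions_data.xlsx`):

```python
import pandas as pd

# A small made-up dataframe; these column names and values are hypothetical
example_df = pd.DataFrame({
    "Mass (g)": [10.0, 20.0, 30.0],
    "Volume (mL)": [5.0, 8.0, 10.0],
})

# Dividing one column by another is done row by row
density = example_df["Mass (g)"] / example_df["Volume (mL)"]
print(density)
```

The three values in `density` are `2.0`, `2.5`, and `3.0`.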

# Common Errors

## Arrays of different lengths
So, what happens when we try to add two arrays that are *different lengths*?

Run the cell below and see what happens

In [None]:
# Let's make two lists of different lengths
list_1 = [1, 2, 3, 4, 5]
list_2 = [6, 7, 8]

# Now let's convert the lists to arrays
array_1 = np.array(list_1)
array_2 = np.array(list_2)

# Now let's add the two arrays: what do you expect to happen?
add_array = array_1 + array_2

# Now let's print the added array
print(add_array)

We get an error message! The part to focus on is this:

`ValueError: operands could not be broadcast together with shapes (5,) (3,)`

This means that the addition could not be done because the arrays have different lengths (5 and 3).

***Fix:*** Make sure arrays are the same length before performing any operations on them.
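
One way to catch this before it happens is to compare the arrays' `shape` attributes. A minimal sketch using the same two arrays (the variable name `same_shape` is made up for illustration):

```python
import numpy as np

array_1 = np.array([1, 2, 3, 4, 5])
array_2 = np.array([6, 7, 8])

# shape is (5,) for array_1 and (3,) for array_2
print(array_1.shape, array_2.shape)

# True only when both arrays have the same shape
same_shape = array_1.shape == array_2.shape
print(f'Same shape: {same_shape}')
```

Here the last line prints `Same shape: False`, which tells us the addition would fail.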

## Wrong Column Name

We'll use the same dataset we used above. Let's try to call the temperature column, but we'll do something slightly wrong and see what happens:

In [None]:
# Calling the temperature column from the dataframe above
print(dataframe["Temperature"])

We get a **KeyError**!

`KeyError: 'Temperature'`

This is because the heading is *Temperature (°C)* **not** *Temperature*. We need to be **exact** with our column headings when calling them.
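
To see the exact headings, print the dataframe's `columns` attribute and copy the name from there. A sketch with a small made-up dataframe (the values are hypothetical; only the `Year` and `Temperature (°C)` headings are taken from this tutorial):

```python
import pandas as pd

# A small made-up dataframe standing in for the real dataset
example_df = pd.DataFrame({
    "Year": [2000, 2001, 2002],
    "Temperature (°C)": [14.1, 14.3, 14.2],
})

# List every column heading exactly as pandas stores it
print(list(example_df.columns))

# Check whether a heading exists before using it
print("Temperature" in example_df.columns)
print("Temperature (°C)" in example_df.columns)
```

The two checks print `False` and `True`: only the full heading, units included, matches.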