# Day 1-part 1: Fundamentals 1

## What is Python?

- Python is a programming language.
- Python is a collection of powerful internal and external libraries. We can use Python for real-world geosciences and engineering tasks. 
- Python is a philosophy for writing code. The principles of this philosophy are embodied in the [Zen of Python](https://peps.python.org/pep-0020/), and they can be listed by typing:

In [None]:
import this

## Notebooks

Notebooks like this one consist of units called `cells`. Cells can contain markdown text like this one, or Python code like the cell above (notice that code cells are numbered). To run a cell you can click the `Run` button. These are some useful shortcuts:

- To run a cell and stay in the same cell, type `Ctrl+Enter` 
- To run a cell and move to the next cell, type `Shift+Enter`
- To run a cell and insert a new cell below, type `Alt+Enter`
- To run all the cells of a notebook, choose the `Cell -> Run all` menu
- Use the up and down arrow keys to move quickly between cells

## Math operations

Python can be used as a simple calculator:

In [None]:
2345 + 137

In [None]:
27 / 4

In [None]:
7 ** 3

In [None]:
27 // 4

In [None]:
27 % 4

However, this only works for the basic arithmetic operations: addition (`+`), subtraction (`-`), multiplication (`*` ), division (`/`), exponentiation (`**`), floor division (`//`), and modulo division (`%`). Notice that division (`/`) returns a float value, even if the two numbers are integers. Floor (or integer) division (`//`) returns the quotient of the division rounded down to the nearest whole number, while modulo division (`%`) returns the remainder of the division (this is more complicated for negative numbers, see this [note](https://blog.teclado.com/pythons-modulo-operator-and-floor-division/)).
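The behaviour with negative numbers can be checked directly. The short sketch below (the variable names are illustrative and not part of the original notebook) suggests that floor division rounds toward negative infinity rather than toward zero, and that the remainder takes the sign of the divisor:

```python
# floor division rounds down (toward negative infinity), not toward zero
quotient_pos = 27 // 4
quotient_neg = -27 // 4

# the remainder has the same sign as the divisor (here, positive)
remainder_pos = 27 % 4
remainder_neg = -27 % 4

print(quotient_pos, remainder_pos)
print(quotient_neg, remainder_neg)

# both cases satisfy: dividend = quotient * divisor + remainder
print(quotient_neg * 4 + remainder_neg)
```

In both cases, multiplying the quotient by the divisor and adding the remainder gives back the original number.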

For more advanced operations, we need to import the `math` module:

In [None]:
import math

A module is a collection of code items such as functions. Modules are often grouped into a library; for example, the `math` module is part of the Python standard library.

Now that we have imported the `math` module, we can perform more complex operations using its functions. For example, let's calculate the sine of 30 degrees.

In [None]:
# sine of 30 degrees, option 1
math.sin(30*math.pi/180)

In [None]:
# sine of 30 degrees, option 2
math.sin(math.radians(30))

There are two things to notice:

- The first line in the two cells above is a comment, a line that is ignored by Python but we can include to document what we are doing. Comments begin with `#` and continue to the end of the line.

- Python's trigonometric functions work in radians, not degrees. Therefore, we need to convert the angle from degrees to radians, either by multiplying it by $\pi$/180 (we use the constant `math.pi` to get $\pi$) or by using the `math.radians` function, before taking the sine of the angle with the `math.sin` function. By the way, the exact result of this operation is 0.5, but as you can see there is a minor floating-point precision error.
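Because of these small precision errors, comparing floats with `==` can give surprising results. A minimal sketch, assuming we want to check the result against 0.5, uses the `math.isclose` function from the standard library to compare within a tolerance (the name `sine_30` is only illustrative):

```python
import math

sine_30 = math.sin(math.radians(30))

print(sine_30)                     # very close to 0.5
print(sine_30 == 0.5)              # an exact comparison is unreliable for floats
print(math.isclose(sine_30, 0.5))  # comparison within a small tolerance
```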

The `math` module has a long list of mathematical functions. You can find the complete list [here](https://docs.python.org/3/library/math.html). Another way to quickly find the functions in the `math` module is to type `math.` followed by the `Tab` key. This will present you with the list of functions in the module. Try it below.

In [None]:
# Type "math." followed by the Tab key
    

`Tab` completion is a nice feature of the IPython shell (the shell used by Jupyter notebooks) and it works on any module, object, etc.

## Variables

A variable stores a value, such as the result of an expression, so that it can be used in other calculations. In Python there are several variable types. These are summarized in the figure below:<br><br>

<img src="../figures/varTypes.png" alt="varTypes" width="600"/><br><br>

### Numbers

Python supports integers, floats, and complex numbers. Integers and floats differ by the presence or absence of decimals. Complex numbers have a real part and an imaginary part:

In [None]:
my_integer = 32 # integer
my_float = 23.00 # float
my_complex = 5 + 3j # complex number: 5 is the real and 3 is the imaginary part
print(my_integer, my_float, my_complex) # print numbers
print(my_complex.real, my_complex.imag) # print real and imaginary parts of the complex number

In the cell above, we use the function `print` to print the numbers. The real and imaginary parts of `my_complex` can be accessed using its `real` and `imag` attributes, respectively.

By default `print` uses a single space to separate the results. However, it is possible to change the default separator:

In [None]:
print(my_integer, my_float, my_complex, sep = " -- ") # print numbers with a different separator

It is possible to add a new line between the outputs of two `print` statements, or have them in the same line:

In [None]:
print(my_integer, my_float, my_complex, "\n") # print numbers, use "\n" to add a new line
print(my_complex.real, my_complex.imag, "\n") # print real and imaginary parts of the complex number

print(my_integer, my_float, my_complex, end ="\t") # print numbers, use end = to specify the end of the line as "\t"
print(my_complex.real, my_complex.imag) # print real and imaginary parts of the complex number

The characters `\n` and `\t` are called escape characters. They can be used to insert characters that are illegal in a string, such as a new line (`\n`) or a Tab (`\t`). [Here is a list of escape characters](https://www.w3schools.com/python/gloss_python_escape_characters.asp).

By default `print` ends with a new line. To change this, we can make `end` equal to another character, for example a Tab (`\t`). This outputs the two print statements on the same line, separated by a Tab.
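As a small illustration of the escape characters and the `sep` and `end` arguments described above, the sketch below (with made-up values) shows that `\t` and `\n` each count as a single character, and that `sep` and `end` can be combined in the same `print` call:

```python
# "\t" and "\n" each count as a single character inside a string
header = "age\tboundary"
print(len("\t"), len("\n"))
print(header)

# sep and end can be combined in the same print call
print(251, 146, 23, sep="\t", end="\n\n")  # tab-separated, followed by a blank line
print("this line comes after a blank line")
```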

We can use the function `type` to find out the type of a variable.

In [None]:
print(type(my_integer), type(my_float), type(my_complex)) # print variable type

So our numbers are actually instances of *classes*. The concept of class comes from the object-oriented philosophy of Python, which we will discuss in the next notebook. For the moment, just realize that the numbers above are class instances (or objects) with methods that we can call. To find out more about the methods of an object, use `Tab` completion. Let's try with `my_float`:

In [None]:
# Type "my_float." followed by the Tab key


Or use a question mark (`?`) before or after the variable. This will display some general information about the object:

In [None]:
# Find more information about my_float
my_float?

This is known as object introspection.
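Introspection can also be done with plain Python code, which works outside the notebook as well. The sketch below uses the built-in functions `isinstance` and `dir`; the list `public_names` is just an illustrative helper, not something defined elsewhere in this notebook:

```python
my_float = 23.00  # same value as in the cells above

# isinstance checks whether an object is an instance of a class
print(isinstance(my_float, float))

# dir lists the names of the attributes and methods of an object;
# here we skip the names that start with an underscore
public_names = [name for name in dir(my_float) if not name.startswith("_")]
print(public_names)

# call one of these methods: is the float a whole number?
print(my_float.is_integer())
```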

### Booleans

Booleans (or bools) can only have two values, `True` or `False`. As we will see later, bools are useful in conditional statements.

In [None]:
my_bool = True
print(my_bool)

### A word about naming variables

Variable names should be clear and concise, be written in English, not contain special characters, and not conflict with any Python keywords. The following code prints the Python keywords:

In [None]:
import keyword
print(keyword.kwlist)

In this course, we will use the `pothole_case_naming` convention (also known as snake case) to name variables. This convention uses lowercase words separated by underscores `_`. You can find more information about naming conventions in the [Style Guide for Python Code](https://peps.python.org/pep-0008/).
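One possible way to check whether a name is usable is sketched below, assuming the string method `isidentifier` and the function `keyword.iskeyword` from the standard library. The candidate names are made up for illustration:

```python
import keyword

# a valid name: it is an identifier and it is not a keyword
print("rock_density".isidentifier(), keyword.iskeyword("rock_density"))

# starts with a digit, so it is not a valid identifier
print("2nd_sample".isidentifier())

# contains a hyphen (a special character), so it is not a valid identifier
print("fault-dip".isidentifier())

# a valid identifier, but reserved as a Python keyword
print("class".isidentifier(), keyword.iskeyword("class"))
```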

### Sequences

A sequence is an ordered collection of items. Examples of sequences are Strings, Lists, and Tuples. Strings are sequences of characters, Lists are ordered collections of data (which can be of different type), and Tuples are similar to Lists, but they cannot be modified after their creation. The elements of a sequence can be accessed using indexes: 
- The first index of a sequence is always 0 (zero). 
- Using negative numbers (e.g., -1), the indexing of the sequence starts from the last element and proceeds in reverse order. 
- Two numbers separated by a colon (e.g. [3:7]) define an index range, sampling the sequence from the lower to the upper indexes, but excluding the upper index:

In [None]:
# create some sequences
my_string = "slightly " + "hot" # strings can be concatenated using +
my_list = [251 ,"top Triassic",146, "top Cretaceous", 23, "top Neogene"] # list of integers and strings
# tuple of integers and strings, parentheses are optional but I recommend having them
my_tuple = (251 ,"top Triassic",146, "top Cretaceous", 23, "top Neogene")
print(my_string)
print(my_list)
print(my_tuple)

In [None]:
# we can use indexes to get specific elements in the sequence
print(my_string[3:8]) # print characters at indexes 3 to 7 in my_string
print(my_list[0]) # print first element in my_list
print(my_tuple[-1]) # print last element in my_tuple

In [None]:
# A step can also be used after a second colon to, say, take every other element of the sequence
print(my_list[::2])
# A neat trick:  by passing -1 as step, we can reverse the sequence
print(my_string[::-1])

In [None]:
# lists can be modified
my_list.append(0.0) # append element
my_list.insert(0,"top Carboniferous") # insert element at index 0
my_list[-3] = 23.3 # change third-to-last element
my_list.reverse() # reverse the elements of the list
print(my_list) # print my list

In [None]:
# but tuples can't, this cell will return an error
my_tuple[-2] = 23.3 

In [None]:
# lists have some interesting arithmetic
zeros = [0]*5  # a list with 5 zeros
ones = [1]*5  # a list with 5 ones
my_list = zeros + ones
new_str = str(my_list)

print(zeros, ones, my_list)
print(type(new_str))

### Formatting strings

Strings have powerful formatting properties. 
- The first two lines of the cell below show the use of a string and the `format` method to pass variables to a string. 
- The next line shows the use of the `format` method and the `:,` formatting type to add a comma as a thousand separator. 
- The last two lines print the result of `my_integer / my_float` with one decimal place, using the `:.1f` formatting type. The first line uses the `str.format` method, while the second line uses a Python `f-string`.

For more formatting types, [check here](https://www.w3schools.com/python/ref_string_format.asp).

In [None]:
formatter = "Entry 1 = {}, entry 2 = {}, entry 3 = {}" # This is a string waiting for three inputs
print(formatter.format("Samuel", "Jackson", 33)) # Passing the three inputs to the string

print("The universe is {:,} years old.".format(13800000000))

print("my_integer / my_float = {:.1f}".format(my_integer / my_float)) # my integer / my float rounded to one decimal
print(f"my_integer / my_float = {my_integer / my_float:.1f}") # my integer / my float rounded to one decimal

### A word about type-checking

In Python, you can combine different variable types in expressions. However, Python does not type-check the code before running it, so an invalid combination of types is only detected when the line runs and raises an error. You should be careful:

In [None]:
# let's assume that my_float is the temperature in Celsius
# convert this temperature to Fahrenheit degrees
temp_fahrenheit = 9 / 5 * my_float + my_integer # here we are combining integers and floats, it works

# now let's print the result, this also works
print(my_bool, ",", temp_fahrenheit, "Fahrenheit degrees is", my_string) 

The code in the cell above works. Why? 

The code in the cell below throws an error, however. Why? Can you fix it? *Hint*: You can use the function `str` to convert any variable type to a string.

In [None]:
new_string = my_bool + ", " + temp_fahrenheit + " Fahrenheit degrees is " + my_string  
print(new_string)

Out of curiosity, change the `+` symbols in the cell above to `,`. Run the cell again, what happens?

### Dictionaries

Dictionaries consist of a collection of key-value pairs. A dictionary is defined by enclosing a comma-separated list of key-value pairs in curly braces, with a colon separating each key from the associated value. Items in a dictionary are not accessed by position. Instead, a value is retrieved by specifying the corresponding key in square brackets:

In [None]:
# Define dictionary of mineral hardness
mohs = {"talc":1, "gypsum":2, "calcite":3, "fluorite":4, "apatite":5, "orthoclase":6,
           "quartz":7, "topaz":8, "corundum":9, "diamond":10} # Mohs hardness scale

# adding key-value pairs to dictionary
mohs["olivine"] = 6.5 
mohs["salt"] = 2.5
mohs["pyrite"] = 6.25

print(mohs.keys(), "\n") # print dictionary keys
print(mohs.values(), "\n") # print dictionary values
print(mohs.items(), "\n") # print dictionary items
print("Hardness of salt =", mohs["salt"]) # print hardness of salt

We can also delete items from a dictionary:

In [None]:
del mohs["pyrite"] # delete pyrite from dictionary
print(mohs.items()) # print dictionary items

## Arrays

Like lists, arrays are collections of items, but of the same type (e.g., all numbers or all strings). To use arrays in Python, you need to import the `numpy` library:

In [None]:
import numpy as np # import numpy library under the alias np

However, unlike lists, arrays cannot be written directly with square brackets. They must be created with a `numpy` function such as `np.array`:

In [None]:
my_array = np.array([-10, 30, 60, 90, 120, 150, -100]) # create array
print(my_array)

Let's modify this array:

In [None]:
my_array[0] = 0 # modify first element
my_array[-1] = 180 # modify last element
print(my_array)

In [None]:
my_array = np.append(my_array,[240, 270, 300, 330, 360]) # append elements
print(my_array)

In [None]:
my_array = np.insert(my_array,7,210) # insert element at index 7
print(my_array) 

In [None]:
my_array = np.delete(my_array,-1) # delete last element
print(my_array)

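Notice that `my_array` does not have append, insert or delete methods. Instead, we need to use the `numpy` functions `append`, `insert`, and `delete` as shown above. These functions return a new array rather than modifying `my_array` in place, which is why we assign the result back to `my_array`.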
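A minimal sketch of this behaviour, using two illustrative arrays (`original` and `extended`) that are not part of the notebook:

```python
import numpy as np

original = np.array([0, 30, 60])
extended = np.append(original, [90, 120])  # returns a new array

print(original)  # unchanged, still three elements
print(extended)  # the new array has five elements
print(original.size, extended.size)
```

If the result of `np.append` is not assigned to a variable, the appended elements are lost.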

Now, let's use index ranges to print some elements of the array:

In [None]:
print(my_array[4:8]) # print elements with indexes 4 to 7

Index ranges are quite powerful. Suppose we want to calculate the differences between successive elements of `my_array`. We can do this in one line of code as follows:

In [None]:
# differences between successive elements of my_array
diffs = my_array[1:] - my_array[:-1]
print(diffs)

`my_array[1:]` contains the second to the last element of `my_array`, while `my_array[:-1]` contains the first to the penultimate element of `my_array`. Subtracting these two arrays gives us the differences between the elements of `my_array`.
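For reference, `numpy` also provides the `np.diff` function, which should give the same result as the slicing approach for a 1D array. The sketch below uses an illustrative array called `angles`:

```python
import numpy as np

angles = np.array([0, 30, 60, 90, 120, 150, 180])  # illustrative values

diffs_sliced = angles[1:] - angles[:-1]  # differences using index ranges
diffs_numpy = np.diff(angles)            # differences using np.diff

print(diffs_sliced)
print(np.array_equal(diffs_sliced, diffs_numpy))  # check that both agree
```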

We can use the `size` attribute of the array to retrieve its number of elements:

In [None]:
print("number of elements in array =", my_array.size) # print the number of elements in array

and the `dtype` attribute to find out the type of the elements in the array:

In [None]:
my_array.dtype

### 2D arrays

A 2D array is an array of 1D arrays. It can be constructed as follows:

In [None]:
# create a 3 x 4 array
my_2d_array = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(my_2d_array)

To access an element of the array, we use two indexes within brackets. The first index refers to the row, and the second index to the column of the array. This is illustrated with a library cabinet below for the box at row index 2 and column index 2:<br><br>

<img src="../figures/2dArray.png" alt="varTypes" width="600"/><br><br>

In [None]:
# print element in third row and second column of my_2d_array
print(my_2d_array[2,1])

Index ranges allow us to quickly access several elements of the array. This is referred to as *slicing* the array. This figure shows how to access the row with index 3 in my cabinet:

<img src="../figures/slicingRow.png" alt="varTypes" width="600"/><br><br>

In [None]:
# print the first row of my_2d_array
print(my_2d_array[0,:]) # print(my_2d_array[0]) does the same

And this figure shows how to select the column with index 3 in my cabinet:

<img src="../figures/slicingColumn.png" alt="varTypes" width="600"/><br><br>

In [None]:
# print the first column of my_2d_array
print(my_2d_array[:,0]) # : means all rows in first column

In [None]:
# print the first two rows of my_2d_array
print(my_2d_array[:2,:]) # print(my_2d_array[:2]) does the same

In [None]:
# print the last two columns of my_2d_array
print(my_2d_array[:,2:4]) # : means all rows, 2:4 means the last two columns

We can use the `shape` attribute of the array to obtain its number of rows and columns. This returns a tuple whose first element is the number of rows, and whose second element is the number of columns:

In [None]:
# print number of rows in my_2d_array
print("number of rows in array =", my_2d_array.shape[0]) # shape[0] is number of rows

# print number of columns in my_2d_array
print("number of columns in array =", my_2d_array.shape[1]) # shape [1] is number of columns

### 3D arrays

3D arrays work the same way, they are arrays of 2D arrays:

In [None]:
my_3d_array = np.arange(24).reshape(2,3,4) # constructing a 2 x 3 x 4 array
print(my_3d_array)

Here we use the `numpy.arange` function to generate 24 elements from 0 to 23, and the `reshape` method of the array to reshape these elements into a 2 x 3 x 4 array. Let's run some of the operations above for this 3D array:

In [None]:
print(my_3d_array[0], "\n") # print the first 2D array in the 3D array
print(my_3d_array[1], "\n") # print the second 2D array in the 3D array
print(my_3d_array[0,1], "\n") # print the second row of first 2D array 
print(my_3d_array[1,:,2], "\n") # print the third column of second 2D array 
print("shape of array", my_3d_array.shape) # print the shape of the 3D array

Finally, the `ndim` attribute tells us the number of dimensions of the array:

In [None]:
print(my_array.ndim)
print(my_2d_array.ndim)
print(my_3d_array.ndim)

### Arrays versus lists

There are two main reasons to use arrays as opposed to lists:

- Arrays are more efficient for storing large amounts of data than lists. 
- Arrays are great for numerical operations; lists cannot directly handle math operations.

To make the last point clear, let's look at the following example:

In [None]:
my_sines = np.sin(np.radians(my_array)) # compute the sine of the elements in my_array
print(np.around(my_sines,2)) # np.around rounds the array elements to two decimal places

Here, we use the `numpy.radians` function to convert the elements of the array from degrees to radians, and the `numpy.sin` function to compute the sine of the elements in the array. All in one line of code! In the `print` statement, we use the `numpy.around` function to round the array elements to two decimal places. Operating on all the elements of an array at once also makes your code faster. By the way, this is very similar to how arrays in programs such as [Matlab](https://www.mathworks.com) or [Octave](https://www.gnu.org/software/octave/) work.
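The difference between lists and arrays shows up even with a simple multiplication. In the sketch below, the names `depths_list` and `depths_array` and their values are made up for illustration:

```python
import numpy as np

depths_list = [100, 200, 300]
depths_array = np.array(depths_list)

print(depths_list * 2)   # repeats the list, no arithmetic is done
print(depths_array * 2)  # multiplies each element of the array by 2
```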

So, when should we use a list or an array?
- If you are storing a relatively short sequence of items and don't plan to do any mathematical operations with it, use a list.
- If you have a very long list of items, and you plan to do numerical operations with them, use an array.

More about array operations in the next section.

## Array operations

There are two main groups of operations that involve `numpy` arrays:

- Element-wise operations
- Linear algebra operations

### Element-wise operations

These are simple element-wise operations that involve an array and a scalar, or two arrays of the same shape. For example:

In [None]:
# create a 3 x 3 array
array_a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(array_a)

In [None]:
print(array_a + 2) # array plus scalar

In [None]:
print(array_a - 2) # array minus scalar

In [None]:
print(array_a * 2) # array times scalar

In [None]:
print(array_a / 2) # array divided by scalar

In [None]:
print(array_a ** 2) # array raised to the power of the scalar

In [None]:
# Create another 3 x 3 array
array_b = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])

print(array_a + array_b) # element-wise sum

In [None]:
print(array_a - array_b) # element-wise difference

In [None]:
print(array_a * array_b) # element-wise multiplication

In [None]:
print(array_a / array_b) # element-wise division

In [None]:
print(array_a ** array_b) # element-wise exponentiation

### Linear algebra operations

Linear algebra is very important in geosciences and engineering (check our [online resource in computational geosciences](https://github.com/nfcd/compGeo)). Let's look at some examples:

In [None]:
# create two vectors (1D arrays with 3 elements each)
vector_u = np.array([1, 2, 3])
vector_v = np.array([4, 5, 6])
# compute the magnitude of the vector u
length_u = np.linalg.norm(vector_u)
print("{:.3f}".format(length_u)) # print just 3 decimal places

In [None]:
# make the vector a unit vector by dividing it by its magnitude
vector_uu = vector_u / length_u
print(np.linalg.norm(vector_uu)) # this should print 1.0 

In [None]:
# compute the dot product of the vectors, this gives a scalar
print(np.dot(vector_u, vector_v))

In [None]:
# compute the cross product of the vectors, this gives another vector
print(np.cross(vector_u, vector_v))

In [None]:
# create two conformable matrices
# columns in matrix a = rows in matrix b
matrix_a = np.array([[1, 2, 3], [4, 5, 6]]) # 2 x 3 matrix
matrix_b = np.array([[7, 8], [9, 10], [11, 12]]) # 3 x 2 matrix
print("matrix a has", matrix_a.shape[0], "rows and", matrix_a.shape[1], "columns")
print("matrix b has", matrix_b.shape[0], "rows and", matrix_b.shape[1], "columns")

In [None]:
# multiply the matrices, this gives a 2 x 2 matrix
print(np.dot(matrix_a, matrix_b))

In [None]:
# create a square (rows = columns) 3 x 3 matrix
matrix_c = np.array([[1, 7, 9], [3, 5, 8], [4, 2, 6]])

# compute the determinant of the matrix
print(np.linalg.det(matrix_c)) 

In [None]:
# compute the inverse of the matrix
matrix_ci = np.linalg.inv(matrix_c)
print(matrix_ci) 

In [None]:
# the matrix times its inverse is equal to the identity matrix
# a matrix with 1s along the diagonal, and 0s outside
print(np.dot(matrix_c, matrix_ci))

Here we use the `numpy` linear algebra functions (some of them in the `linalg` module) to perform these operations.
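As a complement, matrix multiplication can also be written with the `@` operator, and `np.allclose` can be used to compare arrays within a small tolerance, which helps with the rounding errors in the product of a matrix and its inverse. This is a sketch that reuses the values of `matrix_c` from above, with `product` as an illustrative name:

```python
import numpy as np

matrix_c = np.array([[1, 7, 9], [3, 5, 8], [4, 2, 6]])
matrix_ci = np.linalg.inv(matrix_c)

product = matrix_c @ matrix_ci  # @ is the matrix multiplication operator

# @ gives the same result as np.dot for 2D arrays
print(np.allclose(product, np.dot(matrix_c, matrix_ci)))

# the product is the 3 x 3 identity matrix, within floating-point tolerance
print(np.allclose(product, np.eye(3)))
```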

`numpy` includes many more [operations](https://numpy.org/doc/stable/reference/routines.html). It is not surprising that data analytics and data science rely so heavily on `numpy` arrays.

To practice, try the exercises in [lab1_1](../lab/lab1_1.ipynb).