# Introduction to Python (Part 1)

This notebook provides the first part of a general introduction to the Python functionality most relevant to this course.  In future notebooks, we'll look specifically at using and applying other packages and libraries.

Python is a powerful, object-oriented, open source, and widely used programming language. It is one of the most common languages in data science and is increasingly used across academic disciplines and the private sector.  However, if you are accustomed to other languages like MATLAB or R, or are a novice programmer, it can take a while to get used to the Pythonic way of doing things.  This notebook seeks to provide you with a basic introduction to Python with a focus on the syntax and the type of mathematics we'll do in this class.  Other notebooks in this directory describe reading and writing data, and plotting and visualizing data and analyses.

Python is an **interpreted** language (like MATLAB or R), which means you don't need to compile code for it to run.  This makes Python code relatively easy to debug, largely portable (e.g. cross-platform), easily readable, and tractable.  The primary trade-off is that it is generally slower than a compiled language like FORTRAN.  It is also very extensible: a large community has written and curated packages of code that allow you to do a lot with relatively little coding.

Python is also easily usable in Jupyter notebooks (like the one you're using now).  This makes it good for both teaching and sharing your code (and results and figures) with others.  We'll talk more about how to use Jupyter notebooks in the introductory part of the class.

In notebooks and Python code in general, you can comment out lines of code so they aren't executed when the code runs. This can be useful for documenting what a line of code does or for troubleshooting and debugging.  To comment out a single line or part of a line, use `#`: everything after the `#` on that line is ignored.
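Here is a small illustration of the three common uses of `#`: a full-line comment, an inline comment, and a line of code that has been commented out. The variable name `total` is just for illustration:

```python
# This whole line is a comment, so Python ignores it
total = 3 + 4   # an inline comment: everything after the # is ignored
# total = 100   # a commented-out line: it does not run, so total stays 7
print(total)
```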

## Mathematics

At its most basic, you can use Python as a simple calculator, with operations you'll easily recognize:

In [1]:
3 + 4 # this is a comment. When you run this code cell, you will see the answer displayed below the cell

7

Note that in this case, the result of the operation prints to the notebook as output below the code cell above. 



In [2]:
4/3 # when you run this code cell, you will see the answer displayed below the cell

1.3333333333333333

Order of operations in Python follows the mnemonic we all learned (remember?) back in grade school, PEMDAS: Parentheses, Exponents, Multiplication and Division, Addition and Subtraction.  Note that in Python, the exponentiation operator is `**`; some other languages, such as MATLAB, use `^` instead (in Python, `^` is the bitwise XOR operator, which is something else entirely).
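A few quick checks of precedence, as a sketch (the variable names are only illustrative). Two details that often surprise people: `**` groups from right to left, and it binds more tightly than a leading minus sign:

```python
result_a = 3 + 4 * 2      # multiplication before addition: 11
result_b = (3 + 4) * 2    # parentheses first: 14
result_c = 2 ** 3 ** 2    # exponents group right to left: 2 ** 9 = 512
result_d = -2 ** 2        # ** binds tighter than the minus sign: -(2 ** 2) = -4
print(result_a, result_b, result_c, result_d)
```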

**Your turn!** Spend some time in the space below adding parentheses or changing operators to see how this order of operations works (and, of course, changes the result):

In [3]:
(3 - 4) ** (3 * 4)

1

## Variables

Like other programming languages, you can also use **variables** to stand in for numbers and then perform operations on those:

In [4]:
a = 3
b = 4

a+b  # when you run this code cell, you will see the answer displayed below the cell

7

In [5]:
b/a

1.3333333333333333

In [6]:
b * a

12

You can also do comparisons with variables (more on this later in the notebook):

In [7]:
a < b # asks if a is less than b

True

In [8]:
a == b # asks if a and b are equal

False

You can also assign the result of an expression involving variables (or numbers) to another variable.  Here we also use the `print` function to display the result below the code cell:

In [9]:
c = b * a
print(c) # when you assign the answer to an equation to a variable, the answer doesn't automatically print to the notebook

12


Python also allows you to assign multiple values at once.  For example:

In [10]:
x, y, z = 1, 2, 3
print(x)
print(y)
print(z)

1
2
3
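One handy consequence of multiple assignment, sketched below, is that you can swap two variables in a single line. The same unpacking works on any sequence whose length matches the number of names on the left (`point` and its values are made up for the example):

```python
a, b = 3, 4
a, b = b, a        # the right side is built first, then unpacked into a and b
print(a, b)        # 4 3

point = (10, 20)   # illustrative values
px, py = point     # unpacking works for any sequence of matching length
print(px, py)      # 10 20
```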


If you list several variables separated by commas on a single line, the notebook displays all of their values together (as a tuple, a data type we'll meet below):

In [11]:
x, y # will return (1, 2) as output from this cell

(1, 2)

## Data Types

Data types in Python are important to understand, as each has a set of properties that determine how they can be used and how they interact.  The basic data types are:

* `int`: Integers, or whole numbers
* `float`: Floating-point numbers, or real numbers (those with decimals)
* `str`: Strings, or text data
* `list`: A sequence of values with specific properties 
* `tuple`: Another sequence of values with specific properties different from a `list` 
* `dict`: Dictionaries, which map a given `key` to a given `value`
* `bool`: Boolean values - essentially, True or False 

So far in this notebook we've looked at the first two types in the list, which we've simply treated as individual numbers, or what we might call **numeric data types**, and some basic ways you can operate on them.  Numeric data types include integers (`int`) and floats (`float`, or floating-point numbers, i.e. numbers with decimals).  They also include complex numbers, although these will seldom come up in this class.

You can see the type of a variable using the `type` function:

In [12]:
this_number = 1.333
type(this_number) # returns 'float' because the number is a floating-point number (a number with decimals)


float

In [13]:
that_integer = 7
type(that_integer) # returns 'int' because 7 is an integer

int

Note that how you write a value determines its type. For instance, including a decimal point or not determines whether a number is an integer or a floating-point number:

In [15]:
the_float = 7.0
the_integer = 7

type(the_float), type(the_integer)

(float, int)
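To see how the numeric types interact, here is a short sketch: mixing an `int` and a `float` gives a `float`, `/` always returns a `float`, `//` performs floor division, and `int()` drops the decimal part. The values are arbitrary, and the last assignment shows the complex type mentioned earlier:

```python
the_float = 7.0
the_integer = 7
mixed = the_integer + the_float   # int + float gives a float: 14.0
quotient = 14 / 7                 # / always returns a float, even here: 2.0
floor_quotient = 14 // 4          # // divides and rounds down: 3
truncated = int(7.9)              # int() drops the decimal part: 7
z = 2 + 3j                        # j marks the imaginary part of a complex number
print(type(mixed), type(quotient), floor_quotient, truncated, type(z))
```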

Let's now spend some time looking at other data types we'll use.

### Strings

We can also have strings, or sequences of characters.  In Python you can assign a string to a variable using either single or double quote marks:

In [16]:
d = "a string using a double quote"
f = 'a string using a single quote'
print(d)
print(f)

a string using a double quote
a string using a single quote


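As you can see above, both single and double quotes work when we create strings.  Double quotes are handy when the string itself contains a single quote (an apostrophe), since the apostrophe would otherwise end a single-quoted string. Backslashes are a different matter: inside either kind of quotes, `\` starts an escape sequence such as `\n` (a newline), so to keep a literal backslash it is safest to use a raw string (prefix the opening quote with `r`) or to write `\\`. For example: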

In [17]:
my_string = r"We're looking for the person\persons responsible"  # the r prefix keeps the backslash literal
print(my_string)

We're looking for the person\persons responsible
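The sketch below, with made-up example strings and a hypothetical Windows-style path, compares the two quoting styles and shows two equivalent ways to keep a literal backslash (a raw string with the `r` prefix, or a doubled backslash):

```python
apostrophe = "We're here"        # double quotes: the apostrophe needs no escaping
escaped = 'We\'re here'          # single quotes: escape the apostrophe with \
print(apostrophe == escaped)     # True

path_raw = r"C:\new\data"        # raw string: backslashes are kept as typed
path_escaped = "C:\\new\\data"   # doubled backslashes give the same characters
print(path_raw == path_escaped)  # True

print(len("a\nb"))               # \n is ONE newline character, so the length is 3
```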


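There are several ways to combine strings for display. Passing several strings to `print` separated by commas prints them with a single space between each one; joining them with `+` concatenates them into one new string, so you need to include any spaces yourself:

In [18]:
print(d,"concatenated with",f) # print() puts a space between comma-separated arguments
print(d+" concatenated with "+f) # + joins strings exactly, so the spaces are inside the middle string
print(d,'concatenated with',f) # same as the first line, using single quotes
print(d+' concatenated with '+f)  # same as the second line, using single quotes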

a string using a double quote concatenated with a string using a single quote
a string using a double quote concatenated with a string using a single quote
a string using a double quote concatenated with a string using a single quote
a string using a double quote concatenated with a string using a single quote


You can't concatenate a string and a number directly with `+` (Python raises a `TypeError`), but you can convert the number to a string first using `str()`:

In [19]:
my_age = 48
"You are " + str(my_age)

'You are 48'
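Here is what happens if you try the direct approach, caught with `try`/`except` so the cell keeps running (the exact wording of the error message may vary between Python versions):

```python
my_age = 48
try:
    message = "You are " + my_age        # str + int is not allowed
except TypeError as err:
    print("TypeError:", err)

message = "You are " + str(my_age)       # convert the number to a string first
print(message)
```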

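Another way to do something similar is with **f-strings** (formatted string literals).  Prefixing a string with `f` lets you place variables or expressions inside curly braces `{}`, and Python substitutes their values into the string, a technique known as string interpolation: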


In [20]:
f"You are {my_age}! That's really old!"


"You are 48! That's really old!"

This approach provides great flexibility and simpler formatting when creating longer strings:

In [21]:
your_age = 29
f"You are {my_age}! I am only {your_age}!"

'You are 48! I am only 29!'
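f-strings can also hold expressions and format specifications after a colon, such as `.2f` for two decimal places or `<` and `>` for left and right alignment. A brief sketch with illustrative values:

```python
my_age = 48
ratio = 4 / 3
next_year = f"Next year you will be {my_age + 1}"   # any expression works inside {}
rounded = f"4/3 to two decimals: {ratio:.2f}"        # .2f = fixed point, 2 decimals
aligned = f"{'left':<6}|{'right':>6}"               # pad to width 6, align left / right
print(next_year)
print(rounded)
print(aligned)
```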

f-strings (introduced in Python 3.6) improve on Python's older string-formatting approaches, such as `%` formatting and `str.format()`.  You can see a useful tutorial here: [https://realpython.com/python-f-strings/](https://realpython.com/python-f-strings/)

Because strings are a particular data type in Python, there is a set of operations you can perform on them that wouldn't make sense if applied to, for instance, an integer.  For example, we can ask Python to show our string in all upper case:

In [22]:
d.upper()

'A STRING USING A DOUBLE QUOTE'

In [23]:
d.swapcase() # swap the current case for the other case

'A STRING USING A DOUBLE QUOTE'
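Strings are **immutable**, so methods like `upper()` return a new string rather than changing the original. The sketch below checks this and tries a few other built-in string methods (`replace`, `split`, `startswith`):

```python
d = "a string using a double quote"
shout = d.upper()                         # upper() returns a NEW string
print(d)                                  # d itself is unchanged
print(shout)
print(d.replace("double", "single"))      # another method that returns a new string
print(d.split())                          # split on whitespace into a list of words
print(d.startswith("a str"))              # True
```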

### An aside on functions, methods, and objects

You'll notice that we've called functions in two different ways thus far.  For instance, we called `type(this_number)` to return the data type of the variable `this_number`.  In the code cells above, by contrast, we operated on our string variable `d` using the `d.upper()` and `d.swapcase()` syntax.  What is going on?

The main difference between these two ways of calling functions is that some functions are specific to a particular data type, so they are attached directly to variables of that type. For example, the `upper()` function only makes sense for strings, so Python attaches it to the string data type specifically. This is in contrast to a function like `type`, which can take *any* data type as its input and so is left as a generic function.  Functions attached to a data type in this way, and called using the `variable.function()` form, are known as **methods**. We will sometimes loosely call both forms function calls, but you will often see the `variable.function()` form referred to as a method call.  We'll see more of this as we go along.
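A small sketch of the distinction: generic functions like `type` and `len` accept any suitable object, while a method like `upper` belongs to the string type. Calling `str.upper(d)` is an equivalent way to write `d.upper()`, which shows that a method really is a function attached to a type:

```python
d = "a string using a double quote"
print(type(d))                     # generic function: works on any object
print(len(d))                      # another generic function (length of a sequence)
print(d.upper())                   # method: called on the string itself
print(str.upper(d))                # the same method, reached through the str type
print(str.upper(d) == d.upper())   # True
print(hasattr(7, "upper"))         # False: integers have no upper method
```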

### A first demonstration of slicing using strings

Slicing and indexing (when dealing with lists or arrays) are important tools that simplify selecting subsets of data.  What makes this confusing in Python, however, is the use of zero-based indexing. Instead of starting to count at 1, Python counts starting at zero.  So the first position in a string of characters or a list of numbers is position 0 (the 0th or zero-th position).

Yes, this is really counterintuitive and I personally still struggle with it (especially coming from MATLAB and FORTRAN which start numbering with 1, like normal people do).  The reason for zero indexing is largely [mathematically motivated](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html), where describing e.g. ranges of values is more naturally thought about with respect to zero, whereas index schemes that start at 1 are more typical of counting things like, you know, actual data.  Unfortunately, most of us do tend to think of numbers as primarily for counting ('I have 2 eggs', 'Take the 3rd right turn').  See discussion and links here: https://craftofcoding.wordpress.com/2017/03/12/why-1-based-indexing-is-ok/

This is definitely a challenge to using Python - we will encounter a number of non-intuitive consequences of this aspect of Python during the class. 

The only intuitive way of thinking about this I can offer you at this point is this: imagine Python indexing as answering the question 'Once I am in a starting position, how far do I have to go to get to the result I'm looking for?'.  You could also think of how we talk about birthdays: your 1st birthday is actually the first anniversary of the day you were born (which was, in effect, your 0th birthday).

Let's take a first look at how this works using the string below:


In [None]:
my_string = 'Glycerol dibiphytanyl glycerol tetraethers'


Slicing and indexing are accomplished by placing a number in square brackets, [], immediately after the variable name.  For instance, if we wanted the first character in the string above, we would slice out the 0th position, like so:

In [None]:
print(my_string[0])

The language here becomes messy, as you can see (the 1st character is actually the 0th character). 

You can also get a group or **range** of characters by using `[start:stop]` notation:

In [25]:
print(my_string[0:1])

G

Uh, wait.  Why didn't we get the 0th and the 1st characters, 'Gl'?  This is another confusing thing about Python indexing.  Python indexing ranges represent what are called half-open intervals: the start position is included, but the stop position is not.  What we might have expected, 'inclusive' or closed intervals, are more intuitive, but here again we have to be aware of this peculiarity of Python.  If we want both 'G' and 'l' from the string above, we need to use [0:2]. One way to think about this is that you are starting at 0, which is essentially to the 'left' of the 'G', and taking 2 steps, the first past 'G' and the second past 'l', which gives you 'Gl' (I know, I know):

In [None]:
print(my_string[0:2])

Note that spaces are counted as characters:

In [None]:
print(my_string[0:21]) # gives us the first two words in the string
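The sketch below applies the same rules to pull out the first word and to cut the string into two pieces that rebuild the original. It also previews negative indices, which count from the end of the sequence (covered in more detail with lists below):

```python
my_string = 'Glycerol dibiphytanyl glycerol tetraethers'
first_word = my_string[0:8]                  # positions 0 through 7
print(first_word)
left, right = my_string[:8], my_string[8:]   # cut at position 8
print(left + right == my_string)             # True: the two pieces rebuild the string
print(my_string[-11:])                       # negative start counts from the end
```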

**Your turn!** Please use the string below and extract the characters to spell your first name.  Once you've extracted the letters, use string concatenation as we showed above to create a new string called `my_name` with all the letters in your name.  You can use the code block below:

In [40]:
all_letters = 'The quick brown fox jumps over the lazy dog'
my_name = all_letters[22] + all_letters[36] + all_letters[38] + all_letters[36]  # 'm' + 'a' + 'y' + 'a'
print(my_name)

maya


### Using Lists in Python

The **list** is another common data type, which can hold more than one value at a time and allows us to access (or index into) these values.  Lists can hold numbers, strings, or other Python objects.  Let's look at a simple example; note that we use **square brackets** to create the new list:

In [1]:
fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']
fruits # just typing the list name here will show the list in the notebook output

['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']

In [2]:
print(fruits) # accomplishes the same thing as above

['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']


In [3]:
type(fruits) # this will tell us what type of data structure fruits is ...

list

Because Python is an object-oriented language, any [data type](https://docs.python.org/3/library/stdtypes.html#) or class (which in Python are now essentially the same thing) may have a set of methods that are associated with that data type.  Let's continue using our list called `fruits` to look at some examples of how this works.

For instance, the `list` class (which is what `fruits` is) has several methods you can use (see [here](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists)), including one called `count`.  If we wanted to know how many times the string _apple_ appears in the list `fruits`, we could type the following command:

In [4]:
fruits.count('apple') # will return the number of times 'apple' appears in the list 'fruits'

2

Python also allows us to locate the position of a value in a list.  Let's see if we can find the location of the first instance of 'banana' using the `.index` method:

In [5]:
fruits.index('banana') # returns the position of the first instance of banana in the list 'fruits'

3

Note that the index position returned is the first instance of 'banana' using the zero-based index scheme! (I know, I know)

Thus far we've used strings, but of course much of our data will be numbers.  Let's see how lists work for numbers.  Again, we use square brackets in the list declaration:

In [6]:
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
my_list[0] # The number 1 is in the 0th position in the list

1

We can return multiple index values on the same line, e.g.:

In [7]:
my_list[0], my_list[3]

(1, 4)

We can ask for values in other ways as well.  For instance, if we use an index of `-1` we will get the last item in the list:

In [8]:
my_list[-1] # in general, using the negative means to count backwards from the end of the sequence

10

You can of course get more than a single value out of a sequence.  In Python, the full syntax for slicing is `list_variable[start:stop:step]` where `start`, `stop`, and `step` are all (optional) integer values. There are some examples below.  Indexing is where the zero-based nature of Python gets really confusing.  It might help to visualize `my_list` and the zero-based indexing using the following figure:


![image.png](attachment:image.png)

Using the above figure as a guide, here are some different approaches to indexing that might be useful and instructive in understanding Python's approach to this critical programming feature:

In [None]:
print(my_list[:])        # Get everything in the list
print(my_list[5:])       # Start at index 5 and get everything after -> [6, 7, 8, 9, 10]
print(my_list[:5])       # Get everything up to but NOT including index 5 -> [1, 2, 3, 4, 5]
print(my_list[3:5])      # Get the items at indices 3 and 4 -> [4, 5]
print(my_list[0:None])   # Start at zero and go to the end (None means 'no limit')
print(my_list[0:None:2]) # Get every other item -> [1, 3, 5, 7, 9]
print(my_list[1::3])     # Get every 3rd item, starting at index 1 -> [2, 5, 8]

Note how the numbers in the slice specifiers can be optional - for instance `my_list[5:]` basically says 'start at position 5 and go to the end'.  `my_list[1::3]` essentially says 'start from position 1, go to the end, in steps of 3'. 
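Steps can also be negative, in which case the slice walks backwards through the list. A brief sketch with the same `my_list`:

```python
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
backwards = my_list[::-1]            # step of -1: the whole list, back to front
last_three = my_list[-3:]            # negative start counts from the end
every_other_down = my_list[8:2:-2]   # indices 8, 6, 4 (stop index 2 is excluded)
print(backwards)
print(last_three)
print(every_other_down)
```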

Once again, you'll notice some oddities above, like how `my_list[3:5]` only returns the values at indices 3 and 4, but not 5.  Or that `my_list[:5]` doesn't include the value at index position 5.  This is the 'half-open interval' again.  You'll encounter this throughout Python: the stop value of a slice is never included!  It may then be easier to think about index numbers in this case not as positions, but rather as the 'cut points' between locations, like so:

![image.png](attachment:image.png)

In this way of visualizing indexing, you could think of [0:1] as 'start at 0 and cut at 1', which gives you a list containing only the first value, 1:

In [9]:
print(my_list[0:1])    

[1]


Indexing is an incredibly important part of a programming language, and unfortunately Python's zero-based and half-open sequence nature can make this more difficult (and dangerous!) than it ought to be.  Especially as a beginner, whenever you encounter an indexing (or indexing-like) situation, make sure you are 100% sure what values Python is using and take the extra time to sanity check your code to make sure you understand what data are being accessed.  
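One lightweight way to do that sanity check, sketched here, is to print the slice and add `assert` statements that encode what you expect; an `assert` stops the cell with an `AssertionError` if its condition is False. The `start` and `stop` values are arbitrary:

```python
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
start, stop = 3, 5
subset = my_list[start:stop]
print(subset)                               # [4, 5]
assert len(subset) == stop - start          # half-open: exactly stop - start items
assert subset[0] == my_list[start]          # the first item comes from index start
assert subset[-1] == my_list[stop - 1]      # the last item comes from stop - 1, not stop
```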

### More on lists


Another somewhat non-intuitive and potentially 'dangerous' aspect of Python is that applying a method to a data object can _sometimes_ change that object.  Look at the following method applied to our earlier list of strings, 'fruits':

In [10]:
fruits.reverse() # this changes the order of 'fruits'
fruits

['banana', 'apple', 'kiwi', 'banana', 'pear', 'apple', 'orange']

Note that the reversal of the order of the list above doesn't just _show_ you the reverse (e.g. it doesn't just print it to the screen or give you the opportunity to assign the reverse to a new variable, as MATLAB would); it actually **applies** the reverse method to the existing list `fruits` and so changes the list structure from that point onward!
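The sketch below contrasts the two behaviors using a shortened version of `fruits`: slicing with `[::-1]` builds a new reversed list and leaves the original alone, while `.reverse()` changes the list in place and returns `None`, so writing `fruits = fruits.reverse()` would throw your list away:

```python
fruits = ['orange', 'apple', 'pear', 'banana']
backwards = fruits[::-1]      # slicing builds a NEW reversed list
print(fruits)                 # original order unchanged
print(backwards)

result = fruits.reverse()     # reverses fruits in place...
print(result)                 # ...and returns None
print(fruits)                 # fruits itself is now reversed
```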

Also observe the empty parentheses in the `fruits.reverse()` command.  Empty parentheses indicate that the method or function you are calling either takes no additional input arguments at all or it has some default arguments or behavior that are not necessary to specify.

Let's look at a similar method, `.sort` - you can call this method without any additional input:

In [11]:
fruits.sort()
fruits

['apple', 'apple', 'banana', 'banana', 'kiwi', 'orange', 'pear']

Now your list is sorted (alphabetically, from A to Z).  What if we wanted to sort a list in the opposite order, starting with Z and going to A?  It turns out there is a parameter you can pass, `reverse=True`, that will sort in descending order:

In [12]:
fruits.sort(reverse=True)
fruits

['pear', 'orange', 'kiwi', 'banana', 'banana', 'apple', 'apple']

See how we passed the parameter to the method: Python uses this `option_name=value` form (called a keyword argument) to pass instructions to methods and to override their default behavior.
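If you want a sorted copy without changing the original, the built-in `sorted` function accepts the same `reverse` keyword, plus a `key` argument that controls what is compared. A short sketch with a shortened fruit list:

```python
fruits = ['pear', 'orange', 'kiwi', 'banana']
a_to_z = sorted(fruits)                  # new sorted list; fruits is untouched
z_to_a = sorted(fruits, reverse=True)    # keyword argument flips the order
by_length = sorted(fruits, key=len)      # compare word lengths instead of letters
print(a_to_z)
print(z_to_a)
print(by_length)                         # ties keep their original order
print(fruits)
```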

Lists are **mutable** in Python, meaning that we can add, remove, and change their own elements. So, we can append new values to our list using `.append`:

In [13]:
fruits.append('pineapple')
fruits

['pear', 'orange', 'kiwi', 'banana', 'banana', 'apple', 'apple', 'pineapple']

We can also change elements of a list by referencing their position (using square brackets) and using the equal sign to set that position to a new value:

In [14]:
fruits[1] = 'mango'
fruits

['pear', 'mango', 'kiwi', 'banana', 'banana', 'apple', 'apple', 'pineapple']
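Lists have several more methods that modify them in place. Here is a sketch of a few common ones (`insert`, `extend`, `remove`, `pop`) and the `del` statement, starting from a small, made-up list:

```python
fruits = ['pear', 'mango', 'kiwi']
fruits.insert(0, 'fig')           # insert 'fig' at index 0
fruits.extend(['lime', 'plum'])   # append several items at once
fruits.remove('kiwi')             # remove the first matching value
last = fruits.pop()               # remove AND return the last item
del fruits[1]                     # delete the item at index 1
print(fruits, last)
```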

### Tuples

In addition to **lists**, there is another important Python data type called a **tuple** (**tuh**-pl).  The word comes from mathematics, where it refers to a finite, ordered sequence of elements.  A tuple is therefore similar to a list, but has different properties and methods. Generally speaking, a tuple is a **structured** sequence of values.  Whereas lists are typically used for collections of similar individual elements, tuples often hold several pieces of related but heterogeneous information that belong together as a single record.

A tuple can be created by placing all these related elements inside **parentheses** (), separated by commas:

In [15]:
us_president = ('Joe', 'Biden', '2021-01-20', 'Democratic')
type(us_president)

tuple

We can still access elements of a tuple in a familiar way (keeping in mind the oddity of zero-indexing):

In [16]:
print(us_president[2])
print(us_president[0], us_president[1])  # print() puts one space between its arguments


2021-01-20
Joe Biden


Unlike lists, tuples are **immutable**. Once a tuple is created, you can't change, add, or remove individual elements.  So, running `us_president[0] = 'Donald'` would raise an error (a `TypeError`).
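The sketch below shows that error being caught, plus two things you *can* do with a tuple: unpack it into separate variables, or build a new tuple from pieces of the old one (the replacement name is just for illustration):

```python
us_president = ('Joe', 'Biden', '2021-01-20', 'Democratic')
try:
    us_president[0] = 'Donald'               # tuples cannot be changed in place
except TypeError as err:
    print("TypeError:", err)

first, last, start_date, party = us_president  # unpack into four variables
print(last, start_date)

updated = ('Joseph',) + us_president[1:]     # a NEW tuple; note the comma in ('Joseph',)
print(updated)
print(us_president)                          # the original is unchanged
```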

### Lists or tuples? 

So you're probably asking 'What is the difference?' and eventually you'll ask 'Should I be using a list or a tuple?'.

There are some good resources on Stack Overflow:

List vs tuple, when to use each?
https://stackoverflow.com/questions/1708510/list-vs-tuple-when-to-use-each

What's the difference between lists and tuples?
https://stackoverflow.com/questions/626759/whats-the-difference-between-lists-and-tuples

From those posts, a few points:

* Tuples are heterogeneous (different types of) data structures, while lists are homogeneous (same type) sequences.
* 'Tuples have structure, lists have order'
* You can't add elements to a tuple. Tuples have no `append` method.
* You can't remove elements from a tuple. Tuples have no `remove` method.
* Tuples are smaller and faster
* Because they are immutable, tuples can act as keys in a dictionary (see below)
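As a sketch of the last point (dictionaries themselves are introduced in the next section), a tuple of rounded, illustrative latitude/longitude coordinates can serve as a dictionary key, while a list cannot, because lists are mutable and therefore not hashable:

```python
# Coordinates are rounded and purely illustrative
locations = {(40.7, -74.0): 'New York', (51.5, -0.1): 'London'}
print(locations[(51.5, -0.1)])            # look up a value using a tuple key

try:
    bad = {[40.7, -74.0]: 'New York'}     # a list as a key is not allowed
except TypeError as err:
    print("TypeError:", err)
```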

### Dictionaries in Python

**Dictionaries** (type `dict`) are collections of data stored as _**key-value pairs**_.  You refer to a bit of data by its key rather than by its position, so the keys have to be unique within a single dictionary. (Since Python 3.7, dictionaries do remember the order in which keys were inserted.)  For instance, let's say we want to have a data structure where each person is associated with a number (like your university student number):

In [17]:
student_numbers = {'jane': 1234, 'bill': 5678}
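print(type(student_numbers))  # only a cell's last bare expression is displayed automatically, so use print()
print(student_numbers)

<class 'dict'>
{'jane': 1234, 'bill': 5678}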


One of the nice things about a dictionary is we can retrieve values (index into them!) by the key they are associated with:

In [18]:
student_numbers['bill']

5678
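What if you ask for a key that isn't there? A sketch, where `'sam'` is a hypothetical missing student: square brackets raise a `KeyError`, while the `.get` method returns a default value you supply. The `in` operator, covered below under Booleans, checks whether a key exists:

```python
student_numbers = {'jane': 1234, 'bill': 5678}
print('jane' in student_numbers)                 # True: `in` checks the keys
print(5678 in student_numbers)                   # False: values are not searched
print(student_numbers.get('sam', 'not found'))   # .get returns a default for a missing key
try:
    student_numbers['sam']                       # square brackets raise KeyError instead
except KeyError as err:
    print("KeyError:", err)
```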

You can also build dictionaries explicitly with the `dict` constructor and a collection of key-value pairs:

In [19]:
dict([('jane', 1234), ('bill', 5678)])

{'jane': 1234, 'bill': 5678}

Or using simple equal signs to indicate the key-value pair:

In [20]:
dict(jane=1234,bill=5678)

{'jane': 1234, 'bill': 5678}

Dictionaries also come with their own methods; for instance, you can query the keys or the values:

In [21]:
student_numbers.keys() # returns the keys in the dictionary

dict_keys(['jane', 'bill'])

In [22]:
student_numbers.values() # returns the values in the dictionary

dict_values([1234, 5678])
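Dictionaries are mutable, like lists. The sketch below (with a hypothetical new student) adds and updates entries by assigning to a key, then loops over the key-value pairs with the `.items()` method:

```python
student_numbers = {'jane': 1234, 'bill': 5678}
student_numbers['maya'] = 9012    # hypothetical new student: a new key is added
student_numbers['bill'] = 5679    # existing key: its value is replaced
for name, number in student_numbers.items():   # .items() yields (key, value) pairs
    print(name, number)
print(list(student_numbers.keys()))            # list() turns the keys view into a list
```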

We'll see that this way of thinking about datasets will be useful when we start to use other packages like `Pandas` and `xarray`.

### Booleans and Comparisons

Booleans (`bool`) are binary variables that can take 2 values: `True` or `False` (note the capitalization). One of the ways we'll use these is to check whether certain conditions are met.  For example, if we are interested in equalities or inequalities between numeric data, the answer to a comparison between values will be a boolean.  We can also use the same approach to see if, for instance, a certain word is contained in a list of strings.

Here is a simple example:

In [23]:
x = 3
y = 21
x < y # will return True because 3 is indeed less than 21

True

In [24]:
x == y # will return False because 3 is not equal to 21

False

Here are some of the comparisons you might find yourself making:

* `==`: equal to
* `!=`: not equal to
* `<`: less than
* `<=`: less than or equal to
* `>`: greater than
* `>=`: greater than or equal to
  
Python also can use logical operators:

* `and`: returns True if both or all statements are True
* `or`: returns True if any statements are True
* `not`: returns True only if the statement is False
   
Let's see how this works in practice:

In [25]:
x = 3
y = 21

print((x<y) and (x==y)) # clearly False

False


In [26]:
print((x<y) or (x==y)) # True because the first statement is True

True
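The `not` operator and chained comparisons weren't shown above, so here is a short sketch using the same `x` and `y`. Python lets you write `0 < x < 10` as shorthand for `(0 < x) and (x < 10)`:

```python
x = 3
y = 21
print(not (x == y))       # True: x == y is False, and not flips it
print(x != y)             # True: the same question asked directly
print(0 < x < 10)         # True: shorthand for (0 < x) and (x < 10)
print(x < y and y < 20)   # False: the second comparison fails
```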


We can also use similar operations on strings, for instance using the `in` operator to find whether a specific word exists in a string:

In [27]:
my_string = 'The quick brown fox jumps over the lazy dog'
'dog' in my_string # returns True because the word dog is indeed in the string my_string

True

In [28]:
'cat' in my_string # returns False because 'cat' is not found in my_string

False
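The `in` operator also works on lists, with one difference worth knowing, sketched below: on a list it checks for an exactly matching item, while on a string it checks for any substring, so `'ump' in my_string` is True even though 'ump' is not a word. Splitting the string into words gives a whole-word check:

```python
my_string = 'The quick brown fox jumps over the lazy dog'
fruits = ['orange', 'apple', 'pear']
print('apple' in fruits)           # True: membership in a list
print('app' in fruits)             # False: list items must match exactly
print('ump' in my_string)          # True: strings check for substrings, not words
print('ump' in my_string.split())  # False: split into words for a whole-word check
print('Dog' in my_string)          # False: comparisons are case sensitive
```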