# Tutorial 1: Basic Python syntax

This tutorial serves as a basic guide to Python syntax, as well as some tips on navigating the JupyterLab console. Those familiar with the language can skip to Tutorial 2.

**Note**: this notebook is pretty minimalistic and assumes basic knowledge of programming terminology (*e.g.*, string, function, loop). Those with zero coding experience may prefer to work through a more detailed beginner's tutorial, such as https://www.learnpython.org/ 

**Note 2**: keyboard shortcuts and other console info are based on JupyterLab Desktop. I can't guarantee cross-compatibility with other platforms.

---

## 1.1 Notebook formatting

Notebooks will have markdown cells, code cells, and output cells.

.......................................................................................................... \
This is a markdown cell. Double click on it to edit. \
Formatting guide for your interest: https://www.markdownguide.org/cheat-sheet/ \
Press ctrl+enter to save changes. \
..........................................................................................................

In [None]:
#this is a code cell (comments are preceded by a '#')
#click inside it to edit

print('Hello World')

#ctrl+enter to run the code, or shift+enter to run and move to the next block

\
As you can see, running a code block produces an output cell below it. 

To make a new cell, click the '+' button on the top bar. The 'play' button also runs the current cell, while the 'fast forward' button runs the whole notebook from the beginning. You can also change a cell between markdown and code in the drop-down menu.

---

## 1.2 Types of data

Here are some types of data you can assign to variables:

In [None]:
#variables are assigned using the '=' operator. 
#variable names cannot start with a number or contain any spaces.

example_string = 'string' 
example_integer = 1
example_float = 3.14159
example_boolean = True

#to output the value of a variable:
example_float

\
Another common data type is a **list**. It is a sequence of elements, surrounded by square brackets, each separated by a comma:

In [None]:
empty_list = []
string_list = ['one', 'two', 'three']
integer_list = [4, 5, 6]
mixed_list = ['abc', 100, 3.65, False]

#to retrieve a specific value from a list by index (position):
mixed_list[2]

Note that Python indexing **starts at 0**, meaning that the `[2]` in the above example actually refers to the *third* element in `mixed_list`, not the second.
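As a quick illustration of zero-based indexing, the sketch below uses a hypothetical list called `bases` (not part of the tutorial data). The negative index and the `len` function are extras that are not covered above.

```python
# hypothetical example list, used only for illustration
bases = ['a', 'c', 'g', 't']

first_base = bases[0]    # index 0 is the first element
third_base = bases[2]    # index 2 is the third element
last_base = bases[-1]    # negative indices count from the end
n_bases = len(bases)     # number of elements in the list

print(first_base, third_base, last_base, n_bases)
```

Asking for an index that does not exist, such as `bases[4]` here, raises an `IndexError`.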

---

## 1.3 Functions

Basic math operators are pretty intuitive:

In [None]:
x = 1
y = 2
z = 3

new_value = (x+z)/y*5

new_value

\
By default, a cell only displays the value of its last expression. To show more than one output, we can call the `print` function:

In [None]:
#functions take the format function_name(variables)

print(x)
print(y)
print(z)
print(new_value)

\
We can also define our own functions.

Look at the example syntax below. The function `my_function` is followed by the parameters `(a,b)` in **round brackets**. Multiple parameters are separated by a **comma**. A **colon** follows the brackets.

In [None]:
def my_function(a,b):
    print(a+b)

my_function('string1', 'string2')

Note that Python syntax is **indentation sensitive**. In the above example, everything nested within `my_function` (here, the `print` line) is indented one level. 
Most of the time, the console is smart enough to automatically indent as you write, but make sure to pay attention when formatting your code.
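To see why indentation matters, here is a minimal sketch with a hypothetical function `greet` (not part of the tutorial). Lines indented under `def` belong to the function and only run when it is called; the unindented lines run on their own.

```python
# hypothetical function, for illustration only
def greet(name):
    greeting = 'Hello ' + name   # indented: part of the function body
    print(greeting)              # indented: also part of the function body

print('This line is not indented, so it is outside the function')
greet('Alice')                   # calling the function runs the indented lines
```

If the second indented line were moved back to the left margin, it would no longer be part of `greet`, and Python would raise a `NameError` because `greeting` only exists inside the function.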

\
The `return` statement helps when you want to combine functions:

In [None]:
def func_1(a,b):
    a = a+5
    b = b+5
    return a+b #returns the value of the function without printing it

def func_2(c):
    return c*2

#putting brackets/code blocks on separate lines can help you keep track of nested functions
#notice there are 3 levels of indentation, one for each function

print(
    func_1(
        1,(func_2(2))
    )
) 

\
Some functions (also known as **methods**) work on specific data types (also known as **objects**). They have slightly different syntax:\
`object.function(parameters)`

In [None]:
#example of list methods

names = ['Alice', 'Bob', 'Charlie']

#returns the index (position) of the first element with the specified value
location_Bob = names.index('Bob')

print(location_Bob)

In [None]:
#examples of string methods

seq = 'atcggatccgt'

#in this case, 'upper' is the function, and seq is the data:
uppercase_seq = seq.upper()

#another example, but this function requires an additional parameter specified in the brackets:
count_a_seq = seq.count('a')

#some functions require multiple parameters, separated by a comma:
rna_seq = seq.replace('t','u')

print(uppercase_seq)
print(count_a_seq)
print(rna_seq)

Notice that in the above examples, none of the functions overwrite the original `seq` string. To keep a result, you have to assign the output back to the `seq` variable, or to a new variable. This can be helpful when combining functions, since Python runs statements in order, so each line can build on the result of the previous one:

In [None]:
seq = 'atcggatccgt'

def comb_function(s):
    s = s.upper()
    s = s.replace('T', 'U')
    print(s)
    
comb_function(seq)

FYI, `s = s.upper().replace('T','U')` works as well.
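The point about not overwriting the original string can be checked directly. This is a minimal sketch using a hypothetical variable `dna` rather than the tutorial's `seq`:

```python
dna = 'atcg'   # hypothetical example string

dna.upper()                # the result is discarded; dna itself is unchanged
print(dna)                 # still lowercase

dna_upper = dna.upper()    # store the result in a new variable
rna_upper = dna.upper().replace('T', 'U')   # chained: upper() runs first, then replace()

print(dna_upper)
print(rna_upper)
```

In the chained version, the order matters: `replace('T', 'U')` only finds a match because `upper()` has already produced uppercase letters.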

\
You don't need to worry about learning all the built-in functions and syntax. If an exercise or assignment question requires the use of specific functions that are not covered in these tutorials, I will provide a link to the documentation. However, you will need to figure out how to apply it yourself. 

---

### Exercise #1

I was retrieving data on an organism of interest, and the taxonomy information was given as one long string:

In [None]:
my_organism = 'd__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Xanthomonadales;f__Rhodanobacteraceae;g__Metallibacterium;s__Metallibacterium scheffleri' 

In this case, each taxonomy classification (domain, phylum, class, order, family, genus, species) is preceded by a one-letter rank code and two underscores, and the classifications are separated by semicolons.

As you can see, this data can be hard to work with on its own without doing some additional formatting.

**Your task:**

Read the documentation for `str.replace()` and `str.split()` \
https://www.w3schools.com/python/ref_string_replace.asp \
https://www.w3schools.com/python/ref_string_split.asp 

Write a function that retrieves only the **species** name of a given organism, without any preceding characters (e.g., when given `my_organism`, it would return `Metallibacterium scheffleri`).

In [None]:
### YOUR CODE HERE ###

---

## 1.4 Conditionals

Conditionals, aka 'if...else' statements, are implemented like so:

In [None]:
a = 200
b = 33

if b > a: 
    print("b is greater than a") #execute if condition is met, otherwise proceed to next line
elif a == b: 
    print("a and b are equal") #'else if', will execute this if another condition is met
else: 
    print("a is greater than b") #execute if no prior conditions were met

\
We can turn the above example into a function:

In [None]:
def compare_numbers(a,b):
    if b > a: 
        print("b is greater than a")
    elif a == b: 
        print("a and b are equal")
    else: 
        print("a is greater than b")

compare_numbers(10,10)

Notice the change in indentation levels. This is one case where automatic indentation will not work - you need to make sure the conditional statements (`if`, `elif`, `else`) are all on the same level.
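If you would rather reuse the result than print it, the same logic can return a string instead. This is a sketch of one possible variation; the name `describe_comparison` is hypothetical.

```python
def describe_comparison(a, b):
    # each branch returns a string instead of printing it
    if b > a:
        return 'b is greater than a'
    elif a == b:
        return 'a and b are equal'
    else:
        return 'a is greater than b'

result = describe_comparison(10, 10)   # the returned string is stored in a variable
print(result)
```

Because the function returns its result, the output can be stored, compared, or passed to another function.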

---

### Exercise #2

Read the documentation for string slicing and list appending: \
https://www.w3schools.com/python/python_strings_slicing.asp \
https://www.w3schools.com/python/ref_list_append.asp

Without using `str.replace()`, write a function that takes two arguments (organism name, taxonomy level), and returns the specified taxonomy information for that organism, without any preceding characters. 

For example, if given `(my_organism, 'class')`, it would return `Gammaproteobacteria`.

In [None]:
### YOUR CODE HERE ###

For an extra challenge, you can try to implement Exercise 2 *without* any `if` statements.\
Hint: recall the `list.index()` function.

---

## 1.5 Extras

As this is not a programming course, **none of the following is required** for the assignment. I have provided it for your interest only.

### Loops

An example of using a `for` loop to iterate over a list:

```python
fruits = ["apple", "banana", "orange"]

for x in fruits:
    x = x.capitalize()
    print(x)
```

Prints `Apple`, `Banana`, and `Orange`, each on its own line.

It's good to know how loops work, but subsequent tutorials will go over built-in functions that accomplish the same thing. 
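A loop can also build up a new list as it goes. The sketch below reuses the `fruits` list and collects the results with `list.append()`; the variable name `capitalized` is only illustrative.

```python
fruits = ["apple", "banana", "orange"]
capitalized = []   # start with an empty list

for x in fruits:
    capitalized.append(x.capitalize())   # add each capitalized name to the list

print(capitalized)
```

Unlike the printed output, the `capitalized` list can be used by later code.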



### List comprehension

Python has a special type of syntax for iterating through lists that is more concise than using a `for` loop. It looks like this:
```python
numbers = [1,2,3,4,5]
squares = [x**2 for x in numbers] 
```

The above should return `[1,4,9,16,25]` for `squares`.

This can be combined with conditional statements:

```python
squares_above_2 = [x**2 for x in numbers if x>2] 
```
Returns `[9,16,25]` for `squares_above_2`, since only the numbers greater than 2 are squared and kept.
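For comparison, a list comprehension with a condition does the same work as a `for` loop containing an `if` statement. This is a minimal sketch; the name `squares_loop` is hypothetical.

```python
numbers = [1, 2, 3, 4, 5]

# loop version: check each number, then append its square
squares_loop = []
for x in numbers:
    if x > 2:
        squares_loop.append(x**2)

# comprehension version: same filter and same calculation in one line
squares_above_2 = [x**2 for x in numbers if x > 2]

print(squares_loop == squares_above_2)
```

Both versions filter first and then square, so numbers that fail the condition never appear in the result.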


### Other data types

**Sets** are unordered collections that contain only **unique** values. To turn a list into a set, use `set(my_list)`.
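Here is a small sketch of the conversion, using a hypothetical list of genus names with one repeat. Because sets have no fixed order, the example sorts the result before printing so the output order is predictable.

```python
genera = ['Escherichia', 'Bacillus', 'Escherichia']   # hypothetical data with a duplicate
unique_genera = set(genera)          # the duplicate is dropped

print(len(genera), len(unique_genera))
print(sorted(unique_genera))         # sorted() returns a list in alphabetical order
```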

**Dicts** (short for dictionaries) store **paired** values and act like 'lookup tables.' A dict representing the taxonomy info from this tutorial would look like this:

```python
my_organism = {
  "domain": "Bacteria",
  "phylum": "Proteobacteria",
  "class": "Gammaproteobacteria",
  "order": "Xanthomonadales",
  "family": "Rhodanobacteraceae",
  "genus": "Metallibacterium" ,
  "species": "Metallibacterium scheffleri"
}
```

In each pair, the first part is known as the key and the second part is known as the value.\
`dict.keys()` returns the keys and `dict.values()` returns the values (wrap either one in `list()` to get a list).\
`dict[key]` returns the value paired with that key.

You can also construct a dict from two lists using `dict(zip(keys, values))`.
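Here is a minimal sketch of building and querying a dict this way. The two lists are shortened for illustration, and the names `levels`, `names`, and `organism` are hypothetical.

```python
levels = ['genus', 'species']                                 # keys
names = ['Metallibacterium', 'Metallibacterium scheffleri']   # values

organism = dict(zip(levels, names))   # pairs each key with the value at the same position

print(organism['genus'])              # look up a value by its key
print(list(organism.keys()))
print(list(organism.values()))
```

Looking up a key that is not in the dict, such as `organism['family']` here, raises a `KeyError`.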

