# Basic Python syntax
------
<br>

**Some important things to remember about Python syntax**

- Spaces matter!
  Python logic is denoted by indentation. The spaces at the start of a
  line tell Python which block of code that line belongs to. You will come
  across this later, so don't worry about it for now!

- New lines matter!
  If you want to write an expression over multiple lines, there are specific
  ways to accomplish that!
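
As a rough sketch of both points (the variable names are invented for illustration, and `if` is only covered properly in the Logic section): wrapping an expression in parentheses lets it run over several lines, and an indented line belongs to the statement above it.

```python
# Parentheses let one expression continue over several lines
total = (
    1
    + 2
    + 3
)

# The indented lines belong to the "if" / "else" above them
if total == 6:
    message = "total is six"
else:
    message = "total is something else"

print(message)
```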

<br>

> Press ctrl+enter to run a code cell and see what the output is

> Start a line with "#" to make a comment

<br>

## Variables

Variables are basically just "names" that you give things.
More accurately, they provide a reference to objects that have been created, so
that they can be referenced later. We use the "=" operator to assign variable
names to objects - easy as pie!
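
A small sketch of what "reference" means here, using made-up variable names: two names can point at the same object, and re-assigning one name does not affect the other.

```python
# Assign a name to a String object
greeting = "Hello world"

# A second name can refer to the same object
same_greeting = greeting
print(same_greeting)

# Re-assigning a name points it at a new object;
# the other name still refers to the original one
greeting = "Goodbye"
print(greeting)
print(same_greeting)
```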

<br>

In [None]:
# Create a new string (text) variable and print it out
sometext = "Hello world"
print(sometext)

# Print a blank line
print()

<br>

## Functions

> You just used a function!

Python has a number of built-in functions - `print()` is used very frequently.
Whenever we want Python to tell us what's going on, we can ask it to `print()`
objects. Python will otherwise run very quietly until it hits an error!

When "calling" a function, you can often pass "arguments" that will be used by the function:

<br>

`func(argument1, argument2, argument3)`
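
To make that pattern concrete, here is a minimal sketch using two other builtins, `max()` and `round()` (chosen only for illustration). Arguments can be passed by position, and many can also be passed by name.

```python
# Positional arguments: each value is passed in order
biggest = max(3, 9, 4)

# round(number, ndigits): the second argument sets the decimal places
rounded = round(3.14159, 2)

# Many arguments can also be passed by name ("keyword arguments")
rounded_by_name = round(3.14159, ndigits=2)

print(biggest, rounded, rounded_by_name)
```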

<br>

- Not sure what arguments a Python function takes? Google it! Google is your friend when learning to code.
- For a quick reference you can also use the builtin `help()` on any Python object:

<br><br>

In [None]:
# Tell us about the print() function
help(print)

<br>

As you can see, `print()` takes quite a few arguments, but most of the time a simple `print(value)` is enough to meet our needs.

<br>

A more complex call to `print()` might look like this:

<br>

In [None]:
# Print something more complex
print("A sentence", "to print over", "multiple lines", sep='\n')

A sentence
to print over
multiple lines


<br>

## A tiny challenge

> Create and `print()` some String (text) variables in the cell below:

<br><br><br>

# Data types - scalar
------

<br>

Python has a small variety of built-in data types to get you started - think
of them as the building blocks for data. You can use them to create structure - making it far easier to handle, store and manipulate your data in intelligent ways.

There are three super-simple scalar types you need to know:

<br>

- **String** (text):
  
  `x = "Some text to remember"`
- **Integer** (number):

  `x = 5`
- **Float** (number):

  `x = 5.458`

<br>

<br>

## Strings - text objects

<br>

> You've already created some Strings!

<br>

Strings are Python objects that represent some kind of text (i.e. a *String* of
characters). They can be defined in a number of ways.

<br>

In [None]:
# Let's play with some Strings

a = 'frog'       # Normally, single and double quotes are used to create Strings
b = "snake"
c = """turkey""" # You can use triple-quotes for more complex Strings

# These three methods return a String in exactly the same way - they are not
# different types of String, only different syntax for declaring them.
# It can be useful to have options when defining awkward, long Strings:

complex_string = """This String has multiple lines.
This is the second line.
This line has "quotes" that would 'normally' mess up our String syntax.
But triple quotes make this easy to write!
"""

# Python has some neat syntax for "concatenating" variables to Strings
# (Concatenate: stick them together)

# You can "add" Strings together:
print(b + " eats " + a)

# F-Strings are super useful for "pasting" variables into a String:
print(f"{c}s eat {b}s and {a}s")


snake eats frog
turkeys eat snakes and frogs


In [None]:
# We already created a String that contains multiple lines:
print(complex_string)

# New lines can also be written as "regular" characters in Strings.
# They are defined as "\n"
lines = "Line 1\nLine 2\nLine3 "
print("A String with newlines:")
print(lines)


This String has multiple lines.
This is the second line.
This line has "quotes" that would 'normally' mess up our String syntax.
But triple quotes make this easy to write!

A String with newlines:
Line 1
Line 2
Line3 


In [None]:
# We can use a builtin function to find the length of a String:
c_length = len(c)
print(f'The word "{c}" has a length of {c_length} characters!')

# We can access specific characters in a String:
# !!! Objects in Python are usually zero-indexed! (Numbering starts at 0)
print("The first letter of c:         ", c[0])
print("The second letter of c:        ", c[1])
print("The last letter of c:          ", c[-1])

# We can use "slices" to cut out specific parts of a String:
print("From the second letter of c:   ", c[1:])
print("The middle letters of c:       ", c[1:-1])


The word "turkey" has a length of 6 characters!
The first letter of c:          t
The second letter of c:         u
The last letter of c:           y
From the second letter of c:    urkey
The middle letters of c:        urke
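
Slices follow the pattern `[start:stop:step]`, where any part can be left out. A few more variations, sketched on a fresh variable (`word` is just an illustrative name):

```python
word = "turkey"

first_three = word[:3]    # from the start, up to (not including) index 3
last_two = word[-2:]      # the final two characters
every_second = word[::2]  # every second character
backwards = word[::-1]    # a step of -1 reverses the String

print(first_three, last_two, every_second, backwards)
```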


<br>

## Integers and floats - numbers

<br>

> Pretty straightforward... they are just numbers!

<br>

In Python, you can define and operate on numbers in very familiar ways.

<br>

In [None]:
# Let's play with some numbers
x = 53          # An integer
y = 53.78       # A float (floating-point number, i.e. decimal)

# The usual operators apply:
i = x / y       # divide
j = x * y       # multiply
k = x ** y      # power

# We can use F-Strings to print these results nicely:
print(f"x / y  = {i}")
print(f"x * y  = {j}")
print(f"x ** y = {k}")


x / y  = 0.9854964670881369
x * y  = 2850.34
x ** y = 5.389596683869612e+92


In [None]:
# We can convert Strings to numbers and back, if Python allows:
x_string = "123"
x_int = int(x_string)
x_float = float(x_string)

y = 32.543234539454
print(f"y = {y}")

# You can truncate floats to an integer (the decimal part is dropped):
y_int = int(y)
print(f"y_int = {y_int}")

# You can also round them to decimal places with the round() builtin,
# where the second argument is the number of decimal places:
decimal_places = 3
y_rounded = round(y, decimal_places)
print(f"y_rounded = {y_rounded}")
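
It is worth being careful about what `int()` does to a float: it drops the decimal part rather than rounding. A quick sketch comparing it with `round()` and `math.floor()` (the `math` module is part of the standard library, but is not otherwise used in this tutorial):

```python
import math

truncated = int(2.9)           # 2  - the decimal part is dropped
truncated_neg = int(-2.9)      # -2 - truncates towards zero, not downwards
floored_neg = math.floor(-2.9) # -3 - floor always rounds down
nearest = round(2.9)           # 3  - round() goes to the nearest integer

# A String holding a decimal has to go through float() first
value = int(float("12.5"))     # 12

print(truncated, truncated_neg, floored_neg, nearest, value)
```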

In [None]:
# Asking for the length of a number doesn't make sense. Python will complain
# when we do this!
len(5)

In [None]:
# If Python gives you an error, your code probably doesn't make sense!
print(5 + "apple")
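
The error above is a `TypeError`: Python won't guess how to add a number to a String. One way around it, sketched here with invented variable names, is to convert explicitly so both sides are the same type.

```python
count = 5
fruit = "apple"

# Convert the number to a String before concatenating...
joined = str(count) + " " + fruit
print(joined)

# ...or let an F-String do the conversion for you
pasted = f"{count} {fruit}"
print(pasted)

# Going the other way, convert a numeric String before doing maths
total = count + int("3")
print(total)
```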

<br><br><br>

# Data types - containers
------

<br>

To bring some order to our data, we need to store our scalar data objects in some kind of container. Python has a few handy container types, but you can get almost everything done with just two:

<br>

- **Lists** - an ordered array of objects
- **Dictionaries** - a collection of objects, indexed by key

<br>

There is nothing scary about these - I promise!

<br>

## Lists - an ordered array

<br>

Lists allow us to store any number of objects in a linear list. We can then pull items out by their position (first, last, third, etc...).

<br>

In [None]:
# Create a new, empty list
ls = []

# Create a list with a mix of integers and Strings
a = [3, 4, 5, 'plane', 'gecko', 'taxi']
print("Our new list:", a)

# Recall items from the list
print("The first item in the list:         ", a[0])
print("The second item in the list:        ", a[1])
print("The third item in the list:         ", a[2])
print("The last item in the list:          ", a[-1])
print("The second-last item in the list:   ", a[-2])

# We can use slicing on lists, just like we did with Strings:
print("Three middle items in the list:       ", a[2:5])


Our new list: [3, 4, 5, 'plane', 'gecko', 'taxi']
The first item in the list:          3
The second item in the list:         4
The third item in the list:          5
The last item in the list:           taxi
The second-last item in the list:    gecko
Three middle items in the list:        [5, 'plane', 'gecko']


In [None]:
# Manipulating our lists

# You can concatenate lists by "adding" them together:
b = [57.546, 'sandwich']
c = a + b
print("a + b = ", c)
print()

# You can put anything you want in a list. Even other lists!
# We can use multi-line syntax to accomplish this neatly:
list_3d = [
    [a, b, c],
    ['a', 'new', 'list'],
]
print("A big, three-dimensional list:")
print(list_3d)
print()

# We can use "deep" indexing to pull list "c" back out and print it:
print("List 'c' pulled out of 'list_3d':")
print(list_3d[0][2])


a + b =  [3, 4, 5, 'plane', 'gecko', 'taxi', 57.546, 'sandwich']

A big, three-dimensional list:
[[[3, 4, 5, 'plane', 'gecko', 'taxi'], [57.546, 'sandwich'], [3, 4, 5, 'plane', 'gecko', 'taxi', 57.546, 'sandwich']], ['a', 'new', 'list']]

List 'c' pulled out of 'list_3d':
[3, 4, 5, 'plane', 'gecko', 'taxi', 57.546, 'sandwich']
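
A few more everyday list operations, as a short sketch (the list `animals` is made up for this example): counting items, checking membership, and replacing an item by position.

```python
animals = ['frog', 'snake', 'turkey']

count = len(animals)           # number of items in the list
has_snake = 'snake' in animals # is this item in the list?
print(count, has_snake)

# Replace the item at position 1 (the second item)
animals[1] = 'gecko'
print(animals)
```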


<br>

## Dictionaries - an indexed array

<br>

> Dictionaries store data as `key: value` pairs

<br>

Dictionaries are similar to lists, but instead of being indexed by
numbers, dictionaries are indexed by an object - often a String.

This allows you to "look up" the data for that index, like looking up a word
in a dictionary! They provide a very organised way to store data.

Like lists, *any Python object* can be added to a dictionary.

<br>

In [None]:
# Create a new, empty dictionary
new = {}

# A simple dictionary to "look up" car speeds:
speed_kmh = {
    'astra': 135,
    'focus': 143,
    'civic': 167,
}

# A dictionary of lists, to "look up" car models:
models = {
    'holden': ['astra', 'colorado', 'commodore'],
    'ford': ['focus', 'ranger', 'falcon'],
}

# Now pull items from the dict by index:
print("Max speed of Civic:", speed_kmh['civic'])
print("Models available from ford:", models['ford'])
print()

# Add a new key to the dict:
models['honda'] = ['jazz', 'civic', 'accord']
print("Car models after adding Honda:")
print(models)


Max speed of Civic: 167
Models available from ford: ['focus', 'ranger', 'falcon']

Car models after adding Honda:
{'holden': ['astra', 'colorado', 'commodore'], 'ford': ['focus', 'ranger', 'falcon'], 'honda': ['jazz', 'civic', 'accord']}
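
Looking up a key that doesn't exist raises a `KeyError`. The sketch below shows two common ways to avoid that, plus a loop over the `key: value` pairs (`for` loops are covered in the Logic section). The key `'tesla'` is an invented example of a missing key.

```python
speed_kmh = {'astra': 135, 'focus': 143, 'civic': 167}

# Check whether a key exists before looking it up
has_civic = 'civic' in speed_kmh
has_tesla = 'tesla' in speed_kmh
print(has_civic, has_tesla)

# .get() returns a fallback value instead of raising a KeyError
tesla_speed = speed_kmh.get('tesla', 'unknown')
print(tesla_speed)

# Loop over the key: value pairs
for car, speed in speed_kmh.items():
    print(f"{car}: {speed} km/h")
```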


<br>

## Structuring data

As an example of how Python data structures can be used, take a look at this notebook!

- Go to `file` in the menu bar
- Click `download` to download this notebook as a `.ipynb` file
- Open the file with a text editor such as Notepad. You'll see that this entire IPython notebook is stored as JSON, a text format that looks almost exactly like a nested Python dictionary!

<br>

## A tiny challenge

<br>


> Define a dictionary so we can quickly access these data:

<br>

```
   Tissue       Weight
---------------------------
    gill         234
   muscle        457
   eyeball       112
    skin         89
   kidney        146
```

<br>

In [None]:
# Take a look at "speed_kmh" if you feel like cheating ^_^
weights = {
    # add dictionary items here
}

# Run the cell to test your dictionary works:
print(f"The eyeball weighs {weights['eyeball']} grams")

<br>

Now imagine that we have a bucket full of fish. How could you store the data above, for all of these fish?

<br><br><br>

# Methods
---

<br>

So we now know a few basic data types that we can use to define and
structure our data. It's time to start asking questions about our data.

Methods are functions that are "bound" to a specific object type. As such, each of the data types that we have learned (String, Int, List etc.) has its own methods. They allow us to perform useful, routine operations on these objects.

<br>

This section introduces a few handy methods that are native to Strings and
lists.

<br>

## String methods

In this block, we demonstrate seven useful methods that can be used on a String
object. You can call these methods on ANY String object. Think of them as a set
of questions or operations that you might apply to a piece of text.

In [None]:
# First, let's create a String to play with
x = "Hi, How are you today?"
print(f"The String x: '{x}'\n")

# Now, let's call some methods to answer questions about 'x'
print("Does x start with 'egg'?")
print(x.startswith('egg'))
print()

print("Does x start with 'Hi'?")
print(x.startswith('Hi'))
print()

print("Does x end with a question mark?")
print(x.endswith('?'))
print()

print("How does x look in lowercase?")
print(x.lower())
print()

print("How does x look in UPPERCASE?")
print(x.upper())
print()

print("Can we replace today with tomorrow?")
print(x.replace('today', 'tomorrow'))
print()

print("Can we break up a String and turn it into a list?")
print(x.split(" "))
print()

print("Okay, but can you do the opposite - join a list to make a String?")
ls = ['A', 'broken', 'up', 'sentence']
ls_joined = ' '.join(ls)
print(ls_joined)

<br>

## List methods

Now we will introduce four handy methods that can be used to manipulate lists.


In [None]:
# Let's create a simple list to play with
x = [1, 13, 37, 2, 3]

print("Can I add an item to the list?")
x.append(43)
print(x)
print()

print("Can I remove an item from the list?")
x.remove(13)
print(x)
print()

print("Can you reverse the order of the list?")
x.reverse()
print(x)
print()

print("Can you sort the list by values?")
x.sort()
print(x)

<br>

You may have noticed something a bit different in the syntax here. These methods operate on the object *in place*. So in order to see the result, we have to call the method, and then print out the object:

```python
x.sort()
print(x)
```

String methods work differently. They return a new object when they are called, leaving the original object unchanged:


In [None]:
x = "some words"
x.upper()
print(x)  # See, x is unchanged!

y = x.upper()
print(y)  # a new String has been created

This points to an important characteristic of these types:

- Strings are immutable - they *can't* be modified after creation
- Lists are mutable - they *can* be modified at any time

So, when a method is called, it can either operate on the object in-place (as with lists), or it can return a new object, leaving the subject unchanged (as with strings).

This fairly simple paradigm can cause some really silly bugs in your code if you don't understand what's happening!
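
Here is a sketch of the kind of bug this can cause. The builtin `sorted()` is not used elsewhere in this tutorial; it is shown as the "return a new list" counterpart to `list.sort()`.

```python
numbers = [3, 1, 2]

# A common bug: list.sort() works in place and returns None
result = numbers.sort()
print(result)    # None
print(numbers)   # [1, 2, 3]

# sorted() returns a new list and leaves the original alone
original = [3, 1, 2]
ordered = sorted(original)
print(original)  # [3, 1, 2]
print(ordered)   # [1, 2, 3]

# Strings are the opposite: keep the returned value, or the change is lost
word = "some words"
word = word.upper()
print(word)      # SOME WORDS
```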

## A tiny challenge

Below is a list of filenames - some data files that will be processed and output as `csv` files.

Can you fetch the third filename and **replace** the filename extension
with `.csv`?

In [None]:
filenames = [
    "data_1.txt",
    "data_2.txt",
    "data_3.txt",
    "data_4.txt",
    "data_5.txt",
]

# Fetch the third file name

# Replace the extension to create an output filename


<br><br><br>

# Logic
---

<br>

So, we can now create Python data structures and perform basic operations on them.

In order to start manipulating these data, we just need to sprinkle in some
Python logic! Expression of logic is the essence of programming.

<br>

## Comparisons

A comparison allows us to ask questions about a specific object.
It is a "statement" that Python can evaluate, returning a True / False (boolean) response.

In [None]:
# Define a String to play with
x = "hello"

print('Is x the String "Hello"?')
print(x == "Hello")
print()

print('Of course... Python is case-sensitive! So is it "hello"?')
print(x == "hello")
print()

print('So obviously x is not "goodbye"?')
print(x != "goodbye")
print()

print("Does x have a 'll' in it?")
print('ll' in x)
print()

# Define an int to play with
y = 12

print("Is y smaller than 6?")
print(y < 6)
print()

print("Is y smaller than 16?")
print(y < 16)
print()

print("So y is greater than 6, and also less than 16?")
print(y > 6 and y < 16)

<br>

## Conditionals - `if`/`else`

This is the fundamental logic of programming! It enables decision-making during code execution.

Pass a comparison to an `if` statement. If it evaluates to `True`, Python will
run that block of code. If `False`, it will skip that block of code and
continue. If the `if` block is followed by an `else` block, Python will execute
that block only if the `if` test turns out to be `False`.

```python
# Examples of "truthy" values:
True, 1, 12, "hi", [1,2,3], {"key": "value"}

# Examples of "falsey" values:
None, 0, "", [], {}
```
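
If you are ever unsure how Python will treat a value in an `if` statement, the builtin `bool()` will tell you. A quick sketch:

```python
# Truthy values
text_is_truthy = bool("hi")
list_is_truthy = bool([1, 2, 3])
print(text_is_truthy, list_is_truthy)

# Falsey values
empty_text = bool("")
zero = bool(0)
nothing = bool(None)
empty_list = bool([])
print(empty_text, zero, nothing, empty_list)
```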

In [None]:
# Create some variables to play with
x = "hello"
y = 12

"""
!!! Take note of the indentation here! In Python, blocks of code are defined
by indentation. This tells Python which code to run when the "if" statement
is Truthy.
"""

if x == "hello":
    print("x is 'hello'!")
else:
    print("x is something else?")

if x:
    print(f"'{x}' is Truthy!")

x is 'hello'!
'hello' is Truthy!


In [None]:
# We can also use the "elif" keyword to make a series of tests
a = 10
b = 12

if y > a:
    print("y is bigger than a")
elif y > b:
    print("y is bigger than b")
else:
    print(f"{y} is not bigger than {a} or {b}")

y is bigger than a


<br>

## For loops

Loops allow you to run a block of code on each item in a list (or any other
*iterable* object - Strings and dictionaries are iterable too!).

- Looping makes data processing extremely efficient!
- In the last section, "Files", we will use a for-loop to work through a file line-by-line.

In [None]:
# Let's create a list to iterate over
places = [
    "zoo",
    "shops",
    "cinema",
    "library",
]

# Here, the subject of the loop has been named "place", but you can name it
# anything that makes sense. In this loop we iterate over "places", passing
# each "place" into a print statement:

for place in places:
    # For each place in the list of places, do this with the place:
    print(f"Let's go to the {place}")

Next we introduce two new keywords: `continue` and `break`.
We can use them to control the flow of logic by *skipping* specific items and
*terminating* the loop early.

In [None]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]

for n in numbers:
    if n == 5:
        # skip this iteration
        continue
    print(f"n = {n}")
    if n > 7:
        # "break" (terminate) the loop here:
        print("n is greater than 7!")
        print("I'm breaking the loop here!")
        break

Cool, we are now starting to piece together what we've learnt!

Let's sprinkle in a little more of our knowledge to this code block. This looks a little more crazy, but it's all stuff that we've covered before!

In [None]:
i = 0
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]

for n in numbers:
    if i == 5:
        # skip the rest of the block for this iteration:
        continue
    print(f"Adding {n} to i...")
    i += n
    print(f"i is now {i}")
    if i > 7:
        break

It's time to introduce another Python built-in function!

- `enumerate()` can be used to keep track of the iteration number in a `for` loop.
- We can use `i` to access the current loop number, starting at zero

What we're doing here is a tiny bit more complex - in this case the iterator returns *two* items at a time!

In [None]:
# Let's create a list to iterate over
places = [
    "zoo",
    "shops",
    "cinema",
    "library",
]

for i, place in enumerate(places):
    print(f"Place {i}) {place}")

<br><br><br>

# Files
---

<br>

Ok, we now have the basic components of Python programming:

- Syntax
- Data structures
- Expressions
- Logic

So far we have only used data that was defined in our code.
Of course, this is almost never the case in real life! Data is usually stored in files.

<br>

> Python is **totally agnostic** to file naming - if a file contains text, Python will read it in exactly the same way, whether it's `data.txt`, `data.csv` or `data.tab`. Python can also read in binary files like `data.xlsx`, but this requires special libraries to be imported (see `pandas` - the Python data analysis library). In this tutorial, we will stick to regular text files.

<br>

## Reading data from files

---

In [None]:
# Let's download some data to play with
# (Don't worry, this is Colaboratory magic, not regular Python code!)

!wget https://raw.githubusercontent.com/neoformit/python_tutorial/55057f5fe4ff753ccaa60c2fb5883800d67e60d6/data/weights.txt
!wget https://raw.githubusercontent.com/neoformit/python_tutorial/55057f5fe4ff753ccaa60c2fb5883800d67e60d6/data/genes.fas

print("\n\nNow refresh your files panel, on the left!")

We will now read in a file and create a Python object from it. The simplest
way to read in a file in Python is to create one long String from the file content. We can then use some familiar String methods to break down the content and make sense of it.

In [None]:
# Pass a file path to open() to return a file "handle"
file = open("weights.txt")       # Opens in "read only" mode by default

print("'file' is just a file handle. We haven't read any content yet:")
print(file)
print()

content = file.read()            # Read the contents as a large String

# Try and remember to close files when you're finished with them!
# Lots of open files can use up system resources and may prevent other
# programs from working with those files.
file.close()
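
Because forgetting `close()` is so easy, Python code often opens files with a `with` block, which closes the file automatically. The sketch below is self-contained: it first writes a small throwaway file (the path and contents are invented, and write mode is explained later in this section) and then reads it back.

```python
import os
import tempfile

# Create a small throwaway file so the example runs on its own
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as file:
    file.write("12\n45\n7\n")

# The file is closed automatically at the end of the "with" block
with open(path) as file:
    content = file.read()

print(file.closed)
print(repr(content))
```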

In [10]:
# We still don't know what's in the file - or how big it is!
# So how big is it?

# Remember - 'content' is just a String
#          - newline character = '\n'

lines = content.split('\n')
line_count = len(lines)
character_count = len(content)

print(f"The file has a length of {character_count} characters and {line_count} lines")

The file has a length of 24427 characters and 5000 lines
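
One thing to watch for when counting lines this way: if a file ends with a newline character, `split('\n')` leaves an empty String at the end of the list. The String method `splitlines()` avoids this. A sketch on a small made-up String:

```python
content = "12\n45\n7\n"   # note the trailing newline

by_split = content.split('\n')
by_splitlines = content.splitlines()

print(by_split)        # ['12', '45', '7', '']
print(by_splitlines)   # ['12', '45', '7']
```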


Jeez, I'm glad we didn't print out the whole thing!

We can still have a glance at it though:

In [None]:
# Use slicing to print the first 10 lines
print(lines[:10])

In [None]:
# We can use one of our String methods to print that nicely
text = '\n'.join(lines[:10])
print(text)

In [None]:
# We can also use a loop to add more information
for i, line in enumerate(lines):
    print(f"Line {i} | {line}")
    if i == 9:
        # Again, we only want to print 10 lines!
        break

Okay, so it looks like the file is a long series of numbers. However, each line is still a String, so we will have to convert them to numbers before we can do any maths with them!

In [None]:
# Python will complain when we try this!
lines[0] + 5

In [None]:
# So we have a list of 5000 Strings... let's turn them into integers!
weights = []

for line in lines:
    n = int(line)              # Convert the String number to an int
    weights.append(n)          # Append the number to our new list

# Now we have a list of numbers - much more useful!

# Let's find out something about this data
print(f"The highest weight is {max(weights)}")
print(f"The lowest weight is {min(weights)}")
print()

# Let's sort the weights from low to high
weights.sort()

print(f"The five lowest weights are {weights[:5]}")
print(f"The ten highest weights are {weights[-10:]}")
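
For very large files, you may not want to read everything into one String first. A file handle is itself iterable, so a `for` loop can read it line-by-line. This sketch uses a small throwaway file with invented contents (including a blank line) rather than `weights.txt`:

```python
import os
import tempfile

# Create a small throwaway file so the example runs on its own
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as file:
    file.write("12\n45\n\n7\n")

numbers = []
with open(path) as file:
    for line in file:            # a file handle yields one line at a time
        line = line.strip()      # remove the trailing newline and any spaces
        if not line:
            continue             # skip blank lines
        numbers.append(int(line))

print(numbers)
```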

So what are the weights that we're looking at?

These are the weights of fish, as recorded by fishermen in upstate New York. Being Americans, they recorded the weight of these fish in ounces. How annoying!

Let's fix this hot mess and convert them to kilograms. We could write a `for` loop to do this, but some new Python syntax will speed things up a lot. 

In [None]:
# Kilograms are much bigger than ounces
oz_to_kg = 0.0283495

# We will use some new syntax that allows us to filter and modify lists
# very efficiently: List comprehension!

# List comps are confusing at first, but they have a very routine structure.
# Once you learn this structure, they are quick and easy to build!

# This is the basic structure. It returns the list items unchanged.
ounces = [x for x in weights]  # Return 'x' for every 'x' in 'weights'

# We can add a simple expression to modify the items as they are returned
# This is the same structure we used before, but it's easier to read over
# multiple lines!
kilos = [
    x * oz_to_kg      # multiply 'x' by our conversion factor
    for x in weights  # for every 'x' in 'weights'
]

# The list order is unchanged, so the max weight should also be the last
print(f"Max weight:  {max(kilos)} kg")
print(f"Last weight: {kilos[-1]} kg")
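
If list comprehensions still feel like magic, it can help to see one next to the equivalent `for` loop. This sketch uses three invented weights instead of the full dataset; both versions build the same list.

```python
weights_oz = [10, 20, 30]   # invented values for illustration
oz_to_kg = 0.0283495

# The long way: a for loop with append()
kilos_loop = []
for x in weights_oz:
    kilos_loop.append(x * oz_to_kg)

# The same thing as a list comprehension
kilos_comp = [x * oz_to_kg for x in weights_oz]

print(kilos_loop == kilos_comp)
```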

Hmmm... far too many decimal places!

Let's use the `round()` function to round the numbers to one decimal place. Much better!

In [None]:
# The list comp structure is often easier to read over multiple lines!

kg_1dp = [
    round(x, 1)     # round 'x' to one DP
    for x in kilos  # for every 'x' in 'kilos'
]

print(f"Max weight:  {max(kg_1dp)} kg")

<br>

## Writing data to files
---

In order to save data created in our code, we need to write it to file.

In order to write data, you must open a file in write mode (`'w'`).

<br>

> **CAUTION** Opening a file in write mode will delete the content if the file already exists. This is a very easy and silly way to delete files accidentally! Be VERY sure that the file doesn't exist before creating, as Python will give you no warning whatsoever!
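
If you want Python to protect you from this, one option is to check for the file first with `os.path.exists()`, or to open it in exclusive-creation mode (`'x'`), which raises a `FileExistsError` instead of overwriting. The sketch below uses a throwaway path and the `try`/`except` syntax, neither of which is covered in this tutorial.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "results.csv")

# os.path.exists() tells you whether something is already at that path
existed_before = os.path.exists(path)
print(existed_before)

# 'x' creates a new file, but refuses to overwrite an existing one
with open(path, "x") as file:
    file.write("1.5\n")

refused = False
try:
    open(path, "x")
except FileExistsError:
    refused = True
    print("File already exists - nothing was overwritten")
```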

<br>

In [None]:
# We must explicitly open in "write" mode by passing 'w'
# This will create a new file, or overwrite an existing file

filename = 'weight_kg.csv'
file = open(filename, 'w')

# Remember, we have a list of floats now. We need to turn them back into Strings
# in order to write them to file! While we are at it, we can also include some
# newline characters, so that we are writing one number per line.

print(f"Writing data to {filename}...")

for n in kg_1dp:
    line = str(n) + '\n'
    file.write(line)

# Again, remember to close files when they are no longer needed!
file.close()

print("Done")

# Now go and check out your new file!
# Go to the file panel on the left, and double-click on the new file icon

<br>

## A tiny challenge
---

We mentioned before that you can also use list comprehension to *filter* lists.

- Can you filter the list `kilos` to include only weights above 20kg?
- Hint: you just need to add one `if` statement to the list comp that can be evaluated as `True` or `False`
- For a bonus point, write the filtered data to a new file called `kg_over_20.csv`

<br>

In [33]:
# Add a line to filter out certain numbers

filtered_kgs = [
    x
    for x in kilos
    # add a statement on this line, to evaluate 'x'
]

<br><br><br>

# Final challenge
---

<br>

So, you've learnt the basic components of programming in Python:

- Create variables
- Data structures
- Call built in functions
- Call object methods
- Logical flow and control
- Reading and writing files

<br>

So far you've been given a handful of simple challenges to test your knowledge. We're going to leave you with a takeaway challenge that's a little bit harder, but actually practical!

<br>

You'll notice that we downloaded two files earlier - we haven't touched `genes.fas` yet. This is a `FASTA` formatted sequence file containing cDNA sequences for some crustacean genes. Have a look at it by double-clicking the file icon in the panel on the left.

<br>

## Tasks

- Read in the file `genes.fas`
- Create a dictionary of DNA sequences, indexed by sequence ID
- Print the ID of the longest cDNA sequence
- Iterate over the dictionary, collecting only genes containing the motif `TGACAC`.
- As you collect matching genes, add them to a new dictionary called `matches`
- Write `matches` to a new file in `FASTA` format

<br>

**Hints**
> Use `print()` statements to check your variables as the code develops, so you can spot and fix mistakes

> Remember, in `FASTA` format, a line beginning with `>` indicates the title line of a new sequence. How could you split the file content to collect these sequences?



In [None]:
# Write your code here.
# You may find it easier to separate into a series of cells.

MOTIF = 'TGACAC'
filename = 'genes.fas'