# Workshop 2

Welcome to Workshop 2. This workshop is based on all of the material covered in the self-study notebooks up to the end of notebook 11. You have now learned about dictionaries, conditional statements and loops, so you have the basic toolset to write programs capable of performing complex and repetitive tasks.

In this workshop, you will **work in groups of 2-3 students** to solve a series of problems in two Jupyter notebooks. If you complete those within the workshop, you may want to try the (optional) notebook 'Workshop 2c' - you may need some demonstrator help with that one. 

At first, it might not be obvious how to answer the questions in this workshop by writing code. If that is the case:

* Think about what the question is asking you to do - how can you break it down into computable components that you can write in Python? What is your input, and what output do you need? Does your program need to use loops? Does it need to make decisions (conditional statements)?
* Do you need to store data? What data structures (e.g. lists and dictionaries) would make your task easiest?

Demonstrators will guide you along the way to solving the problems. 

### Google it... 

![Google](./images/googleg_standard_color_128dp.png)

Programming syntax can be complicated - even if you know the basic language, you also need to know the options that go with each command. Even very proficient programmers make frequent use of Google searches or the Python documentation. You should get into the habit of looking up code syntax you want to know more about, or Googling how to achieve a particular task with code.

This is a skill - try to use accurate programming language in your searches - you're more likely to get a good hit with "How do I generate a random integer in Python" than "How do I pick a number between 0 and 100". 
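A well-phrased search such as the first one above will usually lead you to Python's built-in `random` module. As a minimal sketch of the kind of answer you might find (the variable name `number` is just an illustration):

```python
import random

# randint includes both endpoints, so this can return 0 or 100.
number = random.randint(0, 100)
print(number)
```

The value printed will differ each time you run the cell.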

<div class="alert alert-danger">
<strong>Advanced exercises</strong><br>
Some students may already be experienced programmers - more advanced exercises are provided for those who are looking for more challenging problems, but these are above the level of ability expected at this stage of the class.
</div>

# Task 1: Palindromes

![Palindrome](./images/palin-fe3.jpg)

A language palindrome is a word which reads the same backward as forward. Complete the code below to check whether each word in the list below is a palindrome.

It is a good idea to build your code up in stages and test each, for example:
  * First write code to print out each word in the list
  * Once that works, modify your code to reverse the word
    * *Hint: remember the special case of slicing - string_name[::-1] returns the reverse of a string.*
  * Use a conditional statement to only print out the palindromes
    * *Hint: What is True if a word is a palindrome?*
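The staged approach above could look something like the sketch below. It uses a small made-up list called `sample_words` (not the workshop list) and collects the matches in a list called `palindromes_found`; both names are assumptions for illustration only, and there are other valid ways to do this.

```python
# Hypothetical practice list - not the list used in the task
sample_words = ['kayak', 'apple', 'refer']

palindromes_found = []
for word in sample_words:
    # Stage 2: reverse the word using the slicing special case
    reversed_word = word[::-1]
    # Stage 3: a palindrome is equal to its own reverse
    if word == reversed_word:
        print(word, 'is a palindrome')
        palindromes_found.append(word)
```

Try building your own version one stage at a time, running the cell after each change, rather than copying the finished loop.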

<div class="alert alert-danger">
<strong>Advanced exercises </strong><br>
    <br>
1. If you are already familiar with programming, you could write your code as a function - writing functions is covered in notebook 13.<br>
<br>
2. Can you write your function without using slicing or reverse() - i.e. write your own code to reverse the string?<br>
</div>
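For those attempting the advanced exercises, here is one possible sketch of a function that reverses a string without slicing. The function names `reverse_string` and `is_palindrome` are made up for this example, and this is only one of several reasonable approaches.

```python
def reverse_string(text):
    """Return text reversed, built up one character at a time."""
    reversed_text = ''
    for character in text:
        # Placing each new character in front reverses the order.
        reversed_text = character + reversed_text
    return reversed_text


def is_palindrome(word):
    """Return True if word reads the same backward as forward."""
    return word == reverse_string(word)


print(is_palindrome('level'))
print(is_palindrome('golf'))
```

Wrapping the test in a function means the same check can be reused on any word without rewriting the loop.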


In [1]:
words = ['golf', 'level', 'spoon', 'reverser', 'noon', 'racecar', 'cell', 'rotator', 'tape', 'stats', 'bridge', 'lagoon', 'tenet']


# Task 2: DNA palindromes

![Palindrome](./images/DNA_palindrome.jpg)

The meaning of palindrome in the context of molecular biology is slightly different from the definition used for words and sentences. Since a double helix is formed by two paired strands of nucleotides that run in opposite directions in the 5'-to-3' sense, and the nucleotides always pair in the same way (Adenine (A) with Thymine (T) and Cytosine (C) with Guanine (G)), a (single-stranded) nucleotide sequence is said to be a palindrome if it is equal to its reverse complement. 

To state that another way, if you substitute each nucleotide in a palindrome with its partner and reverse the sequence (this is known as reverse complementing), it will be the same as the original sequence. 

For example, the DNA sequence ATGATCAT is palindromic because its nucleotide-by-nucleotide complement is TACTAGTA, and reversing the order of the nucleotides in the complement gives the original sequence.
```
                             ATGATCAT
                             ||||||||
                             TACTAGTA
```
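
The worked example above can be checked in a few lines of Python. This is a sketch only: it assumes a dictionary of base pairs called `pairs`, and the variable names `example`, `complement` and `reverse_complement` are chosen just for illustration.

```python
pairs = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
example = 'ATGATCAT'

# Build the complement one base at a time
complement = ''
for base in example:
    complement += pairs[base]

# Reverse the complement to get the reverse complement
reverse_complement = complement[::-1]

print(complement)                     # TACTAGTA
print(reverse_complement)             # ATGATCAT
print(reverse_complement == example)  # True
```
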
Most cleavage sites of restriction enzymes (enzymes that cut at a specific DNA sequence and are widely used for DNA modification and cloning) are palindromic.

* Part 1 - You are given a DNA sequence in a variable called sequence. Write a program to test whether the DNA sequence is a palindrome, and return 'True' or 'False'. You might find the dictionary 'pairs' useful in helping you complement the DNA sequence.

* Part 2 - Can you alter your code so that it processes a list of DNA sequences called 'sequences'?

* Part 3 - Can you modify your code so that it stores only the palindromes in a list?

<div class="alert alert-danger">
<strong>More advanced exercise</strong><br>
It is useful to be able to search long DNA sequences to find palindromic sequences. Write a program to print all palindromic sequences between 4 and 8 base pairs long in the sequence: <br>
ATGAGATAGAAGAGCGCATCGATCGATGGACCGATCGATCGATTCGCGAGCTCGCGATCGATCGGCC<br>
GATATCGCGCGATATGCGCTGCGTACGCACGATCGATCGATGGTAATCGTACGACTTCGAAGTCGCGC
</div>
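One way to approach the more advanced exercise is a "sliding window": look at every fragment of each allowed length and test whether it equals its own reverse complement. The sketch below assumes two made-up helper functions, `reverse_complement` and `find_palindromes`, and runs on a short hypothetical test sequence rather than the one in the exercise.

```python
pairs = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    complemented = ''
    for base in dna[::-1]:
        complemented += pairs[base]
    return complemented


def find_palindromes(dna, min_length=4, max_length=8):
    """Return (start, fragment) pairs for every palindromic window."""
    found = []
    for length in range(min_length, max_length + 1):
        for start in range(len(dna) - length + 1):
            fragment = dna[start:start + length]
            if fragment == reverse_complement(fragment):
                found.append((start, fragment))
    return found


# Hypothetical short test sequence - not the one from the exercise
test_sequence = 'ACGAATTCGT'
for start, fragment in find_palindromes(test_sequence):
    print(start, fragment)
```

Windows with an odd length can never match, because the middle base would have to pair with itself, so only fragments of 4, 6 and 8 bases are reported.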


### Tip

First try to see whether the first few sequences in the list are DNA palindromes without the computer. As you're doing that, break what you're doing into individual steps to understand how you know whether the sequence is a DNA palindrome. 

You may do this in different ways from others in your group - there are usually different ways you can solve a coding problem. 

e.g. 
* Step 1: Reverse the sequence
* Step 2: Replace each base with its complementary base
* Step 3: See whether the reversed, complemented sequence is the same as the original - if it is, the sequence is a DNA palindrome.

Now think about how you can perform each of these steps in code: 

* Step 1: Use the same method you used in task 1

* Step 2: You have been given a dictionary with bases as keys and their complementary bases as values, so for example pairs['A'] will return T:

```
pairs = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}

print(pairs['A']) #prints T
```
* Step 3: You now need to test whether a condition is True - use conditional statements.
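
Putting the three steps together might look something like the sketch below. It uses a short made-up list called `practice_sequences` (not the workshop's 'sequences' list) and stores matches in a list called `dna_palindromes`; both names are assumptions for illustration, and your group may well arrive at a different solution.

```python
pairs = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}

# Hypothetical practice list - not the workshop's 'sequences' list
practice_sequences = ['GAATTC', 'ATGGAC', 'GGATCC']

dna_palindromes = []
for seq in practice_sequences:
    # Step 1: reverse the sequence
    reversed_seq = seq[::-1]

    # Step 2: replace each base with its complementary base
    reverse_complement = ''
    for base in reversed_seq:
        reverse_complement += pairs[base]

    # Step 3: keep the sequence only if it matches the original
    if reverse_complement == seq:
        dna_palindromes.append(seq)

print(dna_palindromes)
```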

In [18]:
#Part 1 variables

sequence = 'ATGCGCAT'

pairs = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}

In [2]:
#Part 2 variables
sequences = ['ATGATCGTAT', 'ATGGAC', 'GAATTC', 'CTTATAAG', 'ATGCTA',
             'GCATCGACTTCGAAGTCGATGC', 'CGTGCTTCGAG', 'GAGCTC',
             'GCGTACGC', 'ATGGTA']

pairs = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}


In [None]: