# Extra exercises

## Leap years

Implement the function `is_leap_year` so that it returns `True` when the year is a leap year and `False` otherwise. In the Gregorian calendar, a year is a leap year if it is divisible by 4, except for century years, which are leap years only when they are also divisible by 400 (see the Wikipedia article on leap years for details). Hint: Python's modulo operator `%` computes the remainder of a division. If you get it right, the last cell in this section will print 'okay' for all test cases.

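```python
def is_leap_year(year):
    if year % 400 == 0:
        # Divisible by 400: always a leap year (e.g., 2000)
        return True
    elif year % 100 == 0:
        # Other century years are not leap years (e.g., 2100)
        return False
    else:
        # All remaining years are leap years when divisible by 4
        return year % 4 == 0
```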
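The same rule can also be written as a single Boolean expression. The sketch below uses the illustrative name `is_leap_year_compact` (not part of the exercise) and cross-checks it against `calendar.isleap` from the standard library, which implements the Gregorian rule.

```python
import calendar

def is_leap_year_compact(year):
    # Gregorian rule in one expression: divisible by 4, except century
    # years, which must also be divisible by 400
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# Cross-check against the standard library over a range of years
mismatches = [year for year in range(1, 3001)
              if is_leap_year_compact(year) != calendar.isleap(year)]
print(mismatches)  # → []
```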

```python
years = [1971, 1972, 1980, 1990, 2000, 2004, 2100]
expected_results = [False, True, True, False, True, True, False]
for year, expected_result in zip(years, expected_results):
    if is_leap_year(year) == expected_result:
        print('okay for', year)
    else:
        print('not okay for', year)
```

```
okay for 1971
okay for 1972
okay for 1980
okay for 1990
okay for 2000
okay for 2004
okay for 2100
```


## Factorial function

Define a function that, given a non-negative integer, returns the factorial of that number. The factorial is defined as:
$$
\begin{aligned}
0! &= 1 \\
n! &= n \cdot (n-1)! \quad \text{for } n \geq 1
\end{aligned}
$$

Define two versions: one recursive (i.e., the function calls itself) and one iterative. For the iterative version, remember that you can compute the factorial as
$$
n! = \prod_{i=1}^{n} i
$$


If you get it right, the last cell in this section will print 'okay' for all values.
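The product formula maps directly onto `math.prod`, available since Python 3.8. A minimal sketch, using the hypothetical name `fac_prod`: the product over an empty range is 1, so 0! = 1 needs no special case.

```python
import math

def fac_prod(n):
    # n! as the product 1 * 2 * ... * n; math.prod returns 1 for an empty range
    return math.prod(range(1, n + 1))

print([fac_prod(n) for n in range(6)])  # → [1, 1, 2, 6, 24, 120]
```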

```python
def fac_rec(n):
    # Base case 0! = 1; otherwise n! = n * (n - 1)!
    return 1 if n < 1 else n * fac_rec(n - 1)
```

```python
def fac_iter(n):
    # Multiply 2 * 3 * ... * n; for n < 2 the loop body never runs and 1 is returned
    fac = 1
    for i in range(2, n + 1):
        fac *= i
    return fac
```
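One practical difference between the two versions: every recursive call adds a frame to the call stack, and CPython caps the recursion depth (see `sys.getrecursionlimit`, 1000 by default). The sketch below uses hypothetical copies `fac_rec_demo` and `fac_iter_demo` of the two functions to show that a large enough argument makes the recursive version raise `RecursionError`, while the iterative one is unaffected.

```python
import sys

def fac_rec_demo(n):
    # Same recursion as fac_rec: one extra stack frame per call
    return 1 if n < 1 else n * fac_rec_demo(n - 1)

def fac_iter_demo(n):
    # Same loop as fac_iter: the stack depth stays constant
    fac = 1
    for i in range(2, n + 1):
        fac *= i
    return fac

depth = sys.getrecursionlimit() + 100
try:
    fac_rec_demo(depth)
    overflowed = False
except RecursionError:
    overflowed = True
print(overflowed)  # → True
print(fac_iter_demo(depth) > 0)  # → True
```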

```python
import math
for i in range(10):
    if fac_rec(i) == math.factorial(i):
        print('recursive okay for {0}'.format(i))
    else:
        print('recursive not okay for {0}'.format(i))
for i in range(10):
    if fac_iter(i) == math.factorial(i):
        print('iterative okay for {0}'.format(i))
    else:
        print('iterative not okay for {0}'.format(i))
```

```
recursive okay for 0
recursive okay for 1
recursive okay for 2
recursive okay for 3
recursive okay for 4
recursive okay for 5
recursive okay for 6
recursive okay for 7
recursive okay for 8
recursive okay for 9
iterative okay for 0
iterative okay for 1
iterative okay for 2
iterative okay for 3
iterative okay for 4
iterative okay for 5
iterative okay for 6
iterative okay for 7
iterative okay for 8
iterative okay for 9
```


## Count uppercase

Write a function that, given a string, returns the number of upper case characters in that string.

Hint: check the documentation on strings in Python for a method that checks whether all cased characters in a string are upper case.

If you get it right, the last cell in this section will print 'okay' for all test cases.
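The counting loop can also be written as a generator expression passed to `sum`. A minimal sketch, with the illustrative name `count_upper_compact`; it also shows that `str.isupper` ignores characters that have no case, which is why digits are never counted.

```python
def count_upper_compact(s):
    # For a single character, str.isupper() is True only for upper-case letters,
    # so digits, spaces and punctuation are never counted
    return sum(1 for char in s if char.isupper())

print(count_upper_compact('A5b'))  # → 1
# On longer strings, isupper() ignores characters that have no case
print('5'.isupper(), 'A5'.isupper())  # → False True
```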


```python
def count_upper(s):
    count = 0
    for letter in s:
        if letter.isupper():
            count += 1
    return count
```

```python
str_examples = ['', 'Abc', 'ABc', '99485', 'A5b', 'ABCD']
expected_counts = [0, 1, 2, 0, 1, 4]
for str_example, expected_count in zip(str_examples, expected_counts):
    if count_upper(str_example) == expected_count:
        print("okay for '{0}'".format(str_example))
    else:
        print("not okay for '{0}'".format(str_example))
```

```
okay for ''
okay for 'Abc'
okay for 'ABc'
okay for '99485'
okay for 'A5b'
okay for 'ABCD'
```


## Palindromes

Define a function that, given a string, returns `True` if that string is a palindrome and `False` otherwise; e.g., 'radar' is a palindrome, 'boxer' is not.
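A compact alternative relies on slicing: `word[::-1]` is the reversed string, and a palindrome equals its own reverse. A minimal sketch, with the illustrative name `is_palindrome_slice`:

```python
def is_palindrome_slice(word):
    # word[::-1] is the reversed string; a palindrome equals its own reverse
    return word == word[::-1]

print([is_palindrome_slice(w) for w in ['radar', 'boxer', '', 'QQ']])
# → [True, False, True, True]
```

This makes a single reversed copy of the string, whereas a recursive solution that slices off the first and last character copies a slightly shorter string at every level.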

```python
# Iterative version: compare characters pairwise from both ends toward the middle
def is_palindrome(word):
    for i in range(len(word)//2):
        if word[i] != word[len(word) - i - 1]:
            return False
    return True
```

```python
# Recursive version: this redefinition replaces the iterative one above
def is_palindrome(word):
    if len(word) < 2:
        return True
    else:
        return word[0] == word[-1] and is_palindrome(word[1:-1])
```

```python
words = ['radar', 'boxer', '', 'a', 'QQ', 'acracadacarca',
         'abccba', 'abcdba']
expected_results = [True, False, True, True, True, True,
                    True, False]
for word, expected_result in zip(words, expected_results):
    if is_palindrome(word) == expected_result:
        print('okay for', word)
    else:
        print('not okay for', word)
```

```
okay for radar
okay for boxer
okay for 
okay for a
okay for QQ
okay for acracadacarca
okay for abccba
okay for abcdba
```


## Number of illegals

Define a function that, given a string, counts the number of characters that do not represent nucleotides in DNA, i.e., characters other than A, C, G, and T (in either upper or lower case).

If you get it right, the last cell will print 'okay' for all test cases.
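An alternative sketch stores the legal symbols in a set and counts with a generator expression; `LEGAL_NUCLEOTIDES` and `count_illegal_chars_set` are illustrative names, not part of the exercise. Membership in a set compares whole characters, whereas `in` on a string tests for substrings; for the single characters produced by iterating over a string, both give the same answer.

```python
# The legal nucleotide symbols, accepted in either case
LEGAL_NUCLEOTIDES = set('ACGTacgt')

def count_illegal_chars_set(dna):
    # Count every character that is not a legal nucleotide symbol
    return sum(1 for char in dna if char not in LEGAL_NUCLEOTIDES)

print(count_illegal_chars_set('ACewla'))  # → 3
```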

```python
def count_illegal_chars(dna):
    count = 0
    for nucl in dna:
        if nucl.upper() not in 'ACGT':
            count += 1
    return count
```

```python
dnas = ['ACGTTAGC', 'ACewla', 'acccgv']
expected_illegals_counts = [0, 3, 1]
for dna, expected_illegals_count in zip(dnas, expected_illegals_counts):
    if count_illegal_chars(dna) == expected_illegals_count:
        print('okay for', dna)
    else:
        print('not okay for', dna)
```

```
okay for ACGTTAGC
okay for ACewla
okay for acccgv
```


## Make legal

Define a function that, given a string that represents DNA, creates a new string that contains only the characters 'A', 'C', 'G', and 'T'. Lower-case letters 'a', 'c', 'g', 't' in the original string are replaced by their upper-case counterparts; all other characters are left out.

If you get it right, the last cell in this section will print 'okay' for all test cases.
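Building the result with repeated `+=` works, but a common Python idiom is to generate the pieces and call `str.join` once. A minimal sketch, with the hypothetical name `make_legal_join`:

```python
def make_legal_join(dna):
    # Keep A, C, G, T in either case, converted to upper case,
    # and join the kept characters into a single new string
    return ''.join(char.upper() for char in dna if char in 'ACGTacgt')

print(make_legal_join('ACewla'))  # → ACA
```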

```python
def make_legal(dna):
    new_dna = ''
    for nucl in dna:
        if nucl.upper() in 'ACGT':
            new_dna += nucl.upper()
    return new_dna
```

```python
dnas = ['ACGTTAGC', 'ACewla', 'acccgv', '', 'fkjrk']
expected_legals = ['ACGTTAGC', 'ACA', 'ACCCG', '', '']
for dna, expected_legal in zip(dnas, expected_legals):
    if make_legal(dna) == expected_legal:
        print("okay for '{0}'".format(dna))
    else:
        print("not okay for '{0}'".format(dna))
```

```
okay for 'ACGTTAGC'
okay for 'ACewla'
okay for 'acccgv'
okay for ''
okay for 'fkjrk'
```


## Is complement?

Define a function that, given two DNA sequences, returns True if the first sequence is the complement of the second (A pairs with T and C with G, position by position) and False otherwise. You can assume that the DNA strings only contain A, C, G, and T.

Hint: define a function that returns the complement of a given nucleotide.
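Another option is to compute the complement of a whole sequence with `str.maketrans` and `str.translate`, then compare it to the second sequence. A minimal sketch; `COMPLEMENT_TABLE`, `complement_sequence` and `is_complement_translate` are illustrative names. Sequences of different lengths are handled for free, because their complement can never equal the other string.

```python
# Translation table: A <-> T and C <-> G
COMPLEMENT_TABLE = str.maketrans('ACGT', 'TGCA')

def complement_sequence(dna):
    # Replace every nucleotide by its complement in a single pass
    return dna.translate(COMPLEMENT_TABLE)

def is_complement_translate(dna1, dna2):
    # Strings of different lengths can never compare equal here
    return complement_sequence(dna1) == dna2

print(complement_sequence('ACCGT'))  # → TGGCA
print(is_complement_translate('ACG', 'CGC'))  # → False
```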

```python
def complement(nucl):
    # Map a nucleotide to its pairing partner
    return {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}[nucl]

def is_complement(dna1, dna2):
    # Sequences of different lengths cannot be complementary
    # (checking first also avoids an IndexError when dna2 is shorter)
    if len(dna1) != len(dna2):
        return False
    for nucl1, nucl2 in zip(dna1, dna2):
        if complement(nucl1) != nucl2:
            return False
    return True
```

```python
dna_tuples = [('ACCGT', 'TGGCA'), ('GA', 'CT'), ('ACG', 'CGC')]
expected_results = [True, True, False]
for dna_tuple, expected_result in zip(dna_tuples, expected_results):
    if is_complement(dna_tuple[0], dna_tuple[1]) == expected_result:
        print('okay for', dna_tuple)
    else:
        print('not okay for', dna_tuple)
```

```
okay for ('ACCGT', 'TGGCA')
okay for ('GA', 'CT')
okay for ('ACG', 'CGC')
```


## Find longest

Define a function that, given a string that represents DNA, finds the longest run of consecutive identical nucleotides, e.g.,

`AACCCGACGGT -> CCC`

`AAACCAAAA -> AAAA`

If there are multiple runs of the same length, return the first, e.g.,

`ACCCGTTTA -> CCC`
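The standard library's `itertools.groupby` already splits a string into runs of identical consecutive characters, so a sketch of an alternative solution (hypothetical name `longest_identical_groupby`) only needs to pick the longest run. `max` returns the first maximal element it encounters, which gives the required tie-breaking, and `default=''` covers the empty string.

```python
from itertools import groupby

def longest_identical_groupby(dna):
    # groupby yields one group per run of identical consecutive characters
    runs = [''.join(group) for _, group in groupby(dna)]
    # max keeps the first of several equally long runs; '' for empty input
    return max(runs, key=len, default='')

print(longest_identical_groupby('ACCCGTTTA'))  # → CCC
```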

```python
def longest_identical(dna):
    longest = ''
    if dna:
        curr_longest = dna[0]
        for nucl in dna[1:]:
            if nucl == curr_longest[-1]:
                curr_longest += nucl
            else:
                if len(curr_longest) > len(longest):
                    longest = curr_longest
                curr_longest = nucl
        if len(curr_longest) > len(longest):
            longest = curr_longest
    return longest
```

```python
dnas = ['AACCCGACCGGT', 'AAACGAAAAT', 'CAAGTTG', 'CAGT', '',
        'CAAGAAA', 'CCAAAGGTTTG']
expected_subseqs = ['CCC', 'AAAA', 'AA', 'C', '', 'AAA', 'AAA']
for dna, expected_subseq in zip(dnas, expected_subseqs):
    if longest_identical(dna) == expected_subseq:
        print("okay for '{0}'".format(dna))
    else:
        print("not okay for '{0}'".format(dna))
```

```
okay for 'AACCCGACCGGT'
okay for 'AAACGAAAAT'
okay for 'CAAGTTG'
okay for 'CAGT'
okay for ''
okay for 'CAAGAAA'
okay for 'CCAAAGGTTTG'
```


## Scramble

Define a function that, given a string representing a text made up of sentences in a natural language, randomly swaps two adjacent characters within words. Add an optional argument for the probability that a word has two of its characters swapped; it should default to 20 %. You can assume that the text contains only words and whitespace, no other characters.

E.g., 'This is an example' -> 'hTis is an exapmle'

Hint: look at the documentation of the functions `random` and `randint` in the standard library module `random`.
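Because the scrambling is random, results differ between runs. One way to make experiments reproducible is to draw from an explicit `random.Random` instance with a fixed seed instead of the module-level functions. A minimal sketch; `swap_adjacent` and `scramble_word_seeded` are illustrative names, and the seed 2024 is arbitrary.

```python
import random

def swap_adjacent(word, i):
    # Swap the characters at positions i - 1 and i (requires 1 <= i < len(word))
    return word[:i - 1] + word[i] + word[i - 1] + word[i + 1:]

def scramble_word_seeded(word, prob, rng):
    # Use the given random.Random instance, so a fixed seed reproduces the result
    if len(word) > 1 and rng.random() < prob:
        word = swap_adjacent(word, rng.randint(1, len(word) - 1))
    return word

print(swap_adjacent('example', 4))  # → exapmle
print(swap_adjacent('keep', 2))  # → keep
first = scramble_word_seeded('example', 0.5, random.Random(2024))
second = scramble_word_seeded('example', 0.5, random.Random(2024))
print(first == second)  # → True
```

Note that swapping two identical letters, as in 'keep', leaves the word unchanged, so the fraction of words that actually change can be somewhat lower than the requested probability.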

```python
import random
```

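```python
def scramble_word(word, prob):
    # Words of length 0 or 1 have no pair of characters to swap
    if len(word) > 1:
        if random.random() < prob:
            # Swap the characters at positions i - 1 and i
            i = random.randint(1, len(word) - 1)
            word = word[:i - 1] + word[i] + word[i - 1] + word[i + 1:]
    return word
```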

```python
def scramble(sentence, prob=0.2):
    words = sentence.split()
    scrambled_words = list()
    for word in words:
        scrambled_words.append(scramble_word(word, prob))
    return ' '.join(scrambled_words)
```

```python
def count_changes(orig_sentence, new_sentence):
    # Count the word positions at which the two sentences differ
    count = 0
    for orig_word, new_word in zip(orig_sentence.split(),
                                   new_sentence.split()):
        if orig_word != new_word:
            count += 1
    return count

sentence = ('This is an example I think with a very long '
            'sentence that seems to keep going for ever since '
            'we need quite some words')
# Only words of at least two characters can be scrambled
nr_words = len([1 for word in sentence.split() if len(word) > 1])
# Scrambling is random, so the output differs from run to run
for prob in [0.0, 0.2, 0.5, 1.0]:
    new_sentence = scramble(sentence, prob)
    msg = 'probability = {0:.1%}, {1:.1%} words changed:'
    nr_words_changed = count_changes(sentence, new_sentence)
    print(msg.format(prob, nr_words_changed/nr_words))
    print(new_sentence)
```

```
probability = 0.0%, 0.0% words changed:
This is an example I think with a very long sentence that seems to keep going for ever since we need quite some words
probability = 20.0%, 18.2% words changed:
This is an example I think with a very long sentence htat seems to kepe going for ever since we need qiute smoe words
probability = 50.0%, 50.0% words changed:
Tihs is na exmaple I thnik wiht a very long sentecne that seems ot ekep gonig for veer since we need quite osme words
probability = 100.0%, 100.0% words changed:
hTis si na eaxmple I htink wiht a evry logn setnence htat semes ot ekep oging fro veer snice ew ened quiet soem owrds
```
