# Extra exercises

## Leap years

Implement the function `is_leap_year` so that it returns `True` when the year is a leap year, `False` otherwise.  Check the Wikipedia article on leap years for the definition. Hint: the modulo operator in Python is `%`; it computes the remainder of a division. If you get it right, the script will print 'okay' for all test cases.

In [4]:
def is_leap_year(year):
    if year % 400 == 0:
        return True
    elif year % 100 == 0:
        return False
    elif year % 4 == 0:
        return True
    else:
        return False

In [5]:
years = [1971, 1972, 1980, 1990, 2000, 2004, 2100]
expected_results = [False, True, True, False, True, True, False]
for year, expected_result in zip(years, expected_results):
    if is_leap_year(year) == expected_result:
        print('okay for', year)
    else:
        print('not okay for', year)

okay for 1971
okay for 1972
okay for 1980
okay for 1990
okay for 2000
okay for 2004
okay for 2100
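
The same rule can also be written as a single boolean expression. The following is a minimal sketch, not part of the exercise: the name `is_leap_year_expr` is hypothetical, and `calendar.isleap` from the standard library is used only as a cross-check.

```python
import calendar


def is_leap_year_expr(year):
    # divisible by 4, except century years that are not divisible by 400
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# cross-check against the standard library for a range of years
for year in range(1900, 2101):
    assert is_leap_year_expr(year) == calendar.isleap(year)
```

Whether the `if`/`elif` chain or the single expression reads better is largely a matter of taste; both encode the same three divisibility tests.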


## Factorial function

Define a function that, given a non-negative integer, returns the factorial of that number.  The factorial is defined as:

$0! = 1$

$n! = n(n-1)!$

Define two versions, one that is recursive (i.e., the function calls itself), the other iterative.  If you get it right, the last cell in this section will print 'okay' for all values.

In [10]:
def fac_rec(n):
    if n == 0:
        return 1
    elif n > 0:
        return n*fac_rec(n - 1)
    else:
        print('### error: fac_rec must be positive')
        return None

In [18]:
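def fac_iter(n):
    # first attempt: no check for negative n (it would return n itself);
    # the next cell adds that check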
    if n == 0:
        return 1
    else:
        fac = n
        for i in range(1, n):
            fac *= i
        return fac

In [25]:
def fac_iter(n):
    if n < 0:
        print('### error:  fac_iter must be positive')
        return None
    else:
        fac = 1
        for i in range(2, n + 1):
            fac *= i
        return fac

In [26]:
fac_iter(-4)

### error:  fac_iter must be positive


In [27]:
import math
for i in range(10):
    if fac_rec(i) == math.factorial(i):
        print('recursive okay for {0}'.format(i))
    else:
        print('recursive not okay for {0}'.format(i))
for i in range(10):
    if fac_iter(i) == math.factorial(i):
        print('iterative okay for {0}'.format(i))
    else:
        print('iterative not okay for {0}'.format(i))

recursive okay for 0
recursive okay for 1
recursive okay for 2
recursive okay for 3
recursive okay for 4
recursive okay for 5
recursive okay for 6
recursive okay for 7
recursive okay for 8
recursive okay for 9
iterative okay for 0
iterative okay for 1
iterative okay for 2
iterative okay for 3
iterative okay for 4
iterative okay for 5
iterative okay for 6
iterative okay for 7
iterative okay for 8
iterative okay for 9


In [12]:
fac_rec(-1)

### error: fac_rec must be positive
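
The solutions above print a message and return `None` for negative input. An alternative, sketched here under the assumption that the caller should deal with the error, is to raise a `ValueError`, which is also what `math.factorial` does for negative integers. The name `fac_checked` is hypothetical.

```python
import math


def fac_checked(n):
    # hypothetical variant: raise instead of printing and returning None
    if n < 0:
        raise ValueError('fac_checked requires a non-negative integer')
    fac = 1
    for i in range(2, n + 1):
        fac *= i
    return fac


# the caller decides how to handle invalid input
try:
    fac_checked(-4)
except ValueError as error:
    print('### error:', error)

# same values as the standard library for small n
assert all(fac_checked(i) == math.factorial(i) for i in range(10))
```

With an exception, a mistake cannot silently propagate as `None` into later computations.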


## Count uppercase

Write a function that, given a string, returns the number of upper case characters in that string.

Hint: check the documentation on strings in Python for a method to check whether a string consists entirely of upper case letters.

If you get it right, the last cell in this section will print 'okay' for all test cases.


In [37]:
def count_upper(s):
    count = 0
    for letter in s:
        if letter.isupper():
            count += 1
    return count

In [38]:
str_examples = ['', 'Abc', 'ABc', '99485', 'A5b', 'ABCD']
expected_counts = [0, 1, 2, 0, 1, 4]
for str_example, expected_count in zip(str_examples, expected_counts):
    if count_upper(str_example) == expected_count:
        print("okay for '{0}'".format(str_example))
    else:
        print("not okay for '{0}'".format(str_example))

okay for ''
okay for 'Abc'
okay for 'ABc'
okay for '99485'
okay for 'A5b'
okay for 'ABCD'
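
The counting loop can also be condensed with `sum` and a generator expression. This is only a sketch of an equivalent formulation; the name `count_upper_sum` is hypothetical.

```python
def count_upper_sum(s):
    # isupper() returns True for an uppercase letter, and True counts
    # as 1 when summed
    return sum(letter.isupper() for letter in s)


str_examples = ['', 'Abc', 'ABc', '99485', 'A5b', 'ABCD']
print([count_upper_sum(s) for s in str_examples])
```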


## Palindromes

Define a function that, given a string, returns True if that string is a palindrome, False otherwise, e.g., 'radar' is a palindrome, 'boxer' is not.

In [39]:
def is_palindrome(word):
    rev = ''
    for i in range(len(word) - 1, -1, -1):
        rev += word[i]
    if rev == word:
        return True
    else:
        return False

In [41]:
def is_palindrome(word):
    if word == ''.join(reversed(word)):
        return True
    else:
        return False

In [51]:
def is_palindrome(word):
    if word == word[::-1]:
        return True
    else:
        return False

In [89]:
def is_palindrome(word):
    return word == word[::-1]

In [90]:
long_str = 'A'*10_000_000
len(long_str)

10000000

In [91]:
%timeit is_palindrome(long_str)

16.7 ms ± 544 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [101]:
def is_palindrome(word):
    for i in range(len(word)//2):
        if word[i] != word[-i - 1]:
            return False
    return True

In [98]:
def is_palindrome(word):
    return not any(word[i] != word[-i - 1] for i in range(len(word)//2))

In [102]:
%timeit is_palindrome(long_str)

1 s ± 16.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [64]:
words = ['radar', 'boxer', '', 'a', 'QQ', 'acracadacarca',
         'abccba', 'abcdba']
expected_results = [True, False, True, True, True, True,
                    True, False]
for word, expected_result in zip(words, expected_results):
    if is_palindrome(word) == expected_result:
        print('okay for', word)
    else:
        print('not okay for', word)

okay for radar
okay for boxer
okay for 
okay for a
okay for QQ
okay for acracadacarca
okay for abccba
okay for abcdba
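
`%timeit` is an IPython magic, so it only works in a notebook or an IPython shell. In a plain Python script, roughly the same comparison can be made with the standard library module `timeit`. The sketch below uses the hypothetical names `is_palindrome_slice` and `is_palindrome_loop` for two of the versions above, and a shorter string so that it runs quickly; absolute timings will differ from machine to machine.

```python
import timeit


def is_palindrome_slice(word):
    return word == word[::-1]


def is_palindrome_loop(word):
    for i in range(len(word)//2):
        if word[i] != word[-i - 1]:
            return False
    return True


test_str = 'A'*100_000
for func in (is_palindrome_slice, is_palindrome_loop):
    # total time in seconds for 10 calls
    duration = timeit.timeit(lambda: func(test_str), number=10)
    print('{0}: {1:.4f} s for 10 calls'.format(func.__name__, duration))
```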


## Number of illegals

Define a function that, given a string, counts the number of characters that do not represent nucleotides in DNA.

If you get it right, the last cell will print 'okay' for all test cases.

In [112]:
def count_illegal_chars(dna):
    count = 0
    for nucl in dna.upper():
        if nucl not in 'ACGT':
            count += 1
    return count

In [114]:
dnas = ['ACGTTAGC', 'ACewla', 'acccgv', '']
expected_illegals_counts = [0, 3, 1, 0]
for dna, expected_illegals_count in zip(dnas, expected_illegals_counts):
    if count_illegal_chars(dna) == expected_illegals_count:
        print('okay for', dna)
    else:
        print('not okay for', dna)

okay for ACGTTAGC
okay for ACewla
okay for acccgv
okay for 
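
The test `nucl not in 'ACGT'` works here because `nucl` is always a single character; for longer strings, `in` would test for substrings instead. A possible variant, assuming a set of valid nucleotides is preferred to make the membership test explicit, is sketched below. The names `NUCLEOTIDES` and `count_illegal_chars_set` are hypothetical.

```python
NUCLEOTIDES = set('ACGT')


def count_illegal_chars_set(dna):
    # each character that is not a nucleotide contributes 1 to the sum
    return sum(nucl not in NUCLEOTIDES for nucl in dna.upper())


print(count_illegal_chars_set('ACewla'))
```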


## Make legal

Define a function that, given a string that represents DNA, creates a new string that only contains the characters 'A', 'C', 'G', 'T'.  If the original string contains lower case letters 'a', 'c', 'g', 't', replace those with upper case.  All other characters are left out.

If you get it right, the script will print 'okay' for all test cases.

In [109]:
def make_legal(dna):
    s = ''
    for nucl in dna.upper():
        if nucl in 'ACGT':
            s += nucl
    return s

In [110]:
dnas = ['ACGTTAGC', 'ACewla', 'acccgv', '', 'fkjrk']
expected_legals = ['ACGTTAGC', 'ACA', 'ACCCG', '', '']
for dna, expected_legal in zip(dnas, expected_legals):
    if make_legal(dna) == expected_legal:
        print("okay for '{0}'".format(dna))
    else:
        print("not okay for '{0}'".format(dna))

okay for 'ACGTTAGC'
okay for 'ACewla'
okay for 'acccgv'
okay for ''
okay for 'fkjrk'
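
Instead of growing the result with `+=`, the valid characters can be collected and joined in one step. This is a sketch of an equivalent formulation; the name `make_legal_join` is hypothetical.

```python
def make_legal_join(dna):
    # keep only the valid nucleotides and join them into a new string
    return ''.join(nucl for nucl in dna.upper() if nucl in 'ACGT')


print(make_legal_join('ACewla'))
```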


## Is complement?

Define a function that, given two DNA sequences, returns True if the first sequence is the complement of the second, False otherwise.  You can assume that the DNA strings only contain A, C, G, and T.

Hint: define a function that returns the complement of a given nucleotide.

In [117]:
def compute_complement(dna):
    compl = ''
    for nucl in dna.upper():
        if nucl == 'A':
            compl += 'T'
        elif nucl == 'C':
            compl += 'G'
        elif nucl == 'G':
            compl += 'C'
        elif nucl == 'T':
            compl += 'A'
        else:
            print(f'### error: {nucl} is not a nucleotide')
            return None
    return compl

In [118]:
compute_complement('ACCGT')

'TGGCA'

In [122]:
def is_complement(dna1, dna2):
    return compute_complement(dna1) == dna2

In [130]:
from Bio.Seq import Seq

In [135]:
def is_complement(dna1, dna2):
    # no alphabet argument: Bio.Alphabet was removed in Biopython 1.78
    seq1 = Seq(dna1)
    seq2 = Seq(dna2)
    return seq1 == seq2.complement()

In [136]:
dna_tuples = [('ACCGT', 'TGGCA'), ('GA', 'CT'), ('ACG', 'CGC'),
              ('', '')]
expected_results = [True, True, False, True]
for dna_tuple, expected_result in zip(dna_tuples, expected_results):
    if is_complement(dna_tuple[0], dna_tuple[1]) == expected_result:
        print('okay for', dna_tuple)
    else:
        print('not okay for', dna_tuple)

okay for ('ACCGT', 'TGGCA')
okay for ('GA', 'CT')
okay for ('ACG', 'CGC')
okay for ('', '')
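
Without Biopython, the chain of `if`/`elif` tests can be replaced by a translation table built with `str.maketrans`. The following is a sketch under the exercise's assumption that the strings contain only A, C, G, and T; note that `translate` leaves any other character unchanged rather than reporting an error. The names `COMPLEMENT_TABLE`, `compute_complement_table` and `is_complement_table` are hypothetical, and converting both sequences to upper case is a choice made for this sketch.

```python
# maps each nucleotide to its complement
COMPLEMENT_TABLE = str.maketrans('ACGT', 'TGCA')


def compute_complement_table(dna):
    return dna.upper().translate(COMPLEMENT_TABLE)


def is_complement_table(dna1, dna2):
    return compute_complement_table(dna1) == dna2.upper()


print(compute_complement_table('ACCGT'))
print(is_complement_table('ACCGT', 'TGGCA'))
```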


## Find longest

Define a function that, given a string that represents DNA, finds the longest sequence of identical consecutive nucleotides, e.g.,

`AACCCGACGGT -> CCC`

`AAACCAAAA -> AAAA`

If there are multiple subsequences of the same length, return the first, e.g.,

`ACCCGTTTA -> CCC`

In [139]:
def longest_identical(dna):
    longest = ''
    curr_nucl = None
    curr_longest = ''
    for nucl in dna:
        if nucl == curr_nucl:
            curr_longest += nucl
        else:
            if len(curr_longest) > len(longest):
                longest = curr_longest
            curr_longest = nucl
            curr_nucl = nucl
    if len(curr_longest) > len(longest):
        return curr_longest
    else:
        return longest

In [140]:
dnas = ['AACCCGACCGGT', 'AAACGAAAAT', 'CAAGTTG', 'CAGT', '',
        'CAAGAAA', 'CCAAAGGTTTG']
expected_subseqs = ['CCC', 'AAAA', 'AA', 'C', '', 'AAA', 'AAA']
for dna, expected_subseq in zip(dnas, expected_subseqs):
    if longest_identical(dna) == expected_subseq:
        print("okay for '{0}'".format(dna))
    else:
        print("not okay for '{0}'".format(dna))

okay for 'AACCCGACCGGT'
okay for 'AAACGAAAAT'
okay for 'CAAGTTG'
okay for 'CAGT'
okay for ''
okay for 'CAAGAAA'
okay for 'CCAAAGGTTTG'
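
The bookkeeping in the loop above can be delegated to `itertools.groupby`, which splits a sequence into runs of consecutive equal items. This is a sketch of an alternative, not a replacement for the exercise; the name `longest_identical_groupby` is hypothetical.

```python
from itertools import groupby


def longest_identical_groupby(dna):
    # groupby yields one group per run of consecutive identical characters
    runs = (''.join(group) for _, group in groupby(dna))
    # max returns the first maximal item, so ties resolve to the earliest
    # run; default covers the empty string
    return max(runs, key=len, default='')


print(longest_identical_groupby('CCAAAGGTTTG'))
```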


## Scramble

Define a function that, given a string representing a text consisting of sentences in a natural language, randomly swaps two consecutive characters within words.  Add an optional argument for the probability that a word is modified, defaulting to 20 %. You can assume that the text contains only words and whitespace, no other characters.

E.g.,  'This is an example' -> 'hTis is an exapmle'

Hint: look at the documentation of the functions `random` and `randint` in the standard library module `random`.

In [1]:
import random

In [7]:
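def scramble_word(word, prob):
    # note: raises ValueError for one-letter words, since randint(0, -1)
    # has an empty range; the next cell guards against this
    if random.random() < prob: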
        pos = random.randint(0, len(word) - 2)
        new_word = ''
        for i, letter in enumerate(word):
            if i == pos:
                new_word += word[i + 1]
            elif i - 1 == pos:
                new_word += word[i - 1]
            else:
                new_word += letter
        return new_word
    else:
        return word

In [33]:
def scramble_word(word, prob):
    if random.random() < prob and len(word) > 1:
        pos = random.randint(0, len(word) - 2)
        return word[:pos] + word[pos + 1] + word[pos] + word[pos + 2:]
    else:
        return word

In [34]:
scramble_word('abcde', 1.0)

'bacde'

In [35]:
scramble_word('a', 1.0)

'a'

In [36]:
def scramble(sentence, prob=0.2):
    new_words = []
    for word in sentence.split():
        new_words.append(scramble_word(word, prob))
    return ' '.join(new_words)

In [40]:
def count_changes(orig_sentence, new_sentence):
    count = 0
    for orig_word, new_word in zip(orig_sentence.split(),
                                   new_sentence.split()):
        if orig_word != new_word:
            count += 1
    return count

sentence = ('This is an example I think with a very long '
            'sentence that seems to keep going for ever since '
            'we need quite some words')
nr_words = len([1 for word in sentence.split() if len(word) > 1])
for prob in [0.0, 0.2, 0.5, 1.0]:
    new_sentence = scramble(sentence, prob)
    msg = 'probability = {0:.1%}, {1:.1%} words changed:'
    nr_words_changed =  count_changes(sentence, new_sentence)
    print(msg.format(prob, nr_words_changed/nr_words))
    print(new_sentence)

probability = 0.0%, 0.0% words changed:
This is an example I think with a very long sentence that seems to keep going for ever since we need quite some words
probability = 20.0%, 27.3% words changed:
Thsi is an example I htink with a very long esntence that seems to keep going for ever sicne ew need quite some worsd
probability = 50.0%, 59.1% words changed:
This is an example I think wiht a vrey logn esntence thta seesm to keep going for eevr isnce ew nede uqite smoe worsd
probability = 100.0%, 90.9% words changed:
hTis si na eaxmple I htink iwth a evry olng sentenec thta semes ot keep oging ofr eevr isnce ew need qutie soem wodrs
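
Because the result is random, it changes from run to run. For testing, it can help to make the output reproducible. The sketch below assumes a hypothetical extra argument `rng`, a `random.Random` instance created with a fixed seed; apart from that, it follows the same logic as `scramble_word` and `scramble` above. The names `scramble_word_rng` and `scramble_rng` are hypothetical as well.

```python
import random


def scramble_word_rng(word, prob, rng):
    if rng.random() < prob and len(word) > 1:
        pos = rng.randint(0, len(word) - 2)
        return word[:pos] + word[pos + 1] + word[pos] + word[pos + 2:]
    else:
        return word


def scramble_rng(sentence, prob=0.2, rng=None):
    # fall back to an unseeded generator when none is given
    if rng is None:
        rng = random.Random()
    return ' '.join(scramble_word_rng(word, prob, rng)
                    for word in sentence.split())


text = 'This is an example'
# two generators with the same seed produce the same scrambled text
first = scramble_rng(text, 0.5, random.Random(1234))
second = scramble_rng(text, 0.5, random.Random(1234))
print(first == second)
```

Passing the generator explicitly avoids changing the state of the module-level functions in `random`, which other code may rely on.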
