In [2]:
from __future__ import division, print_function, unicode_literals
import unittest

import util

# Welcome back! 

### What is a method?

In this lesson, we're going to discuss *methods* on strings. Methods are like functions, but instead of standing alone in the program, they are "bound" to a particular piece of data. Ordinarily, you call a function by writing its name and passing it a value, as we saw with the len() function last chapter:

In [4]:
len("Hello world")

11

When we defined our own functions, we called them in much the same way.

In [5]:
def greet_with_name(name):
    return "Hello "+name

greet_with_name("Josh")

'Hello Josh'

When we use methods, we call them on a string using the dot . operator.

All of the methods that we will see are built into Python and are part of the core language. You can also make your own methods, but not on types that already exist (like strings). This is an advanced topic that we'll get to eventually!

Let's see some examples of string methods in Python. First we're gonna define a string:

In [6]:
s_1 = "Never gonna give you up, never gonna let you down"

The method "upper" is defined on strings to take the string and return a version that is all uppercase. Since it is "bound" to the string as a method, we don't need to tell it which string to operate on. Instead, we type the name of the string, then a dot, then upper():

In [7]:
s_1.upper()

'NEVER GONNA GIVE YOU UP, NEVER GONNA LET YOU DOWN'

Note that upper does not exist in Python as a standalone function.
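To see this for yourself, here is a small sketch that tries both styles of call. The variable names (`s_demo`, `standalone_worked`, `shouted`) are made up for illustration, and the `try`/`except` is only there so that the cell keeps running after the error; we haven't covered it yet, so don't worry about the details.

```python
s_demo = "hello"

# Calling upper() as a standalone function fails,
# because no function with that name exists
try:
    upper(s_demo)
    standalone_worked = True
except NameError:
    standalone_worked = False

# Calling upper() as a method on the string works
shouted = s_demo.upper()

print(standalone_worked)
print(shouted)
```

The first call raises a NameError, while the method call returns the uppercase string.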

### Why is this? (If you're not interested, feel free to skip)
*Python is used for a lot of different applications. Because of this, the designers of the language try to keep as few things in the core language as possible. If the "upper" function were separate from the string, users could not create their own variables called "upper" without conflicting with the built-in function. Even worse, it might not be clear what sort of data type "upper" works on, forcing users to dig through long documentation any time they wanted to understand the role of a particular function. By organizing the functions that work on strings as string methods, Python keeps its built-in string operations together in one place.*

*There are certain functions like len() that are not defined as methods in Python. That's because (as we'll soon see) there are many types in Python that have meaningful lengths, so this function ends up being useful in a number of contexts.*
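As a small preview of that last point, the sketch below calls len() on three different built-in types. Lists and dictionaries haven't been introduced yet, so treat them as examples of "other things that have a length"; the variable names are illustrative only.

```python
# len() works on several built-in types, not only strings
length_of_string = len("GAATTC")               # number of characters
length_of_list = len([1, 2, 3])                # number of items
length_of_dict = len({"A": "T", "C": "G"})     # number of key/value pairs

print(length_of_string, length_of_list, length_of_dict)
```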

As you might guess, the .lower() method does something similar:

In [8]:
s_1.lower()

'never gonna give you up, never gonna let you down'

Note that these methods are specific to strings. This makes sense, because there is no such thing as the lowercase of a number, for example:

In [9]:
int(4000).lower()

AttributeError: 'int' object has no attribute 'lower'

# A brief tour of some string methods

### Changing case
As we saw before, there are methods for changing the case of a string:

In [10]:
s_1.upper()

'NEVER GONNA GIVE YOU UP, NEVER GONNA LET YOU DOWN'

In [11]:
s_1.lower()

'never gonna give you up, never gonna let you down'

In [12]:
# Capitalizes the beginning of every word
s_1.title()

'Never Gonna Give You Up, Never Gonna Let You Down'

In [13]:
# Inverts the case of every letter
s_1.swapcase()

'nEVER GONNA GIVE YOU UP, NEVER GONNA LET YOU DOWN'

As an aside, case is more complicated in many other languages than it is in English. Python is used all over the world, so it can generally convert international characters to their proper uppercase variants. Since the correct conversion can depend on the language being written, the results will not always be perfect.

In [14]:
"Grüß Gott".upper()

'GRÜSS GOTT'
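One consequence worth seeing, sketched here under the assumption that you are running Python 3 (as the output above suggests): converting to uppercase and back does not always return the string you started with. The variable names are illustrative.

```python
greeting = "Grüß Gott"

# The 'ß' becomes the two letters 'SS', so the string gets one character longer
shouted = greeting.upper()

# Lowercasing the result gives 'ss', not the original 'ß'
round_trip = shouted.lower()

print(len(greeting), len(shouted))
print(round_trip)
print(round_trip == greeting.lower())
```

The final comparison prints False, because the round trip ends in "ss" where the original had "ß".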

### Searching

Say we wanted to know the position of "let" in the string. The .find(substring) method returns the index where the first occurrence of the substring begins. If the substring is not found, it returns -1. Like the other string methods in this lesson, find is case-sensitive.

In [15]:
s_1.find("let")

37

In [16]:
s_1.find("Never")

0

In [17]:
s_1.find("never")

25

In [18]:
s_1.find("purple")

-1
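Because "not found" is reported as -1 and a match at the very start is reported as 0, it is safest to compare the result against -1 explicitly. The sketch below shows this, along with the `in` operator, which is often the simpler choice when you only need a yes/no answer. The variable names are illustrative, and the True/False values it produces are something we'll look at properly later.

```python
lyric = "Never gonna give you up"

# find() returns 0 here (a match at the start) and -1 for a miss,
# so test against -1 rather than relying on the number itself
starts_with_never = lyric.find("Never") != -1
has_purple = lyric.find("purple") != -1

# The "in" operator answers the same yes/no question directly
has_gonna = "gonna" in lyric

print(starts_with_never, has_purple, has_gonna)
```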

If the search string occurs more than once, .find() returns only the position of the first occurrence:

In [19]:
s_1.find("gonna")

6

Since searching returns a number representing the index where a string is found, you can easily combine it with slicing to (say) return all of the string after a search query:

In [20]:
s_1[s_1.find("gonna"):]

'gonna give you up, never gonna let you down'
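Two details of this pattern are easy to trip over, so here is a short sketch of both. It assumes the same lyric as above, stored under the illustrative name `s_demo`. First, when the query is missing, find() returns -1 and the slice quietly gives back the last character. Second, adding the length of the query to the index skips past the query itself.

```python
s_demo = "Never gonna give you up, never gonna let you down"

# "purple" is not in the string, so find() returns -1,
# and s_demo[-1:] is a slice holding only the last character
surprising = s_demo[s_demo.find("purple"):]

# Adding len(query) moves the start of the slice past the query
query = "never "
after_query = s_demo[s_demo.find(query) + len(query):]

print(surprising)
print(after_query)
```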

### Counting occurrences of a substring

The .count(substring) method tells you how many times a substring occurs in a string. Note that, like find, count is case-sensitive.

In [21]:
s_1.count("never")

1

In [22]:
s_1.count("gonna")

2

In [23]:
s_1.count("you")

2
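One subtlety: count only counts occurrences that do not overlap. This matters for repetitive strings such as DNA sequences. The sequences below are made up for illustration.

```python
# "AA" fits at positions 0, 1 and 2 of "AAAA", but only the
# non-overlapping matches (positions 0 and 2) are counted
overlapping_a = "AAAA".count("AA")

# "ATAT" fits at positions 0 and 2 of "ATATAT", but those two
# matches overlap, so only one is counted
overlapping_at = "ATATAT".count("ATAT")

print(overlapping_a, overlapping_at)
```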

### Replacing strings

The .replace(original, replacement) method replaces all of the instances of the first argument in the string with the second argument.

In [24]:
s_1.replace('never', 'always')

'Never gonna give you up, always gonna let you down'

In [25]:
s_1.replace('gonna', 'going to')

'Never going to give you up, never going to let you down'
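Two things about replace are worth a quick sketch. It returns a new string and leaves the original untouched, and it accepts an optional third argument that limits how many replacements are made. The variable names are illustrative.

```python
lyric = "never gonna give you up, never gonna let you down"

# Replace every occurrence
all_replaced = lyric.replace("never", "always")

# Replace only the first occurrence by passing a count of 1
first_only = lyric.replace("never", "always", 1)

# The original string has not changed
print(lyric)
print(all_replaced)
print(first_only)
```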

# Chaining methods

Since many string methods return a string, you can call a second method on them by just adding it to the end of the first method call. In programming, this is sometimes called "method chaining," and is a powerful and concise way to call many methods on one object, rather than creating lots of intermediate variables.

In [26]:
# Instead of 
s_1_lower_case = s_1.lower()
s_1_lower_case.replace('never', 'always')

'always gonna give you up, always gonna let you down'

In [27]:
# Try
s_1.lower().replace('never', 'always')

'always gonna give you up, always gonna let you down'
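Chained methods run from left to right, and each one works on the result of the one before it, so the order can change the answer. A minimal sketch, using illustrative variable names:

```python
lyric = "Never gonna give you up, never gonna let you down"

# replace() runs first and finds the lowercase "never", then upper() runs
replace_then_upper = lyric.replace("never", "always").upper()

# upper() runs first, so there is no lowercase "never" left to replace
upper_then_replace = lyric.upper().replace("never", "always")

print(replace_then_upper)
print(upper_then_replace)
```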

# Looking forward

There are many more methods on strings than the ones that we've explored here. Some of them I've held off on talking about, because they produce Python data types that we haven't discussed yet. If you want to try them out, type the name of a string (or a string inside quotes), type a dot . , and then hit tab. The IPython shell should automatically pop up a box showing every method defined on strings. Can you figure out what some of the other ones do?

# Documentation

If you ever want to know what a method does, or what arguments it accepts, you can type 

?s_1.method 

into a cell and a box will pop up at the bottom showing the arguments, what it returns, and a short description of what the method does. Give it a try below!

In [28]:
?s_1.casefold
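The ? syntax is specific to IPython. If you are working somewhere it isn't available, the built-in help() and dir() functions give similar information. The sketch below uses them on the str type; the list-building line uses syntax we haven't covered yet, and `public_names` is just an illustrative name.

```python
# help() prints the documentation for a method
help(str.upper)

# dir() lists every name defined on a type; here we keep only
# the ones that don't start with an underscore
public_names = [name for name in dir(str) if not name.startswith("_")]
print(public_names)
```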

# Exercises

## EcoRI fragments

EcoRI is a restriction enzyme with the recognition site GAATTC. Given a linear DNA sequence (i.e. not a circular plasmid), find how many fragments EcoRI would cut it into. (HINT: This is not the same as the number of times that EcoRI cuts the sequence.) (HINT 2: Remember that not all sequences will have the same case.)

In [31]:
def number_ecori_cuts(seq):
    return seq.upper().count('GAATTC') + 1

In [32]:
class EcoriTest(unittest.TestCase):
    def test_upper_case_string(self):
        n_cuts = number_ecori_cuts("ACCGAATTCGGTGTACCGTGAATTCAGGACAG")
        self.assertEqual(n_cuts, 3, "How many fragments do you get for a linear sequence that's cut twice?")
    def test_mixed_case_string(self):
        n_cuts = number_ecori_cuts("ACCGAATTCGGtGTACcGTGAATTCaGGACAG")
        n_cuts_lower = number_ecori_cuts("ACCGAATTCGGtGTACcGTGAATTCaGGACAG".lower())
        self.assertEqual(n_cuts, n_cuts_lower, "Does your program handle upper and lower case strings the same?")
        
util.run_tests(EcoriTest)

test_mixed_case_string (__main__.EcoriTest) ... ok
test_upper_case_string (__main__.EcoriTest) ... ok

----------------------------------------------------------------------
Ran 2 tests in 0.002s

OK


<unittest.runner.TextTestResult run=2 errors=0 failures=0>

## Percent GC 

In DNA sequences, the GC-content of the sequence can strongly affect its properties (melting temperature, hybrid stability, etc.). Write a function that takes a DNA sequence as a string, and returns the percent GC content (where 0 is no GC, and 100 is all GC). (HINT: Remember that you can get the length of a full sequence using the len() function.)

In [33]:
def percent_gc(seq):
    seq = seq.upper()
    GC_count = seq.count('C') + seq.count('G')
    return 100 * (GC_count/len(seq))
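The solution above divides by len(seq), so an empty string would raise a ZeroDivisionError. The exercise doesn't say what should happen in that case, so the variant below is only one possible choice: the function name `percent_gc_safe` and the decision to return 0.0 for an empty sequence are assumptions made for illustration.

```python
def percent_gc_safe(seq):
    # Assumption: an empty sequence is reported as 0.0 percent GC
    if len(seq) == 0:
        return 0.0
    seq = seq.upper()
    gc_count = seq.count('C') + seq.count('G')
    # Multiplying by 100.0 first keeps the division a float division
    return 100.0 * gc_count / len(seq)

print(percent_gc_safe(""))
print(percent_gc_safe("GGCC"))
print(percent_gc_safe("atgc"))
```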

In [34]:
class PercentGCTest(unittest.TestCase):
    def test_mixed_case_string(self):
        test_seq = "ACCGaaTTCGGTGTACCgTGTTCAGGACAG"
        self.assertEqual(percent_gc(test_seq), 
                         percent_gc(test_seq.lower()), 
                         "Does your program handle upper and lower case strings the same?")
    def test_not_percent(self):
        test_seq = "ACCGaaTTCGGTGTACCgTGTTCAGGACAG"
        self.assertNotEqual(percent_gc(test_seq), 0.5333333333333333, 
                            'Did you remember to express the answer as a percent?')
    def test_answer(self):
        test_seq = "ACCGaaTTCGGTGTACCgTGTTCAGGACAG"
        self.assertEqual(int(percent_gc(test_seq)), 53)

util.run_tests(PercentGCTest)


test_answer (__main__.PercentGCTest) ... ok
test_mixed_case_string (__main__.PercentGCTest) ... ok
test_not_percent (__main__.PercentGCTest) ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.005s

OK


<unittest.runner.TextTestResult run=3 errors=0 failures=0>