In [90]:
from __future__ import division, print_function, unicode_literals

# Welcome back!

If you're new to Python and haven't already gone through the first part, Numbers, do that first!


# Strings

Python variables are more flexible than just containers for numbers. They can also hold other kinds of data. One type of data that programs often must deal with is text. Programs might use text to store messages to the user, for example. They might download a website as text, and then count the number of times that a particular word appears. Even types of data that we might not traditionally think of as text, like DNA sequences, are often represented as strings.

We define a string in Python by surrounding a piece of text with quotes. This text can contain any valid characters. This is a string:

```python
string_1 = "Hello world"
```

As is this:

```python
text_message = "Time for 🍺 ?"
```

Or this:

```python
restriction_enzyme_motif = "GATATC"
```

You can denote a string with either single ' or double " quotation marks. In python, these are completely interchangeable. For consistency, it's best to pick one and stick with it!
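
A quick way to convince yourself of this is to compare two strings that differ only in their quotation marks. The variable names here are just for illustration.

```python
# The same text written with double and with single quotation marks
motif_double = "GATATC"
motif_single = 'GATATC'

# The quotation marks are not part of the string, so the two are equal
print(motif_double == motif_single)  # True
```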

### Nitpicky point: How do you put a quote in a string?

Since strings begin and end with quotes, you might be wondering how to put a quote inside a string. The answer is *escape characters*. When you put a backslash \ in a string, Python treats it and the character right after it as a special sequence. For example, \n inserts a new line in a string, and \t inserts a "tab" character. In strings, we use \" and \' to insert double and single quotation marks, respectively. So

```python
quoted_message = "\"Learn Python,\" they said. \"It will be easy,\" they said."
```

Represents the string:

"Learn Python," they said. "It will be easy," they said.

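Printing a string is a simple way to check what its escape characters turn into. The `table_text` string below is an illustrative example of the \t and \n escapes mentioned above, not something from earlier in the chapter.

```python
quoted_message = "\"Learn Python,\" they said. \"It will be easy,\" they said."
print(quoted_message)
# "Learn Python," they said. "It will be easy," they said.

# \t becomes a tab and \n starts a new line when printed
table_text = "motif\tlength\nGATATC\t6"
print(table_text)

# Each escape sequence is stored as a single character
print(len("\n"), len("\t"), len("\""))  # 1 1 1
```
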
### Double nitpicky point: How do you put a backslash in a string?

Because we use the backslash as a special character, we also need a way to represent a literal backslash. Conveniently, it's just a double backslash \\. So a path name on Windows would be represented:

```python
path = "c:\\Users\\me"
```
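
Printing `path` shows that each double backslash is stored as a single backslash. The doubling is not optional here: in Python 3, writing "c:\Users\me" with single backslashes fails with a SyntaxError, because \U begins a Unicode escape sequence.

```python
path = "c:\\Users\\me"
print(path)       # c:\Users\me
print(len(path))  # 11: each "\\" counts as one character
```
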
### Long strings
What if you want to represent a message that is broken over multiple lines? One option is to use triple quotes around the string. Python will interpret everything until the next triple quote as all being part of one string. For example:

```python
long_message = """Dear student,
This is a long string.

Sincerely,
Mike"""
```

Try modifying the quote below to read:

"Learn Python," they said. "It's super easy," I said.

In [2]:
"\"Learn Python,\" they said. \"It will be easy,\" they said."

'"Learn Python," they said. "It will be easy," they said.'

# Basic String operations

Python provides a number of handy ways to manipulate strings, right out of the box. We'll go through a few of the basic ones here. Something you'll learn quickly in programming is that it's best to be lazy and use the ways that other people have solved problems whenever you can. Chances are, if you want to do something with a string, Python has a built-in way to do it. Let's start by defining a variable with a string to play around with:

In [16]:
opening_sentence = "It was the best of times, it was the worst of times."

### Length of a string
One useful thing to know about a string is how long it is, or how many characters it contains. Python has a built-in function for this, len():

In [17]:
len(opening_sentence)

52
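
len() works on any string, not just opening_sentence. A few more cases, reusing the restriction enzyme motif from the start of the chapter:

```python
restriction_enzyme_motif = "GATATC"
print(len(restriction_enzyme_motif))  # 6

# Spaces and punctuation count as characters too
print(len("a b."))  # 4

# The empty string contains no characters at all
print(len(""))  # 0
```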

### Adding strings together

Strings can be combined using the + symbol, in the same way that you add numbers together. Note the space at the beginning of the string that we are adding on: when strings are added together, Python does not automatically insert any separating character.

In [18]:
opening_sentence + " It was also the age of pizza 🍕"

'It was the best of times, it was the worst of times. It was also the age of pizza 🍕'

One thing to note: it's very important to understand the distinction in Python between a number (like 2) and its string representation ("2"). You might expect "10" + "2" to equal "12", but because the + operator works differently for strings and numbers, you actually get:

In [40]:
"10" + "2"

'102'

You also cannot add a string to a number, or vice versa. In Python this always produces a TypeError, because it is not obvious what your program is supposed to do.

In [47]:
"10" + 2

TypeError: Can't convert 'int' object to str implicitly
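
To make + do what you mean, convert one side so both operands have the same type. This sketch uses str() and int(), which are introduced in the next section; the try/except block simply catches the error so the rest of the code keeps running.

```python
# Turn the number into a string: + now means concatenation
joined = "10" + str(2)
print(joined)  # 102

# Turn the string into a number: + now means addition
total = int("10") + 2
print(total)  # 12

# Mixing the two types raises a TypeError (the exact wording varies by Python version)
try:
    "10" + 2
except TypeError as error:
    print("TypeError:", error)
```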

### Converting strings to numbers, and back to strings

To convert a number (or any other sort of data) to a string, Python provides the str() function. To convert a string into a number, Python provides the float() function (for decimals) and the int() function (for integers). Let's see a few examples:

In [41]:
str(1.004)

'1.004'

In [43]:
float("3.14159")

3.14159

In [44]:
int("200004")

200004

In [45]:
float("3.04e9")

3040000000.0

### Bad conversions

Python is *very* picky about the conversions it will allow. If your string cannot be cleanly interpreted as a number, or you ask int() to convert a string containing a decimal point, Python will raise an error, which stops the program unless you handle it. While this might seem annoying, in reality it helps you avoid bugs that would be difficult or impossible to track down.

In [87]:
float("Not a number")

ValueError: could not convert string to float: 'Not a number'

In [88]:
int("100.2")

ValueError: invalid literal for int() with base 10: '100.2'
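
If your program has to cope with text that might not be a number, you can catch the ValueError yourself. The helper `to_float_or_none` below is a hypothetical name used only for this sketch, and returning None for bad input is one design choice among several (you could also return a default value, or re-raise with a clearer message).

```python
def to_float_or_none(text):
    """Hypothetical helper: return text as a float, or None if it isn't a number."""
    try:
        return float(text)
    except ValueError:
        # float() could not interpret the text, so signal "no value"
        return None

print(to_float_or_none("3.14159"))       # 3.14159
print(to_float_or_none("Not a number"))  # None

# int() rejects the string "100.2", but converting to float first works;
# int() then drops the fractional part
print(int(float("100.2")))  # 100
```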

### If you're interested: type systems
Not all languages are as strict as Python. For example, in JavaScript (the language that powers websites) you can happily add different data types together and it will try to interpret the result as best as it can. For example, in JavaScript, the result of

```javascript
"10" + 2;
```

is "102". These languages are called "weakly typed."

Other languages are even stricter than Python. For example, in C, C++, Java, and many other languages, you must explicitly define the type of a variable when you first define it. In Python, if you write

```python
s = "A string"
```

Python is able to figure out that s should have a string (str) type. In C++, however, you typically have to declare the type of the variable when you create it. If you don't, your program will not compile.

```C++
std::string s = "A string";  // requires #include <string>
```

Languages like Python are called "dynamically typed", because types belong to values and are checked while the program runs, so the same variable can hold a string at one point and a number later. Languages like C++ are called "statically typed", because each variable's type is fixed when the variable is created and is checked before the program runs.
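
The built-in type() function lets you watch this happen. In this sketch the same name `s` holds a string and then an integer, which Python allows without complaint:

```python
s = "A string"
print(type(s))  # <class 'str'>

# In a dynamically typed language the same name can later hold a different type
s = 42
print(type(s))  # <class 'int'>
```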

# String indexing

We often want to select characters in a string by their position. In Python, we do this with *string indexing*. Let's see an example first:

In [34]:
first_char = opening_sentence[0]
second_char = opening_sentence[1]
third_char = opening_sentence[2]
last_char = opening_sentence[-1]
second_to_last_char = opening_sentence[-2]

In [20]:
print(first_char)

I


In [33]:
print(last_char)

.


By putting a number in square brackets [] after the string, you tell Python to get the character at that position. Above, we took the first three characters of the string and assigned them to the variables first_char, second_char and third_char, and assigned the last two characters to last_char and second_to_last_char.

Note that negative numbers can also be used as indexes! These select the nth character from the end of the string (so -1 is the last character, -2 is the second to last character, etc.) 
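
One way to think about negative indexes: for a string of length n, index -k refers to the same character as index n - k. A quick check:

```python
opening_sentence = "It was the best of times, it was the worst of times."
n = len(opening_sentence)  # 52

# A negative index -k points at the same character as n - k
print(opening_sentence[-1] == opening_sentence[n - 1])  # True
print(opening_sentence[-2] == opening_sentence[n - 2])  # True
print(opening_sentence[-2])  # s
```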

### Python strings are zero indexed!

One thing that often trips people up: notice that we selected the first char with
```python
opening_sentence[0]
```

Instead of 
```python
opening_sentence[1]
```
In Python (like most, but not all, computer languages), we count from zero instead of one. To be 100% honest, this is done mostly for historical reasons. Other languages, like MATLAB and R, start counting from one. Don't get tripped up!
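
A practical consequence of counting from zero: the last valid index is one less than the length of the string. Asking for the index equal to the length raises an IndexError, as this sketch shows:

```python
opening_sentence = "It was the best of times, it was the worst of times."
last_index = len(opening_sentence) - 1  # 51, not 52
print(opening_sentence[last_index])  # .

try:
    opening_sentence[len(opening_sentence)]
except IndexError as error:
    print("IndexError:", error)  # IndexError: string index out of range
```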

# String slices

Being able to get individual characters is all well and good. But say we wanted to get the first three characters of a string. One way of doing this would be to use string concatenation and add the characters together:

```python
opening_sentence[0] + opening_sentence[1] + opening_sentence[2]
```

But this would get extremely tedious extremely quickly. Thankfully, Python has the ability to create *slices* from strings, which let you select all the characters from a start point to an end point. The syntax looks like

```python
string[start:stop]
```

This syntax is extremely powerful in Python, and is used in many places other than strings, so it's worth going into a little more detail.

Let's see some examples

In [21]:
opening_sentence[0:10]

'It was the'

In [22]:
opening_sentence[10:30]

' best of times, it w'

One point that trips people up: the *start* point of the slice is included, but the *stop* point is not. While this might seem like a confusing convention, it's useful for splitting strings into multiple parts that don't overlap. If we were to re-concatenate the [0:10] slice to the [10:30] slice, we'd end up with the [0:30] slice with no duplicate characters. You can see this below

In [23]:
opening_sentence[0:10] + opening_sentence[10:30]

'It was the best of times, it w'

In [24]:
opening_sentence[0:30]

'It was the best of times, it w'
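
Two consequences of the "start included, stop excluded" rule can be checked directly: a slice [start:stop] holds stop - start characters (when both positions lie within the string), and cutting the string at any position k and gluing the pieces back together gives back the original.

```python
opening_sentence = "It was the best of times, it was the worst of times."
n = len(opening_sentence)

# [10:30] holds 30 - 10 = 20 characters
print(len(opening_sentence[10:30]))  # 20

# Splitting at any position k neither loses nor duplicates characters
for k in [0, 10, 30, n]:
    assert opening_sentence[0:k] + opening_sentence[k:n] == opening_sentence
print("All splits rejoin to the original sentence")
```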

### String slice shortcuts

Slices are pretty smart. If you leave out the start, Python assumes the slice begins at 0; if you leave out the stop, it assumes the slice runs to the end of the string. Thus, to get every character up to (but not including) index 30, we can just write:

In [26]:
opening_sentence[:30]

'It was the best of times, it w'

And to get every character from index 30 to the end, we can write:

In [27]:
opening_sentence[30:]

'as the worst of times.'

The whole string can be selected using a single colon [:]. This might not seem helpful, but it is occasionally useful as we'll see later 

In [29]:
opening_sentence[:]

'It was the best of times, it was the worst of times.'
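
The shortcuts are equivalent to spelling out 0 and the string's length explicitly. Unlike indexing, slicing is also forgiving about positions past the end of the string: they are simply clipped, so no IndexError is raised.

```python
opening_sentence = "It was the best of times, it was the worst of times."
n = len(opening_sentence)

print(opening_sentence[:30] == opening_sentence[0:30])  # True
print(opening_sentence[30:] == opening_sentence[30:n])  # True
print(opening_sentence[:] == opening_sentence)          # True

# A stop value beyond the end is clipped to the end of the string
print(opening_sentence[46:1000])  # times.
```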

### Slicing with a step

There's an optional third number available when you create a slice. This number specifies a *step*. If the step is 2, Python will take every other character from the string. If the step is 3, it will take every third character, and so on...

This takes every other character between zero and 10

In [30]:
opening_sentence[0:10:2]

'I a h'

This takes every third character in the entire string (note the colons without numbers to imply the start and end of the string)

In [31]:
opening_sentence[::3]

'Iw ee  m,tat r  m.'
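
Steps work on any string, including sequence data. Using the motif from the start of the chapter (the variable names are just for illustration), a step of 2 starting at 0 picks the even positions, and starting at 1 picks the odd ones:

```python
restriction_enzyme_motif = "GATATC"
even_positions = restriction_enzyme_motif[0::2]  # indices 0, 2, 4
odd_positions = restriction_enzyme_motif[1::2]   # indices 1, 3, 5
print(even_positions)  # GTT
print(odd_positions)   # AAC
```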

### Negative slicing

Python allows for negative numbers in slices, which act similarly to negative indexing, taking the characters from the end of a string

Let's take the slice from the 10th-to-last character up to (but not including) the last one:

In [36]:
opening_sentence[-10:-1]

' of times'
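
Because the stop index is excluded, [-10:-1] stops one short and leaves off the final period, giving only 9 characters. To get all of the last 10 characters, leave the stop empty:

```python
opening_sentence = "It was the best of times, it was the worst of times."
last_ten = opening_sentence[-10:]
print(repr(last_ten))                 # ' of times.'
print(len(last_ten))                  # 10
print(len(opening_sentence[-10:-1]))  # 9
```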

You can also use a negative number as the step of a slice, which takes every nth character starting from the end. One handy use of this is to quickly reverse the order of characters in a string, by setting the step to -1:

In [37]:
opening_sentence[::-1]

'.semit fo tsrow eht saw ti ,semit fo tseb eht saw tI'
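
Reversing works on any string. Applied to the motif from earlier, for example (this is a plain reversal of the characters, not the reverse complement used in molecular biology):

```python
restriction_enzyme_motif = "GATATC"
reversed_motif = restriction_enzyme_motif[::-1]
print(reversed_motif)  # CTATAG

# Reversing twice gives back the original string
print(reversed_motif[::-1] == restriction_enzyme_motif)  # True
```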

# A few exercises

## Exercise 1:

Write a function (see the Numbers chapter to review functions if necessary) that summarizes a string by taking the first 10 characters and adding an ellipsis "..." to the end. A skeleton of the function is provided in the cell below. When you finish, run the next cell to test your function!

In [75]:
def summary(string):
    return ""

In [77]:
def ends_with_ellipsis(summary_function):
    summary = summary_function("A string to summarize")
    if not summary.endswith("..."):
        print("Looks like you forgot to add an ellipsis \"...\" to the end of your summary!")
        return False
    return True
        
def right_number_of_characters(summary_function):
    summary = summary_function("A string to summarize")
    if not len(summary) == 13:
        print("Are you sure you picked the first 10 characters?")
        return False
    return True

def correct_output(summary_function):
    summary = summary_function("A string to summarize")
    if summary != "A string t...":
        print("Something else went wrong!")
        return False
    return True

try:
    for f in [ends_with_ellipsis, right_number_of_characters, correct_output]:
        if not f(summary):
            raise AssertionError
except NameError as e:
    print("Are you sure that you named the function correctly? It should be called summary")
except AssertionError:
    print("Try again!")
else:
    print("Great job!")

Looks like you forgot to add an ellipsis "..." to the end of your summary!
Try again!


## Exercise 2:

A palindrome is a sentence that reads the same forwards as backwards, like "A dog, a panic in a pagoda." Usually these take some work to make, but technically you can make a palindrome from any sentence by sticking it to itself backwards. For example, if you had the string "Hello", you could make a palindrome by appending "olleH" to the end, producing "HelloolleH". Make a function to do this using the skeleton below, and make sure it passes its tests!

In [86]:
def make_palindrome(string):
    return ""

In [89]:
def not_empty_string(test_fn):
    if not test_fn("Makes a palindrome"):
        print("While an empty string is technically a palindrome, that's not the spirit of the exercise!")
        return False
    return True

def is_palindrome_function(test_fn):
    if not test_fn("Makes a palindrome") == test_fn("Makes a palindrome")[::-1]:
        print("Try again: it doesn't look like your output is the same forwards as backwards")
        return False
    return True
    
try:
    for f in [not_empty_string, is_palindrome_function]:
        if not f(make_palindrome):
            raise AssertionError
except NameError as e:
    print("Are you sure that you named the function correctly? It should be called make_palindrome")
except AssertionError:
    print("Try again!")
else:
    print("Great job!")


While an empty string is technically a palindrome, that's not the spirit of the exercise!
Try again!
