# Strings
Python has tremendous power when dealing with strings.
### What are strings?
Strings are sequences of characters.
### So what are characters?
This brings up one of the primary divisions between Python 2 and 3. Python 3 handles characters, and thus strings, more robustly by making every string a sequence of Unicode characters; Unicode is an industrial-strength standard for representing text. In Python 2, ordinary strings are sequences of bytes. To declare a string, simply put characters between quotes.
### More on Unicode
Before Unicode, one of the first computing standards for character representation was ASCII, a system that represents 128 unique characters using 7 bits. A bit is the smallest unit of information for a computer and is binary (0/1). So 7 bits can encode 2^7 (128) different pieces of information, and for ASCII these correspond to the letters, digits and punctuation common to English language speakers, plus a set of control characters.

Since there are thousands of languages and tens of thousands of distinct characters, Unicode was created to cover every writing system. Unicode assigns each character a unique number called a **code point**. Code points range from 0 to 0x10FFFF, which leaves room for a little over 1.1 million characters. How a code point is stored as bytes depends on the **encoding**: UTF-32 always uses 4 **bytes** (32 bits) per character, while UTF-8, the most common encoding, uses between 1 and 4 bytes.
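
The relationship between characters, code points and bytes can be explored directly. The snippet below is a minimal sketch using the built-in `ord` and `chr` functions and the `encode` string method; the variable names are only illustrative, and the two non-ASCII characters are written as escapes so the example does not depend on how this file is saved.

```python
# Every character maps to an integer code point; ord() and chr() convert between the two
letter = 'A'
code_point = ord(letter)         # the same number ASCII assigns to 'A'
same_letter = chr(code_point)    # back to the character

# The number of bytes depends on the encoding, not on Unicode itself
ascii_bytes = 'A'.encode('utf-8')             # plain English letter
accent_bytes = '\u00e9'.encode('utf-8')       # e with an acute accent
euro_bytes = '\u20ac'.encode('utf-8')         # euro sign
fixed_width = '\u20ac'.encode('utf-32-le')    # UTF-32 is always 4 bytes per character

print(code_point, same_letter)
print(len(ascii_bytes), len(accent_bytes), len(euro_bytes), len(fixed_width))
```

The last line prints the byte counts, which differ between characters in UTF-8 but not in UTF-32.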

In [1]:
# Define a string
my_string = 'my own personal string'

In [2]:
# Define a string with quotes inside. Escape the quote with \
my_string_w_quotes = 'asdfsa\'asdf'
my_string_w_quotes

"asdfsa'asdf"

In [3]:
# both single and double quotes work with python
# a more elegant way to handle quotes inside quotes is to use both double/single quotes
my_string_w_quotes = "my own personal string with an inner quote. it's grand!"
my_string_w_quotes

"my own personal string with an inner quote. it's grand!"

In [4]:
# If you have a bizarre string with both double and single quotes you can go one level up and use triple quotes
my_string_w_2_quote_types = '''My friend said, "I'm only a mediocre pythonista". I got mad! '''

# The interpreter outputted an escaped quote mark
my_string_w_2_quote_types

'My friend said, "I\'m only a mediocre pythonista". I got mad! '

### Print as a function
One other notable change between Python 2 and 3 is that **`print`** is now a function and not a statement. In the last example the output has an escaped character. To view the string in a prettier format with the outer quotes and escaped characters omitted use the print function. 

A function is a reference to reusable stored code that performs a particular task. Function names are always followed by a set of parentheses (when called) and can contain **arguments** which are comma separated values that the function will use. Notebook 5 is dedicated to functions.
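
As a quick sketch of what calling a function with arguments looks like, the cell below uses a few of Python's built-in functions. The variable names are made up for illustration.

```python
# One argument
distance = abs(-3)

# Several comma-separated arguments
largest = max(3, 7, 5)
rounded = round(3.14159, 2)

# print accepts any number of arguments, plus the keyword argument sep
print(distance, largest, rounded, sep=' | ')
```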

In [5]:
# the print function with a single argument
print(my_string_w_2_quote_types)

My friend said, "I'm only a mediocre pythonista". I got mad! 


In [6]:
# Triple quotes can be used as block quotes
block_quote = """
This is
my 
block
quote
y'all
"""

In [7]:
# ugly with newline \n char
block_quote

"\nThis is\nmy \nblock\nquote\ny'all\n"

In [8]:
# pretty
print(block_quote)


This is
my 
block
quote
y'all



## Quick Interlude on Object Oriented Programming
In Python, it is common to see the phrase, "Everything is an object". In short, this means that nearly every Python construct (variables, functions, any value, classes, class instances and even modules) is an object. So, what is an object? For now we can simply define an object as something that has both attributes and methods. An attribute is simply a reference to another object stored under a specific name. A method is a function attached to an object that performs some action and returns a value. [See this post for more info.](https://jeffknupp.com/blog/2013/02/14/drastically-improve-your-python-understanding-pythons-execution-model/)

### Everyday examples of objects
If the above didn't make much sense, it might be easier to think of objects like we do in the real world. For instance, let's use a **Person** as an example of an object in Python. Each person object will have both attributes and methods. Examples of attributes (stored values) could be the name (John), occupation (Data Scientist) and address (123 fake street). Methods, on the other hand, are actions performed by the person. Examples of methods could be eat, walk, talk, type, etc. Think action verbs when you think of methods and nouns (or adjectives) for attributes.

### Dot Notation
Common to many object oriented programming languages is the dot notation. A dot is placed after the object name and is then followed by either the attribute or method name. All methods are invoked by placing a set of parentheses after the method name. Any arguments needed for the processing of the method go inside the parentheses. Below is an extremely simple representation of an object using the dot notation. We will assume that there is an object **person** and explore its attributes and methods.

```
# assume we have an object named person
# person has attributes and methods
# let's first explore the attributes

>>> person.name
'John'

>>> person.occupation
'Data Scientist'

>>> person.weight
180

# Now let's see how methods are called
# Let's have the person eat by calling the eat method and pass it the argument 'steak'

>>> person.eat('steak')
John ate a steak and now weighs 182.

# Not all methods have arguments
# Let's make the person sleep

>>> person.sleep()
John sleeps for 8 hours
```
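
For the curious, here is one way such a `person` object could be built. This is only a sketch: the `Person` class is hypothetical, classes are not covered in this notebook, and the 2-unit weight gain is an arbitrary assumption chosen to match the output above. You do not need to understand the class definition yet; focus on how the dot notation is used at the bottom.

```python
class Person:
    """A hypothetical Person, written only to illustrate dot notation."""

    def __init__(self, name, occupation, weight):
        # attributes: values stored on the object under a name
        self.name = name
        self.occupation = occupation
        self.weight = weight

    def eat(self, food):
        # a method with one argument; the weight gain of 2 is an assumption
        self.weight += 2
        return '{} ate a {} and now weighs {}.'.format(self.name, food, self.weight)

    def sleep(self):
        # a method with no arguments
        return '{} sleeps for 8 hours'.format(self.name)


person = Person('John', 'Data Scientist', 180)
print(person.name)          # attribute access: no parentheses
print(person.eat('steak'))  # method call with an argument
print(person.sleep())       # method call without arguments
```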

### The Takeaway
Since everything is an object, you will be using the dot notation all the time in Python to either get more information about that object (an attribute) or take an action (call a method).

### String Methods
Below you will see a few of the methods (there aren't many attributes for string objects) in action. Notice how each method follows the name of the string and a dot (.), and always ends with parentheses, with or without arguments.

In [9]:
# Methods are called with object.method(arguments) notation
# test many methods on string
test_string = 'this is a TesT string.'

In [10]:
# Properly capitalize first letter
test_string.capitalize()

'This is a test string.'

In [11]:
#make lowercase
test_string.lower()

'this is a test string.'

In [12]:
#count occurrences of a substring
test_string.count('is')

2

In [13]:
# are all characters alphanumeric?
test_string.isalnum()

False

In [14]:
# split a string on a given separator. The default is any run of whitespace. Returns a list
test_string.split()

['this', 'is', 'a', 'TesT', 'string.']
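
`split` also accepts an explicit separator, and the `join` method goes the other way by gluing a list of strings back together. A small sketch with made-up example strings:

```python
# Split on a comma instead of whitespace
csv_line = 'red,green,blue'
colors = csv_line.split(',')

# join is called on the separator and takes the list of strings to combine
rejoined = ' & '.join(colors)

# With no argument, leading/trailing whitespace is dropped and runs of spaces count once
words = '  lots   of   spaces  '.split()

print(colors)
print(rejoined)
print(words)
```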

### So how many string methods are there?
There are far too many methods to remember for all the python objects that you will encounter. To get all the methods use the **dir** function.

In [15]:
# use print to make the output shorter and wider and not one long list
print(dir(test_string))

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


### Double Underscore Methods
About half of the names in the list above begin and end with two underscores. These signify **special** methods that are normally invoked through some other syntax rather than the dot notation. The double underscore methods can be called with the dot notation, but that is not the standard way of using them. For instance, the presence of **`__add__`** tells the programmer that the plus sign (+) operator works on the object, because `+` calls **`__add__`** behind the scenes. Some of these special methods are covered below.


### The Python Data Model
Also, these **dunder** (short for double underscore) methods are fixed and standardized for all Python objects. [The Python Data Model](https://docs.python.org/3/reference/datamodel.html) standardizes the language and gives developers a common framework to quickly give objects a certain 'power'. There are [about 100](https://docs.python.org/3/reference/datamodel.html#special-method-names) of these special methods available to all objects (not all are implemented).

### Advanced: How can you view all the normal string methods?
Let's say we want to print out the methods that are unique only to strings. The dunder methods will need to be filtered out since they are part of the Python data model and can be implemented by all Python objects.

The below code uses something called a list comprehension that will be discussed in detail later. It might not make much sense now.

In [16]:
# print out only the normal methods. This will be explained later if it doesn't make sense now
print([method for method in dir(test_string) if method[0] != '_'])

['capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


### How do I find out what these methods do?
Use the help function to see the documentation. Let's find out what the `endswith` method does.

In [17]:
# get documentation for endswith. You can also just add a ? to the method to do the same thing
help(test_string.endswith)

Help on built-in function endswith:

endswith(...) method of builtins.str instance
    S.endswith(suffix[, start[, end]]) -> bool
    
    Return True if S ends with the specified suffix, False otherwise.
    With optional start, test S beginning at that position.
    With optional end, stop comparing S at that position.
    suffix can also be a tuple of strings to try.



In [1]:
# endswith has one required argument, the suffix, and returns True if the string ends with that suffix, False otherwise
# the comparison is case-sensitive, so 'StrinG' does not end with 'string'
test_string = 'yet another TEST StrinG'
test_string.endswith('string')

False
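
The result above is `False` because the comparison is case-sensitive. Here is a short sketch of two ways around that: lowercasing first, or passing a tuple of suffixes (which the help text above mentions). The variable names are illustrative.

```python
test_string = 'yet another TEST StrinG'

# Case-sensitive check, as in the cell above
exact = test_string.endswith('string')

# Lowercase the string first to ignore case
ignoring_case = test_string.lower().endswith('string')

# A tuple of suffixes matches if any one of them matches
any_suffix = test_string.endswith(('ing', 'inG'))

print(exact, ignoring_case, any_suffix)
```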

### Finding Attributes and Methods with tab completion
It's difficult to remember the names of all the attributes and methods that each object possesses. We can print all the attributes and methods to the screen using the dir function, but that's not very efficient.

A better way is to use tab completion. To quickly list and scroll through all the attributes and methods of an object, simply append a dot to an object name and **press tab** and the list will appear.

Place your cursor at the end of the line below and press tab to view a scrollable list of attributes and methods.

In [20]:
test_string.

### Problem 1
<span style="color:green">Find and use the string method that will replace each letter 'a' with 'A'. </span>

In [21]:
replace_string = 'replace each letter a in this string with A'
# your code here

### Chaining methods
Methods will always return another object. Occasionally they will return the object **`None`** which will not output anything to the screen. Many string methods return a string. When a string is returned, it is possible to immediately append a dot and call another string method in succession. 

Take a look at the example below of four successive method calls (chained methods) using the dot notation. The last method, **`count`** returns an integer (which itself has different methods and attributes but our chaining stops there).

In [5]:
test_string = '?!?!?!A HIDDEN TEST STRING??!!?!'
# Let's strip the leading and trailing punctuation, make lower case, replace all t's with a and count the instances of e
test_string.strip('?!').lower().replace('t', 'a').count('e')
## Wow that was 4 chained methods ??!!

2

### Chaining on multiple lines
In other languages (like JavaScript), chaining methods is very common and is usually written more clearly and cleanly with one method per line. You can do this in Python by wrapping your expression in parentheses. Normally the interpreter treats the end of a line as the end of a statement, so a statement can only continue onto the next line if the line ends with the backslash character \.

However, this rule doesn't apply inside parentheses. You may break lines and indent however you like inside them.

In [6]:
# using parentheses to put methods on different lines
(test_string
    .strip('?!')
    .lower()
    .replace('t', 'a')
    .count('e'))

2

In [7]:
#  This also works but is a bit uglier
#  Unlike other languages (which ignore whitespace), a new line in python signals a new statement.
test_string \
    .strip('?!') \
    .lower() \
    .replace('t', 'a') \
    .count('e')

2

In [8]:
# We can print out each step of the expression as it's evaluated
foo = test_string.strip('?!')
print(foo)

foo = foo.lower()
print(foo)

foo = foo.replace('t', 'a')
print(foo)

foo = foo.count('e')
print(foo)

A HIDDEN TEST STRING
a hidden test string
a hidden aesa saring
2


### Problem 2
<span style="color:green">Strip each letter 'a' from the right side, switch the case of each letter (from lower to upper and from upper to lower) and find the position (aka index) of the first letter 'o'.</span> Use a combination of the methods above. You can check the [string method documentation.](https://docs.python.org/3.5/library/stdtypes.html#string-methods)

In [26]:
test_string = 'aaaa TOO many aaaaaaaaa'
# your code here

### Getting the length of a string: Is it a function, attribute or method?
Obtaining the length of a string is one of the most straightforward pieces of information that you could get from a string. By looking at the above examples you would think that there might be an attribute like...  

`>>>test_string.len`  
But there is no such attribute and this would result in an error. You might then immediately think that maybe getting the length is a method and try...  

`>>>test_string.len()`  
But yet again, this is not a method and would result in an error.

### So how do you get the length of a string?
There is actually a built-in **function** (not method), **`len()`**, that returns the string length. A function is not directly attached to any object.

In [9]:
# Getting the length of a string using the len function
test_string = 'yet another test string'
len(test_string)

23

### More on the `len()` function
**`len()`** is a built-in function (Python has about 70 standard built-in functions) and will return the length of any object that has implemented the **`__len__`** special method. If you look above at the list of all the string methods that resulted from the **`dir()`** function you will see **`__len__`**. 

Again, all these methods that have double underscores before and after the method name are **special** and mean that there is another python construct that will be available to the programmer for that object. In this case, when you see that one of the available methods is **`__len__`**, you will know that the `len()` function can be used on that object.
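
To see this connection in action, here is a minimal sketch of a made-up class that implements `__len__`. The `Playlist` class and its `songs` attribute are hypothetical, and classes are covered later; the point is only that `len()` works on any object that defines `__len__`.

```python
class Playlist:
    """Hypothetical container used only to illustrate __len__."""

    def __init__(self, songs):
        self.songs = songs

    def __len__(self):
        # len(playlist) ends up calling this method
        return len(self.songs)


playlist = Playlist(['intro', 'verse', 'outro'])
n_songs = len(playlist)
print(n_songs)
```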

### Can you use the `__len__` method?
An interesting aside is that you can actually use the double underscore methods directly instead of the special python constructs that are built for them, though you would almost never need to do this in practice.

In [28]:
# Using the __len__ method instead of the len() function
test_string.__len__()

23

### Remember that `__add__` special method
Since we saw `__add__`, we know that the **+** operator will do something with strings. Using the plus sign is how you concatenate strings in Python.

In [29]:
'abcde' + 'fghijk' + 'lmnop'

'abcdefghijklmnop'

In [30]:
# you can also directly call the __add__ method
'abcde'.__add__('fghijk').__add__('lmnop')

'abcdefghijklmnop'

### What happens if you subtract strings?
A Python object must implement the **`__sub__`** special method for the subtraction sign to be meaningful for that object. **`__sub__`** was not in the string method list, so subtracting one string from another will yield an error.

In [31]:
'asdfa' - 'a'

TypeError: unsupported operand type(s) for -: 'str' and 'str'
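
Since subtraction is not defined for strings, removing characters has to be done another way. One option, sketched below with illustrative variable names, is the `replace` method with an empty string as the replacement.

```python
original = 'asdfa'

# Replace every 'a' with nothing
without_a = original.replace('a', '')

# An optional third argument limits how many occurrences are replaced
first_a_removed = original.replace('a', '', 1)

print(without_a)
print(first_a_removed)
```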

### Why is there a `__mul__` special method?
Looking back up the notebook, you can see the special method **`__mul__`**. This alerts the programmer that the multiplication operator (`*`) has been implemented. What does it do?

In [32]:
# The string is repeatedly concatenated to itself via multiplication
'some test words | ' * 5

'some test words | some test words | some test words | some test words | some test words | '

### String Interpolation
String interpolation means substituting variables inside of strings. There are a couple of ways to do this, including a newer, more [intuitive way](https://www.python.org/dev/peps/pep-0498/) (f-strings) introduced in Python 3.6. But we will go over the most popular one to date, the `format` method. There is much [more to string interpolation.](https://pyformat.info/)

In [33]:
# Put curly braces every place within a string where you want a substitution to occur.
# After closing the string quote, use the .format method
# with arguments equal to the order that you would like them substituted
name = 'Ted'
occupation = 'data scientist'
salary = 3

worker_info = 'Employee {} is a {} and earns {} dollars per year'.format(name, occupation, salary)
print(worker_info)

Employee Ted is a data scientist and earns 3 dollars per year
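
The `format` method can also refer to its arguments by position or by name, and Python 3.6+ offers f-strings, which read variables directly. The cell below is a small sketch of these variants; it assumes Python 3.6 or later for the last line.

```python
name = 'Ted'
occupation = 'data scientist'
salary = 3

# Numbers inside the braces pick arguments by position, so one can be reused
by_position = '{0} is a {1}; {0} earns {2} dollars'.format(name, occupation, salary)

# Names inside the braces pick keyword arguments
by_name = '{who} earns {amount} dollars'.format(who=name, amount=salary)

# f-string (Python 3.6+): prefix the string with f and put expressions in the braces
f_string = f'Employee {name} is a {occupation} and earns {salary} dollars per year'

print(by_position)
print(by_name)
print(f_string)
```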


### Get substrings of strings
Here is where we first introduce the index operator, **`[ ]`** which is fundamental to scientific computing in Python. The **`[ ]`** operator has the ability to grab item(s) from a sequence and grab them in most any manner you wish. Since strings are sequences of characters the **`[ ]`** operator provides lots of functionality for strings. 

The special method that implements this index operator is the **`__getitem__`** method.

In [11]:
test_string = 'is this precourse too easy?'

In [12]:
# Let's get the 4th character
test_string[4]

'h'

In [13]:
# 'h' appears to be the 5th character and not the 4th. Like most programming languages, 
#  sequences are 0-indexed in python
test_string[0]

'i'

In [14]:
# How to get the last letter of the string?
# First we get the length
last_position = len(test_string) - 1
test_string[last_position]

'?'

In [15]:
# That seems a little cumbersome. Python surely must have an easier way.
# There is an easier way. Python allows indexing from the last element using negative indices starting with -1
test_string[-1]

'?'

In [16]:
# You can keep grabbing items any distance from the end
test_string[-5]

'e'

### Slicing strings
Slicing a string means retrieving a subset of the string. There are numerous ways in which this can be accomplished. The bracket operator **`[ ]`** is used again, this time along with a colon **`:`**.

It is best learned by example.

In [17]:
# substrings are easily retrieved by giving the [] operator a starting and ending position separated by a :
# foo[a:b] - slices from position a to b-1. It does not include position b
test_string[4:9]

'his p'

In [18]:
# you can also slice from the end using negative indices
test_string[-5:-1]

'easy'

In [19]:
# You can slice by only giving either a starting or ending index
# With no starting index it is defaulted to 0
test_string[:6]

'is thi'

In [20]:
# slice from the 7th from the last position to the end
test_string[-7:]

'o easy?'

In [21]:
# you can chain together slicing
test_string[5:15][-3:]

'our'

### More formal slicing definition [start : stop : step]
Now that you have seen some string slicing in action, here is the formal definition. String slicing works within the bracket operator by passing it the starting index, the stopping index and the stepping amount. `foo[4:10:2]` slices from element 4 up to (but not including) element 10 in steps of 2. If the step is not given, it defaults to 1.

In [22]:
# it's also possible to slice every nth letter with the syntax [start:stop:step]
# This slices location 4 to 10 picking up every other character
test_string[4:10:2]

'hsp'

In [23]:
# Very usefully, it is possible to take negative steps. Make sure start is higher than stop this time
test_string[8:3:-2]

'psh'

In [24]:
# And to fully reverse a string simply do not input a start or stop, just a step of -1
test_string[::-1]

'?ysae oot esruocerp siht si'

In [48]:
# error will occur if you try and access an index out of range
test_string[40]

IndexError: string index out of range

In [49]:
# but no error will occur when your index is out of range in a slice
test_string[4:600]

'his precourse too easy?'
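
The `[start:stop:step]` syntax is shorthand for Python's built-in `slice` object, which can be stored in a variable and reused. The sketch below (variable names are illustrative) also uses the `indices` method of a slice to show how an out-of-range stop is clipped to the string length, which is why the slice above raised no error.

```python
test_string = 'is this precourse too easy?'

# slice(start, stop, step) is what the colon syntax builds behind the scenes
every_other = slice(4, 10, 2)
same_result = test_string[every_other] == test_string[4:10:2]

# None plays the role of an omitted start or stop
backwards = test_string[slice(None, None, -1)]

# indices() reports the (start, stop, step) actually used for a given length
clipped = slice(4, 600).indices(len(test_string))

print(test_string[every_other], same_result)
print(backwards)
print(clipped)
```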

### Problem 3
<span style="color:green">Slice this string from index 5 to the end by every 4th element</span>

In [50]:
test_string = 'or is this precourse too difficult?'
# your code here

### Problem 4
<span style="color:green">Get every third element starting from the end to the beginning </span>

In [51]:
# your code here

### Problem 5
<span style="color:green">Use four chained methods on a string of your choice. Look at the methods in the docs or above.</span>

In [52]:
# Enter in a string inside the quotes
your_string = ''
# your code here

### What happens if you try and assign a new value to a character or slice of a string?
Strings are **immutable** (can't be changed once created), so this will cause an error.

In [53]:
test_string[7] = 'z'

TypeError: 'str' object does not support item assignment

In [54]:
test_string[7:20:-1] = 'z'

TypeError: 'str' object does not support item assignment
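
If you do need a string with one character changed, the usual approach is to build a new string from pieces of the old one. A minimal sketch using slicing and concatenation (the name `changed` is just illustrative):

```python
test_string = 'or is this precourse too difficult?'

# Everything before index 7, the new character, then everything after index 7
changed = test_string[:7] + 'z' + test_string[8:]

print(changed)
print(test_string)   # the original string is untouched
```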

### Mutable and Immutable Objects
Python objects are either mutable or immutable. Mutable objects can have their values changed after creation. Immutable objects are those whose values cannot be changed after creation.

Strings, ints, floats, booleans are types of objects that are immutable (unable to be changed after creation).

We will soon learn mutable types like lists, dictionaries and sets.

### Didn't we have some strings from above that were mutated?
In some of the above examples strings were concatenated together to form a new string but the original strings were never changed.

In [55]:
# Concatenation does not mutate the underlying string
a = 'string 1 '
b = 'string 2'
print(a + b)
print(a)
print(b)

string 1 string 2
string 1 
string 2


### Multiline comments with strings
Instead of putting a # in front of every line, you can use a triple-quoted string that is not assigned to a variable to act as a comment spanning multiple lines. Strictly speaking it is a string that Python evaluates and then discards, not a true comment. This is actually standard practice for documenting the functions and methods that you write, where such strings are called 'docstrings'.

In [56]:
"""
This area can be
used as a multiline comment since
the normal comment character # does not allow for this.

This multiline comment is especially important when writing docstrings for functions
"""
foo = "executing a string assignment"
# no output
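
Here is a small sketch of a docstring in its natural habitat. The `greet` function is made up for illustration (functions are covered in a later notebook). A triple-quoted string placed as the first statement of a function is stored on the function's `__doc__` attribute, and it is what `help()` displays.

```python
def greet(name):
    """Return a short greeting for name."""
    return 'Hello, ' + name


# The docstring is stored as an attribute of the function object
doc = greet.__doc__

print(doc)
print(greet('Ted'))
```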

### Simple test whether a string contains a substring

In [57]:
foo = "executing a string assignment"

In [58]:
# you can use the index method, which gives you the position of the substring if found
foo.index('cut')

3

In [59]:
# or if you just need a boolean returned you use the keyword in
'cut' in foo

True

In [60]:
# can also use keyword not to return the opposite
'foo' not in foo

True
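
One caveat about `index`: it raises a `ValueError` when the substring is missing. The `find` method behaves the same way when the substring is present but returns -1 when it is not. A short sketch with illustrative variable names:

```python
foo = "executing a string assignment"

# find returns the position when the substring exists...
found_at = foo.find('cut')

# ...and -1 when it does not
missing = foo.find('xyz')

# index raises an error for a missing substring instead
try:
    foo.index('xyz')
    raised = False
except ValueError:
    raised = True

print(found_at, missing, raised)
```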

## Congrats on finishing notebook 2!
Move on to notebook 3! The pre-course is mandatory so make sure you finish it all!