# STRINGS

Strings are ordered collections of characters which are enclosed in single or double quotation marks. Single letters, words, sentences, paragraphs, emojis, kanji... these are all represented as strings. 

## Declaration

In [13]:
my_string = 'this'
# or
my_string = "this"

## Length
You can use the built-in function `len` to get the length of a string, that is, the number of characters it contains:

In [15]:
len(my_string)
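
As a quick sketch of what `len` counts, here it is on a couple of made-up strings (the variable name `greeting` is just for illustration and is not part of the lesson):

```python
# Hypothetical example value, not from the original notebook
greeting = 'hello world'

print(len(greeting))  # 11: ten letters plus one space
print(len(''))        # 0: the empty string has no characters
```

Notice that spaces count as characters too.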

___
## PRACTICE
Look at the contents of `my_string` both by passing it to `print` and by just entering its name. What is the difference? Why?

In [None]:
my_string = 'first line\nsecond line'

___

## Escaped characters
Sequences such as the `\n` in the string above are called **escaped characters** because the `\` "escapes" (changes the normal meaning of) the letter or letters that follow it. When printed, they produce a special designated character instead of their literal contents.

Here are some common string **escape characters**:
> NOTE: these work in many / most programming languages!

- `\n` - newline
- `\t` - tab
- `\'` - escaped single quote
  - allows you to use a `'` in the middle of a string which has starting and ending `'` characters without prematurely ending the string declaration
- `\"` - escaped double quote
  - same as escaped single quote but for strings declared with `"` starting and ending characters

In [None]:
print('first line\n\ttabbed second line\n\'I can use single quotes if I escape them\'')
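
Here is a small sketch of two related points: you can often avoid escaping a quote by wrapping the string in the other kind of quote, and a literal backslash is itself written as `\\`. The variable names are made up for this example:

```python
# Escaping the quote character that also delimits the string
escaped = 'It\'s here'
# Using the other quote type means no escape is needed
mixed = "It's here"
print(escaped == mixed)  # True: both spellings produce the same string

# A literal backslash is written as two backslashes
path_like = 'one\\two'
print(path_like)       # one\two
print(len(path_like))  # 7: the escaped backslash is a single character
```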

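You can include Unicode characters beyond the standard alphanumeric ones by writing `\u<unicode char number>`, where the number is the character's four-digit hexadecimal code. The `u` prefix marks the string as Unicode; it is required in Python 2 and optional in Python 3, where every string is already Unicode.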

[Here are a ton of unicode character code mappings](http://unicode.org/charts/)


In [None]:
ustring = u'I have to get to work, \u2602 or \u2600..\u2639'
print(ustring)
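
The sketch below assumes you are running Python 3. It shows that the `u` prefix makes no difference there, and uses the built-in `ord` and `chr` functions (not covered elsewhere in this lesson) to move between a character and its code number:

```python
# Assumes Python 3, where every string is Unicode
with_prefix = u'\u2602'
without_prefix = '\u2602'
print(with_prefix == without_prefix)  # True

# ord gives a character's code number, chr goes the other way
print(ord(without_prefix))       # 9730
print(hex(ord(without_prefix)))  # 0x2602
print(chr(0x2600))               # the character written as \u2600 above
```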

## Multiline Strings
Multiline strings are written exactly like multiline comments (wrapped in triple quotes), but you can save their values in a variable for later use:

In [2]:
multiline_string = '''
my name is Vivek
and I'm a teacher at General Assembly
'''
# or
multiline_string = """
my name is Vivek
and I'm a teacher at General Assembly
"""

## String Concatenation
Concatenation means adding two values together, in this case, joining two strings end to end.

Strings can simply be added together using the `+` operator:

In [4]:
str_1 = 'In the beginning, '
str_2 = 'there was Assembly'
str_3 = str_1 + str_2
print(str_3)

You can also use the `+=` operator as a shorthand for incrementing an existing value by another value. In the case of strings, you end up concatenating a string value to a preexisting one:

In [6]:
str_4 = "PROLOGUE: "
str_4 += str_3
print(str_4)
# vs the longer way
str_5 = "PROLOGUE: "
str_5 = str_5 + str_3
print(str_5)

In order to add non-string variables with strings, you must convert them to strings first using the builtin `str` function:
> NOTE: This is not always the most elegant way to combine variables and strings (we'll learn better ways in the **STRING SUBSTITUTION** section below)

In [None]:
# notice the difference in the value printed for these two statements
print(5)
print(str(5))

# now use the str function to convert an integer to a string for concatenation
print('Here is a string plus a number: ' + str(5))
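
To see why the conversion is needed, here is a minimal sketch of what happens without it. It uses `try` / `except`, which this lesson has not covered, only so that the error is caught instead of stopping the program. The names `count` and `message` are made up:

```python
count = 5

try:
    # Adding a string and an integer directly is not allowed
    message = 'Total: ' + count
except TypeError as error:
    print('Concatenation failed:', error)

# Converting the integer with str first makes it work
message = 'Total: ' + str(count)
print(message)  # Total: 5
```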

___

## Text transformations

Strings come pre-packaged with all kinds of handy methods that allow us to transform them.
> NOTE: When we call a function by its name, such as `len` or `str`, we call it a **function**, whereas when we use a function with the syntax `<my variable>.<function name>`, we call it a **method**. Methods are simply functions that belong to variables and are specialized for use with those variables. All functions are run by using parentheses, and any required data, called **arguments**, goes between the parentheses: `<function name>(<arguments>)`

Here are a few methods that change the case of a string:

In [None]:
sentence = "I aim to become a Python hacker!"
sentence.upper()
sentence.lower()
sentence.capitalize()

Some methods allow you to chain one method after another. The methods run in order, from left to right, each one operating on the value returned by the one before it. Method chaining is also possible with some, but not all, methods of other types:

In [None]:
sentence = "i AIm tO BeCoMe a PytHOn HaCkEr!"
sentence.lower().capitalize()
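
One thing worth noticing is that these methods return a new string and leave the original untouched. A short sketch with made-up variable names:

```python
original = 'Mixed Case'
shouted = original.upper()

print(shouted)   # MIXED CASE
print(original)  # Mixed Case (unchanged)

# To keep the transformed value, assign it back to a variable
original = original.lower()
print(original)  # mixed case
```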


## Stripping

Stripping removes leading and trailing whitespace or other characters from a string using the `strip` method. To remove them from only the left or only the right side, you can use `lstrip` or `rstrip`, respectively:

In [None]:
padded = '\t    There are a lot of spaces and tabs around this string\'s contents    \t\t'
padded.strip()
padded.lstrip()
padded.rstrip()

All of the stripping methods optionally take an argument that specifies which characters to strip. By default, they strip all whitespace (spaces, tabs, and newlines), so running `strip()` removes more than `strip(' ')`, which strips only spaces.

Let's try stripping a non-space character:

In [None]:
num_padded = '00000why are there zeros surrounding this string?0000'
num_padded.strip('0')
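
Two details are easy to miss here, so here is a small sketch with made-up strings. Calling `strip()` with no argument removes every kind of whitespace, and an argument with several characters is treated as a set of characters to remove, not as a word:

```python
messy = '\t  padded with tabs and spaces \n'

# No argument: tabs, spaces and newlines are all removed
print(repr(messy.strip()))     # 'padded with tabs and spaces'
# Only spaces: nothing is removed, because the outermost characters are \t and \n
print(repr(messy.strip(' ')))  # '\t  padded with tabs and spaces \n'

# Any mix of 'x' and 'y' is stripped from both ends
wrapped = 'xyxhello worldyyx'
print(wrapped.strip('xy'))     # hello world
```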

## Splitting

The `split` method splits a string into a list of substrings. We have not talked about list variables yet, but all you need to know for now is that they are an ordered collection of comma-separated values inside square brackets, e.g. `['the', 'quick', 'brown', 'fox']`.


`split` also has a default: when called with no argument, it splits at every run of whitespace (spaces, tabs, and newlines). Let's try using the default and other values:

In [None]:
sentence = "i AIm tO BeCoMe a PytHOn HaCkEr!"

# split at every space (default)
sentence.split()

# split at every 'a'
sentence.split('a')
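
The sketch below, using made-up strings, shows how calling `split()` with no argument differs from `split(' ')`. It also shows the `join` method, which this lesson does not cover but which does the reverse of `split`:

```python
spaced = 'one  two\tthree'

# No argument: any run of whitespace counts as one separator
print(spaced.split())     # ['one', 'two', 'three']
# A single space as separator: the double space produces an empty string
print(spaced.split(' '))  # ['one', '', 'two\tthree']

# join glues a list of strings back together with a separator
words = ['the', 'quick', 'brown', 'fox']
print(' '.join(words))    # the quick brown fox
```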

A very easy way to split a string into a list of characters is using the built-in `list` function, with the string as the argument:


In [None]:
list(sentence)


## Slicing

Python provides a way to get parts of a string by specifying start and stop positions. This is called **slicing**, and later we will see that this syntax works on lists as well.

Slicing takes up to three values separated by `:` and placed within square brackets after the string:

- starting index (inclusive) _default value: `0`_
- ending index (exclusive) _default value: the end of the string_
- skipping - how many positions to move forward between each retrieved value, starting at starting index until the ending index is reached _default value: `1`_

> NOTE: String indices, and counting in general in Python and most other programming languages, start at `0`. For example, in the string `'hello'` the index of the letter `h` is `0`, not `1`!


In [None]:
sentence[2:5]
sentence[:5]
sentence[5:]
sentence[:]
sentence[::2]
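
Here is a short sketch on a made-up string where the results are easy to check by eye. It also shows negative values, which count from the end of the string:

```python
word = 'Python'

print(word[0])      # P       (a single index returns one character)
print(word[-1])     # n       (negative indices count from the end)
print(word[1:4])    # yth     (indices 1, 2 and 3; the end index is excluded)
print(word[-3:])    # hon     (the last three characters)
print(word[1:5:2])  # yh      (indices 1 and 3)
print(word[::-1])   # nohtyP  (a skip value of -1 walks backwards)
```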


## Searching

There are a few different ways that we can search within a string to see if a substring exists within it.

Let's consider this `paragraph`:

In [6]:
paragraph = '''In the works of Tarantino, a predominant concept is the distinction between
ground and figure. In a sense, Sartre suggests the use of rationalism to attack
colonialist perceptions of sexuality. Marx uses the term ‘subcapitalist
materialism’ to denote the rubicon, and subsequent futility, of conceptualist
class.
“Society is used in the service of class divisions,” says Derrida. Thus,
Sargeant[1] suggests that the works of Tarantino are
empowering. The main theme of Finnis’s[2] model of
rationalism is the role of the participant as artist.'''

To simply check for the presence of a substring, we can use the `in` operator:

In [8]:
'Tarantino' in paragraph

To get the index of the first instance of a substring from the left, we can use the `find` method:


In [None]:
paragraph.find('Tarantino')

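To search from the right instead, we can use the `rfind` method, which returns the index of the last instance of the substring (still counted from the start of the string):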


In [None]:
paragraph.rfind('Tarantino')
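
It helps to try these on a short made-up string where you can count the positions yourself. This sketch also shows what `find` returns when the substring is not there:

```python
text = 'banana'

print(text.find('an'))   # 1   (first match, searching from the left)
print(text.rfind('an'))  # 3   (last match, still an index from the left)
print(text.find('xyz'))  # -1  (find returns -1 when there is no match)
print('xyz' in text)     # False
```

Because `-1` is returned instead of an error, it's worth checking for it before using the result as an index.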


## String substitution

Strings containing `{}` placeholders can have those placeholders replaced with variable values using the `format` string method. Inserting variables into strings in this way is called either **string substitution** or **interpolation**. A string with `{}` placeholders is called a template string, as it's the template from which strings can be formatted by inserting variable values.

By the way, _this_ is a more elegant way to combine variables into a string than concatenation and using the `str` function.

> NOTE: The `{}` can contain values that determine the index of the variable to be inserted, or its name, or other cool magical stuff ([see here for more info on these tricks](https://pyformat.info/)). Also, **Python 3** supports yet another way to do **interpolation** using [**f-Strings**](https://cito.github.io/blog/f-strings/).

Let's insert some values into a template string by passing several positional values to the `format` method:

In [13]:
print('my name is {} and I am {} years old'.format('Vivek', 27))

We can specify the order that the positional arguments are inserted by putting integers inside the curly braces:

In [15]:
print('I want to substitute the {1} and {0} value in a different order'.format('second', 'first'))

We can also name the insertion points and pass keyword arguments to `format` to replace the named insertion points with the matching values:

In [17]:
print('Your name is {name}, today is {day_of_week} and we are in the {class_name} class' \
         .format(name='Sara', day_of_week='Saturday', class_name='Introduction to Python Bootcamp'))
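
As a rough sketch of a few more options, the example below stores a template in a variable so it can be reused, writes the same sentence as an f-string (this assumes Python 3.6 or newer), and uses a format spec to round a number. The variable names `template` and `price` are made up:

```python
name = 'Sara'
day_of_week = 'Saturday'

# A template can be stored in a variable and filled in later
template = 'Your name is {name} and today is {day}'
print(template.format(name=name, day=day_of_week))

# The same sentence as an f-string: variables go straight into the braces
print(f'Your name is {name} and today is {day_of_week}')

# A format spec after a colon controls how the value is displayed
price = 3.14159
print('{:.2f}'.format(price))  # 3.14
```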

# Breakout Rooms!

___
## PRACTICE
Generate a **multiline** template string in the form of a short written letter that uses the values of the following variables, and sign your nickname in capital letters

- variables: `day_of_week`, `favorite_color`, `nickname`, and `salutation`
- hint: you need to declare these variables before using `format` if you are going to use positional arguments
___

___
## PRACTICE

Unscramble the message in `scrambled`.

- hint 1: the first letter of the hidden message is `y`
- hint 2: the message is embedded in scrambled at regular intervals
___

In [2]:
scrambled = 'T{vSzXyJ^osatoCJuWZOurtwnDi cAIxMFhXkB\\caaoTpLVnvSqIppfe_TYDAc TrJrbamRuPvGPas[rGqWniqfduIahZ_YvVgHxWpVheXDYy_{dsIur{^ L[hQTrt`j^n`voaPKzpF IIOnWodrxpEcjehcThaTcbtoJZ`obH[aSZdEuVfT[eK]fptE TfcCs]t[RX]e\\hxcgtOairYORWcsNNekoB wqp^FmsHRkExGenyRlrGnUc\\pTrtgW{rHheUYOIUJnUREozScYjwH\\ae'

<details>
<summary>Solution</summary>

The first thing to do is start at the left and find the index of the first `y`, which is `6`, then try different skipping values until a meaningful message appears.
If you're curious how the message was made, here is the code that generated it:

```python
import random
hidden_message = 'you have managed to decode this sentence'
scrambled = ''
gap = 6
for i in hidden_message:
    for n in range(gap):
        scrambled += chr(random.randint(65, 123))
    scrambled += i

print(scrambled)
```
</details>


___
## PRACTICE

Take the `weird_string` below, remove the leading whitespace on the left and the trailing repetitions of the word 'end' on the right, then split it into words:
___

In [None]:
weird_string = '\n\t   well~isn\'t~this~a~weird~string?endendend'

___
## PRACTICE
Declare a string that prints the following string value (double quotation
marks and new lines included, but you can ignore the ''' characters):
```python
'''
"We may have all come on different ships, but we're in the same boat now."
     - MLK
'''
```
___

___
## PRACTICE
Replace the double quotes in the above string with the special characters `\u2036` (open quotation) and `\u2033` (closed quotation)
___

___
## PRACTICE
Print a string that says `"my name is <name>, and I am <age> years old"` using the `name` and `age` variables provided below:


In [None]:
name = 'Jerermy'
age = 4


## Replacement

Using the built-in `replace` string method, we can replace characters or substrings in an existing string. If we're clever, we can remove characters and substrings as well using the same method (think about it).

This time, you are going to read about the `replace` method and try to implement it yourself.
[Check out the docs here.](https://docs.python.org/2/library/stdtypes.html#str.replace)

___
## PRACTICE

In `quote`, replace the first 'th' with 'the' and make the string all lowercase other than the first word.
- hint: [another string method...](https://docs.python.org/2/library/stdtypes.html#str.capitalize)

In [10]:
quote = 'In th End, we will remember not the words of our Enemies, but the silence of our Friends.'

___
## PRACTICE

One of the most important things you can learn as any kind of programmer is how to read the documentation for the language and the modules / packages that you plan to use. Let's take a look at the Python language documentation to learn more about the built-in string method `find` and its arguments. [Check out the docs here.](https://docs.python.org/2/library/stdtypes.html#str.find)

> NOTE: **functions** take two different kinds of arguments: 
> - **positional arguments**: 
>   - the first few arguments to a **function** need to be input into the function in a specific order. 
>   - For example, you may have a function called `whisperThanYell` that expects two string arguments, first the whispered string (which will be transformed to all lowercase) then second, the yelled string (which will be transformed to all uppercase). 
>   - If you ran `whisperThanYell('I\'ll tell you a secret', 'you better not tell')` you'd get back `"i'll tell you a secret, YOU BETTER NOT TELL"`, but if you ran `whisperThanYell('you better not tell', 'I\'ll tell you a secret')`, you'd get back `"you better not tell, I'LL TELL YOU A SECRET"`.
>   - Notice how these are two different outcomes, i.e. with positional arguments, order matters!
> - **keyword arguments**:
>   - some functions let you pass arguments using `<keyword>=<some value>` notation in the arguments list **AFTER** all of the positional arguments, and these are called keyword arguments
>   - these arguments can go in any order
> - either, both, or neither type of argument can be applicable to a given function, and any one of the arguments in particular can be required or optional
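
To make the note above concrete, here is one way `whisperThanYell` could be written. This is only a sketch: the function is hypothetical, and the parameter names `whispered`, `yelled` and `separator` are assumptions made for this example.

```python
# Hypothetical function from the note above; the parameter names are assumptions
def whisperThanYell(whispered, yelled, separator=', '):
    # separator has a default value, so passing it is optional
    return whispered.lower() + separator + yelled.upper()

# Positional arguments: order matters
print(whisperThanYell("I'll tell you a secret", 'you better not tell'))
# i'll tell you a secret, YOU BETTER NOT TELL
print(whisperThanYell('you better not tell', "I'll tell you a secret"))
# you better not tell, I'LL TELL YOU A SECRET

# Keyword arguments: order does not matter
print(whisperThanYell(yelled='loud', whispered='QUIET'))
# quiet, LOUD

# Positional arguments first, then a keyword argument
print(whisperThanYell('psst', 'hey', separator=' ... '))
# psst ... HEY
```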

Now, let's find the index of the second occurrence of `of` in `paragraph` using `find`:


In [9]:
paragraph = '''In the works of Tarantino, a predominant concept is the distinction between
ground and figure. In a sense, Sartre suggests the use of rationalism to attack
colonialist perceptions of sexuality. Marx uses the term ‘subcapitalist
materialism’ to denote the rubicon, and subsequent futility, of conceptualist
class.
“Society is used in the service of class divisions,” says Derrida. Thus,
Sargeant[1] suggests that the works of Tarantino are
empowering. The main theme of Finnis’s[2] model of
rationalism is the role of the participant as artist.'''