# Strings and String Methods

![](static/python_strings.png)


## What you'll learn in today's lesson (learning goals)

1. What the string data type is.
1. String properties.
1. Immutable data types.
1. How to create a string.
1. Common special characters.
1. String templates.
1. f-strings.
1. Common string "gotchas".

## What is the String Data Type?

### The String Data Type is:
- A fundamental Python data type
    - Cannot be broken into smaller values of a different type
- Abbreviated to `str` in Python:

In [1]:
print(type("Hello There"))

<class 'str'>


![](https://media.giphy.com/media/xTiIzJSKB4l7xTouE8/giphy.gif)

### Strings have a length to them

In [2]:
len("Hello There")

11

In [3]:
# Variables that reference a string can be passed into `len` as well:
hello = 'Hello There'
len(hello)

11

### Characters 'Strung' together
- Contains **characters**: individual letters or symbols.
- Characters in a string appear sequentially (they have a specific order to them).
    - Character mappings such as [ASCII](https://en.wikipedia.org/wiki/ASCII) and [Unicode](https://en.wikipedia.org/wiki/Unicode) specify which numeric value maps to each human-readable letter, symbol, or control command.

In [4]:
hello = 'hello there'
print(f'"{hello}"\n')

# this will print each character of the string alongside
# its associated index
for index, char in enumerate(hello):
    print(f'{index:02} -> "{char}"')

"hello there"

00 -> "h"
01 -> "e"
02 -> "l"
03 -> "l"
04 -> "o"
05 -> " "
06 -> "t"
07 -> "h"
08 -> "e"
09 -> "r"
10 -> "e"
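
The character mappings mentioned above can be inspected directly. The following is a minimal sketch using the built-in `ord` and `chr` functions; the variable names `code_points` and `rebuilt` are only illustrative.

```python
# ord() gives the Unicode code point for a single character,
# and chr() goes the other way.
print(ord('h'))      # 104
print(chr(104))      # h

# Build a list of code points for every character in a string.
hello = 'hello'
code_points = [ord(char) for char in hello]
print(code_points)   # [104, 101, 108, 108, 111]

# Rebuild the original string from the code points.
rebuilt = ''.join(chr(point) for point in code_points)
print(rebuilt)       # hello
```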


### Creating Strings in Python

- **String Literals:** strings that are directly written within your Python code.
- Created using various quotation **delimiters** (a sequence of one or more characters specifying a boundary)

In [5]:
single_quote = 'Did you ever hear the tragedy of Darth Plagueis the Wise?'
print(single_quote)
double_quote = "I thought not.  It's not a story the Jedi would tell you."
print(double_quote)

Did you ever hear the tragedy of Darth Plagueis the Wise?
I thought not.  It's not a story the Jedi would tell you.


In [6]:
# Once a string is opened with one delimiter, the other quote character can be used inside it:

vader_quote = '"I find your lack of faith disturbing." — Darth Vader'
print(vader_quote)

"I find your lack of faith disturbing." — Darth Vader


In [7]:
# GOTCHA: Be careful about using the delimiter within the string
#     you want to create
luke_quote = 'I'll never turn to the dark side.'

SyntaxError: invalid syntax (<ipython-input-7-83f91cc87a5e>, line 3)
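
There are two common ways around this error, sketched below: switch to the other quote character as the delimiter, or escape the apostrophe with a backslash. The same backslash mechanism produces other common special characters such as newline and tab. The names `luke_escaped` and `profile` are made up for this example.

```python
# Option 1: use the other quote character as the delimiter.
luke_quote = "I'll never turn to the dark side."

# Option 2: escape the apostrophe with a backslash.
luke_escaped = 'I\'ll never turn to the dark side.'

print(luke_quote == luke_escaped)  # True

# Other common special characters: \n (newline), \t (tab), \\ (one backslash).
profile = "Name:\tLuke\nRole:\tJedi"
print(profile)
```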

### Multi-Lined Strings

In [8]:
# you can use triple single or double quotes to create a string:
triple_single_quote = '''The dark side of the Force is a pathway to many abilities some consider to be unnatural.'''
print(triple_single_quote)
triple_double_quote = """The dark side of the Force is a pathway to many abilities some consider to be unnatural."""
print(triple_double_quote)

The dark side of the Force is a pathway to many abilities some consider to be unnatural.
The dark side of the Force is a pathway to many abilities some consider to be unnatural.


In [9]:
# Triple quotes can span multiple lines but the newlines
# are rendered
long_string = """The dark side of the Force is a pathway 
to many abilities some consider to be unnatural.
"""
print(long_string)

The dark side of the Force is a pathway 
to many abilities some consider to be unnatural.



In [10]:
# whitespace is preserved when using triple quotes
print("""The dark side of the Force is a pathway 
    to many abilities some consider to be unnatural.
""")

The dark side of the Force is a pathway 
    to many abilities some consider to be unnatural.



In [11]:
# If you have a really long string, but don't want newlines, then use the
# backslash \
print("The dark side of the Force is a pathway \
to many abilities some consider to be unnatural.")

The dark side of the Force is a pathway to many abilities some consider to be unnatural.


In [12]:
# GOTCHA: Make sure there isn't ANYTHING (including comments) 
#     after the backslash as that will cause an error
print("The dark side of the Force is a pathway \ 
to many abilities some consider to be unnatural")

SyntaxError: EOL while scanning string literal (<ipython-input-12-6f9de0d79e5a>, line 3)
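
One way to sidestep the backslash gotcha is to rely on Python joining adjacent string literals automatically. This is a sketch of that alternative, not something the lesson above prescribes: wrapping the pieces in parentheses lets the string span several lines, and comments after each piece are allowed.

```python
# Adjacent string literals inside parentheses are joined into one string,
# so no backslash is needed and end-of-line comments are fine.
long_string = (
    "The dark side of the Force is a pathway "   # note the trailing space
    "to many abilities some consider to be unnatural."
)
print(long_string)
```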

### In-Class Exercises:

- Print a string that uses double quotation marks **inside** the string.
- Print a string that uses an apostrophe **inside** the string.
- Print a string that spans multiple lines, with white space preserved.
- Print a string that is coded on multiple lines but displays on a single line.

## Working with Strings

Now that we've learned how to create strings, let's examine some of the more common things we can do with them.

- Concatenation
- Indexing & Slicing
- String Immutability
- Checking if a string is within another string

### Concatenation

In [13]:
# You can concatenate (combine) multiple strings together using the + operator
leia = "I love you."
han = "I know."
print(leia + han)

I love you.I know.


In [14]:
# the previous example looks a bit weird, because when concatenating strings
# nothing else is added, so we have to manually add a space to it
print(leia + ' ' + han)

I love you. I know.
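
When more than two strings need the same separator, repeating `+ ' ' +` gets tedious. A possible alternative, shown here as a small sketch, is the `join` string method; the variable name `exchange` is only illustrative.

```python
leia = "I love you."
han = "I know."

# The string before .join is placed between each item in the list.
exchange = " ".join([leia, han])
print(exchange)  # I love you. I know.
```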


![](https://media.giphy.com/media/e6e1P3wC6xkYg/giphy.gif)

### Indexing & Slicing

- strings are sequences, which means that they can be indexed and sliced
- **string index:** the numerical location of a character within a string
    - Indexing with all sequence types in Python starts at 0
- **string slice:** a sub-string extracted from a string

In [15]:
# To view the value of a character in a string, you must index
# into the string using open and close square brackets
# following the string or variable
ship = "Millennium Falcon"
print(ship[0])
print("Millennium Falcon"[0])

M
M


In [16]:
# slicing is done by inserting a colon (:) between the
# start and stop index to extract the substring.  Notice that the
# character at the stop index isn't included.
print(ship[0:10])
ship[10]

Millennium


' '

In [17]:
# slicing doesn't need both a start and a stop.  An omitted index
# defaults to the first or last position, depending on which side it's on
print(ship[:10])

Millennium


In [18]:
ship[10:]

' Falcon'
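
Indexes can also count from the end of the string. The sketch below uses a different ship name, chosen only for illustration, to show negative indexes and how slices behave at the edges.

```python
ship = "Star Destroyer"

print(ship[-1])    # r  (last character)
print(ship[-9:])   # Destroyer  (last nine characters)

# A slice and its complement always rebuild the original string.
print(ship[:4] + ship[4:] == ship)  # True

# Slices tolerate a stop index past the end; a single index does not.
print(ship[5:100])  # Destroyer
```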

### Strings are Immutable

- **immutability:** once a string is created, its value cannot be changed in place; operations that appear to modify a string return a new string instead

In [19]:
ship = 'Millennium Falcon'
ship + ' is the fastest in the galaxy'

'Millennium Falcon is the fastest in the galaxy'

In [20]:
ship

'Millennium Falcon'

In [21]:
# You also can't change individual values of the string
ship[0] = 'S'

TypeError: 'str' object does not support item assignment

![](https://media.giphy.com/media/3ornjSL2sBcPflIDiU/giphy.gif)

In [22]:
# you need to create a new variable (or overwrite the previous)
new_ship = "S" + ship[1:]
new_ship

'Sillennium Falcon'
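
It can help to see that "overwriting" a variable never alters the original string; it only points the name at a new one. In this sketch, `same_ship` is a hypothetical second name for the same string, used to show that the original value survives.

```python
ship = 'Falcon'
same_ship = ship              # a second name for the same string

ship = ship + ' (modified)'   # builds a NEW string and rebinds `ship`

print(ship)        # Falcon (modified)
print(same_ship)   # Falcon
```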

### the `in` operator

- the `in` operator checks whether the value on the left is in the sequence on the right
- returns `True` or `False`

In [23]:
jedi_masters = "Obi-Wan Kenobi, Yoda, Qui-Gon Jinn"
'Anakin' in jedi_masters

False

![](https://3.bp.blogspot.com/-BJnqTMTBL6c/VrKYilX6R1I/AAAAAAAABs0/kSJ74wzE9ns/s1600/anakin%2Bcry.gif)

In [24]:
council_members = "Anakin, Obi-Wan Kenobi, Yoda, Qui-Gon Jinn"
'Anakin' in council_members

True

![](https://i.pinimg.com/originals/e6/74/49/e6744964a17014c19a122fb002d2a2ca.gif)
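
Two details of `in` with strings are easy to miss: the check is case-sensitive, and it matches any run of characters, not just whole words. A short sketch with a shortened list of names:

```python
jedi_masters = "Obi-Wan Kenobi, Yoda"

print('Yoda' in jedi_masters)        # True
print('yoda' in jedi_masters)        # False: the case differs
print('Obi' in jedi_masters)         # True: part of a name still matches
print('Anakin' not in jedi_masters)  # True: `not in` is the opposite check
```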

### In-Class Exercise

- Create a string and print its length
- Create two strings, concatenate them, and print the result
- Create two strings and use concatenation to add a space between them, then print the result
- Print the sub-string 'nerf' from the string 'nerf herder'

## String Methods

Strings are objects, and as such they have a collection of special functions called **string methods** that are used for working with and manipulating strings.  With these methods, we can:

- Change a string's case
- Trim trailing and leading whitespace
- Find the location of a substring
- Replace sub-strings within a string

### Changing case

In [25]:
# You can change the case of all the characters in a string by using
# the lower and upper methods

jar_jar = "Jar Jar Binks"
print(jar_jar.lower())

jar jar binks


<img src="https://www.washingtonpost.com/resizer/H-GULNtP2kkg_tlUMfsmUzGhNLU=/1484x0/arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/CKVFFZE5IU3H3BHCABPAB4AQ5I.jpg" width="400" />

In [26]:
print(jar_jar.upper())

JAR JAR BINKS


<img src="https://static.independent.co.uk/s3fs-public/thumbnails/image/2015/10/26/12/Jar-Jar-Star-Wars.jpg?w968" width="400" />
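
A typical use for `lower` is comparing two strings without caring about case. The sketch below assumes a hypothetical `typed` value, standing in for something a user entered.

```python
expected = "jar jar binks"
typed = "Jar Jar BINKS"   # hypothetical user input

print(typed == expected)          # False: == is case-sensitive
print(typed.lower() == expected)  # True once the case is normalized
```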

### Trimming Leading and Trailing Whitespace

Trimming whitespace means removing spacing characters from either the front or the back of the string.  This can be done using one of three methods.

In [27]:
# you can trim whitespace from the ends of a string
# by using the strip, lstrip, and rstrip methods
print("   NOOOOO OOOOOOOO!    ".strip())

NOOOOO OOOOOOOO!


In [28]:
print("   NOOOOO OOOOOOOO!    ".lstrip())

NOOOOO OOOOOOOO!    


In [29]:
print("   NOOOOOOOOOOOOO!    ".rstrip())

   NOOOOOOOOOOOOO!
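
These methods also accept an optional string of characters to remove instead of whitespace. A brief sketch, using a made-up `shout` string:

```python
shout = "***  NOOOOO!  ***"

# Passing a string of characters strips any of them from both ends.
print(shout.strip("* "))       # NOOOOO!

# Whitespace in the middle of a string is never touched.
print("  I  know.  ".strip())  # I  know.
```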


### Finding a sub-string

In [30]:
# you can find the starting index of a sub-string by using
# the find method
who_talks = "Who talks first? You talk first? I talk first."

In [31]:
# notice that this only finds the first instance of that
# substring
talk_location = who_talks.find('talk')
print(talk_location)

4


In [32]:
print(who_talks)

# this will print a caret directly underneath the character
# location returned by .find
print(' ' * talk_location + '^')

Who talks first? You talk first? I talk first.
    ^


In [33]:
# we can find further instances by passing in the
# starting position of the string where to start
talk_location = who_talks.find('talk', 5)
print(talk_location)

21


In [34]:
print(who_talks)
print(' ' * talk_location + '^')

Who talks first? You talk first? I talk first.
                     ^


In [35]:
talk_location = who_talks.find('talk', 22)
print(talk_location)

35


In [36]:
print(who_talks)
print(' ' * talk_location + '^')

Who talks first? You talk first? I talk first.
                                   ^


In [37]:
# until we get to a point where there are no more matches
print(who_talks.find('talk', 36))

-1


In [38]:
# this is also returned when the sub-string isn't found
# in the string at all
print(who_talks.find('meh'))

-1
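
The repeated calls above can be wrapped in a loop that keeps searching until `find` returns `-1`. This is one possible sketch; `find_all` is a hypothetical helper written for this lesson, not a built-in string method.

```python
def find_all(text, sub):
    """Return the starting index of every occurrence of `sub` in `text`."""
    locations = []
    start = text.find(sub)
    while start != -1:
        locations.append(start)
        # Resume the search one character past the last match.
        start = text.find(sub, start + 1)
    return locations


who_talks = "Who talks first? You talk first? I talk first."
print(find_all(who_talks, 'talk'))  # [4, 21, 35]
print(find_all(who_talks, 'meh'))   # []
```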


### Replacing sub-strings

In [39]:
sith_lords = 'Sidious, Dooku'
print(sith_lords.replace('Dooku', 'Vader'))

Sidious, Vader


In [40]:
print(sith_lords)

Sidious, Dooku


In [41]:
sith_lords = sith_lords.replace('Dooku', 'Vader')
print(sith_lords)

Sidious, Vader


In [66]:
# replace can be used to 'remove' substrings
troubled_anakin = "Not just the men; but the women and children too!"
print(troubled_anakin.replace('men', ''))

Not just the ; but the wo and children too!


In [68]:
# You can actually chain these as well
print(troubled_anakin.replace('women', '').replace('men', '').replace('children', ''))

Not just the ; but the  and  too!


In [69]:
# GOTCHA: Beware of order of replacement too
print(troubled_anakin.replace('men', '').replace('women', '').replace('children', ''))

Not just the ; but the wo and  too!
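
By default `replace` changes every occurrence. It also takes an optional third argument that limits how many occurrences are replaced, sketched here with a made-up sentence:

```python
who_talks = "I talk first. You talk first."

# Replace only the first occurrence by passing a count of 1.
first_only = who_talks.replace('talk', 'shoot', 1)
print(first_only)  # I shoot first. You talk first.

# Without the count, every occurrence is replaced.
print(who_talks.replace('talk', 'shoot'))  # I shoot first. You shoot first.
```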


### In-Class Exercise

- Write a script that converts the following strings to lower case:
    - Jalin
    - Obi-Wan
    - Darth Maul
    - Rex
- Write a script that converts the previous strings to upper case
- Write a script that removes whitespace from the following strings:
    - `"      Ezra Bridger is my apprentice!     "`
- Write a script that replaces 'Han' with 'Ben', and 'shot' with 'stabbed' in the following string:
    - `"Han Solo shot first"`
- Write a script that finds the location of the string 'Night' within the following string.  Then print out the start and end index of that string along with the actual substring:
    - `"Dathomir is the home planet of the Night Sisters"`

## Formatting Strings

Formatting strings lets us create strings by 'inserting' variables into them.  For example, let's say that I wanted to log R2-D2's status like this:

```text
0001 - Task 1 - Start - Initiate Connection to Computer
0002 - Task 2 - Complete - Set urgency level - High
0006 - Task 1 - Complete - Connection to computer established
0006 - Task 3 - Start - Initiate shutdown of garbage compactor
0020 - Task 3 - Complete - Garbage compactor shutdown
```

In [42]:
# printing this out is laborious
print("0001 - Task 1 - Start - Initiate Connection to Computer")
print("0002 - Task 2 - Complete - Set urgency level - High")
print("0006 - Task 1 - Complete - Connection to computer established")
print("0006 - Task 3 - Start - Initiate shutdown of garbage compactor")
print("0020 - Task 3 - Complete - Garbage compactor shutdown")

0001 - Task 1 - Start - Initiate Connection to Computer
0002 - Task 2 - Complete - Set urgency level - High
0006 - Task 1 - Complete - Connection to computer established
0006 - Task 3 - Start - Initiate shutdown of garbage compactor
0020 - Task 3 - Complete - Garbage compactor shutdown


Printing out each line becomes very laborious, AND most likely we're using variables such as `time`, `task_num`, `task_status`, and `message`.  Let's break down the message format (*wink, wink*):

```text
<Time> - Task <Task Num> - <Task Status> - <message>
```

Let's examine the different methods of doing this with these specific variable/value pairs:

In [56]:
time = 0
task_num = 1
task_status = "Started"
message = "Initiate Connection to Computer"

### percent/modulo `%` method

Don't use this method, but I'm putting it down here for historical background.

In [57]:
# the % operator basically "unpacks" a tuple of values from the right side
# into the string on the left side.  Note: this is very printf-like, which is
# how other languages like C do string formatting.
print("%d - Task %d - %s - %s" % (time, task_num, task_status, message))

0 - Task 1 - Started - Initiate Connection to Computer


A 'type' reference is required within the string to unpack each value.  Keep an eye on the types that you're passing in.  The most common ones are:

| symbol | type reference |
| ------ | -------------- |
| d      | integer        |
| f      | float          |
| s      | string         |
| x      | hex            |

In [58]:
print('%d - Task %d - %d - %d' % (time, task_num, task_status, message))

TypeError: %d format: a number is required, not str

In [59]:
# the previous formatting isn't quite what we wanted though, so we need to
# make one more adjustment
print("%04d - Task %d - %s - %s" % (time, task_num, task_status, message))

0000 - Task 1 - Started - Initiate Connection to Computer


The `%04d` means: format an integer (`d`) in a field at least 4 characters wide (`4`), padding with zeros (`0`) when the number is shorter.  Longer numbers are not truncated.
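
A few more width and precision variations, shown as a quick sketch. `repr` is used only to make the padding visible in the output.

```python
print(repr("%5d" % 42))         # '   42'   padded with spaces to width 5
print(repr("%05d" % 42))        # '00042'   padded with zeros to width 5
print(repr("%04d" % 123456))    # '123456'  wider values are not cut off
print(repr("%.2f" % 3.14159))   # '3.14'    two digits after the decimal point
print(repr("%x" % 255))         # 'ff'      255 written in hexadecimal
```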

### `.format` method

The `.format` method builds on the `%` method and makes it clearer to the reader what is going on.  Instead of `%<type>`, you use a pair of curly brackets (`{}`) to mark where each value is inserted.  Prefer this method over the `%` method.

In [60]:
print("{:04} - Task {} - {} - {}".format(time, task_num, task_status, message))

0000 - Task 1 - Started - Initiate Connection to Computer


In [61]:
# you can assign names to the string locations and reference them in the
# .format method
print("{time:04} - Task {num} - {status} - {message}".format(
    time=time,
    status=task_status,
    num=task_num,
    message=message
))

0000 - Task 1 - Started - Initiate Connection to Computer


In [62]:
# one advantage of this, is if you're using the same string format over and
# over, then you can save the string as a variable and format it later
r2_log_entry = "{time:04} - Task {num} - {status} - {message}"
print(r2_log_entry.format(
    time=time,
    status=task_status,
    num=task_num,
    message=message
))

0000 - Task 1 - Started - Initiate Connection to Computer
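
A saved template pays off when it is reused for many entries. The sketch below runs the same template over a few rows taken from the sample log at the start of this section; the `entries` list and `log_lines` name are assumptions made for the example.

```python
r2_log_entry = "{time:04} - Task {num} - {status} - {message}"

# Each tuple is (time, task number, status, message), from the sample log.
entries = [
    (1, 1, "Start", "Initiate Connection to Computer"),
    (2, 2, "Complete", "Set urgency level - High"),
    (6, 1, "Complete", "Connection to computer established"),
]

log_lines = []
for time, num, status, message in entries:
    log_lines.append(
        r2_log_entry.format(time=time, num=num, status=status, message=message)
    )

print("\n".join(log_lines))
```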


### f strings

f-strings were added in Python 3.6, so they are not available on systems running older versions.  They build on the `.format` method: instead of calling `.format` with named locations, you reference variables that are currently in scope directly inside the string:

In [63]:
print(f'{time:04} - Task {task_num} - {task_status} - {message}')

0000 - Task 1 - Started - Initiate Connection to Computer


In [65]:
# updating variables to show the changes
time = 2
task_status = 'Complete'
task_num = 2
message = 'Set urgency level - High'
print(f'{time:04} - Task {task_num} - {task_status} - {message}')

0002 - Task 2 - Complete - Set urgency level - High
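
The braces in an f-string can hold more than a bare variable name. This sketch shows an expression and two format specs; the `start`, `end`, and `droid` names are introduced only for the example.

```python
start = 6
end = 20   # times from the Task 3 lines of the sample log

# Expressions are evaluated inside the braces.
duration_line = f'Task 3 took {end - start} time units'
print(duration_line)         # Task 3 took 14 time units

# Format specs work the same way as in .format.
droid = 'R2-D2'
print(f'{droid:>8}|')        # right-aligned in a field 8 characters wide
print(f'{end / start:.2f}')  # 3.33
```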


### In-Class Exercise

- Using a formatted string, create the following template and store it in a variable
    - `"The <ship> can make the Kessel Run in less than <distance> parsecs"`
- Using the variable storing the template and the `.format` method, insert the ship name "Millennium Falcon" and a distance of 12
- Using an f-string and the same template above, insert the ship name "Ghost" and a distance of 20