# Working with Text Data: Strings

A *string* is a sequence of zero or more characters enclosed in either `'single'` or `"double"` quotation marks. For most strings, it doesn't matter whether you use single or double quotes:

In [1]:
a = "Time is an illusion, lunchtime doubly so"
b = 'Time is an illusion, lunchtime doubly so'
a == b

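True

The choice does matter when the string itself contains quotation marks. In that case, enclose the string in the kind of quote that does not appear inside it: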

In [8]:
a = "'Time is an illusion, lunchtime doubly so' said Ford"
b = '"Time is an illusion, lunchtime doubly so" said Ford'
c = ""Time is an illusion, lunchtime doubly so" said Ford" # this line will cause an error!

SyntaxError: invalid syntax (<ipython-input-8-07ba980404d5>, line 3)

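What if you need to include both kinds of quotation marks in the same string? In this case, we need to use a backslash (`\`) to *escape* the quotation mark that matches the enclosing quotes, so that Python treats it as an ordinary character rather than as the end of the string: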

In [6]:
d = '"I like the cover" said Arthur, "Don\'t panic!"'
d

'"I like the cover" said Arthur, "Don\'t panic!"'

Evaluating `d` on its own shows the string as we would type it in code, escape included. The `print()` function instead displays a human-readable representation of Python objects:

In [7]:
print(d)

"I like the cover" said Arthur, "Don't panic!"

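To check that the backslash belongs only to the source code and not to the string itself, here is a small sketch that inspects the same string `d`. It relies on the built-in `repr()` and `str()` functions, which are not introduced above: `repr()` produces the quoted form shown when a cell ends with `d`, and `str()` produces the form that `print()` displays.

```python
d = '"I like the cover" said Arthur, "Don\'t panic!"'

# The backslash tells Python to treat the next quote as a literal character;
# it is not stored in the resulting string.
print("\\" in d)      # False: d contains no backslash
print(d.count("'"))   # 1: the single quote in "Don't"

# repr() gives the quoted, escaped form; str() gives the readable form.
print(repr(d))
print(str(d))
```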

## Basic String Manipulations

Python gives us several ways to manipulate strings. An especially important one is *concatenation*, which can be achieved with `+`:

In [10]:
"Ford" + " Prefect"

'Ford Prefect'

We can also do "multiplication." In the context of strings, multiplication means *repetition*: 

In [12]:
"Marvin " * 3

'Marvin Marvin Marvin '

Concatenation is a useful tool for constructing strings using variables:

In [16]:
x = "The Hitchhiker's "
y = "Guide "
z = "to the Galaxy"
print(x + y + z)

The Hitchhiker's Guide to the Galaxy
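Concatenation and repetition can be combined. As a small sketch, the snippet below underlines a title with a row of dashes of matching length. The names `title` and `underline` are made up for this illustration, and `len()` is the built-in function that returns the number of characters in a string.

```python
title = "Ford Prefect"

# Repeat "-" once for every character in the title
underline = "-" * len(title)

print(title)
print(underline)

# Order matters for concatenation, but not for repetition
print("ab" + "cd" == "cd" + "ab")   # False
print("ab" * 3 == 3 * "ab")         # True
```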


When we want to form messages involving numbers, we generally need to use the `str()` function to convert those numbers into strings prior to concatenation: 

In [22]:
x = 42
print("The answer to the ultimate question of life, the universe, and everything is " + str(x))

The answer to the ultimate question of life, the universe, and everything is 42
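Leaving out `str()` raises a `TypeError`, because `+` is not defined between a string and an integer. The sketch below shows that failure and, as an alternative not covered above, a formatted string literal (an *f-string*, available in Python 3.6 and later) that performs the conversion inside the braces. The variable name `message` is chosen here for illustration.

```python
x = 42

try:
    "The answer is " + x          # str + int is not allowed
except TypeError as err:
    print("TypeError:", err)

# Explicit conversion, as in the example above
print("The answer is " + str(x))

# An f-string converts the value inside the braces to text for us
message = f"The answer is {x}"
print(message)
```

Both approaches produce the same string. Which one reads better is largely a matter of taste, though f-strings tend to be easier to follow once several values are involved.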


## String Indexing

Like C++, Python uses 0-based indexing. It also supports negative indices, which count backwards from the end of a string.

<figure class="image" style="width:50%">
  <img src="https://cdn.programiz.com/sites/tutorial2program/files/python-list-index.png" alt="The word 'probe' is shown with positive indices (left to right) zero through four and negative indices (left to right) -5 through -1. ">
  <figcaption><i>Illustration of string indexing in Python</i></figcaption>
</figure>




In [37]:
s = "Manchester United"

In [38]:
s[0]

'M'

In [39]:
s[-1]

'd'

Take a moment to predict the output of the following: 

In [40]:
s[2], s[-2]

('n', 'e')
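One way to think about negative indices is that `s[-k]` refers to the same character as `s[len(s) - k]`. The following sketch checks that relationship for every valid `k`. The variable `n` is introduced here for convenience.

```python
s = "Manchester United"
n = len(s)   # 17 characters: valid indices are 0..16 and -17..-1

# For any k from 1 to n, s[-k] is the same character as s[n - k]
for k in range(1, n + 1):
    assert s[-k] == s[n - k]

print(s[-2], s[n - 2])

# Indexing past either end raises an IndexError
try:
    s[n]
except IndexError as err:
    print("IndexError:", err)
```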

We can grab substrings using slice notation with `:`. `s[start:stop]` returns the characters starting at index `start`, up to **but not including** index `stop`.

In [41]:
s[0:3]

'Man'

In [42]:
s[-4:-2]

'it'
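Excluding `stop` has a convenient consequence: splitting a string at any index and joining the two pieces gives back the original. The sketch below illustrates this, using the shorthand that an omitted `start` means the beginning of the string and an omitted `stop` means the end. The names `first` and `rest` are made up for this example.

```python
s = "Manchester United"

first = s[:10]    # same as s[0:10]
rest = s[10:]     # same as s[10:len(s)]
print(first)
print(repr(rest))  # repr() makes the leading space visible

# Because `stop` is excluded, no character is duplicated or lost
print(first + rest == s)

# Unlike indexing, slicing past the end does not raise an error
print(s[11:100])
```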

We can also use the syntax `s[start:stop:interval]` to get letters that are `interval` apart. 

In [43]:
# every other letter from index 0 up to (but not including) index 4

s[0:4:2]

'Mn'

In [44]:
# leaving `start` and `stop` blank indicates you want to go through the whole string

s[::2]

'Mnhse ntd'

In [45]:
# s backwards
s[::-1]

'detinU retsehcnaM'
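To make the role of `interval` concrete, the sketch below rebuilds the three slices above by hand using `range(start, stop, interval)`. The helper `slice_by_hand` is hypothetical and only meant for illustration. It assumes `start` lies within the string and does not reproduce every convenience of real slicing, such as omitted bounds.

```python
s = "Manchester United"

def slice_by_hand(text, start, stop, interval):
    """Collect text[start], text[start + interval], ... stopping before `stop`."""
    return "".join(text[i] for i in range(start, stop, interval))

print(slice_by_hand(s, 0, 4, 2) == s[0:4:2])
print(slice_by_hand(s, 0, len(s), 2) == s[::2])

# With a negative interval we walk from the last index down to index 0,
# so the range starts at len(s) - 1 and stops before -1.
print(slice_by_hand(s, len(s) - 1, -1, -1) == s[::-1])
```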

An important thing you **can't** do with string indexing is modify the letters in a string. This is because strings are immutable (cannot be changed once instantiated) as discussed in the prior notebook. 

In [47]:
s[0] = "m"

TypeError: 'str' object does not support item assignment
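Since we cannot change a character in place, one common workaround is to build a new string from the pieces we want to keep. A minimal sketch, with the name `t` chosen for illustration:

```python
s = "Manchester United"

# Combine a replacement character with a slice of the old string
t = "m" + s[1:]

print(t)
print(s)   # the original is untouched
```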

Later, we'll try out a few *string methods*: functions attached to strings that return modified copies, like this:

In [48]:
s = "Chelsea FC"
new_s = s.replace("Chelsea", "Liverpool")
print(new_s)
print(s is new_s)

Liverpool FC
False


As illustrated above, string methods do not change the original string (this isn't possible, as strings are immutable!). Instead, they return the result as a new string, which is why `s is new_s` is `False` here.
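The same pattern holds for other string methods. The sketch below uses `upper()` and `lower()` as two further examples and then shows the usual way to "update" a string variable, which is to rebind the name to the returned value. The names `shouted` and `lowered` are made up for this illustration.

```python
s = "Chelsea FC"

shouted = s.upper()
lowered = s.lower()
print(shouted)
print(lowered)
print(s)   # still "Chelsea FC"

# Rebinding the name: s now refers to the string returned by replace()
s = s.replace("FC", "Football Club")
print(s)
```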