## Strings

Strings are a simple but powerful tool for storing non-numerical information.

String variables store textual data.  To tell Python that we want to treat some text as a string, we use either single quotations `'` or double quotations `"`.  Either format is a valid approach.

In [1]:
fruit = 'banana'

In [2]:
vegetable = "carrot"

Use of single or double quotes can make nested quotations easier.  For instance, what if you want to use an apostrophe inside a string variable?  Consider the apostrophe in:
> This isn't an interesting sentence.

In that case, be sure to use double quotes to mark your string.

In [3]:
sent = "This isn't an interesting sentence." # this code will work

In [4]:
sent = 'This isn't an interesting sentence. # this has problems

SyntaxError: invalid syntax (<ipython-input-4-fcbf08c4a876>, line 1)

Because Python won't be able to differentiate between the `'` character being used as an apostrophe and the `'` character being used to signify a single quote, the easiest way to enter the above example into Python is to use double quotes for the string.


Similarly, use single quotes to mark the string if you want to include double quotes inside the string.  Consider the example:
> Bob said, "Yes, that is a boring sentence."

In this case, the easiest way to enter this as a string in Python is to use single quotes.

In [6]:
quote = 'Bob said, "Yes, that is a boring sentence."'

### Escape Characters

The trickiest problem with quotations is something like the following example:
> Bob said, "Yes, that's a boring sentence."

Note that the textual data we want to include here has both double quotes and a single quote inside the data.  To enter this text into Python as a string variable, we need to use an **escape character**.

Escape characters are special commands inside of a string variable that tell Python that the data is something special.  Without an escape character inside a string, Python interprets every character literally.  For instance, the literal interpretation of `n` is that it represents the letter "n".  The literal interpretation of `'` is that it represents a single quote.  We can "escape" from this literal interpretation of a character by prefacing that character with `\`.  The `\` symbol tells Python to treat the information following the `\` specially.  In the example here, `\'` will tell Python that the `'` character is to be treated as part of the data, rather than a potential end for a single quoted string.

In [7]:
quote = 'Bob said, "Yes, that\'s a boring sentence."'
print(quote)

Bob said, "Yes, that's a boring sentence."


There are only a handful of escape characters to remember.  The rest can be Googled if you need them.  The escape characters worth remembering are:
- `\'`: tells Python to treat `'` as a single quote (apostrophe) inside the string data
- `\"`: tells Python to treat `"` as a double quote inside the string data
- `\n`: tells Python to insert a line break
- `\t`: tells Python to insert a tab

In [9]:
print('Line one of text.\nLine two of text')

Line one of text.
Line two of text


In [10]:
print('\tThis is an indented sentence.')

	This is an indented sentence.
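
The list above also mentions `\"`, which has not been demonstrated yet.  As a minimal sketch (the variable names `quote_single` and `quote_double` are made up for illustration), the same sentence can be written with either kind of outer quote, as long as the matching inner quote is escaped:

```python
# Outer single quotes: the apostrophe must be escaped.
quote_single = 'Bob said, "Yes, that\'s a boring sentence."'

# Outer double quotes: the inner double quotes must be escaped instead.
quote_double = "Bob said, \"Yes, that's a boring sentence.\""

# Both spellings store exactly the same string data.
print(quote_single == quote_double)
print(quote_double)
```

Which version to use is a matter of taste; escaping whichever quote appears less often in the text usually keeps the line easier to read.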


<span style="color:red">**Concept check**:</span>  Set the variable, `v`, below equal to a string such that the second line evaluates to (aka prints out) the following:
```
That isn't a smart investment.
You should buy Dogecoin instead.
    (just kidding)
```

In [None]:
# complete concept check here
v =
print(v)

### Characters

A string is a collection of characters.  The string `'cat'` is composed of the individual characters `'c'`, `'a'`, and `'t'`.  The string `'my\ncat'` is made up of the characters `'m'`, `'y'`, `'\n'`, `'c'`, `'a'`, `'t'`.  Note that the third character in this latter example is an escape character `'\n'` rather than an alphanumeric character (the letters `'a'`-`'z'` or numbers `'0'`-`'9'`) or a symbol (e.g. `'!'`, `'@'`, or `'#'`).

Because strings are composed of characters, it is helpful to know how to access individual characters of those strings.  We will work with the examples `'AAPL Stock'` and `'F Stock'` for a moment.  These two strings are items that we may come across in a written document.  Suppose that we need to extract the ticker names from these strings.  For instance, if these strings are items that we've extracted from a news article or a tweet, then finding the ticker names helps us determine what the subject of the article/tweet is.

The first character of a string is the $0^{th}$ character in the string.  This is a quirk that Python shares with many other programming languages.  Python begins counting at zero, rather than at one.  Hence, the first character is character number $0$, the second character is character number $1$, etc.  We can select character number $n$ with square brackets that follow immediately after the string name.

In [11]:
string = 'cat'
print(string[0])

c


In [12]:
print(string[1])

a


Along with selecting a single character with its position in the string, we can choose a subset of characters with a set of sequential numbers.  Such numbers are listed with a starting point and ending point.  For instance, if we want the numbers $1$, $2$, $3$, then we would indicate the starting point of $1$ and the ending point of $4$.  Yes, $4$, *not* $3$.  The ending point is the number *after* the last number to be included.

Consequently, `0:2` would select characters number $0$ and $1$ because the starting point is $0$ and the ending point is $2$.

In [15]:
print(string[0:2])

ca


Similarly, `1:3` selects characters number $1$ and $2$ because the starting point is $1$ and the ending point is $3$.

In [16]:
print(string[1:3])

at


The ability to select a substring from a string is powerful, as we will see when applied to our example strings `'AAPL Stock'` and `'F Stock'`.
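
As a rough sketch of that idea, the slices below pull the ticker out of each example string.  The ending points `4` and `1` are hard-coded here by counting characters by hand, and the variable names `apple` and `ford` are illustrative only.

```python
apple = 'AAPL Stock'
ford = 'F Stock'

# Characters number 0, 1, 2, and 3 of 'AAPL Stock' spell the ticker.
print(apple[0:4])

# The ticker in 'F Stock' is a single character, so the slice 0:1 is enough.
print(ford[0:1])
```

Hard-coding the ending point only works when the length of the ticker is known in advance, which is rarely the case for text pulled from an article or a tweet.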

### Searching Strings

To find a particular character in a string, we can use the `.find()` function.

For example, `'cat'.find('a')` takes the string `'cat'` and applies the function `.find()` to it.  We specify the character `'a'` inside the function (inside the parentheses) to indicate that we are searching for the character `'a'`.  This command in Python will return the number $1$ because the character `'a'` first appears in the string `'cat'` at position $1$.

In [17]:
'cat'.find('a')

1

Note that `.find()` will give us only the position of the first instance of a desired character.  Suppose that we want to find the position of the character `'o'` in the string `'google'`.  The command `'google'.find('o')` will return the number $1$.  It will never give us back the number $2$, even though the character `'o'` also appears in that position in the string `'google'`.

In [18]:
'google'.find('o')

1

In [19]:
'This is a sentence'.find('is')

2

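As the last example shows, `.find()` also accepts a search string of more than one character, and it returns the position at which the first match begins.  If the search string is *not* found, the `.find()` function will return `-1`.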

In [20]:
'google'.find('a')

-1
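
One way to put `.find()` to work on the earlier ticker examples, sketched here under the assumption that the ticker is always the first word and is followed by a space, is to use the position of the first space as the ending point of a slice.  The variable names are illustrative.

```python
text1 = 'AAPL Stock'
text2 = 'F Stock'

# Position of the first space in each string.
space1 = text1.find(' ')
space2 = text2.find(' ')

# The ticker occupies every character before that space.
ticker1 = text1[0:space1]
ticker2 = text2[0:space2]

print(ticker1)
print(ticker2)
```

If a string contained no space at all, `.find()` would return `-1` and the slice would no longer give a sensible ticker, so this sketch relies on the stated assumption.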

<span style="color:red">**Concept check**:</span>  Find the position of the ticker `NVDA` in the following string.

In [None]:
# complete concept check here
string = 'Analysts expect earnings for NVDA to rise over the next four quarters.'


### String Concatenation

To combine (or *concatenate*) two or more strings, use the `+` character.  Just like `+` combines integers and/or floating point numbers, `+` combines strings.

In [21]:
word1 = 'Apple'
word2 = 'Inc.'
word1 + word2

'AppleInc.'

Notice that `+` does not join together strings with spaces between them.  There are very reasonable applications in which no spacing should occur between strings, and so Python will not assume that you'd like to have spaces between strings.

In the above example, we can safely say that the combined string would look more reasonable with a space between the two words.  To achieve this, simply include a `' '` string as a middle string in what becomes a concatenation of three strings.

In [22]:
word1 + ' ' + word2

'Apple Inc.'
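
The separator does not have to be a space.  As a small sketch (the names `ticker`, `company`, `label`, and `two_lines` are illustrative and not part of the example above), any string can sit between the pieces, including the escape characters introduced earlier:

```python
ticker = 'AAPL'
company = 'Apple Inc.'

# A literal separator string placed between the two pieces.
label = ticker + ': ' + company
print(label)

# An escape character works as a separator too; '\n' puts each piece on its own line.
two_lines = ticker + '\n' + company
print(two_lines)
```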

Note that tricky issues arise if we use `+` in circumstances that include both strings and numbers.  Obviously, one can't add strings and numbers together in any sort of algebraic interpretation of the word "add".  More importantly, Python won't assume that, in the presence of both strings and numbers, a `+` character should be interpreted as concatenation.  To clarify these last two sentences, let's attempt two different lines of code.

First, enter a number plus a string.

In [24]:
8 + '1'

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Python prints out an error.  The message, in a nutshell, informs us that we can't use `+` in the presence of an `int` and a `str` variable type.

Second, try things the other way around.  Enter a string plus a number.

In [25]:
'1' + 8

TypeError: can only concatenate str (not "int") to str

Again, Python prints out an error.  This time the message is different.  It warns us that Python can only concatenate `str` variables with other `str` variables (and not `int` variables).

Why do the messages differ?  In the first case, Python sees a number (integer) first.  It then assumes that the `+` character must be addition, and gets confused and raises an error when, after the `+` character, it encounters a string.  In the second case, Python sees a string first.  It then assumes that the `+` character is meant to signify concatenation, and thus becomes confused and raises an error when it sees an integer.
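
To see both messages side by side without stopping at the first error, the sketch below wraps each expression in `try`/`except`.  That construct is not covered in this section and is used here only for illustration; the names `first_message` and `second_message` are made up, and the exact wording of the messages can vary between Python versions.

```python
# Number first: Python attempts addition and fails.
try:
    8 + '1'
except TypeError as error:
    first_message = str(error)
    print(first_message)

# String first: Python attempts concatenation and fails.
try:
    '1' + 8
except TypeError as error:
    second_message = str(error)
    print(second_message)
```

Both attempts raise the same kind of error, a `TypeError`; only the text of the message depends on which type comes first.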

The solution to the problem of combining strings and numbers together is called *type conversion*, and it will be discussed later in this chapter.