# Data Types: Strings

Think about the type of data that you're likely to see as a data scientist. Does it only contain numbers? Of course not! Much of the data you'll see comes in the form of *text*. Most programming languages have a special data type for storing text, called **strings**. In this section, we'll move beyond using Python as a simple calculator and use it to do some simple text analysis.

# Creating Strings

To create a string in Python, write some text surrounded by double quotation marks (`"`):

In [None]:
"Word"

In [None]:
"More than one word, and punc-tu-a-tion!"

In [None]:
# not a number, since we're using quotes!
"3"

Notice that Python displays our string with single quotes around it -- this is its way of letting us know that it is indeed a string.
As before, we can use the `type` function to see what type Python believes these pieces of data to be:

In [None]:
type("Word")

In [None]:
type("3")

Of course, `str` is short for "string".

As with numbers, we can use variables to store strings:

In [None]:
s = "Hello world!"
s

If you don't like double-quotes, you're in luck: you can write strings using single quote marks just the same:

In [None]:
'Word'

In [None]:
'More than one word, and punc-tu-a-tion!'

Unlike in some other languages, there is no difference between using single quotes and double quotes in Python.

What happens if you don't put anything inside of the quotes?

In [None]:
''

This is fine! We call it an *empty string*.

## More on `'` vs `"`

We said above that there is no real difference between using single quotes and double quotes to create strings, but there are instances where one is preferable over the other.

For instance, what if we wanted to turn the following piece of text into a string?

> Javascript is a "real" programming language.

Notice that the text itself includes quotation marks. If we try to wrap the whole piece of text with double-quotes, Python will get upset:

In [None]:
"Javascript is a "real" programming language.

When Python sees the first `"`, it thinks to itself: OK, this is a string. It continues reading until it finds the second `"`, right before `real`. It then thinks the string is over -- but it isn't.

To avoid confusing Python, we can instead use single-quotes to delineate the string:

In [None]:
'Javascript is a "real" programming language'

In this case, single-quotes were preferable, but that's not always the case. For instance, suppose we want to represent the string:

> Python: a data scientist's best friend.

We can't use single quotes here because of the apostrophe in "scientist's". So we surround the string with double quotes:

In [None]:
"Python: a data scientist's best friend"

There is another way we can handle strings containing single or double quotes: we can "escape" the quote character by prefixing it with a backslash `\`. This tells Python to treat the character differently than it normally would -- in this case, it tells Python not to end the string.
This is very helpful when both single quotes *and* double quotes appear in the string!

In [None]:
'They said, "escaping isn\'t so bad," and I believe them!'
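As a small sketch of what escaping does, the cell below stores the escaped string in a variable and prints it (the names `quote` and `same_quote` are only for illustration). The backslash is part of how we *write* the string, not part of the string itself, so it does not appear in the printed text.

```python
# The backslash lets a single quote live inside a single-quoted string.
quote = 'They said, "escaping isn\'t so bad," and I believe them!'

# The same text wrapped in double quotes needs the double quotes escaped instead.
same_quote = "They said, \"escaping isn't so bad,\" and I believe them!"

print(quote)
print(same_quote)
```

Both lines print identical text, because the two ways of writing the string describe the same characters.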

#### Combining strings

Interestingly, we can add strings together using the addition operator `+`. This glues the strings together end to end, and is commonly called **concatenation**.

In [None]:
"one fish" + "two fish" + "three fish"

**Question**:
Given the following variables, write an expression that concatenates the two strings and adds a space in between. The output should be `'red fish blue fish'`.

```
string1 = "red fish"
string2 = "blue fish"
```


<details><summary><b>Answer</b>:</summary>

```
string1 + ' ' + string2
```

</details>
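Concatenation never inserts spaces or punctuation on its own, so any separator has to appear as a string in the expression. A minimal sketch, using made-up variable names:

```python
first = "Grace"
last = "Hopper"

# Without a separator the two strings are glued directly together.
glued = first + last              # 'GraceHopper'

# A separator is just another string in the chain.
full_name = first + " " + last    # 'Grace Hopper'
listing = last + ", " + first     # 'Hopper, Grace'
```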

#### String methods

Finally, once you create a string, that string possesses an extra set of functions that are unique to strings.

```{margin}
When a specific type of object has its own set of functions, we call those functions 'methods'.  You'll see in the next chapter that more complex data types usually have lots of methods!
```

Methods can be called directly on a string, or on a variable that holds a string. The following methods return a new copy of the string they're called on, but with different capitalization -- they can be very useful when you're cleaning data.

In [None]:
my_string = "JuSt A sTrInG"

In [None]:
my_string.lower()

In [None]:
my_string.upper()

In [None]:
my_string.title()
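These methods return a *new* string and leave the original untouched. The sketch below shows this, along with one way lowercasing might help when the same value was typed with inconsistent capitalization (the `answer_*` values are hypothetical survey responses, not real data):

```python
my_string = "JuSt A sTrInG"
lowered = my_string.lower()    # 'just a string'

# my_string still holds the original text: 'JuSt A sTrInG'
unchanged = my_string

# Two hypothetical answers that differ only in capitalization
answer_1 = "New York"
answer_2 = "NEW YORK"
clean_1 = answer_1.lower()     # 'new york'
clean_2 = answer_2.lower()     # 'new york'
```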

Notice that a method is accessed by placing a dot after the string, and then calling the function name.

```{margin}
This is commonly called **dot notation**. It indicates that whatever comes after the dot *belongs* to the object before the dot.
```

The `replace` method is extremely powerful, since it allows us to find and replace sections of a string. The previous string methods we looked at took no arguments, but the `replace` method takes two arguments: *the text to find*, and *the text to replace it with*.

In [None]:
'found you'.replace('you', 'Waldo')

Remember the empty string `''`? It's used a lot with `replace` in order to get rid of parts of text entirely! Notice that the text must match *exactly*, and is case sensitive!

In [None]:
'where\'s Waldo'.replace('w', '')
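Two details of `replace` are worth seeing in action: it replaces *every* match, not just the first one, and a difference in capitalization means no match at all. A short sketch with made-up strings:

```python
text = "Waldo was here. Where is waldo now?"

# Only the capitalized 'Waldo' matches; the lowercase 'waldo' is left alone.
renamed = text.replace("Waldo", "Carmen")   # 'Carmen was here. Where is waldo now?'

# Every 'e' is removed, not just the first one.
no_e = "cheese".replace("e", "")            # 'chs'
```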

Since the string methods we've looked at return more strings, we can even call more string methods on the result!

In [None]:
s = 'started with words'
t = s.replace('started', 'ended')
u = t.replace('words', 'a sentence')
v = u.capitalize()
w = v + '.'
w
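The intermediate variables `t`, `u`, and `v` are optional. Because each method call returns a string, the same steps can be chained in a single expression. This sketch (the name `w_chained` is just for illustration) should produce the same text as `w`:

```python
s = 'started with words'

# Each call runs left to right on the result of the previous one;
# the final '+' appends the period to the capitalized string.
w_chained = s.replace('started', 'ended').replace('words', 'a sentence').capitalize() + '.'
w_chained   # 'Ended with a sentence.'
```

Chaining is more compact, while the step-by-step version makes it easier to inspect each intermediate result.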

### True/False - Booleans

A **Boolean** (named after [George Boole](https://en.wikipedia.org/wiki/Boolean_algebra)) is a logical data type, indicating whether something is True or False. It has the type `bool`.

In [None]:
type(True)

#### Comparisons

Boolean values result when we use comparison operators to compare the value of two expressions.

The standard set of comparison operators carries over from math:
- Less than `<`
- Less than or equal `<=`
- Greater than `>`
- Greater than or equal `>=`
- Equal `==`
- Not equal `!=`

Notice that the equal comparison operator distinguishes itself from the assignment operator by using *two* equal signs.

In [None]:
1 == 0
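To make the difference concrete, here is a small sketch (the variable names are arbitrary): a single `=` stores a value, while `==` asks a question and produces a Boolean.

```python
x = 3                    # assignment: store 3 in x, nothing is compared
is_three = x == 3        # comparison: True
is_four = x == 4         # comparison: False
result_type = type(is_three)   # bool
```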

Any expressions to the left or right of the comparison operator will be evaluated before the comparison is carried out.

In [None]:
(3 * 4) / 6 < 1 + 2 + 3 + 4

And if there are multiple comparison operators in an expression, then each comparison must evaluate to True in order for the entire expression to be True.

In [None]:
1 < 0 + 2 < 3

In [None]:
1 != 3 <= 2

Since `3 <= 2` is False, the above expression evaluates to False.
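One way to read a chained comparison is as its pairwise comparisons joined together, with the middle expression shared between them. The sketch below uses Python's `and` operator, which has not been introduced on this page, purely to spell that reading out:

```python
chained = 1 != 3 <= 2             # False

# The same check written as two separate comparisons.
left = 1 != 3                     # True
right = 3 <= 2                    # False
spelled_out = left and right      # False, because 'right' is False
```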

We can use comparison operators on all sorts of things! For example, we can use the `==` and `!=` operators to check if *any* two objects are equal in value.

In [None]:
'Ronaldo' == 'Waldo'

In [None]:
True != False

In [None]:
sum == sum

And many objects support greater than/less than comparison too. For instance, a string is less than or greater than another string based on alphabetical order.

```{margin}
Technically, string comparisons compare using **lexicographical** order, which just means that text including numbers and symbols is also ordered.
```

In [None]:
"Avocado" < "Banana" < "Cantaloupe"

In [None]:
"1. Learn about Python" < "2. Learn about data science" < "3. Profit"

Notice that if you look at a dictionary, words like "Fire" will show up before "Fireplace" -- the same holds true with string comparisons in Python.

In [None]:
"Fire" < "Fireplace" < "Fireplaces"

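String ordering does differ from a dictionary's in a few ways. Comparison happens character by character, every uppercase letter sorts before every lowercase letter, and digits inside a string are compared as characters rather than as numbers. A few illustrative cases (variable names are just labels):

```python
upper_first = "Zebra" < "apple"    # True: uppercase 'Z' sorts before lowercase 'a'
as_text = "10" < "9"               # True: '1' sorts before '9', character by character
as_numbers = 10 < 9                # False: numbers are compared by value

# Lowercasing both sides gives the familiar alphabetical order.
alphabetical = "Zebra".lower() < "apple".lower()   # False
```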
## A new type of error -- TypeErrors

We saw that we can use the addition operator to add two strings. But what happens if we try to add a number and a string?

In [None]:
1 + '2'

We've successfully stumbled on another very common error, the `TypeError`. This happens when we try to use an operator on a type of object that doesn't support it!

Another type error (with a different explanation) arises if we try switching the order of the string and number.

In [None]:
'2' + 1

Or if we try to use an operator that strings don't understand, like subtraction.

In [None]:
'2' - '2'

Sometimes we'll get a type error when using a function call on an object of the wrong type, too. For example, if we try using a math function on a string.

In [None]:
import math
math.log('2')

**Tip**

Notice that the explanation that follows the `TypeError` at the very bottom will almost always explain the data types of your operands -- which is helpful to figure out exactly what variables or values lead to the code breaking!
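A common way to resolve this kind of `TypeError` is to convert one operand so that both have the same type. The sketch below assumes the built-in `int` and `str` conversion functions, which this page has not covered; which one to use depends on whether you want arithmetic or concatenation.

```python
as_sum = 1 + int('2')      # 3: the string '2' is converted to the number 2
as_text = str(1) + '2'     # '12': the number 1 is converted to the string '1'
```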

## Other data types

Once we move past primitives, our objects start getting more complex!

Remember, everything has a type. We saw at the beginning of this page that even functions are a type of object.

Besides functions, the most common types of objects we'll observe will act mostly like containers for raw data. Examples of these container-like data types include lists, arrays, and tables -- all of which are extremely useful, and all of which you'll get to know soon!

---
## Summary

Everything in Python has a type -- these are called **data types**.

We can find the type of an object by calling the `type` function on an object or expression.

There are four **primitive** types that represent raw data:
- **Integers** `int` are whole numbers
- **Floats** `float` are numbers with decimals
- **Strings** `str` are text
- **Booleans** `bool` are True/False
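
As a quick check of the list above, calling `type` on one literal of each kind reports the four primitive types (the variable names here are just labels):

```python
int_type = type(7)           # int
float_type = type(7.0)       # float
str_type = type("seven")     # str
bool_type = type(True)       # bool
```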

Division with `/`, or any arithmetic expression that involves a float, produces a float.

Multiple strings can be glued together using `+`.

Strings own a handful of **methods** -- functions that belong solely to the data type of strings.

Methods are called using **dot notation**, by placing a dot after a string or variable name of a string, then calling the function: `my_string.function_name(arguments, ...)`

Some string methods allow you to create new strings that change capitalization or find and replace snippets of text.

Lots of objects can be compared using **comparison operators**, `<` `<=` `>` `>=` `==` `!=`, which will return a boolean value.

Trying to perform an operation on data types that don't support that operation will often result in a **TypeError**.

Most other data types that aren't primitive are either functions or act like containers for primitive types.