##### <img src="../SDSS-Logo.png" style="display:inline; width:500px" />


# Learning Objectives
1. Understand the string data type in Python.
2. Understand how to index and slice strings in Python.
 

## Strings in Python are used to hold a sequence of characters
### The sequence of characters can be enclosed in single quotes (') or double quotes ("), but the two ends must match


In [1]:
## Strings
"Creating a string"

'Creating a string'

In [2]:
## Single quotes work also
'Another string'

'Another string'

In [None]:
# But the two sides should match
"Not a string'

SyntaxError: unterminated string literal (detected at line 2) (519509423.py, line 2)


## Triple `"` or `'` can be used to create strings.
### This is useful for block strings
### And is frequently used as a docstring to describe what a function does

In [None]:
x = """It is wonderful
to be a TarHeel"""
print(x)

It is wonderful
to be a TarHeel
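As a small illustration of the docstring use mentioned above, here is a sketch with a hypothetical function `greet` (it is not part of the original notebook). A triple-quoted string placed first in the function body becomes the function's docstring, which Python stores in `greet.__doc__`.

```python
def greet(name):
    """Return a short greeting for `name`.

    This block string is the function's docstring.
    """
    return "Hello, " + name


# The docstring is stored on the function and is what help(greet) displays
print(greet.__doc__)
print(greet("TarHeel"))
```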


## You can check the type of a string

In [None]:
# Check string type
type('Yet another string')

str

## Addition operator on strings

You can use the add operator between strings.
This is very useful as it **concatenates** the two strings.

Let's add `'Hello' + 'World'` and assign it to variable y.
Let's make variable z have the value `'Hello ' + 'World'`. Note that compared to y, z has a blank after `Hello`.

Let's print out y and z using the `print` function.

In [None]:
y = "Hello" + "World"
z = "Hello " + "World"

print('y=', y)
print('z=', z)

y= HelloWorld
z= Hello World
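One thing to watch for: the `+` operator only concatenates strings with other strings. The sketch below uses a hypothetical variable `count` to show one way to combine text with a number, either by converting with `str()` or by using an f-string.

```python
count = 3

# "Total: " + count would raise a TypeError, because count is an int
message = "Total: " + str(count)
print(message)

# An f-string builds the same text without an explicit conversion
same_message = f"Total: {count}"
print(same_message)
```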


## Multiplication operator on strings
You can even multiply a string by an integer.
This is very useful if you need to build a string of a given length or repeat a string several times.

Imagine having to type a string of 100 blanks by hand.
What about a thousand blanks!

See what happens when you type `y * 3` in the cell below.

In [None]:
y * 3

'HelloWorldHelloWorldHelloWorld'
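The blank-string idea above can be sketched as follows. The names `blanks`, `label`, `width`, and `padded` are chosen here for illustration only.

```python
blanks = " " * 100          # a string of 100 blanks
print(len(blanks))

# Pad a label with blanks so it is exactly `width` characters long
label = "Name"
width = 10
padded = label + " " * (width - len(label))
print(repr(padded))

# A simple divider line
print("-" * 20)
```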

## Length of a string
In many cases, it is useful to know how many characters there are in a string. This is called the length of the string. <br>

In Python, you can use the `len` built-in function to find the number of characters (or length) of a string.
To read more, click on this link, [`len`](https://www.w3schools.com/python/ref_func_len.asp).  

In the cell below, let us calculate the length of variable `y`.

In [None]:
len(y)

10

# Indexing
### A string is an example of a Python object that is made up of a number of other objects (in this case, characters) in a specified sequence.
### There are many Python objects like that, as we will see.
### An important thing to learn is how to access pieces of such objects - this comes up again and again.
### This is done using **indexing**.

### The first character in a string has the index 0, the next character has index 1 etc.

Now, what if we want just the first letter of a string stored in variable `x`, such as `'My string'`?

The square brackets are the notation used to reference the individual letters of the string.
The integer within the brackets is the **index**.

Remember that tidy computer scientists count from zero, so to reference the first character of a string
use the integer 0.
The first character of a string `x` is `x[0]`.
The second character of a string `x` is `x[1]` and so on.

In [None]:
x = 'My string'
x[0]

'M'

In [None]:
x[1]

'y'

### Since there are `len(x)` characters in the string `x` and we start indexing with 0, the index of the last character is `len(x) - 1`

In [None]:
x[len(x) - 1]

'g'

### What if we use an index larger than that?

In [None]:
x[len(x) + 2]

IndexError: string index out of range

### What if you use a negative index?

In [None]:
x[-1]

'g'

### Surprisingly, negative integers can also index a single character in a string

`x[-1]` is the last character of the string x.
`x[-2]` is the next-to-last character of x. <br>
And so, `x[-len(x)]` is the first character of the string x.
<br>
<br>

In [None]:
print(x[-1])
print(x[-2])
print(x[-3])


print(x[-len(x) + 1])
print(x[-len(x)])

g
n
i
y
M


### Too far negative will also lead to an index error

In [None]:
x[-25]

IndexError: string index out of range

### Putting all this indexing stuff together
To reference a character of the string $x$,
use a non-negative integer between 0 and $len(x)-1$ (one less than the length of the string). Or, to index backwards, use a negative integer between $-1$ and $-len(x)$.
So valid indexes run from $-len(x)$ to $len(x)-1$.  <br><br>
$x[n] = \begin{cases}
\text{IndexError}, & n \ge \text{len}(x),\\
(n+1)^{\text{th}} \text{ character of } x, & 0 \le n \le \text{len}(x)-1,\\
(\text{len}(x)+n+1)^{\text{th}} \text{ character of } x, & -\text{len}(x) \le n \le -1,\\
\text{IndexError}, & n < -\text{len}(x)
\end{cases}$  <br><br>
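The rule above can be checked with a short sketch. It assumes `x = "My string"` as in the earlier cells; the names `n`, `bad_index`, and `out_of_range` are introduced here only for illustration.

```python
x = "My string"
n = len(x)

# Every valid negative index i refers to the same character as index len(x) + i
for i in range(-n, 0):
    assert x[i] == x[n + i]

# Indexes outside the range -len(x) .. len(x) - 1 raise an IndexError
out_of_range = []
for bad_index in (n, -n - 1):
    try:
        x[bad_index]
    except IndexError as err:
        out_of_range.append(bad_index)
        print(bad_index, "->", err)
```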


# Slicing
### What if you want to access subsets of the string as opposed to just single characters?
### This is where slicing using the indexes comes in.

### To access the portion of the string from index `i1` up to and including index `i2-1`, put the two indexes, separated by a `:`, within the square brackets: `x[i1:i2]`.

### For example, to access from index 1 to index 3, write `x[1:4]`

In [None]:
x = "Consider this string here"
x[1:4]

'ons'
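A few more properties of slices are sketched below, using the same example string. The names `i1`, `i2`, and `piece` are illustrative. The points to notice are that the slice `x[i1:i2]` has `i2 - i1` characters, that either bound may be omitted, and that a slice reaching past the end of the string does not raise an error.

```python
x = "Consider this string here"

i1, i2 = 1, 4
piece = x[i1:i2]
print(piece)                     # characters at indexes 1, 2 and 3
print(len(piece) == i2 - i1)

# Omitted bounds default to the start and the end of the string
print(x[:8])
print(x[9:])

# Unlike indexing, slicing past the end does not raise an IndexError
print(x[20:100])
```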

## In general you can slice using `start-index:stop-index:step-size`, where step-size is how much to increment between indexes.

### For example, to get the 2nd, 4th, 6th and 8th characters (indexes 1, 3, 5 and 7), use `x[1:8:2]`

In [None]:
x = "Consider this string here"
x[1:8:2]

'osdr'

### You can use negative indexes while slicing, as well as negative step sizes.
### See if you can understand why the output is what you get below.

In [None]:
print(x[-4:-1])

her


In [None]:
print(x[-4::])

here


In [None]:
print(x[-1::-2])

ee nrssh einC


In [None]:
# Reverse a string? Why does this work?
print(x[-1::-1])

ereh gnirts siht redisnoC
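One way to think about the reversal: with a negative step, the slice walks backwards, and an omitted stop means "continue past the first character". The sketch below shows that `x[::-1]` gives the same result, and uses it in a hypothetical helper, `is_palindrome`, that is not part of the original notebook.

```python
x = "Consider this string here"

# With a negative step, an omitted start defaults to the last character,
# so these two slices are the same
reversed_x = x[::-1]
print(reversed_x == x[-1::-1])


# Hypothetical helper: a word is a palindrome if it equals its own reverse
def is_palindrome(word):
    return word == word[::-1]


print(is_palindrome("level"))
print(is_palindrome("string"))
```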


# String Methods
## Everything in Python is an object.
### Objects have associated attributes.
### There are two types of attributes to an object:
- Data attributes
- Methods
<br>
<br>

## Strings have a number of [useful methods](https://docs.python.org/3/library/stdtypes.html#string-methods) that can be used for string processing.
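As a quick sketch of what calling a method looks like, the example below applies three standard string methods (`upper()`, `lower()`, and `replace()`) to an illustrative string `s`. Methods are called with dot notation, and because strings are immutable, each call returns a new string and leaves the original unchanged.

```python
s = "Hello World"

# A method is called with dot notation: object.method(arguments)
print(s.upper())
print(s.lower())
print(s.replace("World", "TarHeel"))

# Strings are immutable: the methods returned new strings, s is unchanged
print(s)
```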


### We will not look at all of the string methods here; you can look them up as you need them.

Instead, let us look at some simple string problems and see how string methods are useful.

Let us start with defining a string that we will use.

In [None]:
my_string = '''
When Katrina Morgan explains to people that she studies math as a doctoral candidate in the College of Arts & Sciences,
they often respond unfavorably, saying they just “aren’t a math person".
“You wouldn’t just say, “Oh, I’m not a history person,’” Morgan said.
She argues that mathematics is a field where people decide whether they are or are not fit for the field very early in their academic careers.
This is especially true for girls because there is less representation of women in mathematics, Morgan explained.
In 2016, Morgan, along with fellow doctoral student Francesca Bernardi,
set out to change that by founding Girls Talk Math, which invites high school girls from North Carolina
to UNC-Chapel Hill to participate in a two-week day camp that explores mathematical concepts.
'''

### The `find()` method can be used to find a substring within a string. It returns the index of the start of the first occurrence of the substring if it is found, and -1 otherwise.

Let us see if we find some substrings in `my_string`.

In [None]:
substring = 'math'
print(f"Index of {substring} is {my_string.find(substring)}")

Index of math is 57
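The sketch below, on a short made-up string `text`, shows two details of `find()` that are easy to miss: the optional second argument gives the index at which to start searching, and a missing substring yields -1 rather than an error.

```python
text = "math is fun, and math is useful"

first = text.find("math")                # index of the first match
second = text.find("math", first + 1)    # search again, starting after the first match
missing = text.find("physics")           # not present, so find() returns -1

print(first, second, missing)

# Check against -1 before using the result as an index
if missing == -1:
    print("'physics' was not found")
```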


### The `count()` method counts how many times a given substring occurs within the string.

In [None]:
print(f"{substring} occurs {my_string.count(substring)} times")

math occurs 5 times
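Note that `count()` is case-sensitive and also matches inside longer words. A minimal sketch on a made-up string `text` follows. Converting to lowercase first with `lower()` is one possible way to get a case-insensitive count.

```python
text = "Girls Talk Math explores math and mathematics"

lower_matches = text.count("math")              # "math" and "mathematics"
capital_matches = text.count("Math")            # only the capitalized "Math"
any_case_matches = text.lower().count("math")   # ignores case

print(lower_matches, capital_matches, any_case_matches)
```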


## Find the second sentence in `my_string`

- We assume that each sentence ends in a period `.`
- We can use `find()` to get the index of the first `.`, `first_index`
- We can use `find()` again, starting the search just after `first_index`, to get the index of the second `.`, `second_index`
- The second sentence then stretches from `first_index + 1` through `second_index`.

In [None]:
first_index = my_string.find('.')
second_index = my_string.find('.', first_index + 1)
print(f"Second sentence is {my_string[first_index+1:second_index+1]}")


Second sentence is 
“You wouldn’t just say, “Oh, I’m not a history person,’” Morgan said.


### Hmmm, why are we getting the second sentence on the next line?

In [None]:
my_string[first_index+1]

'\n'

### Aha, it is because the original string has a newline character right after the first period.

The string method `strip()`, which removes leading and trailing whitespace, can help here

In [None]:
print(f"Second sentence is {my_string[first_index+1:second_index+1].strip()}")

Second sentence is “You wouldn’t just say, “Oh, I’m not a history person,’” Morgan said.
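The same idea can be extended to collect every sentence, not just the second. The sketch below is only one possible approach. It uses a short made-up string `text` instead of `my_string`, and it keeps the assumption that every period ends a sentence, so abbreviations or decimal numbers would be split incorrectly.

```python
text = """
First sentence here.
Second one follows.
And a third.
"""

sentences = []
start = 0
end = text.find('.', start)
while end != -1:
    # Keep the period, then strip the newline left over from the line break
    sentences.append(text[start:end + 1].strip())
    start = end + 1
    end = text.find('.', start)

print(sentences)
```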
