## Strings

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/b/b4/0321_DNA_Macrostructure.jpg/300px-0321_DNA_Macrostructure.jpg" width="33%">

A [*string*](https://en.wikipedia.org/wiki/String_%28computer_science%29) is a sequence of characters. We might commonly call it *text*.

A *string literal* is a sequence of characters, between single or double quotes, written directly in the source code.

Let's look at some things we can do with strings in Python.

### Valid strings

In [None]:
# Valid strings

print('Single quoted: NYU')
print()

print("Double quoted: NYU")
print()

print('''Triple quoted: NYU has established the Center for
Social Media and Politics, which will examine the production,
flow, and impact of social media content in the political sphere,
as well as support research that uses social media data to study politics.

Photo credit: adamkaz/Getty Images
Will Focus on Production, Flow, and Impact of Content—and Methods
to Use Social Media Data to Study Politics''')

Why choose ' or "?

Think about whether your string has quotes inside it:

In [None]:
s = "Don't use `s` as var name in your labs!"

s = 'She said "Stop in love\'s name!"'
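As a quick check, here is an illustrative sketch (the names `with_double` and `with_single` are made up for this example). The quote style you pick does not change the resulting string; it only changes which quotes need escaping:

```python
# The same text written with each quote style; names are illustrative only.
with_double = "Don't stop"     # apostrophe needs no escape inside "..."
with_single = 'Don\'t stop'    # apostrophe must be escaped inside '...'

print(with_double == with_single)  # the two literals build identical strings
print(len(with_single))            # the escaping backslash is not stored
```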

### Strings as a Sequence

We can "get at" a particular character in a string using the square-bracket `[]` operator. This is called *indexing* the string. The number in the brackets is called the *index*. Recall, programmers like to start counting at 0!

In [None]:
#H e l l o   N e w   Y o r k !
#-----------------------------
#0 1 2 3 4 5 6 7 8 9 1 1 1 1 1
#                    0 1 2 3 4
#-----------------------------
#        (negatives... below)
#1 1 1 1 1 1 9 8 7 6 5 4 3 2 -1
#5 4 3 2 1 0

my_string = "Hello New York!"

print('my_string =', my_string)
print('my_string[0] =', my_string[0])
print('my_string[14] =', my_string[14])
print('my_string[-1] =', my_string[-1])
print('my_string[-2] =', my_string[-2])
print('my_string[-15] =', my_string[-15])

In [None]:
-0
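The cell above evaluates `-0`, which is simply `0`. A small sketch of what that implies for indexing (it repeats the `my_string` value from the earlier cell so it can run on its own): negative indices start at `-1`, because `-0` would just pick the first character again.

```python
my_string = "Hello New York!"

print(-0 == 0)                        # True: -0 is the same integer as 0
print(my_string[-0] == my_string[0])  # True: both pick the first character
print(my_string[-1])                  # the last character, '!'
```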

What happens if we use an invalid index?

In [None]:
print('my_string[15] = ', my_string[15])

What will be the result of running the following code?

a) It will print 'H'  
b) It will print 'e'  
c) It will print 'Hello New York!'  
d) We will get an 'IndexError' message

In [None]:
print(my_string[-16])

### Special Characters

"Special" characters don't have a direct typed representation. They are specified using the backslash character followed by their special designation. We say the backslash *escapes* the character we type after it.

In [None]:
print("print a \t (tab)")

In [None]:
# Non-printing characters

# Some characters perform necessary operations but show up as whitespace in
# the output

# Newline - \n
# Tab -     \t

print("This is the first line.\nThis is the second line.\n\tA tabbed line.")

In [None]:
print("aaaa\fform feed")

In [None]:
print("abcdefghijklmnopqrstuvwxyz\rcarriage return")

In [None]:
print("abcdef\v\vghijkl\t\123\xF2\a")

In [None]:
print('''W\nh\na
t
h
a
p
p
ens if we 
    try multiline print
with triple quotes?''')

#### Some special characters:

- `\\` Backslash (`\`)
- `\'` Single quote (`'`)
- `\"` Double quote (`"`)
- `\a` ASCII Bell (BEL)
- `\b` ASCII Backspace (BS)
- `\f` ASCII Formfeed (FF)
- `\n` Newline: ASCII Linefeed (LF)
- `\r` ASCII Carriage Return (CR)
- `\t` Tab: ASCII Horizontal Tab (TAB)
- `\v` ASCII Vertical Tab (VT)
- `\ooo` Character with octal value *ooo*
- `\xhh` Character with hex value *hh*

([**ASCII**](https://en.wikipedia.org/wiki/ASCII) stands for *American Standard Code for Information Interchange*. It is a long-established standard for encoding characters as numbers. Python 3 strings use Unicode, whose first 128 codes are the same as ASCII.)

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/c/cf/USASCII_code_chart.png/500px-USASCII_code_chart.png" width="30%">

In [None]:
print("Helloo\b")

In [None]:
print("\vHello: \xfe")

In [None]:
s = "This is the backslash: \\"
print(s)
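One way to convince yourself that an escape sequence stands for a single character is to measure it with the built-in `len()`. The sketch below is illustrative only; the variable names are made up for the example.

```python
# Each escape sequence is typed as two or more keystrokes
# but is stored as exactly one character.
backslash = "\\"
newline = "\n"

print(len(backslash))   # 1
print(len(newline))     # 1

# Octal 101 and hex 41 are both 65, the code for 'A'.
print("\101" == "A")    # True
print("\x41" == "A")    # True
```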

### Strings Are Immutable

That means they can't be changed! (But we can create a new string from an old one, as we will see.)

Let's try to change a string:

In [None]:
kids_name = "Fazzlebop"
kids_name[0] = "D"
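Running the cell above raises a `TypeError`, because strings do not support item assignment. Here is a minimal sketch of the usual workaround, building a new string instead. It assumes slicing (`kids_name[1:]`, meaning "everything from index 1 onward"), which this notebook has not introduced yet, and the name `new_name` is made up for the example.

```python
kids_name = "Fazzlebop"

# Concatenate a new first letter with the rest of the old string.
new_name = "D" + kids_name[1:]

print(new_name)   # Dazzlebop
print(kids_name)  # Fazzlebop: the original string is unchanged
```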

### Traversing a string

"To traverse" is to pass through.

We can use a `for` loop to traverse a string:

In [1]:
ramble_rose_lyric = "Goodbye mama and papa"
for char in ramble_rose_lyric:
    print(char, end=",")  # by default, print(char) acts like print(char, end='\n')

G,o,o,d,b,y,e, ,m,a,m,a, ,a,n,d, ,p,a,p,a,

What value do we need for `X` in `end=X` to get `print`'s default behavior back?

a) can't do it  
b) end="\t"  
c) , s  
d) end="\n"  

What do we put instead of the comma to get tab-separated characters?

a) tab  
b) \tab  
c) \n  
d) \t  

### Transposing a string

"To transpose" means to reverse the elements of some sequence.

How can we transpose a string?

We're going to use the Python built-in function `len()`, which returns the length of a sequence.

In [4]:
lyric = "The transitive nightfall of diamonds"
print("Length of lyric is", len(lyric))
print("Position of first char ('T') counting backwards is:",
      -len(lyric))
print(lyric[-len(lyric)])

Length of lyric is 36
Position of first char ('T') counting backwards is: -36
T


In [7]:
for i in range(-1, -len(lyric) - 1, -1):
    print(lyric[i], end="")

sdnomaid fo llafthgin evitisnart ehT

We can turn this into a function:

In [8]:
def reverse_str(s):
    for i in range(-1, -len(s) - 1, -1):
        print(s[i], end="")

def main():
    s = "A man a plan a canal Panama"
    reverse_str(s)
    
main()

amanaP lanac a nalp a nam A
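`reverse_str` prints the characters but does not give the caller a reversed string to work with. As a sketch of one alternative (the name `reversed_copy` is hypothetical, not part of the original notebook), the same loop can build up and return a new string:

```python
def reversed_copy(s):
    """Return a new string with the characters of s in reverse order."""
    result = ""
    # Walk the indices from -1 (last char) down to -len(s) (first char).
    for i in range(-1, -len(s) - 1, -1):
        result += s[i]
    return result


print(reversed_copy("A man a plan a canal Panama"))
```

Returning the value instead of printing it lets the caller store the result, compare it, or print it later.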

Let's see how to loop over a string with a `while` loop instead. We need to initialize a loop variable, then increment it in the loop body. The cell below shows the `for` version first, then the `while` version:

In [12]:
s = "Some string"

for c in s:
    print(c, sep=":", end=" | ")

print("\n\n", "*"*30, "\n")

i = 0
while i < len(s):
    print(i, s[i], sep=":", end=" | ")
    i += 1

S | o | m | e |   | s | t | r | i | n | g | 

 ****************************** 

0:S | 1:o | 2:m | 3:e | 4:  | 5:s | 6:t | 7:r | 8:i | 9:n | 10:g | 

Look at the `while` condition above. If we want our code to handle strings of *any* length, `i` must stay strictly below which of the following?

a) 7  
b) len(s)  
c) len(s) - 1  
d) -len(s)

### String Comparisons

The relational operators we studied in our [Boolean Expressions notebook](https://github.com/gcallah/IntroPython/blob/master/notebooks/BooleanExpr.ipynb) can be used on strings:

In [14]:
s1 = "A"
s2 = "B"
s1 < s2

True

In [15]:
'A' < 'a'

True

In [16]:
'9' < 'A'

True

In [17]:
"!" < "&"

True

Strings are compared one character at a time, and the first place they differ... that's your answer!

When the strings compared are equal up to the length of the shorter string, the longer one will be "greater":

In [18]:
s1 = "Hello"
s2 = "Hello!"
s2 > s1

True

In [19]:
s1 == s2

False

In [20]:
s1 > s2

False
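Underneath, Python compares characters by their numeric code points, which the built-in `ord()` function reveals. The following sketch simply prints those numbers for the characters compared above:

```python
# ord() returns the integer code point of a one-character string.
for ch in "!&9ABa":
    print(repr(ch), ord(ch))

# Comparing two characters gives the same answer as comparing their codes.
print(("A" < "a") == (ord("A") < ord("a")))
```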

Furthermore, upper case letters are not equal to lower case letters, and "come before" them, so:

In [21]:
'A' == 'a'

False

In [22]:
'A' < 'a'

True

Note how string comparisons can produce a different result than int comparisons:

In [23]:
print("1" < "9")
print(1 < 9)

True
True


In [25]:
"1000000" < "9"

True
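As strings, `"1000000"` sorts before `"9"`, because the comparison is decided by the first characters, `"1"` and `"9"`. If the strings hold whole numbers and a numeric comparison is what you want, one option is to convert them with `int()` first. A minimal sketch:

```python
a = "1000000"
b = "9"

print(a < b)            # True: compared character by character, "1" < "9"
print(int(a) < int(b))  # False: compared as the numbers 1000000 and 9
```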

The empty string has length 0, so it compares as smaller than any non-empty string:

In [26]:
print('' < 'a')

True


Is the following `True` or `False`?

In [None]:
"Hello!" > "hello"

a) True  
b) False

In [27]:
"Andy123" == "Andy 123"

False

### The `in` Operator

For strings, the `in` operator tests whether one string occurs as a contiguous substring of another.

In [28]:
"abc" in "abcd"

True

In [29]:
"abd" in "abcd"

False

What sort of value does `in` return?

a) int  
b) None  
c) boolean  
d) float

So, we can use this in an `if` statement:

In [30]:
s1 = "to be"
s2 = "or not to be"

str_to_find = "not to"

if str_to_find in s1:
    print("Found it in s1")
elif str_to_find in s2:
    print("Found it in s2")
else:
    print("String not found")

Found it in s2
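A few more cases worth trying, as an illustrative sketch (the name `word` is made up for the example): for strings, `in` looks for the characters side by side and in the same order, and it is case-sensitive.

```python
word = "abcd"

print("bc" in word)   # True: "bc" appears as a contiguous run
print("ac" in word)   # False: "a" and "c" are both present, but not adjacent
print("ABC" in word)  # False: upper and lower case letters are different
print("" in word)     # True: the empty string is found in every string
```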
