# Chapter 6: Strings

### A String is a sequence

**String** - a sequence of characters which can be accessed one at a time with the bracket operator

In [2]:
fruit = 'banana'
letter = fruit[1]  #extract the character at index position 1
print(letter)

a


The expression in **[brackets]** is called an **index**. It indicates which character in the sequence you want.

In Python, the index is an offset from the beginning of the string, and **the offset of the first letter is zero**.

The value of the index has to be an integer.

In [3]:
letter = fruit[0]
print(letter)

b


In [5]:
# raises an error if the index is not an integer
letter = fruit[1.5]
print(letter)

TypeError: string indices must be integers

-------------------

### Getting length of a string using `len()`

**len()** - built-in function that returns the number of characters in a string:

In [6]:
fruit = 'banana'
len(fruit)

6

In [7]:
# raises an error because there is no character at index position 6
length = len(fruit)
last = fruit[length]

IndexError: string index out of range

In [9]:
# correct code to get the last letter using len()
last = fruit[length - 1]
print(last)

a


In [10]:
# use negative indices to count backward from the end of the string
fruit[-1]

'a'
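
As a small sketch of how negative indices line up with positive ones (the variable names below are only for illustration), the index `-k` should refer to the same character as `len(fruit) - k`:

```python
fruit = 'banana'

# fruit[-k] is the same character as fruit[len(fruit) - k]
last = fruit[-1]             # same as fruit[5]
second_last = fruit[-2]      # same as fruit[4]
first = fruit[-len(fruit)]   # same as fruit[0]

print(last, second_last, first)
```

An index smaller than `-len(fruit)` is out of range, just like an index of `len(fruit)` or more.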

-------------

### Traversal through a string with a loop

**Traversal** - the pattern of processing a string one character at a time: start at the beginning, select each character in turn, do something to it, and continue until the end.

One way to write a traversal is a `while` loop

In [11]:
index = 0
while index < len(fruit) :
    letter = fruit[index]
    print(letter)
    index = index + 1

b
a
n
a
n
a


Another way to write a traversal is with a `for` loop

In [12]:
for char in fruit :
    print(char)

b
a
n
a
n
a


**1. Write a `while` loop that starts at the last character in the string and works its way backwards to the first character in the string, printing each letter on a separate line, except backwards**

In [41]:
fruit = 'banana'
i = len(fruit)
while i > 0 :
    letter = fruit[i-1]
    print(letter)
    i = i - 1
    

a
n
a
n
a
b
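
The same backward traversal can also be sketched with negative indices instead of `i-1`. This is only one possible variation; the `backwards` variable is an addition used to collect the letters so the result can be checked:

```python
fruit = 'banana'

# start at -1 (the last character) and stop after -len(fruit) (the first character)
index = -1
backwards = ''
while index >= -len(fruit):
    letter = fruit[index]
    print(letter)
    backwards = backwards + letter
    index = index - 1
```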


------

### String slices

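**Slice** - a segment of a string. Selecting a slice is similar to selecting a character: the operator `[n:m]` returns the part of the string from index `n` up to, but not including, index `m`. If you omit the first index, the slice starts at the beginning of the string; if you omit the second, it runs to the end.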

In [42]:
s = 'Monty Python'
print(s[0:5])

Monty


In [43]:
print(s[6:12])

Python


In [44]:
fruit = 'banana'
fruit[:3]

'ban'

In [45]:
fruit[3:]

'ana'

If the first index is greater than or equal to the second, the result is an empty string, represented by two quotation marks.

In [46]:
fruit = 'banana'
fruit[3:3]

''

An empty string contains no characters and has length 0, but other than that, it is the same as any other string.
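
A short sketch of how slice boundaries behave, using the `'Monty Python'` string from above (the variable names are illustrative). It shows that the end index is excluded and that, unlike single-character indexing, a slice running past the end of the string does not raise an error:

```python
s = 'Monty Python'

# the slice starts at index 0 and stops before index 5, so it has 5 - 0 characters
first_word = s[0:5]
print(first_word, len(first_word))

# an end index beyond the string is quietly cut off at the end
tail = s[6:100]
print(tail)

# a start index greater than the end index gives an empty string
empty = s[5:2]
print(repr(empty), len(empty))
```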

**2. Given that fruit is a string, what does fruit[:] mean?**
--> It returns the whole string

In [47]:
fruit[:]

'banana'

-------------------

### Strings are immutable

Strings are **immutable**, which means you can't change an existing string. Instead, we can create a new string that is a variation on the original.

In [49]:
# cannot change a string from 'Hello, world!' to 'Jello, world!'
greeting = 'Hello, world!'
greeting[0] = 'J'

TypeError: 'str' object does not support item assignment

In [51]:
# new string 'Jello, world!' that is a variation on the original 'Hello, world!'
greeting = 'Hello, world!'
new_greeting = 'J' + greeting[1:]
print(new_greeting)

Jello, world!
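
The slice-and-concatenate idea can be wrapped in a small helper. This is a minimal sketch, and `replace_char` is a hypothetical name rather than a built-in; it builds a new string and leaves the original untouched:

```python
def replace_char(text, index, new_char):
    """Return a copy of text with the character at index replaced by new_char."""
    # everything before index + the new character + everything after index
    return text[:index] + new_char + text[index + 1:]

greeting = 'Hello, world!'
new_greeting = replace_char(greeting, 0, 'J')

print(new_greeting)
print(greeting)  # the original string is unchanged
```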


--------------

### Looping and counting

In [53]:
# count the number of times the letter "a" appears in a string
word = 'banana'
count = 0
for letter in word:
    if letter == 'a' :
        count = count + 1
print(count)

3


**3. Encapsulate this code in a function named count, and generalize it so that it accepts the string and letter as arguments**

In [6]:
def countLetters(text, letter):
    # count how many characters in text are equal to letter
    count = 0
    for char in text:
        if char == letter:
            count = count + 1
    return count

# input() already returns a string, so no conversion is needed
inp = input('Enter a string: ')
char = input('Enter a letter that you want to count: ')
number = countLetters(inp, char)

print(char, 'appears', number, 'times in the string that you entered.')

Enter a string: tuy
Enter a letter that you want to count: y
y appears 1 times in the string that you entered.


-----

### The `in` operator

**`in`** - a boolean operator that takes two strings and returns `True` if the first appears as a substring in the second:

In [1]:
'a' in 'banana'

True

In [2]:
'seed' in 'banana'

False

----

### The string comparison

The comparison operators work on strings. The `==` operator checks whether two strings are equal. The other comparison operators are useful for putting words in alphabetical order.

Python does not handle uppercase and lowercase letters the same way that people do. All the uppercase letters come before all the lowercase letters.

In [6]:
word = input('Enter a word: ')

if word == 'banana' :
    print('All right, bananas.')
elif word < 'banana' :
    print('Your word, ' + word + ', comes before banana.')
else:
    print('Your word, ' + word + ', comes after banana.')

Enter a word: Pineapple
Your word, Pineapple, comes before banana.
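
Because uppercase letters sort before lowercase ones, `'Pineapple'` compares as less than `'banana'`. A common way around this is to convert the word to a standard form, such as all lowercase, before comparing. A minimal sketch, where `compare_to_banana` is a hypothetical helper and `lower()` is the string method described under String methods:

```python
def compare_to_banana(word):
    """Return a message comparing word to 'banana', ignoring case."""
    normalized = word.lower()  # compare in lowercase so 'P' and 'p' sort alike
    if normalized == 'banana':
        return 'All right, bananas.'
    elif normalized < 'banana':
        return 'Your word, ' + word + ', comes before banana.'
    else:
        return 'Your word, ' + word + ', comes after banana.'

print(compare_to_banana('Pineapple'))
print(compare_to_banana('Apple'))
```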


-------------

### String methods

Strings are an example of Python objects. An **object** contains both data (the actual string itself) and methods, which are effectively functions that are built into the object and are available to any instance of the object.

**`dir()`** - lists the methods available for an object. 

**`type()`** - shows the type of an object

**`help()`** - shows some simple documentation on a method

In [7]:
stuff = 'Hello world'
type(stuff)

str

In [8]:
dir(stuff)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']


In [9]:
help(str.capitalize)

Help on method_descriptor:

capitalize(self, /)
    Return a capitalized version of the string.
    
    More specifically, make the first character have upper case and the rest lower
    case.



Calling a **method** is similar to calling a function (it takes arguments and returns a value), but the syntax is different. We call a method by appending the method name to the variable name, using the period as a delimiter.

A method call is called an *invocation*.

Method `_.upper()` takes a string and returns a new string with all uppercase letters:

In [10]:
word = 'banana'
# invoke upper on word
new_word = word.upper() #empty parentheses indicate that this method takes no argument.
print(new_word)

BANANA


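Method `_.find()` searches for a letter or substring within the string and returns the index where it first appears. An optional second argument gives the index where the search should start.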

In [11]:
word = 'banana'
index = word.find('a')
print(index)

1


In [12]:
word.find('na')

2

In [13]:
word.find('na', 3)

4
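
One detail the cells above do not show is what happens when the substring is absent. A brief sketch (the variable names are illustrative): `find` returns `-1` rather than raising an error, so the result should be checked before it is used as an index.

```python
word = 'banana'

# find returns -1 when the substring does not occur in the string
missing = word.find('z')
print(missing)

# with a start index of 2, the search skips the 'a' at index 1
second_a = word.find('a', 2)
print(second_a)
```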

Method `_.strip()` removes white space (spaces, tabs, and newlines) from the beginning and end of a string.

In [14]:
line = ' Here we go '
line.strip()

'Here we go'

Method `_.startswith()` returns a boolean value. The match is case-sensitive.

In [15]:
line = 'Have a nice day'
line.startswith('h')

False

Method `_.lower()` takes the string and maps it all to lowercase

In [16]:
line.lower()

'have a nice day'

We can make multiple method calls in a single expression as long as we are careful with the order.
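
A minimal sketch of such a chained call, reusing the `'Have a nice day'` line from above (the name `starts_with_h` is illustrative). The calls run left to right, so each method operates on the string returned by the one before it:

```python
line = 'Have a nice day'

# lower() runs first and returns a new string; startswith() then checks that result
starts_with_h = line.lower().startswith('h')
print(starts_with_h)

# without lower(), the uppercase 'H' does not match the lowercase 'h'
print(line.startswith('h'))
```

Reversing the order would not work: `line.startswith('h').lower()` calls `lower()` on a boolean, which has no such method.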

**4. Write an invocation with `_.count()` to count the number of times the letter 'a' occurs in 'banana'**

In [17]:
word = 'banana'
word.count('a')

3

------------

### Parsing strings

In [26]:
# we want to pull out substring 'uct.ac.za' from the string
data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'

# find the position of the at-sign in the string
atpos = data.find('@')
print(atpos)

21


In [27]:
# find the position of the first space after the at-sign
sppos = data.find(' ', atpos)
print(sppos)

31


In [28]:
# string slicing to extract the portion of the string which we are looking for
host = data[atpos+1:sppos]
print(host)

uct.ac.za
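
The three steps above can be collected into one function. This is a sketch under some assumptions: `extract_host` is a hypothetical name, and the handling of lines with no at-sign or no trailing space is an addition that the original cells do not cover.

```python
def extract_host(line):
    """Return the text between '@' and the next space, or None if there is no '@'."""
    atpos = line.find('@')
    if atpos == -1:
        # find returns -1 when the at-sign is missing
        return None
    sppos = line.find(' ', atpos)
    if sppos == -1:
        # no space after the at-sign: take everything up to the end of the line
        sppos = len(line)
    return line[atpos + 1:sppos]

data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
print(extract_host(data))
```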


---

### Format operator

**Format operator, %** - allows us to construct strings, replacing parts of the strings with the data stored in variables.  
+ When applied to integers, % is the modulus operator.  
+ When the first operand is a string, % is the format operator. 

In [29]:
# 2nd operand should be formatted as an integer
# 'd' stands for 'decimal'

camels = 42
'%d' % camels

'42'

A format sequence can appear anywhere in the string, so you can embed a value in a sentence

In [30]:
camels = 42
'I have spotted %d camels.' % camels

'I have spotted 42 camels.'

If there is more than one format sequence in the string, the second argument has to be a **tuple**.  
Each format sequence is matched with an element of the tuple, in order.

In [32]:
# %g formats a floating-point number and %s formats a string

'In %d years I have spotted %g %s.' % (3, 0.1, 'camels')

'In 3 years I have spotted 0.1 camels.'

The number of elements in the tuple must match the number of format sequences in the string. The types of the elements must also match the format sequences.
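
A small sketch of what goes wrong when these rules are broken. The helper `try_format` is a hypothetical name introduced here so the errors can be shown without stopping the program; the exact wording of the error messages varies between Python versions, so it is not reproduced.

```python
def try_format(template, values):
    """Apply the format operator, returning the error text if a TypeError occurs."""
    try:
        return template % values
    except TypeError as err:
        return 'TypeError: ' + str(err)

# three format sequences but only two values
print(try_format('%d %d %d', (1, 2)))

# %d expects a number, not a string
print(try_format('%d', 'dollars'))

# counts and types both match
print(try_format('In %d years I have spotted %g %s.', (3, 0.1, 'camels')))
```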

----

### Debugging

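Indexing an empty string, as in `line[0]`, raises an `IndexError`. `_.startswith()` avoids this problem: when the string is empty, it returns `False` for any non-empty prefix.

Another way to write the `if` statement safely is the *guardian* pattern: put a length check first so that the second logical expression is evaluated only when there is at least one character in the string.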
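
A minimal sketch of the guardian pattern. The check for a leading `'#'` and the name `is_comment` are assumptions chosen for illustration; the notes above do not say which condition was being tested.

```python
def is_comment(line):
    """Return True if line starts with '#', without failing on an empty string."""
    # guardian: len(line) > 0 is tested first, and because `and` short-circuits,
    # line[0] is evaluated only when the string has at least one character
    return len(line) > 0 and line[0] == '#'

print(is_comment('# a comment'))
print(is_comment(''))
print(is_comment('code'))
```

Without the guardian, `is_comment('')` would raise an `IndexError` at `line[0]`.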

----

### Glossary

**counter** A variable used to count something, usually initialized to zero and then incremented.

**empty string** A string with no characters and length 0, represented by two quotation marks.

**format operator** An operator, %, that takes a format string and a tuple and generates a string that includes the elements of the tuple formatted as specified by the format string.

**format sequence** A sequence of characters in a format string, like **%d**, that specifies how a value should be formatted.

**format string** A string, used with the format operator, that contains format sequences.

**flag** A boolean variable used to indicate whether a condition is true or false

**invocation** A statement that calls a method


**immutable** The property of a sequence whose items cannot be assigned

**index** An integer value used to select an item in a sequence, such as a character in a string

**item** One of the values in a sequence

**method** A function that is associated with an object and called using dot notation.

**object** Something a variable can refer to. For now, you can use 'object' and 'value' interchangeably

**search** A pattern of traversal that stops when it finds what it is looking for 

**sequence** An ordered set; that is, a set of values where each value is identified by an integer index

**slice** A part of a string specified by a range of indices

**traverse** To iterate through the items in a sequence, performing a similar operation on each

------------

## Exercises

**5. Take the following Python code that stores a string. Use find and string slicing to extract the portion of the string after the colon character and then use the float function to convert the extracted string into a floating-point number**

In [48]:
string = 'X-DSPAM-Confidence:0.8475'
# find the position of the colon
colon = string.find(':')
# slice everything after the colon and convert it to a float
target = float(string[colon+1:])
print(target)

0.8475
