<h2>More on Strings</h2>

In this module, we discuss strings in Python in more detail.

<h3>Indexing and Slicing</h3>

<h4>Indexing</h4>
Strings in Python are sequences of individual characters, and we can access each character by its index. Given a string of length n:
- Positive indices start from 0 until n-1
- Negative indices start from -n until -1

For example

![image.png](attachment:image.png)



In [None]:
a_string = 'Hello World'

print(a_string[0])
print(a_string[3])
print(a_string[5])    #<-- this is the index of the space
print(a_string[8])
print(a_string[0])
print(a_string[10])
print(a_string[-1])
print(a_string[-3])
print(a_string[-7])
print(a_string[-11])
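The two index ranges above name the same characters: for a string of length n, index i and index i - n refer to the same position. A minimal sketch of that relationship, where the names `first_pair` and `last_pair` are only illustrative:

```python
a_string = 'Hello World'
n = len(a_string)  # 11 characters

# For every valid positive index i, the negative index i - n
# points at the same character.
for i in range(n):
    assert a_string[i] == a_string[i - n]

first_pair = (a_string[0], a_string[-n])     # first character, both ways
last_pair = (a_string[n - 1], a_string[-1])  # last character, both ways
print(first_pair)
print(last_pair)
```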

We can use the len() function to obtain the length of a string, i.e. the number of characters it contains.

In [25]:
len(a_string)

11

And we can use a loop to iterate over the characters of the string:

In [27]:
for pointer in a_string:
    print(pointer)

H
e
l
l
o
 
W
o
r
l
d


Or we can count how many times a character occurs in a string, for example the letter 'l':

In [28]:
count = 0

for character in a_string:
    if (character == 'l'):
        count += 1
        
print(count)

3


Can you write a function that counts the occurrences of a character in a given string?

In [42]:
def count_character(char, string):
    count = 0

    for pointer in string:
        if (pointer == char):
            count += 1

    return count

In [47]:
count_character('l','Hello World')

3
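For comparison, Python strings also provide a built-in count() method that returns the number of occurrences of a substring. The sketch below checks it against a hand-written loop; the variable names are only illustrative.

```python
text = 'Hello World'

# Built-in method: counts occurrences of the substring 'l'.
builtin_count = text.count('l')

# Hand-written loop, same idea as the function above.
manual_count = 0
for character in text:
    if character == 'l':
        manual_count += 1

print(builtin_count)
print(manual_count)
```

Writing the loop yourself is still a useful exercise; count() is simply the shorter option once the idea is clear.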

In [52]:
user_char = input('Please enter a character: ')
user_string = input('Please enter a string: ')
occurrence = count_character(user_char, user_string)
print('The character "%s" occurs %d time(s) in the string "%s"' % (user_char, occurrence, user_string))

Please enter a character: l
Please enter a string: hello hello world
The character "l" occurs 5 time(s) in the string "hello hello world"


<h4>Slicing</h4>

Slicing a string means obtaining a part of the string. We can slice a string with the syntax $a\_string[start:end]$, which returns all characters in that range of indices. Similar to the range() function, the item at index $end$ will <b>not</b> be included in the sliced string.
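Because the character at index end is excluded, a slice with in-range, non-negative indices has exactly end - start characters, and splitting a string at any index gives two pieces that join back into the original. A small sketch with a hypothetical example string `word`:

```python
word = 'Python'  # hypothetical example string
start, end = 1, 4

piece = word[start:end]  # takes indices 1, 2 and 3; index 4 is excluded
print(piece)

# The slice has end - start characters.
print(len(piece) == end - start)

# Cutting at the same index on both sides loses nothing.
print(word[:end] + word[end:] == word)
```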

We repeat the example of the string 'Hello World'

![image.png](attachment:image.png)

In [72]:
a_string = 'Hello World'

In [61]:
len(a_string)

11

In [62]:
a_string[19]

IndexError: string index out of range

In [64]:
a_string[3:]

'lo World'

In [65]:
a_string[:7]

'Hello W'

In [66]:
a_string[3:9]

'lo Wor'

In [68]:
a_string[3:-2]

'lo Wor'

In [69]:
a_string[-10:-4]

'ello W'

In [73]:
a_string[10:2:-1]

'dlroW ol'

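We can also add a $step$ when slicing: $some\_string[start:end:step]$. The slice then takes every $step$-th character, and a negative $step$ walks through the string backwards, as in the example above.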

In [74]:
a_string[2:10:2]

'loWr'

In [None]:
a_string[1:11:3]

In [None]:
a_string[-1:-7:-1]

In [75]:
a_string[::-1]

'dlroW olleH'
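One way to reason about a step slice is to compare it with range(start, end, step), which produces the indices that get picked. The helper `slice_with_range` below is a hypothetical illustration, and it only matches slicing when start and end are non-negative indices inside the string.

```python
a_string = 'Hello World'

def slice_with_range(string, start, end, step):
    """Hypothetical helper: rebuild string[start:end:step] one index at a time."""
    result = ''
    for index in range(start, end, step):
        result += string[index]
    return result

forward = slice_with_range(a_string, 2, 10, 2)    # indices 2, 4, 6, 8
backward = slice_with_range(a_string, 10, 2, -1)  # indices 10 down to 3

print(forward == a_string[2:10:2])
print(backward == a_string[10:2:-1])
```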

<h3> Concatenation </h3>

We discussed this in module 2. Strings can be concatenated with the operator '+'. Concatenation places the strings next to each other without any delimiter. If you want the strings to be separated in the result, you need to add the delimiter (for example, a space ' ') yourself.

In [76]:
'Hello' + 'World'

'HelloWorld'

In [77]:
'Hello' + ' ' + 'World'

'Hello World'

In [78]:
string_1 = 'Hello'
string_2 = 'World'
print(string_1 + string_2)
print(string_1 + ' ' + string_2)
print(string_2 + ' ' + string_1)

HelloWorld
Hello World
World Hello
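When there are several strings to combine with the same delimiter, the string method join() is a possible alternative to repeating '+'. It is called on the delimiter and takes a list of strings. A short sketch, with an illustrative list named `words`:

```python
words = ['Hello', 'World']

with_plus = words[0] + ' ' + words[1]  # delimiter added by hand
with_join = ' '.join(words)            # delimiter placed between every pair

print(with_plus == with_join)
print('-'.join(words))
```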


<h3>Strings are Immutable</h3>

Recall that an object is immutable if we cannot change it after it is created. A tuple is an immutable list -- we cannot change its items after initializing it. Strings are similar: we cannot change the characters in a string after creating it.

In [80]:
a_string = 'Hello World'

In [86]:
a_string[4] = 'b'

TypeError: 'str' object does not support item assignment

In [87]:
a_string = 'World Hello'

In [88]:
a_string

'World Hello'

So what happens when you do something like the code below?

In [None]:
string_1 = 'Hello'
string_1 += ' ' + 'World'
print(string_1)

1. After the first statement, string_1 is a reference that points to the string 'Hello' in the system memory.
2. In the second statement, the system first creates a new string 'Hello World' (the result of 'Hello' + ' ' + 'World') in a different memory location. string_1 then points to the new memory location instead of the old one. So the original string 'Hello' does not change; it is simply no longer referenced by string_1.
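The two steps above can be observed by keeping a second name for the original string before running the += statement. In this sketch, `original` is an extra variable introduced only for illustration:

```python
string_1 = 'Hello'
original = string_1        # a second name for the same string object

string_1 += ' ' + 'World'  # builds a new string and rebinds string_1

print(original)              # the old string is unchanged
print(string_1)              # string_1 now refers to the new string
print(original is string_1)  # the two names refer to different objects
```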

<h3>The in Operator</h3>

Similar to a list, we can use the <b>in</b> operator to check whether a string contains another string. The syntax is

<b>string_1 in string_2</b>

- This operation returns True if string_2 does contain string_1 and False otherwise

For example:

In [89]:
'Hello' in 'Hello World'

True

In [90]:
'ell' in 'Hello World'

True

In [95]:
'hello'.lower() in 'Hello World'.lower()

True

In [97]:
'HelLO WoRld'.lower()

'hello world'

There must be an exact match, including letter case (lower/upper), for the operation to return True.

In [None]:
'hello' in 'Hello World'

In [112]:
'asv a45sgf#as'.islower()

True
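Since the in operator requires an exact match, a common workaround for case-insensitive checks is to convert both strings to lowercase first, as in the lower() example above. The function `contains_ignore_case` is a hypothetical helper written for this sketch:

```python
def contains_ignore_case(substring, string):
    """Hypothetical helper: True if string contains substring, ignoring case."""
    return substring.lower() in string.lower()

exact = 'hello' in 'Hello World'                        # case must match
relaxed = contains_ignore_case('hello', 'Hello World')  # case is ignored

print(exact)
print(relaxed)
```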

<h3>More String Operations</h3>

Python provides many built-in methods for us to work with strings. Remember, they are <b>methods</b> and <b>need to be called from a string object</b>. The general syntax is 

$a\_string.method(<possible\_input>)$

Below is the list of common methods you may want to use

<b>Checking Methods</b>
- isalnum(): returns True if the string contains only alphabetic letters or digits and is at least 1 character long; returns False otherwise
- isalpha(): returns True if the string contains only alphabetic letters and is at least 1 character long; returns False otherwise
- isdigit(): returns True if the string contains only numeric digits and is at least 1 character long; returns False otherwise
- islower(): returns True if all the alphabetic letters are lowercase and the string contains at least 1 alphabetic letter; returns False otherwise
- isspace(): returns True if the string contains only whitespace characters and is at least 1 character long; returns False otherwise
- isupper(): returns True if all the alphabetic letters are uppercase and the string contains at least 1 alphabetic letter; returns False otherwise
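The "at least 1 character" conditions in the list above matter in practice. The sketch below tries a few edge cases; the variable names are only illustrative.

```python
empty_is_alpha = ''.isalpha()           # empty string: no characters at all
space_is_alnum = 'abc 123'.isalnum()    # the space is neither letter nor digit
mixed_is_lower = 'abc123'.islower()     # digits are ignored, letters are lowercase
digits_is_lower = '123'.islower()       # no alphabetic letter present
blank_is_space = ' \t\n'.isspace()      # space, tab and newline are whitespace

print(empty_is_alpha, space_is_alnum, mixed_is_lower, digits_is_lower, blank_is_space)
```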

<b>Modification Methods</b>
- lower(): returns a copy of the string with all alphabetic letters converted to lowercase
- lstrip(): returns a copy of the string with all leading whitespace characters removed
- lstrip(char): the char argument is a string of one or more characters; returns a copy of the string with every leading character that appears in char removed
- rstrip(): returns a copy of the string with all trailing whitespace characters removed
- rstrip(char): same as lstrip(char), but at the end of the string
- strip(): returns a copy of the string with all leading and trailing whitespace characters removed
- strip(char): returns a copy of the string with the characters in char removed from both the beginning and the end
- upper(): returns a copy of the string with all alphabetic letters converted to uppercase
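When an argument is passed to the strip methods, it is treated as a set of characters to remove rather than as one exact substring. A short sketch with made-up example strings:

```python
padded = 'xxhixx'  # made-up example string

left = padded.lstrip('x')   # only leading 'x' characters removed
right = padded.rstrip('x')  # only trailing 'x' characters removed
both = padded.strip('x')    # removed on both sides

# 'abc' means "any of a, b, c", so every leading and trailing
# a, b or c is removed until a different character is reached.
trimmed = 'abcabc-data-cba'.strip('abc')

print(left, right, both, trimmed)
print(padded)  # the original string is unchanged
```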

<b>Search and Replace Methods</b>
- endswith(substring): substring argument is a string; returns True if the string ends with substring, False otherwise
- find(substring): substring argument is a string; returns the lowest index in the string where substring is found, or -1 if it is not found
- replace(old, new): old and new arguments are both strings; returns a copy of the string with all instances of old replaced by new
- startswith(substring): substring argument is a string; returns True if the string starts with substring, False otherwise
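A brief sketch of the search methods above, including the case where find() does not locate the substring. The variable names are only illustrative.

```python
sentence = 'Hello World'

first_o = sentence.find('o')             # lowest index where 'o' appears
missing = sentence.find('z')             # 'z' does not appear in the string
starts = sentence.startswith('Hello')
ends = sentence.endswith('world')        # case matters: 'world' vs 'World'
swapped = sentence.replace('World', 'There')

print(first_o, missing, starts, ends, swapped)
```

Checking the result of find() against -1 before using it as an index avoids accidentally reading the last character of the string.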

<h4> Some Examples</h4>

In [116]:
'Hello World'.find('World')

6

In [None]:
string_1 = 'abc123'    #this string contains only alphabetic characters and digits
print(string_1.isalnum())
print(string_1.isalpha())
print(string_1.isdigit())

In [None]:
'abc123$#'.isalnum() #this string contains non-alphanumeric characters - $ and #

In [None]:
print('abc'.isalpha())
print('ABC'.isalpha())

In [None]:
print('12345'.isdigit())
print('12.345'.isdigit())   #the decimal point will NOT be considered digit

In [114]:
print('ABCD'.lower())
print('abcd'.upper())

abcd
ABCD


In [113]:
print('         abcd         '.strip())
print('         abcd         '.rstrip())
print('         abcd         '.lstrip()) #you can check and see the spaces on the right of 'abcd' are still there

abcd
         abcd
abcd         


In [117]:
'Hello World'.replace('l','n')

'Henno Wornd'

In [118]:
'the quick brown fox jumped over the lazy dog'.replace('the','a')

'a quick brown fox jumped over a lazy dog'

<h3>Read and Write Data</h3>

In [None]:
data = [['Alice', 'GA', 25],
        ['Bob', 'NY', 31],
        ['Carol', 'NC', 32],
        ['Dean', 'NY', 21]
       ]

In [None]:
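of = open('test.txt','w')    #open the file for writing
for record in data:
    #write name, state and age separated by spaces, one record per line
    of.write(record[0] + ' ' + record[1] + ' ' + str(record[2]) + '\n')
of.close()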

In [None]:
of = open('test.txt','r')
for line in of:
    print(line.rstrip('\n'))
of.close()

In [None]:
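of = open('test.txt','r')
for line in of:
    stpline = line.rstrip('\n')    #remove the newline at the end of the line
    name, state, age = stpline.split(' ')    #split the line into its three fields
    age_group = 'over 30'    #note: age 30 itself also falls in this group
    if int(age) < 30:    #age is read as a string, so convert it before comparing
        age_group = 'under 30'
    print(name, state, age, age_group)
of.close()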
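The same write-then-read round trip can also be written with the with statement, which closes the file automatically at the end of the block. This is only a sketch: the temporary directory and the names `path`, `out_file`, `in_file` and `rows` are assumptions made so the example does not overwrite an existing test.txt.

```python
import os
import tempfile

data = [['Alice', 'GA', 25],
        ['Bob', 'NY', 31]]

# Write into a fresh temporary directory (an assumption of this sketch).
path = os.path.join(tempfile.mkdtemp(), 'test.txt')

# The file is closed automatically when the with block ends.
with open(path, 'w') as out_file:
    for name, state, age in data:
        out_file.write(name + ' ' + state + ' ' + str(age) + '\n')

# Read the records back and convert age to an integer again.
rows = []
with open(path, 'r') as in_file:
    for line in in_file:
        name, state, age = line.rstrip('\n').split(' ')
        rows.append([name, state, int(age)])

print(rows == data)
```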