![DSB Logo](img/Dolan.jpg)
# Python Data Types: Strings
## PY4E Chapter 6
### How data are stored and processed in Python

# String is a Sequence

- As we have seen before, strings are sequences
    - a sequence is an ordered collection of elements
    - because it is ordered, you can _index_ its elements by position
        - given a string `my_str`, you can use `[]` to index it
        - notice that in Python (and many other languages), indices always start at `0`
        - all indices need to be of _integer_ type
        - but you can use _values_, _variables_, _expressions_ and/or _operators_, as long as the result is an integer

In [5]:
my_str = 'Hello Python!'
my_str[0]

'H'

In [6]:
# this will throw an error
my_str[1.5]

TypeError: string indices must be integers

In [1]:
b = 0
my_str[b = 0]  # this will throw an error: `b = 0` is an assignment, not an expression

SyntaxError: invalid syntax (<ipython-input-1-3268d19e05a1>, line 2)

In [4]:
my_str[b+1]

'e'

# Getting the Length of a String

- For sequence types (strings included), Python provides a built-in function `len()` that returns the number of elements
    - so you don't have to write a loop to count them
    - `len()` works on all of Python's built-in sequence types
    - you can use `len()` to get the last element of a sequence
        - since indices start at `0`, the last element is at index `len(seq) - 1`, not `len(seq)`

In [5]:
len(my_str)

13

In [6]:
# this will throw an error
my_str[len(my_str)]

IndexError: string index out of range

In [7]:
# this can fix the error
my_str[len(my_str) - 1]

'!'

In [8]:
# similarly, you can use the `-1` index for the last element
my_str[-1]

'!'

# More on String Indexing

- If you want to move from the beginning of a string (_left_ to _right_), you should use __positive__ indexing
    - for instance, the first letter is `my_str[0]`, then `my_str[1]`, ...
- If you want to move from the end of a string (_right_ to _left_), you should use __negative__ indexing
    - for instance, the last letter is `my_str[-1]`, then `my_str[-2]`, ...
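As a quick sketch of how the two directions line up (re-defining `my_str` so the snippet runs on its own), a negative index `-k` refers to the same character as the positive index `len(my_str) - k`:

```python
my_str = 'Hello Python!'

# every negative index -k matches the positive index len(my_str) - k
for k in range(1, len(my_str) + 1):
    assert my_str[-k] == my_str[len(my_str) - k]

print(my_str[-1], my_str[len(my_str) - 1])   # ! !
print(my_str[-13], my_str[0])                # H H
```

This is why `my_str[-1]` and `my_str[len(my_str) - 1]` return the same letter.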

In [10]:
my_str[-2]

'n'

# String Slicing

- You can also retrieve more than one element (letter) from a string at once
    - this is called _slicing_
    - you can use the `:` symbol to slice a string
    - slicing works from either end, or even both!
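One difference between indexing and slicing is worth a quick sketch (again re-defining `my_str` so it runs on its own): a single out-of-range index raises an `IndexError`, while slice bounds that go past the end of the string are simply cut off, so no error is raised.

```python
my_str = 'Hello Python!'

# slice bounds beyond the string are clipped to its ends
print(my_str[0:100])   # Hello Python!
print(my_str[20:])     # '' (an empty string, so an empty line is printed)

# a single out-of-range index does raise an error
try:
    my_str[100]
except IndexError as err:
    print('IndexError:', err)   # IndexError: string index out of range
```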

In [11]:
# this will return the whole string
my_str[0:]

'Hello Python!'

In [12]:
# this will return the string without its first letter
my_str[1:]

'ello Python!'

In [13]:
# this will return the string without its last letter
my_str[:-1]

'Hello Python'

In [14]:
# try slicing from both ends
my_str[1:-1]

'ello Python'

# More on String Slicing

- As long as you know how indexing works on a string, you can retrieve any part of the string using _slicing_
- For instance, the cells below retrieve different parts of `my_str`
- Question: what would the statement below return?
```python
my_str[:]
```

In [22]:
my_str[0:5]

'Hello'

In [24]:
# note that space is also a character
my_str[5]

' '

In [25]:
# you can also start from the end
my_str[-7:-1]

'Python'

In [26]:
# how can you explain the result below?
my_str[1:1]

''

# Traverse through a String with Loops

- Sometimes we want to go through a string one letter at a time
- Then do something with each letter
- And continue all the way to the end of the string
- This pattern is called _traversal_
    - and it fits naturally with a loop

In [7]:
# use a while loop to traverse through `my_str` from beginning
index = 0
while index < len(my_str):
    letter = my_str[index]
    print(letter)
    index = index + 1

H
e
l
l
o
 
P
y
t
h
o
n
!


# Your Turn Here 

Write a `while` loop to traverse through `my_str` from the end.

In [17]:
# hint 1
len(my_str)

13

In [19]:
# hint 2
my_str[-13]

'H'

In [9]:
index = -1
while index >= (len(my_str) * -1):
    letter = my_str[index]
    print(letter)
    index = index - 1

!
n
o
h
t
y
P
 
o
l
l
e
H


In [21]:
# But more often we use `for` loops to traverse through a string
# Using what we learned last week: what is the iteration variable? What is the collection?
# How does this loop know when to stop, so that it is not an infinite loop?
for char in my_str:
    print(char)

H
e
l
l
o
 
P
y
t
h
o
n
!


# Looping and Counting

- We already learned how to count items in a collection
    - if you don't remember how to do that, refer to last week's material
    - a string is a collection as well
    - however, we do not need to count _all_ items in a string since we have `len()` for that
    - but we can count the occurrences of a certain item in a string
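The counting pattern can be wrapped in a small function; `count_letter()` below is a hypothetical name used only for illustration. For comparison, the built-in `str.count()` method gives the same answer when the target is a single letter:

```python
def count_letter(text, target):
    """Count how many times the single character `target` appears in `text`."""
    count = 0
    for letter in text:        # traverse the string one letter at a time
        if letter == target:
            count = count + 1
    return count

my_str = 'Hello Python!'
print(count_letter(my_str, 'l'))   # 2
print(my_str.count('l'))           # 2, using the built-in str method
```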

In [32]:
count = 0
for letter in my_str:
    if letter == 'l': 
        count = count + 1
print('l appears', count, 'times')

l appears 2 times


# Strings are Immutable
- An object is _mutable_ if you can change its contents in place, e.g. reassign one of its parts
    - Strings are _immutable_: you cannot reassign part of a string; you can only create a new string
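A short sketch of what immutability means in practice: string methods such as `.upper()` return a new string and leave the original untouched, and "changing" a string really means binding the name to a brand-new string.

```python
my_str = 'Hello Python!'

# string methods never modify the original; they return a new string
upper_str = my_str.upper()
print(upper_str)   # HELLO PYTHON!
print(my_str)      # Hello Python!  (unchanged)

# "changing" a string means rebinding the name to a new string object
my_str = 'h' + my_str[1:]
print(my_str)      # hello Python!
```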

In [27]:
my_str[0] = 'h'

TypeError: 'str' object does not support item assignment

In [28]:
# then what if we want to do it?
# we have to build a new string by concatenating the new value with the rest of the old string
# note that we create a new variable `my_str_new` and this does not change the original `my_str`
my_str_new = 'h' + my_str[1:]
my_str_new

'hello Python!'

In [33]:
# The `in` operator
# This is a very useful operator to check whether a substring appears in a string

# this will return `True`
'H' in my_str

True

In [34]:
# this will return `False`
'H' in my_str_new

False

In [35]:
# you can check any part
'monty' in my_str

False

# String Comparison

- Like integers and floats, strings can be compared
    - string comparison is based on alphabetical (more precisely, character-code) order
    - note that Python treats uppercase and lowercase letters differently
        - all uppercase letters come before all lowercase letters (e.g. `'Z' < 'a'` is `True`)
        - normally we do not compare mixed-case strings directly; we convert them to the same case first (usually lowercase)
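A small sketch of why case matters, using the built-in `ord()` function to show each character's numeric code (the values are the standard ASCII/Unicode codes):

```python
# each character has a numeric code; ord() returns it
print(ord('B'), ord('a'))      # 66 97

# so a capitalized word sorts before any lowercase word
print('Banana' < 'apple')      # True

# converting both to lowercase gives the alphabetical result
print('Banana'.lower() < 'apple'.lower())   # False
```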

In [11]:
word = input('Enter your word here:')

if word < 'banana':
    print('Your word, ' + word + ', comes before banana.')
elif word > 'banana':
    print('Your word, ' + word + ', comes after banana.')
else:
    print('All right, bananas.')


Enter your word here: pine


Your word, pine, comes after banana.


# String Methods

- A string is an important type of Python object (data type)
    - An object contains both _data_ (values) and built-in _methods_ (functions)
    - These methods can be called on any _instance_ of the type
        - an instance is any value of that type, e.g. `my_str` is an instance of `str`
    - calling a method is similar to calling a function
        - but calling a function _f_ looks like `f(argument)`
        - while calling a method looks like `var.method()`
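A short sketch contrasting the two calling styles on the same string; the last line shows that a method call is equivalent to calling the method through its type, with the instance passed as the first argument:

```python
my_str = 'Hello Python!'

# built-in function: the string is passed in as an argument
print(len(my_str))          # 13

# method: the string comes before the dot
print(my_str.upper())       # HELLO PYTHON!

# the same method, called through the type `str`
print(str.upper(my_str))    # HELLO PYTHON!
```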

In [38]:
type(my_str)

str

In [40]:
# remember we can use `dir` to list all applicable methods
print(dir(my_str))

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


In [41]:
# if we have questions about any methods, we can use `help`
help(str.capitalize)

Help on method_descriptor:

capitalize(self, /)
    Return a capitalized version of the string.
    
    More specifically, make the first character have upper case and the rest lower
    case.



In [42]:
# Remember we said we usually normalize strings to lowercase before comparing them
# we can use .lower() for that
my_str.lower()

'hello python!'

In [43]:
# another important method is `find()`
# if a search string appears in the target string,
# it returns the index where its first occurrence starts
'banana'.find('a')

1

In [44]:
# what would happen if the search string is not in the target string?
'banana'.find('c')

-1

# Your Turn Here

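The result `-1` above can be misleading: it is not a position in the string, it means the whole string was scanned and the substring was not found. If you used it as an index, `-1` would silently point to the last character.

Write your code below to handle it: if a search string is not found, output 'not found!'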
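One possible sketch for this exercise, using a hypothetical helper named `find_or_message()`: it checks for the `-1` result explicitly before using it.

```python
def find_or_message(text, sub):
    """Return the index of the first occurrence of `sub` in `text`,
    or the message 'not found!' if `sub` does not appear."""
    pos = text.find(sub)
    if pos == -1:              # .find() returns -1 when nothing matches
        return 'not found!'
    return pos

print(find_or_message('banana', 'a'))   # 1
print(find_or_message('banana', 'c'))   # not found!
```

Comparing with `-1` explicitly matters, because `0` is a valid position (`'banana'.find('b')` is `0`), so a check such as `if not pos:` would wrongly report the first character as "not found".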

In [45]:
# `.find()` can also be used to find a substring in a string
'banana'.find('na')

2

In [46]:
# you can even tell `.find()` where to start
# by providing a second argument in the method
'banana'.find('na', 3)

4

In [47]:
# Another important method is `.strip()`
# it removes whitespace (spaces, tabs, or newlines) from the beginning and end of a string
line = '   Here we go      '
line.strip()
# this is very common when reading lines from text files

'Here we go'

In [49]:
# you can also test whether a string begins with another string
line.strip().startswith('Here')

True

In [50]:
# remember Python is case-sensitive
line.strip().startswith('here')

False

In [51]:
# but we can fix that
line.strip().lower().startswith('here')

True

In [53]:
# similarly, we can use `.endswith()` to test what a string ends with
line.strip().endswith('go')

True

# Parsing Strings

- One common task when handling strings is to extract a meaningful part from them
    - for example, email addresses, URLs, phone numbers, ...
    - the key is to observe the strings
    - and find common patterns
- For instance, in the two log lines below:
    - the `phl332.fairfield.edu` part is called a _host_
    - how can we extract it from the text?
    - clearly we cannot search for an exact text match, since the host is different in each line (the second line contains the host `fairfield.aws.com`)

```
From tao@phl332.fairfield.edu Sun Sep 1 2019 11:23:04 PM 
From huntley@fairfield.aws.com Mon Sep 2 2019 00:12:45 AM 
```

   

In [54]:
log = 'From tao@phl332.fairfield.edu Sun Sep 1 2019 11:23:04 PM'

# we know that the `host` part is right after `@`
# so let's locate `@` first
atpos = log.find('@')
print(atpos)

8


In [55]:
# then we need to observe the pattern again to find where the `host` part ends
# we notice that right after the `host` part there is a space ' '
# let's find it: the statement below finds the first space after the position of '@' (atpos)
sppos = log.find(' ', atpos)
print(sppos)

29


In [56]:
# now we can use slicing to extract the `host` part
# can you answer why we need to use `atpos+1`?
host = log[atpos+1:sppos]
print(host)

phl332.fairfield.edu


In [57]:
# Let's put everything together into a function
def host_find(log):
    atpos = log.find('@')
    sppos = log.find(' ', atpos)
    host = log[atpos+1:sppos]
    return host

In [58]:
host_find(log)

'phl332.fairfield.edu'

In [59]:
# Now let's try it on the next log item 
# to see whether the pattern we followed is correct
log1 = 'From huntley@fairfield.aws.com Mon Sep 2 2019 00:12:45 AM '
host_find(log1)

'fairfield.aws.com'
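`host_find()` assumes the line contains an `@` and that a space follows the host. If either is missing, `.find()` returns `-1` and the slice quietly returns the wrong text (with no `@`, `atpos + 1` becomes `0`; with no trailing space, the last character of the host is dropped). A minimal sketch of a guarded version, using the hypothetical name `host_find_safe()` and returning `None` when there is no `@`:

```python
def host_find_safe(log):
    """Return the host after '@' in a log line, or None if there is no '@'."""
    atpos = log.find('@')
    if atpos == -1:
        return None                 # no '@', so there is no host to extract
    sppos = log.find(' ', atpos)
    if sppos == -1:
        sppos = len(log)            # no space after the host: it runs to the end
    return log[atpos + 1:sppos]

print(host_find_safe('From tao@phl332.fairfield.edu Sun Sep 1 2019 11:23:04 PM'))
# phl332.fairfield.edu
print(host_find_safe('From tao@phl332.fairfield.edu'))   # phl332.fairfield.edu
print(host_find_safe('no address here'))                 # None
```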

# Format Operator

- We know that when `%` is applied to integers, it is the modulus operator
- But when `%` is applied to a string, it is the _format operator_
    - the basic form of a format statement is shown below
    - `%d` formats the value as a decimal integer; "decimal" here means base 10, not a number with a decimal point (a float is truncated to an integer)
    - `variable` is the value you want to format

```python
'%d' % variable
```
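A few more cases of the format operator, as a sketch: `%d` drops the fractional part of a float; several values go in a tuple and are matched left to right; and `%s` converts any value with `str()`, so it also works for strings.

```python
var = 33
print('%d' % var)                              # 33
print('%d' % 3.7)                              # 3  (fractional part dropped)

# several values: put them in a tuple, matched left to right
print('%d apples and %d pears' % (3, 5))       # 3 apples and 5 pears

# %s works for strings; %.2f keeps two digits after the decimal point
print('%s costs %.2f dollars' % ('tea', 2.5))  # tea costs 2.50 dollars
```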

In [60]:
var = 33
'%d' % var

'33'

In [63]:
# you can also format it as float
# the `.3` part means you want to keep 3 digits after the decimal point
'%.3f' % var

'33.000'

In [66]:
# The format operator is particularly useful in `print()` calls
import math

radius = 3.0
area = math.pi * (radius ** 2)
print('The area of circle with radius %d is: %.2f.' % (radius, area))

The area of circle with radius 3 is: 28.27.


In [68]:
# watch out for the following errors
print('%d %d %d' % (3, 1))

TypeError: not enough arguments for format string

In [69]:
# this is also a common error (look closely at the comma)
print('%d %d %d', % (3, 1, 2))

SyntaxError: invalid syntax (<ipython-input-69-9d1000480c8b>, line 2)

In [70]:
# another common error
print('%d' % 'dollar')

TypeError: %d format: a number is required, not str

# Your Turn Here
Finish the exercises below by following the instructions for each one.

Make sure you provide proper __pseudocode__ for each of your programs.

## Q1. Coding Problem

Write a function `sub_reddit` to retrieve the name of the subreddit from the URL.

Example input and output:
```
sub_reddit("https://www.reddit.com/r/funny/") ➞ "funny"

sub_reddit("https://www.reddit.com/r/relationships/") ➞ "relationships"

sub_reddit("https://www.reddit.com/r/mildlyinteresting/") ➞ "mildlyinteresting"
```

__HINT:__ what comes right before the subreddit name in all of these URLs? You might want to include a check to make sure the pattern exists.

## Q2. Coding Problem

Write a function `reverse_str()` to reverse any string from user input.

Example input and output:
```
reverse_str('aabbcc') -> 'ccbbaa'
reverse_str('123') -> '321'
```

__HINT:__ index of the last element is `-1`, the first element is `0`; the index of the second to last element is `-2`, and the second element is `1`, ...

## Q3. Coding Problem

Write a function to calculate the length of an arc and the area of its sector in a circle.

\begin{equation*}
    \text{length}_{arc} = \frac{n \times \pi \times r}{180}
\end{equation*}

\begin{equation*}
    \text{area}_{arc} = \frac{n \times \pi \times r^2}{360}
\end{equation*}

- in which `n` is the angle of the sector/arc in degrees (user input, integer)
- $\pi$ is available as `math.pi` in the `math` module
- `r` is the radius (user input, float)

You need to use the __format operator__ to output like below:
```
The arc/sector with an angle of 90 and radius of 1 has a length of 1.571 and an area of 0.785.
```

__HINT:__ use `%d` and `%f` properly in the output.


![DSB Logo](img/Dolan.jpg)
# Python Data Types: Lists
## PY4E Chapter 8
### How data are stored and processed in Python

# List is also a Sequence

- Like strings, lists are also sequences
    - in essence, they are ordered collections
    - a string can be thought of as a special sequence that holds only characters, while a list can hold values of any type
    - the values in a list are called _items_ or _elements_
    - we use square brackets `[]` to create a list

In [73]:
# note: a notebook cell only displays the value of its last expression
# a list of integers
[1, 2, 3, 4]
# a list of strings
['1', '2', '3', '4']
# a list of mixed type
[1, '2', 3, '4']
# nested lists
[[1], [2, 3, 4]]

[[1], [2, 3, 4]]

In [76]:
# you can use `len()` to get the length (number of elements in a list)
# just like what we did with strings
len([1, 2, 3, 4])

4

In [18]:
# you can assign a list to a variable
# this is a list of integers again
int_lst = [1, 2, 3, 4]
# this is an empty list
emp_lst = []

# Lists are Mutable

- Unlike strings, lists are _mutable_
    - this means you can _assign/update_ value(s) in a list
    - to update a value in a list, index the element and assign a new value to it
        - keep in mind that list indices also start at __0__

In [75]:
# update the last item
int_lst[3] = 5
int_lst

[1, 2, 3, 5]

# Traversing a List
- Like strings, we can traverse lists using _loops_
    - `for` loops are the most common
```python
for intvar in int_lst:
    print(intvar)
```

- However, if you want to update the values in a list, loop over the indices as follows, because assigning to the loop variable (`intvar` above) does not change the list:
```python
for i in range(len(int_lst)): # here i is the index of element
    int_lst[i] *= 2
```
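A small sketch of why the index-based loop is needed: in `for intvar in int_lst`, the loop variable is just a name bound to each element in turn, so reassigning it does not touch the list.

```python
int_lst = [1, 2, 3, 4]

# reassigning the loop variable only rebinds the name `intvar`
for intvar in int_lst:
    intvar *= 2
print(int_lst)   # [1, 2, 3, 4]  (unchanged)

# assigning through the index updates the list itself
for i in range(len(int_lst)):
    int_lst[i] *= 2
print(int_lst)   # [2, 4, 6, 8]
```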

In [77]:
# Nested lists are special: each child list counts as a single element of the outer list
for item in [[1], [2, 3, 4]]:
    print(item) # these are lists

[1]
[2, 3, 4]


# Your Turn Here

If we want to access the items in the child lists above, what can we do? Write your code below.

In [13]:
for item in [[1], [2, 3, 4]]:
    for child in item:
        print(child)

1
2
3
4


# One-Liners
- A cool thing we can do in Python is write some `for` loops as one-liners, called _list comprehensions_
    - meaning instead of writing the `for` loop over multiple lines
    - we can write it in one line (this style is very common in Python code)
    - for instance, instead of:

```python
for i in range(len(int_lst)): # here i is the index of element
    int_lst[i] *= 2
```

In [22]:
[i * 2 for i in int_lst] # here `i` is the element in the list

[2, 4, 6, 10]

In [20]:
# another example - extract the even numbers from the following list
# equivalent loop:
# for i in [1, 2, 3, 4, 5]:
#     if i % 2 == 0:
#         keep i
[i for i in [1, 2, 3, 4, 5] if i % 2 == 0]

[2, 4]
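Note that a list comprehension builds a *new* list rather than updating the original in place, so to get the same effect as the index-based loop you assign the result back to the name. A sketch with a fresh list:

```python
int_lst = [1, 2, 3, 4]

doubled = [i * 2 for i in int_lst]   # a brand-new list
print(doubled)    # [2, 4, 6, 8]
print(int_lst)    # [1, 2, 3, 4]  (original unchanged)

# rebinding the name gives the same final values as the loop version
int_lst = [i * 2 for i in int_lst]
print(int_lst)    # [2, 4, 6, 8]
```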

# List Operations

- Lists have two main operators, `+` and `*`
    - the `+` operator concatenates two or more lists
    - the `*` operator repeats a list a given number of times

In [78]:
# `+` operator
[1,2,3] + [4, 5]

[1, 2, 3, 4, 5]

In [89]:
# however, this will not work: `+` can only join a list with another list
[1, 2, 3] + 2

TypeError: can only concatenate list (not "int") to list

In [79]:
# `*` operator
[1, 2, 3] * 3

[1, 2, 3, 1, 2, 3, 1, 2, 3]

# Slicing a list

- Like strings, we can slice a list using `:`
    - again, list indices start at `0`
    - the index of the last item in a list is `-1`

In [92]:
my_lst = [1, 2, 3] * 3
my_lst[1:3]

[2, 3]

In [93]:
my_lst[-3:-1]

[1, 2]

In [94]:
my_lst[:]

[1, 2, 3, 1, 2, 3, 1, 2, 3]

In [95]:
# Since lists are mutable, you can also assign new values to a slice of a list
my_lst[1:3] = [4,5]
my_lst

[1, 4, 5, 1, 2, 3, 1, 2, 3]

In [96]:
# however, this may not do what you expect
# what happened to the length of the list?
my_lst[1:3] = [7, 8, 9]
my_lst

[1, 7, 8, 9, 1, 2, 3, 1, 2, 3]

# List Methods

- Two of the most important list methods are `.append()` and `.extend()`
    - `.append()` adds a single __element/item__ to the end of the list
        - if you are adding values to a list one at a time, you can call `.append()` inside a loop
    - `.extend()` adds all elements from __another list__ to the end of the list
        - `.extend()` gives the same result as `+`, except that it changes the original list in place instead of creating a new list
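A short sketch of the difference: passing a list to `.append()` adds the whole list as *one* nested element, `.extend()` adds each element separately, and `+` produces the same values as `.extend()` but as a new list.

```python
lst1 = [1, 2, 3]
lst1.append([4, 5])      # the whole list becomes ONE new element
print(lst1)              # [1, 2, 3, [4, 5]]

lst2 = [1, 2, 3]
lst2.extend([4, 5])      # each element is added separately, in place
print(lst2)              # [1, 2, 3, 4, 5]

# `+` gives the same values but builds a new list and leaves lst3 alone
lst3 = [1, 2, 3]
combined = lst3 + [4, 5]
print(combined, lst3)    # [1, 2, 3, 4, 5] [1, 2, 3]
```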

In [99]:
lst1 = [1,2,3]
# `.append()` takes exactly one argument, which is added as a single element
lst1.append(4)
lst1

[1, 2, 3, 4]

In [100]:
lst2 = [5, 6]
lst1.extend(lst2)
print(lst1)

[1, 2, 3, 4, 5, 6]


In [101]:
# `lst2` remains unchanged
lst2

[5, 6]

In [102]:
# Another useful method of list is `.sort()`
# which can sort the elements in a list, as the name suggests
my_lst1 = [5, 2, 4, 8, 6, 1]
my_lst1.sort()
my_lst1

[1, 2, 4, 5, 6, 8]

In [104]:
# you can also sort a list of strings
my_lst2 = ['z', 'y', 'x']
my_lst2.sort()
my_lst2

# Be careful if your list contains different data types: Python 3 cannot sort a mix of numbers and strings

['x', 'y', 'z']
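Two common pitfalls with `.sort()`, as a sketch: it changes the list in place and returns `None` (so `my_lst1 = my_lst1.sort()` would lose the list), and in Python 3 a list that mixes numbers and strings cannot be sorted.

```python
# .sort() changes the list in place and returns None
my_lst1 = [5, 2, 4, 8, 6, 1]
result = my_lst1.sort()
print(result)     # None
print(my_lst1)    # [1, 2, 4, 5, 6, 8]

# numbers and strings cannot be compared with '<' in Python 3
mixed = [3, 'a', 1]
try:
    mixed.sort()
    sorted_ok = True
except TypeError as err:
    sorted_ok = False
    print('TypeError:', err)
```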

# Deleting Elements

- We already know how to insert (add) and update elements in a list; how about _deleting_ elements?
    - Python provides several ways to delete elements
        - if you know the index of the element, you can use the `.pop()` method
        - you can also use the `del` statement, `del lst[i]`, in which `i` is the index of the element
            - the difference between `.pop()` and `del` is that `.pop()` returns the removed element, so you can store it in a variable
        - if you know the element you want to delete but not its index, you can use the `.remove()` method
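A few details of these deletion methods, as a sketch: `.pop()` with no argument removes and returns the *last* element, `.remove()` deletes only the *first* matching value, and removing a value that is not in the list raises a `ValueError`.

```python
letters = ['a', 'b', 'c', 'b']

# .pop() with no argument removes and returns the last element
last = letters.pop()
print(last, letters)      # b ['a', 'b', 'c']

# .remove() deletes only the first matching value
letters = ['a', 'b', 'c', 'b']
letters.remove('b')
print(letters)            # ['a', 'c', 'b']

# removing a value that is not in the list raises ValueError
try:
    letters.remove('z')
except ValueError:
    print('z is not in the list')
```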

In [105]:
# example of `.pop()`
my_lst2 = ['z', 'y', 'x']
# remove the last element
popped = my_lst2.pop(2)
# should be ['z', 'y']
print(my_lst2)
# contains 'x'
print(popped)

['z', 'y']
x


In [106]:
# example of the `del` statement
my_lst2 = ['z', 'y', 'x']
del my_lst2[2]
print(my_lst2)

['z', 'y']


In [107]:
# you can also delete a slice of a list
my_lst2 = ['z', 'y', 'x']
del my_lst2[1:2]
print(my_lst2)

['z', 'x']


In [108]:
# example of `.remove()`
my_lst2 = ['z', 'y', 'x']
my_lst2.remove('x')
print(my_lst2)

['z', 'y']


# List Functions

- Python provides several built-in functions that give you a quick summary of a numeric list
    - `len()` returns the number of elements in the list
    - `max()` returns the largest value in the list
    - `min()` returns the smallest value in the list
    - `sum()` returns the total of all values in the list


In [109]:
nums = [3, 41, 12, 9, 74, 15]
# number of elements
print(len(nums))
# maximum
print(max(nums))
# minimum
print(min(nums))
# sum total
print(sum(nums))
# arithmetic mean, aka. average
print(sum(nums)/len(nums))

6
74
3
154
25.666666666666668


# Lists and Strings
- Strings and lists are very similar
    - we can think of a string as an _immutable_ sequence of _characters_
    - you can convert a string to a list of characters by using `list()`
    - you can also split a multi-word string (e.g. an English sentence) into a list of words using the `.split()` method
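Because lists are mutable and strings are not, one common pattern is to convert a string to a list, change the list, and glue the characters back together with `''.join()`. A minimal sketch:

```python
my_str = 'hello world!'

chars = list(my_str)        # ['h', 'e', 'l', 'l', 'o', ...]
chars[0] = 'H'              # allowed: lists are mutable
# my_str[0] = 'H' would raise a TypeError: strings are immutable

new_str = ''.join(chars)    # join the characters with an empty separator
print(new_str)              # Hello world!
print(my_str)               # hello world!  (unchanged)
```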

In [110]:
my_str = 'hello world!'
my_str_lst = list(my_str)
my_str_lst

['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!']

In [111]:
word_lst = my_str.split()
word_lst

['hello', 'world!']

In [112]:
# you can specify what you want to split on
# by passing the delimiter as an argument
weird_str = 'this=is=a=string'
weird_str.split('=')

['this', 'is', 'a', 'string']

In [115]:
# you can reverse `.split()` by using `.join()`
' '.join(weird_str.split('='))

'this is a string'

# Lists Equality and Aliasing

- In some logical statements, we may want to test if two lists are equal
    - we can always use the `==` operator like we did with integers, floats, ...
    - however, we also introduced another operator called `is`
        - if you try `1 is 1` you will get `True`
        - but if you try `[1] is [1]` you will get `False`, which may be unexpected
        - this is because `is` tests whether two names refer to the same __object__ (refer to p. 99 in PY4E for more details)
        - and two lists created separately are different __objects__, even if their values are equal
        - if we assign one list variable to another, both names refer to the same object: this is called _aliasing_
    


In [116]:
lst1 = [1, 2, 3]
lst2 = [1, 2, 3]
lst1 == lst2

True

In [117]:
lst1 = [1, 2, 3]
lst2 = [1, 2, 4]
lst1 == lst2

False

In [118]:
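# True in CPython, because small integers are cached (Python 3.8+ also warns about using `is` with a literal)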
1 is 1

True

In [119]:
# ???
[1] is [1]

False

In [120]:
lst_a = [1]
# create an alias of `lst_a` as `lst_b`
lst_b = lst_a
# Now they are the same object
lst_a is lst_b

True
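A sketch of why aliasing matters: a change made through one name is visible through the other, because there is only one list. A slice copy such as `lst_a[:]` creates a new list with equal values, so it is `==` but not `is`, and changing it leaves the original alone.

```python
lst_a = [1, 2, 3]
lst_b = lst_a            # alias: both names refer to the SAME list object
lst_b.append(4)
print(lst_a)             # [1, 2, 3, 4]  (changed through lst_b)

# a slice copy creates a NEW list with equal values
lst_c = lst_a[:]
print(lst_c == lst_a, lst_c is lst_a)   # True False
lst_c.append(5)
print(lst_a)             # [1, 2, 3, 4]  (not affected)
```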

# Your Turn Here
Finish the exercises below by following the instructions for each one.

Make sure you provide proper __pseudocode__ for each of your programs.

## Q1. Coding Problem

Write a function to return the last `n` elements of a list.
- `n` should come from user input, as an integer.
    - if `n` is not an integer, return 'Please enter an integer!'
- `test_lst` should be a list of 5 - 7 random integers between 0 and 9 (_HINT_: `random.randint()` and a `for` loop)
    - if `n` is greater than the length of `test_lst`, return 'Please enter a valid integer!'
- Both `test_lst` and `n` should be arguments of the function
- If `n = 0`, return an empty list `[]`.

Example input and output:
```
return_last([1, 2, 3, 4, 5], 1) -> [5]
return_last([4, 3, 9, 9, 7, 6], 3) -> [9, 7, 6]
return_last([1, 2, 3, 4, 5], 7) -> 'Please enter a valid integer!'
return_last([1, 2, 3, 4, 5], 0) -> []
return_last([1, 2, 3, 4, 5], 0.1) -> 'Please enter an integer!'
```

In [45]:
import random

def return_last(test_lst, n):
    # `n` must be an integer (e.g. 0.1 or 'q' are rejected)
    if not isinstance(n, int):
        return 'Please enter an integer!'
    # handle n = 0 separately: test_lst[-0:] would return the whole list
    if n == 0:
        return []
    # n cannot be larger than the list (a negative n is rejected as well)
    if n > len(test_lst) or n < 0:
        return 'Please enter a valid integer!'
    # slice off the last n elements
    return test_lst[-n:]

In [39]:
# create a list of 5-7 random integers between 0 and 9
test_lst = []
for x in range(random.randint(5, 7)):
    test_lst.append(random.randint(0, 9))

# user input - convert to an integer if possible
n = input()
try:
    n = int(n)
except ValueError:
    pass  # keep the raw text; return_last() will reject it

return_last(test_lst, n)

 q


'Please enter an integer!'

## Q2. Coding Problem

Write a program to extract every integer between 0 and 100 that is divisible by 3 __and__ 5 into a list, and calculate the average of the list.

__Additional challenge__: write the program as a _one-liner_.

In [48]:
lst = [i for i in range(100) if i % 3 == 0 and i % 5 == 0]
print(lst)
print(sum(lst)/len(lst))

[0, 15, 30, 45, 60, 75, 90]
45.0

# Classwork (start here in class)
You can start working on them right now:
- Read Chapters 6 and 8 in PY4E
- If time permits, start in on your homework. 
- Ask questions when you need help. Use this time to get help from the professor!

# Homework (do at home)
The following is due before class next week:
  - Any remaining classwork from tonight
  - Data Camp “Python Lists” assignment 

Note: All work on Data Camp is logged. Don't try to fake it!

Please email jtao@fairfield.edu if you have any problems or questions.
