<a target="_blank" href="https://colab.research.google.com/github/svniko/python-fund-2023/blob/main/Lecture9.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>

                                         Assoc. Prof. Svitlana Kovalenko
                                         Department of Software Engineering 
                                         and Management Intelligent Technologies
                                          NTU KhPI

# Lecture 9

# Python sequences. Strings

A sequence type is a data type in Python that can store more than one value (or none at all, since a sequence may be empty), and whose values can be browsed sequentially (hence the name), element by element.

As the `for` loop is a tool especially designed to iterate through sequences, we can express the definition as: a sequence is data which can be scanned by the `for` loop.

You've encountered one Python sequence so far - the list. The **list is a classic example of a Python sequence**, although there are some other sequences worth mentioning.


The second notion - **mutability - is a property of any of Python's data that describes its readiness to be freely changed during program execution**. There are two kinds of Python data: mutable and immutable.

Mutable data can be freely updated at any time.

**Immutable data cannot be modified in this way.**

Imagine that a list could only be assigned and read. You would be able neither to append an element to it nor to remove one from it. Appending an element to the end of the list would then require recreating the list from scratch.

You would have to build a completely new list, consisting of all the elements of the existing list plus the new element.

The data type to discuss now is the string. **A string is an immutable sequence type**. It behaves like a list in many ways (iteration, indexing, slicing), but it cannot be modified in place.

In [39]:
s = 'Python Fundamentals'
# we can iterate over a string
for ch in s:
    print(ch)

P
y
t
h
o
n
 
F
u
n
d
a
m
e
n
t
a
l
s


![image.png](attachment:image.png)

In [40]:
# we can use indexing 
s[0]

'P'

In [41]:
# we can slice it
s[:6]

'Python'

In [42]:
s[::-2]

'sanmdu otP'

In [43]:
s[-12: -2]

'Fundamenta'

In [44]:
# we can use it in list comprehension
# delete all vowels from a string
[ch for ch in s if ch not in 'eyuioa']

['P', 't', 'h', 'n', ' ', 'F', 'n', 'd', 'm', 'n', 't', 'l', 's']

In [45]:
''.join([ch for ch in s if ch not in 'eyuioa'])

'Pthn Fndmntls'

In [46]:
# BUT we cannot assign a value to a string element
s[0] = 'B'

TypeError: 'str' object does not support item assignment
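Since item assignment is not allowed, the usual way to "change" a string is to build a new one. The sketch below shows two common options (slicing plus concatenation, and a round trip through a list); the variable names `t`, `chars`, and `u` are only illustrative.

```python
s = 'Python Fundamentals'

# Option 1: new first character + the rest of the old string
t = 'B' + s[1:]
print(t)   # Bython Fundamentals

# Option 2: convert to a list (mutable), change it, join back into a string
chars = list(s)
chars[0] = 'J'
u = ''.join(chars)
print(u)   # Jython Fundamentals

# The original string is untouched in both cases
print(s)   # Python Fundamentals
```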

# String Operators

The `+` operator concatenates strings:

In [47]:
s = "Python"
t = "Fundamentals"
s + t

'PythonFundamentals'

In [48]:
q = s + t
q

'PythonFundamentals'

In [49]:
"Python" + ' ' + 'Fundamentals'

'Python Fundamentals'

The `*` operator creates multiple copies of a string:

In [50]:
'Python!' * 3

'Python!Python!Python!'

The `in` and `not in` operators provide boolean testing of membership within a string:

In [51]:
'Python' in q

True

In [52]:
'C++' not in q

True

### About string immutability

In [53]:
s = 'Python'
s += " Fundamentals"
s

'Python Fundamentals'

In [54]:
s = 'Python'
print(id(s))
s += " Fundamentals"
print(id(s))
s

17565952
26274608


'Python Fundamentals'
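The identifiers printed by `id()` differ in the run above because `+=` does not modify the existing string: it builds a new string object and rebinds the name `s` to it (the exact numbers vary from run to run). A small sketch with a second name, here called `alias` purely for illustration, makes this visible:

```python
s = 'Python'
alias = s                # a second name for the same string object
s += ' Fundamentals'     # builds a new string and rebinds s to it

print(alias)             # Python
print(s)                 # Python Fundamentals
print(alias is s)        # False
```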

## Built-In String Functions

Python provides many functions that are built into the interpreter and always available. In this lesson, you’ll see a few that work with strings and character data:

|Function|Description|
|--------|-----------|
|chr()|Converts an integer (a Unicode code point) to a character|
|ord()|Converts a character to an integer (its Unicode code point)|
|len()|Returns the length of a string|
|str()|Returns a string representation of an object|
|int()|Returns an integer built from a number or a string; the optional `base` argument says how a string is interpreted (base 10 by default)|
|float()|Returns a floating-point number built from a number or a string|
### String literals

We can create string literals using single, double, or triple quotes.

In [55]:
# defining strings in Python
# using single quotes
my_string = 'Python'

In [56]:
# using function type() to get the data type of my_string, 
# and printing the string
print(type(my_string)) 
print(my_string)

<class 'str'>
Python


In [57]:
# using double quotes
my_string = "Python"
print(type(my_string))
print(my_string)

<class 'str'>
Python


In [59]:
# using triple quotes
my_string = '''Python'''
print(type(my_string))
print(my_string)
# triple-quoted strings can span multiple lines
my_string = """Python
   Fundamentals
Course
"""

print(type(my_string))
print(my_string)

<class 'str'>
Python
<class 'str'>
Python
   Fundamentals
Course



#### Escape Characters
An escape character lets you use characters that are otherwise impossible to put
into a string. An escape character consists of a backslash (`\`) followed by the
character you want to add to the string. (Despite consisting of two characters, it
is commonly referred to as a singular escape character.) For example, the escape
character for a single quote is `\'`. 

In [60]:
"Bob's cat"

"Bob's cat"

In [61]:
'Bob's cat'

SyntaxError: invalid syntax (3552450982.py, line 1)

In [62]:
'Bob\'s cat'

"Bob's cat"

|Escape character| Prints as|
|----------------|----------|
|\\'| Single quote|
|\\"|Double quote|
|\\t| Tab|
|\\n |Newline (line break)|
|\\\ |Backslash|


In [63]:
print('Hello!\nHow are you?\nHope you\'re doing well.')

Hello!
How are you?
Hope you're doing well.


In [64]:
print('Hello!\tHow are you?\tHope you\'re doing well.')

Hello!	How are you?	Hope you're doing well.


### Raw Strings
You can place an `r` before the beginning quotation mark of a string to make it a
raw string. A raw string does not process escape sequences: every backslash
stays in the string as an ordinary character.

In [65]:
print(r'Bob\'s cat.')

Bob\'s cat.


In [67]:
print(r"Bob's cat.")

Bob's cat.


In [68]:
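# \P and \E are not recognized escape sequences, so the backslashes are kept;
# recent Python versions warn about this, so prefer a raw string or \\
print('D:\Py_2023\EN')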

D:\Py_2023\EN


In [69]:
# here \n IS an escape sequence (newline), so the path gets broken
print('D:\Py_2023\nice')

D:\Py_2023
ice


In [70]:
print(r'D:\Py_2022\nice')

D:\Py_2022\nice


In [71]:
print('D:\\Py_2022\\nice')

D:\Py_2022\nice
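One way to see the difference between a normal and a raw string is to compare their lengths. This is only a small sketch with illustrative variable names:

```python
normal = 'a\nb'     # \n is a single newline character
raw = r'a\nb'       # the backslash and the n are two separate characters

print(len(normal))      # 3
print(len(raw))         # 4
print(raw == 'a\\nb')   # True: the same text written with an escaped backslash
```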


In [72]:
# Creating an empty string
s = str()
s

''

In [73]:
s = ''
s

''

### Python string methods

|Method|Description|
|--------|-------------|
|split()|Splits a string into a list of strings using a delimiter.|
|join()|Returns a new string that concatenates the strings in an iterable, using the string it is called on as the delimiter.|
|strip()|Returns a copy of the string with leading and trailing whitespace removed.|
|upper()|Returns a copy of the string converted to uppercase.|
|lower()|Returns a copy of the string converted to lowercase.|
|replace()|Returns a new string in which occurrences of a substring are replaced by another string.|
|find()|Returns the index of the first occurrence of a substring, or -1 if it is not found.|
|format()|Builds a formatted string from a template string and the supplied values.|


A full list of Python string methods: https://www.w3schools.com/python/python_ref_string.asp

In [74]:
dir(str)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']


### strip()
Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace.

In [75]:
'   hello   '.strip()

'hello'

### lstrip() and rstrip()
`lstrip([chars])`: Return a copy of the string with leading characters removed.

`rstrip([chars])`: Return a copy of the string with trailing characters removed.

In [76]:
'   hello   '.lstrip()

'hello   '

In [77]:
'   hello   '.rstrip()

'   hello'

#### strip() with character
We can specify character(s) instead of removing the default whitespace.

In [78]:
'***hello***'.strip('*')

'hello'

Careful: only matches at the very beginning and the very end of the string are removed:

In [79]:
' \n \t hello\n'.strip('\n')
# the string starts with ' ', not '\n', so the inner '\n' is not removed!


' \n \t hello'

In [80]:
'\n \t hello\n'.strip('\n')

' \t hello'

In [83]:
' \n \t hello\n'.strip()

'hello'

In [85]:
' \n \t hello\n'.strip(' ').strip('\n')

' \t hello'

### strip() with combination of characters

The `chars` argument is a string specifying the set of characters to be removed. All leading and trailing occurrences of any of these characters are removed, not the given string as a whole.

In [86]:
'www.example.com'.strip('cmow.')

'example'

### removeprefix() and removesuffix()
**They were introduced in Python 3.9.**

As seen before, `strip`, `lstrip`, and `rstrip` treat the passed `chars` string as a set of characters and remove every leading or trailing character that belongs to it. If we want to remove exactly the given string, we can use `removeprefix` and `removesuffix`.

In [90]:
'python: pythonic way'.rstrip('python: ')

'python: pythonic wa'

In [None]:
'python: pythonic way'.removeprefix('python: ')

'pythonic way'

In [None]:
'python: pythonic way'.removesuffix('python: ')

'python: pythonic way'
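On an interpreter older than Python 3.9 these two methods are not available. A minimal sketch of equivalent helpers is shown below; the names `remove_prefix` and `remove_suffix` are hypothetical and not part of the standard library. The first `print` also shows why `lstrip` is not a substitute: it keeps removing characters from the set and eats part of the word "pythonic".

```python
def remove_prefix(text, prefix):
    """Hypothetical helper: drop prefix from text if text starts with it."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


def remove_suffix(text, suffix):
    """Hypothetical helper: drop suffix from text if text ends with it."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


s = 'python: pythonic way'
print(s.lstrip('python: '))            # ic way
print(remove_prefix(s, 'python: '))    # pythonic way
print(remove_suffix(s, ' way'))        # python: pythonic
print(remove_prefix(s, 'java'))        # python: pythonic way (unchanged)
```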

### replace()
Return a copy of the string with all occurrences of substring old replaced by new.

In [91]:
' \n \t hello\n'.replace('\n', '')

'  \t hello'

### split()
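Return a list of the words in the string, using `sep` as the delimiter string. If `sep` is omitted, any run of consecutive whitespace counts as a single separator and leading and trailing whitespace is ignored. If `maxsplit` is given, at most `maxsplit` splits are done.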

In [92]:
'string methods in python'.split()

['string', 'methods', 'in', 'python']

In [93]:
'string methods in python'.split(" ")

['string', 'methods', 'in', 'python']

In [94]:
'string    methods    in    python'.split(" ")

['string', '', '', '', 'methods', '', '', '', 'in', '', '', '', 'python']

In [95]:
'string    methods    in    python'.split()

['string', 'methods', 'in', 'python']

In [97]:
'string methods in python'.split(' ', maxsplit=2)

['string', 'methods', 'in python']

In [98]:
'string   methods   in python'.split(' ', maxsplit=1)

['string', '  methods   in python']
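The difference between `split()` and `split(' ')` matters most at the edges of a string. The sketch below also shows a common pattern for comma-separated data (the sample `line` is made up for illustration): split on the comma, then `strip()` each piece.

```python
# No argument: whitespace runs are one separator, edges are ignored
no_sep = '  a  b '.split()
print(no_sep)        # ['a', 'b']

# Explicit separator: every single space splits, so empty strings appear
with_sep = '  a  b '.split(' ')
print(with_sep)      # ['', '', 'a', '', 'b', '']

# Splitting comma-separated fields and cleaning each of them
line = '  alice,  30 ,Kharkiv  '
fields = [part.strip() for part in line.split(',')]
print(fields)        # ['alice', '30', 'Kharkiv']
```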

### rsplit()
Return a list of the words in the string, using sep as the delimiter string. If `maxsplit` is given, at most `maxsplit` splits are done, the rightmost ones.

In [99]:
'string methods in python'.rsplit()

['string', 'methods', 'in', 'python']

In [100]:
'string methods in python'.rsplit(' ', maxsplit=1)

['string methods in', 'python']

### join()
Return a string which is the concatenation of the strings in the iterable. The string on which the method is called is used as the separator between the elements.

In [101]:
Lst = ['string', 'methods', 'in', 'python']
' '.join(Lst)

'string methods in python'

In [102]:
Lst = ['string', 'methods', 'in', 'python']
'*'.join(Lst)

'string*methods*in*python'

In [103]:
part = '$'
part.join(Lst)

'string$methods$in$python'
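`join()` accepts only strings, so an iterable of numbers has to be converted first. A small sketch, with `numbers` as a made-up example list:

```python
numbers = [1, 2, 3]

# ', '.join(numbers) would raise TypeError, because the items are not strings
joined = ', '.join(str(n) for n in numbers)
print(joined)   # 1, 2, 3
```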

### upper(), lower(), capitalize(), title()
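`upper()`: Return a copy of the string with all cased characters converted to uppercase.

`lower()`: Return a copy of the string with all cased characters converted to lowercase.

`capitalize()`: Return a copy of the string with its first character capitalized and the rest lowercased.

`title()`: Return a copy of the string in which each word starts with an uppercase character and the remaining characters are lowercase.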



In [105]:
'PyThoN FunDaMenTals cOuRSe'.upper()

'PYTHON FUNDAMENTALS COURSE'

In [106]:
'PyThoN FunDaMenTals cOuRSe'.lower()

'python fundamentals course'

In [107]:
'PyThoN FunDaMenTals cOuRSe'.capitalize()

'Python fundamentals course'

In [108]:
'PyThoN FunDaMenTals cOuRSe'.title()

'Python Fundamentals Course'

### islower(), isupper()

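`islower()` returns True if all cased characters in the string are lowercase and there is at least one cased character; `isupper()` does the same check for uppercase. Characters without case (spaces, digits, punctuation) are ignored.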

In [109]:
'PYTHON FUNDAMENTALS COURSE'.islower()

False

In [110]:
'python fundamentals course'.islower()

True

In [111]:
'PYTHON FUNDAMENTALS COURSE'.isupper()

True

In [112]:
'Python Fundamentals Course'.isupper()

False

In [113]:
'PYTHON FUNDAMENTALS COURSe'.isupper()

False

### isalpha(), isnumeric(), isalnum()

`isalpha()`: Return True if all characters in the string are alphabetic and there is at least one character, False otherwise.

`isnumeric()`: Return True if all characters in the string are numeric characters, and there is at least one character, False otherwise.

`isalnum()`: Return True if all characters in the string are alphanumeric and there is at least one character, False otherwise.

In [114]:
s = 'python'
print(s.isalpha(), s.isnumeric(), s.isalnum())

True False True


In [115]:
s = '123'
print(s.isalpha(), s.isnumeric(), s.isalnum())

False True True


In [117]:
s = 'python2023'
print(s.isalpha(), s.isnumeric(), s.isalnum())

False False True


In [118]:
s = 'python-2023'
print(s.isalpha(), s.isnumeric(), s.isalnum())

False False False
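Note that `isnumeric()` looks only at individual characters, so strings such as `'-5'` or `'3.14'` are not numeric in this sense. When the goal is to check whether a string can be converted to a number, trying the conversion is often simpler. The helper name `to_int` below is hypothetical:

```python
def to_int(text):
    """Hypothetical helper: return int(text), or None if text is not an integer."""
    try:
        return int(text)
    except ValueError:
        return None


for candidate in ['123', '-5', '3.14', '']:
    print(repr(candidate), candidate.isnumeric(), to_int(candidate))
# '123' True 123
# '-5' False -5
# '3.14' False None
# '' False None
```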


###  isspace(), istitle()

`isspace()` returns True if the string consists only of whitespace characters (such as spaces, tabs, and newlines) and is not empty.

`istitle()` returns True if the string consists only of words that begin with an uppercase letter followed by only lowercase letters.

In [119]:
'\n  \t'.isspace()

True

In [120]:
"Python Fundamentals Course".istitle()

True

In [121]:
"Python fundamentals course".istitle()

False

### count()
Return the number of non-overlapping occurrences of substring `sub` in the slice `s[start:end]`.

In [122]:
'hello world'.count('l')

3

In [123]:
'hello world'.count('l',3,10)

2

### find()
Return the lowest index in the string where substring `sub` is found within the slice `s[start:end]`. Return -1 if `sub` is not found.

In [124]:
s = 'Python Fundamentals'
idx = s.find('a')
print(idx)
print(s[idx])
print(s[idx:])

11
a
amentals


In [125]:
idx = s.find('a', 12)
print(idx)
print(s[idx])
print(s[idx:])

16
a
als


In [126]:
# if not found, find() returns -1; note that s[-1] is then just the last character
idx = s.find('a', 12, 14)
print(idx)
print(s[idx])
print(s[idx:])

-1
s
s
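Because -1 is itself a valid index (the last character), the result of `find()` should be checked before it is used for indexing or slicing. A minimal sketch of the usual alternatives:

```python
s = 'Python Fundamentals'

idx = s.find('z')
if idx == -1:
    print('not found')
else:
    print(s[idx:])

# index() works like find() but raises ValueError instead of returning -1
try:
    s.index('z')
except ValueError:
    print('index() raised ValueError')

# For a plain yes/no question, the in operator is enough
print('z' in s)   # False
```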


In [128]:
help(str.find)

Help on method_descriptor:

find(...)
    S.find(sub[, start[, end]]) -> int
    
    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Return -1 on failure.



### rfind()
Return the highest index in the string where substring `sub` is found, such that `sub` is contained within `s[start:end]`.

In [129]:
s

'Python Fundamentals'

In [130]:
idx = s.rfind('a')
print(idx)

16


In [131]:
idx = s.rfind('x')
print(idx)

-1


### startswith() and endswith()
Return `True` if string starts/ends with the prefix/suffix, otherwise return `False`.

In [132]:
'Python'.startswith('Py')

True

In [133]:
'Python'.endswith('on')

True

In [134]:
'Python'.endswith('of')

False
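Both methods also accept a tuple of candidates, which avoids chaining several checks with `or`. The file name below is just an example value:

```python
filename = 'lecture9.ipynb'

is_python_file = filename.endswith(('.py', '.ipynb'))
is_homework = filename.startswith(('lab', 'hw'))

print(is_python_file)   # True
print(is_homework)      # False
```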

### partition()
Split the string at the first occurrence of `sep`, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings.

In [135]:
'Python is awesome!'.partition('is')

('Python ', 'is', ' awesome!')

In [136]:
'Python is awesome!'.partition('very')

('Python is awesome!', '', '')
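Since `partition()` always returns exactly three items, it is convenient for unpacking "key = value" style text. The sample `line` is invented for this sketch:

```python
line = 'course = Python Fundamentals'

key, sep, value = line.partition('=')
key, value = key.strip(), value.strip()
print(key)      # course
print(value)    # Python Fundamentals

# Only the first separator is used; the rest stays in the last part
parts = 'a=b=c'.partition('=')
print(parts)    # ('a', '=', 'b=c')
```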

### center(), ljust(), rjust()
`center()`: Return the string centered in a string of length width. Padding is done using the specified fillchar (default is a space).

`ljust()`: Return the string left justified in a string of length width. Padding is done using the specified fillchar (default is a space).

`rjust()`: Return the string right justified in a string of length width. Padding is done using the specified fillchar (default is a space).

In [137]:
"Python Fundamentals".center(30, '-')

'-----Python Fundamentals------'

In [138]:
"Python Fundamentals".ljust(30, '-')

'Python Fundamentals-----------'

In [139]:
"Python Fundamentals".rjust(30, '@')

'@@@@@@@@@@@Python Fundamentals'

### swapcase()
Return a copy of the string with uppercase characters converted to lowercase and vice versa.

In [140]:
'PYTHON Fundamentals'.swapcase()

'python fUNDAMENTALS'

### zfill()
Return a copy of the string left filled with `'0'` digits to make a string of length `width`. A leading sign prefix (`'+'`/`'-'`) is handled by inserting the padding after the sign character rather than before.

In [141]:
'42'.zfill(5)

'00042'

In [143]:
'-42'.zfill(10)

'-000000042'

### f-Strings
Since Python 3.6, f-strings can be used to format strings. They are usually more readable and more concise than `format()` calls, and often faster as well.

In [144]:
a = 2
b = 3
f'{a} + {b} = {a + b}'

'2 + 3 = 5'

### format()

Syntax
```Python
string.format(value1, value2...)
```

The `format()` method formats the specified value(s) and inserts them into the string's placeholders.

A placeholder is defined using curly brackets: {}. Read more about what can go inside a placeholder in the Formatting Types section below.

The `format()` method returns the formatted string.


In [145]:
"My name is {fname}, I'm {age}".format(fname = "John", age = 23)

"My name is John, I'm 23"

In [146]:
"My name is {0}, I'm {1}".format("John", 23)

"My name is John, I'm 23"

In [147]:
"My name is {1}, I'm {0}".format("John", 23)

"My name is 23, I'm John"

In [148]:
"My name is {}, I'm {}".format("John",23)

"My name is John, I'm 23"

### Formatting Types
Inside the placeholders you can add a formatting type to format the result:

|Format type|Description|
|---|---|
|:<|Left-aligns the result (within the available space)|
|:>|Right-aligns the result (within the available space)|
|:^|Centers the result (within the available space)|
|:=|Places the sign at the leftmost position|
|:+|Shows a sign for both positive and negative numbers|
|:-|Shows a minus sign for negative numbers only|
|: |Inserts a space before positive numbers (and a minus sign before negative numbers)|
|:,|Uses a comma as a thousands separator|
|:_|Uses an underscore as a thousands separator|
|:b|Binary format|
|:c|Converts the value into the corresponding Unicode character|
|:d|Decimal format|
|:e|Scientific format, with a lowercase e|
|:E|Scientific format, with an uppercase E|
|:f|Fixed-point number format|
|:F|Fixed-point number format, in uppercase (shows inf and nan as INF and NAN)|
|:g|General format|
|:G|General format (using an uppercase E for scientific notation)|
|:o|Octal format|
|:x|Hex format, lowercase|
|:X|Hex format, uppercase|
|:n|Number format (like d or g, but locale-aware)|
|:%|Percentage format|
In [149]:
txt = "We have {:8} friends."
txt.format(100)

'We have      100 friends.'

In [150]:
txt = "We have {:<8} friends."
txt.format(100)

'We have 100      friends.'

In [154]:
txt = "We have {:>8X} friends."
txt.format(110)

'We have       6E friends.'

In [155]:
txt = "We have {:>+8} friends."
txt.format(100)

'We have     +100 friends.'

In [156]:
txt = "We have {:.2f} dollars."
txt.format(100)

'We have 100.00 dollars.'

In [157]:
txt = "We have {:_} dollars."
txt.format(100000)

'We have 100_000 dollars.'

In [159]:
# the same things we can do with f-strings
f"We have {100000:_} dollars."

'We have 100_000 dollars.'

In [161]:
a = 3.1415926
f'The Pi value is {a:^10.5f} '

'The Pi value is  3.14159   '
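Width, alignment, and type can be combined to print simple aligned tables. The sketch below uses made-up data in `rows`; each line is built from a left-aligned name (12 characters wide), a right-aligned quantity (4 wide), and a price with two decimals (8 wide). A few of the other format types from the table are shown as well.

```python
rows = [('apple', 3, 0.5), ('watermelon', 12, 2.25)]

lines = [f'{name:<12}{qty:>4}{price:>8.2f}' for name, qty, price in rows]
for line in lines:
    print(line)

print(f'{255:b}')        # 11111111
print(f'{255:x}')        # ff
print(f'{1234567:,}')    # 1,234,567
print(f'{0.256:.1%}')    # 25.6%
```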

## Examples

### Example 1
You are given a string and you have to find its first word.

In [162]:
def first_word(text):
    return text.split()[0]

In [163]:
assert first_word("Hello   world") == "Hello"
assert first_word("a word") == "a"
assert first_word("greetings from Python") == "greetings"
assert first_word("hi") == "hi"
print("OK")

OK


In [165]:
"Hello   world".split()[0]

'Hello'

#### More difficult version

When solving the task, pay attention to the following points:

- There can be dots and commas in a string.
- A string can start with a letter or, for example, with one or more dots or spaces.
- A word can contain an apostrophe, and it is a part of the word.
- The whole text can be represented with one word.

In [174]:
def first_word_1(text):
    # turn dots and commas into spaces, then take the first whitespace-separated word
    return text.replace('.', ' ').replace(',', ' ').split()[0]

In [175]:
assert first_word_1("Hello world") == "Hello"
assert first_word_1("  a word ") == "a"
assert first_word_1(" don't touch it") == "don't"
assert first_word_1("greetings, friends") == "greetings"
assert first_word_1("...Don't speak...") == "Don't"
assert first_word_1(", Hello.World") == "Hello"
print("OK")

OK


In [176]:
s = '& *H:e,.;l-?l!o'
for char in " ,.;:-?!&*":
     s = s.replace(char, "")
print(s)

Hello
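The loop above calls `replace()` once per unwanted character. An alternative sketch uses `str.maketrans()` and `translate()`, both listed in `dir(str)` above, to delete all of them in one pass; the variable names `table` and `cleaned` are illustrative.

```python
s = '& *H:e,.;l-?l!o'

# The third argument of str.maketrans lists the characters to delete
table = str.maketrans('', '', ' ,.;:-?!&*')
cleaned = s.translate(table)
print(cleaned)   # Hello
```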


### Example 2
You have a sequence of strings, and you’d like to determine the most frequently occurring string in the sequence. Assume there is exactly one such string.

In [177]:
help(max)

Help on built-in function max in module builtins:

max(...)
    max(iterable, *[, default=obj, key=func]) -> value
    max(arg1, arg2, *args, *[, key=func]) -> value
    
    With a single iterable argument, return its biggest item. The
    default keyword-only argument specifies an object to return if
    the provided iterable is empty.
    With two or more arguments, return the largest argument.



In [178]:
def most_frequent(data):
    # data.count is called for every element and the element with the highest count wins
    return max(data, key=data.count)

In [179]:
assert most_frequent(["a", "b", "c", "a", "b", "a"]) == "a"
assert most_frequent(["a", "a", "bi", "bi", "bi"]) == "bi"
assert most_frequent(["a"]) == "a"
print("OK")

OK


In [181]:
most_frequent(["a", "b", "c", "a", "b", "a", "b", "b"])

'b'
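`max(data, key=data.count)` rescans the whole list for every element, which is fine for short lists but slow for long ones. One possible alternative, sketched here under the hypothetical name `most_frequent_counter`, counts all items in a single pass with `collections.Counter`:

```python
from collections import Counter


def most_frequent_counter(data):
    """Hypothetical alternative to most_frequent(): count once, then pick the top item."""
    return Counter(data).most_common(1)[0][0]


print(most_frequent_counter(["a", "b", "c", "a", "b", "a"]))   # a
print(most_frequent_counter(["a", "a", "bi", "bi", "bi"]))     # bi
```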

### Example 3
Try to find out how many zeros a given number has at the end.

In [182]:
def end_zeros(a):
    # the number of trailing zeros is how much shorter the string gets after rstrip('0')
    return len(str(a)) - len(str(a).rstrip('0'))

In [184]:
assert end_zeros(0) == 1
assert end_zeros(1) == 0
assert end_zeros(10) == 1
assert end_zeros(101) == 0
assert end_zeros(245) == 0
assert end_zeros(100100) == 2
assert end_zeros(100000) == 5
assert end_zeros(100007) == 0
assert end_zeros(1000070) == 1
print("OK")

OK
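For comparison, the same result can be obtained without strings by dividing by 10 repeatedly. This is only a sketch under the assumption that the input is a non-negative integer; the name `end_zeros_arith` is hypothetical.

```python
def end_zeros_arith(n):
    """Hypothetical arithmetic variant of end_zeros() for non-negative integers."""
    if n == 0:
        return 1          # '0' consists of a single zero digit
    count = 0
    while n % 10 == 0:    # the last digit is zero
        n //= 10          # drop the last digit
        count += 1
    return count


print(end_zeros_arith(100100))   # 2
print(end_zeros_arith(1000070))  # 1
```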
