In [1]:
from IPython.display import display, HTML, Image
display(HTML("<style>.container { width:90% !important; }</style>"))
Image('https://www.python.org/images/python-logo.gif')

<IPython.core.display.Image object>

## Strings

A string is a basic Python data type used to store text. It is a sequence of characters, which can include letters, digits, punctuation marks and spaces. For example, a string is the natural choice for storing a student's name or address. 

To represent a string, enclose it in a pair of single quotes, double quotes or triple quotes. There is no difference between single-quoted and double-quoted strings, i.e. 'Modulus' and "Modulus" represent the same string.

Python provides a wide variety of operators and built-in functions for string manipulation.

![image.png](attachment:image.png)

## Agenda

* Creating and storing strings
* Basic string operations
* String methods
* Accessing characters in string
* String slicing, joining and splitting
* Formatting strings
* Summary
* Sample Python codes on string data type
* Multiple choice questions and programming

## Creating and Storing Strings

Strings consist of one or more characters surrounded by matching quotation marks. 

In Python, a string is an 'immutable' data type. 

[An immutable object cannot be changed: its state cannot be modified after it is created. You cannot overwrite the contents of an immutable object. However, you can assign a new object to the same variable.]

### Creating simple strings

In [5]:
s1 = 'Data Science'
s2 = "Data Science"
s3 = "M"
print(s1,s2,s3)

Data Science Data Science M


### Defining empty string

In [27]:
s4 = ''
s5 = ""
print(s4,s5)





### Using ' or " within string

In [7]:
s6 = "Modulus's Data Science Program"
s7 = 'As saying goes: "Practice makes perfect"'

print(s6)
print(s7)

Modulus's Data Science Program
As saying goes: "Practice makes perfect"


### Using escape ( \ ) character

In [9]:
s8 = "As saying goes: \"Practice makes perfect\""
print(s8)

As saying goes: "Practice makes perfect"


In [1]:
s8 = "As saying goes: "Practice makes perfect""
print(s8)

SyntaxError: invalid syntax (<ipython-input-1-494d92d62ce1>, line 1)

### Using triple quotes

In [16]:
s9 = '''Modulus Data Science course. 
This is a course on Basic Python.'''
print(s9)

Modulus Data Science course. 
This is a course on Basic Python.


In [18]:
s9 = "Modulus Data Science course. 
This is a course on Basic Python."
print(s9)

SyntaxError: EOL while scanning string literal (<ipython-input-18-eb6e8b3885eb>, line 1)

In [19]:
s9 = "Modulus Data Science course.\nThis is a course on Basic Python."
print(s9)

Modulus Data Science course.
This is a course on Basic Python.


### Using str() built-in function

This function takes in any object and converts it into string.

In [23]:
x=50
print(x)
str(x)

50


'50'

In [26]:
str()

''

In [28]:
type(str())

str

### Storing strings

In [53]:
s10="Hello"
type(s10)

str

![image.png](attachment:image.png)

TEASER: to explain immutability

In [54]:
id(s10)

4545737072

In [55]:
s10="World"
id(s10)

4545894896

In [56]:
s10[0] = 'X'

TypeError: 'str' object does not support item assignment

The characters in a string cannot be changed once the string has been created. 
However, you can assign a different string value to the same variable.
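
Since characters cannot be replaced in place, a common workaround is to build a new string from pieces of the old one. The following is a minimal sketch of that idea; the variable name `original_id` is introduced here only for illustration.

```python
s10 = "World"
original_id = id(s10)   # remember which object s10 refers to right now

# s10[0] = 'X' would raise TypeError, so instead combine a new first
# character with a slice holding the remaining characters.
s10 = "X" + s10[1:]

print(s10)                      # Xorld
print(id(s10) == original_id)   # False: s10 now refers to a new string object
```

The original string "World" was never modified; the variable simply points to a different object afterwards.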

## Basic String Operations

In programming, many tasks require string manipulation. Python provides a wide variety of operators for this purpose, which let you perform these tasks with ease and efficiency. 

### Concatenation operator

'+' is used to concatenate two strings. This operator requires two operands and works on direct string values as well as variables storing string values. Result of the operation is also a string.

In [4]:
s1 = 'Data'
s2 = 'Science'
print(s1 + s2)

DataScience


In [3]:
print('Data' + ' ' + 'Science')

Data Science


In [5]:
print(s1 + ' ' + 'Science')

Data Science


In [2]:
name = input('Enter your name: ')
print('Welcome ' + name)

Enter your name: Daisy
Welcome Daisy


An important point to note here is that '+' operator is also used to add numbers. We are using same operator to concatenate two strings. 

Does this mean that we can also add a number and string using '+'? Answer is NO.

In [9]:
print(4 + '5')

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In such cases, we need to convert either number to string or string to number, depending on our use case.

In [12]:
print(str(4) + '5')

45


In [11]:
print(4 + int('5'))

9


### Replication operator

'*' is used to replicate a string multiple times. This operator requires two operands: an integer, which determines how many times the string is repeated, and the string itself. The operands can be given in either order, so n*s and s*n give the same result. The result of the operation is also a string.

In [3]:
s = input('Enter an expression! ')
n = int(input('Enter number of times expression needs to be repeated '))
print(n*s)

Enter an expression! Hello
Enter number of times expression needs to be repeated 7
HelloHelloHelloHelloHelloHelloHello


In [4]:
print(s*n)

HelloHelloHelloHelloHelloHelloHello


### Membership operator

'in' and 'not in' are used to check the presence or absence of a string in another string. Both of these operators require two operands, both being strings. Result of the operation is a boolean value i.e. True or False.

'in' operator evaluates to True if the string value in the left operand appears in the sequence of characters of string value in right operand. Essentially left string should be a substring of right string.

'not in' operator evaluates to True if the string value in the left operand does not appear in the sequence of characters of string value in right operand. Essentially left string should not be a substring of right string.

In [7]:
main_string = 'Data Science'
sub_string = input('Enter string to check membership: ')

sub_string in main_string

Enter string to check membership: Sce


False

In [8]:
sub_string not in main_string

True

### String comparison

List of comparison operators available in Python:
![image.png](attachment:image.png)

These work on strings much as they do on numbers.
Strings are compared character by character from left to right, using the numeric code (Unicode code point) of each character. For basic English letters, digits and punctuation, this code is the same as the ASCII value.

ASCII (American Standard Code for Information Interchange) assigns each character an integer value between 0 and 127. It is a standard for encoding text characters on computers.

For example, the ASCII values of:
* 'A' is 65 
* 'B' is 66 
* 'a' is 97 
* 'b' is 98
* '0' is 48
* '1' is 49
* '!' is 33 ... and so on.

This means a string starting with an uppercase letter is smaller than a string starting with a lowercase letter. Similarly, a string starting with a digit is smaller than a string starting with a letter.

In [32]:
s1 = 'savan'
s2 = 'savaN'
print(s1 > s2)

True


![image.png](attachment:image.png)

What happens if one of the input strings is empty?

In [49]:
s3 = str() # s3 = ''
print(s1 > s3) # s1 = 'savan' from earlier example

True


In [48]:
s3 = ' ' # different from '': ' ' is a string of length 1 containing a single space character
print(s1 > s3)

True


How to know the ASCII value of a character?

In [50]:
ord('A')

65

In [51]:
ord(' ')

32

In [44]:
ord('')

TypeError: ord() expected a character, but string of length 0 found

In [9]:
ord('AB')

TypeError: ord() expected a character, but string of length 2 found

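How is an empty string compared?

An empty string is smaller than every non-empty string. Comparison proceeds character by character, and when one string runs out of characters first, the shorter string is considered the smaller one.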
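
The following short sketch illustrates how string comparison proceeds character by character, including what happens when one string is empty or is a prefix of the other. The variable names are chosen here for illustration only.

```python
s1 = 'savan'

# Characters are compared left to right by their numeric code.
# If one string ends while all compared characters are equal,
# the shorter string is the smaller one.
empty_is_smaller = '' < s1             # the empty string is a prefix of every string
prefix_is_smaller = 'sav' < 'savan'    # 'sav' ends first
upper_is_smaller = 'savaN' < 'savan'   # ord('N') is 78, ord('n') is 110
z_before_a = 'Zebra' < 'apple'         # ord('Z') is 90, ord('a') is 97

print(empty_is_smaller, prefix_is_smaller, upper_is_smaller, z_before_a)
# True True True True
```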

### Built-in string functions

A function is a block of code that performs a specific task, has its own scope and is called by name. It can take zero or more arguments. After it finishes executing, it may or may not return a value.

![image.png](attachment:image.png)

In [3]:
name = "Jasmine"
len(name)

7

In [4]:
min(name), max(name)

('J', 's')

## String Methods

A method in Python is similar to a function, except that it is associated with an object or class. 

Two major differences are:
1. A method is called on an object, and that object is passed to it implicitly.
2. A method can access the data contained within the object it is called on.

All string methods return new values. They do not change the original string.

In [5]:
dir(str)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']


![image.png](attachment:image.png)

In [11]:
name = "jasmine"

In [12]:
name.capitalize() 

'Jasmine'

In [13]:
name

'jasmine'

In [14]:
name.find("mine")

3

In [15]:
name.find("abcd")

-1

In [10]:
name.zfill(10)

'000jasmine'

In [16]:
name.zfill(5)

'jasmine'

### References for Python string methods and Python built-in functions
* https://www.w3schools.com/python/python_ref_string.asp
* https://www.w3schools.com/python/python_ref_functions.asp

## Accessing Characters in String

In Python, each character in a string occupies a position within the string. For example, the first character of a string is at position 0. Fetching the character at a particular position is called indexing or subscripting.

![image.png](attachment:image.png)

Points to note:
* Index starts at 0
* Special characters, white spaces have their own index
* Negative indexes can be used to get characters from the end of the string

In [17]:
s = "Hello World"

In [19]:
len(s)

11

In [18]:
s[0], s[5], s[-7]

('H', ' ', 'o')

Two more points to note:
* A positive index must fall between 0 and (length of the string - 1), while a negative index must fall between -1 and -(length of the string); otherwise an IndexError is raised
* Range of indexes can be used to fetch a substring

In [20]:
s[11]

IndexError: string index out of range

In [15]:
s[-12]

IndexError: string index out of range

In [17]:
s[3:7] #start index is inclusive, end index is exclusive, : is used between start and end index

'lo W'

![image.png](attachment:image.png)

In [18]:
s[3:]

'lo World'

In [19]:
s[:11]

'Hello World'

In [20]:
s[:]

'Hello World'

In [21]:
s[:-4]

'Hello W'

In [23]:
s[-5:-4]

'W'

In [26]:
s[6:4], s[-3:-5], s[1:1] # start index >= end index, so the result is empty

('', '', '')

In [29]:
s[-10:5] 

'ello'

TEASER

![image.png](attachment:image.png)

In [21]:
s[20]

IndexError: string index out of range

In [31]:
s[-1:5] # looks like start index < end index, but it doesn't work, why?

''

In [38]:
s[6:20], s[-15:4], s[-15:20] # indexes are not within limit, then why are we not getting error?

('World', 'Hell', 'Hello World')
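
One way to explore these teasers is the built-in `slice` object and its `indices()` method, which reports the start, stop and step that Python actually uses for a sequence of a given length. This is a small sketch for experimentation, not the only way to reason about it.

```python
s = "Hello World"
n = len(s)   # 11

# Out-of-range slice bounds are clamped to the string instead of raising an error.
beyond_end = slice(6, 20).indices(n)
before_start = slice(-15, 4).indices(n)

# A negative start is first converted to a position from the left: -1 becomes 10.
negative_start = slice(-1, 5).indices(n)

print(beyond_end)       # (6, 11, 1)
print(before_start)     # (0, 4, 1)
print(negative_start)   # (10, 5, 1): start is after stop, so the slice is empty
```

Single-index access such as `s[20]` has no such clamping, which is why it raises IndexError while the slices do not.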

## String Slicing, Joining and Splitting

Slicing is accessing a sequence of characters in a string.


There is a start index, which is inclusive.
There is an end index, which is exclusive.
Also, there is a step parameter.

Indexes are integers, such that slicing can be done using either positive or negative indexes or both. 

In [40]:
s = "Hello World"

In [22]:
start = int(input("Enter start index: "))
end = int(input("Enter end index: "))

s[start:end]

Enter start index: 3
Enter end index: 8


'lo Wo'

Now let's see how step parameter works in slicing.

Step is the increment between the indexes of consecutive characters in the slice, so a step of 2 takes every second character, starting from the start index. The default value of step is one. 

Syntax is: "string[start:end:step]"

![image.png](attachment:image.png)

In [48]:
s[0:10:2]

'HloWr'

In [49]:
s[0::2]

'HloWrd'

In [52]:
s[::5]

'H d'
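
The step can also be negative, in which case the slice walks through the string from right to left. A brief sketch using the same string `s`:

```python
s = "Hello World"

# With a negative step and no start or end, the whole string is traversed backwards.
reversed_s = s[::-1]

# Start at index 8 and move left two positions at a time, stopping before index 2.
backwards = s[8:2:-2]

print(reversed_s)   # dlroW olleH
print(backwards)    # rWo
```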

### Joining

'+' is used to concatenate strings.

If there are multiple strings to be joined, one way is to use '+' repeatedly between each string.

Another option is to use the built-in join() method. 

Syntax: join_string.join(sequence)

Here, sequence can be a string, a list or any other iterable of strings. 
* If the sequence is a string, join() inserts 'join_string' between each character of the string and returns the concatenated string. 
* If the sequence is a list, join() inserts 'join_string' between each item of the list and returns the concatenated string. All the items of the list must be of type string.

In [58]:
s1 = ' '
s2 = "DataScience"
s3 = ["Data","Science"]

print(s1.join(s2))
print(s1.join(s3))

D a t a S c i e n c e
Data Science


In [61]:
s1 = '!?'
print(s1.join(s2))
print(s1.join(s3))

D!?a!?t!?a!?S!?c!?i!?e!?n!?c!?e
Data!?Science


In [62]:
s4 = ["Data",123]

" ".join(s4)

TypeError: sequence item 1: expected str instance, int found
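
When a list contains non-string items, one possible fix is to convert each item with str() before joining. A minimal sketch, reusing the list `s4` from the failing cell:

```python
s4 = ["Data", 123]

# Convert every item to a string first, then join.
joined_with_generator = " ".join(str(item) for item in s4)

# map() applies str to each item and achieves the same result.
joined_with_map = " ".join(map(str, s4))

print(joined_with_generator)   # Data 123
print(joined_with_map)         # Data 123
```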

### Splitting

Python provides the split() method to break a string on a specified delimiter (or separator). It returns a list of strings.

Syntax: input_string.split([separator [, maxsplit]])

Here, 
* separator is the delimiter string and is optional. If it is omitted, the string is split on runs of whitespace.
* maxsplit indicates the maximum number of splits to be done, i.e. split() will perform at most 'maxsplit' splits and hence return at most maxsplit+1 items in the output list. It is also optional; if it is not given (or specified as -1), there is no limit on the number of splits and all possible splits will occur.

In [24]:
s1 = "Modulus Data Science Course" 

s1.split()

['Modulus', 'Data', 'Science', 'Course']

In [67]:
s1.split(' ')

['Modulus', 'Data', 'Science', 'Course']

In [68]:
s1.split('')

ValueError: empty separator

In [69]:
s1.split("#")

['Modulus Data Science Course']

In [25]:
s1

'Modulus Data Science Course'

In [71]:
s1.split("a")

['Modulus D', 't', ' Science Course']

In [72]:
s1.split("at")

['Modulus D', 'a Science Course']

In [74]:
s1.split(" ",2)

['Modulus', 'Data', 'Science Course']

In [75]:
s1.split(,2)

SyntaxError: invalid syntax (<ipython-input-75-b8b523240fe3>, line 1)

In [76]:
s1.split(" ",-1)

['Modulus', 'Data', 'Science', 'Course']
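
split() with no separator and split(' ') give the same result on the example above, but they differ when the text contains repeated spaces. The sketch below uses a hypothetical input string `text` to show the difference, and also shows how to pass maxsplit by keyword when no separator is given.

```python
text = "Modulus  Data   Science"   # hypothetical input with repeated spaces

# No separator: runs of whitespace are treated as a single delimiter.
default_split = text.split()

# Explicit ' ': every single space is a delimiter, so empty strings appear.
space_split = text.split(' ')

# maxsplit can be passed by keyword without specifying a separator.
one_split = text.split(maxsplit=1)

print(default_split)   # ['Modulus', 'Data', 'Science']
print(space_split)     # ['Modulus', '', 'Data', '', '', 'Science']
print(one_split)       # ['Modulus', 'Data   Science']
```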

## Formatting Strings

Formatting is usually needed when printing strings.

There are 3 ways to do so:
* using the % operator with %s
* using the format() method
* using f-strings (f'') 

In [96]:
st = 'Data Science'

"The objective of the course is to introduce you to the world of " + st + "."

'The objective of the course is to introduce you to the world of Data Science.'

In [95]:
"The objective of the course is to introduce you to the world of %s."%st

'The objective of the course is to introduce you to the world of Data Science.'

In [27]:
st1 = 'Data'
st2 = 'Science'

"The objective of the course is to introduce you to the world of %s %s."%(st1,st2)

'The objective of the course is to introduce you to the world of Data Science.'

In [28]:
"The objective of the course is to introduce you to the world of " + st1 + " " + st2 + "."

'The objective of the course is to introduce you to the world of Data Science.'

In [112]:
st12 = ('Data','Science')

"The objective of the course is to introduce you to the world of %s."%st12

TypeError: not all arguments converted during string formatting
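
The error occurs because the % operator treats a tuple on its right as a list of separate arguments, so two values are supplied for a single %s. A short sketch of two possible fixes follows; the shortened template strings are used here only to keep the example compact.

```python
st12 = ('Data', 'Science')

# Fix 1: wrap the tuple in a one-item tuple so it is treated as a single value.
as_one_value = "the world of %s." % (st12,)

# Fix 2: provide one %s per item in the tuple.
as_two_values = "the world of %s %s." % st12

print(as_one_value)    # the world of ('Data', 'Science').
print(as_two_values)   # the world of Data Science.
```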

In [110]:
"The objective of the course is to introduce you to the world of {value}.".format(value=st12)

"The objective of the course is to introduce you to the world of ('Data', 'Science')."

In [113]:
"The objective of the course is to introduce you to the world of {value1} {value2}.".format(value1=st1,value2=st2)

# as value parameters increase, readability reduces

'The objective of the course is to introduce you to the world of Data Science.'

In [29]:
f'The objective of the course is to introduce you to the world of {st1} {st2}.'

'The objective of the course is to introduce you to the world of Data Science.'

TEASER

f'' with width and precision optional parameters

In [183]:
width = 10
precision = 5
value = 12.34567

print(f'result: {value}')

print(f'result: {value:{width}.{precision}}')

print(f'result: {value:{width}}')

print(f'result: {value:.{precision}}')

result: 12.34567
result:     12.346
result:   12.34567
result: 12.346


What happens if value in above example is actually a string?

What happens if there are two values with their own width and precision parameters in above example?
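
A small sketch for experimenting with both questions. For a string, the precision acts as a maximum number of characters, and strings are left-aligned by default, while numbers are right-aligned. The names `text`, `first` and `second` are made up for this example, and square brackets are added only to make the padding visible.

```python
width = 10
precision = 5

# A string value: truncated to 5 characters, then padded on the right to width 10.
text = 'Science'
string_result = f'[{text:{width}.{precision}}]'

# Two values, each with its own width and precision.
first, second = 12.34567, 3.14159
two_values = f'[{first:{10}.{5}}] [{second:{8}.{3}}]'

print(string_result)   # [Scien     ]
print(two_values)      # [    12.346] [    3.14]
```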

### Escape sequences

Escape sequences, also known as control sequences, are combinations of a backslash (\) followed by either a letter or a combination of letters and digits. 

The backslash (\) character gives the characters that follow it an alternate interpretation. 

![image.png](attachment:image.png)

In [123]:
"Data 
Science"

SyntaxError: EOL while scanning string literal (<ipython-input-123-68fc903ad8a4>, line 1)

In [125]:
"Data \
Science"

'Data Science'

In [164]:
print('One of the Python's advantage is that it is an open- source programming language.')

SyntaxError: invalid syntax (<ipython-input-164-5e1877d19489>, line 1)

In [165]:
print('One of the Python\'s advantage is that it is an open- source programming language.')

One of the Python's advantage is that it is an open- source programming language.


In [166]:
print('One of the Python\'s advantage is \nthat it is an \ropen- \bsource \t programming \\ language.')

One of the Python's advantage is 
that it is an open- source 	 programming \ language.


### Unicodes

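Unicode is a standard that assigns a unique number (code point) to every character, across different languages and symbol sets.

The most common encodings used to store Unicode text as bytes are UTF-8 and UTF-16.

In Python 3, all regular strings are Unicode. 
* The escape sequence \u followed by four hex digits inserts the character with that code point into a string
* The 'u' prefix on a string literal is accepted for compatibility with Python 2 and has no effect in Python 3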

In [162]:
print("\u20B9", type("\u20B9"))
print(u'\u20B9', type(u'\u20B9'))

₹ <class 'str'>
₹ <class 'str'>
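
The relationship between a character, its code point and its encoded bytes can be inspected with ord(), chr() and encode(). A minimal sketch using the same rupee sign; the variable names are illustrative.

```python
rupee = "\u20B9"

code_point = ord(rupee)                 # integer code point of the character
same_char = chr(0x20B9) == rupee        # chr() is the inverse of ord()
utf8_bytes = rupee.encode("utf-8")      # UTF-8 uses 3 bytes for this character
utf16_len = len(rupee.encode("utf-16-le"))   # UTF-16 uses 2 bytes for it
prefix_same = u"\u20B9" == "\u20B9"     # the u prefix makes no difference in Python 3

print(code_point, hex(code_point))   # 8377 0x20b9
print(same_char)                     # True
print(utf8_bytes)                    # b'\xe2\x82\xb9'
print(utf16_len)                     # 2
print(prefix_same)                   # True
```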


In [160]:
# Octal and Hex values, respectively. Not Unicodes.
print("\046")
print("\x24")

&
$


### Raw strings

A raw string is created by prefixing the string literal with the character 'r'. In a raw string, backslashes are treated as literal characters, so escape sequences such as \n are not interpreted.

In [167]:
print(r'One of the Python\'s advantage is \nthat it is an \ropen- \bsource \t programming \\ language.')

One of the Python\'s advantage is \nthat it is an \ropen- \bsource \t programming \\ language.
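
Raw strings are often convenient for text that contains many backslashes, such as Windows file paths. The sketch below compares a regular and a raw literal; the path shown is hypothetical.

```python
# In a regular string '\n' is one newline character; in a raw string it is
# two characters: a backslash followed by the letter n.
regular_len = len('\n')
raw_len = len(r'\n')

# A raw string is just another way to write the literal: r'\n' equals '\\n'.
same_value = r'\n' == '\\n'

# Hypothetical Windows path: without the r prefix, \n and \t would be
# interpreted as a newline and a tab.
path = r'C:\new_folder\table.txt'

print(regular_len, raw_len)   # 1 2
print(same_value)             # True
print(path)                   # C:\new_folder\table.txt
```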


## Summary

* What is a string in Python? 
    * A sequence of characters
    * Immutable i.e. cannot be changed once created
* How to create a string? 
    * Enclose the sequence within single or double (or even triple) quotes or use str() built-in function
* String operations
    * Concatenation, replication, membership, comparison 
    * Built-in functions
    * Methods
* Accessing characters in a string using index
* Slicing, joining, splitting, formatting strings

## Sample Python Codes 

In [30]:
# Code 1: given a string, and a sub string, move the sub string in the input string to the end
'''
Example:
input_string = 'Python is essential in Data Science domain'
sub_string = ' essential'
output_string = 'Python is in Data Science domain essential'
'''

'''
There could be multiple ways to write the code
    - use different combinations of Python built-in methods
    - write your own logic using loops/if statements 
'''

input_string = 'Python is essential in Data Science domain'
sub_string = ' essential'

print("Input string:\t" + input_string) 
print("Sub string:\t" + sub_string)

Input string:	Python is essential in Data Science domain
Sub string:	 essential


In [31]:
# Solution 1:
# Using replace() + '+' operator 
  
output_string = input_string.replace(sub_string, '') + sub_string 

print("Output string: " + output_string)  

Output string: Python is in Data Science domain essential


In [32]:
print(input_string.replace(sub_string, ''))

Python is in Data Science domain


In [9]:
# Solution 2:
# Using string slicing, find() and len() + "+"

output_string = input_string[:input_string.find(sub_string)] + \
                input_string[input_string.find(sub_string) + len(sub_string):] + \
                sub_string 

print("Output string: " + output_string)  

Output string: Python is in Data Science domain essential


In [33]:
input_string

'Python is essential in Data Science domain'

In [8]:
print(input_string[:input_string.find(sub_string)]) 
print(input_string[input_string.find(sub_string) + len(sub_string):])

Python is
 in Data Science domain


In [34]:
print(input_string.find(sub_string))

9


In [35]:
len(sub_string)

10

In [37]:
# Solution 3:
# Using while loops, if statements, len() and '+' operator

i = 0
flag = True
while flag and i < len(input_string) - len(sub_string) + 1:
    if input_string[i] == sub_string[0]:
        flag = False
        j = i + 1
        k = 1
        while k < len(sub_string):
            if input_string[j] != sub_string[k]:
                flag = True
                break
            j = j + 1
            k = k + 1
    i = i + 1

# i-1 is the index of the start of the sub string in the input string
# (this assumes the sub string is present; the loop above does what find() does)

output_string = ''
l = 0
while(l < i-1):
    output_string += input_string[l]
    l = l+1
    
print(output_string)

l = l + len(sub_string)
while(l < len(input_string)):
    output_string += input_string[l]
    l = l+1

print(output_string)
    
# above 2 while loops are essentially doing slicing
    
output_string += sub_string

print("Output string: " + output_string)

Python is
Python is in Data Science domain
Output string: Python is in Data Science domain essential
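
As a further variation, the same task can be sketched with the built-in partition() method, which splits a string around the first occurrence of a separator. The helper name `move_to_end` is made up for this example. Unlike Solution 1, which removes every occurrence of the sub string, this version moves only the first one, and it returns the input unchanged when the sub string is absent.

```python
def move_to_end(text, sub):
    """Move the first occurrence of sub to the end of text."""
    # partition() returns (before, sub, after); if sub is not found,
    # it returns (text, '', ''), so the result equals the input.
    before, match, after = text.partition(sub)
    return before + after + match

input_string = 'Python is essential in Data Science domain'

moved = move_to_end(input_string, ' essential')
unchanged = move_to_end(input_string, ' missing')

print(moved)       # Python is in Data Science domain essential
print(unchanged)   # Python is essential in Data Science domain
```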


## Multiple Choice Questions and Programming