---   
 <img align="left" width="75" height="75"  src="https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Course: Tools and Techniques for Data Science</h1>

---
<h3><div align="right">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>    

<h1 align="center">Lecture 2.4</h1>

## _strings.ipynb_
#### [Click me to learn more about Python strings](https://www.w3schools.com/python/python_strings.asp)
#### [Click me to learn more about string methods](https://www.w3schools.com/python/python_ref_string.asp)

## Learning agenda of this notebook
1. Defining strings in Python
2. Accessing characters of a string in Python
3. Strings are immutable
4. Slicing strings
5. String concatenation
6. Creating large strings
7. String Methods: `lower()`, `upper()`, `strip()`, `startswith()`, `split()`, `join()`, `find()`, `replace()`, `format()`
8. String Membership test

## 1. Defining Strings in Python
- A string is a sequence of characters enclosed within single or double quotation marks. (Unlike C/C++, Python has no separate `char` data type.)
- A string can also contain a single character or be entirely empty.
- To make a single quote part of a string, define the string using double quotes, and vice versa. You can also use an escape sequence such as `\'` or `\"`.
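As a small sketch of the last point, the cell below builds the same text once by mixing quote styles and once with escape sequences. The variable names are illustrative only.

```python
# Mixing quote styles avoids the need for escaping.
quote1 = "It's a sunny day"
quote2 = 'She said "hello"'

# Escape sequences allow the same quote character inside the string.
quote3 = 'It\'s a sunny day'
quote4 = "She said \"hello\""

print(quote1 == quote3)  # True
print(quote2 == quote4)  # True
```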

In [1]:
string1 = 'Hello'
print(string1)

string2 = "World"
print(string2)

string3 = ""
print(string3)

string4 = "A"
print(string4)

Hello
World

A


In [2]:
# triple quotes string can extend multiple lines

string5 = """Hello, This is
            multi-line string"""
print(string5)

string5 = '''Hello, This is
            multi-line string'''
print(string5)

Hello, This is
            multi-line string
Hello, This is
            multi-line string


## 2. Accessing Characters of a String in Python
- A string is a sequence type, and any element of a sequence can be accessed by placing an index within square brackets, so this works for strings as well.
- Similarly, to find the index of a specific character, we can use the `str.index()` method.

In [37]:
str = 'Python Programming is fun'
print('str = ', str)

#access first index
print('str[0] = ', str[0])

# Negative indices start from the opposite end of the string. Hence, -1 index corresponds to the last character
print('str[-1] = ', str[-1])

#access second last index
print('str[-2] = ', str[-2])

#print(str[25])     # accessing an index out of range raises an IndexError

#print(str[1.5])    # using a non-integer index raises a TypeError

str =  Python Programming is fun
str[0] =  P
str[-1] =  n
str[-2] =  u


In [2]:
# To find out the index of a specific character
str = "Python Programming is fun"
str.index('t')

2
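A short sketch of two further behaviours of `str.index()`: it accepts an optional start position, and it raises a `ValueError` when the substring is absent. The variable name `text` is chosen here for illustration.

```python
text = "Python Programming is fun"

print(text.index('P'))      # 0, the first occurrence
print(text.index('P', 1))   # 7, searching from index 1 onwards

# Unlike str.find(), str.index() raises an exception when nothing is found.
try:
    text.index('z')
except ValueError as err:
    print('not found:', err)
```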

## 3. Strings are Immutable

In [7]:
# Strings are immutable: a string object does not support item assignment
str1 = 'ArifButt'

#str1[5] = 'c'      # would raise a TypeError

print(id(str1))

#assigning a new value is valid
str1 = 'python'

print(id(str1))

140393967741808
140393947361200


The object `ArifButt` is now orphaned, since no variable refers to it any longer.
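The following sketch shows what happens when item assignment is attempted, and one common way to get a "modified" string: building a new one from slices. The names `name` and `changed` are illustrative.

```python
name = 'ArifButt'

try:
    name[5] = 'c'          # item assignment is not supported on strings
except TypeError as err:
    print('TypeError:', err)

# Build a new string from slices instead of modifying in place.
changed = name[:5] + 'c' + name[6:]
print(changed)             # ArifBctt
print(name)                # ArifButt (the original is unchanged)
```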

## 4. Slicing Strings
- Slicing is the process of obtaining a portion (substring) of a string by using its indices.
- Given a string, we can use the following template to slice it and obtain a substring:
```
string[start:end]
```

- **start** is the index from where we want the substring to start. If start is not provided, slicing starts from the beginning.
- **end** is the index where we want our substring to end (not inclusive in the substring). If end is not provided, slicing goes till the end of the string (includes the last character of the string).

In [22]:
str = 'DataScienceToolsAndTechniques'

print(str[0:4]) # From the start till before the 4th index
print(str[:4]) # From the start till before the 4th index
print(str[11:16])
print(str[19:]) # From the 19th index till the end
print(str[19:len(str)]) # From the 19th index till the end
#if start is greater than end, it will return empty string
print(str[5:2])

Data
Data
Tools
Techniques
Techniques



### a. Slicing with a Step 
- Until now, we’ve used slicing to obtain a contiguous piece of a string, i.e., all the characters from the starting index to before the ending index are retrieved.
- However, we can define a step through which we can skip characters in the string. The default step is 1, so we iterate through the string one character at a time.
- The step is defined after the end index:
```
string[start:end:step]
```

In [20]:
str = 'DataScienceToolsAndTechniques'
print(str[::])  # A default step of 1
print(str[::1])  # A step of 1
print(str[::2])  # A step of 2

DataScienceToolsAndTechniques
DataScienceToolsAndTechniques
DtSineolAdehius


### b. Reverse Slicing
- Strings can also be sliced to return a reversed substring. 
- For reverse slicing we need to give a negative step
- For reverse slicing, the `start` index must be greater than the `end` index; otherwise an empty string is returned

In [4]:
str = '0123456789'
print(str[::-1]) 
print(str[5:1:-1]) 
print(str[2:10:-1])
print(str[::-2]) 

9876543210
5432

97531
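One common use of a negative step is checking whether a string is a palindrome. The helper `is_palindrome` below is a hypothetical function written for this illustration, not part of Python.

```python
def is_palindrome(text):
    """Hypothetical helper: True if text reads the same in both directions."""
    return text == text[::-1]

print(is_palindrome('level'))   # True
print(is_palindrome('python'))  # False

# A negative step can be combined with explicit start and end indices.
digits = '0123456789'
print(digits[8:2:-2])           # 864
```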


## 5. String Concatenation
- Two strings can be joined or concatenated using the `+` operator

In [9]:
str1 = 'Hello'
str2 =' World!'
str3 = str1 + str2
print('str1 + str2 = ', str3)


print("Y" + str3[1:])

str1 + str2 =  Hello World!
Yello World!
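Note that `+` joins strings only. A minimal sketch, using illustrative variable names, of converting a number with `str()` before concatenating:

```python
count = 3

# '+' only joins strings; other types must be converted with str() first.
message = 'You have ' + str(count) + ' new messages'
print(message)   # You have 3 new messages

# Concatenating a string and an integer directly raises a TypeError.
try:
    'Total: ' + count
except TypeError as err:
    print('TypeError:', err)
```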


## 6. Creating Large strings
- A string can be replicated/repeated using the `*` operator

In [8]:
str1 = 'Hello'
print('str1 * 5 =', str1 * 5)

buffer = 'A' * 100
print(buffer)

str1 * 5 = HelloHelloHelloHelloHello
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA


## 7. String Methods
- Strings in Python have many built-in *methods* that are used to manipulate them. Let's try out some common string methods.
> **Methods**: Methods are functions associated with data types and are accessed using the `.` notation e.g. `variable_name.method()` or `"a string".method()`. Methods are a powerful technique for associating common operations with values of specific data types.
- Note that all string methods return new values and DO NOT change the existing string. 
- [Click me to learn more about string methods](https://www.w3schools.com/python/python_ref_string.asp)

### a. The `len()` function and the `str.lower()`, `str.upper()` and `str.capitalize()` methods
- `len()` is a built-in function that returns the number of items in the container passed as its argument (for a string, the number of characters).
- The `str.lower()` method returns a copy of the string converted to lowercase.
- The `str.upper()` method returns a copy of the string converted to uppercase.
- The `str.capitalize()` method returns a copy of the string with its first character in uppercase and the rest in lowercase.

In [15]:
help(len)

Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.



In [14]:
str= "Hello World"
mylist = [1,2,3,4,5]
len(str)
len(mylist)

5

In [21]:
str="DS"
help(str.lower)

Help on built-in function lower:

lower() method of builtins.str instance
    Return a copy of the string converted to lowercase.



In [1]:
str = 'LearNing is Fun with Arif'
print('Original string = ', str)

rv = len(str)
print('len(str) = ', rv)

rv = str.lower()
print('str.lower() = ', rv)

print('str.upper() = ', str.upper())

rv = str.capitalize()
print('str.capitalize() = ', rv)
print('Original string = ', str)


Original string =  LearNing is Fun with Arif
len(str) =  25
str.lower() =  learning is fun with arif
str.upper() =  LEARNING IS FUN WITH ARIF
str.capitalize() =  Learning is fun with arif
Original string =  LearNing is Fun with Arif


### b. The `str.strip()` method
- The `str.strip()` method removes whitespace characters from the beginning and end of a string.

In [27]:
str="DS"
help(str.strip)

Help on built-in function strip:

strip(chars=None, /) method of builtins.str instance
    Return a copy of the string with leading and trailing whitespace removed.
    
    If chars is given and not None, remove characters in chars instead.



In [2]:
buffer ="    hello world, this is       Arif Butt      "
rv = buffer.strip()
print(buffer)
print(rv)

    hello world, this is       Arif Butt      
hello world, this is       Arif Butt
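As the help text notes, `strip()` also accepts a set of characters to remove, and the related `lstrip()` and `rstrip()` methods trim only one side. A short sketch with illustrative variable names:

```python
padded = '   data science   '
print(repr(padded.lstrip()))   # 'data science   '
print(repr(padded.rstrip()))   # '   data science'

# Passing characters removes those instead of whitespace.
tagged = '##heading##'
print(tagged.strip('#'))       # heading
```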


### c. The `str.startswith()` method
The `str.startswith()` method returns `True` if the string starts with the specified prefix, and `False` otherwise.
```
str.startswith(prefix[, start[, end]])
```

In [28]:
str="DS"
help(str.startswith)

Help on built-in function startswith:

startswith(...) method of builtins.str instance
    S.startswith(prefix[, start[, end]]) -> bool
    
    Return True if S starts with the specified prefix, False otherwise.
    With optional start, test S beginning at that position.
    With optional end, stop comparing S at that position.
    prefix can also be a tuple of strings to try.



In [42]:
str = "Learning is fun with Arif Butt"

rv = str.startswith('Learning')
print(rv)

rv = str.startswith('Arif')
print(rv)


rv = str.startswith('Arif', 21)
print(rv)

# case sensitive
rv = str.startswith('arif', 21)
print(rv)

rv = str.startswith('arn', 2, 5)  # character at ending index is not included
print(rv)


True
False
True
False
True
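The help text mentions that the prefix can be a tuple of strings. The sketch below illustrates that, along with the companion method `str.endswith()`. The file name is a made-up example.

```python
filename = 'report_2024.csv'   # hypothetical file name

# A tuple of prefixes matches if any one of them matches.
print(filename.startswith(('report', 'summary')))   # True

# str.endswith() works the same way for suffixes.
print(filename.endswith('.csv'))                    # True
print(filename.endswith(('.xlsx', '.json')))        # False
```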


### d. The `str.split()` and `str.join()` methods
- The `str.split()` method splits a string into a list of strings at every occurrence of provided character(s).
- The `str.join()` method takes all items in an iterable and joins them into one string.

In [43]:
str="DS"
help(str.split)

Help on built-in function split:

split(sep=None, maxsplit=-1) method of builtins.str instance
    Return a list of the words in the string, using sep as the delimiter string.
    
    sep
      The delimiter according which to split the string.
      None (the default value) means split according to any whitespace,
      and discard empty strings from the result.
    maxsplit
      Maximum number of splits to do.
      -1 (the default value) means no limit.



In [49]:
str1 = 'Learning is fun with Arif Butt'
mylist = str1.split()
print(mylist)

print(str1.split('i'))

['Learning', 'is', 'fun', 'with', 'Arif', 'Butt']
['Learn', 'ng ', 's fun w', 'th Ar', 'f Butt']


In [48]:
str="DS"
help(str.join)

Help on built-in function join:

join(iterable, /) method of builtins.str instance
    Concatenate any number of strings.
    
    The string whose method is called is inserted in between each given string.
    The result is returned as a new string.
    
    Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'



In [59]:
# The join() method takes all items in an iterable and joins them into one string.
mylist = ['Learning', 'is', 'fun', 'with', 'Arif']

#Note the separator is space character
mystr = ' '.join(mylist)

print(mylist, type(mylist))
print(mystr, type(mystr))

['Learning', 'is', 'fun', 'with', 'Arif'] <class 'list'>
Learning is fun with Arif <class 'str'>


In [60]:
# The join() method takes all items in an iterable and joins them into one string.
mylist = ['Arif', 'Rauf', 'Maaz', 'Hadeed', 'Mujahid', 'Mohid']

#Note the separator is hash character
mystr = '#'.join(mylist)

print(mylist, type(mylist))
print(mystr, type(mystr))

['Arif', 'Rauf', 'Maaz', 'Hadeed', 'Mujahid', 'Mohid'] <class 'list'>
Arif#Rauf#Maaz#Hadeed#Mujahid#Mohid <class 'str'>
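`split()` and `join()` are inverses of each other when the same separator is used. A brief sketch, which also shows the `maxsplit` argument from the help text. The variable names are illustrative.

```python
days = 'Sun,Mon,Tue,Wed,Thu,Fri,Sat'

day_list = days.split(',')
print(day_list)        # ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']

# maxsplit limits the number of splits; the remainder stays in one piece.
first_two = days.split(',', 2)
print(first_two)       # ['Sun', 'Mon', 'Tue,Wed,Thu,Fri,Sat']

# Joining with the same separator restores the original string.
print(','.join(day_list) == days)   # True
```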


### e. The `str.find()` method
- The `str.find()` method returns the first (lowest) index at which a substring occurs in a string. If the substring is not found, the method returns -1.
```
str.find(substring, start, end)
    where substring is what we are searching for,
    start is the index from which we start searching in string named str,
    end is the index where we stop our search in string named str
    start and end are optional.
```

In [61]:
str="DS"
help(str.find)

Help on built-in function find:

find(...) method of builtins.str instance
    S.find(sub[, start[, end]]) -> int
    
    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Return -1 on failure.



In [71]:
str = 'DataScienceToolsAndTechniques'
print(str.find('Data'))
print(str.find('And'))


print(str.find('S',2)) # second argument starts searching from that index
print(str.find('s',2)) # case sensitive

print(str.find('S',0, 4)) # third argument stops the search just before that index
print(str.find('S',0, 5)) 

0
16
4
15
-1
4
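Because `find()` accepts a start index, it can be called repeatedly to locate every occurrence of a substring. The function `find_all` below is a hypothetical helper written for this sketch, not a built-in.

```python
def find_all(text, sub):
    """Hypothetical helper: return the start index of every occurrence of sub."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search just after the previous match.
        start = text.find(sub, start + 1)
    return positions

print(find_all('DataScienceToolsAndTechniques', 'e'))   # [7, 10, 20, 27]
print(find_all('DataScienceToolsAndTechniques', 'z'))   # []
```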


### f. The `str.replace()` method
- The `str.replace()` method replaces a part of the string with another string.
```
str.replace(substring_to_be_replaced, new_string, count=-1)
```
- Note that `replace` returns a new string, and the original string is not modified.

In [72]:
str="DS"
help(str.replace)

Help on built-in function replace:

replace(old, new, count=-1, /) method of builtins.str instance
    Return a copy with all occurrences of substring old replaced by new.
    
      count
        Maximum number of occurrences to replace.
        -1 (the default value) means replace all occurrences.
    
    If the optional argument count is given, only the first count occurrences are
    replaced.



In [74]:
print("hello".replace("e","a"))

hallo


In [75]:
str = 'Welcome to Learning Data Science with Arif'
newstring = str.replace('Data Science', 'Life')
print(str)
print(newstring)

Welcome to Learning Data Science with Arif
Welcome to Learning Life with Arif
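The optional `count` argument described in the help text limits how many occurrences are replaced. A small sketch with an illustrative string:

```python
text = 'one fish, two fish, red fish'

print(text.replace('fish', 'cat'))      # one cat, two cat, red cat
print(text.replace('fish', 'cat', 1))   # one cat, two fish, red fish

# The original string is not modified.
print(text)                             # one fish, two fish, red fish
```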


### g. The `str.format()` method
- The `str.format()` method combines values of other data types, e.g., integers, floats, booleans, lists, etc. with strings. 
- You can use `str.format()` to construct output messages for display in the Python built-in `print()` function.
- You put placeholders `{}` in the string on which `str.format()` is called, and pass the values (typically variables) as arguments to `str.format()`.
- Each placeholder is replaced by the value of the corresponding argument.

In [76]:
str="DS"
help(str.format)

Help on built-in function format:

format(...) method of builtins.str instance
    S.format(*args, **kwargs) -> str
    
    Return a formatted version of S, using substitutions from args and kwargs.
    The substitutions are identified by braces ('{' and '}').



In [79]:
#Example 1:
age = 51;    name="Arif Butt"

print("Mr. {0}, you are {1} years old.".format(name, age))

Mr. Arif Butt, you are 51 years old.


In [83]:
#Example 2:
name="Hadeed Butt"
cost = 100
discount = .2
bill = cost - cost * discount

print("Mr. {0}, your total cost is {1}, percentage discount is {2}, and bill is {3}"
      .format(name, cost, discount, bill))


Mr. Hadeed Butt, your total cost is 100, percentage discount is 0.2, and bill is 80.0
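Placeholders can also be named and can carry a format specification after a colon. The sketch below reuses the values from Example 2. The placeholder names `n`, `c`, `d` and `b` are arbitrary choices for this illustration.

```python
name = 'Hadeed Butt'
cost = 100
discount = 0.2
bill = cost - cost * discount

# {d:.0%} shows a fraction as a whole-number percentage,
# {b:.2f} shows a float with two decimal places.
line = 'Mr. {n}, cost {c:d}, discount {d:.0%}, bill {b:.2f}'.format(
    n=name, c=cost, d=discount, b=bill)
print(line)   # Mr. Hadeed Butt, cost 100, discount 20%, bill 80.00
```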


### Comparing two strings using `is` operator and `==` operator

In [84]:
# Let us check the IDs of the following two variables. As with small integers, the IDs are
# the same here because both a and b refer to the same string object 'hello'
a = 'hello'
b = 'hello'
id(a), id(b)

(140414932089968, 140414932089968)

In [92]:
# Here both a and b refer to the same object containing the string 'hello'
a = 'hello'
b = 'hello'

# The `is` operator checks whether two names refer to the same object
print (a is b) 
# The `==` operator checks the contents of two strings
print (a == b) 



print(a is not b)
print (a != b)

True
True
False
False


In [93]:
# x and y refer to two different objects with different contents
x = 'hello'
y = 'bye'

# The `is` operator checks whether two names refer to the same object
print (x is y) 
# The `==` operator checks the contents of two strings
print (x == y) 



print(x is not y)
print (x != y)

False
False
True
True
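Two strings with equal contents are not guaranteed to be the same object. In the sketch below the second string is built at run time, so in CPython it is typically a separate object. This is an implementation detail, which is why `==` is the operator to use when comparing string values.

```python
a = 'hello'
b = ''.join(['hel', 'lo'])   # built at run time, typically a separate object

print(a == b)   # True: the contents are equal
print(a is b)   # usually False in CPython, but do not rely on either result
```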


### String Membership test using `in` operator

In [17]:
'a' in 'DataScience'

True

In [18]:
'th' not in 'python'

False
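The `in` operator works for substrings of any length and is case sensitive. A short sketch, with an illustrative variable name, including one way to make the test case-insensitive:

```python
text = 'DataScience'

print('Sci' in text)           # True
print('sci' in text)           # False, the test is case sensitive

# Lowercasing both sides gives a case-insensitive test.
print('sci' in text.lower())   # True
```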

## Check your Concepts

Try answering the following questions to test your understanding of the topics covered in this notebook:

1. What are the container types available in Python?
2. What kind of data does the String data type represent?
3. What are the different ways of creating strings in Python?
4. What is the difference between strings created using single quotes, i.e. `'` and `'` vs. those created using double quotes, i.e. `"` and `"`?
5. How do you create multi-line strings in Python?
6. What is the newline character, `\n`?
7. What are escaped characters? How are they useful?
8. How do you check the length of a string?
9. How do you convert a string into a list of characters?
10. How do you access a specific character from a string?
11. How do you access a range of characters from a string?
12. How do you check if a specific character occurs in a string?
13. How do you check if a smaller string occurs within a bigger string?
14. How do you join two or more strings?
15. What are "methods" in Python? How are they different from functions?
16. What do the `.count`, `.isalnum` and `.isalpha` methods on strings do?
17. How do you replace a specific part of a string with something else?
18. How do you split the string "Sun,Mon,Tue,Wed,Thu,Fri,Sat" into a list of days?
19. How do you remove whitespace from the beginning and end of a string?
20. What is the string `.format` method used for? Can you give an example?
21. What are the benefits of using the `.format` method instead of string concatenation?
22. How do you convert a value of another type to a string?
23. How do you check if two strings have the same value?
24. Where can you find the list of all the methods supported by strings?