# Python Strings

In this session we will learn about the built-in functions and methods used to work with strings.

### Built-in Functions to Work with Strings



Some of the commonly used ones are **`enumerate()`** and **`len()`**. 

In [1]:
text = 'cold'  # avoid naming the variable str, which would shadow the built-in type

# enumerate()
list_enumerate = list(enumerate(text))
print('list(enumerate(text)) = ', list_enumerate)

# character count
print('len(text) = ', len(text))

list(enumerate(text)) =  [(0, 'c'), (1, 'o'), (2, 'l'), (3, 'd')]
len(text) =  4
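
As a small sketch of how these two functions are typically used together, the loop below walks over a string with **`enumerate()`**. The variable names (`word`, `pairs`, `pairs_from_one`) are illustrative only.

```python
# Walk over a string and collect (index, character) pairs.
word = 'cold'  # same sample value as the cell above

pairs = []
for index, char in enumerate(word):
    pairs.append((index, char))
    print(index, char)

# enumerate() also accepts a start value for the counter.
pairs_from_one = list(enumerate(word, start=1))
print(pairs_from_one)

# len() gives the number of characters, so the last valid index is len(word) - 1.
print(len(word), word[len(word) - 1])
```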


## Python String Formatting

### Escape Sequence



In [2]:
print("He said, "What's there?"")

SyntaxError: invalid syntax (Temp/ipykernel_16024/3245963823.py, line 1)

In [3]:
# using triple quotes
print('''He said, "What's there?"''')

# escaping single quotes
print('He said, "What\'s there?"')

# escaping double quotes
print("He said, \"What's there?\"")

He said, "What's there?"
He said, "What's there?"
He said, "What's there?"


### Here is a list of commonly used escape sequences supported by Python.

| Escape Sequence | Description |
|:----:| :--- |
| **`\newline`** |   Backslash and newline ignored  | 
| **`\\`** |   Backslash | 
| **`\'`** |   Single quote | 
| **`\"`** |   Double quote | 
| **`\a`** |   ASCII Bell | 
| **`\b`** |   ASCII Backspace | 
| **`\f`** |   ASCII Formfeed | 
| **`\n`** |   ASCII Linefeed | 
| **`\r`** |   ASCII Carriage Return |
| **`\t`** |   ASCII Horizontal Tab | 
| **`\v`** |   ASCII Vertical Tab | 
| **`\ooo`** |   Character with octal value ooo | 
| **`\xHH`** |   Character with hexadecimal value HH | 
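
The octal and hexadecimal rows of the table can be tried out with a short sketch like the one below. The variable names are made up for illustration; the character codes are the standard ASCII values for the letters of "Hello".

```python
# Each escape sequence below produces exactly one character.
octal_hello = "\110\145\154\154\157"   # octal codes for H, e, l, l, o
hex_hello = "\x48\x65\x6c\x6c\x6f"     # hexadecimal codes for the same letters

print(octal_hello)
print(hex_hello)
print(octal_hello == hex_hello)

# An escape sequence counts as a single character in the string.
print(len("a\tb"))
```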

In [4]:
# Escape sequence

print('I hope everyone is enjoying the Python tutorials.\nAre you?') # '\n' line break
print('Days\tChapters\tTopics')  # '\t' tab space
print('Day 1\tChp 1\tPython Introduction')
print('Day 2\tChp 2\tPython Datatypes')
print('Day 3\tChp 3\tPython Flow Control')
print('Day 4\tChp 4\tPython Functions')
print('Day 5\tChp 5\tPython Files')
print('This is a backslash symbol (\\)') # To write a backslash
print('In every programming language it starts with \"Hello, World!\"')

I hope everyone is enjoying the Python tutorials.
Are you?
Days	Chapters	Topics
Day 1	Chp 1	Python Introduction
Day 2	Chp 2	Python Datatypes
Day 3	Chp 3	Python Flow Control
Day 4	Chp 4	Python Functions
Day 5	Chp 5	Python Files
This is a backslash symbol (\)
In every programming language it starts with "Hello, World!"


In [5]:
# Here are some examples

print("C:\\Python32\\Lib")
#C:\Python32\Lib

print("This is printed\nin two lines")
#This is printed
#in two lines

print("This is \x48\x45\x58 representation")
#This is HEX representation

C:\Python32\Lib
This is printed
in two lines
This is HEX representation


### Raw String to ignore escape sequence



In [6]:
print("This is \x61 \ngood example")

This is a 
good example


In [7]:
print(r"This is \x61 \ngood example")

This is \x61 \ngood example
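
Raw strings are often convenient for Windows-style paths and regular expressions, where backslashes should be kept literally. The following is a minimal sketch; the path is hypothetical and used only for illustration.

```python
# Without the r prefix, \n and \t in this path would turn into a newline and a tab.
plain_path = "C:\\new_folder\\table.txt"   # every backslash escaped by hand
raw_path = r"C:\new_folder\table.txt"      # raw string: backslashes kept as typed

print(raw_path)
print(plain_path == raw_path)

# In a raw string, \n is two characters (a backslash and an n), not a line break.
print(len("\n"), len(r"\n"))
```

Note that a raw string literal cannot end with a single backslash, so a trailing backslash still has to be written in a normal string.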


### The `format()` Method for Formatting Strings


In [8]:
# Python string format() method

# default(implicit) order
default_order = "{}, {} and {}".format('Allan','Bill','Cory')
print('\n--- Default Order ---')
print(default_order)

# order using positional argument
positional_order = "{1}, {0} and {2}".format('Allan','Bill','Cory')
print('\n--- Positional Order ---')
print(positional_order)

# order using keyword argument
keyword_order = "{s}, {b} and {j}".format(j='Allan',b='Bill',s='Cory')
print('\n--- Keyword Order ---')
print(keyword_order)


--- Default Order ---
Allan, Bill and Cory

--- Positional Order ---
Bill, Allan and Cory

--- Keyword Order ---
Cory, Bill and Allan


In [9]:
# formatting integers
"Binary representation of {0} is {0:b}".format(12)

'Binary representation of 12 is 1100'

In [10]:
# formatting floats
"Exponent representation: {0:e}".format(1966.365)

'Exponent representation: 1.966365e+03'

In [11]:
# round off
"One third is: {0:.3f}".format(1/3)

'One third is: 0.333'

In [12]:
# string alignment
"|{:<10}|{:^10}|{:>10}|".format('bread','butter','jam')

'|bread     |  butter  |       jam|'
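
The same format specifiers can also be used inside f-strings. This is a sketch assuming Python 3.6 or later; the variable names are illustrative, and the values are the ones used in the cells above.

```python
number = 12
value = 1 / 3
left, middle, right = 'bread', 'butter', 'jam'

# The part after the colon is the same format specification used by format().
binary_text = f"Binary representation of {number} is {number:b}"
rounded_text = f"One third is: {value:.3f}"
aligned_text = f"|{left:<10}|{middle:^10}|{right:>10}|"

print(binary_text)
print(rounded_text)
print(aligned_text)
```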

### Old style formatting

We can also format strings in the style of the **`sprintf()`** function from the C programming language. We use the **`%`** operator to accomplish this.

In [None]:
x = 36.3456789
print('The value of x is %3.2f' %x)

In [None]:
print('The value of x is %3.4f' %x)
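
The two cells above were not executed, so here is a sketch of what the **`%`** operator produces for them. In `%3.2f`, the `3` is the minimum field width and the `.2` is the number of digits after the decimal point. The `summary` line is an extra illustration that is not part of the original cells.

```python
x = 36.3456789

two_places = 'The value of x is %3.2f' % x
four_places = 'The value of x is %3.4f' % x
print(two_places)   # The value of x is 36.35
print(four_places)  # The value of x is 36.3457

# Several values are passed as a tuple and matched to the placeholders in order.
summary = '%s has %d characters' % ('cold', len('cold'))
print(summary)
```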

## Common Python String Methods



In [13]:
# Example:

s="heLLo wORLd!"
print(s.capitalize(),"vs",s.title())

print("upper case: '%s'"%s.upper(),"lower case: '%s'"%s.lower(),"and swapped: '%s'"%s.swapcase())

print('|%s|' % "Hello World".center(30)) # center in 30 characters

print('|%s|'% "     lots of space             ".strip()) # remove leading and trailing whitespace

print("'%s' without leading/trailing d,h,L or ! = |%s|" % (s, s.strip("dhL!")))

print("Hello World".replace("World","Class"))

Hello world! vs Hello World!
upper case: 'HELLO WORLD!' lower case: 'hello world!' and swapped: 'HEllO WorlD!'
|         Hello World          |
|lots of space|
'heLLo wORLd!' without leading/trailing d,h,L or ! = |eLLo wOR|
Hello Class


In [14]:
# capitalize(): 

challenge = 'Python Datatypes'
print(challenge.capitalize()) # 'Python datatypes'

Python datatypes


In [15]:
# count(): 

challenge = 'Python Datatypes'
print(challenge.count('y')) # 2
print(challenge.count('y', 6, 14)) # 1
print(challenge.count('ty')) # 1

2
1
1


In [16]:
# endswith(): 

challenge = 'Python Datatypes'
print(challenge.endswith('es'))   # True
print(challenge.endswith('type')) # False

True
False


In [17]:
# expandtabs(): 

challenge = 'Python\tDatatypes'
print(challenge.expandtabs())   # 'Python  Datatypes'
print(challenge.expandtabs(10)) # 'Python    Datatypes'

Python  Datatypes
Python    Datatypes


In [18]:
# find():

challenge = 'Python Datatypes'
print(challenge.find('y'))  # 1
print(challenge.find('u')) # -1

1
-1


In [19]:
# format()	
first_name = 'Ajantha'
last_name = 'Devi'
job = 'Research Head'
country = 'India'
sentence = 'I am {} {}. I am working as {}. I live in {}.'.format(first_name, last_name, job, country)
print(sentence) # I am Ajantha Devi. I am working as Research Head. I live in India.

I am Ajantha Devi. I am working as Research Head. I live in India.


In [20]:
# index():

challenge = 'Python Datatypes'
print(challenge.index('y'))  # 1
print(challenge.index('th')) # 2

1
2
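
For a substring that is present, **`index()`** and **`find()`** return the same position. They differ when the substring is missing, as this short sketch shows (the `position` variable is illustrative):

```python
challenge = 'Python Datatypes'

# find() signals "not found" with -1 and never raises.
print(challenge.find('u'))

# index() raises ValueError instead, so it is usually wrapped in try/except.
try:
    position = challenge.index('u')
except ValueError:
    position = None
    print("index() raised ValueError because 'u' is not in the string")
```

Which one to use is a matter of taste: **`find()`** suits a quick check, while **`index()`** makes a missing substring impossible to ignore.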


In [21]:
# isalnum(): 

challenge = 'PythonDatatypes'
print(challenge.isalnum()) # True

challenge = 'Pyth0nDatatypes'
print(challenge.isalnum()) # True

challenge = 'Python Datatypes'
print(challenge.isalnum()) # False

challenge = 'Python Datatypes 2021'
print(challenge.isalnum()) # False

True
True
False
False


In [22]:
# isalpha(): 

challenge = 'PythonDatatypes'
print(challenge.isalpha()) # True

num = '123'
print(num.isalpha())      # False

True
False


In [23]:
# find() with single-character and multi-character substrings:

challenge = 'Python Datatypes'
print(challenge.find('y'))  # 1
print(challenge.find('th')) # 2

1
2


In [24]:
# isdigit(): 

challenge = 'Ninety'
print(challenge.isdigit()) # False
challenge = '90'
print(challenge.isdigit())   # True

False
True


In [25]:
# isdecimal():
num = '30'
print(num.isdecimal()) # True
num = '30.6'
print(num.isdecimal()) # False

True
False


In [26]:
# isidentifier():

challenge = '2021PythonDatatypes'
print(challenge.isidentifier()) # False, because it starts with a number
challenge = 'Python_Datatypes'
print(challenge.isidentifier()) # True

False
True


In [27]:
# islower():

challenge = 'python datatypes'
print(challenge.islower()) # True
challenge = 'Python datatypes'
print(challenge.islower()) # False

True
False


In [28]:
# isupper(): 
challenge = 'python datatypes'
print(challenge.isupper()) #  False
challenge = 'PYTHON DATATYPES'
print(challenge.isupper()) # True

False
True


In [29]:
# isnumeric():

num = '90'
print(num.isnumeric())         # True
print('ninety'.isnumeric())    # False

True
False


In [30]:
# join(): 

web_tech = ['HTML', 'CSS', 'JavaScript', 'React']
result = '#, '.join(web_tech)
print(result) # 'HTML#, CSS#, JavaScript#, React'

HTML#, CSS#, JavaScript#, React
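
One thing to keep in mind is that **`join()`** only accepts strings. The sketch below, with made-up sample values, shows the error for a list of integers and one common way around it.

```python
numbers = [2021, 4, 15]

# Joining non-string items raises TypeError.
try:
    '-'.join(numbers)
    join_failed = False
except TypeError:
    join_failed = True
print(join_failed)

# Convert each item to a string first.
date_text = '-'.join(str(n) for n in numbers)
print(date_text)
```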


In [31]:
# strip(): 

challenge = ' python datatypes '
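print(challenge.strip())    # 'python datatypes' (whitespace removed from both ends)
print(challenge.strip('y')) # unchanged, because 'y' is not at either end of the string

python datatypes
 python datatypes 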
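
The one-sided variants **`lstrip()`** and **`rstrip()`** work the same way. In this sketch, `repr()` is used so that the remaining spaces are visible; the sample strings are made up for illustration.

```python
padded = '  python datatypes  '

left_stripped = padded.lstrip()    # removes leading whitespace only
right_stripped = padded.rstrip()   # removes trailing whitespace only
print(repr(left_stripped))
print(repr(right_stripped))

# The argument is treated as a set of characters, not as a prefix or suffix.
trimmed = 'xxpython datatypesyx'.strip('xy')
print(repr(trimmed))
```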


In [32]:
# replace(): 

challenge = 'python datatypes'
print(challenge.replace('datatypes', 'data-types')) # 'python data-types'

python data-types


In [33]:
# split():

challenge = 'python datatypes'
print(challenge.split()) # ['python', 'datatypes']

['python', 'datatypes']
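
**`split()`** also takes an explicit separator and an optional maximum number of splits. The sketch below uses made-up sample strings to show both, along with the difference between `split()` and `split(' ')`.

```python
csv_line = 'HTML,CSS,JavaScript'
fields = csv_line.split(',')
print(fields)                      # ['HTML', 'CSS', 'JavaScript']

# The second argument limits the number of splits.
key_and_rest = 'key=value=more'.split('=', 1)
print(key_and_rest)                # ['key', 'value=more']

# With no argument, runs of whitespace count as one separator.
# With an explicit ' ', every single space splits, leaving empty strings.
print('a  b'.split())              # ['a', 'b']
print('a  b'.split(' '))           # ['a', '', 'b']
```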


In [34]:
# title(): 

challenge = 'python datatypes'
print(challenge.title()) # Python Datatypes

Python Datatypes


In [35]:
# swapcase(): 
  
challenge = 'python datatypes'
print(challenge.swapcase())   # PYTHON DATATYPES
challenge = 'Python Datatypes'
print(challenge.swapcase())  # pYTHON dATATYPES

PYTHON DATATYPES
pYTHON dATATYPES


In [36]:
# startswith(): 

challenge = 'python datatypes'
print(challenge.startswith('python')) # True
challenge = '2 python datatypes'
print(challenge.startswith('two')) # False

True
False


#### Inspecting Strings

There are also lots of ways to inspect or check strings. A few examples are given here:

* Checking the start or end of a string: **`startswith("string")`** and **`endswith("string")`** check whether the string starts or ends with the string given as the argument

* Capitalisation: There are boolean counterparts for several forms of capitalisation, such as **`isupper()`**, **`islower()`** and **`istitle()`**

* Character type: does the string contain only the following characters?
  * 0-9: **`isdecimal()`**. Note there are also **`isdigit()`** and **`isnumeric()`**, which give the same result for the digits 0-9 but additionally accept certain Unicode characters, such as superscript digits (both) and fractions (**`isnumeric()`** only)
  * a-zA-Z: **`isalpha()`** or combined with digits: **`isalnum()`**
  * non-control code: **`isprintable()`** accepts anything except '\n' and other control codes
  * \t\n \r (white space characters): **`isspace()`**
  * Suitable as variable name: **`isidentifier()`**
  
* Find elements of a string: **`s.count(w)`** finds the number of times **`w`** occurs in **`s`**, while **`s.find(w)`** and **`s.rfind(w)`** find the first and last position of the string **`w`** in **`s`**.
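
The difference between **`isdecimal()`**, **`isdigit()`** and **`isnumeric()`** mentioned above only shows up for non-ASCII characters. The following sketch compares the three on an ordinary digit, a superscript two and the fraction one half; the sample characters and the `results` name are chosen here for illustration.

```python
# '5', superscript two (U+00B2) and vulgar fraction one half (U+00BD)
samples = ['5', '\u00b2', '\u00bd']

results = {}
for ch in samples:
    results[ch] = (ch.isdecimal(), ch.isdigit(), ch.isnumeric())
    print(repr(ch), results[ch])

# Only characters accepted by isdecimal() can be passed to int().
print(int('5'))
```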

In [37]:
# Example:

s="Hello World"
print("The length of '%s' is"%s,len(s),"characters") # len() gives length of the string

print(s.startswith("Hello") and s.endswith("World")) # check start/end

# count strings
print("There are %d 'l's but only %d World in %s" % (s.count('l'),s.count('World'),s))

print('"el" is at index',s.find('el'),"in",s) # index counts from 0; -1 means not found

The length of 'Hello World' is 11 characters
True
There are 3 'l's but only 1 World in Hello World
"el" is at index 1 in Hello World


## 💻 Exercises ➞ String

1. Concatenate the strings **`Python`**, **`4`**, **`Data`**, **`Science`** into a single string, **`Python 4 Data Science`**.

2. Declare a variable named **`course`** and assign it the initial value **`Python 4 Data Science`**.

3. Print the length of the **`course`** string using the **[len()](https://github.com/milaan9/04_Python_Functions/blob/main/002_Python_Functions_Built_in/040_Python_len%28%29.ipynb)** and **[print()](https://github.com/milaan9/04_Python_Functions/blob/main/002_Python_Functions_Built_in/051_Python_print%28%29.ipynb)** functions.

4. Change all the characters of the variable **`course`** to uppercase and to lowercase letters using the **[upper()](https://github.com/milaan9/02_Python_Datatypes/blob/main/002_Python_String_Methods/026_Python_String_upper%28%29.ipynb)** and **[lower()](https://github.com/milaan9/02_Python_Datatypes/blob/main/002_Python_String_Methods/025_Python_String_lower%28%29.ipynb)** methods.

5. Use **[capitalize()](https://github.com/milaan9/02_Python_Datatypes/blob/main/002_Python_String_Methods/001_Python_String_capitalize%28%29.ipynb)**, **[title()](https://github.com/milaan9/02_Python_Datatypes/blob/main/002_Python_String_Methods/042_Python_String_title%28%29.ipynb)**, **[swapcase()](https://github.com/milaan9/02_Python_Datatypes/blob/main/002_Python_String_Methods/027_Python_String_swapcase%28%29.ipynb)** methods to format the value of the string **`Python 4 Data Science`**.

6. Cut (slice) out the first word of **`Python 4 Data Science`**.

7. Check whether the string **`Python 4 Data Science`** contains the word **`Python`** using the methods **[index()](https://github.com/milaan9/02_Python_Datatypes/blob/main/002_Python_String_Methods/010_Python_String_index%28%29.ipynb)**, **[find()](https://github.com/milaan9/02_Python_Datatypes/blob/main/002_Python_String_Methods/008_Python_String_find%28%29.ipynb)** or other methods.

8. Change **`Python 4 Data Science`** to **`Python 4 Everybody`** using the **[replace()](https://github.com/milaan9/02_Python_Datatypes/blob/main/002_Python_String_Methods/035_Python_String_replace%28%29.ipynb)** method or other methods.

9. Split the string **`Python 4 Data Science`** using space as the separator (**[split()](https://github.com/milaan9/02_Python_Datatypes/blob/main/002_Python_String_Methods/038_Python_String_split%28%29.ipynb)**).

11. Split the string **`Google, Facebook, Microsoft, Apple, IBM, Oracle, Amazon`** at the commas.

12. What is the character at index 9 in the string **`Python 4 Data Science`**?

13. What is the second-to-last index of the string **`Python 4 Data Science`**?

14. Create an acronym or an abbreviation for the name **`Python 4 Data Science`**.

15. Use **[index()](https://github.com/milaan9/02_Python_Datatypes/blob/main/002_Python_String_Methods/010_Python_String_index%28%29.ipynb)** to determine the position of the first occurrence of **`D`** in **`Python 4 Data Science`**.

16. Use **[rfind](https://github.com/milaan9/02_Python_Datatypes/blob/main/002_Python_String_Methods/036_Python_String_rfind%28%29.ipynb)** to determine the position of the last occurrence of **`e`** in **`Python 4 Data Science`**.

17. Use **[index()](https://github.com/milaan9/02_Python_Datatypes/blob/main/002_Python_String_Methods/010_Python_String_index%28%29.ipynb)** or **[find()](https://github.com/milaan9/02_Python_Datatypes/blob/main/002_Python_String_Methods/008_Python_String_find%28%29.ipynb)** to find the position of the first occurrence of the word **`because`** in the following sentence: 

    - **`We cannot end the sentence with ‘because’, because ‘because’ is a conjunction.`**.

18. Use **`index()`** and **[rindex](https://github.com/milaan9/02_Python_Datatypes/blob/main/002_Python_String_Methods/037_Python_String_rindex%28%29.ipynb)** to find the positions of the first and last occurrences of the word **`because`** in the following sentence: 

    - **`We cannot end the sentence with ‘because’, because ‘because’ is a conjunction.`**.

19. Slice out the phrase **`‘because’, because ‘because’`** in the following sentence: 

    - **`We cannot end the sentence with ‘because’, because ‘because’ is a conjunction.`**.

20. Does **`Python 4 Data Science`** start with the substring **`Python`**?

21. Does **`Python 4 Data Science`** contain the substring **`Python`**?

22. Remove the leading and trailing spaces from the string **`             Python 4 DataScience                                  `**.

23. The following list contains the names of some Python libraries: **`['Django', 'Flask', 'Bottle', 'Pyramid', 'Falcon']`**. Join the list using a string made of a hash and a space as the separator.

24. Which of the following names returns **`True`** when we call the method **[isidentifier()](https://github.com/milaan9/02_Python_Datatypes/blob/main/002_Python_String_Methods/015_Python_String_isidentifier%28%29.ipynb)** on it?
    - ```py
    2021PythonDataypes
    Python_Dataypes_2021
    ```
25. Make the following using string formatting methods:

    - ```py
    8 + 6 = 14
    8 - 6 = 2
    8 * 6 = 48
    8 / 6 = 1.33
    8 % 6 = 2
    8 // 6 = 1
    8 ** 6 = 262144
    ```

26. Use a **new line** and **tab** escape sequence to print the following lines.
    - ```py
    Name      Age     Country   City
    Ajantha    96      India    Chennai
    ```

## Advanced string processing
For more advanced string processing, there are many libraries available in Python, including for example:
* **re** for regular expression based searching and splitting of strings
* **html** for escaping and parsing HTML format text
* **textwrap** for wrapping and reformatting plain text
* ... and many more
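
As a brief taste of these standard-library modules, the sketch below calls one function from each. The sample strings are made up for illustration, and each module offers far more than is shown here.

```python
import html
import re
import textwrap

# re: split on a comma or a semicolon, followed by optional whitespace
parts = re.split(r'[,;]\s*', 'red, green;blue')
print(parts)

# html: escape characters that have a special meaning in HTML
escaped = html.escape('<b>Python & strings</b>')
print(escaped)

# textwrap: wrap a long line so that no line exceeds the given width
wrapped = textwrap.fill('Python strings come with many useful methods.', width=20)
print(wrapped)
```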