# Python: Strings

This notebook covers **strings** in Python, including all built-in string methods, their usage, and practical problems to reinforce learning. Strings are immutable sequences of characters used to represent text.

## Topics Covered
1. Introduction to Strings
2. Creating and Accessing Strings
3. String Methods (Comprehensive List)
4. String Formatting and Concatenation
5. Common String Operations
6. Practical Problems and Solutions
7. Connection to `handson-ml2` Repository

## 1. Introduction to Strings

- A **string** is a sequence of characters enclosed in single quotes (`'`), double quotes (`"`), or triple quotes (`'''` or `"""`).
- Strings are **immutable**, meaning they cannot be changed after creation.
- Strings support various methods for manipulation and are widely used for text processing.
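
Immutability can be seen directly with a minimal sketch (the variable names and sample value are illustrative, not taken from the cells below): item assignment on a string fails, so any "change" has to build a new string.

```python
# Demonstrates that a str cannot be modified in place.
greeting = "hello"  # illustrative sample value

try:
    greeting[0] = "J"  # item assignment is not supported on str
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Build a new string instead; the original object is left unchanged.
fixed = "J" + greeting[1:]
print(fixed)     # Jello
print(greeting)  # hello
```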

## 2. Creating and Accessing Strings

- Create strings using quotes.
- Access characters using **indexing** (0-based) or **slicing** (`[start:end:step]`).
- Use `len()` to get the length of a string.

In [1]:
s1 = "Hello Python"
print(len(s1))

12


In [4]:
msg = "xHXeXlXlXoX XWXoXrXlXdX" # Steganography: the message is hidden in every second character
print(msg[0])
print("Actual message :", msg[1:len(msg):2]) # [start:end:step]

x
Actual message : Hello World


In [5]:
msg = "XdXlXrXoXWX XoXlXlXeXH" # Steganography: the message is hidden backwards
print("Actual message :", msg[-1:-(len(msg)+1):-2]) # step -2 from the end: msg[-1], msg[-3], msg[-5]...

Actual message : Hello World


In [6]:
print(type(msg))

<class 'str'>


## 3. String Methods

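Python provides a rich set of built-in string methods. Below is a categorized list with examples. Because strings are immutable, methods that transform text return a new string rather than modifying the original; other methods return integers, booleans, lists, tuples, or bytes.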

### Case Conversion Methods
- `capitalize()`: Capitalizes the first character.
- `casefold()`: Converts to lowercase for case-insensitive comparisons (stronger than `lower()`).
- `lower()`: Converts all characters to lowercase.
- `upper()`: Converts all characters to uppercase.
- `swapcase()`: Swaps case of all characters.
- `title()`: Capitalizes the first letter of each word.

In [13]:
text = "hello world!!!" # original object, e.g. at 0x10783a6b0
print(hex(id(text)))
text = text.capitalize() # new object, e.g. at 0x10783dc70 ("Hello world!!!"); the old one can now be freed
print(hex(id(text)))
print(text)

0x10783a6b0
0x10783dc70
Hello world!!!


In [None]:
text = "hello world!!!" # e.g. at 0x10783a6b0
print(hex(id(text)))
result = text.capitalize() # both strings stay alive, so more memory is used
print(hex(id(result)))
print(result)

In [None]:
list1 = ["hello world", 1,2,3]
# [0x10783a6b0, 1,2,3 ]  (the list stores a reference to the string)
list1[0] = "Hello world"
# [0x10783dc70, 1,2,3 ]  (the slot now refers to a different string object)

# Tuple : Perf (good), cannot be modified
# List  : Perf (average), can be modified

In [16]:
s1 = input("Please enter area name")
if s1.upper() == "US":
    print("Validated!!!")
else:
    print("Invalid")

Please enter area name us


Validated!!!


In [17]:
s1 = input("Please enter area name")
if s1.lower() == "us":
    print("Validated!!!")
else:
    print("Invalid")

Please enter area name US


Validated!!!


In [24]:
s1 = "Hello World!!!"
print(s1.casefold())
print(s1.lower())

hello world!!!
hello world!!!
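
For plain ASCII text the two methods agree, as the output above shows. A minimal sketch of where they differ, using the German letter ß as an illustrative sample (not from the original cells):

```python
word = "Stra\u00dfe"  # "Straße", German for "street"; illustrative sample

lowered = word.lower()      # ß is already lowercase, so it stays: straße
folded = word.casefold()    # ß is folded to "ss": strasse
print(lowered)
print(folded)

# Case-insensitive comparison that also treats ß and SS as equal.
same = word.casefold() == "STRASSE".casefold()
print(same)  # True
```

This is why `casefold()` is usually the safer choice when comparing user-supplied text that may not be ASCII.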


In [25]:
s1 = "the news"
print(s1.title())

The News


In [26]:
s1 = "The News"
print(s1.swapcase())

tHE nEWS


### Searching and Counting Methods
- `count(sub)`: Counts occurrences of a substring.
- `find(sub)`: Returns the lowest index of substring (or -1 if not found).
- `rfind(sub)`: Returns the highest index of substring (or -1 if not found).
- `index(sub)`: Like `find()`, but raises ValueError if not found.
- `rindex(sub)`: Like `rfind()`, but raises ValueError.
- `startswith(prefix)`: Checks if string starts with prefix.
- `endswith(suffix)`: Checks if string ends with suffix.

In [5]:
text = "Python is fun, Python is great!"


In [29]:
s1 = "Python is great. PYTHON is easy to learn. python is popular!"
# How many times python is repeated in above string
def count_substring(main, sub):
    pass

count = count_substring(s1, "Python")
print(count)




None


In [32]:
s1 = "Python is great. PYTHON is easy to learn. python is popular!"
s1 = s1.lower()
print(s1.count("python"))


3


In [33]:
s1 = "Python is great. PYTHON is easy to learn. python is popular!"
print(s1.lower().count("python"))
# Aside: Cython compiles Python-like code into C extension modules that expose a Python API
# Standard example: pandas (performance-critical parts are written in Cython/C)

3


In [38]:
news = '''Apple has released new iphone, itab.
MS release product 01.
'''
print(f"Apple string found at position: {news.find('Apple')}")
print(f"MS string found at position: {news.find('MS')}")
print(f"Google string found at position: {news.find('Google')}")


Apple string found at position: 0
MS string found at position: 37
Google string found at position: -1


In [41]:
news = '''Apple has released new iphone, itab. 20-May
MS release product 01.
Apple announced new MAC. 21-May
'''
print(f"Apple string found at position: {news.rfind('Apple')}")


Apple string found at position: 67


In [43]:
news = '''Apple has released new iphone, itab. 20-May
MS release product 01.
Apple announced new MAC. 21-May
'''
offset = news.rfind("Apple")
print(news[offset:]) # s1[offset:] => s1[offset:end:1]


Apple announced new MAC. 21-May



In [47]:
files = ["abc.txt", "abc.xls", "abc.pdf"]
for file in files:
    if file.endswith(".txt"):
        print(f"File: {file} Should be opened with NOTEPAD")
    elif file.endswith(".xls"):
        print(f"File: {file} Should be opened with PANDAS")
    elif file.endswith(".pdf"):
        print(f"File: {file} Should be opened with PDF reader")
    else:
        print(f"#### There is no application to handle this file: {file}")
    

File: abc.txt Should be opened with NOTEPAD
File: abc.xls Should be opened with PANDAS
File: abc.pdf Should be opened with PDF reader


In [49]:
files = ["abc.txt", "abc.xls", "abc.pdf"]
for file in files:
    if file.endswith("xt"):
        print(f"File: {file} Should be opened with NOTEPAD")
    elif file.endswith("ls"):
        print(f"File: {file} Should be opened with PANDAS")
    elif file.endswith("df"):
        print(f"File: {file} Should be opened with PDF reader")
    else:
        print(f"#### There is no application to handle this file: {file}")
    

File: abc.txt Should be opened with NOTEPAD
File: abc.xls Should be opened with PANDAS
File: abc.pdf Should be opened with PDF reader


In [51]:
files = ["MJ song.mp3", "DJ song.mp3", "SJ song.mp3", "MJ song2.mp3"]
interesting_song = "MJ"
for file in files:
    if file.startswith(interesting_song):
        print("Playing music: ", file)


Playing music:  MJ song.mp3
Playing music:  MJ song2.mp3


In [52]:
files = ["MJ song.mp3", "DJ song.mp3", "SJ song.mp3", "123 MJ song2.mp3"]
interesting_song = "MJ"
for file in files:
    if file.find(interesting_song) != -1:
        print("Playing music: ", file)


Playing music:  MJ song.mp3
Playing music:  123 MJ song2.mp3
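
The search methods listed in this section differ mainly in how they report a miss. The sketch below contrasts them on one hypothetical file name; it also uses the `in` operator and the tuple form of `endswith()`, both standard Python but not shown in the cells above.

```python
filename = "123 MJ song2.mp3"  # hypothetical sample value

# `in` is the idiomatic membership test when the position is not needed.
has_mj = "MJ" in filename

# find() signals "not found" with -1 ...
missing_pos = filename.find("DJ")

# ... whereas index() raises ValueError.
try:
    filename.index("DJ")
    index_raised = False
except ValueError:
    index_raised = True

# startswith()/endswith() accept a tuple of alternatives.
is_audio = filename.endswith((".mp3", ".wav"))

print(has_mj, missing_pos, index_raised, is_audio)  # True -1 True True
```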


### Modification Methods
- `replace(old, new)`: Replaces occurrences of old substring with new.
- `strip(chars)`: Removes leading/trailing characters (default: whitespace).
- `lstrip(chars)`: Removes leading characters.
- `rstrip(chars)`: Removes trailing characters.
- `expandtabs(tabsize)`: Expands tabs to spaces.
- `center(width)`: Centers string with padding.
- `ljust(width)`: Left-justifies string.
- `rjust(width)`: Right-justifies string.
- `zfill(width)`: Pads with zeros on the left.

In [53]:
passwd = input("Please enter password")
print(passwd)


Please enter password abc@123


abc@123


In [55]:
mailid = "abc@gmail.com"
# Replace gmail.com with xxxxxx
mailid = mailid.replace("gmail.com", "xxxxxx") # replace(old, new)
print(mailid)

abc@xxxxxx


In [56]:
mailid = '''abc@gmail.com
123@gmail.com
123@gmail.com'''
digits = ["1", "2", "3"] # characters to mask
for digit in digits:
    mailid = mailid.replace(digit, "x")
print(mailid)
# The same masking could be done in one call with the re module

abc@gmail.com
xxx@gmail.com
xxx@gmail.com


In [60]:
title = "the news"
title = title.upper()
print(title)
title = title.center(40) # pads with spaces on both sides to a total width of 40
print(title)

THE NEWS
                THE NEWS                


In [66]:
title = "the news"
title = title.upper()
print(title)
title = title.center(40, "+") # same width, but padded with "+" instead of spaces
print(title)

THE NEWS
++++++++++++++++THE NEWS++++++++++++++++


In [64]:
title = "the news"
title = title.upper()
print(title)
title = title.ljust(40,"-")
print(title)

THE NEWS
THE NEWS--------------------------------


In [65]:
title = "the news"
title = title.upper()
print(title)
title = title.rjust(40,"-")
print(title)

THE NEWS
--------------------------------THE NEWS


In [69]:
title = "the news"
title = title.upper()
print(title)
title = title.zfill(40)
print(title)

THE NEWS
00000000000000000000000000000000THE NEWS


In [None]:
text = "  Hello, World!  "
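
A minimal sketch of the stripping methods on this padded string. The `url` value is an illustrative assumption; it shows that the `chars` argument is treated as a set of characters, not as a prefix or suffix.

```python
text = "  Hello, World!  "

stripped = text.strip()    # both ends
left = text.lstrip()       # leading whitespace only
right = text.rstrip()      # trailing whitespace only
print(repr(stripped))  # 'Hello, World!'
print(repr(left))      # 'Hello, World!  '
print(repr(right))     # '  Hello, World!'

# chars is a set: every leading/trailing character found in it is removed,
# and stripping stops at the first character that is not in the set.
url = "www.example.com"  # illustrative value
core = url.strip("w.moc")
print(core)  # example
```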


### Testing Methods
- `isalnum()`: Checks if all characters are alphanumeric.
- `isalpha()`: Checks if all characters are alphabetic.
- `isdigit()`: Checks if all characters are digits.
- `isdecimal()`: Checks if all characters are decimal digits.
- `isnumeric()`: Checks if all characters are numeric.
- `isspace()`: Checks if all characters are whitespace.
- `islower()`: Checks if all cased characters are lowercase.
- `isupper()`: Checks if all cased characters are uppercase.
- `istitle()`: Checks if string is title-cased.
- `isidentifier()`: Checks if string is a valid identifier.

In [71]:
text = "Python3"
print("Is it alphanumeric", text.isalnum())
print("Is it num", text.isdigit())

Is it alphanumeric True
Is it num False
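
The three numeric tests (`isdecimal()`, `isdigit()`, `isnumeric()`) are easy to confuse. The sketch below compares them on a few sample strings chosen for illustration; each result tuple is ordered as (decimal, digit, numeric).

```python
# Sample strings chosen for illustration (not from the notebook above).
samples = {
    "123": "ASCII digits",
    "\u00b2": "superscript two",
    "\u00bd": "fraction one half",
    "\u0663": "Arabic-Indic digit three",
    "12.5": "contains a dot",
}

results = {}
for s, description in samples.items():
    # Tuple order: (isdecimal, isdigit, isnumeric)
    results[s] = (s.isdecimal(), s.isdigit(), s.isnumeric())
    print(description, results[s])
```

Roughly, `isdecimal()` is the strictest and `isnumeric()` the most permissive. None of them accepts a sign or a decimal point, so they are not a substitute for trying `int()` or `float()`.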


### Splitting and Joining Methods
- `split(sep)`: Splits string into a list using separator.
- `rsplit(sep)`: Splits from the right.
- `splitlines()`: Splits string at line breaks.
- `join(iterable)`: Joins elements of iterable with string as separator.
- `partition(sep)`: Splits at first occurrence of separator, returns tuple.
- `rpartition(sep)`: Splits at last occurrence.

In [73]:
text = "apple,banana,orange"
print(text)
fruits = text.split(",") # list of substrings 
print(fruits)

apple,banana,orange
['apple', 'banana', 'orange']


In [75]:
text = "apple:banana:orange"
print(text)
fruits = text.split(":") # list of substrings 
print(fruits)

apple:banana:orange
['apple', 'banana', 'orange']


In [79]:
text = "apple:banana:orange:grapes"
print(text)
fruits = text.split(":",1) # 1: how many splits 
print(fruits)

apple:banana:orange:grapes
['apple', 'banana:orange:grapes']


In [80]:
text = "apple:banana:orange:grapes"
print(text)
fruits = text.split(":",2) # 2: how many splits 
print(fruits)

apple:banana:orange:grapes
['apple', 'banana', 'orange:grapes']


In [81]:
text = "apple:banana:orange:grapes"
print(text)
fruits = text.rsplit(":",2) # 2: how many splits, counted from the right
print(fruits)

apple:banana:orange:grapes
['apple:banana', 'orange', 'grapes']


In [82]:
text = '''apple
banana:orange:grapes'''
print(text)
fruits = text.splitlines()
print(fruits)

apple
banana:orange:grapes
['apple', 'banana:orange:grapes']


In [85]:
text = "apple:banana:orange"
print(text)
fruits = text.split(":") # list of substrings 
result = [] # empty list
for fruit in fruits:
    result.append(fruit.title()) # ["Apple", "Banana", "Orange"]
print(result)
statement = "-".join(result)
print(statement)

apple:banana:orange
['Apple', 'Banana', 'Orange']
Apple-Banana-Orange


In [86]:
text = "apple:banana:orange"
result = text.partition(":")
print(result)

('apple', ':', 'banana:orange')


In [87]:
text = "apple:banana:orange"
result = text.rpartition(":")
print(result)

('apple:banana', ':', 'orange')


### Encoding and Decoding
- `encode(encoding)`: Encodes string to bytes.
- `decode(encoding)`: Decodes bytes to string (a method of `bytes` objects, not of `str`).

In [95]:
text = "Hello"
encoded = text.encode('utf-8')
print(type(encoded))
print("encode('utf-8'):", encoded)  # b'Hello'
# print("decode('utf-8'):", encoded.decode('utf-8'))  # Hello

<class 'bytes'>
encode('utf-8'): b'Hello'
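
With ASCII text the encoded bytes look just like the string. A short sketch with one non-ASCII character (an illustrative sample) makes the difference between characters and bytes visible and shows the round trip back through `decode()`:

```python
text = "caf\u00e9"  # "café"; illustrative sample with one non-ASCII character

encoded = text.encode("utf-8")
print(encoded)                   # b'caf\xc3\xa9'
print(len(text), len(encoded))   # 4 5  (é takes two bytes in UTF-8)

decoded = encoded.decode("utf-8")
print(decoded == text)           # True
```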


## 4. String Formatting and Concatenation

- **Concatenation**: Use `+` to combine strings.
- **Formatting**: Use f-strings, `.format()`, or `%` (f-strings preferred).
- **Repetition**: Use `*` to repeat strings.

In [90]:
s1 = "Hello"
s2 = "World"
s3 = s1+s2 # + is operator overloading on the str class: s1.__add__(s2)
print(s3)

HelloWorld


In [92]:
s1 = "Hello"
num = 10
s3 = s1+num # str.__add__ does not accept an int, so this raises TypeError
print(s3)

TypeError: can only concatenate str (not "int") to str

In [91]:
s1 = "Hello"
msg = s1*100
print(msg)

HelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHello
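
The three formatting styles named at the start of this section can be compared side by side. This is a minimal sketch with made-up sample values; it also shows `str()` as one way around the `TypeError` raised by `s1+num` above.

```python
# Hypothetical sample values for illustration.
name = "Python"
version = 3.12
count = 10

f_string = f"{name} {version} has {count} items"
format_call = "{} {} has {} items".format(name, version, count)
percent = "%s %s has %d items" % (name, version, count)
print(f_string)  # Python 3.12 has 10 items

# Format specifications control precision and padding.
rounded = f"{3.14159:.2f}"   # two decimal places
padded = f"{count:05d}"      # zero-padded to width 5
print(rounded, padded)       # 3.14 00010

# Concatenating a str and an int needs an explicit conversion.
joined = "Hello" + str(count)
print(joined)  # Hello10
```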


## 5. Common String Operations

- **Slicing**: Extract parts of a string.
- **Membership**: Check if a substring exists using `in`.
- **Iteration**: Loop through characters.
- **Escape Sequences**: Use `\n`, `\t`, etc., for special characters.
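
The four operations listed above, in one short sketch (sample values are illustrative):

```python
text = "Hello World"

first_word = text[0:5]            # slicing
has_world = "World" in text       # membership
has_lower_world = "world" in text # membership is case-sensitive
print(first_word, has_world, has_lower_world)  # Hello True False

# Iteration visits one character at a time.
uppercase_letters = [ch for ch in text if ch.isupper()]
print(uppercase_letters)  # ['H', 'W']

# Escape sequences: \n is a newline and \t is a tab.
# Each one is a single character in the string.
two_lines = "Name:\tPython\nType:\tLanguage"
print(two_lines)
newline_length = len("\n")
print(newline_length)  # 1
```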

## 6. Practical Problems and Solutions

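Below are practical problems for applying string methods. Problem 1 includes a solution; the others are stubs (`pass`) whose assertions describe the expected behavior and will fail until the function is implemented.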

### Problem 1: Reverse a String
Write a program to reverse a given string.

In [100]:
def reverse_string(s1):
    return s1[::-1]


assert reverse_string("hello") == "olleh"
assert reverse_string("") == ""
assert reverse_string("a") == "a"
assert reverse_string("12345") == "54321"
assert reverse_string("Able was I ere I saw Elba") == "ablE was I ere I saw elbA"

### Problem 2: Check if a String is a Palindrome
Write a program to check if a string is a palindrome (reads the same forward and backward).

In [None]:
def is_palindrome(s):
    pass
assert is_palindrome("Racecar") == True              # Simple palindrome with case difference
assert is_palindrome("A man a plan a canal Panama") == True  # Palindrome with spaces
assert is_palindrome("Hello") == False               # Not a palindrome
assert is_palindrome("") == True                     # Empty string is a palindrome
assert is_palindrome("Was it a car or a cat I saw?") == True  # Palindrome with punctuation
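
One possible way to fill in the stub, offered as a sketch rather than the intended answer: drop everything that is not a letter or digit, normalize case, and compare the result with its reverse.

```python
def is_palindrome(s):
    # Keep only letters and digits, and normalize case so that
    # spaces, punctuation, and capitalization are ignored.
    cleaned = [ch.casefold() for ch in s if ch.isalnum()]
    return cleaned == cleaned[::-1]


print(is_palindrome("A man a plan a canal Panama"))  # True
print(is_palindrome("Hello"))                        # False
```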


### Problem 3: Count Vowels in a String
Write a program to count the number of vowels (a, e, i, o, u) in a string.

In [None]:
def count_vowels(s):
    pass
assert count_vowels("Hello World") == 3          # e, o, o
assert count_vowels("PYTHON") == 1               # o
assert count_vowels("aeiouAEIOU") == 10          # All vowels, both cases
assert count_vowels("xyz") == 0                  # No vowels
assert count_vowels("") == 0                     # Empty string
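
A possible solution sketch, assuming only the five ASCII vowels count (as the problem statement says): lowercase the string once, then count the characters that appear in `"aeiou"`.

```python
def count_vowels(s):
    # Lowercase once so that both "a" and "A" are counted.
    return sum(1 for ch in s.lower() if ch in "aeiou")


print(count_vowels("Hello World"))  # 3
```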


### Problem 4: Capitalize First Letter of Each Word
Write a program to capitalize the first letter of each word in a sentence without using `title()`.

In [None]:
def capitalize_words(sentence):
    pass
assert capitalize_words("hello world") == "Hello World"              # Basic lowercase
assert capitalize_words("python is fun") == "Python Is Fun"          # Multiple words
assert capitalize_words("  multiple   spaces  ") == "Multiple Spaces"  # Extra spaces ignored
assert capitalize_words("123abc test") == "123abc Test"              # Word starting with digit
assert capitalize_words("") == ""                                    # Empty string
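
One way to satisfy these assertions without `title()`, given as a sketch: `split()` with no argument discards extra whitespace, and uppercasing only the first character leaves the rest of each word untouched (so `"123abc"` stays as it is).

```python
def capitalize_words(sentence):
    # split() with no argument splits on runs of whitespace and
    # drops leading/trailing whitespace, so no empty words appear.
    words = sentence.split()
    return " ".join(word[0].upper() + word[1:] for word in words)


print(capitalize_words("  multiple   spaces  "))  # Multiple Spaces
```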


### Problem 5: Remove Duplicate Characters
Write a program to remove duplicate characters from a string while preserving order.

In [None]:
def remove_duplicates(s):
    pass

assert remove_duplicates("banana") == "ban"                # 'b', 'a', 'n'
assert remove_duplicates("abcabcabc") == "abc"             # First 'a', 'b', 'c' kept
assert remove_duplicates("aaaaa") == "a"                   # All same characters
assert remove_duplicates("123123") == "123"                # Digits preserved in order
assert remove_duplicates("") == ""                         # Empty string
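
A possible solution sketch: walk the string once, remember which characters have already been seen, and keep only first occurrences.

```python
def remove_duplicates(s):
    seen = set()   # characters already emitted
    kept = []      # first occurrences, in original order
    for ch in s:
        if ch not in seen:
            seen.add(ch)
            kept.append(ch)
    return "".join(kept)


print(remove_duplicates("banana"))  # ban
```

A shorter alternative is `"".join(dict.fromkeys(s))`, which relies on dictionaries preserving insertion order (guaranteed since Python 3.7).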


## Exercises

1. Write a program to check if two strings are anagrams (contain the same characters with the same frequency).
2. Create a program that takes a string and replaces all vowels with '*'.
3. Write a program to find the longest word in a sentence.
4. Implement a program that converts a string to title case, handling edge cases like extra spaces.
5. Write a program to count the frequency of each character in a string and store it in a dictionary.

Try these in a new code cell below!

In [None]:
# Your code here


## Summary

- Strings are immutable sequences of characters with a rich set of methods.
- Methods include case conversion, searching, modification, testing, splitting/joining, and encoding.
- Common operations include slicing, concatenation, formatting, and iteration.
- Practical problems like reversing strings or counting vowels reinforce method usage.
- Strings are crucial in data processing tasks, such as those in the `handson-ml2` repository.

For more practice, try the exercises, explore Python's [official string documentation](https://docs.python.org/3/library/stdtypes.html#string-methods), or apply these methods to text data in `handson-ml2` notebooks!