## Basic String Operations

A string is a data structure that consists of zero or more characters, such as letters, numbers, symbols, or whitespace. Strings are useful for storing and processing text data, such as names, messages, or commands. Strings are also immutable, which means they cannot be changed once they are created.

To create a string, we enclose the characters in single quotes ' ' or double quotes " ". For example, we can create a string containing the word hello like this:

In [4]:
hello = "hello"
hello

'hello'

We can also create an empty string by using just the quotes:

In [5]:
empty_string = ""

To access the characters in a string, we use indexing and slicing. Indexing means using an integer number inside square brackets to get a specific character from the string. For example, to get the first character from the hello string, we use index 0:

In [6]:
hello[0]

'h'

Slicing means using a colon : inside square brackets to get a range of characters from the string. For example, to get the first three characters from the hello string, we use slice 0:3:

In [7]:
hello[0:3]

'hel'

Note that slicing is exclusive of the end index, which means the slice 0:3 will return the characters from index 0 up to but not including index 3. To get all the characters from the string, we can use slice : without any numbers:

In [8]:
hello[:]

'hello'
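Indexing and slicing also accept negative numbers and an optional step. The following is a minimal sketch of those variations; the variable `word` is introduced here only for illustration.

```python
# `word` is a hypothetical variable used only for this sketch.
word = "hello"

last_char = word[-1]        # negative indices count from the end
last_three = word[-3:]      # omitting the end index slices to the end
every_other = word[::2]     # a third number sets the step
reversed_word = word[::-1]  # a step of -1 walks the string backwards

print(last_char, last_three, every_other, reversed_word)
```

Slices never raise an error for out-of-range bounds, whereas indexing past the end with a single integer raises an IndexError.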

To modify the characters in a string, we cannot use assignment statements with indexing or slicing, because strings are immutable. For example, if we try to change the first character in the hello string to "H", we will get an error:

In [9]:
hello[0] = "H"

TypeError: 'str' object does not support item assignment

To change the characters in a string, we need to create a new string with the desired characters. For example, to change the first character in the hello string to "H", we can use slicing and concatenation:

In [12]:
hello = "H" + hello[1:]
hello

'Hello'
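The slicing-and-concatenation idea can be wrapped in a small function. This is only a sketch: `replace_at` is a hypothetical helper, not a built-in, and it assumes a non-negative index.

```python
def replace_at(text, index, new_char):
    """Return a copy of text with the character at index replaced.

    Hypothetical helper for illustration; assumes index >= 0.
    """
    return text[:index] + new_char + text[index + 1:]


original = "hello"
changed = replace_at(original, 0, "H")

print(original)  # the original string is left untouched
print(changed)
```

Because a new string is built each time, the original value stays available under its old name unless we reassign it.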

Concatenation means using the + operator to join two or more strings together. For example, to create a new string that says "Hello, world!", we can use concatenation:

In [14]:
greeting = hello + ", world!"
greeting


'Hello, world!'

Repetition means using the * operator to repeat a string a certain number of times. For example, to create a new string that repeats the current value of hello, which is now "Hello", five times, we can use repetition:

In [16]:
repeated = hello * 5
repeated

'HelloHelloHelloHelloHello'
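Concatenation and repetition are often combined. As a small sketch with made-up values, the following underlines a title with a row of dashes of matching length:

```python
# `title` is an illustrative value, not taken from the cells above.
title = "Report"
underline = "-" * len(title)          # repeat "-" once per character in the title
banner = title + "\n" + underline     # join the pieces with a newline

print(banner)

# Repeating a string zero times gives the empty string.
nothing = "ab" * 0
```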

Hello and welcome to this video on data structures in Python. In the previous video, we learned about basic string operations, such as concatenation, repetition, and slicing. In this video, we will learn about some of the string methods that we can use to manipulate and process strings in Python.

## String Methods

A string method is a function that is attached to a string object and can perform some operation on the string. For example, the upper method is a string method that returns a new string with all the characters in uppercase. To use a string method, we use the dot notation and the name of the method, followed by parentheses. For example, to use the upper method on the string "hello", we do:

In [17]:
"hello".upper()

'HELLO'

There are many string methods that we can use to perform various operations on strings, such as split, join, find, replace, strip, and count. You can find more information about these methods in the Python documentation or by using the help function in Python.

In this section, we will look at some of the common string methods that we can use to manipulate and process strings in Python. We will use the following sentence as our example string:

In [18]:
sentence = "This is a sample sentence."

- The split method returns a list of strings that are separated by a specified delimiter. For example, to split the sentence by whitespace, we can use the split method without any arguments:

In [19]:
sentence.split()

['This', 'is', 'a', 'sample', 'sentence.']

To split the sentence by a specific character, such as ".", we can use the split method with the character as the argument:

In [20]:
sentence.split(".")

['This is a sample sentence', '']
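Two details of split are easy to miss: an optional second argument limits the number of splits, and splitting on an explicit space behaves differently from splitting with no argument. A short sketch, using a made-up comma-separated record:

```python
# `record` is a hypothetical example value.
record = "alice,30,London,UK"

fields = record.split(",")           # split on every comma
name, rest = record.split(",", 1)    # split at most once, from the left

# With an explicit " " separator, consecutive spaces produce empty strings;
# with no argument, runs of whitespace are treated as one separator.
explicit = "a  b".split(" ")
default = "a  b".split()

print(fields, name, rest, explicit, default)
```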

- The join method returns a string that is the concatenation of the elements in an iterable, such as a list or a tuple, with a specified separator. For example, to join the list of words in the sentence with a dash "-", we can use the join method with the dash as the argument:

In [21]:
"-".join(sentence.split())

'This-is-a-sample-sentence.'
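The join method only accepts strings, so other values need to be converted first. The sketch below uses illustrative data to show that conversion, and also that join can reverse a whitespace split when the words were separated by single spaces:

```python
# Illustrative values for this sketch.
numbers = [1, 2, 3]

# join raises a TypeError on non-string items, so convert each one with str().
csv_line = ",".join(str(n) for n in numbers)

text = "This is a sample sentence."
rebuilt = " ".join(text.split())   # split on whitespace, then rejoin with one space

print(csv_line)
print(rebuilt == text)
```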

- The find method returns the index of the first occurrence of a substring in a string, or -1 if the substring is not found. For example, to find the index of the word "sample" in the sentence, we can use the find method with the word as the argument:

In [22]:
sentence.find("sample")

10
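As a brief sketch of related behavior: find returns -1 for a missing substring, accepts an optional start position, and can be replaced by the in operator when only a yes/no answer is needed. The variable `text` below simply repeats the example sentence so the block runs on its own.

```python
text = "This is a sample sentence."

found = text.find("sample")       # index of the first match
missing = text.find("missing")    # -1 when there is no match
first_s = text.find("s")          # first "s" is the one in "This"
next_s = text.find("s", 4)        # start searching at index 4
has_sample = "sample" in text     # membership test, returns a bool

print(found, missing, first_s, next_s, has_sample)
```

The related index method behaves like find but raises a ValueError instead of returning -1.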

- The replace method returns a new string where all occurrences of a substring are replaced by another substring. For example, to replace the word "sample" with the word "example" in the sentence, we can use the replace method with the old and new words as the arguments:

In [23]:
sentence.replace("sample", "example")

'This is a example sentence.'
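The replace method also takes an optional third argument that caps the number of replacements, and calls can be chained because each one returns a new string. A small sketch with made-up values:

```python
# `path` is a hypothetical example value.
path = "a-b-c"

all_replaced = path.replace("-", "+")        # every occurrence
first_only = path.replace("-", "+", 1)       # at most one occurrence
chained = path.replace("a", "x").replace("c", "z")

print(all_replaced, first_only, chained)
```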

- The strip method returns a new string where the leading and trailing characters are removed. By default, the strip method removes whitespace characters, such as spaces, tabs, and newlines. For example, to remove the leading and trailing whitespace from a padded string, we can use the strip method without any arguments:

In [27]:
sentence_2 = '    This is an example.    '
sentence_2.strip()

'This is an example.'
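There are also one-sided variants, lstrip and rstrip, and strip accepts a string of characters to remove instead of whitespace. A minimal sketch with illustrative values:

```python
# Illustrative values for this sketch.
padded = "   data  "

left_trimmed = padded.lstrip()     # remove leading whitespace only
right_trimmed = padded.rstrip()    # remove trailing whitespace only
starred = "***title***".strip("*") # remove a specific character from both ends
interior = "  a b  ".strip()       # whitespace inside the string is kept

print(repr(left_trimmed), repr(right_trimmed), starred, repr(interior))
```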

- The count method returns the number of times a substring appears in a string. For example, to count the number of "s" characters in the sentence, we can use the count method with the character as the argument:

In [28]:
sentence.count("s")

4
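Note that count is case-sensitive and also works with substrings longer than one character. The sketch below repeats the example sentence in a variable named `text` so that it runs on its own:

```python
text = "This is a sample sentence."

lower_s = text.count("s")           # lowercase "s" only
upper_s = text.count("S")           # no uppercase "S" in the sentence
pairs = text.count("is")            # matches inside "This" and the word "is"
any_t = text.lower().count("t")     # lowercase first for a case-insensitive count

print(lower_s, upper_s, pairs, any_t)
```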

## Use cases for string manipulation

String manipulation is a very common and important task in programming, as it allows us to process and analyze text data, such as user input, web scraping, or natural language processing. Some of the use cases for string manipulation are:

String manipulation can be used to validate and format user input, such as checking if an email address is valid, or converting a phone number to a standard format. For example, we can use the find and replace methods to check and format an email address like this:

In [30]:
email = input("Enter your email address: ")
if email.find("@") == -1 or email.find(".") == -1:
    print("Invalid email address.")
else:
    email = email.lower().replace(" ", "")
    print("Your email address is:", email)

Invalid email address.
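The same check can be written as a function so that it can be tried on fixed inputs instead of typed input. This is a rough sketch: `normalize_email` is a hypothetical helper, and requiring an "@" and a "." is only a sanity check, not real email validation.

```python
def normalize_email(raw):
    """Return a cleaned-up address, or None if the basic check fails.

    Hypothetical helper; the check is deliberately simplistic.
    """
    cleaned = raw.strip().lower().replace(" ", "")
    # Same rough rule as the cell above: require an "@" and a "." somewhere.
    if cleaned.find("@") == -1 or cleaned.find(".") == -1:
        return None
    return cleaned


print(normalize_email("  Alice@Example.COM "))
print(normalize_email("not-an-email"))
```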


String manipulation can be used to extract and parse information from web pages, such as titles, links, or keywords. For example, we can use the requests and BeautifulSoup modules to get and parse the HTML content of a web page, then use the string methods split and strip to clean up the title, and BeautifulSoup's own find method (not the string method of the same name) to get the first link like this:

In [31]:
import requests
from bs4 import BeautifulSoup

response = requests.get("https://en.wikipedia.org/wiki/Python_(programming_language)")
soup = BeautifulSoup(response.text, "html.parser")

title = soup.title.text.split("-")[0].strip()
print("The title of the web page is:", title)

first_link = soup.find("a", href=True)["href"]
print("The first link in the web page is:", first_link)

The title of the web page is: Python (programming language)
The first link in the web page is: #bodyContent


String manipulation can be used to perform natural language processing tasks, such as tokenization, lemmatization, or sentiment analysis. For example, we can use the nltk module to tokenize, lemmatize, and analyze the sentiment of a sentence like this:

In [32]:
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
nltk.download("punkt")
nltk.download("wordnet")
nltk.download("vader_lexicon")

sentence = "Python is a great programming language."

tokens = nltk.word_tokenize(sentence)
print("The tokens of the sentence are:", tokens)

# Lemmatization is the process of reducing a word to its base or dictionary form, known as the lemma.
# For example, the lemma of the word "cats" is "cat", and the lemma of "running" is "run".
# WordNetLemmatizer treats every token as a noun unless a part of speech is passed,
# so the tokens of this sentence come back unchanged.
lemmatizer = nltk.stem.WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(token) for token in tokens]
print("The lemmas of the tokens are:", lemmas)

# Store the result under a new name so it does not shadow the imported class or module.
sentiment_analyzer = SentimentIntensityAnalyzer()
scores = sentiment_analyzer.polarity_scores(sentence)
print("The sentiment of the sentence is:", scores)


The tokens of the sentence are: ['Python', 'is', 'a', 'great', 'programming', 'language', '.']
The lemmas of the tokens are: ['Python', 'is', 'a', 'great', 'programming', 'language', '.']
The sentiment of the sentence is: {'neg': 0.0, 'neu': 0.494, 'pos': 0.506, 'compound': 0.6249}




Hello and welcome to this video on data structures in Python. In the previous video, we learned about some of the string methods that we can use to manipulate and process strings in Python. In this video, we will learn about another skill that is very useful and powerful: string formatting. String formatting is a way of inserting variables, expressions, or values into a string, using placeholders and formatting options.

## String Formatting

String formatting is a technique that allows us to create dynamic and customized strings, by using placeholders that are replaced by variables, expressions, or values at runtime. For example, we can create a string that greets a user with their name, by using a placeholder for the name variable:

In [33]:
name = "Alice"
greeting = "Hello, {}!".format(name)
greeting

'Hello, Alice!'

The placeholder is denoted by the curly braces { }, and the variable that replaces it is passed as an argument to the format method. The format method is a string method that returns a new string with the placeholders replaced by the arguments. We can use multiple placeholders and arguments in the same string, as long as they match in number and order. For example, we can create a string that displays the name and age of a user, by using two placeholders and two arguments:

In [34]:
name = "Bob"
age = 25
info = "Your name is {} and your age is {}.".format(name, age)
info

'Your name is Bob and your age is 25.'

We can also use positional or keyword arguments to specify which placeholder is replaced by which argument. Positional arguments are identified by the index of the argument, starting from zero. Keyword arguments are identified by the name of the argument, which must match the name of the placeholder. For example, we can create the same info string as before, by using positional or keyword arguments:

In [35]:
info = "Your name is {0} and your age is {1}.".format(name, age) # positional arguments
info = "Your name is {name} and your age is {age}.".format(name=name, age=age) # keyword arguments
info

'Your name is Bob and your age is 25.'
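Numbered and named placeholders are most useful when a value appears more than once or when the values already live in a dictionary. A small sketch with made-up names:

```python
# Illustrative values; `user` is a hypothetical dictionary.
mutual = "{0} likes {1}, and {1} likes {0}.".format("Ann", "Ben")

user = {"name": "Bob", "age": 25}
# ** unpacks the dictionary into keyword arguments for format.
info = "Your name is {name} and your age is {age}.".format(**user)

print(mutual)
print(info)
```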

We can also use formatting options to control how the placeholders are replaced by the arguments, such as alignment, width, precision, or type. Formatting options are specified after a colon : inside the placeholder, using a mini-language that defines the desired format. For example, we can create a string that displays the name and age of a user, with some formatting options:

In [36]:
info = "Your name is {:<10} and your age is {:>5.2f}.".format(name, age)
info

'Your name is Bob        and your age is 25.00.'

The formatting option <10 means that the placeholder is left-aligned and has a width of 10 characters. The formatting option >5.2f means that the placeholder is right-aligned and has a width of 5 characters, with 2 decimal places and a fixed-point notation. There are many other formatting options that we can use to customize the appearance of the placeholders, such as sign, padding, grouping, or conversion. You can find more information about these options in the Python documentation or by using the help function in Python.
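Width and alignment options are handy for lining up columns of text. The following sketch formats a few made-up rows into a simple two-column table:

```python
# `rows` holds illustrative data for this sketch.
rows = [("Alice", 30.5), ("Bob", 25)]

# Left-align the name in 8 characters, right-align the number in 7 with 2 decimals.
lines = ["{:<8}|{:>7.2f}".format(name, value) for name, value in rows]

for line in lines:
    print(line)
```

Every line comes out the same length, so the separator and the decimal points line up when printed in a fixed-width font.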

## f-strings

f-strings are a newer and simpler way of creating formatted strings in Python, introduced in version 3.6. f-strings are also called formatted string literals, because they are literal strings that are prefixed with the letter f or F, and can contain expressions or variables inside curly braces. For example, we can create the same greeting and info strings as before, by using f-strings:

In [37]:
greeting = f"Hello, {name}!"
greeting

'Hello, Bob!'

In [38]:
info = f"Your name is {name} and your age is {age}."
info

'Your name is Bob and your age is 25.'

We can also use formatting options inside the curly braces, just like in the format method. For example, we can create the same info string as before, with some formatting options, by using an f-string:

In [39]:
info = f"Your name is {name:<10} and your age is {age:>5.2f}."
info

'Your name is Bob        and your age is 25.00.'

f-strings are usually faster and often more readable than the format method, because the values are written directly inside the braces instead of being passed as separate arguments. The braces can hold any expression, not just a variable name, and since Python 3.8 they also support self-documenting expressions, which are written with a trailing = sign. For example, we can create a string that displays the result of a calculation by embedding the expression x + y directly in an f-string:

In [40]:
x = 10
y = 20
result = f"The sum of {x} and {y} is {x + y}."
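Python 3.8 added the = specifier, which prints the expression text followed by its value. A minimal sketch, assuming Python 3.8 or newer:

```python
x = 10
y = 20

plain = f"The sum of {x} and {y} is {x + y}."
labelled = f"{x=}, {y=}"       # expression text, then its value
with_spaces = f"{x + y = }"    # whitespace around "=" is preserved

print(plain)
print(labelled)
print(with_spaces)
```

This form is mostly useful for quick debugging output, since it saves typing the variable name twice.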

Here is a list of some of the formatting options that you can use inside the curly braces of an f-string:

- `:>` or `:<` for right or left alignment
- `:^` for center alignment
- `:n` for width, where n is an integer
- `:.n` for precision, where n is an integer
- `:s`, `:d`, `:f`, `:e`, `:g`, `:b`, `:o`, `:x`, `:c` for type, where s is string, d is decimal, f is fixed-point, e is exponential, g is general, b is binary, o is octal, x is hexadecimal, and c is character
- `:+`, `:-`, or `: ` for sign, where + is always show, - is only show for negative, and space is leave a space for positive
- `:0` before the width for zero-padding, and `:_` or `:,` for grouping, where _ and , are inserted as the thousands separator
- `:#` for alternate form, where # adds prefixes for binary, octal, and hexadecimal types
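The options in the list above can be combined in a single placeholder. The sketch below tries several of them on made-up values; the dictionary `examples` exists only to keep the results together.

```python
# `value` and `examples` are illustrative names for this sketch.
value = 1234567.891

examples = {
    "grouping": f"{value:,.2f}",       # comma separator, 2 decimal places
    "underscore": f"{1000000:_}",      # underscore separator
    "zero_pad": f"{42:05d}",           # pad with zeros to width 5
    "sign": f"{42:+d}",                # always show the sign
    "hex": f"{255:#x}",                # alternate form adds the 0x prefix
    "binary": f"{255:b}",              # binary without a prefix
    "center": f"{'hi':^6}",            # center in 6 characters
    "fill": f"{'hi':*^6}",             # center, filling with "*"
    "percent": f"{0.25:.0%}",          # multiply by 100 and add "%"
}

for label, text in examples.items():
    print(f"{label:<10} {text}")
```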

## Use cases for string formatting

String formatting is a very useful and powerful skill in Python, as it allows us to create dynamic and customized strings, by inserting variables, expressions, or values into a string, using placeholders and formatting options. Some of the use cases for string formatting are:

- String formatting can be used to display and format data in a human-readable way, such as numbers, dates, or currencies. For example, we can use string formatting to display and format the price of a product, with a currency symbol, a comma separator, and two decimal places:

In [63]:
price = 1234.43
formatted_price = f"The price of the product is ${price:,.2f}."
formatted_price

'The price of the product is $1,234.43.'

- String formatting can be used to create and print messages, logs, or reports, that contain dynamic and relevant information, such as user input, errors, or results. For example, we can use string formatting to create and print a message that greets a user with their name and the current date:

In [64]:
import datetime
name = input("Enter your name: ")
date = datetime.date.today()
message = f"Hello, {name}! Today is {date:%A, %B %d, %Y}."
print(message)

Hello, milad! Today is Monday, December 18, 2023.
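To experiment with date formatting without depending on the current day or on typed input, we can use a fixed date. The sketch below assumes the same date as the output above; the month and weekday names produced by %B and %A depend on the system locale.

```python
import datetime

# A fixed date chosen for illustration, so the result is reproducible.
fixed = datetime.date(2023, 12, 18)

long_form = f"Today is {fixed:%A, %B %d, %Y}."   # names depend on the locale
iso_form = f"{fixed:%Y-%m-%d}"                   # numeric, locale-independent

print(long_form)
print(iso_form)
```

The text after the colon is passed to the date's own formatting routine, so it uses the same directives as the strftime method.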
