# **`Data Science Learners Hub`**

**Module : Python**

**email** : [datasciencelearnershub@gmail.com](mailto:datasciencelearnershub@gmail.com)

### **`# Exploring 'String' Data Type in Python`**

**1. Introduction:**

The string data type in Python is a sequence of characters. It is a versatile data type used to represent and manipulate textual data. Strings are enclosed in single (' '), double (" "), or triple (''' ''' or """ """) quotes.

**2. Real World Scenario:**

Think of strings as words, sentences, or paragraphs in a document. For example, when working with text-based data like articles, user inputs, or log messages, strings become essential for manipulation and analysis.

**3. Importance in Python:**

- `Text Processing`: Strings are fundamental for text processing in Python. Many operations involve manipulating and analyzing textual information.

- `Data Representation`: Strings are used to represent data, such as names, addresses, and messages, making them essential in various applications.

**4. Real-world Applications:**

- `User Input`: Gathering user input from forms or command-line interfaces often involves strings.
  
- `File Operations`: Reading and writing to files involve string manipulation.

- `Web Development`: Handling URLs, form data, and text content in web applications.


**5. Escape Sequences:**

- `\n`: Newline character.

- `\t`: Horizontal tab character.

- `\\`: Backslash character.

- `\'`: Single quote character.

- `\"`: Double quote character.
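A quick sketch of these escape sequences in action. The variable names are illustrative, and the raw-string line at the end is an extra that the list above does not cover.

```python
# Each escape sequence stands for exactly one character in the string
newline_text = "first line\nsecond line"
tab_text = "name\tscore"
backslash_text = "C:\\Users"
single_quote_text = 'It\'s easy'
double_quote_text = "She said \"hi\""

print(newline_text)       # prints on two lines
print(tab_text)           # name and score separated by a tab
print(backslash_text)     # C:\Users
print(single_quote_text)  # It's easy
print(double_quote_text)  # She said "hi"
print(len("\n"))          # 1

# A raw string (r prefix) turns escape processing off
raw_text = r"C:\new"
print(len(raw_text))      # 6: the backslash and the n are kept as-is
```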

**6. Syntax for Creating Strings:**
```python
my_string = "Hello, Python!"

```
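The quote styles listed in the introduction all produce ordinary `str` objects. A minimal sketch with illustrative variable names:

```python
# Single and double quotes create identical strings
single = 'Hello, Python!'
double = "Hello, Python!"
print(single == double)  # True

# Pick the quote style that avoids escaping
apostrophe = "It's Python"
quoted = 'He said "Python"'

# Triple quotes can span several lines; each line break becomes \n
multi_line = """Line one
Line two"""
print(multi_line.count("\n"))  # 1
```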

**7. String Methods:**

![DSLH-StringMethodsInPython.jpeg](attachment:DSLH-StringMethodsInPython.jpeg)

#### Summary of methods/ functions of 'String' data type in Python:

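**`Note : `** Because strings are immutable, no string method modifies a string in place. Methods that produce text return a new string; the others return a different type altogether (for example, `count()` returns an int and `split()` returns a list).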

  

| Method | Syntax | Return Type | In-place or Copy | Input Parameters | One-liner Explanation | Peculiarities/Considerations |
| --- | --- | --- | --- | --- | --- | --- |
| capitalize() | `string.capitalize()` | str | Copy | None | Capitalizes the first character and lowercases the rest. | `"python IS fun".capitalize()` returns `"Python is fun"`. |
| center(width) | `string.center(width, fillchar)` | str | Copy | width (int), fillchar (str, optional) | Centers the string within the specified width. | If width is less than or equal to the string's length, the string is returned unchanged. `fillchar` defaults to a space. |
| count(sub) | `string.count(sub, start, end)` | int | N/A (returns int) | sub (str), start (int, optional), end (int, optional) | Counts the non-overlapping occurrences of the substring. | Case-sensitive. Optional start and end restrict counting to a slice of the string. |
| find(sub) | `string.find(sub, start, end)` | int | N/A (returns int) | sub (str), start (int, optional), end (int, optional) | Returns the index of the first occurrence of the substring. | Returns -1 if the substring is not found. |
| join(iterable) | `'delimiter'.join(iterable)` | str | Copy | iterable of str | Joins the elements into one string, separated by the delimiter. | Called on the delimiter, with the iterable as the argument. Every element must be a str, otherwise TypeError is raised. |
| ljust(width) | `string.ljust(width, fillchar)` | str | Copy | width (int), fillchar (str, optional) | Left-justifies the string within the specified width. | Like `center()`, a width that is too small returns the string unchanged. |
| lower() | `string.lower()` | str | Copy | None | Converts all characters in the string to lowercase. | No peculiarities. |
| lstrip(chars) | `string.lstrip(chars)` | str | Copy | chars (str, optional) | Removes leading whitespace, or leading characters found in chars. | chars is a set of characters, not a prefix: `"xxyhello".lstrip("xy")` returns `"hello"`. |
| replace(oldsub, newsub) | `string.replace(oldsub, newsub, count)` | str | Copy | oldsub (str), newsub (str), count (int, optional) | Replaces occurrences of oldsub with newsub. | All occurrences are replaced by default; count limits the number of replacements. |
| rfind(sub) | `string.rfind(sub, start, end)` | int | N/A (returns int) | sub (str), start (int, optional), end (int, optional) | Returns the index of the last occurrence of the substring. | Returns -1 if the substring is not found. |
| rjust(width) | `string.rjust(width, fillchar)` | str | Copy | width (int), fillchar (str, optional) | Right-justifies the string within the specified width. | Like `center()` and `ljust()`, a width that is too small returns the string unchanged. |
| rstrip(chars) | `string.rstrip(chars)` | str | Copy | chars (str, optional) | Removes trailing whitespace, or trailing characters found in chars. | chars is a set of characters, not a suffix. |
| split(sep, maxsplit) | `string.split(sep, maxsplit)` | list | New list | sep (str, optional), maxsplit (int, optional) | Splits the string into a list of substrings at each separator. | Without sep, splits on runs of whitespace and drops empty strings. maxsplit limits the number of splits. |
| title() | `string.title()` | str | Copy | None | Uppercases the first character of each word and lowercases the rest. | Any run of letters counts as a word, so apostrophes split words: `"they're".title()` returns `"They'Re"`. |
| upper() | `string.upper()` | str | Copy | None | Converts all characters in the string to uppercase. | No peculiarities. |

This table is a quick reference for the string methods covered in this section: syntax, return type, whether the result is a new object, input parameters, a one-line explanation, and notable peculiarities.
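The optional parameters mentioned in the table are easy to overlook. A short sketch on an illustrative sample string:

```python
text = "banana"

# center/ljust/rjust: optional fill character; a width that is too small is ignored
print(text.center(10, "*"))   # **banana**
print(text.ljust(3))          # banana  (returned unchanged)

# count/find: optional start (and end) restrict the search range
print(text.count("a"))        # 3
print(text.count("a", 2))     # 2  (searches "nana")
print(text.find("na", 3))     # 4

# replace: optional count limits the number of replacements
print(text.replace("a", "A", 1))  # bAnana

# split: maxsplit limits the number of splits
print("a,b,c,d".split(",", 2))    # ['a', 'b', 'c,d']
```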

#### Some additional methods/ functions or operations:


| Method | Syntax | Return Type | In-place or Copy | Input Parameters | One-liner Explanation | Peculiarities/Considerations |
| --- | --- | --- | --- | --- | --- | --- |
| Access Character | `char = my_string[index]` | str | Copy | index (int) | Access the character at a specific index in the string. | Negative indices count from the end. Raises IndexError if the index is out of range. |
| Slicing | `substring = my_string[start:end]` | str | Copy | start (int), end (int) | Get the substring from start (inclusive) to end (exclusive). | Either bound can be omitted and negative indices are allowed. Out-of-range bounds are clamped instead of raising IndexError. |
| Concatenation | `new_string = string1 + string2` | str | Copy | another string (string2) | Concatenate two strings. | Both operands must be str: `"a" + 1` raises TypeError. |
| Repetition | `repeated_string = my_string * n` | str | Copy | repetition count n (int) | Repeat the string n times. | A zero or negative count results in an empty string. |
| String Length | `length = len(my_string)` | int | N/A (returns int) | None | Get the number of characters in the string. | |
| Check Prefix | `is_prefix = my_string.startswith(prefix)` | bool | N/A (returns bool) | prefix (str or tuple of str) | Check whether the string starts with a specific prefix. | A tuple checks several prefixes at once. |
| Check Suffix | `is_suffix = my_string.endswith(suffix)` | bool | N/A (returns bool) | suffix (str or tuple of str) | Check whether the string ends with a specific suffix. | A tuple checks several suffixes at once. |
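A minimal sketch of the operations in the table above, using an illustrative word:

```python
word = "Python"

print(word[0])      # P
print(word[-1])     # n      (negative index counts from the end)
print(word[1:4])    # yth    (start inclusive, end exclusive)
print(word[:2])     # Py     (start omitted: from index 0)
print(word[2:])     # thon   (end omitted: to the end)
print(word[::-1])   # nohtyP (a step of -1 reverses)

# Slices clamp out-of-range bounds instead of raising an error
print(word[2:100])  # thon

# Indexing out of range does raise IndexError
try:
    word[10]
except IndexError:
    print("IndexError raised for word[10]")

print("ab" * 3)          # ababab
print("ab" * -1 == "")   # True
print(word.startswith(("Py", "Ja")))  # True: a tuple of prefixes is allowed
```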


#### 1. `capitalize()`

**Syntax:**
```python
s.capitalize()
```

Copy of s with the first character capitalized and all remaining characters lowercased.

**Example:**

```python
text = "python is fun"
capitalized_text = text.capitalize()
print(capitalized_text)
# Output: "Python is fun"

print(text)
# Output: "python is fun" (the original string is unchanged)
```

```
Python is fun
python is fun
```


#### 2. `center(width)`

**Syntax:**
```python
s.center(width)
```

Copy of s centered in a field of given width.

**Example:**

```python
word = "Python"
centered_word = word.center(12)
print(centered_word)
# Output: "   Python   " (6 padding spaces, split 3 on each side)
```

```
   Python   
```


#### 3. `count(sub)`

**Syntax:**
```python
s.count(sub)
```

Count the number of non-overlapping occurrences of sub in s.

**Example:**

```python
text = "python is powerful, Python is easy"
count_python = text.count("Python")
print(count_python)
# Output: 1

# Note: count() is case-sensitive. The first "python" starts with a
# lowercase p, so it does not match "Python" and is not counted.
```

```
1
```
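To count a word regardless of case, one common approach is to lowercase the text first. A minimal sketch on the same sample text:

```python
text = "python is powerful, Python is easy"

# Lowercase first so "python" and "Python" are counted together
count_any_case = text.lower().count("python")
print(count_any_case)  # 2
```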


#### 4. `find(sub)`

**Syntax:**
```python
s.find(sub)
```

Find the index of the first occurrence of sub in s, or -1 if sub is not found.

**Example:**

```python
text = "Python is amazing"
position_of_is = text.find("is")
print(position_of_is)
# Output: 7 (the index position)
```

```
7
```
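Because -1 is also a valid index in Python, the result of `find()` should be checked before it is used for slicing. The sketch below also shows `index()`, a related method (not covered above) that raises ValueError instead of returning -1:

```python
text = "Python is amazing"

print(text.find("Java"))   # -1: find() signals "not found" with -1

try:
    text.index("Java")     # index() raises instead of returning -1
except ValueError:
    print("ValueError raised by index()")

# Check for -1 before slicing; text[-1:] would silently give "g"
pos = text.find("amazing")
if pos != -1:
    print(text[pos:])      # amazing
```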


#### 5. `join(iterable)`

**Syntax:**
```python
s.join(iterable)
```

Concatenate the strings in iterable into a single string, using s as the separator.

**Example:**

```python
words = ["Hello", "World", "Python"]
joined_text = " ".join(words)
print(joined_text)
# Output: "Hello World Python" (the separator is a space)
```

```
Hello World Python
```
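`join()` accepts only strings, so numbers must be converted first. A minimal sketch:

```python
numbers = [1, 2, 3]

# " ".join(numbers) raises TypeError because the items are ints
try:
    " ".join(numbers)
except TypeError:
    print("TypeError: join() needs str items")

# Convert each item to str first
joined_numbers = "-".join(map(str, numbers))
print(joined_numbers)   # 1-2-3

# Any iterable of strings works, including a string itself
print(",".join("abc"))  # a,b,c
```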


#### 6. `ljust(width)`

**Syntax:**
```python
s.ljust(width)
```

Like center(), but s is left-justified (padded on the right).

**Example:**

```python
word = "Python"
left_justified_word = word.ljust(10)
print(left_justified_word)
# Output: "Python    "
```

```
Python    
```


#### 7. `lower()`

**Syntax:**
```python
s.lower()
```

Copy of s in all lowercase characters.

**Example:**

```python
text = "Python is FUN"
lowercased_text = text.lower()
print(lowercased_text)
# Output: "python is fun"
```

```
python is fun
```


#### 8. `lstrip()`

**Syntax:**
```python
s.lstrip()
```

Copy of s with leading white space removed.

**Example:**

```python
text = "    Python    "
stripped_text = text.lstrip()
print(stripped_text)
# Output: "Python    "

# Different ids confirm that lstrip() returned a new string object
print(id(text))
print(id(stripped_text))
```

```
Python    
4497232048
4497294512
```
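The optional `chars` argument of `lstrip()` is treated as a set of characters, not as a prefix, which can strip more than intended. The sketch below contrasts it with `str.removeprefix()`, a separate method not covered above that requires Python 3.9 or later:

```python
name = "prefix_pattern"

# chars is a SET: leading p, r, e, f, i, x, _ are all stripped,
# which also removes the "p" of "pattern"
print(name.lstrip("prefix_"))        # attern

# removeprefix (Python 3.9+) removes one exact leading substring
print(name.removeprefix("prefix_"))  # pattern
```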


#### 9. `replace(oldsub, newsub)`

**Syntax:**
```python
s.replace(oldsub, newsub)
```

Return a copy of s with all occurrences of oldsub replaced by newsub.

**Example:**

```python
text = "Python is easy and really easy"
updated_text = text.replace("easy", "powerful")
print(updated_text)
# Output: "Python is powerful and really powerful"

# Different ids confirm that replace() returned a new string object
print(id(text))
print(id(updated_text))
```

```
Python is powerful and really powerful
4503346896
4503297680
```
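The optional third argument of `replace()` limits how many occurrences are replaced, counting from the left. A minimal sketch on the same sample text:

```python
text = "Python is easy and really easy"

# Replace only the first occurrence
first_only = text.replace("easy", "powerful", 1)
print(first_only)  # Python is powerful and really easy

# Matching is case-sensitive: "EASY" does not match "easy"
print(text.replace("EASY", "hard") == text)  # True
```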


#### 10. `rfind(sub)`

**Syntax:**
```python
s.rfind(sub)
```

Like find(), but returns the index of the last (rightmost) occurrence of sub, or -1 if sub is not found.

**Example:**

```python
text = "Python is powerful, Python is easy"
last_position_of_python = text.rfind("Python")
print(last_position_of_python)
# Output: 20 (rfind() returns an int index; no new string is created)
```

```
20
```


#### 11. `rjust(width)`

**Syntax:**
```python
s.rjust(width)
```

Like center() and ljust(), but s is right-justified (padded on the left).

**Example:**

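```python
word = "Python"
right_justified_word = word.rjust(10)
print(right_justified_word)
# Output: "    Python"

# Different ids confirm that rjust() returned a new string object
# (the exact id values vary from run to run)
print(id(word))
print(id(right_justified_word))
```

```
    Python
```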


#### 12. `rstrip()`

**Syntax:**
```python
s.rstrip()
```

Copy of s with trailing white space removed.

**Example:**

```python
text = "Python    "
right_stripped_text = text.rstrip()
print(right_stripped_text)
# Output: "Python"

print(id(text))
print(id(right_stripped_text))
```

```
Python
4424822704
4497534128
```


#### 13. `split()`

**Syntax:**
```python
s.split(sep)
```

Split s into a list of substrings at each occurrence of sep. If sep is omitted, split on runs of whitespace.

**Example:**

```python
sentence = "Python is powerful and Python is versatile"
words = sentence.split()
print(words)
# Output: ['Python', 'is', 'powerful', 'and', 'Python', 'is', 'versatile']

print(id(sentence))
print(id(words))
```

```
['Python', 'is', 'powerful', 'and', 'Python', 'is', 'versatile']
4423660432
4498184768
```
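The difference between calling `split()` with and without a separator matters whenever the input has repeated whitespace. A short sketch; `rsplit()` at the end is a related method not covered above:

```python
# No argument: split on runs of whitespace and drop empty strings
print("a  b\tc".split())      # ['a', 'b', 'c']

# Explicit separator: every single occurrence splits, so empty strings appear
print("a  b".split(" "))      # ['a', '', 'b']

# maxsplit limits how many splits happen
print("key=value=extra".split("=", 1))    # ['key', 'value=extra']

# rsplit splits from the right
print("path/to/file.txt".rsplit("/", 1))  # ['path/to', 'file.txt']
```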


#### 14. `title()`

**Syntax:**
```python
s.title()
```

Copy of s with first character of each word capitalized.

**Example:**

```python
text = "python programming"
titlecased_text = text.title()
print(titlecased_text)
# Output: "Python Programming"

print(id(text))
print(id(titlecased_text))
```

```
Python Programming
4497943696
4497934016
```
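`title()` starts a new word after any non-letter, so apostrophes produce odd results. The sketch below compares it with `string.capwords()`, a function from the standard `string` module (not covered above) that splits only on whitespace:

```python
import string

phrase = "they're learning python"

# title() treats the apostrophe as a word boundary
print(phrase.title())            # They'Re Learning Python

# capwords splits on whitespace, capitalizes each piece, and rejoins with spaces
print(string.capwords(phrase))   # They're Learning Python
```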


#### 15. `upper()`

**Syntax:**
```python
s.upper()
```

Copy of s with all characters converted to upper case.

**Example:**

```python
text = "python is fun"
uppercased_text = text.upper()
print(uppercased_text)
# Output: "PYTHON IS FUN"
```

```
PYTHON IS FUN
```


**8. Considerations:**

- `Data Immutability`: Strings in Python are immutable, meaning their values cannot be changed after creation. Any operation that appears to modify a string actually creates a new string.

- `Encoding`: A Python 3 `str` holds Unicode text; encoding comes into play when text is converted to or from bytes (for example, when reading files or network data), so pass an explicit encoding such as `'utf-8'` where possible. Special characters like newline (`\n`) and tab (`\t`) are written using escape sequences.

- `Case Sensitivity`: String comparisons are case-sensitive. Pay attention to the case when working with strings.

- `Whitespace`: Be aware of leading and trailing whitespaces, which can affect string comparison and formatting.
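A minimal sketch of the encoding and case-sensitivity points above. The sample word is illustrative, and `casefold()` is an extra method not mentioned in the list:

```python
word = "caf\u00e9"             # "café", written with an escape for clarity

# A str counts characters (Unicode code points)
print(len(word))               # 4

# Encoding to bytes: "é" needs two bytes in UTF-8
encoded = word.encode("utf-8")
print(encoded)                 # b'caf\xc3\xa9'
print(len(encoded))            # 5

# Decoding with the same encoding restores the original text
print(encoded.decode("utf-8") == word)  # True

# Comparisons are case-sensitive; casefold() gives a caseless comparison
print("Python" == "python")                        # False
print("Python".casefold() == "python".casefold())  # True
```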




**9. Common Mistakes:**

- `Forgetting Parentheses`: Methods like `upper()` and `lower()` must be called. Writing `text.upper` without parentheses evaluates to the method object itself, not the uppercased string.

- `Index Errors`: Indexing beyond the end of a string (for example, `text[len(text)]`) raises IndexError. Slicing does not: out-of-range slice bounds are silently clamped.

- `Modifying in Place`: Strings are immutable, so item assignment such as `text[0] = "P"` raises TypeError. Build a new string instead.
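Each of these mistakes can be reproduced in a few lines. A minimal sketch with an illustrative variable:

```python
text = "python"

# Forgetting parentheses gives the method object, not the result
print(text.upper)       # <built-in method upper of str object at 0x...>
print(text.upper())     # PYTHON

# Indexing past the end raises IndexError; slicing does not
try:
    text[10]
except IndexError:
    print("IndexError raised for text[10]")
print(repr(text[10:]))  # ''

# Strings are immutable: item assignment raises TypeError
try:
    text[0] = "P"
except TypeError:
    print("TypeError raised for text[0] = 'P'")

# Build a new string instead
text = "P" + text[1:]
print(text)             # Python
```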

**10. Hands-on Experience:**

#### Q1. Counting Vowels

Write a Python function that takes a string as input and returns the count of vowels (a, e, i, o, u) in the string.

**Solution:**

```python
def count_vowels(s):
    vowels = "aeiou"
    return sum(1 for char in s if char.lower() in vowels)

# Test
result = count_vowels("Python is Amazing")
print(result)
# Output: 5

# Note: sum() consumes a generator expression that yields 1 per vowel
```

```
5
```


**Explanation:**

- `The generator expression (1 for char in s if char.lower() in vowels)`: This generates a sequence of 1s for each vowel found in the input string s. The condition if char.lower() in vowels filters out non-vowel characters.

- `sum(...)`: The sum function is used to calculate the sum of the generated sequence of 1s. Since each 1 represents a vowel, the sum effectively gives the count of vowels.

#### Q2. Reversing Words

Given a string, reverse the order of words. (e.g., "Hello World" becomes "World Hello")

**Solution:**

```python
def reverse_words(s):
    words = s.split()
    reversed_sentence = " ".join(reversed(words))
    return reversed_sentence

# Test
result = reverse_words("Python is Fun")
print(result)
# Output: "Fun is Python"

# Note: reversed() returns an iterator, which join() consumes directly
```

```
Fun is Python
```


**Explanation:**

`reversed(words)`: The reversed() function is a built-in Python function that returns a reverse iterator of the input sequence. In this case, it takes the list of words (words) and returns an iterator that produces the words in reverse order.

#### Q3. Longest Word

Write a program to find the longest word in a given sentence.

**Solution:**

```python
def longest_word(s):
    words = s.split()
    return max(words, key=len)

# Test
result = longest_word("Python is an amazing language")
print(result)
# Output: "language"
```

```
language
```


**Explanation:**

- `max(words, key=len)`: The `max()` function is a built-in Python function used to find the maximum value in an iterable. Here, it takes the list of words (`words`) as the iterable. The `key=len` argument specifies that the comparison for finding the maximum should be based on the length of each word.

- `key=len`: This is a keyword argument that specifies a custom key function. In this case, the `len` function is used as the key function, which returns the length of each word. The `max()` function will find the word with the maximum length based on this key function.

#### Q4. Palindrome Check

Implement a function that checks if a string is a palindrome (reads the same backward as forward).

**Solution:**

```python
def is_palindrome(s):
    # Clean the string by removing non-alphanumeric characters and converting to lowercase
    clean_s = "".join(char.lower() for char in s if char.isalnum())
    print(clean_s)

    # Check if the cleaned string is equal to its reverse
    return clean_s == clean_s[::-1]

# Test
result = is_palindrome("A man, a plan, a canal, Panama")
print(result)
# Output: True
```

```
amanaplanacanalpanama
True
```


**Explanation:**

- `clean_s = "".join(char.lower() for char in s if char.isalnum())`: This line creates a cleaned version of the input string s. It removes non-alphanumeric characters (using char.isalnum()) and converts all characters to lowercase. The cleaned string is stored in the variable clean_s.

- `print(clean_s)`: This line prints the cleaned string. It's useful for understanding how the string is processed.

- `return clean_s == clean_s[::-1]`: This line checks if the cleaned string is equal to its reverse. The expression clean_s[::-1] creates a reversed version of the cleaned string using slicing. If the cleaned string is the same forward and backward, the function returns True, indicating that the input string is a palindrome; otherwise, it returns False.

#### Q5. Word Length Sorting

Create a program that takes a sentence and prints the words in ascending order of their lengths.

**Solution:**

```python
def sort_by_length(s):
    # Split the input string into a list of words
    words = s.split()

    # Sort the words based on their lengths using the key=len parameter
    sorted_words = sorted(words, key=len)

    # Join the sorted words back into a string
    return " ".join(sorted_words)

# Test
result = sort_by_length("Python programming is fun")
print(result)
# Output: "is fun Python programming"
```

```
is fun Python programming
```


**Explanation:**

The `sort_by_length` function takes a sentence as input and returns a new sentence with its words sorted based on their lengths. Let's go through the code step by step:

1. `words = s.split()`: This line splits the input string `s` into a list of words. The default separator is whitespace.

2. `sorted_words = sorted(words, key=len)`: This line sorts the list of words based on their lengths using the `sorted` function. The `key=len` parameter specifies that the sorting should be based on the length of each word.

3. `return " ".join(sorted_words)`: This line joins the sorted words back into a string using a space as the separator. The `join` method concatenates the words, creating a sentence with the words sorted by length.

4. The function is tested with the sentence "Python programming is fun," and the result is printed.

#### Q6. Anagram Checker

Write a function that checks if two given strings are anagrams. Anagrams are words or phrases formed by rearranging the letters of another.

```python
def are_anagrams(s1, s2):
    # Anagrams contain the same characters, so their sorted forms match
    return sorted(s1) == sorted(s2)

# Example usage:
word1 = "listen"
word2 = "silent"
print(are_anagrams(word1, word2))
```

```
True
```
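The sorted-characters check is case-sensitive and counts spaces, so a phrase anagram such as "Dormitory" and "Dirty room" returns False. The sketch below is one possible variant, not the solution above: the function name `are_anagram_phrases` is hypothetical, and it swaps in `collections.Counter` to compare letter counts after ignoring case and non-letters.

```python
from collections import Counter

def are_anagram_phrases(s1, s2):
    """Compare letter counts, ignoring case, spaces, and punctuation."""
    def letters(s):
        # Count only alphabetic characters, lowercased
        return Counter(ch.lower() for ch in s if ch.isalpha())
    return letters(s1) == letters(s2)

print(are_anagram_phrases("Dormitory", "Dirty room"))  # True
print(are_anagram_phrases("Listen", "Silent!"))        # True
print(are_anagram_phrases("Python", "Typhoon"))        # False
```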


#### Q7. String Compression

Create a function that performs basic string compression using the counts of repeated characters. For example, "aabcccccaaa" would become "a2b1c5a3".

```python
def compress_string(s):
    # Guard: an empty string has no last character to read below
    if not s:
        return ""

    result = ""
    count = 1

    for i in range(1, len(s)):
        if s[i] == s[i - 1]:
            count += 1                       # same character: extend the run
        else:
            result += s[i - 1] + str(count)  # run ended: record char + length
            count = 1

    result += s[-1] + str(count)             # record the final run
    return result

sample = "aabcccccaaa"
print(compress_string(sample))  # Output: "a2b1c5a3"
```

```
a2b1c5a3
```


#### Q8. Unique Characters

Write a function that checks if a given string has all unique characters.

```python
def has_unique_characters(s):
    # A set drops duplicates, so the lengths match only if every character is unique
    return len(set(s)) == len(s)

word = "python"
print(has_unique_characters(word))
```

```
True
```


#### Q9. Substring Concatenation

Create a function that takes a string and a list of equal-length words, and returns the starting indices of all substrings that are a concatenation of every word in the list exactly once, in any order.

```python
def find_substrings(sentence, words):
    # Guard: with no words there is nothing to match (and words[0] would fail)
    if not words:
        return []

    word_len = len(words[0])  # all words are assumed to have this length
    word_count = len(words)
    total_len = word_len * word_count
    target = sorted(words)
    result = []

    for i in range(len(sentence) - total_len + 1):
        substring = sentence[i:i + total_len]
        # Cut the window into word-sized chunks and compare as sorted lists
        chunks = [substring[j:j + word_len] for j in range(0, total_len, word_len)]
        if sorted(chunks) == target:
            result.append(i)

    return result

text = "barfoothefoobarman"
word_list = ["foo", "bar"]
print(find_substrings(text, word_list))
```

```
[0, 9]
```


#### Q10. Longest Common Prefix

Write a function to find the longest common prefix string amongst a list of strings.

```python
def longest_common_prefix(strings):
    if not strings:
        return ""
    prefix = strings[0]
    for other in strings[1:]:
        # Advance while both strings agree, then shorten the prefix to that point
        i = 0
        while i < len(prefix) and i < len(other) and prefix[i] == other[i]:
            i += 1
        prefix = prefix[:i]
    return prefix

words = ["flower", "flow", "flight"]
print(longest_common_prefix(words))
```

```
fl
```


#### Q11. String Rotation

Create a function that checks if one string is a rotation of another string.

```python
def is_rotation(s1, s2):
    # Every rotation of s1 appears as a substring of s1 + s1
    return len(s1) == len(s2) and s2 in s1 + s1

string1 = "waterbottle"
string2 = "erbottlewat"
print(is_rotation(string1, string2))
```

```
True
```


**11. Homework Assignments:**

1. **Question:** Check if a given number is a palindrome. **Hint:** A palindrome reads the same backward as forward.
    
2. **Question:** Replace all occurrences of a specific word in a sentence with another word. **Hint:** Utilize the `replace()` method.
    
3. **Question:** Count the number of consonants in a given string. **Hint:** Consonants are all the letters except vowels (a, e, i, o, u).
    
4. **Question:** Remove all the punctuation from a given sentence. **Hint:** Use the `translate()` method with `str.maketrans()`.
    
5. **Question:** Concatenate two strings with space in between. **Hint:** Use the `+` operator or the `join()` method.

**12. Interesting Facts:**

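- `ASCII and Unicode`: Python 3 strings are sequences of Unicode code points, allowing characters from a broad range of languages. ASCII covers only the first 128 code points (for example, `ord("A")` is 65).

- `f-string Formatting`: Python 3.6 introduced f-strings, a concise and convenient way to embed expressions inside string literals.

- `SQL injection attacks`: These arise when user input is concatenated directly into SQL query strings; parameterized queries avoid the problem.

- `Python's string formatting capabilities`: f-strings and `str.format()` support format specifiers such as `{value:.2f}` (precision) and `{text:>10}` (width and alignment) for clear, concise text output.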
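A short sketch of f-strings and the format specifiers mentioned above; the variable names and values are illustrative:

```python
language = "Python"
version = 3.6
price = 7.5

# Expressions inside {} are evaluated and inserted into the string
print(f"f-strings arrived in {language} {version}")  # f-strings arrived in Python 3.6

# Format specifiers after a colon control precision, width, and alignment
print(f"Price: {price:.2f}")   # Price: 7.50
print(f"[{language:>10}]")     # [    Python]
print(f"[{language:^10}]")     # [  Python  ]

# Arbitrary expressions work too
print(f"{language.upper()} has {len(language)} letters")  # PYTHON has 6 letters
```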

### **`Extra Innings`**

#### `string.punctuation`

In Python's `string` module, `string.punctuation` is a predefined constant: a string containing the printable ASCII punctuation characters. These characters are generally used to separate words, phrases, or clauses, or to add emphasis or meaning to a sentence.

It holds exactly these 32 characters: ``!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~``

* **Common punctuation marks:** exclamation point (!), comma (,), period (.), question mark (?), quotation marks (" and '), parentheses (), brackets [], and braces {}.
* **Mathematical symbols:** plus (+), minus (-), equals (=), greater than (>), less than (<), etc.
* **Other symbols:** at sign (@), hash (#), dollar ($), percent (%), backslash (`\`), and others.

The value is fixed: it does not change with the Python version or your system's locale settings.

It's important to note that `string.punctuation` only contains printable ASCII characters. This means it won't include non-printable control characters or punctuation symbols specific to other languages.

Here are some common uses of `string.punctuation`:

* **Removing punctuation:** You can iterate through a string and check if each character exists in `string.punctuation` to remove punctuation from the text.
* **Validating user input:** You can use `string.punctuation` to check if a user-entered string contains any punctuation characters, depending on your specific validation needs.
* **Customizing punctuation handling:** By knowing the characters in `string.punctuation`, you can write code to handle specific punctuation marks differently.
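The first use above (removing punctuation by checking each character) can be sketched as follows; `remove_punctuation` is a hypothetical helper name used only for illustration:

```python
import string

def remove_punctuation(text):
    """Keep only the characters that are not in string.punctuation."""
    return "".join(ch for ch in text if ch not in string.punctuation)

print(remove_punctuation("Hello, world! (Python)"))  # Hello world Python
```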


```python
import string

def has_punctuation(text):
    """Checks if a string contains any punctuation characters.

    Args:
        text: The string to check.

    Returns:
        True if the string contains punctuation, False otherwise.
    """
    # any() stops at the first punctuation character it finds
    return any(char in string.punctuation for char in text)

# Example usage
text1 = "This is a sentence with no punctuation"
text2 = "This sentence has some punctuation, like commas and periods."

print(f"Text 1: {has_punctuation(text1)}")
print(f"Text 2: {has_punctuation(text2)}")
```

```
Text 1: False
Text 2: True
```
