What are Pandas String Methods?

Pandas provides many string methods, available through the `.str` accessor, for working with text data in a Series or a DataFrame column.

These string methods work on every element of the Series all at once (vectorized), so you don’t need to write loops.

Most of these methods mirror the built-in Python string methods you already know, usually with the same name and behavior.
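One practical difference from a plain Python loop is how missing values are handled. The sketch below uses a small made-up Series (the names and the `None` entry are illustrative assumptions, not data from this notebook) to compare a list comprehension with the `.str` accessor.

```python
import pandas as pd

# Hypothetical data: mixed capitalization plus one missing value
names = pd.Series(['peter', 'Paul', None, 'MARY', 'gUIDO'])

# Plain Python: the loop breaks as soon as it reaches the missing value
try:
    [s.capitalize() for s in names]
    loop_failed = False
except AttributeError:
    loop_failed = True
print("Loop failed on missing value:", loop_failed)

# The .str accessor applies the method to every string and
# passes the missing value through unchanged
capitalized = names.str.capitalize()
print(capitalized)
```

The vectorized call needs no explicit check for missing entries, which is often the main reason to prefer it on real-world data.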

In [2]:
import pandas as pd

# Create a Series of names
monte = pd.Series([
    'Graham Chapman', 
    'John Cleese', 
    'Terry Gilliam',
    'Eric Idle', 
    'Terry Jones', 
    'Michael Palin'
])

# 1. String methods that return a Series of strings:
#    Example: Convert all names to lowercase
lowercase_names = monte.str.lower()
print("Lowercase names:\n", lowercase_names)
# Output:
# 0    graham chapman
# 1       john cleese
# 2     terry gilliam
# 3         eric idle
# 4       terry jones
# 5     michael palin

uppercase_names = monte.str.upper()
print("\nUppercase names:\n", uppercase_names)

capitalized_names = monte.str.capitalize()
print("\nCapitalized names:\n", capitalized_names)

# 2. String methods that return numbers:
#    Example: Count the number of characters in each name
name_lengths = monte.str.len()
print("\nLength of each name:\n", name_lengths)
# Output:
# 0    14
# 1    11
# 2    13
# 3     9
# 4    11
# 5    13

# 3. String methods that return boolean values:
#    Example: Check if each name starts with 'T'
starts_with_T = monte.str.startswith('T')
print("\nNames starting with 'T':\n", starts_with_T)
# Output:
# 0    False
# 1    False
# 2     True
# 3    False
# 4     True
# 5    False

# 4. String methods that return lists or complex objects:
#    Example: Split each name into parts (first name, last name)
split_names = monte.str.split()
print("\nSplit names into parts:\n", split_names)
# Output:
# 0    [Graham, Chapman]
# 1       [John, Cleese]
# 2     [Terry, Gilliam]
# 3         [Eric, Idle]
# 4       [Terry, Jones]
# 5     [Michael, Palin]

# Summary:
# - Pandas string methods work on the whole Series at once.
# - They help transform, check, or clean text data easily.
# - The return type depends on the method used:
#     * strings (like .lower())
#     * numbers (like .len())
#     * boolean values (like .startswith())
#     * lists or other complex structures (like .split())


Lowercase names:
 0    graham chapman
1       john cleese
2     terry gilliam
3         eric idle
4       terry jones
5     michael palin
dtype: object

Uppercase names:
 0    GRAHAM CHAPMAN
1       JOHN CLEESE
2     TERRY GILLIAM
3         ERIC IDLE
4       TERRY JONES
5     MICHAEL PALIN
dtype: object

Capitalized names:
 0    Graham chapman
1       John cleese
2     Terry gilliam
3         Eric idle
4       Terry jones
5     Michael palin
dtype: object

Length of each name:
 0    14
1    11
2    13
3     9
4    11
5    13
dtype: int64

Names starting with 'T':
 0    False
1    False
2     True
3    False
4     True
5    False
dtype: bool

Split names into parts:
 0    [Graham, Chapman]
1       [John, Cleese]
2     [Terry, Gilliam]
3         [Eric, Idle]
4       [Terry, Jones]
5     [Michael, Palin]
dtype: object
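Two details in the output above are worth a closer look: `capitalize()` lowercases everything after the first character (hence "Graham chapman"), and `split()` returns a Series of lists. The following is a minimal sketch of two alternatives, `str.title()` and `str.split(expand=True)`; the column names `first` and `last` are assumptions chosen for illustration.

```python
import pandas as pd

monte = pd.Series([
    'Graham Chapman', 'John Cleese', 'Terry Gilliam',
    'Eric Idle', 'Terry Jones', 'Michael Palin'
])

# capitalize() only uppercases the first character of the whole string
print(monte.str.capitalize()[0])

# title() uppercases the first character of every word
title_names = monte.str.lower().str.title()
print(title_names[0])

# expand=True returns a DataFrame with one column per part
# instead of a Series of lists
parts = monte.str.split(expand=True)
parts.columns = ['first', 'last']  # illustrative column names
print(parts)
```

The expanded form is usually easier to work with when every string splits into the same number of parts, as it does here.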


The following Python string methods have equivalents on the `.str` accessor:

`len()`, `lower()`, `translate()`, `islower()`,
`ljust()`, `upper()`, `startswith()`, `isupper()`,
`rjust()`, `find()`, `endswith()`, `isnumeric()`,
`center()`, `rfind()`, `isalnum()`, `isdecimal()`,
`zfill()`, `index()`, `isalpha()`, `split()`,
`strip()`, `rindex()`, `isdigit()`, `rsplit()`,
`rstrip()`, `capitalize()`, `isspace()`, `partition()`,
`lstrip()`, `swapcase()`, `istitle()`, `rpartition()`
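As a rough illustration of how a few of the listed methods combine, the sketch below cleans a small made-up column of ID-like strings. The data and variable names are assumptions for the example only.

```python
import pandas as pd

# Hypothetical messy column: stray whitespace and a non-numeric entry
raw = pd.Series(['  42', '7 ', 'n/a', ' 1234 '])

# strip() removes leading and trailing whitespace
cleaned = raw.str.strip()

# isdigit() flags the entries made up only of digits
is_number = cleaned.str.isdigit()

# zfill() left-pads with zeros; non-numeric entries are masked out first
padded = cleaned.where(is_number).str.zfill(5)

# find() returns the position of a substring, or -1 if it is absent
slash_pos = cleaned.str.find('/')

print(pd.DataFrame({
    'cleaned': cleaned,
    'is_number': is_number,
    'padded': padded,
    'slash_pos': slash_pos,
}))
```

Each call returns a new Series, so the steps chain naturally without modifying `raw`.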




* **Vectorized item access and slicing:**
  Pandas lets you easily get parts of each string in a Series. For example, getting the first three characters of every name is done by `.str[0:3]` or `.str.slice(0, 3)`. This is like slicing in normal Python but done for every string in the Series at once.

* **Using `.get()` with `.split()`:**
  Sometimes you want to split strings into parts and then get a specific part. For example, splitting full names into first and last names, and then picking the last name using `.str.split().str.get(-1)`.

* **Indicator variables (`get_dummies()`):**
  Sometimes a column holds coded information, such as 'B|C|D', where several categories are joined by a separator (`|`). The `.str.get_dummies()` method splits these values into one column per category, with 1 or 0 showing whether that category is present in each row. This is a convenient way to turn categorical data into numbers for analysis or machine learning.

* These string methods and tools help you clean and prepare text data efficiently, especially when dealing with messy or coded information. They form the building blocks for complex text processing in pandas.



In [4]:
import pandas as pd

# Create a Series of names
monte = pd.Series([
    'Graham Chapman', 
    'John Cleese', 
    'Terry Gilliam',
    'Eric Idle', 
    'Terry Jones', 
    'Michael Palin'
])

# 1. Vectorized slicing: Get first 3 characters of each name
first_three_chars = monte.str[0:3]  # same as monte.str.slice(0, 3)
print("First 3 characters:\n", first_three_chars)
# Output:
# 0    Gra
# 1    Joh
# 2    Ter
# 3    Eri
# 4    Ter
# 5    Mic

# 2. Using get() with split() to extract last names:
last_names = monte.str.split().str.get(-1)  
print("\nLast names extracted:\n", last_names)
# Output:
# 0    Chapman
# 1     Cleese
# 2    Gilliam
# 3       Idle
# 4      Jones
# 5      Palin

# 3. Creating a DataFrame with 'info' column containing codes
full_monte = pd.DataFrame({
    'name': monte,
    'info': ['B|C|D', 'B|D', 'A|C', 'B|D', 'B|C', 'B|C|D']
})
print("\nDataFrame with coded info:\n", full_monte)
# Output:
#               name   info
# 0  Graham Chapman  B|C|D
# 1     John Cleese    B|D
# 2   Terry Gilliam    A|C
# 3       Eric Idle    B|D
# 4     Terry Jones    B|C
# 5   Michael Palin  B|C|D

# 4. Using get_dummies() to split 'info' column into separate indicator columns
dummies = full_monte['info'].str.get_dummies('|')
print("\nIndicator variables (dummy columns):\n", dummies)
# Output:
#    A  B  C  D
# 0  0  1  1  1
# 1  0  1  0  1
# 2  1  0  1  0
# 3  0  1  0  1
# 4  0  1  1  0
# 5  0  1  1  1


First 3 characters:
 0    Gra
1    Joh
2    Ter
3    Eri
4    Ter
5    Mic
dtype: object

Last names extracted:
 0    Chapman
1     Cleese
2    Gilliam
3       Idle
4      Jones
5      Palin
dtype: object

DataFrame with coded info:
              name   info
0  Graham Chapman  B|C|D
1     John Cleese    B|D
2   Terry Gilliam    A|C
3       Eric Idle    B|D
4     Terry Jones    B|C
5   Michael Palin  B|C|D

Indicator variables (dummy columns):
    A  B  C  D
0  0  1  1  1
1  0  1  0  1
2  1  0  1  0
3  0  1  0  1
4  0  1  1  0
5  0  1  1  1
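The indicator columns are most useful once they sit next to the original data. The sketch below shows one possible way to attach them and filter on a code; the `info_` prefix and the variable names `with_flags` and `has_c` are illustrative choices, not part of the original notebook.

```python
import pandas as pd

full_monte = pd.DataFrame({
    'name': ['Graham Chapman', 'John Cleese', 'Terry Gilliam',
             'Eric Idle', 'Terry Jones', 'Michael Palin'],
    'info': ['B|C|D', 'B|D', 'A|C', 'B|D', 'B|C', 'B|C|D']
})

# One indicator column per code found in 'info'
dummies = full_monte['info'].str.get_dummies(sep='|')

# Attach the indicators to the original frame; the prefix keeps
# the new column names easy to tell apart
with_flags = full_monte.join(dummies.add_prefix('info_'))

# Select every row whose 'info' value contains the code 'C'
has_c = with_flags[with_flags['info_C'] == 1]
print(has_c['name'].tolist())
```

Filtering on an indicator column avoids repeated substring searches on the raw `info` strings.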
