
# Python for Analytics — **Strings** (Walkthrough)

**Session Time:** 30 minutes (10 min theory + 20 min code walkthrough)  
**Audience:** Analytics team (beginner-friendly)  
**Goal:** Work confidently with text: create, index, clean, search, split/join, format, validate, and handle immutability—**no regex required**.

> Presenter tip: Keep one action per cell; say the expected outcome before running.



## Run Order & Sections
1. Basics: create, print, type (1–2 min)  
2. Length, indexing, slicing (3–4 min)  
3. Cleaning: `upper/lower`, `strip`, `replace` (3–4 min)  
4. Searching: `in`, `startswith`, `endswith`, `find` (3–4 min)  
5. Split & Join (3 min)  
6. Formatting output: f-strings (1–2 min)

**Additional (requested) topics — still regex‑free:**  
7. Character classification (`isalpha`, `isdigit`, `isalnum`, `isspace`)  
8. Count occurrences (`count`)  
9. Safe searching (`find` vs. `index`)  
10. Immutability workaround (rebuild via slicing)



---
## 1) String Basics (create, print, type)


In [None]:

branch_code = "BLR"
comment = "  High Risk flagged  "
print(branch_code, "|", comment)
print("Types:", type(branch_code), type(comment))



---
## 2) Length, Indexing, Slicing
- `len(s)` counts characters  
- `s[i]` gets a character (0-based; a negative `i` counts from the end)  
- `s[a:b]` slices a substring (a inclusive, b exclusive)


In [None]:

code = "BLR2024"
print("len(code):", len(code))
print("First char:", code[0])
print("Last char:", code[-1])
print("Prefix (0:3):", code[0:3])    # BLR
print("Year (3:):", code[3:])        # 2024
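
A short optional cell, offered as a sketch rather than part of the original run order, can make the slicing rules above concrete: negative positions count from the end, and a slice that runs past the end is clipped, while indexing a single position past the end raises `IndexError`. The variable names are illustrative.

```python
code = "BLR2024"

# Negative positions count from the end of the string.
last_four = code[-4:]        # "2024"
without_year = code[:-4]     # "BLR"

# A slice that runs past the end is clipped instead of raising an error.
clipped = code[3:100]        # "2024"

# Indexing a single position past the end does raise IndexError.
try:
    code[100]
    index_failed = False
except IndexError:
    index_failed = True

print(last_four, without_year, clipped, index_failed)
```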



---
## 3) Cleaning & Standardization
- Case normalization: `.upper()`, `.lower()`  
- Whitespace trim: `.strip()`  
- Replacement: `.replace(old, new)`


In [None]:

status = "  High Risk "
clean = status.strip().upper()          # remove outer spaces + uppercase
fixed = clean.replace("HIGH", "VERY HIGH", 1)  # replace once
print("Original:", repr(status))
print("Clean   :", clean)
print("Fixed   :", fixed)
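
Two details of the cleaning methods above are easy to miss: `.strip()` only touches the outer whitespace, and `.replace()` changes every occurrence unless a count is given. The sample text below is made up for illustration; note also that every method returns a new string and leaves the original untouched.

```python
status = "  high risk, high value  "

trimmed = status.strip()                          # outer spaces removed only
all_replaced = trimmed.replace("high", "HIGH")    # every occurrence
first_only = trimmed.replace("high", "HIGH", 1)   # first occurrence only

print(repr(status))        # original is unchanged
print(repr(trimmed))
print(all_replaced)
print(first_only)
```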



---
## 4) Searching in Strings
- Substring check: `'ALERT' in text`  
- Prefix/Suffix: `.startswith()`, `.endswith()`  
- Location: `.find()` → index of the first match, or `-1` when not found


In [None]:

text = "High ALERT flagged at BLR"
print("'ALERT' in text:", "ALERT" in text)
print("startswith('High'):", text.startswith("High"))
print("endswith('BLR'):", text.endswith("BLR"))
print("find('ALERT'):", text.find("ALERT"))   # returns index or -1
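
All of the searches above are case-sensitive. One common approach, sketched here with made-up sample text, is to normalize the case first and then search. `startswith()` and `endswith()` also accept a tuple of options; the `known_prefixes` tuple is a hypothetical example, not a list from this session.

```python
text = "High alert flagged at BLR"

# Searches are case-sensitive, so this misses the lowercase "alert".
case_sensitive = "ALERT" in text

# Normalizing case first makes the check case-insensitive.
case_insensitive = "alert" in text.lower()

# startswith() accepts a tuple and returns True if any option matches.
known_prefixes = ("High", "Medium", "Low")   # hypothetical risk labels
has_known_prefix = text.startswith(known_prefixes)

print(case_sensitive, case_insensitive, has_known_prefix)
```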



---
## 5) Split & Join
- `.split(sep)` → list of parts  
- `'sep'.join(parts)` → combine a list of strings into a single string, with `sep` between items


In [None]:

entry = "  BLR_high_risk_2024  "
clean = entry.strip().upper()               # "BLR_HIGH_RISK_2024"
parts = clean.split("_")                    # ['BLR', 'HIGH', 'RISK', '2024']
print("Clean :", clean)
print("Parts :", parts)
joined = " ".join(parts)                    # "BLR HIGH RISK 2024"
print("Joined:", joined)
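
Two behaviors tend to surprise people with messy input, so here is a small sketch with invented sample values: `.split(" ")` keeps empty strings between repeated spaces, while `.split()` with no argument splits on any run of whitespace. Separately, `.join()` only accepts strings, so numbers need converting first.

```python
messy = "BLR  HIGH   2024"

# Splitting on a single space keeps empty strings between repeated spaces.
by_single_space = messy.split(" ")   # ['BLR', '', 'HIGH', '', '', '2024']

# Splitting with no argument treats any run of whitespace as one separator.
by_whitespace = messy.split()        # ['BLR', 'HIGH', '2024']

# join() needs strings, so convert mixed values with str() first.
mixed = ["BLR", 2024]
joined = "_".join(str(item) for item in mixed)   # "BLR_2024"

print(by_single_space)
print(by_whitespace)
print(joined)
```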



---
## 6) Formatting Output (f-strings)
- Insert variables into strings cleanly  
- Control decimal precision


In [None]:

branch, risk, year = "BLR", "HIGH", 2024
risk_score = 4.756
print(f"Branch: {branch} | Risk: {risk} | Year: {year} | Score: {risk_score:.2f}")
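
Beyond decimal precision, the same format syntax handles thousands separators, percentages, and column alignment, which is handy for compact report lines. The figures below are invented for illustration only.

```python
branch = "BLR"
exposure = 1234567.891     # made-up amount
default_rate = 0.0456      # made-up rate

amount_text = f"{exposure:,.2f}"    # thousands separator + 2 decimals
rate_text = f"{default_rate:.1%}"   # multiply by 100 and add a % sign

# <6 left-aligns in 6 characters; >14 right-aligns in 14 characters.
line = f"{branch:<6}|{exposure:>14,.2f}|{default_rate:.1%}"

print(amount_text)
print(rate_text)
print(line)
```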



---
## 7) Character Classification (Data Validation Lite)
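- `isalpha()` → letters only  
- `isdigit()` → digits only (no sign, decimal point, or spaces)  
- `isalnum()` → letters and/or digits only (no spaces or punctuation)  
- `isspace()` → whitespace only  
- All four return `False` for an empty string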


In [None]:

branch_code = "BLR"
year_str = "2024"
mix = "B2R"
blank = "   "

print("branch_code.isalpha():", branch_code.isalpha())
print("year_str.isdigit():   ", year_str.isdigit())
print("mix.isalnum():        ", mix.isalnum())
print("blank.isspace():      ", blank.isspace())
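
These checks are strict, which matters for validation. As a minimal sketch, the hypothetical helper `looks_like_year` below (not part of the original session) trims the input and then requires exactly four digits. The extra prints show inputs that people often expect to pass but do not.

```python
def looks_like_year(text):
    """Hypothetical check: exactly four digits after trimming spaces."""
    cleaned = text.strip()
    return len(cleaned) == 4 and cleaned.isdigit()

print(looks_like_year(" 2024 "))   # True
print(looks_like_year("20x4"))     # False
print(looks_like_year(""))         # False

# Signs, decimal points, and spaces all make the checks fail.
print("12.5".isdigit())            # False
print("-5".isdigit())              # False
print("BLR 01".isalnum())          # False
```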



---
## 8) Count Occurrences
- `.count(substr)` → number of non-overlapping, case-sensitive matches


In [None]:

comment = "ALERT raised. ALERT reviewed. ALERT closed."
print("ALERT occurrences:", comment.count("ALERT"))
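
A quick sketch of what "non-overlapping" means in practice, and of the fact that `count()` is case-sensitive. The mixed-case comment is an invented variation of the one above.

```python
comment = "Alert raised. ALERT reviewed. alert closed."

exact = comment.count("ALERT")              # only the exact-case match
any_case = comment.upper().count("ALERT")   # normalize case, then count

# Matches do not overlap: "aaaa" contains "aa" twice, not three times.
non_overlapping = "aaaa".count("aa")

print(exact, any_case, non_overlapping)
```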



---
## 9) Safe Searching — `find()` vs `index()`
- `.find(substr)` → returns `-1` if not found (no exception, but check for `-1` before using the result as an index)  
- `.index(substr)` → raises `ValueError` if not found (use when failure is exceptional)


In [None]:

s = "No ALERT here"
pos = s.find("ALERT")     # returns -1 when missing
print("find('ALERT'):", pos)

try:
    idx = s.index("ALERT")   # will raise ValueError when missing
except ValueError as e:
    print("index('ALERT') raised:", e.__class__.__name__)
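
One caution with `find()`: `-1` is itself a valid position in a slice, so using the result without checking can silently drop the last character. The sketch below shows the pitfall and one possible guard. The helper `text_before` is hypothetical, and returning the whole text when the marker is missing is a design choice made for this example.

```python
record = "BLR-2024"

# Pitfall: ":" is missing, so find() returns -1 and the slice drops "4".
pos = record.find(":")
wrong = record[:pos]

def text_before(text, marker):
    """Hypothetical helper: text before marker, or all of it if missing."""
    position = text.find(marker)
    if position == -1:
        return text
    return text[:position]

print(pos, repr(wrong))
print(text_before(record, "-"))
print(text_before(record, ":"))
```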



---
## 10) Immutability Workaround
Strings are **immutable**—you can't change characters in place.  
Instead, **build a new string** using slices and concatenation.


In [None]:

s = "BLT"        # want to change first char 'B' -> 'R'
# s[0] = "R"     # TypeError if uncommented
s2 = "R" + s[1:] # new string with desired change
print("Original:", s, "| New:", s2)
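
The same slicing idea generalizes to any position. The helper below, `replace_at`, is a hypothetical name used for illustration: it builds a new string from the part before the position, the new character, and the part after it.

```python
def replace_at(text, position, new_char):
    """Hypothetical helper: return a copy of text with one character swapped."""
    if not 0 <= position < len(text):
        raise IndexError("position out of range")
    return text[:position] + new_char + text[position + 1:]

s = "BLT"
print(replace_at(s, 0, "R"))   # RLT
print(replace_at(s, 2, "R"))   # BLR
print(s)                       # BLT (the original is unchanged)
```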



---
### Next Steps
- Use these string tools with **if/elif/else** for rule-based text checks.  
- Later, you'll apply the same ideas to **pandas .str** methods on columns.
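
As a preview of the first bullet, here is one way the string tools from this session might combine with `if/elif/else`. The function name, labels, and rules are invented for illustration and are not an agreed business rule set.

```python
def classify_comment(comment):
    """Hypothetical rule-based check using only methods from this session."""
    text = comment.strip().upper()       # clean first, then test
    if not text:
        return "EMPTY"
    elif "ALERT" in text or text.startswith("HIGH"):
        return "ESCALATE"
    elif text.endswith("CLOSED"):
        return "CLOSED"
    else:
        return "REVIEW"

print(classify_comment("  High Risk flagged  "))   # ESCALATE
print(classify_comment("   "))                     # EMPTY
print(classify_comment("ticket closed"))           # CLOSED
print(classify_comment("Low risk"))                # REVIEW
```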

> Keep prints compact; use `repr()` if you need to visualize hidden spaces.
