# STRINGS

***

A string is a sequence of characters used to represent text.

But internally:

* A string is an ordered, immutable sequence
* Each character is a Unicode code point
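
The two points above can be checked directly. This is a minimal sketch, assuming Python 3; the sample word is only an illustration and is not from the original notes. It shows that `len()` counts code points rather than bytes, and that `ord()` and `chr()` convert between a character and its code point.

```python
word = "na\u00efve"  # illustrative sample; \u00ef is the single code point for 'ï'

# len() counts Unicode code points, not bytes
num_code_points = len(word)
num_utf8_bytes = len(word.encode("utf-8"))

# ord() returns the code point of one character; chr() is the inverse
code_points = [ord(ch) for ch in word]

print(num_code_points)   # 5
print(num_utf8_bytes)    # 6
print(code_points)       # [110, 97, 239, 118, 101]
print(chr(239) == word[2])  # True
```

The byte count is larger than the character count because the non-ASCII character takes two bytes in UTF-8.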

### Why strings are critical in AI

Everything below is string-based:

* Text data
* User prompts
* Chat history
* Tokens
* Logs
* Labels
* Features in NLP

**LLMs don’t “see meaning”; they process text as sequences of tokens and learn the patterns in them.**

***

## STRING CREATION (ALL WAYS)

In [1]:
s1 = "AI"
s2 = 'ML'
s3 = """Deep Learning"""
s4 = '''NLP'''
print(s1, s2, s3, s4)

AI ML Deep Learning NLP


**Key points:**

Double quotes (" ") and single quotes (' ') create identical strings.

Triple quotes allow:

* multi-line strings
* docstrings
* long prompts (VERY IMPORTANT for GenAI)

#### Multi-line string (prompt engineering)

In [3]:
prompt = """
You are an AI assistant.
Answer clearly.
"""
print(prompt)


You are an AI assistant.
Answer clearly.



**Multi-line LLM prompts are commonly written this way.**
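
One detail is easy to miss: a triple-quoted literal that opens and closes on its own lines begins and ends with a newline character, which is why the printed output is surrounded by blank lines. The sketch below shows one way to handle this; the `template` string and the `question` placeholder are hypothetical names introduced only for illustration.

```python
prompt = """
You are an AI assistant.
Answer clearly.
"""

# repr() makes the leading and trailing "\n" visible
print(repr(prompt))

# strip() removes surrounding whitespace before the prompt is used
clean_prompt = prompt.strip()
print(repr(clean_prompt))

# Hypothetical template: str.format() fills the {question} placeholder
template = """You are an AI assistant.
Question: {question}"""
filled = template.format(question="What is a string?")
print(filled)
```

Whether to strip the prompt depends on the consumer; some pipelines treat leading whitespace as significant.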

***

### STRINGS ARE IMMUTABLE

In [4]:
s = "hello"
s[0] = "H"   # ❌ TypeError: 'str' object does not support item assignment

Why?

Strings cannot be changed in place.

Instead, build a new string and rebind the name:

In [6]:
s = "H" + s[1:]
print(s)

Hello


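**Why immutability matters:**

* Safe for concurrency: threads can share a string because none of them can modify it
* Predictable behavior: a function cannot alter a string that is passed to it
* Enables caching: strings are hashable, so they can be used as dictionary keys
* Important for LLM prompt safety: a prompt cannot be altered in place; any change produces a new string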
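
A short sketch of these consequences, assuming nothing beyond built-in Python. The `cache` dictionary and its contents are made up for illustration.

```python
s = "hello"
original = s          # a second name bound to the same string object

s = "H" + s[1:]       # builds a new string; the old object is untouched

print(s)              # Hello
print(original)       # hello

# Immutable strings are hashable, so they can be dictionary keys
cache = {}            # hypothetical prompt -> response cache
cache["What is AI?"] = "stored answer"
print("What is AI?" in cache)   # True
```

Because `original` still prints `hello`, any code holding the earlier reference is unaffected by the "modification".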

***

### STRING INDEXING

In [8]:
s = "Python"

In [10]:
s[0]   # 'P'

'P'

In [11]:
s[-1]  # 'n'

'n'

**Negative indexing**

* -1 → last character
* Very useful in parsing

**NLP relevance**

* Accessing:
  * first letter
  * suffixes
  * prefixes
  * sentence endings
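
As a rough illustration of the first-letter and sentence-ending cases, the sketch below inspects a sample sentence with positive and negative indices. The sentence and the variable names are assumptions made for this example.

```python
sentence = "Is this a question?"  # illustrative sample

first_char = sentence[0]    # first character
last_char = sentence[-1]    # last character

starts_capitalized = first_char.isupper()
is_question = last_char == "?"

print(first_char, last_char)   # I ?
print(starts_capitalized)      # True
print(is_question)             # True
```

Indexing an empty string raises `IndexError`, so real parsing code usually checks that the string is non-empty first.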

***

### STRING SLICING (SUBSTRINGS)

In [1]:
s = "Artificial"

##### Syntax

s[start : end : step]

In [2]:
s[0:4]     # 'Arti'
s[:4]      # 'Arti'
s[4:]      # 'ficial'
s[-4:]     # 'cial'
s[::2]     # 'Atfca' (every second character)
s[::-1]    # 'laicifitrA' (reversed string)

'laicifitrA'
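
A few slicing rules are worth checking by hand. This sketch reuses the same string: `start` is inclusive, `end` is exclusive, and out-of-range bounds are clamped instead of raising an error.

```python
s = "Artificial"

# start is inclusive, end is exclusive: indices 2, 3 and 4
print(s[2:5])               # tif

# Splitting at any position k loses nothing
k = 4
print(s[:k] + s[k:] == s)   # True

# Out-of-range slice bounds are clamped; no IndexError is raised
print(s[5:100])             # icial
print(repr(s[50:]))         # ''
```

This differs from indexing, where `s[50]` would raise `IndexError`.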

**Why slicing is powerful**

* Token trimming
* Prefix/suffix extraction
* Cleaning text
* Masking sensitive data
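
The uses listed above can be sketched with plain slices. `mask_secret` is a hypothetical helper written for this example, and the key value is made up.

```python
def mask_secret(value, visible=4):
    """Hide all but the last `visible` characters (hypothetical helper)."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]

api_key = "sk-demo-1234567890"   # made-up value, not a real key
print(mask_secret(api_key))      # 14 asterisks followed by 7890

word = "Artificial"
prefix = word[:3]     # 'Art'
suffix = word[-3:]    # 'ial'
print(prefix, suffix)

# Trimming text to a fixed number of characters
text = "Deep Learning powers modern NLP systems"
trimmed = text[:13]   # 'Deep Learning'
print(trimmed)
```

Note that slicing trims by characters, not by model tokens; trimming to a token budget would need a tokenizer.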