# Strings and Ranges

```{tip}
**DOWNLOAD THE NOTEBOOK TO RUN LOCALLY**

Click the download button (![](../assets/img/site/dl-nb.png)) on the upper right to download the notebook and run it locally.
```

## Strings
Strings show up everywhere in Python code, and the `str` type comes with many built-in methods for working with them. Some common ones are shown below.

### Manipulating Strings

```python
s = 'palay & corn'
```

```python
# split on whitespace
s.split()
```

```
['palay', '&', 'corn']
```

```python
# split the string and unpack the pieces into separate variables
a, b, c = s.split()
print(a)
print(b)
print(c)
```

```
palay
&
corn
```
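
Unpacking only works when the number of names on the left matches the number of pieces ***split*** returns. A minimal sketch of the failure case, and of star-unpacking, which collects any extra pieces into a list:

```python
s = 'palay & corn'

# Three names for three pieces: works
a, b, c = s.split()

# Two names for three pieces: Python raises ValueError
try:
    a, b = s.split()
except ValueError as err:
    print(f"ValueError: {err}")

# A starred name collects whatever is left over into a list
first, *rest = s.split()
print(first)  # palay
print(rest)   # ['&', 'corn']
```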


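```python
# slice the string: index 0 up to, but not including, index 5
s[0:5]
```

```
'palay'
```

```python
# split on a separator string
s.split(' & ')
```

```
['palay', 'corn']
```

The ***split*** calls above return a list of strings, while slicing returns a new string.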

We can also check if a string contains a substring using ***in***.

```python
# does 'palay' exist in our string?
'palay' in s
```

```
True
```

```python
# where is it?
s.find('palay')
```

```
0
```
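
***find*** returns the index where the substring starts, or -1 when it is absent; the related ***index*** method raises an error instead. Both ***find*** and ***in*** are case-sensitive. A short sketch:

```python
s = 'palay & corn'

print(s.find('corn'))   # 8 (index of the first character of the match)
print(s.find('wheat'))  # -1 (not found)

# str.index works like find but raises ValueError when the substring is missing
try:
    s.index('wheat')
except ValueError:
    print("'wheat' is not in s")

# Membership tests are case-sensitive
print('Palay' in s)     # False
```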

### Formatting Strings

```python
# UPPERCASE
s.upper()
```

```
'PALAY & CORN'
```

```python
# lowercase
s.lower()
```

```
'palay & corn'
```

```python
# Title Case
s.title()
```

```
'Palay & Corn'
```

You can also use the ***%*** operator, the ***format*** method, or ***f-strings*** (available since Python 3.6) to build strings from values.

```python
# using %
"%s & %s" % ('palay', 'corn')
```

```
'palay & corn'
```

```python
# using format
"{word1} & {word2}".format(word2='corn', word1='palay')
```

```
'palay & corn'
```

```python
# using f-strings
word1 = 'palay'
word2 = 'corn'
f"{word1} & {word2}"
```

```
'palay & corn'
```

```python
# f-strings evaluate any Python expression placed inside the braces
f"{word1.lower()} & {word2.upper()}"
```

```
'palay & CORN'
```
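
Inside the braces, f-strings also accept a format specification after a colon (the same mini-language used by ***format***) and conversion flags such as `!r`. A small sketch; the numbers are made up for illustration:

```python
price = 1234.5
ratio = 0.256
word1 = 'palay'

money = f"{price:,.2f}"   # thousands separator, 2 decimal places
percent = f"{ratio:.1%}"  # multiply by 100, 1 decimal place, add %
padded = f"[{word1:>8}]"  # right-align in a field 8 characters wide
quoted = f"{word1!r}"     # !r uses repr(), which adds quotes

print(money)    # 1,234.50
print(percent)  # 25.6%
print(padded)   # [   palay]
print(quoted)   # 'palay'
```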

---

## Ranges
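Ranges are useful whenever you need a sequence of integers, for example to drive a ***for*** loop. A ***range*** produces its values on demand rather than storing them, so the examples below wrap it in ***list()*** to display every value.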

```python
# create a list of numbers from 0 to 4 (the stop value 5 is excluded)
list(range(0, 5))
```

```
[0, 1, 2, 3, 4]
```

Adding a third argument to ***range*** sets the step (default: 1).

```python
# create a list of numbers starting at 1 and staying below 16,
# with each item being 3 more than the previous one
list(range(1, 16, 3))
```

```
[1, 4, 7, 10, 13]
```

```python
# count down from 100 to 0 in steps of 10;
# the stop value is -1 so that 0 is still included
list(range(100, -1, -10))
```

```
[100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 0]
```
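
Because a ***range*** is a lazy sequence, it supports ***len***, indexing, and membership tests without building a list first. A short sketch, ending with the ***for*** loop use case mentioned above:

```python
r = range(0, 20, 5)  # 0, 5, 10, 15 (20 is excluded)

print(len(r))   # 4
print(r[-1])    # 15
print(10 in r)  # True
print(r)        # range(0, 20, 5): the values are not stored in memory

# Typical use: loop over a fixed sequence of integers
total = 0
for i in range(1, 6):
    total += i
print(total)    # 15
```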

## Regular Expressions (regex)

[Regular expressions](https://docs.python.org/3/howto/regex.html) (**regex**) describe text patterns. In Python they are provided by the built-in `re` module, which lets you search, match, and manipulate strings using patterns written in regex syntax.

### What can Regex do?

* **Search**: Find specific patterns within text
* **Validate**: Ensure that text conforms to certain rules or formats
* **Extract**: Pull out specific parts of text, such as phone numbers or email addresses
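
These three jobs map naturally onto three functions in the `re` module: `re.search`, `re.fullmatch`, and `re.findall`. The sketch below uses a made-up sentence; the email pattern is deliberately simplified and would reject many valid addresses.

```python
import re

# Made-up sample text for illustration
text = "Call 0917-555-1234 or email juan@example.com"

# Search: re.search returns the first match object (or None if nothing matches)
m = re.search(r"\d{4}-\d{3}-\d{4}", text)
print(m.group())  # 0917-555-1234

# Validate: re.fullmatch succeeds only if the WHOLE string fits the pattern
print(bool(re.fullmatch(r"\d{4}", "2024")))   # True
print(bool(re.fullmatch(r"\d{4}", "2024a")))  # False

# Extract: re.findall returns every non-overlapping match as a list of strings
# (simplified email pattern, for illustration only)
emails = re.findall(r"\w+@\w+\.\w+", text)
print(emails)  # ['juan@example.com']
```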

### Basic Regex Concepts

1. **Patterns**: A regex pattern is a sequence of characters that describes the structure of the text you want to match.
2. **Characters**: Most characters match themselves, but some (called metacharacters) have special meanings and are used to build patterns. For example:
	* `.` matches any single character (except newline)
	* `*` matches zero or more occurrences of the preceding pattern
	* `+` matches one or more occurrences of the preceding pattern
3. **Character Classes**: Regex provides several predefined character classes that match specific sets of characters, such as:
	* `\d` matches any digit (0-9; with Python 3 string patterns it also matches other Unicode decimal digits unless the `re.ASCII` flag is set)
	* `\w` matches any word character (alphanumeric plus underscore)
	* `\s` matches any whitespace character
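
Each metacharacter and character class above can be tried directly with `re.findall`, which returns every match as a list. A short sketch on made-up strings:

```python
import re

# '.' matches any single character except a newline
print(re.findall(r"c.t", "cat cot c\nt"))    # ['cat', 'cot']

# '*' = zero or more of the preceding 'b'; '+' = one or more
print(re.findall(r"ab*", "a ab abbb"))       # ['a', 'ab', 'abbb']
print(re.findall(r"ab+", "a ab abbb"))       # ['ab', 'abbb']

# Predefined character classes
print(re.findall(r"\d", "room 42"))          # ['4', '2']
print(re.findall(r"\w+", "palay_1 & corn"))  # ['palay_1', 'corn']
print(re.findall(r"\s", "a b\tc"))           # [' ', '\t']
```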

### Basic Regex Syntax

1. **Literal Characters**: Match a single character exactly
	* `a` matches the letter "a" exactly
2. **Character Classes**: Use square brackets to match any one character from a set
	* `[abc]` matches any of the characters "a", "b", or "c"
3. **Quantifiers**: Control how many times a pattern is matched
	* `a*` matches zero or more occurrences of the character "a"
4. **Groups and Captures**: Enclose a sub-pattern in parentheses to create a group and capture the text it matches
	* `(\d{3})-(\d{4})` matches "555-1234" and captures "555" and "1234" as groups 1 and 2
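
A short sketch of each syntax element in action, using the `palay & corn` string from earlier for the group example:

```python
import re

# Literal character: 'a' matches only the letter a
print(re.findall(r"a", "banana"))          # ['a', 'a', 'a']

# Character class: any one of a, b, or c
print(re.findall(r"[abc]", "cabbage"))     # ['c', 'a', 'b', 'b', 'a']

# Quantifier: 'o*' lets 'o' appear zero or more times between the l's
print(re.findall(r"lo*l", "ll lol lool"))  # ['ll', 'lol', 'lool']

# Groups: each pair of parentheses captures part of the match
m = re.search(r"(\w+) & (\w+)", "palay & corn")
print(m.group(0))  # palay & corn
print(m.group(1))  # palay
print(m.groups())  # ('palay', 'corn')
```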

### Example Regex Pattern

Let's say we want to match any phone number that follows this format: `(123) 456-7890`. We can write a regex pattern like this:

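```python
pattern = r"\(\d{3}\) \d{3}-\d{4}"
```
Here, `\(` matches the opening parenthesis, `\d{3}` matches exactly three digits, and `\)` matches the closing parenthesis. A literal space, `\d{3}`, a hyphen, and `\d{4}` then match the rest of the number. The parentheses need a backslash because they are metacharacters (they create groups), and the pattern is written as a raw string (`r"..."`) so that Python passes the backslashes to the `re` module unchanged.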
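
To see the pattern at work, the sketch below uses a variant of it with capturing groups added around each run of digits, so the parts of the number can be pulled out separately. The sample text is made up.

```python
import re

# Variant of the pattern above with capturing groups around each digit run
phone_pattern = re.compile(r"\((\d{3})\) (\d{3})-(\d{4})")

# Made-up sample text: only the first number uses the (123) 456-7890 format
text = "Office: (123) 456-7890, Fax: 123-456-7890"

m = phone_pattern.search(text)
print(m.group(0))  # (123) 456-7890
print(m.groups())  # ('123', '456', '7890')

# When the pattern has groups, findall returns one tuple per match
print(phone_pattern.findall(text))  # [('123', '456', '7890')]

# fullmatch for validation: spacing must match exactly
print(bool(phone_pattern.fullmatch("(123) 456-7890")))  # True
print(bool(phone_pattern.fullmatch("(123)456-7890")))   # False
```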


**Problem:** Write a function that takes a string as input and returns the first word that starts with an uppercase letter.

```python
import random
import re

def get_first_capitalized_word(s):
    # \b = word boundary, [A-Z] = one uppercase ASCII letter, \w* = rest of the word
    match = re.search(r'\b[A-Z]\w*\b', s)
    # re.search returns None when nothing matches
    return match.group(0) if match else None

# Example usage: shuffle the words a few times (the output differs on every run)
words = ["Hello", "world", "THIS", "is", "a", "test"]

for _ in range(5):
    shuffled_words = random.sample(words, len(words))
    shuffled_string = " ".join(shuffled_words)
    print(f"The first capitalized word in '{shuffled_string}' is: {get_first_capitalized_word(shuffled_string)}")
```

```
The first capitalized word in 'is Hello THIS test a world' is: Hello
The first capitalized word in 'is world Hello a test THIS' is: Hello
The first capitalized word in 'world is Hello test a THIS' is: Hello
The first capitalized word in 'THIS a world test is Hello' is: THIS
The first capitalized word in 'is test a THIS Hello world' is: THIS
```


## Practice Exercises

### Extract correct phone numbers

**Task**: We define a correct phone number as one that starts with +639 or 639 and contains 12 digits in total. Sort the numbers in the list below into correct and incorrect ones.

```python
import re

phone_numbers = [
    "+639123456789",
    "639987654321",
    "+63987654320",     # only 11 digits: incorrect
    "01234567890",      # doesn't start with +639 or 639: incorrect
    "+63998876543241",  # 14 digits, too long: incorrect
]

# Write your solution here
```

```python
# Solution
# A correct number: optional '+', then '639', then exactly 9 more digits (12 digits in total).
# re.fullmatch requires the entire string to match, so no ^/$ anchors are needed.
pattern = r"\+?639\d{9}"

correct_phone_numbers = [number for number in phone_numbers if re.fullmatch(pattern, number)]
incorrect_phone_numbers = [number for number in phone_numbers if not re.fullmatch(pattern, number)]

print(f"Correct Phone Numbers: {correct_phone_numbers}")
print(f"Incorrect Phone Numbers: {incorrect_phone_numbers}")
```

```
Correct Phone Numbers: ['+639123456789', '639987654321']
Incorrect Phone Numbers: ['+63987654320', '01234567890', '+63998876543241']
```
