### **Dealing with Strings**
When working on data scraping tasks, string manipulation is crucial for cleaning and processing the extracted data.

In [5]:
"""
Objective: f-string printing
"""
current_page = "https://www.python.org/page-3"
page_status_code = 200

# TODO: Use f-string to print the current page and page status code
# Expected output:
# Current page: https://www.python.org/page-3
# Page status code: 200
print(f"Current page: {current_page}")
print(f"Page status code: {page_status_code}")

Current page: https://www.python.org/page-3
Page status code: 200


In [7]:
"""
Objective: Extract text from URL using .split()
"""
current_page = "https://www.python.org/page-30"

# TODO: Extract the page number from the current_page
# Expected output: "30"
page_number = current_page.split("-")[-1]
print(page_number)



30
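
The value returned by `.split("-")[-1]` is a string, not a number. A minimal sketch, assuming the URL always ends in `-<number>`, that uses `.rsplit()` with `maxsplit=1` and converts the result to `int` so the next page URL can be built (the names `base` and `next_page` are illustrative, not part of the exercise):

```python
current_page = "https://www.python.org/page-30"

# rsplit with maxsplit=1 splits only at the last "-", keeping the rest of the URL intact
base, page_number = current_page.rsplit("-", 1)

# The extracted value is a string; convert it before doing arithmetic
next_page = f"{base}-{int(page_number) + 1}"
print(next_page)  # https://www.python.org/page-31
```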


In [8]:
"""
Objective: Extract number from text using isdigit()
"""
texts = "This page has 30 results"

# TODO: Split texts into a list of words
# TODO: Check if word is a number
# TODO: Print the word
# Expected output: "30"
words = texts.split()
for word in words:
    if word.isdigit():
        print(word)
        break

30
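
A word such as `30,` fails `.isdigit()` because of the attached punctuation. A small sketch, assuming numbers in scraped text may have punctuation stuck to them (the sample sentence is made up), that strips punctuation from each word before checking:

```python
import string

texts = "Showing 30, of 120 results."

numbers = []
for word in texts.split():
    # Remove punctuation attached to either end of the word, e.g. "30," -> "30"
    token = word.strip(string.punctuation)
    if token.isdigit():
        numbers.append(int(token))

print(numbers)  # [30, 120]
```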


In [10]:
"""
Objective: Clean Unwanted Characters using .strip()
"""
raw_data = "  \n\t###Welcome to Python!###\t\n  "

# TODO: Remove the unwanted characters from raw_data
# Expected output: "Welcome to Python!"
cleaned_data = raw_data.strip(" \n\t#")
print(cleaned_data)



Welcome to Python!
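
Note that `.strip(chars)` treats its argument as a set of characters to remove from both ends, and it never touches characters in the middle of the string. A short sketch illustrating this, together with the one-sided variants `.lstrip()` and `.rstrip()` (the sample string is made up):

```python
raw_data = "###Price: #1 seller###"

# Only the leading and trailing "#" are removed; the "#" inside stays
print(raw_data.strip("#"))   # Price: #1 seller

# lstrip / rstrip clean a single side
print(raw_data.lstrip("#"))  # Price: #1 seller###
print(raw_data.rstrip("#"))  # ###Price: #1 seller
```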


In [14]:
"""
Objective: Check if Content Exists
"""
button_text = "Load More"

# TODO: Check if "Load More" is in button_text
# TODO: Print "More content is available" if true
# TODO: Print "No more content" if false
# Expected output: "More content is available"
if "Load More" in button_text:
    print("More content is available")
else:
    print("No more content")
    

More content is available


In [16]:
"""
Objective: Check if Content Exists using .find()
"""
button_text = "Load More"

# TODO: Check if "Load More" is in button_text
# TODO: Print "More content is available" if true
# TODO: Print "No more content" if false
# Expected output: "More content is available"

if button_text.find("Load More") != -1:
    print("More content is available")
else:
    print("No more content")
    

More content is available


In [17]:
"""
Objective: Split Data into a List
"""
keywords = "Python, BeautifulSoup, Scrapy, Selenium, Web Scraping"

# TODO: Split keywords into a list
# Expected output: ["Python", "BeautifulSoup", "Scrapy", "Selenium", "Web Scraping"]
keywords_list = keywords.split(", ")
print(keywords_list)



['Python', 'BeautifulSoup', 'Scrapy', 'Selenium', 'Web Scraping']


In [18]:
"""
Objective: Join a list into a string
"""
keywords = ["Python", "BeautifulSoup", "Scrapy", "Selenium", "Web Scraping"]

# TODO: Join keywords into a string with ", "
# Expected output: "Python, BeautifulSoup, Scrapy, Selenium, Web Scraping"
joined_keywords = ", ".join(keywords)
print(joined_keywords)

Python, BeautifulSoup, Scrapy, Selenium, Web Scraping


In [19]:
"""
Objective: Extract URL from text
"""
text = "For more information visit https://www.example.com"

# TODO: Split the text into a list of words
# TODO: Check if word starts with "https://"
# TODO: Print the word
# Expected output: "https://www.example.com"
words = text.split()
for word in words:
    if word.startswith("https://"):
        print(word)
        break

https://www.example.com


In [20]:
"""
Objective: Extract email from text
"""
text = "For more info, contact us at support@example.com or visit our website."

# TODO: Split the text into a list of words
# TODO: Find the word that ends with ".com"
# Expected output: "support@example.com"
words = text.split()
for word in words:
    if word.endswith(".com"):
        print(word)
        break

support@example.com
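
Checking `.endswith(".com")` misses an address that is followed by punctuation or that uses another domain suffix. A hedged sketch, assuming an email is simply any word containing `@` (a simplification, not a real email validator; the sample text is made up), that strips surrounding punctuation first:

```python
text = "Questions? Write to support@example.org, or visit our website."

for word in text.split():
    # Drop punctuation around the word, e.g. "support@example.org," -> "support@example.org"
    candidate = word.strip(".,;:!?()")
    if "@" in candidate:
        print(candidate)  # support@example.org
        break
```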


In [21]:
"""
Objective: Generate Slug for URLs using .replace()
"""
base_url = "https://www.example.com"
section = "News and Articles"

# TODO: Format section from "News and Articles" to "news-and-articles"
# TODO: Combine base_url and section
# TODO: Print the complete URL
# Expected output: "https://www.example.com/news-and-articles"
section_slug = section.lower().replace(" ", "-")
complete_url = f"{base_url}/{section_slug}"
print(complete_url)



https://www.example.com/news-and-articles
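
`.replace(" ", "-")` turns every single space into a hyphen, so repeated or leading spaces produce repeated hyphens. A minimal sketch of one alternative, assuming the section title may contain irregular whitespace (the helper name `make_slug` is hypothetical):

```python
def make_slug(title):
    # split() with no arguments trims the ends and collapses runs of whitespace
    return "-".join(title.lower().split())


base_url = "https://www.example.com"
complete_url = f"{base_url}/{make_slug('  News   and Articles ')}"
print(complete_url)  # https://www.example.com/news-and-articles
```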


In [27]:
""" 
Objective: String manipulation
"""
message = "   Hello, world! How, are you today? "

# TODO: Remove the leading and trailing spaces
# TODO: Replace the spaces with "-"
# Expected output: "Hello,-world!-How,-are-you-today?"
message = message.strip().replace(" ", "-")
print(message)

Hello,-world!-How,-are-you-today?


### **Reflection**
What is the difference between using .find() and "in" to check whether a substring exists in the text?

Using `in` only returns True or False for whether the target string is present, whereas `.find()` returns the lowest index at which the substring occurs, or -1 if it is not found.
So `in` is typically used to check whether a string is present in the text, while `.find()` is typically used to get the position of the substring within the text.
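
The difference can be seen in a short sketch (the `button_text` value here is made up for illustration):

```python
button_text = "Click to Load More"

# `in` answers a yes/no question
print("Load More" in button_text)  # True

# .find() answers "where?"; the index can be used for slicing
index = button_text.find("Load More")
print(index)                # 9
print(button_text[index:])  # Load More

# A missing substring gives -1, not False
print(button_text.find("Next"))  # -1
```

Because `.find()` returns 0 for a match at the very start and -1 for no match, its result should be compared with `-1` explicitly and not used directly as a condition.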

### **Exploration**
Regular expressions are an advanced way to extract data using specific patterns and rules. While we will discuss regex at the intermediate level, it’s worth starting to learn it now.
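
As a first taste, here is a minimal sketch using Python's built-in `re` module. The sample text and the email pattern are assumptions made for illustration; the pattern is deliberately simple and is not a complete email validator.

```python
import re

text = "This page has 30 results. Contact support@example.com for page 31."

# \d+ matches one or more consecutive digits
numbers = re.findall(r"\d+", text)
print(numbers)  # ['30', '31']

# A simplified email pattern: characters, "@", domain name, ".", suffix
match = re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)
if match:
    print(match.group())  # support@example.com
```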