### **Lists vs Dictionaries**
When working on data scraping tasks, it's essential to be familiar with core Python data structures such as lists and dictionaries, because they are key to processing and organizing scraped data efficiently.

In [1]:
"""
Objective: Create a List of URLs
"""
url = "https://example.com/page-1"

# TODO: From the url, extract the main url
# TODO: Create the list of URLs for the next 5 pages
# Expected Output: ['https://example.com/page-1', 'https://example.com/page-2', 'https://example.com/page-3', 'https://example.com/page-4', 'https://example.com/page-5']
base_url = url.split('/page-')[0]
page_number = int(url.split('/page-')[1])
urls = [f"{base_url}/page-{i}" for i in range(page_number, page_number + 5)]
print(urls)

['https://example.com/page-1', 'https://example.com/page-2', 'https://example.com/page-3', 'https://example.com/page-4', 'https://example.com/page-5']


In [None]:
"""
Objective: Extend the List of URLs
"""
urls = ["https://example.com/page-1", "https://example.com/page-2", "https://example.com/page-3", "https://example.com/page-4", "https://example.com/page-5"]
new_urls = ["https://example.com/page-6", "https://example.com/page-7", "https://example.com/page-8", "https://example.com/page-9", "https://example.com/page-10"]

# TODO: The urls have 5 elements
# TODO: Add the new_urls to the urls to get 10 elements
# TODO: Print the length of urls
# Expected Output: 10
urls.extend(new_urls)
print(len(urls))


10


In [12]:
"""
Objective: Extract Data from Nested Lists
"""
data = [["title1", "url1"], ["title2", "url2"], ["title3", "url3"]]

# TODO: Extract the title and url from the data
# Expected Output:
# title1 url1
# title2 url2
# title3 url3
for title, url in data:
    print(title, url)
    

title1 url1
title2 url2
title3 url3


In [13]:
"""
Objective: Remove duplicate elements from a List
"""
data = ["https://www.example1.com", "https://www.example1.com", "https://www.example2.com", "https://www.example2.com", "https://www.example3.com"]

# TODO: Remove duplicates from the data
# Expected result:
# ['https://www.example1.com', 'https://www.example2.com', 'https://www.example3.com']
# dict.fromkeys() keeps the first-seen order; list(set(data)) does not guarantee any order
data = list(dict.fromkeys(data))
print(data)


['https://www.example1.com', 'https://www.example2.com', 'https://www.example3.com']


In [21]:
"""
Objective: Create a Dictionary for Scraped Data
"""
urls = ["https://example.com/page-1", "https://example.com/page-2", "https://example.com/page-3", "https://example.com/page-4", "https://example.com/page-5"]

# Function to scrape data
def scrape_data(url):
    # Extracted data
    data = dict()
    # TODO: Add the title to the dictionary with value "Example Title 1" for page 1
    # TODO: Add the url to the dictionary with value "https://example.com/page-1" for page 1    
    # Parse the page number from the url itself instead of relying on base_url from an earlier cell
    index = int(url.rsplit("-", 1)[1])
    data['title'] = f"Example Title {index}"
    data['url'] = url

    return data

# TODO: Loop through the urls and call the scrape_data function for each url
# TODO: Append the returned data to the scraped_data list
# TODO: Print the scraped_data
# Expected Output:
# [{'title': 'Example Title 1', 'url': 'https://example.com/page-1'}, {'title': 'Example Title 2', 'url': 'https://example.com/page-2'}, {'title': 'Example Title 3', 'url': 'https://example.com/page-3'}, {'title': 'Example Title 4', 'url': 'https://example.com/page-4'}, {'title': 'Example Title 5', 'url': 'https://example.com/page-5'}]
scraped_data = []
for url in urls:
    data = scrape_data(url)
    scraped_data.append(data)
print(scraped_data)

[{'title': 'Example Title 1', 'url': 'https://example.com/page-1'}, {'title': 'Example Title 2', 'url': 'https://example.com/page-2'}, {'title': 'Example Title 3', 'url': 'https://example.com/page-3'}, {'title': 'Example Title 4', 'url': 'https://example.com/page-4'}, {'title': 'Example Title 5', 'url': 'https://example.com/page-5'}]


In [22]:
"""
Objective: Retrieve Data from a Dictionary
"""
data = [{"title": "Example Title 1", "url": "https://example.com"},
        {"title": "Example Title 2", "url": "https://example.com"},
        {"title": "Example Title 3", "url": "https://example.com"},
        {"title": "Example Title 4", "url": "https://example.com"},
        {"title": "Example Title 5", "url": "https://example.com"}]

# TODO: Use a for loop to loop through the data
# TODO: Use the item["title"] to get the title
# TODO: Use the item.get("url") to get the url
# TODO: Print the title and url
# Expected Output:
# Example Title 1 https://example.com
# Example Title 2 https://example.com
# Example Title 3 https://example.com
# Example Title 4 https://example.com
# Example Title 5 https://example.com
for item in data:
    title = item["title"]
    url = item.get("url")
    print(title, url)
    

Example Title 1 https://example.com
Example Title 2 https://example.com
Example Title 3 https://example.com
Example Title 4 https://example.com
Example Title 5 https://example.com


In [24]:
"""
Objective: Create a List of Dictionaries from Two Lists
"""
titles = ["Example Title 1", "Example Title 2", "Example Title 3", "Example Title 4", "Example Title 5"]
urls = ["https://example.com/page-1", "https://example.com/page-2", "https://example.com/page-3", "https://example.com/page-4", "https://example.com/page-5"]

# TODO: Combine the titles and urls into a list of dictionaries
# Expected Output:
# [{'title': 'Example Title 1', 'url': 'https://example.com/page-1'}, {'title': 'Example Title 2', 'url': 'https://example.com/page-2'}, {'title': 'Example Title 3', 'url': 'https://example.com/page-3'}, {'title': 'Example Title 4', 'url': 'https://example.com/page-4'}, {'title': 'Example Title 5', 'url': 'https://example.com/page-5'}] 
data = []
for title, url in zip(titles, urls):
    data.append({"title": title, "url": url})
print(data)

[{'title': 'Example Title 1', 'url': 'https://example.com/page-1'}, {'title': 'Example Title 2', 'url': 'https://example.com/page-2'}, {'title': 'Example Title 3', 'url': 'https://example.com/page-3'}, {'title': 'Example Title 4', 'url': 'https://example.com/page-4'}, {'title': 'Example Title 5', 'url': 'https://example.com/page-5'}]


In [25]:
"""
Objective: Identify all keys in a Dictionary
"""
data = {"title": "Example Title", "url": "https://example.com", "tags": ["tag1", "tag2", "tag3"], "date": "2022-01-01"}

# TODO: Use keys() method to get all keys
# TODO: Convert the keys to a list
# Expected Output: ['title', 'url', 'tags', 'date']
keys = list(data.keys())
print(keys)



['title', 'url', 'tags', 'date']


In [26]:
"""
Objective: Loop through a Dictionary
"""
scraped_data = {
                "title": "Example Title",
                "url": "https://example.com",
                "author": "John Doe",
                "tags": ["tag1", "tag2", "tag3"],
                "views": 1000
            }

# TODO: Use .items() method to loop through the dictionary
# TODO: Print each key and value
# Expected Output:
# title: Example Title
# url: https://example.com
# author: John Doe
# tags: ['tag1', 'tag2', 'tag3']
# views: 1000
for key, value in scraped_data.items():
    print(f"{key}: {value}")
    

title: Example Title
url: https://example.com
author: John Doe
tags: ['tag1', 'tag2', 'tag3']
views: 1000


In [28]:
"""
Objective: Extract Data from a Nested Dictionary
"""
scraped_data = [
    {
        "category": "Programming",
        "articles": [
            {
                "title": "How to Learn Python",
                "url": "https://example.com/learn-python",
                "author": "John Doe",
                "tags": ["Python", "Programming", "Tutorial"],
                "views": 1200,
                "comments": [
                    {"user": "Alice", "comment": "Great article!", "likes": 5},
                    {"user": "Bob", "comment": "Very informative.", "likes": 2}
                ]
            },
            {
                "title": "Advanced Python Tips",
                "url": "https://example.com/advanced-python",
                "author": "Jane Smith",
                "tags": ["Python", "Advanced", "Tips"],
                "views": 800,
                "comments": [
                    {"user": "Charlie", "comment": "Helpful for experts.", "likes": 3}
                ]
            }
        ]
    },
    {
        "category": "Web Scraping",
        "articles": [
            {
                "title": "Top 10 Web Scraping Tools",
                "url": "https://example.com/web-scraping-tools",
                "author": "Jane Smith",
                "tags": ["Web Scraping", "Tools", "Technology"],
                "views": 1500,
                "comments": [
                    {"user": "Dave", "comment": "Awesome list!", "likes": 10}
                ]
            },
            {
                "title": "Understanding BeautifulSoup",
                "url": "https://example.com/beautifulsoup",
                "author": "Alice Johnson",
                "tags": ["Web Scraping", "BeautifulSoup", "Python"],
                "views": 1100,
                "comments": [
                    {"user": "Eve", "comment": "Great for beginners.", "likes": 4},
                    {"user": "Frank", "comment": "Clear explanation!", "likes": 6}
                ]
            }
        ]
    },
    {
        "category": "APIs",
        "articles": [
            {
                "title": "Understanding REST APIs",
                "url": "https://example.com/rest-apis",
                "author": "John Doe",
                "tags": ["APIs", "REST", "Web Development"],
                "views": 900,
                "comments": [
                    {"user": "Grace", "comment": "Very clear overview.", "likes": 7}
                ]
            },
            {
                "title": "GraphQL vs REST",
                "url": "https://example.com/graphql-vs-rest",
                "author": "Charlie Brown",
                "tags": ["APIs", "GraphQL", "Comparison"],
                "views": 1300,
                "comments": [
                    {"user": "Hannah", "comment": "Helpful comparison!", "likes": 9}
                ]
            }
        ]
    }
]

# TODO: Show the article with the highest number of views
# TODO: Find the article with the highest number of comments
# TODO: Find the comment with the highest number of likes
# Expected Output:
# Article with highest views: Top 10 Web Scraping Tools
# Article with highest comments: How to Learn Python
# Comment with highest likes: Dave - Awesome list!
highest_views = 0
highest_comments = 0
highest_likes = 0
highest_views_article = ""
highest_comments_article = ""
highest_likes_comment = ""
highest_likes_user = ""
for category in scraped_data:
    for article in category["articles"]:
        if article["views"] > highest_views:
            highest_views = article["views"]
            highest_views_article = article["title"]
        if len(article["comments"]) > highest_comments:
            highest_comments = len(article["comments"])
            highest_comments_article = article["title"]
        for comment in article["comments"]:
            if comment["likes"] > highest_likes:
                highest_likes = comment["likes"]
                highest_likes_comment = comment["comment"]
                highest_likes_user = comment["user"]
print(f"Article with highest views: {highest_views_article}")
print(f"Article with highest comments: {highest_comments_article}")
print(f"Comment with highest likes: {highest_likes_user} - {highest_likes_comment}")


Article with highest views: Top 10 Web Scraping Tools
Article with highest comments: How to Learn Python
Comment with highest likes: Dave - Awesome list!
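
The manual tracking loop above can also be sketched with the built-in max() and a key function. This is only one possible approach; the `sample` list below is a hypothetical, trimmed-down stand-in with the same shape as scraped_data, not the full data set.

```python
# Hypothetical sample with the same nesting as scraped_data (categories -> articles -> comments)
sample = [
    {
        "category": "Programming",
        "articles": [
            {
                "title": "How to Learn Python",
                "views": 1200,
                "comments": [
                    {"user": "Alice", "comment": "Great article!", "likes": 5},
                    {"user": "Bob", "comment": "Very informative.", "likes": 2},
                ],
            },
        ],
    },
    {
        "category": "Web Scraping",
        "articles": [
            {
                "title": "Top 10 Web Scraping Tools",
                "views": 1500,
                "comments": [
                    {"user": "Dave", "comment": "Awesome list!", "likes": 10},
                ],
            },
        ],
    },
]

# Flatten the nesting once so that each question becomes a single max() call
articles = [article for category in sample for article in category["articles"]]
comments = [comment for article in articles for comment in article["comments"]]

most_viewed = max(articles, key=lambda article: article["views"])
most_commented = max(articles, key=lambda article: len(article["comments"]))
most_liked = max(comments, key=lambda comment: comment["likes"])

print(f"Article with highest views: {most_viewed['title']}")
print(f"Article with highest comments: {most_commented['title']}")
print(f"Comment with highest likes: {most_liked['user']} - {most_liked['comment']}")
```

When several items tie, max() returns the first one it encounters, which matches the behavior of the strict `>` comparisons in the loop version.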


### **Reflection**
What is the difference between item["keys"] and item.get("keys")? What happens if the key does not exist?

Use item['keys'] only when you are sure the key exists; if it does not, Python raises a KeyError. item.get('keys') is more flexible: if the key is missing, it returns None, or a default value if one is given, for example item.get('keys', 'defKeys').
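
A minimal sketch of the difference between bracket access and .get() on a missing key. The `item` dictionary and the "N/A" fallback are hypothetical values chosen for illustration.

```python
# Hypothetical scraped record that has a title but no "url" key
item = {"title": "Example Title 1"}

print(item["title"])           # the key exists, so both access styles work
print(item.get("url"))         # missing key: .get() returns None
print(item.get("url", "N/A"))  # missing key: .get() returns the given default

# Bracket access on a missing key raises KeyError instead of returning a value
missing_key_raised = False
try:
    item["url"]
except KeyError as err:
    missing_key_raised = True
    print(f"KeyError: {err}")
```

In scraping code, .get() is often the safer choice for optional fields, while bracket access is useful when a missing field should stop the run loudly.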

### **Exploration**
Python's collections module provides specialized container datatypes beyond the built-in collection types like lists, tuples, sets, and dictionaries. These container types are designed to make certain tasks more efficient and readable.
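
As a starting point for exploration, here is a short sketch of three containers from the collections module applied to scraping-style data. The `records` list and the `Article` name are hypothetical; they are only there to make the example runnable.

```python
from collections import Counter, defaultdict, namedtuple

# Hypothetical scraped records
records = [
    {"title": "How to Learn Python", "author": "John Doe", "tags": ["Python", "Tutorial"]},
    {"title": "Advanced Python Tips", "author": "Jane Smith", "tags": ["Python", "Tips"]},
    {"title": "Understanding REST APIs", "author": "John Doe", "tags": ["APIs", "REST"]},
]

# Counter: count how often each tag appears across all records
tag_counts = Counter(tag for record in records for tag in record["tags"])
print(tag_counts.most_common(1))  # [('Python', 2)]

# defaultdict: group titles by author without checking whether the key exists first
titles_by_author = defaultdict(list)
for record in records:
    titles_by_author[record["author"]].append(record["title"])
print(dict(titles_by_author))

# namedtuple: a lightweight, immutable record with named fields
Article = namedtuple("Article", ["title", "author"])
first = Article(records[0]["title"], records[0]["author"])
print(first.title, first.author)
```

Other containers in the same module, such as deque and OrderedDict, follow the same idea of specializing a built-in type for a narrower job.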