### Discussion Week 6

We will review an example that highlights why proficiency in XPath syntax is needed when we cannot rely on inspecting the HTML with DevTools.

## XPath Scraping Examples & Explanation


```markdown
# Discussion: XPath Fundamentals and Practical Scraping

XPath (XML Path Language) is a query language for selecting nodes from an XML/HTML document. With XPath, we can precisely locate elements in a structured document based on tags, attributes, text, and hierarchy.

In web scraping scenarios, especially when working with HTML documents, XPath offers powerful capabilities such as:
- **Selecting elements by tag** (e.g., `//a`, `//table`, etc.)
- **Selecting elements by attribute** (e.g., `//div[@class="content"]`)
- **Selecting elements containing specific text** (e.g., `//td[contains(text(), "Genre")]`)
- **Navigating the tree structure** (using child, sibling, or ancestor axes, such as `parent::`, `following-sibling::`, `preceding-sibling::`, etc.)

Often, the HTML you get via an HTTP library (e.g., `requests`) can differ from what you see in your browser, because many sites serve different HTML to mobile vs. desktop clients, or because scripts dynamically manipulate the DOM. This can make scraping challenging if you rely solely on DevTools to copy selectors from a rendered page. Below is an illustrative example using `requests` and `lxml` to scrape [imsdb.com](https://imsdb.com/).

---

## Scraping Genre Links Example

```python
import requests
import lxml.html as lx

# Step 1: Retrieve the page's HTML
result = requests.get('https://imsdb.com/')
result.raise_for_status()  # Ensure no HTTP errors
html_content = result.text

# Step 2: Parse the HTML content
html = lx.fromstring(html_content)

# Step 3: (Demonstration) Trying to select a specific <table><tbody> might return empty
# (because the structure is different from what we see in some device view)
genre_table = html.xpath('//table/tbody')
print("Attempt to find table/tbody:", genre_table)  # Likely returns []

# Step 4: Different HTML is served depending on viewport/device. In some views,
# the 'Genres' section might appear as a table row with a specific <td> containing the text "Genre".
# In other (mobile) views, it might not appear at all (or only as a script-generated dropdown).
# Let's assume we have the "desktop" or large-viewport HTML.

# One trick: look for the table row containing the cell with text "Genre" 
# (or partial match in case there's trailing whitespace like "\r\n").
genres = html.xpath('//table[tr/td[contains(text(), "Genre")]]/tr//a/@href')
print("Genre links found:", genres)

# Explanation:
#  - //table[tr/td[contains(text(), "Genre")]]: find a <table> that has a <tr>-><td> containing "Genre"
#  - /tr//a/@href: within that table, find all <a> elements inside <tr> and return their "href" attributes
```

In many real-world scenarios, you must carefully inspect the raw HTML returned by `requests` (rather than the rendered HTML in your browser) to craft XPath queries that match the actual structure you’re scraping.
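
One common reason for such a mismatch, besides device-specific markup, is that browsers add a `<tbody>` element to the DOM when the source omits it, while `lxml` keeps the tree as written. The sketch below uses a made-up HTML string (not the actual imsdb.com source) to show the effect; whether this is what happens on imsdb.com is not confirmed here.

```python
import lxml.html as lx

# Hypothetical raw HTML: note that the source contains no <tbody>.
raw_html = """
<html><body>
<table>
  <tr><td>Genre</td></tr>
  <tr><td><a href="/genre/Action">Action</a></td></tr>
  <tr><td><a href="/genre/Comedy">Comedy</a></td></tr>
</table>
</body></html>
"""

doc = lx.fromstring(raw_html)

# A selector that assumes <tbody> (as DevTools would show it) finds nothing.
with_tbody = doc.xpath('//table/tbody/tr//a/@href')

# A selector that matches the structure actually present in the source works.
without_tbody = doc.xpath('//table/tr//a/@href')

print("With tbody:", with_tbody)
print("Without tbody:", without_tbody)
```

Printing `lx.tostring(doc, pretty_print=True).decode()` is a quick way to see the tree that `lxml` actually built.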

---

## Scraping Script Date Example

In another scenario, suppose we want to retrieve the movie release year from a page like:
[Interstellar Script](https://imsdb.com/Movie%20Scripts/Interstellar%20Script.html).

After inspecting the HTML (mindful it may differ between desktop and mobile), we note that the script date is found as text after a `<b>` element with the text `"Script Date"`. For example:

```python
import requests
import lxml.html as lx

url = 'https://imsdb.com/Movie%20Scripts/Interstellar%20Script.html'
response = requests.get(url)
response.raise_for_status()
html = lx.fromstring(response.text)

# We'll extract all text from <td> elements within the table that has class="script-details"
script_details_texts = html.xpath('//table[@class="script-details"]//td/text()')
print("Script details (all text):", script_details_texts)

# If we specifically want the text node immediately following the <b> element that has text "Script Date":
date_text = html.xpath('//b[text()="Script Date"]/following-sibling::text()[1]')
print("Raw script date text:", date_text)

# The returned text might contain additional words, whitespace, or punctuation.
# Next, we'd typically use regular expressions to isolate the four-digit year from the text.
# For now, we'll just demonstrate that the immediate text is captured.
```
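
The regular-expression step mentioned in the comments could look roughly like the sketch below. The sample value in `date_text` is hypothetical and only stands in for whatever the XPath query returns; `extract_year` is an illustrative helper, not part of any library.

```python
import re

# Hypothetical stand-in for the list returned by the XPath query.
date_text = [" : March 2010\r\n"]

def extract_year(fragments):
    """Return the first four-digit number found in a list of strings, or None."""
    for fragment in fragments:
        match = re.search(r"\b(\d{4})\b", fragment)
        if match:
            return int(match.group(1))
    return None

year = extract_year(date_text)
print("Year:", year)
```

Returning `None` instead of raising keeps the helper usable in a loop over many pages where some have no script date.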

---

## Key Takeaways on XPath Usage

1. **Absolute vs. Relative Paths**  
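   - `//tag` searches for `<tag>` anywhere in the document, while a path that starts with a single `/` (e.g., `/html/body`) is absolute and is matched step by step from the document root. Inside a path, a single `/` selects only direct children of the preceding step.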

2. **Attribute Conditions**  
   - `//div[@class="nav"]` selects `<div>` elements with `class="nav"`.

3. **Text Matching**  
   - `//td[text()="Genre"]` matches a `<td>` whose **entire** text content is `"Genre"`.
   - `//td[contains(text(),"Genre")]` matches a `<td>` whose text content contains `"Genre"` as a substring.

4. **Handling Whitespace & Newlines**  
   - Real HTML often includes line breaks like `\r\n`. To handle partial matches, use `contains()` or normalize space if needed.

5. **Navigation Axes**  
   - `following-sibling::`, `preceding-sibling::`, `parent::`, `child::`, etc. let you move in the document relative to a known node.

By understanding these XPath strategies, you can more flexibly navigate HTML structures that aren’t always consistent — especially when the site provides different renders (e.g., mobile vs. desktop) or dynamically alters the DOM via JavaScript.

---

**Note:**  
To handle tricky situations where the page is significantly different when rendered in a browser (due to JavaScript or device-based rendering), you may need to:
- Emulate a specific User-Agent and send the correct headers to get the “desktop” version.
- Use a headless browser solution (e.g., `Selenium`, `Playwright`) to execute JavaScript and get the fully rendered page.
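
As a minimal sketch of the first option, the snippet below sets a User-Agent on a `requests.Session` and prepares the request without sending it, so the outgoing headers can be checked offline. The User-Agent string is a hypothetical example, and there is no guarantee that a given site chooses its markup based on this header.

```python
import requests

# Hypothetical desktop-style User-Agent; any realistic value could be used instead.
DESKTOP_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

session = requests.Session()
session.headers.update({"User-Agent": DESKTOP_UA})

# Build the request without sending it to see which headers would go out.
prepared = session.prepare_request(requests.Request("GET", "https://imsdb.com/"))
print(prepared.headers["User-Agent"])

# To actually fetch the page:
# response = session.send(prepared)
# response.raise_for_status()
```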

```



## Extended XPath Overview & Examples


```markdown
# Key Takeaways on XPath Usage

## Absolute vs. Relative Paths
- `//tag` searches for `<tag>` anywhere in the document.
- A leading `/` makes the path absolute: `/html/body` is matched step by step from the document root. Inside a path, a single `/` selects only direct children of the preceding step.

**Example**:
```python
# Absolute path from the document root
html.xpath('/html/body/div/p')

# Abbreviated descendant search (matches <p> at any depth in the document)
html.xpath('//p')
```

## Attribute Conditions
- `//div[@class="nav"]` selects all `<div>` elements with `class="nav"`.

**Example**:
```python
# Select all <img> elements whose "alt" attribute equals "logo"
html.xpath('//img[@alt="logo"]')
```
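
An equality test on `@class` compares the whole attribute string, so it misses elements that carry several classes. The sketch below, run on made-up markup, contrasts three ways of matching a class; the `concat`/`normalize-space` idiom is a common XPath 1.0 workaround rather than anything specific to `lxml`.

```python
import lxml.html as lx

doc = lx.fromstring("""
<html><body>
<div class="nav">one</div>
<div class="nav main">two</div>
<div class="navbar">three</div>
</body></html>
""")

# Exact string comparison: only class="nav" matches.
exact = doc.xpath('//div[@class="nav"]/text()')

# Substring test: also matches "navbar", which is usually not intended.
loose = doc.xpath('//div[contains(@class, "nav")]/text()')

# Token test: pad with spaces so "nav" must appear as a whole class name.
token = doc.xpath(
    '//div[contains(concat(" ", normalize-space(@class), " "), " nav ")]/text()'
)

print(exact, loose, token)
```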

## Text Matching
- `//td[text()="Genre"]` matches a `<td>` whose entire text content is `"Genre"`.
- `//td[contains(text(),"Genre")]` matches a `<td>` whose text content contains `"Genre"` as a substring.

**Example**:
```python
# Exact text match
html.xpath('//span[text()="Subscribe"]')

# Partial text match (avoids issues with whitespace or additional text)
html.xpath('//span[contains(text(), "Subscribe")]')
```
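
One subtlety worth checking: in XPath 1.0 (which `lxml` implements), `contains(text(), ...)` only looks at the first text node of the element. The sketch below uses a made-up table cell to show the difference from testing every text node or the element's full string value.

```python
import lxml.html as lx

doc = lx.fromstring(
    "<html><body><table><tr>"
    "<td>Movie <b>info</b> Genre</td>"
    "</tr></table></body></html>"
)

# Only the first text node ("Movie ") is tested, so nothing matches.
first_only = doc.xpath('//td[contains(text(), "Genre")]')

# Test each text node of the <td> individually.
any_text_node = doc.xpath('//td[text()[contains(., "Genre")]]')

# Test the string value of the <td> (all descendant text joined together).
string_value = doc.xpath('//td[contains(., "Genre")]')

print(len(first_only), len(any_text_node), len(string_value))
```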

## Handling Whitespace & Newlines
HTML often includes line breaks like `\r\n`. To handle partial matches, you can use:
- `contains()`
- Functions like `normalize-space()`

**Example**:
```python
# Using contains() to avoid missing text with stray newline characters
html.xpath('//td[contains(text(), "Genre")]')

# Using normalize-space() if there's excessive spacing
html.xpath('//td[normalize-space(text())="Genre"]')
```
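
To see the three variants side by side, here is a small self-contained run on a made-up cell whose text ends in a newline. It is only a sketch; the real page may contain different whitespace.

```python
import lxml.html as lx

doc = lx.fromstring(
    "<html><body><table><tr><td>Genre\n</td></tr></table></body></html>"
)

# The exact comparison fails because the text node is "Genre\n", not "Genre".
exact = doc.xpath('//td[text()="Genre"]')

# A substring test ignores the trailing newline.
partial = doc.xpath('//td[contains(text(), "Genre")]')

# normalize-space() trims the text before the exact comparison.
normalized = doc.xpath('//td[normalize-space(text())="Genre"]')

print(len(exact), len(partial), len(normalized))
```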

## Navigation Axes
- `following-sibling::`, `preceding-sibling::`, `parent::`, `child::`, etc.  
  These allow you to move in the document relative to a known node.

**Example**:
```python
# Select the text node following a <b> element with text 'Script Date'
html.xpath('//b[text()="Script Date"]/following-sibling::text()[1]')

# Select any <div> that is the parent of an <img> with src="logo.png"
html.xpath('//img[@src="logo.png"]/parent::div')
```
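
The axis queries above can be tried on a small made-up fragment. The markup below is hypothetical and only loosely modelled on a details table; the names and the date are placeholders.

```python
import lxml.html as lx

doc = lx.fromstring("""
<html><body><table class="script-details"><tr><td>
<b>Writers</b> : <a href="#">Someone</a><br>
<b>Script Date</b> : March 2010<br>
</td></tr></table></body></html>
""")

# First text node that follows the matching <b> among its siblings.
after_label = doc.xpath('//b[text()="Script Date"]/following-sibling::text()[1]')

# Step up from the <b> element to its parent <td>.
parent_cells = doc.xpath('//b[text()="Script Date"]/parent::td')

print(repr(after_label[0]), parent_cells[0].tag)
```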

---

# Additional XPath Grammar and Methods

Below we introduce more XPath concepts, including wildcard usage, union operators, and common functions for more powerful queries.

## Wildcards
- `*` matches any element node (regardless of its name).
- `@*` matches any attribute node.

**Example**:
```python
# Select all child elements under <div> of class "container", regardless of tag name
html.xpath('//div[@class="container"]/*')

# Select all attributes of the <img> elements
html.xpath('//img/@*')
```

## Union (|) Operator
- Combines multiple XPath expressions so you can select multiple sets of nodes.

**Example**:
```python
# Select all <div> or <span> elements
html.xpath('//div | //span')
```

## Common Functions
- `starts-with(string, substring)`: Tests if `string` starts with `substring`.
- `substring(string, start, length)`: Returns a portion of `string`.
- `string-length(string)`: Returns the length of a string.
- `count(node-set)`: Returns the number of nodes in a node set.

**Example**:
```python
# Select <a> elements whose href starts with "https"
html.xpath('//a[starts-with(@href, "https")]')

# Count how many <p> elements exist
num_paragraphs = html.xpath('count(//p)')
print("Number of <p> elements:", num_paragraphs)
```
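
When an XPath expression evaluates to a number, boolean, or string rather than a node set, `lxml` returns the corresponding Python value instead of a list. A small sketch on made-up markup (the URL is a placeholder):

```python
import lxml.html as lx

doc = lx.fromstring(
    "<html><body><p>a</p><p>b</p>"
    "<a href='https://example.com'>x</a></body></html>"
)

# count() yields an XPath number, which lxml returns as a Python float.
num_paragraphs = doc.xpath('count(//p)')

# boolean() yields a Python bool.
has_https_link = doc.xpath('boolean(//a[starts-with(@href, "https")])')

# Convert the float when an integer is needed, e.g. for range().
print(int(num_paragraphs), has_https_link)
```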

## Context Nodes and Parent/Child Notation
- `.` refers to the current context node.
- `..` refers to the parent of the current node.

**Example**:
```python
# From a known element "el", select its parent's sibling divs
el.xpath('../following-sibling::div')
```
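
A related pitfall: calling `.xpath('//a')` on an element still searches the whole document, because `//` starts at the document root. Prefixing the expression with `.` keeps the search inside the element. A sketch on made-up markup:

```python
import lxml.html as lx

doc = lx.fromstring("""
<html><body>
<div id="a"><a href="/one">1</a></div>
<div id="b"><a href="/two">2</a></div>
</body></html>
""")

first_div = doc.xpath('//div[@id="a"]')[0]

# "//a" ignores the context element and matches links in the entire document.
whole_document = first_div.xpath('//a/@href')

# ".//a" restricts the search to descendants of first_div.
inside_only = first_div.xpath('.//a/@href')

print(whole_document, inside_only)
```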

## Putting It All Together
When crafting your XPath, you often combine these features:
1. Start with a known node or wildcard.
2. Use predicate filters on attributes/text/position.
3. Employ axes to move to siblings, parents, children, etc.
4. Apply string functions or partial matches to handle real-world HTML quirks.

**Example**:
```python
# 1. Find a table containing a <td> with text "Genre"
# 2. Then locate all <a> within that table (in any row).
genre_links = html.xpath('//table[tr/td[contains(text(), "Genre")]]//a/@href')

# 3. Move from a known <b> element's text to the next text node.
release_date_text = html.xpath('//b[text()="Script Date"]/following-sibling::text()[1]')

# 4. Use starts-with() to filter anchor links that begin with "/scripts".
script_links = html.xpath('//a[starts-with(@href, "/scripts")]/@href')
```

---

By understanding these XPath strategies and methods, you can become more agile in navigating and extracting data from HTML documents that vary in layout or contain dynamic elements. Always remember to inspect the **actual** HTML returned by your HTTP client (e.g., `requests`) rather than relying solely on the rendered DOM in a browser, which may include additional transformations or scripts.

```


Consider the website [`https://imsdb.com/`](https://imsdb.com/). We want to scrape the links in the _Genre_ sidebar. Using DevTools, we can inspect this element and find that it is a child of `table/tbody`.

In [1]:
import requests
import lxml.html as lx
result = requests.get('https://imsdb.com/')
result.raise_for_status()
html = lx.fromstring(result.text)
html.xpath('//table/tbody')

[]

The list is empty. Copying the XPath from DevTools doesn't help either. Apparently, the HTML that `requests` returns is not the same as the one rendered by Google Chrome. We can inspect what is actually returned by checking the _Network_ tab.

While reloading the page to monitor the traffic in the _Network_ tab, we see that the HTML (at the smaller dimensions that can be set in the upper left corner) is now rendered for mobile use. The sidebar with the _Genre_ section is missing. Going back to inspecting the HTML, we see that the genres are now listed in a dropdown menu. The dropdown menu does not contain links; those are generated by a script.

Back to the _Network_ tab! Cycling through all requests, we find that the HTML is returned as `Document`, but no other data is transferred. Let's inspect the request and navigate to its _Response_ tab. We can search it for the string `Genres`. We find three instances, but all of them set up the script and none contains the links. Dealing with scripts was presented in today's lecture, but here it is simpler to adjust the dimensions (upper left corner) to something larger (e.g., _Nest Hub Max_).

A new request now returns different HTML. Searching for the string `Genre` now finds the corresponding table; it sits in a different element structure than in our first attempt. However, ...

In [None]:
html.xpath('//td[text()="Genre"]')

Some whitespace characters prevent us from finding the element! (Direct inspection of `result.text` shows that the cell text is `"Genre\r\n"`!)

In [None]:
html.xpath('//td[contains(text(), "Genre")]') 

Now, how to get the correct anchors? 

In [None]:
html.xpath('//table[tr/td[contains(text(), "Genre")]]/tr//a/@href') 

Perfect! Now, consider the [_Interstellar_](https://imsdb.com/Movie%20Scripts/Interstellar%20Script.html) page. We want to retrieve the movie release year. After inspecting the HTML (it might not be accurate!), we find that the date is part of the content of a `<td>` element, but it is cluttered among a variety of other elements.

In [None]:
result = requests.get('https://imsdb.com/Movie%20Scripts/Interstellar%20Script.html')
result.raise_for_status()

In [None]:
html = lx.fromstring(result.text)

In [None]:
html.xpath('//table[@class="script-details"]//td/text()') 

It's there, but how do we retrieve the correct text?

In [None]:
html.xpath('//b[text() = "Script Date"]/following-sibling::text()[1]')

From here, we will use regular expressions to extract the digits of the year. We will learn about regular expressions next week. In the meantime, become an xpath [ninja](https://topswagcode.com/xpath/)!

# Beautiful Soup

## Tutorial: Beautiful Soup Basics


```markdown
# Tutorial: Beautiful Soup Basics

Beautiful Soup is a Python library designed for quick turnaround projects like screen-scraping. Here's a brief step-by-step tutorial that outlines how to get started:

---

## 1. Installation

Before using Beautiful Soup, you need to install it. If you haven’t already:

```bash
pip install beautifulsoup4
```

---

## 2. Importing and Creating a Soup Object

To begin parsing HTML, import both `requests` (or another HTTP library) and `BeautifulSoup`:

```python
import requests
from bs4 import BeautifulSoup

# Fetch a webpage
url = "https://example.com"
response = requests.get(url)

# Create a BeautifulSoup object from the HTML text
soup = BeautifulSoup(response.text, "html.parser")

# Alternatively, parse an HTML string directly
html_doc = "<html><body><p>Hello!</p></body></html>"
soup_from_string = BeautifulSoup(html_doc, "html.parser")
```

A `BeautifulSoup` object (`soup` in these examples) acts as a structured representation of your HTML. You can navigate and search it like a tree.

---

## 3. Parsing and Navigating the HTML Tree

### 3.1 Accessing Elements by Tag

```python
# Access the first <title> tag found in the document
page_title = soup.title

# Access the first <body> tag
page_body = soup.body

# Access the first <p> tag
paragraph = soup.p
```

Remember that these direct accesses (`soup.p`, `soup.title`) only give you **the first** occurrence of that tag.

### 3.2 Going Down the Tree

- `.contents` gives a list of **all children** of a tag.
- `.children` is an **iterator** over those children.

```python
# If we want to see what's inside <body>
print(soup.body.contents)

# Or iterate over the children:
for child in soup.body.children:
    print(child)
```

### 3.3 Going Up the Tree

If you have a tag, you can find its parent:

```python
# Access a tag's parent
if soup.p:
    parent_of_p = soup.p.parent
    print("Parent of <p>:", parent_of_p.name)
```

---

## 4. Finding Elements

### 4.1 `find_all()`

`find_all()` returns **all** matches of your query:

```python
# All paragraph tags
paragraphs = soup.find_all("p")

# All tags that have class="important"
important_tags = soup.find_all(class_="important")
```

### 4.2 `find()`

`find()` returns **the first** match:

```python
# First <p> tag with class "important"
first_important_p = soup.find("p", class_="important")
```

### 4.3 CSS Selectors via `.select()`

Use `.select()` to match elements using CSS selectors (like in a browser’s DevTools):

```python
# All <p> tags
paragraphs_css = soup.select("p")

# A <p> tag with id="best-paragraph"
best_paragraph = soup.select("p#best-paragraph")

# A <p> tag with class="important"
important_paragraphs = soup.select("p.important")
```
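
Note that `.select()` always returns a list, even for an id selector. When a single element is expected, `.select_one()` returns the first match or `None`. A short sketch on a made-up snippet:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<p id="best-paragraph">First</p><p class="important">Second</p>',
    "html.parser",
)

# select() returns a list, so the element still has to be indexed.
as_list = soup.select("p#best-paragraph")

# select_one() returns the first matching tag directly.
as_tag = soup.select_one("p#best-paragraph")

# When nothing matches, select_one() returns None instead of raising.
missing = soup.select_one("p#does-not-exist")

print(len(as_list), as_tag.get_text(), missing)
```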

---

## 5. Extracting Text and Attributes

### 5.1 `.get_text()`

`get_text()` returns **all** text within a tag (including its descendants), stripped of HTML tags:

```python
body_text = soup.body.get_text()
print(body_text)
```
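
By default the text pieces are concatenated as-is, which can glue words together or keep stray whitespace. The `separator` and `strip` arguments of `get_text()` help with that, as this sketch on a made-up snippet shows:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p> Hello </p><p>world</p></div>", "html.parser")

# Default: strings are joined without a separator and without trimming.
raw_text = soup.div.get_text()

# Trim each piece and join the non-empty ones with a single space.
clean_text = soup.div.get_text(separator=" ", strip=True)

print(repr(raw_text), repr(clean_text))
```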

### 5.2 Accessing Attributes

Tags can be treated like a dictionary to get/set attributes:

```python
some_link = soup.find("a")
if some_link:
    href_value = some_link["href"]   # Might raise KeyError if "href" missing
    safer_href = some_link.get("href", "No link available")
    print("Link:", safer_href)
```

You can also view the entire attributes dictionary:

```python
print(some_link.attrs)  # e.g., {"href": "https://example.com"}
```

---

## 6. Advanced Tasks

Below are some advanced tasks you can perform with Beautiful Soup:

1. **Filtering by Function**  
   You can pass a function to `find_all()` or `find()` to define a custom matching condition.

2. **Modifying the Parse Tree**  
   You can insert, delete, or reorder tags within the parsed structure.

3. **Handling Non-Standard Documents**  
   Beautiful Soup is forgiving of poorly formed HTML, but you might need to experiment with different parsers (e.g., `"html5lib"`).
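
The first two tasks can be sketched as follows, on a made-up snippet. `is_p_with_class` is an illustrative helper name; `has_attr()` and `decompose()` are regular Beautiful Soup methods.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<div><p class="a">x</p><p>y</p><script>var z;</script></div>',
    "html.parser",
)

def is_p_with_class(tag):
    """Custom filter: match <p> tags that carry a class attribute."""
    return tag.name == "p" and tag.has_attr("class")

# Filtering by function: find_all() calls the function on every tag.
matches = soup.find_all(is_p_with_class)

# Modifying the tree: remove every <script> element before extracting text.
for script in soup.find_all("script"):
    script.decompose()

remaining_text = soup.div.get_text()
print([m.get_text() for m in matches], remaining_text)
```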

---

## 7. Output and Debugging

### 7.1 `.prettify()`

```python
print(soup.prettify())      # Prints the entire HTML in a nicely formatted way
print(soup.body.prettify()) # Prints just the <body> section
```

This helps you understand the structure that Beautiful Soup sees, which may differ from the raw HTML if there are minor errors.

---

## 8. Putting It All Together: Example

```python
import requests
from bs4 import BeautifulSoup

# Suppose there's a page containing multiple <div class="product"> sections
url = "https://example.com/products"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# Step 1: Locate all the product containers
product_divs = soup.find_all("div", class_="product")

# Step 2: Extract details from each product
for product in product_divs:
    title_tag = product.find("h2")
    price_tag = product.find(class_="price")

    if title_tag and price_tag:
        product_name = title_tag.get_text().strip()
        product_price = price_tag.get_text().strip()
        print(f"Product: {product_name} | Price: {product_price}")
```

This small script:
1. Downloads a webpage with `requests`.
2. Parses it with `BeautifulSoup`.
3. Finds multiple product `<div>` elements.
4. Extracts their names and prices, printing them in a loop.

---

**Congratulations!** You now have a quick overview of how Beautiful Soup “thinks” and how you can navigate, search, and extract data from HTML documents in Python. For more complex tasks—like dealing with JavaScript-heavy pages—consider using tools such as `Selenium` or `Playwright` to render dynamic content before scraping.

```


## Beautiful Soup Example 1


```markdown
# Beautiful Soup

Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for navigating, searching, and modifying the parse tree.

Official documentation: [Beautiful Soup Docs](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)

---

## 1. Making the Soup

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# An example HTML snippet we want to parse:
page = """
<html>
<head>
    <title>This is the Title!</title>
</head>
<body>
    <p id="best-paragraph">This is a paragraph!</p>
    <p class="important">This is another paragraph! &#127790;</p>
    <p>Visit <a href="https://pudding.cool">The Pudding</a>.</p>
    <span class="important">This is a span, it comes with a taco &#127790;</span>
</body>
</html>
"""

# Creating a BeautifulSoup object from the HTML string
page_soup = BeautifulSoup(page, "html.parser")
print(type(page_soup))  # -> <class 'bs4.BeautifulSoup'>
```

Beautiful Soup transforms the document into a tree of Python objects. Each element or piece of text becomes a node in this parse tree, accessible via the `page_soup` object.

---

## 2. Navigating the Tree

### Navigating by Tag Type

```python
page_soup.head  # returns the <head> element
page_soup.head.title  # returns the <title> element within <head>
page_soup.p  # returns the first <p> element in the document
```

### Going Down: Children

A tag’s children are the tags and strings nested within it.

- **`.contents`** returns a **list** of a tag’s children.
- **`.children`** returns an **iterator** for looping through the children.

```python
print(page_soup.body.contents)
# This returns a list containing the <p> and <span> tags inside <body>, plus the newline strings between them.

for child in page_soup.body.children:
    print(child)
# Iterates over each child element (including text nodes or whitespace) in <body>.
```
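
Because whitespace between tags shows up as string children, it is often handy to keep only the tag children. A minimal sketch, using a smaller made-up snippet than the page above:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

snippet = "<body>\n<p>one</p>\n<span>two</span>\n</body>"
soup = BeautifulSoup(snippet, "html.parser")

# All children, including the "\n" strings between the tags.
all_children = list(soup.body.children)

# Keep only Tag objects.
tags_only = [child for child in soup.body.children if isinstance(child, Tag)]

# Equivalent: find_all() limited to direct children.
direct_tags = soup.body.find_all(recursive=False)

print(len(all_children), [t.name for t in tags_only])
```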

### Going Up: Parents

You can access a tag’s parent with the **`.parent`** attribute.

```python
print(page_soup.title.parent)
# Returns the parent of <title>, which is <head> in this case.
```

---

## 3. Searching the Tree

Beautiful Soup provides methods to search for elements by tag, attributes, or custom filters.

### `.find_all()`

- **`.find_all()`** looks through a tag’s descendants and returns **all** matches.

```python
# Find all <p> tags
all_paragraphs = page_soup.find_all(name="p")
print(all_paragraphs)

# Find all tags with a specific id
best_paragraph = page_soup.find_all(id="best-paragraph")

# Find all tags with a specific class
important_tags = page_soup.find_all(class_="important")  # class_ (underscore) is required for 'class'
```

You can also pass a dictionary for arbitrary attributes:

```python
page_soup.find_all(attrs={"class": "important"})
```

### `.find()`

- **`.find()`** returns the **first** match in the document.

```python
first_title_tag = page_soup.find(name="title")
first_important_tag = page_soup.find(class_="important")
```

### CSS Selector: `.select()`

- **`.select()`** takes a **CSS selector** string and returns a list of all matching elements.

```python
page_soup.select("p")                  # All <p> tags
page_soup.select("p#best-paragraph")   # <p> tag with id="best-paragraph"
page_soup.select("p.important")        # <p> tag with class="important"
```

---

## 4. Contents and Attributes

### `.get_text()`

- **`.get_text()`** returns all the text beneath a tag (and its descendants) as a single string.

```python
print(page_soup.body.get_text())
```

### Accessing Attributes

Tags in Beautiful Soup can be treated like dictionaries to access attributes.

```python
example_p = page_soup.p
print(example_p["id"])       # If <p> has an id attribute
print(example_p.get("id"))   # Safer access, returns None if not present
```

You can view all attributes of a tag using **`.attrs`**:

```python
print(example_p.attrs)  # A dictionary of all attributes on the <p> tag
```
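
One detail that can surprise: Beautiful Soup treats `class` as a multi-valued attribute and returns it as a list, while single-valued attributes such as `id` come back as strings. A sketch on a made-up tag:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="important big" id="x">hi</p>', "html.parser")
tag = soup.p

id_value = tag["id"]          # a plain string
class_value = tag["class"]    # a list with one entry per class name
missing = tag.get("href")     # None, because <p> has no href attribute

print(id_value, class_value, missing)
```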

---

## 5. Output

### `.prettify()`

- **`.prettify()`** will return a nicely formatted (indented) string representation of the soup or a tag.

```python
print(page_soup.prettify())      # Pretty-print the entire document
print(page_soup.body.prettify()) # Pretty-print just the <body> tag
```

This is useful for inspecting the structure of the parsed HTML in a more human-readable form.

---

## Example Summary

```python
import requests
from bs4 import BeautifulSoup

# Suppose we retrieve a page with requests (instead of a raw string).
url = "https://example.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# 1. Find elements by tag
links = soup.find_all("a")

# 2. Use CSS selectors
div_items = soup.select("div.item")

# 3. Extract text
for item in div_items:
    print(item.get_text())

# 4. Access attributes
for link in links:
    href = link.get("href")
    print("Link:", href)
```

By using Beautiful Soup’s intuitive methods, you can parse HTML documents, navigate their structure, and extract exactly the information you need.
```


# Example 2: National Weather Service

Let's scrape the [National Weather Service](https://weather.gov/) for the weather forecast of Davis, CA.

## Annotated Example: Scraping the National Weather Service


```markdown
# Scraping the National Weather Service: Davis, CA Forecast

In this example, we’ll demonstrate how to:
1. Send an HTTP request to the National Weather Service page for Davis, CA.
2. Parse the HTML response with Beautiful Soup.
3. Extract specific weather data (period names, short descriptions, temperatures, and detailed descriptions).
4. Assemble the data into a pandas DataFrame.

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# ----------------------------------------------------------------------------
# Step 1: Identify the Target URL
# ----------------------------------------------------------------------------
# This URL points to the 7-day forecast for Davis, CA on the National Weather Service website.
url = "https://forecast.weather.gov/MapClick.php?lat=38.54669000000007&lon=-121.74456999999995#.Y9fY5vv565t"

# ----------------------------------------------------------------------------
# Step 2: Fetch the Page Content
# ----------------------------------------------------------------------------
# We use the requests library to get the page. 
# Calling 'raise_for_status()' ensures we raise an exception for any HTTP errors (e.g., 404, 500).
response = requests.get(url)
response.raise_for_status()

# ----------------------------------------------------------------------------
# Step 3: Parse HTML with BeautifulSoup
# ----------------------------------------------------------------------------
# Parse the retrieved HTML text, building a Soup object for further navigation and search.
html_soup = BeautifulSoup(response.text, "html.parser")

# ----------------------------------------------------------------------------
# Step 4: Identify and Extract the "Seven-Day Forecast" Section
# ----------------------------------------------------------------------------
# The page contains a <div> tag with id="seven-day-forecast-container", 
# which holds the daily weather forecast data.
seven_day = html_soup.find(id="seven-day-forecast-container")
print(seven_day.prettify())  # This pretty-prints the found HTML (useful for debugging)

# ----------------------------------------------------------------------------
# Step 5: Extract the Forecast Period Names
# ----------------------------------------------------------------------------
# We look for <p> tags with class="period-name". 
# Each period typically contains a label like "Tonight", "Monday", "Monday Night", etc.
period_names = seven_day.find_all("p", class_="period-name")
# We then gather the text content of each matching tag into a list.
period = [name.get_text() for name in period_names]
print(period)

# ----------------------------------------------------------------------------
# Step 6: Extract the Short Weather Descriptions
# ----------------------------------------------------------------------------
# The short weather descriptions are contained in <p> tags with class="short-desc".
descs = seven_day.find_all("p", {"class": "short-desc"})
description = [desc.get_text() for desc in descs]
print(description)

# ----------------------------------------------------------------------------
# Step 7: Extract the Temperatures
# ----------------------------------------------------------------------------
# Temperatures carry a class such as "temp temp-high" or "temp temp-low".
# We use a CSS selector that matches any <p> tag whose class attribute *contains* the string "temp".
temps = seven_day.select("p[class*='temp']")
temperature = [temp.get_text() for temp in temps]
print(temperature)

# ----------------------------------------------------------------------------
# Step 8: Extract Detailed Descriptions
# ----------------------------------------------------------------------------
# The images within each tombstone-container have a "title" attribute 
# that provides an extended weather description. 
# We use a CSS selector to find <img> elements inside <div class="tombstone-container">.
images = seven_day.select("div.tombstone-container img")

# "attrs" is a dictionary of HTML attributes. We extract the "title" value from each <img>.
details = [image.attrs["title"] for image in images]
print(details)

# ----------------------------------------------------------------------------
# Step 9: Clean Up the Detailed Descriptions
# ----------------------------------------------------------------------------
# The text might look like "Tonight: Clear. Low around 39. Northwest wind 7 to 9 mph."
# Notice how the portion before the colon might be a time period label like "Tonight". 
# We only want the forecast description part.
example_detail = details[1]
print(example_detail)

# The .partition(":") method splits the string into three parts:
#   1) the substring before the colon
#   2) the colon itself
#   3) the substring after the colon
# Here, we only want the part after the colon (index [2]) and then .strip() to remove extra whitespace.
cleaned_detail = example_detail.partition(":")[2].strip()
print(cleaned_detail)

# We can apply this to every item in the list to remove the period/time label:
new_details = [detail.partition(":")[2].strip() for detail in details]
print(new_details)

# ----------------------------------------------------------------------------
# Step 10: Assemble the Data into a Pandas DataFrame
# ----------------------------------------------------------------------------
# We create a dictionary that maps column names to the lists we collected,
# then pass that dictionary to pd.DataFrame.
weather = pd.DataFrame({
    "Period": period,
    "Description": description,
    "Temperature": temperature,
    "Detail": new_details
})

print(weather)
```

## Explanation

1. **Get the webpage**:  
   We use the `requests` library to fetch the page’s HTML. The URL points to the specific latitude/longitude location for Davis, CA on the National Weather Service site.

2. **Parse with Beautiful Soup**:  
   `BeautifulSoup(response.text, "html.parser")` takes the raw HTML and converts it into a soup object, allowing us to search and navigate the HTML.

3. **Find the “seven-day-forecast-container”**:  
   This `<div>` element groups the day-by-day forecast information.  

4. **Extracting data**:  
   - **Period**: Inside each day’s weather block, the name of the period (like “Today,” “Tonight,” or “Tuesday”) appears in a `<p>` with class “period-name.”  
   - **Short Description**: Found in `<p class="short-desc">`.  
   - **Temperature**: These appear in `<p>` elements whose class contains “temp” (e.g., “temp temp-high” or “temp temp-low”).  
   - **Detailed Description**: Found within the `title` attribute of `<img>` elements. That text often includes the period name plus forecast details, so we split the string at the colon (`:`) to separate them.

5. **DataFrame Construction**:  
   We gather the extracted data in lists and convert them to a tidy DataFrame. The resulting table has columns for the period, short weather description, temperature, and a fully detailed forecast description.

This approach shows how to use **Beautiful Soup** effectively for structured web scraping tasks. For dynamic pages (relying heavily on JavaScript), additional tools like Selenium or Playwright might be needed. However, many sites (like this NWS page) provide static HTML that you can scrape directly.
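
One caveat with collecting four separate lists: `pd.DataFrame` raises a `ValueError` if the lists differ in length, which can happen when one forecast block lacks an element. A possible alternative, sketched below on hypothetical markup that only imitates the page structure, is to extract one record per `tombstone-container`. `text_or_none` is an illustrative helper, not a library function.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the second block deliberately has no <img>.
sample = """
<div id="seven-day-forecast-container">
  <div class="tombstone-container">
    <p class="period-name">Tonight</p>
    <p><img src="a.png" title="Tonight: Clear, with a low around 39."></p>
    <p class="short-desc">Clear</p>
    <p class="temp temp-low">Low: 39 F</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Monday</p>
    <p class="short-desc">Sunny</p>
    <p class="temp temp-high">High: 60 F</p>
  </div>
</div>
"""

soup = BeautifulSoup(sample, "html.parser")

def text_or_none(tag):
    """Return the trimmed text of a tag, or None if the tag was not found."""
    return tag.get_text(" ", strip=True) if tag else None

rows = []
for box in soup.select("div.tombstone-container"):
    img = box.find("img")
    title = img.get("title", "") if img else ""
    rows.append({
        "Period": text_or_none(box.find("p", class_="period-name")),
        "Description": text_or_none(box.find("p", class_="short-desc")),
        "Temperature": text_or_none(box.select_one("p[class*='temp']")),
        # Keep only the part after the first colon; None if there is no title.
        "Detail": title.partition(":")[2].strip() or None,
    })

for row in rows:
    print(row)
```

The resulting list of dictionaries can be passed to `pd.DataFrame(rows)`; missing values then appear as `None` in the affected cells instead of shifting the other columns.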
```
