# HTML for Web Scraping
## What is HTML
HTML (HyperText Markup Language) is the standard language used to structure content on the web. When you load a web page, your browser interprets HTML to display the layout, text, images, and other content.

## Basic Structure of an HTML Document
```html
<!DOCTYPE html>
<html>
  <head>
    <title>Sample Page</title>
  </head>
  <body>
    <h1>Welcome!</h1>
    <p>This is a sample page.</p>
  </body>
</html>
```
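
To make the structure concrete, here is a minimal sketch that walks the sample page with `html.parser` from Python's standard library and records how deeply each tag is nested. The class name `OutlineParser` and the `SAMPLE` constant are illustrative names, not part of any library. The depth counter assumes every tag is explicitly closed, which holds for this sample but not for void elements such as `<img>`.

```python
from html.parser import HTMLParser

# The sample page from this section, held as a string.
SAMPLE = """<!DOCTYPE html>
<html>
  <head>
    <title>Sample Page</title>
  </head>
  <body>
    <h1>Welcome!</h1>
    <p>This is a sample page.</p>
  </body>
</html>"""


class OutlineParser(HTMLParser):
    """Records each opening tag together with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # list of (depth, tag) tuples

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag, in document order.
        self.outline.append((self.depth, tag))
        self.depth += 1

    def handle_endtag(self, tag):
        # Assumes well-formed input where every tag is closed.
        self.depth -= 1


parser = OutlineParser()
parser.feed(SAMPLE)
parser.close()

# Print an indented outline of the document.
for depth, tag in parser.outline:
    print("  " * depth + tag)
```

The printed outline shows `html` at the top level, `head` and `body` one level down, and `title`, `h1`, and `p` nested inside them. The `<!DOCTYPE html>` declaration does not appear because it is not an element.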

## Key Elements
- `<!DOCTYPE html>`: Declares the document type.
- `<html>`: Root element of the page.
- `<head>`: Contains metadata, styles, and scripts.
- `<body>`: Contains visible content.

## Common HTML Tags Used in Scraping
The tags below are the ones most frequently encountered when scraping:

| Tag                | Purpose                            |
|--------------------|-------------------------------------|
| div                | Section or container for content    |
| span               | Inline container                    |
| a                  | Anchor tag for hyperlinks           |
| img                | Displays images (uses src)          |
| ul, ol, li         | Lists and list items                |
| table, tr, td      | Tables and cells                    |
| h1 to h6           | Headings (levels 1 to 6)            |
| p                  | Paragraph text                      |
| form, input, button| Form elements                       |

## Attributes in HTML
HTML tags often include attributes that provide metadata or instructions:

```html
<a href="https://example.com" class="nav-link">Visit Site</a>
```

## Common Attributes
- `href`: Hyperlink reference
- `src`: Image or media source
- `class`: CSS class (commonly used for scraping)
- `id`: Unique identifier for an element
- `name`: Often used in form elements
- `type`: Used in `<input>` tags to set the kind of control (for example, `text` or `checkbox`)
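
As a rough illustration of how a parser exposes attributes, the sketch below feeds the anchor tag from this section to the standard library's `html.parser` and collects each element's attributes into a dictionary. `AttributeCollector` is a hypothetical helper name chosen for this example.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Stores every opening tag along with its attributes."""

    def __init__(self):
        super().__init__()
        self.elements = []  # list of (tag, attributes_dict) tuples

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.elements.append((tag, dict(attrs)))


collector = AttributeCollector()
collector.feed('<a href="https://example.com" class="nav-link">Visit Site</a>')
collector.close()

tag, attributes = collector.elements[0]
print(tag)                 # the tag name
print(attributes["href"])  # the link target
print(attributes["class"]) # the CSS class
```

The same name/value pairs are what scraping libraries match against when you search by `class`, `id`, or `href`.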

## Navigating HTML Structure in Scraping
Scraping tools like BeautifulSoup and Selenium use tag names and attributes to locate elements.

Example targets, assuming `soup` is a parsed BeautifulSoup object:
- By tag: `soup.find('div')`
- By class: `soup.find('div', class_='product')`
- By ID: `soup.find(id='header')`
- By attribute: `soup.find('a', {'href': True})`
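
The lookups above can be tried end to end with a small sketch like the following. The HTML fragment is invented for illustration, and the built-in `"html.parser"` backend is an assumption made to keep dependencies minimal. `find_all`, which does not appear in the list above, is the BeautifulSoup method that returns every match instead of only the first.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment used only for this demonstration.
HTML = """
<div id="header"><a href="/home">Home</a></div>
<div class="product">Widget</div>
<div class="product">Gadget</div>
<a>No link target</a>
"""

soup = BeautifulSoup(HTML, "html.parser")

first_div = soup.find("div")                        # first <div> in document order
first_product = soup.find("div", class_="product")  # first <div> with class "product"
header = soup.find(id="header")                     # element whose id is "header"
first_link = soup.find("a", {"href": True})         # first <a> that has an href at all

# find_all returns a list of every match rather than just the first one.
all_products = soup.find_all("div", class_="product")

# find returns None when nothing matches, so check before using the result.
missing = soup.find("table")

print(first_product.get_text())
print(first_link["href"])
print([p.get_text() for p in all_products])
print(missing)
```

Because `find` returns `None` on a miss, calling `.get_text()` on an unchecked result is a common source of `AttributeError` in scrapers.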

## Understanding Nested Elements
HTML is hierarchical: tags can contain other tags. In the snippet below, the `<div>` is the parent of the `<h2>` and `<p>` elements.

```html
<div class="article">
  <h2>Title</h2>
  <p>This is a summary.</p>
</div>
```
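
A minimal sketch of how this nesting is typically used in BeautifulSoup: locate the container first, then search inside it, so that matches elsewhere on the page are ignored. The variable names are illustrative, and `"html.parser"` is again an assumed parser choice.

```python
from bs4 import BeautifulSoup

HTML = """
<div class="article">
  <h2>Title</h2>
  <p>This is a summary.</p>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Step 1: find the parent container.
article = soup.find("div", class_="article")

# Step 2: search only within that container.
title = article.find("h2").get_text(strip=True)
summary = article.find("p").get_text(strip=True)

# Direct child tags only (recursive=False skips deeper descendants).
child_tags = [child.name for child in article.find_all(True, recursive=False)]

# Moving upward: .parent returns the enclosing element.
parent_classes = article.find("h2").parent["class"]

print(title)
print(summary)
print(child_tags)
print(parent_classes)
```

Note that BeautifulSoup returns `class` as a list, since an element can carry several classes at once.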


## Practical Tips
- Use browser DevTools (Right-click > Inspect) to examine HTML structure.
- Target elements with unique `id` or descriptive `class` attributes.
- Use the nesting of tags (parent and child relationships) to narrow a search to a specific part of a page.

## Useful Python Libraries for HTML Parsing
- `requests`: For sending HTTP requests
- `BeautifulSoup`: For parsing and traversing HTML (installed as `beautifulsoup4`, imported from `bs4`)
- `lxml`: Fast parser for large documents
- `Selenium`: For interacting with JavaScript-rendered pages
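
One way these libraries might fit together is sketched below: `requests` downloads the page and `BeautifulSoup` parses it. The function names `fetch_html` and `extract_links`, the 10-second timeout, and the default `"html.parser"` backend are all assumptions made for this example. Passing `parser="lxml"` should work as a faster alternative if `lxml` is installed.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def extract_links(html: str, parser: str = "html.parser") -> list[str]:
    """Return the href of every <a> element that has one."""
    soup = BeautifulSoup(html, parser)
    return [a["href"] for a in soup.find_all("a", href=True)]
```

A typical call would be `extract_links(fetch_html("https://example.com"))`. Keeping the download and the parsing in separate functions lets you test the parsing step on saved HTML without touching the network. Pages that build their content with JavaScript will not expose that content this way, which is where Selenium comes in.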