# Basics of HTML and CSS

A website is essentially a collection of files stored on a server and served over the internet. The most important of these files are the **HTML** files. HTML stands for **HyperText Markup Language**, the standard markup language for documents designed to be displayed in a web browser. It is usually paired with technologies such as **Cascading Style Sheets (CSS)** and scripting languages such as JavaScript.

## HyperText Markup Language (HTML)

HTML describes the structure of a web page as a series of elements. Elements tell the browser what each piece of content is, such as a heading, a paragraph, or a table, and the browser uses that information to decide how to display it. Elements are written using tags.

### Parts of an HTML Element

An HTML element is defined by a start tag, some content, and an end tag:

```html
<tagname>Content goes here...</tagname>
```

Here's an example of an HTML element:

```html
<p>This is a paragraph</p>
```

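An HTML element is everything from the start tag to the end tag. Some elements cannot have any content; these are called **empty elements** (the HTML specification calls them **void elements**). An empty element consists of a start tag only and has no end tag. The trailing slash shown below is optional in HTML, but many authors keep it for readability.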

```html
<emptytagname />
```

Here's an example of an empty element:

```html
<img src="images/firefox-icon.png" alt="My test image" />
```

Have you noticed the `src` and `alt` keywords? These are called **attributes**. Attributes contain extra information about the element that you don't want to appear in the actual content. Here, `src` specifies the URL of the image, and `alt` provides alternative text for the image.
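
Since this guide is aimed at web scraping, here is a minimal sketch of how a program sees those attributes. It uses Python's built-in `html.parser` module; the class name `ImageAttributeParser` is made up for illustration, and a real project may well use a different parsing library.

```python
from html.parser import HTMLParser


class ImageAttributeParser(HTMLParser):
    """Collects the attributes of every <img> element it encounters."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "img":
            self.images.append(dict(attrs))


parser = ImageAttributeParser()
parser.feed('<img src="images/firefox-icon.png" alt="My test image" />')
parser.close()

print(parser.images)
# [{'src': 'images/firefox-icon.png', 'alt': 'My test image'}]
```

The attribute names and values come back exactly as written in the tag, which is why knowing what `src` and `alt` mean helps you decide which one holds the data you are after.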

### Nesting Elements

HTML elements can be nested inside other elements. This means that an element can contain another element. For example:

```html
<p>This is a <strong>paragraph</strong>.</p>
```

The `<strong>` element is nested inside the `<p>` element. The `<strong>` element marks text as having strong importance. By default, browsers display its content in bold.
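
Nesting matters to a scraper because the same text can sit at different depths. The sketch below, again using Python's built-in `html.parser`, keeps a stack of open tags and records where each piece of text lives. The class name `NestingParser` is hypothetical, and the code assumes well-formed markup in which tags close in reverse order.

```python
from html.parser import HTMLParser


class NestingParser(HTMLParser):
    """Records each piece of text together with the elements that enclose it."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open elements
        self.text_paths = []  # (path, text) pairs

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Assumption: tags are closed in the reverse order they were opened.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        self.text_paths.append((" > ".join(self.open_tags), data))


parser = NestingParser()
parser.feed("<p>This is a <strong>paragraph</strong>.</p>")
parser.close()

for path, text in parser.text_paths:
    print(f"{path}: {text!r}")
# p: 'This is a '
# p > strong: 'paragraph'
# p: '.'
```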

### Basic Structure of an HTML Document

A meaningful web page is built on top of an organized HTML document. By following a standard structure, you can make your HTML easier to understand and maintain. Let's take a look at this example:

```html
<!doctype html>
<html lang="en-US">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width" />
    <title>My test page</title>
  </head>
  <body>
    <img src="images/firefox-icon.png" alt="My test image" />
    <p>The picture above is a test image.</p>
  </body>
</html>
```

Now, let's explain each part of the document:

- `<!doctype html>` is the document type declaration. It does not name a version; it tells the browser to render the page in standards mode, and it is the declaration used by all modern (HTML5) documents.

- `<html>` is the root element of an HTML page. In our example, it has an attribute called `lang` that specifies the language of the document. In this case, it is set to `en-US` (American English).

- `<head>` contains meta-information about the document, such as its title and links to its CSS and JavaScript files.

  - `<meta charset="utf-8" />` specifies the character encoding for the HTML document. UTF-8 is the standard character encoding for the web.

  - `<meta name="viewport" content="width=device-width" />` sets the width of the page to follow the screen-width of the device (which will vary depending on the device).

  - `<title>` sets the title of the document, which is displayed in the browser's title bar or in the page's tab.

- `<body>` contains the visible page content. This is where the text, images, and other content are displayed.

  - `<img>` is an empty element that embeds an image in the document. It has two attributes: `src` and `alt`. The `src` attribute specifies the URL of the image, and the `alt` attribute provides alternative text for the image.

  - `<p>` defines a paragraph. It contains text that will be displayed in the document.
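
As a rough illustration of why this structure is useful, the sketch below reads the document's language and title out of the example page using Python's built-in `html.parser`. The class name `DocumentSummaryParser` and the variable `DOCUMENT` are made up for this example.

```python
from html.parser import HTMLParser

DOCUMENT = """<!doctype html>
<html lang="en-US">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width" />
    <title>My test page</title>
  </head>
  <body>
    <img src="images/firefox-icon.png" alt="My test image" />
    <p>The picture above is a test image.</p>
  </body>
</html>
"""


class DocumentSummaryParser(HTMLParser):
    """Extracts the lang attribute of <html> and the text of <title>."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            self.lang = dict(attrs).get("lang")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <title> and </title>.
        if self._in_title:
            self.title += data


parser = DocumentSummaryParser()
parser.feed(DOCUMENT)
parser.close()

print(parser.lang)   # en-US
print(parser.title)  # My test page
```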

## Cascading Style Sheets (CSS)

CSS is a style sheet language used for describing the presentation of a document written in HTML. CSS describes how elements should be rendered on screen, on paper, in speech, or on other media.

### CSS Syntax

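CSS is a rule-based language. You define rules that specify styles for particular elements or groups of elements. In its simplest form, a rule has three parts: a selector, a property, and a value. A rule can contain several `property: value` pairs, each ending with a semicolon.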

```css
selector {
  property: value;
}
```

- **Selector.** This is a pattern that picks out the HTML elements you want to style. For example, `p` is a selector that matches all `<p>` elements.

- **Property.** This is the aspect of the element that you want to change. For example, `color` is a property that changes the color of the text.

- **Value.** This is the value of the property. For example, `red` is a value that sets the color to red.
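
To make the three parts concrete, here is a small Python sketch that splits one simple rule into its selector and its property/value pairs. The function `split_simple_rule` is a hypothetical helper written for this page, not a real CSS parser: it assumes a single rule with no comments, no nesting, and no braces or semicolons inside values.

```python
def split_simple_rule(rule: str):
    """Split one simple CSS rule into (selector, {property: value})."""
    # Everything before the opening brace is the selector.
    selector, _, rest = rule.partition("{")
    # Everything before the closing brace is the declaration block.
    body = rest.rpartition("}")[0]

    declarations = {}
    for declaration in body.split(";"):
        if ":" not in declaration:
            continue  # skip the empty piece after the last semicolon
        prop, _, value = declaration.partition(":")
        declarations[prop.strip()] = value.strip()
    return selector.strip(), declarations


selector, declarations = split_simple_rule("p { color: red; }")
print(selector)      # p
print(declarations)  # {'color': 'red'}
```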

### How to Apply CSS

There are three ways to apply CSS to an HTML document:

1. **Inline CSS.** You can apply CSS directly to HTML elements using the `style` attribute in the tag. For example:

    ```html
    <p style="color: red;">This is a red paragraph.</p>
    ```

2. **Internal CSS.** You can use the `<style>` element in the `<head>` section of the document to apply CSS to the entire document. For example:

    ```html
    <!doctype html>
    <html lang="en-US">
      <head>
        <meta charset="utf-8" />
        <meta name="viewport" content="width=device-width" />
        <title>My test page</title>
        <style>
          p {
            color: red;
          }
        </style>
      </head>
      <body>
        <p>This is a red paragraph.</p>
      </body>
    </html>
    ```

3. **External CSS.** You can put your CSS in a separate file and link it from the `<head>` section of the document. For example, you can create a file called `styles.css` and reference it with a `<link>` element. Here's sample content for `index.html`:

    ```html
    <!doctype html>
    <html lang="en-US">
      <head>
        <meta charset="utf-8" />
        <meta name="viewport" content="width=device-width" />
        <title>My test page</title>
        <link rel="stylesheet" type="text/css" href="styles.css" />
      </head>
      <body>
        <p>This is a red paragraph.</p>
      </body>
    </html>
    ```

    And here's sample content for `styles.css`:

    ```css
    /* styles.css */
    p {
      color: red;
    }
    ```

## Why Learn HTML and CSS?

Learning HTML and CSS is crucial for web scraping because these languages form the foundation of how information is structured and presented on the web.

When you scrape a page, understanding HTML lets you recognize its underlying structure and locate the specific data you want to extract. Knowing CSS helps for a related reason: many scraping tools locate elements with CSS selectors, the same selector syntax that stylesheets use, so being able to read and write selectors lets your scripts target exactly the content you need.
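
As a minimal sketch of that idea, the code below pulls out the text of every element that the CSS selector `p.price` would match. It is hand-rolled with Python's built-in `html.parser` so that it runs without extra packages; scraping libraries typically offer selector support directly. The page fragment, the class names in it, and the `ClassTextParser` class are all invented for illustration, and the code does not handle an element nested inside another element of the same tag.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the markup and class names are made up.
PAGE = """
<div class="product">
  <h2 class="name">Widget</h2>
  <p class="price">19.99</p>
  <p class="note">Ships in 2 days</p>
</div>
"""


class ClassTextParser(HTMLParser):
    """Collects the text of elements with a given tag and class (like `p.price`)."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.matches = []
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        # The class attribute may list several space-separated class names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._capturing = True
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self.matches[-1] += data


parser = ClassTextParser("p", "price")
parser.feed(PAGE)
parser.close()

print(parser.matches)  # ['19.99']
```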

## Some Considerations

Having done web scraping for a while, I have learned a few nuances worth keeping in mind when scraping websites.

- **Messy HTML.** Some websites have messy HTML that can make it difficult to scrape the content you want. This can include missing tags, mismatched tags, inconsistent patterns, and other issues.

- **Dynamic Content.** Some websites use JavaScript to load content dynamically. This means that the content is not present in the HTML when the page is first loaded.

- **Infinite Scrolling.** Some websites use infinite scrolling to load more content as you scroll down the page. This can make it difficult to scrape all the content you want.
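
To illustrate the first point, the sketch below counts start and end tags in a deliberately messy fragment using Python's built-in `html.parser`, which does not raise an error on unclosed tags. The fragment, the `TagBalanceParser` class, and the `VOID_ELEMENTS` set are made up for this example. Note that HTML allows some end tags, such as `</li>` and `</p>`, to be omitted, so an imbalance does not necessarily mean the page is invalid; it only shows that a scraper cannot rely on every element being explicitly closed.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have an end tag, so they are ignored when counting.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class TagBalanceParser(HTMLParser):
    """Counts how often each non-void tag is opened and closed."""

    def __init__(self):
        super().__init__()
        self.opened = Counter()
        self.closed = Counter()

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.opened[tag] += 1

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:
            self.closed[tag] += 1


# Hypothetical messy fragment: one <li> and the <p> are never closed.
MESSY = "<ul><li>First<li>Second</li></ul><p>Unclosed paragraph"

parser = TagBalanceParser()
parser.feed(MESSY)
parser.close()

unbalanced = {
    tag: parser.opened[tag] - parser.closed[tag]
    for tag in parser.opened
    if parser.opened[tag] != parser.closed[tag]
}
print(unbalanced)  # {'li': 1, 'p': 1}
```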

## Learning Resources

If you're interested in learning more about HTML and CSS, here are some resources to get you started:

- [Mozilla Developer Network (MDN) Web Docs](https://developer.mozilla.org/en-US/)
- [The Odin Project - Foundations Course](https://www.theodinproject.com/paths/foundations/courses/foundations)
- [freeCodeCamp - Learn HTML and CSS](https://www.freecodecamp.org/news/learn-html-and-css-from-the-ceo-of-scrimba/)