# Why Understanding HTML and Markdown Is Important in Web Scraping

## 1. Why You Need to Understand HTML  
HTML is a markup language that defines the structure of web pages, so understanding the HTML structure is essential for accurately extracting data during web scraping.

- Understanding Tags and Hierarchy  
    - Knowing the roles of HTML tags such as `<div>`, `<span>`, `<table>`, `<ul>`, and `<li>` makes it easier to locate the desired data.  
    - You can use attributes such as `id`, `class`, `name`, and `href` to select specific elements.

- Using XPath and CSS Selectors  
    - BeautifulSoup (Python) selects elements with CSS selectors through `select()`, while rvest (R) accepts either CSS selectors or XPath expressions.  
    - Example: `soup.select('div.article > p')` (BeautifulSoup) or `html_nodes(doc, "div.article > p")` (rvest; `html_elements()` in rvest 1.0 and later)

- Handling Dynamic Web Pages (Understanding JavaScript Rendering)  
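    - Some websites load data with JavaScript after the page opens, so the HTML returned by a plain HTTP request does not contain it. Browser automation tools such as Selenium or Playwright render the page first, and you then parse the resulting HTML.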
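In practice you would simply call `soup.select('div.article > p')`. To make the selector's meaning concrete, the following is a minimal sketch using only Python's standard-library `html.parser`: it keeps a stack of open elements and collects a `<p>` only when its direct parent is a `<div>` with class `article`. It assumes reasonably well-formed HTML with explicit closing tags (for example, no omitted `</p>`), and the class and function names are illustrative, not part of any library.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they are not kept on the open-element stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class ChildParagraphCollector(HTMLParser):
    """Collect the text of <p> elements whose direct parent is <div class="article">."""

    def __init__(self):
        super().__init__()
        self.stack = []         # open elements as (tag, is_article_div) pairs
        self.results = []       # text of each matching <p>
        self.capture_at = None  # stack depth of the <p> currently being captured

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        parent = self.stack[-1] if self.stack else None
        # The "div.article > p" test: a <p> whose parent is a div carrying class "article".
        if tag == "p" and parent == ("div", True) and self.capture_at is None:
            self.capture_at = len(self.stack)
            self.results.append("")
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, tag == "div" and "article" in classes))

    def handle_endtag(self, tag):
        # Close the most recent open element with this name (and anything left open inside it).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break
        if self.capture_at is not None and len(self.stack) <= self.capture_at:
            self.capture_at = None

    def handle_data(self, data):
        if self.capture_at is not None:
            self.results[-1] += data


def select_article_paragraphs(html):
    parser = ChildParagraphCollector()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]


sample = """
<div class="article main">
  <p>First <strong>paragraph</strong></p>
  <section><p>Nested one level deeper</p></section>
</div>
<p>Outside the article</p>
"""
print(select_article_paragraphs(sample))  # ['First paragraph']
```

Only the first paragraph matches: the second `<p>` is a grandchild of the article `<div>` (so `div.article p` would match it, but `>` does not), and the last one has no article parent at all.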

## 2. Why You Need to Understand Markdown  
Markdown is mainly used for documentation, blog posts, and API docs, and you may need to handle it when extracting data from the web.

- Handling Web Pages with Markdown  
    - You may need to retrieve data stored in Markdown format from websites (e.g., GitHub, Jupyter Notebook, blogs) and convert it to HTML.  
    - BeautifulSoup's `get_text()` returns plain text and discards structure such as headings, links, and lists; if you need to keep that structure, convert the HTML to Markdown instead of extracting plain text.

- Cleaning and Converting Markdown Data  
    - For example, you may need to crawl text saved in Markdown from a web page and convert it to another format (HTML, LaTeX, etc.).  
    - You can use `markdown2`, `mistune` (Python), or `markdown` (R package) for conversion.
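Libraries such as `markdown2` and `mistune` implement the full syntax. The toy converter below (a hypothetical `markdown_to_html`, written with only the standard library) just illustrates the kind of mapping they perform: `#` headings become `<h1>` to `<h6>`, and `**`/`*` become `<strong>`/`<em>`. Every other non-empty line is treated as its own paragraph, and lists, links, and code are ignored.

```python
import html
import re


def markdown_to_html(text):
    """Toy converter: ATX headings, **bold**, *italic*; every other line becomes a <p>."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines only separate blocks
        line = html.escape(line, quote=False)  # escape &, <, > in the raw text
        # Bold first, so "**" is not mistaken for two single "*" markers.
        line = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", line)
        line = re.sub(r"\*(.+?)\*", r"<em>\1</em>", line)
        heading = re.match(r"(#{1,6})\s+(.*)", line)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            out.append(f"<p>{line}</p>")
    return "\n".join(out)


print(markdown_to_html("# Title\nSome **bold** and *italic* text."))
```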

# 1. HTML Key Concepts

## What is HTML?
- **HTML (HyperText Markup Language)** is used to structure content on the web.
- Uses tags to structure content (text, images, links, etc.).
- A typical web page combines HTML (structure), CSS (styling), and JavaScript (behavior).

## Basic structure
- An HTML document has a root `<html>` element that contains a `<head>` (metadata such as the page title) followed by a `<body>` (the visible content).

**HTML Example**
```html
<!DOCTYPE html>
<html>
<head>
  <title>Page Title</title>
</head>
<body>
  <!-- Content goes here -->
</body>
</html>
```
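The nesting can be checked programmatically. This particular skeleton happens to be well-formed XML, so Python's standard-library `xml.etree.ElementTree` can parse a copy of it and walk from `<html>` to `<head>` to `<title>`. This is only a sketch: real-world HTML is rarely valid XML, so for scraping you would use an HTML parser such as BeautifulSoup instead.

```python
import xml.etree.ElementTree as ET

# A copy of the skeleton page; it is valid XML, which is NOT true of most real pages.
page = """<!DOCTYPE html>
<html>
<head>
  <title>Page Title</title>
</head>
<body>
  <!-- Content goes here -->
</body>
</html>"""

root = ET.fromstring(page)            # the root element is <html>
print([child.tag for child in root])  # direct children of <html>: ['head', 'body']
print(root.findtext("head/title"))    # follow html -> head -> title: Page Title
```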

## Common HTML Tags

- `<h1>` to `<h6>`: Headings (`<h1>` is the largest)
```html
  <h1>Main Title</h1>
  <h2>Section Title</h2>
```
- `<p>`: Paragraph
```html
<p>This is a paragraph of text.</p>
```
- `<strong>`: Strong importance (displayed in bold by default)
```html
<p>This is <strong>important</strong> information.</p>
```
- `<em>`: Emphasis (displayed in italics by default)
```html
<p>This word is <em>emphasized</em>.</p>
```
- `<a href="URL">`: Hyperlink
```html
<a href="https://example.com">Visit Example</a>
```
- `<img src="path" alt="description">`: Image
```html
<img src="profile.jpg" alt="Profile Photo" width="200">
```
- `<video controls>` + `<source src="file.mp4" type="video/mp4">`: Embed local video
```html
<video width="640" height="360" controls>
  <source src="my_video.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>
```
- `<iframe src="https://www.youtube.com/embed/ID">`: Embed YouTube video
```html
<iframe width="640" height="360"
        src="https://www.youtube.com/embed/VIDEO_ID"
        frameborder="0"
        allowfullscreen>
</iframe>
```
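For scraping, the attributes of these tags often matter more than the tags themselves: `href` holds the link target and `src` holds the media file. Below is a minimal sketch with the standard-library `html.parser` that collects both and resolves relative paths with `urljoin`; the page URL `https://example.com/about/` is a hypothetical base used only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect URLs from href (e.g. <a>) and src (e.g. <img>, <source>, <iframe>) attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []  # (tag, absolute URL) pairs in document order

    def handle_starttag(self, tag, attrs):
        values = dict(attrs)
        for name in ("href", "src"):
            if values.get(name):
                # Resolve relative paths such as "profile.jpg" against the page URL.
                self.links.append((tag, urljoin(self.base_url, values[name])))


def collect_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


snippet = """
<a href="https://example.com">Visit Example</a>
<img src="profile.jpg" alt="Profile Photo" width="200">
<video width="640" height="360" controls>
  <source src="my_video.mp4" type="video/mp4">
</video>
"""
for tag, url in collect_links(snippet, "https://example.com/about/"):
    print(tag, url)
```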

# 2. Lists, Tables, Forms in HTML

## Lists in HTML (`<ul>`, `<ol>`)

- Used to display a list of items on a webpage. There are two main list types, and each item in either type is an `<li>` element:

    - `<ul>`: Unordered list (with bullet points)
    - `<ol>`: Ordered list (with numbers)
    - `<li>`: List item

**HTML Example**
```html
<h2>My Academic Interests</h2>
<ul>
  <li>Statistics</li>
  <li>Data Analysis & Data Visualization</li>
  <li>Sports Big Data</li>
</ul>

<h2>My Hobbies</h2>
<ol>
  <li>Piano</li>
  <li>Soccer</li>
  <li>Original Sound Track</li>
</ol>
```
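Because every item is wrapped in `<li>`, a scraper can recover the items of each list directly. Here is a minimal standard-library sketch (hypothetical `extract_lists` helper) that groups item text by its enclosing `<ul>` or `<ol>`; nested lists and items containing block-level markup are out of scope.

```python
from html.parser import HTMLParser


class ListCollector(HTMLParser):
    """Group <li> texts by their enclosing <ul> or <ol> (nested lists are not handled)."""

    def __init__(self):
        super().__init__()
        self.lists = []        # (list tag, [item texts]) in document order
        self.in_item = False

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.lists.append((tag, []))
        elif tag == "li" and self.lists:
            self.in_item = True
            self.lists[-1][1].append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.lists[-1][1][-1] += data


def extract_lists(html):
    parser = ListCollector()
    parser.feed(html)
    parser.close()
    return [(tag, [item.strip() for item in items]) for tag, items in parser.lists]


sample = """
<h2>My Academic Interests</h2>
<ul><li>Statistics</li><li>Data Analysis &amp; Data Visualization</li></ul>
<h2>My Hobbies</h2>
<ol><li>Piano</li><li>Soccer</li></ol>
"""
print(extract_lists(sample))
```

The `<h2>` text is skipped because only text inside `<li>` is recorded, and `&amp;` arrives already decoded as `&`.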

## Tables in HTML (`<table>`)

Used to display structured data in a grid format.

- `<table>`: Table container
- `<tr>`: Table row
- `<th>`: Table header cell (bold)
- `<td>`: Table data cell

**HTML Example**
```html
<h2>My Profile</h2>
<table border="1">
  <tr>
    <th>Item</th>
    <th>Details</th>
  </tr>
  <tr>
    <td>Name</td>
    <td>Soonwon KWON</td>
  </tr>
  <tr>
    <td>Occupation</td>
    <td>Fourth-year student</td>
  </tr>
</table>
```
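Tables map naturally onto records: the `<th>` row supplies column names and each `<td>` row supplies values. The sketch below (standard-library `html.parser`, hypothetical `table_to_records` helper) assumes the header is the first row and that there are no `colspan`/`rowspan` cells; pandas' `read_html` is a common alternative for real pages.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect the text of every <th>/<td> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # one list of cell texts per <tr>
        self.cell = None  # text of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            self.cell = ""

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self.cell is not None:
            self.rows[-1].append(self.cell.strip())
            self.cell = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell += data


def table_to_records(html):
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows  # the first row holds the <th> column names
    return [dict(zip(header, row)) for row in body]


sample = """
<table border="1">
  <tr><th>Item</th><th>Details</th></tr>
  <tr><td>Name</td><td>Soonwon KWON</td></tr>
  <tr><td>Occupation</td><td>Fourth-year student</td></tr>
</table>
"""
print(table_to_records(sample))
```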

## Forms in HTML (`<form>`)

Used to collect user input.

- `<form>`: Wraps the input elements
- `<input>`: Input control whose `type` attribute sets its kind (e.g., `text`, `password`, `checkbox`)
- `<textarea>`: Multi-line text input
- `<button>`: Clickable button (submit, etc.)

**HTML Example**
```html
<h2>Leave a Message</h2>
<form action="/submit" method="POST">
  <label for="name">Name:</label>
  <input type="text" id="name" name="name" required><br><br>

  <label for="message">Message:</label><br>
  <textarea id="message" name="message" rows="4" cols="40"></textarea><br><br>

  <button type="submit">Submit</button>
</form>
```
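- `<label for="...">`: Describes an input field; its `for` value matches the field's `id`
- `name`: The key under which each field's value is sent when the form is submitted
- `action` and `method`: Where the data is sent (`/submit`) and with which HTTP method (`POST`)
- `required`: The browser will not submit the form while this field is empty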
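When a scraper submits a form itself, it has to reproduce what the browser sends: each field's `name` becomes a key, and the values are URL-encoded into the request body. Here is a sketch with Python's standard-library `urllib`; the page URL `https://example.com/guestbook` and the field values are hypothetical, and the request is only built, not sent.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

page_url = "https://example.com/guestbook"   # hypothetical page hosting the form
form_action = urljoin(page_url, "/submit")    # the form's action="/submit"

# Keys come from each field's name attribute; values are what a user would type.
fields = {"name": "Alice", "message": "Hello there!"}
body = urlencode(fields)  # application/x-www-form-urlencoded encoding

request = Request(
    form_action,
    data=body.encode("utf-8"),  # supplying data makes urllib use POST
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
print(request.get_method(), request.full_url)  # POST https://example.com/submit
print(body)                                    # name=Alice&message=Hello+there%21
# urllib.request.urlopen(request) would actually send it.
```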

# 3. CSS Basics & Webpage Styling

## What is CSS?
- CSS (Cascading Style Sheets) is a language that controls the **style and layout** of HTML elements.
- While HTML creates the structure, CSS adjusts **colors**, **sizes**, and **layout**.

## Three Ways to Apply CSS

### Inline Style (using `style` attribute)
**HTML Example**
```html
<p style="color: blue;">This text is blue.</p>
```
- Style applied directly to HTML elements  
- Generally avoided: styles are scattered across individual elements, which makes them hard to maintain and reuse

### Internal Style (using `<style>` tag)
**HTML Example**
```html
<head>
  <style>
    p {
      color: blue;
      font-size: 18px;
    }
  </style>
</head>
```
- CSS written inside the HTML document  
- Useful for small projects

### External Style (recommended)
**HTML Example**
```html
<head>
  <link rel="stylesheet" href="styles.css">
</head>
```
- CSS written in a separate `.css` file  
- Easy to maintain and reuse across pages

## CSS Syntax & Selectors

### Basic Syntax
**CSS Example**
```css
selector {
  property: value;
}
```
- **Selector**: targets an element  
- **Property**: what you want to change  
- **Value**: how you want it to look

### Common CSS Selectors

| Selector | Description                    | Example                      |
|----------|--------------------------------|------------------------------|
| `*`      | All elements                   | `* { margin: 0; }`           |
| `h1`     | Tag name selector              | `h1 { color: red; }`         |
| `.class` | Class selector                 | `.title { font-size: 20px; }`|
| `#id`    | ID selector                    | `#header { background: black; }` |
| `A, B`   | Multiple selectors             | `h1, p { color: blue; }`     |
| `A B`    | Descendant selector (B inside A) | `div p { color: green; }`  |
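Scraping libraries use these same selectors to find elements. To show how the simple ones are evaluated against a single element, here is a small hypothetical `matches` function (standard library only). It supports `*`, tag names, `.class`, `#id`, combinations such as `div.box`, and comma-separated lists; descendant selectors (`A B`) need the element's ancestors and are left out.

```python
import re


def matches_simple(selector, tag, attrs):
    """Match one simple selector like '*', 'h1', '.title', '#header', or 'div.box'."""
    if selector == "*":
        return True
    # An optional tag name followed by any number of .class / #id parts.
    m = re.fullmatch(r"([a-zA-Z][a-zA-Z0-9]*)?((?:[.#][\w-]+)*)", selector)
    if not m:
        raise ValueError(f"unsupported selector: {selector!r}")
    name, rest = m.group(1), m.group(2)
    if name and name.lower() != tag:
        return False
    classes = attrs.get("class", "").split()
    for kind, value in re.findall(r"([.#])([\w-]+)", rest):
        if kind == "." and value not in classes:
            return False
        if kind == "#" and attrs.get("id") != value:
            return False
    return True


def matches(selector, tag, attrs):
    """A selector list 'A, B' matches if any of its parts matches."""
    return any(matches_simple(part.strip(), tag, attrs) for part in selector.split(","))


print(matches(".title", "p", {"class": "title big"}))  # True
print(matches("h1, p", "p", {}))                        # True
print(matches("div.box", "span", {"class": "box"}))     # False
```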

### Example CSS
**CSS Example**
```css
/* Apply to whole page */
body {
  font-family: Arial, sans-serif;
  background-color: #f0f0f0;
}

/* Heading style */
h1 {
  color: darkblue;
  text-align: center;
}

/* Paragraph style */
p {
  font-size: 18px;
  color: gray;
}
```

## CSS Box Model

### What is the Box Model?
Every HTML element is treated as a rectangular box with:

- **Content**: text or image inside
- **Padding**: space between content and border
- **Border**: line surrounding the box
- **Margin**: space outside the border (separates from other elements)

### Box Model Example
**CSS Example**
```css
.box {
  width: 300px;
  padding: 20px;
  border: 2px solid black;
  margin: 10px;
}
```

```html
<div class="box">This is a box model example.</div>
```
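The box model is easiest to see with the numbers from `.box` above. By default (`box-sizing: content-box`), `width` sets only the content width, so padding and border are added on both sides: 300 + 2 × 20 + 2 × 2 = 344px of visible box, and the 10px margins bring the horizontal space it occupies to 364px. The small helper below (a hypothetical `box_widths` function) reproduces that arithmetic and contrasts it with `box-sizing: border-box`, where `width` already includes padding and border.

```python
def box_widths(width, padding, border, margin, box_sizing="content-box"):
    """Return (visible box width, total horizontal space) in px for one element."""
    if box_sizing == "content-box":
        # width covers the content only; padding and border are added on both sides
        visible = width + 2 * padding + 2 * border
    elif box_sizing == "border-box":
        # width already includes padding and border
        visible = width
    else:
        raise ValueError(f"unknown box-sizing: {box_sizing}")
    return visible, visible + 2 * margin


print(box_widths(300, 20, 2, 10))                # (344, 364)
print(box_widths(300, 20, 2, 10, "border-box"))  # (300, 320)
```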

## CSS Layout: `display` & Flexbox

### `display` Property

| Value    | Description                                            |
|----------|--------------------------------------------------------|
| `block`  | Elements stack vertically (e.g., `<div>`)              |
| `inline` | Elements flow within a line of text (e.g., `<span>`)   |
| `flex`   | Arranges child elements in a row or column (Flexbox)   |

### Flexbox Basic Example
**CSS Example**
```css
.container {
  display: flex;
  justify-content: space-around;
}
.box {
  width: 100px;
  height: 100px;
  background-color: lightblue;
}
```

```html
<div class="container">
  <div class="box">1</div>
  <div class="box">2</div>
  <div class="box">3</div>
</div>
```

# 4. Markdown Basics

## What is Markdown?

- Markdown is a lightweight markup language for writing documents using plain text.
- Simpler to write than HTML and easy to read even as plain text.
- Widely used in platforms like GitHub, Jupyter Notebook, and RMarkdown.
- In R, Markdown is used in `.Rmd` files to combine documentation and executable code.

## Basic Markdown Syntax

### Headers
```markdown
# Header 1  
## Header 2  
### Header 3
```

### Emphasis
```markdown
*Italic* or _Italic_  
**Bold** or __Bold__  
~~Strikethrough~~
```
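*Italic*, **Bold**, ~~Strikethrough~~

- `~~Strikethrough~~` is an extension (supported by GitHub Flavored Markdown and by Pandoc, which RMarkdown uses) rather than part of the original Markdown syntax.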

### Lists

- **Unordered List:**
```markdown
- Item 1  
  - Sub-item 1.1  
  - Sub-item 1.2  
- Item 2
```

- **Ordered List:**
```markdown
1. First item  
2. Second item  
3. Third item
```

### Links

```markdown
[CRAN R Official Site](https://cran.r-project.org/)
```

[CRAN R Official Site](https://cran.r-project.org/)

### Images

```markdown
![R Logo](https://www.r-project.org/logo/Rlogo.png)
```

![R Logo](https://www.r-project.org/logo/Rlogo.png)
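Scraped Markdown often needs its links pulled out. Below is a minimal regular-expression sketch (hypothetical `markdown_links` function, standard library only) covering the inline link and image forms shown above; reference-style links, links with titles, and URLs containing parentheses are not handled, so a full parser such as `mistune` is safer for real documents.

```python
import re

# [text](url) is a link; the same syntax with a leading "!" is an image.
LINK_RE = re.compile(r"(!?)\[([^\]]*)\]\(([^)\s]+)\)")


def markdown_links(text):
    """Return (kind, text, url) for each inline link or image in a Markdown string."""
    return [("image" if bang else "link", label, url)
            for bang, label, url in LINK_RE.findall(text)]


sample = """[CRAN R Official Site](https://cran.r-project.org/)
![R Logo](https://www.r-project.org/logo/Rlogo.png)"""
for item in markdown_links(sample):
    print(item)
```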

# 5. Introduction to RMarkdown

## What is RMarkdown?

- RMarkdown (`.Rmd`) is a document format that combines R code and narrative text.
- Allows you to create documents with analysis results and interpretation together.
- Output formats: **HTML, PDF, Word**.
- Structure: **YAML Header + Markdown Text + R Code Chunks**

## RMarkdown Document Structure

### 1. YAML Header

**YAML Header Example**

```yaml
---
title: "RMarkdown Example"
author: "Soonwon KWON"
date: "`r Sys.Date()`"
output: html_document
---
```

- Defines document title, author, date, and output format.
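The same `---` block appears at the top of many Markdown files scraped from GitHub or static sites, and separating it from the body is a common first step. Here is a minimal sketch (hypothetical `split_front_matter`, standard library only) that reads flat `key: value` lines and skips indented ones; nested settings such as `output:` options require a real YAML parser like PyYAML.

```python
def split_front_matter(text):
    """Split text into (header dict, body) if it starts with a '---' delimited block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, text  # no closing delimiter: treat everything as body
    header = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        # Only top-level "key: value" lines; indented (nested) lines are skipped.
        if sep and not line.startswith((" ", "\t")):
            header[key.strip()] = value.strip().strip('"')
    return header, "\n".join(lines[end + 1:])


rmd = "\n".join([
    "---",
    'title: "RMarkdown Example"',
    'author: "Soonwon KWON"',
    'date: "`r Sys.Date()`"',
    "output: html_document",
    "---",
    "## Data Analysis Results",
])
header, body = split_front_matter(rmd)
print(header)
print(body)
```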

### 2. Markdown Text + R Code Chunk

**Markdown Example**
````markdown
## Data Analysis Results

Here we compute some basic summary statistics.

```{r}
summary(cars)
```
````

- **R code chunks (` ```{r} `)** are run when the document is knitted, and their output (printed results, tables, plots) appears below the code.

### 3. Output Formats: HTML, PDF, Word

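- After writing your `.Rmd` file, click the **Knit** button in RStudio, or call `rmarkdown::render("your_file.Rmd")` from the R console, to render the document.
- Both require the `rmarkdown` package; PDF output also needs a LaTeX installation (e.g., TinyTeX).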

# 6. Using Markdown & RMarkdown: Tables, Code Blocks, Math

## Tables, Code Blocks, and Math in Markdown

### 1. Creating Tables in Markdown

Markdown uses pipes `|` and hyphens `-` to format tables.

**Basic Table Example:**

```markdown
| Name         | Date of Birth | Nationality   |
|--------------|----------------|----------------|
| Luka Modrić  | 1985-09-09     | Croatia        |
| Eden Hazard  | 1991-01-07     | Belgium        |
```

| Name         | Date of Birth | Nationality   |
|--------------|----------------|----------------|
| Luka Modrić  | 1985-09-09     | Croatia        |
| Eden Hazard  | 1991-01-07     | Belgium        |
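Going the other way, a scraper that downloads a README may need a table like this as data. Here is a minimal sketch (hypothetical `parse_pipe_table`, standard library only), assuming the first row is the header and no cell contains an escaped `\|`:

```python
def parse_pipe_table(text):
    """Turn a simple Markdown pipe table into a list of dicts keyed by the header row."""
    rows = []
    for line in text.strip().splitlines():
        # Drop the outer pipes, then split the remaining cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip the separator row, which holds only hyphens and optional alignment colons.
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


table = """
| Name         | Date of Birth | Nationality   |
|--------------|----------------|----------------|
| Luka Modrić  | 1985-09-09     | Croatia        |
| Eden Hazard  | 1991-01-07     | Belgium        |
"""
for record in parse_pipe_table(table):
    print(record)
```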

### 2. Writing Code Blocks

Useful for displaying analysis code or command-line instructions clearly.

- **Inline Code:**  
  ```markdown
  `summary(mtcars)`
  ```

- **Multi-line Code Block (R):**  
  ````markdown
  ```r
  summary(mtcars)
  ```
  ````
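When you scrape a README or an `.Rmd` file, you often want only the code. Here is a minimal sketch (hypothetical `fenced_blocks`, standard library only) that returns each fenced block with its info string, such as `r` or `{r, echo=FALSE}`; it assumes plain three-backtick fences and no nesting. For R chunks in `.Rmd` files, `knitr::purl()` extracts the code for you.

```python
FENCE = "`" * 3  # three backticks


def fenced_blocks(text):
    """Return (info string, code) for each three-backtick fenced block."""
    blocks, info, buf = [], None, []
    for line in text.splitlines():
        stripped = line.strip()
        if info is None and stripped.startswith(FENCE):
            info = stripped[3:].strip()  # e.g. "r", "{r}", "{r, echo=FALSE}"
            buf = []
        elif info is not None and stripped == FENCE:
            blocks.append((info, "\n".join(buf)))
            info = None
        elif info is not None:
            buf.append(line)
    return blocks


sample = "\n".join([
    "Intro text",
    FENCE + "r",
    "summary(mtcars)",
    FENCE,
    FENCE + "{r, echo=FALSE}",
    "plot(cars)",
    FENCE,
])
print(fenced_blocks(sample))
```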

### 3. Writing Math with LaTeX

In RMarkdown, you can use LaTeX syntax to render math equations.

- **Inline Math** (no space directly after the opening `$` or before the closing `$`):
```markdown
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$
```

$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$

- **Block Math:**  
```markdown
$$
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
$$
```

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$$

## Using ggplot2 and Data Tables in RMarkdown

### Inserting ggplot2 Plots

R code chunks can produce plots with `ggplot2`; `echo=FALSE` hides the code so that only the plot appears in the output.

**Basic ggplot2 Example:**

````markdown
```{r, echo=FALSE}
library(ggplot2)
ggplot(mtcars, aes(x = mpg, y = hp, color = factor(cyl))) +
  geom_point() +
  theme_minimal()
```
````

### Printing Clean Tables with `kable()`

Use `knitr::kable()` in RMarkdown to generate clean and formatted tables.

**Basic kable Table Example:**

````markdown
```{r}
library(knitr)
kable(head(mtcars))
```
````

# 7. Advanced Use of RMarkdown

## Document Format Conversion in RMarkdown

### Output Format Options

RMarkdown supports rendering into various formats:

- HTML document (`html_document`)
- PDF document (`pdf_document`)
- Word document (`word_document`)

**Basic YAML Header Example:**

```yaml
---
title: "Analysis Report"
author: "Soonwon KWON"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: true
    number_sections: true
---
```

- `toc: true`: Adds a table of contents  
- `number_sections: true`: Adds numbered headings

## Creating Interactive Documents

### Using Shiny in RMarkdown

- You can add interactive features by setting `runtime: shiny` in the YAML header.

**Shiny Document Example:**

```yaml
---
title: "Interactive Shiny Document"
output: html_document
runtime: shiny
---
```

**Markdown Example**
````markdown
## Summary Based on Input

```{r, echo=FALSE}
library(shiny)

sliderInput("obs", "Number of observations to show:", min = 1, max = 100, value = 10)

renderPrint({
  summary(mtcars[1:input$obs, ])
})
```
````

### Introduction to Flexdashboard

- **Flexdashboard** is a package built on RMarkdown to easily create dashboards.
- It is useful for displaying multiple visualizations in a single screen layout.

**Basic Flexdashboard YAML Header:**

```yaml
---
title: "Flexdashboard Example"
output: flexdashboard::flex_dashboard
---
```

**Flexdashboard Example Layout:**

````markdown
Column {data-width=650}
-------------------------------------

### ggplot2 Scatter Plot

```{r}
library(ggplot2)
ggplot(mtcars, aes(x = mpg, y = hp, color = factor(cyl))) +
  geom_point()
```

Column {data-width=350}
-------------------------------------

### Data Table

```{r}
library(DT)
datatable(mtcars)
```
````

# 8. Writing Practical Reports & Automating Analysis with RMarkdown

## Writing a Practical Report

### Key Components of a Good Report

- Set title, author, and date using a YAML header  
- Include a table of contents (`toc: true`) and section numbering (`number_sections: true`)  
- Summarize data, visualize results, and report key statistics  
- Present conclusions and directions for further analysis  

### Example Report Template

**YAML Header Example:**

```yaml
---
title: "Customer Data Analysis Report"
author: "Soonwon KWON"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: true
    number_sections: true
---
```

**Markdown Example:**

````markdown
## 1. Data Overview

This report presents an analysis of customer data.

```{r}
library(dplyr)
mpg_by_cyl <- mtcars %>% group_by(cyl) %>% summarise(avg_mpg = mean(mpg))
knitr::kable(mpg_by_cyl)
```

## 2. Visualization

```{r}
library(ggplot2)
ggplot(mtcars, aes(x = mpg, y = hp, color = factor(cyl))) +
  geom_point() +
  theme_minimal()
```

## 3. Conclusion & Insights

- Cars with a higher number of cylinders (`cyl`) tend to have lower fuel efficiency (`mpg`).  
- Further data collection and analysis are recommended.
````

## Repetitive Analysis & Automated Report Generation

### Parameterized Reports

- Useful when applying the same analysis to different datasets  
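- Values declared under `params` in the YAML header are available as `params$name` inside the document, and you can override them at render time, e.g. `rmarkdown::render("report.Rmd", params = list(dataset = "iris"))`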

### Example of a Parameterized Report Using `params`

**YAML Header Example:**

```yaml
---
title: "Automated Analysis Report"
output: html_document
params:
  dataset: "mtcars"
---
```

**Markdown Example:**

````markdown
## Data Summary

Target dataset: `r params$dataset`

```{r}
dataset <- get(params$dataset)
summary(dataset)
```
````

# 9. Using Markdown on GitHub

## The Role of Markdown on GitHub

### Why Markdown is Important on GitHub
- GitHub uses **Markdown (.md files)** as the default format for writing project documents  
- Markdown is useful for explaining project details, installation steps, and usage examples  
- It also enhances formatting inside **GitHub Issues** and **Pull Requests (PRs)**

### Common Use Cases

- `README.md` – Describe the project, how to install/use it  
- `CONTRIBUTING.md` – Guide for contributing to the project  
- Issues / Pull Requests – Report bugs, request features, summarize changes  

## Writing a README.md File

### Basic README.md Template

````markdown
# Project Title

## Introduction
This project is an example to learn **how to use Markdown on GitHub**.

## Installation
1. Clone this repository:
   ```bash
   git clone https://github.com/username/repository.git
   ```
2. Install the required packages  
3. Run the project  

## Usage Example
```r
print("Hello, GitHub Markdown!")
```

## How to Contribute
1. Open an issue  
2. Create a branch and make your changes  
3. Submit a Pull Request (PR)  

## License
MIT License
````

- Use `#` for section headings  
- Use `**bold**` to emphasize text  
- Use backticks for `inline code` and code blocks  

## Using Markdown in GitHub Issues & Pull Requests

### Markdown in Issues

- Use Markdown to clearly format bug reports, feature requests, and tasks  
- Use checklists, mentions (`@username`), and issue references (`#issue-number`)

### Issue Template Example

````markdown
## 🐞 Bug Report

### Description
- [ ] Behavior is different from expected  
- [ ] Certain feature is not working  

### Steps to Reproduce
1. Run this command: `python script.py`  
2. Error message:  
   ```
   Error: File not found.
   ```

### Expected Result
It should run without errors.

### Environment
- OS: Ubuntu 22.04  
- R Version: 4.1.2
````

### Markdown in Pull Requests (PRs)

- Use Markdown to document changes, summarize commits, and list test results  
- Include checklists to guide code review

### Pull Request Template Example

```markdown
## Changes Made
- Added new feature (`feature-branch`)  
- Refactored existing code  

## Checklist
- [x] Code successfully runs  
- [x] Documentation updated  
- [ ] Additional tests needed  

## Screenshots
(Include UI or output images if applicable)

## Related Issues
Fixes #12
```
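Checklists in issues and PRs are plain text, so progress can be read straight from the Markdown. Here is a minimal sketch (hypothetical `task_progress`, standard library only) that counts `- [x]` and `- [ ]` items; it does not skip items that appear inside code blocks or quotes.

```python
import re

# A task list item: "- [ ] text" or "- [x] text" ("*" bullets and "X" also accepted).
TASK_RE = re.compile(r"^[ \t]*[-*][ \t]+\[([ xX])\][ \t]+(.*?)[ \t]*$", re.MULTILINE)


def task_progress(markdown):
    """Return (checked count, total count, unchecked item texts)."""
    tasks = [(mark in "xX", text) for mark, text in TASK_RE.findall(markdown)]
    done = sum(1 for checked, _ in tasks if checked)
    open_items = [text for checked, text in tasks if not checked]
    return done, len(tasks), open_items


checklist = """## Checklist
- [x] Code successfully runs
- [x] Documentation updated
- [ ] Additional tests needed
"""
print(task_progress(checklist))  # (2, 3, ['Additional tests needed'])
```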