## <span style="color:#3366ff; font-weight:bold;">Web Scraping Workshop - Saturday, Dec 23, 2023</span>
### Instructor: <span style="font-weight:normal;">Sudip Parajuli</span>
##### Organized by: EXCESS as X-Tech Studio 4.0 PreEvent


## **Module 1: Introduction to Web Scraping**


### What is Web Scraping?
- 🕸️ Web scraping: using a program to automatically extract data from web pages!

### Why is Web Scraping Valuable?
- 🌟 Because it turns unstructured web pages into structured data you can search, analyze, and reuse!

### Real World Applications of Web Scraping
- 🏛️ Academic research: Gathering data for scholarly studies
- 📈 Business insights: Accessing market trends at your fingertips
- 🛒 Price monitoring: Finding the best deals for savvy shoppers
- 🔍 Competitive analysis: Keeping an eye on the competition
- 🗞️ News aggregation: Bringing the headlines to you


## **Module 2: HTML Basics and Inspecting Web Pages**



In [None]:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>HTML Basics and Inspecting Web Pages</title>
</head>
<body>
    <h1>Welcome to HTML Basics!</h1>
    <p>HTML (HyperText Markup Language) is the standard language for creating web pages. It provides the structure for content on the web.</p>
    
    <h2>Basic HTML Structure</h2>
    <p>HTML documents consist of elements. Each element begins with an opening tag and ends with a closing tag.</p>
    <p>&lt;element&gt;Content&lt;/element&gt;</p>
    
    <h2>Inspecting Web Pages</h2>
    <p>Inspecting web pages allows us to view and understand the structure, styles, and content of a webpage using browser developer tools.</p>
    <p>To inspect a web page:</p>
    <ol>
        <li>Right-click on an element.</li>
        <li>Select "Inspect" or "Inspect Element".</li>
        <li>The developer tools panel will open, displaying the HTML and CSS of the element.</li>
    </ol>
    
    <h2>Example Element</h2>
    <p>This is an example paragraph. You can inspect this paragraph to see its HTML structure.</p>
    
    <h2>Example Table</h2>
    <table border="1">
        <caption>Sample Table</caption>
        <tr>
            <th>Header 1</th>
            <th>Header 2</th>
            <th>Header 3</th>
        </tr>
        <tr>
            <td>Row 1, Cell 1</td>
            <td>Row 1, Cell 2</td>
            <td>Row 1, Cell 3</td>
        </tr>
        <tr>
            <td>Row 2, Cell 1</td>
            <td>Row 2, Cell 2</td>
            <td>Row 2, Cell 3</td>
        </tr>
    </table>
    
    <footer>
        <p>You can put your footer contents here!</p>
    </footer>
</body>
</html>


### **<span style="color: lightgray;">Output: This is how the rendered page should look</span>**

![Rendered Image](page.png)


### **Intro to HTML Tags and Attributes**

HTML (Hypertext Markup Language) utilizes tags and attributes to structure and define content within a web page. Tags are used to mark the beginning and end of elements, while attributes provide additional information about the elements.

**HTML Tags**

Tags are enclosed in angle brackets (`<` and `>`), and most come in pairs: an opening tag and a closing tag.
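
To make the tag/attribute distinction concrete, here is a minimal sketch that uses Python's built-in `html.parser` module to list every opening tag in a snippet together with its attributes. The class name `TagLogger` and the sample HTML are illustrative assumptions, not part of the workshop material.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record every opening tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []  # list of (tag_name, attributes_dict) tuples

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.seen.append((tag, dict(attrs)))


# Hypothetical snippet: a paragraph with a class attribute and a link inside it
sample_html = '<p class="intro">Hello, <a href="https://example.com">world</a>!</p>'

logger = TagLogger()
logger.feed(sample_html)

for tag, attributes in logger.seen:
    print(tag, attributes)
```

Each printed line pairs a tag name with a dictionary of its attributes, which is the same information a scraper relies on when locating elements.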


**Example:**


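Paragraph and Heading Tags
- `<p>` defines a paragraph of text.
- `<h1>` to `<h6>` define headings of decreasing size, where `<h1>` is the largest and `<h6>` is the smallest.
```html
<!-- A paragraph: opening <p> tag, content, closing </p> tag -->
<p>This is a paragraph.</p>

<!-- Headings: <h1> is the largest, <h6> is the smallest -->
<h1>Largest heading</h1>
<h6>Smallest heading</h6>
```
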
Anchor Tag
- Creates hyperlinks to other web pages or resources.
```html
<a href="https://example.com">Visit our website</a>
```
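
Links are one of the most common scraping targets. The sketch below, built on the standard-library `html.parser`, collects the text and `href` of each anchor tag. `LinkExtractor` and the sample markup are hypothetical names chosen for illustration, and anchors without an `href` are simply skipped.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (link text, href) pairs from <a> tags that have an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside an anchor with an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


sample_html = '<p>Read more: <a href="https://example.com">Visit our website</a></p>'

extractor = LinkExtractor()
extractor.feed(sample_html)
print(extractor.links)
```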

Image Tag
- Embeds images into a web page.
```html
<img src="image.jpg" alt="Description of the image">
```
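
Because `<img>` has no closing tag, everything a scraper needs from it lives in its attributes. A minimal sketch with the standard-library parser follows. `ImageExtractor` is an illustrative name, and treating a missing `alt` as an empty string is an assumption made for this example.

```python
from html.parser import HTMLParser


class ImageExtractor(HTMLParser):
    """Collect (src, alt) pairs from <img> tags."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            # A missing alt attribute is recorded as an empty string
            self.images.append((attributes.get("src"), attributes.get("alt", "")))


sample_html = '<img src="image.jpg" alt="Description of the image">'

image_extractor = ImageExtractor()
image_extractor.feed(sample_html)
print(image_extractor.images)
```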

Unordered List Tag
- Creates an unordered (bulleted) list.
- For an ordered (numbered) list, replace `ul` with `ol`.
```html
<ul>
    <li>Item 1</li>
    <li>Item 2</li>
</ul>
```
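
Lists often hold the repeated items a scraper is after (product names, headlines, and so on). The following sketch pulls the text of each `<li>` out of a list like the one above. It assumes list items are not nested, and `ListItemExtractor` is a name invented for this example.

```python
from html.parser import HTMLParser


class ListItemExtractor(HTMLParser):
    """Collect the text of every <li> element (nested lists not handled)."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_item = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_item:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._in_item:
            self.items.append("".join(self._buffer).strip())
            self._in_item = False


sample_html = """
<ul>
    <li>Item 1</li>
    <li>Item 2</li>
</ul>
"""

list_extractor = ListItemExtractor()
list_extractor.feed(sample_html)
print(list_extractor.items)
```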

Table Tag
- Creates a table structure with rows and columns.

```html
<table>
    <tr>
        <th>Header 1</th>
        <th>Header 2</th>
        <th>Header 3</th>
    </tr>
    <tr>
        <td>Row 1, Cell 1</td>
        <td>Row 1, Cell 2</td>
        <td>Row 1, Cell 3</td>
    </tr>
    <tr>
        <td>Row 2, Cell 1</td>
        <td>Row 2, Cell 2</td>
        <td>Row 2, Cell 3</td>
    </tr>
</table>
```
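
Tables map naturally onto rows of data. As a rough sketch, the parser below turns a table into a list of rows and then pairs each data row with the header row. It assumes a simple table with no nested tables or merged cells; `TableExtractor` and the shortened sample table are illustrative.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Turn a simple <table> into a list of rows, each a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text fragments of the <th>/<td> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


sample_html = """
<table>
    <tr><th>Header 1</th><th>Header 2</th></tr>
    <tr><td>Row 1, Cell 1</td><td>Row 1, Cell 2</td></tr>
</table>
"""

table_extractor = TableExtractor()
table_extractor.feed(sample_html)

# Treat the first row as column names and the rest as records
header, *data_rows = table_extractor.rows
records = [dict(zip(header, row)) for row in data_rows]
print(records)
```
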
Form Tag
- Defines a form for user input with input fields and a submit button.
```html
<form action="/submit-form" method="post">
    <label for="username">Username:</label>
    <input type="text" id="username" name="username"><br><br>
    <label for="password">Password:</label>
    <input type="password" id="password" name="password"><br><br>
    <input type="submit" value="Submit">
</form>
```
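
When a scraper needs to submit a form, it first has to know where the form posts to and which field names it expects. The sketch below reads that information from the form's attributes. `FormInspector` is a hypothetical helper; the fallbacks to `get` and `text` mirror the HTML defaults for `method` and `type`.

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Record a form's action, method, and its named input fields."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = None
        self.fields = []  # (name, type) for each <input> that has a name

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self.action = attributes.get("action")
            self.method = attributes.get("method", "get").lower()
        elif tag == "input" and "name" in attributes:
            # Inputs without a name (like the submit button here) are skipped
            self.fields.append((attributes["name"], attributes.get("type", "text")))


sample_html = """
<form action="/submit-form" method="post">
    <label for="username">Username:</label>
    <input type="text" id="username" name="username"><br><br>
    <label for="password">Password:</label>
    <input type="password" id="password" name="password"><br><br>
    <input type="submit" value="Submit">
</form>
"""

inspector = FormInspector()
inspector.feed(sample_html)
print(inspector.method, inspector.action, inspector.fields)
```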

Div Tag
- Creates a division or a container that can be styled using CSS.
```html
<div>
    <!-- Content to be enclosed within the div -->
</div>
```




### **Inspecting Web Pages**

Inspecting web pages allows us to view and understand the structure, styles, and content of a webpage using browser developer tools.

**To inspect a web page:**
1. Right-click on an element.
2. Select "Inspect" or "Inspect Element".
3. The developer tools panel will open, displaying the HTML and CSS of the element.
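
The nested view shown in the developer tools can be approximated in code. The sketch below prints an indented outline of the opening tags in a snippet, which is a rough stand-in for the Elements panel. It assumes well-formed HTML with explicit closing tags; `OutlinePrinter` and its short list of void tags are illustrative, not exhaustive.

```python
from html.parser import HTMLParser


class OutlinePrinter(HTMLParser):
    """Build an indented outline of opening tags, one line per element."""

    # Tags that never have a closing tag (partial list, for illustration)
    VOID_TAGS = {"br", "img", "input", "meta", "link", "hr"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + f"<{tag}>")
        if tag not in self.VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in self.VOID_TAGS:
            self.depth = max(self.depth - 1, 0)


sample_html = "<body><h1>Title</h1><ol><li>One</li><li>Two</li></ol></body>"

outline = OutlinePrinter()
outline.feed(sample_html)
print("\n".join(outline.lines))
```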


## **Module 3: Extracting Data with CSS Selectors**

### **Overview of CSS selectors for targeting elements**

### **Locating and targeting elements using CSS Selectors**

### **Extracting data from HTML elements (text, attributes, etc.)**

## **Module 4: Handling Dynamic Web Content**

### **Understanding dynamic web content**

### **Dealing with JavaScript-driven Websites**

### **Introduction to headless browsers and their role in web scraping**

## **Module 5: Web Scraping Tools and Libraries**

### **Introduction to popular web scraping tools and libraries (e.g., BeautifulSoup, Scrapy, Selenium, MechanicalSoup)**

### **Pros and Cons of different tools**

### **Selecting the right tool for your scraping needs**

## **Module 6: Best Practices and Ethical Considerations**

### **Respecting website terms of service and scraping policies**