### Web Scraping with Python 
Web scraping is the process of extracting data from websites with automated scripts. In Python, it is commonly done with two libraries: requests, which retrieves a page's HTML over HTTP, and BeautifulSoup, which parses that HTML so specific elements can be extracted.



### 1. Installing and Importing Required Libraries

In [13]:
import requests
import bs4
import lxml
print("libraries installed successfully!")

libraries installed successfully!


#### Explanation:
- requests: This module is used to send HTTP requests and fetch web page content.

- bs4 (BeautifulSoup): A library for parsing HTML and extracting data from web pages.

- lxml: A fast HTML/XML parser that BeautifulSoup can use as its parsing backend in place of Python's built-in parser.
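If any of these packages is missing, the import cell stops with a ModuleNotFoundError. A minimal sketch of checking availability up front, using the standard library's `importlib.util.find_spec`; the helper name `missing_packages` is hypothetical and not part of the original notebook:

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the module names from `names` that cannot be imported here."""
    return [name for name in names if find_spec(name) is None]


# The three modules this notebook imports.
missing = missing_packages(["requests", "bs4", "lxml"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All libraries are available.")
```

Missing packages can typically be installed with `pip install requests beautifulsoup4 lxml`; note that the bs4 module is distributed on PyPI under the name beautifulsoup4.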


### 2. Sending an HTTP Request and Fetching a Web Page

In [17]:
result = requests.get("https://www.biography.com/")
print(type(result))

<class 'requests.models.Response'>


#### Explanation:
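- requests.get(url): Sends an HTTP GET request to the specified URL (https://www.biography.com/) and returns a Response object.

- type(result): Returns the class of that object, here requests.models.Response, which confirms that the call produced a response.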
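Receiving a Response object does not by itself mean the request succeeded, since a 404 or 500 page also arrives as a Response. A minimal sketch of a more defensive fetch, assuming a 10-second timeout is acceptable; the function name `fetch_html` and its optional `session` parameter are hypothetical additions for illustration:

```python
import requests


def fetch_html(url, timeout=10.0, session=None):
    """Fetch `url` and return the body as text, raising on HTTP errors."""
    # Use the supplied session (any object with a compatible .get) if given,
    # otherwise fall back to the module-level requests.get.
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return response.text
```

In the notebook this could be called as `html = fetch_html("https://www.biography.com/")`. Passing a timeout keeps the cell from hanging indefinitely when a server does not respond.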

### 3. Displaying the HTML Content of the Web Page

In [28]:
print(result.text)

<!DOCTYPE html><html lang="en-US"><head><meta charset="utf-8" data-next-head/><meta name="viewport" content="width=device-width, initial-scale=1.0" data-next-head/><meta name="X-UA-Compatible" http-equiv="X-UA-Compatible" content="IE=edge" data-next-head/><link rel="canonical" href="https://www.biography.com/" data-next-head/><meta name="msapplication-tap-highlight" content="no" data-next-head/><title data-next-head>Biography: Historical and Celebrity Profiles</title><meta name="title" content="Biography: Historical and Celebrity Profiles" data-next-head/><meta name="description" content="Read exclusive biographies, watch videos, and discover fascinating stories about your favorite icons, musicians, authors, and historical figures." data-next-head/><meta name="keywords" content data-next-head/><meta property="og:type" content="website" data-next-head/><meta name="theme-color" content="#000000" data-next-head/><link href="/_assets/design-tokens/biography/static/images/favicon.3635572.ic


#### Explanation:
- result.text: Returns the response body decoded to a string, which for this page is its raw HTML.

- Printing it displays the entire HTML source of the page.

Note: The output is one long string containing HTML tags, inline styles, and JavaScript.
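Because the full source can run to many thousands of characters, it is often more practical to look at only the beginning. A small sketch of one way to do that; the helper name `preview` and the 300-character default are illustrative choices, and the sample string stands in for `result.text`:

```python
def preview(html, limit=300):
    """Return the first `limit` characters of `html`, marking any truncation."""
    if len(html) <= limit:
        return html
    return html[:limit] + f"... [{len(html) - limit} more characters]"


# Hypothetical stand-in for result.text.
sample = "<!DOCTYPE html><html><head><title>Example</title></head></html>"
print(preview(sample, limit=20))
```

In the notebook, `print(preview(result.text))` would show just the opening of the document.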

### 4. Parsing the HTML Using BeautifulSoup

In [20]:
from bs4 import BeautifulSoup

In [21]:
soup = BeautifulSoup(result.text, "lxml")
title = soup.title.text
print("page title: ", title)

page title:  Biography: Historical and Celebrity Profiles


#### Explanation:
- BeautifulSoup(result.text, "lxml"): Parses the HTML using the lxml parser.

- soup.title.text: Extracts the text of the page's title element.
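Beyond the title, the same soup object can be searched for other elements. A minimal sketch using `find` and `find_all` on a small hypothetical HTML string (not the real biography.com markup); it uses Python's built-in "html.parser" so that it runs without lxml, and "lxml" can be passed instead to match the cell above:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for result.text.
html = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <h1>People</h1>
    <a href="/ada-lovelace">Ada Lovelace</a>
    <a href="/alan-turing">Alan Turing</a>
    <a>No link target</a>
  </body>
</html>
"""

# "html.parser" ships with Python; pass "lxml" to use the lxml backend.
soup = BeautifulSoup(html, "html.parser")

title = soup.title.text
# find() returns the first matching element.
heading = soup.find("h1").get_text(strip=True)
# href=True keeps only <a> tags that actually carry an href attribute.
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]

print("page title:", title)
print("first heading:", heading)
for text, href in links:
    print(text, "->", href)
```

On a real page, `soup.find` returns None when nothing matches, so it is worth checking the result before calling methods on it.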

### Key Takeaways
1. Web Scraping Process:

    - Use requests to fetch the HTML content of a web page.

    - Use BeautifulSoup to parse and navigate the HTML structure.

    - Extract specific data elements like text, titles, or tables.

2. Common Uses:

    - Extracting news articles, stock market data, product prices, etc.

    - Automating data collection for research and analysis.

3. Best Practices:

    - Always check a website’s robots.txt file to ensure scraping is allowed.

    - Avoid excessive requests to prevent overloading the server.

    - Set request headers such as User-Agent so the request resembles one from a real browser, since some sites reject the default client headers.
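The robots.txt check can be automated with the standard library's `urllib.robotparser`. A minimal sketch, assuming the rules have already been downloaded as text; the helper name `is_allowed`, the user agent "my-scraper", and the sample rules are all hypothetical:

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content; a real one is served at <site>/robots.txt.
sample_rules = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(sample_rules, "my-scraper", "https://example.com/people"))
print(is_allowed(sample_rules, "my-scraper", "https://example.com/private/data"))
```

With these sample rules, the first call prints True and the second prints False. For the other two practices, `requests.get` accepts a `headers` dictionary, and a `time.sleep` call between requests is a simple way to limit the request rate.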