### Web Scraping

Web scraping is the process of extracting data from websites. It involves fetching the content of a web page and parsing it to retrieve specific information. This technique is commonly used for data collection, analysis, and automation tasks.

#### Key Components of Web Scraping:
1. **HTTP Requests**: Sending requests to a website's server to retrieve HTML content using libraries like `requests` or `urllib`.
2. **HTML Parsing**: Extracting specific data from the HTML structure using tools like `BeautifulSoup` or `lxml`.
3. **Data Storage**: Saving the extracted data in a structured format such as CSV, JSON, or a database.
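The three components above can be sketched end to end. To keep the sketch self-contained and runnable offline, it assumes a small hard-coded HTML snippet in place of a real HTTP response (step 1). It also swaps in the standard library's `html.parser` for `BeautifulSoup` so that nothing needs to be installed. The `company` class name, `SAMPLE_HTML`, `CompanyParser`, and `scrape_to_csv` are hypothetical names chosen for illustration.

```python
import csv
import io
from html.parser import HTMLParser

# Step 1 (stand-in): hypothetical page content in place of response.text from an HTTP request
SAMPLE_HTML = """
<ul>
  <li class="company">Acme Corp</li>
  <li class="company">Globex</li>
  <li class="other">Not a company</li>
</ul>
"""


class CompanyParser(HTMLParser):
    """Collects the text of every <li class="company"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.in_company = False
        self.companies = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "li" and ("class", "company") in attrs:
            self.in_company = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_company = False

    def handle_data(self, data):
        # Keep only non-blank text that appears inside a matching <li>
        if self.in_company and data.strip():
            self.companies.append(data.strip())


def scrape_to_csv(html: str) -> str:
    # Step 2: parse the HTML and extract the fields of interest
    parser = CompanyParser()
    parser.feed(html)
    # Step 3: store the rows as CSV (an in-memory buffer here; open() a file in practice)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name"])
    for name in parser.companies:
        writer.writerow([name])
    return buffer.getvalue()


print(scrape_to_csv(SAMPLE_HTML))
```

With `BeautifulSoup`, step 2 would typically shrink to a single `find_all("li", class_="company")` call; the overall fetch, parse, store shape stays the same.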

#### Common Tools and Libraries:
- **Python**:
    - `BeautifulSoup` (for parsing HTML and XML)
    - `Scrapy` (a web scraping framework)
    - `Selenium` (for scraping dynamic content)
- **JavaScript**:
    - `Puppeteer` (for headless browser automation)

#### Ethical Considerations:
- Always check the website's `robots.txt` file to see which paths it permits automated clients to access, and review the site's terms of service.
- Avoid overloading servers: throttle your requests and add a delay between consecutive requests.
- Respect copyright and data privacy laws.
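The first two guidelines can be automated with the standard library's `urllib.robotparser`. The sketch below assumes a hypothetical `robots.txt` body and `example.com` URLs, and uses a hypothetical user-agent name `my-scraper`; against a real site you would call `set_url()` and `read()` instead of `parse()`. Note that `robots.txt` is advisory and does not replace checking the site's terms or the applicable law.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; for a live site use rp.set_url(".../robots.txt") then rp.read()
ROBOTS_TXT = """
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())


def allowed(url: str, user_agent: str = "my-scraper") -> bool:
    """Return True if the parsed robots.txt permits user_agent to fetch url."""
    return rp.can_fetch(user_agent, url)


def polite_delay(user_agent: str = "my-scraper", default: float = 1.0) -> float:
    """Seconds to wait between requests: the site's Crawl-delay if set, else a default."""
    delay = rp.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


for url in ["https://example.com/jobs?page=1", "https://example.com/private/data"]:
    if allowed(url):
        print("OK to fetch:", url)
        # ... fetch and parse the page here ...
        time.sleep(polite_delay())  # throttle before the next request
    else:
        print("Disallowed by robots.txt:", url)
```

Falling back to a fixed default delay when no `Crawl-delay` is declared keeps the scraper polite even on sites that say nothing about request rates.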

```python
import pandas as pd
import requests
import json
from bs4 import BeautifulSoup
```

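```python
import requests

# Browser-like User-Agent; some sites reject the default python-requests User-Agent
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36'
}

url = 'https://www.ambitionbox.com/list-of-companies?page=1'

try:
    # timeout is in seconds; 1000 s would let a stalled request hang for about 17 minutes
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # raise HTTPError for 4xx/5xx responses
    print(response.text[:500])   # preview the first 500 characters of the HTML
except requests.exceptions.Timeout:
    # Covers both connect and read timeouts
    print("Request timed out. Try increasing the timeout or checking your connection.")
except requests.exceptions.RequestException as e:
    print("Request failed:", e)
```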


Request failed: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
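The `ConnectionResetError` above means the remote host closed the connection before a response arrived. If the cause is a transient network fault, retrying with backoff can help. The sketch below mounts a `requests` `HTTPAdapter` configured with a `urllib3` `Retry` policy; the helper name `make_session` and the retry counts and status codes are assumptions chosen for illustration, not values the site documents.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries: int = 3, backoff: float = 1.0) -> requests.Session:
    """Hypothetical helper: a Session that retries transient failures with exponential backoff."""
    retry = Retry(
        total=total_retries,
        connect=total_retries,   # retries for failures while establishing the connection
        read=total_retries,      # retries for failures after the request was sent (e.g. resets)
        backoff_factor=backoff,  # sleep between attempts grows exponentially
        status_forcelist=[429, 500, 502, 503, 504],  # also retry on these HTTP status codes
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


session = make_session()
# Usage (performs a network request):
# response = session.get(url, headers=headers, timeout=10)
```

Retries only help with transient faults. If the server is deliberately closing connections from automated clients, which a reset on every attempt may indicate, repeating the request will not help, and the site's `robots.txt` and terms of service should be reviewed before trying anything else.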
