# Requests

The **requests** library is a widely used Python library for making HTTP requests. It provides a simple interface for sending requests, reading responses, and handling details of the HTTP protocol such as headers, query parameters, cookies, redirects, and errors.

In [1]:
import requests

In [2]:
response = requests.get("https://www.espncricinfo.com")

In [3]:
response

<Response [200]>

In [4]:
# A Response is truthy when its status code is below 400 (the same as response.ok)
if response:
    print("Successful")
else:
    print("An error occurred")

Successful
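`if response:` works because a `Response` object evaluates to the value of `response.ok`, which is `True` for any status code below 400. A quick offline check that builds `Response` objects by hand instead of fetching a page (`is_truthy` is a hypothetical helper for illustration):

```python
import requests

def is_truthy(status_code: int) -> bool:
    """Hypothetical helper: truthiness of a Response with the given status code."""
    response = requests.Response()      # built by hand, no network involved
    response.status_code = status_code
    return bool(response)               # Response.__bool__ returns response.ok

for code in (200, 301, 404, 500):
    print(code, is_truthy(code))
```

Note that `requests.get` follows redirects by default, so in practice a 3xx status usually only shows up when you pass `allow_redirects=False`.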


In [5]:
import requests
from requests.exceptions import HTTPError

In [6]:
try:
    # Note: 'espncricio' (not 'espncricinfo') does not resolve, so this request fails before any HTTP status exists
    response = requests.get("https://www.espncricio.com/invalid")

    # If the response was successful, no exception will be raised
    response.raise_for_status()
except HTTPError as http_err:
    print(f'HTTP error occurred: {http_err}')  # f-strings require Python 3.6+
except Exception as err:
    print(f'Other error occurred: {err}')  # f-strings require Python 3.6+
else:
    print('Success!')

Other error occurred: HTTPSConnectionPool(host='www.espncricio.com', port=443): Max retries exceeded with url: /invalid (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002374066E5F0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
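The error above is a connection failure (`getaddrinfo failed` means the DNS lookup failed), which `requests` raises as `requests.exceptions.ConnectionError` rather than `HTTPError`, so the generic `except Exception` branch handled it. Both exceptions derive from `requests.exceptions.RequestException`. A minimal offline sketch that triggers the `HTTPError` branch from a hand-built 404 response (`check` is a hypothetical helper):

```python
import requests
from requests.exceptions import HTTPError, RequestException

# Both failure modes share a base class, so `except RequestException` catches either.
print(issubclass(HTTPError, RequestException))                            # True
print(issubclass(requests.exceptions.ConnectionError, RequestException))  # True

def check(status_code: int) -> str:
    """Hypothetical helper: mirror the try/except/else above on a hand-built response."""
    response = requests.Response()
    response.status_code = status_code  # no network involved
    try:
        response.raise_for_status()     # raises HTTPError for 4xx and 5xx codes
    except HTTPError as http_err:
        return f"HTTP error occurred: {http_err}"
    return "Success!"

print(check(200))
print(check(404))
```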


In [7]:
response = requests.get("https://www.espncricinfo.com")

In [None]:
response.content

In [None]:
response.text

## Headers
Response headers are part of the HTTP response sent by a server in response to an HTTP request made by a client (e.g., a web browser or an API client). <br>
Response headers typically include metadata about the response, such as the content type of the response payload, the length of the content, the server type, caching directives, cookies, and more.

In [15]:
response.headers

{'Content-Type': 'text/html; charset=utf-8', 'x-hsci-cache-time': '2023-07-06T13:40:06.140Z', 'Content-Encoding': 'gzip', 'Content-Length': '58533', 'Expires': 'Thu, 06 Jul 2023 13:40:42 GMT', 'Cache-Control': 'max-age=0, no-cache, no-store', 'Pragma': 'no-cache', 'Date': 'Thu, 06 Jul 2023 13:40:42 GMT', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'Set-Cookie': 'country=pk; path=/, edition=espncricinfo-en-pk; expires=Thu, 13-Jul-2023 13:40:42 GMT; path=/, edition-view=espncricinfo-en-pk; expires=Thu, 13-Jul-2023 13:40:42 GMT; path=/, region=unknown; expires=Thu, 13-Jul-2023 13:40:42 GMT; path=/, _dcf=1; expires=Thu, 13-Jul-2023 13:40:42 GMT; path=/, connectionspeed=full; expires=Thu, 13-Jul-2023 13:40:42 GMT; path=/, SWID=9b773a7a-5bf5-4c00-b6ae-172195d51c2f; expires=Wed, 01-Jul-2043 13:40:42 GMT; path=/; domain=.espncricinfo.com', 'strict-transport-security': 'max-age=15552000'}

In [16]:
response.headers['content-Length']

'58533'
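The lookup above uses `'content-Length'` yet still matches the `Content-Length` header, because `response.headers` is a `requests.structures.CaseInsensitiveDict`. A small offline sketch using values copied from the output above:

```python
from requests.structures import CaseInsensitiveDict

# Keys compare case-insensitively, but iteration keeps the spelling the server sent.
headers = CaseInsensitiveDict({
    "Content-Type": "text/html; charset=utf-8",
    "Content-Length": "58533",
})

print(headers["content-length"])         # same entry as headers["Content-Length"]
print(headers.get("CONTENT-TYPE"))
print(list(headers))                     # original key casing is preserved
print(headers.get("X-Missing", "n/a"))   # .get() avoids a KeyError for absent headers
```

Header values are always strings, so convert with `int()` when you need a number. Also, because this response was sent with `Content-Encoding: gzip`, `Content-Length` reports the compressed size, which generally differs from `len(response.content)` after `requests` decompresses the body.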

In [17]:
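import os
import requests

url = "https://dad-jokes.p.rapidapi.com/random/joke"

# Read the API key from an environment variable (hypothetical name) instead of hard-coding a secret
headers = {
    "X-RapidAPI-Key": os.environ["RAPIDAPI_KEY"],
    "X-RapidAPI-Host": "dad-jokes.p.rapidapi.com"
}

response = requests.get(url, headers=headers)
response.raise_for_status()  # fail loudly on 4xx/5xx before trying to parse JSON

response_json = response.json()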

In [18]:
response_json

{'success': True,
 'body': [{'_id': '60dd35be386902dbcec7c4a2',
   'setup': 'I think the Rainforest Cafe takes the whole rainforest theme too far.',
   'punchline': 'This one time I was sitting there eating my chicken tenders and they bulldozed 40% of the restaurant.',
   'type': 'cafe',
   'likes': [],
   'author': {'name': 'unknown', 'id': None},
   'approved': True,
   'date': 1618108661,
   'NSFW': False,
   'shareableLink': 'https://dadjokes.io/joke/60dd35be386902dbcec7c4a2'}]}
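To check which headers a request will carry without spending an API call, you can build a `requests.Request` and call `.prepare()`, which assembles the method, URL, and headers without sending anything. A sketch, assuming the key lives in a hypothetical `RAPIDAPI_KEY` environment variable (a placeholder is used if it is unset):

```python
import os
import requests

url = "https://dad-jokes.p.rapidapi.com/random/joke"
headers = {
    # Hypothetical env var name; the placeholder keeps the sketch runnable offline.
    "X-RapidAPI-Key": os.environ.get("RAPIDAPI_KEY", "<your-key>"),
    "X-RapidAPI-Host": "dad-jokes.p.rapidapi.com",
}

# prepare() builds the final request object but does not send it.
prepared = requests.Request("GET", url, headers=headers).prepare()

print(prepared.method, prepared.url)
print(prepared.headers["x-rapidapi-host"])  # request headers are case-insensitive too
```

To actually send it, pass it to `requests.Session().send(prepared)`. A session may add its own default headers (such as `User-Agent`) at that point, so the prepared headers show only what you supplied.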

# grequests

The **grequests** library lets you send multiple HTTP requests concurrently through the familiar **requests** API. Instead of waiting for each response before sending the next request, it overlaps the waiting time of all requests, which is usually much faster than sending them sequentially. The "g" in **grequests** stands for **gevent**, the coroutine library it uses under the hood for asynchronous execution.<br>
**grequests** is most useful when an application needs to fetch many URLs at once and network latency, rather than processing, is the bottleneck.

In [2]:
import grequests

In [3]:
# Build the search-results URLs for pages 1-3
urls = [
    f"https://www.fiverr.com/search/gigs?query=web%20development&source=pagination&ref_ctx_id=2735480b6e13e6ba588c6b695fc3b777&search_in=everywhere&search-autocomplete-original-term=web%20development&page={x}&offset=-3"
    for x in range(1, 4)
]

In [4]:
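# Create the (unsent) requests, then send them concurrently.
# map() returns responses in the same order as the requests; failed requests come back as None.
reqs = [grequests.get(link) for link in urls]
responses = grequests.map(reqs)

responses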

[<Response [403]>, <Response [403]>, <Response [403]>]
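`403 Forbidden` means the server understood the requests but refused to serve them. Separately, instead of hand-assembling the query string, you can pass a `params` dict and let `requests` encode it; `grequests.get` forwards keyword arguments to `requests`, so `params=` works there as well. A sketch using only a subset of the original query parameters (whether Fiverr needs the others is an assumption), inspected offline with `.prepare()`; `page_url` is a hypothetical helper:

```python
import requests

base_url = "https://www.fiverr.com/search/gigs"

def page_url(page: int) -> str:
    """Hypothetical helper: build one search-results URL via requests' params encoding."""
    params = {
        "query": "web development",  # the space is encoded as '+'
        "source": "pagination",
        "search_in": "everywhere",
        "page": page,
    }
    # prepare() only builds the request; nothing is sent over the network.
    return requests.Request("GET", base_url, params=params).prepare().url

urls = [page_url(page) for page in range(1, 4)]
for url in urls:
    print(url)
```

`requests` encodes spaces in query values as `+`, while the original URLs use `%20`; most servers decode both the same way in a query string, though that is not guaranteed.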

# Beautiful Soup

Beautiful Soup is a Python library used for web scraping and parsing HTML and XML documents. It provides an easy-to-use and powerful interface to extract data from web pages. Beautiful Soup allows you to navigate and search the HTML/XML tree structure, extract specific elements or data, and manipulate the parsed data.<br>

Here are some key features and functionalities of Beautiful Soup:<br>

* Parsing HTML/XML: Beautiful Soup takes raw HTML or XML content and parses it into a tree-like structure, allowing you to navigate and search the document easily. It can handle imperfect and poorly formatted markup and provides methods to find elements based on tags, attributes, and CSS selectors.

* Accessing Elements: Once the document is parsed, Beautiful Soup provides several methods to access specific elements or groups of elements within the document. You can access elements by tag name, CSS class, ID, or attribute values.

* Navigating the Tree Structure: Beautiful Soup represents the document as a tree structure, allowing you to traverse and navigate the elements. You can move up and down the tree, access parent and sibling elements, and find children and descendants.

* Searching and Filtering: Beautiful Soup provides powerful search and filtering capabilities to extract specific elements or data from the document. You can search for elements based on various criteria, such as tag names, attributes, or text content. It supports CSS selectors, regular expressions, and advanced filtering methods.

* Modifying and Manipulating Data: Beautiful Soup allows you to modify the parsed data by adding, removing, or modifying elements, attributes, or text content. You can extract data, clean it, and transform it according to your needs.

* Choice of Parser: Beautiful Soup works with several parsers: Python's built-in html.parser and the third-party lxml and html5lib. They trade off speed, leniency toward broken markup, and installation requirements, so you can choose the one that best fits your use case.
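The features listed above can be tried on a small inline document, so no network request is needed. A sketch covering access by tag, id, class and CSS selector, tree navigation, attribute filtering, and in-place modification (the HTML is made up for illustration):

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>Demo page</title></head>
<body>
  <div id="main">
    <h1 class="headline">Data Structures</h1>
    <ul class="topics">
      <li><a href="/array">Array</a></li>
      <li><a href="/stack">Stack</a></li>
      <li><a href="https://example.com/queue">Queue</a></li>
    </ul>
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")  # built-in parser, no extra install

# Accessing elements: by tag name and CSS class, by id, or with a CSS selector.
headline = soup.find("h1", class_="headline")
main = soup.find(id="main")
links = soup.select("ul.topics a")

# Navigating the tree: from a link up to its <li>, then across to the next <li>.
first_item = links[0].parent
next_item = first_item.find_next_sibling("li")  # skips whitespace text nodes

# Searching and filtering on attribute values with a function.
absolute = soup.find_all("a", href=lambda h: h and h.startswith("https://"))

# Modifying the parse tree in place.
headline.string = "Data Structures (edited)"

print(headline.get_text())
print(main.name)
print([a.get_text() for a in links])
print(next_item.a["href"])
print([a["href"] for a in absolute])
```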

In [12]:
import requests
URL = "https://www.geeksforgeeks.org/data-structures/"
r = requests.get(URL)
print(r)

<Response [200]>


In [6]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(r.content, 'html.parser')

In [7]:
soup.prettify()



In [8]:
soup.find('title').text

'Data Structures - GeeksforGeeks'
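`find()` returns `None` when no tag matches, so chaining `.text` directly raises `AttributeError` on a page without a `<title>`. A defensive sketch on inline HTML (`page_title` is a hypothetical helper):

```python
from typing import Optional

from bs4 import BeautifulSoup

def page_title(html: str) -> Optional[str]:
    """Hypothetical helper: return the stripped <title> text, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("title")  # None when the document has no <title>
    return title.get_text(strip=True) if title is not None else None

print(page_title("<html><head><title>\n  Data Structures - GeeksforGeeks\n</title></head></html>"))
print(page_title("<html><body><p>No title here</p></body></html>"))
```

`soup.title` is a shorthand for `soup.find('title')` and behaves the same way, returning `None` when the tag is missing.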

In [9]:
r = requests.get("https://www.geeksforgeeks.org/how-to-automate-an-excel-sheet-in-python/?ref=feed")

htmldata = r.text
soup = BeautifulSoup(htmldata, 'html.parser')

# Print the text of every <p> element on the page
for paragraph in soup.find_all("p"):
    print(paragraph.get_text())

Before you read this article and learn automation in Python….let’s watch a video of Christian Genco (a talented programmer and an entrepreneur) explaining the importance of coding by taking the example of automation.
You might have laughed loudly after watching this video and you surely, you might have understood the importance of automation in real life as well. Let’s come to the topic now…
We all know that Python is ruling all over the world, and we also know that Python is beginner’s friendly and it’s easy to learn in comparison to other languages. One of the best things you can do with Python is Automation. 

Consider a scenario that you’re asked to create an account on a website for 30,000 employees. How would you feel? Surely you will be frustrated doing this task manually and repeatedly. Also, this is going to take too much time which is not a smart decision. 
Now just imagine the life of employees who are into the data entry jobs. Their job is to take the data from tables such 
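`soup.find_all("p")` collects every paragraph on the page, including ones in navigation bars, sidebars, or footers. Narrowing the search to the article's container usually yields cleaner text. In this sketch the container selector `div.article-body` and the HTML are hypothetical; inspect the real page to find the right selector.

```python
from bs4 import BeautifulSoup

html = """
<body>
  <nav><p>Menu text</p></nav>
  <div class="article-body">
    <p>First paragraph of the article.</p>
    <p>  Second paragraph.  </p>
  </div>
  <footer><p>Footer text</p></footer>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

# Hypothetical container class; fall back to the whole document if it is missing.
container = soup.select_one("div.article-body") or soup

# strip=True trims surrounding whitespace from each paragraph's text.
paragraphs = [p.get_text(strip=True) for p in container.find_all("p")]
print("\n\n".join(paragraphs))
```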

In [10]:
import re

response = requests.get("https://stackoverflow.com/questions")
html_document = response.text

soup = BeautifulSoup(html_document, 'html.parser')

# Print every link whose href starts with "https://"
for link in soup.find_all('a', attrs={'href': re.compile("^https://")}):
    print(link.get('href'))

https://stackoverflow.com
https://stackoverflow.co/
https://stackoverflow.co/teams/
https://stackoverflow.co/teams/
https://stackoverflow.co/talent/
https://stackoverflow.co/advertising/
https://stackoverflow.co/labs/
https://stackoverflow.co/
https://stackoverflow.com
https://stackoverflow.com
https://stackoverflow.com/help
https://chat.stackoverflow.com/?tab=site&host=stackoverflow.com
https://meta.stackoverflow.com
https://stackoverflow.com/users/signup?ssrc=site_switcher&returnurl=https%3a%2f%2fstackoverflow.com%2fquestions
https://stackoverflow.com/users/login?ssrc=site_switcher&returnurl=https%3a%2f%2fstackoverflow.com%2fquestions
https://stackexchange.com/sites
https://stackoverflow.blog
https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2fquestions
https://stackoverflow.com/users/signup?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2fquestions
https://stackoverflow.com/jobs/companies?so_medium=stackoverflow&so_source=SiteNav
https://
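The output repeats several links (for example `https://stackoverflow.co/teams/`), and the regex would skip any relative links such as `href="/questions/ask"`. A sketch that uses the CSS attribute selector `a[href^="https://"]` as an alternative to the regex, resolves relative links with `urllib.parse.urljoin`, and removes duplicates while preserving order; the inline HTML stands in for the live page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = "https://stackoverflow.com/questions"
html = """
<a href="https://stackoverflow.co/teams/">Teams</a>
<a href="https://stackoverflow.co/teams/">For Teams</a>
<a href="/questions/ask">Ask Question</a>
<a href="https://stackoverflow.blog">Blog</a>
<a>No href</a>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS attribute selector: only <a> tags whose href starts with "https://".
absolute_links = [a["href"] for a in soup.select('a[href^="https://"]')]

# Resolve every href (relative ones included) against the page URL, then drop
# duplicates while keeping first-seen order (dicts preserve insertion order).
all_links = list(dict.fromkeys(
    urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)
))

print(absolute_links)
print(all_links)
```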