# Introduction to Web Scraping with Python

Welcome to this Jupyter Notebook tutorial on web scraping with Python! In this guide, I explore the basics of web scraping using the Python programming language and two powerful libraries: **requests** for making HTTP requests and **BeautifulSoup** for parsing HTML content.

Web scraping is a technique used to extract information from websites. It involves sending HTTP requests to a website, retrieving the HTML content, and then parsing and extracting relevant data. This can be incredibly useful for tasks such as data collection, content aggregation, or automating repetitive tasks.

In this notebook, I walk through a series of examples that show how to use Python to scrape data from different websites. I use the requests library to make HTTP requests and the BeautifulSoup library to parse and navigate HTML documents. The examples cover various aspects of web scraping, from accessing a website and checking the response status to extracting specific elements based on HTML tags and attributes.

Before proceeding with the examples, make sure the necessary libraries are installed in your Python environment by running the following commands:

**!pip install requests** <br>
**!pip install beautifulsoup4** <br>
**!pip install lxml**

## Loading Libraries

In [23]:
import requests as rq
from bs4 import BeautifulSoup as bs

## GET request to the Amazon India website

In [24]:
url="https://www.amazon.in/"
r=rq.get(url) # request to access website
print(r.status_code) # check response status

503
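
A 503 status is a server-side error, so the body returned with it may be an error page rather than the normal home page. The sketch below shows one way to interpret a status code and to fail early on an error response. The helper names `describe_status` and `fetch_html`, the `DEFAULT_HEADERS` value, and the 10-second timeout are my own illustrative choices. Whether a browser-like `User-Agent` header changes the response of any particular site is an assumption that this notebook does not verify.

```python
import requests as rq

# Hypothetical header value; any descriptive User-Agent string could be used.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; scraping-tutorial)"}


def describe_status(code):
    """Map an HTTP status code to a coarse category."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "unknown"


def fetch_html(url, timeout=10):
    """Return the page HTML, raising requests.HTTPError on a 4xx/5xx response."""
    response = rq.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    response.raise_for_status()  # stop here instead of parsing an error page
    return response.text


print(describe_status(503))  # server error
```

Calling `fetch_html(url)` inside a `try`/`except rq.HTTPError` block keeps a failed request from silently feeding an error page into the parser.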


In [27]:
print(r.text) # print the full HTML body of the response

<!DOCTYPE html>
<!--[if lt IE 7]> <html lang="en-us" class="a-no-js a-lt-ie9 a-lt-ie8 a-lt-ie7"> <![endif]-->
<!--[if IE 7]>    <html lang="en-us" class="a-no-js a-lt-ie9 a-lt-ie8"> <![endif]-->
<!--[if IE 8]>    <html lang="en-us" class="a-no-js a-lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!-->
<html class="a-no-js" lang="en-us"><!--<![endif]--><head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title dir="ltr">Amazon.in</title>
<meta name="viewport" content="width=device-width">
<link rel="stylesheet" href="https://images-na.ssl-images-amazon.com/images/G/01/AUIClients/AmazonUI-3c913031596ca78a3768f4e934b1cc02ce238101.secure.min._V1_.css">
<script>

if (true === true) {
    var ue_t0 = (+ new Date()),
        ue_csm = window,
        ue = { t0: ue_t0, d: function() { return (+new Date() - ue_t0); } },
        ue_furl = "fls-eu.amazon.in",
        ue_mid = "A21TJRUUN4KGV",
      

## Pulling data out of HTML and XML

In [28]:
site=bs(r.text, "lxml")
print(site)

<!DOCTYPE html>
<!--[if lt IE 7]> <html lang="en-us" class="a-no-js a-lt-ie9 a-lt-ie8 a-lt-ie7"> <![endif]--><!--[if IE 7]>    <html lang="en-us" class="a-no-js a-lt-ie9 a-lt-ie8"> <![endif]--><!--[if IE 8]>    <html lang="en-us" class="a-no-js a-lt-ie9"> <![endif]--><!--[if gt IE 8]><!--><html class="a-no-js" lang="en-us"><!--<![endif]--><head>
<meta content="text/html; charset=utf-8" http-equiv="content-type"/>
<meta charset="utf-8"/>
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
<title dir="ltr">Amazon.in</title>
<meta content="width=device-width" name="viewport"/>
<link href="https://images-na.ssl-images-amazon.com/images/G/01/AUIClients/AmazonUI-3c913031596ca78a3768f4e934b1cc02ce238101.secure.min._V1_.css" rel="stylesheet"/>
<script>

if (true === true) {
    var ue_t0 = (+ new Date()),
        ue_csm = window,
        ue = { t0: ue_t0, d: function() { return (+new Date() - ue_t0); } },
        ue_furl = "fls-eu.amazon.in",
        ue_mid = "A21TJRUUN4KGV",
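
Printing the whole parsed document is hard to read, so here is a minimal sketch that parses a small inline snippet and pulls out a single element instead. The HTML string is a made-up fragment modelled on the output above, and I use Python's built-in `"html.parser"` here rather than `"lxml"` so that the example needs no extra parser package.

```python
from bs4 import BeautifulSoup as bs

# Made-up snippet modelled on the start of the page printed above.
html = """<!DOCTYPE html>
<html lang="en-us"><head>
<meta charset="utf-8">
<title dir="ltr">Amazon.in</title>
</head><body><p>Hello</p></body></html>"""

site = bs(html, "html.parser")

title_text = site.title.string  # text inside the <title> tag
print(title_text)               # Amazon.in
print(site.title.attrs)         # attributes of the <title> tag as a dict
print(site.prettify())          # indented view of the parsed tree
```

`prettify()` returns the same tree with one tag per line, which is usually easier to scan than `print(site)`.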
     

## Navigating from an outer tag to an inner tag

In [30]:
print(site.div.script) # first <script> tag inside the first <div> tag

<script>
           if (true === true) {
             document.write('<img src="https://fls-eu.amaz'+'on.in/'+'1/oc-csi/1/OP/requestId=KA3QGXQ3N2DXJPQ0CD5S&js=1" />');
           };
          </script>
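
Dotted access such as `site.div.script` returns only the first matching tag at each step. The sketch below illustrates this on a made-up HTML snippet (the `id` values and script contents are hypothetical), again using the built-in `"html.parser"` as an assumption so that no extra parser is needed.

```python
from bs4 import BeautifulSoup as bs

# Made-up snippet: an outer <div> holding two <script> tags at different depths.
html = """
<div id="outer">
  <p>intro</p>
  <script>var first = 1;</script>
  <div id="inner"><script>var second = 2;</script></div>
</div>
"""

site = bs(html, "html.parser")

first_script = site.div.script               # first <script> inside the first <div>
all_scripts = site.div.find_all("script")    # every <script> inside the first <div>
missing = site.div.table                     # no <table> exists, so this is None

print(first_script.string)  # var first = 1;
print(len(all_scripts))     # 2
print(missing)              # None
```

Because a missing tag yields `None`, chaining further (for example `site.div.table.tr`) would raise an `AttributeError`, so it is worth checking for `None` before going deeper.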


## Printing a tag's attributes

In [34]:
tag=site.div
print(tag.attrs) # .attrs holds the tag's attributes as a dict
print(tag.attrs["class"]) # access a single attribute like a dict key

{'class': ['a-container', 'a-padding-double-large'], 'style': 'min-width:350px;padding:44px 0 !important'}
['a-container', 'a-padding-double-large']
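
As the output shows, `class` comes back as a list because a tag can carry several classes, while `style` is a plain string. The sketch below reproduces this on an inline snippet based on that output, and also shows `get()` as a way to read an attribute that may be absent. The snippet itself and the use of `"html.parser"` are my assumptions for the example.

```python
from bs4 import BeautifulSoup as bs

# Inline snippet modelled on the <div> printed above.
html = '<div class="a-container a-padding-double-large" style="min-width:350px">text</div>'

site = bs(html, "html.parser")
tag = site.div

classes = tag["class"]              # multi-valued attribute: a list of class names
style = tag.get("style")            # single-valued attribute: a string
missing = tag.get("id")             # absent attribute: None instead of a KeyError
has_style = tag.has_attr("style")   # True if the attribute is present

print(classes)
print(style)
print(missing)
print(has_style)
```

Using `tag["id"]` on this tag would raise a `KeyError`, which is why `get()` is the safer choice when an attribute is optional.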


## Locate elements within an HTML document

In [35]:
r=rq.get("https://www.airbnb.co.uk/")
print(r)

<Response [200]>


In [36]:
site=bs(r.text, "lxml")
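# find() returns the first matching tag; attributes can be passed as a dict
tag=site.div.find("a", {"class":"screen-reader-only screen-reader-only-focusable skip-to-content"})
# find_all() returns a list of every match; class_ is the keyword form (this reassigns tag)
tag=site.div.find_all("a", class_="screen-reader-only screen-reader-only-focusable skip-to-content")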
print(tag)

[<a class="screen-reader-only screen-reader-only-focusable skip-to-content" data-hook="skip-to-content" href="#site-content" tabindex="0">Skip to content</a>]


In [37]:
print(len(tag))

1
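
To make the difference between `find` and `find_all` concrete, here is a small sketch on a made-up snippet (the links and class names are hypothetical). It also shows that a single class name matches any tag carrying that class, and that a CSS selector passed to `select` is one way to require several classes at once regardless of their order.

```python
from bs4 import BeautifulSoup as bs

# Made-up snippet with three links, two of which share a class.
html = """
<div>
  <a class="screen-reader-only skip-to-content" href="#site-content">Skip to content</a>
  <a class="nav-link" href="/help">Help</a>
  <a class="nav-link skip-to-content" href="#footer">Skip to footer</a>
</div>
"""

site = bs(html, "html.parser")

first = site.find("a", class_="skip-to-content")         # first match only (a Tag)
all_skip = site.find_all("a", class_="skip-to-content")  # every match (a list)
both = site.select("a.screen-reader-only.skip-to-content")  # must carry both classes
none_found = site.find("a", class_="does-not-exist")     # no match: None

print(first["href"])
print([a.get_text() for a in all_skip])
print(len(both))
print(none_found)
```

`find` gives back a single tag or `None`, while `find_all` and `select` always give back a list, which may be empty.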


In [38]:
url="https://webscraper.io/test-sites/e-commerce/allinone"
r=rq.get(url)
site=bs(r.text, "lxml")
tag=site.div
find=tag.find(class_="img-responsive") # first element inside tag with class "img-responsive" (None if absent)
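
Once elements are located by class, the usual next step is to read an attribute from each of them. The sketch below does this for image sources on a made-up product listing. The markup, file paths, and the `thumbnail` and `title` class names are hypothetical and are not taken from the real page, and `"html.parser"` is used as an assumption so that the example needs no extra parser.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup as bs

# Made-up listing; the real page's markup may differ.
html = """
<div class="thumbnail">
  <img class="img-responsive" src="/images/item-1.png" alt="item 1">
  <a class="title" href="/product/1">Item 1</a>
</div>
<div class="thumbnail">
  <img class="img-responsive" src="/images/item-2.png" alt="item 2">
  <a class="title" href="/product/2">Item 2</a>
</div>
"""

base_url = "https://webscraper.io/test-sites/e-commerce/allinone"
site = bs(html, "html.parser")

# Collect every element with the class, then read its src attribute.
images = site.find_all(class_="img-responsive")
sources = [img.get("src") for img in images]

# Relative paths can be turned into absolute URLs against the page URL.
absolute = [urljoin(base_url, src) for src in sources]

print(sources)
print(absolute)
```

Searching by class alone avoids assuming which tag name carries the class. Passing a name as well, for example `find_all("img", class_="img-responsive")`, narrows the search when that is known.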