# Beautiful Soup

Beautiful Soup is a Python library for web scraping: it pulls data out of HTML and XML files. It builds a parse tree from the page source, which lets you navigate the document and extract data hierarchically and in a more readable way.
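As a rough illustration of what a parse tree gives you, the sketch below parses a tiny HTML string and moves around it. The markup is invented for this example, and the built-in html.parser is assumed so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, invented purely for illustration.
html = (
    "<html><head><title>Demo page</title></head>"
    "<body><p class='intro'>Hello, <b>world</b>!</p></body></html>"
)

# Build the parse tree with Python's built-in parser.
soup = BeautifulSoup(html, "html.parser")

# Reach a tag directly by name and read its text.
title_text = soup.title.string

# Walk upwards: the title's parent tag is the head.
parent_name = soup.title.parent.name

# Search downwards by tag name and CSS class.
paragraph = soup.find("p", class_="intro")
paragraph_text = paragraph.get_text()

# List every tag nested inside the body, in document order.
body_tags = [tag.name for tag in soup.body.find_all(True)]

print(title_text)
print(parent_name)
print(paragraph_text)
print(body_tags)
```

The same attribute access and find calls work on a real page once its source has been downloaded.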

It was created by Leonard Richardson, who still contributes to the project. The project is additionally supported by Tidelift, a paid subscription service for open-source maintenance.

To read more about it, please refer to [this](https://analyticsindiamag.com/beautiful-soup-webscraping-python/) article.

# Code Implementation

## Installation

Beautiful Soup is installed with pip. The command below also installs requests, which is used to download pages, and the lxml and html5lib parsers that Beautiful Soup can use:



In [1]:
!python -m pip install pip --upgrade --user 
!python -m pip install beautifulsoup4 requests lxml html5lib --user -q
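One possible way to confirm the installation worked is to import the package and try each parser on a trivial string. This is only a sketch: the `available` dictionary is a name made up for this example, and it assumes that Beautiful Soup raises `bs4.FeatureNotFound` when a requested parser is not installed.

```python
import bs4
from bs4 import BeautifulSoup

# Record which parsers can actually be used in this environment.
available = {}
for parser in ("html.parser", "lxml", "html5lib"):
    try:
        BeautifulSoup("<p>ok</p>", parser)
        available[parser] = True
    except bs4.FeatureNotFound:
        # Raised when the requested parser library is missing.
        available[parser] = False

print("bs4 version:", bs4.__version__)
print(available)
```

If lxml or html5lib shows up as unavailable, rerunning the pip command above should fix it; html.parser ships with Python itself.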



## Quickstart

A short example to see Beautiful Soup in action: we download the page source of demoblaze and print the parsed tree.

In [2]:
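from bs4 import BeautifulSoup
import requests

URL = "https://www.demoblaze.com/"
# Download the page; r.content holds the raw bytes of the response
r = requests.get(URL)
# Parse the HTML with the html5lib parser
soup = BeautifulSoup(r.content, 'html5lib')
# Print the parse tree, one tag per line with indentation
print(soup.prettify())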

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/>
  <meta content="" name="description"/>
  <meta content="" name="author"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <link href="favicon.ico" rel="icon"/>
  <title>
   STORE
  </title>
  <link href="node_modules/bootstrap/dist/css/bootstrap.min.css" rel="stylesheet"/>
  <link href="node_modules/video.js/dist/video-js.min.css" rel="stylesheet"/>
  <link href="css/latofonts.css" rel="stylesheet" type="text/css"/>
  <link href="css/latostyle.css" rel="stylesheet" type="text/css"/>
  <style>
   .navbar-toggler {
      z-index: 1;
    }

    @media (max-width: 800px) {
      #carouselExampleIndicators {
        display: none;
      }
    }

    /* Temporary fix for img-fluid sizing within the carousel */

    .carousel-item.active,
    .carousel-item-next,
    .carousel-item-prev {
      display: block;
    }

    body 

Here “.prettify()” is a built-in method provided by Beautiful Soup. It returns a formatted string of the parsed page source: each tag is placed on its own line and indented according to its depth in the parse tree, which makes the markup easier to read.
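To see the effect on something smaller than a full page, the sketch below runs “.prettify()” on a one-line HTML string. The markup is invented for illustration, and html.parser is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line markup, invented for illustration.
html = "<div><p>One</p><p>Two</p></div>"

soup = BeautifulSoup(html, "html.parser")

# prettify() returns a string; it does not modify the soup itself.
pretty = soup.prettify()
lines = pretty.splitlines()

print(pretty)
```

The printed result spreads the single input line over several lines, with nested tags and their text indented below their parent.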

## Let’s Extract Some Data!

Our goal is to scrape the titles of all articles from the Analytics India Magazine homepage.
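The lookup pattern this relies on can be tried offline first. The sketch below assumes each title sits in a span inside a div with the class post-title; the HTML string and the `titles` list are invented for illustration and are not the real homepage markup, which may change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page with article blocks.
html = """
<div class="post-title"><a href="#"><span>First headline</span></a></div>
<div class="post-title"><a href="#"><span>Second headline</span></a></div>
<div class="post-title"><a href="#">No span here</a></div>
<div class="sidebar"><span>Not a title</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

titles = []
# find_all returns every div whose class attribute includes "post-title".
for block in soup.find_all("div", class_="post-title"):
    span = block.find("span")
    # find returns None when no span exists, so guard before reading text.
    if span is not None:
        titles.append(span.get_text(strip=True))

print(titles)
```

The None check is a defensive choice: on a live page, a block without the expected span would otherwise raise an AttributeError.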

In [5]:
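# importing modules
from bs4 import BeautifulSoup
import requests

# Download the homepage and parse it with the lxml parser
r = requests.get('https://analyticsindiamag.com/')
soup = BeautifulSoup(r.content, 'lxml')
# Each article title sits in a div with the class "post-title"
article_block = soup.find_all('div', class_='post-title')

for titles in article_block:
    # The title text is inside the first span of the block
    title = titles.find('span').get_text()
    print(title)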

Can AI Be A Good Teammate?
Google Announces New Updates For Data Studio; Adds Google Maps For Embedded Reports
TensorFlow 2.7.0 Released: All Major Updates & Features
Github Appoints Thomas Dohmke As New CEO
On Making AI Research More Lucrative In India
Infosys Partners With Shell To Launch AI-Based Shell Inventory Optimizer Solution
Planning to Leverage Open Source? Go Ahead! Here’s Why
HCL Technologies To Hire 10,000 Professionals For AWS Business Unit
Using Knowledge Distillation On Augmented Graph Convolutional Networks To Detect Money Laundering In Bitcoin Transactions
Get Ready for the Year’s Most Prestigious Hackathon Presented by Genpact
Shiv Nadar University Delhi-NCR Launches An Annual Championship To Recognize Future Business Leaders In Data Analytics
Latest Technology Innovation To Watch Out For At NVIDIA GTC 2021
Nominations Open For 40Under40 Data Scientists Awards 2022
Why IIM Calcutta’s Advanced Programme In Data Sciences Is The Right Choice For You
Planning to Leverage