
🔹 What is requests?

requests is a Python library used to send HTTP requests to a web server and receive responses (like HTML content, JSON, images, etc.).

In web scraping, we mainly use it to:

Fetch a webpage's HTML source

Send GET and POST requests

Add headers (like User-Agent)

Handle status codes, cookies, and sessions
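The uses listed above can be combined into one small helper. This is only a minimal sketch, not code from this notebook: the name `fetch_html`, the `session` parameter, the User-Agent string, and the 10-second timeout are assumptions chosen for illustration.

```python
import requests

# Hypothetical User-Agent string; replace it with one that identifies your project.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; learning-scraper)"}


def fetch_html(url, session=None, timeout=10):
    """Return the HTML of `url`, raising an exception on HTTP error codes."""
    # A Session reuses connections and keeps cookies between requests.
    session = session or requests.Session()
    response = session.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses,
    # so a failed download is not parsed as if it were a real page.
    response.raise_for_status()
    return response.text
```

A call such as `fetch_html("https://en.wikipedia.org/wiki/Mahatma_Gandhi")` would then return the page source as a string, or raise if the server answered with an error status.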

🔹 Why use requests for Web Scraping?

✔ Simple and beginner-friendly
✔ Faster than browser automation (Selenium)
✔ Ideal for static websites
✔ Works perfectly with BeautifulSoup

In [1]:
pip install requests




In [2]:
pip install requests beautifulsoup4




In [3]:
import requests
from bs4 import BeautifulSoup


In [4]:
url = "https://en.wikipedia.org/wiki/Mahatma_Gandhi"

headers = {
    "User-Agent": "Mozilla/5.0"
}

response = requests.get(url, headers=headers)

print(response.status_code)


200


In [5]:
soup = BeautifulSoup(response.text, "html.parser")


In [6]:
title = soup.find("h1").text
print("Title:", title)


Title: Mahatma Gandhi


🔹 Step 6: Extract First 5 Paragraphs

In [7]:
count = 0
for p in soup.find_all("p"):
    text = p.text.strip()
    if text:
        print(text)
        print("-" * 50)
        count += 1
    if count == 5:
        break


Mohandas Karamchand Gandhi[c] (2 October 1869 – 30 January 1948)[2] was an Indian lawyer, anti-colonial nationalist, and political ethicist who employed nonviolent resistance to lead the successful campaign for India's independence from British rule. He inspired movements for civil rights and freedom across the world. The honorific Mahātmā (from Sanskrit, meaning great-souled, or venerable), first applied to him in South Africa in 1914, is used worldwide.[3]
--------------------------------------------------
Born and raised in a Hindu family in coastal Gujarat, Gandhi was trained in the law at the Inner Temple in London and was called to the bar at the age of 22. After two uncertain years in India, where he was unable to start a successful law practice, Gandhi moved to South Africa in 1893 to represent an Indian merchant in a lawsuit. He went on to live in South Africa for the next 21 years. Here, Gandhi raised a family and first employed nonviolent resistance in a campaign for 
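The counter-and-break loop above can also be written with `itertools.islice`, which stops after the first `n` non-empty paragraphs. The sketch below runs on a small inline HTML string instead of the live page; `first_paragraphs` and `soup_demo` are hypothetical names introduced only for this example.

```python
from itertools import islice

from bs4 import BeautifulSoup

# Tiny stand-in document: two of the five <p> tags are empty.
html = (
    "<div><p class='mw-empty-elt'> </p><p>First.</p><p></p>"
    "<p>Second.</p><p>Third.</p></div>"
)
soup_demo = BeautifulSoup(html, "html.parser")


def first_paragraphs(soup, n):
    """Return the text of the first n non-empty <p> elements."""
    texts = (p.get_text(strip=True) for p in soup.find_all("p"))
    non_empty = (t for t in texts if t)
    return list(islice(non_empty, n))


print(first_paragraphs(soup_demo, 2))  # prints ['First.', 'Second.']
```

Because the generators are lazy, no paragraph after the `n`-th non-empty one is processed.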

🔹 Step 7: Extract All Section Headings

In [8]:
for heading in soup.find_all(["h2", "h3"]):
    print(heading.text.replace("[edit]", ""))


Contents
Early life and background
Parents
Childhood
Marriage
Three years in London
Student of law
Vegetarianism and committee work
Called to the bar
Civil rights activist in South Africa (1893–1914)
Europeans, Indians and Africans
Struggle for Indian independence (1915–1947)
Role in World War I
Champaran agitations
Kheda agitations
Khilafat Movement
Non-co-operation
Salt Satyagraha (Salt March/Civil Disobedience Movement)
Gandhi as folk hero
Negotiations
Round Table Conferences
Congress politics
World War II and Quit India movement
Partition and independence
Death
Funeral and memorials
Principles, practices, and beliefs
Truth and Satyagraha
Nonviolence
Brahmacharya: abstinence from sex and food
Literary works
Legacy
Followers and international influence
Global days that celebrate Gandhi
Awards
Film, theatre, and literature
21st-century impact within India
Descendants
See also
Notes
References
General and cited references
External links
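The flat list above loses the relationship between sections and subsections. One possible way to keep it is to group each `h3` under the most recent `h2`. This is a sketch on inline HTML, and the `outline` dictionary and `soup_demo` names are assumptions made for the example.

```python
from bs4 import BeautifulSoup

# Small stand-in for the article structure; headings are taken from the output above.
html = (
    "<h2>Early life and background<span>[edit]</span></h2>"
    "<h3>Parents</h3><h3>Childhood</h3>"
    "<h2>Death</h2>"
)
soup_demo = BeautifulSoup(html, "html.parser")

outline = {}    # maps each h2 title to the list of h3 titles beneath it
current = None  # title of the h2 currently being filled
for heading in soup_demo.find_all(["h2", "h3"]):
    title = heading.get_text(strip=True).replace("[edit]", "")
    if heading.name == "h2":
        current = title
        outline[current] = []
    elif current is not None:
        outline[current].append(title)

print(outline)
```

`find_all` returns elements in document order, which is what makes the "most recent `h2`" bookkeeping work.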


🔹 Step 8: Extract Infobox Data (Key–Value)

In [9]:
info = {}

infobox = soup.find("table", class_="infobox")
if infobox:
    for row in infobox.find_all("tr"):
        th = row.find("th")
        td = row.find("td")
        if th and td:
            info[th.text.strip()] = td.text.strip()

for k, v in info.items():
    print(f"{k}: {v}")


Born: Mohandas Karamchand Gandhi(1869-10-02)2 October 1869Porbandar, Kathiawar Agency, India
Died: 30 January 1948(1948-01-30) (aged 78)New Delhi, India
Cause of death: Assassination by gunshot
Monuments: Raj Ghat, Delhi
Gandhi Smriti, New Delhi
Other names: Bāpū (father), Rāṣṭrapitā (the Father of the Nation)
Alma mater: Samaldas Arts College[a]University College London[b]Inns of Court School of Law
Occupations: Lawyeractivistpolitician
Years active: 1893–1948
Known for: Leadership of the campaign for India's independence from British ruleNonviolent resistance
Political party: Indian National Congress (1920–1934)
Spouse: Kasturba Gandhi
(m. 1883; died 1944)
Children: HarilalManilalRamdasDevdas
Parents: Karamchand GandhiPutlibai Gandhi
Relatives: Gandhi family
Preceded by: Maulana Azad
Succeeded by: Sarojini Naidu
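Values such as `Lawyeractivistpolitician` in the output above run together because `.text` joins nested strings with no separator. A possible remedy is `get_text()` with an explicit separator. The sketch below uses a hand-written table that only imitates an infobox; its markup is an assumption, not the real Wikipedia HTML.

```python
from bs4 import BeautifulSoup

# Hand-written imitation of an infobox; the last row has no <th> and is skipped.
html = """
<table class="infobox">
  <tr><th>Occupations</th>
      <td><ul><li>Lawyer</li><li>activist</li><li>politician</li></ul></td></tr>
  <tr><th>Children</th><td><div>Harilal</div><div>Manilal</div></td></tr>
  <tr><td>row without a header cell</td></tr>
</table>
"""
soup_demo = BeautifulSoup(html, "html.parser")

info_demo = {}
infobox = soup_demo.find("table", class_="infobox")
if infobox:
    for row in infobox.find_all("tr"):
        th = row.find("th")
        td = row.find("td")
        if th and td:
            # separator=", " is placed between nested strings; strip=True trims each one.
            key = th.get_text(" ", strip=True)
            info_demo[key] = td.get_text(", ", strip=True)

for k, v in info_demo.items():
    print(f"{k}: {v}")
```

On this sample the occupations come out as `Lawyer, activist, politician` instead of one fused word.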


🔹 Step 9: Extract All Wikipedia Links

In [10]:
links = []

for a in soup.find_all("a", href=True):
    link = a["href"]
    if link.startswith("/wiki/") and ":" not in link:
        links.append("https://en.wikipedia.org" + link)

print(links[:10])  # First 10 links


['https://en.wikipedia.org/wiki/Main_Page', 'https://en.wikipedia.org/wiki/Main_Page', 'https://en.wikipedia.org/wiki/Mahatma_Gandhi', 'https://en.wikipedia.org/wiki/Mahatma_Gandhi', 'https://en.wikipedia.org/wiki/Mahatma_Gandhi', 'https://en.wikipedia.org/wiki/Gandhi_(disambiguation)', 'https://en.wikipedia.org/wiki/Mah%C4%81tm%C4%81', 'https://en.wikipedia.org/wiki/Porbandar', 'https://en.wikipedia.org/wiki/Assassination_of_Mahatma_Gandhi', 'https://en.wikipedia.org/wiki/Raj_Ghat_and_associated_memorials']
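The first ten links above contain repeats (`Main_Page` twice, `Mahatma_Gandhi` three times). A short sketch of one way to drop duplicates while keeping the original order, using `dict.fromkeys` and `urllib.parse.urljoin`; the `hrefs` list is made-up sample input standing in for the `href` values collected from the page.

```python
from urllib.parse import urljoin

BASE = "https://en.wikipedia.org"

# Sample href values, including duplicates, a namespaced page and a fragment.
hrefs = [
    "/wiki/Main_Page",
    "/wiki/Main_Page",
    "/wiki/Porbandar",
    "/wiki/Help:Contents",
    "#cite_note-4",
    "/wiki/Porbandar",
]

# Same filter as the cell above: article links only, no "Namespace:Page" links.
article_links = [
    urljoin(BASE, h) for h in hrefs if h.startswith("/wiki/") and ":" not in h
]

# dict keys are unique and keep insertion order, so this de-duplicates in order.
unique_links = list(dict.fromkeys(article_links))
print(unique_links)
```

A `set` would also remove duplicates, but it would not preserve the order in which links appear on the page.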


In [11]:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
res=requests.get('https://en.wikipedia.org/wiki/Mahatma_Gandhi', headers=headers)
soup=BeautifulSoup(res.text,'html.parser')

In [12]:
soup

<!DOCTYPE html>

<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-enabled vector-feature-custom-font-size-clientpref-1 vector-feature-appearance-pinned-clientpref-1 vector-feature-night-mode-enabled skin-theme-clientpref-day vector-sticky-header-enabled vector-toc-available" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>Mahatma Gandhi - Wikipedia</title>
<script>(function(){var className="client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-enabled vector-fe

In [13]:
heading=soup.find('h1').text
print(heading)

Mahatma Gandhi


In [14]:
page_title = soup.title.text
print(page_title)

Mahatma Gandhi - Wikipedia


In [15]:
soup.text



In [16]:
print(soup.text.replace('\n\n',''))


Mahatma Gandhi - WikipediaJump to contentMain menuMain menu
move to sidebar
hide		Navigation
	
Main pageContentsCurrent eventsRandom articleAbout WikipediaContact us		Contribute
	
HelpLearn to editCommunity portalRecent changesUpload fileSpecial pagesSearchSearch
Appearance
DonateCreate accountLog in
Personal toolsDonate Create account Log in
Contents
move to sidebar
hide
(Top)1
Early life and background
Toggle Early life and background subsection1.1
Parents
1.2
Childhood
1.3
Marriage
2
Three years in London
Toggle Three years in London subsection2.1
Student of law
2.2
Vegetarianism and committee work
2.3
Called to the bar
3
Civil rights activist in South Africa (1893–1914)
Toggle Civil rights activist in South Africa (1893–1914) subsection3.1
Europeans, Indians and Africans
4
Struggle for Indian independence (1915–1947)
Toggle Struggle for Indian independence (1915–1947) subsection4.1
Role in World War I
4.2
Champaran agitations
4.3
Kheda agitations
4.4
Khilafat Movement
4.5
N
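The output above mixes menu and sidebar labels with article text because `soup.text` covers the whole page. One possible refinement is to select the content container first and only then extract text. The sketch assumes the article body sits in a `div` with `id="mw-content-text"`; treat that id as an assumption and confirm it in the page source before relying on it. The inline HTML is a stand-in, not real page markup.

```python
from bs4 import BeautifulSoup

# Stand-in page: navigation chrome followed by the (assumed) content container.
html = (
    "<nav><a>Main menu</a><a>Donate</a></nav>"
    '<div id="mw-content-text"><p>First paragraph.</p><p>Second paragraph.</p></div>'
)
soup_demo = BeautifulSoup(html, "html.parser")

# Fall back to the whole document if the container is not found.
content = soup_demo.find("div", id="mw-content-text") or soup_demo

# One line per text node, with surrounding whitespace removed.
body_text = content.get_text(separator="\n", strip=True)
print(body_text)
```

This avoids the `replace('\n\n', '')` step, which removes blank lines but also glues neighbouring labels together.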

In [18]:
soup.find_all('p')

[<p class="mw-empty-elt">
 </p>,
 <p><b>Mohandas Karamchand Gandhi</b><sup class="reference" id="cite_ref-4"><a href="#cite_note-4"><span class="cite-bracket">[</span>c<span class="cite-bracket">]</span></a></sup> (2<span class="nowrap"> </span>October 1869 – 30<span class="nowrap"> </span>January 1948)<sup class="reference" id="cite_ref-5"><a href="#cite_note-5"><span class="cite-bracket">[</span>2<span class="cite-bracket">]</span></a></sup> was an Indian lawyer, <a href="/wiki/Nationalism#anti-colonial" title="Nationalism">anti-colonial nationalist</a>, and <a href="/wiki/Political_ethics" title="Political ethics">political ethicist</a> who employed <a href="/wiki/Nonviolent_resistance" title="Nonviolent resistance">nonviolent resistance</a> to lead the successful <a href="/wiki/Indian_independence_movement" title="Indian independence movement">campaign for India's independence</a> from <a href="/wiki/British_Raj" title="British Raj">British rule</a>. He inspired movements for 

In [19]:
for p in soup.find_all('p'):
  print(p.text)
  print('-'*10)



----------
Mohandas Karamchand Gandhi[c] (2 October 1869 – 30 January 1948)[2] was an Indian lawyer, anti-colonial nationalist, and political ethicist who employed nonviolent resistance to lead the successful campaign for India's independence from British rule. He inspired movements for civil rights and freedom across the world. The honorific Mahātmā (from Sanskrit, meaning great-souled, or venerable), first applied to him in South Africa in 1914, is used worldwide.[3]

----------
Born and raised in a Hindu family in coastal Gujarat, Gandhi was trained in the law at the Inner Temple in London and was called to the bar at the age of 22. After two uncertain years in India, where he was unable to start a successful law practice, Gandhi moved to South Africa in 1893 to represent an Indian merchant in a lawsuit. He went on to live in South Africa for the next 21 years. Here, Gandhi raised a family and first employed nonviolent resistance in a campaign for civil rights. In 1915, age

In [20]:
corpus=''
for p in soup.find_all('p'):
    if p.text.strip():  # Check if the paragraph has non-empty text
        corpus=corpus+p.text.strip()
        corpus=corpus+'\n'

print(corpus)

Mohandas Karamchand Gandhi[c] (2 October 1869 – 30 January 1948)[2] was an Indian lawyer, anti-colonial nationalist, and political ethicist who employed nonviolent resistance to lead the successful campaign for India's independence from British rule. He inspired movements for civil rights and freedom across the world. The honorific Mahātmā (from Sanskrit, meaning great-souled, or venerable), first applied to him in South Africa in 1914, is used worldwide.[3]
Born and raised in a Hindu family in coastal Gujarat, Gandhi was trained in the law at the Inner Temple in London and was called to the bar at the age of 22. After two uncertain years in India, where he was unable to start a successful law practice, Gandhi moved to South Africa in 1893 to represent an Indian merchant in a lawsuit. He went on to live in South Africa for the next 21 years. Here, Gandhi raised a family and first employed nonviolent resistance in a campaign for civil rights. In 1915, aged 45, he returned to Indi

In [21]:
for i in range(3,467):
  print('['+str(i)+']')

[3]
[4]
[5]
...


In [22]:
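# Strip numeric citation markers [3] through [466] from the corpus.
# Markers outside this range, such as [2], and letter notes such as [c] are left in place.
for i in range(3,467):
  corpus=corpus.replace('['+str(i)+']','')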

In [23]:
print(corpus)

Mohandas Karamchand Gandhi[c] (2 October 1869 – 30 January 1948)[2] was an Indian lawyer, anti-colonial nationalist, and political ethicist who employed nonviolent resistance to lead the successful campaign for India's independence from British rule. He inspired movements for civil rights and freedom across the world. The honorific Mahātmā (from Sanskrit, meaning great-souled, or venerable), first applied to him in South Africa in 1914, is used worldwide.
Born and raised in a Hindu family in coastal Gujarat, Gandhi was trained in the law at the Inner Temple in London and was called to the bar at the age of 22. After two uncertain years in India, where he was unable to start a successful law practice, Gandhi moved to South Africa in 1893 to represent an Indian merchant in a lawsuit. He went on to live in South Africa for the next 21 years. Here, Gandhi raised a family and first employed nonviolent resistance in a campaign for civil rights. In 1915, aged 45, he returned to India a
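The printed corpus above still contains `[c]` and `[2]`, because the loop only covers the numbers 3 to 466. A regular expression can remove every bracketed number or single-letter note in one pass, without a hard-coded range. This is a sketch on a shortened sample sentence; the pattern and the names `CITATION` and `cleaned` are assumptions for illustration.

```python
import re

# Shortened sample in the style of the scraped text above.
sample = (
    "Mohandas Karamchand Gandhi[c] (2 October 1869 – 30 January 1948)[2] "
    "was an Indian lawyer. The honorific is used worldwide.[3]"
)

# Matches [12]-style numeric citations and [a]-style single-letter notes.
CITATION = re.compile(r"\[(?:\d+|[a-z])\]")

cleaned = CITATION.sub("", sample)
print(cleaned)
```

The pattern deliberately leaves longer bracketed text, such as `[citation needed]`, untouched; widen it only if that is what you want removed.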

## Scraping Quotes

In [24]:
import requests
from bs4 import BeautifulSoup
link='https://quotes.toscrape.com/page/1/'
res=requests.get(link)
soup=BeautifulSoup(res.text,'html.parser')

# Scraping only quotes and author names

In [29]:
import requests
from bs4 import BeautifulSoup

link = 'https://quotes.toscrape.com/page/1/'
res = requests.get(link)

soup = BeautifulSoup(res.text, 'html.parser')

quotes = []
authors = []

for quote in soup.find_all('span', class_='text'):
    text = quote.text.strip('“”')   # strip the surrounding curly quotation marks
    quotes.append(text)
    print(text)
    print()

for auth in soup.find_all('small', class_='author'):
    authors.append(auth.text)
    print(auth.text)


The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.

It is our choices, Harry, that show what we truly are, far more than our abilities.

There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.

The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.

Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.

Try not to become a man of success. Rather become a man of value.

It is better to be hated for what you are than to be loved for what you are not.

I have not failed. I've just found 10,000 ways that won't work.

A woman is like a tea bag; you never know how strong it is until it's in hot water.

A day without sunshine is like, you know, night.

Albert Einstein
J.K. Rowling
Albert Einstein
Jane Austen
Marilyn Monroe
Albert Einstein
André Gide
Thoma

In [30]:
authors

['Albert Einstein',
 'J.K. Rowling',
 'Albert Einstein',
 'Jane Austen',
 'Marilyn Monroe',
 'Albert Einstein',
 'André Gide',
 'Thomas A. Edison',
 'Eleanor Roosevelt',
 'Steve Martin']

In [34]:
quotes

['The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.',
 'It is our choices, Harry, that show what we truly are, far more than our abilities.',
 'There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.',
 'The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.',
 "Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.",
 'Try not to become a man of success. Rather become a man of value.',
 'It is better to be hated for what you are than to be loved for what you are not.',
 "I have not failed. I've just found 10,000 ways that won't work.",
 "A woman is like a tea bag; you never know how strong it is until it's in hot water.",
 'A day without sunshine is like, you know, night.']

# Code to extract quotes, author names, details, and tags

In [36]:
for sp in soup.find_all('div', class_='quote'):

    quote = sp.find('span', class_='text')
    if quote:
        print(quote.text)
        print()

    author = sp.find('small', class_='author')
    if author:
        print(author.text)
        print()

    details = sp.find('div', class_='tags')
    if details:
        tags = details.find_all('a', class_='tag')
        for tag in tags:
            print(tag.text)

    print()
    print("*" * 125)


“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”

Albert Einstein

change
deep-thoughts
thinking
world

*****************************************************************************************************************************
“It is our choices, Harry, that show what we truly are, far more than our abilities.”

J.K. Rowling

abilities
choices

*****************************************************************************************************************************
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”

Albert Einstein

inspirational
life
live
miracle
miracles

*****************************************************************************************************************************
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”

Jane Austen

alit

# Creating a list

In [37]:
data = []
for sp in soup.find_all('div', class_='quote'):
    # Note: quote, author, and details are stored as bs4 Tag objects, not plain strings
    quote = sp.find('span', class_='text')
    author = sp.find('small', class_='author')
    details = sp.find('div', class_='tags')
    # Collect this quote's tag names and join them into one comma-separated string
    tags = [tag.text for tag in sp.find_all('a', class_='tag')]
    data.append([quote, author, details, ','.join(tags)])

In [38]:
data

[[<span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>,
  <small class="author" itemprop="author">Albert Einstein</small>,
  <div class="tags">
              Tags:
              <meta class="keywords" content="change,deep-thoughts,thinking,world" itemprop="keywords"/>
  <a class="tag" href="/tag/change/page/1/">change</a>
  <a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>
  <a class="tag" href="/tag/thinking/page/1/">thinking</a>
  <a class="tag" href="/tag/world/page/1/">world</a>
  </div>,
  'change,deep-thoughts,thinking,world'],
 [<span class="text" itemprop="text">“It is our choices, Harry, that show what we truly are, far more than our abilities.”</span>,
  <small class="author" itemprop="author">J.K. Rowling</small>,
  <div class="tags">
              Tags:
              <meta class="keywords" content="abilities,choices" itemprop="keywords"/>
  <a cl
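The list printed above holds `Tag` objects, so each row still carries raw HTML. If plain values are wanted, one option is to store the extracted text in a dictionary per quote. The sketch below parses a single hand-copied quote block modelled on the output above; `rows` and `soup_demo` are hypothetical names.

```python
from bs4 import BeautifulSoup

# One quote block, modelled on the markup shown in the output above.
html = (
    '<div class="quote">'
    '<span class="text" itemprop="text">“It is our choices, Harry, that show '
    'what we truly are, far more than our abilities.”</span>'
    '<span>by <small class="author" itemprop="author">J.K. Rowling</small></span>'
    '<div class="tags">Tags: '
    '<a class="tag" href="/tag/abilities/page/1/">abilities</a> '
    '<a class="tag" href="/tag/choices/page/1/">choices</a>'
    "</div></div>"
)
soup_demo = BeautifulSoup(html, "html.parser")

rows = []
for block in soup_demo.find_all("div", class_="quote"):
    rows.append({
        # get_text() returns a str; strip("“”") removes the curly quotation marks.
        "quote": block.find("span", class_="text").get_text(strip=True).strip("“”"),
        "author": block.find("small", class_="author").get_text(strip=True),
        "tags": [a.get_text(strip=True) for a in block.find_all("a", class_="tag")],
    })

print(rows)
```

A list of dictionaries like this can be passed directly to `pd.DataFrame`.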

# Accessing Various Hrefs

In [41]:
author_1 = []

for sp in soup.find_all('div', class_='quote'):
    author = sp.find('small', class_='author')
    if author:
        author_1.append(author.text)

print(author_1)


['Albert Einstein', 'J.K. Rowling', 'Albert Einstein', 'Jane Austen', 'Marilyn Monroe', 'Albert Einstein', 'André Gide', 'Thomas A. Edison', 'Eleanor Roosevelt', 'Steve Martin']


In [42]:
author_1 = []

for sp in soup.find_all('div', class_='quote'):
    author = sp.find('small', class_='author')
    if author:
        author_1.append(author.text)
        print(author.text)


Albert Einstein
J.K. Rowling
Albert Einstein
Jane Austen
Marilyn Monroe
Albert Einstein
André Gide
Thomas A. Edison
Eleanor Roosevelt
Steve Martin


In [43]:
tags=[]
# sp still refers to the last quote block from the loop in the previous cell
for tag in sp.find_all('a',class_='tag'):
  tags.append(tag.text)

In [44]:
tags=','.join(tags)

In [45]:
for sp in soup.find_all('div', class_='quote'):
  quote = sp.find('span', class_='text')
quote

<span class="text" itemprop="text">“A day without sunshine is like, you know, night.”</span>
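The cells in this part read element text but never the `href` attribute itself. A short sketch of how the link targets could be read and turned into absolute URLs with `urljoin`. The inline markup is an assumed imitation of one quote block (the author link path in particular is illustrative), and `author_url` and `tag_urls` are names made up for the example.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://quotes.toscrape.com"

# Assumed imitation of a quote block: an "(about)" link followed by tag links.
html = (
    '<div class="quote">'
    '<span>by <small class="author">Albert Einstein</small> '
    '<a href="/author/Albert-Einstein">(about)</a></span>'
    '<div class="tags">'
    '<a class="tag" href="/tag/change/page/1/">change</a>'
    '<a class="tag" href="/tag/world/page/1/">world</a>'
    "</div></div>"
)
block = BeautifulSoup(html, "html.parser").find("div", class_="quote")

# href=True matches only <a> elements that actually have an href attribute.
about_link = block.find("a", href=True)
author_url = urljoin(BASE_URL, about_link["href"])

# Tag links are selected by class, then made absolute the same way.
tag_urls = [urljoin(BASE_URL, a["href"]) for a in block.find_all("a", class_="tag")]

print(author_url)
print(tag_urls)
```

`urljoin` handles both relative paths and already-absolute URLs, which plain string concatenation does not.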

# Extracting all the author information

In [47]:
import pandas as pd

df = pd.DataFrame({'Author': author_1})
print(df.head())


            Author
0  Albert Einstein
1     J.K. Rowling
2  Albert Einstein
3      Jane Austen
4   Marilyn Monroe


In [49]:
import requests
from bs4 import BeautifulSoup
import time

base_url = "https://quotes.toscrape.com"

res = requests.get(base_url)
soup = BeautifulSoup(res.text, "html.parser")

authors_data = []

for sp in soup.find_all("div", class_="quote"):

    author_name = sp.find("small", class_="author").text
    author_link = sp.find("a")["href"]   # author detail page link

    author_url = base_url + author_link
    author_res = requests.get(author_url)
    author_soup = BeautifulSoup(author_res.text, "html.parser")

    born_date = author_soup.find("span", class_="author-born-date").text
    born_place = author_soup.find("span", class_="author-born-location").text
    description = author_soup.find("div", class_="author-description").text.strip()

    authors_data.append({
        "Author": author_name,
        "Born Date": born_date,
        "Born Place": born_place,
        "Description": description
    })

    print(author_name)
    print(born_date)
    print(born_place)
    print(description)
    print("-" * 100)

    time.sleep(1)  # polite scraping


Albert Einstein
March 14, 1879
in Ulm, Germany
In 1879, Albert Einstein was born in Ulm, Germany. He completed his Ph.D. at the University of Zurich by 1909. His 1905 paper explaining the photoelectric effect, the basis of electronics, earned him the Nobel Prize in 1921. His first paper on Special Relativity Theory, also published in 1905, changed the world. After the rise of the Nazi party, Einstein made Princeton his permanent home, becoming a U.S. citizen in 1940. Einstein, a pacifist during World War I, stayed a firm proponent of social justice and responsibility. He chaired the Emergency Committee of Atomic Scientists, which organized to alert the public to the dangers of atomic warfare.At a symposium, he advised: "In their struggle for the ethical good, teachers of religion must have the stature to give up the doctrine of a personal God, that is, give up that source of fear and hope which in the past placed such vast power in the hands of priests. In their labors they will have t
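The loop above calls `.text` directly on the result of `find()`, which raises `AttributeError` if an element is missing, because `find()` returns `None` in that case. A small guard makes the scraper more tolerant. This is a sketch: `text_or_default` is a hypothetical helper, and the inline HTML is a stand-in author page that deliberately lacks a description.

```python
from bs4 import BeautifulSoup


def text_or_default(soup, name, class_name, default=""):
    """Return the stripped text of the first matching element, or `default`."""
    node = soup.find(name, class_=class_name)
    return node.get_text(strip=True) if node is not None else default


# Stand-in author page with a birth date but no description block.
author_html = '<span class="author-born-date">December 16, 1775</span>'
author_soup_demo = BeautifulSoup(author_html, "html.parser")

born = text_or_default(author_soup_demo, "span", "author-born-date")
description = text_or_default(
    author_soup_demo, "div", "author-description", default="(missing)"
)
print(born, "|", description)
```

With this helper, one incomplete author page produces a placeholder value instead of stopping the whole loop.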

In [60]:
import pandas as pd

df1 = pd.DataFrame(authors_data)
df1.to_csv("authors_info.csv", index=False)


In [61]:
df1

Unnamed: 0,Author,Born Date,Born Place,Description
0,Albert Einstein,"March 14, 1879","in Ulm, Germany","In 1879, Albert Einstein was born in Ulm, Germ..."
1,J.K. Rowling,"July 31, 1965","in Yate, South Gloucestershire, England, The U...",See also: Robert GalbraithAlthough she writes ...
2,Albert Einstein,"March 14, 1879","in Ulm, Germany","In 1879, Albert Einstein was born in Ulm, Germ..."
3,Jane Austen,"December 16, 1775","in Steventon Rectory, Hampshire, The United Ki...",Jane Austen was an English novelist whose work...
4,Marilyn Monroe,"June 01, 1926",in The United States,Marilyn Monroe (born Norma Jeane Mortenson; Ju...
5,Albert Einstein,"March 14, 1879","in Ulm, Germany","In 1879, Albert Einstein was born in Ulm, Germ..."
6,André Gide,"November 22, 1869","in Paris, France",André Paul Guillaume Gide was a French author ...
7,Thomas A. Edison,"February 11, 1847","in Milan, Ohio, The United States","Thomas Alva Edison was an American inventor, s..."
8,Eleanor Roosevelt,"October 11, 1884",in The United States,Anna Eleanor Roosevelt was an American politic...
9,Steve Martin,"August 14, 1945","in Waco, Texas, The United States","Stephen Glenn ""Steve"" Martin is an American ac..."


# ✅ STEP 1: Create AUTHOR DETAILS DataFrame (no duplicates)

In [53]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time

base_url = "https://quotes.toscrape.com"

res = requests.get(base_url)
soup = BeautifulSoup(res.text, "html.parser")

author_info = {}

for sp in soup.find_all("div", class_="quote"):

    author_name = sp.find("small", class_="author").text
    author_link = base_url + sp.find("a")["href"]

    if author_name not in author_info:   # avoid duplicates

        author_res = requests.get(author_link)
        author_soup = BeautifulSoup(author_res.text, "html.parser")

        author_info[author_name] = {
            "Author": author_name,
            "Born Date": author_soup.find("span", class_="author-born-date").text,
            "Born Place": author_soup.find("span", class_="author-born-location").text,
            "Description": author_soup.find("div", class_="author-description").text.strip()
        }

        time.sleep(1)

author_df = pd.DataFrame(author_info.values())


This code is designed to scrape author information from the first page of `quotes.toscrape.com`.

Here's a breakdown of what each part does:

1.  **Imports**: It imports necessary libraries: `requests` for making HTTP requests, `BeautifulSoup` for parsing HTML, `pandas` for data manipulation, and `time` for pausing execution.

2.  **`base_url`**: Defines the base URL of the website to be scraped.

3.  **Initial Request**:
    *   `res = requests.get(base_url)`: Fetches the HTML content of the base URL (the first page).
    *   `soup = BeautifulSoup(res.text, "html.parser")`: Parses the HTML content, making it easy to navigate and extract data.

4.  **`author_info` dictionary**: An empty dictionary is initialized to store unique author details. This helps avoid re-scraping the same author's page if they appear multiple times on the first page.

5.  **`for` loop (`for sp in soup.find_all("div", class_="quote")`)**:
    *   This loop iterates through each `div` element with the class `quote` on the page. Each `div` contains a single quote and its associated information.

6.  **Extracting Author Name and Link**:
    *   `author_name = sp.find("small", class_="author").text`: Extracts the name of the author.
    *   `author_link = base_url + sp.find("a")["href"]`: Constructs the full URL to the author's details page.

7.  **Author Info Caching and Scraping**:
    *   `if author_name not in author_info:`: Checks if the author's details have already been scraped and stored in `author_info`. This is an optimization to prevent redundant requests.
    *   If the author is new:
        *   `author_res = requests.get(author_link)`: Fetches the HTML content of the author's specific page.
        *   `author_soup = BeautifulSoup(author_res.text, "html.parser")`: Parses the author's page HTML.
        *   The `Born Date`, `Born Place`, and `Description` are extracted from the author's page.
        *   These details are stored in the `author_info` dictionary, with the author's name as the key.
        *   `time.sleep(1)`: A pause is added after scraping each new author's page to avoid overwhelming the server (polite scraping).

8.  **Creating DataFrame**:
    *   `author_df = pd.DataFrame(author_info.values())`: After iterating through all quotes on the first page and gathering unique author information, a Pandas DataFrame named `author_df` is created from the values in the `author_info` dictionary. Each unique author becomes a row in this DataFrame with their name, born date, born place, and description.
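The caching idea in point 7 can be checked without any network access by passing in a stand-in fetch function and counting how often it is called. In this sketch `collect_author_details`, `fake_fetch`, and the link paths are all hypothetical and exist only to illustrate the pattern.

```python
def collect_author_details(quote_authors, fetch_details):
    """Fetch details once per author name, skipping names already seen."""
    author_info = {}
    for name, link in quote_authors:
        if name not in author_info:  # same check as in the cell above
            author_info[name] = fetch_details(link)
    return author_info


calls = []  # records every link that was actually "requested"


def fake_fetch(link):
    """Stand-in for requests.get + parsing; no network traffic happens here."""
    calls.append(link)
    return {"link": link}


# (author name, illustrative link) pairs, with one author appearing twice.
pairs = [
    ("Albert Einstein", "/author/Albert-Einstein"),
    ("J.K. Rowling", "/author/J-K-Rowling"),
    ("Albert Einstein", "/author/Albert-Einstein"),
]

details = collect_author_details(pairs, fake_fetch)
print(len(pairs), "quotes,", len(calls), "requests")  # prints 3 quotes, 2 requests
```

The membership test on the dictionary is what saves the extra request for the repeated author.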

# ✅ STEP 2: Quotes DataFrame

In [54]:
quotes_data = []

for sp in soup.find_all("div", class_="quote"):

    quote_text = sp.find("span", class_="text").text.strip("“”")
    author_text = sp.find("small", class_="author").text
    tags = [tag.text for tag in sp.find_all("a", class_="tag")]

    quotes_data.append({
        "Quote": quote_text,
        "Author": author_text,
        "Tags": ", ".join(tags)
    })

quotes_df = pd.DataFrame(quotes_data)


# ✅ STEP 3: MERGE BOTH DATAFRAMES ⭐ (MOST IMPORTANT)

In [55]:
final_df = quotes_df.merge(author_df, on="Author", how="left")
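A left merge keeps every quote and copies the matching author row onto it, so an author with several quotes has their details repeated. The sketch below shows this on two tiny hand-made frames (`quotes_demo` and `authors_demo` are assumed names with placeholder quote text). It also passes `validate="many_to_one"`, an optional check that raises an error if the author table contains duplicate names.

```python
import pandas as pd

# Three quotes by two authors; the quote texts are placeholders.
quotes_demo = pd.DataFrame({
    "Quote": ["q1", "q2", "q3"],
    "Author": ["Albert Einstein", "J.K. Rowling", "Albert Einstein"],
})

# One row per author, as produced by the de-duplicated author scrape.
authors_demo = pd.DataFrame({
    "Author": ["Albert Einstein", "J.K. Rowling"],
    "Born Date": ["March 14, 1879", "July 31, 1965"],
})

# how="left" keeps all quotes in their original order.
# validate="many_to_one" asserts that each author appears once on the right side.
merged = quotes_demo.merge(
    authors_demo, on="Author", how="left", validate="many_to_one"
)
print(merged)
```

If an author were missing from `authors_demo`, the quote would still appear, with `NaN` in the author columns.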


In [56]:
final_df.to_csv("quotes_with_author_details.csv", index=False)
print("✅ CSV created successfully!")
print(final_df.head())


✅ CSV created successfully!
                                               Quote           Author  \
0  The world as we have created it is a process o...  Albert Einstein   
1  It is our choices, Harry, that show what we tr...     J.K. Rowling   
2  There are only two ways to live your life. One...  Albert Einstein   
3  The person, be it gentleman or lady, who has n...      Jane Austen   
4  Imperfection is beauty, madness is genius and ...   Marilyn Monroe   

                                           Tags          Born Date  \
0        change, deep-thoughts, thinking, world     March 14, 1879   
1                            abilities, choices      July 31, 1965   
2  inspirational, life, live, miracle, miracles     March 14, 1879   
3              aliteracy, books, classic, humor  December 16, 1775   
4                    be-yourself, inspirational      June 01, 1926   

                                          Born Place  \
0                                    in Ulm, Germany  

In [62]:
df

Unnamed: 0,Quote,Author,Tags
0,The world as we have created it is a process o...,Albert Einstein,"change, deep-thoughts, thinking, world"
1,"It is our choices, Harry, that show what we tr...",J.K. Rowling,"abilities, choices"
2,There are only two ways to live your life. One...,Albert Einstein,"inspirational, life, live, miracle, miracles"
3,"The person, be it gentleman or lady, who has n...",Jane Austen,"aliteracy, books, classic, humor"
4,"Imperfection is beauty, madness is genius and ...",Marilyn Monroe,"be-yourself, inspirational"
5,Try not to become a man of success. Rather bec...,Albert Einstein,"adulthood, success, value"
6,It is better to be hated for what you are than...,André Gide,"life, love"
7,"I have not failed. I've just found 10,000 ways...",Thomas A. Edison,"edison, failure, inspirational, paraphrased"
8,A woman is like a tea bag; you never know how ...,Eleanor Roosevelt,misattributed-eleanor-roosevelt
9,"A day without sunshine is like, you know, night.",Steve Martin,"humor, obvious, simile"


In [58]:
quotes_df

Unnamed: 0,Quote,Author,Tags
0,The world as we have created it is a process o...,Albert Einstein,"change, deep-thoughts, thinking, world"
1,"It is our choices, Harry, that show what we tr...",J.K. Rowling,"abilities, choices"
2,There are only two ways to live your life. One...,Albert Einstein,"inspirational, life, live, miracle, miracles"
3,"The person, be it gentleman or lady, who has n...",Jane Austen,"aliteracy, books, classic, humor"
4,"Imperfection is beauty, madness is genius and ...",Marilyn Monroe,"be-yourself, inspirational"
5,Try not to become a man of success. Rather bec...,Albert Einstein,"adulthood, success, value"
6,It is better to be hated for what you are than...,André Gide,"life, love"
7,"I have not failed. I've just found 10,000 ways...",Thomas A. Edison,"edison, failure, inspirational, paraphrased"
8,A woman is like a tea bag; you never know how ...,Eleanor Roosevelt,misattributed-eleanor-roosevelt
9,"A day without sunshine is like, you know, night.",Steve Martin,"humor, obvious, simile"


In [59]:
final_df

Unnamed: 0,Quote,Author,Tags,Born Date,Born Place,Description
0,The world as we have created it is a process o...,Albert Einstein,"change, deep-thoughts, thinking, world","March 14, 1879","in Ulm, Germany","In 1879, Albert Einstein was born in Ulm, Germ..."
1,"It is our choices, Harry, that show what we tr...",J.K. Rowling,"abilities, choices","July 31, 1965","in Yate, South Gloucestershire, England, The U...",See also: Robert GalbraithAlthough she writes ...
2,There are only two ways to live your life. One...,Albert Einstein,"inspirational, life, live, miracle, miracles","March 14, 1879","in Ulm, Germany","In 1879, Albert Einstein was born in Ulm, Germ..."
3,"The person, be it gentleman or lady, who has n...",Jane Austen,"aliteracy, books, classic, humor","December 16, 1775","in Steventon Rectory, Hampshire, The United Ki...",Jane Austen was an English novelist whose work...
4,"Imperfection is beauty, madness is genius and ...",Marilyn Monroe,"be-yourself, inspirational","June 01, 1926",in The United States,Marilyn Monroe (born Norma Jeane Mortenson; Ju...
5,Try not to become a man of success. Rather bec...,Albert Einstein,"adulthood, success, value","March 14, 1879","in Ulm, Germany","In 1879, Albert Einstein was born in Ulm, Germ..."
6,It is better to be hated for what you are than...,André Gide,"life, love","November 22, 1869","in Paris, France",André Paul Guillaume Gide was a French author ...
7,"I have not failed. I've just found 10,000 ways...",Thomas A. Edison,"edison, failure, inspirational, paraphrased","February 11, 1847","in Milan, Ohio, The United States","Thomas Alva Edison was an American inventor, s..."
8,A woman is like a tea bag; you never know how ...,Eleanor Roosevelt,misattributed-eleanor-roosevelt,"October 11, 1884",in The United States,Anna Eleanor Roosevelt was an American politic...
9,"A day without sunshine is like, you know, night.",Steve Martin,"humor, obvious, simile","August 14, 1945","in Waco, Texas, The United States","Stephen Glenn ""Steve"" Martin is an American ac..."


#Scraping Multiple Pages

In [64]:
import requests
from bs4 import BeautifulSoup
from tqdm import tqdm
import pandas as pd
import time

base_url = "https://quotes.toscrape.com"

Multiple_Pages = []
author_info = {}   # cache to avoid duplicate author scraping

for page in tqdm(range(1, 11)):
    link = f"{base_url}/page/{page}/"
    res = requests.get(link)
    soup = BeautifulSoup(res.text, "html.parser")

    for sp in soup.find_all('div', class_='quote'):

        # ---- Quote info ----
        quote = sp.find('span', class_='text').text.strip("“”")
        author = sp.find('small', class_='author').text

        tags = [tag.text for tag in sp.find_all('a', class_='tag')]
        tags = ",".join(tags)

        # ---- Author info (scrape once) ----
        if author not in author_info:
            author_link = base_url + sp.find("a")["href"]
            author_res = requests.get(author_link)
            author_soup = BeautifulSoup(author_res.text, "html.parser")

            author_info[author] = {
                "Born Date": author_soup.find("span", class_="author-born-date").text,
                "Born Place": author_soup.find("span", class_="author-born-location").text,
                "Description": author_soup.find("div", class_="author-description").text.strip()
            }

            time.sleep(1)  # polite scraping

        # ---- Append final data ----
        Multiple_Pages.append([
            quote,
            author,
            tags,
            author_info[author]["Born Date"],
            author_info[author]["Born Place"],
            author_info[author]["Description"]
        ])


100%|██████████| 10/10 [00:52<00:00,  5.21s/it]


This code is designed to scrape quotes and author information from multiple pages of a website, specifically `quotes.toscrape.com`.

Here's a breakdown of what each part does:

1.  **Imports**: It imports necessary libraries: `requests` for making HTTP requests, `BeautifulSoup` for parsing HTML, `tqdm` for showing progress bars, `pandas` for data manipulation, and `time` for pausing execution.

2.  **`base_url`**: Defines the base URL of the website to be scraped.

3.  **`Multiple_Pages` list**: An empty list to store all the scraped data (quotes, authors, tags, and author details).

4.  **`author_info` dictionary**: Used as a cache to store author details once they've been scraped. This prevents re-scraping the same author's page multiple times if they have multiple quotes on different pages.

5.  **Outer `for` loop (`for page in tqdm(range(1, 11))`)**:
    *   This loop iterates through pages 1 to 10 of the website.
    *   `tqdm` provides a progress bar, which is helpful for long-running scraping tasks.
    *   Inside the loop, it constructs the URL for each page (`link`).
    *   `requests.get(link)` fetches the HTML content of the page.
    *   `BeautifulSoup(res.text, "html.parser")` parses the HTML content, making it easy to navigate and extract data.

6.  **Inner `for` loop (`for sp in soup.find_all('div', class_='quote')`)**:
    *   This loop iterates through each `div` element with the class `quote` on the current page, as each `div` contains a single quote and its associated information.

7.  **Extracting Quote Info**:
    *   `quote = sp.find('span', class_='text').text.strip("“”")`: Extracts the text of the quote and strips the curly quotation marks (“ ”) that surround it.
    *   `author = sp.find('small', class_='author').text`: Extracts the name of the author.
    *   `tags = [tag.text for tag in sp.find_all('a', class_='tag')]`: Extracts all associated tags for the quote and stores them in a list.
    *   `tags = ",".join(tags)`: Joins the list of tags into a single comma-separated string.

8.  **Extracting Author Info (with caching)**:
    *   `if author not in author_info:`: This is a crucial optimization. It checks if the author's details have already been scraped and stored in the `author_info` cache.
    *   If the author is not in the cache:
        *   `author_link = base_url + sp.find("a")["href"]`: Constructs the full URL to the author's details page.
        *   `requests.get(author_link)` fetches the HTML content of the author's page.
        *   `BeautifulSoup(author_res.text, "html.parser")` parses the author's page HTML.
        *   It then extracts the `Born Date`, `Born Place`, and `Description` from the author's page.
        *   These details are stored in the `author_info` dictionary, using the author's name as the key.
        *   `time.sleep(1)`: A pause is added after scraping each new author's page to avoid overwhelming the server (polite scraping).

9.  **Appending Final Data (`Multiple_Pages.append(...)`)**:
    *   Finally, all the extracted information for the current quote (quote text, author, tags, and the author's cached born date, born place, and description) is appended as a list to the `Multiple_Pages` list.
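
The extraction steps above can be tried offline, without hitting the website. The sketch below runs the same `find` / `find_all` calls on a small hand-written HTML snippet. The snippet is an assumption that only imitates the structure the selectors expect (a `div.quote` containing `span.text`, `small.author`, an "about" link and `a.tag` links); the author name and quote text are invented.

```python
from bs4 import BeautifulSoup

# Hand-written HTML that imitates one quote block (illustrative only)
html = """
<div class="quote">
  <span class="text">“A sample quote.”</span>
  <span>by <small class="author">Jane Doe</small>
    <a href="/author/Jane-Doe">(about)</a>
  </span>
  <div class="tags">
    <a class="tag" href="/tag/life/page/1/">life</a>
    <a class="tag" href="/tag/humor/page/1/">humor</a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
block = soup.find("div", class_="quote")

# Same calls as in the scraping loop
quote = block.find("span", class_="text").text.strip("“”")
author = block.find("small", class_="author").text
tags = ",".join(tag.text for tag in block.find_all("a", class_="tag"))

# find("a") returns the first <a> in the block, which here is the "about" link
about_href = block.find("a")["href"]

print(quote, "|", author, "|", tags, "|", about_href)
```

Because `find("a")` simply takes the first link, the author URL depends on the "about" link appearing before the tag links in the markup.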

This script effectively navigates through multiple pages, extracts specific data, and optimizes by only scraping detailed author information once.
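
The caching idea can be seen in isolation with a small sketch that makes no network requests. `fetch_author_details` is a hypothetical stand-in for "request the author page and parse it", and `call_log` exists only to count how often it runs.

```python
call_log = []  # records every simulated author-page request


def fetch_author_details(author):
    """Hypothetical stand-in for requesting and parsing an author page."""
    call_log.append(author)
    return {"Born Date": "unknown", "Born Place": "unknown"}


# Authors in the order their quotes are encountered (repeats are expected)
authors_seen = [
    "Albert Einstein",
    "J.K. Rowling",
    "Albert Einstein",
    "Jane Austen",
    "Albert Einstein",
]

author_info = {}  # cache: author name -> details
rows = []

for author in authors_seen:
    if author not in author_info:  # fetch only on the first encounter
        author_info[author] = fetch_author_details(author)
    rows.append([author, author_info[author]["Born Date"]])

print(len(rows), "rows,", len(call_log), "fetches")
```

Five quotes lead to only three fetches here, one per distinct author; in the real loop each avoided fetch also saves the one-second pause.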

In [65]:
df2 = pd.DataFrame(
    Multiple_Pages,
    columns=[
        "Quote",
        "Author",
        "Tags",
        "Born Date",
        "Born Place",
        "Description"
    ]
)

df2.to_csv("quotes_multiple_pages.csv", index=False)
print("✅ CSV file created successfully")
print(df2.head())


✅ CSV file created successfully
                                               Quote           Author  \
0  The world as we have created it is a process o...  Albert Einstein   
1  It is our choices, Harry, that show what we tr...     J.K. Rowling   
2  There are only two ways to live your life. One...  Albert Einstein   
3  The person, be it gentleman or lady, who has n...      Jane Austen   
4  Imperfection is beauty, madness is genius and ...   Marilyn Monroe   

                                       Tags          Born Date  \
0       change,deep-thoughts,thinking,world     March 14, 1879   
1                         abilities,choices      July 31, 1965   
2  inspirational,life,live,miracle,miracles     March 14, 1879   
3             aliteracy,books,classic,humor  December 16, 1775   
4                 be-yourself,inspirational      June 01, 1926   

                                          Born Place  \
0                                    in Ulm, Germany   
1  in Yate, South 

In [66]:
df2

Unnamed: 0,Quote,Author,Tags,Born Date,Born Place,Description
0,The world as we have created it is a process o...,Albert Einstein,"change,deep-thoughts,thinking,world","March 14, 1879","in Ulm, Germany","In 1879, Albert Einstein was born in Ulm, Germ..."
1,"It is our choices, Harry, that show what we tr...",J.K. Rowling,"abilities,choices","July 31, 1965","in Yate, South Gloucestershire, England, The U...",See also: Robert GalbraithAlthough she writes ...
2,There are only two ways to live your life. One...,Albert Einstein,"inspirational,life,live,miracle,miracles","March 14, 1879","in Ulm, Germany","In 1879, Albert Einstein was born in Ulm, Germ..."
3,"The person, be it gentleman or lady, who has n...",Jane Austen,"aliteracy,books,classic,humor","December 16, 1775","in Steventon Rectory, Hampshire, The United Ki...",Jane Austen was an English novelist whose work...
4,"Imperfection is beauty, madness is genius and ...",Marilyn Monroe,"be-yourself,inspirational","June 01, 1926",in The United States,Marilyn Monroe (born Norma Jeane Mortenson; Ju...
...,...,...,...,...,...,...
95,You never really understand a person until you...,Harper Lee,better-life-empathy,"April 28, 1926","in Monroeville, Alabama, The United States","Harper Lee, known as Nelle, was born in the Al..."
96,You have to write the book that wants to be wr...,Madeleine L'Engle,"books,children,difficult,grown-ups,write,write...","November 29, 1918","in New York City, New York, The United States",Madeleine L'Engle was an American writer best ...
97,Never tell the truth to people who are not wor...,Mark Twain,truth,"November 30, 1835","in Florida, Missouri, The United States","Samuel Langhorne Clemens, better known by his ..."
98,"A person's a person, no matter how small.",Dr. Seuss,inspirational,"March 02, 1904","in Springfield, MA, The United States",Theodor Seuss Geisel was born 2 March 1904 in ...


#Book Scraping

In [69]:
import requests
from bs4 import BeautifulSoup
link='http://books.toscrape.com/'
res=requests.get(link)
soup=BeautifulSoup(res.text,'html.parser')

In [70]:
book=soup.find_all('li',class_='col-xs-6 col-sm-4 col-md-3 col-lg-3')

In [71]:
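data = []
# Each book card is an <li> carrying these Bootstrap grid classes
for sp in soup.find_all('li', class_='col-xs-6 col-sm-4 col-md-3 col-lg-3'):
    title = sp.find('h3').text  # visible link text; long titles are cut off with "..."
    book_link = sp.find('a')['href']  # relative URL of the book's detail page
    price = sp.find('p', class_='price_color').text
    rating = sp.find('p', class_='star-rating')['class'][1]  # ['star-rating', 'Three'] -> 'Three'
    stock = sp.find('p', class_='instock availability').text.strip()
    data.append([title, book_link, price, rating, stock])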

In [72]:
data

[['A Light in the ...',
  'catalogue/a-light-in-the-attic_1000/index.html',
  'Â£51.77',
  'Three',
  'In stock'],
 ['Tipping the Velvet',
  'catalogue/tipping-the-velvet_999/index.html',
  'Â£53.74',
  'One',
  'In stock'],
 ['Soumission',
  'catalogue/soumission_998/index.html',
  'Â£50.10',
  'One',
  'In stock'],
 ['Sharp Objects',
  'catalogue/sharp-objects_997/index.html',
  'Â£47.82',
  'Four',
  'In stock'],
 ['Sapiens: A Brief History ...',
  'catalogue/sapiens-a-brief-history-of-humankind_996/index.html',
  'Â£54.23',
  'Five',
  'In stock'],
 ['The Requiem Red',
  'catalogue/the-requiem-red_995/index.html',
  'Â£22.65',
  'One',
  'In stock'],
 ['The Dirty Little Secrets ...',
  'catalogue/the-dirty-little-secrets-of-getting-your-dream-job_994/index.html',
  'Â£33.34',
  'Four',
  'In stock'],
 ['The Coming Woman: A ...',
  'catalogue/the-coming-woman-a-novel-based-on-the-life-of-the-infamous-feminist-victoria-woodhull_993/index.html',
  'Â£17.93',
  'Three',

In [73]:
data[0]

['A Light in the ...',
 'catalogue/a-light-in-the-attic_1000/index.html',
 'Â£51.77',
 'Three',
 'In stock']
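
The price strings come back with a stray character in front of the pound sign. This pattern usually appears when UTF-8 bytes are decoded as Latin-1, which is likely what happened to `res.text` here; setting `res.encoding = "utf-8"` before reading `res.text` is the usual fix at the source. The sketch below instead cleans an already scraped row, and also turns the rating word into a number. The helper names `repair_mojibake` and `parse_price` and the `RATING_WORDS` mapping are illustrative, not part of the notebook.

```python
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def repair_mojibake(text):
    """Undo UTF-8 bytes that were decoded as Latin-1 (e.g. 'Â£' -> '£')."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # already clean, or not this kind of damage


def parse_price(text):
    """Return the numeric part of a price string such as '£51.77'."""
    return float(repair_mojibake(text).lstrip("£"))


# One scraped row in the same layout as data[0]
row = [
    "A Light in the ...",
    "catalogue/a-light-in-the-attic_1000/index.html",
    "Â£51.77",
    "Three",
    "In stock",
]

price = parse_price(row[2])
rating = RATING_WORDS[row[3]]
print(price, rating)
```

Numeric prices and ratings make later steps such as sorting or filtering in pandas straightforward.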

#Scraping Multiple Pages

In [75]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from tqdm import tqdm

BASE_URL = "https://books.toscrape.com/catalogue/page-{}.html"

data3 = []

for page in tqdm(range(1, 51)):   # 50 pages available

    url = BASE_URL.format(page)
    res = requests.get(url)

    if res.status_code != 200:
        break

    soup = BeautifulSoup(res.text, "html.parser")

    for sp in soup.find_all('li', class_='col-xs-6 col-sm-4 col-md-3 col-lg-3'):

        title = sp.find('h3').find('a')['title']

        book_link = sp.find('a')['href']
        book_link = "https://books.toscrape.com/catalogue/" + book_link

        price = sp.find('p', class_='price_color').text

        rating = sp.find('p', class_='star-rating')['class'][1]

        stock = sp.find('p', class_='instock availability').text.strip()

        data3.append([title, book_link, price, rating, stock])


100%|██████████| 50/50 [00:03<00:00, 14.19it/s]
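
The loop builds absolute links by prefixing a fixed string, which works because every `href` on the catalogue pages is relative to `/catalogue/`. An alternative sketch, using the standard library's `urljoin`, resolves the `href` against the URL of the page it was found on, so the same code also handles the home page, where the `href` already starts with `catalogue/`. The variable names below are illustrative.

```python
from urllib.parse import urljoin

# href as found on a catalogue page
page_url = "https://books.toscrape.com/catalogue/page-1.html"
href = "a-light-in-the-attic_1000/index.html"
book_url = urljoin(page_url, href)

# href as found on the home page
home_url = "http://books.toscrape.com/"
home_href = "catalogue/a-light-in-the-attic_1000/index.html"
home_book_url = urljoin(home_url, home_href)

print(book_url)
print(home_book_url)
```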


In [76]:
data3

[['A Light in the Attic',
  'https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html',
  'Â£51.77',
  'Three',
  'In stock'],
 ['Tipping the Velvet',
  'https://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html',
  'Â£53.74',
  'One',
  'In stock'],
 ['Soumission',
  'https://books.toscrape.com/catalogue/soumission_998/index.html',
  'Â£50.10',
  'One',
  'In stock'],
 ['Sharp Objects',
  'https://books.toscrape.com/catalogue/sharp-objects_997/index.html',
  'Â£47.82',
  'Four',
  'In stock'],
 ['Sapiens: A Brief History of Humankind',
  'https://books.toscrape.com/catalogue/sapiens-a-brief-history-of-humankind_996/index.html',
  'Â£54.23',
  'Five',
  'In stock'],
 ['The Requiem Red',
  'https://books.toscrape.com/catalogue/the-requiem-red_995/index.html',
  'Â£22.65',
  'One',
  'In stock'],
 ['The Dirty Little Secrets of Getting Your Dream Job',
  'https://books.toscrape.com/catalogue/the-dirty-little-secrets-of-getting-your-dream-job_994/i

In [77]:
df3 = pd.DataFrame(
    data3,  # rows from the 50-page loop above, not the single-page `data` list
    columns=["Title", "Book Link", "Price", "Rating", "Stock"]
)

df3.to_csv("books_multiple_pages.csv", index=False)

print("✅ Scraping completed!")
print(df3.head())


✅ Scraping completed!
                                   Title  \
0                   A Light in the Attic   
1                     Tipping the Velvet   
2                             Soumission   
3                          Sharp Objects   
4  Sapiens: A Brief History of Humankind   

                                           Book Link    Price Rating     Stock  
0  https://books.toscrape.com/catalogue/a-light-i...  Â£51.77  Three  In stock  
1  https://books.toscrape.com/catalogue/tipping-t...  Â£53.74    One  In stock  
2  https://books.toscrape.com/catalogue/soumissio...  Â£50.10    One  In stock  
3  https://books.toscrape.com/catalogue/sharp-obj...  Â£47.82   Four  In stock  
4  https://books.toscrape.com/catalogue/sapiens-a...  Â£54.23   Five  In stock  


In [78]:
df3

Unnamed: 0,Title,Book Link,Price,Rating,Stock
0,A Light in the Attic,https://books.toscrape.com/catalogue/a-light-i...,Â£51.77,Three,In stock
1,Tipping the Velvet,https://books.toscrape.com/catalogue/tipping-t...,Â£53.74,One,In stock
2,Soumission,https://books.toscrape.com/catalogue/soumissio...,Â£50.10,One,In stock
3,Sharp Objects,https://books.toscrape.com/catalogue/sharp-obj...,Â£47.82,Four,In stock
4,Sapiens: A Brief History of Humankind,https://books.toscrape.com/catalogue/sapiens-a...,Â£54.23,Five,In stock
...,...,...,...,...,...
995,Alice in Wonderland (Alice's Adventures in Won...,https://books.toscrape.com/catalogue/alice-in-...,Â£55.53,One,In stock
996,"Ajin: Demi-Human, Volume 1 (Ajin: Demi-Human #1)",https://books.toscrape.com/catalogue/ajin-demi...,Â£57.06,Four,In stock
997,A Spy's Devotion (The Regency Spies of London #1),https://books.toscrape.com/catalogue/a-spys-de...,Â£16.97,Five,In stock
998,1st to Die (Women's Murder Club #1),https://books.toscrape.com/catalogue/1st-to-di...,Â£53.98,One,In stock
