# BeautifulSoup

Beautiful Soup is a Python library for extracting data from HTML and XML files, which makes it a common choice for web scraping.
It provides Pythonic idioms for iterating over, searching, and modifying the parse tree.
Beautiful Soup sits on top of Python parsers such as lxml and html5lib, so you can try different parsing strategies or trade speed for flexibility.
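
As a minimal sketch of the three operations named above (searching, iterating, and modifying), the example below parses a small, made-up HTML string rather than a live page. The markup and the variable names are illustrative assumptions; it uses the built-in `html.parser` so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML document used only for illustration
html = """
<html><body>
  <h1 id="top">Films</h1>
  <ul>
    <li class="film">Battle on Buka Street</li>
    <li class="film">Omo Ghetto: The Saga</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Searching: find() returns the first match, find_all() returns every match
heading = soup.find("h1")
films = soup.find_all("li", class_="film")

# Iterating: each result is a Tag whose text can be read with get_text()
film_titles = [li.get_text(strip=True) for li in films]

# Modifying: replace the heading text inside the parse tree itself
heading.string = "Highest-grossing films"

print(film_titles)
print(soup.h1)
```

Swapping `"html.parser"` for `"lxml"` or `"html5lib"` should work the same way here, provided the corresponding package is installed.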

# Requests

Fetching a page's HTML over HTTP is the first step in web scraping.
The requests library is commonly used for this in Python.
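
One way to wrap the fetch step is sketched below. The helper name `fetch_html` and the 10-second default timeout are assumptions made for illustration, not part of the notebook; the sketch adds a timeout and lets `raise_for_status()` turn 4xx/5xx responses into exceptions.

```python
import requests

def fetch_html(url, timeout=10):
    """Return the page's HTML as text, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        # Raise an HTTPError for 4xx/5xx responses instead of checking status_code by hand
        response.raise_for_status()
    except requests.exceptions.RequestException as error:
        print(f"Request failed: {error}")
        return None
    return response.text

# A malformed URL is rejected before any network traffic is sent
print(fetch_html("not-a-url"))
```

Returning `None` on failure keeps the calling code simple, at the cost of hiding the specific error; re-raising the exception may be the better choice in a larger script.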

In [1]:
import requests
from bs4 import BeautifulSoup as bs

In [2]:
# Fetch the page and print its title
url = 'https://en.wikipedia.org/wiki/List_of_highest-grossing_Nigerian_films'
response = requests.get(url)
if response.status_code == 200:
    soup = bs(response.content, 'html.parser')
    title = soup.title.text
    print(f"Title: {title}")
else:
    print(f"Error: {response.status_code}")

Title: List of highest-grossing Nigerian films - Wikipedia


In [3]:
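r = requests.get(url)
# Convert the response body into a BeautifulSoup object.
# Naming 'html.parser' explicitly keeps the result independent of which optional parsers are installed.
soup = bs(r.text, 'html.parser')
print(soup)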

<!DOCTYPE html>
<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-enabled vector-feature-zebra-design-enabled vector-feature-custom-font-size-clientpref-0 vector-feature-client-preferences-disabled vector-feature-client-prefs-pinned-disabled vector-toc-available" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>List of highest-grossing Nigerian films - Wikipedia</title>
<script>(function(){var className="client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-f

In [4]:
table = soup.find('table')  # find() returns only the first <table> on the page
print(table)

<table class="wikitable sortable">
<tbody><tr>
<th>Rank
</th>
<th>Title
</th>
<th>Year
</th>
<th>Domestic Gross
</th>
<th>Studio(s)
</th>
<th>Director(s)
</th></tr>
<tr>
<td>1
</td>
<td><i><a href="/wiki/Battle_on_Buka_Street" title="Battle on Buka Street">Battle on Buka Street</a></i>
</td>
<td>2022
</td>
<td>₦668,423,056<sup class="reference" id="cite_ref-Top_20_films_30th_December_-_1st_Januayr_2023_-_Cinema_Exhibitors_Association_of_Nigeria_1-0"><a href="#cite_note-Top_20_films_30th_December_-_1st_Januayr_2023_-_Cinema_Exhibitors_Association_of_Nigeria-1">[1]</a></sup>
</td>
<td>Funke Ayotunde Akindele Network / FilmOne
</td>
<td><a href="/wiki/Funke_Akindele" title="Funke Akindele">Funke Akindele</a>, Tobi Makinde
</td></tr>
<tr>
<td>2
</td>
<td><i><a href="/wiki/Omo_Ghetto:_The_Saga" title="Omo Ghetto: The Saga">Omo Ghetto: The Saga</a></i>
</td>
<td>2020
</td>
<td>₦636,129,120<sup class="reference" id="cite_ref-Top_20_films_9th_15th_April_2021_-_Cinema_Exhibitors_Association_of_

</tbody></table>


In [5]:
table1 = soup.find_all('table')  # find_all() returns every <table> on the page as a list
table1

[<table class="wikitable sortable">
 <tbody><tr>
 <th>Rank
 </th>
 <th>Title
 </th>
 <th>Year
 </th>
 <th>Domestic Gross
 </th>
 <th>Studio(s)
 </th>
 <th>Director(s)
 </th></tr>
 <tr>
 <td>1
 </td>
 <td><i><a href="/wiki/Battle_on_Buka_Street" title="Battle on Buka Street">Battle on Buka Street</a></i>
 </td>
 <td>2022
 </td>
 <td>₦668,423,056<sup class="reference" id="cite_ref-Top_20_films_30th_December_-_1st_Januayr_2023_-_Cinema_Exhibitors_Association_of_Nigeria_1-0"><a href="#cite_note-Top_20_films_30th_December_-_1st_Januayr_2023_-_Cinema_Exhibitors_Association_of_Nigeria-1">[1]</a></sup>
 </td>
 <td>Funke Ayotunde Akindele Network / FilmOne
 </td>
 <td><a href="/wiki/Funke_Akindele" title="Funke Akindele">Funke Akindele</a>, Tobi Makinde
 </td></tr>
 <tr>
 <td>2
 </td>
 <td><i><a href="/wiki/Omo_Ghetto:_The_Saga" title="Omo Ghetto: The Saga">Omo Ghetto: The Saga</a></i>
 </td>
 <td>2020
 </td>
 <td>₦636,129,120<sup class="reference" id="cite_ref-Top_20_films_9th_15th_April_2021_

In [6]:
soup.find('table', class_="wikitable sortable")  # find() returns only the first table with this class

<table class="wikitable sortable">
<tbody><tr>
<th>Rank
</th>
<th>Title
</th>
<th>Year
</th>
<th>Domestic Gross
</th>
<th>Studio(s)
</th>
<th>Director(s)
</th></tr>
<tr>
<td>1
</td>
<td><i><a href="/wiki/Battle_on_Buka_Street" title="Battle on Buka Street">Battle on Buka Street</a></i>
</td>
<td>2022
</td>
<td>₦668,423,056<sup class="reference" id="cite_ref-Top_20_films_30th_December_-_1st_Januayr_2023_-_Cinema_Exhibitors_Association_of_Nigeria_1-0"><a href="#cite_note-Top_20_films_30th_December_-_1st_Januayr_2023_-_Cinema_Exhibitors_Association_of_Nigeria-1">[1]</a></sup>
</td>
<td>Funke Ayotunde Akindele Network / FilmOne
</td>
<td><a href="/wiki/Funke_Akindele" title="Funke Akindele">Funke Akindele</a>, Tobi Makinde
</td></tr>
<tr>
<td>2
</td>
<td><i><a href="/wiki/Omo_Ghetto:_The_Saga" title="Omo Ghetto: The Saga">Omo Ghetto: The Saga</a></i>
</td>
<td>2020
</td>
<td>₦636,129,120<sup class="reference" id="cite_ref-Top_20_films_9th_15th_April_2021_-_Cinema_Exhibitors_Association_of_

In [7]:
table = soup.find_all('table')[0]  # same element as soup.find('table')
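
The cells above call `find()` and `find_all()` in several ways. The sketch below contrasts them on a small, made-up snippet (the markup, the `infobox` class, and the `demo_soup` name are assumptions for illustration) and shows what comes back when nothing matches.

```python
from bs4 import BeautifulSoup

# Made-up markup with three tables, two of which share a class
html = """
<table class="wikitable sortable"><tr><th>Rank</th></tr></table>
<table class="infobox"><tr><th>Notes</th></tr></table>
<table class="wikitable sortable"><tr><th>Year</th></tr></table>
"""
demo_soup = BeautifulSoup(html, "html.parser")

first_table = demo_soup.find("table")                       # first match only
all_tables = demo_soup.find_all("table")                    # every match, in document order
wikitables = demo_soup.find_all("table", class_="wikitable")  # filter on a single CSS class
missing = demo_soup.find("table", class_="does-not-exist")  # no match

print(first_table == all_tables[0])
print(len(all_tables), len(wikitables))
print(missing)
```

Because `find()` returns `None` when nothing matches, it is worth checking the result before calling methods on it.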

In [8]:
print(table)

<table class="wikitable sortable">
<tbody><tr>
<th>Rank
</th>
<th>Title
</th>
<th>Year
</th>
<th>Domestic Gross
</th>
<th>Studio(s)
</th>
<th>Director(s)
</th></tr>
<tr>
<td>1
</td>
<td><i><a href="/wiki/Battle_on_Buka_Street" title="Battle on Buka Street">Battle on Buka Street</a></i>
</td>
<td>2022
</td>
<td>₦668,423,056<sup class="reference" id="cite_ref-Top_20_films_30th_December_-_1st_Januayr_2023_-_Cinema_Exhibitors_Association_of_Nigeria_1-0"><a href="#cite_note-Top_20_films_30th_December_-_1st_Januayr_2023_-_Cinema_Exhibitors_Association_of_Nigeria-1">[1]</a></sup>
</td>
<td>Funke Ayotunde Akindele Network / FilmOne
</td>
<td><a href="/wiki/Funke_Akindele" title="Funke Akindele">Funke Akindele</a>, Tobi Makinde
</td></tr>
<tr>
<td>2
</td>
<td><i><a href="/wiki/Omo_Ghetto:_The_Saga" title="Omo Ghetto: The Saga">Omo Ghetto: The Saga</a></i>
</td>
<td>2020
</td>
<td>₦636,129,120<sup class="reference" id="cite_ref-Top_20_films_9th_15th_April_2021_-_Cinema_Exhibitors_Association_of_

In [9]:
table.find_all('th')

[<th>Rank
 </th>,
 <th>Title
 </th>,
 <th>Year
 </th>,
 <th>Domestic Gross
 </th>,
 <th>Studio(s)
 </th>,
 <th>Director(s)
 </th>]

In [10]:
columns1 = table.find_all('th')
column_names=[c.text.strip() for c in columns1]
print(column_names)

['Rank', 'Title', 'Year', 'Domestic Gross', 'Studio(s)', 'Director(s)']
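
The header and data cells on this page end with a newline, which is why the notebook calls `.text.strip()`. The sketch below, on markup shaped like the tables in this notebook, compares that with `get_text(strip=True)`, which strips each text fragment separately and can therefore drop the space between fragments. The `demo_soup` name is an illustrative assumption.

```python
from bs4 import BeautifulSoup

# Markup shaped like the cells in this notebook: note the trailing newlines
html = "<table><tr><th>Rank\n</th><td><i>The Wedding Party</i> 2\n</td></tr></table>"
demo_soup = BeautifulSoup(html, "html.parser")

header = demo_soup.find("th")
cell = demo_soup.find("td")

raw_header = header.text                         # keeps the trailing newline
clean_header = header.text.strip()               # newline removed
whole_cell = cell.text.strip()                   # strips the ends of the combined text
per_fragment = cell.get_text(strip=True)         # strips every fragment, then joins them

print(repr(raw_header), repr(clean_header))
print(repr(whole_cell), repr(per_fragment))
```

For cells that mix tags and plain text, `.text.strip()` is the safer of the two; `get_text(" ", strip=True)` is another option when a separator between fragments is wanted.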


In [11]:
import pandas as pd
df = pd.DataFrame(columns=column_names)
df

Unnamed: 0,Rank,Title,Year,Domestic Gross,Studio(s),Director(s)


In [12]:
# Despite its name, column_data holds the table's rows (<tr> elements), header row included
column_data = table.find_all('tr')

In [13]:
rows = []   # one list of cell values per table row
for row in column_data[1:]:   # skip the header row
    row_data = row.find_all('td')
    individual_row_data = [data.text.strip() for data in row_data]
    rows.append(individual_row_data)   # add this row's cell text to the collected rows
df = pd.DataFrame(rows, columns=column_names)
df

Unnamed: 0,Rank,Title,Year,Domestic Gross,Studio(s),Director(s)
0,1,Battle on Buka Street,2022,"₦668,423,056[1]",Funke Ayotunde Akindele Network / FilmOne,"Funke Akindele, Tobi Makinde"
1,2,Omo Ghetto: The Saga,2020,"₦636,129,120[2]",SceneOne Productions,Funke Akindele. JJC Skillz
2,3,The Wedding Party,2016,"₦452,288,605[3]",Ebonylife Films / FilmOne / Inkblot Production...,Kemi Adetiba
3,4,The Wedding Party 2,2017,"₦433,197,377[4]",Ebonylife Films / FilmOne / Inkblot Production...,Niyi Akinmolayan
4,5,Chief Daddy,2018,"₦387,540,749[5]",EbonyLife Films,Niyi Akinmolayan
...,...,...,...,...,...,...
95,96,Elevator Baby,2019,"₦30,595,550[20]",Anthill Studios,Akay Mason
96,97,The Bride Price,2023,"₦30,585,205",AstraTv Africa,Ikechukwu Oku
97,98,The Perfect Arrangement,2022,"₦30,515,400[7]",Inkblot Productions,Chinaza Onuzu
98,99,Muna,2019,"₦30,456,996[60]",KevStel Productions,Kevin Nwankwor
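
The `Domestic Gross` values above are still strings and some carry citation markers such as `[1]`. A possible clean-up step is sketched below on two values copied from the output; the new column name `Domestic Gross (NGN)` and the `sample` frame are assumptions for illustration.

```python
import pandas as pd

# Two sample rows copied from the DataFrame output above
sample = pd.DataFrame({
    "Title": ["Battle on Buka Street", "The Bride Price"],
    "Domestic Gross": ["₦668,423,056[1]", "₦30,585,205"],
})

gross = (
    sample["Domestic Gross"]
    .str.replace(r"\[\d+\]", "", regex=True)   # drop citation markers first, so their digits are not kept
    .str.replace(r"[^\d]", "", regex=True)     # then drop the naira sign and thousands separators
    .astype("int64")
)
sample["Domestic Gross (NGN)"] = gross
print(sample)
```

The order of the two replacements matters: removing non-digits first would merge the citation number into the amount.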


In [14]:
table3=soup.find_all('table')[2]
table3

<table class="wikitable sortable">
<tbody><tr>
<th>Year
</th>
<th>Title
</th>
<th>Domestic Gross
</th></tr>
<tr>
<td>2023
</td>
<td><i>Orisa</i>
</td>
<td>₦170,048,475
</td></tr>
<tr>
<td>2022
</td>
<td><i>Battle on Buka Street</i>
</td>
<td>₦668,423,056
</td></tr>
<tr>
<td>2021
</td>
<td><i>Christmas in Miami</i>
</td>
<td>₦265,583,000
</td></tr>
<tr>
<td>2020
</td>
<td><i>Omo Ghetto: The Saga</i>
</td>
<td>₦636,129,120
</td></tr>
<tr>
<td>2019
</td>
<td><i>Sugar Rush</i>
</td>
<td>₦287,053,270
</td></tr>
<tr>
<td>2018
</td>
<td><i>Chief Daddy</i>
</td>
<td>₦387,540,749
</td></tr>
<tr>
<td>2017
</td>
<td><i>The Wedding Party</i> 2
</td>
<td>₦433,197,337
</td></tr>
<tr>
<td>2016
</td>
<td><i>The Wedding Party</i>
</td>
<td>₦452,288,605
</td></tr>
<tr>
<td>2015
</td>
<td><i>Fifty</i>
</td>
<td>₦80,030,500<sup class="reference" id="cite_ref-instagram.com_25-1"><a href="#cite_note-instagram.com-25">[25]</a></sup>
</td></tr>
<tr>
<td>2014
</td>
<td><i>30 Days in Atlanta</i>
</td>
<td>₦163,

In [15]:
table3.find_all('th')

[<th>Year
 </th>,
 <th>Title
 </th>,
 <th>Domestic Gross
 </th>]

In [16]:
columns_table3 = table3.find_all('th')
column_of_table3=[c.text.strip() for c in columns_table3]
print(column_of_table3)

['Year', 'Title', 'Domestic Gross']
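
Selecting a table with `find_all('table')[2]` depends on the order of tables on the page, which can change when the article is edited. One alternative, sketched here on made-up markup, is to pick the table by its header text. The helper `find_table_by_headers` and the `demo_soup` name are hypothetical, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Made-up markup with two tables that differ in their headers
html = """
<table><tr><th>Rank</th><th>Title</th></tr><tr><td>1</td><td>A</td></tr></table>
<table><tr><th>Year</th><th>Title</th><th>Domestic Gross</th></tr>
       <tr><td>2023</td><td>Orisa</td><td>₦170,048,475</td></tr></table>
"""
demo_soup = BeautifulSoup(html, "html.parser")

def find_table_by_headers(soup, expected_headers):
    """Return the first <table> whose header cells equal expected_headers, else None."""
    for table in soup.find_all("table"):
        headers = [th.get_text().strip() for th in table.find_all("th")]
        if headers == expected_headers:
            return table
    return None

yearly = find_table_by_headers(demo_soup, ["Year", "Title", "Domestic Gross"])
print(yearly.find("td").get_text().strip())
```

If several tables share the same headers, this sketch returns only the first of them, so an index or a nearby heading may still be needed to tell them apart.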


In [17]:
df3 = pd.DataFrame(columns=column_of_table3)
df3

Unnamed: 0,Year,Title,Domestic Gross


In [18]:
column_data_of_table3 = table3.find_all('tr')
for row3 in column_data_of_table3[1:]:   # skip the header row
    row_data3 = row3.find_all('td')
    # Overwritten on every pass, so only the last row's values remain after the loop
    Individual_row_data3 = [data3.text.strip() for data3 in row_data3]

In [19]:
column_data_of_table3

[<tr>
 <th>Year
 </th>
 <th>Title
 </th>
 <th>Domestic Gross
 </th></tr>,
 <tr>
 <td>2023
 </td>
 <td><i>Orisa</i>
 </td>
 <td>₦170,048,475
 </td></tr>,
 <tr>
 <td>2022
 </td>
 <td><i>Battle on Buka Street</i>
 </td>
 <td>₦668,423,056
 </td></tr>,
 <tr>
 <td>2021
 </td>
 <td><i>Christmas in Miami</i>
 </td>
 <td>₦265,583,000
 </td></tr>,
 <tr>
 <td>2020
 </td>
 <td><i>Omo Ghetto: The Saga</i>
 </td>
 <td>₦636,129,120
 </td></tr>,
 <tr>
 <td>2019
 </td>
 <td><i>Sugar Rush</i>
 </td>
 <td>₦287,053,270
 </td></tr>,
 <tr>
 <td>2018
 </td>
 <td><i>Chief Daddy</i>
 </td>
 <td>₦387,540,749
 </td></tr>,
 <tr>
 <td>2017
 </td>
 <td><i>The Wedding Party</i> 2
 </td>
 <td>₦433,197,337
 </td></tr>,
 <tr>
 <td>2016
 </td>
 <td><i>The Wedding Party</i>
 </td>
 <td>₦452,288,605
 </td></tr>,
 <tr>
 <td>2015
 </td>
 <td><i>Fifty</i>
 </td>
 <td>₦80,030,500<sup class="reference" id="cite_ref-instagram.com_25-1"><a href="#cite_note-instagram.com-25">[25]</a></sup>
 </td></tr>,
 <tr>
 <td>2014
 </td>
 <td

In [20]:
Individual_row_data3

['2009', 'The Figurine', '₦30,000,000']

In [21]:
rows3 = []   # collect every row instead of keeping only the last one
for row3 in column_data_of_table3[1:]:   # skip the header row
    row_data3 = row3.find_all('td')
    Individual_row_data3 = [data3.text.strip() for data3 in row_data3]
    rows3.append(Individual_row_data3)
df3 = pd.DataFrame(rows3, columns=column_of_table3)
df3

Unnamed: 0,Year,Title,Domestic Gross
0,2023,Orisa,"₦170,048,475"
1,2022,Battle on Buka Street,"₦668,423,056"
2,2021,Christmas in Miami,"₦265,583,000"
3,2020,Omo Ghetto: The Saga,"₦636,129,120"
4,2019,Sugar Rush,"₦287,053,270"
5,2018,Chief Daddy,"₦387,540,749"
6,2017,The Wedding Party 2,"₦433,197,337"
7,2016,The Wedding Party,"₦452,288,605"
8,2015,Fifty,"₦80,030,500[25]"
9,2014,30 Days in Atlanta,"₦163,351,300"
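
The same header-then-rows steps were applied to two tables above, so they could be folded into one function. The sketch below is one way to do that, assuming a simple table with a single header row and no merged cells; the name `table_to_dataframe` and the demo markup are illustrative assumptions.

```python
import pandas as pd
from bs4 import BeautifulSoup

def table_to_dataframe(table):
    """Convert a simple <table> Tag (one header row, no merged cells) to a DataFrame."""
    rows = table.find_all("tr")
    columns = [th.text.strip() for th in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        cells = [td.text.strip() for td in row.find_all("td")]
        if cells:  # skip rows that contain no <td> cells
            records.append(cells)
    return pd.DataFrame(records, columns=columns)

# Demo markup shaped like the yearly table above
html = """
<table>
<tr><th>Year
</th><th>Title
</th><th>Domestic Gross
</th></tr>
<tr><td>2023
</td><td><i>Orisa</i>
</td><td>₦170,048,475
</td></tr>
<tr><td>2017
</td><td><i>The Wedding Party</i> 2
</td><td>₦433,197,337
</td></tr>
</table>
"""
demo_table = BeautifulSoup(html, "html.parser").find("table")
df_demo = table_to_dataframe(demo_table)
print(df_demo)
```

In the notebook this would be called as `table_to_dataframe(table)` and `table_to_dataframe(table3)`. A row whose cell count differs from the header count makes `pd.DataFrame` raise an error, so tables with merged cells need extra handling.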


In [22]:
# Export DataFrames to Excel using ExcelWriter
with pd.ExcelWriter(r'C:\Users\Victoria\Documents\Movie.xlsx') as excel_writer:
    df.to_excel(excel_writer, sheet_name='Sheet1', index=False)
    df3.to_excel(excel_writer, sheet_name='Sheet3', index=False)
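
The export cell writes to a path that exists only on one particular Windows machine. A more portable variant is sketched below; the `Documents` folder under the user's home directory and the helper name `export_to_excel` are assumptions for illustration. The function is defined here but not called, since writing `.xlsx` files requires an Excel engine such as openpyxl to be installed.

```python
from pathlib import Path
import pandas as pd

# Hypothetical output location: a Documents folder under the current user's home directory
output_path = Path.home() / "Documents" / "Movie.xlsx"

def export_to_excel(frames, path):
    """Write each DataFrame in `frames` (a dict of sheet name -> DataFrame) to one workbook."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    with pd.ExcelWriter(path) as writer:
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name, index=False)
    return path

# Usage with the notebook's DataFrames:
# export_to_excel({"Sheet1": df, "Sheet3": df3}, output_path)
```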