# **Kehinde Adeola Dada**
# **Web Scraping Assignment - Nigeria's Largest Companies**

### **Description:**
The goal of this project is to scrape detailed information about Nigeria’s largest companies based on their revenue and market capitalization as of 2024. This data can be utilized for various purposes such as market analysis, economic research, or business intelligence. The companies span multiple industries, with oil and gas, telecommunications, and agroindustry being dominant sectors.

In [4]:
# Importing the libraries
import requests
import pandas as pd
from bs4 import BeautifulSoup

In [27]:
url = 'https://en.wikipedia.org/wiki/List_of_largest_companies_in_Nigeria'
url

'https://en.wikipedia.org/wiki/List_of_largest_companies_in_Nigeria'

In [29]:
resp = requests.get(url)
resp.status_code

200

In [31]:
# Passing the HTML from the response using BeautifulSoup
soup = BeautifulSoup(resp.text, 'html.parser')

## **Revenue Based - Largest Companies in Nigeria**
### **Scrape the Largest Companies in Nigeria based on Revenue**

revenue table tag = <table class="wikitable sortable" 

In [42]:
# Getting the project title
revenue_table_span = soup.find_all('table', class_="wikitable sortable")
revenue_table_span

[<table class="wikitable sortable">
 <tbody><tr>
 <th>Rank
 </th>
 <th>Company
 </th>
 <th>Industry
 </th>
 <th>Revenue<br/>(US$ millions)
 </th>
 <th>Profits<br/>(US$ millions)
 </th></tr>
 <tr>
 <td>1
 </td>
 <td><a class="mw-redirect" href="/wiki/Nigeria_National_Petroleum_Corporation" title="Nigeria National Petroleum Corporation">Nigeria National Petroleum</a>
 </td>
 <td>Oil and gas
 </td>
 <td>9,706
 </td>
 <td>1,877
 </td></tr>
 <tr>
 <td>2
 </td>
 <td><a class="mw-redirect" href="/wiki/Nigeria_Liquefied_Natural_Gas" title="Nigeria Liquefied Natural Gas">Nigeria Liquefied Natural Gas</a>
 </td>
 <td>Oil and gas
 </td>
 <td>6,315
 </td>
 <td>...
 </td></tr>
 <tr>
 <td>3
 </td>
 <td><a class="mw-redirect" href="/wiki/MTN_Nigeria" title="MTN Nigeria">MTN Nigeria</a>
 </td>
 <td>Telecommunications
 </td>
 <td>3,514
 </td>
 <td>536
 </td></tr>
 <tr>
 <td>4
 </td>
 <td><a href="/wiki/Dangote_Cement" title="Dangote Cement">Dangote Cement</a>
 </td>
 <td>Cement
 </td>
 <td>2,699
 </td>

In [46]:
# Getting the revenue headers with tag
revenue_headers = revenue_table_span[0].find_all('th')
revenue_headers

[<th>Rank
 </th>,
 <th>Company
 </th>,
 <th>Industry
 </th>,
 <th>Revenue<br/>(US$ millions)
 </th>,
 <th>Profits<br/>(US$ millions)
 </th>]

In [68]:
# Getting the revenue headers without tag
revenue_titles = []

for header in revenue_headers:
    text = header.get_text().strip()
    revenue_titles.append(text)
revenue_titles

['Rank',
 'Company',
 'Industry',
 'Revenue(US$ millions)',
 'Profits(US$ millions)']

In [76]:
# Extracting the rows excluding the header
rows = revenue_table_span[0].find_all('tr')[1:]

# Initializing empty list to store the rows
revenue_data = []

# Getting the columns
for row in rows:
    columns = row.find_all('td')

    # Extracting text from column and strip any trailing spaces
    row_data = [column.get_text().strip() for column in columns]

    # Combining header titles and row data
    revenue_data.append(dict(zip(revenue_titles, row_data)))

print(revenue_data)

[{'Rank': '1', 'Company': 'Nigeria National Petroleum', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '9,706', 'Profits(US$ millions)': '1,877'}, {'Rank': '2', 'Company': 'Nigeria Liquefied Natural Gas', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '6,315', 'Profits(US$ millions)': '...'}, {'Rank': '3', 'Company': 'MTN Nigeria', 'Industry': 'Telecommunications', 'Revenue(US$ millions)': '3,514', 'Profits(US$ millions)': '536'}, {'Rank': '4', 'Company': 'Dangote Cement', 'Industry': 'Cement', 'Revenue(US$ millions)': '2,699', 'Profits(US$ millions)': '721'}, {'Rank': '5', 'Company': 'Nigerian Petroleum Development', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '2,686', 'Profits(US$ millions)': '219'}, {'Rank': '6', 'Company': 'Flour Mills of Nigeria', 'Industry': 'Agroindustry', 'Revenue(US$ millions)': '2,014', 'Profits(US$ millions)': '67'}, {'Rank': '7', 'Company': 'Airtel Nigeria', 'Industry': 'Telecommunications', 'Revenue(US$ millions)': '1,503', 'Profits(US$ m

In [56]:
# Storing revenue data in a DataFrame
revn_data = [{'Rank': '1', 'Company': 'Nigeria National Petroleum', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '9,706', 'Profits(US$ millions)': '1,877'}, {'Rank': '2', 'Company': 'Nigeria Liquefied Natural Gas', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '6,315', 'Profits(US$ millions)': '...'}, {'Rank': '3', 'Company': 'MTN Nigeria', 'Industry': 'Telecommunications', 'Revenue(US$ millions)': '3,514', 'Profits(US$ millions)': '536'}, {'Rank': '4', 'Company': 'Dangote Cement', 'Industry': 'Cement', 'Revenue(US$ millions)': '2,699', 'Profits(US$ millions)': '721'}, {'Rank': '5', 'Company': 'Nigerian Petroleum Development', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '2,686', 'Profits(US$ millions)': '219'}, {'Rank': '6', 'Company': 'Flour Mills of Nigeria', 'Industry': 'Agroindustry', 'Revenue(US$ millions)': '2,014', 'Profits(US$ millions)': '67'}, {'Rank': '7', 'Company': 'Airtel Nigeria', 'Industry': 'Telecommunications', 'Revenue(US$ millions)': '1,503', 'Profits(US$ millions)': '343'}, {'Rank': '8', 'Company': 'Nigerian Breweries', 'Industry': 'Agroindustry', 'Revenue(US$ millions)': '890', 'Profits(US$ millions)': '19'}, {'Rank': '9', 'Company': 'Jumia', 'Industry': 'Retail', 'Revenue(US$ millions)': '837', 'Profits(US$ millions)': '...'}, {'Rank': '10', 'Company': 'Nestle Nigeria', 'Industry': 'Agroindustry', 'Revenue(US$ millions)': '749', 'Profits(US$ millions)': '102'}, {'Rank': '11', 'Company': 'Krystal Digital Network Solutions', 'Industry': 'Infotech', 'Revenue(US$ millions)': '678', 'Profits(US$ millions)': '21'}, {'Rank': '12', 'Company': 'Julius Berger', 'Industry': 'Construction', 'Revenue(US$ millions)': '631', 'Profits(US$ millions)': '3'}, {'Rank': '13', 'Company': 'Nigerian Bottling Company', 'Industry': 'Agroindustry', 'Revenue(US$ millions)': '627', 'Profits(US$ millions)': '...'}, {'Rank': '14', 'Company': 'Lafarge Africa', 'Industry': 'Cement', 'Revenue(US$ millions)': '602', 'Profits(US$ millions)': '97'}, {'Rank': '15', 'Company': 'Dangote Sugar Refinery', 'Industry': 'Agroindustry', 'Revenue(US$ millions)': '559', 'Profits(US$ millions)': '78'}, {'Rank': '16', 'Company': 'BUA Cement', 'Industry': 'Cement', 'Revenue(US$ millions)': '547', 'Profits(US$ millions)': '184'}, {'Rank': '17', 'Company': 'TotalEnergies Nigeria', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '534', 'Profits(US$ millions)': '5'}, {'Rank': '18', 'Company': 'Seplat Petroleum Development', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '498', 'Profits(US$ millions)': '−80'}, {'Rank': '19', 'Company': 'Ardova Plc', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '474', 'Profits(US$ millions)': '5'}, {'Rank': '20', 'Company': '11PLC', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '428', 'Profits(US$ millions)': '16'}, {'Rank': '21', 'Company': 'International Breweries plc', 'Industry': 'Agroindustry', 'Revenue(US$ millions)': '357', 'Profits(US$ millions)': '−32'}, {'Rank': '22', 'Company': 'Conoil', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '307', 'Profits(US$ millions)': '...'}, {'Rank': '23', 'Company': 'Honeywell Flour Mill', 'Industry': 'Agroindustry', 'Revenue(US$ millions)': '286', 'Profits(US$ millions)': '3'}, {'Rank': '24', 'Company': 'PZ Cussons Nigeria', 'Industry': 'Consumer goods', 'Revenue(US$ millions)': '216', 'Profits(US$ millions)': '4'}, {'Rank': '25', 'Company': 'UAC of Nigeria', 'Industry': 'Conglomerate', 'Revenue(US$ millions)': '213', 'Profits(US$ millions)': '11'}]
large_revenue_data = pd.DataFrame(revn_data)
large_revenue_data.head()

Unnamed: 0,Rank,Company,Industry,Revenue(US$ millions),Profits(US$ millions)
0,1,Nigeria National Petroleum,Oil and gas,9706,1877
1,2,Nigeria Liquefied Natural Gas,Oil and gas,6315,...
2,3,MTN Nigeria,Telecommunications,3514,536
3,4,Dangote Cement,Cement,2699,721
4,5,Nigerian Petroleum Development,Oil and gas,2686,219


In [84]:
# Converting to .csv and store in .csv file
large_revenue_data.to_csv('largest_companies_revenue_2024.csv', index=False)

## **Market Capitalization Based - Largest Companies in Nigeria**
### **Scrape the Largest Companies in Nigeria based on market capitalization**

market capitalization table tag = <table class="wikitable sortable" 

In [66]:
# Getting the market capitalization header with tag
market_cap_header = revenue_table_span[1].find_all('th')
market_cap_header

[<th>Rank
 </th>,
 <th>Company
 </th>,
 <th>Industry
 </th>,
 <th>Market cap<br/>(US$ millions)
 </th>]

In [74]:
# Getting the market capitalization header without tag
market_cap_titles = []

for header in market_cap_header:
    text = header.get_text().strip()
    market_cap_titles.append(text)
market_cap_titles

['Rank', 'Company', 'Industry', 'Market cap(US$ millions)']

In [80]:
# Extracting the rows excluding the header
rows = revenue_table_span[0].find_all('tr')[1:]

# Initializing empty list to store the rows
market_cap_data = []

# Getting the columns
for row in rows:
    columns = row.find_all('td')

    # Extracting text from column and strip any trailing spaces
    row_data = [column.get_text().strip() for column in columns]

    # Combining header titles and row data
    market_cap_data.append(dict(zip(market_cap_titles, row_data)))

print(market_cap_data)

[{'Rank': '1', 'Company': 'Nigeria National Petroleum', 'Industry': 'Oil and gas', 'Market cap(US$ millions)': '9,706'}, {'Rank': '2', 'Company': 'Nigeria Liquefied Natural Gas', 'Industry': 'Oil and gas', 'Market cap(US$ millions)': '6,315'}, {'Rank': '3', 'Company': 'MTN Nigeria', 'Industry': 'Telecommunications', 'Market cap(US$ millions)': '3,514'}, {'Rank': '4', 'Company': 'Dangote Cement', 'Industry': 'Cement', 'Market cap(US$ millions)': '2,699'}, {'Rank': '5', 'Company': 'Nigerian Petroleum Development', 'Industry': 'Oil and gas', 'Market cap(US$ millions)': '2,686'}, {'Rank': '6', 'Company': 'Flour Mills of Nigeria', 'Industry': 'Agroindustry', 'Market cap(US$ millions)': '2,014'}, {'Rank': '7', 'Company': 'Airtel Nigeria', 'Industry': 'Telecommunications', 'Market cap(US$ millions)': '1,503'}, {'Rank': '8', 'Company': 'Nigerian Breweries', 'Industry': 'Agroindustry', 'Market cap(US$ millions)': '890'}, {'Rank': '9', 'Company': 'Jumia', 'Industry': 'Retail', 'Market cap(US$ mi

In [82]:
market_capital_data = [{'Rank': '1', 'Company': 'Nigeria National Petroleum', 'Industry': 'Oil and gas', 'Market cap(US$ millions)': '9,706'}, {'Rank': '2', 'Company': 'Nigeria Liquefied Natural Gas', 'Industry': 'Oil and gas', 'Market cap(US$ millions)': '6,315'}, {'Rank': '3', 'Company': 'MTN Nigeria', 'Industry': 'Telecommunications', 'Market cap(US$ millions)': '3,514'}, {'Rank': '4', 'Company': 'Dangote Cement', 'Industry': 'Cement', 'Market cap(US$ millions)': '2,699'}, {'Rank': '5', 'Company': 'Nigerian Petroleum Development', 'Industry': 'Oil and gas', 'Market cap(US$ millions)': '2,686'}, {'Rank': '6', 'Company': 'Flour Mills of Nigeria', 'Industry': 'Agroindustry', 'Market cap(US$ millions)': '2,014'}, {'Rank': '7', 'Company': 'Airtel Nigeria', 'Industry': 'Telecommunications', 'Market cap(US$ millions)': '1,503'}, {'Rank': '8', 'Company': 'Nigerian Breweries', 'Industry': 'Agroindustry', 'Market cap(US$ millions)': '890'}, {'Rank': '9', 'Company': 'Jumia', 'Industry': 'Retail', 'Market cap(US$ millions)': '837'}, {'Rank': '10', 'Company': 'Nestle Nigeria', 'Industry': 'Agroindustry', 'Market cap(US$ millions)': '749'}, {'Rank': '11', 'Company': 'Krystal Digital Network Solutions', 'Industry': 'Infotech', 'Market cap(US$ millions)': '678'}, {'Rank': '12', 'Company': 'Julius Berger', 'Industry': 'Construction', 'Market cap(US$ millions)': '631'}, {'Rank': '13', 'Company': 'Nigerian Bottling Company', 'Industry': 'Agroindustry', 'Market cap(US$ millions)': '627'}, {'Rank': '14', 'Company': 'Lafarge Africa', 'Industry': 'Cement', 'Market cap(US$ millions)': '602'}, {'Rank': '15', 'Company': 'Dangote Sugar Refinery', 'Industry': 'Agroindustry', 'Market cap(US$ millions)': '559'}, {'Rank': '16', 'Company': 'BUA Cement', 'Industry': 'Cement', 'Market cap(US$ millions)': '547'}, {'Rank': '17', 'Company': 'TotalEnergies Nigeria', 'Industry': 'Oil and gas', 'Market cap(US$ millions)': '534'}, {'Rank': '18', 'Company': 'Seplat Petroleum Development', 'Industry': 'Oil and gas', 'Market cap(US$ millions)': '498'}, {'Rank': '19', 'Company': 'Ardova Plc', 'Industry': 'Oil and gas', 'Market cap(US$ millions)': '474'}, {'Rank': '20', 'Company': '11PLC', 'Industry': 'Oil and gas', 'Market cap(US$ millions)': '428'}, {'Rank': '21', 'Company': 'International Breweries plc', 'Industry': 'Agroindustry', 'Market cap(US$ millions)': '357'}, {'Rank': '22', 'Company': 'Conoil', 'Industry': 'Oil and gas', 'Market cap(US$ millions)': '307'}, {'Rank': '23', 'Company': 'Honeywell Flour Mill', 'Industry': 'Agroindustry', 'Market cap(US$ millions)': '286'}, {'Rank': '24', 'Company': 'PZ Cussons Nigeria', 'Industry': 'Consumer goods', 'Market cap(US$ millions)': '216'}, {'Rank': '25', 'Company': 'UAC of Nigeria', 'Industry': 'Conglomerate', 'Market cap(US$ millions)': '213'}]
large_market_cap_data = pd.DataFrame(market_capital_data)
large_market_cap_data.head()

Unnamed: 0,Rank,Company,Industry,Market cap(US$ millions)
0,1,Nigeria National Petroleum,Oil and gas,9706
1,2,Nigeria Liquefied Natural Gas,Oil and gas,6315
2,3,MTN Nigeria,Telecommunications,3514
3,4,Dangote Cement,Cement,2699
4,5,Nigerian Petroleum Development,Oil and gas,2686


In [86]:
# Converting to .csv and store in .csv file
large_market_cap_data.to_csv('largest_companies_market_capitalization_2024.csv', index=False)