### Importing necessary libraries

In [1]:
import certifi
import urllib3
import pandas as pd
import unicodedata
import re
from bs4 import BeautifulSoup

### Retrieving the pages, extracting their content and saving it into a dictionary

In [2]:
# Specify the Wikipedia URLs to scrape
urls = ["https://en.wikipedia.org/wiki/Carfax_(company)", "https://en.wikipedia.org/wiki/Apple_Inc.", 
        "https://en.wikipedia.org/wiki/Thomson_Reuters", "https://en.wikipedia.org/wiki/IHS_Markit"]

In [3]:
# Create a connection pool that verifies SSL certificates against certifi's CA bundle;
# wiki_scraper below uses it to send one GET request per URL
http = urllib3.PoolManager(cert_reqs="CERT_REQUIRED", ca_certs=certifi.where())

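def wiki_scraper(urls):
    """Scrape each Wikipedia URL and return a dict keyed by article title."""
    topic_info_dict = {}
    for url in urls:
        response = http.request('GET', url)
        soup = BeautifulSoup(response.data, "lxml")
        # The article title is the page's first <h1> element
        topic_name = soup.find('h1').text
        # Every hyperlink whose href contains "http" (absolute http and https URLs)
        links_list = [link.get("href") for link in soup.find_all("a", href=lambda href: href and "http" in href)]
        # Section headlines, which the page wraps in <span class="mw-headline">
        h2h3_list = [span.text for span in soup.find_all("span", attrs={'class': 'mw-headline'})]
        # Non-empty paragraphs of body text
        p_list = [p.text for p in soup.find_all('p') if p.text != '']
        for index, paragraph in enumerate(p_list):
            # Remove numeric footnote markers such as "[12]"
            paragraph = re.sub(r'\[(\d+)\]', '', paragraph)
            # Remove any other bracketed annotation, e.g. "[citation needed]"
            paragraph = re.sub(r'\[.*?\]', '', paragraph)
            # Remove zero-width no-break spaces (U+FEFF)
            paragraph = re.sub(r'\ufeff', '', paragraph)
            # NFKD replaces compatibility characters such as no-break spaces with plain spaces
            paragraph = unicodedata.normalize('NFKD', paragraph)
            p_list[index] = paragraph
        topic_info_dict[topic_name] = {'topic_name': topic_name, 'associated_links': links_list,
                                       'headlines': h2h3_list, 'informations': p_list}

    return topic_info_dict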
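The paragraph clean-up inside `wiki_scraper` can be tried offline, without fetching a page. The sketch below pulls the same four steps into a hypothetical helper, `clean_paragraph`, and runs it on a made-up sample string; neither name comes from the original notebook.

```python
import re
import unicodedata


def clean_paragraph(paragraph: str) -> str:
    """Hypothetical helper: the same clean-up steps wiki_scraper applies to each paragraph."""
    # Remove numeric footnote markers such as "[12]"
    paragraph = re.sub(r'\[(\d+)\]', '', paragraph)
    # Remove any other bracketed annotation, e.g. "[citation needed]"
    paragraph = re.sub(r'\[.*?\]', '', paragraph)
    # Remove zero-width no-break spaces (U+FEFF)
    paragraph = paragraph.replace('\ufeff', '')
    # NFKD turns compatibility characters such as U+00A0 (no-break space) into plain spaces
    return unicodedata.normalize('NFKD', paragraph)


# Made-up sample containing a no-break space, two bracketed markers and a U+FEFF
sample = "Carfax\u00a0was founded in 1984.[1][citation needed]\ufeff"
print(repr(clean_paragraph(sample)))
```

One side effect worth knowing: NFKD also splits accented letters into a base letter plus a combining mark (`"é"` becomes `"e\u0301"`). If composed characters are preferred, `unicodedata.normalize('NFKC', ...)` still replaces no-break spaces with plain spaces but recombines the accents.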

In [4]:
dict1 = wiki_scraper(urls)
dict1.keys()

dict_keys(['Carfax (company)', 'Apple Inc.', 'Thomson Reuters', 'IHS Markit'])

In [5]:
dict1['Carfax (company)'].keys()

dict_keys(['topic_name', 'associated_links', 'headlines', 'informations'])
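Every topic in the dictionary shares the same four keys, so a quick per-article summary only needs their lengths. The sketch below assumes that structure; `summarize` and the `toy` input are hypothetical and only mimic the shape of `wiki_scraper`'s output.

```python
def summarize(topic_info_dict):
    """Hypothetical helper: (title, #links, #headlines, #paragraphs) for each topic."""
    rows = []
    for title, info in topic_info_dict.items():
        rows.append((title,
                     len(info['associated_links']),
                     len(info['headlines']),
                     len(info['informations'])))
    return rows


# Hypothetical toy input with the same shape as wiki_scraper's output
toy = {
    'Example': {'topic_name': 'Example',
                'associated_links': ['https://example.com/', 'http://example.org/'],
                'headlines': ['History'],
                'informations': ['First paragraph.', 'Second paragraph.', 'Third.']},
}
for row in summarize(toy):
    print(row)  # ('Example', 2, 1, 3)
```

Since pandas is already imported, the same rows could be turned into a table with `pd.DataFrame(rows, columns=[...])` if that is more convenient.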

### Displaying paragraphs

In [6]:
for paragraph in dict1['Carfax (company)']['informations']:
    print(paragraph + '\n\n')

Carfax, Inc. is a commercial web-based service that supplies vehicle history reports to individuals and businesses on used cars and light trucks for the American and Canadian consumers.


In 1984 Carfax was founded in Columbia, Missouri, by a computer professional named Ewin Barnett III working together with Robert Daniel Clark, an accountant from Huntingdon, Pennsylvania. The company is now headquartered in Centreville, Virginia, with a data center operation in Columbia, Missouri. Barnett was initially trying to combat odometer fraud. By working closely with the Missouri Automobile Dealers Association, in 1986 he offered the early version Carfax vehicle history report to the dealer market. These reports were developed with a database of just 10,000 records and were distributed via fax machine. By the end of 1993, Carfax obtained title information from nearly all fifty states. In December 1996, the company's website was launched to offer consumers the same vehicle history reports alrea

### Displaying hyperlinks

In [7]:
for hyperlink in dict1['Carfax (company)']['associated_links']:
    print(hyperlink + '\n')

http://www.carfax.com/

http://www.westcarsettlement.com/

http://www.carfax.com/press/releases/fairfax-county-eda-presents-carfax-with-virginia-grant-for-headquarters-expansion

http://www.consumeraffairs.com/news04/2006/10/carfax_history.html

http://www.tcautos.com/carfax.html

https://web.archive.org/web/20081227054147/http://www.tcautos.com/carfax.html

http://press.ihs.com/press-release/corporate-financial/ihs-completes-three-strategic-acquisitions

https://web.archive.org/web/20151004161432/http://news.carfax.com/2013-06-20-Carfax-Unveils-New-Car-Care-Mobile-App

http://news.carfax.com/2013-06-20-Carfax-Unveils-New-Car-Care-Mobile-App

https://www.cnbc.com/2016/03/21/ihs-and-markit-to-merge-in-deal-valued-at-more-than-13b.html

https://web.archive.org/web/20151004161432/http://news.carfax.com/2013-06-20-Carfax-Unveils-New-Car-Care-Mobile-App

http://news.carfax.com/2013-06-20-Carfax-Unveils-New-Car-Care-Mobile-App

http://legacygt.com/forums/showthread.php?t=67083

http://www.ad
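The list above contains repeats (the archived and live "Car-Care-Mobile-App" links each appear twice), and the `"http" in href` test would also accept a relative link that merely contains the substring "http". If a unique list of absolute links is wanted, one option is to check the parsed URL scheme and drop repeats while keeping first-seen order. The helper name and the relative sample link below are hypothetical.

```python
from urllib.parse import urlparse


def unique_external_links(hrefs):
    """Hypothetical helper: keep only http(s) URLs and drop repeats, preserving order."""
    seen = set()
    result = []
    for href in hrefs:
        # Check the URL scheme instead of searching for the substring "http"
        if urlparse(href).scheme not in ('http', 'https'):
            continue
        if href not in seen:
            seen.add(href)
            result.append(href)
    return result


links = [
    "http://www.carfax.com/",
    "http://news.carfax.com/2013-06-20-Carfax-Unveils-New-Car-Care-Mobile-App",
    "http://news.carfax.com/2013-06-20-Carfax-Unveils-New-Car-Care-Mobile-App",
    "/w/index.php?search=http",  # hypothetical relative link containing "http"
]
print(unique_external_links(links))
```

When only the duplicates matter, `list(dict.fromkeys(hrefs))` removes them in one line and also preserves order.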