# Six Degrees of Separation with Wikipedia Corpus

## About

You may have heard of the concept of "six degrees of separation," which suggests that any two people on the planet can be connected through a chain of acquaintances with no more than six intermediaries. This idea is based on the observation that, by examining a person's connections to friends and family, and then extending that examination to their friends’ and family’s connections, one can often find a link between two individuals within six levels of separation.

To explore the validity of the "six degrees of separation" theory, I have chosen to investigate the connections between the two richest individuals in the world as of June 13, 2024. According to [Forbes](https://www.forbes.com/billionaires/), the wealthiest person is **Bernard Arnault**, followed by **Elon Musk**.

The ultimate goal is to use Natural Language Processing (NLP) techniques to determine whether there is a connection between *Bernard Arnault* and *Elon Musk* within the first six degrees of separation.

## Implementation Steps

1. Load the Wikipedia page of *Bernard Arnault*.
2. Use named entity recognition to locate any names on *Bernard Arnault*'s Wikipedia page.
3. Repeat steps 1 and 2 for the Wikipedia page of each name found in step 2.
4. Continue this process six levels deep to build a graph of people connected to *Bernard Arnault*'s Wikipedia page.
5. Along the way, check whether *Elon Musk*'s name appears in the graph.
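
The steps above amount to a breadth-first search with a depth limit. The sketch below illustrates the idea on a small in-memory graph rather than on live Wikipedia pages. The function name `degrees_between` and the toy graph are hypothetical and are not part of the notebook.

```python
from collections import deque


def degrees_between(graph, source, target, max_depth=6):
    """Return the number of hops from source to target, or None if the
    target is not reached within max_depth hops."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in graph.get(name, []):
            if neighbour == target:
                return depth + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return None


# Hypothetical toy graph: each key maps to the names found on that person's page.
toy_graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": [],
    "D": ["E"],
}

print(degrees_between(toy_graph, "A", "E"))  # 3
```

In the notebook, the neighbours of a name would come from the named entities found on that name's Wikipedia page rather than from a prebuilt dictionary.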

## Executing the steps

**Loading the required libraries**

In [1]:
import requests #for downloading wiki pages
from bs4 import BeautifulSoup #web scraping
import spacy #for recognizing named entities
import wikipediaapi #wikipedia api capabilities
import time

**1. Loading the Wikipedia page of Bernard Arnault**

When loading the page, I extract only its text content.

In [2]:
bernard = requests.get('https://en.wikipedia.org/wiki/Bernard_Arnault') #downloading wikipedia page
bernard_soup = BeautifulSoup(bernard.content, 'html.parser')
bernard_text = bernard_soup.get_text(strip = True) #text without tags

Preview of the text extracted from Bernard Arnault's Wikipedia page:

In [3]:
bernard_text[0:1000]

"Bernard Arnault - WikipediaJump to contentMain menuMain menumove to sidebarhideNavigationMain pageContentsCurrent eventsRandom articleAbout WikipediaContact usDonateContributeHelpLearn to editCommunity portalRecent changesUpload fileSearchSearchAppearanceCreate accountLog inPersonal toolsCreate accountLog inPages for logged out editorslearn moreContributionsTalkContentsmove to sidebarhide(Top)1Early life2CareerToggle Career subsection2.11971–1987: Professional start2.21987–1989: Co-founding and acquisition of LVMH2.31989–2001: LVMH's Initial expansion and growth2.42001–present: Increasing success and profitability3Personal lifeToggle Personal life subsection3.1Family3.2Wealth3.3Request for Belgian citizenship3.4Transport3.5Lifestyle4Art collection5Political views6Awards and honors7See also8References9External linksToggle the table of contentsBernard Arnault54 languagesالعربيةAsturianuAzərbaycanca閩南語 / Bân-lâm-gúБеларускаяBikol CentralБългарскиCatalàČeštinaDanskDeutschEestiΕλληνικάEspa

**2. Using named entity recognition to locate any names on *Bernard Arnault*'s Wikipedia page.**

In [4]:
nlp = spacy.load("en_core_web_sm") #loading the language model
bernard_doc = nlp(bernard_text)
bernard_ents = bernard_doc.ents

In [5]:
level_1 = [entity.text for entity in bernard_ents if entity.label_ == 'PERSON']
for name in level_1:
    print(name, end =', ')

Bernard Arnault - WikipediaJump, CommonsAppearancemove, Jean Étienne Arnault(1949-03, collector.[2][3]He, Boussac Saint-Frères, Louis Vuitton, Moët Hennessy, Moët Hennessy, Jean Étienne Arnault, Marie-Josèphe Savinel, Étienne Savinel, Arnault, Ferret-Savinel, Antoine Bernheim, Boussac, Bon Marché, Boussac, Louis VuittonHe, Alain Chevalier, Moët Hennessy, andHenry Racamier, ofLouis Vuitton, Arnault, Tribune.[16]La Tribune, JacobsandSephorain, Thomas Pinkin, Samaritainein, FordandDomenico De Sole, De Sole, De Sole, Cheval Blancin, d'Yquem, Libertysurf, jewelerBulgari, Arnault, Marco De Vincenzo, Tiffany, losses".[45]In, Louis Vuitton, Dior, Anne Dewavrin, Hélène Mercier, Alexandre, Antoine, Alexandre, Stephanie Watine, Jean, Delphine, Niel, Delphine, Bezosto, Arnaut, Jeff Bezos, Yves Klein, Henry Moore, andAndy Warhol.[80][81]He, Jean, Knight, David Rockefeller Award, June2024.^"Bernard Arnault, Bernard Arnault, Founder, Van-Der, La Voix, Nord, Moët, Vanessa, Elizabeth, Retrieved18 May20

*Inspecting the named entities above shows that some of the text does not represent people. Such entries must be removed so that only the names of people who have Wikipedia pages are kept for further analysis.*
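
Before querying Wikipedia, the noisy entity strings could also be pre-filtered with a simple pattern. The sketch below is one possible heuristic, not the approach used in this notebook: it strips footnote markers such as `[16]` and keeps only strings made of two or more alphabetic words. The helper `clean_names` and both regular expressions are assumptions introduced for illustration.

```python
import re

# Footnote markers such as [2] or [16] that end up attached to names.
CITATION = re.compile(r"\[\d+\]")
# Two or more alphabetic words joined by a space, apostrophe or hyphen.
NAME = re.compile(r"^[^\W\d_]+(?:[ '\-][^\W\d_]+)+$")


def clean_names(candidates):
    """Return the set of candidate strings that look like multi-word names."""
    cleaned = set()
    for raw in candidates:
        text = CITATION.sub(" ", raw).strip()
        if NAME.match(text):
            cleaned.add(text)
    return cleaned


# A few strings taken from the entity list printed above.
sample = [
    "Jeff Bezos",
    "collector.[2][3]He",
    "Jean Étienne Arnault(1949-03",
    "Yves Klein",
    "Retrieved18 May20",
    "Founder",
    "Marie-Josèphe Savinel",
]

print(sorted(clean_names(sample)))
```

This heuristic has clear limits. It drops single-word names such as `Delphine`, it still accepts company names such as `Louis Vuitton`, and it does not repair words that were fused during extraction (for example `andHenry Racamier`). The fused words arise because `get_text(strip=True)` joins text nodes without a separator; passing `separator=' '` to `get_text` is one way to reduce them.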

**Maintaining persons with wikipedia pages**

In [6]:
#specifying user_agent to identify my project
my_wiki = wikipediaapi.Wikipedia('Six Degrees of Separation with Wikipedia Corpus', 'en')

#Maintaining persons with wikipedia pages
level_1 = {person for person in set(level_1) if person != 'Bernard Arnault' and my_wiki.page(person).exists()}
level_1 = {'Bernard Arnault': list(level_1)}

The refined list of people's names on Bernard Arnault's Wikipedia page is shown below.

In [7]:
for name in level_1['Bernard Arnault']:
    print(name, end =', ')

Elizabeth, Twitter, Louis Vuitton, Bloomberg, Tiffany, dmy, Niel, Suzy, Arnault, Robert, Henry Moore, Francesca, La Voix, Boussac, Matthew, Gail, Erik, Alain Chevalier, Delphine, Bon Marché, Nord, Moët, Jacques, Roberta, Katie, Miles, Dior, Bezos, Jean, Jeff Bezos, Stella, Moët Hennessy, Knight, Ben, Steven, Arnaut, Luisa, Lisa, Donna Karan International, Binnie, Stewart, Vanessa, Andrew, Gwladys, Tait, Marc, Antoine, Founder, Simon, Isla, Bernard, Yves Klein, Alexandre, 

**3. Repeating steps 1 and 2 above for the Wikipedia pages of each name found in step 2.**

In [13]:
level_2 = {}
for owner in level_1.keys():
    for person in level_1[owner]:
        if my_wiki.page(person).exists():
            wiki_page = my_wiki.page(person)
            text = wiki_page.text
            person_doc = nlp(text)
            person_ents = person_doc.ents
            below = [entity.text for entity in person_ents if entity.label_ == 'PERSON']
            below = set([x for x in below if not (x == person)])
            below = list(below)
            level_2.update({person:below})
        else:
            continue

In [16]:
level_2.keys()

dict_keys(['Elizabeth', 'Twitter', 'Louis Vuitton', 'Bloomberg', 'Tiffany', 'dmy', 'Niel', 'Suzy', 'Arnault', 'Robert', 'Henry Moore', 'Francesca', 'La Voix', 'Boussac', 'Matthew', 'Gail', 'Erik', 'Alain Chevalier', 'Delphine', 'Bon Marché', 'Nord', 'Moët', 'Jacques', 'Roberta', 'Katie', 'Miles', 'Dior', 'Bezos', 'Jean', 'Jeff Bezos', 'Stella', 'Moët Hennessy', 'Knight', 'Ben', 'Steven', 'Arnaut', 'Luisa', 'Lisa', 'Donna Karan International', 'Binnie', 'Stewart', 'Vanessa', 'Andrew', 'Gwladys', 'Tait', 'Marc', 'Antoine', 'Founder', 'Simon', 'Isla', 'Bernard', 'Yves Klein', 'Alexandre'])

The `level_2` dictionary above maps each name shown to the list of `PERSON` named entities found on that name's Wikipedia page.

**4. Continue this process six levels deep to build a graph of people connected to *Bernard Arnault*'s Wikipedia page.**

The code below computes the remaining depth levels automatically; however, it takes a long time to run. I therefore decided to compute the remaining levels one at a time.

In [None]:
level_3 = {}; level_4 ={};  level_5={}; level_6 ={}
remainder = [level_2, level_3, level_4, level_5, level_6]
for i in range(1,5):
    for owner in remainder[i-1].keys():
        for person in remainder[i-1][owner]:
            if my_wiki.page(person).exists():
                time.sleep(2)
                wiki_page = my_wiki.page(person)
                text = wiki_page.text
                person_doc = nlp(text)
                person_ents = person_doc.ents 
                below = [entity.text for entity in person_ents if entity.label_ == 'PERSON'] 
                below = set([x for x in below if not (x == person)]) 
                below = list(below) 
                remainder[i].update({person:below})
            else: 
                continue
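
Because every level repeats the same work, the per-level logic could be factored into one helper that also remembers which names have already been processed, so that no page is fetched twice. The sketch below is an assumption-laden illustration: `expand_level` and `fetch_people` are hypothetical names, and `fetch_people` stands in for the notebook's page lookup plus entity extraction (it should return a list of names, or `None` when no page exists). A plain dictionary plays that role here so the example runs offline.

```python
def expand_level(previous_level, fetch_people, seen):
    """Build the next level: map each not-yet-visited name to the people
    found on its page. `seen` is updated in place."""
    next_level = {}
    for people in previous_level.values():
        for person in people:
            if person in seen:
                continue  # already processed at this or an earlier level
            seen.add(person)
            found = fetch_people(person)
            if found is None:
                continue  # no page exists for this name
            # Drop duplicates and the page owner's own name.
            next_level[person] = sorted(set(found) - {person})
    return next_level


# Hypothetical stand-in for Wikipedia: page owner -> names found on the page.
fake_pages = {
    "B": ["A", "C", "B"],
    "C": ["D"],
}

demo_level_1 = {"A": ["B", "C", "Missing"]}
demo_seen = {"A"}

demo_level_2 = expand_level(demo_level_1, fake_pages.get, demo_seen)
print(demo_level_2)  # {'B': ['A', 'C'], 'C': ['D']}

demo_level_3 = expand_level(demo_level_2, fake_pages.get, demo_seen)
print(demo_level_3)  # {}
```

With a helper of this shape, each deeper level is a single call on the previous one, and the shared `seen` set keeps the number of page requests from growing with repeated names.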

*Depth level 3 computation*

In [30]:
level_3 = {}
for owner in level_2.keys():
    for person in level_2[owner]:
        try:
            if my_wiki.page(person).exists(): 
                wiki_page = my_wiki.page(person) 
                text = wiki_page.text 
                person_doc = nlp(text) 
                person_ents = person_doc.ents 
                below = [entity.text for entity in person_ents if entity.label_ == 'PERSON']
                below = set([x for x in below if not (x == person)])
                below = list(below)
                level_3.update({person:below})
        except KeyError:
            continue

In [33]:
level_3.keys()

dict_keys(['Queen Elizabeth', 'Elisabeth Andreassen', 'Elizabeth II', 'Elizabeth Reef', 'Jelisaveta', 'Elizabeth Stakes', 'Elizaveta', 'Elisabeth', 'Killah Priest', 'Elizabeth Cup', 'Lisa', 'Michael Kunze', 'Zach Bryan', 'Sylvester Levay', 'Florian Weber', 'Musk', 'Peiter Zatko', 'Geng Shuang', 'Simon Oxley', 'Venmo', 'Quartz', 'Erdoğan', 'Twitter Lite', 'Matthew Auer', 'Caitlyn Jenner', 'Zatko', 'Moldovan', 'Yusaku Maezawa', 'Multimedia', 'Dick Costolo', 'Dyson', 'Steven Levy', 'Raffi Krikorian', 'Strike', 'Mike Carr', 'Sigmund Freud', 'Ellen', 'Glass', 'Caroline Criado-Perez', 'Tufekci', 'Jonathan Zittrain', 'John', 'Git', 'Larry Bird', 'AlDub', 'Carter Wilkerson', 'Bill Gates', 'Fitton', 'Tao Lin', 'Randi Harper', 'Adolf Hitler', 'John C. Dvorak', 'Birdwatch', 'Pope Francis', 'Williams', 'Twitter Blue', 'Rodrigo Duterte', 'Harry Potter', 'Michael Jackson', 'Mac', 'Oscar', 'Ben Silverman', 'Tweetie', 'Larry', 'Stella Creasy', 'Gruen', 'Leslie', 'Lady Gaga', 'Jamal Khashoggi', 'Edward

*Depth level 4 computation*

In [None]:
level_4 = {}
for owner in level_3.keys():
    for person in level_3[owner]:
        try:
            if my_wiki.page(person).exists(): 
                wiki_page = my_wiki.page(person) 
                text = wiki_page.text 
                person_doc = nlp(text) 
                person_ents = person_doc.ents 
                below = [entity.text for entity in person_ents if entity.label_ == 'PERSON']
                below = set([x for x in below if not (x == person)])
                below = list(below)
                level_4.update({person:below})
        except KeyError:
            continue

*Depth level 5 computation*

In [None]:
level_5 = {}
for owner in level_4.keys():
    for person in level_4[owner]:
        try:
            if my_wiki.page(person).exists(): 
                wiki_page = my_wiki.page(person) 
                text = wiki_page.text 
                person_doc = nlp(text) 
                person_ents = person_doc.ents 
                below = [entity.text for entity in person_ents if entity.label_ == 'PERSON']
                below = set([x for x in below if not (x == person)])
                below = list(below)
                level_5.update({person:below})
        except KeyError:
            continue

*Depth level 6 computation*

In [None]:
level_6 = {}
for owner in level_5.keys():
    for person in level_5[owner]:
        try:
            if my_wiki.page(person).exists(): 
                wiki_page = my_wiki.page(person) 
                text = wiki_page.text 
                person_doc = nlp(text) 
                person_ents = person_doc.ents 
                below = [entity.text for entity in person_ents if entity.label_ == 'PERSON']
                below = set([x for x in below if not (x == person)])
                below = list(below)
                level_6.update({person:below})
        except KeyError:
            continue
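
Step 5 of the plan, checking whether the target name appears and through whom, is not shown in the cells above. A minimal sketch of one way to do it follows, assuming the level dictionaries keep the shape used in this notebook (page owner mapped to the list of names found on that page). The function `find_chain` and the demo data are hypothetical.

```python
def find_chain(levels, source, target):
    """Return the chain of names linking source to target, or None.

    `levels` is a list of dictionaries ordered from the shallowest level to
    the deepest, each mapping a page owner to the names found on that page.
    """
    if source == target:
        return [source]
    parent = {}
    for level in levels:
        for owner, people in level.items():
            for person in people:
                if person != source:
                    # Keep the earliest link so the chain is the shallowest one.
                    parent.setdefault(person, owner)
    if target not in parent:
        return None
    chain = [target]
    while chain[-1] != source:
        previous = parent.get(chain[-1])
        if previous is None:
            return None  # the levels do not lead back to the source
        chain.append(previous)
    chain.reverse()
    return chain


# Hypothetical data with the same shape as level_1, level_2, level_3.
demo_levels = [
    {"A": ["B", "C"]},
    {"B": ["D"], "C": ["A"]},
    {"D": ["E", "B"]},
]

print(find_chain(demo_levels, "A", "E"))  # ['A', 'B', 'D', 'E']
```

The number of degrees is the length of the chain minus one. The comparison is an exact string match, so a partial entity such as `Musk`, which appears among the `level_3` keys above, would not match `Elon Musk` without some name normalisation.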