In [2]:
import pandas as pd

We have a CSV that we collected by hand from [The Wiki Game](https://www.thewikigame.com/group) containing the winning paths from 30 games. These are the paths taken by real human players on the website. Let's read them in and convert them into a format that is easier for our program to process.

In [3]:
human_games_df = pd.read_csv("human_games.csv")

In [4]:
human_paths = list(human_games_df['Path'].values)

In [5]:
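# Each path is one string; split it on the arrow separator, which is
# surrounded by non-breaking spaces (\xa0) rather than ordinary spaces.
human_paths = [path.split("\xa0→\xa0") for path in human_paths]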
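
The separator in the `Path` column is an arrow surrounded by non-breaking spaces (`\xa0`), not ordinary spaces, which is easy to miss. Here is a small sketch of the split on an illustrative string built from article names in the first path. The helper name `split_path` is ours and is not part of the notebook.

```python
def split_path(raw_path: str) -> list[str]:
    """Split a Wiki Game path string on the non-breaking-space arrow separator."""
    separator = "\xa0→\xa0"
    return raw_path.split(separator)


# Illustrative string; the real values come from the 'Path' column.
example = "Romance languages\xa0→\xa0Africa\xa0→\xa0Asia"
print(split_path(example))

# With ordinary spaces around the arrow, nothing is split.
print(split_path("Romance languages → Africa"))
```

If a path ever comes back as a single-element list, the separator in the file probably uses ordinary spaces.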


In [6]:
human_paths[:4]

[['Romance languages',
  'Africa',
  'Asia',
  'Pacific Rim',
  'Australia',
  'Great Barrier Reef'],
 ['Natalie Portman',
  'United States nationality law',
  'American Samoa',
  'Samoan language',
  'Language family',
  'Proto-language',
  'Proto-Indo-European language'],
 ['IBM', 'Artificial intelligence', 'Facial recognition system', 'Human eye'],
 ['Academy Awards',
  'American Broadcasting Company',
  'Television network',
  'Telecommunications network']]

Great! The first item in each sublist is the source Wikipedia article, and the last item is the target article. Let's build a list of each using list comprehensions.

In [7]:
sources = [x[0] for x in human_paths]
targets = [x[-1] for x in human_paths]

print(f"First four sources: {sources[:4]}")
print(f"First four targets: {targets[:4]}")

First four sources: ['Romance languages', 'Natalie Portman', 'IBM', 'Academy Awards']
First four targets: ['Great Barrier Reef', 'Proto-Indo-European language', 'Human eye', 'Telecommunications network']


Now let's use the code we wrote in WikipediaSearch to see how the semantic and random players compare to the human players. We set a limit of 25 clicks for both players. The Wiki Game has a 120-second time limit, and we figured it would be very difficult for a human to click 25 links in that time while playing any real strategy.
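
To make the click limit concrete, here is a minimal sketch of a random player on a tiny in-memory link graph. It is not the `WikipediaSearch` implementation. The graph `toy_links`, the function `random_walk`, and the reading of `limit` as a maximum number of clicks are all assumptions made for illustration. As in the runs in this notebook, a failed search is reported as an empty list.

```python
import random

# Hypothetical link graph: article -> articles it links to.
toy_links = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B"],
}


def random_walk(source, target, limit, rng):
    """Follow random links from source.

    Returns the list of visited articles if target is reached within
    `limit` clicks, otherwise an empty list.
    """
    path = [source]
    for _ in range(limit):
        if path[-1] == target:
            return path
        links = toy_links.get(path[-1], [])
        if not links:
            return []  # dead end: no outgoing links
        path.append(rng.choice(links))
    return path if path[-1] == target else []


rng = random.Random(0)  # seeded so the walk is repeatable
result = random_walk("A", "D", limit=25, rng=rng)
print(result)
```

On real Wikipedia pages, which often have hundreds of outgoing links, a walk like this is far less likely to reach the target than on this four-node toy graph.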

In [10]:
from WikipediaSearch import WikipediaSearch

semantic_player = WikipediaSearch("semantic")
random_player = WikipediaSearch("random")

semantic_paths = []
random_paths = []
for i in range(len(sources)):
    print(f"Source: {sources[i]}, Target: {targets[i]}")

    semantic_path = semantic_player.search(sources[i], targets[i], limit=25)
    semantic_paths.append(semantic_path)

    random_path = random_player.search(sources[i], targets[i], limit=25)
    random_paths.append(random_path)
    
    print(f"Semantic path length: {len(semantic_path)}")
    print(f"Random path length: {len(random_path)}")
    print()



Source: Romance languages, Target: Great Barrier Reef
Semantic path length: 3
Random path length: 0

Source: Natalie Portman, Target: Proto-Indo-European language
Semantic path length: 0
Random path length: 0

Source: IBM, Target: Human eye
Semantic path length: 24
Random path length: 0

Source: Academy Awards, Target: Telecommunications network
Semantic path length: 12
Random path length: 0

Source: Folk music, Target: John Krasinski
Semantic path length: 0
Random path length: 0

Source: Kindergarten, Target: John Calvin
Semantic path length: 4
Random path length: 0

Source: Glucose, Target: Nintendo Entertainment System
Semantic path length: 5
Random path length: 0

Source: Planet, Target: BBC One
Semantic path length: 3
Random path length: 0

Source: Text messaging, Target: Resurrection of Jesus
Semantic path length: 8
Random path length: 0

Source: Charles Bronson, Target: Role-playing game
Semantic path length: 8
Random path length: 0

Source: Wavelength, Target: Kelly Clarkson
Se

In [11]:
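# Path length counts articles in the path (clicks + 1).
# An empty path means the search failed, so record None instead of 0.
human_path_lengths = [len(x) if len(x) else None for x in human_paths]
semantic_path_lengths = [len(x) if len(x) else None for x in semantic_paths]
random_path_lengths = [len(x) if len(x) else None for x in random_paths]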
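
Using `None` rather than `0` for failed searches matters once the lengths go into pandas: `None` becomes `NaN`, which pandas skips when averaging, whereas a `0` would drag the mean down. A small sketch using the semantic path lengths from the first four runs above:

```python
import pandas as pd

# Semantic path lengths from the first four runs; 0 means the search failed.
raw_lengths = [3, 0, 24, 12]

with_zero = pd.Series(raw_lengths)
with_none = pd.Series([n if n else None for n in raw_lengths])

print(with_zero.mean())  # failures counted as length 0
print(with_none.mean())  # failures stored as NaN and skipped
print(with_none.dtype)   # a numeric column holding NaN is stored as float
```

The float dtype is also why a length can show up as `3.0` rather than `3` when a column contains failed searches.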

Now let's put this all into a single dataframe so we can investigate:

In [12]:
df = pd.DataFrame(data={
    "source": sources,
    "target": targets,
    "human_path": human_paths,
    "semantic_path": semantic_paths,
    "random_path": random_paths,
    "human_path_length": human_path_lengths,
    "semantic_path_length": semantic_path_lengths,
    "random_path_length": random_path_lengths,
})


In [13]:
df

Unnamed: 0,source,target,human_path,semantic_path,random_path,human_path_length,semantic_path_length,random_path_length
0,Romance languages,Great Barrier Reef,"[Romance languages, Africa, Asia, Pacific Rim,...","[Romance languages, Australia, Great Barrier R...",[],6,3.0,
1,Natalie Portman,Proto-Indo-European language,"[Natalie Portman, United States nationality la...",[],[],7,,
2,IBM,Human eye,"[IBM, Artificial intelligence, Facial recognit...","[IBM, Human resources management, Human capita...",[],4,24.0,
3,Academy Awards,Telecommunications network,"[Academy Awards, American Broadcasting Company...","[Academy Awards, Streaming service provider, A...",[],4,12.0,
4,Folk music,John Krasinski,"[Folk music, American folk music, Music of the...",[],[],8,,
5,Kindergarten,John Calvin,"[Kindergarten, Education in the United States,...","[Kindergarten, Paul Monroe, John Dewey, John C...",[],5,4.0,
6,Glucose,Nintendo Entertainment System,"[Glucose, WHO Model List of Essential Medicine...","[Glucose, Reward system, Video game addiction,...",[],7,5.0,
7,Planet,BBC One,"[Planet, Galileo Galilei, Thomas Harriot, Engl...","[Planet, BBC News, BBC One]",[],7,3.0,
8,Text messaging,Resurrection of Jesus,"[Text messaging, Santa Claus, Easter Bunny, Ea...","[Text messaging, Esprit de corps, Esprit de co...",[],5,8.0,
9,Charles Bronson,Role-playing game,"[Charles Bronson, Vigilante, Duel, 1908 Summer...","[Charles Bronson, Combat!, Combat (disambiguat...",[],11,8.0,
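
As a first look at the results, one could compare success rates and path lengths across players. The sketch below rebuilds only the three length columns for the ten rows shown above (the full DataFrame has 30 games), so the numbers it prints describe that subset only. The variable names are ours.

```python
import pandas as pd

# Length columns copied from the ten rows displayed above.
shown_rows = pd.DataFrame({
    "human_path_length": [6, 7, 4, 4, 8, 5, 7, 7, 5, 11],
    "semantic_path_length": [3, None, 24, 12, None, 4, 5, 3, 8, 8],
    "random_path_length": [None] * 10,
})

# Fraction of games in which each player reached the target.
success_rate = shown_rows.notna().mean()
print(success_rate)

# Among games the semantic player solved, count how often its path
# was shorter than the human's.
solved = shown_rows.dropna(subset=["semantic_path_length"])
semantic_shorter = int(
    (solved["semantic_path_length"] < solved["human_path_length"]).sum()
)
print(f"Semantic path shorter in {semantic_shorter} of {len(solved)} solved games")
```

Note that the human paths come from winning games only, so the human success rate here is 100% by construction.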


And let's save the DataFrame to a CSV file for future analysis!

In [14]:
df.to_csv("competition_output.csv")
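
One caveat when loading this file later: `to_csv` writes the index as an unnamed first column, and the list-valued path columns are stored as their string representations. Below is a sketch of one way to read them back, assuming the lists contain only strings so that `ast.literal_eval` can parse them. An in-memory buffer stands in for `competition_output.csv`, and the one-row DataFrame is a cut-down stand-in for `df`.

```python
import ast
import io

import pandas as pd

# Cut-down stand-in for df, using one row from the results above.
original = pd.DataFrame({
    "source": ["Planet"],
    "semantic_path": [["Planet", "BBC News", "BBC One"]],
    "random_path": [[]],
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
original.to_csv(buffer)
buffer.seek(0)

restored = pd.read_csv(
    buffer,
    index_col=0,  # treat the unnamed first column as the index
    converters={
        "semantic_path": ast.literal_eval,  # "['a', 'b']" -> ['a', 'b']
        "random_path": ast.literal_eval,
    },
)
print(restored.loc[0, "semantic_path"])
```

Without `index_col=0`, the saved index comes back as an extra column named `Unnamed: 0`.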