# Capturing All Characters from Fallout 4
This notebook collects every character listed for Fallout 4 so that a separate script can later scrape the in-game dialogue of any selected character. The earlier notebook, which scraped Cait's dialogue, used `BeautifulSoup`; this one uses `requests-html` plus a little list manipulation to produce a tidy pandas DataFrame.

In [1]:
from requests_html import HTMLSession

session = HTMLSession()

url = 'https://fallout.fandom.com/wiki/Fallout_4_characters'
r = session.get(url)

Get the main content container and split its text into one list entry per line.

In [2]:
# .text is already a string, so it can be split on newlines directly.
super_list = r.html.find('#mw-content-text > div')[0].text.split('\n')
super_list[150:175]

['Location',
 'Sole Survivor',
 'MQDadVoice.txt /MQMomVoice.txt',
 '00000007',
 '00000014',
 'Sanctuary Hills, Vault 111',
 'Base game',
 'Abernathy Farm',
 'Name',
 'Dialogue file',
 'Form ID',
 'Ref ID',
 'Location',
 'Blake Abernathy',
 'BlakeAbernathy.txt',
 '0006B4D3',
 '0006D3A2',
 'Abernathy farm',
 'Connie Abernathy',
 'ConnieAbernathy.txt',
 '0006B4D1',
 '0006D3A3',
 'Abernathy farm',
 'Lucy Abernathy',
 'LucyAbernathy.txt']

Notice that characters with (considerable) dialogue have an associated `.txt` file. A simple approach is therefore to check whether '.txt' appears in an element and, if so, take the element before it as the character's name.

In [10]:
names = []
for idx, entry in enumerate(super_list):
    # The character's name sits directly before its dialogue file entry.
    if '.txt' in entry:
        names.append(super_list[idx - 1])

In [11]:
names[:5]

['Sole Survivor',
 'Blake Abernathy',
 'Connie Abernathy',
 'Lucy Abernathy',
 'Chancer']

Similarly, each character's location appears to be the third entry *after* the element with the '.txt' extension. The location isn't strictly necessary, but it doesn't hurt to include it.

In [14]:
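locations = []
for idx, entry in enumerate(super_list):
    if '.txt' in entry:
        try:
            # The location is the third entry after the dialogue file entry.
            locations.append(super_list[idx + 3])
        except IndexError:
            # A '.txt' entry too close to the end of the list has no location.
            break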

In [15]:
locations[:5]

['Sanctuary Hills, Vault 111',
 'Abernathy farm',
 'Abernathy farm',
 'Abernathy farm',
 'Andrew station']
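
Because the names and locations are gathered in two separate loops, the lists only line up if every `.txt` entry has both a preceding name and a location three entries later. As a minimal sketch of a single-pass alternative, the snippet below collects both values at once and skips any match that lacks either neighbour. The `extract_pairs` helper and the `sample` list are illustrative names, not part of the original notebook; `sample` is hand-copied from the `super_list` output shown earlier.

```python
def extract_pairs(entries):
    """Return (name, location) tuples for every entry that contains '.txt'."""
    pairs = []
    for idx, entry in enumerate(entries):
        if '.txt' not in entry:
            continue
        # Skip matches without a preceding name or without a location three
        # entries later, so names and locations cannot fall out of step.
        if idx == 0 or idx + 3 >= len(entries):
            continue
        pairs.append((entries[idx - 1], entries[idx + 3]))
    return pairs


# Hypothetical sample, copied by hand from the super_list slice above.
sample = [
    'Name', 'Dialogue file', 'Form ID', 'Ref ID', 'Location',
    'Blake Abernathy', 'BlakeAbernathy.txt', '0006B4D3', '0006D3A2', 'Abernathy farm',
    'Connie Abernathy', 'ConnieAbernathy.txt', '0006B4D1', '0006D3A3', 'Abernathy farm',
    'Lucy Abernathy', 'LucyAbernathy.txt',
]

pairs = extract_pairs(sample)
print(pairs)
```

The final `LucyAbernathy.txt` entry is dropped here only because the sample is cut off before its location; on the full list it would be kept.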

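All that is left is to zip the lists together and put them into a DataFrame for exporting. Note that `zip` stops at the shorter of the two lists, so a name without a matching location would be dropped silently.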

In [17]:
import pandas as pd

df = pd.DataFrame(zip(names, locations), columns=['Character', 'Location'])
df.head()

,Character,Location
0,Sole Survivor,"Sanctuary Hills, Vault 111"
1,Blake Abernathy,Abernathy farm
2,Connie Abernathy,Abernathy farm
3,Lucy Abernathy,Abernathy farm
4,Chancer,Andrew station


In [18]:
df.to_csv('character_names.csv', index=False)
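
One detail worth checking in the exported file is that locations such as "Sanctuary Hills, Vault 111" contain a comma, so they have to be quoted to survive a round trip through CSV. The sketch below illustrates this with the standard-library `csv` module standing in for pandas (an assumption made to keep the example dependency-free); the `rows` sample is copied from the DataFrame output above and nothing is written to disk.

```python
import csv
import io

# Hypothetical sample rows taken from the DataFrame preview.
rows = [
    ('Sole Survivor', 'Sanctuary Hills, Vault 111'),
    ('Blake Abernathy', 'Abernathy farm'),
]

# Write to an in-memory buffer instead of a file.
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)  # default quoting is csv.QUOTE_MINIMAL
writer.writerow(['Character', 'Location'])
writer.writerows(rows)

# Read the same text back and confirm the comma stayed inside one field.
buffer.seek(0)
read_back = list(csv.DictReader(buffer))
print(read_back[0]['Location'])
```

With minimal quoting, only the field containing the delimiter is wrapped in double quotes, which matches the quoted first row in the preview above.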