# Importing Libraries

__[Selenium documentation](https://selenium-python.readthedocs.io/)__

In [67]:
import pandas as pd
import time

""" This browser automation library is very good for scraping dynamic websites. """
from selenium import webdriver

"""
    webdriver_manager.chrome: Selects the browser whose driver we want to manage; here, that is Chrome.
    
    ChromeDriverManager: Downloads and caches a ChromeDriver binary; Selenium uses that binary to drive the browser.
    Before the webdriver_manager package existed, people had to download the ChromeDriver binary manually and save it on their machine,
    and whenever Chrome was updated, they had to update the driver as well, which was very tiresome.
"""
from webdriver_manager.chrome import ChromeDriverManager

# Create Driver

In [68]:
""" 'driver' is an object and it can manipulate our browser; that is how selenium works. """
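from selenium.webdriver.chrome.service import Service  # Selenium 4 takes the driver path via a Service object.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))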


In [69]:
""" Going to the witcher wiki website. """
page_url = "https://witcher.fandom.com/wiki/Category:Characters_in_the_stories"

""" Asking the driver to direct us to this page URL."""
driver.get(page_url)

The page shows a pop-up asking us to accept the site's cookies.
While this pop-up is open, some elements on the page are hidden.
So we have to find a way to click the accept button.

In the __[Selenium documentation](https://selenium-python.readthedocs.io/)__, the section __[4. Locating Elements](https://selenium-python.readthedocs.io/locating-elements.html)__ shows a method that selects a button by its text:

`driver.find_element(By.XPATH, '//button[text()="Some text"]')`

Since our element is a `div` and not a `button`, we'll modify the XPath accordingly. <br><br>
Run the below code **only if** you see an "ACCEPT" button asking you to accept the user cookies.

In [70]:
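# Always run this import: every find_element(s) call below needs `By`.
from selenium.webdriver.common.by import By

# Uncomment the click only if the "ACCEPT" pop-up is showing.
# driver.find_element(By.XPATH, '//div[text()="ACCEPT"]').click()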
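
If the pop-up may or may not appear, one alternative to commenting the click in and out is to look the element up with `find_elements`, which returns an empty list when nothing matches instead of raising an error. The helper below is a minimal sketch of that idea; `click_if_present` is a hypothetical name, and the locator is the same `(By.XPATH, ...)` pair shown above, passed as a tuple.

```python
def click_if_present(driver, locator):
    """Click the first element matching `locator`, if there is one.

    `locator` is a (by, value) tuple such as
    (By.XPATH, '//div[text()="ACCEPT"]').
    Returns True if something was clicked, False otherwise.
    """
    # find_elements returns an empty list when nothing matches,
    # whereas find_element would raise NoSuchElementException.
    matches = driver.find_elements(*locator)
    if not matches:
        return False
    matches[0].click()
    return True


# Usage with the real driver (assumes `driver` and `By` from the cells above):
# click_if_present(driver, (By.XPATH, '//div[text()="ACCEPT"]'))
```

Because the helper only reports whether it clicked anything, the same cell can be re-run safely whether or not the banner is on screen.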

On this page, you can see that there are several books, and each book has its own set of characters.
For example, if we click on _Category:Baptism of Fire characters_, we get a list of all the characters that appear in that book. <br><br>
So our task now is to visit each of these links and extract the character list for each book.

# Find Books

If we inspect the page, each of the links has an _href_ attribute and a class attribute. In the __[Selenium documentation](https://selenium-python.readthedocs.io/)__, under __[4.6 Locating Elements by Class Name](https://selenium-python.readthedocs.io/locating-elements.html#locating-elements-by-class-name)__, there's the following method, which lets you find an element by class name:<br>
`.find_element(By.CLASS_NAME, 'content')` <br>
To find multiple elements by class name, just add an 's' to the above method:<br>
`.find_elements(By.CLASS_NAME, 'content')`

In [71]:
book_categories = driver.find_elements(By.CLASS_NAME, 'category-page__member-link')

In [72]:
book_categories

[<selenium.webdriver.remote.webelement.WebElement (session="d3422542d42a4cf55a54923fac2c6008", element="7065e3c3-fdb0-4a24-b22f-58da75cb6e11")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d3422542d42a4cf55a54923fac2c6008", element="05ada637-7f74-4bd6-89f0-6ea48f47052f")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d3422542d42a4cf55a54923fac2c6008", element="919156aa-011d-4eaa-b17c-7a554010227d")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d3422542d42a4cf55a54923fac2c6008", element="3085936b-251f-45c1-95b6-8073034fd6fd")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d3422542d42a4cf55a54923fac2c6008", element="345052f9-0caf-4f84-85cd-3dd4f0950892")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d3422542d42a4cf55a54923fac2c6008", element="0ce84ac5-00f7-454c-be65-d68a07873ff7")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d3422542d42a4cf55a54923fac2c6008", element="e5b94275-3f72-4dce-a477-b2

In [73]:
""" Seeing the text of 1st book category. """
book_categories[0].text

'Category:Baptism of Fire characters'

This is exactly the first book category on the webpage.

In [74]:
""" Getting the link of the 1st book category. """
book_categories[0].get_attribute('href')

'https://witcher.fandom.com/wiki/Category:Baptism_of_Fire_characters'

In [75]:
""" Now if we ask our driver to go into this URL, there's a list of all the characters here."""
driver.get(book_categories[0].get_attribute('href'))

We will now inspect the name elements and see how we can extract them from the webpage. The character links use the same class name as the book links above (`category-page__member-link`).<br> So, in the same way as above, we are going to extract the character names.

In [76]:
character_elems = driver.find_elements(By.CLASS_NAME, 'category-page__member-link')

In [77]:
""" 1st character's name. """
character_elems[0].text

'Adalia'

# Extracting All The Characters From All The Books

In [78]:
""" Create driver. """
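from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))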

""" Go to the characters in books page. """
page_url = "https://witcher.fandom.com/wiki/Category:Characters_in_the_stories"
driver.get(page_url)

""" Click on Accept cookies. """
time.sleep(3) # Wait for 3 seconds to let the page load fully.
# driver.find_element(By.XPATH, '//div[text()="ACCEPT"]').click()

""" Find books. """
book_categories = driver.find_elements(By.CLASS_NAME, 'category-page__member-link')

""" Creating a list of dictionaries of all the book names and their URLs. """
books = []
for category in book_categories:
    book_url = category.get_attribute('href')
    book_name = category.text
    books.append({'book_name': book_name, 'url': book_url})
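
A fixed `time.sleep(3)` either waits longer than necessary or not long enough on a slow connection. Selenium provides `WebDriverWait` for this purpose; the sketch below shows the underlying idea using only the standard library, polling a condition until it returns something truthy or a timeout expires. `wait_for` and its parameters are hypothetical names chosen for illustration.

```python
import time


def wait_for(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Usage with the real driver (assumes `driver` and `By` from the cells above):
# book_categories = wait_for(
#     lambda: driver.find_elements(By.CLASS_NAME, 'category-page__member-link')
# )
```

Since `find_elements` returns an empty list until the links exist, it works directly as the polled condition.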


In [79]:
books

[{'book_name': 'Category:Baptism of Fire characters',
  'url': 'https://witcher.fandom.com/wiki/Category:Baptism_of_Fire_characters'},
 {'book_name': 'Category:Blood of Elves characters',
  'url': 'https://witcher.fandom.com/wiki/Category:Blood_of_Elves_characters'},
 {'book_name': "Godamba Thaess'en",
  'url': 'https://witcher.fandom.com/wiki/Godamba_Thaess%27en'},
 {'book_name': 'Category:Season of Storms characters',
  'url': 'https://witcher.fandom.com/wiki/Category:Season_of_Storms_characters'},
 {'book_name': 'Category:Something Ends, Something Begins characters',
  'url': 'https://witcher.fandom.com/wiki/Category:Something_Ends,_Something_Begins_characters'},
 {'book_name': 'Category:Sword of Destiny characters',
  'url': 'https://witcher.fandom.com/wiki/Category:Sword_of_Destiny_characters'},
 {'book_name': 'Category:Szpony i kły characters',
  'url': 'https://witcher.fandom.com/wiki/Category:Szpony_i_k%C5%82y_characters'},
 {'book_name': 'Category:Tales from the world of The W
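
The output above shows that not every link on the page is a book category: `Godamba Thaess'en`, for instance, points to a regular article rather than a `Category:` page. If only the books are wanted, one optional step is to keep the category links and tidy their names. The sketch below assumes every book entry has the form `Category:<title> characters`, which is inferred from the entries shown; `clean_books` is a hypothetical helper name.

```python
def clean_books(books):
    """Keep only 'Category:<title> characters' entries and shorten their names."""
    prefix, suffix = "Category:", " characters"
    cleaned = []
    for book in books:
        name = book["book_name"]
        # Skip links that are not book categories (e.g. single articles).
        if not (name.startswith(prefix) and name.endswith(suffix)):
            continue
        title = name[len(prefix):-len(suffix)]
        cleaned.append({"book_name": title, "url": book["url"]})
    return cleaned


# Two entries copied from the output above.
sample = [
    {"book_name": "Category:Baptism of Fire characters",
     "url": "https://witcher.fandom.com/wiki/Category:Baptism_of_Fire_characters"},
    {"book_name": "Godamba Thaess'en",
     "url": "https://witcher.fandom.com/wiki/Godamba_Thaess%27en"},
]
print(clean_books(sample))
```

Shorter names such as `Baptism of Fire` also make the bar chart labels easier to read later on.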

In [None]:
character_list = []

for book in books:
    """ Go into the book page. """
    driver.get(book['url'])
    
    """ Extracting the character name elements. """
    character_elems = driver.find_elements(By.CLASS_NAME, 'category-page__member-link')
    for elem in character_elems:
        character_list.append({'book': book['book_name'], 'character': elem.text})

In [None]:
""" Creating a Pandas DataFrame. """
character_df = pd.DataFrame(character_list)
character_df

# Which book has the highest number of characters?

In [None]:
""" Installing matplotlib """
!pip3 install matplotlib

In [None]:
""" Checking the total number of values (value_counts) of all the books. """
import matplotlib.pyplot as plt
character_df['book'].value_counts().plot(kind="bar")
plt.show()
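
`value_counts()` counts how many rows share each value of the `book` column and sorts the result from most to least frequent. The same tally can be sketched without pandas using `collections.Counter`; the rows below are made-up placeholders in the shape of `character_list`, not scraped data.

```python
from collections import Counter

# Made-up rows in the same shape as `character_list` (placeholders, not real data).
rows = [
    {"book": "Book A", "character": "Character 1"},
    {"book": "Book A", "character": "Character 2"},
    {"book": "Book B", "character": "Character 3"},
]

# Count how many characters were recorded for each book.
counts = Counter(row["book"] for row in rows)

# most_common() sorts from the highest count down, like value_counts().
for book, n in counts.most_common():
    print(book, n)
```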

The Lady of the Lake and Time of Contempt have the highest numbers of characters of all the books!