# Unite For Literacy

In this notebook, the Selenium Python library is used to scrape the text from the Unite for Literacy website, and the googletrans and gTTS libraries are used for translation and text to speech.

### 1. Installation and Initial Steps

Before you can use the _Selenium_, _googletrans_, and _gTTS_ packages, you must install them. This can be done using the pip install command.

In [47]:
#Installs the Selenium package
!pip install selenium

#Installs the googletrans package
!pip install googletrans==3.1.0a0

#Installs the gTTS package used for text to speech at the end of the notebook
!pip install gTTS



In [158]:
#Import the necessary packages
from selenium import webdriver
from selenium.webdriver.common.by import By
import ipywidgets as widgets
import googletrans
from googletrans import Translator
import time

In [159]:
#Add any URLs from the Unite for Literacy website to the dictionary that includes book titles and book URLs.
bookURLs = {
    "If I Had a Puppy": "https://www.uniteforliteracy.com/featured/new/book?BookId=1305",
    "Going to School": "https://www.uniteforliteracy.com/featured/new/book?BookId=2007",
    "Let's Make Salsa": "https://www.uniteforliteracy.com/featured/new/book?BookId=1639",
    "Roly Poly": "https://www.uniteforliteracy.com/featured/new/book?BookId=242",
    "The Desert": "https://www.uniteforliteracy.com/featured/new/book?BookId=212"
}
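
When adding entries, it can help to check that a URL follows the same pattern as the ones above, with the book identified by a `BookId` query parameter. The following is a minimal sketch using only the standard library; the helper name `getBookId` is hypothetical and not part of the notebook.

```python
from urllib.parse import urlparse, parse_qs

def getBookId(url):
    """Return the BookId query parameter of a book URL as an int, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("BookId")
    return int(values[0]) if values else None

# URL copied from the dictionary above
sampleUrl = "https://www.uniteforliteracy.com/featured/new/book?BookId=1305"
bookId = getBookId(sampleUrl)
print(bookId)
```

A URL for which the helper returns `None` probably does not point at a single book and is worth double-checking before it goes into the dictionary.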

In [160]:
#Select the book using the dropdown menu
bookSelection = widgets.Dropdown(
    #The options come from the dictionary so that any added book shows up in the menu
    options=list(bookURLs.keys()),
    value="If I Had a Puppy",
    description='Book:',
    disabled=False,
)
bookSelection

Dropdown(description='Book:', options=('If I Had a Puppy', 'Going to School', "Let's Make Salsa", 'Roly Poly',…

In [161]:
#Select the language using the dropdown menu
languageSelection = widgets.Dropdown(
    options=list(googletrans.LANGUAGES.values()),
    value="english",
    description="Language:",
    disabled=False,
)
languageSelection

Dropdown(description='Language:', index=21, options=('afrikaans', 'albanian', 'amharic', 'arabic', 'armenian',…

In [162]:
#Initialize the translator object and set the variable transLang as the language selected with the dropdown
transLang = languageSelection.label
translator = Translator()
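
The dropdown returns a full language name such as "english", while the text to speech step later in the notebook needs the matching short code such as "en". The sketch below shows one way to do that lookup by inverting a code-to-name dictionary. `LANGUAGES_EXCERPT` is a small stand-in with the same shape as `googletrans.LANGUAGES`, and the names `NAME_TO_CODE` and `languageCode` are hypothetical.

```python
# Small excerpt in the same shape as googletrans.LANGUAGES (code -> name).
# This stand-in is an assumption for illustration; the notebook uses the full dictionary.
LANGUAGES_EXCERPT = {
    "af": "afrikaans",
    "en": "english",
    "fr": "french",
    "es": "spanish",
}

# Invert the mapping once so that names can be looked up directly
NAME_TO_CODE = {name: code for code, name in LANGUAGES_EXCERPT.items()}

def languageCode(name):
    """Return the language code for a language name, ignoring letter case."""
    return NAME_TO_CODE[name.lower()]

print(languageCode("spanish"))
```

Building the inverted dictionary once avoids searching the list of names every time a code is needed.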

### 2. Selenium Webdriver

The Selenium package opens a browser through a webdriver that can be controlled from Python. A real browser session is used here because conventional web-scraping methods cannot get past the security on Unite For Literacy's website. The text elements on each page are then accessed by their XPath.

Download the chromedriver from this link: https://chromedriver.chromium.org/downloads (choose the version that matches your installed Chrome browser; other webdrivers are not guaranteed to work for this program)

In [163]:
#Accesses the dictionary to get the URL based on the title of the book that is selected
url = bookURLs[bookSelection.label]

#Creates a driver object
#The driver is headless so that no browser window is opened unnecessarily
from selenium.webdriver.chrome.service import Service
option = webdriver.ChromeOptions()
option.add_argument('--headless')
#Replace the path below with the chromedriver path on your computer
#Selenium 4 takes the path through a Service object; passing it positionally is deprecated
driver = webdriver.Chrome(service=Service("/Users/srideepdornala/Downloads/chromedriver"), options=option)

#This line opens the URL
driver.get(url)



In [164]:
#Prints the title of the book
print("Title: " + bookSelection.label)

Title: If I Had a Puppy


In [165]:
#Function that turns a page in the book on the webdriver
def turnPage():
    driver.find_element(By.CLASS_NAME, "PageEdgeRight").click()

#Function that turns a page back in the book
def turnBackPage():
    driver.find_element(By.CLASS_NAME, "PageEdgeLeft").click()

#Creates a story variable that collects the translated text of every page. Used for the text to speech step at the end
story = ""

#Shared helper that accesses the text element at the given position under book-zoom using its XPath,
#translates it to the desired language, and appends it to story
#A space is added after each page so that sentences do not run together in the audio
def translatePage(divIndex):
    global story
    xpath = f'//*[@id="book-zoom"]/div/div[{divIndex}]/div[1]/div/div/div'
    pageText = driver.find_element(By.XPATH, xpath).text
    text = translator.translate(pageText, src='en', dest=transLang).text
    story = story + text + " "
    return text

#First run: firstPage is for the first page, getText is for the middle pages, and lastPage is for the last page
#Later runs: getText2 and lastPage2 are used because the XPaths change after the first run
def getText():
    return translatePage(7)
def getText2():
    return translatePage(8)
def firstPage():
    return translatePage(5)
def lastPage():
    return translatePage(8)
def lastPage2():
    return translatePage(4)

When using the webdriver to scrape the text, each of the functions has to be run after waiting a set amount of time. The _turnPage_ function returns immediately in the program, but the page turn takes longer to finish in the webdriver. If the text is read before the page turn has finished, you will get the text from the previous page. To work around this, we use the sleep function from the Python time library.
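
A fixed delay can be too short on a slow connection and is longer than needed on a fast one. One alternative is to poll until the value being read actually changes. The sketch below shows that idea with the standard library only. The helper `waitForChange` and its parameters are hypothetical, and the page is simulated with a simple iterator rather than a real webdriver.

```python
import time

def waitForChange(readValue, previousValue, timeout=10.0, interval=0.25):
    """Poll readValue() until it returns something other than previousValue.

    Returns the new value, or raises TimeoutError if nothing changes in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = readValue()
        if value != previousValue:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("value did not change within the timeout")
        time.sleep(interval)

# Stand-in for a page whose text only changes on the third read (a slow page turn)
reads = iter(["page 1", "page 1", "page 2"])
newText = waitForChange(lambda: next(reads), "page 1", timeout=2.0, interval=0.01)
print(newText)
```

In the notebook, `readValue` could be a function that reads the text of the current page element. Whether a change in text is a reliable signal on this website has not been checked here, so the fixed `time.sleep` calls remain the tested approach. Selenium also ships a `WebDriverWait` class that serves a similar purpose.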

Also, each of the books on Unite For Literacy's website has 9 pages, so this process works regardless of which book is selected.

In [166]:
#Turn a page to first page with text
turnPage()
time.sleep(2)

#Prints the first page and then turns the page
#Waits 2 seconds to allow for the driver to complete
print(firstPage())
turnPage()
time.sleep(2)

#Loops through the seven middle pages and prints the text
#Waits 2 seconds after each page turn to allow the webdriver to finish
for x in range(7):
    print(getText())
    turnPage()
    time.sleep(2)
    
#Prints the last page of the book
print(lastPage())

# Importing the needed module for text to speech
from gtts import gTTS

# Importing the needed class for playing the converted audio
from IPython.display import Audio

# Looks up the language code (for example "en") that matches the selected language name
abrv = list(googletrans.LANGUAGES.keys())[list(googletrans.LANGUAGES.values()).index(transLang)]

# Converts the translated story to speech in the selected language
obj = gTTS(text=story, lang=abrv, slow=False)

# Saving the converted audio in an mp3 file named translation
obj.save("translation.mp3")

# Playing the audio file
Audio("translation.mp3")

If I had a puppy, I would take good care of him.
If he were hungry, I would feed him.
If he wanted to play, I would teach him how to fetch.
If he were sick, I would take him to the veterinarian.
If he got dirty, I would give him a warm bath.
If he were lonely, I would invite a puppy pal over to play.
If he wanted to be outside, we would take a walk together.
If he were sleepy, I would put him in his soft bed.
If I had a puppy, we would be best friends!


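To run the program again, first return to the title page in the webdriver by running the next cell, then run the cell after it. Because the XPaths of the text elements change after the first run of the program, different functions (_getText2_ and _lastPage2_) must be used for the later pages.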
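
The positions used on each run can be summarized in a single lookup table. The sketch below builds the XPath from the run number and the position of the page in the book. The index values are copied from the functions defined earlier, while the names `DIV_INDEX` and `pageXPath` are hypothetical and not part of the notebook.

```python
# Position of the text element under book-zoom, taken from the functions defined earlier.
# Run 1 is the first read-through; run 2 is a read-through after paging back to the title page.
DIV_INDEX = {
    (1, "first"): 5,   # firstPage
    (1, "middle"): 7,  # getText
    (1, "last"): 8,    # lastPage
    (2, "first"): 7,   # getText, reused for the first page on the second run
    (2, "middle"): 8,  # getText2
    (2, "last"): 4,    # lastPage2
}

def pageXPath(runNumber, position):
    """Build the XPath of the text element for a given run and page position."""
    index = DIV_INDEX[(runNumber, position)]
    return f'//*[@id="book-zoom"]/div/div[{index}]/div[1]/div/div/div'

print(pageXPath(2, "last"))
```

Keeping the numbers in one table makes it easier to see which positions differ between runs. The values were observed for the books in this notebook and may not hold if the website changes its layout.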

In [167]:
#Return to the title page
for x in range(10):
    turnBackPage()
    time.sleep(2)

In [168]:
#Turn a page to first page with text
turnPage()
time.sleep(2)
print(getText())

#Reads the remaining pages with the second-run functions
for x in range(8):
    print(getText2())
    turnPage()
    time.sleep(2)
    
#Prints the last page of the book
print(lastPage2())

If I had a puppy, I would take good care of him.

If he were hungry, I would feed him.
If he wanted to play, I would teach him how to fetch.
If he were sick, I would take him to the veterinarian.
If he got dirty, I would give him a warm bath.
If he were lonely, I would invite a puppy pal over to play.
If he wanted to be outside, we would take a walk together.
If he were sleepy, I would put him in his soft bed.
If I had a puppy, we would be best friends!


In [169]:
#Closes the browser and ends the webdriver session
driver.quit()