# Amazon Audible Bot

Website: [Audible](https://www.audible.com/search)

Note that this notebook does not use a headless browser, so you can watch the bot in action. The Python script, by contrast, runs in the background.

We start by importing the required libraries.

In [46]:
# parsing helpers from the project's utils module
from utils import get_authors, get_length, get_date, get_price
import pandas as pd
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

Next, we set up and initialize the driver.

In [47]:
# initialize chrome driver
driver = webdriver.Chrome('/usr/bin/chromedriver/chromedriver')
url = 'https://www.audible.com/search'
driver.get(url)
# maximize window for locating all the elements we need
driver.maximize_window()

First, we locate the pagination element and read the number of the last page, which is also the total number of pages, so that we can loop through all of them.

In [48]:
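# pagination: the second-to-last <li> is expected to hold the last page number
pagination_bar = driver.find_element(By.XPATH, '//ul[contains(@class, "pagingElements")]')
pagination_bar_elements = pagination_bar.find_elements(By.TAG_NAME, 'li')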
last_page = int(pagination_bar_elements[-2].text)
print('The number of pages to scrape:', last_page)

The number of pages to scrape: 25
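Indexing with `[-2]` relies on the exact layout of the pagination bar. As a minimal sketch of a less position-dependent alternative, the helper below picks the largest numeric label instead. The function name and the sample labels are hypothetical and only illustrate what the `<li>` texts might look like.

```python
def last_page_from_labels(labels):
    """Return the largest page number among the pagination labels."""
    # keep only labels that are plain integers (skips arrows, "...", blanks)
    numbers = [int(label) for label in labels if label.strip().isdigit()]
    if not numbers:
        raise ValueError("no numeric page label found")
    return max(numbers)


# hypothetical texts of the <li> items in the pagination bar
sample_labels = ["", "1", "2", "3", "...", "25", ""]
print("The number of pages to scrape:", last_page_from_labels(sample_labels))
```

In the notebook, this could be fed with `[li.text for li in pagination_bar_elements]`.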


Now we can work on the logic for extracting book data from each page.

In [49]:
# get the main content of the page
content = driver.find_element(By.ID, 'center-3')
books = content.find_elements(By.XPATH, './div/div/div/span/ul/li')
print('The number of books in the current page:', len(books))

# try on the first book
first_book = books[0]

title = first_book.find_element(By.TAG_NAME, 'h3').text

authors_text = get_authors(first_book.find_element(By.XPATH, './/li[contains(@class, "authorLabel")]').text)

length = get_length(first_book.find_element(By.XPATH, './/li[contains(@class, "runtimeLabel")]').text)

date = get_date(first_book.find_element(By.XPATH, './/li[contains(@class, "releaseDateLabel")]').text)

price = get_price(first_book.find_element(By.XPATH, './/p[contains(@id, "buybox-regular-price")]').text)

print(title)
print(authors_text)
print(length)
print(date)
print(price)

The number of books in the current page: 20
Dragon's Justice 7
Bruce Sentar
603
2023-07-09
24.95
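The parsing helpers come from `utils`, which is not shown in this notebook. The sketch below is one possible way such helpers could be written and is not the actual `utils` code. It assumes the raw element texts look like `By: Bruce Sentar`, `Length: 10 hrs and 3 mins`, `Release date: 07-09-23` and `Regular price: $24.95`; these formats are assumptions inferred from the printed results above.

```python
import re
from datetime import datetime


def get_authors(text):
    """'By: Bruce Sentar' -> 'Bruce Sentar' (assumed label format)."""
    return text.split(":", 1)[-1].strip()


def get_length(text):
    """'Length: 10 hrs and 3 mins' -> 603 (total minutes)."""
    hours = re.search(r"(\d+)\s*hr", text)
    minutes = re.search(r"(\d+)\s*min", text)
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total


def get_date(text):
    """'Release date: 07-09-23' -> '2023-07-09' (assumes MM-DD-YY)."""
    raw = text.split(":", 1)[-1].strip()
    return datetime.strptime(raw, "%m-%d-%y").date().isoformat()


def get_price(text):
    """'Regular price: $24.95' -> 24.95, or None if no number is found."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None
```

Returning the length in minutes and the date in ISO format keeps both columns easy to sort and compare later in pandas.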


The whole process works as expected. Let's now group these steps together, remembering to add a wait between pages so that the website does not block us.

In [50]:
titles = []
authors = []
lengths = []
dates = []
prices = []

content = driver.find_element(By.ID, 'center-3')
books = content.find_elements(By.XPATH, './div/div/div/span/ul/li')

print('Current page:', 1)

# loop through the books and get the information
for book in books:
    title = book.find_element(By.TAG_NAME, 'h3').text
    titles.append(title)
    authors_text = get_authors(book.find_element(By.XPATH, './/li[contains(@class, "authorLabel")]').text)
    authors.append(authors_text)
    length = get_length(book.find_element(By.XPATH, './/li[contains(@class, "runtimeLabel")]').text)
    lengths.append(length)
    date = get_date(book.find_element(By.XPATH, './/li[contains(@class, "releaseDateLabel")]').text)
    dates.append(date)
    price = get_price(book.find_element(By.XPATH, './/p[contains(@id, "buybox-regular-price")]').text)
    prices.append(price)

len(titles), len(authors), len(lengths), len(dates), len(prices)

Current page: 1


(20, 20, 20, 20, 20)
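The five lookups above could also be gathered into a single helper that returns one record per book. The following is only a sketch: `extract_book`, `FIELDS` and the `parsers` argument are hypothetical names, and the locator strings `"tag name"` and `"xpath"` are used directly because they are the values of Selenium's `By.TAG_NAME` and `By.XPATH`.

```python
# locator strategy strings; these equal selenium's By.TAG_NAME and By.XPATH
TAG_NAME, XPATH = "tag name", "xpath"

# one (strategy, locator) pair per output column, taken from the cells above
FIELDS = {
    "title": (TAG_NAME, "h3"),
    "authors": (XPATH, './/li[contains(@class, "authorLabel")]'),
    "length": (XPATH, './/li[contains(@class, "runtimeLabel")]'),
    "release_date": (XPATH, './/li[contains(@class, "releaseDateLabel")]'),
    "price": (XPATH, './/p[contains(@id, "buybox-regular-price")]'),
}


def extract_book(book, parsers=None):
    """Build one dict for a book element.

    `book` only needs a Selenium-style find_element(by, value) method.
    `parsers` maps a field name to a function applied to the raw text;
    fields without a parser are simply stripped of surrounding whitespace.
    """
    parsers = parsers or {}
    record = {}
    for name, (by, value) in FIELDS.items():
        text = book.find_element(by, value).text
        record[name] = parsers.get(name, str.strip)(text)
    return record
```

In the notebook this might be called as `extract_book(book, {'authors': get_authors, 'length': get_length, 'release_date': get_date, 'price': get_price})`, and a list of such records can be passed straight to `pd.DataFrame`.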

In [51]:
# sample run: [:2] limits the loop to the next two pages; drop the slice to scrape all pages
for n_page in range(1, last_page)[:2]:
    # go to the next page
    driver.find_element(By.XPATH, '//span[contains(@class, "nextButton")]').click()
    print('Current page: ', n_page+1)
    # wait up to 2 seconds for the main content to be present
    condition = EC.presence_of_element_located((By.ID, 'center-3'))
    content = WebDriverWait(driver, 2).until(condition)
    books = content.find_elements(By.XPATH, './div/div/div/span/ul/li')
    for book in books:
        title = book.find_element(By.TAG_NAME, 'h3').text
        titles.append(title)
        authors_text = get_authors(book.find_element(By.XPATH, './/li[contains(@class, "authorLabel")]').text)
        authors.append(authors_text)
        length = get_length(book.find_element(By.XPATH, './/li[contains(@class, "runtimeLabel")]').text)
        lengths.append(length)
        date = get_date(book.find_element(By.XPATH, './/li[contains(@class, "releaseDateLabel")]').text)
        dates.append(date)
        price = get_price(book.find_element(By.XPATH, './/p[contains(@id, "buybox-regular-price")]').text)
        prices.append(price)

Current page:  2
Current page:  3
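`WebDriverWait` returns as soon as the element is present, so it does not by itself slow the bot down. If an actual pause between pages is wanted, a small randomized delay is one option. This is a sketch only: `polite_sleep` and its default bounds are hypothetical and not tuned to any limit published by the website.

```python
import random
import time


def polite_sleep(min_seconds=1.0, max_seconds=3.0, sleep=time.sleep):
    """Pause for a random duration within the given bounds and return it.

    The `sleep` argument exists so the pause can be replaced in tests.
    """
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling `polite_sleep()` right after the click on the next-page button would space out the requests, at the cost of a longer total run.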


In [52]:
len(titles), len(authors), len(lengths), len(dates), len(prices)

(59, 59, 59, 59, 59)
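The length check above is done by eye. A minimal sketch of the same check as code is shown below, so that a page with a missing field fails loudly instead of silently misaligning the columns. The name `check_aligned` is hypothetical.

```python
def check_aligned(columns):
    """Return the common length of all columns, or raise if they differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    # all lengths are equal here; an empty mapping counts as length 0
    return next(iter(lengths.values()), 0)
```

For example, `check_aligned({'title': titles, 'authors': authors, 'length': lengths, 'release_date': dates, 'price': prices})` would return the number of scraped books when every list has the same length.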

In [53]:
# quit the driver
driver.quit()

Finally, we collect the lists into a pandas DataFrame and export it to a CSV file.

In [None]:
# create a dataframe
df = pd.DataFrame({
    'title': titles,
    'authors': authors,
    'length': lengths,
    'release_date': dates,
    'price': prices
})

# save to csv
df.to_csv('../data/audible_sample.csv', index=False)
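`to_csv` fails if the target directory does not exist. As a small sketch, the hypothetical helper below creates the parent folder first; it uses only the standard library.

```python
from pathlib import Path


def ensure_parent_dir(path):
    """Create the parent directory of `path` if needed and return the path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

It could be combined with the export as `df.to_csv(ensure_parent_dir('../data/audible_sample.csv'), index=False)`.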