## Task 1

Create a text corpus of three books from [Project Gutenberg](https://gutenberg.org/) and save it to your local system. Respect the rules in the site's robots.txt file.
(Time 25 mins)
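Python's standard library ships `urllib.robotparser` for checking robots.txt rules before downloading anything. Below is a minimal sketch, assuming we identify ourselves as the generic user agent `*`. `check_robots()` parses robots.txt text and returns the URLs that may be fetched plus any `Crawl-delay` value. `check_live_robots()` reads the live file over the network instead. Both helper names are hypothetical. The `SAMPLE_ROBOTS` rules are made up for illustration and are NOT Gutenberg's real robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only; this is NOT the real
# gutenberg.org file. Use check_live_robots() against the actual site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /ebooks/search/
Crawl-delay: 5
"""


def _filter(parser, urls, user_agent):
    """Return (URLs allowed for user_agent, crawl delay in seconds or None)."""
    allowed = [u for u in urls if parser.can_fetch(user_agent, u)]
    return allowed, parser.crawl_delay(user_agent)


def check_robots(robots_text, urls, user_agent="*"):
    """Check URLs against robots.txt rules given as a string."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())  # parse() also marks the rules as loaded
    return _filter(parser, urls, user_agent)


def check_live_robots(site, urls, user_agent="*"):
    """Same check against the site's live robots.txt (needs network access)."""
    parser = RobotFileParser(site.rstrip("/") + "/robots.txt")
    parser.read()
    return _filter(parser, urls, user_agent)


urls = [
    "https://gutenberg.org/cache/epub/1232/pg1232-images.html",
    "https://gutenberg.org/ebooks/search/?query=prince",
]
allowed, delay = check_robots(SAMPLE_ROBOTS, urls)
print(allowed)  # → ['https://gutenberg.org/cache/epub/1232/pg1232-images.html']
print(delay)    # → 5
```

If a crawl delay is returned, calling `time.sleep(delay)` between downloads is a simple way to honour it.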

In [1]:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import time

In [None]:
# Solution with Beautiful Soup

start = time.time()

url_list = ["https://gutenberg.org/cache/epub/1232/pg1232-images.html",
            "https://gutenberg.org/cache/epub/2554/pg2554-images.html",
            "https://gutenberg.org/cache/epub/1727/pg1727-images.html"]

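for index, url in enumerate(url_list):
    # urlopen returns an http.client.HTTPResponse, a file-like object;
    # the with-block closes the connection once the page has been parsed
    with urlopen(url) as html_response:
        bs = BeautifulSoup(html_response, 'html.parser')
    text = ""
    # the first <h1> holds the title; .string is None if the tag has several children
    if bs.h1 and bs.h1.string:
        print(bs.h1.string)
        text = text + bs.h1.string + "\n"
    # collect every element carrying the class "chapter"
    chapters = bs.select(".chapter")
    for elem in chapters:
        print(elem.text)
        text = text + elem.text + "\n"
    with open(f"{index}.txt", "w", encoding="utf8") as f:
        f.write(text)
    time.sleep(1)  # pause between requests so the server is not hammered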

print(f'Executed in {time.time()-start} seconds')

In [30]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time

In [31]:
# Solution with Selenium

start = time.time()
url_list = ["https://gutenberg.org/cache/epub/1232/pg1232-images.html",
            "https://gutenberg.org/cache/epub/2554/pg2554-images.html",
            "https://gutenberg.org/cache/epub/1727/pg1727-images.html"]

service = Service(executable_path="chromedriver.exe")  # path to the local ChromeDriver binary
driver = webdriver.Chrome(service=service)
driver.maximize_window()

for index, url in enumerate(url_list):
    text = ""
    driver.get(url)
    # collect the text of every <p> element on the page
    elements = driver.find_elements(By.TAG_NAME, "p")
    for elem in elements:
        print(elem.text)
        text = text + elem.text + "\n"

    with open(f"{index}_selenium.txt", "w", encoding="utf8") as f:
        f.write(text)

driver.quit()  # close the browser and end the ChromeDriver session
print(f'Executed in {time.time()-start} seconds')

Title: The Prince
Author: Niccolò Machiavelli
Translator: W. K. Marriott
Release date: February 11, 2006 [eBook #1232]
Most recently updated: July 1, 2022
Language: English
Credits: John Bickers, David Widger and Others
Nicolo Machiavelli, born at Florence on 3rd May 1469. From 1494 to 1512 held an official post at Florence which included diplomatic missions to various European courts. Imprisoned in Florence, 1512; later exiled and returned to San Casciano. Died at Florence on 22nd June 1527.
Nicolo Machiavelli was born at Florence on 3rd May 1469. He was the second son of Bernardo di Nicolo Machiavelli, a lawyer of some repute, and of Bartolommea di Stefano Nelli, his wife. Both parents were members of the old Florentine nobility.
His life falls naturally into three periods, each of which singularly enough constitutes a distinct and important era in the history of Florence. His youth was concurrent with the greatness of Florence as an Italian power under the guidance of Lorenzo de’ Me
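Naming output files by loop index makes the saved corpus hard to browse. A hypothetical alternative, sketched with the standard library only: `save_corpus()` takes `(title, paragraphs)` pairs, writes each book to a file named after a slug of its title, and separates paragraphs with blank lines. `load_corpus()` reads the files back. The helper names and the default `corpus` folder are assumptions, not part of the original solution.

```python
import re
from pathlib import Path


def slugify(title):
    """Turn a book title into a safe file stem, e.g. 'The Prince' -> 'the_prince'."""
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return slug or "untitled"


def save_corpus(books, out_dir="corpus"):
    """Write each (title, paragraphs) pair to <out_dir>/<slug>.txt.

    Empty paragraphs are dropped and the rest are separated by blank lines,
    so paragraph boundaries survive. Returns the list of written paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for title, paragraphs in books:
        path = out / f"{slugify(title)}.txt"
        body = "\n\n".join(p.strip() for p in paragraphs if p.strip())
        path.write_text(f"{title}\n\n{body}\n", encoding="utf8")
        paths.append(path)
    return paths


def load_corpus(out_dir="corpus"):
    """Read the corpus back as {file stem: text}."""
    return {p.stem: p.read_text(encoding="utf8")
            for p in sorted(Path(out_dir).glob("*.txt"))}
```

In the scraping loops above, the list of paragraph texts for each book could be collected and passed to `save_corpus()` instead of building one concatenated string.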

## Task 2 

Fill in the [form](https://www.selenium.dev/selenium/web/web-form.html) and submit it. Then extract any text present on the page after submission.
(Time 40 mins)

## Solution using Selenium
To upload a file, we created an empty test_document.txt file in the project folder. After locating the file input element, we send it the file's full (absolute) path using send_keys().
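That preparation step can be scripted as well. This is a small sketch, assuming the file should live in the current working directory unless another folder is given. `prepare_upload_file()` is a hypothetical helper that creates the empty file if needed and returns its absolute path as a string, ready to pass to send_keys().

```python
from pathlib import Path


def prepare_upload_file(name="test_document.txt", folder=None):
    """Create an empty file (if missing) and return its absolute path as a str.

    A file input expects a full path, so the name is resolved against
    `folder` (default: the current working directory).
    """
    base = Path(folder) if folder is not None else Path.cwd()
    path = (base / name).resolve()
    path.touch(exist_ok=True)  # creates an empty file; existing files are left as they are
    return str(path)
```

In the form cell, its return value could stand in for the `os.path.join(os.getcwd(), "test_document.txt")` entry.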

The text input, password field, textarea, datalist, color picker and date picker receive their values through send_keys(), whereas the checkbox, radio button, dropdown option and submit button are clicked.
Dragging the range slider is slightly more complex, so for it we use action chains to move the slider's thumb.

### [Action Chains](https://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.common.action_chains)
ActionChains can automate low-level interactions such as mouse movements, mouse button actions, key presses and context menu interactions. This is useful for more complex actions such as hovering and drag and drop. More details can be found [here](https://www.selenium.dev/selenium/docs/api/py/webdriver/selenium.webdriver.common.action_chains.html).
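`drag_and_drop_by_offset()` presses the mouse on the centre of the element, moves it by the given offset and releases it. Instead of hard-coding the offset, it can be estimated from the slider's width. The sketch below is only an approximation under stated assumptions: a horizontal slider whose thumb jumps to wherever it is pressed, and a usable track of `width_px - thumb_px` pixels. `range_drag_offset()` is a hypothetical helper.

```python
def range_drag_offset(target, minimum, maximum, width_px, thumb_px=0):
    """Estimate the x offset (px, relative to the slider's centre) that moves a
    horizontal <input type="range"> thumb to `target`. Approximation only."""
    if not minimum <= target <= maximum:
        raise ValueError(f"target {target} is outside [{minimum}, {maximum}]")
    if maximum == minimum:
        return 0  # degenerate slider: nothing to move
    track_px = width_px - thumb_px  # usable track length (assumption)
    fraction = (target - minimum) / (maximum - minimum)  # 0.0 = left end, 1.0 = right end
    # the drag starts at the centre (fraction 0.5), so shift by the difference
    return round((fraction - 0.5) * track_px)


# Example: a 0-10 slider that is 200 px wide; move the thumb to 8
print(range_drag_offset(8, 0, 10, 200))  # → 60
```

With Selenium, the inputs could come from `element.size["width"]` and `element.get_attribute("min")` / `element.get_attribute("max")` (converted to numbers), with the result passed as `xoffset`. Reading min and max from the element avoids guessing the slider's range.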

In [None]:
import os
from selenium.webdriver import ActionChains

In [64]:
url = "https://www.selenium.dev/selenium/web/web-form.html"
service = Service(executable_path = "chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.maximize_window()
driver.get(url)
wait = WebDriverWait(driver, 10)

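# Absolute XPaths (copied from the browser's dev tools) break if the page layout changes;
# id or name attributes are more robust where the page provides them.
text_input_xpath = '//*[@id="my-text-id"]'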
password_xpath = '/html/body/main/div/form/div/div[1]/label[2]/input'
textarea_xpath = '/html/body/main/div/form/div/div[1]/label[3]/textarea'
dropdown_select_xpath = '/html/body/main/div/form/div/div[2]/label[1]/select/option[3]'
dropdown_datalist_xpath = '/html/body/main/div/form/div/div[2]/label[2]/input'
file_input_xpath = '/html/body/main/div/form/div/div[2]/label[3]/input'
checkbox_xpath = '//*[@id="my-check-2"]'
radio_button_xpath = '//*[@id="my-radio-2"]'
color_picker_xpath = '/html/body/main/div/form/div/div[3]/label[1]/input'
date_xpath = '/html/body/main/div/form/div/div[3]/label[2]/input'
range_path = '/html/body/main/div/form/div/div[3]/label[3]/input'
submit_button_xpath = '/html/body/main/div/form/div/div[2]/button'

items_to_send_keys = {
    text_input_xpath: "some random text",
    password_xpath: "super strong password",
    textarea_xpath: "another random text",
    dropdown_datalist_xpath: "Seattle",
    file_input_xpath: os.path.join(os.getcwd(), "test_document.txt"),
    color_picker_xpath: "#14baaa",
    date_xpath: "06/18/2024",
}

# Clicked in this order; the submit button must stay last because it leaves the page
items_to_click = [dropdown_select_xpath, checkbox_xpath, radio_button_xpath, submit_button_xpath]

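# Use ActionChains to move the range slider: press on the slider's centre,
# move 100 px to the right and release (drag_and_drop_by_offset already releases)
element = driver.find_element(By.XPATH, range_path)
move = ActionChains(driver)
move.drag_and_drop_by_offset(source=element, xoffset=100, yoffset=0).perform()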


for xpath, text in items_to_send_keys.items():
    element = wait.until(EC.element_to_be_clickable((By.XPATH, xpath)))
    element.send_keys(text)
    time.sleep(1)


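for xpath in items_to_click:
    element = wait.until(EC.element_to_be_clickable((By.XPATH, xpath)))
    element.click()
    time.sleep(1)

# The last click submits the form: wait until the old page is gone,
# then extract all text on the page shown after submission
wait.until(EC.staleness_of(element))
result_text = driver.find_element(By.TAG_NAME, "body").text
print(result_text)

driver.quit()  # close the browser and end the ChromeDriver session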
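Text shown after submission can also be pulled from the raw HTML in `driver.page_source`. As a swapped-in alternative to locating elements with Selenium, the standard library's `html.parser.HTMLParser` can collect every text node while skipping `<script>` and `<style>` contents. It does not apply CSS, so text in hidden elements is still included. `PageTextParser` and `page_text()` are hypothetical names, and the HTML in the example is a made-up snippet, not the real confirmation page.

```python
from html.parser import HTMLParser


class PageTextParser(HTMLParser):
    """Collect non-empty text nodes, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # >0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def page_text(html):
    """Return the list of text snippets found in an HTML string."""
    parser = PageTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


sample = "<html><body><h1>Form submitted</h1><p>Received!</p><script>var x = 1;</script></body></html>"
print(page_text(sample))  # → ['Form submitted', 'Received!']
```

In the form cell, `page_text(driver.page_source)` would have to run before `driver.quit()`, while the browser session is still open.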

## Task 3

1. Scrape any other website of your choice (Time 45 mins)
2. Present your difficulties to your peers (5 mins)