# ReCaptcha Solver

This notebook demonstrates a ReCaptcha solver powered by multimodal LLMs, showcasing the integration of natural language processing and computer vision with web browsing functionality.

In [2]:
from IPython.display import Video
Video('media/selenium_recaptcha.mp4', embed=True) 

## How It Works

The solver is implemented by the Python class [GoogleRecaptchaWrapper](../../../../../slangchain/slangchain/tools/selenium/tool.py), which automates solving Google's reCAPTCHA challenges with Selenium WebDriver. Let's break down how this class works:

### Class Structure and Initial Setup
- The class declares its configuration attributes with Pydantic's `Field`, giving each a type (optional where appropriate) and a default value, and holds the Selenium WebDriver objects it uses to interact with page elements.
- It includes settings for timeouts, model description parameters (for Multimodal LLM assistance in interpreting CAPTCHAs), elements of the CAPTCHA challenge (like iframes and checkboxes), and various delays to mimic human interaction.
- The class also defines methods for image enhancement, leveraging PIL (Python Imaging Library) for better visibility of CAPTCHA images during processing.
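
The image enhancement step mentioned above can be sketched with Pillow's `ImageEnhance` module. This is a minimal illustration, not the class's actual `_enhance_image` implementation: the function name `enhance_image`, its parameters, and the gray test tile are assumptions made for the example. A factor of `1.0` leaves the image unchanged, which is consistent with passing `brightness = 1.0` when no adjustment is wanted.

```python
from PIL import Image, ImageEnhance


def enhance_image(image, brightness=1.0, contrast=1.0, sharpness=1.0):
    """Apply brightness, contrast, and sharpness factors in sequence.

    Hypothetical helper for illustration; a factor of 1.0 is a no-op.
    """
    image = ImageEnhance.Brightness(image).enhance(brightness)
    image = ImageEnhance.Contrast(image).enhance(contrast)
    return ImageEnhance.Sharpness(image).enhance(sharpness)


# Stand-in for a CAPTCHA tile: a small uniform gray image.
tile = Image.new("RGB", (4, 4), color=(100, 100, 100))

unchanged = enhance_image(tile)                 # all factors at 1.0
brighter = enhance_image(tile, brightness=1.5)  # factors above 1.0 brighten

print(unchanged.getpixel((0, 0)), brighter.getpixel((0, 0)))
```

Factors below `1.0` reduce the corresponding property, so the same helper can darken or soften an image as well.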

### Key Methods and Their Functions
1. **`_find_not_robot_iframe`, `_find_recaptcha_checkbox_border_element`, `_wait_recaptcha_iframe`, `_find_image_tiles`, `_find_instructions`, `_find_status_message`, and `_find_submit_button`**:
   - These methods are responsible for locating various elements of the reCAPTCHA challenge on the webpage, such as iframes, checkbox borders, image tiles, instructions, status messages, and the submit button. They use Selenium's `WebDriverWait` and `expected_conditions` to ensure elements are present and interactable.
<br><br>

1. **`click_recaptcha_checkbox_border` and `click_submit_button`**:
   - These methods perform click actions on the reCAPTCHA checkbox and the submit button, respectively, using the elements found by the aforementioned methods.
<br><br>

1. **`_get_ai_website_description` and `_get_website_main_content`**:
   - These methods aim to use AI (presumably an external service, modeled by the [Base64ImageStringExplainerChain](../../../../../slangchain/slangchain/chains/image_explainer/base.py)) to interpret the website's content or the CAPTCHA challenge, based on a screenshot converted to a base64-encoded string.
<br><br>

1. **`_get_image_tile_selection_by_screenshot`**:
   - This method uses a multimodal LLM's interpretation of the CAPTCHA images to determine which tiles to select. It prepares a detailed prompt for the model, receives the selections, and handles exceptions.
<br><br>

1. **`_enhance_image_brightness`, `_enhance_image_contrast`, `_enhance_image_sharpness`, and `_enhance_image`**:
   - These methods apply various image enhancements to the CAPTCHA images to potentially improve their readability before processing. They use PIL's image enhancement features.
<br><br>

1. **`_add_tile_numbers_to_image`, `_convert_image_to_base64`, and `create_image_element_base64`**:
   - These methods are involved in processing CAPTCHA images, adding tile numbers to them for easier identification, and converting the images to base64 strings for further processing or AI analysis.
<br><br>

1. **`_get_image_tile_selection` and `click_image_tiles`**:
   - After determining which tiles to select in a CAPTCHA challenge (using AI or manual logic), these methods perform the actual selection by clicking on the appropriate tiles.
<br><br>

1. **`process_recaptcha`**:
   - This is the main method orchestrating the entire process of solving the reCAPTCHA. It sequentially clicks the checkbox, waits for and processes the image CAPTCHA if present, clicks image tiles based on AI or predefined logic, and finally clicks the submit button. It handles various states of the challenge, including success, failure, or absence of CAPTCHA.
<br><br>

1. **`from_parameters`**:
   - A class method to instantiate [GoogleRecaptchaWrapper](../../../../../slangchain/slangchain/tools/selenium/tool.py) with specific parameters, allowing customization of timeouts, AI model settings, and image enhancement settings.
<br><br>
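
Two of the smaller steps in the list above, converting image bytes to a base64 string and turning a model's free-text reply into tile numbers, can be sketched with the standard library alone. The function names `image_bytes_to_base64` and `parse_tile_selection`, the 1-based tile numbering, and the sample reply are assumptions for illustration; the actual methods in `GoogleRecaptchaWrapper` may work differently.

```python
import base64
import re
from typing import List


def image_bytes_to_base64(image_bytes: bytes) -> str:
    """Encode raw image bytes (e.g. a PNG screenshot) as an ASCII base64 string."""
    return base64.b64encode(image_bytes).decode("ascii")


def parse_tile_selection(reply: str, tile_count: int) -> List[int]:
    """Extract unique, in-range, 1-based tile numbers from a free-text reply."""
    selected: List[int] = []
    for match in re.findall(r"\d+", reply):
        number = int(match)
        # Ignore duplicates and numbers that do not refer to an existing tile.
        if 1 <= number <= tile_count and number not in selected:
            selected.append(number)
    return selected


# Stand-in payload: the 8-byte PNG signature rather than a full image.
png_signature = b"\x89PNG\r\n\x1a\n"
encoded = image_bytes_to_base64(png_signature)

# Hypothetical model reply containing a duplicate and an out-of-range number.
tiles = parse_tile_selection("Tiles 2, 5 and 9 (also 2 again, and 12)", tile_count=9)
print(encoded, tiles)
```

Filtering out duplicates and out-of-range numbers keeps a malformed reply from producing clicks on tiles that do not exist.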

### Additional Notes
- The use of delays and random intervals between actions is an attempt to mimic human interaction with the CAPTCHA challenge, reducing the likelihood of detection by anti-bot measures.

This class represents a comprehensive approach to programmatically interact with and attempt to solve Google's reCAPTCHA challenges, leveraging both Selenium for web automation and potentially AI for interpreting complex CAPTCHA images.

## Example

Before we build, let's configure our environment:

In [None]:
import getpass
import os

def _set_if_undefined(var: str):
  # Prompt for the variable only if it is not already set in the environment.
  if not os.environ.get(var):
    os.environ[var] = getpass.getpass(f"Please provide your {var}: ")

# The key must belong to the provider of the model passed to from_parameters below.
_set_if_undefined("ANTHROPIC_API_KEY")

In [None]:
import sys
import logging

stream_handler = logging.StreamHandler(sys.stdout)
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logger.addHandler(stream_handler)

In [None]:
import os
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options as ChromeOptions
from slangchain.tools.selenium.tool import GoogleRecaptchaWrapper

url = "https://www.google.com/recaptcha/api2/demo"
driver_timeout = 10
window_width = 750
window_height = 750

chrome_options = ChromeOptions()

chrome_options.add_argument((
  "--user-agent=Mozilla/5.0 (Linux; Android 6.0;"
  " HTC One M9 Build/MRA58K) AppleWebKit/537.36"
  " (KHTML, like Gecko) Chrome/52.0.2743.98 Mobile Safari/537.36"))
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--single-process')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument("--disable-blink-features=AutomationControlled")
driver = Chrome(options=chrome_options)
driver.set_window_size(width = window_width, height = window_height)
driver.get(url)

captcha_tool = GoogleRecaptchaWrapper.from_parameters(
  driver = driver,
  model_describer_name = "gpt-4-vision-preview",
  brightness = 1.0,
  temperature = 0)

In [None]:

captcha_tool.process_recaptcha()