This project is a Spring Boot-based web service that performs the following tasks:
- Calls a public REST API to fetch the name of the first Indian state in the response.
- Uses Selenium WebDriver to scrape Google search results for that state.
- Extracts the top 2 search results (title + link).
- Saves the scraped data into an Excel file using Apache POI.
- Provides a REST API endpoint for users to trigger the search operation.
- Implements global exception handling and logging for better debugging and error tracking.
- Includes JUnit test cases to validate different components.
The project is built with the following technologies:
- Spring Boot – Framework for building Java-based web applications.
- Selenium WebDriver – A browser automation tool used for web scraping.
- Apache POI – A Java library to handle Excel file operations.
- Apache HTTP Client – Used to make HTTP requests to external APIs.
- Jackson Databind – Handles JSON serialization and deserialization.
- Java 17 – Required for compatibility with Spring Boot 3.x.
- SLF4J & Logback – Used for logging throughout the application.
- JUnit & Mockito – Used for unit and integration testing.
The project follows the Controller-Service-Utility pattern:
- Configuration Layer – Stores common configuration in `application.properties`.
- Controller Layer – Handles HTTP requests and maps them to the appropriate service methods.
- Service Layer – Implements business logic, interacts with Selenium for web scraping.
- Utility Layer – Handles common functionalities like writing data to Excel files.
- Global Exception Handler – Manages application-wide exceptions.
- Logging Integration – Provides structured logging for debugging and monitoring.
- JUnit Test Cases – Ensures correctness of the implemented functionalities.
Configuration Layer:
- Stores common properties such as the Selenium driver path, timeouts, and the Excel file path.
- Uses `@ConfigurationProperties` to load values from `application.properties` dynamically (see the sketch below).
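A minimal sketch of what such a properties class might look like — the class name, prefix, and field names here are illustrative assumptions, not taken from the actual codebase:

```java
package com.example.scraper.config;

import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

// Binds entries such as scraper.driver-path, scraper.timeout-seconds, and
// scraper.excel-file-path from application.properties (names are hypothetical).
@Component
@ConfigurationProperties(prefix = "scraper")
public class ScraperProperties {

    private String driverPath;       // filesystem path to the ChromeDriver binary
    private int timeoutSeconds = 10; // explicit-wait timeout for Selenium
    private String excelFilePath;    // destination of the generated Excel file

    public String getDriverPath() { return driverPath; }
    public void setDriverPath(String driverPath) { this.driverPath = driverPath; }

    public int getTimeoutSeconds() { return timeoutSeconds; }
    public void setTimeoutSeconds(int timeoutSeconds) { this.timeoutSeconds = timeoutSeconds; }

    public String getExcelFilePath() { return excelFilePath; }
    public void setExcelFilePath(String excelFilePath) { this.excelFilePath = excelFilePath; }
}
```

Spring Boot's relaxed binding maps a property such as `scraper.driver-path` onto the `driverPath` field automatically.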
Controller Layer:
- Defines RESTful endpoints for triggering the scraping process.
- Uses `@RestController` to expose API methods (see the sketch below).
- Implements structured logging for tracking API calls.
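A sketch of how the controller could be wired up, using the `/api/scraper/search` endpoint described in the execution flow below. The service method name `scrapeAndSave` is an assumption:

```java
package com.example.scraper.controller;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical controller; the real class may differ in naming and structure.
@RestController
@RequestMapping("/api/scraper")
public class ScraperController {

    private static final Logger log = LoggerFactory.getLogger(ScraperController.class);

    private final ScraperService scraperService;

    // Constructor injection keeps the controller easy to unit-test with mocks.
    public ScraperController(ScraperService scraperService) {
        this.scraperService = scraperService;
    }

    @GetMapping("/search")
    public ResponseEntity<String> search(@RequestParam String keyword) {
        log.info("Received scrape request for keyword='{}'", keyword);
        scraperService.scrapeAndSave(keyword); // assumed service entry point
        return ResponseEntity.ok("Scraping completed for keyword: " + keyword);
    }
}
```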
Service Layer:
- Implements Selenium WebDriver to automate the Google search.
- Configures ChromeDriver in headless mode for efficient execution.
- Uses `WebDriverWait` to handle dynamically loaded page elements (see the sketch after this list).
- Extracts search-result titles and URLs from Google's results page.
- Calls `ScraperUtil` to write the extracted data into an Excel file.
- Implements error handling for issues such as `NoSuchElementException`, `TimeoutException`, and connection failures.
- Uses structured logging (`LoggerFactory.getLogger()`) to track execution flow and aid debugging.
- Ensures efficient resource management by closing WebDriver instances after execution.
- Can be extended to support additional search engines such as Bing or Yahoo.
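A condensed sketch of the scraping logic, under these assumptions: Selenium 4 (for the `Duration`-based `WebDriverWait` constructor) and CSS selectors matching Google's current result markup. The selectors in particular are fragile and will need adjusting whenever Google changes its page structure:

```java
package com.example.scraper.service;

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

// Hypothetical sketch; assumes chromedriver is on PATH or configured via
// the webdriver.chrome.driver system property.
public class ScraperService {

    public List<String[]> scrape(String keyword) {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless=new"); // headless mode for efficiency

        WebDriver driver = new ChromeDriver(options);
        List<String[]> results = new ArrayList<>();
        try {
            driver.get("https://www.google.com/search?q=" + keyword);

            // Wait for the results container before touching the DOM.
            WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
            wait.until(ExpectedConditions.presenceOfElementLocated(By.id("search")));

            // Organic results: title in an <h3>, link on the enclosing <a>.
            List<WebElement> titles = driver.findElements(By.cssSelector("#search a > h3"));
            for (int i = 0; i < Math.min(2, titles.size()); i++) {
                WebElement title = titles.get(i);
                String link = title.findElement(By.xpath("./..")).getAttribute("href");
                results.add(new String[] { title.getText(), link });
            }
            return results;
        } finally {
            driver.quit(); // always release the browser instance
        }
    }
}
```

Running headless avoids spawning a visible browser window, which matters when the service runs on a server without a display.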
Utility Layer:
- Utilizes Apache POI to create and write Excel files.
- Generates an Excel sheet with the headers `Search Keyword`, `Title`, and `Link`.
- Saves the output file locally for easy access.
- Uses try-with-resources to manage file operations and prevent resource leaks.
- Appends data if the file already exists, so new search results do not overwrite previous ones (see the sketch after this list).
- Validates entries before writing, skipping empty or duplicate rows.
- Designed to handle large volumes of rows efficiently.
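A sketch of the append-or-create logic with Apache POI. Class and method names are illustrative, and duplicate detection is omitted for brevity:

```java
package com.example.scraper.util;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

// Hypothetical utility sketch.
public class ScraperUtil {

    private static final String[] HEADERS = { "Search Keyword", "Title", "Link" };

    public static void appendResults(String filePath, String keyword,
                                     List<String[]> results) throws IOException {
        File file = new File(filePath);

        // Open the existing workbook to append, or start a new one with headers.
        Workbook workbook;
        if (file.exists()) {
            try (FileInputStream in = new FileInputStream(file)) {
                workbook = new XSSFWorkbook(in);
            }
        } else {
            workbook = new XSSFWorkbook();
            Row header = workbook.createSheet("Results").createRow(0);
            for (int i = 0; i < HEADERS.length; i++) {
                header.createCell(i).setCellValue(HEADERS[i]);
            }
        }

        Sheet sheet = workbook.getSheetAt(0);
        int rowIndex = sheet.getLastRowNum() + 1;
        for (String[] result : results) {
            if (result[0] == null || result[0].isBlank()) continue; // skip empty titles
            Row row = sheet.createRow(rowIndex++);
            row.createCell(0).setCellValue(keyword);
            row.createCell(1).setCellValue(result[0]);
            row.createCell(2).setCellValue(result[1]);
        }

        // try-with-resources guarantees the stream is closed even on failure.
        try (FileOutputStream out = new FileOutputStream(file)) {
            workbook.write(out);
        } finally {
            workbook.close();
        }
    }
}
```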
Global Exception Handler:
- Implements `@ControllerAdvice` to catch and handle exceptions globally (see the sketch below).
- Returns meaningful error responses with appropriate HTTP status codes.
- Logs exception details for debugging.
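A minimal sketch of the global handler; the specific exception-to-status mappings here are assumptions:

```java
package com.example.scraper.exception;

import org.openqa.selenium.TimeoutException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ControllerAdvice;
import org.springframework.web.bind.annotation.ExceptionHandler;

// Hypothetical handler sketch; the real class may map more exception types.
@ControllerAdvice
public class GlobalExceptionHandler {

    private static final Logger log = LoggerFactory.getLogger(GlobalExceptionHandler.class);

    @ExceptionHandler(TimeoutException.class)
    public ResponseEntity<String> handleTimeout(TimeoutException ex) {
        log.error("Selenium wait timed out", ex);
        return ResponseEntity.status(HttpStatus.GATEWAY_TIMEOUT)
                .body("Search timed out: " + ex.getMessage());
    }

    @ExceptionHandler(Exception.class)
    public ResponseEntity<String> handleGeneric(Exception ex) {
        log.error("Unhandled exception", ex);
        return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                .body("Unexpected error: " + ex.getMessage());
    }
}
```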
Logging Integration:
- Uses SLF4J with Logback for structured logging.
- Logs method entry, exit, and errors in each layer (see the sketch below).
- Configured in `logback.xml` to support different log levels (INFO, DEBUG, ERROR).
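The per-layer logging pattern referenced above generally looks like this; the class and method names are placeholders:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative pattern only.
public class ExcelWriter {

    private static final Logger log = LoggerFactory.getLogger(ExcelWriter.class);

    public void writeRows(int rowCount) {
        log.info("Entering writeRows, rowCount={}", rowCount);   // method entry (INFO)
        if (rowCount == 0) {
            log.debug("Nothing to write; skipping file update"); // diagnostics (DEBUG)
            return;
        }
        try {
            // ... actual Excel write would happen here ...
        } catch (RuntimeException ex) {
            log.error("Failed to write {} rows", rowCount, ex);  // failures (ERROR)
            throw ex;
        }
        log.info("Exiting writeRows");                           // method exit (INFO)
    }
}
```

Passing the exception as the last argument lets SLF4J attach the full stack trace while still using `{}` placeholders in the message.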
JUnit Test Cases:
- Includes unit tests for the Controller, Service, Utility, and exception-handling components.
- Uses Mockito to mock dependencies and isolate test scenarios (see the sketch below).
- Validates API behavior, Selenium operations, and Excel file creation.
- Implements integration tests that verify the end-to-end workflow.
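A sketch of a Mockito-based controller test, matching the hypothetical class names used in the earlier sketches:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.verify;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.InjectMocks;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;

// Hypothetical unit test; isolates the controller by mocking the service.
@ExtendWith(MockitoExtension.class)
class ScraperControllerTest {

    @Mock
    private ScraperService scraperService;

    @InjectMocks
    private ScraperController controller;

    @Test
    void searchDelegatesToServiceAndReturnsOk() {
        var response = controller.search("India");

        verify(scraperService).scrapeAndSave("India"); // service invoked exactly once
        assertEquals(200, response.getStatusCode().value());
    }
}
```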
Execution Flow:
- User Calls API → Sends an HTTP request to `/api/scraper/search?keyword=India`.
- Controller Processes Request → Calls `ScraperService` to execute the search.
- Service Performs Web Scraping → Uses Selenium to extract search results.
- Data Stored in Excel → `ScraperUtil` saves the extracted data to an Excel file.
- Response Sent to User → The API returns a success message or error details.
- Error Handling → If any issue occurs, `GlobalExceptionHandler` catches and logs it.
- Tests Executed → JUnit tests validate the correctness of each component.
Download a ChromeDriver build matching your installed Chrome version from the official download page: 🔗 https://chromedriver.chromium.org/downloads
Use Maven to start the service:

```bash
mvn spring-boot:run
```

Make an API request using cURL or Postman:

```bash
curl "http://localhost:8080/api/scraper/search?keyword=India"
```

Execute all test cases with:

```bash
mvn test
```

This document provides a technical breakdown of the Spring Boot Selenium API Scraper project, covering its architecture, dependencies, and execution flow. The implementation combines web scraping with proper logging, error handling, and unit testing, keeping the service maintainable and extensible. 🚀