This guide shows how to set up and run the example Java web-scraping scripts from this repository using IntelliJ IDEA and Maven.
Note: The project was created and tested in IntelliJ IDEA with Maven. Other IDEs (like VS Code) may require additional setup and are not covered here.
Before you start, make sure you have:
- Java JDK installed (e.g. 17 or newer)
- IntelliJ IDEA (Community Edition is enough)
- Git (optional, if you want to clone the repo instead of downloading ZIP)
- A working internet connection (Maven needs to download dependencies)
You can verify Java with:
`java -version`
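The same check can also be done from inside Java once the project runs. The sketch below is only an illustration: the class name `JavaVersionCheck` is hypothetical, and the minimum of 17 is an assumption taken from the "17 or newer" example above, not a confirmed requirement of the repository.

```java
public class JavaVersionCheck {
    // Assumed minimum feature release; the guide only gives 17 as an example.
    public static final int MIN_FEATURE = 17;

    // Returns true when the given feature release meets the assumed minimum.
    public static boolean isSupported(int feature) {
        return feature >= MIN_FEATURE;
    }

    public static void main(String[] args) {
        // Runtime.version().feature() is available since Java 10.
        int feature = Runtime.version().feature();
        System.out.println("Running on Java " + feature
                + " (" + System.getProperty("java.version") + ")");
        System.out.println(isSupported(feature)
                ? "Version looks fine for this guide."
                : "Version is older than the assumed minimum of " + MIN_FEATURE + ".");
    }
}
```

If the reported version is lower than expected, IntelliJ may be using a different JDK than the one on your PATH; the project SDK can be changed under File > Project Structure.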
- Open IntelliJ IDEA.
- On the welcome screen, choose “New Project”.
- For the build system, select “Maven”.
- Check “Add sample code”.
- Click Create.

IntelliJ will create a sample Maven project with:
- `pom.xml`
- `src/main/java/org/example/Main.java`
- Open the GitHub repository in your browser.
- Open the `pom.xml` file from the repo.
- Copy its entire contents.
- In IntelliJ, open the `pom.xml` that was generated when you created the Maven project.
- Replace the IntelliJ-generated contents with the contents from GitHub.
- Save the file.
This step ensures all necessary dependencies (e.g. Jsoup, Gson, OpenCSV, etc.) are declared.
- In the GitHub repo, open `Main.java`.
- Copy its entire contents.
- In IntelliJ, open the `Main.java` file created by the project wizard:
  - Path should be something like: **src/main/java/org/example/Main.java**
- Replace the contents with the code from GitHub.
- Make sure the `package` line at the top matches your project package, e.g.: `package org.example;`
- Save the file.
If the repo contains additional example files, such as:
- `MainPagination.java`
- `MainParallelScraper.java`

Add them to your IntelliJ project:
- In IntelliJ’s Project view, go to: `src/main/java/org/example`
- Right-click on the `org.example` package > New > Java Class.
- Name the class exactly as in the repo, e.g.:
  - `MainPagination`
  - `MainParallelScraper`
- For each class:
  - Open the corresponding file on GitHub.
  - Copy its entire contents.
  - Paste into the new class in IntelliJ.
  - Ensure the `package` line at the top is the same as in other files, e.g.: `package org.example;`
- Save all files.
To make sure Maven downloads and refreshes all dependencies:
- Open Settings:
  - On Windows/Linux: File > Settings
  - On macOS: IntelliJ IDEA > Settings (or Preferences)
- Go to: Build, Execution, Deployment > Build Tools > Maven > Repositories
- You should see at least:
  - Local repository
  - One or more Remote repositories
- Select each repository (local and remote) and click “Update”.
- Click OK to close the dialog.
This forces Maven to refresh its metadata and helps ensure it can resolve all dependencies.
- In the Project view, right-click on `pom.xml`.
- Select Maven > Sync (or Reimport / Reload Project, depending on IntelliJ version).
- Wait for IntelliJ to finish downloading and indexing dependencies (you’ll see progress at the bottom of the window).
If everything is configured correctly, all imports like Jsoup, Document, Elements, Gson, CSVWriter, etc. should now resolve without errors.
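If you want to confirm this outside the editor, a small runtime check can report whether the libraries are actually on the classpath. This is a minimal sketch, not part of the repository: the class name `DependencyCheck` is hypothetical, and the three fully qualified class names are the standard entry points of Jsoup, Gson, and OpenCSV, assumed here to be what the repo's `pom.xml` pulls in.

```java
import java.util.List;

public class DependencyCheck {
    // Returns true if the named class can be located by this class's class loader.
    // The class is looked up without being initialized.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed entry-point classes for the dependencies named in this guide.
        List<String> expected = List.of(
                "org.jsoup.Jsoup",
                "com.google.gson.Gson",
                "com.opencsv.CSVWriter");
        for (String name : expected) {
            System.out.println(name + ": " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it from IntelliJ like any other class in the project. A "MISSING" line suggests the corresponding dependency is not declared in `pom.xml` or the Maven sync has not finished.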
Each example has its own main method and can be run separately.
- Open `Main.java` (or `MainPagination.java` or `MainParallelScraper.java`).
- Right-click anywhere inside the file.
- Click “Run 'Main.main()'” (or similar).
- Check the Run tool window for output.


