acmdl

acmdl is a Python script that scrapes the ACM Digital Library for articles based on a specified keyword and downloads the corresponding PDF files. The script uses Selenium for web scraping and is configured via a config.toml file.

Features

- Searches the ACM Digital Library for articles matching a keyword.
- Downloads the PDFs of the articles found.
- Uses Selenium to navigate the ACM website.
- Reads the search keyword from a config.toml file.

Requirements

- Python 3.6+
- Google Chrome browser
- ChromeDriver
- The following Python packages:
  - selenium
  - requests
  - webdriver-manager
  - toml

Installation

Clone this repository.

Create a virtual environment and activate it:

python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install the required dependencies:
```
pip install -r requirements.txt
```

Alternative: Use `uv` for Dependency Management

As an alternative to pip and venv, you can manage dependencies with uv, a fast Python package and project manager written in Rust. uv can replace tools such as pip, pip-tools, pipx, poetry, and virtualenv.

To install uv:

On macOS and Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

On Windows:

powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

With pip:
```
pip install uv
```

To manage dependencies using uv, navigate to your project directory and run:

uv init
uv add selenium requests webdriver-manager toml

Configuration

Create a config.toml file in the root directory with the following content:

query = "Java nullpointer"

This file allows you to specify the keyword used for searching articles in the ACM Digital Library.
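
As a rough illustration of how the keyword could be read, the sketch below parses config.toml and returns the value of `query`. The helper name `load_query` and its validation are assumptions made for this example and are not taken from acmdl.py. It uses the standard-library `tomllib` when available (Python 3.11+) and otherwise falls back to the `toml` package listed under Requirements.

```python
from pathlib import Path

try:
    import tomllib  # standard library from Python 3.11 onward
    parse_toml = tomllib.loads
except ModuleNotFoundError:
    import toml  # third-party package listed under Requirements
    parse_toml = toml.loads


def load_query(path="config.toml"):
    """Return the search keyword stored under the `query` key.

    Hypothetical helper: the real script may read its config differently.
    """
    text = Path(path).read_text(encoding="utf-8")
    config = parse_toml(text)
    query = config.get("query")
    # Fail early with a clear message instead of searching for an empty string.
    if not isinstance(query, str) or not query.strip():
        raise ValueError(f"{path} must define a non-empty string 'query'")
    return query.strip()
```

Calling `load_query()` from the directory that contains config.toml would return the configured keyword. A missing or empty `query` raises `ValueError`.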

Usage

Run the script using:

python acmdl.py

The script will search for articles based on the keyword specified in config.toml and attempt to download the corresponding PDFs.
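
To make the search step more concrete, here is a minimal sketch of how a keyword might be turned into a search URL. The `doSearch` endpoint and the `AllField` parameter are assumptions based on how the ACM Digital Library's public search page appears to build its URLs. The function name `build_search_url` is hypothetical, and acmdl.py may construct its requests differently.

```python
from urllib.parse import urlencode

# Assumed search endpoint; verify it against the live site before relying on it.
ACM_SEARCH_ENDPOINT = "https://dl.acm.org/action/doSearch"


def build_search_url(keyword):
    """Return a search URL for `keyword` (hypothetical helper)."""
    # urlencode escapes spaces and special characters in the keyword.
    return ACM_SEARCH_ENDPOINT + "?" + urlencode({"AllField": keyword})
```

With the sample configuration, `build_search_url("Java nullpointer")` yields a URL ending in `AllField=Java+nullpointer`, which a Selenium driver could then open.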

Important Note on IP Blocking

The ACM Digital Library may block your IP address if it detects too many requests in a short period. This could result in your IP being blocked for up to a week. To avoid this, consider adding longer delays between requests or using a VPN to rotate your IP address.
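
One way to add longer delays is to pause between consecutive requests and back off further after failures. The sketch below is an illustration only: the function names, the 5-second base delay, and the 300-second cap are assumptions chosen for the example. They are not values taken from acmdl.py, and they do not guarantee that a block is avoided.

```python
import random
import time


def polite_delay(consecutive_failures=0, base=5.0, cap=300.0):
    """Return a wait time in seconds.

    The ceiling doubles with each consecutive failure, is capped at `cap`,
    and the result is randomized so requests are not evenly spaced.
    """
    ceiling = min(cap, base * (2 ** consecutive_failures))
    return random.uniform(ceiling / 2, ceiling)


def fetch_politely(urls, fetch, sleep=time.sleep, base=5.0):
    """Call `fetch(url)` for each URL, sleeping between consecutive calls.

    `fetch` is any callable supplied by the caller (hypothetical here).
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(polite_delay(base=base))
        results.append(fetch(url))
    return results
```

Passing `sleep` as a parameter keeps the helper easy to test without actually waiting. In real use the default `time.sleep` applies.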

Disclaimer

Use this script responsibly. The ACM Digital Library is a paid service, and excessive scraping may violate its terms of service.
