📄 REF Paper PDF Downloader and other useful tools

This Python script automatically downloads open-access PDFs for a list of DOIs using the Unpaywall API. It is particularly useful for research-assessment exercises such as the UK's Research Excellence Framework (REF), where full-text copies of academic outputs are needed.
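The lookup step can be sketched as follows, assuming the Unpaywall v2 endpoint `https://api.unpaywall.org/v2/{doi}?email=...` and that the PDF link is read from `best_oa_location.url_for_pdf` (falling back to any other entry in `oa_locations`). The function names are hypothetical, not the script's own, and the sketch uses the standard library's `urllib` instead of `requests` so it runs without extra packages.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

UNPAYWALL_BASE = "https://api.unpaywall.org/v2/"


def build_unpaywall_url(doi: str, email: str) -> str:
    """Build the Unpaywall v2 lookup URL for one DOI (hypothetical helper)."""
    # Keep "/" unescaped: the DOI is passed as a path, e.g. 10.1038/nature12373
    path = urllib.parse.quote(doi.strip(), safe="/")
    return UNPAYWALL_BASE + path + "?" + urllib.parse.urlencode({"email": email})


def pdf_url_from_record(record: dict) -> Optional[str]:
    """Return a direct PDF link from an Unpaywall record, or None if there is none."""
    if not record.get("is_oa"):
        return None
    best = record.get("best_oa_location") or {}
    if best.get("url_for_pdf"):
        return best["url_for_pdf"]
    # Fall back to any other OA location that exposes a PDF link
    for location in record.get("oa_locations") or []:
        if location.get("url_for_pdf"):
            return location["url_for_pdf"]
    return None


def lookup_pdf_url(doi: str, email: str, timeout: float = 30.0) -> Optional[str]:
    """Query Unpaywall for one DOI; return None on network, HTTP or JSON errors."""
    try:
        with urllib.request.urlopen(build_unpaywall_url(doi, email), timeout=timeout) as resp:
            return pdf_url_from_record(json.load(resp))
    except (OSError, ValueError):
        # URLError/HTTPError subclass OSError; malformed JSON raises ValueError
        return None
```

Separating URL building and record parsing from the network call keeps the parsing logic testable without hitting the API.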

🚀 Features

Fetches open-access (OA) PDF URLs via the Unpaywall API
Downloads PDFs and saves them locally
Handles errors gracefully
Avoids duplicate downloads
Rate-limits requests to respect API policies
Matches PDFs to DOIs
Merges JSON files
Cleans JSON files
Counts the number of samples in a JSON file
Extracts text from PDFs and records each file's status (success or failed)
Joins PDF text with its label and converts the result to JSONL
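Several of these features are JSON housekeeping. The sketch below shows one way to merge, clean, count and convert records to JSONL, assuming each JSON file holds a list of records with hypothetical `doi`, `text` and `label` keys; the repository's actual schema and cleaning rules may differ.

```python
import json
from pathlib import Path
from typing import Iterable


def merge_json_files(paths: Iterable[Path]) -> list:
    """Concatenate the record lists stored in several JSON files."""
    merged = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            merged.extend(json.load(f))  # assumes each file holds a JSON list
    return merged


def clean_records(records: list) -> list:
    """Drop records with empty text and repeated DOIs (hypothetical rules)."""
    seen, cleaned = set(), []
    for rec in records:
        text = (rec.get("text") or "").strip()
        doi = (rec.get("doi") or "").strip().lower()  # DOIs are case-insensitive
        if not text or (doi and doi in seen):
            continue
        if doi:
            seen.add(doi)
        cleaned.append({**rec, "doi": doi, "text": text})
    return cleaned


def write_jsonl(records: list, out_path: Path) -> int:
    """Write one {"text", "label"} object per line; return the sample count."""
    with open(out_path, "w", encoding="utf-8") as f:
        for rec in records:
            line = {"text": rec["text"], "label": rec.get("label")}
            f.write(json.dumps(line, ensure_ascii=False) + "\n")
    return len(records)


def count_samples(path: Path) -> int:
    """Count samples: non-empty lines for .jsonl, list length for .json."""
    with open(path, encoding="utf-8") as f:
        if path.suffix == ".jsonl":
            return sum(1 for line in f if line.strip())
        return len(json.load(f))
```

JSONL (one JSON object per line) is convenient for model training data because files can be streamed and appended without re-parsing the whole dataset.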

🛠️ Requirements

Python 3.x
requests and pandas
Install dependencies:
```
pip install requests pandas
```

📂 Project Structure

.
├── extracted_dois.csv     # Your input CSV with a 'DOI' column
├── ref_pdfs/              # Downloaded PDFs will be saved here
└── download_ref_pdfs.py   # Main script

📋 Setup

Insert your email in the script (Unpaywall requires it):
```
EMAIL = "your_email@example.com"
```
Prepare your CSV file named extracted_dois.csv with a column titled DOI.
Run the script:
```
python download_ref_pdfs.py
```
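The download loop described above can be sketched as follows, assuming the input CSV layout shown in Project Structure. For portability this sketch uses the standard library (`csv`, `urllib`) instead of pandas and requests, and takes the Unpaywall lookup as a `lookup` function argument (hypothetical; it should return a PDF URL or `None`) so the loop can be exercised offline. Names such as `read_dois` and `download_all` are illustrative, not the script's own.

```python
import csv
import shutil
import time
import urllib.request
from pathlib import Path
from typing import Callable, Optional


def read_dois(csv_path: Path) -> list:
    """Read the 'DOI' column, skipping blanks and duplicates while keeping order."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        dois = [(row.get("DOI") or "").strip() for row in csv.DictReader(f)]
    return list(dict.fromkeys(d for d in dois if d))


def download_file(url: str, dest: Path, timeout: float = 60.0) -> None:
    """Save a URL to disk via a .part file so a broken download is not kept."""
    tmp = dest.with_name(dest.name + ".part")
    req = urllib.request.Request(url, headers={"User-Agent": "ref-pdf-downloader"})
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(dest)  # only a complete file gets the final .pdf name


def download_all(dois, out_dir: Path,
                 lookup: Callable[[str], Optional[str]],
                 fetch: Callable[[str, Path], None] = download_file,
                 delay: float = 1.0) -> dict:
    """Download one PDF per DOI and return {doi: status}."""
    out_dir.mkdir(parents=True, exist_ok=True)
    status = {}
    for doi in dois:
        dest = out_dir / (doi.replace("/", "_") + ".pdf")
        if dest.exists():
            status[doi] = "skipped (already downloaded)"
            continue
        try:
            url = lookup(doi)
            if url is None:
                status[doi] = "no OA PDF"
            else:
                fetch(url, dest)
                status[doi] = "downloaded"
        except OSError as exc:  # network and file errors; the loop keeps going
            status[doi] = f"failed: {exc}"
        time.sleep(delay)  # pause between API calls to respect rate limits
    return status
```

Typical use would be `download_all(read_dois(Path("extracted_dois.csv")), Path("ref_pdfs"), lookup=my_lookup)`, where `my_lookup` is your own function wrapping the Unpaywall request. Because existing files are skipped, rerunning after an interruption only fetches what is still missing.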

📌 Notes

The script uses doi.replace("/", "_") to build valid filenames, because DOIs contain slashes (for example, 10.1038/nature12373 becomes 10.1038_nature12373).
A 1-second delay between requests helps you stay compliant with API usage limits.
Only works for open-access papers: Unpaywall provides no PDF link for closed-access outputs.
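The same filename rule can be used to match downloaded PDFs back to their DOIs, which the features list mentions. The sketch below is an assumption about how that could work, not the contents of matcher.py: it applies `doi.replace("/", "_")` forward to every DOI and looks the results up, rather than turning `_` back into `/`, because some DOIs already contain underscores. The `.pdf` extension and the function names are assumptions.

```python
def doi_to_filename(doi: str) -> str:
    """Apply the script's rule: '/' is not allowed in filenames, so use '_'."""
    return doi.replace("/", "_") + ".pdf"


def match_pdfs_to_dois(dois, pdf_names) -> tuple:
    """Pair PDF filenames with the DOIs they were saved from.

    Returns (matched {doi: filename}, DOIs with no PDF, PDFs with no DOI).
    Matching is case-insensitive because DOIs are case-insensitive.
    """
    by_name = {doi_to_filename(d).lower(): d for d in dois}
    matched, unmatched_pdfs = {}, []
    for name in pdf_names:
        doi = by_name.get(name.lower())
        if doi is None:
            unmatched_pdfs.append(name)
        else:
            matched[doi] = name
    missing = [d for d in dois if d not in matched]
    return matched, missing, unmatched_pdfs
```

The "missing" list is a convenient to-do list of outputs that still need a full text from another source.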

🧠 Attribution

This script uses the Unpaywall API, which provides free access to millions of open-access research papers.

📧 Contact

Created by Hazeeb – feel free to reach out for questions or improvements!

Repository files:
.gitignore
README.md
chunker.py
chunker2.py
cleaner.py
combiner.py
counter.py
doichecker.py
downloadloop.py
extracted_dois.csv
matcher.py
merger.py
pypaperbot.py
unpawall api.py
