A toolkit for downloading and processing archive.org data
This project uses streaming to download and process archive.org data. It is light on CPU and memory, so you can churn through enormous amounts of data relatively quickly.
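The streaming approach looks roughly like this: read from any byte source one line at a time, so memory stays constant no matter how large the input is. This is an illustrative sketch, not the toolkit's actual code; `process_line` is a hypothetical stand-in for the real per-record work.

```rust
use std::io::{BufRead, BufReader, Cursor};

// Hypothetical per-line handler; the real tools do archive.org-specific work.
fn process_line(line: &str) -> usize {
    line.len()
}

fn main() {
    // Any `Read` source works here: a file, a decompressor, or an HTTP body.
    let input = Cursor::new("first line\nsecond line\n");
    let reader = BufReader::new(input);

    let mut total = 0;
    // Lines are handled one at a time, so memory use stays flat
    // regardless of input size.
    for line in reader.lines() {
        total += process_line(&line.unwrap());
    }
    println!("processed {total} bytes of line content");
}
```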
You can find the tools in the src/bin/ directory. Start by building:

```
cargo build --release
```

This builds the binary to target/release/rad. You can then run it from there:

```
./target/release/rad
```

This prints the help for the program.
This project is a personal toolkit that probably requires some Rust knowledge to run. The best parts:
- It works
- It's fast (a few hours to parse all pastebin data)
- Parallel processing (did I say it's fast?)
- No error handling – catastrophic failures on network hiccups (good for people who like to live dangerously)
- Hopefully gets me some edge finding bug bounties
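The parallel processing mentioned above can be sketched with plain std threads (an assumption for illustration; the actual tools may partition work differently): split the input into chunks, parse each chunk on its own thread, and collect the results over a channel. `parse_chunk` is a hypothetical stand-in for the real parsing.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical chunk handler standing in for the real parsing work.
fn parse_chunk(chunk: &[String]) -> usize {
    chunk.iter().map(|l| l.len()).sum()
}

fn main() {
    let lines: Vec<String> = (0..1000).map(|i| format!("paste-{i}")).collect();
    let workers = 4;
    let chunk_size = (lines.len() + workers - 1) / workers;

    let (tx, rx) = mpsc::channel();
    for chunk in lines.chunks(chunk_size) {
        let tx = tx.clone();
        let chunk = chunk.to_vec();
        // Each worker parses its own slice of the data independently.
        thread::spawn(move || tx.send(parse_chunk(&chunk)).unwrap());
    }
    drop(tx); // close the channel so the iterator below terminates

    let total: usize = rx.iter().sum();
    println!("parsed {total} bytes across {workers} workers");
}
```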
Please check out the blog post from archive.org titled "Let us serve you, but don’t bring us down".
TL;DR: don't run this from multiple VPS instances at once, or you will effectively DDoS archive.org.
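If you want to stay well under archive.org's comfort level, one option is a client-side throttle in front of your download loop. This is a minimal std-only sketch (the toolkit itself does not necessarily include one; that is an assumption): it enforces a fixed minimum interval between requests.

```rust
use std::thread;
use std::time::{Duration, Instant};

// A minimal fixed-interval throttle: call `wait` before each request so that
// at most one request is issued per `interval`.
struct Throttle {
    interval: Duration,
    last: Option<Instant>,
}

impl Throttle {
    fn new(interval: Duration) -> Self {
        Throttle { interval, last: None }
    }

    fn wait(&mut self) {
        if let Some(last) = self.last {
            let elapsed = last.elapsed();
            if elapsed < self.interval {
                // Sleep off the remainder of the interval.
                thread::sleep(self.interval - elapsed);
            }
        }
        self.last = Some(Instant::now());
    }
}

fn main() {
    let mut throttle = Throttle::new(Duration::from_millis(100));
    let start = Instant::now();
    for i in 0..3 {
        throttle.wait();
        // A real download would go here; we just log the timing.
        println!("request {i} at {:?}", start.elapsed());
    }
}
```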
If you'd like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.
The code in this project is licensed under the MIT license.
Logo was made using DeepFloyd IF, with the prompt:
Logo for a project called "RAD", pixel art, isometric, vibrant, archive, download, diskette, 8-bit retro 90-s design