A Python script to back up the contents of Yahoo! groups, whether private or public.
You will need:
- Python 3.5+
- a MongoDB instance
- a computer with a GUI, since the scraping is done with Selenium (which makes it possible to handle private groups)
- a Selenium driver for your browser (ChromeDriver is recommended, as Firefox is no longer compatible with this script)
macOS users in particular may also need a recent version of icu4c, available through Homebrew.
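Some of the requirements above can be checked before installing anything. The following is a minimal sketch, not part of this repository: it only verifies the Python version and looks for a driver executable on the PATH. The name `chromedriver` is an assumption (the usual executable name for ChromeDriver), and the MongoDB instance and GUI are not checked.

```python
import shutil
import sys

MIN_PYTHON = (3, 5)  # minimum version stated in the requirements above


def check_prerequisites(driver_binary="chromedriver"):
    """Return a dict mapping each checked requirement to True/False.

    driver_binary is an assumption: the usual executable name for
    ChromeDriver; adjust it if yours is installed under another name.
    """
    return {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        # shutil.which returns None when the executable is not on PATH
        "driver": shutil.which(driver_binary) is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print("{:<8} {}".format(name, "ok" if ok else "MISSING"))
```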
git clone https://github.com/csaftoiu/yahoo-groups-backup.git
cd yahoo-groups-backup
pip install -r requirements.txt
cp redactions.yaml.template redactions.yaml  # edit this file if you want
cp settings.yaml.template settings.yaml  # definitely edit this file with your yahoo credentials
To scrape an entire group, say the concatenative group:
./yahoo-groups-backup.py scrape_messages --driver chrome concatenative
This stores all the messages in a MongoDB instance (default localhost:27017), in a database with the same name as the group.
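To confirm that a scrape actually wrote something, you can look inside that database. The sketch below is not part of this repository and assumes pymongo is installed. It makes no assumption about how the script names its collections; it simply lists whatever collections exist and counts their documents. The helper names `connect` and `collection_counts` are hypothetical.

```python
def connect(group_name, host="localhost", port=27017):
    """Open the database named after the group (requires pymongo)."""
    from pymongo import MongoClient  # imported lazily; pip install pymongo
    return MongoClient(host, port)[group_name]


def collection_counts(db):
    """Return {collection name: document count} for every collection in db."""
    return {
        name: db[name].count_documents({})
        for name in sorted(db.list_collection_names())
    }


# Usage, with MongoDB running and the group already scraped:
#     for name, count in collection_counts(connect("concatenative")).items():
#         print(name, count)
```

An empty result most likely means nothing has been scraped into that database yet, or that the group name does not match.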
To scrape the files as well (though this group has no files):
./yahoo-groups-backup.py scrape_files --driver chrome concatenative
To dump the scraped messages as a human-friendly, fully static (i.e. viewable from the file system) website:
./yahoo-groups-backup.py dump_site concatenative concatenative_static_site
Then simply open concatenative_static_site/index.html and browse!
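If you back up several groups, the three steps above can be chained. This is a rough sketch rather than a feature of the script: it uses only the subcommands and the `--driver` option shown in this README, and the function names `build_commands` and `backup_group` are hypothetical. It assumes it is run from the repository root.

```python
import subprocess

SCRIPT = "./yahoo-groups-backup.py"


def build_commands(group, site_dir, driver="chrome"):
    """Return the three command lines from this README, in order."""
    return [
        [SCRIPT, "scrape_messages", "--driver", driver, group],
        [SCRIPT, "scrape_files", "--driver", driver, group],
        [SCRIPT, "dump_site", group, site_dir],
    ]


def backup_group(group, site_dir, driver="chrome"):
    """Run each step in turn, stopping at the first one that fails."""
    for command in build_commands(group, site_dir, driver):
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(command, check=True)


# Usage, from the repository root:
#     backup_group("concatenative", "concatenative_static_site")
```

Stopping at the first failure avoids dumping a site from a half-finished scrape.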
To see the full usage:
I'm getting some weird error
Older versions of Selenium might be troublesome. Try:
pip install -U selenium