Reports bot maintains reports and other useful things for WikiProjects.
First, clone the bot:
    git clone email@example.com:harej/reports_bot.git
    cd reports_bot
Python 3.4+ is required. You should set up and activate a virtual environment in the cloned directory, though this is optional. The following workaround may be necessary on Debian systems:
    python3 -m venv --without-pip venv
    source venv/bin/activate
    curl https://bootstrap.pypa.io/get-pip.py | python
Next, install these dependencies:
    pip install mwoauth requests PyYAML oursql3 mwparserfromhell BTrees \
        mediawiki-utilities numpy scikit-learn
You also need Pywikibot. You can pip install pywikibot, but I recommend installing it from source to get the latest updates:
    cd venv
    git clone https://gerrit.wikimedia.org/r/pywikibot/core.git pywikibot
    cd pywikibot
    git submodule update --init
    python setup.py develop
If you set up a virtualenv, run the following command to ensure the bot's task runner always uses it:
sed -e "1s|.*|#! $PWD/venv/bin/python|" -i "" ./run
Depending on your setup, you may wish to create a separate, unprivileged user for the bot. The recommended method is:
sudo adduser --system --home /path/to/reportsbot reportsbot
If so, make sure to create the bot's config and logs directories with the correct ownership:
mkdir config logs && sudo chown reportsbot config logs
Reports bot uses a MySQL database to store its on-wiki config, the WikiProject page indices, and some other information. Create this database using the schema in schema.sql. You may change the database name; for example, on Labs, it should probably be something like sXXXXX__wpx. Make note of the name for the next step.
The bot requires a config/config.yml file for itself and a config/user-config.py file for Pywikibot. The easiest way to create these is with the configuration assistant:
./run -q configure
(As described below, you should use
sudo ./run -q configure if the bot is
running under an unprivileged user.)
Afterwards, you may edit these files manually whenever necessary.
Reports bot's standard tasks are located in the tasks/ directory. A ./run script is provided to make things simple, assuming you've followed the standard setup procedure above.
To run a task located at
For a full description of the command-line interface:
If you're using a separate user for the bot, be aware that the ./run script tries to ensure that it is running under the account that owns the directory. You can run jobs from reportsbot's crontab using the plain ./run syntax, but manual jobs under your own account should be initiated with sudo ./run.
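The ownership check described above can be illustrated with a short sketch. This is not taken from the ./run script; the function name `running_as_owner` is hypothetical, and the directory to check is left to the caller.

```python
import os


def running_as_owner(path):
    """Return True if the current effective user owns `path` (Unix only).

    Hypothetical helper: compares the owner recorded on the path with the
    effective user ID of the running process.
    """
    return os.stat(path).st_uid == os.geteuid()
```

A runner could call such a helper at startup and refuse to continue, or re-execute itself under the owning account, when it returns False.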
If you prefer to use reportsbot as a regular Python package and execute task files at arbitrary locations, you can use this syntax, which supports the same command-line interface:
    python3 -m reportsbot.cli full/path/to/task.py
    python3 -m reportsbot.cli --help
The bot stores logs in the
logs/ directory unless
./run. A few different kinds of logs are kept:
- all.log stores non-DEBUG level logs for all tasks. It automatically rotates when it grows large.
- all.err stores WARNING-level logs and above for all tasks. It automatically rotates when it grows large.
- <sitename>/<taskname>.log stores non-DEBUG level logs for the specified task running on the specified site. It automatically rotates nightly.
- <sitename>/<taskname>.err stores WARNING-level logs and above. It automatically rotates when it grows large.
- <sitename>/<taskname>.log.verbose stores full logs for the last run of the task. It is cleared at the start of each run.
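For readers who want to picture this layout, here is a minimal sketch of how the five files could be wired up with Python's standard logging module. It is an illustration only, not the bot's actual logging code: the function name `build_task_logger`, the size limit, the backup counts, and the message format are all assumptions.

```python
import logging
import logging.handlers
import os


def build_task_logger(log_dir, site, task, max_bytes=1024 * 1024):
    """Create a logger writing to the five files described above (sketch)."""
    os.makedirs(os.path.join(log_dir, site), exist_ok=True)
    base = os.path.join(log_dir, site, task)
    rotating = logging.handlers.RotatingFileHandler
    timed = logging.handlers.TimedRotatingFileHandler

    # (handler, minimum level) pairs; sizes and backup counts are assumed.
    handlers = [
        (rotating(os.path.join(log_dir, "all.log"),
                  maxBytes=max_bytes, backupCount=4), logging.INFO),
        (rotating(os.path.join(log_dir, "all.err"),
                  maxBytes=max_bytes, backupCount=4), logging.WARNING),
        (timed(base + ".log", when="midnight", backupCount=7), logging.INFO),
        (rotating(base + ".err",
                  maxBytes=max_bytes, backupCount=4), logging.WARNING),
        # mode="w" truncates the file, so it only holds the latest run.
        (logging.FileHandler(base + ".log.verbose", mode="w"), logging.DEBUG),
    ]

    logger = logging.getLogger("sketch.{}.{}".format(site, task))
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    for handler, level in handlers:
        handler.setLevel(level)
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

Calling the function twice for the same site and task would attach duplicate handlers, so a real implementation would build each logger once per run.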
The bot also prints all logs (including DEBUG-level) to standard error unless -q (or --quiet) is passed to ./run, in which case only ERROR-level and higher are printed. This option can be useful for cron jobs; if you set up cron to email you the output of ./run -q, you will be notified immediately when a task logs an error.
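The effect of the quiet option on standard error can be sketched as follows. This is an assumption about how such a switch might be implemented with argparse and the logging module, not the bot's own code; the function name `stderr_handler` is hypothetical.

```python
import argparse
import logging
import sys


def stderr_handler(argv):
    """Build a stderr log handler whose level depends on -q/--quiet (sketch)."""
    parser = argparse.ArgumentParser(prog="run")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="only print ERROR-level logs and above")
    # Ignore any other arguments; this sketch only cares about the quiet flag.
    args, _unknown = parser.parse_known_args(argv)

    handler = logging.StreamHandler(sys.stderr)
    handler.setLevel(logging.ERROR if args.quiet else logging.DEBUG)
    return handler
```

Because cron typically mails whatever a job writes to its output streams, a handler limited to ERROR and above keeps successful runs silent.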
A number of tasks are provided. Advice on developing your own is given at the end of this section.
load_project_config: Loads WikiProject-specific configuration from the wiki and stores it in the bot's database.
metrics: Updates monthly metrics on the number of articles in a project.
new_discussions: Provides a list of new discussions within a WikiProject's scope.
update_members: Updates WikiProject membership lists based on WikiProjectCard transclusions.
update_project_index: Updates the index of articles associated with each project.
To create new tasks, you can follow the skeleton in
Important things to keep in mind:
- The name of the task class doesn't matter (the bot searches by filename), but it should be descriptive.
- The run method is the only method called by the task runner, other than __init__, which should only do inexpensive setup if you override it.
- There should only be one Task subclass per module.
- __all__ is used to identify which class to run in case multiple exist in the module namespace, for example if you import other Task subclasses to use their methods.
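The role of __all__ in the points above can be sketched as follows. This is a guess at how a runner might pick the class to run, not the bot's actual lookup code; the `Task` class here is a stand-in for the bot's base class and `find_task_class` is a hypothetical name.

```python
import inspect


class Task:
    """Stand-in for the bot's Task base class (hypothetical, for illustration)."""


def find_task_class(module, base=Task):
    """Return the single Task subclass that a task module exports (sketch)."""
    # Prefer the names listed in __all__; otherwise consider every public name.
    names = getattr(module, "__all__", None)
    if names is None:
        names = [name for name in vars(module) if not name.startswith("_")]

    found = [
        obj for obj in (getattr(module, name) for name in names)
        if inspect.isclass(obj) and issubclass(obj, base) and obj is not base
    ]
    if len(found) != 1:
        raise LookupError(
            "expected exactly one task class, found {}".format(len(found)))
    return found[0]
```

With this approach, a module that imports a second Task subclass for its helper methods stays unambiguous as long as __all__ names only the class meant to run.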
The task has access to two important attributes:
- self._bot is the Bot instance, which provides the following functionality:
  - self._bot.site: the Pywikibot site instance
  - self._bot.wikidb: a database connection to the wiki replica
  - self._bot.localdb: a connection to the bot's local database
  - self._bot.wikidata: an interface to Wikidata
  Some methods are available for working with WikiProjects in a structured manner. See the reportsbot.bot.Bot class documentation for details.
- self._logger is the Logger instance that you should use for all log messages. Writing to stdout directly should be avoided.