magnusmanske/petscan_rs

PetScan

PetScan is a powerful querying tool for Wikimedia. A query is prepared in the PetScan submission form.

Live: https://petscan.wmflabs.org/

Docs: https://meta.wikimedia.org/wiki/PetScan/en

Development

Prerequisites

Set up a local MySQL database on port 3308 and create the following tables:

CREATE TABLE `query` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `querystring` longtext DEFAULT NULL,
  `created` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

CREATE TABLE `started_queries` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `querystring` longtext DEFAULT NULL,
  `created` varchar(100) DEFAULT NULL,
  `process_id` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

Forward replicas

ssh login.toolforge.org -L 3306:XXX.analytics.db.svc.wikimedia.cloud:3306 -L 3309:wikidatawiki.analytics.db.svc.wikimedia.cloud:3306

Replace XXX with the database name of the wiki to be queried (e.g. commonswiki).

🔗 https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database

Create config.json

Enter the local ports from the ssh command above into port_mapping, keyed by their respective wikis:

{
  "host": "127.0.0.1",
  "user": "<databaseusername>",
  "password": "<databasepassword>",
  "schema": "petscan",
  "http_port": 8000,
  "timeout": 30000,
  "restart-code": "",
  "port_mapping":{
    "<xxx>":3306,
    "wikidatawiki":3309
  },
  "mysql": [
    [
      "<replicausername>",
      "<replicapassword>"
    ]
  ]
}

The credentials for the database replicas can be found in ~/replica.my.cnf of the Toolforge user account.
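
Because port_mapping and the ssh forwards must agree, it can help to derive one from the other. The following is a minimal sketch, not part of PetScan itself: the helper names `forward_flags` and `ssh_command` are hypothetical, and the host pattern and remote port 3306 are simply copied from the ssh command above.

```python
# Hypothetical helper (not part of petscan_rs): build the ssh port-forwarding
# command from the same wiki -> local port mapping used in config.json.

# Host suffix and remote port as they appear in the ssh command in this README.
REPLICA_HOST_SUFFIX = "analytics.db.svc.wikimedia.cloud"
REPLICA_PORT = 3306


def forward_flags(port_mapping):
    """Return one '-L local:host:remote' flag per wiki, ordered by local port."""
    flags = []
    for wiki, local_port in sorted(port_mapping.items(), key=lambda kv: kv[1]):
        flags.append(
            f"-L {local_port}:{wiki}.{REPLICA_HOST_SUFFIX}:{REPLICA_PORT}"
        )
    return flags


def ssh_command(port_mapping, login_host="login.toolforge.org"):
    """Assemble the full ssh command line as a single string."""
    return " ".join(["ssh", login_host, *forward_flags(port_mapping)])


if __name__ == "__main__":
    # Example mapping: commonswiki stands in for the <xxx> placeholder.
    mapping = {"commonswiki": 3306, "wikidatawiki": 3309}
    print(ssh_command(mapping))
```

Keeping the mapping in one place means that adding another wiki only requires a new entry with an unused local port, and the matching `-L` flag follows from it.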

Start server

cargo run

Run a query from the command line

You can run a query from the command line by passing the URL parameters as a single argument. Output will be in the specified format, except HTML, which will be automatically rewritten to JSON.

cargo run -- 'url_parameters'
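
One way to obtain the parameter string is to copy a query URL from the PetScan form and keep only the part after the `?`. The sketch below does that and quotes the result for the shell. The function name `cli_command` is hypothetical, and the parameter names in the example URL are illustrative only; use whatever your own query URL contains.

```python
# Hypothetical helper (not part of petscan_rs): turn a PetScan query URL into
# the equivalent `cargo run -- '...'` invocation.
import shlex
from urllib.parse import urlsplit


def cli_command(petscan_url):
    """Extract the query string from a URL and wrap it in a cargo command."""
    query = urlsplit(petscan_url).query
    if not query:
        raise ValueError("URL contains no query parameters")
    # shlex.quote adds single quotes so '&' is not interpreted by the shell.
    return f"cargo run -- {shlex.quote(query)}"


if __name__ == "__main__":
    # Illustrative URL; the parameter names here are assumptions.
    url = "https://petscan.wmflabs.org/?language=en&format=json"
    print(cli_command(url))
```

Quoting matters here because an unquoted `&` would cause the shell to run the command in the background and drop the remaining parameters.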
