strategist922/RSemanticMediaWikiBot

 
 


RSemanticMediaWikiBot

This is a bot developed in R for editing Semantic MediaWiki templates. This code is very much in development, and it is highly recommended to test it on a few pages before letting it loose on a wiki.

The primary motivation for Yet Another MediaWiki Bot Framework is that this bot is specifically designed to help with batch editing of data contained within the Semantic Templates that are commonly used with Semantic MediaWiki.

The main idea is that this bot converts templates into data structures in R. For example, it allows you to read from a wiki page a template such as:

{{City
| point=52.015, 4.356667
| country=Netherlands
}}

...and then convert this data into a list within R. The data contained in the list can be accessed via template$data$point, template$data$country, etc.
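To make the idea concrete, here is a minimal sketch in base R of turning a flat template call into a list. `parseFlatTemplate` is a hypothetical helper written for this README, not the parser the package uses, and it assumes no nested templates and no `|` or `=` inside values. It stores the parameters under `data`, mirroring the `template$data$...` access used in the usage sections further down.

```r
# Hypothetical helper, NOT part of RSemanticMediaWikiBot: parse one flat
# template call (no nesting, no "|" or "=" inside values) into a list.
parseFlatTemplate <- function(wikiText) {
  # Strip the surrounding double braces
  inner <- sub("^\\s*\\{\\{", "", wikiText)
  inner <- sub("\\}\\}\\s*$", "", inner)
  # Split on "|": the first piece is the template name,
  # the remaining pieces are key=value pairs
  pieces <- trimws(strsplit(inner, "|", fixed = TRUE)[[1]])
  params <- list()
  for (piece in pieces[-1]) {
    eq <- regexpr("=", piece, fixed = TRUE)[1]
    if (eq < 1) next  # this sketch skips unnamed (positional) parameters
    key <- trimws(substr(piece, 1, eq - 1))
    params[[key]] <- trimws(substr(piece, eq + 1, nchar(piece)))
  }
  list(name = pieces[1], data = params)
}

wikiText <- "{{City\n| point=52.015, 4.356667\n| country=Netherlands\n}}"
city <- parseFlatTemplate(wikiText)
print(city$data$point)
print(city$data$country)
```

All values come back as character strings in this sketch; converting `point` into numeric coordinates would be a separate step.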

##Installation

Once you check out the code, you can install the package via:

cd Directory/Of/RSemanticMediaWikiBot
bash ./checkBuildAndInstall.sh

This runs a shell script which performs the steps below:

  1. Check that everything is ok:
cd Directory/Of/RSemanticMediaWikiBot
R CMD check .
  2. Build:
cd ..
R CMD build RSemanticMediaWikiBot
  3. Install it so that it is accessible within the R environment:
sudo R CMD INSTALL RSemanticMediaWikiBot_0.1.tar.gz

The functions can then be accessed from within R code by first declaring:

library(RSemanticMediaWikiBot)

##Basic usage - logging in, reading, editing

###Logging in

#TODO fill these in based on your own configuration
username = "USERNAME"
password = "PASSWORD"
apiURL = "http://my.wiki.com/wiki/api.php"

bot = initializeBot(apiURL) #initialize the bot
login(username, password, bot) #login to the wiki

###Reading page text

text = read(title="MyWikiPage", bot) 

###Editing and saving page text

edit(title="MyWikiPage", 
     text="this is the new page text", 
     bot, 
     summary="my edit summary")

###Deleting pages

delete(pageName, bot, reason="deleting old page")

##Working with template data

###Extracting templates

Assuming that you are not working with multiple-instance templates, you can retrieve the data in a template as follows:

template = getTemplateByName("MyTemplateName", "MyWikiPage", bot)[[1]]
#[[1]] is needed as a list is returned
#If using multiple-instance templates, then multiple templates will be returned

###Getting and modifying values of template parameters

valueOfTemplate = template$data$NameOfTemplateParameter

You can then modify this value by:

template$data$NameOfTemplateParameter = newValue

###Removing template parameters

If you want to completely remove a parameter from a template (i.e. both the key and the value), for example changing this:

{{City
| point=52.015, 4.356667
| country=Netherlands
}}

to this:

{{City
| country=Netherlands
}}

then you can just do:

template$data$point = NULL
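This works because of how R lists behave: assigning NULL to an element deletes it. The sketch below shows this on a plain list that stands in for a template object. The `name` field and the `renderTemplate` helper are assumptions made for illustration only and are not part of the package.

```r
# Stand-in for a template object: a plain list with parameters under `data`.
# The `name` field is an assumption for this sketch.
template <- list(
  name = "City",
  data = list(point = "52.015, 4.356667", country = "Netherlands")
)

# Assigning NULL removes the element entirely (both key and value)
template$data$point <- NULL
print(names(template$data))

# Hypothetical renderer, only used here to show the resulting wikitext
renderTemplate <- function(tpl) {
  params <- paste0("| ", names(tpl$data), "=", unlist(tpl$data),
                   collapse = "\n")
  paste0("{{", tpl$name, "\n", params, "\n}}")
}
cat(renderTemplate(template), "\n")
```

To keep the parameter but blank its value, assign an empty string (`template$data$point = ""`) instead of NULL.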

###Writing the template back to the wiki page

The template with its new value can then be written back to the wiki as follows:

writeTemplateToPage(template, bot, editSummary="testing bot")

The template contains information about the page which it came from, so the name of the page does not need to be specified.

###Writing Spreadsheet Data to Multiple Pages

Spreadsheet data loaded into a data frame makes it easy to write data to templates on multiple pages. The first column of the data frame specifies the name of the page, and the second column is the name of the template to write to. The headers of the remaining columns need to correspond to the names of the parameters in that template. By default, existing values are not overwritten unless you explicitly ask for it. A list of the pages on which an existing value for a parameter was found is returned.

# default - will not overwrite existing parameter values that are already set
errorDFEntries = writeDataFrameToPageTemplates(dataFrame, bot, editSummary="what the bot is doing")

# overwrite existing values
errorDFEntries = writeDataFrameToPageTemplates(dataFrame, bot, overWriteConflicts=TRUE, editSummary="what the bot is doing")
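As a rough illustration of the layout described above, the sketch below builds such a data frame by hand. The column names `pageName` and `templateName` and all cell values are placeholders. `mergeParams` is a hypothetical helper, not the package's implementation, and it assumes that a conflict means "the parameter is already set to a different value"; the package may define conflicts differently.

```r
# Column 1 = page name, column 2 = template name, remaining columns = parameters.
# Column names and values are placeholders for illustration.
dataFrame <- data.frame(
  pageName     = c("MyWikiPage", "MyOtherWikiPage"),
  templateName = c("City", "City"),
  point        = c("52.015, 4.356667", "0, 0"),
  country      = c("Netherlands", "Netherlands"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (NOT the package's code) sketching the conflict rule:
# existing values are kept unless overWriteConflicts is TRUE.
mergeParams <- function(existing, incoming, overWriteConflicts = FALSE) {
  conflicts <- character(0)
  for (key in names(incoming)) {
    alreadySet <- !is.null(existing[[key]]) && nzchar(existing[[key]])
    if (alreadySet && existing[[key]] != incoming[[key]]) {
      conflicts <- c(conflicts, key)
      if (!overWriteConflicts) next  # keep the value already on the wiki
    }
    existing[[key]] <- incoming[[key]]
  }
  list(data = existing, conflicts = conflicts)
}

existing <- list(point = "1, 1")               # value already on the page
incoming <- as.list(dataFrame[1, -(1:2)])      # parameters from row 1

kept     <- mergeParams(existing, incoming)
replaced <- mergeParams(existing, incoming, overWriteConflicts = TRUE)
print(kept$conflicts)
print(kept$data$point)
print(replaced$data$point)
```

In practice the data frame would usually come from a spreadsheet export, for example via `read.csv("myData.csv", stringsAsFactors = FALSE)`, where the file name is a placeholder.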

###Writing a Data Frame to a Table on a Single Page

The syntax for a sortable wikitable can be generated from a data frame. The code does not yet place the table on a page for you; it is up to you to paste the pieces together in a useful way.

# get the wiki table syntax
wikiTable = getWikiTableTextForDataFrame(df)

# put some text before and after the table
pageText = paste(someText, "\n\n", wikiTable, "\n\n", someMoreText, sep="")
  
# write this all to some wiki page
edit(title=pageTitle,
     text=pageText,
     bot,
     summary="adding a table")
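For reference, the sketch below shows roughly what sortable wikitable syntax generated from a data frame looks like. `sketchWikiTable` is a hypothetical stand-in written for this README; the output of the package's own `getWikiTableTextForDataFrame` may be formatted differently.

```r
# Hypothetical stand-in, NOT the package's getWikiTableTextForDataFrame:
# build MediaWiki sortable-table syntax from a data frame.
sketchWikiTable <- function(df) {
  # One header cell per column
  header <- paste0("! ", names(df), collapse = "\n")
  # One "|-" row separator plus one cell line per column, for every row
  rows <- vapply(seq_len(nrow(df)), function(i) {
    cells <- vapply(df[i, , drop = FALSE], as.character, character(1))
    paste0("|-\n", paste0("| ", cells, collapse = "\n"))
  }, character(1))
  paste0('{| class="wikitable sortable"\n', header, "\n",
         paste(rows, collapse = "\n"), "\n|}")
}

# Placeholder data for illustration
df <- data.frame(
  city    = c("A", "B"),
  country = c("X", "Y"),
  stringsAsFactors = FALSE
)
tableText <- sketchWikiTable(df)
cat(tableText, "\n")
```

The `sortable` class is what makes MediaWiki render clickable sort arrows in the header row.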

##Future development/known issues

  • No support yet for multiple-instance templates. There needs to be a way to distinguish if one wants to edit an existing one, or add another.
  • No support yet for adding a new template to a page.
  • When editing a page, no check is done to see whether the edit will create a new page.
  • Nested template calls may not be parsed correctly.
  • If the code is not able to connect to the wiki API, then it will terminate instead of trying to connect again. In practice, this means that you may have to run a script multiple times if you have several thousand edits.
  • There seems to be a memory leak if you read and/or edit around 10,000+ pages.
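
Until reconnecting is handled by the bot itself, one possible workaround for the connection issue is to wrap individual calls in a retry loop. The sketch below is not part of the package: `withRetry` is a hypothetical helper, and it only helps if the failed connection surfaces as an ordinary R error that `tryCatch` can intercept, which has not been verified for this code.

```r
# Hypothetical helper, NOT part of RSemanticMediaWikiBot: call `action`
# until it succeeds or maxAttempts is reached.
# Assumption: a failed API call raises a normal R error.
withRetry <- function(action, maxAttempts = 3, waitSeconds = 5) {
  for (attempt in seq_len(maxAttempts)) {
    result <- tryCatch(action(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < maxAttempts) Sys.sleep(waitSeconds)
  }
  stop("Giving up after ", maxAttempts, " attempts: ",
       conditionMessage(result))
}

# Self-contained demonstration with a function that fails twice
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("connection failed")
  "page text"
}
result <- withRetry(flaky, maxAttempts = 5, waitSeconds = 0)
print(result)

# With the bot, usage might look like this (untested):
# text <- withRetry(function() read(title = "MyWikiPage", bot))
```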
