Backstory

rado edited this page Jan 10, 2021 · 8 revisions

Botjagwar is a project whose name was derived from my own username on Wikimedia projects. Tired of making repetitive manual changes on Wikipedia, which were also beginning to cause repetitive strain injuries due to prolonged work on an inappropriate keyboard, I decided to make my life a little easier by writing software to do some of the work for me.

Questions

So right in my teen years I had to answer these questions:

  1. What software am I going to build?
  2. How am I going to build it?
  3. Why would I like to build software?

2009-2011: Fetus years of the programming adventure

At that moment, in 2009, I was 15, and even though I wanted a career in computer science, I didn't yet know how to write software or why I wanted to write it. With Wikipedia, and bot programming that I knew was possible, the why part was mostly answered. The next biggest problem was to answer the what and the how.

From May to December 2010, I was trying to find my answers to the what question. As for the how, I was at that moment simply copying working code found here and there on the Internet and making my own clumsy changes to solve my then-problems. (I must confess I am still doing that to some extent.)

As a self-taught teenage amateur developer in 2010, I was quite unfamiliar with design patterns. Reading or debugging other people's code, or integrating their frameworks, was simply not part of my methodology, which sometimes resulted in clumsy wheel reinvention. Reading a proper programming book might have helped back then, but I found them too "abstract" and in the wrong language (library books were in English and about Java, whereas I was hoping for books in French about Python).

On-wiki, the Malagasy Wiktionary reached 1 million entries, mostly thanks to verb inflections in... Volapuk. Due to exam preparation, I stopped working on-wiki for 5 months.

2011-2013: Early years

In mid-2011, when exams were over, I learned some basic regex and web-crawling skills so that I could populate the Malagasy Wiktionary with a little more than Volapuk inflection pages. I wrote a parser to scrape dictionary pages and build French-to-Malagasy and English-to-Malagasy entries on Wiktionary, and to scrape Malagasy newspapers for words to create.

In mid-2012, my regex skills were sharp enough to scrape both the English and French Wiktionaries for foreign-language entries. In parallel, inflections for Malagasy words were created, resulting in a massive creation of inflection entries for Malagasy, pretty much like Volapuk a year earlier.

2013-2016: Plateau years

From 2013 to 2015, the data I had, which was two huge text files containing English-Malagasy and French-Malagasy dictionaries, was used to translate other languages into Malagasy, resulting in a boom in the number of languages translated into Malagasy, from maybe 8 to 3,000. I also implemented an IRC client connecting to Wikimedia IRC servers to perform translation in near real time, triggered by edits on the French or the English Wiktionary. The same IRC client also helped implement a real-time interwiki bot.
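One plausible reading of the approach above is a pivot translation: a foreign word defined in English (or French) is mapped to Malagasy through the English-Malagasy (or French-Malagasy) dictionary. The sketch below is only an illustration under that assumption; the function name `translate_via_pivot` and the tiny sample dictionaries are hypothetical and are not taken from the botjagwar code.

```python
def translate_via_pivot(definitions, pivot_to_malagasy):
    """Map foreign words to Malagasy through a pivot language.

    definitions: foreign word -> its definition in the pivot language
    pivot_to_malagasy: pivot-language word -> Malagasy translation
    """
    translations = {}
    for word, pivot_definition in definitions.items():
        malagasy = pivot_to_malagasy.get(pivot_definition)
        # Skip words whose definition is not in the pivot dictionary
        if malagasy is not None:
            translations[word] = malagasy
    return translations


# Hypothetical sample data, for illustration only
english_to_malagasy = {"water": "rano", "house": "trano"}
german_definitions = {
    "Wasser": "water",
    "Haus": "house",
    "Zeitgeist": "spirit of the age",  # not in the dictionary, so skipped
}

print(translate_via_pivot(german_definitions, english_to_malagasy))
```

Only exact matches are translated here; a definition that is absent from the pivot dictionary simply produces no entry.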

Also in 2013, I wrote my story on my blog.

From 2015 to 2018, there were no major functional changes to botjagwar, due to the code size (5,000 lines in total). Furthermore, it was becoming more and more difficult to keep it all in my head, for the following reasons:

  • Lack of adequate tools:
    • Up until 2014, I was still using Python IDLE, which was Notepad but with fancy colours for syntax and a button to run the file currently open.
    • Later, I used PyDev from 2014 to 2015, then PyCharm, and have never looked back since.
    • Debugging tools were out of my reach.
  • Non-use of version control (Me in 2014: "Git? Never heard of that Pokemon")
  • Lack of unit testing (I obviously did some code testing, but not as systematically as unit testing as we know it)
  • Fear of breaking already-working features. Working data was initially two flat files. Then I transitioned to using MySQL as a database, used raw. No ORM. Data handling at that moment was messy at best, leading to frequent encoding errors.
  • Also, before 2015 the Pywikibot framework did not work on Python 3.x. (My first attempt to write a bot, in 2009, failed miserably with syntax errors popping up in the framework's code, only for me to learn a year later that the Pywikibot framework only worked on Python 2.x. Who knew?)
  • Self-satisfaction:
    • I created a bot that works and translates obscure languages into Malagasy working 24/7.
    • What more do you want me to create?
    • I can handle things my way!

2016-2019: Refactorings

In November 2016, I entered the job market and was introduced by peers to new, cool technologies, which made my life a bit easier afterwards when it came to software maintenance.

In April 2017, I made my first commit on the botjagwar repository as it currently exists.

In 2018 I decided to split the translation module into 3 components: IRC client, translation module and dictionary module. I also wanted to unit-test them and make them easy to deploy. As a piece of trivia: properly configuring botjagwar on a computer used to take me days; now it takes minutes, and downloading dependencies takes up a large part of those minutes.

The dictionary module (dictionary_service) was going to be a REST interface whose backend would be a database (MySQL? PostgreSQL? SQLite? I don't care). It stores words. The translation module (entry_translator) was going to be a REST interface whose role is to interact with the dictionary module. Upon a REST request, it fetches a page, parses it and performs translation with the help of the dictionary module. The IRC client interface receives a message from the server each time an edit is made and calls the translation module to do its job.
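The data flow between the three components can be sketched roughly as follows. This is a minimal illustration, not the actual botjagwar code: plain in-process calls stand in for the REST requests, and the class names, the `on_edit` function, the `fetch_page` callback and the one-word "parsing" are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class DictionaryService:
    """Stands in for dictionary_service: it only stores and returns words."""
    # (language, word) -> Malagasy translation; a real backend would be a database
    words: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def lookup(self, language: str, word: str) -> Optional[str]:
        return self.words.get((language, word))


@dataclass
class EntryTranslator:
    """Stands in for entry_translator: fetch a page, parse it, translate it."""
    dictionary: DictionaryService
    fetch_page: Callable[[str], str]  # hypothetical: page title -> page text

    def translate(self, language: str, title: str) -> Optional[str]:
        text = self.fetch_page(title)
        # Hypothetical parsing: treat the whole page as a one-word definition
        definition = text.strip()
        return self.dictionary.lookup(language, definition)


def on_edit(translator: EntryTranslator, language: str, title: str) -> Optional[str]:
    """Stands in for the IRC client: called once per edit message received."""
    return translator.translate(language, title)
```

Keeping the dictionary behind its own interface is what makes the "MySQL? PostgreSQL? SQLite?" question irrelevant to the translator: only `DictionaryService` would need to change.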

From 2018 to 2020, the code migration from Python 2.7 to 3.6 was finished and the main feature was properly test-covered, so I went on to scrape the English Wiktionary for its inflection entries. An additional 2 million entries were created as a result, making the Malagasy Wiktionary jump from 4 to 6 million entries.

2020: Front-end

In 2020, due to the Covid-19 pandemic and the lockdown that ensued, I was determined to create a front-end to manage the massive amount of data gathered from other websites. That's how botjagwar-frontend was created.

2020: Do-over

Eventually, the sheer number of automatically created entries and their quality caused some turmoil among the Wiktionary community, and some of its members teamed up to file a request for comments, which led to the deletion of over 4.8 million of the automatically created entries. Maybe I'll stop here for the moment, but hopefully the adventure will continue.