# Introduction

This notebook explains how I identify bot accounts in two ways: among the administrators listed in the 'user' column of the block log query CSVs, and through the Wikipedia API.

In [48]:
# Add the parent directory to sys.path so the local utils package can be imported
import sys
sys.path.append('../')

# Extract all administrator names

The code below extracts the names of all administrators from the monthly block log files. These names are then searched for bots.

In [49]:
from utils.name_extraction import extract_administrator_names

In [50]:
dir_path = '../data/block_logs'

admin_names = extract_administrator_names(dir_path, start_year=2004, end_year=2023)

File not found: ../data/block_logs/2004/2004-01.csv
File not found: ../data/block_logs/2004/2004-01.csv.gz
File not found: ../data/block_logs/2004/2004-02.csv
File not found: ../data/block_logs/2004/2004-02.csv.gz
File not found: ../data/block_logs/2004/2004-03.csv
File not found: ../data/block_logs/2004/2004-03.csv.gz
File not found: ../data/block_logs/2004/2004-04.csv
File not found: ../data/block_logs/2004/2004-04.csv.gz
File not found: ../data/block_logs/2004/2004-05.csv
File not found: ../data/block_logs/2004/2004-05.csv.gz
File not found: ../data/block_logs/2004/2004-06.csv
File not found: ../data/block_logs/2004/2004-06.csv.gz
File not found: ../data/block_logs/2004/2004-07.csv
File not found: ../data/block_logs/2004/2004-07.csv.gz
File not found: ../data/block_logs/2004/2004-08.csv
File not found: ../data/block_logs/2004/2004-08.csv.gz
File not found: ../data/block_logs/2004/2004-09.csv
File not found: ../data/block_logs/2004/2004-09.csv.gz
File not found: ../data/block_logs/20

Extracting: ../data/block_logs/2006/2006-01.csv.gz
Extracting: ../data/block_logs/2006/2006-02.csv.gz
Extracting: ../data/block_logs/2006/2006-03.csv.gz
Extracting: ../data/block_logs/2006/2006-04.csv.gz
Extracting: ../data/block_logs/2006/2006-05.csv.gz
Extracting: ../data/block_logs/2006/2006-06.csv.gz
Extracting: ../data/block_logs/2006/2006-07.csv.gz
Extracting: ../data/block_logs/2006/2006-08.csv.gz
Extracting: ../data/block_logs/2006/2006-09.csv.gz
Extracting: ../data/block_logs/2006/2006-10.csv.gz
Extracting: ../data/block_logs/2006/2006-11.csv.gz
Extracting: ../data/block_logs/2006/2006-12.csv.gz
Extracting: ../data/block_logs/2007/2007-01.csv.gz
Extracting: ../data/block_logs/2007/2007-02.csv.gz
Extracting: ../data/block_logs/2007/2007-03.csv.gz
Extracting: ../data/block_logs/2007/2007-04.csv.gz
Extracting: ../data/block_logs/2007/2007-05.csv.gz
Extracting: ../data/block_logs/2007/2007-06.csv.gz
Extracting: ../data/block_logs/2007/2007-07.csv.gz
Extracting: ../data/block_logs/
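The log above suggests how `extract_administrator_names` walks the data directory: for each month it looks for `YYYY/YYYY-MM.csv` or `YYYY/YYYY-MM.csv.gz` and reports months for which neither file exists. The real implementation lives in the `utils.name_extraction` module and is not shown here, so the following is only a hypothetical sketch of that behaviour using the standard library. The function name `extract_administrator_names_sketch`, the UTF-8 encoding, and the de-duplication in first-seen order are assumptions; the `user` column comes from the introduction.

```python
import csv
import gzip
import os
from typing import List


def extract_administrator_names_sketch(dir_path: str, start_year: int, end_year: int) -> List[str]:
    """Hypothetical sketch: collect unique 'user' values from monthly block log CSVs.

    Assumes the layout <dir_path>/<YYYY>/<YYYY-MM>.csv or .csv.gz seen in the log output.
    """
    seen = {}  # a dict de-duplicates while keeping first-seen order
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            stem = os.path.join(dir_path, str(year), f"{year}-{month:02d}")
            candidates = [stem + ".csv", stem + ".csv.gz"]
            # Use the first candidate file that exists, if any
            path = next((p for p in candidates if os.path.exists(p)), None)
            if path is None:
                for p in candidates:
                    print(f"File not found: {p}")
                continue
            print(f"Extracting: {path}")
            opener = gzip.open if path.endswith(".gz") else open
            # Text mode with newline="" is what the csv module expects
            with opener(path, "rt", newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    name = row.get("user")
                    if name:
                        seen[name] = None
    return list(seen)
```

Reading only the `user` column keeps memory use low even when the monthly files are large, since rows are streamed rather than loaded into a DataFrame.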

The code below writes the list of administrator names to a JSON file. Uncomment the cell to run it.

In [63]:
# import json

# # Write the list of administrator names to a plain (uncompressed) JSON file
# with open("../data/processed_data/administrator_names.json", "wt") as outfile:
#     json.dump(admin_names, outfile)

# Identify administrator bots

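The list comprehension below flags every administrator name that contains "bot" (case-insensitive) as a possible Wikipedia bot. `str(name)` guards against non-string entries such as missing values.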

In [52]:
[name for name in admin_names if 'bot' in str(name).lower()]

['User:Botley Brewery',
 'User:Lynyabbribot',
 'EgressBot',
 'User:WikiEditBot550',
 'AnomieBOT III',
 'AntiAbuseBot',
 'ST47ProxyBot',
 'NinjaRobotPirate',
 'ProcseeBot',
 'User:Shmuelisabot123',
 'Water Bottle',
 'TorNodeBot',
 'ListManBot']

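This substring test also matches ordinary accounts such as 'Water Bottle' and 'NinjaRobotPirate', so I looked up each of the names above on Wikipedia to confirm which ones are bots.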

The following names are confirmed bots:
- [ProcseeBot](https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/ProcseeBot)
- [ListManBot](https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/ListManBot)
- [AnomieBOT III](https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/AnomieBOT_III)
- [AntiAbuseBot](https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/AntiAbuseBot)
- [EgressBot](https://en.wikipedia.org/wiki/User:EgressBot)
- [TorNodeBot](https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/TorNodeBot)
- [ST47ProxyBot](https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/ST47ProxyBot)

In [53]:
administrator_bots = [
    'ProcseeBot', 'ListManBot', 'AnomieBOT III', 'AntiAbuseBot',
    'EgressBot', 'TorNodeBot', 'ST47ProxyBot'
    ]
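The substring test in the list comprehension above is deliberately broad. A stricter pre-filter could require "bot" to end the name, optionally followed by a number or Roman numeral (as in 'AnomieBOT III'). The sketch below is a hypothetical heuristic, not the method used in this notebook: on the 13 candidates above it drops 'User:Botley Brewery', 'NinjaRobotPirate' and 'Water Bottle', but it still keeps 'User:Lynyabbribot', 'User:WikiEditBot550' and 'User:Shmuelisabot123', which are not in the confirmed list, so the manual check remains necessary.

```python
import re
from typing import Iterable, List

# Hypothetical heuristic: "bot" must end the name, optionally followed by
# digits ("Bot550") or a Roman numeral ("BOT III"), ignoring case.
BOT_SUFFIX = re.compile(r"bot(?:\s*(?:\d+|[ivx]+))?$", re.IGNORECASE)


def likely_bot_names(names: Iterable) -> List:
    """Return the names whose ending looks like a bot suffix, keeping the original values."""
    return [n for n in names if BOT_SUFFIX.search(str(n))]
```

For example, `likely_bot_names(admin_names)` would return a shorter candidate list to verify by hand.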

# Fetch bot accounts from the Wikipedia API

The code below fetches the members of the `bot` user group from the Wikipedia API.

In [54]:
import requests
import pandas as pd

In [55]:
url = "https://en.wikipedia.org/w/api.php"

parameters = {
    "action": "query",
    "format": "json",
    "list": "allusers",
    "aulimit": "500",
    "augroup": "bot", # This parameter limits the fetched users to bots
}

request_session = requests.Session()
# Send request
r = request_session.get(url=url, params=parameters)

In [56]:
data = r.json()
bots = data['query']['allusers']
bots_df = pd.DataFrame(bots)

In [57]:
bot_names = list(bots_df['name'])
bot_names

["'zinbot",
 'AAlertBot',
 'Acebot',
 'Addbot',
 'AdminStatsBot',
 'Ahechtbot',
 'Aidan9382-Bot',
 'AndreasJSbot',
 'AnomieBOT',
 'AnomieBOT II',
 'AnomieBOT III',
 'AntiCompositeBot',
 'ArbClerkBot',
 'Arbitrarily0Bot',
 'ArmbrustBot',
 'ArticlesForCreationBot',
 'AttributionBot',
 'AudeBot',
 'AvicBot',
 'B-bot',
 'BD2412bot',
 'BOTarate',
 'BOTijo',
 'BattyBot',
 'Bender the Bot',
 'Bibcode Bot',
 'Bitbotje',
 'BogBot',
 'Bot1058',
 'Bot11598',
 'BotMultichill',
 'BotMultichillT',
 'BotPuppet',
 'BrownBot',
 'BsherrAWBBOT',
 'BsoykaBot',
 'ButlerBlogBot',
 'COIBot',
 'CanisRufus',
 'CeraBot',
 'Cerabot~enwiki',
 'Cewbot',
 'Chartbot',
 'CheMoBot',
 'Chem-awb',
 'Chobot',
 'Chris G Bot',
 'Chris G Bot 3',
 'ChristieBot',
 'Citation bot',
 'Citation bot 1',
 'Citation bot 2',
 'Citation bot 3',
 'Citation bot 4',
 'CitationCleanerBot',
 'ClueBot II',
 'ClueBot III',
 'ClueBot NG',
 'CommonsDelinker',
 'CommonsNotificationBot',
 'Community Tech bot',
 'ContinuityBot',
 'CountryBot',
 '
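The request above returns at most `aulimit` = 500 accounts in one response. When more results exist, the MediaWiki API adds a `continue` object to the response, and its values (e.g. `aufrom`) must be sent back with the next request. The sketch below, a hypothetical helper named `iter_bot_users`, follows that continuation loop. It assumes the response layout used above (`query` → `allusers`) and accepts any requests-style session object, so it can be driven by the existing `request_session`.

```python
from typing import Any, Dict, Iterator

API_URL = "https://en.wikipedia.org/w/api.php"


def iter_bot_users(session: Any, url: str = API_URL) -> Iterator[Dict[str, Any]]:
    """Yield every member of the 'bot' user group, following API continuation.

    `session` is anything with a requests-style .get(url, params=..., timeout=...)
    method, e.g. requests.Session().
    """
    params = {
        "action": "query",
        "format": "json",
        "list": "allusers",
        "aulimit": "500",
        "augroup": "bot",  # limit results to accounts in the bot group
    }
    while True:
        response = session.get(url, params=params, timeout=30)
        response.raise_for_status()  # fail loudly on HTTP errors
        data = response.json()
        yield from data["query"]["allusers"]
        if "continue" not in data:
            break  # no more pages
        # Merge the continuation values into the parameters for the next request
        params = {**params, **data["continue"]}
```

Usage in this notebook would be `bot_names = [user["name"] for user in iter_bot_users(request_session)]`.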

# Combine and save the two bot lists

In [58]:
all_bot_names = list(set(administrator_bots).union(set(bot_names)))

In [59]:
all_bot_names

['TokenzeroBot',
 'BOTijo',
 'JJMC89 bot III',
 'Bot11598',
 'Cerabot~enwiki',
 'Luasóg bot',
 'AnomieBOT III',
 'AndreasJSbot',
 'SineBot',
 'MuZebot',
 'SportsStatsBot',
 'RscprinterBot',
 'AttributionBot',
 'WxBot',
 'SDPatrolBot II',
 'NihiltresBot',
 'TorNodeBot',
 'ImageRemovalBot',
 'NukeBot',
 'FastilyBot',
 'John of Reading Bot',
 'TowBot',
 'JJMC89 bot',
 'Eejit43Bot',
 'JL-Bot',
 'Legobot',
 'MilHistBot',
 'TWLBot',
 'Monkbot',
 'DYKUpdateBot',
 'RottenBot',
 'FACBot',
 'ErfgoedBot',
 'MenoBot II',
 'JeffGBot',
 'Acebot',
 'CommonsNotificationBot',
 "'zinbot",
 'JhealdBot',
 'Muninnbot',
 'VahurzpuBot',
 'EnterpriseyBot',
 'Galobot',
 "Joe's Null Bot",
 'The Sky Bot',
 'HotArticlesBot',
 'SDZeroBot',
 'Citation bot 2',
 'ClueBot II',
 'NovemBot',
 'DeadbeefBot',
 'ClueBot III',
 'Muhbot',
 'DarafshBot',
 'WebCiteBOT',
 'AudeBot',
 'PowerBOT',
 'Fluxbot',
 'WP 1.0 bot',
 'BOTarate',
 'MinusBot',
 'OgreBot',
 'ZackBot',
 'TolBot',
 'UsuallyNonviolentBot',
 'ST47Bot',
 'Ragesos
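The order of `all_bot_names` above comes from iterating a set. Because Python randomises string hashing per process by default, that order can change from one run to the next, so the saved JSON file may differ between runs even when the names do not. A minimal sketch of a reproducible alternative, using a hypothetical helper `merge_bot_lists` that sorts the union case-insensitively:

```python
from typing import Iterable, List


def merge_bot_lists(*lists: Iterable[str]) -> List[str]:
    """Union of several name lists, de-duplicated and sorted case-insensitively.

    Sorting makes the result reproducible regardless of set iteration order.
    """
    # Tie-break on the original string so names differing only in case keep a fixed order
    return sorted(set().union(*lists), key=lambda s: (s.casefold(), s))
```

With the lists in this notebook, this would be `all_bot_names = merge_bot_lists(administrator_bots, bot_names)`.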

The code below writes the combined list of bot names to a JSON file. Uncomment the cell to run it.

In [62]:
# import json

# # Convert and write JSON object
# with open("../data/scraped_data_metrics/bot_names.json", "wt") as outfile:
#     json.dump(all_bot_names, outfile)
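Some bot names contain non-ASCII characters, such as 'Luasóg bot' in the output above. By default `json.dump` escapes them (`ensure_ascii=True`), which is valid JSON but harder to read. The sketch below uses hypothetical helpers `save_names` and `load_names` that write UTF-8 JSON with the characters kept as-is and read the file back to check that it round-trips; the file path is whatever the caller passes in.

```python
import json
from typing import List


def save_names(names: List[str], path: str) -> None:
    """Write a list of names as UTF-8 JSON, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as outfile:
        # ensure_ascii=False writes "Luasóg bot" instead of an escaped \u00f3 sequence
        json.dump(names, outfile, ensure_ascii=False, indent=2)


def load_names(path: str) -> List[str]:
    """Read the list back, e.g. to check that the file round-trips."""
    with open(path, encoding="utf-8") as infile:
        return json.load(infile)
```

Note that `json.dump` accepts a list but not a set, so the names should be converted with `list(...)` or `sorted(...)` before saving.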