#1 MediaWiki API
Identify a movie, television, video game, or other media property that has both (a) 5 or more related articles on Wikipedia and (b) 5 or more other articles on the same topic on a Fandom.com website. Any large entertainment franchise will definitely work but feel free to get creative! For example, you might choose 5 Wikipedia articles about the anime Naruto and 5 articles (pages) from the naruto.fandom.com site. You may notice that fandom.com has a top layer with staff-produced video content, but once you dig down into a particular fandom's wiki, you'll start to see a more familiar wiki style page. For example, compare the fandom.com page about the SpongeBob pilot episode 'Help Wanted' and the Wikipedia page about the same pilot episode.

First, modify the code from the first set of notebooks I used in the Community Data Science Course (Spring 2023)/Week 6 lecture to download data (and metadata) about revisions to the 5 articles you chose from Wikipedia. Be ready to share:

(i) what proportion of those edits were made by users without accounts ("anon"),
(ii) what proportion of those edits were marked as "minor", and
(iii) make and share a visualization of the total number of edits across those 5 articles over time (I didn't do this in class, but I made the TSV file that would allow this).
Now grab revision/edit data for the 5 articles you chose from the Fandom.com wiki you identified. (Hint: Your Wikipedia work will give you lots of clues here: for example, the Fandom API endpoint for The Wire is https://thewire.fandom.com/api.php and the Fandom API, as I said in class, is the same as the Wikipedia API.) Produce answers to the same three questions (i, ii, and iii) above but using this dataset.
Finally, choose either your Wikipedia or your Fandom dataset as the data source for a visualization that shows how each of those articles has grown in length (as measured in characters or "bytes") over time. (Hint: you'll need to return "size" as one of the revision properties (rvprop) if you are not doing it already.)
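
The article-growth question above needs one (timestamp, size) series per article. The following is a minimal sketch of that reshaping step, assuming the API answers have the same nested layout the cells below rely on (`query` → `pages` → `revisions`) and that "size" was requested in rvprop. The function name `size_series` and the sample data are hypothetical, made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def size_series(api_batches):
    """Map each article title to a chronological list of (datetime, size) pairs."""
    series = defaultdict(list)
    for batch in api_batches:
        # .get() guards against batches or pages that carry no revisions
        for page in batch.get("query", {}).get("pages", {}).values():
            for rev in page.get("revisions", []):
                when = datetime.strptime(rev["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
                series[page["title"]].append((when, rev["size"]))
    # The API returns newest revisions first, so sort oldest-first for plotting
    for points in series.values():
        points.sort()
    return dict(series)


# Hypothetical sample data shaped like one API answer (not real revisions)
sample_batches = [
    {"query": {"pages": {"1": {"title": "Example article", "revisions": [
        {"timestamp": "2023-02-01T12:00:00Z", "size": 2048},
        {"timestamp": "2023-01-01T12:00:00Z", "size": 1024},
    ]}}}}
]

growth = size_series(sample_batches)
print(growth["Example article"])
```

Each list can then be handed to whatever plotting tool you prefer, with time on the x-axis and size in bytes on the y-axis, one line per article.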



In [1]:
import requests
import json

In [10]:
#1.1
def get_article_revision_json(title):
    api_answers = []

    wp_api_url = "https://en.wikipedia.org/w/api.php"

    parameters = {'action' : 'query',
                  'titles' : title,
                  'prop' : 'revisions',
                  'rvprop' : 'flags|timestamp|user|size|ids',
                  'rvlimit' : 500,
                  'format' : 'json',
                   }

    while True:
        call = requests.get(wp_api_url, params=parameters)
        api_answer = call.json()
        
        api_answers.append(api_answer)

        if 'continue' in api_answer.keys():
            parameters.update(api_answer['continue'])
        else:
            break
        
    return api_answers

# Wikipedia article titles
article_titles = [
    "Rick and Morty",
    "List of Rick and Morty episodes",
    "List of Rick and Morty characters",
    "The Ricklantis Mixup",
    "Tales from the Citadel"
]

# Get revisions for each article
revisions_data = []
for title in article_titles:
    revisions_data.extend(get_article_revision_json(title))
    

# Calculate the proportion of anonymous edits
total_edits = 0
anon_edits = 0

for data in revisions_data:
    pages = data['query']['pages']
    for page_id in pages:
        revisions = pages[page_id]['revisions']
        for revision in revisions:
            total_edits += 1
            if 'anon' in revision:
                anon_edits += 1

with open("revisions_data.json", "w") as outfile:
    json.dump(revisions_data, outfile)


anon_edit_proportion = anon_edits / total_edits
print(f"Proportion of anonymous edits: {anon_edit_proportion:.2%}")


Proportion of anonymous edits: 35.94%


In [12]:
total_edits = 0
minor_edits = 0

for data in revisions_data:
    pages = data['query']['pages']
    for page_id in pages:
        revisions = pages[page_id]['revisions']
        for revision in revisions:
            total_edits += 1
            if 'minor' in revision:
                minor_edits += 1

minor_edit_proportion = minor_edits / total_edits
print(f"Proportion of minor edits: {minor_edit_proportion:.2%}")

Proportion of minor edits: 17.02%
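
The anon and minor calculations above walk the same nested structure and differ only in the flag they look for. As a sketch, that loop could be folded into one helper. The name `flag_proportion` and the sample data are hypothetical, and the helper assumes flags show up as keys on each revision dict, which is what the `'anon' in revision` checks above rely on.

```python
def flag_proportion(api_batches, flag):
    """Return the share of revisions carrying the given flag (e.g. 'anon' or 'minor')."""
    total = 0
    flagged = 0
    for batch in api_batches:
        for page in batch.get("query", {}).get("pages", {}).values():
            for rev in page.get("revisions", []):
                total += 1
                if flag in rev:
                    flagged += 1
    # Avoid dividing by zero when no revisions were returned at all
    return flagged / total if total else 0.0


# Hypothetical sample data: four revisions, one anonymous, two minor
sample_batches = [
    {"query": {"pages": {"1": {"title": "Example article", "revisions": [
        {"user": "192.0.2.1", "anon": "", "timestamp": "2023-01-01T00:00:00Z"},
        {"user": "ExampleUser", "minor": "", "timestamp": "2023-01-02T00:00:00Z"},
        {"user": "ExampleUser", "minor": "", "timestamp": "2023-01-03T00:00:00Z"},
        {"user": "OtherUser", "timestamp": "2023-01-04T00:00:00Z"},
    ]}}}}
]

anon_share = flag_proportion(sample_batches, "anon")
minor_share = flag_proportion(sample_batches, "minor")
print(f"anon: {anon_share:.2%}, minor: {minor_share:.2%}")
```

The same helper would work unchanged on the Fandom data, since both APIs return the same structure.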


In [15]:
with open("edits_over_time.tsv", "w") as tsvfile:
    tsvfile.write("timestamp\ttitle\n")
    for data in revisions_data:
        pages = data['query']['pages']
        for page_id in pages:
            title = pages[page_id]['title']
            revisions = pages[page_id]['revisions']
            for revision in revisions:
                timestamp = revision['timestamp']
                timestamp = timestamp.replace("T", " ").replace("Z", "")
                tsvfile.write(f"{timestamp}\t{title}\n")

In [16]:
#https://docs.google.com/spreadsheets/d/1ERXWFttbGKxzZpSbpG5refz0d-WYTYc-TEi26-_bcBo/edit?usp=sharing
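
For the edits-over-time visualization, the TSV rows usually need to be binned by period before plotting. Here is a minimal sketch that counts edits per month, assuming the two-column `timestamp`/`title` layout written above. The function name `edits_per_month` is hypothetical, and the sample rows are invented so the example runs without the real file.

```python
import csv
import io
from collections import Counter


def edits_per_month(tsv_file):
    """Count rows per 'YYYY-MM' month in a TSV with a 'timestamp' column."""
    # QUOTE_NONE because the file was written with plain f-strings, not csv quoting
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter(row["timestamp"][:7] for row in reader)
    return sorted(counts.items())


# Hypothetical sample rows in the same layout as edits_over_time.tsv
sample = io.StringIO(
    "timestamp\ttitle\n"
    "2023-01-05 10:00:00\tExample A\n"
    "2023-01-20 11:30:00\tExample B\n"
    "2023-02-02 09:15:00\tExample A\n"
)

monthly = edits_per_month(sample)
print(monthly)
```

To run it on the real data, pass an open file handle instead, for example `with open("edits_over_time.tsv") as f: monthly = edits_per_month(f)`.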

In [24]:
#1.2
def get_f_article_revision_json(title, fandom_url):
    api_answers = []

    wp_api_url = f"{fandom_url}/api.php"

    parameters = {'action' : 'query',
                  'titles' : title,
                  'prop' : 'revisions',
                  'rvprop' : 'flags|timestamp|user|size|ids',
                  'rvlimit' : 500,
                  'format' : 'json',
                   }

    while True:
        call = requests.get(wp_api_url, params=parameters)
        api_answer = call.json()
        
        api_answers.append(api_answer)

        if 'continue' in api_answer.keys():
            parameters.update(api_answer['continue'])
        else:
            break
        
    return api_answers

fandom_url = "https://rickandmorty.fandom.com"

fandom_article_titles = [
    "Rick_and_Morty_(comic_series)",
    "A_Rick_in_King_Mortur's_Mort",
    "Rick_Sanchez_(C-132)",
    "Morty_Smith_(C-132)",
    "Clackspire_Labyrinth"
]

fandom_revisions_data = []
for title in fandom_article_titles:
    fandom_revisions_data.extend(get_f_article_revision_json(title, fandom_url))

# Save Fandom revisions data to a JSON file
with open("fandom_revisions_data.json", "w") as outfile:
    json.dump(fandom_revisions_data, outfile)
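
The Wikipedia and Fandom functions differ only in the endpoint URL, so the continuation loop could be shared. Below is a sketch of that idea, with the HTTP call passed in as a function so the loop can be tried offline. `fetch_all_batches` and `fake_fetch` are hypothetical names, and the fake answers only imitate the `continue` key that the loops above check for.

```python
def fetch_all_batches(fetch, base_params):
    """Call fetch(params) repeatedly, following 'continue' tokens until none is returned."""
    params = dict(base_params)  # copy so the caller's dict is not mutated
    batches = []
    while True:
        answer = fetch(params)
        batches.append(answer)
        if "continue" not in answer:
            return batches
        params.update(answer["continue"])


def fake_fetch(params):
    """Offline stand-in for the API: one continuation token, then a final answer."""
    if "rvcontinue" not in params:
        return {"continue": {"rvcontinue": "token-1", "continue": "||"},
                "query": {"pages": {}}}
    return {"query": {"pages": {}}}


batches = fetch_all_batches(fake_fetch, {"action": "query", "prop": "revisions"})
print(len(batches))
```

Against a live wiki, `fetch` could be something like `lambda p: requests.get(api_url, params=p).json()`, where `api_url` is either the Wikipedia or the Fandom endpoint.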


In [44]:
# Calculate the proportion of anon edits and minor edits in the Fandom.com wiki
total_fandom_edits = 0
anon_fandom_edits = 0
minor_fandom_edits = 0

for data in fandom_revisions_data:
    pages = data['query']['pages']
    for page_id in pages:
        revisions = pages[page_id]['revisions']
        for revision in revisions:
            total_fandom_edits += 1
            if 'anon' in revision:
                anon_fandom_edits += 1
            if 'minor' in revision:
                minor_fandom_edits += 1

anon_fandom_edit_proportion = anon_fandom_edits / total_fandom_edits
minor_fandom_edit_proportion = minor_fandom_edits / total_fandom_edits

print(f"Proportion of anon edits in Fandom.com wiki: {anon_fandom_edit_proportion:.2%}")
print(f"Proportion of minor edits in Fandom.com wiki: {minor_fandom_edit_proportion:.2%}")


Proportion of anon edits in Fandom.com wiki: 0.00%
Proportion of minor edits in Fandom.com wiki: 14.47%


In [47]:
# Create a TSV file with the timestamp and title of each Fandom.com wiki edit,
# plus the total number of edits to that title
title_counts = {}

# Count edits for each title
for data in fandom_revisions_data:
    pages = data['query']['pages']
    for page_id in pages:
        title = pages[page_id]['title']
        revisions = pages[page_id]['revisions']
        for revision in revisions:
            title_counts[title] = title_counts.get(title, 0) + 1

with open("fandom_edits_over_time.tsv", "w") as myfile:
    myfile.write("timestamp\ttitle\tcount\n")
    for data in fandom_revisions_data:
        pages = data['query']['pages']
        for page_id in pages:
            title = pages[page_id]['title']
            revisions = pages[page_id]['revisions']
            for revision in revisions:
                timestamp = revision['timestamp']
                timestamp = timestamp.replace("T", " ").replace("Z", "")
                myfile.write(f"{timestamp}\t{title}\t{title_counts[title]}\n")


for title, count in title_counts.items():
    print(f"{title}: {count}")


Rick and Morty (comic series): 1
A Rick in King Mortur's Mort: 44
Rick Sanchez (C-132): 60
Morty Smith (C-132): 34
Clackspire Labyrinth: 13


In [48]:
#https://docs.google.com/spreadsheets/d/1msaDIZqGzwKdqpQdwIThL5y1oiEMfJ4ZpMWmhfVWeWA/edit?usp=sharing
