<b>MediaWiki API</b>

Identify a movie, television, video game, or other media property that has both (a) 5 or more related articles on Wikipedia and (b) 5 or more other articles on the same topic on a Fandom.com website. Any large entertainment franchise will definitely work, but feel free to get creative! For example, you might choose 5 Wikipedia articles about the anime Naruto and 5 articles (pages) from the naruto.fandom.com site. You may notice that fandom.com has a top layer with staff-produced video content, but once you dig down into a particular fandom's wiki, you'll start to see a more familiar wiki-style page. For example, compare the fandom.com page about the SpongeBob pilot episode 'Help Wanted' and the Wikipedia page about the same pilot episode.

1. First, modify the code from the first set of notebooks I used in the Community Data Science Course (Spring 2023)/Week 6 lecture to download data (and metadata) about revisions to the 5 articles you chose from Wikipedia. Be ready to share:<br>
(i) what proportion of those edits were made by users without accounts ("anon"),<br>
(ii) what proportion of those edits were marked as "minor", and<br>
(iii) make and share a visualization of the total number of edits across those 5 articles over time (I didn't do this in class, but the TSV file I made would allow this).

In [9]:
#(i) what proportion of those edits were made by users without accounts ("anon")
#(ii) what proportion of those edits were marked as "minor"

import requests
import json

In [10]:
def get_article_revision_json(title):
    api_answers = []
    # Use HTTPS and the canonical endpoint path (no trailing slash).
    wp_api_url = "https://en.wikipedia.org/w/api.php"
    parameters = {'action' : 'query',
                  'titles' : title,
                  'prop' : 'revisions',
                  'rvprop' : 'flags|timestamp|user|size|ids',
                  'rvlimit' : 500,
                  'format' : 'json',
                   }
    while True:
        call = requests.get(wp_api_url, params=parameters)
        api_answer = call.json()
        api_answers.append(api_answer)
        if 'continue' in api_answer.keys():
            parameters.update(api_answer['continue'])
        else:
            break
        
    return api_answers

In [22]:
dict1 = {"website": "en.wikipedia.org", "page_title": "Finn_the_Human"}
dict2 = {"website": "en.wikipedia.org", "page_title": "Fionna_and_Cake"}
dict3 = {"website": "en.wikipedia.org", "page_title": "Jake_the_Dog"}
dict4 = {"website": "en.wikipedia.org", "page_title": "Marceline_the_Vampire_Queen"}
dict5 = {"website": "en.wikipedia.org", "page_title": "Princess_Bubblegum"}

dict_list = [dict1, dict2, dict3, dict4, dict5]

# Write one JSON object per line (JSON Lines), one line per article.
with open("AdventureTimeCharacters.jsonl", "w") as adventure_time_characters:
    for article_dict in dict_list:
        json.dump(article_dict, adventure_time_characters)
        adventure_time_characters.write('\n')


In [21]:
with open('AdventureTimeCharacters.jsonl', 'r') as input_file,\
    open("Adventure_Time_revisions.jsonl", 'w') as output_file:
    
    for line in input_file.readlines():
        line_dict = json.loads(line)
        page_title = line_dict["page_title"]
        
        print(f"now working on: {page_title}")
        api_answers = get_article_revision_json(page_title)
        for api_answer in api_answers:
            print(json.dumps(api_answer), file=output_file)

now working on: Finn_the_Human
now working on: Fionna_and_Cake
now working on: Jake_the_Dog
now working on: Marceline_the_Vampire_Queen
now working on: Princess_Bubblegum


In [34]:
revisions = []
minor_revisions_count = 0

with open("Adventure_Time_revisions.jsonl", 'r') as input_file:
    for line in input_file.readlines():
        api_answer = json.loads(line)
        
        pages = api_answer["query"]["pages"]

        for page_id in pages.keys():
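            # A missing or mistyped page comes back without a "revisions" key,
            # so fall back to an empty list instead of raising KeyError.
            query_revisions = pages[page_id].get("revisions", [])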
            title = pages[page_id]['title']

            for rev in query_revisions:
                if "userhidden" in rev.keys():
                    continue
                
                rev["title"] = title

                # The API includes the "anon" and "minor" keys only when the flag
                # is set, so key presence is the signal; store explicit booleans.
                rev["anon"] = "anon" in rev
                rev["minor"] = "minor" in rev

                if rev["minor"]:
                    minor_revisions_count += 1  # Increment minor revision count

                rev["timestamp"] = rev["timestamp"].replace("T", " ")
                rev["timestamp"] = rev["timestamp"].replace("Z", "")

                revisions.append(rev)
num_edits = len(revisions)
num_anon = 0

for rev in revisions:
    if rev["anon"]:
        num_anon = num_anon + 1

prop_anon = num_anon / num_edits
prop_minor = minor_revisions_count / num_edits

print(f"proportion anon: {prop_anon}")
print(f"proportion minor: {prop_minor}")



proportion anon: 0.38433153562304856
proportion minor: 0.2015327845586148


In [35]:
#(iii) make and share a visualization of the total number of edits across those 5 articles over time
#(I didn't do this in class, but the TSV file I made would allow this).
edits_by_day = {}
for rev in revisions:
    day_string = rev['timestamp'][0:10]

    if day_string in edits_by_day.keys():
        edits_by_day[day_string] = edits_by_day[day_string] + 1
    else:
        edits_by_day[day_string] = 1

In [43]:
edits_by_day = {}

for rev in revisions:
    day_string = rev['timestamp'][0:10]
    title = rev['title']
    key = (day_string, title)

    if key in edits_by_day.keys():
        edits_by_day[key] += 1
    else:
        edits_by_day[key] = 1



In [44]:
edits_by_day

{('2023-05-04', 'Finn the Human'): 1,
 ('2023-05-03', 'Finn the Human'): 3,
 ('2023-04-28', 'Finn the Human'): 12,
 ('2023-04-27', 'Finn the Human'): 10,
 ('2023-04-24', 'Finn the Human'): 6,
 ('2023-04-16', 'Finn the Human'): 1,
 ('2023-04-14', 'Finn the Human'): 2,
 ('2023-03-28', 'Finn the Human'): 1,
 ('2023-02-27', 'Finn the Human'): 1,
 ('2023-02-21', 'Finn the Human'): 1,
 ('2023-01-24', 'Finn the Human'): 1,
 ('2022-12-16', 'Finn the Human'): 5,
 ('2022-12-03', 'Finn the Human'): 1,
 ('2022-11-28', 'Finn the Human'): 3,
 ('2022-11-07', 'Finn the Human'): 1,
 ('2022-09-25', 'Finn the Human'): 1,
 ('2022-09-13', 'Finn the Human'): 5,
 ('2022-09-08', 'Finn the Human'): 2,
 ('2022-09-04', 'Finn the Human'): 1,
 ('2022-09-01', 'Finn the Human'): 1,
 ('2022-08-29', 'Finn the Human'): 1,
 ('2022-08-19', 'Finn the Human'): 1,
 ('2022-08-06', 'Finn the Human'): 1,
 ('2022-08-03', 'Finn the Human'): 1,
 ('2022-07-28', 'Finn the Human'): 1,
 ('2022-07-25', 'Finn the Human'): 1,
 ('2022-06

In [45]:
with open("Adventure_Time_edits_by_day.tsv", "w", encoding='utf-8') as output_file:
    # write a header with three columns
    print("title\tdate\tedits", file=output_file)

    # iterate through every day and title and print out data into the file
    for key in edits_by_day.keys():
        title = key[1]
        day_string = key[0]
        print("\t".join([title, day_string, str(edits_by_day[key])]), file=output_file)

https://docs.google.com/spreadsheets/d/1zGd-TfhISpTmsMm1N6fC68doyCI7X1n_Gt3v1psJNE0/edit?usp=sharing
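
Question (iii) asks for the total number of edits across all 5 articles, while the TSV above has one row per (day, title) pair. One possible way to collapse it to a single count per day is sketched below. The function name `total_edits_by_day` and the `sample` dictionary are made up for illustration; the input is assumed to have the same `(day_string, title)` keys as `edits_by_day`.

```python
from collections import Counter


def total_edits_by_day(edits_by_day):
    """Collapse {(day, title): count} into {day: count summed over titles}."""
    totals = Counter()
    for (day_string, _title), count in edits_by_day.items():
        totals[day_string] += count
    # ISO-formatted dates (YYYY-MM-DD) sort chronologically as plain strings.
    return dict(sorted(totals.items()))


# Small made-up example in the same shape as edits_by_day (not real data).
sample = {
    ("2023-05-03", "Finn the Human"): 3,
    ("2023-05-03", "Jake the Dog"): 2,
    ("2023-05-04", "Finn the Human"): 1,
}
print(total_edits_by_day(sample))
# prints {'2023-05-03': 5, '2023-05-04': 1}
```

The result can be written out as a two-column TSV (date, edits) and charted as a single line in a spreadsheet.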

2. Now grab revision/edit data for the 5 articles you chose from the Fandom.com wiki you identified. (Hint: Your Wikipedia work will give you lots of clues here: for example, the Fandom API endpoint for The Wire is https://thewire.fandom.com/api.php and the Fandom API, as I said in class, is the same as the Wikipedia API.) Produce answers to the same three questions (i, ii, and iii) above but using this dataset.
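
Since the hint says the Fandom API matches the Wikipedia API, one way to approach this is to take the endpoint URL as an argument so the same download loop works for both sites. The sketch below is a rough starting point, not a tested solution: the name `get_revisions_from_wiki` and the optional `fetch` argument are hypothetical, and the Adventure Time endpoint in the comment is an assumption that should be checked against the actual wiki.

```python
def get_revisions_from_wiki(api_url, title, fetch=None):
    """Collect every API response for one page's revision history.

    api_url: full endpoint of the wiki, e.g. "https://thewire.fandom.com/api.php".
    fetch:   optional callable(api_url, params) -> dict. This is a hypothetical
             hook so the paging logic can be tried without a network connection.
    """
    if fetch is None:
        import requests  # imported lazily; only needed for real downloads

        def fetch(url, params):
            response = requests.get(url, params=params, timeout=30)
            response.raise_for_status()
            return response.json()

    parameters = {
        "action": "query",
        "titles": title,
        "prop": "revisions",
        "rvprop": "flags|timestamp|user|size|ids",
        "rvlimit": 500,
        "format": "json",
    }

    api_answers = []
    while True:
        api_answer = fetch(api_url, parameters)
        api_answers.append(api_answer)
        # The API signals more results with a "continue" block; its contents
        # are merged into the next request's parameters.
        if "continue" in api_answer:
            parameters.update(api_answer["continue"])
        else:
            break
    return api_answers


# Example call (endpoint assumed, not verified; requires network access):
# answers = get_revisions_from_wiki(
#     "https://adventuretime.fandom.com/api.php", "Finn")
```

The responses should have the same shape as the Wikipedia ones, so the parsing code for proportions and edits per day may work unchanged once it reads from the new output file.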

Finally, choose either your Wikipedia or Fandom dataset as the data source for a visualization that shows how each of those articles has grown in length (as measured in characters or "bytes") over time. (Hint: you'll need to return "size" as one of the revision properties (rvprop) if you are not doing it already.)
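
A possible first step for the growth visualization is to turn the parsed revisions into one time-ordered series of sizes per article. The sketch below assumes each revision dictionary carries `title`, `timestamp`, and `size` keys, as the `revisions` list built earlier does when `size` is included in `rvprop`. The function names `size_history` and `write_size_tsv` are made up for illustration.

```python
def size_history(revisions):
    """Group revisions into {title: [(timestamp, size), ...]}, oldest first."""
    history = {}
    for rev in revisions:
        # Skip revisions that came back without a size value.
        if "size" not in rev:
            continue
        history.setdefault(rev["title"], []).append((rev["timestamp"], rev["size"]))
    # Timestamps like "2023-05-04 12:00:00" sort chronologically as strings.
    for points in history.values():
        points.sort()
    return history


def write_size_tsv(history, path):
    """Write one row per revision: title, timestamp, article size in bytes."""
    with open(path, "w", encoding="utf-8") as output_file:
        print("title\ttimestamp\tsize", file=output_file)
        for title, points in history.items():
            for timestamp, size in points:
                print("\t".join([title, timestamp, str(size)]), file=output_file)
```

Calling `write_size_tsv(size_history(revisions), "Adventure_Time_sizes.tsv")` (the file name is only an example) would give a file that can be charted with one line per title.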