== #1 MediaWiki API == 

Identify a movie, television, video game, or other media property that has both (a) 5 or more related articles on Wikipedia '''and''' (b) 5 or more other articles on the same topic on a [https://fandom.com Fandom.com] website. Any large entertainment franchise will definitely work but feel free to get creative! For example, you might choose 5 Wikipedia articles about the anime Naruto and 5 articles (pages) from the naruto.fandom.com site. You may notice that fandom.com has a top layer with staff-produced video content, but once you dig down into a particular fandom's wiki, you'll start to see a more familiar wiki style page. For example, compare [https://spongebob.fandom.com/wiki/Help_Wanted the fandom.com page about the SpongeBob pilot episode 'Help Wanted'] and [https://en.wikipedia.org/wiki/Help_Wanted_(SpongeBob_SquarePants) the Wikipedia page about the same pilot episode].

# First modify the code from first sets of notebooks I used in the [[../Week 6 lecture]] to download data (and metadata) about revisions to the 5 articles you chose from Wikipedia. Be ready to share:
## (i) what proportion of those edits were made by users without accounts ("anon"),
## (ii) what proportion of those edits were marked as "minor", and 
## (iii) make and share a visualization of the total number of edits across those 5 articles over time (I didn't do this in class but I made the TSV file would allow this).
# Now grab data for the 5 articles you chose from the Fandom.com wiki you identified and grab revision/edit data from there. ('''''Hint:''' Your wikipedia work will give you lots of clues here: for example, the fandom API endpoint for The Wire is https://thewire.fandom.com/api.php and the Fandom API, as I said in class, is the same as the Wikipedia API''). Produce answers to the same three questions (i, ii, and iii) above but using this dataset.
# Finally, choose either your Wikipedia or Fandom datasets as the data source for a visualization that shows how each of those articles have grown in length (as measured in characters or "bytes") over time. ('''''Hint:''' you'll need to return "size" as one of the revision properties (<code>rvprop</code>) if you are not doing it already.'')


In [14]:
import requests
import json
# import time

def get_article_revision_json(title, wiki_name="wikipedia"):
    api_answers = []

    # create a base url for the api and then a normal url which is initially
    # just a copy of it
    # The following line is what the requests call is doing, basically.
    # f"http://en.wikipedia.org/w/api.php/?action=query&titles={title}&prop=revisions&rvprop=flags|timestamp|user|size|ids&rvlimit=500&format=json&continue="
    # e.g.: http://en.wikipedia.org/w/api.php/?action=query&titles=Soundgarden&prop=revisions&rvprop=flags|timestamp|user|size|ids&rvlimit=500&format=json
    wp_api_url=""
    if wiki_name=="wikipedia":
        wp_api_url = "http://en.wikipedia.org/w/api.php/"

    if wiki_name=='fandom':
        wp_api_url = "https://sims.fandom.com/api.php"


    parameters = {'action' : 'query',
                  'titles' : title,
                  'prop' : 'revisions',
                  'rvprop' : 'flags|timestamp|user|size|ids',
                  'rvlimit' : 500,
                  'format' : 'json',
                   }

    # we'll repeat this forever (i.e., we'll only stop when we find
    # the "break" command)
    while True:
        # this will wait for one second
        # time.sleep(1)
        
        # the first line open the urls but also handles unicode urls
        call = requests.get(wp_api_url, params=parameters)
        api_answer = call.json()
        
        # now we'll add this to whatever we are tracking
        api_answers.append(api_answer)
        
        # 'continue' tells us there's more revisions to add
        if 'continue' in api_answer.keys():
            # replace the 'continue' parameter with the contents of the
            # api_answer dictionary.
            parameters.update(api_answer['continue'])
        else:
            break
        
    return(api_answers)

In [21]:
dict_wikipedia_titles={'The_Sims_4_expansion_packs#City_Living':'',
                 'The_Sims_4_expansion_packs#Cats_&_Dogs':'', 
                 'The_Sims_4_expansion_packs#Seasons':'', 
                 'The_Sims_4_expansion_packs#Get_Famous':'', 
                 'The_Sims_4_expansion_packs#Island_Living':''}

dict_fandom_titles={'The_Sims_4:_City_Living':'',
                    #'The_Sims_4:_Cats_%26_Dogs':'',
                    'The_Sims_4:_Seasons':'',
                    'The_Sims_4:_Get_Famous':'',
                    'The_Sims_4:_Island_Living':''}

for key in dict_wikipedia_titles:
    response = get_article_revision_json(key,"wikipedia")
    dict_wikipedia_titles[key] = json.dumps(response)
    #print(response)
# for key in dict_fandom_titles:
#     response = get_article_revision_json(key)
#     dict_wikipedia_titles[key] = json.dumps(response)


In [24]:
def get_revision(pageJson):
    revisions=[]
    obj= json.loads(pageJson)
    currentObj = obj[0]
    pages = currentObj["query"]["pages"]
    # for every page, (there should always be only one) get its revisions:
    for page_id in pages.keys():
        #print(page_id) #They are the same The_Sims_4_expansion_packs that's why it has the same page_id
        query_revisions = pages[page_id]["revisions"]
        title = pages[page_id]['title']
        #print(title)
        for rev in query_revisions:
            if "userhidden" in rev.keys():
                continue
            # 1: add a title field for the article because we're going to mix them together
            rev["title"] = title

                # 2: let's "recode" anon so it's true or false instead of present/missing
            if "anon" in rev.keys():
                rev["anon"] = True
            else:
                rev["anon"] = False
            # 3: let's recode "minor" in the same way
            if "minor" in rev.keys():
                rev["minor"] = True
            else:
                rev["minor"] = False

            # we're going to change the timestamp to make it work a little better in excel/spreadsheets
            rev["timestamp"] = rev["timestamp"].replace("T", " ")
            rev["timestamp"] = rev["timestamp"].replace("Z", "")
            #print(rev)
                # finally, save the revisions we've seen to a varaible
            revisions.append(rev)

    return revisions

def countKey(revisions,key):
    total = 0
    if (revisions is not None):
        for rev in revisions:
            if (rev[key] == "True"):
                total= total+1
    return total



In [25]:
print('(i) what proportion of those edits were made by users without accounts ("anon")?')
for key in dict_wikipedia_titles:
    print(key)
    key_revisions=get_revision(dict_wikipedia_titles[key])
    print(key_revisions)
    #print(key_revisions)
    print(countKey(key_revisions,"anon"))
    

(i) what proportion of those edits were made by users without accounts ("anon")?
The_Sims_4_expansion_packs#City_Living
[{'revid': 1148145049, 'parentid': 1147767160, 'user': 'Mythdon', 'timestamp': '2023-04-04 10:25:15', 'size': 77647, 'title': 'The Sims 4 expansion packs', 'anon': False, 'minor': False}, {'revid': 1147767160, 'parentid': 1147754627, 'user': 'DemonStalker', 'timestamp': '2023-04-02 01:39:37', 'size': 77651, 'title': 'The Sims 4 expansion packs', 'anon': False, 'minor': False}, {'revid': 1147754627, 'parentid': 1145044037, 'user': '2600:4040:46C1:5300:883E:435E:E901:8CFA', 'anon': True, 'timestamp': '2023-04-01 23:58:32', 'size': 77615, 'title': 'The Sims 4 expansion packs', 'minor': False}, {'revid': 1145044037, 'parentid': 1145043978, 'user': 'FeFiFo', 'timestamp': '2023-03-17 00:04:24', 'size': 77557, 'title': 'The Sims 4 expansion packs', 'anon': False, 'minor': False}, {'revid': 1145043978, 'parentid': 1145043849, 'user': 'FeFiFo', 'timestamp': '2023-03-17 00:04:0

In [30]:
print('(ii) what proportion of those edits were marked as "minor"?')
for key in dict_wikipedia_titles:
    print(key)
    key_revisions=get_revision(dict_wikipedia_titles[key])
    #print(key_revisions)
    #print(key_revisions)
    print(countKey(key_revisions,"minor"))


(ii) what proportion of those edits were marked as "minor"?
The_Sims_4_expansion_packs#City_Living
0
The_Sims_4_expansion_packs#Cats_&_Dogs
0
The_Sims_4_expansion_packs#Seasons
0
The_Sims_4_expansion_packs#Get_Famous
0
The_Sims_4_expansion_packs#Island_Living
0


In [32]:
print('#(iii) make and share a visualization of the total number of edits across those 5 articles over time (I didnt do this in class but I made the TSV file would allow this).')
file_name = f"The_Sims_4_wikipedia_time_visualizations.tsv"
for key in dict_wikipedia_titles:
    print(key)
    with open(file_name, "w") as f:    
        print("title\ttimestamp", file=f)
        key_revisions=get_revision(dict_wikipedia_titles[key])
        #print(key_revisions)
        #print(key_revisions)
        for revision in key_revisions:
            timestamp = revision["timestamp"]
            print(key,"\t",timestamp, file=f)            
        
print(f"{file_name} created")


#(iii) make and share a visualization of the total number of edits across those 5 articles over time (I didnt do this in class but I made the TSV file would allow this).
The_Sims_4_expansion_packs#City_Living
The_Sims_4_expansion_packs#Cats_&_Dogs
The_Sims_4_expansion_packs#Seasons
The_Sims_4_expansion_packs#Get_Famous
The_Sims_4_expansion_packs#Island_Living
The_Sims_4_wikipedia_time_visualizations.tsv created


In [35]:
# dict_fandom_titles={'The_Sims_4:_City_Living':'',
#                     'The_Sims_4:_Cats_%26_Dogs':'',
#                     'The_Sims_4:_Seasons':'',
#                     'The_Sims_4:_Get_Famous':'',
#                     'The_Sims_4:_Island_Living':''}

for key in dict_fandom_titles:
    response = get_article_revision_json(key,"fandom")
    print(response)
    dict_fandom_titles[key] = json.dumps(response)

[{'batchcomplete': '', 'query': {'normalized': [{'from': 'The_Sims_4:_City_Living', 'to': 'The Sims 4: City Living'}], 'pages': {'163414': {'pageid': 163414, 'ns': 0, 'title': 'The Sims 4: City Living', 'revisions': [{'revid': 1024169, 'parentid': 1019848, 'user': 'Marica Di Campo', 'timestamp': '2023-04-12T08:49:41Z', 'size': 15559}, {'revid': 1019848, 'parentid': 1016140, 'minor': '', 'user': 'MasterTeska', 'timestamp': '2023-02-21T13:45:19Z', 'size': 15562}, {'revid': 1016140, 'parentid': 1016139, 'minor': '', 'user': 'K6ka', 'timestamp': '2022-12-29T17:55:09Z', 'size': 15527}, {'revid': 1016139, 'parentid': 968253, 'user': '80.229.18.148', 'anon': '', 'timestamp': '2022-12-29T17:52:11Z', 'size': 15527}, {'revid': 968253, 'parentid': 968251, 'user': 'AireDaleDogz', 'timestamp': '2022-07-21T15:43:54Z', 'size': 15527}, {'revid': 968251, 'parentid': 965178, 'user': '93.103.37.175', 'anon': '', 'timestamp': '2022-07-21T15:23:22Z', 'size': 15686}, {'revid': 965178, 'parentid': 952366, 'u

In [37]:
print('(i) what proportion of those edits were made by users without accounts ("anon")?')
for key in dict_fandom_titles:
    #print(dict_fandom_titles[key])
    key_revisions=get_revision(dict_fandom_titles[key])
    print(key)
    print(countKey(key_revisions,"anon"))
    

(i) what proportion of those edits were made by users without accounts ("anon")?
The_Sims_4:_City_Living
0
The_Sims_4:_Seasons
0
The_Sims_4:_Get_Famous
0
The_Sims_4:_Island_Living
0


In [39]:
print('(ii) what proportion of those edits were marked as "minor"?')
for key in dict_fandom_titles:
    print(key)
    key_revisions=get_revision(dict_fandom_titles[key])
    #print(key_revisions)
    #print(key_revisions)
    print(countKey(key_revisions,"minor"))


(ii) what proportion of those edits were marked as "minor"?
The_Sims_4:_City_Living
0
The_Sims_4:_Seasons
0
The_Sims_4:_Get_Famous
0
The_Sims_4:_Island_Living
0


In [42]:
print('#(iii) make and share a visualization of the total number of edits across those 5 articles over time (I didnt do this in class but I made the TSV file would allow this).')
file_name = f"The_Sims_4_fandom_time_visualizations.tsv"
for key in dict_fandom_titles:
    print(key)
    with open(file_name, "w") as f:    
        print("title\ttimestamp", file=f)
        key_revisions=get_revision(dict_fandom_titles[key])
        #print(key_revisions)
        #print(key_revisions)
        for revision in key_revisions:
            timestamp = revision["timestamp"]
            print(key,"\t",timestamp, file=f)            
        
print(f"{file_name} created")

#(iii) make and share a visualization of the total number of edits across those 5 articles over time (I didnt do this in class but I made the TSV file would allow this).
The_Sims_4:_City_Living
The_Sims_4:_Seasons
The_Sims_4:_Get_Famous
The_Sims_4:_Island_Living
The_Sims_4_fandom_time_visualizations.tsv created


In [43]:
#Finally, choose either your Wikipedia or Fandom datasets as the data source for a visualization that shows how each of those articles have grown in length (as measured in characters or "bytes") over time. (Hint: you'll need to return "size" as one of the revision properties (rvprop) if you are not doing it already.)
file_name = f"The_Sims_4_fandom_size_visualizations.tsv"
for key in dict_fandom_titles:
    print(key)
    
    with open(file_name, "w") as f:    
        print("title\ttimestamp\tsize", file=f)
        key_revisions=get_revision(dict_fandom_titles[key])
        #print(key_revisions)
        #print(key_revisions)        
        for revision in key_revisions:
            timestamp = revision["timestamp"]
            size = revision["size"]
            print(key,"\t", timestamp,"\t",size, file=f)
        

print(f"{file_name} created")

The_Sims_4:_City_Living
The_Sims_4:_Seasons
The_Sims_4:_Get_Famous
The_Sims_4:_Island_Living
The_Sims_4_fandom_size_visualizations.tsv created
