# Graphing Recent Changes on a Wiki #

I wanted to determine how frequently users edit Talk pages on [Fanlore](https://www.fanlore.org), a fan-created and fan-run encyclopedia that is part of the Organization for Transformative Works. Talk pages are open discussion spaces tied to each wiki page; editors use them to propose edits and leave notes for other editors.

You can view Fanlore's Recent Changes page, filtered to Talk pages, [here](https://fanlore.org/w/index.php?namespace=1&tagfilter=&title=Special%3ARecentChanges).

For this project, I used the following libraries:

```python
import requests
import json
from collections import Counter
import plotly.express as px
```

To query the API with the requests module, I first set up the request parameters in a dictionary:

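```python
from datetime import datetime, timedelta, timezone

# rcend expects a timestamp, not a number of seconds: stop listing at the moment 90 days ago (UTC)
rangeEnd = (datetime.now(timezone.utc) - timedelta(days=90)).strftime("%Y-%m-%dT%H:%M:%SZ")

# dictionary with the parameters for the API request
parameters = {
    "format": "json",
    "action": "query",
    "list": "recentchanges",
    "rcnamespace": 1,                       # 1 = the Talk namespace
    "rcprop": "timestamp|title|ids|user",
    "rcstart": "now",                       # newest end of the range (listing runs newest to oldest)
    "rcend": rangeEnd,                      # oldest end of the range
    "rclimit": 500,                         # maximum entries per request for ordinary users
}
```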

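"Namespace 1" is MediaWiki's Talk namespace, which holds the talk pages attached to ordinary (main namespace) articles; other discussion spaces, such as User talk, have their own namespace numbers. Entries stay on the Recent Changes list for 90 days (MediaWiki's default retention period), so that is the furthest back we can look. The API also returns at most 500 entries per request; higher limits apply only to accounts with special rights, such as bots.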
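If the Talk pages ever see more than 500 edits in the window, a single request will not return everything. The MediaWiki API signals that more results exist by including a continue object in its response; copying its values (for this list, rccontinue) into the next request fetches the next batch. Below is a minimal sketch of that loop. The function name iter_recent_changes and the get_json helper it expects are hypothetical, introduced here for illustration; get_json stands for whatever sends the parameters to the API and returns the decoded JSON.

```python
from typing import Callable, Dict, Iterator


def iter_recent_changes(get_json: Callable[[Dict], Dict], params: Dict) -> Iterator[Dict]:
    """Yield every recentchanges entry, following the API's continuation tokens.

    get_json (hypothetical) sends one set of query parameters to api.php
    and returns the decoded JSON response as a dictionary.
    """
    request = dict(params)
    while True:
        data = get_json(request)
        # Entries for this batch live under query -> recentchanges.
        yield from data.get("query", {}).get("recentchanges", [])
        if "continue" not in data:
            break  # no continue object: this was the last batch
        # Copy the continuation values (e.g. rccontinue) into the next request.
        request = {**params, **data["continue"]}
```

In this notebook, get_json could be something like lambda p: requests.get("https://fanlore.org/w/api.php", params=p).json(), and list(iter_recent_changes(get_json, parameters)) would then collect every entry. Passing the HTTP call in as a function keeps the paging logic separate from the network code, which also makes it easy to try out on canned responses.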

```python
tsList = []  # empty list to store the timestamps we will collect later

# ask the Fanlore server for the data and decode the JSON response into a dictionary
r = requests.get("https://fanlore.org/w/api.php", params=parameters)
print("Paging fanlore - response is: ", r.status_code)
scrape = r.json()
```

```
Paging fanlore - response is:  200
```


A status code of 200 ("OK") means the server accepted our request and sent the data back.
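One caveat: a 200 only means the HTTP request went through. The MediaWiki Action API usually reports problems with the query itself, such as a malformed timestamp, inside the JSON body while still answering with status 200: errors appear under an error key and non-fatal issues under a warnings key. A small check along these lines, run on scrape before using it, would catch that. The helper name check_api_response is hypothetical; it is not part of requests or MediaWiki.

```python
from typing import Any, Dict


def check_api_response(data: Dict[str, Any]) -> Dict[str, Any]:
    """Raise if a decoded MediaWiki API response reports an error; otherwise return it."""
    if "error" in data:
        err = data["error"]
        raise RuntimeError(f"API error {err.get('code')}: {err.get('info')}")
    if "warnings" in data:
        # Warnings do not stop the query, but they are worth seeing while developing.
        print("API warnings:", data["warnings"])
    return data
```

Used as scrape = check_api_response(r.json()), it returns the same dictionary when nothing went wrong, so the rest of the notebook is unaffected.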

Now that the response has been parsed into a dictionary, we can loop through the list of changes stored under its 'query' and 'recentchanges' keys, pulling out the timestamp of each edit and appending it to the tsList variable we created before.

```python
for change in scrape['query']['recentchanges']:
    tsList.append(change['timestamp'])
```

I also wanted to determine the total number of edits, as well as the earliest and latest dates. Because the API returns changes in reverse chronological order (newest first) by default, this was fairly easy to do: the first item on the list is the latest edit and the last item is the earliest.

```python
print(f"Retrieved {len(tsList)} edits from the server. (Maximum is 500)")
print(f"The latest date is {tsList[0]}. The earliest date is {tsList[-1]}")
```

```
Retrieved 478 edits from the server. (Maximum is 500)
The latest date is 2021-08-31T13:53:28Z. The earliest date is 2021-06-02T19:44:54Z
```
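Relying on the first and last items works because of the API's default newest-first ordering (rcdir=older); with rcdir=newer the two printed labels would be swapped. Since these timestamps all share the same fixed-width ISO 8601 format in UTC, comparing them as strings compares them chronologically, so min() and max() give the range regardless of order. A minimal sketch; date_range is a hypothetical helper name, and the input is the two timestamps printed above.

```python
from typing import List, Tuple


def date_range(timestamps: List[str]) -> Tuple[str, str]:
    """Return (earliest, latest) for timestamps like '2021-08-31T13:53:28Z'."""
    # Same fixed-width UTC format throughout, so string order == chronological order.
    return min(timestamps), max(timestamps)


# The two timestamps printed above, in the API's newest-first order.
earliest, latest = date_range(["2021-08-31T13:53:28Z", "2021-06-02T19:44:54Z"])
print(f"The latest date is {latest}. The earliest date is {earliest}")
```

This prints the same range as the cell above, but would keep doing so if the list arrived in a different order.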


As you can see above, the timestamps are in ISO 8601 format and include the time of each edit as well as the date. Since I didn't need the time, I split each timestamp at the "T" character and stored the dates in their own list.

```python
tsListTrimmed = []

for item in tsList:
    tsListTrimmed.append(item.split('T')[0])
```
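Splitting on "T" works because every timestamp has the same shape. A slightly stricter alternative, sketched below, parses each timestamp with datetime.strptime and keeps only the date, so a malformed value raises an error instead of passing through silently. The helper name to_dates is hypothetical. Either way the results are UTC calendar days, since the API reports times in UTC (the trailing Z); an edit made late in the evening in the Americas may therefore be counted on the following day.

```python
from datetime import date, datetime
from typing import List


def to_dates(timestamps: List[str]) -> List[date]:
    """Convert MediaWiki timestamps such as '2021-08-31T13:53:28Z' to date objects."""
    # strptime checks the whole string against the format; the literal Z must be present.
    return [datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").date() for ts in timestamps]


print(to_dates(["2021-08-31T13:53:28Z"]))  # [datetime.date(2021, 8, 31)]
```

Note that this returns date objects rather than strings, so any later step that expects "YYYY-MM-DD" strings would need date.isoformat() applied first.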


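Now that we have a clean list of dates, we can use a Counter object (from Python's collections module) to determine how often each date occurs. This creates a dictionary-like object whose keys are the dates and whose values are the number of edits on each date. A Counter keeps its keys in the order they were first seen, which here is newest first; only most_common() (and the way a Counter is printed) sorts by frequency. To plot a line graph that reads from left to right, we convert it to a regular dictionary with the dates sorted chronologically. Because the dates are in YYYY-MM-DD format, sorting them as strings also sorts them by time.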

```python
tsCounter = Counter(tsListTrimmed)           # date -> number of edits, in first-seen (newest-first) order
graphData = dict(sorted(tsCounter.items()))  # same counts, sorted chronologically by date
```
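To see what the Counter step does, here it is on a small, made-up list of dates in newest-first order (illustrative values, not Fanlore data). It shows the three orderings discussed above: first-seen order when iterating, frequency order from most_common(), and chronological order after sorting.

```python
from collections import Counter

# Made-up sample dates, newest first like the API response.
dates = ["2021-08-31", "2021-08-31", "2021-08-30", "2021-08-28", "2021-08-28", "2021-08-28"]
counts = Counter(dates)

# Iteration keeps first-seen order:
print(list(counts.items()))  # [('2021-08-31', 2), ('2021-08-30', 1), ('2021-08-28', 3)]

# Sorting by frequency only happens on request:
print(counts.most_common(1))  # [('2021-08-28', 3)]

# ISO dates sort chronologically as plain strings:
print(dict(sorted(counts.items())))  # {'2021-08-28': 3, '2021-08-30': 1, '2021-08-31': 2}
```

Sorting the items before calling dict() is what gives a left-to-right chronological order for plotting.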


Now we are ready to start creating our graph. You can hover over the line to see individual values for each day, or use the slider below the graph to zoom in and isolate a particular range.

```python
x_val = list(graphData.keys())  # dates
y_val = list(graphData.values())  # freq

print("Making the graph... Check your browser!")

graph = px.line(
    x=x_val,
    y=y_val,
    title="Frequency of Talk Page edits on Fanlore.org, last 90 days",
    labels={"x": "Date", "y": "# of Edits"})

graph.update_xaxes(rangeslider_visible=True)
graph.show()
```

```
Making the graph... Check your browser!
```
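The counts only include dates that actually appear in the data, so a day with no Talk page edits is missing from graphData entirely, and the line joins the neighbouring days directly instead of dropping to zero. If zero-edit days should show up on the graph, the gaps can be filled in first. A minimal sketch, assuming the date-to-count dictionary uses YYYY-MM-DD strings as above; fill_missing_days is a hypothetical helper name.

```python
from datetime import date, timedelta
from typing import Dict


def fill_missing_days(counts: Dict[str, int]) -> Dict[str, int]:
    """Return a chronological {YYYY-MM-DD: count} dict with 0 for days that had no edits."""
    if not counts:
        return {}
    days = sorted(date.fromisoformat(d) for d in counts)
    filled = {}
    current = days[0]
    # Walk one day at a time from the earliest to the latest date in the data.
    while current <= days[-1]:
        key = current.isoformat()
        filled[key] = counts.get(key, 0)
        current += timedelta(days=1)
    return filled


print(fill_missing_days({"2021-08-31": 2, "2021-08-28": 3}))
# {'2021-08-28': 3, '2021-08-29': 0, '2021-08-30': 0, '2021-08-31': 2}
```

The filled dictionary could be plotted the same way as graphData, using its keys as x_val and its values as y_val.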
