## Data Acquisition Document

This document describes the data acquisition process.

The code is adapted from the example notebook provided by Dr. McDonald and retrieves article pageview data from the Wikimedia Pageviews REST API. The dataset is built from a list of article titles for Academy Award-winning movies. I removed the title `Victor/Victoria` from the list since it is not a valid title.

In [1]:
import json, time, urllib.parse
import requests

In [6]:
import pandas as pd

### List access

Here, I read in the Excel file containing all the article titles that we want to analyze.

In [33]:
articles = pd.read_excel('thank_the_academy.AUG.2023.csv.xlsx')
articles.name

0                Everything Everywhere All at Once
1       All Quiet on the Western Front (2022 film)
2                            The Whale (2022 film)
3                                Top Gun: Maverick
4                   Black Panther: Wakanda Forever
                           ...                    
1353                       The Yankee Doodle Mouse
1354                      The Yearling (1946 film)
1355                 Yesterday, Today and Tomorrow
1356             You Can't Take It with You (film)
1357                        Zorba the Greek (film)
Name: name, Length: 1358, dtype: object

### API access

The following code is from `wp_article_views_example`, provided by Dr. McDonald. The constants configure the API calls, and the function returns a JSON response containing the pageview data needed for the analysis.

In [44]:
# CONSTANTS
API_REQUEST_PAGEVIEWS_ENDPOINT = 'https://wikimedia.org/api/rest_v1/metrics/pageviews/'
API_REQUEST_PER_ARTICLE_PARAMS = 'per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}'

# The Pageviews API asks that we not exceed 100 requests per second, we add a small delay to each request
API_LATENCY_ASSUMED = 0.002       # Assuming roughly 2ms latency on the API and network
API_THROTTLE_WAIT = (1.0/100.0)-API_LATENCY_ASSUMED

REQUEST_HEADERS = {
    'User-Agent': '<gjwong@uw.edu>, University of Washington, MSDS DATA 512 - AUTUMN 2023',
}


# EXAMPLES to REPRODUCE
# This is just a list of English Wikipedia article titles that we can use for example requests
# ARTICLE_TITLES = [ 'Bison', 'Northern flicker', 'Red squirrel', 'Chinook salmon', 'Horseshoe bat' ]
ARTICLE_TITLES = articles.name
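
As a quick check of the throttling constants above, the arithmetic can be worked out directly. This is a minimal sketch: the request count assumes the 1,358 titles loaded earlier and one request per access type (desktop, mobile-app, mobile-web), and `total_requests` and `minimum_sleep_seconds` are names introduced here for illustration only.

```python
# Constants copied from the cell above
API_LATENCY_ASSUMED = 0.002
API_THROTTLE_WAIT = (1.0 / 100.0) - API_LATENCY_ASSUMED

# Assumption for illustration: 1,358 titles, one request per access type
total_requests = 1358 * 3
minimum_sleep_seconds = total_requests * API_THROTTLE_WAIT

print(f"wait per request: {API_THROTTLE_WAIT:.3f} s")
print(f"requests: {total_requests}, time spent sleeping: {minimum_sleep_seconds:.1f} s")
```

The sleep is only a lower bound on the run time, since each request also waits on the network.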


In [45]:
# FUNCTIONS
def request_pageviews_per_article(article_title = None, 
                                  endpoint_url = API_REQUEST_PAGEVIEWS_ENDPOINT, 
                                  endpoint_params = API_REQUEST_PER_ARTICLE_PARAMS, 
                                  request_template = None,
                                  headers = REQUEST_HEADERS):

    # fall back to the desktop template; it is looked up at call time so that
    # this cell can run before the cell that defines the template
    if request_template is None:
        request_template = ARTICLE_PAGEVIEWS_PARAMS_TEMPLATE

    # work on a copy so the caller's template is never modified between requests
    request_params = dict(request_template)

    # article title can be as a parameter to the call or in the request_template
    if article_title:
        request_params['article'] = article_title

    if not request_params['article']:
        raise Exception("Must supply an article title to make a pageviews request.")

    # Titles are supposed to have spaces replaced with "_" and be URL encoded
    article_title_encoded = urllib.parse.quote(request_params['article'].replace(' ','_'))
    request_params['article'] = article_title_encoded
    
    # now, create a request URL by combining the endpoint_url with the parameters for the request
    request_url = endpoint_url+endpoint_params.format(**request_params)
    
    # make the request
    try:
        # we'll wait first, to make sure we don't exceed the limit in the situation where an exception
        # occurs during the request processing - throttling is always a good practice with a free
        # data source like Wikipedia - or other community sources
        if API_THROTTLE_WAIT > 0.0:
            time.sleep(API_THROTTLE_WAIT)
        response = requests.get(request_url, headers=headers)
        json_response = response.json()
    except Exception as e:
        # report the failure and signal it to the caller with None
        print(e)
        json_response = None
    return json_response
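
The title-encoding and URL-building steps of the function above can be tried in isolation, without sending a request. The sketch below is illustrative only: `build_pageviews_url` and its `encode_slash` flag are hypothetical helpers, not part of the original example. It also shows that `urllib.parse.quote` leaves `/` unescaped by default, so a title containing a slash adds an extra path segment unless `safe=''` is passed. Whether the API accepts the `%2F` form for a given title is an assumption to verify against the API documentation.

```python
import urllib.parse

ENDPOINT = 'https://wikimedia.org/api/rest_v1/metrics/pageviews/'
PARAMS = 'per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}'

def build_pageviews_url(title, template, encode_slash=False):
    """Hypothetical helper: build the request URL for one title."""
    params = dict(template)                 # copy, the template stays untouched
    safe = '' if encode_slash else '/'      # quote() keeps '/' unescaped by default
    params['article'] = urllib.parse.quote(title.replace(' ', '_'), safe=safe)
    return ENDPOINT + PARAMS.format(**params)

template = {
    "project": "en.wikipedia.org", "access": "desktop", "agent": "user",
    "article": "", "granularity": "monthly",
    "start": "2015010100", "end": "2023100100",
}

print(build_pageviews_url('Top Gun: Maverick', template))
print(build_pageviews_url('Victor/Victoria', template))
print(build_pageviews_url('Victor/Victoria', template, encode_slash=True))
```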


### Test

The following cells run a simple test of the function above: they request the pageview data for one Wikipedia article, using desktop access and a time range from 2015-01 to 2023-09, and write the result to a `sample.json` file.

In [None]:
ARTICLE_PAGEVIEWS_PARAMS_TEMPLATE = {
    "project":     "en.wikipedia.org",
    "access":      "desktop",      # this should be changed for the different access types
    "agent":       "user",
    "article":     "",             # this value will be set/changed before each request
    "granularity": "monthly",
    "start":       "2015010100",   # start and end dates need to be set
    "end":         "2023100100"    
}
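
The `start` and `end` values in the template use a `YYYYMMDDHH` string. As a small sketch of working with that format, the helpers below (hypothetical names, not part of the original example) build such strings and count the months between two of them. Counting from 2020-01, the first month returned for the test article below, through 2023-09 gives 45, assuming there are no gaps in the returned months.

```python
from datetime import datetime

def to_api_timestamp(year, month):
    """Hypothetical helper: first day of a month as a YYYYMMDDHH string."""
    return f"{year:04d}{month:02d}0100"

def parse_api_timestamp(timestamp):
    """Parse a YYYYMMDDHH string into a datetime."""
    return datetime.strptime(timestamp, "%Y%m%d%H")

def months_between(first, last):
    """Number of months from first to last, both inclusive."""
    a, b = parse_api_timestamp(first), parse_api_timestamp(last)
    return (b.year - a.year) * 12 + (b.month - a.month) + 1

start = to_api_timestamp(2015, 1)    # same value as "start" in the template
end = to_api_timestamp(2023, 10)     # same value as "end" in the template
print(start, end, months_between("2020010100", "2023090100"))
```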


In [18]:
print("Getting pageview data for: ",ARTICLE_TITLES[0])
views = request_pageviews_per_article(ARTICLE_TITLES[0])

Getting pageview data for:  Everything Everywhere All at Once


In [25]:
views['items']

[{'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020010100',
  'access': 'desktop',
  'agent': 'user',
  'views': 1209},
 {'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020020100',
  'access': 'desktop',
  'agent': 'user',
  'views': 2944},
 {'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020030100',
  'access': 'desktop',
  'agent': 'user',
  'views': 2612},
 {'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020040100',
  'access': 'desktop',
  'agent': 'user',
  'views': 4530},
 {'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020050100',
  'access': 'desktop',
  'agent': 'user',
  'views': 3952},
 {'project': 'en.wik

In [19]:
print("Have %d months of pageview data"%(len(views['items'])))
for month in views['items']:
    print(json.dumps(month,indent=4))

Have 45 months of pageview data
{
    "project": "en.wikipedia",
    "article": "Everything_Everywhere_All_at_Once",
    "granularity": "monthly",
    "timestamp": "2020010100",
    "access": "desktop",
    "agent": "user",
    "views": 1209
}
{
    "project": "en.wikipedia",
    "article": "Everything_Everywhere_All_at_Once",
    "granularity": "monthly",
    "timestamp": "2020020100",
    "access": "desktop",
    "agent": "user",
    "views": 2944
}
{
    "project": "en.wikipedia",
    "article": "Everything_Everywhere_All_at_Once",
    "granularity": "monthly",
    "timestamp": "2020030100",
    "access": "desktop",
    "agent": "user",
    "views": 2612
}
{
    "project": "en.wikipedia",
    "article": "Everything_Everywhere_All_at_Once",
    "granularity": "monthly",
    "timestamp": "2020040100",
    "access": "desktop",
    "agent": "user",
    "views": 4530
}
{
    "project": "en.wikipedia",
    "article": "Everything_Everywhere_All_at_Once",
    "granularity": "monthly",
    "

In [28]:
with open("sample.json", "w") as f:
    # write the whole list in one call so the file is a single valid JSON document
    json.dump(views['items'], f, indent=4)
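
A file meant to be read back with `json.load` needs to hold a single JSON document. The sketch below uses two records trimmed from the output above and a temporary path, both chosen for illustration. It suggests that dumping the whole list in one call round-trips cleanly, while records dumped one after another do not parse as one document.

```python
import json
import os
import tempfile

# Two records trimmed from the test output above
items = [
    {"timestamp": "2020010100", "views": 1209},
    {"timestamp": "2020020100", "views": 2944},
]

# Dump the whole list once, then read it back
path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(path, "w") as f:
    json.dump(items, f, indent=4)
with open(path) as f:
    loaded = json.load(f)

# Records serialized back to back form several documents, not one
concatenated = "".join(json.dumps(month, indent=4) for month in items)
try:
    json.loads(concatenated)
    concatenated_is_valid = True
except json.JSONDecodeError:
    concatenated_is_valid = False

print(loaded == items, concatenated_is_valid)
```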

### Desktop

The following code retrieves all the view count data for desktop access and saves it to a JSON file called `academy_monthly_desktop_201501-202309.json`. The time range of this file is from 2015-01 to 2023-09.

In [37]:
monthly_desktop = {}
for article in ARTICLE_TITLES:
    print("Getting pageview data for: ", article)
    views = request_pageviews_per_article(article)
    monthly_desktop[article] = views['items']

with open("academy_monthly_desktop_201501-202309.json", "w") as f:
    json.dump(monthly_desktop, f, indent=4)

Getting pageview data for:  Everything Everywhere All at Once
Getting pageview data for:  All Quiet on the Western Front (2022 film)
Getting pageview data for:  The Whale (2022 film)
Getting pageview data for:  Top Gun: Maverick
Getting pageview data for:  Black Panther: Wakanda Forever
Getting pageview data for:  Avatar: The Way of Water
Getting pageview data for:  Women Talking (film)
Getting pageview data for:  Guillermo del Toro's Pinocchio
Getting pageview data for:  Navalny (film)
Getting pageview data for:  The Elephant Whisperers
Getting pageview data for:  An Irish Goodbye
Getting pageview data for:  The Boy, the Mole, the Fox and the Horse (film)
Getting pageview data for:  RRR (film)
Getting pageview data for:  CODA (2021 film)
Getting pageview data for:  Dune (2021 film)
Getting pageview data for:  The Eyes of Tammy Faye (2021 film)
Getting pageview data for:  No Time to Die
Getting pageview data for:  The Windshield Wiper
Getting pageview data for:  The Long Goodbye (Riz A

Getting pageview data for:  The Help (film)
Getting pageview data for:  A Separation
Getting pageview data for:  The Fantastic Flying Books of Mr. Morris Lessmore
Getting pageview data for:  The Shore (2011 film)
Getting pageview data for:  Undefeated (2011 film)
Getting pageview data for:  The Muppets (film)
Getting pageview data for:  Saving Face (2012 film)
Getting pageview data for:  Beginners
Getting pageview data for:  Rango (2011 film)
Getting pageview data for:  The King's Speech
Getting pageview data for:  Inception
Getting pageview data for:  The Social Network
Getting pageview data for:  The Fighter
Getting pageview data for:  Toy Story 3
Getting pageview data for:  Alice in Wonderland (2010 film)
Getting pageview data for:  Black Swan (film)
Getting pageview data for:  In a Better World
Getting pageview data for:  The Lost Thing
Getting pageview data for:  God of Love (film)
Getting pageview data for:  The Wolfman (2010 film)
Getting pageview data for:  Strangers No More
Ge

Getting pageview data for:  Father and Daughter (film)
Getting pageview data for:  Into the Arms of Strangers: Stories of the Kindertransport
Getting pageview data for:  Quiero ser (I want to be...)
Getting pageview data for:  Big Mama (film)
Getting pageview data for:  American Beauty (1999 film)
Getting pageview data for:  The Matrix
Getting pageview data for:  The Cider House Rules (film)
Getting pageview data for:  Topsy-Turvy
Getting pageview data for:  Sleepy Hollow (film)
Getting pageview data for:  Boys Don't Cry (1999 film)
Getting pageview data for:  Tarzan (1999 film)
Getting pageview data for:  One Day in September
Getting pageview data for:  The Red Violin
Getting pageview data for:  The Old Man and the Sea (1999 film)
Getting pageview data for:  My Mother Dreams the Satan's Disciples in New York
Getting pageview data for:  King Gimp
Getting pageview data for:  Girl, Interrupted (film)
Getting pageview data for:  All About My Mother
Getting pageview data for:  Shakespeare 

Getting pageview data for:  A Fish Called Wanda
Getting pageview data for:  Pelle the Conqueror
Getting pageview data for:  The Accused (1988 film)
Getting pageview data for:  The Appointments of Dennis Jennings
Getting pageview data for:  Beetlejuice
Getting pageview data for:  Bird (1988 film)
Getting pageview data for:  Hôtel Terminus: The Life and Times of Klaus Barbie
Getting pageview data for:  The Milagro Beanfield War
Getting pageview data for:  Tin Toy
Getting pageview data for:  You Don't Have to Die
Getting pageview data for:  The Last Emperor
Getting pageview data for:  Moonstruck
Getting pageview data for:  The Untouchables (film)
Getting pageview data for:  Babette's Feast
Getting pageview data for:  Dirty Dancing
Getting pageview data for:  Harry and the Hendersons
Getting pageview data for:  Innerspace
Getting pageview data for:  The Man Who Planted Trees (film)
Getting pageview data for:  Ray's Male Heterosexual Dance Hall
Getting pageview data for:  The Ten-Year Lunch

Getting pageview data for:  Number Our Days
Getting pageview data for:  King Kong (1976 film)
Getting pageview data for:  Logan's Run (film)
Getting pageview data for:  One Flew Over the Cuckoo's Nest (film)
Getting pageview data for:  Barry Lyndon
Getting pageview data for:  Jaws (film)
Getting pageview data for:  Dog Day Afternoon
Getting pageview data for:  Nashville (film)
Getting pageview data for:  Shampoo (film)
Getting pageview data for:  The Sunshine Boys (1975 film)
Getting pageview data for:  Angel and Big Joe
Getting pageview data for:  Dersu Uzala (1975 film)
Getting pageview data for:  The End of the Game (1975 film)
Getting pageview data for:  Great (1975 film)
Getting pageview data for:  The Man Who Skied Down Everest
Getting pageview data for:  The Hindenburg (film)
Getting pageview data for:  The Godfather: Part II
Getting pageview data for:  The Towering Inferno
Getting pageview data for:  The Great Gatsby (1974 film)
Getting pageview data for:  Chinatown (1974 film)

Getting pageview data for:  All About Eve
Getting pageview data for:  All Quiet on the Western Front (1930 film)
Getting pageview data for:  The Devil and Daniel Webster (film)
Getting pageview data for:  All the King's Men (1949 film)
Getting pageview data for:  Ama Girls
Getting pageview data for:  America America
Getting pageview data for:  An American in Paris (film)
Getting pageview data for:  Amphibious Fighters
Getting pageview data for:  Anastasia (1956 film)
Getting pageview data for:  Anchors Aweigh (film)
Getting pageview data for:  Anna and the King of Siam (film)
Getting pageview data for:  Annie Get Your Gun (film)
Getting pageview data for:  Anthony Adverse
Getting pageview data for:  The Apartment
Getting pageview data for:  Aquatic House Party
Getting pageview data for:  Arise, My Love
Getting pageview data for:  Around the World in 80 Days (1956 film)
Getting pageview data for:  The Awful Truth
Getting pageview data for:  The Bachelor and the Bobby-Soxer
Getting pagev

Getting pageview data for:  Goldfinger (film)
Getting pageview data for:  Gone with the Wind (film)
Getting pageview data for:  The Good Earth (film)
Getting pageview data for:  Goodbye, Miss Turlock
Getting pageview data for:  Goodbye, Mr. Chips (1939 film)
Getting pageview data for:  Grand Canyon (1958 film)
Getting pageview data for:  Grand Hotel (1932 film)
Getting pageview data for:  Grandad of Races
Getting pageview data for:  The Grapes of Wrath (film)
Getting pageview data for:  The Great Caruso
Getting pageview data for:  Great Expectations (1946 film)
Getting pageview data for:  The Great Lie
Getting pageview data for:  The Great McGinty
Getting pageview data for:  The Great Waltz (1938 film)
Getting pageview data for:  The Great Ziegfeld
Getting pageview data for:  The Greatest Show on Earth (film)
Getting pageview data for:  Green Dolphin Street (film)
Getting pageview data for:  The Guns of Navarone (film)
Getting pageview data for:  Hamlet (1948 film)
Getting pageview dat

Getting pageview data for:  The Paleface (1948 film)
Getting pageview data for:  Panic in the Streets (film)
Getting pageview data for:  Papa's Delicate Condition
Getting pageview data for:  The Patriot (1928 film)
Getting pageview data for:  Penny Wisdom
Getting pageview data for:  Phantom of the Opera (1943 film)
Getting pageview data for:  The Philadelphia Story (film)
Getting pageview data for:  Picnic (1955 film)
Getting pageview data for:  The Picture of Dorian Gray (1945 film)
Getting pageview data for:  Pillow Talk (film)
Getting pageview data for:  The Pink Phink
Getting pageview data for:  Pinocchio (1940 film)
Getting pageview data for:  A Place in the Sun (1951 film)
Getting pageview data for:  Plymouth Adventure
Getting pageview data for:  Pollyanna (1960 film)
Getting pageview data for:  Porgy and Bess (film)
Getting pageview data for:  Portrait of Jennie
Getting pageview data for:  Prelude to War
Getting pageview data for:  Pride and Prejudice (1940 film)
Getting pagevie

Getting pageview data for:  Viva Zapata!
Getting pageview data for:  Waikiki Wedding
Getting pageview data for:  The Walls of Malapaga
Getting pageview data for:  The War of the Worlds (1953 film)
Getting pageview data for:  Watch on the Rhine
Getting pageview data for:  Water Birds
Getting pageview data for:  The Way of All Flesh (1927 film)
Getting pageview data for:  West Side Story (1961 film)
Getting pageview data for:  The Westerner (1940 film)
Getting pageview data for:  The Wetback Hound
Getting pageview data for:  What Ever Happened to Baby Jane? (1962 film)
Getting pageview data for:  When Tomorrow Comes (film)
Getting pageview data for:  When Magoo Flew
Getting pageview data for:  When Worlds Collide (1951 film)
Getting pageview data for:  White Shadows in the South Seas
Getting pageview data for:  White Wilderness (film)
Getting pageview data for:  Who's Who in Animal Land
Getting pageview data for:  Why Korea?
Getting pageview data for:  Wilson (1944 film)
Getting pageview
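
The desktop retrieval loop above assumes every request returns a dictionary with an `items` list. Because `request_pageviews_per_article` returns `None` when an exception occurs, a more defensive loop could record failed titles instead of stopping. The following is a minimal sketch under that assumption: `collect_pageviews` and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in that makes no network call and returns placeholder records.

```python
def collect_pageviews(titles, fetch):
    """Call fetch(title) for each title; keep 'items' lists, note failures."""
    collected, failed = {}, []
    for title in titles:
        response = fetch(title)
        if response and "items" in response:
            collected[title] = response["items"]
        else:
            failed.append(title)
    return collected, failed

def fake_fetch(title):
    """Stand-in for request_pageviews_per_article (no network access)."""
    if "/" in title:
        return None  # simulate a request that failed
    return {"items": [{"article": title.replace(" ", "_")}]}  # placeholder record

collected, failed = collect_pageviews(
    ["The Matrix", "Victor/Victoria", "Jaws (film)"], fake_fetch
)
print(sorted(collected), failed)
```

Passing the real request function as `fetch` would keep the loop unchanged while making failures visible at the end of the run.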

### Mobile

The following code retrieves all the view count data for mobile access and saves it to a JSON file called `academy_monthly_mobile_201501-202309.json`. The time range of this file is from 2015-01 to 2023-09.

Since the API documentation doesn't provide a single call that returns all mobile data, I made two separate calls: one for `mobile-app` access and one for `mobile-web` access.

In [48]:
ARTICLE_PAGEVIEWS_PARAMS_MOBILE_APP = {
    "project":     "en.wikipedia.org",
    "access":      "mobile-app",      # this should be changed for the different access types
    "agent":       "user",
    "article":     "",             # this value will be set/changed before each request
    "granularity": "monthly",
    "start":       "2015010100",   # start and end dates need to be set
    "end":         "2023100100"    # this is likely the wrong end date
}

ARTICLE_PAGEVIEWS_PARAMS_MOBILE_WEB = {
    "project":     "en.wikipedia.org",
    "access":      "mobile-web",      # this should be changed for the different access types
    "agent":       "user",
    "article":     "",             # this value will be set/changed before each request
    "granularity": "monthly",
    "start":       "2015010100",   # start and end dates need to be set
    "end":         "2023100100"    # this is likely the wrong end date
}
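
The access-type templates differ only in their `access` value, so one way to avoid repeating the other fields is to derive them from a shared base. This is a sketch of that idea rather than the approach used in this notebook; `BASE_TEMPLATE`, `ACCESS_TYPES`, and `TEMPLATES` are names introduced here for illustration.

```python
# Fields shared by every request, copied from the templates above
BASE_TEMPLATE = {
    "project":     "en.wikipedia.org",
    "agent":       "user",
    "article":     "",
    "granularity": "monthly",
    "start":       "2015010100",
    "end":         "2023100100",
}

ACCESS_TYPES = ["desktop", "mobile-app", "mobile-web"]

# One independent dictionary per access type
TEMPLATES = {access: {**BASE_TEMPLATE, "access": access} for access in ACCESS_TYPES}

for access, template in TEMPLATES.items():
    print(access, template["start"], template["end"])
```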

In [67]:
print("Getting pageview data for: ",ARTICLE_TITLES[0])
views_app = request_pageviews_per_article(ARTICLE_TITLES[0], 
                                      request_template=ARTICLE_PAGEVIEWS_PARAMS_MOBILE_APP)

views_web = request_pageviews_per_article(ARTICLE_TITLES[0], 
                                      request_template=ARTICLE_PAGEVIEWS_PARAMS_MOBILE_WEB)
views_app['items']

Getting pageview data for:  Everything Everywhere All at Once


[{'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020010100',
  'access': 'mobile-app',
  'agent': 'user',
  'views': 65},
 {'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020020100',
  'access': 'mobile-app',
  'agent': 'user',
  'views': 152},
 {'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020030100',
  'access': 'mobile-app',
  'agent': 'user',
  'views': 120},
 {'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020040100',
  'access': 'mobile-app',
  'agent': 'user',
  'views': 284},
 {'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020050100',
  'access': 'mobile-app',
  'agent': 'user',
  'views': 231},
 {'project'

In [52]:
views_web['items']

[{'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020010100',
  'access': 'mobile-web',
  'agent': 'user',
  'views': 2241},
 {'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020020100',
  'access': 'mobile-web',
  'agent': 'user',
  'views': 4955},
 {'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020030100',
  'access': 'mobile-web',
  'agent': 'user',
  'views': 4427},
 {'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020040100',
  'access': 'mobile-web',
  'agent': 'user',
  'views': 9540},
 {'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020050100',
  'access': 'mobile-web',
  'agent': 'user',
  'views': 7878},
 {'pr

After retrieving the two sets of mobile access data, I merged them to get a full count for mobile access. The following function merges the two lists of JSON-style records and adds up the `views` for each month. Records are paired by position, so both lists must cover the same months in the same order.

In [68]:
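def merge_mobile(app, web):
    merged = []

    # records are paired by position, so both lists must have the same length
    assert len(app) == len(web)
    for app_month, web_month in zip(app, web):
        # copy the record so the input lists are left unchanged
        month = dict(app_month)
        month['views'] += web_month['views']
        month['access'] = 'mobile'

        merged.append(month)
    return merged

merge_mobile(views_app['items'], views_web['items'])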

[{'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020010100',
  'access': 'mobile',
  'agent': 'user',
  'views': 2306},
 {'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020020100',
  'access': 'mobile',
  'agent': 'user',
  'views': 5107},
 {'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020030100',
  'access': 'mobile',
  'agent': 'user',
  'views': 4547},
 {'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020040100',
  'access': 'mobile',
  'agent': 'user',
  'views': 9824},
 {'project': 'en.wikipedia',
  'article': 'Everything_Everywhere_All_at_Once',
  'granularity': 'monthly',
  'timestamp': '2020050100',
  'access': 'mobile',
  'agent': 'user',
  'views': 8109},
 {'project': 'en.wikipedi
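
Since `merge_mobile` pairs records by position, it can help to confirm that the two lists carry the same timestamps in the same order before merging. The check below is a minimal sketch. `timestamps_align` is a hypothetical helper, and the sample records are trimmed from the outputs above, with the app list cut short in the second call to show a mismatch.

```python
def timestamps_align(app, web):
    """True when both lists hold the same timestamps in the same order."""
    return [m["timestamp"] for m in app] == [m["timestamp"] for m in web]

# Records trimmed from the mobile-app and mobile-web outputs above
app_items = [
    {"timestamp": "2020010100", "views": 65},
    {"timestamp": "2020020100", "views": 152},
]
web_items = [
    {"timestamp": "2020010100", "views": 2241},
    {"timestamp": "2020020100", "views": 4955},
]

aligned = timestamps_align(app_items, web_items)
misaligned = timestamps_align(app_items[1:], web_items)  # app list cut short
print(aligned, misaligned)
```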

In [71]:
monthly_mobile = {}
for article in ARTICLE_TITLES:
    print("Getting pageview data for: ", article)
    views_app = request_pageviews_per_article(article, 
                                      request_template=ARTICLE_PAGEVIEWS_PARAMS_MOBILE_APP)
    views_web = request_pageviews_per_article(article, 
                                      request_template=ARTICLE_PAGEVIEWS_PARAMS_MOBILE_WEB)
    views = merge_mobile(views_app['items'], views_web['items'])
    monthly_mobile[article] = views

with open("monthly_mobile_test.json", "w") as f:
    json.dump(monthly_mobile, f, indent=4)

Getting pageview data for:  Everything Everywhere All at Once
Getting pageview data for:  All Quiet on the Western Front (2022 film)
Getting pageview data for:  The Whale (2022 film)
Getting pageview data for:  Top Gun: Maverick
Getting pageview data for:  Black Panther: Wakanda Forever
Getting pageview data for:  Avatar: The Way of Water
Getting pageview data for:  Women Talking (film)
Getting pageview data for:  Guillermo del Toro's Pinocchio
Getting pageview data for:  Navalny (film)
Getting pageview data for:  The Elephant Whisperers
Getting pageview data for:  An Irish Goodbye
Getting pageview data for:  The Boy, the Mole, the Fox and the Horse (film)
Getting pageview data for:  RRR (film)
Getting pageview data for:  CODA (2021 film)
Getting pageview data for:  Dune (2021 film)
Getting pageview data for:  The Eyes of Tammy Faye (2021 film)
Getting pageview data for:  No Time to Die
Getting pageview data for:  The Windshield Wiper
Getting pageview data for:  The Long Goodbye (Riz A

Getting pageview data for:  Midnight in Paris
Getting pageview data for:  The Help (film)
Getting pageview data for:  A Separation
Getting pageview data for:  The Fantastic Flying Books of Mr. Morris Lessmore
Getting pageview data for:  The Shore (2011 film)
Getting pageview data for:  Undefeated (2011 film)
Getting pageview data for:  The Muppets (film)
Getting pageview data for:  Saving Face (2012 film)
Getting pageview data for:  Beginners
Getting pageview data for:  Rango (2011 film)
Getting pageview data for:  The King's Speech
Getting pageview data for:  Inception
Getting pageview data for:  The Social Network
Getting pageview data for:  The Fighter
Getting pageview data for:  Toy Story 3
Getting pageview data for:  Alice in Wonderland (2010 film)
Getting pageview data for:  Black Swan (film)
Getting pageview data for:  In a Better World
Getting pageview data for:  The Lost Thing
Getting pageview data for:  God of Love (film)
Getting pageview data for:  The Wolfman (2010 film)
Ge

Getting pageview data for:  Pollock (film)
Getting pageview data for:  Father and Daughter (film)
Getting pageview data for:  Into the Arms of Strangers: Stories of the Kindertransport
Getting pageview data for:  Quiero ser (I want to be...)
Getting pageview data for:  Big Mama (film)
Getting pageview data for:  American Beauty (1999 film)
Getting pageview data for:  The Matrix
Getting pageview data for:  The Cider House Rules (film)
Getting pageview data for:  Topsy-Turvy
Getting pageview data for:  Sleepy Hollow (film)
Getting pageview data for:  Boys Don't Cry (1999 film)
Getting pageview data for:  Tarzan (1999 film)
Getting pageview data for:  One Day in September
Getting pageview data for:  The Red Violin
Getting pageview data for:  The Old Man and the Sea (1999 film)
Getting pageview data for:  My Mother Dreams the Satan's Disciples in New York
Getting pageview data for:  King Gimp
Getting pageview data for:  Girl, Interrupted (film)
Getting pageview data for:  All About My Moth

Getting pageview data for:  Working Girl
Getting pageview data for:  The Accidental Tourist (film)
Getting pageview data for:  A Fish Called Wanda
Getting pageview data for:  Pelle the Conqueror
Getting pageview data for:  The Accused (1988 film)
Getting pageview data for:  The Appointments of Dennis Jennings
Getting pageview data for:  Beetlejuice
Getting pageview data for:  Bird (1988 film)
Getting pageview data for:  Hôtel Terminus: The Life and Times of Klaus Barbie
Getting pageview data for:  The Milagro Beanfield War
Getting pageview data for:  Tin Toy
Getting pageview data for:  You Don't Have to Die
Getting pageview data for:  The Last Emperor
Getting pageview data for:  Moonstruck
Getting pageview data for:  The Untouchables (film)
Getting pageview data for:  Babette's Feast
Getting pageview data for:  Dirty Dancing
Getting pageview data for:  Harry and the Hendersons
Getting pageview data for:  Innerspace
Getting pageview data for:  The Man Who Planted Trees (film)
Getting pa

Getting pageview data for:  In the Region of Ice
Getting pageview data for:  Leisure (film)
Getting pageview data for:  Number Our Days
Getting pageview data for:  King Kong (1976 film)
Getting pageview data for:  Logan's Run (film)
Getting pageview data for:  One Flew Over the Cuckoo's Nest (film)
Getting pageview data for:  Barry Lyndon
Getting pageview data for:  Jaws (film)
Getting pageview data for:  Dog Day Afternoon
Getting pageview data for:  Nashville (film)
Getting pageview data for:  Shampoo (film)
Getting pageview data for:  The Sunshine Boys (1975 film)
Getting pageview data for:  Angel and Big Joe
Getting pageview data for:  Dersu Uzala (1975 film)
Getting pageview data for:  The End of the Game (1975 film)
Getting pageview data for:  Great (1975 film)
Getting pageview data for:  The Man Who Skied Down Everest
Getting pageview data for:  The Hindenburg (film)
Getting pageview data for:  The Godfather: Part II
Getting pageview data for:  The Towering Inferno
Getting pagevi

Getting pageview data for:  Albert Schweitzer (film)
Getting pageview data for:  Alexander's Ragtime Band (film)
Getting pageview data for:  All About Eve
Getting pageview data for:  All Quiet on the Western Front (1930 film)
Getting pageview data for:  The Devil and Daniel Webster (film)
Getting pageview data for:  All the King's Men (1949 film)
Getting pageview data for:  Ama Girls
Getting pageview data for:  America America
Getting pageview data for:  An American in Paris (film)
Getting pageview data for:  Amphibious Fighters
Getting pageview data for:  Anastasia (1956 film)
Getting pageview data for:  Anchors Aweigh (film)
Getting pageview data for:  Anna and the King of Siam (film)
Getting pageview data for:  Annie Get Your Gun (film)
Getting pageview data for:  Anthony Adverse
Getting pageview data for:  The Apartment
Getting pageview data for:  Aquatic House Party
Getting pageview data for:  Arise, My Love
Getting pageview data for:  Around the World in 80 Days (1956 film)
Getti

Getting pageview data for:  Going My Way
Getting pageview data for:  Gold Diggers of 1935
Getting pageview data for:  The Golden Fish (film)
Getting pageview data for:  Goldfinger (film)
Getting pageview data for:  Gone with the Wind (film)
Getting pageview data for:  The Good Earth (film)
Getting pageview data for:  Goodbye, Miss Turlock
Getting pageview data for:  Goodbye, Mr. Chips (1939 film)
Getting pageview data for:  Grand Canyon (1958 film)
Getting pageview data for:  Grand Hotel (1932 film)
Getting pageview data for:  Grandad of Races
Getting pageview data for:  The Grapes of Wrath (film)
Getting pageview data for:  The Great Caruso
Getting pageview data for:  Great Expectations (1946 film)
Getting pageview data for:  The Great Lie
Getting pageview data for:  The Great McGinty
Getting pageview data for:  The Great Waltz (1938 film)
Getting pageview data for:  The Great Ziegfeld
Getting pageview data for:  The Greatest Show on Earth (film)
Getting pageview data for:  Green Dolp

Getting pageview data for:  One Night of Love
Getting pageview data for:  One Way Passage
Getting pageview data for:  Overture to The Merry Wives of Windsor
Getting pageview data for:  The Paleface (1948 film)
Getting pageview data for:  Panic in the Streets (film)
Getting pageview data for:  Papa's Delicate Condition
Getting pageview data for:  The Patriot (1928 film)
Getting pageview data for:  Penny Wisdom
Getting pageview data for:  Phantom of the Opera (1943 film)
Getting pageview data for:  The Philadelphia Story (film)
Getting pageview data for:  Picnic (1955 film)
Getting pageview data for:  The Picture of Dorian Gray (1945 film)
Getting pageview data for:  Pillow Talk (film)
Getting pageview data for:  The Pink Phink
Getting pageview data for:  Pinocchio (1940 film)
Getting pageview data for:  A Place in the Sun (1951 film)
Getting pageview data for:  Plymouth Adventure
Getting pageview data for:  Pollyanna (1960 film)
Getting pageview data for:  Porgy and Bess (film)
Getting 

Getting pageview data for:  The Vanishing Prairie
Getting pageview data for:  The Virgin Spring
Getting pageview data for:  Viva Villa!
Getting pageview data for:  Viva Zapata!
Getting pageview data for:  Waikiki Wedding
Getting pageview data for:  The Walls of Malapaga
Getting pageview data for:  The War of the Worlds (1953 film)
Getting pageview data for:  Watch on the Rhine
Getting pageview data for:  Water Birds
Getting pageview data for:  The Way of All Flesh (1927 film)
Getting pageview data for:  West Side Story (1961 film)
Getting pageview data for:  The Westerner (1940 film)
Getting pageview data for:  The Wetback Hound
Getting pageview data for:  What Ever Happened to Baby Jane? (1962 film)
Getting pageview data for:  When Tomorrow Comes (film)
Getting pageview data for:  When Magoo Flew
Getting pageview data for:  When Worlds Collide (1951 film)
Getting pageview data for:  White Shadows in the South Seas
Getting pageview data for:  White Wilderness (film)
Getting pageview da

### All access

The following code generates the `all-access` data by combining the desktop access data and mobile access data.

In [83]:
monthly_all_access = {}

with open('monthly_desktop_test.json', 'r') as f1, open('monthly_mobile_test.json', 'r') as f2:
    data1 = json.load(f1)
    data2 = json.load(f2)
    
for key, values in data1.items():
    print("Getting pageview data for: ", key)
    if key not in monthly_all_access:
        monthly_all_access[key] = []
    
    for value in values:
        timestamp = value['timestamp']

        # check if the same key and timestamp exists in the second JSON
        match = next((item for item in data2.get(key, []) if item['timestamp'] == timestamp), None)

        if match:
            value['views'] += match['views']
            # remove duplicates from data2
            data2[key].remove(match)
        value.pop('access', None)
        monthly_all_access[key].append(value)

for key, values in data2.items():
    for value in values:
        value.pop('access', None)
        
    if key not in monthly_all_access:
        monthly_all_access[key] = []
    monthly_all_access[key].extend(values)

Getting pageview data for:  Everything Everywhere All at Once
Getting pageview data for:  All Quiet on the Western Front (2022 film)
Getting pageview data for:  The Whale (2022 film)
Getting pageview data for:  Top Gun: Maverick
Getting pageview data for:  Black Panther: Wakanda Forever
Getting pageview data for:  Avatar: The Way of Water
Getting pageview data for:  Women Talking (film)
Getting pageview data for:  Guillermo del Toro's Pinocchio
Getting pageview data for:  Navalny (film)
Getting pageview data for:  The Elephant Whisperers
Getting pageview data for:  An Irish Goodbye
Getting pageview data for:  The Boy, the Mole, the Fox and the Horse (film)
Getting pageview data for:  RRR (film)
Getting pageview data for:  CODA (2021 film)
Getting pageview data for:  Dune (2021 film)
Getting pageview data for:  The Eyes of Tammy Faye (2021 film)
Getting pageview data for:  No Time to Die
Getting pageview data for:  The Windshield Wiper
Getting pageview data for:  The Long Goodbye (Riz A

In [86]:
with open("monthly_cumulative_test.json", "w") as f:
    json.dump(monthly_all_access, f, indent=4)
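
The combination step can also be written with a per-article dictionary keyed by timestamp, which avoids searching the mobile list for every desktop record. This is an alternative sketch, not the code used to produce the file above. `combine_access` is a hypothetical name, and the sample values are trimmed from the desktop and merged mobile outputs shown earlier.

```python
def combine_access(desktop, mobile):
    """Sum views per article and timestamp across both inputs."""
    combined = {}
    for source in (desktop, mobile):
        for title, months in source.items():
            per_month = combined.setdefault(title, {})
            for month in months:
                timestamp = month["timestamp"]
                if timestamp in per_month:
                    per_month[timestamp]["views"] += month["views"]
                else:
                    # copy the record without 'access'; inputs stay unchanged
                    per_month[timestamp] = {k: v for k, v in month.items() if k != "access"}
    # return each article's months as a list ordered by timestamp
    return {title: [per_month[ts] for ts in sorted(per_month)]
            for title, per_month in combined.items()}

title = "Everything Everywhere All at Once"
desktop = {title: [
    {"timestamp": "2020010100", "access": "desktop", "views": 1209},
    {"timestamp": "2020020100", "access": "desktop", "views": 2944},
]}
mobile = {title: [
    {"timestamp": "2020010100", "access": "mobile", "views": 2306},
    {"timestamp": "2020020100", "access": "mobile", "views": 5107},
]}

all_access = combine_access(desktop, mobile)
print([month["views"] for month in all_access[title]])
```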