### Guardian API access

To make full use of the API from Jupyter, you will need to apply for a free developer key:

[Guardian Open Platform - Getting started](https://open-platform.theguardian.com/access/)

You can explore what is possible with the API here:

[Guardian Open Platform - explore](https://open-platform.theguardian.com/explore/)

In [77]:
#import required libraries
import requests
import json
import re
import time

In [2]:
#load your personal API key
with open('private/guardian_key.txt', 'r') as file:
    key = file.read().strip()
len(key)

36
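
If you would rather not keep the key in a file next to the notebook, one option is to read it from an environment variable and fall back to the file. This is only a sketch: the helper name `load_api_key` and the variable name `GUARDIAN_API_KEY` are assumptions made for illustration, not something the Guardian API requires.

```python
import os
from pathlib import Path

def load_api_key(path="private/guardian_key.txt", env_var="GUARDIAN_API_KEY"):
    """Return the key from env_var if it is set, otherwise from the file at path.

    Both default names are hypothetical; adjust them to your own setup.
    """
    key = os.environ.get(env_var)
    if key:
        return key.strip()
    # fall back to the key file used in the cell above
    return Path(path).read_text().strip()
```

Calling `key = load_api_key()` would then replace the `with open(...)` cell above.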

In [67]:
# build a search URL
base_url = 'https://content.guardianapis.com/'
search_string = "ukraine"
production_office = "aus"
from_date = "2023-03-01"

full_url = (
    f"{base_url}search?q={search_string}"
    f"&production-office={production_office}"
    f"&from-date={from_date}&show-fields=body&api-key={key}"
)

print(full_url[:120])

https://content.guardianapis.com/search?q=ukraine&production-office=aus&from-date=2023-03-01&show-fields=body&api-key=1c
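
Building the URL by hand works for a single word such as `ukraine`, but a search term containing spaces or `&` would need escaping first. One possible alternative is to keep the parameters in a dictionary and let the standard library encode them. In this sketch the helper name `build_search_url` is hypothetical and `YOUR-KEY` is a placeholder, not a real key.

```python
from urllib.parse import urlencode

def build_search_url(base_url, params):
    """Return base_url + 'search?' + the URL-encoded params (hypothetical helper)."""
    return f"{base_url}search?{urlencode(params)}"

params = {
    "q": "ukraine",
    "production-office": "aus",
    "from-date": "2023-03-01",
    "show-fields": "body",
    "api-key": "YOUR-KEY",  # placeholder, not a real key
}

example_url = build_search_url("https://content.guardianapis.com/", params)
print(example_url)
```

`requests.get` also accepts such a dictionary directly through its `params` argument, which does the same encoding for you.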


In [68]:
# get data from server
response = requests.get(full_url)
response.raise_for_status()  # stop early on an HTTP error status
resp_data = response.json()['response']
resp_data

{'status': 'ok',
 'userTier': 'developer',
 'total': 207,
 'startIndex': 1,
 'pageSize': 10,
 'currentPage': 1,
 'pages': 21,
 'orderBy': 'relevance',
 'results': [{'id': 'world/live/2023/mar/12/russia-ukraine-war-live-ukraine-buying-time-in-bakhmut-ahead-of-counteroffensive',
   'type': 'liveblog',
   'sectionId': 'world',
   'sectionName': 'World news',
   'webPublicationDate': '2023-03-12T18:26:12Z',
   'webTitle': 'Russia-Ukraine war live: Ukraine ‘buying time’ in Bakhmut – as it happened',
   'webUrl': 'https://www.theguardian.com/world/live/2023/mar/12/russia-ukraine-war-live-ukraine-buying-time-in-bakhmut-ahead-of-counteroffensive',
   'apiUrl': 'https://content.guardianapis.com/world/live/2023/mar/12/russia-ukraine-war-live-ukraine-buying-time-in-bakhmut-ahead-of-counteroffensive',
   'isHosted': False,
   'pillarId': 'pillar/news',
   'pillarName': 'News'},
  {'id': 'world/live/2023/apr/08/russia-ukraine-war-live-moscows-forces-bombard-annexed-regions-us-investigates-ukraine-w

In [69]:
num_pages = resp_data['pages']
num_pages

21
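
The `pages` value agrees with the `total` and `pageSize` fields in the response above: 207 results at 10 per page need 21 pages. A small sketch of that arithmetic, using the numbers from the output:

```python
import math

total = 207      # 'total' from the response above
page_size = 10   # 'pageSize' from the response above

# the last page is only partly filled, so round up
expected_pages = math.ceil(total / page_size)
print(expected_pages)  # 21
```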

In [90]:
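def articles_from_page_results(page_results):
    """Map 'title [publication date]' to the plain text of each result."""
    articles = {}
    for result in page_results:
        article_date = result['webPublicationDate']
        # adding the date to the key helps separate articles that share a headline
        article_title = result['webTitle']+f" [{article_date}]"
        # body HTML, requested via show-fields=body in the search URL
        article_html = result['fields']['body']
        # non-greedy match removes anything that looks like an HTML tag
        article_text = re.sub(r'<.*?>','',article_html)
        articles[article_title] = article_text
    return articles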
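
To see what the regular expression does, it can be tried on a short string. The sample below is invented for illustration and is not taken from an API response. It also shows that the pattern removes tags only: character entities such as `&lsquo;` are left in place, and `html.unescape` from the standard library is one way to convert them, assuming the article bodies contain any.

```python
import html
import re

# invented sample, not real API output
sample_html = '<p>Ukraine is &lsquo;buying time&rsquo; in <a href="#">Bakhmut</a>.</p>'

# same pattern as in articles_from_page_results: drop anything between < and >
no_tags = re.sub(r'<.*?>', '', sample_html)
print(no_tags)

# convert named entities to the characters they stand for
plain = html.unescape(no_tags)
print(plain)
```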

In [95]:
def get_all_articles_for_response(response_json,full_url):
    total_pages = response_json['pages']
    total_articles = response_json['total']
    print(f"Fetching {total_articles} articles from {total_pages} pages...")
    all_articles = {}
    page1_articles = articles_from_page_results(response_json['results'])
    all_articles.update(page1_articles)
    print("Added articles for page: 1")
    
    for page in range(2,total_pages+1):
        print("Getting articles from API for page:",page)
        page_response = requests.get(full_url+f"&page={page}")
        page_data = page_response.json()['response']
        print("Processing results for page:",page_data['currentPage'])
        page_articles = articles_from_page_results(page_data['results'])
        print(f"Fetched {len(page_articles)} articles.")
        all_articles.update(page_articles)
        print("Added articles for page:",page)
        print(f"Status: {len(all_articles)} articles.")
        time.sleep(1) # make sure we're not hitting the API too hard
    
    print(f"FINISHED: Fetched {len(all_articles)} articles.")
    return all_articles


In [96]:
my_articles = get_all_articles_for_response(resp_data,full_url)

Fetching 207 articles from 21 pages...
Added articles for page: 1
Getting articles from API for page: 2
Processing results for page: 2
Fetched 10 articles.
Added articles for page: 2
Status: 20 articles.
Getting articles from API for page: 3
Processing results for page: 3
Fetched 10 articles.
Added articles for page: 3
Status: 30 articles.
Getting articles from API for page: 4
Processing results for page: 4
Fetched 10 articles.
Added articles for page: 4
Status: 38 articles.
Getting articles from API for page: 5
Processing results for page: 5
Fetched 10 articles.
Added articles for page: 5
Status: 48 articles.
Getting articles from API for page: 6
Processing results for page: 6
Fetched 10 articles.
Added articles for page: 6
Status: 58 articles.
Getting articles from API for page: 7
Processing results for page: 7
Fetched 10 articles.
Added articles for page: 7
Status: 68 articles.
Getting articles from API for page: 8
Processing results for page: 8
Fetched 10 articles.
Added articles f

In [102]:
print("Total Articles:",len(my_articles))
for title,text in my_articles.items():
    print(title)

Total Articles: 202
Russia-Ukraine war live: Ukraine ‘buying time’ in Bakhmut – as it happened [2023-03-12T18:26:12Z]
Russia-Ukraine war – as it happened: Ukraine to boost defences along border with Belarus [2023-04-08T17:16:35Z]
Russia-Ukraine war: Zelenskiy and UK prime minister discuss accelerating military support for Ukraine – as it happened [2023-04-14T17:51:38Z]
Russia-Ukraine war live: Putin visits Mariupol in first trip to occupied eastern Ukraine – as it happened [2023-03-19T19:12:28Z]
Russia-Ukraine war: Russia nearly shot down British spy plane near Ukraine, alleged leaked US document claims – as it happened [2023-04-10T18:24:19Z]
Russia-Ukraine war live: Hungary signs new energy deals with Russia; UN tally of Ukraine civilian deaths approaches 8,500 [2023-04-11T18:00:08Z]
Russia-Ukraine war live: Putin and Zelenskiy visit troops near frontline [2023-04-18T17:48:26Z]
Former New Zealand soldier killed fighting Russian forces in Ukraine [2023-03-23T04:22:50Z]
Russia-Ukraine w
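
Note that the dictionary holds 202 entries although the API reported 207, and the progress log shows the running total growing by 8 rather than 10 after page 4. Because `all_articles` is keyed on title plus publication date, a result whose key is already present overwrites the earlier entry. The output alone does not show whether these are the same article returned on two pages or different articles sharing a headline and timestamp. One possible way to check is to key on each result's `id` field instead; the sketch below uses invented sample results to show the difference.

```python
# invented sample results: two different ids with the same headline and timestamp
sample_results = [
    {'id': 'world/2023/example-a', 'webTitle': 'Same headline',
     'webPublicationDate': '2023-03-12T18:26:12Z'},
    {'id': 'world/2023/example-b', 'webTitle': 'Same headline',
     'webPublicationDate': '2023-03-12T18:26:12Z'},
]

# keyed as in the notebook: the second entry overwrites the first
by_title = {f"{r['webTitle']} [{r['webPublicationDate']}]": r['id'] for r in sample_results}

# keyed on id: both entries survive
by_id = {r['id']: r['webTitle'] for r in sample_results}

print(len(by_title), len(by_id))  # 1 2
```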

In [101]:
file_path = "data/"
file_name = "ukraine_articles.json"

with open(f"{file_path}{file_name}",'w', encoding='utf-8') as fp:
    fp.write(json.dumps(my_articles))
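
To confirm that the file can be read back, the same dictionary shape can be round-tripped through `json`. This sketch writes to a temporary directory rather than `data/` so that it runs anywhere, and the article entry is an invented placeholder.

```python
import json
import os
import tempfile

# invented placeholder with the same shape as my_articles
articles = {"Example title [2023-03-01T00:00:00Z]": "Example text with ‘quotes’"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ukraine_articles.json")
    with open(path, 'w', encoding='utf-8') as fp:
        json.dump(articles, fp)
    with open(path, 'r', encoding='utf-8') as fp:
        loaded = json.load(fp)

print(loaded == articles)  # True
```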