### Guardian API access

To use the API fully from within Jupyter, you will need to apply for a free developer key:

[Guardian Open Platform - Getting started](https://open-platform.theguardian.com/access/)

You can explore what is possible with the API here:

[Guardian Open Platform - explore](https://open-platform.theguardian.com/explore/)

In [1]:
#import required libraries
import requests
import json
import re
import time

In [5]:
#load your personal API key
with open('private/guardian_key.txt', 'r') as file:
    key = file.read().strip()
len(key)

36
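
If you would rather not keep the key in a file inside the project, one option is to check an environment variable first and fall back to the file. This is a minimal sketch: the variable name `GUARDIAN_API_KEY` and the helper `load_api_key` are assumptions made for illustration and are not part of the Guardian API.

```python
import os
from pathlib import Path

def load_api_key(env_var="GUARDIAN_API_KEY", key_file="private/guardian_key.txt"):
    """Return the API key from an environment variable, else from a local file.

    Both the variable name and this helper are hypothetical.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    path = Path(key_file)
    if path.is_file():
        return path.read_text(encoding="utf-8").strip()
    raise RuntimeError(f"No API key found in ${env_var} or {key_file}")
```

Calling `key = load_api_key()` would then replace the `with open(...)` block above.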

In [22]:
#build a search URL
base_url = 'https://content.guardianapis.com/'
search_string = "flood"
production_office = "aus"
from_date = "2025-04-01"

full_url = base_url+f"search?q={search_string}&production-office={production_office}&from-date={from_date}&show-fields=body&api-key={key}"

#url = baseUrl+'"'+searchString+'"'+'&production-office='+production_office+'&from-date='+fromDate+'&api-key='+key
print(full_url[:120])

https://content.guardianapis.com/search?q=flood&production-office=aus&from-date=2025-04-01&show-fields=body&api-key=bc4c
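
The f-string above works for a single-word search, but a phrase containing spaces or `&` would need URL encoding. A possible alternative, sketched here with the standard library, builds the query string with `urlencode` and hides the key before printing. The helpers `build_search_url` and `redact_key` are hypothetical names introduced for this example.

```python
from urllib.parse import urlencode

def build_search_url(base_url, api_key, **params):
    """Build a search URL, URL-encoding every parameter value.

    Keyword names use underscores (from_date) and are converted to the
    hyphenated form used in the URL above (from-date).
    """
    query = {name.replace("_", "-"): value for name, value in params.items()}
    query["api-key"] = api_key
    return f"{base_url}search?{urlencode(query)}"

def redact_key(url, api_key):
    """Return the URL with the API key replaced, so it is safe to print."""
    return url.replace(api_key, "<API-KEY>")
```

For example, `build_search_url(base_url, key, q="flash flood", production_office="aus", from_date="2025-04-01", show_fields="body")` encodes the space in the search phrase, and `print(redact_key(full_url, key))` shows the whole URL without exposing any part of the key.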


In [23]:
# get data from server
server_response = requests.get(full_url)
server_data = server_response.json()
resp_data = server_data.get('response','')
if resp_data == '':
    print("ERROR obtaining results:",server_data)
else:
    print("SUCCESS!")
    print(f"{resp_data['total']} results found available in {resp_data['pages']} pages")
    print(f"{resp_data['pageSize']} results per page")
    results = resp_data.get('results',[])
    

SUCCESS!
12 results found available in 2 pages
10 results per page
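
The check above prints a message when the `response` key is missing, but later cells would still fail because `results` is never defined. A stricter variant, sketched below, raises an exception instead. It assumes only the payload shape visible in this notebook (a `response` object containing `results`); `extract_response` is a hypothetical helper name.

```python
def extract_response(server_data):
    """Return the 'response' object from a parsed API payload, or raise.

    Assumes a successful payload looks like the one shown in this notebook:
    {'response': {'total': ..., 'pages': ..., 'results': [...]}}.
    """
    resp = server_data.get("response")
    if not isinstance(resp, dict) or "results" not in resp:
        raise ValueError(f"Unexpected API payload: {server_data}")
    return resp
```

It could be combined with `server_response.raise_for_status()` from `requests`, which raises on HTTP error codes before the JSON is parsed.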


In [24]:
results[0]

{'id': 'australia-news/live/2025/apr/05/australia-election-news-live-albanese-dutton-china-trump-darwin-queensland-ntwnfb',
 'type': 'liveblog',
 'sectionId': 'australia-news',
 'sectionName': 'Australia news',
 'webPublicationDate': '2025-04-05T05:02:53Z',
 'webTitle': 'Cameraman injured after football kick; PM visits flood-hit Queensland – as it happened',
 'webUrl': 'https://www.theguardian.com/australia-news/live/2025/apr/05/australia-election-news-live-albanese-dutton-china-trump-darwin-queensland-ntwnfb',
 'apiUrl': 'https://content.guardianapis.com/australia-news/live/2025/apr/05/australia-election-news-live-albanese-dutton-china-trump-darwin-queensland-ntwnfb',
 'fields': {'body': '<div id="block-67f09af18f0843b305e4b50d" class="block is-summary" data-block-contributor=""> <p class="block-time published-time"> <time datetime="2025-04-05T04:31:19.297Z">5.31am <span class="timezone">BST</span></time> </p>   <h2 class="block-title">What we learned today, Saturday 5 April</h2>  <di

In [25]:
num_pages = resp_data['pages']
num_pages

2

In [26]:
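def articles_from_page_results(page_results):
    """Map 'title [publication date]' to the plain text of each article on one page."""
    articles = {}
    for result in page_results:
        article_date = result['webPublicationDate']
        article_title = result['webTitle']+f" [{article_date}]"
        # 'fields' may be missing if the request did not ask for show-fields=body
        article_html = result.get('fields', {}).get('body', '')
        # strip HTML tags; DOTALL lets a tag span a line break
        article_text = re.sub(r'<.*?>', '', article_html, flags=re.DOTALL)
        articles[article_title] = article_text
    return articles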
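
Removing tags with a regular expression leaves HTML entities such as `&amp;` in the text and joins words that sat in neighbouring paragraphs. As a sketch of one alternative, the standard library's `html.parser` can decode entities and insert a space at block boundaries. The class `_TextExtractor`, the function `html_to_text`, and the choice of block-level tags are assumptions made for this example.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries (an illustrative, incomplete selection)
BLOCK_TAGS = {"p", "div", "br", "h1", "h2", "h3", "li", "blockquote", "time"}

class _TextExtractor(HTMLParser):
    """Collect text content, adding a space wherever a block-level tag occurs."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; and similar
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        self.chunks.append(data)

def html_to_text(html):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())
```

Inside `articles_from_page_results`, `html_to_text(article_html)` could take the place of the `re.sub` call.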

In [27]:
def get_all_articles_for_response(response_json,full_url):
    """Collect articles from every page of a search, reusing the page 1 response already fetched."""
    total_pages = response_json['pages']
    total_articles = response_json['total']
    print(f"Fetching {total_articles} articles from {total_pages} pages...")
    all_articles = {}
    page1_articles = articles_from_page_results(response_json['results'])
    all_articles.update(page1_articles)
    print("Added articles for page: 1")
    
    for page in range(2,total_pages+1):
        print("Getting articles from API for page:",page)
        page_response = requests.get(full_url+f"&page={page}")
        page_data = page_response.json()['response']
        print("Processing results for page:",page_data['currentPage'])
        page_articles = articles_from_page_results(page_data['results'])
        print(f"Fetched {len(page_articles)} articles.")
        all_articles.update(page_articles)
        print("Added articles for page:",page)
        print(f"Status: {len(all_articles)} articles.")
        time.sleep(1) # make sure we're not hitting the API too hard
    
    print(f"FINISHED: Fetched {len(all_articles)} articles.")
    return all_articles


In [28]:
my_articles = get_all_articles_for_response(resp_data,full_url)

Fetching 12 articles from 2 pages...
Added articles for page: 1
Getting articles from API for page: 2
Processing results for page: 2
Fetched 2 articles.
Added articles for page: 2
Status: 12 articles.
FINISHED: Fetched 12 articles.
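
Because the articles are stored under their title and date, two results that share both would overwrite each other. A possible variation, sketched below, keys the raw results by their `id` field (visible in the `results[0]` output above) and takes the page-fetching step as a function argument so the paging logic can be tried without touching the network. The names `collect_results` and `fetch_page` are hypothetical.

```python
import time

def collect_results(first_response, fetch_page, delay=1.0):
    """Gather raw results from every page, keyed by each result's 'id'.

    first_response: the already-fetched 'response' object for page 1.
    fetch_page: a function taking a page number and returning that page's
                'response' object (hypothetical; supplied by the caller).
    delay: seconds to wait before each additional request.
    """
    by_id = {result["id"]: result for result in first_response["results"]}
    for page in range(2, first_response["pages"] + 1):
        time.sleep(delay)  # pause before each extra request
        page_data = fetch_page(page)
        for result in page_data["results"]:
            by_id[result["id"]] = result
    return by_id

# In this notebook, fetch_page could be written as:
# fetch_page = lambda page: requests.get(full_url + f"&page={page}").json()["response"]
```

The values of the returned dictionary can be passed to `articles_from_page_results` to get the title-to-text mapping used in the rest of the notebook.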


In [29]:
print("Total Articles:",len(my_articles))
for title,text in my_articles.items():
    print(title)

Total Articles: 12
Cameraman injured after football kick; PM visits flood-hit Queensland – as it happened [2025-04-05T05:02:53Z]
One Australian’s dramatic rescue from a flood in one of the driest places on Earth [2025-04-01T07:49:17Z]
Bigger than Texas: the true size of Australia’s devastating floods [2025-04-04T14:00:57Z]
Northern NSW braces for heavy rain as Queensland flooding forecast to move south [2025-04-02T07:27:16Z]
Afternoon Update Election 2025: Greens want 1% of budget for environment; Dutton ready for fight with Trump; and Val Kilmer dies aged 65 [2025-04-02T06:13:55Z]
Queensland’s recovery to ‘take months and years’ after floods sweep across vast interior [2025-04-02T05:22:12Z]
How historic is what we’re seeing in the Queensland floods? It’s hard to grasp the full magnitude [2025-04-05T20:00:35Z]
The disaster relief payments you may be eligible for after western Queensland’s flooding and ex-Tropical Cyclone Alfred [2025-04-06T01:16:42Z]
Morning Mail: Trump to reveal tarif

In [30]:
file_path = "data/"
file_name = "flood_articles.json"

# the data/ directory must already exist
with open(f"{file_path}{file_name}",'w', encoding='utf-8') as fp:
    json.dump(my_articles, fp)
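
To confirm the file was written correctly, it can be read back and compared with the dictionary in memory. The sketch below also creates the target directory if it is missing and keeps non-ASCII characters such as ’ readable in the file. `save_articles` and `load_articles` are hypothetical helper names.

```python
import json
from pathlib import Path

def save_articles(articles, path):
    """Write the articles dictionary to a JSON file, creating parent folders."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fp:
        # ensure_ascii=False stores characters like ’ as-is rather than \u2019
        json.dump(articles, fp, ensure_ascii=False, indent=2)

def load_articles(path):
    """Read an articles dictionary back from a JSON file."""
    with Path(path).open("r", encoding="utf-8") as fp:
        return json.load(fp)
```

A quick round-trip check would be `save_articles(my_articles, "data/flood_articles.json")` followed by `load_articles("data/flood_articles.json") == my_articles`.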