# Wikimedia API tutorial

In this tutorial, you will learn how to build your own app using the Wikimedia API.

You will also see the advantages of using a Python library, when one is available, to interact with an existing API.

Snippets of code from: https://api.wikimedia.org/wiki/Getting_featured_content_from_Wikipedia_with_Python 

In [12]:
# Get today's date in YYYY/MM/DD format.
import datetime

today = datetime.datetime.now()
date = today.strftime('%Y/%m/%d')

# Choose your language, and get today's featured content.
import requests

language_code = 'en' # English
headers = {
  'Authorization': 'Bearer YOUR_ACCESS_TOKEN',
  'User-Agent': 'YOUR_APP_NAME (YOUR_EMAIL_OR_CONTACT_PAGE)'
}

base_url = 'https://api.wikimedia.org/feed/v1/wikipedia/'
url = base_url + language_code + '/featured/' + date
response = requests.get(url, headers=headers)
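
The URL-building step above can be wrapped in a small helper so that other languages and dates are easy to request. This is a minimal sketch: `featured_url` is a hypothetical name, and the date used in the example is arbitrary.

```python
import datetime

BASE_URL = 'https://api.wikimedia.org/feed/v1/wikipedia/'

def featured_url(language_code, day):
    """Build the featured-content URL for a language code and a date (hypothetical helper)."""
    # strftime zero-pads month and day, giving the YYYY/MM/DD path used above
    return BASE_URL + language_code + '/featured/' + day.strftime('%Y/%m/%d')

url = featured_url('en', datetime.date(2025, 11, 11))
print(url)
# https://api.wikimedia.org/feed/v1/wikipedia/en/featured/2025/11/11
```

The resulting string can be passed to `requests.get(url, headers=headers)` exactly as in the cell above.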

In [13]:
from selenium import webdriver
from selenium.webdriver.common.by import By

# Open the browser
driver = webdriver.Chrome()

# Open the website
driver.get("https://www.selenium.dev/selenium/web/web-form.html")

In [14]:
# Get the title
title = driver.title

# Wait up to 0.5 seconds for an element to appear before a lookup raises an error
driver.implicitly_wait(0.5)

title

'Web form'

In [15]:
# Get the text input box using its name
text_box = driver.find_element(by=By.NAME, value="my-text")

# Get the submit button
submit_button = driver.find_element(by=By.CSS_SELECTOR, value="button")

# Type inside the text box
text_box.send_keys("Selenium")

# Submit
submit_button.click()

# Get the message that is returned
message = driver.find_element(by=By.ID, value="message")

text = message.text

text

driver.quit()

# Now let's try to scrape something

In [16]:
# Options are the settings you can give to the browser
from selenium.webdriver.chrome.options import Options

In [17]:
# Initialize Settings
opts = Options()

# Open the browser without showing it on your screen
opts.add_argument("--headless")

driver = webdriver.Chrome(options=opts)
driver.get("https://ground.news/")
driver.title

'Ground News'

In [18]:
# Save all the links in the navigation bar in a list
links = driver.find_elements(By.CSS_SELECTOR, ".embla__slide a")

# If the list is not empty, show the links that were collected
if links:
    print(f"Found {len(links)} nav links:")
    for link in links:
        print(link.text, "→", link.get_attribute("href"))

Found 13 nav links:
Israel-Gaza → https://ground.news/interest/israeli-palestinian-conflict
Remembrance Day → https://ground.news/interest/remembrance-day
Government Shutdown → https://ground.news/interest/government-shutdown
Artificial Intelligence → https://ground.news/interest/ai
Veterans → https://ground.news/interest/veterans
 → https://ground.news/interest/basketball
 → https://ground.news/interest/social-media
 → https://ground.news/interest/senate
 → https://ground.news/interest/ncaa_f919c3
 → https://ground.news/interest/world-cup
 → https://ground.news/interest/donald-trump
 → https://ground.news/interest/housing
 → https://ground.news/interest/facebook
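
Several labels above are blank. Selenium's `.text` returns only text that is visibly rendered, so these are most likely slides that are currently off-screen in the carousel (an assumption, not something checked here). Reading `link.get_attribute("textContent")` is one way around it. Another, sketched below, is to guess a label from the URL slug. `label_for` is a hypothetical helper, not part of Selenium.

```python
from urllib.parse import urlparse

def label_for(text, href):
    """Return the link text, or a label guessed from the URL slug when the text is empty."""
    if text.strip():
        return text.strip()
    # Take the last path segment, e.g. "social-media" from /interest/social-media
    slug = urlparse(href).path.rstrip('/').split('/')[-1]
    return slug.replace('-', ' ').title()

print(label_for('Veterans', 'https://ground.news/interest/veterans'))
# Veterans
print(label_for('', 'https://ground.news/interest/social-media'))
# Social Media
```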


Those links are plain HTML, so BeautifulSoup could have handled them as well.
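
To illustrate that a static parse needs no browser, here is a minimal sketch that extracts links from an HTML string. It uses the standard library's `html.parser` as a stand-in for BeautifulSoup so that it runs without extra dependencies. The `sample_html` snippet and the `NavLinkParser` class are made up for the example and are not taken from the real page.

```python
from html.parser import HTMLParser

class NavLinkParser(HTMLParser):
    """Collect (text, href) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a> tag
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

# Hypothetical markup, shaped like the navigation bar scraped above
sample_html = """
<div class="embla__slide"><a href="/interest/ai">Artificial Intelligence</a></div>
<div class="embla__slide"><a href="/interest/veterans">Veterans</a></div>
"""

parser = NavLinkParser()
parser.feed(sample_html)
print(parser.links)
# [('Artificial Intelligence', '/interest/ai'), ('Veterans', '/interest/veterans')]
```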

Now, let's do something BeautifulSoup can't.

In [None]:
# Identify the button to click on
topic = driver.find_element(By.ID, "header-trending-Artificial Intelligence")
# or topic = driver.find_element(By.CSS_SELECTOR, 'a[href="/interest/ai"]')

# Click on the button
topic.click()

# Check that you are on the correct url
print(driver.current_url)

That didn't work because the button is covered by a banner or another element, which can happen even when nothing appears to be in the way on the webpage.

In these cases, you can force the click using JavaScript.

In [33]:
# Scroll the button into view, then force the click using JavaScript
driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", topic)
driver.execute_script("arguments[0].click();", topic)

# Check that you are on the correct url
print(driver.current_url)

https://ground.news/interest/ai


In [41]:
# Retrieve the article elements (those with the class "group")
articles = driver.find_elements(By.CLASS_NAME, "group")

In [46]:
all_links = []

for article in articles:
    links = article.find_elements(By.TAG_NAME, "a")
    for link in links:
        href = link.get_attribute("href")
        if href:
            all_links.append(href)

# Print the links without duplicates
print(set(all_links))

{'https://ground.news/article/jacob-wallenberg-on-ai-regulation-gone-too-far', 'https://technode.global/2025/11/11/aseans-digital-economy-poised-to-surpass-300b-in-gmv-by-2025-report/', 'https://ground.news/article/san-francisco-startup-raises-30m-for-embryo-gene-editing-research_fc0e79', 'https://www.lavanguardia.com/vida/20251111/11249545/bruselas-vias-rebajar-proteccion-datos-ia.html', 'https://ground.news/article/softbank-sells-its-entire-stake-in-nvidia-for-583-billion_727f47', 'https://ground.news/article/brussels-in-the-process-of-reducing-data-protection-by-ai', 'https://ground.news/article/keeping-enterprises-ahead-of-ai-driven-storage-demands-synology-pas7700-delivers-high-performance-security-and-availability', 'https://ground.news/article/google-brain-founder-andrew-ng-says-everyone-should-still-learn-to-code-but-not-the-old-way', 'https://ground.news/article/amazon-ads-brings-ai-video-generator-in-india-check-price-features-and-availability', 'https://ground.news/article/a
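
The set above mixes Ground News article pages with links to external publishers, and a set does not keep the order in which the links appeared. The sketch below shows one way to drop duplicates while keeping the order, then keep only the article pages. The short `all_links` list is sample data copied from the output above, and `is_ground_news_article` is a hypothetical helper.

```python
from urllib.parse import urlparse

# Sample links copied from the output above, with one repeat
all_links = [
    'https://ground.news/article/jacob-wallenberg-on-ai-regulation-gone-too-far',
    'https://technode.global/2025/11/11/aseans-digital-economy-poised-to-surpass-300b-in-gmv-by-2025-report/',
    'https://ground.news/article/jacob-wallenberg-on-ai-regulation-gone-too-far',
    'https://ground.news/article/brussels-in-the-process-of-reducing-data-protection-by-ai',
]

# dict.fromkeys drops repeats but keeps first-seen order, unlike set()
unique_links = list(dict.fromkeys(all_links))

def is_ground_news_article(url):
    """True for links of the form https://ground.news/article/..."""
    parts = urlparse(url)
    return parts.netloc == 'ground.news' and parts.path.startswith('/article/')

article_links = [url for url in unique_links if is_ground_news_article(url)]
print(len(unique_links), len(article_links))
# 3 2
```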

In [11]:
import re

# Extract the slug after /article/ and turn its hyphens into spaces
titles = []
for link in all_links:
    match = re.search(r'/article/([^/]+)$', link)
    if match:
        titles.append(match.group(1).replace('-', ' '))

print(titles)
#driver.quit()

['russian strikes hit an apartment block and energy sites in ukraine killing 4_d380a2', 'denmark announces plan to ban social media access for under 15s_8bfa5d', 'tesla shareholders approve elon musks 1 trillion pay package_1f8dbd', 'german nurse given life sentence for murdering 10 patients', 'france blocks shein operations after controversy over child like sex dolls_1db296', 'former french president nicolas sarkozy to be released from prison pending appeal_a309d7', 'kremlin says it wants war to end but peace process is stalled', 'kremlin says it wants war to end but peace process is stalled', 'belarus leader threatens to seize over 1 000 lithuanian trucks stuck by closed border', 'canadian hungarian british writer david szalay wins booker prize for fiction with his novel flesh_2266f8', 'eu considers forcing member states to ban huawei and zte_8c3acc', 'vatican investigates swiss guard after allegations of an antisemitic incident in st peters square_754755', 'von der leyen offers mino
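
Many of the titles above end with a suffix such as `_d380a2`. The sketch below strips it, assuming the suffix is always an underscore followed by six lowercase hexadecimal characters, which matches every example in the output but is not documented behavior. `clean_title` is a hypothetical helper.

```python
import re

def clean_title(slug_title):
    """Remove a trailing "_" plus six hex characters, e.g. "_d380a2" (assumed format)."""
    return re.sub(r'_[0-9a-f]{6}$', '', slug_title)

# Sample titles copied from the output above
raw_titles = [
    'denmark announces plan to ban social media access for under 15s_8bfa5d',
    'german nurse given life sentence for murdering 10 patients',
]

clean_titles = [clean_title(t) for t in raw_titles]
print(clean_titles)
# ['denmark announces plan to ban social media access for under 15s', 'german nurse given life sentence for murdering 10 patients']
```

A title that genuinely ends in an underscore plus six hex characters would be trimmed as well, so treat the pattern as a heuristic.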