# Fun with Wikipedia Data

#### CS 66: Introduction to Computer Science II

## What are we doing?

I'm experimenting with some ideas for fun assignments that use more APIs like this one. The next assignment will show how we can use this API to build a web crawler that automatically follows web links and saves information about them. Eventually we may use this to do more interesting things.

## Wikipedia API

Wikipedia has an API that allows you to integrate Wikipedia into your applications.

It works much like the COVID API we used.

Here's a reference: https://www.mediawiki.org/wiki/API:REST_API/Reference

Try this code out for yourself in a `.py` file in VS Code.

```python
import requests

endpoint = "https://en.wikipedia.org/w/rest.php/v1/page/Mars"
response = requests.get(endpoint)
page_results = response.json()
print(page_results)
```
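The endpoint above is hard-coded for the Mars article. As a rough sketch of how the same URL pattern could be built for any title, the hypothetical helper below (`build_page_endpoint` and `BASE_URL` are names made up for this example, not part of the API) swaps spaces for underscores and percent-encodes the rest, on the assumption that the API accepts titles in the same form Wikipedia uses in its own page URLs.

```python
from urllib.parse import quote

# Assumption: same base URL as the Mars example, minus the page title
BASE_URL = "https://en.wikipedia.org/w/rest.php/v1/page/"

def build_page_endpoint(title):
    """Hypothetical helper: build the endpoint URL for a page title."""
    # Wikipedia page URLs conventionally use underscores instead of spaces
    with_underscores = title.replace(" ", "_")
    # Percent-encode everything else that is not safe in a URL path segment
    return BASE_URL + quote(with_underscores, safe="")

print(build_page_endpoint("Mars"))
print(build_page_endpoint("Curiosity (rover)"))
```

The returned URL can be passed to `requests.get` in place of the hard-coded `endpoint` string.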

## Group Activity Problem 1

You probably noticed that the data you printed from the Wikipedia request is too big to completely understand. Write some code that will instead print out the type of the object that you get back (`page_results`).

## Group Activity Problem 2

Hopefully you noticed that `page_results` is a dictionary. Loop through it and print out all of its keys. How many keys are there, and what do you think they are for?
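If checking a type or looping over a dictionary is unfamiliar, here is a small sketch of the pattern. It uses a made-up dictionary (`sample_results` is hypothetical and far smaller than what the API returns), so you will still need to adapt it to `page_results`.

```python
# sample_results is a made-up stand-in, not real API output
sample_results = {"id": 123, "title": "Example", "source": "Some wikitext"}

# type() reports what kind of object a variable refers to
print(type(sample_results))

# Looping over a dictionary visits its keys one at a time
keys_seen = []
for key in sample_results:
    print(key)
    keys_seen.append(key)

# len() on a dictionary counts its keys
print(len(sample_results), "keys")
```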

## Group Activity Problem 3

Try printing some of the values for each of the `page_results` keys. Discuss what each of them is for. Which one contains the bulk of the data?

```python
import requests

endpoint = "https://en.wikipedia.org/w/rest.php/v1/page/Mars"
response = requests.get(endpoint)
page_results = response.json()
print(page_results["title"])
print(page_results["content_model"])
#... keep going
```

```
Mars
wikitext
```


## Group Activity Problem 4

You could try printing out all of `page_results["source"]`, but it is probably too much to fit on the terminal. Instead, just print out the first 10000 characters. Toward the bottom of that output, you will see what looks like the actual text of the Wikipedia article on Mars.

Go to https://en.wikipedia.org/wiki/Mars and compare.

```python
import requests

endpoint = "https://en.wikipedia.org/w/rest.php/v1/page/Mars"
response = requests.get(endpoint)
page_results = response.json()
print(page_results["source"][:10000]) #use slicing to print only 10000 characters
```

```
{{Short description|Planet}}

{{About|the planet|the deity|Mars (mythology)|other uses}}

{{Pp-move-indef}}
{{Pp-semi-indef}}

{{Use British English|date=July 2020}}
{{Use dmy dates|date=July 2020}}

{{Infobox planet
| name                   = Mars
| symbol                 = [[File:Mars symbol (bold).svg|24px|♂]]
| image                  = OSIRIS Mars true color.jpg
| image_alt              = Mars appears as a red-orange globe with darker blotches and white icecaps visible on both of its poles.
| caption                = Pictured in natural color in 2007{{efn|This image was taken by the [[Rosetta (spacecraft)|''Rosetta'']] spacecraft's [[Optical, Spectroscopic, and Infrared Remote Imaging System]] (OSIRIS), at a distance of ≈{{convert|240000|km}} during its February 2007 encounter. The view is centered on the [[Aeolis quadrangle]], with [[Gale (crater)|Gale crater]], the landing site of the [[Curiosity (rover)|''Curiosity'' rover]], prominently visible just left of center. The darker, 
```
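As a quick illustration of how the `[:10000]` slice behaves, the sketch below applies the same slicing to a short string (the first line of the Mars wikitext). Note that asking for more characters than the string has is not an error: you simply get the whole string back.

```python
text = "{{Short description|Planet}}"

# A slice [:n] keeps at most the first n characters
first_five = text[:5]
print(first_five)

# Slicing past the end does not raise an error; it returns what is there
whole_thing = text[:10000]
print(whole_thing)
print(whole_thing == text)
```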

## Group Activity Problem 5

The format that Wikipedia uses to store the text of its articles is called __wikitext__. Discuss: how does wikitext indicate a link to another Wikipedia article?

## Group Activity Problem 6

Here's a function that will allow you to pass in some wikitext as an argument. It returns a list of the titles of all the articles that this wikitext links to (at least, it is supposed to - feel free to fix any bugs you find). Add this function to your `.py` file, and then call it, passing the wikitext for the Mars article. Print out the results to make sure you're getting a list of linked pages from the Mars article.

```python
def get_page_links(wikitext):
    """
    Get a list of all Wikipedia pages linked to from some wikitext

    Parameters:
        wikitext: a string with some wikitext

    Returns:
        a list of the titles of all Wikipedia pages that are linked to
        from the provided wikitext
    """
    list_of_linked_pages = []
    i = 0
    while i < len(wikitext):
        #is this the start of a wiki link?
        if wikitext[i:i+2] == "[[":

            i = i+2
            linked_page_name = "" #accumulator string

            #keep adding on to the accumulator string until we see a
            # | or ] which will indicate we've reached the end of the
            # linked page's name (the length check avoids an IndexError
            # if the wikitext ends before the link is closed)
            while i < len(wikitext) and wikitext[i] != '|' and wikitext[i] != ']':
                linked_page_name += wikitext[i]
                i += 1

            list_of_linked_pages.append(linked_page_name)

        #move on to the next character
        else:
            i += 1
    return list_of_linked_pages
```
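For comparison, here is a sketch of a different technique for the same job: a regular expression instead of a character-by-character loop. This is not the function above, just an alternative you could check your results against. The name `get_page_links_regex` is made up for this example, and the sample string is a fragment of the Mars wikitext.

```python
import re

def get_page_links_regex(wikitext):
    """Hypothetical alternative to get_page_links using a regular expression."""
    # \[\[ matches the two opening brackets; the group in parentheses
    # captures every character up to (not including) the first | or ]
    return re.findall(r"\[\[([^\]|]*)", wikitext)

sample = ("taken by the [[Rosetta (spacecraft)|''Rosetta'']] spacecraft's "
          "[[Optical, Spectroscopic, and Infrared Remote Imaging System]]")

links = get_page_links_regex(sample)
for title in links:
    print(title)
```

Like the loop version, this keeps only the page title and drops any display label that follows a `|`.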

## Group Activity Problem 7

Write a program that uses the `get_page_links` function. It should ask the user to input the name of a Wikipedia article and then display the first 10 links from that page (hint: call `get_page_links` and then write a loop to display the first 10 results from the list). Allow the user to select another page based on those results and repeat. Here's an example of what the program's output could look like:

```
Enter a wikipedia page title (-1 to quit): Mars

Here are the first 10 articles linked from Mars:
0 File:Mars symbol (bold).svg
1 Rosetta (spacecraft)
2 Optical, Spectroscopic, and Infrared Remote Imaging System
3 Aeolis quadrangle
4 Gale (crater)
5 Curiosity (rover)
6 Terra Cimmeria
7 Elysium Planitia
8 Water on Mars
9 Martian
Enter a number 0-9 for one of these articles (-1 to quit): 5

Here are the first 10 articles linked from Curiosity (rover):
0 Mars Science Laboratory
1 Self-portrait
2 Mount Sharp
3 Mars rover
4 NASA
5 Jet Propulsion Laboratory
6 Ultra high frequency
7 Hertz
8 Data-rate units
9 X band
Enter a number 0-9 for one of these articles (-1 to quit): 2

Here are the first 10 articles linked from Mount Sharp:
0 Curiosity (rover)
1 Gale crater
2 Mars
3 NASA
4 Aeolis quadrangle
5 Robert P. Sharp
6 Mars
7 Gale (crater)
8 United States Geological Survey
9 Curiosity (rover)
Enter a number 0-9 for one of these articles (-1 to quit): 8

Here are the first 10 articles linked from United States Geological Survey:
0 John Wesley Powell
1 Reston, Virginia
2 United States
3 United States dollar
4 United States Department of the Interior
5 government agency
6 Federal government of the United States
7 scientist
8 landscape
9 United States
Enter a number 0-9 for one of these articles (-1 to quit): -1
```
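The display and input-checking steps of this program can be written and tested without touching the network. Below is one possible sketch of those two pieces only; `format_link_menu` and `parse_choice` are hypothetical helper names, and how you fetch pages and structure the main loop is left to you.

```python
def format_link_menu(links, limit=10):
    """Hypothetical helper: number the first `limit` links, one per line."""
    lines = []
    # Slicing keeps at most `limit` items, even if the list is shorter
    for index, title in enumerate(links[:limit]):
        lines.append(f"{index} {title}")
    return "\n".join(lines)

def parse_choice(text, num_options):
    """Hypothetical helper: return a valid index, -1 to quit, or None if invalid."""
    try:
        choice = int(text)
    except ValueError:
        return None  # the user typed something that is not a whole number
    if choice == -1 or 0 <= choice < num_options:
        return choice
    return None  # a number, but out of range

links = ["Mars Science Laboratory", "Self-portrait", "Mount Sharp"]
print(format_link_menu(links))
print(parse_choice("2", len(links)))
print(parse_choice("7", len(links)))
```

Returning `None` for bad input lets the main loop ask again instead of crashing.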