
# Wikipedia API for Python


## In this tutorial, let's explore how to use the Wikipedia API.


![alt text](https://miro.medium.com/max/1400/1*1FHnsWYdcfxoygKxkTdJew.png)

# Introduction

Wikipedia is the world's largest free encyclopedia, a place full of information. Who hasn't used Wikipedia at some point in their life? (If you say you haven't, you're probably lying.) The Python library called `wikipedia`, a wrapper around the MediaWiki API, lets us easily access and parse data from Wikipedia. In other words, you can use it as a small scraper that pulls a limited amount of information from Wikipedia. Let's see how to do that in this tutorial.




---




# Installation

The first step is to install the library. It is an external package, not part of Python's standard library, so you need to install it with the following command.

* If you are using a [Jupyter notebook](https://colab.research.google.com/notebooks/intro.ipynb), use the command below with the `!` prefix. The `!` tells the notebook to run the line as a shell command instead of as Python code.


In [0]:
!pip install wikipedia

* If you are using an IDE such as [Microsoft Visual Studio Code](https://code.visualstudio.com/), [PyCharm](https://www.jetbrains.com/pycharm/), or even [Sublime Text](https://www.sublimetext.com/3), run the following command in a terminal:


In [0]:
pip install wikipedia

In either case, you should then see a success message like the one shown below, which indicates that the library was installed successfully.


In [1]:
!pip install wikipedia

Collecting wikipedia
  Downloading https://files.pythonhosted.org/packages/67/35/25e68fbc99e672127cc6fbb14b8ec1ba3dfef035bf1e4c90f78f24a80b7d/wikipedia-1.4.0.tar.gz
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... done
  Created wheel for wikipedia: filename=wikipedia-1.4.0-cp36-none-any.whl size=11686 sha256=d0d5cc5f62e177020a96252ea5991ec3839cb9e4a302f3d217426ce0a2c406d5
  Stored in directory: /root/.cache/pip/wheels/87/2a/18/4e471fd96d12114d16fe4a446d00c3b38fb9efcb744bd31f4a
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0
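If you'd rather confirm the installation from inside Python (say, in a script that depends on the library), here is a minimal sketch using the standard library's `importlib.metadata` module (Python 3.8 or newer). The helper name `installed_version` is hypothetical; `wikipedia` is the distribution name used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version (e.g. the 1.4.0 installed above), or None if pip install failed.
    print(installed_version("wikipedia"))
```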




---



# Search and Suggestion

Now let's look at some of the built-in methods the Wikipedia API provides, starting with search and suggestion. I'm pretty sure you can guess what these two methods do from their names.

## Search

The `search` method returns the search results for a query. Just like any other website with a search box, Wikipedia has its own search engine, which you can try here:

[Wikipedia Search](https://en.wikipedia.org/w/index.php?search)

Now let's see how to retrieve the search results for a query using Python. I will use **“Coronavirus”** as the topic in today's tutorial because, as we all know, it's trending and spreading worldwide. Before using the API, you first need to import it.


In [2]:
import wikipedia
print(wikipedia.search("Coronavirus"))

['Coronavirus', '2019–20 coronavirus pandemic', 'Severe acute respiratory syndrome coronavirus 2', 'Middle East respiratory syndrome-related coronavirus', 'Coronavirus disease 2019', '2020 coronavirus pandemic in California', 'Misinformation related to the 2019–20 coronavirus pandemic', 'Socio-economic impact of the 2019–20 coronavirus pandemic', 'Severe acute respiratory syndrome-related coronavirus', 'Severe acute respiratory syndrome coronavirus']


These are the titles of the Wikipedia articles that best match the query. If you don't believe me, go to the link above, search for the topic, and compare the results. Keep in mind that the results may change over time as articles are added and edited.


You can tune the search with two optional parameters, `results` and `suggestion`. `results` sets the maximum number of titles returned (10 by default), and when `suggestion` is True, `search` returns a tuple containing the results and a suggested query (or `None` if there is no suggestion).


In [3]:
print(wikipedia.search("Coronavirus", results=5, suggestion=True))

(['Coronavirus', '2019–20 coronavirus pandemic', 'Middle East respiratory syndrome-related coronavirus', 'Severe acute respiratory syndrome coronavirus 2', 'Severe acute respiratory syndrome-related coronavirus'], None)


## Suggestion

The `suggest` method, as the name suggests, returns a suggested search term for the query (here, a spelling correction), or `None` if it doesn't find one.


In [4]:
print(wikipedia.suggest('Coronavir'))

coronavirus
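Search and suggestion combine nicely: when a misspelled query returns no results, you can retry with the suggested term. Here is a minimal sketch of that idea. The function `search_with_fallback` and its `search` parameter are hypothetical, the latter added only so the logic can be tried with a stand-in function; by default it calls `wikipedia.search(query, results=..., suggestion=True)`, which returns a `(titles, suggestion)` tuple as shown above.

```python
def search_with_fallback(query, results=5, search=None):
    """Search Wikipedia; if nothing matches, retry once with the suggested term.

    `search` is a hypothetical hook: it defaults to wikipedia.search, but any
    callable with the same (query, results=..., suggestion=True) interface works.
    """
    if search is None:
        import wikipedia  # imported lazily so the helper can be tried offline
        search = wikipedia.search

    titles, suggestion = search(query, results=results, suggestion=True)
    if not titles and suggestion:
        # No direct hits: retry with the term Wikipedia suggested instead.
        titles, _ = search(suggestion, results=results, suggestion=True)
    return titles
```

With the real library, `search_with_fallback("Coronavirus")` behaves like a plain search, while a query with no hits falls back to the suggestion.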




---



# Summary

To get the summary of an article, use the **“summary”** method as shown below:


In [6]:
print(wikipedia.summary("Coronavirus"))

Coronaviruses are a group of related viruses that cause diseases in mammals and birds. In humans, coronaviruses cause respiratory tract infections that can be mild, such as some cases of the common cold (among other possible causes, predominantly rhinoviruses), and others that can be lethal, such as SARS, MERS, and COVID-19. Symptoms in other species vary: in chickens, they cause an upper respiratory tract disease, while in cows and pigs they cause diarrhea. There are yet to be vaccines or antiviral drugs to prevent or treat human coronavirus infections. 
Coronaviruses constitute the subfamily Orthocoronavirinae, in the family Coronaviridae, order Nidovirales, and realm Riboviria. They are enveloped viruses with a positive-sense single-stranded RNA genome and a nucleocapsid of helical symmetry. The genome size of coronaviruses ranges from approximately 27 to 34 kilobases, the largest among known RNA viruses. The name coronavirus is derived from the Latin corona, meaning "crown" or "hal

Be careful, though: you might run into a `DisambiguationError`. This happens when a title is ambiguous, that is, the same word has several meanings. For example, the word **“bass”** can refer to a fish, a low-pitched musical sound, and much more. In that case, the `summary` method raises an error as shown below.



> **Hint**: Be specific with your query.




In [7]:
print(wikipedia.summary("bass"))



DisambiguationError (full traceback omitted)
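One way to follow the hint in practice is to catch the exception and show the candidate titles. The library's `DisambiguationError` carries an `options` list of the pages the ambiguous term may refer to, and `summary()` accepts a `sentences` argument to shorten the result. The helper names below (`safe_summary`, `format_options`) are illustrative, not part of the library; treat this as a sketch.

```python
def format_options(title, options, limit=5):
    """Build a readable message listing a few candidate pages for an ambiguous title."""
    shown = options[:limit]
    more = len(options) - len(shown)
    message = f"'{title}' is ambiguous; try one of: " + ", ".join(shown)
    if more > 0:
        message += f" (and {more} more)"
    return message


def safe_summary(title, sentences=2):
    """Return a short summary, or a hint listing candidate pages if the title is ambiguous."""
    import wikipedia  # imported lazily; needs `pip install wikipedia` and network access

    try:
        return wikipedia.summary(title, sentences=sentences)
    except wikipedia.exceptions.DisambiguationError as err:
        # err.options holds the page titles the ambiguous term may refer to.
        return format_options(err.title, err.options)
    except wikipedia.exceptions.PageError:
        return f"No Wikipedia page matches '{title}'."
```

Calling `safe_summary("bass")` should then return a list of suggestions instead of stopping the notebook with a traceback.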

The Wikipedia API also lets us change the language in which we read articles. All you have to do is pass the desired language prefix to `set_lang`. **Any French readers in the house? I'll be using French as the example.**


In [8]:
wikipedia.set_lang("fr")
wikipedia.summary("Coronavirus")

"Coronavirus ou CoV (du latin, virus à couronne) est le nom d'un genre de virus correspondant à la sous-famille des orthocoronavirinæ  (de la famille des coronaviridæ). Le virus à couronne doit son nom à l'apparence des virions sous un microscope électronique, avec une frange de grandes projections bulbeuses qui ressemblent à la couronne solaire.  \nLes coronavirus sont munis d'une enveloppe virale ayant un génome à ARN de sens positif et une capside (coque) kilobases, incroyablement grosse pour un virus à ARN. Ils se classent parmi les Nidovirales, puisque tous les virus de cet ordre produisent un jeu imbriqué d'ARNm sous-génomique lors de l'infection. Des protéines en forme de pic, enveloppe, membrane et capside contribuent à la structure d'ensemble de tous les coronavirus. Ces virus à ARN sont monocaténaire (simple brin) et de sens positif (groupe IV de la classification Baltimore). Ils peuvent muter et se recombiner. \nLes chauves-souris et les oiseaux, en tant que vertébrés volant
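Note that `set_lang()` changes the language globally, so every later call in the notebook keeps returning French results until you switch back. A small context manager can make the switch temporary. This is a sketch under two assumptions: that you want to return to English (`"en"`, the library's default) afterwards, since as far as I know the library doesn't expose a getter for the current language, and that `api` (a hypothetical parameter) may be swapped for a stand-in object when testing.

```python
from contextlib import contextmanager


@contextmanager
def use_language(prefix, restore="en", api=None):
    """Temporarily switch the Wikipedia language, then switch back to `restore`.

    `api` is a hypothetical hook: by default it is the `wikipedia` module,
    but any object with a set_lang(prefix) method works.
    """
    if api is None:
        import wikipedia as api  # lazy import so the helper loads without the package
    api.set_lang(prefix)
    try:
        yield api
    finally:
        # Runs even if the with-body raises, so later calls aren't stuck in `prefix`.
        api.set_lang(restore)
```

For example, `with use_language("fr") as wp: print(wp.summary("Coronavirus", sentences=1))` prints a French summary and leaves the library set to English afterwards.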



---



# Languages supported

Now let's see which languages Wikipedia supports; this is a common question. At the time of writing, `wikipedia.languages()` returned **444 different language prefixes**, each mapped to the language's name (the exact number may change over time). To list them, see the code below:


In [9]:
wikipedia.languages()

{'aa': 'Qafár af',
 'ab': 'Аҧсшәа',
 'abs': 'bahasa ambon',
 'ace': 'Acèh',
 'ady': 'адыгабзэ',
 'ady-cyrl': 'адыгабзэ',
 'aeb': 'تونسي/Tûnsî',
 'aeb-arab': 'تونسي',
 'aeb-latn': 'Tûnsî',
 'af': 'Afrikaans',
 'ak': 'Akan',
 'aln': 'Gegë',
 'als': 'Alemannisch',
 'am': 'አማርኛ',
 'an': 'aragonés',
 'ang': 'Ænglisc',
 'anp': 'अङ्गिका',
 'ar': 'العربية',
 'arc': 'ܐܪܡܝܐ',
 'arn': 'mapudungun',
 'arq': 'جازايرية',
 'ary': 'Maġribi',
 'arz': 'مصرى',
 'as': 'অসমীয়া',
 'ase': 'American sign language',
 'ast': 'asturianu',
 'atj': 'Atikamekw',
 'av': 'авар',
 'avk': 'Kotava',
 'awa': 'अवधी',
 'ay': 'Aymar aru',
 'az': 'azərbaycanca',
 'azb': 'تۆرکجه',
 'ba': 'башҡортса',
 'ban': 'Bali',
 'bar': 'Boarisch',
 'bat-smg': 'žemaitėška',
 'bbc': 'Batak Toba',
 'bbc-latn': 'Batak Toba',
 'bcc': 'جهلسری بلوچی',
 'bcl': 'Bikol Central',
 'be': 'беларуская',
 'be-tarask': 'беларуская (тарашкевіца)\u200e',
 'be-x-old': 'беларуская (тарашкевіца)\u200e',
 'bg': 'български',
 'bgn': 'روچ کپتین بلوچی',
 'bh': 

To check whether a language is supported, write a membership test as shown below:


In [10]:
'en' in wikipedia.languages()

True

Here **‘en’** stands for **‘English’**. The expression evaluates to either **“True”** or **“False”**; here it's **“True”**.


To look up the name of the language for a given prefix, index the dictionary:


In [11]:
wikipedia.languages()['en']

'English'
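Putting the last two snippets together, you can validate a prefix before passing it to `set_lang()` and suggest close matches for a typo. This sketch uses the standard library's `difflib.get_close_matches`; the `check_language` name is hypothetical, and it takes the dictionary returned by `wikipedia.languages()` as an argument so that only one network call is needed. Passing the same dictionary to `len()` gives the number of prefixes the API currently reports.

```python
import difflib


def check_language(prefix, langs):
    """Return (prefix, name) if `prefix` is a known code, else raise ValueError with suggestions.

    `langs` is the dict returned by wikipedia.languages(), e.g. {'en': 'English', ...}.
    """
    if prefix in langs:
        return prefix, langs[prefix]
    # Suggest up to three similar prefixes (difflib's default similarity cutoff is 0.6).
    close = difflib.get_close_matches(prefix, list(langs), n=3)
    hint = f" Did you mean one of {close}?" if close else ""
    raise ValueError(f"Unknown language prefix {prefix!r}.{hint}")
```

For example, call `check_language("fr", wikipedia.languages())` before `wikipedia.set_lang("fr")`.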



---



# Page Access


The API also gives us full access to a Wikipedia page, including its title, URL, content, images, and links. To access a page, you first need to load it as shown below:

**Just a heads up, I will use a single article topic (Coronavirus) as a reference in this example:**



In [0]:
wikipedia.set_lang("en")  # switch back to English after the French example above
covid = wikipedia.page("Coronavirus")
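Once the page is loaded, the attributes covered below are ordinary Python strings and lists, so you can collect them into a single overview. A minimal sketch: the `page_overview` helper is hypothetical and works with any object exposing `title`, `url`, `content`, `images`, and `links`, such as the `covid` page above. As far as I can tell from the library's source, `content`, `images`, and `links` are fetched lazily, so each may trigger an extra API request the first time it is accessed.

```python
def page_overview(page, preview_chars=80, max_items=3):
    """Collect a few key attributes of a loaded page into a plain dictionary."""
    return {
        "title": page.title,
        "url": page.url,
        "preview": page.content[:preview_chars],  # first characters of the plain text
        "image_count": len(page.images),
        "first_links": page.links[:max_items],
    }
```

For example, `page_overview(covid)` returns the title, URL, a short text preview, the number of images, and the first few linked articles in one dictionary.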

## Title

To access the title of the page loaded above, use:


In [15]:
print(covid.title)

Coronavirus


## URL
To get the URL of the page, use:

In [16]:
print(covid.url)

https://en.wikipedia.org/wiki/Coronavirus


## Content
To access the content of the page, use:


In [17]:
print(covid.content)

Coronaviruses are a group of related viruses that cause diseases in mammals and birds. In humans, coronaviruses cause respiratory tract infections that can be mild, such as some cases of the common cold (among other possible causes, predominantly rhinoviruses), and others that can be lethal, such as SARS, MERS, and COVID-19. Symptoms in other species vary: in chickens, they cause an upper respiratory tract disease, while in cows and pigs they cause diarrhea. There are yet to be vaccines or antiviral drugs to prevent or treat human coronavirus infections. 
Coronaviruses constitute the subfamily Orthocoronavirinae, in the family Coronaviridae, order Nidovirales, and realm Riboviria. They are enveloped viruses with a positive-sense single-stranded RNA genome and a nucleocapsid of helical symmetry. The genome size of coronaviruses ranges from approximately 27 to 34 kilobases, the largest among known RNA viruses. The name coronavirus is derived from the Latin corona, meaning "crown" or "hal



> **Hint**: Unlike `summary()`, the `content` attribute holds the plain text of the entire article, not just its introduction.
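The `content` string is plain text, but section headings survive as lines such as `== Structure ==` (the same format the library's own `section()` method searches for). Here is a minimal sketch that splits the text on those heading lines, assuming that format holds; `split_sections` is a hypothetical helper, subsection headings (`=== … ===`) are treated the same as top-level ones, and a repeated heading name would overwrite the earlier one.

```python
import re

# Matches whole heading lines such as "== Structure ==" or "=== Genome ===".
HEADING = re.compile(r"^(={2,})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def split_sections(content):
    """Split page text into {heading: text}; text before the first heading goes under 'Introduction'."""
    sections = {}
    title, start = "Introduction", 0
    for match in HEADING.finditer(content):
        # Everything since the previous heading belongs to the previous section.
        sections[title] = content[start:match.start()].strip()
        title, start = match.group(2), match.end()
    sections[title] = content[start:].strip()
    return sections
```

For example, `list(split_sections(covid.content))` lists the article's section names in order.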



## Images

Yes, we can also get the images from a Wikipedia article. The catch is that we don't get the images themselves; instead, we get their URLs, as shown below:


In [18]:
print(covid.images)

['https://upload.wikimedia.org/wikipedia/commons/8/82/SARS-CoV-2_without_background.png', 'https://upload.wikimedia.org/wikipedia/commons/9/96/3D_medical_animation_coronavirus_structure.jpg', 'https://upload.wikimedia.org/wikipedia/commons/f/f4/Coronavirus_replication.png', 'https://upload.wikimedia.org/wikipedia/commons/e/e5/Coronavirus_virion_structure.svg', 'https://upload.wikimedia.org/wikipedia/commons/d/dd/Phylogenetic_tree_of_coronaviruses.jpg', 'https://upload.wikimedia.org/wikipedia/commons/7/74/Red_Pencil_Icon.png', 'https://upload.wikimedia.org/wikipedia/commons/8/82/SARS-CoV-2_without_background.png', 'https://upload.wikimedia.org/wikipedia/commons/1/11/SARS-CoV_MERS-CoV_genome_organization_and_S-protein_domains.png', 'https://upload.wikimedia.org/wikipedia/commons/2/2f/Sida-aids.png', 'https://upload.wikimedia.org/wikipedia/commons/d/d6/WHO_Rod.svg', 'https://upload.wikimedia.org/wikipedia/commons/9/99/Wiktionary-logo-en-v2.svg', 'https://upload.wikimedia.org/wikipedia/en/
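The image list can include logos and icons (many of them SVG files) and even duplicates: `SARS-CoV-2_without_background.png` appears twice in the output above. A minimal sketch that keeps only URLs with chosen file extensions and drops repeats while preserving order; the default extension list is an assumption you may want to adjust, and `filter_images` is a hypothetical helper.

```python
from urllib.parse import urlparse


def filter_images(urls, extensions=(".png", ".jpg", ".jpeg", ".gif")):
    """Keep image URLs with the given extensions, dropping duplicates but preserving order."""
    seen = set()
    kept = []
    for url in urls:
        path = urlparse(url).path.lower()  # compare extensions case-insensitively
        if path.endswith(extensions) and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept
```

For example, `filter_images(covid.images)` returns the PNG/JPEG/GIF images, each listed once.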

## Links
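Similarly, the `links` attribute returns the titles of the other Wikipedia articles that this page links to. (If you want the external sources instead, the page object also has a `references` property, which returns the external URLs linked from the article.)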

In [19]:
print(covid.links)

['2002–2004 SARS outbreak', '2012 Middle East respiratory syndrome coronavirus outbreak', '2015 Middle East respiratory syndrome outbreak in South Korea', '2018 Middle East respiratory syndrome outbreak', '2019–2020 coronavirus pandemic', '2019–20 coronavirus pandemic', 'Acute bronchitis', 'Adenoid', 'Adenoviridae', 'Adenovirus infection', 'Adult T-cell leukemia/lymphoma', 'Alpaca', 'Alphacoronavirus', 'Anal cancer', 'Ancient Greek', 'Angiotensin-converting enzyme 2', 'Anorexia (symptom)', 'Antiviral drug', 'Arbovirus encephalitis', 'Astrovirus', 'Avian infectious bronchitis', 'Avian infectious bronchitis virus', 'Avian influenza', 'BCE', 'BK virus', 'Bacterial pneumonia', 'Bat', 'Bat-borne virus', 'Bead', 'Beluga whale coronavirus SW1', 'Betacoronavirus', 'Betacoronavirus 1', 'Bibcode', 'Birds', 'Bovine coronavirus', 'Bronchiolitis', 'Bronchitis', 'Bulbul coronavirus HKU11', "Burkitt's lymphoma", 'Canine coronavirus', 'Capsid', 'Cardiovascular disease', 'Central nervous system viral d
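Because `links` is a plain list of article titles, standard Python is enough to narrow it down, for instance to every linked article whose title mentions a keyword. The `links_matching` helper below is a hypothetical sketch; each returned title can be passed back to `wikipedia.page()` to keep exploring.

```python
def links_matching(links, keyword):
    """Return linked article titles containing `keyword`, ignoring case."""
    needle = keyword.casefold()  # casefold() is a more thorough lower() for Unicode text
    return [title for title in links if needle in title.casefold()]
```

For example, `links_matching(covid.links, "bronchitis")` picks out the bronchitis-related articles from the list above.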



---



So there you go: you have reached the end of this tutorial on the Wikipedia API for Python. To learn about more methods, visit the [Wikipedia API](https://wikipedia.readthedocs.io/en/latest/code.html#api) documentation. I hope you had a lot of fun learning and implementing. If you have any comments or concerns, let me know in the comment section below. Until then, goodbye.


# Be Safe.

