
# Wikipedia API for Python


## In this tutorial, let us learn how to use the Wikipedia API for Python.


# Introduction

Wikipedia is the world’s largest free encyclopedia, a place full of information. I mean, who hasn’t used Wikipedia at some point in their life? (If you say you haven’t, you are most probably lying.) The Python library called `wikipedia` lets us easily access and parse data from Wikipedia. In other words, you can use this library as a small scraper that retrieves a limited set of information from Wikipedia. Let us see how to do that in this tutorial.




---




# Installation

The first step is to install the library. Because it is an external package and not part of Python’s standard library, you need to install it yourself with `pip`.

* If you are using a [Jupyter notebook](https://colab.research.google.com/notebooks/intro.ipynb), run the command with a leading `!` mark. The `!` tells the notebook to run the line as a shell command instead of as Python code.


* If you are using an IDE or editor such as [Microsoft Visual Studio Code](https://code.visualstudio.com/), [PyCharm](https://www.jetbrains.com/pycharm/), or even [Sublime Text](https://www.sublimetext.com/3), enter the command (without the `!`) in a terminal.


In a Jupyter notebook cell:

```python
!pip install wikipedia
```

In a terminal:

```shell
pip install wikipedia
```

In either case, pip prints a success message when it finishes (typically a line starting with `Successfully installed`), which indicates that the library was installed.
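If you are not sure whether the installation worked, you can ask Python whether it can find the module. This is a minimal sketch using the standard library’s `importlib.util.find_spec`, which looks a module up without importing it; the helper name `is_installed` is my own and not part of the `wikipedia` library.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if Python can locate `module_name` on the current path."""
    # find_spec returns None when no module with that name can be found
    return importlib.util.find_spec(module_name) is not None


if is_installed("wikipedia"):
    print("The wikipedia library is installed.")
else:
    print("wikipedia not found; run the pip command first.")
```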





---



# Search and Suggestion

Now let us look at some of the methods the library provides. The first two are search and suggestion; as their names suggest, you can probably guess what they do.

## Search

The `search` method returns the search results for a query. Wikipedia has its own built-in search engine, which you can try here:

[Wikipedia Search](https://en.wikipedia.org/w/index.php?search)

Now let us see how to retrieve the search results for a query using Python. I will use **“Coronavirus”** as the topic throughout this tutorial because, as we all know, it is trending and spreading worldwide. Before you can use the library, you first need to import it.


```python
import wikipedia

print(wikipedia.search("Coronavirus"))
```

```
['Coronavirus', 'COVID-19 pandemic', 'COVID-19 pandemic in the United States', 'COVID-19', 'Novel coronavirus', 'COVID-19 pandemic by country and territory', 'Coronavirus, Explained', 'Human coronavirus OC43', 'Timeline of the COVID-19 pandemic in the United States (2020)', 'SARS-CoV-2']
```


These are the article titles that Wikipedia’s own search engine returns for the query. If you don’t believe me, search for the same topic using the link above and compare the results. Keep in mind that the results can change over time as articles are created and edited.


You can refine the search with two parameters, `results` and `suggestion`. `results` sets the maximum number of results to return (10 by default), and if `suggestion` is `True`, the method returns a tuple containing the results and a suggested query (or `None` if there is no suggestion).


```python
print(wikipedia.search("Coronavirus", results=5, suggestion=True))
```

```
(['Coronavirus', 'COVID-19 pandemic', 'COVID-19 pandemic in the United States', 'COVID-19', 'COVID-19 pandemic by country and territory'], None)
```
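The tuple form is handy when a query is misspelled. The sketch below unpacks the `(titles, suggestion)` tuple shown above and, if the search came back empty, searches again using the suggestion. The helper name `search_titles` and the retry-once strategy are my own choices, not something the library does for you; the `import` sits inside the function only so the snippet can be defined even before the library is installed.

```python
def search_titles(query: str, max_results: int = 5) -> list:
    """Search Wikipedia; if nothing matches, retry once with the suggestion."""
    import wikipedia  # imported here so the helper can be defined before installation

    # With suggestion=True, search() returns a (titles, suggestion) tuple
    titles, suggestion = wikipedia.search(query, results=max_results, suggestion=True)
    if not titles and suggestion:
        # Fall back to Wikipedia's "did you mean" suggestion
        titles = wikipedia.search(suggestion, results=max_results)
    return titles
```

For example, `search_titles("Coronavirus")` should return up to five titles like the ones above; the exact list depends on Wikipedia’s current search index.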


## Suggestion

The `suggest` method, as the name suggests, returns the suggested Wikipedia title for a query, or `None` if it cannot find one.


```python
print(wikipedia.suggest('Coronavir'))
```

```
coronaviru
```
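Since `suggest` can return `None`, code that uses it should handle that case. A minimal sketch, assuming you simply want to keep the original query when there is no suggestion (the helper name `corrected_query` is hypothetical):

```python
def corrected_query(query: str) -> str:
    """Return Wikipedia's suggested spelling for `query`, or `query` itself."""
    import wikipedia  # imported here so the helper can be defined before installation

    # suggest() returns None when Wikipedia has no better spelling to offer
    suggestion = wikipedia.suggest(query)
    return suggestion if suggestion is not None else query
```

As the `coronaviru` output above shows, a suggestion is not always a real article title, so it can be worth passing the result to `search` before relying on it.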




---



# Summary

To get the summary of an article use the **“summary”** method as shown below:


```python
print(wikipedia.summary("Coronavirus"))
```

```
Coronaviruses are a group of related RNA viruses that cause diseases in mammals and birds. In humans and birds, they cause respiratory tract infections that can range from mild to lethal. Mild illnesses in humans include some cases of the common cold (which is also caused by other viruses, predominantly rhinoviruses), while more lethal varieties can cause SARS, MERS and COVID-19. In cows and pigs they cause diarrhea, while in mice they cause hepatitis and encephalomyelitis.
Coronaviruses constitute the family Coronaviridae, order Nidovirales and realm Riboviria. They are enveloped viruses with a positive-sense single-stranded RNA genome and a nucleocapsid of helical symmetry. The genome size of coronaviruses ranges from approximately 26 to 32 kilobases, one of the largest among RNA viruses. They have characteristic club-shaped spikes that project from their surface, which in electron micrographs create an image reminiscent of the stellar corona, from which their name derives.
```
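A full summary can be long. `wikipedia.summary` also accepts a `sentences` argument that returns only the first few sentences; the library’s documentation says it can be no greater than 10. The helper below is a small sketch that clamps the value to that range (the name `short_summary` and the default of two sentences are my own choices).

```python
def short_summary(title: str, sentences: int = 2) -> str:
    """Return only the first few sentences of an article's summary."""
    import wikipedia  # imported here so the helper can be defined before installation

    # Keep the value between 1 and the documented maximum of 10
    sentences = max(1, min(sentences, 10))
    return wikipedia.summary(title, sentences=sentences)
```

For example, `print(short_summary("Coronavirus"))` should print roughly the first two sentences of the summary above.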


Be careful, though: sometimes you might run into a `DisambiguationError`. This happens when a query is ambiguous, that is, when the same word has several meanings. For example, the word **“bass”** can refer to a fish, a low-pitched sound, a musical instrument, and more. In that case, the `summary` method raises a `wikipedia.exceptions.DisambiguationError` instead of returning a summary.



> **Hint**: Be specific in your query.
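One way to deal with this error is to catch it and look at its `options` attribute, which holds the list of article titles the ambiguous word may refer to. This is a sketch, not the only approach; the helper name `summary_or_options` is hypothetical:

```python
def summary_or_options(title: str):
    """Return (summary, []) or, for an ambiguous title, (None, candidate_titles)."""
    import wikipedia  # imported here so the helper can be defined before installation

    try:
        return wikipedia.summary(title), []
    except wikipedia.exceptions.DisambiguationError as err:
        # err.options lists the article titles the ambiguous word may refer to
        return None, err.options
```

With this helper, `summary_or_options("Bass")` should give back `None` plus a list of candidate titles, and you can then request the summary of the specific article you meant.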




The library also lets you change the language of the articles you read. All you have to do is set the language to the one you want. **For any French readers out there, I will use French as the example.** Keep in mind that this setting applies to every later call until you change it again, for example with `wikipedia.set_lang("en")`.


```python
wikipedia.set_lang("fr")
wikipedia.summary("Coronavirus")
```

```
"Les orthocoronavirus (CoV) sont des virus qui constituent la sous-famille Orthocoronavirinae de la famille Coronaviridae (les coronavirus). Le nom « coronavirus », du latin signifiant « virus à couronne », est dû à l'apparence des virions sous un microscope électronique, avec une frange de grandes projections bulbeuses qui évoquent une couronne solaire.\nCes coronavirus sont munis d'une enveloppe virale incluant une capside caractérisée par des protéines en forme de massue (appelées spicules). Ils ont un génome à ARN monocaténaire (c'est-à-dire à un seul brin), de sens positif (groupe IV de la classification Baltimore), de 26 à 32 kilobases (ce qui en fait les plus grands génomes parmi les virus à ARN). Ils se classent parmi les Nidovirales, ordre de virus produisant un jeu imbriqué d'ARNm sous-génomiques lors de l'infection. Des spicules, une enveloppe, membrane et capside contribuent à la structure d'ensemble de tous les coronavirus. Ils peuvent muter et se recombiner.\nLes chauves-
```
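Because `set_lang` changes a module-wide setting, French stays active for all later calls. If you only want to switch temporarily, a context manager can switch back for you. This is a minimal sketch: the name `wikipedia_language` is hypothetical, and it assumes you want to return to English (`restore="en"`) afterwards, since it does not read the previous setting.

```python
from contextlib import contextmanager


@contextmanager
def wikipedia_language(prefix: str, restore: str = "en"):
    """Temporarily switch the wikipedia library to another language edition."""
    import wikipedia  # imported here so the helper can be defined before installation

    wikipedia.set_lang(prefix)
    try:
        yield
    finally:
        # The setting is global, so switch back even if the block raised an error
        wikipedia.set_lang(restore)
```

For example, calling `wikipedia.summary("Coronavirus")` inside `with wikipedia_language("fr"):` returns the French summary, and calls made after the `with` block go back to English Wikipedia.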



---



# Languages supported

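Now let us see which languages Wikipedia supports, since this is a common question. The `languages()` method returns a dictionary that maps each language prefix to the language’s native name. At the time of writing, it contained **444 entries**. Note that the list also includes script variants such as `aeb-arab` and `aeb-latn`, so it is not exactly the number of Wikipedia editions. To see it, run the code below: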


```python
wikipedia.languages()
```

```
{'aa': 'Qafár af',
 'aae': 'Arbërisht',
 'ab': 'аԥсшәа',
 'abs': 'bahasa ambon',
 'ace': 'Acèh',
 'acf': 'Kwéyòl Sent Lisi',
 'acm': 'عراقي',
 'ady': 'адыгабзэ',
 'ady-cyrl': 'адыгабзэ',
 'aeb': 'تونسي / Tûnsî',
 'aeb-arab': 'تونسي',
 'aeb-latn': 'Tûnsî',
 'af': 'Afrikaans',
 'aln': 'Gegë',
 'als': 'Alemannisch',
 'alt': 'алтай тил',
 'am': 'አማርኛ',
 'ami': 'Pangcah',
 'an': 'aragonés',
 'ang': 'Ænglisc',
 'ann': 'Obolo',
 'anp': 'अंगिका',
 'apc': 'شامي',
 'ar': 'العربية',
 'arc': 'ܐܪܡܝܐ',
 'arn': 'mapudungun',
 'arq': 'جازايرية',
 'ary': 'الدارجة',
 'arz': 'مصرى',
 'as': 'অসমীয়া',
 'ase': 'American sign language',
 'ast': 'asturianu',
 'atj': 'Atikamekw',
 'av': 'авар',
 'avk': 'Kotava',
 'awa': 'अवधी',
 'ay': 'Aymar aru',
 'az': 'azərbaycanca',
 'azb': 'تۆرکجه',
 'ba': 'башҡортса',
 'ban': 'Basa Bali',
 'ban-bali': 'ᬩᬲᬩᬮᬶ',
 'bar': 'Boarisch',
 'bat-smg': 'žemaitėška',
 'bbc': 'Batak Toba',
 'bbc-latn': 'Batak Toba',
 'bcc': 'جهلسری بلوچی',
 'bci': 'wawle',
 'bcl': 'Bikol Central',
```

To check whether a language is supported, write a condition as shown below:


```python
'en' in wikipedia.languages()
```

```
True
```

Here **‘en’** stands for **‘English’**. The expression evaluates to either **“True”** or **“False”**; here it is **“True”**.


Also, to look up the language name for a given prefix, try:


```python
wikipedia.languages()['en']
```

```
'English'
```
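Putting the two lookups together, you can check a prefix before switching to it. A minimal sketch (the helper name `use_language` is hypothetical); since the dictionary also contains script variants and codes that may not have their own Wikipedia edition, passing this check is a sanity test rather than a guarantee:

```python
def use_language(prefix: str) -> str:
    """Switch to `prefix` if it is a known language code; return its name."""
    import wikipedia  # imported here so the helper can be defined before installation

    # languages() maps prefixes such as 'en' to native names such as 'English'
    available = wikipedia.languages()
    if prefix not in available:
        raise ValueError(f"Unknown language prefix: {prefix!r}")
    wikipedia.set_lang(prefix)
    return available[prefix]
```

For example, `use_language("fr")` should switch to French and return its native name, while an unknown prefix raises a `ValueError` before any language change happens.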



---



# Page Access


The library also gives us access to a full Wikipedia page, from which we can read its title, URL, content, images, and links. To access a page, you first need to load it as shown below:

**Just a heads up: I will use a single article (Coronavirus) as the reference in these examples.**



```python
wikipedia.set_lang("en")  # back to English after the French example above
covid = wikipedia.page("Coronavirus")
```
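Loading a page can fail, for example when no article has the given title. The sketch below (the helper name `load_page` is hypothetical) catches the library’s `PageError` and passes `auto_suggest=False`, which makes `page` fetch the title exactly as typed instead of first swapping it for the top search result or suggestion. An ambiguous title can still raise a `DisambiguationError`, which this sketch leaves to the caller.

```python
def load_page(title: str):
    """Fetch a page by its exact title; return None if it does not exist."""
    import wikipedia  # imported here so the helper can be defined before installation

    try:
        # auto_suggest=False fetches the title as typed instead of first
        # replacing it with a search result or suggestion
        return wikipedia.page(title, auto_suggest=False)
    except wikipedia.exceptions.PageError:
        # Raised when no article with that title exists
        return None
```

For example, `covid = load_page("Coronavirus")` followed by a check for `None` lets the rest of your code skip missing articles gracefully.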

## Title

To access the title of the page loaded above, use:


```python
print(covid.title)
```

## URL
To get the URL of the page use:

```python
print(covid.url)
```

## Content
To access the content of the page use:


```python
print(covid.content)
```
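The title, URL, and content attributes are often used together. Here is a small sketch (the helper name `describe_page` and its output format are my own) that builds a short overview of a loaded page. It only reads the `title`, `url`, and `content` attributes, so it works with the `covid` object loaded above.

```python
def describe_page(page, preview_chars: int = 200) -> str:
    """Build a short text overview from a loaded WikipediaPage-like object."""
    # Collapse newlines and repeated spaces so the preview fits on one line
    preview = " ".join(page.content[:preview_chars].split())
    return (
        f"Title:   {page.title}\n"
        f"URL:     {page.url}\n"
        f"Length:  {len(page.content)} characters of plain text\n"
        f"Preview: {preview}..."
    )
```

`print(describe_page(covid))` prints the title, the URL, the length of the plain-text content, and its first 200 characters.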



> **Hint**: `content` returns the plain text of the entire page; images and tables are not included.



## Images

Yes, you read that right: we can get the images from a Wikipedia article too. The catch is that we can’t render the images here, but we can get their URLs as shown below:


```python
print(covid.images)
```
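`images` returns a plain list of URL strings, so ordinary list processing applies. As an illustration, the sketch below keeps only URLs ending in common photo extensions; the helper name `photo_urls` and the chosen extensions are assumptions, and you may want a different filter.

```python
def photo_urls(image_urls, extensions=(".jpg", ".jpeg", ".png")):
    """Keep only URLs that end in one of `extensions` (case-insensitive)."""
    # Pages may also list .svg files (icons, logos, diagrams); this filter skips them
    return [url for url in image_urls if url.lower().endswith(extensions)]
```

`photo_urls(covid.images)` returns the filtered list.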

## Links
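Similarly, the `links` attribute returns the titles of the other Wikipedia articles that this page links to. (The URLs of external sources linked from the page are available separately through the `references` attribute.)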

```python
print(covid.links)
```
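To see the difference between internal links and external references side by side, here is a small sketch (the helper name `link_overview` is hypothetical). It reads `links`, the titles of Wikipedia articles linked from the page, and `references`, which the library documents as the URLs of external links on the page, and takes the first few entries of each:

```python
def link_overview(page, limit: int = 5) -> dict:
    """Return the first few internal links and external references of a page."""
    return {
        # Titles of other Wikipedia articles linked from this page
        "wikipedia_links": page.links[:limit],
        # URLs of external sites linked from this page
        "external_references": page.references[:limit],
    }
```

`link_overview(covid)` returns a dictionary with up to five entries of each kind.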



---



So there you go, you have reached the end of this tutorial on the Wikipedia API for Python. To learn about more methods, visit [Wikipedia API](https://wikipedia.readthedocs.io/en/latest/code.html#api). I hope you had fun learning and implementing it. If you have any comments or concerns, let me know in the comment section below. Until then, goodbye.


# Be Safe.

