# Package Expo: Wikipedia 1.4.0 
*Group 6: Joe Burnett, Sofia Cenciarelli, Harrison O'Neal, and Cressie Rynne*

## Introduction
Before we look at the Wikipedia package, let's start by defining what Wikipedia is. Wikipedia is a free, multilingual online encyclopedia. It launched in 2001 and has grown exponentially since, now containing over six million articles in English alone. Wikipedia is built on open collaboration, meaning that everything is published and maintained by volunteer users. Although it is often not considered a reliable source of information, Wikipedia is the eighth-most-visited site in the world. Given its high usage and diversity of data, we chose to look into how text-based analysis could be improved using the Wikipedia 1.4.0 package.

Wikipedia 1.4.0 is a Python package built around an application programming interface (API). The API acts as a messenger that delivers requested information from Wikipedia to Python. The package exposes Wikipedia content through simple Python functions, making it easier to extract data from Wikipedia's ever-growing database.

## Installation
To install the package, run the line below without the hashtag in a notebook cell (in a terminal window, also drop the leading exclamation mark):

In [1]:
#!pip install wikipedia

Once the package is installed, you no longer need the installation line. Any time you would like to use the package, simply import it using the code below:

In [2]:
import wikipedia

## In-Depth Description
The Wikipedia package is a useful tool for pulling in Wikipedia articles as reference material. An API call is semantically much simpler than scraping an HTML page. The package is a wrapper for the MediaWiki API, and in the words of the owner, “Wikipedia (works) so you can focus on using Wikipedia data, not getting it.” For example, if you wanted to extract summary information about something, let’s say “Python”, searching the entirety of Wikipedia takes one quick line:

In [3]:
wikipedia.search("Python")

['Python',
 'Python (programming language)',
 'Monty Python',
 'Ball python',
 'History of Python',
 'PYTHON',
 'Burmese python',
 'Reticulated python',
 'Python molurus',
 'Python (genus)']

After doing a basic search, we can see that the result we want is “Python (programming language)”. To grab the summary of the page, we type:

In [4]:
wikipedia.summary("Python (programming language)")

'Python is an interpreted high-level general-purpose programming language. Python\'s design philosophy emphasizes code readability with its notable use of significant indentation. Its language constructs as well as its object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.Python is dynamically-typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library.Guido van Rossum began working on Python in the late 1980s, as a successor to the ABC programming language, and first released it in 1991 as Python 0.9.0. Python 2.0 was released in 2000 and introduced new features, such as list comprehensions and a garbage collection system using reference counting and was discontinued with version 2.7.18 in 2020. Python 3.0 was released in 20

It’s that easy! Compared to web scraping, this is a much more efficient way to extract basic information from a page, because we never have to work through the HTML tag system.

Another key advantage of this approach over a normal web scraping procedure, besides the time and effort saved, is the resiliency that comes from going through an API versus hard coding. For example, let’s say the webpage for Python (programming language) changed. If you had hard coded the website URL, the change would brick your application. Because our calls pass through the API, resiliency to webpage changes is handled on the API's side, so the user can focus on what they want to do with the information instead of building resiliency into their own code. For grabbing information quickly and efficiently, using the Wikipedia package is far superior to the alternative web scraping methods.
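
To make the wrapper idea concrete, the sketch below builds the kind of raw MediaWiki search request that a call like `wikipedia.search("Python")` spares us from writing. The endpoint and the `action`, `list`, `srsearch`, `srlimit`, and `format` parameters belong to the public MediaWiki API. The helper name `build_search_url` is hypothetical, and the exact request the package sends internally is an assumption that may differ. Nothing is sent over the network here.

```python
from urllib.parse import urlencode

# Public MediaWiki endpoint for English Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(query, limit=10):
    """Build (but do not send) a MediaWiki full-text search request URL.

    Hypothetical helper for illustration only; the package assembles and
    sends a comparable request for us behind wikipedia.search().
    """
    params = {
        "action": "query",   # use the query module
        "list": "search",    # full-text search submodule
        "srsearch": query,   # the search terms
        "srlimit": limit,    # maximum number of results
        "format": "json",    # ask for a JSON response
    }
    return API_ENDPOINT + "?" + urlencode(params)


url = build_search_url("Python", limit=5)
print(url)
```

With the package, all of this (plus sending the request and decoding the JSON reply) collapses into a single function call.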

## Functionalities

Search for relevant data with keywords and limit the number of results returned

In [5]:
wikipedia.search("Python")

['Python',
 'Python (programming language)',
 'Monty Python',
 'Ball python',
 'History of Python',
 'PYTHON',
 'Burmese python',
 'Reticulated python',
 'Python molurus',
 'Python (genus)']

In [6]:
wikipedia.search("Python", results=5)

['Python',
 'Python (programming language)',
 'Monty Python',
 'Ball python',
 'PYTHON']
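
Because `wikipedia.search` returns a plain Python list of titles, ordinary list handling is enough to narrow the results further. The following is a minimal sketch: `filter_titles` is a hypothetical helper (not part of the package), and the list is copied from the first search output above so that no network call is needed.

```python
def filter_titles(titles, keyword):
    """Keep only the titles that contain `keyword`, ignoring case."""
    keyword = keyword.lower()
    return [title for title in titles if keyword in title.lower()]


# Copied from the wikipedia.search("Python") output; no network call is made.
results = [
    "Python",
    "Python (programming language)",
    "Monty Python",
    "Ball python",
    "History of Python",
    "PYTHON",
    "Burmese python",
    "Reticulated python",
    "Python molurus",
    "Python (genus)",
]

print(filter_titles(results, "programming"))
print(filter_titles(results, "(genus)"))
```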

Summarize data on a subject and limit the number of sentences returned.

In [7]:
wikipedia.summary("PythonProgram")

'Python is an interpreted high-level general-purpose programming language. Python\'s design philosophy emphasizes code readability with its notable use of significant indentation. Its language constructs as well as its object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.Python is dynamically-typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library.Guido van Rossum began working on Python in the late 1980s, as a successor to the ABC programming language, and first released it in 1991 as Python 0.9.0. Python 2.0 was released in 2000 and introduced new features, such as list comprehensions and a garbage collection system using reference counting and was discontinued with version 2.7.18 in 2020. Python 3.0 was released in 20

In [8]:
wikipedia.summary("PythonProgram", sentences=2)

"Python is an interpreted high-level general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant indentation."

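Errors can arise when requesting the summary of a common word that matches several articles (typically a `DisambiguationError`), as happens when the line below is uncommented and run. Using a more specific title avoids the problem: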

In [9]:
#wikipedia.summary("shell")

In [10]:
wikipedia.summary("shell (gas)", sentences=2)

'Royal Dutch Shell, commonly known as Shell, is an Anglo-Dutch multinational oil and gas company headquartered in The Hague, Netherlands, and incorporated in the United Kingdom as a public limited company. It is one of the oil and gas "supermajors" and, measured by 2020 revenues, the fifth-largest company in the world, the largest based in Europe, and the largest not based in either the United States or China.'
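
Rather than letting an ambiguous term stop the notebook, the error can be caught and its candidate titles inspected. The sketch below assumes the package's `wikipedia.exceptions.DisambiguationError` and its `options` attribute. The helper name `summary_or_options` and the `wiki` parameter (a stand-in hook so the logic can be exercised without a network connection) are hypothetical.

```python
def summary_or_options(title, sentences=2, wiki=None):
    """Return (summary, []) on success, or (None, candidates) if ambiguous.

    `wiki` is a hypothetical hook: it defaults to the real `wikipedia`
    module, but any object with the same interface can be passed instead.
    """
    if wiki is None:
        import wikipedia as wiki  # lazy import; requires the package

    try:
        return wiki.summary(title, sentences=sentences), []
    except wiki.exceptions.DisambiguationError as err:
        # err.options lists the titles the ambiguous term may refer to.
        return None, list(err.options)
```

Called as `summary_or_options("shell")` against the live site, this should hand back either a summary or a list of candidate titles, from which a more specific query such as `"shell (gas)"` can be chosen.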

Retrieve a page and its attributes

In [11]:
wikipedia.page("Python")

<WikipediaPage 'Python (programming language)'>

In [12]:
wikipedia.page("Python").title

'Python (programming language)'

In [13]:
wikipedia.page("Python").url

'https://en.wikipedia.org/wiki/Python_(programming_language)'

In [14]:
wikipedia.page("Python").content

'Python is an interpreted high-level general-purpose programming language. Python\'s design philosophy emphasizes code readability with its notable use of significant indentation. Its language constructs as well as its object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.Python is dynamically-typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library.Guido van Rossum began working on Python in the late 1980s, as a successor to the ABC programming language, and first released it in 1991 as Python 0.9.0. Python 2.0 was released in 2000 and introduced new features, such as list comprehensions and a garbage collection system using reference counting and was discontinued with version 2.7.18 in 2020. Python 3.0 was released in 20

In [15]:
wikipedia.page("Python").references

['http://www.computerworld.com.au/index.php/id;66665771',
 'http://neopythonic.blogspot.be/2009/04/tail-recursion-elimination.html',
 'http://www.amk.ca/python/writing/gvr-interview',
 'http://cdsweb.cern.ch/journal/CERNBulletin/2006/31/News%20Articles/974627?ln=en',
 'http://www.2ality.com/2013/02/javascript-influences.html',
 'http://www.2kgames.com/civ4/blog_03.htm',
 'http://archive.adaic.com/standards/83lrm/html/lrm-11-03.html#11.3',
 'http://www.ainewsletter.com/newsletters/aix_0508.htm#python_ai_ai',
 'http://www.artima.com/intv/pythonP.html',
 'http://www.artima.com/weblogs/viewpost.jsp?thread=147358',
 'http://cobra-language.com/docs/acknowledgements/',
 'http://ebeab.com/2014/01/21/python-culture/',
 'http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=About_getting_started_with_writing_geoprocessing_scripts',
 'http://community.eveonline.com/news/dev-blogs/stackless-python-2.7/',
 'http://www.eweek.com/c/a/Application-Development/Python-Slithers-into-Systems/',
 'h
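
Each cell above calls `wikipedia.page("Python")` again, and we assume every such call repeats the lookup over the network. Fetching the page object once and reading its attributes from a variable avoids that repetition. In this sketch, `page_overview` and its `wiki` parameter are hypothetical names, and only the `title`, `url`, `content`, and `references` attributes shown above are used.

```python
def page_overview(title, wiki=None):
    """Fetch a page once and collect a few of its attributes in a dict.

    `wiki` is a hypothetical hook that defaults to the real `wikipedia`
    module; a stand-in object can be passed for offline experiments.
    """
    if wiki is None:
        import wikipedia as wiki  # lazy import; requires the package

    page = wiki.page(title)  # single lookup, reused for every attribute
    return {
        "title": page.title,
        "url": page.url,
        "content_chars": len(page.content),
        "reference_count": len(page.references),
    }
```

A call such as `page_overview("Python")` would then return one small dictionary instead of requiring four separate page lookups.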

Language settings 

In [16]:
wikipedia.set_lang("es")
wikipedia.summary("Python", sentences=1)

'Python es un lenguaje de programación interpretado cuya filosofía hace hincapié en la legibilidad de su código.[2]\u200b Se trata de un lenguaje de programación multiparadigma, ya que soporta parcialmente la orientación a objetos, programación imperativa y, en menor medida, programación funcional.'
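
Note that `wikipedia.set_lang` changes the language for every later call, not just the next one. The sketch below collects a one-sentence summary in several languages and then switches back. The helper name, the `wiki` hook, and the choice of `"en"` as the language to restore are assumptions, and it also assumes the same title exists in each language edition.

```python
def first_sentence_by_language(title, languages, wiki=None, default_lang="en"):
    """Map each language code to a one-sentence summary of `title`.

    `wiki` is a hypothetical hook that defaults to the real `wikipedia`
    module. The language is reset to `default_lang` when done, even if
    one of the lookups raises an error.
    """
    if wiki is None:
        import wikipedia as wiki  # lazy import; requires the package

    summaries = {}
    try:
        for lang in languages:
            wiki.set_lang(lang)  # affects all subsequent calls
            summaries[lang] = wiki.summary(title, sentences=1)
    finally:
        wiki.set_lang(default_lang)  # restore the assumed default
    return summaries
```

For example, `first_sentence_by_language("Python", ["es", "fr"])` would return a dictionary keyed by `"es"` and `"fr"` and leave the package set to English afterwards.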

Retrieve image URLs

In [17]:
wikipedia.page("Python").images[0]

'https://upload.wikimedia.org/wikipedia/commons/4/4a/Commons-logo.svg'
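
The first image returned here is a site logo in SVG format rather than a picture of the subject, so some filtering is usually needed before the URLs are useful. In this minimal sketch, `filter_image_urls` is a hypothetical helper, the extension list is an arbitrary choice, and the second sample URL is made up for illustration.

```python
from urllib.parse import urlparse


def filter_image_urls(urls, extensions=(".jpg", ".jpeg", ".png")):
    """Keep only the URLs whose path ends with one of `extensions`."""
    kept = []
    for url in urls:
        path = urlparse(url).path.lower()  # ignore case and query strings
        if path.endswith(tuple(extensions)):
            kept.append(url)
    return kept


sample = [
    # Taken from the output above.
    "https://upload.wikimedia.org/wikipedia/commons/4/4a/Commons-logo.svg",
    # Made-up URL, used only to show a match.
    "https://example.org/images/python-snake.JPG",
]

print(filter_image_urls(sample))
```

In practice the input would be the full `images` list of a page object instead of the hand-written `sample`.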

Retrieve HTML output (returned in Spanish here because the language was set to "es" above)

In [18]:
wikipedia.page("Python").html()

'<div class="mw-parser-output"><div class="rellink noprint hatnote">Este artículo trata sobre el lenguaje de programación. Para el grupo de humoristas, véase <a href="/wiki/Monty_Python" title="Monty Python">Monty Python</a>.</div><div class="rellink noprint hatnote"> Para el revólver, véase <a href="/wiki/Colt_Python" title="Colt Python">Colt Python</a>.</div>\n<div class="rellink noprint hatnote"> Para otros usos de este término, véase <a href="/wiki/Pit%C3%B3n" class="mw-disambig" title="Pitón">Pitón</a>.</div>\n<table class="infobox" style="width:22.7em; line-height: 1.4em; text-align:left; padding:.23em;"><tbody><tr><th colspan="3" class="cabecera informática" style="text-align:center;background-color:#eee;color:black;">Python</th></tr><tr><td colspan="3" style="text-align:center;">\n<a href="/wiki/Archivo:Python-logo-notext.svg" class="image"><img alt="Python-logo-notext.svg" src="//upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/100px-Python-logo-notext.
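
When the raw HTML is needed, Python's standard `html.parser` module is enough to pull out simple structures such as internal article links. The sketch below is one possible approach, not something the package provides: `WikiLinkCollector` and `extract_wiki_links` are hypothetical names, and the snippet is a shortened fragment modeled on the output above.

```python
from html.parser import HTMLParser


class WikiLinkCollector(HTMLParser):
    """Collect (title, href) pairs for links that point to /wiki/ pages."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        if href.startswith("/wiki/"):
            self.links.append((attributes.get("title"), href))


def extract_wiki_links(html):
    """Return the internal article links found in an HTML string."""
    collector = WikiLinkCollector()
    collector.feed(html)
    return collector.links


# Shortened fragment modeled on the page HTML shown above.
snippet = (
    '<div class="rellink noprint hatnote">'
    '<a href="/wiki/Monty_Python" title="Monty Python">Monty Python</a>.</div>'
    '<div class="rellink noprint hatnote">'
    '<a href="/wiki/Colt_Python" title="Colt Python">Colt Python</a>.</div>'
)

print(extract_wiki_links(snippet))
```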

To learn more about how to code the functionalities in Wikipedia 1.4.0, [visit this site](https://stackabuse.com/getting-started-with-pythons-wikipedia-api/).

## Main Use Case Scenarios

There is a wide variety of use cases for the Wikipedia package.

1. If you are trying to accomplish any sort of text-based analysis and you need training data (text), then Wikipedia would be a great package for easily accessing nearly unlimited amounts of it.
2. You could use Wikipedia to analyze company sentiment. For example, Coca-Cola (KO), a $234B publicly traded company, has a section on Wikipedia on criticism. Using text-based sentiment analysis, you can compare different companies and understand possible issues.
3. Wikipedia has lots of freely licensed images. If you need to programmatically gather images related to specific topics, you could use the Wikipedia package to obtain them.
4. Any sort of encyclopedia crawling.
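
As a small illustration of the first use case, text returned by the package can be fed straight into standard-library tools. The sketch below counts the most frequent longer words in a piece of text. `top_words` and its `min_length` cutoff are hypothetical choices, and the sample text is the two-sentence summary shown earlier. Real sentiment analysis would need a dedicated library and is not attempted here.

```python
import re
from collections import Counter


def top_words(text, n=3, min_length=5):
    """Return the `n` most common words with at least `min_length` letters."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if len(word) >= min_length)
    return counts.most_common(n)


# Two-sentence summary copied from the earlier output; no network call is made.
summary = (
    "Python is an interpreted high-level general-purpose programming language. "
    "Python's design philosophy emphasizes code readability with its notable "
    "use of significant indentation."
)

print(top_words(summary, n=1))
```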

## Helpful Links
Video Explanation: [Google Drive Link](https://drive.google.com/file/d/1aIEVLPMaikSp3gcJvszh4IQl32JqRURc/view?usp=sharing) or [Vimeo Link](https://vimeo.com/544842485)

[Python Package Index page for Wikipedia 1.4.0](https://pypi.org/project/wikipedia/)
