# Getting to Philosophy
Please write a Python script to check the "Getting to Philosophy" law.
<br>
https://en.wikipedia.org/wiki/Wikipedia:Getting_to_Philosophy
 
Clicking on the first link in the main body of a Wikipedia article, and repeating the process for each subsequent article, usually leads to the article Philosophy.
 
The program should take a Wikipedia link as input, follow the first normal link in the article body, and repeat this process until the Philosophy page is reached, an article without any outgoing wikilinks is reached, or the path gets stuck in a loop.
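The procedure itself does not depend on how pages are fetched. As a minimal offline sketch, assume a hypothetical dictionary `first_links` that maps each article title to the title of its first link, where a dead-end article simply has no entry. The walk then stops on the same three conditions, plus a step limit so it always terminates:

```python
def follow_chain(start, first_links, target="Philosophy", max_steps=30):
    """Follow first links from `start` and return (path, reason_for_stopping).

    `first_links` is a hypothetical dict mapping an article title to the title
    of the first link in its body; a missing key means "no outgoing links".
    """
    path = [start]
    seen = {start}  # a set makes the loop check O(1) per step
    while True:
        current = path[-1]
        if current == target:
            return path, "reached target"
        if len(path) > max_steps:
            return path, "too many steps"
        nxt = first_links.get(current)
        if nxt is None:
            return path, "no outgoing links"
        path.append(nxt)
        if nxt in seen:
            return path, "loop"
        seen.add(nxt)


# Made-up link graph for illustration; these are not real Wikipedia links.
graph = {"A": "B", "B": "Philosophy", "C": "D", "D": "C"}
print(follow_chain("A", graph))  # (['A', 'B', 'Philosophy'], 'reached target')
print(follow_chain("C", graph))  # (['C', 'D', 'C'], 'loop')
print(follow_chain("E", graph))  # (['E'], 'no outgoing links')
```

Keeping the graph walk separate from the HTTP and HTML code makes the stop conditions easy to test without touching the network.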


##### We will use two approaches in our solution: Scrapy's `Selector` and BeautifulSoup

## Import libraries

In [1]:
from scrapy import Selector
import bs4
import requests
import urllib.parse
import time

In [2]:
first_url = "https://en.wikipedia.org/wiki/Special:Random"
target_url = "https://en.wikipedia.org/wiki/Philosophy"

### Using Scrapy

In [3]:
def get_link(url):
    # Download the page and wrap its HTML in a Scrapy selector.
    html = requests.get(url).text
    sel = Selector(text=html)
    # Take the first href found anywhere inside the article content area.
    article_link = sel.xpath('//div[@id="mw-content-text"]//a/@href').extract_first()
    if not article_link:
        return None
    # Turn relative hrefs such as "/wiki/Jazz" into absolute URLs.
    return urllib.parse.urljoin('https://en.wikipedia.org/', article_link)
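The XPath above returns the first `href` anywhere under `mw-content-text`, which is why the run below wanders into `File:` pages and a bare `#file` fragment. One way to tighten it is to filter candidate hrefs before accepting one. The helper `is_article_href` below is hypothetical; it assumes article links look like `/wiki/Title` and uses a deliberately incomplete list of namespace prefixes (checking for any `:` would wrongly reject titles such as `Star_Wars:_Episode_IV`):

```python
from urllib.parse import urlsplit, unquote

# Assumption: an incomplete list of English Wikipedia namespace prefixes.
NON_ARTICLE_PREFIXES = (
    "File:", "Image:", "Help:", "Wikipedia:", "Special:", "Talk:",
    "Template:", "Category:", "Portal:", "User:", "Draft:", "Module:",
)


def is_article_href(href):
    """Return True if `href` looks like a link to another encyclopedia article."""
    if not href or href.startswith("#"):
        return False  # empty, or a fragment on the current page
    parts = urlsplit(href)
    if parts.netloc and parts.netloc != "en.wikipedia.org":
        return False  # external link
    if not parts.path.startswith("/wiki/"):
        return False  # e.g. /w/index.php?... URLs such as red links
    title = unquote(parts.path[len("/wiki/"):])
    return not title.startswith(NON_ARTICLE_PREFIXES)
```

In `get_link`, one could then collect candidates with `sel.xpath(...).getall()` and take the first href for which `is_article_href` returns True.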

### Conditions
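We stop following links when:

- the Philosophy page is reached,
- we are in an article without any outgoing wikilinks, or
- we are stuck in a loop.

The `check_` function handles the first and third conditions (plus a cap of `max_iterations` articles so that a run cannot go on forever), and the `while` loop handles the second, when `get_link` finds no link.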

In [4]:
def check_(visited, target_url, max_iterations=30):
    """Return True if the crawl should continue; print a reason and return False otherwise."""
    if visited[-1] == target_url:
        print("Destination reached!, We are going to Philosophy")
        return False
    elif len(visited) > max_iterations:
        print("Maximum number of iterations reached")
        return False
    elif visited[-1] in visited[:-1]:
        print("Stuck in a loop")
        return False
    else:
        return True
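`check_` compares URLs as exact strings, so a link such as `https://en.wikipedia.org/wiki/The_arts#Music` (seen in the second run below) would not match a later visit to `https://en.wikipedia.org/wiki/The_arts`. A hedged refinement is to compare normalized article titles instead. The helpers `article_key` and `in_loop` are hypothetical and assume URLs of the form `.../wiki/Title`:

```python
from urllib.parse import urldefrag, unquote


def article_key(url):
    """Reduce a Wikipedia URL to a comparable article title (hypothetical helper).

    Drops any #fragment, decodes percent-escapes and treats spaces as
    underscores, so ".../The_arts#Music" and ".../The_arts" compare equal.
    """
    without_fragment, _ = urldefrag(url)
    title = without_fragment.rsplit("/wiki/", 1)[-1]
    return unquote(title).replace(" ", "_")


def in_loop(visited):
    """Return True if the last visited article was already seen earlier."""
    keys = [article_key(u) for u in visited]
    return keys[-1] in keys[:-1]
```

Inside `check_`, the `visited[-1] in visited[:-1]` test could then be replaced by `in_loop(visited)`.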

In [5]:
visited = [first_url]
while check_(visited, target_url):
    print(visited[-1])
    next_url = get_link(visited[-1])
    if not next_url:
        print("We've arrived in an article without any outgoing Wikilinks")
        break
    visited.append(next_url)
    time.sleep(5)  # pause between requests to be polite to Wikipedia's servers

https://en.wikipedia.org/wiki/Special:Random
https://en.wikipedia.org/wiki/Franklin,_Indiana
https://en.wikipedia.org/wiki/Franklin,_Wayne_County,_Indiana
https://en.wikipedia.org/wiki/Unincorporated_area
https://en.wikipedia.org/wiki/File:Contra_Costa_Centre_sign.jpg
https://en.wikipedia.org/#file
https://en.wikipedia.org/wiki/Wikipedia
https://en.wikipedia.org/wiki/English_Wikipedia
https://en.wikipedia.org/wiki/File:Wikipedia-logo-v2-en.svg
Stuck in a loop


### Using BeautifulSoup

In [6]:
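def get_link(url):
    """Return the absolute URL of the first link that is a direct child of a
    top-level paragraph in the article body, or None if there is none."""
    response = requests.get(url)
    response.raise_for_status()
    html = bs4.BeautifulSoup(response.text, "html.parser")

    content_div = html.find(id="mw-content-text")
    content_article = content_div.find(class_="mw-parser-output") if content_div else None
    if content_article is None:
        return None

    # Only <p> elements directly under the article body, and only <a> tags directly
    # inside them; this skips tables such as infoboxes and footnote links in <sup>.
    for paragraph in content_article.find_all("p", recursive=False):
        link = paragraph.find("a", recursive=False)
        if link and link.get("href"):
            # Resolve relative hrefs such as "/wiki/Jazz" against the current page.
            return urllib.parse.urljoin(url, link["href"])
    return None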
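The Wikipedia page linked at the top describes following the first non-parenthesized, non-italicized link, ignoring external links, links to the current page and red links. The BeautifulSoup version above only looks at `<a>` tags that are direct children of top-level paragraphs, which already skips links wrapped in `<i>`, but it does not check for parentheses. The sketch below uses only the standard library's `html.parser` to apply the parenthesis and italics rules to one paragraph's HTML. `FirstLinkParser` and `first_link` are hypothetical names, and the sketch is not complete: for example, it counts parentheses inside any nested tag, not just in running text.

```python
from html.parser import HTMLParser


class FirstLinkParser(HTMLParser):
    """Find the first <a href> in an HTML fragment that is neither inside
    parentheses nor inside <i>/<em> (a hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.paren_depth = 0    # number of "(" currently open in the text
        self.italic_depth = 0   # nesting level of <i>/<em> tags
        self.in_link = False    # True while inside an <a> element
        self.first_href = None

    def handle_starttag(self, tag, attrs):
        if tag in ("i", "em"):
            self.italic_depth += 1
        elif tag == "a":
            self.in_link = True
            href = dict(attrs).get("href")
            if (self.first_href is None and href
                    and self.paren_depth == 0 and self.italic_depth == 0):
                self.first_href = href

    def handle_endtag(self, tag):
        if tag in ("i", "em") and self.italic_depth > 0:
            self.italic_depth -= 1
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Parentheses inside link text, e.g. "Jazz (album)", are not counted.
        if self.in_link:
            return
        for ch in data:
            if ch == "(":
                self.paren_depth += 1
            elif ch == ")" and self.paren_depth > 0:
                self.paren_depth -= 1


def first_link(paragraph_html):
    """Return the href of the first qualifying link, or None."""
    parser = FirstLinkParser()
    parser.feed(paragraph_html)
    parser.close()
    return parser.first_href
```

Within `get_link`, one could pass `str(paragraph)` for each top-level paragraph to `first_link` and resolve a non-None result with `urllib.parse.urljoin`.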

In [8]:
visited = [first_url]
while check_(visited, target_url):
    print(visited[-1])
    next_url = get_link(visited[-1])
    if not next_url:
        print("We've arrived in an article without any outgoing Wikilinks")
        break
    visited.append(next_url)
    time.sleep(5)  # pause between requests to be polite to Wikipedia's servers

https://en.wikipedia.org/wiki/Special:Random
https://en.wikipedia.org/wiki/Andrew_Hill_(jazz_musician)
https://en.wikipedia.org/wiki/Jazz
https://en.wikipedia.org/wiki/Music_genre
https://en.wikipedia.org/wiki/Music
https://en.wikipedia.org/wiki/The_arts#Music
https://en.wikipedia.org/wiki/Creativity
https://en.wikipedia.org/wiki/Idea
Destination reached!, We are going to Philosophy
