# Exploring spaCy


[spaCy getting started](https://spacy.io/)

In [1]:
import spacy

# Load the small English pipeline (tokenizer, tagger, parser, NER)
nlp = spacy.load('en_core_web_sm')

# Process whole documents
text = ("When Sebastian Thrun started working on self-driving cars at "
        "Google in 2007, few people outside of the company took him "
        "seriously. “I can tell you very senior CEOs of major American "
        "car companies would shake my hand and turn away because I wasn’t "
        "worth talking to,” said Thrun, now the co-founder and CEO of "
        "online higher education startup Udacity, in an interview with "
        "Recode earlier this week.")
doc = nlp(text)

# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)

# Estimate semantic similarity (a rough score: en_core_web_sm has no static word vectors)
doc1 = nlp("my fries were super gross")
doc2 = nlp("such disgusting fries")
similarity = doc1.similarity(doc2)
print(doc1.text, doc2.text, similarity)

Sebastian Thrun PERSON
Google ORG
2007 DATE
American NORP
Thrun PERSON
Recode ORG
earlier this week DATE
my fries were super gross such disgusting fries 0.7139701576579747
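
The similarity score printed above is, according to the spaCy documentation, a cosine similarity between the two documents' vectors by default. The sketch below shows that calculation on its own, using only the standard library. The three-dimensional vectors and their names are made up for illustration and are not what spaCy produces; returning 0.0 for a zero vector is a choice made here, not a statement about spaCy's behaviour.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption for this sketch: treat a zero vector as "no similarity"
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, invented for illustration only
vec_gross = [0.9, 0.1, 0.3]
vec_disgusting = [0.8, 0.2, 0.4]
vec_unrelated = [-0.1, 0.9, -0.5]

print(cosine_similarity(vec_gross, vec_disgusting))  # close to 1: similar direction
print(cosine_similarity(vec_gross, vec_unrelated))   # negative: pointing apart
```

A score near 1 means the vectors point in nearly the same direction, so a value around 0.71 for the two "fries" sentences suggests related but not identical content.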


# Dinosaur Text



In [3]:
default_answer = "People have always known of dinosaurs, though they have called them by many names. Old legends that place Western dragons in caves or beneath the earth may have originated with fossils. The plumed serpent, prominent in mythologies of Mexico and Latin America, is often a creator of life. The Rainbow Serpent of Aboriginal tales was present at the beginning of time, and helped prepare the landscapes for human beings and other animals. The Asian dragon, which combines features of many animals, symbolizes primordial energy and is the bringer of rain. These figures resemble our reconstructions of dinosaurs in appearance, and accounts place them in worlds that existed before humankind. The major reason for this similarity might be that human imagination works in much the same way as evolution. Both constantly recycle familiar forms such as wings, claws, crests, fangs, and scales, which may repeatedly vanish and then reappear through convergence. The figure of Tyrannosaurus rex suggests a kangaroo, while pterosaurs resemble bats, but the similarities are not due to common ancestry."

dino_doc = nlp(default_answer)

# Find named entities, phrases and concepts
for entity in dino_doc.ents:
    print(entity.text, entity.label_)

Mexico GPE
Latin America LOC
The Rainbow Serpent of Aboriginal PRODUCT
Asian NORP
Tyrannosaurus ORG
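
Several of these labels look wrong for this text (for example, `Tyrannosaurus` tagged as `ORG`). Grouping entities by label can make such errors easier to spot. The sketch below does this on the pairs copied from the output above; `dino_entities` and `group_by_label` are names introduced here for illustration. With a live `Doc`, the pairs could instead be built as `[(ent.text, ent.label_) for ent in dino_doc.ents]`.

```python
from collections import defaultdict

# (text, label) pairs copied from the output above
dino_entities = [
    ("Mexico", "GPE"),
    ("Latin America", "LOC"),
    ("The Rainbow Serpent of Aboriginal", "PRODUCT"),
    ("Asian", "NORP"),
    ("Tyrannosaurus", "ORG"),
]

def group_by_label(entities):
    """Map each entity label to the list of entity texts carrying it."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)

# Print one line per label, in alphabetical order of label
for label, texts in sorted(group_by_label(dino_entities).items()):
    print(label, texts)
```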


# Cabin Quest API

[Cabin Quest API](https://cabinquest.now.sh/)


In [7]:
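import requests
from bs4 import BeautifulSoup

# Nautilus RSS feed (nautil.us), fetched through the Cabin Quest API
nautilus_url = "https://cabinquest.now.sh/bellwoods/trees/getTreeByRSSUrl/:xmlUrl?xmlUrl=http:%2F%2Fnautil.us%2Frss%2Fall"
response = requests.get(nautilus_url)
response.raise_for_status()  # fail early on an HTTP error
tree = response.json()  # named `tree` so the json module name is not shadowed

def get_txt(tree):
    """Collect the title and first-paragraph text of every branch."""
    titles = []
    descriptions = []
    for branch in tree["branches"]:
        titles.append(branch["title"])

        # Name the parser explicitly so results do not depend on which parsers are installed
        soup = BeautifulSoup(branch["description"], "html.parser")
        paragraph = soup.find("p")
        # Fall back to the full text when a description has no <p> element
        source = paragraph if paragraph is not None else soup
        descriptions.append(source.get_text())
    return {"titles": titles, "descriptions": descriptions}

txt = get_txt(tree)

# Pick one article description to analyse
article_desc = txt["descriptions"][5]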

article_desc_doc = nlp(article_desc)

# Find named entities, phrases and concepts
for entity in article_desc_doc.ents:
    print(entity.text, entity.label_)

Being Rich and Successful Is in Your WORK_OF_ART
Guardian ORG
July 12 DATE
A New Genetic Test Could LAW
Determine Children PERSON
Newsweek ORG
July 10 DATE
“Our Fortunetelling Genes WORK_OF_ART
Wall Street Journal ORG
Nov. 16 DATE
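
The extraction step in the cell above can be tried offline, without the API or BeautifulSoup. The sketch below swaps in the standard library's `html.parser` to pull the text of the first `<p>` element. The `sample_tree` dictionary is invented sample data whose shape (`branches`, `title`, `description`) is assumed from the code above rather than taken from any API documentation, and `FirstParagraph` and `first_paragraph_text` are hypothetical helper names.

```python
from html.parser import HTMLParser

class FirstParagraph(HTMLParser):
    """Collect the text found inside the first <p> element only."""

    def __init__(self):
        super().__init__()
        self.inside = False  # True while parsing inside the first <p>
        self.done = False    # True once the first <p> has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.done:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "p" and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)

def first_paragraph_text(html):
    """Return the first paragraph's text, or '' if there is no <p>."""
    parser = FirstParagraph()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

# Invented sample data; the response shape is an assumption
sample_tree = {
    "branches": [
        {"title": "Example title",
         "description": "<p>First <em>paragraph</em>.</p><p>Second.</p>"},
        {"title": "No paragraph", "description": "Plain text only"},
    ]
}

descriptions = [first_paragraph_text(b["description"])
                for b in sample_tree["branches"]]
print(descriptions)
```

Unlike the cell above, this sketch returns an empty string when a description has no `<p>` element, which keeps the list indices aligned with the titles.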
