## Spark NLP - Explain Document (pretrained pipeline)

We start by importing Spark NLP and starting a Spark session.

In [None]:
import sparknlp 

spark = sparknlp.start()

print("Spark NLP version: ", sparknlp.version())
print("Apache Spark version: ", spark.version)

Now, we load a pipeline model that contains the following annotators by default:

- Tokenizer
- Deep Sentence Detector
- Lemmatizer
- Stemmer
- Part of Speech (POS)
- Spell Checker (NorvigSweetingModel)
- Word Embeddings (glove)
- NER-DL (named entity recognition with a deep learning model)


In [None]:
from sparknlp.pretrained import PretrainedPipeline
from sparknlp.base import *

pipeline = PretrainedPipeline('explain_document_dl')

We simply send the text we want to transform and the pipeline does the work.

In [None]:
text = 'John Smith would love to visit many beautful cities and take a pictre. He lives in Germany for the last 12 years.'
result = pipeline.annotate(text)

We can inspect the output of each annotator below. A single call to the pipeline performs all of these steps at once.

In [None]:
result.keys()

In [None]:
result['entities']
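To illustrate how entity chunks such as those in `result['entities']` relate to token-level tags like those in `result['ner']`, here is a minimal sketch that merges IOB-style tags into chunks. It is not the pipeline's own implementation; the `group_entities` helper and the sample tokens and tags are hypothetical and only stand in for real pipeline output.

```python
def group_entities(tokens, tags):
    """Merge consecutive B-/I- tagged tokens into (text, label) chunks."""
    chunks, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new entity starts: close the open chunk, if any.
            if current:
                chunks.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the entity that is currently open.
            current.append(token)
        else:
            # Any other tag (such as "O") closes the open chunk.
            if current:
                chunks.append((" ".join(current), label))
            current, label = [], None
    if current:
        chunks.append((" ".join(current), label))
    return chunks


# Illustrative values only, not actual output of the pipeline.
tokens = ["John", "Smith", "lives", "in", "Germany"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
print(group_entities(tokens, tags))
```

With real output, the same helper could be called as `group_entities(result['token'], result['ner'])`, assuming the tags follow the IOB scheme.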

In [None]:
result['sentence']

In [None]:
list(zip(result['token'],result['stem'],result['lemma'],result['pos'],result['checked'],result['ner']))
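Note that `zip` silently truncates to the shortest input, so the row-wise view above relies on every column having one value per token. The sketch below adds an explicit length check before zipping. The `aligned_rows` helper and the `sample` dictionary are hypothetical and serve only as a stand-in for the dictionary returned by `pipeline.annotate`.

```python
# Hypothetical stand-in for an annotate() result; values are illustrative.
sample = {
    "token": ["take", "a", "pictre"],
    "checked": ["take", "a", "picture"],
    "pos": ["VB", "DT", "NN"],
}


def aligned_rows(result, keys):
    """Return one tuple per token for the given keys, or raise if lengths differ."""
    lengths = {key: len(result[key]) for key in keys}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Columns differ in length: {lengths}")
    return list(zip(*(result[key] for key in keys)))


rows = aligned_rows(sample, ["token", "checked", "pos"])
print(rows)
```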

In [None]:
import pandas as pd

df = pd.DataFrame(list(zip(result['token'],result['stem'],result['lemma'],result['pos'],result['checked'],result['ner'])),
            columns = ['token','stem', 'lemma', 'pos', 'spell_checked', 'ner'])

df
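Since the input text contains deliberate misspellings, it can be useful to list only the tokens that the spell checker changed. The following is a small sketch under the assumption that `token` and `checked` are aligned one-to-one; the `corrections` helper and the `sample` values are hypothetical, not real pipeline output.

```python
def corrections(result):
    """Return (original, corrected) pairs for tokens the spell checker changed."""
    return [
        (token, checked)
        for token, checked in zip(result["token"], result["checked"])
        if token != checked
    ]


# Hypothetical stand-in for an annotate() result; values are illustrative.
sample = {
    "token": ["many", "beautful", "cities"],
    "checked": ["many", "beautiful", "cities"],
}
print(corrections(sample))
```

With the real output, `corrections(result)` would be the equivalent call.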

Let's print out the entire result.

In [None]:
import pprint 
pp = pprint.PrettyPrinter(indent=4)

pp.pprint(result)