# Spark NLP

In this notebook, we will walk through some of the basic functionality of Spark NLP, which can be used to perform more advanced text processing operations than is possible with `pyspark.ml` alone.

Note that this notebook is intended to be run in an AWS EMR Notebook (it can be launched by [following these instructions](https://nlp.johnsnowlabs.com/docs/en/install#emr-support), using EMR Release 6.2.0).

-----

First, let's load our packages:

In [1]:
from pyspark.ml import Pipeline, PipelineModel
from pyspark.sql.functions import *
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
import sparknlp
from sparknlp.pretrained import PretrainedPipeline
from sparknlp.annotator import *
from sparknlp.common import RegexRule
from sparknlp.base import *

Starting Spark application

ID,YARN Application ID,Kind,State,Spark UI,Driver log,Current session?
0,application_1636238344833_0001,pyspark,idle,Link,Link,✔

SparkSession available as 'spark'.

Next, let's create a sample DataFrame with a few text entries to work with (drawn from the first several paragraphs of the [University of Chicago's Wikipedia page](https://en.wikipedia.org/wiki/University_of_Chicago)). The data is purposefully small so that we can easily see every operation being performed. Because we're running this notebook on a Spark cluster, though, the same approach scales to much larger DataFrames, whether that is the Amazon Customer Reviews dataset that we've been working with or a text corpus as big as the Common Crawl.

In [2]:
sample = [
    ['The University of Chicago was incorporated as a coeducational institution in 1890 by the American Baptist Education Society, using $400,000 donated to the ABES to match a $600,000 donation from Baptist oil magnate and philanthropist John D. Rockefeller, and including land donated by Marshall Field. While the Rockefeller donation provided money for academic operations and long-term endowment, it was stipulated that such money could not be used for buildings. The Hyde Park campus was financed by donations from wealthy Chicagoans like Silas B. Cobb who provided the funds for the campus first building, Cobb Lecture Hall, and matched Marshall Fields pledge of $100,000. Other early benefactors included businessmen Charles L. Hutchinson (trustee, treasurer and donor of Hutchinson Commons), Martin A. Ryerson (president of the board of trustees and donor of the Ryerson Physical Laboratory) Adolphus Clay Bartlett and Leon Mandel, who funded the construction of the gymnasium and assembly hall, and George C. Walker of the Walker Museum, a relative of Cobb who encouraged his inaugural donation for facilities.'],
    ['The Hyde Park campus continued the legacy of the original university of the same name, which had closed in the 1880s after its campus was foreclosed on. What became known as the Old University of Chicago had been founded by a small group of Baptist educators in 1856 through a land endowment from Senator Stephen A. Douglas. After a fire, it closed in 1886. Alumni from the Old University of Chicago are recognized as alumni of the present University of Chicago. The university depiction on its coat of arms of a phoenix rising from the ashes is a reference to the fire, foreclosure, and demolition of the Old University of Chicago campus. As an homage to this pre-1890 legacy, a single stone from the rubble of the original Douglas Hall on 34th Place was brought to the current Hyde Park location and set into the wall of the Classics Building. These connections have led the dean of the college and University of Chicago and professor of history John Boyer to conclude that the University of Chicago has, a plausible genealogy as a pre–Civil War institution']
]

data = spark.createDataFrame(sample) \
            .toDF("text")

data.show()

+--------------------+
|                text|
+--------------------+
|The University of...|
|The Hyde Park cam...|
+--------------------+

You'll remember from `pyspark.ml` that pipelines are a useful approach for combining various estimators and transformers into a single workflow. Spark NLP extends this idea by introducing so-called "annotators" that can perform NLP-related estimation tasks (i.e. things that can be trained through `.fit()`) and transformation tasks (things that transform one DataFrame into another through `.transform()`).

For instance, below, we transform raw text into a document, transform that document into tokens, and then identify the "part of speech" for each token with a pre-trained POS tagger. We can chain these transformers and estimators together into a single reproducible pipeline that can then be fit and used to transform data. Note as well that we're using the `Pipeline` class from `pyspark.ml`, so it's also easy to plug these annotators into our existing ML workflows.

In [3]:
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

pos = PerceptronModel.pretrained("pos_anc", 'en')\
        .setInputCols("document", "token")\
        .setOutputCol("pos")

my_pipeline = Pipeline(
      stages = [
          documentAssembler,
          tokenizer,
          pos
      ])

pos_anc download started this may take some time.
Approximate size to download 3.9 MB
[OK!]

Once we transform our data, we can see that each step in the pipeline has produced its own column:

In [4]:
# not training anything, so pass empty dataframe to fit
pipelineModel = my_pipeline.fit(spark.createDataFrame([['']]).toDF("text"))

# transform data
result = pipelineModel.transform(data)

result.show()

+--------------------+--------------------+--------------------+--------------------+
|                text|            document|               token|                 pos|
+--------------------+--------------------+--------------------+--------------------+
|The University of...|[[document, 0, 11...|[[token, 0, 2, Th...|[[pos, 0, 2, DT, ...|
|The Hyde Park cam...|[[document, 0, 10...|[[token, 0, 2, Th...|[[pos, 0, 2, DT, ...|
+--------------------+--------------------+--------------------+--------------------+
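
As the comment in the cell above notes, nothing is trained here, which is why `fit()` can be called on an empty DataFrame. The toy sketch below illustrates that idea in plain Python (no Spark needed). The class names (`Lowercase`, `SplitOnSpace`, `ToyPipeline`, `ToyPipelineModel`) are hypothetical and this is not how `pyspark.ml` is implemented; it only mimics the fit-then-transform pattern under the assumption that no stage has anything to learn.

```python
# Toy illustration only: hypothetical classes, NOT the pyspark.ml implementation.

class Lowercase:
    """A stage with nothing to learn: fit() ignores the data and returns itself."""
    def fit(self, rows):
        return self

    def transform(self, rows):
        return [row.lower() for row in rows]


class SplitOnSpace:
    """Another stage without trainable state."""
    def fit(self, rows):
        return self

    def transform(self, rows):
        return [row.split(" ") for row in rows]


class ToyPipelineModel:
    """Holds already-fitted stages and applies them in order."""
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class ToyPipeline:
    """Fits each stage in order, feeding it the output of the earlier stages."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            model = stage.fit(rows)
            rows = model.transform(rows)
            fitted.append(model)
        return ToyPipelineModel(fitted)


# Placeholder data is enough for fit(), because no stage learns from it.
model = ToyPipeline([Lowercase(), SplitOnSpace()]).fit([""])
output = model.transform(["The University of Chicago"])
print(output)
```

The fitted model can then be reused on any number of inputs, just as `pipelineModel.transform(data)` is applied to our real DataFrame.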

If we take a closer look at the token-level data, we can see the parts of speech for each of the words in our DataFrame:

In [5]:
result.select(explode(arrays_zip('token.result',
                                 'token.begin',
                                 'token.end', 
                                 'pos.result', 
                                 )).alias("cols")) \
      .select(expr("cols['0']").alias("chunk"),
              expr("cols['1']").alias("begin"),
              expr("cols['2']").alias("end"),
              expr("cols['3']").alias("pos"),
             ) \
      .show(truncate=False)

+-------------+-----+---+---+
|chunk        |begin|end|pos|
+-------------+-----+---+---+
|The          |0    |2  |DT |
|University   |4    |13 |NNP|
|of           |15   |16 |IN |
|Chicago      |18   |24 |NNP|
|was          |26   |28 |VBD|
|incorporated |30   |41 |VBN|
|as           |43   |44 |IN |
|a            |46   |46 |DT |
|coeducational|48   |60 |JJ |
|institution  |62   |72 |NN |
|in           |74   |75 |IN |
|1890         |77   |80 |CD |
|by           |82   |83 |IN |
|the          |85   |87 |DT |
|American     |89   |96 |JJ |
|Baptist      |98   |104|NNP|
|Education    |106  |114|NNP|
|Society      |116  |122|NNP|
|,            |123  |123|,  |
|using        |125  |129|VBG|
+-------------+-----+---+---+
only showing top 20 rows
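
To see what `explode(arrays_zip(...))` is doing, it can help to mimic it on ordinary Python lists. The sketch below is a plain-Python analogue (no Spark needed) that uses the first six rows of the table above; the variable names are illustrative. It also checks an observation from the table: the `begin` and `end` offsets appear to be inclusive character positions in the original text.

```python
# Plain-Python analogue of explode(arrays_zip(...)); values copied from the table above.
text = "The University of Chicago was incorporated"

row = {
    "token": ["The", "University", "of", "Chicago", "was", "incorporated"],
    "begin": [0, 4, 15, 18, 26, 30],
    "end": [2, 13, 16, 24, 28, 41],
    "pos": ["DT", "NNP", "IN", "NNP", "VBD", "VBN"],
}

# arrays_zip combines the parallel arrays element-wise;
# explode then yields one output row per combined element.
exploded = list(zip(row["token"], row["begin"], row["end"], row["pos"]))

for token, begin, end, pos in exploded:
    # Offsets are inclusive at both ends, so the Python slice needs end + 1.
    assert text[begin:end + 1] == token
    print(f"{token:<13} {begin:>3} {end:>3} {pos}")
```

Because the end offset is inclusive, a one-character token such as `a` has identical `begin` and `end` values (46 and 46 in the table).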

Part-of-speech tagging is only [one of many available annotators](https://nlp.johnsnowlabs.com/docs/en/annotators) in the Spark NLP ecosystem, and you're encouraged to take a look through the documentation. Note, for instance, that there are many pre-trained annotators (using state-of-the-art training procedures) that can be used out of the box and inserted directly into your pipelines.

Spark NLP also provides many predefined pipelines that perform a common series of transformations on your data using pre-trained models (e.g. performing NER with various embedding models). Here, we'll load a pre-trained pipeline that produces NER labels (predicted by a series of pre-trained neural networks) for each of our words, to demonstrate how this works on our mini dataset.

In [6]:
pipeline = PretrainedPipeline('explain_document_dl', lang='en')
result = pipeline.transform(data)

explain_document_dl download started this may take some time.
Approx size to download 169.4 MB
[OK!]

And we can then take a look at the results; not bad for a single line of code!

In [7]:
result.select(explode(arrays_zip('lemma.result',
                                 'stem.result', 
                                 'ner.result'
                                 )).alias("cols")) \
      .select(expr("cols['0']").alias("lemma"),
              expr("cols['1']").alias("stem"),
              expr("cols['2']").alias("ner"),
             ) \
      .show(truncate=False)

+-------------+--------+-----+
|lemma        |stem    |ner  |
+-------------+--------+-----+
|The          |the     |O    |
|University   |univers |B-ORG|
|of           |of      |I-ORG|
|Chicago      |chicago |I-ORG|
|be           |wa      |O    |
|incorporate  |incorpor|O    |
|as           |a       |O    |
|a            |a       |O    |
|coeducational|coeduc  |O    |
|institution  |institut|O    |
|in           |in      |O    |
|1890         |1890    |O    |
|by           |by      |O    |
|the          |the     |O    |
|American     |american|B-ORG|
|Baptist      |baptist |I-ORG|
|Education    |educ    |I-ORG|
|Society      |societi |I-ORG|
|,            |,       |O    |
|use          |us      |O    |
+-------------+--------+-----+
only showing top 20 rows
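
The `ner` column follows a BIO-style scheme: `B-` marks the first token of an entity, `I-` marks a continuation, and `O` marks tokens outside any entity. A minimal plain-Python sketch of how such token-level tags can be merged into entity chunks is shown below. The helper `group_bio` is hypothetical (it is not part of Spark NLP), and the tokens and tags are the first 20 from the sample text and the table above.

```python
def group_bio(tokens, tags):
    """Merge token-level BIO tags into (entity_text, entity_type) chunks."""
    chunks = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and current_type == label:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
            continue
        # Any other tag closes the open entity, if there is one.
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
        if prefix in ("B", "I"):
            # Start a new entity (a stray I- tag is treated like a B- tag).
            current_tokens, current_type = [token], label
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_type))
    return chunks


tokens = ["The", "University", "of", "Chicago", "was", "incorporated", "as", "a",
          "coeducational", "institution", "in", "1890", "by", "the", "American",
          "Baptist", "Education", "Society", ",", "using"]
tags = ["O", "B-ORG", "I-ORG", "I-ORG", "O", "O", "O", "O", "O", "O", "O", "O",
        "O", "O", "B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "O"]

entities = group_bio(tokens, tags)
print(entities)
```

For these 20 tokens the sketch yields two `ORG` chunks, "University of Chicago" and "American Baptist Education Society".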

In [8]:
result.select(explode(arrays_zip('ner.result')).alias('ner')) \
      .groupBy('ner') \
      .count() \
      .show()

+--------+-----+
|     ner|count|
+--------+-----+
|[B-MISC]|    3|
| [I-ORG]|   24|
| [I-PER]|   12|
| [I-LOC]|    7|
| [B-PER]|   16|
|     [O]|  305|
| [B-ORG]|   14|
| [B-LOC]|    8|
+--------+-----+
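
These counts are per token, not per entity. Assuming every entity starts with exactly one `B-` tag (the usual BIO convention, not something verified against the model here), the number of entities of each type is the corresponding `B-` count, and the `I-` counts tell us how many additional tokens those entities span. A small plain-Python sketch using the counts from the table above:

```python
from collections import defaultdict

# Token-level tag counts copied from the table above (brackets dropped).
tag_counts = {
    "B-MISC": 3, "I-ORG": 24, "I-PER": 12, "I-LOC": 7,
    "B-PER": 16, "O": 305, "B-ORG": 14, "B-LOC": 8,
}

entities = defaultdict(int)       # number of entities per type
entity_tokens = defaultdict(int)  # number of tokens inside entities per type

for tag, count in tag_counts.items():
    prefix, _, label = tag.partition("-")
    if prefix == "B":
        entities[label] += count  # assumption: one B- tag per entity
    if prefix in ("B", "I"):
        entity_tokens[label] += count

total_tokens = sum(tag_counts.values())
total_entities = sum(entities.values())

for label in sorted(entities):
    avg_len = entity_tokens[label] / entities[label]
    print(f"{label}: {entities[label]} entities, {avg_len:.2f} tokens each on average")
print(f"{total_entities} entities across {total_tokens} tokens")
```

Under that assumption, the two paragraphs contain 41 entities among 389 tokens, with organization names being the longest on average.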

And finally, if we take a look at the annotated text with Spark NLP's Display feature, we can see that the results are pretty good (a few things are slightly off, but overall it's quite accurate).

In [None]:
sc.install_pypi_package('pandas==1.0.3')
sc.install_pypi_package('spark-nlp-display==1.7') # install to visualize NER in text

In [None]:
from sparknlp_display import NerVisualizer

# collect the (small) result to the driver once, rather than once per row
rows = result.collect()

# render the entities of each row as an HTML string
html_0 = NerVisualizer().display(
    result = rows[0],
    label_col = 'entities',
    document_col = 'document',
    return_html=True
)

html_1 = NerVisualizer().display(
    result = rows[1],
    label_col = 'entities',
    document_col = 'document',
    return_html=True
)

# print results to copy, then display below with HTML magic
print(html_0 + html_1)

In [12]:
%%HTML
<style>
    @import url('https://fonts.googleapis.com/css2?family=Montserrat:wght@300;400;500;600;700&display=swap');
    @import url('https://fonts.googleapis.com/css2?family=Vistol Regular:wght@300;400;500;600;700&display=swap');
    
    .spark-nlp-display-scroll-entities {
        border: 1px solid #E7EDF0;
        border-radius: 3px;
        text-align: justify;
        
    }
    .spark-nlp-display-scroll-entities span {  
        font-size: 14px;
        line-height: 24px;
        color: #536B76;
        font-family: 'Montserrat', sans-serif !important;
    }
    
    .spark-nlp-display-entity-wrapper{
    
        display: inline-grid;
        text-align: center;
        border-radius: 4px;
        margin: 0 2px 5px 2px;
        padding: 1px
    }
    .spark-nlp-display-entity-name{
        font-size: 14px;
        line-height: 24px;
        font-family: 'Montserrat', sans-serif !important;
        
        background: #f1f2f3;
        border-width: medium;
        text-align: center;
        
        font-weight: 400;
        
        border-radius: 5px;
        padding: 2px 5px;
        display: block;
        margin: 3px 2px;
    
    }
    .spark-nlp-display-entity-type{
        font-size: 14px;
        line-height: 24px;
        color: #ffffff;
        font-family: 'Montserrat', sans-serif !important;
        
        text-transform: uppercase;
        
        font-weight: 500;

        display: block;
        padding: 3px 5px;
    }
    
    .spark-nlp-display-entity-resolution{
        font-size: 14px;
        line-height: 24px;
        color: #ffffff;
        font-family: 'Vistol Regular', sans-serif !important;
        
        text-transform: uppercase;
        
        font-weight: 500;

        display: block;
        padding: 3px 5px;
    }
    
    .spark-nlp-display-others{
        font-size: 14px;
        line-height: 24px;
        font-family: 'Montserrat', sans-serif !important;
        
        font-weight: 400;
    }

</style>
 <span class="spark-nlp-display-others" style="background-color: white">The </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #A74C97"><span class="spark-nlp-display-entity-name">University of Chicago </span><span class="spark-nlp-display-entity-type">ORG</span></span><span class="spark-nlp-display-others" style="background-color: white"> was incorporated as a coeducational institution in 1890 by the </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #A74C97"><span class="spark-nlp-display-entity-name">American Baptist Education Society </span><span class="spark-nlp-display-entity-type">ORG</span></span><span class="spark-nlp-display-others" style="background-color: white">, using $400,000 donated to the </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #A74C97"><span class="spark-nlp-display-entity-name">ABES </span><span class="spark-nlp-display-entity-type">ORG</span></span><span class="spark-nlp-display-others" style="background-color: white"> to match a $600,000 donation from </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #23C492"><span class="spark-nlp-display-entity-name">Baptist </span><span class="spark-nlp-display-entity-type">LOC</span></span><span class="spark-nlp-display-others" style="background-color: white"> oil magnate and philanthropist </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #0A0A9E"><span class="spark-nlp-display-entity-name">John D </span><span class="spark-nlp-display-entity-type">PER</span></span><span class="spark-nlp-display-others" style="background-color: white">. 
</span><span class="spark-nlp-display-entity-wrapper" style="background-color: #23C492"><span class="spark-nlp-display-entity-name">Rockefeller </span><span class="spark-nlp-display-entity-type">LOC</span></span><span class="spark-nlp-display-others" style="background-color: white">, and including land donated by </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #0A0A9E"><span class="spark-nlp-display-entity-name">Marshall Field </span><span class="spark-nlp-display-entity-type">PER</span></span><span class="spark-nlp-display-others" style="background-color: white">. While the </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #A74C97"><span class="spark-nlp-display-entity-name">Rockefeller </span><span class="spark-nlp-display-entity-type">ORG</span></span><span class="spark-nlp-display-others" style="background-color: white"> donation provided money for academic operations and long-term endowment, it was stipulated that such money could not be used for buildings. The </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #23C492"><span class="spark-nlp-display-entity-name">Hyde Park </span><span class="spark-nlp-display-entity-type">LOC</span></span><span class="spark-nlp-display-others" style="background-color: white"> campus was financed by donations from wealthy </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #8B03B3"><span class="spark-nlp-display-entity-name">Chicagoans </span><span class="spark-nlp-display-entity-type">MISC</span></span><span class="spark-nlp-display-others" style="background-color: white"> like </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #0A0A9E"><span class="spark-nlp-display-entity-name">Silas B </span><span class="spark-nlp-display-entity-type">PER</span></span><span class="spark-nlp-display-others" style="background-color: white">. 
</span><span class="spark-nlp-display-entity-wrapper" style="background-color: #0A0A9E"><span class="spark-nlp-display-entity-name">Cobb </span><span class="spark-nlp-display-entity-type">PER</span></span><span class="spark-nlp-display-others" style="background-color: white"> who provided the funds for the campus first building, </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #23C492"><span class="spark-nlp-display-entity-name">Cobb Lecture Hall </span><span class="spark-nlp-display-entity-type">LOC</span></span><span class="spark-nlp-display-others" style="background-color: white">, and matched </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #0A0A9E"><span class="spark-nlp-display-entity-name">Marshall Fields </span><span class="spark-nlp-display-entity-type">PER</span></span><span class="spark-nlp-display-others" style="background-color: white"> pledge of $100,000. Other early benefactors included businessmen </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #0A0A9E"><span class="spark-nlp-display-entity-name">Charles L </span><span class="spark-nlp-display-entity-type">PER</span></span><span class="spark-nlp-display-others" style="background-color: white">. 
</span><span class="spark-nlp-display-entity-wrapper" style="background-color: #0A0A9E"><span class="spark-nlp-display-entity-name">Hutchinson </span><span class="spark-nlp-display-entity-type">PER</span></span><span class="spark-nlp-display-others" style="background-color: white"> (trustee, treasurer and donor of </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #A74C97"><span class="spark-nlp-display-entity-name">Hutchinson Commons </span><span class="spark-nlp-display-entity-type">ORG</span></span><span class="spark-nlp-display-others" style="background-color: white">), </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #0A0A9E"><span class="spark-nlp-display-entity-name">Martin A </span><span class="spark-nlp-display-entity-type">PER</span></span><span class="spark-nlp-display-others" style="background-color: white">. </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #A74C97"><span class="spark-nlp-display-entity-name">Ryerson </span><span class="spark-nlp-display-entity-type">ORG</span></span><span class="spark-nlp-display-others" style="background-color: white"> (president of the board of trustees and donor of the </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #A74C97"><span class="spark-nlp-display-entity-name">Ryerson Physical Laboratory </span><span class="spark-nlp-display-entity-type">ORG</span></span><span class="spark-nlp-display-others" style="background-color: white">) </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #0A0A9E"><span class="spark-nlp-display-entity-name">Adolphus Clay Bartlett </span><span class="spark-nlp-display-entity-type">PER</span></span><span class="spark-nlp-display-others" style="background-color: white"> and </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #0A0A9E"><span class="spark-nlp-display-entity-name">Leon Mandel </span><span 
class="spark-nlp-display-entity-type">PER</span></span><span class="spark-nlp-display-others" style="background-color: white">, who funded the construction of the gymnasium and assembly hall, and </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #0A0A9E"><span class="spark-nlp-display-entity-name">George C </span><span class="spark-nlp-display-entity-type">PER</span></span><span class="spark-nlp-display-others" style="background-color: white">. </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #0A0A9E"><span class="spark-nlp-display-entity-name">Walker </span><span class="spark-nlp-display-entity-type">PER</span></span><span class="spark-nlp-display-others" style="background-color: white"> of the </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #23C492"><span class="spark-nlp-display-entity-name">Walker Museum </span><span class="spark-nlp-display-entity-type">LOC</span></span><span class="spark-nlp-display-others" style="background-color: white">, a relative of </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #0A0A9E"><span class="spark-nlp-display-entity-name">Cobb </span><span class="spark-nlp-display-entity-type">PER</span></span><span class="spark-nlp-display-others" style="background-color: white"> who encouraged his inaugural donation for facilities.</span></div>
 <span class="spark-nlp-display-others" style="background-color: white">The </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #2F307B"><span class="spark-nlp-display-entity-name">Hyde Park </span><span class="spark-nlp-display-entity-type">LOC</span></span><span class="spark-nlp-display-others" style="background-color: white"> campus continued the legacy of the original university of the same name, which had closed in the 1880s after its campus was foreclosed on. What became known as the </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #43BA3F"><span class="spark-nlp-display-entity-name">Old University of Chicago </span><span class="spark-nlp-display-entity-type">ORG</span></span><span class="spark-nlp-display-others" style="background-color: white"> had been founded by a small group of </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #574517"><span class="spark-nlp-display-entity-name">Baptist </span><span class="spark-nlp-display-entity-type">MISC</span></span><span class="spark-nlp-display-others" style="background-color: white"> educators in 1856 through a land endowment from </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #4FC60B"><span class="spark-nlp-display-entity-name">Senator Stephen </span><span class="spark-nlp-display-entity-type">PER</span></span><span class="spark-nlp-display-others" style="background-color: white"> A. </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #4FC60B"><span class="spark-nlp-display-entity-name">Douglas </span><span class="spark-nlp-display-entity-type">PER</span></span><span class="spark-nlp-display-others" style="background-color: white">. After a fire, it closed in 1886. 
Alumni from the </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #43BA3F"><span class="spark-nlp-display-entity-name">Old University of Chicago </span><span class="spark-nlp-display-entity-type">ORG</span></span><span class="spark-nlp-display-others" style="background-color: white"> are recognized as alumni of the present </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #43BA3F"><span class="spark-nlp-display-entity-name">University of Chicago </span><span class="spark-nlp-display-entity-type">ORG</span></span><span class="spark-nlp-display-others" style="background-color: white">. The university depiction on its coat of arms of a phoenix rising from the ashes is a reference to the fire, foreclosure, and demolition of the </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #43BA3F"><span class="spark-nlp-display-entity-name">Old University of Chicago </span><span class="spark-nlp-display-entity-type">ORG</span></span><span class="spark-nlp-display-others" style="background-color: white"> campus. 
As an homage to this pre-1890 legacy, a single stone from the rubble of the original </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #2F307B"><span class="spark-nlp-display-entity-name">Douglas Hall </span><span class="spark-nlp-display-entity-type">LOC</span></span><span class="spark-nlp-display-others" style="background-color: white"> on 34th Place was brought to the current </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #2F307B"><span class="spark-nlp-display-entity-name">Hyde Park </span><span class="spark-nlp-display-entity-type">LOC</span></span><span class="spark-nlp-display-others" style="background-color: white"> location and set into the wall of the </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #43BA3F"><span class="spark-nlp-display-entity-name">Classics Building </span><span class="spark-nlp-display-entity-type">ORG</span></span><span class="spark-nlp-display-others" style="background-color: white">. 
These connections have led the dean of the college and </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #43BA3F"><span class="spark-nlp-display-entity-name">University of Chicago </span><span class="spark-nlp-display-entity-type">ORG</span></span><span class="spark-nlp-display-others" style="background-color: white"> and professor of history </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #4FC60B"><span class="spark-nlp-display-entity-name">John Boyer </span><span class="spark-nlp-display-entity-type">PER</span></span><span class="spark-nlp-display-others" style="background-color: white"> to conclude that the </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #43BA3F"><span class="spark-nlp-display-entity-name">University of Chicago </span><span class="spark-nlp-display-entity-type">ORG</span></span><span class="spark-nlp-display-others" style="background-color: white"> has, a plausible genealogy as a pre–Civil </span><span class="spark-nlp-display-entity-wrapper" style="background-color: #574517"><span class="spark-nlp-display-entity-name">War </span><span class="spark-nlp-display-entity-type">MISC</span></span><span class="spark-nlp-display-others" style="background-color: white"> institution</span></div>



---------------------


That's all we'll cover with regard to Spark NLP, but you're encouraged to play around with it further (perhaps [training your own NER model on GPUs](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/blogposts/3.NER_with_BERT.ipynb)!) and read the [excellent documentation](https://nlp.johnsnowlabs.com/docs/en/concepts) and [tutorials](https://nlp.johnsnowlabs.com/classify_documents) in more depth.