## Installing and Using spaCy
Make sure every package installed correctly and that the model versions match the installed spaCy version. Installation details are in the spaCy usage guide: [spaCy 101: Everything you need to know](https://spacy.io/usage). This step can be frustrating, so validate the setup with the code blocks below before moving on.

In [4]:
# activate the venv for the models (not available through conda)
!python -m spacy validate
!echo %CUDA_PATH%
import spacy

hello world

✔ Loaded compatibility table
ℹ spaCy installation:
C:\Users\elvis\anaconda3\envs\jupyter\lib\site-packages\spacy

NAME              SPACY            VERSION    
en_core_web_sm    >=3.1.0,<3.2.0   3.1.0     ✔
en_core_web_trf   >=3.1.0,<3.2.0   3.1.0     ✔

C:\Users\elvis\anaconda3\envs\jupyter\Library
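
The same check can be scripted from Python, which helps when a notebook should fail early rather than several cells later. The following is a minimal sketch using only the standard library. The helper names `installed_version` and `report` are illustrative assumptions, not part of spaCy, and the package names are taken from the `spacy validate` table above. Models installed as regular Python packages should be discoverable this way, but treat that as an assumption to verify in your own environment.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages):
    """Map each package name to its installed version (None when not installed)."""
    return {name: installed_version(name) for name in packages}


# Package names mirror the `spacy validate` table above; adjust to your setup.
for name, version in report(["spacy", "en_core_web_sm", "en_core_web_trf"]).items():
    print(f"{name:<18} {version or 'NOT INSTALLED'}")
```

A `NOT INSTALLED` line here usually means the notebook kernel is running in a different environment than the one where the models were installed.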


## Demo spaCy: NER and Visualization Basics
Here we check how well spaCy does what we need out of the box. First, let's see what the base NER model can do, then visualize the results with `displacy`.

In [2]:
texts = [
    "Net income was $9.4 million compared to the prior year of $2.7 million.",
    "Revenue exceeded twelve billion dollars, with a loss of $1b.",
]

nlp = spacy.load("en_core_web_sm")
for doc in nlp.pipe(texts, disable=["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer"]):
    # Do something with the doc here
    print([(ent.text, ent.label_) for ent in doc.ents])

[('$9.4 million', 'MONEY'), ('the prior year', 'DATE'), ('$2.7 million', 'MONEY')]
[('twelve billion dollars', 'MONEY'), ('1b', 'MONEY')]


In [3]:
'''
Example with Visualizations
'''
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_lg")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

# TODO: we should define nice colors to reuse for all our entity types
colors = {"ORG": "linear-gradient(90deg, #aa9cfc, #fc9ce7)"}
options = {"colors": colors}
doc.user_data["title"] = "Example Render of Entity Recognizer"
displacy.render(doc, style="ent", jupyter=True, options=options)
#displacy.render(doc, style="dep") # dependency parse

Apple 0 5 ORG
U.K. 27 31 GPE
$1 billion 44 54 MONEY
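
One way to address the TODO above is to keep a single label-to-color mapping and build the displaCy `options` dictionary from it. This is only a sketch: the `ENTITY_COLORS` palette and the helper `make_ent_options` are hypothetical names chosen for illustration, and the color values other than the `ORG` gradient from the cell above are arbitrary picks. The `colors` and `ents` keys are the options displaCy reads for the `ent` style.

```python
# Hypothetical shared palette; labels follow the entity types seen in the outputs above.
ENTITY_COLORS = {
    "ORG": "linear-gradient(90deg, #aa9cfc, #fc9ce7)",
    "GPE": "#feca74",
    "MONEY": "#e4e7d2",
    "DATE": "#bfe1d9",
}


def make_ent_options(labels=None, palette=ENTITY_COLORS):
    """Build an options dict for displacy.render(..., style="ent").

    labels: optional iterable of entity labels to show; None shows all of them.
    """
    options = {"colors": dict(palette)}
    if labels is not None:
        # "ents" restricts the rendering to the listed labels.
        options["ents"] = list(labels)
    return options


options = make_ent_options(["ORG", "MONEY"])
print(options["ents"])
```

The resulting dictionary can be passed as `options=options` to the `displacy.render` call in the cell above, so every visualization in the notebook uses the same colors.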


# Create Training Data
Now that some small tests have shown how spaCy behaves, let's compile the datasets into training data. This training data will then be used to build what we need to handle new data.

In [4]:
import spacy
test = "Add flour and stir to cook and absorb the oil, then add in 1 cup vegan creamer or plant milk and vegan chicken bouillon paste/powder/cube, then whisk/stir vigorously until the flour has cooked and thickened in the sauce (to make the roux). Add the grated vegan parm and stir to melt, then thin the sauce with either more plant milk or the reserved cooking pasta water."
nlp = spacy.load("en_core_web_lg")
doc = nlp(test)
for ent in doc.ents:
    print(ent.text, ent.label_)

1 CARDINAL
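
The base model finds only `1 CARDINAL` in the recipe text, which is why custom training data is needed. As a rough sketch of what that data could look like, the snippet below turns a list of known terms into character-offset annotations of the form `(text, {"entities": [(start, end, label)]})`. The `INGREDIENT` label, the term list, the shortened sample sentence, and the `annotate` helper are all assumptions made for illustration. The notebook does not state which labels or annotation method are actually used.

```python
import re


def annotate(text, terms, label):
    """Return (text, {"entities": [(start, end, label), ...]}) for whole-word matches.

    Longer terms are tried first so that "plant milk" wins over "milk";
    overlapping matches are dropped because NER spans must not overlap.
    """
    spans = []
    for term in sorted(terms, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            start, end = match.span()
            # Keep the span only if it does not overlap an already accepted one.
            if all(end <= s or start >= e for s, e, _ in spans):
                spans.append((start, end, label))
    return text, {"entities": sorted(spans)}


sample = "Add flour and stir, then add in 1 cup vegan creamer or plant milk."
# Hypothetical term list and label, for illustration only.
example = annotate(sample, ["flour", "plant milk", "milk", "vegan creamer"], "INGREDIENT")
for start, end, label in example[1]["entities"]:
    print(sample[start:end], start, end, label)
```

Dictionary matching like this only bootstraps annotations. The results would still need manual review before being used for training.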
