# Lunch Time Python

## 25.11.2022: spaCy
<img style="width: 600px;" src="https://upload.wikimedia.org/wikipedia/commons/8/88/SpaCy_logo.svg">

[spaCy](https://spacy.io/) is an open-source natural language processing library written in Python and Cython.

spaCy is designed for production use, with an emphasis on speed and efficiency. It also supports deep learning workflows by interfacing with [TensorFlow](https://www.tensorflow.org/) or [PyTorch](https://pytorch.org/), as well as with transformer models from [Hugging Face](https://github.com/huggingface).


[Lunch Time Python](https://ssciwr.github.io/lunch-time-python/), [Scientific Software Center](https://ssc.iwr.uni-heidelberg.de), [Heidelberg University](https://www.uni-heidelberg.de/)

# 0 What to do with spaCy

spaCy is very powerful for text annotation:
- sentencize and tokenize
- POS (part-of-speech) and lemma
- NER (named entity recognition)
- dependency parsing
- text classification
- morphological analysis

spaCy can also learn new tasks through integration with your machine learning stack. It also provides multi-task learning with pretrained transformers like [BERT](https://arxiv.org/abs/1810.04805).
(BERT is also used in the Google search engine.)
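The first item in the list above, sentence and token splitting, can be tried without downloading anything. The following is a minimal sketch, assuming spaCy v3 is installed: it uses a blank English pipeline (tokenizer only) plus the rule-based `sentencizer` component instead of a trained model. The example sentence is made up for illustration.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")
# Rule-based sentence splitter that ships with spaCy.
nlp.add_pipe("sentencizer")

doc = nlp("spaCy splits text into sentences. It also splits sentences into tokens.")

# Sentences are Span objects; tokens are Token objects.
sentences = [sent.text for sent in doc.sents]
tokens = [token.text for token in doc]

print(sentences)
print(tokens)
```

The other annotations in the list (POS tags, lemmas, entities, dependencies) come from trained components, so they need a downloaded model.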


In [1]:
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_md")
doc = nlp("The Scientific Software Center offers lunch-time Python - an informal way to learn about new Python libraries.")
displacy.render(doc, style="dep")

In [3]:
displacy.render(doc, style="ent")
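Outside a notebook, `displacy.render` returns the markup as a string, which can then be written to a file. The sketch below illustrates this with displaCy's manual mode, so it runs without a trained model. The entity span and its `ORG` label are written by hand as an assumption for illustration and are not model output.

```python
from spacy import displacy

text = "The Scientific Software Center offers lunch-time Python."
entity = "Scientific Software Center"
start = text.index(entity)

# Hand-written data in displaCy's "manual" format (illustrative, not predicted).
example = {
    "text": text,
    "ents": [{"start": start, "end": start + len(entity), "label": "ORG"}],
}

# jupyter=False forces a string return value; page=True wraps it in a full HTML page.
html = displacy.render(example, style="ent", manual=True, jupyter=False, page=True)
print(html[:80])
```

The resulting string could be saved with `pathlib.Path("entities.html").write_text(html)`, where the file name is only an example.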

# 1 Install spaCy
You can install spaCy using `pip`:

`pip install spacy`

It is also available via `conda-forge`:

`conda install -c conda-forge spacy`

After installing spaCy, you also need to download a trained language model. For the medium-sized English model, you would do this using

`python -m spacy download en_core_web_md`

The available models are listed on the spaCy website: https://spacy.io/usage/models
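Before loading a model, it can help to check that it is actually installed. The sketch below uses `spacy.util.is_package` for that check. The helper name `load_or_blank` and the fallback to a blank English pipeline are assumptions made for illustration; in real use you would more likely run the download command above.

```python
import spacy
from spacy.util import is_package


def load_or_blank(model_name: str, lang: str = "en"):
    """Hypothetical helper: load the model if installed, else a blank pipeline."""
    if is_package(model_name):
        return spacy.load(model_name)
    print(f"{model_name} is not installed, try: python -m spacy download {model_name}")
    # A blank pipeline only tokenizes; it has no tagger, parser or NER.
    return spacy.blank(lang)


nlp = load_or_blank("en_core_web_md")
print(nlp.lang, nlp.pipe_names)
```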

Let's try it out!

In [5]:
nlp = spacy.load("en_core_web_md")
nlp("This is lunch-time Python.")

This is lunch-time Python.

In [7]:
doc = nlp("This is lunch-time Python.")
print(type(doc))
[token for token in doc]

<class 'spacy.tokens.doc.Doc'>


[This, is, lunch, -, time, Python, .]
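A `Doc` behaves like a sequence of tokens, so it supports indexing, slicing and `len`. The following sketch shows a few lexical attributes that are available from the tokenizer alone. It assumes a blank English pipeline, so that it runs without the downloaded model.

```python
import spacy

# Tokenizer-only pipeline; lexical attributes work without trained components.
nlp = spacy.blank("en")
doc = nlp("This is lunch-time Python.")

for token in doc:
    # token.i is the position of the token in the Doc.
    print(token.i, repr(token.text), token.is_alpha, token.is_punct)

# Filter out punctuation tokens.
words = [token.text for token in doc if not token.is_punct]

# Slicing a Doc gives a Span that still knows the original whitespace.
span = doc[2:5]
print(words)
print(span.text)
```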

In [9]:
t = doc[0]
type(t)

spacy.tokens.token.Token

In [11]:
t.pos_

'PRON'

In [12]:
displacy.render(doc)

In [16]:
spacy.explain("nsubj")

'nominal subject'
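`spacy.explain` works for part-of-speech tags as well as dependency labels, and it does not need a loaded model. As a small sketch, the labels below are the ones that occur in the example sentence, collected into a lookup table:

```python
import spacy

# Labels taken from the annotations of "This is lunch-time Python."
labels = ["PRON", "AUX", "PROPN", "nsubj", "compound", "attr"]

# spacy.explain returns a short description, or None for unknown labels.
glossary = {label: spacy.explain(label) for label in labels}

for label, description in glossary.items():
    print(f"{label:10} {description}")
```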

In [18]:
for t in doc:
    print(t, t.pos_, t.dep_, t.lemma_)

This PRON nsubj this
is AUX ROOT be
lunch NOUN compound lunch
- PUNCT punct -
time NOUN compound time
Python PROPN attr Python
. PUNCT punct .
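The dependency labels describe relations between a token and its head, which can be navigated with `token.head` and `token.children`. The sketch below rebuilds the annotations printed above by hand, so that it runs without the trained model. The `heads` indices are an assumption chosen to match those labels; with the real model you would just use the `doc` returned by `nlp`.

```python
import spacy
from spacy.tokens import Doc

vocab = spacy.blank("en").vocab

# Annotations copied from the output above.
words = ["This", "is", "lunch", "-", "time", "Python", "."]
spaces = [True, True, False, False, True, False, False]
pos = ["PRON", "AUX", "NOUN", "PUNCT", "NOUN", "PROPN", "PUNCT"]
deps = ["nsubj", "ROOT", "compound", "punct", "compound", "attr", "punct"]
lemmas = ["this", "be", "lunch", "-", "time", "Python", "."]
# Assumed head index for each token (absolute positions, root points to itself).
heads = [1, 1, 4, 4, 5, 1, 1]

doc = Doc(vocab, words=words, spaces=spaces, pos=pos, deps=deps,
          lemmas=lemmas, heads=heads)

for token in doc:
    children = [child.text for child in token.children]
    print(token.text, token.dep_, "<-", token.head.text, children)

# The root of the sentence is the token labelled ROOT.
root = next(token for token in doc if token.dep_ == "ROOT")
print(root.text)
```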
