# Installing spaCy

spaCy is a library of tools for Natural Language Processing (NLP). For more information, see the [spaCy website](https://spacy.io/).

This notebook will help you install it, but you can also follow the [installation instructions](https://spacy.io/usage) to find the right setup for your system.

## Two ways of installing spaCy

### 1. Jupyter notebook
You can install spaCy from within the notebook by running the two install cells in the section "Installing spaCy and language model" below. You only have to do this once.

### 2. Command prompt
* On Windows, if you have Anaconda, open an Anaconda PowerShell Prompt. If you don't have Anaconda, open Windows PowerShell as administrator.
* On a Mac, open a terminal window (open Spotlight and type "terminal"), or look for "Terminal" in your apps folder.

Once you have a command window open, go to https://spacy.io/usage, choose your operating system and the other options that apply to you, and copy and paste the commands shown there, one at a time.

## Compatibility problems
If you get an error that mentions numpy, there are two things you can try, described below.

Possible error messages:
* numpy.ndarray ...
* numpy.dtype ...

### 1. Follow instructions on the spaCy site
Go to the heading "Using build constraints when compiling from source" in https://spacy.io/usage. In a command prompt/terminal, type the two lines (one at a time) that start with `PIP_CONSTRAINT`.

### 2. Downgrade numpy
Type one of the two commands below, either in a notebook or in terminal/command prompt:

* In notebook: `!pip install numpy==1.26.4`
* In command prompt: `pip install numpy==1.26.4`
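
Before or after downgrading, it can help to see which versions are actually installed. The cell below is a minimal sketch using only the Python standard library; `installed_version` is a hypothetical helper name chosen for this example, not part of spaCy or numpy.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the versions relevant to the numpy compatibility errors above
for name in ("numpy", "spacy"):
    print(name, installed_version(name) or "not installed")
```

If you downgraded numpy from inside a running notebook, you will probably need to restart the kernel before the new version is picked up.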

### Installing spaCy and language model

If you are running this notebook locally, you only have to run the next two cells once.

In [None]:
!pip install spacy

In [None]:
!python -m spacy download en_core_web_sm
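
If you are not sure whether the two install cells have already been run, a quick check like the one below may save you from reinstalling. It is a sketch that relies only on the standard library and assumes the model was installed as a regular Python package named `en_core_web_sm` (which is what the download command above does); `module_available` is a hypothetical helper name.

```python
from importlib.util import find_spec


def module_available(name):
    """Return True if a top-level module or package with this name can be imported."""
    return find_spec(name) is not None


# Check both the library and the English language model
for name in ("spacy", "en_core_web_sm"):
    status = "installed" if module_available(name) else "missing"
    print(f"{name}: {status}")
```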

### Loading spaCy and language model
Installation (if local) only needs to be done once. However, you need to import the spaCy module and load the language model every time you want to use it. 

Here, we are loading the small model for English derived from web data. There are other [models](https://spacy.io/usage/models) for English and for other languages. 

In [None]:
import spacy

In [None]:
nlp = spacy.load("en_core_web_sm")

### Testing installation

We'll define a sentence, process it with spaCy and check the output. This will test whether all the components are installed.

In [None]:
sentence = "This is a test sentence about Canada, but you can type whatever you want here."

### Converting string to doc with spaCy
spaCy has a special type of object, a `Doc`. It is a container that holds a text together with the results of the entire processing pipeline, in a single object. The pipeline itself is the `nlp` object we loaded above: calling `nlp()` on a text, e.g., `sentence`, applies all the NLP steps to it (tokenization, tagging, parsing, named entity recognition) and returns a `Doc`. Once you have converted a string (a sentence or a whole text) to a `Doc`, you can access everything that spaCy has done with it, i.e., the entire structure of language information that it has applied to it, with labels. spaCy refers to that language information and those labels as 'linguistic annotations'.

![spaCy pipeline](https://spacy.io/images/pipeline.svg)

Image from https://spacy.io/usage/processing-pipelines

In [None]:
doc = nlp(sentence)

### Accessing the information in the Doc object

`doc` contains lots of [useful information](https://spacy.io/api/doc):

* tokens (words)
* lemmas
* morphology
* part-of-speech tags (POS tags)
* syntactic structure (a parse tree)
* named entities

In [None]:
# print word tokens

for token in doc:
    print(token)

In [None]:
# lemmas

for token in doc:
    print(token.lemma_)

In [None]:
# morphology

for token in doc:
    print(token.text, token.morph)

In [None]:
# POS tags (more on this below)

for token in doc:
    print(token.text, token.pos_)

In [None]:
# named entities

for ent in doc.ents:
    print(ent.text, ent.label_)
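
The list above also mentions syntactic structure, which the cells so far do not show. As a rough sketch, each token in a parsed `Doc` carries a dependency label (`token.dep_`) and a pointer to its syntactic head (`token.head`). The helper below, `dependency_rows`, is a hypothetical name introduced for this example; it assumes the pipeline includes a parser, as `en_core_web_sm` does.

```python
def dependency_rows(doc):
    """Collect (token text, dependency label, head text) for every token in a Doc."""
    return [(token.text, token.dep_, token.head.text) for token in doc]


# With the doc created earlier in this notebook:
# for text, dep, head in dependency_rows(doc):
#     print(text, dep, head)
```

The root of the sentence is labelled `ROOT` and is its own head, so it shows up with the same word in the first and last columns.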