# spaCy

An all-in-one Python library for basic and advanced natural language processing, with ready-to-use pretrained pipelines for a number of languages. See [spaCy Language Support](https://spacy.io/usage/models#languages) for details.

## Features

* **Tokenization**: Segmenting text into individual "tokens", that is, words, punctuation marks, numbers, etc.

* **Part-of-speech (POS) Tagging**: Assigning grammatical word types to tokens, like "verb" or "noun" (using [Universal POS Tags](https://universaldependencies.org/u/pos/) with `.pos_` and [Penn Part of Speech Tags](https://cs.nyu.edu/~grishman/jet/guide/PennPOS.html) with `.tag_`).

* **Dependency Parsing**: Assigning syntactic dependency labels, describing the relations between individual tokens, as in subject, object, dependent clause, etc.

* **Lemmatization**: Determining the base form, or *lemma*, of a word. The lemma of "went" is "go", and the lemma of "trees" is "tree".

* **Named Entity Recognition (NER)**: Finding named "real-world" objects, such as persons, companies, or locations, and labeling each with an entity type.
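Each of these features surfaces as an attribute on spaCy's `Token` or `Doc` objects. As a minimal sketch, assuming spaCy v3 is installed, the snippet below builds a `Doc` directly with the `Doc` constructor instead of running a trained model, so every annotation for the made-up sentence "Ada visited London." is written in by hand rather than predicted:

```python
import spacy
from spacy.tokens import Doc

# Blank English pipeline: only its vocabulary is used here, no trained components
nlp = spacy.blank("en")

# All annotations below are hand-written for illustration (assumption);
# a trained pipeline such as en_core_web_sm would predict them instead.
doc = Doc(
    nlp.vocab,
    words=["Ada", "visited", "London", "."],  # tokenization
    spaces=[True, True, False, False],        # whitespace after each token
    pos=["PROPN", "VERB", "PROPN", "PUNCT"],  # coarse Universal POS tags
    tags=["NNP", "VBD", "NNP", "."],          # fine-grained Penn Treebank tags
    lemmas=["Ada", "visit", "London", "."],   # lemmatization
    heads=[1, 1, 1, 1],                       # absolute index of each token's head
    deps=["nsubj", "ROOT", "dobj", "punct"],  # dependency labels
    ents=["B-PERSON", "O", "B-GPE", "O"],     # named entities in IOB format
)

for token in doc:
    print(token.text, token.pos_, token.tag_, token.lemma_, token.dep_, token.head.text)

# Entities are exposed as Spans on the Doc
print([(ent.text, ent.label_) for ent in doc.ents])
```

With a trained pipeline loaded, calling `nlp(...)` on raw text fills in the same attributes automatically.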

## Installing Pre-Trained Models

```
%run -m spacy download en_core_web_sm
```

## Linguistic Basics

```python
import spacy
nlp = spacy.load("en_core_web_sm")
```

```python
doc = nlp("Wow! Oh no, I forgot to buy ten oranges and seven apples for the party tomorrow, but I promise I'll get them soon.")
# pos_: coarse Universal POS tag; tag_: fine-grained Penn Treebank tag
for token in doc:
    print(token.text, token.pos_, token.tag_)
```

```
Wow INTJ UH
! PUNCT .
Oh INTJ UH
no INTJ UH
, PUNCT ,
I PRON PRP
forgot VERB VBD
to PART TO
buy VERB VB
ten NUM CD
oranges NOUN NNS
and CCONJ CC
seven NUM CD
apples NOUN NNS
for ADP IN
the DET DT
party NOUN NN
tomorrow NOUN NN
, PUNCT ,
but CCONJ CC
I PRON PRP
promise VERB VBP
I PRON PRP
'll VERB MD
get VERB VB
them PRON PRP
soon ADV RB
. PUNCT .
```
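The tag abbreviations in this output can be looked up with `spacy.explain`, which returns a short description from spaCy's built-in glossary (or `None` for an unknown label) and needs no trained model. A minimal sketch using a few labels from the output above:

```python
import spacy

# A few coarse (pos_) and fine-grained (tag_) labels taken from the output above
labels = ["INTJ", "PART", "CCONJ", "UH", "VBD", "NNS", "MD"]
for label in labels:
    print(label, "->", spacy.explain(label))
```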


The lemmas below were produced with spaCy v2, which lemmatized every pronoun to the placeholder `-PRON-`; spaCy v3 dropped this placeholder.

```python
for token in doc:
    print(token.text, "-->", token.lemma_)
```

```
Wow --> wow
! --> !
Oh --> oh
no --> no
, --> ,
I --> -PRON-
forgot --> forget
to --> to
buy --> buy
ten --> ten
oranges --> orange
and --> and
seven --> seven
apples --> apple
for --> for
the --> the
party --> party
tomorrow --> tomorrow
, --> ,
but --> but
I --> -PRON-
promise --> promise
I --> -PRON-
'll --> will
get --> get
them --> -PRON-
soon --> soon
. --> .
```
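Lemmas are often used to normalize text before counting words, for example by keeping `token.lemma_` for every non-punctuation token. The sketch below assumes spaCy v3 and, to avoid depending on a downloaded model, sets the lemmas by hand through the `Doc` constructor; with `en_core_web_sm` loaded, the same list comprehension works on `doc` from the cells above.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hand-written lemmas (assumption); a trained pipeline would predict them
doc = Doc(
    nlp.vocab,
    words=["I", "forgot", "the", "oranges", "!"],
    lemmas=["I", "forget", "the", "orange", "!"],
)

# Keep the lemma of every token that is not punctuation
lemmas = [token.lemma_ for token in doc if not token.is_punct]
print(lemmas)  # ['I', 'forget', 'the', 'orange']
```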


```python
# nlp() returns a Doc; slicing a Doc (text[:150]) selects tokens, not characters
with open("data/Victorian/Conrad_HeartofDarkness.txt", encoding="utf-8") as f:
    text = nlp(f.read())
print(text[:150])
```

```
The Nellie, a cruising yawl, swung to her anchor without a flutter of the sails, and was at rest. The flood had made, the wind was nearly calm, and being bound down the river, the only thing for it was to come to and wait for the turn of the tide.
The sea-reach of the Thames stretched before us like the beginning of an interminable waterway. In the offing the sea and the sky were welded together without a joint, and in the luminous space the tanned sails of the barges drifting up with the tide seemed to stand still in red clusters of canvas sharply peaked, with gleams of varnished sprits. A haze rested on the low shores that ran out to sea in vanishing flatness. The air was dark above Gravesend,
```
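Because `text` here is a `Doc`, `text[:150]` returns a `Span` of the first 150 tokens rather than the first 150 characters. The difference can be checked on a short string with a blank English pipeline, which only tokenizes (a sketch assuming spaCy is installed; the sample sentence is shortened from the novel's opening):

```python
import spacy

nlp = spacy.blank("en")  # tokenizer only; no trained components needed
doc = nlp("The Nellie, a cruising yawl, swung to her anchor.")

first_tokens = doc[:4]      # Span covering the first 4 tokens
first_chars = doc.text[:4]  # str holding the first 4 characters
print(repr(first_tokens.text))  # 'The Nellie, a'
print(repr(first_chars))        # 'The '
```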


```python
# Doc.sents yields one Span per sentence; next() takes just the first
print(next(text.sents))
```

```
The Nellie, a cruising yawl, swung to her anchor without a flutter of the sails, and was at rest.
```
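`Doc.sents` relies on a component that sets sentence boundaries; in `en_core_web_sm` that is the dependency parser. If only sentence splitting is needed, spaCy v3 also provides a rule-based `sentencizer` component that can be added to a blank pipeline. This swaps in a different, punctuation-based technique, so its boundaries may not match the parser's on a literary text. A sketch, assuming spaCy v3 and a made-up example string:

```python
import spacy

# Blank English pipeline plus the rule-based sentencizer (no trained model)
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("Wow! Oh no, I forgot the oranges. I'll get them soon.")
first = next(doc.sents)  # each access to doc.sents creates a new generator
print(first.text)
print([sent.text for sent in doc.sents])
```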


```python
from itertools import islice

# Print the first ten sentences, numbered from 1
for count, sent in enumerate(islice(text.sents, 10), start=1):
    print(count, sent.text.strip())
```

```
1 The Nellie, a cruising yawl, swung to her anchor without a flutter of the sails, and was at rest.
2 The flood had made, the wind was nearly calm, and being bound down the river, the only thing for it was to come to and wait for the turn of the tide.
3 The sea-reach of the Thames stretched before us like the beginning of an interminable waterway.
4 In the offing the sea and the sky were welded together without a joint, and in the luminous space the tanned sails of the barges drifting up with the tide seemed to stand still in red clusters of canvas sharply peaked, with gleams of varnished sprits.
5 A haze rested on the low shores that ran out to sea in vanishing flatness.
6 The air was dark above Gravesend, and farther back still seemed condensed into a mournful gloom, brooding motionless over the biggest, and the greatest, town on earth.
7 The Director of Companies was our captain and our host.
8 We four affectionately watched his back as he stood in the bows looking to seaward.
9 On
```

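```python
# Materialize the sentence generator so sentences can be counted and indexed
sentence_list = list(text.sents)
print(len(sentence_list))
```

```
2700
```

```python
print(sentence_list[1350])
```

```
How long would it last?
```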
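Converting `text.sents` to a list is what allows random access such as `sentence_list[1350]`. For summary statistics, iterating over the sentences is enough. The sketch below assumes spaCy v3, uses the rule-based `sentencizer` as a stand-in for the trained parser, and runs on a short made-up text; it measures each sentence's length in tokens:

```python
import statistics

import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based splitter standing in for the parser
doc = nlp("How long would it last? Nobody knew. The tide turned slowly and the sails hung still.")

# Tokens per sentence, punctuation included
lengths = [len(sent) for sent in doc.sents]
print(len(lengths), lengths, round(statistics.mean(lengths), 2))
```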