# Linguistic Features

## PoS (Part-of-Speech) Tagging

* *Part of speech*: a category of words (or lexical items) that share similar grammatical properties.
* *Lexical item*: a word, part of a word, or group of words that forms a basic element of a language's lexicon (vocabulary). Examples of lexical items: *by the way*, *it's raining cats and dogs*, *-able*, *-er*, *good*.
* *PoS tagging* (also written POS tagging, or POST): the process of assigning a part-of-speech label to each token (word) in a text corpus, based on both the word's definition and its context.

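Some parts of speech, with their Universal Dependencies tags (the coarse tags spaCy reports as `pos_`):
1. noun (NOUN)
2. verb (VERB)
3. determiner (DET)
4. adjective (ADJ)
5. adposition (ADP): prepositions (*on*, *of*, *at*, *with*, ...) and postpositions (*ago*, as in *two years ago*)
6. pronoun (PRON)
7. adverb (ADV)
8. conjunction (CONJ; split into CCONJ and SCONJ in Universal Dependencies v2)
9. interjection (INTJ)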
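The definition above says a tag depends on both the word and its context. The toy sketch below makes that concrete with a hypothetical mini-lexicon and a single hand-written context rule; it is an illustration only, not how spaCy works (spaCy's tagger is a trained statistical model). The ambiguous word *book* gets VERB after a pronoun and NOUN after a determiner or adjective.

```python
# Toy sketch (hypothetical lexicon and rule, NOT spaCy's method):
# look each word up, and use the previous tag to resolve ambiguity.

# Hypothetical mini-lexicon: each word maps to the set of tags it can take.
LEXICON = {
    "i": {"PRON"},
    "the": {"DET"},
    "a": {"DET"},
    "good": {"ADJ"},
    "on": {"ADP"},
    "wow": {"INTJ"},
    "read": {"VERB"},
    "flight": {"NOUN"},
    "book": {"NOUN", "VERB"},  # the only ambiguous entry: "a book" vs. "I book"
}


def tag(tokens):
    """Assign one PoS tag per token using the lexicon plus one context rule."""
    tags = []
    for word in tokens:
        # Unknown words fall back to NOUN (a simple, common heuristic).
        options = LEXICON.get(word.lower(), {"NOUN"})
        if len(options) == 1:
            tags.append(next(iter(options)))
        elif tags and tags[-1] in {"DET", "ADJ"}:
            # Context: after a determiner or adjective, prefer the noun reading.
            tags.append("NOUN")
        else:
            # Otherwise (e.g. right after a pronoun subject), prefer the verb reading.
            tags.append("VERB")
    return tags


for sentence in ["I book a flight", "I read the good book"]:
    words = sentence.split()
    print(list(zip(words, tag(words))))
```

A real tagger replaces the single `if`/`elif` rule with weights learned from an annotated corpus, but the input (the word plus its neighbours) and the output (one tag per token) are the same.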


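```python
import spacy
from spacy import displacy

# Requires the model: python -m spacy download en_core_web_lg
nlp = spacy.load("en_core_web_lg")
doc = nlp("Wow, Apple is looking at buying U.K. startup for $1 billion")

# Columns: text, lemma, coarse UD PoS, fine-grained tag, dependency label,
# word shape, is alphabetic?, is stop word?
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
          token.shape_, token.is_alpha, token.is_stop)

# Draw the dependency parse inline in a Jupyter notebook
displacy.render(doc, style="dep", jupyter=True)
```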

```
Wow wow INTJ UH intj Xxx True False
, , PUNCT , punct , False False
Apple Apple PROPN NNP nsubj Xxxxx True False
is be AUX VBZ aux xx True True
looking look VERB VBG ROOT xxxx True False
at at ADP IN prep xx True True
buying buy VERB VBG pcomp xxxx True False
U.K. U.K. PROPN NNP dobj X.X. False False
startup startup VERB VB advcl xxxx True False
for for ADP IN prep xxx True True
$ $ SYM $ quantmod $ False False
1 1 NUM CD compound d False False
billion billion NUM CD pobj xxxx True False
```
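The `shape_` column above encodes each token's orthographic pattern. The pure-Python sketch below approximates that rule as it appears in the output (uppercase letter to `X`, lowercase to `x`, digit to `d`, other characters kept, and a run of the same shape character cut off after four). The function `word_shape` here is a hypothetical re-implementation for illustration, not a call into spaCy, and may differ from the library on edge cases.

```python
def word_shape(text: str) -> str:
    """Approximate a spaCy-style word shape (hypothetical re-implementation)."""
    shape = []
    last = ""  # previous shape character
    run = 0    # how many times `last` has repeated so far
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char  # punctuation and symbols are kept as-is
        if shape_char == last:
            run += 1
        else:
            run = 0
            last = shape_char
        if run < 4:  # keep at most four identical shape characters in a row
            shape.append(shape_char)
    return "".join(shape)


# str.isalpha() gives the same True/False values as the is_alpha column here.
for word in ["Wow", "Apple", "looking", "U.K.", "$", "1", "billion"]:
    print(word, word_shape(word), word.isalpha())
```

Truncating long runs is what makes `looking` and `billion` both map to `xxxx`: the shape keeps the pattern (all lowercase) while discarding the exact length, which is useful as a feature for words the model has never seen.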


## Morphology

Morphology analyzes the internal structure of words and how changes to that structure affect meaning and word class. In English, for example, adding the suffix *-ing* to the verb *read* gives *reading*, a present participle (also used as a gerund). In *inflectional* morphology, the base form of a word is modified by prefixes or suffixes that mark its grammatical function (such as tense, aspect, number, or person) without changing its part of speech.
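To illustrate inflection, here is a toy sketch of English suffix rules for *regular* verbs. The helper `inflect_regular_verb` and its output keys are hypothetical, and the rules are deliberately simplified: irregular verbs (*read* → *read*, *be* → *was*), consonant doubling (*stop* → *stopping*) and *y* → *i* changes (*try* → *tries*) are not handled. Every form it produces is still a verb; only the grammatical function changes.

```python
def inflect_regular_verb(base: str) -> dict:
    """Return a few inflected forms of a REGULAR English verb (toy rules only)."""
    # Drop a silent final -e before -ing ("bake" -> "baking"), but not -ee ("free").
    stem = base[:-1] if base.endswith("e") and not base.endswith("ee") else base
    # Add -es after sibilant endings ("watch" -> "watches"), otherwise -s.
    third = base + "es" if base.endswith(("s", "sh", "ch", "x", "z")) else base + "s"
    # Add only -d when the base already ends in -e ("bake" -> "baked").
    past = base + "d" if base.endswith("e") else base + "ed"
    return {
        "base": base,
        "3sg_present": third,            # person/number agreement
        "past": past,                    # tense
        "present_participle": stem + "ing",  # aspect (progressive)
    }


for verb in ("walk", "bake", "watch"):
    print(inflect_regular_verb(verb))
```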

```python
import spacy

nlp = spacy.load("en_core_web_lg")
print("\nPipeline:", nlp.pipe_names)
doc = nlp("I was reading the paper.")

print("")
print("==== every token === ")
for token in doc:
    # token.morph holds the token's morphological features
    print("Available morphological features:", token.morph)
    # print(token.morph.get("PronType"))  # values of a single feature, as a list

print("")
print("==== token at index 2 ('reading') === ")
token = doc[2]  # zero-based: doc[0] is "I", doc[1] is "was", doc[2] is "reading"
print(token.morph)
```

```

Pipeline: ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

==== every token === 
Available morphological features: Case=Nom|Number=Sing|Person=1|PronType=Prs
Available morphological features: Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin
Available morphological features: Aspect=Prog|Tense=Pres|VerbForm=Part
Available morphological features: Definite=Def|PronType=Art
Available morphological features: Number=Sing
Available morphological features: PunctType=Peri

==== token at index 2 ('reading') === 
Aspect=Prog|Tense=Pres|VerbForm=Part
```
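The printed features follow the Universal Dependencies FEATS convention: `Name=Value` pairs separated by `|`, with multiple values for one feature separated by commas (e.g. `PronType=Int,Rel`). spaCy already exposes individual features through `token.morph.get(...)`, as in the commented-out line above; the hypothetical helper `parse_feats` below only shows how such a string is structured, working on plain strings copied from the output.

```python
def parse_feats(feats: str) -> dict:
    """Parse a UD-style FEATS string into {feature name: [values]} (hypothetical helper)."""
    result = {}
    if not feats:  # a token with no features prints as an empty string
        return result
    for pair in feats.split("|"):
        name, _, values = pair.partition("=")
        result[name] = values.split(",")  # a feature may carry several values
    return result


print(parse_feats("Aspect=Prog|Tense=Pres|VerbForm=Part"))
print(parse_feats("Case=Nom|Number=Sing|Person=1|PronType=Prs")["Person"])
```

Returning lists for every feature keeps multi-valued features such as `PronType=Int,Rel` and single-valued ones in the same shape, so calling code never has to check which case it got.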
