https://spacy.io/usage/spacy-101

In [None]:
!pip install spacy

### 1. Basic Structure of spaCy

In [1]:
import spacy

#### Create a blank English nlp object

* The nlp object contains the processing pipeline.
* It includes language-specific rules for tokenization etc.

https://spacy.io/usage/models#languages

In [2]:
nlp = spacy.blank("en")
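A quick way to see what "blank" means is to inspect the object. This is a minimal sketch that only assumes spaCy is installed: the pipeline knows its language, but it has no trained components yet.

```python
import spacy

# A blank pipeline: tokenizer and language data only, no trained components.
nlp = spacy.blank("en")

print(nlp.lang)        # language code of the pipeline
print(nlp.pipe_names)  # names of the pipeline components (none for a blank pipeline)
```

An empty component list is expected here; components such as a tagger or parser only appear after loading a trained pipeline or adding them yourself.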

#### Create a Doc by processing a string of text with the nlp object

In [3]:
doc = nlp("Hello world!")

#### Iterate over tokens in a Doc

In [4]:
for token in doc:
    print(token.text)

Hello
world
!
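The loop above prints only the token texts. As a small sketch of what else the Doc keeps track of, the cell below prints each token's index and trailing whitespace, then rebuilds the original string from the tokens. The variable name `rebuilt` is only for illustration.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Hello world!")

# Number of tokens in the Doc
print(len(doc))

# Each token knows its position and the whitespace that followed it
for token in doc:
    print(token.i, repr(token.text), repr(token.whitespace_))

# Joining text plus trailing whitespace reproduces the input string
rebuilt = "".join(token.text_with_ws for token in doc)
print(rebuilt == doc.text)
```

Because the whitespace is stored with each token, tokenization does not lose any of the original text.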


![alt text](token.png "Title")

### 2. Span and Indexing

#### Process a text to create a Doc

In [5]:
doc = nlp("Hello world!")

#### Index into the Doc to get a single Token

In [6]:
token = doc[0]

#### Get the token text via the .text attribute

In [7]:
print(token.text)

Hello


#### A slice from the Doc is a Span object

![alt text](span.png "Title")

In [9]:
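# Tokens in "Hello world!": Hello (0), world (1), ! (2)

# The end index is exclusive, so 1:3 covers "world" and "!"
span = doc[1:3]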
print(span.text)

world!
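Slicing a Doc follows Python's usual rules: the start index is included and the end index is excluded. The sketch below shows this for the slice `doc[1:3]`, along with the Span's `start` and `end` attributes and negative indexing.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Hello world!")

# Tokens 1 and 2 ("world" and "!"); index 3 is excluded
span = doc[1:3]
print(span.text)

# A Span remembers where it sits in the Doc
print(span.start, span.end, len(span))

# Negative indices count from the end, as with Python lists
last_token = doc[-1]
print(last_token.text)
```

A Span is a view of the Doc rather than a copy, so the tokens inside it are the same tokens you get by indexing the Doc directly.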


### 3. Lexical Attributes

* Lexical attributes are attributes of the Token object that describe the form of the token itself, such as whether it is alphabetic, punctuation, or number-like.

https://spacy.io/usage/linguistic-features

In [18]:
doc = nlp("It costs $5.")

#### The cells below use list comprehensions: https://www.w3schools.com/python/python_lists_comprehension.asp

In [None]:
print("Index:   ", [token.i for token in doc])

In [None]:
print("Text:    ", [token.text for token in doc])

In [None]:
print("is_alpha:", [token.is_alpha for token in doc])

In [None]:
print("is_punct:", [token.is_punct for token in doc])

In [None]:
print("like_num:", [token.like_num for token in doc])
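The separate cells above can also be combined into a single table, which makes it easier to compare the attributes token by token. This is only a sketch; the `rows` list and the column widths are illustrative choices.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("It costs $5.")

# Collect one row of lexical attributes per token
rows = [(t.i, t.text, t.is_alpha, t.is_punct, t.like_num) for t in doc]

# Print a simple aligned table
print(f"{'i':<3}{'text':<8}{'is_alpha':<10}{'is_punct':<10}{'like_num':<10}")
for i, text, is_alpha, is_punct, like_num in rows:
    print(f"{i:<3}{text:<8}{is_alpha!s:<10}{is_punct!s:<10}{like_num!s:<10}")
```

Note that "$" is neither alphabetic nor punctuation, "5" is the only number-like token, and "." is the only punctuation token.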

In [19]:
# pos_ is empty here: a blank pipeline has no trained tagger
print("Parts of Speech:", [token.pos_ for token in doc])

Parts of Speech: ['', '', '', '', '']
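The empty strings are expected: part-of-speech tags are predicted by a trained component, and `spacy.blank("en")` does not include one. The sketch below confirms this and then tries a trained pipeline. It assumes the package `en_core_web_sm` may or may not be installed (it is not installed anywhere in this notebook), so the load is wrapped in a `try` block.

```python
import spacy

# Blank pipeline: no tagger, so pos_ stays empty
blank_nlp = spacy.blank("en")
blank_doc = blank_nlp("It costs $5.")
blank_tags = [token.pos_ for token in blank_doc]
print(blank_tags)

# Assumption: "en_core_web_sm" is an optional extra, installed with
#   python -m spacy download en_core_web_sm
try:
    trained_nlp = spacy.load("en_core_web_sm")
except OSError:
    trained_nlp = None
    print("en_core_web_sm is not installed; skipping the trained example")

if trained_nlp is not None:
    # With a trained pipeline, pos_ holds the predicted tags
    print([token.pos_ for token in trained_nlp("It costs $5.")])
```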


### Exercises

#### Number 1:

In [None]:
import spacy

# Create the English nlp object
nlp = spacy.blank("en")

# Process a text
doc = nlp("This is a sentence.")

# Print the document text
print(doc.text)

#### Number 2:

In [14]:
import spacy

# Create the German nlp object
nlp = spacy.blank("de")

# Process a text (this is German for: "Kind regards!")
doc = nlp("Liebe Grüße!")

# Print the document text
print(doc.text)

Liebe Grüße!


#### Number 3:

In [None]:
import spacy

# Create the Spanish nlp object
nlp = spacy.blank("es")

# Process a text (this is Spanish for: "How are you?")
doc = nlp("¿Cómo estás?")

# Print the document text
print(doc.text)

#### Number 4:

In [16]:
import spacy

nlp = spacy.blank("en")

# Process the text
doc = nlp("I like tree kangaroos and narwhals.")

# Select the first token
first_token = doc[0]

# Print the first token's text
print(first_token.text)

I


#### Number 5:

In [15]:
import spacy

nlp = spacy.blank("en")

# Process the text
doc = nlp("I like tree kangaroos and narwhals.")

# A slice of the Doc for "tree kangaroos"
tree_kangaroos = doc[2:4]
print(tree_kangaroos.text)

# A slice of the Doc for "tree kangaroos and narwhals" (without the ".")
tree_kangaroos_and_narwhals = doc[2:6]
print(tree_kangaroos_and_narwhals.text)

tree kangaroos
tree kangaroos and narwhals


#### Number 6:

In [17]:
import spacy

nlp = spacy.blank("en")

# Process the text
doc = nlp(
    "In 1990, more than 60% of people in East Asia were in extreme poverty. "
    "Now less than 4% are."
)

# Iterate over the tokens in the doc
for token in doc:
    # Check if the token resembles a number
    if token.like_num:
        # Get the next token in the document
        next_token = doc[token.i + 1]
        # Check if the next token's text equals "%"
        if next_token.text == "%":
            print("Percentage found:", token.text)

Percentage found: 60
Percentage found: 4
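The loop above looks up `doc[token.i + 1]` without checking that a next token exists, so a text that ends in a number would raise an `IndexError`. One possible variant, sketched here with a hypothetical helper named `find_percentages`, checks the bounds first and returns the matches as a list.

```python
import spacy


def find_percentages(doc):
    """Return the text of each number-like token directly followed by "%"."""
    found = []
    for token in doc:
        # Only look ahead if there is a next token
        has_next = token.i + 1 < len(doc)
        if token.like_num and has_next and doc[token.i + 1].text == "%":
            found.append(token.text)
    return found


nlp = spacy.blank("en")

doc = nlp(
    "In 1990, more than 60% of people in East Asia were in extreme poverty. "
    "Now less than 4% are."
)
print(find_percentages(doc))

# A text ending in a number no longer raises an IndexError
print(find_percentages(nlp("The answer is 42")))
```

Returning a list instead of printing inside the loop also makes the result easier to reuse or test.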
