# Getting Started

Let's get started and try out spaCy! In this exercise, you'll work with some of the 45+ [available languages](https://spacy.io/usage/models#languages).

This course introduces a lot of new concepts, so if you ever need a quick refresher, download the [spaCy Cheat Sheet](http://datacamp-community-prod.s3.amazonaws.com/29aa28bf-570a-4965-8f54-d6a541ae4e06) and keep it handy!

## Instructions

* Import the `English` class from `spacy.lang.en` and create the `nlp` object.
* Create a `doc` and print its text.

In [None]:
# Import the English language class
from spacy.lang.____ import ____

# Create the nlp object
nlp = ____

# Process a text
doc = nlp("This is a sentence.")

# Print the document text
print(____.text)

### Solution

In [None]:
# %load Getting_Started-solution-1.py
# Import the English language class
from spacy.lang.en import English

# Create the nlp object
nlp = English()

# Process a text
doc = nlp("This is a sentence.")

# Print the document text
print(doc.text)

## Instructions

* Import the `German` class from `spacy.lang.de` and create the `nlp` object.
* Create a `doc` and print its text.

In [None]:
# Import the German language class
from spacy.lang.____ import ____

# Create the nlp object
nlp = ____

# Process a text (this is German for: "Kind regards!")
doc = nlp("Liebe Grüße!")

# Print the document text
print(____.text)

### Solution

In [None]:
# %load Getting_Started-solution-2.py
# Import the German language class
from spacy.lang.de import German

# Create the nlp object
nlp = German()

# Process a text (this is German for: "Kind regards!")
doc = nlp("Liebe Grüße!")

# Print the document text
print(doc.text)

## Instructions

* Import the `Spanish` class from `spacy.lang.es` and create the `nlp` object.
* Create a `doc` and print its text.

In [None]:
# Import the Spanish language class
from spacy.lang.____ import ____

# Create the nlp object
nlp = ____

# Process a text (this is Spanish for: "How are you?")
doc = nlp("¿Cómo estás?")

# Print the document text
print(____.text)

### Solution

In [None]:
%load Getting_Started-solution-3.py
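
Each language class you imported above builds a blank pipeline that contains only a tokenizer. As a minimal sketch (assuming spaCy v2 or v3 is installed; no model download is needed), `spacy.blank` creates the same kind of object from a language code, which is handy when you want to switch languages without changing your imports:

```python
import spacy
from spacy.lang.de import German
from spacy.lang.en import English

# spacy.blank("en") builds a blank English pipeline, just like English()
nlp_en = spacy.blank("en")
nlp_de = spacy.blank("de")

print(type(nlp_en).__name__, nlp_en.pipe_names)            # English []
print(isinstance(nlp_en, English), isinstance(nlp_de, German))  # True True

# Blank pipelines have no trained components, but they can still tokenize text
doc_en = nlp_en("This is a sentence.")
doc_de = nlp_de("Liebe Grüße!")
print([token.text for token in doc_en])
print([token.text for token in doc_de])
```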

# Documents, spans and tokens

When you call `nlp` on a string, spaCy first tokenizes the text and creates a document object. In this exercise, you'll learn more about the `Doc`, as well as its views `Token` and `Span`.

## Instructions:

* Import the `English` language class and create the `nlp` object.
* Process the text and instantiate a `Doc` object in the variable `doc`.
* Select the first token of the `Doc` and print its `text`.

In [None]:
# Import the English language class and create the nlp object
from ____ import ____
nlp = ____

# Process the text
doc = ____("I like tree kangaroos and narwhals.")

# Select the first token
first_token = doc[____]

# Print the first token's text
print(first_token.____)

__Hint:__ 

In [None]:
%run Documents_spans_tokens-hints-1.py

### Solution

In [None]:
%load Documents_spans_tokens-solution-1.py
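
A `Doc` behaves much like a Python list of `Token` objects. This sketch uses a blank English pipeline (no model download needed) to show indexing, negative indices, `len` and the `token.i` attribute, which stores each token's position in the `Doc`:

```python
from spacy.lang.en import English

nlp = English()
doc = nlp("I like tree kangaroos and narwhals.")

# Indexing a Doc returns a Token; token.i is its position in the Doc
first_token = doc[0]
# Negative indices count from the end, just like with Python lists
last_token = doc[-1]

print(len(doc))                          # 7
print(first_token.text, first_token.i)   # I 0
print(last_token.text, last_token.i)     # . 6

# Iterating over a Doc yields its tokens in order
print([token.text for token in doc])
```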

## Instructions

* Create a slice of the `Doc` for the tokens "tree kangaroos" and "tree kangaroos and narwhals".

In [None]:
# Import the English language class and create the nlp object
from spacy.lang.en import English
nlp = English()

# Process the text
doc = nlp("I like tree kangaroos and narwhals.")

# A slice of the Doc for "tree kangaroos"
tree_kangaroos = ____
print(tree_kangaroos.text)

# A slice of the Doc for "tree kangaroos and narwhals" (without the ".")
tree_kangaroos_and_narwhals = ____
print(tree_kangaroos_and_narwhals.text)

__Hint:__

In [None]:
%run Documents_spans_tokens-hints-2.py

### Solution

In [None]:
%load Documents_spans_tokens-solution-2.py
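
Slicing a `Doc` returns a `Span`, which is a view on the original tokens rather than a copy. This sketch, again using a blank English pipeline, shows the token and character offsets a `Span` carries and how its tokens keep their `Doc` indices:

```python
from spacy.lang.en import English

nlp = English()
doc = nlp("I like tree kangaroos and narwhals.")

# Slicing a Doc returns a Span; the end index is exclusive, as in Python slices
span = doc[2:6]
print(span.text)                        # tree kangaroos and narwhals
print(span.start, span.end)             # 2 6 (token indices)
print(span.start_char, span.end_char)   # 7 34 (character offsets in doc.text)

# Tokens inside the span keep their position in the parent Doc
print(span[0].text, span[0].i)          # tree 2
print(span.doc is doc)                  # True
```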

# Lexical attributes

In this example, you'll use spaCy's `Doc` and `Token` objects and their lexical attributes to find percentages in a text. You'll be looking for two consecutive tokens: a number followed by a percent sign. The English `nlp` object has already been created.

## Instructions

* Use the `like_num` token attribute to check whether a token in the `doc` resembles a number.
* Get the token _following_ the current token in the document. The index of the next token in the `doc` is `token.i + 1`.
* Check whether the next token's `text` attribute is a percent sign "%".

In [None]:
# Process the text
doc = nlp("In 1990, more than 60% of people in East Asia were in extreme poverty. Now less than 4% are.")

# Iterate over the tokens in the doc
for token in doc:
    # Check if the token resembles a number
    if ____.____:
        # Get the next token in the document
        next_token = ____[____]
        # Check if the next token's text equals '%'
        if next_token.____ == '%':
            print('Percentage found:', token.text)

__Hint:__

In [None]:
%run Lexical_attributes-hints-1.py

### Solution

In [None]:
%load Lexical_attributes-solution-1.py
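
One detail worth knowing: `doc[token.i + 1]` raises an `IndexError` if the number happens to be the last token in the text. The exercise text ends with a period, so this never happens there, but a slightly more defensive variant (a sketch using a blank English pipeline) checks the index first and collects the results in a list:

```python
from spacy.lang.en import English

nlp = English()
doc = nlp("In 1990, more than 60% of people in East Asia were in extreme poverty. Now less than 4% are.")

percentages = []
for token in doc:
    # Only look ahead if there is a next token to look at
    if token.like_num and token.i + 1 < len(doc):
        next_token = doc[token.i + 1]
        if next_token.text == "%":
            percentages.append(token.text)

print(percentages)  # ['60', '4']
```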

# Loading models

Let's start by loading a model. `spacy` is already imported.

## Instructions

* Use `spacy.load` to load the small English model `'en_core_web_sm'`.
* Process the text and print the document text.

In [None]:
# Load the 'en_core_web_sm' model – spaCy is already imported
nlp = ____

text = "It’s official: Apple is the first U.S. public company to reach a $1 trillion market value"

# Process the text
doc = ____

# Print the document text
print(____.____)

__Hint:__ 

In [None]:
%run Loading_models-hints-1.py

### Solution

In [None]:
%load Loading_models-solution-1.py

## Instructions 

* Use `spacy.load` to load the small German model `'de_core_news_sm'`.
* Process the text and print the document text.

In [None]:
# Load the 'de_core_news_sm' model – spaCy is already imported
nlp = ____

text = "Als erstes Unternehmen der Börsengeschichte hat Apple einen Marktwert von einer Billion US-Dollar erreicht"

# Process the text
doc = ____

# Print the document text
print(____.____)

__Hint:__

In [None]:
%run Loading_models-hints-2.py

### Solution

In [None]:
%load Loading_models-solution-2.py
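
`spacy.load` only works if the package is installed, for example via `python -m spacy download en_core_web_sm`; otherwise it raises an `OSError`. As a sketch, a small helper can fall back to a blank pipeline of the same language so your code still runs, just without predictions. The name `load_or_blank` is hypothetical and not part of spaCy's API:

```python
import spacy


def load_or_blank(name, lang):
    """Hypothetical helper: load an installed pipeline package, else a blank pipeline."""
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load raises OSError if no package or path with this name is found
        return spacy.blank(lang)


nlp = load_or_blank("en_core_web_sm", "en")
text = "It’s official: Apple is the first U.S. public company to reach a $1 trillion market value"
doc = nlp(text)

# A trained package lists its components here; the blank fallback lists none
print(nlp.pipe_names)
print(doc.text)
```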

# Predicting linguistic annotations

You'll now get to try one of spaCy's pre-trained model packages and see its predictions in action. Feel free to try it out on your own text! The small English model is already available as the variable `nlp`.

To find out what a tag or label means, you can call `spacy.explain` in the IPython shell. For example: `spacy.explain('PROPN')` or `spacy.explain('GPE')`.

## Instructions

* Process the text with the `nlp` object and create a `doc`.
* For each token, print the token text, the token's `.pos_` (part-of-speech tag) and the token's `.dep_` (dependency label).

In [None]:
text = "It’s official: Apple is the first U.S. public company to reach a $1 trillion market value"

# Process the text
doc = ____

for token in doc:
    # Get the token text, part-of-speech tag and dependency label
    token_text = ____.____
    token_pos = ____.____
    token_dep = ____.____
    # This is for formatting only
    print('{:<12}{:<10}{:<10}'.format(token_text, token_pos, token_dep))

__Hint:__

In [None]:
%run Predicting_linguistic_annotations-hints-1.py

### Solution

In [None]:
%load Predicting_linguistic_annotations-solution-1.py
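
The `spacy.explain` tip at the start of this section works without any trained pipeline, because the descriptions come from a glossary bundled with spaCy. A quick sketch that looks up a few labels, including one that doesn't exist:

```python
import spacy

# Descriptions come from spaCy's built-in glossary, so no pipeline is loaded here
for label in ["PROPN", "GPE", "nsubj", "NOT_A_LABEL"]:
    # Labels missing from the glossary return None instead of raising an error
    print(f"{label:<12}{spacy.explain(label)}")
```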

## Instructions

* Process the text and create a `doc` object.
* Iterate over `doc.ents` and print each entity's text and its `label_` attribute.

In [None]:
text = "It’s official: Apple is the first U.S. public company to reach a $1 trillion market value"

# Process the text
doc = ____

# Iterate over the predicted entities
for ent in ____.____:
    # print the entity text and its label
    print(ent.____, ____.____)

__Hint:__

In [None]:
%run Predicting_linguistic_annotations-hints-2.py

### Solution

In [None]:
%load Predicting_linguistic_annotations-solution-2.py
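
Entities in `doc.ents` are `Span` objects, so they also carry character offsets (`start_char`, `end_char`). Going the other way, `Doc.char_span` turns character offsets into a labelled `Span` and returns `None` if the offsets don't line up with token boundaries. In this sketch the blank English pipeline predicts no entities, so the `'ORG'` label is assigned by hand:

```python
from spacy.lang.en import English

nlp = English()
text = "It’s official: Apple is the first U.S. public company to reach a $1 trillion market value"
doc = nlp(text)

# Convert character offsets (here from a plain string search) into a labelled Span
start = text.index("Apple")
apple = doc.char_span(start, start + len("Apple"), label="ORG")
print(apple.text, apple.label_, apple.start_char, apple.end_char)  # Apple ORG 15 20

# "App" cuts through the token "Apple", so no Span can be created
print(doc.char_span(start, start + 3))  # None
```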

# Predicting named entities in context

Models are statistical and not always right. Whether their predictions are correct depends on the training data and the text you're processing. Let's take a look at an example. The small English model is available as the variable `nlp`.

## Instructions

* Process the text with the `nlp` object.
* Iterate over the entities using the loop variable `ent` and print each entity's text and label.

In [None]:
text = "New iPhone X release date leaked as Apple reveals pre-orders by mistake"

# Process the text
doc = ____

# Iterate over the entities
for ____ in ____.____:
    # print the entity text and label
    print(____.____, ____.____)

__Hint:__

In [None]:
%run Predicting_named_entities_context-hints-1.py

### Solution

In [None]:
%load Predicting_named_entities_context-solution-1.py

## Instructions

* It looks like the model didn't predict "iPhone X" as an entity. Create a span for those tokens manually.

In [None]:
text = "New iPhone X release date leaked as Apple reveals pre-orders by mistake"

# Process the text
doc = nlp(text)

# Iterate over the entities
for ent in doc.ents:
    # print the entity text and label
    print(ent.text, ent.label_)

# Get the span for "iPhone X"
iphone_x = ____

# Print the span text
print('Missing entity:', iphone_x.text)

__Hint:__

In [None]:
%run Predicting_named_entities_context-hints-2.py

### Solution 

In [None]:
%load Predicting_named_entities_context-solution-2.py
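
Once you have a span, you can give it a label and add it to `doc.ents` yourself. This sketch uses a blank English pipeline and the label `'GADGET'`, which is a hypothetical label chosen for illustration, not one the small English model predicts:

```python
from spacy.lang.en import English
from spacy.tokens import Span

nlp = English()
doc = nlp("New iPhone X release date leaked as Apple reveals pre-orders by mistake")

# Tokens 1 and 2 are "iPhone" and "X"; the end index 3 is exclusive
iphone_x = Span(doc, 1, 3, label="GADGET")  # "GADGET" is a hypothetical label

# Keep any existing entities and add the new one
doc.ents = list(doc.ents) + [iphone_x]
print([(ent.text, ent.label_) for ent in doc.ents])  # [('iPhone X', 'GADGET')]
```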

# Using the Matcher

Let's try spaCy's rule-based `Matcher`. You'll use the example from the previous exercise and write a pattern that matches the phrase "iPhone X" in the text. The `nlp` object and a processed `doc` are already available.

## Instructions

* Import the `Matcher` from `spacy.matcher`.
* Initialize it with the `nlp` object's shared `vocab`.

In [None]:
# Import the Matcher
from spacy.____ import ____

# Initialize the Matcher with the shared vocabulary
matcher = ____(____.____)

__Hint:__ 

In [None]:
%run Using_the_Matcher-hints-1.py

### Solution

In [None]:
%load Using_the_Matcher-solution-1.py

## Instructions

* Create a pattern that matches the `'TEXT'` values of two tokens: `"iPhone"` and `"X"`.
* Use the `matcher.add` method to add the pattern to the matcher.

In [None]:
# Import the Matcher
from spacy.matcher import Matcher

# Initialize the Matcher with the shared vocabulary
matcher = Matcher(nlp.vocab)

# Create a pattern matching two tokens: "iPhone" and "X"
pattern = [____]

# Add the pattern to the matcher
____.____('IPHONE_X_PATTERN', [____])

__Hint:__ 

In [None]:
%run Using_the_Matcher-hints-2.py

### Solution 

In [None]:
%load Using_the_Matcher-solution-2.py
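
The `'TEXT'` attribute is case-sensitive. As a sketch (assuming spaCy v2.2.2+ or v3, which accept a list of patterns in `matcher.add`), the `'LOWER'` attribute matches any capitalization of the same words:

```python
from spacy.lang.en import English
from spacy.matcher import Matcher

nlp = English()
doc = nlp("I sold my iphone x and bought an IPHONE X today")

matcher = Matcher(nlp.vocab)
# "TEXT" needs the exact spelling "iPhone X", which this text doesn't contain
matcher.add("EXACT", [[{"TEXT": "iPhone"}, {"TEXT": "X"}]])
# "LOWER" compares the lowercased token text, so any capitalization matches
matcher.add("ANY_CASE", [[{"LOWER": "iphone"}, {"LOWER": "x"}]])

found = [(nlp.vocab.strings[match_id], doc[start:end].text) for match_id, start, end in matcher(doc)]
print(found)
```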

## Instructions 

* Call the matcher on the `doc` and store the result in the variable `matches`.
* Iterate over the matches and get the matched span from the `start` to the `end` index.

In [None]:
# Import the Matcher and initialize it with the shared vocabulary
from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)

# Create a pattern matching two tokens: "iPhone" and "X"
pattern = [{'TEXT': 'iPhone'}, {'TEXT': 'X'}]

# Add the pattern to the matcher
matcher.add('IPHONE_X_PATTERN', [pattern])

# Use the matcher on the doc
matches = ____
print('Matches:', [doc[start:end].text for match_id, start, end in matches])

__Hint:__

In [None]:
%run Using_the_Matcher-hints-3.py

### Solution

In [None]:
%load Using_the_Matcher-solution-3.py
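
Each match is a `(match_id, start, end)` tuple, where `match_id` is the hash of the name you passed to `matcher.add`. This self-contained sketch (blank English pipeline; list-of-patterns form of `matcher.add` from spaCy v2.2.2+ or v3) looks the name up again in the shared `StringStore` via `nlp.vocab.strings`:

```python
from spacy.lang.en import English
from spacy.matcher import Matcher

nlp = English()
doc = nlp("New iPhone X release date leaked as Apple reveals pre-orders by mistake")

matcher = Matcher(nlp.vocab)
pattern = [{"TEXT": "iPhone"}, {"TEXT": "X"}]
matcher.add("IPHONE_X_PATTERN", [pattern])

results = []
for match_id, start, end in matcher(doc):
    # match_id is a hash; the shared StringStore maps it back to the pattern name
    results.append((nlp.vocab.strings[match_id], doc[start:end].text))
print(results)  # [('IPHONE_X_PATTERN', 'iPhone X')]
```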

# Writing match patterns

In this exercise, you'll practice writing more complex match patterns using different token attributes and operators. A matcher is already initialized and available as the variable `matcher`.

## Instructions

Write __one__ pattern that only matches mentions of the _full_ iOS versions: "iOS 7", "iOS 11" and "iOS 10".

In [None]:
doc = nlp("After making the iOS update you won't notice a radical system-wide redesign: nothing like the aesthetic upheaval we got with iOS 7. Most of iOS 11's furniture remains the same as in iOS 10. But you will discover some tweaks once you delve a little deeper.")

# Write a pattern for full iOS versions ("iOS 7", "iOS 11", "iOS 10")
pattern = [{'TEXT': ____}, {'IS_DIGIT': ____}]

# Add the pattern to the matcher and apply the matcher to the doc
matcher.add('IOS_VERSION_PATTERN', [pattern])
matches = matcher(doc)
print('Total matches found:', len(matches))

# Iterate over the matches and print the span text
for match_id, start, end in matches:
    print('Match found:', doc[start:end].text)

__Hint:__

In [None]:
%run Writing_match_patterns-hints-1.py

### Solution 

In [None]:
%load Writing_match_patterns-solution-1.py
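
Token attributes can also take a dictionary of extended predicates. For example, `{'IN': [...]}` accepts any value from a list. This sketch (blank English pipeline, spaCy v2.2.2+ or v3) narrows the pattern down to two specific versions:

```python
from spacy.lang.en import English
from spacy.matcher import Matcher

nlp = English()
doc = nlp(
    "After making the iOS update you won't notice a radical system-wide redesign: "
    "nothing like the aesthetic upheaval we got with iOS 7. Most of iOS 11's furniture "
    "remains the same as in iOS 10. But you will discover some tweaks once you delve a little deeper."
)

matcher = Matcher(nlp.vocab)
# The second token must be exactly "10" or "11", so "iOS 7" is not matched
pattern = [{"TEXT": "iOS"}, {"TEXT": {"IN": ["10", "11"]}}]
matcher.add("RECENT_IOS_PATTERN", [pattern])

found = [doc[start:end].text for match_id, start, end in matcher(doc)]
print(found)
```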

## Instructions

Write __one__ pattern that only matches forms of "download" (tokens with the lemma "download"), followed by a token with the part-of-speech tag `'PROPN'` (proper noun).

In [None]:
doc = nlp("i downloaded Fortnite on my laptop and can't open the game at all. Help? so when I was downloading Minecraft, I got the Windows version where it is the '.zip' folder and I used the default program to unpack it... do I also need to download Winzip?")

# Write a pattern that matches a form of "download" plus proper noun
pattern = [{'LEMMA': ____}, {'POS': ____}]

# Add the pattern to the matcher and apply the matcher to the doc
matcher.add('DOWNLOAD_THINGS_PATTERN', [pattern])
matches = matcher(doc)
print('Total matches found:', len(matches))

# Iterate over the matches and print the span text
for match_id, start, end in matches:
    print('Match found:', doc[start:end].text)

__Hint:__ 

In [None]:
%run Writing_match_patterns-hints-2.py

### Solution 

In [None]:
%load Writing_match_patterns-solution-2.py
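
Patterns on `'LEMMA'` and `'POS'` need a pipeline that predicts these attributes, such as the small English model. To experiment without a model, you can build a `Doc` with hand-written annotations. This sketch assumes spaCy v3, whose `Doc` constructor accepts `pos` and `lemmas` arguments; the annotations below are written by hand, not predicted:

```python
from spacy.lang.en import English
from spacy.matcher import Matcher
from spacy.tokens import Doc

nlp = English()
# Hand-written annotations stand in for a trained pipeline's predictions
words = ["I", "downloaded", "Fortnite", "and", "was", "downloading", "Minecraft"]
pos = ["PRON", "VERB", "PROPN", "CCONJ", "AUX", "VERB", "PROPN"]
lemmas = ["I", "download", "Fortnite", "and", "be", "download", "Minecraft"]
doc = Doc(nlp.vocab, words=words, pos=pos, lemmas=lemmas)

matcher = Matcher(nlp.vocab)
# Any form of "download" followed by a proper noun
matcher.add("DOWNLOAD_THINGS_PATTERN", [[{"LEMMA": "download"}, {"POS": "PROPN"}]])

found = [doc[start:end].text for match_id, start, end in matcher(doc)]
print(found)
```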

## Instructions

* Write __one__ pattern that matches an adjective (`'ADJ'`) followed by one or two nouns (`'NOUN'`): one required noun and one optional noun.

In [None]:
doc = nlp("Features of the app include a beautiful design, smart search, automatic labels and optional voice responses.")

# Write a pattern for adjective plus one or two nouns
pattern = [{'POS': ____}, {'POS': ____}, {'POS': ____, 'OP': ____}]

# Add the pattern to the matcher and apply the matcher to the doc
matcher.add('ADJ_NOUN_PATTERN', [pattern])
matches = matcher(doc)
print('Total matches found:', len(matches))

# Iterate over the matches and print the span text
for match_id, start, end in matches:
    print('Match found:', doc[start:end].text)

__Hint:__

In [None]:
%run Writing_match_patterns-hints-3.py

### Solution 

In [None]:
%load Writing_match_patterns-solution-3.py
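
Because the last token is optional, the matcher returns overlapping matches such as "optional voice" and "optional voice responses". If you only want the longest one, `spacy.util.filter_spans` keeps the longest non-overlapping spans. This sketch assumes spaCy v3 and uses a hand-annotated `Doc` in place of model predictions:

```python
from spacy.lang.en import English
from spacy.matcher import Matcher
from spacy.tokens import Doc
from spacy.util import filter_spans

nlp = English()
# Hand-written part-of-speech tags stand in for a trained pipeline's predictions
words = ["a", "beautiful", "design", ",", "smart", "search", ",", "automatic",
         "labels", "and", "optional", "voice", "responses"]
pos = ["DET", "ADJ", "NOUN", "PUNCT", "ADJ", "NOUN", "PUNCT", "ADJ",
       "NOUN", "CCONJ", "ADJ", "NOUN", "NOUN"]
doc = Doc(nlp.vocab, words=words, pos=pos)

matcher = Matcher(nlp.vocab)
pattern = [{"POS": "ADJ"}, {"POS": "NOUN"}, {"POS": "NOUN", "OP": "?"}]
matcher.add("ADJ_NOUN_PATTERN", [pattern])

spans = [doc[start:end] for match_id, start, end in matcher(doc)]
print(len(spans))  # 5, including both "optional voice" and "optional voice responses"

# Keep only the longest span wherever matches overlap
longest = filter_spans(spans)
print([span.text for span in longest])
```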