# How to use Spacy for noun phrase extraction

https://practicaldatascience.co.uk/data-science/how-to-use-spacy-for-noun-phrase-extraction

Noun phrase extraction is a Natural Language Processing technique that can be used to identify and extract noun phrases from text. Noun phrases are phrases that function grammatically as nouns in a sentence, and usually include a noun or pronoun as the headword, as well as any associated determiners, adjectives, and modifiers.

For example, given the sentence “The quick brown fox jumps over the lazy dog”, the noun phrases would be “The quick brown fox” and “the lazy dog.”

Noun phrase extraction can be very useful when analysing customer review data during review mining, since it reveals more than the nouns alone (for example, "slow delivery" rather than just "delivery"). It can be done with a range of NLP techniques, including part-of-speech tagging, dependency parsing and shallow parsing, as well as with Large Language Models and other transformer-based models.
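To make the shallow parsing idea concrete, here is a minimal pure-Python sketch that groups already-tagged words into noun phrases using a part-of-speech pattern. The tags are assigned by hand, and the pattern (an optional determiner, any number of adjectives, then one or more nouns) and the function name `chunk_noun_phrases` are illustrative assumptions, not how Spacy itself works.

```python
import re


def chunk_noun_phrases(tagged):
    """Group (word, POS) pairs into noun phrases with a simple tag pattern.

    Assumed pattern (illustrative only): optional determiner, any number of
    adjectives, then one or more nouns or proper nouns.
    """
    # Encode each tag as one character so a regex can scan the tag sequence
    codes = {"DET": "D", "ADJ": "A", "NOUN": "N", "PROPN": "N"}
    tag_string = "".join(codes.get(tag, "x") for _, tag in tagged)

    phrases = []
    for match in re.finditer(r"D?A*N+", tag_string):
        words = [word for word, _ in tagged[match.start():match.end()]]
        phrases.append(" ".join(words))
    return phrases


# Hand-tagged version of the example sentence above
sentence = [
    ("The", "DET"), ("quick", "ADJ"), ("brown", "ADJ"), ("fox", "NOUN"),
    ("jumps", "VERB"), ("over", "ADP"), ("the", "DET"), ("lazy", "ADJ"),
    ("dog", "NOUN"),
]
print(chunk_noun_phrases(sentence))  # ['The quick brown fox', 'the lazy dog']
```

A fixed pattern like this is easy to reason about but brittle; the dependency-based approach Spacy uses copes better with longer and more varied sentences.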

In this project we’ll use the Spacy natural language processing library to extract noun phrases from a short piece of text, to show how easily it can be done.

In [1]:
# Install the packages
!pip install -U spacy
!python -m spacy download en_core_web_sm

Collecting spacy
  Downloading spacy-3.6.1-cp39-cp39-win_amd64.whl (12.1 MB)

Collecting en-core-web-sm==3.6.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.6.0/en_core_web_sm-3.6.0-py3-none-any.whl (12.8 MB)

### Import the packages

In [4]:
import spacy

With Spacy imported, load the en_core_web_sm model using spacy.load() and assign the resulting pipeline object to a variable called nlp. We can then pass text to nlp and perform various Natural Language Processing tasks, such as part-of-speech tagging and dependency parsing.

In [5]:
nlp = spacy.load("en_core_web_sm")
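Loading a trained model matters here because Spacy's noun_chunks feature is built on the dependency parse. As a quick illustration, a blank English pipeline created with spacy.blank("en") only tokenizes, so asking it for noun chunks fails. The helper name `has_noun_chunks` is hypothetical; the sketch assumes spaCy v3, where the English noun chunk iterator raises a ValueError when no dependency parse is present.

```python
import spacy


def has_noun_chunks(pipeline, text):
    """Hypothetical helper: True if the pipeline can produce noun chunks."""
    doc = pipeline(text)
    try:
        list(doc.noun_chunks)  # needs a dependency parse to work
    except ValueError:
        return False
    return True


blank_nlp = spacy.blank("en")  # tokenizer only: no tagger or parser
print(blank_nlp.pipe_names)    # []
print(has_noun_chunks(blank_nlp, "The quick brown fox jumps over the lazy dog"))  # False
```

Passing the en_core_web_sm pipeline loaded above to the same helper should return True, since that model includes a parser component.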

### Create a document to analyse
Now we need to create a variable containing the text we want Spacy to analyse. We’ll store a sentence containing several noun phrases in a variable called text.

In [6]:
text = """
The data scientist hurriedly wrote some code on their Linux workstation to get everything completed before the deadline. 
"""
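The triple-quoted string keeps the newline at its start (and the space and newline at its end). Spacy keeps that leading newline as a whitespace token, which is why the first noun phrase extracted later comes out as '\nThe data scientist'. A minimal sketch of one way to avoid this is to call .strip() on the string; the name clean_text is our own, and because the whitespace token disappears, the parse and results may differ slightly from those shown in this walkthrough.

```python
# Same sentence as above; .strip() removes the surrounding newlines and spaces
clean_text = """
The data scientist hurriedly wrote some code on their Linux workstation to get everything completed before the deadline.
""".strip()

print(repr(clean_text[:18]))  # 'The data scientist'
```

You could then pass clean_text to nlp() instead of text.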

### Pass the text to Spacy

Next, we pass the text variable to the nlp() model and assign the output, a Spacy Doc object, to a variable so we can inspect the results. We do this with doc = nlp(text).

In [7]:
doc = nlp(text)
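Before extracting anything, it can help to look at what the pipeline attached to each token. The sketch below uses a hypothetical helper, `describe_tokens`, that lists each token's text, part-of-speech tag, dependency label and head. To keep it runnable without the model download, it is demonstrated on a tiny hand-built Doc whose tags are assigned manually (an assumption standing in for real parser output); on the notebook's own data you would call describe_tokens(doc).

```python
import spacy
from spacy.tokens import Doc


def describe_tokens(parsed):
    """Return (text, part of speech, dependency label, head text) per token."""
    return [(t.text, t.pos_, t.dep_, t.head.text) for t in parsed]


# Hand-built two-token Doc; heads are absolute token indices (spaCy v3)
blank_nlp = spacy.blank("en")
demo_doc = Doc(
    blank_nlp.vocab,
    words=["Dogs", "bark"],
    pos=["NOUN", "VERB"],
    heads=[1, 1],
    deps=["nsubj", "ROOT"],
)
for row in describe_tokens(demo_doc):
    print(row)
# ('Dogs', 'NOUN', 'nsubj', 'bark')
# ('bark', 'VERB', 'ROOT', 'bark')
```

The root token is its own head, which is how Spacy marks the top of the dependency tree.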

### Extract nouns and noun phrases

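To extract nouns and noun phrases from the doc returned by Spacy, we can use a couple of list comprehensions. The first keeps each token whose part-of-speech tag (token.pos_) is NOUN and returns its lemma (token.lemma_), giving us a list of the nouns in our text in their base form. "Linux" doesn't appear because it wasn't tagged NOUN; proper nouns get the separate PROPN tag.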

In [8]:
print("Nouns:", [token.lemma_ for token in doc if token.pos_ == "NOUN"])

Nouns: ['data', 'scientist', 'code', 'workstation', 'deadline']
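If you also want proper nouns such as product or brand names, you can widen the filter to PROPN. The sketch below wraps the list comprehension in a hypothetical helper, `extract_nouns`, with an `include_proper` flag. It is demonstrated on a hand-built Doc with manually assigned tags and lemmas (an assumption, so no model is needed); with the notebook's data you would call extract_nouns(doc, include_proper=True).

```python
import spacy
from spacy.tokens import Doc


def extract_nouns(parsed, include_proper=False):
    """Return lemmas of common nouns, optionally adding proper nouns."""
    wanted = {"NOUN", "PROPN"} if include_proper else {"NOUN"}
    return [token.lemma_ for token in parsed if token.pos_ in wanted]


# Hand-built Doc: tags and lemmas assigned manually for illustration
blank_nlp = spacy.blank("en")
demo_doc = Doc(
    blank_nlp.vocab,
    words=["their", "Linux", "workstations"],
    pos=["PRON", "PROPN", "NOUN"],
    lemmas=["their", "Linux", "workstation"],
)
print(extract_nouns(demo_doc))                       # ['workstation']
print(extract_nouns(demo_doc, include_proper=True))  # ['Linux', 'workstation']
```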


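The second list comprehension iterates over doc.noun_chunks, Spacy's built-in noun phrase feature, and returns the text of each chunk (chunk.text). This gives us a list of all the noun phrases Spacy found in the text. The leading \n on the first phrase comes from the newline at the start of the triple-quoted string.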

In [9]:
print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])

Noun phrases: ['\nThe data scientist', 'some code', 'their Linux workstation', 'everything', 'the deadline']
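Under the hood, noun_chunks is driven by the dependency parse. Roughly, Spacy's English iterator takes each noun, proper noun or pronoun whose dependency label marks it as a subject, object or similar (nsubj, dobj, pobj and a few others) and yields the span running from the leftmost word in its subtree up to that word. The sketch below shows this on the fox sentence from the introduction, using a parse assigned by hand (an assumption standing in for the model's output, so no download is needed).

```python
import spacy
from spacy.tokens import Doc

blank_nlp = spacy.blank("en")  # English vocab and noun chunk rules, no trained components

words = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
pos = ["DET", "ADJ", "ADJ", "NOUN", "VERB", "ADP", "DET", "ADJ", "NOUN"]
heads = [3, 3, 3, 4, 4, 4, 8, 8, 5]  # absolute index of each token's head
deps = ["det", "amod", "amod", "nsubj", "ROOT", "prep", "det", "amod", "pobj"]
fox_doc = Doc(blank_nlp.vocab, words=words, pos=pos, heads=heads, deps=deps)

print([chunk.text for chunk in fox_doc.noun_chunks])
# ['The quick brown fox', 'the lazy dog']
print([(chunk.root.text, chunk.root.dep_) for chunk in fox_doc.noun_chunks])
# [('fox', 'nsubj'), ('dog', 'pobj')]
```

Each chunk's root is the noun that triggered it, which is handy if you want to group review phrases by their head noun.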
