- Keyword extraction identifies the most important words or phrases in a text, which is useful for summarization, tagging, and categorization.

In [1]:
!pip install rake-nltk

from rake_nltk import Rake

Collecting rake-nltk
  Downloading rake_nltk-1.0.6-py3-none-any.whl.metadata (6.4 kB)
Downloading rake_nltk-1.0.6-py3-none-any.whl (9.1 kB)
Installing collected packages: rake-nltk
Successfully installed rake-nltk-1.0.6


In [8]:
import nltk
nltk.download('stopwords')
nltk.download('punkt')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [9]:
text = """
Natural Language Processing (NLP) is a fascinating field of artificial intelligence.
It helps computers understand, interpret, and respond to human language.
NLP applications include machine translation, sentiment analysis, chatbots,
and much more that aids in smooth communication between humans and computers.
"""

In [10]:
rake_nltk_var = Rake()    # initialize RAKE with its defaults (NLTK English stopwords)

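- RAKE (Rapid Automatic Keyword Extraction) is a simple, fast keyword extraction algorithm. It splits the text into candidate phrases at stopwords and punctuation, then scores each phrase based on the frequency and co-occurrence of its words.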
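To make the scoring idea concrete, here is a minimal pure-Python sketch of RAKE-style scoring. It is a simplification and not rake-nltk's actual code: the tiny STOPWORDS set, the helper names candidate_phrases and rake_scores, and the sample sentence are all assumptions made for illustration. Each word is scored as degree / frequency, and a phrase's score is the sum of its word scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; rake-nltk uses NLTK's English list).
STOPWORDS = {"is", "and", "a", "of", "the", "to"}

def candidate_phrases(text, stopwords):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Punctuation always ends a phrase.
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases

def rake_scores(phrases):
    """Score each phrase as the sum of degree(word) / frequency(word)."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting the word itself
    word_score = {w: degree[w] / freq[w] for w in freq}
    return {phrase: sum(word_score[w] for w in phrase) for phrase in phrases}

sample = "Keyword extraction is useful. Keyword ranking is fast and simple."
scores = rake_scores(candidate_phrases(sample, STOPWORDS))
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for phrase, score in ranked:
    print(f"{score:.1f}  {' '.join(phrase)}")
```

Because a word's degree grows with the length of the phrases it appears in, longer phrases tend to rank higher under this scheme.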

In [11]:
rake_nltk_var.extract_keywords_from_text(text)   # extract keywords from the text

In [12]:
keywords = rake_nltk_var.get_ranked_phrases()   # get the candidate phrases, ranked from highest to lowest score

- keywords will contain all candidate phrases found in the text, ordered from highest to lowest RAKE score.

In [13]:
print("The keywords are as:", keywords)

The keywords are as: ['nlp applications include machine translation', 'natural language processing', 'helps computers understand', 'human language', 'smooth communication', 'sentiment analysis', 'fascinating field', 'artificial intelligence', 'nlp', 'computers', 'respond', 'much', 'interpret', 'humans', 'chatbots', 'aids']


INTERPRETATION:

- Every candidate phrase is returned as a keyword, including generic single words such as 'much', 'respond', and 'aids' that say little about the topic.
- This is expected with a simple extraction method like RAKE: get_ranked_phrases() returns all candidates, not only the top few.
- Not every extracted word is equally meaningful.
- More advanced NLP methods, such as Term Frequency-Inverse Document Frequency (TF-IDF) or named entity recognition (NER), can help narrow the list to distinctive, relevant terms that capture the core message.
- While RAKE is a good start, combining it with TF-IDF or manual filtering can improve keyword quality by removing overly general terms.
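As a rough illustration of the TF-IDF idea, the sketch below drops candidate phrases whose words all appear in every document of a small corpus. The corpus, the candidate list, and the tfidf helper are hypothetical, and the unsmoothed idf = log(N / df) is only one of several common variants; a library such as scikit-learn would normally handle this.

```python
import math
from collections import Counter

# Hypothetical mini-corpus; in practice this would be a larger set of documents.
corpus = [
    "nlp helps computers understand human language",
    "computers store data and run programs",
    "computers and humans communicate through language",
]

def tfidf(doc_tokens, all_docs_tokens):
    """Return {word: tf * idf} for one tokenized document."""
    n_docs = len(all_docs_tokens)
    counts = Counter(doc_tokens)
    weights = {}
    for word, count in counts.items():
        tf = count / len(doc_tokens)
        df = sum(1 for tokens in all_docs_tokens if word in tokens)
        idf = math.log(n_docs / df)  # 0 when the word occurs in every document
        weights[word] = tf * idf
    return weights

tokenized = [doc.split() for doc in corpus]
weights = tfidf(tokenized[0], tokenized)

# Keep only candidate phrases containing at least one word with non-zero weight.
candidates = ["nlp", "computers", "human language"]
kept = [p for p in candidates if any(weights.get(w, 0.0) > 0 for w in p.split())]
print(kept)
```

Here 'computers' occurs in all three documents, so its weight is zero and it is filtered out, while 'nlp' and 'human language' are kept. A real pipeline would likely use a score threshold rather than a strict non-zero test.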