Basic Methods in NLP:

    Tokenization
    Stemming
    Lemmatization
    Stop Word Removal
    Part-of-Speech Tagging
    Named Entity Recognition (NER)
    Sentiment Analysis
    Text Classification
    Word Embeddings (e.g., Word2Vec, GloVe)
    Bag of Words (BoW)
    TF-IDF (Term Frequency-Inverse Document Frequency)
    Text Generation
    Language Modeling
    Machine Translation
    Text Summarization

Key Topics to Prepare for NLP:

    Basic Python Programming
    Data Preprocessing Techniques
    Regular Expressions
    NLP Libraries (e.g., NLTK, spaCy, Hugging Face Transformers)
    Understanding of Machine Learning Basics
    Deep Learning Fundamentals
    Neural Networks and RNNs (Recurrent Neural Networks)
    Transformers and Attention Mechanisms
    Evaluation Metrics for NLP (e.g., Precision, Recall, F1 Score)
    Ethics in NLP and AI
    Applications of NLP in Real-World Scenarios
    Text Data Formats (e.g., JSON, CSV)
    Handling Imbalanced Datasets
    Exploratory Data Analysis (EDA) for Text Data

In [None]:
# Steps to create a conda environment
# ===================================

# run these commands in cmd or Anaconda Prompt


# conda create -n text_env python=3.9 -y
# conda activate text_env

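# after activating the environment, install the libraries used in this notebook
# pip install nltk spacy transformers scikit-learn gensim textblob
# the transformers pipelines also need a backend such as PyTorch: pip install torch
# sections 15-16 only: pip install SpeechRecognition gTTS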

# python -m spacy download en_core_web_sm
# python -m spacy download en_core_web_md



## 1. Tokenization
    What is it? Tokenization is the process of breaking down text into smaller units called tokens, which can be words, phrases, or sentences. This is often the first step in NLP tasks.

Example:

    Input: "Hello, world! This is NLP."
    Output: ["Hello", ",", "world", "!", "This", "is", "NLP", "."]
Implementation:

Using NLTK:

In [1]:
import nltk
nltk.download('all')


[nltk_data] Downloading collection 'all'

True

In [2]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\DELL\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [3]:

from nltk.tokenize import word_tokenize, sent_tokenize

text = "Hello, world! This is NLP. welcome to the world of NLP."
word_tokens = word_tokenize(text)
sentence_tokens = sent_tokenize(text)

print("Word Tokens:", word_tokens)
print("Sentence Tokens:", sentence_tokens)

Word Tokens: ['Hello', ',', 'world', '!', 'This', 'is', 'NLP', '.', 'welcome', 'to', 'the', 'world', 'of', 'NLP', '.']
Sentence Tokens: ['Hello, world!', 'This is NLP.', 'welcome to the world of NLP.']


In [5]:
!python -m spacy download en_core_web_md

Collecting en-core-web-md==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.8.0/en_core_web_md-3.8.0-py3-none-any.whl (33.5 MB)

In [6]:
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.8.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


Using spaCy:

In [9]:
import spacy

nlp = spacy.load("en_core_web_sm")
text = "Hello, world! This is NLP."
doc = nlp(text)
print(doc)

word_tokens = [token.text for token in doc]
print("Word Tokens:", word_tokens)

Hello, world! This is NLP.
Word Tokens: ['Hello', ',', 'world', '!', 'This', 'is', 'NLP', '.']


## 2. Stemming
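    What is it? Stemming reduces words to a root form by stripping suffixes with heuristic rules. For example, "running" becomes "run." The result is not always a dictionary word, and irregular forms such as "ran" are left unchanged.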

Example:

    Input: "running", "ran", "runner"
    Output: "run", "ran", "runner"
Implementation:

Using NLTK:

In [10]:
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["running", "ran", "runner"]
stemmed_words = [stemmer.stem(word) for word in words]

print("Stemmed Words:", stemmed_words)

Stemmed Words: ['run', 'ran', 'runner']
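
To make the idea of rule-based suffix stripping concrete, here is a minimal sketch of a toy stemmer. It is not the Porter algorithm: the function name `naive_stem` and its short suffix list are assumptions chosen only for illustration, and real stemmers apply many more rules.

```python
def naive_stem(word):
    """Toy stemmer: strip one common suffix (NOT the Porter algorithm)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling: "runn" -> "run".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


for w in ["running", "jumped", "studies", "ran", "runner"]:
    print(w, "->", naive_stem(w))
```

Note that "studies" comes out as "studi", which is not a real word. That is typical of stemming and is one reason lemmatization is sometimes preferred.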


Using regular expressions (contraction expansion and a simple tokenizer):

In [13]:
import re

def change_contractions(text):
    """Expand a small set of English contractions in the text."""
    # Longer contractions are listed first; otherwise "can't" would
    # rewrite the start of "can't've" before the longer pattern is tried.
    contractions = {
        r"\bain't\b": "is not",
        r"\baren't\b": "are not",
        r"\bcan't've\b": "cannot have",
        r"\bcan't\b": "cannot",
        r"\bcould've\b": "could have",
        r"\bcouldn't've\b": "could not have",
        r"\bcouldn't\b": "could not",
        r"\bdidn't\b": "did not"
    }

    # Replace contractions in the text using regex (dicts keep insertion order)
    for contraction, expansion in contractions.items():
        text = re.sub(contraction, expansion, text, flags=re.IGNORECASE)

    return text


In [16]:
import re

def tokenize_sentence(text):
 
    # Regular expression pattern to match words and punctuation
    pattern = r"\w+|[^\w\s]"
    
    # Use re.findall to extract tokens that match the pattern
    tokens = re.findall(pattern, text)
    
    return tokens




In [17]:
# Example usage
sentence = "Hello, how are you doing today? Let's go!"
tokens = tokenize_sentence(sentence)
print(tokens)

['Hello', ',', 'how', 'are', 'you', 'doing', 'today', '?', 'Let', "'", 's', 'go', '!']


In [14]:
print(change_contractions("can't you go to school"))

cannot you go to school


Using spaCy (spaCy has no stemmer, so this uses its lemmatizer as the closest equivalent):

In [12]:
import spacy

nlp = spacy.load("en_core_web_sm")
words = ["running", "ran", "runner"]
stemmed_words = [token.lemma_ for token in nlp(" ".join(words))]

print("Stemmed Words:", stemmed_words)

Stemmed Words: ['run', 'ran', 'runner']


## 3. Lemmatization
    What is it? Lemmatization is similar to stemming but it reduces words to their base or dictionary form (lemma). It considers the context and converts words to their meaningful base forms.

Example:

    Input: "better", "running", "geese"
    Output: "good", "run", "goose"
Implementation:

Using NLTK:

In [18]:
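from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
# WordNet needs the part of speech: 'a' = adjective, 'v' = verb, 'n' = noun.
# Using pos='a' for every word would leave "running" and "geese" unchanged.
words = [("better", "a"), ("running", "v"), ("geese", "n")]
lemmatized_words = [lemmatizer.lemmatize(word, pos=pos) for word, pos in words]

print("Lemmatized Words:", lemmatized_words)

Lemmatized Words: ['good', 'run', 'goose']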


Using spaCy:

In [None]:
import spacy

nlp = spacy.load("en_core_web_sm")
words = ["better", "running", "geese"]
lemmatized_words = [token.lemma_ for token in nlp(" ".join(words))]

print("Lemmatized Words:", lemmatized_words)
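
Lemmatizers depend on a vocabulary plus the word's part of speech, which is why the same surface form can map to different lemmas. The sketch below illustrates that with a tiny hand-written lookup table. `LEMMA_TABLE` and `lookup_lemma` are hypothetical names for this example and are not part of NLTK or spaCy.

```python
# Hypothetical mini-dictionary keyed by (word, part of speech).
LEMMA_TABLE = {
    ("better", "ADJ"): "good",
    ("better", "VERB"): "better",
    ("running", "VERB"): "run",
    ("running", "NOUN"): "running",
    ("geese", "NOUN"): "goose",
}


def lookup_lemma(word, pos):
    """Return the lemma for (word, pos); fall back to the lowercased word."""
    return LEMMA_TABLE.get((word.lower(), pos), word.lower())


print(lookup_lemma("better", "ADJ"))    # adjective reading
print(lookup_lemma("better", "VERB"))   # verb reading, as in "to better oneself"
print(lookup_lemma("Geese", "NOUN"))
```

The two calls with "better" return different lemmas, which is the behaviour a part-of-speech-aware lemmatizer provides and a stemmer cannot.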

## 4. Stop Word Removal
What is it? Stop words are common words that are often removed from text as they may not add significant meaning (e.g., "is", "the", "and").

Example:

Input: "This is a sample sentence."
Output: ["sample", "sentence", "."] (punctuation is not a stop word, so it remains unless it is filtered separately)
Implementation:

Using NLTK:

In [None]:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('stopwords')
stop_words = set(stopwords.words('english'))
text = "This is a sample sentence."
word_tokens = word_tokenize(text)
filtered_words = [word for word in word_tokens if word.lower() not in stop_words]

print("Filtered Words:", filtered_words)

Using spaCy:

In [None]:
import spacy

nlp = spacy.load("en_core_web_sm")
text = "This is a sample sentence."
doc = nlp(text)
filtered_words = [token.text for token in doc if not token.is_stop]

print("Filtered Words:", filtered_words)
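
For comparison, the same idea can be sketched with the standard library alone. The stop list here is a small hand-picked assumption (NLTK and spaCy ship much larger lists), and `remove_stop_words` is a hypothetical helper name. Because the regex only matches word characters, punctuation is dropped as well.

```python
import re

# Small hand-picked stop list for illustration only.
STOP_WORDS = {"this", "is", "a", "the", "and", "of", "to", "in"}


def remove_stop_words(text):
    """Tokenize on word characters and drop tokens found in STOP_WORDS."""
    tokens = re.findall(r"\w+", text)
    return [tok for tok in tokens if tok.lower() not in STOP_WORDS]


print(remove_stop_words("This is a sample sentence."))
# ['sample', 'sentence']
```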

## 5. Part-of-Speech Tagging
What is it? Part-of-speech (POS) tagging is the process of labeling words in a text with their corresponding part of speech, such as noun, verb, adjective, etc.

Example:

Input: "The cat sits on the mat."
Output: [("The", "DT"), ("cat", "NN"), ("sits", "VBZ"), ("on", "IN"), ("the", "DT"), ("mat", "NN"), (".", ".")]
Implementation:

Using NLTK:

In [None]:
import nltk
from nltk.tokenize import word_tokenize

nltk.download('averaged_perceptron_tagger')  # NLTK 3.9+ may ask for 'averaged_perceptron_tagger_eng' instead
text = "The cat sits on the mat."
word_tokens = word_tokenize(text)
pos_tags = nltk.pos_tag(word_tokens)

print("POS Tags:", pos_tags)

Using spaCy:

In [None]:
import spacy

nlp = spacy.load("en_core_web_sm")
text = "The cat sits on the mat."
doc = nlp(text)
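# token.pos_ is the coarse Universal POS tag (DET, NOUN, ...);
# token.tag_ gives fine-grained Penn Treebank tags like the NLTK example
pos_tags = [(token.text, token.pos_) for token in doc]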

print("POS Tags:", pos_tags)
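
The NLTK example returns Penn Treebank tags (DT, NN, VBZ), while spaCy's `token.pos_` returns Universal POS tags (DET, NOUN, VERB). The sketch below shows how the two tag sets relate, using a partial mapping that covers only the tags in this example. `PENN_TO_UNIVERSAL` and `to_universal` are hypothetical names, and the table is an illustrative subset rather than a complete conversion.

```python
# Partial mapping from Penn Treebank tags to Universal POS tags (illustrative subset).
PENN_TO_UNIVERSAL = {
    "DT": "DET",
    "NN": "NOUN",
    "NNS": "NOUN",
    "VBZ": "VERB",
    "VBD": "VERB",
    "IN": "ADP",
    "JJ": "ADJ",
    ".": "PUNCT",
}


def to_universal(tagged_tokens):
    """Convert (word, Penn tag) pairs to (word, Universal tag); unknown tags become 'X'."""
    return [(word, PENN_TO_UNIVERSAL.get(tag, "X")) for word, tag in tagged_tokens]


penn_tags = [("The", "DT"), ("cat", "NN"), ("sits", "VBZ"),
             ("on", "IN"), ("the", "DT"), ("mat", "NN"), (".", ".")]
print(to_universal(penn_tags))
```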

## 6. Named Entity Recognition (NER)
What is it? Named Entity Recognition (NER) is the process of identifying and classifying named entities in text into predefined categories such as person names, organizations, locations, dates, etc.

Example:

Input: "Barack Obama was born in Hawaii."
Output: [("Barack Obama", "PERSON"), ("Hawaii", "GPE")]
Implementation:

Using NLTK:

In [None]:
import nltk
from nltk import ne_chunk, pos_tag, word_tokenize

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')  # needed by pos_tag
nltk.download('maxent_ne_chunker')
nltk.download('words')

text = "Barack Obama was born in Hawaii."
word_tokens = word_tokenize(text)
pos_tags = pos_tag(word_tokens)
named_entities = ne_chunk(pos_tags)

print("Named Entities:", named_entities)

Using spaCy:

In [None]:
import spacy

nlp = spacy.load("en_core_web_sm")
text = "Barack Obama was born in Hawaii."
doc = nlp(text)

named_entities = [(ent.text, ent.label_) for ent in doc.ents]
print("Named Entities:", named_entities)

## 7. Sentiment Analysis
What is it? Sentiment analysis is the process of determining the emotional tone behind a body of text, typically classifying it as positive, negative, or neutral.

Example:

Input: "I love this product!"
Output: Positive
Implementation:

Using NLTK:

In [None]:
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')
sia = SentimentIntensityAnalyzer()

text = "I love this product!"
sentiment = sia.polarity_scores(text)

print("Sentiment Scores:", sentiment)
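
`polarity_scores` returns a dictionary of scores rather than a label such as "Positive". A common convention for VADER is to threshold the `compound` score at ±0.05. The sketch below applies that convention; `label_from_compound` is a hypothetical helper, and the example dictionary is made-up input, not actual analyzer output.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# Made-up scores in the same shape as polarity_scores() output.
example_scores = {"neg": 0.0, "neu": 0.3, "pos": 0.7, "compound": 0.67}
print(label_from_compound(example_scores["compound"]))
```

In the notebook cell above, the same helper could be called as `label_from_compound(sentiment["compound"])`.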

Using TextBlob:

    (Note: spaCy does not have built-in sentiment analysis, so the textblob library is used here instead. Install it with pip install textblob.)

In [None]:
from textblob import TextBlob

text = "I love this product!"
blob = TextBlob(text)
sentiment = blob.sentiment

print("Sentiment:", sentiment)

## 8. Text Classification
What is it? Text classification is the task of assigning predefined categories to text. This can be used for spam detection, topic labeling, sentiment classification, etc.

Example:

Input: "This is a great movie!"
Output: "Positive"
Implementation:

Using NLTK:
    (For demonstration, a simple classifier can be created using Naive Bayes.)

In [None]:
import random

import nltk
from nltk.corpus import movie_reviews

nltk.download('movie_reviews')

# Prepare the dataset
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)

# Feature extraction
def document_features(words):
    return {word: True for word in words}

featuresets = [(document_features(d), c) for (d, c) in documents]
train_set, test_set = featuresets[100:], featuresets[:100]

# Train the classifier
from nltk import NaiveBayesClassifier
classifier = NaiveBayesClassifier.train(train_set)

# Test the classifier
print("Classifier accuracy:", nltk.classify.accuracy(classifier, test_set))

Using scikit-learn:

    (spaCy provides a trainable text categorizer component but no ready-made classifier, so scikit-learn is used here for a compact demonstration.)

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Sample data
data = ["I love this movie!", "This is a terrible film.", "What a great experience!", "I did not like it."]
labels = ["Positive", "Negative", "Positive", "Negative"]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.25, random_state=42)

# Create a pipeline
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

# Test the classifier
accuracy = model.score(X_test, y_test)
print("Classifier accuracy:", accuracy)
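
Accuracy alone can mislead, especially on imbalanced data, so classifiers are usually also judged by precision, recall, and F1. The following is a minimal from-scratch sketch of those metrics for one class. The function name and the label lists are made up for illustration and are not results from the classifiers above.

```python
def precision_recall_f1(y_true, y_pred, positive="Positive"):
    """Compute precision, recall and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up gold labels and predictions.
y_true = ["Positive", "Negative", "Positive", "Negative", "Positive"]
y_pred = ["Positive", "Positive", "Negative", "Negative", "Positive"]
print(precision_recall_f1(y_true, y_pred))
```

Here there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all come out to 2/3.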

## 9. Word Embeddings (e.g., Word2Vec, GloVe)
What is it? Word embeddings are vector representations of words that capture semantic meanings and relationships. They help in converting words into numerical format for machine learning models.

Example:

Input: vector("king") - vector("man") + vector("woman")
Output: A vector close to the vector for "queen"
Implementation:

Using Gensim for Word2Vec:

In [None]:
from gensim.models import Word2Vec

# Sample sentences (a toy corpus, far too small to learn meaningful vectors or neighbours)
sentences = [["king", "queen", "man", "woman"], ["I", "love", "NLP"]]
model = Word2Vec(sentences, min_count=1)

# Get vector for a word
vector = model.wv['king']
print("Vector for 'king':", vector)

# Find similar words
similar_words = model.wv.most_similar('king')
print("Similar words to 'king':", similar_words)

Using spaCy for pre-trained embeddings:

In [None]:
import spacy

nlp = spacy.load("en_core_web_md")  # Load medium model with word vectors
king = nlp("king")
queen = nlp("queen")

# Calculate similarity
similarity = king.similarity(queen)
print("Similarity between 'king' and 'queen':", similarity)
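
The similarity score above is a cosine similarity between vectors. To show the arithmetic behind the "king - man + woman" analogy, here is a minimal sketch using hand-made three-dimensional vectors. The numbers in `VECTORS` are invented for illustration and are not learned embeddings, and `cosine` and `analogy` are hypothetical helper names.

```python
import math

# Hand-made toy vectors (dimensions: royalty, male, female); not learned embeddings.
VECTORS = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "prince": [0.8, 1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def analogy(a, b, c):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))


print(analogy("king", "man", "woman"))
# queen
```

Real embeddings have hundreds of dimensions with no hand-assigned meaning, so the analogy only holds approximately there.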

## 10. Text Generation
What is it? Text generation is the process of automatically generating text based on a given input or context. This can be done using various models, including RNNs and transformers.

Example:

Input: "Once upon a time"
Output: "Once upon a time, in a land far away, there lived a brave knight."
Implementation:

Using NLTK for a simple Markov chain text generator:

In [None]:
import nltk
import random

nltk.download('punkt')

text = "Once upon a time, there was a brave knight. The knight fought many battles."
tokens = nltk.word_tokenize(text)
bigrams = list(nltk.bigrams(tokens))

# Create a dictionary of bigrams
bigram_dict = {}
for w1, w2 in bigrams:
    if w1 not in bigram_dict:
        bigram_dict[w1] = []
    bigram_dict[w1].append(w2)

# Generate text
current_word = random.choice(tokens)
generated_text = [current_word]

for _ in range(10):  # Generate 10 words
    next_words = bigram_dict.get(current_word, [None])
    current_word = random.choice(next_words)
    if current_word is None:
        break
    generated_text.append(current_word)

print("Generated Text:", ' '.join(generated_text))

Using Hugging Face Transformers for advanced text generation:

In [None]:
from transformers import pipeline

generator = pipeline('text-generation', model='gpt2')
text = generator("Once upon a time", max_length=50, num_return_sequences=1)

print("Generated Text:", text[0]['generated_text'])

## 11. Language Modeling
What is it? Language modeling is the task of predicting the next word in a sequence given the previous words. It is fundamental for various NLP applications, including text generation and speech recognition.

Example:

Input: "The cat sat on the"
Output: "mat"
Implementation:

Using NLTK for a simple n-gram model:

In [None]:
import nltk
from nltk import ngrams
from collections import Counter

text = "The cat sat on the mat. The cat is happy."
tokens = nltk.word_tokenize(text)
bigrams = list(ngrams(tokens, 2))

# Count bigrams
bigram_counts = Counter(bigrams)
print("Bigram Counts:", bigram_counts)
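
Counting bigrams is only the first half of a bigram language model. The prediction step picks the most frequent follower of the current word and estimates its probability as count(prev, next) / count(prev, anything). The sketch below does this with the standard library; a regex tokenizer stands in for `nltk.word_tokenize`, and `predict_next` is a hypothetical helper name.

```python
import re
from collections import Counter, defaultdict

text = "The cat sat on the mat. The cat is happy."
tokens = re.findall(r"\w+|[^\w\s]", text.lower())

# Count how often each word follows each context word.
followers = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    followers[prev][nxt] += 1


def predict_next(word):
    """Return (most likely next word, estimated probability), or None if unseen."""
    counts = followers.get(word)
    if not counts:
        return None
    best, freq = counts.most_common(1)[0]
    return best, freq / sum(counts.values())


print(predict_next("the"))
```

With this corpus the model predicts "cat" after "the" (2 of 3 occurrences), not "mat": a bigram model sees only one word of context, which is why longer n-grams and neural models do better on inputs like "The cat sat on the".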

Using Hugging Face Transformers for a pre-trained masked language model (BERT fills in a masked position rather than predicting the next word):

In [None]:
from transformers import pipeline

model = pipeline('fill-mask', model='bert-base-uncased')
text = "The cat sat on the [MASK]."
predictions = model(text)

print("Predictions:", predictions)

## 12. Text Summarization
What is it? Text summarization is the process of creating a concise and coherent summary of a longer text document while preserving its main ideas.

Example:

Input: "Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and humans through natural language."
Output: "NLP focuses on the interaction between computers and humans through language."
Implementation:

Using NLTK for extractive summarization:

In [None]:
import nltk
from nltk.tokenize import sent_tokenize
from nltk.probability import FreqDist

text = "Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and humans through natural language."
sentences = sent_tokenize(text)
words = nltk.word_tokenize(text.lower())
freq = FreqDist(words)

# Rank sentences based on word frequency
ranking = {i: sum(freq[word] for word in nltk.word_tokenize(sent.lower())) for i, sent in enumerate(sentences)}
top_sentences = sorted(ranking, key=ranking.get, reverse=True)[:1]  # Get top 1 sentence

summary = [sentences[i] for i in sorted(top_sentences)]
print("Summary:", ' '.join(summary))
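
Because the text above contains a single sentence, the ranking step has nothing to choose between. The sketch below repeats the same frequency-based scoring on a made-up three-sentence text so the selection is visible. It uses only the standard library, with regex splitting as a rough stand-in for `sent_tokenize`.

```python
import re
from collections import Counter

# Made-up text for illustration.
text = "NLP studies language. NLP models process language data. Cats sleep."

# Split after sentence-ending punctuation; count word frequencies over the whole text.
sentences = re.split(r"(?<=[.!?])\s+", text.strip())
freq = Counter(re.findall(r"\w+", text.lower()))


def score(sentence):
    """Sum the corpus frequency of every word in the sentence."""
    return sum(freq[w] for w in re.findall(r"\w+", sentence.lower()))


summary = max(sentences, key=score)
print("Summary:", summary)
# Summary: NLP models process language data.
```

Summing raw frequencies favours long sentences; dividing the score by sentence length is one common adjustment.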

Using Hugging Face Transformers for abstractive summarization:

In [None]:
from transformers import pipeline

summarizer = pipeline('summarization')
text = "Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and humans through natural language."
summary = summarizer(text, max_length=30, min_length=10, do_sample=False)

print("Summary:", summary[0]['summary_text'])

## 13. Topic Modeling
What is it? Topic modeling is a technique used to discover abstract topics within a collection of documents. It helps in understanding the underlying themes in large datasets.

Example:

Input: A collection of news articles
Output: Topics such as "politics," "sports," "technology"
Implementation:

Using Gensim for LDA topic modeling:

In [None]:
from gensim import corpora
from gensim.models import LdaModel

documents = ["The cat sat on the mat.", "Dogs are great pets.", "Cats and dogs are popular animals."]
texts = [[word for word in doc.lower().split()] for doc in documents]

# Create a dictionary and corpus
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Train LDA model
lda_model = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)

for idx, topic in lda_model.print_topics(-1):
    print(f"Topic {idx}: {topic}")

Using spaCy with Scikit-learn for topic modeling:

In [None]:
import spacy
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

nlp = spacy.load("en_core_web_sm")
documents = ["The cat sat on the mat.", "Dogs are great pets.", "Cats and dogs are popular animals."]

# Preprocess text
texts = [' '.join([token.lemma_ for token in nlp(doc) if not token.is_stop]) for doc in documents]

# Vectorize text
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Train LDA model
lda = LatentDirichletAllocation(n_components=2, random_state=42)
lda.fit(X)

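# Show the highest-weighted words for each topic instead of the raw weight matrix
feature_names = vectorizer.get_feature_names_out()
for idx, weights in enumerate(lda.components_):
    top_words = [feature_names[i] for i in weights.argsort()[::-1][:3]]
    print(f"Topic {idx}: {top_words}")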

## 14. Machine Translation
What is it? Machine translation is the process of automatically translating text from one language to another using algorithms and models.

Example:

Input: "Hello, how are you?"
Output: "Hola, ¿cómo estás?" (Spanish translation)
Implementation:

Using Hugging Face Transformers for translation:

In [None]:
from transformers import pipeline

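# The pipeline has no default English-to-Spanish model, so one is named explicitly
# (this model's tokenizer also needs the sentencepiece package)
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")  # English to Spanish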
text = "Hello, how are you?"
translation = translator(text)

print("Translation:", translation[0]['translation_text'])

## 15. Speech Recognition
What is it? Speech recognition is the technology that enables the conversion of spoken language into text. It is widely used in applications like virtual assistants and transcription services.

Example:

Input: "Turn on the lights."
Output: "Turn on the lights." (as text)
Implementation:

Using SpeechRecognition library:

In [None]:
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:  # microphone input requires the PyAudio package
    print("Please say something:")
    audio = recognizer.listen(source)

try:
    text = recognizer.recognize_google(audio)
    print("You said:", text)
except sr.UnknownValueError:
    print("Sorry, I could not understand the audio.")
except sr.RequestError:
    print("Could not request results from Google Speech Recognition service.")

## 16. Text-to-Speech
What is it? Text-to-speech (TTS) is a technology that converts written text into spoken words. It is used in applications like virtual assistants and accessibility tools.

Example:

Input: "Welcome to the world of NLP."
Output: Spoken version of the text.
Implementation:

Using gTTS (Google Text-to-Speech):

In [None]:
from gtts import gTTS
import os

text = "Welcome to the world of NLP."
tts = gTTS(text=text, lang='en')
tts.save("welcome.mp3")
os.system("start welcome.mp3")  # Play the audio file (Windows only; use "open" on macOS or "xdg-open" on Linux)

## 17. Text Clustering
What is it? Text clustering is the task of grouping a set of documents into clusters based on their content. It helps in organizing and summarizing large datasets.

Example:

Input: A collection of news articles
Output: Clusters of articles on similar topics.
Implementation:

Using Scikit-learn for K-means clustering:

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = ["The cat sat on the mat.", "Dogs are great pets.", "Cats and dogs are popular animals."]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)

# K-means clustering
kmeans = KMeans(n_clusters=2, random_state=42)
kmeans.fit(X)

print("Cluster labels:", kmeans.labels_)
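
`TfidfVectorizer` turns each document into a vector of TF-IDF weights before clustering. As a rough sketch of what those weights mean, the code below computes the textbook form, tf × ln(N / df), with the standard library. scikit-learn's default uses a smoothed IDF and L2-normalizes each row, so its numbers will differ. `tf_idf` is a hypothetical helper, and the documents are lowercased and stripped of punctuation by hand.

```python
import math
from collections import Counter

documents = [
    "the cat sat on the mat",
    "dogs are great pets",
    "cats and dogs are popular animals",
]
tokenized = [doc.split() for doc in documents]
n_docs = len(tokenized)

# Document frequency: number of documents containing each term.
df = Counter(term for tokens in tokenized for term in set(tokens))


def tf_idf(tokens):
    """Textbook TF-IDF: relative term frequency times ln(N / document frequency)."""
    counts = Counter(tokens)
    return {
        term: (count / len(tokens)) * math.log(n_docs / df[term])
        for term, count in counts.items()
    }


scores = [tf_idf(tokens) for tokens in tokenized]
print(scores[1])
```

In the second document, "great" and "pets" score higher than "dogs" and "are" because the latter two also appear in the third document, which lowers their IDF.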

## 18. Question Answering
What is it? Question answering is a task in NLP that involves automatically answering questions posed in natural language. It can be based on a specific context or general knowledge.

Example:

Input: "What is the capital of France?"
Output: "Paris."
Implementation:

Using Hugging Face Transformers for question answering:

In [None]:
from transformers import pipeline

qa_pipeline = pipeline("question-answering")
context = "The capital of France is Paris."
question = "What is the capital of France?"
answer = qa_pipeline(question=question, context=context)

print("Answer:", answer['answer'])

## 19. Dependency Parsing
What is it? Dependency parsing is the process of analyzing the grammatical structure of a sentence to establish relationships between words. It helps in understanding the syntactic structure of sentences.

Example:

Input: "The cat sat on the mat."
Output: A tree structure showing the relationships between words.
Implementation:

Using spaCy for dependency parsing:

In [None]:
import spacy

nlp = spacy.load("en_core_web_sm")
text = "The cat sat on the mat."
doc = nlp(text)

for token in doc:
    print(f"{token.text} --> {token.dep_} --> {token.head.text}")

## 20. Coreference Resolution
What is it? Coreference resolution is the task of determining which expressions in a text refer to the same entity. It helps in understanding the context and relationships in text.

Example:

Input: "Alice went to the park. She enjoyed the sunshine."
Output: "She" refers to "Alice."
Implementation:

Using neuralcoref with spaCy for coreference resolution:

In [None]:
# Note: spaCy has no built-in coreference resolution; the neuralcoref extension adds it.
# neuralcoref targets spaCy 2.x, so this cell may not run alongside the spaCy 3.x models installed above.
import spacy
import neuralcoref

nlp = spacy.load("en_core_web_sm")

# Add neuralcoref to the spaCy pipeline
neuralcoref.add_to_pipe(nlp)

text = "Alice went to the park. She enjoyed the sunshine."
doc = nlp(text)
if doc._.has_coref:
    for cluster in doc._.coref_clusters:
        print(f"Cluster: {cluster}")
else:
    print("No coreference found.")