In [1]:
import spacy

In [2]:
m = spacy.load("en_core_web_sm")

In [3]:
example="'In natural language processing, chinking is a technique used in information extraction to remove parts of a chunk that do not fit a specific pattern. It is essentially the inverse of the process of chunking, which involves identifying and grouping together parts of a sentence that belong to a specific syntactic category. While chunking is useful for identifying meaningful chunks of text, chinking can be used to exclude parts of a chunk that are not relevant to the analysis.Chinking was first introduced by Abney in 1991 as part of his work on natural language parsing. Steven Abney is a computational linguist who has made significant contributions to the field of computational linguistics and natural language processing. He received his Ph.D. in Linguistics from MIT in 1987 and is currently a researcher at the University of Michigan.Abney's work on chinking helped to improve the accuracy of natural language parsing by allowing for more precise identification of phrases and their parts. The technique has since become a standard tool in natural language processing and is widely used in various applications, including sentiment analysis, named entity recognition, and machine translation"
doc = m(example)

for ent in doc.ents:
    print(f"{ent.text} ... {ent.label_}")

for token in doc:
    # Print each token with its coarse POS tag and its fine-grained tag
    print(token, token.pos_, token.tag_)

Abney ... ORG
1991 ... DATE
Steven Abney ... PERSON
Ph.D. ... WORK_OF_ART
MIT ... ORG
1987 ... DATE
the University of Michigan ... ORG
Abney ... PERSON
' PUNCT ``
In ADP IN
natural ADJ JJ
language NOUN NN
processing NOUN NN
, PUNCT ,
chinking VERB VBG
is AUX VBZ
a DET DT
technique NOUN NN
used VERB VBN
in ADP IN
information NOUN NN
extraction NOUN NN
to PART TO
remove VERB VB
parts NOUN NNS
of ADP IN
a DET DT
chunk NOUN NN
that PRON WDT
do AUX VBP
not PART RB
fit VERB VB
a DET DT
specific ADJ JJ
pattern NOUN NN
. PUNCT .
It PRON PRP
is AUX VBZ
essentially ADV RB
the DET DT
inverse NOUN NN
of ADP IN
the DET DT
process NOUN NN
of ADP IN
chunking NOUN NN
, PUNCT ,
which PRON WDT
involves VERB VBZ
identifying NOUN NN
and CCONJ CC
grouping VERB VBG
together ADV RB
parts NOUN NNS
of ADP IN
a DET DT
sentence NOUN NN
that PRON WDT
belong VERB VBP
to ADP IN
a DET DT
specific ADJ JJ
syntactic ADJ JJ
category NOUN NN
. PUNCT .
While SCONJ IN
chunking VERB VBG
is AUX VBZ
useful ADJ JJ
for ADP IN
...

In [4]:
for ent in doc.ents:

    # Print the named entity and its label
    print(ent.text, ent.label_)

Abney ORG
1991 DATE
Steven Abney PERSON
Ph.D. WORK_OF_ART
MIT ORG
1987 DATE
the University of Michigan ORG
Abney PERSON


In [5]:
# Sentiment analysis of IMDB reviews
# Import libraries
import pandas as pd
df = pd.read_csv('IMDB Dataset.csv')

In [7]:
# Preprocess the text data
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

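# Uncomment on the first run to fetch the NLTK data used below
# nltk.download('stopwords')
# nltk.download('wordnet')
# nltk.download('punkt')  # tokenizer data for nltk.word_tokenize ('punkt_tab' on newer NLTK releases)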

stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def preprocess_text(text):
    # Remove HTML tags, then replace every non-alphanumeric character with a space
    text = re.sub(r'<[^>]*>', '', text)
    text = re.sub(r'[^a-zA-Z0-9]', ' ', text)
    
    # Convert to lowercase and tokenize
    tokens = nltk.word_tokenize(text.lower())
    
    # Remove stop words and lemmatize tokens
    tokens = [lemmatizer.lemmatize(token) for token in tokens if token not in stop_words]
    
    # Join tokens into a string
    return ' '.join(tokens)
    
df['review'] = df['review'].apply(preprocess_text)
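
To see what the two regular expressions in `preprocess_text` do on their own, here is a small standard-library sketch. The helper name `strip_markup` and the sample string are hypothetical. The tokenizing, stop-word, and lemmatizing steps are left out because they need the NLTK data downloads.

```python
import re

def strip_markup(text):
    """Apply only the two regex steps from preprocess_text, then lowercase."""
    text = re.sub(r'<[^>]*>', '', text)        # drop HTML tags
    text = re.sub(r'[^a-zA-Z0-9]', ' ', text)  # non-alphanumerics become spaces
    return text.lower()

# Made-up review text, not taken from the dataset
sample = "A <b>great</b> film!<br /><br />Loved it, 10/10."
cleaned = strip_markup(sample)
print(cleaned.split())

# A tag that sits directly between two words glues them together
print(strip_markup("good<br />bad"))
```

Because tags are replaced with an empty string, two words separated only by a tag are merged into one token. Substituting a space instead would avoid that, at the cost of changing the features the model sees.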

In [8]:

# Split the data into training and testing sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(df['review'], df['sentiment'], test_size=0.2, random_state=42)

In [9]:
# Vectorize the text data using the TF-IDF vectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
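
As a rough illustration of the weights `TfidfVectorizer` produces, the sketch below recomputes TF-IDF by hand with the standard library. It assumes the vectorizer's documented defaults (`smooth_idf=True`, `norm='l2'`, raw term counts) and uses a toy corpus split on whitespace, which is simpler than the vectorizer's own tokenizer. All names in it are hypothetical.

```python
import math
from collections import Counter

# Toy corpus (an assumption for illustration)
docs = ["good movie", "bad movie", "good good plot"]
tokenized = [d.split() for d in docs]

vocab = sorted({tok for doc in tokenized for tok in doc})
n_docs = len(tokenized)

# Document frequency: in how many documents each term occurs
doc_freq = Counter(tok for doc in tokenized for tok in set(doc))

# Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab}

def tfidf_row(tokens):
    """Return the L2-normalised TF-IDF vector for one tokenised document."""
    counts = Counter(tokens)
    raw = [counts[tok] * idf[tok] for tok in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm if norm else 0.0 for v in raw]

rows = [tfidf_row(doc) for doc in tokenized]
for doc, row in zip(docs, rows):
    print(doc, [round(v, 3) for v in row])
```

Terms that occur in fewer documents ("bad", "plot") get a larger IDF than terms shared across documents ("good", "movie"), and each row is scaled to unit length so long and short reviews are comparable.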

In [10]:
# Train a classifier (e.g., logistic regression) on the vectorized data
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
clf.fit(X_train_tfidf, y_train)

In [11]:
# Evaluate the classifier on the testing set
from sklearn.metrics import accuracy_score

y_pred = clf.predict(X_test_tfidf)
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)

Accuracy: 0.8958
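
Accuracy alone does not show which class the errors fall on. A minimal sketch of a per-class breakdown with `confusion_matrix` and `classification_report` from `sklearn.metrics` follows. It uses small made-up label lists rather than the notebook's `y_test` and `y_pred`, and the label strings `"negative"` and `"positive"` are an assumption about how the `sentiment` column is encoded.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Assumed label names; check df['sentiment'].unique() for the real ones
labels = ["negative", "positive"]

# Made-up ground truth and predictions, for illustration only
toy_true = ["positive", "positive", "negative", "negative", "positive", "negative"]
toy_pred = ["positive", "negative", "negative", "negative", "positive", "positive"]

# Rows are true classes and columns are predicted classes, in `labels` order
cm = confusion_matrix(toy_true, toy_pred, labels=labels)
print(cm)

# Per-class precision, recall, and F1
print(classification_report(toy_true, toy_pred, labels=labels))

toy_accuracy = accuracy_score(toy_true, toy_pred)
print("Accuracy:", toy_accuracy)
```

The same three calls can be applied to `y_test` and `y_pred` from the cell above.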


In [12]:
# Save the trained model using the pickle module
import pickle

# Save the vectorizer and classifier
with open('model.pkl', 'wb') as file:
    pickle.dump((vectorizer, clf), file)

In [13]:
# Load the saved vectorizer and classifier
with open('model.pkl', 'rb') as file:
    vectorizer, clf = pickle.load(file)
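
Once the pair is loaded, new text has to pass through the same vectorizer before it reaches the classifier. The self-contained sketch below shows that round trip on a tiny made-up corpus, pickling to memory instead of to `model.pkl`. The `toy_*` and `loaded_*` names, the reviews, and the labels are all hypothetical.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny made-up training set, for illustration only
toy_reviews = ["great film loved it", "wonderful acting great plot",
               "terrible film hated it", "boring plot awful acting"]
toy_labels = ["positive", "positive", "negative", "negative"]

toy_vectorizer = TfidfVectorizer()
toy_clf = LogisticRegression()
toy_clf.fit(toy_vectorizer.fit_transform(toy_reviews), toy_labels)

# Round-trip the (vectorizer, classifier) pair through pickle in memory
blob = pickle.dumps((toy_vectorizer, toy_clf))
loaded_vectorizer, loaded_clf = pickle.loads(blob)

# Transform with the fitted vectorizer (transform, not fit_transform), then predict
new_reviews = ["great acting", "awful boring film"]
features = loaded_vectorizer.transform(new_reviews)
predictions = loaded_clf.predict(features)
print(list(zip(new_reviews, predictions)))
```

With the real model, raw reviews should first go through `preprocess_text` so they match the training data. Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.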