In [1]:
import pandas as pd

df=pd.read_csv('/kaggle/input/nips-papers/papers.csv')
df.head(25)

Unnamed: 0,id,year,title,event_type,pdf_name,abstract,paper_text
0,1,1987,Self-Organization of Associative Database and ...,,1-self-organization-of-associative-database-an...,Abstract Missing,767\n\nSELF-ORGANIZATION OF ASSOCIATIVE DATABA...
1,10,1987,A Mean Field Theory of Layer IV of Visual Cort...,,10-a-mean-field-theory-of-layer-iv-of-visual-c...,Abstract Missing,683\n\nA MEAN FIELD THEORY OF LAYER IV OF VISU...
2,100,1988,Storing Covariance by the Associative Long-Ter...,,100-storing-covariance-by-the-associative-long...,Abstract Missing,394\n\nSTORING COVARIANCE BY THE ASSOCIATIVE\n...
3,1000,1994,Bayesian Query Construction for Neural Network...,,1000-bayesian-query-construction-for-neural-ne...,Abstract Missing,Bayesian Query Construction for Neural\nNetwor...
4,1001,1994,"Neural Network Ensembles, Cross Validation, an...",,1001-neural-network-ensembles-cross-validation...,Abstract Missing,"Neural Network Ensembles, Cross\nValidation, a..."
5,1002,1994,Using a neural net to instantiate a deformable...,,1002-using-a-neural-net-to-instantiate-a-defor...,Abstract Missing,U sing a neural net to instantiate a\ndeformab...
6,1003,1994,Plasticity-Mediated Competitive Learning,,1003-plasticity-mediated-competitive-learning.pdf,Abstract Missing,Plasticity-Mediated Competitive Learning\n\nTe...
7,1004,1994,ICEG Morphology Classification using an Analog...,,1004-iceg-morphology-classification-using-an-a...,Abstract Missing,ICEG Morphology Classification using an\nAnalo...
8,1005,1994,Real-Time Control of a Tokamak Plasma Using Ne...,,1005-real-time-control-of-a-tokamak-plasma-usi...,Abstract Missing,Real-Time Control of a Tokamak Plasma\nUsing N...
9,1006,1994,Pulsestream Synapses with Non-Volatile Analogu...,,1006-pulsestream-synapses-with-non-volatile-an...,Abstract Missing,Real-Time Control of a Tokamak Plasma\nUsing N...


In [2]:
df['paper_text']

0       767\n\nSELF-ORGANIZATION OF ASSOCIATIVE DATABA...
1       683\n\nA MEAN FIELD THEORY OF LAYER IV OF VISU...
2       394\n\nSTORING COVARIANCE BY THE ASSOCIATIVE\n...
3       Bayesian Query Construction for Neural\nNetwor...
4       Neural Network Ensembles, Cross\nValidation, a...
                              ...                        
7236    Single Transistor Learning Synapses\n\nPaul Ha...
7237    Bias, Variance and the Combination of\nLeast S...
7238    A Real Time Clustering CMOS\nNeural Engine\nT....
7239    Learning direction in global motion: two\nclas...
7240    Correlation and Interpolation Networks for\nRe...
Name: paper_text, Length: 7241, dtype: object

* The column of interest is **paper_text**, so I will ignore the others.
* You can adjust the **index** (`index_of_paper`) to select the paper you want to summarize.

In [3]:
index_of_paper=5
text=df['paper_text'][index_of_paper]
text

'U sing a neural net to instantiate a\ndeformable model\nChristopher K. I. Williams; Michael D. Revowand Geoffrey E. Hinton\nDepartment of Computer Science, University of Toronto\nToronto, Ontario, Canada M5S lA4\n\nAbstract\nDeformable models are an attractive approach to recognizing nonrigid objects which have considerable within class variability. However, there are severe search problems associated with fitting the\nmodels to data. We show that by using neural networks to provide\nbetter starting points, the search time can be significantly reduced.\nThe method is demonstrated on a character recognition task.\nIn previous work we have developed an approach to handwritten character recognition based on the use of deformable models (Hinton, Williams and Revow, 1992a;\nRevow, Williams and Hinton, 1993). We have obtained good performance with this\nmethod, but a major problem is that the search procedure for fitting each model to\nan image is very computationally intensive, because the

In [4]:
import re 
import nltk
import spacy
from bs4 import BeautifulSoup 
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize


nlp = spacy.load("en_core_web_sm")

nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /usr/share/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

# Text Processing 

**Steps**:
* Converts text to lowercase.
* Removes HTML tags.
* Removes digits.
* Removes URLs and links.
* Removes single characters followed by a period.
* Removes parentheses and their contents.
* Tokenizes the text into words.
* Removes stopwords.
* Joins the cleaned tokens into a single string.


In [5]:
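def txt_processing(doc_text):
    """Clean a raw paper text; return the token list and the joined string."""
    doc_text = doc_text.lower()  # convert to lower case
    doc_text = BeautifulSoup(doc_text, "html.parser").text  # remove HTML tags
    doc_text = re.sub(r'\d+', '', doc_text)  # remove digits
    doc_text = re.sub(r'http\S+|www\S+|https\S+', '', doc_text, flags=re.MULTILINE)  # remove links
    # Remove a single character followed by a period. The trailing \b only matches
    # when a word character follows the period (e.g. the "e." in "e.g."),
    # so initials followed by a space, such as "k. ", are kept.
    doc_text = re.sub(r'\b\w\.\b', '', doc_text)
    # Remove parentheses and their contents. "." does not match a newline here,
    # so a parenthesized span that crosses a line break is kept.
    doc_text = re.sub(r'\(.*?\)', '', doc_text)

    tokens = word_tokenize(doc_text)  # tokenize

    stop_words = set(stopwords.words('english'))
    tokens = [word for word in tokens if word not in stop_words]  # remove stopwords

    cleaned_doc = ' '.join(tokens)  # join the tokens back into one string

    return tokens, cleaned_doc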

In [6]:
# txt_processing only needs the raw string, so spaCy parsing is not required here
tokens, doc = txt_processing(text)  # doc is now the cleaned string

In [7]:
print(doc)

u sing neural net instantiate deformable model christopher k. i. williams ; michael d. revowand geoffrey e. hinton department computer science , university toronto toronto , ontario , canada ms la abstract deformable models attractive approach recognizing nonrigid objects considerable within class variability . however , severe search problems associated fitting models data . show using neural networks provide better starting points , search time significantly reduced . method demonstrated character recognition task . previous work developed approach handwritten character recognition based use deformable models ( hinton , williams revow , ; revow , williams hinton , ) . obtained good performance method , major problem search procedure fitting model image computationally intensive , efficient algorithm task . paper demonstrate possible `` compile '' knowledge gained fitting models data obtain better starting points significantly reduce search time . deformable models digit recognition b
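The cleaned output above still contains the initials `k. i.` and the citation `( hinton , williams revow , ; revow , williams hinton , )`. Both come from the regular expressions in `txt_processing`: the trailing `\b` in `\b\w\.\b` needs a word character right after the period, and `.` in `\(.*?\)` does not match the newline inside that citation. The sketch below runs the original patterns and one possible alternative on a short sample string. The alternative patterns are an assumption, not the notebook's code, and they have trade-offs: dropping the trailing `\b` also deletes a genuine one-letter word at the end of a sentence (such as "vitamin a."), and with `re.DOTALL` an unmatched "(" can swallow everything up to the next ")".

```python
import re

sample = ("christopher k. i. williams (hinton, williams and revow, a;\n"
          "revow, williams and hinton) e.g. this")

# Patterns as written in txt_processing
orig = re.sub(r'\b\w\.\b', '', sample)
orig = re.sub(r'\(.*?\)', '', orig)

# Possible alternatives (assumption, not the notebook's code)
alt = re.sub(r'\b\w\.', '', sample)                 # no trailing \b
alt = re.sub(r'\(.*?\)', '', alt, flags=re.DOTALL)  # let "." match newlines

print(orig.split())  # initials and the citation survive
print(alt.split())   # ['christopher', 'williams', 'this']
```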


# POS Tag Overview for Tokens

**POS**
* > POS stands for **P**art **o**f **S**peech.
* > POS tagging assigns a **grammatical** category to each word in a text, **such as noun, verb, adjective, etc.**
* > This helps in **understanding the syntactic structure of sentences**.

In [8]:
doc=nlp(doc)
print(f"{'Token':<15} {'Part-of-Speech':<15}")
print("-" * 30)
for token in doc[:50]: # for the first 50
    print(f"{token.text:<15} {token.pos_:<15}")

Token           Part-of-Speech 
------------------------------
u               NOUN           
sing            VERB           
neural          ADJ            
net             ADJ            
instantiate     ADJ            
deformable      ADJ            
model           NOUN           
christopher     PROPN          
k.              PROPN          
i.              PROPN          
williams        PROPN          
;               PUNCT          
michael         PROPN          
d.              PROPN          
revowand        PROPN          
geoffrey        PROPN          
e.              PROPN          
hinton          PROPN          
department      PROPN          
computer        PROPN          
science         PROPN          
,               PUNCT          
university      PROPN          
toronto         PROPN          
toronto         PROPN          
,               PUNCT          
ontario         PROPN          
,               PUNCT          
canada          PROPN          
ms       

# Filtered Tokens Based on POS Tags

In [9]:
from string import punctuation

punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

**The abbreviations below are the coarse-grained (Universal) POS tags that spaCy stores in `token.pos_`:**
> * **ADJ**: Adjective 
> * **VERB**: Verb 
> * **NOUN**: Noun
> * **INTJ**: Interjection

In [10]:
allowed_pos = ['ADJ', 'VERB', 'NOUN', 'INTJ']  # POS tags to keep
tokens_1 = []

for token in doc:
    if token.text in punctuation:  # skip punctuation characters
        continue
    elif token.pos_ in allowed_pos:
        tokens_1.append(token.text)

print(tokens_1, end=" ")

['u', 'sing', 'neural', 'net', 'instantiate', 'deformable', 'model', 'abstract', 'deformable', 'models', 'attractive', 'approach', 'recognizing', 'nonrigid', 'objects', 'considerable', 'class', 'variability', 'severe', 'search', 'problems', 'associated', 'fitting', 'models', 'data', 'show', 'using', 'neural', 'networks', 'provide', 'better', 'starting', 'points', 'search', 'time', 'reduced', 'method', 'demonstrated', 'character', 'recognition', 'task', 'previous', 'work', 'developed', 'approach', 'handwritten', 'character', 'recognition', 'based', 'use', 'deformable', 'models', 'revow', 'obtained', 'good', 'performance', 'method', 'major', 'problem', 'search', 'procedure', 'fitting', 'model', 'image', 'intensive', 'efficient', 'task', 'paper', 'demonstrate', 'possible', 'compile', 'knowledge', 'gained', 'fitting', 'models', 'data', 'obtain', 'starting', 'points', 'reduce', 'search', 'time', 'deformable', 'models', 'digit', 'recognition', 'basic', 'idea', 'using', 'deformable', 'models'

# Word Frequency Count of Filtered Tokens

In [11]:
from collections import Counter

word_freq=Counter(tokens_1)
print(word_freq,end="\t")

Counter({'model': 45, 'image': 27, 'net': 23, 'using': 23, 'models': 22, 'search': 20, 'neural': 19, 'parameters': 17, 'data': 16, 'network': 15, 'instantiation': 14, 'control': 14, 'prediction': 14, 'space': 13, 'performance': 12, 'set': 12, 'deformable': 11, 'points': 11, 'images': 11, 'used': 11, 'method': 10, 'recognition': 10, 'obtained': 10, 'annealing': 10, 'schedule': 10, 'output': 10, 'based': 9, 'possible': 9, 'digit': 9, 'test': 9, 'number': 9, 'parameter': 9, 'approach': 8, 'use': 8, 'given': 8, 'units': 8, 'networks': 7, 'p': 7, 'point': 7, 'described': 7, 'features': 7, 'hidden': 7, 'fitting': 6, 'task': 6, 'probability': 6, 'distribution': 6, 'spline': 6, 'locations': 6, 'input': 6, 'trivial': 6, 'position': 6, 'time': 5, 'work': 5, 'ink': 5, 'computer': 5, 'many': 5, 'object': 5, 'example': 5, 'digits': 5, 'generators': 5, 'mixture': 5, 'similar': 5, 'correspondence': 5, 'normalized': 5, 'schedules': 5, 'full': 5, 'instantiate': 4, 'character': 4, 'revow': 4, 'problem':
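Printing the whole `Counter` is hard to read. `Counter.most_common(k)` returns only the `k` most frequent entries as `(word, count)` pairs, highest count first, so `word_freq.most_common(10)` would show the top ten words. The helper name `top_words` below is hypothetical. The counts above also treat `model` and `models` as different words; counting spaCy's `token.lemma_` instead of `token.text` is one possible way to merge such forms.

```python
from collections import Counter


def top_words(tokens, k=10):
    """Return the k most frequent tokens as (word, count) pairs."""
    return Counter(tokens).most_common(k)


print(top_words(["model", "image", "model", "net", "model", "image"], 2))
# [('model', 3), ('image', 2)]
```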

# Extract Sentences from the Text

In [12]:
# doc is already a spaCy Doc (parsed in In [8]), so its sentences can be read directly
sent_token = [sent.text for sent in doc.sents]
len(sent_token)


188

# Sentence Scores Based on Frequencies

In [13]:
sent_score = {}
for sent in sent_token:
    sent_score[sent] = 0  # Initialize the sentence score to 0
    for word in sent.split():
        if word in word_freq:
            sent_score[sent] += word_freq[word]  # Increment the score by the word's frequency

print(sent_score)


{'u sing neural net instantiate deformable model christopher k. i. williams ; michael d. revowand geoffrey e. hinton department computer science , university toronto toronto , ontario , canada ms la abstract deformable models attractive approach recognizing nonrigid objects considerable within class variability .': 167, 'however , severe search problems associated fitting models data .': 69, 'show using neural networks provide better starting points , search time significantly reduced .': 97, 'method demonstrated character recognition task .': 31, 'previous work developed approach handwritten character recognition based use deformable models ( hinton , williams revow , ; revow , williams hinton , ) .': 91, 'obtained good performance method , major problem search procedure fitting model image computationally intensive , efficient algorithm task .': 151, "paper demonstrate possible `` compile '' knowledge gained fitting models data obtain better starting points significantly reduce searc

# Sentence Scores DataFrame

In [14]:
pd.DataFrame(list(sent_score.items()),columns=["sentence",'score'])

Unnamed: 0,sentence,score
0,u sing neural net instantiate deformable model...,167
1,"however , severe search problems associated fi...",69
2,show using neural networks provide better star...,97
3,method demonstrated character recognition task .,31
4,previous work developed approach handwritten c...,91
...,...,...
177,"zemel , r .",1
178,"s. hinton , g. e. .",0
179,discovering viewpoint-invariant relationships ...,6
180,"lippmann , r. p. , moody , j. e. , touretzky ,...",35
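The scores above are raw sums of word frequencies, so long sentences tend to win. For example, the first "sentence" (the title, author block and abstract run together) scores 167. Because `sent_score` is keyed by the sentence text, repeated sentences also share one entry, which is why the DataFrame's last index is 180 (181 rows) although 188 sentences were extracted. A common variant, sketched below as an assumption rather than the notebook's method, scales each frequency by the largest one and divides the total by the sentence length. The function name `score_sentences` is hypothetical. In the notebook it would be called as `score_sentences(sent_token, word_freq)`.

```python
def score_sentences(sentences, word_freq, normalize_length=True):
    """Score sentences by word frequency.

    Each frequency is scaled by the largest frequency in word_freq. With
    normalize_length=True the total is divided by the sentence's word count,
    so a sentence does not win just by being long. Repeated sentences share
    one dictionary entry, as in sent_score.
    """
    max_freq = max(word_freq.values())  # assumes word_freq is non-empty
    scores = {}
    for sent in sentences:
        words = sent.split()
        if not words:
            continue
        total = sum(word_freq.get(w, 0) / max_freq for w in words)
        scores[sent] = total / len(words) if normalize_length else total
    return scores


# Toy example: the long sentence wins on raw totals, the short one per word.
freq = {"model": 4, "image": 2}
short = "model image ."
long_ = "model image image filler filler filler filler filler filler ."
raw = score_sentences([short, long_], freq, normalize_length=False)
norm = score_sentences([short, long_], freq)
print(raw[short], raw[long_])    # 1.5 2.0
print(norm[short], norm[long_])  # 0.5 0.2
```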


# Summarization Based on Sentence Scores


### SentiBERT Enhancements
- **Text Summarization**: Uses BART transformer with optimized performance.
- **Sentiment Analysis**: Integrated with TextBlob for sentiment polarity and subjectivity scoring.
- **Dashboard**: Real-time visualization of summarization and sentiment analysis results using Dash and Plotly.

* You can adjust **num_of_sent** to set the number of sentences to include in the summary.

In [15]:
from heapq import nlargest

num_of_sent=3
nn=nlargest(num_of_sent,sent_score,key=sent_score.get)
" ".join(nn) # to string

'using neural net instantiate deformable model cps model cps model cps model figure : prediction network architecture . judging weight using neural net instantiate deformable model schedule number trivial net prediction net average time required settle one model . . . deformable models digit recognition basic idea using deformable models digit recognition digit model , test image classified finding model likely generated .'
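`nlargest` returns the chosen sentences ordered by score, not by where they appear, so the summary above appears to start with a figure caption and end with a sentence from the introduction. A minimal sketch of one fix, assuming you want the selected sentences in document order: pick the top `n` by score, then emit them in the order of `sent_token`. The helper `extractive_summary` is hypothetical. In the notebook it would be called as `extractive_summary(sent_token, sent_score, num_of_sent)`.

```python
from heapq import nlargest


def extractive_summary(sentences, scores, n=3):
    """Pick the n highest-scoring sentences and return them in document order."""
    top = set(nlargest(n, scores, key=scores.get))
    ordered = [s for s in sentences if s in top]  # original order of appearance
    return " ".join(dict.fromkeys(ordered))       # drop repeats, keep order


sentences = ["a .", "b .", "c .", "d ."]
scores = {"a .": 1, "b .": 5, "c .": 3, "d .": 4}
print(extractive_summary(sentences, scores, n=2))  # b . d .
```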

# Summarization using Transformers


In [16]:
from transformers import pipeline

summarizer = pipeline("summarization", model="t5-base", tokenizer="t5-base", framework="pt")



In [17]:
result_text = " ".join(tokens_1)  # POS-filtered tokens as one string
result_text

'u sing neural net instantiate deformable model abstract deformable models attractive approach recognizing nonrigid objects considerable class variability severe search problems associated fitting models data show using neural networks provide better starting points search time reduced method demonstrated character recognition task previous work developed approach handwritten character recognition based use deformable models revow obtained good performance method major problem search procedure fitting model image intensive efficient task paper demonstrate possible compile knowledge gained fitting models data obtain starting points reduce search time deformable models digit recognition basic idea using deformable models digit model test image classified finding model generated quality match model test image depends deformation model amount ink attributed noise distance remaining ink deformed model current address department computer science applied mathematics important terms assessing 

In [18]:
summary = summarizer(result_text, max_length=150, min_length=30, do_sample=False)

print(summary[0]['summary_text'])

u sing neural net instantiate deformable models attractive approach recognizing nonrigid objects considerable class variability severe search problems associated fitting models data show using neural networks provide better starting points search time reduced . phd thesis dept computer discovering viewpoint invariant relationships characterize objects .
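The whole paper is passed to `t5-base` in one call, although that model is typically used with inputs of up to 512 tokens. The summary above also reads as a string of keywords, likely because the input had its stopwords and grammar stripped, while abstractive models are trained on ordinary prose. One common workaround, sketched here as an assumption rather than the notebook's method, is to split the original `text` into word chunks, summarize each chunk, and join the partial summaries. The names `chunk_words` and `summarize_long` are hypothetical, and `max_words=300` is only a rough guess meant to stay under 512 subword tokens. The summarizer is passed in as a function, so the sketch runs with a stand-in.

```python
def chunk_words(text, max_words=300):
    """Split text into consecutive pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text, summarize_fn, max_words=300):
    """Summarize each chunk with summarize_fn and join the partial summaries."""
    return " ".join(summarize_fn(chunk) for chunk in chunk_words(text, max_words))


def first_word(chunk):
    """Stand-in summarizer for illustration: keep the first word of a chunk."""
    return chunk.split()[0]


print(summarize_long("w0 w1 w2 w3 w4 w5 w6", first_word, max_words=3))  # w0 w3 w6

# With the notebook's pipeline (not run here):
# summarize_long(text, lambda c: summarizer(c, max_length=60, min_length=10,
#                                          do_sample=False)[0]["summary_text"])
```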
