**TextRank Algorithm**

In [1]:
import re

import networkx as nx
import nltk
import numpy as np
from gensim.models import Word2Vec
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize
from scipy import spatial

nltk.download('punkt')
nltk.download('stopwords')

text='''
The Fifth generation (5G) network is projected to support large amount of data traffic and massive number of wireless connections. Different data traffic has different Quality of Service (QoS) requirements. 5G mobile network aims to address the limitations of previous cellular standards (i.e., 2G/3G/4G) and be a prospective key enabler for future Internet of Things (IoT). 5G networks support a wide range of applications such as smart home, autonomous driving, drone operations, health and mission critical applications, Industrial IoT (IIoT), and entertainment and multimedia. Based on end users’ experience, several 5G services are categorized into immersive 5G services, intelligent 5G services, omnipresent 5G services, autonomous 5G services, and public 5G services. In this paper, we present a brief overview of 5G technical scenarios. We then provide a brief overview of accepted papers in our Special Issue on 5G mobile services and scenarios. Finally, we conclude this paper.
'''

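# Split the text into sentences.
sentences=sent_tokenize(text)

# Lowercase, strip punctuation, and drop English stop words.
sentences_clean=[re.sub(r'[^\w\s]','',sentence.lower()) for sentence in sentences]
stop_words = stopwords.words('english')
# split() with no argument also discards the leading newline and repeated spaces.
sentence_tokens=[[word for word in sentence.split() if word not in stop_words] for sentence in sentences_clean]

# Train a 1-dimensional Word2Vec model so that each word maps to a single number
# (gensim >= 4.0 names: vector_size, epochs, and lookups through w2v.wv).
w2v=Word2Vec(sentence_tokens,vector_size=1,min_count=1,epochs=1000)
sentence_embeddings=[[w2v.wv[word][0] for word in words] for words in sentence_tokens]
# Zero-pad every sentence vector to the length of the longest sentence.
max_len=max(len(tokens) for tokens in sentence_tokens)
sentence_embeddings=[np.pad(embedding,(0,max_len-len(embedding)),'constant') for embedding in sentence_embeddings]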

# Pairwise cosine similarity between the padded sentence vectors.
similarity_matrix = np.zeros([len(sentence_tokens), len(sentence_tokens)])
for i,row_embedding in enumerate(sentence_embeddings):
    for j,column_embedding in enumerate(sentence_embeddings):
        similarity_matrix[i][j]=1-spatial.distance.cosine(row_embedding,column_embedding)

# Treat the similarity matrix as a weighted graph and rank the sentences with PageRank.
nx_graph = nx.from_numpy_array(similarity_matrix)
scores = nx.pagerank(nx_graph)

# Keep the five highest-scoring sentences.
top_sentence={sentence:scores[index] for index,sentence in enumerate(sentences)}
top=dict(sorted(top_sentence.items(), key=lambda x: x[1], reverse=True)[:5])

# Emit the selected sentences in their original document order.
string=''
for sent in sentences:
    if sent in top:
        string+=sent.replace("\n"," ")
print(string)



[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
 The Fifth generation (5G) network is projected to support large amount of data traffic and massive number of wireless connections.Different data traffic has different Quality of Service (QoS) requirements.5G mobile network aims to address the limitations of previous cellular standards (i.e., 2G/3G/4G) and be a prospective key enabler for future Internet of Things (IoT).Based on end users’ experience, several 5G services are categorized into immersive 5G services, intelligent 5G services, omnipresent 5G services, autonomous 5G services, and public 5G services.We then provide a brief overview of accepted papers in our Special Issue on 5G mobile services and scenarios.
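
The ranking step above relies on `nx.pagerank`. As a rough illustration of what that call computes, here is a minimal power-iteration sketch in plain Python. It is not networkx's implementation: the function name `pagerank_scores`, the tolerance, and the toy matrix are assumptions made for this example, and it assumes non-negative weights. The damping factor of 0.85 matches the networkx default.

```python
def pagerank_scores(weights, damping=0.85, tol=1e-8, max_iter=200):
    """Power-iteration PageRank on a dense, non-negative weight matrix."""
    n = len(weights)
    # Total outgoing weight of each node (row sums).
    out_weight = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by nodes without outgoing weight is spread uniformly.
        dangling = sum(scores[i] for i in range(n) if out_weight[i] == 0)
        new_scores = []
        for j in range(n):
            incoming = sum(
                scores[i] * weights[i][j] / out_weight[i]
                for i in range(n)
                if out_weight[i] > 0
            )
            new_scores.append((1 - damping) / n + damping * (incoming + dangling / n))
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


# Hypothetical 3-sentence similarity matrix (symmetric, zero diagonal).
toy_similarity = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.8],
    [0.1, 0.8, 0.0],
]
toy_scores = pagerank_scores(toy_similarity)
# Sentence indices ordered from most to least central.
ranking = sorted(range(len(toy_scores)), key=lambda i: toy_scores[i], reverse=True)
```

In the toy matrix the middle sentence is strongly similar to both of the others, so it ends up with the highest score; TextRank selects summary sentences on the same basis.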




In [2]:
text1='''
The Fifth generation (5G) network is projected to support large amount of data traffic and massive number of wireless connections. Different data traffic has different Quality of Service (QoS) requirements. 5G mobile network aims to address the limitations of previous cellular standards (i.e., 2G/3G/4G) and be a prospective key enabler for future Internet of Things (IoT). 5G networks support a wide range of applications such as smart home, autonomous driving, drone operations, health and mission critical applications, Industrial IoT (IIoT), and entertainment and multimedia. Based on end users’ experience, several 5G services are categorized into immersive 5G services, intelligent 5G services, omnipresent 5G services, autonomous 5G services, and public 5G services. In this paper, we present a brief overview of 5G technical scenarios. We then provide a brief overview of accepted papers in our Special Issue on 5G mobile services and scenarios. Finally, we conclude this paper.
'''
ref1='''
5G network is projected to support large amount of data traffic and massive number of wireless connections. 5G networks support a wide range of applications such as smart home, autonomous driving, drone operations, health and mission critical applications.
'''

text2='''
The modern technology demands the maintenance of the increasing data which are in structured and unstructured form. The text documents collected in the various platforms occupies a massive space in the architectural structure of the computer system both physically and virtually. Apparently the users demand the summarizing of the collected documents for easy access and usage. To enable this automatic text summarization came into phase. The automatic text summarization condenses the text documents into meaningful phrases and textual messages which helps the user to understand the conceptual ides behind each core values. The importance of automatic text summarization stands as a helping source in the growing data. This paper discusses the basic blocks of the automatic text summarization and its feature in identifying the intricate properties of the meaningful text through various approaches.
'''
ref2='''
Automatic text summarization condenses the text documents into meaningful phrases and textual messages. This paper discusses the basic blocks of the automatic text summarizing and its feature in identifying the intricate properties.
'''

text3='''
We present an Integer Linear Program for exact inference under a maximum coverage model for automatic summarization. We compare our model, which operates at the sub-sentence or “concept”-level, to a sentence-level model, previously solved with an ILP. Our model scales more efficiently to larger problems because it does not require a quadratic number of variables to address redundancy in pairs of selected sentences. We also show how to include sentence compression in the ILP formulation, which has the desirable property of performing compression and sentence selection simultaneously. The resulting system performs at least as well as the best systems participating in the recent Text Analysis Conference, as judged by a variety of automatic and manual content-based metrics.
'''
ref3='''
We present a model for exact inference under a maximum coverage model for automatic summarization. We compare our model, which operates at the sub-sentence or "concept"-level, to a sentence-level model. Our model scales more efficiently to larger problems because it does not require a quadratic number of variables.
'''

text4='''
In this paper, we propose a universal solution to web search and web browsing on handheld devices for visually impaired people. For this purpose, we propose (1) to automatically cluster web page results and (2) to summarize all the information in web pages so that speech-to-speech interaction is used efficiently to access information.
'''
ref4='''
In this paper, we propose a universal solution to web search and web browsing on handheld devices for visually impaired people.
'''

text5='''
Automatic text summarization (ATS) is the process of generating a summary by condensing text document by a computer machine. In this paper, we explored voting-based extractive approaches for text summarization. The main issue with most of the feature-based ATS methods is to find optimal feature weights for sentence scoring to optimize the quality of summary. Voting-based methods are sensitive to initial ranking process. We proposed reciprocal ranking-based sentence scoring approach that alleviates the feature weighting and initial ranking problem. The proposed approach uses a specific prominent set of features for initial ranking that further enhance the performance. Experimental results on Document Understating Conference 2002 data-set using ROUGE evaluation matrices shows that our proposed method performs better as compared to other voting-based methods.
'''
ref5='''
Automatic text summarization is the process of generating a summary by condensing text document by a computer machine. Main issue with most of the feature-based ATS methods is to find optimal feature weights for sentence scoring to optimize the quality of summary.
'''

text6='''
Vehicular Ad-Hoc Networks (VANET) are considered as a subset of Mobile Ad-Hoc Networks (MANET). VANET is mainly used for the construction of an intelligent transport system. VANET enables communication between the vehicles (V2V) and vehicles to infrastructure (V2I). VANET can be used to coordinate the traffic, improve safety measures, support the drivers for hassle-free driving. It plays a major role in building smart cities in the near future. VANET is vulnerable to a number of security issues among which the DoS attack is a major part. DoS attack in VANET involves a malicious node flooding a huge amount of traffic using spoofed identities. This, in turn, may disrupt the services of vehicles in the network. The detection of the attack becomes very difficult due to fake identities. The detection scheme uses a cuckoo filter and IP detection technique to detect the attack in the network. Once the attack is detected it generates a broadcast message to all the other vehicles that are present in the network.
'''
ref6='''
VANET is a subset of Mobile Ad-Hoc Networks (MANET) It enables communication between the vehicles (V2V) and vehicles to infrastructure. VANET can be used to coordinate the traffic, improve safety measures, support the drivers for hassle-free driving.
'''

text7='''
Face identification is the problem of determining whether two face images depict the same person or not. This is difficult due to variations in scale, pose, lighting, background, expression, hairstyle, and glasses. In this paper we present two methods for learning robust distance measures: (a) a logistic discriminant approach which learns the metric from a set of labelled image pairs (LDML) and (b) a nearest neighbour approach which computes the probability for two images to belong to the same class (MkNN). We evaluate our approaches on the Labeled Faces in the Wild data set, a large and very challenging data set of faces from Yahoo! News. The evaluation protocol for this data set defines a restricted setting, where a fixed set of positive and negative image pairs is given, as well as an unrestricted one, where faces are labelled by their identity. We are the first to present results for the unrestricted setting, and show that our methods benefit from this richer training data, much more so than the current state-of-the-art method. Our results of 79.3% and 87.5% correct for the restricted and unrestricted setting respectively, significantly improve over the current state-of-the-art result of 78.5%. Confidence scores obtained for face identification can be used for many applications e.g. clustering or recognition from a single training example. We show that our learned metrics also improve performance for these tasks.
'''
ref7='''
Face identification is the problem of determining whether two face images depict the same person or not. This is difficult due to variations in scale, pose, lighting, background, expression, hairstyle, and glasses. In this paper we present two methods for learning robust distance measures. We show our methods benefit from richer training data, much more so than the current state-of-the-art method.
'''

text8='''
Identification of different risk factors and early prediction of mortality for patients with heart failure are crucial for guiding clinical decision-making in Intensive care unit cohorts. In this paper, we developed a comprehensive risk model for predicting heart failure mortality with a high level of accuracy using an improved random survival forest (iRSF). Utilizing a novel split rule and stopping criterion, the proposed iRSF was able to identify more accurate predictors to separate survivors and nonsurvivors and thus improve discrimination ability. Based on the public MIMIC II clinical database with 8 059 patients, 32 risk factors, including demographics, clinical, laboratory information, and medications, were analyzed and used to develop the risk model for patients with heart failure. Compared with previous studies, more critical laboratory predictors were identified that could reveal difficult-to-manage comorbidities, including aspartate aminotransferase, alanine aminotransferase, total bilirubin, serum creatine, blood urea nitrogen, and their inherent effects on events; these were determined to be critical indicators for predicting heart failure mortality with the proposed iRSF. The experimental results showed that the developed risk model was superior to those used in previous studies and the conventional random survival forest-based model with an out-of-bag C-statistic value of 0.821. Therefore, the developed iRSF-based risk model could serve as a valuable tool for clinicians in heart failure mortality prediction.
'''
ref8='''
Early prediction of mortality for patients with heart failure is crucial for guiding clinical decision-making. In this paper, we developed a risk model for predicting heart failure mortality with a high level of accuracy using an improved random survival forest (iRSF) The developed iRSF-based risk model could serve as a valuable tool for clinicians.
'''

text9='''
Naive Bayes is one of the most widely used algorithms in classification problems because of its simplicity, effectiveness, and robustness. It is suitable for many learning scenarios, such as image classification, fraud detection, web mining, and text classification. Naive Bayes is a probabilistic approach based on assumptions that features are independent of each other and that their weights are equally important. However, in practice, features may be interrelated. In that case, such assumptions may cause a dramatic decrease in performance. In this study, by following preprocessing steps, a Feature Dependent Naive Bayes (FDNB) classification method is proposed. Features are included for calculation as pairs to create dependence between one another. This method was applied to the software defect prediction problem and experiments were carried out using widely recognized NASA PROMISE data sets. The obtained results show that this new method is more successful than the standard Naive Bayes approach and that it has a competitive performance with other feature-weighting techniques. A further aim of this study is to demonstrate that to be reliable, a learning model must be constructed by using only training data, as otherwise misleading results arise from the use of the entire data set.
'''
ref9='''
Naive Bayes is a probabilistic approach based on assumptions that features are independent of each other. It is suitable for many learning scenarios, such as image classification, fraud detection, web mining, and text classification. In this study, by following preprocessing steps, a Feature Dependent Naives (FDNB) classification method is proposed.
'''

text10='''
Extracting relevant feature and classification are significant in brain-computer interface (BCI) systems. Deep learning have achieved remarkable growth in many fields like speech recognition and computer vision. However, deep learning in biomedical field is yet to be fully utilized. In this paper, We propose a novel methodology for convolutional neural network (CNN) based motor imagery (MI) classification using new form of input. Continuous Wavelet Transform (CWT) is applied to the input Electroencephalography (EEG) signal to extract the features of MI. After transformation, we consider the real part and imaginary part of the transformed signal to exploit magnitude and phase information at the same time. This feature is fed to the CNN having one convolution layer, one max-pooling layer and one fully connected layer. The classification accuracy is tested on two public BCI datasets: BCI competition IV dataset IIb and BCI competition II dataset III. The proposed method shows increase in classification accuracy compared to other MI classification methods. The results show that the method using CNN with magnitude and phase based features can be better than other state-of-the-art approaches.
'''
ref10='''
Deep learning in biomedical field is yet to be fully utilized. Method using CNN with magnitude and phase based features can be better than other state-of-the-art approaches. The proposed method shows increase in classification accuracy compared to other MI classification methods.
'''
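
With ten abstracts and ten references defined as separate variables, it is easy to score a summary against the wrong reference. One possible arrangement, sketched below, keeps each text next to its own reference and loops over the pairs. The helper names (`evaluate_pairs`, `first_sentence`, `exact_match`) and the two sample pairs are hypothetical stand-ins, not part of the notebook; in practice the summarizer would be the TextRank pipeline and the scorer a BLEU or ROUGE function.

```python
def evaluate_pairs(pairs, summarize, score):
    """Summarize each text and score it against its own reference."""
    results = []
    for text, reference in pairs:
        summary = summarize(text)
        results.append(score(summary, reference))
    return results


# Hypothetical stand-ins for (text1, ref1), (text2, ref2), ...
pairs = [
    ("Cats sleep a lot. They also purr.", "Cats sleep a lot."),
    ("Rust is a language. It is fast.", "Rust is fast."),
]


def first_sentence(text):
    # Placeholder summarizer: keep only the first sentence.
    return text.split(". ")[0].rstrip(".") + "."


def exact_match(summary, reference):
    # Placeholder metric: 1.0 if the strings are identical, else 0.0.
    return float(summary.strip() == reference.strip())


results = evaluate_pairs(pairs, first_sentence, exact_match)
```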



In [3]:
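from nltk.translate.bleu_score import sentence_bleu

# sentence_bleu expects a list of tokenized references and a tokenized hypothesis.
# ref10 and string are passed here as raw strings, so NLTK iterates over them
# character by character; candidate (the tokenized summary) is never used.
# string is the summary of the 5G abstract (text/text1), whose reference is ref1.
candidate = string.split()
print('BLEU score -> {}'.format(sentence_bleu(ref10, string )))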


print('Individual 1-gram: %f' % sentence_bleu(ref10, string, weights=(1, 0, 0, 0)))
print('Individual 2-gram: %f' % sentence_bleu(ref10, string, weights=(0, 1, 0, 0)))
print('Individual 3-gram: %f' % sentence_bleu(ref10, string, weights=(0, 0, 1, 0)))
print('Individual 4-gram: %f' % sentence_bleu(ref10, string, weights=(0, 0, 0, 1)))

Corpus/Sentence contains 0 counts of 2-gram overlaps.
BLEU scores might be undesirable; use SmoothingFunction().


BLEU score -> 0.44301394175021547
Individual 1-gram: 0.038519
Individual 2-gram: 1.000000
Individual 3-gram: 1.000000
Individual 4-gram: 1.000000
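
The "Individual n-gram" figures are modified n-gram precisions. NLTK's `sentence_bleu` expects a list of tokenized references and a tokenized hypothesis, so the inputs would normally be split into words first. The sketch below shows word-level modified precision against a single reference in plain Python. The function names and example sentences are assumptions for illustration; full BLEU also handles multiple references, combines the precisions across n, and applies a brevity penalty, all of which this sketch omits.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference_tokens, candidate_tokens, n):
    """Fraction of candidate n-grams found in the reference, with clipping."""
    cand_counts = Counter(ngrams(candidate_tokens, n))
    ref_counts = Counter(ngrams(reference_tokens, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


# Hypothetical example sentences, tokenized on whitespace.
reference = "the cat sat on the mat".split()
candidate = "the the the cat".split()
p1 = modified_precision(reference, candidate, 1)  # "the" is clipped from 3 to 2
p2 = modified_precision(reference, candidate, 2)  # only ("the", "cat") matches
```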


In [4]:
!pip install rouge

Collecting rouge
  Downloading https://files.pythonhosted.org/packages/43/cc/e18e33be20971ff73a056ebdb023476b5a545e744e3fc22acd8c758f1e0d/rouge-1.0.0-py3-none-any.whl
Installing collected packages: rouge
Successfully installed rouge-1.0.0


In [5]:
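from rouge import Rouge
rouge = Rouge()
# get_scores takes (hypothesis, reference). As written, ref10 is treated as the
# hypothesis and the generated summary as the reference. string summarizes the
# 5G abstract (text/text1), whose reference is ref1.
# Stored under a new name so the PageRank scores from cell 1 are not overwritten.
rouge_scores = rouge.get_scores(ref10, string)
print(rouge_scores)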


[{'rouge-1': {'f': 0.09589040686057437, 'p': 0.16666666666666666, 'r': 0.0673076923076923}, 'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0}, 'rouge-l': {'f': 0.10526315351031106, 'p': 0.16216216216216217, 'r': 0.07792207792207792}}]
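
To make the `p`, `r`, and `f` fields concrete, here is a minimal plain-Python sketch of ROUGE-1 as unigram overlap between a hypothesis and a reference. The function name `rouge_1`, the whitespace tokenization, and the example sentences are assumptions for illustration; the `rouge` package's own tokenization and counting may differ, so its numbers need not match this sketch exactly.

```python
from collections import Counter


def rouge_1(hypothesis, reference):
    """Unigram-overlap precision, recall, and F1 between two strings."""
    hyp_counts = Counter(hypothesis.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Multiset intersection: each word counts as often as it appears in both.
    overlap = sum((hyp_counts & ref_counts).values())
    precision = overlap / max(sum(hyp_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"p": precision, "r": recall, "f": f1}


# Hypothetical example: a short summary scored against a longer reference.
result = rouge_1("the cat sat", "the cat sat on the mat")
```

Precision is measured against the length of the hypothesis and recall against the length of the reference, so swapping the two arguments swaps `p` and `r`.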
