### [Web Scraping](https://github.com/prasadpatil99/Web-Scrapping/blob/master/Simple-Web-Scrapping.py)

In [1]:
# %load Simple-Web-Scrapping.py
import urllib.request
import nltk
import re
import bs4 
from nltk.corpus import stopwords

link = 'https://en.wikipedia.org/wiki/Quantum_machine_learning'

def req(link):
    raw = urllib.request.urlopen(link).read()
    soup = bs4.BeautifulSoup(raw, 'lxml')
    return soup

raw_data = req(link)

text = ""
for paragraph in raw_data.find_all('p'):   # extract from 'p' tag
    text += paragraph.text
    
def preprocess(data):
    text = re.sub(r'\[[0-9]*\]',' ',data) # remove bracketed citation markers such as [1]
    text = re.sub(r'\s+',' ',text) # collapse runs of whitespace into a single space
    text = text.lower()
    return text

dataset = preprocess(text)
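
The effect of the cleaning step can be checked on a short string without fetching any page. This is a minimal sketch: `normalize` is a hypothetical stand-in that applies the same two regular expressions and the lowercasing as `preprocess`, and the sample sentence is made up for illustration.

```python
import re

def normalize(data):
    # Hypothetical stand-in mirroring the cleaning steps of preprocess()
    text = re.sub(r'\[[0-9]*\]', ' ', data)   # drop citation markers like [1]
    text = re.sub(r'\s+', ' ', text)          # collapse whitespace, including newlines
    return text.lower()

# Made-up sample text with citation markers and irregular spacing
sample = "Quantum computing[1] is   an\nemerging field.[23]"
cleaned = normalize(sample)
print(repr(cleaned))
```

Note that a citation marker at the very end of the text leaves one trailing space behind, because each marker is replaced by a space rather than removed.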

### Tokenize Sentence

In [2]:
clean_text = nltk.sent_tokenize(dataset)
clean_text[0:5]

['quantum machine learning is an emerging interdisciplinary research area at the intersection of quantum physics and machine learning.',
 'the most common use of the term refers to machine learning algorithms for the analysis of classical data executed on a quantum computer, i.e.',
 'quantum-enhanced machine learning.',
 'while machine learning algorithms are used to compute immense quantities of data, quantum machine learning increases such capabilities intelligently, by creating opportunities to conduct analysis on quantum states and systems.',
 'this includes hybrid methods that involve both classical and quantum processing, where computationally difficult subroutines are outsourced to a quantum device.']

### Initialize Stopwords 

In [3]:
stop_words = nltk.corpus.stopwords.words('english')

In [4]:
# Print the first 9 items of a dictionary
def display(feat):
    import itertools 
    out = dict(itertools.islice(feat.items(), 9))   
    print(" " + str(out))

### Word Count

Count how often each word that is not a stop word occurs, keyed by the word itself

In [5]:
wordfreq = {}

for word in nltk.word_tokenize(dataset):
    if word not in stop_words:
        if word not in wordfreq.keys():
            wordfreq[word] = 1
        else:
            wordfreq[word] += 1
            
display(wordfreq)

 {'quantum': 139, 'machine': 35, 'learning': 69, 'emerging': 1, 'interdisciplinary': 1, 'research': 7, 'area': 1, 'intersection': 1, 'physics': 2}
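
The same counting can be sketched with `collections.Counter` from the standard library. The token list and stop word set below are small made-up stand-ins for `nltk.word_tokenize(dataset)` and the NLTK stop word list, so the example runs without NLTK data.

```python
from collections import Counter

# Made-up stand-ins for the tokenized dataset and the NLTK stop words
tokens = ['quantum', 'machine', 'learning', 'is', 'the', 'study',
          'of', 'quantum', 'learning', 'quantum']
stop_words = {'is', 'the', 'of'}

# Counter handles the "first occurrence vs. increment" branching internally
wordfreq = Counter(tok for tok in tokens if tok not in stop_words)
print(wordfreq.most_common(2))
```

Converting the stop word list to a `set` makes each membership test constant time, which can matter on longer texts.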


### Weighted Frequency of Words

Divide each word's count by the largest count, so every weight falls between 0 and 1

In [6]:
max_count = max(wordfreq.values())

for key in wordfreq.keys():
    wordfreq[key] = wordfreq[key]/max_count
    
display(wordfreq)

 {'quantum': 0.8323353293413174, 'machine': 0.20958083832335328, 'learning': 0.41317365269461076, 'emerging': 0.005988023952095809, 'interdisciplinary': 0.005988023952095809, 'research': 0.041916167664670656, 'area': 0.005988023952095809, 'intersection': 0.005988023952095809, 'physics': 0.011976047904191617}
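
In the output above, 'quantum' ends up at about 0.83 rather than 1.0, so some token outside the first nine has the largest count (139 / 0.8323 gives 167). It may be a punctuation token, since nothing in the cells filters punctuation out, but that is an assumption. The normalization itself can be sketched on a made-up dictionary:

```python
# Made-up counts for illustration
counts = {'quantum': 4, 'learning': 2, 'physics': 1}

max_count = max(counts.values())

# Build a new dictionary so the raw counts stay available
weighted = {word: count / max_count for word, count in counts.items()}
print(weighted)
```

The most frequent word always receives weight 1.0, and every other word is expressed as a fraction of it.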


### Evaluate Sentence Score

Score each sentence that has fewer than 25 words by summing the weighted frequencies of its words. If the sentence is not yet in the dictionary, add it with the weighted frequency of its first matching word; otherwise add the word's weighted frequency to the sentence's existing score.

In [7]:
sentscore = {}

for sentence in clean_text:
    if len(sentence.split(' ')) < 25:   # skip long sentences
        for word in nltk.word_tokenize(sentence):   # score only this sentence's own words
            if word in wordfreq:
                if sentence not in sentscore:
                    sentscore[sentence] = wordfreq[word]
                else:
                    sentscore[sentence] += wordfreq[word]
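
Scoring a sentence by the summed weights of its own words can be sketched on toy data. This is an illustration under stated assumptions, not the notebook's exact code: `score_sentences` is a hypothetical helper, a simple regular expression stands in for `nltk.word_tokenize`, and the weights and sentences are made up.

```python
import re

def score_sentences(sentences, wordfreq, max_words=25):
    """Hypothetical helper: sum the weights of each short sentence's words."""
    scores = {}
    for sentence in sentences:
        if len(sentence.split(' ')) >= max_words:
            continue  # skip long sentences
        # Simple regex tokenizer used here in place of nltk.word_tokenize
        tokens = re.findall(r"[a-z0-9'-]+", sentence.lower())
        matched = [wordfreq[tok] for tok in tokens if tok in wordfreq]
        if matched:  # only sentences with at least one weighted word get a score
            scores[sentence] = sum(matched)
    return scores

# Made-up weights and sentences
weights = {'quantum': 1.0, 'learning': 0.5, 'physics': 0.25}
sentences = ['quantum learning uses quantum physics.',
             'learning is fun.',
             'nothing relevant here.']
scores = score_sentences(sentences, weights)
print(scores)
```

A word that appears twice in a sentence contributes its weight twice, so sentences that repeat high-frequency terms score higher.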

### Results
Collect the first 8 scored sentences, in document order

In [8]:
summ = []
for sentence in sentscore.keys():
    summ.append(sentence)
    if len(summ) > 7:   # stop once 8 sentences are collected
        break

In [9]:
for sentence in summ:
    print(sentence, end='')

quantum machine learning is an emerging interdisciplinary research area at the intersection of quantum physics and machine learning.the most common use of the term refers to machine learning algorithms for the analysis of classical data executed on a quantum computer, i.e.quantum-enhanced machine learning.this includes hybrid methods that involve both classical and quantum processing, where computationally difficult subroutines are outsourced to a quantum device.these routines can be more complex in nature and executed faster with the assistance of quantum devices.furthermore, quantum algorithms can be used to analyze quantum states instead of classical data.beyond quantum computing, the term "quantum machine learning" is often associated with classical machine learning methods applied to data generated from quantum experiments (i.e.machine learning of quantum systems), such as learning quantum phase transitions or creating new quantum experiments.
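
The cells above take the first sentences in document order, so the scores in `sentscore` do not affect which sentences are selected. A common variant, sketched here as an assumption about how the scores could be used and not as the notebook's method, picks the highest-scoring sentences with `heapq.nlargest` and then restores document order. The scores below are made up.

```python
import heapq

# Made-up sentence scores, in document order
sentscore = {
    'quantum learning uses quantum physics.': 2.75,
    'learning is fun.': 0.5,
    'physics is hard.': 0.25,
    'quantum physics is strange.': 1.25,
}

# Keep the 2 highest-scoring sentences
top = heapq.nlargest(2, sentscore, key=sentscore.get)

# Restore document order so the summary reads naturally
ordered = [sentence for sentence in sentscore if sentence in top]

# Join with a space so sentences do not run together
summary = ' '.join(ordered)
print(summary)
```

Joining with a space also avoids the run-together sentences visible in the printed output above, which come from `end=''`.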