# How to Optimize Your Content for Search Questions using Deep Learning

In this notebook, we will test BERT for question answering.

BERT powers recent breakthrough advancements in how major search engines understand what users search for.

This is a companion to Hamlet Batista's article on the [Bing Webmaster Blog](https://blogs.bing.com/webmaster/july-2020/How-to-Optimize-Your-Content-for-Search-Questions-using-Deep-Learning).

Notebook prepared by [Anirudh Tatavarthi](https://www.linkedin.com/in/anirudh-tatavarthi-5840b31a8/), Tech Support Intern at [RankSense](https://www.ranksense.com/)

## Testing BERT Question Answering

First, we will test BERT question answering by asking a question about a short, manually written context.

In [None]:
#install the necessary libraries and imports
!pip install transformers==2.4.1
from transformers import pipeline

In [None]:
#run this block to see how it works
nlp = pipeline('question-answering')

nlp({
    'question': 'What is the name of the repository ?',
    'context': 'Pipeline have been included in the huggingface/transformers repository'
})

The output should consist of an answer, a score (the model's confidence in the answer), and the starting and ending indices of the answer in the source context.
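To make the start and end indices concrete, here is a minimal sketch that runs without the model. The `result` dictionary is a hand-built stand-in for the pipeline's return value, and the score is a placeholder rather than a real model output. It also assumes the indices are character offsets with an exclusive end, which you should verify against the output of your installed `transformers` version.

```python
context = 'Pipeline have been included in the huggingface/transformers repository'

# Hand-built stand-in for the pipeline output (an assumption for illustration).
# The score below is a placeholder, not a value produced by BERT.
answer = 'huggingface/transformers'
start = context.find(answer)
result = {
    'answer': answer,
    'score': 0.5,
    'start': start,
    'end': start + len(answer),
}

# Slicing the context with the two indices should give back the answer text.
recovered = context[result['start']:result['end']]
print(recovered)
print(recovered == result['answer'])
```

If the slice does not match the answer on a real run, the indices are probably measured differently (for example, with an inclusive end).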

## Testing BERT Question Answering on Your Content

We will test BERT by asking a question and feeding the text of an entire blog post as the context. **(You will need to copy the CSS selector to scrape the body of the post.)**

In [None]:
#This is necessary to scrape the body text of your web page
!pip install requests-html

In [None]:
#run this block to see how it works
from requests_html import HTMLSession
session = HTMLSession()

url = "https://azure.microsoft.com/en-us/blog/bing-delivers-its-largest-improvement-in-search-experience-using-azure-gpus/"

selector = "#main > div > div.row.row-size2 > div.column.medium-8 > article > div:nth-child(1) > div.blog-postContent"

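with session.get(url) as r:
  post = r.html.find(selector, first=True)
  # find() returns None when the selector matches nothing (e.g. the page layout changed)
  if post is None:
    raise ValueError("CSS selector did not match any element; copy it again from the page")
  text = post.text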

# Allocate a pipeline for question-answering
nlp = pipeline('question-answering')

nlp({
    'question': 'When did Bing start using BERT?',
    'context': text
})

You can open the link yourself to double-check the answer given by BERT.

## Test the similarity of a list of words with no context

By simply feeding in a list of words, you can see how embeddings gauge the similarity or relationship between the words. Embeddings that give each word a single vector, regardless of the surrounding text, are called **context-free embeddings**.


In [None]:
#Install the necessary packages
!pip install spacy
!python -m spacy download en_core_web_lg

**You may have to navigate to "Runtime>Restart and run all" in order to avoid any errors for the next block of code**

In [None]:
#run this block to see how it works
import spacy

nlp = spacy.load('en_core_web_lg') 

tokens = nlp(u'computer laptop mouse house')
 
for token1 in tokens:
    for token2 in tokens:
        print(token1.text, token2.text, token1.similarity(token2))

Notice how computer and laptop have a higher similarity rating than mouse and house, despite mouse and house looking and sounding similar.
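By default, spaCy's `similarity` is the cosine similarity between the two word vectors. The sketch below shows that calculation on its own. The three-dimensional vectors are made up for illustration and are not taken from `en_core_web_lg`, whose vectors have 300 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, invented for this example only.
toy_vectors = {
    "computer": [0.9, 0.1, 0.0],
    "laptop": [0.8, 0.2, 0.1],
    "house": [0.0, 0.2, 0.9],
}

# Print every pairwise similarity, mirroring the nested loop above.
for word1, vec1 in toy_vectors.items():
    for word2, vec2 in toy_vectors.items():
        print(word1, word2, round(cosine_similarity(vec1, vec2), 3))
```

Vectors that point in nearly the same direction score close to 1, and vectors that are nearly perpendicular score close to 0, whatever the spelling of the words.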

## Test the similarity of two words with context


This time, let's test **context-aware embeddings** by viewing the similarity between the word "Lincoln" used in two sentences with different contexts.


In [None]:
#install the necessary packages
!pip install spacy-transformers
!python -m spacy download en_trf_bertbaseuncased_lg

**You may have to navigate to "Runtime>Restart and run all" in order to avoid any errors for the next block of code**

In [None]:
#run this block to see how it works
import spacy

nlp = spacy.load("en_trf_bertbaseuncased_lg")

tokens = nlp(u'I will take the Lincoln Tunnel to go to NYC. Abraham Lincoln was the 16th president of the United States')
 
print(tokens[4].similarity(tokens[12]))

Notice how the similarity rating is not 1, despite comparing the exact same word. This is because the two sentences do not refer to the same "Lincoln": the first is a tunnel and the second is a person.
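The contrast with context-free embeddings can be sketched with a toy example. This is not how BERT computes its representations: the two-dimensional vectors are invented, and the hypothetical `toy_contextual` helper simply averages a word's vector with one neighbor's vector as a crude stand-in for mixing in context.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical static (context-free) vectors, invented for this example.
static = {
    "lincoln": [1.0, 1.0],
    "tunnel": [1.0, 0.0],
    "abraham": [0.0, 1.0],
}

def toy_contextual(word, neighbor):
    """Average a word's static vector with a neighbor's vector.

    A deliberately crude illustration of context mixing, NOT BERT's method.
    """
    return [(w + n) / 2 for w, n in zip(static[word], static[neighbor])]

# Context-free: both occurrences of "Lincoln" share one vector.
same_static = cosine_similarity(static["lincoln"], static["lincoln"])

# Toy context-aware: each occurrence is shifted toward its neighbor.
lincoln_tunnel = toy_contextual("lincoln", "tunnel")
lincoln_abraham = toy_contextual("lincoln", "abraham")
in_context = cosine_similarity(lincoln_tunnel, lincoln_abraham)

print("context-free:", round(same_static, 3))
print("toy context-aware:", round(in_context, 3))
```

With a lookup table, the same word always compares as identical. Once the neighbors influence the vector, the two occurrences drift apart, which is the effect seen in the BERT result above.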