# Generate Multiple Choice Questions from a Text

## Problem Statement
Given a text passage, we want to generate multiple-choice questions (MCQs) automatically from it. The tool can be used by educators, teachers or professionals to create assessments, evaluations, quizzes and surveys at scale.

## Introduction

MCQs have several advantages, including rapid evaluation, shorter testing time, uniform scoring, and the option of an electronic evaluation. Many examinations employ MCQ-based question papers administered in a computerised setting. Manually preparing MCQs, on the other hand, is time-consuming and costly. As a result, the research community expended much effort in developing approaches for the automatic creation of MCQs. We will explore one such method here using machine learning and natural language processing.

In [None]:
# We will use the following text as an example
# source: https://www.fresherslive.com/online-test/reading-comprehension-test-questions-and-answers
text = """The Dust Bowl, considered one of the greatest man-made ecological disasters, was a period of severe dust storms that lasted nearly a decade, starting 1931, and engulfed large parts of the US. The dust storms originated in the Great Plains-from states like Texas, Oklahoma, New Mexico, Colorado and Kansas. They were so severe that they choked everything and blocked out the sun for days. Sometimes, the storms travelled thousands of kilometres and blotted out monuments such as the Statue of Liberty. Citizens developed “dust pneumonia” and experienced chest pain and difficulty in breathing. The storms damaged the soil in around 100 million acres of land, leading to the greatest short-time migration in the American history, with approximately 3.5 million people abandoning their farms and fields.

Dust storms are an annual weather pattern in the northern region of India comprising Delhi, Haryana, Punjab, Uttar Pradesh and Rajasthan and Punjab, as also in the Sindh region of Pakistan. But, they are normally low in intensity and accompanied by rains. In fact, people welcome dust storms as they bring down temperatures and herald the arrival of the monsoons. But, the dust storms that have hit India since February this year have been quantitatively and qualitatively different from those in the past. They are high-powered storms travelling long distances and destroying properties and agricultural fields. Since February, they have affected as many as 16 states and killed more than 500 people. Cities like Delhi were choked in dust for days, with air quality level reaching the “severe” category on most days.

The Dust Bowl areas of the Great Plains are largely arid and semi-arid and prone to extended periods of drought. The US federal government encouraged settlement and development of large-scale agriculture by giving large parcels of grasslands to settlers. Waves of European settlers arrived at the beginning of the 20th century and converted grasslands into agricultural fields. At the same time, technological improvements allowed rapid mechanization of farm equipment, especially tractors and combined harvesters, which made it possible to operate larger parcels of land.

For the next two decades, agricultural land grew manifold and farmers undertook extensive deep ploughing of the topsoil with the help of tractors to plant crops like wheat. This displaced the native, deep-rooted grasses that trapped soil and moisture even during dry periods and high winds. Then, the drought struck. Successive waves of drought, which started in 1930 and ended in 1939, turned the Great Plains into bone-dry land. As the soil was already loose due to extensive ploughing, high winds turned them to dust and blew them away in huge clouds. Does this sound familiar? The dust storm regions of India and Pakistan too are largely arid and semi-arid. But they are at a lower altitude and hence less windy compared to the Great Plains. Over the last 50 years, chemical- and water-intensive agriculture has replaced the traditional low-input agriculture. Canal irrigation has been overtaken by the groundwater irrigation. In addition, mechanized agriculture has led to deeper ploughing, loosening more and more topsoil. The result has been devastating for the soil and groundwater. In most of these areas, the soil has been depleted and groundwater levels have fallen precipitously. On top of the man-made ecological destruction, the natural climatic cycle along with climate change is affecting the weather pattern of this region.

First, this area too is prone to prolonged drought. In fact, large parts of Haryana, Punjab, Delhi and western UP have experienced mildly dry to extremely dry conditions in the last six years. The Standardized Precipitation Index (SPI), which specifies the level of dryness or excess rains in an area, of large parts of Haryana, Punjab and Delhi has been negative since 2012. Rajasthan, on the other hand shows a positive SPI or excess rainfall. Second, this area is experiencing increasing temperatures. In fact, there seems to be a strong correlation between the dust storms and the rapid increase in temperature. Maximum temperatures across northern and western India have been far higher than normal since April this year. Last, climate change is affecting the pattern of Western Disturbances (WDs), leading to stronger winds and stronger storms. WDs are storms originating in the Mediterranean region that bring winter rain to northwestern India. But because of the warming of the Arctic and the Tibetan Plateau, indications are that the WDs are becoming unseasonal, frequent and stronger.

The Dust Bowl led the US government to initiate a large-scale land-management and soil-conservation programme. Large-scale shelterbelt plantations, contour ploughing, conservation agriculture and establishment of conservation areas to keep millions of acres as grassland, helped halt wind erosion and dust storms. It is time India too recognizes its own Dust Bowl and initiates a large-scale ecological restoration programme to halt it. Else, we will see more intense dust storms, and a choked Delhi would be a permanent feature.
"""

In [None]:
len(text)

5163

## Sentence Selection and Question Generation
To generate questions, we must first identify significant sentences that hold a fact or piece of knowledge. There are two broad ways to do this:

### a. Named Entity Recognition:
We can identify important names and locations and formulate questions around them.
If we want to test grammar skills, we can instead identify verbs, nouns and adpositions.

### b. Keyword Extraction:
We identify important keywords in our text, and then formulate questions that have those keywords as answers. We will explore this method in this notebook.
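
To make the idea of similarity-based keyword ranking concrete, here is a minimal sketch. It is not KeyBERT: it swaps the BERT embeddings for plain word-count vectors (an assumption made so the example runs with the standard library only), which reduces the ranking to a normalised word frequency. The names `tokenize`, `cosine`, `rank_keywords` and `STOP_WORDS` are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; real extractors use a fuller list).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "that", "was", "were", "is", "are"}

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens without stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_keywords(text, top_n=3):
    """Rank each distinct word by the similarity of its vector to the document vector."""
    doc_vec = Counter(tokenize(text))
    scored = [(word, cosine(Counter([word]), doc_vec)) for word in doc_vec]
    # Highest score first; ties are broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]

sample = "Dust storms damaged the soil. The dust storms blocked out the sun."
top_keywords = rank_keywords(sample)
print(top_keywords)
```

With real sentence embeddings, the same scoring step can favour words that are semantically central to the passage rather than merely frequent.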

### Question Generation:
For generating questions, we have two broad options:<br>
a. We can train a model that takes an answer and a context as input and generates a question. This is a sequence-to-sequence problem, and it is the reverse of the question answering task, where we feed the model a question and a context and it generates an answer. The same datasets can be reused to train question generation. The advantage of this method is that it produces properly phrased questions.

b. A simpler way is to replace the keyword in the original sentence with a blank. This generates only fill-in-the-blank (declarative) questions, but it is straightforward to implement and needs no machine learning model.

We explore both ways in this notebook, but we preferred the second method in production.
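
The fill-in-the-blank approach can be sketched in a few lines. This is an illustrative helper, not the notebook's final implementation: the name `make_cloze`, the `BLANK` constant and the whole-word matching rule are assumptions.

```python
import re

BLANK = "_____"

def make_cloze(sentence, keyword, blank=BLANK):
    """Replace the first case-insensitive, whole-word match of keyword with a blank.

    Returns None when the keyword does not occur in the sentence.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", flags=re.IGNORECASE)
    if not pattern.search(sentence):
        return None
    # A function replacement avoids any backslash handling in the blank string.
    return pattern.sub(lambda _match: blank, sentence, count=1)

question = make_cloze("Then, the drought struck.", "drought")
print(question)
```

Returning `None` for a missing keyword lets the caller skip that keyword instead of emitting a question without a blank.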

## Identifying Keywords

In [None]:
!pip install keybert > /dev/null
from sentence_transformers import SentenceTransformer
from keybert import KeyBERT
from nltk.tokenize import sent_tokenize
import nltk

# we use nltk library to tokenize our text
nltk.download('punkt')

# KeyBert uses BERT-embeddings and simple cosine similarity to find the sub-phrases in a document that are the most similar to the document itself.
sentence_model = SentenceTransformer("all-MiniLM-L6-v2")
kw_model = KeyBERT(sentence_model)

def get_keywords(text):
    """
    Given @input text, identify important keywords. 
    Here we use Sentence Transformer to extract keywords that best describe the text
    """
    keywords_with_scores = kw_model.extract_keywords(text, keyphrase_ngram_range=(1, 2), top_n=5, stop_words='english')
    keywords = [kw[0] for kw in keywords_with_scores]
    scores = [kw[1] for kw in keywords_with_scores]
    return keywords

def tokenize_sentences(text):
    """
    Given a @text input, returns tokenized sentences
    """
    sentences = [sent_tokenize(text)]
    sentences = [sentence for paragraph in sentences for sentence in paragraph]

    # Remove sentences shorter than 20 letters.
    sentences = [sentence.strip() for sentence in sentences if len(sentence) > 20]
    return sentences

def get_sentences_for_keyword(kw_model, sentences):
    """
    @kw_model: keyBERT model to extract keywords
    @sentences: list of tokenized sentences
    returns a map with keywords as keys mapped to the sentences they appear in.
    """
    keyword_sentences = {}
    for sentence in sentences:
        keywords_found = [kw[0] for kw in kw_model.extract_keywords(sentence, keyphrase_ngram_range=(1, 2), top_n=10) if len(kw[0]) > 2]
        for key in keywords_found:
            keyword_sentences[key] = keyword_sentences.get(key, [])
            keyword_sentences[key].append(sentence)

    return keyword_sentences


In [None]:
# Tokenize text into sentences
sentences = tokenize_sentences(text)

# find important keywords along with sentences they appear in
keyword_to_sentences_map = get_sentences_for_keyword(kw_model, sentences)

for word in keyword_to_sentences_map:
    print (word, " : ",keyword_to_sentences_map[word],"\n")

dust  :  ['The Dust Bowl, considered one of the greatest man-made ecological disasters, was a period of severe dust storms that lasted nearly a decade, starting 1931, and engulfed large parts of the US.', 'Dust storms are an annual weather pattern in the northern region of India comprising Delhi, Haryana, Punjab, Uttar Pradesh and Rajasthan and Punjab, as also in the Sindh region of Pakistan.', 'But, the dust storms that have hit India since February this year have been quantitatively and qualitatively different from those in the past.', 'As the soil was already loose due to extensive ploughing, high winds turned them to dust and blew them away in huge clouds.', 'It is time India too recognizes its own Dust Bowl and initiates a large-scale ecological restoration programme to halt it.', 'Cities like Delhi were choked in dust for days, with air quality level reaching the “severe” category on most days.', 'The dust storms originated in the Great Plains-from states like Texas, Oklahoma, Ne
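
When answer keys are later drawn at random from a map like this one, sampling without replacement keeps each keyword from being picked twice. A minimal sketch, where `pick_answer_keys` and the toy `keyword_map` are hypothetical:

```python
import random

def pick_answer_keys(keywords, k, seed=None):
    """Return up to k distinct keywords in random order."""
    rng = random.Random(seed)
    pool = list(keywords)
    # random.sample never repeats an element; cap k at the pool size.
    return rng.sample(pool, min(k, len(pool)))

# Toy stand-in for the keyword-to-sentences map (placeholder sentences).
keyword_map = {"dust": ["..."], "drought": ["..."], "monsoons": ["..."]}
picked = pick_answer_keys(keyword_map, k=5, seed=0)
print(picked)
```

Asking for more keys than the pool holds simply returns every keyword once.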

## Generating Distractors with BERT

In [None]:
# load BERT model
from transformers import pipeline
unmasker = pipeline('fill-mask', model='distilbert-base-uncased')


In [None]:
import random

def get_questions(keyword_to_sentences_map, model, k=5):
    """
    Generates questions along with distractors
    @input keyword_to_sentences_map : maps keywords to sentences they appear in
    @model: BERT model that will be used to mask the keyword and generate distractors
    @k (default 5), number of questions to return
    """

    # choose distinct answer keys at random from the pool of keywords
    pool = list(keyword_to_sentences_map.keys())
    keys = random.sample(pool, k=min(k, len(pool)))

    for answer in keys:
        # use the longest sentence containing the keyword as the question
        sentences = keyword_to_sentences_map[answer]
        q = max(sentences, key=len)

        # locate the keyword case-insensitively; skip it if it is not found verbatim
        a = q.lower().find(answer)
        if a == -1:
            continue
        b = a + len(answer)

        # blank out only this occurrence, and mask the same span for BERT
        masked = q[:a] + '[MASK]' + q[b:]
        q = q[:a] + '_____' + q[b:]

        results = model(masked)

        # keep predicted tokens that do not contain the answer itself
        options = [result['token_str'] for result in results if isinstance(result, dict) and (answer not in result['token_str'].lower())]

        if options:
            print(q)
            print(f'Ans: {answer}')
            print(options)
            print()


get_questions(keyword_to_sentences_map, unmasker, 10)

This displaced the _____, deep-rooted grasses that trapped soil and moisture even during dry periods and high winds.
Ans: native
['dense', 'thick', 'tall', 'coarse', 'thin']

The storms damaged the soil in around 100 million acres of land, leading to the greatest short-time migration in the American history, with approximately 3.5 million people _____ their farms and fields.
Ans: abandoning
['leaving', 'losing', 'fleeing', 'destroying']

In fact, people welcome dust storms as they bring down temperatures and herald the arrival of the _____.
Ans: monsoons
['monsoon', 'sun', 'moon', 'rain', 'comet']

But, the dust storms that have hit India since February this year have been quantitatively and qualitatively different from those in the _____.
Ans: past
['caribbean', 'himalayas', 'philippines', 'west']

Else, we will see more _____ dust storms, and a choked Delhi would be a permanent feature.
Ans: intense
['frequent', 'severe', 'recent', 'permanent']

Large-scale shelterbelt plantations, c
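
In the sample output above, `monsoon` is offered as a distractor for `monsoons`, which makes the question ambiguous. One possible refinement, sketched here with the standard library's `difflib`, is to drop candidates that are near-duplicates of the answer or of each other. The function name `filter_distractors` and the 0.8 threshold are assumptions, not part of the notebook.

```python
from difflib import SequenceMatcher

def filter_distractors(answer, candidates, max_similarity=0.8, limit=3):
    """Keep candidates that are not near-duplicates of the answer or of each other."""
    def too_close(a, b):
        # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= max_similarity

    kept = []
    for candidate in candidates:
        if too_close(candidate, answer):
            continue
        if any(too_close(candidate, other) for other in kept):
            continue
        kept.append(candidate)
        if len(kept) == limit:
            break
    return kept

distractors = filter_distractors("monsoons", ["monsoon", "sun", "moon", "rain", "comet"])
print(distractors)
```

String similarity only catches spelling variants; synonyms such as `leaving` for `abandoning` would need a semantic check.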

In [1]:
!mkdir -p src

In [2]:
%%writefile src/answerkey.py
from sentence_transformers import SentenceTransformer
from keybert import KeyBERT
from nltk.tokenize import sent_tokenize
import nltk
# we use nltk library to tokenize our text
nltk.download('punkt')

class AnswerKey:
    """
    Generate answers using keyword extraction, and map them to sentences they appear in
    """

    def __init__(self, text):
        self.text = text

        # KeyBert uses BERT-embeddings and simple cosine similarity to find the sub-phrases in a document that are the most similar to the document itself.
        sentence_model = SentenceTransformer("all-MiniLM-L6-v2")
        self.kw_model = KeyBERT(sentence_model)

    def get_keywords(self, text):
        """
        Given @input text, identify important keywords. 
        Here we use Sentence Transformer to extract keywords that best describe the text
        """
        keywords_with_scores = self.kw_model.extract_keywords(text, keyphrase_ngram_range=(1, 2), top_n=5, stop_words='english')
        keywords = [kw[0] for kw in keywords_with_scores]
        scores = [kw[1] for kw in keywords_with_scores]
        return keywords

    def tokenize_sentences(self, text):
        """
        Given a @text input, returns tokenized sentences
        """
        sentences = [sent_tokenize(text)]
        sentences = [sentence for paragraph in sentences for sentence in paragraph]

        # Remove sentences shorter than 20 letters.
        sentences = [sentence.strip() for sentence in sentences if len(sentence) > 20]
        return sentences

    def get_sentences_for_keyword(self, kw_model, sentences, ngram_range=(1, 1), top_n=10):
        """
        @kw_model: keyBERT model to extract keywords
        @sentences: list of tokenized sentences
        returns a map with keywords as keys mapped to the sentences they appear in.
        """
        keyword_sentences = {}
        for sentence in sentences:
            keywords_found = [kw[0] for kw in kw_model.extract_keywords(sentence, keyphrase_ngram_range=ngram_range, top_n=top_n) if len(kw[0]) > 2]

            for key in keywords_found:
                keyword_sentences[key] = keyword_sentences.get(key, [])
                keyword_sentences[key].append(sentence)

        return keyword_sentences

    def get_answers(self, ngram_range=(1, 2), top_n=10):
        sentences = self.tokenize_sentences(self.text)
        keyword_to_sentences = self.get_sentences_for_keyword(self.kw_model, sentences, ngram_range=ngram_range, top_n=top_n)
        return keyword_to_sentences

Writing src/answerkey.py


In [34]:
%%writefile src/t5model.py

import requests
import json

API_URL = "https://api-inference.huggingface.co/models/mrm8488/t5-base-finetuned-question-generation-ap"

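# read the API token from the .env file one level above the working directory
# (assumption: the file holds the complete Authorization header value, e.g. "Bearer <token>")
with open('../.env') as f:
    API_TOKEN = f.read().strip()

headers = {"Authorization": API_TOKEN}

def query(payload):
    """Send the payload to the hosted model and return the decoded JSON response."""
    response = requests.post(API_URL, headers=headers, json=payload)
    output = json.loads(response.content.decode("utf-8"))
    return output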

# ping model to wake it up when this package is imported
print(query("answer: Manuel context: Manuel has created RuPERTa-base with the support of HF-Transformers and Google"))

Overwriting src/t5model.py
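
The prompt sent to the question generation model follows an `answer: ... context: ...` layout, and `src/model.py` reads the question from `output[0]["generated_text"]`. The sketch below wraps both steps in small helpers that can be checked without calling the API. The helper names and both sample payloads are hypothetical, written by hand for illustration.

```python
def build_prompt(answer, context):
    """Format the input string in the 'answer: ... context: ...' layout."""
    return f"answer: {answer} context: {context}"

def extract_question(api_output):
    """Return the generated question, or None if the response has another shape.

    Treating any payload that is not a non-empty list of dicts as a failure
    is an assumption made for this sketch.
    """
    if isinstance(api_output, list) and api_output and isinstance(api_output[0], dict):
        return api_output[0].get("generated_text")
    return None

prompt = build_prompt("drought", "Then, the drought struck.")

# Hypothetical payloads (not real API output).
ok_payload = [{"generated_text": "question: What struck?"}]
error_payload = {"error": "model is loading"}

print(prompt)
print(extract_question(ok_payload))
print(extract_question(error_payload))
```

Checking the response shape before indexing avoids a `KeyError` or `TypeError` when the hosted model returns something other than a generated question.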


In [32]:
%%writefile src/model.py

from transformers import pipeline
from .t5model import query
import random


# load BERT model once
unmasker = pipeline('fill-mask', model='distilbert-base-uncased')


class Model:

    def get_questions(self, keyword_to_sentences_map, model, k=5, declarative=True):
        """
        Generates questions along with distractors
        @input keyword_to_sentences_map : maps keywords to sentences they appear in
        @model: BERT model that will be used to mask the keyword and generate distractors
        @k (default 5), number of questions to return
        @declarative: if True return fill-in-the-blank questions, else ask the T5 model
        """

        results = []

        # choose distinct answer keys at random from the pool of keywords
        pool = list(keyword_to_sentences_map.keys())
        answer_keys = random.sample(pool, k=min(k, len(pool)))

        for answer in answer_keys:
            sentences = keyword_to_sentences_map[answer]
            sentence = max(sentences, key=len)

            if len(sentence) < 20:
                continue

            start_idx = sentence.lower().find(answer)
            if start_idx == -1:
                # keyword does not appear verbatim in the sentence
                continue
            end_idx = start_idx + len(answer)

            # replace answer in sentence with blank line to form question
            question = sentence[:start_idx] + '__________' + sentence[end_idx:]

            # generate distractors from BERT model
            distractors = model(sentence[:start_idx] + '[MASK]' + sentence[end_idx:])
            options = [option['token_str'] for option in distractors if isinstance(option, dict) and (answer not in option['token_str'].lower())]
            if not options:
                continue

            if not declarative:
                # ask the T5 model to phrase a question whose answer is the keyword
                output = query(f"answer: {answer} context: {sentence}")
                question = output[0]["generated_text"]

            results.append((question, options, answer))

        return results


Overwriting src/model.py


In [33]:
%%writefile app.py

from src.answerkey import AnswerKey
from src.model import Model, unmasker
import streamlit as st

PAGE_CONFIG = {"page_title":"MCQ-App by Glad Nayak","page_icon":":white_check_mark:"}
st.set_page_config(**PAGE_CONFIG)

def render_input():
    """
    Renders text area for input, and button
    """
    # source of default text: https://www.fresherslive.com/online-test/reading-comprehension-test-questions-and-answers
    text = """The Dust Bowl, considered one of the greatest man-made ecological disasters, was a period of severe dust storms that lasted nearly a decade, starting 1931, and engulfed large parts of the US. The dust storms originated in the Great Plains-from states like Texas, Oklahoma, New Mexico, Colorado and Kansas. They were so severe that they choked everything and blocked out the sun for days. Sometimes, the storms travelled thousands of kilometres and blotted out monuments such as the Statue of Liberty. Citizens developed “dust pneumonia” and experienced chest pain and difficulty in breathing. The storms damaged the soil in around 100 million acres of land, leading to the greatest short-time migration in the American history, with approximately 3.5 million people abandoning their farms and fields.

    Dust storms are an annual weather pattern in the northern region of India comprising Delhi, Haryana, Punjab, Uttar Pradesh and Rajasthan and Punjab, as also in the Sindh region of Pakistan. But, they are normally low in intensity and accompanied by rains. In fact, people welcome dust storms as they bring down temperatures and herald the arrival of the monsoons. But, the dust storms that have hit India since February this year have been quantitatively and qualitatively different from those in the past. They are high-powered storms travelling long distances and destroying properties and agricultural fields. Since February, they have affected as many as 16 states and killed more than 500 people. Cities like Delhi were choked in dust for days, with air quality level reaching the “severe” category on most days.

    The Dust Bowl areas of the Great Plains are largely arid and semi-arid and prone to extended periods of drought. The US federal government encouraged settlement and development of large-scale agriculture by giving large parcels of grasslands to settlers. Waves of European settlers arrived at the beginning of the 20th century and converted grasslands into agricultural fields. At the same time, technological improvements allowed rapid mechanization of farm equipment, especially tractors and combined harvesters, which made it possible to operate larger parcels of land.

    For the next two decades, agricultural land grew manifold and farmers undertook extensive deep ploughing of the topsoil with the help of tractors to plant crops like wheat. This displaced the native, deep-rooted grasses that trapped soil and moisture even during dry periods and high winds. Then, the drought struck. Successive waves of drought, which started in 1930 and ended in 1939, turned the Great Plains into bone-dry land. As the soil was already loose due to extensive ploughing, high winds turned them to dust and blew them away in huge clouds. Does this sound familiar? The dust storm regions of India and Pakistan too are largely arid and semi-arid. But they are at a lower altitude and hence less windy compared to the Great Plains. Over the last 50 years, chemical- and water-intensive agriculture has replaced the traditional low-input agriculture. Canal irrigation has been overtaken by the groundwater irrigation. In addition, mechanized agriculture has led to deeper ploughing, loosening more and more topsoil. The result has been devastating for the soil and groundwater. In most of these areas, the soil has been depleted and groundwater levels have fallen precipitously. On top of the man-made ecological destruction, the natural climatic cycle along with climate change is affecting the weather pattern of this region.

    First, this area too is prone to prolonged drought. In fact, large parts of Haryana, Punjab, Delhi and western UP have experienced mildly dry to extremely dry conditions in the last six years. The Standardized Precipitation Index (SPI), which specifies the level of dryness or excess rains in an area, of large parts of Haryana, Punjab and Delhi has been negative since 2012. Rajasthan, on the other hand shows a positive SPI or excess rainfall. Second, this area is experiencing increasing temperatures. In fact, there seems to be a strong correlation between the dust storms and the rapid increase in temperature. Maximum temperatures across northern and western India have been far higher than normal since April this year. Last, climate change is affecting the pattern of Western Disturbances (WDs), leading to stronger winds and stronger storms. WDs are storms originating in the Mediterranean region that bring winter rain to northwestern India. But because of the warming of the Arctic and the Tibetan Plateau, indications are that the WDs are becoming unseasonal, frequent and stronger.

    The Dust Bowl led the US government to initiate a large-scale land-management and soil-conservation programme. Large-scale shelterbelt plantations, contour ploughing, conservation agriculture and establishment of conservation areas to keep millions of acres as grassland, helped halt wind erosion and dust storms. It is time India too recognizes its own Dust Bowl and initiates a large-scale ecological restoration programme to halt it. Else, we will see more intense dust storms, and a choked Delhi would be a permanent feature.
    """
    st.sidebar.subheader('Enter Text:')
    text = st.sidebar.text_area('', text.strip(), height = 275)

    ngram_range = st.sidebar.slider('answer ngram range:', value=[1, 2], min_value=1, max_value=3, step=1)
    num_questions = st.sidebar.slider("number of questions:", value=10, min_value=10, max_value=20, step=1)
    question_type_str = st.sidebar.radio('question type:', ('declarative (fill in the blanks)', 'interrogative'))
    question_type = question_type_str == 'declarative (fill in the blanks)'

    button = st.sidebar.button('Generate')

    if button:
        return (text, ngram_range, num_questions, question_type)

def main():
    # Render input text area
    inputs = render_input()

    if not inputs:
        st.title('Generate Multiple Choice Questions(MCQs) from Text Automatically')
        st.subheader('Enter Text, select how long a single answer should be(ngram_range), and number of questions to get started.')

    else:
        with st.spinner('Loading questions and distractors using BERT model'):
            st.subheader("")
            st.title("")
            text, ngram_range, num_questions, question_type = inputs

            # Load model
            answerkeys = AnswerKey(text)
            keyword_to_sentence = answerkeys.get_answers(ngram_range, num_questions)

            model = Model()
            quizzes = model.get_questions(keyword_to_sentence, unmasker, k=num_questions, declarative=question_type)

            st.subheader('Questions')
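            for idx, quiz in enumerate(quizzes):
                question, options, answer = quiz
                st.write(question)

                # every widget needs its own key, otherwise Streamlit raises a duplicate-key error
                for opt_idx, option in enumerate(options[:3]):
                    st.checkbox(option, key=f"q{idx}-opt{opt_idx}")

                ans_button = st.checkbox(answer, key=f"q{idx}-answer", value=True)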

            st.balloons()
            st.button('Save')


if __name__ == '__main__':
    main()
        


Overwriting app.py
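
The app above always lists the correct answer last and pre-checked. If the questions are meant for students rather than for review, one option is to shuffle the answer in among the distractors. A minimal sketch, where `build_choices` is a hypothetical helper and not part of `app.py`:

```python
import random

def build_choices(answer, distractors, num_distractors=3, seed=None):
    """Return (choices, answer_index) with the answer at a random position."""
    rng = random.Random(seed)
    choices = list(distractors[:num_distractors]) + [answer]
    rng.shuffle(choices)
    return choices, choices.index(answer)

choices, answer_index = build_choices(
    "native", ["dense", "thick", "tall", "coarse", "thin"], seed=1
)
print(choices, answer_index)
```

Keeping `answer_index` separate from the displayed choices lets the app grade a response without revealing the key in the UI.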


## Local Deployment

In [6]:
# to test locally we install localtunnel
!pip install streamlit > /dev/null
!npm install -g localtunnel > /dev/null

## Generate requirements.txt and Install Requirements

In [7]:
!pip install pipreqs

Collecting pipreqs
  Downloading pipreqs-0.4.11-py2.py3-none-any.whl (32 kB)
Collecting yarg
  Downloading yarg-0.1.9-py2.py3-none-any.whl (19 kB)
Installing collected packages: yarg, pipreqs
Successfully installed pipreqs-0.4.11 yarg-0.1.9


In [8]:
!pipreqs .

INFO: Successfully saved requirements file in ./requirements.txt


In [9]:
!pip install -r requirements.txt

Collecting keybert==0.5.0
  Downloading keybert-0.5.0.tar.gz (19 kB)
Collecting sentence_transformers==2.1.0
  Downloading sentence-transformers-2.1.0.tar.gz (78 kB)
Collecting transformers==4.15.0
  Downloading transformers-4.15.0-py3-none-any.whl (3.4 MB)
Collecting rich>=10.4.0
  Downloading rich-11.0.0-py3-none-any.whl (215 kB)
Collecting tokenizers>=0.10.3
  Downloading tokenizers-0.11.2-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.8 MB)
Collecting sentencepiece
  Downloading sentencepiece-0.1.96-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Collecting huggingface-hub
  Downloading huggingface_hub-0.4.0-py3-none-any.whl (67 kB)

## Run App

In [19]:
!streamlit run app.py --server.enableCORS=false &>/dev/null&

!lt --Bypass-Tunnel-Reminder --subdomain 'myapp' --port 8501 &>/dev/null&

In [11]:
# kill app and clean up memory
st_id = !pgrep streamlit
!kill {st_id[0]}

lt_id = !pgrep lt
!kill {lt_id[0]}

## Setup CI/CD Pipeline

For production deployment, we first set up a CI/CD pipeline.

1. First, we dockerize our app and push the image to a container registry (Google's GCR) using a Dockerfile and docker build.

2. Then we automate deployment using Cloud Build, so that every time we push our code to GitHub the app gets deployed with the latest changes.


In [35]:
%%writefile Dockerfile
#Base Image to use
FROM python:3.7.9-slim

#Expose port 8080
EXPOSE 8080

#Optional - install git to fetch packages directly from github
RUN apt-get update && apt-get install -y git

#Copy Requirements.txt file into app directory
COPY requirements.txt app/requirements.txt
COPY env app/.env

#install all requirements in requirements.txt
RUN pip install -r app/requirements.txt

#Copy all files in current directory into app directory
COPY . /app

#Change Working Directory to app directory
WORKDIR /app

#Run the application on port 8080
ENTRYPOINT ["streamlit", "run", "app.py", "--server.port=8080", "--server.address=0.0.0.0"]

Writing Dockerfile


In [41]:
%%writefile cloudbuild.yaml

steps:
- name: 'gcr.io/cloud-builders/docker'
  args: [ 'build', '-t', 'gcr.io/mcq-from-text/mcq-app:latest', '.' ]
images: [gcr.io/mcq-from-text/mcq-app:latest]

Writing cloudbuild.yaml


In [36]:
from google.colab import auth
auth.authenticate_user()

# setup Google Cloud project

PROJECT_ID = 'mcq-from-text'
!gcloud config set project {PROJECT_ID}

Updated property [core/project].


In [None]:
# now, we package our code as a Docker image using the Dockerfile and push it to Google's Container Registry

# 1. build image locally and test run at localhost
!docker build --tag gcr.io/{PROJECT_ID}/mcq-app:latest .

# 2. build with Cloud Build and submit to Google's Container Registry
!gcloud builds submit --timeout 30m --tag gcr.io/{PROJECT_ID}/mcq-app:latest
