# Installing Haystack

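Haystack is installed with pip. This notebook uses the Haystack 1.x API (FARMReader, BM25Retriever, and the haystack.nodes and haystack.pipelines modules).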
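To check which Haystack release is installed, you can query the installed distribution with the standard library. This is a minimal sketch that assumes Haystack 1.x is installed under its PyPI distribution name farm-haystack (for example with pip install farm-haystack); the helper name installed_version is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: Haystack 1.x is distributed on PyPI as "farm-haystack".
version = installed_version("farm-haystack")
print(version if version is not None else "farm-haystack is not installed")
```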

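Configure logging so that Haystack reports messages at INFO level and above. Note that the root logger is set to DEBUG, which is why DEBUG messages from other libraries, such as urllib3, also appear in the outputs below.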

In [52]:
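import logging

# basicConfig does nothing if the root logger already has handlers (as in many notebook environments)
logging.basicConfig(format="%(levelname)s - %(name)s -  %(message)s", level=logging.WARNING)
# Root logger at DEBUG: messages from every library (e.g. urllib3) are shown
logging.getLogger().setLevel(logging.DEBUG)
# Haystack's own loggers report INFO and above
logging.getLogger("haystack").setLevel(logging.INFO)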
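The interplay of these two levels explains the DEBUG lines from urllib3 in the outputs below. This standard-library sketch repeats the notebook's setLevel calls, captures records in memory instead of printing them, and shows which messages get through; the logger names mirror the notebook's log output, while the messages themselves are illustrative.

```python
import io
import logging

# Capture log records in memory so we can see exactly what gets emitted
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.DEBUG)                          # as in the notebook cell
logging.getLogger("haystack").setLevel(logging.INFO)  # as in the notebook cell

# A logger with no level of its own inherits DEBUG from the root logger...
logging.getLogger("urllib3.connectionpool").debug("Starting new HTTPS connection")
# ...while loggers under "haystack" inherit INFO, so this DEBUG call is dropped
logging.getLogger("haystack.modeling.utils").debug("this message is filtered out")
logging.getLogger("haystack.modeling.utils").info("Using devices: CPU")

root.removeHandler(handler)
emitted = stream.getvalue().splitlines()
print(emitted)
```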

# Initialising the document store


In [53]:
from haystack.document_stores import InMemoryDocumentStore

# use_bm25=True builds a BM25 index over the stored Documents, which the BM25Retriever queries
document_store = InMemoryDocumentStore(use_bm25=True)

INFO:haystack.modeling.utils:Using devices: CPU - Number of GPUs: 0
DEBUG:urllib3.connectionpool:Resetting dropped connection: tm.hs.deepset.ai


The DocumentStore is now ready. Now it's time to fill it with some Documents.

# Preparing Documents

1. Get the source text. This notebook uses a plain-text copy of the Microsoft Certified Azure Fundamentals Study Guide (Exam AZ-900), stored at /content/data/microsoft-certified-asure-fundamentals.txt.

In [54]:
from haystack.utils import fetch_archive_from_http

doc_dir = "/content/data"

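# fetch_archive_from_http expects url to be an HTTP(S) link to an archive.
# Here /content/data already contains the text file, so the call only logs a
# message and returns False without downloading anything (see the output below).
fetch_archive_from_http(
    url="/content/data/microsoft-certified-asure-fundamentals.txt",
    output_dir=doc_dir
)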

INFO:haystack.utils.import_utils:Found data stored in '/content/data'. Delete this first if you really want to fetch new data.


False
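The False result comes from a skip-if-present check: when the output directory already holds data, nothing is fetched. The sketch below reproduces that behaviour with the standard library. It is a hypothetical helper (fetch_zip_if_missing), not Haystack's implementation: it handles only .zip archives and, unlike the notebook call, rejects a url that is not HTTP(S) when it actually has to download.

```python
import io
import logging
import tempfile
import urllib.request
import zipfile
from pathlib import Path

logger = logging.getLogger(__name__)


def fetch_zip_if_missing(url: str, output_dir: str, timeout: float = 10.0) -> bool:
    """Download and extract a .zip archive unless output_dir already holds files.

    Returns False (and downloads nothing) when output_dir is non-empty,
    True after a successful download and extraction.
    """
    out = Path(output_dir)
    if out.is_dir() and any(out.iterdir()):
        logger.info("Found data stored in '%s'. Delete it to fetch new data.", out)
        return False
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"expected an http(s) URL, got {url!r}")
    out.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = response.read()
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(out)
    return True


# Usage: a directory that already contains a file is left untouched
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "book.txt").write_text("already here")
    skipped = fetch_zip_if_missing("https://example.com/data.zip", tmp)
print(skipped)
```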

2. Use TextIndexingPipeline to convert the text file into Haystack Document objects and write them into the DocumentStore:

In [55]:
import os
from haystack.pipelines.standard_pipelines import TextIndexingPipeline

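files_to_index = [os.path.join(doc_dir, "microsoft-certified-asure-fundamentals.txt")]
indexing_pipeline = TextIndexingPipeline(document_store)
# For an indexing pipeline, Haystack runs the nodes' run method instead of run_batch (see the INFO message below)
indexing_pipeline.run_batch(file_paths=files_to_index)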

INFO:haystack.pipelines.base:It seems that an indexing Pipeline is run, so using the nodes' run method instead of run_batch.


Converting files:   0%|          | 0/1 [00:00<?, ?it/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

DEBUG:urllib3.connectionpool:https://tm.hs.deepset.ai:443 "POST /batch/ HTTP/1.1" 200 13


Updating BM25 representation...:   0%|          | 0/400 [00:00<?, ? docs/s]

{'documents': [<Document: {'content': 'Telegram Channel : @IRFaraExam\nTelegram Channel : @IRFaraExam\nMicrosoft Certified\nAzure Fundamentals\nStudy Guide\n\nTelegram Channel : @IRFaraExam\nTelegram Channel : @IRFaraExam\nMicrosoft Certified\nAzure Fundamentals\nStudy Guide\nExam AZ-900\n\nJim Boyce\n\nTelegram Channel : @IRFaraExam\nCopyright © 2021 by John Wiley & Sons, Inc., Indianapolis, Indiana\nPublished simultaneously in Canada\nISBN: 978-1-119-77092-3\nISBN: 978-1-119-76820-3 (ebk.)ISBN: 978-1-119-77115-9 (ebk.)No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any\nmeans, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections\n107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or\nauthorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood\nDrive, Danvers, 
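The indexing output reports 400 Documents from a single file, because the preprocessing step splits the text into smaller passages before they are written to the DocumentStore. The sketch below shows the idea with a fixed word window. The helper name, the dict layout and the 200-word length are illustrative assumptions; Haystack's PreProcessor offers more options, such as respecting sentence boundaries.

```python
from typing import Dict, List


def split_into_documents(text: str, source: str, split_length: int = 200) -> List[Dict]:
    """Split text into consecutive, non-overlapping windows of split_length words.

    Each window becomes a dict with the passage text and some metadata,
    loosely modelled on a Haystack Document (content + meta).
    """
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    words = text.split()
    documents = []
    for start in range(0, len(words), split_length):
        chunk = words[start:start + split_length]
        documents.append({
            "content": " ".join(chunk),
            "meta": {"name": source, "split_id": len(documents)},
        })
    return documents


sample = " ".join(f"word{i}" for i in range(450))
docs = split_into_documents(sample, source="microsoft-certified-asure-fundamentals.txt")
print(len(docs), [len(d["content"].split()) for d in docs])
```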

# Initializing the Retriever

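Our search system uses a Retriever, so we need to initialize it. A Retriever sifts through all the Documents and returns only the ones most relevant to the query. The BM25Retriever used here ranks Documents by keyword matching against the BM25 index built by the DocumentStore.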



In [56]:
from haystack.nodes import BM25Retriever

retriever = BM25Retriever(document_store=document_store)

DEBUG:urllib3.connectionpool:https://tm.hs.deepset.ai:443 "POST /batch/ HTTP/1.1" 200 13
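BM25 scores each Document by how often the query terms occur in it, giving more weight to rare terms and normalizing for Document length. The sketch below is a textbook BM25 with assumed parameters k1=1.5 and b=0.75, a Lucene-style IDF and a simple regex tokenizer. Haystack's in-memory BM25 may use a different variant, tokenizer and parameters, so treat the scores as illustrative; the passages are made up.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Lower-case word tokens; real analyzers may also stem and drop stop words
    return re.findall(r"\w+", text.lower())


def bm25_rank(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75,
              top_k: int = 2) -> List[Tuple[int, float]]:
    """Return (doc_index, score) pairs for the top_k documents by BM25 score."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for idx, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Lucene-style IDF, which is always positive
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and length normalization (b)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


passages = [
    "A private cloud serves a single organization.",
    "Azure Resource Manager deploys resources from JSON templates.",
    "A public cloud is shared by many organizations.",
]
print(bm25_rank("What is Private Cloud", passages, top_k=2))
```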


# Initializing the Reader
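
The Reader takes the Documents returned by the Retriever and extracts the text span that best answers the query. It uses the deepset/roberta-base-squad2 model (RoBERTa fine-tuned on SQuAD 2.0) from the Hugging Face model hub. Although use_gpu=True is set, no GPU is available here, so the model runs on the CPU (see "Number of GPUs: 0" in the log).
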
In [57]:
from haystack.nodes import FARMReader

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)

INFO:haystack.modeling.utils:Using devices: CPU - Number of GPUs: 0
INFO:haystack.modeling.utils:Using devices: CPU - Number of GPUs: 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /deepset/roberta-base-squad2/resolve/main/config.json HTTP/1.1" 200 0
INFO:haystack.modeling.model.language_model: * LOADING MODEL: 'deepset/roberta-base-squad2' (Roberta)
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /deepset/roberta-base-squad2/resolve/main/config.json HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:https://tm.hs.deepset.ai:443 "POST /batch/ HTTP/1.1" 200 13
INFO:haystack.modeling.model.language_model:Auto-detected model language: english
INFO:haystack.modeling.model.language_model:Loaded 'deepset/roberta-base-squad2' (Roberta model) from model hub.
DEBUG:urllib3.connectionpool:Starting new HTTPS 
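An extractive Reader predicts, for every token of a passage, a score for being the start of the answer and a score for being the end. A simplified sketch of how a span can then be chosen: take the valid pair (start <= end, bounded length) with the highest combined score. The logits below are made-up numbers, and real readers add steps this sketch omits, such as handling the "no answer" option of SQuAD 2.0 models and mapping tokens back to character offsets.

```python
from typing import List, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int = 30) -> Tuple[int, int, float]:
    """Pick the (start, end) token pair with the highest start + end score.

    Only spans with start <= end and at most max_answer_len tokens are allowed.
    """
    best = (0, 0, float("-inf"))
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best[2]:
                best = (i, j, score)
    return best


tokens = ["Azure", "Resource", "Manager", "enables", "you", "to", "deploy", "resources"]
start = [0.1, 0.2, 0.0, 2.5, 0.3, 0.1, 0.4, 0.2]
end = [0.0, 0.1, 0.3, 0.2, 0.1, 0.0, 0.5, 2.8]
i, j, score = best_span(start, end)
print(" ".join(tokens[i:j + 1]), round(score, 2))
```

The start <= end constraint matters: with start scores [0.0, 5.0] and end scores [5.0, 0.0], the highest raw sum (start at token 1, end at token 0) is an invalid span and is never considered.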

We've initialized all the components for our pipeline and can now create it.

# Creating the Retriever-Reader Pipeline

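ExtractiveQAPipeline connects the Retriever and the Reader. The Retriever first selects the most relevant Documents, and the Reader then extracts answers from those Documents only. This speeds up querying, because the Reader, which runs a Transformer model, is much slower than the Retriever and would otherwise have to process every Document in the store.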

In [58]:
from haystack.pipelines import ExtractiveQAPipeline

pipe = ExtractiveQAPipeline(reader, retriever)
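The division of labour can be sketched in plain Python. This is a hypothetical illustration, not Haystack's implementation: retrieve_then_read, keyword_retriever and first_sentence_reader are made-up stand-ins, but the params dictionary has the same shape as the one passed to pipe.run() below, with per-node top_k values keyed by node name.

```python
from typing import Callable, Dict, List


def retrieve_then_read(
    query: str,
    retrieve: Callable[[str, int], List[str]],
    read: Callable[[str, List[str], int], List[str]],
    params: Dict[str, Dict[str, int]],
) -> List[str]:
    """Two-stage QA flow: the reader only sees the retriever's top_k documents."""
    documents = retrieve(query, params["Retriever"]["top_k"])
    return read(query, documents, params["Reader"]["top_k"])


corpus = [
    "Azure Resource Manager enables you to deploy multiple resources using JSON-based templates.",
    "A private cloud serves a single organization.",
    "Azure Resource Manager (ARM) templates describe infrastructure as code.",
    "A public cloud is shared by many organizations.",
    "Hybrid clouds combine public and private clouds.",
]
seen_by_reader: List[str] = []


def keyword_retriever(query: str, top_k: int) -> List[str]:
    # Toy retriever: rank documents by the number of shared lower-case words
    terms = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: len(terms & set(d.lower().split())), reverse=True)
    return ranked[:top_k]


def first_sentence_reader(query: str, documents: List[str], top_k: int) -> List[str]:
    # Toy reader: records what it was given and returns each document's first sentence
    seen_by_reader.extend(documents)
    return [doc.split(".")[0] for doc in documents][:top_k]


answers = retrieve_then_read(
    "What is Azure Resource Manager",
    keyword_retriever,
    first_sentence_reader,
    params={"Retriever": {"top_k": 2}, "Reader": {"top_k": 2}},
)
print(answers)
```

Only two of the five documents ever reach the reader, which is where the speed-up of the two-stage design comes from.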

# Asking a Question

1. Use the pipeline's run() method to ask a question, passing it as the query argument. The params argument sets options per node, keyed by node name: the Retriever's top_k is the number of Documents passed to the Reader, and the Reader's top_k is the number of answers returned.

In [59]:
prediction = pipe.run(
    query="What is Azure Resource Manager",
    params={
        "Retriever": {"top_k": 2},
        "Reader": {"top_k": 2}
    }
)



Inferencing Samples:   0%|          | 0/1 [00:00<?, ? Batches/s]

2. Print the answers in a simplified form with print_answers:

In [60]:
from haystack.utils import print_answers

# details="minimum" keeps only each answer and its context
print_answers(prediction, details="minimum")



Query: What is Azure Resource Manager
Answers:
[   {   'answer': 'enables you to deploy multiple resources using JSON-based\n'
                  'templates',
        'context': 'that appear below?\n'
                   'Azure Resource Manager enables you to deploy multiple '
                   'resources using JSON-based\n'
                   'templates.\n'
                   'A.\n'
                   '\n'
                   'is the primary tool you use to mana'},
    {   'answer': 'ARM) templates',
        'context': '\n'
                   '■\n'
                   '\n'
                   'Describe the functionality and usage of Azure Resource '
                   'Manager (ARM) templates\n'
                   '\n'
                   'Telegram Channel : @IRFaraExam\n'
                   'Microsoft Azure consists of a multi'}]
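The printed answers look like the output of Python's pprint with a four-space indent. A rough equivalent of the "minimum" view on plain dictionaries is sketched below; format_answers is a hypothetical helper, the answer dictionary and its score are illustrative, and Haystack's print_answers works on its own Answer objects rather than dicts.

```python
import pprint
from typing import Dict, List, Sequence


def format_answers(answers: List[Dict[str, object]],
                   fields: Sequence[str] = ("answer", "context")) -> str:
    """Keep only the selected fields of each answer and pretty-print them."""
    trimmed = [{field: answer[field] for field in fields if field in answer}
               for answer in answers]
    return pprint.PrettyPrinter(indent=4).pformat(trimmed)


answers = [
    {
        "answer": "enables you to deploy multiple resources using JSON-based templates",
        "context": "Azure Resource Manager enables you to deploy multiple resources "
                   "using JSON-based templates.",
        "score": 0.9,
    },
]
print(format_answers(answers))
```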

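# Summarisation

SearchSummarizationPipeline combines the Retriever with a TransformersSummarizer: it retrieves the most relevant Documents for a query and passes them to the google/pegasus-xsum summarization model. With generate_single_summary=True, the retrieved Documents are summarized together into one summary instead of one summary per Document.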



In [61]:
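from haystack.nodes import TransformersSummarizer
from haystack.pipelines import SearchSummarizationPipeline

# Abstractive summarization model from the Hugging Face model hub
summarizer = TransformersSummarizer(model_name_or_path="google/pegasus-xsum")
# Reuses the BM25Retriever defined above; generate_single_summary=True
# produces one summary for all retrieved Documents together
pipeline = SearchSummarizationPipeline(summarizer=summarizer, retriever=retriever, generate_single_summary=True)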



INFO:haystack.modeling.utils:Using devices: CPU - Number of GPUs: 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /google/pegasus-xsum/resolve/main/config.json HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:https://tm.hs.deepset.ai:443 "POST /batch/ HTTP/1.1" 200 13
DEBUG:urllib3.connectionpool:https://tm.hs.deepset.ai:443 "POST /batch/ HTTP/1.1" 200 13
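generate_single_summary changes how many summaries come back. The hypothetical sketch below shows the two modes, with a trivial first-sentence function standing in for the Pegasus model (so it is extractive, unlike Pegasus); it only illustrates the control flow, not how TransformersSummarizer is implemented.

```python
from typing import Callable, List


def summarize_documents(
    documents: List[str],
    summarize: Callable[[str], str],
    generate_single_summary: bool,
) -> List[str]:
    """Return one summary for all documents, or one summary per document."""
    if generate_single_summary:
        # Concatenate all retrieved texts and summarize them together
        return [summarize(" ".join(documents))]
    return [summarize(doc) for doc in documents]


def first_sentence(text: str) -> str:
    # Stand-in summarizer: keeps only the first sentence
    return text.split(". ")[0].rstrip(".") + "."


retrieved = [
    "A private cloud serves a single organization. It can be hosted on-premises.",
    "A public cloud is shared by many organizations. Azure is a public cloud.",
]
print(summarize_documents(retrieved, first_sentence, generate_single_summary=True))
print(summarize_documents(retrieved, first_sentence, generate_single_summary=False))
```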


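Run a query through the summarisation pipeline. The Retriever returns the top 2 Documents, which are passed to the summarizer: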


In [62]:
query = "What is Private Cloud"
result = pipeline.run(query=query, params={"Retriever": {"top_k": 2}})



In [63]:


for doc in result['documents']:
    # Join the text into a single line
    doc.content = doc.content.replace('\n', ' ')
    # Upper-case occurrences of the query string in the content
    doc.content = doc.content.replace(query, query.upper())
    # Remove unnecessary text
    doc.content = doc.content.replace("Get free e-books and video tutorials at www.passuneb.com", "")

# Concatenate the Documents and print the text in fixed 100-character lines
# (words can be split across line breaks, as in the output below)
output = " ".join([doc.content for doc in result['documents']])
for start in range(0, len(output), 100):
    print(output[start:start + 100])



The services and data that you host in Azure are secure and inaccessible by people outside of your o
rganization (unless you specifically provide guest access to certain services, such as your company 
website).  Telegram Channel : @IRFaraExam Public, Private, and Hybrid Cloud Models  17  Using a publ
ic cloud to host some of your IT services doesn’t mean that all of your services are moved to the cl
oud. You might put some of your services in Azure but still maintain other services on-premises in y
our own data center.  Private Cloud A private cloud is one in which the cloud serves a single organi
zation, whether hosted in your own data center or by someone else. A private cloud offers many of th
e same benefits of scalability, elasticity, and other aspects of a public cloud. Because a private c
loud is dedicated to one organization, it offers some additional capabilities to meet regulatory req
uirements because you can impose controls and processes in the private cloud that would not
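The clean-up and printing cell can be made reusable. A minimal sketch, assuming plain-string content: clean_text is a hypothetical helper that removes a list of known boilerplate strings, such as the "Telegram Channel : @IRFaraExam" watermark that recurs throughout this text, and collapses whitespace. textwrap.fill from the standard library then wraps at word boundaries, so words are not cut in half as in the fixed 100-character output above. The sample text is abridged from that output.

```python
import re
import textwrap
from typing import Iterable


def clean_text(text: str, noise: Iterable[str]) -> str:
    """Remove known boilerplate strings and collapse whitespace runs to one space."""
    for fragment in noise:
        text = text.replace(fragment, " ")
    return re.sub(r"\s+", " ", text).strip()


noise = [
    "Telegram Channel : @IRFaraExam",
    "Get free e-books and video tutorials at www.passuneb.com",
]
raw = (
    "Telegram Channel : @IRFaraExam\n"
    "Private Cloud A private cloud is one in which the cloud serves a single organization,\n"
    "whether hosted in your own data center or by someone else."
)
cleaned = clean_text(raw, noise)
wrapped = textwrap.fill(cleaned, width=60)
print(wrapped)
```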