[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/search/question-answering/table-qa.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/search/question-answering/table-qa.ipynb)

# Table Question Answering with Pinecone

Table Question Answering (Table QA) means extracting precise answers to a user's question from tables. Thanks to recent work on Table QA, it is now possible to answer natural language queries over tabular data. This notebook demonstrates how to build a Table QA system that answers natural language queries using the Pinecone vector database.

We need three main components to build the Table QA system:

- A vector index to store table embeddings
- A retriever model for embedding queries and tables
- A reader model to read the tables and extract answers
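
As a rough illustration of how these three components fit together, the toy sketch below replaces each one with a pure-Python stand-in: a dictionary plays the vector index, a bag-of-words counter plays the retriever's embedding, and a keyword-overlap lookup plays the reader. All names (`embed`, `ToyIndex`, `read_answer`) and the sample tables are hypothetical; a real system would use Pinecone, a sentence-embedding model, and a TAPAS-style reader instead.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'retriever': a bag-of-words count vector (stand-in for a neural encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """Toy 'vector index': keeps (id, vector) pairs in memory."""

    def __init__(self):
        self.vectors = {}

    def upsert(self, item_id, vector):
        self.vectors[item_id] = vector

    def query(self, vector, top_k=1):
        ranked = sorted(self.vectors, key=lambda i: cosine(vector, self.vectors[i]), reverse=True)
        return ranked[:top_k]


def table_to_text(table):
    """Flatten a table (dict of column -> list of cells) into one string for embedding."""
    return " ".join(list(table) + [cell for col in table.values() for cell in col])


def read_answer(table, question):
    """Toy 'reader': pick the row and the column that best overlap the question."""
    q = embed(question)
    columns = list(table)
    n_rows = len(table[columns[0]])
    best_row = max(range(n_rows), key=lambda r: cosine(q, embed(" ".join(table[c][r] for c in columns))))
    best_col = max(columns, key=lambda c: cosine(q, embed(c)))
    return table[best_col][best_row]


# Made-up sample tables; only the first oven row mirrors a value seen in this notebook.
tables = {
    "ovens": {"model": ["HSG7361B1", "EXAMPLE-2"], "volume": ["71 l", "50 l"]},
    "cities": {"city": ["Paris", "Rome"], "country": ["France", "Italy"]},
}

index = ToyIndex()
for table_id, table in tables.items():
    index.upsert(table_id, embed(table_to_text(table)))

question = "what is the volume of HSG7361B1"
best_table_id = index.query(embed(question), top_k=1)[0]  # retrieval step
answer = read_answer(tables[best_table_id], question)     # reading step
print(best_table_id, answer)  # prints: ovens 71 l
```

The split matters because the reader is expensive and can only look at one table at a time, so the retriever and index narrow the candidates before the reader runs.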

# Install Dependencies

In [7]:
# torch-scatter may take a few minutes to install
!pip install datasets pinecone-client sentence_transformers torch-scatter

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting datasets
  Downloading datasets-2.11.0-py3-none-any.whl (468 kB)
Collecting pinecone-client
  Downloading pinecone_client-2.2.1-py3-none-any.whl (177 kB)
Collecting sentence_transformers
  Downloading sentence-transformers-2.2.2.tar.gz (85 kB)
  Preparing metadata (setup.py) ... done
Collecting torch-scatter
  Downloading torch_scatter-2.1.1.tar.gz (107 kB)
  Preparin

In [3]:

import torch
torch.__version__

'2.0.0+cu118'

In [8]:
!pip install -q transformers==4.4.2

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  error: subprocess-exited-with-error

  × Building wheel for tokenizers (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  Building wheel for tokenizers (pyproject.toml) ... error
  ERROR: Failed building wheel for tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

In [34]:

from transformers import pipeline
import pandas as pd

In [26]:

# Alternative reader checkpoints (commented out):
# tqa = pipeline(task="table-question-answering",
#                model="google/tapas-base-finetuned-wtq")
# tqa = pipeline(task="table-question-answering",
#                model="microsoft/tapex-base")

# Reader model: TAPAS large, fine-tuned on WikiTableQuestions (WTQ)
tqa = pipeline(task="table-question-answering",
               model="google/tapas-large-finetuned-wtq")

In [69]:
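# The CSV is read as Windows-1252 (cp1252) text
table = pd.read_csv("/content/FAQ_26_04.csv", encoding="cp1252")
# The TAPAS tokenizer expects every cell to be a string
table = table.astype(str)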
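
The cast to `str` is needed because the reader works on text cells only. One side effect of `astype(str)` is that missing values turn into the literal string `nan`, which the reader can then select as an answer. The sketch below shows one way to avoid that, assuming a dictionary of string columns is an acceptable table input (the `transformers` pipeline documentation describes `table` as a DataFrame or a dictionary). The helper `load_table_as_strings` and the sample file are hypothetical, and only the standard library is used.

```python
import csv
import os
import tempfile


def load_table_as_strings(path, encoding="cp1252"):
    """Read a CSV file into {column name: [cell, ...]} with every cell a string."""
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle)
        columns = {name: [] for name in reader.fieldnames}
        for row in reader:
            for name in columns:
                # Missing cells become "" instead of the string "nan".
                columns[name].append(row[name] or "")
    return columns


# Hypothetical two-row file standing in for the real FAQ CSV.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 newline="", encoding="cp1252") as tmp:
    tmp.write("question,answer\nHow long is the warranty?,2 years\nIs delivery free?,\n")
    sample_path = tmp.name

sample_table = load_table_as_strings(sample_path)
os.remove(sample_path)
print(sample_table)
```

With pandas, a comparable effect should be achievable by filling missing values before the cast, for example `table.fillna("").astype(str)`.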

In [72]:
query = "Loading a dishwasher will take how much time?"
print(tqa(table=table, query=query)["answer"])


SUM > nan
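
The output `SUM > nan` is worth unpacking. With WTQ-finetuned TAPAS checkpoints, the pipeline's `answer` string appears to be built as an optional aggregation operator, then ` > `, then the selected cells. Here the model chose `SUM` over a cell whose text is `nan`, most likely a missing value that `astype(str)` turned into a string, so the question was effectively not answered. The following sketch, under that assumption about the format, separates the operator from the cells and flags such empty answers; `split_answer` and `is_usable` are hypothetical helper names.

```python
AGGREGATORS = ("SUM", "AVERAGE", "COUNT")  # operators predicted by WTQ-style TAPAS heads


def split_answer(answer):
    """Split 'SUM > a, b' into ('SUM', 'a, b'); plain answers get aggregator None."""
    head, sep, tail = answer.partition(" > ")
    if sep and head in AGGREGATORS:
        return head, tail
    return None, answer


def is_usable(answer):
    """Treat answers that consist only of 'nan' or empty cells as 'no answer found'."""
    _, cells = split_answer(answer)
    parts = [part.strip() for part in cells.split(",")]
    return any(part and part.lower() != "nan" for part in parts)


for raw in ["SUM > nan", "71 l", "Rs.284,990.00"]:
    print(raw, "->", split_answer(raw), is_usable(raw))
```

A check like this could be used to show a fallback message instead of printing `SUM > nan` to the user.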


In [65]:
queries = [
    "product link of HSG7361B1?",
    "price of HSG7361B1",
    "what are features  of HSG7361B1",
    "what is the volume of HSG7361B1",
]
# Passing a list of queries returns one answer dict per query
answers = tqa(table=table, query=queries)
for ans in answers:
    print(ans["answer"])

https://www.bosch-home.in/productlist/cooking-baking/ovens/built-in-oven/HSG7361B1#/Togglebox=manuals/
Rs.284,990.00
Hotair Gentle : saves energy while baking and roasting.
71 l


In [52]:
!pip install gradio

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting gradio
  Downloading gradio-3.28.0-py3-none-any.whl (17.3 MB)
Collecting semantic-version
  Using cached semantic_version-2.10.0-py2.py3-none-any.whl (15 kB)
Collecting gradio-client>=0.1.3
  Downloading gradio_client-0.1.4-py3-none-any.whl (286 kB)
Collecting aiofiles
  Downloading aiofiles-23.1.0-py3-none-any.whl (14 kB)
Collecting fastapi
  Downloading fastapi-0.95.1-py3-none-any.whl (56 kB)
Collecting pydub
  Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Collecting mdit-py-plugins<=0.3.3
  Downloadi

In [53]:
import gradio as gr

In [54]:
index = None

In [62]:
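# Assumption: these classes come from an early llama_index release that still
# exposes GPTSimpleVectorIndex; llm_predictor and embed_model are not defined in
# this notebook and must be created in an earlier cell.
from llama_index import Document, GPTSimpleVectorIndex, ServiceContext

def build_the_bot(input_text):
    """Build a vector index over the submitted text and keep it in the global `index`."""
    global index
    documents = [Document(input_text)]
    service_context = ServiceContext.from_defaults(
        llm_predictor=llm_predictor, embed_model=embed_model)
    index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
    return "Index built successfully!"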

In [60]:

def chat(chat_history, user_input):
    """Stream the bot's reply one character at a time as an updated chat history."""
    bot_response = index.query(user_input)
    # Assumes the response object exposes its text as `.answer`
    response = ""
    for letter in "".join(bot_response.answer):
        response += letter
        # Yield the history extended with the partial reply built so far
        yield chat_history + [(user_input, response)]
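
The streaming pattern in `chat` can be checked in isolation, without an index or a UI. The sketch below uses a hypothetical `stream_reply` function that takes the reply text directly; each yielded value is a complete chat history whose last entry holds a progressively longer reply, which is the shape a chat widget would redraw on every step.

```python
def stream_reply(chat_history, user_input, reply_text):
    """Yield the chat history with the reply growing one character at a time."""
    partial = ""
    for character in reply_text:
        partial += character
        # Each yield is a new list, so the caller's history is never mutated.
        yield chat_history + [(user_input, partial)]


history = [("hi", "hello")]
frames = list(stream_reply(history, "volume?", "71 l"))
print(len(frames))   # one frame per character of the reply
print(frames[-1])    # the final frame carries the full reply
```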

In [61]:

with gr.Blocks() as demo:
    gr.Markdown('# Q&A Bot with Hugging Face Models')
    with gr.Tab("Ask question related to bosch washer"):
        text_input = gr.Textbox()
        text_output = gr.Textbox()
        text_button = gr.Button("Build the Bot!!!")
        text_button.click(build_the_bot, text_input, text_output)
    

demo.queue().launch(debug=True)

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://61820caaa6d8975094.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces


Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 399, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1299, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1022, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "<ipython-input-55-b89af1cd8210>", line 3, in build_the_bot
    documents = [Document(t) for t in text_list]
  File "<

Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://61820caaa6d8975094.gradio.live


