<a href="https://colab.research.google.com/github/sonalben-ops/Sentiment_Analysis/blob/main/all_MiniLM_L6_v2_BudgetSpeech.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [17]:
!pip install sentence-transformers
!pip install pinecone-client
!pip install python-docx



In [18]:
import pinecone
from sentence_transformers import SentenceTransformer, util
import json
import os
from google.colab import files
import docx


In [19]:
# Initialize Pinecone
from pinecone import Pinecone, ServerlessSpec

api_key = 'c0173835-a69d-4864-bca8-a343e18a78aa'
pc = Pinecone(api_key=api_key)

# Create an index (if not already created)
index_name = 'chatbot-index-budget'
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=384,
        metric='cosine',
        spec=ServerlessSpec(
            cloud='aws',
            region='us-east-1'
        )
    )

# Connect to the index
index = pc.Index(index_name)


In [20]:
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

In [21]:
def read_word_file(file_path):
    doc = docx.Document(file_path)
    full_text = []
    for para in doc.paragraphs:
        full_text.append(para.text)
    return '\n'.join(full_text)

document_text = read_word_file('/content/budget_speech.docx')

# Split document into chunks (adjust chunk size as needed)
def split_text(text, chunk_size=100):
    sentences = text.split('. ')
    chunks = [' '.join(sentences[i:i + chunk_size]) for i in range(0, len(sentences), chunk_size)]
    return chunks

chunks = split_text(document_text)


In [23]:
print(len(chunks))

3


In [24]:
print(chunks[0])



Budget 2024-2025

Speech of
Nirmala Sitharaman
Minister of Finance
July 23, 2024
Hon’ble Speaker, 
	I present the Budget for 2024-25 
Introduction
The people of India have reposed their faith in the government led by the Hon’ble Prime Minister Shri Narendra Modi and re-elected it for a historic third term under his leadership We are grateful for their support, faith and trust in our policies We are determined to ensure that all Indians, regardless of religion, caste, gender and age, make substantial progress in realising their life goals and aspirations.
Global Context
The global economy, while performing better than expected, is still in the grip of policy uncertainties Elevated asset prices, political uncertainties and shipping disruptions continue to pose significant downside risks for growth and upside risks to inflation 
In this context, India’s economic growth  continues to be the shining exception and will remain so in the years ahead India’s inflation continues to be low, sta

In [25]:
print(chunks[1])

While being in the ‘special mention account’ (SMA) stage for reasons beyond their control, MSMEs need credit to continue their business and to avoid getting into the NPA stage Credit availability will be supported through a guarantee from a government promoted fund  
Mudra Loans
The limit of Mudra loans will be enhanced to ₹ 20 lakh from the current ₹ 10 lakh for those entrepreneurs who have availed and successfully repaid previous loans under the ‘Tarun’ category.
Enhanced scope for mandatory onboarding in TReDS
For facilitating MSMEs to unlock their working capital by converting their trade receivables into cash, I propose to reduce the turnover threshold of buyers for mandatory onboarding on the TReDS platform from ` 500 crore to ` 250 crore This measure will bring 22 more CPSEs and 7000 more companies onto the platform Medium enterprises will also be included in the scope of the suppliers.
SIDBI branches in MSME clusters
SIDBI will open new branches to expand its reach to serve all

In [26]:
print(chunks[2])

These will also facilitate improving the financial position of urban local bodies  
Labour related reforms
Services to Labour
Our government will facilitate the provision of a wide array of services to labour, including those for employment and skilling A comprehensive integration of e-shram portal with other portals will facilitate such one-stop solution Open architecture databases for the rapidly changing labour market, skill requirements and available job roles, and a mechanism to connect job-aspirants with potential employers and skill providers will be covered in these services 
Shram Suvidha & Samadhan Portal
Shram Suvidha and Samadhan portals will be revamped to enhance  ease of compliance for  industry and trade  
Capital and entrepreneurship related reforms
Financial sector vision and strategy
For meeting financing needs of the economy, our government will bring out a financial sector vision and strategy document to prepare the sector in terms of size, capacity and skills This

In [28]:
vectors = model.encode(chunks).tolist()
ids = [f'chunk_{i}' for i in range(len(chunks))]

# Upsert the vectors
index.upsert(vectors=zip(ids, vectors))

{'upserted_count': 3}

In [33]:
def query_pinecone(query, top_k=1):
    query_vector = model.encode(query).tolist()
    results = index.query(vector=query_vector, top_k=top_k)
    return results['matches']

def chatbot(query):
    results = query_pinecone(query)
    if results:
        most_relevant = results[0]
        chunk_id = most_relevant['id']
        chunk_text = chunks[int(chunk_id.split('_')[1])]
        return chunk_text[:200]  # Limit the response to 200 characters
    else:
        return "Sorry, I couldn't find an answer to your question."


In [34]:
user_query = "Please provide a summary of the total outlay in the 2024 budget."
response = chatbot(user_query)
print(response)




Budget 2024-2025

Speech of
Nirmala Sitharaman
Minister of Finance
July 23, 2024
Hon’ble Speaker, 
	I present the Budget for 2024-25 
Introduction
The people of India have reposed their faith in the
