# Getting Started with Granite Code

# Introduction - Text to SQL

In this notebook, you will learn how to convert natural language text to SQL using a large language model (LLM). DB2 is the database used in this example, and LangChain APIs are used to connect to the database and execute queries.

Note: This notebook was tested on Windows.

## Prerequisites

1. Python 3.11 or later
2. DB2 driver (DB2 is used in this example)
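3. Install the Python packages used in this notebook by running the following cell (or the same `pip install` command in your terminal or command prompt). `ibm_db` and `ibm_db_sa` are only needed for DB2. The `ibm_granite_community` helper package imported later must be installed separately.


In [None]:
!pip install langchain langchain-community replicate python-dotenv ibm_db ibm_db_sa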

## Getting the Variables from the .env File

This cell loads environment variables from a `.env` file and verifies that the API token is available. Create a `.env` file containing these variables:

REPLICATE_API_TOKEN="replicate token"

DATABASE_URI="database URI"


The URI format depends on the database you connect to:

DB2 - `db2+ibm_db://username:password@hostname:port/database`

Oracle - `oracle+cx_oracle://username:password@hostname:port/?service_name=service`

PostgreSQL - `postgresql://username:password@hostname:port/database`
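
Passwords that contain characters such as `@` or `/` can break these URIs unless they are percent-encoded. The following is a minimal sketch of assembling a URI safely with the standard library; `build_db_uri` and the credentials are hypothetical names and values used only for illustration, not part of LangChain or the DB2 driver.

```python
from urllib.parse import quote_plus


def build_db_uri(dialect: str, username: str, password: str,
                 hostname: str, port: int, database: str) -> str:
    """Assemble a SQLAlchemy-style URI, percent-encoding the credentials."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"{dialect}://{user}:{pwd}@{hostname}:{port}/{database}"


# Hypothetical values for illustration only
uri = build_db_uri("db2+ibm_db", "db2user", "p@ss/word", "localhost", 50000, "SAMPLE")
print(uri)
```

The resulting string can be stored as `DATABASE_URI` in the `.env` file.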

In [None]:
import os
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()

# Retrieve the API token directly from environment variables
replicate_api_token = os.getenv('REPLICATE_API_TOKEN')

# Report only whether the token was found; never print the secret itself
if replicate_api_token:
    print('API token loaded successfully.')
else:
    print('Failed to load API token. Please check your .env file.')
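
The check above covers only `REPLICATE_API_TOKEN`. As a possible variation, the sketch below fails fast when any required variable is missing and shows only the last few characters of a secret when logging. `require_env` and `mask_secret` are hypothetical helpers written for this example, and the token value is a placeholder.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set. Please check your .env file.")
    return value


def mask_secret(value: str, visible: int = 4) -> str:
    """Hide all but the last few characters of a secret for safe logging."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Placeholder value for the demo; in the notebook, load_dotenv() supplies the real one
os.environ.setdefault("REPLICATE_API_TOKEN", "r8_example_token_1234")
token = require_env("REPLICATE_API_TOKEN")
print("API token loaded:", mask_secret(token))
```

The same `require_env` call could be used for `DATABASE_URI` before connecting to the database.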

## Initialize the Model

In this section, we specify the model ID used to invoke an IBM Granite model on the Replicate platform. Depending on the task complexity, you can choose between models of different sizes, such as `8b` or `20b` (the number of parameters, in billions).

### Model Retrieval

Here, we retrieve the model using the `find_langchain_model` function, which locates and initializes a model based on the platform (`"Replicate"`) and the specified `model_id`.

In [3]:
from ibm_granite_community.langchain_utils import find_langchain_model
model_id = "ibm-granite/granite-8b-code-instruct-128k"
# model_id = "ibm-granite/granite-20b-code-instruct-8k"

granite_via_replicate = find_langchain_model(platform="Replicate", model_id=model_id)

## Connecting to the Database and Generating SQL with a Language Model

NOTE: Download and install the driver for the database you are using, and make a note of its installation path, because the script below uses it. The cell below connects to a DB2 database and sets the schema.

In [None]:
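import os
import re  # used later to extract the SQL statement from the model output

from langchain_community.utilities import SQLDatabase

# Add the directory containing the DB2 DLLs to the search path (Windows only).
# Adjust the path to match your DB2 installation.
os.add_dll_directory(r'C:\Program Files\IBM\SQLLIB\BIN')

import ibm_db_dbi  # imported after the DLL directory has been registered

# Initialize the SQLDatabase from the DATABASE_URI defined in the .env file
db = SQLDatabase.from_uri(os.getenv('DATABASE_URI'))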

# get_schema() fetches the schema on every call so that the prompt always reflects the current database
def get_schema():
    return db.get_table_info()

# Set the default schema for this connection
try:
    db.run("SET SCHEMA 'OMDB'")
except ibm_db_dbi.ProgrammingError as pe:
    print(f"SQL Programming Error: {pe}")
except ibm_db_dbi.OperationalError as oe:
    print(f"Operational Error: {oe}")
except Exception as e:
    print(f"General SQL Error: {e}")

# Fetch and print the schema information
try:
    schema_info = get_schema()
    print(schema_info)
except Exception as e:
    print(f"Error fetching schema information: {e}")
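
To get a feel for the kind of schema text that ends up in the prompt without a DB2 instance, the sketch below uses an in-memory SQLite database from the standard library as a stand-in. This is an illustration only: the table layout is hypothetical, `describe_schema` is a helper written for this example, and the exact text that LangChain's `get_table_info()` returns for DB2 will differ.

```python
import sqlite3

# In-memory SQLite database standing in for DB2; the table layout is hypothetical
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yfs_order_header ("
    "order_header_key TEXT PRIMARY KEY, order_no TEXT, total_amount REAL)"
)
conn.execute("INSERT INTO yfs_order_header VALUES ('K1', 'ORD-001', 99.5)")


def describe_schema(connection: sqlite3.Connection) -> str:
    """Return the CREATE TABLE statement of every user table, separated by blank lines."""
    rows = connection.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return "\n\n".join(row[0] for row in rows)


schema_text = describe_schema(conn)
print(schema_text)
```

The model sees table and column names in this form, which is why questions that mention the real table names tend to produce more accurate queries.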



## SQL Query Generation and Execution

A prompt is constructed to tell the language model what is expected. It includes the current schema of the database and explicitly asks for an SQL query that answers the given question.

The cell below sends the prompt to the model, extracts the first SQL statement from the response with a regular expression, and runs it against the database. Because the model may generate `INSERT`, `UPDATE`, or `DELETE` statements as well as `SELECT`, review the generated query before running it against data you care about.

In [None]:
# Define a question relevant to the database. For example, for an order management system, the questions could be 'Show all orders' or 'What is the total count of orders?'
question = "Show all orders from yfs_order_header"

# Create a prompt for the model to generate an SQL query
prompt = f"""Based on the table schema below, write an SQL query that would answer the user's question; just return the SQL query and nothing else.
Schema:
{get_schema()}

Question: {question}

SQL Query:"""

print("Prompt for the model:", prompt)

# Invoke the language model to generate the SQL query
# Default value, so the execution step below is skipped if the model call fails
cleaned_answer = "No valid SQL query found."
try:
    answer = granite_via_replicate.invoke(prompt)
    # Extract the first SQL statement: from a leading keyword up to the first semicolon
    match = re.search(r'\b(SELECT|INSERT|UPDATE|DELETE)\b.*?;', answer, re.IGNORECASE | re.DOTALL)
    if match:
        cleaned_answer = match.group(0).strip()
    print("Generated SQL Query:", cleaned_answer)
except Exception as e:
    print("Error invoking the model:", e)

# Execute the SQL query only if one was extracted from the model output
if cleaned_answer and cleaned_answer != "No valid SQL query found.":
    print("Final SQL Query to execute:", cleaned_answer)
    try:
        result = db.run(cleaned_answer)  # Execute the cleaned SQL command and fetch results
        print("Query Results:", result)
    except ibm_db_dbi.ProgrammingError as pe:
        print("SQL Programming Error:", pe)
    except Exception as e:
        print("Error executing the SQL query:", e)
else:
    print(cleaned_answer)
