## Using Hugging Face models on Amazon SageMaker for `text-to-sql` operations

### Environment setup (using conda)

Using Conda facilitates dependency management by creating isolated environments. Each environment can have distinct packages, preventing version conflicts. 

Once you open a new terminal, you can activate the base environment of this notebook and use pip within it, combining pip's extensive package availability with Conda's isolation benefits.

This approach ensures both flexibility and stability in projects, promoting consistent code execution across different setups.

#### Opening the Notebook Terminal

On the top pane of this notebook, you should see an icon that looks like a terminal (a square with an underscore).
Click on this icon. A new terminal tab will be opened.

In [None]:
# List the conda environments by running the following command in the terminal window (NOT IN THIS NOTEBOOK); omit the leading `!` there.

!conda env list

#### Activate a conda environment

In [None]:
# To activate a conda environment, run the following command in the terminal (without the leading `!`).

!conda activate base

<div class="alert alert-block alert-info"> <b>NOTE</b> After installing the dependencies from the CLI, you may see warnings that some dependencies are old or outdated. In this workshop we do not train any model; we only use an existing Hugging Face model, so these messages can be ignored. The code below works regardless of those warnings.</div>

In [None]:
!pip install transformers tensorflow torch torchvision pytorch_lightning sentencepiece
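
Because the packages are installed from the terminal while the code runs in the notebook kernel, it can help to confirm that the kernel actually sees them. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list is simply copied from the pip command above.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names copied from the pip install command above
required = ["transformers", "tensorflow", "torch", "torchvision",
            "pytorch_lightning", "sentencepiece"]

# Shows which Python interpreter (and therefore which environment) the kernel uses
print("Interpreter:", sys.executable)
for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If a package is reported as missing here even though pip installed it, the terminal and the notebook kernel are probably using different environments.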

In [3]:
import json

# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model_name = "mrm8488/t5-base-finetuned-wikiSQL"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

tokenizer = AutoTokenizer.from_pretrained(model_name)
generator = pipeline("text2text-generation", model=model, tokenizer=tokenizer)

2023-08-15 02:35:12.420438: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from your `config.json` file.
The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from your `config.json` file.
The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from your `config.json` file.
You are using the legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request avai

In [4]:

def text_to_SQL(input_text):

    # Prepend the "translate English to SQL" task prefix to the incoming prompt text
    output = generator("translate English to SQL: %s </s>" % input_text)

    # The pipeline returns a list with one dict per generated sequence; take the first
    result = output[0]

    # Return the generated response from the model
    return result['generated_text']
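
To see the prompt format and the shape of the pipeline's return value without downloading the model, the same logic can be exercised against a stand-in. This is only a sketch: `text_to_sql_with` and `fake_generator` are hypothetical names introduced here, and the stand-in merely mimics the list-of-dicts structure that a `text2text-generation` pipeline returns.

```python
def text_to_sql_with(generator, input_text):
    """Same logic as text_to_SQL, but the generator is passed in explicitly."""
    output = generator("translate English to SQL: %s </s>" % input_text)
    # A text2text-generation pipeline returns a list of dicts keyed by "generated_text"
    return output[0]["generated_text"]


def fake_generator(prompt):
    """Hypothetical stand-in for the pipeline; echoes the prompt so its format is visible."""
    return [{"generated_text": "-- prompt received: " + prompt}]


print(text_to_sql_with(fake_generator, "How many rows are there?"))
```

Passing the real `generator` object instead of `fake_generator` should give the same behavior as `text_to_SQL`.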

In [None]:
text_to_SQL("What is the average age of the respondents using a mobile device?")

'SELECT AVG Age (years) FROM table WHERE Device = mobile'
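
The generated query follows the WikiSQL style: the column name is unquoted, the table is the placeholder `table`, and the string literal has no quotes, so it will generally not run as-is against a real database. One way to check this, sketched below with Python's built-in `sqlite3` module, is to compile the query against an in-memory database. The `survey` schema and the helper `is_executable_sql` are assumptions made up for illustration, as is the hand-corrected query.

```python
import sqlite3


def is_executable_sql(sql, schema_ddl):
    """Return True if `sql` compiles against the schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        # EXPLAIN makes SQLite compile the statement without running the query itself
        conn.execute("EXPLAIN " + sql)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


# Hypothetical schema, invented for illustration only
schema = 'CREATE TABLE survey ("Age (years)" INTEGER, "Device" TEXT);'

# Query exactly as produced by the model
raw = "SELECT AVG Age (years) FROM table WHERE Device = mobile"
# Hand-corrected version with a real table name and proper quoting
fixed = 'SELECT AVG("Age (years)") FROM survey WHERE "Device" = \'mobile\''

print(is_executable_sql(raw, schema))
print(is_executable_sql(fixed, schema))
```

A check like this only confirms that the statement compiles against the given schema; it does not tell you whether the query answers the original question.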