
In [2]:
!pip install datasets transformers lightning


# Introduction

<hr>

**Fine-Tuning an NLP Model:**

Fine-tuning a natural language processing (NLP) model means taking a model that has already been pre-trained on a large, general text corpus and continuing to train its weights on a smaller dataset for a specific task. Along the way you choose training hyperparameters such as the learning rate, batch size, and number of epochs, and you may adapt the input format or, for some tasks, add a task-specific output layer. Fine-tuning is the usual way to adapt a pre-trained model to a new dataset or task; it can still be computationally expensive and requires a good understanding of both the model and the task at hand.

![Transformer.png](attachment:a9dd3e11-21f9-43cf-b20d-01823fe00b8f.png)

**Fine-tuning a machine learning model can offer several benefits:**

**Improved performance:** Fine-tuning can improve a model's performance on a specific task by updating its weights on data from that task, so that it adapts to the characteristics of the task and the dataset.

**Use of transfer learning:** Fine-tuning a pre-trained model can allow you to take advantage of transfer learning, which means using the knowledge learned by the model on one task to improve its performance on a related task. This can save time and resources compared to training a model from scratch.

**Fine-grained control:** Because you choose the training data, the input format, and the hyperparameters, fine-tuning gives you fine-grained control over the model's behavior.

**Customization:** Fine-tuning a model allows you to customize it for a specific task or dataset, which can be useful if you have unique requirements that are not met by off-the-shelf models.




# Import necessary libraries

In [3]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split


import torch
import datasets
from transformers import T5Tokenizer
from transformers import T5ForConditionalGeneration
from torch.optim import AdamW  # transformers.AdamW is deprecated in favor of torch.optim.AdamW

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

pl.seed_everything(100)

import warnings
warnings.filterwarnings("ignore")

INFO:lightning_fabric.utilities.seed:Seed set to 100


# About Hugging Face Models

<hr>

Hugging Face is a company that provides a platform for training and deploying natural language processing (NLP) models. The platform includes a library of pre-trained models that can be used for a variety of NLP tasks, such as language translation, text generation, and question answering. These models are trained on large datasets and are designed to perform well on a wide range of NLP tasks.

The Hugging Face platform also provides tools for fine-tuning pre-trained models on specific datasets, which can be useful for adapting models to specific domains or languages. Additionally, the platform provides APIs for accessing and using the pre-trained models in applications, as well as tools for building custom models and deploying them to the cloud.

There are several benefits to using the Hugging Face library for NLP tasks:

**Wide selection of models:** The Hugging Face library provides access to a large number of pre-trained NLP models, including models trained on tasks such as language translation, question answering, and text classification. This makes it easy to find a model that is suited to your specific needs.

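**Support for multiple frameworks:** The Hugging Face Transformers library works with PyTorch, TensorFlow, and JAX/Flax; the same architecture is often available in all three, for example as `T5ForConditionalGeneration`, `TFT5ForConditionalGeneration`, and `FlaxT5ForConditionalGeneration`. This makes it easy to incorporate into your existing workflow.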

**Easy fine-tuning:** The Hugging Face library includes tools for fine-tuning pre-trained models on your own dataset, which can save you time and effort compared to training a model from scratch.

**Active community:** The Hugging Face library has a large and active community of users, which means that you can get help and support when you need it, and also contribute to the development of the library.

**Well-documented:** The Hugging Face library has comprehensive documentation, which makes it easy to get started and learn how to use the library effectively.

# Import Dataset

In [4]:
dataset = datasets.load_dataset('Hakeem750/know__sql')


Generating train split:   0%|          | 0/44510 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/4946 [00:00<?, ? examples/s]

In [13]:
df = dataset["train"].to_pandas()
test = dataset["test"].to_pandas()

In [6]:
df.head()

|   | context | question | answer |
|---|---|---|---|
| 0 | CREATE TABLE table_name_20 (programming VARCHA... | What is the programming for Kanal 5? | SELECT programming FROM table_name_20 WHERE na... |
| 1 | CREATE TABLE table_21091127_1 (opponent VARCHA... | Which opponent has a record of 6-2? | SELECT opponent FROM table_21091127_1 WHERE re... |
| 2 | CREATE TABLE table_name_91 (lost INTEGER, poin... | Which Lost has Points larger than 13? | SELECT AVG(lost) FROM table_name_91 WHERE poin... |
| 3 | CREATE TABLE table_name_10 (rank INTEGER, game... | Which lowest rank(player) has a rebound averag... | SELECT MIN(rank) FROM table_name_10 WHERE reb_... |
| 4 | CREATE TABLE table_name_97 (bronze INTEGER, na... | Which Bronze has a Gold smaller than 16, a Ran... | SELECT AVG(bronze) FROM table_name_97 WHERE go... |


In [14]:
df = df[['context','question', 'answer']]
test = test[['context','question', 'answer']]

In [8]:
print("Number of records: ", df.shape[0])

Number of records:  44510


# Problem Statement

<hr>

#### "To build a model that generates a SQL query from a table schema (the context) and a natural-language question"

Each example in the dataset pairs a `CREATE TABLE` statement (context) and a question with the SQL query that answers it (answer). For example,

*Context = "CREATE TABLE table_name_60 (house VARCHAR, abbr VARCHAR)"*

*Question = "Which house has an abbreviation of G?"*

*Answer =  ????????????????????????????????*


In [9]:
# Optional preprocessing, left disabled: lowercase the questions.
# df["question"] = df["question"].str.lower()

df.head()



# Initialize Parameters

<hr>


**Input length:** The maximum number of tokens in a single encoded input. For T5 the tokens are subword pieces rather than whole words, and here the input is the table schema followed by the question. Longer inputs are truncated to this length and shorter ones are padded up to it.

**Output length:** The maximum number of tokens in a single target sequence, here the SQL query the model should learn to generate. Targets are padded or truncated to this length during training.

**Training batch size:** This refers to the number of examples that are processed by the model at a time during training. For example, if the training batch size is 32, the model will process 32 examples (e.g. 32 sentences) at a time before updating the model weights.

**Validating batch size:** This is similar to the training batch size, but it refers to the number of examples that are processed by the model at a time during validation (i.e. when the model is being evaluated on a hold-out dataset).

**Epochs:** An epoch is a single pass through the entire training dataset. If the training dataset contains 1000 examples and the training batch size is 32, one epoch takes ⌈1000 / 32⌉ = 32 training steps (31 full batches plus a final batch of 8 examples). If the model is trained for 10 epochs, it processes a total of 10 × 1000 = 10,000 examples.
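
The step count above is a ceiling division, because the final, smaller batch still counts as a step when the DataLoader keeps it (its default, `drop_last=False`). A small sketch of that arithmetic, assuming one optimizer step per batch (no gradient accumulation), applied both to the example above and to this notebook's settings (44,510 training examples, 4,946 validation examples, `TRAIN_BATCH_SIZE = 8`, `VALID_BATCH_SIZE = 2`, `EPOCHS = 5`); the helper name `steps_per_epoch` is our own:

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of steps in one epoch when the last, partial batch is kept."""
    return math.ceil(num_examples / batch_size)


print(steps_per_epoch(1000, 32))      # 32: 31 full batches plus one batch of 8
print(steps_per_epoch(44510, 8))      # 5564 training steps per epoch in this notebook
print(steps_per_epoch(44510, 8) * 5)  # 27820 training steps for EPOCHS = 5
print(steps_per_epoch(4946, 2))       # 2473 validation batches per evaluation
```

The 5564 figure agrees with the training log, which reports `global step 5564` at the end of epoch 0.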

In [10]:
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
INPUT_MAX_LEN = 512  # Maximum input length in tokens (schema + question)
OUT_MAX_LEN = 128  # Maximum output length in tokens (SQL query)
TRAIN_BATCH_SIZE = 8  # Training batch size
VALID_BATCH_SIZE = 2  # Validation batch size
EPOCHS = 5  # Number of epochs (full passes over the training set)

# T5 Transformer

<hr>

T5 (Text-to-Text Transfer Transformer) is based on the Transformer architecture, a type of neural network designed to process sequences of tokens. It consists of an encoder, which reads the input sequence, and a decoder, which generates the output sequence; both are stacks of identical layers (12 each in `t5-base`).

Each encoder layer combines a self-attention block with a feed-forward network, and each decoder layer adds a cross-attention block that attends to the encoder's output. Attention lets the model weigh different parts of a sequence when computing the representation of each position, while the feed-forward networks transform each position independently using learned weights.

Self-attention allows each element in the input sequence to attend to all the other elements in the sequence (in the decoder, only to the positions before it). This enables the model to capture relationships between words and phrases, which is important for many NLP tasks.
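
To make "each element attends to all the other elements" concrete, here is a minimal pure-Python sketch of single-head scaled dot-product self-attention as defined in the original Transformer paper. It is an illustration only: T5's actual layers use multiple heads, learned relative position biases, and other details omitted here, and the token vectors and identity projection matrices are made-up toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(vector, matrix):
    """Multiply a row vector (length d) by a d x k matrix given as a list of rows."""
    return [sum(vector[i] * matrix[i][j] for i in range(len(vector)))
            for j in range(len(matrix[0]))]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    tokens: list of n token vectors; w_q, w_k, w_v: projection matrices.
    Returns (outputs, weights): one output vector and one row of attention
    weights per token, where weights[i][j] is how much token i attends to token j.
    """
    queries = [project(t, w_q) for t in tokens]
    keys = [project(t, w_k) for t in tokens]
    values = [project(t, w_v) for t in tokens]
    scale = math.sqrt(len(keys[0]))

    outputs, weights = [], []
    for q in queries:
        # Compare this token's query with every token's key ...
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        # ... and mix all value vectors according to the resulting weights.
        outputs.append([sum(attn[j] * values[j][c] for j in range(len(values)))
                        for c in range(len(values[0]))])
    return outputs, weights


identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d token embeddings
outputs, weights = self_attention(tokens, identity, identity, identity)
```

Each row of `weights` sums to 1, so every output vector is a weighted average of all the value vectors.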

On top of the decoder sits a language-modeling head: a linear layer that maps each decoder output to a score for every token in the vocabulary. These scores are used to pick the next output token given the tokens generated so far, which is what lets the model produce translations, free text, or, in this notebook, SQL queries.
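
The loop that turns next-token scores into an output sequence can be sketched as greedy decoding: score the vocabulary given the tokens so far, append the best token, and stop at `</s>`. This is a simplified stand-in: the notebook's `generate` call uses beam search (`num_beams=4`), which keeps several candidate sequences instead of one, and `toy_scores` is a hypothetical scoring function that replaces the real decoder and language-modeling head. The ids 0 (decoder start, which is also `<pad>`) and 1 (`</s>`) match T5's vocabulary; the other ids are arbitrary.

```python
DECODER_START_ID = 0  # T5 starts decoding from the <pad> id
EOS_ID = 1            # </s> ends the sequence


def greedy_decode(next_token_scores, max_length=10):
    """Generate ids one at a time, always taking the highest-scoring next token.

    next_token_scores(prefix) must return one score per vocabulary id.
    """
    sequence = [DECODER_START_ID]
    for _ in range(max_length):
        scores = next_token_scores(sequence)
        next_id = max(range(len(scores)), key=lambda i: scores[i])
        sequence.append(next_id)
        if next_id == EOS_ID:
            break
    return sequence[1:]  # drop the start token


# Hypothetical 5-token vocabulary; this scorer prefers ids 3, 4, 2, then </s>.
PREFERRED = [3, 4, 2, EOS_ID]


def toy_scores(prefix):
    step = len(prefix) - 1
    wanted = PREFERRED[step] if step < len(PREFERRED) else EOS_ID
    return [1.0 if i == wanted else 0.0 for i in range(5)]


print(greedy_decode(toy_scores))  # [3, 4, 2, 1]
```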

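T5 casts every task as text-to-text: the model receives a string and produces a string, so the same architecture can be fine-tuned to turn a table schema and a question into a SQL query. It was pre-trained on the Colossal Clean Crawled Corpus (C4) and achieved state-of-the-art results on many NLP benchmarks when it was published. The `t5-base` checkpoint used here has about 220 million parameters (the training log reports 222 M).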

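Research paper: Raffel et al., "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer", https://arxiv.org/pdf/1910.10683.pdf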

# T5Tokenizer

<hr>

T5Tokenizer converts a piece of text into a sequence of tokens and maps each token to an integer id. The tokens are subword units rather than whole words: a frequent word may be a single token, while a rare word is split into several pieces.

T5Tokenizer is built on SentencePiece; the `spiece.model` file it downloads is the SentencePiece model. SentencePiece learns a vocabulary of 32,000 subword pieces from how often character sequences occur in the pre-training data, and it treats whitespace as an ordinary symbol (written `▁`). Because most out-of-vocabulary (OOV) words can still be assembled from known pieces, the tokenizer can represent words that never appeared in its training data; only characters missing from the vocabulary altogether map to the unknown token.
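
To illustrate how an unseen word can still be represented, here is a toy greedy longest-match splitter. It is deliberately *not* SentencePiece: T5's tokenizer chooses pieces with a learned unigram language model, whereas this sketch simply takes the longest matching piece from left to right. The vocabulary is hypothetical; the word `habarugira` comes from one of the dataset's test questions.

```python
def greedy_subword_split(word, vocab, unk="<unk>"):
    """Split `word` into vocabulary pieces, taking the longest match at each position."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest candidate first and shrink it one character at a time.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # Not even the single character is known: fall back to the unknown token.
            pieces.append(unk)
            start += 1
    return pieces


# Hypothetical toy vocabulary, not T5's real one.
toy_vocab = {"ha", "ba", "ru", "gi", "ra", "scrum", "half",
             "a", "b", "g", "h", "i", "r", "u"}

print(greedy_subword_split("habarugira", toy_vocab))  # ['ha', 'ba', 'ru', 'gi', 'ra']
print(greedy_subword_split("hax", toy_vocab))         # ['ha', '<unk>']
```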

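T5Tokenizer also uses a few special tokens. Unlike some other tokenizers, it does not add a beginning-of-sequence token such as `<s>`; instead it appends the end-of-sequence token `</s>` to each sequence (for a pair such as context and question, after each of the two), fills sequences up to a fixed length with `<pad>`, and uses `<unk>` for anything it cannot represent. It also reserves 100 sentinel tokens, `<extra_id_0>` to `<extra_id_99>`, which were used during pre-training.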
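
The layout that results for a (context, question) pair can be sketched in plain Python. This is a simplified mimic written under our own assumptions, not the library's code: it takes already-tokenized ids (the values below are made up), appends `</s>` (id 1) after each sequence, shortens only the first sequence when the total is too long (as `truncation='only_first'` does in this notebook), and pads with `<pad>` (id 0) up to `max_length`, returning the matching attention mask.

```python
EOS_ID = 1  # </s>
PAD_ID = 0  # <pad>


def build_t5_pair(first_ids, second_ids, max_length):
    """Return (input_ids, attention_mask) laid out as: first </s> second </s> <pad>..."""
    overflow = len(first_ids) + len(second_ids) + 2 - max_length  # +2 for the two </s>
    if overflow > 0:
        if overflow >= len(first_ids):
            raise ValueError("first sequence is too short to truncate to max_length")
        first_ids = first_ids[: len(first_ids) - overflow]  # truncate only the first
    input_ids = first_ids + [EOS_ID] + second_ids + [EOS_ID]
    attention_mask = [1] * len(input_ids)  # 1 = real token, 0 = padding
    padding = max_length - len(input_ids)
    return input_ids + [PAD_ID] * padding, attention_mask + [0] * padding


context_ids = [10, 11, 12]  # stand-in ids for a CREATE TABLE statement
question_ids = [20, 21]     # stand-in ids for the question
print(build_t5_pair(context_ids, question_ids, 8))
# ([10, 11, 12, 1, 20, 21, 1, 0], [1, 1, 1, 1, 1, 1, 1, 0])
print(build_t5_pair(context_ids, question_ids, 6))
# ([10, 11, 1, 20, 21, 1], [1, 1, 1, 1, 1, 1])
```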
    

In [11]:
MODEL_NAME = "t5-base"

tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME, model_max_length= INPUT_MAX_LEN)


You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [12]:
print("eos_token: {} and id: {}".format(tokenizer.eos_token, tokenizer.eos_token_id))  # End-of-sequence token
print("unk_token: {} and id: {}".format(tokenizer.unk_token, tokenizer.unk_token_id))  # Unknown token
print("pad_token: {} and id: {}".format(tokenizer.pad_token, tokenizer.pad_token_id))  # Padding token

eos_token: </s> and id: 1
unk_token: <unk> and id: 2
pad_token: <pad> and id: 0


# Dataset Preparation

<hr>

When working with PyTorch, you typically use a dataset class to prepare your data for the model. The dataset class is responsible for loading the data and performing any necessary preprocessing steps, such as tokenization and numericalization (mapping tokens to integer ids). A map-style dataset implements `__len__` and `__getitem__`; the latter retrieves a single item by index.

The `__init__` method stores the contexts (table schemas), the questions, and the target SQL queries; the tokenizer and the maximum lengths come from the global settings defined earlier.

The `__len__` method returns the number of samples in the dataset.

The `__getitem__` method takes an index and returns the tokenized input, its attention mask, and the tokenized target (labels), together with the raw strings.

It also pads and truncates the tokenized inputs and targets to fixed lengths and returns them as tensors. Padding positions in the labels are replaced with -100, the value the loss function ignores, so the model is not trained to predict padding.
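
A stripped-down version of the same pattern, without PyTorch or a real tokenizer, shows the three methods and the label masking in isolation. It is a sketch: `ToySeq2SeqDataset` and its token ids are hypothetical, the sequences are assumed to be already tokenized (ending in `</s>`, id 1), and plain lists stand in for tensors. The key line replaces padding (id 0) in the labels with -100, the default `ignore_index` of PyTorch's cross-entropy loss.

```python
PAD_ID = 0           # T5's <pad> id
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


class ToySeq2SeqDataset:
    """Map-style dataset over pre-tokenized (source, target) id lists."""

    def __init__(self, sources, targets, in_max_len, out_max_len):
        self.sources = sources
        self.targets = targets
        self.in_max_len = in_max_len
        self.out_max_len = out_max_len

    def __len__(self):
        return len(self.sources)

    @staticmethod
    def _pad_or_truncate(ids, length):
        ids = ids[:length]
        return ids + [PAD_ID] * (length - len(ids))

    def __getitem__(self, idx):
        input_ids = self._pad_or_truncate(self.sources[idx], self.in_max_len)
        # Toy data never uses id 0 for a real token, so the mask can be read off the ids.
        attention_mask = [0 if t == PAD_ID else 1 for t in input_ids]
        labels = self._pad_or_truncate(self.targets[idx], self.out_max_len)
        # Padding in the labels must not be learned: mark it with the ignore value.
        labels = [IGNORE_INDEX if t == PAD_ID else t for t in labels]
        return {"inputs_ids": input_ids, "attention_mask": attention_mask, "targets": labels}


ds = ToySeq2SeqDataset(sources=[[10, 11, 1], [12, 1]], targets=[[30, 31, 1], [32, 1]],
                       in_max_len=4, out_max_len=4)
print(len(ds))  # 2
print(ds[1])    # {'inputs_ids': [12, 1, 0, 0], 'attention_mask': [1, 1, 0, 0], 'targets': [32, 1, -100, -100]}
```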

In [15]:
class T5Dataset:

    def __init__(self, context, question, target):
        self.context = context
        self.question = question
        self.target = target
        self.tokenizer = tokenizer
        self.input_max_len = INPUT_MAX_LEN
        self.out_max_len = OUT_MAX_LEN

    def __len__(self):
        return len(self.context)

    def __getitem__(self, item):
        context = str(self.context[item])
        context = " ".join(context.split())

        question = str(self.question[item])
        question = " ".join(question.split())

        target = str(self.target[item])
        target = " ".join(target.split())


        inputs_encoding = self.tokenizer(
            context,
            question,
            add_special_tokens=True,
            max_length=self.input_max_len,
            padding = 'max_length',
            truncation='only_first',
            return_attention_mask=True,
            return_tensors="pt"
        )


        output_encoding = self.tokenizer(
            target,
            None,
            add_special_tokens=True,
            max_length=self.out_max_len,
            padding = 'max_length',
            truncation= True,
            return_attention_mask=True,
            return_tensors="pt"
        )


        inputs_ids = inputs_encoding["input_ids"].flatten()
        attention_mask = inputs_encoding["attention_mask"].flatten()
        labels = output_encoding["input_ids"]

        labels[labels == self.tokenizer.pad_token_id] = -100  # Ignore padding positions in the loss (T5's pad id is 0)

        labels = labels.flatten()

        out = {
            "context": context,
            "question": question,
            "answer": target,
            "inputs_ids": inputs_ids,
            "attention_mask": attention_mask,
            "targets": labels
        }


        return out

# DataLoader

<hr>

The DataLoader class wraps a dataset and yields mini-batches of samples, optionally shuffling them and loading them in parallel with worker processes. Because `T5Dataset` tokenizes each example on the fly in `__getitem__`, the full tokenized dataset never has to be held in memory at once. In this notebook the DataLoaders are created inside a `LightningDataModule`, which the Trainer asks for the training and validation batches.

When training a transformer model, the DataLoader iterates through the dataset and returns one batch at a time to the model for training or evaluation. Its main options include the batch size, the number of worker processes used to load data (`num_workers`), and whether to reshuffle the data at every epoch (`shuffle`). Its default collate function stacks the tensors of a batch into one tensor per key and gathers string fields, such as the raw context, into lists.
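
Conceptually, the sampling and batching that `DataLoader(..., batch_size=..., shuffle=True)` performs can be sketched as below. This is a simplified, single-process illustration (no worker processes and no collation), and `iterate_minibatches` is a name of our own: the indices are shuffled and cut into consecutive batches, with the final batch allowed to be smaller.

```python
import random


def iterate_minibatches(num_samples, batch_size, shuffle=True, seed=None):
    """Yield lists of sample indices, one list per batch."""
    indices = list(range(num_samples))
    if shuffle:
        # Use a different seed (or None) per epoch to get a new order each time.
        random.Random(seed).shuffle(indices)
    for start in range(0, num_samples, batch_size):
        yield indices[start:start + batch_size]  # the last batch may be shorter


batches = list(iterate_minibatches(10, 4, seed=0))
print([len(b) for b in batches])  # [4, 4, 2]
```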

In [19]:
class T5DatasetModule(pl.LightningDataModule):

    def __init__(self, df_train, df_valid):
        super().__init__()
        self.df_train = df_train
        self.df_valid = df_valid
        self.tokenizer = tokenizer
        self.input_max_len = INPUT_MAX_LEN
        self.out_max_len = OUT_MAX_LEN


    def setup(self, stage=None):

        self.train_dataset = T5Dataset(
        context=self.df_train.context.values,
        question=self.df_train.question.values,
        target=self.df_train.answer.values
        )

        self.valid_dataset = T5Dataset(
        context=self.df_valid.context.values,
        question=self.df_valid.question.values,
        target=self.df_valid.answer.values
        )

    def train_dataloader(self):
        return torch.utils.data.DataLoader(
         self.train_dataset,
         batch_size= TRAIN_BATCH_SIZE,
         shuffle=True,
         num_workers=4
        )


    def val_dataloader(self):
        return torch.utils.data.DataLoader(
         self.valid_dataset,
         batch_size= VALID_BATCH_SIZE,
         num_workers=1
        )

# Model Building

<hr>

When building a model in PyTorch, you typically define a class that inherits from **torch.nn.Module**. Here the class inherits from **pl.LightningModule**, a subclass of `torch.nn.Module` that adds hooks for training, validation, and optimizer setup.

The `__init__` method defines the model's components and assigns them as attributes of the class.

The **forward method** performs the forward pass: it takes the input data, passes it through the model's layers, and returns the output.

In `T5Model`, `__init__` does not build layers from scratch: it loads the pre-trained `t5-base` encoder-decoder with `T5ForConditionalGeneration.from_pretrained` and stores it as `self.model`.

Its **forward method** takes `input_ids`, `attention_mask`, and optional `labels`, passes them to the wrapped T5 model, and returns the loss (computed only when labels are given) and the logits.
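
When `labels` are passed, `T5ForConditionalGeneration` does not need separate decoder inputs: it derives them from the labels by shifting them one position to the right and inserting the decoder start token (the `<pad>` id, 0, for T5) in front. This is teacher forcing: at every position the decoder sees the correct previous tokens. The pure-Python sketch below reflects our understanding of that behavior rather than the library's code, and the token ids are made up.

```python
IGNORE_INDEX = -100


def shift_right(labels, decoder_start_token_id=0, pad_token_id=0):
    """Build decoder inputs for teacher forcing from one label sequence.

    Position t of the result holds the label from position t - 1, so the
    decoder predicts labels[t] after seeing the correct tokens before it.
    """
    shifted = [decoder_start_token_id] + labels[:-1]
    # -100 only marks ignored label positions; the decoder needs a real token id there.
    return [pad_token_id if t == IGNORE_INDEX else t for t in shifted]


labels = [71, 72, 1, IGNORE_INDEX, IGNORE_INDEX]  # made-up ids ending in </s>, then padding
print(shift_right(labels))  # [0, 71, 72, 1, 0]
```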

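When training with PyTorch Lightning, the process is organized around two main hooks: **the training step** and **the validation step**.

The **training_step** method defines the logic for a single training step, which typically includes:

- forward pass through the model
- computing the loss
- computing gradients
- updating the model's parameters

In PyTorch Lightning, `training_step` only has to perform the first two and return the loss; with the default automatic optimization, the Trainer computes the gradients and calls the optimizer returned by `configure_optimizers`.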

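The **validation_step** method is similar to `training_step`, but it evaluates the model on the validation set and does not update its weights. It typically includes:

- forward pass through the model
- computing the evaluation metrics (here, only the validation loss, logged as `val_loss` and monitored by `ModelCheckpoint`)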
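
The loss returned by `forward` is token-level cross-entropy over the vocabulary, averaged over the target positions. The sketch below assumes, as the -100 label masking in `T5Dataset` implies, that positions labeled -100 are skipped; the logits and labels are toy values, and returning 0.0 when every position is ignored is a simplification of ours.

```python
import math

IGNORE_INDEX = -100


def masked_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over positions whose label is not ignore_index.

    logits: one list of vocabulary scores per position; labels: one id per position.
    """
    total, count = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == ignore_index:
            continue  # padding position: contributes nothing to the loss
        top = max(scores)
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_sum_exp - scores[label]  # -log softmax(scores)[label]
        count += 1
    return total / count if count else 0.0


logits = [[0.0, 0.0], [9.0, 1.0]]
print(masked_cross_entropy(logits, [1, IGNORE_INDEX]))  # log(2) ≈ 0.6931; second position ignored
```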

In [20]:
class T5Model(pl.LightningModule):

    def __init__(self):
        super().__init__()
        self.model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME, return_dict=True)

    def forward(self, input_ids, attention_mask, labels=None):

        output = self.model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=labels
        )

        return output.loss, output.logits


    def training_step(self, batch, batch_idx):

        input_ids = batch["inputs_ids"]
        attention_mask = batch["attention_mask"]
        labels= batch["targets"]
        loss, outputs = self(input_ids, attention_mask, labels)


        self.log("train_loss", loss, prog_bar=True, logger=True)

        return loss

    def validation_step(self, batch, batch_idx):
        input_ids = batch["inputs_ids"]
        attention_mask = batch["attention_mask"]
        labels= batch["targets"]
        loss, outputs = self(input_ids, attention_mask, labels)

        self.log("val_loss", loss, prog_bar=True, logger=True)

        return loss


    def configure_optimizers(self):
        return AdamW(self.parameters(), lr=0.0001)

# Model Training

<hr>

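Training a transformer model typically involves iterating through the dataset in batches, passing the data through the model, and updating the model's parameters based on the computed gradients and an optimization rule (here AdamW). In this notebook, PyTorch Lightning's `Trainer` runs that loop and validates after every epoch, while `ModelCheckpoint` keeps the two checkpoints with the lowest `val_loss`.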
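
Stripped of the transformer, the loop the Trainer runs looks like the sketch below: shuffle, split into mini-batches, run a forward pass, compute gradients of the loss, and update the parameters. To stay self-contained it fits a hypothetical one-variable linear model with plain stochastic gradient descent on mean squared error; the notebook itself fine-tunes T5 with AdamW and a cross-entropy loss, and gets its gradients from autograd rather than hand-written formulas.

```python
import random


def train_linear_model(data, epochs=100, batch_size=4, lr=0.1, seed=0):
    """Fit y ≈ w * x + b with mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0  # starting parameters (fine-tuning would start from pre-trained ones)
    for _ in range(epochs):
        rng.shuffle(data)  # reshuffle every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Forward pass: prediction errors for this batch.
            errors = [(w * x + b) - y for x, y in batch]
            # Gradients of the batch's mean squared error with respect to w and b.
            grad_w = sum(2 * e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            grad_b = sum(2 * e for e in errors) / len(batch)
            # Update step.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


points = [(i / 10, 3 * (i / 10) + 1) for i in range(20)]  # toy data on y = 3x + 1
w, b = train_linear_model(points)
print(round(w, 3), round(b, 3))  # close to 3 and 1
```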

In [24]:
def run():

    df_train, df_valid = df, test

    df_train = df_train.fillna("none")
    df_valid = df_valid.fillna("none")

    df_train['context'] = df_train['context'].apply(lambda x: " ".join(x.split()))
    df_valid['context'] = df_valid['context'].apply(lambda x: " ".join(x.split()))

    df_train['answer'] = df_train['answer'].apply(lambda x: " ".join(x.split()))
    df_valid['answer'] = df_valid['answer'].apply(lambda x: " ".join(x.split()))

    df_train['question'] = df_train['question'].apply(lambda x: " ".join(x.split()))
    df_valid['question'] = df_valid['question'].apply(lambda x: " ".join(x.split()))


    df_train = df_train.reset_index(drop=True)
    df_valid = df_valid.reset_index(drop=True)

    dataModule = T5DatasetModule(df_train, df_valid)
    dataModule.setup()

    device = DEVICE
    models = T5Model()
    models.to(device)

    checkpoint_callback  = ModelCheckpoint(
        dirpath="./",
        filename="best_checkpoint",
        save_top_k=2,
        verbose=True,
        monitor="val_loss",
        mode="min"
    )

    trainer = pl.Trainer(
        callbacks = checkpoint_callback,
        max_epochs= EPOCHS,
        accelerator="auto"
    )

    trainer.fit(models, dataModule)

run()

INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.callbacks.model_summary:
  | Name  | Type                       | Params
-----------------------------------------------------
0 | model | T5ForConditionalGeneration | 222 M 
-----------------------------------------------------
222 M     Trainable params
0         Non-trainable params
222 M     Total params
891.614   Total estimated model params size (MB)



INFO:pytorch_lightning.utilities.rank_zero:Epoch 0, global step 5564: 'val_loss' reached 0.04067 (best 0.04067), saving model to '/content/best_checkpoint.ckpt' as top 2


# Model Prediction

<hr>

In [38]:
train_model = T5Model.load_from_checkpoint("./best_checkpoint.ckpt").to(DEVICE)

train_model.freeze()

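def generate_question(context, question):
    """Generate a SQL query for `question` given the table schema `context`."""

    # Encode the pair exactly as during training: context </s> question </s>
    inputs_encoding = tokenizer(
        context,
        question,
        add_special_tokens=True,
        max_length=INPUT_MAX_LEN,
        padding='max_length',
        truncation='only_first',
        return_attention_mask=True,
        return_tensors="pt"
    ).to(DEVICE)

    # Beam search with 4 beams; the output is capped at the target length used in training.
    # Note: no_repeat_ngram_size=2 forbids any token bigram from appearing twice,
    # which can block legitimately repeated SQL fragments (e.g. in multi-condition WHERE clauses).
    generate_ids = train_model.model.generate(
        input_ids=inputs_encoding["input_ids"],
        attention_mask=inputs_encoding["attention_mask"],
        max_length=OUT_MAX_LEN,
        num_beams=4,
        num_return_sequences=1,
        no_repeat_ngram_size=2,
        early_stopping=True,
    )

    # Decode token ids back to text, dropping special tokens such as </s> and <pad>
    preds = [
        tokenizer.decode(gen_id, skip_special_tokens=True, clean_up_tokenization_spaces=True)
        for gen_id in generate_ids
    ]

    return "".join(preds)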



## Prediction

In [25]:
test.head()

|   | context | question | answer |
|---|---|---|---|
| 0 | CREATE TABLE table_name_24 (weight__kg_ INTEGE... | What is the highest weight of the position scr... | SELECT MAX(weight__kg_) FROM table_name_24 WHE... |
| 1 | CREATE TABLE table_name_23 (money___ INTEGER, ... | Which money has a Country of united states, an... | SELECT AVG(money___) AS $__ FROM table_name_23... |
| 2 | CREATE TABLE table_name_19 (year INTEGER, engi... | Name the average yeara for engine of renault v... | SELECT AVG(year) FROM table_name_19 WHERE engi... |
| 3 | CREATE TABLE table_name_16 (transfer_window VA... | What is the transfer period for habarugira? | SELECT transfer_window FROM table_name_16 WHER... |
| 4 | CREATE TABLE table_24565004_20 (appearances¹ V... | Name the number of appearances for yugoslavia | SELECT COUNT(appearances¹) FROM table_24565004... |


In [27]:
test["context"][0]

'CREATE TABLE table_name_24 (weight__kg_ INTEGER, position VARCHAR)'

In [28]:
context = test["context"][0]

In [30]:
test["question"][0]

'What is the highest weight of the position scrum half?'

In [31]:
que = test["question"][0]

In [39]:
print(generate_question(context, que))

SELECT MAX(weight__kg_) FROM table_name_24 WHERE position = "scram half"


In [54]:
idx = 100

context = test["context"][idx]

que = test["question"][idx]
print(f"Schema: {context}", end="\n\n\n")
print(f"Que: {que}", end="\n\n\n")
print(f"True: {test['answer'][idx]}", end="\n\n\n")
print(f"Predicted: {generate_question(context, que)}")

Schema: CREATE TABLE table_name_60 (house VARCHAR, abbr VARCHAR)


Que: Which house has an abbreviation of G?


True: SELECT house FROM table_name_60 WHERE abbr = "g"


Predicted: SELECT house FROM table_name_60 WHERE abbr = "g"


In [64]:
from huggingface_hub import notebook_login

notebook_login()


In [65]:
from transformers import T5ForConditionalGeneration, AutoTokenizer

In [59]:
model_name = "t5-know-sql-generation"

In [61]:
train_model.model.save_pretrained(f"./{model_name}")

In [None]:
tokenizer.save_pretrained(f"./{model_name}")  # Save the tokenizer next to the fine-tuned weights

In [69]:
# tokenizer = AutoTokenizer.from_pretrained(f"./{model_name}")

In [70]:

pt_model = T5ForConditionalGeneration.from_pretrained(f"./{model_name}")

tokenizer.push_to_hub(model_name)
pt_model.push_to_hub(model_name)


CommitInfo(commit_url='https://huggingface.co/Hakeem750/t5-know-sql-generation/commit/839dc1726ecf57a881d285c3883c1905207f32e9', commit_message='Upload T5ForConditionalGeneration', commit_description='', oid='839dc1726ecf57a881d285c3883c1905207f32e9', pr_url=None, pr_revision=None, pr_num=None)