 # Hugging Face's Transformers Library

 **This notebook is best run in Google Colab.**
 
 Hugging Face Transformers is an open-source framework for deep learning created by Hugging Face.
 It provides APIs and tools to download state-of-the-art pre-trained models and fine-tune them to maximize performance.
 These models support common tasks in different modalities, such as natural language processing, computer vision, audio, and multi-modal applications.
 Using pretrained models can reduce your compute costs and carbon footprint,
 and save you the time and resources required to train a model from scratch.

 - https://huggingface.co/docs/transformers/index
 - https://huggingface.co/docs/hub/index

 The Accelerate library helps users train a 🤗 Transformers model on any type of distributed setup,
 whether it is multiple GPUs on one machine or multiple GPUs across several machines.

 Note: To use the Llama model, you first need to request access from Meta. You can do so by filling out this [form](https://ai.meta.com/resources/models-and-libraries/llama-downloads/).


In [25]:
# Install the libraries in Google Colab (uncomment the line below)
# !pip install -q transformers langchain huggingface_hub accelerate sentencepiece

In [1]:
# We need to log in to Hugging Face to download gated models (such as Llama 2)
# and to call the Inference API. This step requires a free Hugging Face access token.

from huggingface_hub import login
from langchain import HuggingFacePipeline
from transformers import AutoTokenizer
import transformers
import torch

from langchain import PromptTemplate,  LLMChain


# Log in to Hugging Face using your token.
login("YOUR API KEY")  # Hugging Face access token



Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /Users/kdlx593/.cache/huggingface/token
Login successful
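
Hard-coding a token in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, assuming the token has been stored in an environment variable named `HF_TOKEN`, it could be read at runtime instead. The helper `read_hf_token` is hypothetical and not part of `huggingface_hub`.

```python
import os


def read_hf_token(env=None, var_name="HF_TOKEN"):
    """Return the token stored in `var_name`, or raise a clear error.

    Hypothetical helper for this notebook; not part of huggingface_hub.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return token


# Usage (commented out so the sketch runs without a real token):
# from huggingface_hub import login
# login(read_hf_token())
```

In Colab the variable would have to be set before this cell runs; how it is set is left open here.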


In [30]:

# Let's bring a model into play.
# Llama 2 is one option, but remember to request access to it beforehand!
# https://huggingface.co/meta-llama/Llama-2-7b-chat-hf

# model = "meta-llama/Llama-2-7b-chat-hf"  # requires access from Meta
# Here we use an openly available T5 model fine-tuned on WikiSQL instead.
model = "mrm8488/t5-base-finetuned-wikiSQL"
# Load the tokenizer that matches this particular model.
# A tokenizer splits text into tokens (words or sub-word pieces) and maps
# each token to the integer ID that the model expects as input.
tokenizer = AutoTokenizer.from_pretrained(model)

# Set up the text generation pipeline. It handles many of the operations
# for you in the background, including tokenization, loading the model, etc.
# For now, let's leave these settings as they are.
pipeline = transformers.pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    # torch_dtype=torch.bfloat16,  # activate this option for llama2
    device_map="auto",  # GPU or CPU, auto for automatic selection
    max_new_tokens=512,
    # do_sample=True,  # activate for llama2
    top_k=10,  # only takes effect when sampling (do_sample=True)
    # num_return_sequences=1,  # activate for llama2
    # eos_token_id=tokenizer.eos_token_id,  # activate for llama2
)

The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from your `config.json` file.
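
The tokenizer comments in the cell above can be made concrete with a toy example. The sketch below is not the tokenizer that the T5 model uses; it is a hypothetical whitespace tokenizer with a made-up vocabulary, meant only to illustrate the text-to-IDs round trip that happens before the model sees any input.

```python
def build_vocab(corpus):
    """Assign an integer ID to every distinct lowercase word; 0 is reserved for unknowns."""
    vocab = {"<unk>": 0}
    for sentence in corpus:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab):
    """Map each word to its ID, falling back to the <unk> ID for unseen words."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]


def decode(ids, vocab):
    """Map IDs back to words."""
    id_to_word = {i: w for w, i in vocab.items()}
    return " ".join(id_to_word[i] for i in ids)


vocab = build_vocab(["how many people live in Barcelona"])
ids = encode("How many people live in Madrid", vocab)
print(ids)                 # [1, 2, 3, 4, 5, 0]
print(decode(ids, vocab))  # how many people live in <unk>
```

Real tokenizers avoid most unknown tokens by splitting rare words into smaller sub-word pieces, which this toy version does not attempt.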


In [31]:
# The `HuggingFacePipeline` class wraps the pipeline we defined earlier so that LangChain can use it
# as an LLM. We also pass model-specific keyword arguments, here only the temperature.

temperature = 0  # give any value between 0 and 1
llm = HuggingFacePipeline(
    pipeline=pipeline, 
    model_kwargs={
        'temperature': temperature
    }
)
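
To build some intuition for the temperature setting, here is a minimal sketch of how temperature rescales a model's raw scores (logits) before they are turned into probabilities. The logits are made up, and `softmax_with_temperature` is a hypothetical helper, not a Transformers or LangChain function. In Transformers, temperature only influences generation when sampling is enabled (`do_sample=True`), so with the pipeline settings above it may have no visible effect.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Dividing by zero is undefined; a temperature of 0 corresponds to greedy decoding.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate almost all probability on the highest-scoring token, while higher temperatures spread it more evenly across candidates.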

In [32]:
# Create the prompt template that we will feed to the LLM.
template = """
   Translate English to SQL from the prompt below:
   ```{text}```
"""

prompt = PromptTemplate(template=template, input_variables=["text"])
# The chain fills the template with our query and returns the model's answer.
llm_chain = LLMChain(prompt=prompt, llm=llm)
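
To see what the chain sends to the model, it can help to render the prompt by hand. The sketch below uses plain `str.format` as a stand-in for `PromptTemplate`, on the assumption that the template relies on the default `{text}`-style placeholder. `render_prompt` is a hypothetical helper, and the template is a simplified version of the one above.

```python
# Simplified version of the notebook's template (assumption: same {text} placeholder).
TEMPLATE = "Translate English to SQL from the prompt below:\n{text}\n"


def render_prompt(template, **variables):
    """Fill the {placeholders} in `template` with the given variables."""
    return template.format(**variables)


rendered = render_prompt(TEMPLATE, text="How many people live in Barcelona")
print(rendered)
```

Printing the rendered prompt is a quick way to catch typos or stray whitespace before blaming the model for a poor answer.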


# Let's test it

In [33]:
text = """
What is the average age of our clients, grouped by store
"""
print(llm_chain.run(text))



SELECT Age (years) FROM table WHERE Group by store = 


In [34]:
text = """
How many people live in Barcelona
"""
print(llm_chain.predict(text=text))

SELECT COUNT Population (2001 census) FROM table WHERE City = Barcelona


In [35]:
text = """
How many people live in Barcelona
"""
print(llm_chain.run(text))



SELECT COUNT Population (2001 census) FROM table WHERE City = Barcelona


In [36]:
text = """
I want to know the maximum Spend per day by client in table ExampeTable3, 
"""
print(llm_chain.run(text))



SELECT MAX Spend per day by client FROM table ExampeTable3


In [43]:
# What if I don't want to download the model and load it from disk?
# We can call the hosted Inference API instead.


import requests


API_URL = "https://api-inference.huggingface.co/models/mrm8488/t5-base-finetuned-wikiSQL"
# Never share a real token in a notebook; put your own Hugging Face access token here.
headers = {"Authorization": "Bearer YOUR API KEY"}


def query(payload):
    """POST the payload to the Inference API and return the decoded JSON."""
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()


output = query({
    "inputs": "translate English to SQL: How many people live in Spain",
})

In [44]:
output

[{'generated_text': 'SELECT COUNT Population FROM table WHERE Country = Spain'}]
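
The hosted endpoint does not always answer with a list of generations. For example, it can reply with a JSON object carrying an `error` field, which is an assumption here about the payload shape. The sketch below unpacks the response defensively; `extract_generated_text` is a hypothetical helper, not part of any library.

```python
def extract_generated_text(response_json):
    """Return the first generated text, or raise if the API reported an error.

    Hypothetical helper; the error payload shape is an assumption.
    """
    if isinstance(response_json, dict) and "error" in response_json:
        raise RuntimeError(f"Inference API error: {response_json['error']}")
    if not response_json:
        raise ValueError("Empty response from the Inference API")
    return response_json[0]["generated_text"]


sample = [{"generated_text": "SELECT COUNT Population FROM table WHERE Country = Spain"}]
print(extract_generated_text(sample))
```

Called on the `output` variable from the cell above, this would return the generated SQL string directly, or fail with a readable message.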