# CTranslate2

## Overview

CTranslate2 is a C++ and Python library for efficient inference with Transformer models. The project implements a custom runtime that applies many performance optimization techniques, such as weight quantization, layer fusion, and batch reordering, to accelerate Transformer models and reduce their memory usage on CPU and GPU.

I use it with Flan-T5 models here, following the sample code at https://opennmt.net/CTranslate2/guides/transformers.html#t5

## Setup


```python
%pip install -qU ctranslate2 transformers[torch] sentencepiece
```

## Convert Model
### Use Code


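```python
# Python API alternative to the command-line converter below; run only one of the two.
import ctranslate2

model_id = "google/flan-t5-small"
converter = ctranslate2.converters.TransformersConverter(model_name_or_path=model_id)
# quantization="int8" matches the --quantization int8 flag used with the CLI
converter.convert(output_dir="google/flan-t5-small-ct2", quantization="int8", force=True)
```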

### Use command

```python
!ct2-transformers-converter --model google/flan-t5-small --output_dir google/flan-t5-small-ct2 --quantization int8 --force
```
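The `--quantization int8` flag (like `quantization="int8"` in the Python API) stores the weights as 8-bit integers together with scaling factors instead of 32-bit floats. As a rough illustration only, not CTranslate2's actual implementation, here is a minimal sketch of symmetric int8 quantization with one scale per weight row; the function names are hypothetical:

```python
from typing import List, Tuple


def quantize_row_int8(row: List[float]) -> Tuple[List[int], float]:
    """Symmetric int8 quantization of one weight row (illustrative sketch only)."""
    max_abs = max((abs(x) for x in row), default=0.0)
    # One scale per row maps the largest magnitude onto 127; an all-zero row keeps scale 1.0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric int8 range [-127, 127].
    q = [max(-127, min(127, round(x / scale))) for x in row]
    return q, scale


def dequantize_row(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 values and their row scale."""
    return [v * scale for v in q]


weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.02]]
for row in weights:
    q, scale = quantize_row_int8(row)
    print(q, [round(a, 4) for a in dequantize_row(q, scale)])
```

Dequantizing multiplies back by the scale, so each weight is recovered to within half a quantization step (`scale / 2`).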


## Sample Code


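```python
import ctranslate2
import transformers

# Load the model converted above. The tokenizer comes from the original Hugging Face
# model, since the converted directory does not include the tokenizer files by default.
translator = ctranslate2.Translator("google/flan-t5-small-ct2")
tokenizer = transformers.AutoTokenizer.from_pretrained("google/flan-t5-small")

input_text = "translate English to German: The house is wonderful."
# CTranslate2 expects string tokens rather than token ids
input_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(input_text))

results = translator.translate_batch([input_tokens])

# Best hypothesis for the first (and only) input
output_tokens = results[0].hypotheses[0]
output_text = tokenizer.decode(tokenizer.convert_tokens_to_ids(output_tokens))

print(output_text)
```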
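Because `translate_batch` takes a list of token lists, several prompts can be processed in one call. A minimal sketch, assuming the Hugging Face tokenizer and CTranslate2 translator loaded above; `translate_texts` is a hypothetical helper name, and extra keyword arguments (for example `max_batch_size` or `beam_size`) are passed straight through to `translate_batch`:

```python
from typing import Any, List


def translate_texts(translator: Any, tokenizer: Any, texts: List[str], **kwargs: Any) -> List[str]:
    """Hypothetical helper: tokenize prompts, translate them in one batch, decode the best hypotheses."""
    batch = [tokenizer.convert_ids_to_tokens(tokenizer.encode(t)) for t in texts]
    results = translator.translate_batch(batch, **kwargs)
    return [
        # hypotheses[0] is the best-scoring output for each input
        tokenizer.decode(tokenizer.convert_tokens_to_ids(r.hypotheses[0]), skip_special_tokens=True)
        for r in results
    ]


# Usage with the translator and tokenizer from the previous cell:
# print(translate_texts(translator, tokenizer, [
#     "translate English to German: The house is wonderful.",
#     "translate English to French: The weather is nice today.",
# ], beam_size=2))
```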


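```python
input_text = "What is AI? Tell me more about it."
input_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(input_text))

# generate_tokens() yields one step result per decoded token,
# so the output can be printed as it is produced.
results = translator.generate_tokens(input_tokens)

END_TOKEN = "</s>"
for i, result in enumerate(results):
    if result.token != END_TOKEN:
        # SentencePiece marks the start of a word with "▁": turn it into a space,
        # except on the first token to avoid a leading space.
        if i == 0:
            print(result.token.replace("▁", ""), end="")
        else:
            print(result.token.replace("▁", " "), end="")
print()
```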
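The printing logic above can be factored into a small generator that turns SentencePiece pieces into text chunks and stops at the end token, which makes it reusable and testable without loading a model. This is a sketch; `stream_text` is a hypothetical name, and it assumes `</s>` is the end token as in the cell above:

```python
from typing import Iterable, Iterator


def stream_text(pieces: Iterable[str], end_token: str = "</s>") -> Iterator[str]:
    """Hypothetical helper: convert SentencePiece pieces into printable text chunks.

    "\u2581" marks the start of a new word and becomes a space, except that
    leading whitespace is dropped until the first non-empty chunk is emitted.
    """
    first = True
    for piece in pieces:
        if piece == end_token:
            break  # stop at the end-of-sequence token
        text = piece.replace("\u2581", " ")
        if first:
            text = text.lstrip(" ")
            if not text:
                continue  # a bare word marker at the start produces no output
            first = False
        yield text


# Usage with the translator from the previous cells:
# for chunk in stream_text(step.token for step in translator.generate_tokens(input_tokens)):
#     print(chunk, end="", flush=True)
# print()
```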

```python
input_text = "What is AI? Tell me more about it."
input_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(input_text))

# translate_iterable() takes an iterable of tokenized inputs, translates them
# in batches, and yields one TranslationResult per input.
results = translator.translate_iterable([input_tokens])

for result in results:
    for hypothesis in result.hypotheses:
        print(tokenizer.decode(tokenizer.convert_tokens_to_ids(hypothesis)))
```
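Since `translate_iterable` consumes any iterable of token lists and translates it in batches, the inputs do not have to be tokenized up front. A minimal sketch, assuming prompts arrive as an iterable of strings such as lines read from a file; `tokenize_stream` is a hypothetical helper that tokenizes lazily and skips blank lines:

```python
from typing import Any, Iterable, Iterator, List


def tokenize_stream(texts: Iterable[str], tokenizer: Any) -> Iterator[List[str]]:
    """Hypothetical helper: lazily convert raw prompts into CTranslate2 token lists."""
    for text in texts:
        text = text.strip()
        if text:  # skip blank lines
            yield tokenizer.convert_ids_to_tokens(tokenizer.encode(text))


# Usage with the translator and tokenizer from the previous cells:
# prompts = ["What is AI?", "translate English to German: Good morning."]
# for result in translator.translate_iterable(tokenize_stream(prompts, tokenizer), max_batch_size=16):
#     print(tokenizer.decode(tokenizer.convert_tokens_to_ids(result.hypotheses[0])))
```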

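```python
# Ct2Translator comes from a local module (custom/llms/ctranslate2.py) that is not
# shown here; it is not part of the ctranslate2 or LangChain packages.
from custom.llms.ctranslate2 import Ct2Translator

llm = Ct2Translator(model_path="../ct2/ct2fast-flan-alpaca-xl")
llm
```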

```python
llm.init()
```

```python
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])

# verbose=True prints the formatted prompt before it is sent to the model
llm_chain = LLMChain(prompt=prompt, llm=llm, verbose=True)
llm_chain.run("What is AI?")
```
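`PromptTemplate` uses Python `str.format`-style placeholders by default, so the prompt that the chain sends to the model for a given question can be previewed without LangChain (`prompt.format(question=...)` should give the same string):

```python
template = """Question: {question}

Answer: Let's think step by step."""

# Fill the {question} placeholder exactly as the chain would before calling the LLM
rendered = template.format(question="What is AI?")
print(rendered)
```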