# Complete your thoughts with GPT-J on Intel Xeon using TensorFlow

This notebook uses Hugging Face's GPT-J model to perform text generation on Intel Xeon processors.

## Model: GPT-J (6B)
**[GPT-J (6B)](https://huggingface.co/EleutherAI/gpt-j-6b) was released in June 2021. At the time of release it was the largest open-source GPT-3-style language model in the world.**

**GPT-J is similar to ChatGPT in ability, although it does not function as a chat bot, only as a text predictor. It was developed using Mesh Transformer JAX and `xmap` in JAX.**

The model consists of:

- 28 layers
- Model dimension of 4096
- Feedforward dimension of 16384
- 16 heads, each with a dimension of 256

The model is trained with a tokenization vocabulary of 50257, using the same set of Byte Pair Encodings (BPEs) as GPT-2/GPT-3.
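
As a rough sanity check on the "6B" in the name, the dimensions above can be turned into an approximate parameter count. The sketch below assumes each layer has four 4096×4096 attention projections and a two-matrix feedforward block, ignores biases and layer norms, and assumes separate input and output embedding matrices sized to the 50257-token vocabulary. These simplifications are assumptions made for illustration, not figures taken from the model card.

```python
# Architecture figures quoted in the list above
n_layers = 28
d_model = 4096
d_ff = 16384
n_heads = 16
head_dim = 256
vocab_size = 50257

# The attention heads together span the full model dimension
assert n_heads * head_dim == d_model

# Assumption: 4 square attention projections (query, key, value, output), no biases
attn_params = 4 * d_model * d_model
# Assumption: feedforward block is one up-projection plus one down-projection
ff_params = 2 * d_model * d_ff
per_layer = attn_params + ff_params
block_params = n_layers * per_layer

# Assumption: separate (untied) input and output embedding matrices
embed_params = 2 * vocab_size * d_model

total = block_params + embed_params
print(f"per layer:     {per_layer:,}")
print(f"all layers:    {block_params:,}")
print(f"approx. total: {total / 1e9:.2f}B")
```

Under these assumptions the transformer layers account for the bulk of the parameters, and the estimate lands at roughly six billion, consistent with the model's name.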


In [1]:
# importing libraries
import tensorflow as tf
import transformers
from transformers import (
    AutoConfig,
    AutoTokenizer,
    TFAutoModelForCausalLM
)
import time
import warnings
warnings.filterwarnings('ignore')

2023-09-15 11:38:51.378100: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-09-15 11:38:51.380051: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-09-15 11:38:51.419387: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-09-15 11:38:51.420232: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


### Get Config and Tokenizer for the model

In [2]:
# Use Keras mixed precision: compute in bfloat16, keep variables in float32
tf.keras.mixed_precision.set_global_policy('mixed_bfloat16')

model_name = "EleutherAI/gpt-j-6B"
max_output_tokens = 32

# Load the model configuration and the text tokenizer
config = AutoConfig.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# tokenizer.pad_token = tokenizer.eos_token
# tokenizer.padding_side = 'left'
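
The two commented-out lines above would reuse the end-of-sequence token as the padding token and pad on the left. Left padding matters for a decoder-only model because generation continues from the last position of each row, so padding has to go in front of the prompt rather than after it. The sketch below illustrates the idea in plain Python, without the tokenizer. `left_pad` is a hypothetical helper, and the token ids are made up apart from 50256, which is GPT-J's end-of-sequence id.

```python
def left_pad(sequences, pad_id):
    """Pad token-id lists on the left to a common length.

    Returns (input_ids, attention_mask); mask entries are 0 for padding
    and 1 for real tokens.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


PAD_ID = 50256  # end-of-sequence id, reused here as the padding id

# Two made-up prompts of different lengths (illustrative token ids)
ids, mask = left_pad([[10, 11, 12], [20]], PAD_ID)
print(ids)
print(mask)
```

With left padding, the final column of `ids` always holds a real prompt token, which is the position the model extends during generation. The attention mask tells the model to ignore the padded positions.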

### Load Model

In [3]:
# Load the model weights
model = TFAutoModelForCausalLM.from_pretrained(model_name, config=config)
model.compile()

All model checkpoint layers were used when initializing TFGPTJForCausalLM.

All the layers of TFGPTJForCausalLM were initialized from the model checkpoint at EleutherAI/gpt-j-6B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPTJForCausalLM for predictions without further training.


In [4]:
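# Deterministic beam search with 4 beams; stop at the end-of-sequence token
generate_kwargs = dict(do_sample=False, num_beams=4, eos_token_id=model.config.eos_token_id)
# Wrap generation in a tf.function so it runs as a TensorFlow graph
gen = tf.function(lambda x: model.generate(x, max_new_tokens=max_output_tokens, **generate_kwargs))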
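
The settings `do_sample=False` and `num_beams=4` select beam search: at every step the decoder keeps the 4 highest-scoring partial sequences instead of only the single best next token. The toy sketch below shows why that can beat greedy decoding. It is a simplified illustration, not the Transformers implementation (which also applies length normalization and other refinements). The `NEXT` probability table and the `beam_search` function are hypothetical.

```python
import math

# Hypothetical next-token probabilities for a tiny vocabulary
NEXT = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.55, "dog": 0.45},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def beam_search(start, num_beams, max_new_tokens, eos="</s>"):
    """Return the highest-scoring token sequence found with num_beams beams."""
    beams = [([start], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(max_new_tokens):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == eos:
                # Finished sequences are carried forward unchanged
                candidates.append((tokens, score))
                continue
            for tok, p in NEXT[tokens[-1]].items():
                candidates.append((tokens + [tok], score + math.log(p)))
        # Keep only the num_beams best candidates
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams[0][0]


greedy = beam_search("<s>", num_beams=1, max_new_tokens=3)
beamed = beam_search("<s>", num_beams=2, max_new_tokens=3)
print("greedy:", greedy)
print("beam  :", beamed)
```

With one beam the search commits to "the" because it is the most likely first token, and ends with a sequence of probability 0.6 × 0.55 = 0.33. With two beams it also keeps "a" alive and finds "a cat", whose probability is 0.4 × 0.9 = 0.36.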

In [5]:
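def complete_my_thought(x):
    """Generate a continuation of the prompt string x; returns a list of decoded strings."""
    # Tokenize the prompt as a batch containing one sequence of input ids
    tokenized_data = tokenizer([x], return_tensors="tf").input_ids
    # Generate up to max_output_tokens new tokens
    output = gen(tokenized_data)
    # Convert token ids back to text, dropping special tokens
    decoded = tokenizer.batch_decode(output, skip_special_tokens=True)
    return decoded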

In [6]:
warmup_sentence = "This is a warmup sentence. Warmup helps get the model ready to showcase its capabilities."

In [7]:
complete_my_thought(warmup_sentence);

Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
2023-09-15 11:41:43.286767: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f31a042d840 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-09-15 11:41:43.286844: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2023-09-15 11:41:43.315967: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
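
The warm-up call matters because the first call to a `tf.function` traces the Python code into a graph (and, per the log above, compiles a cluster with XLA), so it is typically slower than later calls. The notebook imports `time` but does not measure this. The sketch below shows one way to time a first call against a second one. `time_call` and `FakeCompleter` are hypothetical names, and `FakeCompleter` is only a stand-in that sleeps on its first call so the example runs without loading the model.

```python
import time


def time_call(fn, *args):
    """Call fn(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


class FakeCompleter:
    """Hypothetical stand-in for complete_my_thought: slow on the first call only."""

    def __init__(self, warmup_cost=0.2):
        self.warmup_cost = warmup_cost
        self.warmed_up = False

    def __call__(self, text):
        if not self.warmed_up:
            # Simulates the one-time tracing/compilation cost
            time.sleep(self.warmup_cost)
            self.warmed_up = True
        return [text + " ..."]


complete = FakeCompleter()
_, first = time_call(complete, "warmup sentence")
_, second = time_call(complete, "real prompt")
print(f"first call:  {first:.3f}s")
print(f"second call: {second:.3f}s")
```

In the notebook itself, the same helper could wrap `complete_my_thought(warmup_sentence)` and then `complete_my_thought(input_sentence1)` to compare the warm-up call with a regular one.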


## Start Text Generation

In [8]:

input_sentence1 = "Ann Arbor is very pleasant in summers. The Huron river is an ideal spot for people to"
input_sentence2 = "Space is an intersting place. Stephen Hawking hypothesized that there might be multiple universes in which"
input_sentence3 = "In a shocking finding, scientists discovered a herd of unicorns living in a remote previously unexplored"
input_sentence4 = "Coffee is one of the most popular drinks in the world. It goes very well with"
input_sentence5 = "Dogs are often referred to as man's best friend. There are a number of reasons why"


In [9]:
out = complete_my_thought(input_sentence1)
print(out)

['Ann Arbor is very pleasant in summers. The Huron river is an ideal spot for people to relax and enjoy the fresh air. There are many places to visit in Ann Arbor. The University of Michigan is one of the best places to visit in Ann Arbor']


In [10]:
complete_my_thought(input_sentence2)

['Space is an intersting place. Stephen Hawking hypothesized that there might be multiple universes in which our universe is just one of many. If this is true, then there is a possibility that we are not the only intelligent life in the universe.\n\n']

In [11]:
complete_my_thought(input_sentence3)

["In a shocking finding, scientists discovered a herd of unicorns living in a remote previously unexplored part of the Amazon rainforest.\n\nThe discovery was made by a team of researchers from Brazil's National Institute of Amazonian Research (INPA) and"]

In [12]:
complete_my_thought(input_sentence4)

['Coffee is one of the most popular drinks in the world. It goes very well with milk and sugar, and it is a great way to start the day. But did you know that coffee can also help you lose weight?\n\nCoff']

In [13]:
complete_my_thought(input_sentence5)

["Dogs are often referred to as man's best friend. There are a number of reasons why this is the case. First of all, dogs are loyal, loving, and affectionate. They are also very intelligent and can be trained to do almost anything"]