In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# **Install required libraries**


**huggingface_hub** : 

    The huggingface_hub library is Hugging Face's Python client for the Hugging Face Hub. It handles authentication (such as the login step used later in this notebook) and lets you download and share pre-trained models, datasets, and other files hosted on the Hub.

**transformers** : 

    The Transformers library, developed by Hugging Face, is a popular open-source Python library for natural language processing (NLP). It provides pre-trained models for tasks such as text classification, named entity recognition, and text generation, including architectures like BERT, GPT, and RoBERTa. It works with both PyTorch and TensorFlow, and lets users fine-tune pre-trained models on custom datasets or run inference on new data. It is widely used in research and industry for tasks ranging from sentiment analysis to machine translation.

**accelerate** : 

    The Accelerate library is a Python package developed by Hugging Face. It lets the same PyTorch training or inference code run on a CPU, a single GPU, or several devices with minimal changes, and it takes care of device placement, mixed precision training, and distributed training. Transformers relies on it for memory-efficient model loading, which is why it is installed here alongside the quantization library.

**bitsandbytes** : 

    bitsandbytes makes large language models more accessible through k-bit quantization for PyTorch. It provides three main features that reduce memory consumption for inference and training:

    8-bit optimizers use block-wise quantization to maintain 32-bit performance at a small fraction of the memory cost.
    LLM.int8(), or 8-bit quantization, enables large language model inference with only half the required memory and without performance degradation. This method uses vector-wise quantization to quantize most features to 8 bits and treats outliers separately with 16-bit matrix multiplication.
    QLoRA, or 4-bit quantization, enables large language model training with several memory-saving techniques that don't compromise performance. This method quantizes a model to 4 bits and inserts a small set of trainable low-rank adaptation (LoRA) weights to allow training.
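
To make the memory savings concrete, here is a minimal back-of-the-envelope sketch. It counts only the bytes needed to store the weights and ignores activations, the KV cache, and quantization constants. The parameter count is an assumption chosen for illustration, and `weight_memory_gib` is a hypothetical helper, not part of bitsandbytes.

```python
def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1024**3


# Assumption: roughly 2.5 billion parameters, used here only for illustration.
NUM_PARAMS = 2_500_000_000

for label, bits in [("float32", 32), ("bfloat16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

Each halving of the bit width halves the weight storage, which is why a 4-bit model fits comfortably on a single Kaggle GPU.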

In [None]:
!pip install -q -U huggingface_hub
!pip install -q -U transformers
!pip install -q -U accelerate
!pip install -q -U bitsandbytes

# Log in to Hugging Face from the Kaggle notebook using an access token stored in Kaggle Secrets

In [1]:
from huggingface_hub import login
from kaggle_secrets import UserSecretsClient
access_token = UserSecretsClient().get_secret("HUGGINGFACE_TOKEN")
login(token=access_token)

Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


# Create an instance of the Gemma LLM using the Hugging Face Transformers library


**AutoTokenizer:** In the Hugging Face Transformers library, AutoTokenizer is a class that instantiates the right tokenizer for a pre-trained model, so you don't need to know the exact tokenizer class name or import statement for that model's architecture.

Here's how AutoTokenizer works:

    Automatic Selection: When you instantiate AutoTokenizer with the name of a pre-trained model, it automatically selects the appropriate tokenizer for that model.

    Model-Agnostic: You can use AutoTokenizer with any model architecture supported by Hugging Face, whether it's BERT, GPT, RoBERTa, etc.

    Usage: You can instantiate AutoTokenizer with the model name, and then use the tokenizer as you would with any other tokenizer object.

      
**AutoModelForCausalLM:** It is a class provided by the Hugging Face Transformers library. It's designed to automatically select and instantiate a pre-trained language model for causal language modeling (LM) tasks.

Here's what you need to know about AutoModelForCausalLM:

    Automatic Model Selection: Similar to AutoTokenizer, AutoModelForCausalLM automatically selects the appropriate pre-trained model architecture for causal language modeling tasks based on the model name provided.

    Causal Language Modeling: Causal language modeling is a type of language modeling task where the model predicts the next token in a sequence given the preceding tokens. It's commonly used for tasks like text generation.

    Usage: You can instantiate AutoModelForCausalLM with the name of the pre-trained model you want to use, and then use the model for tasks such as generating text.
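
As a rough illustration of causal language modeling, the toy sketch below generates text one token at a time, always appending the most likely next token. It is not how Gemma works internally: the score table is invented, and this toy model looks only at the last token, whereas a real causal LM conditions on all preceding tokens. `NEXT_TOKEN_SCORES` and `generate_greedy` are hypothetical names.

```python
# Toy "model": for each token, invented scores for the possible next tokens.
NEXT_TOKEN_SCORES = {
    "<bos>": {"kaggle": 0.9, "is": 0.1},
    "kaggle": {"is": 0.8, "fun": 0.2},
    "is": {"fun": 0.7, "kaggle": 0.3},
    "fun": {"<eos>": 1.0},
}


def generate_greedy(prompt, max_new_tokens=10):
    """Extend the prompt one token at a time, picking the highest-scoring token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = NEXT_TOKEN_SCORES.get(tokens[-1])
        if not scores:
            break  # the toy model has no prediction for this token
        next_token = max(scores, key=scores.get)
        tokens.append(next_token)
        if next_token == "<eos>":
            break  # stop at the end-of-sequence token
    return tokens


print(generate_greedy(["<bos>", "kaggle"]))
```

The `max_new_tokens` limit plays the same role as the argument of the same name passed to `model.generate` later in this notebook: generation stops at the limit even if no end-of-sequence token has been produced.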
    

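**BitsAndBytesConfig:** This class configures how bitsandbytes quantizes the model weights when the model is loaded. The cell below loads Gemma in 4-bit NF4 format with double quantization and uses bfloat16 for computation. For details, see the [Hugging Face blog post on 4-bit quantization with bitsandbytes](https://huggingface.co/blog/4bit-transformers-bitsandbytes).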

In [2]:
%%time
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)

# Configure 4-bit NF4 quantization with double quantization; compute in bfloat16
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the tokenizer and the quantized model from the Kaggle model directory
model_path = "/kaggle/input/gemma/transformers/2b-it/2"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=quantization_config,
    low_cpu_mem_usage=True,
)

cuda:0


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

CPU times: user 10.7 s, sys: 5.47 s, total: 16.2 s
Wall time: 36.4 s
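
The `bnb_4bit_use_double_quant=True` option used in the cell above also quantizes the quantization constants themselves. As a rough sketch of why that helps, the arithmetic below uses the block sizes described in the QLoRA paper (one 32-bit constant per 64 weights, then 8-bit constants with one 32-bit second-level constant per 256 of them). Whether the installed bitsandbytes version uses exactly these values is an assumption, and `constant_overhead_bits` is a hypothetical helper.

```python
def constant_overhead_bits(double_quant: bool,
                           weight_block: int = 64,
                           constant_block: int = 256) -> float:
    """Extra bits per weight spent on storing quantization constants."""
    if not double_quant:
        # One 32-bit constant shared by each block of weights
        return 32 / weight_block
    # 8-bit constants per weight block, plus one 32-bit constant
    # shared by each block of first-level constants
    return 8 / weight_block + 32 / (weight_block * constant_block)


for flag in (False, True):
    print(f"double_quant={flag}: {constant_overhead_bits(flag):.3f} extra bits per weight")
```

Under these assumptions, double quantization shrinks the per-weight overhead to roughly a quarter of its original size, on top of the 4 bits used for each weight.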


# Generate a basic response from the LLM

In [3]:
%%time
input_text = "What is the best thing about Kaggle?"
# Tokenize the input text into PyTorch tensors and move them to the selected device
input_ids = tokenizer(input_text, return_tensors="pt").to(device)

outputs = model.generate(**input_ids, do_sample=True, max_new_tokens=100, temperature=0.5)

print(tokenizer.decode(outputs[0]))

2024-03-09 11:54:11.768203: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-09 11:54:11.768340: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-09 11:54:11.969685: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


<bos>What is the best thing about Kaggle?

**Answer:**

**The best thing about Kaggle is its vast and diverse dataset of labeled and unlabeled data, covering a wide range of domains and industries.** This allows users to explore different problems, learn new skills, and build robust data analytics models.

Here are some of the other key benefits of Kaggle:

* **Community engagement:** Kaggle is a vibrant community of data scientists, analysts, and entrepreneurs who share knowledge, resources, and insights.
* **Collaboration tools
CPU times: user 14.3 s, sys: 1.2 s, total: 15.5 s
Wall time: 21.1 s


**Import the Kaggle Winning Solution CSV file**

# Add the Kaggle winning solutions dataset to the notebook to generate summaries of the solution writeups

[Kaggle Winning Solution Dataset](https://www.kaggle.com/datasets/thedrcat/kaggle-winning-solutions-methods)

In [4]:
import pandas as pd
data = pd.read_csv('/kaggle/input/kaggle-winning-solutions-methods/kaggle_winning_solutions_methods.csv')
data.head(2)

Unnamed: 0,link,place,competition_name,prize,team,kind,metric,year,nm,writeup,num_tokens,methods,cleaned_methods
0,https://www.kaggle.com/c/asl-signs/discussion/...,2,Google - Isolated Sign Language Recognition,"$100,000",1165,Research,PostProcessorKernelDesc,2023,406306,<h2>TLDR</h2>\n<p>We used an approach similar ...,2914,"['EfficientNet-B0', 'Data Augmentation', 'Norm...",Replace augmentation
1,https://www.kaggle.com/c/asl-signs/discussion/...,2,Google - Isolated Sign Language Recognition,"$100,000",1165,Research,PostProcessorKernelDesc,2023,406306,<h2>TLDR</h2>\n<p>We used an approach similar ...,2914,"['EfficientNet-B0', 'Data Augmentation', 'Norm...",Finger tree rotate


# Inspect the writeup field of the row at index 1 to understand the input text

In [5]:
data['writeup'][1]

'<h2>TLDR</h2>\n<p>We used an approach similar to audio spectrogram classification using the EfficientNet-B0 model, with numerous augmentations and transformer models such as BERT and DeBERTa as helper models. The final solution consists of one EfficientNet-B0 with an input size of 160x80, trained on a single fold from 8 randomly split folds, as well as DeBERTa and BERT trained on the full dataset. A single fold model using EfficientNet has a CV score of 0.898 and a leaderboard score of ~0.8.</p>\n<p>We used only competition data.</p>\n<h2>1. Data Preprocessing</h2>\n<h3>1.1 CNN Preprocessing</h3>\n<ul>\n<li>We extracted 18 lip points, 20 pose points (including arms, shoulders, eyebrows, and nose), and all hand points, resulting in a total of 80 points.</li>\n<li>During training, we applied various augmentations.</li>\n<li>We implemented standard normalization.</li>\n<li>Instead of dropping NaN values, we filled them with zeros after normalization.</li>\n<li>We interpolated the time ax

# Create context from the loaded dataset to summarize the writeup

Here the row at index 1 is used to create the context. If you want to loop over several solution writeups, you can instead uncomment the commented-out cell further below, which creates a separate 'context' column in the pandas DataFrame.

To get a meaningful summary of a Kaggle winning solution, the writeup alone is not enough. The context should also include the other details shown in the cell below: the competition name, year, place, and methods used.

You can also pre-process the context to remove hyperlinks or any other text that is not useful for summarization.
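
A minimal sketch of such pre-processing is shown below, assuming the writeup is an HTML string like the one printed earlier. It uses only the standard library; `clean_writeup` is a hypothetical helper, and a dedicated HTML parser would be more robust for messy markup.

```python
import html
import re


def clean_writeup(text: str) -> str:
    """Strip HTML tags and bare URLs, then tidy up the whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)      # replace tags with a space, keep their inner text
    text = re.sub(r"https?://\S+", "", text)  # drop bare URLs
    text = html.unescape(text)                # turn entities such as &amp; into characters
    text = re.sub(r"[ \t]+", " ", text)       # collapse runs of spaces and tabs
    text = re.sub(r"\s*\n\s*", "\n", text)    # trim whitespace around line breaks
    return text.strip()


# Made-up sample in the same style as the dataset's writeup field
sample = '<h2>TLDR</h2>\n<p>We used <a href="https://example.com">EfficientNet</a> &amp; BERT. See https://example.com/x</p>'
print(clean_writeup(sample))
```

Removing markup this way also shortens the prompt, which leaves more of the model's context window for the actual solution text.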

In [7]:
context = "Competition Name: " + data['competition_name'][1] + \
    ",\nYear: " + data['year'][1].astype(str) + \
    ",\nPlace: " + data['place'][1].astype(str) + \
    ",\nMethods Used: " + data['methods'][1] + \
    ",\nSolution: " + data['writeup'][1]


print(context)

Competition Name: Google - Isolated Sign Language Recognition,
Year: 2023,
Place: 2,
Methods Used: ['EfficientNet-B0', 'Data Augmentation', 'Normalization', 'Interpolation', 'BERT', 'DeBERTa', 'Mixup', 'Replace augmentation', 'Time and frequence masking', 'Random affine', 'Random interpolation', 'Flip pose', 'Finger tree rotate', 'Onecycle scheduler', 'Weighted CrossEntropyLoss', 'Hypercolumn', 'Ranger optimizer', 'Transformer', 'Knowledge distillation', 'Optuna', 'Label smoothing'],
Solution: <h2>TLDR</h2>
<p>We used an approach similar to audio spectrogram classification using the EfficientNet-B0 model, with numerous augmentations and transformer models such as BERT and DeBERTa as helper models. The final solution consists of one EfficientNet-B0 with an input size of 160x80, trained on a single fold from 8 randomly split folds, as well as DeBERTa and BERT trained on the full dataset. A single fold model using EfficientNet has a CV score of 0.898 and a leaderboard score of ~0.8.</p>
<

# Uncomment the cell below if you want to create a separate context column in the DataFrame. It makes it easy to work on multiple rows without hard-coding the row number.

In [8]:
#data['context'] = ("Competition Name: " + data['competition_name'] + \
#    ",\nYear: " + data['year'].astype(str) + \
#    ",\nPlace: " + data['place'].astype(str) + \
#    ",\nMethods Used: " + data['methods'] + \
#    ",\nSolution: " + data['writeup'])

#context = data['context'][1]
#print(context)
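
As an alternative sketch, the same context string can be built by a small function that takes one record at a time. `build_context` is a hypothetical helper; it accepts anything indexable by column name, such as a dict or a pandas Series, and the example record below is made up.

```python
def build_context(row) -> str:
    """Format one solution record; `row` can be a dict or a pandas Series."""
    return (
        f"Competition Name: {row['competition_name']},\n"
        f"Year: {row['year']},\n"
        f"Place: {row['place']},\n"
        f"Methods Used: {row['methods']},\n"
        f"Solution: {row['writeup']}"
    )


# Made-up record with the same column names as the dataset
example_row = {
    "competition_name": "Example Competition",
    "year": 2023,
    "place": 2,
    "methods": "['Method A', 'Method B']",
    "writeup": "<p>Example writeup text.</p>",
}
print(build_context(example_row))
```

With the DataFrame from this notebook, `build_context(data.iloc[1])` should give the context for a single row, and `data.apply(build_context, axis=1)` should give one context per row. Because f-strings convert values to text, no explicit `astype(str)` is needed.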

Function to generate a summary based on the chat prompt template.

Gemma 2b-it is the instruction-tuned (chat-oriented) variant of Gemma 2B, so it works best when the request is wrapped in its chat prompt template.
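
To show roughly what the chat template produces for a single user message, here is a hand-written approximation of the Gemma format. `format_gemma_prompt` is a hypothetical helper for illustration only; in practice `tokenizer.apply_chat_template` is the authoritative source of the exact string.

```python
def format_gemma_prompt(user_message: str) -> str:
    """Approximate Gemma's chat format for one user turn plus the generation prompt."""
    return (
        "<bos><start_of_turn>user\n"
        f"{user_message.strip()}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


print(format_gemma_prompt("Summarize this writeup in 500 words."))
```

Two details follow from this layout. The model's reply starts right after the final `<start_of_turn>model` marker, which is what the post-processing step searches for. The formatted string also already begins with `<bos>`, so it should be tokenized without adding special tokens a second time.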


In [9]:
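def generate_summary():
    # Build the instruction; `context` is the global string created earlier
    prompt_template = f"""Provide summary of following context in 500 words. 

    Provide only useful information: 
    
    Context: {context}"""

    messages = [
        {"role": "user", "content": prompt_template},
    ]
    # Wrap the message in Gemma's chat template; the result already starts with <bos>
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # add_special_tokens=False avoids prepending a second <bos> token
    input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to('cuda')

    outputs = model.generate(input_ids, max_new_tokens=900)

    # Decode prompt and completion together; special tokens are kept for post-processing
    response = tokenizer.decode(outputs[0])
    return response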

# Function to post-process the output

The function below uses a plain string search to keep only the text after the model turn marker. You can use any other method to post-process the output if you find it works better.

In [10]:
def extract_content(text):
    # Marker that precedes the model's reply in Gemma's chat format
    marker = '<start_of_turn>model'
    index = text.find(marker)

    # Return everything after the marker, or a message if the marker is missing
    if index == -1:
        return "Content not found after '<start_of_turn>model'"
    return text[index + len(marker):].strip()
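
A regular-expression variant is sketched below as one possible alternative. In addition to skipping the prompt, it drops a trailing `<end_of_turn>` or `<eos>` token if the model emitted one, and it returns `None` instead of a message when the marker is missing. `MODEL_TURN` and `extract_model_reply` are hypothetical names, and the sample strings are made up.

```python
import re

# Capture the text after the model marker, up to an end token or the end of the string
MODEL_TURN = re.compile(
    r"<start_of_turn>model\s*(.*?)\s*(?:<end_of_turn>|<eos>|$)",
    re.DOTALL,
)


def extract_model_reply(text: str):
    """Return the model's reply without chat markers, or None if no marker is found."""
    match = MODEL_TURN.search(text)
    return match.group(1) if match else None


decoded = (
    "<bos><start_of_turn>user\nHi<end_of_turn>\n"
    "<start_of_turn>model\nHello there.<end_of_turn><eos>"
)
print(extract_model_reply(decoded))
```

Returning `None` makes a missing marker easy to detect in a loop over many writeups, since it cannot be confused with a real summary.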

# Generate the summary and post-process the model output

In [11]:
summary = generate_summary()
final_summary = extract_content(summary)
print(final_summary)

Sure, here's a summary of the context in 500 words:

The context describes the development and training of an EfficientNet-B0 model for isolated sign language recognition. The model is trained on a single fold from 8 randomly split folds, using a combination of common and augmentation augmentations.

**Data Preprocessing:**

* 80 lip and 21 hand points are extracted from the image.
* Various augmentations and normalizations are applied to the extracted features.
* Motion features are included, including future and history motion.
* Finger tree rotation is used to generate features for the fingers.

**Training:**

* EfficientNet-B0 is trained on one fold with a random split.
* A onecycle scheduler with 0.1 warmup is used.
* Weighted CrossEntropyLoss is used to improve the model's performance.
* A hypercolumn is implemented for EfficientNet.
* BERT and DeBERTa are trained on the full dataset.

**Ensemble:**

* An ensemble of models is created by aggregating the outputs of the individual 

# Render the summary as formatted Markdown using IPython's display module

In [None]:
from IPython.display import Markdown as md
md(final_summary)