<a href="https://colab.research.google.com/github/mayureshpawashe/IO_Test/blob/main/IO_Test_GenAI_Mayuresh_Pawashe.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install --upgrade langchain langchain-community transformers torch


### Coding: Generate text using GPT-2

In [8]:
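from langchain_community.llms import HuggingFacePipeline
from transformers import pipeline

# Build a local GPT-2 text-generation pipeline; max_length=100 caps the total
# length (prompt + generated tokens), so the output may stop mid-sentence
generator = pipeline("text-generation", model="gpt2", max_length=100)

# Wrap the pipeline so it can be called through LangChain's LLM interface
llm = HuggingFacePipeline(pipeline=generator)

prompt = "Once upon a time in a futuristic world,"
output = llm.invoke(prompt)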

print(output)


Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Once upon a time in a futuristic world, a man and robot (the same entity which killed the girl in the first game) meet in various parts of the universe and go into a battle with each other. As a side effect, a man (the same entity responsible for the death of the protagonist and her siblings) also became the antagonist of this story.

For many, this was one of the most influential series from the series (to the point that for many of us, the ending
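The generated story above stops mid-sentence. In Hugging Face `transformers`, `max_length` caps the total sequence (prompt tokens plus generated tokens), while `max_new_tokens` caps only the generated part. With `max_length=100`, generation most likely halted when that 100-token limit was reached. The toy sketch below illustrates the distinction with an autoregressive loop. The bigram table and the `generate` / `generate_new` helpers are hypothetical stand-ins: GPT-2 actually scores its whole vocabulary using the full context, not just the last token.

```python
import random

# Hypothetical toy "language model": maps the last token to allowed next tokens.
# A real model like GPT-2 assigns a probability to every vocabulary token.
TOY_BIGRAMS = {
    "Once": ["upon"],
    "upon": ["a"],
    "a": ["time", "robot", "man"],
    "time": ["in"],
    "in": ["a"],
    "robot": ["and", "in"],
    "man": ["and", "in"],
    "and": ["a"],
}

def generate(prompt_tokens, max_length, seed=0):
    """Append sampled tokens until the TOTAL length (prompt + new) reaches max_length."""
    rng = random.Random(seed)
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        candidates = TOY_BIGRAMS.get(tokens[-1])
        if not candidates:  # no known continuation: stop early, like an end-of-text token
            break
        tokens.append(rng.choice(candidates))
    return tokens

def generate_new(prompt_tokens, max_new_tokens, seed=0):
    """Cap only the NEW tokens, analogous to `max_new_tokens` in transformers."""
    return generate(prompt_tokens, len(prompt_tokens) + max_new_tokens, seed)

prompt = ["Once", "upon"]
print(" ".join(generate(prompt, max_length=8)))        # 8 tokens in total
print(" ".join(generate_new(prompt, max_new_tokens=8)))  # 2 prompt + 8 new tokens
```

In the notebook itself, passing `max_new_tokens=...` to `pipeline(...)` instead of `max_length` would give the model a fixed budget of new tokens regardless of prompt length.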


### Coding: Tokenize a sentence using AutoTokenizer

In [9]:
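from transformers import AutoTokenizer

# Load GPT-2's byte-level BPE tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
sentence = "Python is my favorite language"

# Returns token IDs plus an attention mask (1 = real token, 0 = padding)
tokens = tokenizer(sentence)
print(tokens)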


{'input_ids': [37906, 318, 616, 4004, 3303], 'attention_mask': [1, 1, 1, 1, 1]}


In [10]:
print("Token IDs:", tokens["input_ids"])
print("Tokens:", tokenizer.convert_ids_to_tokens(tokens["input_ids"]))

Token IDs: [37906, 318, 616, 4004, 3303]
Tokens: ['Python', 'Ġis', 'Ġmy', 'Ġfavorite', 'Ġlanguage']
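The `Ġ` prefix in these tokens is not garbled text. GPT-2 uses byte-level BPE, which first maps every byte to a printable character, and the space byte (0x20) becomes `Ġ` (U+0120). This marks tokens that began with a space. The sketch below reproduces that byte-to-character mapping, following the scheme in OpenAI's released GPT-2 encoder code, and uses it to turn the tokens back into text. The helper names `bytes_to_unicode` and `tokens_to_text` are local to this sketch. In the notebook itself, `tokenizer.decode(tokens["input_ids"])` performs the same reversal.

```python
def bytes_to_unicode():
    """Map every byte value 0-255 to a printable Unicode character, as GPT-2's byte-level BPE does."""
    # Bytes that are already printable map to themselves
    byte_values = (list(range(ord("!"), ord("~") + 1))
                   + list(range(ord("¡"), ord("¬") + 1))
                   + list(range(ord("®"), ord("ÿ") + 1)))
    code_points = byte_values[:]
    # Remaining bytes (control characters, space, ...) are shifted to code points 256, 257, ...
    n = 0
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return dict(zip(byte_values, map(chr, code_points)))

BYTE_ENCODER = bytes_to_unicode()
BYTE_DECODER = {char: byte for byte, char in BYTE_ENCODER.items()}

def tokens_to_text(tokens):
    """Join GPT-2 token strings and map their characters back to raw UTF-8 bytes."""
    raw = bytes(BYTE_DECODER[ch] for token in tokens for ch in token)
    return raw.decode("utf-8", errors="replace")

print(BYTE_ENCODER[ord(" ")])  # Ġ
print(tokens_to_text(["Python", "Ġis", "Ġmy", "Ġfavorite", "Ġlanguage"]))  # Python is my favorite language
```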


### Coding: Generate embeddings using an OpenAI model (text-embedding-3-small)

In [None]:
!pip install -q langchain langchain-openai


In [19]:
from google.colab import userdata
from langchain_openai import OpenAIEmbeddings

# Read the OpenAI API key from Colab's Secrets panel
api_key = userdata.get("OPENAI_API_KEY")

# Load OpenAI's text-embedding-3-small model (a dedicated embedding model, not GPT-3.5-turbo)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small", openai_api_key=api_key)
text = "Hello my name is mayuresh this is test message"

# embed_query returns the embedding as a plain list of floats
embedding_vector = embeddings.embed_query(text)


print(embedding_vector)


[0.01270431000739336, -0.02485966868698597, 0.03761626034975052, 0.00220560934394598, -0.0209647249430418, -0.05609763041138649, 0.002003020141273737, 0.010220956988632679, -0.00910998322069645, -0.04216471686959267, 0.0007776815327815711, -0.0008287372766062617, -0.03382587805390358, 0.017448820173740387, 0.04720984399318695, 0.029774092137813568, -0.05165373906493187, 0.013580018654465675, -0.024598263204097748, 0.028153378516435623, 0.0364399328827858, -0.001816768548451364, 0.07633042335510254, 0.04603351652622223, 0.016429338604211807, -0.04279208928346634, -0.01778864860534668, 0.016024161130189896, 0.0260621327906847, -0.06979528069496155, -0.0060548060573637486, -0.018089264631271362, 0.005564670544117689, 0.008064361289143562, -0.02147446572780609, -0.014390375465154648, 0.008992350660264492, 0.012050796300172806, -0.023500358685851097, -0.021696660667657852, -0.025813797488808632, -0.03918469324707985, 0.008195064030587673, 0.006205114535987377, -0.0055058542639017105, -0.008
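On its own, an embedding vector like the one printed above is just a list of floats. It becomes useful when compared with other embeddings, and cosine similarity is the most common comparison. The sketch below implements cosine similarity in plain Python. The function name `cosine_similarity` and the short toy vectors are illustrative assumptions, not values from the notebook.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors: 1 = same direction, 0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors (real text-embedding-3-small vectors are much longer)
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # same direction, close to 1
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal, 0.0
```

In the notebook, you could embed two sentences with `embeddings.embed_query(...)`, or several at once with `embeddings.embed_documents([...])`, and pass the resulting lists to this function. Sentences with related meaning should typically score higher than unrelated ones.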

### Coding: Summarize text using a pre-trained model

In [24]:
!pip install groq

from groq import Groq
from google.colab import userdata

# Read the Groq API key from Colab's Secrets panel
api_key = userdata.get("GROQ_API_KEY")
client = Groq(api_key=api_key)

MODEL = "llama-3.3-70b-versatile"
SYSTEM_MESSAGE = {"role": "system", "content": "You are a helpful AI assistant."}

def chat(prompt):
    """Send one user prompt (plus the system message) and return the model's reply."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[SYSTEM_MESSAGE, {"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Input text
text = """
Data Engineering is an essential skill in modern data processing.
It involves designing and building systems that collect, store, and analyze large datasets efficiently.
With the rise of big data, data engineers play a crucial role in ensuring data pipelines run smoothly.
"""

# Step 1: summarize the input text
summary = chat(f"Summarize the following text concisely:\n{text}")

# Step 2: feed the summary back in to generate a follow-up question
question = chat(f"Generate a relevant question based on this summary: {summary}")

print("Summary:", summary)
print("\nQuestion:", question)


Summary: Data Engineering involves designing and building systems to efficiently collect, store, and analyze large datasets, playing a crucial role in modern data processing.

Question: What are some key skills and technologies that a data engineer should possess to effectively design and build systems for handling large datasets in modern data processing environments?
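This cell chains two model calls: the output of the summarization call becomes the input of the question-generation call. The sketch below separates prompt construction from the API call. That separation lets the chain run offline with a stand-in model, which is convenient for testing the prompt logic. `build_messages`, `run_chain`, and `fake_complete` are hypothetical names introduced for this illustration, and `fake_complete` does not imitate Llama's behavior. It only returns predictable strings.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
SYSTEM_PROMPT = "You are a helpful AI assistant."

def build_messages(user_prompt: str) -> List[Message]:
    """Build the same system + user message list the notebook sends to Groq."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]

def run_chain(text: str, complete: Callable[[List[Message]], str]) -> Dict[str, str]:
    """Summarize `text`, then ask for a question about that summary (two chained calls)."""
    summary = complete(build_messages(f"Summarize the following text concisely:\n{text}"))
    question = complete(build_messages(f"Generate a relevant question based on this summary: {summary}"))
    return {"summary": summary, "question": question}

def fake_complete(messages: List[Message]) -> str:
    """Offline stand-in model: returns fixed strings so the chaining can be checked."""
    user = messages[-1]["content"]
    if user.startswith("Summarize"):
        return "SUMMARY"
    # Echo back whatever summary was embedded in the second prompt
    return "QUESTION about " + user.split(": ", 1)[1]

result = run_chain("Data Engineering is an essential skill.", fake_complete)
print(result)
```

To run the same chain against Groq, you could pass a function such as `lambda msgs: client.chat.completions.create(model="llama-3.3-70b-versatile", messages=msgs).choices[0].message.content` as `complete`.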


### Short Answer: What is fine-tuning in Generative AI?

Fine-tuning in Generative AI is the process of adapting a pre-trained model to a specific task or domain by continuing to train it on a smaller, task-specific dataset. The model does not start from scratch. It starts from the weights it learned during pre-training, and training on the new dataset adjusts those weights, which typically improves its accuracy in the targeted area. Because most general knowledge is already encoded in the pre-trained weights, fine-tuning needs far less data and compute than training a model from the ground up, which makes it faster and more cost-effective.

Fine-tuning is widely used in industries like healthcare, finance, and customer service to make AI more specialized. For example, fine-tuning GPT-3.5 on legal documents can help it produce answers that follow legal terminology and conventions more closely. Many companies fine-tune AI assistants, chatbots, and recommendation systems to improve user experience. The process requires high-quality training data (for instruction-style tasks, typically labeled prompt/response pairs) and enough computational resources to run the training. With good data, a fine-tuned model usually outperforms a general-purpose model within its specialized domain. This lets businesses and researchers build AI solutions tailored to their needs.
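The core idea of "continue training from pre-trained weights on new data" can be shown numerically. The sketch below is deliberately tiny and is not how a language model is fine-tuned in practice. A two-parameter linear model `y = w*x + b` stands in for a network with billions of weights. It is first "pre-trained" by gradient descent on general data, then "fine-tuned" by running more gradient descent from those weights on a small domain dataset. The datasets, learning rates, and helper names (`mse_grad`, `train`, `mse`) are hypothetical choices for illustration.

```python
def mse_grad(w, b, data):
    """Gradient of mean squared error for the model y = w*x + b."""
    n = len(data)
    gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
    gb = sum(2 * (w * x + b - y) for x, y in data) / n
    return gw, gb

def train(w, b, data, lr, steps):
    """Plain gradient descent starting from the given weights (w, b)."""
    for _ in range(steps):
        gw, gb = mse_grad(w, b, data)
        w -= lr * gw
        b -= lr * gb
    return w, b

def mse(w, b, data):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

# "Pre-training": a broad dataset following y = 2x, starting from zero weights
general_data = [(x, 2 * x) for x in range(-5, 6)]
w0, b0 = train(0.0, 0.0, general_data, lr=0.01, steps=2000)

# "Fine-tuning": a small domain dataset following y = 2x + 1, starting from the pre-trained weights
domain_data = [(0, 1), (1, 3), (2, 5), (3, 7)]
w1, b1 = train(w0, b0, domain_data, lr=0.05, steps=500)

print(f"pre-trained: w={w0:.3f}, b={b0:.3f}, domain MSE={mse(w0, b0, domain_data):.4f}")
print(f"fine-tuned:  w={w1:.3f}, b={b1:.3f}, domain MSE={mse(w1, b1, domain_data):.4f}")
```

The pre-trained slope is already correct for the domain data, so fine-tuning mainly has to learn the offset. This mirrors, in miniature, why adapting a pre-trained model needs far less data than learning everything from scratch.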