<a href="https://colab.research.google.com/github/rajatgarg01/LLM/blob/main/llm-prompting-and-intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

This command installs the google-generativeai package, upgrading it to version 0.8.3 or higher if needed (-U). The -q flag suppresses most of pip's output.

In [None]:
%pip install -U -q "google-generativeai>=0.8.3"

In [None]:
import google.generativeai as genai
from IPython.display import Markdown,display

Markdown: Wraps a string so that the notebook renders it as formatted Markdown (headings, lists, links, code) instead of plain text.
display: Renders one or more such objects in the output cell, which is useful when the object is not the last expression in the cell.
Both are used below to show model responses, which are usually written in Markdown.

In [None]:
from kaggle_secrets import UserSecretsClient
GOOGLE_API_KEY=UserSecretsClient().get_secret("google_api_key")
genai.configure(api_key=GOOGLE_API_KEY)

This cell reads the Google API key from Kaggle's secret store and configures the genai library with it. Here's a breakdown of how the code works:

Code Explanation
UserSecretsClient: Provided by the kaggle_secrets module in Kaggle Notebooks, where secrets such as API keys are stored outside the notebook's code. The get_secret method fetches a secret by its label, in this case "google_api_key".

API Configuration: The genai.configure() function sets up the google-generativeai library with the retrieved API key so that subsequent requests are authenticated.
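
Outside Kaggle (for example on Colab or a local machine) the kaggle_secrets module is not available. The following is a minimal sketch of a fallback, assuming the key has been exported as an environment variable. The helper name load_api_key and the variable name GOOGLE_API_KEY are illustrative choices, not part of the notebook or of the library.

```python
import os


def load_api_key(env_var="GOOGLE_API_KEY"):
    """Return the API key stored in the given environment variable.

    Hypothetical helper for running outside Kaggle. It raises a clear
    error instead of silently returning an empty key.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage (assumes genai has been imported as in the cell above):
# genai.configure(api_key=load_api_key())
```

Either way, the key stays out of the notebook's source, so the notebook can be shared without leaking it.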

**FIRST PROMPT**

In [None]:
flash=genai.GenerativeModel("gemini-1.5-flash")
response=flash.generate_content("Explain AI to me like I'm a kid")
print(response.text)

In [None]:
Markdown(response.text)

***START A CHAT***

In [None]:
chat=flash.start_chat(history=[])
response=chat.send_message('Hello! my name is rajat garg')
print(response.text)

In [None]:
response=chat.send_message('can you tell me something interesting about you')
print(response.text)

In [None]:
Markdown(response.text)

In [None]:
response=chat.send_message('do you still remember my name')
print(response.text)

**CHOOSE A MODEL**

In [None]:
for model in genai.list_models():
    print(model.name)

Here is a description of each model returned by the listing above, grouped by function:

*Chat Models*
models/chat-bison-001: A conversational AI model designed to handle general-purpose dialogues, chat interactions, and more complex conversation flows. It focuses on generating coherent and contextually relevant text responses for various scenarios.

*Text Generation Models*
models/text-bison-001: This model is tailored for text generation tasks. It specializes in creating or extending written content based on a prompt, making it suitable for storytelling, blog writing, or content creation tasks.

*Embedding Models*
models/embedding-gecko-001: An embedding model used for creating dense vector representations of textual data. It’s useful for tasks like similarity matching, semantic search, and clustering.

models/embedding-001: Another version of an embedding model for generating numerical embeddings from textual inputs. These embeddings are ideal for understanding the semantic meaning of texts.

models/text-embedding-004: A more advanced embedding model with improved semantic understanding, suitable for natural language processing tasks, including text classification and information retrieval.

*Gemini Models (Advanced Language Models)*
These models are iterations of more advanced AI systems, likely fine-tuned for specific domains or improved with newer techniques. They represent enhancements in generative AI with increased capabilities in understanding, generating, and interacting with complex information.

models/gemini-1.0-pro-latest: The latest iteration in the 1.0 series, optimized for complex tasks involving professional-quality text generation and understanding.

models/gemini-1.0-pro: A high-quality generative AI model designed for professional use, emphasizing accuracy, fluency, and depth in content creation.

models/gemini-pro: A generalized version focusing on professional-grade applications, suitable for business tasks or industry-specific needs.

models/gemini-1.0-pro-001: A specific version within the 1.0 series, fine-tuned for professional tasks that require more precision in outputs.

models/gemini-1.0-pro-vision-latest: A version that likely integrates visual understanding or data, making it capable of handling multimedia inputs like text + image.

models/gemini-pro-vision: This model suggests a focus on multimodal tasks, that is, those requiring both visual and textual understanding.

*Gemini 1.5 Models (Enhanced Capabilities)*
These are enhanced models with more advanced capabilities, offering improvements over the 1.0 series.

models/gemini-1.5-pro-latest: The latest in the 1.5 series, optimized for professional, high-accuracy generative tasks.

models/gemini-1.5-pro-001: A specific version of the 1.5 series, tuned for precise, professional applications.

models/gemini-1.5-pro-002: Another variant in the 1.5 series, likely featuring more tweaks for professional contexts.

models/gemini-1.5-pro: A general-purpose model in the 1.5 series, focusing on accuracy and quality for professional use cases.

models/gemini-1.5-pro-exp-0801: An experimental model version released on August 1st, potentially testing new features or optimizations.

models/gemini-1.5-pro-exp-0827: Another experimental model from the 1.5 series, launched on August 27th, possibly fine-tuning specific aspects of the model's capabilities.

*Flash Models (Fast and Efficient)*

These models focus on faster generation while maintaining good quality, making them suitable for scenarios requiring quick responses.

models/gemini-1.5-flash-latest: The most recent in the 1.5 Flash series, optimized for speed while preserving accuracy.

models/gemini-1.5-flash-001: An early version of the Flash series, focusing on quicker generation.

models/gemini-1.5-flash-001-tuning: The 001 version exposed for fine-tuning, so that it can be adapted to particular tasks or datasets.

models/gemini-1.5-flash: A general-purpose model in the Flash category, balancing speed and quality.

models/gemini-1.5-flash-exp-0827: An experimental version of the Flash model released on August 27th, likely testing faster generation methods.

models/gemini-1.5-flash-002: A refined version in the 1.5 Flash series, improving on its predecessors.

*Flash 8B Models (Optimized with a Focus on Efficiency)*
These models form the Flash sub-category labeled "8B", which refers to a smaller variant with roughly 8 billion parameters that trades some capability for lower cost and latency.

models/gemini-1.5-flash-8b: A model in the Flash 8B series, targeting quick and efficient content generation.

models/gemini-1.5-flash-8b-001: The first version in the 8B series, focusing on a balance between speed and content depth.

models/gemini-1.5-flash-8b-latest: The latest version in the Flash 8B line, aiming to improve efficiency.

models/gemini-1.5-flash-8b-exp-0827: An experimental 8B version released on August 27th, potentially introducing novel efficiencies.

models/gemini-1.5-flash-8b-exp-0924: An experimental release in the Flash 8B series from September 24th, aiming to explore advanced generation techniques.
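
The grouping above can be roughly reproduced from the model names alone. The sketch below assumes simple substring rules on the name. The function model_family and its category labels are made up for illustration and are not part of the genai library.

```python
def model_family(name):
    """Map a model name to one of the families described above.

    Illustrative rules only. Order matters: "flash-8b" must be checked
    before "flash", and "embedding" before the "text-" prefix.
    """
    short = name.split("/", 1)[-1]  # drop the leading "models/"
    if "embedding" in short:
        return "embedding"
    if short.startswith("chat-"):
        return "chat"
    if short.startswith("text-"):
        return "text generation"
    if "flash-8b" in short:
        return "gemini 1.5 flash-8b"
    if "flash" in short:
        return "gemini 1.5 flash"
    if short.startswith("gemini-1.5"):
        return "gemini 1.5 pro"
    if short.startswith("gemini"):
        return "gemini 1.0"
    return "other"


for n in ["models/chat-bison-001", "models/text-embedding-004",
          "models/gemini-pro", "models/gemini-1.5-flash-8b-001"]:
    print(n, "->", model_family(n))
```

In the notebook, the same function could be applied to model.name inside the genai.list_models() loop.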

*More information about a model*

In [None]:
for model in genai.list_models():
    if model.name=='models/gemini-1.5-flash':
        print(model)
        break

***EXPLORE GENERATIVE PARAMETERS***

*OUTPUT LENGTH*

In [None]:
short_model=genai.GenerativeModel("gemini-1.5-flash",generation_config=genai.GenerationConfig(max_output_tokens=200))
responce=short_model.generate_content("write a 1000 word essay on the importance of AI")
Markdown(responce.text)

In [None]:
response=short_model.generate_content('write a short poem on life')
Markdown(response.text)

**TEMPERATURE**

Definition: Temperature modifies the probability distribution of the next word by "sharpening" or "softening" the model's predictions.

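How It Works: The probability of each token is raised to the power of 1/temperature, and the results are renormalized so that they sum to 1. This is equivalent to dividing the model's logits by the temperature before applying the softmax.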
If the temperature is lower than 1 (e.g., 0.7), high-probability tokens become even more likely, making the output more deterministic.
If the temperature is higher than 1 (e.g., 1.5), the probability distribution flattens, making less likely tokens more probable and increasing randomness.

Impact:
Controls randomness: A high temperature (greater than 1) leads to more diverse and unpredictable outputs, while a low temperature (between 0 and 1) makes the output safer and more predictable.

Influences all tokens: Unlike top-k, temperature affects the entire probability distribution rather than just a subset.

Example: A temperature of 0.5 makes the model more likely to choose the highest-probability token, while a temperature of 1.5 allows more variety by considering a wider range of tokens.
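
The rescaling described above can be tried on a toy distribution. This is a minimal sketch in plain Python. The function name apply_temperature and the probabilities are made up for illustration, and the snippet does not reflect how the Gemini service implements sampling internally.

```python
def apply_temperature(probs, temperature):
    """Raise each probability to the power 1/temperature, then renormalise."""
    if temperature <= 0:
        # A temperature of 0 is usually handled as a special case
        # (always pick the most likely token) rather than by this formula.
        raise ValueError("temperature must be positive")
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


# Made-up next-token probabilities for four candidate tokens.
probs = [0.5, 0.3, 0.15, 0.05]
for t in (0.5, 1.0, 1.5):
    print(t, [round(p, 3) for p in apply_temperature(probs, t)])
```

At 0.5 the first token's share grows well above 0.5, at 1.0 the distribution is unchanged, and at 1.5 the shares move closer together.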

In [None]:
from google.api_core import retry
high_temp_model=genai.GenerativeModel("gemini-1.5-flash",generation_config=genai.GenerationConfig(temperature=2.0))
# Retry if a transient error occurs: start with a delay of 10 seconds,
# multiply the delay by 1.5 after each retry, and stop after a
# maximum total time of 300 seconds (5 minutes).
retry_policy={"retry":retry.Retry(predicate=retry.if_transient_error,initial=10,multiplier=1.5,timeout=300)}
for _ in range(5):
    # Pass the retry policy so that it is actually applied to the request.
    response=high_temp_model.generate_content("Pick a random colour... (respond in a single word)",request_options=retry_policy)
    # Check the current response (not an older variable) before printing.
    if response.parts:
        print(response.text,'_'*25)
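
To see what these retry settings imply, the wait times of a plain exponential backoff can be listed. This is a simplified sketch: it ignores the random jitter and the per-delay cap that google.api_core may apply, and backoff_delays is a made-up helper, not a library function.

```python
def backoff_delays(initial=10.0, multiplier=1.5, timeout=300.0):
    """Return the successive wait times (in seconds) of a plain exponential
    backoff, stopping before the cumulative wait would exceed the timeout."""
    delays, elapsed, delay = [], 0.0, initial
    while elapsed + delay <= timeout:
        delays.append(delay)
        elapsed += delay
        delay *= multiplier
    return delays


print(backoff_delays())
# [10.0, 15.0, 22.5, 33.75, 50.625, 75.9375]
```

Under these assumptions the request would be retried about six times, spending roughly 208 seconds waiting in total, before the 300-second limit is reached.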


In [None]:
from google.api_core import retry
low_temp_model=genai.GenerativeModel("gemini-1.5-flash",generation_config=genai.GenerationConfig(temperature=0.0))
# Retry if a transient error occurs: start with a delay of 10 seconds,
# multiply the delay by 1.5 after each retry, and stop after a
# maximum total time of 300 seconds (5 minutes).
retry_policy={"retry":retry.Retry(predicate=retry.if_transient_error,initial=10,multiplier=1.5,timeout=300)}
for _ in range(5):
    # Pass the retry policy so that it is actually applied to the request.
    response=low_temp_model.generate_content("Pick a random colour... (respond in a single word)",request_options=retry_policy)
    # Check the current response (not an older variable) before printing.
    if response.parts:
        print(response.text,'_'*25)

**TOP-K AND TOP-P**

Top-k and top-p (also known as nucleus sampling) are two common techniques used with large language models (LLMs) to control randomness and diversity in text generation. Both restrict which tokens the model may sample from at each step, which helps keep outputs coherent and contextually relevant.

1. Top-k Sampling
Definition: In top-k sampling, the model considers only the k most probable tokens when generating the next token. All other tokens are ignored.

How it works: After the model predicts a probability distribution for the next token, only the k tokens with the highest probabilities are kept and the rest are filtered out. One of those k tokens is then sampled in proportion to its renormalized probability.

Effect: Only the most likely tokens are considered, which limits randomness and makes the text more predictable when k is small.

Example: If k=5, only the 5 most likely tokens are kept for random selection. This prevents the model from picking a highly unlikely token.

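
The top-k steps can be sketched in a few lines of plain Python. The token names, probabilities, and the helper top_k_filter are made up for illustration. This is not the code the model service runs.

```python
import random


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalise their probabilities."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}


# Made-up next-token distribution.
probs = {"cat": 0.40, "dog": 0.25, "fox": 0.15, "owl": 0.10, "eel": 0.06, "yak": 0.04}
kept = top_k_filter(probs, k=3)
print(kept)

# Sample the next token from the surviving candidates only.
token = random.choices(list(kept), weights=list(kept.values()), k=1)[0]
print(token)
```

With k=3 only "cat", "dog", and "fox" can ever be chosen, however many times the sampling step is repeated.
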
2. Top-p Sampling (Nucleus Sampling)
Definition: In top-p sampling, the model selects the smallest set of tokens whose cumulative probability is greater than or equal to a threshold p.

How it works: Instead of keeping a fixed number of tokens, it adds tokens in order of decreasing probability until their cumulative probability reaches p (e.g., p=0.9). The next token is then sampled from that set in proportion to its renormalized probability.

Effect: The number of candidates adapts to the distribution. When p is high (close to 1), more tokens are considered, increasing randomness; when p is low, only the few most likely tokens remain.

Example: If p=0.9, tokens are added until their cumulative probability reaches 90%. This set may contain a different number of tokens at each step, depending on the prediction.

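
The sketch below shows how the size of the top-p candidate set changes with the shape of the distribution. The two distributions and the helper top_p_filter are made up for illustration.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most-probable tokens whose cumulative
    probability reaches p, then renormalise."""
    kept, cumulative = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {tok: prob / total for tok, prob in kept.items()}


# A confident prediction versus an uncertain one (made-up numbers).
peaked = {"cat": 0.85, "dog": 0.08, "fox": 0.04, "owl": 0.03}
flat = {"cat": 0.30, "dog": 0.28, "fox": 0.22, "owl": 0.20}
print(len(top_p_filter(peaked, 0.9)))  # 2
print(len(top_p_filter(flat, 0.9)))    # 4
```

The same threshold keeps two candidates when the model is confident and all four when it is not, which a fixed k cannot do.
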
Key Differences
Top-k: Limits the candidates to a fixed number of tokens. The cut-off is the same at every step, regardless of how confident the model is.
Top-p: Uses a probability threshold, so the number of candidates varies from step to step: few tokens when the model is confident, many when it is not.

In [None]:
model = genai.GenerativeModel(
    'gemini-1.5-flash-001',
    generation_config=genai.GenerationConfig(
        # These are the default values for gemini-1.5-flash-001.
        temperature=1.0,
        top_k=64,
        top_p=0.95,
    ))

story_prompt = "You are a creative writer. Write a short story about a cat who goes on an adventure."
response = model.generate_content(story_prompt, request_options=retry_policy)
print(response.text)

**Explore available models**

You will be using the embedContent API method to calculate embeddings in this guide. Find a model that supports it through the models.list endpoint. You can also find more information about the embedding models on the models page.

text-embedding-004 is the most recent embedding model, so you will use it for this exercise.

In [None]:
for m in genai.list_models():
    if "embedContent" in m.supported_generation_methods:
        print(m.name)
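
Once an embedding model is chosen, the vectors it returns are typically compared with cosine similarity, which is what the similarity matching and semantic search mentioned earlier rely on. The sketch below uses made-up 3-dimensional vectors in place of real embeddings (which have far more dimensions), and the helper cosine_similarity is written here for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for the embeddings of a query and two documents.
query = [0.9, 0.1, 0.0]
doc_about_cats = [0.8, 0.2, 0.1]
doc_about_tax = [0.0, 0.2, 0.9]

print(cosine_similarity(query, doc_about_cats))
print(cosine_similarity(query, doc_about_tax))
```

The first score is close to 1 and the second close to 0, so the first document would rank as the better match for the query.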