## Install Gemini GenerativeAI

In [1]:
!pip install google-generativeai



In [2]:
import google.generativeai as genai



In [4]:
# Read the API key from a local file; strip() drops any trailing newline
with open("keys/.googleapi.txt") as f:
    key = f.read().strip()
genai.configure(api_key=key)
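
As a hedged alternative to hard-coding the file path, the key lookup can be wrapped in a small helper that checks an environment variable first and falls back to the file. This is only a sketch: the helper name `load_api_key` and the variable name `GOOGLE_API_KEY` are assumptions made for illustration and are not part of the original notebook.

```python
import os
from pathlib import Path


def load_api_key(path="keys/.googleapi.txt", env_var="GOOGLE_API_KEY"):
    """Return the key from env_var if it is set, otherwise from the file at path.

    Both default values are illustrative assumptions, not library conventions.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        key_file = Path(path)
        if key_file.is_file():
            # strip() removes the trailing newline most editors add
            key = key_file.read_text().strip()
    if not key:
        raise RuntimeError(f"No API key found in {env_var} or {path}")
    return key
```

In the notebook, the result would then be passed on as `genai.configure(api_key=load_api_key())`.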

### Available Gemini Models

In [5]:
for m in genai.list_models():
    print(m.name)

models/chat-bison-001
models/text-bison-001
models/embedding-gecko-001
models/gemini-1.0-pro-latest
models/gemini-1.0-pro
models/gemini-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-001
models/gemini-1.5-pro
models/gemini-1.5-flash-latest
models/gemini-1.5-flash-001
models/gemini-1.5-flash
models/embedding-001
models/text-embedding-004
models/aqa
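
The list above mixes text-generation, embedding, and question-answering models, and not all of them accept `generate_content` calls. A minimal sketch of narrowing the list down, assuming each model object exposes a `supported_generation_methods` list alongside `name`, is shown here. The helper `names_supporting` and the stand-in model objects are hypothetical and exist only so the example runs without an API key.

```python
from types import SimpleNamespace


def names_supporting(models, method="generateContent"):
    """Return the names of models whose supported_generation_methods include method."""
    return [m.name for m in models if method in m.supported_generation_methods]


# Stand-in objects so the sketch runs offline; the names and method lists
# are made up for illustration and are not real API output.
sample = [
    SimpleNamespace(name="models/example-text",
                    supported_generation_methods=["generateContent", "countTokens"]),
    SimpleNamespace(name="models/example-embedding",
                    supported_generation_methods=["embedContent"]),
]

print(names_supporting(sample))
```

Against the live API, the same helper could be called as `names_supporting(genai.list_models())`.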


### Prompt to the Gemini model

In [7]:
model = genai.GenerativeModel(model_name="gemini-1.5-flash")
user_prompt = """ Chandrayan is a space mission of ISRO from india. can you tell"""
response = model.generate_content(user_prompt)
print(response.text)

ISRO stands for **Indian Space Research Organisation**. 



In [8]:
user_prompt = """Generate some factual information to complete the following in 2-3 lines:
                ISRO is india's space station and it """
response = model.generate_content(user_prompt)
print(response.text)

ISRO is India's **space research organization**, and it is responsible for developing and operating India's space program. ISRO has launched numerous satellites and conducted several successful missions, including the Chandrayaan lunar missions and the Mangalyaan mission to Mars. 



### Adding a System Prompt

**Important Note:** A system prompt can be specified using <mark style="background-color: lightblue;">system_instruction</mark>. <mark style="background-color: lightblue;">system_instruction</mark> is not enabled for models/gemini-pro.

In [13]:
model = genai.GenerativeModel(model_name="gemini-1.5-flash",
                              system_instruction="""Generate some factual information to complete the user input.
                              Completion must have maximum 2-3 lines.""")
user_prompt = """In our solar system, Earth is a"""
response = model.generate_content(user_prompt)
print(response.text)
                              


In our solar system, Earth is a **rocky planet**. It's the third planet from the Sun and the only one known to support life. 



## Important Parameters

If you run the code above a few times, you will notice that the output changes across runs. Generative models are **non-deterministic**: even with the same input, they can produce different outputs. This allows for creativity and diversity in the generated text, which is useful when exploring different creative styles. Parameters such as temperature, top_p, and top_k help control this behavior.

* **candidate_count:** The number of responses to generate for a single prompt. The default value is 1. Higher values return more candidate responses and increase resource usage.
* **stop_sequences:** A list of strings that act as stop signs for the model. Generation stops at the first appearance of any of them.
* **max_output_tokens:** The maximum number of tokens the model will generate in the response. Output that reaches this limit is cut off.
* **temperature:** A control knob for the randomness of the model's output. A higher temperature produces more varied and creative responses. A lower temperature produces more predictable results.
* **top_p:** Ranges over [0.0, 1.0]. This is also known as nucleus sampling. The model considers only the most probable next tokens whose cumulative probability reaches or exceeds the top_p value. A higher value creates a looser threshold, letting the model consider a wider range of options while still prioritizing the most likely ones. A lower value creates a stricter threshold, leading to less diverse and more predictable outputs.
* **top_k:** Limits the candidates for the next token to the k most probable options in the probability distribution. A lower k restricts the selection to a smaller pool of the most likely tokens, leading to less diverse and more predictable outputs.

Both <mark>top_p</mark> and <mark>top_k</mark> work in conjunction with the <mark>temperature</mark> parameter.
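
To make the interaction between these three parameters concrete, here is a small pure-Python sketch of temperature scaling followed by top_k and top_p filtering. It is an illustration under stated assumptions, not Gemini's actual implementation: the vocabulary and scores are invented, the function names are hypothetical, and the order in which the filters are applied is one common convention that may differ from what the service does internally.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the shortest prefix whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalise so the surviving candidates sum to 1
    return {i: probs[i] / total for i in kept}


# Hypothetical vocabulary and scores, for illustration only
vocab = ["planet", "star", "moon", "rock", "comet"]
logits = [3.0, 2.0, 1.0, 0.5, 0.1]

for t in (0.1, 1.0):
    probs = softmax_with_temperature(logits, t)
    candidates = filter_top_k_top_p(probs, top_k=3, top_p=0.9)
    print(t, {vocab[i]: round(p, 3) for i, p in candidates.items()})
```

With these made-up scores, the low temperature concentrates almost all probability on the top token, so a single candidate survives the top_p cut. At temperature 1.0 the distribution is flatter and all three top_k candidates remain in play.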

In [17]:
model = genai.GenerativeModel("gemini-1.5-flash")

# Setting our parameters
custom_config = genai.types.GenerationConfig(max_output_tokens=500, temperature=1.0)

user_prompt = """What is overfitting in data science? Explain in detail."""

# Passing our custom parameters to the generate_content method
response = model.generate_content(user_prompt, generation_config=custom_config)

print(response.text)

## Overfitting in Data Science: A Detailed Explanation

Overfitting is a common problem in machine learning, occurring when a model learns the training data too well, capturing even the noise and random fluctuations. This results in a model that performs exceptionally well on the training data but poorly on unseen data, making it useless for real-world applications.

**Here's a detailed breakdown:**

**1. The Ideal Scenario:**

- We aim to build models that generalize well, meaning they can accurately predict outcomes on new, unseen data.
- The model should capture the underlying patterns and relationships within the data, not just memorize the specific instances in the training set.

**2. Overfitting: The Problem:**

- Overfitting occurs when a model learns the training data too well, including its noise and outliers.
- It creates a model that is highly complex, often with many parameters, resulting in a "memorization" of the training data rather than understanding its underlying patt

In [18]:
# Setting our parameters
custom_config = genai.types.GenerationConfig(temperature=0.1, top_p=0.1, top_k=32)

user_prompt = """What is feature selection in data science? Explain in detail."""

# Passing our custom parameters to the generate_content method
response = model.generate_content(user_prompt, generation_config=custom_config)

print(response.text)

## Feature Selection: The Art of Choosing the Right Variables

In data science, feature selection is a crucial process that involves **identifying and selecting the most relevant features (variables) from a dataset** for building a predictive model. It's like choosing the right ingredients for a recipe – the right features can significantly improve model performance, while irrelevant ones can lead to noise and overfitting.

**Why is Feature Selection Important?**

* **Improved Model Performance:** By removing irrelevant or redundant features, we reduce noise and complexity, leading to more accurate and robust models.
* **Reduced Overfitting:** Overfitting occurs when a model learns the training data too well, failing to generalize to new data. Feature selection helps prevent this by simplifying the model.
* **Faster Training and Inference:** Fewer features mean less data to process, resulting in faster model training and prediction times.
* **Enhanced Interpretability:** Models with fe