### Installation of Required Packages

This section installs the two packages the notebook depends on:
- `indoxGen`: provides the LLM client (`indoxGen.llms`) and the synthetic data generators (`indoxGen.synthCore`) used in the cells below.
- `python-dotenv`: loads environment variables from a `.env` file, which keeps sensitive values such as API keys out of the notebook itself.

```bash
!pip install indoxGen --upgrade
!pip install python-dotenv
```

In [1]:
!pip install indoxGen --upgrade
!pip install python-dotenv

Collecting indoxGen
  Downloading IndoxGen-0.0.2-py3-none-any.whl.metadata (10 kB)
Collecting loguru (from indoxGen)
  Downloading loguru-0.7.2-py3-none-any.whl.metadata (23 kB)
Downloading IndoxGen-0.0.2-py3-none-any.whl (41 kB)
Downloading loguru-0.7.2-py3-none-any.whl (62 kB)
Installing collected packages: loguru, indoxGen
Successfully installed indoxGen-0.0.2 loguru-0.7.2
Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.1
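
If the install output has scrolled away, the installed versions can be checked from Python. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, not part of IndoxGen.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two packages this notebook installs.
for name in ("indoxGen", "python-dotenv"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

A `None` result usually means the install ran in a different environment than the kernel.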


### Importing Modules and Loading Environment Variables

In this cell, we are importing the necessary modules and loading the environment variables:
- `os`: A standard Python module used here to retrieve environment variables.
- `load_dotenv` from `dotenv`: This function loads environment variables from a `.env` file into the program's environment.
- `IndoxApi` from `indoxGen.llms`: This class will be used to interact with the Indox API for generating text or other machine learning-related tasks.

After loading the environment variables with `load_dotenv()`, we fetch the `INDOX_API_KEY` from the environment using `os.getenv()`. This key is necessary for authentication with the Indox API.



In [2]:
import os
from dotenv import load_dotenv
from indoxGen.llms import IndoxApi
load_dotenv()
INDOX_API_KEY = os.getenv("INDOX_API_KEY")
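
`os.getenv()` returns `None` when a variable is unset, so a missing key only shows up later as an authentication error. A small guard can fail early instead. This is a minimal sketch: `require_env` and `DEMO_API_KEY` are hypothetical names, and in the notebook the helper would be called after `load_dotenv()`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell"
        )
    return value


# Demo with a placeholder variable so the snippet runs without a real key.
os.environ["DEMO_API_KEY"] = "placeholder-key"
print(require_env("DEMO_API_KEY"))  # → placeholder-key
```

In this notebook the call would be `INDOX_API_KEY = require_env("INDOX_API_KEY")`.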

### Data Generation from a User Prompt

In this cell, we are generating a dataset based on a user-defined prompt using the IndoxGen API:
- **Imports**: We import `DataFromPrompt` and `DataGenerationPrompt` from `indoxGen.synthCore` to handle the prompt-based data generation.
- **User Prompt**: A string prompt is defined by the user to specify what kind of dataset to generate. In this case, the prompt asks for a dataset with 4 columns and 7 rows related to astronomy.
- **IndoxApi Object**: We instantiate the `IndoxApi` class, passing the `INDOX_API_KEY` for authentication.
- **Data Generation Instruction**: The method `DataGenerationPrompt.get_instruction` is used to generate the appropriate instruction based on the user prompt.
- **DataFromPrompt Object**: We create an instance of `DataFromPrompt` to handle the data generation. We pass:
  - The prompt name.
  - Arguments including the `llm` (language model), number of generations (`n`), and the instruction.
  - The output type (`generations`).
- **Running the Data Generation**: The `run()` method is used to generate the dataset, and the result is printed to verify the output.
- **Saving to Excel**: Finally, the generated dataset is saved to an Excel file named `output_data.xlsx`.


In [35]:
from indoxGen.synthCore import DataFromPrompt
from indoxGen.synthCore import DataGenerationPrompt


user_prompt = "Generate a dataset with 4 columns and 7 rows about astronomy."

LLM = IndoxApi(api_key=INDOX_API_KEY)
instruction = DataGenerationPrompt.get_instruction(user_prompt)

data_generator = DataFromPrompt(
    prompt_name="sample Prompt",
    args={
        "llm": LLM,
        "n": 1,
        "instruction": instruction,
    },
    outputs={"generations": "generate"},
)

generated_df = data_generator.run()

print(generated_df)
data_generator.save_to_excel("output_data.xlsx")


INFO: Generated DataFrame with shape: (7, 4)
    Celestial_Object  Distance_Light_Years  Discovery_Year  \
0             Quasar                  12.5            1970   
1       Neutron Star                   0.8            1967   
2          Red Giant                2000.0            1920   
3         Black Hole               10000.0            1783   
4  Supernova Remnant                5000.0            1987   
5             Pulsar                3000.0            1964   
6          Exoplanet                 150.0            1995   

        Notable_Feature  
0       High luminosity  
1            Dense core  
2  Expanding atmosphere  
3         Event horizon  
4     Remnant of a star  
5        Rapid rotation  
6       Orbiting a star  
INFO: DataFrame saved to Excel file at: output_data.xlsx


In [36]:
generated_df

|   | Celestial_Object | Distance_Light_Years | Discovery_Year | Notable_Feature |
|---|---|---|---|---|
| 0 | Quasar | 12.5 | 1970 | High luminosity |
| 1 | Neutron Star | 0.8 | 1967 | Dense core |
| 2 | Red Giant | 2000.0 | 1920 | Expanding atmosphere |
| 3 | Black Hole | 10000.0 | 1783 | Event horizon |
| 4 | Supernova Remnant | 5000.0 | 1987 | Remnant of a star |
| 5 | Pulsar | 3000.0 | 1964 | Rapid rotation |
| 6 | Exoplanet | 150.0 | 1995 | Orbiting a star |
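
Because the row and column counts are only requested in the prompt text, the model is not guaranteed to honor them. A quick shape check on the result can catch mismatches. The sketch below works on a list of row dictionaries and uses two rows copied from the output above as stand-in data; `check_shape` is a hypothetical helper, not an IndoxGen function.

```python
def check_shape(records, n_rows, n_cols):
    """Return a list of problems found; an empty list means the shape matches."""
    problems = []
    if len(records) != n_rows:
        problems.append(f"expected {n_rows} rows, got {len(records)}")
    for i, row in enumerate(records):
        if len(row) != n_cols:
            problems.append(f"row {i}: expected {n_cols} columns, got {len(row)}")
    if records and any(row.keys() != records[0].keys() for row in records):
        problems.append("rows do not share the same column names")
    return problems


# Two rows copied from the output above, standing in for the full frame.
sample = [
    {"Celestial_Object": "Quasar", "Distance_Light_Years": 12.5,
     "Discovery_Year": 1970, "Notable_Feature": "High luminosity"},
    {"Celestial_Object": "Pulsar", "Distance_Light_Years": 3000.0,
     "Discovery_Year": 1964, "Notable_Feature": "Rapid rotation"},
]
print(check_shape(sample, n_rows=2, n_cols=4))  # → []
print(check_shape(sample, n_rows=7, n_cols=4))  # → ['expected 7 rows, got 2']
```

With a pandas DataFrame, the records can be obtained via `generated_df.to_dict("records")`.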


### Loading an Excel Dataset and Adding New Rows via a User Prompt

In this cell, we load a dataset from an Excel file and generate new rows from that dataset and a user prompt:
- **Imports**: In addition to `DataFromPrompt` and `DataGenerationPrompt`, we import the `Excel` class from `indoxGen.synthCore` to handle Excel file loading.
- **Loading the Dataset**: We specify the path to the Excel file (`output_data.xlsx`) and load the dataset into a DataFrame using the `Excel` class.
- **User Prompt**: A user prompt instructs the model to generate four unique rows about astronomy based on the existing dataset.
- **Instruction for Data Generation**: We generate the data generation instruction using `DataGenerationPrompt.get_instruction`.
- **DataFromPrompt Object**: We create an instance of `DataFromPrompt` where:
  - We pass the loaded dataset as `dataframe`.
  - The `llm` (language model), number of generations (`n`), and instruction are also passed.
  - The output type is defined as `generations`.
- **Generating the New Rows**: The `run()` method generates the new rows, and the updated dataset is printed to verify the addition. In the output below, the four generated rows appear first, followed by the seven rows loaded from the file.
  

In [39]:
from indoxGen.synthCore import DataFromPrompt
from indoxGen.synthCore import DataGenerationPrompt
from indoxGen.synthCore import Excel

dataset_file_path = "output_data.xlsx"

excel_loader = Excel(dataset_file_path)
df = excel_loader.load()
user_prompt = "Based on the given dataset, generate 4 unique rows about astronomy."
LLM = IndoxApi(api_key=INDOX_API_KEY)

instruction = DataGenerationPrompt.get_instruction(user_prompt)

dataset = DataFromPrompt(
    prompt_name="Generate",
    args={
        "llm": LLM,
        "n": 1,
        "instruction": instruction,
    },
    outputs={"generations": "generate"},
    dataframe=df
)
updated_df = dataset.run()
print(updated_df)


INFO: Generated DataFrame with shape: (11, 4)
     Celestial_Object  Distance_Light_Years  Discovery_Year  \
0         Brown Dwarf                  50.0            1995   
1     Gamma-Ray Burst                   9.5            1997   
2         White Dwarf                  25.0            1924   
3              Nebula                1300.0            1785   
4              Quasar                  12.5            1970   
5        Neutron Star                   0.8            1967   
6           Red Giant                2000.0            1920   
7          Black Hole               10000.0            1783   
8   Supernova Remnant                5000.0            1987   
9              Pulsar                3000.0            1964   
10          Exoplanet                 150.0            1995   

          Notable_Feature  
0             Failed star  
1       Intense radiation  
2         Cooling remnant  
3   Star formation region  
4         High luminosity  
5           

In [40]:
updated_df

|   | Celestial_Object | Distance_Light_Years | Discovery_Year | Notable_Feature |
|---|---|---|---|---|
| 0 | Brown Dwarf | 50.0 | 1995 | Failed star |
| 1 | Gamma-Ray Burst | 9.5 | 1997 | Intense radiation |
| 2 | White Dwarf | 25.0 | 1924 | Cooling remnant |
| 3 | Nebula | 1300.0 | 1785 | Star formation region |
| 4 | Quasar | 12.5 | 1970 | High luminosity |
| 5 | Neutron Star | 0.8 | 1967 | Dense core |
| 6 | Red Giant | 2000.0 | 1920 | Expanding atmosphere |
| 7 | Black Hole | 10000.0 | 1783 | Event horizon |
| 8 | Supernova Remnant | 5000.0 | 1987 | Remnant of a star |
| 9 | Pulsar | 3000.0 | 1964 | Rapid rotation |
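
The prompt asks for unique rows, but uniqueness depends on the model following that request. One way to verify it afterwards is to compare a key column of the new rows against the existing ones. The sketch below is an illustration on plain dictionaries, with `split_new_rows` as a hypothetical helper and `Celestial_Object` assumed to be the identifying column.

```python
def split_new_rows(existing, candidates, key):
    """Split `candidates` into (unique, duplicates) by comparing `key` values
    case-insensitively against `existing` and against earlier candidates."""
    seen = {str(row[key]).strip().lower() for row in existing}
    unique, duplicates = [], []
    for row in candidates:
        label = str(row[key]).strip().lower()
        if label in seen:
            duplicates.append(row)
        else:
            seen.add(label)
            unique.append(row)
    return unique, duplicates


existing = [{"Celestial_Object": "Quasar"}, {"Celestial_Object": "Pulsar"}]
candidates = [
    {"Celestial_Object": "Nebula"},
    {"Celestial_Object": "quasar"},   # repeats an existing row
    {"Celestial_Object": "Nebula"},   # repeats an earlier candidate
]
unique, duplicates = split_new_rows(existing, candidates, key="Celestial_Object")
print([r["Celestial_Object"] for r in unique])      # → ['Nebula']
print([r["Celestial_Object"] for r in duplicates])  # → ['quasar', 'Nebula']
```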



### Few-Shot Learning for Dataset Generation with Examples

In this cell, we use a few-shot learning approach to generate a dataset based on provided examples and a user prompt:
- **Examples Definition**: We define a list of examples where each example includes an `input` (a user prompt) and its corresponding `output` (a dataset in JSON format). These examples help guide the model in understanding the structure of the desired output for new prompts.
- **User Prompt**: We define a user prompt asking to generate a dataset with 3 columns and 2 rows about astronomy.
- **FewShotPrompt Object**: We instantiate the `FewShotPrompt` class to generate the dataset. The arguments passed include:
  - The `llm` (language model) initialized with the `IndoxApi` object.
  - The instruction, which in this case is the `user_prompt`.
  - The list of examples to guide the model's generation process.
  - The output type (`generations`).
- **Running the Data Generation**: We use the `run()` method to generate the dataset, which is then printed to verify the output.
- **Saving to Excel**: The generated dataset is saved to an Excel file named `output_data.xlsx`.


In [7]:
from indoxGen.synthCore import FewShotPrompt

examples = [
    {
        "input": "Generate a dataset with 3 columns and 2 rows about biology.",
        "output": '[{"Species": "Human", "Cell Count": 37.2, "Age": 30}, {"Species": "Mouse", "Cell Count": 3.2, "Age": 2}]'
    },
    {
        "input": "Generate a dataset with 3 columns and 2 rows about chemistry.",
        "output": '[{"Element": "Hydrogen", "Atomic Number": 1, "Weight": 1.008}, {"Element": "Oxygen", "Atomic Number": 8, "Weight": 15.999}]'
    }
]

user_prompt = "Generate a dataset with 3 columns and 2 rows about astronomy."
#instruction = DataGenerationPrompt.get_instruction(user_prompt)
LLM = IndoxApi(api_key=INDOX_API_KEY)

data_generator = FewShotPrompt(
    prompt_name="Generate Astronomy Dataset",
    args={
        "llm": LLM,
        "n": 1,
        "instruction": user_prompt,
    },
    outputs={"generations": "generate"},
    examples=examples
)

generated_df = data_generator.run()

print(generated_df)
data_generator.save_to_excel("output_data.xlsx", generated_df)


  Celestial Body  Diameter (km)  Distance from Sun (million km)
0          Earth          12742                           149.6
1           Mars           6779                           227.9
INFO: DataFrame saved to Excel file at: output_data.xlsx


In [8]:
generated_df

|   | Celestial Body | Diameter (km) | Distance from Sun (million km) |
|---|---|---|---|
| 0 | Earth | 12742 | 149.6 |
| 1 | Mars | 6779 | 227.9 |
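
To make the few-shot idea concrete, the sketch below shows one common way to lay out example pairs ahead of a new instruction. It illustrates the general technique only: the `Input:`/`Output:` layout and the `build_few_shot_prompt` helper are assumptions, not the format `FewShotPrompt` uses internally. It also checks that each example output parses as JSON, since a malformed example would teach the model a malformed format.

```python
import json


def build_few_shot_prompt(examples, instruction):
    """Render examples as Input/Output pairs, then append the new instruction."""
    parts = []
    for ex in examples:
        json.loads(ex["output"])  # raises ValueError if an example is not valid JSON
        parts.append(f"Input: {ex['input']}\nOutput: {ex['output']}")
    parts.append(f"Input: {instruction}\nOutput:")
    return "\n\n".join(parts)


examples = [
    {
        "input": "Generate a dataset with 3 columns and 2 rows about chemistry.",
        "output": '[{"Element": "Hydrogen", "Atomic Number": 1, "Weight": 1.008}, '
                  '{"Element": "Oxygen", "Atomic Number": 8, "Weight": 15.999}]',
    },
]
prompt = build_few_shot_prompt(
    examples, "Generate a dataset with 3 columns and 2 rows about astronomy."
)
print(prompt)
```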


### Data Generation with Attributed Prompts

In this cell, we use `DataFromAttributedPrompt` to generate a dataset based on a prompt that includes attributes for customization:
- **LLM Initialization**: The `IndoxApi` object is instantiated using the API key loaded from the environment variable.
- **Attributes Definition**:
  - We define a template for the instruction: `"Write a {tone} email about {topic}."`
  - The `attributes` parameter allows us to provide options for `tone` (formal, casual) and `topic` (meeting, project update), which the model will use to generate variations in the emails.
- **DataFromAttributedPrompt Object**: We instantiate the `DataFromAttributedPrompt` class with the following arguments:
  - `prompt_name`: A descriptive name for the prompt (here, `"EmailPrompt"`).
  - `args`: The attributes, including the instruction template, the specific values for `tone` and `topic`, and the language model (`llm`).
  - `outputs`: We leave this as an empty dictionary.
- **Running the Data Generation**: The `run()` method is called to generate the email data, which is stored in a DataFrame (`df`).
- **Saving to Excel**: The generated emails are saved to an Excel file named `generated_emails.xlsx`.
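
The expansion of the template over the attribute options can be sketched with `itertools.product`. This is an illustration of the idea rather than the library's implementation, and `expand_prompts` is a hypothetical helper. With two tones and two topics it yields 2 × 2 = 4 prompts, matching the four prompts reported in the run log further down.

```python
from itertools import product


def expand_prompts(template, attributes):
    """Fill `template` once for every combination of attribute values."""
    names = list(attributes)
    return [
        template.format(**dict(zip(names, combo)))
        for combo in product(*(attributes[name] for name in names))
    ]


prompts = expand_prompts(
    "Write a {tone} email about {topic}.",
    {"tone": ["formal", "casual"], "topic": ["meeting", "project update"]},
)
for p in prompts:
    print(p)
# → Write a formal email about meeting.
# → Write a formal email about project update.
# → Write a casual email about meeting.
# → Write a casual email about project update.
```

Each added attribute multiplies the number of prompts, and therefore the number of LLM calls, so long option lists grow the run quickly.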


In [11]:
from indoxGen.synthCore import DataFromAttributedPrompt
from indoxGen.llms import IndoxApi
import os

# Create the LLM client with the API key from the environment
LLM = IndoxApi(api_key=os.getenv("INDOX_API_KEY"))

# Define attributes
args = {
    "instruction": "Write a {tone} email about {topic}.",
    "attributes": {
        "tone": ["formal", "casual"],
        "topic": ["meeting", "project update"]
    },
    "llm": LLM
}

# Create an instance of DataFromAttributedPrompt
data_generator = DataFromAttributedPrompt(prompt_name="EmailPrompt",
                                          args=args,
                                          outputs={})

# Generate data
df = data_generator.run()

# Save to Excel
data_generator.save_to_excel("generated_emails.xlsx", df)

INFO: Generated 4 prompts from attributes.
INFO: Running prompt: Write a formal email about meeting.
INFO: Running prompt: Write a formal email about project update.
INFO: Running prompt: Write a casual email about meeting.
INFO: Running prompt: Write a casual email about project update.
INFO: Generated DataFrame with 4 records.
INFO: DataFrame saved to Excel file at: generated_emails.xlsx


In [19]:
df['response'][1]

"Subject: Project Update\n\nDear [Recipient's Name],\n\nI hope this message finds you well. I am writing to provide you with an update on the progress of [Project Name].\n\nAs of today, we have successfully completed the following milestones:\n\n1. **[Milestone 1]**: Brief description of what was achieved and any relevant details.\n2. **[Milestone 2]**: Brief description of what was achieved and any relevant details.\n3. **[Milestone 3]**: Brief description of what was achieved and any relevant details.\n\nLooking ahead, we are currently working on [next steps or upcoming milestones]. We anticipate that this phase will be completed by [expected completion date]. \n\nWe have encountered some challenges, specifically [briefly describe any challenges], but we are actively addressing these issues by [explain how you are resolving them].\n\nPlease feel free to reach out if you have any questions or require further details regarding the project. I appreciate your continued support and collab