# Enable GPU
* Go to "Runtime" -> "Change Runtime Type"
* Choose "T4 GPU" under "Hardware Accelerator"
* Click "Save"
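After the runtime restarts, you can quickly confirm that PyTorch sees the GPU. This is a minimal sketch. It assumes `torch` is preinstalled, as on standard Colab images, and `detect_gpu` is a hypothetical helper name. It returns `None` when no CUDA device is visible.

```python
# Check whether PyTorch can see a CUDA GPU in this runtime.
def detect_gpu():
    """Return the name of the first CUDA device, or None if there is none."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment
        return None
    if torch.cuda.is_available():
        return torch.cuda.get_device_name(0)
    return None

gpu_name = detect_gpu()
print(gpu_name or "No GPU detected; re-check the runtime type.")
```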



# Install Libraries

In [None]:
!pip install accelerate



The `accelerate` library lets Transformers place model weights on the available hardware. The `device_map="auto"` option used below requires it to load the model onto Colab's GPU.
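Loading with `device_map="auto"` fails when `accelerate` is missing, so it helps to confirm the packages are importable before the multi-gigabyte download starts. This is a minimal sketch that uses only the standard library. `missing_packages` is an illustrative helper, not part of any library.

```python
import importlib.util

def missing_packages(names):
    """Return the packages from `names` that cannot be found in this runtime."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(["accelerate", "transformers", "torch"])
if missing:
    print("Install before continuing:", ", ".join(missing))
else:
    print("All required packages are available.")
```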

# LLM setup and configuration

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Specify the LLM model we'll be using
model_name = "microsoft/Phi-3-mini-4k-instruct"

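# Load the weights in half precision (float16) and let accelerate place them
# on the GPU; trust_remote_code allows running the model code hosted in the repository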
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

# Load the tokenizer for the chosen model
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a pipeline object for easy text generation with the LLM
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



* **Importing libraries:** We import `torch` and the required classes from the Hugging Face (HF) `transformers` library to work with the language model.
* **Specifying the LLM:** We name the HF repository [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), which hosts the instruction-tuned model to load.
* **Loading the model and tokenizer:** We load the LLM and its matching tokenizer, which converts text to and from the token IDs the model works with.
* **Creating a pipeline:** We wrap both in an HF `pipeline` object, which tokenizes each prompt, runs generation, and decodes the response in a single call.

# Configure LLM generation parameters

In [None]:
# Parameters to control the LLM's response generation
generation_args = {
    "max_new_tokens": 512,      # Maximum number of tokens to generate
    "return_full_text": False,  # Return only the generated text, not the prompt
}

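This code defines settings that control the LLM's text generation:

* `max_new_tokens`: caps the response at 512 newly generated tokens. The prompt does not count toward this limit.
* `return_full_text`: when `False`, the pipeline returns only the newly generated response instead of the prompt plus the response.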
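The shape of `generated_text` depends on `return_full_text`. The input here is chat-style: a list of `{"role", "content"}` messages. With that input, recent versions of `transformers` return just the reply string when the flag is `False`. When it is `True`, they return the whole conversation, with the assistant's reply appended as the last message.

The helper below is a sketch that handles both shapes, assuming that behavior. `extract_reply` is a hypothetical name, and the sample outputs are hand-written stand-ins, not real model output.

```python
def extract_reply(output):
    """Pull the assistant's reply out of a text-generation pipeline result.

    `output` is the list the pipeline returns, e.g. [{"generated_text": ...}].
    """
    generated = output[0]["generated_text"]
    if isinstance(generated, str):
        # return_full_text=False: only the newly generated text
        return generated
    # return_full_text=True with chat input: a list of messages whose last
    # entry is the assistant's reply
    return generated[-1]["content"]

# Hand-written stand-ins for the two output shapes
short_form = [{"generated_text": "Hello!"}]
full_form = [{"generated_text": [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]}]
print(extract_reply(short_form), extract_reply(full_form))
```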

# Building the query function

In [None]:
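def chat(user_input):
    """Send a single user message to the LLM and return its reply as a string."""
    messages = [
        {"role": "user", "content": user_input}
    ]

    output = pipe(messages, **generation_args)
    # With return_full_text=False, generated_text holds only the new reply
    return output[0]['generated_text']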
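`chat` sends a single user message, so each call starts from a blank slate and the model does not remember earlier questions. A multi-turn variant keeps the message list and appends each reply as an `assistant` turn.

This is a minimal sketch. `chat_with_history` is a hypothetical helper, and it takes the generation function as a parameter so it can be tried with a stub. In this notebook you would pass something like `lambda msgs: pipe(msgs, **generation_args)[0]['generated_text']`.

```python
def chat_with_history(history, user_input, generate):
    """Send `user_input` along with prior turns and record the reply.

    history:  list of {"role": ..., "content": ...} dicts, updated in place
    generate: callable that takes the message list and returns reply text
    """
    history.append({"role": "user", "content": user_input})
    reply = generate(history)
    history.append({"role": "assistant", "content": reply})
    return reply

# A stub generator that reports how many messages it received
def fake_generate(messages):
    return f"(seen {len(messages)} message(s))"

history = []
print(chat_with_history(history, "What is a qubit?", fake_generate))
print(chat_with_history(history, "Explain that more simply.", fake_generate))
```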

In [None]:
def prompt_llm_task(task_description):
    # Construct a prompt that instructs the LLM to perform a specific task
    prompt = f"""Please perform the following task:

{task_description}

Provide your response in a clear and concise manner."""

    # Use the chat function to get the LLM's response
    response = chat(prompt)
    return response

# Example usage:
task = "Explain the concept of quantum computing in simple terms."
result = prompt_llm_task(task)
print(result)

# You can also use it in a loop for multiple tasks:
tasks = [
    "Write a short poem about artificial intelligence.",
    "Summarize the key points of climate change.",
    "Provide three tips for effective time management."
]

for task in tasks:
    print(f"\nTask: {task}")
    result = prompt_llm_task(task)
    print(f"Response: {result}")

The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.


 Quantum computing is a type of computing that uses the principles of quantum mechanics, which is the science of the very small particles that make up our universe. Unlike traditional computers that use bits (0s and 1s) to process information, quantum computers use quantum bits, or qubits. These qubits can exist in multiple states at once, thanks to a property called superposition. This allows quantum computers to process a vast amount of possibilities simultaneously, making them potentially much more powerful than classical computers for certain tasks. Another key feature of quantum computing is entanglement, which allows qubits that are entangled to be correlated with each other even when separated by large distances. This can lead to faster processing and more secure communication. Quantum computing is still in the early stages of development, but it has the potential to revolutionize fields like cryptography, drug discovery, and complex problem-solving.

Task: Write a short poem ab
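For coding tasks like the ones in the next cell, the model usually answers in Markdown, with code inside fenced blocks (as in the response captured after that cell). A small regular-expression helper can pull those blocks out so you can review them and paste them into new cells.

This is a sketch. `extract_code_blocks` is a hypothetical helper, and it only recognizes fences of three backticks with an optional language tag.

```python
import re

# Three backticks, assembled at runtime so this snippet can sit inside a Markdown fence
FENCE = "`" * 3

# Match an opening fence with an optional language tag, then capture lazily
# up to the next closing fence (DOTALL lets "." span newlines).
CODE_BLOCK = re.compile(re.escape(FENCE) + r"[\w+-]*\n(.*?)" + re.escape(FENCE), re.DOTALL)

def extract_code_blocks(response):
    """Return the contents of every fenced code block in `response`."""
    return [block.strip() for block in CODE_BLOCK.findall(response)]

sample = (
    "Install it first:\n"
    f"{FENCE}\npip install gdown\n{FENCE}\n"
    "Then run:\n"
    f"{FENCE}python\nimport pandas as pd\n{FENCE}\n"
)
for code in extract_code_blocks(sample):
    print(code)
    print("---")
```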

In [7]:
# Assignment 2: Write a Gen AI agent to finish Assignment 1

tasks = [
    "I need you to write code for Jupyter notes that reads a rawdata.tsv file from my Google Drive",
    "Write code to mount my Google Drive in Colab",
    "Write code to read the rawdata.tsv file",
    "Write code to read the headers.txt file",
    "Write code to combine the data",
    "Write code to write to CSV",
    "Write code to clean the code"
]

for task in tasks:
    print(f"\nTask: {task}")
    result = prompt_llm_task(task)
    print(f"Response: {result}")


Task: I need you to write code for Jupyter notes that reads a rawdata.tsv file from my Google Drive
Response:  To read a `rawdata.tsv` file from your Google Drive in a Jupyter notebook, you'll need to use the `gdown` library to download the file and then read it using pandas. Here's a step-by-step guide:

1. First, install the `gdown` library if you haven't already. You can do this by running the following command in your terminal or command prompt:

```
pip install gdown
```

2. Next, you'll need to get the direct link to your `rawdata.tsv` file on Google Drive. You can do this by right-clicking on the file and selecting "Get shareable link."

3. Once you have the direct link, you can use the following code in your Jupyter notebook to download and read the file:

```python
import pandas as pd
import gdown

# Replace this with your Google Drive file link
file_link = 'https://drive.google.com/uc?id=FILE_ID'

# Download the file from Google Drive
file_path = 'rawdata.tsv'
gdown.download