<a href="https://colab.research.google.com/github/majidraza1228/LLM/blob/LLM_Tech_Class/mpt_7b_LLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MPT-7B-chat

---

🚨 **Note: this must be run on a GPU. If you run this on a CPU, even a very fast one, it can take many hours to answer a single question!**

---

In [None]:
!pip install --upgrade pip

In [None]:
# Remove any preinstalled PyTorch packages so a matching set can be reinstalled
!pip uninstall torchaudio torchvision torch -y

In [None]:
# torchaudio pins a matching torch release, so this also reinstalls torch
!pip install torchaudio==2.5.0

In [None]:
!pip install transformers accelerate einops langchain xformers
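
After the installs, it can help to confirm which versions actually ended up in the environment, since the cells above pin only `torchaudio`. A minimal sketch using the standard library; the helper name `installed_versions` is made up for illustration:

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

# Packages installed by the cells above
packages = ["torch", "torchaudio", "transformers", "accelerate",
            "einops", "langchain", "xformers"]
for name, ver in installed_versions(packages).items():
    print(f"{name}: {ver or 'not installed'}")
```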

In [None]:
from torch import cuda, bfloat16
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'


tokenizer = AutoTokenizer.from_pretrained("mosaicml/mpt-7b-chat",
                                          trust_remote_code=True)

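# `config` must be a config object rather than a plain dict, so load the
# model's own config and ask it to initialize weights on the target device.
config = AutoConfig.from_pretrained("mosaicml/mpt-7b-chat",
                                    trust_remote_code=True)
config.init_device = device
model = AutoModelForCausalLM.from_pretrained("mosaicml/mpt-7b-chat",
                                             trust_remote_code=True,
                                             config=config,
                                             torch_dtype=bfloat16)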

print(f"device={device}")
print('model loaded')

In [None]:
# Use the GPU
model.to(device)
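
Moving the model to the GPU only works if the device has enough memory for the weights. A rough back-of-the-envelope estimate, assuming about 7 billion parameters (taken from the model name; the exact count is not stated in this notebook) and 2 bytes per `bfloat16` value. The function name is hypothetical, and the figure excludes activations and the generation cache:

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3

# Assumption: ~7e9 parameters, stored as bfloat16 (2 bytes each)
print(f"{weight_memory_gib(7e9):.1f} GiB")
```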

In [5]:
import time
from IPython.display import Markdown, display
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

# mpt-7b is trained to add "<|endoftext|>" at the end of generations
stop_token_ids = [tokenizer.eos_token_id]

# define custom stopping criteria object.
# Source: https://github.com/pinecone-io/examples/blob/master/generation/llm-field-guide/mpt-7b/mpt-7b-huggingface-langchain.ipynb
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor,
                 **kwargs) -> bool:
        for stop_id in stop_token_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

stopping_criteria = StoppingCriteriaList([StopOnTokens()])

def ask_question(question, max_length=100):
    start_time = time.time()

    # Encode the question
    input_ids = tokenizer.encode(question, return_tensors='pt')

    # Use the GPU
    input_ids = input_ids.to(device)

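    # Generate a response (`max_length` caps the number of newly generated tokens)
    output = model.generate(
        input_ids,
        max_new_tokens=max_length,
        do_sample=True,  # temperature only takes effect when sampling is enabled
        temperature=0.9,
        stopping_criteria=stopping_criteria
    )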

    # Decode only the newly generated tokens (slice off the prompt)
    response = tokenizer.decode(output[:, input_ids.shape[-1]:][0],
                                skip_special_tokens=True)

    end_time = time.time()
    duration = end_time - start_time

    display(Markdown(response))

    print("Function duration:", duration, "seconds")
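
To make the role of `StopOnTokens` concrete, here is a toy decoding loop in plain Python. It does not call `transformers` and is not how `generate` is implemented. It only mirrors the idea that after each new token the most recent id is compared against the stop ids, and generation ends on a match or when the token budget runs out. The function name and the token values are invented for illustration:

```python
def generate_until_stop(next_tokens, stop_ids, max_new_tokens):
    """Collect tokens until a stop id appears or the budget is used up."""
    generated = []
    for token in next_tokens:
        if len(generated) >= max_new_tokens:
            break
        generated.append(token)
        # Same test as StopOnTokens: look only at the most recent token
        if generated[-1] in stop_ids:
            break
    return generated

# Token 0 stands in for the end-of-text id in this toy example
print(generate_until_stop(iter([5, 7, 0, 9]), stop_ids={0}, max_new_tokens=10))
```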

In [7]:
# Ask a question
#ask_question("What is the capital of France?", 200)
ask_question("write python code that converts a csv into a pdf", 400)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
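
The warning above appears because `generate` received only `input_ids`. One way to avoid it is to pass the tokenizer's attention mask and an explicit `pad_token_id`. The sketch below assumes that calling the tokenizer returns both `input_ids` and `attention_mask`, and that `tokenizer.eos_token_id` is set. The helper `build_generate_kwargs` is hypothetical and only assembles the arguments:

```python
def build_generate_kwargs(encoded, eos_token_id, max_new_tokens=100):
    """Assemble keyword arguments for `model.generate` from tokenizer output."""
    return {
        "input_ids": encoded["input_ids"],
        "attention_mask": encoded["attention_mask"],
        # Reuse the end-of-text id for padding, as the warning suggests
        "pad_token_id": eos_token_id,
        "max_new_tokens": max_new_tokens,
    }

# Intended use inside ask_question (not executed here):
#   encoded = tokenizer(question, return_tensors='pt').to(device)
#   output = model.generate(
#       **build_generate_kwargs(encoded, tokenizer.eos_token_id, max_length),
#       stopping_criteria=stopping_criteria)
```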




```
import csv
import reportlab.pdfgen

# open the csv file
with open('data.csv', 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    
    # loop through the csv file
    for row in csvreader:
        # create a new PDF document
        pdf = reportlab.pdfgen.PDFGenerator()
        
        # add a page to the PDF document
        pdf.addPage()
        
        # loop through the row
        for col in row:
            # add text to the PDF document
            pdf.setFont('Helvetica', 12)
            pdf.drawString(100, 750, col)
        
        # save the PDF document
        pdf.save('output.pdf')
        
        # add a blank line to the PDF document
        pdf.setFont('Helvetica', 12)
        pdf.drawString(100, 750, '')
        
        # move the PDF writer to the next page
        pdf.setPage(pdf.getNumPages() + 1)
```

This code reads in a CSV file, creates a new PDF document, and loops through each row in the CSV file, adding text to the PDF document for each column in the row. It then saves the PDF document and moves on to the next row. The resulting PDF document will have one page per row in the CSV file.

To use this code, you will need to have the reportlab library installed. You can install it using pip:

```
pip install reportlab
```

You will also need to have a CSV file with the appropriate data. You can create a CSV file using a text editor or spreadsheet program, such as Microsoft Excel.

Once you have a CSV file and the reportlab library installed, you can run the Python

Function duration: 510.75560235977173 seconds
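
The generated script above does not run as written: `reportlab.pdfgen` has no `PDFGenerator` class, and the answer appears to be cut off at the 400-token limit. For comparison, here is a minimal working sketch built on reportlab's `canvas.Canvas`. The helper names, page coordinates, and line spacing are illustrative choices, not part of the model's output:

```python
import csv

def layout_rows(rows, top=750, bottom=50, line_height=14):
    """Split rows into pages of (y, text) pairs, one CSV row per line."""
    pages, page, y = [], [], top
    for row in rows:
        if y < bottom:  # no room left on this page, so start a new one
            pages.append(page)
            page, y = [], top
        page.append((y, ", ".join(row)))
        y -= line_height
    if page:
        pages.append(page)
    return pages

def csv_to_pdf(csv_path, pdf_path):
    """Write each CSV row as one line of text in a PDF file."""
    # Imported here so layout_rows works even without reportlab installed
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    with open(csv_path, newline="") as f:
        pages = layout_rows(csv.reader(f))

    pdf = canvas.Canvas(pdf_path, pagesize=letter)
    for page in pages:
        pdf.setFont("Helvetica", 12)  # set the font again on every page
        for y, text in page:
            pdf.drawString(72, y, text)
        pdf.showPage()  # finish the current page
    pdf.save()
```

With reportlab installed, `csv_to_pdf("data.csv", "output.pdf")` writes a single PDF and starts a new page whenever the current one fills up.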
