# Performing Prompt Engineering:
        Identifying the best prompt for interacting with a language model so that it produces meaningful output.

### Please fill in the variables below with your own values and example paths before executing
        1. OpenAI_api_key - your OpenAI API key
        2. path_to_inference_notebook - path to the sample notebook to be graded (used for zero-shot prompting)
        3. path_to_example - path to an example notebook for few-shot prompting
        4. human_grade - human-provided grade for the above example
        5. human_feedback - human-provided feedback for the above example
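Hard-coding the key in `OpenAI_api_key` makes it easy to leak when the notebook is shared. A minimal sketch of reading it from the environment instead, assuming the conventional `OPENAI_API_KEY` variable name; the helper `get_openai_api_key` is hypothetical and not part of `utils`.

```python
import os
from typing import Mapping, Optional


def get_openai_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from OPENAI_API_KEY (hypothetical helper).

    Passing `env` explicitly makes the function easy to test; by default
    the real process environment is used.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set the OPENAI_API_KEY environment variable before running the notebook.")
    return key


# Usage: OpenAI_api_key = get_openai_api_key()
```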

In [3]:
!pip3 install -q torch openai 


In [7]:
!pip3 install -q langchain nbformat 


### Note: nbformat is used to parse the Jupyter notebooks

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage
from langchain.prompts import (
    ChatPromptTemplate,
    FewShotChatMessagePromptTemplate,
)
# Local helper module (not shown here)
from utils import read_notebook, extract_cells, format_for_gpt, create_prompt_example



## Reading data

In [51]:
path_to_inference_notebook = ""
notebook = read_notebook(path_to_inference_notebook)
extracted_cells = extract_cells(notebook)
formatted_data = format_for_gpt(extracted_cells)
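The `utils` helpers are not shown. Judging from the few-shot prompt printed later in this notebook, `format_for_gpt` produces JSON of the form `{"cells": [{"type": ..., "content": ...}]}`. The sketch below is a hypothetical stand-in for that flattening using only the standard library; the `_sketch` functions are assumptions, not the actual `utils` code. With nbformat, the input would come from `nbformat.read(path, as_version=4)`, which returns a dict-like notebook whose cell sources are already joined into strings.

```python
import json
from typing import Any, Dict, List


def extract_cells_sketch(notebook: Dict[str, Any]) -> List[Dict[str, str]]:
    """Keep each non-empty cell's type and source text; outputs and metadata are dropped."""
    cells: List[Dict[str, str]] = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # In raw .ipynb JSON, "source" may be a list of lines rather than one string
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            cells.append({"type": cell.get("cell_type", "unknown"), "content": source})
    return cells


def format_for_gpt_sketch(cells: List[Dict[str, str]]) -> str:
    """Serialize the cells as indented JSON, the shape sent as the user message."""
    return json.dumps({"cells": cells}, indent=4)


# Usage with a tiny in-memory notebook (hypothetical content)
raw = '{"cells": [{"cell_type": "code", "source": ["import pandas as pd\\n", "df = pd.DataFrame()"], "outputs": []}]}'
print(format_for_gpt_sketch(extract_cells_sketch(json.loads(raw))))
```

Dropping outputs keeps the prompt short and focuses the model on the submitted code rather than on whatever the cells happened to print.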

## Model creation

### Keeping the temperature fixed (0) across all experiments

In [12]:
OpenAI_api_key = ""
chat = ChatOpenAI(temperature=0, model_name="gpt-4", openai_api_key=OpenAI_api_key)


## Simple prompting

In [83]:
messages = [
    SystemMessage(
        content="""You are a helpful assistant that evaluates Python programs, grades them from 1-100,
        and explains why you gave such a grade. Make sure to provide your response in bullet points."""
    ),
    HumanMessage(content=formatted_data),
]
# Send the system and user messages to GPT-4
response = chat.invoke(messages)

In [84]:
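import re

# Extract the overall score and the per-bullet point allocations from the free-text reply.
# The simple prompt does not enforce this format, so either match may be missing.
overall_grade = re.search(r"(\d+) out of 100", response.content)
points = re.findall(r"- (.+?)\s+\(\+?(-?\d+) points\)", response.content)
if overall_grade:
    print(f"Overall Grade: {overall_grade.group(1)} out of 100")
else:
    print("Overall Grade: not found in response")
for point, grade_change in points:
    print(f"Feedback: {point}\nGrade points: {grade_change} points")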

Overall Grade: 90 out of 100
Feedback: The code is well-structured and organized into different cells, each performing a specific task. This makes the code easy to read and understand.
Grade points: 20 points
Feedback: The code uses pandas, a powerful data analysis library in Python, to perform operations on the data. This is a good choice for handling data in Python.
Grade points: 20 points
Feedback: The code correctly calculates the average number of words in each review, the count of reviews by year-month, and the average rating of any review marked as "cool". The logic used in these calculations is correct.
Grade points: 30 points
Feedback: The code uses the `apply` function to apply a function to each element in a column of the DataFrame. This is a good use of pandas functionality.
Grade points: 10 points
Feedback: The code uses the `groupby` function to group the data by year-month and count the number of reviews in each group. This is a good use of pandas functionality.
Grade po
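The regular expressions above assume the reply contains an "N out of 100" score and bullets ending in "(+N points)". A small sketch wrapping the same two patterns in a function, run on a hypothetical reply, makes the expected layout explicit; `parse_bullet_grades` and the sample text are illustrations only, not real model output.

```python
import re
from typing import List, Optional, Tuple


def parse_bullet_grades(text: str) -> Tuple[Optional[int], List[Tuple[str, int]]]:
    """Return (overall grade or None, [(feedback, point change), ...])."""
    overall = re.search(r"(\d+) out of 100", text)
    points = re.findall(r"- (.+?)\s+\(\+?(-?\d+) points\)", text)
    return (
        int(overall.group(1)) if overall else None,
        [(feedback, int(delta)) for feedback, delta in points],
    )


# Hypothetical reply used only to show the format the patterns expect
sample = "Overall grade: 85 out of 100\n- Clear structure (+20 points)\n- Missing error handling (-5 points)"
print(parse_bullet_grades(sample))
# (85, [('Clear structure', 20), ('Missing error handling', -5)])
```

Because the simple prompt only asks for bullet points, the model is free to use a different layout, in which case this parser returns `None` and an empty list. That fragility is one motivation for spelling out the exact fields in the instruction prompt.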

## Instruction prompting

In [85]:
messages = [
    SystemMessage(
        content="""You are a grading bot that will help a teaching assistant grade student submissions for a programming task. Please follow the
                instructions below to grade the student submission. In your assessment, you must always fill in the following bullet points:
                Grade: 0-100
                Summarised feedback: A short one- or two-sentence summary of your assessment.
                Possible errors: Write any errors that you find in a bullet point list. If you find no errors, write "None". If there are several,
                which there very well might be, please list them all.

                You shall under no circumstances comment on unnecessary improvements, only on those that are relevant to the task. Only comment on mistakes
                that make the code fail the requirements of the task. Elements such as efficiency are not relevant if they are not explicitly a requirement
                of the task."""
    ),
    HumanMessage(content=formatted_data),
]

In [86]:
response = chat.invoke(messages)

In [87]:
grade = re.search(r"Grade:\s*(\d+)", response.content)
summarised_feedback = re.search(r"Summarised feedback:\s*(.+)", response.content)
# "Possible errors" is the last field and may span several bullet lines, so capture to the end
possible_errors = re.search(r"Possible errors:\s*(.+)", response.content, re.DOTALL)

# Fall back to explicit placeholders when the model deviates from the requested format
grade = grade.group(1) if grade else "Unknown"
summarised_feedback = summarised_feedback.group(1).strip() if summarised_feedback else "No feedback provided"
possible_errors = possible_errors.group(1).strip() if possible_errors else "Not found in response"

print(f"Grade: {grade}")
print(f"Summarised Feedback: {summarised_feedback}")
print(f"Possible Errors: {possible_errors}")

Grade: 100
Summarised Feedback: The student has successfully completed all the tasks as per the requirements. The code is correct and produces the expected output.
Possible Errors: None
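Since the instruction prompt fixes three named fields, the reply can be turned into structured data, for example to compare model grades with human grades. A minimal sketch, assuming the model keeps the field names from the prompt and lists errors one per line; `parse_structured_grade` and the sample reply are hypothetical.

```python
import re
from typing import Dict, List


def parse_structured_grade(text: str) -> Dict[str, object]:
    """Parse the Grade / Summarised feedback / Possible errors fields into Python values."""
    grade = re.search(r"Grade:\s*(\d+)", text)
    summary = re.search(r"Summari[sz]ed feedback:\s*(.+)", text, re.IGNORECASE)
    # "Possible errors" is the last field, so everything after it is treated as the error list
    errors_block = re.search(r"Possible errors:\s*(.*)", text, re.IGNORECASE | re.DOTALL)
    errors: List[str] = []
    if errors_block:
        for line in errors_block.group(1).splitlines():
            item = re.sub(r"^[-*•]\s*", "", line.strip())  # drop a leading bullet marker
            if item and item.lower() != "none":
                errors.append(item)
    return {
        "grade": int(grade.group(1)) if grade else None,
        "summary": summary.group(1).strip() if summary else None,
        "errors": errors,
    }


# Hypothetical reply, used only to illustrate the expected layout
reply = """- Grade: 60
- Summarised feedback: The mapper works, but the reducer averages incorrectly.
- Possible errors:
  - The reducer divides by the wrong count
  - DataFrames are used instead of MRJob"""
print(parse_structured_grade(reply))
```

An answer of "None" yields an empty error list, so a clean submission and a submission whose errors could not be parsed remain distinguishable only through the other fields; checking that `grade` is not `None` is a cheap sanity test.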


## Few-shot prompting

### Note: three few-shot examples are used

In [7]:
# Fill in path_to_example, human_grade and human_feedback for each example
example_1, human_feedback_example_1 = create_prompt_example(path_to_example="", human_grade="", human_feedback="")
example_2, human_feedback_example_2 = create_prompt_example(path_to_example="", human_grade="", human_feedback="")
example_3, human_feedback_example_3 = create_prompt_example(path_to_example="", human_grade="", human_feedback="")


examples = [
    {"input": example_1, "output": human_feedback_example_1},
    {"input": example_2, "output": human_feedback_example_2},
    {"input": example_3, "output": human_feedback_example_3},
]
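`create_prompt_example` is not shown. The sketch below is a hypothetical stand-in, assuming it returns the formatted example notebook as the input and a string combining the human grade and feedback as the expected output; the `"Grade: ..., summarized Feedback: ..."` layout is an assumption. It also guards against the empty placeholders, which would otherwise silently produce examples with no grade.

```python
from typing import Dict, List, Tuple


def create_prompt_example_sketch(formatted_notebook: str, human_grade: str, human_feedback: str) -> Tuple[str, str]:
    """Hypothetical stand-in for utils.create_prompt_example.

    Returns (model input, expected model output). The real helper takes a notebook
    path; here the already formatted notebook text is passed in to stay self-contained.
    """
    grade = int(human_grade)  # an empty placeholder such as "" raises ValueError here
    if not 0 <= grade <= 100:
        raise ValueError(f"human_grade must be between 0 and 100, got {grade}")
    return formatted_notebook, f"Grade: {grade}, summarized Feedback: {human_feedback.strip()}"


def to_examples(pairs: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Shape (input, output) pairs to match the {input}/{output} variables of example_prompt."""
    return [{"input": inp, "output": out} for inp, out in pairs]
```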

In [10]:
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("user", "{input}"),
        ("ai", "{output}"),
    ]
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
    input_variables=["input"],
    examples=examples,
    example_prompt=example_prompt,
)
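When `few_shot_prompt` is placed between a system message and the final user message, each example becomes a human/AI message pair, so the model sees graded examples before the notebook it must grade. A plain-Python sketch of that ordering; `expand_few_shot` is hypothetical and only illustrates the message layout, not how LangChain implements it.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (role, content)


def expand_few_shot(system: str, examples: List[Dict[str, str]], query: str) -> List[Message]:
    """Illustrate the message order: system, one human/AI pair per example, then the real query."""
    messages: List[Message] = [("system", system)]
    for example in examples:
        messages.append(("human", example["input"]))  # the example notebook
        messages.append(("ai", example["output"]))  # the human grade/feedback the model should imitate
    messages.append(("human", query))  # the notebook actually being graded
    return messages


demo = expand_few_shot("grading instructions", [{"input": "nb A", "output": "Grade: 40"}], "nb to grade")
for role, content in demo:
    print(f"{role}: {content}")
```

With three examples this gives eight messages in total, and every example notebook counts toward the context window, which is worth keeping in mind when the notebooks are long.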

In [136]:
print(few_shot_prompt.format())

Human: {
    "cells": [
        {
            "type": "code",
            "content": "!pip install MRJob\n!pip install python-dateutil"
        },
        {
            "type": "code",
            "content": "from google.colab import drive\ndrive.mount('/content/drive')\nimport pandas as pd\ndata = pd.read_csv('/content/drive/MyDrive/bigdata/yelp.csv')"
        },
        {
            "type": "code",
            "content": "%%file average_words_per_review.py\nfrom mrjob.job import MRJob\nclass MRAverageWordsPerReview(MRJob):\n    def mapper(self, _, line):\n        # Assuming each line is a review.\n        words = line.split()\n        yield \"review\", len(words)\n    def reducer(self, key, values):\n        total_words = 0\n        total_reviews = 0\n        for value in values:\n            total_words += value\n            total_reviews += 1\n        average_words = total_words / total_reviews\n        yield key, average_words\nif __name__ == '__main__':\n    MRAverageWordsPerRev

### Few-shot without context (rubric)

In [138]:
final_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant that evaluates Python programs, grades them from 1-100 "
                   "and explains why you gave such a grade. Make sure to provide your response in bullet points."),
        few_shot_prompt,
        ("user", "{input}"),
    ]
)
chain = final_prompt | chat
response_without_context = chain.invoke({"input": formatted_data})
print(response_without_context.content)

Grade: 70

Feedback:

- The code is well-structured and easy to read.
- The use of pandas for data manipulation is appropriate and efficient.
- The logic used to solve the problems is correct and the code produces the expected results.
- However, the code is not scalable. It works well for small datasets but for larger datasets, it would be more efficient to use a map-reduce framework like Hadoop or Spark.
- The code could be improved by adding more comments to explain the logic and the steps.
- The code could also be improved by defining functions for repeated code blocks.
- The code does not handle exceptions or errors. It would be better to add error handling to make the code more robust.
- The code does not have any unit tests. It would be better to add tests to ensure the code works as expected.


### Few-shot with context (rubric)

In [53]:
# Grading rubric / task-specific context for the model (fill in before running)
Instruction = ""
final_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant that evaluates Python programs, grades them from 1-100 "
                   "and explains why you gave such a grade. Make sure to provide your response in bullet points. "
                   + Instruction),
        few_shot_prompt,
        ("user", "{input}"),
    ]
)
chain = final_prompt | chat
response_with_context = chain.invoke({"input": formatted_data})
print(response_with_context.content)

Grade: 20, summarized Feedback: No map/reduce logic has been written to do the task. DataFrames are not allowed
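Concatenating `Instruction` onto the system string works, but two details are easy to miss: the rubric should be visibly separated from the base instruction, and `ChatPromptTemplate` treats `{...}` in message strings as template variables, so a rubric containing literal braces (for example a code snippet) would break formatting. A minimal sketch, assuming the rubric is plain text; `build_system_prompt` and the example rubric are hypothetical, the rubric being loosely modelled on the feedback above.

```python
def build_system_prompt(base: str, rubric: str = "") -> str:
    """Append an optional grading rubric to the base instruction.

    ChatPromptTemplate parses message strings as f-string templates, so literal
    braces are doubled to keep them from being read as input variables.
    """
    rubric = rubric.strip()
    text = f"{base}\n\nGrading rubric:\n{rubric}" if rubric else base
    return text.replace("{", "{{").replace("}", "}}")


BASE_PROMPT = (
    "You are a helpful assistant that evaluates Python programs, grades them from 1-100 "
    "and explains why you gave such a grade. Make sure to provide your response in bullet points."
)
# Hypothetical rubric; replace with the real task requirements
EXAMPLE_RUBRIC = """
- Each task must be solved with MRJob map/reduce jobs.
- Using pandas DataFrames to compute the answers is not allowed.
"""
print(build_system_prompt(BASE_PROMPT, EXAMPLE_RUBRIC))
```

The result can be used as `("system", build_system_prompt(BASE_PROMPT, Instruction))` in `final_prompt`; with an empty rubric it reduces to the base prompt.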


        In summary, I queried a GPT-4 model with several prompting strategies to identify the one that best produces the desired evaluations of student assignments.

        Few-shot prompting showed a significant improvement, but the model still focused mostly on coding standards.

        To improve further, I added context (the Instruction rubric) describing what the model should look for when grading the assignments. With that, I was able to achieve the desired output.

        This prompt with context (Instruction) will be used for fine-tuning.
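One way to reuse the prompt with context for fine-tuning is to export the graded examples as training records. A minimal sketch, assuming OpenAI's chat fine-tuning format (one JSON object per line, each with a `messages` list of `system`, `user` and `assistant` turns); the function name, the file name and the choice to repeat the same system prompt on every record are assumptions, not the author's pipeline.

```python
import json
from typing import Dict, Iterable


def to_finetune_jsonl(system_prompt: str, examples: Iterable[Dict[str, str]]) -> str:
    """Serialize graded examples as chat fine-tuning records, one JSON object per line."""
    lines = []
    for example in examples:
        record = {
            "messages": [
                {"role": "system", "content": system_prompt},  # the prompt with context (rubric)
                {"role": "user", "content": example["input"]},  # formatted student notebook
                {"role": "assistant", "content": example["output"]},  # human grade and feedback
            ]
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


# Usage (hypothetical file name):
# with open("grading_finetune.jsonl", "w", encoding="utf-8") as f:
#     f.write(to_finetune_jsonl(system_prompt, examples))
```

The system prompt here is sent verbatim rather than through a LangChain template, so it should be the plain rubric text.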