<a href="https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/assignments/assignment_yourname_class8.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-559: Applications of Generative AI
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/index.html)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

**Module 8 Assignment: Kaggle-like Assignment**

**Student Name: Your Name**

# Assignment Instructions

One of the major assignments in this course is a Kaggle competition. Each semester, I create a new dataset for students to try in each course I teach. Kaggle competitions typically provide two files: a training dataset and a testing dataset. I will give you two such files for this assignment, linked below.

* [Training Dataset](https://data.heatonresearch.com/data/t81-559/assignments/riddles_train.csv)
* [Testing Dataset](https://data.heatonresearch.com/data/t81-559/assignments/riddles_test.csv)

In a typical Kaggle competition, you would use the training dataset to train your model. For this assignment, I do not suggest you train a model; instead, I recommend examining the training dataset to get an idea of the input and expected output. The training dataset will always contain both the input (the riddle, in this case) and the expected output (the answer, in this case). The testing dataset has no "answer" column, which is the expected output. You will use each riddle to predict its answer for the testing dataset.

The training dataset looks like the following:

|riddle|answer|
|---|---|
|I am tall when I am young, and short when I am old. What am I?|candle|
|What has keys but can't open locks?|piano|
|I have branches, but no fruit, trunk or leaves. What am I?|bank|
|...|...|

Notice how the file provides the answer for the training dataset.

The testing dataset looks like the following:

|riddle|
|---|
|What has to be broken before you can use it?|
|The more you have of it, the less you see. What is it?|
|What gets wetter the more it dries?|
|...|

Notice how it provides the riddles but not their answers.

Your assignment is to use an LLM to determine the answer for each item in the testing dataset. You are to produce a submission data frame that will look like the following:

|riddle|answer|
|---|---|
|What has to be broken before you can use it?|egg|
|The more you have of it, the less you see. What is it?|darkness|
|What gets wetter the more it dries?|towel|
|...|...|

Please keep in mind the following for this assignment:

* You will need to craft your prompt so that your answer is something like "banana," not "a banana."
* You do not need to use agents or tools.
* You do not need to train a model; look at the training data to get an idea of the answers. You will submit based on the test dataset.
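
The points above can be sketched in code. The following is a minimal illustration, not a required approach: `build_few_shot_prompt` and `normalize_answer` are hypothetical helper names, and the instruction wording is only an assumption about what might steer an LLM toward one-word answers. The first helper turns a few rows from the training dataset into examples inside the prompt; the second cleans up a raw reply such as "A banana." into "banana".

```python
import re
import string

# Matches a leading English article ("a", "an", "the") followed by whitespace.
_LEADING_ARTICLE = re.compile(r"^(a|an|the)\s+", re.IGNORECASE)


def build_few_shot_prompt(riddle, examples):
    """Build a prompt that shows solved riddles before the new one.

    `examples` is a list of (riddle, answer) pairs, for instance a few
    rows taken from the training dataset.
    """
    lines = ["Answer each riddle with one lowercase word and nothing else.", ""]
    for example_riddle, example_answer in examples:
        lines.append(f"Riddle: {example_riddle}")
        lines.append(f"Answer: {example_answer}")
        lines.append("")
    lines.append(f"Riddle: {riddle}")
    lines.append("Answer:")
    return "\n".join(lines)


def normalize_answer(raw):
    """Reduce a raw LLM reply to a bare lowercase answer."""
    # Trim surrounding whitespace, quotes, and trailing punctuation.
    text = raw.strip(string.punctuation + string.whitespace)
    # Drop one leading article, if present.
    text = _LEADING_ARTICLE.sub("", text)
    return text.lower()


examples = [
    ("I am tall when I am young, and short when I am old. What am I?", "candle"),
    ("What has keys but can't open locks?", "piano"),
]
prompt = build_few_shot_prompt("What gets wetter the more it dries?", examples)
cleaned = normalize_answer("A banana.")
```

Normalizing after the fact is a safety net; a well-crafted prompt should already produce answers in the expected form most of the time.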


# Google CoLab Instructions

If you are using Google CoLab, it will be necessary to mount your GDrive so that you can send your notebook during the submit process. Running the following code will map your GDrive to ```/content/drive```.

In [24]:
import os

try:
    from google.colab import drive, userdata
    drive.mount('/content/drive', force_remount=True)
    COLAB = True
    print("Note: using Google CoLab")
except:
    print("Note: not using Google CoLab")
    COLAB = False

# OpenAI Secrets
if COLAB:
    os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

# Install needed libraries in CoLab
if COLAB:
    !pip install langchain langchain_openai

Mounted at /content/drive
Note: using Google CoLab


# Assignment Submit Function

You will submit the 10 programming assignments electronically.  The following submit function can be used to do this.  My server will perform a basic check of each assignment and let you know if it sees any basic problems.

**It is unlikely that you will need to modify this function.**

In [25]:
import base64
import os
import numpy as np
import pandas as pd
import requests
import PIL
import PIL.Image
import io
from io import BytesIO

# This function submits an assignment.  You can submit an assignment as many times as you like; only the final
# submission counts.  The parameters are as follows:
# data - List of pandas dataframes or images.
# key - Your student key that was emailed to you.
# course - The course that you are in, currently t81-558 or t81-559.
# no - The assignment class number, should be 1 through 10.
# source_file - The full path to your Python or IPYNB file.  This must have "_class1" as part of its name.
#               The number must match your assignment number.  For example "_class2" for class assignment #2.
def submit(data,key,course,no,source_file=None):
    if source_file is None and '__file__' not in globals(): raise Exception('Must specify a filename when using a Jupyter notebook.')
    if source_file is None: source_file = __file__
    suffix = '_class{}'.format(no)
    if suffix not in source_file: raise Exception('{} must be part of the filename.'.format(suffix))
    with open(source_file, "rb") as image_file:
        encoded_python = base64.b64encode(image_file.read()).decode('ascii')
    ext = os.path.splitext(source_file)[-1].lower()
    if ext not in ['.ipynb','.py']: raise Exception("Source file is {}; it must be .py or .ipynb".format(ext))
    payload = []
    for item in data:
        if type(item) is PIL.Image.Image:
            buffered = BytesIO()
            item.save(buffered, format="PNG")
            payload.append({'PNG':base64.b64encode(buffered.getvalue()).decode('ascii')})
        elif type(item) is pd.core.frame.DataFrame:
            payload.append({'CSV':base64.b64encode(item.to_csv(index=False).encode('ascii')).decode("ascii")})
    r= requests.post("https://api.heatonresearch.com/wu/submit",
        headers={'x-api-key':key}, json={ 'payload': payload,'assignment': no, 'course':course, 'ext':ext, 'py':encoded_python})
    if r.status_code==200:
        print("Success: {}".format(r.text))
    else: print("Failure: {}".format(r.text))

# Assignment #8 Sample Code

The following code provides a starting point for this assignment.

In [None]:
import os
import pandas as pd
from scipy.stats import zscore
import string
from langchain.prompts import ChatPromptTemplate


# This is your student key that I emailed to you at the beginning of the semester.
key = "uTtH5yNbPs9tjdjdsBf9V9FaQA9RU2iP5cL7F3zH"

# You must also identify your source file.  (modify for your local setup)
file='/content/drive/MyDrive/Colab Notebooks/assignment_solution_class8.ipynb'  # Google CoLab
# file='C:\\Users\\jeffh\\projects\\t81_558_deep_learning\\assignments\\assignment_yourname_class8.ipynb'  # Windows
# file='/Users/jheaton/projects/t81_558_deep_learning/assignments/assignment_yourname_class8.ipynb'  # Mac/Linux

# Begin assignment

df = pd.read_csv("https://data.heatonresearch.com/data/t81-559/assignments/riddles_test.csv")
df.head()
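
The submit cell below expects a data frame named `df_submit` with `riddle` and `answer` columns. One possible way to build it is sketched here, under stated assumptions: `make_llm_answerer` and `build_submission` are hypothetical helper names, the model name `gpt-4o-mini` and the system prompt wording are placeholders rather than course requirements, and `ChatPromptTemplate` is imported from `langchain_core.prompts` (installed alongside `langchain`). The LangChain imports are deferred so that the data frame logic can be tried without an API key.

```python
import pandas as pd


def make_llm_answerer(model_name="gpt-4o-mini"):
    """Return a function mapping a riddle to a raw LLM answer.

    Assumes OPENAI_API_KEY is set in the environment. The model name is
    an assumption; substitute whichever model you have access to.
    """
    # Deferred imports: only needed when an LLM is actually used.
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_openai import ChatOpenAI

    prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer the riddle with a single lowercase word. "
                   "No articles, no punctuation, no explanation."),
        ("human", "{riddle}"),
    ])
    chain = prompt | ChatOpenAI(model=model_name, temperature=0)
    return lambda riddle: chain.invoke({"riddle": riddle}).content


def build_submission(df_test, answer_fn):
    """Return a copy of the test riddles with an added 'answer' column.

    `answer_fn` is any callable that takes a riddle string and returns
    an answer string; the result is trimmed and lowercased.
    """
    df_out = df_test[["riddle"]].copy()
    df_out["answer"] = [answer_fn(r).strip().lower() for r in df_out["riddle"]]
    return df_out


# Example with a stand-in answer function instead of a real LLM call.
sample = pd.DataFrame({"riddle": ["What gets wetter the more it dries?"]})
sample_submit = build_submission(sample, lambda riddle: " Towel ")

# With an API key configured, the real call might look like:
# df_submit = build_submission(df, make_llm_answerer())
```

Passing the answer function in as a parameter keeps the data frame assembly separate from the LLM call, so the prompt can be changed without touching the submission logic.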

In [29]:
# Submit
submit(source_file=file,data=[df_submit],course='t81-559',key=key,no=8)

Success: Submitted Assignment 8 (t81-559) for jtheaton:
You have submitted this assignment 6 times. (this is fine)
