<a href="https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/assignments/assignment_yourname_t81_559_class5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-559: Applications of Generative AI
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/index.html)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-559/).

**Module 5 Assignment: LangChain Data Extraction**

**Student Name: Your Name**

# Assignment Instructions

Use the LangChain `CommaSeparatedListOutputParser` to first find 10 programming languages. Then, for each language, find 10 features and produce a table similar to the one below.

| Language   | Feature 1                       | Feature 2                    | ...                           | Feature 10                |
|------------|---------------------------------|------------------------------|-------------------------------|---------------------------|
| Python     | Dynamic typing                  | Interpreted language          | ...                           | Object-oriented           |
| Java       | Object-oriented                 | Platform-independent          | ...                           | Robust exception handling |
| C++        | Object-oriented programming     | Strongly typed                | ...                           | Compile-time polymorphism |
| JavaScript | Dynamic typing                  | First-class functions         | ...                           | Extensive community support|
| Ruby       | Object-oriented                 | Dynamic typing                | ...                           | Cross-platform compatibility|
| Go         | Concurrency support             | Garbage collection            | ...                           | Fast compilation times    |
| Swift      | Type safety                     | Optionals                     | ...                           | Strong community support  |
| PHP        | Dynamic typing                  | Cross-platform compatibility  | ...                           | Wide range of frameworks and libraries |
| Rust       | Memory safety                   | Zero-cost abstractions        | ...                           | Excellent tooling         |
| Kotlin     | Null safety                     | Extension functions           | ...                           | Concise syntax            |
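
The two-step flow described above can be sketched without any API calls. This is a minimal illustration, not the assignment solution: `fake_llm` is a hypothetical stand-in that returns canned comma-separated text, `split_items` only approximates what a comma-separated list parser does, and the table is narrowed to 3 features for brevity.

```python
from typing import Dict, List

# Hypothetical canned replies standing in for a real LLM call.
CANNED = {
    "languages": "Python, Java, Go",
    "Python": "Dynamic typing, Interpreted, Object-oriented",
    "Java": "Object-oriented, Platform-independent",
    "Go": "Concurrency support, Garbage collection, Fast compilation, Static typing",
}


def fake_llm(topic: str) -> str:
    """Return a canned comma-separated reply (assumption: no network access)."""
    return CANNED[topic]


def split_items(reply: str) -> List[str]:
    """Split a comma-separated reply into stripped, non-empty items."""
    return [part.strip() for part in reply.split(",") if part.strip()]


def build_rows(n_features: int) -> List[Dict[str, str]]:
    """Step 1: list the languages. Step 2: list features for each one."""
    rows = []
    for lang in split_items(fake_llm("languages")):
        feats = split_items(fake_llm(lang))
        # Trim extra features, or pad with empty strings, to a fixed width.
        feats = (feats + [""] * n_features)[:n_features]
        row = {"Language": lang}
        for i, feat in enumerate(feats, start=1):
            row[f"Feature {i}"] = feat
        rows.append(row)
    return rows


rows = build_rows(n_features=3)
for row in rows:
    print(" | ".join(row.values()))
```

A list of dictionaries shaped like `rows` can be passed directly to `pd.DataFrame(rows)` to obtain the `Language`, `Feature 1`, ... columns shown in the table.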

# Google CoLab Instructions

If you are using Google CoLab, it will be necessary to mount your GDrive so that you can send your notebook during the submit process. Running the following code will map your GDrive to ```/content/drive```.

In [1]:
import os

try:
  from google.colab import drive, userdata
  drive.mount('/content/drive', force_remount=True)
  COLAB = True
  print("Note: using Google CoLab")
except ImportError:
  print("Note: not using Google CoLab")
  COLAB = False

# Assignment Submission Key - It was sent to you during the first week of class.
# If you are in both classes, this is the same key.
if COLAB:
  # For Colab, add to your "Secrets" (key icon at the left)
  key = userdata.get('T81_559_KEY')
else:
  # If not colab, enter your key here, or use an environment variable.
  # (this is only an example key, use yours)
  key = "Gx5en9cEVvaZnjhdaushddhuhhO4PsI32sgldAXj"

# OpenAI Secrets
if COLAB:
    os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

# Install needed libraries in CoLab
if COLAB:
    !pip install langchain langchain_openai

Mounted at /content/drive
Note: using Google CoLab
Collecting langchain_openai
  Downloading langchain_openai-0.3.6-py3-none-any.whl.metadata (2.3 kB)
Collecting tiktoken<1,>=0.7 (from langchain_openai)
  Downloading tiktoken-0.9.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Downloading langchain_openai-0.3.6-py3-none-any.whl (54 kB)
Downloading tiktoken-0.9.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Installing collected packages: tiktoken, langchain_openai
Successfully installed langchain_openai-0.3.6 tiktoken-0.9.0


# Assignment Submit Function

You will submit the 10 programming assignments electronically.  The following submit function can be used to do this.  My server will perform a basic check of each assignment and let you know if it sees any basic problems.

**It is unlikely that you will need to modify this function.**

In [2]:
import base64
import os
import numpy as np
import pandas as pd
import requests
import PIL
import PIL.Image
import io
from typing import List, Union

# This function submits an assignment. You can submit an assignment as many times as you like;
# only the final submission counts. The parameters are as follows:
# data - List of pandas dataframes or images.
# key - Your student key that was emailed to you.
# course - The course that you are in, currently t81-558 or t81-559.
# no - The assignment class number, should be 1 through 10.
# source_file - The full path to your Python or IPYNB file. Its name must contain "_class" followed
#               by your assignment number, for example "_class2" for class assignment #2.

def submit(
    data: List[Union[pd.DataFrame, PIL.Image.Image]],
    key: str,
    course: str,
    no: int,
    source_file: str = None
) -> None:
    if source_file is None and '__file__' not in globals():
        raise Exception("Must specify a filename when in a Jupyter notebook.")
    if source_file is None:
        source_file = __file__

    suffix = f'_class{no}'
    if suffix not in source_file:
        raise Exception(f"{suffix} must be part of the filename.")

    ext = os.path.splitext(source_file)[-1].lower()
    if ext not in ['.ipynb', '.py']:
        raise Exception(f"Source file is {ext}; must be .py or .ipynb")

    with open(source_file, "rb") as file:
        encoded_python = base64.b64encode(file.read()).decode('ascii')

    payload = []
    for item in data:
        if isinstance(item, PIL.Image.Image):
            buffered = io.BytesIO()
            item.save(buffered, format="PNG")
            payload.append({'PNG': base64.b64encode(buffered.getvalue()).decode('ascii')})
        elif isinstance(item, pd.DataFrame):
            payload.append({'CSV': base64.b64encode(item.to_csv(index=False).encode('ascii')).decode("ascii")})
        else:
            raise ValueError(f"Unsupported data type: {type(item)}")

    response = requests.post(
        "https://api.heatonresearch.com/wu/submit",
        headers={'x-api-key': key},
        json={
            'payload': payload,
            'assignment': no,
            'course': course,
            'ext': ext,
            'py': encoded_python
        }
    )

    if response.status_code == 200:
        print(f"Success: {response.text}")
    else:
        print(f"Failure: {response.text}")
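
Before calling `submit`, it can help to run the same checks locally. The sketch below mirrors the filename and encoding rules visible in the function above. `preflight` is a hypothetical helper, not part of the course code, and it takes the CSV text as a plain string so that it needs no third-party packages. It does not check that the file exists.

```python
import os
from typing import List


def preflight(source_file: str, no: int, csv_text: str) -> List[str]:
    """Return a list of problems that would make submit() raise."""
    problems = []

    # submit() requires "_class<no>" somewhere in the filename.
    suffix = f"_class{no}"
    if suffix not in source_file:
        problems.append(f"{suffix} must be part of the filename.")

    # submit() only accepts .py or .ipynb source files.
    ext = os.path.splitext(source_file)[-1].lower()
    if ext not in (".ipynb", ".py"):
        problems.append(f"Source file is {ext}; must be .py or .ipynb")

    # submit() encodes the CSV as ASCII, so non-ASCII text raises an error.
    try:
        csv_text.encode("ascii")
    except UnicodeEncodeError as err:
        bad_char = csv_text[err.start:err.end]
        problems.append(f"Non-ASCII character {bad_char!r} at offset {err.start}")

    return problems


ok = preflight(
    "assignment_yourname_t81_559_class5.ipynb", 5, "Language,Feature 1\nGo,Fast\n"
)
bad = preflight(
    "notes_class4.txt", 5, "Language,Feature 1\nC#,\u201cType-safe\u201d\n"
)
print(ok)
print(bad)
```

In a notebook, `csv_text` would typically be `df_submit.to_csv(index=False)`, which is the same string `submit` encodes. LLM replies sometimes contain curly quotes or other non-ASCII characters, so this check is worth running on generated text.
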

# Assignment #5 Sample Code

The following code provides a starting point for this assignment.

In [7]:
import os
import pandas as pd
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Ensure OpenAI API Key is set
openai_api_key = os.getenv("OPENAI_API_KEY")
if not openai_api_key:
    raise ValueError("OpenAI API Key is missing. Set it using os.environ['OPENAI_API_KEY'] = 'your_key_here'")

llm = ChatOpenAI(model="gpt-4", temperature=0.3, openai_api_key=openai_api_key)

# Step 1: Identify 10 Programming Languages
prompt_lang = PromptTemplate(
    input_variables=[],
    template="List 10 popular programming languages."
)

parser = CommaSeparatedListOutputParser()
chain_lang = prompt_lang | llm | parser

# Run the chain; the parser splits the reply into a Python list
languages = chain_lang.invoke({})

# Step 2: Retrieve 10 Features for Each Language
features_dict = {}

# Build the prompt and chain once; only the language changes per call
prompt_features = PromptTemplate(
    input_variables=["language"],
    template="List 10 key features of the {language} programming language."
)
chain_features = prompt_features | llm | parser

for lang in languages:
    features = chain_features.invoke({"language": lang})

    # Trim to 10 if too many, pad with empty strings if too few
    features = (features + [''] * 10)[:10]

    features_dict[lang] = features

# Step 3: Convert to DataFrame
df_submit = pd.DataFrame.from_dict(features_dict, orient='index', columns=[f"Feature {i+1}" for i in range(10)])
df_submit.insert(0, "Language", df_submit.index)
df_submit.reset_index(drop=True, inplace=True)

# Save to CSV
df_submit.to_csv("programming_languages_features.csv", index=False)

# Display DataFrame in Google Colab
from IPython.display import display
display(df_submit)

# Step 4: Submit Assignment
file = "/content/drive/My Drive/Colab Notebooks/assignment_SongyuhaoShi_t81_559_class5.ipynb"
submit(source_file=file, data=[df_submit], course='t81-559', key=key, no=5)


Unnamed: 0,Language,Feature 1,Feature 2,Feature 3,Feature 4,Feature 5,Feature 6,Feature 7,Feature 8,Feature 9,Feature 10
0,1. Python,1. Readability: Python has a clean,easy-to-read syntax that makes it a great lang...,2. High-Level Language: Python is a high-level...,meaning it abstracts the complexity of the com...,allowing developers to focus more on the funct...,3. Dynamically Typed: Python is dynamically typed,which means the type is checked during runtime,not in advance. This feature makes Python more...,4. Object-Oriented: Python supports object-ori...,5. Interpreted Language: Python is an interpre...
1,2. Java,1. Object-Oriented: Java is an object-oriented...,which means it represents data as objects and ...,2. Platform Independent: Java is designed to r...,making it highly portable. This is due to its ...,"run anywhere"" philosophy.",3. Multithreaded: Java supports multithreading,which allows multiple sequences of code to run...,improving the efficiency of programs.,4. Robust: Java is designed to eliminate error...,5. Secure: Java provides a secure environment ...
2,3. JavaScript,1. Interpreted Language: JavaScript is an inte...,meaning it doesn't need to be compiled by the ...,as it is processed directly in the browser.,2. Object-Oriented: JavaScript supports object...,rather than classes (as in other object-orient...,3. Client-Side Scripting: JavaScript is primar...,enabling dynamic content on client browsers wi...,4. Event-Driven: JavaScript is event-driven,meaning it can execute code in response to use...,mouse movements
3,4. C++,1. Object-Oriented: C++ supports the object-or...,which includes concepts of classes,objects,inheritance,polymorphism,encapsulation,and abstraction.,2. Low-Level Manipulation: C++ allows for dire...,giving the programmer a high degree of control...,3. Performance: C++ is known for its high perf...
4,5. C#,1. Object-Oriented: C# is a fully object-orien...,meaning it supports the concepts of encapsulation,inheritance,and polymorphism.,2. Type-Safe: C# is a type-safe language,which means that it prevents type errors. This...,3. Interoperability: C# provides seamless inte...,4. Automatic Garbage Collection: C# automatica...,freeing the programmer from the need to manual...,5. Exception Handling: C# provides robust erro...
5,6. PHP,1. Open Source: PHP is an open-source language,which means it is free to download,use,and distribute. This makes it very popular amo...,2. Cross-Platform: PHP is a cross-platform lan...,meaning it can run on various operating system...,Linux,Unix,Mac OS X,etc.
6,7. Swift,1. Safe: Swift is designed to be a safe language,which means it helps developers avoid common p...,2. Fast and Powerful: Swift is built with perf...,it also lives up to its name: as stated on app...,Swift is 2.6x faster than Objective-C and 8.4x...,3. Modern: Swift incorporates modern programmi...,such as closures,multiple return types,and namespaces. It also supports inferred type...,4. Interactive: Swift includes Playgrounds
7,8. Ruby,1. Object-Oriented: Everything in Ruby is an o...,including primitive data types like numbers an...,2. Dynamic Typing: Ruby is a dynamically typed...,which means the type of a variable is checked ...,3. Garbage Collection: Ruby has a built-in gar...,which automatically manages the memory used by...,freeing up memory that is no longer needed.,4. Mixins: Ruby supports mixins,which is a way to include functionality from o...,5. Blocks
8,9. Go,1. Simplicity: Go is designed to be simple and...,2. Static Typing: Go is a statically typed lan...,which means the type of a variable is checked ...,3. Concurrency: Go has built-in support for co...,4. Garbage Collection: Go has a garbage collec...,5. Standard Library: Go has a rich standard li...,from handling files and network connections to...,6. Performance: Go is compiled to machine code,which makes it run faster compared to interpre...,making it suitable for high-performance applic...
9,10. TypeScript,1. Static Typing: TypeScript supports static t...,which allows for checking type correctness at ...,2. Object-Oriented Programming: TypeScript sup...,interfaces,and inheritance,which makes it a great tool for large-scale ap...,3. Type Inference: TypeScript can infer types ...,4. Optional Static Typing: TypeScript also sup...,which means you can choose when to enforce typ...,5. ES6 Feature Support: TypeScript includes mo...


Success: Submitted Assignment 5 (t81-559) for s.songyuhao:
This is your first submission of this assignment.
Based on the provided STUDENT CODE and STUDENT CSV DATA, here's the evaluation:
Item 1: Yes. The student output had the correct CSV headers (Language, Feature 1, Feature 2, ..., Feature 10).
Item 2: Yes. The student output contains 10 programming languages and their features.
Item 3: Yes. The STUDENT CODE made use of LangChain CommaSeparatedListOutputParser.
Item 4: Yes. The STUDENT CODE calls LLM to get a list of programming languages, then features of each language.
You addressed all items (4/4).
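
The DataFrame output above shows two side effects of parsing free-form prose as a comma-separated list: items keep their list numbers (`1. Python`), and sentences are split at every comma. One way to reduce this, sketched here under assumptions, is to ask for short comma-separated phrases in the prompt and strip any leftover numbering afterwards. The prompt wording and the `clean_items` helper are illustrative and not part of the assignment. LangChain's parser also provides `get_format_instructions()`, which returns a similar request that can be added to a prompt.

```python
import re
from typing import List

# Illustrative prompt text (an assumption, not the course's wording).
FEATURE_PROMPT = (
    "List 10 key features of the {language} programming language. "
    "Reply with a single line of comma-separated phrases of at most "
    "four words each, with no numbering and no explanations."
)

# Matches leading list markers such as "1. ", "2) " or "- ".
_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+")


def clean_items(items: List[str]) -> List[str]:
    """Strip list numbering and surrounding whitespace; drop empty items."""
    cleaned = []
    for item in items:
        text = _MARKER.sub("", item).strip()
        if text:
            cleaned.append(text)
    return cleaned


raw = ["1. Python", "2. Java", " 3) C++", "- Go", "", "TypeScript"]
print(FEATURE_PROMPT.format(language="Go"))
print(clean_items(raw))
```

Cleaning after the fact only removes numbering. It cannot rejoin a sentence that was already split at a comma, which is why the prompt also asks for short phrases.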


In [None]:
import os
import pandas as pd
from scipy.stats import zscore
import string

# You must identify your source file.  (modify for your local setup)
file="/content/drive/My Drive/Colab Notebooks/assignment_yourname_t81_559_class5.ipynb"  # Google CoLab
# file='C:\\Users\\jeffh\\projects\\t81_559_deep_learning\\assignments\\assignment_yourname_t81_559_class5.ipynb'  # Windows
# file='/Users/jheaton/projects/t81_559_deep_learning/assignments/assignment_yourname_t81_559_class5.ipynb'  # Mac/Linux

# Begin assignment

from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

## ... continue your code...

## Submit assignment
submit(source_file=file,data=[df_submit],course='t81-559',key=key,no=5)