In [1]:
from notebook_helpers import mprint

# Prompt Templates

A prompt-template is a reusable way to build the prompt sent to a chat model for a pre-defined use-case.

# DocSearchTemplate

A document search template is an object that searches a `DocumentIndex` for a query, inserts the `n_docs` most relevant documents into the prompt, and adds wording that asks the model to answer the question using the information in those documents.

In [2]:
from llm_workflow.indexes import ChromaDocumentIndex
from llm_workflow.prompt_templates import DocSearchTemplate

doc_index = ChromaDocumentIndex()
prompt_template = DocSearchTemplate(doc_index=doc_index, n_docs=1)

Here's the default prompt-template used by `DocSearchTemplate`:

In [3]:
mprint(prompt_template.template)


Answer the question at the end of the text as truthfully and accurately as possible, based on the following information provided.

Here is the information:

```
{{documents}}
```

Here is the question:

```
{{prompt}}
```
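The template above has two placeholders, `{{documents}}` and `{{prompt}}`, which are filled in when the template is called. Here is a minimal sketch of that substitution, assuming plain string replacement. The helper `fill_template` and the shortened `TEMPLATE` are hypothetical and not part of `llm_workflow`; the library's actual mechanism may differ.

```python
# Hypothetical helper for illustration only; not the library's implementation.
TEMPLATE = """Here is the information:

{{documents}}

Here is the question:

{{prompt}}
"""


def fill_template(template: str, documents: list[str], prompt: str) -> str:
    """Replace the `{{documents}}` and `{{prompt}}` placeholders with real values."""
    joined = "\n\n".join(documents)  # separate multiple documents with a blank line
    return template.replace("{{documents}}", joined).replace("{{prompt}}", prompt)


filled = fill_template(
    TEMPLATE,
    documents=["The greatest three point shooter of all time is Steph Curry."],
    prompt="Who is the greatest 3-point shooter of all time?",
)
print(filled)
```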


Let's add documents to our document index.

**If we pass a list of documents to `doc_index`, the `__call__` method forwards the list to the `add()` method. If we pass a string or `Document` to `doc_index`, the `__call__` method forwards the value to the `search()` method.**
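This dispatch-by-type behaviour can be sketched with a toy class. Everything below is an assumption made for illustration: `Doc` and `ToyIndex` are hypothetical stand-ins, and the substring match replaces whatever similarity search the real index performs.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Doc:
    """Simplified, hypothetical stand-in for `llm_workflow.base.Document`."""
    content: str
    metadata: dict = field(default_factory=dict)


class ToyIndex:
    """Toy index showing type-based dispatch in `__call__`; not the real class."""

    def __init__(self) -> None:
        self._docs: list[Doc] = []

    def add(self, docs: list[Doc]) -> None:
        self._docs.extend(docs)

    def search(self, value: Union[str, Doc], n_results: int = 3) -> list[Doc]:
        query = value.content if isinstance(value, Doc) else value
        # A naive case-insensitive substring match stands in for similarity search.
        hits = [d for d in self._docs if query.lower() in d.content.lower()]
        return hits[:n_results]

    def __call__(self, value, n_results: int = 3):
        if isinstance(value, list):
            return self.add(value)  # list of documents -> add()
        if isinstance(value, (str, Doc)):
            return self.search(value, n_results=n_results)  # str or Doc -> search()
        raise TypeError(f"Unsupported type: {type(value).__name__}")


index = ToyIndex()
index([Doc("Wayne Gretzky played hockey."), Doc("Steph Curry shoots threes.")])
results = index("curry", n_results=1)
print(results)
```

Calling the object with a list returns nothing and only stores the documents, while calling it with a string returns the matching documents.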

In [4]:
from llm_workflow.base import Document

docs = [
    Document(
        content="The greatest basketball player of all time is Michael Jordan",
        metadata={'id': 1}
    ),
    Document(
        content="The greatest three point shooter of all time is Steph Curry.",
        metadata={'id': 0}
    ),
    Document(
        content="The greatest hockey player of all time is Wayne Gretzky.",
        metadata={'id': 2}
    ),
]
# passing list[Document] is equivalent to calling `doc_index.add(docs)`
doc_index(docs)

In [5]:
# passing a string (or Document) is equivalent to calling `doc_index.search(value)`
doc_index("Who is the greatest 3-point shooter of all time?", n_results=1)

[Document(content='The greatest three point shooter of all time is Steph Curry.', metadata={'id': 0, 'distance': 0.35710838437080383})]

Now, let's construct our prompt. The `DocSearchTemplate` object will retrieve the most relevant document (from the `ChromaDocumentIndex` object) based on the value we send it, and then inject that document into the prompt. Because we set `n_docs=1` above, it will only include one Document.

In [6]:
prompt = prompt_template("Who is the greatest 3-point shooter of all time?")
mprint(prompt)


Answer the question at the end of the text as truthfully and accurately as possible, based on the following information provided.

Here is the information:

```
The greatest three point shooter of all time is Steph Curry.
```

Here is the question:

```
Who is the greatest 3-point shooter of all time?
```
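To make the "most relevant document" step concrete, here is a rough sketch of ranking documents against a query and keeping the top `n_docs`. It assumes a simple word-overlap (Jaccard) score; the real index appears to rank by embedding distance (the `distance` value in the search output above), so treat this as an illustration of the selection step only. The function names are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_n_docs(query: str, docs: list[str], n_docs: int = 1) -> list[str]:
    """Rank documents by Jaccard similarity to the query; return the best `n_docs`."""
    q = tokenize(query)

    def score(doc: str) -> float:
        d = tokenize(doc)
        union = q | d
        return len(q & d) / len(union) if union else 0.0

    return sorted(docs, key=score, reverse=True)[:n_docs]


docs = [
    "The greatest basketball player of all time is Michael Jordan",
    "The greatest three point shooter of all time is Steph Curry.",
    "The greatest hockey player of all time is Wayne Gretzky.",
]
best = top_n_docs("Who is the greatest 3-point shooter of all time?", docs, n_docs=1)
print(best)
```

With `n_docs=1` only the Steph Curry sentence is kept, which matches the single document injected into the prompt above.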


---

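# PythonObjectMetadataTemplate

`PythonObjectMetadataTemplate` inserts metadata about Python objects (here, a pandas DataFrame) into the prompt so that the model can tailor its answer to those objects. In the workflow below, the prompt built by the template is passed to `OpenAIChat`.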

In [10]:
from llm_workflow.base import Workflow
from llm_workflow.openai import OpenAIChat
from llm_workflow.prompt_templates import PythonObjectMetadataTemplate, MetadataMetadata
import pandas as pd
from notebook_helpers import mprint

my_credit_df = pd.read_csv('/code/tests/test_data/data/credit.csv')

workflow = Workflow(tasks=[
    PythonObjectMetadataTemplate(metadatas=[
        MetadataMetadata(obj=my_credit_df, object_name='my_credit_df'),
    ]),
    OpenAIChat(),
])

response = workflow("Create a graph using plotly express of checking account balance and the duration of the loan.?")
mprint(response)

To create a graph using plotly express of checking account balance and the duration of the loan, you can use the `plotly.express` library in Python. You can use the `scatter` function to create a scatter plot with the checking account balance on the x-axis and the duration of the loan on the y-axis.

Here's an example of how you can achieve this:

```python
import plotly.express as px

# Assuming my_credit_df is the DataFrame containing the data
fig = px.scatter(my_credit_df, x='checking_balance', y='months_loan_duration', title='Checking Account Balance vs. Loan Duration')
fig.show()
```

This code will create a scatter plot with the checking account balance on the x-axis and the duration of the loan on the y-axis, using the data from the `my_credit_df` DataFrame.

In [15]:
mprint(workflow.history()[0].prompt)


Answer the question at the end of the text as truthfully and accurately as possible. Use the metadata of the python objects as appropriate. Tailor your response according to the most relevant objects. Don't use the if they don't appear relevant.

Here is the metadata:

```
A pd.DataFrame `my_credit_df` that contains the following columns with the following types of values:

`checking_balance`: [<class 'str'>]
`months_loan_duration`: [<class 'int'>]
`credit_history`: [<class 'str'>]
`purpose`: [<class 'str'>]
`amount`: [<class 'int'>]
`savings_balance`: [<class 'str'>]
`employment_duration`: [<class 'str'>]
`percent_of_income`: [<class 'int'>]
`years_at_residence`: [<class 'int'>]
`age`: [<class 'int'>]
`other_credit`: [<class 'str'>]
`housing`: [<class 'str'>]
`existing_loans_count`: [<class 'int'>]
`job`: [<class 'str'>]
`dependents`: [<class 'int'>]
`phone`: [<class 'str'>]
`default`: [<class 'str'>]

The following numeric columns contain the following summary statistics:

                       count      mean          std  ...     50%      75%      max
months_loan_duration  1000.0    20.903    12.058814  ...    18.0    24.00     72.0
amount                1000.0  3271.258  2822.736876  ...  2319.5  3972.25  18424.0
percent_of_income     1000.0     2.973     1.118715  ...     3.0     4.00      4.0
years_at_residence    1000.0     2.845     1.103718  ...     3.0     4.00      4.0
age                   1000.0    35.546    11.375469  ...    33.0    42.00     75.0
existing_loans_count  1000.0     1.407     0.577654  ...     1.0     2.00      4.0
dependents            1000.0     1.155     0.362086  ...     1.0     1.00      2.0

[7 rows x 8 columns]

The following non-numeric columns contain the following unique values and corresponding value counts:

`checking_balance`: {'unknown': 394, '< 0 DM': 274, '1 - 200 DM': 269, '> 200 DM': 63}
`credit_history`: {'good': 530, 'critical': 293, 'poor': 88, 'very good': 49, 'perfect': 40}
`purpose`: {'furniture/appliances': 473, 'car': 337, 'business': 97, 'education': 59, 'renovations': 22, 'car0': 12}
`savings_balance`: {'< 100 DM': 603, 'unknown': 183, '100 - 500 DM': 103, '500 - 1000 DM': 63, '> 1000 DM': 48}
`employment_duration`: {'1 - 4 years': 339, '> 7 years': 253, '4 - 7 years': 174, '< 1 year': 172, 'unemployed': 62}
`other_credit`: {'none': 814, 'bank': 139, 'store': 47}
`housing`: {'own': 713, 'rent': 179, 'other': 108}
`job`: {'skilled': 630, 'unskilled': 200, 'management': 148, 'unemployed': 22}
`phone`: {'no': 596, 'yes': 404}
`default`: {'no': 700, 'yes': 300}

```

----

Here is the question:

```
Create a graph using plotly express of checking account balance and the duration of the loan.?
```
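The metadata block in the prompt above lists the Python types found in each column and, for non-numeric columns, the value counts. Here is a rough sketch of how such a summary could be assembled. It assumes plain Python rows (a list of dicts) rather than a `pd.DataFrame`, it omits the numeric summary statistics, and the function name `describe_rows` is hypothetical. This is not the library's implementation, and the three rows are toy values invented for the example.

```python
from collections import Counter


def describe_rows(name: str, rows: list[dict]) -> str:
    """Build a text summary of column value types and non-numeric value counts."""
    columns = list(rows[0]) if rows else []
    lines = [
        f"A table `{name}` that contains the following columns "
        "with the following types of values:",
        "",
    ]
    for col in columns:
        # Collect the distinct Python type names seen in this column.
        types = sorted({type(row[col]).__name__ for row in rows})
        lines.append(f"`{col}`: {types}")
    lines += [
        "",
        "The following non-numeric columns contain the following "
        "unique values and corresponding value counts:",
        "",
    ]
    for col in columns:
        values = [row[col] for row in rows]
        if all(isinstance(v, str) for v in values):
            # most_common() orders values from most to least frequent.
            lines.append(f"`{col}`: {dict(Counter(values).most_common())}")
    return "\n".join(lines)


# Toy rows invented for illustration; they are not taken from credit.csv.
rows = [
    {"housing": "own", "age": 67, "default": "no"},
    {"housing": "rent", "age": 22, "default": "yes"},
    {"housing": "own", "age": 49, "default": "no"},
]
summary = describe_rows("my_credit_df", rows)
print(summary)
```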


---

In [19]:
from llm_workflow.base import Workflow
from llm_workflow.openai import OpenAIChat
from llm_workflow.prompt_templates import PythonObjectMetadataTemplate, MetadataMetadata
import pandas as pd
from notebook_helpers import mprint

my_credit_df = pd.read_csv('/code/tests/test_data/data/credit.csv')

workflow = Workflow(tasks=[
    PythonObjectMetadataTemplate(metadatas=[
        MetadataMetadata(obj=my_credit_df, object_name='my_credit_df'),
    ]),
    OpenAIChat(),
])

response = workflow("Which columns are going to be the best predictors of `default`? Show me the code for logistic regression using statsmode. One hot encode any non-numeric columns and print the summary of coefficients.")
mprint(response)

Based on the metadata provided, the columns that are likely to be the best predictors of `default` are `checking_balance`, `credit_history`, `purpose`, `savings_balance`, `employment_duration`, `other_credit`, `housing`, `job`, and `phone`.

Here's the code for logistic regression using statsmodels to analyze the predictors of `default`:

```python
import pandas as pd
import statsmodels.api as sm
from sklearn.preprocessing import OneHotEncoder

# One hot encode non-numeric columns
non_numeric_columns = ['checking_balance', 'credit_history', 'purpose', 'savings_balance', 'employment_duration', 'other_credit', 'housing', 'job', 'phone']
encoded_df = pd.get_dummies(my_credit_df, columns=non_numeric_columns, drop_first=True)

# Define the independent variables (predictors) and the dependent variable
X = encoded_df.drop('default', axis=1)
y = encoded_df['default']

# Add a constant to the independent variables
X = sm.add_constant(X)

# Fit logistic regression model
logit_model = sm.Logit(y, X)
result = logit_model.fit()

# Print the summary of coefficients
print(result.summary())
```

This code will perform logistic regression using statsmodels, one hot encode the non-numeric columns, and print the summary of coefficients for the logistic regression model.

---