# Welcome to our playground!

It's wonderful to be here
It's certainly a thrill
You're such a lovely audience
We'd like to take you home with us
We'd love to take you home!!

## Now let's first see if we can connect to the bank!

To do so, we need to install some packages and set up the SQL extension for Jupyter. We will also set some variables required by the code.

***Nota Bene - This might take some time and there will be no output! Once a number appears between the brackets [], the command is done!***

In [1]:
# Installing some python packages to talk SQL
! pip install -q ipython-sql sqlalchemy psycopg2-binary

# Load the ipython-sql extension so the %sql magic is available
%load_ext sql

import requests

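# Ask the metadata server which project and zone this notebook runs in
project_id = requests.get("http://metadata/computeMetadata/v1/project/project-id", headers={'Metadata-Flavor': 'Google'}).text
full_zone_string = requests.get("http://metadata/computeMetadata/v1/instance/zone", headers={'Metadata-Flavor': 'Google'}).text
# The zone comes back as "projects/<project-number>/zones/<zone>"; keep the zone, then trim its suffix to get the region
zone_name = full_zone_string.split("/")[3]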
region = zone_name[:-2]
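
The zone lookup returns a full resource path rather than a bare zone name, which is why the cell above splits the string before trimming it. A minimal sketch of that parsing on a made-up zone string (the project number and zone below are illustrative, and `zone_to_region` is a hypothetical helper, not part of the notebook):

```python
def zone_to_region(full_zone_string: str) -> tuple:
    """Return (zone, region) for a string like 'projects/<number>/zones/<zone>'."""
    # The zone name is the last path segment
    zone = full_zone_string.split("/")[-1]
    # A zone is its region plus a short suffix such as "-a"; drop the suffix
    region = zone.rsplit("-", 1)[0]
    return zone, region


# Made-up example value, not read from a real metadata server
example_zone, example_region = zone_to_region("projects/123456789/zones/europe-west4-a")
print(example_zone, example_region)
```

Splitting on the last `-` instead of cutting two characters gives the same result for single-letter zone suffixes.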

We can jump right into it now - adjust the following cell with the password you created for the postgres user!

In [None]:
%sql postgresql://postgres:kcg6a2jg@192.168.142.72:5432/postgres

No news is good news! But to be sure, let's run a query:

In [None]:
%sql SELECT * FROM pg_catalog.pg_user

If all went according to plan you will see a table with the results of the SQL query. Feel free to play around with this and query other stuff if you want!

## So we can connect to the data, but where is the AI goodness? 

Right here! But we will start with something simple - let's see if we can predict the next transaction for a certain user! Let's make a dataset out of the transactions database!

In [None]:
# Import SQLAlchemy & Pandas
import pandas as pd
from sqlalchemy import create_engine
# Define the engine to use
engine = create_engine("postgresql://postgres:kcg6a2jg@192.168.142.72:5432/transactions-db")
table_name = 'transactions'

# Capture the table into a dataframe!

table_df = pd.read_sql_table(
    table_name,
    con=engine
)
# Give the account columns friendlier names, then store the dataframe as a csv file

table_df.rename(columns={'from_acct':'from account','to_acct':'to account'}, inplace=True)
table_df.to_csv("all_transactions.csv")

# Split the data into data to test the model (the first 10 rows) and data to train the model (the rest)

test_df = table_df.iloc[:10,:]
train_df = table_df.iloc[10:,:]

test_df.to_csv("test_transactions.csv")
train_df.to_csv("train_transactions.csv")
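
The cell above always uses the first 10 rows as test data. If the table happens to be ordered (by date or by account, say), a shuffled split may give a fairer test set. A small sketch, assuming a fixed seed for repeatability; `split_indices` is a hypothetical helper and not part of the notebook:

```python
import random


def split_indices(n_rows: int, n_test: int, seed: int = 42):
    """Shuffle row positions and return (test_positions, train_positions)."""
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)  # seeded, so the split is repeatable
    return sorted(positions[:n_test]), sorted(positions[n_test:])


test_idx, train_idx = split_indices(n_rows=66, n_test=10)
print(len(test_idx), len(train_idx))  # 10 56

# In the notebook this could be applied as:
#   test_df = table_df.iloc[test_idx]
#   train_df = table_df.iloc[train_idx]
```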


## AutoGluon

Like we mentioned, we are starting with something small, using AutoGluon. AutoGluon will allow you to easily train a model based on a small amount of data (we have only 66 transactions, unless you went crazy and added a lot more previously).

Let's first install AutoGluon.

***Nota Bene - This might take some time and there will be no output! Once a number appears between the brackets [], the command is done!***

In [None]:
# Install and Import autogluon
! pip install autogluon -q
from autogluon.tabular import TabularDataset, TabularPredictor

We've split the transactions from our table into test and training data and will now start to train a model! The model we are training will be used to predict how much money (amount) will be moved around in the next transaction, based on the other characteristics of the transaction. Isn't that exciting?

In [None]:
train_data = TabularDataset('train_transactions.csv')
predictor = TabularPredictor(label='amount').fit(train_data)

In [None]:
# Optional - evaluate the model
test_data = TabularDataset('test_transactions.csv')
predictor.evaluate(test_data, silent=True)

Now that we have a model and have evaluated it (not the most amazing stats, I know...), we can do a prediction. We'll use the test data, but feel free to create your own test set!

In [None]:
y_pred = predictor.predict(test_data.drop(columns=['amount']))
print(y_pred)

Well, it made some predictions, but if you compare the values it predicted to the originals in the test data, well... kinda meh. Let's see if we can improve this by doing something fancier!
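
To put a number on "kinda meh", you can compare the predictions with the real amounts yourself. A minimal sketch of two common error measures, mean absolute error and root mean squared error; the amounts below are made up for illustration and are not taken from the bank's data:

```python
import math


def mean_absolute_error(actual, predicted):
    """Average size of the miss, in the same unit as the amounts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Like the mean absolute error, but large misses weigh more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up amounts, for illustration only
actual_amounts = [100.0, 250.0, 40.0, 75.0]
predicted_amounts = [110.0, 230.0, 50.0, 75.0]

print(mean_absolute_error(actual_amounts, predicted_amounts))
print(root_mean_squared_error(actual_amounts, predicted_amounts))

# In the notebook this could be applied as:
#   mean_absolute_error(list(test_data['amount']), list(y_pred))
```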

## LLMs

By now we hope we don't have to explain to you what an LLM is - but if we do, we'd suggest you ask ChatGPT!

Normally you'd turn to an LLM for anything related to language; however, they aren't half bad at working with numbers and structured data either.

### Chatting with Bison. Or is it Bisons..

Let's set up a connection to our PaLM 2 text model, called text-bison, where bison says something about the size of the model. PaLM 2 is the name of the model family (like GPT 1/2/3/4) that powers both Vertex AI Search and Conversation and, for instance, Bard.

Because we want to know things about our transactions we will use the transactions file as the context when asking the LLM questions. This way it will know how to react and answer.
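
In practice, using the file as context means pasting the transaction rows into the prompt right after the instruction. A sketch of that idea, with made-up rows standing in for the real file; `build_prompt` is a hypothetical helper, not something the notebook defines:

```python
def build_prompt(command: str, row_texts) -> str:
    """Put the instruction first, then one blank-line-separated block per row."""
    return command + "\n\n" + "\n\n".join(row_texts) + "\n"


# Made-up rows; once the CSV has been loaded further down, you could pass
# [doc.page_content for doc in data] instead
example_rows = [
    "from account: 1011226111\nto account: 1033623433\namount: 250",
    "from account: 1033623433\nto account: 1055757655\namount: 42",
]
example_prompt = build_prompt("Predict the amount of the next transaction.", example_rows)
print(example_prompt)
```

Sending only the row text keeps the prompt shorter than formatting the whole document list into it, which also carries each document's metadata.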

***Nota Bene - This might take some time and there will be no output! Once a number appears between the brackets [], the command is done!***

In [2]:
#  Stuff we need to import
# 
! pip install langchain -q
import time
import io
from typing import List
from os import listdir
import numpy as np
import math

# Langchain
import langchain
from langchain.document_loaders.csv_loader import CSVLoader

# Vertex AI
from google.cloud import aiplatform


In [3]:
# Now where would the fun be if we could not just import the csv file...we load it and add each row as a separate document (something for the LLM to read!)
loader = CSVLoader(file_path='all_transactions.csv', encoding="utf-8", csv_args={'delimiter': ','})
data = loader.load()
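
As far as we understand the loader, each row ends up as one document whose text is a list of `column: value` lines. The sketch below imitates that with the standard library so you can see roughly what the LLM will be reading; it is not LangChain's actual code, and the sample rows are made up:

```python
import csv
import io

# Made-up sample in the shape of all_transactions.csv (only three columns shown)
sample_csv = (
    "from account,to account,amount\n"
    "1011226111,1033623433,250\n"
    "1033623433,1055757655,42\n"
)


def rows_as_documents(csv_text: str):
    """Turn every CSV row into one 'column: value' text block."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ["\n".join(f"{col}: {val}" for col, val in row.items()) for row in reader]


documents = rows_as_documents(sample_csv)
print(documents[0])
```

To see the real thing in the notebook, print `data[0].page_content` after the cell above has run.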

Now below we will define how to connect to the Google LLM. **This will produce no output**

In [4]:
import vertexai
from vertexai.language_models import TextGenerationModel

parameters = {
    "temperature": 1,
    "max_output_tokens": 256,
    "top_p": 0.8,
    "top_k": 40
}
   
def predict_text(prompt, **parameters):
    vertexai.init(project=project_id, location=region)
    model = TextGenerationModel.from_pretrained("text-bison-32k")
    prompt_response = model.predict(prompt,**parameters)
    return prompt_response.text

command = "Below are some rows with example transactions. Predict the amount and the to account for the next transaction."
text = data
prompt = """
{command}
{text}
""".format(text=text, command=command)

### The Season Finale

And the moment has arrived: run the following cells to make a prediction! If you have the time, feel free to experiment with the settings in the cell above this one. Some ideas:

- See what changing top_p and top_k to other values does
- Change the command, maybe you can enable it to make a better prediction? Or completely change the prompt and let it do something else, like create more test data?
- Maybe you can ask it to explain why it made the prediction?
- Always be nice to your LLM!
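
If you want to try several settings in one go, you could generate the combinations up front and loop over them. A small sketch; the values chosen for `top_p` and `top_k` are arbitrary examples, and the call to `predict_text` is left as a comment because it needs the Vertex AI connection from the notebook:

```python
from itertools import product

base_parameters = {"temperature": 1, "max_output_tokens": 256, "top_p": 0.8, "top_k": 40}

# Every combination of the example values, layered on top of the base settings
variants = [
    dict(base_parameters, top_p=top_p, top_k=top_k)
    for top_p, top_k in product((0.5, 0.8, 0.95), (10, 40))
]

for params in variants:
    print(params["top_p"], params["top_k"])
    # In the notebook: print(predict_text(prompt, **params))
```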

In [None]:
# Pass the settings along explicitly, otherwise the model's defaults are used
print(predict_text(prompt, **parameters))