# AI / ML Easy Button 
## *Basics using Review Data*
Welcome to the AI/ML Easy Button. This notebook lets you explore Snowflake's out-of-the-box LLM functions and see how they work with custom prompts and real data, in both SQL and Python.  
[LLM Docs](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions)

![](https://www.fatherhood.org/hs-fs/hubfs/Images/Blog/easy-button.png?width=585&name=easy-button.png)

In [None]:
# !!! Add packages in the upper right under Packages: this code needs 'snowflake-ml-python'
# Import python packages
import streamlit as st
import pandas as pd
import json

# We can also use Snowpark for our analyses!
from snowflake.snowpark import functions as F
from snowflake.cortex import Complete, Sentiment, Summarize, Translate
from snowflake.snowpark.context import get_active_session
session = get_active_session()


In [None]:
-- First, create your own schema to work in (replace MY_NAME with your name)
create schema if not exists MY_NAME;
use schema MY_NAME;
-- The current schema shown here should be your name
select current_database(), current_schema();

### Translate  
https://docs.snowflake.com/en/sql-reference/functions/translate-snowflake-cortex

In [None]:
-- TRANSLATE
-- Translate review text

-- Call the built-in SQL function with the text, the source language and the target language
select
    snowflake.cortex.translate(
        'NBA 2K23 oferece conteúdo divertido adicionando novos modos à série, além do modo Challenge de mais de 10 horas, também é um prazer jogar estações da NBA em diferentes épocas.',
        'pt',
        'en');

In [None]:
-- Translate from any language to English: leave the source language empty ('') to let Cortex detect it
select
    snowflake.cortex.translate(
        '마이크로 트랜잭션이 매우 공격적임에도 불구하고 모든 스포츠 게임 팬에게 NBA 2K23를 안전하게 추천할 수 있습니다. 게임 플레이가 너무 좋고 향후 업데이트에서 문제가 해결되지 않기를 바랍니다.',
        '',
        'en');

In [None]:
# Python is easy too: call the same function directly from Python
Translate(
    "Detta är det bästa videospelet någonsin!",
    "",
    "en"
)

In [None]:
-- Look at the non-English reviews
select * from LAB_DATA.PUBLIC.REVIEWS where IS_ENGLISH = 0;

In [None]:
-- We can run this against millions of records easily, efficiently and securely
select
    REVIEW_TEXT,
    snowflake.cortex.translate(
        REVIEW_TEXT,
        '',
        'en'
    ) as review_in_english
from
    LAB_DATA.PUBLIC.REVIEWS
    where IS_ENGLISH = 0;

In [None]:
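# Cell results can be turned into Snowpark DataFrames... or pandas ones too.
# `sample_reviews` is the name of the SQL cell above; rename that cell (or this reference) so they match.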
sp_df = sample_reviews.to_df()
sp_df.show(3)
pd_df = sample_reviews.to_pandas()
pd_df.head(3)

### Sentiment 
https://docs.snowflake.com/en/sql-reference/functions/sentiment-snowflake-cortex

In [None]:
select
    REVIEW_TEXT,
    snowflake.cortex.sentiment(
        -- chained together: translate to English first, then score the sentiment
        snowflake.cortex.translate(REVIEW_TEXT, '', 'en')
    ) as review_sentiment
from
    LAB_DATA.PUBLIC.REVIEWS;
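SENTIMENT returns a floating-point score from -1 (most negative) to 1 (most positive). Once the results are in pandas (for example via `to_pandas()`), a small helper can bucket the scores into labels. This is a minimal sketch: the `label_sentiment` helper and the ±0.3 cutoffs are hypothetical choices, not Cortex defaults, and the scores below are made-up values.

```python
import pandas as pd

# Hypothetical cutoffs: Cortex does not define these, so tune them for your data
NEGATIVE_CUTOFF = -0.3
POSITIVE_CUTOFF = 0.3


def label_sentiment(score: float) -> str:
    """Map a SENTIMENT score in [-1, 1] to a coarse label."""
    if score <= NEGATIVE_CUTOFF:
        return "negative"
    if score >= POSITIVE_CUTOFF:
        return "positive"
    return "neutral"


# Made-up scores standing in for the REVIEW_SENTIMENT column
scores = pd.Series([0.82, -0.65, 0.05], name="REVIEW_SENTIMENT")
labels = scores.apply(label_sentiment)
print(labels.tolist())  # ['positive', 'negative', 'neutral']
```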

### Summarize  
https://docs.snowflake.com/en/sql-reference/functions/summarize-snowflake-cortex

In [None]:
select
    snowflake.cortex.summarize('WWE 2K24 is clear proof that 2K and Visual Concepts have certainly hit their stride when it comes to making today’s package better on the whole than yesterday’s. While not drastically different, there are enough new little features added to every inch of this iteration that make it well worth climbing back into the ring. Old enemies still have their number, though, such as making its docuseries Showcase mode feel good to actually play, creating a more consistent tone and pacing in MyRise, and getting more of the current day roster up to the high visual bar that’s currently only hit by its most popular superstars. But even with those lingering annoyances, WWE 2K24 puts an impressive cap on a three year run of great wrestling games. ');

### COMPLETE  
https://docs.snowflake.com/en/sql-reference/functions/complete-snowflake-cortex

In [None]:
select
    REVIEW_TEXT,
    snowflake.cortex.complete(
        'llama3-8b',
        'Based on the following game review, will the reviewer recommend the game to '
        || 'their friends and family? Answer with only one of the following words: '
        || '"Likely", "Unlikely" or "Unsure". Do not include any additional text.\n'
        || 'Review: ' || REVIEW_TEXT
    ) as RECOMMENDATION
from LAB_DATA.PUBLIC.REVIEWS
limit 5;
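Even with a strict prompt, a model can return extra whitespace, punctuation, different casing or a short sentence instead of a single word. A minimal sketch of post-processing those answers in Python; the `normalize_label` helper and its fallback to "Unsure" are assumptions for illustration, not part of Cortex.

```python
ALLOWED_LABELS = ("Likely", "Unlikely", "Unsure")


def normalize_label(raw: str) -> str:
    """Map a free-form model answer onto one of ALLOWED_LABELS.

    Falls back to "Unsure" when no allowed label can be found.
    """
    # Drop surrounding whitespace, quotes and trailing punctuation, then compare case-insensitively
    cleaned = raw.strip().strip("\"'.!").lower()
    for label in ALLOWED_LABELS:
        if cleaned == label.lower():
            return label
    # Otherwise look for a label inside the text; check longer labels first
    # so that "unlikely" is not mistaken for "likely"
    for label in sorted(ALLOWED_LABELS, key=len, reverse=True):
        if label.lower() in cleaned:
            return label
    return "Unsure"


for answer in [" Likely.", '"Unlikely"', "Answer: Likely", "No idea"]:
    print(repr(answer), "->", normalize_label(answer))
```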

In [None]:
-- Need to leverage an LLM? That is as easy as a SELECT too. Models are hosted and managed by Snowflake in a secure, scalable environment; just call the built-in function.
select
    snowflake.cortex.complete(
        'mistral-large',
        'What are Take-Two\'s three most popular games? Return the answer as a concise JSON array of objects with NAME and RANK.'
    );

In [None]:
# Similarly, we can call this easily from Python
top_titles = Complete(
    "mistral-large",
    "What are Take-Two's three most popular games? Return only a concise JSON array "
    "of objects with NAME and RANK, with no other text."
)
# Parse the JSON string returned by the model into Python objects
top_titles_json = json.loads(top_titles)
for title in top_titles_json:
    # RANK may come back as a number, so format with an f-string instead of "+"
    print(f"{title['RANK']} {title['NAME']}")
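`json.loads` fails if the model wraps its answer in a Markdown code fence or adds a sentence around it. A minimal sketch of a more forgiving parser, assuming the response contains a single JSON array; `extract_json_array` is a hypothetical helper and the response string is made up.

```python
import json


def extract_json_array(text: str) -> list:
    """Return the JSON array found in an LLM response.

    Tolerates Markdown code fences or prose around the array.
    Raises ValueError (json.JSONDecodeError is a subclass) if nothing parses.
    """
    start = text.find("[")
    end = text.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in response")
    return json.loads(text[start : end + 1])


fence = "`" * 3  # a Markdown code fence, built here so it does not end this example
response = (
    f"Here you go:\n{fence}json\n"
    '[{"NAME": "Game A", "RANK": 1}, {"NAME": "Game B", "RANK": 2}]\n'
    f"{fence}"
)
titles = extract_json_array(response)
for title in titles:
    print(f"{title['RANK']} {title['NAME']}")
```

Slicing from the first `[` to the last `]` is a simple heuristic; if the model returns several arrays or stray brackets, validate the parsed result before using it.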

In [None]:
select * from LAB_DATA.PUBLIC.GAMES limit 10

In [None]:
---SQL to run the LLM over a table
select
    title,
    summary,
    snowflake.cortex.complete(
        'llama3-8b',
        CONCAT(
            'You\'re a game critic. I really like role-playing games but I don\'t like violence. Based on the summary between <summary> and </summary>, can you tell me if 
            this would be a good game for me? <summary>', summary, '</summary>'
        )
    ) as answer
from
    LAB_DATA.PUBLIC.VW_GAMES
where developer = 'Rockstar Games'
limit 5;

In [None]:
# Python: run the LLM over a table with Snowpark
df = session.table("LAB_DATA.PUBLIC.VW_GAMES").filter(F.col('DEVELOPER') == 'Rockstar Games').select(
    F.col("title"),
    Complete(
        "llama3-8b",
        F.concat(
            F.lit("""Based on the value between the <context> and </context> tags, please provide me with a list
            of 3 similar games. Return the result as only a JSON array. Be concise.
            <context>"""),
            F.col("summary"),
            F.lit("</context>"))
    ).alias("similar_titles")
).limit(5)
df.show()

In [None]:
# Easily save the results to a table, from SQL or Python
df.write.mode("overwrite").save_as_table("GAMES_SIMILAR")

In [None]:
select * from GAMES_SIMILAR limit 5;

### EMBED_TEXT
https://docs.snowflake.com/sql-reference/functions/embed_text-snowflake-cortex

In [None]:
select
    snowflake.cortex.embed_text_768(
        'snowflake-arctic-embed-m',
        'I love video games'
    );

In [None]:
select
    summary,
    snowflake.cortex.embed_text_768(
        'snowflake-arctic-embed-m',
        summary
    ) as summary_embedding
from
    LAB_DATA.PUBLIC.GAMES
limit 10;

### VECTOR DISTANCE CALCULATIONS  
https://docs.snowflake.com/en/sql-reference/functions/vector_cosine_similarity

In [None]:
select
    vector_cosine_similarity(
        snowflake.cortex.embed_text_768('snowflake-arctic-embed-m', 'California Contemporary style'),
        snowflake.cortex.embed_text_768('snowflake-arctic-embed-m', 'California Contemporary style homes')
    );
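VECTOR_COSINE_SIMILARITY scores how closely two embedding vectors point in the same direction: their dot product divided by the product of their lengths, giving a value from -1 to 1. The plain-Python version below is only a sketch of that standard formula to show what is being computed, not Snowflake's implementation; the toy 3-dimensional vectors stand in for 768-dimensional embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity dot(a, b) / (|a| * |b|) for two non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0 (identical)
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0 (unrelated)
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # approximately 1.0 (same direction)
```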

In [None]:
-- Embed every game summary with Snowflake's own state-of-the-art text embedding model
create or replace table GAMES_EMBED as select *,
    snowflake.cortex.embed_text_768(
        'snowflake-arctic-embed-m',
        summary
    ) as summary_embedding
from
    LAB_DATA.PUBLIC.GAMES;

In [None]:
-- Use cosine similarity to find titles similar to the one searched
select
    b.title SIMILAR_TITLES, a.title TITLE_SEARCHED,
    vector_cosine_similarity(
        a.summary_embedding,
        b.summary_embedding
    ) as similarity
from
    GAMES_EMBED a
    cross join GAMES_EMBED b
where
    a.title = 'NBA 2K23'
    and b.title <> a.title -- skip the searched title itself
order by
    similarity desc
limit 20;
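The cross join above is a brute-force nearest-neighbor search: the searched title's embedding is compared with every row's embedding and the results are sorted by similarity. A minimal in-memory sketch of the same idea, skipping the searched title itself (which would otherwise match with a similarity of 1); `most_similar` and the toy embeddings are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|) for two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def most_similar(
    query_title: str, embeddings: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Rank every other title by cosine similarity to query_title, highest first."""
    query_vec = embeddings[query_title]
    scored = [
        (title, cosine_similarity(query_vec, vec))
        for title, vec in embeddings.items()
        if title != query_title  # skip the searched title itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings standing in for GAMES_EMBED.summary_embedding
embeddings = {
    "Game A": [1.0, 0.1, 0.0],
    "Game B": [0.9, 0.2, 0.0],
    "Game C": [0.0, 1.0, 0.0],
    "Game D": [0.0, 0.0, 1.0],
}
results = most_similar("Game A", embeddings, k=2)
for title, score in results:
    print(title, round(score, 3))
```

Because every pair is scored, the cost grows with the size of the table; that is fine for a catalog of games but worth keeping in mind for much larger tables.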

## Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model so that it references an authoritative knowledge base outside of its training data sources before generating a response.
We do this in two steps when querying the LLM:
1. First, we take the question and retrieve the relevant data using vector cosine similarity (as shown above).
2. Next, we include that relevant data in the prompt to the LLM along with our question.
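The two steps can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions: the documents and 2-dimensional embeddings are made up, `retrieve` and `build_prompt` are hypothetical helpers, and the final LLM call (Cortex COMPLETE in the notebook) is left out.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|) for two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def retrieve(question_vec: list[float], docs: list[dict], k: int = 2) -> list[dict]:
    """Step 1: return the k documents whose embeddings are closest to the question."""
    ranked = sorted(
        docs, key=lambda d: cosine_similarity(question_vec, d["embedding"]), reverse=True
    )
    return ranked[:k]


def build_prompt(question: str, context_docs: list[dict]) -> str:
    """Step 2: put the retrieved text into the prompt alongside the question."""
    context = " | ".join(f"{d['title']}: {d['summary']}" for d in context_docs)
    return (
        "Answer the question between <question> and </question> using the context "
        "between <context> and </context>.\n"
        f"<context>{context}</context>\n"
        f"<question>{question}</question>\n"
        "Answer: "
    )


# Hypothetical documents; in the notebook these rows come from GAMES_EMBED
docs = [
    {"title": "Game A", "summary": "A racing game set in 1986.", "embedding": [1.0, 0.0]},
    {"title": "Game B", "summary": "A puzzle game.", "embedding": [0.0, 1.0]},
    {"title": "Game C", "summary": "A racing sequel.", "embedding": [0.8, 0.2]},
]
question_vec = [0.9, 0.1]  # stand-in for the embedded question
top_docs = retrieve(question_vec, docs)
prompt = build_prompt("What year is Game A set in?", top_docs)
print(prompt)
```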

In [None]:
#question = """What Cities does GTA V Take Place In?"""
#question = """Is Lebron James featured in any NBA 2K games?"""
question = """What year is Grand Theft Auto: Vice City set in?"""

In [None]:
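# Step 1: get the relevant data
# Embed the question and rank the game summaries by cosine similarity.
# The question is passed as a bind parameter (?) so quotes in it don't break the SQL.
relevant_titles = session.sql("""
    select
        title, summary,
        vector_cosine_similarity(
            summary_embedding,
            snowflake.cortex.embed_text_768('snowflake-arctic-embed-m', ?)
        ) as similarity
    from
        GAMES_EMBED
    order by
        similarity desc
    limit 10""", params=[question])
relevant_titles.show()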

In [None]:
# Step 2: feed the retrieved context to the LLM using a specific prompt
# Use both the title and the summary of each retrieved game as context
# (Snowpark Row fields can be read by column name)
info = " | ".join(f"{row['TITLE']}: {row['SUMMARY']}" for row in relevant_titles.collect())
prompt = f"""
            You are a video game expert. Please answer the question between the tags <question> and </question> based on the provided
            context found between the tags <context> and </context>.

            <context>
            {info}
            </context>
            <question>
            {question}
            </question>
            Answer: """
# Bind the model name and prompt as parameters so quotes in the text don't break the SQL
query = """
      select
          snowflake.cortex.complete(
              ?,
              ?
          ) as response
      """
complete = session.sql(query, params=["mistral-large", prompt])
with st.chat_message(name="Assistant"):
    st.write(complete.collect()[0][0])

## Fine Tuning Too
Snowflake makes this easy too: you supply your training data to the model via a table, and you get an additional version of the model fine-tuned to your task.
Fine-tuning can offer the best combination of result quality and cost. Don't have training data? Use a more expensive LLM to label a set of data, then train a smaller, cheaper one on it. Boom: cheap and effective.
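One way to apply that idea is to label examples with a larger model, then turn them into prompt/completion pairs for the smaller one. Cortex fine-tuning expects training data with a prompt column and a completion column (check the fine-tuning docs linked below for the exact requirements). A minimal pandas sketch of preparing such pairs; the template, the `TEACHER_LABEL` column and the rows are hypothetical.

```python
import pandas as pd

# Hypothetical prompt template; use the same one when calling the fine-tuned model later
PROMPT_TEMPLATE = (
    "Will the reviewer recommend this game to friends and family? "
    "Answer Likely, Unlikely or Unsure.\nReview: {review}"
)

# Made-up rows: review text plus a label produced by a larger "teacher" model
labeled = pd.DataFrame(
    {
        "REVIEW_TEXT": ["Loved every minute of it.", "Too many microtransactions."],
        "TEACHER_LABEL": ["Likely", "Unlikely"],
    }
)

# Fine-tuning data: one prompt/completion pair per labeled review
training = pd.DataFrame(
    {
        "prompt": labeled["REVIEW_TEXT"].map(lambda r: PROMPT_TEMPLATE.format(review=r)),
        "completion": labeled["TEACHER_LABEL"],
    }
)
print(training)
```

Keeping the same prompt template at training and inference time generally matters, since the fine-tuned model learns to respond to that format. The frame could then be uploaded (for example with Snowpark's `session.write_pandas`) and selected as the training data.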

[Snowflake Fine Tuning Docs](https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-finetuning?_fsi=wWDeqSCS&_fsi=wWDeqSCS)

[Snowflake Fine Tuning Workshop](https://quickstarts.snowflake.com/guide/finetuning_llm_using_snowflake_cortex_ai/index.html?index=..%2F..index#2)

## A UI to Easily Create Chatbots (with RAG & Search) Is Coming!
At Summit, our head of product randomly chose someone from the audience to create a chatbot in Snowflake. They had only logged into Snowflake 7 times and were able to create a RAG chatbot using the new UI in minutes.