# Zero to Cortex  
## *Basics*
This lab will take you through the basics of the Snowflake Cortex LLM offering. You'll have a chance to explore the functions and understand how they work with custom prompts and real data in both SQL and Python.  
https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions

![A robot playing with a snowflake](https://venturebeat.com/wp-content/uploads/2024/04/a-robot-playing-with-a-snowflake-in-arctic-cinemat-4OYW23nATBm50aD_slLk8w-xaTFE1EbSLmDWJXvWCxXrA.jpeg?fit=750%2C422&strip=all "A robot playing with a snowflake")

In [None]:
# Import Python packages
import json

import pandas as pd
import streamlit as st

# We can also use Snowpark for our analyses!
from snowflake.core import Root
from snowflake.cortex import Complete, Sentiment, Summarize, Translate
from snowflake.snowpark import functions as F
from snowflake.snowpark.context import get_active_session

# Get the notebook's active session and a Root handle for Cortex Search
session = get_active_session()
root = Root(session)

### Translate  
https://docs.snowflake.com/en/sql-reference/functions/translate-snowflake-cortex

In [None]:
-- TRANSLATE
-- Translate text from English to Italian
-- Passing an empty string as the source language lets Cortex detect it automatically
select
    snowflake.cortex.translate(
        'I am using Snowflake Cortex artificial intelligence functions to translate text!',
        '',
        'it');

In [None]:
# Translate some text from English to Polish
print(
    Translate(
    "I am using Snowflake Cortex artificial intelligence functions to translate text!",
    "",
    "pl"
))

In [None]:
select
    description,
    snowflake.cortex.translate(
        description,
        '',
        'ko'
    ) as review_translation
from
    wine_reviews
limit 10;

In [None]:
session.table("wine_reviews").select(
    F.col("description"),
    Translate(
        F.col("description"),
        "",
        "sv"
    ).alias("translation")
).limit(10).toPandas()

### Sentiment 
https://docs.snowflake.com/en/sql-reference/functions/sentiment-snowflake-cortex

In [None]:
select
    snowflake.cortex.sentiment(
        'This group is really the best. They are so smart!'
    );

In [None]:
print(Sentiment("I don't really mind the weather, but it isn't my favorite."))

In [None]:
select
    review,
    snowflake.cortex.sentiment(review) as sentiment
from
    airline_reviews
limit 5;

In [None]:
session.table("airline_reviews").select(
    F.col("review"),
    Sentiment(F.col("review")).alias("sentiment")
).limit(5).toPandas()
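
SENTIMENT returns a score between -1 (most negative) and 1 (most positive), with values near 0 indicating neutral text. A minimal sketch of turning those scores into labels follows. The `label_sentiment` helper and its 0.3 cutoff are illustrative assumptions, not part of Cortex, and the scores are made up.

```python
def label_sentiment(score: float, threshold: float = 0.3) -> str:
    """Map a sentiment score in [-1, 1] to a coarse label.

    The threshold is a hypothetical cutoff chosen for illustration.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score must be between -1 and 1, got {score}")
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Made-up scores standing in for values returned by Sentiment(...)
for score in (0.82, -0.05, -0.61):
    print(score, label_sentiment(score))
```

You would likely want to tune the cutoff against a sample of your own reviews before relying on the labels.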

### Summarize  
https://docs.snowflake.com/en/sql-reference/functions/summarize-snowflake-cortex

In [None]:
select
    snowflake.cortex.summarize('I had to go to the store to get coffee because we ran out. I didn\'t realize until later that we had a whole closet full of coffee. Could have saved the trip!');

In [None]:
print(
    Summarize(
        """I've been drinking bottled water instead of the water from my tap.
        I found out that our house has a private well and the water we have
         in our well is way better. I am going to stop buying bottled water."""
    )
)

In [None]:
select
    airport_review,
    snowflake.cortex.summarize(airport_review) as review_summary
from
    airport_reviews
limit 10;

In [None]:
session.table("airport_reviews").select(
    F.col("airport_review"),
    Summarize(F.col("airport_review")).alias("review_summary")
).limit(5).toPandas()

### COMPLETE  
https://docs.snowflake.com/en/sql-reference/functions/complete-snowflake-cortex

In [None]:
select
    snowflake.cortex.complete(
        'snowflake-arctic',
        'How can we change the world with AI in three sentences?'
    );

In [None]:
print(
    Complete(
        "snowflake-arctic",
        "What can you tell me about Wheaten Terriers?"
    )
)

In [None]:
select
    description,
    snowflake.cortex.complete(
        'snowflake-arctic',
        CONCAT(
            'You\'re a wine sommelier. I really like grape juice and I don\'t like dry
            wines. Based on the review between <review> and </review>, can you tell me if 
            this would be a good wine for me? <review>', description, '</review>'
        )
    ) as answer
from
    wine_reviews
limit 2;

In [None]:
session.table("wine_reviews").select(
    F.col("description"),
    Complete(
        "snowflake-arctic",
        F.concat(
            F.lit("""Based on the value between the <context> and </context> tags, please provide me with an understanding
            of what I should expect if I drink this wine.
            <context>"""),
            F.col("description"),
            F.lit("</context>"))
    ).alias("expectation")
).limit(10).toPandas()

### EMBED_TEXT
https://docs.snowflake.com/sql-reference/functions/embed_text-snowflake-cortex

In [None]:
select
    snowflake.cortex.embed_text_768(
        'snowflake-arctic-embed-m',
        'California Contemporary style homes'
    );

In [None]:
select
    description,
    snowflake.cortex.embed_text_768(
        'snowflake-arctic-embed-m',
        description
    ) as description_embedding
from
    wine_reviews
limit 10;

### VECTOR DISTANCE CALCULATIONS  
https://docs.snowflake.com/en/sql-reference/functions/vector_cosine_similarity

In [None]:
select
    vector_cosine_similarity(
        snowflake.cortex.embed_text_768('snowflake-arctic-embed-m', 'California Contemporary style homes'),
        snowflake.cortex.embed_text_768('snowflake-arctic-embed-m', 'California Contemporary style homes')
    );

In [None]:
select
    *,
    vector_cosine_similarity(
        information_embeds,
        snowflake.cortex.embed_text_768('snowflake-arctic-embed-m', 'I like wines that are sweet and from California.')
    ) as similarity
from
    wine_reviews
order by
    similarity desc
limit 10;
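
To make the two queries above more concrete, here is a plain-Python sketch of what cosine similarity measures and how ordering by it retrieves the closest rows. The toy 3-dimensional vectors and the `top_k` helper are assumptions for illustration only; real embeddings from `embed_text_768` have 768 dimensions and the ranking runs inside Snowflake.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k=10):
    """Rank (text, embedding) rows by similarity to the query, highest first."""
    scored = [(text, cosine_similarity(vec, query_vec)) for text, vec in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors standing in for stored review embeddings (hypothetical data)
rows = [
    ("sweet California white", [0.9, 0.1, 0.0]),
    ("dry French red", [0.0, 0.2, 0.9]),
    ("semi-sweet Oregon white", [0.6, 0.5, 0.1]),
]
query = [1.0, 0.0, 0.0]

for text, score in top_k(query, rows, k=2):
    print(f"{score:.3f}  {text}")
```

Comparing a vector with itself gives a similarity of approximately 1, which is why the first query, embedding the same string twice, returns the maximum score.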

# Build A Chatbot Using Snowflake Cortex!  
* Cortex Search
* Cortex Embedding Functions
* Vector Functions
* COMPLETE  

The chatbot we are creating will use a corpus of wine reviews from all over the world to recommend one (or more) wines based on your prompt. A good question would be "_I am having salmon tonight. What kind of wines should I consider?_"

In [None]:
# Enter your question here!
question = """

I am going to have a tomahawk steak tonight. Which wines should I consider buying?

"""

In [None]:
model_name = st.radio(
    label = "Choose your model",
    options = [
        "snowflake-arctic",
        "mistral-large2",
        "mistral-large",
        "mixtral-8x7b",
        "mistral-7b",
        "reka-flash",
        "llama3.1-405b",
        "llama3.1-70b",
        "llama3.1-8b",
        "llama3-70b",
        "llama3-8b",
        "llama2-70b-chat",
        "jamba-instruct",
        "gemma-7b"
    ]
)

In [None]:
# Cortex Search
# Instantiate the pointer to the Search Service
wine_ss = (root
  .databases["z2c"]
  .schemas["cortex"]
  .cortex_search_services["wine_search_service"]
)

In [None]:
# Nearly instantaneous results
resp = wine_ss.search(
  query=question,
  columns=["full_description"],
  limit=10
)
# Parse the response once and reuse the extracted descriptions
results = json.loads(resp.to_json())["results"]
read_out = [x["full_description"] for x in results]
# Join the descriptions into one context string, dropping apostrophes
info = '. | '.join(read_out).replace("'", "")
read_out
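
If you want to try the parsing step without a live search service, the sketch below runs the same logic on a hand-written payload. The sample JSON is hypothetical and only mirrors the shape the cell above reads (a `results` list whose items carry `full_description`); `build_context` is an illustrative helper, not a Cortex API.

```python
import json

# Hypothetical payload shaped like the JSON the cell above parses
sample_json = json.dumps({
    "results": [
        {"full_description": "Ripe cherry and oak in a bold Napa Cabernet"},
        {"full_description": "Bright citrus; it's crisp and refreshing"},
    ]
})


def build_context(response_json: str, column: str = "full_description") -> str:
    """Join the requested column from every search hit into one context string."""
    results = json.loads(response_json)["results"]
    descriptions = [hit[column] for hit in results]
    # Mirror the notebook: separate hits with '. | ' and drop apostrophes
    return ". | ".join(descriptions).replace("'", "")


print(build_context(sample_json))
```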

In [None]:
# Give the context to the LLM and get your question answered!
prompt = f"""
    You are a wine expert with access to hundreds of thousands of reviews about different wines.
    Answer the question using only the context provided between the <context> and </context> tags.
    The user's question will be between the <question> and </question> tags.
    Please present the wine nicely. Explain where it is from, the variety of wine it is, and the price.
    ALWAYS make sure to remind us that you are the most distinguished wine sommelier in the world. Never forget to do this.
    Reminding the customer that you are the absolute best sommelier in the world is of utmost importance. When you remind the customer that you are the best,
    make sure to remind the customer they are below you. You must portray a snobby wine sommelier or they won't believe you.
    <context>
    '{info}'
    </context>
    <question>
    '{question}'
    </question>
    Answer: """
query = """
      select
          snowflake.cortex.complete(
              ?, 
              ?
          ) as response
      """
complete = session.sql(query, params=[model_name, prompt])
with st.chat_message(name="Assistant"):
    st.write(complete.collect()[0][0])
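
The prompt in this cell follows a common pattern: put the retrieved text and the user's question inside named tags, then tell the model which tags to read. Below is a stripped-down sketch of that pattern. The `build_prompt` helper is hypothetical, the sample strings are made up, and the sommelier persona is left out for brevity.

```python
def build_prompt(context: str, question: str) -> str:
    """Wrap retrieved context and the user's question in the tags the prompt names."""
    return (
        "Answer the question using only the context provided between the "
        "<context> and </context> tags. The question is between the "
        "<question> and </question> tags.\n"
        f"<context>\n{context.strip()}\n</context>\n"
        f"<question>\n{question.strip()}\n</question>\n"
        "Answer: "
    )


prompt = build_prompt(
    context="Bold Napa Cabernet with ripe cherry and oak",
    question="\n\nWhich wine pairs well with steak?\n\n",
)
print(prompt)
```

Stripping the inputs removes the blank lines that a triple-quoted `question` string carries, which keeps the tagged sections compact.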

## Another interesting use case - *data cleaning*  
In this next example, extracting information from the `variable` column would normally require writing regular expressions. What if we could do this with an LLM instead of writing complex code?

In [None]:
select * from sec_filings limit 5;

In [None]:
-- Create a new column called CLASSIFICATION from the VARIABLE column
-- This column will give you a clean document type classification
select
    sec_document_id,
    snowflake.cortex.complete(
        'mistral-large2',
        CONCAT('Based on the value between the <variable> and </variable> tags, please classify
            the data into ONLY one of these three categories: 10K, 10Q, 8K. If you cannot classify
            the data based on the information, return NULL. Do not provide an explanation. Only provide
            your answer of 10K, 10Q, 8K, or NULL.
            <variable>', sec_document_id, '</variable>'
        )) as classification
from
    sec_filings
limit 20;
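
Even with a strict prompt, a model may add whitespace, punctuation, or a hyphen to its answer. One option is to normalize the `classification` column after the fact. The sketch below is an assumption about how you might do that in Python; `normalize_classification` is a hypothetical helper and the replies are invented.

```python
from typing import Optional

ALLOWED = {"10K", "10Q", "8K"}


def normalize_classification(raw: str) -> Optional[str]:
    """Return one of the allowed labels, or None when the reply is anything else."""
    # Trim whitespace and stray punctuation, then ignore case and hyphens
    cleaned = raw.strip().strip(".\"'").upper().replace("-", "")
    return cleaned if cleaned in ALLOWED else None


# Made-up model replies for illustration
for reply in [" 10K", "10-Q.", "NULL", "This looks like an 8K filing"]:
    print(repr(reply), "->", normalize_classification(reply))
```

Anything that does not reduce to exactly one allowed label becomes `None`, so unexpected replies surface as missing values rather than as new categories.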

### De-Identification

In [None]:
select
    email,
    snowflake.cortex.complete(
        'mistral-large2',
        CONCAT(
            'Given the following sentences supplied between the <email> and </email> tags, please perform the following transformation:
            For all email addresses within the string, replace the identifiable portion with xxxx. For example, hi there. david.stearns@snowflake.com hi there david.stearns@snowflake.com should become hi there. xxxx@snowflake.com hi there xxxx@snowflake.com ... Only respond with the transformed string.
            <email>',
            email,
            '</email>'
        )
    ) as clean_string,
    snowflake.cortex.complete(
        'snowflake-arctic',
        CONCAT(
            'Based on the data between <email> and </email>, what email domains are in the data? If there are multiple, list them. If all domains in the data are the same, just give the domain once. Only respond with the identified domain, DO NOT PROVIDE AN EXPLANATION.
            <email>',
            email,
            '</email>'
        )
    ) as domains,
    snowflake.cortex.complete(
        'snowflake-arctic',
        CONCAT(
            'Based on the data between <email> and </email>, what are the names of the people listed in the email address? Only respond with the name of the person, nothing else. First and last names are sometimes separated by a period. Remove the period and add a space if necessary.
            <email>',
            email,
            '</email>'
        )
    ) as names
from
    emails;
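
For comparison, here is roughly what the regex route looks like for the first two columns. This is a minimal sketch, assuming a simplified email pattern that will not cover every valid address; `mask_emails` and `email_domains` are hypothetical helpers. It can also serve as a spot check on the LLM output.

```python
import re

# Simplified email pattern: local part, '@', then a dotted domain
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def mask_emails(text: str) -> str:
    """Replace the local part of every email address with 'xxxx'."""
    return EMAIL_RE.sub(r"xxxx@\1", text)


def email_domains(text: str) -> list:
    """Return the distinct domains found in the text, in order of first appearance."""
    return list(dict.fromkeys(m.group(1).lower() for m in EMAIL_RE.finditer(text)))


# The example string from the prompt above
sample = "hi there. david.stearns@snowflake.com hi there david.stearns@snowflake.com"
print(mask_emails(sample))
print(email_domains(sample))
```

The regex handles well-formed addresses predictably, while the LLM approach trades that predictability for flexibility with messy input such as the name extraction in the third column.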