In [1]:
!pip install python-dotenv openai --quiet

In [2]:
import openai
import os
from glob import glob
from openai import OpenAI
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd
from dotenv import load_dotenv

load_dotenv()

openai.api_key = os.environ["OPENAI_API_KEY"]
client = OpenAI()

### Helper Functions

These first helper functions:
1. Create embeddings, nothing too wild
2. Find cosine similarity. This is the workhorse here
3. Ranks/sorts given cosine distance
4. Easy chatting abstraction

In [4]:
def get_ada_embedding(text):
    result = client.embeddings.create(input=[text], model="text-embedding-ada-002")
    return np.array(result.data[0].embedding).reshape(1, -1)

def get_cosine_similarity(embedding1, embedding2):
    return cosine_similarity(embedding1, embedding2)[0][0]

def get_nearest_neighbor_text(text, df):
    embedding = get_ada_embedding(text)
    cosine_similarities = df.embedding.apply(lambda x: get_cosine_similarity(x, embedding))
    closest_document = cosine_similarities.sort_values(ascending=False).head(1)
    return df.loc[closest_document.index[0]].text

def simple_chat(prompt):
    return client.chat.completions.create(
      model="gpt-4o",
      messages=[
        {"role": "system", "content": "You are a helpful Onboarding assistant."},
        {"role": "user", "content": prompt},
      ]
    ).choices[0].message.content

In [5]:
embeddings = [
    ("i love you", get_ada_embedding("I love you.")),
    ("i adore you", get_ada_embedding("I adore you.")),
    ("i hate you", get_ada_embedding("I hate you.")),
    ("i despise you", get_ada_embedding("I despise you.")),
    ("peanut butter sandwich", get_ada_embedding("peanut butter sandwich")),
    ("i am ambivalent toward you", get_ada_embedding("i am ambivalent toward you")),
    ("The happiness of your life depends upon the quality of your thoughts.", get_ada_embedding("The happiness of your life depends upon the quality of your thoughts.")),
    ("你生活的幸福取决于你思想的质量。", get_ada_embedding("dsakljflks'ajfo[easi'urtoaeishfja'sdoilfjas]")),
]

--- 

#### Similarity 

In [6]:
for name, embedding in embeddings:
    for name2, embedding2 in embeddings:
        if name in ("i love you", "i hate you"):
            print(f"{name} vs {name2}: {round(cosine_similarity(embedding, embedding2)[0][0], 3)}")
    print(" ")

i love you vs i love you: 1.0
i love you vs i adore you: 0.922
i love you vs i hate you: 0.844
i love you vs i despise you: 0.824
i love you vs peanut butter sandwich: 0.761
i love you vs i am ambivalent toward you: 0.802
i love you vs The happiness of your life depends upon the quality of your thoughts.: 0.764
i love you vs 你生活的幸福取决于你思想的质量。: 0.76
 
 
i hate you vs i love you: 0.844
i hate you vs i adore you: 0.839
i hate you vs i hate you: 1.0
i hate you vs i despise you: 0.938
i hate you vs peanut butter sandwich: 0.736
i hate you vs i am ambivalent toward you: 0.815
i hate you vs The happiness of your life depends upon the quality of your thoughts.: 0.719
i hate you vs 你生活的幸福取决于你思想的质量。: 0.727
 
 
 
 
 
 


## Semantic Searching 

Extending the idea from above with , we can think about words as existing in vector space.

First we're going to read in our data:

In [7]:
raw_onboarding_docs = glob("documents/*.md")
onboarding_doc_text = []
embeddings = []

for doc in raw_onboarding_docs:
    with open(doc) as f:
        document_text = f.read()
        onboarding_doc_text.append(document_text)
        embeddings.append(get_ada_embedding(document_text))

In [8]:
onboarding_docs_df = pd.DataFrame({"text": onboarding_doc_text, "embedding": embeddings, "filename": raw_onboarding_docs})
onboarding_docs_df.head()

Unnamed: 0,text,embedding,filename
0,# Senior Software Engineer Onboarding\n\n1. **...,"[[0.013391508720815182, -0.01913640648126602, ...",documents/onboarding-senior-swe.md
1,# Junior Software Engineer Onboarding\n\n1. **...,"[[0.006385904736816883, -0.0068930406123399734...",documents/onboarding-junior-swe.md
2,# Sales Account Executive Onboarding\n\n1. **C...,"[[-0.0068964832462370396, -0.02236486598849296...",documents/onboarding-ae.md
3,# Marketing Team Onboarding\n\n1. **Introducti...,"[[-0.015337598510086536, -0.0217508003115654, ...",documents/onboading-marketing.md


Now, let's try finding a word w/ keyword search. In the Junior and Senior SWE we might be trying to answer a question about "what sort of code editor is used around here. Let's see "code editor"  with a simple CTRL + F keyword search

In [15]:
onboarding_docs_df[onboarding_docs_df['text'].str.contains("code editor")]

Unnamed: 0,text,embedding,filename


Now with semantic search:

In [16]:
get_nearest_neighbor_text("code editor", onboarding_docs_df)

"# Junior Software Engineer Onboarding\n\n1. **Introduction to the Team and Culture:**\n   - Welcome the new engineer and introduce them to the team members with whom they'll be working closely.\n   - Organize a meet-and-greet with key members of other departments to build interdepartmental relationships.\n   - Provide an overview of the company's culture, values, and mission to instill a sense of belonging and purpose.\n\n2. **Setup of Tools and Environment:**\n   - Ensure their workstation is fully set up with the necessary hardware and software.\n   - Provide access to all required internal systems, including email, project management tools, code repositories, and documentation.\n   - Guide them through the setup of development environments, including IDEs, databases, and local servers.\n\n3. **Codebase and Documentation Review:**\n   - Walk them through the architecture of the main projects they will be working on.\n   - Review the existing documentation, including code style guide

--- 

### RAG + Prompting

We can use RAG to surface most "important" content, which we can think of as most similar content to the concepts that we're in the space of, then pass that to a prompt to do retrieval of documents. This is like a toy example of what we were talking about with Scribd.

In [8]:
new_message = "I just started my new job as a junior software engineer. I'm so excited to be here! What should I do first?"

In [9]:
print(get_nearest_neighbor_text(new_message, onboarding_docs_df))

# Junior Software Engineer Onboarding

1. **Introduction to the Team and Culture:**
   - Welcome the new engineer and introduce them to the team members with whom they'll be working closely.
   - Organize a meet-and-greet with key members of other departments to build interdepartmental relationships.
   - Provide an overview of the company's culture, values, and mission to instill a sense of belonging and purpose.

2. **Setup of Tools and Environment:**
   - Ensure their workstation is fully set up with the necessary hardware and software.
   - Provide access to all required internal systems, including email, project management tools, code repositories, and documentation.
   - Guide them through the setup of development environments, including IDEs, databases, and local servers.

3. **Codebase and Documentation Review:**
   - Walk them through the architecture of the main projects they will be working on.
   - Review the existing documentation, including code style guides, best practic

--- 

### RAG + Chat

We can use RAG to surface important concepts up, then pass those concepts over to our prompt - which eventually makes its way to and from OpenAI - to give our chat better context and sort of like a hint on what it should take into consideration

In [18]:
def help_me_onboard(question, onboarding_docs_df=onboarding_docs_df):
  
    closest_document_text = get_nearest_neighbor_text(question, onboarding_docs_df)

    prompt = f"""
    I would like help answering the following question:

    {question}

    Please only answer the question using this as context:

    {closest_document_text}
    """

    print(simple_chat(prompt))

In [19]:
help_me_onboard("What language do we use?")

We use the context of the information given regarding Junior Software Engineer onboarding to answer your question. However, the question "What language do we use?" is not directly addressed in this context. Based on the provided context which includes code reviews, development environments, and IDEs, it is likely referring to programming languages. Typically, this would involve languages commonly used in software engineering, such as Python, Java, C++, or JavaScript, but the specific language your team uses is not explicitly stated in the provided text. You might need to consult with your team or onboarding documents for precise information.


In [11]:
help_me_onboard("I am a new AE. I've attended orientation and prodcut training. What should I do next?")

After attending orientation and product training as a new Account Executive, there are several steps you can take to further prepare for your role:

1. **Familiarize Yourself with Systems and Tools:** Learn how to use the CRM software, communication tools, and any other technology platforms that are necessary for your job. Understand how to track leads, manage customer interactions, and report sales activities.

2. **Review the Sales Process and Methodologies:** Go over the sales process in detail, including prospecting, qualification, needs analysis, solution offering, objection handling, closing techniques, and post-sale service. Familiarize yourself with the preferred sales methodologies and strategies your company uses, such as SPIN Selling or Challenger Sales.

3. **Gain Market and Customer Insights:** Dig deeper into your target market by researching industry trends, key players, and conducting market analysis. Understand your customer personas, including their buying patterns an

In [20]:
!pip install segno pillow qrcode-artistic --quiet

In [21]:
import segno
from urllib.request import urlopen

qrcode = segno.make_qr("https://docs.google.com/presentation/d/1BEXKx4GHmku6VdD54zZPauBqQ6YN9mkPKUWbQeHXzZ8/edit?usp=sharing")
qrcode.save("basic_qrcode.png", 
            scale=20,
           border=0,
            light="#f5f2eb",
            dark="#dda04c")
# logo_link = urlopen("https://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/1869px-Python-logo-notext.svg.png")
# qrcode.to_artistic(
#     background=logo_link,
#     target="py-rag-qrcode.png",
#     scale=20,
#     border=0,
#     light="#f5f2eb",
#     dark="#dda04c"