## Semantic text search using embeddings

We can search through all our reviews semantically, efficiently and at low cost, by embedding the search query and then finding the reviews whose embeddings are most similar to it. The dataset is created in the [Obtain_dataset Notebook](Obtain_dataset.ipynb).

In [21]:
!pip install scikit-learn



In [3]:
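import ast

import pandas as pd
import numpy as np

datafile_path = "data/embeddings.csv"

df = pd.read_csv(datafile_path)
# embeddings are stored in the CSV as strings; parse them safely
# (ast.literal_eval only accepts Python literals, unlike eval) and convert to arrays
df["embeddings"] = df.embeddings.apply(ast.literal_eval).apply(np.array)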
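To see why the parsing step above is needed, here is a minimal sketch that writes a toy DataFrame to an in-memory CSV and reads it back. The two rows and their values are made up for illustration and merely stand in for `data/embeddings.csv`; the real file may be laid out differently.

```python
import ast
import io

import numpy as np
import pandas as pd

# Toy stand-in for data/embeddings.csv (hypothetical values, for illustration only).
toy = pd.DataFrame({"text": ["a", "b"], "embeddings": [[0.1, 0.2], [0.3, 0.4]]})
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)

loaded = pd.read_csv(buffer)
# After the round trip each embedding is a plain string such as "[0.1, 0.2]".
print(type(loaded.embeddings[0]))

# Parse the string into a list, then convert the list to a NumPy array.
loaded["embeddings"] = loaded.embeddings.apply(ast.literal_eval).apply(np.array)
print(type(loaded.embeddings[0]), loaded.embeddings[0].shape)
```

Parsing once at load time keeps the later similarity computation working on numeric arrays rather than text.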

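Remember to embed the query with a model that is compatible with the one used for the documents (in this case reviews). Older embedding engines came in separate document and query variants; `text-embedding-ada-002`, used below, is a single model that handles both. Here we simply compute the cosine similarity between the query embedding and each document embedding, and return the `n` best matches.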
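The ranking step described above can be sketched with plain NumPy. This is an illustrative sketch, not the library's implementation: `cosine_sim` and `top_n` are hypothetical helper names, and the 2-dimensional vectors are toy values rather than real embeddings.

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def top_n(doc_vectors, query_vector, n):
    """Return (index, score) pairs for the n documents most similar to the query."""
    scores = np.array([cosine_sim(v, query_vector) for v in doc_vectors])
    order = np.argsort(-scores)[:n]  # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]


# Toy document vectors and query (made-up values).
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.1]
ranked = top_n(docs, query, n=2)
print(ranked)
```

Cosine similarity depends only on the direction of the vectors, not their length, so a document pointing the same way as the query scores close to 1 regardless of magnitude.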

In [8]:
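# note: openai.embeddings_utils is only available in openai versions before 1.0
from openai.embeddings_utils import get_embedding, cosine_similarity

# rank every row of df by similarity to the query and return the n best matches
def search(df, query, n):
    # embed the search query
    query_embedding = get_embedding(
        query,
        engine="text-embedding-ada-002"
    )
    # cosine similarity between each stored embedding and the query
    # (this adds a "similarity" column to df in place)
    df["similarity"] = df.embeddings.apply(lambda x: cosine_similarity(x, query_embedding))

    # highest similarity first, keep the top n rows
    results = (
        df.sort_values("similarity", ascending=False)
        .head(n)
    )

    return results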


results = search(df, "who can view a variable marked private in a solidity contract", n=3)



In [14]:
# iterate through the results and print the text
for index, row in results.iterrows():
    print(row['text'])
    print()

not in derived contracts


Both functions and state variables can be made public or private

Here's a function for updating a state variable on a contract:

solidity
// Solidity example
function update_name(string value) public {
    dapp_name = value;
}



The parameter value of type string is passed into the function: update_name
It's declared public, meaning anyone can access it
It's not declared view, so it can modify the contract state


View functions {#view-functions}

These functions promise not to modify the state of the contract's data. Common examples are "getter" functions – you might use this to receive a user's balance for example.

solidity
// Solidity example
function balanceOf(address _owner) public view

anyone can access it
It's not declared view, so it can modify the contract state


View functions {#view-functions}

These functions promise not to modify the state of the contract's data. Common examples are "getter" functions – you might use this to receive a user's