<h1>Diamonds Are Forever</h1>
<h2>Import the needed modules</h2>

In [9]:
# !pip install --upgrade scikit-learn
# !pip install pinecone-client
# !pip install -U langchain-cli
# !pip install transformers

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

# Load environment variables
from dotenv import load_dotenv
import os
from typing import List

# Data handling
import pandas as pd

## Regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, AdaBoostRegressor 
from sklearn.neighbors import KNeighborsRegressor

# Modelling Helpers
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score

# Preprocessing
from sklearn.preprocessing import StandardScaler, LabelEncoder

# Evaluation metrics
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Pinecone
from pinecone import Pinecone, ServerlessSpec

# OpenAI
from openai import OpenAI
from openai.types import Image, ImagesResponse

# Tokenization
import nltk
import tiktoken

# Downloads
nltk.download('punkt')

# Langchain
from langchain_openai import ChatOpenAI
from langchain.docstore.document import Document
from langchain.chains.question_answering import load_qa_chain

# Transformers
from transformers import pipeline
import gradio as gr

# Matplotlib
import matplotlib.pyplot as plt

# Import all functions from file
from inc.functions import *


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\James\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


<h2>Define Needed Variables</h2>

In [11]:
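# Clear any previously set keys so the values in inc/api_keys.env take effect
# (load_dotenv does not override variables that already exist by default).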
variables_to_clear = ['OPENAI_API_KEY',
                      'LANGCHAIN_TRACING_V2',
                      'LANGCHAIN_ENDPOINT',
                      'LANGCHAIN_API_KEY',
                      'LANGCHAIN_PROJECT',
                      'PINECONE_API_KEY']

for var in variables_to_clear:
    if var in os.environ:
        del os.environ[var]

load_dotenv("inc/api_keys.env")

## Get the API keys defined
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")

# Check the API key
if not PINECONE_API_KEY:
    raise ValueError("PINECONE_API_KEY environment variable is not set.")

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
# Check the API key
if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY environment variable is not set.")

pc = Pinecone(api_key=PINECONE_API_KEY)

## Attempt to access the index
try:
    index = pc.Index("diamonds")
    print("Successfully accessed the index 'diamonds'.")
except Exception as e:
    print(f"Error accessing the index 'diamonds': {e}")

## Set the model name for our LLMs.
OPENAI_MODEL = "gpt-3.5-turbo"
EMBED_MODEL = "text-embedding-ada-002"

client = OpenAI(api_key=OPENAI_API_KEY)
MAX_TOKENS = 1536

# Define vector list for chunking
vectors = []
filename = 'README1.md'

# Pull book and article text for chunking
text = get_diamond_info()

query_responses = []
answers = []
questions = ["What is the most famous type of diamond cut?",
            "What is the process of diamond certification?",
            "Who is the top diamond dealer in the world?",
            "What are the signs of diamond impurities?",
            "What are Blood diamonds?",
            "What is the history of the diamond industry?",
            "Who are the major players in the diamond market?",
            "How many diamonds are typically used in high-end jewelry?",
            "What will Langsmith help us learn about diamond appraisals?"]

Successfully accessed the index 'diamonds'.
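
The two API key checks above follow the same pattern, so a small helper could keep the variable name and the error message in sync. This is only a sketch: `require_env` and `DEMO_API_KEY` are hypothetical names that do not appear in the notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        # The message is built from the same name that was looked up.
        raise ValueError(f"{name} environment variable is not set.")
    return value


# Demonstration with a made-up variable name (an assumption for this example only)
os.environ["DEMO_API_KEY"] = "abc123"
demo_key = require_env("DEMO_API_KEY")
print(demo_key)
```

With a helper like this, the cell could read `PINECONE_API_KEY = require_env("PINECONE_API_KEY")` and likewise for the OpenAI key.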


# Questions & Answers
Here are the answers to your diamond-related questions:

## What is the most famous type of diamond cut?

The most famous type of diamond cut is the round brilliant cut. It is known for its ability to maximize a diamond's sparkle and brilliance due to its precise faceting.
## What is the process of diamond certification?

Diamond certification is a process in which a diamond is evaluated and graded based on its characteristics such as carat weight, cut, color, and clarity (known as the 4Cs). This evaluation is conducted by a reputable gemological laboratory, such as the Gemological Institute of America (GIA) or the International Gemological Institute (IGI). A certificate detailing these characteristics is provided to verify the diamond's quality and authenticity.
## Who is the top diamond dealer in the world?

The diamond industry is vast, and there isn't one definitive "top diamond dealer" globally. However, some of the largest and most well-known diamond companies include De Beers, Alrosa, and Rio Tinto. These companies are known for their extensive involvement in diamond mining and trade.
## What are the signs of diamond impurities?

Diamond impurities or inclusions are natural features within a diamond that can affect its clarity. Signs of diamond impurities include internal flaws such as feathers, crystals, or needles. Surface blemishes may include pits or scratches. These imperfections are often evaluated and rated on a clarity scale, ranging from "Flawless" to "Included."
## What are blood diamonds?

Blood diamonds, also known as conflict diamonds, are diamonds mined in war zones and sold to finance armed conflict against governments. The term gained widespread awareness due to humanitarian concerns related to unethical mining practices and exploitation. Efforts such as the Kimberley Process have been established to prevent the trade of conflict diamonds.
## What is the history of the diamond industry?

The diamond industry has a long and storied history. Diamonds were first discovered and mined in India around the 4th century BCE. The industry expanded with the discovery of diamond deposits in South Africa in the late 1800s, which led to the establishment of large diamond mining companies like De Beers. Today, diamonds are sourced from various regions worldwide, including Russia, Australia, and Canada.
## Who are the major players in the diamond market?

Major players in the diamond market include companies involved in diamond mining, trading, and retailing. De Beers, Alrosa, and Rio Tinto are some of the leading diamond mining companies. In terms of retail, companies like Tiffany & Co., Cartier, and Graff Diamonds are well-known luxury brands in the diamond jewelry market.
## How many diamonds are typically used in high-end jewelry?

The number of diamonds used in high-end jewelry can vary depending on the design and size of the piece. For example, a solitaire engagement ring may feature a single prominent diamond, while a necklace or bracelet might include multiple smaller diamonds set in intricate patterns. High-end jewelry pieces often focus on quality and artistry, with diamonds chosen for their exceptional cut, clarity, color, and carat weight.
## What will Langsmith help us learn about diamond appraisals?

Langsmith is likely a reference to an expert or resource specializing in diamond appraisals. An expert like Langsmith can teach us how to evaluate diamonds based on the 4Cs (carat, cut, color, and clarity), how to identify diamond treatments or enhancements, and how to properly assess the value of a diamond. They may also provide guidance on recognizing reputable grading certificates and authenticating diamonds.

In [4]:
# Helper functions for chunking, embedding, and querying
def prep(text: str):
    return text.replace("\n", " ").replace("\r", " ").replace("\t", " ")

def tokenize(text: str):
    # Encode a single string into token ids using the embedding model's tokenizer
    encoding = tiktoken.encoding_for_model(EMBED_MODEL)
    return encoding.encode(text)

def embed(tokens: List[int]):
    response = client.embeddings.create(input=tokens,model=EMBED_MODEL)
    return response.data[0].embedding

def chunk_text(text:str):
    current_chunk = []
    current_para = ""
    chunks = []
    paras = []
    current_len = 0
    sentences = nltk.sent_tokenize(text)
    chunks_of_tokens = []
    
    for sentence in sentences:
        # Tokenize the sentence
        sentence_tokens = tokenize(sentence)
        sentence_token_len = len(sentence_tokens)
        
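        # Check if adding the next sentence exceeds the max token limit
        # (skipped while the chunk is empty, so an over-long sentence never yields an empty chunk)
        if current_chunk and current_len + sentence_token_len > MAX_TOKENS: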
            # Add the current chunk to the list and start a new one
            paras.append(current_para)
            current_para = ""
            chunks_of_tokens.append(current_chunk)
            embeddings = embed(current_chunk)
            chunks.append(embeddings)
            current_chunk = []
            current_len = 0
        
        # Add the sentence to the current chunk
        current_para += " " + sentence
        current_chunk.extend(sentence_tokens)
        current_len += sentence_token_len
    
    # Add the last chunk if it's not empty
    if current_chunk:
        paras.append(current_para)
        chunks_of_tokens.append(current_chunk)
        embeddings = embed(current_chunk)
        chunks.append(embeddings)

    return paras, chunks, chunks_of_tokens

def create_embeddings(filename: str):
    with open(filename, "r", encoding="utf-8") as file:
        text = file.read()
    text = prep(text)
    return chunk_text(text)
    
def create_embeddings_prompt(prompt:str):
    prompt = prep(prompt)
    return chunk_text(prompt)

def vectorize_chunks(paras: List, chunks: List, **kwargs):
    vectors = []
    for i, (para, chunk) in enumerate(zip(paras, chunks)):
        metadata = {"para": f"{para}"}
        # Record the source file when one is passed in (use the argument, not the global)
        if "filename" in kwargs:
            metadata["file"] = kwargs["filename"]
        vectors.append({"id": f"{i}", "values": chunk, "metadata": metadata})

    return vectors


def ask_a_question(prompt):
    # convert the prompt to chunks of embeddings
    paras, chunks, chunks_of_tokens  = create_embeddings_prompt(prompt)
    print(f"Embeddings: {chunks[0]}")
    # vectorize the embeddings
    prompt_vectors = vectorize_chunks(paras, chunks)
    print(f"Vectorized: {prompt_vectors[0]}")
    # search the index for the best match using semantic search
    query_response = index.query(
        top_k=2,
        vector=prompt_vectors[0]["values"]
    )
    query_responses.append(query_response)
    print(f"Query response: {query_response}")
    # get the id of the best match
    best_id = query_response["matches"][0]["id"]
    print(f"Best ID: {best_id}")
    # fetch the best match from the index
    result = index.fetch(ids=[best_id])
    # get the paragraph of interest from the result metadata
    para_of_interest = result["vectors"][best_id]["metadata"]["para"]
    print(f"Para of interest: {para_of_interest}")
    # Initialize the langchain chat model.
    llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name=OPENAI_MODEL, temperature=0.0)
    # turn the para_of_interest into a Document
    document = Document(page_content=para_of_interest)
    # Create the QA chain using the LLM.
    chain = load_qa_chain(llm)
    # Pass the para_of_interest and the prompt to the chain, and print the result.
    question = "If you can't find the answer in the provided document, say, I don't know the answer to that beautiful person, otherwise, answer the question. " + prompt
    result = chain.invoke({"input_documents": [document], "question": question})
    return result["output_text"]
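
The `chunk_text` function above packs sentences greedily until the `MAX_TOKENS` budget would be exceeded. The same idea can be sketched without any API calls. In this sketch, `pack_sentences` is a hypothetical name, a whitespace word count stands in for the `tiktoken` tokenizer, and the check that the current chunk is non-empty is an addition so that an over-long sentence becomes its own chunk rather than producing an empty one.

```python
from typing import Callable, List


def pack_sentences(
    sentences: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Greedily pack sentences into chunks of at most max_tokens tokens.

    count_tokens is a stand-in (whitespace word count) for a real tokenizer.
    A single sentence longer than max_tokens is kept as one oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Close the current chunk only if it already holds something.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    # Flush the final, partially filled chunk.
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = ["Diamonds are hard.", "They form deep underground.", "Cut affects sparkle."]
packed = pack_sentences(sample, max_tokens=7)
print(packed)
```

Keeping the packing step separate from the embedding call makes it possible to inspect chunk boundaries before spending any API requests.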

<h3>Vectorize Information</h3>

In [5]:
## Vectorize text information
# Vectors from text from get_diamond_info()
paras, chunks, chunks_of_tokens  = create_embeddings_prompt(text)
vectors_from_text = vectorize_chunks(paras, chunks)
vectors.extend(vectors_from_text)

# Vectors from filename README1.md
paras, chunks, chunks_of_tokens = create_embeddings(filename)
vectors_from_file = vectorize_chunks(paras, chunks, filename=filename)
vectors.extend(vectors_from_file)

# Upsert the combined vectors into the index
index.upsert(
    vectors=vectors
)

{'upserted_count': 91}
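
Both calls to `vectorize_chunks` number their records from 0, so ids from the text source and from the file can overlap, and an upsert with an id that already exists overwrites that record. One possible way around this is to prefix each id with its source and send the records in smaller batches. This is a sketch, not the notebook's method: `with_id_prefix`, `batched`, the prefixes, and the sample records are all hypothetical.

```python
from typing import Dict, Iterator, List


def with_id_prefix(vectors: List[Dict], prefix: str) -> List[Dict]:
    """Return copies of the records with ids rewritten as '<prefix>-<id>'."""
    return [{**v, "id": f"{prefix}-{v['id']}"} for v in vectors]


def batched(items: List, size: int) -> Iterator[List]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Tiny stand-ins for the records produced by vectorize_chunks
text_vecs = [{"id": "0", "values": [0.1]}, {"id": "1", "values": [0.2]}]
file_vecs = [{"id": "0", "values": [0.3]}]

all_vecs = with_id_prefix(text_vecs, "text") + with_id_prefix(file_vecs, "file")
ids = [v["id"] for v in all_vecs]
print(ids)  # ['text-0', 'text-1', 'file-0']
```

Each batch could then be sent with `index.upsert(vectors=batch)` inside `for batch in batched(all_vecs, 100):`, where the batch size of 100 is an assumption to tune for the index. `ask_a_question` would keep working unchanged, since it reads the id from the query response rather than constructing it.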

<h4>Question the chatbot</h4>

In [6]:
for question in questions:
    answers.append(ask_a_question(question))

Embeddings: [-0.015503785572946072, 0.007622694596648216, 0.005119479261338711, -0.01576218195259571, -0.010549033991992474, 0.018940458074212074, -0.029767267405986786, 0.00627903314307332, -0.009108473546802998, -0.008656280115246773, 0.01166013814508915, 0.012176930904388428, -0.00415049260482192, -0.00819762609899044, -0.011821635998785496, 0.024534739553928375, 0.02689906768500805, 0.011123965494334698, 0.018449503928422928, -0.009147233329713345, -0.03971552848815918, 0.005830069072544575, 0.01290690153837204, -0.021317705512046814, -0.006585878785699606, -0.004819093272089958, 0.029198795557022095, -0.012642044574022293, -0.005830069072544575, -0.022583847865462303, 0.018307385966181755, 0.0030490776989609003, -0.03325561806559563, -0.001703801448456943, -0.01822986826300621, 0.01832030527293682, 7.797314174240455e-05, 0.002420851495116949, 0.01162783894687891, 0.005035500042140484, 0.006976703181862831, -0.008701499551534653, 5.788685302832164e-05, -0.0032234953250736, -0.00841

<h4>Test the results of the answers.</h4>

In [7]:
# Walk the responses, questions, and answers in parallel
for query_response, question, answer in zip(query_responses, questions, answers):
    print(f"Match Score: {query_response['matches'][0]['score']}")
    print(f"Question: {question}")
    print(f"Answer:   {answer}\n\n")

Match Score: 0.862766266
Question: What is the most famous type of diamond cut?
Answer:   The most popular and famous type of diamond cut is the brilliant cut.


Match Score: 0.870021701
Question: What is the process of diamond certification?
Answer:   Diamond certification is a process in which a diamond is evaluated and graded based on its characteristics such as carat weight, cut, color, and clarity (known as the 4Cs). This evaluation is conducted by a reputable gemological laboratory, such as the Gemological Institute of America (GIA) or the International Gemological Institute (IGI). A certificate detailing these characteristics is provided to verify the diamond's quality and authenticity.


Match Score: 0.834440231
Question: Who is the top diamond dealer in the world?
Answer:   I don't know the answer to that beautiful person.


Match Score: 0.829233944
Question: What are the signs of diamond impurities?
Answer:   I don't know the answer to that beautiful person.


Match Score: 0.
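
In the four complete results shown above, the two questions answered with "I don't know" also have the two lowest match scores. Four data points are far too few to set a reliable cut-off, but the score could be used to skip the LLM call when the best match is weak. The sketch below assumes a plain dictionary shaped like the query response. `best_match_id` and the `MIN_SCORE` value of 0.85 are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional

MIN_SCORE = 0.85  # hypothetical cut-off, not taken from the notebook


def best_match_id(query_response: Dict, min_score: float = MIN_SCORE) -> Optional[str]:
    """Return the id of the highest-scoring match, or None if it falls below min_score."""
    matches = query_response["matches"]
    if not matches:
        return None
    top = max(matches, key=lambda m: m["score"])
    return top["id"] if top["score"] >= min_score else None


# Scores copied from the printed results; the ids are made up for the example
strong = {"matches": [{"id": "12", "score": 0.862766266}, {"id": "40", "score": 0.81}]}
weak = {"matches": [{"id": "7", "score": 0.829233944}]}

print(best_match_id(strong))
print(best_match_id(weak))
```

A caller could return the fallback message directly when the function yields `None`, instead of fetching the paragraph and invoking the chain.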

<h2>Ask it a question</h2>

In [8]:
app = gr.Interface(fn=ask_a_question,
                   inputs=gr.Textbox(label="Ask me about Diamonds"),
                   outputs=gr.Textbox(lines=10, label="Your answer about diamonds:", show_copy_button=True))
app.launch()

Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.




Embeddings: [-0.019042646512389183, -0.022632423788309097, 0.0004495985631365329, -0.03929123282432556, -0.007656321860849857, 0.026053929701447487, -0.01958952657878399, -0.027484232559800148, -0.029194984585046768, -0.004255849402397871, 0.015214485116302967, 0.020907647907733917, -0.0027519287541508675, -0.012823637574911118, -0.015522981993854046, 0.01682708039879799, 0.022646445780992508, -0.0036669012624770403, 0.02862006053328514, -0.02301103249192238, -0.02240806259214878, 0.01863599196076393, 0.009486266411840916, -0.01996813528239727, 0.004424119833856821, -0.005219900514930487, 0.021132009103894234, -0.012108487077057362, -0.015971703454852104, -0.006275098770856857, 0.0006967463414184749, -0.002154216868802905, -0.0071935770101845264, 0.0042032646015286446, -0.02071133255958557, 0.020795468240976334, 0.004942954983562231, 0.005055135581642389, 0.005272485315799713, 0.014190837740898132, -0.007663332857191563, 0.0011682551121339202, 0.003915802109986544, -0.01799095422029495

<h2>Image generator</h2>

In [12]:
def generate_image(prompt):
    # call the OpenAI API
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        quality="standard",
        n=1,
        )
    image_url = response.data[0].url
    return image_url

# Create a Gradio interface
interface = gr.Interface(
    fn=generate_image,  # Function to generate images
    inputs="text",  # Text input for the prompt
    outputs="image",  # Output is an image (URL of the generated image)
    title="DALL-E Image Generator",
    description="Enter a text prompt to generate an image using OpenAI's DALL-E."
)

# Launch the Gradio interface
interface.launch()

Running on local URL:  http://127.0.0.1:7866

To create a public link, set `share=True` in `launch()`.


