In [1]:
"""
Title: Text Embedding and Similarity Analysis for FAQs

Description:
This script processes a dataset of Frequently Asked Questions (FAQs) by generating embeddings for both questions 
and answers using OpenAI's text embedding model. The embeddings are numeric vectors that represent the meaning of 
the text, so they can be used for tasks such as semantic search and similarity matching. The script reads the data 
from a CSV file, generates the embeddings, and saves the updated dataset with the embeddings to a new CSV file.

Author: James Taylor
Date: 06/06/2024

Dependencies:
- numpy
- pandas
- openai

Ensure you have the necessary dependencies installed:
pip install numpy pandas openai

Usage:
- Reads the FAQ questions and answers from a CSV file (FAQ_Table.csv).
- Initializes the OpenAI client using an API key stored in a text file.
- Defines a function for generating embeddings and applies it to the question and answer columns.
- Saves the updated data with embeddings to a new CSV file.
"""

import pandas as pd
import openai

# Read API key from text file
with open('api_key.txt', 'r') as file:
    api_key_1 = file.read().strip()

# Initialize the OpenAI client
openai.api_key = api_key_1

# Load data from CSV
# (a bare df.head() here would display nothing, since a notebook only shows the cell's last expression)
df = pd.read_csv('FAQ_Table.csv')

def get_embedding(text, model="text-embedding-3-small"):
    """
    Generate an embedding for a given text using a specified model.

    Parameters
    ----------
    text : str
        The input text to be converted into an embedding.
    model : str, optional
        The model to be used for generating the embedding. Default is "text-embedding-3-small".

    Returns
    -------
    list of float
        The embedding vector for the input text.
    """
    text = text.replace("\n", " ")
    return openai.embeddings.create(input=[text], model=model).data[0].embedding

# Generate embeddings for answers and questions
df['answer_embedding'] = df['Answer'].apply(lambda x: get_embedding(x, model='text-embedding-3-small'))
df['question_embedding'] = df['Question'].apply(lambda x: get_embedding(x, model='text-embedding-3-small'))

# Print data types to verify embedding columns
print(df.dtypes)

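# Save the updated DataFrame to a new CSV
# Note: list-valued embedding columns are written as their string representation, e.g. "[-0.0298, ...]"
df.to_csv('embedded.csv', index=False)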
df.head()  # Display the first few rows of the updated DataFrame

Question ID            int64
Question              object
Answer                object
answer_embedding      object
question_embedding    object
dtype: object


,Question ID,Question,Answer,answer_embedding,question_embedding
0,1,What is the current interest rate for savings?,The current interest rate for savings accounts...,"[-0.029849905520677567, -0.002606721827760339,...","[-0.027089398354291916, -0.01929938793182373, ..."
1,2,How can I open a checking account?,You can open a checking account by visiting an...,"[0.011019216850399971, 0.04632323980331421, 0....","[0.03132950887084007, 0.031158041208982468, 0...."
2,3,What is the minimum balance for a savings acco...,The minimum balance for a savings account is $...,"[0.029641009867191315, 0.019177109003067017, 0...","[0.027614394202828407, 0.018171781674027443, 0..."
3,4,How do I apply for a personal loan?,You can apply for a personal loan online throu...,"[-0.0037767095491290092, 0.015247618779540062,...","[-0.0032004239037632942, -0.002346499124541878..."
4,5,What documents are required to open an account?,"To open an account, you need a valid ID, proof...","[0.0847620889544487, 0.011813902296125889, 0.0...","[0.038933608680963516, 0.07153330743312836, 0...."
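Because `to_csv` writes each list as its text representation, the embedding columns in `embedded.csv` come back as strings when the file is re-read with `pd.read_csv`. One way to restore them is to parse each cell with `ast.literal_eval`. The sketch below assumes the file layout shown above and demonstrates the round trip on a small made-up DataFrame written to an in-memory buffer rather than to the real file:

```python
import ast
import io

import pandas as pd

# Toy DataFrame with a list-valued embedding column (stand-in for the real data)
original = pd.DataFrame({
    "Question ID": [1, 2],
    "question_embedding": [[0.1, -0.2, 0.3], [0.5, 0.25, -0.75]],
})

# Write and re-read through an in-memory buffer instead of embedded.csv
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# After read_csv, each embedding cell is a string such as "[0.1, -0.2, 0.3]"
print(type(restored.loc[0, "question_embedding"]))

# Parse the strings back into Python lists of floats
restored["question_embedding"] = restored["question_embedding"].apply(ast.literal_eval)
print(restored.loc[0, "question_embedding"])
```

If the CSV is only used as an intermediate file, saving with `df.to_pickle` instead keeps the lists intact and avoids this parsing step, at the cost of a file that is not human-readable.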

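The title mentions similarity analysis, but the cell stops after saving the embeddings. A common next step is to rank FAQ entries by cosine similarity between a query embedding and the stored question embeddings. The sketch below is one way to do that, not the author's method: `top_matches` is a hypothetical helper, and tiny made-up 3-dimensional vectors stand in for the real embeddings (text-embedding-3-small returns 1536-dimensional vectors by default). In practice the query vector would come from `get_embedding` and the DataFrame would be `df` with its `question_embedding` column.

```python
import numpy as np
import pandas as pd


def cosine_similarity(query, matrix):
    """Cosine similarity between one query vector and each row of matrix."""
    query = np.asarray(query, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    return matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))


def top_matches(df, query_embedding, k=2, column="question_embedding"):
    """Hypothetical helper: return the k rows whose embeddings best match the query."""
    # Stack the object-dtype list column into a (rows, dimensions) float matrix
    matrix = np.array(df[column].tolist(), dtype=float)
    scores = cosine_similarity(query_embedding, matrix)
    best = np.argsort(scores)[::-1][:k]  # indices of the k highest scores, best first
    return df.iloc[best].assign(similarity=scores[best])[["Question", "similarity"]]


# Made-up vectors standing in for real question embeddings
faq = pd.DataFrame({
    "Question": ["Question A", "Question B", "Question C"],
    "question_embedding": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]],
})
query_vector = [0.9, 0.1, 0.0]  # in practice: get_embedding("user's question")

print(top_matches(faq, query_vector, k=2))
```

OpenAI's documentation states that its embeddings are normalized to length 1, in which case a plain dot product gives the same scores; the explicit normalization above keeps the helper correct for vectors from any source.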