# Carbonia Package Tutorial

This tutorial shows how to use the `carbonia` package to embed and match data. We load an Excel file, embed one of its columns using OpenAI embeddings, and then match the embedded data against a target dataframe.

## Step 1: Import Necessary Libraries

First, we need to import the necessary libraries and change the current working directory to the parent directory.



In [2]:
# Import necessary libraries
import os
import pandas as pd

# Change current working directory to parent
os.chdir('..')

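If an import fails with `ModuleNotFoundError`, the module is not installed in the environment the notebook is running in. The sketch below is one way to check this before running the remaining cells. The module list is an assumption based on the imports used in this tutorial. Note that the `dotenv` module is provided by the `python-dotenv` distribution, and that `pd.read_excel` typically needs `openpyxl` to read `.xlsx` files; the install name for `carbonia` is not stated in this tutorial.

```python
import importlib.util

# Module names imported in this tutorial (an assumption; adjust as needed).
REQUIRED_MODULES = ["pandas", "dotenv", "carbonia"]


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be located for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

Any module reported as missing needs to be installed into the same environment as the notebook kernel before continuing.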

## Step 2: Import Functions from Carbonia Package
Next, we import the functions we need from the carbonia package.

In [None]:
# Import functions from the carbonia package
from carbonia.match import match
from carbonia.embeddings import embed

## Step 3: Load Environment Variables
We use the `python-dotenv` package (imported as `dotenv`) to load environment variables, including the OpenAI API key, from a `.env` file.

In [1]:
# Load environment variables from a .env file
import os
from dotenv import load_dotenv
load_dotenv()

# Read your OpenAI API key from the environment
api_key = os.getenv("OPENAI_API_KEY")

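`os.getenv` returns `None` when a variable is not set, so a missing key would only surface later, when the API is first called. A minimal sketch of failing early instead is shown here; `require_env` is a hypothetical helper and not part of `carbonia` or `dotenv`.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`.

    Raises RuntimeError if the variable is unset or empty.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "add it to your .env file or export it in your shell."
        )
    return value


# Usage (run after load_dotenv()):
# api_key = require_env("OPENAI_API_KEY")
```

Raising here keeps the error close to its cause, rather than letting `None` travel into later cells.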

## Step 4: Load the Excel File
We load the Excel file containing the data we want to embed and match.

In [None]:
# Load the Excel file
file_path = 'path/to/your/excel/file.xlsx'
data = pd.read_excel(file_path)

# Display the first few rows of the dataframe
data.head()
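Because Step 1 changes the working directory to the parent directory, relative paths are resolved from there. The following sketch checks a path before handing it to `pd.read_excel`; `check_excel_path` is a hypothetical helper introduced only for illustration.

```python
from pathlib import Path


def check_excel_path(path):
    """Return `path` as a Path if it names an existing Excel file.

    Raises ValueError for an unexpected extension and
    FileNotFoundError if the file does not exist.
    """
    p = Path(path)
    if p.suffix.lower() not in {".xlsx", ".xls"}:
        raise ValueError(f"Expected an Excel file (.xlsx or .xls), got: {p.name}")
    if not p.is_file():
        # resolve() shows the absolute path, which makes a wrong
        # working directory easy to spot.
        raise FileNotFoundError(f"No such file: {p.resolve()}")
    return p


# Usage:
# data = pd.read_excel(check_excel_path(file_path))
```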

## Step 5: Embed the Data
We use the embed function from the carbonia package to embed the data. We need to specify the column name(s) to embed.

In [None]:
# Specify the column name(s) to embed
embedding_column_name = 'your_column_name'

# Embed the data
embedded_data = embed(data, embedding_column_name=embedding_column_name, api_key=api_key)

# Display the first few rows of the embedded dataframe
embedded_data.head()
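A misspelled column name is an easy mistake at this step, and embedding calls use the OpenAI API, so it can be worth checking the name first. The sketch below assumes nothing about how `embed` validates its input; `missing_columns` is a hypothetical helper that works on any collection of column names, including `data.columns`.

```python
def missing_columns(available, required):
    """Return the names in `required` that are absent from `available`.

    The order of `required` is preserved in the result.
    """
    available_set = set(available)
    return [name for name in required if name not in available_set]


# Usage with the dataframe loaded in Step 4:
# missing = missing_columns(data.columns, [embedding_column_name])
# if missing:
#     raise KeyError(f"Columns not found in the Excel file: {missing}")
```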

## Step 6: Match the Data
Finally, we use the match function from the carbonia package to match the embedded data with a target dataframe.

In [None]:
# Load the target dataframe
target_file_path = 'path/to/your/target/excel/file.xlsx'
target_data = pd.read_excel(target_file_path)

# Match the data
matched_data = match(embedded_data, target_data, embedding_column_name=embedding_column_name, api_key=api_key)

# Display the first few rows of the matched dataframe
matched_data.head()
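This tutorial does not show how `match` compares embeddings internally. As background only, a common approach to embedding-based matching is to pick, for each source vector, the target vector with the highest cosine similarity. The sketch below illustrates that general idea with toy vectors; it is an assumption for illustration and not a description of the `carbonia` implementation.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Similarity is undefined for zero vectors; treat as no match.
    return dot / (norm_a * norm_b)


def best_match(query, candidates):
    """Return (index, similarity) of the candidate most similar to `query`."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    scores = [cosine_similarity(query, c) for c in candidates]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores[best_index]


# Toy 3-dimensional vectors standing in for real embeddings.
query = [1.0, 0.0, 0.0]
candidates = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
index, score = best_match(query, candidates)
print(index, round(score, 3))
```

Real embeddings have far more dimensions, but the comparison works the same way: the candidate pointing in the most similar direction wins.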