## 1. Load Energy Document Corpus
Load scraped or sample technical documents from the energy domain, one document per line.

In [1]:
import pyarrow
import sys
print("pyarrow version:", pyarrow.__version__)
print("Python executable:", sys.executable)

pyarrow version: 15.0.0
Python executable: /Users/justin/miniconda3_Jun2023/envs/lora-llm/bin/python


In [2]:
# Read one document per line, skipping blank lines
with open('../data/energy_text.txt', encoding='utf-8') as f:
    docs = [line.strip() for line in f if line.strip()]
docs[:2]

['Oil production forecasting is the prediction of future oil output from wells using historical and engineering data.',
 'Reservoir pressure is a key factor affecting oil production rates.']

## 2. Azure OpenAI Embeddings
Use text-embedding-ada-002 for vectorization.

In [3]:
import os
import openai
from typing import List

# Try to import sentence-transformers for local fallback
try:
    from sentence_transformers import SentenceTransformer
    _has_hf = True
except ImportError:
    _has_hf = False

# Check for Azure OpenAI credentials
AZURE_OPENAI_ENDPOINT = os.getenv('AZURE_OPENAI_ENDPOINT')
AZURE_OPENAI_KEY = os.getenv('AZURE_OPENAI_KEY')

def get_embedding(text: str) -> List[float]:
    """
    Returns embedding for text using Azure OpenAI if credentials are set, otherwise uses Hugging Face sentence-transformers.
    """
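    if AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_KEY:
        # openai>=1.0 reads azure_endpoint and api_version; the old api_base setting is ignored
        openai.api_type = 'azure'
        openai.azure_endpoint = AZURE_OPENAI_ENDPOINT
        openai.api_key = AZURE_OPENAI_KEY
        # Assumption: the API version comes from OPENAI_API_VERSION; the default below is a placeholder
        openai.api_version = os.getenv('OPENAI_API_VERSION', '2024-02-01')
        # On Azure, `model` must match the name of your deployment
        resp = openai.embeddings.create(input=[text], model='text-embedding-ada-002')
        return resp.data[0].embedding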
    elif _has_hf:
        # Use Hugging Face all-MiniLM-L6-v2 as fallback
        if not hasattr(get_embedding, '_model'):
            get_embedding._model = SentenceTransformer('all-MiniLM-L6-v2')
        return get_embedding._model.encode(text).tolist()
    else:
        raise RuntimeError('No embedding provider available. Set Azure OpenAI credentials or install sentence-transformers.')



> **Tip:** This notebook will use Azure OpenAI for embeddings if credentials are set, otherwise it will fall back to Hugging Face sentence-transformers (all-MiniLM-L6-v2). To enable Azure, set `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_KEY` in your environment. To use the local fallback, ensure `sentence-transformers` is installed.
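
To make concrete what retrieval does with these vectors, here is a minimal sketch of cosine similarity, one common way to compare embeddings. The three-dimensional vectors are made-up toy values, not real model output, and `cosine_similarity` is a helper name introduced here for illustration.

```python
import math
from typing import List

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical values for illustration only)
query_vec = [1.0, 0.0, 1.0]
doc_close = [2.0, 0.0, 2.0]   # same direction as the query
doc_far = [0.0, 1.0, 0.0]     # orthogonal to the query

sim_close = cosine_similarity(query_vec, doc_close)
sim_far = cosine_similarity(query_vec, doc_far)
print(sim_close, sim_far)
```

Vectors produced by different models are not comparable (they can even differ in length), so a collection should be populated and queried with the same embedding provider.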

## 3. Store Embeddings in ChromaDB
Store the embeddings in a ChromaDB collection for fast similarity search.

In [4]:
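import chromadb
client = chromadb.Client()  # in-memory client; data is not persisted
# get_or_create_collection avoids an error when the cell is re-run
collection = client.get_or_create_collection('energy_docs')
for i, doc in enumerate(docs):
    emb = get_embedding(doc)
    # upsert overwrites an existing ID instead of adding a duplicate
    collection.upsert(documents=[doc], embeddings=[emb], ids=[str(i)])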

## 4. RAG Pipeline: Retrieval + Generation
Retrieve the most relevant documents and generate an answer with Azure OpenAI.

In [5]:
def rag_query(query):
    q_emb = get_embedding(query)
    results = collection.query(query_embeddings=[q_emb], n_results=3)
    context = '\n'.join(results['documents'][0])
    prompt = f'Context: {context}\n\nQuestion: {query}\nAnswer:'
    # For openai>=1.0.0
    response = openai.chat.completions.create(
        model='gpt-4',
        messages=[{'role': 'user', 'content': prompt}]
    )
    return response.choices[0].message.content

rag_query('How does reservoir pressure affect oil production?')

'Reservoir pressure affects oil production by pushing the oil from underground reservoirs to the surface. When this natural pressure declines, the rate of oil production can decrease. This is why artificial lift methods or enhanced oil recovery techniques may be implemented to increase the pressure and enhance oil production.'
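
The prompt assembly inside `rag_query` can also be factored into a small helper that is easy to inspect without calling any API. The sketch below is one possible variant, not the notebook's method: `build_rag_prompt` and the instruction wording are assumptions introduced here, and each retrieved passage is numbered so an answer can be traced back to its source.

```python
from typing import List

def build_rag_prompt(context_docs: List[str], query: str) -> str:
    """Assemble a RAG prompt from retrieved passages and a user question."""
    # Number each passage so the model (and the reader) can refer to it
    numbered = [f"[{i}] {doc}" for i, doc in enumerate(context_docs, start=1)]
    context = "\n".join(numbered)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

example_prompt = build_rag_prompt(
    ["Reservoir pressure is a key factor affecting oil production rates."],
    "How does reservoir pressure affect oil production?",
)
print(example_prompt)
```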

## 5. Cost Tracking
Log token usage and estimate cost.

In [6]:
# Example: log token usage from an openai>=1.0 chat completion response
def log_cost(response):
    usage = response.usage  # response objects expose attributes, not dict keys
    tokens = usage.total_tokens
    cost = tokens * 0.00002  # Example cost per token
    print(f'Tokens: {tokens}, Estimated Cost: ${cost:.4f}')
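
Chat models are usually billed at different rates for prompt and completion tokens, so a single per-token rate is only a rough estimate. A minimal sketch of a split-rate calculation follows. `estimate_cost` and both default rates are placeholders chosen for illustration, not actual Azure OpenAI prices, and should be replaced with the rates for your deployment.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  prompt_rate: float = 0.00003,
                  completion_rate: float = 0.00006) -> float:
    """Estimate request cost in USD from token counts and per-token rates."""
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens * prompt_rate + completion_tokens * completion_rate

# Hypothetical token counts for a single RAG request
cost = estimate_cost(prompt_tokens=120, completion_tokens=60)
print(f"Estimated cost: ${cost:.4f}")
```

With an `openai>=1.0` chat response, the two counts are available as `response.usage.prompt_tokens` and `response.usage.completion_tokens`. Note that `rag_query` as written returns only the answer text, so it would need to return the response object as well before cost logging can be wired in.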

## 6. Prompt Engineering
Try few-shot and chain-of-thought prompts.

In [7]:
few_shot_prompt = '''
Context: Oil production depends on reservoir pressure.

Question: What is reservoir pressure?
Answer: Reservoir pressure is the pressure of fluids within a reservoir.

Question: How does it affect oil production?
Answer: Higher reservoir pressure generally increases oil production.

Question: What happens when pressure drops?
Answer: Oil production typically declines.
'''
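
The cell above covers the few-shot case. For the chain-of-thought case, a minimal sketch is to ask the model to reason step by step before giving its final answer. `build_cot_prompt` and the exact instruction wording are assumptions for illustration, not a prescribed template.

```python
def build_cot_prompt(context: str, question: str) -> str:
    """Build a prompt that asks for step-by-step reasoning before the answer."""
    return (
        f"Context: {context}\n\n"
        f"Question: {question}\n"
        "Think through the problem step by step, then give the final answer "
        "on a new line starting with 'Final answer:'.\n"
        "Reasoning:"
    )

cot_prompt = build_cot_prompt(
    "Oil production depends on reservoir pressure.",
    "What happens to production when reservoir pressure drops?",
)
print(cot_prompt)
```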

## 7. Evaluation
Evaluate retrieval accuracy and answer quality.
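
One simple way to quantify both is hit rate at k for retrieval and token-overlap F1 for answers. The sketch below is a minimal illustration under stated assumptions: `hit_rate_at_k` and `token_f1` are helper names introduced here, the document IDs and relevance labels are made up, and token-overlap F1 is only a rough proxy for answer quality.

```python
from collections import Counter
from typing import List

def hit_rate_at_k(retrieved: List[List[str]], relevant: List[str], k: int = 3) -> float:
    """Fraction of queries whose relevant doc ID appears in the top-k results."""
    if len(retrieved) != len(relevant):
        raise ValueError("need one relevant ID per query")
    if not relevant:
        return 0.0
    hits = sum(1 for ids, gold in zip(retrieved, relevant) if gold in ids[:k])
    return hits / len(relevant)

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Counter intersection keeps the minimum count of each shared token
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Made-up example: top-3 IDs returned for two queries, and the ID judged relevant
retrieved_ids = [["1", "0", "4"], ["2", "5", "3"]]
relevant_ids = ["1", "7"]
hit_rate = hit_rate_at_k(retrieved_ids, relevant_ids, k=3)
print(hit_rate)

f1 = token_f1("production declines when pressure drops",
              "oil production declines when pressure drops")
print(round(f1, 3))
```

With ChromaDB, the retrieved IDs for a query are available in `results['ids'][0]` of the value returned by `collection.query`.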