# AG CROP PLANNING TOOL

The Ag Decision Engine accepts inputs from a user (crop type, location, and timeframe) and develops a customized crop planning and protection plan for farmland owners or operators across North Carolina. The decision engine offers a basic interface for user input and leverages outputs from a crop performance prediction model and a RAG-enhanced LLM to build recommendations.

## CODE CONTENT

**1.0 - USER INTERFACE (U/I)** \
1.1 - Input Definition\
1.2 - U/I Design\
\
**2.0 - CROP PREDICTION MODEL** \
2.1 - TBD\
2.2 - TBD\
\
**3.0 - DECISION LOGIC MODEL** \
3.1 - TBD\
3.2 - TBD\
\
**4.0 - RECOMMENDATION BUILDER** \
4.1 - TBD\
4.2 - TBD

## 1.0 USER INTERFACE
**OVERVIEW**: A Gradio interface lets the user select a county, select the crops to consider, select one or more planting seasons, and enter a 4-digit planting year (from 2025 to 2035).
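The interface constrains each input to a county from the list, at least one crop, at least one season, and a year from 2025 to 2035. If the prediction model is ever called outside Gradio, the same checks can be applied in plain Python. The sketch below is illustrative only: `validate_inputs` is a hypothetical helper, and the allowed values are passed in rather than read from the notebook's `counties`, `crops`, and `seasons` lists.

```python
def validate_inputs(county, crop_list, selected_seasons, year,
                    valid_counties, valid_crops, valid_seasons,
                    min_year=2025, max_year=2035):
    """Hypothetical helper: returns a list of problems (empty if the inputs are valid)."""
    problems = []
    if county not in valid_counties:
        problems.append(f"Unknown county: {county!r}")
    if not crop_list:
        problems.append("Select at least one crop")
    problems.extend(f"Unknown crop: {c!r}" for c in crop_list if c not in valid_crops)
    if not selected_seasons:
        problems.append("Select at least one planting season")
    problems.extend(f"Unknown season: {s!r}" for s in selected_seasons if s not in valid_seasons)
    # gr.Number returns a float, so accept 2025.0 but reject 2025.5 or out-of-range years
    if year is None or float(year) != int(year) or not (min_year <= int(year) <= max_year):
        problems.append(f"Planting year must be a whole number from {min_year} to {max_year}")
    return problems


# Usage: valid inputs produce no problems
print(validate_inputs("Wake", ["Corn"], ["Spring"], 2025.0, {"Wake"}, {"Corn"}, {"Spring"}))
```

Returning a list of messages instead of raising on the first problem lets a UI show every issue at once.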

**DEPENDENCIES**
* _N/A_

**INSTRUCTIONS**
* _N/A_

**IMPORTS**

In [None]:
# Uses Gradio to build interface
import gradio as gr

# Removes unnecessary warnings
import warnings
warnings.filterwarnings('ignore')

## 1.1 Input Definition

In [None]:
# Defines the inputs for counties, crops and seasons
counties = ["Alamance", "Alexander", "Alleghany", "Anson", "Ashe", "Avery", "Beaufort", "Bertie", "Bladen", "Brunswick",
            "Buncombe", "Burke", "Cabarrus", "Caldwell", "Camden", "Carteret", "Caswell", "Catawba", "Chatham",
            "Cherokee", "Chowan", "Clay", "Cleveland", "Columbus", "Craven", "Cumberland", "Currituck", "Dare",
            "Davidson", "Davie", "Duplin", "Durham", "Edgecombe", "Forsyth", "Franklin", "Gaston", "Gates", "Graham",
            "Granville", "Greene", "Guilford", "Halifax", "Harnett", "Haywood", "Henderson", "Hertford", "Hoke", "Hyde",
            "Iredell", "Jackson", "Johnston", "Jones", "Lee", "Lenoir", "Lincoln", "Macon", "Madison", "Martin",
            "McDowell", "Mecklenburg", "Mitchell", "Montgomery", "Moore", "Nash", "New Hanover", "Northampton",
            "Onslow", "Orange", "Pamlico", "Pasquotank", "Pender", "Perquimans", "Person", "Pitt", "Polk", "Randolph",
            "Richmond", "Robeson", "Rockingham", "Rowan", "Rutherford", "Sampson", "Scotland", "Stanly", "Stokes",
            "Surry", "Swain", "Transylvania", "Tyrrell", "Union", "Vance", "Wake", "Warren", "Washington", "Watauga",
            "Wayne", "Wilkes", "Wilson", "Yadkin", "Yancey"]

crops = ['Barley', 'Corn', 'Cotton', 'Hay', 'Oats', 'Peanuts', 'Bell Peppers', 'Pumpkins', 'Soybeans', 'Squash',
         'Sweet Potatoes', 'Tobacco', 'Wheat']

seasons = ['Spring', 'Summer', 'Fall']

# Placeholder for the crop prediction model (returns dummy values until Section 2.0 is implemented)
def crop_prediction(county, crop_list, selected_seasons, year):
    # Placeholder function to simulate crop prediction
    crop_yields = [1.0] * len(crop_list)
    crop_values = [2.0] * len(crop_list)
    confidence_levels = [0.8] * len(crop_list)
    return crop_yields, crop_values, confidence_levels
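The placeholder returns three parallel lists, while Section 4.0 expects a dataframe of crop performance. One way to bridge them is to zip the lists into one record per crop. This is a sketch under assumptions: `to_records` and its field names are hypothetical, and yield units are not specified in this notebook. `pandas.DataFrame(records)` would turn the result into a dataframe.

```python
def crop_prediction(county, crop_list, selected_seasons, year):
    # Same placeholder as above, repeated so this block runs on its own
    crop_yields = [1.0] * len(crop_list)
    crop_values = [2.0] * len(crop_list)
    confidence_levels = [0.8] * len(crop_list)
    return crop_yields, crop_values, confidence_levels


def to_records(crop_list, crop_yields, crop_values, confidence_levels):
    """Hypothetical helper: one dict per crop, combining the parallel prediction lists."""
    if not (len(crop_list) == len(crop_yields) == len(crop_values) == len(confidence_levels)):
        raise ValueError("Prediction lists must be the same length as crop_list")
    return [
        {"crop": crop, "yield_per_acre": y, "value_per_acre": v, "confidence": c}
        for crop, y, v, c in zip(crop_list, crop_yields, crop_values, confidence_levels)
    ]


selected = ["Corn", "Wheat"]
records = to_records(selected, *crop_prediction("Wake", selected, ["Spring"], 2025))
print(records[0])
```

Checking the lengths up front catches a model that silently drops a crop, which `zip` alone would hide by truncating.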

## 1.2 U/I Design

In [None]:
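# Function called by the Gradio interface. Gradio passes one argument per input component,
# so the display-only farm image arrives first and is ignored.
def user_interface(farm_image, county, crop_list, selected_seasons, year):
    # gr.Number returns a float (e.g., 2025.0), so convert it to a whole year
    year = int(year) if year is not None else None

    # Call the crop prediction model with user inputs
    crop_yields, crop_values, confidence_levels = crop_prediction(county, crop_list, selected_seasons, year)
    
    # Display results (for now, just showing the inputs for demonstration)
    return f"County: {county}\nCrops: {', '.join(crop_list)}\nSeasons: {', '.join(selected_seasons)}\nYear: {year}"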

# Define the Gradio interface
inputs = [
    gr.Image(value="Images/ui_image.png", label="Farm Image"),  
    gr.Dropdown(choices=counties, label="Select County"),
    gr.CheckboxGroup(choices=crops, label="Crops to Consider"),
    gr.CheckboxGroup(choices=seasons, label="Planting Season(s)", value=seasons),
    gr.Number(label="Planting Year (YYYY)", value=2025, minimum=2025, maximum=2035)
]

# Defines Output Design
outputs = gr.Textbox(label="Planting and Protection Recommendations")

# Launches the Gradio interface
gr.Interface(fn=user_interface, inputs=inputs, outputs=outputs, title="Crop Planning and Protection Plan Generator").launch(share=True)

## 2.0 CROP PREDICTION MODEL
**OVERVIEW**: Trains an ML model on historical agricultural data for 12 crops grown in North Carolina between 2000 and 2020. Training features consist of annual and seasonal temperature and precipitation data. Training targets are production value ($) per acre and yield per acre.
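The overview names annual and seasonal climate features and per-acre targets. As a sketch of how such features might be derived from monthly records, the block below assumes meteorological seasons (Spring = Mar to May, Summer = Jun to Aug, Fall = Sep to Nov); the project's actual season definitions, data layout, and the helper names `climate_features` and `per_acre` are assumptions.

```python
from statistics import mean

# Assumption: meteorological seasons; the model's real season boundaries may differ
SEASON_MONTHS = {"Spring": (3, 4, 5), "Summer": (6, 7, 8), "Fall": (9, 10, 11)}


def climate_features(monthly_temp, monthly_precip):
    """Hypothetical helper. Inputs map month number (1-12) to mean temperature / total precipitation."""
    features = {
        "annual_temp_mean": mean(monthly_temp[m] for m in range(1, 13)),
        "annual_precip_total": sum(monthly_precip[m] for m in range(1, 13)),
    }
    for season, months in SEASON_MONTHS.items():
        key = season.lower()
        features[f"{key}_temp_mean"] = mean(monthly_temp[m] for m in months)
        features[f"{key}_precip_total"] = sum(monthly_precip[m] for m in months)
    return features


def per_acre(total, acres):
    """Hypothetical helper: converts a county total (production or $ value) to a per-acre target."""
    return total / acres if acres else None


temps = {m: float(m) for m in range(1, 13)}
precip = {m: 1.0 for m in range(1, 13)}
print(climate_features(temps, precip)["summer_temp_mean"])
```

Temperatures are averaged while precipitation is summed, since a seasonal rainfall total is usually more meaningful to crop growth than a monthly mean.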

**DEPENDENCIES**\
[dependencies]

**INSTRUCTIONS**\
[instructions]

**IMPORTS**

In [None]:
# Insert Imports

## 4.0 RECOMMENDATION BUILDER
**OVERVIEW**: Accepts a dataframe containing crop performance predictions and their associated justifications. Generates a recommendation narrative for each crop using an LLM, then supplements each recommendation with additional considerations and mitigation information retrieved through RAG.
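Per crop, the builder has to merge a row of predicted performance, its justification, and retrieved reference text into one LLM prompt. The sketch below shows one way to assemble that prompt with plain string formatting. The row fields (`crop`, `county`, `year`, `value_per_acre`, `justification`) and the helper `build_crop_prompt` are hypothetical and not taken from the project's actual dataframe.

```python
def build_crop_prompt(row, retrieved_snippets, max_snippets=3):
    """Hypothetical helper: builds one recommendation prompt for a single crop row."""
    context = "\n\n".join(retrieved_snippets[:max_snippets]) or "No supporting documents found."
    return (
        "You are an agricultural specialist advising a farmer in "
        f"{row['county']} County, North Carolina.\n"
        f"Crop: {row['crop']} (planting year {row['year']})\n"
        f"Predicted production value: ${row['value_per_acre']:,.2f} per acre\n"
        f"Model justification: {row['justification']}\n\n"
        f"Reference material:\n{context}\n\n"
        "Write a short planting recommendation and list mitigation steps for weather and climate risks."
    )


row = {"crop": "Corn", "county": "Wake", "year": 2025,
       "value_per_acre": 1234.5, "justification": "High summer rainfall"}
print(build_crop_prompt(row, ["Snippet A", "Snippet B"]))
```

Capping the number of snippets keeps the prompt within the context window of small local models such as `phi3:mini`.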

**DEPENDENCIES**
* Natural Language Processing
    * Local LLM: Ollama _([download](https://ollama.com/download/windows))_ running the `phi3:mini` model _([documentation](https://ollama.com/library/phi3))_
    * Hosted LLM: OpenAI _([documentation](https://platform.openai.com/docs/overview))_ _(ALTERNATIVE)_
* Document Loading, Embedding and Retrieval
    * LangChain _([documentation](https://python.langchain.com/v0.2/docs/introduction/))_ loads and splits documents; PDFs are loaded with `PyPDFLoader`, which requires the `pypdf` package
    * Unstructured _([documentation](https://docs.unstructured.io/welcome))_ provides the base loader class for the custom HTML and text loaders; BeautifulSoup (`bs4`) extracts text from HTML files
    * Ollama (`OllamaEmbeddings`) converts documents into embeddings by default; OpenAI _([documentation](https://platform.openai.com/docs/guides/embeddings/))_ embeddings are the hosted alternative
    * ChromaDB _([documentation](https://docs.trychroma.com/getting-started))_ stores embeddings


**INSTRUCTIONS**
1.  Start the Ollama service by running: `ollama serve`
2.  Leave the Ollama service running in the background while running the code
3.  Pull the latest version of the Ollama phi3 model by running: `ollama pull phi3:mini`
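Before running the RAG cells, it can help to confirm that the Ollama server is up and that `phi3:mini` has been pulled. The sketch below uses only the standard library and Ollama's `GET /api/tags` endpoint, which lists locally available models. It assumes the default address `http://localhost:11434`; adjust `base_url` if Ollama runs elsewhere. The function names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address (assumption: not overridden)


def parse_model_names(payload):
    """Extracts model names from an /api/tags response body."""
    return [m.get("name", "") for m in payload.get("models", [])]


def ollama_has_model(model="phi3:mini", base_url=OLLAMA_URL, timeout=2.0):
    """Returns True if the Ollama server answers and lists `model` as pulled locally."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):  # connection refused/timeouts (URLError is an OSError), bad JSON
        return False
    return model in parse_model_names(payload)


if __name__ == "__main__":
    print("phi3:mini ready:", ollama_has_model())
```

Returning `False` instead of raising makes the check safe to run at the top of the notebook.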


**IMPORTS**

In [25]:
import os # supports use of environment variables and file paths (os.path)
import time # supports timing of processing steps
from tqdm import tqdm # supports progress monitoring

# Assumes use of local LLM (if using hosted LLM, use the libraries below instead)
import ollama
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings

# # Uncomment if using hosted LLM (OpenAI)
# import openai # for hosted LLM option
# from langchain_openai import OpenAI, OpenAIEmbeddings # requires the langchain-openai package

# Loaders for various document types
from langchain_community.document_loaders import PyPDFLoader, BSHTMLLoader, UnstructuredFileLoader, DirectoryLoader
from bs4 import BeautifulSoup

# Retrieval chain, vector store, and text splitter
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Libraries for prompting and parsing
from langchain.prompts import PromptTemplate
from langchain.output_parsers import RegexParser

In [2]:
# Uncomment code below if using hosted LLM (OpenAI)

# # Helper function for loading API key
# from dotenv import load_dotenv, find_dotenv
# _ = load_dotenv(find_dotenv()) # reads local .env file

# # Loads the API key from the OPENAI_API_KEY environment variable
# openai.api_key = os.environ['OPENAI_API_KEY']

## Document Loading

In [14]:
# Checks the current directory path (helps the user confirm documents_path is set correctly)
current_dir = os.getcwd()
print("Current working directory:", current_dir)

Current working directory: C:\Users\Jamie\OneDrive\desktop\AI_Bootcamp\MOD_23_Project_3\AgProject3


**USER NOTE**: The variable below (`documents_path`) must be set to the location of the RAG documents.

In [15]:
# Sets the path to the RAG document directory (relative to the current working directory)
documents_path = './rag_content' # EDIT PATH FOR NEW DIRECTORY AS NEEDED

In [12]:
# Custom loader for HTML files saved with different encodings
class CustomHTMLLoader(UnstructuredFileLoader):
    def __init__(self, file_path: str):
        super().__init__(file_path)

    def _get_elements(self):
        # Try UTF-8 first, then Windows-1252. latin-1 goes last because it decodes
        # every byte sequence and therefore never raises UnicodeDecodeError.
        for encoding in ("utf-8", "cp1252", "latin-1"):
            try:
                with open(self.file_path, 'r', encoding=encoding) as f:
                    content = f.read()
                break
            except UnicodeDecodeError:
                continue

        # Strip HTML tags and keep the visible text, one block per line
        soup = BeautifulSoup(content, 'html.parser')
        text = soup.get_text(separator='\n', strip=True)
        return [text]

In [13]:
# Custom loader for text files saved with different encodings
class CustomTextLoader(UnstructuredFileLoader):
    def __init__(self, file_path: str):
        super().__init__(file_path)

    def _get_elements(self):
        # Try UTF-8 first, then Windows-1252. latin-1 goes last because it decodes
        # every byte sequence and therefore never raises UnicodeDecodeError.
        for encoding in ("utf-8", "cp1252", "latin-1"):
            try:
                with open(self.file_path, 'r', encoding=encoding) as f:
                    text = f.read()
                break
            except UnicodeDecodeError:
                continue
        return [text]
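The order of the encoding fallbacks matters. latin-1 assigns a character to every byte value, so it can never raise `UnicodeDecodeError` and only works as the final fallback. The standalone sketch below shows the same fallback chain on raw bytes. `decode_with_fallback` is an illustrative name, not part of LangChain. Reading the bytes once and decoding in memory also avoids reopening the file for each attempt.

```python
ENCODINGS = ("utf-8", "cp1252", "latin-1")  # latin-1 last: it maps every byte, so it never fails


def decode_with_fallback(raw: bytes, encodings=ENCODINGS):
    """Returns (text, encoding_used) for the first encoding that decodes `raw` cleanly."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"Could not decode with any of {encodings}")


# Usage with a file: decode_with_fallback(open(path, "rb").read())
print(decode_with_fallback(b"\x93hi\x94"))  # Windows "smart quotes" are invalid UTF-8
```

A byte such as `0x81` is undefined in both UTF-8 (as a lone byte) and cp1252, so it falls through to latin-1.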

In [16]:
# Sets up loaders for different file types
loaders = {
    "**/*.pdf": PyPDFLoader,
    "**/*.html": CustomHTMLLoader,
    "**/*.txt": CustomTextLoader
}
# Check if the directory exists
if not os.path.exists(documents_path):
    print(f"Directory not found: {documents_path}")
    print("Contents of current directory:")
    print(os.listdir(os.getcwd()))
    raise FileNotFoundError(f"Directory {documents_path} does not exist")

print(f"Directory found: {documents_path}")

# Function to get the appropriate loader
def get_loader(file_path):
    for glob_pattern, loader_class in loaders.items():
        if file_path.endswith(glob_pattern.split("*")[-1]):
            return loader_class(file_path)
    return CustomTextLoader(file_path)  # Default to CustomTextLoader

# Load documents
print("Loading documents...")
documents = []
errors = []

for root, _, files in os.walk(documents_path):
    for file in tqdm(files, desc="Processing files"):
        file_path = os.path.join(root, file)
        try:
            loader = get_loader(file_path)
            docs = loader.load()
            documents.extend(docs)
        except Exception as e:
            errors.append((file_path, str(e)))

print(f"Loaded {len(documents)} documents")
print(f"Encountered {len(errors)} errors")

if errors:
    print("\nErrors encountered:")
    for file_path, error in errors:
        print(f"{file_path}: {error}")

# Print summary of loaded documents
file_types = {}
for doc in documents:
    file_type = os.path.splitext(doc.metadata.get('source', ''))[-1].lstrip('.')
    file_types[file_type] = file_types.get(file_type, 0) + 1

print("\nSummary of loaded documents:")
for file_type, count in file_types.items():
    print(f"{file_type}: {count}")

Directory found: ./rag_content
Loading documents...


Processing files: 100%|██████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00,  8.71it/s]

Loaded 10 documents
Encountered 0 errors

Summary of loaded documents:
pdf: 7
html: 3





In [None]:
# # Downloads a PDF from a website (IPython expands {link} inside ! shell commands)
# link = "https://www.rma.usda.gov/sites/default/files/topics/good_farming_practices.pdf"
# !wget {link} -O good_farming_practices.pdf

In [17]:
# Split the documents into smaller chunks for better processing
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(documents)
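`RecursiveCharacterTextSplitter` first tries to split on paragraph, line, and word boundaries, so its chunks vary in size. To show what `chunk_size=1000` and `chunk_overlap=200` mean, the sketch below deliberately ignores separators and uses a plain character sliding window, which is not LangChain's algorithm. Each window holds at most 1,000 characters and starts 800 characters after the previous one, so consecutive chunks share 200 characters of context. A 5,276-character text yields 7 windows.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Fixed-size character windows; consecutive windows share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


sample = "".join(chr(65 + i % 26) for i in range(5276))
chunks = sliding_window_chunks(sample)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.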

In [20]:
# Splits the documents into smaller chunks for better processing
# Initializes the text splitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

# Initializes an empty list to store the split documents
start_time = time.time()
split_docs = []
total_chunks = 0

# Monitors text splitter progress
for i, doc in enumerate(tqdm(documents, desc="Splitting documents")):
    # Prints the index of the document being processed and its size before splitting
    print(f"\nProcessing document {i+1}/{len(documents)}")
    print(f"Document {i+1} size: {len(doc.page_content)} characters")
    
    # Performs the text splitting
    doc_start_time = time.time()
    split_doc = text_splitter.split_documents([doc])
    split_docs.extend(split_doc)
    
    # Calculates and prints statistics for the current document
    doc_time = time.time() - doc_start_time
    chunks_created = len(split_doc)
    total_chunks += chunks_created
    
    print(f"Document {i+1}/{len(documents)} processed:")
    print(f"  - Chunks created: {chunks_created}")
    print(f"  - Time taken: {doc_time:.2f} seconds")
    
    # Avoid division by zero
    if doc_time > 0:
        print(f"  - Processing speed: {len(doc.page_content) / doc_time:.2f} characters/second")
    else:
        print(f"  - Processing speed: N/A (processed too quickly to measure)")
    
    print(f"Total time elapsed: {time.time() - start_time:.2f} seconds")

# Final statistics
total_time = time.time() - start_time
total_characters = sum(len(doc.page_content) for doc in documents)

print("\nText splitting complete!")
print(f"Total documents processed: {len(documents)}")
print(f"Total chunks created: {total_chunks}")
print(f"Total characters processed: {total_characters}")
print(f"Total time taken: {total_time:.2f} seconds")

# Avoid division by zero in overall statistics
if total_time > 0:
    print(f"Overall processing speed: {total_characters / total_time:.2f} characters/second")
else:
    print("Overall processing speed: N/A (processed too quickly to measure)")

# Now split_docs contains all the split documents
docs = split_docs

Splitting documents: 100%|███████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2845.53it/s]


Processing document 1/10
Document 1 size: 5276 characters
Document 1/10 processed:
  - Chunks created: 7
  - Time taken: 0.00 seconds
  - Processing speed: N/A (processed too quickly to measure)
Total time elapsed: 0.00 seconds

Processing document 2/10
Document 2 size: 6632 characters
Document 2/10 processed:
  - Chunks created: 9
  - Time taken: 0.00 seconds
  - Processing speed: 6574479.82 characters/second
Total time elapsed: 0.00 seconds

Processing document 3/10
Document 3 size: 1079 characters
Document 3/10 processed:
  - Chunks created: 2
  - Time taken: 0.00 seconds
  - Processing speed: N/A (processed too quickly to measure)
Total time elapsed: 0.00 seconds

Processing document 4/10
Document 4 size: 18854 characters
Document 4/10 processed:
  - Chunks created: 24
  - Time taken: 0.00 seconds
  - Processing speed: 18882380.04 characters/second
Total time elapsed: 0.00 seconds

Processing document 5/10
Document 5 size: 2475 characters
Document 5/10 processed:
  - Chunks create




## Text Embedding

In [22]:
# Generate embeddings for the document chunks and store them in Chroma
print("Generating embeddings and creating vector store...")
start_time = time.time()

# Use Ollama for embeddings (NOTE: use the commented line instead if using hosted LLM)
embeddings = OllamaEmbeddings(model="phi3:mini")
# embeddings = OpenAIEmbeddings()

# Embed in batches so the progress bar reflects real work. The vector store is created
# from the first batch, and the remaining batches are appended with add_documents
# (Chroma.from_documents has no progress callback, and calling it twice embeds everything twice).
batch_size = 32
vector_store = None
with tqdm(total=len(docs), desc="Embedding chunks") as pbar:
    for batch_start in range(0, len(docs), batch_size):
        batch = docs[batch_start:batch_start + batch_size]
        if vector_store is None:
            vector_store = Chroma.from_documents(batch, embeddings)
        else:
            vector_store.add_documents(batch)
        pbar.update(len(batch))

total_time = time.time() - start_time

print("\nEmbedding generation and vector store creation completed.")
print(f"Total time taken: {total_time:.2f} seconds")
print(f"Average time per chunk: {total_time / max(len(docs), 1):.2f} seconds")

In [24]:
# Create a retriever using the vector store
retriever = vector_store.as_retriever()
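Conceptually, the retriever embeds the query and returns the stored chunks whose vectors are closest to it. With no arguments, `as_retriever()` returns the top 4 chunks (LangChain's default `k`), and this can be changed with `vector_store.as_retriever(search_kwargs={"k": 6})`. The sketch below shows top-k retrieval by cosine similarity over toy vectors. It illustrates the idea and is not Chroma's implementation; Chroma's default distance metric is L2, so rankings can differ for unnormalized vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=4):
    """Returns indices of the k chunk vectors most similar to query_vec, best first."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]


# Toy 2-D "embeddings" for four chunks
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.0], chunk_vecs, k=2))
```

Only the top-k chunks reach the LLM, so `k` trades answer coverage against prompt length.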

## Prompt Design

In [32]:
# Define a prompt template. The "stuff" chain fills {context} with the retrieved chunks,
# and RetrievalQA passes the user's query in as {question}.
prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "You are an agricultural specialist who advises farmers on how to optimize farm operations "
        "and mitigate against weather and climate disasters. Use the context below to respond.\n\n"
        "Context:\n{context}\n\n"
        "Question: {question}\n\n"
        "Answer:"
    ),
)

In [31]:
# If using Ollama
llm = Ollama(model="phi3:mini")

# Set up the RetrievalQA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={
        "prompt": prompt_template,
    }
)
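The "stuff" chain inserts all retrieved chunks into a single `{context}` variable, and RetrievalQA supplies the user's query as `{question}`. A prompt that only declares `query` therefore fails with the "document_variable_name context was not found" validation error. The sketch below mimics the stuffing step with plain `str.format`, using `"\n\n"` as the separator between chunks; `stuff_prompt` is an illustrative name, not a LangChain function.

```python
def stuff_prompt(template, chunks, question, separator="\n\n"):
    """Concatenates ('stuffs') every retrieved chunk into one {context} string."""
    return template.format(context=separator.join(chunks), question=question)


template = "Context:\n{context}\nQuestion: {question}"
print(stuff_prompt(template, ["Rotate crops yearly.", "Plant cover crops in fall."],
                   "How can I reduce soil erosion?"))
```

Stuffing is simple and uses one LLM call, but every chunk must fit in the model's context window, which is another reason to keep `k` small with `phi3:mini`.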

In [None]:
# # Uncomment if using hosted LLM (OpenAI): set up the RetrievalQA chain with OpenAI instead of Ollama
# qa_chain = RetrievalQA.from_chain_type(
#     llm=OpenAI(),
#     chain_type="stuff",
#     retriever=retriever,
# )

### Output Parser

In [None]:
# Define an output parser
output_parser = RegexParser(
    pattern=r"Answer: (.*)",
    output_keys=["answer"]
)
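LangChain's `RegexParser` raises `ValueError` when its pattern does not match, unless `default_output_key` is set, and small models do not always echo an "Answer:" label. The standard-library sketch below shows the same extract-or-fall-back logic with `re`, so the parsing behavior can be checked without an LLM. `parse_answer` is an illustrative name.

```python
import re

# DOTALL lets (.*) capture a multi-line answer after the label
ANSWER_PATTERN = re.compile(r"Answer:\s*(.*)", re.DOTALL)


def parse_answer(text):
    """Returns the text after 'Answer:' if present, otherwise the whole response, stripped."""
    match = ANSWER_PATTERN.search(text)
    return (match.group(1) if match else text).strip()


print(parse_answer("Reasoning...\nAnswer: Plant early.\nUse cover crops."))
```

Without `re.DOTALL`, `.` stops at the first newline and the parser would silently drop everything after the first line of the answer.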

In [None]:
# # If using OpenAI
# # Set up the RetrievalQA chain with the prompt template (the output parser is applied to the result afterwards)
# qa_chain = RetrievalQA.from_chain_type(
#     llm=OpenAI(),
#     chain_type="stuff",
#     retriever=retriever,
#     return_source_documents=True,
#     chain_type_kwargs={"prompt": prompt_template},
# )

In [None]:
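# Example query (the chain applies prompt_template itself, so pass the raw question).
# invoke() is used because run() only supports chains with a single output key,
# and return_source_documents=True adds a second one.
query = "What is the capital of France?"
response = qa_chain.invoke({"query": query})
parsed_response = output_parser.parse(response["result"])

print(f"Response: {parsed_response['answer']}")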

### Direct RAG Query 

In [None]:
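# Example query; the chain returns a dict with the answer ("result") and the retrieved chunks ("source_documents")
query = "What is the capital of France?"
response = qa_chain.invoke({"query": query})
print(f"Response: {response['result']}")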
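Because the chain is built with `return_source_documents=True`, each response also carries the retrieved chunks, whose `metadata` records where they came from (`PyPDFLoader` adds a 0-based `page` alongside `source`). The sketch below lists each cited file once, using a hypothetical `Doc` dataclass as a stand-in for LangChain's `Document` so it runs without LangChain. With the real chain, it would be called as `format_sources(response["source_documents"])`.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:  # hypothetical stand-in for LangChain's Document (page_content + metadata)
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_sources(source_documents):
    """Returns unique 'file (page N)' labels in retrieval order; pages shown 1-based."""
    labels = []
    for doc in source_documents:
        source = doc.metadata.get("source", "unknown")
        page = doc.metadata.get("page")
        label = f"{source} (page {page + 1})" if page is not None else source
        if label not in labels:
            labels.append(label)
    return labels


retrieved = [Doc("a", {"source": "rag_content/guide.pdf", "page": 0}),
             Doc("b", {"source": "rag_content/guide.pdf", "page": 0}),
             Doc("c", {"source": "rag_content/notes.html"})]
print(format_sources(retrieved))
```

Showing sources next to each recommendation lets a farmer check the underlying guidance rather than trusting the LLM's summary alone.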