jonathan2951/ONTOLOGY_GENERATION_API

Ontology Generation API

A FastAPI service for generating ontologies through LLM-powered workflows with MongoDB persistence and comprehensive quality assurance.

🎯 Overview

The Ontology Generation API generates ontologies through a multi-step, LLM-powered process. Given a business idea and a tenant, it produces structured ontologies, stores them in MongoDB, and runs a structural quality-assurance check on the result.

Workflow

The ontology generation process follows a structured 3-step workflow:

  1. Use Case Generation & Ranking

    • LLM generates 2-5 use cases based on business_idea and tenant inputs
    • Each use case is ranked with relevance and importance scores (0-1)
    • Results are saved to the MONGODB_RULES collection
    • Ranking evaluates:
      • Use case relevance to the business idea
      • Use case importance to the business idea
  2. ERD Generation

    • For each use case, an Entity-Relationship Diagram (ERD) is generated
    • Results are saved to the MONGODB_COLNAME collection
  3. Quality Assurance

    • Validates that the generated tables and columns contain the fields required by the Pydantic models
    • Scores tables and columns for compliance
    • Note: This is a structural validation, not content relevance validation
    • Results are saved to the MONGODB_QA collection
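The structural check in step 3 can be pictured as a required-field coverage score. The following is a minimal sketch, not the project's `qa.py`: it assumes a score is the percentage of required fields that are present and non-empty, and the field names in `REQUIRED_TABLE_FIELDS` and `REQUIRED_COLUMN_FIELDS`, as well as the helper names, are hypothetical.

```python
from typing import Any

# Hypothetical required fields; the real ones come from the project's Pydantic models.
REQUIRED_TABLE_FIELDS = ("name", "description", "columns")
REQUIRED_COLUMN_FIELDS = ("name", "data_type", "description")

# Values treated as "missing" for scoring purposes (an assumption of this sketch).
_EMPTY_VALUES = (None, "", [], {})


def score_item(item: dict[str, Any], required: tuple[str, ...]) -> int:
    """Return the percentage (0-100) of required fields that are present and non-empty."""
    present = sum(1 for field in required if item.get(field) not in _EMPTY_VALUES)
    return round(100 * present / len(required))


def score_table(table: dict[str, Any]) -> dict[str, Any]:
    """Score one table and each of its columns for structural completeness."""
    columns = table.get("columns") or []
    return {
        "tables_scores": [
            {"name": table.get("name"), "score": score_item(table, REQUIRED_TABLE_FIELDS)}
        ],
        "columns_scores": [
            {
                "name": column.get("name"),
                "score": score_item(column, REQUIRED_COLUMN_FIELDS),
                "parent_table": table.get("name"),
            }
            for column in columns
        ],
    }
```

A score of 100 here only means every required field is filled in; as noted above, it says nothing about whether the content is relevant.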

✨ Features

  • 🧠 LLM-Powered Generation: Leverages OpenAI models for intelligent ontology creation
  • 📊 Quality Assurance: Automated validation of generated ontologies
  • 🎯 Use Case Ranking: Intelligent scoring of use case relevance and importance
  • 🗄️ MongoDB Persistence: Robust data storage with MongoDB
  • 🔍 RESTful API: Clean, well-documented REST endpoints
  • 🎨 Streamlit Frontend: Interactive web UI for ontology management
  • 🐳 Docker Support: Containerized deployment ready
  • ☸️ Kubernetes Ready: Helm charts included for Kubernetes deployment
  • 📝 Type Safety: Pydantic models for request/response validation with complete type annotations across service layer and API routes
  • 🔒 Environment-Based Configuration: Secure configuration management
  • 📈 Structured Logging: Comprehensive logging throughout the application
  • 🛡️ Error Handling: Robust error handling with proper exception propagation and HTTP status codes
  • 📚 Auto-Generated Documentation: OpenAPI/Swagger documentation automatically generated from type annotations

🏗️ Architecture

Technology Stack

  • Framework: FastAPI 0.119+
  • Database: MongoDB (via PyMongo)
  • LLM: OpenAI API
  • Frontend: Streamlit
  • Python: 3.12+
  • Package Manager: UV (optional) or pip

Design Principles

  • Separation of Concerns: Clear separation between API routes, business logic, and data access
  • Modular Architecture: Routes organized by resource type
  • Type Safety: Comprehensive Pydantic models
  • Error Handling: Consistent error responses with appropriate HTTP status codes
  • Configuration Management: Centralized settings with environment variable validation

🚀 Quick Start

Prerequisites

  • Python 3.12 or higher
  • MongoDB instance (local or cloud)
  • OpenAI API key
  • (Optional) UV package manager

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd ontology-generation-api
  2. Install dependencies:

    Using pip:

    pip install -r requirements.txt

    Using UV (recommended):

    uv sync
  3. Set up environment variables:

    Copy the example environment file:

    cp env.example .env

    Edit .env with your configuration:

    # MongoDB Configuration
    MONGODB_URL="mongodb://localhost:27017"  # or mongodb+srv://...
    MONGODB_DBNAME="ontoai"
    MONGODB_RULES="ontologies_ideation"
    MONGODB_COLNAME="ontologies_v2"
    MONGODB_QA="ontologies_qa"
    
    # OpenAI Configuration
    OPENAI_API_KEY="your-openai-api-key"
    OPENAI_BASE_URL="https://api.openai.com/v1"
    OPENAI_MODEL_LARGE="gpt-4o"
    OPENAI_MODEL_SMALL="gpt-4o-mini"
    OPENAI_MODEL_NANO="gpt-4o-mini"
    
    # API Configuration (optional)
    HOST="0.0.0.0"
    PORT="8000"
    LOG_LEVEL="INFO"
  4. Start MongoDB (if running locally):

    # macOS
    brew services start mongodb/brew/mongodb-community
    
    # Linux
    sudo systemctl start mongod
    
    # Docker
    docker run -d -p 27017:27017 --name mongodb mongo:latest
  5. Run the API:

    Using UV:

    uv run uvicorn src.app:app --reload

    Using Python:

    python -m uvicorn src.app:app --reload

    Or directly:

    uvicorn src.app:app --reload
  6. Verify the API is running:

    • Visit http://localhost:8000/docs for interactive API documentation
    • Visit http://localhost:8000/ for the root endpoint
    • Visit http://localhost:8000/readiness to check MongoDB connection

🔧 Configuration

Required Environment Variables

| Variable | Description | Example |
| --- | --- | --- |
| MONGODB_URL | MongoDB connection URI | mongodb://localhost:27017 or mongodb+srv://... |
| MONGODB_DBNAME | MongoDB database name | ontoai |
| MONGODB_COLNAME | MongoDB collection for ontologies | ontologies_v2 |
| MONGODB_RULES | MongoDB collection for use case rules | ontologies_ideation |
| MONGODB_QA | MongoDB collection for QA results | ontologies_qa |
| OPENAI_API_KEY | OpenAI API key | sk-... |
| OPENAI_BASE_URL | OpenAI API base URL | https://api.openai.com/v1 |

Optional Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| HOST | Server host | 0.0.0.0 |
| PORT | Server port | 8000 |
| LOG_LEVEL | Logging level | INFO |
| OPENAI_MODEL_LARGE | Large OpenAI model | gpt-4o |
| OPENAI_MODEL_SMALL | Small OpenAI model | gpt-4o-mini |
| OPENAI_MODEL_NANO | Nano OpenAI model | gpt-4o-mini |

MongoDB Connection

  • Local MongoDB: Use mongodb://localhost:27017
  • API in Docker, MongoDB on the host: Use mongodb://host.docker.internal:27017
  • Cloud MongoDB: Use mongodb+srv://***/ format

📖 API Documentation

Interactive Documentation

Once the API is running, visit:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

The API documentation is automatically generated from the route handler type annotations, ensuring accuracy and consistency between the code and documentation.

Endpoints

All endpoints have complete type annotations for request parameters and response types, enabling:

  • Automatic OpenAPI schema generation
  • Type checking and validation
  • Better IDE support and autocomplete
  • Accurate API documentation

Health & Status

GET /

Root endpoint for basic health check.

Response:

{
  "message": "Ontology Generation API is running",
  "version": "1.0.0"
}
GET /readiness

Check MongoDB connection and service readiness.

Response:

{
  "status": "ready",
  "services_ready": true,
  "timestamp": "2025-11-06T21:59:49.625368+00:00",
  "version": "1.0.0"
}
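A deployment script or smoke test might poll this endpoint before sending traffic. The sketch below uses only the standard library; the helper names `is_ready` and `check_readiness` are hypothetical, and it assumes the response fields shown above (`status`, `services_ready`) are what signal readiness.

```python
import json
import urllib.request


def is_ready(payload: dict) -> bool:
    """Interpret a /readiness response body as a single boolean."""
    return payload.get("status") == "ready" and payload.get("services_ready") is True


def check_readiness(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Call GET /readiness and return True only if the service reports ready."""
    with urllib.request.urlopen(f"{base_url}/readiness", timeout=timeout) as response:
        return is_ready(json.load(response))
```

`check_readiness` raises `urllib.error.URLError` when the server cannot be reached, so callers that poll in a loop would typically catch that and retry.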

Ontologies

GET /api/v1/ontologies/health

Health check for the ontologies router.

Response:

{
  "msg": "Hello from ontologies router"
}
GET /api/v1/ontologies/ontologies

Get all ontologies from MongoDB.

Response: list[OntologiesMongo]

GET /api/v1/ontologies/ontologies_ids

Get all ontology ObjectIds from MongoDB.

Response: list[str]

Example:

[
  "69069d0c358d4c067d0e9156",
  "6906a30a96448b738a9d96a6",
  "6907ae6c7da72bcf7d70e1cf"
]
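Each ID in this list is a MongoDB ObjectId rendered as 24 hexadecimal characters. A client can check that shape before calling the ID-based endpoints; the helper name `is_valid_object_id` below is hypothetical and is not part of the API.

```python
import re

# A MongoDB ObjectId is 12 bytes, rendered as 24 hexadecimal characters.
_OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def is_valid_object_id(value: str) -> bool:
    """Return True if value has the textual shape of a MongoDB ObjectId."""
    return _OBJECT_ID_RE.fullmatch(value) is not None
```

A well-formed ID can still be unknown to the server, so this check complements the 404 handling of the endpoints rather than replacing it.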
GET /api/v1/ontologies/ontology/{id}

Get a specific ontology by ObjectId.

Parameters:

  • id (path): Ontology ObjectId

Response: OntologiesMongo

Note: Returns 404 if the ontology ID is not found.

Example:

{
  "ontologies": [
    {
      "use_case_name": "Product and Channel Profitability Analytics",
      "use_case_uid": "1",
      "use_case_relevance_score": 0.88,
      "use_case_relevance_score_motivation": "Directly informs pricing discounts...",
      "use_case_importance_score": 0.9,
      "use_case_importance_score_motivation": "High immediate impact on gross margins...",
      "domains": "Finance and product profitability analytics",
      "questions": [...],
      "concepts": [...],
      "erd": {}
    }
  ],
  "rules_id": "690d0d35f15220b7a7a8582f",
  "initial_business_idea": "finance and accounting",
  "_id": "690d0e66f15220b7a7a85830",
  "date_created": "2025-11-06T13:08:54.791000"
}
GET /api/v1/ontologies/ontology_ranking/{id}

Get ontology ranking (use case scores) by ObjectId.

Parameters:

  • id (path): Ontology ObjectId

Response: list[UseCaseRanking]

Note: Returns 404 if the ontology ID is not found.

Example:

[
  {
    "use_case_name": "Tariff-aware Costing and Wholesale Margin Optimization",
    "use_case_uid": "1",
    "use_case_relevance_score": 0.92,
    "use_case_relevance_score_motivation": "Tariffs directly influence landed cost...",
    "use_case_importance_score": 0.88,
    "use_case_importance_score_motivation": "Immediate profitability and alignment...",
    "initial_business_idea": "tariffs supply chain"
  }
]
GET /api/v1/ontologies/ontology_qa/{id}

Get QA results for an ontology by ObjectId.

Parameters:

  • id (path): Ontology ObjectId

Response: dict[str, Any] | None

Note:

  • Returns null (None in Python) if QA has not been run for this ontology yet
  • QA is automatically run when an ontology is created via the POST /api/v1/ontologies/create_ontology endpoint
  • Returns 404 if the ontology ID is not found

Example:

{
  "_id": "690d0e66f15220b7a7a85831",
  "use_case_id": "690d0e66f15220b7a7a85830",
  "qa": [
    {
      "id": "690d0e66f15220b7a7a85830",
      "use_case_id": "1",
      "tables_scores": [
        {
          "name": "RevenueFact",
          "score": 100
        }
      ],
      "columns_scores": [
        {
          "name": "RevenueFactId",
          "score": 100,
          "parent_table": "RevenueFact"
        }
      ]
    }
  ]
}
GET /api/v1/ontologies/ontology_candidate/best

Get the best ontology candidate (highest average of importance and relevance scores).

Parameters:

  • id (query): Ontology ObjectId

Response: Ontology

Note: Returns 404 if the ontology ID is not found. The best candidate is determined by averaging the use_case_relevance_score and use_case_importance_score for each use case.

Example:

{
  "use_case_name": "Product and Channel Profitability Analytics",
  "use_case_uid": "1",
  "use_case_relevance_score": 0.88,
  "use_case_relevance_score_motivation": "Directly informs pricing discounts...",
  "use_case_importance_score": 0.9,
  "use_case_importance_score_motivation": "High immediate impact on gross margins...",
  "domains": "Finance and product profitability analytics",
  "questions": [...],
  "concepts": [...],
  "erd": {}
}
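The selection rule described in the note is a plain average of the two scores. A minimal sketch of that rule follows; the function names are hypothetical, and the tie-breaking behaviour (the first use case with the highest average wins) is an assumption of this sketch, not something the README specifies.

```python
from typing import Any


def average_score(use_case: dict[str, Any]) -> float:
    """Average the relevance and importance scores of one use case."""
    return (
        use_case["use_case_relevance_score"] + use_case["use_case_importance_score"]
    ) / 2


def best_candidate(use_cases: list[dict[str, Any]]) -> dict[str, Any]:
    """Return the use case with the highest average score."""
    if not use_cases:
        raise ValueError("no use cases to rank")
    # max() keeps the first item among equal keys, so ties go to the earliest use case.
    return max(use_cases, key=average_score)
```

With the example above, relevance 0.88 and importance 0.9 average to 0.89, so a second use case would need a higher average than that to be selected.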
POST /api/v1/ontologies/create_ontology

Create a new ontology from a business idea (passed in the use_case field) and a tenant.

Request Body:

{
  "use_case": "finance and accounting",
  "tenant": "your-tenant-name"
}

Response: dict[str, str]

Example:

{
  "id": "690d0e66f15220b7a7a85830",
  "status": "ontology created, QA'ed, saved to MongoDB"
}

Note:

  • This endpoint triggers the full ontology generation workflow (use case generation, ERD creation, and QA validation)
  • QA is automatically performed and saved to MongoDB
  • Returns 500 if ontology generation fails at any step
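Calling this endpoint from a script needs only a JSON POST. The sketch below uses the standard library; the helper names are hypothetical, and the generous default timeout is an assumption based on the endpoint running the whole generation workflow in a single request.

```python
import json
import urllib.request


def build_create_request(
    use_case: str, tenant: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build the POST request for /api/v1/ontologies/create_ontology."""
    body = json.dumps({"use_case": use_case, "tenant": tenant}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/ontologies/create_ontology",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_ontology(
    use_case: str,
    tenant: str,
    base_url: str = "http://localhost:8000",
    timeout: float = 600.0,  # assumed: generation involves several LLM calls
) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_create_request(use_case, tenant, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A 500 response surfaces as `urllib.error.HTTPError`, so a caller that wants to retry or log the failure would catch that exception around `create_ontology`.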
DELETE /api/v1/ontologies/delete/{id}

Delete an ontology by ObjectId.

Parameters:

  • id (path): Ontology ObjectId

Response: dict[str, str | bool]

Example:

{
  "id": "690d0e66f15220b7a7a85830",
  "acknowledged": true
}

Note:

  • Returns 404 if the ontology ID is not found
  • Returns 500 if deletion fails

💻 Development

Development Setup

  1. Install development dependencies:

    pip install -r requirements.txt
    # or
    uv sync
  2. Run with auto-reload:

    uvicorn src.app:app --reload --host 0.0.0.0 --port 8000
  3. Run with UV:

    uv run uvicorn src.app:app --reload

Code Quality

The codebase follows best practices for type safety and error handling:

  • Complete Type Annotations: All service layer and API route handler functions have proper return type annotations
  • API Type Safety: All FastAPI route handlers are fully typed, enabling automatic OpenAPI schema generation and better IDE support
  • Error Handling: Functions properly raise exceptions instead of silently failing
  • Type Safety: Uses Pydantic models for data validation and type checking
  • MongoDB Operations: Proper handling of None returns from database queries

Code Structure

The project follows a clean architecture with clear separation of concerns:

src/
├── api/              # API routes and endpoints
│   ├── routes/       # Route handlers by resource
│   └── routes.py     # Main API router
├── config/           # Configuration management
│   └── settings.py   # Environment settings
├── models/           # Pydantic models
│   ├── health_models.py
│   ├── ontologies_models.py
│   ├── qa_models.py
│   └── use_cases_models.py
├── services/         # Business logic
│   ├── ontologies_service.py
│   └── openai_service.py
├── utils/            # Utility functions
│   ├── prompts.py    # LLM prompts
│   └── qa.py         # QA validation
├── frontend/         # Streamlit frontend pages
│   └── pages/
├── app.py            # FastAPI application
└── startup.py        # Startup and shutdown logic

🐳 Docker Deployment

Build Docker Image

docker build -t ontology-generation-api .

Run Docker Container

For local MongoDB:

docker run -d \
  --name ontology-generation-api-container \
  -p 8000:8000 \
  -e MONGODB_URL=mongodb://host.docker.internal:27017 \
  -e MONGODB_DBNAME=ontoai \
  -e MONGODB_COLNAME=ontologies_v2 \
  -e MONGODB_RULES=ontologies_ideation \
  -e MONGODB_QA=ontologies_qa \
  -e OPENAI_API_KEY=your-api-key \
  -e OPENAI_BASE_URL=https://api.openai.com/v1 \
  ontology-generation-api

For cloud MongoDB:

docker run -d \
  --name ontology-generation-api-container \
  -p 8000:8000 \
  -e MONGODB_URL=mongodb+srv://user:password@cluster.mongodb.net/ \
  -e MONGODB_DBNAME=ontoai \
  -e MONGODB_COLNAME=ontologies_v2 \
  -e MONGODB_RULES=ontologies_ideation \
  -e MONGODB_QA=ontologies_qa \
  -e OPENAI_API_KEY=your-api-key \
  -e OPENAI_BASE_URL=https://api.openai.com/v1 \
  ontology-generation-api

Docker Compose

Use the provided docker-compose.yml for a complete setup with MongoDB:

docker-compose up -d

View Logs

docker logs ontology-generation-api-container

Expected output:

INFO:     Started server process [7]
INFO:     Waiting for application startup.
2025-10-20 23:38:17,443 - src.startup - INFO - ✅ Mongodb connection established
2025-10-20 23:38:17,464 - src.startup - INFO - ✅ openai connection established
2025-10-20 23:38:17,464 - src.startup - INFO - ✅ Ontology Generation API initialization complete!
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

🎨 Frontend

The project includes a Streamlit-based web UI for interacting with the API.

Running the Frontend

streamlit run streamlit_app.py

The frontend will be available at http://localhost:8501

Frontend Pages

  1. View Ontologies: View a specific ontology by selecting its ID
  2. Create Ontology: Create a new ontology based on use case and tenant
  3. Details Ontology: View detailed information about an ontology
  4. Read Me: Instructions and documentation

Frontend Configuration

Set the backend URL via environment variable:

export BACKEND_URL=http://127.0.0.1:8000

Or modify it in streamlit_app.py:

BACKEND_URL = os.getenv("BACKEND_URL", "http://127.0.0.1:8000")

🧪 Testing

Run tests using pytest:

pytest tests/test.py

Or with verbose output:

pytest tests/test.py -v

🐛 Troubleshooting

Common Issues

1. MongoDB Connection Failed

Symptoms: ConnectionFailure error or readiness check fails

Solutions:

  • Verify MongoDB is running: mongosh --eval "db.adminCommand('ping')" (on older installations, use the legacy mongo shell with the same arguments)
  • Check MONGODB_URL is correct
  • For Docker: Use mongodb://host.docker.internal:27017 for local MongoDB
  • Check firewall settings
  • Verify MongoDB credentials (for cloud instances)

2. OpenAI API Errors

Symptoms: API calls fail with authentication or rate limit errors

Solutions:

  • Verify OPENAI_API_KEY is set correctly
  • Check API key has sufficient credits
  • Verify OPENAI_BASE_URL is correct
  • Check rate limits in OpenAI dashboard

3. Missing Environment Variables

Symptoms: ValueError: Missing required environment variables

Solutions:

  • Copy env.example to .env
  • Verify all required variables are set
  • Check variable names match exactly (case-sensitive)
  • Restart the application after setting variables

4. Port Already in Use

Symptoms: Address already in use error

Solutions:

  • Change PORT environment variable
  • Kill the process using the port: lsof -ti:8000 | xargs kill
  • Use a different port: uvicorn src.app:app --port 8001
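On systems without lsof, a few lines of Python can confirm whether something is already listening before starting the server. This is an illustrative helper (the name `is_port_in_use` is hypothetical) that simply attempts a TCP connection to the port.

```python
import socket


def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise, instead of raising.
        return sock.connect_ex((host, port)) == 0
```

For example, `is_port_in_use(8000)` can be checked before launching uvicorn, falling back to another port when it returns True.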

5. Import Errors

Symptoms: ModuleNotFoundError or import errors

Solutions:

  • Ensure virtual environment is activated
  • Reinstall dependencies: pip install -r requirements.txt
  • Verify Python version is 3.12+
  • Check PYTHONPATH is set correctly

Getting Help

  1. Check Logs: Review application logs for detailed error information
  2. API Documentation: Use interactive docs at /docs to test endpoints
  3. Validate Configuration: The app validates settings on startup
  4. MongoDB Status: Check MongoDB connection with /readiness endpoint

📁 Project Structure

ontology-generation-api/
├── deploy_script.sh          # Deployment script
├── docker-compose.yml        # Docker Compose configuration
├── Dockerfile                # Docker image definition
├── env.example               # Environment variables example
├── pyproject.toml            # Project configuration (UV)
├── requirements.txt          # Python dependencies
├── README.md                 # This file
├── streamlit_app.py          # Streamlit frontend entry point
├── uv.lock                   # UV lock file
├── infra/                    # Infrastructure as code
│   └── helm/                 # Kubernetes Helm charts
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
│           ├── _helpers.tpl
│           ├── deployment.yaml
│           └── service.yaml
├── src/                      # Source code
│   ├── api/                  # API routes
│   │   ├── routes/
│   │   │   ├── __init__.py
│   │   │   └── ontology.py
│   │   └── routes.py
│   ├── app.py                # FastAPI application
│   ├── config/               # Configuration
│   │   ├── __init__.py
│   │   └── settings.py
│   ├── frontend/             # Streamlit frontend
│   │   ├── __init__.py
│   │   └── pages/
│   │       ├── __init__.py
│   │       ├── create.py
│   │       ├── delete.py
│   │       ├── details.py
│   │       ├── instructions.py
│   │       ├── read_use_cases.py
│   │       └── read.py
│   ├── models/               # Pydantic models
│   │   ├── health_models.py
│   │   ├── ontologies_models.py
│   │   ├── qa_models.py
│   │   └── use_cases_models.py
│   ├── services/             # Business logic
│   │   ├── __init__.py
│   │   ├── ontologies_service.py
│   │   └── openai_service.py
│   ├── startup.py            # Startup/shutdown logic
│   └── utils/                # Utilities
│       ├── __init__.py
│       ├── prompts.py
│       └── qa.py
└── tests/                    # Tests
    ├── __init__.py
    └── test.py

📄 License

[Add your license information here]

🤝 Contributing

[Add contributing guidelines here]


Version: 1.0.0
Last Updated: 2025-11-08
