<a href="https://colab.research.google.com/github/nelslindahlx/KnowledgeReduce/blob/main/KGpkgBuildv4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Code Summary: Building a Knowledge Graph Package in Python

This notebook provides a comprehensive guide to creating a Python package for managing knowledge graphs. It is designed to walk you through the complete process of package creation, from initial setup to final packaging. The notebook is structured into seven distinct steps, each focusing on a critical aspect of the package development process.

### Summary of Steps and Code:

#### Step 1: Package Structure
- Creation of the directory layout for the package with `os.makedirs`: a top-level `knowledge_graph_pkg` directory containing a `tests` subdirectory.

#### Step 2: Initial Python Files
- **Importing Dependencies**: The core module imports `networkx`, `datetime`, and `enum`; `os` is already imported in Step 1.
- **Package Initialization**: Creation of the `__init__.py` file to designate `knowledge_graph_pkg` as a Python package.
- **Core Module**: Development of the `core.py` file, which forms the backbone of the package, containing the primary functionality of the knowledge graph.
- **Test Suite Setup**: Setup of a `unittest`-based test suite, with an `__init__.py` file and a `test_core.py` file in the `tests` subdirectory.

#### Step 3: Essential Package Files
- **Setup Script (`setup.py`)**: Crafting a `setup.py` file, which includes all necessary package metadata and installation instructions.
- **README Documentation (`README.md`)**: Writing a `README.md` file to provide an overview and guidelines for using the package.
- **License Agreement (`LICENSE`)**: Adding a `LICENSE` file to define the usage rights and restrictions for the package.

#### Step 4: Packaging
- **Zipping the Package**: Instructions on how to compress the package contents into a `.zip` file, making it ready for distribution.

#### Step 5: Download Preparation
- **Download Trigger**: Code snippet for enabling the download of the zipped package in Google Colab.
- **Suggested Update**: Mention how to handle this process outside of Google Colab.

#### Step 6: Make a Git Repository
- **Instructions**: Navigating to the package directory and initializing a Git repository.
- **Suggested Update**: Include steps for committing initial files and pushing to a remote repository.

#### Step 7: Download the repo from GitHub
- **Process**: Cloning a GitHub repository and installing the package using `pip`.
- **Suggested Update**: Provide guidance on versioning and releasing the package on GitHub.

Each step below pairs a short markdown explanation with the code that implements it, so the notebook can also serve as a template for building and distributing other Python packages.


# Step 1: Create the Package Structure

In [1]:
import os

# Create main package directory and subdirectories
os.makedirs('knowledge_graph_pkg', exist_ok=True)
os.makedirs('knowledge_graph_pkg/tests', exist_ok=True)
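
To confirm the layout before adding files, the directory tree can be listed. The following is a minimal sketch using only the standard library; the helper name `list_tree` is hypothetical and not part of the package, and the demo runs in a scratch directory rather than the notebook's working directory.

```python
import os
import tempfile
from pathlib import Path


def list_tree(root):
    """Return the relative paths of all directories under root, sorted."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()
    )


# Demonstrate in a scratch directory so the real working tree is untouched
with tempfile.TemporaryDirectory() as scratch:
    os.makedirs(os.path.join(scratch, "knowledge_graph_pkg", "tests"), exist_ok=True)
    layout = list_tree(scratch)

print(layout)  # ['knowledge_graph_pkg', 'knowledge_graph_pkg/tests']
```

In the notebook itself, `list_tree('.')` would list the directories created by the cell above along with anything else in the working directory.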

# Step 2: Create Initial Python Files

Package Initialization and Core Module:

In [2]:
%%writefile knowledge_graph_pkg/__init__.py
from .core import KnowledgeGraph
# This file now imports the KnowledgeGraph class from the core module

Writing knowledge_graph_pkg/__init__.py


Core Module with Knowledge Schema:

In [3]:
%%writefile knowledge_graph_pkg/core.py
"""
Core functionality of the knowledge graph package.
This module provides the basic knowledge schema and features for creating and managing knowledge graphs.
"""

import networkx as nx
from datetime import datetime
from enum import Enum

class ReliabilityRating(Enum):
    UNVERIFIED = 1
    POSSIBLY_TRUE = 2
    LIKELY_TRUE = 3
    VERIFIED = 4

class KnowledgeGraph:
    def __init__(self):
        self.graph = nx.DiGraph()

    def validate_fact_id(self, fact_id):
        if not isinstance(fact_id, str) or not fact_id:
            raise ValueError("Fact ID must be a non-empty string.")

    def validate_reliability_rating(self, rating):
        if not isinstance(rating, ReliabilityRating):
            raise ValueError("Reliability rating must be an instance of ReliabilityRating Enum.")

    def add_fact(self, fact_id, fact_statement, category, tags, date_recorded, last_updated,
                 reliability_rating, source_id, source_title, author_creator,
                 publication_date, url_reference, related_facts, contextual_notes,
                 access_level, usage_count):
        self.validate_fact_id(fact_id)
        self.validate_reliability_rating(reliability_rating)
        # Additional validations for other parameters can be added here
        try:
            # Conversion of list and datetime objects to strings for storage
            tags_str = ', '.join(tags) if tags else ''
            date_recorded_str = date_recorded.isoformat() if isinstance(date_recorded, datetime) else date_recorded
            last_updated_str = last_updated.isoformat() if isinstance(last_updated, datetime) else last_updated
            publication_date_str = publication_date.isoformat() if isinstance(publication_date, datetime) else publication_date

            # Quality score: formula is an assumption inferred from tests/test_core.py
            # (10 * reliability rating value + 2 * usage count)
            quality_score = 10 * reliability_rating.value + 2 * usage_count

            # Adding fact to the graph
            self.graph.add_node(fact_id, fact_statement=fact_statement, category=category,
                                tags=tags_str, date_recorded=date_recorded_str, last_updated=last_updated_str,
                                reliability_rating=reliability_rating, source_id=source_id, source_title=source_title,
                                author_creator=author_creator, publication_date=publication_date_str,
                                url_reference=url_reference, related_facts=related_facts, contextual_notes=contextual_notes,
                                access_level=access_level, usage_count=usage_count, quality_score=quality_score)
        except Exception as e:
            # Chain the original exception so its traceback is preserved
            raise Exception(f"Error adding fact: {e}") from e

    def get_fact(self, fact_id):
        self.validate_fact_id(fact_id)
        if fact_id not in self.graph:
            raise ValueError(f"Fact ID '{fact_id}' not found in the graph.")
        return self.graph.nodes[fact_id]

    def update_fact(self, fact_id, **kwargs):
        self.validate_fact_id(fact_id)
        if fact_id not in self.graph:
            raise ValueError(f"Fact ID '{fact_id}' not found in the graph.")
        for key, value in kwargs.items():
            # Only attributes that already exist on the fact may be updated
            if key not in self.graph.nodes[fact_id]:
                raise ValueError(f"Invalid attribute '{key}' for fact update.")
            self.graph.nodes[fact_id][key] = value

Writing knowledge_graph_pkg/core.py
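
`add_fact` stores tags as a comma-separated string and dates as ISO 8601 strings. The sketch below isolates the list and datetime conversions using only the standard library. `to_storable` is a hypothetical helper for illustration, not a function in `core.py`, and the `ReliabilityRating` enum is copied here so the block runs on its own.

```python
from datetime import datetime
from enum import Enum


class ReliabilityRating(Enum):
    UNVERIFIED = 1
    POSSIBLY_TRUE = 2
    LIKELY_TRUE = 3
    VERIFIED = 4


def to_storable(value):
    """Convert a value to the string form add_fact stores (hypothetical helper)."""
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, (list, tuple)):
        return ", ".join(value)
    return value


recorded = to_storable(datetime(2023, 1, 15, 9, 30))
tags = to_storable(["sky", "color"])
rating = ReliabilityRating(4)  # look up an enum member by its value

print(recorded)     # 2023-01-15T09:30:00
print(tags)         # sky, color
print(rating.name)  # VERIFIED
```

Values that are already strings pass through unchanged, which matches how `add_fact` accepts either a `datetime` or a pre-formatted date string.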


Test Module:

In [4]:
%%writefile knowledge_graph_pkg/tests/__init__.py
# This file allows the tests directory to be treated as a package

Writing knowledge_graph_pkg/tests/__init__.py


In [5]:
%%writefile knowledge_graph_pkg/tests/test_core.py
"""
Tests for the knowledge graph core functionality.
"""

import unittest
from knowledge_graph_pkg.core import KnowledgeGraph, ReliabilityRating
from datetime import datetime

class TestKnowledgeGraph(unittest.TestCase):

    def setUp(self):
        # Setup a KnowledgeGraph instance for each test
        self.kg = KnowledgeGraph()

    def test_graph_initialization(self):
        # Test if the graph is initialized correctly
        self.assertIsNotNone(self.kg.graph)
        self.assertEqual(len(self.kg.graph.nodes), 0)

    def test_adding_and_getting_fact(self):
        # Test adding a fact and then retrieving it
        fact_id = "fact1"
        self.kg.add_fact(fact_id, "The sky is blue", "Science", ["sky", "color"],
                         datetime.now(), datetime.now(),
                         ReliabilityRating.VERIFIED, "source1", "Nature Journal",
                         "Dr. Sky Watcher", datetime.now(), "https://example.com/fact1",
                         [], "Some notes", "public", 5)

        fact = self.kg.get_fact(fact_id)
        self.assertIsNotNone(fact)
        self.assertEqual(fact['fact_statement'], "The sky is blue")

    def test_fact_quality_score(self):
        # Test the quality score calculation
        fact_id = "fact2"
        self.kg.add_fact(fact_id, "Water boils at 100°C", "Science", ["water", "boiling point"],
                         datetime.now(), datetime.now(),
                         ReliabilityRating.VERIFIED, "source2", "Science Daily",
                         "Dr. H2O", datetime.now(), "https://example.com/fact2",
                         [], "Boiling point at sea level", "public", 10)

        fact = self.kg.get_fact(fact_id)
        expected_score = 10 * ReliabilityRating.VERIFIED.value + 2 * 10  # 10 * rating value + 2 * usage count
        self.assertEqual(fact['quality_score'], expected_score)

    # Additional tests can be added here for other functionalities like updating facts, error handling, etc.

if __name__ == '__main__':
    unittest.main()

Writing knowledge_graph_pkg/tests/test_core.py
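
The test module can be run from the directory that contains `knowledge_graph_pkg` with `python -m unittest discover`. A rough programmatic equivalent is sketched below. `run_tests` is a hypothetical helper, and the demo uses a throwaway test file so that it does not depend on the package or on `networkx` being installed.

```python
import tempfile
import unittest
from pathlib import Path


def run_tests(start_dir, pattern="test_*.py"):
    """Discover and run unittest test files under start_dir (hypothetical helper)."""
    suite = unittest.defaultTestLoader.discover(str(start_dir), pattern=pattern)
    return unittest.TextTestRunner(verbosity=0).run(suite)


# Demo with a throwaway test file; for the package, pass the directory
# that contains knowledge_graph_pkg instead.
with tempfile.TemporaryDirectory() as scratch:
    Path(scratch, "test_sample.py").write_text(
        "import unittest\n\n"
        "class TestSample(unittest.TestCase):\n"
        "    def test_addition(self):\n"
        "        self.assertEqual(1 + 1, 2)\n"
    )
    result = run_tests(scratch)

print(result.testsRun, result.wasSuccessful())
```

The returned `TestResult` also exposes `failures` and `errors`, which is convenient for inspecting a failing test from inside a notebook cell.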


# Step 3: Create setup.py, README.md, and LICENSE

Setup Script:

In [6]:
%%writefile knowledge_graph_pkg/setup.py
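from setuptools import setup

setup(
    name='knowledge_graph_pkg',
    version='0.1',
    author='Nels Lindahl',
    author_email='nels@nelslindahl.com',
    description='A Python package for creating and managing portable knowledge graphs',
    # setup.py sits inside the package directory, so find_packages() would only
    # discover 'tests'; map the package name to this directory explicitly
    packages=['knowledge_graph_pkg', 'knowledge_graph_pkg.tests'],
    package_dir={'knowledge_graph_pkg': '.'},
    install_requires=['requests', 'beautifulsoup4', 'networkx', 'spacy'],
    python_requires='>=3.6',
)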

Writing knowledge_graph_pkg/setup.py
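
Once the package has been installed, the `version` declared in `setup.py` can be read back from the installed metadata. This is a minimal sketch, assuming Python 3.8 or newer for `importlib.metadata` (the `setup.py` above only requires 3.6); `installed_version` is a hypothetical helper.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version once the package is installed, otherwise None
print(installed_version("knowledge_graph_pkg"))
```

Returning `None` instead of raising keeps the check usable in a notebook cell that runs before the install step.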


README File:

In [7]:
%%writefile knowledge_graph_pkg/README.md
# Knowledge Graph Package

This Python package facilitates creating and managing portable knowledge graphs.

## Installation
pip install knowledge_graph_pkg

## Usage
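from datetime import datetime
from knowledge_graph_pkg import KnowledgeGraph
from knowledge_graph_pkg.core import ReliabilityRating

kg = KnowledgeGraph()
now = datetime.now()
kg.add_fact('fact1', 'Example fact data', 'General', ['example'], now, now,
            ReliabilityRating.UNVERIFIED, 'source1', 'Example Source', 'Example Author',
            now, 'https://example.com/fact1', [], 'Example notes', 'public', 0)
print(kg.get_fact('fact1'))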

Writing knowledge_graph_pkg/README.md


License File:

In [8]:
%%writefile knowledge_graph_pkg/LICENSE
MIT License

Copyright (c) 2023 Nels Lindahl

Permission is hereby granted, free of charge, to any person obtaining a copy...

Writing knowledge_graph_pkg/LICENSE


# Step 4: Zip the Package

In [9]:
!zip -r knowledge_graph_package.zip knowledge_graph_pkg

  adding: knowledge_graph_pkg/ (stored 0%)
  adding: knowledge_graph_pkg/LICENSE (deflated 11%)
  adding: knowledge_graph_pkg/tests/ (stored 0%)
  adding: knowledge_graph_pkg/tests/test_core.py (deflated 63%)
  adding: knowledge_graph_pkg/tests/__init__.py (deflated 6%)
  adding: knowledge_graph_pkg/README.md (deflated 38%)
  adding: knowledge_graph_pkg/core.py (deflated 69%)
  adding: knowledge_graph_pkg/__init__.py (deflated 24%)
  adding: knowledge_graph_pkg/setup.py (deflated 39%)
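
The `zip` command above depends on a shell tool that may not be available outside Colab. A portable alternative, sketched here with the standard library's `shutil.make_archive`, produces a similar archive; `zip_package` is a hypothetical helper and the demo uses a stand-in package in a scratch directory.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_package(parent_dir, package_name, archive_base):
    """Zip parent_dir/package_name into archive_base + '.zip' and return its path."""
    return shutil.make_archive(
        archive_base, "zip", root_dir=parent_dir, base_dir=package_name
    )


# Demo in a scratch directory with a stand-in package
with tempfile.TemporaryDirectory() as scratch:
    pkg = Path(scratch, "knowledge_graph_pkg")
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    archive = zip_package(
        scratch, "knowledge_graph_pkg", str(Path(scratch, "knowledge_graph_package"))
    )
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

print(names)
```

In the notebook, the equivalent call would be `zip_package('.', 'knowledge_graph_pkg', 'knowledge_graph_package')`, run from the directory that contains the package.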


# Step 5: Trigger the Download

In [10]:
# Import the files module from google.colab
from google.colab import files

# Trigger the download; uncomment the next line to run it
# files.download('knowledge_graph_package.zip')
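
Outside Google Colab the `google.colab` import fails, and the archive is already on the local filesystem. One way to handle both cases is sketched below; `download_or_locate` is a hypothetical helper, and it assumes the archive from Step 4 exists when run inside Colab.

```python
import os


def download_or_locate(path):
    """Start a browser download in Colab; elsewhere return the file's absolute path."""
    try:
        from google.colab import files  # available only inside Google Colab
    except ImportError:
        # Outside Colab the archive is already on the local filesystem
        return os.path.abspath(path)
    files.download(path)
    return None


location = download_or_locate("knowledge_graph_package.zip")
print(location)
```

Importing inside the function keeps the rest of the notebook importable in environments where `google.colab` is not installed.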

# Step 6: Make a Git Repository

In [11]:
import subprocess

# Initialize a Git repository inside 'knowledge_graph_pkg' without changing the
# notebook's working directory, so later cells still run from the original location
try:
    subprocess.run(['git', 'init'], cwd='knowledge_graph_pkg', check=True)
    print(".git directory created successfully in 'knowledge_graph_pkg'.")
except (subprocess.CalledProcessError, FileNotFoundError) as e:
    print(f"Error during Git initialization: {e}")

.git directory created successfully in 'knowledge_graph_pkg'.
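
After `git init`, the usual next steps are an initial commit and, optionally, a push to a remote. The sketch below only builds and prints the commands so they can be reviewed first. `initial_commit_commands` and `run_commands` are hypothetical helpers, the remote URL is a placeholder, and `git commit` assumes `user.name` and `user.email` are already configured.

```python
import subprocess


def initial_commit_commands(remote_url=None, branch="main"):
    """Build the git commands for a first commit and, optionally, a first push."""
    commands = [
        ["git", "add", "."],
        ["git", "commit", "-m", "Initial commit"],
        ["git", "branch", "-M", branch],
    ]
    if remote_url:
        commands += [
            ["git", "remote", "add", "origin", remote_url],
            ["git", "push", "-u", "origin", branch],
        ]
    return commands


def run_commands(commands, cwd):
    """Run each command inside cwd, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=cwd, check=True)


# Inspect the commands first; the remote URL below is a placeholder
for command in initial_commit_commands("https://example.com/user/repo.git"):
    print(" ".join(command))
```

To execute them, something like `run_commands(initial_commit_commands(), cwd='knowledge_graph_pkg')` would commit locally without pushing.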


# Step 7: Download the repo from my GitHub

In [12]:
import subprocess
import sys

# URL of the GitHub repository
repo_url = "https://github.com/nelslindahlx/KnowledgeReduce.git"

# Clone the repository
subprocess.run(["git", "clone", repo_url], check=True)

# Path to the package inside the cloned repository
package_path = "KnowledgeReduce/knowledge_graph_pkg"

# Install the package using pip
subprocess.run([sys.executable, "-m", "pip", "install", package_path], check=True)


CompletedProcess(args=['/usr/bin/python3', '-m', 'pip', 'install', 'KnowledgeReduce/knowledge_graph_pkg'], returncode=0)
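
Re-running the cell above fails at `git clone` if the `KnowledgeReduce` directory already exists. A small guard, sketched here with a hypothetical `clone_if_missing` helper, skips the clone in that case; the demo passes an existing scratch directory, so no network access takes place.

```python
import os
import subprocess
import tempfile


def clone_if_missing(repo_url, dest):
    """Clone repo_url into dest unless dest already exists; return True if cloned."""
    if os.path.exists(dest):
        return False
    subprocess.run(["git", "clone", repo_url, dest], check=True)
    return True


# An existing directory is left alone, so no network access happens here
with tempfile.TemporaryDirectory() as existing:
    cloned = clone_if_missing(
        "https://github.com/nelslindahlx/KnowledgeReduce.git", existing
    )

print(cloned)  # False
```

The guard only checks that the path exists; it does not verify that the directory is an up-to-date clone of the repository.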