# ChaotEX data models and pipeline

*reference:* ChatGPT -> `ChaotEX` -> `ChaotEX: pipleine`  
```my_input
raw->rough->polished->published

Or

source->segment->ordered_segments_with_context-> versioned_artifacts

Each stage has its own canonical storage.
Data models ensure lineage can be tracked.
This generic model can work for the 4 stages of cleaning unorganized data (videos, pictures, notes, etc.), eventually.

The methods to move between levels change depending on input data type, but the stages remain the same.

Don't rewrite, just looking for feedback on my idea and how it compares to existing successful models in the world of tech
```

## Pipeline Concept

   raw->         rough->        polished->   published  
user_input->source->capsule->module->corpus->artifacts

---

From user_upload + context comes source.

From source comes capsule,  
by defining boundaries and adding context.  

From capsule comes module,  
by organizing capsules and adding context.  

From modules comes corpus,  
by storing similar modules in a canonical location.

From corpus comes artifacts,  
by publishing all or parts of corpus.

---

Basically,  
1. `User` will upload data and add context to create a `Source`.
2. A `Capsule` is created by defining segments of a `Source` and adding context.
3. A `Module` consists of organized `Capsules` with context.
4. A `Corpus` consists of organized `Modules` with context.
5. `Artifacts` are exported from a `Corpus`.

`Context` comes in different forms:  
- `metadata` (what is this)
- `interpretive` (structured; `comments`, `ideas`)  
- `notes` (unstructured; can develop into interpretive)
- `tags` (search/filter)

`notes` are used to fill in the gaps of input where a structured response doesn't exist, at least not yet.
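A minimal sketch of how the four context forms could travel together as one object. The `Context` class, its field names, and `matches_tag` are assumptions made for illustration; the cells further down keep these fields directly on each record instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Context:
    """Hypothetical container for the four context forms."""
    metadata: Dict[str, str] = field(default_factory=dict)  # what is this
    comments: List[str] = field(default_factory=list)       # interpretive, structured
    ideas:    List[str] = field(default_factory=list)       # interpretive, structured
    notes:    str = ""                                       # unstructured free text
    tags:     List[str] = field(default_factory=list)       # search/filter

    def matches_tag(self, tag: str) -> bool:
        """Case-insensitive tag lookup, for search/filter."""
        return tag.lower() in (t.lower() for t in self.tags)


ctx = Context(
    metadata={"data_type": "video"},
    notes="coach had good feedback",
    tags=["tunnel", "coach"],
)
print(ctx.matches_tag("Tunnel"))  # True
```

Keeping `notes` as a plain string leaves room to promote parts of it into `comments` or `ideas` later, once a structured form exists.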

## 12-11-2025 at 11:05pm  

I just learned the importance of not getting stuck in a feedback loop without action.

Building is what makes the feedback work.  But feedback doesn't come without something in place.

I can now successfully `print_report()`.  

Now I need to build:

- a small temp "database" using `build_()`, `save_()`, `load_()` functions.


If this ever fails, your API is wrong:
```python
obj == load(save(obj))
```
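One possible shape for the temp "database", sketched with JSON files and the standard library only. `DemoRecord`, `build_record`, `save_record`, and `load_record` are hypothetical names standing in for the real `build_()`, `save_()`, `load_()` functions, which are not written yet.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class DemoRecord:
    """Hypothetical stand-in for any stage record (Source, Capsule, ...)."""
    record_id: int
    title:     str
    tags:      List[str]


def build_record(*, record_id: int, title: str, tags: List[str]) -> DemoRecord:
    """Build a record from keyword-only arguments."""
    return DemoRecord(record_id=record_id, title=title, tags=list(tags))


def save_record(obj: DemoRecord, directory: Path) -> Path:
    """Write the record as JSON and return the file path."""
    path = directory / f"{obj.record_id}.json"
    path.write_text(json.dumps(asdict(obj), indent=2), encoding="utf-8")
    return path


def load_record(path: Path) -> DemoRecord:
    """Read a JSON file back into a record."""
    return DemoRecord(**json.loads(path.read_text(encoding="utf-8")))


# Round-trip check inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    obj = build_record(record_id=1, title="demo", tags=["tunnel", "coach"])
    restored = load_record(save_record(obj, Path(tmp)))
    round_trip_ok = (obj == restored)

print(round_trip_ok)  # True
```

One thing to watch: JSON has no tuple type, so a field annotated as a `Tuple` comes back as a list and the equality check fails unless `load_` converts it back.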

In [1]:
from dataclasses import dataclass
from typing import List, Tuple

In [25]:
@dataclass
class User:
    """User information"""
    username:   str  # unique username (<10 chars?)
    first_name: str  # user first name
    last_name:  str  # user last name

    def __str__(self):
        return (
            f"User\n"
            f"--------------------\n"
            f"username:   {self.username}\n"
            f"first_name: {self.first_name}\n"
            f"last_name:  {self.last_name}\n"
        )

def build_user(
    *,
    username: str,
    first_name: str,
    last_name: str,
) -> User:
    """Build a User from keyword-only arguments."""
    return User(
        username   = username,
        first_name = first_name,
        last_name  = last_name,
    )

user = build_user(
    username   = "noahfrank",
    first_name = "Noah",
    last_name  = "Frank",
)

print(user)

User
--------------------
username:   noahfrank
first_name: Noah
last_name:  Frank



In [4]:
@dataclass
class Source:
    """A data item uploaded by user with context: metadata"""
    source_id: int        # database ID, for lineage
    added_by:  str        # username of whoever added this source
    ref:       str        # link to record in database
    title:     str        # short descriptive human reference
    data_type: str        # video | image | doc | etc.
    creation:  str        # when the data was created, if known
    added_at:  str        # timestamp of when this source was added
    notes:     str        # unstructured user text input about this source
    tags:      List[str]  # search and filter terms

    def __str__(self):
        return (
            f"Source \n"
            f"--------------------\n"
            f"source_id: {self.source_id}\n"
            f"added_by:  {self.added_by}\n"
            f"ref:       {self.ref}\n"
            f"title:     {self.title}\n"
            f"data_type: {self.data_type}\n"
            f"creation:  {self.creation}\n"
            f"added_at:  {self.added_at}\n"
            f"notes:     {self.notes}\n"
            f"tags:      {self.tags}\n"
        )

source = Source(
    source_id = 101,
    added_by  = user.username,
    ref       = "link-to-record",
    title     = "Tunnel-Training-01",
    data_type = "video",
    creation  = "12-03-2025",
    added_at  = "12:39pm on 12-03-2025",
    notes     = "some real crazy stuff",
    tags      = ["tunnel", "skydiving", "coach"]
    )

print(source)


Source 
--------------------
source_id: 101
added_by:  noahfrank
ref:       link-to-record
title:     Tunnel-Training-01
data_type: video
creation:  12-03-2025
added_at:  12:39pm on 12-03-2025
notes:     some real crazy stuff
tags:      ['tunnel', 'skydiving', 'coach']



In [5]:
@dataclass
class Capsule:
    """A segment of a Source defined by user with context"""
    capsule_id: int                  # database ID, for lineage
    created_by: str                  # username of whoever created this capsule
    title:      str                  # short descriptive human reference
    creation:   str                  # when the data was created
    notes:      str                  # unstructured user text input about this capsule
    tags:       List[str]            # search and filter terms
    bounds:     Tuple[float, float]  # begin/end timestamps, line numbers, etc.
    comments:   List[str]            # structured training input; interpretive
    ideas:      List[str]            # structured content input; interpretive
    source_id:  int                  # ID of the parent Source, for lineage

    def __str__(self):
        return (
            f"Capsule \n"
            f"--------------------\n"
            f"capsule_id: {self.capsule_id}\n"
            f"source_id:  {self.source_id}\n"
            f"created_by: {self.created_by}\n"
            f"title:      {self.title}\n"
            f"creation:   {self.creation}\n"
            f"notes:      {self.notes}\n"
            f"tags:       {self.tags}\n"
            f"bounds:     {self.bounds}\n"
            f"comments:   {self.comments}\n"
            f"ideas:      {self.ideas}\n"
        )

capsule_1 = Capsule(
    capsule_id = 201,
    created_by = user.username,
    title      = "Good-box-drill",
    creation   = "12-03-2025",
    notes      = "coach had good feedback",
    tags       = ["tunnel", "box-drill", "coach"],
    bounds     = [1.00, 2.00],
    comments   = ["comment 1", "comment 2"],
    ideas      = ["idea 1"],
    source_id  = source.source_id
    )

capsule_2 = Capsule(
    capsule_id = 202,
    created_by = user.username,
    title      = "Good-tunnel-entrance",
    creation   = "12-08-2025",
    notes      = "coach no longer needs to help",
    tags       = ["tunnel", "entrance", "fundamentals"],
    bounds     = [5.50, 22.15],
    comments   = ["comment 3", "comment 4"],
    ideas      = ["idea 2"],
    source_id  = source.source_id
    )

print(capsule_1)
print(capsule_2)


Capsule 
--------------------
capsule_id: 201
source_id:  101
created_by: noahfrank
title:      Good-box-drill
creation:   12-03-2025
notes:      coach had good feedback
tags:       ['tunnel', 'box-drill', 'coach']
bounds:     [1.0, 2.0]
comments:   ['comment 1', 'comment 2']
ideas:      ['idea 1']

Capsule 
--------------------
capsule_id: 202
source_id:  101
created_by: noahfrank
title:      Good-tunnel-entrance
creation:   12-08-2025
notes:      coach no longer needs to help
tags:       ['tunnel', 'entrance', 'fundamentals']
bounds:     [5.5, 22.15]
comments:   ['comment 3', 'comment 4']
ideas:      ['idea 2']
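The cells above annotate `bounds` as a `Tuple[float, float]` but pass lists, and a dataclass does not enforce annotations. A minimal sketch of one way to normalize and validate the pair on construction; `normalize_bounds` and `BoundedSegment` are hypothetical names, and the rule that begin must not exceed end is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


def normalize_bounds(bounds: Sequence[float]) -> Tuple[float, float]:
    """Return (begin, end) as a tuple of floats, or raise ValueError."""
    if len(bounds) != 2:
        raise ValueError(f"bounds needs exactly 2 values, got {len(bounds)}")
    begin, end = float(bounds[0]), float(bounds[1])
    if begin > end:
        raise ValueError(f"begin ({begin}) must not exceed end ({end})")
    return (begin, end)


@dataclass
class BoundedSegment:
    """Hypothetical, trimmed-down Capsule with only the fields needed here."""
    source_id: int
    bounds:    Tuple[float, float]

    def __post_init__(self):
        # Accept a list or a tuple, but always store a validated tuple.
        self.bounds = normalize_bounds(self.bounds)


segment = BoundedSegment(source_id=101, bounds=[5.50, 22.15])
print(segment.bounds)  # (5.5, 22.15)
```

The same `__post_init__` hook is also a natural place to turn a list loaded from JSON back into a tuple, so the round-trip check keeps passing.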



In [6]:
@dataclass
class Module:
    """A group of similar capsules connected by order and context."""
    module_id:  int        # database ID, for lineage
    created_by: str        # username of whoever created this module
    title:      str        # short descriptive human reference
    creation:   str        # when the data was created
    notes:      str        # unstructured user text input about this module
    tags:       List[str]  # search and filter terms
    desc:       str        # what is the purpose/theme of this module
    capsules:   List[int]  # ordered list of capsule IDs in this module

    def __str__(self):
        return (
            f"Module \n"
            f"--------------------\n"
            f"module_id:  {self.module_id}\n"
            f"created_by: {self.created_by}\n"
            f"title:      {self.title}\n"
            f"creation:   {self.creation}\n"
            f"notes:      {self.notes}\n"
            f"tags:       {self.tags}\n"
            f"desc:       {self.desc}\n"
            f"capsules:   {self.capsules}\n"
        )

module = Module(
    module_id  = 301,
    created_by = user.username,
    title      = "Learning to Proper Body Position",
    creation   = "12-08-2025",
    notes      = "",
    tags       = ["skydiving", "tunnel", "mind-body-connection"],
    desc       = "Compare good and poor body positions for specific maneuvers along with ",
    capsules   = [capsule_1.capsule_id, capsule_2.capsule_id]
    )

print(module)


Module 
--------------------
module_id:  301
created_by: noahfrank
title:      Learning to Proper Body Position
creation:   12-08-2025
notes:      
tags:       ['skydiving', 'tunnel', 'mind-body-connection']
desc:       Compare good and poor body positions for specific maneuvers along with 
capsules:   [201, 202]
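The last two stages are not modeled yet. Following the same pattern, `Corpus` and `Artifact` might look like the sketch below. Every field name, the `publish` helper, and the IDs 401 and 501 are assumptions for illustration, not a settled design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Corpus:
    """Hypothetical: a canonical collection of similar modules."""
    corpus_id:  int        # database ID, for lineage
    created_by: str        # username of whoever created this corpus
    title:      str        # short descriptive human reference
    notes:      str        # unstructured user text input
    tags:       List[str]  # search and filter terms
    modules:    List[int]  # ordered list of module IDs


@dataclass
class Artifact:
    """Hypothetical: a versioned export of all or part of a Corpus."""
    artifact_id: int        # database ID, for lineage
    corpus_id:   int        # ID of the parent Corpus
    modules:     List[int]  # module IDs included in this export
    version:     int        # increases with each publish


def publish(corpus: Corpus, *, artifact_id: int, version: int,
            modules: Optional[List[int]] = None) -> Artifact:
    """Export all modules of a corpus, or only the listed subset."""
    selected = list(corpus.modules) if modules is None else list(modules)
    unknown = [m for m in selected if m not in corpus.modules]
    if unknown:
        raise ValueError(f"modules not in corpus {corpus.corpus_id}: {unknown}")
    return Artifact(
        artifact_id=artifact_id,
        corpus_id=corpus.corpus_id,
        modules=selected,
        version=version,
    )


corpus = Corpus(
    corpus_id=401,
    created_by="noahfrank",
    title="Tunnel fundamentals",
    notes="",
    tags=["tunnel"],
    modules=[301],
)
artifact = publish(corpus, artifact_id=501, version=1)
print(artifact)
```

Storing `corpus_id` on the artifact keeps the chain walkable from a published export back to its modules, capsules, and sources.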



In [11]:
def print_report():
    print(user)
    print(source)
    print(capsule_1)
    print(module)

print_report()

User
--------------------
username:   noahfrank
first_name: Noah
last_name:  Frank

Source 
--------------------
source_id: 101
added_by:  noahfrank
ref:       link-to-record
title:     Tunnel-Training-01
data_type: video
creation:  12-03-2025
added_at:  12:39pm on 12-03-2025
notes:     some real crazy stuff
tags:      ['tunnel', 'skydiving', 'coach']

Capsule 
--------------------
capsule_id: 201
source_id:  101
created_by: noahfrank
title:      Good-box-drill
creation:   12-03-2025
notes:      coach had good feedback
tags:       ['tunnel', 'box-drill', 'coach']
bounds:     [1.0, 2.0]
comments:   ['comment 1', 'comment 2']
ideas:      ['idea 1']

Module 
--------------------
module_id:  301
created_by: noahfrank
title:      Learning to Proper Body Position
creation:   12-08-2025
notes:      
tags:       ['skydiving', 'tunnel', 'mind-body-connection']
desc:       Compare good and poor body positions for specific maneuvers along with 
capsules:   [201, 202]
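With IDs on every record, lineage can be walked from a module back to its sources. A minimal sketch using plain dicts as the in-memory "database"; `capsule_source`, `module_capsules`, and `trace_module_sources` are hypothetical names, and the IDs mirror the records printed above.

```python
from typing import Dict, List

# Hypothetical in-memory lookup tables:
# capsule_id -> source_id, and module_id -> ordered capsule IDs.
capsule_source: Dict[int, int] = {201: 101, 202: 101}
module_capsules: Dict[int, List[int]] = {301: [201, 202]}


def trace_module_sources(module_id: int) -> List[int]:
    """Return the unique source IDs behind a module, in first-seen order."""
    seen: List[int] = []
    for capsule_id in module_capsules[module_id]:
        source_id = capsule_source[capsule_id]
        if source_id not in seen:
            seen.append(source_id)
    return seen


print(trace_module_sources(301))  # [101]
```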

