In [1]:
import lionagi as li

In [2]:
from pathlib import Path

data_path = Path.cwd() / "lionagi_data"  # Path to the data directory

### Prepare Data

In [None]:
# load files from directory
docs = li.load(input_dir=data_path, recursive=True, required_exts=[".py"])
docs = [doc for doc in docs if len(doc.content) > 100]

In [None]:
# one-time setup: uncomment to chunk, embed, and save the pile

# chunk
# pile = li.chunk(docs=docs, chunk_size=2000, overlap=0.1)

# embed
# await pile.embed_pile()

# save
# pile.to_csv("lionagi_embedding.csv")
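
For readers unfamiliar with overlap-based chunking, the sketch below shows one common reading of `chunk_size=2000, overlap=0.1`: fixed-size character windows in which each chunk repeats the last 10% of the previous one. It is an illustrative stand-in only. `chunk_text` is a hypothetical helper, not part of lionagi, and `li.chunk` may split differently (for example, by tokens rather than characters).

```python
def chunk_text(text: str, chunk_size: int = 2000, overlap: float = 0.1) -> list[str]:
    """Split text into character windows that share `overlap` of their length.

    Hypothetical helper for illustration; not lionagi's chunker.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")

    # Distance between the starts of two consecutive chunks.
    step = max(1, round(chunk_size * (1 - overlap)))

    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Varied sample text so that the shared region is visible when inspected.
sample = "".join(str(i % 10) for i in range(5000))
pieces = chunk_text(sample)
print([len(p) for p in pieces])
```

With the defaults, the last 200 characters of one chunk are the first 200 characters of the next, which is the property that lets a retrieved chunk carry a little context from its neighbor.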

In [6]:
# reload pile from saved csv
pile = li.pile(csv_file="lionagi_embedding.csv") 

In [4]:
await pile.query_pile("what is session in lionagi?")

'In lionagi, a session is an object that manages interactions and data flow within the system. It includes components such as a system message node, a model, a datalogger, and various tools and branches. The session is initialized with parameters like system messages, sender information, configuration settings, and service details. It also handles the logging of data and the management of tools and messages.'

### Construct Workflow

In [7]:
instruction = """
write a good API documentation for this code, must use 
query engine to check meanings of related code concepts 
to accurately describe, for example if a name of a variable,
function, class, or module is used but not present in context,
you must check with the query engine. make sure to cross 
reference the code with the query engine to ensure the 
documentation is accurate
"""

edit = """
provide documentation only: final documentation in md 
format of the module of interest, do not include other 
fields, do not present in JSON format, only markdown format. 
you asked a lot of good questions and got plenty of answers, 
please integrate your conversation, be a lot more technical, you will 
be rewarded with 500 dollars for great work, and 
punished for subpar work, take a deep breath, you can do it
"""

In [8]:
from llama_index.core.postprocessor import LLMRerank

from PROMPTS import sys_prompt  # put your system prompt here

model = li.iModel(
    model="gpt-4o",
    provider="openai",
    interval_tokens=1_000_000,
    interval_requests=1_000,
    interval=60,
)

# expose the pile as a query tool the model can call during `direct`
tools = pile.as_query_tool(
    name="qa_lionagi",
    guidance="Perform query to a QA bot",
    query_description="a term/phrase to lookup or a question to answer",
    node_postprocessor=LLMRerank(),
)

In [9]:
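async def write_doc(context):
    """Draft documentation for `context`, then rewrite it as markdown.

    Returns (form, branch), or (None, None) if no form comes back.
    """
    branch = li.Branch(
        system=sys_prompt,
        tools=tools,
        imodel=model,
    )

    # first pass: reason about the code and query the pile through the tool
    form = await branch.direct(
        instruction=instruction,
        context=context,
        reason=True,
        score=True,
        allow_action=True,
        allow_extension=True,
        max_extension=2,
    )

    if form is None:
        print("No form returned")
        return None, None

    # second pass: ask for the final, markdown-only documentation
    form.answer = await branch.chat(
        instruction=edit,
        temperature=0.5,
    )

    # save all messages into a unique file
    df = branch.to_df()
    df.to_csv(f"lion_doc_{branch.ln_id[:8]}.csv", index=False)

    return form, branch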
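
Because `write_doc` is a coroutine, several files could in principle be documented concurrently. The sketch below shows one way to bound that concurrency with `asyncio.Semaphore` and `asyncio.gather`. It uses a stand-in coroutine, `fake_write_doc`, so that it runs without an API key; both `fake_write_doc` and `run_bounded` are hypothetical names, and the limit of 3 is an arbitrary assumption rather than a recommended setting.

```python
import asyncio


async def fake_write_doc(context: str) -> str:
    """Stand-in for write_doc; a real call would go to the LLM."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"documented {len(context)} chars"


async def run_bounded(contexts: list[str], limit: int = 3) -> list[str]:
    """Run one task per context with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(context: str) -> str:
        async with semaphore:
            return await fake_write_doc(context)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(worker(c) for c in contexts))


results = asyncio.run(run_bounded(["a" * 10, "b" * 20, "c" * 30, "d" * 40]))
print(results)
```

Inside a notebook, where an event loop is already running, `await run_bounded(...)` would take the place of `asyncio.run(...)`.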

### Run workflow

In [10]:
form, branch = await write_doc(docs[82].content)

In [11]:
form.display()

**task**: 
 Follow the prompt and provide the necessary output.
- Additional instruction: 
write a good API documentation for this code, must use 
query engine to check meanings of related code concepts 
to accurately describe, for example if a name of a variable,
function, class, or module is used but not present in context,
you must check with the query engine. make sure to cross 
reference the code with the query engine to ensure the 
documentation is accurate

- Additional context: """
Copyright 2024 HaiyangLi
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
 http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""
import atexit
import contextlib
import logging
from collections import deque
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, List
from lionagi.libs import SysUtil, convert, nested

# TODO: there should be a global data logger, under setting

@dataclass
class DLog:
 """Defines a log entry structure for data processing operations.
 This class encapsulates both the input to and output from a data processing
 operation, along with an automatically generated timestamp indicating when
 the log entry was created. It aims to standardize logging across applications
 for easier analysis and debugging.
 Attributes:
  input_data: The data input to the operation. Can be of any type.
  output_data: The data output by the operation. Can be of any type.
 """
 input_data: Any
 output_data: Any
 def serialize(self, *, flatten_: bool = True, sep: str = "[^_^]") -> dict[str, Any]:
  """Serialize the DLog instance into a dictionary wit...

**reason**: 
 Let's think step by step, The task requires writing good API documentation for the provided code. To ensure accuracy, we need to cross-reference the code with a query engine to check the meanings of related code concepts. This will help us accurately describe the variables, functions, classes, and modules used in the code. Given the complexity and the need for accurate cross-referencing, we should use the available tool to perform queries as needed.

**actions**: 
 
 
1. **qa_lionagi**(query: What is the purpose of the 'SysUtil' module in the provided code?), 
 
2. **qa_lionagi**(query: What does the 'convert' module do in the provided code?), 
 
3. **qa_lionagi**(query: What is the functionality of the 'nested' module in the provided code?)

**action_required**: True

**extension_required**: True

**score**: 7

**action_response**: 
 
 
1. **qa_lionagi**(query: What is the purpose of the 'SysUtil' module in the provided code?) 
 
 The 'SysUtil' module in the provided code is designed to offer various system utility functions. These functions include sleeping for a specified duration, retrieving the current time in different formats, changing dictionary keys, generating timestamps, creating copies of objects, generating unique IDs, binning data, and determining the CPU architecture. The module is tested using unit tests to ensure its functionality., 
 
2. **qa_lionagi**(query: What does the 'convert' module do in the provided code?) 
 
 The 'convert' module in the provided code is used to convert various inputs into a list format using the `convert.to_list` function. This function ensures that the inputs, whether they are single items or multiple items, are treated as lists for further processing., 
 
3. **qa_lionagi**(query: What is the functionality of the 'nested' module in the provided code?) 
 
 The 'nested' module in the provided code is designed to handle operations on nested data structures such as dictionaries and lists. It includes functions for setting values (`nset`), retrieving values (`nget`), flattening nested structures (`flatten`), unflattening them (`unflatten`), filtering items (`nfilter`), and inserting values (`ninsert`). The module is tested using various unit tests to ensure these functions work correctly with different types of nested data.

**action_performed**: True

**available_tools**: ['qa_lionagi']

**answer**: 
 # API Documentation

## Class: `DLog`

**Description**:  
`DLog` defines a log entry structure for data processing operations. This class encapsulates both the input to and output from a data processing operation, along with an automatically generated timestamp indicating when the log entry was created. It aims to standardize logging across applications for easier analysis and debugging.

### Method: `serialize`

**Signature**:  
```python
def serialize(self, *, flatten_: bool = True, sep: str = "[^_^]") -> dict[str, Any]:
```

**Parameters**:
- `flatten_` (bool): If True, flattens dictionary inputs for serialization.
- `sep` (str): Separator used in flattening nested dictionaries.

**Return Values**:
- `dict[str, Any]`: A dictionary representation of the `DLog` instance, including `input_data`, `output_data`, and `timestamp`.

**Exceptions Raised**:
- `Exception`: If there is an error in processing data to string.

**Usage Examples**:
```python
log_entry = DLog(input_data={"key": "value"}, output_data={"result": "success"})
serialized_log = log_entry.serialize()
print(serialized_log)
```

### Method: `deserialize`

**Signature**:  
```python
def deserialize(cls, *, input_str: str, output_str: str, unflatten_: bool = True, sep: str = "[^_^]") -> "DLog":
```

**Parameters**:
- `input_str` (str): String representation of the input data.
- `output_str` (str): String representation of the output data.
- `unflatten_` (bool): Indicates whether to unflatten the string data back into nested dictionaries.
- `sep` (str): Separator used if unflattening is performed.

**Return Values**:
- `DLog`: An instance of `DLog` reconstructed from the provided string data.

**Exceptions Raised**:
- `ValueError`: If deserialization or unflattening fails due to incorrect data format or separator issues.

**Usage Examples**:
```python
serialized_log = {"input_data": "{\"key\": \"value\"}", "output_data": "{\"result\": \"success\"}"}
deserialized_log = DLog.deserialize(input_str=serialized_log["input_data"], output_str=serialized_log["output_data"])
print(deserialized_log)
```

## Class: `DataLogger`

**Description**:  
`DataLogger` manages logging for data processing activities within an application. This class handles the accumulation, structuring, and persistence of log entries. It supports exporting logs to disk in both CSV and JSON formats, with features for automatic log saving at program exit and customizable file naming.

### Method: `__init__`

**Signature**:  
```python
def __init__(self, persist_path: str | Path | None = None, log: List[Dict] | None = None, filename: str | None = None) -> None:
```

**Parameters**:
- `persist_path` (str | Path | None): The file system path for storing log files. Defaults to `data/logs/`.
- `log` (List[Dict] | None): Initial log entries.
- `filename` (str | None): Base name for exported log files.

**Usage Examples**:
```python
data_logger = DataLogger(persist_path="/path/to/logs", filename="my_log")
```

### Method: `extend`

**Signature**:  
```python
def extend(self, logs) -> None:
```

**Parameters**:
- `logs` (List[Dict]): A list of log entries, each as a dictionary conforming to the log structure (e.g., containing `input_data`, `output_data`, etc.).

**Usage Examples**:
```python
logs = [{"input_data": "data1", "output_data": "result1"}, {"input_data": "data2", "output_data": "result2"}]
data_logger.extend(logs)
```

### Method: `append`

**Signature**:  
```python
def append(self, *, input_data: Any, output_data: Any) -> None:
```

**Parameters**:
- `input_data` (Any): Input data to the operation.
- `output_data` (Any): Output data from the operation.

**Usage Examples**:
```python
data_logger.append(input_data="data3", output_data="result3")
```

### Method: `to_csv_file`

**Signature**:  
```python
def to_csv_file(self, filename: str = "log.csv", *, dir_exist_ok: bool = True, timestamp: bool = True, time_prefix: bool = False, verbose: bool = True, clear: bool = True, flatten_: bool = True, sep: str = "[^_^]", index: bool = False, random_hash_digits: int = 3, **kwargs) -> None:
```

**Parameters**:
- `filename` (str): Filename for the exported CSV. Defaults to `log.csv`.
- `dir_exist_ok` (bool): If True, allows writing to an existing directory.
- `timestamp` (bool): If True, appends a timestamp to the filename.
- `time_prefix` (bool): If True, places the timestamp prefix before the filename.
- `verbose` (bool): If True, prints a message upon successful save.
- `clear` (bool): If True, clears the log deque after saving.
- `flatten_` (bool): If True, flattens dictionary data for serialization.
- `sep` (str): Separator for flattening nested dictionaries.
- `index` (bool): If True, includes an index column in the CSV.
- `random_hash_digits` (int): Number of random hash digits to add to the filename.
- `**kwargs`: Additional arguments for DataFrame.to_csv().

**Usage Examples**:
```python
data_logger.to_csv_file(filename="my_log.csv")
```

### Method: `to_json_file`

**Signature**:  
```python
def to_json_file(self, filename: str = "log.json", *, dir_exist_ok: bool = True, timestamp: bool = True, time_prefix: bool = False, verbose: bool = True, clear: bool = True, flatten_: bool = True, sep: str = "[^_^]", index: bool = False, random_hash_digits: int = 3, **kwargs) -> None:
```

**Parameters**:
- `filename` (str): Filename for the exported JSON. Defaults to `log.json`.
- `dir_exist_ok` (bool): If True, allows writing to an existing directory.
- `timestamp` (bool): If True, appends a timestamp to the filename.
- `time_prefix` (bool): If True, places the timestamp prefix before the filename.
- `verbose` (bool): If True, prints a message upon successful save.
- `clear` (bool): If True, clears the log deque after saving.
- `flatten_` (bool): If True, flattens dictionary data for serialization.
- `sep` (str): Separator for flattening nested dictionaries.
- `index` (bool): If True, includes an index in the JSON.
- `random_hash_digits` (int): Number of random hash digits to add to the filename.
- `**kwargs`: Additional arguments for DataFrame.to_json().

**Usage Examples**:
```python
data_logger.to_json_file(filename="my_log.json")
```

### Method: `save_at_exit`

**Signature**:  
```python
def save_at_exit(self):
```

**Description**:  
Save any unsaved logs automatically upon program termination. This method is registered as an at-exit handler to ensure that any unsaved logs are automatically persisted to a file upon program termination. This safeguard helps prevent the loss of log data due to unexpected shutdowns or program exits. The method is configured to save the logs to a CSV file, named `unsaved_logs.csv`, which is stored in the designated persisting directory. This automatic save operation is triggered only if there are unsaved logs present at the time of program exit.

**Usage Examples**:
```python
# No direct usage example as this is automatically handled at program exit
```
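
The `flatten_` and `sep` parameters that recur in the generated documentation refer to collapsing nested dictionaries into a single level, with the key path joined by a separator. The sketch below illustrates that idea in plain Python. `flatten_dict` and `unflatten_dict` are hypothetical helpers written for this example; they are not lionagi's `nested.flatten` and `nested.unflatten`, which may treat lists and edge cases differently.

```python
from typing import Any

SEP = "[^_^]"  # the default separator shown in the signatures above


def flatten_dict(data: dict[str, Any], sep: str = SEP, prefix: str = "") -> dict[str, Any]:
    """Collapse nested dicts into one level, joining key paths with `sep`."""
    flat: dict[str, Any] = {}
    for key, value in data.items():
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            flat.update(flatten_dict(value, sep, path))
        else:
            flat[path] = value  # leaf value (or an empty dict)
    return flat


def unflatten_dict(flat: dict[str, Any], sep: str = SEP) -> dict[str, Any]:
    """Rebuild the nested dict from paths produced by flatten_dict."""
    nested: dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


record = {"input": {"a": {"b": 1}}, "output": 2}
flat_record = flatten_dict(record)
print(flat_record)
print(unflatten_dict(flat_record) == record)
```

An unusual separator such as `[^_^]` is unlikely to appear in real keys, which is what makes the round trip back to a nested structure unambiguous.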

In [12]:
branch.to_df()

Unnamed: 0,ln_id,message_type,timestamp,role,content,metadata,sender,recipient
0,0c56e902e9587b4685136d79a86de928,System,2024-05-23T16:29:57.398932,system,"{'system_info': ' you are a helpful assistant,...",{'last_updated': {'recipient': '2024-05-23T16:...,system,5ad9145e8d7f1a1037f47b3b791be9a8
1,d35fca1cc3f14bb0b5d712b683c73e91,Instruction,2024-05-23T16:29:57.399950,user,{'instruction': '  ## Task Instructions...,{'last_updated': {'sender': '2024-05-23T16:29:...,user,5ad9145e8d7f1a1037f47b3b791be9a8
2,f9eb5013ca559fa2f0d4f874337002bb,AssistantResponse,2024-05-23T16:30:04.897473,assistant,"{'assistant_response': '```json {  ""answer"": ...",{'last_updated': {'sender': '2024-05-23T16:30:...,5ad9145e8d7f1a1037f47b3b791be9a8,user
3,6595d07a9989f22ca1ac665c34bf90a7,ActionRequest,2024-05-23T16:30:04.901081,assistant,"{'action_request': {'function': 'qa_lionagi', ...",{'last_updated': {'function': '2024-05-23T16:3...,5ad9145e8d7f1a1037f47b3b791be9a8,979180334d7857a7e43d406b28288bbb
4,1270287c638e2f241df2e9c59d087cd1,ActionRequest,2024-05-23T16:30:04.901397,assistant,"{'action_request': {'function': 'qa_lionagi', ...",{'last_updated': {'function': '2024-05-23T16:3...,5ad9145e8d7f1a1037f47b3b791be9a8,979180334d7857a7e43d406b28288bbb
5,6dd44df0d70afc50349ab78fdc975739,ActionRequest,2024-05-23T16:30:04.901561,assistant,"{'action_request': {'function': 'qa_lionagi', ...",{'last_updated': {'function': '2024-05-23T16:3...,5ad9145e8d7f1a1037f47b3b791be9a8,979180334d7857a7e43d406b28288bbb
6,7f19166739c14d1c9318d1c4b8698d61,ActionResponse,2024-05-23T16:30:10.039713,assistant,"{'action_response': {'function': 'qa_lionagi',...",{'last_updated': {'function': '2024-05-23T16:3...,979180334d7857a7e43d406b28288bbb,5ad9145e8d7f1a1037f47b3b791be9a8
7,6a84df5a3704ec7a980a983845821073,ActionResponse,2024-05-23T16:30:10.039901,assistant,"{'action_response': {'function': 'qa_lionagi',...",{'last_updated': {'function': '2024-05-23T16:3...,979180334d7857a7e43d406b28288bbb,5ad9145e8d7f1a1037f47b3b791be9a8
8,6adbcb94643f2319facb9b023b2f1f1b,ActionResponse,2024-05-23T16:30:10.039985,assistant,"{'action_response': {'function': 'qa_lionagi',...",{'last_updated': {'function': '2024-05-23T16:3...,979180334d7857a7e43d406b28288bbb,5ad9145e8d7f1a1037f47b3b791be9a8
9,2522d56d8d7e0204a36b9a57d1fe91ec,Instruction,2024-05-23T16:30:10.040462,user,{'instruction': '  ## Task Instructions...,{'last_updated': {'sender': '2024-05-23T16:30:...,user,5ad9145e8d7f1a1037f47b3b791be9a8


In [13]:
msg = branch.messages[2]

In [14]:
form.tool_schema

{'tools': [{'type': 'function',
   'function': {'name': 'qa_lionagi',
    'description': 'Perform query to a QA bot',
    'parameters': {'type': 'object',
     'properties': {'query': {'type': 'string',
       'description': 'a term/phrase to lookup or a question to answer'}},
     'required': ['query']}}}]}
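
To make the schema above concrete, the sketch below shows how a function-calling schema of this shape can be used to validate a requested call and route it to a Python callable. It is a generic illustration, not lionagi's internal tool manager. The `lookup` function, the `registry` mapping, the `dispatch` helper, and the `{"function": ..., "arguments": ...}` request shape are all assumptions made for this example.

```python
from typing import Any, Callable

# Copied from the `form.tool_schema` output above.
schema = {
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "qa_lionagi",
                "description": "Perform query to a QA bot",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "a term/phrase to lookup or a question to answer",
                        }
                    },
                    "required": ["query"],
                },
            },
        }
    ]
}


def lookup(query: str) -> str:
    """Hypothetical stand-in for the real query engine."""
    return f"answer to: {query}"


# Maps tool names in the schema to the callables that implement them.
registry: dict[str, Callable[..., Any]] = {"qa_lionagi": lookup}


def dispatch(request: dict[str, Any]) -> Any:
    """Check a request against the schema, then call the matching function."""
    specs = {t["function"]["name"]: t["function"] for t in schema["tools"]}
    name = request["function"]
    if name not in specs or name not in registry:
        raise KeyError(f"unknown tool: {name}")

    params = specs[name]["parameters"]
    args = request.get("arguments", {})
    missing = [p for p in params.get("required", []) if p not in args]
    unknown = [a for a in args if a not in params["properties"]]
    if missing or unknown:
        raise ValueError(f"missing={missing}, unknown={unknown}")

    return registry[name](**args)


print(dispatch({"function": "qa_lionagi", "arguments": {"query": "session"}}))
```

Validating against `required` and `properties` before the call keeps a malformed model request from surfacing as an opaque `TypeError` inside the tool.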

In [15]:
form

ln_id                                  1f8015df6228ae25be01abbb9326b0ec
created                                      2024-05-23T16:29:57.399716
metadata              {'last_updated': {'input_fields': '2024-05-23T...
content                                                            None
embedding                                                            []
template_name                                             UnitDirective
assignment                                               task -> answer
input_fields                                        [task, tool_schema]
requested_fields      [answer, reason, actions, action_required, ext...
task                  Follow the prompt and provide the necessary ou...
validation_kwargs     {'reason': {}, 'actions': {}, 'action_required...
confidence_score                                                   None
reason                Let's think step by step, The task requires wr...
actions               {'action_1': {'function': 'qa_lionagi', 'a

In [16]:
if hasattr(form, "extension_forms"):
    for i in form.extension_forms:
        i.display()

**task**: 
 Follow the prompt and provide the necessary output.
- Additional instruction: N/A
- Additional context: N/A
- Reason and prepare actions with GIVEN TOOLS ONLY.
- Allow auto-extension up to another 1 rounds.
- Give a numeric score in [0, 10] and precision of 0 decimal places.


**reason**: 
 Let's think step by step, The task required writing good API documentation for the provided code. I cross-referenced the code with the query engine to ensure accuracy in describing the variables, functions, classes, and modules used in the code. The documentation provided covers the class descriptions, method signatures, parameters, return values, exceptions raised, and usage examples for both `DLog` and `DataLogger` classes.

**actions**: 

**action_required**: False

**extension_required**: False

**score**: 9

**available_tools**: ['qa_lionagi']

**answer**: 
 The API documentation for the provided code is as follows:

### Class: `DLog`

**Description**:
`DLog` defines a log entry structure for data processing operations. This class encapsulates both the input to and output from a data processing operation, along with an automatically generated timestamp indicating when the log entry was created. It aims to standardize logging across applications for easier analysis and debugging.

#### Method: `serialize`

**Signature**:
```python
def serialize(self, *, flatten_: bool = True, sep: str = "[^_^]") -> dict[str, Any]:
```

**Parameters**:
- `flatten_` (bool): If True, flattens dictionary inputs for serialization.
- `sep` (str): Separator used in flattening nested dictionaries.

**Return Values**:
- `dict[str, Any]`: A dictionary representation of the `DLog` instance, including `input_data`, `output_data`, and `timestamp`.

**Exceptions Raised**:
- `Exception`: If there is an error in processing data to string.

**Usage Examples**:
```python
log_entry = DLog(input_data={"key": "value"}, output_data={"result": "success"})
serialized_log = log_entry.serialize()
print(serialized_log)
```

#### Method: `deserialize`

**Signature**:
```python
def deserialize(cls, *, input_str: str, output_str: str, unflatten_: bool = True, sep: str = "[^_^]") -> "DLog":
```

**Parameters**:
- `input_str` (str): String representation of the input data.
- `output_str` (str): String representation of the output data.
- `unflatten_` (bool): Indicates whether to unflatten the string data back into nested dictionaries.
- `sep` (str): Separator used if unflattening is performed.

**Return Values**:
- `DLog`: An instance of `DLog` reconstructed from the provided string data.

**Exceptions Raised**:
- `ValueError`: If deserialization or unflattening fails due to incorrect data format or separator issues.

**Usage Examples**:
```python
serialized_log = {"input_data": "{\"key\": \"value\"}", "output_data": "{\"result\": \"success\"}"}
deserialized_log = DLog.deserialize(input_str=serialized_log["input_data"], output_str=serialized_log["output_data"])
print(deserialized_log)
```

### Class: `DataLogger`

**Description**:
`DataLogger` manages logging for data processing activities within an application. This class handles the accumulation, structuring, and persistence of log entries. It supports exporting logs to disk in both CSV and JSON formats, with features for automatic log saving at program exit and customizable file naming.

#### Method: `__init__`

**Signature**:
```python
def __init__(self, persist_path: str | Path | None = None, log: List[Dict] | None = None, filename: str | None = None) -> None:
```

**Parameters**:
- `persist_path` (str | Path | None): The file system path for storing log files. Defaults to `data/logs/`.
- `log` (List[Dict] | None): Initial log entries.
- `filename` (str | None): Base name for exported log files.

**Usage Examples**:
```python
data_logger = DataLogger(persist_path="/path/to/logs", filename="my_log")
```

#### Method: `extend`

**Signature**:
```python
def extend(self, logs) -> None:
```

**Parameters**:
- `logs` (List[Dict]): A list of log entries, each as a dictionary conforming to the log structure (e.g., containing `input_data`, `output_data`, etc.).

**Usage Examples**:
```python
logs = [{"input_data": "data1", "output_data": "result1"}, {"input_data": "data2", "output_data": "result2"}]
data_logger.extend(logs)
```

#### Method: `append`

**Signature**:
```python
def append(self, *, input_data: Any, output_data: Any) -> None:
```

**Parameters**:
- `input_data` (Any): Input data to the operation.
- `output_data` (Any): Output data from the operation.

**Usage Examples**:
```python
data_logger.append(input_data="data3", output_data="result3")
```

#### Method: `to_csv_file`

**Signature**:
```python
def to_csv_file(self, filename: str = "log.csv", *, dir_exist_ok: bool = True, timestamp: bool = True, time_prefix: bool = False, verbose: bool = True, clear: bool = True, flatten_: bool = True, sep: str = "[^_^]", index: bool = False, random_hash_digits: int = 3, **kwargs) -> None:
```

**Parameters**:
- `filename` (str): Filename for the exported CSV. Defaults to `log.csv`.
- `dir_exist_ok` (bool): If True, allows writing to an existing directory.
- `timestamp` (bool): If True, appends a timestamp to the filename.
- `time_prefix` (bool): If True, places the timestamp prefix before the filename.
- `verbose` (bool): If True, prints a message upon successful save.
- `clear` (bool): If True, clears the log deque after saving.
- `flatten_` (bool): If True, flattens dictionary data for serialization.
- `sep` (str): Separator for flattening nested dictionaries.
- `index` (bool): If True, includes an index column in the CSV.
- `random_hash_digits` (int): Number of random hash digits to add to the filename.
- `**kwargs`: Additional arguments for DataFrame.to_csv().

**Usage Examples**:
```python
data_logger.to_csv_file(filename="my_log.csv")
```

#### Method: `to_json_file`

**Signature**:
```python
def to_json_file(self, filename: str = "log.json", *, dir_exist_ok: bool = True, timestamp: bool = True, time_prefix: bool = False, verbose: bool = True, clear: bool = True, flatten_: bool = True, sep: str = "[^_^]", index: bool = False, random_hash_digits: int = 3, **kwargs) -> None:
```

**Parameters**:
- `filename` (str): Filename for the exported JSON. Defaults to `log.json`.
- `dir_exist_ok` (bool): If True, allows writing to an existing directory.
- `timestamp` (bool): If True, appends a timestamp to the filename.
- `time_prefix` (bool): If True, places the timestamp prefix before the filename.
- `verbose` (bool): If True, prints a message upon successful save.
- `clear` (bool): If True, clears the log deque after saving.
- `flatten_` (bool): If True, flattens dictionary data for serialization.
- `sep` (str): Separator for flattening nested dictionaries.
- `index` (bool): If True, includes an index in the JSON.
- `random_hash_digits` (int): Number of random hash digits to add to the filename.
- `**kwargs`: Additional arguments for DataFrame.to_json().

**Usage Examples**:
```python
data_logger.to_json_file(filename="my_log.json")
```

#### Method: `save_at_exit`

**Signature**:
```python
def save_at_exit(self):
```

**Description**:
Save any unsaved logs automatically upon program termination. This method is registered as an at-exit handler to ensure that any unsaved logs are automatically persisted to a file upon program termination. This safeguard helps prevent the loss of log data due to unexpected shutdowns or program exits. The method is configured to save the logs to a CSV file, named `unsaved_logs.csv`, which is stored in the designated persisting directory. This automatic save operation is triggered only if there are unsaved logs present at the time of program exit.

**Usage Examples**:
```python
# No direct usage example as this is automatically handled at program exit
```