### Liten Codriver
Think of Liten Codriver as a coworker you can ask to do things for you:
* Find queries of interest
* Execute a query described in plain text
* Handle general tasks, such as checking whether any redirections have occurred

The Liten database fine-tunes an LLM and stores AI memory to enable these operations. It comes pre-packaged with trained models for typical log files, such as Apache web server logs and weblog files.
Customer-generated work and the associated fine-tuned models are owned entirely by the customer. Liten does not have access to them.

In [1]:
!pip install -r requirements.txt



Let us first set up the environment for weblog files.

In [2]:
import os
import pyspark
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, TimestampType
import liten as ten

# Set your own OpenAI API key here; never commit a real key to a notebook.
os.environ['OPENAI_API_KEY'] = '<your-openai-api-key>'
tdb = ten.Database()
spark = tdb.spark

Started _liten_work_start=1 desc=Default work


In [3]:
tdb.work.load('WebLogQuery-demo.ipynb')
# Schema for the weblog CSV: client IP, request time, requested URL, HTTP status
weblog_schema = StructType([
    StructField("IP", StringType(), True),
    StructField("Time", TimestampType(), True),
    StructField("URL", StringType(), True),
    StructField("Status", IntegerType(), True),
])
weblog_df = (
    tdb.spark.read.format('csv')
    .options(header='true', delimiter=',', timestampFormat='dd/MMM/yyyy:HH:mm:ss')
    .schema(weblog_schema)
    .load("weblog.csv")
)
weblog_df.createOrReplaceTempView("weblog")
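
To make the four-column schema and the timestamp pattern concrete, here is a minimal plain-Python sketch of the same parsing, without Spark. The sample line and the `parse_weblog_line` helper are hypothetical, and the `strptime` pattern `%d/%b/%Y:%H:%M:%S` is assumed to correspond to Spark's `dd/MMM/yyyy:HH:mm:ss`; the real `weblog.csv` may be laid out differently.

```python
import csv
import io
from datetime import datetime
from typing import NamedTuple


class WeblogRow(NamedTuple):
    """One log record, mirroring the IP / Time / URL / Status columns."""
    IP: str
    Time: datetime
    URL: str
    Status: int


def parse_weblog_line(line: str) -> WeblogRow:
    # Hypothetical helper: split one comma-delimited line into typed fields.
    ip, time_text, url, status = next(csv.reader(io.StringIO(line)))
    # Assumed Python equivalent of the Spark pattern dd/MMM/yyyy:HH:mm:ss
    # (%b expects English month abbreviations under the default locale).
    parsed_time = datetime.strptime(time_text, "%d/%b/%Y:%H:%M:%S")
    return WeblogRow(ip, parsed_time, url, int(status))


# Made-up sample line, not taken from weblog.csv.
sample = "10.0.0.1,29/Nov/2017:06:58:55,GET /login.php HTTP/1.1,200"
row = parse_weblog_line(sample)
print(row)
```

Typing each field up front, as the Spark schema does, means a malformed timestamp or status fails at load time rather than later inside a query.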

Summarize the existing work items to see whether any of them is useful.

In [4]:
tdb.work.summarize()

Workitem 0: The given Python notebook in Jupyter is installing required packages and libraries for Weblog Analysis. It imports various libraries such as pandas, seaborn, matplotlib, and pyspark for data analysis. The notebook also explains the features of Liten, a database that stores data in a generalized tensor format, and provides semantic query with structured SQL support. The notebook also includes a command to stop a work item.

Workitem 1: The given Python notebook is importing the liten library and setting up a connection to a database using spark. It then defines a schema for a weblog file and reads a sample weblog file using the defined schema. The resulting dataframe is then printed along with its schema.

Workitem 2: The given Python notebook in Jupyter is executing SQL queries on a web log file to determine the total number of log lines and the count of requests that were redirected with status codes 302 and 304. The notebook also includes code to start and stop new intera

In [5]:
tdb.work.find_similar("Find the work item which counts the number of redirections")

The work item which counts the number of redirections is Work 2.


In [6]:
tdb.work.replay(2)

In [7]:
tdb.work.new()
print("Total number of log lines")
cntDf = tdb.spark.sql("select count(*) from weblog")
cntDf.show()
print("Request counts which were redirected")
# Count requests per 3xx status code
st3xxDf = tdb.spark.sql("SELECT Status, COUNT(*) FROM weblog WHERE Status LIKE '3%%' GROUP BY Status")
st3xxDf.show()

Stopped _liten_work_end=1
Started _liten_work_start=2 desc=New interactive work
Total number of log lines
+--------+
|count(1)|
+--------+
|   15964|
+--------+

Request counts which were redirected
+------+--------+
|Status|count(1)|
+------+--------+
|   304|     658|
|   302|    3498|
+------+--------+
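
The redirect query filters on `Status LIKE '3%%'` and groups by status code. As a rough illustration of that logic outside Spark, the sketch below counts the codes whose text form starts with `3`, using plain Python. The `statuses` list is made-up sample data, not taken from `weblog.csv`, and `count_redirects` is a hypothetical helper.

```python
from collections import Counter


def count_redirects(statuses):
    """Count occurrences of each status code whose text form starts with '3'."""
    # Mirrors the LIKE filter: the integer status is compared as a string.
    return Counter(s for s in statuses if str(s).startswith("3"))


# Made-up sample data for illustration only.
statuses = [200, 302, 304, 302, 404, 200, 302]
redirects = count_redirects(statuses)
for status, count in sorted(redirects.items()):
    print(status, count)
```

Because the match is on the leading character, any value that begins with `3` would be counted; a numeric range check such as `300 <= s <= 399` would be stricter if the data could contain malformed codes.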



In [8]:
tdb.work.analyze("Tell me if there has been any request timeout based on the output results")

Workitem 0: There is no output result in the provided JSON data cells that indicates any request timeout.

Workitem 1: Based on the provided output results, there is no information regarding any request timeout. The output only shows the schema and the first 5 rows of the loaded weblog file.

Workitem 2: There is no information in the provided output results regarding request timeouts.

Workitem 3: I'm sorry, but there is no information in the provided output results about any request timeout. The output results only contain information about starting a new debug query and explaining the fields and top errors in a weblog.

Workitem 4: I'm sorry, but I cannot determine if there has been any request timeout based on the provided output results. The output only shows the execution of a SQL query and the number of rows returned. It does not provide any information about request timeouts.

Workitem 5: Based on the given output results, there is no information about any request timeout. The 