# Cube Semantic Layer

SQL chains and agents can build and run SQL queries from natural language prompts.
However, they often struggle to produce correct results for complex queries.
The Cube Semantic Layer agent presented here is designed to handle such queries better.
It relies on a Cube data model, which transforms raw data into meaningful business definitions and pre-aggregates data for better query performance.

The Cube tools enable use cases such as:

- Generating queries that will be run based on natural language questions
- Creating chatbots that can answer questions based on database data
- Building custom dashboards based on insights a user wants to analyze

## Overview

LangChain provides tools to interact with Cube:

1. **Build Cube queries** based on natural language user questions
2. **Query a Cube** using chains for query creation and execution
3. **Interact with a Cube** using agents for robust and flexible querying

![cube_semantic_layer_agent.png](../../../static/img/cube_semantic_layer_agent.png)

## Note
- The toolkit uses Cube's REST API to interact with the data: [https://cube.dev/docs/reference/rest-api](https://cube.dev/docs/reference/rest-api)
- Cube FAQs, tips and tricks: [https://cube.dev/docs/product/faqs/tips-and-tricks](https://cube.dev/docs/product/faqs/tips-and-tricks)

## Quickstart

First, get required packages and set environment variables:

In [None]:
! pip install langchain openai

# Set env var OPENAI_API_KEY or load from a .env file
# import dotenv

# dotenv.load_dotenv()

Sample data for this notebook is sourced from the Cube examples, available at [active-users-example](https://github.com/cube-js/cube/tree/master/examples/recipes/active-users).
 
* Clone the repo [https://github.com/cube-js/cube](https://github.com/cube-js/cube)
* Go to the `examples/recipes/active-users` directory
* Run `docker-compose up`

Let's use `create_cube_agent` to build an agent that creates and executes Cube queries.

## Note

`CubeToolkit` relies on Cube's `/v1/load` and `/v1/meta` API.

- `/v1/load` retrieves data through Cube queries [https://cube.dev/docs/reference/rest-api#v1load](https://cube.dev/docs/reference/rest-api#v1load)
- `/v1/meta` is used to get the Cube model meta-information [https://cube.dev/docs/reference/rest-api#v1meta](https://cube.dev/docs/reference/rest-api#v1meta)
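
To make the two endpoints concrete, here is a minimal sketch of the HTTP requests behind them, built with the Python standard library only. It assumes the JSON query travels in a `query` URL parameter and the token in the `Authorization` header, as described in the Cube REST API reference. The helper name `build_cube_request` is hypothetical and is not part of `CubeToolkit`; the measure name comes from the sample model used in this notebook. The requests are constructed but not sent.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_cube_request(
    base_url: str, token: str, endpoint: str, query: Optional[dict] = None
) -> Request:
    """Build (but do not send) a GET request for a Cube REST endpoint.

    Hypothetical helper for illustration; not part of LangChain or Cube.
    """
    url = f"{base_url.rstrip('/')}/{endpoint}"
    if query is not None:
        # Assumption: the Cube query is passed as JSON in the `query` parameter.
        url += "?" + urlencode({"query": json.dumps(query)})
    return Request(url, headers={"Authorization": token}, method="GET")


base_url = "http://localhost:4000/cubejs-api/v1"
token = "CUBE_API_TOKEN"  # placeholder, replace with a real token

meta_request = build_cube_request(base_url, token, "meta")
load_request = build_cube_request(
    base_url, token, "load", {"measures": ["ActiveUsers.dailyActiveUsers"]}
)

print(meta_request.full_url)
print(load_request.full_url)
```

Passing either object to `urllib.request.urlopen` would perform the call against a running Cube instance.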

In [17]:
from langchain.agents import create_cube_agent
from langchain.agents.agent_toolkits import CubeToolkit
from langchain.chat_models import ChatOpenAI
from langchain.pydantic_v1 import SecretStr
from langchain.utilities.cube import Cube

# create Cube client
cube = Cube(
    cube_api_url="http://localhost:4000/cubejs-api/v1",
    cube_api_token=SecretStr("CUBE_API_TOKEN"),  # placeholder: replace with your Cube API token
)

# create Cube Toolkit
toolkit = CubeToolkit(
    cube=cube,
    llm=ChatOpenAI(temperature=0, verbose=True, model="gpt-4-1106-preview"),
)

# create cube agent
agent_executor = create_cube_agent(
    llm=ChatOpenAI(verbose=True, model="gpt-4-1106-preview"),
    toolkit=toolkit,
    verbose=True,
)

In [21]:
agent_executor.run(
    "I'd like to obtain this year specific data about our daily, weekly, and monthly active users."
)



> Entering new AgentExecutor chain...
Action: list_models_cube
Action Input: ""
Observation: | Model | Description |
| --- | --- |
| ActiveUsers | Active Users |

Thought: Now that I know there is an "ActiveUsers" model, I need to get the meta-information for this model to understand which dimensions and measures I can query.
Action: meta_information_cube
Action Input: "ActiveUsers"
Observation: ## Model: ActiveUsers

### Measures:
| Title | Description | Column | Type |
| --- | --- | --- | --- |
| Monthly Active Users |  | ActiveUsers.monthlyActiveUsers | number |
| Weekly Active Users |  | ActiveUsers.weeklyActiveUsers | number |
| Daily Active Users |  | ActiveUsers.dailyActiveUsers | number |
| WAU to MAU |  | ActiveUsers.wauToMau | number |

### Dimensions:
| Title | Description | Column | Type |
| --- | --- | --- | --- |
| Created at |  | ActiveUsers.createdAt | time |

Thought: Since the

'As of today, the data for 2023 shows that there are 4 daily active users, 35 weekly active users, and 147 monthly active users.'

## Text-to-Cube query


In [11]:
from langchain.chains import create_cube_query_chain

Let's create the chain that will build the Cube Query:


In [16]:
chain = create_cube_query_chain(ChatOpenAI(temperature=0), cube)
response = chain.invoke({"question": "Get WAU to MAU active users."})
print(response.json(indent=2, exclude_unset=True))

{
  "measures": [
    "ActiveUsers.wauToMau"
  ],
  "limit": 5,
  "order": {
    "ActiveUsers.wauToMau": "desc"
  }
}
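
The generated query is plain JSON in Cube's query format, so the same structure can also be assembled by hand. The sketch below builds a query with a time dimension. The `timeDimensions` field appears in the load response printed in this notebook, while the `dimension`, `granularity` and `dateRange` keys follow the Cube query format reference. The helper `make_cube_query` is hypothetical and only illustrates the shape of the payload.

```python
import json
from typing import List, Optional


def make_cube_query(
    measures: List[str],
    time_dimension: Optional[str] = None,
    granularity: Optional[str] = None,
    date_range: Optional[str] = None,
    limit: Optional[int] = None,
) -> dict:
    """Assemble a Cube query dict (hypothetical helper, for illustration)."""
    query: dict = {"measures": list(measures)}
    if time_dimension is not None:
        entry = {"dimension": time_dimension}
        if granularity is not None:
            entry["granularity"] = granularity
        if date_range is not None:
            entry["dateRange"] = date_range
        query["timeDimensions"] = [entry]
    if limit is not None:
        query["limit"] = limit
    return query


# Member names are taken from the ActiveUsers model used in this notebook.
monthly_query = make_cube_query(
    ["ActiveUsers.monthlyActiveUsers"],
    time_dimension="ActiveUsers.createdAt",
    granularity="month",
    date_range="this year",
)
print(json.dumps(monthly_query, indent=2))
```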


After building the Cube query based on a user question, we can execute the load:

In [13]:
cube.load(response)

{'query': {'measures': ['ActiveUsers.wauToMau'],
  'limit': 5,
  'order': [{'id': 'ActiveUsers.wauToMau', 'desc': True}],
  'rowLimit': 5,
  'timezone': 'UTC',
  'filters': [],
  'dimensions': [],
  'timeDimensions': []},
 'data': [{'ActiveUsers.wauToMau': '100.0000000000000000'}],
 'lastRefreshTime': '2023-11-09T09:11:43.571Z',
 'refreshKeyValues': [],
 'usedPreAggregations': {},
 'transformedQuery': {'sortedDimensions': [],
  'sortedTimeDimensions': [],
  'timeDimensions': [],
  'measures': ['ActiveUsers.wauToMau'],
  'leafMeasureAdditive': False,
  'leafMeasures': ['ActiveUsers.weeklyActiveUsers',
   'ActiveUsers.monthlyActiveUsers'],
  'hasNoTimeDimensionsWithoutGranularity': True,
  'allFiltersWithinSelectedDimensions': True,
  'isAdditive': False,
  'granularityHierarchies': {'year': ['year',
    'quarter',
    'month',
    'month',
    'day',
    'hour',
    'minute',
    'second'],
   'quarter': ['quarter', 'month', 'day', 'hour', 'minute', 'second'],
   'month': ['month', 'day

As we can see, the Cube Query Builder chain **only created** the query, and we handled the **query execution separately**.
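
Because execution is a separate step, the caller also has to post-process the raw load response. In the output above, the `data` rows map member names to values, and the numeric measure arrives as a string. Here is a minimal sketch of converting such values to floats, using a trimmed copy of that response; the helper `measure_values` is hypothetical.

```python
from typing import Any, Dict, List

# Trimmed copy of the load response shown above.
sample_response: Dict[str, Any] = {
    "query": {"measures": ["ActiveUsers.wauToMau"], "limit": 5},
    "data": [{"ActiveUsers.wauToMau": "100.0000000000000000"}],
}


def measure_values(response: Dict[str, Any], measure: str) -> List[float]:
    """Collect one measure from every data row as a float, skipping missing values."""
    values = []
    for row in response.get("data", []):
        raw = row.get(measure)
        if raw is not None:
            values.append(float(raw))
    return values


print(measure_values(sample_response, "ActiveUsers.wauToMau"))  # [100.0]
```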

We can also retrieve the SQL that Cube generates for the query, which is what gets executed in the database.

## Note

`cube.sql` relies on Cube's `/v1/sql` API.

- `/v1/sql` returns the SQL generated for a Cube query [https://cube.dev/docs/reference/rest-api#v1sql](https://cube.dev/docs/reference/rest-api#v1sql)
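
In the result printed in this notebook, the `sql` entry is a two-item list: the statement text followed by its parameter list. Here is a minimal sketch for unpacking it, assuming that shape holds. The sample dict is an abbreviated stand-in with a placeholder statement, and `split_sql` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple

# Abbreviated stand-in for the value returned by `cube.sql(...)`.
# The statement is a placeholder, not the SQL Cube actually generates.
sample_sql_result: Dict[str, Any] = {
    "external": False,
    "sql": ["SELECT 1 AS placeholder", []],
    "dataSource": "default",
}


def split_sql(result: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Return (statement, parameters) from a result shaped like the one above."""
    statement, params = result["sql"]
    return statement, list(params)


statement, params = split_sql(sample_sql_result)
print(statement)
print(params)
```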

In [14]:
cube.sql(response)

{'external': False,
 'sql': ['SELECT 100.000 * "active_users__weekly_active_users" / NULLIF("active_users__monthly_active_users", 0) "active_users__wau_to_mau" FROM (SELECT count(distinct "active_users_weekly_active_users_cumulative___active_users".user_id) "active_users__weekly_active_users" FROM (SELECT user_id, created_at FROM public.orders) AS "active_users_weekly_active_users_cumulative___active_users" ) as q_0 , (SELECT count(distinct "active_users_monthly_active_users_cumulative___active_users".user_id) "active_users__monthly_active_users" FROM (SELECT user_id, created_at FROM public.orders) AS "active_users_monthly_active_users_cumulative___active_users" ) as q_1 ORDER BY 1 DESC LIMIT 5',
  []],
 'order': {'ActiveUsers.wauToMau': 'desc'},
 'cacheKeyQueries': [['SELECT FLOOR((UNIX_TIMESTAMP()) / 10) as refresh_key',
   [],
   {'external': True, 'renewalThreshold': 10}]],
 'preAggregations': [],
 'dataSource': 'default',
 'aliasNameToMember': {'active_users__wau_to_mau': 'ActiveU