graph,graphql,server,store: Subgraph Sql Service #5382

Closed
wants to merge 9 commits into from
25 changes: 24 additions & 1 deletion Cargo.lock

9 changes: 8 additions & 1 deletion Cargo.toml
@@ -10,6 +10,7 @@ members = [
"store/*",
"substreams/*",
"graph",
"graph/derive",

Collaborator

This is already listed two lines below

"tests",
"graph/derive",
]
@@ -24,7 +25,13 @@ repository = "https://github.com/graphprotocol/graph-node"
license = "MIT OR Apache-2.0"

[workspace.dependencies]
diesel = { version = "2.1.3", features = ["postgres", "serde_json", "numeric", "r2d2", "chrono"] }
diesel = { version = "2.1.3", features = [
"postgres",
"serde_json",
"numeric",
"r2d2",
"chrono",
] }

Collaborator

This doesn't seem to make any changes

Contributor Author

rustfmt did this, gonna revert it.

diesel-derive-enum = { version = "2.1.0", features = ["postgres"] }
diesel_derives = "2.1.3"
diesel-dynamic-schema = "0.2.1"
1 change: 1 addition & 0 deletions docs/environment-variables.md
@@ -135,6 +135,7 @@ those.
`X-GraphTraceQuery` set to this value will include a trace of the SQL
queries that were run. Defaults to the empty string which disables
tracing.
- `GRAPH_GRAPHQL_ENABLE_SQL_SERVICE`: enables the sql service integration. This allows clients to execute `sql()` operations on subgraphs.

### GraphQL caching

119 changes: 119 additions & 0 deletions docs/sql_service.md
@@ -0,0 +1,119 @@
# Subgraph:SQL Service

The Subgraph:SQL Service, developed by Semiotic Labs in collaboration with The Guild
and Edge & Node, offers a secure SQL interface for querying a subgraph's entities.
To deploy this with minimal changes to the existing indexer stack, consumers (or the
Studio they use) can wrap an SQL query in a GraphQL query.

Collaborator

Is this paragraph the most important thing somebody who looks at this for the first time needs to know?


## Querying with Subgraph:SQL Service

### Running Queries

Say we have the following SQL query:

```sql
SELECT * FROM users WHERE age > 18
```

The Subgraph:SQL Service allows consumers to create a corresponding GraphQL query
using the Subgraph:SQL Service `sql` field, with a `query` field containing the SQL
query:

```graphql
query {
  sql(input: {
    query: "SELECT * FROM users WHERE age > 18",
    format: JSON
  }) {
    ... on SqlJSONOutput {
      columns
      rowCount
      rows
    }
  }
}
```

We use the `sql` field in the GraphQL query, passing an input object with the SQL
query, optional parameters, and format. The SQL query selects all columns from the
`users` table where the `age` column is greater than 18, returning the requested
data formatted as JSON.
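
As a rough illustration of the wrapping step, the following Rust sketch builds the GraphQL document for a given SQL string. The helper names `escape_graphql_string` and `wrap_sql_in_graphql` are hypothetical and not part of Graph Node. The sketch assumes the `SqlJSONOutput` selection used in the example query and only escapes the characters that commonly appear in SQL text.

```rust
/// Escapes a string so it can be embedded in a GraphQL string literal.
/// Only quotes, backslashes, and common whitespace escapes are handled.
fn escape_graphql_string(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for ch in raw.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            other => out.push(other),
        }
    }
    out
}

/// Wraps `sql` in the `sql` root field and requests JSON output.
/// (Hypothetical helper; the field and type names come from the example query.)
fn wrap_sql_in_graphql(sql: &str) -> String {
    format!(
        "query {{ sql(input: {{ query: \"{}\", format: JSON }}) {{ ... on SqlJSONOutput {{ columns rowCount rows }} }} }}",
        escape_graphql_string(sql)
    )
}
```

Calling `wrap_sql_in_graphql("SELECT * FROM users WHERE age > 18")` yields a single-line document equivalent to the query shown above. Escaping matters as soon as the SQL text itself contains double quotes, for example around a quoted identifier.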

### SQL Parameters and Bind Parameters

#### SQL Query Parameters

You can pass optional SQL query parameters to the SQL query as positional parameters.
The parameters are converted to the SQL types based on the GraphQL types of the parameters.
In the GraphQL schema, parameters are passed as an array of `SqlVariable` objects
within the `parameters` field of the `SqlInput` input object. See the GraphQL schema
types in `graph/src/schema/sql.graphql`.

Collaborator

I don't understand what this is trying to tell me - how do I use these variables if there are no bind variables?


#### Bind Parameters

We currently do not support bind parameters, but plan to support this feature in a future
version of Graph Node.

## Configuration

The Subgraph:SQL Service can be enabled or disabled using the `GRAPH_GRAPHQL_ENABLE_SQL_SERVICE`
environment variable.

- **Environment Variable:** `GRAPH_GRAPHQL_ENABLE_SQL_SERVICE`
- **Default State:** Off (Disabled)

Collaborator

If you set this to true to turn it on, it would make sense to say that the default is false

- **Purpose:** Enables queries on the `sql()` field of the root query.
- **Impact on Schema:** Adds a global `SqlInput` type to the GraphQL schema. The `sql`
field accepts values of this type.

To enable the Subgraph:SQL Service, set the `GRAPH_GRAPHQL_ENABLE_SQL_SERVICE` environment
variable to `true` or `1`. This allows clients to execute SQL queries using the

Collaborator

I would just say to set this to true

`sql()` field in GraphQL queries.

```bash
export GRAPH_GRAPHQL_ENABLE_SQL_SERVICE=true
```

Alternatively, configure the environment variable in your deployment scripts or
environment setup as needed.
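
For readers who want to see what the flag check amounts to, here is a minimal Rust sketch. It assumes that only the literal values `true` and `1` enable the service, as stated above. The function names are hypothetical, and Graph Node's own environment parsing may accept other spellings.

```rust
use std::env;

/// Interprets the raw value of the variable.
/// Assumption: only `true` and `1` switch the service on; unset means off.
fn parse_enable_flag(raw: Option<&str>) -> bool {
    matches!(raw.map(str::trim), Some("true") | Some("1"))
}

/// Reads `GRAPH_GRAPHQL_ENABLE_SQL_SERVICE` from the process environment.
/// (Hypothetical helper, not Graph Node's actual configuration code.)
fn sql_service_enabled() -> bool {
    parse_enable_flag(env::var("GRAPH_GRAPHQL_ENABLE_SQL_SERVICE").ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the default (disabled when the variable is unset) easy to test without touching process state.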

### SQL Coverage

The Subgraph:SQL Service covers a wide range of SQL functionality, allowing you to execute
`SELECT` queries against your database. It supports basic querying, parameter binding, and
result formatting into JSON or CSV.

#### Whitelisted and Blacklisted SQL Functions

The `POSTGRES_WHITELISTED_FUNCTIONS` constant contains a whitelist of SQL functions that are
permitted to be used within SQL queries executed by the Subgraph:SQL Service, while `POSTGRES_BLACKLISTED_FUNCTIONS`
serves as a safety mechanism to restrict the usage of certain PostgreSQL functions within SQL
queries. These blacklisted functions are deemed inappropriate or potentially harmful to the
system's integrity or performance. Both constants are defined in `store/postgres/src/sql/constants.rs`.

Collaborator

It's sorta icky, but it would be nicer for users to just list what's on the whitelist. My understanding is that anything not on the whitelist is forbidden. Mentioning a blacklist here just makes users wonder what happens to functions that are on neither the whitelist nor the blacklist.


### SQL Query Validation

Graph Node's SQL query validation ensures that SQL queries adhere to predefined criteria:

- **Function Name Validation**: Validates function names used within SQL queries, distinguishing
between unknown, whitelisted, and blacklisted functions.
- **Statement Validation**: Validates SQL statements, ensuring that only `SELECT` queries are
supported and that multi-statement queries are not allowed.
- **Table Name Validation**: Validates table names referenced in SQL queries, identifying
unknown tables and ensuring compatibility with the schema.
- **Common Table Expression (CTE) Handling**: Handles common table expressions, adding them
to the set of known tables during validation.

See the test suite in `store/postgres/src/sql/validation.rs` for examples of various scenarios
and edge cases encountered during SQL query validation, including function whitelisting and
blacklisting, multi-statement queries, unknown table references, and more.
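
The checks listed above can be sketched in a few lines of Rust. This is a deliberately simplified, string-level illustration and not the implementation in `store/postgres/src/sql/validation.rs`: it does not parse SQL, it treats any semicolon other than a trailing one as a statement separator (even inside a string literal), and it does not handle CTEs. The list entries and all names below are hypothetical placeholders; the real lists are the constants in `store/postgres/src/sql/constants.rs`.

```rust
#[derive(Debug, PartialEq)]
enum StatementError {
    MultiStatement,
    NotSelect,
}

#[derive(Debug, PartialEq)]
enum FunctionStatus {
    Whitelisted,
    Blacklisted,
    Unknown,
}

// Placeholder entries for illustration only.
const WHITELIST: &[&str] = &["count", "sum", "lower"];
const BLACKLIST: &[&str] = &["pg_sleep"];

/// Accepts a single `SELECT` statement and rejects everything else.
fn check_statement(sql: &str) -> Result<(), StatementError> {
    let trimmed = sql.trim();
    // One trailing semicolon is tolerated; any other one counts as a separator.
    let body = trimmed.strip_suffix(';').unwrap_or(trimmed);
    if body.contains(';') {
        return Err(StatementError::MultiStatement);
    }
    let first_word = body.split_whitespace().next().unwrap_or("");
    if !first_word.eq_ignore_ascii_case("select") {
        return Err(StatementError::NotSelect);
    }
    Ok(())
}

/// Classifies a function name case-insensitively against both lists.
fn classify_function(name: &str) -> FunctionStatus {
    let name = name.to_ascii_lowercase();
    if WHITELIST.contains(&name.as_str()) {
        FunctionStatus::Whitelisted
    } else if BLACKLIST.contains(&name.as_str()) {
        FunctionStatus::Blacklisted
    } else {
        FunctionStatus::Unknown
    }
}
```

In this sketch a caller would reject any function whose status is not `Whitelisted`, so the blacklist only changes which error is reported, not which queries are accepted.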

### Relating GraphQL Schema to Tables

The GraphQL schema provided by the Subgraph:SQL Service reflects the structure of the SQL queries
it can execute. It does not directly represent tables in a database. Users need to
construct SQL queries compatible with their database schema.

Collaborator

This does not explain at all how I go from the GraphQL subgraph schema to the schema against which I can execute queries. If I have an entity type DailyPositionSnapshot, what's the table I query? Does it matter if that type is mutable, immutable, or a timeseries?

To avoid locking ourselves into how data is currently stored, I would suggest the following:

  • the table for an `@entity` type is the name of the type in snake case
  • the columns of the table are all the non-derived attributes of the type, also in snake case
  • for aggregations, the table name is the snake-cased name of the type together with the interval, i.e. `type Stats @aggregation(intervals: ["hour", "day"], source: "Data") { .. }` becomes `stats('hour')` in a SQL query
  • queries are always executed at a specific block, determined by the block constraint on the `sql` element
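
A minimal Rust sketch of the naming rule suggested in this list, for illustration only. Both function names are hypothetical, the conversion is a hand-rolled ASCII version rather than whatever inflection library Graph Node uses, and names containing acronyms may come out differently in practice.

```rust
/// Converts a GraphQL type or attribute name to snake case,
/// e.g. `DailyPositionSnapshot` becomes `daily_position_snapshot`.
fn to_snake_case(name: &str) -> String {
    let mut out = String::with_capacity(name.len() + 4);
    let mut prev_lower_or_digit = false;
    for ch in name.chars() {
        if ch.is_ascii_uppercase() {
            // Start a new word only after a lowercase letter or digit.
            if prev_lower_or_digit {
                out.push('_');
            }
            out.push(ch.to_ascii_lowercase());
            prev_lower_or_digit = false;
        } else {
            out.push(ch);
            prev_lower_or_digit = ch.is_ascii_lowercase() || ch.is_ascii_digit();
        }
    }
    out
}

/// Table expression for an aggregation type at a given interval,
/// following the `stats('hour')` form proposed above.
fn aggregation_table(type_name: &str, interval: &str) -> String {
    format!("{}('{}')", to_snake_case(type_name), interval)
}
```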


Thanks, you (@lutter) have already summed up some part of it. Let's expand on this with examples.
Can you give a subgraph example with aggregation?

Collaborator

This subgraph is kinda silly because it just aggregates block numbers, but it explains how to use aggregations in various ways. The aggregation docs also have some more realistic examples.


### Queryable Attributes/Columns

The columns that can be queried depend on the SQL query provided. In the example GraphQL
query above, the columns returned would be all columns from the `users` table.
4 changes: 3 additions & 1 deletion graph/src/components/store/traits.rs
@@ -14,7 +14,7 @@ use crate::components::transaction_receipt;
use crate::components::versions::ApiVersion;
use crate::data::query::Trace;
use crate::data::store::ethereum::call;
use crate::data::store::QueryObject;
use crate::data::store::{QueryObject, SqlQueryObject};
use crate::data::subgraph::{status, DeploymentFeatures};
use crate::data::{query::QueryTarget, subgraph::schema::*};
use crate::prelude::{DeploymentState, NodeId, QueryExecutionError, SubgraphName};
@@ -575,6 +575,8 @@ pub trait QueryStore: Send + Sync {
        query: EntityQuery,
    ) -> Result<(Vec<QueryObject>, Trace), QueryExecutionError>;

    fn execute_sql(&self, sql: &str) -> Result<Vec<SqlQueryObject>, QueryExecutionError>;

    async fn is_deployment_synced(&self) -> Result<bool, Error>;

    async fn block_ptr(&self) -> Result<Option<BlockPtr>, StoreError>;
54 changes: 49 additions & 5 deletions graph/src/data/graphql/ext.rs
@@ -1,18 +1,19 @@
use anyhow::Error;
use inflector::Inflector;

use super::ObjectOrInterface;
use super::QueryableType;
use crate::prelude::s::{
    self, Definition, Directive, Document, EnumType, Field, InterfaceType, ObjectType, Type,
    TypeDefinition, Value,
    TypeDefinition, UnionType, Value,
};
use crate::prelude::{ValueType, ENV_VARS};
use crate::schema::{META_FIELD_TYPE, SCHEMA_TYPE_NAME};
use crate::schema::{META_FIELD_TYPE, SCHEMA_TYPE_NAME, SQL_FIELD_TYPE};
use std::collections::{BTreeMap, HashMap};

pub trait ObjectTypeExt {
    fn field(&self, name: &str) -> Option<&Field>;
    fn is_meta(&self) -> bool;
    fn is_sql(&self) -> bool;
}

impl ObjectTypeExt for ObjectType {
@@ -23,6 +24,10 @@ impl ObjectTypeExt for ObjectType {
    fn is_meta(&self) -> bool {
        self.name == META_FIELD_TYPE
    }

    fn is_sql(&self) -> bool {
        self.name == SQL_FIELD_TYPE
    }
}

impl ObjectTypeExt for InterfaceType {
@@ -33,13 +38,35 @@ impl ObjectTypeExt for InterfaceType {
    fn is_meta(&self) -> bool {
        false
    }

    fn is_sql(&self) -> bool {
        false
    }
}

impl ObjectTypeExt for UnionType {
    fn field(&self, _name: &str) -> Option<&Field> {
        None
    }

    fn is_meta(&self) -> bool {
        false
    }

    fn is_sql(&self) -> bool {
        self.name == SQL_FIELD_TYPE
    }
}

pub trait DocumentExt {
    fn get_object_type_definitions(&self) -> Vec<&ObjectType>;

    fn get_interface_type_definitions(&self) -> Vec<&InterfaceType>;

    fn get_union_definitions(&self) -> Vec<&UnionType>;

    fn get_union_definition(&self, name: &str) -> Option<&UnionType>;

    fn get_object_type_definition(&self, name: &str) -> Option<&ObjectType>;

    fn get_object_and_interface_type_fields(&self) -> HashMap<&str, &Vec<Field>>;
@@ -54,7 +81,7 @@ pub trait DocumentExt {

    fn get_root_subscription_type(&self) -> Option<&ObjectType>;

    fn object_or_interface(&self, name: &str) -> Option<ObjectOrInterface<'_>>;
    fn object_or_interface(&self, name: &str) -> Option<QueryableType<'_>>;

    fn get_named_type(&self, name: &str) -> Option<&TypeDefinition>;

@@ -120,6 +147,22 @@ impl DocumentExt for Document {
            .collect()
    }

    fn get_union_definitions(&self) -> Vec<&UnionType> {
        self.definitions
            .iter()
            .filter_map(|d| match d {
                Definition::TypeDefinition(TypeDefinition::Union(t)) => Some(t),
                _ => None,
            })
            .collect()
    }

    fn get_union_definition(&self, name: &str) -> Option<&UnionType> {
        self.get_union_definitions()
            .into_iter()
            .find(|object_type| object_type.name.eq(name))
    }

    fn find_interface(&self, name: &str) -> Option<&InterfaceType> {
        self.definitions.iter().find_map(|d| match d {
            Definition::TypeDefinition(TypeDefinition::Interface(t)) if t.name == name => Some(t),
@@ -174,10 +217,11 @@ impl DocumentExt for Document {
            .next()
    }

    fn object_or_interface(&self, name: &str) -> Option<ObjectOrInterface<'_>> {
    fn object_or_interface(&self, name: &str) -> Option<QueryableType<'_>> {
        match self.get_named_type(name) {
            Some(TypeDefinition::Object(t)) => Some(t.into()),
            Some(TypeDefinition::Interface(t)) => Some(t.into()),
            Some(TypeDefinition::Union(u)) => Some(u.into()),
            _ => None,
        }
    }
4 changes: 2 additions & 2 deletions graph/src/data/graphql/mod.rs
@@ -25,8 +25,8 @@ pub mod shape_hash;

pub mod load_manager;

pub mod object_or_interface;
pub use object_or_interface::ObjectOrInterface;
pub mod queryable_type;
pub use queryable_type::QueryableType;

pub mod object_macro;
pub use crate::object;