feat(sql): extend create table statement syntax to support "IN VOLUME volume-alias" #2931

Merged
298 commits merged into master from ma/in-volume on Feb 15, 2023

marregui (Contributor) commented Jan 18, 2023

Summary

This PR proposes a syntax extension for the create table statement.

It addresses issue #2884 and complements PR #2710.

Documented in https://github.com/questdb/questdb.io/pull/1341

Create table syntax

All of the following are valid statements. They are also valid with WAL and with IF NOT EXISTS:

create table table (i symbol, ts timestamp), index(i capacity 32) timestamp(ts) partition by day with maxUncommittedRows=7, o3MaxLag=12d, in volume SECONDARY_VOLUME;

create table table (i symbol, ts timestamp), index(i capacity 32) with maxUncommittedRows=7, o3MaxLag=12d, in volume SECONDARY_VOLUME;

create table table (i symbol, ts timestamp), index(i capacity 32) with maxUncommittedRows=7, in volume SECONDARY_VOLUME;

create table table (i symbol, ts timestamp), index(i capacity 32) in volume SECONDARY_VOLUME;

create table table (i symbol, ts timestamp), index(i capacity 32) timestamp(ts) in volume SECONDARY_VOLUME;

...

create table table (i symbol, ts timestamp) in volume 'SECONDARY_VOLUME';
create table table (i symbol, ts timestamp) in volume SECONDARY_VOLUME;

The WITH clause can now appear independently of the PARTITION BY clause; previously it could not. When IN VOLUME is used together with WITH, it must be preceded by a ',' (as if it were part of the WITH clause). When WITH is absent, IN VOLUME must not be preceded by a ','. IN VOLUME is optional and, when present, is always the last clause of the CREATE TABLE statement.
Because the volume alias can be single-quoted, any name is valid for it, including names that contain white space.
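The clause-ordering and comma rules above can be illustrated with a small regex-based check. This is a hypothetical sketch and not QuestDB's SQL parser. The class InVolumeClause and its alias method are invented for illustration. The sketch also detects WITH naively by keyword, so a column named "with" would confuse it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the IN VOLUME clause rules; not QuestDB's SQL parser.
final class InVolumeClause {
    // Optional leading comma, "in volume", then a single-quoted or bare alias,
    // anchored to the end of the statement (an optional ';' is allowed).
    private static final Pattern IN_VOLUME = Pattern.compile(
            "(,)?\\s*\\bin\\s+volume\\s+(?:'([^']*)'|([^\\s;']+))\\s*;?\\s*$",
            Pattern.CASE_INSENSITIVE);
    // Naive keyword check: a column or alias named "with" would fool it.
    private static final Pattern WITH = Pattern.compile("\\bwith\\b", Pattern.CASE_INSENSITIVE);

    private InVolumeClause() {}

    /** Returns the volume alias, or null when the statement has no IN VOLUME clause. */
    static String alias(String sql) {
        Matcher m = IN_VOLUME.matcher(sql);
        if (!m.find()) {
            return null;
        }
        boolean hasComma = m.group(1) != null;
        boolean hasWith = WITH.matcher(sql.substring(0, m.start())).find();
        if (hasWith && !hasComma) {
            throw new IllegalArgumentException("',' expected before 'in volume' when WITH is present");
        }
        if (!hasWith && hasComma) {
            throw new IllegalArgumentException("unexpected ',' before 'in volume' without WITH");
        }
        // Group 2 holds a quoted alias (which may contain spaces), group 3 a bare one.
        return m.group(2) != null ? m.group(2) : m.group(3);
    }
}
```

The pattern is anchored to the end of the statement. As a result, an IN VOLUME clause that is not last is simply not recognized by this sketch, rather than being reported as an error.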

Behavior

The table is created in the target volume, and a symbolic link pointing to it is created in the table's standard root folder. From then on, the table behaves exactly as if it had been created in the main (default) volume, with one exception: drop table removes the link, but the data remains intact in the volume. This becomes a problem if you need to re-create the table with the same name in that volume. This is intentional. You must go to the volume manually and either rename or delete the table folder.
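The on-disk behavior described above can be sketched with java.nio.file. This is a minimal illustration under assumptions and not QuestDB's implementation:

- The class VolumeTableLayout, its methods and the tableDirName parameter are hypothetical.
- The file system is assumed to support symbolic links.
- A leftover folder in the volume is assumed to block re-creation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

// Hypothetical sketch of the volume layout described above; not QuestDB's implementation.
final class VolumeTableLayout {
    private VolumeTableLayout() {}

    /** Creates the table folder inside the volume and a symbolic link to it in the database root. */
    static Path createInVolume(Path dbRoot, Path volumeRoot, String tableDirName) throws IOException {
        Path target = volumeRoot.resolve(tableDirName);
        Path link = dbRoot.resolve(tableDirName);
        if (Files.exists(link, LinkOption.NOFOLLOW_LINKS)) {
            throw new IOException("table already exists: " + link);
        }
        // Assumption: a folder left behind by an earlier drop blocks re-creation
        // until it is renamed or deleted manually.
        if (Files.exists(target, LinkOption.NOFOLLOW_LINKS)) {
            throw new IOException("folder already exists in volume: " + target);
        }
        Files.createDirectories(target);
        return Files.createSymbolicLink(link, target);
    }

    /** Drops a volume-backed table by deleting only the link; the data stays in the volume. */
    static void drop(Path dbRoot, String tableDirName) throws IOException {
        Path link = dbRoot.resolve(tableDirName);
        if (!Files.isSymbolicLink(link)) {
            throw new IOException("not a volume-backed table: " + link);
        }
        // Files.delete removes the symbolic link itself, not the folder it points to.
        Files.delete(link);
    }
}
```

Dropping removes only the link. As a result, re-creating the same table in the same volume fails in this sketch until the leftover folder is renamed or deleted, mirroring the manual step described above.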

Configuration

  • the new server.conf property cairo.volumes takes a comma-separated list of entries:

    cairo.volumes=SECONDARY_VOLUME -> /Users/Godzilla/left/paw, BIN -> /var/bin, ...
    

    where the token before the arrow is an alias for the absolute path after the arrow. Either token can optionally be single-quoted.

  • aliases are case-insensitive: any capitalization of an alias refers to the same volume.

  • volume paths must be absolute and must exist both at bootstrap time and when the table is created.

  • by default cairo.volumes is an empty list, which means the feature is disabled.
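The configuration rules above can be sketched as a parser that turns a cairo.volumes value into a validated, case-insensitive alias-to-path map. This is a minimal sketch and not QuestDB's VolumeDefinitions. The names VolumeConfig, parse and resolve are hypothetical. The sketch also assumes that entries never contain commas, because it simply splits on ','.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical parser for the cairo.volumes property value; not QuestDB's VolumeDefinitions.
final class VolumeConfig {
    private final Map<String, Path> volumes = new LinkedHashMap<>();

    private VolumeConfig() {}

    static VolumeConfig parse(String value) {
        VolumeConfig config = new VolumeConfig();
        if (value == null || value.isBlank()) {
            return config; // empty list: the feature is disabled
        }
        // Assumption: entries are split on ',', so aliases and paths may not contain commas.
        for (String entry : value.split(",")) {
            int arrow = entry.indexOf("->");
            if (arrow < 0) {
                throw new IllegalArgumentException("expected 'alias -> path', got: " + entry.trim());
            }
            String alias = unquote(entry.substring(0, arrow));
            Path path = Paths.get(unquote(entry.substring(arrow + 2)));
            if (alias.isEmpty()) {
                throw new IllegalArgumentException("empty alias in: " + entry.trim());
            }
            if (!path.isAbsolute()) {
                throw new IllegalArgumentException("volume path must be absolute: " + path);
            }
            if (!Files.isDirectory(path)) {
                throw new IllegalArgumentException("volume path is not an existing directory: " + path);
            }
            // Aliases are case-insensitive, so they are stored in a single canonical case.
            if (config.volumes.putIfAbsent(alias.toLowerCase(Locale.ROOT), path) != null) {
                throw new IllegalArgumentException("duplicate volume alias: " + alias);
            }
        }
        return config;
    }

    boolean isEnabled() {
        return !volumes.isEmpty();
    }

    /** Looks up an alias in any capitalization; returns null if it is not defined. */
    Path resolve(String alias) {
        return volumes.get(unquote(alias).toLowerCase(Locale.ROOT));
    }

    // Strips surrounding whitespace and optional single quotes.
    private static String unquote(String token) {
        String t = token.trim();
        if (t.length() >= 2 && t.charAt(0) == '\'' && t.charAt(t.length() - 1) == '\'') {
            t = t.substring(1, t.length() - 1);
        }
        return t;
    }
}
```

Validating existence at parse time corresponds to the bootstrap-time check. A separate check would still be needed when the table is created, since a directory can disappear in between.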

In a Docker container

Given this Dockerfile:

FROM python:3.10.9-slim-buster

EXPOSE 8888/tcp
EXPOSE 8812/tcp
EXPOSE 9000/tcp
EXPOSE 9009/tcp

ENV ARCHITECTURE=x64
ENV PYTHONUNBUFFERED 1
ENV JAVA_HOME=/usr/lib/jvm/java-17-amazon-corretto
ENV PATH="$JAVA_HOME/bin:${PATH}"

# Update system
RUN apt-get -y update
RUN apt-get -y upgrade
RUN apt-get -y --no-install-recommends install syslog-ng ca-certificates git curl wget vim procps gnupg2 lsb-release software-properties-common unzip less tar gzip iputils-ping

# Install JDK
RUN wget -O- https://apt.corretto.aws/corretto.key | gpg --dearmor | tee /etc/apt/trusted.gpg.d/corretto.gpg >/dev/null && \
    add-apt-repository 'deb https://apt.corretto.aws stable main' && \
    apt-get update && \
    apt-get install -y java-17-amazon-corretto-jdk=1:17.0.3.6-1

# Clean after packages installation
RUN apt-get clean
RUN rm -rf /var/lib/apt/lists/*

# Aliases
RUN echo "alias l='ls -l'" >> ~/.bashrc
RUN echo "alias ll='ls -la'" >> ~/.bashrc
RUN echo "alias rm='rm -i'" >> ~/.bashrc

WORKDIR /opt

# Install QuestDB
COPY questdb-6.7.1-SNAPSHOT-no-jre-bin.tar.gz questdb.tar.gz
RUN tar xvfz questdb.tar.gz
RUN rm questdb.tar.gz
RUN mv questdb-6.7.1-SNAPSHOT-no-jre-bin questdb

# Configure QuestDB
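# Note: ulimit in a RUN step only applies to that build step and does not persist
# into the running container; use docker run --ulimit to set limits at run time.
RUN ulimit -S unlimited
RUN ulimit -H unlimited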
RUN mkdir csv
RUN mkdir tmp
RUN mkdir backups
RUN mkdir volume0
RUN mkdir volume1
RUN mkdir questdb/db
RUN mkdir questdb/conf
RUN echo "config.validation.strict=true" > questdb/conf/server.conf
RUN echo "query.timeout.sec=120" >> questdb/conf/server.conf
RUN echo "cairo.volumes=volume0->/opt/volume0, volume1->/opt/volume1" >> questdb/conf/server.conf
RUN echo "cairo.sql.copy.root=/opt/csv" >> questdb/conf/server.conf
RUN echo "cairo.sql.copy.work.root=/opt/tmp" >> questdb/conf/server.conf
RUN echo "cairo.sql.backup.root=/opt/backups" >> questdb/conf/server.conf

# Install requirements.txt
COPY requirements.txt .
RUN pip install --upgrade pip
RUN pip install --no-compile --only-binary :all: -r requirements.txt
# Uncomment the setting and flip it to True so Jupyter accepts remote connections
RUN jupyter-lab --generate-config && sed -i -e "s|# c.ServerApp.allow_remote_access = False|c.ServerApp.allow_remote_access = True|g" /root/.jupyter/jupyter_lab_config.py

# Create run.sh script
RUN echo "#!/bin/bash" > run.sh
RUN echo "/opt/questdb/questdb.sh start -d /opt/questdb" >> run.sh
RUN echo "jupyter-lab --allow-root --ip 0.0.0.0 --port 8888 --no-browser --notebook-dir /opt/notebooks /opt/notebooks/play.ipynb" >> run.sh
RUN chmod 700 run.sh

COPY notebooks notebooks
CMD ["/bin/bash", "-c", "/opt/run.sh"]

(You need to build questdb-6.7.1-SNAPSHOT-no-jre-bin.tar.gz from my branch and place it next to the Dockerfile above; the COPY step renames it to questdb.tar.gz.)

(Build the image with: docker build -t io.questdb.play:1.0-SNAPSHOT .)

This command will work just fine:

docker run --rm \
    -p 8888:8888 \
    -p 8812:8812 \
    -p 9009:9009 \
    -p 9000:9000 \
    --name questdb-play \
    -v /Users/marregui/QUEST/db:/opt/questdb/db \
    -v /Users/marregui/QUEST/notebooks:/opt/notebooks \
    -v /Users/marregui/QUEST/backups:/opt/backups \
    -v /Users/marregui/QUEST/csv:/opt/csv \
    -v /Users/marregui/OTHER/volume0:/opt/volume0 \
    -v /Users/marregui/OTHER/volume1:/opt/volume1 \
    -it io.questdb.play:1.0-SNAPSHOT

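Each bind mount maps one part of the QuestDB file system to its own host directory: the database root, notebooks, backups, the CSV import root and the two volumes. This segregates the file system at that level of granularity.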

Note the entry cairo.volumes=volume0->/opt/volume0, volume1->/opt/volume1 in server.conf.

Then these queries will work from the web console:

create table if not exists trade1 (
    sym symbol index capacity 128,
    px long,
    qty int,
    leverage float,
    ts timestamp
) timestamp(ts) partition by day in volume volume0;

insert into trade1 values('A', 200000, 3, 0.9,  '2022-12-31T12:00:00.000000Z');
insert into trade1 values('B', 198000, 4, 0.97, '2022-12-31T12:00:10.000000Z');
insert into trade1 values('C', 199000, 4, 0.85, '2022-12-31T12:00:11.000000Z');
insert into trade1 values('A', 200000, 3, 0.9,  '2023-01-01T12:00:00.000000Z');
insert into trade1 values('B', 198000, 4, 0.97, '2023-01-01T12:00:10.000000Z');
insert into trade1 values('C', 199000, 4, 0.85, '2023-01-01T12:00:11.000000Z');

create table if not exists byb_ltinfo (
    sym symbol index capacity 128,
    price long,
    quantity int,
    ts timestamp
) timestamp(ts) partition by day in volume volume1;


insert into byb_ltinfo values('A', 200300, 2, '2022-12-31T12:00:01.000000Z');
insert into byb_ltinfo values('B', 198200, 1, '2022-12-31T12:00:11.000000Z');
insert into byb_ltinfo values('C', 199000, 1, '2022-12-31T12:00:12.000000Z');
insert into byb_ltinfo values('A', 200300, 2, '2023-01-01T12:00:01.000000Z');
insert into byb_ltinfo values('B', 198200, 1, '2023-01-01T12:00:11.000000Z');
insert into byb_ltinfo values('C', 199000, 1, '2023-01-01T12:00:12.000000Z');

Each table is created in a different volume: trade1 in volume0 and byb_ltinfo in volume1.

marregui requested a review from ideoma on February 2, 2023 15:13
marregui requested a review from ideoma on February 2, 2023 18:48
ideoma (Collaborator) commented Feb 15, 2023

[PR Coverage check]

😍 pass : 320 / 343 (93.29%)

file detail

| path | covered lines | new lines | coverage |
|---|---|---|---|
| 🔵 io/questdb/cairo/DefaultCairoConfiguration.java | 1 | 2 | 50.00% |
| 🔵 io/questdb/cairo/TableNameRegistryFileStore.java | 11 | 15 | 73.33% |
| 🔵 io/questdb/cairo/CairoEngine.java | 45 | 52 | 86.54% |
| 🔵 io/questdb/griffin/SqlCompiler.java | 22 | 25 | 88.00% |
| 🔵 io/questdb/cairo/TableUtils.java | 80 | 87 | 91.95% |
| 🔵 io/questdb/griffin/SqlParser.java | 33 | 34 | 97.06% |
| 🔵 io/questdb/griffin/engine/functions/catalogue/TableListFunctionFactory.java | 28 | 28 | 100.00% |
| 🔵 io/questdb/Bootstrap.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/PropServerConfiguration.java | 3 | 3 | 100.00% |
| 🔵 io/questdb/cairo/ReverseTableMapItem.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/ServerConfigurationException.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/VolumeDefinitions.java | 66 | 66 | 100.00% |
| 🔵 io/questdb/std/FilesFacadeImpl.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/std/Files.java | 6 | 6 | 100.00% |
| 🔵 io/questdb/PropertyKey.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/cutlass/text/ParallelCsvFileImporter.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/griffin/model/CreateTableModel.java | 8 | 8 | 100.00% |
| 🔵 io/questdb/griffin/SqlKeywords.java | 9 | 9 | 100.00% |

bluestreak01 merged commit 21ca8e9 into master on Feb 15, 2023
bluestreak01 deleted the ma/in-volume branch on February 15, 2023 12:20
Labels: New feature (Feature requests), SQL (Issues or changes relating to SQL execution)