# SQL Compatibility

Thus far we have focused on RQL (RAW Query Language) queries.

However, RAW users are not forced to use RQL. In fact, RAW also provides SQL as a first-class query language.

This SQL mode is provided for strict compatibility with SQL standards. It is also helpful when submitting queries to RAW from external tools that only "speak" SQL.

In Jupyter, this is available using the `%%sql` magic.

In SQL mode, however, none of the RQL extensions seen so far are available.
For example, keywords like `READ`, which reads data from a source, are not part of the SQL specification.
Therefore, to use SQL, users must first define views in RQL and then consume them from SQL.

(In later notebooks we will see how to define materialized views and tables in RAW, which provide additional ways to make data available to the SQL mode.)

In [1]:
%load_ext raw_magic

We start by defining a view over a CSV file in RQL as before:

In [2]:
%%view airports

SELECT * FROM READ("https://raw-tutorial.s3.amazonaws.com/airports.csv")

View "airports" was replaced


Now that `airports` is available, we can query it using SQL.

For this we use `%%sql` instead of `%%rql`:

In [5]:
%%sql

SELECT * FROM airports LIMIT 5

AirportID,Name,City,Country,IATA_FAA,ICAO,Latitude,Longitude,Altitude,Timezone,DST,TZ
1,Goroka,Goroka,Papua New Guinea,GKA,AYGA,-6.081689,145.391881,5282,10.0,U,Pacific/Port_Moresby
2,Madang,Madang,Papua New Guinea,MAG,AYMD,-5.207083,145.7887,20,10.0,U,Pacific/Port_Moresby
3,Mount Hagen,Mount Hagen,Papua New Guinea,HGU,AYMH,-5.826789,144.295861,5388,10.0,U,Pacific/Port_Moresby
4,Nadzab,Nadzab,Papua New Guinea,LAE,AYNZ,-6.569828,146.726242,239,10.0,U,Pacific/Port_Moresby
5,Port Moresby Jacksons Intl,Port Moresby,Papua New Guinea,POM,AYPY,-9.443383,147.22005,146,10.0,U,Pacific/Port_Moresby


If we filter airports by city, we see standard SQL syntax in use, with the string literal written in single quotes:

In [6]:
%%sql

SELECT * FROM airports WHERE City = 'Lisbon'

AirportID,Name,City,Country,IATA_FAA,ICAO,Latitude,Longitude,Altitude,Timezone,DST,TZ
1638,Lisboa,Lisbon,Portugal,LIS,LPPT,38.781311,-9.135919,374,0.0,E,Europe/Lisbon
7752,Lisbon Cruise Terminal,Lisbon,Portugal,,N,38.712606,-9.122483,0,0.0,E,Europe/Lisbon


In RQL, string literals are written with double quotes (`"..."`) instead, as in:

In [8]:
%%rql

SELECT * FROM airports WHERE City = "Lisbon"

AirportID,Name,City,Country,IATA_FAA,ICAO,Latitude,Longitude,Altitude,Timezone,DST,TZ
1638,Lisboa,Lisbon,Portugal,LIS,LPPT,38.781311,-9.135919,374,0.0,E,Europe/Lisbon
7752,Lisbon Cruise Terminal,Lisbon,Portugal,,N,38.712606,-9.122483,0,0.0,E,Europe/Lisbon
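
Because the `%%sql` cell above uses only standard syntax, the same statement should run unchanged on other SQL engines. The following is a minimal sketch of that idea, using Python's built-in `sqlite3` module as a stand-in engine (it is not RAW, and nothing here calls a RAW API). The table definition is an assumption made for illustration, and the rows are copied from the outputs above with only four of the columns kept.

```python
import sqlite3

# Stand-in engine: an in-memory SQLite database, not RAW.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE airports (AirportID INTEGER, Name TEXT, City TEXT, Country TEXT)"
)

# Rows copied from the outputs above (only four of the twelve columns).
rows = [
    (1, "Goroka", "Goroka", "Papua New Guinea"),
    (1638, "Lisboa", "Lisbon", "Portugal"),
    (7752, "Lisbon Cruise Terminal", "Lisbon", "Portugal"),
]
conn.executemany("INSERT INTO airports VALUES (?, ?, ?, ?)", rows)

# Same statement as in the %%sql cell: the string literal uses single quotes.
query = "SELECT * FROM airports WHERE City = 'Lisbon'"
lisbon = conn.execute(query).fetchall()

for row in lisbon:
    print(row)
```

Only the two Lisbon rows are returned; the Goroka row is filtered out.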


## SQL Limitations

The SQL layer is meant to conform to the SQL:2003 standard.

The major limitation is that only RAW entities that are tables are exposed to the SQL layer.

In RAW, a "table" is a collection of records whose fields are all primitive types.

In [11]:
%%query_validate

airports

collection(
    record(
        AirportID: int,
        Name: string,
        City: string,
        Country: string,
        IATA_FAA: string,
        ICAO: string,
        Latitude: double,
        Longitude: double,
        Altitude: int,
        Timezone: double,
        DST: string,
        TZ: string))


The view `airports` is a collection of records with only primitive fields, so it is compatible with SQL.

Let's define a new view that is not SQL compatible:

In [12]:
%%view airports_1

SELECT City, * FROM airports GROUP BY City

In [15]:
%%query_validate

airports_1

collection(
    record(
        City: string,
        _2: collection(
            record(
                AirportID: int,
                Name: string,
                City: string,
                Country: string,
                IATA_FAA: string,
                ICAO: string,
                Latitude: double,
                Longitude: double,
                Altitude: int,
                Timezone: double,
                DST: string,
                TZ: string))))


This type is not SQL compatible, because the field `_2` is itself a collection rather than a primitive type. The view is therefore not made available to the SQL layer.

In [16]:
%%sql

SELECT * FROM airports_1

org.apache.calcite.runtime.CalciteContextException: From line 3, column 15 to line 3, column 24: Object 'AIRPORTS_1' not found. Positions: 3:15 to 3:24
  3: SELECT * FROM airports_1
                   ^^^^^^^^^^
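
The compatibility rule from this section (a table is a collection of records whose fields are all primitive) can be sketched as a small type check. The `Record` and `Collection` classes, the `PRIMITIVES` set, and `is_sql_table` below are hypothetical names introduced for illustration; they are not RAW's API, and the primitive list only covers the types that appear in the outputs above.

```python
from dataclasses import dataclass
from typing import Dict, Union

# Hypothetical stand-ins for RAW types; RAW's own type API is not shown here.
# Only the primitives that appear in the outputs above are listed.
PRIMITIVES = {"int", "double", "string"}


@dataclass
class Record:
    fields: Dict[str, "Type"]


@dataclass
class Collection:
    inner: "Type"


# A type is either a primitive name, a record, or a collection.
Type = Union[str, Record, Collection]


def is_sql_table(t: Type) -> bool:
    """True if t is a collection of records whose fields are all primitive."""
    if not isinstance(t, Collection) or not isinstance(t.inner, Record):
        return False
    return all(
        isinstance(f, str) and f in PRIMITIVES for f in t.inner.fields.values()
    )


# The record type reported by %%query_validate for `airports`.
airport = Record({
    "AirportID": "int", "Name": "string", "City": "string", "Country": "string",
    "IATA_FAA": "string", "ICAO": "string", "Latitude": "double",
    "Longitude": "double", "Altitude": "int", "Timezone": "double",
    "DST": "string", "TZ": "string",
})
airports = Collection(airport)

# The type of `airports_1`: the field `_2` is a nested collection.
airports_1 = Collection(Record({"City": "string", "_2": Collection(airport)}))

print(is_sql_table(airports))    # True
print(is_sql_table(airports_1))  # False
```

Under this sketch, `airports` passes the check while `airports_1` fails on `_2`, which matches the behaviour seen above: the first view can be queried from `%%sql` and the second is reported as not found.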


**Next:** [Caching](Caching.ipynb)