# New Tutorial - QFrame

## What is a QFrame?
QFrame is a class that generates SQL statements. It stores information about the fields in the `QFrame.data` attribute, which is a dictionary.

`QFrame.data` has a `select` key, under which it stores the `fields` we want to have in our SQL statement. Each field must have a specified `type`, which is 'dim' if the variable is a dimension or 'num' if the variable is numeric. Let's take a look at all the options available under the `select` and `fields` keys.

```json
{
  "select": {
    "table": "table",
    "schema": "schema",
    "fields": {
      "column": {
        "type": "dim",
        "as": "",
        "group_by": "",
        "order_by": "",
        "expression": "",
        "select": "",
        "custom_type": ""
      }
    },
    "where": "",
    "distinct": "",
    "having": "",
    "limit": ""
  }
}
```

- `table` - Name of the table.
- `schema` - Name of the schema.
- `fields` - For each field you can specify:
    - `type` - Type of the column. Options:
        - 'dim' - VARCHAR(500)
        - 'num' - FLOAT

      Every column must have a specified type. If you want to specify another type, use `custom_type`.
    - `as` - Column alias (name).
    - `group_by` - Aggregation type. Options:
        - 'group' - This field goes into the GROUP BY clause.
        - {'sum', 'count', 'min', 'max', 'avg'} - This field is aggregated in the specified way.

      If you don't want to aggregate, leave `group_by` empty in every field.
    - `order_by` - Puts the field in the ORDER BY clause. Options:
        - 'ASC'
        - 'DESC'
    - `expression` - Expression, e.g. a CASE statement, a column operation, a CONCAT statement, etc.
    - `select` - Set to 0 if you don't want this field in the SELECT statement.
    - `custom_type` - Custom SQL data type, e.g. DATE.
- `where` - WHERE clause, e.g. 'sales>100'.
- `distinct` - Set to 1 to add DISTINCT to the SELECT.
- `having` - HAVING clause, e.g. 'sum(sales)>100'.
- `limit` - LIMIT value, e.g. 100.
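The following sketch makes the options above concrete by showing how a dictionary with this structure could be turned into a SELECT statement. `build_sql` is a hypothetical helper written for illustration. It is not grizly's implementation. It does no quoting or validation and ignores `type` and `custom_type`.

```python
AGGREGATES = {"sum", "count", "min", "max", "avg"}


def build_sql(data: dict) -> str:
    """Render the `select` section of a QFrame-style dictionary as SQL.

    Hypothetical, simplified sketch: no quoting, no validation,
    and the `type` / `custom_type` keys are ignored.
    """
    sel = data["select"]
    select_items, group_items, order_items = [], [], []
    for name, field in sel["fields"].items():
        # An expression (e.g. "col4*2") replaces the raw column name.
        expr = field.get("expression") or name
        agg = (field.get("group_by") or "").lower()
        if agg == "group":
            group_items.append(expr)          # goes to GROUP BY as-is
        elif agg in AGGREGATES:
            expr = f"{agg.upper()}({expr})"   # e.g. SUM(amount)
        if field.get("order_by"):
            order_items.append(f"{expr} {field['order_by'].upper()}")
        if str(field.get("select", "")) == "0":
            continue                          # used in clauses, but not selected
        alias = field.get("as")
        select_items.append(f"{expr} AS {alias}" if alias else expr)

    distinct = "DISTINCT " if str(sel.get("distinct", "")) == "1" else ""
    source = f"{sel['schema']}.{sel['table']}" if sel.get("schema") else sel["table"]
    sql = f"SELECT {distinct}" + ",\n       ".join(select_items) + f"\nFROM {source}"
    if sel.get("where"):
        sql += f"\nWHERE {sel['where']}"
    if group_items:
        sql += "\nGROUP BY " + ", ".join(group_items)
    if sel.get("having"):
        sql += f"\nHAVING {sel['having']}"
    if order_items:
        sql += "\nORDER BY " + ", ".join(order_items)
    if sel.get("limit") not in (None, ""):
        sql += f"\nLIMIT {sel['limit']}"
    return sql


data = {"select": {"table": "table", "schema": "schema",
                   "fields": {"col": {"type": "dim"}}}}
print(build_sql(data))

sales = {"select": {
    "table": "sales", "schema": "shop",
    "fields": {
        "region": {"type": "dim", "group_by": "group"},
        "amount": {"type": "num", "group_by": "sum", "as": "total", "order_by": "DESC"},
    },
    "having": "sum(amount)>100",
    "limit": 10,
}}
print(build_sql(sales))
```

The `sales` table, its fields and the HAVING condition are made-up examples. A field marked 'group' ends up in GROUP BY, and an aggregated field is wrapped in its function before the alias is applied.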

## How to create a QFrame?
You can create a QFrame manually, by passing the data directly to the QFrame, or automatically, by using the `initiate` function.

```python
from grizly import (
    get_path,
    QFrame
)
```

### Manually - using dictionary

This is the most direct way of creating a QFrame, but to use it you need to know the structure of `QFrame.data`. Take the following dictionary:

```python
data = {
  "select": {
    "table": "table",
    "schema": "schema",
    "fields": {
      "col": {
        "type": "dim"
      }
    }
  }
}
```

From this dictionary, QFrame generates a simple SQL statement:

```python
qf = QFrame().read_dict(data)
qf.get_sql()
```

```
SELECT col
FROM schema.table
```

Here we also used the `.get_sql()` method, which prints the SQL generated from the QFrame and returns the QFrame itself.

### Manually - using JSON file

We use a `.json` file to conveniently manipulate information about columns, renames and other things that would be very verbose to handle in Python code. It is more convenient to edit the JSON file in a JSON editor such as http://jsoneditoronline.org/ than in Python code.

After editing `store.json`, we can read it back into a QFrame using `read_json()`.

This means we can use the JSON file as our main `store` of verbose information and Python as our main way to manipulate that information.
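A JSON store is just a file that holds one JSON object. The sketch below shows how several named queries can share one file. It assumes, for illustration only, that each `subquery` name is a top-level key and that the matching `data` dictionary is its value. `save_subquery` and `read_subquery` are hypothetical helpers built on the standard `json` module. They are not grizly's `save_json`/`read_json`, whose on-disk layout may differ.

```python
import json
import tempfile
from pathlib import Path


def save_subquery(json_path, subquery, data):
    """Save `data` under the key `subquery`, keeping other subqueries already in the file."""
    path = Path(json_path)
    store = json.loads(path.read_text()) if path.exists() else {}
    store[subquery] = data
    path.write_text(json.dumps(store, indent=2))


def read_subquery(json_path, subquery):
    """Load the `data` dictionary stored under `subquery`."""
    return json.loads(Path(json_path).read_text())[subquery]


# A throwaway file instead of the tutorial's store.json.
store_path = Path(tempfile.mkdtemp()) / "store.json"
q1 = {"select": {"table": "table", "schema": "schema",
                 "fields": {"col": {"type": "dim"}}}}
q2 = {"select": {"table": "table", "schema": "schema",
                 "fields": {"col1": {"type": "dim"}, "col2": {"type": "dim"}}}}
save_subquery(store_path, "my_query_1", q1)
save_subquery(store_path, "my_query_2", q2)   # my_query_1 is kept

print(sorted(json.loads(store_path.read_text())))
print(read_subquery(store_path, "my_query_1") == q1)
```

Because the file is plain JSON, any external editor can change it, and the next read picks up the edits.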

```python
json_path = get_path("dev", "grizly", "notebooks", "store.json")
qf.save_json(json_path=json_path, subquery="my_query_1")

qf = QFrame().read_json(json_path=json_path, subquery="my_query_1")
qf.get_sql()
```

```
Data saved in C:\Users\TE386850\dev\grizly\notebooks\store.json
SELECT col
FROM schema.table
```

### Automatically - using the initiate function

The other way to create a QFrame is to use the `initiate` function. It saves the QFrame data to a JSON file, which you can then read with `read_json()`. You can use it in two ways. The first is to pass the column names directly.

```python
from grizly import initiate

initiate(columns=["col1", "col2"],
         schema="schema",
         table="table",
         json_path=json_path,
         subquery="my_query_2")

qf = QFrame().read_json(json_path=json_path, subquery="my_query_2")
qf.get_sql()
```

```
Data saved in C:\Users\TE386850\dev\grizly\notebooks\store.json
SELECT col1,
       col2
FROM schema.table
```

The second way is to use the `get_columns` function, which retrieves all column names of a given table. With `column_types=True` it also returns their types. You then pass the result to `initiate`.
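Fetching column names and types amounts to querying the database catalog. In the tutorial, `get_columns` does this against Redshift, which needs a live connection. As a stand-in that runs anywhere, the sketch below pulls the same two lists from an in-memory SQLite table through `PRAGMA table_info`. On PostgreSQL and Redshift, comparable information is exposed in the `information_schema.columns` view. `get_columns_sqlite` is a hypothetical helper, and the column types are invented for the example.

```python
import sqlite3


def get_columns_sqlite(conn, table):
    """Return (column_names, column_types) for `table` from SQLite's catalog.

    `table` is interpolated directly into the PRAGMA, so it must be a trusted name.
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, declared_type, notnull, default_value, pk).
    names = [row[1] for row in rows]
    types = [row[2] for row in rows]
    return names, types


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_tutorial (col1 INTEGER, col2 REAL, col3 VARCHAR(50), col4 DATE)")
columns, col_types = get_columns_sqlite(conn, "table_tutorial")
print(columns)
print(col_types)
```

SQLite reports each column's declared type exactly as written in `CREATE TABLE`, so `VARCHAR(50)` comes back unchanged.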

```python
from grizly import get_columns

columns, col_types = get_columns(table='table_tutorial',
                                 schema='administration',
                                 column_types=True,
                                 db='redshift')
initiate(columns=columns,
         col_types=col_types,
         schema="administration",
         table="table_tutorial",
         json_path=json_path,
         subquery="my_query_3")

qf = QFrame().read_json(json_path=json_path, subquery="my_query_3")
qf.get_sql()
```

```
Data saved in C:\Users\TE386850\dev\grizly\notebooks\store.json
SELECT col1,
       col2,
       col3,
       col4
FROM administration.table_tutorial
```
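The sketch below shows the kind of `data` dictionary that `initiate` has to produce: one field per column, with a type derived from the database type when one is given. `initiate_data` is hypothetical. Its type mapping is an assumption made for this example, not grizly's rule. Numeric database types become 'num', text types become 'dim', and anything else, such as DATE, stays 'dim' with the original type kept in `custom_type`.

```python
NUMERIC_TYPES = {"smallint", "int", "int2", "int4", "int8", "integer", "bigint",
                 "numeric", "decimal", "real", "float", "float4", "float8",
                 "double precision"}
TEXT_TYPES = {"char", "character", "varchar", "character varying", "text", "bpchar"}


def initiate_data(columns, schema, table, col_types=None):
    """Build a QFrame-style `data` dictionary with one field per column (hypothetical)."""
    if col_types is not None and len(col_types) != len(columns):
        raise ValueError("columns and col_types must have the same length")
    fields = {}
    for i, col in enumerate(columns):
        field = {"type": "dim"}                  # default: dimension column
        if col_types is not None:
            # Strip length/precision, e.g. "VARCHAR(50)" -> "varchar".
            base = col_types[i].lower().split("(")[0].strip()
            if base in NUMERIC_TYPES:
                field["type"] = "num"
            elif base not in TEXT_TYPES:
                field["custom_type"] = col_types[i]  # keep e.g. DATE as-is
        fields[col] = field
    return {"select": {"table": table, "schema": schema, "fields": fields}}


data = initiate_data(["col1", "col2", "col3", "col4"], "administration", "table_tutorial",
                     col_types=["INTEGER", "REAL", "VARCHAR(50)", "DATE"])
print(data["select"]["fields"])
```

Without `col_types`, every column falls back to 'dim'. That mirrors the first way of calling `initiate` shown above.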

## Working with the QFrame
There are many methods you can use to edit a QFrame, and you can find them in the QFrame docs. In this tutorial we will show only some of them.

### Doing some basic SQL stuff
Let's now add a `where` clause, rename some fields, add a calculated field, remove some fields and add a `limit`.

```python
qf.query("col2 > 1")  # <- where
qf.rename({"col1": "items", "col2": "price"})
qf.assign(calculated_field="col4*2", type='num', custom_type='double precision')
qf.remove(["col3", "col4"])
qf.limit(10)
qf.get_sql()
```

```
SELECT col1 AS items,
       col2 AS price,
       col4*2 AS calculated_field
FROM administration.table_tutorial
WHERE col2 > 1
LIMIT 10
```

Be aware that the `rename()` method doesn't change the name of the field, only its alias (the final name of the column).
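The sketch below illustrates the difference between a field's name and its alias on a plain dictionary. `rename` here is a hypothetical helper, not grizly's method. It sets the `as` key and leaves the field keys untouched. This is also why the WHERE clause in the output above still refers to `col2`.

```python
def rename(data, mapping):
    """Set the alias (`as`) of each field in `mapping`; field keys stay the same."""
    fields = data["select"]["fields"]
    for field_name, alias in mapping.items():
        if field_name not in fields:
            raise KeyError(f"unknown field: {field_name}")
        fields[field_name]["as"] = alias
    return data


data = {"select": {"table": "table_tutorial", "schema": "administration",
                   "where": "col2 > 1",
                   "fields": {"col1": {"type": "dim"}, "col2": {"type": "num"}}}}
rename(data, {"col1": "items", "col2": "price"})
print(list(data["select"]["fields"]))           # field names are unchanged
print(data["select"]["fields"]["col2"]["as"])   # only the alias is new
```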

Now you can check how the data has changed by looking at the `data` attribute. In IPython, `display.JSON` gives a more readable view at this point.

```python
from IPython import display

display.JSON(data=qf.data)
```

You can see that there is now also a `sql_blocks` key, which you can ignore. This key is used to build the SQL statement and is regenerated every time the `get_sql()` method is called.
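Outside Jupyter, `display.JSON` has no front end to render into. The standard `json` module gives a similarly readable view of a nested dictionary such as `qf.data`, provided its values are JSON-serializable. The dictionary below is hand-written to resemble the tutorial's QFrame. It is not the actual contents of `qf.data`.

```python
import json

# Hand-written example, shaped like a QFrame's data (not the real qf.data).
data = {"select": {"table": "table_tutorial", "schema": "administration",
                   "fields": {"col1": {"type": "dim", "as": "items"},
                              "calculated_field": {"type": "num", "expression": "col4*2",
                                                   "custom_type": "double precision"}},
                   "where": "col2 > 1", "limit": 10}}

pretty = json.dumps(data, indent=2)   # nested keys indented by two spaces per level
print(pretty)
```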

### Forking

Forking QFrames is useful when your data workflow needs to take the same SQL table and apply different transformations to it.

Sometimes we want to fork, apply some transformations, and then union the QFrames back together, which results in an append operation on the data side.

Let's create two copies of one QFrame.

```python
qf1 = qf.copy()
qf2 = qf.copy()
```
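A fork is only useful if the copies are independent. Because `data` is a nested dictionary, that means copying every nested level, not just the top one. The sketch below uses plain dictionaries rather than QFrames to show what goes wrong with a shallow copy. It assumes that `QFrame.copy()` produces an independent copy, which is what the forking workflow relies on.

```python
import copy

base = {"select": {"table": "table_tutorial", "schema": "administration",
                   "fields": {"col2": {"type": "num", "as": "price"}}}}

shallow = base.copy()          # copies only the top-level dict
deep = copy.deepcopy(base)     # copies every nested level

shallow["select"]["fields"]["col2"]["as"] = "price_1"
deep["select"]["fields"]["col2"]["as"] = "price_2"

# The shallow copy shares the nested "select" dict, so its change leaks into base.
print(base["select"]["fields"]["col2"]["as"])
print(deep["select"]["fields"]["col2"]["as"])
```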

### Unioning data

There are two ways of unioning QFrames: by the position of the fields, or by the final names of the columns (that is, their aliases). To see the difference, let's give the two price columns swapped aliases in the two copies.

```python
from grizly import union

qf1.rename({"col2": "price_1", "calculated_field": "price_2"})
qf2.rename({"col2": "price_2", "calculated_field": "price_1"})
```

#### Union by position

```python
uqf_pos = union(qframes=[qf1, qf2], union_type="UNION ALL", union_by='position')
uqf_pos.get_sql()
```

```
Data unioned successfully.
SELECT col1 AS items,
       col2 AS price_1,
       col4*2 AS price_2
FROM administration.table_tutorial
WHERE col2 > 1
LIMIT 10
UNION ALL
SELECT col1 AS items,
       col2 AS price_2,
       col4*2 AS price_1
FROM administration.table_tutorial
WHERE col2 > 1
LIMIT 10
```

```python
display.JSON(data=uqf_pos.data)
```

#### Union by the column names

```python
uqf_name = union(qframes=[qf1, qf2], union_type="UNION ALL", union_by='name')
uqf_name.get_sql()
```

```
Data unioned successfully.
SELECT col1 AS items,
       col2 AS price_1,
       col4*2 AS price_2
FROM administration.table_tutorial
WHERE col2 > 1
LIMIT 10
UNION ALL
SELECT col1 AS items,
       col4*2 AS price_1,
       col2 AS price_2
FROM administration.table_tutorial
WHERE col2 > 1
LIMIT 10
```

```python
display.JSON(data=uqf_name.data)
```

You can see that in this case `union` changes the order of the columns: the columns of the second QFrame are reordered so that columns with the same alias line up.
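The difference between the two modes can be sketched with plain lists of `(expression, alias)` pairs. `union_sql` is a hypothetical helper, not grizly's `union`. It omits the WHERE and LIMIT clauses for brevity. Union by position keeps each query's column order. Union by name reorders every query to follow the aliases of the first one.

```python
def union_sql(queries, union_type="UNION ALL", union_by="position"):
    """Union SELECT queries, each given as ([(expression, alias), ...], from_clause).

    union_by="position": columns are matched by their position.
    union_by="name": every query is reordered to follow the first query's aliases.
    """
    first_aliases = [alias for _, alias in queries[0][0]]
    parts = []
    for items, source in queries:
        if union_by == "name":
            by_alias = {alias: expr for expr, alias in items}
            if sorted(by_alias) != sorted(first_aliases):
                raise ValueError("union by name needs the same aliases in every query")
            items = [(by_alias[alias], alias) for alias in first_aliases]
        elif len(items) != len(first_aliases):
            raise ValueError("union by position needs the same number of columns")
        select = ",\n       ".join(f"{expr} AS {alias}" for expr, alias in items)
        parts.append(f"SELECT {select}\n{source}")
    return f"\n{union_type}\n".join(parts)


source = "FROM administration.table_tutorial"
q1 = ([("col1", "items"), ("col2", "price_1"), ("col4*2", "price_2")], source)
q2 = ([("col1", "items"), ("col2", "price_2"), ("col4*2", "price_1")], source)

by_position = union_sql([q1, q2], union_by="position")
by_name = union_sql([q1, q2], union_by="name")
print(by_name)
```

Apart from the omitted WHERE and LIMIT clauses, the second SELECT in `by_name` has the same column order as the union-by-name output above.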