# Task 1 - Introduction to Flink SQL
Your coworker Peter wants to learn the basics of data science. For this purpose, he wants to experiment on a public dataset, Iris. The Iris dataset consists of 50 samples from each of 3 species of Iris; each sample has 4 measurements: the length and the width of the sepals and petals (in centimetres). He has already found some correlations and now needs to extract only the most significant columns. Please help him. Start by downloading the data into the `task1/data/` folder.

In [None]:
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data -O /home/jovyan/task1/data/iris.csv

## Project initialization
We use the line below to load the Jupyter magics.

In [None]:
%reload_ext streaming_jupyter_integrations.magics

We use `%flink_connect` to initialize a local Flink environment.

In [None]:
%flink_connect

## Data definition
The table definition is below. You have to fill this DDL in with the proper data types. You can find the list of available types [HERE](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/types/).

You can read more about the FileSystem connector and its properties [HERE](https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/filesystem/).

In [None]:
%%flink_execute_sql
CREATE TABLE iris_input (
    sepal_length DECIMAL(3,1),
    sepal_width DECIMAL(3,1),
    petal_length DECIMAL(3,1),
    petal_width DECIMAL(3,1),
    class STRING
) WITH (
    'connector' = 'filesystem',
    'path' = 'file:///home/jovyan/task1/data/',
    'format' = 'csv',
    'csv.ignore-parse-errors' = 'true' -- the file has some empty lines at the end
)
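To sanity-check the chosen types outside Flink, here is a minimal plain-Python sketch (not Flink code) that parses one line of `iris.data` the way the DDL above describes it: four `DECIMAL(3,1)` measurements followed by the class name. `DECIMAL(3,1)` allows at most 3 digits in total, 1 of them after the decimal point, so every Iris measurement (for example `5.1` or `7.9`) fits. The helper names `fits_decimal` and `parse_iris_line` are hypothetical. Rejecting blank or malformed lines only approximates `csv.ignore-parse-errors`: the Flink documentation describes that option as skipping fields and rows with parse errors and setting failed fields to null, so treat this as a cross-check rather than a model of Flink's exact behaviour.

```python
from decimal import Decimal, InvalidOperation


def fits_decimal(value: Decimal, precision: int, scale: int) -> bool:
    """True if value can be stored as DECIMAL(precision, scale) without rounding."""
    # Reject values that need more fractional digits than `scale` allows.
    if value != value.quantize(Decimal(1).scaleb(-scale)):
        return False
    # The integer part may use at most precision - scale digits.
    return abs(value) < Decimal(10) ** (precision - scale)


def parse_iris_line(line: str):
    """Parse one iris.data line into (4 Decimals..., class), or None if unusable."""
    fields = line.strip().split(",")
    if len(fields) != 5:
        return None  # blank or malformed line, e.g. the trailing empty lines
    try:
        measurements = [Decimal(f) for f in fields[:4]]
    except InvalidOperation:
        return None  # a measurement is not a number
    if not all(fits_decimal(m, 3, 1) for m in measurements):
        return None  # would not fit the DECIMAL(3,1) columns
    return (*measurements, fields[4])
```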

Now you can query the table and validate the results. If they are wrong, feel free to `DROP TABLE iris_input`, fix the definition and try again.

In [None]:
%%flink_execute_sql
SELECT
    *
FROM
    iris_input
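Since the dataset has 50 samples of each of the 3 species, a quick way to validate the table is to count rows per class. Below is a plain-Python sketch of that check on the raw file, assuming the comma-separated layout of `iris.data` with the class name in the fifth column; `class_counts` is a hypothetical helper.

```python
from collections import Counter


def class_counts(text: str) -> Counter:
    """Count data rows per class in raw iris.data text, skipping blank lines."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.strip().split(",")
        if len(fields) == 5:  # ignore empty or malformed lines
            counts[fields[4]] += 1
    return counts
```

For example, reading `/home/jovyan/task1/data/iris.csv` and passing its contents to `class_counts` should report 50 rows for each of the three classes. In Flink itself, a query such as `SELECT class, COUNT(*) FROM iris_input GROUP BY class` expresses the same check.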

## Save data
Now you can save the most important data to a new location in JSON format. Peter said he needs only three columns: class, petal length and petal width. Let's create the output table definition.

In [None]:
%%flink_execute_sql
CREATE TABLE iris_output (
    class STRING,
    petal_length DECIMAL(3,1),
    petal_width DECIMAL(3,1)
) WITH (
    'connector' = 'filesystem',
    'format' = 'json',
    'path' = 'file:///home/jovyan/task1/output/'
)
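With `'format' = 'json'`, each row of `iris_output` is written as a JSON object whose keys are the column names, typically one object per line. The plain-Python sketch below builds such a line for a single row; `to_json_line` is a hypothetical helper, and Flink's own serializer may format numbers differently, so take it as an illustration of the shape rather than byte-exact output.

```python
import json
from decimal import Decimal


def to_json_line(class_: str, petal_length: Decimal, petal_width: Decimal) -> str:
    """Serialize one iris_output row as a compact, single-line JSON object."""
    row = {
        "class": class_,
        # json cannot encode Decimal directly; float is fine for one-decimal values
        "petal_length": float(petal_length),
        "petal_width": float(petal_width),
    }
    return json.dumps(row, separators=(",", ":"))
```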

Then process the data.

In [None]:
%%flink_execute_sql
INSERT INTO iris_output
SELECT
    class,
    petal_length,
    petal_width
FROM
    iris_input
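After the `INSERT` job finishes, the results sit as part files inside `/home/jovyan/task1/output/`. Here is a minimal plain-Python sketch for reading them back and checking that only the three projected columns are present. `read_output_rows` is a hypothetical helper, and skipping hidden files is an assumption based on Flink keeping unfinished part files hidden behind a leading dot.

```python
import json
from pathlib import Path


def read_output_rows(output_dir: str) -> list:
    """Load every JSON line from the visible files in output_dir."""
    rows = []
    for path in sorted(Path(output_dir).iterdir()):
        # Assumption: hidden entries are in-progress or metadata files; also skip subdirectories.
        if path.name.startswith(".") or not path.is_file():
            continue
        with path.open() as f:
            rows.extend(json.loads(line) for line in f if line.strip())
    return rows
```

For example, `read_output_rows("/home/jovyan/task1/output/")` should return one dict per written row, each with exactly the keys `class`, `petal_length` and `petal_width`. If the job runs in streaming mode, part files may only be finalized on a checkpoint, so an empty result right after the `INSERT` is not necessarily an error.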

## Filters & Transformations
Peter has one more request. He would like to get all measurements in millimetres and keep only the rows where `petal_width` is greater than 2 cm. Please help him.

In [None]:
%%flink_execute_sql
SELECT
    sepal_length * 10 AS sepal_length,
    sepal_width * 10 AS sepal_width,
    petal_length * 10 AS petal_length,
    petal_width * 10 AS petal_width,
    class
FROM
    iris_input
WHERE
    petal_width > 2 -- input column in cm; the mm aliases are not visible here
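The query filters on the original centimetre value (the `WHERE` clause resolves `petal_width` against the input table, not the millimetre alias) and scales every measurement by 10, since 1 cm = 10 mm. Below is a plain-Python sketch of the same logic over already-parsed rows, using `Decimal` so that, for example, `6.3 * 10` stays exactly `63.0`; `to_mm_and_filter` is a hypothetical helper.

```python
from decimal import Decimal


def to_mm_and_filter(rows):
    """Mirror the SQL: keep rows with petal_width > 2 cm, then convert cm to mm."""
    result = []
    for sepal_length, sepal_width, petal_length, petal_width, cls in rows:
        # Filter on the original centimetre value, as the WHERE clause does.
        if petal_width > 2:
            result.append((sepal_length * 10, sepal_width * 10,
                           petal_length * 10, petal_width * 10, cls))
    return result
```

Because the comparison is strict, a flower with `petal_width` of exactly 2.0 cm is excluded; use `>=` if Peter meant to include it.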