# Quick start to SingleStore using Stages

What we will learn in this notebook:
1. Sign up for a free trial
2. Create a database and assign compute resources
3. Load your own dataset using Stages
4. Query data

You may also like to see a [recording of this webinar](https://www.singlestore.com/resources/webinar-getting-started-with-singlestoredb/).

# Step 1 - Sign up for a free trial
1. Go to https://www.singlestore.com/cloud-trial/
2. Answer a few questions
3. Verify your email
4. Log in to portal.singlestore.com

# Step 2 - Create a workspace and attach a database to it
In the portal:
1. Create a workspace group by choosing a cloud provider and region
2. On the next page, create a workspace. Change the size of the cluster if needed.
3. On the next page, wait for the workspace to be deployed. This may take a few minutes.
4. Once the workspace is deployed, click on "+ Create Database".
5. Name your new database and make sure to attach it to the workspace you just created.

# Step 3 - Load your own dataset using Stages

## Step 3.1
1. Find the "Stages" feature, either in the left sidebar under your workspace group or in the top nav bar for your workspace group.
2. Click on "Upload New File".
3. Drag and drop your CSV into Stages. You will see "File uploaded to Stages successfully". In this example, I use a dataset called foodhub_order.csv, which can be found at https://s2webinardemos.s3.us-west-2.amazonaws.com/foodhub_order.csv
4. Under Actions for this Stages file, click on "Load to Database".
5. On the "Load Data" screen, choose the same workspace and database that you created previously. Hit "Generate Notebook".
6. This creates a notebook. You will see a "Success" message pop up.
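
Before uploading, it can help to check the file's header and line terminator locally, since the generated pipeline in Step 3.4 skips one header line and expects `\r\n` line endings. The following is a minimal sketch in Python that works on a small in-memory sample rather than the real file; the helper name `detect_line_terminator` is hypothetical and not part of Stages.

```python
import csv
import io


def detect_line_terminator(raw: bytes) -> str:
    """Return '\\r\\n', '\\n', or '\\r' based on the first line break found."""
    for i, byte in enumerate(raw):
        if byte == 0x0D:  # carriage return
            return "\r\n" if raw[i + 1 : i + 2] == b"\n" else "\r"
        if byte == 0x0A:  # line feed
            return "\n"
    return ""  # no line break found


# Illustrative sample shaped like foodhub_order.csv (not the real file).
sample = (
    b"order_id,customer_id,restaurant_name\r\n"
    b"1476849,42052,Pepe Rosso To Go\r\n"
)

terminator = detect_line_terminator(sample)
# newline="" keeps the original line endings so the csv module can handle them.
header = next(csv.reader(io.StringIO(sample.decode("utf-8"), newline="")))

print(repr(terminator))
print(header)
```

To check a real file, pass in the bytes read from it in binary mode. If the terminator differs from what the pipeline expects, adjust the `LINES TERMINATED BY` clause accordingly.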

The following sections contain code mostly generated by Stages.

## Step 3.2 - Completed for you by Stages - Load data to foodhub_order
This notebook provides step-by-step instructions for ingesting the 'foodhub_order.csv' file into a database.

Start by cleaning up any previously created resources, so that the notebook can be rerun safely:

In [369]:
%%sql
# If rerunning, clean up previously created resources
DROP PIPELINE IF EXISTS foodhub_database.`foodhub_order`;
DROP TABLE IF EXISTS foodhub_database.`foodhub_order`;

## Step 3.3 - Completed for you by Stages - Create a table to store the ingested data

In [372]:
%%sql
USE foodhub_database;
CREATE TABLE foodhub_database.`foodhub_order` (
	`order_id` bigint(20) NULL,
	`customer_id` bigint(20) NULL,
	`restaurant_name` text CHARACTER SET utf8 COLLATE utf8_general_ci NULL,
	`cuisine_type` text CHARACTER SET utf8 COLLATE utf8_general_ci NULL,
	`cost_of_the_order` double NULL,
	`day_of_the_week` text CHARACTER SET utf8 COLLATE utf8_general_ci NULL,
	`rating` text CHARACTER SET utf8 COLLATE utf8_general_ci NULL,
	`food_preparation_time` bigint(20) NULL,
	`delivery_time` bigint(20) NULL,
	 SHARD KEY ()
);
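
The column types in the `CREATE TABLE` statement above were generated by Stages. The sketch below is not how Stages infers types; it is a simplified illustration, under that caveat, of why `rating` ends up as `text` while `order_id` is `bigint` and `cost_of_the_order` is `double`. The function name `infer_sql_type` is hypothetical, and the values are taken from the five preview rows shown later in this notebook.

```python
def infer_sql_type(values):
    """Pick the narrowest of bigint, double, text that fits every value."""

    def fits(cast):
        # A type fits only if every value in the column can be converted.
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if fits(int):
        return "bigint"
    if fits(float):
        return "double"
    return "text"


# Values from the five preview rows of foodhub_order.
columns = {
    "order_id": ["1476849", "1476821", "1477535", "1476915", "1476637"],
    "cost_of_the_order": ["13.73", "12.18", "15.57", "15.57", "18.24"],
    "rating": ["Not given", "Not given", "5", "5", "Not given"],
}

inferred = {name: infer_sql_type(values) for name, values in columns.items()}
for name, sql_type in inferred.items():
    print(name, "->", sql_type)
```

Because `rating` mixes numbers with the string "Not given", it cannot be stored as a numeric column without cleaning the data first.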

## Step 3.4 - Completed for you by Stages - Create a pipeline that loads the file into the database
### [Learn about Load Data with Pipeline through our documentation](https://docs.singlestore.com/managed-service/en/load-data/load-data-with-pipelines.html)

In [374]:
%%sql
USE foodhub_database;
CREATE PIPELINE foodhub_database.`foodhub_order`
AS LOAD DATA STAGE 'foodhub_order.csv'
BATCH_INTERVAL 2500
DISABLE OUT_OF_ORDER OPTIMIZATION
DISABLE OFFSETS METADATA GC
SKIP DUPLICATE KEY ERRORS -- SKIP ALL ERRORS can be used to skip all errors that can be tracked through "Monitor the pipeline for errors"
INTO TABLE `foodhub_order`
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\' 
LINES TERMINATED BY '\r\n' STARTING BY '' 
IGNORE 1 LINES
FORMAT CSV
(
	`foodhub_order`.`order_id`,
	`foodhub_order`.`customer_id`,
	`foodhub_order`.`restaurant_name`,
	`foodhub_order`.`cuisine_type`,
	`foodhub_order`.`cost_of_the_order`,
	`foodhub_order`.`day_of_the_week`,
	`foodhub_order`.`rating`,
	`foodhub_order`.`food_preparation_time`,
	`foodhub_order`.`delivery_time`
);
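
To get a feel for what the `FIELDS` clause and `IGNORE 1 LINES` mean for a given file, the following sketch approximates them with Python's built-in `csv` module. It is only an analogy: SingleStore's parser is not Python's, and the sample text (including the first data row) is made up for illustration.

```python
import csv
import io

# Made-up sample: a header, a quoted field containing a comma, \r\n line endings.
sample = (
    "order_id,restaurant_name,cost_of_the_order\r\n"
    '1,"Burgers, Fries and More",12.50\r\n'
    "2,Bareburger,12.18\r\n"
)

reader = csv.reader(
    io.StringIO(sample, newline=""),
    delimiter=",",    # FIELDS TERMINATED BY ','
    quotechar='"',    # ENCLOSED BY '"'
    escapechar="\\",  # ESCAPED BY '\\'
)
next(reader)          # IGNORE 1 LINES: skip the header row
rows = list(reader)

for row in rows:
    print(row)
```

The quoted restaurant name stays a single field even though it contains a comma, which is the purpose of the `ENCLOSED BY '"'` option.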

## Step 3.5 - Completed for you by Stages - Start the pipeline

In [378]:
%%sql
START PIPELINE foodhub_database.`foodhub_order`;

### Monitor the pipeline for errors

In [379]:
%%sql
USE foodhub_database;
SELECT * FROM information_schema.pipelines_errors
WHERE pipeline_name ='foodhub_order';

DATABASE_NAME,PIPELINE_NAME,ERROR_UNIX_TIMESTAMP,ERROR_TYPE,ERROR_CODE,ERROR_MESSAGE,ERROR_KIND,STD_ERROR,LOAD_DATA_LINE,LOAD_DATA_LINE_NUMBER,BATCH_ID,ERROR_ID,BATCH_SOURCE_PARTITION_ID,BATCH_EARLIEST_OFFSET,BATCH_LATEST_OFFSET,HOST,PORT,PARTITION


### Check that the data has loaded

In [380]:
%%sql
SELECT * FROM foodhub_database.`foodhub_order`
LIMIT 5;

order_id,customer_id,restaurant_name,cuisine_type,cost_of_the_order,day_of_the_week,rating,food_preparation_time,delivery_time
1476849,42052,Pepe Rosso To Go,Italian,13.73,Weekend,Not given,21,28
1476821,80434,Bareburger,American,12.18,Weekend,Not given,33,18
1477535,125123,S'MAC,American,15.57,Weekend,5,34,28
1476915,142273,Pepe Rosso To Go,Italian,15.57,Weekend,5,29,27
1476637,49695,Blue Ribbon Fried Chicken,American,18.24,Weekend,Not given,21,16


# Step 4 - Query the data

## Step 4.1 - Run your own queries
A few examples follow.

In [429]:
%%sql
# What is the average order value?

SELECT AVG (cost_of_the_order)
FROM foodhub_database.`foodhub_order`;

AVG (cost_of_the_order)
16.49885142255006


In [430]:
%%sql
# What is the total number of orders?
SELECT COUNT (order_id)
FROM foodhub_database.`foodhub_order`;

COUNT (order_id)
1898


## Step 4.2 (Optional) - Use SQrL to generate queries
1. On the top right, toggle to enable "Code with SQrL". Type what you'd like to accomplish with the query, for example:
   "I have a table foodhub_database.`foodhub_order` with the fields order_id, customer_id, restaurant_name, cuisine_type, cost_of_the_order. Write a query to find out the most popular cuisine type."
2. Hit "Add Cell" to add this code to your notebook. You should see something similar to the query below.

In [None]:
%%sql
# SQrL created query- What is the most popular cuisine?
SELECT cuisine_type, COUNT(*) as orders_count
FROM foodhub_database.foodhub_order
GROUP BY cuisine_type
ORDER BY orders_count DESC LIMIT 1;
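
To sanity-check what this kind of aggregate returns, the same query shape can be run against an in-memory SQLite table holding the five preview rows shown in Step 3. This sketch uses Python's built-in `sqlite3` module as a stand-in, not SingleStore, and keeps only four of the nine columns, so the result reflects those five rows only and not the full dataset.

```python
import sqlite3

# The five preview rows from the "Check that the data has loaded" output.
preview_rows = [
    (1476849, "Pepe Rosso To Go", "Italian", 13.73),
    (1476821, "Bareburger", "American", 12.18),
    (1477535, "S'MAC", "American", 15.57),
    (1476915, "Pepe Rosso To Go", "Italian", 15.57),
    (1476637, "Blue Ribbon Fried Chicken", "American", 18.24),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foodhub_order ("
    "order_id INTEGER, restaurant_name TEXT, "
    "cuisine_type TEXT, cost_of_the_order REAL)"
)
conn.executemany("INSERT INTO foodhub_order VALUES (?, ?, ?, ?)", preview_rows)

# Same shape as the generated query: count orders per cuisine, keep the top one.
top_cuisine = conn.execute(
    "SELECT cuisine_type, COUNT(*) AS orders_count "
    "FROM foodhub_order "
    "GROUP BY cuisine_type "
    "ORDER BY orders_count DESC LIMIT 1"
).fetchone()

print(top_cuisine)
conn.close()
```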