# **KoçDigital Sandbox Services** 
Data and AI Experiences Made Easy!

**Create a Data Pipeline with AWS Services**

![](./img/aws.png)

**Architecture Review**

![](./img/aws_architecture.png)

**Pipeline Components:**

* Amazon S3
* AWS Glue
* Amazon SageMaker
* Amazon Athena
* AWS Lambda
* AWS Step Functions
* Amazon CloudWatch
* Amazon EventBridge

**Step-1: Explore Model Input**

Check the model input files on Amazon S3:

* `s3://sagemaker-eu-central-1-160696311973/test/x_test_new.csv`
* `s3://sagemaker-eu-central-1-160696311973/train/y_train_new.csv`
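
The same check can be scripted. A minimal sketch, assuming an S3 client that behaves like `boto3.client("s3")` (only `get_object` is called); `parse_s3_uri` and `preview_csv` are hypothetical helpers written for illustration, not AWS APIs:

```python
import csv
import io
from itertools import islice
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def preview_csv(s3_client, uri, n_rows=5):
    """Return the first n_rows CSV rows of an S3 object as lists of strings.

    s3_client is assumed to behave like boto3.client("s3"). The whole object
    is read into memory, which is acceptable for small sample files only.
    """
    bucket, key = parse_s3_uri(uri)
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    reader = csv.reader(io.StringIO(body.decode("utf-8")))
    return list(islice(reader, n_rows))


# Example (requires boto3 and AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# for row in preview_csv(s3, "s3://sagemaker-eu-central-1-160696311973/test/x_test_new.csv"):
#     print(row)
```

Passing the client in as an argument keeps the helpers testable with a stub and lets you reuse a client already configured for `eu-central-1`.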

**Step-2: Query model input using Athena**

* Check the `sandbox-db` database in the AWS Glue Data Catalog

  Defining a database: https://docs.aws.amazon.com/glue/latest/dg/define-database.html

![](./img/aws_database.png)
![](./img/aws_database2.png)
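
To confirm the database from code instead of the console, one option is to page through the Data Catalog with Glue's `get_databases` call. This sketch assumes a client that behaves like `boto3.client("glue")`; `database_exists` is a hypothetical helper:

```python
def database_exists(glue_client, name):
    """Return True if a Glue Data Catalog database called `name` exists.

    glue_client is assumed to behave like boto3.client("glue").
    get_databases is paginated, so NextToken is followed to the end.
    """
    kwargs = {}
    while True:
        page = glue_client.get_databases(**kwargs)
        if any(db["Name"] == name for db in page.get("DatabaseList", [])):
            return True
        token = page.get("NextToken")
        if not token:
            return False
        kwargs["NextToken"] = token


# Example (requires boto3 and AWS credentials):
# import boto3
# print(database_exists(boto3.client("glue"), "sandbox-db"))
```

Calling `get_database(Name=...)` directly and catching the not-found error is an equally valid approach; the paging version simply avoids exception handling.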

* Run the `sandbox-input-data` crawler to create a queryable table for the input data

  Adding a crawler: https://docs.aws.amazon.com/glue/latest/ug/tutorial-add-crawler.html
        
![](./img/crawler.png)
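
Running the crawler and waiting for it can also be done programmatically. A sketch, assuming a `boto3.client("glue")`-like client: `start_crawler` starts the crawl, and `get_crawler` is polled until the crawler's `State` returns to `READY`, at which point `LastCrawl.Status` reports the outcome. The polling interval and timeout are arbitrary defaults, and `run_crawler` is a hypothetical helper:

```python
import time


def run_crawler(glue_client, name, poll_seconds=15, timeout_seconds=1800, sleep=time.sleep):
    """Start a Glue crawler and block until it is READY again.

    Returns the LastCrawl status (e.g. "SUCCEEDED", "FAILED", "CANCELLED").
    glue_client is assumed to behave like boto3.client("glue").
    """
    glue_client.start_crawler(Name=name)
    waited = 0
    while waited < timeout_seconds:
        # Sleep first so the crawler has time to leave READY after start_crawler.
        sleep(poll_seconds)
        waited += poll_seconds
        crawler = glue_client.get_crawler(Name=name)["Crawler"]
        if crawler["State"] == "READY":
            return crawler.get("LastCrawl", {}).get("Status")
    raise TimeoutError(f"crawler {name!r} still running after {timeout_seconds}s")


# Example (requires boto3 and AWS credentials):
# import boto3
# print(run_crawler(boto3.client("glue"), "sandbox-input-data"))
```

The `sleep` parameter exists so tests can skip the real delay; in normal use leave it at `time.sleep`.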

* After the crawler finishes, check the tables it created

![](./img/table.PNG)
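
The created tables and their inferred schemas can be listed with Glue's `get_tables` call. A sketch assuming a `boto3.client("glue")`-like client; `describe_tables` is a hypothetical helper, and partition columns (stored separately under `PartitionKeys`) are deliberately left out:

```python
def describe_tables(glue_client, database):
    """Map each table in a Glue database to (S3 location, [(column, type), ...]).

    glue_client is assumed to behave like boto3.client("glue").
    """
    tables = {}
    kwargs = {"DatabaseName": database}
    while True:
        page = glue_client.get_tables(**kwargs)
        for table in page.get("TableList", []):
            storage = table.get("StorageDescriptor", {})
            columns = [(c["Name"], c.get("Type", "")) for c in storage.get("Columns", [])]
            tables[table["Name"]] = (storage.get("Location"), columns)
        token = page.get("NextToken")
        if not token:
            return tables
        kwargs["NextToken"] = token


# Example (requires boto3 and AWS credentials):
# import boto3
# for name, (location, columns) in describe_tables(boto3.client("glue"), "sandbox-db").items():
#     print(name, location, columns)
```

This is a quick way to verify the column names and types the crawler inferred before writing SQL against them.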

* Open Athena and query your data using SQL

  Setting up Athena: https://docs.aws.amazon.com/athena/latest/ug/setting-up.html


![](./img/athena.PNG)
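
Athena queries can also be submitted from Python. The sketch below assumes a `boto3.client("athena")`-like client, uses `start_query_execution`, polls `get_query_execution` until a terminal state, and reads only the first page of `get_query_results`. The table name and the results bucket in the example comment are placeholders, not values from this sandbox; `run_athena_query` is a hypothetical helper:

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(athena_client, sql, database, output_location,
                     poll_seconds=2, sleep=time.sleep):
    """Run one Athena query and return its first page of rows as lists of strings.

    athena_client is assumed to behave like boto3.client("athena").
    For SELECT queries the first returned row holds the column names.
    """
    query_id = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = athena_client.get_query_execution(
            QueryExecutionId=query_id)["QueryExecution"]["Status"]
        if status["State"] in TERMINAL_STATES:
            break
        sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"query {query_id} {status['State']}: {reason}")

    rows = athena_client.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    # Cells without a VarCharValue (e.g. NULLs) are returned as None.
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]


# Example (requires boto3 and AWS credentials; names are placeholders):
# import boto3
# rows = run_athena_query(
#     boto3.client("athena"),
#     'SELECT * FROM "sandbox-db"."your_table" LIMIT 10',
#     database="sandbox-db",
#     output_location="s3://your-athena-results-bucket/",
# )
```

The database name contains a hyphen, so it is wrapped in double quotes in the `SELECT` statement.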

**Step-2 (continued): Transform input data with Glue jobs**

* Examine the transformation code already written in the Glue job `glue-sandbox-aws-exp`

  Adding jobs: https://docs.aws.amazon.com/glue/latest/dg/add-job.html

* Start the job and wait for it to complete

* After the job succeeds, check the processed output on S3:

  * `s3://sandbox-aws-experience/processed_data/x_test_processed/`
  * `s3://sandbox-aws-experience/processed_data/x_train_processed/`


* Run the `sandbox-processed-data` crawler to create tables for the processed data

* Check the processed tables and query them in Athena

![](./img/athena2.PNG)
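
The job run and the output check can be scripted as well. A sketch assuming `boto3`-style Glue and S3 clients: `start_job_run` and `get_job_run` drive the job named above, and `list_objects_v2` (following `NextContinuationToken`) lists what the job wrote under a prefix. `run_glue_job` and `list_output_keys` are hypothetical helpers:

```python
import time

# JobRunState values after which a run no longer changes.
FINAL_JOB_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_glue_job(glue_client, job_name, poll_seconds=30, sleep=time.sleep):
    """Start a Glue job and wait; returns (run_id, final JobRunState)."""
    run_id = glue_client.start_job_run(JobName=job_name)["JobRunId"]
    while True:
        run = glue_client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        if run["JobRunState"] in FINAL_JOB_STATES:
            return run_id, run["JobRunState"]
        sleep(poll_seconds)


def list_output_keys(s3_client, bucket, prefix):
    """List every object key under bucket/prefix, following pagination."""
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


# Example (requires boto3 and AWS credentials):
# import boto3
# run_id, state = run_glue_job(boto3.client("glue"), "glue-sandbox-aws-exp")
# if state == "SUCCEEDED":
#     print(list_output_keys(boto3.client("s3"), "sandbox-aws-experience",
#                            "processed_data/x_test_processed/"))
```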


**Step-3: Run the Model on SageMaker**

* SageMaker: https://docs.aws.amazon.com/sagemaker/index.html

* Open SageMaker, go to Notebook instances, and open the `sandbox-exp` notebook instance

![](./img/sagemaker1.PNG)


* Open the `tf-eager.ipynb` notebook

![](./img/sagemaker2.PNG)

* Follow the steps in the notebook:

  **TensorFlow Eager Execution with Amazon SageMaker Script Mode and Automatic Model Tuning**
  
* Run the `sandbox-model-output` crawler and check the model output table in Athena

![](./img/athena3.PNG)
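
The notebook title mentions Automatic Model Tuning. Once the tuning job has finished, its best training job can be inspected with SageMaker's `describe_hyper_parameter_tuning_job` call. A sketch, assuming a `boto3.client("sagemaker")`-like client; the tuning job name is whatever the notebook created, so the name in the example comment is a placeholder, and `tuning_job_summary` is a hypothetical helper:

```python
def tuning_job_summary(sm_client, tuning_job_name):
    """Summarize a hyperparameter tuning job and its best training job, if any.

    sm_client is assumed to behave like boto3.client("sagemaker").
    """
    desc = sm_client.describe_hyper_parameter_tuning_job(
        HyperParameterTuningJobName=tuning_job_name
    )
    summary = {"status": desc["HyperParameterTuningJobStatus"]}
    best = desc.get("BestTrainingJob")
    if best:  # absent until at least one training job has completed
        metric = best.get("FinalHyperParameterTuningJobObjectiveMetric", {})
        summary.update(
            best_training_job=best["TrainingJobName"],
            metric_name=metric.get("MetricName"),
            metric_value=metric.get("Value"),
            hyperparameters=best.get("TunedHyperParameters", {}),
        )
    return summary


# Example (requires boto3 and AWS credentials; the job name is a placeholder):
# import boto3
# print(tuning_job_summary(boto3.client("sagemaker"), "your-tuning-job-name"))
```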


**Step-4: Orchestrate with AWS Step Functions**

* Step Functions: https://docs.aws.amazon.com/step-functions/latest/dg/getting-started.html

* Open the Step Functions console and use Workflow Studio to create a state machine that runs the following steps in order:

  * Crawler: `sandbox-input-data`
  * Glue job: `glue-sandbox-aws-exp`
  * Crawler: `sandbox-processed-data`
  * SageMaker: TensorFlow model
  * Crawler: `sandbox-model-output`
  
  ![](./img/step.PNG)
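
Workflow Studio produces an Amazon States Language (ASL) definition behind the scenes. The sketch below builds a comparable definition in Python, as an illustration rather than the sandbox's actual state machine. Assumptions: the Glue job uses the `glue:startJobRun.sync` integration; crawlers have no `.sync` variant, so each crawler is started and then polled through the AWS SDK integrations `aws-sdk:glue:startCrawler` and `aws-sdk:glue:getCrawler`; the TensorFlow step is shown as a plain `sagemaker:createTrainingJob.sync` task whose parameters (role, image, output path, instance type) you must supply. All state names are hypothetical.

```python
import json


def crawler_states(prefix, crawler_name, next_state, wait_seconds=30):
    """States that start a Glue crawler and poll until it is READY again."""
    start, wait, check, choice = (
        f"{prefix}Start", f"{prefix}Wait", f"{prefix}Check", f"{prefix}Ready",
    )
    return {
        start: {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
            "Parameters": {"Name": crawler_name},
            "ResultPath": None,  # discard the result, keep the state input
            "Next": wait,
        },
        wait: {"Type": "Wait", "Seconds": wait_seconds, "Next": check},
        check: {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
            "Parameters": {"Name": crawler_name},
            "ResultPath": "$.crawler",
            "Next": choice,
        },
        choice: {
            "Type": "Choice",
            "Choices": [{
                "Variable": "$.crawler.Crawler.State",
                "StringEquals": "READY",
                "Next": next_state,
            }],
            "Default": wait,  # still RUNNING or STOPPING: wait and check again
        },
    }


def build_definition(training_job_params):
    """ASL definition for the five pipeline steps listed above.

    training_job_params: CreateTrainingJob parameters (RoleArn,
    AlgorithmSpecification, OutputDataConfig, ...) supplied by the caller.
    """
    training = {k: v for k, v in training_job_params.items() if k != "TrainingJobName"}
    # Reuse the execution name so each run gets a distinct training job name.
    training["TrainingJobName.$"] = "$$.Execution.Name"

    states = {}
    states.update(crawler_states("InputCrawler", "sandbox-input-data", "RunGlueJob"))
    states["RunGlueJob"] = {
        "Type": "Task",
        "Resource": "arn:aws:states:::glue:startJobRun.sync",
        "Parameters": {"JobName": "glue-sandbox-aws-exp"},
        "ResultPath": None,
        "Next": "ProcessedCrawlerStart",
    }
    states.update(crawler_states("ProcessedCrawler", "sandbox-processed-data", "TrainModel"))
    states["TrainModel"] = {
        "Type": "Task",
        "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
        "Parameters": training,
        "ResultPath": None,
        "Next": "OutputCrawlerStart",
    }
    states.update(crawler_states("OutputCrawler", "sandbox-model-output", "Done"))
    states["Done"] = {"Type": "Succeed"}
    return {"StartAt": "InputCrawlerStart", "States": states}


# Example: print(json.dumps(build_definition({...}), indent=2))
```

The resulting JSON could be pasted into Workflow Studio's code mode or passed to `boto3.client("stepfunctions").create_state_machine(name=..., definition=json.dumps(definition), roleArn=...)`. To keep the tuning from the notebook instead of a single training job, `sagemaker:createHyperParameterTuningJob.sync` is the corresponding integration.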


**Step-5: Schedule with Amazon EventBridge**

* Create a rule that runs the main state machine created in the previous step on a schedule

  EventBridge: https://docs.aws.amazon.com/eventbridge/index.html


  ![](./img/eventbridge.png)
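
The rule can also be created with the EventBridge API. A sketch assuming a `boto3.client("events")`-like client: `put_rule` creates or updates a scheduled rule and `put_targets` points it at the state machine. The rule name, target `Id`, schedule, and role are hypothetical; the role must allow EventBridge to call `states:StartExecution`. `validate_cron` performs only a light syntax check:

```python
def validate_cron(expression):
    """Light sanity check of an EventBridge cron(...) schedule expression.

    EventBridge cron expressions have six fields
    (minutes hours day-of-month month day-of-week year),
    and one of day-of-month / day-of-week must be '?'.
    """
    if not (expression.startswith("cron(") and expression.endswith(")")):
        raise ValueError(f"expected cron(...), got {expression!r}")
    fields = expression[len("cron("):-1].split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    if (fields[2] == "?") == (fields[4] == "?"):
        raise ValueError("exactly one of day-of-month and day-of-week must be '?'")
    return fields


def schedule_state_machine(events_client, rule_name, schedule, state_machine_arn, role_arn):
    """Create (or update) a scheduled rule that starts the state machine.

    events_client is assumed to behave like boto3.client("events").
    """
    if schedule.startswith("cron("):
        validate_cron(schedule)
    rule_arn = events_client.put_rule(
        Name=rule_name, ScheduleExpression=schedule, State="ENABLED"
    )["RuleArn"]
    response = events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "main-state-machine", "Arn": state_machine_arn, "RoleArn": role_arn}],
    )
    if response.get("FailedEntryCount", 0):
        raise RuntimeError(f"put_targets failed: {response.get('FailedEntries')}")
    return rule_arn


# Example (requires boto3 and AWS credentials; ARNs are placeholders):
# import boto3
# schedule_state_machine(boto3.client("events"), "sandbox-pipeline-nightly",
#                        "cron(0 2 * * ? *)", "<state machine ARN>", "<role ARN>")
```

`cron(0 2 * * ? *)` runs daily at 02:00 UTC; a `rate(1 day)` expression is accepted too and skips the cron check.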
