# Scenario 1

## Step 1: Create a MySQL database in Cloud SQL

The first step is to prepare our Cloud SQL for MySQL environment. This step is not part of building the data warehouse itself, but it gives us an application-style database from which we can simulate extracting tables to GCS. So, let's start by creating the Cloud SQL instance. Here are the steps:

1. Create a Cloud SQL instance.
2. Connect to the MySQL instance.
3. Create a MySQL database.
4. Create a table in the MySQL database.
5. Import CSV data into the MySQL database.

### Create a Cloud SQL instance

```
!gcloud sql instances create mysql-instance-source \
--database-version=MYSQL_5_7 \
--tier=db-g1-small \
--region=us-central1 \
--root-password=[your root password] \
--availability-type=zonal \
--storage-size=10GB \
--storage-type=HDD
```

Wait for around 5 minutes. After it's finished, refresh your browser or go back to your Cloud SQL home page and you will see that your MySQL instance is ready.
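If you'd rather not refresh the console by hand (for example, if you create the instance with the `--async` flag so the command returns immediately), you can poll the instance state instead. The sketch below is a minimal, hypothetical helper: it assumes `gcloud` is installed and authenticated, reads the state with `gcloud sql instances describe ... --format="value(state)"`, and waits for the value `RUNNABLE`. The names `gcloud_instance_state` and `wait_until_runnable` are our own, not part of any library; the state reader is passed in so the polling loop can be tried without touching GCP.

```python
import subprocess
import time
from typing import Callable


def gcloud_instance_state(instance: str) -> str:
    """Return the Cloud SQL instance state reported by gcloud, e.g. RUNNABLE."""
    result = subprocess.run(
        ["gcloud", "sql", "instances", "describe", instance,
         "--format=value(state)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_until_runnable(
    get_state: Callable[[], str],
    timeout_s: float = 900,
    interval_s: float = 15,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll get_state() until it returns RUNNABLE; return the number of polls made.

    Raises TimeoutError if the instance is not RUNNABLE within timeout_s seconds.
    """
    polls = 0
    waited = 0.0
    while True:
        polls += 1
        if get_state() == "RUNNABLE":
            return polls
        if waited >= timeout_s:
            raise TimeoutError(f"instance not RUNNABLE after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# In the notebook (assumes gcloud is installed and authenticated):
# wait_until_runnable(lambda: gcloud_instance_state("mysql-instance-source"))
```

Injecting `get_state` and `sleep` keeps the loop independent of `gcloud`, so the same function works whether the state comes from the CLI or from a stub.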

### Connect to the MySQL instance

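This command opens an interactive MySQL prompt and asks for the root password you set when creating the instance, so it is easiest to run it from Cloud Shell (without the leading `!`).

```
!gcloud sql connect mysql-instance-source --user=root
```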

### Create a MySQL database

```sql
CREATE DATABASE apps_db;
SHOW DATABASES;
```

### Create a table in the MySQL database

```sql
CREATE TABLE apps_db.stations (
  station_id VARCHAR(255),
  name VARCHAR(255),
  region_id VARCHAR(10),
  capacity INTEGER
);
```

### Import CSV data into the MySQL database

In a real-life scenario, applications use these tables, and rows are inserted as users interact with the application. Rather than going as far as building a sample application that writes records to our table, we will simply load CSV files from GCS into it.

In a later section, we will export the MySQL table back to GCS, and you may be wondering why. We load the CSV files into MySQL only to simplify generating data in the MySQL database. From then on, we treat the MySQL database as a genuine data source and use it to simulate the *Extraction* step of an ETL process.

To import the CSV data from GCS to MySQL, we can use the Cloud SQL console. 

Don't close the MySQL shell yet, since we want to check our imported data later using the **SELECT** statement.

To upload the data, go to the Cloud SQL console:

1.  Click the **mysql-instance-source** instance you created, and then find and click the **Import** button.
1.  Choose the data file in your GCS bucket, in the form bucket-name/file-name:

    **gs://[your project name]-data-bucket/stations/stations.csv**

1.  Change the **File format** option to **CSV**.
1.  Input the destination database, **apps_db**, and the table name, **stations**.
1.  Once everything is complete, click the **Import** button.
1.  Now we will return to Cloud Shell and try to access the **stations** table. In the MySQL shell, run the following query:

    `mysql> SELECT * FROM apps_db.stations LIMIT 10;`

Make sure the query returns some rows; if it doesn't, repeat the import. Once it succeeds, exit the MySQL shell by typing **exit**:

```
mysql> exit
```
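The console import can also be scripted. The following is a sketch rather than the only way to do it: it builds the argument list for `gcloud sql import csv` with the same instance, file, database, and table used in the console steps above. `build_import_cmd` is a hypothetical helper, and the bucket path keeps the `[your project name]` placeholder. Cloud SQL reads the file with the instance's own service account, so that account likely needs read access to the bucket; check this against your project's IAM setup before running it.

```python
import subprocess


def build_import_cmd(instance: str, gcs_uri: str, database: str, table: str) -> list[str]:
    """Build the gcloud argv for importing a CSV file from GCS into a Cloud SQL table."""
    return [
        "gcloud", "sql", "import", "csv", instance, gcs_uri,
        f"--database={database}",
        f"--table={table}",
        "--quiet",  # global gcloud flag: skip the interactive confirmation prompt
    ]


project = "[your project name]"  # placeholder, as in the console steps
cmd = build_import_cmd(
    "mysql-instance-source",
    f"gs://{project}-data-bucket/stations/stations.csv",
    "apps_db",
    "stations",
)
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to run; needs gcloud and bucket read access
```

Building an argument list (instead of a shell string) avoids quoting issues, and the command can be printed and reviewed before anything is imported.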

Now we have a simulated MySQL database to act as our data source. In the next section, we will extract data from MySQL to GCS.

## Step 2: Extract data from MySQL to GCS

```
# Each `!` line runs in its own subshell, so a shell variable set with `!` would
# not survive to the next line. Set the bucket name as a Python variable instead;
# IPython substitutes $bucket_name into the `!` commands.
bucket_name = "[your gcs bucket name]"

!gcloud sql export csv mysql-instance-source \
gs://$bucket_name/mysql_export/stations/20180101/stations.csv \
--database=apps_db \
--offload \
--query='SELECT * FROM stations WHERE station_id <= 200;'

!gcloud sql export csv mysql-instance-source \
gs://$bucket_name/mysql_export/stations/20180102/stations.csv \
--database=apps_db \
--offload \
--query='SELECT * FROM stations WHERE station_id <= 400;'
```
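Because each export lands in a folder named after a date, the two commands above simulate two daily extracts. Assuming the table doesn't change between exports, `station_id <= 400` also matches every row of `station_id <= 200`, so the second snapshot is a superset of the first. If you want more snapshots, a small loop keeps the path pattern consistent. This sketch assumes the same instance, database, and placeholder bucket as the cell above; `SNAPSHOTS` and `build_export_cmd` are hypothetical names, and the `subprocess.run` call is left commented out so nothing is exported until you opt in.

```python
import subprocess

# Hypothetical snapshot schedule: (export date folder, highest station_id included).
# These two pairs reproduce the two export commands above.
SNAPSHOTS = [("20180101", 200), ("20180102", 400)]


def build_export_cmd(bucket: str, date: str, max_station_id: int) -> list[str]:
    """Build the gcloud argv for one dated CSV export of apps_db.stations."""
    return [
        "gcloud", "sql", "export", "csv", "mysql-instance-source",
        f"gs://{bucket}/mysql_export/stations/{date}/stations.csv",
        "--database=apps_db",
        "--offload",
        # No shell quoting needed: the whole query is a single argv element.
        f"--query=SELECT * FROM stations WHERE station_id <= {max_station_id};",
    ]


bucket_name = "[your gcs bucket name]"
commands = [build_export_cmd(bucket_name, d, n) for d, n in SNAPSHOTS]
for cmd in commands:
    print(cmd[5])  # the destination GCS path for this snapshot
    # subprocess.run(cmd, check=True)  # uncomment to run the export
```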

Once the exports have finished, you can delete the MySQL instance, which also stops it from incurring charges:

```
!gcloud sql instances delete mysql-instance-source
```

## Step 4: Create a data mart view

```sql
CREATE VIEW `[your project id].dm_bikesharing.top_2_region_by_capacity`
AS
SELECT region_id, SUM(capacity) AS total_capacity
FROM `[your project id].staging.stations`
WHERE region_id != ''
GROUP BY region_id
ORDER BY total_capacity DESC
LIMIT 2;
```

Query the view to check the result:

```sql
SELECT * FROM `[your project id].dm_bikesharing.top_2_region_by_capacity`;
```
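To see what the view computes, here is the same logic re-expressed in plain Python over a handful of hypothetical rows (not real bike-sharing data): skip rows with an empty `region_id`, sum `capacity` per region, sort by the total in descending order, and keep the top two. It only illustrates the query's semantics; it is not a replacement for running the view in BigQuery.

```python
from collections import defaultdict

# Hypothetical sample rows: (station_id, name, region_id, capacity).
stations = [
    ("1", "Station A", "3", 20),
    ("2", "Station B", "3", 15),
    ("3", "Station C", "5", 30),
    ("4", "Station D", "12", 10),
    ("5", "Station E", "", 50),  # empty region_id, dropped like WHERE region_id != ''
]


def top_regions_by_capacity(rows, k=2):
    """Sum capacity per non-empty region_id and return the k largest totals."""
    totals = defaultdict(int)
    for _station_id, _name, region_id, capacity in rows:
        # Skips '' (and None, which SQL's region_id != '' also filters out).
        if region_id:
            totals[region_id] += capacity
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


print(top_regions_by_capacity(stations))  # → [('3', 35), ('5', 30)]
```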