## Building an IoT Analytics Pipeline

### Overview
The term Internet of Things (IoT) refers to the interconnection of physical devices with the global Internet. These devices are equipped with sensors and networking hardware, and each is globally identifiable. Taken together, these capabilities afford rich data about items in the physical world.

Cloud IoT Core is a fully managed service that allows you to easily and securely connect, manage, and ingest data from millions of globally dispersed devices. The service connects IoT devices that use the standard Message Queue Telemetry Transport (MQTT) protocol to other Google Cloud Platform data services.

Cloud IoT Core has two main components:
- A device manager for registering devices with the service, so you can then monitor and configure them.
- A protocol bridge that supports MQTT, which devices can use to connect to the Google Cloud Platform.

#### Objectives:
- Connect and manage MQTT-based devices using Cloud IoT Core (this lab uses simulated devices).
- Ingest a stream of information from Cloud IoT Core using Cloud Pub/Sub.
- Process the IoT data using Cloud Dataflow.
- Analyze the IoT data using BigQuery.

#### GCP Products:
- BigQuery
- Cloud Pub/Sub
- Cloud Dataflow
- Cloud IoT Core

1.) Enable:
- Google Cloud IoT API
- Google Cloud Pub/Sub API
- Dataflow API

2.) Create a Cloud Pub/Sub Topic
- In the GCP Console, go to Navigation menu > Pub/Sub > Topics.
- Click Create Topic. The Create a topic dialog shows you a partial URL path, consisting of projects/ followed by your project name and a trailing slash, then topics/ . Confirm that the project name is the one you noted above.
```
# topic name
iotlab
```
- Open the topic's Permissions (three-dot menu) and add the member cloud-iot@system.gserviceaccount.com
- Grant the new member the Pub/Sub > Pub/Sub Publisher role so that Cloud IoT Core can publish device messages to the topic. Click Add.

3.) Create a BigQuery dataset
- In the GCP Console, go to Navigation menu > BigQuery.
- In the left-hand side of the browser window, make sure your project isn't set to "Qwiklabs Resources". If it is, click on the blue arrow next to the name of your project and select Switch to project > your-qwiklabs-project.
- Click on the blue arrow next to the name of your project and select Create new dataset.
- Give the new dataset the name iotlab and click OK.
- When the dataset is created, to the right of iotlab, click the "Add table" icon. The Create Table dialog opens.
- In the Source Data section, click Create empty table.
- In the Destination Table section's Table name field, enter sensordata.
- In the Schema section, enter timestamp for the field name. Set the field's Type to TIMESTAMP.
- Click the Add Field button.
- In the newly created line, enter device for the field name. Set the field's Type to STRING.
- Click the Add Field button.
- In the newly created line, enter temperature for the field name. Set the field's Type to FLOAT.
- Leave the other defaults unmodified. Click Create Table.
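
The table definition above can be written down as data, which makes it easy to check a message before it is sent. The following is a minimal sketch, not part of the lab: the names `SENSORDATA_SCHEMA` and `row_matches_schema` are hypothetical, and it assumes timestamps arrive as ISO 8601 strings.

```python
from datetime import datetime

# Column names and types as entered in the Create Table dialog above.
SENSORDATA_SCHEMA = [
    {"name": "timestamp", "type": "TIMESTAMP"},
    {"name": "device", "type": "STRING"},
    {"name": "temperature", "type": "FLOAT"},
]


def row_matches_schema(row):
    """Return True if `row` (a dict) has exactly the three fields with plausible values."""
    expected = {field["name"] for field in SENSORDATA_SCHEMA}
    if set(row) != expected:
        return False
    try:
        # Assumption: the timestamp is an ISO 8601 string.
        datetime.fromisoformat(row["timestamp"])
    except (TypeError, ValueError):
        return False
    if not isinstance(row["device"], str):
        return False
    temperature = row["temperature"]
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(temperature, (int, float)) and not isinstance(temperature, bool)
```

A row such as `{"timestamp": "2018-07-01 12:00:00", "device": "temp-sensor-istanbul", "temperature": 21.5}` passes the check; a row with a missing or extra field does not.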

4.) Create a Cloud Storage Bucket
- Cloud Storage allows world-wide storage and retrieval of any amount of data at any time. You can use Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download. In this lab we will use Cloud Storage to provide working space for our Cloud Dataflow pipeline.
- In the GCP Console, go to Navigation menu > Storage.
- Click CREATE BUCKET.
- For Name, paste in your GCP project ID.
- For Default storage class, click Multi-regional if it is not already selected.
- For Location, choose the selection closest to you.
- Click Create.

5.) Set up a Cloud Dataflow Pipeline
- Cloud Dataflow is a serverless way to carry out data analysis. In this lab, you will set up a streaming data pipeline that reads sensor data from Pub/Sub and writes it to BigQuery.
- In the GCP Console, go to Navigation menu > Dataflow.
- In the top menu bar, click CREATE JOB FROM TEMPLATE.
- In the job-creation dialog, for Job name, enter iotlab.
- For Cloud Dataflow template, choose PubSub to BigQuery. When you choose this template, the form updates to reveal new fields below.
- For Cloud Dataflow Regional Endpoint, choose the region closest to you.
- For Cloud Pub/Sub input topic, enter projects/ followed by your GCP project ID then add /topics/iotlab . The resulting string will look like this: projects/qwiklabs-gcp-d2e509fed105b3ed/topics/iotlab
- For BigQuery output table, enter your GCP project ID followed by :iotlab.sensordata. The resulting string will look like this: qwiklabs-gcp-d2e509fed105b3ed:iotlab.sensordata
- For Temporary location, enter gs:// followed by your GCP project ID and then /tmp. The resulting string will look like this: gs://qwiklabs-gcp-d2e509fed105b3ed/tmp

- Click Optional parameters.
- For Max workers, enter 2.
- For Machine type, enter n1-standard-1.
- Click Run Job. A new streaming job is started. You can now see a visual representation of the data pipeline.
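
The three strings typed into the job form all derive from the project ID, so they can be generated rather than assembled by hand. This is a small sketch under that assumption; the function name and dictionary keys are hypothetical labels for the form fields, not the template's actual parameter names.

```python
def dataflow_form_values(project_id, topic="iotlab", dataset="iotlab", table="sensordata"):
    """Build the strings entered in the Dataflow job form for a given project ID."""
    return {
        # Cloud Pub/Sub input topic
        "input_topic": f"projects/{project_id}/topics/{topic}",
        # BigQuery output table, in project:dataset.table form
        "output_table": f"{project_id}:{dataset}.{table}",
        # Temporary location in the bucket named after the project
        "temp_location": f"gs://{project_id}/tmp",
    }


if __name__ == "__main__":
    # Uses the example project ID shown in the steps above.
    for field, value in dataflow_form_values("qwiklabs-gcp-d2e509fed105b3ed").items():
        print(f"{field}: {value}")
```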

6.) Prepare Your Compute Engine VM
- In your project, a pre-provisioned VM instance named __iot-device-simulator__ will let you run instances of a Python script that emulate an MQTT-connected IoT device. Before you emulate the devices, you will also use this VM instance to populate your Cloud IoT Core device registry.
- To connect to the __iot-device-simulator__ VM instance:
- In the GCP Console, go to Navigation menu > Compute Engine > VM Instances. You'll see your VM instance listed as iot-device-simulator.
- To the right, click the SSH drop-down arrow and select Open in browser window.
- In your SSH session on the iot-device-simulator VM instance, enter these commands to remove the default Google Cloud Platform SDK installation, install the latest version, and initialize it. (The beta component is installed in a later step.)

```
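# remove the default SDK installation
sudo apt-get remove google-cloud-sdk -y
# install the latest SDK
curl https://sdk.cloud.google.com | bash
# end the SSH session so the new installation is picked up, then reconnect
exit
# after reconnecting over SSH, initialize the SDK
gcloud init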
```

- Click on the URL shown to open a new browser window that displays a verification code.
- Use your browser to copy the verification code.
- Paste the verification code in response to the "Enter verification code:" prompt and press Enter.
- In response to "Pick cloud project to use," pick the GCP project that Qwiklabs created for you.
- Enter these commands to make sure that the components of the SDK are up to date, install the beta component, and install the software the simulated devices need:

```
gcloud components update
gcloud components install beta
sudo apt-get update
# install the simulator's prerequisites
sudo apt-get install python-pip openssl git -y
sudo pip install pyjwt paho-mqtt cryptography
# clone the repository that contains the lab code
git clone https://github.com/GoogleCloudPlatform/training-data-analyst
```

7.) Create a Registry for IoT Devices
- To register devices, you must create a registry for the devices. The registry is a point of control for devices.
- To create the registry:
- In your SSH session on the iot-device-simulator VM instance, run the following, adding your project ID as the value for PROJECT_ID and a Cloud IoT Core region (us-central1, europe-west1, or asia-east1) as the value for MY_REGION:

```
export PROJECT_ID=
export MY_REGION=

# create registry
gcloud beta iot registries create iotlab-registry \
   --project=$PROJECT_ID \
   --region=$MY_REGION \
   --event-notification-config=topic=projects/$PROJECT_ID/topics/iotlab
```

8.) Create a Cryptographic Keypair
- To allow IoT devices to connect securely to Cloud IoT Core, you must create a cryptographic keypair.
- In your SSH session on the iot-device-simulator VM instance, enter these commands to create the keypair in the appropriate directory:

```
cd $HOME/training-data-analyst/quests/iotlab/
openssl req -x509 -newkey rsa:2048 -keyout rsa_private.pem \
    -nodes -out rsa_cert.pem -subj "/CN=unused"
```
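
The simulated devices use the private key to sign a JSON Web Token (JWT) with RS256, and Cloud IoT Core checks it against the certificate registered for the device. The sketch below shows only how the unsigned part of such a token is laid out, assuming the usual claims (`iat`, `exp`, and `aud` set to the project ID). The function names are hypothetical, and the actual signing with `rsa_private.pem` is left to a library such as pyjwt, which is not used here so that the example needs only the standard library.

```python
import base64
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwt_signing_input(project_id, issued_at=None, lifetime_seconds=3600):
    """Return '<header>.<claims>', the string that would be signed with the private key."""
    issued_at = int(time.time()) if issued_at is None else int(issued_at)
    header = {"alg": "RS256", "typ": "JWT"}
    claims = {
        "iat": issued_at,                      # issued-at time, seconds since epoch
        "exp": issued_at + lifetime_seconds,   # expiry time
        "aud": project_id,                     # audience: the GCP project ID
    }
    segments = [
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    ]
    return ".".join(segments)
```

A complete token appends a third segment: the base64url-encoded RS256 signature over this string.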

9.) Add Simulated Devices to the Registry

- For a device to be able to connect to Cloud IoT Core, it must first be added to the registry.
- In your SSH session on the iot-device-simulator VM instance, enter these commands to create two devices, temp-sensor-buenos-aires and temp-sensor-istanbul:

```
# create device called "temp-sensor-buenos-aires"
gcloud beta iot devices create temp-sensor-buenos-aires \
  --project=$PROJECT_ID \
  --region=$MY_REGION \
  --registry=iotlab-registry \
  --public-key path=rsa_cert.pem,type=rs256
  
# create device called "temp-sensor-istanbul"
gcloud beta iot devices create temp-sensor-istanbul \
  --project=$PROJECT_ID \
  --region=$MY_REGION \
  --registry=iotlab-registry \
  --public-key path=rsa_cert.pem,type=rs256
```

10.) Run Simulated Devices
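- In your SSH session on the iot-device-simulator VM instance, enter these commands to download the CA root certificates from pki.google.com to the appropriate directory and start the two simulated devices. The first device runs in the background and logs to buenos-aires-log.txt; the second runs in the foreground: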

```
cd $HOME/training-data-analyst/quests/iotlab/
wget https://pki.google.com/roots.pem

# run first device
python cloudiot_mqtt_example_json.py \
   --project_id=$PROJECT_ID \
   --cloud_region=$MY_REGION \
   --registry_id=iotlab-registry \
   --device_id=temp-sensor-buenos-aires \
   --private_key_file=rsa_private.pem \
   --message_type=event \
   --algorithm=RS256 > buenos-aires-log.txt 2>&1 &
   
# run second device
python cloudiot_mqtt_example_json.py \
   --project_id=$PROJECT_ID \
   --cloud_region=$MY_REGION \
   --registry_id=iotlab-registry \
   --device_id=temp-sensor-istanbul \
   --private_key_file=rsa_private.pem \
   --message_type=event \
   --algorithm=RS256
```
- Telemetry data will flow from the simulated devices through Cloud IoT Core to your Cloud Pub/Sub topic. In turn, your Dataflow job will read messages from your Pub/Sub topic and write their contents to your BigQuery table.
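
To make the flow more concrete, the sketch below builds the identifiers a device uses when it publishes over the MQTT bridge: the client ID that names the device within its registry, and the topic that telemetry events are published to. The helper names are hypothetical, and the payload layout is an assumption chosen to match the sensordata columns; the lab's simulator script may format its messages differently.

```python
import json


def mqtt_client_id(project_id, cloud_region, registry_id, device_id):
    """Full device path used as the MQTT client ID."""
    return (
        f"projects/{project_id}/locations/{cloud_region}"
        f"/registries/{registry_id}/devices/{device_id}"
    )


def mqtt_topic(device_id, message_type="event"):
    """Telemetry goes to the 'events' topic; anything else is treated as device state."""
    sub_topic = "events" if message_type == "event" else "state"
    return f"/devices/{device_id}/{sub_topic}"


def telemetry_payload(device_id, timestamp, temperature):
    """Assumed JSON payload with keys matching the BigQuery table columns."""
    return json.dumps(
        {"timestamp": timestamp, "device": device_id, "temperature": temperature}
    )


if __name__ == "__main__":
    # "my-project" and "us-central1" are placeholder values.
    print(mqtt_client_id("my-project", "us-central1", "iotlab-registry", "temp-sensor-istanbul"))
    print(mqtt_topic("temp-sensor-istanbul"))
    print(telemetry_payload("temp-sensor-istanbul", 1530446400, 21.5))
```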

11.) Analyze the Sensor Data Using BigQuery
- In the GCP Console, go to Navigation menu > BigQuery.
- In the left-hand side of the browser window, click on Compose Query.
- Enter the following query:
```
#standardSQL
SELECT timestamp, device, temperature
FROM iotlab.sensordata
ORDER BY timestamp DESC
LIMIT 100
```
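
To see what the query returns without a running pipeline, the same statement can be tried locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for BigQuery, which is an assumption made purely for illustration: the sample rows are made up, timestamps are stored as text, and `latest_readings` is a hypothetical helper.

```python
import sqlite3


def latest_readings(rows, limit=100):
    """Load rows into an in-memory iotlab.sensordata table and return the newest first."""
    conn = sqlite3.connect(":memory:")
    try:
        # Attach a second in-memory database so the table can be named iotlab.sensordata.
        conn.execute("ATTACH DATABASE ':memory:' AS iotlab")
        conn.execute(
            "CREATE TABLE iotlab.sensordata (timestamp TEXT, device TEXT, temperature REAL)"
        )
        conn.executemany("INSERT INTO iotlab.sensordata VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT timestamp, device, temperature FROM iotlab.sensordata "
            "ORDER BY timestamp DESC LIMIT ?",
            (limit,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Made-up sample readings, for illustration only.
    sample = [
        ("2018-07-01 12:00:00", "temp-sensor-istanbul", 21.5),
        ("2018-07-01 12:00:02", "temp-sensor-buenos-aires", 14.0),
        ("2018-07-01 12:00:01", "temp-sensor-istanbul", 21.6),
    ]
    for row in latest_readings(sample):
        print(row)
```

Because the rows are ordered by timestamp in descending order, the most recent reading from either device appears first.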