# 1. Introduction to IoT: Accessing IoT Data

See the IoT Data Analysis guide: [`IoT_Datacamp.md`](../IoT_Datacamp.md).

## 1.1 Data Acquisition

Typical data acquisition with `requests` and `pandas`:

```python
import requests
import pandas as pd

# Option 1: Requests
url = "https://demo.datacamp.com/api/temp?count=3"
r = requests.get(url)
# Extract JSON
r.json()

# Convert JSON to pandas dataframe
df = pd.DataFrame(r.json()).head()

# Option 2: Handle download + conversion with pandas
url = "https://demo.datacamp.com/api/temp?count=3"
df_env = pd.read_json(url)
df_env.head()

# Pandas often takes care of data types, e.g., timestamps
print(df_env.dtypes)
```

To store the data:

```python
# JSON
df_env.to_json("data.json", orient="records")
# CSV
df_env.to_csv("temperature.csv", index=False)
```

In [2]:
import requests

In [3]:
url = "https://demo.datacamp.com/api/temp?count=3"
r = requests.get(url)
print(r.json())

{'message': 'no Route matched with those values'}
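
The response above is a JSON object with a `message` key rather than a list of readings, so passing it to `pd.DataFrame()` would not yield sensor data. Below is a minimal sketch of a guard that catches this case. It assumes that a valid response is a list of JSON objects; the helper name `extract_records` is hypothetical, and the sample reading (with an illustrative `value` field) is hand-written from the first row shown in section 1.2.

```python
def extract_records(payload):
    """Return the payload if it looks like a list of readings, else raise."""
    # Error responses like the one above arrive as a dict with a 'message' key
    if isinstance(payload, dict) and "message" in payload:
        raise ValueError(f"API returned an error message: {payload['message']}")
    # Assumption: valid data is a list of JSON objects (one per reading)
    if not isinstance(payload, list) or not all(isinstance(p, dict) for p in payload):
        raise ValueError("Expected a list of JSON objects")
    return payload


# The error payload printed above is rejected
error_payload = {"message": "no Route matched with those values"}
try:
    extract_records(error_payload)
    rejected = False
except ValueError as err:
    rejected = True
    print(err)

# An illustrative, hand-written reading passes through unchanged
sample = [{"timestamp": "2018-09-01 00:00:00", "value": 16.1}]
records = extract_records(sample)
print(records)
```

In a live script, `extract_records(r.json())` could run before `pd.DataFrame(...)`, and `r.raise_for_status()` can additionally be called to turn HTTP error codes into exceptions.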


## 1.2 Understanding the Data

In [7]:
import requests
import pandas as pd

In [8]:
DATA_PATH = "../data/"
filename = "environ_MS83200MS_nowind_3m-10min.json"

In [10]:
df = pd.read_json(DATA_PATH+filename)

In [13]:
df.head()

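| | timestamp | precipitation | humidity | radiation | sunshine | pressure | temperature |
|---|---|---|---|---|---|---|---|
| 0 | 2018-09-01 00:00:00 | 0.0 | 95.6 | 0.0 | 599.2 | 1016.3 | 16.1 |
| 1 | 2018-09-01 00:05:00 | 0.1 | NaN | NaN | NaN | NaN | NaN |
| 2 | 2018-09-01 00:10:00 | 0.0 | 95.5 | 0.0 | 600.0 | 1016.4 | 16.1 |
| 3 | 2018-09-01 00:15:00 | 0.0 | NaN | NaN | NaN | NaN | NaN |
| 4 | 2018-09-01 00:20:00 | 0.0 | 95.2 | 0.0 | 598.9 | 1016.5 | 16.1 |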


In [15]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26175 entries, 0 to 26174
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   timestamp      26175 non-null  datetime64[ns]
 1   precipitation  26162 non-null  float64       
 2   humidity       13085 non-null  float64       
 3   radiation      13085 non-null  float64       
 4   sunshine       13083 non-null  float64       
 5   pressure       13085 non-null  float64       
 6   temperature    13059 non-null  float64       
dtypes: datetime64[ns](1), float64(6)
memory usage: 1.4 MB
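
The non-null counts above suggest that `precipitation` is reported every 5 minutes while the other columns arrive roughly every 10 minutes, which leaves NaN values in the rows in between. The following sketch counts and forward-fills those gaps on a hand-copied excerpt of the first five rows (three value columns only). Whether forward-filling is appropriate depends on the analysis; it is used here only as an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Excerpt copied by hand from the df.head() output (subset of columns)
excerpt = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-09-01 00:00:00", "2018-09-01 00:05:00", "2018-09-01 00:10:00",
        "2018-09-01 00:15:00", "2018-09-01 00:20:00",
    ]),
    "precipitation": [0.0, 0.1, 0.0, 0.0, 0.0],
    "humidity": [95.6, np.nan, 95.5, np.nan, 95.2],
    "temperature": [16.1, np.nan, 16.1, np.nan, 16.1],
})

# Count missing values per column
missing = excerpt.isna().sum()
print(missing)

# Carry the last known value forward into the gaps
filled = excerpt.set_index("timestamp").ffill()
print(filled)
```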


In [16]:
df.describe()

| | precipitation | humidity | radiation | sunshine | pressure | temperature |
|---|---|---|---|---|---|---|
| count | 26162.0 | 13085.0 | 13085.0 | 13083.0 | 13085.0 | 13059.0 |
| mean | 0.008142 | 73.785059 | 118.825518 | 187.421539 | 1019.190394 | 14.06767 |
| std | 0.05747 | 20.232647 | 201.190397 | 273.950142 | 6.711385 | 6.612924 |
| min | 0.0 | 8.9 | 0.0 | 0.0 | 989.5 | -1.8 |
| 25% | 0.0 | 57.5 | 0.0 | 0.0 | 1016.0 | 9.8 |
| 50% | 0.0 | 78.9 | 0.0 | 0.0 | 1019.7 | 13.4 |
| 75% | 0.0 | 91.3 | 161.5 | 598.9 | 1023.3 | 18.9 |
| max | 2.7 | 100.1 | 928.0 | 600.0 | 1039.8 | 30.4 |


## 1.3 Introduction to Data Streams with MQTT

A data stream is a continuous, open-ended flow of data; e.g.:

- Twitter
- Video
- Sensor IoT data
- Market orders

The MQTT protocol can be used to handle them. MQTT stands for Message Queuing Telemetry Transport, and it is used for machine-to-machine communication. Advantages:

- It uses a **publisher/subscriber** architecture, which decouples senders from receivers.
- It is lightweight, with a small footprint.
- It's robust in environments with high latency and low bandwidth.

Concepts:

- There is a server, the **broker**, which routes messages by **topic**; any device can **publish to a topic**, and a topic comes into existence the first time it is used. Examples of topics: `temperature`, `position`.
- Any device (client) can **subscribe to a topic** to receive the messages published to it.
- Also: **publisher = producer**, **subscriber = consumer**.
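
To make the roles concrete, here is a toy in-memory sketch of the publish/subscribe pattern. It is not MQTT and involves no network; the class `ToyBroker` and its methods are hypothetical names chosen for illustration, and matching is by exact topic name only.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: routes messages by topic name."""

    def __init__(self):
        # topic name -> list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # A consumer registers interest in a topic
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # Deliver the payload to every consumer subscribed to this exact topic
        for callback in self._subscribers.get(topic, []):
            callback(topic, payload)


broker = ToyBroker()
received = []
broker.subscribe("temperature", lambda topic, payload: received.append((topic, payload)))

broker.publish("temperature", 20.5)  # delivered to the subscriber
broker.publish("position", (1, 2))   # nobody subscribed, so it is dropped
print(received)
```

The publisher never references the subscriber directly; both only know the broker and the topic name, which is the decoupling that the pattern provides.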

Installation of Paho-MQTT, the Python library which implements the MQTT protocol:

```bash
python -m pip install paho-mqtt
```

Usage note: MQTT requires a broker. We can either install one (e.g., [Eclipse Mosquitto](https://mosquitto.org)) or use one of the public brokers provided for testing:

- [mqtt-dashboard.com](http://www.mqtt-dashboard.com)
- [test.mosquitto.org](https://test.mosquitto.org)
- [iot.eclipse.org](https://iot.eclipse.org) (the scripts below connect to the host `mqtt.eclipseprojects.io`)

Interesting links:

- [MQTT Beginners Guide](https://medium.com/python-point/mqtt-basics-with-python-examples-7c758e605d4)
- [Eclipse Mosquitto: An open source MQTT broker](https://mosquitto.org)


#### Example: Publisher & Subscriber via Test Mosquitto Broker

Source: [MQTT Beginners Guide](https://medium.com/python-point/mqtt-basics-with-python-examples-7c758e605d4); the code below is a modified version.

In this example, two publisher scripts publish to the same topic on a public broker, and a subscriber reads from that topic. Each script needs to run in a separate shell.

File [`mqtt_publisher_1.py`](mqtt_publisher_1.py):

```python
import time
import json
from random import uniform
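import paho.mqtt.client as mqtt

# Public broker: use the bare hostname (no https:// or www.)
#mqttBroker = "test.mosquitto.org"
mqttBroker = "mqtt.eclipseprojects.io"

# Create a client with a name (client ID)
# With paho-mqtt 2.x, also pass mqtt.CallbackAPIVersion.VERSION1 as the first argument
client = mqtt.Client(client_id="Temperature_Inside")
client.connect(mqttBroker)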

# Topic name: we can use any name we want, as long as it is free.
topic_name = "/mqtt/test/temperature"

while True:
    # Measure the value (or generate)
    rand_temp = uniform(20.0, 21.0)
    # Pack it
    packet = {"temperature": rand_temp, "location": "inside"}
    # PUBLISH to broker topic /mqtt/test/temperature
    # The broker creates the topic if not available
    client.publish(topic_name, json.dumps(packet))
    print(f"Just published {packet} to topic {topic_name}")
    time.sleep(1) # 1 sec

```

File [`mqtt_publisher_2.py`](mqtt_publisher_2.py):

```python
import time
import json
from random import randrange
import paho.mqtt.client as mqtt

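# Public broker: use the bare hostname (no https:// or www.)
#mqttBroker = "test.mosquitto.org"
mqttBroker = "mqtt.eclipseprojects.io"

# Create a client with a name (client ID)
# With paho-mqtt 2.x, also pass mqtt.CallbackAPIVersion.VERSION1 as the first argument
client = mqtt.Client(client_id="Temperature_Outside")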
client.connect(mqttBroker)

# Topic name: we can use any name we want, as long as it is free.
topic_name = "/mqtt/test/temperature"

while True:
    # Measure the value (or generate)
    rand_temp = randrange(10)
    # Pack it
    packet = {"temperature": rand_temp, "location": "outside"}
    # PUBLISH to broker topic /mqtt/test/temperature
    # The broker creates the topic if not available
    client.publish(topic_name, json.dumps(packet))
    print(f"Just published {packet} to topic {topic_name}")
    time.sleep(1) # 1 sec

```

File [`mqtt_subscribe.py`](./lab/mqtt_subscribe.py): Note that we can either (1) create a client and run its network `loop` in the background for a limited time, or (2) use the blocking helper `subscribe.callback()`. In both cases, a function `on_message()` needs to be defined. This script reads the messages that the two publishers send to the topic `/mqtt/test/temperature` on the specified public broker.

```python
import time
import paho.mqtt.client as mqtt
import paho.mqtt.subscribe as subscribe

# on_message must always accept these three arguments,
# even if some of them are not used!
def on_message(client, userdata, message):
    # We simply print the message content here = message.payload (bytes)
    # We can also access the topic name via message.topic
    print(f"Received message: {message.payload.decode('utf-8')}")
    # To parse JSON (requires import json): data = json.loads(message.payload)
    # Then we would store it: store.append(data)
    # And finally, outside of on_message, build a dataframe:
    # df = pd.DataFrame(store)
    # df.to_csv("datastream.csv", index=False)

# Public broker: use the bare hostname (no https:// or www.)
#mqttBroker = "test.mosquitto.org"
mqttBroker = "mqtt.eclipseprojects.io"

# Topic name: the name should be the one used by the publishers
topic_name = "/mqtt/test/temperature"

# Switch between the two options below
use_client_loop = False

## Option 1: Use a client and a loop
if use_client_loop:
    # With paho-mqtt 2.x, also pass mqtt.CallbackAPIVersion.VERSION1 as the first argument
    client = mqtt.Client(client_id="Smartphone")
    # Register the handler before any message can arrive
    client.on_message = on_message
    client.connect(mqttBroker)

    # Start the network loop in a background thread
    client.loop_start()
    client.subscribe(topic_name)
    # Keep the script alive for 30 sec while messages arrive, then stop the loop;
    # this is NOT a 30 sec wait after each message
    time.sleep(30)
    client.loop_stop()

## Option 2: Use a callback (blocks and handles messages until interrupted)
else:
    subscribe.callback(on_message,
                       topics=topic_name,
                       hostname=mqttBroker)

```
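
The comments inside `on_message()` above outline how incoming messages could be parsed and stored. The sketch below follows those steps without a broker: it feeds a stand-in message object (a `SimpleNamespace` with only the `topic` and `payload` attributes, an assumption made for illustration) to the handler and writes the collected readings as CSV text with the standard library. Using `pd.DataFrame(store).to_csv(...)`, as the comments suggest, would work similarly.

```python
import csv
import io
import json
from types import SimpleNamespace

store = []  # collected readings, one dict per message


def on_message(client, userdata, message):
    # Parse the JSON payload and record which topic it came from
    data = json.loads(message.payload.decode("utf-8"))
    data["topic"] = message.topic
    store.append(data)


# Stand-in for an incoming message, shaped like the publishers' packets
fake_message = SimpleNamespace(
    topic="/mqtt/test/temperature",
    payload=json.dumps({"temperature": 20.5, "location": "inside"}).encode("utf-8"),
)
on_message(None, None, fake_message)

# Write the collected readings as CSV text (a file handle would work the same way)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["temperature", "location", "topic"])
writer.writeheader()
writer.writerows(store)
print(buffer.getvalue())
```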