&copy; 2024 by Pearson Education, Inc. All Rights Reserved. The content in this notebook is based on the book [**Python for Programmers**](https://amzn.to/2VvdnxE).

In [None]:
%%html
<!-- CSS settings for this notebook -->
<style>
    h1 {color:#BB0000}
    h2 {color:purple}
    h3 {color:#0099ff}
    hr {    
        border: 0;
        height: 3px;
        background: #333;
        background-image: linear-gradient(to right, #ccc, black, #ccc);
    }
</style>

# 16. Big Data: Hadoop, Spark, NoSQL and IoT 


In [None]:
# enable high-res images in notebook 
%config InlineBackend.figure_format = 'retina'

# 16.8 Internet of Things and Dashboards

| Small subset of IoT device types and applications |
| --- |
| **activity trackers**—Apple Watch, FitBit, …
| **personal assistants**—Amazon Echo (Alexa), Apple HomePod (Siri), Google Home (Google Assistant)
| **appliances**—ovens, coffee makers, refrigerators, …
| **driverless cars**
| **earthquake sensors**
| **healthcare**—blood glucose monitors for diabetics, blood pressure monitors, electrocardiograms (EKG/ECG), electroencephalograms (EEG), heart monitors, ingestible sensors, pacemakers, sleep trackers, …
| **sensors**—chemical, gas, GPS, humidity, light, motion, pressure, temperature, …
| **smart home**—lights, garage openers, video cameras, doorbells, irrigation controllers, security devices, smart locks, smart plugs, smoke detectors, thermostats, air vents
| **tracking devices**
| **wireless network devices**

<hr style="height:2px; border:none; color:#000; background-color:#000;">

## 16.8.1 Publish and Subscribe 
* **IoT devices** (and more) commonly communicate via **pub/sub (publisher/subscriber) systems**
* **Publisher** &mdash; Anything that **sends a message** to a **cloud-based service**, which in turn **sends** that **message** to all **subscribers**
    * **Publisher** specifies a **topic** or **channel**
* **Subscriber** specifies one or more **topics** or **channels** for which they’d like to **receive messages**
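
The publisher/subscriber roles above can be sketched with a tiny in-memory broker. This is only an illustration of the pattern, not how any cloud service is implemented; the `Broker` class and the topic names are hypothetical.

```python
from collections import defaultdict


class Broker:
    """Hypothetical in-memory stand-in for a cloud pub/sub service."""

    def __init__(self):
        # map each topic name to the list of callbacks subscribed to it
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register callback to receive every message published to topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Send message to all subscribers of topic; return how many got it."""
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])


broker = Broker()
received = []  # messages seen by our single subscriber
broker.subscribe('temperature', received.append)

delivered = broker.publish('temperature', {'celsius': 21})
ignored = broker.publish('humidity', {'percent': 40})  # nobody subscribed
print(received, delivered, ignored)
```

The publisher never references a subscriber directly. Both sides only agree on a topic name, which is what lets devices and dashboards be added or removed independently.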

<hr style="height:2px; border:none; color:#000; background-color:#000;">

## 16.8.2 Visualizing a PubNub Sample Live Stream with a Freeboard Dashboard
* **PubNub** is geared to **real-time pub/sub applications** 
* Many **use-cases** 
    * IoT, chat, online multiplayer games, social apps, collaborative apps
* Provides several **demo live streams**, including one that **simulates IoT sensors** 
* Common to **visualize** live data **streams** for **monitoring purposes**
* [Here is a **Freeboard.io web-based dashboard**](https://freeboard.io/board/5d335719d4982c3074000469) that &mdash; **without writing code** &mdash; connects to a **live data stream** and **visualizes the data** 
* For each **sensor**, we used a **Gauge**  and a **Sparkline** 
    * **Sparkline** &mdash; **Line graph without axes** that shows **data value changing over time**
* **Lecture Note: Show some of the settings**

<hr style="height:2px; border:none; color:#000; background-color:#000;">

### Signing up for Freeboard.io
* [Register for a Freeboard.io 30-day trial](https://freeboard.io/signup)
* Once registered, the **My Freeboards** page appears
* If you’d like, you can click the **Try a Tutorial** button and **visualize data** from your **smartphone**


<hr style="height:2px; border:none; color:#000; background-color:#000;">

### Creating a New Dashboard
* I discuss in detail **how to specify a data source and use it to create a dashboard** in my [**Python Fundamentals LiveLessons videos**](https://learning.oreilly.com/videos/python-fundamentals/9780135917411/9780135917411-PFLL_Lesson16_43) and in [**Python for Programmers, Section 16.8.2**](https://learning.oreilly.com/library/view/python-for-programmers/9780135231364/ch16.xhtml#ch16lev2sec31)

<hr style="height:2px; border:none; color:#000; background-color:#000;">

## 16.8.3 Simulating an Internet-Connected Thermostat in Python (1 of 2)
* Common to use **IoT simulators** for **testing**, especially if you **do not have access to actual devices and sensors** during development
* Many **cloud vendors** have **IoT simulation** capabilities
    * **IBM Watson IoT Platform**, **IOTIFY.io**, ...
* We'll create a **script** that **simulates an IoT thermostat**
    * Uses **`dweet.io`**  to publish periodic **JSON messages**—called **dweets** (like a **tweet from a device**)

<hr style="height:2px; border:none; color:#000; background-color:#000;">

## 16.8.3 Simulating an Internet-Connected Thermostat in Python (2 of 2)
* We'll simulate a **temperature sensor** that can issue
    * **low-temperature warnings** before pipes freeze
    * **high-temperature warnings** to indicate there might be a fire
* Our **dweets** will contain 
    * **location** 
    * **temperature**
    * **low** or **high temperature warnings** if the temperature drops to **3 degrees Celsius** or rises to **35 degrees Celsius**
* Use **`freeboard.io`** to create a **dashboard**

<hr style="height:2px; border:none; color:#000; background-color:#000;">

### Installing Dweepy
* `pip install dweepy`
* [Dweepy documentation](https://github.com/paddycarey/dweepy)

<hr style="height:2px; border:none; color:#000; background-color:#000;">

### Invoking the `simulator.py` Script
* Script `simulator.py` simulates our thermostat 
* Invoke script with two command-line arguments

> `ipython simulator.py 1000 1`

* **number of total messages** to simulate 
* **delay** in seconds **between sending dweets**
* Can **immediately begin tracking** messages on the `dweet.io` site at 
> [https://dweet.io/follow/temperature-simulator-deitel-python](https://dweet.io/follow/temperature-simulator-deitel-python)
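
As an alternative to reading `sys.argv` directly, the two arguments could be validated with the standard library's `argparse`. This is a sketch under assumptions: the function name `parse_args` and the argument names `messages` and `delay` are hypothetical, and the delay is parsed as a `float` here (fractional seconds), whereas the script itself converts it with `int`.

```python
import argparse


def parse_args(argv=None):
    """Parse the simulator's two positional command-line arguments."""
    parser = argparse.ArgumentParser(
        description='Simulated thermostat that publishes dweets')
    parser.add_argument('messages', type=int,
                        help='total number of messages to simulate')
    parser.add_argument('delay', type=float,
                        help='delay in seconds between sending dweets')
    return parser.parse_args(argv)


# same values as the sample command: 1000 messages, 1 second apart
args = parse_args(['1000', '1'])
print(args.messages, args.delay)
```

With `argparse`, a missing or non-numeric argument produces a usage message instead of an `IndexError` or `ValueError` traceback.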

<hr style="height:2px; border:none; color:#000; background-color:#000;">

### Sending Dweets 
* **`dweet.io`** is a public service, so **any app** can **publish** or **subscribe** to messages
* **Do not need to register** to use the service
* When publishing, specify a **unique name for your device** 
    * We used `'temperature-simulator-deitel-python'` 
* On first call to **`dweepy`’s `dweet_for` function** to send a dweet, `dweet.io` **creates the device name**
    * Function receives **device name** and a **dictionary** representing the **message to send**
    * Sends dictionary in **JSON** format
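
To see what the JSON form of such a dictionary looks like, the sketch below uses the standard library's `json` module. It does not call `dweepy` or contact `dweet.io`, and it makes no claim about how `dweepy` serializes messages internally.

```python
import json

thermostat = {'Location': 'Boston, MA, USA',
              'Temperature': 20,
              'LowTempWarning': False,
              'HighTempWarning': False}

payload = json.dumps(thermostat)  # Python dictionary -> JSON text
print(payload)

# decoding the JSON text recovers an equal dictionary
decoded = json.loads(payload)
print(decoded == thermostat)
```

Note that Python's `False` becomes JSON's lowercase `false` in the encoded text.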

<hr style="height:2px; border:none; color:#000; background-color:#000;">

### Script That Sends Dweets 
 
```python
# simulator.py
"""A connected thermostat simulator that publishes JSON
messages to dweet.io"""
import dweepy
import sys
import time
import random

MIN_CELSIUS_TEMP = -25  
MAX_CELSIUS_TEMP = 45 
MAX_TEMP_CHANGE = 2

# get the number of messages to simulate and delay between them
NUMBER_OF_MESSAGES = int(sys.argv[1]) 
MESSAGE_DELAY = int(sys.argv[2])

dweeter = 'temperature-simulator-deitel-python'  # provide a unique name
thermostat = {'Location': 'Boston, MA, USA',
              'Temperature': 20, 
              'LowTempWarning': False,
              'HighTempWarning': False}

```

	
```python
print('Temperature simulator starting')

for message in range(NUMBER_OF_MESSAGES):
    # generate a random number in the range -MAX_TEMP_CHANGE 
    # through MAX_TEMP_CHANGE and add it to the current temperature
    thermostat['Temperature'] += random.randrange(
        -MAX_TEMP_CHANGE, MAX_TEMP_CHANGE + 1)
    
    # ensure that the temperature stays within range
    if thermostat['Temperature'] < MIN_CELSIUS_TEMP:
        thermostat['Temperature'] = MIN_CELSIUS_TEMP
    
    if thermostat['Temperature'] > MAX_CELSIUS_TEMP:
        thermostat['Temperature'] = MAX_CELSIUS_TEMP
    
    # check for low temperature warning
    if thermostat['Temperature'] < 3:
        thermostat['LowTempWarning'] = True
    else:
        thermostat['LowTempWarning'] = False

    # check for high temperature warning
    if thermostat['Temperature'] > 35:
        thermostat['HighTempWarning'] = True
    else:
        thermostat['HighTempWarning'] = False

    # send the dweet to dweet.io via dweepy
    print(f'Messages sent: {message + 1}\r', end='')
    dweepy.dweet_for(dweeter, thermostat)
    time.sleep(MESSAGE_DELAY)

print('Temperature simulator finished')
```
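
For testing without sending any dweets, the per-message logic in the loop could be factored into a function. This is one possible refactoring, not part of `simulator.py`: the names `next_reading`, `LOW_TEMP_LIMIT` and `HIGH_TEMP_LIMIT` are hypothetical, and the seeded `random.Random` is only there to make runs repeatable.

```python
import random

MIN_CELSIUS_TEMP = -25
MAX_CELSIUS_TEMP = 45
MAX_TEMP_CHANGE = 2
LOW_TEMP_LIMIT = 3    # hypothetical name for the script's literal 3
HIGH_TEMP_LIMIT = 35  # hypothetical name for the script's literal 35


def next_reading(thermostat, rng=random):
    """Return a new dict with an updated temperature and warning flags."""
    temperature = thermostat['Temperature'] + rng.randrange(
        -MAX_TEMP_CHANGE, MAX_TEMP_CHANGE + 1)

    # clamp the temperature to the allowed range
    temperature = max(MIN_CELSIUS_TEMP, min(MAX_CELSIUS_TEMP, temperature))

    return {**thermostat,
            'Temperature': temperature,
            'LowTempWarning': temperature < LOW_TEMP_LIMIT,
            'HighTempWarning': temperature > HIGH_TEMP_LIMIT}


reading = {'Location': 'Boston, MA, USA', 'Temperature': 20,
           'LowTempWarning': False, 'HighTempWarning': False}
rng = random.Random(42)  # seeded generator for repeatable runs

for _ in range(100):
    reading = next_reading(reading, rng)

print(reading)
```

Because the function returns a new dictionary and performs no network calls, the range and warning rules can be checked with ordinary assertions.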

<hr style="height:2px; border:none; color:#000; background-color:#000;">

## 16.8.4 Creating the Dashboard with Freeboard.io 
**Lecture Note: Show our dashboard** 
> https://freeboard.io/board/ILyDgM

Dashboard contains:
* A **Gauge** widget showing the **current temperature**
* A **Text** widget to show the  **current temperature** in **Fahrenheit** 
* Two **Indicator Light** widgets for low and high temperature warnings
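
The dweets carry the temperature in Celsius, so the Fahrenheit display needs a conversion. The formula is shown here as a small Python function for illustration only. The function name is hypothetical, and this is not the expression configured in the dashboard itself.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9 / 5 + 32


print(celsius_to_fahrenheit(20))  # 68.0
print(celsius_to_fahrenheit(35))  # 95.0
print(celsius_to_fahrenheit(0))   # 32.0
```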


<!--
<img src="./ch16images/thermostat1.png" alt="Streaming temperature sensor visualization" width=150 style="float:left; padding:10px"/> 
<img src="./ch16images/thermostat2.png" alt="Streaming temperature sensor visualization" width=150 style="float:left; padding:10px"/>

<img src="./ch16images/thermostat3.png" alt="Streaming temperature sensor visualization" width=150 style="float:left; padding:10px"/>
-->

<hr style="height:2px; border:none; color:#000; background-color:#000;">

## 16.8.5 Creating a Python PubNub Subscriber (1 of 2)
* **`pubnub` module** for performing **pub/sub operations**
* [**Sample streams** for you to experiment with—four real-time streams and **three simulated streams**](https://www.pubnub.com/demos/real-time-data-streaming/?show=demo)
	* **Twitter** live feed
	* **Wikipedia Changes**
	* **Game State Sync**—**Simulated**: multiplayer game data
	* **Sensor Network**—**Simulated sensor data**: radiation, humidity, temperature and ambient light
	* **Market Orders**: **Simulated stock orders** for five fake companies

<hr style="height:2px; border:none; color:#000; background-color:#000;">

## 16.8.5 Creating a Python PubNub Subscriber (2 of 2)
* We'll **subscribe** to their **Market Orders stream**, then **visualize** changing stock prices in a **barplot**
    * Also can [publish messages to streams](https://www.pubnub.com/docs/python/pubnub-python-sdk)
* `pip install "pubnub>=4.1.2"`
* **`stocklistener.py`** subscribes to the stream and visualizes the stock prices

<hr style="height:2px; border:none; color:#000; background-color:#000;">

### Message Format
* **Simulated Market Orders** stream returns **JSON objects** containing five key–value pairs with the keys **`'bid_price'`**, **`'order_quantity'`**, **`'symbol'`**, **`'timestamp'`** and **`'trade_type'`**
    * We’ll use `'bid_price'` and `'symbol'`
* **PubNub client** returns **JSON objects** as Python **dictionaries**
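
A minimal sketch of working with one such message, using only the standard library. The sample values below are made up for illustration (only the five key names come from the stream description above), and the `last_prices` dictionary is a hypothetical stand-in for the `DataFrame` used later.

```python
import json

# hypothetical sample message; the values are invented for illustration
raw = ('{"bid_price": 101.25, "order_quantity": 500, "symbol": "Apple", '
       '"timestamp": 1700000000, "trade_type": "limit"}')

order = json.loads(raw)  # JSON object -> Python dictionary

# keep only the most recent bid price for each symbol
last_prices = {}
last_prices[order['symbol']] = order['bid_price']

print(sorted(order))
print(last_prices)
```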

<hr style="height:2px; border:none; color:#000; background-color:#000;">

### Importing the Libraries
```python
# stocklistener.py
"""Visualizing a PubNub live stream."""
from matplotlib import animation
import matplotlib.pyplot as plt
import pandas as pd
import random 
import seaborn as sns
import sys

from pubnub.callbacks import SubscribeCallback
from pubnub.enums import PNStatusCategory
from pubnub.pnconfiguration import PNConfiguration
from pubnub.pubnub import PubNub

```

<hr style="height:2px; border:none; color:#000; background-color:#000;">

### Initialize List and DataFrame Used for Storing Company Names and Prices
* Pandas **`DataFrame` `companies_df`** stores each company’s **last price**
 
```python
companies = ['Apple', 'Bespin Gas', 'Elerium', 'Google', 'Linen Cloth']

# DataFrame to store last stock prices 
companies_df = pd.DataFrame(
    {'company': companies, 'price' : [0, 0, 0, 0, 0]})
```

<hr style="height:2px; border:none; color:#000; background-color:#000;">

### Class `SensorSubscriberCallback`
* A PubNub stream **listener** receives **status notifications** and **messages from the channel**
* Subclass of **`SubscribeCallback`** (module `pubnub.callbacks`)
* **Overridden method `status`** &mdash; called by PubNub client each time a **status notification arrives**
    * We check for notifications that we've **subscribed to** or **unsubscribed from** a channel
* **Overridden method `message`** &mdash; called when a **message arrives from the channel**
    * Stores new stock price

```python
class SensorSubscriberCallback(SubscribeCallback):
    """SensorSubscriberCallback receives messages from PubNub."""
    def __init__(self, df, limit=1000):
        """Create instance variables for tracking number of orders."""
        self.df = df  # DataFrame to store last stock prices
        self.order_count = 0
        self.MAX_ORDERS = limit  # 1000 by default
        super().__init__()  # call superclass's init
```

```python
    def status(self, pubnub, status):
        if status.category == PNStatusCategory.PNConnectedCategory:
            print('Connected to PubNub')
        elif status.category == PNStatusCategory.PNAcknowledgmentCategory:
            print('Disconnected from PubNub')
```

```python
    def message(self, pubnub, message):
        symbol = message.message['symbol']
        bid_price = message.message['bid_price']
        print(symbol, bid_price)
        self.df.at[companies.index(symbol), 'price'] = bid_price
        self.order_count += 1
        
        # if MAX_ORDERS is reached, unsubscribe from PubNub channel
        if self.order_count == self.MAX_ORDERS:
            pubnub.unsubscribe_all()
```

<hr style="height:2px; border:none; color:#000; background-color:#000;">

### Function `update` Visualizes the Stock Prices 

```python
def update(frame_number):
    """Configures bar plot contents for each animation frame."""
    plt.cla()  # clear old barplot
    axes = sns.barplot(
        data=companies_df, x='company', y='price', palette='cool')
    axes.set(xlabel='Company', ylabel='Price')  
    plt.tight_layout()
```

<hr style="height:2px; border:none; color:#000; background-color:#000;">

### Configuring the Application

```python
if __name__ == '__main__':
    sns.set_style('whitegrid')  # white background with gray grid lines
    figure = plt.figure('Stock Prices')  # Figure for animation
```

<hr style="height:2px; border:none; color:#000; background-color:#000;">

### Configuring the PubNub Client
* Specify the **PubNub subscription key**
    * **Used with the channel name** to **subscribe to the channel**
* The `SensorSubscriberCallback` object is passed to the **`PubNub` client’s `add_listener` method** to register it to **receive messages from the channel**

```python
    # set up pubnub-market-orders sensor stream key
    config = PNConfiguration()
    config.subscribe_key = 'sub-c-4377ab04-f100-11e3-bffd-02ee2ddab7fe'
    config.uuid = 'UUID_DeitelHeartbeatUnitTest' # new requirement in SDK 6.x

    # create PubNub client and register a SubscribeCallback
    pubnub = PubNub(config) 
    pubnub.add_listener(
        SensorSubscriberCallback(df=companies_df, 
            limit=int(sys.argv[1] if len(sys.argv) > 1 else 1000)))
```

<hr style="height:2px; border:none; color:#000; background-color:#000;">

### Subscribing to the Channel
* **Completes subscription process** by indicating that we wish to **receive messages** from **channel `'pubnub-market-orders'`**
* **`execute()`** tells client to **begin listening** for messages

```python
    # subscribe to pubnub-market-orders channel and begin streaming
    pubnub.subscribe().channels('pubnub-market-orders').execute()
```

<hr style="height:2px; border:none; color:#000; background-color:#000;">

### Configuring the FuncAnimation and Displaying the Window
* **Matplotlib’s `show` function** normally **blocks** a script from continuing until you close the `Figure`
    * Passing **`block=False`** would allow execution to continue
    * Here the **PubNub client** is already configured and subscribed, so the script calls `plt.show()` with its default blocking behavior
* For a **detailed intro to Matplotlib `FuncAnimation`**, see my [**Python Fundamentals LiveLessons videos**](https://learning.oreilly.com/videos/python-fundamentals/9780135917411/9780135917411-PFLL_Lesson06_17) (two videos) and [**Python for Programmers, Section 6.4**](https://learning.oreilly.com/library/view/python-for-programmers/9780135231364/ch06.xhtml#ch06lev1sec4)

```python
    # configure and start animation that calls function update
    stock_animation = animation.FuncAnimation(
        figure, update, repeat=False, interval=33)
    plt.show()  # keeps graph on screen until you dismiss its window
```

In [None]:
%matplotlib widget

In [None]:
# stocklistener.py
"""Visualizing a PubNub live stream."""
from matplotlib import animation
import matplotlib.pyplot as plt
import pandas as pd
import random 
import seaborn as sns
import sys
import uuid

from pubnub.callbacks import SubscribeCallback
from pubnub.enums import PNStatusCategory
from pubnub.pnconfiguration import PNConfiguration
from pubnub.pubnub import PubNub

companies = ['Apple', 'Bespin Gas', 'Elerium', 'Google', 'Linen Cloth']

# DataFrame to store last stock prices 
companies_df = pd.DataFrame(
    {'company': companies, 'price' : [0, 0, 0, 0, 0]})
 
class SensorSubscriberCallback(SubscribeCallback):
    """SensorSubscriberCallback receives messages from PubNub."""
    def __init__(self, df, limit=1000):
        """Create instance variables for tracking number of orders."""
        self.df = df  # DataFrame to store last stock prices
        self.order_count = 0
        self.MAX_ORDERS = limit  # 1000 by default
        super().__init__()  # call superclass's init

    def status(self, pubnub, status):
        if status.category == PNStatusCategory.PNConnectedCategory:
            print('Subscribed')
        elif status.category == PNStatusCategory.PNAcknowledgmentCategory:
            print('Unsubscribed')
 
    def message(self, pubnub, message):
        symbol = message.message['symbol']
        bid_price = message.message['bid_price']
        print(symbol, bid_price)
        self.df.at[companies.index(symbol), 'price'] = bid_price
        self.order_count += 1
        
        # if MAX_ORDERS is reached, unsubscribe from PubNub channel
        if self.order_count == self.MAX_ORDERS:
            pubnub.unsubscribe_all()
            
def update(frame_number):
    """Configures bar plot contents for each animation frame."""
    plt.cla()  # clear old barplot
    axes = sns.barplot(
        data=companies_df, x='company', y='price', palette='cool') 
    axes.set(xlabel='Company', ylabel='Price')  

#if __name__ == '__main__':
sns.set_style('whitegrid')  # white background with gray grid lines
figure = plt.figure('Stock Prices')  # Figure for animation

# set up pubnub-market-orders sensor stream key
config = PNConfiguration()
config.subscribe_key = 'sub-c-99084bc5-1844-4e1c-82ca-a01b18166ca8'
config.uuid = 'UUID_DeitelHeartbeatUnitTest' # new requirement in SDK 6.x

# create PubNub client and register a SubscribeCallback
pubnub = PubNub(config) 
pubnub.add_listener(
    SensorSubscriberCallback(df=companies_df, 
        limit=1000)) #int(sys.argv[1] if len(sys.argv) > 1 else 1000)))

# subscribe to pubnub-market-orders channel and begin streaming
pubnub.subscribe().channels('pubnub-market-orders').execute()

# configure and start animation that calls function update
stock_animation = animation.FuncAnimation(
    figure, update, frames=1000, repeat=False, interval=33)
plt.tight_layout()
plt.show()  # keeps graph on screen until you dismiss its window


#**************************************************************************
#* (C) Copyright 1992-2018 by Deitel & Associates, Inc. and               *
#* Pearson Education, Inc. All Rights Reserved.                           *
#*                                                                        *
#* DISCLAIMER: The authors and publisher of this book have used their     *
#* best efforts in preparing the book. These efforts include the          *
#* development, research, and testing of the theories and programs        *
#* to determine their effectiveness. The authors and publisher make       *
#* no warranty of any kind, expressed or implied, with regard to these    *
#* programs or to the documentation contained in these books. The authors *
#* and publisher shall not be liable in any event for incidental or       *
#* consequential damages in connection with, or arising out of, the       *
#* furnishing, performance, or use of these programs.                     *
#**************************************************************************    
    
    



<hr style="height:2px; border:none; color:#000; background-color:#000;">

# Resources
### Many Free Big-Data Sources
* Articles and sites with links to **hundreds of free big data sources**

| Big-data sources |
| :--- |
| [**“The Best Tools for Using Twitter as a Data Source”**](https://www.methodspace.com/best-tools-twitter-data-source) |
| [**“How to Use Wikipedia as a Data Source”**](https://www.thedataschool.co.uk/jeremy-kneebone/use-wikipedia-data-source/) |
| [**“Awesome-Public-Datasets”**](https://github.com/caesar0301/awesome-public-datasets) |
| [**“AWS Public Datasets”**](https://aws.amazon.com/public-datasets/) |
| [**“Big Data And AI: 30 Amazing (And Free) Public Data Sources For 2018,”** by B. Marr](https://www.forbes.com/sites/bernardmarr/2018/02/26/big-data-and-ai-30-amazing-and-free-public-data-sources-for-2018/) |
| [**“Datasets for Data Mining and Data Science”**](http://www.kdnuggets.com/datasets/index.html) |
| [**“Exploring Open Data Sets”**](https://datascience.berkeley.edu/open-data-sets/) |
| [**“Free Big Data Sources”**](http://datamics.com/free-big-data-sources/) |
| [**_Hadoop Illuminated_, Chapter 16. Publicly Available Big Data Sets**](http://hadoopilluminated.com/hadoop_illuminated/Public_Bigdata_Sets.html) |
| [**“List of Public Data Sources Fit for Machine Learning”**](https://blog.bigml.com/list-of-public-data-sources-fit-for-machine-learning/) |
| [**“Open Data,”** Wikipedia](https://en.wikipedia.org/wiki/Open_data) |
| [**“Open Data 500 Companies”**](http://www.opendata500.com/us/list/) |
| [**“Other Interesting Resources/Big Data and Analytics Educational Resources and Research,”** Bernard Marr](http://computing.derby.ac.uk/bigdatares/?page_id=223) |
| [**“6 Amazing Sources of Practice Data Sets”**](https://www.jigsawacademy.com/6-amazing-sources-of-practice-data-sets/) |
| [**“20 Big Data Repositories You Should Check Out”** M. Krivanek](http://www.datasciencecentral.com/profiles/blogs/20-free-big-data-sources-everyone-should-check-out) |
| [**“70+ Websites to Get Large Data Repositories for Free”**](http://bigdata-madesimple.com/70-websites-to-get-large-data-repositories-for-free/) |
| [**“Ten Sources of Free Big Data on Internet,”** A. Brown](https://www.linkedin.com/pulse/ten-sources-free-big-data-internet-alan-brown) |
| [**“Top 20 Open Data Sources”**](https://www.linkedin.com/pulse/top-20-open-data-sources-zygimantas-jacikevicius) |
| [**“We’re Setting Data, Code and APIs Free,”** NASA](https://open.nasa.gov/open-data/) |
| [**“Where Can I Find Large Datasets Open to the Public?”** Quora](https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public) |

<hr style="height:2px; border:none; color:#000; background-color:#000;">

### Kaggle Competition Site 
* **No obvious optimal solutions** for many **machine learning** and **deep learning tasks**
* People’s **creativity is really the only limit**
* Companies and organizations **fund competitions** 
    * Encourage people worldwide to **develop better-performing solutions** for something that’s important to their business or organization
* Some companies offer **prize money** &mdash; Netflix once offered **\$1,000,000** 
    * [Netflix wanted to get a 10% or better improvement in their model for determining whether people will like a movie, based on how they rated previous ones](https://netflixprize.com/rules.html)
    * Used to help make better recommendations to members
* Even if you do not win, **Kaggle** is a great way to get experience working on problems of current interest

<hr style="height:2px; border:none; color:#000; background-color:#000;">

# More Info 
* See Lesson 16 in [**Python Fundamentals LiveLessons** here on O'Reilly Online Learning](https://learning.oreilly.com/videos/python-fundamentals/9780135917411)
* See Chapter 16 in [**Python for Programmers** on O'Reilly Online Learning](https://learning.oreilly.com/library/view/python-for-programmers/9780135231364/)
* See Chapter 17 in [**Intro Python for Computer Science and Data Science** on O'Reilly Online Learning](https://learning.oreilly.com/library/view/intro-to-python/9780135404799/)
* Interested in a print book? Check out:

| Python for Programmers<br>(640-page professional book) | Intro to Python for Computer<br>Science and Data Science<br>(880-page college textbook)
| :------ | :------
| <a href="https://amzn.to/2VvdnxE"><img alt="Python for Programmers cover" src="../images/PyFPCover.png" width="150" border="1"/></a> | <a href="https://amzn.to/2LiDCmt"><img alt="Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud" src="../images/IntroToPythonCover.png" width="159" border="1"></a>

>Please **do not** purchase both books&mdash;_Python for Programmers_ is a subset of _Intro to Python for Computer Science and Data Science_

&copy; 1992-2024 by Pearson Education, Inc. All Rights Reserved. The content in this notebook is based on the book [**Python for Programmers**](https://amzn.to/2VvdnxE).