In [None]:
# To run this notebook as a reveal.js presentation, run the following command in the notebook's folder:
# `jupyter-nbconvert --to slides 1-python-in-one-hour-or-so.ipynb --reveal-prefix=reveal.js --post serve`

# EMERGING TECHNOLOGIES
# CHALLENGES & OPPORTUNITIES

# CHAPTER 1 ► GARTNER'S HYPE CYCLE 2017

Reference: https://goo.gl/bstq8e

![img/hype-cycle-gartner.png](img/hype-cycle-gartner.png)

## Key takeaways:

* **Heavy R&D spending from Amazon, Apple, Baidu, Google, IBM, Microsoft, and Facebook is fueling a race for Deep Learning and Machine Learning patents today and will accelerate in the future**

* **According to Gartner, Artificial General Intelligence is expected to become pervasive during the next decade, becoming the foundation of AI as a Service**

# CHAPTER 2 ► WHAT IS BIG DATA?

## Everybody has their own opinion, so I will give mine!

## The more or less conventional definition, the 5 Vs:

* **VOLUME**

* **VELOCITY**

* **VARIETY**

* **VERACITY**

* **VALUE**

Some add **Variability** and **Visualization**... Why not?

## BIG is a very relative notion!

![img/big-is-relative.png](img/big-is-relative.png)
Source: https://www.wired.com/2011/01/jumbo-shrimps-why-mega-mammals-still-looked-puny-next-to-the-biggest-dinosaurs/

## What I cannot handle with my available toolbox is Big Data! `[Personal view]`

* **Have you tried to open a `csv` file containing 10 million rows in Excel?**

* **Have you tried to visualize 72 million measurements on Google Earth?**
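
An Excel worksheet is capped at 1,048,576 rows, so a 10-million-row file cannot even be loaded in full. One way around such a limit is to stream through the file instead of opening it all at once. The sketch below is only an illustration using Python's standard `csv` module; the column name `value` and the sample rows are hypothetical.

```python
import csv
import io

def summarize_csv(lines, column):
    """Stream through CSV rows one at a time and return (row_count, mean)."""
    reader = csv.DictReader(lines)
    count, total = 0, 0.0
    for row in reader:          # only one row is held in memory at a time
        count += 1
        total += float(row[column])
    mean = total / count if count else float("nan")
    return count, mean

# Tiny in-memory stand-in for a large file (hypothetical data)
sample = io.StringIO("lat,lon,value\n37.4,140.5,0.12\n35.7,139.7,0.08\n")
n_rows, mean_value = summarize_csv(sample, "value")
print(n_rows, mean_value)
```

With a real file, the same function could be called on an open file handle, for example `open("measurements.csv", newline="")`, where the file name is again hypothetical.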

## The example of SAFECAST DATA VISUALIZATION 

![img/safecast-web.png](img/safecast-web.png)
https://blog.safecast.org/

## Mapping 72 million measurements at once

* **ATTEMPT 1 ► FAILURE! - Not enough RAM and an inappropriate tool ► THIS IS BIG DATA FOR ME!**

* **ATTEMPT 2 ► SUCCESS! - Bought 16 GB of RAM and used the recent Datashader Python package (https://datashader.readthedocs.io)**

## With the right toolbox, it takes 3 s to render 72 million points [MacBook Pro with 16 GB of RAM]

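```python
import datashader as ds
import datashader.transfer_functions as tf
from datashader.colors import inferno  # assumed source of the colormap

def draw_map(df, plot_width, plot_height, colors, agg_func, interp, background_col):
    # Aggregate the points onto a fixed-size pixel grid, then colour-map the result
    cvs = ds.Canvas(plot_width=plot_width, plot_height=plot_height)
    agg = cvs.points(df, 'lon', 'lat', agg_func('value'))
    img = tf.shade(agg, cmap=colors, how=interp)
    return tf.set_background(img, color=background_col)

# df: DataFrame of measurements with 'lon', 'lat' and 'value' columns
img = draw_map(df, plot_width, plot_height, inferno, ds.count, 'log', 'black')
```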

![img/safecast-map.png](img/safecast-map.png)
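
Why can this be fast? Roughly speaking, the points are first aggregated into a fixed grid of pixels, so the image to colour has the size of the canvas rather than the size of the dataset. The NumPy sketch below illustrates that idea only; it is not Datashader's actual implementation, and the points are synthetic random data rather than Safecast measurements.

```python
import numpy as np

def rasterize_counts(lon, lat, width, height):
    """Bin points into a height x width grid of counts, then log-scale them."""
    counts, _, _ = np.histogram2d(
        lat, lon,
        bins=(height, width),
        range=[[-90.0, 90.0], [-180.0, 180.0]],
    )
    # log1p compresses the dynamic range, loosely similar to 'log' shading
    return counts, np.log1p(counts)

rng = np.random.default_rng(0)
lon = rng.uniform(-180.0, 180.0, size=1_000_000)   # synthetic longitudes
lat = rng.uniform(-90.0, 90.0, size=1_000_000)     # synthetic latitudes

counts, shaded = rasterize_counts(lon, lat, width=800, height=400)
print(counts.shape, int(counts.sum()))
```

Whatever the number of input points, the array handed to the colour-mapping step stays at 400 x 800 cells.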

# CHAPTER 3 ► BIG DATA INFRASTRUCTURE & FRAMEWORKS

## Batch vs. stream vs. hybrid processing frameworks

* **Batch-only processing: Apache Hadoop**

* **Stream-only frameworks: Apache Storm, Apache Samza**

* **Hybrid frameworks: Apache Spark, Apache Flink**

Reference: https://www.digitalocean.com/community/tutorials/hadoop-storm-samza-spark-and-flink-big-data-frameworks-compared
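
The difference between the two processing styles can be shown with a toy example in plain Python. This is only a sketch of the idea, not how Hadoop, Storm or Spark are implemented, and the sensor readings are hypothetical.

```python
def batch_mean(values):
    """Batch style: the whole dataset is available before processing starts."""
    return sum(values) / len(values)

class StreamingMean:
    """Stream style: update the result incrementally as each record arrives."""
    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean

readings = [0.10, 0.12, 0.08, 0.11]   # hypothetical sensor readings

stream = StreamingMean()
for r in readings:
    latest = stream.update(r)          # an up-to-date answer after every record

print(batch_mean(readings), latest)
```

Both approaches end with the same mean, but the streaming version never needs the full list in memory and has an answer available after every record.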

Note: There is a Master in HPC at ICTP: http://www.mhpc.it/

## Cloud computing in Africa - ISOC's update?

## Google cloud regions

![img/google-cloud-regions.png](img/google-cloud-regions.png)

## Amazon Web Services cloud regions

![img/amazon-regions.png](img/amazon-regions.png)

## Microsoft Azure regions

![img/microsoft-azure.png](img/microsoft-azure.png)

## ... 

## Is Africa a forgotten continent?

## If that's the case, it might change very soon!

## What about cloud computing and IoT?

**Every cloud computing platform targets this market and proposes a dedicated offering** - for instance: https://goo.gl/zAR3KT

# CHAPTER 4 ► WHEN AND WHY DO WE NEED SUCH A VOLUME OF DATA?

## The drill-down & roll-up strategy

* **In many situations, this is the relevant approach**

* **No need to look at all your data at once**

* **The example of decision making during nuclear emergencies**
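
A minimal sketch of the strategy in plain Python: start from a coarse summary (roll-up), then look at the details of one group only (drill-down). The records, country codes, station names and values below are all hypothetical.

```python
from collections import defaultdict

# Hypothetical measurements: (country, station, dose rate)
records = [
    ("JP", "station-1", 0.30),
    ("JP", "station-2", 0.50),
    ("JP", "station-3", 0.10),
    ("FR", "station-4", 0.10),
]

def roll_up(rows, key_index):
    """Aggregate rows to the mean value per key (coarser level)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[2])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

def drill_down(rows, country):
    """Keep only the rows of one country to inspect them at a finer level."""
    return [row for row in rows if row[0] == country]

by_country = roll_up(records, key_index=0)              # overview first
jp_by_station = roll_up(drill_down(records, "JP"), 1)   # then zoom in
print(by_country)
print(jp_by_station)
```

At no point does the analyst need to look at every record at once: the overview points to the group worth inspecting.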


## BUT current technology trends are changing the rules ...

## The case of Deep Learning and Artificial Neural Networks (ANN)

## A bit of history

* **The idea goes all the way back to the 1940s**

* **Kohonen's self-organizing map in the early 1980s**

* **In 1986, Rumelhart, Hinton and Williams showed experimentally that backpropagation can learn useful internal representations in hidden layers**

* **During the 2000s, backpropagation fell out of favour**

* **but returned in the 2010s, benefitting from cheap, powerful GPU-based computing systems.**

* **In 2014, backpropagation was used to train a deep neural network for speech recognition.**

## Refs: 

* **https://en.wikipedia.org/wiki/Backpropagation**
* **https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/History/history1.html**

## Both computing techniques and big data have changed the game

## Why did GPUs change the game?

* **GPUs are all about matrix operations, linear algebra and parallelization**

* **This perfectly suits the computational needs of ANNs**
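
To see why, note that the forward pass of a fully connected layer is essentially one matrix multiplication. The NumPy sketch below runs on the CPU and uses arbitrary layer sizes and random weights chosen purely for illustration. It compares the matrix form with the same computation written as an explicit loop for a single neuron.

```python
import numpy as np

rng = np.random.default_rng(0)

batch_size, n_inputs, n_hidden = 64, 100, 32   # arbitrary sizes for illustration
X = rng.normal(size=(batch_size, n_inputs))     # one row per input sample
W = rng.normal(size=(n_inputs, n_hidden))       # layer weights
b = np.zeros(n_hidden)                          # layer biases

def dense_relu(X, W, b):
    """Forward pass of one fully connected layer with a ReLU activation."""
    return np.maximum(X @ W + b, 0.0)

H = dense_relu(X, W, b)

# Same computation for a single sample and a single neuron, written as a loop
i, j = 3, 5
z = sum(X[i, k] * W[k, j] for k in range(n_inputs)) + b[j]
print(H.shape, np.isclose(H[i, j], max(z, 0.0)))
```

Every entry of `H` is an independent dot product, which is exactly the kind of work that can be spread over thousands of GPU cores.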

## Why do Linear Algebra and Vectorization matter?

In [9]:
import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a)

[1 2 3 4 5]


In [13]:
import time

a = np.random.rand(1000000)
b = np.random.rand(1000000)

tic = time.time()
c = np.dot(a, b)
toc = time.time()

print("Vectorized version: " + str(1000*(toc - tic)) + "ms")

c = 0
tic = time.time()
for i in range(1000000):
    c += a[i]*b[i]
    
toc = time.time()

print("For loop: " + str(1000*(toc - tic)) + "ms")

Vectorized version: 1.3108253479003906ms
For loop: 644.2921161651611ms


## Technological stack: an example

## For 0.9 USD/hour you can get, for instance:
 
* **4 CPU cores**
* **64 GB of RAM**
* **1 NVIDIA GPU**
* **Ubuntu + cuDNN + Python + ...**
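
To put the hourly rate in perspective, here is a back-of-the-envelope calculation. The usage scenarios are hypothetical and only the 0.9 USD/hour rate comes from the slide above.

```python
HOURLY_RATE_USD = 0.9   # rate quoted above

def rental_cost(hours, hourly_rate=HOURLY_RATE_USD):
    """Cost in USD of renting the machine for a number of hours."""
    return hours * hourly_rate

# Hypothetical usage scenarios, for illustration only
scenarios = {
    "one working day (8 h)": 8,
    "a weekend run (48 h)": 48,
    "a full month (720 h)": 720,
}
for label, hours in scenarios.items():
    print(f"{label}: {rental_cost(hours):.2f} USD")
```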

## This will cover all you need unless you want to compete with Google or Baidu

# Use cases of IoT | big data | Deep learning

## Actually only limited by our imagination ...

## Some examples:

* **Landslides: ref...**


* **How a Japanese cucumber farmer is using deep learning and TensorFlow:**
    * https://goo.gl/YPZL52


* **Traffic light control:**
    * https://en.wikipedia.org/wiki/Traffic_light_control_and_coordination
    * Write `traffic light control deep learning` in your search engine

# Want to harness IoT and AI?