# Technologies for advanced programming
[Salvo Nicotra](https://about.me/snicotra)

# Syllabus

General knowledge of technologies useful to build end-to-end solutions to analyse, manage, store, process and present data acquired in real time.

Using simplified, infrastructure-independent software deployment systems (containers) and a microservices orchestration tool (Kubernetes), the course presents cutting-edge technologies for data ingestion, pipelines, big data processing, and visualization.

The course follows an agile, multidisciplinary approach: the topics, technologies, and environments discussed will be applied to real case examples.

# Demo

# Topics

## Big Data

Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze

[McKinsey](https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/big-data-the-next-frontier-for-innovation)

![](../images/Big_Data.png)

<center>
<iframe width="560" height="315" data-src="https://www.youtube.com/embed/8pHzROP1D-w?start=113"></iframe>
</center>

## Digital Marketing

Digital marketing is a form of direct marketing which links consumers with sellers electronically using interactive technologies like emails, websites, online forums and newsgroups, interactive television, mobile communications etcetera (Kotler and Armstrong, 2009)

![](../images/digital-marketing.jpg)

<center>
  <iframe width="560" height="315" data-src="https://www.youtube.com/embed/8WVoJ6JNLO8" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>  
</center>

## Social Impact of Big Data

Big data applications in areas such as lifestyle, disaster relief, energy and sustainability, and critical infrastructure show promise for making a societal impact through the use of analytics.

[Big Data & Analytics for Societal Impact](https://link.springer.com/article/10.1007/s10796-018-9846-7)

![](../images/social-impact.jpg)

<center>
   <iframe width="560" height="315" data-src="https://www.youtube.com/embed/_2u_eHHzRto" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> 
</center>

## Literate Programming

> Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

Donald Knuth (1984)

![](../images/knuth-vs-mcilroy.png)

## Microservices

In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies.

![](../images/microservices.png)
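
As a rough illustration of "small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API", the sketch below starts one tiny service and calls it over HTTP using only the Python standard library. The `PriceHandler` service, its JSON payload, and the fixed price are hypothetical names and values invented for this example. For brevity the service runs in a background thread rather than in its own process or container.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class PriceHandler(BaseHTTPRequestHandler):
    """Hypothetical 'price' service exposing a single HTTP resource."""

    def do_GET(self):
        # The requested path is used as the item name; the price is a fixed dummy value.
        body = json.dumps({"item": self.path.strip("/"), "price": 9.99}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def start_service() -> HTTPServer:
    # Port 0 lets the operating system pick a free port.
    server = HTTPServer(("127.0.0.1", 0), PriceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_price(port: int, item: str) -> dict:
    # A second service (or any client) only needs the HTTP API, not the code.
    connection = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        connection.request("GET", f"/{item}")
        return json.loads(connection.getresponse().read())
    finally:
        connection.close()


server = start_service()
result = fetch_price(server.server_port, "book")
server.shutdown()
server.server_close()
print(result)
```

The caller depends only on the HTTP contract, so the service behind it could be rewritten in another language or redeployed independently without changing the client.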

https://microservices-demo.github.io/

## Stream Processing

Stream processing is the exact opposite of batch processing: you do not collect data until a certain quorum or timeout is reached before triggering the process. As soon as a data event is received, the program processes it and produces the output. It is event processing, so the word "real-time" is somewhat redundant. Still, many systems use "real-time" to describe themselves as low-latency systems. Of course, nobody can guarantee that the actual processing will be low latency, because that depends on what the process is trying to do (the application logic). For example, the application logic could be a delay loop that outputs each received event exactly 1 minute later, and no platform in the world can make that 1 minute shorter than 60 seconds. Low latency refers to the overhead that the system adds to the application, outside of the processing the application itself does.

https://www.datatorrent.com/blog/real-time-event-stream-processing-what-are-your-choices/
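
The contrast between the two styles can be sketched in a few lines of Python. This is a minimal illustration, not tied to any streaming platform: `process_stream`, `process_batches`, `double`, and the sample events are all hypothetical names and data chosen for the example.

```python
from typing import Callable, Iterable, Iterator, List


def process_stream(events: Iterable[int], handle: Callable[[int], int]) -> Iterator[int]:
    """Stream style: produce an output as soon as each event arrives."""
    for event in events:
        yield handle(event)


def process_batches(
    events: Iterable[int], handle: Callable[[int], int], batch_size: int
) -> Iterator[List[int]]:
    """Batch style: wait until batch_size events are collected, then process them together."""
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield [handle(e) for e in batch]
            batch = []
    if batch:  # flush the incomplete last batch
        yield [handle(e) for e in batch]


def double(value: int) -> int:
    """Stand-in for the application logic."""
    return 2 * value


stream_out = list(process_stream([1, 2, 3], double))
batch_out = list(process_batches([1, 2, 3], double, batch_size=2))
print(stream_out)  # [2, 4, 6]
print(batch_out)   # [[2, 4], [6]]
```

In the stream version the first result is available after the first event. In the batch version nothing is emitted until two events have been collected, which is the waiting time that a low-latency system tries to avoid adding.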

![](../images/streams.png)

## Machine Learning

> Machine learning is the science (and art) of programming computers so they can learn from data

Aurélien Géron in Hands-on Machine Learning with Scikit-Learn and TensorFlow.

![](../images/machine-learning.jpg)

## Cloud Computing

> Cloud computing is a style of computing in which scalable and elastic IT-enabled capabilities are delivered as a service using internet technologies.

Gartner Glossary

![](../images/cloudcomputing.jpg)

# Technologies

## Containers

> A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.

![](../images/docker.png)

## Microservices Orchestration

> Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.

![](../images/kubernetes.png)

## Data Ingestion

> Data ingestion means taking data coming from multiple sources and putting it somewhere it can be accessed. It is the beginning of a data pipeline, where data is obtained or imported for immediate use.

> Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.

![](../images/flume.png)
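
The idea of collecting data from multiple sources and moving it to one accessible place can be sketched as a source, channel, and sink chain. The code below is a toy in-memory model in plain Python, not the Flume API or its configuration format. `run_source`, `run_sink`, the source names, and the sample log lines are hypothetical.

```python
import queue
from typing import Iterable, List


def run_source(lines: Iterable[str], name: str, channel: queue.Queue) -> None:
    """Source: wrap each incoming line in an event and put it on the channel."""
    for line in lines:
        channel.put({"source": name, "body": line})


def run_sink(channel: queue.Queue, store: List[dict]) -> None:
    """Sink: drain the channel into a store where the data can be accessed."""
    while not channel.empty():
        store.append(channel.get())


channel: queue.Queue = queue.Queue()  # buffer between sources and sink
store: List[dict] = []                # stand-in for a file system or database

run_source(["GET /index.html"], "web-log", channel)
run_source(["disk usage 81%"], "system-log", channel)
run_sink(channel, store)

for event in store:
    print(event["source"], "->", event["body"])
```

The channel decouples producers from the consumer: sources can keep writing while the sink is slow, which is where a real ingestion service adds its reliability and recovery mechanisms.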

## Data Streaming

> Kafka® is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.

![](../images/kafka.jpg)

## Data Processing

> Apache Spark™ is a unified analytics engine for large-scale data processing.

![](../images/spark-logo.png)

## Data Indexing

> Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.

![](../images/elastic-search.jpg)

## Data Visualization

> Power BI is a business analytics service by Microsoft. It aims to provide interactive visualizations and business intelligence capabilities with an interface simple enough for end users to create their own reports and dashboards

![](../images/powerbi.png)

## Notebooks

> Project Jupyter exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages.

![](../images/jupyter.png)

# Course detail

Lessons: Tue/Thu 15-18

Contacts

Office hours: TBD

Email: salvatore.nicotra1@unict.it

Telegram: [@tap](https://t.me/joinchat/DPCishPgqXBxeWUbrUKyCg)

[GitHub repository](https://github.com/salvo-nicotra/tap)