- Technological progress
- storage capacity
- communication bandwidth
- computing power
- Reduction of ICT costs
- Digital Universe
- Integration of digital technologies in every human activity
- Scientific research (produces a lot of data)
- Exponential growth of data
- Data can be either structured (e.g., database records) or unstructured (e.g., textual data)
- The analysis of large datasets arises in:
- Retailing: product improvement, recommendation systems
- Banking/Finance: fraud detection...
- Telecommunications: user profiling
- Science: validation methods
- Medicine: diagnosis/therapy
- Social studies: IoT
- Volume
- the size of the data poses several computational challenges and requires a data-centric perspective
- Velocity
- the data arrive at such a high rate that they cannot be stored and processed offline, but must be processed in streaming (see the sketch after this list)
- Variety
- large datasets often come unstructured and may relate to very different scenarios
- Veracity
- large datasets coming from real-world applications are likely to contain noisy, uncertain data
- All points above require a paradigm shift with respect to traditional computing
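
To make the velocity point concrete, here is a minimal, hypothetical Python sketch (not course material): a single-pass streaming computation inspects each item once and then discards it, so memory stays constant no matter how long the stream is.

```python
# Minimal sketch of streaming (one-pass) processing, assuming the stream
# is any Python iterable; names and data here are purely illustrative.
def running_mean(stream):
    """Consume items one at a time, keeping only O(1) state."""
    count, total = 0, 0.0
    for x in stream:          # each item is seen once, then discarded
        count += 1
        total += x
        yield total / count   # current estimate after each arrival

# Usage: the stream could be arbitrarily long; memory use stays constant.
for estimate in running_mean(iter([3.0, 1.0, 4.0, 1.0, 5.0])):
    print(estimate)
```
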
- Novel computing/programming frameworks for big data processing: theory and practice
- Spark
- A sample of key primitives for data analysis
- Rigorous setting (being able to analytically predict what is going to happen)
- Algorithmic solutions with focus on large inputs
- Computational Frameworks: MapReduce, Apache Spark (see the word-count sketch after this list)
- Clustering primitives (Professor's focus)
- Graph analysis primitives
- Association analysis primitives (Data mining)
- Data stream processing
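
As a small foretaste of the MapReduce/Spark paradigm listed above, here is a minimal word-count sketch in PySpark. It is an assumption-laden illustration, not the course's reference solution: the input file `input.txt` is hypothetical, and the assignments may use a different setup.

```python
# Word count in the MapReduce style, expressed with Spark RDD operations.
from pyspark import SparkContext

sc = SparkContext(appName="WordCount")
counts = (sc.textFile("input.txt")                 # one RDD element per line
            .flatMap(lambda line: line.split())    # map: line -> words
            .map(lambda word: (word, 1))           # map: word -> (word, 1)
            .reduceByKey(lambda a, b: a + b))      # reduce: sum counts per word
print(counts.collect())
sc.stop()
```

Note the structure: the `map`/`flatMap` steps transform records independently (and hence in parallel), while `reduceByKey` aggregates all pairs sharing a key, which is exactly the map and reduce phases of the MapReduce model.
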
- Written exam (26 points)
- Homeworks (6+1 points)
- groups of at most 3-4 students
- 4 assignments, one every 2-3 weeks
- Use of Apache Spark on individual PCs (assignments 1-3) and on CloudVeneto (assignment 4)
- Moodle: forum, evaluation of homeworks and of written exams
- Uniweb: written exam lists, official final grades
- Course website: http://www.dei.unipd.it/~capri/BDC/