## Exercise 04.01: Interpretable Machine Learning & Data Mining


Part 1: Learn about Interpretable Decision Sets, and the pyIDS implementation
* pyIDS: https://github.com/jirifilip/pyIDS
* Further details about implementation: http://ceur-ws.org/Vol-2438/paper8.pdf
* Original paper: https://cs.stanford.edu/people/jure/pubs/interpretable-kdd16.pdf

Part 2: Learn about subgroup discovery and the pysubgroup implementation
* pysubgroup: https://github.com/flemmerich/pysubgroup
* Further details about implementation: https://link.springer.com/chapter/10.1007/978-3-030-10997-4_46

Part 3: Apply pyIDS and pysubgroup
* Use the heart disease dataset: https://www.kaggle.com/ronitf/heart-disease-uci
* Apply preprocessing on the data as needed
* Apply pyIDS and generate a model for the class attribute of the heart disease dataset. Visualize the model/print the "final rules"
* Apply pysubgroup with a suitable quality function (e.g., the ChiSquaredQF). For subgroup discovery, you might need to discretize numeric attributes and set a minimal support threshold (e.g., 5% of the instances). The result is a list of the top-k (k=10, k=20) subgroups
* Compare the results of pyIDS and pysubgroup. Which similarities and differences do you observe? (Write a short text about this, max. half a DIN A4 page)
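
The discretization and minimal-support steps mentioned above can be sketched without pysubgroup. The following is only an illustration under stated assumptions: the data frame is a made-up toy stand-in (the column names `age`, `chol` and `target` are assumed to match the Kaggle CSV), the quality function is a hand-written chi-squared statistic on the 2x2 contingency table and not pysubgroup's `ChiSquaredQF`, and only subgroups with a single condition are enumerated.

```python
import pandas as pd

# Toy stand-in for the heart disease data. The column names are assumed to
# follow the Kaggle CSV (age, chol, target); the values are made up.
df = pd.DataFrame({
    "age":    [29, 35, 41, 44, 48, 52, 55, 58, 60, 63, 67, 71],
    "chol":   [180, 210, 199, 250, 230, 270, 300, 245, 310, 260, 280, 320],
    "target": [0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1],
})

# Discretize numeric attributes into three equal-frequency bins.
for col in ["age", "chol"]:
    df[col + "_bin"] = pd.qcut(df[col], q=3, labels=["low", "mid", "high"])


def chi_squared_quality(in_subgroup, positive):
    """Chi-squared statistic of the 2x2 table (in subgroup?) x (positive class?)."""
    a = int((in_subgroup & positive).sum())
    b = int((in_subgroup & ~positive).sum())
    c = int((~in_subgroup & positive).sum())
    d = int((~in_subgroup & ~positive).sum())
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # degenerate table, e.g. the subgroup covers everything
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom


MIN_SUPPORT = 0.05  # a subgroup must cover at least 5% of the instances
positive = df["target"] == 1
results = []
for col in ["age_bin", "chol_bin"]:
    for label in df[col].cat.categories:
        mask = df[col] == label
        if mask.mean() < MIN_SUPPORT:
            continue
        results.append((f"{col} == {label}", chi_squared_quality(mask, positive)))

# Keep the top-k subgroups by quality (k=3 here because the toy data is tiny).
top_k = sorted(results, key=lambda r: r[1], reverse=True)[:3]
for description, quality in top_k:
    print(f"{description}: {quality:.2f}")
```

Note that the chi-squared statistic is symmetric: a high value only says that the class distribution inside the subgroup deviates from the rest, not in which direction. This is worth keeping in mind when comparing the subgroups with the class-specific rules produced by pyIDS.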

In [1]:
import numpy as np
import pandas as pd

# ... add here

## Exercise 04.02: Reading/Discussion/Summary

Part 1: Reading:
* Read the following paper: Zanin et al. (2016) "Combining complex networks and data mining: why and how"
* The paper is available here: https://www.sciencedirect.com/science/article/pii/S037015731630062X   <br>
  (It is also available in the "files/exercises" course folder)

Part 2: Think about the following questions:
* What are Complex Networks?
* Why are they useful, in general?
* What are specific challenges in their application?
* What is their relationship to Data Mining, and how can Complex Networks and Data Mining be connected?
* How do exemplary classification approaches work?
* What are some further exemplary techniques to apply?

Part 3: Discussing, Summary
* Prepare answers for these questions for the practical session on November 30, 2021. You will first discuss these in groups, and then we will discuss them in the plenary meeting.
* After that, summarize your findings (and those of the group discussion) in a short report (max. half a DIN A4 page). For example, you could write 2-3 sentences to answer each question.

# Report 


## What are Complex Networks?

Complex networks are network representations of real-world systems. Beyond plain nodes and links, they can carry additional structure such as topological metrics, multiple layers, or different types of links. Their goal is to represent a system in a way that makes it easier to analyse and understand.

## Why are they useful, in general?

They are useful because some systems are too complex to be analysed easily with data mining algorithms or to be represented by simpler models. The additional features and structural elements of complex networks help with analysing such systems and make them more understandable to humans.

## What are specific challenges in their application?

One challenge in applying complex networks is finding the relevant features and data, so that the analysis focuses on the important parts of a system and needs less computational power. Another challenge is to choose the right metrics and to construct the network in such a way that the system is properly represented. This is even more relevant for functional networks, where the functions that define the links have to be chosen correctly as well.

## What is their relationship to Data Mining, and how can Complex Networks and Data Mining be connected?

Data mining and complex networks share the goal of making systems more understandable and easier to analyse. Complex networks are better suited to representing structures and relations between elements, whereas data mining algorithms focus more on finding patterns and relations between features. The two can be combined in several ways; for example, complex networks can be used for feature selection for data mining algorithms.

## How do exemplary classification approaches work?

One approach is multiple kernel learning, which shows which parts of a network are most relevant for a certain classification. Another approach is to build a complex network for each instance and to classify the resulting networks with standard data mining algorithms such as an SVM.
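
The second approach can be sketched as follows. This is a minimal illustration, not the method of the paper: it assumes that an instance consists of several signals, links two signals when the absolute Pearson correlation reaches a hypothetical threshold of 0.5, and summarises the resulting network with a few topological features. The function name `network_features` and the toy signals are made up for this example.

```python
import numpy as np


def network_features(signals, threshold=0.5):
    """Build a correlation network from one instance and summarise its topology.

    signals: 2D array, one row per node (signal), one column per time point.
    """
    corr = np.corrcoef(signals)          # pairwise Pearson correlation of the rows
    adj = np.abs(corr) >= threshold      # link two nodes if they correlate strongly
    np.fill_diagonal(adj, False)         # no self-loops
    n = adj.shape[0]
    n_edges = int(adj.sum()) // 2        # each undirected edge is counted twice
    return {
        "n_edges": n_edges,
        "density": n_edges / (n * (n - 1) / 2),
        "mean_degree": float(adj.sum(axis=1).mean()),
    }


# Hypothetical instance with three signals; the first two are nearly identical.
t = np.linspace(0, 2 * np.pi, 50, endpoint=False)
instance = np.vstack([np.sin(t), np.sin(t) + 0.01 * np.cos(3 * t), np.cos(t)])
features = network_features(instance)
print(features)
```

Computing such a feature dictionary for every instance yields an ordinary feature table, which can then be passed to a standard classifier such as an SVM.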

## What are some further exemplary techniques to apply?

One further technique is link prediction, where data mining algorithms are used to predict which links exist in a complex network, also taking its topology into account. Another technique is to represent features in a complex network, which can then be used to find the most important features; this addresses the problem of feature selection for data mining algorithms. A third technique is to transform large and heterogeneous amounts of data into a complex network structure. This simplifies the data and makes them more homogeneous, so that they can be used for data mining more efficiently.
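
To make the idea of link prediction concrete, here is a minimal sketch of a purely topology-based score, the number of common neighbours. This is a simple heuristic chosen for illustration, not the specific method discussed in the paper; in a data mining setting, such scores could serve as input features for a classifier that predicts links. The toy graph and the function name are made up.

```python
from itertools import combinations


def common_neighbor_scores(adjacency):
    """Rank all unconnected node pairs by their number of common neighbours."""
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # the link already exists, nothing to predict
        scores[(u, v)] = len(adjacency[u] & adjacency[v])
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical undirected graph, stored as node -> set of neighbours.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

ranked = common_neighbor_scores(graph)
for pair, score in ranked:
    print(pair, score)
```

The pair with the highest score is the most plausible missing link under this heuristic.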


## Uploading your solution
For uploading your solution, please upload two files:
* The Jupyter-Notebook file (.ipynb)
* A PDF (printout/file) of the Jupyter notebook file (.pdf)
* IMPORTANT: Please add your name (Example: MartinAtzmueller) as a suffix to the file names, e.g.:<br>
  KBS-Assignment4_MartinAtzmueller.ipynb, KBS-Assignment4_MartinAtzmueller.pdf