**HNet** stands for ***graphical Hypergeometric Networks***, a method that tests associations across variables for significance using statistical inference.

Real-world data often contain measurements with both continuous and discrete values. Despite the availability of many libraries, data sets with mixed data types require intensive pre-processing steps, and it remains a challenge to describe the relationships between variables. The data understanding phase is crucial to the data-mining process. However, without making any assumptions about the data, the search space is super-exponential in the number of variables. A thorough data understanding phase is therefore not common practice.

The **aim** is to determine a network with significant associations that can shed light on the complex relationships across variables. Input datasets can range from generic dataframes to nested data structures with lists, missing values and enumerations.
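To make the idea of testing an association for significance concrete, here is a minimal sketch of a one-sided hypergeometric test for two binary variables, written with the standard library only. It illustrates the kind of test the name refers to and is not HNet's actual implementation; the function name `hypergeom_sf` and the counts in the example are hypothetical.

```python
from math import comb

def hypergeom_sf(k, M, K, N):
    """Return P(X >= k) for a hypergeometric variable X.

    M: total number of samples
    K: samples for which variable A is true
    N: samples for which variable B is true
    k: samples for which both A and B are true
    """
    lower = max(k, 0)
    upper = min(K, N)
    total = comb(M, N)
    # Sum the probability mass of every overlap at least as large as the observed one.
    tail = sum(comb(K, i) * comb(M - K, N - i) for i in range(lower, upper + 1))
    return tail / total

# Hypothetical counts: 100 samples, 40 with A, 30 with B, 20 with both.
# Under independence we would expect about 12 samples with both.
p_value = hypergeom_sf(k=20, M=100, K=40, N=30)
print(f"P(overlap >= 20) = {p_value:.2e}")
```

A small p-value suggests that A and B co-occur more often than chance would explain. Because many variable pairs are tested, the raw p-values would normally be corrected for multiple testing before thresholding.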

* [API Documentation](https://erdogant.github.io/hnet/)
* [Article](https://arxiv.org/abs/2005.04679)
* [GitHub](https://github.com/erdogant/hnet)


In [None]:
!pip install hnet

In [None]:
import pandas as pd
import numpy as np
from hnet import hnet

# Import example dataset

Several example datasets can be downloaded by name using hnet:
* 'sprinkler'
* 'titanic'
* 'student'
* 'fifa'
* 'cancer'
* 'waterpump'
* 'retail'

In [None]:
hn = hnet()
df = hn.import_example(data='titanic')

# Remove variables that are known not to contribute to the model.
del df['PassengerId']
del df['Name']
df.head()

In [None]:
# Initialize model with default parameters
hn = hnet()

===============================================================================

## Import data from url

***HNet*** allows direct downloads from the internet using a url-link. As an example, the [UCI](https://archive.ics.uci.edu/ml/) website hosts a large *machine learning data repository*. Let's automatically download and import a dataset from the UCI website. Note that **not** all datasets can be used: the data needs to be in csv format (not json), and the dataset needs to have at least 1 categorical variable.

In [None]:
# Import dataset from website
url='https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data'

df = hn.import_example(url=url)
# Add column names
df.columns=['age','workclass','fnlwgt','education','education-num','marital-status','occupation','relationship','race','sex','capital-gain','capital-loss','hours-per-week','native-country','earnings']
# Set the following columns as floating type
cols_as_float = ['age','hours-per-week','capital-loss','capital-gain']
df[cols_as_float]=df[cols_as_float].astype(float)
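To check the column setup without downloading anything, the same steps can be sketched with plain pandas on an in-memory sample. The two rows below are illustrative and only follow the layout of `adult.data` (no header row, values separated by a comma and a space); `df_sample` and `categorical` are hypothetical names, and this does not show how `import_example` parses the file internally.

```python
import io
import pandas as pd

# Two illustrative rows in the layout of adult.data: no header, ", " as separator.
sample = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, "
    "Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K\n"
)

columns = ['age', 'workclass', 'fnlwgt', 'education', 'education-num',
           'marital-status', 'occupation', 'relationship', 'race', 'sex',
           'capital-gain', 'capital-loss', 'hours-per-week', 'native-country',
           'earnings']

# skipinitialspace strips the blank that follows each comma,
# so labels read 'State-gov' rather than ' State-gov'.
df_sample = pd.read_csv(sample, header=None, names=columns, skipinitialspace=True)

# Cast the continuous variables to float, as in the cell above.
cols_as_float = ['age', 'hours-per-week', 'capital-loss', 'capital-gain']
df_sample[cols_as_float] = df_sample[cols_as_float].astype(float)

# The dataset needs at least one categorical (non-numeric) column.
categorical = df_sample.select_dtypes(exclude='number').columns.tolist()
print(categorical)
```

If `categorical` comes back empty for a dataset of your own, the file does not meet the requirement of having at least one categorical variable.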


In [None]:
# Let's examine the dataset to check whether the columns are set correctly and the variable names match the items.
df.head()

In [None]:
# Initialize the model with the variable fnlwgt in the black-list, which excludes it from the modelling.
hn = hnet(black_list=['fnlwgt'])

===============================================================================

### Association learning using **HNet**

In [None]:
# Learn its associations
results = hn.association_learning(df)

In [None]:
# All the results are stored in results
print(results.keys())

# The results are accessible by using the keys.

#results['labx']

#results['counts']

#results['simmatLogP']

#results['dtypes']
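Once the results are available, a common next step is to list the strongest associations. The sketch below assumes that `results['simmatLogP']` is a square pandas DataFrame with the same labels on rows and columns, in which a higher score means a more significant association; that layout is an assumption, so check it against your own output first. The matrix, its labels and the helper `top_associations` are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-in for results['simmatLogP'] (assumed layout, made-up scores).
labels = ['Sex_female', 'Survived_1', 'Pclass_3']
simmat = pd.DataFrame(
    [[0.0, 12.5, 0.0],
     [9.1, 0.0, 0.0],
     [0.0, 0.0, 0.0]],
    index=labels, columns=labels,
)

def top_associations(simmat, n=5):
    """Return the n highest-scoring off-diagonal pairs as a long table."""
    pairs = simmat.stack()
    pairs.index.names = ['source', 'target']
    pairs = pairs.rename('score').reset_index()
    # Drop self-associations and pairs without a score.
    keep = (pairs['source'] != pairs['target']) & (pairs['score'] > 0)
    return pairs[keep].sort_values('score', ascending=False).head(n).reset_index(drop=True)

print(top_associations(simmat))
```

With real results the call would be `top_associations(results['simmatLogP'])`, provided the assumed layout holds.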

## Plotting 

There are many possibilities for plotting the results.
There are ***static*** plots and ***dynamic*** plots. On Colab, the dynamic plots will not work because they require writing d3-javascript files to disk. The following functions are available for plotting:

#### Network-graph
* hn.plot()
* hn.d3graph()

#### Heatmap
* hn.heatmap()
* hn.d3heatmap()


In [None]:
# Let's plot a static network

# If the network looks like a big hairball, try to play with some of the following parameters:
ax = hn.plot(scale=1, dpi=100, figsize=(15,15))
# ax = hn.plot(scale=10, dist_between_nodes=2, figsize=(20,20), dpi=400)

In [None]:
# Make the network plot interactive with d3-javascript
out = hn.d3graph()

# Download the files and open them locally if the plot does not open automatically:
print(out['path'])

In [None]:
# Plot the heatmap

# Plot the heatmap without ordering:
ax = hn.heatmap(figsize=(10,10))

# Cluster the heatmap:
ax = hn.heatmap(cluster=True, figsize=(10,10))


In [None]:
# Plot the heatmap in d3-javascript

# Plot the heatmap without ordering:
ax = hn.d3heatmap()


In [86]:
# Save the results
savepath=hn.save(overwrite=True)

[pypickle] Pickle file saved: [hnet_model.pkl]
[hnet] >Saving.. True


In [83]:
dir(hn)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_check_results',
 'alpha',
 'association_learning',
 'black_list',
 'combined_rules',
 'compute_associations',
 'd3graph',
 'd3heatmap',
 'dropna',
 'dtypes',
 'excl_background',
 'fillna',
 'heatmap',
 'import_example',
 'k',
 'load',
 'multtest',
 'perc_min_num',
 'plot',
 'prepocessing',
 'results',
 'save',
 'specificity',
 'white_list',
 'y_min']