<!-- Centered logo -->
<p align="center">
  <img src="https://github.com/erdogant/hnet/blob/master/docs/figs/logo.png?raw=true" width="400" />
</p>

<!-- Badges row (centered) -->
<p align="center">
  <a href="https://img.shields.io/pypi/pyversions/hnet"><img src="https://img.shields.io/pypi/pyversions/hnet" /></a>
  <a href="https://pypi.org/project/hnet/"><img src="https://img.shields.io/pypi/v/hnet" /></a>
  <img src="https://img.shields.io/github/stars/erdogant/hnet" />
  <a href="https://github.com/erdogant/hnet/blob/master/LICENSE"><img src="https://img.shields.io/badge/license-MIT-green.svg" /></a>
  <a href="https://github.com/erdogant/hnet/network"><img src="https://img.shields.io/github/forks/erdogant/hnet.svg" /></a>
  <a href="https://github.com/erdogant/hnet/issues"><img src="https://img.shields.io/github/issues/erdogant/hnet.svg" /></a>
  <a href="http://www.repostatus.org/#active"><img src="http://www.repostatus.org/badges/latest/active.svg" /></a>
  <a href="https://pepy.tech/project/hnet/"><img src="https://pepy.tech/badge/hnet/month" /></a>
  <a href="https://pepy.tech/project/hnet/"><img src="https://pepy.tech/badge/hnet" /></a>
  <a href="https://zenodo.org/badge/latestdoi/231263493"><img src="https://zenodo.org/badge/231263493.svg" /></a>
  <a href="https://erdogant.github.io/hnet/"><img src="https://img.shields.io/badge/Sphinx-Docs-Green" /></a>
  <a href="https://erdogant.github.io/bnlearn/pages/html/Documentation.html#medium-blog"><img src="https://img.shields.io/badge/Medium-Blog-black" /></a>
  <img src="https://img.shields.io/github/repo-size/erdogant/hnet" />
  <a href="https://erdogant.github.io/bnlearn/pages/html/Documentation.html#"><img src="https://img.shields.io/badge/Support%20this%20project-grey.svg?logo=github%20sponsors" /></a>
  <a href="https://erdogant.github.io/hnet/pages/html/Documentation.html#colab-notebook"><img src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
</p>

---

# hnet

**HNet** stands for ***graphical Hypergeometric Networks***, which is a method where associations across variables are tested for significance by statistical inference. ⭐️Star it if you like it⭐️
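As a rough illustration of the kind of significance test the name refers to, the sketch below computes a one-sided hypergeometric p-value for the overlap between two boolean variables, using only the standard library. The function names and the toy data are invented for this example, and this is not hnet's internal implementation.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n)."""
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probability mass of every overlap that is at least as large as k.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def association_pvalue(x, y):
    """One-sided p-value for the overlap between two boolean sequences."""
    N = len(x)                                   # population size
    K = sum(1 for v in y if v)                   # samples where y is True
    n = sum(1 for v in x if v)                   # samples where x is True
    k = sum(1 for a, b in zip(x, y) if a and b)  # samples where both are True
    return hypergeom_sf(k, N, K, n)


# Toy data (made up): both variables are True for the same four samples.
x = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p = association_pvalue(x, y)
print(p)  # 1/210, roughly 0.00476
```

A small p-value indicates that the observed overlap is unlikely under random sampling without replacement.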

---

Real-world data often contain measurements with both continuous and discrete values. Despite the availability of many libraries, data sets with mixed data types require intensive pre-processing steps, and it remains a challenge to describe the relationships between variables. The data understanding phase is crucial to the data-mining process. However, without making any assumptions about the data, the search space is super-exponential in the number of variables, so a thorough data understanding phase is not common practice.



---

## Core functionalities:

| Feature | Description |
|--------|-------------|
| [**Parametric Fitting**](https://erdogant.github.io/hnet/pages/html/Parametric.html) | Fit distributions on empirical data X. |
| [**Non-Parametric Fitting**](https://erdogant.github.io/hnet/pages/html/Quantile.html) | Fit distributions on empirical data X using non-parametric approaches (quantile, percentiles). |
| [**Discrete Fitting**](https://erdogant.github.io/hnet/pages/html/Discrete.html) | Fit distributions on empirical data X using binomial distribution. |
| [**Predict**](https://erdogant.github.io/hnet/pages/html/Functions.html#module-hnet.hnet.hnet.predict) | Compute probabilities for response variables y. |
| [**Synthetic Data**](https://erdogant.github.io/hnet/pages/html/Generate.html) |  Generate synthetic data. |
| [**Plots**](https://erdogant.github.io/hnet/pages/html/Plots.html) | Various plotting functionalities. |

---

### Support
* This library is <b>free</b>, but improvements and new functionalities are made using coffee! :)

* You can also [support](https://erdogant.github.io/hnet/pages/html/Documentation.html) this project in various other ways; have a look at the [sponsor page](https://erdogant.github.io/bnlearn/pages/html/Documentation.html).

* Report bugs, issues, and feature requests on the [GitHub page](https://github.com/erdogant/hnet/issues).

<table>
  <tr>
    <td style="text-align: center;">
      <a href="https://www.buymeacoffee.com/erdogant">
        <img src="https://img.buymeacoffee.com/button-api/?text=Buy me a coffee&emoji=&slug=erdogant&button_colour=FFDD00&font_colour=000000&font_family=Cookie&outline_colour=000000&coffee_colour=ffffff" />
      </a>
    </td>
    <td style="text-align: center;">
      <a href="https://erdogant.medium.com/subscribe" target="_blank">
        <img height="50" style="border:0px;height:36px;" src="https://erdogant.github.io/images/medium_follow_me.jpg" border="0" alt="Follow me on Medium" />
      </a>
    </td>
  </tr>
</table>

---

### Blog Posts with Podcast
- [**Medium**](https://erdogant.github.io/hnet/pages/html/Documentation.html#medium-blogs)
- [**Gumroad with podcast**](https://erdogant.github.io/hnet/pages/html/Documentation.html#gumroad-products-with-podcasts)
- [**Article**](https://arxiv.org/abs/2005.04679)

* [API Documentation](https://erdogant.github.io/hnet/)
* [GitHub](https://github.com/erdogant/hnet)


-------------


In [None]:
!pip install -U hnet
import hnet
print(hnet.__version__)

In [None]:
import pandas as pd
import numpy as np
from hnet import hnet

# Import example dataset

Several example datasets can be downloaded using hnet:
* 'sprinkler'
* 'titanic'
* 'student'
* 'fifa'
* 'cancer'
* 'waterpump'
* 'retail'

In [None]:
hn = hnet()
df = hn.import_example(data='titanic')

# Remove variables that are known not to contribute to the model.
del df['PassengerId']
del df['Name']
df.head()

In [None]:
# Initialize model with default parameters
hn = hnet()

===============================================================================

## Import data from url

***HNet*** can download data directly from the internet using a URL. As an example, the [UCI](https://archive.ics.uci.edu/ml/) website is a large *machine learning data repository*. Let's automatically download and import a dataset from the UCI website. Note that **not** all datasets can be used: the data needs to be in CSV format (not JSON), and the dataset needs to have at least one categorical variable.
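Before pointing hnet at a URL, it can help to check that the file has at least one non-numeric column. The sketch below does this with the standard library `csv` module. The helper names and the two sample rows are hypothetical, written in the comma-plus-space style of the UCI file, and the check is only a heuristic, not something hnet performs this way.

```python
import csv
import io


def is_number(value: str) -> bool:
    """Return True if the string can be parsed as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def categorical_columns(csv_text: str, header: list) -> list:
    """Return the names of columns that contain at least one non-numeric value."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    rows = [row for row in reader if row]  # skip empty lines
    found = []
    for idx, name in enumerate(header):
        if any(not is_number(row[idx]) for row in rows):
            found.append(name)
    return found


# Hypothetical excerpt without a header line.
sample = "39, State-gov, 77516\n50, Self-emp-not-inc, 83311\n"
cols = categorical_columns(sample, ["age", "workclass", "fnlwgt"])
print(cols)  # ['workclass']
```

If the returned list is empty, the dataset would not meet the categorical-variable requirement described above.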

In [None]:
# Import dataset from website
url='https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data'

df = hn.import_example(url=url)
# Add column names
df.columns=['age','workclass','fnlwgt','education','education-num','marital-status','occupation','relationship','race','sex','capital-gain','capital-loss','hours-per-week','native-country','earnings']
# Set the following columns as floating type
cols_as_float = ['age','hours-per-week','capital-loss','capital-gain']
df[cols_as_float]=df[cols_as_float].astype(float)


In [None]:
# Let's examine the dataset to check whether the columns are set correctly and the column names match the values.
df.head()

In [None]:
# Initialize model with variable fnlwgt in the black-list. This means that it is not included in the modelling.
hn = hnet(black_list=['fnlwgt'])

===============================================================================

### Association learning using **HNet**

In [None]:
# Learn its associations
results = hn.association_learning(df)

In [None]:
# All the results are stored in results
print(results.keys())

# The results are accessible by using the keys.

#results['labx']

#results['counts']

#results['simmatLogP']

#results['dtypes']
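To inspect the strongest associations, the adjacency matrix can be flattened into a ranked list. This is a minimal sketch under assumptions: it assumes `results['simmatLogP']` is a square pandas DataFrame with the same labels on rows and columns, and that larger values mean stronger associations. The toy matrix stands in for the real result, and its labels and values are invented.

```python
import pandas as pd

# Toy stand-in for results['simmatLogP'] (labels and values are made up).
labels = ["Sex_female", "Survived_1", "Pclass_3"]
simmat = pd.DataFrame(
    [[0.0, 12.5, 0.0],
     [8.1, 0.0, 0.0],
     [0.0, 3.2, 0.0]],
    index=labels,
    columns=labels,
)

# Flatten the matrix into one row per (row label, column label) pair.
edges = (
    simmat.stack()
    .rename("score")
    .rename_axis(["row", "column"])
    .reset_index()
)

# Keep the non-zero entries and rank them from strongest to weakest.
edges = (
    edges[edges["score"] > 0]
    .sort_values("score", ascending=False)
    .reset_index(drop=True)
)
print(edges)
```

With a real model, replacing `simmat` with `results['simmatLogP']` should give the same kind of table, provided the assumptions above hold.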

## Plotting

The results can be plotted in several ways.
There are ***static*** plots and ***dynamic*** plots. On Colab, the dynamic plots will not work because they require writing d3-javascript files to disk. The following functions are available for plotting:

#### Network-graph
* hn.plot()
* hn.d3graph()

#### Heatmap
* hn.heatmap()
* hn.d3heatmap()
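Because the dynamic plots are not expected to work on Colab, one option is to pick the plot type at runtime. The sketch below uses a common heuristic, checking whether the `google.colab` module has been loaded. Both helper names are hypothetical, and only `hn.plot()` and `hn.d3graph()` come from the list above.

```python
import sys


def running_in_colab() -> bool:
    """Heuristic: the google.colab module is loaded in Colab kernels."""
    return "google.colab" in sys.modules


def plot_network(hn):
    """Use the static network plot on Colab and the d3 plot elsewhere."""
    if running_in_colab():
        return hn.plot()
    return hn.d3graph()
```

Calling `plot_network(hn)` after `hn.association_learning(df)` would then select the static or dynamic network graph automatically.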


In [None]:
# Let's plot a static network

# If the network looks like a big hairball, try to play with some of the following parameters:
ax = hn.plot(scale=1, dpi=100, figsize=(15,15))
# ax = hn.plot(scale=10, dist_between_nodes=2, figsize=(20,20), dpi=400)

In [None]:
# Make the network plot interactive with d3-javascript
out = hn.d3graph()

# Download the files and open them locally if the plot does not open automatically:
print(out['path'])
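
If the plot does not open by itself on a local machine, the path can be turned into a `file://` URI and passed to the browser. This is a minimal sketch that assumes `out['path']` points to the generated HTML file. The helper and the file name below are hypothetical.

```python
from pathlib import Path


def to_file_uri(path: str) -> str:
    """Convert a local file path into a file:// URI that a browser can open."""
    return Path(path).resolve().as_uri()


uri = to_file_uri("hnet_d3graph.html")  # hypothetical file name
print(uri)

# On a local machine (not Colab), the page could then be opened with:
# import webbrowser
# webbrowser.open(uri)
```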

In [None]:
# Plot the heatmap

# Plot the heatmap without ordering:
ax = hn.heatmap(figsize=(10,10))

# Cluster the heatmap:
ax = hn.heatmap(cluster=True, figsize=(10,10))

In [None]:
# Plot the heatmap in d3-javascript

# Plot the heatmap without ordering:
# ax = hn.d3heatmap(vmax=1)

In [None]:
# Feature importance
hn.plot_feat_importance()

In [None]:
# Plot summarized results over the categories

# Make the network plot interactive with d3-javascript
graph = hn.d3graph(summarize=True)

# Plot the static network, summarized over the categories:
ax = hn.plot(figsize=(10,10), summarize=True)


In [None]:
# Plot summarized results over the categories

# Make the heatmap interactive with d3-javascript
out = hn.d3heatmap(summarize=True)

# Plot the clustered static heatmap:
ax = hn.heatmap(figsize=(10,10), summarize=True, cluster=True)


In [None]:
# Save the results
savepath=hn.save(overwrite=True)

In [None]:
dir(hn)

**Let's make an interactive and responsive network using pyvis**

In [None]:
!pip install pyvis
from pyvis import network as net
from IPython.display import display, HTML
from hnet.network import adjmat2graph

In [None]:
# Convert adjacency matrix into Networkx Graph
G=adjmat2graph(hn.results['simmatLogP'])
# Setup of the interactive network figure
g = net.Network(height='800px', width='80%',notebook=True,heading='HNET association Network')
g.from_nx(G)
# Create advanced buttons
g.show_buttons(filter_=['physics'])
# Display
g.show('hnet.html')
display(HTML(filename='hnet.html'))


Fin