Griddify

Redistribute tabular data into a grid for easy visualization and image-based deep learning. This library is greatly inspired by the excellent MolMap library.

Installation

git clone https://github.com/ersilia-os/griddify.git
cd griddify
pip install -e .

Note that you may have to install a C++ compiler. You can just use conda for that:

conda install -c conda-forge cxx-compiler

Step by step

Get a multidimensional dataset and preprocess it

In this example, we will use a dataset of 200 physicochemical descriptors calculated for about 10k compounds. You can get these data with the following command.

from griddify import datasets

data = datasets.get_compound_descriptors()

It is important that you preprocess your data (impute missing values, normalize, etc.). We provide functionality to do so.

from griddify import Preprocessing

pp = Preprocessing()
pp.fit(data)
data = pp.transform(data)
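
If you are curious what such a preprocessing step typically involves, here is a minimal sketch using plain NumPy. It is not the internals of `Preprocessing`, which this README does not describe; the function name `impute_and_scale` and the choice of median imputation followed by min-max scaling are assumptions made for illustration only.

```python
import numpy as np


def impute_and_scale(X):
    """Median-impute NaNs per column, then min-max scale each column to [0, 1].

    Hypothetical helper for illustration; not part of griddify.
    """
    X = np.asarray(X, dtype=float).copy()
    # Replace each missing value with the median of its column.
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    # Min-max scale column by column.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (X - lo) / span


X_demo = np.array([[1.0, 10.0], [np.nan, 20.0], [3.0, 30.0]])
X_scaled = impute_and_scale(X_demo)
```

After this step every column lies in the same range and contains no missing values, which is the kind of input the later steps expect.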

Create a 2D cloud of data features

Start by calculating distances between features.

from griddify import FeatureDistances

fd = FeatureDistances(metric="cosine").calculate(data)
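
To make the idea of distances between features concrete, the sketch below computes a cosine distance matrix between the columns of a data matrix with NumPy. It illustrates the concept only and is not taken from `FeatureDistances`; the function name `feature_cosine_distances` is hypothetical.

```python
import numpy as np


def feature_cosine_distances(X):
    """Return a (n_features, n_features) matrix of cosine distances between columns.

    Hypothetical helper for illustration; not part of griddify.
    """
    X = np.asarray(X, dtype=float)
    F = X.T  # one row per feature
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero features
    U = F / norms
    D = 1.0 - U @ U.T  # cosine distance = 1 - cosine similarity
    return np.clip(D, 0.0, 2.0)  # remove tiny negative values from rounding


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))  # 50 samples, 4 features
D_demo = feature_cosine_distances(X_demo)
```

Note that the matrix is square in the number of features, not the number of samples: two features are close when they vary together across samples.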

You can now obtain a 2D cloud of your data features. By default, UMAP is used.

from griddify import Tabular2Cloud

tc = Tabular2Cloud()
tc.fit(fd)
Xc = tc.transform(fd)

It is always good to inspect the resulting projection. The cloud contains one point per feature in your dataset.

from griddify.plots import cloud_plot

cloud_plot(Xc)

Rearrange the 2D cloud onto a grid

Distribute cloud points on a grid using a linear assignment algorithm.

from griddify import Cloud2Grid

cg = Cloud2Grid()
cg.fit(Xc)
Xg = cg.transform(Xc)
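
For intuition, here is a minimal sketch of how a linear assignment can snap cloud points onto grid cells, using `scipy.optimize.linear_sum_assignment`. This is an assumption about the general approach, not the code inside `Cloud2Grid`; the function name `cloud_to_grid`, the squared Euclidean cost, and the (row, column) output format are all illustrative choices.

```python
import math

import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def cloud_to_grid(cloud):
    """Assign each 2D point to a distinct cell of the smallest square grid that fits.

    Hypothetical helper for illustration; not part of griddify.
    Returns integer (row, col) pairs, one per point, and the grid side length.
    """
    cloud = np.asarray(cloud, dtype=float)
    n = cloud.shape[0]
    side = math.isqrt(n - 1) + 1  # ceil(sqrt(n)) for n >= 1
    # Rescale the cloud to the unit square so it is comparable to the grid.
    lo = cloud.min(axis=0)
    span = cloud.max(axis=0) - lo
    span[span == 0] = 1.0
    unit = (cloud - lo) / span
    # Centers of the grid cells, also in the unit square.
    ticks = (np.arange(side) + 0.5) / side
    gx, gy = np.meshgrid(ticks, ticks)
    cells = np.column_stack([gx.ravel(), gy.ravel()])
    # Minimize the total squared distance between points and their cells.
    cost = cdist(unit, cells, metric="sqeuclidean")
    point_idx, cell_idx = linear_sum_assignment(cost)
    cell_idx = cell_idx[np.argsort(point_idx)]  # order by point index
    mappings = np.column_stack([cell_idx // side, cell_idx % side])
    return mappings, side


rng = np.random.default_rng(0)
cloud_demo = rng.normal(size=(10, 2))
mappings_demo, side_demo = cloud_to_grid(cloud_demo)
```

Because the assignment is one-to-one, no two features share a cell, and cells are left empty when the number of features is not a perfect square.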

You can check the rearrangement with an arrows plot.

from griddify.plots import arrows_plot

arrows_plot(Xc, Xg)

To continue with the next steps, it is actually more convenient to get mappings as integers. The following method gives you the size of the grid as well.

mappings, side = cg.get_mappings(Xc)

Rearrange your flat data points into grids

Let's go back to the original tabular data. We want to transform the input data, where each sample is represented as a one-dimensional array, into output data where each sample is represented as an image (i.e. a two-dimensional grid). Please ensure that your data are normalized or scaled.

from griddify import Flat2Grid

fg = Flat2Grid(mappings, side)
Xi = fg.transform(data)
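
Conceptually, this step scatters each feature value into its assigned grid cell. The sketch below shows the idea with NumPy, assuming the mappings are integer (row, column) pairs, one per feature. The exact format returned by `get_mappings` is not documented here, so treat both that format and the function name `flat_to_grid` as assumptions.

```python
import numpy as np


def flat_to_grid(X, mappings, side, fill=0.0):
    """Turn (n_samples, n_features) rows into (n_samples, side, side) images.

    Hypothetical helper for illustration; not part of griddify.
    mappings[j] is assumed to hold the (row, col) cell of feature j.
    """
    X = np.asarray(X, dtype=float)
    mappings = np.asarray(mappings, dtype=int)
    grids = np.full((X.shape[0], side, side), fill, dtype=float)
    # Place feature j of every sample at its assigned cell in one vectorized step.
    grids[:, mappings[:, 0], mappings[:, 1]] = X
    return grids


X_flat = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
mappings_small = np.array([[0, 0], [1, 1], [0, 1]])
grids_small = flat_to_grid(X_flat, mappings_small, side=2)
```

Cells that receive no feature keep the fill value, which is one reason scaled inputs are helpful: the fill value then has a consistent meaning across the image.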

Explore one sample.

from griddify.plots import grid_plot

grid_plot(Xi[0])

Full pipeline

You can run the full pipeline described above in only a few lines of code.

from griddify import datasets
from griddify import Griddify

data = datasets.get_compound_descriptors()

gf = Griddify(preprocess=True)
gf.fit(data)
Xi = gf.transform(data)

You can find more examples as Jupyter Notebooks in the notebooks folder.

Learn more

The Ersilia Open Source Initiative is on a mission to strengthen research capacity in low-income countries. Please reach out to us if you want to contribute: hello@ersilia.io
