
HiDi: Pipelines for Latent Factor Modeling


HiDi is a library for high-dimensional latent factor modeling for collaborative filtering applications.

Read the full documentation.

How Do I Use It?

This will get you started.

from hidi import inout, clean, matrix, pipeline


# CSV file with link_id and item_id columns
in_files = ['hidi/examples/data/user-item.csv']

# File to write output data to
outfile = 'latent-factors.csv'

transforms = [
    inout.ReadTransform(in_files),      # Read data from disk
    clean.DedupeTransform(),            # Drop duplicate rows
    matrix.SparseTransform(),           # Make a sparse user*item matrix
    matrix.SimilarityTransform(),       # Convert to an item*item similarity matrix
    matrix.SVDTransform(),              # Perform SVD dimensionality reduction
    matrix.ItemsMatrixToDFTransform(),  # Make a DataFrame with an index
    inout.WriteTransform(outfile)       # Write results to csv
]

pl = pipeline.Pipeline(transforms)
pl.run()
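To make the example's steps more concrete, here is a minimal sketch of what the dedupe, sparse, similarity, and SVD stages compute, written directly with pandas, NumPy, and SciPy rather than HiDi. The function names (`dedupe`, `to_sparse`, `to_similarity`, `to_factors`, `run`) and the toy data are hypothetical. The sketch assumes the similarity is a raw item*item co-occurrence count and the SVD is a dense one. HiDi's transforms may differ in both respects.

```python
# Hypothetical illustration of the example pipeline's steps.
# This is NOT HiDi's implementation; similarity = co-occurrence counts (assumed).
import functools

import numpy as np
import pandas as pd
from scipy import sparse


def dedupe(df):
    # Drop repeated (link_id, item_id) rows so each interaction counts once
    return df.drop_duplicates()


def to_sparse(df):
    # Map each distinct user and item id to an integer index
    users = pd.Categorical(df["link_id"])
    items = pd.Categorical(df["item_id"])
    # Binary user*item matrix in compressed sparse row format
    matrix = sparse.csr_matrix(
        (np.ones(len(df)), (users.codes, items.codes)),
        shape=(len(users.categories), len(items.categories)),
    )
    return matrix, list(items.categories)


def to_similarity(state):
    matrix, items = state
    # item*item co-occurrence counts: (items x users) @ (users x items)
    return (matrix.T @ matrix).toarray(), items


def to_factors(state, k):
    sim, items = state
    # Dense SVD; keep the first k left singular vectors scaled by singular values
    u, s, _ = np.linalg.svd(sim)
    return pd.DataFrame(u[:, :k] * s[:k], index=items)


def run(data, steps):
    # Pass each step's output to the next one, in order
    for step in steps:
        data = step(data)
    return data


interactions = pd.DataFrame({
    "link_id": ["u1", "u1", "u1", "u2", "u2", "u3"],
    "item_id": ["a", "b", "b", "b", "c", "a"],
})

factors = run(interactions, [dedupe, to_sparse, to_similarity,
                             functools.partial(to_factors, k=2)])
print(factors.shape)  # (3, 2)
```

The `run` helper only illustrates the chaining idea behind `pipeline.Pipeline`: each step receives the previous step's output. The result has one row of latent factors per item.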

Setup

Requirements

HiDi is tested against CPython 2.7, 3.4, 3.5, and 3.6. It may also work with other versions of CPython.

Installation

To install HiDi, run:

$ pip install hidi

Run the Tests

$ pip install tox
$ tox