
Commit

fixed content type on README + misc reformatting (#84)
iskandr authored and sergeyf committed Nov 9, 2018
1 parent 882720a commit 69e3281
Showing 11 changed files with 53 additions and 32 deletions.
4 changes: 0 additions & 4 deletions .travis.yml
@@ -28,9 +28,6 @@ before_install:
addons:
  apt:
    packages:
-      # install pandoc for use with pypandoc for converting the README
-      # from markdown to RST
-      - pandoc
      # Even though I'm installing cvxopt via conda, still seem to need these:
      - liblapack-dev
      - libatlas-base-dev
@@ -41,7 +38,6 @@ install:
  - source activate test-environment
  - conda install -c cvxgrp scs=1.2.6
  - pip install tensorflow
-  - pip install pypandoc
  - pip install -r requirements.txt
  - pip install .
  - pip install coveralls
14 changes: 7 additions & 7 deletions README.md
@@ -13,7 +13,7 @@ from fancyimpute import KNN, NuclearNormMinimization, SoftImpute, IterativeImput
# X is the complete data matrix
# X_incomplete has the same values as X except a subset have been replaced with NaN

# Model each feature with missing values as a function of other features, and
# use that estimate for imputation.
X_filled_ii = IterativeImputer().fit_transform(X_incomplete)

@@ -66,10 +66,10 @@ matrix. Not guaranteed to converge but works well in practice. Taken from [Matri

## Note about Inductive vs Transductive Imputation
Most imputation algorithms in `fancyimpute` are *transductive*. In the elegant language of `scikit-learn`'s API
this means that you can only call `solver.fit_transform(X_incomplete)`, but then the "fitted" `solver` will not
be able to be applied to new data via a call to `solver.transform`. A simple example is the `MatrixFactorization`
imputer, which decomposes as follows: `<A,B> = X_incomplete`, such that the product of `A` and `B` is close
to `X_incomplete` on its non-missing values. How then, can we apply the learned `A` and `B` matrices to
held-out data? It is not doable in general, but there are special cases. `fancyimpute` aims to be of general
use and we have not implemented an inductive mode for `MatrixFactorization`.
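
To make the transductive pattern concrete, here is a minimal sketch (an illustration added for this write-up, not code from the commit). It assumes `MatrixFactorization` is importable from `fancyimpute` as in the README's algorithm list, uses a small random matrix as stand-in data, and relies on the base `Solver.fit` raising the `ValueError` shown in the `fancyimpute/solver.py` hunk further down.

```python
import numpy as np
from fancyimpute import MatrixFactorization

# Stand-in data: a 20 x 5 matrix with roughly 20% of its entries masked as NaN.
rng = np.random.RandomState(0)
X = rng.normal(size=(20, 5))
X_incomplete = X.copy()
X_incomplete[rng.uniform(size=X.shape) < 0.2] = np.nan

solver = MatrixFactorization()

# Transductive: the solver imputes the same matrix it is being fit on.
X_filled = solver.fit_transform(X_incomplete)

# There is no inductive mode, so the base Solver raises instead of fitting.
try:
    solver.fit(X_incomplete)
except ValueError as error:
    print(error)
```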

@@ -108,13 +108,13 @@ XY_completed = []
for i in range(n_imputations):
    imputer = IterativeImputer(n_iter=5, sample_posterior=True, random_state=i)
    XY_completed.append(imputer.fit_transform(XY_incomplete))

XY_completed_mean = np.mean(XY_completed, 0)
XY_completed_std = np.std(XY_completed, 0)
```

See [2], chapter 4 for more discussion on multiple
vs. single imputations.

It is still an open problem as to how useful single vs. multiple imputation is in
the context of prediction and classification when the user is not interested in
@@ -123,6 +123,6 @@ measuring uncertainty due to missing values.
[1] Stef van Buuren, Karin Groothuis-Oudshoorn (2011). "mice: Multivariate
Imputation by Chained Equations in R". Journal of Statistical Software 45:
1-67.

[2] Roderick J A Little and Donald B Rubin (1986). "Statistical Analysis
with Missing Data". John Wiley & Sons, Inc., New York, NY, USA.
2 changes: 2 additions & 0 deletions fancyimpute/__init__.py
@@ -11,6 +11,8 @@
from .knn import KNN
from .similarity_weighted_averaging import SimilarityWeightedAveraging

+__version__ = "0.4.1"
+
__all__ = [
    "Solver",
    "NuclearNormMinimization",
12 changes: 12 additions & 0 deletions fancyimpute/iterative_imputer.py
@@ -1,3 +1,15 @@
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
from __future__ import division

import warnings
1 change: 1 addition & 0 deletions fancyimpute/iterative_svd.py
@@ -21,6 +21,7 @@

F32PREC = np.finfo(np.float32).eps


class IterativeSVD(Solver):
    def __init__(
            self,
1 change: 1 addition & 0 deletions fancyimpute/knn.py
@@ -18,6 +18,7 @@

from .solver import Solver


class KNN(Solver):
    """
    k-Nearest Neighbors imputation for arrays with missing data.
1 change: 1 addition & 0 deletions fancyimpute/nuclear_norm_minimization.py
@@ -18,6 +18,7 @@

from sklearn.utils import check_array


class NuclearNormMinimization(Solver):
    """
    Simple implementation of "Exact Matrix Completion via Convex Optimization"
1 change: 1 addition & 0 deletions fancyimpute/scaler.py
@@ -16,6 +16,7 @@

import numpy as np


class Scaler(object):
    """
    Iterative estimation of row and column centering/scaling
1 change: 1 addition & 0 deletions fancyimpute/soft_impute.py
@@ -22,6 +22,7 @@

F32PREC = np.finfo(np.float32).eps


class SoftImpute(Solver):
    """
    Implementation of the SoftImpute algorithm from:
18 changes: 10 additions & 8 deletions fancyimpute/solver.py
@@ -206,10 +206,11 @@ def fit(self, X, y=None):
        using `fit` or `fit_transform` on `X_train` and then `transform`
        on new `X_test`.
        """
-        raise ValueError("%s.fit not implemented! This imputation algorithm likely "
-                         "doesn't support inductive mode. Only fit_transform is "
-                         "supported at this time." % (
-                             self.__class__.__name__,))
+        raise ValueError(
+            "%s.fit not implemented! This imputation algorithm likely "
+            "doesn't support inductive mode. Only fit_transform is "
+            "supported at this time." % (
+                self.__class__.__name__,))

    def transform(self, X, y=None):
        """
@@ -220,7 +221,8 @@ def transform(self, X, y=None):
        using `fit` or `fit_transform` on `X_train` and then `transform`
        on new `X_test`.
        """
-        raise ValueError("%s.transform not implemented! This imputation algorithm likely "
-                         "doesn't support inductive mode. Only %s.fit_transform is "
-                         "supported at this time." % (
-                             self.__class__.__name__,))
+        raise ValueError(
+            "%s.transform not implemented! This imputation algorithm likely "
+            "doesn't support inductive mode. Only %s.fit_transform is "
+            "supported at this time." % (
+                self.__class__.__name__,))
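
For contrast with the error raised above, here is a sketch of the inductive pattern these docstrings describe, assuming (as the README's inductive/transductive note suggests) that `IterativeImputer` does implement `fit` and `transform`; the train/test arrays are illustrative stand-ins, not data from the project.

```python
import numpy as np
from fancyimpute import IterativeImputer

# Stand-in train/test matrices with ~10% of entries missing.
rng = np.random.RandomState(0)
X_train = rng.normal(size=(200, 5))
X_train[rng.uniform(size=X_train.shape) < 0.1] = np.nan
X_test = rng.normal(size=(50, 5))
X_test[rng.uniform(size=X_test.shape) < 0.1] = np.nan

imputer = IterativeImputer()
imputer.fit(X_train)                       # learn per-feature models on the training data
X_test_filled = imputer.transform(X_test)  # reuse them to fill in held-out rows
```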
30 changes: 17 additions & 13 deletions setup.py
@@ -12,34 +12,37 @@

import os
import logging
+import re

from setuptools import setup

+package_name = "fancyimpute"


readme_dir = os.path.dirname(__file__)
readme_filename = os.path.join(readme_dir, 'README.md')

try:
    with open(readme_filename, 'r') as f:
-        readme = f.read()
+        readme_markdown = f.read()
except:
    logging.warn("Failed to load %s" % readme_filename)
-    readme = ""
+    readme_markdown = ""

-try:
-    import pypandoc
-    readme = pypandoc.convert(readme, to='rst', format='md')
-except:
-    logging.warn("Conversion of long_description from MD to RST failed")
-    pass
+with open('%s/__init__.py' % package_name, 'r') as f:
+    version = re.search(
+        r'^__version__\s*=\s*[\'"]([^\'"]*)[\'"]',
+        f.read(),
+        re.MULTILINE).group(1)

if __name__ == '__main__':
    setup(
-        name='fancyimpute',
-        version="0.4.0",
+        name=package_name,
+        version=version,
        description="Matrix completion and feature imputation algorithms",
        author="Alex Rubinsteyn, Sergey Feldman",
        author_email="alex.rubinsteyn@gmail.com",
-        url="https://github.com/hammerlab/fancyimpute",
+        url="https://github.com/openvax/%s" % package_name,
        license="http://www.apache.org/licenses/LICENSE-2.0.html",
        classifiers=[
            'Development Status :: 3 - Alpha',
@@ -64,6 +67,7 @@
            'np_utils',
            'tensorflow',
        ],
-        long_description=readme,
-        packages=['fancyimpute'],
+        long_description=readme_markdown,
+        long_description_content_type='text/markdown',
+        packages=[package_name],
    )
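
As a quick standalone illustration of the version-detection regex introduced above (the `init_text` string is a stand-in for the real contents of `fancyimpute/__init__.py`):

```python
import re

# Stand-in for the contents of fancyimpute/__init__.py
init_text = 'from .solver import Solver\n__version__ = "0.4.1"\n'

version = re.search(
    r'^__version__\s*=\s*[\'"]([^\'"]*)[\'"]',
    init_text,
    re.MULTILINE).group(1)

print(version)  # -> 0.4.1
```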
