CorrPy

Latest Update Date: 2019 Feb.

Overview

This package helps users calculate correlation coefficients and the covariance matrix of data with missing values. Both calculations depend on the standard deviation of the data, but real-world data is not always clean and tidy. NumPy's standard deviation and correlation coefficient functions return nan when the data contains missing values. This package aims to overcome this obstacle and help users handle missing values when calculating correlation coefficients and the covariance matrix. CorrPy uses the listwise deletion method to handle missing values: it removes the rows of a data frame in which missing values are present.
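As a minimal sketch of the problem and of listwise deletion (not CorrPy's actual implementation), the snippet below shows NumPy propagating nan through np.std and then the same calculation after rows with missing values are dropped. The helper name listwise_delete is hypothetical.

```python
import numpy as np

def listwise_delete(rows):
    """Hypothetical helper: drop every entry (1-D) or row (2-D) containing NaN."""
    arr = np.asarray(rows, dtype=float)
    if arr.ndim == 1:
        return arr[~np.isnan(arr)]
    keep = ~np.isnan(arr).any(axis=1)  # True for rows with no missing value
    return arr[keep]

x = [1, 2, np.nan, 4, np.nan, 6]
raw_std = np.std(x)                      # nan: the missing values propagate
clean_std = np.std(listwise_delete(x))   # computed on 1, 2, 4, 6 only

table = [[1.0, -6.0], [2.0, -7.0], [np.nan, -8.0], [4.0, 9.0]]
complete_rows = listwise_delete(table)   # the third row is removed
```

The trade-off of listwise deletion is that a single missing cell discards the whole row, so the effective sample size shrinks.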

Note: If the course timeline permits, CorrPy will also handle missing values via single imputation with the mean: replacing the missing values with the mean of the existing values.
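A minimal sketch of what mean imputation could look like, assuming missing values are encoded as NaN. The helper name impute_mean is hypothetical and is not part of CorrPy.

```python
import numpy as np

def impute_mean(values):
    """Hypothetical helper: replace each NaN with the mean of the observed values."""
    arr = np.asarray(values, dtype=float)
    observed_mean = np.nanmean(arr)  # mean over the non-missing entries
    return np.where(np.isnan(arr), observed_mean, arr)

x = [1, 2, np.nan, 4, np.nan, 6]
filled = impute_mean(x)  # both gaps become 3.25, the mean of 1, 2, 4, 6
```

Unlike deletion, this keeps every row, but filling gaps with the mean reduces the spread of the data, so the resulting standard deviation is smaller than the one computed on the observed values alone.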

Team

Name | Slack Handle | Github.com | Link
--- | --- | --- | ---
KERA YUCEL | @KERA YUCEL | @K3ra-y | Kera's link
GOPALAKRISHNAN ANDIVEL | @Krish | @Gopsathvik | Krish's link
WEISHUN DENG | @Wilson Deng | @xiaoweideng | Wilson's link
Mengda Yu | @Mengda(Albert) Yu | @mru4913 | Albert's link

Installation

CorrPy can be installed with pip in a command window:

pip install git+https://github.com/UBC-MDS/CorrPy.git

Branch Coverage Test

To test branch coverage, we use coverage.py. Install it with pip install coverage.

We also provide a Makefile to automate the process. Run the following command to see the branch coverage report.

make report_branch

The results are shown below.

Name                            Stmts   Miss Branch BrPart  Cover   Missing
---------------------------------------------------------------------------
CorrPy/__init__.py                  4      0      0      0   100%
CorrPy/corr_plus.py                26      0     12      0   100%
CorrPy/cov_mx.py                   20      0      8      0   100%
CorrPy/std_plus.py                 15      0      8      0   100%
CorrPy/test/__init__.py             0      0      0      0   100%
CorrPy/test/test_corr_plus.py      41      0      0      0   100%
CorrPy/test/test_cov_mx.py         45      0      0      0   100%
CorrPy/test/test_std_plus.py       35      0      0      0   100%
---------------------------------------------------------------------------

Test

To run all the tests with pytest, use make test_all.

The results are shown below.

Functions

Standard Deviation (std_plus)

Standard deviation measures how far the data points lie from the mean, which gives insight into the variation in the data. This function automatically handles missing values in the input, so std_plus removes the frustration of cleaning the data by hand.


Example:

>>> import numpy as np
>>> from CorrPy.std_plus import std_plus
>>> x = [1, 2, np.nan, 4, np.nan, 6]
>>> std_plus(x)
array([1.920286436967152])

>>> y = [1, 2, np.inf, 4, np.nan, 6, "a"]
>>> std_plus(y)
array([1.920286436967152])
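
The outputs above are consistent with a population standard deviation (dividing by N) computed after discarding NaN, infinite, and non-numeric entries. The following is a minimal sketch of that calculation, not CorrPy's source; the names clean_numeric and population_std are hypothetical, and skipping booleans is an assumption.

```python
import math
import numbers

import numpy as np

def clean_numeric(values):
    """Hypothetical helper: keep only finite real numbers."""
    kept = []
    for v in values:
        # Assumption: booleans and non-numeric entries (e.g. "a") are skipped.
        if isinstance(v, bool) or not isinstance(v, numbers.Real):
            continue
        if math.isfinite(v):  # drops NaN and +/- infinity
            kept.append(float(v))
    return kept

def population_std(values):
    """Square root of the mean squared deviation from the mean."""
    data = clean_numeric(values)
    if not data:
        raise ValueError("no valid numeric values")
    mean = sum(data) / len(data)
    return math.sqrt(sum((v - mean) ** 2 for v in data) / len(data))

std_x = population_std([1, 2, np.nan, 4, np.nan, 6])
std_y = population_std([1, 2, np.inf, 4, np.nan, 6, "a"])
```

Both calls reduce to the values 1, 2, 4, 6, which is why the two examples share one result.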

Correlation Coefficients (corr_plus)

The correlation coefficient measures the direction of the relationship between two variables as well as its magnitude. This function automatically handles missing values in the input.


Example:

>>> import numpy as np
>>> from CorrPy.corr_plus import corr_plus
>>> x = [1, 2, np.nan, 4, 5]
>>> y = [-6, -7, -8, 9, True]
>>> corr_plus(x, y)
array([0.7391090892601785])
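
The result above matches a Pearson correlation computed over the positions where both values are present, with True counted as 1. Below is a minimal sketch under those assumptions; pearson_complete_pairs is a hypothetical name, and this is not CorrPy's actual code.

```python
import numpy as np

def pearson_complete_pairs(x, y):
    """Hypothetical helper: Pearson r over positions where both values exist."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))  # drop a pair if either side is missing
    dx = x[keep] - x[keep].mean()
    dy = y[keep] - y[keep].mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

x = [1, 2, np.nan, 4, 5]
y = [-6, -7, -8, 9, True]  # the float conversion turns True into 1.0
r = pearson_complete_pairs(x, y)
```

Dropping the third position leaves the pairs (1, -6), (2, -7), (4, 9), (5, 1), which is the same data np.corrcoef would need to be given by hand.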

Covariance Matrix (cov_mx)

A covariance matrix displays variances and covariances together: the diagonal elements are the variances, and the off-diagonal elements are the covariances, as in the example below. This function builds on the two functions above.


Example:

>>> import numpy as np
>>> from CorrPy.cov_mx import cov_mx
>>> x = [1, 2, np.nan, 4, 5]
>>> y = [-6, -7, -8, 9, True]
>>> cov_mx([x, y])
array([[ 2.33333333, 12.66666667],
       [12.66666667, 80.33333333]])
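
The matrix above is consistent with a sample covariance (dividing by n - 1) taken over the positions that are valid in every column, where the boolean entry is treated as invalid too, so x reduces to 1, 2, 4 and y to -6, -7, 9. The sketch below reproduces the calculation under that assumption; is_valid and cov_complete_rows are hypothetical names, not CorrPy's API.

```python
import numbers

import numpy as np

def is_valid(v):
    """Assumption: only finite, non-boolean real numbers count as observed."""
    return (isinstance(v, numbers.Real)
            and not isinstance(v, bool)
            and bool(np.isfinite(v)))

def cov_complete_rows(columns):
    """Hypothetical helper: sample covariance over positions valid in all columns."""
    keep = [all(is_valid(v) for v in row) for row in zip(*columns)]
    cleaned = [[float(v) for v, k in zip(col, keep) if k] for col in columns]
    return np.cov(np.array(cleaned))  # np.cov divides by n - 1 by default

x = [1, 2, np.nan, 4, 5]
y = [-6, -7, -8, 9, True]
matrix = cov_complete_rows([x, y])
```

The diagonal holds the variances of x and y, and the symmetric off-diagonal entries hold their covariance.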

How does the CorrPy package fit into the Python ecosystem?

The following functions already exist in the Python ecosystem. However, they do not handle missing values, so the CorrPy package implements the calculation of standard deviation, correlation coefficients, and the covariance matrix with missing-value handling.

Python Standard Deviation: https://docs.scipy.org/doc/numpy-1.14.2/reference/generated/numpy.std.html

Python Correlation Coefficients: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.corrcoef.html

Python Covariance Matrix: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.cov.html

Milestone Progress

Milestone | Tasks
--- | ---
Milestone 1 | Proposal
Milestone 2 | Function Code, Test Code
