# Assembly and demonstration of Kathode's rsMRI QC package

##### This project aims to produce a Python package that takes a CSV spreadsheet describing a large number of resting-state MRI scans and performs automated QC to determine which scans are usable and which should be excluded from future analyses.

## Setting up the package

In [4]:
package_name = "kathodes_package"
%mv package_name/ kathodes_package

In [5]:
from pathlib import Path
python_dir = Path(package_name)
(python_dir / '__init__.py').touch()
Path('setup.py').touch()
Path('LICENSE').touch()
Path('README.md').touch()
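The cell above assumes that the `kathodes_package/` directory already exists, and later cells write into `tests/` without creating it. A minimal sketch of building the whole skeleton in one step, assuming the layout used in this notebook (package directory, `tests/`, and top-level `setup.py`, `LICENSE`, `README.md`); `make_skeleton` is a hypothetical helper name:

```python
from pathlib import Path
import tempfile


def make_skeleton(root, package_name):
    """Create package and test directories plus empty top-level files under root (hypothetical helper)."""
    root = Path(root)
    package_dir = root / package_name
    # parents/exist_ok make the call safe to re-run on an existing tree
    package_dir.mkdir(parents=True, exist_ok=True)
    (root / "tests").mkdir(exist_ok=True)
    (package_dir / "__init__.py").touch()
    for name in ("setup.py", "LICENSE", "README.md"):
        (root / name).touch()
    # Return every created path relative to root, in a stable order
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for entry in make_skeleton(tmp, "kathodes_package"):
            print(entry)
```

Because every step tolerates existing files and directories, re-running the cell after a kernel restart does not raise.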

#### Adding setup.py

In [6]:
%%writefile setup.py
import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="kathodes_package", 
    version="0.0.1",
    author="Katherine Soderberg",
    author_email="katherine.soderberg@nih.gov",
    description="Automated QC of resting-state MRI scans listed in a CSV spreadsheet",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/pypa/packaging_demo",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    python_requires='>=3.6',
    install_requires=['pandas'],
)

Overwriting setup.py


#### Creating README file

In [1]:
%%writefile README.md
# kathodes_package

This package reads in a table of data describing resting-state MRI scans and filters
it by specific scanner metrics to perform automated quality control. It reports which
scans are of high enough quality to proceed with processing.

Overwriting README.md


#### Creating LICENSE file

In [6]:
%%writefile LICENSE
Copyright (c) 2018 The Python Packaging Authority

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Overwriting LICENSE


## Coding the content of the package

#### Creating filesummary module

In [3]:
%%writefile kathodes_package/filesummary.py
import pandas as pd


def summarize(filename):
    """Print basic information about a csv file whose rows are scan instances."""
    if filename.endswith('.csv'):
        loaded_file = pd.read_csv(filename)
        print("The head of the file is:" + "\n" + str(loaded_file.head()))
        print("Summary information about the file is as follows:")
        # DataFrame.info() prints its report itself and returns None, so don't wrap it in print()
        loaded_file.info()
        number_of_scans = str(len(loaded_file.index))
        print("The file contains information about " + number_of_scans + " scans.")
        closing_statement = filename + " has been summarized."
        return closing_statement
    else:
        print("The file provided is not in comma separated value format. Please convert your input file to .csv format before using this package.")


def listColumns(filename):
    """Return the column names of a csv file as a pandas Index."""
    loaded_file = pd.read_csv(filename)
    colNames_list = loaded_file.columns
    return colNames_list


Overwriting kathodes_package/filesummary.py
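`summarize` prints its findings and returns only a closing string, which makes the numbers hard to reuse or test. A minimal sketch of an alternative that returns them as a dictionary, assuming pandas is installed; `summary_dict` is a hypothetical name, the suffix check is made case-insensitive so that `scans.CSV` is also accepted, and raising `ValueError` instead of printing is a design choice, not the package's current behaviour:

```python
from pathlib import Path
import tempfile

import pandas as pd


def summary_dict(filename):
    """Return scan count, column names and missing-cell counts of a CSV file (hypothetical helper)."""
    # Compare the suffix case-insensitively so ".CSV" is accepted too
    if Path(filename).suffix.lower() != ".csv":
        raise ValueError(filename + " is not a .csv file; please convert it first.")
    loaded_file = pd.read_csv(filename)
    return {
        "n_scans": len(loaded_file.index),
        "columns": loaded_file.columns.tolist(),
        # read_csv stores empty cells as NaN, so isna() counts them per column
        "n_missing": loaded_file.isna().sum().to_dict(),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "toy_scans.CSV"
        # Toy rows for illustration only
        path.write_text("sub,rest1.NumOutliers\nsub01,10\nsub02,\n")
        print(summary_dict(str(path)))
```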


#### Writing test for filesummary to ensure correct file type

In [2]:
%%writefile tests/test_input.py

def test_input_is_csv():
    from kathodes_package import filesummary
    filename = 'sample.csv'
    output = filesummary.summarize(filename)
    assert output == "sample.csv has been summarized."

Overwriting tests/test_input.py


In [2]:
!pytest

platform darwin -- Python 3.7.4, pytest-5.2.1, py-1.8.0, pluggy-0.13.0
rootdir: /Users/katherinesoderberg/Documents/PythonClass/project_spring_2020
plugins: arraydiff-0.3, remotedata-0.3.2, doctestplus-0.4.0, openfiles-0.4.0
collected 2 items

tests/sample_test.py .                                                   [ 50%]
tests/test_input.py .                                                    [100%]



#### Creating data cleaning module

In [2]:
%%writefile kathodes_package/data_cleaning.py
import pandas as pd


def removeMissing(filename):
    """Load a csv file and drop scans with no value in the fourth column, printing the subject name and reason for each removal."""
    loaded_file = pd.read_csv(filename)
    # read_csv stores empty cells as NaN, so test with isna() rather than == ""
    missing_mask = loaded_file.iloc[:, 3].isna()
    for _, scan in loaded_file[missing_mask].iterrows():
        # Column positions kept from the original: 0 = subject name, 1 = reason
        print("Dropping subject scan " + str(scan.iloc[0]) + " because of " + str(scan.iloc[1]))
    print("There were " + str(missing_mask.sum()) + " scans with missing data dropped.")
    cleaned_df = loaded_file[~missing_mask].reset_index(drop=True)
    return cleaned_df


def voxelConsistency(cleaned_dataframe, column_number, expected_size):
    """Check that every scan has the voxel dimension expected_size in the column at position column_number."""
    consistency_boolean = True
    for _, scan in cleaned_dataframe.iterrows():
        if scan.iloc[column_number] != expected_size:
            print("Subject scan " + str(scan.iloc[0]) + " does not have voxel size of " + str(expected_size))
            consistency_boolean = False
    return consistency_boolean


Writing kathodes_package/data_cleaning.py
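Both functions above loop row by row. A minimal vectorized sketch of the same two checks using pandas boolean masks, assuming the column names seen in the demo spreadsheet at the end of this notebook (`sub`, `rest1.TR`, `rest1.nvox1` to `rest1.nvox3`); the function names and toy rows are hypothetical. `np.isclose` replaces `==` so that voxel sizes stored with small rounding noise still match:

```python
import numpy as np
import pandas as pd


def drop_missing(df, required_column):
    """Drop rows whose required_column is empty (NaN), reporting each dropped subject (hypothetical helper)."""
    missing_mask = df[required_column].isna()
    for subject in df.loc[missing_mask, "sub"]:
        print("Dropping subject scan " + str(subject) + ": no value in " + required_column)
    return df.loc[~missing_mask].reset_index(drop=True)


def voxel_mismatches(df, voxel_columns, expected_sizes):
    """Return the subjects whose voxel dimensions differ from expected_sizes (hypothetical helper)."""
    values = df[voxel_columns].to_numpy(dtype=float)
    # Broadcast the expected (x, y, z) sizes across every row; all three must match
    matches = np.isclose(values, np.asarray(expected_sizes, dtype=float)).all(axis=1)
    return df.loc[~matches, "sub"].tolist()


if __name__ == "__main__":
    # Toy rows for illustration only
    scans = pd.DataFrame({
        "sub": ["sub01", "sub02", "sub03"],
        "rest1.TR": [2.0, np.nan, 2.0],
        "rest1.nvox1": [1.875, 1.875, 2.0],
        "rest1.nvox2": [1.875, 1.875, 2.0],
        "rest1.nvox3": [3.0, 3.0, 3.0],
    })
    cleaned = drop_missing(scans, "rest1.TR")
    print(voxel_mismatches(cleaned, ["rest1.nvox1", "rest1.nvox2", "rest1.nvox3"], [1.875, 1.875, 3.0]))
```

Selecting columns by name rather than by position keeps the checks correct if the spreadsheet's column order changes.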


#### Creating outlier assessment module

In [3]:
%%writefile kathodes_package/outlier_assessment.py
import pandas as pd


def outlierStats(outlier_list):
    """Take a list of outlier counts and return their mean and standard deviation."""
    try:
        # A plain list has no .mean() or .std(), so wrap it in a pandas Series first
        outlier_series = pd.Series(outlier_list)
        outlierMean = outlier_series.mean()
        outlierStdev = outlier_series.std()
        return outlierMean, outlierStdev
    except TypeError:
        # Raise instead of returning a string, which the caller could not unpack
        raise TypeError("Cannot compute statistics on a list of non-numerical elements.")


def outlierExclude(cleaned_dataframe, column_number, stdev_cutoff_factor):
    """Use outlierStats to drop scans whose outlier count lies more than stdev_cutoff_factor standard deviations from the mean."""
    column_as_list = cleaned_dataframe.iloc[:, column_number].tolist()
    mean, stdev = outlierStats(column_as_list)
    upper_threshold = mean + (stdev * stdev_cutoff_factor)
    lower_threshold = mean - (stdev * stdev_cutoff_factor)
    keep_rows = []
    for _, scan in cleaned_dataframe.iterrows():
        value = scan.iloc[column_number]
        # Use `or`, not `|`: `|` binds more tightly than the comparisons
        if value > upper_threshold or value < lower_threshold:
            print("Dropping subject scan " + str(scan.iloc[0]) + " due to " + str(value) + " outlying volumes.")
            keep_rows.append(False)
        else:
            keep_rows.append(True)
    noOutlier_dataframe = cleaned_dataframe[keep_rows]
    return noOutlier_dataframe

Writing kathodes_package/outlier_assessment.py
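The same mean ± k·SD rule can be written without an explicit loop using `Series.between`, which is inclusive at both ends by default. A minimal sketch, assuming the subject and outlier-count columns are named `sub` and `rest1.NumOutliers` as in the demo spreadsheet; `exclude_outliers` and the toy counts are hypothetical:

```python
import pandas as pd


def exclude_outliers(df, column, stdev_cutoff_factor):
    """Keep only rows whose value in column lies within mean +/- factor * stdev (hypothetical helper)."""
    values = df[column]
    # Series.std() uses ddof=1, i.e. the sample standard deviation
    mean, stdev = values.mean(), values.std()
    lower = mean - stdev_cutoff_factor * stdev
    upper = mean + stdev_cutoff_factor * stdev
    keep = values.between(lower, upper)
    for subject, count in zip(df.loc[~keep, "sub"], values[~keep]):
        print("Dropping subject scan " + str(subject) + " due to " + str(count) + " outlying volumes.")
    return df.loc[keep]


if __name__ == "__main__":
    # Toy outlier counts for illustration only
    scans = pd.DataFrame({
        "sub": ["sub01", "sub02", "sub03", "sub04", "sub05", "sub06"],
        "rest1.NumOutliers": [10, 12, 11, 9, 13, 60],
    })
    print(exclude_outliers(scans, "rest1.NumOutliers", 1.5)["sub"].tolist())
```

With a factor of 1.5 the toy scan with 60 outlying volumes is dropped, but with a factor of 3 it is kept, because that single large value inflates both the mean and the standard deviation. On small samples a robust spread measure such as the median absolute deviation may be worth considering.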


In [None]:

# see http://katyhuff.github.io/python-testing/03-exceptions/
def mean(num_list):
    try:
        return sum(num_list)/len(num_list)
    except ZeroDivisionError :
        return 0
    except TypeError as detail :
        msg = ("The algebraic mean of a non-numerical list is undefined. "
               "Please provide a list of numbers.")
        raise TypeError(detail.__str__() + "\n" +  msg)
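The scratch cell above turns a `TypeError` into a clearer message but returns 0 for an empty list, which would silently treat an empty column as "no outliers". A sketch of a stricter variant using the standard-library `statistics` module, assuming the caller prefers an error over a default value; `outlier_stats` is a hypothetical name. `raise ... from err` keeps the original exception attached as `__cause__`:

```python
import statistics


def outlier_stats(values):
    """Return (mean, sample standard deviation) of values, with clearer errors (hypothetical helper)."""
    values = list(values)
    if len(values) < 2:
        # statistics.stdev needs at least two data points
        raise ValueError("Need at least two scans to compute a standard deviation.")
    try:
        return statistics.mean(values), statistics.stdev(values)
    except TypeError as err:
        raise TypeError("The mean of a non-numerical list is undefined. "
                        "Please provide a list of numbers.") from err


if __name__ == "__main__":
    print(outlier_stats([2, 4, 4, 4, 5, 5, 7, 9]))
```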

In [7]:
pip install -e .

Obtaining file:///Users/katherinesoderberg/Documents/PythonClass/project_spring_2020
Installing collected packages: kathodes-package
  Found existing installation: kathodes-package 0.0.1
    Can't uninstall 'kathodes-package'. No files were found to uninstall.
  Running setup.py develop for kathodes-package
Successfully installed kathodes-package
Note: you may need to restart the kernel to use updated packages.
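To confirm from Python that the editable install is visible to the current kernel (after the restart suggested above), `importlib.metadata.version` can be queried. A minimal sketch, assuming Python 3.8 or later (the notebook ran 3.7.4, where the `importlib_metadata` backport would be needed instead); `installed_version` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string of a distribution, or None if it cannot be found."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Name as declared in setup.py; prints None if the install is not visible
    print(installed_version("kathodes_package"))
```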


In [8]:
import kathodes_package


## Demonstrating the package with an example spreadsheet

In [1]:
from kathodes_package import filesummary
filesummary.summarize("restqclist.040220_KS.csv")
column_names = filesummary.listColumns("restqclist.040220_KS.csv")
print(column_names)

The head of the file is:
                         sub rest1.date.seq  rest1.NumOutliers  rest1.TR  \
0  abductor_lothian.06192015        61915.4               10.0       2.0   
1  abductor_lothian.11082017       110817.4               15.0       2.0   
2    addict_bavduin.09262013        92613.4               12.0       2.0   
3    address_humans.04102014        41014.4                5.0       2.0   
4    address_humans.04082016        40816.4               19.0       2.0   

   rest1.nvox1  rest1.nvox2  rest1.nvox3  rest1.numTRs Keep?  rest2.date.seq  \
0        1.875        1.875          3.0         184.0     y         61915.5   
1        1.875        1.875          3.0         184.0     y        112917.4   
2        1.875        1.875          3.0         184.0     y         92613.5   
3        1.875        1.875          3.0         184.0     y         41014.5   
4        1.875        1.875          3.0         184.0     y         40816.5   

   ...  rest7.nvox2  rest7.nvox3  res