# Multilayer Perceptron for Pancreatic Cell Classification

* For this homework, you will use scRNA-seq data to classify different types of pancreatic cells.

* To do so, you will apply and compare several of the classification algorithms demonstrated in class.


The preprocessed scRNA-seq data were published by:

* [Abdelaal, T.; Michielsen, L.; Cats, D.; Hoogduin, D.; Mei, H.; Reinders, M. J. T.; Mahfouz, A. A Comparison of Automatic Cell Identification Methods for Single-Cell RNA Sequencing Data. Genome Biology 2019, 20 (1), 194.](https://doi.org/10.1186/s13059-019-1795-z)


## Problem Statement

1. Load pancreatic cell labels from `data/Pancreatic_Labels.csv` into a list or vector called `all_labels`, and print this vector. Load the single-cell RNA-seq data from `data/subset_combined_humanpancreas_data.csv` into a pandas data frame called `scRNAseq_data`, making sure to specify that the index column is column 0. Print the head of this data frame. The classes in `all_labels` correspond to rows in `scRNAseq_data`.

2. Isolate the indices in `all_labels` corresponding to either "alpha" or "beta" pancreatic cell labels. Isolate the values in `all_labels` at these indices, and store this new label vector as `labels_ab`. Likewise, isolate the rows in `scRNAseq_data` at these indices, and store these rows in a data frame called `scRNAseq_ab`. Print the **length** of `labels_ab` and the **shape** of `scRNAseq_ab`.

3. We will train classifiers to predict "alpha" or "beta" pancreatic cell types from scRNA-seq data. Our covariates are the mRNA counts in `scRNAseq_ab` and our labels are the "alpha" and "beta" classes in `labels_ab`. Create an 80%/20% train/test split of this data.

4. Using the training set, train a logistic regression classifier to predict pancreatic cell labels. Then, evaluate the classifier performance by reporting its accuracy at predicting cell classes in the test set.

5. Repeat problem 4 for an SVM classifier. You may select a kernel of your choice, but please explicitly specify the kernel when initializing your classifier.

6. Repeat problem 4 for an MLP classifier. You may select an activation function of your choice, but please explicitly specify the activation function when initializing your classifier. You may change the hidden layer architecture from its default settings if you would like, but you do not have to.

7. How did the performance of the three classifiers compare to one another?

## Solutions

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
```

1. Load pancreatic cell labels from `data/Pancreatic_Labels.csv` into a list or vector called `all_labels`, and print this vector. Load the single-cell RNA-seq data from `data/subset_combined_humanpancreas_data.csv` into a pandas data frame called `scRNAseq_data`, making sure to specify that the index column is column 0. Print the head of this data frame. The classes in `all_labels` correspond to rows in `scRNAseq_data`.
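
One possible way to load the two files is sketched here. The helper name `load_pancreas_data` is hypothetical, and the sketch assumes the labels file has a header row and a single column of labels, which the assignment does not state. So that the example runs without the homework data, it writes two tiny made-up CSV files to a temporary directory and loads those instead.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def load_pancreas_data(labels_path, counts_path):
    """Return (all_labels, scRNAseq_data) read from the two CSV files."""
    # Assumption: the labels file has a header row and one column of labels.
    all_labels = pd.read_csv(labels_path).iloc[:, 0].to_numpy()
    # index_col=0 makes the first CSV column the row index.
    scRNAseq_data = pd.read_csv(counts_path, index_col=0)
    return all_labels, scRNAseq_data


# Tiny made-up files so the sketch runs anywhere (not the homework data).
tmp_dir = Path(tempfile.mkdtemp())
labels_file = tmp_dir / "labels.csv"
counts_file = tmp_dir / "counts.csv"
pd.DataFrame({"x": ["alpha", "beta", "delta", "alpha"]}).to_csv(labels_file, index=False)
pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=["cell_0", "cell_1", "cell_2", "cell_3"],
    columns=["GENE_A", "GENE_B", "GENE_C"],
).to_csv(counts_file)

all_labels, scRNAseq_data = load_pancreas_data(labels_file, counts_file)
print(all_labels)
print(scRNAseq_data.head())
```

For the homework itself, the same function would be called with `data/Pancreatic_Labels.csv` and `data/subset_combined_humanpancreas_data.csv`.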

2. Isolate the indices in `all_labels` corresponding to either "alpha" or "beta" pancreatic cell labels. Isolate the values in `all_labels` at these indices, and store this new label vector as `labels_ab`. Likewise, isolate the rows in `scRNAseq_data` at these indices, and store these rows in a data frame called `scRNAseq_ab`. Print the **length** of `labels_ab` and the **shape** of `scRNAseq_ab`.
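
A minimal sketch of the filtering step, using `np.isin` to find the matching positions and `DataFrame.iloc` to select rows by position. The `all_labels` and `scRNAseq_data` values here are small made-up stand-ins for the objects loaded in problem 1, and `ab_idx` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for `all_labels` and `scRNAseq_data` from problem 1.
all_labels = np.array(["alpha", "beta", "delta", "alpha", "gamma", "beta"])
scRNAseq_data = pd.DataFrame(
    np.arange(18).reshape(6, 3),
    index=[f"cell_{i}" for i in range(6)],
    columns=["GENE_A", "GENE_B", "GENE_C"],
)

# Positions of the cells labeled "alpha" or "beta".
ab_idx = np.where(np.isin(all_labels, ["alpha", "beta"]))[0]

labels_ab = all_labels[ab_idx]
# iloc selects rows by position, so the rows stay aligned with labels_ab.
scRNAseq_ab = scRNAseq_data.iloc[ab_idx]

print(len(labels_ab))
print(scRNAseq_ab.shape)
```

Selecting by position relies on the statement in problem 1 that the entries of `all_labels` correspond, in order, to the rows of `scRNAseq_data`.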

3. We will train classifiers to predict "alpha" or "beta" pancreatic cell types from scRNA-seq data. Our covariates are the mRNA counts in `scRNAseq_ab` and our labels are the "alpha" and "beta" classes in `labels_ab`. Create an 80%/20% train/test split of this data. Print the number of samples in the training set.
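
The split could be done with `train_test_split`, as in the sketch below. The Poisson-distributed toy counts stand in for `scRNAseq_ab` and `labels_ab`, and the `random_state` and `stratify` arguments are optional choices made here for reproducibility and class balance, not requirements of the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-ins for `scRNAseq_ab` and `labels_ab` from problem 2.
rng = np.random.default_rng(0)
labels_ab = np.array(["alpha"] * 60 + ["beta"] * 40)
scRNAseq_ab = pd.DataFrame(
    rng.poisson(5, size=(100, 8)),
    columns=[f"GENE_{j}" for j in range(8)],
)

# test_size=0.2 gives the 80%/20% split; stratify keeps the alpha/beta
# proportions similar in both parts (an optional choice).
X_train, X_test, y_train, y_test = train_test_split(
    scRNAseq_ab, labels_ab, test_size=0.2, random_state=0, stratify=labels_ab
)

print(X_train.shape[0])
```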

4. Using the training set, train a logistic regression classifier to predict pancreatic cell labels. Then, evaluate the classifier performance by reporting its accuracy at predicting cell classes in the test set.
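
A hedged sketch of the logistic regression step follows. It runs on made-up Poisson counts in which "alpha" cells have high `GENE_0` and "beta" cells have high `GENE_1`, so the accuracy it prints says nothing about the real data. Setting `max_iter=1000` is an assumption made here to give the solver room to converge on raw counts.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy counts: 100 "alpha" cells, then 100 "beta" cells.
rng = np.random.default_rng(0)
rates = np.full((2, 10), 2.0)
rates[0, 0] = rates[1, 1] = 20.0  # one marker-like gene per class
scRNAseq_ab = pd.DataFrame(
    rng.poisson(np.repeat(rates, 100, axis=0)),
    columns=[f"GENE_{j}" for j in range(10)],
)
labels_ab = np.repeat(["alpha", "beta"], 100)

X_train, X_test, y_train, y_test = train_test_split(
    scRNAseq_ab, labels_ab, test_size=0.2, random_state=0, stratify=labels_ab
)

# Fit on the training set only, then score on the held-out test set.
lr_clf = LogisticRegression(max_iter=1000)
lr_clf.fit(X_train, y_train)

# score() returns the mean accuracy on the given data.
lr_accuracy = lr_clf.score(X_test, y_test)
print(f"Logistic regression test accuracy: {lr_accuracy:.3f}")
```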

5. Repeat problem 4 for an SVM classifier. You may select a kernel of your choice, but please explicitly specify the kernel when initializing your classifier.
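
The same pattern with a support vector machine might look like this. The linear kernel is one arbitrary choice among those `SVC` accepts (`"linear"`, `"poly"`, `"rbf"`, `"sigmoid"`), and the data are again made-up toy counts rather than the homework data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical toy counts: 100 "alpha" cells, then 100 "beta" cells.
rng = np.random.default_rng(0)
rates = np.full((2, 10), 2.0)
rates[0, 0] = rates[1, 1] = 20.0  # one marker-like gene per class
scRNAseq_ab = pd.DataFrame(
    rng.poisson(np.repeat(rates, 100, axis=0)),
    columns=[f"GENE_{j}" for j in range(10)],
)
labels_ab = np.repeat(["alpha", "beta"], 100)

X_train, X_test, y_train, y_test = train_test_split(
    scRNAseq_ab, labels_ab, test_size=0.2, random_state=0, stratify=labels_ab
)

# The kernel is named explicitly, as the problem asks.
svm_clf = SVC(kernel="linear")
svm_clf.fit(X_train, y_train)

svm_accuracy = svm_clf.score(X_test, y_test)
print(f"SVM (linear kernel) test accuracy: {svm_accuracy:.3f}")
```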

6. Repeat problem 4 for an MLP classifier. You may select an activation function of your choice, but please explicitly specify the activation function when initializing your classifier. You may change the hidden layer architecture from its default settings if you would like, but you do not have to.
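
For the multilayer perceptron, a sketch under the same toy-data assumption is shown here. The ReLU activation and the single hidden layer of 100 units match the scikit-learn defaults but are written out explicitly; `max_iter=500` and `random_state=0` are choices made here so the run is repeatable and has room to converge.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Hypothetical toy counts: 100 "alpha" cells, then 100 "beta" cells.
rng = np.random.default_rng(0)
rates = np.full((2, 10), 2.0)
rates[0, 0] = rates[1, 1] = 20.0  # one marker-like gene per class
scRNAseq_ab = pd.DataFrame(
    rng.poisson(np.repeat(rates, 100, axis=0)),
    columns=[f"GENE_{j}" for j in range(10)],
)
labels_ab = np.repeat(["alpha", "beta"], 100)

X_train, X_test, y_train, y_test = train_test_split(
    scRNAseq_ab, labels_ab, test_size=0.2, random_state=0, stratify=labels_ab
)

# The activation function is named explicitly, as the problem asks.
mlp_clf = MLPClassifier(
    activation="relu",
    hidden_layer_sizes=(100,),
    max_iter=500,
    random_state=0,
)
mlp_clf.fit(X_train, y_train)

mlp_accuracy = mlp_clf.score(X_test, y_test)
print(f"MLP (ReLU) test accuracy: {mlp_accuracy:.3f}")
```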

7. How did the performance of the three classifiers compare to one another?