# Tutorial
This notebook is for those who are new to scProject. We recommend using a Python virtual environment and running these .ipynb files from the directory they ship in (the 'test' subfolder of the repository), because the cells below load their data by relative path.

Import the necessary packages.

In [None]:
import scanpy as sc
import scProject

Read in the datasets with scanpy.

In [2]:
patterns = sc.read_h5ad('data/patterns_anndata.h5ad')
dataset = sc.read_h5ad('data/test_target.h5ad') # ignore the error message

In this case, the patterns and the dataset do not share the same set of genes, so we keep only the genes common to both (their set intersection).

In [None]:
dataset_filtered, patterns_filtered = scProject.matcher.filterAnnDatas(dataset, patterns, 'gene_id')
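
To make the idea of a gene intersection concrete, here is a minimal NumPy sketch of the same operation on toy data. It is not scProject's implementation: the gene identifiers and matrices are made up, and it assumes genes are stored as columns in both matrices.

```python
import numpy as np

# Hypothetical gene identifiers for the two objects (not taken from the tutorial data).
dataset_genes = np.array(["ENSG01", "ENSG02", "ENSG03", "ENSG05"])
pattern_genes = np.array(["ENSG05", "ENSG02", "ENSG04", "ENSG01"])

# Genes present in both arrays, plus the position of each shared gene in each array.
shared, dataset_idx, pattern_idx = np.intersect1d(
    dataset_genes, pattern_genes, return_indices=True
)

# Toy matrices with genes as columns: 3 cells and 2 patterns.
dataset_X = np.arange(12, dtype=float).reshape(3, 4)
patterns_X = np.arange(8, dtype=float).reshape(2, 4)

# Subset and reorder the columns so both matrices list the same genes in the same order.
dataset_X_shared = dataset_X[:, dataset_idx]
patterns_X_shared = patterns_X[:, pattern_idx]

print(shared)
print(dataset_X_shared.shape, patterns_X_shared.shape)
```

The important detail is that both matrices end up with the same genes in the same order, which the regression in the next step relies on.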

Now that both objects have the same genes, we can run a regression to estimate how much each pattern is used in the dataset. In our case, the "discovered" matrix is stored in dataset_filtered.obsm['retinaProject'].

In [None]:
scProject.rg.NNLR_ElasticNet(dataset_filtered, patterns_filtered, 'retinaProject', alpha=.01, L1=.01)
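
For intuition about what a non-negative elastic net regression computes, the following is a small, self-contained sketch written with NumPy only. It is an illustration under assumptions, not scProject's code: the function `nonneg_elastic_net`, the objective scaling, the matrix orientation (patterns as rows, genes as columns), and the toy data are all hypothetical choices made for this example.

```python
import numpy as np

def nonneg_elastic_net(X, y, alpha=0.01, l1_ratio=0.01, n_iter=5000):
    """Projected gradient descent for
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*sum(w) + 0.5*alpha*(1-l1_ratio)*||w||^2
    subject to w >= 0 (hypothetical helper, not part of scProject)."""
    n_samples, n_features = X.shape
    l1 = alpha * l1_ratio
    l2 = alpha * (1.0 - l1_ratio)
    # Step size from the Lipschitz constant of the gradient of the smooth part.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n_samples + l2)
    w = np.zeros(n_features)
    for _ in range(n_iter):
        # For w >= 0 the L1 penalty is linear, so its gradient is the constant l1.
        grad = X.T @ (X @ w - y) / n_samples + l2 * w + l1
        # Project back onto the non-negative orthant.
        w = np.maximum(0.0, w - step * grad)
    return w

rng = np.random.default_rng(0)
patterns = rng.random((3, 200))    # toy data: 3 patterns x 200 genes
true_usage = rng.random((5, 3))    # toy data: 5 cells x 3 patterns
true_usage[0, 1] = 0.0             # one pattern unused by the first cell
dataset = true_usage @ patterns    # 5 cells x 200 genes, noise-free

# Regress each cell's expression on the patterns; rows of `usage` are cells.
usage = np.vstack([nonneg_elastic_net(patterns.T, cell) for cell in dataset])
print(usage.shape)
```

Each row of `usage` plays the role of one cell's row in the projected matrix: a non-negative weight per pattern. Larger `alpha` values shrink the weights more strongly toward zero.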

We now have our pattern matrix, as described in "Enter the Matrix: Factorization Uncovers Knowledge from Omics". To see whether certain features correlate with a cell type, we compute a Pearson correlation matrix and plot it.

In [None]:
scProject.viz.pearsonMatrix(dataset_filtered, patterns_filtered, 'CellType', 12, 'retinaProject', 'PearsonRetina', True)
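
One plausible way to relate features to cell types is to correlate each feature's coefficients with a 0/1 indicator for each cell type. The sketch below shows that calculation on toy data. Whether scProject computes its matrix exactly this way is an assumption, and the `usage` values and labels are invented for illustration.

```python
import numpy as np

# Hypothetical inputs: usage is cells x features, cell_types holds one label per cell.
usage = np.array([
    [0.9, 0.0],
    [0.8, 0.1],
    [0.1, 0.7],
    [0.0, 0.9],
    [0.2, 0.8],
])
cell_types = np.array(["rod", "rod", "cone", "cone", "cone"])

labels = np.unique(cell_types)
# One-hot indicator matrix: rows are cells, columns are cell types.
indicator = (cell_types[:, None] == labels[None, :]).astype(float)

# np.corrcoef treats rows as variables, so stack the indicator and usage columns
# and keep only the cell-type-by-feature block of the full correlation matrix.
n_types = indicator.shape[1]
corr = np.corrcoef(indicator.T, usage.T)[:n_types, n_types:]

for label, row in zip(labels, corr):
    print(label, np.round(row, 2))
```

Rows of `corr` correspond to cell types and columns to features, so a value near 1 means the feature is used mostly by that cell type.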

Next, we project the pattern matrix down to two dimensions with UMAP and plot the result, colored by cell type.

In [None]:
scProject.viz.UMAP_Projection(dataset_filtered, 'CellType', 'retinaProject', 'retinaUMAP', 12, plot=True)

Now we make plots that show the usage of each feature in each sample. Each point (cell) is colored by its coefficient for that feature. A few metrics are also displayed above each plot to help interpret how the feature is used in the dataset. Here we plot only the first 10 features. Note that although feature 3 has a very high Pearson correlation with microglia, this is mainly because microglia have many zeros for feature 3, which produces a high Pearson coefficient.

In [None]:
scProject.viz.featurePlots(dataset_filtered, 10, 'retinaProject', 'retinaUMAP')
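
When a correlation looks suspicious, as with feature 3 above, one simple check is to count how many coefficients are exactly zero within each cell type. The sketch below does this on toy values. The arrays are hypothetical and are not taken from the retina dataset.

```python
import numpy as np

# Hypothetical coefficients for one feature, with a matching cell-type label per cell.
feature_usage = np.array([0.0, 0.0, 0.0, 0.4, 0.0, 0.7, 0.2, 0.9])
cell_types = np.array(["microglia"] * 4 + ["rod"] * 4)

# Fraction of cells in each type whose coefficient for this feature is exactly zero.
zero_fraction = {
    str(label): float(np.mean(feature_usage[cell_types == label] == 0.0))
    for label in np.unique(cell_types)
}
print(zero_fraction)
```

With real data, `feature_usage` would be one column of the projected matrix, and a cell type dominated by zeros is a hint to read its correlation with caution.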

And that's it! You have now learned how to use scProject. For more information, please refer to the documentation.