## Introduction to Seurat:
In this project, you will predominantly use Seurat, an R package that combines many functions for the
analysis of single-cell data. For the initial steps, you can find helpful documentation here. Other packages will be
mentioned explicitly when they are required for specific tasks.

Before you start programming, you should set up the system as follows:

## System-setup:
1. Install Conda

2. Install all packages using the provided environment.yml file. [Mac users: delete the SingleR line from the
.yml file before running the following command]

`conda env create -f environment.yml`

3. Start the conda environment with:

`conda activate single-cell`

4. Install CellChat by starting R and installing it with devtools:

`devtools::install_github("sqjin/CellChat")`

If you are using a Mac, delete the line for SingleR from the environment.yml file and install
SingleR manually using the following commands in R:

```R 
# Install BiocManager first if it is not available yet
if (!require("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

BiocManager::install("SingleR")
```


5. If you have problems with Seurat’s clustering functions, try downgrading the Matrix and spatstat.core
packages:

`remove.packages(grep("spatstat", rownames(installed.packages()), value = TRUE))`

`devtools::install_version("spatstat", version = "2.4.4")`

`install.packages("Matrix", ".", type = "source", repos = "http://R-Forge.R-p6.`

Test whether all necessary libraries are installed:

```R 
suppressPackageStartupMessages({
library(dplyr)
library(spatstat.core)
library(Seurat)
library(patchwork)
library(DoubletFinder)
library(SingleR)
library(enrichR)
library(CellChat)
library(SingleCellExperiment)
library(SeuratWrappers)
library(tidyverse)
library(monocle3)
library(celldex)
})
```
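
Because `library()` stops at the first package that fails to load, it can be convenient to check all packages in one pass. The following is a minimal sketch using base R's `requireNamespace()`; the helper name `find_missing_packages` is made up for illustration and is not part of the course material.

```R
# Packages expected by the project (same list as in the check above)
required_packages <- c(
  "dplyr", "spatstat.core", "Seurat", "patchwork", "DoubletFinder",
  "SingleR", "enrichR", "CellChat", "SingleCellExperiment",
  "SeuratWrappers", "tidyverse", "monocle3", "celldex"
)

# Hypothetical helper: return the names of all packages that cannot be loaded
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_packages <- find_missing_packages(required_packages)
if (length(missing_packages) > 0) {
  message("Missing packages: ", paste(missing_packages, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Any package reported as missing can then be installed individually before continuing.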

In [18]:
library(dplyr)
# library(spatstat.core)
library(Seurat)
library(patchwork)
library(DoubletFinder)
library(SingleR)
# library(enrichR)
library(CellChat)
library(SingleCellExperiment)
library(SeuratWrappers)
library(tidyverse)
# library(monocle3)
# library(celldex)

## Download the data:
You can download the dataset for this project from the following link: [https://icbb-share.s3.eu-central-1.amazonaws.com/single-cell-bioinformatics/scbi_ds1.zip](https://icbb-share.s3.eu-central-1.amazonaws.com/single-cell-bioinformatics/scbi_ds1.zip). The file contains the data of four samples:
BMMC_D1T1, BMMC_D1T2, CD34_D2T1 and CD34_D3T1, with separate expression matrices for each sample.
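
The download can also be done from within R. The sketch below uses base R's `download.file()` and `unzip()`; the helper name `fetch_dataset`, the local file name `scbi_ds1.zip`, and the assumption that the archive unpacks into a `scbi_ds1/` folder are illustrative and not confirmed by the course material.

```R
# URL taken from the link above; the local paths below are assumptions
dataset_url <- "https://icbb-share.s3.eu-central-1.amazonaws.com/single-cell-bioinformatics/scbi_ds1.zip"

# Hypothetical helper: download the archive (unless already present) and extract it
fetch_dataset <- function(url, zip_path = "scbi_ds1.zip", exdir = ".") {
  if (!file.exists(zip_path)) {
    # mode = "wb" keeps the zip file intact on Windows
    download.file(url, destfile = zip_path, mode = "wb")
  }
  unzip(zip_path, exdir = exdir)
}

# Uncomment to run; afterwards the extracted .rds files can be listed
# fetch_dataset(dataset_url)
# list.files("scbi_ds1", pattern = "\\.rds$")
```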

## Week 1: (5 Points)
### 1 Loading the Data (1P)
Load the expression matrices from the dataset and construct a Seurat object. You will need to load two files: one
containing data on Bone Marrow Mononuclear Cells (BMMC) and the other on CD34+ Enriched Bone Marrow
Cells (CD34).

In [30]:
# Load the data on Bone Marrow Mononuclear Cells (BMMC)
BMMC_D1T1 <- readRDS("scbi_ds1/GSM4138872_scRNA_BMMC_D1T1.rds")
BMMC_D1T2 <- readRDS("scbi_ds1/GSM4138873_scRNA_BMMC_D1T2.rds")
# Load the data on CD34+ Cells
CD34_D2T1 <- readRDS("scbi_ds1/GSM4138874_scRNA_CD34_D2T1.rds")
CD34_D3T1 <- readRDS("scbi_ds1/GSM4138875_scRNA_CD34_D3T1.rds")


In [31]:
# Create Seurat objects for each sample
BMMC_D1T1 <- CreateSeuratObject(BMMC_D1T1, project = "BMMC_D1T1")
BMMC_D1T2 <- CreateSeuratObject(BMMC_D1T2, project = "BMMC_D1T2")
CD34_D2T1 <- CreateSeuratObject(CD34_D2T1, project = "CD34_D2T1")
CD34_D3T1 <- CreateSeuratObject(CD34_D3T1, project = "CD34_D3T1")
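
The load-and-create calls above all follow the same pattern, so they could also be written as a loop over a named vector of file paths. The sketch below demonstrates the idea on two small synthetic count matrices written to a temporary directory, so that it runs without the real data. The toy file names and the helpers `make_toy_counts` and `load_sample` are illustrative assumptions; it also assumes each `.rds` file holds a count matrix that `CreateSeuratObject()` accepts, as the code above implies.

```R
library(Seurat)

# Build a small synthetic count matrix (genes x cells) as a stand-in for real data
set.seed(1)
make_toy_counts <- function(n_genes = 5, n_cells = 3) {
  m <- matrix(
    rpois(n_genes * n_cells, lambda = 5), nrow = n_genes,
    dimnames = list(paste0("gene", seq_len(n_genes)),
                    paste0("cell", seq_len(n_cells)))
  )
  Matrix::Matrix(m, sparse = TRUE)
}

# Write two toy samples to a temporary directory
toy_dir <- tempfile("toy_scbi_")
dir.create(toy_dir)
sample_files <- c(
  toy_A = file.path(toy_dir, "toy_A.rds"),
  toy_B = file.path(toy_dir, "toy_B.rds")
)
for (path in sample_files) saveRDS(make_toy_counts(), path)

# Hypothetical helper: read one matrix and wrap it in a Seurat object
load_sample <- function(path, name) {
  CreateSeuratObject(counts = readRDS(path), project = name)
}

# Named list of Seurat objects, one entry per sample
samples <- Map(load_sample, sample_files, names(sample_files))
```

With the real data, `sample_files` would instead hold the four `scbi_ds1/GSM...rds` paths, named by sample.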

I create a metadata data frame based on Table 1.

In [33]:
# Create metadata dataframe
metadata <- data.frame(
  Sample = c("BMMC_D1T1", "BMMC_D1T2", "CD34_D2T1", "CD34_D3T1"),
  Donor = c("D1", "D1", "D2", "D3"),
  Replicate = c("T1", "T2", "T1", "T1"),
  Sex = c("F", "F", "M", "F")
)

metadata


| Sample `<chr>` | Donor `<chr>` | Replicate `<chr>` | Sex `<chr>` |
|---|---|---|---|
| BMMC_D1T1 | D1 | T1 | F |
| BMMC_D1T2 | D1 | T2 | F |
| CD34_D2T1 | D2 | T1 | M |
| CD34_D3T1 | D3 | T1 | F |


### 2 Create the sample sheet (1P)
Question: Label each sample with the corresponding metadata from Table 1.

I add the metadata information for each sample to the respective Seurat object.

In [34]:
# Add metadata to each Seurat object
BMMC_D1T1$Donor <- "D1"
BMMC_D1T1$Replicate <- "T1"
BMMC_D1T1$Sex <- "F"

BMMC_D1T2$Donor <- "D1"
BMMC_D1T2$Replicate <- "T2"
BMMC_D1T2$Sex <- "F"

CD34_D2T1$Donor <- "D2"
CD34_D2T1$Replicate <- "T1"
CD34_D2T1$Sex <- "M"

CD34_D3T1$Donor <- "D3"
CD34_D3T1$Replicate <- "T1"
CD34_D3T1$Sex <- "F"
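
The same labels could also be taken directly from the sample sheet instead of being typed per object, which avoids copy-and-paste mistakes. The following is a sketch under the assumption that the Seurat objects are kept in a named list called `samples` (a name introduced here for illustration); toy objects stand in for the real data so that the example runs on its own.

```R
library(Seurat)

# Sample sheet with the values from Table 1
metadata <- data.frame(
  Sample = c("BMMC_D1T1", "BMMC_D1T2", "CD34_D2T1", "CD34_D3T1"),
  Donor = c("D1", "D1", "D2", "D3"),
  Replicate = c("T1", "T2", "T1", "T1"),
  Sex = c("F", "F", "M", "F"),
  stringsAsFactors = FALSE
)

# Toy stand-ins for the four real Seurat objects (5 genes x 3 cells each)
set.seed(1)
make_toy_object <- function(name) {
  counts <- matrix(
    rpois(15, lambda = 5), nrow = 5,
    dimnames = list(paste0("gene", 1:5), paste0("cell", 1:3))
  )
  CreateSeuratObject(counts = Matrix::Matrix(counts, sparse = TRUE), project = name)
}
samples <- lapply(setNames(metadata$Sample, metadata$Sample), make_toy_object)

# Copy every sample-sheet column except Sample into the matching object;
# the single value is applied to all cells of that object
for (i in seq_len(nrow(metadata))) {
  name <- metadata$Sample[i]
  for (col in setdiff(colnames(metadata), "Sample")) {
    samples[[name]] <- AddMetaData(
      samples[[name]], metadata = metadata[i, col], col.name = col
    )
  }
}

head(samples$CD34_D2T1@meta.data)
```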


### 3 Add Meta-data (3P)
For each sample report the following information:
1. How many cells are in each sample?
2. How many genes are in the expression matrices?
3. What information is now part of the meta-data of the objects?
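
One possible way to collect these three answers is a small summary function, sketched below on a toy object. The helper name `summarise_sample` is hypothetical; it relies on `ncol()` and `nrow()` returning the number of cells and genes of a Seurat object, and on the `meta.data` slot holding the per-cell metadata.

```R
library(Seurat)

# Hypothetical helper: one-row summary of a Seurat object
summarise_sample <- function(obj, name) {
  data.frame(
    sample = name,
    n_cells = ncol(obj),  # columns of the expression matrix are cells
    n_genes = nrow(obj),  # rows of the expression matrix are genes
    metadata_columns = paste(colnames(obj@meta.data), collapse = ", ")
  )
}

# Toy object (5 genes x 3 cells) standing in for a real sample
set.seed(1)
counts <- matrix(
  rpois(15, lambda = 5), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:3))
)
toy <- CreateSeuratObject(counts = Matrix::Matrix(counts, sparse = TRUE), project = "toy")
toy$Donor <- "D1"

summary_row <- summarise_sample(toy, "toy")
print(summary_row)
```

For the real samples, the function could be called once per object, for example `summarise_sample(BMMC_D1T1, "BMMC_D1T1")`, and the rows combined with `rbind()`.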