<a href="https://colab.research.google.com/github/giacomoorsini/single_cell-ML_project/blob/main/ML_project_LTGO.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Cell type recognition from single cell sequencing data in Mus musculus




Tabula Muris is a compendium of single cell transcriptome data from the model organism Mus musculus, containing nearly 100,000 cells from 20 organs and tissues.

- data retrieval
- models selection
- models understanding
- data preprocessing
- Training/testing dataset cration
- model building
- model evaluation
- model comparison



## Data retrieval

In [1]:
%%bash

#get the raw gene-cell files,
for i in 10700143 10842785 13088129;
do
  wget -q https://figshare.com/files/$i
  if [ $? -eq 0 ]; then
    echo "Download of file $i done."
  else
    echo "Failed to download file $i."
  fi
done

Download of file 10700143 done.
Download of file 10842785 done.
Download of file 13088129 done.


In [2]:
%%bash

unzip 10700143

mkdir heart_data

cp FACS/Heart-counts.csv heart_data/heart_counts_raw.csv

rm -r FACS
rm 10700143

Archive:  10700143
   creating: FACS/
  inflating: FACS/.DS_Store          
   creating: __MACOSX/
   creating: __MACOSX/FACS/
  inflating: __MACOSX/FACS/._.DS_Store  
  inflating: FACS/Aorta-counts.csv   
  inflating: FACS/Bladder-counts.csv  
  inflating: __MACOSX/FACS/._Bladder-counts.csv  
  inflating: FACS/Brain_Myeloid-counts.csv  
  inflating: __MACOSX/FACS/._Brain_Myeloid-counts.csv  
  inflating: FACS/Brain_Non-Myeloid-counts.csv  
  inflating: __MACOSX/FACS/._Brain_Non-Myeloid-counts.csv  
  inflating: FACS/Diaphragm-counts.csv  
  inflating: FACS/Fat-counts.csv     
  inflating: __MACOSX/FACS/._Fat-counts.csv  
  inflating: FACS/Heart-counts.csv   
  inflating: FACS/Kidney-counts.csv  
  inflating: __MACOSX/FACS/._Kidney-counts.csv  
  inflating: FACS/Large_Intestine-counts.csv  
  inflating: __MACOSX/FACS/._Large_Intestine-counts.csv  
  inflating: FACS/Limb_Muscle-counts.csv  
  inflating: FACS/Liver-counts.csv   
  inflating: __MACOSX/FACS/._Liver-counts.csv  
  inflating

In [3]:
import pandas as pd
import numpy as np

In [None]:
# create metadata dataframe
metadata=pd.read_csv("/content/10842785")
metadata.head(10)

Unnamed: 0,plate.barcode,mouse.id,tissue,subtissue,FACS.selection,mouse.sex
0,D041914,3_8_M,Bladder,,Multiple,M
1,D042253,3_9_M,Bladder,,Multiple,M
2,MAA000487,3_10_M,Bladder,,Multiple,M
3,B000610,3_56_F,Bladder,,Multiple,F
4,B002764,3_38_F,Bladder,,Multiple,F
5,B002771,3_39_F,Bladder,,Multiple,F
6,MAA000538,3_8_M,Brain_Non-Myeloid,Cerebellum,Neurons,M
7,MAA000550,3_10_M,Brain_Non-Myeloid,Hippocampus,Neurons,M
8,MAA000553,3_10_M,Brain_Non-Myeloid,Hippocampus,,M
9,MAA000560,3_10_M,Brain_Non-Myeloid,Cortex,Neurons,M


In [None]:
# create a heart metadata table
heart_metadata=metadata[metadata['tissue']=="Heart"].reset_index(drop=True)
print(heart_metadata.head(10))


  plate.barcode mouse.id tissue subtissue FACS.selection mouse.sex
0     MAA000398    3_9_M  Heart        LA         Viable         M
1     MAA000399    3_9_M  Heart        RV         Viable         M
2     MAA000400    3_8_M  Heart        LA         Viable         M
3     MAA000452    3_8_M  Heart        RV         Viable         M
4     MAA000586    3_8_M  Heart        RA         Viable         M
5     MAA000587    3_8_M  Heart        LV         Viable         M
6     MAA000589    3_9_M  Heart        LV         Viable         M
7     MAA000594    3_8_M  Heart     Aorta         Viable         M
8     MAA000595    3_9_M  Heart     Aorta         Viable         M
9     MAA000898   3_11_M  Heart        RV         Viable         M


In [22]:
annotations=pd.read_csv("/content/13088129")
annotations_heart=annotations[annotations["tissue"]=="Heart"].reset_index(drop=True)
annotations_heart

  annotations=pd.read_csv("/content/13088129")


Unnamed: 0,Neurog3>0_raw,Neurog3>0_scaled,cell,cell_ontology_class,cell_ontology_id,cluster.ids,free_annotation,mouse.id,mouse.sex,plate.barcode,...,subsetC,subsetC_cluster.ids,subsetD,subsetD_cluster.ids,subsetE,subsetE_cluster.ids,subtissue,tissue,tissue_tSNE_1,tissue_tSNE_2
0,,,A1.B000412.3_56_F.1.1,endothelial cell,CL:0000115,2,,3_56_F,F,B000412,...,,,,,,,RA,Heart,-29.766385,-4.010628
1,,,A1.B000633.3_56_F.1.1,leukocyte,CL:0000738,4,,3_56_F,F,B000633,...,,,,,,,RV,Heart,10.689980,-26.034551
2,,,A1.B000634.3_56_F.1.1,endothelial cell,CL:0000115,6,,3_56_F,F,B000634,...,,,,,,,LA,Heart,-22.531322,-23.512410
3,,,A1.B002423.3_39_F.1.1,fibroblast,CL:0000057,3,,3_39_F,F,B002423,...,,,,,,,RV,Heart,-1.045144,-13.425768
4,,,A1.B002427.3_39_F.1.1,myofibroblast cell,CL:0000186,7,,3_39_F,F,B002427,...,,,,,,,LA,Heart,-8.340210,-41.185128
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4360,,,P9.MAA000586.3_8_M.1.1,fibroblast,CL:0000057,0,,3_8_M,M,MAA000586,...,,,,,,,RA,Heart,-2.964871,7.099931
4361,,,P9.MAA000587.3_8_M.1.1,endothelial cell,CL:0000115,2,,3_8_M,M,MAA000587,...,,,,,,,LV,Heart,-31.539531,9.813377
4362,,,P9.MAA000589.3_9_M.1.1,endothelial cell,CL:0000115,2,,3_9_M,M,MAA000589,...,,,,,,,LV,Heart,-33.234753,-9.461885
4363,,,P9.MAA000903.3_11_M.1.1,fibroblast,CL:0000057,0,,3_11_M,M,MAA000903,...,,,,,,,RA,Heart,2.500826,22.358329


In [5]:
heart_count_raw=pd.read_csv("heart_data/heart_counts_raw.csv",index_col="Unnamed: 0")
print(heart_count_raw.shape)
heart_count_raw.head(10)

(23433, 6002)


Unnamed: 0,B12.MAA000398.3_9_M.1.1,D16.MAA000398.3_9_M.1.1,F10.MAA000398.3_9_M.1.1,L17.MAA000398.3_9_M.1.1,N18.MAA000398.3_9_M.1.1,H15.MAA000398.3_9_M.1.1,J14.MAA000398.3_9_M.1.1,B14.MAA000398.3_9_M.1.1,D17.MAA000398.3_9_M.1.1,F14.MAA000398.3_9_M.1.1,...,D5.MAA100097.3_39_F.1.1,A8.MAA100097.3_39_F.1.1,D7.MAA100097.3_39_F.1.1,F2.MAA100097.3_38_F.1.1,F3.MAA100097.3_38_F.1.1,A7.MAA100097.3_39_F.1.1,D6.MAA100097.3_39_F.1.1,F1.MAA100097.3_38_F.1.1,A9.MAA100097.3_39_F.1.1,D8.MAA100097.3_39_F.1.1
0610005C13Rik,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0610007C21Rik,272,21,54,21,73,105,0,55,141,0,...,0,0,51,0,66,0,147,0,0,100
0610007L01Rik,93,95,43,0,82,71,0,2,71,0,...,0,0,92,0,150,0,116,0,0,0
0610007N19Rik,65,0,0,27,66,0,0,9,11,0,...,0,0,0,0,11,0,0,0,0,31
0610007P08Rik,1,0,0,0,0,34,0,0,0,0,...,0,0,1,0,8,0,0,0,0,2
0610007P14Rik,0,0,0,27,0,24,0,20,0,0,...,0,0,0,0,0,0,0,0,0,1
0610007P22Rik,0,0,0,0,58,99,0,0,44,0,...,0,0,0,0,0,0,0,0,0,0
0610008F07Rik,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0610009B14Rik,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0610009B22Rik,0,34,0,0,68,0,28,33,0,15,...,0,23,32,2,73,0,157,0,0,0


##Data preprocessing
This first section is divided into different substeps


1.   Remove unclassified cells
2.   Remove genes which have no expression in any of the cells
3.   Remove house-keeping genes

###Remove unclassified cells
To remove unclassified cells, the cells from `heart_count_raw` have to be labeled with their respective cell type classification (fibroblast, endothelial cell, leukocyte, myofibroblast cell, endocardial cell, cardiac muscle cell, smooth muscle cell). This information can be found in `annotations_heart`.

The raw `heart_count_raw` dataframe contains **23433 genes and 6002 cells**.

In [None]:
annotations_heart.value_counts("cell_ontology_class")

cell_ontology_class
fibroblast             2119
endothelial cell       1177
leukocyte               523
myofibroblast cell      178
endocardial cell        165
cardiac muscle cell     133
smooth muscle cell       42
Name: count, dtype: int64

In [23]:
heart_count_raw_t = heart_count_raw.T
annotations_heart.set_index('cell', inplace=True)

cell_types_heart = annotations_heart["cell_ontology_class"].rename("cell_classification")

combined_data = heart_count_raw_t.join(cell_types_heart, how='inner')

print(f"There are {heart_count_raw_t.shape[0]-combined_data.shape[0]} cells that have no classification information.")

There are 1637 cells that have no classification information.


Before moving to the next step, NaN values have to be removed from the dataset (some cells had an entry in `annotations_heart` but had no associated classification):

In [30]:
combined_data=combined_data.dropna()
combined_data #number of genes is columns-1 (added "cell_classification" column)

Unnamed: 0,0610005C13Rik,0610007C21Rik,0610007L01Rik,0610007N19Rik,0610007P08Rik,0610007P14Rik,0610007P22Rik,0610008F07Rik,0610009B14Rik,0610009B22Rik,...,Zxdc,Zyg11a,Zyg11b,Zyx,Zzef1,Zzz3,a,l7Rn6,zsGreen_transgene,cell_classification
B12.MAA000398.3_9_M.1.1,0,272,93,65,1,0,0,0,0,0,...,0,0,0,68,1,0,0,17,0,fibroblast
D16.MAA000398.3_9_M.1.1,0,21,95,0,0,0,0,0,0,34,...,0,0,0,389,3,6,0,41,0,endothelial cell
F10.MAA000398.3_9_M.1.1,0,54,43,0,0,0,0,0,0,0,...,41,0,28,45,0,31,0,19,0,myofibroblast cell
L17.MAA000398.3_9_M.1.1,0,21,0,27,0,27,0,0,0,0,...,0,0,0,50,38,0,0,11,0,myofibroblast cell
N18.MAA000398.3_9_M.1.1,0,73,82,66,0,0,58,0,0,68,...,22,0,14,97,99,124,0,39,0,fibroblast
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
D7.MAA100097.3_39_F.1.1,0,51,92,0,1,0,0,0,0,32,...,6,0,3,0,31,0,0,0,0,cardiac muscle cell
F3.MAA100097.3_38_F.1.1,0,66,150,11,8,0,0,0,0,73,...,56,0,0,0,0,0,0,212,0,cardiac muscle cell
D6.MAA100097.3_39_F.1.1,0,147,116,0,0,0,0,0,0,157,...,113,0,123,0,0,193,0,23,0,cardiac muscle cell
F1.MAA100097.3_38_F.1.1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,cardiac muscle cell


This last substep has removed 28 cells with no classification. Therefore, this whole step reduced the raw dataset to **4337 cells** and **23433 genes**.

###Remove genes which have no expression in any of the cells
To remove the genes that have no expression across all cells, rows that have 0 value across all columns were exculded.

In [35]:
nonzero_genes = combined_data.loc[:, (combined_data != 0).any()]
print(f"There are {combined_data.shape[1]-nonzero_genes.shape[1]} genes that have no expression in any cells.")
nonzero_genes

There are 2382 genes that have no expression in any cells.


Unnamed: 0,0610005C13Rik,0610007C21Rik,0610007L01Rik,0610007N19Rik,0610007P08Rik,0610007P14Rik,0610007P22Rik,0610008F07Rik,0610009B14Rik,0610009B22Rik,...,Zxdb,Zxdc,Zyg11a,Zyg11b,Zyx,Zzef1,Zzz3,a,l7Rn6,cell_classification
B12.MAA000398.3_9_M.1.1,0,272,93,65,1,0,0,0,0,0,...,8,0,0,0,68,1,0,0,17,fibroblast
D16.MAA000398.3_9_M.1.1,0,21,95,0,0,0,0,0,0,34,...,0,0,0,0,389,3,6,0,41,endothelial cell
F10.MAA000398.3_9_M.1.1,0,54,43,0,0,0,0,0,0,0,...,0,41,0,28,45,0,31,0,19,myofibroblast cell
L17.MAA000398.3_9_M.1.1,0,21,0,27,0,27,0,0,0,0,...,0,0,0,0,50,38,0,0,11,myofibroblast cell
N18.MAA000398.3_9_M.1.1,0,73,82,66,0,0,58,0,0,68,...,0,22,0,14,97,99,124,0,39,fibroblast
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
D7.MAA100097.3_39_F.1.1,0,51,92,0,1,0,0,0,0,32,...,0,6,0,3,0,31,0,0,0,cardiac muscle cell
F3.MAA100097.3_38_F.1.1,0,66,150,11,8,0,0,0,0,73,...,0,56,0,0,0,0,0,0,212,cardiac muscle cell
D6.MAA100097.3_39_F.1.1,0,147,116,0,0,0,0,0,0,157,...,0,113,0,123,0,0,193,0,23,cardiac muscle cell
F1.MAA100097.3_38_F.1.1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,cardiac muscle cell


This step reduced the dataset to **21051 genes** across **4337 cells**.

###Removal of house-keeping genes
HOuse-keeping genes are genes whose expression does not vary between cell types; therefore, they are not informative in the ML task that concerns this project, as we are interested in using informative genes to discern between cell types. To be able to determine these genes, the Median Gene Expression within each Cell Type (MGECT) was computed, and genes with zero variance for their MGECT across all cell types were excluded (*Le H, 2022*).

In [43]:
mgect_expression = nonzero_genes.groupby('cell_classification').median()
mgect_expression

Unnamed: 0_level_0,0610005C13Rik,0610007C21Rik,0610007L01Rik,0610007N19Rik,0610007P08Rik,0610007P14Rik,0610007P22Rik,0610008F07Rik,0610009B14Rik,0610009B22Rik,...,Zxda,Zxdb,Zxdc,Zyg11a,Zyg11b,Zyx,Zzef1,Zzz3,a,l7Rn6
cell_classification,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
cardiac muscle cell,0.0,51.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,16.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,17.0
endocardial cell,0.0,101.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,205.0,0.0,0.0,0.0,0.0
endothelial cell,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
fibroblast,0.0,197.0,0.0,26.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,61.0,0.0,0.0,0.0,0.0
leukocyte,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,90.0,0.0,0.0,0.0,0.0
myofibroblast cell,0.0,43.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
smooth muscle cell,0.0,10.0,0.0,9.5,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,369.0,0.0,0.0,0.0,0.0


In [56]:
variance = mgect_expression.var(axis=0)
filtered_genes = variance[variance != 0].index
heart_count = nonzero_genes[filtered_genes.tolist() + ['cell_classification']]
heart_count

Unnamed: 0,0610007C21Rik,0610007L01Rik,0610007N19Rik,0610009B22Rik,0610009D07Rik,0610009O20Rik,0610010K14Rik,0610010O12Rik,0610012G03Rik,0610031J06Rik,...,Zmat5,Zmiz1,Zmynd11,Zmynd8,Znhit1,Zranb2,Zwint,Zyx,l7Rn6,cell_classification
B12.MAA000398.3_9_M.1.1,272,93,65,0,141,0,32,2,2,78,...,0,98,0,11,14,196,0,68,17,fibroblast
D16.MAA000398.3_9_M.1.1,21,95,0,34,0,57,38,0,132,772,...,0,338,25,28,22,0,0,389,41,endothelial cell
F10.MAA000398.3_9_M.1.1,54,43,0,0,37,0,0,145,0,46,...,0,0,132,12,0,11,66,45,19,myofibroblast cell
L17.MAA000398.3_9_M.1.1,21,0,27,0,9,0,0,98,11,62,...,0,4,3,0,10,0,84,50,11,myofibroblast cell
N18.MAA000398.3_9_M.1.1,73,82,66,68,0,0,0,50,0,79,...,0,5,3,0,1,28,0,97,39,fibroblast
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
D7.MAA100097.3_39_F.1.1,51,92,0,32,125,80,0,0,290,64,...,0,0,0,66,18,14,99,0,0,cardiac muscle cell
F3.MAA100097.3_38_F.1.1,66,150,11,73,66,102,38,8,145,147,...,52,0,1,5,34,106,85,0,212,cardiac muscle cell
D6.MAA100097.3_39_F.1.1,147,116,0,157,106,16,24,31,154,136,...,70,178,26,0,4,26,93,0,23,cardiac muscle cell
F1.MAA100097.3_38_F.1.1,0,0,0,0,0,0,0,0,0,0,...,37,0,66,0,17,0,0,0,0,cardiac muscle cell


In [107]:
heart_count.to_csv("heart_data.csv")

# Git commit and push

In [115]:
%cd /content/
!rm -rf single_cell-ML_project/

/content


In [116]:
from google.colab import userdata

# Set your token and repository details
token = userdata.get("GitHub_LTM")  # Replace with your actual token
repository = "giacomoorsini/single_cell-ML_project"  # Replace with your actual repository

# Install Git and configure credentials
!git init
!git config --global user.email "torresmasdeu@gmail.com"
!git config --global user.name "torresmasdeu"

# Clone the GitHub repository using the token
!git clone https://{token}@github.com/{repository}.git
!wait 5

# Change directory to the cloned repository
%cd /content/single_cell-ML_project/

!pwd

# Create a test file (or move any files you want to push)
!echo "This is a test file" > test_file.txt

# Add, commit, and push the changes to GitHub
!git add test_file.txt
!git commit -m "Add test file"
!git push origin main  # Replace 'main' with the appropriate branch name


Reinitialized existing Git repository in /content/.git/
Cloning into 'single_cell-ML_project'...
remote: Enumerating objects: 12, done.[K
remote: Counting objects: 100% (12/12), done.[K
remote: Compressing objects: 100% (9/9), done.[K
remote: Total 12 (delta 2), reused 6 (delta 1), pack-reused 0[K
Receiving objects: 100% (12/12), 14.12 MiB | 15.50 MiB/s, done.
Resolving deltas: 100% (2/2), done.
/bin/bash: line 1: wait: pid 5 is not a child of this shell
/content/single_cell-ML_project
/content/single_cell-ML_project
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean
^C


In [119]:
!git add .
!git status

On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	[32mdeleted:    test_file.txt[m



In [120]:
!git commit -m "Remove test files"
!git push origin main  # Replace 'main' with the appropriate branch name

[main b612c9e] Remove test files
 1 file changed, 1 deletion(-)
 delete mode 100644 test_file.txt
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Delta compression using up to 2 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (2/2), 221 bytes | 221.00 KiB/s, done.
Total 2 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), completed with 1 local object.[K
To https://github.com/giacomoorsini/single_cell-ML_project.git
   c48a363..b612c9e  main -> main


In [98]:
!pwd
%cd /content/single_cell-ML_project/
!pwd

/content/single_cell-ML_project
/content/single_cell-ML_project
/content/single_cell-ML_project


In [57]:
!git init

[33mhint: Using 'master' as the name for the initial branch. This default branch name[m
[33mhint: is subject to change. To configure the initial branch name to use in all[m
[33mhint: [m
[33mhint: 	git config --global init.defaultBranch <name>[m
[33mhint: [m
[33mhint: Names commonly chosen instead of 'master' are 'main', 'trunk' and[m
[33mhint: 'development'. The just-created branch can be renamed via this command:[m
[33mhint: [m
[33mhint: 	git branch -m <name>[m
Initialized empty Git repository in /content/.git/


In [58]:
!git branch -m main

In [76]:
!git clone https://github.com/giacomoorsini/single_cell-ML_project

Cloning into 'single_cell-ML_project'...
remote: Enumerating objects: 6, done.[K
remote: Counting objects: 100% (6/6), done.[K
remote: Compressing objects: 100% (5/5), done.[K
remote: Total 6 (delta 0), reused 0 (delta 0), pack-reused 0[K
Receiving objects: 100% (6/6), 6.50 KiB | 6.50 MiB/s, done.


In [77]:
!git status

On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	[31m.config/[m
	[31m10842785[m
	[31m13088129[m
	[31m__MACOSX/[m
	[31mheart_data/heart_counts_raw.csv[m
	[31msample_data/[m
	[31msingle_cell-ML_project/[m

nothing added to commit but untracked files present (use "git add" to track)


In [68]:
!git add heart_data/heart_data.csv

In [70]:
!git config --global user.email "giacomoorsini2001@gmail.com"
!git config --global user.name "giacomoorsini"

In [66]:
! git restore --staged heart_data/heart_counts_raw.csv

In [71]:
!git commit -m "Heart data processed"

[main f569a54] Heart data processed
 1 file changed, 4338 insertions(+)
 create mode 100644 heart_data/heart_data.csv


In [75]:
!git push origin main
!git remote add origin https://{userdata.get('GitHub_GO')}@github.com/giacomoorsini/single_cell-ML_project

fatal: 'origin' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
/bin/bash: -c: line 1: syntax error near unexpected token `('
/bin/bash: -c: line 1: `git remote add origin https://{userdata.get('GitHub_GO')}@github.com/giacomoorsini/single_cell-ML_project'


In [74]:
!git push --set-upstream origin main


fatal: 'origin' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.


In [None]:
!tar -zcvf heart_data.tar.gz heart_data/

heart_data/
heart_data/Heart-counts.csv


In [None]:
!ls -l

total 353052
-rw-r--r-- 1 root root 303966623 Apr 11  2021 10700143
drwxrwxrwx 2 root root      4096 May 12 16:43 FACS
drwxr-xr-x 2 root root      4096 May 12 16:43 heart_data
-rw-r--r-- 1 root root  28742225 May 12 16:44 heart_data.tar.gz
-rw-r--r-- 1 root root  28742475 May 12 16:33 heart_data.zip
drwxrwxr-x 3 root root      4096 May 12 15:45 __MACOSX
-rw-r--r-- 1 root root       150 May 12 15:56 README.md
drwxr-xr-x 1 root root      4096 May  9 13:24 sample_data
-rw-r--r-- 1 root root     42939 May 12 15:56 single_cell-ML_project.ipynb


##References

*   Le H, Peng B, Uy J, Carrillo D, Zhang Y, Aevermann BD, et al. (2022) Machine learning for cell type classification from single nucleus RNA sequencing data. PLoS ONE 17(9): e0275070. https://doi.org/10.1371/journal.pone.0275070
*   List item



In [121]:
!ls

heart_data.csv	README.md  single_cell-ML_project.ipynb


In [122]:
!ls ../

10842785  heart_data  README.md    single_cell-ML_project
13088129  __MACOSX    sample_data  test_file.txt
