# Extracting count matrix from STARsolo

When we use STARsolo to generate the aligned BAM file, it can also generate the count matrix. In this project we are working with scRNA-seq data, and we organize the file structure by sample. Example: for the dataset GSE100344 we have BJ (Day 0, D2, D8, D16+ and D16-). Each of these conditions has 96 samples. Therefore, when we analyze the samples with STARsolo, the output contains 96 folders per condition (one per sample). The structure is the following:

`/output/BJ_fibroblasts_{BJ|D2|D8|D16-|D16_plus}_{sampleNumber, from 1..96}_Solo.out/Gene/filtered`

In this folder we will always find the file `matrix.mtx`, which can be loaded with scanpy.
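
Before loading or archiving a matrix, it can help to check its dimensions. The following is a minimal sketch, assuming `matrix.mtx` is in the standard Matrix Market coordinate format, where lines starting with `%` are header or comment lines and the first remaining line holds the number of rows, columns, and non-zero entries. The helper name `mtx_dims` and the example path are illustrative and not part of STARsolo.

```shell
#!/bin/bash

# Hypothetical helper: print the size line of a Matrix Market file.
# Assumption: coordinate format, where '%' lines are header/comments and
# the first other line is "<rows> <columns> <non-zero entries>".
mtx_dims() {
    awk '!/^%/ { print; exit }' "$1"
}

# Example usage (illustrative path; adjust to your own output folder)
MTX="/output/BJ_fibroblasts_D8_1_Solo.out/Gene/filtered/matrix.mtx"
if [ -f "$MTX" ]; then
    mtx_dims "$MTX"
fi
```

If the printed line reports zero columns or zero non-zero entries, that sample is probably worth inspecting before it goes into the archive.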

# Bash script to compress all matrix.mtx files into one zip file

In [None]:
#!/bin/bash

# Set the base directory
BASE_DIR="/mnt/faster/GSE100344/fastq/D8/output" # CHANGE THIS

# Set the output zip file name
OUTPUT_ZIP="BJ_fibroblasts_matrices.zip"

# Create a temporary directory
TEMP_DIR=$(mktemp -d)

# Loop through all 96 samples
for i in {1..96} # Works only for GSE100344, which has 96 samples per condition
do
    # Define the source directory
    SRC_DIR="${BASE_DIR}/BJ_fibroblasts_8_${i}_Solo.out/Gene/filtered"
    
    # Define the destination directory
    DEST_DIR="${TEMP_DIR}/BJ_fibroblasts_8_${i}"
    
    # Check if the source directory exists
    if [ -d "$SRC_DIR" ]; then
        # Create the destination directory
        mkdir -p "$DEST_DIR"
        
        # Copy the matrix.mtx file if it exists
        if [ -f "${SRC_DIR}/matrix.mtx" ]; then
            cp "${SRC_DIR}/matrix.mtx" "$DEST_DIR"
        else
            echo "Warning: matrix.mtx not found in ${SRC_DIR}"
        fi
    else
        echo "Warning: Directory not found: ${SRC_DIR}"
    fi
done

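# Create the zip file from inside the temporary directory, so the archive
# stores BJ_fibroblasts_8_<n>/matrix.mtx instead of the temporary path
(cd "$TEMP_DIR" && zip -r "${OLDPWD}/${OUTPUT_ZIP}" .)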

# Clean up the temporary directory
rm -rf "$TEMP_DIR"

echo "Zip file created: $OUTPUT_ZIP"
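
The script above only prints a warning when a sample is missing, so the archive can end up with fewer than 96 matrices without failing. A minimal sketch of a check to run beforehand is shown here. It assumes the same folder naming as the script; the function name `count_matrices` and its arguments are hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: count how many samples have a filtered matrix.mtx.
# Arguments: base directory, folder prefix, number of samples.
count_matrices() {
    local base_dir="$1" prefix="$2" n_samples="$3"
    local found=0 i=1
    while [ "$i" -le "$n_samples" ]; do
        if [ -f "${base_dir}/${prefix}_${i}_Solo.out/Gene/filtered/matrix.mtx" ]; then
            found=$((found + 1))
        fi
        i=$((i + 1))
    done
    echo "$found"
}

# Example usage with the values from the script above (adjust as needed)
BASE_DIR="/mnt/faster/GSE100344/fastq/D8/output"
if [ -d "$BASE_DIR" ]; then
    echo "Found $(count_matrices "$BASE_DIR" "BJ_fibroblasts_8" 96) of 96 matrices"
fi
```

Comparing the reported count with the expected number of samples makes an incomplete STARsolo run visible before the zip file is built and uploaded.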

# How to upload to an S3 bucket

In [None]:
aws s3 cp BJ_fibroblasts_matrices.zip s3://scgpt-dataset/