# Deconvolutional Neural Network

This notebook describes how the DCNN was built and trained.

## Importing Stuff

In [7]:
pip install tensorflow

Collecting tensorflow
  Downloading tensorflow-2.16.1-cp310-cp310-win_amd64.whl.metadata (3.5 kB)
Collecting tensorflow-intel==2.16.1 (from tensorflow)
  Downloading tensorflow_intel-2.16.1-cp310-cp310-win_amd64.whl.metadata (5.0 kB)
Downloading tensorflow-2.16.1-cp310-cp310-win_amd64.whl (2.1 kB)
Downloading tensorflow_intel-2.16.1-cp310-cp310-win_amd64.whl (376.9 MB)

In [8]:
import tensorflow as tf

In [10]:
from tensorflow.keras import layers, models

In [None]:
import numpy as np
import h5py
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2DTranspose, Conv2D

## File / Data Preprocessing

In [None]:
# Step 3: Data Loading (requires load_data, which is defined in a later cell)
data_file = 'Human_PBMC_TotalSeqB_3p_nextgem_gemx_nobatchcorrect_count_filtered_feature_bc_matrix.h5'
single_cell_data = load_data(data_file)

In [9]:
# Open the HDF5 file
with h5py.File('Human_PBMC_TotalSeqB_3p_nextgem_gemx_nobatchcorrect_count_filtered_feature_bc_matrix.h5', 'r') as f:
    # Specify the group name you want to inspect
    group_name = 'matrix'  # Replace 'matrix' with the actual group name

    # Access the group
    group = f[group_name]

    # Print group attributes
    print("Group name:", group_name)
    # As it's a group, it won't have a shape attribute
    # Add more attributes as needed

    # Print the datasets within the group
    print("Datasets within the group:")
    for dataset_name in group:
        print(dataset_name)


Group name: matrix
Datasets within the group:
barcodes
data
features
indices
indptr
shape


Inspecting the datasets.

In [14]:
# Open the HDF5 file
with h5py.File('Human_PBMC_TotalSeqB_3p_nextgem_gemx_nobatchcorrect_count_filtered_feature_bc_matrix.h5', 'r') as f:
    # Specify the group name
    group_name = 'matrix'

    # Access the group
    group = f[group_name]

    # Print group attributes
    print("Group name:", group_name)
    # As it's a group, it won't have a shape attribute
    # Add more attributes as needed

    # Print the datasets within the group
    print("Datasets within the group:")
    for dataset_name in group:
        print(dataset_name)

    # Access and inspect the datasets
    for dataset_name in group:
        dataset = group[dataset_name]
        if isinstance(dataset, h5py.Dataset):
            print("\nDataset:", dataset_name)
            print("Shape:", dataset.shape)
            print("dtype:", dataset.dtype)
            print("Preview of dataset values:")
            print(dataset[:5])  # Print the first five entries of the dataset

Group name: matrix
Datasets within the group:
barcodes
data
features
indices
indptr
shape

Dataset: barcodes
Shape: (15554,)
dtype: |S18
Preview of dataset values:
[b'AAACCCAAGCTTCGTA-1' b'AAACCCAAGGCACTCC-1' b'AAACCCACACGGCACT-1'
 b'AAACCCATCAACTGGT-1' b'AAACCCATCTACAGGT-1']

Dataset: data
Shape: (58842498,)
dtype: int32
Preview of dataset values:
[5 1 1 1 3]

Dataset: indices
Shape: (58842498,)
dtype: int64
Preview of dataset values:
[36 51 60 64 67]

Dataset: indptr
Shape: (15555,)
dtype: int64
Preview of dataset values:
[    0  3672  6671  9762 12903]

Dataset: shape
Shape: (2,)
dtype: int32
Preview of dataset values:
[38616 15554]
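
The `data`, `indices`, and `indptr` arrays printed above look like the three components of a compressed sparse column (CSC) matrix, with `shape` giving (features, barcodes). `indptr` has one more entry than there are barcodes, which is consistent with one column per cell. The sketch below uses a made-up 3×4 toy matrix, not the real file, to show how the three arrays encode the nonzero counts. The helper name `csc_to_dense` is hypothetical.

```python
import numpy as np

def csc_to_dense(data, indices, indptr, shape):
    """Expand CSC components into a dense (rows x columns) array."""
    dense = np.zeros(shape, dtype=data.dtype)
    for col in range(shape[1]):
        # indptr[col]:indptr[col + 1] is the slice of nonzeros in this column
        start, end = indptr[col], indptr[col + 1]
        rows = indices[start:end]          # row index of each nonzero
        dense[rows, col] = data[start:end]  # value of each nonzero
    return dense

# Toy values for illustration only (not taken from the dataset)
toy_data = np.array([5, 1, 3, 2], dtype=np.int32)
toy_indices = np.array([0, 2, 1, 0], dtype=np.int64)
toy_indptr = np.array([0, 2, 2, 3, 4], dtype=np.int64)

toy_dense = csc_to_dense(toy_data, toy_indices, toy_indptr, (3, 4))
print(toy_dense)
```

For the real matrix (38616 × 15554) a dense array would be very large, so in practice a sparse container such as `scipy.sparse.csc_matrix` is the more likely choice.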


In [25]:
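# Step 1: Data Preprocessing
from scipy.sparse import csc_matrix

def load_data(file_path):
    with h5py.File(file_path, 'r') as f:
        # Inspect keys
        print(list(f.keys()))
        # 'matrix' is a group, not a dataset, so rebuild the sparse count
        # matrix (features x barcodes) from its CSC components
        group = f['matrix']
        counts = csc_matrix(
            (group['data'][:], group['indices'][:], group['indptr'][:]),
            shape=tuple(group['shape'][:]),
        )
    # Perform any necessary preprocessing such as normalization or scaling
    # Transpose so that rows are cells (barcodes) and columns are features
    return counts.T.tocsr()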


Another way of loading in the data and creating test / train variables.

In [None]:
# Load MNIST data
with h5py.File('./train.hdf5', 'r') as f:
    input_train = f['image'][...]
    label_train = f['label'][...]
with h5py.File('./test.hdf5', 'r') as f:
    input_test = f['image'][...]
    label_test = f['label'][...]

# Reshape data (img_width, img_height and img_num_channels come from the
# model configuration cell, which must be run first)
input_train = input_train.reshape((len(input_train), img_width, img_height, img_num_channels))
input_test  = input_test.reshape((len(input_test), img_width, img_height, img_num_channels))

## Building the Model's Architecture

Model Configuration (put towards the beginning later on)

In [None]:
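# Model configuration
from tensorflow.keras.losses import sparse_categorical_crossentropy
from tensorflow.keras.optimizers import Adam

batch_size = 50
img_width, img_height, img_num_channels = 28, 28, 1
loss_function = sparse_categorical_crossentropy
no_classes = 10
no_epochs = 25
optimizer = Adam()
validation_split = 0.2
verbosity = 1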

This model uses max pooling layers. Max pooling is a down-sampling operation that reduces the spatial dimensions of the feature maps: it partitions the input feature map into non-overlapping rectangles and outputs the maximum value of each one. It is used in the encoder to shrink the spatial dimensions, while the convolution layers set the depth (number of channels).
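
As a rough illustration of the partition-and-take-the-maximum step, here is a minimal NumPy sketch of 2×2 max pooling on a single 2-D feature map. The function name `max_pool_2x2` and the input values are made up for the example, and it assumes even height and width (Keras handles odd sizes via the `padding` argument).

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2-D array with even height and width."""
    h, w = feature_map.shape
    # Split the map into non-overlapping 2x2 blocks, then take each block's max
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [3, 6, 2, 8]])

pooled = max_pool_2x2(x)
print(pooled)  # 4x4 input becomes a 2x2 output
```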

The upsampling layer increases the spatial dimensions of the feature maps. It is used to recover the spatial resolution lost during downsampling and to produce higher-resolution feature maps. Upsampling techniques include nearest-neighbor interpolation, bilinear interpolation, and transposed convolution. Interpolation-based upsampling is simple and computationally efficient, and it has no trainable parameters.
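
A minimal NumPy sketch of nearest-neighbor upsampling by a factor of 2, which is the default interpolation of Keras' `UpSampling2D`. The function name `upsample_nearest_2x` and the input values are made up for the example.

```python
import numpy as np

def upsample_nearest_2x(feature_map):
    """Repeat every row and every column twice (nearest-neighbor upsampling)."""
    return np.repeat(np.repeat(feature_map, 2, axis=0), 2, axis=1)

small = np.array([[4, 5],
                  [6, 8]])

upsampled = upsample_nearest_2x(small)
print(upsampled)  # 2x2 input becomes a 4x4 output
```

Each value is simply copied into a 2×2 block, so the resolution goes up but no new detail is created. The convolutions that follow each upsampling step in the decoder are what refine the result.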

In [30]:

# Step 2: Model Architecture
def build_autoencoder(input_shape):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2), padding='same'),
        layers.Conv2D(16, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2), padding='same'),
        layers.Conv2D(8, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2), padding='same'),

        
        layers.Conv2D(8, (3, 3), activation='relu', padding='same'),
        layers.UpSampling2D((2, 2)),
        layers.Conv2D(16, (3, 3), activation='relu', padding='same'),
        layers.UpSampling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation='relu'),
        layers.UpSampling2D((2, 2)),
        layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')
    ])
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
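
One detail in `build_autoencoder` is easy to miss: the `Conv2D(32, ...)` layer in the decoder has no `padding='same'`. Assuming a 28×28 input (the MNIST-sized shape used elsewhere in this notebook), that appears to be what lets the output return to 28×28. The sketch below traces one spatial dimension through the layers using plain arithmetic. The helper name `trace_autoencoder_shapes` is hypothetical.

```python
import math

def trace_autoencoder_shapes(size):
    """Follow one spatial dimension through the layers of build_autoencoder."""
    trace = [size]
    # Encoder: 'same' convolutions keep the size; each 2x2 pooling with
    # padding='same' halves it, rounding up.
    for _ in range(3):
        size = math.ceil(size / 2)
        trace.append(size)
    # Decoder: the first two upsampling steps double the size.
    for _ in range(2):
        size *= 2
        trace.append(size)
    # Conv2D(32) has no padding argument ('valid' by default), so its
    # 3x3 kernel trims 2 pixels.
    size -= 2
    trace.append(size)
    # Final upsampling; the last 'same' convolution keeps the size.
    size *= 2
    trace.append(size)
    return trace

print(trace_autoencoder_shapes(28))
```

For other input sizes the output may not match the input. A 32×32 input, for example, would also come out as 28×28 under this arithmetic.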



The hidden layers are the convolution and pooling layers.
In a convolution, a small filter is moved across the image; at each position the filter values are multiplied with the pixels underneath and summed.
The filter's values are tuned iteratively during training.
Pooling layers shrink the feature maps, which reduces computation and the number of parameters in later layers.
Max pooling selects the maximum value within each pooling window.
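
The sliding-filter idea can be sketched in a few lines of NumPy. This is a simplified single-channel version with no padding and stride 1; like most deep learning libraries, it does not flip the kernel. The function name `conv2d_valid` and the kernel values are made up for the example, whereas in a trained network the kernel values are learned.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the filter with the patch underneath it and sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])  # hand-picked diagonal-difference filter

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # a 2x2 filter on a 4x4 image gives a 3x3 map
```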


The transposed convolution (often called deconvolution) reverses the spatial transformation of a convolution rather than computing its exact mathematical inverse. It is used to increase the spatial dimensions of feature maps and can be thought of as learning to fill in the missing spatial information. Unlike interpolation-based upsampling, it has a set of trainable parameters.
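
To make the "fill in" idea concrete, here is a minimal NumPy sketch of a single-channel transposed convolution with stride 2 and no padding. Each input value stamps a scaled copy of the kernel into the output, and overlapping stamps are summed. The function name `conv2d_transpose` and the all-ones kernel are made up for the example; in a real layer the kernel is learned.

```python
import numpy as np

def conv2d_transpose(x, kernel, stride=2):
    """Transposed convolution: every input value adds a scaled kernel copy to the output."""
    kh, kw = kernel.shape
    out_h = (x.shape[0] - 1) * stride + kh
    out_w = (x.shape[1] - 1) * stride + kw
    out = np.zeros((out_h, out_w))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            r, c = i * stride, j * stride
            out[r:r + kh, c:c + kw] += x[i, j] * kernel
    return out

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
kernel = np.ones((2, 2))  # with all ones, the result matches nearest-neighbor upsampling

y = conv2d_transpose(x, kernel, stride=2)
print(y)  # 2x2 input becomes a 4x4 output
```

With a learned kernel the stamped pattern is no longer a flat copy, which is where the trainable parameters come in. Note that in the model defined next, `Conv2DTranspose` is used with its default stride of 1 and `padding='same'`, so the `UpSampling2D` layers do the resizing there.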

In [None]:
# Define the deconvolutional neural network architecture
def create_deconv_nn(input_shape):
    model = models.Sequential([
        # Encoder
        layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=input_shape),
        layers.MaxPooling2D((2, 2), padding='same'),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2), padding='same'),
        
        # Decoder
        layers.Conv2DTranspose(64, (3, 3), activation='relu', padding='same'),
        layers.UpSampling2D((2, 2)),
        layers.Conv2DTranspose(32, (3, 3), activation='relu', padding='same'),
        layers.UpSampling2D((2, 2)),
        layers.Conv2DTranspose(1, (3, 3), activation='sigmoid', padding='same')
    ])
    return model


# Create the deconvolutional neural network model
input_shape = (28, 28, 1)
model = create_deconv_nn(input_shape)

## Compiling the Model

In [None]:
# Step 3: Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])


In [None]:
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy')

## Training the Model

In [None]:
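# Train the model as an autoencoder: the input images are also the targets.
# train_images is not defined in this notebook; it is assumed to be the training
# image array (e.g. input_train from the loading cell), scaled to [0, 1].
model.fit(train_images, train_images, epochs=10, batch_size=128, validation_split=0.1)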


In [None]:
# Step 4: Model Building and Training
input_shape = single_cell_data.shape[1:]
autoencoder = build_autoencoder(input_shape)
autoencoder.fit(single_cell_data, single_cell_data, epochs=10, batch_size=32, shuffle=True)

# Step 5: Model Evaluation (Optional)
# Evaluate the trained model using appropriate metrics

# Step 6: Deployment (Optional)
# Save the trained model for future use
#autoencoder.save('single_cell_autoencoder_model.h5')


## Evaluating the Model

In [18]:
# Evaluate the model
test_loss = model.evaluate(test_images, test_images)
print('Test loss:', test_loss)

313/313 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - loss: 0.0640
Test loss: 0.06468354910612106
