**Build a Fully Connected Neural Network (FCNN) for Gene Expression Classification [2pt]:**

Begin by loading the dynamic gene expression data from gene_expression_data.csv, which comprises
1000 samples, each with 10 genes measured across 20 time steps as normalized expression levels.
Your task is to develop an FCNN that classifies gene expression dynamics into the two classes
given by the 'Outcome' column of the dataset: '1' for upregulation and '0' for downregulation.
After loading the dataset, split it into training and testing sets with a test size of 20%.
Design your FCNN with two dense layers and train it for 30 epochs.
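
The architecture described above can be sketched in Keras roughly as follows. This is a minimal sketch, not a reference solution: it assumes TensorFlow/Keras is the framework in use (the task does not name one), and the helper name `build_fcnn`, the hidden width of 64, and the Adam optimizer are illustrative choices. "Two dense layers" is read here as one hidden layer plus the sigmoid output layer.

```python
import tensorflow as tf


def build_fcnn(num_features: int, hidden_units: int = 64) -> tf.keras.Model:
    """Two dense layers: one hidden ReLU layer and a sigmoid output unit.

    hidden_units=64 is an assumed value, not one given in the task.
    """
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(num_features,)),
        tf.keras.layers.Dense(hidden_units, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(Outcome == 1)
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


# 10 genes x 20 time steps, flattened into one 200-feature row per sample.
fcnn = build_fcnn(num_features=200)
```

Given `X_train`, `y_train`, `X_test`, and `y_test` from the 80/20 split, training for the requested 30 epochs would then look like `history = fcnn.fit(X_train, y_train, epochs=30, validation_data=(X_test, y_test))`.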

In [1]:
import pandas as pd
import numpy as np

In [3]:
data = pd.read_csv("gene_expression_data.csv")
data

| | Gene_0_Time_0 | Gene_1_Time_0 | Gene_2_Time_0 | Gene_3_Time_0 | Gene_4_Time_0 | Gene_5_Time_0 | Gene_6_Time_0 | Gene_7_Time_0 | Gene_8_Time_0 | Gene_9_Time_0 | ... | Gene_1_Time_19 | Gene_2_Time_19 | Gene_3_Time_19 | Gene_4_Time_19 | Gene_5_Time_19 | Gene_6_Time_19 | Gene_7_Time_19 | Gene_8_Time_19 | Gene_9_Time_19 | Outcome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.097627 | 0.430379 | 0.205527 | 0.089766 | -0.152690 | 0.291788 | -0.124826 | 0.783546 | 0.927326 | -0.233117 | ... | -0.580313 | -0.627614 | 0.888745 | 0.479102 | -0.019082 | -0.545171 | -0.491287 | -0.883942 | -0.131167 | 0.0 |
| 1 | -0.376408 | 0.392687 | -0.244496 | -0.640793 | -0.950643 | -0.865501 | 0.358786 | -0.092606 | 0.073158 | 0.793343 | ... | -0.151935 | -0.482632 | 0.698077 | -0.933391 | 0.917965 | -0.289262 | -0.286586 | -0.967343 | -0.629535 | 0.0 |
| 2 | -0.197481 | 0.858583 | -0.800770 | 0.890603 | 0.738977 | -0.091675 | -0.346598 | -0.534512 | 0.228929 | -0.933851 | ... | 0.796125 | 0.345165 | 0.057880 | -0.391107 | 0.995925 | -0.275622 | -0.058702 | -0.243510 | 0.959054 | 0.0 |
| 3 | -0.650683 | -0.344024 | 0.360697 | -0.873585 | 0.214499 | -0.044707 | -0.432000 | -0.523173 | 0.029025 | -0.264145 | ... | -0.506886 | 0.192866 | -0.764949 | 0.951768 | 0.865122 | -0.216406 | -0.515643 | -0.499204 | -0.033213 | 1.0 |
| 4 | -0.920014 | 0.279410 | -0.183394 | -0.245187 | 0.618730 | 0.418071 | 0.908668 | -0.296128 | 0.795086 | 0.539934 | ... | -0.354341 | -0.540865 | 0.013726 | 0.473706 | -0.804647 | 0.029844 | 0.876824 | -0.542707 | 0.354282 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | 0.345687 | 0.891820 | -0.722923 | -0.083658 | 0.334432 | 0.069732 | -0.159362 | 0.978778 | -0.587955 | -0.091782 | ... | -0.810849 | -0.843594 | -0.430956 | 0.408179 | -0.926150 | -0.936081 | -0.298893 | -0.319864 | 0.970448 | 1.0 |
| 996 | 0.222877 | 0.043567 | 0.204768 | 0.122620 | 0.030977 | -0.227791 | 0.522652 | 0.003211 | 0.758899 | 0.302562 | ... | -0.411692 | 0.020867 | -0.260117 | -0.036038 | 0.602312 | -0.214049 | 0.376580 | 0.952415 | -0.939095 | 1.0 |
| 997 | -0.596026 | -0.067197 | -0.993058 | 0.257908 | 0.174466 | 0.315893 | 0.217009 | -0.902111 | -0.665098 | -0.455738 | ... | -0.562983 | -0.032180 | -0.340774 | -0.929111 | 0.281156 | -0.350243 | -0.153192 | -0.814431 | -0.925460 | 0.0 |
| 998 | -0.204735 | 0.951563 | -0.774936 | 0.985786 | -0.533224 | -0.715886 | -0.651394 | 0.515273 | 0.107087 | -0.509535 | ... | 0.961992 | 0.361953 | 0.107900 | -0.827085 | -0.223911 | -0.163993 | -0.286123 | 0.819078 | -0.512931 | 1.0 |


In [4]:
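# Split the data into features (X) and the target variable (y).
# The label lives in the 'Outcome' column. Note that data.iloc[:, 0] is
# Gene_0_Time_0, a feature rather than the label (its values are displayed below).
X = data.drop(columns=["Outcome"]).values
y = data["Outcome"].values
data.iloc[:, 0].values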

array([ 0.09762701, -0.37640824, -0.197481  , -0.65068323, -0.9200144 ,
        0.18576054, -0.38294408, -0.37281878,  0.838342  ,  0.09390012,
        0.62303694, -0.79585654,  0.77336644, -0.70810422, -0.13134019,
       -0.17207501,  0.73161051, -0.00804995, -0.48506704, -0.54690894,
       -0.4147159 ,  0.16507114, -0.48521445, -0.30322723, -0.18016392,
       -0.26643956,  0.25785114,  0.48547693, -0.41494917,  0.87481515,
       -0.25041294, -0.12168073, -0.57460769,  0.30678856, -0.86363945,
        0.20312001, -0.93381933,  0.12636248,  0.21610996,  0.27047879,
       -0.0844806 , -0.48086885,  0.96171774, -0.75302646,  0.28317733,
        0.27339081, -0.06067498, -0.40260431,  0.23386994, -0.46060416,
        0.49653596,  0.74505507,  0.53923901,  0.45095682, -0.45885956,
       -0.61407151,  0.03520687,  0.85076152,  0.37607679,  0.54397387,
        0.12938081,  0.22041783, -0.11043561,  0.09824021,  0.16862318,
       -0.7079852 ,  0.55539195, -0.47793617, -0.34558971,  0.45
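
To make the intended split concrete, here is a minimal sketch that takes the label from the `Outcome` column and holds out 20% of the samples with scikit-learn's `train_test_split`. The synthetic `demo` frame merely stands in for gene_expression_data.csv so the snippet runs on its own; `random_state=42` and stratifying on the label are assumptions, not requirements of the task.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same column layout as the real file (assumption:
# columns are Gene_<g>_Time_<t> for 10 genes and 20 time steps, plus 'Outcome').
rng = np.random.default_rng(0)
columns = [f"Gene_{g}_Time_{t}" for t in range(20) for g in range(10)]
demo = pd.DataFrame(rng.uniform(-1, 1, size=(1000, 200)), columns=columns)
demo["Outcome"] = rng.integers(0, 2, size=1000).astype(float)

# Features are every column except the label; the label is 'Outcome'.
X = demo.drop(columns=["Outcome"]).to_numpy(dtype="float32")
y = demo["Outcome"].to_numpy(dtype="int64")

# Hold out 20% for testing; stratify keeps the class ratio similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # (800, 200) (200, 200)
```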

**Model Performance Evaluation and Analysis [2pt]:**

Evaluate your FCNN's performance by examining the training and testing accuracy and loss. Use the
training history and plotting functions to capture and visualize these metrics, and provide an
analysis of your findings. Discuss any issues you observe in the training or testing results and
suggest applicable techniques from our coursework to address them. Justify your choice of
techniques.
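
One way to visualize these metrics is sketched below, assuming the model was trained with Keras so that `fit` returns a `History` object whose `.history` attribute is a dict of per-epoch lists. The function names `plot_history` and `generalization_gap` are hypothetical, and the keys `accuracy`, `val_accuracy`, `loss`, and `val_loss` assume the model was compiled with `metrics=["accuracy"]` and fitted with validation data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt


def plot_history(history: dict):
    """Plot training vs. test accuracy and loss from a Keras-style history dict."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    for ax, metric in zip(axes, ("accuracy", "loss")):
        epochs = range(1, len(history[metric]) + 1)
        ax.plot(epochs, history[metric], label=f"train {metric}")
        ax.plot(epochs, history[f"val_{metric}"], label=f"test {metric}")
        ax.set_xlabel("epoch")
        ax.set_ylabel(metric)
        ax.legend()
    fig.tight_layout()
    return fig


def generalization_gap(history: dict) -> float:
    """Final-epoch training accuracy minus final-epoch test accuracy."""
    return history["accuracy"][-1] - history["val_accuracy"][-1]
```

A typical call would be `plot_history(history.history)`. A large positive gap, with training accuracy well above test accuracy, usually points to overfitting; commonly used remedies include dropout, weight regularization, and early stopping, though which of these apply depends on what the coursework covered.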

**LSTM Model for Temporal Gene Expression Classification [2pt]:**

Shift focus to predicting gene expression dynamics by developing an LSTM model that captures
temporal dependencies. Load and preprocess the dataset to match the LSTM's expected input shape
(num_samples, num_time_steps, num_genes), basing your reshaping on the provided column names, and
keep a test size of 20%. Design the LSTM for binary classification and train it for 30 epochs,
with the goal of surpassing a test accuracy of 0.8. You can start with units=50 for the LSTM
layer.
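
The reshaping step can be sketched as follows. Rather than relying on column order, this version parses the gene and time indices out of each `Gene_<g>_Time_<t>` column name and writes the column into the matching slot of a 3D array. The helper names `to_sequences` and `build_lstm` are hypothetical, and the use of Keras with the Adam optimizer is an assumption; only `units=50` and the target input shape come from the task.

```python
import re

import numpy as np
import pandas as pd
import tensorflow as tf


def to_sequences(df: pd.DataFrame) -> np.ndarray:
    """Reshape Gene_<g>_Time_<t> columns into (num_samples, num_time_steps, num_genes)."""
    pattern = re.compile(r"Gene_(\d+)_Time_(\d+)$")
    matches = [(col, pattern.match(col)) for col in df.columns]
    # Columns that do not match (e.g. 'Outcome') are ignored.
    parsed = [(col, int(m.group(1)), int(m.group(2))) for col, m in matches if m]
    num_genes = max(g for _, g, _ in parsed) + 1
    num_steps = max(t for _, _, t in parsed) + 1
    if len(parsed) != num_genes * num_steps:
        raise ValueError("expected exactly one column per (gene, time step) pair")
    out = np.empty((len(df), num_steps, num_genes), dtype="float32")
    for col, g, t in parsed:
        out[:, t, g] = df[col].to_numpy()  # place each column by its parsed indices
    return out


def build_lstm(num_steps: int, num_genes: int, units: int = 50) -> tf.keras.Model:
    """One LSTM layer followed by a sigmoid unit for binary classification."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(num_steps, num_genes)),
        tf.keras.layers.LSTM(units),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

For this dataset, `to_sequences(data)` should yield an array of shape (1000, 20, 10), which would be split 80/20 and passed to `build_lstm(20, 10).fit(..., epochs=30)`.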

**LSTM Model Evaluation and FCNN Comparison [2pt]:**

After training, evaluate your LSTM model as you did for the FCNN in Q2. Provide an analysis of the
LSTM's results, comparing its performance against the FCNN. Conclude by discussing the
implications of your findings and the effectiveness of LSTM models in capturing temporal gene
expression dynamics compared with an FCNN.