## Step 3: make your classifier

### Background

Transcription factors are proteins that bind DNA at promoters to drive gene expression. Most bind preferentially to specific sequences, called motifs, and ignore others. Traditional methods for determining these motifs have assumed that binding sites in the genome are all independent. In some cases, however, motifs with positional interdependencies have been identified.

### Your task

You will implement a multi-layer, fully connected neural network using your NeuralNetwork class to predict whether a short DNA sequence is a binding site for the yeast transcription factor Rap1. The training data are highly imbalanced, with far fewer positive sequences than negative sequences, so you will implement a sampling scheme to keep the class imbalance from biasing training. As in step 2, all of the following work should be done in a Jupyter Notebook.
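One possible sampling scheme is to resample both classes with replacement until each matches the size of the larger class. The sketch below illustrates that idea on toy data. The helper name `balance_by_resampling`, its signature, and the toy sequences are assumptions made for illustration; they are not the required interface of your sample_seq function, and other schemes (for example, downsampling the negatives) are equally defensible.

```python
import numpy as np

def balance_by_resampling(seqs, labels, seed=0):
    """Hypothetical helper: draw from each class with replacement
    until both classes have as many examples as the larger class.
    Assumes both classes contain at least one example."""
    rng = np.random.default_rng(seed)
    seqs = np.asarray(seqs, dtype=object)
    labels = np.asarray(labels, dtype=bool)

    pos_idx = np.flatnonzero(labels)    # indices of positive examples
    neg_idx = np.flatnonzero(~labels)   # indices of negative examples
    n = max(len(pos_idx), len(neg_idx))

    # Sample n indices per class, then shuffle the combined set.
    pos_draw = rng.choice(pos_idx, size=n, replace=True)
    neg_draw = rng.choice(neg_idx, size=n, replace=True)
    idx = rng.permutation(np.concatenate([pos_draw, neg_draw]))
    return seqs[idx].tolist(), labels[idx].tolist()

# Toy data (made up): one positive, four negatives.
toy_seqs = ["ACGT", "TTTT", "GGGG", "CCCC", "AAAA"]
toy_labels = [True, False, False, False, False]
bal_seqs, bal_labels = balance_by_resampling(toy_seqs, toy_labels)
print(len(bal_seqs), sum(bal_labels))
```

Because the positives are repeated many times under this scheme, it is worth splitting into training and validation sets with care so that copies of the same positive sequence do not inflate the validation accuracy.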

### To-do

 - Use the read_text_file function from io.py to read in the 137 positive Rap1 motif examples.
 - Use the read_fasta_file function from io.py to read in all the negative examples. Note that these sequences are much longer than the positive sequences, so you will need to process them to the same length.
 - Balance your classes using your sample_seq function and explain why you chose the sampling scheme you did.
 - One-hot encode the data using your one_hot_encode_seqs function.
 - Split the data into training and validation sets.
 - Generate an instance of your NeuralNetwork class with an appropriate architecture.
 - Train your neural network on the training data.
 - Plot your training and validation loss by epoch.
 - Report the accuracy of your classifier on your validation dataset.
 - Explain your choice of loss function and hyperparameters.
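Two of the steps above, trimming the long negative sequences to the length of the positives and one-hot encoding, can be sketched as follows. This is a minimal illustration under stated assumptions: `random_window` and `one_hot` are hypothetical stand-ins rather than the io.py or one_hot_encode_seqs implementations, the A, T, C, G column order is an assumed convention, and the sequences are made-up toy data.

```python
import numpy as np

def random_window(seq, length, rng):
    """Hypothetical helper: cut one random window of `length` bases
    out of a longer sequence."""
    start = rng.integers(0, len(seq) - length + 1)  # upper bound is exclusive
    return seq[start:start + length]

def one_hot(seq, alphabet="ATCG"):
    """Hypothetical helper: encode each base as a 4-element indicator
    vector and flatten the result (the column order is an assumption)."""
    lookup = {base: i for i, base in enumerate(alphabet)}
    encoded = np.zeros((len(seq), len(alphabet)))
    for pos, base in enumerate(seq.upper()):
        encoded[pos, lookup[base]] = 1.0
    return encoded.ravel()

rng = np.random.default_rng(0)
toy_positive = "ACGTACGTAC"                      # made-up short "motif"
toy_negative = "TTGACCATGGCATTAGCCGATAGGCTAACG"  # made-up longer sequence

# Trim the negative to the positive's length, then encode it.
window = random_window(toy_negative, len(toy_positive), rng)
x = one_hot(window)
print(window, x.shape)
```

A randomly chosen window could, in principle, overlap a real binding site, so one option is to discard any window that exactly matches a positive example before training.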

#### Imports and helper functions

In [None]:
%load_ext autoreload
%autoreload 2

import pandas as pd
import numpy as np
import sklearn as sk
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from typing import List, Dict, Tuple, Union
from numpy.typing import ArrayLike
from nn import io
import re
from collections import Counter
from nn.nn import NeuralNetwork
import matplotlib.pyplot as plt
import seaborn as sns
import time

In [None]:
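# timing function
def display_run_time(s, e, task):
    """Print the elapsed time for `task`, given start `s` and end `e` in seconds."""
    rt = e - s
    if rt >= 60:
        # report runs of a minute or longer in minutes
        rt = rt / 60
        print(f"{task}: {rt} m")
    else:
        print(f"{task}: {rt} s")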