# **CSCE 5218 / CSCE 4930 Deep Learning**

# **The Perceptron** (20 pt)


In [2]:
#get the datasets
!curl -o test.dat https://raw.githubusercontent.com/huangyanann/CSCE5218/main/test_small.txt
!curl -o train.dat https://raw.githubusercontent.com/huangyanann/CSCE5218/main/train.txt


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   136  100   136    0     0   1455      0 --:--:-- --:--:-- --:--:--  1462
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11645  100 11645    0     0   147k      0 --:--:-- --:--:-- --:--:--  147k


In [3]:
import pandas as pd

# Load and display the first few rows
train_df = pd.read_csv("train.dat", sep="\t")
test_df = pd.read_csv("test.dat", sep="\t")

print("Train Dataset:")
print(train_df.head())

print("\nTest Dataset:")
print(test_df.head())


Train Dataset:
   A1  A2  A3  A4  A5  A6  A7  A8  A9  A10  A11  A12  A13  Unnamed: 13
0   1   1   0   0   0   0   0   0   1    1    0    0    1            0
1   0   0   1   1   0   1   1   0   0    0    0    0    1            0
2   0   1   0   1   1   0   1   0   1    1    1    0    1            1
3   0   0   1   0   0   1   0   1   0    1    1    1    1            0
4   0   1   0   0   0   0   0   1   1    1    1    1    1            0

Test Dataset:
   X1  X2  X3
1   1   1   1
0   0   1   1
0   1   1   0
0   1   1   0
0   1   1   0


### Build the Perceptron Model

You will need to complete some of the function definitions below.  DO NOT import any other libraries to complete this. 

In [5]:
import math
import re


# Corpus reader, all columns but the last one are coordinates;
#   the last column is the label
def read_data(file_name):
    with open(file_name, 'r') as f:
        data = []
        # Discard header line
        f.readline()
        for instance in f.readlines():
            if not re.search('\t', instance): continue  # Skip lines that don't have tab-separated values
            instance = list(map(int, instance.strip().split('\t')))
            # Add a dummy input so that w0 becomes the bias
            instance = [-1] + instance  # -1 for the bias term
            data += [instance]
    return data


def dot_product(weights, instance):
    """Compute the dot product of weights and instance; zip() stops at the end of weights, so the trailing label column is ignored."""
    return sum(w * x for w, x in zip(weights, instance))


def sigmoid(x):
    """Sigmoid activation function."""
    return 1 / (1 + math.exp(-x))


def output(weights, instance):
    """The output of the model, applying sigmoid to the dot product of the weights and the instance."""
    in_value = dot_product(weights, instance)
    return sigmoid(in_value)


def predict(weights, instance):
    """Predict the label of an instance. If the output is >= 0.5, return 1; otherwise, return 0."""
    return 1 if output(weights, instance) >= 0.5 else 0


def get_accuracy(weights, instances):
    """Calculate the accuracy of the model by comparing predictions with the true labels."""
    correct = sum([1 if predict(weights, instance) == instance[-1] else 0
                   for instance in instances])
    return correct * 100 / len(instances)


def train_perceptron(instances, lr, epochs):
    """Train a perceptron using the given instances, learning rate (lr), and number of epochs."""
    weights = [0] * (len(instances[0]) - 1)  # Initialize weights (one less than the number of columns)

    for _ in range(epochs):
        for instance in instances:
            in_value = dot_product(weights, instance)  # Compute the dot product
            out_value = sigmoid(in_value)  # Apply sigmoid to get output
            error = instance[-1] - out_value  # Compute error (label - predicted value)

            # Update weights using the gradient descent rule
            for i in range(len(weights)):
                weights[i] += lr * error * out_value * (1 - out_value) * instance[i]  # Weight update rule

    return weights
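
The weight update in `train_perceptron` is consistent with gradient descent on a squared-error loss. The code does not name its loss, so the derivation below is a sketch under that assumption. Here $y$ is the label (`instance[-1]`), $o$ is the sigmoid output (`out_value`), $x_i$ are the inputs (`instance[i]`), and $\eta$ is the learning rate (`lr`):

$$
E = \tfrac{1}{2}(y - o)^2, \qquad o = \sigma(z), \qquad z = \sum_i w_i x_i
$$

$$
\frac{\partial E}{\partial w_i} = -(y - o)\,\sigma'(z)\,x_i = -(y - o)\,o\,(1 - o)\,x_i
$$

$$
w_i \leftarrow w_i - \eta\,\frac{\partial E}{\partial w_i} = w_i + \eta\,(y - o)\,o\,(1 - o)\,x_i
$$

The last line has the same form as `weights[i] += lr * error * out_value * (1 - out_value) * instance[i]`, with `error` equal to $y - o$.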


## Run it

In [7]:
# Load the data
instances_tr = read_data("train.dat")
instances_te = read_data("test.dat")

# Hyperparameters
lr = 0.005
epochs = 5

# Train the perceptron
weights = train_perceptron(instances_tr, lr, epochs)

# Calculate accuracy
accuracy = get_accuracy(weights, instances_te)

# Output the results
print(f"#tr: {len(instances_tr):3}, epochs: {epochs:3}, learning rate: {lr:.3f}; "
      f"Accuracy (test, {len(instances_te)} instances): {accuracy:.1f}%")


#tr: 400, epochs:   5, learning rate: 0.005; Accuracy (test, 14 instances): 71.4%


## Questions

Answer the following questions. Include your implementation and the output for each question.



### Question 1

In `train_perceptron(instances, lr, epochs)`, we have the following code:
```
in_value = dot_product(weights, instance)
output = sigmoid(in_value)
error = instance[-1] - output
```

Why don't we have the following code snippet instead?
```
output = predict(weights, instance)
error = instance[-1] - output
```

#### TODO Add your answer here (text only)




### Answer 1
`train_perceptron(instances, lr, epochs)` updates the weights by gradient descent, which needs the continuous sigmoid output rather than a hard 0/1 prediction.
1. The sigmoid function produces a continuous value between 0 and 1, so the error changes gradually as the weights change, which allows smoother weight updates.
2. `predict(weights, instance)` returns only 0 or 1. If that hard output were also used in the derivative term `out_value * (1 - out_value)`, the term would always be 0 and the weights would never change.

The second snippet would therefore break gradient descent, which is why it is not used.
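
The difference can be checked with a single update step. The sketch below is illustrative only: `one_update` and its `hard_output` flag are hypothetical helpers, not part of the assignment code, and the instance is made up. It applies the same update rule as `train_perceptron` once with the sigmoid output and once with a thresholded 0/1 output.

```python
import math


def sigmoid(x):
    """Sigmoid activation, as in the model code."""
    return 1 / (1 + math.exp(-x))


def one_update(weights, instance, lr, hard_output):
    """Apply one weight update for a single instance.

    hard_output=False uses the sigmoid output (as in train_perceptron);
    hard_output=True uses a thresholded 0/1 output (hypothetical variant).
    """
    # zip() stops at the end of weights, so the label column is skipped
    in_value = sum(w * x for w, x in zip(weights, instance))
    out_value = sigmoid(in_value)
    if hard_output:
        out_value = 1 if out_value >= 0.5 else 0
    error = instance[-1] - out_value
    return [w + lr * error * out_value * (1 - out_value) * x
            for w, x in zip(weights, instance)]


# Made-up instance: bias input -1, two features, label 0
instance = [-1, 1, 0, 0]
start = [0.0, 0.0, 0.0]

soft = one_update(start, instance, lr=0.5, hard_output=False)
hard = one_update(start, instance, lr=0.5, hard_output=True)

print("sigmoid output:", soft)  # weights move away from their start values
print("hard output:   ", hard)  # weights stay at their start values
```

With a hard output, `out_value * (1 - out_value)` is 0 for both possible values of `out_value`, so the update is zero no matter how large the error is.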

### Question 2
Train the perceptron with the following hyperparameters and calculate the accuracy with the test dataset.

```
tr_percent = [5, 10, 25, 50, 75, 100] # percent of the training dataset to train with
num_epochs = [5, 10, 20, 50, 100]              # number of epochs
lr = [0.005, 0.01, 0.05]              # learning rate
```

TODO: Write your code below and include the output at the end of each training loop (NOT AFTER EACH EPOCH)
of your code. The output should look like the following:
```
# tr:  20, epochs:   5, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
# tr:  20, epochs:  10, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
# tr:  20, epochs:  20, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
[and so on for all the combinations]
```
You will get different results with different hyperparameters.

#### TODO Add your answer here (code and output in the format above) 


In [12]:

### code
instances_tr = read_data("train.dat")
instances_te = read_data("test.dat")
tr_percent = [5, 10, 25, 50, 75, 100] # percent of the training dataset to train with
num_epochs = [5, 10, 20, 50, 100]     # number of epochs
lr_array = [0.005, 0.01, 0.05]        # learning rate

for lr in lr_array:
  for tr_size in tr_percent:
    # Take the first tr_size percent of the training instances
    size = round(len(instances_tr) * tr_size / 100)
    pre_instances = instances_tr[0:size]
    for epochs in num_epochs:
      weights = train_perceptron(pre_instances, lr, epochs)
      accuracy = get_accuracy(weights, instances_te)
      # Print once per training run, inside the epochs loop
      print(f"#tr: {len(pre_instances):0}, epochs: {epochs:3}, learning rate: {lr:.3f}; "
            f"Accuracy (test, {len(instances_te)} instances): {accuracy:.1f}")

### output (rows for epochs = 100 only)
#tr: 20, epochs: 100, learning rate: 0.005; Accuracy (test, 14 instances): 85.7
#tr: 40, epochs: 100, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
#tr: 100, epochs: 100, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
#tr: 200, epochs: 100, learning rate: 0.005; Accuracy (test, 14 instances): 85.7
#tr: 300, epochs: 100, learning rate: 0.005; Accuracy (test, 14 instances): 85.7
#tr: 400, epochs: 100, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
#tr: 20, epochs: 100, learning rate: 0.010; Accuracy (test, 14 instances): 42.9
#tr: 40, epochs: 100, learning rate: 0.010; Accuracy (test, 14 instances): 85.7
#tr: 100, epochs: 100, learning rate: 0.010; Accuracy (test, 14 instances): 28.6
#tr: 200, epochs: 100, learning rate: 0.010; Accuracy (test, 14 instances): 85.7
#tr: 300, epochs: 100, learning rate: 0.010; Accuracy (test, 14 instances): 85.7
#tr: 400, epochs: 100, learning rate: 0.010; Accuracy (test, 14 instances): 71.4
#tr: 20, epochs: 100, learning rate: 0.050; Accuracy (test, 14 instances): 42.9
#tr: 40, epochs: 100, learning rate: 0.050; Accuracy (test, 14 instances): 42.9
#tr: 100, epochs: 100, learning rate: 0.050; Accuracy (test, 14 instances): 28.6
#tr: 200, epochs: 100, learning rate: 0.050; Accuracy (test, 14 instances): 85.7
#tr: 300, epochs: 100, learning rate: 0.050; Accuracy (test, 14 instances): 85.7
#tr: 400, epochs: 100, learning rate: 0.050; Accuracy (test, 14 instances): 71.4


### Question 3
Write a couple paragraphs interpreting the results with all the combinations of hyperparameters. Drawing a plot will probably help you make a point. In particular, answer the following:
- A. Do you need to train with all the training dataset to get the highest accuracy with the test dataset?
- B. Why does the second run below obtain worse accuracy than the first one, even though it uses more training data?
```
#tr: 100, epochs:  20, learning rate: 0.050; Accuracy (test, 100 instances): 71.0
#tr: 200, epochs:  20, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
```
- C. Can you get higher accuracy with additional hyperparameters (higher than `80.0`)?
- D. Is it always worth training for more epochs (while keeping all other hyperparameters fixed)?

#### TODO: Add your answer here (code and text)



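### Answer 3
#### A
No, training with the full dataset is not always necessary. In the 100-epoch runs above, the first 20 training instances with a learning rate of 0.005 reach 85.7% test accuracy, while all 400 instances reach only 71.4% at every learning rate. A good combination of training percentage and hyperparameters can therefore match or beat training on the complete dataset. Note that the test set has only 14 instances, so a single instance changes the accuracy by about 7 points.
#### B
The two runs differ in learning rate as well as in training-set size. The second run makes twice as many updates per epoch (200 instances instead of 100), but its learning rate is ten times smaller (0.005 instead of 0.050), so each update moves the weights far less. Within 20 epochs the second run may therefore not get as close to a good minimum as the first, and the extra training data does not make up for the smaller steps.
#### C
Yes. Several combinations above already exceed 80.0 and reach 85.7%, for example 200 or 300 training instances with 100 epochs at any of the three learning rates. It may be possible to go higher by tuning the learning rate and the number of epochs further, or by adding techniques such as regularization, batch normalization, or a different activation function.
#### D
No. Training for more epochs is not always beneficial: it increases training time, gives diminishing returns once the weights have converged, and can hurt test accuracy through overfitting.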
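
The Question 2 output can also be summarized in plain Python, without a plotting library. The sketch below copies the 18 recorded accuracies (all from 100-epoch runs) into a dictionary; the names `results`, `N_TEST`, and `correct_count` are illustrative and not part of the assignment code. It reports which training sizes reach the best accuracy and converts percentages back to counts of correct test instances.

```python
# (training instances, learning rate) -> test accuracy in percent,
# copied from the recorded Question 2 output (all rows are for 100 epochs)
results = {
    (20, 0.005): 85.7, (40, 0.005): 71.4, (100, 0.005): 71.4,
    (200, 0.005): 85.7, (300, 0.005): 85.7, (400, 0.005): 71.4,
    (20, 0.010): 42.9, (40, 0.010): 85.7, (100, 0.010): 28.6,
    (200, 0.010): 85.7, (300, 0.010): 85.7, (400, 0.010): 71.4,
    (20, 0.050): 42.9, (40, 0.050): 42.9, (100, 0.050): 28.6,
    (200, 0.050): 85.7, (300, 0.050): 85.7, (400, 0.050): 71.4,
}
N_TEST = 14  # number of test instances reported in the output


def correct_count(accuracy):
    """Convert an accuracy in percent back to a count of correct test instances."""
    return round(accuracy * N_TEST / 100)


best = max(results.values())
# Training sizes that reach the best accuracy for at least one learning rate
best_sizes = sorted({size for (size, lr), acc in results.items() if acc == best})
# Accuracy of the full training set (400 instances) for each learning rate
full_data = {lr: acc for (size, lr), acc in results.items() if size == 400}

print(f"best accuracy: {best}% ({correct_count(best)}/{N_TEST} correct)")
print(f"training sizes reaching it: {best_sizes}")
for lr, acc in sorted(full_data.items()):
    print(f"400 instances, lr {lr:.3f}: {acc}% ({correct_count(acc)}/{N_TEST} correct)")
```

Because the test set is so small, the gap between 85.7% and 71.4% corresponds to only two test instances, which is worth keeping in mind when comparing hyperparameter settings.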