## **CS-340 Assignment 3**
### **Eric Wallace**

In [1]:
import keras
from keras import backend as K
from keras.datasets import cifar10
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.optimizers import SGD, Adam, RMSprop, Adadelta
from keras.preprocessing.image import ImageDataGenerator
import keras
import matplotlib
from matplotlib import pyplot as plt
import numpy as np

# CIFAR_10 is a set of 60K images 32x32 pixels on 3 channels
IMG_CHANNELS = 3
IMG_ROWS = 32
IMG_COLS = 32
NUM_TO_AUGMENT = 5

#constant
BATCH_SIZE = 128
NB_EPOCH = 20
NB_CLASSES = 10
VERBOSE = 1
VALIDATION_SPLIT = 0.2
OPTIM = RMSprop()

#load dataset
(X_train, y_train), (X_test, y_test) = cifar10.load_data() 

# convert to categorical
Y_train = np_utils.to_categorical(y_train, NB_CLASSES)
Y_test = np_utils.to_categorical(y_test, NB_CLASSES) 

# float and normalization
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

X_train /= 255
X_test /= 255

model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
input_shape=(IMG_ROWS, IMG_COLS, IMG_CHANNELS)))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(NB_CLASSES))
model.add(Activation('softmax'))

model.summary()

model.compile(loss='categorical_crossentropy', optimizer=OPTIM,
	metrics=['accuracy'])

# network
datagen = ImageDataGenerator(
    rotation_range=40,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.2,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.2,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,  # randomly flip images
    vertical_flip=True,
    fill_mode='nearest')  # randomly flip images

datagen.fit(X_train)

for x_aug in datagen.flow(X_train, Y_train, batch_size=9):
    for i in range(0, 9):
        plt.subplot(330 + 1 + i)
        plt.imshow(X_train[i], cmap=plt.get_cmap('gray'))
    plt.show()
    break
          

history = history = model.fit(X_train, Y_train, batch_size=BATCH_SIZE,
	epochs=NB_EPOCH, validation_split=VALIDATION_SPLIT, 
	verbose=VERBOSE)
print('Testing...')
score = model.evaluate(X_test, Y_test, batch_size=BATCH_SIZE, verbose=VERBOSE)
print("\nTest score:", score[0])
print('Test accuracy:', score[1])

#save model
model_json = model.to_json()
open('cifar10_architecture.json', 'w').write(model_json)
model.save_weights('cifar10_weights.h5', overwrite=True)

# list all data in history
print(history.history.keys())
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

Using TensorFlow backend.


Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 32, 32, 32)        896       
_________________________________________________________________
activation_1 (Activation)    (None, 32, 32, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 32, 32, 32)        9248      
_________________________________________________________________
activation_2 (Activation)    (None, 32, 32, 32)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 32)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 16, 16, 64)       

<Figure size 640x480 with 9 Axes>

Train on 40000 samples, validate on 10000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Testing...

Test score: 0.8544288459777832
Test accuracy: 0.7839999794960022
dict_keys(['val_loss', 'val_accuracy', 'loss', 'accuracy'])


KeyError: 'acc'

The ethical and privacy concerns exposed by this algorithm are very possible.

In reference to the question of could this algorithm be used for photos of people's faces and the answer is yes.  As outlined in my discussion post for this week, the situation produced by this algorithm may be very similar to the bias created by the Google Photos application in 2015 One of the most notable ethical and privacy implications in AI is data input and algorithms, they are two of the most common entry points for bias in AI.  Data used to train from real-world scenarios and created by humans will inexplicably contain bias (Niral Sutaria, 2022).  Regarding the algorithm I created, possible biases that could produce ethical or privacy concerns could be the photos used to train the AI did not include enough diversity which could result in the incorrect categorization of photos.  For instance, if test data was not diverse enough, the AI could incorrectly identify men with long hair as women and women with short hair as men.

Another issue that could present ethical concerns is the fact so much focus is given to the performance of the algorithms or data scientists or engineers do not have the ability to identify bias which could raise ethical concerns (Niral Sutaria, 2022).  One thing that I noticed myself while reading the content in the book for this course is how much focus is given to the performance and accuracy of the algorithms.  I understand performance and accuracy is a good metric to go from but how accurate is the algorithms if the test data contains bias or is not diverse enough to correctly categorize photos in totality.  If the test data is not broad enough to cover the entire spectrum of photos it must categorize it could introduce ethical concerns.

Addressing privacy concerns is a little bit different.  In my opinion the introduction of privacy concerns come as a result of oversight.  Not stripping away personal information and metadata related to photos must be done or at the very least randomize the data contained within the metadata.  The privacy concerns raised by AI is very large, there is an ongoing debate right now as to move forward with legislation to address privacy concerns (Kerry, 2020).  The article also raises one of the major concerns with privacy concerns in relation to photos and that is facial recognitions.  What is an Ai violates your privacy by illegally accessing photos in efforts to identify a person by facial recognition and what other concerns does it raise due to incorrectly identifying someone (Kerry, 2020).

The last thing I would like to cover with privacy concerns with photos.  What is the test data used to train an AI in which no personal information was divulged but the random data was indeed biased.  This could lead AI would infer sensitive information related to medical conditions or political views based on a person’s figure or location (Machine Learning, 2022).  For instance, what if an AI made the inferences that a person based on a photo had a heart condition or because they are from the south, they are likely to be Republican.

The ethical and privacy concerns raised by AI are real and is something that must be addressed.  Whether it be through a set of standards or regulations has yet to be seen.  A company creating an AI that deploys software and they miss something so simple like what happened with the Google Photos app in 2015, there should be repercussions.  No one can convince me that the issue was not found prior to the launch during the testing of the AI, not something that simple and easy to introduce.  On the future will tell where things go and how the issues will be addressed.


References

Kerry, C. (2020). Protecting privacy in an AI-driven world. Brookings.edu. https://www.brookings.edu/research/protecting-privacy-in-an-ai-driven-world/

Machine Learning. (2022). What are the key privacy concerns associated with machine learning? LinkedIN. https://www.linkedin.com/pulse/what-key-privacy-concerns-associated-machine-/

Niral Sutaria, C., ACA. (2022). Bias and ethical concerns in machine learning. https://www.isaca.org/resources/isaca-journal/issues/2022/volume-4/bias-and-ethical-concerns-in-machine-learning