<a href="https://colab.research.google.com/github/joshuahurd515/ai-and-data-science-work/blob/main/P4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import imdb
from sklearn.model_selection import cross_val_score, KFold
# tensorflow.keras.wrappers.scikit_learn was deprecated and then removed in TF 2.12;
# SciKeras (pip install scikeras) provides the maintained scikit-learn wrapper.
from scikeras.wrappers import KerasClassifier

# Load the dataset, keeping only the 10,000 most frequent words
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)

# Pad (or truncate) every review to a fixed length of 500 word indices
max_length = 500
x_train = keras.utils.pad_sequences(x_train, maxlen=max_length)
x_test = keras.utils.pad_sequences(x_test, maxlen=max_length)

# Build and compile a fresh model; the wrapper calls this once per fold
def create_model():
    model = keras.Sequential([
        keras.Input(shape=(max_length,)),
        layers.Embedding(input_dim=10000, output_dim=16),   # word index -> 16-dim vector
        layers.GlobalAveragePooling1D(),                     # average over the 500 positions
        layers.Dense(units=16, activation='relu'),
        layers.Dense(units=1, activation='sigmoid')          # probability of a positive review
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

# Wrap the model so scikit-learn can train and score it (SciKeras uses model=, not build_fn=)
model = KerasClassifier(model=create_model, epochs=10, batch_size=32, verbose=0)

# Pool the training and test sets (50,000 reviews) for cross-validation
x = np.concatenate((x_train, x_test), axis=0)
y = np.concatenate((y_train, y_test), axis=0)

# Perform 5-fold cross-validation and print the mean accuracy (standard deviation)
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
results = cross_val_score(model, x, y, cv=kfold)
print("Cross-validation accuracy: %.4f%% (%.4f%%)" % (results.mean()*100, results.std()*100))



Cross-validation accuracy: 88.8900% (0.3925%)
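The printed figure is the mean and standard deviation of the five per-fold accuracies that `cross_val_score` returns. The sketch below, assuming the 50,000 pooled reviews, reuses the same `KFold` settings to show how the folds partition the data. The per-fold scores at the end are made-up placeholder values, not results from this model.

```python
import numpy as np
from sklearn.model_selection import KFold

# Same splitter settings as the notebook cell
n_samples = 50000  # 25,000 train + 25,000 test reviews after concatenation
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

indices = np.arange(n_samples)
test_counts = np.zeros(n_samples, dtype=int)  # how often each review is held out
fold_sizes = []
for train_idx, test_idx in kfold.split(indices):
    # Within a fold, no review is used for both training and scoring
    assert len(np.intersect1d(train_idx, test_idx)) == 0
    test_counts[test_idx] += 1
    fold_sizes.append((len(train_idx), len(test_idx)))

print(fold_sizes)                 # every fold: 40000 train / 10000 held out
print(np.all(test_counts == 1))   # every review is held out exactly once

# cross_val_score returns one accuracy per fold; the notebook prints
# mean*100 and std*100 of that array. Placeholder scores (illustrative only):
fold_scores = np.array([0.80, 0.90, 0.85, 0.85, 0.85])
print("Cross-validation accuracy: %.4f%% (%.4f%%)"
      % (fold_scores.mean() * 100, fold_scores.std() * 100))
```

NumPy's `std` defaults to `ddof=0` (population standard deviation), which is what `results.std()` in the notebook computes, so the value in parentheses is a spread in percentage points across the folds.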


**Evaluation**

The code above builds a binary sentiment classifier for the IMDB movie review dataset. An Embedding layer maps each word index to a 16-dimensional vector, a GlobalAveragePooling1D layer averages those vectors across the 500 sequence positions to produce a single 16-dimensional vector per review, and two fully connected layers follow: 16 hidden units with ReLU activation and one sigmoid output unit that gives the probability that a review is positive. The model is compiled with the Adam optimizer and binary cross-entropy loss and trained for 10 epochs with a batch size of 32. Rather than being scored once on the test set, the model is evaluated with 5-fold cross-validation over the pooled 50,000 training and test reviews, so there is no separate held-out test score, and only accuracy is reported: a mean of 88.89% with a standard deviation of 0.39 percentage points across the folds. Overall, the model performed well and consistently across the five folds. Its accuracy might be improved further by changing the activation functions, adding more layers, or using a different number of hidden units.
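To make the layer-by-layer description concrete, here is a minimal NumPy sketch of the same forward pass and loss. It assumes randomly initialised weights in place of trained ones, and the names `E`, `W1`, `b1`, `W2`, `b2`, `forward`, and `binary_crossentropy` are hypothetical; it is meant to show the tensor shapes and the averaging step, not to reproduce Keras exactly.

```python
import numpy as np

# Sizes taken from the notebook: vocabulary 10000, 16-dim embeddings,
# sequences padded to 500, 16 hidden units
rng = np.random.default_rng(0)
vocab_size, embed_dim, max_length, hidden = 10000, 16, 500, 16

# Hypothetical random weights standing in for trained ones
E = rng.normal(scale=0.05, size=(vocab_size, embed_dim))   # embedding table
W1 = rng.normal(scale=0.1, size=(embed_dim, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(hidden, 1))
b2 = np.zeros(1)

def forward(batch):
    """batch: int array of shape (n, max_length) holding word indices."""
    x = E[batch]                          # (n, 500, 16): embedding lookup
    x = x.mean(axis=1)                    # (n, 16): average over the 500 positions
    h = np.maximum(0.0, x @ W1 + b1)      # Dense(16, relu)
    logits = h @ W2 + b2                  # Dense(1) before activation
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid -> P(positive review)

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    p = np.clip(y_prob, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

batch = rng.integers(0, vocab_size, size=(4, max_length))
probs = forward(batch)
print(probs.shape)  # (4, 1)
print(binary_crossentropy(np.array([[1.0], [0.0], [1.0], [0.0]]), probs))
```

Because the Embedding layer in the notebook does not set `mask_zero=True`, the padding index 0 is treated like any other word, so padded positions are included in the average; the sketch's plain `mean(axis=1)` behaves the same way.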

It is also important to note that my computer kept running into problems when I tried to download the dataset from Stanford. I know that we were not supposed to use the Keras version of the dataset, but I had no other way to proceed. If there is any way I could resubmit the assignment and meet with someone, that would be great; if not, I understand. Even with the Keras dataset, the model still performed fairly well. As I stated above, it could be improved by fine-tuning it and adjusting some of the hyperparameters, but all in all, it performed well.