# MinatKu Model

This Jupyter Notebook trains a neural network (the MinatKu Model) with TensorFlow and Keras to predict academic interests from responses to a 12-question RIASEC survey. The model is trained on the dataset in 'RIASEC12Q.csv'.

**Steps:**
1. **Load Data:**
   - Loads the dataset from 'RIASEC12Q.csv'.
   - If the file is not found, downloads it from Google Drive with `gdown` and retries.

2. **Data Preprocessing:**
   - Drops the 'Unnamed: 0' column.
   - Converts categorical labels to numerical indices.

3. **Data Splitting and Normalization:**
   - Separates features and labels without shuffling.
   - Normalizes each feature column to unit L2 norm using sklearn's `normalize` function.
   - Splits the data 70/30 into training and testing sets, in row order.

4. **One-Hot Encoding:**
   - Converts the integer class indices to one-hot vectors using the `to_categorical` function.

5. **Model Architecture:**
   - Builds a sequential Keras model with two dense ReLU layers (128 and 64 units), a dropout layer, and a softmax output layer.

6. **Model Training:**
   - Compiles and trains the model using the training data.
   - Uses a custom callback that stops training once accuracy and loss thresholds are met on both the training and validation data.

7. **Predictions:**
   - Makes predictions for three different interest categories: Science, Arts and Literature, Technology.
   - Displays top 3 predictions for each category.

8. **Save Model:**
   - Saves the trained model to '/content/model.h5'.



### Initialization
This code cell imports the libraries used throughout the MinatKu Model notebook.

In [1]:
import os
import numpy as np
import pandas as pd
import tensorflow as tf
import math
from IPython.core.magic import register_line_magic
from IPython.display import Javascript

### Load Dataset

This code cell attempts to load the dataset from 'RIASEC12Q.csv'. If the file is not found, it downloads the dataset from Google Drive with `gdown` and tries to load it again.

In [2]:
file_path = '/content/RIASEC12Q.csv'
try:
  df = pd.read_csv(file_path)
  print("File Loaded!")
except FileNotFoundError:
  # File is missing: fetch it from Google Drive, then retry once
  !gdown --id 1Dd6iz13sbZ2Q7IhuwgYS_Y-FzIYoLd7g
  try:
    df = pd.read_csv(file_path)
    print("File Loaded!")
  except FileNotFoundError:
    print("Something's wrong! Check your path!")

Downloading...
From: https://drive.google.com/uc?id=1Dd6iz13sbZ2Q7IhuwgYS_Y-FzIYoLd7g
To: /content/RIASEC12Q.csv
100% 5.03k/5.03k [00:00<00:00, 21.2MB/s]
File Loaded!


In [3]:
df.head(n = 10)

Unnamed: 0.1,Unnamed: 0,Jurusan,R1,R2,I1,I2,A1,A2,S1,S2,E1,E2,C1,C2
0,0,Science,0,0,1,1,0,0,2,2,0,0,2,2
1,1,Science,0,0,1,1,0,0,2,2,0,0,2,2
2,2,Arts and Literature,0,0,0,0,1,1,2,2,0,0,0,0
3,3,Economics,0,0,0,0,0,0,0,0,1,1,1,0
4,4,Technology,1,1,0,0,0,0,0,0,0,0,1,1
5,5,Social,0,0,0,0,0,0,1,1,0,0,0,0
6,6,Arts and Literature,0,0,0,0,1,1,2,2,0,0,0,0
7,7,Economics,0,0,0,0,0,0,0,0,1,1,1,0
8,8,Technology,1,1,0,0,0,0,0,0,0,0,1,1
9,9,Social,0,0,0,0,0,0,1,1,0,0,0,0


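### Drop Unused Column
This code cell drops the 'Unnamed: 0' column, which only repeats the row number and is not a survey feature.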

In [4]:
# Delete the unused column
try:
  df = df.drop(columns=['Unnamed: 0'])
  print("Columns Dropped!")
except KeyError:
  # drop() raises KeyError when the column is no longer present
  print("Columns Already Dropped!")

Columns Dropped!


In [5]:
df.head()

Unnamed: 0,Jurusan,R1,R2,I1,I2,A1,A2,S1,S2,E1,E2,C1,C2
0,Science,0,0,1,1,0,0,2,2,0,0,2,2
1,Science,0,0,1,1,0,0,2,2,0,0,2,2
2,Arts and Literature,0,0,0,0,1,1,2,2,0,0,0,0
3,Economics,0,0,0,0,0,0,0,0,1,1,1,0
4,Technology,1,1,0,0,0,0,0,0,0,0,1,1


### Indexing Label
This code cell sets the label column ('Jurusan', Indonesian for field of study) and initializes `classes` with the integer placeholder `[0]`, which marks the labels as not yet indexed.

In [6]:
label = "Jurusan"
classes = [0]

### Map Label Classes
This code cell checks whether `classes` still holds the integer placeholder. If so, it replaces `classes` with the unique values of the 'Jurusan' column and maps each label to its index in that list.

In [7]:
if isinstance(classes[0], int):
  # Placeholder still in place: collect the class names and index the labels
  classes = df[label].unique().tolist()
  df[label] = df[label].map(classes.index)
  print(f"Label classes: {classes}")
else:
  print("Label classes already indexed!")
  print(f"Label classes: {classes}")

Label classes: ['Science', 'Arts and Literature', 'Economics', 'Technology', 'Social']
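
The mapping step can be sketched in plain Python, without pandas. The helper names `build_classes` and `encode` are illustrative and not part of the notebook; the sample labels are those of the first five rows in the dataset preview.

```python
def build_classes(labels):
    """Return unique labels in order of first appearance."""
    seen = []
    for name in labels:
        if name not in seen:
            seen.append(name)
    return seen


def encode(labels, classes):
    """Replace each label by its position in `classes`."""
    return [classes.index(name) for name in labels]


# Labels of the first five rows of the dataset preview.
sample = ["Science", "Science", "Arts and Literature", "Economics", "Technology"]
classes_demo = build_classes(sample)
encoded = encode(sample, classes_demo)
print(classes_demo)
print(encoded)  # [0, 0, 1, 2, 3]

# Decoding is a plain list lookup, which is how class indices become names again.
decoded = [classes_demo[i] for i in encoded]
```

Because the index of a class depends on the order in which labels first appear, the `classes` list has to be kept alongside the model to interpret its predictions.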


In [8]:
df.head()

Unnamed: 0,Jurusan,R1,R2,I1,I2,A1,A2,S1,S2,E1,E2,C1,C2
0,0,0,0,1,1,0,0,2,2,0,0,2,2
1,0,0,0,1,1,0,0,2,2,0,0,2,2
2,1,0,0,0,0,1,1,2,2,0,0,0,0
3,2,0,0,0,0,0,0,0,0,1,1,1,0
4,3,1,1,0,0,0,0,0,0,0,0,1,1


### Separate Labels and Features
This code cell separates the dataset into features (X) and labels (y) without randomization.

In [9]:
# separate labels and features without shuffling
X = df.drop("Jurusan", axis=1).values
y = df.Jurusan.values

### Normalize Features
This code cell normalizes the features with sklearn's `normalize` function. With `axis = 0` and `norm='l2'`, each feature column of `X` is scaled to unit L2 norm.

In [11]:
# normalization
from sklearn.preprocessing import normalize
X_normalized = normalize(X, axis = 0, norm='l2')
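
As a rough illustration of what column-wise L2 normalization computes, here is a plain-Python sketch. The function `l2_normalize_columns` and the toy matrix are assumptions made for this example; the sketch is a stand-in for the idea, not sklearn's implementation.

```python
import math


def l2_normalize_columns(rows):
    """Scale each column so its L2 norm is 1; all-zero columns are left unchanged."""
    n_cols = len(rows[0])
    norms = []
    for j in range(n_cols):
        norm = math.sqrt(sum(row[j] ** 2 for row in rows))
        norms.append(norm if norm > 0 else 1.0)
    return [[row[j] / norms[j] for j in range(n_cols)] for row in rows]


# Toy matrix: 3 samples, 2 features (values are made up for illustration).
toy = [[3.0, 0.0],
       [4.0, 0.0],
       [0.0, 0.0]]
scaled = l2_normalize_columns(toy)
print(scaled)  # first column becomes 0.6, 0.8, 0.0; the zero column stays zero
```

The scale of each column depends on every row in the matrix, so the result changes with the set of rows that is passed in.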

### Split Data into Training and Testing Sets
This code cell splits the normalized features and labels into a training set (the first 70% of rows) and a testing set (the remaining rows), then prints the size of each set.

In [12]:
# splitting data
total_length = len(df)
train_length = int(0.7 * total_length)
test_length = int(0.3 * total_length)

X_train = X_normalized[:train_length]
X_test = X_normalized[train_length:]
y_train = y[:train_length]
y_test = y[train_length:]

print("Length of train set x:", X_train.shape[0], "y:", y_train.shape[0])
print("Length of test set x:", X_test.shape[0], "y:", y_test.shape[0])

Length of train set x: 91 y: 91
Length of test set x: 39 y: 39
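
The split is purely positional, so the class balance of each set depends on the row order of the file. The sketch below reproduces the sizes with a stand-in list of 130 rows and, for comparison, shows a seeded shuffled split. `sequential_split` and `shuffled_split` are hypothetical helpers; the shuffled variant is an alternative to consider, not what the notebook does.

```python
import random


def sequential_split(items, train_fraction=0.7):
    """Split in row order, as the notebook cell does."""
    cut = int(train_fraction * len(items))
    return items[:cut], items[cut:]


def shuffled_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy with a fixed seed, then split (hypothetical alternative)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return sequential_split(shuffled, train_fraction)


rows = list(range(130))  # stand-in for the 130 dataset rows
train, test = sequential_split(rows)
print(len(train), len(test))  # 91 39

train_s, test_s = shuffled_split(rows)
```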


### Convert Labels to Categorical
This code cell converts the integer class indices (`y_train` and `y_test`) to one-hot encoded vectors using Keras' `to_categorical` function.

In [13]:
from keras.utils import to_categorical
y_train = to_categorical(y_train, num_classes=len(classes))
y_test = to_categorical(y_test, num_classes=len(classes))
print("Shape of y_train",y_train.shape)
print("Shape of y_test",y_test.shape)

Shape of y_train (91, 5)
Shape of y_test (39, 5)
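
For readers unfamiliar with one-hot encoding, the following plain-Python sketch shows the transformation on three class indices. The `one_hot` helper is illustrative only and is not the Keras implementation.

```python
def one_hot(indices, num_classes):
    """Return one row per label with a 1.0 at the label's index."""
    encoded = []
    for idx in indices:
        row = [0.0] * num_classes
        row[idx] = 1.0
        encoded.append(row)
    return encoded


# Class indices 0, 1 and 3 correspond to Science, Arts and Literature, Technology.
vectors = one_hot([0, 1, 3], num_classes=5)
for v in vectors:
    print(v)
```

Each row has as many entries as there are classes, which is why the label arrays above have 5 columns.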


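### MinatKu Model Architecture
This code cell defines and compiles the MinatKu Model with Keras. It is a sequential network with two dense ReLU layers (128 and 64 units, the first with L2 weight regularization), a dropout layer with rate 0.6, and a softmax output with one unit per class. The model is compiled with the Adam optimizer and categorical cross-entropy loss.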

In [14]:
from keras.models import Sequential
from keras.layers import Dense,Activation,Dropout
from keras import regularizers
from keras.optimizers import Adam

shape = X_train.shape[1]  # number of input features

model=Sequential()
# Hidden layer 1: 128 ReLU units with L2 weight regularization
model.add(Dense(128,input_dim=shape,activation='relu',kernel_regularizer=regularizers.l2(0.001)))
# Hidden layer 2, followed by dropout to reduce overfitting
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.6))
# model.add(Dense(32,activation='relu'))
# model.add(Dropout(0.2))
# Output layer: one softmax unit per class
model.add(Dense(len(classes),activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
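
To get a feel for the size of this network, the trainable parameters can be counted by hand. The sketch below assumes 12 input features (one per survey column R1 to C2) and 5 classes; both numbers are read off the outputs shown earlier rather than from the model itself. Dropout adds no parameters.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


n_features = 12   # assumption: one input per survey column R1..C2
n_classes = 5     # length of the printed class list

layer_sizes = [(n_features, 128), (128, 64), (64, n_classes)]
counts = [dense_params(i, o) for i, o in layer_sizes]
total = sum(counts)
print(counts, total)  # [1664, 8256, 325] 10245
```

Under these assumptions the model has roughly ten thousand parameters for 91 training rows, which is one reason the regularization and dropout layers are relevant here.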

### Custom Callback for Early Stopping
This code cell defines a custom callback named `cb` using TensorFlow's Keras API. At the end of each epoch, it stops training once training accuracy exceeds 0.90, validation accuracy exceeds 0.80, and both training and validation loss are below 0.4.

In [15]:
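class cb(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    logs = logs or {}  # avoid a mutable default argument
    acc, val_acc = logs.get('accuracy'), logs.get('val_accuracy')
    loss, val_loss = logs.get('loss'), logs.get('val_loss')
    if None in (acc, val_acc, loss, val_loss):
      return  # a metric is missing (e.g. no validation data), keep training
    if acc > 0.90 and val_loss < 0.4 and loss < 0.4 and val_acc > 0.80:
      self.model.stop_training = True

callbacks = cb()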
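
The stopping rule itself can be isolated as a small pure function, which makes the thresholds easy to check without running a training loop. `should_stop` is a hypothetical helper and the metric values below are made up for illustration.

```python
def should_stop(logs):
    """Return True when all four thresholds from the callback are met."""
    acc = logs.get("accuracy")
    loss = logs.get("loss")
    val_acc = logs.get("val_accuracy")
    val_loss = logs.get("val_loss")
    if None in (acc, loss, val_acc, val_loss):
        return False  # a metric is missing, so keep training
    return acc > 0.90 and loss < 0.4 and val_acc > 0.80 and val_loss < 0.4


# Made-up metric values for illustration only.
good = should_stop({"accuracy": 0.95, "loss": 0.30, "val_accuracy": 0.85, "val_loss": 0.35})
low_val_acc = should_stop({"accuracy": 0.95, "loss": 0.30, "val_accuracy": 0.75, "val_loss": 0.35})
no_validation = should_stop({"accuracy": 0.95, "loss": 0.30})
print(good, low_val_acc, no_validation)  # True False False
```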

### Train MinatKu Model
This code cell trains the MinatKu Model using the training data (`X_train`, `y_train`) and validates it on the testing data (`X_test`, `y_test`). It specifies batch size, number of epochs, verbosity, and includes the custom callback for early stopping.

In [16]:
model.fit(X_train,y_train,validation_data=(X_test,y_test),batch_size=20,epochs=350,verbose=1, callbacks=[callbacks])

Epoch 1/350
...
Epoch 78

<keras.src.callbacks.History at 0x7cda180d7c10>
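
As a quick piece of arithmetic on these training settings: with 91 training rows and `batch_size=20`, each epoch consists of five weight updates, the last one on a smaller batch. The sketch below only uses numbers that appear in the cells above.

```python
import math

n_train = 91       # training rows reported by the split cell
batch_size = 20
max_epochs = 350

steps_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (steps_per_epoch - 1) * batch_size
max_updates = steps_per_epoch * max_epochs
print(steps_per_epoch, last_batch, max_updates)  # 5 11 1750
```

The 1750 updates are an upper bound; the early-stopping callback can end training well before epoch 350.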

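### Predictions for Specific Interest Categories
This code cell defines sample answer vectors for three interest categories ('Science', 'Arts and Literature', 'Technology'). Each list holds 12 answers in the column order R1, R2, I1, I2, A1, A2, S1, S2, E1, E2, C1, C2 and matches a row of that class in the dataset preview.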

In [17]:
science = [0, 0, 1, 1, 0, 0, 2, 2, 0, 0, 2, 2]
arts_and_literature = [0, 0, 0, 0, 1, 1, 2, 2, 0, 0, 0, 0]
technology = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
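
One detail is worth checking before the prediction cells. With `axis = 0`, `normalize` scales each column across rows. When the input has a single row, every column holds just one value, so each nonzero answer is divided by its own magnitude. The plain-Python sketch below illustrates this; `normalize_axis0` is an illustrative stand-in, not sklearn's code.

```python
import math


def normalize_axis0(rows):
    """Column-wise L2 scaling; all-zero columns are left unchanged."""
    n_cols = len(rows[0])
    out = [list(row) for row in rows]
    for j in range(n_cols):
        norm = math.sqrt(sum(row[j] ** 2 for row in rows)) or 1.0
        for row in out:
            row[j] = row[j] / norm
    return out


science_demo = [0, 0, 1, 1, 0, 0, 2, 2, 0, 0, 2, 2]
single = normalize_axis0([science_demo])[0]
print(single)  # every nonzero answer becomes 1.0
```

Answers of 1 and 2 are therefore not distinguished at prediction time, whereas the training features were scaled by column norms computed over the whole dataset. Reusing the training column norms at inference would keep the two consistent; that is a possible refinement, not something the notebook does.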

### Predict Interest Categories for 'Science'
This code cell normalizes the 'Science' answer vector, runs it through the MinatKu Model, and prints the predicted class index, the top 3 classes, the full ranking, and the least likely class.


In [18]:
fitur_normalized = normalize([science], axis = 0)
prediction = model.predict([fitur_normalized])

clas = np.argmax(prediction, axis = 1)
clas1 = np.argsort(clas)
print(clas)
# class indices sorted from most to least likely
indextion = np.argsort((-prediction), axis = 1)[0,:3]
top1, top2, top3 = indextion
indextion0 = np.argsort((-prediction), axis = 1)[0]
indextion1 = np.argmax(prediction, axis = 1)
indextion2 = np.argmin(prediction)
print(f"top 3 predictions are : {indextion}")
print(f"full prediction ranking is : {indextion0}")
print(f"most likely prediction is : {indextion1}")
print(f"least likely prediction is : {indextion2}")
print(f"labels of the top 3 predictions are : {classes[top1]}, {classes[top2]}, {classes[top3]}")

[0]
top 3 predictions are : [0 4 3]
full prediction ranking is : [0 4 3 1 2]
most likely prediction is : [0]
least likely prediction is : 2
labels of the top 3 predictions are : Science, Social, Technology
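
The ranking logic in this cell can be sketched without NumPy. The probabilities below are hypothetical, chosen only so that the ordering matches the ranking printed above; the real softmax values are not shown in the notebook. `rank_classes` is an illustrative helper.

```python
def rank_classes(probabilities):
    """Return class indices sorted from most to least likely."""
    return sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)


class_names = ["Science", "Arts and Literature", "Economics", "Technology", "Social"]
# Hypothetical softmax output, chosen to reproduce the ranking printed above.
probs = [0.70, 0.04, 0.01, 0.10, 0.15]

ranking = rank_classes(probs)
top3 = ranking[:3]
top3_labels = [class_names[i] for i in top3]
print(ranking)      # [0, 4, 3, 1, 2]
print(top3_labels)  # ['Science', 'Social', 'Technology']
```

Sorting once and slicing gives the top 3, the full ranking, and the least likely class (the last entry) from a single pass.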


### Predict Interest Categories for 'Arts and Literature'
This code cell normalizes the 'Arts and Literature' answer vector, runs it through the MinatKu Model, and prints the predicted class index, the top 3 classes, the full ranking, and the least likely class.

In [19]:
fitur_normalized = normalize([arts_and_literature], axis = 0)
prediction = model.predict([fitur_normalized])

clas = np.argmax(prediction, axis = 1)
clas1 = np.argsort(clas)
print(clas)
# class indices sorted from most to least likely
indextion = np.argsort((-prediction), axis = 1)[0,:3]
top1, top2, top3 = indextion
indextion0 = np.argsort((-prediction), axis = 1)[0]
indextion1 = np.argmax(prediction, axis = 1)
indextion2 = np.argmin(prediction)
print(f"top 3 predictions are : {indextion}")
print(f"full prediction ranking is : {indextion0}")
print(f"most likely prediction is : {indextion1}")
print(f"least likely prediction is : {indextion2}")
print(f"labels of the top 3 predictions are : {classes[top1]}, {classes[top2]}, {classes[top3]}")

[1]
top 3 predictions are : [1 4 0]
full prediction ranking is : [1 4 0 2 3]
most likely prediction is : [1]
least likely prediction is : 3
labels of the top 3 predictions are : Arts and Literature, Social, Science


### Predict Interest Categories for 'Technology'

This code cell normalizes the 'Technology' answer vector, runs it through the MinatKu Model, and prints the predicted class index, the top 3 classes, the full ranking, and the least likely class.


In [20]:
fitur_normalized = normalize([technology], axis = 0)
prediction = model.predict([fitur_normalized])

clas = np.argmax(prediction, axis = 1)
clas1 = np.argsort(clas)
print(clas)
# class indices sorted from most to least likely
indextion = np.argsort((-prediction), axis = 1)[0,:3]
top1, top2, top3 = indextion
indextion0 = np.argsort((-prediction), axis = 1)[0]
indextion1 = np.argmax(prediction, axis = 1)
indextion2 = np.argmin(prediction)
print(f"top 3 predictions are : {indextion}")
print(f"full prediction ranking is : {indextion0}")
print(f"most likely prediction is : {indextion1}")
print(f"least likely prediction is : {indextion2}")
print(f"labels of the top 3 predictions are : {classes[top1]}, {classes[top2]}, {classes[top3]}")

[3]
top 3 predictions are : [3 4 0]
full prediction ranking is : [3 4 0 2 1]
most likely prediction is : [3]
least likely prediction is : 1
labels of the top 3 predictions are : Technology, Social, Science


### Save MinatKu Model

This code cell saves the trained MinatKu Model to a file named "model.h5" in the "/content/" directory.

In [21]:
model.save("/content/model.h5")



In [22]:
# new_model = tf.keras.models.load_model('/content/model.h5')

In [23]:
# fitur_normalized = normalize([economics], axis = 0)
# pred = new_model.predict([fitur_normalized])
# np.argmax(pred, axis=1)

In [24]:
# fitur_normalized = normalize([science], axis = 0)
# pred = new_model.predict([fitur_normalized])
# np.argmax(pred, axis=1)

In [25]:
# new_model.evaluate(X_test, y_test)

In [26]:
# # Convert the model.
# converter = tf.lite.TFLiteConverter.from_keras_model(model)
# tflite_model = converter.convert()

# # Save the model.
# with open('model_minatku.tflite', 'wb') as f:
#   f.write(tflite_model)