This project uses TensorFlow 2.0 and is meant to run on Google Colab, so the first cell installs the extra packages and selects the TensorFlow 2.x runtime there.

In [None]:
!pip install -q seaborn
!pip install -q git+https://github.com/tensorflow/docs
!pip install -U tensorboard >piplog 2>&1

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

Importing required libraries

In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_docs as tfdocs
import tensorflow_docs.plots
import tensorflow_docs.modeling
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

Here we download the flags dataset from UCI's archive and load it into a pandas DataFrame so we can use it with our model.

In [2]:
data_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/flags/flag.data'

dataset_path = tf.keras.utils.get_file("train.csv", data_url)

column_names = ['Name','Landmass','Zone','Area','Population', 'Language', 
                'Religion', 'Bars', 'Stripes', 'Colours', 'Red', 'Green', 
                'Blue', 'Gold', 'White', 'Black', 'Orange', 'Main Hue',
                'Circles', 'Crosses', 'Saltires', 'Quarters', 'Sunstars',
                'Crescent', 'Triangle', 'Icon', 'Animate', 'Text', 'Top Left',
                'Bottom Right']

raw_dataset = pd.read_csv(dataset_path, names=column_names,
                      na_values = "Unknown", sep=",")

raw_dataset.tail()

Unnamed: 0,Name,Landmass,Zone,Area,Population,Language,Religion,Bars,Stripes,Colours,...,Saltires,Quarters,Sunstars,Crescent,Triangle,Icon,Animate,Text,Top Left,Bottom Right
189,Western-Samoa,6,3,3,0,1,1,0,0,3,...,0,1,5,0,0,0,0,0,blue,red
190,Yugoslavia,3,1,256,22,6,6,0,3,4,...,0,0,1,0,0,0,0,0,blue,red
191,Zaire,4,2,905,28,10,5,0,0,4,...,0,0,0,0,0,1,1,0,green,green
192,Zambia,4,2,753,6,10,5,3,0,4,...,0,0,0,0,0,0,1,0,green,brown
193,Zimbabwe,4,2,391,8,10,5,0,7,5,...,0,0,1,0,1,1,1,0,green,green


We need to drop some columns, such as the country name and landmass, because I believe they have no relationship with a country's religion.
I also chose to drop very specific features and features that could be misleading. (For example, green might appear often on the flags of Muslim countries, but asking about every colour wouldn't be great in my opinion, because this project is a kind of prediction game, as you can see at the end of the code.)

In [3]:
dataset = raw_dataset.copy()

dataset = dataset.drop(['Name', 'Area', 'Landmass', 'Population', 'Bars', 
                        'Red', 'Green', 'Blue', 'Gold', 
                        'White', 'Black', 'Orange', 'Main Hue', 'Circles', 'Crosses', 
                        'Saltires', 'Quarters', 'Crescent', 'Triangle', 
                        'Top Left', 'Bottom Right'], axis = 1)

dataset.tail()

Unnamed: 0,Zone,Language,Religion,Stripes,Colours,Sunstars,Icon,Animate,Text
189,3,1,1,0,3,5,0,0,0
190,1,6,6,3,4,1,0,0,0
191,2,10,5,0,4,0,1,1,0
192,2,10,5,0,4,0,0,1,0
193,2,10,5,7,5,1,1,1,0


In this step we prepare the training data. The Religion column has to be removed from the input features, because the model is supposed to learn how the other columns relate to a country's religion. If we left Religion among the inputs, the model could simply learn "whenever the religion input is x, the output is x", since that would always hold while fitting. We want a model that can ACTUALLY predict instead of just cheating, so Religion is kept only as the label.

In [4]:
train_dataset = dataset
train_dataset = train_dataset.drop('Religion', axis = 1)

train_stats = train_dataset.describe()
train_stats = train_stats.transpose()

train_labels = dataset['Religion']
train_dataset.tail()

Unnamed: 0,Zone,Language,Stripes,Colours,Sunstars,Icon,Animate,Text
189,3,1,0,3,5,0,0,0
190,1,6,3,4,1,0,0,0
191,2,10,0,4,0,1,1,0
192,2,10,0,4,0,0,1,0
193,2,10,7,5,1,1,1,0
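To make the feature/label separation concrete, here is a minimal sketch in plain Python. It uses three rows copied from the `dataset.tail()` output above; the helper name `split_features_labels` is my own and is not part of the notebook.

```python
# Three rows copied from the dataset.tail() output above (rows 189-191).
rows = [
    {"Zone": 3, "Language": 1, "Religion": 1, "Stripes": 0, "Colours": 3,
     "Sunstars": 5, "Icon": 0, "Animate": 0, "Text": 0},
    {"Zone": 1, "Language": 6, "Religion": 6, "Stripes": 3, "Colours": 4,
     "Sunstars": 1, "Icon": 0, "Animate": 0, "Text": 0},
    {"Zone": 2, "Language": 10, "Religion": 5, "Stripes": 0, "Colours": 4,
     "Sunstars": 0, "Icon": 1, "Animate": 1, "Text": 0},
]


def split_features_labels(rows, label_key="Religion"):
    """Return (features, labels) with the label column removed from the features."""
    features = [{k: v for k, v in row.items() if k != label_key} for row in rows]
    labels = [row[label_key] for row in rows]
    return features, labels


features, labels = split_features_labels(rows)
print(labels)             # the values the model should learn to predict
print(list(features[0]))  # the 8 input columns, without Religion
```

The model only ever sees `features` as input, so it cannot copy the answer from the Religion column.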


This model generally reaches about 90% accuracy before all 2500 epochs finish, so a halt callback stops the training as soon as that threshold is reached.

In [5]:
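class haltCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # Some TensorFlow versions log this metric as 'acc' instead of 'accuracy'.
        accuracy = logs.get('accuracy', logs.get('acc'))
        if accuracy is not None and accuracy >= 0.90:
            print("\n\n\nReached 90% accuracy so cancelling training!\n\n\n")
            self.model.stop_training = True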


modelHaltCallback = haltCallback()

We build the model here. I chose Sparse Categorical Crossentropy as the loss because each country's religion is stored as a single integer label with 8 possible categories. (The categories are listed in the flag.names file in this repository. That file, taken from UCI's archive, explains the whole dataset.)
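As a rough illustration of what this loss measures, here is a sketch for a single sample. The probability vector is made up for the example and does not come from the trained model; the function names are my own, not the Keras API.

```python
import math


def sparse_categorical_crossentropy(probs, label):
    """Loss for one sample: -log of the probability assigned to the true class."""
    return -math.log(probs[label])


def categorical_crossentropy(probs, one_hot):
    """Same loss, but with the label given as a one-hot vector."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))


# Hypothetical softmax output over the 8 religion classes (sums to 1).
probs = [0.70, 0.10, 0.05, 0.05, 0.04, 0.03, 0.02, 0.01]

confident = sparse_categorical_crossentropy(probs, 0)  # true class got 0.70
wrong = sparse_categorical_crossentropy(probs, 7)      # true class got 0.01
print(confident, wrong)

# The "sparse" variant takes the integer label directly, so no one-hot encoding is needed.
one_hot = [1, 0, 0, 0, 0, 0, 0, 0]
print(categorical_crossentropy(probs, one_hot))
```

The loss is small when the true class gets a high probability and grows quickly when it gets a low one.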

In [6]:
def build_model():
  model = tf.keras.Sequential([
    layers.Dense(128, input_shape=[len(train_dataset.keys())]),
    layers.Dense(128, activation='relu'),
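    # Note: len(train_dataset) is the number of rows (194), so this layer gets 194 units;
    # the 8 religion classes (labels 0-7) would only need 8.
    layers.Dense(len(train_dataset), activation='softmax')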
  ])

  model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', 
                metrics=['accuracy', 'sparse_categorical_crossentropy'])
  return model

model = build_model()

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 128)               1152      
_________________________________________________________________
dense_1 (Dense)              (None, 128)               16512     
_________________________________________________________________
dense_2 (Dense)              (None, 194)               25026     
=================================================================
Total params: 42,690
Trainable params: 42,690
Non-trainable params: 0
_________________________________________________________________
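
The parameter counts in the summary can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The short sketch below redoes that arithmetic; `dense_params` is just a helper name I made up.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# (inputs, units) for the three Dense layers in the summary above.
layer_sizes = [(8, 128), (128, 128), (128, 194)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layer_sizes]

print(counts)       # per-layer parameter counts
print(sum(counts))  # total parameters

# For comparison: an output layer with one unit per religion class.
print(dense_params(128, 8))
```

The last layer has 194 units because the code passes `len(train_dataset)`, which is the number of rows. An output layer with 8 units would already cover the 8 religion classes.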

We need to specify a log directory so the training data can be stored and uploaded to TensorBoard later on. We also create a TensorBoard callback, which writes the logs in a format that TensorBoard can read directly.

In [7]:
log_dir="logs"
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

Now it's time to fit the model so we can use it to make predictions later on. We keep the return value in a history variable so we can later use the training logs to draw some graphs about our model.
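
The fit call below passes `validation_split = 0.2`. As a sketch of what that means for our 194 rows, the helper below mirrors how I understand Keras to pick the split point (the held-out samples are taken from the end of the data, before any shuffling). The function name is my own.

```python
def validation_split_sizes(n_samples, validation_split):
    """Return (training size, validation size) for a Keras-style split."""
    split_at = int(n_samples * (1.0 - validation_split))
    return split_at, n_samples - split_at


train_size, val_size = validation_split_sizes(194, 0.2)
print(train_size, val_size)
```

Since the held-out rows come from the end of the data, and flag.data appears to be sorted by country name, the validation set would be the last 39 countries alphabetically unless the rows are shuffled first.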

In [8]:
history = model.fit(train_dataset, train_labels, epochs=2500, validation_split = 0.2, callbacks=[tfdocs.modeling.EpochDots(), modelHaltCallback, tensorboard_callback])

Train on 155 samples, validate on 39 samples
Epoch 1/2500
 32/155 [=====>........................] - ETA: 0s - loss: 5.5037 - acc: 0.0000e+00 - sparse_categorical_crossentropy: 5.5037
Epoch: 0, acc:0.0323,  loss:5.1830,  sparse_categorical_crossentropy:5.1830,  val_acc:0.3333,  val_loss:4.1589,  val_sparse_categorical_crossentropy:4.1589,  
.

TypeError: '>=' not supported between instances of 'NoneType' and 'float'
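
The log above reports the accuracy under the key `acc`, so `logs.get('accuracy')` returns `None` and the `>=` comparison fails with the TypeError shown. A minimal sketch of a lookup that accepts either key, using a hypothetical helper `reached_target`:

```python
def reached_target(logs, target=0.90):
    """True if the logged accuracy (under either key) has reached the target."""
    logs = logs or {}
    accuracy = logs.get("accuracy", logs.get("acc"))
    return accuracy is not None and accuracy >= target


# Values taken from the first epoch in the log above.
print(reached_target({"acc": 0.0323, "loss": 5.1830}))
# A later epoch with the newer key name (made-up value).
print(reached_target({"accuracy": 0.93}))
# No accuracy logged at all: no error, just "not reached".
print(reached_target({}))
```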

Now that the fitting is over, we can plot Sparse Categorical Crossentropy and Model Accuracy on the same graph to see how they relate. (We will be able to see all of these graphs interactively when we upload our fitting history to TensorBoard at the end of this project.)

In [None]:
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
# print(hist.tail())

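def mapRange(valueToBeRanged, currentMin, currentMax, newMin, newMax):
    # Linearly map a value from [currentMin, currentMax] to [newMin, newMax].
    return (valueToBeRanged - currentMin) / (currentMax - currentMin) * (newMax - newMin) + newMin

# Some TensorFlow versions log accuracy as 'acc' instead of 'accuracy'.
acc_key = 'accuracy' if 'accuracy' in hist else 'acc'

plt.plot(hist['epoch'], mapRange(hist['sparse_categorical_crossentropy'], 0, hist['sparse_categorical_crossentropy'].max(), 0, 100), 'b', label='Sparse Categorical Crossentropy')
plt.plot(hist['epoch'], mapRange(hist[acc_key], 0, 1, 0, 100), 'g', label='Model Accuracy')
plt.title('Correlation between Sparse Categorical Crossentropy & Model Accuracy')
plt.xlabel('Epochs')
# Both curves are rescaled to 0-100, so the axis no longer shows raw loss values.
plt.ylabel('Scaled value (0-100)')
plt.legend()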
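
The rescaling used for the plot is ordinary linear range mapping. Here is a small standalone sketch of the general formula with a few values worked through; `map_range` is written for illustration and handles non-zero minimums too.

```python
def map_range(value, current_min, current_max, new_min, new_max):
    """Linearly map value from [current_min, current_max] to [new_min, new_max]."""
    scale = (new_max - new_min) / (current_max - current_min)
    return (value - current_min) * scale + new_min


# Accuracy of 0.5 on a 0-1 scale becomes 50 on a 0-100 scale.
half = map_range(0.5, 0, 1, 0, 100)
# The largest loss in a run maps to the top of the new range.
top = map_range(5.1830, 0, 5.1830, 0, 100)
# A non-zero minimum is shifted out before scaling.
middle = map_range(15, 10, 20, 0, 1)
print(half, top, middle)
```

Putting both curves on the same 0-100 scale makes them comparable on one axis, at the cost of hiding the raw loss values.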

And now for the finale! I originally thought about making this project a game, so any user could download it and enter their own values through a GUI, but as far as I know that is not possible on Google Colab. Instead, I entered the values of Peru from the dataset and luckily, our model predicted it correctly!

In [None]:
religion_dict = ['Catholic', 'Other Christian', 'Muslim', 'Buddhist', 'Hindu', 'Ethnic', 'Marxist', 'Others']
print("Now, it's time to predict!")
# predZone = input("In which geographic quadrant (based on Greenwich and the Equator) is that country? (Enter 1 for North-East, 2 for South-East, 3 for South-West and 4 for North-West)")
# predLang = input("What language is being spoken in that country? (1=English, 2=Spanish, 3=French, 4=German, 5=Slavic, 6=Other Indo-European, 7=Chinese, 8=Arabic, 9=Japanese/Turkish/Finnish/Magyar, 10=Others)")
# predStripes = input("How many stripes are there in that country's flag?")
# predColours = input("How many DIFFERENT colours are there in that country's flag?")
# predSunStars = input("How many suns or stars are there in that country's flag?")
# predIcon = input("Are there any kind of inanimate images (e.g., a boat) on that country's flag? (Type 1 for yes, 0 for no)")
# predAnimate = input("Are there any kind of animate images (e.g., an eagle, a tree, a human hand) on that country's flag? (Type 1 for yes, 0 for no)")
# predText = input("Are there any letters or writing on the flag (e.g., a motto or slogan) on that country's flag? (Type 1 for yes, 0 for no)")


predZone = 3
predLang = 2
predStripes = 0
predColours = 2
predSunStars = 0
predIcon = 0
predAnimate = 0
predText = 0
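# The names must be passed as columns=; the second positional argument of pd.DataFrame is the index.
predData = pd.DataFrame([[predZone, predLang, predStripes, predColours, predSunStars, predIcon, predAnimate, predText]], columns=['Zone', 'Language', 'Stripes', 'Colours', 'Sunstars', 'Icon', 'Animate', 'Text'])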
prediction = model.predict(predData)
print(f"Model predicts that country's religion is: {religion_dict[np.argmax(prediction)]}.")
print(f"Based on this, that country might be one of these: {', '.join(raw_dataset['Name'].loc[raw_dataset['Religion'] == np.argmax(prediction)])}") #PERU
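
The lookup in the prediction cell boils down to taking the index of the largest probability and using it as a position in the list of religion names. A plain-Python sketch of that step follows; it assumes an 8-way output, the probability vector is made up, and `predicted_religion` is a hypothetical helper.

```python
religion_names = ['Catholic', 'Other Christian', 'Muslim', 'Buddhist',
                  'Hindu', 'Ethnic', 'Marxist', 'Others']


def predicted_religion(probabilities):
    """Return (class index, religion name) for the most probable class."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best, religion_names[best]


# Hypothetical model output for one country (one probability per class).
probs = [0.70, 0.10, 0.05, 0.05, 0.04, 0.03, 0.02, 0.01]
index, name = predicted_religion(probs)
print(index, name)
```

The same index is what the cell compares against the Religion column to list the matching countries.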

Optionally, I also save the model to a file so it can be used in browsers with the TensorFlow.js library.

In [None]:
model.save('trained_model.h5')
tf.saved_model.save(model, 'saved_model')

After the finale, it's time to upload everything we did to TensorBoard, so we get interactive graphs about our model instead of the single static image we created earlier.

In [None]:
!tensorboard dev upload --logdir ./logs