<header>
    <h1 align=center>Retraining of All Training Data</h1>
    <h3 align=center>Computational Intelligence</h3>
</header>
<main>
    <font size=12>Members:</font>
    <ol>
        <li>Matin Amani</li>
        <li>Motahare Hoseyni</li>
        <li>Fateme Safayi</li>
        <li>Shaghayegh Shirvani</li>
    </ol>
</main>

<hr />

<h3>Why Retrain a Trained Model?</h3>

<p style="font-size:larger;">
Retraining is how a machine learning model is kept accurate over time. As new data becomes available or the problem domain changes, a model that was trained only once gradually falls out of date. Exposing it to additional data lets it learn patterns and trends that were not present during its initial training phase. Retraining helps the model <font color="#09c" size=6><u>refine its predictions</u></font>, <font color="#09c" size=6><u>enhance its ability to generalize</u></font>, and <font color="#09c" size=6><u>adapt to evolving conditions</u></font>, which leads to more robust and reliable results. It also gives us a chance to correct biases or errors that were present in the original training data, promoting fairness and inclusivity in AI applications. Retraining is therefore a continuous, iterative process rather than a one-time step.
</p>

<br />
<hr />

<p style="font-size:larger;">
Now, let's dive into the code and see how retraining affects our models' predictions.
</p>

<hr />

<p style="font-size:larger;">
At the very top, we import our models.<br />
For now, we won't discuss how these models are built. We just feed them some data and their configurations and wait for the results.<br />
The results are plotted as graphs and saved in the project directory.
</p>

In [None]:
from mnist import Mnist_Model
from imdb import IMDB_Model
from reuters import Reuters_Model
from boston import Boston_Model

<p style="font-size:larger;">
After importing the models, we set the layer configuration for each model and a shared list of test cases.<br />
These configurations tell our model:
<ul>
    <li>how many layers it will have (the last entry in each list is the output layer)</li>
    <li>how each layer should behave</li>
    <li>and what learning rate and batch size the network should use</li>
</ul>
</p>
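
<p style="font-size:larger;">
The test cases defined in the next cell cover every combination of four batch sizes and two learning rates. As a minimal sketch, the same list could be generated instead of typed out by hand. The names <code>batch_sizes</code> and <code>learning_rates</code> are introduced here for illustration only.
</p>

```python
from itertools import product

# Values taken from the test cases used in this notebook.
batch_sizes = [128, 256, 512, 1024]
learning_rates = [0.001, 0.0001]

# product() varies its last argument fastest, so putting the learning rate
# first reproduces the order of the hand-written list: all batch sizes at
# 0.001, then all batch sizes at 0.0001.
test_cases = [
    {"batch_size": batch_size, "learning_rate": learning_rate}
    for learning_rate, batch_size in product(learning_rates, batch_sizes)
]

print(len(test_cases))  # 8
```

<p style="font-size:larger;">
Generating the grid this way makes it easy to add another batch size or learning rate without editing eight dictionaries.
</p>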

In [None]:
mnist_layers = [
    {"neurons": 512, "activation": "relu"},
    {"neurons": 10, "activation": "softmax"},
]

imdb_layers = [
    {"neurons": 16, "activation": "relu"},
    {"neurons": 16, "activation": "relu"},
    {"neurons": 1, "activation": "sigmoid"},
]

reuters_layers = [
    {"neurons": 64, "activation": "relu"},
    {"neurons": 64, "activation": "relu"},
    {"neurons": 46, "activation": "softmax"},
]

boston_layers = [
    [
        {"neurons": 64, "activation": "relu"},
        {"neurons": 32, "activation": "relu"},
        {"neurons": 1, "activation": None},
    ],
    [
        {"neurons": 64, "activation": "relu"},
        {"neurons": 64, "activation": "relu"},
        {"neurons": 1, "activation": None},
    ],
    [
        {"neurons": 64, "activation": "relu"},
        {"neurons": 128, "activation": "relu"},
        {"neurons": 1, "activation": None},
    ],
]

test_cases = [
    {"batch_size": 128, "learning_rate": 0.001},
    {"batch_size": 256, "learning_rate": 0.001},
    {"batch_size": 512, "learning_rate": 0.001},
    {"batch_size": 1024, "learning_rate": 0.001},
    {"batch_size": 128, "learning_rate": 0.0001},
    {"batch_size": 256, "learning_rate": 0.0001},
    {"batch_size": 512, "learning_rate": 0.0001},
    {"batch_size": 1024, "learning_rate": 0.0001},
]
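
<p style="font-size:larger;">
To get a feel for how large a network each layer list describes, we can count its trainable parameters. The sketch below assumes that every entry is a fully connected layer and that each MNIST image is flattened to 784 inputs. Neither assumption is confirmed by the model classes, which are not shown here, and <code>count_dense_parameters</code> is a hypothetical helper, not part of the project.
</p>

```python
def count_dense_parameters(input_dim, layers):
    """Count the weights and biases of a stack of fully connected layers."""
    total = 0
    previous_width = input_dim
    for layer in layers:
        # Each neuron has one weight per input plus one bias.
        total += (previous_width + 1) * layer["neurons"]
        previous_width = layer["neurons"]
    return total


mnist_layers = [
    {"neurons": 512, "activation": "relu"},
    {"neurons": 10, "activation": "softmax"},
]

# Assumption: each 28x28 MNIST image is flattened to 784 inputs.
print(count_dense_parameters(28 * 28, mnist_layers))  # 407050
```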

<h3 >
First off, let's look at the <font color="#09c">MNIST Dataset</font>
</h3>

In [None]:
for test_case in test_cases:
    model = Mnist_Model(
        20, test_case["batch_size"], test_case["learning_rate"], mnist_layers
    )
    model.run()
    model.plot()

<p style="font-size:larger;">
After running the cell above, these graphs will be saved in our project's directory.<br /> Let's see what we have.
</p>
<ul>
    <li>
        <code>batch_size = 128 ; learning_rate = 0.001</code>
        <br />
        <img src="./mnist-fig;bs:128_lr:0.001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 256 ; learning_rate = 0.001</code>
        <br />
        <img src="./mnist-fig;bs:256_lr:0.001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 512 ; learning_rate = 0.001</code>
        <br />
        <img src="./mnist-fig;bs:512_lr:0.001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 1024 ; learning_rate = 0.001</code>
        <br />
        <img src="./mnist-fig;bs:1024_lr:0.001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 128 ; learning_rate = 0.0001</code>
        <br />
        <img src="./mnist-fig;bs:128_lr:0.0001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 256 ; learning_rate = 0.0001</code>
        <br />
        <img src="./mnist-fig;bs:256_lr:0.0001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 512 ; learning_rate = 0.0001</code>
        <br />
        <img src="./mnist-fig;bs:512_lr:0.0001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 1024 ; learning_rate = 0.0001</code>
        <br />
        <img src="./mnist-fig;bs:1024_lr:0.0001.png"/>
    </li>
    <br />
</ul>

<p style="font-size:larger;">
After analyzing all the graphs for the MNIST dataset, we've come to the conclusion that:<br />
<code style="font-size:large;">{"epochs": 6, "batch_size": 128, "learning_rate": 0.001}</code>
<br />
is the best configuration for <font color="#09c"><u>this network architecture</u></font>.
</p>
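
<p style="font-size:larger;">
The notebook does not spell out how the epoch count is read off the graphs. One common rule, assumed here, is to pick the epoch at which the validation loss is lowest, since training past that point tends to overfit. The helper <code>best_epoch</code> and the loss values below are made up for illustration and are not results from this project.
</p>

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1


# Made-up validation losses for illustration only, not results from this notebook.
example_val_losses = [0.30, 0.21, 0.17, 0.18, 0.22, 0.27]
print(best_epoch(example_val_losses))  # 3
```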

<hr />

<p style="font-size:larger;">
After MNIST, we run the same operation on the <font color="#09c">IMDB Dataset</font>.
</p>

In [None]:
for test_case in test_cases:
    model = IMDB_Model(
        20,
        test_case["batch_size"],
        test_case["learning_rate"],
        10000,
        imdb_layers,
    )
    model.run()
    model.plot()

<p style="font-size:larger;">
The graphs generated by the different IMDB models are shown below.<br /> Let's analyze!
</p>
<ul>
    <li>
        <code>batch_size = 128 ; learning_rate = 0.001</code>
        <br />
        <img src="./imdb-fig;bs:128_lr:0.001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 256 ; learning_rate = 0.001</code>
        <br />
        <img src="./imdb-fig;bs:256_lr:0.001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 512 ; learning_rate = 0.001</code>
        <br />
        <img src="./imdb-fig;bs:512_lr:0.001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 1024 ; learning_rate = 0.001</code>
        <br />
        <img src="./imdb-fig;bs:1024_lr:0.001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 128 ; learning_rate = 0.0001</code>
        <br />
        <img src="./imdb-fig;bs:128_lr:0.0001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 256 ; learning_rate = 0.0001</code>
        <br />
        <img src="./imdb-fig;bs:256_lr:0.0001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 512 ; learning_rate = 0.0001</code>
        <br />
        <img src="./imdb-fig;bs:512_lr:0.0001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 1024 ; learning_rate = 0.0001</code>
        <br />
        <img src="./imdb-fig;bs:1024_lr:0.0001.png"/>
    </li>
    <br />
</ul>

<p style="font-size:larger;">
After analyzing all the graphs for the IMDB dataset, we've come to the conclusion that:<br />
<code style="font-size:large;">{"epochs": 3, "batch_size": 128, "learning_rate": 0.001}</code>
<br />
is the best configuration for <font color="#09c"><u>this network architecture</u></font>.
</p>

<hr />

<p style="font-size:larger;">
Let's repeat the same process for the <font color="#09c">Reuters Dataset</font>.
</p>

In [None]:
for test_case in test_cases:
    model = Reuters_Model(
        20,
        test_case["batch_size"],
        test_case["learning_rate"],
        10000,
        reuters_layers,
    )
    model.run()
    model.plot()

<p style="font-size:larger;">
The graphs generated by the different Reuters models are shown below.<br /> Let's analyze!
</p>
<ul>
    <li>
        <code>batch_size = 128 ; learning_rate = 0.001</code>
        <br />
        <img src="./reuters-fig;bs:128_lr:0.001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 256 ; learning_rate = 0.001</code>
        <br />
        <img src="./reuters-fig;bs:256_lr:0.001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 512 ; learning_rate = 0.001</code>
        <br />
        <img src="./reuters-fig;bs:512_lr:0.001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 1024 ; learning_rate = 0.001</code>
        <br />
        <img src="./reuters-fig;bs:1024_lr:0.001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 128 ; learning_rate = 0.0001</code>
        <br />
        <img src="./reuters-fig;bs:128_lr:0.0001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 256 ; learning_rate = 0.0001</code>
        <br />
        <img src="./reuters-fig;bs:256_lr:0.0001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 512 ; learning_rate = 0.0001</code>
        <br />
        <img src="./reuters-fig;bs:512_lr:0.0001.png"/>
    </li>
    <br />
    <li>
        <code>batch_size = 1024 ; learning_rate = 0.0001</code>
        <br />
        <img src="./reuters-fig;bs:1024_lr:0.0001.png"/>
    </li>
    <br />
</ul>

<p style="font-size:larger;">
After analyzing all the graphs for the Reuters dataset, we've come to the conclusion that:<br />
<code style="font-size:large;">{"epochs": 5, "batch_size": 128, "learning_rate": 0.001}</code>
<br />
is the best configuration for <font color="#09c"><u>this network architecture</u></font>.
</p>

<hr />

<p style="font-size:larger;">
Finally, let's train on the <font color="#09c">Boston Housing Dataset</font>.
</p>

In [None]:
# Each entry of boston_layers is one candidate architecture.
for layers in boston_layers:
    model = Boston_Model(500, 0.001, layers)
    model.run(4)
    model.plot()

<p style="font-size:larger;">
The Boston model graphs are a bit different!<br />
In these graphs, the y-axis shows the <u>Mean Absolute Error</u>.<br />
</p>

<font size=18 align=center>

$\text{MAE} = \frac{\sum_{i=1}^{n} \left| y_i - x_i \right|}{n}$

where:<br />
$y_i$ = predicted value<br />
$x_i$ = true value

</font>
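
<p style="font-size:larger;">
The formula above can be written directly in Python. This is a small standalone sketch, not the code used inside <code>Boston_Model</code>, and the sample values are made up for illustration.
</p>

```python
def mean_absolute_error(y_pred, y_true):
    """Average of |y_i - x_i| over all n samples."""
    if len(y_pred) != len(y_true):
        raise ValueError("y_pred and y_true must have the same length")
    if not y_pred:
        raise ValueError("at least one sample is required")
    return sum(abs(y - x) for y, x in zip(y_pred, y_true)) / len(y_pred)


# Made-up values for illustration only.
predicted = [22.0, 30.5, 15.0]
actual = [24.0, 30.5, 14.0]
print(mean_absolute_error(predicted, actual))  # 1.0
```

<p style="font-size:larger;">
Here the absolute errors are 2, 0 and 1, so the mean is 3 / 3 = 1.
</p>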

<p>
Let's see which configuration gives us the minimum <code>MAE</code>.
</p>

<ul>
    <li>
        <code>hidden_layers: [64, 32, 1]</code><br />
        <img src="./boston-fig;layers:[64,32,1].png"/>
    </li>
    <br />
    <li>
        <code>hidden_layers: [64, 64, 1]</code><br />
        <img src="./boston-fig;layers:[64,64,1].png"/>
    </li>
    <br />
    <li>
        <code>hidden_layers: [64, 128, 1]</code><br />
        <img src="./boston-fig;layers:[64,128,1].png"/>
    </li>
    <br />
</ul>


<p style="font-size:larger;">
After analyzing all the graphs for the Boston dataset, we've come to the conclusion that:<br />
<code style="font-size:large;">{"neurons": [64, 64, 1]}</code>
<br />
is the best network configuration for the <font color="#09c">Boston Housing Dataset</font>.
</p>
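
<p style="font-size:larger;">
Once a configuration has been chosen, the final step is to retrain on all of the training data. The sketch below shows one way this could look, assuming the data was split into training and validation parts during tuning and that the model exposes a Keras-style <code>fit(x, y, epochs=..., batch_size=...)</code> method. <code>retrain_on_all_data</code> and <code>RecordingModel</code> are hypothetical names, and <code>RecordingModel</code> only records how it was called so the example runs without any deep learning library.
</p>

```python
def retrain_on_all_data(build_model, train_split, val_split, epochs, batch_size):
    """Train a fresh model on training + validation data for a fixed number of epochs."""
    x_train, y_train = train_split
    x_val, y_val = val_split
    # Plain lists are joined here; with NumPy arrays, np.concatenate would be used instead.
    x_all = list(x_train) + list(x_val)
    y_all = list(y_train) + list(y_val)
    # Start from fresh weights rather than continuing the tuning run.
    model = build_model()
    model.fit(x_all, y_all, epochs=epochs, batch_size=batch_size)
    return model


class RecordingModel:
    """Hypothetical stand-in that only records how fit() was called."""

    def fit(self, x, y, epochs, batch_size):
        self.samples_seen = len(x)
        self.epochs = epochs
        self.batch_size = batch_size


model = retrain_on_all_data(
    RecordingModel,
    train_split=([1, 2, 3], [0, 1, 0]),
    val_split=([4, 5], [1, 1]),
    epochs=6,
    batch_size=128,
)
print(model.samples_seen, model.epochs)  # 5 6
```

<p style="font-size:larger;">
Building a new model instead of reusing the tuned one keeps the final model from having already seen part of the data for more epochs than intended.
</p>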