# Scripting the Process
Now that we have successfully trained a model, we need to look ahead: the dataset may be updated, and we would then need to retrain the model on the new data.<br>

At first sight, this might look like overkill, and you may ask: Why do I need to do this? If I really need it, I can rerun the data cleaning & model training in the notebooks I already have.<br>
👉 __There's a good reason to create an independent script.__ Real-life projects need to be agile, and we're always on tight schedules. Suppose you receive weekly data updates, you'd like to update your model on a weekly basis as well, and every couple of months you revisit the data for an EDA review. Wouldn't it be nice to be able to change the data preprocessing steps easily and just run a cron job from time to time to retrain a new model on new data, all in an automated process?

Having scripts to automate our ML pipeline is essential in real-life projects. At a minimum, two scripts are needed to achieve this goal: one for cleaning the new dataset based on our EDA insights, and another for training the model. I have created self-explanatory scripts in the __*script*__ folder of this repo. Make sure to check them out.
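
To make the two-step idea concrete, here is a minimal sketch of a cleaning step followed by a training step. It is not the code from the scripts in this repo: the function names (`clean_rows`, `train_model`, `run_pipeline`), the "drop incomplete rows" rule, and the mean-value "model" are all placeholders I made up for illustration. In a real project each step would probably live in its own script and use your actual preprocessing and model code.

```python
import csv
import io


def clean_rows(rows):
    """Cleaning step (placeholder rule): drop rows with any missing value."""
    return [
        row for row in rows
        if all((value or '').strip() for value in row.values())
    ]


def train_model(rows, target):
    """Training step (placeholder): a 'model' that only stores the target mean."""
    values = [float(row[target]) for row in rows]
    return {'target': target, 'mean': sum(values) / len(values)}


def run_pipeline(csv_text, target):
    """Run both steps in order: clean the raw data, then train on what is left."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return train_model(clean_rows(rows), target)


# Hypothetical toy data: the second row has a missing 'size' and gets dropped.
raw = "size,price\n50,100\n,120\n70,140\n"
model = run_pipeline(raw, target='price')
print(model)  # {'target': 'price', 'mean': 120.0}
```

Keeping the two steps separate means you can change the cleaning rules after an EDA review without touching the training code, and the other way around.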

## Creating a Run-time Unique Identifier
A unique identifier in the names of saved files is necessary to avoid accidentally overwriting previous versions of them. I like to use a combination of the current datetime and a random string.
The few lines below create an almost-unique identifier:

In [1]:
import string
import random
from datetime import datetime

unique_run_identifier = ''.join([
    datetime.now().strftime('%Y.%m.%d-%H%M%S-'),
    ''.join(random.choices(string.ascii_uppercase + string.digits, k=4))
])

print(unique_run_identifier)

2021.11.04-124459-DWTU


The identifier is in this form: __"Date-Time-RandomString"__

The nice thing about this identifier string is that it contains both the date and the time, so you can tell exactly when it was generated.<br>
You can also increase the _k_ argument of _random.choices()_ to generate a longer random string.
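
As a sketch of how this could be used in a script, the snippet below wraps the same logic in a function with a configurable _k_, builds a file name from the identifier, and reads the timestamp back out of it. The helper names (`make_run_identifier`, `parse_run_timestamp`) and the `models/model-<id>.pkl` path are my own assumptions for illustration, not names used elsewhere in this repo.

```python
import random
import string
from datetime import datetime
from pathlib import Path

TIMESTAMP_FORMAT = '%Y.%m.%d-%H%M%S'


def make_run_identifier(k=4, now=None):
    """Return a 'Date-Time-RandomString' identifier with a random suffix of length k."""
    now = now or datetime.now()
    suffix = ''.join(random.choices(string.ascii_uppercase + string.digits, k=k))
    return f'{now.strftime(TIMESTAMP_FORMAT)}-{suffix}'


def parse_run_timestamp(identifier):
    """Recover the generation time from an identifier created above."""
    # The suffix contains only letters and digits, so '-' splits into 3 parts.
    date_part, time_part, _suffix = identifier.split('-')
    return datetime.strptime(f'{date_part}-{time_part}', TIMESTAMP_FORMAT)


run_id = make_run_identifier(k=8)
model_path = Path('models') / f'model-{run_id}.pkl'  # hypothetical output path
print(model_path)

print(parse_run_timestamp('2021.11.04-124459-DWTU'))  # 2021-11-04 12:44:59
```

With _k=4_ there are 36^4 = 1,679,616 possible suffixes for any given second, which is why the identifier is only "almost" unique. A larger _k_ makes a collision between two runs started in the same second even less likely.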