On how to store the models in the database #2

Closed · 5 tasks done
pomodoren opened this issue May 27, 2021 · 7 comments
Comments

pomodoren commented May 27, 2021

The first question seems to be:

Firstly, develop a simple classification algorithm which attempts
to predict the variable "promoted" through the other variables. 
The focus of this model is pure prediction capability.

How to do it is described here.

Bonus: Please save the model, the current page,
the coefficients and any relevant statistical measure 
to the SQLite database (on a different table than "data") while you are updating it.

Step by step
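As a starting point, a minimal sketch of what such a classifier could look like, using scikit-learn's SGDClassifier in a streaming-friendly way. The synthetic features and labels below are stand-ins; the real features and the "promoted" column would come from the "data" table.

```python
# Hedged sketch: an online linear classifier for a binary "promoted" target.
# X and y are synthetic stand-ins, not the real dataset.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                 # stand-in features
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # stand-in "promoted" labels

clf = SGDClassifier(loss="hinge", random_state=0)
# partial_fit allows training batch by batch (the "streaming" part);
# classes must be passed on the first call.
clf.partial_fit(X[:100], y[:100], classes=np.array([0, 1]))
clf.partial_fit(X[100:], y[100:])             # a later batch
print(clf.score(X, y))
```

Because training happens through partial_fit, the same estimator can keep learning as new pages of data are ingested.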

@pomodoren commented:

Which is the best model for prediction?

After running the usual suspects (SGD, ASGD, Perceptron, Passive Aggressive I and II), we could see that

  1. SGD
  2. ASGD

did better than the others, and were more stable.
[Screenshot: model accuracy comparison, 2021-05-27 22-33-40]

Which model has the best timing?

Also, after checking their training and prediction times, we see that they are similar (< 100 ms difference).
[Screenshot: training and prediction time comparison, 2021-05-27 22-33-53]

So the choice won't matter that much. As ASGD's prediction time (for 1000 instances) is quicker, we will pick that.
If we run into any issue, we can change to standard SGD.

Additionally, let's read quickly about ASGD and SGD, just so as not to be ignorant.

@pomodoren commented:

So SGD defines how the weights change: stochastic gradient descent on a linear classifier. Right now it has a linear SVM in the background (the default hinge loss). Read more here.
On the other hand ASGD is SGD with average=True.

average: bool or int, default=False
When set to True, computes the averaged SGD weights across all updates and stores the
result in the coef_ attribute. If set to an int greater than 1, averaging will begin once the total 
number of samples seen reaches average. So average=10 will begin averaging after seeing 10 samples.
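As a quick illustration of the quoted docstring: ASGD is literally just SGDClassifier with the average=True flag (toy data below, purely illustrative):

```python
# ASGD vs plain SGD in scikit-learn: the only difference is average=True,
# which stores the averaged weights in coef_. Toy data for illustration.
import numpy as np
from sklearn.linear_model import SGDClassifier

X = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, -1.0], [-1.0, 0.0]] * 5)
y = np.array([1, 1, 0, 0] * 5)

sgd = SGDClassifier(random_state=0).fit(X, y)                 # plain SGD
asgd = SGDClassifier(average=True, random_state=0).fit(X, y)  # ASGD
print(asgd.coef_)          # averaged weights across all updates
```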

@pomodoren commented:

The basic database table was set up under #1, an issue that was just solved.
Next is to understand how to save the model in the database.

On another note

IMPORTANT The ingestion script should ingest data in batches and feed it to the model in batches.
Do not just pre-load all the data in advance. This is the "streaming" part of the challenge.

Batch size? I guess we can keep the batch size as a CONFIG value, and then load and train based on that. Remember that the model and the current page are stored together, so maybe it's implied that BATCH_SIZE can depend on the documents per page. Still, this does not solve the issue of how we test the model...
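A hedged sketch of the batched ingestion idea: BATCH_SIZE as a config value and a generator that yields chunks, instead of pre-loading everything (the names here are assumptions, not repo code):

```python
# Minimal sketch: stream rows in fixed-size batches rather than loading
# the whole dataset up front. BATCH_SIZE stands in for a CONFIG value.
BATCH_SIZE = 1000  # assumption: could be tied to documents per page

def iter_batches(rows, batch_size=BATCH_SIZE):
    """Yield lists of at most batch_size items from any iterable."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:          # flush the final, possibly smaller, batch
        yield batch
```

Each yielded batch can then be fed to the model's partial_fit, so memory usage stays bounded by the batch size.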

@pomodoren commented:

After #6, we have more or less decided on the learning process.

when a batch of 10 loads:
- if Instance.count() == N:
    - train a new model
    - store the model in the PredictionModel table
- elif Instance.count() % N == 0:
    - test the existing model
    - store the stats results
    - train a new model with the new N (this will wait for the next input)
    - store the new model in the db

This can be a class method, because it does not depend that much on the ingestion batch.
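The checks above could be sketched roughly like this. This is a toy stand-in mirroring the pseudocode: ModelManager, the string placeholders, and N = 10 are all assumptions, not real repo code.

```python
# Hedged sketch of the decision logic from the pseudocode above.
N = 10  # threshold from the pseudocode ("batch of 10"); an assumption

class ModelManager:
    def __init__(self):
        self.models = []   # stands in for the PredictionModel table
        self.stats = []    # stands in for stored test statistics

    def on_batch_loaded(self, instance_count):
        if instance_count == N:
            # first threshold reached: train and store the first model
            self.models.append(f"model@{instance_count}")
        elif instance_count % N == 0:
            # every further multiple of N: test, store stats, retrain
            self.stats.append(f"stats@{instance_count}")
            self.models.append(f"model@{instance_count}")
```

In the real code the placeholder strings would be trained estimators and stats rows, but the branching is the same.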

@pomodoren commented:

Storing pickled data into SQLite

Bonus: Please save the model, the current page,
the coefficients and any relevant statistical measure 
to the SQLite database (on a different table than "data") while you are updating it.

I am new at this, so I do not have a really specific idea of what needs to be saved, or how we can use it later.
Still, after searching around, I found something really interesting: Modellogger.
I will check its code, and store the model in a similar way.

[Screenshot: Modellogger overview, 2021-05-28 09-02-42]

Integration?

Before letting this go, I might force an integration a bit... somehow finding a way to use modellogger's script within the SQLAlchemy model structure.

Second thoughts?

(We do not think it was a bad idea to store these with SQLAlchemy #1, right?!)
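A minimal sketch of what storing a pickled model in SQLite via SQLAlchemy could look like, on a table separate from "data". The table and column names here are assumptions, and the pickled dict stands in for a real estimator.

```python
# Hedged sketch: pickle a model into a LargeBinary column of a separate
# SQLite table via SQLAlchemy. Names are assumptions, not repo code.
import pickle
from sqlalchemy import Column, Integer, LargeBinary, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class PredictionModel(Base):
    __tablename__ = "prediction_models"   # separate from "data"
    id = Column(Integer, primary_key=True)
    current_page = Column(Integer)        # which page the model was saved at
    blob = Column(LargeBinary)            # pickled estimator bytes

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

# A dict stands in for the trained estimator / coefficients.
with Session(engine) as session:
    session.add(PredictionModel(current_page=1,
                                blob=pickle.dumps({"coef": [0.1]})))
    session.commit()

with Session(engine) as session:
    row = session.query(PredictionModel).first()
    restored = pickle.loads(row.blob)     # round-trips back to the object
print(restored)
```

Coefficients and statistical measures could go in additional columns of the same table, so model, page, and stats are stored together as the bonus asks.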

pomodoren added a commit that referenced this issue May 28, 2021

pomodoren commented May 28, 2021

[Screenshot: notes on what to store, 2021-05-28 09-13-46]

[Screenshot: notes on what to store, 2021-05-28 09-15-15]

These are additional notes regarding what to store.
Source


pomodoren commented May 28, 2021

Use case (to remember what we were doing):

  • Load 1000 cases
  • Update PredictionModel table with a method to take care of the checks
  • Create a model, train it, save it with parameters
  • Load 1000 new cases
  • Test for these new cases
  • Create a model, train it, save it with new parameters
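The use case above could be sketched end-to-end with toy data. The sizes, the synthetic load_cases helper, and the list standing in for the PredictionModel table are all assumptions:

```python
# Hedged end-to-end sketch of the use case: load, train, save, load,
# test, retrain, save again. Synthetic data; names are illustrative.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def load_cases(n=1000):
    """Stand-in for loading a page of cases from the database."""
    X = rng.normal(size=(n, 3))
    y = (X[:, 0] > 0).astype(int)
    return X, y

saved_models = []                                  # stands in for PredictionModel table

X1, y1 = load_cases()                              # load 1000 cases
clf = SGDClassifier(random_state=0)
clf.partial_fit(X1, y1, classes=np.array([0, 1]))  # create and train a model
saved_models.append(clf.coef_.copy())              # save it with parameters

X2, y2 = load_cases()                              # load 1000 new cases
score = clf.score(X2, y2)                          # test on the new cases
clf.partial_fit(X2, y2)                            # train with the new batch
saved_models.append(clf.coef_.copy())              # save with new parameters
print(round(score, 2))
```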

pomodoren added a commit that referenced this issue May 28, 2021