Introduction

With the Klondike Classifier you can predict a target field using the model generated during training.

⚙️ Set up the environment

▶️ STEP 1:

Download the latest release from this project page and unzip it into /etc/klondike_classifier.

▶️ STEP 2:

Install these requirements:

cd /etc/klondike_classifier
pip3 install -r requirements.txt
python3 -m nltk.downloader stopwords
chmod 777 -R treetagger
chmod 777 -R CNN_tuning
pip3 install -e git+https://github.com/ildiopantofola/nonconformist.git@master#egg=nonconformist
python3 -m spacy download en_core_web_lg
python3 -m spacy download fr_core_news_lg
python3 -m spacy download de_core_news_lg
python3 -m spacy download it_core_news_lg
python3 -m spacy download es_core_news_lg
python3 -m spacy download pt_core_news_lg
python3 -m spacy download nb_core_news_lg
python3 -m spacy download da_core_news_lg
cd treetagger && unzip lib.zip

To download every NLTK dataset instead of the stopwords corpus alone, run python3 -m nltk.downloader all.
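To confirm that the spaCy models downloaded above are importable, you can run a small check with the same python3 interpreter. This is a sketch, not part of the project: it relies only on the fact that each spaCy model is installed as a Python package named after the model, and missing_models is a hypothetical helper.

```python
import importlib.util

# Model packages installed by the `python3 -m spacy download ...` commands.
SPACY_MODELS = [
    "en_core_web_lg",
    "fr_core_news_lg",
    "de_core_news_lg",
    "it_core_news_lg",
    "es_core_news_lg",
    "pt_core_news_lg",
    "nb_core_news_lg",
    "da_core_news_lg",
]


def missing_models(models=SPACY_MODELS):
    """Return the model packages that cannot be found on the import path."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in models if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing spaCy models:", ", ".join(missing))
    else:
        print("All spaCy models are installed.")
```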

▶️ STEP 3:

Go to https://huggingface.co/neuraly/bert-base-italian-cased-sentiment, open Files and versions, and download the following files into the pretrained_models folder:

config.json
special_tokens_map.json
tf_model.h5
tokenizer_config.json
vocab.txt
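A quick way to verify STEP 3 is to check that all five files are present. This sketch assumes the files sit directly inside pretrained_models, relative to /etc/klondike_classifier; missing_pretrained_files is a hypothetical helper, so adjust the folder if your layout differs.

```python
from pathlib import Path

# Files listed in STEP 3; assumed to sit directly inside pretrained_models.
REQUIRED_FILES = [
    "config.json",
    "special_tokens_map.json",
    "tf_model.h5",
    "tokenizer_config.json",
    "vocab.txt",
]


def missing_pretrained_files(folder="pretrained_models"):
    """Return the required files that are not present in `folder`."""
    base = Path(folder)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_pretrained_files()
    print("Missing files:", missing if missing else "none")
```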

✏️ Configuration

utilities/connection.json* contains the connections to the MySQL databases that the classifier reads from during training and writes prediction results to.

In utilities/connection_service.json you can configure the connection to the services table. A service is a specific classifier together with its trained model.

In utilities/connection_cron.json you can configure the connection to the cron table, which can contain several versions of the same service.
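The README does not document the fields inside these connection files. The sketch below loads and sanity-checks them assuming each file is a flat JSON object with hypothetical host, user, password and database keys; parse_connection, load_connection and load_service_and_cron are illustrative helpers, and REQUIRED_KEYS should be changed to match the real files.

```python
import json
from pathlib import Path

# HYPOTHETICAL keys: the README does not document the contents of the
# connection files, so adapt this set to the real files.
REQUIRED_KEYS = {"host", "user", "password", "database"}


def parse_connection(text, source="<string>"):
    """Parse one connection file's JSON text and check the assumed keys."""
    conn = json.loads(text)
    if not isinstance(conn, dict):
        raise ValueError(f"{source}: expected a JSON object")
    missing = REQUIRED_KEYS.difference(conn)
    if missing:
        raise KeyError(f"{source}: missing keys {sorted(missing)}")
    return conn


def load_connection(path):
    """Read a connection file from disk and validate it."""
    path = Path(path)
    return parse_connection(path.read_text(encoding="utf-8"), source=str(path))


def load_service_and_cron(folder="utilities"):
    """Load the service and cron connection files, keyed by file stem."""
    base = Path(folder)
    return {
        stem: load_connection(base / f"{stem}.json")
        for stem in ("connection_service", "connection_cron")
    }
```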

In the services table, the following columns define the source table whose data trains your classifier:

training_table: table name
training_table_key: key column of the table
training_columns: comma-separated list of columns read to train the classifier
training_where: conditions (an SQL WHERE clause) used to filter the table
training_target: column containing the attribute to predict

You can configure the connection to training_table in utilities/connection.json.
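To see how these five columns fit together, here is a sketch that assembles the SELECT statement they describe. build_training_query is hypothetical and the classifier's actual query may differ. Note that training_where is pasted in verbatim, so it should only ever come from trusted administrators.

```python
def build_training_query(service):
    """Assemble a SELECT from one `services` row given as a dict.

    Illustrative only: the classifier's real query may differ. Identifiers
    are backtick-quoted for MySQL; training_where is inserted verbatim.
    """
    columns = [c.strip() for c in service["training_columns"].split(",") if c.strip()]
    selected = [service["training_table_key"], *columns, service["training_target"]]
    # dict.fromkeys drops duplicate column names while keeping their order.
    select_list = ", ".join(f"`{c}`" for c in dict.fromkeys(selected))
    query = f"SELECT {select_list} FROM `{service['training_table']}`"
    where = (service.get("training_where") or "").strip()
    if where:
        query += f" WHERE {where}"
    return query


if __name__ == "__main__":
    tickets_service = {
        "training_table": "tickets",
        "training_table_key": "ticketid",
        "training_columns": "ticket_title,description",
        "training_where": "ticketcategories <> 'test'",
        "training_target": "ticketcategories",
    }
    print(build_training_query(tickets_service))
```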

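In utilities/connection_predictions.json there is the connection to the ai_classified table, which stores the predictions.

The tables are created with the statements below. On MySQL 5.7 and later, the '0000-00-00 00:00:00' defaults are rejected while the default sql_mode (which includes NO_ZERO_DATE and strict mode) is active, so remove NO_ZERO_DATE from sql_mode before creating them.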

CREATE TABLE `services` (
  `id` int(11) NOT NULL,
  `training_table` varchar(50) NOT NULL DEFAULT '',
  `training_table_key` varchar(50) NOT NULL DEFAULT '',
  `training_columns` text NOT NULL,
  `training_where` text,
  `training_target` varchar(50) NOT NULL DEFAULT '',
  `parameters` json DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `cron` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `serviceid` int(11) NOT NULL,
  `planned` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `started` timestamp NULL DEFAULT '0000-00-00 00:00:00',
  `ended` timestamp NULL DEFAULT '0000-00-00 00:00:00',
  `status` int(1) NOT NULL DEFAULT '0',
  `training_result` text,
  PRIMARY KEY (`id`),
  KEY `serviceid` (`serviceid`,`status`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `ai_classified` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `crmid` int(19) NOT NULL,
  `cronid` int(11) NOT NULL,
  `guessed` longtext,
  `guessed_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `applied` int(1) NOT NULL DEFAULT '0',
  `applied_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`),
  KEY `applied` (`applied`),
  KEY `crmid` (`crmid`,`cronid`,`applied`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
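Once the classifier has written predictions, they can be read back from ai_classified. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere, with a reduced copy of the table. It assumes that applied = 0 marks predictions not yet applied to the CRM record, which the README does not state; pending_predictions is a hypothetical helper and the sample rows are invented. With a MySQL driver such as PyMySQL the placeholder is %s instead of ?.

```python
import sqlite3

# Reduced copy of ai_classified (sqlite3 stands in for MySQL here).
SCHEMA = """
CREATE TABLE ai_classified (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    crmid INTEGER NOT NULL,
    cronid INTEGER NOT NULL,
    guessed TEXT,
    applied INTEGER NOT NULL DEFAULT 0
)
"""


def pending_predictions(conn, cron_id):
    """Return (crmid, guessed) pairs for one cron run where applied = 0.

    ASSUMPTION: applied = 0 means the prediction has not been applied yet.
    """
    rows = conn.execute(
        "SELECT crmid, guessed FROM ai_classified "
        "WHERE cronid = ? AND applied = 0 ORDER BY id",
        (cron_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany(
        "INSERT INTO ai_classified (crmid, cronid, guessed, applied) VALUES (?, ?, ?, ?)",
        [(54723, 1, "hardware", 0), (54724, 1, "software", 1)],
    )
    print(pending_predictions(db, 1))  # [(54723, 'hardware')]
```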

Example configuration of a classifier that predicts the category of tickets:

INSERT INTO `services` (`id`, `training_table`, `training_table_key`, `training_columns`, `training_where`, `training_target`, `parameters`)
VALUES
	(1, 'tickets', 'ticketid', 'ticket_title,description', 'ticketcategories <> \'test\' and createdtime >= \"2015-01-01 00:00:00\"', 'ticketcategories', '{\"lemming\": true, \"language\": \"italian\", \"disable_CNN\": true, \"min_cardinality\": 20}');
INSERT INTO `cron` (`id`, `serviceid`, `planned`, `started`, `ended`, `status`, `training_result`)
VALUES
	(1, 1, '2022-06-03 18:00:00', NULL, '0000-00-00 00:00:00', 0, NULL);
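Writing the parameters JSON by hand inside an SQL literal requires the \" escaping seen above. As an alternative, this sketch validates the documented keys and lets json.dumps produce the string. service_parameters is a hypothetical helper, and rejecting unknown keys or languages other than the three documented values is this sketch's choice, not documented classifier behavior.

```python
import json

# Keys documented under "Service parameters". Rejecting anything else is a
# choice of this sketch (to catch typos), not documented classifier behavior.
KNOWN_KEYS = {"min_cardinality", "language", "lemming", "stemming",
              "disable_CNN", "CNN_config"}
LANGUAGES = {"", "italian", "english"}
FLAGS = ("lemming", "stemming", "disable_CNN")


def service_parameters(**params):
    """Validate the given options and return the JSON for services.parameters."""
    unknown = set(params) - KNOWN_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if params.get("language", "") not in LANGUAGES:
        raise ValueError("language must be '', 'italian' or 'english'")
    for flag in FLAGS:
        if flag in params and not isinstance(params[flag], bool):
            raise TypeError(f"{flag} must be True or False")
    if "min_cardinality" in params:
        value = params["min_cardinality"]
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError("min_cardinality must be a non-negative integer")
    # json.dumps emits true/false and handles all quoting: no manual escaping.
    return json.dumps(params)


# Reproduces the parameters value of the example INSERT above.
print(service_parameters(lemming=True, language="italian",
                         disable_CNN=True, min_cardinality=20))
```

The resulting string can be passed to the INSERT as a bound parameter (for example %s with PyMySQL's cursor.execute) instead of being quoted by hand.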

Service parameters

In the parameters column of the services table you can set:

min_cardinality: excludes from the dataset the rows whose target value has a cardinality (number of occurrences) lower than this value
language ("" / "italian" / "english"): interprets the text in a specific language
lemming (true/false): enables lemmatization
stemming (true/false): enables stemming
disable_CNN (true/false): skips the CNN during training
CNN_config: a JSON object with the following attributes (the --> annotations are explanations only and are not part of the JSON):

{
    "NB_WORDS" : 10000,                                     -->  number of words in the dictionary
    "NB_EPOCHS" : 30,                                       -->  number of training epochs
    "BATCH_SIZE" : 512,                                     -->  size of the batches used in mini-batch gradient descent
    "MAX_LEN" : 400,                                        -->  maximum number of words in a sequence
    "EMBEDDING_DIM" : 150,                                  -->  number of dimensions of the GloVe word embeddings
    "NB_CONVOLUTION_FILTERS" : 128,                         -->  number of convolution filters
    "CONVOLUTION_KERNEL_SIZE" : 4,                          -->  convolution kernel size
    "LABEL_SMOOTHING" : 0.3,                                -->  label smoothing factor
    "EARLYSTOPPING_PATIENCE" : 10,                          -->  number of epochs without improvement in the monitored metric before training stops
    "EARLYSTOPPING_MONITOR_PARAM" : "val_loss",             -->  metric monitored for early stopping
    "DROPOUT_PROB" : 0.5,                                   -->  dropout probability of the CNN
    "PARAMS_AUTOTUNING" : false,                            -->  enables CNN hyperparameter autotuning via Keras Tuner
    "MULTIGROUP_CNN" : true,                                -->  enables the CNN MultiGroup mode with custom embeddings
    "MG_GLOVE_EMB_FILE" : "itwiki_20180420_300d.txt",       -->  CNN MultiGroup GloVe embeddings file name (must be inside the embeddings folder)
    "MG_FASTTEXT_EMB_FILE" : "embed_wiki_it_1.3M_52D.vec",  -->  CNN MultiGroup FastText embeddings file name (must be inside the embeddings folder)
    "MG_GLOVE_EMB_DIM" : 300,                               -->  dimension of the CNN MultiGroup GloVe embedding vectors
    "MG_FASTTEXT_EMB_DIM" : 52                              -->  dimension of the CNN MultiGroup FastText embedding vectors
}
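Two of the options above can be illustrated in a few lines. filter_min_cardinality is our reading of min_cardinality (drop rows whose target class occurs fewer than N times), and smooth_labels applies the standard label-smoothing formula y · (1 − ε) + ε / K for K classes, which is what Keras applies to categorical targets when label_smoothing is set. Both helpers are hypothetical, and whether the classifier passes LABEL_SMOOTHING to Keras this way is an assumption. With LABEL_SMOOTHING = 0.3 and four classes, the true class becomes about 0.775 and each other class 0.075.

```python
from collections import Counter


def filter_min_cardinality(rows, target, min_cardinality):
    """Drop rows whose target class occurs fewer than `min_cardinality` times.

    This is an interpretation of min_cardinality; the classifier may differ.
    """
    counts = Counter(row[target] for row in rows)
    return [row for row in rows if counts[row[target]] >= min_cardinality]


def smooth_labels(one_hot, smoothing):
    """Standard label smoothing: y * (1 - eps) + eps / K for K classes."""
    k = len(one_hot)
    return [y * (1 - smoothing) + smoothing / k for y in one_hot]


if __name__ == "__main__":
    tickets = [{"cat": "hw"}, {"cat": "hw"}, {"cat": "sw"}]
    print(filter_min_cardinality(tickets, "cat", 2))
    print(smooth_labels([0, 1, 0, 0], 0.3))
```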

🖥️ Usage

👩‍🏫 Train

The train command tests several algorithms and keeps the most accurate model.

python3 CRM_classifier.py --train --from_db --cron_id <CRONID>

Example: train service 1 through its cron entry with id 1:

python3 CRM_classifier.py --train --from_db --cron_id 1
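The selection step of the train command can be pictured as follows: score each candidate on the same validation rows and keep the one with the highest accuracy. This illustrates the idea only. The candidate names are hypothetical, and the classifier's own algorithms, metric and validation split are not described in this README.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def select_best(y_true, predictions):
    """Return (name, score) of the candidate with the highest accuracy.

    `predictions` maps a candidate model name to its predicted labels on
    the same validation rows; ties go to the first candidate listed.
    """
    scores = {name: accuracy(y_true, pred) for name, pred in predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    y = ["hw", "sw", "hw", "net"]
    candidates = {
        "model_a": ["hw", "hw", "hw", "hw"],
        "model_b": ["hw", "sw", "hw", "hw"],
    }
    print(select_best(y, candidates))
```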

🔮 Predict

The classify command predicts the target field using the model generated during training. To predict the target value of a new row, insert the row into training_table with the training_columns populated, then run this command with the id of the row:

python3 CRM_classifier.py --classify --from_db --cron_id <CRONID> --table <TABLENAME> --target <TARGETKEY> --id <ID>

Example: classify the ticket with id 54723 using service 1 (cron entry 1):

python3 CRM_classifier.py --classify --from_db --cron_id 1 --table tickets --target ticketid --id 54723
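If you trigger classification from another Python process (for example right after inserting a new row), building the argument list explicitly avoids shell quoting problems. This is a sketch assuming the script lives in /etc/klondike_classifier; classify_command and classify are hypothetical helpers that only wrap the command documented above, and sys.executable replaces python3 so the current interpreter is used.

```python
import subprocess
import sys


def classify_command(cron_id, table, target, row_id, script="CRM_classifier.py"):
    """Build the argv for the classify command documented above."""
    return [
        sys.executable, script, "--classify", "--from_db",
        "--cron_id", str(cron_id),
        "--table", table,
        "--target", target,
        "--id", str(row_id),
    ]


def classify(cron_id, table, target, row_id, cwd="/etc/klondike_classifier"):
    """Run the classifier for one row; raises CalledProcessError on failure."""
    cmd = classify_command(cron_id, table, target, row_id)
    return subprocess.run(cmd, cwd=cwd, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    print(" ".join(classify_command(1, "tickets", "ticketid", 54723)))
```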

Repository: klondike-AI/klassifier
