rasanlu parser does not manage the new returned response after training request #886

iobreaker · 2019-03-30T12:07:32Z

Description

This issue was initially detected by @jhofeditz when testing the fix #881.
The new rasanlu trainer instead of returning a json response after a model training, it return a zip file with all files generated by the trainer.

Steps to Reproduce

1- Download docker image rasa/rasa_nlu:latest-spacy
2- Start docker image
3- Use a skill that interact with rasanlu and that need rasa to be trained using an intents.yml file
4- activate the skill in your opsdroid config file
5- Start opsdroid

Expected Functionality

Opsdroid should be capable to manage json response and zip response

Experienced Functionality

Crash of opsdroid due to parsing error

Versions

Opsdroid version: v0.14.1+37.ge71ea43
Rasanlu version: 0.15.0a1
Python version: 3.7.2
**OS/Docker version: Docker version 18.09.2, build 6247962 **

Additional Details

From Rasa side :

2019-03-30 11:37:34 INFO     rasa_nlu.data_router  - Logging of requests is disabled. (No 'request_log' directory configured)
2019-03-30 11:37:34 INFO     __main__  - Started http server on port 5000
2019-03-30 11:37:34+0000 [-] Log opened.
2019-03-30 11:37:34+0000 [-] Site starting on 5000
2019-03-30 11:37:34+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f65ffbff470>
2019-03-30 11:50:43+0000 [-] 2019-03-30 11:50:43 DEBUG    rasa_nlu.data_router  - New training queued
Fitting 2 folds for each of 6 candidates, totalling 12 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  12 out of  12 | elapsed:    0.0s finished
2019-03-30 11:50:59+0000 [-] "172.17.0.1" - - [30/Mar/2019:11:50:58 +0000] "POST /train?project=ergo&fixed_model_name=1c0badaf3eb8c2bf6546465eadfd8492e7d79c1f0f3520d103d4bc39b4bfc&token=sfgz654zfg546qs4fg646rg64efg HTTP/1.1" 200 11767 "-" "Python/3.7 aiohttp/3.5.4"
2019-03-30 11:53:24+0000 [-] "172.17.0.1" - - [30/Mar/2019:11:53:23 +0000] "GET / HTTP/1.1" 200 29 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
2019-03-30 11:53:24+0000 [-] "172.17.0.1" - - [30/Mar/2019:11:53:23 +0000] "GET /favicon.ico HTTP/1.1" 404 233 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
2019-03-30 11:54:24+0000 [-] Timing out client: IPv4Address(type='TCP', host='172.17.0.1', port=59948)

From Opsdroid side :

INFO opsdroid.parsers.rasanlu: Starting Rasa NLU training.
INFO opsdroid.parsers.rasanlu: Now training the model. This may take a while...
DEBUG asyncio: Using selector: KqueueSelector
Traceback (most recent call last):
...
  File "/Users/hicham/Developments/Bots/opsdroid-iobreaker/opsdroid/opsdroid/parsers/rasanlu.py", line 117, in train_rasanlu
    result = await resp.json()
  File "/Users/hicham/.virtualenvs/opsdroid/lib/python3.7/site-packages/aiohttp/client_reqrep.py", line 1027, in json
    headers=self.headers)
aiohttp.client_exceptions.ContentTypeError: 0, message='Attempt to decode JSON with unexpected mimetype: application/zip'

The text was updated successfully, but these errors were encountered:

iobreaker · 2019-03-30T12:29:07Z

I started working on a fix that detect if the returned data is a json one or a zip one.
Now opsdroid know how to deal with it

Example :

From Opsdroid Side:

INFO opsdroid.parsers.rasanlu: Starting Rasa NLU training.
INFO opsdroid.parsers.rasanlu: Now training the model. This may take a while...
INFO opsdroid.parsers.rasanlu: Rasa NLU training completed in 16 seconds.
INFO opsdroid.parsers.rasanlu: Initialising Rasa NLU model.
DEBUG opsdroid.parsers.rasanlu: Rasa NLU response - {"intent": {"name": null, "confidence": 0.0}, "entities": [], "text": "", "project": "ergo", "model": "model_20190330-121825"}
INFO opsdroid.parsers.rasanlu: Initialisation complete in 15 seconds.
DEBUG opsdroid.parsers.rasanlu: =====> Model file saved to : /Users/hicham/Desktop/model.zip

From Rasanlu side :

2019-03-30 12:18:09+0000 [-] 2019-03-30 12:18:09 DEBUG    rasa_nlu.data_router  - New training queued
Fitting 2 folds for each of 6 candidates, totalling 12 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  12 out of  12 | elapsed:    0.0s finished
2019-03-30 12:18:26+0000 [-] "172.17.0.1" - - [30/Mar/2019:12:18:26 +0000] "POST /train?project=ergo&fixed_model_name=1c0badaf3eb8c2bfc535cf8553eadfd8492e7d79c1f0f3520d103d4bc39b4bfc&token=fg31d3f5g4e3b13fds21bh3eth13eg1he5zt HTTP/1.1" 200 11766 "-" "Python/3.7 aiohttp/3.5.4"

Testing the new model (intent:affirm) :

DEBUG opsdroid.connector.telegram: {'update_id': 246644773, 'message': {'message_id': 656, 'from': {'id': 65465465, 'is_bot': False, 'first_name': 'IOBreaker', 'language_code': 'en'}, 'chat': {'id': 452282388, 'first_name': 'IOBreaker', 'type': 'private'}, 'date': 1553948755, 'text': 'yes'}}
DEBUG opsdroid.core: Parsing input: yes
DEBUG opsdroid.core: Processing parsers...
DEBUG opsdroid.core: Checking Rasa NLU...
DEBUG opsdroid.parsers.rasanlu: Rasa NLU response - {"intent": {"name": "affirm", "confidence": 0.8215188398116678}, "entities": [], "intent_ranking": [{"name": "affirm", "confidence": 0.8215188398116678}, {"name": "user_say_iam_back", "confidence": 0.09246078336795119}, {"name": "goodbye", "confidence": 0.08602037682038108}], "text": "yes", "project": "ergo", "model": "model_20190330-121825"}

Now still 2 things to address :

Finding why rasa keep training models even if it is the same as before (already trained)
What to do with the returned zip
- Just ignore it
- Add entry in opsdroid config file to allow user to save it in a location or ignore it
- Extract files and do things

Do not hesitate if you have any ideas :-)

iobreaker · 2019-04-14T17:17:19Z

really, do not hesitate to comment :-)

iobreaker · 2019-05-02T12:56:25Z

ok no answer :-(
So I am going to ignore the returned zip for the moment and deal with that in new version if needed

jacobtomlinson · 2019-05-02T13:26:29Z

Sorry! That sounds good.

iobreaker · 2019-05-13T05:20:37Z

Ok Thanks :-)
Will apply the changes and submit a PR

iobreaker · 2019-06-09T12:04:02Z

Hi All,

I Finished the fix. it's working, but i have some problem with the tox test (adding tests to test_parser_rasanlu.py).

________________________ TestParserRasaNLU.test_train_rasanlu _______________________

...
...
            result.status = 200
            patched_request.return_value = asyncio.Future()
            patched_request.return_value.set_result(result)
>           self.assertEqual(await rasanlu.train_rasanlu({}, {}), True)
E           AssertionError: False != True

---------------------------- Captured log call ------------------------
INFO     opsdroid.parsers.rasanlu:rasanlu.py:85 Starting Rasa NLU training.
WARNING  opsdroid.parsers.rasanlu:rasanlu.py:88 No intents found, skipping training.
INFO     opsdroid.parsers.rasanlu:rasanlu.py:85 Starting Rasa NLU training.
INFO     opsdroid.parsers.rasanlu:rasanlu.py:93 This model already exists, skipping training...
INFO     opsdroid.parsers.rasanlu:rasanlu.py:85 Starting Rasa NLU training.
INFO     opsdroid.parsers.rasanlu:rasanlu.py:98 Now training the model. This may take a while...
ERROR    opsdroid.parsers.rasanlu:rasanlu.py:113 Unable to connect to Rasa NLU, training failed.
INFO     opsdroid.parsers.rasanlu:rasanlu.py:85 Starting Rasa NLU training.
INFO     opsdroid.parsers.rasanlu:rasanlu.py:98 Now training the model. This may take a while...
ERROR    opsdroid.parsers.rasanlu:rasanlu.py:146 Bad Rasa NLU response - <MagicMock name='mock.text()' id='4513204600'>
ERROR    opsdroid.parsers.rasanlu:rasanlu.py:147 Rasa NLU training failed.
INFO     opsdroid.parsers.rasanlu:rasanlu.py:85 Starting Rasa NLU training.
INFO     opsdroid.parsers.rasanlu:rasanlu.py:98 Now training the model. This may take a while...
ERROR    opsdroid.parsers.rasanlu:rasanlu.py:146 Bad Rasa NLU response - <MagicMock name='mock.text()' id='4513204600'>
ERROR    opsdroid.parsers.rasanlu:rasanlu.py:147 Rasa NLU training failed.

I am not a gourou of tox, so can someone assist with that please

Fixed Function :

async def train_rasanlu(config, skills):
    """Train a Rasa NLU model based on the loaded skills."""
    _LOGGER.info(_("Starting Rasa NLU training."))
    intents = await _get_all_intents(skills)
    if intents is None:
        _LOGGER.warning(_("No intents found, skipping training."))
        return False

    config["model"] = await _get_intents_fingerprint(intents)
    if config["model"] in await _get_existing_models(config):
        _LOGGER.info(_("This model already exists, skipping training..."))
        await _init_model(config)
        return True

    async with aiohttp.ClientSession() as session:
        _LOGGER.info(_("Now training the model. This may take a while..."))

        url = await _build_training_url(config)

        # https://github.com/RasaHQ/rasa_nlu/blob/master/docs/http.rst#post-train
        # Note : The request should always be sent as
        # application/x-yml regardless of wether you use
        # json or md for the data format. Do not send json as
        # application/json for example.+
        headers = {"content-type": "application/x-yml"}

        try:
            training_start = arrow.now()
            resp = await session.post(url, data=intents, headers=headers)
        except aiohttp.client_exceptions.ClientConnectorError:
            _LOGGER.error(_("Unable to connect to Rasa NLU, training failed."))
            return False

        if resp.status == 200:
            if resp.content_type == "application/json":
                result = await resp.json()
                if "info" in result and "new model trained" in result["info"]:
                    time_taken = (arrow.now() - training_start).total_seconds()
                    _LOGGER.info(_("Rasa NLU training completed in %s seconds."),
                                 int(time_taken))
                    await _init_model(config)
                    return True

                    _LOGGER.debug(result)
            if resp.content_type == "application/zip" and resp.content_disposition.type == "attachment":
                time_taken = (arrow.now() - training_start).total_seconds()
                _LOGGER.info(_("Rasa NLU training completed in %s seconds."),
                            int(time_taken))
                await _init_model(config)
                
                 # As inditated in the issue #886, returned zip file is ignored, this can be changed
                # This can be changed in future release if needed
                # Saving model.zip file example :
                # try:
                #     output_file = open("/target/directory/model.zip","wb")
                #     data = await resp.read()
                #     output_file.write(data)
                #     output_file.close()
                #     _LOGGER.debug("Rasa taining model file saved to /target/directory/model.zip")
                # except:
                #     _LOGGER.error("Cannot save rasa taining model file to /target/directory/model.zip")
                return True

        _LOGGER.error(_("Bad Rasa NLU response - %s"), await resp.text())
        _LOGGER.error(_("Rasa NLU training failed."))
        return False

Now the test should include when returned response is a zip file and when a returned response is a json dict

Thanks

jacobtomlinson · 2019-06-09T20:29:18Z

Its hard to tell without seeing the whole test. But my guess would be that you need to set the content type of result.

iobreaker · 2019-06-10T12:55:26Z

Hi,
The whole test is in the original file (the problem is with the content of the original file)

    async def test_train_rasanlu(self):
        result = amock.Mock()
        result.status = 404
        result.text = amock.CoroutineMock()
        result.json = amock.CoroutineMock()
        result.json.return_value = {"info": "new model trained: abc123"}

        with amock.patch(
            "aiohttp.ClientSession.post"
        ) as patched_request, amock.patch.object(
            rasanlu, "_get_all_intents"
        ) as mock_gai, amock.patch.object(
            rasanlu, "_init_model"
        ) as mock_im, amock.patch.object(
            rasanlu, "_build_training_url"
        ) as mock_btu, amock.patch.object(
            rasanlu, "_get_existing_models"
        ) as mock_gem, amock.patch.object(
            rasanlu, "_get_intents_fingerprint"
        ) as mock_gif:

            mock_gai.return_value = None
            self.assertEqual(await rasanlu.train_rasanlu({}, {}), False)

            mock_gai.return_value = "Hello World"
            mock_gem.return_value = ["abc123"]
            mock_gif.return_value = "abc123"
            self.assertEqual(await rasanlu.train_rasanlu({}, {}), True)
            self.assertTrue(mock_im.called)
            self.assertFalse(mock_btu.called)

            mock_gem.return_value = []
            mock_btu.return_value = "http://example.com"
            patched_request.side_effect = aiohttp.client_exceptions.ClientConnectorError(
                "key", amock.Mock()
            )
            self.assertEqual(await rasanlu.train_rasanlu({}, {}), False)

            patched_request.side_effect = None
            patched_request.return_value = asyncio.Future()
            patched_request.return_value.set_result(result)
            self.assertEqual(await rasanlu.train_rasanlu({}, {}), False)

            result.status = 200
            patched_request.return_value = asyncio.Future()
            patched_request.return_value.set_result(result)
            self.assertEqual(await rasanlu.train_rasanlu({}, {}), True)

            result.json.return_value = {"info": "error"}
            patched_request.return_value = asyncio.Future()
            patched_request.return_value.set_result(result)
            self.assertEqual(await rasanlu.train_rasanlu({}, {}), False)

During my tests i tried to adapt it by adding

            result.status = 200
            result.content_type == "application/zip"
            result.content_disposition == "attachment"
            result.json.return_value = {'Content-Type': 'application/zip', 'Content-Disposition': 'attachment'}
            patched_request.return_value = asyncio.Future()
            patched_request.return_value.set_result(result)
            self.assertEqual(await rasanlu.train_rasanlu({}, {}), True)

but the problem still the same

jacobtomlinson · 2019-06-10T12:57:35Z

Hmm strange. It might good for you to raise a PR at this point. Then we can see the tests fail in Travis and dig a little deeper. Also makes testing locally a bit easier.

iobreaker · 2019-06-10T13:34:45Z

Ok thanks for your help
Going to do this

jacobtomlinson added the bug label Apr 1, 2019

iobreaker mentioned this issue Jun 10, 2019

Fixing rasanlu parser does not manage the new returned response after training request #988

Merged

4 tasks

FabioRosado closed this as completed in #988 Jun 20, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rasanlu parser does not manage the new returned response after training request #886

rasanlu parser does not manage the new returned response after training request #886

iobreaker commented Mar 30, 2019

iobreaker commented Mar 30, 2019 •

edited

iobreaker commented Apr 14, 2019

iobreaker commented May 2, 2019

jacobtomlinson commented May 2, 2019

iobreaker commented May 13, 2019

iobreaker commented Jun 9, 2019 •

edited

jacobtomlinson commented Jun 9, 2019

iobreaker commented Jun 10, 2019

jacobtomlinson commented Jun 10, 2019

iobreaker commented Jun 10, 2019

rasanlu parser does not manage the new returned response after training request #886

rasanlu parser does not manage the new returned response after training request #886

Comments

iobreaker commented Mar 30, 2019

Description

Steps to Reproduce

Expected Functionality

Experienced Functionality

Versions

Additional Details

iobreaker commented Mar 30, 2019 • edited

Example :

iobreaker commented Apr 14, 2019

iobreaker commented May 2, 2019

jacobtomlinson commented May 2, 2019

iobreaker commented May 13, 2019

iobreaker commented Jun 9, 2019 • edited

jacobtomlinson commented Jun 9, 2019

iobreaker commented Jun 10, 2019

jacobtomlinson commented Jun 10, 2019

iobreaker commented Jun 10, 2019

iobreaker commented Mar 30, 2019 •

edited

iobreaker commented Jun 9, 2019 •

edited