Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rasanlu parser does not manage the new returned response after training request #886

Closed
iobreaker opened this issue Mar 30, 2019 · 10 comments · Fixed by #988
Closed

rasanlu parser does not manage the new returned response after training request #886

iobreaker opened this issue Mar 30, 2019 · 10 comments · Fixed by #988
Labels

Comments

@iobreaker
Copy link
Contributor

Description

This issue was initially detected by @jhofeditz when testing the fix #881.
The new rasanlu trainer instead of returning a json response after a model training, it return a zip file with all files generated by the trainer.

Steps to Reproduce

1- Download docker image rasa/rasa_nlu:latest-spacy
2- Start docker image
3- Use a skill that interact with rasanlu and that need rasa to be trained using an intents.yml file
4- activate the skill in your opsdroid config file
5- Start opsdroid

Expected Functionality

Opsdroid should be capable to manage json response and zip response

Experienced Functionality

Crash of opsdroid due to parsing error

Versions

  • Opsdroid version: v0.14.1+37.ge71ea43
  • Rasanlu version: 0.15.0a1
  • Python version: 3.7.2
  • **OS/Docker version: Docker version 18.09.2, build 6247962 **

Additional Details

From Rasa side :

2019-03-30 11:37:34 INFO     rasa_nlu.data_router  - Logging of requests is disabled. (No 'request_log' directory configured)
2019-03-30 11:37:34 INFO     __main__  - Started http server on port 5000
2019-03-30 11:37:34+0000 [-] Log opened.
2019-03-30 11:37:34+0000 [-] Site starting on 5000
2019-03-30 11:37:34+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f65ffbff470>
2019-03-30 11:50:43+0000 [-] 2019-03-30 11:50:43 DEBUG    rasa_nlu.data_router  - New training queued
Fitting 2 folds for each of 6 candidates, totalling 12 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  12 out of  12 | elapsed:    0.0s finished
2019-03-30 11:50:59+0000 [-] "172.17.0.1" - - [30/Mar/2019:11:50:58 +0000] "POST /train?project=ergo&fixed_model_name=1c0badaf3eb8c2bf6546465eadfd8492e7d79c1f0f3520d103d4bc39b4bfc&token=sfgz654zfg546qs4fg646rg64efg HTTP/1.1" 200 11767 "-" "Python/3.7 aiohttp/3.5.4"
2019-03-30 11:53:24+0000 [-] "172.17.0.1" - - [30/Mar/2019:11:53:23 +0000] "GET / HTTP/1.1" 200 29 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
2019-03-30 11:53:24+0000 [-] "172.17.0.1" - - [30/Mar/2019:11:53:23 +0000] "GET /favicon.ico HTTP/1.1" 404 233 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
2019-03-30 11:54:24+0000 [-] Timing out client: IPv4Address(type='TCP', host='172.17.0.1', port=59948)

From Opsdroid side :

INFO opsdroid.parsers.rasanlu: Starting Rasa NLU training.
INFO opsdroid.parsers.rasanlu: Now training the model. This may take a while...
DEBUG asyncio: Using selector: KqueueSelector
Traceback (most recent call last):
...
  File "/Users/hicham/Developments/Bots/opsdroid-iobreaker/opsdroid/opsdroid/parsers/rasanlu.py", line 117, in train_rasanlu
    result = await resp.json()
  File "/Users/hicham/.virtualenvs/opsdroid/lib/python3.7/site-packages/aiohttp/client_reqrep.py", line 1027, in json
    headers=self.headers)
aiohttp.client_exceptions.ContentTypeError: 0, message='Attempt to decode JSON with unexpected mimetype: application/zip'
@iobreaker
Copy link
Contributor Author

iobreaker commented Mar 30, 2019

I started working on a fix that detect if the returned data is a json one or a zip one.
Now opsdroid know how to deal with it

Example :

From Opsdroid Side:

INFO opsdroid.parsers.rasanlu: Starting Rasa NLU training.
INFO opsdroid.parsers.rasanlu: Now training the model. This may take a while...
INFO opsdroid.parsers.rasanlu: Rasa NLU training completed in 16 seconds.
INFO opsdroid.parsers.rasanlu: Initialising Rasa NLU model.
DEBUG opsdroid.parsers.rasanlu: Rasa NLU response - {"intent": {"name": null, "confidence": 0.0}, "entities": [], "text": "", "project": "ergo", "model": "model_20190330-121825"}
INFO opsdroid.parsers.rasanlu: Initialisation complete in 15 seconds.
DEBUG opsdroid.parsers.rasanlu: =====> Model file saved to : /Users/hicham/Desktop/model.zip

From Rasanlu side :

2019-03-30 12:18:09+0000 [-] 2019-03-30 12:18:09 DEBUG    rasa_nlu.data_router  - New training queued
Fitting 2 folds for each of 6 candidates, totalling 12 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  12 out of  12 | elapsed:    0.0s finished
2019-03-30 12:18:26+0000 [-] "172.17.0.1" - - [30/Mar/2019:12:18:26 +0000] "POST /train?project=ergo&fixed_model_name=1c0badaf3eb8c2bfc535cf8553eadfd8492e7d79c1f0f3520d103d4bc39b4bfc&token=fg31d3f5g4e3b13fds21bh3eth13eg1he5zt HTTP/1.1" 200 11766 "-" "Python/3.7 aiohttp/3.5.4"

Testing the new model (intent:affirm) :

DEBUG opsdroid.connector.telegram: {'update_id': 246644773, 'message': {'message_id': 656, 'from': {'id': 65465465, 'is_bot': False, 'first_name': 'IOBreaker', 'language_code': 'en'}, 'chat': {'id': 452282388, 'first_name': 'IOBreaker', 'type': 'private'}, 'date': 1553948755, 'text': 'yes'}}
DEBUG opsdroid.core: Parsing input: yes
DEBUG opsdroid.core: Processing parsers...
DEBUG opsdroid.core: Checking Rasa NLU...
DEBUG opsdroid.parsers.rasanlu: Rasa NLU response - {"intent": {"name": "affirm", "confidence": 0.8215188398116678}, "entities": [], "intent_ranking": [{"name": "affirm", "confidence": 0.8215188398116678}, {"name": "user_say_iam_back", "confidence": 0.09246078336795119}, {"name": "goodbye", "confidence": 0.08602037682038108}], "text": "yes", "project": "ergo", "model": "model_20190330-121825"}

Now still 2 things to address :

  • Finding why rasa keep training models even if it is the same as before (already trained)
  • What to do with the returned zip
    • Just ignore it
    • Add entry in opsdroid config file to allow user to save it in a location or ignore it
    • Extract files and do things

Do not hesitate if you have any ideas :-)

@iobreaker
Copy link
Contributor Author

really, do not hesitate to comment :-)

@iobreaker
Copy link
Contributor Author

ok no answer :-(
So I am going to ignore the returned zip for the moment and deal with that in new version if needed

@jacobtomlinson
Copy link
Member

Sorry! That sounds good.

@iobreaker
Copy link
Contributor Author

Ok Thanks :-)
Will apply the changes and submit a PR

@iobreaker
Copy link
Contributor Author

iobreaker commented Jun 9, 2019

Hi All,

I Finished the fix. it's working, but i have some problem with the tox test (adding tests to test_parser_rasanlu.py).

________________________ TestParserRasaNLU.test_train_rasanlu _______________________

...
...
            result.status = 200
            patched_request.return_value = asyncio.Future()
            patched_request.return_value.set_result(result)
>           self.assertEqual(await rasanlu.train_rasanlu({}, {}), True)
E           AssertionError: False != True
---------------------------- Captured log call ------------------------
INFO     opsdroid.parsers.rasanlu:rasanlu.py:85 Starting Rasa NLU training.
WARNING  opsdroid.parsers.rasanlu:rasanlu.py:88 No intents found, skipping training.
INFO     opsdroid.parsers.rasanlu:rasanlu.py:85 Starting Rasa NLU training.
INFO     opsdroid.parsers.rasanlu:rasanlu.py:93 This model already exists, skipping training...
INFO     opsdroid.parsers.rasanlu:rasanlu.py:85 Starting Rasa NLU training.
INFO     opsdroid.parsers.rasanlu:rasanlu.py:98 Now training the model. This may take a while...
ERROR    opsdroid.parsers.rasanlu:rasanlu.py:113 Unable to connect to Rasa NLU, training failed.
INFO     opsdroid.parsers.rasanlu:rasanlu.py:85 Starting Rasa NLU training.
INFO     opsdroid.parsers.rasanlu:rasanlu.py:98 Now training the model. This may take a while...
ERROR    opsdroid.parsers.rasanlu:rasanlu.py:146 Bad Rasa NLU response - <MagicMock name='mock.text()' id='4513204600'>
ERROR    opsdroid.parsers.rasanlu:rasanlu.py:147 Rasa NLU training failed.
INFO     opsdroid.parsers.rasanlu:rasanlu.py:85 Starting Rasa NLU training.
INFO     opsdroid.parsers.rasanlu:rasanlu.py:98 Now training the model. This may take a while...
ERROR    opsdroid.parsers.rasanlu:rasanlu.py:146 Bad Rasa NLU response - <MagicMock name='mock.text()' id='4513204600'>
ERROR    opsdroid.parsers.rasanlu:rasanlu.py:147 Rasa NLU training failed.

I am not a gourou of tox, so can someone assist with that please

Fixed Function :

async def train_rasanlu(config, skills):
    """Train a Rasa NLU model based on the loaded skills."""
    _LOGGER.info(_("Starting Rasa NLU training."))
    intents = await _get_all_intents(skills)
    if intents is None:
        _LOGGER.warning(_("No intents found, skipping training."))
        return False

    config["model"] = await _get_intents_fingerprint(intents)
    if config["model"] in await _get_existing_models(config):
        _LOGGER.info(_("This model already exists, skipping training..."))
        await _init_model(config)
        return True

    async with aiohttp.ClientSession() as session:
        _LOGGER.info(_("Now training the model. This may take a while..."))

        url = await _build_training_url(config)

        # https://github.com/RasaHQ/rasa_nlu/blob/master/docs/http.rst#post-train
        # Note : The request should always be sent as
        # application/x-yml regardless of wether you use
        # json or md for the data format. Do not send json as
        # application/json for example.+
        headers = {"content-type": "application/x-yml"}

        try:
            training_start = arrow.now()
            resp = await session.post(url, data=intents, headers=headers)
        except aiohttp.client_exceptions.ClientConnectorError:
            _LOGGER.error(_("Unable to connect to Rasa NLU, training failed."))
            return False

        if resp.status == 200:
            if resp.content_type == "application/json":
                result = await resp.json()
                if "info" in result and "new model trained" in result["info"]:
                    time_taken = (arrow.now() - training_start).total_seconds()
                    _LOGGER.info(_("Rasa NLU training completed in %s seconds."),
                                 int(time_taken))
                    await _init_model(config)
                    return True

                    _LOGGER.debug(result)
            if resp.content_type == "application/zip" and resp.content_disposition.type == "attachment":
                time_taken = (arrow.now() - training_start).total_seconds()
                _LOGGER.info(_("Rasa NLU training completed in %s seconds."),
                            int(time_taken))
                await _init_model(config)
                
                 # As inditated in the issue #886, returned zip file is ignored, this can be changed
                # This can be changed in future release if needed
                # Saving model.zip file example :
                # try:
                #     output_file = open("/target/directory/model.zip","wb")
                #     data = await resp.read()
                #     output_file.write(data)
                #     output_file.close()
                #     _LOGGER.debug("Rasa taining model file saved to /target/directory/model.zip")
                # except:
                #     _LOGGER.error("Cannot save rasa taining model file to /target/directory/model.zip")
                return True

        _LOGGER.error(_("Bad Rasa NLU response - %s"), await resp.text())
        _LOGGER.error(_("Rasa NLU training failed."))
        return False

Now the test should include when returned response is a zip file and when a returned response is a json dict

Thanks

@jacobtomlinson
Copy link
Member

Its hard to tell without seeing the whole test. But my guess would be that you need to set the content type of result.

@iobreaker
Copy link
Contributor Author

Hi,
The whole test is in the original file (the problem is with the content of the original file)

    async def test_train_rasanlu(self):
        result = amock.Mock()
        result.status = 404
        result.text = amock.CoroutineMock()
        result.json = amock.CoroutineMock()
        result.json.return_value = {"info": "new model trained: abc123"}

        with amock.patch(
            "aiohttp.ClientSession.post"
        ) as patched_request, amock.patch.object(
            rasanlu, "_get_all_intents"
        ) as mock_gai, amock.patch.object(
            rasanlu, "_init_model"
        ) as mock_im, amock.patch.object(
            rasanlu, "_build_training_url"
        ) as mock_btu, amock.patch.object(
            rasanlu, "_get_existing_models"
        ) as mock_gem, amock.patch.object(
            rasanlu, "_get_intents_fingerprint"
        ) as mock_gif:

            mock_gai.return_value = None
            self.assertEqual(await rasanlu.train_rasanlu({}, {}), False)

            mock_gai.return_value = "Hello World"
            mock_gem.return_value = ["abc123"]
            mock_gif.return_value = "abc123"
            self.assertEqual(await rasanlu.train_rasanlu({}, {}), True)
            self.assertTrue(mock_im.called)
            self.assertFalse(mock_btu.called)

            mock_gem.return_value = []
            mock_btu.return_value = "http://example.com"
            patched_request.side_effect = aiohttp.client_exceptions.ClientConnectorError(
                "key", amock.Mock()
            )
            self.assertEqual(await rasanlu.train_rasanlu({}, {}), False)

            patched_request.side_effect = None
            patched_request.return_value = asyncio.Future()
            patched_request.return_value.set_result(result)
            self.assertEqual(await rasanlu.train_rasanlu({}, {}), False)

            result.status = 200
            patched_request.return_value = asyncio.Future()
            patched_request.return_value.set_result(result)
            self.assertEqual(await rasanlu.train_rasanlu({}, {}), True)

            result.json.return_value = {"info": "error"}
            patched_request.return_value = asyncio.Future()
            patched_request.return_value.set_result(result)
            self.assertEqual(await rasanlu.train_rasanlu({}, {}), False)

During my tests i tried to adapt it by adding

            result.status = 200
            result.content_type == "application/zip"
            result.content_disposition == "attachment"
            result.json.return_value = {'Content-Type': 'application/zip', 'Content-Disposition': 'attachment'}
            patched_request.return_value = asyncio.Future()
            patched_request.return_value.set_result(result)
            self.assertEqual(await rasanlu.train_rasanlu({}, {}), True)

but the problem still the same

@jacobtomlinson
Copy link
Member

Hmm strange. It might good for you to raise a PR at this point. Then we can see the tests fail in Travis and dig a little deeper. Also makes testing locally a bit easier.

@iobreaker
Copy link
Contributor Author

Ok thanks for your help
Going to do this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants