Synchronisation error when importing from MongoDB #17

Open
alallema opened this issue Aug 2, 2023 · 15 comments

Comments

@alallema

alallema commented Aug 2, 2023

Description

I tried to import documents from MongoDB with meilisync and got an error:

TypeError: meilisync.progress.file.File.set() argument after ** must be a mapping, not NoneType
2023-08-02 09:52:16.401 | ERROR    | meilisync.main:interval:122 - Error when insert data to MeiliSearch: meilisync.progress.file.File.set() argument after ** must be a mapping, not NoneType

Data format:

I used this sample data:

{
  "id": "287947",
  "title": "Shazam!",
  "poster": "https://image.tmdb.org/t/p/w1280/xnopI5Xtky18MPhK40cZAGAOVeV.jpg",
  "overview": "A boy is given the ability to become an adult superhero in times of need with a single magic word.",
  "release_date": { "$numberLong": "1553299200" }
}

Configuration file

debug: true
plugins:
  - meilisync.plugin.Plugin
progress:
  type: file
source:
  type: mongo
  host: mongodb+srv://cluster_IP.mongodb.net
  port: 27017
  username: amelie
  password: ***
  database: database_name
meilisearch:
  api_url: http://localhost:7700/
  api_key: 'masterKey'
  insert_size: 1000
  insert_interval: 10
sync:
  - table: movies
    index: movies
    fields:
      id:
      title:

Screenshots & Logs:

I attached my full logs in a file, just in case:
output_error.txt

@sanders41
Contributor

I have been able to recreate this error with bad connection information for MongoDB (I used port 2701 instead of 27017, but any bad information would do it). I can't say for sure this is the issue here, but what I can say is that in situations like this current_progress never gets set, which causes this error at await progress.set(**current_progress) because it is None and not a dictionary.
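
To illustrate the failure mode described above, here is a minimal sketch of what happens when None is unpacked with **. The names set_progress and safe_set_progress are hypothetical stand-ins, not meilisync functions, and the guard only shows one way to avoid the TypeError; it would hide the symptom rather than address whatever left the progress unset.

```python
def set_progress(**kwargs):
    """Hypothetical stand-in for a progress store's set(); returns what it received."""
    return kwargs


def safe_set_progress(current_progress):
    """Skip the write when no progress has been recorded yet (assumed policy)."""
    if current_progress is None:
        return None
    return set_progress(**current_progress)


# Unpacking None with ** raises TypeError before the function body runs.
try:
    set_progress(**None)
except TypeError as exc:
    message = str(exc)
    print(message)

# The guarded version returns None instead of raising.
print(safe_set_progress(None))
```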

I think the real cause of the error may be getting masked at await asyncio.gather(_(), interval()). With asyncio.gather, only one error is raised even if multiple errors occur. To test this out I used a TaskGroup, which raises an ExceptionGroup containing all the errors.

        try:
            async with asyncio.TaskGroup() as tg:
                tg.create_task(_())
                tg.create_task(interval())
        except* ServerSelectionTimeoutError as e:
            logger.exception(e)
            logger.error(f"Error occurred while syncing: {e}")

In this case I now see both the error from **current_progress being None and the connection error.

@alallema can you verify that your connection information is correct in your config, and you are able to connect with it?

@long2ice TaskGroup and ExceptionGroup are only available in Python 3.11+, so while I could use them for debugging, you can't use them while supporting 3.8+. We can potentially find a solution for @alallema here, but unfortunately I don't have any good ideas on how to give the end user better information in the errors.
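
For reference, one option that does work on Python 3.8+ is asyncio.gather with return_exceptions=True, which hands back every exception as a result instead of raising only the first. The sketch below uses two hypothetical toy coroutines in place of _() and interval(); it is not something proposed in this thread. It also assumes both tasks eventually finish, since gather with this flag returns only once all of them are done, so it may not fit long-running sync loops without further changes.

```python
import asyncio


async def failing_source():
    # Hypothetical stand-in for the source task failing on a bad connection.
    raise ConnectionError("could not reach the source database")


async def failing_interval():
    # Hypothetical stand-in for the periodic flush failing on unset progress.
    await asyncio.sleep(0)
    raise TypeError("argument after ** must be a mapping, not NoneType")


async def collect_errors():
    # return_exceptions=True returns exceptions as results rather than
    # raising the first one and dropping the rest.
    results = await asyncio.gather(
        failing_source(), failing_interval(), return_exceptions=True
    )
    return [r for r in results if isinstance(r, BaseException)]


errors = asyncio.run(collect_errors())
for err in errors:
    print(f"{type(err).__name__}: {err}")
```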

As a guess, #18 is probably a similar issue.

@alallema
Author

Thank you so much @sanders41! I've re-tested and checked my configuration, and everything seems OK, but I've probably missed something. I've also switched everything over to Python 3.11 just in case.

@sanders41
Contributor

@alallema since you are on 3.11 now, can you try my TaskGroup modification to see if any new error information comes out? If you need help with that, let me know and I can make a branch for you to test with.

@alallema
Author

alallema commented Aug 22, 2023

Yes of course! I tried it and I get this error:

meilisync-meilisync-1  | ValidationError: 2 validation errors for Settings
meilisync-meilisync-1  | sync.0.fields
meilisync-meilisync-1  |   Field required [type=missing, input_value={'table': 'movies', 'inde... 
meilisync-meilisync-1  | 'movies', 'full': True}, input_type=dict]
meilisync-meilisync-1  |     For further information visit https://errors.pydantic.dev/2.1.2/v/missing
meilisync-meilisync-1  | sentry
meilisync-meilisync-1  |   Field required [type=missing, input_value={'debug': True, 'plugins'...movies',
meilisync-meilisync-1  | 'full': True}]}, input_type=dict]
meilisync-meilisync-1  |     For further information visit https://errors.pydantic.dev/2.1.2/v/missing
meilisync-meilisync-1 exited with code 1

I put the full error in this file, output_error.txt, but it seems that it comes from my configuration, given the "Field required" messages?

@sanders41
Contributor

Yes, this is a Pydantic validation error saying your config is missing values. At a quick glance, though, I don't see which values are missing.

@sanders41
Contributor

Interesting, when I load your config I don't get an error. If you run this do you get an error or does it load?

import yaml

from meilisync.settings import Settings

with open("test.yml") as f:  # replace test.yml with the path to your config
    config = f.read()

settings = Settings.parse_obj(yaml.safe_load(config))
print(settings)

@alallema
Author

alallema commented Aug 22, 2023

Alright, my bad... I had modified my config without testing it carefully. It works well now; thank you very much for the script, it helped me a lot.
So now my configuration is OK, I'm on Python 3.11, and I added your modification, though maybe not in the right place, because I still have the same issue. I replaced line 139 in main.py with your code:

meilisync-meilisync-1  | 2023-08-22 15:34:07.768 | DEBUG    | meilisync.main:_:37 - plugins=['meilisync.plugin.Plugin'] progress=Progress(type=<ProgressType.file: 'file'>) debug=True source=Source(type=<SourceType.mongo: 'mongo'>, database='cluster0', host='mongodb+srv://cluster0.oulh76d.mongodb.net', port=27017, username='amelie', password='52RXb1vzuG6sPipk') meilisearch=MeiliSearch(api_url='http://localhost:7700/', api_key='masterKey', insert_size=1000, insert_interval=10) sync=[Sync(plugins=[], table='movies', pk='id', full=False, index='movies', fields={'id': None, 'title': None})] sentry=None
meilisync-meilisync-1  | 2023-08-22 15:34:08.459 | INFO     | meilisync.main:_:102 - Start increment sync data from "SourceType.mongo" to MeiliSearch...
meilisync-meilisync-1  | 2023-08-22 15:34:18.472 | ERROR    | meilisync.main:interval:133 - meilisync.progress.file.File.set() argument after ** must be a mapping, not NoneType
meilisync-meilisync-1  | Traceback (most recent call last):
meilisync-meilisync-1  | 
meilisync-meilisync-1  |   File "/usr/local/bin/meilisync", line 6, in <module>
meilisync-meilisync-1  |     sys.exit(app())
meilisync-meilisync-1  |     │   │    └ <typer.main.Typer object at 0x4003512910>
meilisync-meilisync-1  |     │   └ <built-in function exit>
meilisync-meilisync-1  |     └ <module 'sys' (built-in)>
meilisync-meilisync-1  |   File "/usr/local/lib/python3.11/site-packages/typer/main.py", line 311, in __call__
meilisync-meilisync-1  |     return get_command(self)(*args, **kwargs)
meilisync-meilisync-1  |            │           │      │       └ {}
meilisync-meilisync-1  |            │           │      └ ()
meilisync-meilisync-1  |            │           └ <typer.main.Typer object at 0x4003512910>
meilisync-meilisync-1  |            └ <function get_command at 0x40038e4400>
meilisync-meilisync-1  |   File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
meilisync-meilisync-1  |     return self.main(*args, **kwargs)
meilisync-meilisync-1  |            │    │     │       └ {}
meilisync-meilisync-1  |            │    │     └ ()
meilisync-meilisync-1  |            │    └ <function TyperGroup.main at 0x40038b2c00>
meilisync-meilisync-1  |            └ <TyperGroup callback>
meilisync-meilisync-1  |   File "/usr/local/lib/python3.11/site-packages/typer/core.py", line 778, in main
meilisync-meilisync-1  |     return _main(
meilisync-meilisync-1  |            └ <function _main at 0x40038b1c60>
meilisync-meilisync-1  |   File "/usr/local/lib/python3.11/site-packages/typer/core.py", line 216, in _main
meilisync-meilisync-1  |     rv = self.invoke(ctx)
meilisync-meilisync-1  |          │    │      └ <click.core.Context object at 0x400ebc3190>
meilisync-meilisync-1  |          │    └ <function MultiCommand.invoke at 0x4003039800>
meilisync-meilisync-1  |          └ <TyperGroup callback>
meilisync-meilisync-1  |   File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
meilisync-meilisync-1  |     return _process_result(sub_ctx.command.invoke(sub_ctx))
meilisync-meilisync-1  |            │               │       │       │      └ <click.core.Context object at 0x400ba7a310>
meilisync-meilisync-1  |            │               │       │       └ <function Command.invoke at 0x40030391c0>
meilisync-meilisync-1  |            │               │       └ <TyperCommand start>
meilisync-meilisync-1  |            │               └ <click.core.Context object at 0x400ba7a310>
meilisync-meilisync-1  |            └ <function MultiCommand.invoke.<locals>._process_result at 0x400ebd67a0>
meilisync-meilisync-1  |   File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
meilisync-meilisync-1  |     return ctx.invoke(self.callback, **ctx.params)
meilisync-meilisync-1  |            │   │      │    │           │   └ {}
meilisync-meilisync-1  |            │   │      │    │           └ <click.core.Context object at 0x400ba7a310>
meilisync-meilisync-1  |            │   │      │    └ <function start at 0x400ebd44a0>
meilisync-meilisync-1  |            │   │      └ <TyperCommand start>
meilisync-meilisync-1  |            │   └ <function Context.invoke at 0x4002ee7b00>
meilisync-meilisync-1  |            └ <click.core.Context object at 0x400ba7a310>
meilisync-meilisync-1  |   File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
meilisync-meilisync-1  |     return __callback(*args, **kwargs)
meilisync-meilisync-1  |                        │       └ {}
meilisync-meilisync-1  |                        └ ()
meilisync-meilisync-1  |   File "/usr/local/lib/python3.11/site-packages/typer/main.py", line 683, in wrapper
meilisync-meilisync-1  |     return callback(**use_params)  # type: ignore
meilisync-meilisync-1  |            │          └ {'context': <click.core.Context object at 0x400ba7a310>}
meilisync-meilisync-1  |            └ <function start at 0x400ebd4220>
meilisync-meilisync-1  | 
meilisync-meilisync-1  |   File "/meilisync/meilisync/main.py", line 148, in start
meilisync-meilisync-1  |     asyncio.run(run())
meilisync-meilisync-1  |     │       │   └ <function start.<locals>.run at 0x400ebd6ca0>
meilisync-meilisync-1  |     │       └ <function run at 0x4002680400>
meilisync-meilisync-1  |     └ <module 'asyncio' from '/usr/local/lib/python3.11/asyncio/__init__.py'>
meilisync-meilisync-1  | 
meilisync-meilisync-1  |   File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
meilisync-meilisync-1  |     return runner.run(main)
meilisync-meilisync-1  |            │      │   └ <coroutine object start.<locals>.run at 0x400ebdd640>
meilisync-meilisync-1  |            │      └ <function Runner.run at 0x4002cfb560>
meilisync-meilisync-1  |            └ <asyncio.runners.Runner object at 0x400ea924d0>
meilisync-meilisync-1  |   File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
meilisync-meilisync-1  |     return self._loop.run_until_complete(task)
meilisync-meilisync-1  |            │    │     │                  └ <Task pending name='Task-4' coro=<start.<locals>.run() running at /meilisync/meilisync/main.py:141> wait_for=<Future pending ...
meilisync-meilisync-1  |            │    │     └ <function BaseEventLoop.run_until_complete at 0x4002cf91c0>
meilisync-meilisync-1  |            │    └ <_UnixSelectorEventLoop running=True closed=False debug=False>
meilisync-meilisync-1  |            └ <asyncio.runners.Runner object at 0x400ea924d0>
meilisync-meilisync-1  |   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 640, in run_until_complete
meilisync-meilisync-1  |     self.run_forever()
meilisync-meilisync-1  |     │    └ <function BaseEventLoop.run_forever at 0x4002cf9120>
meilisync-meilisync-1  |     └ <_UnixSelectorEventLoop running=True closed=False debug=False>
meilisync-meilisync-1  |   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 607, in run_forever
meilisync-meilisync-1  |     self._run_once()
meilisync-meilisync-1  |     │    └ <function BaseEventLoop._run_once at 0x4002cfaf20>
meilisync-meilisync-1  |     └ <_UnixSelectorEventLoop running=True closed=False debug=False>
meilisync-meilisync-1  |   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1922, in _run_once
meilisync-meilisync-1  |     handle._run()
meilisync-meilisync-1  |     │      └ <function Handle._run at 0x4002680ea0>
meilisync-meilisync-1  |     └ <Handle Task.task_wakeup(<Future finished result=None>)>
meilisync-meilisync-1  |   File "/usr/local/lib/python3.11/asyncio/events.py", line 80, in _run
meilisync-meilisync-1  |     self._context.run(self._callback, *self._args)
meilisync-meilisync-1  |     │    │            │    │           │    └ <member '_args' of 'Handle' objects>
meilisync-meilisync-1  |     │    │            │    │           └ <Handle Task.task_wakeup(<Future finished result=None>)>
meilisync-meilisync-1  |     │    │            │    └ <member '_callback' of 'Handle' objects>
meilisync-meilisync-1  |     │    │            └ <Handle Task.task_wakeup(<Future finished result=None>)>
meilisync-meilisync-1  |     │    └ <member '_context' of 'Handle' objects>
meilisync-meilisync-1  |     └ <Handle Task.task_wakeup(<Future finished result=None>)>
meilisync-meilisync-1  | 
meilisync-meilisync-1  | > File "/meilisync/meilisync/main.py", line 131, in interval
meilisync-meilisync-1  |     await progress.set(**current_progress)
meilisync-meilisync-1  |           │        │     └ None
meilisync-meilisync-1  |           │        └ <function File.set at 0x400bd944a0>
meilisync-meilisync-1  |           └ <meilisync.progress.file.File object at 0x400e9456d0>
meilisync-meilisync-1  | 
meilisync-meilisync-1  | TypeError: meilisync.progress.file.File.set() argument after ** must be a mapping, not NoneType
meilisync-meilisync-1  | 
meilisync-meilisync-1  | 2023-08-22 15:34:18.549 | ERROR    | meilisync.main:interval:134 - Error when insert data to MeiliSearch: meilisync.progress.file.File.set() argument after ** must be a mapping, not NoneType
meilisync-meilisync-1  | 2023-08-22 15:34:28.560 | ERROR    | meilisync.main:interval:133 - meilisync.progress.file.File.set() argument after ** must be a mapping, not NoneType

@sanders41
Contributor

I expect it will still error, but I'm hoping to get more error information. I was expecting to see something like this with the change. I purposefully created an error here, so yours will look different, but I was expecting to see an ExceptionGroup:

> File "/home/paul/development/python/meilisync/meilisync/main.py", line 130, in run
    async with asyncio.TaskGroup() as tg:
               │       │              └ <TaskGroup cancelling>
               │       └ <class 'asyncio.taskgroups.TaskGroup'>
               └ <module 'asyncio' from '/home/paul/.pyenv/versions/3.11.4/lib/python3.11/asyncio/__init__.py'>

  File "/home/paul/.pyenv/versions/3.11.4/lib/python3.11/asyncio/taskgroups.py", line 147, in __aexit__
    raise me from None
          └ ExceptionGroup('unhandled errors in a TaskGroup', [ServerSelectionTimeoutError("127.0.0.1:2701: [Errno 111] Connection refuse...

ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
2023-08-22 12:13:57.559 | ERROR    | meilisync.main:run:135 - Error occurred while syncing: unhandled errors in a TaskGroup (1 sub-exception)

Just to be sure we made the same change, my async def run(): in main.py looks like this after the change:

    async def run():
        nonlocal lock
        lock = asyncio.Lock()
        from pymongo.errors import ServerSelectionTimeoutError

        try:
            async with asyncio.TaskGroup() as tg:
                tg.create_task(_())
                tg.create_task(interval())
        except* ServerSelectionTimeoutError as e:
            logger.exception(e)
            logger.error(f"Error occurred while syncing: {e}")

@alallema
Author

> I expect it will still error, but I'm hoping to get more error information.

Yes, me too, to be honest, but I just get this:

meilisync-meilisync-1  | 2023-08-23 08:38:41.674 | INFO     | meilisync.main:_:101 - Start increment sync data from "SourceType.mongo" to MeiliSearch...
meilisync-meilisync-1  | 2023-08-23 08:38:51.684 | ERROR    | meilisync.main:interval:132 - meilisync.progress.file.File.set() argument after ** must be a mapping, not NoneType

> Just to be sure we made the same change

We have exactly the same code, except that I put the import at the beginning of the file.

I will try to keep digging but thank you so much for your help. Did synchronization work well with Mongo for you?

@sanders41
Contributor

I have not gotten it to work; mine always fails with MongoDB and the replica set setup. It has nothing to do with Meilisync, just my lack of MongoDB sys admin knowledge 😄

@alallema
Author

> mine always fails with MongoDB and the replica set setup

Me too! I've used the free version of MongoDB Cloud, and I've also finally managed to create a docker-compose setup that seems to work with replication.

@babarburiro

You need to define the file path to a dummy JSON file with {} in it, or you could set up Redis and use type: redis.
I got it to work by creating the dummy JSON file, for example:

progress:
  type: file
  path: '/workspace/crosxa/melisync/c.json'
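
The workaround above can also be scripted. The sketch below creates the progress file with an empty JSON object if it does not exist yet; ensure_progress_file is a hypothetical helper, not part of meilisync, and the temporary directory only stands in for whatever path the config's progress.path points to.

```python
import json
import tempfile
from pathlib import Path


def ensure_progress_file(path):
    """Create the progress file containing {} if it is missing (hypothetical helper)."""
    progress_path = Path(path)
    if not progress_path.exists():
        # Create parent directories too, so a fresh path works on first run.
        progress_path.parent.mkdir(parents=True, exist_ok=True)
        progress_path.write_text(json.dumps({}))
    return progress_path


# Usage: a temporary directory stands in for the real progress path.
with tempfile.TemporaryDirectory() as tmp:
    created = ensure_progress_file(Path(tmp) / "progress.json")
    print(created.read_text())
```

An existing file is left untouched, so progress already recorded is not overwritten.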

@alallema
Author

alallema commented Sep 12, 2023

Hi @babarburiro,
Thanks for your comment, though we can see from the README that the default file is created if it doesn't exist at progress.json.
This configuration file works very well with SQL and Postgres, so I doubt the problem comes from there.

@babarburiro

> Hi @babarburiro, Thanks for your comment, though we can see from the README that the default file is created if it doesn't exist at progress.json. This configuration file works very well with SQL and Postgres, so I doubt the problem comes from there.

Yes, but it doesn't create the file for MongoDB.

@LetTheComputerDecide

Thanks, @babarburiro, your suggestion solved my issue. I believe this is a bug.
