Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/ingest to embed message exchange #113

Merged
merged 5 commits into from
Mar 15, 2024

Conversation

gecBurton
Copy link
Collaborator

@gecBurton gecBurton commented Mar 15, 2024

Context

In #108 I erroneously introduced an error where I put a Chunk rather than a EmbedQueueItem message into the embed queue

Changes proposed in this pull request

  1. in the ingest app i now put an EmbedQueueItem onto the embed queue
  2. I have also simplified the emebed app

Guidance to review

docker compose down
docker compose build ingest embed
docker compose up -d  elasticsearch minio rabbitmq ingest embed core-api

upload a file via: http://0.0.0.0:5002/docs#/file/create_upload_file_file_post

watch http://localhost:15672/#/queues and see that:

  1. the ingest queue grows to 1 message
  2. the ingest queue falls back to 0 messages and the embed queue grows
  3. the embed queue shrinks as the embed worker processes docker compose logs -f emebd

Relevant links

n/a

Things to check

  • I have added any new ENV vars in all deployed environments
  • I have tested any code added or changed

docker-compose.yml Outdated Show resolved Hide resolved
@@ -59,11 +59,12 @@ def ingest_file(self, file: File):
logging.info(f"written {len(items)} chunks to elasticsearch")

for chunk in chunks:
queue_item = EmbedQueueItem(model=env.embedding_model, sentence=chunk.text)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the key change


poll_thread = threading.Thread(target=poll_queue_every, args=(queue_uri, queue_name, queue_poll_interval))
poll_thread.start()
thread = threading.Thread(target=subscribe_to_queue)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we always want to run this thread?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree


connection = None

for i in range(max_connection_attempts):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pika already handles retries

@@ -171,62 +161,24 @@ def embed_sentences(model: str, sentences: list[str]):
return output


def poll_queue_every(queue_uri: str, queue_name: str, interval: int = 5):
Copy link
Collaborator Author

@gecBurton gecBurton Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ive tested this IRL, there is no need to poll as pika handles this via the callback mechanism. see guidance-for-review

ch.basic_ack(delivery_tag=method.delivery_tag)
ch.stop_consuming()
finally:
ch.basic_ack(delivery_tag=method.delivery_tag)
Copy link
Collaborator Author

@gecBurton gecBurton Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is basic_nack then the unprocessed items remain in the queue.... tbh i wasnt sure what we want to do here?

@gecBurton gecBurton changed the title Bugfix/ingest to embed message exachange Bugfix/ingest to embed message exchange Mar 15, 2024

poll_thread = threading.Thread(target=poll_queue_every, args=(queue_uri, queue_name, queue_poll_interval))
poll_thread.start()
thread = threading.Thread(target=subscribe_to_queue)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

@gecBurton gecBurton merged commit 094ac7b into main Mar 15, 2024
5 checks passed
@gecBurton gecBurton deleted the bugfix/ingest-to-embed-message-exachange branch March 15, 2024 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants