
API Endpoint for uploading CSV file #17

Closed
nishant-nayak opened this issue Jun 29, 2022 · 3 comments
Labels
question Further information is requested

Comments

@nishant-nayak
Contributor

Is there an API endpoint to POST a .csv file to insert it into the database? Currently the only documented methods are CLI based.
As part of the National Language Translation Mission, the team at AI4Bhārat is developing an Indic language glossary to help translators and annotators in the translation of domain-specific text. We are looking to use dictpress as an open-source solution for this task, and the API endpoint to upload a dataset would be a requirement for the same.

@knadh
Owner

knadh commented Jun 29, 2022

Hi @nishant-nayak. There isn't an API to upload bulk data, but there are APIs to insert and manage individual entries and definitions. What is your use case: integrating with a system that continuously adds individual entries (the entry/definition APIs are sufficient for that), or one that continuously adds large datasets (which is typically unlikely)?

An API for CSV bulk load isn't ideal because:

  • Bulk loading is an expensive process that can take an arbitrarily long time to complete. An API request can't wait for the process to finish (a broken request could leave a partial import); instead, it would have to queue the file for import.
  • This would require the app to have either filesystem access or an in-memory blob in which to store the bulk data.
  • Implementing this is complex.
  • To check the status of a queued file, there would also have to be a status-check API that clients poll.

Dictionaries typically do not get frequent bulk loads of data. It's almost always a one-off, or rare, and for that the CLI is sufficient.
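To make the trade-off concrete, the queue-plus-polling design described in the list above could be sketched roughly as follows. This is a hypothetical, minimal in-memory version for illustration only, not anything dictpress implements; all names (`importQueue`, `enqueue`, `lookup`) are made up:

```go
package main

import (
	"fmt"
	"sync"
)

// jobStatus tracks the lifecycle of a queued bulk import.
type jobStatus string

const (
	statusQueued  jobStatus = "queued"
	statusRunning jobStatus = "running"
	statusDone    jobStatus = "done"
)

// importQueue is a minimal in-memory job registry: a POST /import handler
// would enqueue a job and return its ID, and a GET /import/{id} handler
// would answer polls by calling lookup().
type importQueue struct {
	mu   sync.Mutex
	next int
	jobs map[int]jobStatus
}

func newImportQueue() *importQueue {
	return &importQueue{jobs: map[int]jobStatus{}}
}

// enqueue registers a pending import and returns its job ID.
func (q *importQueue) enqueue() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.next++
	q.jobs[q.next] = statusQueued
	return q.next
}

// setStatus is called by the background worker as the import progresses.
func (q *importQueue) setStatus(id int, s jobStatus) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.jobs[id] = s
}

// lookup serves the status-check API that clients must poll.
func (q *importQueue) lookup(id int) jobStatus {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.jobs[id]
}

func main() {
	q := newImportQueue()
	id := q.enqueue()
	fmt.Println(q.lookup(id)) // queued
	q.setStatus(id, statusRunning)
	q.setStatus(id, statusDone)
	fmt.Println(q.lookup(id)) // done
}
```

Even this toy version needs job IDs, locking, and a worker lifecycle; a real one would also need blob storage and error reporting, which is the complexity being avoided.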

@knadh knadh added the question Further information is requested label Jun 29, 2022
@nishant-nayak
Contributor Author

Hi @knadh, the use case we were looking at is adding large datasets to our dictionary. As you mentioned, it won't be a frequent operation, so it should be doable with just the CLI. We have a similar upload feature on another application (built with Django), where we do implement the in-memory blob and queued-file status tracking, but I believe it may not be necessary for the glossary.

Thanks for the clarification!

@knadh
Owner

knadh commented Jun 30, 2022

I've implemented an asynchronous CSV->DB import system in another Go project where frequent bulk uploads are a necessity, but decided not to bring that complexity into dictpress for the reasons mentioned above; dictionaries seldom need continuous bulk imports.

Please do let me know if you need any help using dictpress in the glossary project. Excited to see it!

@knadh knadh closed this as completed Jun 30, 2022