[WIP] Data importer #6911

Open: wants to merge 172 commits into base: master

@SchrodingersGat (Member) commented Apr 2, 2024

This PR represents a major change in approach to how we import / export data.

Previously we have relied on external libraries for importing / exporting - which all come with various downsides.

The main idea of this PR is to utilize the existing DRF API framework for both importing and exporting of data. This should provide the following major advantages:

  • Significantly reduce code duplication
  • Hook into the existing API framework, which is well tested
  • Improve speed of import / export
  • Allow finer control over import / export processes
  • Allow specification of custom import hooks
  • Expose import / export processes to custom plugins

Framework

The importer uses existing DRF serializers; a serializer just needs to be registered with the @register_importer() decorator.
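To illustrate the registration pattern, here is a minimal sketch of how a decorator like this could work. It is not the implementation from this PR: the `IMPORTER_REGISTRY` dict, the stand-in `Serializer` base class and `PartSerializer` are all assumptions made for the example, and the real decorator may take arguments or store more than the class.

```python
from typing import Callable, Dict, Type

# Hypothetical registry mapping a serializer's class name to the class.
IMPORTER_REGISTRY: Dict[str, Type] = {}


def register_importer() -> Callable[[Type], Type]:
    """Return a class decorator that records the decorated serializer."""

    def decorator(cls: Type) -> Type:
        IMPORTER_REGISTRY[cls.__name__] = cls
        return cls  # the class itself is returned unchanged

    return decorator


class Serializer:
    """Stand-in for rest_framework.serializers.Serializer (assumption)."""


@register_importer()
class PartSerializer(Serializer):
    """Hypothetical serializer made available to the importer."""
```

With this shape, the importer can discover every supported model by iterating over the registry, without each view needing its own import code.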

Related Issues

TODO

  • Remove the APIDownloadMixin class (and all references)
  • Remove the download_queryset method in many existing views
  • Use database model verbose_name if serializer does not provide a label attribute
  • Add / update unit testing for new dataset export approach
  • Add unit testing for new dataset import approach
  • Add worker task to clean up old import / export sessions
  • Add unit test for cleanup backup task
  • Ensure user has correct permissions before creating a new import session
  • Code coverage for additional code
  • Ensure all 'required' fields are mapped before progressing
  • Allow user to navigate back to previous data steps
  • Add data import section to "admin center"
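As a rough illustration of the "required fields are mapped" item above, the check could look something like the sketch below. The function name and the shape of the mapping (serializer field name to column header in the uploaded file) are assumptions for the example, not the data model used in this PR.

```python
from typing import Dict, Iterable, List, Optional


def unmapped_required_fields(
    required: Iterable[str],
    mapping: Dict[str, Optional[str]],
) -> List[str]:
    """Return required field names that have no file column assigned."""
    return [name for name in required if not mapping.get(name)]


# Hypothetical example: field name -> column header in the uploaded file
mapping = {"name": "Part Name", "description": None, "category": "Cat"}
missing = unmapped_required_fields(["name", "description"], mapping)
```

Here `missing` contains only `description`, so the session would stay at the mapping step until that field is assigned a column.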

Further Work

  • Expose import / export functions to plugins
  • Allow plugins to define custom export fields / processes
  • Tag all "model resource" classes as deprecated, to be removed in a future release
  • Profiling for new data export functionality, to determine where we can optimize
  • Allow bulk data export via django admin interface
  • Allow data import via django admin interface
  • Run data export in background worker
  • Replace existing "BOM Import" tool with new importer
  • Replace existing "BOM export" tool with new exporter
  • Replace existing "part import" tool with new importer
  • Replace existing "purchase order import" tool with new importer
  • Remove django-import-export framework entirely
  • Allow bulk "update" via import (override current records, not create new ones)

Expose to admin interface also
- Use @register_importer tag for any serializer class
- Do not use one-time hard-coded values here
- Must be importable by tablib
- Remove "progress" field (will be calculated)
- Added "timestamp" field
- Added "complete" field to DataImportRow
- Provide "sensible" default values
- For large data files this may take a significant amount of time
- Offload it to the background worker process
- Add "columns" field to DataImportSession
- Add "errors" field to DataImportRow
- Ignore importer models in import/export
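The commit notes above mention dropping the stored "progress" field in favour of a calculated value, and adding a "complete" field to DataImportRow. One way this could fit together is sketched below, using plain dataclasses as simplified stand-ins for the Django models; the class names, the `rows` attribute and the percentage scale are assumptions for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImportRow:
    """Simplified stand-in for DataImportRow (only the 'complete' flag)."""

    complete: bool = False


@dataclass
class ImportSession:
    """Simplified stand-in for DataImportSession."""

    rows: List[ImportRow] = field(default_factory=list)

    @property
    def progress(self) -> float:
        """Percentage of rows marked complete; 0.0 for an empty session."""
        if not self.rows:
            return 0.0
        done = sum(1 for row in self.rows if row.complete)
        return 100.0 * done / len(self.rows)


session = ImportSession(
    rows=[ImportRow(True), ImportRow(False), ImportRow(True), ImportRow(False)]
)
```

Deriving the value from the rows means it cannot drift out of sync with the row state, at the cost of a count each time it is read.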
Labels

  • api: Relates to the API
  • enhancement: This is a suggested enhancement or new feature
  • Fund: This issue can be specifically funded for development
  • import / export: Data importing, exporting and processing
  • roadmap: This is a roadmap feature with no immediate plans for implementation
4 participants