Improve DataPipeline API #67
Comments
To be precise: the first one does have some degree of conflict with the dataset, but the latter two do not.

About the proposal: I generally like the idea. I can see us having to duplicate collate logic between tasks with different uncollate logic. But doing it with mixins might grow to be confusing. See this example:

```python
class A:
    def a(self):
        print('a')


class B:
    def b(self):
        print('b')


class C(A, B):
    ...


x = C()
x.a()  # a
x.b()  # b
# great!


class D:
    def a(self):
        print('d')


# Redefine C with D in place of A
class C(D, B):
    ...


x = C()
x.a()  # d
x.b()  # b
# great!


class E:
    def a(self):
        print('e')


# If they subclass C, now order matters
class F(E, C):
    ...


class G(C, E):
    ...


x = F()
x.a()  # e
x.b()  # b

x = G()
x.a()  # d  eek!
x.b()  # b
```

So yeah... if this grows, it will become a nightmare to follow.
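The ordering problem in the mixin example follows from Python's method resolution order (MRO). As a small sketch, the classes from the comment can be inspected directly through `__mro__`; the methods return their letter instead of printing it so the result is easy to check, and `mro_names` is a helper introduced here purely for illustration.

```python
class B:
    def b(self):
        return 'b'


class D:
    def a(self):
        return 'd'


class E:
    def a(self):
        return 'e'


class C(D, B):
    pass


class F(E, C):
    pass


class G(C, E):
    pass


def mro_names(cls):
    """Return the class names in method resolution order, skipping `object`."""
    return [klass.__name__ for klass in cls.__mro__ if klass is not object]


# E comes before C and D, so F picks up E.a
print(mro_names(F))  # ['F', 'E', 'C', 'D', 'B']
# D comes before E, so G picks up D.a
print(mro_names(G))  # ['G', 'C', 'D', 'B', 'E']
```

Every additional mixin lengthens these lists, and which `a` wins depends only on base-class order, which is what makes a growing mixin hierarchy hard to follow.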
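A better solution is to do:

```python
class DataPipeline:
    # CollatePipeline and UncollatePipeline are the proposed helper classes
    def __init__(self, collate: "CollatePipeline", uncollate: "UncollatePipeline"):
        self.collate = collate
        self.uncollate = uncollate

    def before_collate(self, *args, **kwargs):
        # Delegate to the collate pipeline
        return self.collate.before_collate(*args, **kwargs)

    ...
```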
Yes, and let's create a default for each data type and each data-type task.
So people are just left to implement uncollate_fn.
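A rough sketch of how the composition idea could look with a default collate stage per data type, so that a user only supplies `uncollate_fn`. Everything except the names `DataPipeline`, `CollatePipeline`, `UncollatePipeline`, `before_collate` and `uncollate_fn` is an assumption made for illustration (the method bodies, `TextCollatePipeline`, `collate_batch`, `uncollate_batch`, `label_from_score`), not the library's actual API.

```python
from typing import Any, Callable, List, Sequence


class CollatePipeline:
    """Hypothetical default collate stage: passes samples through as a list."""

    def before_collate(self, samples: Sequence[Any]) -> List[Any]:
        return list(samples)

    def collate(self, samples: Sequence[Any]) -> List[Any]:
        return list(samples)


class TextCollatePipeline(CollatePipeline):
    """Hypothetical default for text data: normalizes each string."""

    def before_collate(self, samples: Sequence[str]) -> List[str]:
        return [sample.strip().lower() for sample in samples]


class UncollatePipeline:
    """Applies a user-supplied uncollate_fn to every item of a batch."""

    def __init__(self, uncollate_fn: Callable[[Any], Any]):
        self.uncollate_fn = uncollate_fn

    def uncollate(self, batch: Sequence[Any]) -> List[Any]:
        return [self.uncollate_fn(item) for item in batch]


class DataPipeline:
    """Composes the two stages instead of inheriting from mixins."""

    def __init__(self, collate: CollatePipeline, uncollate: UncollatePipeline):
        self.collate = collate
        self.uncollate = uncollate

    def before_collate(self, samples: Sequence[Any]) -> List[Any]:
        return self.collate.before_collate(samples)

    def collate_batch(self, samples: Sequence[Any]) -> List[Any]:
        return self.collate.collate(self.before_collate(samples))

    def uncollate_batch(self, batch: Sequence[Any]) -> List[Any]:
        return self.uncollate.uncollate(batch)


def label_from_score(score: float) -> str:
    """The only piece the user writes in this sketch."""
    return "positive" if score >= 0.5 else "negative"


pipeline = DataPipeline(TextCollatePipeline(), UncollatePipeline(label_from_score))
batch = pipeline.collate_batch(["  Hello ", "WORLD"])
predictions = pipeline.uncollate_batch([0.9, 0.2])
print(batch)        # ['hello', 'world']
print(predictions)  # ['positive', 'negative']
```

Because the stages are passed in as objects, two tasks that share collate logic but differ in uncollate logic can reuse the same `TextCollatePipeline` instance without any ordering concerns between base classes.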
Note that this is stepping into over-engineering territory. As of right now, there is no real duplication to warrant this extra abstraction, but as we implement new tasks we will find out whether it is worth it.
Users should be able to modify the preprocessing step (preferably on the GPU) after data loading/batching and before model execution. There should be an overridable "batch preprocessing" function defined in the data pipeline that is called unconditionally before the model, whether it runs for training or for inference, or possibly split into separate training and inference hooks.
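One way the requested hook could look, as a minimal sketch under stated assumptions: `preprocess_batch`, `BatchPreprocessingPipeline`, `Task`, `training_step` and `predict` are hypothetical names, and plain Python lists stand in for tensors that would already live on the GPU in a real setup.

```python
from typing import Any, Callable, List


class BatchPreprocessingPipeline:
    """Hypothetical pipeline with an overridable batch preprocessing hook."""

    def preprocess_batch(self, batch: Any, training: bool) -> Any:
        # Default: leave the batch untouched
        return batch


class Task:
    """Hypothetical task that always calls the hook before the model."""

    def __init__(self, model: Callable[[Any], Any], pipeline: BatchPreprocessingPipeline):
        self.model = model
        self.pipeline = pipeline

    def _run(self, batch: Any, training: bool) -> Any:
        # The hook runs unconditionally, for training and inference alike
        batch = self.pipeline.preprocess_batch(batch, training=training)
        return self.model(batch)

    def training_step(self, batch: Any) -> Any:
        return self._run(batch, training=True)

    def predict(self, batch: Any) -> Any:
        return self._run(batch, training=False)


class ScaleAndFlipPipeline(BatchPreprocessingPipeline):
    """User override: scale always, flip the batch only during training."""

    def preprocess_batch(self, batch: List[float], training: bool) -> List[float]:
        scaled = [value / 255.0 for value in batch]
        if training:
            # Stand-in for a training-only augmentation
            scaled = scaled[::-1]
        return scaled


# Identity "model" so the effect of the hook is visible in the output
task = Task(model=lambda batch: list(batch), pipeline=ScaleAndFlipPipeline())
print(task.predict([0, 255]))        # [0.0, 1.0]
print(task.training_step([0, 255]))  # [1.0, 0.0]
```

A single hook with a `training` flag keeps the call site unconditional; splitting it into two separate hooks would be the alternative mentioned in the comment.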
Adding @carmocca and @kaushikb11 as reviewers!
@tchaton DataPipeline was already merged. Can we close this?
🚀 Feature
Motivation
Reason:
default DataPipeline
create text classification data pipeline.
Pitch
Alternatives
Additional context