Description
This is how I currently create a table in a docx document:

```python
import docx

doc = docx.Document()
tab = doc.add_table(rows=300, cols=5)
```
So the table object is "connected" to its parent document.
Is there a way to create a table object without a connection to a parent object and add it to the document later? Something like this?
```python
doc = docx.Document()
tab_one = docx.Table(rows=300, cols=5)
tab_two = docx.Table(rows=100, cols=3)
doc.add_table(tab_two)
doc.add_table(tab_one)
```
Or, as a workaround, can I move a table object from one document instance to another, like this?
```python
doc_temp = docx.Document()
tab = doc_temp.add_table(rows=300, cols=5)
doc_main = docx.Document()
doc_main.add_table(tab)
```
The background of my question: I create multiple tables with 100 to 300 rows each and apply formatting operations to every one of their cells. That means a lot of row and cell iteration, which is slow.
Doing this with multiprocessing, where each worker has its own table object, would improve performance. I would like to create multiple tables in parallel and add them to the document in a later step.
It is also clear to me that multiprocessing isn't the complete or best solution to a performance problem. Such a problem isn't solved just by throwing more CPU resources at it; the algorithm itself should be optimized. For me, multiprocessing is just one step on the way to a better solution.
EDIT: As a real-world example, here you can see how I create docx tables based on `pandas.DataFrame` objects.