This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

Write mode for topen #36

Closed
2 tasks
roll opened this issue Jan 8, 2016 · 10 comments · Fixed by #80

Comments

@roll
Member

roll commented Jan 8, 2016

Overview

It's a long shot but eventually I suppose it could be implemented.

Writing tabular data to the filesystem is a common task. With all the boilerplate code needed to support Python 2/3, the verbose csv interface, etc., it's a little bit annoying.

Analysis

Interface could be:

with topen('table.csv', mode='w') as table:
    table.write(data)

Implementation could be:

  • topen returns a ReadTable or a WriteTable depending on the mode ('r' or 'w')
  • for writing there will be new modules like Formatter (anti-parser) and Writer (anti-loader), with the same modular architecture for different targets and formats.
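A minimal sketch of the mode dispatch described in the two points above, assuming Python 3 and CSV files only. The names `topen_sketch`, `ReadTable` and `WriteTable` here are illustrative stand-ins and not the library's actual classes; the stdlib `csv.writer` plays the role of the proposed Formatter, and the open file plays the role of the Writer.

```python
import csv


class ReadTable:
    """Hypothetical read side: iterates over the rows of a CSV file."""

    def __init__(self, path, encoding='utf-8'):
        self._file = open(path, 'r', encoding=encoding, newline='')
        self._rows = csv.reader(self._file)

    def __iter__(self):
        return self._rows

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self._file.close()


class WriteTable:
    """Hypothetical write side: formats rows as CSV and writes them out."""

    def __init__(self, path, encoding='utf-8'):
        self._file = open(path, 'w', encoding=encoding, newline='')
        self._formatter = csv.writer(self._file)

    def write(self, rows):
        # Rows are consumed one at a time, so any iterable of rows
        # (a list, a generator, a ReadTable) can be passed in.
        for row in rows:
            self._formatter.writerow(row)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self._file.close()


def topen_sketch(path, mode='r', encoding='utf-8'):
    """Return a ReadTable or a WriteTable depending on `mode`."""
    if mode == 'r':
        return ReadTable(path, encoding=encoding)
    if mode == 'w':
        return WriteTable(path, encoding=encoding)
    raise ValueError("mode must be 'r' or 'w', got %r" % (mode,))
```

Because `WriteTable.write` accepts any iterable of rows, a table opened for reading can be handed straight to a table opened for writing.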

So we will be able to have memory lean things like:

with topen('http://site.com/source.xls') as source:
    with topen('target.csv', mode='w') as target:
        target.write(source)

Or even with something like tcopy helper:

tcopy(data, 'target.csv')
tcopy('http://site.com/source.xls', 'target.csv')
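One way such a helper could look, as a rough sketch only: `tcopy_sketch` is a hypothetical name, it assumes Python 3, and it handles just two kinds of source (an in-memory iterable of rows, or a path to a local CSV file). Remote sources and other formats such as xls are out of scope here.

```python
import csv


def _write_rows(rows, target, encoding):
    """Write rows to a CSV file one at a time; return the row count."""
    count = 0
    with open(target, 'w', encoding=encoding, newline='') as out:
        writer = csv.writer(out)
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


def tcopy_sketch(source, target, encoding='utf-8'):
    """Copy rows from `source` to a CSV file at `target`.

    `source` is either a path to a local CSV file (str) or any
    iterable of rows. Returns the number of rows written.
    """
    if isinstance(source, str):
        # Open the source first so a missing source does not
        # leave an empty target file behind.
        with open(source, 'r', encoding=encoding, newline='') as src:
            return _write_rows(csv.reader(src), target, encoding)
    return _write_rows(source, target, encoding)
```

Passing the same path as both source and target would truncate the file while it is being read, so a real helper would need to guard against that.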

Tasks

@roll roll added the feature label Jan 8, 2016
@pwalsh pwalsh added this to the Backlog milestone Jan 10, 2016
@pwalsh
Member

pwalsh commented Jan 10, 2016

@roll excellent idea. Presumably this could DRY code for exporting to Data Package from other data stores, like SQL, BigQuery, etc.

@roll
Member Author

roll commented Jan 11, 2016

Yes, I've written too much boilerplate code lately 😃

@pwalsh
Member

pwalsh commented Aug 7, 2016

Is this really the same as #50? I guess internally, both could be supported by the same processor? I'm just a bit worried that by closing #50 we've lost the particularity of that request.

@roll
Member Author

roll commented Aug 7, 2016

@pwalsh
I've added it to the tasks list. I think it's exactly what we need - on a write stage we will be able to set encoding so any recoding could be possible.

@pwalsh
Member

pwalsh commented Aug 7, 2016

@roll I see, but it still seems to me that a high-level write interface is different from a write processor, used in the read interface, to create a new, recoded file. No?

@roll
Member Author

roll commented Aug 7, 2016

@pwalsh
I've re-opened that issue, because if something like this is not enough:

with topen('source.xls') as source:
  with topen('target.csv', mode='w', encoding='utf-8') as target:
    target.write(source)

then it really is different. For now it's just not clear from the high-level requirements why you need it as a side effect of a processor. Such a processor would be much less powerful than a general writing system and would require some duplication.

@pwalsh
Member

pwalsh commented Aug 7, 2016

@roll

Maybe it is, not sure. Imagine piping data through a chain of processors like so:

source -> structure | schema | recoder | writer

the recoder just recodes the stream, and is followed by a final processor writing the stream to some new destination.
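The chain above could be sketched with plain Python generators, assuming Python 3. Everything in this block is hypothetical: the two processors (`drop_empty_rows`, `strip_cells`) are generic placeholders rather than the structure, schema or recoder steps named above, and `pipe` and `write_csv` are illustrative helpers. The point is only that each processor pulls rows from the previous one, and the final writer drives the whole chain.

```python
import csv
from functools import reduce


def drop_empty_rows(rows):
    """Processor: skip rows whose cells are all blank."""
    for row in rows:
        if any(cell.strip() for cell in row):
            yield row


def strip_cells(rows):
    """Processor: trim surrounding whitespace from every cell."""
    for row in rows:
        yield [cell.strip() for cell in row]


def pipe(source, *processors):
    """Chain processors lazily: source -> p1 | p2 | ..."""
    return reduce(lambda stream, processor: processor(stream),
                  processors, source)


def write_csv(rows, fileobj):
    """Terminal step: pull rows through the chain and write them."""
    writer = csv.writer(fileobj)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count
```

Nothing is read from the source until `write_csv` starts iterating, so only one row at a time is in flight through the chain.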

@roll
Member Author

roll commented Aug 7, 2016

@pwalsh
After the loader and parser there are no encodings anymore, just Python objects. So I suppose this WriteTable functionality will be the recoder from your pipeline.

So let's see once we get closer to real design proposals 😃
I've reopened the recoding issue to be sure.
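To illustrate the point that recoding falls out of choosing an encoding at the write stage, here is a small sketch using only the Python 3 standard library. `recode_csv` is a hypothetical helper, not part of the library under discussion: bytes are decoded once when the source is opened, rows travel as ordinary `str` objects, and they are encoded again only when written.

```python
import csv


def recode_csv(source_path, target_path, source_encoding, target_encoding):
    """Copy a CSV file row by row, changing its text encoding."""
    with open(source_path, 'r', encoding=source_encoding, newline='') as src:
        with open(target_path, 'w', encoding=target_encoding, newline='') as dst:
            writer = csv.writer(dst)
            # Each row is a list of str here; no encoding is attached
            # to it until the writer's file object encodes it.
            for row in csv.reader(src):
                writer.writerow(row)
```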

@pwalsh
Member

pwalsh commented Aug 7, 2016

@roll ok, if that is the internal design. In your examples above, would the writing require the file contents to be loaded into memory? The API description looks like yes, but that would be a mistake IMHO.

@roll
Member Author

roll commented Aug 7, 2016

@pwalsh
Just a lazy example :) We use streams here, so it should be memory lean.

After we finish it, my idea is to create an example like loading a 1GB xls from the web and saving it to a csv file, with a memory profiler showing we don't use memory :)
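A scaled-down version of that demonstration can be sketched with the standard library's `tracemalloc` module, used here in place of a dedicated memory profiler. The assumptions are stated plainly: Python 3, synthetic rows from a generator instead of a 1GB xls from the web, and hypothetical helper names (`generate_rows`, `peak_memory_while_writing`). The `materialize` flag exists only to contrast streaming with loading everything into a list first.

```python
import csv
import tracemalloc


def generate_rows(count):
    """Lazily yield `count` synthetic rows (stand-in for a big source)."""
    for index in range(count):
        yield [str(index), 'name-%d' % index, 'value-%d' % index]


def peak_memory_while_writing(count, path, materialize=False):
    """Write `count` rows to a CSV file; return peak traced bytes."""
    tracemalloc.start()
    try:
        rows = generate_rows(count)
        if materialize:
            # Anti-pattern for comparison: hold every row in memory.
            rows = list(rows)
        with open(path, 'w', encoding='utf-8', newline='') as target:
            writer = csv.writer(target)
            for row in rows:
                writer.writerow(row)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak
```

Comparing the two return values for the same `count` gives a rough picture of how much the streaming path saves; the absolute numbers depend on the interpreter and platform.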

@roll roll added the priority label Aug 8, 2016
@roll roll removed this from the tools-v1 milestone Aug 8, 2016
@roll roll modified the milestone: tabulator-v1 Aug 9, 2016

2 participants