This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

Implement sql writer #164

Closed
4 tasks
roll opened this issue May 10, 2017 · 13 comments

@roll
Member

roll commented May 10, 2017

Overview

For now, tabulator only supports writing to CSV via stream.save(format='csv'). It should be fairly easy to implement an SQL writer by porting writers.csv.CSVWriter to writers.sql.SQLWriter.

What we're aiming for:

from tabulator import Stream

with Stream('data.xls', headers=1) as stream:
  stream.save('postgresql://user:pass@host:5432/database', table='excel_export')

And of course it will be a pretty cool and useful feature 👍

Plan

  • port writers.csv.CSVWriter to writers.sql.SQLWriter
  • register new writer in config.py
  • add writing tests to tests.formats.sql
  • mention writing ability in readme sql format section
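
For illustration only, here is a minimal sketch of the shape such a writer could take. It is not tabulator's code. It assumes a writer exposes a write(source, target, headers) method similar to the CSV writer, and it uses the standard library's sqlite3 with a file path as the target rather than a full connection string. The names SketchSQLWriter and quote_identifier are hypothetical.

```python
import sqlite3


def quote_identifier(name):
    """Quote a table or column name so spaces and quotes stay valid SQL."""
    return '"' + str(name).replace('"', '""') + '"'


class SketchSQLWriter:
    """Hypothetical stand-in for writers.sql.SQLWriter (an assumption)."""

    def __init__(self, table):
        # The table name is assumed to arrive as a `table` option, as in
        # stream.save(..., table='excel_export').
        self.table = table

    def write(self, source, target, headers):
        """Write rows from `source` into the SQLite file at `target`.

        Returns the number of rows written.
        """
        table = quote_identifier(self.table)
        columns = ', '.join(quote_identifier(header) for header in headers)
        placeholders = ', '.join('?' for _ in headers)
        insert = 'INSERT INTO %s VALUES (%s)' % (table, placeholders)
        count = 0
        connection = sqlite3.connect(target)
        try:
            with connection:  # commits on success, rolls back on error
                # Untyped columns: SQLite stores each value with its own type.
                connection.execute('CREATE TABLE %s (%s)' % (table, columns))
                for row in source:
                    connection.execute(insert, list(row))
                    count += 1
        finally:
            connection.close()
        return count
```

A real implementation would need to accept the connection strings shown above (for example through SQLAlchemy) and decide on column types; this sketch leaves both out.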
@pwalsh
Member

pwalsh commented May 23, 2017

@roll @akariv how about an SQL reader too? I thought a reader had already been discussed, but I can't find it. We need it for some Frictionless Data piloting work, and @danfowler has expressed interest in implementing it.

@akariv
Member

akariv commented May 23, 2017 via email

@akariv
Member

akariv commented May 23, 2017

@pwalsh
Member

pwalsh commented May 23, 2017

@danfowler see above

@pwalsh
Member

pwalsh commented May 23, 2017

thanks @roll and @akariv

@roll
Member Author

roll commented May 23, 2017

@pwalsh
@danfowler
@CallMeAlien
We now have a readme covering all schemes, formats, options, etc. in detail - https://github.com/frictionlessdata/tabulator-py/blob/master/README.md#sql

@danfowler

@roll @akariv @CallMeAlien @pwalsh to clarify: for the DM4T pilot, one of the datasets (ENLITEN) was provided as a MySQL dump. Given that one of the goals of this DM4T project more generally is to "make your data public to the rest of the public and beyond" (and that publishing a SQL dump is not super friendly), I thought there might be value in going straight from a SQL database directly to a Data Package.

I initially tried to use jsontableschema-sql-py directly, but:

  1. There were some issues with the conversion
  2. SQL type support needs to be better (I manually dropped/edited some of the source tables to make it sort of work)

Given that the publisher of this kind of data would want to make some edits to the published Data Package (like dropping user tables, adding metadata, etc.) without needing to do much programming directly, I suppose what probably makes more sense is to do this with some higher-level tool, like datapackage-pipelines, where you can have an SQL connection as the source. I suppose what one would need to implement is a datapackage_pipelines_sql plugin. How long do you think that would take for someone new to the codebase, @akariv?

@akariv
Member

akariv commented May 29, 2017

@danfowler there's no need for a datapackage_pipelines_sql plugin, as now you can specify resource URLs which are SQL connection strings directly (using tabulator's built-in support)

@danfowler

@akariv thanks! That helps so much with understanding how these pieces fit together 😄 .

/me rushing off to add some SQL connection strings to some YAML

@roll roll mentioned this issue Jun 4, 2018
4 tasks
@roll roll added this to Software in Frictionless General Mar 19, 2019
@roll roll removed the {contribute} label May 20, 2019
@roll roll added the contribute label Oct 2, 2019
@eyalhei
Contributor

eyalhei commented Oct 12, 2019

Hi, I would like to take a crack at this, is that OK?

@akariv
Member

akariv commented Oct 12, 2019

Go ahead @eyalhei !

@roll
Member Author

roll commented Oct 14, 2019

That's great @eyalhei

Please take a look at #273 (comment) (and this comment especially) to ensure that the issue is properly described (probably it wasn't for the JSON writer).

The test from the comment I linked could easily be updated to be a POC SQL writer test (a round trip using SQL as an intermediate format).
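
As a rough idea of what such a round-trip check could look like, here is a sketch that does not reproduce the test from the linked comment. It uses only the standard library's in-memory sqlite3 rather than tabulator, and the helpers save_rows and load_rows are hypothetical names introduced for this example.

```python
import sqlite3


def save_rows(connection, table, headers, rows):
    """Create `table` with untyped columns and insert all rows."""
    columns = ', '.join('"%s"' % header for header in headers)
    placeholders = ', '.join('?' for _ in headers)
    with connection:  # commits on success, rolls back on error
        connection.execute('CREATE TABLE "%s" (%s)' % (table, columns))
        connection.executemany(
            'INSERT INTO "%s" VALUES (%s)' % (table, placeholders), rows)


def load_rows(connection, table):
    """Read headers and rows back in insertion order."""
    cursor = connection.execute('SELECT * FROM "%s" ORDER BY rowid' % table)
    headers = [column[0] for column in cursor.description]
    return headers, [list(row) for row in cursor]


def test_round_trip_through_sql():
    headers = ['id', 'name']
    rows = [[1, 'english'], [2, '中国人']]
    connection = sqlite3.connect(':memory:')
    try:
        save_rows(connection, 'data', headers, rows)
        # What was written should come back unchanged.
        assert load_rows(connection, 'data') == (headers, rows)
    finally:
        connection.close()
```

In the actual test, the save step would presumably go through stream.save and the load step through a second Stream reading the same table.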

@roll
Member Author

roll commented Oct 21, 2019

DONE in #276

@roll roll closed this as completed Oct 21, 2019
Frictionless General automation moved this from Software (core) to Done Oct 21, 2019