Integrate CSV linter #3493
Comments
Hi @wesley-dean-flexion :) That seems to be a good idea, you have my go-ahead to start implementing :) Regarding the complexity of calling csvkit, you might need to create a Python class to handle it :)
(apologies.. I muscle-memoried Started with csv-clean which is not yet ready for anything. A few questions:
so, this is where a wrapper which would live in the linters directory would reside... right? The Python class would look to see if we're running in fix mode or not and apply the |
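A minimal sketch of the fix-mode check described in this comment, assuming the `--dry-run` behaviour proposed in the original request and csvkit 1.x. The function name `build_csvclean_command` is hypothetical, and the sketch does not use MegaLinter's actual linter base class; it only shows how the flag could be toggled.

```python
from typing import List


def build_csvclean_command(csv_path: str, apply_fixes: bool) -> List[str]:
    """Build the csvclean argument list for one file (hypothetical helper).

    Assumption: in lint-only mode we pass --dry-run, as proposed in the
    issue; in fix mode we let csvclean write its output files.
    """
    command = ["csvclean"]
    if not apply_fixes:
        # Lint-only: ask csvclean not to write any files.
        command.append("--dry-run")
    command.append(csv_path)
    return command
```

For example, `build_csvclean_command("data.csv", apply_fixes=False)` yields `["csvclean", "--dry-run", "data.csv"]`. A real wrapper would still need to hook into however MegaLinter passes its fix-mode setting to linters.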
I'm working with @jpmckinney on some interface changes (wireservice/csvkit#1239) that ought to simplify this integration. As a result, when v2.0.0 comes out, a lot of what I wrote before will no longer matter. Additionally, I submitted wireservice/csvkit#1240 to containerize the tool and publish official images that could be used instead of building the tool via |
This issue has been automatically marked as stale because it has not had recent activity. If you think this issue should stay open, please remove the |
I'm waiting on a PR approval from the csvkit folks so I can move forward with this. |
see the aforementioned |
the PR (wireservice/csvkit#1240) was approved. I just need to do the thing. |
please don't ding me, stalebot... |
@wesley-dean-flexion what's the status 🥳 |
tl;dr: I would like to add csvclean (from csvkit) as a linter. I'm happy to do the work if people think this is a good idea.
Is your feature request related to a problem? Please describe.
One of my repos includes CSV files and they can be sub-optimal. Just as we have linters (and reformatters) for JSON, XML, and YAML, I would like to add a CSV linter.
Describe the solution you'd like
There exists a package, csvkit, that includes a tool to lint and clean up CSV files:
https://csvkit.readthedocs.io/en/latest/scripts/csvclean.html
I would like to add CSV_CSVCLEAN (the name isn't consequential to me; I just picked it because of YAML_YAMLLINT) that would lint the list of files. When run with APPLY_FIXES set, it would not include the --dry-run flag to csvclean; when run without APPLY_FIXES set, it would include the --dry-run flag.
Running csvclean on a CSV file results in two files being created, per the documentation.
I noticed that running csvclean on a known messy file (i.e., one that produces errors due to being not totally valid) will NOT set $? but it will generate [basename]_err.csv, so something like this might be helpful:
(so $? is set, $@ would have the additional arguments, $APPLY_FIXES is set to something when we want to fix stuff, etc.. point: a little "syntactic sugar" could be helpful in making it work the way we want.)
Describe alternatives you've considered
I haven't thought through very many alternatives. I did look through prettier to see if it could clean up CSV like it can for YAML and such; however, it does not appear to have that functionality. If it's there and I missed it, cool, there's that much less work that needs to be done.
Additional context
There are a few images on Docker Hub that provide csvkit, but they're largely several years old. For what it's worth, csvkit regularly provides revisions, the most recent of which (latest / v1.5.0) was released on 28 March, 2024. (point: existing images are behind the current release). I can put together a pipeline to watch the csvkit repo for new releases and package / publish updated images.
I'm happy to do the work to implement this and submit a PR assuming folks are cool with the idea.
It's a given that CSV has a bunch of limitations and that JSON, TOML, XML, or YAML (etc.) may be a better match to represent data; I don't dispute it. Unfortunately, it's not my call how the data are represented, but I do have a responsibility to make sure the pipeline from developer to production detects (and notifies me of) as much noise as possible.
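The exit-status idea from the request (csvclean leaves `$?` unset on a messy file but writes `[basename]_err.csv`) could be sketched roughly as below. This is only an illustration for the non-dry-run case under csvkit 1.x: the names `error_file_for` and `lint_exit_code` are hypothetical, and it assumes the error file is written next to the input file. The `run` parameter exists only so the logic can be exercised without csvclean installed.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def error_file_for(csv_path: Path) -> Path:
    """Path where csvclean is assumed to write errors: [basename]_err.csv."""
    return csv_path.with_name(f"{csv_path.stem}_err.csv")


def lint_exit_code(
    csv_path: Path,
    run: Callable[[Sequence[str]], subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Run csvclean on one file and return a non-zero status if it logged errors."""
    err_file = error_file_for(csv_path)
    # Remove a leftover error file so an earlier run cannot cause a false alarm.
    err_file.unlink(missing_ok=True)
    result = run(["csvclean", str(csv_path)])
    if result.returncode != 0:
        # csvclean itself failed (e.g. unreadable file); pass that status through.
        return result.returncode
    # csvclean exited 0; treat the presence of the error file as a lint failure.
    return 1 if err_file.exists() else 0
```

Something like `sys.exit(lint_exit_code(Path("data.csv")))` would then give a pipeline a status it can act on. If the csvkit interface changes discussed in the comments land, most of this workaround may no longer be needed.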