RFC: Improve tabular output formats #3043
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
I was looking into https://www.python-excel.org/ for what we can use to implement this:
https://foss.heptapod.net/openpyxl/openpyxl/ has
Others are libraries that only use the standard library, no other requirements:
https://github.com/jmcnamara/XlsxWriter seems the best choice: actively maintained, with lots of functionality. The author maintains tools for the same purpose in other languages too :P
https://github.com/python-excel/xlwt was widely used, but it targets old Excel formats and is no longer actively maintained.
@pombredanne what do you think?
@ayan Sinha Mahapatra IMHO just reuse what is used in SCIO
Not arguing that the current CSV file is unwieldy, but it's easier to process automatically than multiple tabs. It contains all the data, which is sometimes what you want in a hurry. My other approach has been to process the JSON into a custom CSV, but sometimes that's a hassle. Would you consider keeping the existing file around as
@rspier you wrote:
Of course, but then maybe we can design it so that it has everything AND not too much at the same time, so that it is compact and efficient to review? For instance, returning the start and end lines of copyright and license matches may not be needed there, and we could design something that has the key data and fits on a single row per file?
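As a rough illustration of the "single row per file" idea, here is a minimal sketch that flattens a scan into one compact CSV row per file and drops the start and end lines. The sample data, the field names (`licenses`, `key`, `copyrights`, `copyright`) and the helper names `compact_rows` and `to_csv` are assumptions made for this example; the real ScanCode JSON layout has more fields and differs between versions.

```python
import csv
import io

# Hypothetical, simplified scan data. Real ScanCode JSON has more fields
# and its layout varies between versions.
SAMPLE_SCAN = {
    "files": [
        {
            "path": "src/a.c",
            "type": "file",
            "licenses": [
                {"key": "mit", "start_line": 1, "end_line": 20},
                {"key": "gpl-2.0", "start_line": 30, "end_line": 45},
            ],
            "copyrights": [
                {"copyright": "Copyright (c) Example Corp", "start_line": 1, "end_line": 1},
            ],
        },
        {"path": "README", "type": "file", "licenses": [], "copyrights": []},
    ]
}

COLUMNS = ["path", "type", "license_keys", "copyrights"]


def compact_rows(scan):
    """Yield one flat dict per file, leaving out start/end line details."""
    for f in scan.get("files", []):
        # Deduplicate while keeping first-seen order.
        keys = list(dict.fromkeys(lic["key"] for lic in f.get("licenses", [])))
        notices = list(dict.fromkeys(c["copyright"] for c in f.get("copyrights", [])))
        yield {
            "path": f["path"],
            "type": f.get("type", ""),
            "license_keys": "; ".join(keys),
            "copyrights": "; ".join(notices),
        }


def to_csv(scan):
    """Return the compact per-file table as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(compact_rows(scan))
    return out.getvalue()


print(to_csv(SAMPLE_SCAN))
```

Joining multiple values into one cell keeps the row count equal to the file count, at the cost of making per-match details unrecoverable from the CSV alone.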
I think "compact and efficient to review" might be something that comes from the other views, while the "big one" is for those cases where you want all the data, or things that aren't in the other formats. It's significantly easier to hide/remove data than it is to add/merge it back in, which is one reason I'd lean towards having a CSV with too much info. For example, I often use There are a lot of nested fields that I would drop before I dropped start and end line. For example:
are highly repetitive of each other and other fields (like also An alternative idea would be to leverage
The current CSV doesn't include
@armijnhemel re:
The problem is that this is too big in practice to be routinely included in a CSV... all commercial and libre spreadsheets I know of will choke on an AGPL matched text :]
All tools start to choke with a few thousand. The way out is IMHO using ScanCode.io or the ScanCode Workbench, which both display the license detection and matched text loaded from a JSON scan even if you do not have the original scanned code.
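To make the size problem concrete: Excel documents a limit of 32,767 characters per cell, so a long matched text cannot fit in a single cell. The sketch below shows one possible mitigation, truncating the text with a visible marker. This is not something proposed in the thread, and `fit_cell` and its marker are hypothetical names used only for illustration.

```python
EXCEL_CELL_LIMIT = 32767  # documented maximum number of characters in one Excel cell


def fit_cell(text, limit=EXCEL_CELL_LIMIT, marker=" [truncated]"):
    """Return text unchanged if it fits, else cut it and append a marker."""
    if len(text) <= limit:
        return text
    # Reserve room for the marker so the result is exactly `limit` characters.
    return text[: limit - len(marker)] + marker


short = fit_cell("MIT License")
long_cell = fit_cell("x" * 40000)
print(len(short), len(long_cell))
```

Truncation keeps the spreadsheet loadable but loses data, which is why the full matched text is better read from the JSON scan.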
The current CSV output is a mess, albeit a convenient mess. We need something, and quickly. I suggest these short-term and long-term actions:
For now in v31:
- --csv option so that we do not further the mess in r31.... drop it entirely in v32 #3047
- The deprecation message should have a link to this PR.

In v32:
- Create a new --csv-file option that would only list file level details in this way:
- Create a new --csv-package option that would only list package details:
- Create a new --csv-dependency option that would only list dependency details:
- Create a new --csv-license option that would only list file level license scan information, used for debugging and hidden from the CLI help:
- Drop the hidden --csv option in v32
- Add new XLSX option that creates a proper spreadsheet with multiple tabs, where these essentially mirror the new csv-* options
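The split into separate tables proposed above could look roughly like the following sketch, which builds one table per kind of data and renders each as CSV. The top-level keys (`files`, `packages`, `dependencies`), the column lists, and the names `TABLES`, `build_tables` and `table_to_csv` are all assumptions for illustration, since the proposal does not spell out the columns here.

```python
import csv
import io

# Hypothetical, simplified scan. Section names and columns are assumptions.
SAMPLE_SCAN = {
    "files": [{"path": "setup.py", "type": "file"}],
    "packages": [{"purl": "pkg:pypi/demo@1.0", "name": "demo", "version": "1.0"}],
    "dependencies": [
        {"purl": "pkg:pypi/click", "scope": "install", "for_package": "pkg:pypi/demo@1.0"},
    ],
}

# Table name -> (section of the scan, columns to keep).
TABLES = {
    "file": ("files", ["path", "type"]),
    "package": ("packages", ["purl", "name", "version"]),
    "dependency": ("dependencies", ["purl", "scope", "for_package"]),
}


def build_tables(scan):
    """Return {table name: [header row, data rows...]} for each table."""
    tables = {}
    for name, (section, columns) in TABLES.items():
        rows = [[item.get(col, "") for col in columns] for item in scan.get(section, [])]
        tables[name] = [columns] + rows
    return tables


def table_to_csv(table):
    """Render a list of rows as CSV text."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(table)
    return out.getvalue()


tables = build_tables(SAMPLE_SCAN)
for name, table in tables.items():
    print(f"--- csv-{name} ---")
    print(table_to_csv(table), end="")
```

Because each table is just a list of rows, the same mapping could feed one worksheet per table in whichever XLSX library is chosen, so the CSV options and the spreadsheet tabs stay in sync.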
For reference, we have these related issues:
- --from-json throws error when trying to covert to CSV from a package id's json file #1398
- --license-text option only works when output is json #1330 shows we have documentation issues