Add option for tabular output format #830

pombredanne · 2017-11-01T10:20:32Z

... to have only a single row per file. This may be a tad messy at times, as some cells may contain the data for different scans, but it may be handy too.

yash-nisar · 2018-01-29T11:07:02Z

Can you elaborate a bit with an example @pombredanne ?

pombredanne · 2018-02-09T09:15:44Z

Try running a CSV output for a file that contains more than one email, copyright and license. Use -clipe as the scan option: you will likely see rows that are mostly empty.

pombredanne · 2018-11-04T19:34:35Z

from @pombredanne in #1236

I find it convenient at times to have a CSV with a single row for each file, with data compressed into single cells rather than spread over multiple cells across rows. The CSV output from ScanCode for any Scan other than -info is no longer well structured and may exceed the limits of spreadsheet tools, but even then I could make use of this for a quick glance at results.
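To make the request concrete, here is a minimal Python sketch of the "one row per file" idea. It is only an illustration: the column names, the sample rows and the `flatten_rows` helper are hypothetical and are not ScanCode's actual CSV layout or API.

```python
import csv
import io
from collections import defaultdict

# Hypothetical multi-row CSV: one mostly empty row per detection.
# The column names are assumptions, not the real ScanCode CSV headers.
SPARSE_CSV = """\
path,copyright,license_key,email
src/a.c,Copyright (c) 2017 Foo,,
src/a.c,Copyright (c) 2018 Bar,,
src/a.c,,mit,
src/a.c,,,dev@example.com
src/b.c,,apache-2.0,
"""


def flatten_rows(rows, key="path", sep=" | "):
    """Merge all rows sharing the same key into a single row.

    Non-empty values of each column are joined with `sep` in one cell.
    """
    merged = {}
    for row in rows:
        cells = merged.setdefault(row[key], defaultdict(list))
        for column, value in row.items():
            if column != key and value:
                cells[column].append(value)
    columns = [c for c in rows[0] if c != key] if rows else []
    return [
        {key: path, **{c: sep.join(cells[c]) for c in columns}}
        for path, cells in merged.items()
    ]


rows = list(csv.DictReader(io.StringIO(SPARSE_CSV)))
flat = flatten_rows(rows)
for row in flat:
    print(row)
```

Joining values in a cell keeps the sheet at one row per path, at the cost of losing the per-detection alignment between columns, which is the "messy" trade-off mentioned at the top of this issue.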

from @mjherzog #1236 (comment)

Another important topic here is how to create an initial analysis workfile (spreadsheet) for a project. The current -info format is "clean" and simple, with one CSV row per file or directory. But then you need to cut and paste any copyright or license data from AbC Manager or another CSV Scan file (-cli, -clip or -clipeu), and in that case the relevant copyright and license data will be spread across multiple rows.

mjherzog · 2020-04-03T00:00:04Z

When you convert a Scan JSON file (run with any options beyond --info) to CSV format, you end up with a file that is almost impossible to use for any type of analysis in Excel/LibreCalc, because the number of rows per Resource is arbitrary whenever the data for that Resource includes one or more lists. The primary use case for CSV output is to create a workfile where you can record your analysis in more depth than the Conclusions feature of SC Workbench.

The first step is to design the target output, because some data elements will probably never be well suited to flattening a list into a single cell. The most obvious examples are correlated fields like:

Copyright Statements / Copyright Start Line / Copyright End Line
License Key / License Score / License Start Line / License End Line
Email / Email Start Line / Email End Line
URL / URL Start Line / URL End Line

So it may be the case that the best CSV output will be two types of files:

Data that fits the one row per Path/Resource model
Data that has correlated fields with the minimal Path/Resource data needed to identify the item
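As a rough sketch of that two-file design, the Python below splits a simplified scan structure into a one-row-per-Resource table and a detail table that keeps the correlated fields together. The `SCAN` data, the field names and the `split_tables` and `to_csv` helpers are assumptions made for illustration; real Scan JSON carries more fields and may name them differently.

```python
import csv
import io

# Hypothetical, simplified scan data; not the actual ScanCode JSON schema.
SCAN = {
    "files": [
        {
            "path": "src/a.c",
            "type": "file",
            "size": 1200,
            "copyrights": [
                {"value": "Copyright (c) 2017 Foo", "start_line": 1, "end_line": 1},
                {"value": "Copyright (c) 2018 Bar", "start_line": 2, "end_line": 2},
            ],
            "emails": [
                {"value": "dev@example.com", "start_line": 5, "end_line": 5},
            ],
        },
        {"path": "src", "type": "directory", "size": 0, "copyrights": [], "emails": []},
    ]
}


def split_tables(scan, list_fields=("copyrights", "emails")):
    """Return (resources, details).

    resources: one row per path, holding only the scalar fields.
    details: one row per detection, with the path plus its correlated fields.
    """
    resources, details = [], []
    for entry in scan["files"]:
        resources.append({k: v for k, v in entry.items() if k not in list_fields})
        for field in list_fields:
            for item in entry.get(field, []):
                details.append({"path": entry["path"], "kind": field, **item})
    return resources, details


def to_csv(rows):
    """Serialize a list of same-shaped dicts to CSV text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


resources, details = split_tables(SCAN)
print(to_csv(resources))
print(to_csv(details))
```

The path column acts as the join key between the two files, so the detail rows can be filtered or pivoted in a spreadsheet without disturbing the one-row-per-Resource workfile.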

There seem to be three options for implementation:

Direct output from SCTK
Refactoring of the current JSON to CSV conversion within SCTK
Post-scan plugin

pombredanne added GUI and outputs new feature labels Nov 1, 2017

pombredanne mentioned this issue Nov 4, 2018

Add option to have a fully "flat" csv output #1236

Closed

pombredanne added this to the v3.1 milestone Nov 5, 2018

pombredanne mentioned this issue Nov 5, 2018

CSV output always returns at least two rows #829

Open

pombredanne modified the milestones: v3.1 Documentation, documentation, documentation, v3.2 Feb 16, 2019

mjherzog added the should have label Apr 2, 2020

mjherzog assigned mjherzog, pombredanne and chinyeungli and unassigned mjherzog Apr 2, 2020

mjherzog unassigned chinyeungli Apr 3, 2020

mjherzog changed the title ~~Add option to "flatten" a CSV output~~ Add option for tabular output format Apr 3, 2020

pombredanne removed this from the v3.3 milestone Sep 24, 2021

pombredanne mentioned this issue Aug 5, 2022

RFC: Improve tabular output formats #3043

Open

17 tasks

pombredanne added this to the v32.1 milestone Jan 4, 2023
