Add option for tabular output format #830

Open
Tracked by #3043
pombredanne opened this issue Nov 1, 2017 · 4 comments

Comments

@pombredanne
Member

.... to have only a single row per file. This may be a tad messy at times as some cells may contain the data for different scans, but it may be handy too.

@yash-nisar
Contributor

Can you elaborate a bit with an example, @pombredanne?

@pombredanne
Member Author

Try running a CSV output for a file that contains more than one email, copyright and license. Use -clipe as the scan options and you will likely see lines that are mostly empty.

@pombredanne
Member Author

from @pombredanne in #1236

I find it convenient at times to have a CSV with a single row for each file, with data compressed into single cells rather than spread over multiple cells across rows. The CSV output from ScanCode for any Scan other than -info is no longer well structured and may exceed the limits of spreadsheet tools, but even then I could make use of this for a quick glance at results.
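
To make the "single row per file" idea concrete, here is a minimal sketch. It assumes a simplified, hypothetical record layout (`path`, `copyrights`, `emails`, `licenses`), not the actual ScanCode JSON schema, and the `single_row` and `to_csv` helpers are illustrative names rather than existing ScanCode functions. Each list-valued field is joined into one cell so that every file takes exactly one CSV row.

```python
import csv
import io

# Hypothetical, simplified scan records; NOT the real ScanCode schema.
SAMPLE_FILES = [
    {
        "path": "src/a.c",
        "copyrights": ["Copyright (c) 2016 Foo", "Copyright (c) 2017 Bar"],
        "emails": ["foo@example.com", "bar@example.com"],
        "licenses": ["mit", "gpl-2.0"],
    },
    {"path": "README", "copyrights": [], "emails": [], "licenses": ["mit"]},
]

COLUMNS = ["path", "copyrights", "emails", "licenses"]


def single_row(record, sep=" | "):
    """Join each list-valued field into one cell so a file takes one row."""
    row = {}
    for name in COLUMNS:
        value = record.get(name, "")
        row[name] = sep.join(value) if isinstance(value, list) else value
    return row


def to_csv(records):
    """Write one CSV row per record and return the CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(single_row(record))
    return out.getvalue()


CSV_TEXT = to_csv(SAMPLE_FILES)
```

The separator is a design choice: a newline inside the cell reads better in a spreadsheet, while a visible delimiter such as ` | ` is easier to split again later. Either way, the link between a value and its start and end lines is lost once values are joined.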

from @mjherzog #1236 (comment)

Another important topic here is how to create an initial analysis workfile (spreadsheet) for a project. The current -info format is "clean" and simple in terms of one CSV row per file or directory, but then you need to cut and paste any copyright or license data from AbC Manager or another CSV Scan file (-cli, -clip or -clipeu) and in that case the relevant copyright and license data will be spread across multiple rows.

@mjherzog
Member

mjherzog commented Apr 3, 2020

When you convert a Scan JSON file (run with any options beyond --info) to CSV format you end up with a file that is almost impossible to use for any type of analysis in Excel/LibreCalc because of the arbitrary number of rows per Resource when the data for that Resource includes one or more lists. The primary use case for CSV output is to create a workfile where you can record your analysis in more depth than the Conclusions feature of SC Workbench.

The first step is to design the target output because some data elements will probably never be well-suited to flattening a list into a single cell - the most obvious examples are correlated fields like:

  • Copyright Statements / Copyright Start Line / Copyright End Line
  • License Key / License Score / License Start Line / License End Line
  • Email / Email Start Line / Email End Line
  • URL / URL Start Line / URL End Line

So it may be the case that the best CSV output will be two types of files:

  1. Data that fits the one row per Path/Resource model
  2. Data that has correlated fields with the minimal Path/Resource data needed to identify the item
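
The two-file layout above could be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout, the field names (`value`, `start_line`, `end_line`) and the `split_tables` helper are hypothetical and do not reflect the actual SCTK JSON schema or API. One table keeps a single row per Path/Resource; the other keeps one row per detected item, carrying only the path needed to identify it.

```python
# Hypothetical, simplified records; field names are illustrative only
# and are NOT the real ScanCode JSON schema.
SCAN = [
    {
        "path": "src/a.c",
        "type": "file",
        "size": 120,
        "copyrights": [
            {"value": "Copyright (c) 2016 Foo", "start_line": 1, "end_line": 1},
            {"value": "Copyright (c) 2017 Bar", "start_line": 2, "end_line": 2},
        ],
        "emails": [
            {"value": "foo@example.com", "start_line": 3, "end_line": 3},
        ],
    },
    {"path": "src", "type": "directory", "size": 0, "copyrights": [], "emails": []},
]

SCALAR_FIELDS = ["path", "type", "size"]   # fit the one-row-per-resource model
LIST_FIELDS = ["copyrights", "emails"]     # correlated value/start/end fields


def split_tables(records):
    """Return (resources, detections): one row per path, one row per item."""
    resources = []
    detections = []
    for record in records:
        resources.append({name: record[name] for name in SCALAR_FIELDS})
        for kind in LIST_FIELDS:
            for item in record.get(kind, []):
                detections.append(
                    {
                        "path": record["path"],
                        "kind": kind,
                        "value": item["value"],
                        "start_line": item["start_line"],
                        "end_line": item["end_line"],
                    }
                )
    return resources, detections


RESOURCES, DETECTIONS = split_tables(SCAN)
```

Each list of rows can then be written to its own CSV file, and the two files joined on `path` in a spreadsheet or database when needed.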

There seem to be three options for implementation:

  1. Direct output from SCTK
  2. Refactoring of the current JSON to CSV conversion within SCTK
  3. Post-scan plugin

@mjherzog changed the title from "Add option to "flatten" a CSV output" to "Add option for tabular output format" on Apr 3, 2020
@pombredanne removed this from the v3.3 milestone on Sep 24, 2021
@pombredanne added this to the v32.1 milestone on Jan 4, 2023

4 participants