Develop data model for open repository of software metrics #1

Open
jmatsushita opened this issue Aug 17, 2015 · 1 comment

Comments

@jmatsushita
Member

We want to aggregate existing metrics and develop new metrics for software projects. Which data structure do we need? Do we go for API-first design (apiary.io)? A data schema (JSON Schema? LD?)? Data Cube? In some ways this is a data catalogue (so maybe CKAN and the DCAT ontology are relevant), but in other ways it's more detailed and granular (with a focus on the measurement level).

Here's an ongoing effort to list potential metrics.

Here are a number of questions that relate to this:

  • How do we distinguish metrics that are relevant for open source projects (because they can be collected automatically from repositories or issue trackers) from others that might also be collected for closed source projects but would be less trustworthy because they are not verifiable?
  • How do we keep track of provenance (related to the question above) and of collection agents (scrapers, people, ...)?
  • How do we differentiate metrics that relate to user-facing tools from ones that are relevant for libraries (the number of dependent, i.e. downstream, projects is relevant for libraries but usually not for user-facing tools)? How about online services?
  • How do we enable different measurement partners (usability is a good example) to develop taxonomies that are interoperable with the broader data structure? Is the Open Integrity data structure just a "skeleton" or "framework" that ties in and helps glue together other data structures?
  • What are the minimum data models needed for an MVP with Libraries.io data, TOSDR data and CVEs? See Collect and make available metrics data #2.
@jmatsushita
Member Author

Submitted a CSV version of the list e285538

From #5, #3 maybe

{
  "metric_id": "project/package/dependencies",
  "metric_provider": "https://libraries.io",
  "metric_hosting": "cached", // other types could be "remote" (the data is not hosted by OII) or "stream" (OII can serve/proxy a realtime stream). "hosted" would mean that OII also hosts historical data (in which case the date range should be given too).
  "metric_type": "calculated", // provenance probably needs to be more detailed than that.
  "metric_source_start_range": "2014-05-02T10:10:00",
  "metric_source_end_range": "2015-08-20T19:45:00",
  "metric_source_api_endpoint": "https://libraries.io/api/{project_id}/dependencies",
  "metric_source_api_parameter_project_id": "project_id",
  "metric_source_api_response_path_value": "response.data.value"
}
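
To make the descriptor above a bit more concrete, here is a rough Python sketch of how a consumer might resolve it: fill in the endpoint template, then walk the dotted response path. This is only one possible reading of the fields. The helper names `build_url` and `extract_value` are made up for illustration, the endpoint is the one from the sketch above and has not been checked against the real Libraries.io API, and treating the leading `response` segment as the decoded response body is an assumption.

```python
# Subset of the descriptor sketched above (endpoint is illustrative, unverified).
descriptor = {
    "metric_id": "project/package/dependencies",
    "metric_source_api_endpoint": "https://libraries.io/api/{project_id}/dependencies",
    "metric_source_api_parameter_project_id": "project_id",
    "metric_source_api_response_path_value": "response.data.value",
}

PARAM_PREFIX = "metric_source_api_parameter_"


def build_url(descriptor, **params):
    """Fill the endpoint template with the parameters the descriptor declares."""
    # Assumption: the key suffix is the parameter name and the value is the
    # placeholder name used inside the endpoint template.
    declared = {
        key[len(PARAM_PREFIX):]: placeholder
        for key, placeholder in descriptor.items()
        if key.startswith(PARAM_PREFIX)
    }
    missing = [name for name in declared if name not in params]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    values = {declared[name]: params[name] for name in declared}
    return descriptor["metric_source_api_endpoint"].format(**values)


def extract_value(descriptor, response):
    """Walk the dotted response path through an already decoded JSON body."""
    segments = descriptor["metric_source_api_response_path_value"].split(".")
    # Assumption: a leading "response" segment names the body itself.
    if segments[0] == "response":
        segments = segments[1:]
    node = response
    for segment in segments:
        node = node[segment]
    return node


url = build_url(descriptor, project_id="some-project")
value = extract_value(descriptor, {"data": {"value": 12}})
```

No request is made here; the body passed to `extract_value` is a hand-written stand-in for whatever the provider would return.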

I think that the interesting bit is going to be provenance. For dependencies, for instance, that means specifying where the raw data lives (like the package manager's files) and which bit of code processes it.
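
As a starting point for discussion, a provenance record along those lines could look something like the Python sketch below. Every field name here (`raw_source`, `processor`, `agent`, `verifiable`) and every example value is hypothetical; none of them comes from an existing schema, and the real model would probably need more detail.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Provenance:
    raw_source: str   # where the raw data lives, e.g. a package manager file
    processor: str    # which bit of code turned the raw data into the metric
    agent: str        # who or what collected it, e.g. "scraper" or "person"
    verifiable: bool  # whether a third party could recompute it from public data


# Hypothetical example values, for illustration only.
record = Provenance(
    raw_source="package.json",
    processor="hypothetical-dependency-parser",
    agent="scraper",
    verifiable=True,
)

# Attach the provenance record to a metric descriptor like the one above.
metric = {
    "metric_id": "project/package/dependencies",
    "metric_type": "calculated",
    "provenance": asdict(record),
}
```

Keeping provenance as a nested object rather than more flat `metric_*` keys would let it grow (for example into a chain of processing steps) without touching the rest of the descriptor.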
