Add a how-to guide on successful collaborative data model development #1744

cmungall · 2023-11-21T15:44:52Z

The LinkML ecosystem attempts to encourage best practice for collaborative schema development - for example, the schema cookiecutter sets you up with standard CI workflows to encourage contributions via PRs, together with default CONTRIBUTING.md, CODE_OF_CONDUCT.md, etc.
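As a rough illustration of the kind of check a CI workflow could run, the sketch below reports which community files are missing from a repository. The function name `missing_community_files` and the file list are assumptions made for illustration; this is not code from the cookiecutter itself.

```python
from pathlib import Path

# Files named above as cookiecutter defaults. The list is an illustrative
# assumption and can be extended (for example with a LICENSE file).
EXPECTED_FILES = ("CONTRIBUTING.md", "CODE_OF_CONDUCT.md")


def missing_community_files(repo_root, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from repo_root."""
    root = Path(repo_root)
    return [name for name in expected if not (root / name).is_file()]
```

A CI job could call `missing_community_files(".")` and fail the build when the returned list is non-empty.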

However, we would benefit from a more direct narrative guide. The best place may be the howto section (FAQ entries are also welcome).

There would be no need to write this de novo. It would heavily reference and link out to other guides, in particular:

Open code, open data, and open infrastructure to promote the longevity of curated scientific resources, aka the O3 principles from @cthoyt and @bgyori
- Primarily concerned with longevity; however, many of the best practices also apply to successful schema development
How to be an Open Science Engineer - maximising impact for a better world from @nvasilevsky @matentzn @bvarner-ebi, published in the obook
- The emphasis is on ontology development, but most of it applies to data models and schemas too
TisLab collaboration guide (link TBD) @jmcmurry @diatomsRcool @mellybelly et al

Although we would heavily reference rather than duplicate, we could include some concrete examples of docs in existing schema or data modeling projects:

Bioschemas community and governance docs @AlasdairGray
Biolink @sierra-moxon
schema.org how we work

However, we should recognize that different kinds of projects call for different kinds of processes. A small schema designed primarily to support a single data portal does not need processes for creating working groups. A project that needs direct input from a large number of non-technical SMEs may need to insulate them from technical GitHub interfaces.

O3 principles

Here's a summary of each point in the provided text as bullet points (from ChatGPT):

Version Control Your Code and Data
- Version control systems like git track changes in files and enable multiple users to collaborate. They are essential for organizing and maintaining both code and data, and they facilitate open distribution over traditional methods. However, they are less efficient for large or complex files, with alternatives like Zenodo and FigShare being recommended for these cases.
Permissively License Your Code and Data
- Using recognizable, permissive licenses (e.g., CC0, CC BY) encourages contribution and ensures content longevity. Non-permissive licenses or custom terms can hinder reuse and engagement. Permissive licensing doesn't typically lead to a lack of credit for the original resource.
Make Your Data Approachable
- Data should be in a simple, non-proprietary format, use well-known metadata standards, and be stored in a single source of truth for ease of use and contribution. Examples include JSON and TSV formats. Avoiding complex formats like XML and ensuring data is canonicalized are also suggested.
Use Technical Workflows (Automation)
- Automating quality control, generation of artifacts, releases, and deployment helps in maintaining and contributing to projects. This includes using continuous integration for quality checks, automating the generation of data views, and packaging data and code for easy deployment.
Use Social Workflows
- Implementing social workflows through tools like GitHub enhances community engagement. Transparent discussion forums, giving credit to contributors, and using issue trackers and discussion boards are crucial for maintaining a dynamic and active contributor base.
Establish Project Governance
- Clear governance defines roles, responsibilities, and behavior expectations in a project. Establishing codes of conduct, standard operating procedures, and guidelines for administration and contribution roles is vital. Governance should evolve over time to meet the project's needs.
Attract and Engage Contributors
- Detailed contribution guidelines, offering various ways to contribute, and hosting projects in neutral spaces increase contributor engagement and recruitment. Projects should be accessible and welcoming to new contributors to ensure longevity and development.
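The "Make Your Data Approachable" point mentions canonicalizing data. One possible reading, sketched below, is to rewrite JSON and TSV files into a stable form (sorted keys, sorted rows) so that version-control diffs show only real changes. The function names `canonical_json` and `canonical_tsv` are hypothetical, and sorting TSV rows assumes that row order carries no meaning.

```python
import csv
import io
import json


def canonical_json(text):
    """Re-serialize JSON with sorted keys and fixed indentation."""
    data = json.loads(text)
    return json.dumps(data, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


def canonical_tsv(text):
    """Keep the header row first and sort the remaining rows."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ""
    header, body = rows[0], sorted(rows[1:])
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(body)
    return out.getvalue()
```

Running such a step in CI or a pre-commit hook keeps the single source of truth in a predictable shape regardless of which tool last wrote the file.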

Obook Open Science Engineer guide

The document "Maximising impact as an open science engineer - OBO Semantic Engineering Training" outlines principles and practices for effective collaboration and impact in the field of open science engineering. Here's a summary:

Principle of Collaboration: Emphasizes the importance of social collaborative workflows in open science. It advises on effective online communication, upvoting helpful answers on platforms like Stack Overflow and GitHub, answering questions even outside one's specific project, conducting basic research before posting queries, and continuously improving open science documentation.
Principle of Upstream Fixing: Encourages fixing issues at the earliest possible stage in the dependency chain, maximizing the impact of changes and benefiting a wider community. It includes a case study highlighting the importance of quality control and community contributions in ontology development.
Principle of No-ownership: Advocates for a mindset of shared ownership and collaborative development in open science projects, particularly in the context of publicly funded work like ontologies. It suggests embracing community-driven development without specific owners or decision-makers and emphasizes the importance of proactive involvement and decision-making in a decentralized environment.

The document also includes a TL;DR summary with key takeaways:

Upvote and get involved in issue trackers.
Always conduct a basic search before asking questions.
Continuously improve documentation.
Be generous with likes and gratitude.
Promote open communication and push fixes upstream.
See issues and pull requests through to the end.
Encourage reviewing each other's work and reducing fear of making mistakes or having pull requests rejected.

These principles and practices are aimed at fostering a more collaborative, efficient, and impactful open science community.

TisLab guide

The document outlines several common pitfalls encountered in transdisciplinary and geographically distributed research teams. Here's a bullet list summarizing each pitfall:

Conflicts of Interest: Failing to recognize and manage conflicts of interest within the team, which can lead to biased decision-making and undermine the team's integrity and objectives.
Time Zone Challenges: Difficulties in planning and coordinating activities across different time zones, leading to scheduling conflicts and reduced team efficiency.
Miscommunication in Electronic Communication: Misunderstandings and misinterpretations that arise from reliance on electronic communication, lacking the nuances of face-to-face interaction.
Excessive Project Management Change: Frequent changes in project management strategies or personnel, leading to disruption, confusion, and a lack of continuity in the team’s approach.
Poor Version Control: Inadequate version control practices for managing documents and data, resulting in disorganization, confusion, and potential loss of important information.
Avoiding Difficult Conversations: Reluctance to address challenging issues or conflicts within the team, which can lead to unresolved problems and deteriorate team dynamics.

These pitfalls highlight the complexities and challenges of managing large, diverse, and distributed research teams, underscoring the need for effective management and communication strategies.

Best practices:

The document outlines several best practices for managing transdisciplinary and geographically distributed research teams effectively. Here's a summary of these practices:

Clear Collaboration Rules: Establishing explicit guidelines and rules for collaboration to ensure everyone understands expectations and responsibilities.
Effective Communication Strategies: Developing robust communication strategies that accommodate different time zones and leverage various communication tools to facilitate clear, frequent, and inclusive dialogue.
Onboarding and Offboarding Processes: Implementing structured processes for introducing new members to the team (onboarding) and managing the departure of members (offboarding) to maintain continuity and knowledge transfer.
Collaborative Writing Processes: Encouraging a cooperative approach to writing and document creation, which includes shared authorship and equitable contribution recognition.
Investing in Project Management: Recognizing the value of professional project management and allocating resources accordingly to enhance project coordination and efficiency.
Strong Leadership: Fostering strong, empathetic leadership that can guide the team, resolve conflicts, and motivate members towards common goals.
Addressing Bullying: Proactively addressing and preventing bullying within the team to maintain a respectful and productive working environment.
Nurturing Careers: Supporting the career development of team members, recognizing their contributions, and providing opportunities for growth and advancement.

These best practices are designed to foster a cohesive, efficient, and respectful working environment within diverse and distributed research teams, thereby enhancing productivity and team satisfaction.

Bioschemas governance

The document titled "Bioschemas Governance" outlines the governance structure and guidelines for the Bioschemas community, a project aimed at improving data interoperability in life sciences. Here's a summary:

Overview: Bioschemas adheres to five core principles of OpenStand: Respectful cooperation, adherence to standards development parameters, collective empowerment, availability, and voluntary adoption. Community members must follow the Code of Conduct based on FORCE11 guidelines.
Steering Council: Responsible for strategic and organizational planning, oversight of community activities, and promoting Bioschemas activities. The council meets every two months and communicates regularly via email and online messaging platforms.
Community and Working Groups: Day-to-day activities are conducted by the community, focusing on profile and type development and adoption. Working groups, each led by two individuals, develop markup practices for specific concepts. The Steering Council approves releases of profiles and types.
Role Holder Appointment and Removal Processes: Describes the election of Steering Council members and Working Group Leads, with a 2-year term of service. Inactive role holders may be removed following a defined process.
Specification Development and Versioning: Explains how specifications (profiles or types) are developed, including collaborative community engagement, version numbering, and authorship acknowledgment.
Profile and Type Development: Details steps for developing profiles and types, including identifying base types, property cardinality, and use cases. Processes for proposing new profiles or types and renaming or deprecating them are also covered.
Changing Governance Documents: Future changes to governance documents are to be submitted via a GitHub pull request, with public comment and Steering Council approval.
Sources: Lists references used in the document, including links to governing principles and codes of conduct from other organizations like FORCE11, Jupyter, and W3C.

This document provides a comprehensive guide to the governance structure, roles, processes, and best practices within the Bioschemas community, emphasizing open collaboration, transparency, and adherence to established standards.

schema.org how we work doc

The document titled "How We Work - Schema.org" provides an overview of the processes and practices employed by Schema.org in developing and updating its schemas. Here's a summary:

Overview and Process:
- Schema.org updates materials through official named releases every few weeks.
- Simple improvements and bug fixes can be fast-tracked as "Early Access Fixes".
- A development version of the site, "webschemas.org", reflects the latest work-in-progress based on community discussions and proposals.
- A "pending" extension at pending.webschemas.org showcases new vocabulary proposals, which may not yet reflect wider consensus.
- Steering group reviews and approves release candidates; if no concerns are raised within 10 business days, the official site is updated.
Versioning and Change Control:
- Schema.org is developed incrementally, with several updates a year, each documented as a release.
- Content for each release is based on public discussions and unanimous agreement of the Steering Group.
- Two types of extension vocabulary are introduced: hosted (part of Schema.org but tagged within a subdomain) and external (published elsewhere and managed by other organizations).
Schema Structure:
- Schema.org contains term definitions (types, properties, enumerated values), machine-readable files, and a JSON-LD context file.
- The approach to schema definitions is based on W3C RDFS with customizations.
Extensibility Mechanisms:
- Schema.org allows for extensibility through mechanisms independent of its vocabulary definitions and release planning.
- This includes publishing Schema.org data alongside other structured data types and using PropertyValue and Role mechanisms for additional annotations.
Early Access Fixes and Pending Releases:
- Early Access Fixes allow for rapid updates between official releases.
- The "pending schemas" extension is a staging area for work-in-progress terms, subject to change and community review.
Workflow FAQ:
- The document addresses common questions regarding public-vocabs, W3C WebSchemas group, the role of the Schema.org webmaster, and how to get involved or propose new schemas.
Related Links and Further Reading:
- The document provides links to additional resources for more in-depth understanding of Schema.org's work and processes.
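The 10-business-day review window mentioned under "Overview and Process" can be worked out mechanically. The sketch below is only an illustration: `review_deadline` is a hypothetical helper, and it assumes that "business days" means Monday to Friday with no holiday calendar.

```python
from datetime import date, timedelta


def review_deadline(start, business_days=10):
    """Return the date that falls `business_days` weekdays after `start`."""
    current = start
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday() is 0-4 for Monday-Friday, 5-6 for the weekend.
        if current.weekday() < 5:
            remaining -= 1
    return current
```

For example, `review_deadline(date(2023, 11, 21))` counts ten weekdays forward from a Tuesday and skips the two weekends in between.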

This document serves as a comprehensive guide to the operational framework, versioning system, and collaborative nature of Schema.org's efforts in structuring and standardizing web data.

* First pass at collaboration best practice doc 1744. See #1744
* Update docs/howtos/collaborative-development.md (six commits)

Co-authored-by: Nico Matentzoglu <nicolas.matentzoglu@gmail.com>

cmungall · 2023-11-22T20:17:16Z

Current docs live (but not linked): https://linkml.io/linkml/howtos/collaborative-development

Pull requests welcome on:

https://github.com/linkml/linkml/blob/main/docs/howtos/collaborative-development.md

cthoyt · 2023-11-22T20:38:38Z

@cmungall FYI we're going to rename from O3 Principles to O3 Guidelines. I can send a PR later or feel free to update

nlharris · 2023-12-07T05:09:23Z

Permissively License Your Code and Data

(adapted from O3 guidelines)

Using recognizable, permissive licenses (e.g., CC0, CC BY) encourages contribution and ensures content longevity.
Non-permissive licenses or custom terms can hinder reuse and engagement.
Permissive licensing doesn't typically lead to a lack of credit for the original resource.

Given our recent discussion, should we point out that CC0 and CC BY are good for non-code resources (or maybe also for projects that include code and non-code resources) but that code should be licensed with a software license like Apache 2.0 or MIT?

nlharris added the "documentation" and "help wanted" labels Nov 21, 2023

This was referenced Nov 21, 2023

EPIC: add more high level guidelines for working collaboratively OBOAcademy/obook#459

Open

First pass at collaboration best practice doc 1744 #1745

Merged

cmungall added a commit that referenced this issue Nov 21, 2023

First pass at collaboration best practice doc 1744

4fc75cd

See #1744

nlharris added a commit that referenced this issue Feb 15, 2024

Add suggestions of permissive licenses

fbbb703

...for code and non-code resources Hopefully closes #1744?

nlharris mentioned this issue Feb 15, 2024

Add suggestions of permissive licenses #1928

Merged

nlharris closed this as completed in #1928 Feb 16, 2024
