diff --git a/why-REST.qmd b/why-REST.qmd index ff6676e..6861303 100644 --- a/why-REST.qmd +++ b/why-REST.qmd @@ -17,21 +17,24 @@ categories: Application programming interfaces (APIs) are ways that applications communicate to each other through a standard structure or design. There -are several design styles available for building APIs, including Representational State Transfer (REST), Simple Object Access Protocol (SOAP), and GraphQL. +are several design styles available for building APIs, including +Representational State Transfer (REST), Simple Object Access Protocol +(SOAP), and GraphQL. ## Decision -We will use the REST architectural style for implementing the communication -protocol between our client and server components, instead of SOAP or -GraphQL. +We will use the REST architectural style for implementing the +communication protocol between our client and server components, instead +of SOAP or GraphQL. ## Alternatives ### SOAP -[SOAP](https://www.w3.org/TR/soap12-part0/) is a widely used technology for implementing web services. SOAP is -based on the XML protocol and supports a wide range of messaging -formats, such as binary and MIME types, making it quite flexible. +[SOAP](https://www.w3.org/TR/soap12-part0/) is a widely used technology +for implementing web services. SOAP is an XML-based protocol and +supports a wide range of messaging formats, such as binary and MIME +types, making it quite flexible. However, SOAP is fairly complex and heavyweight, which can make it more difficult to implement and maintain. SOAP requires a lot of additional @@ -42,13 +45,13 @@ communication logs. ### GraphQL -[GraphQL](https://graphql.org) is an increasingly popular technology for building web APIs. -GraphQL allows for very fine-grained control over what data is requested -and given, so it offers the most flexibility. However, GraphQL can be -complex to set up and maintain, especially for simple use cases. 
GraphQL -can be integrated with REST API, which is a useful feature especially -when handling large and/or complex requests. So if needed, we could -incorporate GraphQL into existing REST API. +[GraphQL](https://graphql.org) is an increasingly popular technology for +building web APIs. GraphQL allows for very fine-grained control over +what data is requested and given, so it offers the most flexibility. +However, GraphQL can be complex to set up and maintain, especially for +simple use cases. GraphQL can be integrated with a REST API, which is a +useful feature, especially when handling large and/or complex requests. +So, if needed, we could incorporate GraphQL into our existing REST API. ## Reasons for the decision @@ -68,5 +71,7 @@ These are the reasons we are deciding on REST: - Has better tooling support and is more widely adopted. ## Conclusion + We believe that the simplicity, flexibility, and scalability of REST -make it a better choice for the client-server communication needs of Seedcase. \ No newline at end of file +make it a better choice for the client-server communication needs of the +Seedcase framework. diff --git a/why-conventional-commits.qmd b/why-conventional-commits.qmd index 53cbd1c..eca0093 100644 --- a/why-conventional-commits.qmd +++ b/why-conventional-commits.qmd @@ -1,6 +1,6 @@ --- title: "Why conventional commits (with optional emojis)" -description: "Our reasons for using conventional commits (with optional emojis following the Gitmoji convention) across Seedcase projects." +description: "Our reasons for using conventional commits (with optional emojis following the Gitmoji convention) across Seedcase repositories." date: "2024-05-14" categories: - git @@ -41,7 +41,7 @@ Since commit message conventions are not enforced by Git itself, but rather by the team working on a given project, the question becomes: > Which commit message convention should we follow when we write commit -> messages in Seedcase projects? +> messages in Seedcase codebases? 
## Decision drivers @@ -50,7 +50,7 @@ List some reasons for why we need to make this decision and what things have arisen that impact work. ::: -In Seedcase projects, we emphasise the open-source philosophies of +In the Seedcase Project, we emphasise the open-source philosophies of transparency and collaboration. Therefore, it is essential to have a clear and consistent commit message convention to maintain these principles, both within the team and across external contributions. diff --git a/why-docker.qmd b/why-docker.qmd index 99c0b17..d3f2dd5 100644 --- a/why-docker.qmd +++ b/why-docker.qmd @@ -22,7 +22,8 @@ are different platforms on the market, each with unique approaches and use cases. Even though all have a similar concept of images and containers, there are some technical differences worth noting. -As to why we have chosen containerization, see the page [Why choose containerization technology](why-containers.qmd). +As to why we have chosen containerization, see the page [Why choose +containerization technology](why-containers.qmd). ## Comparison of technologies @@ -31,9 +32,12 @@ As to why we have chosen containerization, see the page [Why choose containeriza Docker is without a doubt the most popular container application/platform. According to Stack Overflow's [2020 Developer Survey](https://insights.stackoverflow.com/survey/2020), which included -almost 65,000 respondents, Docker was the [third most popular](https://insights.stackoverflow.com/survey/2020#technology-platforms) platform -among developers, trailing only Linux and Windows. In this survey, Docker -was the [most wanted and the second most loved](https://insights.stackoverflow.com/survey/2020#technology-most-loved-dreaded-and-wanted-platforms) platform. +almost 65,000 respondents, Docker was the [third most +popular](https://insights.stackoverflow.com/survey/2020#technology-platforms) +platform among developers, trailing only Linux and Windows. 
In this +survey, Docker was the [most wanted and the second most +loved](https://insights.stackoverflow.com/survey/2020#technology-most-loved-dreaded-and-wanted-platforms) +platform. Docker is incredible in many ways. It is a developer-friendly open source platform that can be used for rapid application development and @@ -53,10 +57,11 @@ from the outside environment. ### Podman -[Podman](https://podman.io) is an open-source, Linux-native tool designed to develop, manage, -and run containers and pods under the Open Container Initiative (OCI) -standards. Presented as a user-friendly container orchestrator developed -by [Red Hat](https://www.redhat.com/en). +[Podman](https://podman.io) is an open-source, Linux-native tool +designed to develop, manage, and run containers and pods under the Open +Container Initiative (OCI) standards. It is presented as a +user-friendly container orchestrator and is developed by [Red +Hat](https://www.redhat.com/en). Podman is a daemonless container engine that enables users to create, manage, and run OCI Containers on the Linux system. Podman, like Docker, @@ -72,11 +77,11 @@ containers is viewed by some as improving system security. ### rkt -[rkt](https://github.com/rkt/rkt), like other container technologies, lets you separate your software -from its surroundings. However, rkt provides adjustable isolation, -allowing you to choose the appropriate amount of isolation utilising -rkt's pluggable runtime architecture, which is divided into different -phases. +[rkt](https://github.com/rkt/rkt), like other container technologies, +lets you separate your software from its surroundings. However, rkt +provides adjustable isolation, allowing you to choose the appropriate +amount of isolation utilising rkt's pluggable runtime architecture, +which is divided into different phases. 
rkt also includes security measures like a signature verification by default and even privilege separation, which is in charge of retrieving @@ -86,7 +91,9 @@ unforeseen vulnerabilities. The rkt is not a complete platform, end-to-end solution. It is instead utilised in conjunction with other technologies or in substitute of -particular Docker system components. Furthermore, the [rkt project](https://github.com/rkt/rkt) has been archived on GitHub as of February, 2020. +particular Docker system components. Furthermore, the [rkt +project](https://github.com/rkt/rkt) has been archived on GitHub as of +February 2020. ### Hyper-V @@ -107,13 +114,13 @@ as the host. ## Conclusion After researching the container platform on the market, Docker still is -the best option for Seedcase. First of all, Docker has the world's -largest repository of container images that allow Docker users to -create, test, store and distribute containers. Secondly, Docker is a +the best option for Seedcase software. First of all, Docker has the +world's largest repository of container images that allows Docker users +to create, test, store and distribute containers. Secondly, Docker is a single, robust, and autonomous tool. Docker manages, runs, builds, and does all other container-related tasks independently of any other third-party tools. Lastly, Docker has a lengthy history of working with -well-known cloud platforms like Amazon Web Services (AWS) and Google Cloud -Platform (GCP). It is also compatible with Microsoft, Azure, and OpenStack. -Overall, regardless of how great the alternatives are, Docker is the most -suitable container platform for this project. +well-known cloud platforms like Amazon Web Services (AWS) and Google +Cloud Platform (GCP). It is also compatible with Microsoft Azure and +OpenStack. Overall, regardless of how great the alternatives are, Docker +is the most suitable container platform for this project. 
diff --git a/why-fly.qmd b/why-fly.qmd index 641009c..1de69ce 100644 --- a/why-fly.qmd +++ b/why-fly.qmd @@ -13,69 +13,124 @@ categories: ## Context and problem statement -We need some way to test and visually see how Seedcase deploys and works "live". The best solution would be to use a cloud provider, so we can easily deploy and visualize the progress of Seedcase. +We need some way to test and visually see how Seedcase software deploys +and works "live". The best solution would be to use a cloud provider, so +we can easily deploy and visualize the progress of Seedcase products. -This is relevant for reviewing code changes and for manual and user-experience testing. Ideally, this environment should be similar to a future production environment, but this may vary a lot based on the team/organisation that will use Seedcase. +This is relevant for reviewing code changes and for manual and +user-experience testing. Ideally, this environment should be similar to +a future production environment, but this may vary a lot based on the +team/organisation that will use our software. So our question is: -> Which cloud hosting provider should we use for our demonstrating and testing purposes? +> Which cloud hosting provider should we use for demonstration and +> testing purposes? ## Decision drivers We want to decide on a cloud provider based on these features/metrics: -- Price: Relatively inexpensive in costs, since it will only be for demonstration purposes. -- Easy of use: It shouldn't be too difficult to use. -- Customisable: The provider should be easy to customise to use different frameworks/tools, if we need to adjust. For instance we should be able to use either of: venv, poetry, docker and etc. -- GitHub Authentication: The cloud provider should integrate with GitHub Authentication, because we already use GitHub for the Seedcase repositories and project management. 
-- GitHub Actions Integration: The cloud provider should integrate with GitHub Actions, allowing us to create pipelines easily. -- Managed PostgreSQL: It should have a managed PostgreSQL database to store application data if needed. -- "Native" support for Django application: It should have support for running a Django application without the need to wrap it in package or Docker container. -- Docker image or Dockerfile support: The cloud provider should also have support for running Docker images. Potentially, providing a Dockerfile would be convenient. -- Docker-compose support: A cloud provider with docker-compose support would allow us to easily spin up complicated environment (application, database and etc) in a similar way as we do locally. -- Logging and alerting: Logging and alerts would be nice features for the cloud provider, but not so relevant (and, therefore, not required) for a demo environment. -- Similar to production environment: We want the demo environment to be similar to the production environment, but this may vary a lot based on the team/organisation using the software. +- Price: Relatively inexpensive, since it will only be used for + demonstration purposes. +- Ease of use: It shouldn't be too difficult to use. +- Customisable: The provider should be easy to customise to use + different frameworks/tools, if we need to adjust. For instance, we + should be able to use any of venv, Poetry, or Docker. +- GitHub Authentication: The cloud provider should integrate with + GitHub Authentication, because we already use GitHub for the + Seedcase repositories and project management. +- GitHub Actions Integration: The cloud provider should integrate with + GitHub Actions, allowing us to create pipelines easily. +- Managed PostgreSQL: It should have a managed PostgreSQL database to + store application data if needed. 
+- "Native" support for Django applications: It should support + running a Django application without the need to wrap it in a package + or Docker container. +- Docker image or Dockerfile support: The cloud provider should also + have support for running Docker images. Potentially, providing a + Dockerfile would be convenient. +- Docker-compose support: A cloud provider with docker-compose support + would allow us to easily spin up complex environments + (application, database, etc.) in a similar way to how we do locally. +- Logging and alerting: Logging and alerts would be nice features for + the cloud provider, but not so relevant (and, therefore, not + required) for a demo environment. +- Similar to production environment: We want the demo environment to + be similar to the production environment, but this may vary a lot + based on the team/organisation using the software. ## Considered options -We have considered five different cloud providers: [Azure](https://azure.microsoft.com/en-us), [Render](https://render.com/), [Digital Ocean](https://www.digitalocean.com/), [Vercel](https://vercel.com/) and [Fly.io](https://fly.io/) . Below you see a "decision matrix" based on the different features. - -| | Azure | Render | Digital Ocean | Vercel | Fly.io | -|--------------------------------------------------------------|-----------------|--------------------------------------|--------------------------|------------------------|-----------------------------| -| Price (per month) | 1000kr | Free or 19$ user + 20$ DB | ? 
| Free or 20$ user | Cost for usage only | -| Ease of use | Hard | Easy | Medium | Easy | Easy / Medium | -| Customizability (venv, Poetry, Docker) | High | Medium | High | Low | Medium | -| GitHub Authentication | No | Yes | Yes | Yes | Yes | -| GitHub Actions Integration | Very good | Sort of (deploy webhook) | Yes | Yes | Very good | -| Managed Postgres | Yes | Yes | Yes | Yes | Yes (sort of) | -| File Storage service | Yes | Yes (FTP) 0.25\$/GB | | | Yes (\$0.15 GB) | -| Native support for Django app | Sort of | Yes (Poetry support) | ? | Yes (requirements.txt) | No | -| Docker Image or Dockerfile support | Yes | Yes | Yes | No | Yes | -| Docker-compose | Yes | No (but has render.yml) | Yes (with a normal node) | No | No | -| Logging | Very good logs | Simple | Simple | Simple | Yes | -| Alerts | Yes | No | ? | No | ? | -| Similar to production environment (Aarhus University server) | Likely possible | Maybe if Docker on university server | Likely | No | Yes if docker | +We have considered five different cloud providers: +[Azure](https://azure.microsoft.com/en-us), +[Render](https://render.com/), [Digital +Ocean](https://www.digitalocean.com/), [Vercel](https://vercel.com/) and +[Fly.io](https://fly.io/). Below is a "decision matrix" based on +the different features. + +| | Azure | Render | Digital Ocean | Vercel | Fly.io | +|----------------|------------|------------|------------|------------|------------| +| Price (per month) | 1000kr | Free or 19\$ user + 20\$ DB | ? 
| Free or 20\$ user | Cost for usage only | +| Ease of use | Hard | Easy | Medium | Easy | Easy / Medium | +| Customizability (venv, Poetry, Docker) | High | Medium | High | Low | Medium | +| GitHub Authentication | No | Yes | Yes | Yes | Yes | +| GitHub Actions Integration | Very good | Sort of (deploy webhook) | Yes | Yes | Very good | +| Managed Postgres | Yes | Yes | Yes | Yes | Yes (sort of) | +| File Storage service | Yes | Yes (FTP) \$0.25/GB | | | Yes (\$0.15/GB) | +| Native support for Django app | Sort of | Yes (Poetry support) | ? | Yes (requirements.txt) | No | +| Docker Image or Dockerfile support | Yes | Yes | Yes | No | Yes | +| Docker-compose | Yes | No (but has render.yml) | Yes (with a normal node) | No | No | +| Logging | Very good logs | Simple | Simple | Simple | Yes | +| Alerts | Yes | No | ? | No | ? | +| Similar to production environment (Aarhus University server) | Likely possible | Maybe if Docker on university server | Likely | No | Yes, if Docker | ## Decision outcome -We have decided to utilise Fly.io for our demo and testing needs. Fly.io is very cheap and the cost is not based on the number of users - only the actual usage (around 1$). It integrates with GitHub and the deployment is easy to setup with a GitHub action. This part is less user-friendly than **Render** and **Vercel**, where the deployment happens automatically. However, we would eventually adjust this "magic deployment" anyway to do tests and etc. before deploying. +We have decided to utilise Fly.io for our demo and testing needs. Fly.io +is very cheap, and the cost is not based on the number of users but only +on the actual usage (around \$1 per month). It integrates with GitHub, and +deployment is easy to set up with a GitHub Action. This part is less +user-friendly than **Render** and **Vercel**, where the deployment +happens automatically. However, we would eventually adjust this "magic +deployment" anyway to run tests and other checks before deploying. 
-Fly.io uses Docker images, so there is no "native" support for a Django application, but `flyctl` automatically creates a Dockerfile which makes it easy anyway. +Fly.io uses Docker images, so there is no "native" support for a Django +application, but `flyctl` automatically creates a Dockerfile, which makes +deployment easy anyway. -Fly.io automatically and relatively quickly scales up an app when needed, which makes it very cheap. Furthermore, Fly.io owns all the servers, which makes it cheap compared to other providers which are usually relying on AWS or Azure. +Fly.io automatically and relatively quickly scales up an app when +needed, which keeps costs low. Furthermore, Fly.io owns all its +servers, which makes it cheap compared to other providers, which +usually rely on AWS or Azure. -Fly.io has a customisable metrics dashboard (every app has access to Prometheus and Grafana). +Fly.io has a customisable metrics dashboard (every app has access to +Prometheus and Grafana). The other cloud providers, where not chosen for these reasons: -- Render was our initial choice. It was cheap and easy to setup, but the cost is per user plus the usage (around 100$ per month vs 1$ for Fly). Render is very user friendly with magic deployments, but we would need to adjust these anyway for test and etc. -- Azure is able to do everything we require, but it is too complicated for a demo environment. -- Vercel is easy to use, but lacks customisability. For instance, it seemed not to work so well with Poetry and it is unable to work with Docker. -- Digital Ocean could be a fair choice, but I was unable to make it work - so in that sense it fails the "ease of use"-criteria. +- Render was our initial choice. It was cheap and easy to set up, but + the cost is per user plus usage (around \$100 per month vs \$1 + for Fly.io). Render is very user-friendly with magic deployments, but + we would need to adjust these anyway for testing and similar checks. 
+- Azure is able to do everything we require, but it is too complicated + for a demo environment. +- Vercel is easy to use, but lacks customisability. For instance, it + did not seem to work well with Poetry, and it is unable to work with + Docker. +- Digital Ocean could be a fair choice, but I was unable to make it + work, so in that sense it fails the "ease of use" criterion. ### Consequences -- PostgreSQL is a natural part of Fly.io, but it is not a **Managed PostgreSQL** like the other cloud providers. A PostgreSQL is "just an app" in Fly.io, so it might require a bit more maintenance. However, even in the unlikely scenario of complete data loss, there is no significant impact since Fly.io serves solely as a demonstration environment. -- Storage is cheap in Fly.io, but there are some limitations [Volume considerations](https://fly.io/docs/reference/volumes/#volume-considerations). But again this is only a demo environment -- Fly.io lacks docker-compose support, so we will not be able to reuse docker-compose files from our local development. +- PostgreSQL is a natural part of Fly.io, but it is not a **Managed + PostgreSQL** like those of the other cloud providers. A PostgreSQL database is "just + an app" in Fly.io, so it might require a bit more maintenance. + However, even in the unlikely scenario of complete data loss, there + is no significant impact since Fly.io serves solely as a + demonstration environment. +- Storage is cheap in Fly.io, but there are some limitations (see [Volume + considerations](https://fly.io/docs/reference/volumes/#volume-considerations)). + But again, this is only a demo environment. +- Fly.io lacks docker-compose support, so we will not be able to reuse + docker-compose files from our local development. 
diff --git a/why-github-flow.qmd b/why-github-flow.qmd index 9907847..a993723 100644 --- a/why-github-flow.qmd +++ b/why-github-flow.qmd @@ -85,9 +85,19 @@ State the context and some background on the issue, then write a statement in the form of a question for the problem. ::: -Since we develop software in a collaborative setting, we believe it's important to implement an explicitly stated Git branching strategy. This aligns with our [Guiding Principles](https://seedcase-project.org/principles). A clear and well-defined branching strategy enables consistency and efficiency, and, as a result, cleaner workflows across contributions with more time to focus on actual collaboration, problem-solving, and ensuring high quality work. - -There are several branching strategies available, each with its own set of advantages and disadvantages. Which strategy is the most suitable depends on the project, the team, and the organisation (as well as preferences). +Since we develop software in a collaborative setting, we believe it's +important to implement an explicitly stated Git branching strategy. This +aligns with our [Guiding +Principles](https://seedcase-project.org/principles). A clear and +well-defined branching strategy enables consistency and efficiency, and, +as a result, cleaner workflows across contributions with more time to +focus on actual collaboration, problem-solving, and ensuring +high-quality work. + +There are several branching strategies available, each with its own set +of advantages and disadvantages. Which strategy is the most suitable +depends on the project, the team, and the organisation (as well as +preferences). The question is, therefore: @@ -100,17 +110,26 @@ List some reasons for why we need to make this decision and what things have arisen that impact work. ::: -With collaborative software development, each developer might have their own way of doing things such as branching, committing, creating pull requests and issues, and reviewing. 
However, explicitly agreeing on how we do these things will ensure common workflows across developers to help along efficient collaboration. +With collaborative software development, each developer might have their +own way of doing things, such as branching, committing, creating pull +requests and issues, and reviewing. However, explicitly agreeing on how +we do these things will ensure common workflows across developers to +support efficient collaboration. -When a team follows the same workflows, the focus can be shifted from trying to understand *what* each other are currently working on (*how* the problem at hand is solved, and *why* this work is needed) to harnessing each other's expertise and prior experience and improving the quality of everyone's work. +When a team follows the same workflows, the focus can be shifted from +trying to understand *what* each of us is currently working on (*how* +the problem at hand is solved, and *why* this work is needed) to +harnessing each other's expertise and prior experience and improving the +quality of everyone's work. For the Seedcase Project, we want to employ a branching strategy that: -1) is simple, transparent, and beginner-friendly -2) enables consistency across contributions through clear guidelines for branching, committing, and reviewing -3) works well with parallel, asynchronous development -4) supports continuous delivery -5) works well for smaller teams +1) is simple, transparent, and beginner-friendly +2) enables consistency across contributions through clear guidelines + for branching, committing, and reviewing +3) works well with parallel, asynchronous development +4) supports continuous delivery +5) works well for smaller teams ## Considered options @@ -119,78 +138,134 @@ List and describe some of the options, as well as some of the pros and cons for each option. 
::: -In the following sections, we evaluate commonly used branching strategies to decide on which strategy fits the project and our needs the best. These strategies include: **Trunk-based development**, **Git flow**, and **GitHub flow**. +In the following sections, we evaluate commonly used branching +strategies to decide which strategy best fits the project and our +needs. These strategies include: **Trunk-based development**, **Git +flow**, and **GitHub flow**. -Note: To keep this decision post relatively short, the strategies and their differences are outlined in a rather simple way, which might result in the loss of some nuances. +Note: To keep this decision post relatively short, the strategies and +their differences are outlined in a rather simple way, which might +result in the loss of some nuances. ### Trunk-based development -In [trunk-based development](https://trunkbaseddevelopment.com), developers frequently integrate their code changes into a shared `main` branch, the **trunk**, instead of working on long-lived additional branches that will be merged into `main` less frequently [@tilburg]. This workflow focuses on making smaller, self-contained changes which helps reduce complexity, minimise conflicts, and enable faster review processes and integration [@sooni]. Naturally, this leads to a more continuous integration with frequent merges to the `main` branch. +In [trunk-based development](https://trunkbaseddevelopment.com), +developers frequently integrate their code changes into a shared `main` +branch, the **trunk**, instead of working on long-lived additional +branches that will be merged into `main` less frequently [@tilburg]. +This workflow focuses on making smaller, self-contained changes, which +helps reduce complexity, minimise conflicts, and enable faster review +processes and integration [@sooni]. Naturally, this leads to more +continuous integration with frequent merges to the `main` branch. 
-Some smaller teams might even avoid branching altogether and commit directly to the trunk/`main` branch. +Some smaller teams might even avoid branching altogether and commit +directly to the trunk/`main` branch. -::: {.columns} -::: {.column} -#### Benefits +::: columns +::: column +#### Benefits -- More continuous integration with frequent merges to the `main` branch -- Focuses on smaller, self-contained changes -- Minimises merge conflicts -- Allows for quick releases -- Works well for smaller teams +- More continuous integration with frequent merges to the `main` + branch +- Focuses on smaller, self-contained changes +- Minimises merge conflicts +- Allows for quick releases +- Works well for smaller teams ::: -::: {.column} + +::: column #### Drawbacks -- Frequent integration requires strong collaboration and communication skills, potentially with frequent sync-up meetings -- Works best with small, self-contained tasks to enable short-lived branches (or omission of additional branches all together) +- Frequent integration requires strong collaboration and communication + skills, potentially with frequent sync-up meetings +- Works best with small, self-contained tasks to enable short-lived + branches (or omitting additional branches altogether) ::: ::: ### Git flow -A contrast to trunk-based development is the [Git flow](https://nvie.com/posts/a-successful-git-branching-model/). Git flow is a comprehensive branching strategy with two central branches: **`main`** and **`develop`**. In this strategy, the **`main`** branch always reflect a production ready state of the codebase. In contrast, the **`develop`** branch contains the latest development changes for the next release. When the new developments are at a stable point and is ready to be released, all the changes from the `develop` branch will be merged into the `main` branch. As a result, whenever there is a new change to the `main` branch, this is a new release by definition. 
Each release version will be tagged [@thummala]. - -Besides the two central branches, supporting branches will be created to enable parallel development across contributors. These supporting branches are created for specific purposes, such as adding or modifying features (a *feature* branch) or fixing a critical issue in the code (a *hotfix* branch). -A feature branch must always be created from and merged into the `develop` branch, while a `hotfix` branch is usually created from the `main` branch and is be merged into both `main` and `develop`. - -::: {.columns} -::: {.column} +A contrast to trunk-based development is the [Git +flow](https://nvie.com/posts/a-successful-git-branching-model/). Git +flow is a comprehensive branching strategy with two central branches: +**`main`** and **`develop`**. In this strategy, the **`main`** branch +always reflects a production-ready state of the codebase. In contrast, +the **`develop`** branch contains the latest development changes for the +next release. When the new developments are at a stable point and are +ready to be released, all the changes from the `develop` branch will be +merged into the `main` branch. As a result, whenever there is a new +change to the `main` branch, this is a new release by definition. Each +release version will be tagged [@thummala]. + +Besides the two central branches, supporting branches will be created to +enable parallel development across contributors. These supporting +branches are created for specific purposes, such as adding or modifying +features (a *feature* branch) or fixing a critical issue in the code (a +*hotfix* branch). A feature branch must always be created from and +merged into the `develop` branch, while a `hotfix` branch is usually +created from the `main` branch and is merged into both `main` and +`develop`. 
+ +::: columns +::: column #### Benefits -- Clear framework offering an explicit shared understanding of the branching and releasing processes -- Clear responsibilities for each branch -- Versioning per definition -- Production versions are easy to navigate through tags +- Clear framework offering an explicit shared understanding of the + branching and releasing processes +- Clear responsibilities for each branch +- Versioning by definition +- Production versions are easy to navigate through tags ::: -::: {.column} + +::: column #### Drawbacks -- Revolves around releases, and we, currently, need a more continuous delivery-like approach -- Complexity due to the number of branches, which could lead to merge conflicts and slow down the development process +- Revolves around releases, and we currently need a more continuous + delivery-like approach +- Complexity due to the number of branches, which could lead to merge + conflicts and slow down the development process ::: ::: ### GitHub flow -[GitHub flow](https://docs.github.com/en/get-started/quickstart/github-flow) is a simpler branching strategy than Git flow, revolving around the **`main`** branch. The only "hard" rule in this workflow is that anything on the `main` branch is deployable [@githubflowpost]. Whenever new work needs to be done, a new branch with a descriptive name is created from the `main` branch. Like with the Git flow, types of branches include (among others) `feature` and `hotfix` branches. After a new branch has been created, changes are made on this branch with regular pushes and descriptive commit messages. - -When the developer wants feedback, they create a pull request, which their collaborators review. Any suggested changes are addressed and implemented. When the work is complete, the branch can be merged into the `main` branch and is deleted. With this branching strategy, the work on the new branch is deployed as soon as it is merged into `main`. 
- - -::: {.columns} -::: {.column} +[GitHub +flow](https://docs.github.com/en/get-started/quickstart/github-flow) is +a simpler branching strategy than Git flow, revolving around the +**`main`** branch. The only "hard" rule in this workflow is that +anything on the `main` branch is deployable [@githubflowpost]. Whenever +new work needs to be done, a new branch with a descriptive name is +created from the `main` branch. As with Git flow, types of +branches include (among others) `feature` and `hotfix` branches. After a +new branch has been created, changes are made on this branch with +regular pushes and descriptive commit messages. + +When the developer wants feedback, they create a pull request, which +their collaborators review. Any suggested changes are addressed and +implemented. When the work is complete, the branch is merged into +the `main` branch and then deleted. With this branching strategy, the work +on the new branch is deployed as soon as it is merged into `main`. + +::: columns +::: column #### Benefits -- Allows for continuous development and the ability to quickly address issues of all kinds (including security issues, bugs, and small feature requests) -- The same simple processes are used to address smaller and larger developments -- Works well for smaller teams and asynchronous collaboration, common in open-source projects +- Allows for continuous development and the ability to quickly address + issues of all kinds (including security issues, bugs, and small + feature requests) +- The same simple processes are used to address smaller and larger + developments +- Works well for smaller teams and asynchronous collaboration, common + in open-source projects ::: -::: {.column} + +::: column #### Drawbacks -- Does
not by definition include releases +- Does not by definition include releases +- Might be more susceptible to bugs in production (compared to Git + flow) because of the lack of dedicated development branches +- Long-lived branches can increase the risk of merge conflicts ::: ::: @@ -203,13 +278,17 @@ REASONS." We decided on using the GitHib flow branching strategy because: -1) It is simple and beginner-friendly strategy -2) It is both well-known and well documented, creating clear guidelines that enable consistency across contributions -3) It offers clear guidelines on every step of collaborative development, including branching, committing, and review processes -4) Longer-living branches works well with parallel, asynchronous work -5) The balance between having multiple branches as well as using continuous integration and delivery approaches -6) Not overly complex for a smaller team like ours -7) Allows for continuous development with the same simple processes for both +1) It is a simple and beginner-friendly strategy +2) It is both well-known and well-documented, creating clear guidelines + that enable consistency across contributions +3) It offers clear guidelines on every step of collaborative + development, including branching, committing, and review processes +4) Longer-lived branches work well with parallel, asynchronous work +5) It balances having multiple branches with using + continuous integration and delivery approaches +6) Not overly complex for a smaller team like ours +7) Allows for continuous development with the same simple processes for + both smaller and larger developments ### Consequences @@ -217,6 +296,15 @@ We decided on using the GitHib flow branching strategy because: List some potential consequences of this decision. ::: -Even though the GitHub flow is the most suitable branching strategy for the Seedcase Project right now, this choice does come with consequences.
For example, working on longer-living branches (compared to trunk-based-development) could increase the risk of merge conflicts (which is easier to avoid using trunk-based-development). This strategy also comes without release tagging (as Git flow does), something we might want to implement for Seedcase Software Products later on. - -However, GitHub flow eases the process of parallel, asynchronous development and is ideal for smaller teams like ours. While we deploy continuously, a relatively simple workflow like GitHub flow is the best fit for us. +Even though the GitHub flow is the most suitable branching strategy for +the Seedcase Project right now, this choice does come with consequences. +For example, working on longer-lived branches (compared to +trunk-based development) could increase the risk of merge conflicts +(which are easier to avoid with trunk-based development). This strategy +also lacks the release tagging that Git flow has, something we +might want to implement for Seedcase software products later on. + +However, GitHub flow eases the process of parallel, asynchronous +development and is ideal for smaller teams like ours. Since we deploy +continuously, a relatively simple workflow like GitHub flow is the best +fit for us. diff --git a/why-material-design.qmd b/why-material-design.qmd index a940612..07ab9c9 100644 --- a/why-material-design.qmd +++ b/why-material-design.qmd @@ -1,6 +1,6 @@ --- title: "Why Material Design" -description: "We aim to utilise existing CSS or UI frameworks instead of building from scratch so Seedcase will look good with minimal effort. Since Material Design lives up to these requirements, we have chosen this framework for Seedcase." +description: "We aim to utilise existing CSS or UI frameworks instead of building from scratch so Seedcase software products will look good with minimal effort. Since Material Design lives up to these requirements, we have chosen this framework for our software."
date: "2023-12-15" categories: - user interface @@ -11,16 +11,22 @@ categories: ## Context and problem statement -The Seedcase software products' frontend should look good with minimal effort. Therefore, we aim to utilise existing CSS or UI frameworks instead of building from scratch. +The front end of Seedcase software products should look good with +minimal effort. Therefore, we aim to utilise existing CSS or UI +frameworks instead of building from scratch. ## Decision drivers The UI framework should: -- Be aesthetically pleasing. -- Integrate well with Figma. The wireframes in Figma should be easy to reproduce in Django templates. -- Be easy to use with Django. Since we use Django and it is a Python framework, we don't want to rely on frameworks that require node/npm or too much JavaScript. Preferably, we want a CSS framework, where CSS classes can be directly added to HTML elements. -- Have a great documentation and community. +- Be aesthetically pleasing. +- Integrate well with Figma. The wireframes in Figma should be easy to + reproduce in Django templates. +- Be easy to use with Django. Since we use Django and it is a Python + framework, we don't want to rely on frameworks that require node/npm + or too much JavaScript. Preferably, we want a CSS framework where + CSS classes can be added directly to HTML elements. +- Have great documentation and an active community. ## Considered options @@ -28,60 +34,72 @@ We have considered the following: ### Bootstrap -[Bootstrap](https://getbootstrap.com/) is one of the older and more widely used CSS frameworks. +[Bootstrap](https://getbootstrap.com/) is one of the older and more +widely used CSS frameworks.
-::: {.columns} -::: {.column} +::: columns +::: column #### Benefits -- Widely used, lots of support and community -- Great integration with Django through the Python package [`django-bootstrap5`](https://pypi.org/project/django-bootstrap5/) +- Widely used, lots of support and community +- Great integration with Django through the Python package + [`django-bootstrap5`](https://pypi.org/project/django-bootstrap5/) ::: -::: {.column} +::: column #### Drawbacks -- Harder to customize individual components (for example, a button) +- Harder to customize individual components (for example, a button) ::: ::: ### Material Design -[Material Design](https://m3.material.io/) is a popular framework designed by Google. +[Material Design](https://m3.material.io/) is a popular framework +designed by Google. -::: {.columns} -::: {.column} +::: columns +::: column #### Benefits -- Has multiple implementations, like [Materialize Web](https://materializeweb.com/) and [BeerCSS](https://www.beercss.com/) that works well with Django -- Prioritizes CSS over JavaScript -- Very customizable -- Looks very aesthetically pleasing (backed by the strong design community within Google) -- [Figma](https://www.figma.com/) (which can be used for sketching out the UI) has great support for Material Design +- Has multiple implementations, like [Materialize + Web](https://materializeweb.com/) and + [BeerCSS](https://www.beercss.com/), that work well with Django +- Prioritizes CSS over JavaScript +- Very customizable +- Looks very aesthetically pleasing (backed by the strong design + community within Google) +- [Figma](https://www.figma.com/) (which can be used for sketching out + the UI) has great support for Material Design ::: -::: {.column} + +::: column #### Drawbacks -- Has multiple implementations, which might take time to decide on +- Has multiple implementations, so choosing between them might take time ::: ::: ### Tailwind -[Tailwind](https://tailwindcss.com/) is a popular framework that allows a
high level of customisability. +[Tailwind](https://tailwindcss.com/) is a popular framework that allows +a high level of customisability. -::: {.columns} -::: {.column} +::: columns +::: column #### Benefits -- CSS only -- Very highly customisable +- CSS only +- Very highly customisable ::: -::: {.column} + +::: column #### Drawbacks -- Requires time and skill to customize -- Relies on [Node.js](https://nodejs.org/en) (via [npm](https://www.npmjs.com/)) which is an additional dependency and needs time to learn to use +- Requires time and skill to customize +- Relies on [Node.js](https://nodejs.org/en) (via + [npm](https://www.npmjs.com/)), which is an additional dependency that + takes time to learn ::: ::: @@ -89,23 +107,27 @@ We have considered the following: [Bulma](https://bulma.io/) is a simple CSS or Sass only framework. -::: {.columns} -::: {.column} +::: columns +::: column #### Benefits -- Seems relatively easy to use +- Seems relatively easy to use ::: -::: {.column} + +::: column #### Drawbacks -- Requires Node.js (via npm) to install +- Requires Node.js (via npm) to install ::: ::: ## Decision outcome -We decided on Material Design because it has great integration with existing UI/UX (like Figma), looks amazing (has the design team at Google backing it's development), is widely used, and customizable. +We decided on Material Design because it has great integration with +existing UI/UX tools (like Figma), looks amazing (it has the design team at +Google backing its development), is widely used, and is customizable.
### Consequences -- There is better Django integration with Bootstrap, so we might need to spend some time properly integrating Material Design +- There is better Django integration with Bootstrap, so we might need + to spend some time properly integrating Material Design diff --git a/why-mit-license.qmd b/why-mit-license.qmd index 7f18f13..eb73e8a 100644 --- a/why-mit-license.qmd +++ b/why-mit-license.qmd @@ -9,53 +9,137 @@ categories: - copyright - software-architecture --- + ## Context and problem statement -When developing a new piece of software it is important to consider the following question as early as possible. +When developing a new piece of software, it is important to consider the +following question as early as possible. -> What kind of re-use will we ultimately allow other developers to make of our software? +> What kind of re-use will we ultimately allow other developers to make +> of our software? -In order to control this use, it is important to have considered which license type we want our software to be available under, as this will tell the rest of the development community what they can (and can't) do with it. +To control this use, it is important to decide which +license type we want our software to be available under, as this will +tell the rest of the development community what they can (and can't) do +with it. ## Decision drivers -Although no license means that a piece of software on GitHub will be under exclusive copyright, leaving our code without one will make it difficult for other users to (easily) (re-)use, modify, contribute, or enhance our software due to copyright reasons. Our mission isn't only to build a piece of software; we also aim to develop a creative community around Seedcase and the topics we work on. We want people to be able to freely and smoothly contribute enhancements and other improvements that can be incorporated into future releases of Seedcase.
It is also our intention to make the Seedcase software available to commercial enterprises, which means that we need to carefully consider which license we adopt. +Since having no license means that a piece of software on GitHub is +under exclusive copyright, leaving our code without one will make it +difficult for other users to (easily) (re-)use, modify, contribute to, or +enhance our software due to copyright reasons. Our mission isn't only to +build a piece of software; we also aim to develop a creative community +around the Seedcase Project and the topics we work on. We want people to +be able to freely and smoothly contribute enhancements and other +improvements that can be incorporated into future releases of Seedcase +software products. It is also our intention to make the Seedcase +software available to commercial enterprises, which means that we need +to carefully consider which license we adopt. ## Considered options -The Open Source Initiative (OSI) [approves](https://opensource.org/licenses) a specific set of licenses that determine whether a project can be called "open source". These licenses are our starting point on deciding which license to use and how permissive we want it to be, in terms of who can work on our code and how it can be used. +The Open Source Initiative (OSI) +[approves](https://opensource.org/licenses) a specific set of licenses +that determine whether a project can be called "open source". These +licenses are our starting point for deciding which license to use and how +permissive we want it to be, in terms of who can work on our code and +how it can be used. -The primary license is the one that governs how our code can be used, modified, and shared, which is described more below. For managing copyright of contributed code from users who are external to the project, agreements could include either a Contributor License Agreement (CLA), a Developer Certificate of Origin (DCO), or neither of those, which we discussion below as well.
+The primary license is the one that governs how our code can be used, +modified, and shared, as described below. For managing +copyright of contributed code from users who are external to the +project, agreements could include either a Contributor License Agreement +(CLA), a Developer Certificate of Origin (DCO), or neither of those, +which we discuss below as well. ### Software licenses -Overall, the open source community have licenses that work along two strands, **permissive** and **copy-left**. Both of these "allow software to be freely used, modified, and shared" (see more detail on the [OSI](https://opensource.org/licenses) website). We will be using the OSI's definition of free software that among other things covers the source code being available to download and read/study, to allow for derived works to be created and distributed without violating the given license. - -The **copy-Left licenses** generally state that if a third party makes changes to the existing product, or incorporates the code alongside another code set, then the resulting software must also be available under the same license. This is the broadest definition of free software, as it forces any subsequent development to be shared for free (although most of those licenses state that you are allowed to modify code for own use, the copy-left license only comes into play if you make your modifications available outside your organisation). Examples of Copy-Left licenses are [EUPL-1.2](https://joinup.ec.europa.eu/sites/default/files/custom-page/attachment/2020-03/EUPL-1.2%20EN.txt), [GPL](https://www.gnu.org/licenses/gpl-3.0.en.html), and [LGPL](https://www.gnu.org/licenses/lgpl-3.0.en.html). - -The **permissive licenses** gives you all the above mentioned rights, but they do not enforce that derived works or new works created by taking bits of source code from the original product are made available under the same license.
These type of licenses are generally seen as more friendly to commercial enterprises as they will allow companies to use bits of code in proprietary software without having to release the source code for free. Examples of Permissive licenses are [Apache](https://www.apache.org/licenses/LICENSE-2.0), BSD (no matter the number of clauses, e.g., [BSD 2-Clause](https://opensource.org/license/bsd-2-clause/) or [BSD 3-Clause](https://opensource.org/license/bsd-3-clause/)), and [MIT](https://opensource.org/license/mit/). +Overall, open source licenses fall into two broad +strands, **permissive** and **copy-left**. Both of these "allow software +to be freely used, modified, and shared" (see more detail on the +[OSI](https://opensource.org/licenses) website). We will be using the +OSI's definition of free software, which among other things requires the +source code to be available to download and read/study, and allows +derived works to be created and distributed without violating the given +license. + +The **copy-left licenses** generally state that if a third party makes +changes to the existing product, or incorporates the code alongside +another code set, then the resulting software must also be available +under the same license. This is the broadest definition of free +software, as it forces any subsequent development to be shared for free +(although most of those licenses state that you are allowed to modify +code for your own use, the copy-left license only comes into play if you make +your modifications available outside your organisation). Examples of +copy-left licenses are +[EUPL-1.2](https://joinup.ec.europa.eu/sites/default/files/custom-page/attachment/2020-03/EUPL-1.2%20EN.txt), +[GPL](https://www.gnu.org/licenses/gpl-3.0.en.html), and +[LGPL](https://www.gnu.org/licenses/lgpl-3.0.en.html).
+ +The **permissive licenses** give you all the above-mentioned rights, +but they do not enforce that derived works or new works created by +taking bits of source code from the original product are made available +under the same license. These types of licenses are generally seen as +more friendly to commercial enterprises as they allow companies to +use bits of code in proprietary software without having to release the +source code for free. Examples of permissive licenses are +[Apache](https://www.apache.org/licenses/LICENSE-2.0), BSD (no matter +the number of clauses, e.g., [BSD +2-Clause](https://opensource.org/license/bsd-2-clause/) or [BSD +3-Clause](https://opensource.org/license/bsd-3-clause/)), and +[MIT](https://opensource.org/license/mit/). ### CLA versus DCO -Some open source projects are asking contributors to sign up to a Contributor License Agreement (CLA) or, alternatively, a Developer Certificate of Origin (DCO). - -Looking at a number of **CLAs** (in particular [Threema](https://threema.ch/en/open-source/cla), [Meta](https://code.facebook.com/cla), and [ImageWorks](https://www.imageworks.com/technology/opensource/cla)), it seems that these projects are mainly concerned with the following: - -- A person submitting code that they are not the copyright holder for. -- Withdrawal of the right to use the submitted code. -- The need to apply for patents in future for parts or all of the source code. -- A change in the type of license that the source code was given at the time of contribution. - -There are also some agreements that touch on the subject of loss and damages that may arise from the use of a particular section of code, as well as how a request for support will be dealt with in future. - -An alternative to a CLA is a **DCO**.
The **DCO** was first employed by the Linux Foundation in 2004 and is basically a short document that confirms that the person contributing code is allowed to do so, gives permission for the project to subsequently use it, by adding a Signed-off-by line to their commit message (for an example see the [BeeWare projects DCO](https://beeware.org/contributing/how/dco/what/)). +Some open source projects ask contributors to sign a +Contributor License Agreement (CLA) or, alternatively, a Developer +Certificate of Origin (DCO). + +Looking at a number of **CLAs** (in particular +[Threema](https://threema.ch/en/open-source/cla), +[Meta](https://code.facebook.com/cla), and +[ImageWorks](https://www.imageworks.com/technology/opensource/cla)), it +seems that these projects are mainly concerned with the following: + +- A person submitting code that they are not the copyright holder for. +- Withdrawal of the right to use the submitted code. +- The need to apply for patents in future for parts or all of the + source code. +- A change in the type of license that the source code was given at + the time of contribution. + +There are also some agreements that touch on the subject of loss and +damages that may arise from the use of a particular section of code, as +well as how a request for support will be dealt with in future. + +An alternative to a CLA is a **DCO**. The **DCO** was first employed by +the Linux Foundation in 2004. It is a short statement that contributors +agree to by adding a Signed-off-by line to their commit messages, +confirming that they are allowed to contribute the code and giving the +project permission to subsequently use it (for an example, see the +[BeeWare project's DCO](https://beeware.org/contributing/how/dco/what/)).
Aligning with our stated [Guiding Principles](https://seedcase-project.org/about.html#principles), we will use a **permissive** license as it will be the best fit for the Seedcase project. Of the permissive license types we will go with the MIT License as it is the most permissive and easiest to understand and use. +One of our stated [goals](https://seedcase-project.org/about) is that +Seedcase software is available for commercial enterprises as well as +academic/healthcare organizations and groups. Aligning with our stated +[Guiding +Principles](https://seedcase-project.org/about.html#principles), we will +use a **permissive** license as it is the best fit for the Seedcase +Project. Of the permissive license types, we will go with the MIT License, +as it is one of the most permissive and the easiest to understand and use. ### Consequences -Currently, we're not sure if we need to implement a CLA before people outside the team contribute to the code. However, we will likely implementing a **DCO** that future contributors will need to agree to before making a contribution to the project. This could be done either by checking that a commit contains the signed-off-by clause before merging it, or by implementing something like the [GitHub App DCO](https://github.com/apps/dco). The license text itself is available [here](https://developercertificate.org). +Currently, we're not sure if we need to implement a CLA before people +outside the team contribute to the code. However, we will likely +implement a **DCO** that future contributors will need to agree to +before making a contribution to the project. This could be done either +by checking that a commit contains the Signed-off-by line before +merging it, or by implementing something like the [GitHub App +DCO](https://github.com/apps/dco). The DCO text itself is available +[here](https://developercertificate.org).
diff --git a/why-polyrepo.qmd b/why-polyrepo.qmd index 45753f7..dea8af7 100644 --- a/why-polyrepo.qmd +++ b/why-polyrepo.qmd @@ -8,57 +8,122 @@ categories: - structure - management --- + + ## Context and problem statement -The core issue and question here is: +The core issue and question here is: -> How do we decide to structure and organize our projects, both software products as well as documentation and training material? +> How should we structure and organize our projects, including software +> products as well as documentation and training material? -We are ultimately building a final, single software product that can be installed on servers and used as is. However, some components of our software could be useful on their own. So we've started building another product as a Git repo. Which has lead us into this issue and question, since we don't want to start doing something major without considering why we are doing it and what the impact might be. And then coming to a conscious and agreed upon decision. +We initially set out to build a single final software product that can +be installed on servers and used as is. However, some components of our +software could be useful on their own, so we've started building another +product in its own Git repo. This has led us to this issue and question, +since we don't want to start doing something major without considering +why we are doing it and what the impact might be, and then coming to a +conscious and agreed-upon decision. -We also will be working on other, side projects for Steno Aarhus and other potential projects related, but not connected to the core Seedcase product. For example, we will be helping design and build CPR validation checks or volunteer databases. We will be creating multiple repositories/projects, so if we have a streamlined approach and workflow to creating and developing projects in this way, of having multiple repositories for different projects, it would be easier to manage.
+We will also be working on other side projects for Steno Aarhus and +other potential projects that are related, but not connected, to the core Seedcase +product. For example, we will be helping design and build CPR validation +checks or volunteer databases. We will be creating multiple +repositories/projects, so having a streamlined approach and workflow +for creating and developing multiple repositories for different +projects would make them easier to manage. -See some discussion of the beginning of this issue [here](https://github.com/seedcase-project/seedcase-registry/pull/9). +See some discussion of the beginning of this issue +[here](https://github.com/seedcase-project/seedcase-registry/pull/9). ## Decision drivers -- Tracking and managing projects: - - Looking over the list of issues in any given project right now can feel overwhelming because of the number of issues (the list will only grow). Trying to look through the list to find ones relevant to the component of the project you are working on can be a bit of an "analysis paralysis". - - Tracking issues in sub-projects/components within a single repo means making heavily use of custom labels, which can be a challenge from a management point of view. - - I'd like to eventually get to a place where we can dedicate a chunk of time to working on one specific project. My inspiration for how to do that comes from the tidyverse team at RStudio/Posit. They let issues build up in an R package/project over time before dedicating time to working through as many of those issues as possible. Once done, they switch to another project that has a lot of issues. -- Setting up continuous integration and deployment: We haven't yet set up CI/CD, but a lot of standard templates are based on a one repo is one package/app/product approach (for instance, CI's for testing and building Python packages or deploying Docker images to DockerHub). -- We're already splitting projects into other repos.
- We're building a separate `seedcase-registry` product, which is the data project registration component of `seedcase`. This product will be imported/loaded into `seedcase`. -Open source projects are usually a "one-repo is one-product/output" format (e.g. R or Python package). Contributors will likely be people who have experience working in open source communities and projects. -We are starting to and will be creating and building multiple products, both within the overall aim of Seedcase, but also sub-projects at Steno and potential (independent) extensions to Seedcase. -Providing common repository templates, build processes, and CI/CD for Steno Aarhus (and others in the future), so we'll need to build these anyway. -We need to consider the workflow for the team over the long term. Some decisions make more sense in the short term, but in the long term don't make sense. +- Tracking and managing projects: + - Looking over the list of issues in any given project right now + can feel overwhelming because of the number of issues (the list + will only grow). Trying to look through the list to find ones + relevant to the component of the project you are working on can + lead to "analysis paralysis". + - Tracking issues in sub-projects/components within a single repo + means making heavy use of custom labels, which can be a + challenge from a management point of view. + - I'd like to eventually get to a place where we can dedicate a + chunk of time to working on one specific project. My inspiration + for how to do that comes from the tidyverse team at + RStudio/Posit. They let issues build up in an R package/project + over time before dedicating time to working through as many of + those issues as possible. Once done, they switch to another + project that has a lot of issues.
+- Setting up continuous integration and deployment: We haven't yet set + up CI/CD, but a lot of standard templates assume a "one repo is one + package/app/product" approach (for instance, CIs for testing and + building Python packages or deploying Docker images to DockerHub). +- We're already splitting projects into other repos. + - We're building a separate `seedcase-registry` product, which is + the data project registration component of `seedcase`. This + product will be imported/loaded into `seedcase`. +- Open source projects are usually a "one-repo is one-product/output" + format (e.g. R or Python package). Contributors will likely be + people who have experience working in open source communities and + projects. +- We are already creating, and will continue to create, multiple + products, both within the overall aim of the Seedcase Project and + as sub-projects at Steno and potential (independent) extensions to + Seedcase products. +- We will be providing common repository templates, build processes, and CI/CD + for Steno Aarhus (and others in the future), so we'll need to build + these anyway. +- We need to consider the workflow for the team over the long term. + Some decisions make sense in the short term but not in the long + term. ## Considered options -There are multiple existing posts about this exact issue of mono- vs poly-repos, which I am listing below, that helped me write up this decision.
- -[Monorepo vs polyrepo](https://github.com/joelparkerhenderson/monorepo-vs-polyrepo) -[StackOverflow: Where keep deployment files in Multi Container, Multi Repository Project?](https://stackoverflow.com/questions/47502859/where-keep-deployment-files-in-multi-container-multi-repository-project) -[StackOverflow: Git repository setup for a Docker application consisting of multiple repositories](https://stackoverflow.com/questions/49918636/git-repository-setup-for-a-docker-application-consisting-of-multiple-repositorie) -[Monorepo vs Multi-repo](https://kinsta.com/blog/monorepo-vs-multi-repo/) -[CircleCI: Benefits and challenges of monorepo development practices](https://circleci.com/blog/monorepo-dev-practices/) -[Monorepos: Please don't!](https://medium.com/@mattklein123/monorepos-please-dont-e9a279be011b) -[Coupling in Microservices, Part 2: Single vs. Multi-Repo](https://medium.com/flippengineering/coupling-in-microservices-part-2-single-vs-multi-repo-35c5d5f3057b) -[Earthly: Monorepo vs polyrepo](https://earthly.dev/blog/monorepo-vs-polyrepo/) -[Mono-Repo vs Multi-Repo: Throwing Light On Code Repository Strategies](https://geekflare.com/code-repository-strategies/) -[Monorepo vs multirepo decision: Using monorepos for rapid iteration, polyrepos for sustained stability](https://github.com/joelparkerhenderson/architecture-decision-record/tree/main/examples/monorepo-vs-multirepo) -[Monorepos and the Fallacy of Scale](https://presumably.de/monorepos-and-the-fallacy-of-scale.html) -[Monorepo Vs Polyrepo Architecture: A Comparison For Effective Software Development](https://intuji.com/monorepo-vs-polyrepo-architecture/) +Several existing posts discuss this exact issue of mono- vs +poly-repos. The ones listed below helped me write up this +decision.
+ +- [Monorepo vs + polyrepo](https://github.com/joelparkerhenderson/monorepo-vs-polyrepo) +- [StackOverflow: Where keep deployment files in Multi Container, + Multi Repository + Project?](https://stackoverflow.com/questions/47502859/where-keep-deployment-files-in-multi-container-multi-repository-project) +- [StackOverflow: Git repository setup for a Docker application + consisting of multiple + repositories](https://stackoverflow.com/questions/49918636/git-repository-setup-for-a-docker-application-consisting-of-multiple-repositorie) +- [Monorepo vs + Multi-repo](https://kinsta.com/blog/monorepo-vs-multi-repo/) +- [CircleCI: Benefits and challenges of monorepo development + practices](https://circleci.com/blog/monorepo-dev-practices/) +- [Monorepos: Please + don't!](https://medium.com/@mattklein123/monorepos-please-dont-e9a279be011b) +- [Coupling in Microservices, Part 2: Single vs. + Multi-Repo](https://medium.com/flippengineering/coupling-in-microservices-part-2-single-vs-multi-repo-35c5d5f3057b) +- [Earthly: Monorepo vs + polyrepo](https://earthly.dev/blog/monorepo-vs-polyrepo/) +- [Mono-Repo vs Multi-Repo: Throwing Light On Code Repository + Strategies](https://geekflare.com/code-repository-strategies/) +- [Monorepo vs multirepo decision: Using monorepos for rapid + iteration, polyrepos for sustained + stability](https://github.com/joelparkerhenderson/architecture-decision-record/tree/main/examples/monorepo-vs-multirepo) +- [Monorepos and the Fallacy of + Scale](https://presumably.de/monorepos-and-the-fallacy-of-scale.html) +- [Monorepo Vs Polyrepo Architecture: A Comparison For Effective + Software + Development](https://intuji.com/monorepo-vs-polyrepo-architecture/) Almost all sources say, which you choose depends on your own situation. ### Mono-repo -A mono-repo is where all code for a product is kept and developed in one Git repository. Many large companies like Google and Facebook use a mono-repo approach to developing their software products. 
This approach looks a bit like this, where each project is a folder under the main repo: +A mono-repo is where all code for a product is kept and developed in one + Git repository. Many large companies like Google and Facebook use a + mono-repo approach to developing their software products. This approach + looks a bit like this, where each project is a folder under the main + repo: -``` +``` main/ ├── .git/ ├── project1/ @@ -78,38 +143,67 @@ main/ └── build ``` -**Pros:** +**Pros:** -- It works quite well when the core software service will inevitably be deployed and used as a single service, for instance with Google's Search Engine. -- It's a bit easier for smaller teams to use a mono-repo approach since it allows the team to move a bit faster in developing the product. -- Project management and issue tracking all happens in one location, so it's easier for a dedicated coordinator or manager to track the project. -- Components can be tightly coupled and easily updated within a mono-repo. -- Single codebase is enticing, since it theoretically makes it easier to get onboarded, manage, and track progress on a project. -- All code is in one location, so finding code might be easier. +- It works quite well when the core software service will inevitably + be deployed and used as a single service, for instance with Google's + Search Engine. +- It's a bit easier for smaller teams to use a mono-repo approach + since it allows the team to move a bit faster in developing the + product. +- Project management and issue tracking all happen in one location, + so it's easier for a dedicated coordinator or manager to track the + project. +- Components can be tightly coupled and easily updated within a + mono-repo. +- A single codebase is enticing, since it theoretically makes it easier + to onboard contributors, manage the project, and track its progress. +- All code is in one location, so finding code might be easier.
**Cons:** -- Paradoxically, it is harder for smaller teams to manage the complexity of a mono-repo because of the reasons below. -- Within the open source world, mono-repo's are not common, so contributors might not know how to navigate the repo. -- Tend to require custom built CI/CD tooling rather than make use the many open source templates available. -- Versioning of the software happens all at once, so a change in one component requires a version update, even though other components don't change. -- Project and issue management all happens in the same repo, so for a small team with many many tasks to manage, it gets overwhelming to focus on what needs to be done. -- If one component could be used as an independent product, it would have to be split out of the mono-repo, otherwise people would have to install the whole product just to use the small component they actually need or want. -- Deploying and testing may take longer because you have to deploy and test the *whole* repo, so if a small change was made, that would trigger long deployment and testing times. -- Effective management of a project may require more complex git branching processes. -- It's more difficult to manage ownership of code on a per-directory level. -- If a bug or conflict occurs, it breaks the whole product. -- As a codebase grows, the complexity involved in managing a mono-repo can increase substantially. - -### Poly-repo +- Paradoxically, it is harder for smaller teams to manage the + complexity of a mono-repo for the reasons below. +- Within the open source world, mono-repos are not common, so + contributors might not know how to navigate the repo. +- Mono-repos tend to require custom-built CI/CD tooling rather than making use of the + many open source templates available. +- Versioning of the software happens all at once, so a change in one + component requires a version update, even though other components + don't change.
+- Project and issue management all happen in the same repo, so for a + small team with many tasks to manage, it can be overwhelming to + focus on what needs to be done. +- If one component could be used as an independent product, it would + have to be split out of the mono-repo; otherwise people would have + to install the whole product just to use the small component they + actually need or want. +- Deploying and testing may take longer because you have to deploy and + test the *whole* repo, so even a small change would + trigger long deployment and testing times. +- Effective management of a project may require more complex Git + branching processes. +- It's more difficult to manage ownership of code on a per-directory + level. +- If a bug or conflict occurs, it can break the whole product. +- As a codebase grows, the complexity involved in managing a mono-repo + can increase substantially. + +### Poly-repo (also known as multi-repo) -While many large companies use a mono-repo approach, there are also large companies who use a poly-repo approach, like Amazon. Within the open source world, poly-repos are extremely common. For instance, the [tidyverse](https://github.com/orgs/tidyverse/repositories), [ROpenSci](https://ropensci.org/packages/), or [Gen3](https://github.com/orgs/uc-cdis/repositories) teams develop dozens of packages, and their teams are quite small. +While many large companies use a mono-repo approach, there are also + large companies that use a poly-repo approach, like Amazon. Within the + open source world, poly-repos are extremely common. For instance, the + [tidyverse](https://github.com/orgs/tidyverse/repositories), + [rOpenSci](https://ropensci.org/packages/), or + [Gen3](https://github.com/orgs/uc-cdis/repositories) teams develop + dozens of packages, and their teams are quite small.
This structure looks a bit like: -``` +``` project1/ ├── .git/ ├── tests/ @@ -130,35 +224,59 @@ project3/ └── build ``` -In many ways, the pros and cons are the reverse with the mono-repo, but there are also other considerations included here. +In many ways, the pros and cons are the reverse of those for the mono-repo, but + there are also some additional considerations. **Pros:** -- Testing and deployment per unit of change in code is faster. -- Versioning of components is easier since each component is its own repo. -- Project management on a repo-level is easier, since issues and progress is smaller and more focused. -- Standard open source templates for CI/CD and other basic repo and build files can be used. -- Onboarding for contributors is easier if they come from the open source community. -- Open source projects very often follow this approach, so its easier to draw inspiration and learn how other projects do things. -- Using version control, managing the commit history, and working with pull requests can be easier, since they will be specific to the project. +- Testing and deployment per unit of code change are faster. +- Versioning of components is easier since each component is its own + repo. +- Project management on a repo level is easier, since issues and + progress are smaller in scope and more focused. +- Standard open source templates for CI/CD and other basic repo and + build files can be used. +- Onboarding for contributors is easier if they come from the open + source community. +- Open source projects very often follow this approach, so it's easier + to draw inspiration and learn how other projects do things. +- Using version control, managing the commit history, and working with + pull requests can be easier, since they will be specific to the + project. **Cons:** -- Creating another project requires time and setup. -- Project management at a organization level is a bit more challenging, since there are now multiple repos to manage rather than one.
-- Onboarding of team members can be more tricky because there are now many repos to consider and keep mental track of then before. -- When something changes in one repo, managing its impact on other repos might be tricky, if strict de-coupling is not managed well and architectural designs aren't followed or developed soon enough. +- Creating another project requires time and setup. +- Project management at an organization level is a bit more + challenging, since there are now multiple repos to manage rather + than one. +- Onboarding of team members can be trickier because there are now + many more repos to consider and keep mental track of than before. +- When something changes in one repo, managing its impact on other + repos might be tricky if strict decoupling is not maintained and + architectural designs aren't followed or developed early enough. ### Hybrid -In general, whether you decide on following a mono-repo or poly-repo approach, there is *always* some level of mono- or poly- structure in the codebase. It is a bit of a spectrum and no project is truly at either end. However, explicitly deciding on a hybrid approach doesn't seem like there are any benefits. +In general, whether you decide to follow a mono-repo or poly-repo + approach, there is *always* some level of mono- or poly- structure in + the codebase. It is a spectrum, and no project is truly at + either end. However, there don't seem to be any benefits to + explicitly deciding on a hybrid approach. ## Decision outcome -Ultimately, we decided on using a poly-repo approach because we'll need to build workflows and processes for developing multiple repositories simultaneously anyway, in addition to the easier project-level management and ability for users to install individual components as well. It also works well for our ideal team workflow of working on individual projects in rotations.
+Ultimately, we decided to use a poly-repo approach because we'll need + to build workflows and processes for developing multiple repositories + simultaneously anyway, in addition to the easier project-level + management and the ability for users to install individual components + separately. It also works well for our ideal team workflow of working on + individual projects in rotations. ### Consequences -- Means we will need to develop templates for different project types (website, Python Package, R Package, Django App). -- We'll need to connect and synch common files across projects. -- We'll have to learn and apply team-based workflows around using this approach. +- We will need to develop templates for different project types + (website, Python package, R package, Django app). +- We'll need to connect and sync common files across projects. +- We'll have to learn and apply team-based workflows for using this + approach. diff --git a/why-postgres.qmd b/why-postgres.qmd index 3c62099..b601e89 100644 --- a/why-postgres.qmd +++ b/why-postgres.qmd @@ -16,112 +16,176 @@ categories: ## Context and problem statement -Building databases are best done through by using formal database systems. Most scientific research makes use of or uses databases that are relational rather than unstructured, and we believe that the user base for Seedcase will likely want or be familiar with a relational database structure. The type of database systems that are relational are called Structured Query Language ([SQL](https://en.wikipedia.org/wiki/SQL)). There are a large number of different SQL variants available, so we need to decide which one to use. - -We're only look at the top three open source relational databases (as defined by [DB-Engines](https://db-engines.com/en/ranking/relational+dbms) in November 2022): MySQL, PostgreSQL, and SQLite.
- -As we are planning to use [container technology](why-containers.qmd) to run the database it is not as important which operating systems the database will run on. Having said that, of the three systems that we are looking at, MySQL and PostgreSQL will run on multiple operating systems (e.g. Linux, Mac OS, and Windows), and SQLite is a classic serverless application. - -A [side-by-side comparison](https://db-engines.com/en/system/MySQL%3BPostgreSQL%3BSQLite) -on [DB-Engines](https://db-engines.com) was used to compile some of the comparison below. +Building databases is best done using formal database + systems. Most scientific research uses databases that + are relational rather than unstructured, and we believe that the user + base for Seedcase software will likely want or be familiar with a + relational database structure. Relational database systems are + queried and managed using Structured Query Language + ([SQL](https://en.wikipedia.org/wiki/SQL)). There is a large number of + different SQL database systems available, so we need to decide which one to use. + +We're only looking at the top three open source relational databases (as + defined by + [DB-Engines](https://db-engines.com/en/ranking/relational+dbms) in + November 2022): MySQL, PostgreSQL, and SQLite. + +As we are planning to use [container technology](why-containers.qmd) to + run the database, it is not as important which operating systems the + database will run on. Having said that, of the three systems that we are + looking at, MySQL and PostgreSQL will run on multiple operating systems + (e.g. Linux, macOS, and Windows), while SQLite is an embedded, serverless + database. + +A [side-by-side + comparison](https://db-engines.com/en/system/MySQL%3BPostgreSQL%3BSQLite) + on [DB-Engines](https://db-engines.com) was used to compile some of the + comparison below.
## Decision drivers -One of the most important functions of Seedcase is to handle data, and the most efficient and flexible ways of doing this is to store it in a database. We need a SQL database system that fits our needs best. +One of the most important functions of Seedcase software is to handle + data, and the most efficient and flexible way of doing this is to store + it in a database. We need to find the SQL database system that best fits + our needs. ## Considered options ### MySQL -[MySQL](www.mysql.com) was first released in 1995 and is maintained by Oracle Corp. It is an open source platform with the option to deploy either as a local server solution or cloud based. The implementation languages are C and C++, and it runs of a variety of operating systems. The system allows access through standard technologies (ADO.NET, JDBC, ODBC, and native APIs). +[MySQL](https://www.mysql.com) was first released in 1995 and is maintained by + Oracle Corp. It is an open source platform with the option to deploy + either as a local server solution or in the cloud. The implementation + languages are C and C++, and it runs on a variety of operating systems. + The system allows access through standard technologies (ADO.NET, JDBC, + ODBC, and native APIs). ::: columns -::: {.column} +::: column +#### Benefits -#### Benefits +- At present, the second most popular database overall and the most popular + open source one, with good support and a large community. -* At present the second most popular database both open source and overall with good support and a large community. +- Traditional database system with a recognisable format which should + be easy to manipulate and work with for the advanced Seedcase user. -* Traditional database system with a recognisable format which should be easy to manipulate and work with for the advanced Seedcase user. - -* Support for both XML and JSON formats, both reading and writing. - -* There are a number of ways for MySQL to interact with Apache Parquet files.
+- Support for both XML and JSON formats, both reading and writing. +- There are a number of ways for MySQL to interact with Apache Parquet + files. ::: -::: {.column} +::: column #### Drawbacks -* MySQL is run by Oracle which is a commercial entity. There is always a risk that the company decides to reverse the open source concept and go move to a solution with a free light version and full payable version. In the case of MySQL, it is very unlikely as the software is very well established and the user user community quite large. - -* There is currently no option in MySQL to store data in a columnar (rather than row-based) table. +- MySQL is run by Oracle, which is a commercial entity. There is always + a risk that the company decides to abandon the open source model + and move to a solution with a free light version and a paid full + version. In the case of MySQL, this is very unlikely, as the software + is very well established and the user community is quite large. +- There is currently no option in MySQL to store data in a columnar + (rather than row-based) table. ::: ::: ### PostgreSQL -[PostgreSQL](www.postgresql.org) was first released in 1989 from UC Berkeley and is maintained by the PostgreSQL Development Group. It is an open source platform with the option to deploy either as a local server solution or cloud based. The implementation language is C, and it runs of a variety of operating systems. The system allows access through standard technologies (ADO.NET, JDBC, ODBC, a native C library, and streaming APIs). +[PostgreSQL](https://www.postgresql.org) was first released in 1989 at UC + Berkeley and is maintained by the PostgreSQL Global Development Group. It is an + open source platform with the option to deploy either as a local server + solution or in the cloud. The implementation language is C, and it runs + on a variety of operating systems. The system allows access through + standard technologies (ADO.NET, JDBC, ODBC, a native C library, and + streaming APIs).
::: columns -::: {.column} +::: column +#### Benefits -#### Benefits +- At present, it is the fourth most popular database overall, and the + second most popular open source database. There is a thriving + community with a lot of engaged users providing support. -* At present, it is the fourth most popular database overall, and the second most popular open source database. There is a thriving community with a lot of engaging users delivering support. +- Traditional database system with a recognisable format which should + be easy to manipulate and work with for the advanced Seedcase user. -* Traditional database system with a recognisable format which should be easy to manipulate and work with for the advanced Seedcase user. +- Support for both XML and JSON formats, both reading and writing. -* Support for both XML and JSON formats, both reading and writing. - -* There are scripts that will allow for PostgreSQL to interact with Apache Parquet files. - -* It is possible to create columnar based tables directly in PostgreSQL. +- There are scripts that allow PostgreSQL to interact with + Apache Parquet files. +- It is possible to create columnar tables in PostgreSQL through + extensions. ::: -::: {.column} +::: column #### Drawbacks -* We don't see any major drawbacks. - +- We don't see any major drawbacks. ::: ::: ### SQLite -First released in 2000, SQLite is slightly different to the two systems described above, as it is an embedded serverless database primarily maintained by an international team of programmers (see [About SQLite](https://www.sqlite.org/about.html)). It is an open source platform with the option to deploy either locally or in the cloud. The implementation language is C, and it is platform independent. The system allows access through standard technologies (ADO.NET, JDBC, and ODBC).
+First released in 2000, SQLite is slightly different to the two systems +described above, as it is an embedded serverless database primarily +maintained by an international team of programmers (see [About +SQLite](https://www.sqlite.org/about.html)). It is an open source +platform with the option to deploy either locally or in the cloud. The +implementation language is C, and it is platform independent. The system +allows access through standard technologies (ADO.NET, JDBC, and ODBC). ::: columns -::: {.column} +::: column +#### Benefits -#### Benefits +- Support for both XML and JSON formats, both reading and writing. -* Support for both XML and JSON formats, both reading and writing. - -* Easy to set up and implement, works well with R and other languages. - -* There is always a risk that an open source community will break apart and leave a product unsupported, but the risk here looks minimal. The explicitly stated intention from the core developers of SQLite is to support the product until at least 2050. +- Easy to set up and implement, works well with R and other languages. +- There is always a risk that an open source community will break + apart and leave a product unsupported, but the risk here looks + minimal. The explicitly stated intention from the core developers of + SQLite is to support the product until at least 2050. ::: -::: {.column} +::: column #### Drawbacks -* SQLite is not fully ACID (atomicity, consistency, isolation, and durability) compliant, which is always a risk when working with larger data sets. - -* The database is designed primarily as a tool to sit underneath applications running in single user mode. This means that the database does not as a standard support multi-user work. +- SQLite is not fully ACID (atomicity, consistency, isolation, and + durability) compliant, which is always a risk when working with + larger data sets. 
-* As the database is serverless it is quite possible that the target audience for the Seedcase project will struggle to work with the database in the instances where local development is needed. +- The database is designed primarily as a tool to sit underneath + applications running in single-user mode. This means that the + database does not, by default, support multi-user work. +- As the database is serverless, it is quite possible that the target + audience for the Seedcase Project will struggle to work with the + database in cases where local development is needed. ::: ::: ## Decision outcome -We've decided to work with PostgreSQL as our backend database as it fulfils all our needs and is a very popular open source tool. MySQL would be the other obvious choice, the application does everything that Seedcase needs, but the user community for PostgreSQL seems to be a bit more active. SQLite is quite popular within the application developer community, but it doesn't have a reliable multi-user functionality, so it may be an uphill battle to get it to do the things we are hoping to do with Seedcase. +We've decided to work with PostgreSQL as our backend database, as it + fulfils all our needs and is a very popular open source tool. MySQL + would be the other obvious choice, as it does everything that + Seedcase software needs, but the user community for PostgreSQL seems to + be a bit more active. SQLite is quite popular within the application + developer community, but it doesn't have reliable multi-user + functionality, so it may be an uphill battle to get it to do the things + we are hoping to do with Seedcase products. ### Consequences -The main consequence of our choice is the limiting factor in who can work on the project. Anyone wanting to work on the database part will need to have an understanding of for instance [psql](https://www.postgresql.org/docs/current/app-psql.html), which is the command line tool to work with Postgres.
There are not many differences between Postgres and the other large database systems, but there is always some differences in the version of SQL they use, and which terms are used. +The main consequence of our choice is that it limits who can + work on the project. Anyone wanting to work on the database part will + need an understanding of tools such as + [psql](https://www.postgresql.org/docs/current/app-psql.html), which is + the command-line tool for working with Postgres. There are not many + differences between Postgres and the other large database systems, but + there are always some differences in the SQL dialect and + terminology they use. diff --git a/why-python.qmd b/why-python.qmd index f5065c6..b9b6c30 100644 --- a/why-python.qmd +++ b/why-python.qmd @@ -12,8 +12,8 @@ categories: --- ::: content-hidden -Use other decision posts as inspiration to writing these. -Leave the content-hidden sections in the text for future reference. +Use other decision posts as inspiration for writing these. Leave the + content-hidden sections in the text for future reference. ::: ## Context and problem statement @@ -23,11 +23,16 @@ State the context and some background on the issue, then write a statement in the form of a question for the problem. ::: -One of the first things to do when deciding to write a software application is to decide on the programming language. There are several languages that can be used, among them C++, Java, Python, and R. In the context of Seedcase it is important to chose a language that can handle large amounts -of data, provide efficient data processing capabilities, and integrate -well with other technologies commonly used in the research area. +One of the first things to do when deciding to write a software + application is to decide on the programming language. There are several + languages that can be used, among them C++, Java, Python, and R.
In the +context of the Seedcase Project it is important to choose a language +that can handle large amounts of data, provide efficient data processing +capabilities, and integrate well with other technologies commonly used +in the research area. -> Which programming language should we use for developing the Seedcase application? +> Which programming language should we use for developing Seedcase +> software? ## Decision drivers @@ -36,102 +41,146 @@ List some reasons for why we need to make this decision and what things have arisen that impact work. ::: -In the context of Seedcase it is important to chose a language that can handle large amounts of data, provide efficient data processing capabilities, and integrate well with other technologies commonly used in the research area. There is also a consideration with regards to the skills already available in the core team, as we would like to minimize the amount of time that we will need to use in order to be able to program the application. +In the context of the Seedcase Project it is important to chose a +language that can handle large amounts of data, provide efficient data +processing capabilities, and integrate well with other technologies +commonly used in the research area. There is also a consideration with +regards to the skills already available in the core team, as we would +like to minimize the amount of time that we will need to use in order to +be able to program the application. ## Considered options ::: content-hidden -List and describe some of the options, as well as some of the benefits and -drawbacks for each option. +List and describe some of the options, as well as some of the benefits +and drawbacks for each option. ::: ### C++ -::: {.columns} -::: {.column} +::: columns +::: column #### Benefits -- Well-suited for real-time and performance-critical applications. -- Allows fine-grained control over memory and hardware resources, enabling low-level, use-case-specific optimisations. 
-- Compilation to native machine code for the target platform prior to execution guarantees better runtime performance than Python and, potentially by a smaller margin, Java. -- Static type-checking at compile time helps to catch certain types of bugs early on. -- Mature libraries and frameworks available for web development and data analysis. -- Active community and extensive resources. +- Well-suited for real-time and performance-critical applications. +- Allows fine-grained control over memory and hardware resources, + enabling low-level, use-case-specific optimisations. +- Compilation to native machine code for the target platform prior to + execution generally gives better runtime performance than Python and, + potentially by a smaller margin, Java. +- Static type-checking at compile time helps to catch certain types of + bugs early on. +- Mature libraries and frameworks available for web development and + data analysis. +- Active community and extensive resources. ::: -::: {.column} + +::: column #### Drawbacks -- Compilation to native machine code makes C++ programs more platform dependent, broadly speaking, than programs in interpreted languages. -- Use of platform-specific language features or libraries can lead to portability issues. -- Lack of automatic garbage collection means that developers need more awareness of how memory is managed in the application. -- Offer of database management libraries is more limited than for Java or Python. -- Syntax is verbose and less close to natural language, making development less rapid. -- No inline documentation generation system out of the box. -- Less widespread in academic and research communities. -- Has a steep learning curve, arguably steeper than Java's. +- Compilation to native machine code makes C++ programs more platform + dependent, broadly speaking, than programs in interpreted languages. +- Use of platform-specific language features or libraries can lead to + portability issues.
+- Lack of automatic garbage collection means that developers need more + awareness of how memory is managed in the application. +- The range of database management libraries is more limited than for Java + or Python. +- Syntax is verbose and less close to natural language, making + development less rapid. +- No inline documentation generation system out of the box. +- Less widespread in academic and research communities. +- Has a steep learning curve, arguably steeper than Java's. ::: ::: ### Java -::: {.columns} -::: {.column} +::: columns +::: column #### Benefits -- Code is run on a Java Virtual Machine, making Java programs, generally, platform independent (provided the host has a Java Runtime Environment installed). -- Better runtime performance than Python. -- Static type-checking at compile time helps to catch certain types of bugs early on. -- Comes with inline documentation generation system out of the box (Javadoc). -- Large ecosystem of mature libraries and frameworks for web development and database management. -- Active community and extensive resources. +- Code is run on a Java Virtual Machine, making Java programs, + generally, platform independent (provided the host has a Java + Runtime Environment installed). +- Better runtime performance than Python. +- Static type-checking at compile time helps to catch certain types of + bugs early on. +- Comes with an inline documentation generation system out of the box + (Javadoc). +- Large ecosystem of mature libraries and frameworks for web + development and database management. +- Active community and extensive resources. ::: -::: {.column} + +::: column #### Drawbacks -- Fewer libraries for data analysis than Python. -- Syntax is verbose and less close to natural language, making development less rapid. -- Less widespread in academic and research communities. -- Has a steeper learning curve than Python, especially for people with little programming experience. +- Fewer libraries for data analysis than Python.
+- Syntax is verbose and less close to natural language, making + development less rapid. +- Less widespread in academic and research communities. +- Has a steeper learning curve than Python, especially for people with + little programming experience. ::: ::: ### Python -::: {.columns} -::: {.column} +::: columns +::: column #### Benefits -- Code is executed by a Python interpreter, making Python programs platform independent (provided the host has an interpreter installed). -- Concise, easy-to-read, beginner-friendly syntax. -- Easy to iterate and prototype rapidly. -- Large ecosystem of mature libraries and frameworks for web development (e.g. [Django](https://www.djangoproject.com), [Flask](https://flask.palletsprojects.com/en/2.3.x/)), database management (e.g. [SQLAlchemy](https://www.sqlalchemy.org), Django's Object Relational Mapper), and data analysis. -- Active community and extensive resources. -- Very popular in data-intensive research and academia, some target users of Seedcase are likely to be familiar with Python. -::: -::: {.column} +- Code is executed by a Python interpreter, making Python programs + platform independent (provided the host has an interpreter + installed). +- Concise, easy-to-read, beginner-friendly syntax. +- Easy to iterate and prototype rapidly. +- Large ecosystem of mature libraries and frameworks for web + development (e.g. [Django](https://www.djangoproject.com), + [Flask](https://flask.palletsprojects.com/en/2.3.x/)), database + management (e.g. [SQLAlchemy](https://www.sqlalchemy.org), Django's + Object Relational Mapper), and data analysis. +- Active community and extensive resources. +- Very popular in data-intensive research and academia; some target + users of Seedcase are likely to be familiar with Python.
+::: + +::: column #### Drawbacks -- Worse runtime performance than Java or C++, partly because the interpreter does less pre-execution optimisation and instead follows the layout of the source coded sequentially during program execution. -- Dynamic type-checking at runtime can make it more difficult to catch certain types of bugs early on. +- Worse runtime performance than Java or C++, partly because the + interpreter does less pre-execution optimisation and instead follows + the layout of the source code sequentially during program + execution. +- Dynamic type-checking at runtime can make it more difficult to catch + certain types of bugs early on. ::: ::: ### R -::: {.columns} -::: {.column} +::: columns +::: column #### Benefits -- Designed for statistical computing with excellent data visualisation capabilities. -- Wide array of packages for data analysis. -- Active community and extensive resources. -- Very popular in data-intensive research and academia, some target users of Seedcase are likely to be familiar with R. +- Designed for statistical computing with excellent data visualisation + capabilities. +- Wide array of packages for data analysis. +- Active community and extensive resources. +- Very popular in data-intensive research and academia; some target + users of Seedcase are likely to be familiar with R. ::: -::: {.column} + +::: column #### Drawbacks -- Less versatile and general purpose than the other languages considered. +- Less versatile and general purpose than the other languages + considered.
+- While there is a web application framework for R + ([Shiny](https://shiny.posit.co/)), this is intended primarily for + interactive data visualisation, and offers more limited features + than the frameworks for the other languages considered. ::: ::: @@ -142,12 +191,27 @@ What decision was made, use the form "We decided on CHOICE because of REASONS." ::: -We have decided to use Python as the main development language for Seedcase because it is a powerful, flexible and easy-to-use platform for building a data management web application. Drawing on the skills of the core Seedcase team, Python offers the best balance of capabilities and ease of development out of the options considered. It is, moreover, a language likely to be familiar to technologically-oriented Seedcase users and prospective contributors. - -While Java and C++ offer more scope for context-specific performance optimisation, we are unlikely to need this level of control for Seedcase, because the number of concurrent user requests is expected to be low, resource-intensive backend tasks are expected to run without much competition, and we don't expect to do heavy-duty, real-time data analysis or image processing. Therefore, for our use case, the advantages of these languages don't justify their steeper learning curve and added complexity. +We have decided to use Python as the main development language for +Seedcase software because it is a powerful, flexible and easy-to-use +platform for building a data management web application. Drawing on the +skills of the core Seedcase team, Python offers the best balance of +capabilities and ease of development out of the options considered. It +is, moreover, a language likely to be familiar to +technologically-oriented Seedcase users and prospective contributors. -We decided not to use R because, while it is a powerful language for data analysis and visualization, it is less suitable for building large-scale web applications.
+While Java and C++ offer more scope for context-specific performance +optimisation, we are unlikely to need this level of control for Seedcase +software, because the number of concurrent user requests is expected to +be low, resource-intensive backend tasks are expected to run without +much competition, and we don't expect to do heavy-duty, real-time data +analysis or image processing. Therefore, for our use case, the +advantages of these languages don't justify their steeper learning curve +and added complexity. + +We decided not to use R because, while it is a powerful language for +data analysis and visualisation, it is less suitable for building +large-scale web applications. ### Consequences @@ -155,4 +219,8 @@ We decided not to use R because, while it is a powerful language for data analys List some potential consequences of this decision. ::: -If further on we run into performance issues, we will need to look into performance improvement strategies, such as optimising our algorithms, our database queries and organisation, and how much data is loaded into program memory at once. However, these are considerations that inform everyday development decisions in any case. +If further on we run into performance issues, we will need to look into +performance improvement strategies, such as optimising our algorithms, +our database queries and organisation, and how much data is loaded into +program memory at once. However, these are considerations that inform +everyday development decisions in any case. diff --git a/why-quarto.qmd b/why-quarto.qmd index 12f3426..c39032d 100644 --- a/why-quarto.qmd +++ b/why-quarto.qmd @@ -22,10 +22,10 @@ statement in the form of a question for the problem. A software project like Seedcase needs an easy way to communicate and share knowledge for those internal and external to Seedcase (including -users and contributors), the most common way being to put these content (like -documentation) on a website.
In order to minimize the work the team -needs to do, we would like to use the files we maintain in GitHub as the -basis for a public website. The question then becomes: +users and contributors), the most common way being to put this content +(like documentation) on a website. In order to minimize the work the +team needs to do, we would like to use the files we maintain in GitHub +as the basis for a public website. The question then becomes: > How do we build a website based on the files in GitHub with minimal > amount of overhead and that can be integrated into GitHub in some way? @@ -41,8 +41,8 @@ There are many different types of "static website generators"[^1], like [Jekyll](https://jekyllrb.com/) or [Hugo](https://gohugo.io/). They all have their pros and cons, and we ultimately have to choose one to use when writing content for this website and to build it, as well as when -writing the documentation for Seedcase itself. We have the following -needs when it comes to this decision: +writing the documentation for Seedcase software itself. We have the +following needs when it comes to this decision: [^1]: A static website or blog generator is a framework for building websites based on pure, plain HTML files (unlike building websites diff --git a/why-standard-shortcuts.qmd b/why-standard-shortcuts.qmd index e63015c..d327d09 100644 --- a/why-standard-shortcuts.qmd +++ b/why-standard-shortcuts.qmd @@ -18,58 +18,108 @@ categories: ## Context and problem statement -The more documentation we create for Seedcase, the more important it becomes to ensure a smooth reader experience, both in terms of consistency between authors, and in terms of managing the keywords. +The more documentation we create for the Seedcase Project, the more +important it becomes to ensure a smooth reader experience, both in terms +of consistency between authors, and in terms of managing the keywords.
-> How do we ensure a uniform layout, an easier to creating documentation, and a usable set of keywords as the documentation for Seedcase grows? +> How do we ensure a uniform layout, an easier way of creating +> documentation, and a usable set of keywords as the documentation for +> the Seedcase Project grows? ## Decision drivers -As the documentation for Seedcase growing, and we have reached a level where it is impossible for one person to keep an overview and ensure a consistent layout of the pages. To help with this there are a few tools that can standardise how we write the documentation. [Quarto](why-quarto.qmd) can help with quite a few things (like consistent layout of links and keywords), but it could also be of benefit to us as a team, if it is easy to use the same layout for things like tables and hidden comments. - -We also need to find a way to ensure the consistent use of keywords, so that when a reader clicks a `tag` in a document they get all relevant pages, and don't miss any due to the fact that half are tagged in one way (eg. `database`) and the other half is tagged slightly differently (eg `databases`). +As the documentation for the Seedcase Project grows, we are likely to +reach a level where it is impossible for one person to keep an overview +and ensure a consistent layout of the pages. To help with this, there are +a few tools that can standardise how we write the documentation. +[Quarto](why-quarto.qmd) can help with quite a few things (like +consistent layout of links and keywords), but it could also be of +benefit to us as a team if it is easy to use the same layout for things +like tables and hidden comments. + +We also need to find a way to ensure the consistent use of keywords, so +that when a reader clicks a `tag` in a document they get all relevant +pages, and don't miss any because half of them are tagged in one +way (e.g. `database`) and the other half slightly differently
+(e.g. `databases`). ## Considered options -We have so far looked at two ways of streamlining the writing of documentation through the use of code snippets and shared keywords, which can be set using the same settings file. There aren't many "generic" methods to share code snippets across IDE's (e.g. between RStudio or PyCharm), so we only investigated ways of adding these in VS Code. +We have so far looked at two ways of streamlining the writing of +documentation through the use of code snippets and shared keywords, +which can be set using the same settings file. There aren't many +"generic" methods to share code snippets across IDEs (e.g. between +RStudio and PyCharm), so we only investigated ways of adding these in VS +Code. ### Code snippets in Quarto markdown files (`.qmd`) -It is possible to set up and share code snippets for `.qmd` files using VS Code and GitHub allowing a team to share for instance formatting code, something that can greatly benefit documentation in particular. These can be added as a file (`.code-snippets`) in the `.vscode` folder for VS Code to find it. Since this a file, we can share it with the team through Git. In the case of Quarto code snippets, the setting file name would be called `quarto.code-snippets`. +It is possible to set up and share code snippets for `.qmd` files using +VS Code and GitHub, allowing a team to share, for instance, formatting +code, something that can greatly benefit documentation in particular. +These can be added as a file (`.code-snippets`) in the `.vscode` folder +for VS Code to find. Since this is a file, we can share it with the team +through Git. In the case of Quarto code snippets, the settings file +would be called `quarto.code-snippets`. #### Layout of the code-snippet file -Code snippets in this file are written as a JSON structure. Because the file extension is `.code-snippets` the file itself will ignore a couple of JSON file requirements.
The main one being that there shouldn't be any comments (denoted by `//`) in the file itself, as well as some requirements to keep sections separated with commas. +Code snippets in this file are written as a JSON structure. Because the +file extension is `.code-snippets`, VS Code relaxes a couple of strict +JSON requirements. The main one is that comments (denoted by `//`) are +allowed in the file, which plain JSON forbids; some of the requirements +to keep sections separated with commas are also loosened. An example of the code used to insert a hidden comment is given below. ``` json - "Insert a hidden comment section":{ - "prefix": "hidden", - "body": [ - "::: content-hidden" - "${0:Write comments here}" - ":::" - ], - "description": "Insert a hidden content section" - } + "Insert a hidden comment section":{ + "prefix": "hidden", + "body": [ + "::: content-hidden", + "${0:Write comments here}", + ":::" + ], + "description": "Insert a hidden content section" + } ``` -As can be seen above there are some characters that are 'reserved' when writing the body. The one that will probably most frequently be an issue is the quotation mark `"`, if used in the body it should be prefixed with a backslash `\`, which here acts like an escape character. +As can be seen above, some characters are 'reserved' when writing the +body. The one that will probably cause issues most often is the +quotation mark `"`: if used in the body, it must be prefixed with a +backslash `\`, which acts as an escape character. ### Shared keywords for documentation files -We want to standardise keywords used in the `categories:` tag of posts to be consistent in both spelling and tenses.
It is important for usability of the website that the words used are the same, and spelled the same, so that if a person is looking for how Seedcase handles databases they get everything by clicking on the keyword "database", and aren't missing half the documents because they are tagged with the key word "databases". +We want to standardise keywords used in the `categories:` tag of posts +to be consistent in both spelling and grammatical form. It is important for +usability of the website that the words used are the same, and spelled +the same, so that if a person is looking for how Seedcase software +handles databases, they get everything by clicking on the keyword +"database", and aren't missing half the documents because they are +tagged with the keyword "databases". The following guidelines for selecting key words apply: -* **Form** we prefer using nouns over verbs (eg. "documentation" over "documenting", "contribution" over "contributing") -* **Plural vs Singular** we prefer singular over plural (eg. "database" over "databases", and "container" over "containers") -* **More than one word** we prefer to write keywords that consist of more than one word without hyphens (eg. "design architecture" over "design-architecture"), the exception is of course where the word is normally spelled with a hyphen (eg. "back-end") +- **Form:** we prefer nouns over verbs (e.g. "documentation" over + "documenting", "contribution" over "contributing"). +- **Plural vs singular:** we prefer singular over plural (e.g. + "database" over "databases", and "container" over "containers"). +- **More than one word:** we prefer to write keywords that consist of + more than one word without hyphens (e.g. "design architecture" over + "design-architecture"); the exception is where the word is + normally spelled with a hyphen (e.g. "back-end"). #### Implementation -It seems the best way to implement this shared set of keywords is through the code snippets file.
+It seems the best way to implement this shared set of keywords is +through the code snippets file. ## Decision outcome -We have decided that we would create and use a set of common code snippets, with the list of keywords for the `categories:` YAML metadata included as a snippet, because that would make it easier for us to write documentation consistently. We will be looking at other editors at a later date, but for now this is the tool used by most of the team, and thus a good place to start. +We have decided to create and use a set of common code +snippets, with the list of keywords for the `categories:` YAML metadata +included as a snippet, because that makes it easier for us to write +documentation consistently. We will look at other editors at a +later date, but for now VS Code is the editor used by most of the team, +and thus a good place to start.
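The shared-keyword snippet could be generated rather than edited by hand. Below is a minimal sketch, assuming Python and a hypothetical keyword list built from the examples in the guidelines. It relies on `json.dumps`, which escapes any quotation marks in a snippet body automatically, and on VS Code's choice placeholder syntax (`${1|a,b|}`), which offers the keywords as a drop-down. The snippet names and the `categories` prefix are illustrative only and are not an agreed convention.

```python
import json

# Hypothetical keyword list, taken from the examples in the guidelines.
KEYWORDS = ["back-end", "container", "contribution", "database",
            "design architecture", "documentation"]


def categories_snippet(keywords):
    """Build a snippet that inserts a `categories:` entry.

    `${1|a,b,c|}` is VS Code's choice placeholder, so the author picks a
    keyword from a drop-down instead of typing (and possibly misspelling) it.
    """
    choices = ",".join(sorted(keywords))
    return {
        "prefix": "categories",
        "body": ["categories:", f"  - ${{1|{choices}|}}"],
        "description": "Insert a categories entry with a shared keyword",
    }


def hidden_comment_snippet():
    """Mirror the hidden comment snippet from the example above."""
    return {
        "prefix": "hidden",
        "body": ["::: content-hidden", "${0:Write comments here}", ":::"],
        "description": "Insert a hidden content section",
    }


snippets = {
    "Insert a hidden comment section": hidden_comment_snippet(),
    "Insert a categories keyword": categories_snippet(KEYWORDS),
}

# json.dumps adds the separating commas and escapes any `"` in a body as `\"`.
text = json.dumps(snippets, indent=2)
print(text)
```

Saving the printed output as `.vscode/quarto.code-snippets` would make both snippets available to anyone who opens the repository in VS Code. Keeping the keyword list in one place also means that adding a keyword is a one-line change.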