17 changes: 8 additions & 9 deletions why-postgres.qmd
Original file line number Diff line number Diff line change
@@ -4,8 +4,7 @@ description: |
SQL is the backbone of any database system. However there are many variants
of SQL. This decision post contains the reasons for using PostgreSQL, which is
a powerful and feature-full variant of SQL.
author: Kristiane Beicher
date: last-modified
date: "2024-01-05"
categories:
- backend
- database
@@ -37,7 +36,7 @@ One of the most important functions of Seedcase is to handle data, and the most
[MySQL](www.mysql.com) was first released in 1995 and is maintained by Oracle Corp. It is an open source platform with the option to deploy either as a local server solution or cloud based. The implementation languages are C and C++, and it runs on a variety of operating systems. The system allows access through standard technologies (ADO.NET, JDBC, ODBC, and native APIs).

::: columns
::: {.column style="font-size: 90%"}
::: {.column}

#### Benefits

@@ -50,7 +49,7 @@ One of the most important functions of Seedcase is to handle data, and the most
* There are a number of ways for MySQL to interact with Apache Parquet files.

:::
::: {.column style="font-size: 90%"}
::: {.column}

#### Drawbacks

@@ -66,7 +65,7 @@ One of the most important functions of Seedcase is to handle data, and the most
[PostgreSQL](www.postgresql.org) was first released in 1989 from UC Berkeley and is maintained by the PostgreSQL Development Group. It is an open source platform with the option to deploy either as a local server solution or cloud based. The implementation language is C, and it runs on a variety of operating systems. The system allows access through standard technologies (ADO.NET, JDBC, ODBC, a native C library, and streaming APIs).

::: columns
::: {.column style="font-size: 90%"}
::: {.column}

#### Benefits

@@ -81,7 +80,7 @@ One of the most important functions of Seedcase is to handle data, and the most
* It is possible to create columnar based tables directly in PostgreSQL.

:::
::: {.column style="font-size: 90%"}
::: {.column}

#### Drawbacks

@@ -95,7 +94,7 @@ One of the most important functions of Seedcase is to handle data, and the most
First released in 2000, SQLite is slightly different to the two systems described above, as it is an embedded serverless database primarily maintained by an international team of programmers (see [About SQLite](https://www.sqlite.org/about.html)). It is an open source platform with the option to deploy either locally or in the cloud. The implementation language is C, and it is platform independent. The system allows access through standard technologies (ADO.NET, JDBC, and ODBC).

::: columns
::: {.column style="font-size: 90%"}
::: {.column}

#### Benefits

@@ -106,7 +105,7 @@ First released in 2000, SQLite is slightly different to the two systems describe
* There is always a risk that an open source community will break apart and leave a product unsupported, but the risk here looks minimal. The explicitly stated intention from the core developers of SQLite is to support the product until at least 2050.

:::
::: {.column style="font-size: 90%"}
::: {.column}

#### Drawbacks

@@ -121,7 +120,7 @@ First released in 2000, SQLite is slightly different to the two systems describe

## Decision Outcome

We've decided to work with PostgreSQL as our backend database as it fulfills all our needs and is a very popular open source tool. MySQL would be the other obvious choice, the application does everything that Seedcase needs, but the user community for PostgreSQL seems to be a bit more active. SQLite is quite popular within the application developer community, but it doesn't have a reliable multi-user functionality, so it may be an uphill battle to get it to do the things we are hoping to do with Seedcase.
We've decided to work with PostgreSQL as our backend database as it fulfils all our needs and is a very popular open source tool. MySQL would be the other obvious choice, the application does everything that Seedcase needs, but the user community for PostgreSQL seems to be a bit more active. SQLite is quite popular within the application developer community, but it doesn't have a reliable multi-user functionality, so it may be an uphill battle to get it to do the things we are hoping to do with Seedcase.

### Consequences

135 changes: 98 additions & 37 deletions why-python.qmd
@@ -4,57 +4,118 @@ description: |
Python is one of the most common and widely used programming languages.
It is used across multiple domains and industries, which means more people
would be familiar with using it.
author: "Richard Ding"
date: "2023-03-22"
date-modified: last-modified
categories:
- programming
- development
- software-architecture
---

<!-- TODO: Update this to match style of other posts -->
::: content-hidden
Use other decision posts as inspiration to writing these.
Leave the content-hidden sections in the text for future reference.
:::

## Introduction
## Context and Problem Statement

Since Seedcase is a data management system and software, it requires a
programming language for its development that can handle large amounts
::: content-hidden
State the context and some background on the issue, then write a
statement in the form of a question for the problem.
:::

One of the first things to do when deciding to write a software application is to decide on the programming language. There are several languages that can be used, among them C++, Java, Python, and R. In the context of Seedcase it is important to choose a language that can handle large amounts
of data, provide efficient data processing capabilities, and integrate
well with other technologies commonly used in the research area.

> Which programming language should we use for developing the Seedcase application?

## Decision Drivers

::: content-hidden
List some reasons for why we need to make this decision and what things
have arisen that impact work.
:::

In the context of Seedcase it is important to choose a language that can handle large amounts of data, provide efficient data processing capabilities, and integrate well with other technologies commonly used in the research area. We also need to consider the skills already available in the core team, as we would like to minimize the time needed before we are able to program the application.

## Considered Options

We considered [Python](https://www.python.org), [Java](https://www.java.com/en/), [C++](https://cplusplus.com), and [R](https://www.r-project.org).
::: content-hidden
List and describe some of the options, as well as some of the benefits and
drawbacks for each option.
:::

### C++

::: {.columns}
::: {.column}
#### Benefits

- Item 1
:::
::: {.column}
#### Drawbacks

- Item 1
:::
:::

### Java

::: {.columns}
::: {.column}
#### Benefits

- Item 1
:::
::: {.column}
#### Drawbacks

- Item 1
:::
:::

### Python

::: {.columns}
::: {.column}
#### Benefits

- Item 1
:::
::: {.column}
#### Drawbacks

- Item 1
:::
:::

### R

::: {.columns}
::: {.column}
#### Benefits

- Item 1
:::
::: {.column}
#### Drawbacks

- Item 1
:::
:::

## Decision Outcome

We have decided to use Python as the main development language for the
following reasons:

- Is widely used in the research area, particularly in data science
and machine learning, and has a rich ecosystem of libraries and
tools for data processing and analysis.
- It's syntax is concise and easy to read, making it ideal for rapid
development and prototyping.
- Has a large community of developers who contribute to its
development, ensuring that it is constantly evolving and improving.
- Has strong support for web development, with a number of popular
frameworks such as [Django](https://www.djangoproject.com) and [Flask](https://flask.palletsprojects.com/en/2.3.x/), making it easy to build RESTful
APIs for Seedcase.
- Has excellent support for working with databases, with libraries
such as [SQLAlchemy](https://www.sqlalchemy.org) and Django ORM, making it easy to manage and
query large datasets.
- Is a cross-platform language, making it easy to deploy the system on
a variety of operating systems and hardware.

While Java and C++ are also capable languages for building data
management systems, they are generally more complex and have a steeper
learning curve than Python. R is a powerful language for data analysis
and visualization, but it is less suitable for building large-scale web
applications.

## Conclusion

Python is the most suitable option for this project, as it provides a
powerful, flexible, and easy-to-use platform for building a data
management system. Python is also one of the most common and widely used programming languages and is used across multiple domains and industries.
::: content-hidden
What decision was made, use the form "We decided on CHOICE because of
REASONS."
:::



### Consequences

::: content-hidden
List some potential consequences of this decision.
:::
75 changes: 52 additions & 23 deletions why-ruff.qmd
@@ -4,10 +4,7 @@ description: |
Enforcing style of code with automatic linters and formatters is important for
code reviews to focus on content, not style. This post covers the reasons why we
decided on Ruff for our linting and formatting purposes.
author:
- "Richard Ding"
- "Luke Johnston"
date: 2023-11-27
date: "2023-11-27"
categories:
- contributing
- culture
@@ -24,7 +21,7 @@ categories:

## Context and Problem Statement

Humans are prone to error when writing, whether it is code or text. In a team setting, more people working on the same things increase the chance of more issues occuring. And writing code is not done for the computer, but for other humans to read, so readability and consistency in style become important when reviewing that code. So our problem is:
Humans are prone to error when writing, whether it is code or text. In a team setting, more people working on the same things increase the chance of more issues occurring. And writing code is not done for the computer, but for other humans to read, so readability and consistency in style become important when reviewing that code. So our problem is:

> How do we enforce a consistent style across people and code? And how do we catch simple errors that happen because of the style or format of the code?

@@ -38,7 +35,7 @@ Humans are prone to error when writing, whether it is code or text. In a team se

The terms "linting" or "formatting" are used to describe scanning, analysing, and (potentially) fixing code for style and typographical issues. The important difference between linting and formatting is that linting only tells you about the issues while formatting will fix (many of) the issues. Some issues can't be solved from formatting alone, so both linting and formatting are often used together.

There are many tools available for Python, with many websites that have detailed comparisons of them (like [this](https://realpython.com/python-code-quality/), [this](https://geekflare.com/python-linter-platforms/), or [this](https://github.com/caramelomartins/awesome-linters#python) website). Based on this list and based on quick searchs on Google, these are the tools that come up the most often:
There are many tools available for Python, with many websites that have detailed comparisons of them (like [this](https://realpython.com/python-code-quality/), [this](https://geekflare.com/python-linter-platforms/), or [this](https://github.com/caramelomartins/awesome-linters#python) website). Based on this list and based on quick searches on Google, these are the tools that come up the most often:

- [Pylint](https://github.com/pylint-dev/pylint)
- [Flake8](https://github.com/PyCQA/flake8)
@@ -49,61 +46,93 @@ Below is a detailed description of the pros and cons based on what others have w

### Pylint

- Pros:
::: {.columns}
::: {.column}
#### Benefits

- Very old, well-established linter
- Large community of users and contributors
- Very comprehensive list of checks
- Highly configurable
- Is integrated into many other tools (like flake8, black, and ruff)
- Linting feedback is extensive
- Cons:
- Too much configuration needed.
- Slow to run.
- Is often not needed to use on it's own because it is integrated with other tools.
- Linting feedback is extensive and a bit overwhelming.
:::
::: {.column}
#### Drawbacks

- Too much configuration needed
- Slow to run
- Often does not need to be used on its own because it is integrated with other tools
- Linting feedback is extensive and a bit overwhelming
:::
:::

### Flake8

- Pros:
::: {.columns}
::: {.column}
#### Benefits

- Extensive list of checks
- Includes many other linters
- Often used with formatters like Black
- Customizable
- Large userbase and community
- Large user base and community
- Can use plugins to expand functionality
- Cons:
:::
::: {.column}
#### Drawbacks

- Only lints and doesn't format
- Is integrated into newer tools (like Ruff), so might not need to be used on its own
:::
:::

### Black

- Pros:
- Is a code formatter, not linter.
- Opinionated set of rules for code formatting, so removes need to configure things.
::: {.columns}
::: {.column}
#### Benefits

- Is a code formatter, not linter
- Opinionated set of rules for code formatting, so removes need to configure things
- Recommended to use with a linter (often suggested to use flake8 or pylint)
- Cons:
:::
::: {.column}
#### Drawbacks

- Difficult to configure customizations
- Integrated/compatible with newer tools (like Ruff)
:::
:::

### Ruff

- Pros:
::: {.columns}
::: {.column}
#### Benefits

- Very fast
- Implements almost all of Black and flake8 features
- Implements many other features from other code analysis and checking tools
- Is in very active development
- Newer and has more modern development
- Configuration is available and relatively straightforward to use
- Can be implemented alongside other tools
- Cons:
:::
::: {.column}
#### Drawbacks

- Is still new, so bugs and other features are still being developed
- Does not yet have all of pylint features implemented
:::
:::

## Decision Outcome

We decided on Ruff because it is a newer tool that implements many of the features of the other tools that exist. It is also designed to be mostly used "as is", without needing to customize many things, and it seems to be designed in a way that makes customizations relatively easy to set up.

## Potential Consequences

- We might miss out on some features from pylint (since right now we won't include pylint).
- There may be some bugs along the way because Ruff is relatively new, though this can be minimized by relying on more stable versions of it.
- We might miss out on some features from pylint (since right now we won't include pylint)
- There may be some bugs along the way because Ruff is relatively new, though this can be minimized by relying on more stable versions of it
5 changes: 1 addition & 4 deletions why-standard-shortcuts.qmd
@@ -1,8 +1,7 @@
---
title: "Why standardized snippets"
description: "The larger a project is, the more important it becomes to have a joint set of standards when writing documentation. We decided to set up shortcuts that are shared across the team, so that all documentation follows the same classification and formatting."
author: "Kristiane Beicher"
date: last-modified
date: "2023-11-23"
categories:
- code snippets
- communication
@@ -29,8 +28,6 @@ As the documentation for Seedcase growing, and we have reached a level where it

We also need to find a way to ensure the consistent use of keywords, so that when a reader clicks a `tag` in a document they get all relevant pages, and don't miss any due to the fact that half are tagged in one way (eg. `database`) and the other half is tagged slightly differently (eg `databases`).

<!--TODO Should we link to the page on Quarto above?-->

## Considered Options

We have so far looked at two ways of streamlining the writing of documentation through the use of code snippets and shared keywords, which can be set using the same settings file. There aren't many "generic" methods to share code snippets across IDEs (e.g. between RStudio and PyCharm), so we only investigated ways of adding these in VS Code.