# Cloud Native Go

books:
* Titmus, Matthew A. **Cloud Native Go**. 2021. O'Reilly Media.
  * Go version: 1.21 in book, 1.22 in action

| #   | Title                                          |
| :-- | :--------------------------------------------- |
| 1   | [[#5.1 What Is a “Cloud Native” Application?]] |
| 2   | [[#5.2 Why Go Rules the Cloud Native World]]   |
| 3   | [[#5.3 Go Language Foundations]]               |
| 4   | [[#5.4 Cloud Native Patterns]]                 |
| 5   | [[#5.5 Building a Cloud Native Service]]       |
| 6   | [[#5.6 It’s All About Dependability]]          |
| 7   | [[#5.7 Scalability]]                           |
| 8   | [[#5.8 Loose Coupling]]                        |
| 9   | [[#5.9 Resilience]]                            |
| 10  | [[#5.10 Manageability]]                        |
| 11  | [[#5.11 Observability]]                        |


- Part I. Going Cloud Native: 1 - 2
- Part II. Cloud Native Go Constructs: 3 - 5
- Part III. The Cloud Native Attributes: 6 - 11

# What Is a “Cloud Native” Application?

> CNCF definition
> Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.
> 
> These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.
>
> The Cloud Native Computing Foundation seeks to drive adoption of this paradigm by fostering and sustaining an ecosystem of open source, vendor-neutral projects. We democratize state-of-the-art patterns to make these innovations accessible for everyone.

- Scalability
- Loose Coupling
- Resilience
- Manageability
- Observability

> The move towards “cloud native” is *an example of architectural and technical adaptation*, driven by environmental pressure and selection. It’s evolution—survival of the fittest. Bear with me here; I’m a biologist by training.


# Why Go Rules the Cloud Native World
- the motivation behind Go: existing programming languages weren't keeping up with the needs of modern software development
	- Low program comprehensibility: code that is hard to read
	- Slow builds
	- Inefficiency
	- High cost of updates

Go to the rescue, with features for a cloud native world:
- Composition and Structural Typing
- Comprehensibility: Its minimalist design (just 25 keywords and 1 loop type), and the strong opinions of its compiler, strongly favor clarity over cleverness.
- CSP-Style Concurrency: "Do not communicate by sharing memory. Instead, share memory by communicating." (Go proverb)
- Fast Builds
- Linguistic Stability
- Memory Safety
- Performance
- Static Linking
- Static Typing
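
As a small illustration of the first item in the list above, the sketch below shows structural typing (an interface satisfied without being named) and composition through embedding. The `Shape`, `Rect`, and `Labeled` types are hypothetical names made up for this example, not taken from the book.

```go
package main

import "fmt"

// Shape is satisfied implicitly by any type that has an Area method.
type Shape interface {
	Area() float64
}

// Rect never mentions Shape; having the right method set is enough.
type Rect struct {
	W, H float64
}

// Area returns the rectangle's area.
func (r Rect) Area() float64 { return r.W * r.H }

// Labeled embeds Rect, so Rect's fields and methods are promoted
// onto Labeled (composition instead of inheritance).
type Labeled struct {
	Rect
	Name string
}

// Describe accepts any value that satisfies Shape.
func Describe(s Shape) string {
	return fmt.Sprintf("area=%.1f", s.Area())
}

func main() {
	l := Labeled{Rect: Rect{W: 2, H: 3}, Name: "door"}
	fmt.Println(l.Name, Describe(l)) // door area=6.0
}
```

`Labeled` satisfies `Shape` only because the embedded `Rect` brings its `Area` method along.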


# Go Language Foundations
- basic data types
	- booleans
	- simple numbers
	- complex numbers
	- strings
- variables
	- short variable declaration
	- zero values
	- blank identifier
	- constants
- container types
	- arrays
	- slices
	- maps
- pointers
- control structures
	- `for`
	- `if`
	- `switch`
- error handling
- functions: variadics and closures
	- anonymous functions
- structs, methods, interfaces
	- composition with type embedding
- concurrency
	- goroutines
	- channels



# Cloud Native Patterns
- standard library [`context`](https://pkg.go.dev/context) - see also [[book.Learning Go#5.12 The Context]]
	- by sharing context, cancellation signals can be coordinated among processes
	- creating context: `Background`, `TODO`
	- defining context deadlines and timeouts: `WithDeadline`, `WithTimeout`, `WithCancel`
	- defining request-scoped values: `WithValue`
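
The `context` functions listed above can be combined as in the following minimal sketch. `slowOperation`, `requestIDKey`, and the value `"req-42"` are hypothetical names chosen for illustration; only the `context` calls themselves come from the standard library.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowOperation simulates work that takes d, but gives up early
// when ctx is cancelled or its deadline passes.
func slowOperation(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// requestIDKey is an unexported key type, which avoids collisions
// with values stored in the context by other packages.
type requestIDKey struct{}

// run attaches a request-scoped value and a timeout to a context,
// then reports whether the operation timed out and which ID it carried.
func run() (timedOut bool, requestID string) {
	ctx := context.WithValue(context.Background(), requestIDKey{}, "req-42")
	ctx, cancel := context.WithTimeout(ctx, 20*time.Millisecond)
	defer cancel() // always release the timer, even on early return

	err := slowOperation(ctx, time.Second)
	id, _ := ctx.Value(requestIDKey{}).(string)
	return errors.Is(err, context.DeadlineExceeded), id
}

func main() {
	fmt.Println(run()) // true req-42
}
```

The derived context keeps the value set on its parent, so the request ID is still readable after the timeout is added.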

- stability patterns
	- **circuit breaker**: Circuit Breaker automatically degrades service functions in response to a likely fault, preventing larger or cascading failures by eliminating recurring errors and providing reasonable error responses.
	- **debounce**: Debounce limits the frequency of a function invocation so that only the first or last in a cluster of calls is actually performed.
	- **retry**: Retry accounts for a possible transient fault in a distributed system by transparently retrying a failed operation.
	- **throttle**: Throttle limits the frequency of a function call to some maximum number of invocations per unit of time.
	- **timeout**: Timeout allows a process to stop waiting for an answer once it’s clear that an answer may not be coming.
- concurrency patterns
	- **fan-in**: Fan-in multiplexes multiple input channels onto one output channel.
	- **fan-out**: Fan-out evenly distributes messages from an input channel to multiple output channels.
	- **future**: Future provides a placeholder for a value that’s not yet known.
	- **sharding**: Sharding splits a large data structure into multiple partitions to localize the effects of read/write locks.
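
The circuit breaker from the stability patterns above can be sketched as a function wrapper. This is a simplified illustration rather than the book's implementation: the `Operation` type, the fixed `cooldown`, and the `demo` helper are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrOpen is returned while the breaker is rejecting calls.
var ErrOpen = errors.New("circuit open: service unreachable")

// Operation is the (hypothetical) function being protected.
type Operation func() (string, error)

// Breaker wraps op so that after threshold consecutive failures,
// calls fail fast with ErrOpen until cooldown has elapsed. After the
// cooldown, one trial call is let through; success closes the circuit.
func Breaker(op Operation, threshold int, cooldown time.Duration) Operation {
	var (
		mu       sync.Mutex
		failures int
		openedAt time.Time
	)
	return func() (string, error) {
		mu.Lock()
		if failures >= threshold && time.Since(openedAt) < cooldown {
			mu.Unlock()
			return "", ErrOpen
		}
		mu.Unlock()

		res, err := op() // run without holding the lock

		mu.Lock()
		defer mu.Unlock()
		if err != nil {
			failures++
			if failures >= threshold {
				openedAt = time.Now()
			}
			return res, err
		}
		failures = 0
		return res, nil
	}
}

// demo drives a permanently failing operation through the breaker and
// reports how often the real operation ran and the last error seen.
func demo() (calls int, lastErr error) {
	failing := func() (string, error) {
		calls++
		return "", errors.New("backend down")
	}
	guarded := Breaker(failing, 3, time.Minute)
	for i := 0; i < 5; i++ {
		_, lastErr = guarded()
	}
	return calls, lastErr
}

func main() {
	calls, err := demo()
	fmt.Println(calls, err) // 3 circuit open: service unreachable
}
```

Five calls are made, but only three reach the failing backend; the remaining two are rejected immediately, which is the degradation the pattern describes.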



# Building a Cloud Native Service
- a key-value store: [code](https://github.com/zhoujiagen/learning-cloudnative/tree/main/compute/go-cloudnative/key-value-store)

idempotence: `f(f(x)) = f(x)`
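
A minimal sketch of the difference, using a hypothetical in-memory `Store` type: repeating `Put` leaves the same state, while repeating `Increment` does not.

```go
package main

import "fmt"

// Store is a hypothetical in-memory key-value store.
type Store map[string]int

// Put is idempotent: applying it twice has the same effect as once.
func (s Store) Put(key string, v int) { s[key] = v }

// Increment is not idempotent: every call changes the state again.
func (s Store) Increment(key string) { s[key]++ }

func main() {
	s := Store{}

	s.Put("a", 1)
	s.Put("a", 1)
	fmt.Println(s["a"]) // 1

	s.Increment("a")
	s.Increment("a")
	fmt.Println(s["a"]) // 3
}
```

This is why an idempotent operation is safe to retry when a client cannot tell whether its first request succeeded.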


# It’s All About Dependability

Figure 6-1. The system attributes and means that contribute to dependability
- **Availability**: The ability of a system to perform its intended function at a random moment in time.
- **Reliability**: The ability of a system to perform its intended function for a given time interval.
- **Maintainability**: The ability of a system to undergo modifications and repairs.

Figure 6-2. The four means of achieving dependability, and their corresponding cloud native attributes
- Fault prevention: Scalability, Loose coupling
	- Good programming practices
	- Language features
	- Scalability
	- Loose coupling
- Fault tolerance: Resilience
- Fault removal: Manageability
	- Verification and testing
	- Manageability
- Fault forecasting: Observability

see also [[The Twelve-Factor App]]



# Scalability
> the ability of a system to continue to provide correct service in the face of significant changes in demand

forms of scaling:
- Vertical: scale up
- Horizontal: scale out

resource bottlenecks:
- CPU
- Memory
- Disk I/O
- Network I/O

stateless:
- states: application state, resource state

Efficiency:
- LRU Cache:
	- [`golang/groupcache`](https://github.com/golang/groupcache)
	- [`hashicorp/golang-lru`](https://github.com/hashicorp/golang-lru)
- Synchronization
	- [`sync`](https://pkg.go.dev/sync)
	- sharding
	- goroutine leak
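
The sharding item above can be illustrated with a map split into independently locked partitions. This is a sketch under stated assumptions: the `ShardedMap` layout and the FNV hash are choices made for this example, not necessarily those used in the book.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// shard is one partition with its own lock.
type shard struct {
	mu sync.RWMutex
	m  map[string]int
}

// ShardedMap spreads keys over several shards so that a lock taken
// for one key only blocks access to keys in the same shard.
type ShardedMap []*shard

// NewShardedMap creates n shards; n must be greater than zero.
func NewShardedMap(n int) ShardedMap {
	shards := make(ShardedMap, n)
	for i := range shards {
		shards[i] = &shard{m: make(map[string]int)}
	}
	return shards
}

// shardFor hashes the key to pick its shard.
func (sm ShardedMap) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return sm[h.Sum32()%uint32(len(sm))]
}

// Get takes only a read lock on the key's shard.
func (sm ShardedMap) Get(key string) (int, bool) {
	s := sm.shardFor(key)
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

// Set takes a write lock on the key's shard.
func (sm ShardedMap) Set(key string, v int) {
	s := sm.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = v
}

func main() {
	sm := NewShardedMap(8)
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			sm.Set(fmt.Sprintf("key-%d", i), i)
		}(i)
	}
	wg.Wait() // waiting here also avoids leaking the writer goroutines

	fmt.Println(sm.Get("key-42")) // 42 true
}
```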

Service Architecture:
- Monolith
- Microservice
- Serverless
	- API Gateway



# Loose Coupling

tight coupling forms:
- Fragile exchange protocols
- Shared dependencies
- Shared point-in-time
- Fixed addresses

Communications Between Services:
- synchronous: request-reply
	- REST: `net/http`
	- RPC
		- `net/rpc`
		- gRPC: `google.golang.org/grpc`
	- GraphQL
- asynchronous: publish-subscribe

loose coupling local resources:
- standard library [`plugin`](https://pkg.go.dev/plugin): `go build -buildmode=plugin`
- HashiCorp’s Go plugin system: [`hashicorp/go-plugin`](https://github.com/hashicorp/go-plugin)

Hexagonal Architecture
- the core application
- ports and adapters
- actors
```shell
├── core
│   └── core.go
├── frontend
│   ├── grpc.go
│   └── rest.go
├── main.go
└── transact
    ├── filelogger.go
    └── pglogger.go
```
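
A rough sketch of how the layout above might fit together, collapsed into one file for brevity. The `TransactionLogger` port, the `KeyValueStore` core, and the `memoryLogger` adapter are illustrative names assumed for this example; in the tree above, the file and PostgreSQL loggers would play the adapter role.

```go
package main

import "fmt"

// TransactionLogger is a port: the core depends only on this
// interface, never on a concrete logger.
type TransactionLogger interface {
	WritePut(key, value string)
}

// KeyValueStore is the core application logic.
type KeyValueStore struct {
	m      map[string]string
	logger TransactionLogger
}

// NewKeyValueStore wires an adapter into the core.
func NewKeyValueStore(l TransactionLogger) *KeyValueStore {
	return &KeyValueStore{m: make(map[string]string), logger: l}
}

// Put stores the value and records the change through the port.
func (s *KeyValueStore) Put(key, value string) {
	s.m[key] = value
	s.logger.WritePut(key, value)
}

// Get returns the stored value, if any.
func (s *KeyValueStore) Get(key string) (string, bool) {
	v, ok := s.m[key]
	return v, ok
}

// memoryLogger is a stand-in adapter that keeps events in memory.
type memoryLogger struct {
	events []string
}

// WritePut implements TransactionLogger.
func (l *memoryLogger) WritePut(key, value string) {
	l.events = append(l.events, "PUT "+key+"="+value)
}

func main() {
	logger := &memoryLogger{}
	store := NewKeyValueStore(logger)
	store.Put("a", "1")
	fmt.Println(logger.events) // [PUT a=1]
}
```

Swapping the adapter changes where transactions are recorded without touching the core, which is the loose coupling this architecture aims for.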



# Resilience
resilience vs. reliability
- The **resilience** of a system is the degree to which it can continue to operate correctly in the face of errors and faults.
- The **reliability** of a system is its ability to behave as expected for a given time interval.

building for resilience:
- redundancy
	- autoscaling
	- health check
		- liveness checks
		- shallow health checks
		- deep health checks
- circuit breaker
- request throttle: `Throttle`
- load shedding: `loadSheddingMiddleware`
- retrying requests: 
	- backoff algorithms
	- idempotence
- timeouts: use `context.Context`
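
Retrying with a backoff algorithm, bounded by a `context.Context`, might look like the sketch below. The `Retry` signature and the simple doubling delay are assumptions for illustration; production code would usually add jitter and should only retry idempotent operations.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Retry calls op until it succeeds, attempts are exhausted, or ctx is
// done. The delay doubles after each failure (exponential backoff).
// attempts must be at least 1.
func Retry(ctx context.Context, attempts int, delay time.Duration, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i == attempts-1 {
			break // no point in sleeping after the final attempt
		}
		select {
		case <-time.After(delay):
			delay *= 2
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("after %d attempts: %w", attempts, err)
}

// demo retries an operation that fails twice and then succeeds.
func demo() (calls int, err error) {
	op := func() error {
		calls++
		if calls < 3 {
			return errors.New("transient failure")
		}
		return nil
	}
	err = Retry(context.Background(), 5, time.Millisecond, op)
	return calls, err
}

func main() {
	fmt.Println(demo()) // 3 <nil>
}
```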



# Manageability
manageability vs. maintainability
- **Manageability** describes the ease with which changes can be made to the behavior of a system, typically without having to resort to changing its code. In other words, it’s how easy it is to change a system from the **outside**.
- **Maintainability** describes the ease with which a software system or component can be modified to change or add capabilities, correct faults or defects, or improve performance, usually by making changes to the code. In other words, it’s how easy it is to change a system from the **inside**.

manageability functions:
- configuration and control
	- see also [[The Twelve-Factor App]] III
	- environment variables: `os.Getenv`, `os.LookupEnv`, [spf13/viper](https://github.com/spf13/viper)
	- command-line arguments: `flag`, [spf13/cobra](https://github.com/spf13/cobra)
	- files: JSON with `encoding/json`, YAML with [go-yaml/yaml](https://github.com/go-yaml/yaml); detect changes by hashing the file with a `crypto` package, or by using filesystem notifications with [fsnotify](https://github.com/fsnotify/fsnotify)
	- distributed key-value store: etcd, HashiCorp Consul
	- a central source code repository
	- Kubernetes ConfigMap
- monitoring, logging and alerting
- deployment and updates
- service discovery and inventory
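
For the environment-variable and command-line options above, a minimal standard-library sketch could look like this. The `APP_PORT` variable, the `-port` flag, and the `envOr` and `parsePort` helpers are hypothetical names used only for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// envOr returns the value of the environment variable key,
// or fallback when the variable is not set at all.
func envOr(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok {
		return v
	}
	return fallback
}

// parsePort resolves the port with this precedence:
// command-line flag, then environment variable, then default.
func parsePort(args []string) (string, error) {
	fs := flag.NewFlagSet("app", flag.ContinueOnError)
	port := fs.String("port", envOr("APP_PORT", "8080"), "port to listen on")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *port, nil
}

func main() {
	port, err := parsePort(os.Args[1:])
	if err != nil {
		return // the FlagSet has already printed the error and usage
	}
	fmt.Println("listening on port", port)
}
```

`os.LookupEnv` is used instead of `os.Getenv` so that a variable explicitly set to an empty string can be told apart from one that is unset.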

> *Feature flagging* (or *feature toggling*) is a software development pattern designed to increase the speed and safety with which new features can be developed and delivered by allowing specific functionality to be turned on or off during runtime, without having to deploy new code.

feature management
- levels:
	- no feature flag
	- hard-coded feature flag
	- configurable feature flag
	- dynamic feature flag
- see also [OpenFeature](https://openfeature.dev/), [LaunchDarkly](https://launchdarkly.com/blog/what-are-feature-flags/)
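
A configurable feature flag (the third level above) can be sketched with nothing more than an environment variable. The `FlagSource` type, the `Enabled` helper, and the `USE_NEW_GREETING` flag name are assumptions made for this example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// FlagSource abstracts where flag values come from; os.LookupEnv
// already has this shape, and tests can pass a fake.
type FlagSource func(name string) (string, bool)

// Enabled reports whether the named flag is set to a true value.
// Unset or unparsable flags count as disabled.
func Enabled(src FlagSource, name string) bool {
	v, ok := src(name)
	if !ok {
		return false
	}
	on, err := strconv.ParseBool(v)
	return err == nil && on
}

// greet chooses a code path based on the flag.
func greet(src FlagSource) string {
	if Enabled(src, "USE_NEW_GREETING") {
		return "hello from the new code path"
	}
	return "hello"
}

func main() {
	// Try: USE_NEW_GREETING=true go run .
	fmt.Println(greet(os.LookupEnv))
}
```

Making the flag dynamic would mean replacing the `FlagSource` with something that is re-evaluated at runtime, for example a lookup against a flag service.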


# Observability
> **Observability** is a system property, no different than resilience or manageability, that reflects how well a system’s internal states can be inferred from knowledge of its external outputs. 

Three Pillars of Observability:
- **Tracing**: Tracing (or distributed tracing) follows a request as it propagates through a (typically distributed) system, allowing the entire end-to-end request flow to be reconstructed as a directed acyclic graph (DAG) called a *trace*. Analysis of these traces can provide insight into how a system’s components interact, making it possible to pinpoint failures and performance issues.
- **Metrics**: Metrics involves the collection of numerical data points representing the state of various aspects of a system at specific points in time.
- **Logging**: Logging is the process of appending records of noteworthy events to an immutable record—the log—for later review or analysis. `log`, [uber-go/zap](https://github.com/uber-go/zap)
	- Treat logs as streams of events
	- Structure events for parsing
	- Less is (way) more
	- Dynamic sampling
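
To illustrate "structure events for parsing", here is a small sketch using the standard library's `log/slog` package (available since Go 1.21). It is a substitute chosen to keep the example dependency-free, not the `log` or zap usage the notes mention, and `logLine` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
)

// logLine encodes one request event as a single JSON object and
// returns it as a string so that it can be inspected or shipped.
func logLine(method string, status int) string {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))

	// Key-value pairs become JSON fields instead of free-form text.
	logger.Info("request handled", "method", method, "status", status)
	return buf.String()
}

func main() {
	fmt.Print(logLine("GET", 200))
}
```

Each event comes out as one JSON object per line, with the timestamp, level, message, and the supplied fields as separate keys, which downstream tooling can parse without regular expressions.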

> OpenTelemetry is a collection of APIs, SDKs, and tools. Use it to instrument, generate, collect, and export telemetry data (metrics, logs and traces) to help you analyze your software's performance and behavior.

[OpenTelemetry](https://github.com/open-telemetry)
- Specifications
- API
- SDK
- Exporters: [Prometheus Exporters](https://prometheus.io/docs/instrumenting/exporters/), to local log file/`stdout`, remote [Jaeger](https://www.jaegertracing.io/)
- Collector

```go
// OpenTelemetry v0.17.0
// tracing
import (
	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout"
	"go.opentelemetry.io/otel/exporters/trace/jaeger"
	"go.opentelemetry.io/otel/label"
	export "go.opentelemetry.io/otel/sdk/export/trace"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	"go.opentelemetry.io/otel/trace"
)

// metrics
import (
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/metric/prometheus"
	"go.opentelemetry.io/otel/label"
	"go.opentelemetry.io/otel/metric"
)
```


# More

- [Code](https://github.com/cloud-native-go/examples)

- [[#5.1 What Is a “Cloud Native” Application?]]
	- def: [CNCF Cloud Native Definition v1.0](https://github.com/cncf/toc/blob/main/DEFINITION.md)

- [[#5.4 Cloud Native Patterns]]
	- blog: L. Peter Deutsch. **Fallacies of Distributed Computing**.
	- [Azure Cloud Design Patterns](https://learn.microsoft.com/en-us/azure/architecture/patterns/)
	- book: **Cloud Native Infrastructure** by Justin Garrison and Kris Nova (O’Reilly) 
	- book: **Designing Distributed Systems** by Brendan Burns (O’Reilly)

- [[#5.5 Building a Cloud Native Service]]
	- package: [Gorilla web toolkit](https://gorilla.github.io/)
	- book: **Docker: Up & Running: Shipping Reliable Containers in Production** by Sean P. Kane and Karl Matthias (O’Reilly)
	- book: **Kubernetes: Up and Running** by Brendan Burns, Joe Beda, and Kelsey Hightower (O’Reilly)

- [[#5.6 It’s All About Dependability]]
	- book: **Site Reliability Engineering: How Google Runs Production Systems**
	- book: Boris Scholl, Trent Swanson, Peter Jausovec. **Cloud Native: Using Containers, Functions, and Data to Build Next-Generation Applications**. O'Reilly Media: 2019. ISBN: 9781492053828.
	- blog: [The State of Caching in Go](https://dgraph.io/blog/post/caching-in-go/)

- [[#5.7 Scalability]]
	- book: Kanat-Alexander, Max. **Code Simplicity: The Science of Software Design**. O’Reilly Media: 2012.
	- blog: [Share Memory By Communicating](https://go.dev/blog/codelab-share)
	- blog: [Why is a Goroutine’s stack infinite ?](https://dave.cheney.net/2013/06/02/why-is-a-goroutines-stack-infinite)
	- blog: [Go: How Does Go Recycle Goroutines?](https://medium.com/a-journey-with-go/go-how-does-go-recycle-goroutines-f047a79ab352)
	- blog: [Never start a goroutine without knowing how it will stop](https://dave.cheney.net/2016/12/22/never-start-a-goroutine-without-knowing-how-it-will-stop)

- [[#5.8 Loose Coupling]]
	- book: Kasun Indrasiri, Danesh Kuruppu. **gRPC: Up and Running**. O’Reilly Media: 2020.
	- blog: [Hexagonal architecture](https://alistair.cockburn.us/hexagonal-architecture/)

- [[#5.9 Resilience]]
	- link: [How complex systems fail](https://how.complexsystems.fail/)
	- book: **Reliability and Availability Engineering** by Kishor S. Trivedi and Andrea Bobbio (Cambridge University Press)
	- book: **Building Secure and Reliable Systems: Best Practices for Designing, Implementing, and Maintaining Systems** by Heather Adkins et al. O'Reilly Media: 2020.

- [[#5.10 Manageability]]
	- book: Kernighan, Brian W., and P. J. Plauger. **The Elements of Programming Style**. McGraw-Hill, 1978
	- spec: **Systems and Software Engineering: Vocabulary**. ISO/IEC/IEEE 24765:2010(E). [link](https://www.cse.msu.edu/~cse435/Handouts/Standards/IEEE24765.pdf)
	- blog: [What Is Manageability?](https://www.ni.com/en/shop/electronic-test-instrumentation/add-ons-for-electronic-test-and-instrumentation/what-is-systemlink-tdm-datafinder-module/what-is-rasm/what-is-manageability-.html)
	- blog: [JSON and Go](https://go.dev/blog/json)

- [[#5.11 Observability]]
	- book: **Distributed Tracing in Practice** by Austin Parker, Daniel Spoonhower, Jonathan Mace, Ben Sigelman, and Rebecca Isaacs (O’Reilly).
	- paper: Sigelman, Benjamin H., et al. **Dapper, a Large-Scale Distributed Systems Tracing Infrastructure**. Google Technical Report, Apr. 2010.