# Designing a Distributed Logging System

One of the most challenging aspects of debugging a distributed system is understanding how the system behaved in the period leading up to a bug.
A distributed system is typically made up of microservices that call one another.
Several services may need to cooperate to complete a single business requirement.

In this architecture, logs accumulate on each machine that runs a microservice, and a single microservice can be deployed to hundreds of nodes. When multiple microservices are interdependent, the failure of one service can cause failures in others. Without well-organized logging, we might not be able to determine the root cause of a failure.

## Restrain Log Size
At any given time, the distributed system logs hundreds of concurrent messages.
The volume of logs grows over time, but not every message is important enough to be logged.
To keep the volume manageable, logs have to be structured, and we need to decide what gets logged, either at the application level or at the logging level.

## Functional requirements
- The logging mechanism should be secure. Access to logs should be restricted to authenticated users, who are granted only the read-only permissions they need.
- The system should avoid logging sensitive information such as credit card numbers, passwords, and so on.
- Since logging is an I/O-heavy operation, the system should avoid logging excessive information. Logging everything is unnecessary: it takes up more space and hurts performance.
- Avoid logging personally identifiable information (PII) such as names, addresses, emails, etc.
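One way to enforce the last few requirements is to scrub messages before they reach any handler. The following is a minimal sketch using Python's standard `logging` module. The regular expressions, the placeholder strings, and the `redact` and `RedactingFilter` names are illustrative assumptions, not a complete or audited set of PII rules.

```python
import logging
import re

# Hypothetical, deliberately naive patterns. A production system would need
# broader rules (and would ideally avoid emitting such data in the first place).
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators
SECRET_RE = re.compile(r"\b(password|passwd|token)=\S+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text: str) -> str:
    """Replace card-like numbers, key=value secrets, and emails with placeholders."""
    text = CARD_RE.sub("[REDACTED_CARD]", text)
    text = SECRET_RE.sub(r"\1=[REDACTED]", text)
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


class RedactingFilter(logging.Filter):
    """Logging filter that rewrites each record's message through redact()."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its arguments first, then scrub the result.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # keep the record; only its content changes
```

Attaching the filter with `logger.addFilter(RedactingFilter())` applies it to every record that logger handles. Pattern matching of this kind is a safety net only, so it complements rather than replaces the rule of not logging sensitive fields at all.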

### Log sampling
Storage and processing resources are limited. We must decide which messages to log so that we can control the volume of logs generated.

High-throughput systems emit many messages for the same set of events. Instead of logging every message, we can use a sampler service that logs only a smaller subset of a larger batch. The sampler service can use various sampling algorithms, such as adaptive sampling and priority sampling, to decide which events to log. For large systems with thousands of microservices and billions of events per second, an appropriate 
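To make the idea of a sampler concrete, here is a minimal sketch of a counter-based sampler that keeps one message out of every `n` per event key and always keeps messages at or above a chosen severity. It is a simplified stand-in and not an implementation of adaptive or priority sampling; the `EveryNthSampler` class, its parameters, and the `event_key` notion are assumptions made for illustration.

```python
import logging
from collections import defaultdict


class EveryNthSampler:
    """Keep 1 of every n low-severity messages per event key (illustrative only)."""

    def __init__(self, n: int, always_keep_level: int = logging.WARNING):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self.always_keep_level = always_keep_level
        self._seen = defaultdict(int)  # event key -> messages seen so far

    def should_log(self, level: int, event_key: str) -> bool:
        # High-severity messages bypass sampling entirely.
        if level >= self.always_keep_level:
            return True
        count = self._seen[event_key]
        self._seen[event_key] = count + 1
        # Keep the first message for a key, then every n-th one after it.
        return count % self.n == 0
```

With `n=5`, ten `INFO` messages for the same key result in two logged entries, while every `ERROR` is kept. A real sampler service would also need to bound the memory used by the per-key counters and would likely adjust `n` based on observed traffic.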

### Structured logging
The first benefit of structured logs is better interoperability between log writers and log readers.
Because every entry follows a consistent, machine-parseable format, structured logging also makes the job of the log processing system easier.
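As an illustration, structured logging can be approximated in Python by formatting each record as a JSON object. This is a minimal sketch built on the standard `logging` and `json` modules; the `JsonFormatter` class and the `service` and `request_id` context fields are hypothetical names chosen for the example.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical context fields a caller may attach via logging's `extra=` argument.
CONTEXT_FIELDS = ("service", "request_id")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy optional context fields only when the caller supplied them.
        for key in CONTEXT_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)
```

A handler configured with `handler.setFormatter(JsonFormatter())` would then emit one JSON object per line, and a call such as `logger.info("order placed", extra={"request_id": "abc123"})` would carry the request ID as its own field, so a log processor can filter on it without parsing free text.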

### Categorization
The following severity levels are commonly used in logging:
- `DEBUG`
- `INFO`
- `WARNING`
- `ERROR`
- `CRITICAL`
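These five names match the built-in levels of Python's standard `logging` module, where each level maps to an increasing integer and a logger drops anything below its configured threshold. The sketch below shows that filtering behavior; the `ListHandler` class and `emit_all_levels` function are hypothetical helpers written for this example.

```python
import logging


class ListHandler(logging.Handler):
    """Collect the level names of handled records in memory (for demonstration)."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def emit(self, record: logging.LogRecord) -> None:
        self.levels.append(record.levelname)


def emit_all_levels(threshold: int) -> list:
    """Log one message per severity level and return the levels that passed."""
    logger = logging.getLogger(f"demo.threshold.{threshold}")
    logger.setLevel(threshold)
    logger.propagate = False  # keep demo output away from the root logger
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        logger.debug("cache lookup details")
        logger.info("request served")
        logger.warning("retrying upstream call")
        logger.error("upstream call failed")
        logger.critical("service cannot start")
    finally:
        logger.removeHandler(handler)
    return handler.levels
```

Calling `emit_all_levels(logging.WARNING)` returns only `WARNING`, `ERROR`, and `CRITICAL`, which is how a production threshold keeps `DEBUG` and `INFO` noise out of storage.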

