cleanup markdown docs across multiple files (#14296)
enable markdown-linter
harshavardhana committed Feb 12, 2022
1 parent 2c0f121 commit e3e0532
Showing 71 changed files with 1,028 additions and 600 deletions.
25 changes: 25 additions & 0 deletions .github/workflows/markdown-lint.yaml
@@ -0,0 +1,25 @@
name: Markdown Linter

on:
pull_request:
branches:
- master

# This ensures that previous jobs for the PR are canceled when the PR is
# updated.
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref }}
cancel-in-progress: true

jobs:
lint:
name: Lint all docs
runs-on: ubuntu-latest
steps:
- name: Check out code
uses: actions/checkout@v2

- name: Lint all docs
run: |
npm install -g markdownlint-cli
markdownlint --fix '**/*.md' --disable MD013 MD025 MD040 MD024
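
The same check can be run locally before pushing a branch, assuming Node.js and npm are available, using the same commands as the CI job:

```sh
# one-time install of the linter (requires Node.js/npm)
npm install -g markdownlint-cli

# run the same invocation as the workflow, from the repository root
markdownlint --fix '**/*.md' --disable MD013 MD025 MD040 MD024
```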
1 change: 1 addition & 0 deletions COMPLIANCE.md
@@ -1,4 +1,5 @@
# AGPLv3 Compliance

We have designed MinIO as an Open Source software for the Open Source software community. This requires applications to consider whether their usage of MinIO is in compliance with the GNU AGPLv3 [license](https://github.com/minio/minio/blob/master/LICENSE).

MinIO cannot make the determination as to whether your application's usage of MinIO is in compliance with the AGPLv3 license requirements. You should instead rely on your own legal counsel or licensing specialists to audit and ensure your application is in compliance with the licenses of MinIO and all other open-source projects with which your application integrates or interacts. We understand that AGPLv3 licensing is complex and nuanced. It is for that reason we strongly encourage using experts in licensing to make any such determinations around compliance instead of relying on apocryphal or anecdotal advice.
25 changes: 19 additions & 6 deletions CONTRIBUTING.md
@@ -7,15 +7,17 @@
Start by forking the MinIO GitHub repository, make changes in a branch, and then send a pull request. We encourage pull requests to discuss code changes. Here are the steps in detail:

### Setup your MinIO GitHub Repository

Fork [MinIO upstream](https://github.com/minio/minio/fork) source repository to your own personal repository. Copy the URL of your MinIO fork (you will need it for the `git clone` command below).

```sh
$ git clone https://github.com/minio/minio
$ go install -v
$ ls /go/bin/minio
git clone https://github.com/minio/minio
go install -v
ls /go/bin/minio
```

### Set up git remote as ``upstream``

```sh
$ cd minio
$ git remote add upstream https://github.com/minio/minio
@@ -25,13 +27,15 @@ $ git merge upstream/master
```

### Create your feature branch

Before making code changes, make sure you create a separate branch for these changes

```
$ git checkout -b my-new-feature
git checkout -b my-new-feature
```

### Test MinIO server changes

After your code changes, make sure

- To add test cases for the new code. If you have questions about how to do it, please ask on our [Slack](https://slack.min.io) channel.
@@ -40,29 +44,38 @@ After your code changes, make sure
- To run `make test` and `make build` and verify that they complete successfully.

### Commit changes

After verification, commit your changes. This is a [great post](https://chris.beams.io/posts/git-commit/) on how to write useful commit messages

```
$ git commit -am 'Add some feature'
git commit -am 'Add some feature'
```

### Push to the branch

Push your locally committed changes to the remote origin (your fork)

```
$ git push origin my-new-feature
git push origin my-new-feature
```

### Create a Pull Request

Pull requests can be created via GitHub. Refer to [this document](https://help.github.com/articles/creating-a-pull-request/) for detailed steps on how to create a pull request. After a Pull Request gets peer reviewed and approved, it will be merged.

## FAQs

### How does ``MinIO`` manage dependencies?

``MinIO`` uses `go mod` to manage its dependencies.

- Run `go get foo/bar` in the source folder to add the dependency to the `go.mod` file.

To remove a dependency

- Edit your code and remove the import reference.
- Run `go mod tidy` in the source folder to remove the dependency from the `go.mod` file (see the sketch below).
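
A typical end-to-end dependency workflow is sketched below (`foo/bar` is a placeholder module path, not an actual MinIO dependency):

```sh
# add a new dependency (placeholder module path)
go get foo/bar

# use the package in your code, then make sure the build still passes
make build

# after removing an import, prune go.mod and go.sum
go mod tidy
```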

### What are the coding guidelines for MinIO?

``MinIO`` is fully conformant with Golang style. Refer to the [Effective Go](https://github.com/golang/go/wiki/CodeReviewComments) article from the Golang project. If you observe offending code, please feel free to send a pull request or ping us on [Slack](https://slack.min.io).
54 changes: 32 additions & 22 deletions README.md

Large diffs are not rendered by default.

9 changes: 5 additions & 4 deletions SECURITY.md
@@ -18,9 +18,10 @@ you need access credentials for a successful exploit).

If you have not received a reply to your email within 48 hours, or you have not heard from the security team
for the past five days, please contact the security team directly:
- Primary security coordinator: aead@min.io
- Secondary coordinator: harsha@min.io
- If you receive no response: dev@min.io

- Primary security coordinator: aead@min.io
- Secondary coordinator: harsha@min.io
- If you receive no response: dev@min.io

### Disclosure Process

@@ -32,7 +33,7 @@ MinIO uses the following disclosure process:
If the report is rejected the response explains why.
3. Code is audited to find any potential similar problems.
4. Fixes are prepared for the latest release.
5. On the date that the fixes are applied a security advisory will be published on https://blog.min.io.
5. On the date that the fixes are applied a security advisory will be published on <https://blog.min.io>.
Please inform us in your report email whether MinIO should mention your contribution w.r.t. fixing
the security issue. By default MinIO will **not** publish this information to protect your privacy.

17 changes: 8 additions & 9 deletions VULNERABILITY_REPORT.md
@@ -1,11 +1,11 @@
## Vulnerability Management Policy
# Vulnerability Management Policy

This document formally describes the process of addressing and managing a
reported vulnerability that has been found in the MinIO server code base,
any directly connected ecosystem component or a direct / indirect dependency
of the code base.

### Scope
## Scope

The vulnerability management policy described in this document covers the
process of investigating, assessing and resolving a vulnerability report
@@ -14,26 +14,25 @@ opened by a MinIO employee or an external third party.
Therefore, it lists pre-conditions and actions that should be performed to
resolve and fix a reported vulnerability.

### Vulnerability Management Process
## Vulnerability Management Process

The vulnerability management process requires that the vulnerability report
contains the following information:

- The project / component that contains the reported vulnerability.
- A description of the vulnerability. In particular, the type of the
- The project / component that contains the reported vulnerability.
- A description of the vulnerability. In particular, the type of the
reported vulnerability and how it might be exploited. Alternatively,
a well-established vulnerability identifier, e.g. CVE number, can be
used instead.

Based on the description mentioned above, a MinIO engineer or security team
member investigates:

- Whether the reported vulnerability exists.
- The conditions that are required such that the vulnerability can be exploited.
- The steps required to fix the vulnerability.
- Whether the reported vulnerability exists.
- The conditions that are required such that the vulnerability can be exploited.
- The steps required to fix the vulnerability.

In general, if the vulnerability exists in one of the MinIO code bases
itself - not in a code dependency - then MinIO will, if possible, fix
the vulnerability or implement reasonable countermeasures such that the
vulnerability cannot be exploited anymore.

38 changes: 19 additions & 19 deletions docs/bigdata/README.md
@@ -12,12 +12,12 @@ MinIO also supports multi-cluster, multi-site federation similar to AWS regions

## **2. Prerequisites**

* Install Hortonworks Distribution using this [guide.](https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.1.0/bk_ambari-installation/content/ch_Installing_Ambari.html)
* [Setup Ambari](https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.1.0/bk_ambari-installation/content/set_up_the_ambari_server.html) which automatically sets up YARN
* [Installing Spark](https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/installing-spark/content/installing_spark.html)
* Install MinIO Distributed Server using one of the guides below.
* [Deployment based on Kubernetes](https://docs.min.io/docs/deploy-minio-on-kubernetes.html#minio-distributed-server-deployment)
* [Deployment based on MinIO Helm Chart](https://github.com/helm/charts/tree/master/stable/minio)
- Install Hortonworks Distribution using this [guide.](https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.1.0/bk_ambari-installation/content/ch_Installing_Ambari.html)
- [Setup Ambari](https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.1.0/bk_ambari-installation/content/set_up_the_ambari_server.html) which automatically sets up YARN
- [Installing Spark](https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/installing-spark/content/installing_spark.html)
- Install MinIO Distributed Server using one of the guides below.
- [Deployment based on Kubernetes](https://docs.min.io/docs/deploy-minio-on-kubernetes.html#minio-distributed-server-deployment)
- [Deployment based on MinIO Helm Chart](https://github.com/helm/charts/tree/master/stable/minio)

## **3. Configure Hadoop, Spark, Hive to use MinIO**

@@ -37,10 +37,10 @@ Navigate to **Custom core-site** to configure MinIO parameters for `_s3a_` conne

```
sudo pip install yq
alias kv-pairify='xq ".configuration[]" | jq ".[]" | jq -r ".name + \"=\" + .value"'
alias kv-pairify='yq ".configuration[]" | jq ".[]" | jq -r ".name + \"=\" + .value"'
```

Take, for example, a set of 12 compute nodes with an aggregate memory of *1.2TiB*; the following settings are needed for optimal results. Add the following optimal entries to _core-site.xml_ to configure _s3a_ with **MinIO**. The most important options here are
Take, for example, a set of 12 compute nodes with an aggregate memory of _1.2TiB_; the following settings are needed for optimal results. Add the following optimal entries to _core-site.xml_ to configure _s3a_ with **MinIO**. The most important options here are

```
cat ${HADOOP_CONF_DIR}/core-site.xml | kv-pairify | grep "mapred"
@@ -56,7 +56,7 @@ mapreduce.task.io.sort.factor=999 # Threshold before writing to disk
mapreduce.task.sort.spill.percent=0.9 # Minimum % before spilling to disk
```

S3A is the connector for using S3 and other S3-compatible object stores such as MinIO. MapReduce workloads typically interact with object stores in the same way they do with HDFS. These workloads rely on HDFS's atomic rename functionality to finish writing data to the datastore. Object storage operations are atomic by nature and do not require/implement a rename API. The default S3A committer emulates renames through copy and delete APIs; this interaction pattern causes a significant loss of performance because of write amplification. *Netflix*, for example, developed two new staging committers - the Directory staging committer and the Partitioned staging committer - to take full advantage of native object storage operations. These committers do not require a rename operation. The two staging committers were evaluated for benchmarking, along with another new addition called the Magic committer.
S3A is the connector for using S3 and other S3-compatible object stores such as MinIO. MapReduce workloads typically interact with object stores in the same way they do with HDFS. These workloads rely on HDFS's atomic rename functionality to finish writing data to the datastore. Object storage operations are atomic by nature and do not require/implement a rename API. The default S3A committer emulates renames through copy and delete APIs; this interaction pattern causes a significant loss of performance because of write amplification. _Netflix_, for example, developed two new staging committers - the Directory staging committer and the Partitioned staging committer - to take full advantage of native object storage operations. These committers do not require a rename operation. The two staging committers were evaluated for benchmarking, along with another new addition called the Magic committer.

It was found that the directory staging committer was the fastest among the three; the S3A connector should be configured with the following parameters for optimal results:

@@ -95,8 +95,8 @@ fs.s3a.threads.max=2048 # Maximum number of threads for S3A

The rest of the other optimization options are discussed in the links below

* [https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html)
* [https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/committers.html](https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/committers.html)
- [https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html)
- [https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/committers.html](https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/committers.html)

Once the config changes are applied, proceed to restart **Hadoop** services.

@@ -187,16 +187,16 @@ Test the Spark installation by running the following compute intensive example,

Follow these steps to run the Spark Pi example:

* Login as user **‘spark’**.
* When the job runs, the library can now use **MinIO** during intermediate processing.
* Navigate to a node with the Spark client and access the spark2-client directory:
- Login as user **‘spark’**.
- When the job runs, the library can now use **MinIO** during intermediate processing.
- Navigate to a node with the Spark client and access the spark2-client directory:

```
cd /usr/hdp/current/spark2-client
su spark
```

* Run the Apache Spark Pi job in yarn-client mode, using code from **org.apache.spark**:
- Run the Apache Spark Pi job in yarn-client mode, using code from **org.apache.spark**:

```
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
@@ -223,9 +223,9 @@ WordCount is a simple program that counts how often a word occurs in a text file

The following example submits WordCount code to the Scala shell. Select an input file for the Spark WordCount example. We can use any text file as input.

* Login as user **‘spark’**.
* When the job runs, the library can now use **MinIO** during intermediate processing.
* Navigate to a node with Spark client and access the spark2-client directory:
- Login as user **‘spark’**.
- When the job runs, the library can now use **MinIO** during intermediate processing.
- Navigate to a node with Spark client and access the spark2-client directory:

```
cd /usr/hdp/current/spark2-client
@@ -269,7 +269,7 @@ Type :help for more information.
scala>
```

* At the _scala>_ prompt, submit the job by typing the following commands, replacing node names, the file name, and the file location with your values:
- At the _scala>_ prompt, submit the job by typing the following commands, replacing node names, the file name, and the file location with your values:

```
scala> val file = sc.textFile("s3a://testbucket/testdata")
11 changes: 11 additions & 0 deletions docs/bucket/lifecycle/DESIGN.md
@@ -6,11 +6,13 @@ Transition tiers can be added to MinIO using `mc admin tier add` command to asso
Lifecycle transition rules can be applied to buckets (both versioned and un-versioned) by specifying the tier name defined above as the transition storage class for the lifecycle rule.
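
As a hedged illustration of those two steps (the alias, tier name, bucket names, and credentials below are placeholders, and flag names can vary between `mc` releases):

```
# register a remote S3-compatible bucket as a transition tier named WARM-TIER
mc admin tier add s3 myminio WARM-TIER \
  --endpoint https://s3.amazonaws.com \
  --access-key ACCESSKEY --secret-key SECRETKEY \
  --bucket remote-tier-bucket --prefix minio-tiered/

# transition objects older than 30 days in mybucket to that tier
mc ilm add myminio/mybucket --transition-days 30 --transition-tier WARM-TIER
```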

## Implementation

ILM tiering takes place when an object placed in the bucket meets the lifecycle transition rules and becomes eligible for tiering. The MinIO scanner (which runs at one-minute intervals, each time scanning one sixteenth of the namespace) picks up the object for tiering. The data is moved to the remote tier in its entirety, leaving only the object metadata on MinIO.

The data on the backend is stored under the `bucket/prefix` specified in the tier configuration, with a custom name derived from a randomly generated uuid - e.g. `0b/c4/0bc4fab7-2daf-4d2f-8e39-5c6c6fb7e2d3`. The two path prefixes are characters 1-2 and 3-4 of the uuid. This format allows tiering to any cloud, irrespective of whether the cloud in question supports versioning. The reference to the transitioned object name and the transitioned tier is stored as part of the internal metadata for the object (or its version) on MinIO.
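
A minimal sketch of how such a key can be derived from a uuid (illustrative only; it simply mirrors the prefixing scheme described above):

```
uuid=$(uuidgen | tr 'A-Z' 'a-z')        # e.g. 0bc4fab7-2daf-4d2f-8e39-5c6c6fb7e2d3
echo "${uuid:0:2}/${uuid:2:2}/${uuid}"  # -> 0b/c4/0bc4fab7-2daf-4d2f-8e39-5c6c6fb7e2d3
```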

Extra metadata maintained internally in `xl.meta` for a transitioned object

```
...
"MetaSys": {
@@ -21,6 +23,7 @@ Extra metadata maintained internally in `xl.meta` for a transitioned object
```

When a transitioned object is temporarily restored to the local MinIO instance via the PostRestoreObject API, the object data is copied back from the remote tier, and additional metadata for the restored object is maintained as referenced below. Once the restore period expires, the local copy of the object is removed by the scanner during its periodic runs.

```
...
"MetaUsr": {
@@ -29,16 +32,24 @@ When a transitioned object is restored temporarily to local MinIO instance via P
"x-amz-restore": "ongoing-request=false, expiry-date=Sat, 27 Feb 2021 00:00:00 GMT",
...
```
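
As a hedged example, such a restore can be triggered with any S3 client that supports the RestoreObject call; with the AWS CLI pointed at a MinIO endpoint it might look like the following (endpoint, bucket, key, and restore period are placeholders):

```
aws s3api restore-object \
  --endpoint-url https://minio.example.com:9000 \
  --bucket mybucket --key mydoc.pdf \
  --restore-request '{"Days": 7}'
```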

### Encrypted/Object locked objects

For objects under SSE-S3 or SSE-C encryption, the encrypted content from the MinIO cluster is copied as-is to the remote tier without any decryption. The content is decrypted as it is streamed from the remote tier on `GET/HEAD`. Objects under retention are protected because the metadata present on the MinIO server ensures that the object (version) is not deleted until the retention period is over. Administrators need to ensure that the remote tier bucket is under proper access control.

### Transition Status

The MinIO-specific extension header `X-Minio-Transition` is displayed on `HEAD/GET` to predict the expected transition date of the object. Once the object is transitioned to the remote tier, `x-amz-storage-class` shows the name of the tier to which the object transitioned. Additional headers such as "X-Amz-Restore-Expiry-Days", "x-amz-restore", and "X-Amz-Restore-Request-Date" are displayed when an object is under restore/has been restored to the local MinIO cluster.
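
A hedged way to inspect these headers is an S3 `HeadObject` call against the MinIO endpoint, for example with the AWS CLI (endpoint, bucket, and key below are placeholders):

```
aws s3api head-object \
  --endpoint-url https://minio.example.com:9000 \
  --bucket mybucket --key mydoc.pdf
```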

### Expiry or removal events

An object in the transition tier will be deleted once it reaches its expiry date, or if it is removed via `mc rm` (`mc rm --vid` in the case of deleting a specific object version). Other rules specific to legal hold and object locking precede any lifecycle rules.
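
For instance, removing one specific version of a transitioned object (alias, object name, and version id below are placeholders):

```
mc rm --vid "0bc4fab7-2daf-4d2f-8e39-5c6c6fb7e2d3" myminio/mybucket/mydoc.pdf
```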

### Additional notes

Tiering and lifecycle transition are applicable only to erasure/distributed MinIO.

## Explore Further

- [MinIO | Golang Client API Reference](https://docs.min.io/docs/golang-client-api-reference.html#SetBucketLifecycle)
- [Object Lifecycle Management](https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html)