examples: Examples of .htaccess and README.md #3600

Merged
merged 9 commits into from
Sep 28, 2023
191 changes: 191 additions & 0 deletions examples/README.md
@@ -0,0 +1,191 @@
# Examples

This permanent W3ID is meant to be a place to host examples for publishing things on w3id.org.

* [.htaccess](#htaccess)
  * [What is .htaccess?](#what-is-htaccess)
  * [Put ID info and maintainer info inside .htaccess](#put-id-info-and-maintainer-info-inside-htaccess)
  * [Quick intro to URL rewriting rules](#quick-intro-to-url-rewriting-rules)
  * [Example 1: Minimalist (grouping)](#example-1-minimalist-grouping)
  * [Example 2: Supporting multiple media types (MIME types)](#example-2-supporting-multiple-media-types-mime-types)
  * [Example 3: Dealing with query string](#example-3-dealing-with-query-string)
* [README.md](#readmemd)
* [Publish vocabularies with W3ID](#publish-vocabularies-with-w3id)

## .htaccess

The `.htaccess` file is the key to the URL redirection service of W3ID. Without it, redirection cannot be done.

### What is `.htaccess`?

From [Wikipedia](https://en.wikipedia.org/wiki/.htaccess):
> An .htaccess (hypertext access) file is a directory-level configuration file
supported by several web servers, used for configuration of website-access
issues, such as URL redirection, URL shortening, access control (for different
web pages and files), and more. The 'dot' (period or full stop) before the
file name makes it a hidden file in Unix-based environments.

In the W3ID context, it is used primarily for URL redirection. The `.htaccess` file is where you put URL rewriting rules. A set of URL rewriting rules works together to make URL redirection happen.

### Put ID info and maintainer info inside .htaccess

You are encouraged to put brief ID info and maintainer info in the comments
(lines starting with the `#` character) of a `.htaccess` file.

```ApacheConf
# Example
#
# https://w3id.org/example redirects to https://example.com/
#
# ## Contact
# This space is administered by:
#
# Firstname Secondname
# email@example.com
# GitHub username: xxx

RewriteEngine on
RewriteRule ^ https://example.com/ [R=302,L]
```

Review comments on the line `# https://w3id.org/example redirects to https://example.com/`:

@dgarijo (Collaborator), Sep 28, 2023:
> This is not correct. It should be https://w3id.org/examples . And right now we are missing the .htaccess file no?

@bact (Contributor Author), Sep 28, 2023:
> Thank you. I have fixed all the instances that mentioned https://w3id.org/example wrongly.
>
> In that example, I replace it with w3id.org/examples/simple instead.

### Quick intro to URL rewriting rules

A simple `.htaccess` file for URL redirection can look like this:
```ApacheConf
RewriteEngine on
RewriteRule ^ https://example.com/ [R=302,L]
```

`RewriteEngine on` in the first line tells the web server to turn the URL rewriting engine on.

The second line, starting with `RewriteRule`, is the actual rewriting rule.

This is the syntax of the *RewriteRule* directive:
```ApacheConf
RewriteRule Pattern Substitution [Flag1,Flag2,Flag3]
```

* *Pattern* is a Perl-compatible regular expression that describes the sequence of characters to match in the URL.
  * For example, `^` matches the beginning of the text, `$` matches the end of the text, `.` matches any single character ("a", "7", or any character, one time), and `*` repeats the previous match zero or more times (so `.*` matches "a", "7", "xyz42", and an empty string).
  * What the Pattern is compared against depends on where the RewriteRule directive is defined. In the W3ID context, where a per-directory `.htaccess` is used, if the full requested URL is `https://w3id.org/example/subdir/file.html`, the text compared against is `subdir/file.html`.
* *Substitution* is the string that replaces the text matched by Pattern. It can be a URL-path, to be combined with the hostname later, or a full (absolute) URL.
  * In the W3ID context, the Substitution tends to be an absolute URL to an external (non-w3id.org) resource.
* *Flags* set [special actions](https://httpd.apache.org/docs/current/rewrite/flags.html) to be performed. Flags are given as a comma-separated list surrounded by square brackets. They are optional. Common flags include `R` (redirect), `L` (last, stop processing the rule set), and `NE` (no character escape).
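
As a rough illustration of how the three parts fit together in a per-directory file, the sketch below assumes a hypothetical `/example/.htaccess` and a hypothetical target under `https://example.com/docs/`. Neither is taken from a real W3ID entry.

```ApacheConf
# Hypothetical /example/.htaccess (illustration only)
#
# Request:                  https://w3id.org/example/subdir/file.html
# Text seen by the Pattern: subdir/file.html
RewriteEngine on

# Pattern:      ^subdir/(.*)$  matches paths that start with "subdir/"
# Substitution: an absolute URL, with $1 holding the text captured by (.*)
# Flags:        R=302 issues a redirect, L stops processing further rules
RewriteRule ^subdir/(.*)$ https://example.com/docs/$1 [R=302,L]
```

Under these assumptions, the request above would be redirected to `https://example.com/docs/file.html`.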

So in our first example:
```ApacheConf
RewriteRule ^ https://example.com/ [R=302,L]
```
It means:
* *Pattern:* `^` -- match the beginning of the string (this always succeeds, as every string has a beginning)
* *Substitution:* `https://example.com/` -- replace the whole URL with https://example.com/
* *Flag1:* `R=302` -- issue an HTTP redirect, with [status code 302](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes), to the client
* *Flag2:* `L` -- stop processing the rule set

You can have more than one *RewriteRule* in a single `.htaccess` file.
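
Rules are evaluated from top to bottom, so more specific rules usually go before a catch-all rule. The following is a minimal sketch with hypothetical paths and target URLs (`docs` and `https://example.com/...` are placeholders, not part of any real W3ID entry):

```ApacheConf
RewriteEngine on

# More specific rule first: matches "docs" or "docs/" only
RewriteRule ^docs/?$ https://example.com/documentation/ [R=302,L]

# Catch-all rule last: matches every other request
RewriteRule ^ https://example.com/ [R=302,L]
```

Because both rules carry the `L` flag, a request that matches the first rule is redirected immediately and never reaches the catch-all.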

See [Apache HTTP Server documentation](https://httpd.apache.org/docs/current/rewrite/intro.html) for more details about *RewriteRule*, regular expressions, and other directives.

### Example 1: Minimalist (grouping)

These three lines of code from [/mircat/.htaccess](https://github.com/perma-id/w3id.org/blob/master/mircat/.htaccess) redirect `https://w3id.org/mircat/<ANYTHING>` to `https://fairsharing.github.io/mircat/<ANYTHING>`.

The URL rewriting rule for that is:
```ApacheConf
RewriteRule ^(.*)$ https://fairsharing.github.io/mircat/$1 [R=302,L]
```

A sequence of characters matched inside a pair of parentheses (a "group") is stored in memory and can be recalled by using the special character `$` followed by a group number (`$1`, `$2`, up to `$9`). Group numbers start from one.

So in the example above, every character between the beginning of the string (`^`) and the end of the string (`$`) will be stored in group one and can be recalled with the character sequence `$1`.

If the request URL is `https://w3id.org/mircat/subdir/`, the *Pattern* `^(.*)$` will match the whole `subdir/`. As `.*` is inside parentheses, `subdir/` will be stored in group one.

When the *Substitution* `https://fairsharing.github.io/mircat/$1` is evaluated, `$1` will be replaced by `subdir/`, resulting in the string `https://fairsharing.github.io/mircat/subdir/`. This final string will be used for the redirection.
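
The same mechanism extends to more than one group. The sketch below is hypothetical (the path layout and the `https://example.com/` target are assumptions made for illustration); it captures the first path segment and the remainder separately and rearranges them in the target URL:

```ApacheConf
RewriteEngine on

# Group 1: ([^/]+)  the first path segment, e.g. "v1"
# Group 2: (.*)     everything after the first slash, e.g. "terms.html"
RewriteRule ^([^/]+)/(.*)$ https://example.com/$1/docs/$2 [R=302,L]
```

With this rule, a request whose matched text is `v1/terms.html` would be redirected to `https://example.com/v1/docs/terms.html`.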


### Example 2: Supporting multiple media types (MIME types)

A web server can be configured to return a different media type or file format depending on the client's request or capability.

[/ppop/.htaccess](https://github.com/perma-id/w3id.org/blob/master/ppop/.htaccess) demonstrates the use of `RewriteCond %{HTTP_ACCEPT}` to check which [media types](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_Types) the client accepts or expects to be returned by the server.

A simplified version of the URL rewriting rules in that file will look like this:
```ApacheConf
AddType text/turtle .ttl

RewriteEngine on

# Rewrite rule to serve HTML content
RewriteCond %{HTTP_ACCEPT} text/html
RewriteRule ^$ https://protect.oeg.fi.upm.es/ppop/ppop.html [R=303,L]

# Rewrite rule to serve TTL content
RewriteCond %{HTTP_ACCEPT} text/turtle
RewriteRule ^$ https://protect.oeg.fi.upm.es/ppop/ppop.ttl [R=303,L]
```

The rule set utilizes the *RewriteCond* directive, which "defines a rule condition. One or more RewriteCond can precede a RewriteRule directive. The following rule is then only used if both the current state of the URI matches its pattern, and if these conditions are met" ([Apache HTTP Server documentation](https://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritecond)).

The above example will have this behavior:
* If `%{HTTP_ACCEPT}` matches `text/html`, the server will redirect the client to an HTML document (`ppop.html`)
* If `%{HTTP_ACCEPT}` matches `text/turtle`, the server will redirect the client to a Turtle document (`ppop.ttl`)

The syntax of the `RewriteCond` directive is:
```ApacheConf
RewriteCond TestString CondPattern [Flags]
```

Where *CondPattern* is a regular expression to match against *TestString*.

In this case, *TestString* is `%{HTTP_ACCEPT}`, whose value is taken from the `Accept` field in the HTTP request header. The `Accept` field can be a string like this:

```HTTP
Accept: text/html, application/xhtml+xml, application/xml, image/webp
```

The media types are listed, separated by commas. With that, we can use *CondPattern* to match media types in this string.
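
A client may also send an `Accept` value that matches none of the conditions (for example `*/*`), so it can help to end the rule set with an unconditional fallback. The sketch below is an assumption-laden illustration: the `https://example.com/vocab/...` URLs are placeholders, and the choice of HTML as the fallback is only one possible design.

```ApacheConf
RewriteEngine on

# Turtle, when the Accept header mentions text/turtle
RewriteCond %{HTTP_ACCEPT} text/turtle
RewriteRule ^$ https://example.com/vocab/vocab.ttl [R=303,L]

# RDF/XML; "+" is a regex quantifier, so it is escaped as "\+"
RewriteCond %{HTTP_ACCEPT} application/rdf\+xml
RewriteRule ^$ https://example.com/vocab/vocab.rdf [R=303,L]

# Fallback without any RewriteCond: used when no rule above matched
RewriteRule ^$ https://example.com/vocab/vocab.html [R=303,L]
```

Note that a `RewriteCond` applies only to the `RewriteRule` that immediately follows it, which is why each condition is paired with its own rule.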


### Example 3: Dealing with query string

Everything after the question mark (`?`) in the URL, but not that `?` itself, is a query string.

For example, for the URL `https://en.wikipedia.org/w/index.php?title=Web`, the query string is `title=Web`.

As the query string is not included in the string that the *Pattern* of *RewriteRule* is compared against, you cannot use *Pattern* to match it.

To match a pattern in the query string, use `%{QUERY_STRING}` as a *TestString* in *RewriteCond*.

As an example, if you would like to redirect the URL `https://w3id.org/example?a=1&b=2` to `https://example.com/path/file.php?a=1&b=2`, you can use this set of rules:

```ApacheConf
RewriteCond %{QUERY_STRING} (.*)
RewriteRule ^ https://example.com/path/file.php?%1? [R=302,L]
```

In this example, the *CondPattern* `(.*)` in *RewriteCond* will match every character in the query string (`a=1&b=2`) and put it in group one. This group one (`%1`) is later used in the *Substitution* of *RewriteRule*.

Note that the special character to recall groups from *CondPattern* of *RewriteCond* is `%` (unlike the special character to recall groups from *Pattern* of *RewriteRule*, which is `$`).

The character `?` at the end of the *Substitution* of *RewriteRule* tells the server not to pass the original query string on to the final URL after the rewrite.
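
The same approach can pick out a single parameter instead of the whole query string. The following is a minimal sketch; the parameter name `id` and the target `https://example.com/item/` are hypothetical and chosen only for illustration.

```ApacheConf
RewriteEngine on

# Group 1: (^|&)     start of the query string or a preceding "&"
# Group 2: ([^&]+)   the value of the "id" parameter
RewriteCond %{QUERY_STRING} (^|&)id=([^&]+)

# %2 recalls group 2 of the CondPattern; the trailing "?" drops
# the original query string from the final URL
RewriteRule ^ https://example.com/item/%2? [R=302,L]
```

Under these assumptions, a request such as `https://w3id.org/example?id=42&lang=en` should be redirected to `https://example.com/item/42`.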


## README.md

Each ID hosted on W3ID is expected to have a file named `README.md` containing information about the ID itself and about the maintainer(s). This can be more elaborate than the information inside `.htaccess`.

The `.md` file extension at the end of the file name indicates that the file uses the Markdown markup language for text formatting. You can have tables and images as well, if needed.

GitHub will automatically display the content of a `README.md` to repository visitors.

An example of a good README file: [w3id.org/dggs/README.md](https://github.com/perma-id/w3id.org/blob/master/dggs/README.md)


## Publish vocabularies with W3ID

If you plan to publish a vocabulary/ontology with W3ID,
see https://w3id.org/examples/ontology.

Review comments on the link above:

Collaborator:
> Wouldn't it be better to link to /example/ directly? The one in /examples is a redirect.

Contributor Author:
> Thank you. That's better. I have changed that accordingly.

16 changes: 16 additions & 0 deletions examples/ontology/.htaccess
@@ -0,0 +1,16 @@
# Example repository
#
# This permanent w3id is meant to showcase an example on
# how to publish vocabularies with W3ID.
#
# https://w3id.org/examples/ontology redirects to
# https://w3id.org/example/
#
# ## Contact
# This space is administered by:
#
# Daniel Garijo
# GitHub username: dgarijo

RewriteEngine on
RewriteRule ^ https://w3id.org/example/ [R=303,L]
8 changes: 8 additions & 0 deletions examples/ontology/README.md
@@ -0,0 +1,8 @@
## Example repository

This permanent w3id is meant to showcase an example on how to publish vocabularies with w3id.

https://w3id.org/examples/ontology redirects to https://w3id.org/example/


Maintainer: Daniel Garijo (@dgarijo)