# Web Services

## Topics

Theory:
- What are web services?
- Different types of web services

Demo:
- Demo for setting up minimal web API using ASP.NET
- Deeper look into ASP.NET
- Controllers in ASP.NET


## What are Web Services

In short: a web-based application that returns information that is *typically* intended to be consumed by another application.

For a deeper understanding, however, it is worth dissecting the term.

### What is Web

The Web is an ecosystem of standards and technologies built mainly on the HTTP(S) protocol.

The Web is based on a client-server architecture[^1]:
1. The client opens a TCP connection and sends a request to the server.
2. The server responds to the request.

[^1]: HTTP/2 mixed things up a bit by introducing server push, which allows the server to send resources to the client once the connection is established.

The most common type of web client is a web browser. Whenever you type an address into a web browser, it sends an HTTP `GET` request to that server.

### How to define the "service"

[Cambridge dictionary](https://dictionary.cambridge.org/dictionary/english/service) defines *service* as: “a government system or private organization that is responsible for a particular type of activity, or for providing a particular thing that people need”.

Looking at the definition from a systems perspective, we can say a service is something that either:
- Gives us something that we want.
- Performs some action that we want.

### Web-service

A web service can be loosely defined as a program that works over HTTP(S) and can be interacted with using machine-readable content types (formats).

The emphasis is on machine-readable. What differentiates websites from web services is that a website is intended to be interacted with by humans, while a web service is intended to be interacted with by machines.

### URL structure

Web services have endpoints, which are defined by URLs.

A URL identifies a specific resource and describes how to access it. Any website address is a URL.

A sample URL, `https://github.com/smagurauskas/software-engineering?something=maybe`, can be deconstructed into the following parts:

1. `https://` which denotes the protocol used for communication, in this case it is `https`.
2. `github.com` which denotes domain or a host. It can be further divided into `com` being the top level domain (TLD), `github` being second level domain and so on.
3. `smagurauskas/software-engineering` which denotes the path.
4. `something=maybe` which denotes the query parameters. Query parameters act as key-value pairs, where `something` is the key, and `maybe` is the value. Multiple query parameters can be provided by chaining them with the `&`.
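
The same decomposition can be reproduced in code. The following is a minimal sketch that uses the built-in `System.Uri` class on the sample URL from above; the manual query-string splitting is a simplification for illustration and ignores details such as URL encoding.

```csharp
using System;

// The sample URL from the text, parsed with the built-in System.Uri class.
var uri = new Uri("https://github.com/smagurauskas/software-engineering?something=maybe");

Console.WriteLine(uri.Scheme);       // https
Console.WriteLine(uri.Host);         // github.com
Console.WriteLine(uri.AbsolutePath); // /smagurauskas/software-engineering
Console.WriteLine(uri.Query);        // ?something=maybe

// Split the query string into key-value pairs by hand:
// drop the leading '?', split on '&', then split each pair on '='.
foreach (var pair in uri.Query.TrimStart('?').Split('&'))
{
    var parts = pair.Split('=', 2);
    Console.WriteLine($"{parts[0]} -> {parts[1]}"); // something -> maybe
}
```

Note that `Uri.Query` keeps the leading `?`, which is why it is trimmed before splitting.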

The generic URI syntax, which URLs follow, is defined in [RFC 3986](https://www.rfc-editor.org/rfc/rfc3986).

### HTTP

HTTP stands for HyperText Transfer Protocol.

HTTP is an application layer protocol that traditionally runs on top of the TCP communications protocol (HTTP/3 runs over QUIC instead).

HTTP defines a structured way to request data from the server and the rules for how the server should respond.

An HTTP request is made by specifying the request method, path, query parameters and various accompanying headers.

Methods (also called *verbs*) identify what action should be performed on the specific resource.

Headers provide additional information, such as how the request should be interpreted (read), how the server should respond, authorization information and much more.

Query parameters pass additional information for handling the request.

Some HTTP methods, for example `POST`, can have request *bodies* (payloads). Bodies make it possible to transfer larger amounts of data than would fit into the URL or headers. Data such as form inputs is passed via the HTTP request body.
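
To make these parts concrete, here is a small sketch that assembles (but does not send) a request with `HttpRequestMessage`. The host `example.com`, the `/courses` path, the `notify` parameter and the JSON payload are all made-up placeholders.

```csharp
using System;
using System.Net.Http;
using System.Text;

// Method + URL (path and query). example.com is a placeholder host.
var request = new HttpRequestMessage(HttpMethod.Post, "https://example.com/courses?notify=true");

// Header: tells the server which response format the client prefers.
request.Headers.Add("Accept", "application/json");

// Body: the payload, which carries its own Content-Type header.
request.Content = new StringContent(
    "{\"name\": \"Software Engineering\"}", Encoding.UTF8, "application/json");

Console.WriteLine(request.Method);                                 // POST
Console.WriteLine(request.RequestUri?.PathAndQuery);               // /courses?notify=true
Console.WriteLine(request.Content.Headers.ContentType?.MediaType); // application/json
```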

### Sample HTTP request

An HTTP request in plain text looks like this:

```text
GET /smagurauskas/software-engineering HTTP/1.1
Host: github.com
Accept: text/html
```

In practice software engineers almost never form requests manually, but rather use an HTTP library which abstracts most of the internals.

A good HTTP library lets you work with high-level code and assembles the HTTP requests based on the input.

An example of how the simplest HTTP `GET` request is formed in C# using the `HttpClient` class:
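```csharp
using System.Net.Http;

var httpClient = new HttpClient();

// GetAsync returns a Task, so it is awaited to obtain the actual response.
var response = await httpClient.GetAsync("https://github.com/smagurauskas/software-engineering");
```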

### HTTP methods

HTTP uses methods (aka *verbs*) to identify the action that should be performed on the URL. Each method has a canonical meaning that describes what it should do.

For example, HTTP `GET` is (and should) only be used to fetch data. HTTP `POST`, on the other hand, is used to transfer data or create a resource.

Some of the most frequently used methods are:
- `GET` - the "default" method (at least for browsers) to retrieve the resource.
- `POST` - method for creating a resource or invoking a command. Has a body where the payload can be transferred.
- `PUT` - method for replacing or updating the resource. Has a body, similarly to `POST`.
- `DELETE` - method for deleting the resource.
- `OPTIONS` - describes the communication options for a resource; used by browsers for CORS preflight requests. [See more about CORS.](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS)

There are various other methods, each of which has its canonical meaning and use case. Read more about them in [RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-methods).

HTTP methods can be further classified as *safe* and *idempotent*. Safe methods do not change the state of the system; the primary example is `GET`. Idempotent methods are methods for which sending the same request several times has the same effect on the server as sending it once (assuming nothing else changes the state). All safe methods are idempotent, and so are `PUT` and `DELETE`, even though they do change the state. `POST` is not idempotent: repeating it may, for example, create a new resource every time.
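
The effect of repeating a request can be illustrated without any networking. The sketch below models server state as an in-memory dictionary; the `Post`, `Put` and `Delete` functions are hypothetical stand-ins for request handlers, not part of any framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory "server state": course id -> course name.
var courses = new Dictionary<int, string>();
var nextId = 1;

// POST-like: creates a new resource on every call (not idempotent).
int Post(string name) { courses[nextId] = name; return nextId++; }

// PUT-like: stores the value under a known id (idempotent).
void Put(int id, string name) => courses[id] = name;

// DELETE-like: removing twice leaves the same state as removing once (idempotent).
void Delete(int id) => courses.Remove(id);

Post("Web Services");
Post("Web Services");
Console.WriteLine(courses.Count); // 2: the repeated POST created a second resource

Put(10, "APIs");
Put(10, "APIs");
Console.WriteLine(courses.Count); // 3: the repeated PUT changed nothing further

Delete(10);
Delete(10);
Console.WriteLine(courses.Count); // 2: the repeated DELETE changed nothing further
```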

### HTTPS

HTTPS is an extension of HTTP with added Transport Layer Security (TLS), which encrypts messages in transit so that only the intended receiver can decrypt them. If another party were to see the request in transit, it would not be able to make any sense of it.

You can see that the service uses HTTPS by looking at the protocol in the URL: it should contain `https://` 🙃.

HTTPS relies on Certificate Authorities (CAs) for issuing certificates. A well known CA issues a certificate for a website, and the client can verify that the certificate was really signed by that CA. This model relies on the notion that there is only a limited number of CAs and an unlimited number of websites. Operating systems *typically* come bundled with a predefined list of trusted CAs. An HTTPS client can then check locally whether the certificate provided by the server is correctly signed by one of those CAs.

[Read more on SSL/TLS here.](https://security.stackexchange.com/questions/20803/how-does-ssl-tls-work/20833#20833)

In the past, because there were only a limited number of well known CAs, getting an HTTPS certificate used to be quite expensive. Currently there are non-profit CAs like [Let's Encrypt](https://letsencrypt.org/) which issue certificates for free.

Because certificates are so easy to obtain nowadays, it is considered bad practice not to run a production system on HTTPS.

### HTTP Content types

HTTP uses the `Content-Type` header to identify the media format of a request or response body.

Two large groups of media formats can be highlighted:
- `text` - human readable content types.
- `application` - machine readable content types.

Value of `Content-Type` header is called media type or MIME type. MIME stands for Multipurpose Internet Mail Extensions. MIME types are defined in [RFC 6838](https://datatracker.ietf.org/doc/html/rfc6838).

The `Content-Type` header value has a structure of `type/subtype`. It may additionally be followed by parameters after a `;`, for example `text/html; charset=utf-8`.
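
As a small illustration of that structure, the sketch below parses a header value with `MediaTypeHeaderValue` from `System.Net.Http.Headers`. The value `text/html; charset=utf-8` is just an example input.

```csharp
using System;
using System.Net.Http.Headers;

var contentType = MediaTypeHeaderValue.Parse("text/html; charset=utf-8");

Console.WriteLine(contentType.MediaType); // text/html
Console.WriteLine(contentType.CharSet);   // utf-8

// Type and subtype are the two halves of the media type.
var halves = (contentType.MediaType ?? "").Split('/');
Console.WriteLine(halves[0]); // text
Console.WriteLine(halves[1]); // html
```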

`type` generally indicates what kind of content the message is going to contain. Among other types there are such as `text` and `application`.

`text` type messages are intended to be consumed by humans. They may not be directly readable by humans, as is the case with `text/html`, but the intent is that such content is rendered (by the browser in the case of HTML) and then consumed by humans.

`application` types are structured so that they can be consumed by other applications. For `application` media types there are usually dedicated serialization and deserialization algorithms that transform serialized input text into native language objects with the corresponding values.

Even though the end consumers of `application` media formats are supposed to be other applications, that does not mean that a human cannot read or make sense of them. Formats such as `xml` or `json` can typically be read by humans very well; they are simply also structured strictly enough to be easily parsed by computers.

[More on MIME types](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types).

### Machine readable formats

Although there are [multiple `application` types](https://www.iana.org/assignments/media-types/media-types.xhtml#application), currently the most common ones are `application/json`, `application/xml` and `application/yaml`.

Of these the most popular by far is `json`. Generally `json` is an optimal choice for most cases.

See more in [tag popularity in stack overflow questions](https://trends.stackoverflow.co/?tags=json,xml,yaml).

#### XML

XML stands for eXtensible Markup Language.

It is the standard format for Web API protocols such as SOAP. With the fall in popularity of SOAP and related protocols, the popularity of XML fell as well. XML is currently not popular for new developments.

The fall in popularity also coincides with XML not being easily deserializable into typical object oriented languages because of its attribute structure. There is typically some ambiguity in how XML attributes should be deserialized by default.

XML also has accompanying standards like XSLT, which transforms XML documents into different ones, and XSD, which defines a schema against which an XML document can be validated.

Sample `xml`:

```xml
<Courses>
    <Course name="Software Engineering" description="...">
        <Subject>Web Services</Subject>
        <Subject>APIs</Subject>
    </Course>
</Courses>
```
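
A minimal sketch of reading the sample above with LINQ to XML (`System.Xml.Linq`) follows. It is only meant to show that data stored in attributes and data stored in child elements are read through different calls, which hints at the mapping ambiguity mentioned earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

var xml = @"<Courses>
    <Course name=""Software Engineering"" description=""..."">
        <Subject>Web Services</Subject>
        <Subject>APIs</Subject>
    </Course>
</Courses>";

var lines = new List<string>();
foreach (var course in XElement.Parse(xml).Elements("Course"))
{
    // The course name lives in an attribute...
    var name = course.Attribute("name")?.Value;
    // ...while the subjects live in child elements.
    var subjects = course.Elements("Subject").Select(s => s.Value);
    lines.Add($"{name}: {string.Join(", ", subjects)}");
}

Console.WriteLine(lines[0]); // Software Engineering: Web Services, APIs
```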

#### JSON

JSON stands for JavaScript Object Notation.

`json` is native to the JavaScript language, meaning that the serialized content can be pasted directly into a JS script and it will work.

`json` grew in popularity together with the rise of JavaScript. JavaScript is embedded into browsers, which use it to support interactive behaviours. It is almost impossible to develop a web application with a high level of interactivity without using JavaScript.

JavaScript is also used for server side development via runtimes like `node`. The availability of both client side and server side development, along with other factors, led to huge growth in JavaScript popularity, which in turn grew the popularity of `json`.

Due to its simplicity, `json` is pretty easy to parse and does not create much mental overhead.

Sample `json`:
```json
{
    "Courses": 
    [
        { 
            "Name": "Software Engineering",
            "Subjects": 
            [
                "Web Services",
                "APIs"
            ]
        }
    ]
}
```
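
Reading the same structure in C# could look like the sketch below. It uses `JsonDocument` from `System.Text.Json` to walk the document without declaring any classes; deserializing into typed objects with `JsonSerializer` is the other common option.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

var json = @"{
    ""Courses"": [
        { ""Name"": ""Software Engineering"", ""Subjects"": [""Web Services"", ""APIs""] }
    ]
}";

using var document = JsonDocument.Parse(json);

var lines = new List<string>();
foreach (var course in document.RootElement.GetProperty("Courses").EnumerateArray())
{
    var name = course.GetProperty("Name").GetString();
    var subjects = course.GetProperty("Subjects").EnumerateArray().Select(s => s.GetString());
    lines.Add($"{name}: {string.Join(", ", subjects)}");
}

Console.WriteLine(lines[0]); // Software Engineering: Web Services, APIs
```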

### What is an API

API stands for Application Programming Interface.

Web services are considered to be APIs, but the term API is not limited to web services.

The term API is not tied to the web in any way, but the two are used almost interchangeably. If you say "I am using the API of {something}", most people will automatically assume you are using some kind of web API.

APIs can carry data via mechanisms other than the Web.

An example of a non-Web API could be IPC (inter-process communication) mechanisms like named pipes and memory mapped files, where one process writes to a file that is stored in memory and other processes can read from that file.

There is a good Stack Overflow answer on how APIs relate to web services:

> An API (Application Programming Interface) is the means by which third parties can write code that interfaces with other code. A Web Service is a type of API, one that almost always operates over HTTP (though some, like SOAP, can use alternate transports, like SMTP). The official W3C definition mentions that Web Services don't necessarily use HTTP, but this is almost always the case and is usually assumed unless mentioned otherwise.

> For examples of web services specifically, see SOAP, REST, and XML-RPC. For an example of another type of API, one written in C for use on a local machine, see the Linux Kernel API.

> As far as the protocol goes, a Web service API almost always uses HTTP (hence the Web part), and definitely involves communication over a network. APIs in general can use any means of communication they wish. The Linux kernel API, for example, uses Interrupts to invoke the system calls that comprise its API for calls from user space.

[https://stackoverflow.com/questions/808421/api-vs-webservice/808467#808467](https://stackoverflow.com/questions/808421/api-vs-webservice/808467#808467).

## Common Web Service Architectures

Web services are sometimes identified by their architectural pattern. The goal of a web service architecture is to define the constraints against which web services are modeled.

The term "web service architecture" can mostly be used interchangeably with "API architecture".

Not every web service follows a web service architecture. Some web services can be very simplistic and just use the convenient parts of the HTTP standards. Not having any specific architecture is not inherently a problem. Some business cases are simple enough that they do not require any sophisticated API design.

The term "API architecture" is not definitive, and it can be (and is) used interchangeably with the terms "protocol" or "standard". In this context they all refer to the same thing.

All web service architectures are making some implicit or explicit trade-offs, for example prioritizing speed of API server development vs speed of API client development.

Web service architectures provide guidelines on how the API should allow users to interact with business logic and what the requests or responses should look like.

#### REST

REST is an acronym that stands for **Re**presentational **S**tate **T**ransfer. REST is a stateless web service architecture that makes heavy use of the HTTP protocol.

REST was originally defined in [Roy Thomas Fielding's dissertation](https://ics.uci.edu/~fielding/pubs/dissertation/fielding_dissertation.pdf) in 2000.

REST is best described by the "uniform interface" it is supposed to provide:
- Resources are identified by their URIs.
- HTTP standard is used to describe communication and actions.
- Resources representations are uncoupled from their internal representation.
- All the related resources must be navigable from any resource.

##### Resources are identified by their URIs

URIs fully describe the resources, including the protocol, location and the resources themselves. In REST every individual resource must have a URI that allows interacting with it.

In practice that is usually just a URL, e.g. `https://mif.vu.lt/location#resource`. Strictly speaking, under the original URL definition the final part of the example string (`#resource`) is not part of the URL, while it is part of the URI.

For example, if the study program has 10 courses, then every single one of the courses should have a URI that identifies exactly that course, e.g. `https://mif.vu.lt/software-engineering/se-1`.

[URL vs URN vs URI](https://www.pierobon.org/iis/url.htm).

##### HTTP standard is used to describe communication and actions

HTTP verbs are used to define the action on the requested resource.

Meaning that:
- `GET https://mif.vu.lt/software-engineering/se-1` - should return the representation of resource.
- `DELETE https://mif.vu.lt/software-engineering/se-1` - should delete the resource.
- etc.
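
The idea that one URI supports several actions, selected by the verb, can be sketched with a toy handler. The `Handle` function, the in-memory `resources` dictionary and the chosen status lines are hypothetical and only illustrate the dispatch; a real service would rely on a web framework for this.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resource store keyed by path.
var resources = new Dictionary<string, string>
{
    ["/software-engineering/se-1"] = "Software Engineering I"
};

// The same path is handled differently depending on the HTTP method.
string Handle(string method, string path) => method switch
{
    "GET" when resources.ContainsKey(path) => $"200 OK: {resources[path]}",
    "GET" => "404 Not Found",
    "DELETE" => resources.Remove(path) ? "204 No Content" : "404 Not Found",
    _ => "405 Method Not Allowed"
};

var first = Handle("GET", "/software-engineering/se-1");
var second = Handle("DELETE", "/software-engineering/se-1");
var third = Handle("GET", "/software-engineering/se-1");

Console.WriteLine(first);  // 200 OK: Software Engineering I
Console.WriteLine(second); // 204 No Content
Console.WriteLine(third);  // 404 Not Found
```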

##### Resources representations are uncoupled from their internal representation

In practice this means that the server must return a proper `Content-Type` header that explains to the client how to parse the message.

Analogously, the client can request a different content type via the `Accept` header, and that should also be *fundamentally* supported by the server. *Fundamentally* in this case does not mean that the server must support every niche media format requested in the `Accept` header; it means that the implementation is decoupled in such a way that serving another representation is possible.
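
A rough sketch of this decoupling: the internal data stays the same, and the wire format is chosen from the `Accept` value. The `Render` function and the `name`/`credits` values are invented for illustration, and the exact-match lookup is a simplification (real `Accept` headers may list several types with quality weights).

```csharp
using System;

// Internal representation (hypothetical): plain values, no wire format.
var name = "Software Engineering";
var credits = 5;

// Picks a representation based on the Accept header value; JSON is the fallback.
(string ContentType, string Body) Render(string accept) => accept switch
{
    "application/xml" => ("application/xml", $"<Course name=\"{name}\" credits=\"{credits}\" />"),
    _ => ("application/json", $"{{\"name\": \"{name}\", \"credits\": {credits}}}")
};

var asJson = Render("application/json");
var asXml = Render("application/xml");

Console.WriteLine($"{asJson.ContentType}: {asJson.Body}");
Console.WriteLine($"{asXml.ContentType}: {asXml.Body}");
```

The returned `ContentType` is what the server would place in the `Content-Type` response header.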

##### All the related resources must be navigable from any resource

This is the most complicated constraint of the uniform interface. Engineers tend to avoid this part because of the complexity of its implementation; however, Fielding highlighted that it is an essential part of REST.

In practice it means that related resources should be linked via their URIs in the representation:
```json
{
    "links": {
        "self": "https://domain/account/1",
        "next": "https://domain/account/2"
    },
    "account": {
        "owner": "Person name",
        "account_number": 123,
        "links": {
            "transfers": "https://domain/account/1/transfers",
            "withdrawals": "https://domain/account/1/withdrawals"
        }
    }
}
```
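
From the client's point of view, navigation then means reading URIs out of the representation instead of constructing them. A minimal sketch with `JsonDocument`, using the same document as above:

```csharp
using System;
using System.Text.Json;

var json = @"{
    ""links"": { ""self"": ""https://domain/account/1"", ""next"": ""https://domain/account/2"" },
    ""account"": {
        ""owner"": ""Person name"",
        ""account_number"": 123,
        ""links"": {
            ""transfers"": ""https://domain/account/1/transfers"",
            ""withdrawals"": ""https://domain/account/1/withdrawals""
        }
    }
}";

using var document = JsonDocument.Parse(json);
var root = document.RootElement;

// The client discovers URIs from the representation instead of hard-coding them.
var next = root.GetProperty("links").GetProperty("next").GetString();
var transfers = root.GetProperty("account").GetProperty("links").GetProperty("transfers").GetString();

Console.WriteLine(next);      // https://domain/account/2
Console.WriteLine(transfers); // https://domain/account/1/transfers
```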

#### GraphQL

GraphQL is a query language for APIs.

The biggest advantage of GraphQL is that it lets the client explicitly request the properties it wants, following the full navigational path through the graph. For example, given this request:

```gql
{
    hero {
        name
    }
}
```

the response would only include:
```json
{
    "data": {
        "hero": {
            "name": "R2-D2"
        }
    }
}
```

but that does not mean that `hero` only has a `name`. A hero can have many more properties, but only the requested ones are returned. This provides a lot of flexibility on both the client and the server side.

GraphQL is typically served over HTTP. The exact protocol for doing so is not fully defined yet, but a draft is in the works at https://github.com/graphql/graphql-over-http.

To change the resources GraphQL uses "mutations", which are defined in very similar syntax to queries.

https://studio.apollographql.com/public/star-wars-swapi/variant/current/explorer provides a nice playground to try out GraphQL and see how it works.
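
When GraphQL is served over HTTP, a common convention is to send the query as a string inside a JSON body of a `POST` request. The sketch below only builds such a request and does not send it; `https://example.com/graphql` is a placeholder endpoint, not a real service.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

// The query travels as a string inside a JSON body.
var payload = JsonSerializer.Serialize(new { query = "{ hero { name } }" });

// https://example.com/graphql is a placeholder endpoint.
var request = new HttpRequestMessage(HttpMethod.Post, "https://example.com/graphql")
{
    Content = new StringContent(payload, Encoding.UTF8, "application/json")
};

Console.WriteLine(payload); // {"query":"{ hero { name } }"}
```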

##### N+1 problem

The biggest concern with GraphQL is that it shifts the N+1 problem from the client side to the server side.

In essence the N+1 problem means that if the main resource (for instance `movie`) is related to 5 other resources (for instance `actor`), then resolving it naively results in 6 (1 + 5) queries. It is easy to see how the number of queries in a GraphQL implementation can explode when such relations are nested several levels deep.

To work around this problem, GraphQL frameworks typically have Batch Loaders or similar capabilities to batch requests against the data source. This *usually* requires more upfront development effort on the API side, but potentially saves overall effort over the course of product development.
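
The difference between naive and batched resolution can be sketched by counting round trips to a fake data source. All names below (`LoadActorIds`, `LoadActor`, `LoadActorsBatch`, the dictionaries) are hypothetical; actual batch loaders in GraphQL frameworks are more involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data source: movie id -> actor ids, actor id -> name.
var movieActors = new Dictionary<int, int[]> { [1] = new[] { 10, 11, 12, 13, 14 } };
var actors = new Dictionary<int, string>
{
    [10] = "A", [11] = "B", [12] = "C", [13] = "D", [14] = "E"
};
var queryCount = 0;

// Each of these functions stands in for one round trip to the data source.
int[] LoadActorIds(int movieId) { queryCount++; return movieActors[movieId]; }
string LoadActor(int id) { queryCount++; return actors[id]; }
List<string> LoadActorsBatch(int[] ids) { queryCount++; return ids.Select(id => actors[id]).ToList(); }

// Naive resolution: 1 query for the movie + 1 query per actor.
var naive = LoadActorIds(1).Select(LoadActor).ToList();
var naiveQueries = queryCount;
Console.WriteLine(naiveQueries); // 6

// Batched resolution: 1 query for the movie + 1 query for all actors.
queryCount = 0;
var batched = LoadActorsBatch(LoadActorIds(1));
var batchedQueries = queryCount;
Console.WriteLine(batchedQueries); // 2
```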

#### SOAP

SOAP stands for **S**imple **O**bject **A**ccess **P**rotocol.

SOAP is an old API architecture that is still running in multiple legacy systems, but hardly any development is happening with it.

SOAP usually uses WSDL (Web Services Description Language) to describe its services and facilitate code generation.

By its standard, SOAP is based on XML and has a very specific request-response structure.

#### gRPC

gRPC is a remote procedure call framework (hence the RPC). gRPC uses protocol buffers (`.proto` files) to define the interfaces. gRPC also allows bidirectional streaming.

As seen in https://grpc.io/docs/what-is-grpc/core-concepts/, gRPC allows defining four kinds of service methods:

Unary:
`rpc SayHello(HelloRequest) returns (HelloResponse);`

Server streaming:
`rpc LotsOfReplies(HelloRequest) returns (stream HelloResponse);`

Client streaming:
`rpc LotsOfGreetings(stream HelloRequest) returns (HelloResponse);`

Bidirectional streaming:
`rpc BidiHello(stream HelloRequest) returns (stream HelloResponse);`

`.proto` file example from https://learn.microsoft.com/en-us/aspnet/core/grpc :

```text
syntax = "proto3";

service Greeter {
  rpc SayHello (HelloRequest) returns (HelloReply);
}

message HelloRequest {
  string name = 1;
}

message HelloReply {
  string message = 1;
}
```

## Summary

A web service is a program that:

1. Works over the HTTP(S) protocol.
2. Exposes an interface that uses machine-readable content types and is intended to be used by other programs.
3. Provides some kind of service.

Web services can be modelled in many different ways, which are called "architectures".

### Further reading
- [On misconceptions of what the REST is and is not - https://twobithistory.org/2020/06/28/rest.html](https://twobithistory.org/2020/06/28/rest.html).