# Cloud Computing Systems – 2024/25

## Project Assignment #1

Version: 29-10-2024, changes: removed Spark Computations. 

## Deadlines

+ Code: November 3
+ Report: November 10
+ 1 grade point penalty per day late.

## Introduction

The goal of this project is to understand how services available in cloud computing platforms can be used for creating applications that are scalable, fast, and highly available.

The project will consist of porting an existing web application to the Microsoft Azure Cloud platform. To that end, the provided centralized solution will need to be modified to leverage the Azure PaaS portfolio, in ways that agree with current cloud computing engineering best practices.

Besides the implementation code, a final report will need to explain the design choices and provide a performance evaluation of the solution for at least two deployment scenarios. One scenario will match a web application with high popularity at a regional (continental) scale; the other targets a global scale, where the application will have users/clients spanning multiple continents.

## TuKano

The starting point is a web application named **TuKano** that implements a social network inspired by existing video sharing services, such as [TikTok](https://en.wikipedia.org/wiki/TikTok) or [YouTube Shorts](https://en.wikipedia.org/wiki/YouTube_Shorts). TuKano users can upload short videos to be viewed (and liked) by other users of the platform. The social network aspect of TuKano resides in having users *follow* other users, as the main way for the platform to populate the *feed* of shorts each user can view.

### Architecture

TuKano is organized as a three-tier architecture, where the application-tier comprises three [REST](https://en.wikipedia.org/wiki/REST) services:

+ Users - for managing users' individual information;
+ Shorts - for managing the shorts metadata and the social networking aspects, such as *user feeds*, *user follows* and *likes*;
+ Blobs - for managing the media blobs that represent the actual videos.


### Workflow 

One issue to consider is that uploading a short video is performed in two steps.
First, the **Shorts** service is contacted to create the *metadata* that will be associated with the *short video*. This is represented by a ***Short*** object. The returned *metadata* contains the URL to which the short video media must be uploaded. Second, the media is uploaded to that URL, which points to the upload endpoint in the **Blobs** service and includes a **token** to ensure the safety of the upload procedure. Namely, the upload operation must match a short that has been created through the proper endpoint, and the upload must occur within the allowed time limit.
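As an illustration only, the token check described above (the upload must match a created short and fall within a time limit) could be sketched with an HMAC over the blob identifier and an expiry timestamp. The class `UploadToken`, its method names, and the `expiry.signature` token layout are assumptions made for this sketch; they are not taken from the TuKano source code, which may use a different scheme.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper (not from TuKano): signs "blobId:expiry" with a server-side secret. */
final class UploadToken {
    private final byte[] secret;

    UploadToken(String secret) {
        this.secret = secret.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns a token of the assumed form "expiryMillis.signature" for the given blob id. */
    String issue(String blobId, long nowMillis, long ttlMillis) {
        long expiry = nowMillis + ttlMillis;
        return expiry + "." + sign(blobId, expiry);
    }

    /** True only if the signature matches this blob id and the token has not expired. */
    boolean isValid(String blobId, String token, long nowMillis) {
        int dot = token.indexOf('.');
        if (dot <= 0) return false;
        long expiry;
        try {
            expiry = Long.parseLong(token.substring(0, dot));
        } catch (NumberFormatException e) {
            return false;
        }
        if (nowMillis > expiry) return false;
        byte[] expected = sign(blobId, expiry).getBytes(StandardCharsets.UTF_8);
        byte[] actual = token.substring(dot + 1).getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual); // constant-time comparison
    }

    private String sign(String blobId, long expiry) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] raw = mac.doFinal((blobId + ":" + expiry).getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the expiry is part of the signed message, a client cannot extend the time limit by editing the token, and a token issued for one blob does not validate for another.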

Following a user has the side effect of populating the **feed** of the follower with the shorts of the followee.
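The follow/feed relationship can be pictured with a small in-memory model. This is a sketch under stated assumptions: the class `FeedSketch`, its methods, and the newest-first ordering are hypothetical and do not reflect how the TuKano data layer stores or orders feeds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical in-memory model of follows and feeds; not the TuKano data layer. */
final class FeedSketch {
    private static final class Entry {
        final String shortId;
        final long timestamp;

        Entry(String shortId, long timestamp) {
            this.shortId = shortId;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Set<String>> following = new HashMap<>();
    private final Map<String, List<Entry>> shortsByOwner = new HashMap<>();

    void publish(String owner, String shortId, long timestamp) {
        shortsByOwner.computeIfAbsent(owner, k -> new ArrayList<>()).add(new Entry(shortId, timestamp));
    }

    void follow(String follower, String followee) {
        following.computeIfAbsent(follower, k -> new HashSet<>()).add(followee);
    }

    /** Short ids from every followee, newest first (the ordering is an assumption). */
    List<String> feed(String user) {
        List<Entry> entries = new ArrayList<>();
        for (String followee : following.getOrDefault(user, Set.of())) {
            entries.addAll(shortsByOwner.getOrDefault(followee, List.of()));
        }
        entries.sort(Comparator.comparingLong((Entry e) -> e.timestamp).reversed());
        return entries.stream().map(e -> e.shortId).collect(Collectors.toList());
    }
}
```

Here the feed is computed at read time from the follow graph. Precomputing feeds at write time is the alternative design, trading extra writes for cheaper reads.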

### Source Code

The Java source code of the application can be found [here](https://github.com/smduarte/scc2425/tree/main/scc2425-tukano).

This version of TuKano consists of a single application server (web server) that hosts the three services together, unlike the original version, where the three services executed in dedicated servers and relied on a discovery mechanism to find each other.

Other adaptations introduced to the original source code are:
+ Services reference each other directly by normal method calls via *getInstance()* methods (instead of relying on REST clients).
+ Shorts database "persistence" has been configured in [hibernate.cfg.xml](https://github.com/smduarte/scc2425/blob/main/scc2425-tukano/hibernate.cfg.xml) to use memory instead of a local filesystem directory.

#### Endpoints

The service endpoints are documented under [tukano.api](https://github.com/smduarte/scc2425/tree/main/scc2425-tukano/src/main/java/tukano/api). 

Java interfaces are used to model the abstract semantics of the TuKano services' operations, whereas the actual REST endpoints can be found in [tukano.api.rest](https://github.com/smduarte/scc2425/tree/main/scc2425-tukano/src/main/java/tukano/api/rest). 

These endpoints are already implemented in the provided code at the application-layer. The goal is to replace the data-layer currently used with suitable services from the Azure Cloud Platform, without affecting the semantics of the endpoints exposed to the hypothetical presentation-layer (clients).

Note: Small "tweaks" to endpoint semantics are allowed, for example, to account for user authentication or match token expiration values to the Azure platform. If deemed relevant, they should be discussed in the final report. 

## Deliverables

The project assignment consists of two deliverables.

+ The source code of the TuKano backend ported to the Azure Cloud platform, made available as a GitHub repository, together with the commit point representing the finished solution;

+ A written report describing the ported solution and explaining how it leverages the Azure PaaS portfolio. This report should also provide a performance evaluation of the solution. The performance analysis should strive to objectively show the impact of the design choices and services on performance metrics such as throughput and operation latency. To that end, the base TuKano application should be used as the baseline for any comparisons. 
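For the metrics named above, a minimal sketch of how throughput and latency percentiles might be derived from recorded samples is shown below. The class `LatencyStats` and the choice of the nearest-rank percentile definition are assumptions for illustration; the assignment does not prescribe a measurement tool or method.

```java
import java.util.Arrays;

/** Hypothetical helper for summarising benchmark samples. */
final class LatencyStats {
    /** Nearest-rank percentile, with p in (0, 100], over latencies in milliseconds. */
    static double percentile(double[] latenciesMs, double p) {
        if (latenciesMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need at least one sample and 0 < p <= 100");
        }
        double[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Completed operations per second over the measured interval. */
    static double throughput(int completedOps, double elapsedSeconds) {
        return completedOps / elapsedSeconds;
    }
}
```

Reporting a median together with a high percentile (for example the 95th) for both the baseline and the ported solution makes tail-latency differences visible that an average would hide.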

## Minimum Requirements (up to 13)

The ported TuKano application must use [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs), [Azure Cosmos DB](https://azure.microsoft.com/en-us/products/cosmos-db), and [Azure Cache for Redis](https://azure.microsoft.com/en-us/products/cache).

Performance analysis for a single geographic region (Europe).

Notes:

+ Azure Cosmos DB for PostgreSQL + Hibernate can be used to fulfil the Cosmos DB requirement.
+ The impact on performance of using a cache (or not) should be included in the performance analysis report.
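One common way to place a cache in front of a database is the cache-aside pattern: read from the cache first, fall back to the database on a miss, then populate the cache. The sketch below is an assumption-laden illustration: `CacheAside` is a hypothetical class, a `ConcurrentHashMap` stands in for Azure Cache for Redis, and the `loader` function stands in for a database read. No Azure or Redis client API is used.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical cache-aside wrapper; the map is a stand-in for a real cache service. */
final class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, Optional<V>> loader; // stand-in for a database read
    private final AtomicInteger misses = new AtomicInteger();

    CacheAside(Function<K, Optional<V>> loader) {
        this.loader = loader;
    }

    /** Returns the cached value if present; otherwise loads it and caches the result. */
    Optional<V> get(K key) {
        V cached = cache.get(key);
        if (cached != null) return Optional.of(cached);
        misses.incrementAndGet();
        Optional<V> loaded = loader.apply(key);
        loaded.ifPresent(value -> cache.put(key, value));
        return loaded;
    }

    /** Must be called after the underlying record changes, or reads will be stale. */
    void invalidate(K key) {
        cache.remove(key);
    }

    /** Number of reads that had to go to the loader. */
    int misses() {
        return misses.get();
    }
}
```

Counting misses, as done here, gives a hit ratio that helps explain measured latency differences between the cached and uncached configurations.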

## Base Solution (up to 15)

### NoSQL vs SQL

The solution supports two alternative persistent storage backends for the *Shorts* and *Users* databases: Cosmos DB for NoSQL and Cosmos DB for PostgreSQL.

+ The impact on performance/latency of using each backend should be analysed, both with and without cache.

### User Session

Include support for authenticated users via cookies stored in the cache.
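A cookie-based session can be sketched as a random session identifier, sent to the client as the cookie value, that maps to a user in the cache and expires after a fixed time. Everything named below is hypothetical: `SessionStore`, its methods, and the in-memory map that stands in for the cache service. Setting and reading the HTTP cookie itself is left out.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical session store; the map is a stand-in for the cache service. */
final class SessionStore {
    private static final class Session {
        final String userId;
        final long expiresAtMillis;

        Session(String userId, long expiresAtMillis) {
            this.userId = userId;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Session> cache = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();
    private final long ttlMillis;

    SessionStore(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Creates a session and returns the unguessable id to place in the cookie. */
    String login(String userId, long nowMillis) {
        byte[] raw = new byte[32];
        random.nextBytes(raw);
        String sessionId = Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        cache.put(sessionId, new Session(userId, nowMillis + ttlMillis));
        return sessionId;
    }

    /** Resolves the cookie value to a user id, or empty if unknown or expired. */
    Optional<String> userFor(String sessionId, long nowMillis) {
        if (sessionId == null) return Optional.empty();
        Session session = cache.get(sessionId);
        if (session == null) return Optional.empty();
        if (nowMillis > session.expiresAtMillis) {
            cache.remove(sessionId);
            return Optional.empty();
        }
        return Optional.of(session.userId);
    }

    void logout(String sessionId) {
        cache.remove(sessionId);
    }
}
```

With a real cache service, the expiry would typically be delegated to the per-key time-to-live of the cache rather than checked by hand as it is here.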

## Advanced Features (up to 20)

Below are advanced features that can be implemented for an improved grade.

#### Geo-Replication support

The solution has support for geo-replicated deployment, implying the TuKano userbase spans multiple geographic regions.

+ The impact on performance/latency of the geo-replicated scenario should be analysed, using at least two regions: Europe and North America.

#### Counting Views

Shorts metadata is extended with ***total views*** statistics, refreshed as fast as feasible. Views should be incremented based on blob downloads. View counters should grow monotonically.

+ Must leverage Azure services, not rely solely on application server logic.
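The monotonic-growth requirement can be illustrated in isolation with the sketch below. It is only an in-memory model with hypothetical names (`ViewCounters`, `recordDownload`, `applyReportedTotal`); on its own it does not satisfy the requirement to leverage Azure services. It shows one way to keep a counter from decreasing: increments add to the total, and totals reported from elsewhere are merged by taking the maximum.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory view counters that never decrease. */
final class ViewCounters {
    private final Map<String, Long> totals = new ConcurrentHashMap<>();

    /** Adds one view for a blob download and returns the new total. */
    long recordDownload(String shortId) {
        return totals.merge(shortId, 1L, Long::sum);
    }

    /** Merges a total reported elsewhere, keeping the larger value so the count never drops. */
    long applyReportedTotal(String shortId, long reported) {
        return totals.merge(shortId, reported, Long::max);
    }

    long total(String shortId) {
        return totals.getOrDefault(shortId, 0L);
    }
}
```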

#### Tukano Recommends

Every user automatically follows a system-managed user named "Tukano Recommends". This user will republish selected content from the collection of shorts published by the general TuKano userbase. The criteria for choosing which videos are selected are left open.

+ Must leverage Azure services, not rely solely on application server logic.


## Azure Functions ~~and (Spark) Computations~~

The solution makes use of Azure Functions ~~and/or (Spark) Computations~~ in a meaningful way. For example,
the advanced features listed above are implemented with the help of one or both technologies.

# Grading

Grading will take into account: code quality, soundness and merit of the design choices, quality and depth of the report. This includes the methodology and analysis of the results from the experimental evaluation of the delivered solution.

Selected projects may have to be defended in person, in an oral discussion. When justified, separate individual grades will be assigned.  

# Penalties

Lack of evidence, for example in the GitHub repository, of a meaningful contribution to the solution on an individual student basis.

Excessive or undisclosed use of AI tools.