# Lab 7
## Data Structures & Algorithms
26th & 27th March

## Today

* [Some more Big O stuff](#bigo)
* [Introducing Plotly (for plotting in Flask)](#plotly)
* [A toy example: a dashboard to track page loads](#dashboard)
* [Exercises](#exercises)

# Some more Big O examples <a class="anchor" id="bigo"></a>

We are interested in analysing an algorithm that returns the largest sum of any contiguous subarray of an input array. Let's go through this step by step first and then look at the algorithm's time complexity.


## Slightly optimised algo:

```python
def largest_sum_contiguous_subarray(array):
    # NB: starting from 0 means the empty subarray counts, so an empty or all-negative array returns 0
    max_sum = 0
    # loop through start points of the array
    for i in range(len(array)):
        sum_subarray = 0
        # loop through end points of the array
        for j in range(i, len(array)):
            # keep track of the sum of the subarray that ends with j (so that at j+1, we don't have to calculate it again)
            sum_subarray += array[j]
            if max_sum < sum_subarray:
                max_sum = sum_subarray
    return max_sum
```

```python
array_0 = []
array_1 = [2]
array_2 = [-2, 3]
array_3 = [-2, 3, -2, 4, -1, -2]

print(largest_sum_contiguous_subarray(array_0))  # 0
print(largest_sum_contiguous_subarray(array_1))  # 2
print(largest_sum_contiguous_subarray(array_2))  # 3
print(largest_sum_contiguous_subarray(array_3))  # 5
```

### Complexity

To analyse the time complexity of the algorithm, we break it down into its loops:

* **Outer loop**: iterates through each element of the array of size `n` 
    - it runs `n` times
* **Inner loop**: iterates through elements of array starting from current index `i` all the way to the end
    - in the worst case, this is `n` operations 
    - NB: *within* the inner-most loop: constant time operations ($O(1)$ - does not depend on size of array), so this can be ignored 


We could stop here and say that it's $O(n^2)$ because of this worst-case scenario.

Or, in more detail: 
- for `i=0` the inner loop runs `n` times, 
- for `i=1` the inner loop runs `n-1` times, 
- and so on

$$n + (n-1) + (n-2) + ... +1$$
This simplifies to:
$$\frac{n(n+1)}{2}$$

So, in total, we have $\frac{n(n+1)}{2}$ operations.

Still, the time complexity in Big O notation is $O(n^2)$: we focus on the highest-order term, since lower-order terms (like $\frac{n}{2}$ in $\frac{n(n+1)}{2} = \frac{n^2}{2} + \frac{n}{2}$) and constant factors (like the $\frac{1}{2}$ in front of $n^2$) become insignificant as $n$ grows very large.
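To check the count empirically, here is a small sketch that mirrors the two loops of `largest_sum_contiguous_subarray` but, instead of summing, only counts how often the inner loop body runs (`count_inner_iterations` is a hypothetical helper written just for this check):

```python
def count_inner_iterations(n):
    """Count how many times the inner-loop body runs for an array of length n."""
    count = 0
    for i in range(n):          # outer loop: start points
        for j in range(i, n):   # inner loop: end points from i to n-1
            count += 1          # stands in for the O(1) work (addition + comparison)
    return count


for n in [1, 2, 5, 10, 100, 200]:
    # compare the measured count with the closed form n(n+1)/2
    print(n, count_inner_iterations(n), n * (n + 1) // 2)
```

The two columns agree, and going from $n = 100$ to $n = 200$ roughly quadruples the count (5050 to 20100), as expected for a quadratic.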

### In context:
There is a more efficient way of doing this (see below). However, this is already more efficient than the brute-force algorithm (a.k.a. going through every single option and checking it). 
- This is because after calculating the sum of the subarray from, say, $i$ to $k$, we do not have to recompute it from scratch for the subarray from $i$ to $k+1$. 
- Instead, we are 'saving on operations' by keeping a running sum: we store the sum of the subarray from $i$ to $k$, set it as the new `max_sum` if it is larger, and then add the next element to get the sum of the subarray from $i$ to $k+1$. 

## Alternative 1: higher time complexity (brute-force)

Here, we loop over all possible contiguous sub-arrays and calculate the sum.

```python
def largest_sum_contiguous_subarray_bruteforce(array):
    # initialise value for largest sum
    # NB: starting from 0 means the empty subarray counts, so an empty or all-negative array returns 0
    max_sum = 0

    # loop through start points of the array
    for i in range(len(array)):
        # loop through end points of the array
        for j in range(i, len(array)):
            # loop through the entire subarray (starting with i and ending with j)
            sum_subarray = 0
            for k in range(i, j + 1):
                sum_subarray += array[k]

            if max_sum < sum_subarray:
                # update max_sum if the sum of the current subarray is greater than the stored global sum
                max_sum = sum_subarray

    return max_sum
```

```python
array_1 = [2]
array_2 = [-2, 3]
array_3 = [-2, 3, -2, 4, -1, -2]

print(largest_sum_contiguous_subarray_bruteforce(array_1))  # 2
print(largest_sum_contiguous_subarray_bruteforce(array_2))  # 3
print(largest_sum_contiguous_subarray_bruteforce(array_3))  # 5
```


### Complexity

To analyse the time complexity of the algorithm, we again break it down into its loops (there are 3 loops: outer, middle and inner):

* **First two loops**: same as before, giving us $\frac{n(n+1)}{2}$ combinations of start point `i` and end point `j`.
* **Innermost loop**: sums the `j - i + 1` elements of the subarray from `i` to `j`; in the worst case (when `i=0` and `j=n-1`), this runs for `n` iterations.
    * NB: within the inner-most loop: constant time operations ($O(1)$ - does not depend on size of array)


So this is worse: $O(n^3)$!
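More precisely, the innermost loop runs $j - i + 1$ times for each pair $(i, j)$, and summing over all pairs gives

$$\sum_{i=0}^{n-1}\sum_{j=i}^{n-1}(j-i+1) = \frac{n(n+1)(n+2)}{6},$$

whose highest-order term is $\frac{n^3}{6}$. A sketch that counts the innermost iterations directly (again a hypothetical helper, only for checking the formula):

```python
def count_innermost_iterations(n):
    """Count how many times the innermost-loop body of the brute-force version runs."""
    count = 0
    for i in range(n):                  # start points
        for j in range(i, n):           # end points
            for k in range(i, j + 1):   # re-sum the subarray from i to j
                count += 1
    return count


for n in [1, 2, 5, 10, 20, 40]:
    # compare the measured count with the closed form n(n+1)(n+2)/6
    print(n, count_innermost_iterations(n), n * (n + 1) * (n + 2) // 6)
```

As $n$ grows, doubling it multiplies the count by a factor approaching 8, the signature of a cubic.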

## Alternative 2: lower time complexity

Now we improve the algorithm so that its time complexity is $O(n)$ instead of $O(n^2)$.

The idea: At each position $i$, *what's the best sum we can get for a subarray that ends exactly at $i$?*

To do this: we create a helper array, `sum_arr`, where: 

<div align="center">
sum_arr[i] = the best sum of a subarray that ends at index `i`
</div>

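Take `array = [3, -2, 5, -1]`:
1. index 0: $[3]$ → only one choice: the subarray $[3]$, so `sum_arr[0] = 3`.
2. index 1: $[-2]$ → is it better to extend the previous subarray (which ended at 3) or start fresh with just -2?
    - Extending gives: `sum_arr[0] + array[1] = 3 + (-2) = 1`
    - Starting fresh: $-2$
    - Best choice: extend, so `sum_arr[1] = 1`
3. index 2: $[5]$ → again:
    - Extend: `sum_arr[1] + 5 = 1 + 5 = 6`
    - Start fresh: just 5
    - Best choice: extend, so `sum_arr[2] = 6`
4. index 3: $[-1]$ → extending gives `sum_arr[2] + (-1) = 5`, starting fresh gives $-1$, so `sum_arr[3] = 5`.

So `sum_arr = [3, 1, 6, 5]`, and the answer is its largest entry, 6 (the subarray `[3, -2, 5]`).

The important decision rule here is whether `sum_arr[i-1]` (the best sum of a subarray ending at the previous index) is positive or not! If it is positive, extending helps; otherwise, we start fresh.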
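The two options can also be written as `max(extend, start_fresh)`: since `extend - start_fresh = sum_arr[i-1]`, extending gives a larger sum exactly when `sum_arr[i-1] > 0` (and the same sum when it is 0), which is the rule above. A short sketch that reproduces the worked example (`best_sums_ending_at` is a hypothetical helper name):

```python
def best_sums_ending_at(array):
    """Return sum_arr, where sum_arr[i] is the best sum of a subarray ending at index i (array must be non-empty)."""
    sum_arr = [array[0]]  # index 0: only one choice
    for i in range(1, len(array)):
        extend = sum_arr[i - 1] + array[i]  # extend the best subarray ending at i-1
        start_fresh = array[i]              # or start a new subarray at i
        sum_arr.append(max(extend, start_fresh))
    return sum_arr


example = [3, -2, 5, -1]
print(best_sums_ending_at(example))       # [3, 1, 6, 5]
print(max(best_sums_ending_at(example)))  # 6
```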

```python
def largest_sum_contiguous_subarray_efficient(array):
    # NB: assumes a non-empty array (array[0] would raise an IndexError otherwise)
    # initialise array where at each i we store the best sum of a subarray ending at i
    sum_arr = [0] * len(array)

    # the first element of sum_arr is just the first element of the array
    sum_arr[0] = array[0]

    # loop through end points
    for i in range(1, len(array)):
        # if the best sum of a subarray ending at the previous element is greater than 0, we extend that subarray to include the current element
        if sum_arr[i - 1] > 0:  # extend
            sum_arr[i] = sum_arr[i - 1] + array[i]
        # otherwise, the sum at end point i is just equal to the ith element of the array
        else:
            sum_arr[i] = array[i]  # start afresh

    return max(sum_arr)
```

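The final `max()` call takes the largest value in `sum_arr`. This is *not* constant time: `max()` has to look at all `n` entries, so it is a second pass over an array of length `n`. But two passes cost $O(n) + O(n) = O(n)$, so overall we still only need linear time!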

```python
array_1 = [2]
array_2 = [-2, 3]
array_3 = [-2, 3, -2, 4, -1, -2]

print(largest_sum_contiguous_subarray_efficient(array_1))  # 2
print(largest_sum_contiguous_subarray_efficient(array_2))  # 3
print(largest_sum_contiguous_subarray_efficient(array_3))  # 5
```


### Complexity

**Time complexity**: Since we only have one loop that iterates through the length of the array and we have constant time operations (comparison, addition, assignment), we now have time complexity $O(n)$.

**Space complexity**: Why has space complexity increased from $O(1)$ to $O(n)$? Can we do better?

## Alternative 3: lower time & space complexity

Now we improve the algorithm even more, by also lowering the space complexity. 

```python
def largest_sum_contiguous_subarray_more_efficient(array):
    # NB: assumes a non-empty array (array[0] would raise an IndexError otherwise)
    # initialise two max values; global_max always stores the current largest sum, local_max stores the sum of the current subarray
    global_max, local_max = array[0], array[0]

    # loop through the end points
    for i in range(1, len(array)):
        # if the current local_max is greater than 0, we add the ith element of the array, so we extend the subarray by element i
        if local_max > 0:
            local_max += array[i]
        # else, we start with a new subarray at element i
        else:
            local_max = array[i]
        # whenever the sum of the current subarray is greater than the one that was previously stored, update the global max value
        if global_max < local_max:
            global_max = local_max

    return global_max
```

```python
array_1 = [2]
array_2 = [-2, 3]
array_3 = [-2, 3, -2, 4, -1, -2]

print(largest_sum_contiguous_subarray_more_efficient(array_1))  # 2
print(largest_sum_contiguous_subarray_more_efficient(array_2))  # 3
print(largest_sum_contiguous_subarray_more_efficient(array_3))  # 5
```

### Complexity

**Time complexity**: Here we are only looping through the input array of length `n` once. As we do so we have constant time operations (comparison, addition, assignment), so the time complexity remains $O(n)$.

**Space complexity**: The space used while looping through the input array no longer depends on its size (we only store `global_max` and `local_max`), so it is constant ($O(1)$).
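Alternative 3 is widely known as *Kadane's algorithm*. A quick way to gain confidence in it is to compare it against the brute-force version on many random arrays. One subtlety the sketch below has to handle: the $O(n^2)$ and $O(n^3)$ versions start from `max_sum = 0` (so an all-negative array gives 0, the sum of the empty subarray), whereas Alternatives 2 and 3 start from `array[0]` (so they return the largest single element, and fail on an empty array). `brute_force` and `kadane` are hypothetical compact re-implementations of the code above:

```python
import random


def brute_force(array):
    # same idea as Alternative 1: re-sum every subarray array[i..j] from scratch
    max_sum = 0
    for i in range(len(array)):
        for j in range(i, len(array)):
            max_sum = max(max_sum, sum(array[i:j + 1]))
    return max_sum


def kadane(array):
    # same logic as Alternative 3: O(n) time, O(1) extra space (array must be non-empty)
    global_max = local_max = array[0]
    for i in range(1, len(array)):
        local_max = local_max + array[i] if local_max > 0 else array[i]
        global_max = max(global_max, local_max)
    return global_max


random.seed(0)  # fixed seed so the check is reproducible
for _ in range(1000):
    array = [random.randint(-10, 10) for _ in range(random.randint(1, 12))]
    if max(array) < 0:
        continue  # all-negative: the two conventions differ, so skip these
    assert brute_force(array) == kadane(array), array
print("random checks passed")

# the all-negative case shows the different conventions
print(brute_force([-3, -1]), kadane([-3, -1]))  # 0 -1
```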

# Introducing `Plotly` (for visualisation with Flask apps)  <a class="anchor" id="plotly"></a>

Plotly is an interactive plotting library - you can include things like buttons, sliders, dropdowns. It can be easily integrated with flask. Let's first create a simple bar chart as an example (example taken from [here](https://towardsdatascience.com/web-visualization-with-plotly-and-flask-3660abf9c946)). You can either add this dashboard page to your own personal flask app (for which you have created your about page) or you can clone [this](https://github.com/henrycgbaker/my_flask_app_25) repo.

### Preparatory setup

First you need to get the correct **python modules and packages** in place:
- Clone the repo
- Create an `instance` folder inside the project root directory and add a `config.py` file to it. 
    - The `config.py` file only needs to contain one line of code with a secret key (`SECRET_KEY='some password'`). 
- Make sure we have the correct packages installed in our flask environment:
    - `flask-wtf`
    - `plotly`
    - `pandas`
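Any hard-to-guess string works as the secret key (Flask uses it, for example, to sign session cookies, and `flask-wtf` uses it for its CSRF tokens). One option, an assumption rather than part of the lab's setup, is to generate a random value with Python's standard `secrets` module and paste it into `config.py` instead of `'some password'`:

```python
import secrets

# 16 random bytes, returned as a 32-character hexadecimal string
secret_key = secrets.token_hex(16)
print(f"SECRET_KEY='{secret_key}'")  # copy this line into instance/config.py
```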

Next, you need to **initialise the database** in a python shell:
- in the command line run `python` while in the `my_flask_app` project root directory and with your flask environment activated
- next, run the following (I recommend running it line by line to get a sense of what we're doing here):

```python
from flaskapp import db
from flaskapp.models import User
db.create_all()
user = User(name='Default user')
db.session.add(user)
db.session.commit()
```

**Explanation of code above**
- line 1: imports `db`: this is an instance of SQLAlchemy from our `flaskapp/__init__.py` module. This instance gives us access to database-related functionality, including table creation and querying.
- line 2: imports the `User` model from `flaskapp/models.py`.
- line 3: tells SQLAlchemy to create all missing tables based on the models defined in `models.py`.
    - If the tables already exist, this does nothing (it does not override existing tables).
    - If this is the first time running it, it sets up an empty database with tables corresponding to all models.
- line 4: creates a new `User` object in Python, *NB: it is not yet saved to the database*.
    - The `User` constructor assigns "Default user" to the name column.
    - At this stage, user is just an object in memory.
- line 5: tells SQLAlchemy that we want to add `user` to the database session. *NB: However, the change is not yet saved permanently*.
- line 6: permanently saves the new `user` to the database.
    - All pending changes in the session (such as adding new users or modifying existing records) are committed to the database.
    - After this, the `user` object is now officially stored in the database.

Now you are ready to start working with `plotly`!

## Steps: changes to your flask app to incorporate `plotly`

### 1. Import packages

At the top of your `routes.py` file, import the necessary packages as follows:

```python
import pandas as pd
import json
import plotly
import plotly.express as px
```
### 2. Routing (`@app.route('/dashboard')`)

- We will add a new route to your `routes.py` file that displays the dashboard HTML page when the user goes to `/dashboard`. 
    - *(This is where we will later display the dynamic plotly graphs, which will update with user page loads.)* 
- Within the `dashboard()` function, we create a dataframe and a plot with [plotly express](https://plotly.com/python/plotly-express/) (the plotly module for the most common figures, e.g. bar charts), which is assigned to the variable `fig`. 
    - *NB: To include plotly figures in flask, we need to convert the plot to JSON format using `json.dumps()` and the JSON encoder that comes with Plotly (`plotly.utils.PlotlyJSONEncoder`).* 
    - *We do this so that the page can use the plotly JavaScript library to render the plot in the browser.*

```python
@app.route('/dashboard')
def dashboard():
    df = pd.DataFrame({
        'Fruit': ['Apples', 'Oranges', 'Bananas', 'Apples', 'Oranges',
                'Bananas'],
        'Amount': [4, 1, 2, 2, 4, 5],
        'City': ['Berlin', 'Berlin', 'Berlin', 'Munich', 'Munich', 'Munich']
    })
    fig = px.bar(df, x='Fruit', y='Amount', color='City',
                barmode='group')
    graphJSON = json.dumps(fig, cls=plotly.utils.PlotlyJSONEncoder)
    return render_template('dashboard.html', title='My plot', graphJSON=graphJSON)
```
**Explanation of code above**
- create a pandas dataframe object: `df`
- convert it into a plotly object: `fig`
- convert it into a JSON format: `graphJSON`
- pass it to an HTML template...(so we're going to need an associated HTML template!)
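Why the `cls=plotly.utils.PlotlyJSONEncoder` argument? The standard `json` module only serialises basic Python types (dicts, lists, strings, numbers, booleans, `None`); anything else makes `json.dumps()` raise a `TypeError` unless an encoder subclass handles it in its `default()` method. `PlotlyJSONEncoder` plays that role for Plotly figures (and values such as NumPy arrays and dates). The stdlib-only sketch below shows the same mechanism with a hypothetical `DateEncoder`:

```python
import datetime
import json


class DateEncoder(json.JSONEncoder):
    """Hypothetical encoder: teaches json.dumps() how to turn dates into strings."""

    def default(self, obj):
        if isinstance(obj, datetime.date):
            return obj.isoformat()  # e.g. '2025-03-01'
        return super().default(obj)  # anything else: let the base class raise TypeError


data = {'Date': [datetime.date(2025, 3, 1), datetime.date(2025, 3, 5)],
        'Page views': [10, 12]}

try:
    json.dumps(data)  # the default encoder does not know about dates
except TypeError as err:
    print('default encoder failed:', err)

graph_json = json.dumps(data, cls=DateEncoder)
print(graph_json)  # {"Date": ["2025-03-01", "2025-03-05"], "Page views": [10, 12]}
```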

### 3. Associated HTML template

Finally, we will create a new html template in our templates folder, called `dashboard.html` and add the following content. 

```html
{% extends "layout.html" %}
{% block content %}
<h1>{{ title }}</h1>
<div id='chart' class='chart'></div>
<script src='https://cdn.plot.ly/plotly-latest.min.js'></script>
<script type='text/javascript'>
    var graphs = {{graphJSON | safe}};
    Plotly.plot('chart',graphs,{});
</script>
{% endblock %}
```
- First, our template inherits from our base template `layout.html`.
- Then, we include our bar chart by specifying a `div` tag:
    - `<div>` is a placeholder where the Plotly chart will be rendered.
    - `id='chart'` references this element in JavaScript.
    - `class='chart'` allows for additional styling using CSS if needed.
- Finally, the two `script` tags below add some JavaScript to our page: 
    - *NB: you are not expected to know JavaScript, but I've included the following by way of explanation for those interested*
    - The first script tag: loads the Plotly JavaScript library from a CDN.
    - The second script tag: 
        - The `graphJSON` variable from our routing module (which contains the JSON for our bar chart) is injected into the template and saved as a variable called `graphs`. This contains the actual JSON data we need for the Plotly chart.
        - *NB: The `| safe` filter tells Jinja not to HTML-escape the JSON string (e.g. turning quotes into `&#34;`), so it is inserted as raw JavaScript.*
        - We then call the plotting method from the plotly library to display the chart. `Plotly.plot('chart', graphs, {});` initializes the chart by:
            1. Selecting the `<div>` with `id="chart"`.
            2. Using `graphs` as the data source.
            3. Passing an empty object `{}` for layout settings.

# Toy case: Creating a dashboard to track page loads <a class="anchor" id="dashboard"></a>

We now create a dashboard for tracking the data for page loads (code adapted from [here](https://python.plainenglish.io/track-website-usage-with-postgresql-and-flask-53f583249911)).

### 1. Define DB Models

First, we create two new database models (a.k.a. tables). 
1. One will have the total number of page views per day 
2. The other will save the users' IP addresses and the dates on which they visited. 

In your `models.py` file, add:

```python
class Day(db.Model):
    id = db.Column(db.Date, primary_key=True)
    views = db.Column(db.Integer)

    def __repr__(self):
        return f"Day('{self.id}', '{self.views}')"


class IpView(db.Model):
    ip = db.Column(db.String(45), primary_key=True)  # 45 characters is enough for IPv4 and IPv6 addresses
    date_id = db.Column(db.Date, db.ForeignKey('day.id'), primary_key=True)

    def __repr__(self):
        return f"IpView('{self.ip}', '{self.date_id}')"
```

### 2. Initialise DBs

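Now, we create the new tables in our database. To do this:
- open a Python shell from our command line (while in the project root directory): `python`
- run the below code (`db.create_all()` only creates the tables that don't exist yet, i.e. `day` and `ip_view`; the default user was already added during the setup above, so we don't add it again)

```python
from flaskapp import db
from flaskapp.models import Day, IpView  # importing the models registers their tables with SQLAlchemy
db.create_all()  # creates any missing tables; existing tables (and their data) are left untouched
```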

### 3. Routing (`@app.before_request`)

Now, we need to add a function to our `routes.py` file that saves data to these tables the moment a user loads a page. 
- This uses the decorator `@app.before_request`, which tells Flask to run this function automatically before handling any request to any route.
- Flask has a request lifecycle: a series of steps (and hooks) it goes through when processing a request.
    1. Receive a request (e.g. user opens page)
    2. Run all `before_request` functions
    3. Run the view function of the matching route (e.g. `@app.route('/dashboard_2')`)
    4. Run any `after_request` functions (these receive the response and can modify it)
    5. Send the response to the browser

*NB: at the top, you need to import some more modules and objects as follows:*

```python
from flask import render_template, flash, redirect, url_for, request
from flaskapp import app, db
from flaskapp.models import BlogPost, IpView, Day
from flaskapp.forms import PostForm
import datetime
```
Now, we add the following routing in:

```python
@app.before_request
def before_request_func():
    day_id = datetime.date.today()   # get our day_id: today's date as a datetime.date object (e.g. datetime.date(2025, 3, 26))
    client_ip = request.remote_addr  # get the ip address of where the client request came from

    query = Day.query.filter_by(id=day_id)  # try to get the row associated to the current day
    if query.count() > 0:
        # the current day is already in table, simply increment its views
        current_day = query.first()
        current_day.views += 1
    else:
        # the current day does not exist, it's the first view for the day.
        current_day = Day(id=day_id, views=1)
        db.session.add(current_day)  # insert a new day into the day table

    query = IpView.query.filter_by(ip=client_ip, date_id=day_id)
    if query.count() == 0:  # check if it's the first time a viewer from this ip address is viewing the website
        ip_view = IpView(ip=client_ip, date_id=day_id)
        db.session.add(ip_view)  # insert into the ip_view table

    db.session.commit()  # commit all the changes to the database
```
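To see what `before_request_func` does to the two tables, independent of Flask, here is a stdlib-only sketch that swaps Flask-SQLAlchemy for Python's built-in `sqlite3` with an in-memory database. The function `record_page_load`, the date and the IP addresses are hypothetical; each call plays the role of one page load:

```python
import datetime
import sqlite3

# In-memory stand-ins for the day and ip_view tables defined by the models above
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE day (id TEXT PRIMARY KEY, views INTEGER)')
conn.execute('CREATE TABLE ip_view (ip TEXT, date_id TEXT REFERENCES day(id), '
             'PRIMARY KEY (ip, date_id))')


def record_page_load(conn, client_ip, day_id):
    """Mimic before_request_func: count one view for day_id and record client_ip once per day."""
    day_key = day_id.isoformat()  # store the date as a 'yyyy-mm-dd' string
    row = conn.execute('SELECT views FROM day WHERE id = ?', (day_key,)).fetchone()
    if row is not None:
        # the day is already in the table: increment its views
        conn.execute('UPDATE day SET views = views + 1 WHERE id = ?', (day_key,))
    else:
        # first view of the day: insert a new row with views = 1
        conn.execute('INSERT INTO day (id, views) VALUES (?, 1)', (day_key,))

    seen = conn.execute('SELECT 1 FROM ip_view WHERE ip = ? AND date_id = ?',
                        (client_ip, day_key)).fetchone()
    if seen is None:
        # first visit from this IP address today
        conn.execute('INSERT INTO ip_view (ip, date_id) VALUES (?, ?)', (client_ip, day_key))
    conn.commit()  # plays the role of db.session.commit()


# three page loads on the same (hypothetical) day: two from one IP, one from another
day = datetime.date(2025, 3, 26)
for ip in ['127.0.0.1', '127.0.0.1', '10.0.0.2']:
    record_page_load(conn, ip, day)

print(conn.execute('SELECT id, views FROM day').fetchall())        # [('2025-03-26', 3)]
print(sorted(conn.execute('SELECT ip, date_id FROM ip_view').fetchall()))
```

So `day` counts every page load (3), while `ip_view` ends up with one row per distinct IP address per day (2).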

### 4. Routing (`@app.route('/dashboard_2')`)

For now, we add a second dashboard route, `/dashboard_2`, that simply displays the data currently in the `day` table (we will turn it into a proper chart in step 6). 

Remember that you can (/should) also add the dashboard to your navigation bar by updating the `layout.html` file: `<li><a href="{{ url_for('dashboard_2') }}">Second Dashboard</a></li>`

```python
# Route to the dashboard page
@app.route('/dashboard_2')
def dashboard_2():
    days = Day.query.all()
    # Flask (version 2.2 and later) turns a returned list of dicts into a JSON response
    return [{'Date': day.id, 'Page views': day.views} for day in days]
```
If you run your app now and go to `/dashboard_2`, you will be presented with an (ugly) printout of the raw data (date and page views).

### 5. Synthetic data

Let's add some fake page views to our database by opening your python shell and manually adding some data to the database.

First, we make sure we have cleared any existing records from these two tables (in case there are any). In the Python shell:

```python
from flaskapp import app, db
from flaskapp.models import IpView, Day  

db.session.query(IpView).delete()
db.session.query(Day).delete()
db.session.commit()
```
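The order of the two `delete()` calls matters: `ip_view.date_id` is a foreign key to `day.id`, so deleting the `day` rows while `ip_view` rows still point at them violates the constraint in a database that enforces foreign keys (PostgreSQL always does; SQLite only when enforcement is switched on). A stdlib `sqlite3` sketch with hypothetical data:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('PRAGMA foreign_keys = ON')  # SQLite only enforces foreign keys when this is switched on
conn.execute('CREATE TABLE day (id TEXT PRIMARY KEY, views INTEGER)')
conn.execute('CREATE TABLE ip_view (ip TEXT, date_id TEXT REFERENCES day(id), '
             'PRIMARY KEY (ip, date_id))')
conn.execute("INSERT INTO day VALUES ('2025-03-01', 10)")
conn.execute("INSERT INTO ip_view VALUES ('127.0.0.1', '2025-03-01')")
conn.commit()

deleting_day_first_failed = False
try:
    conn.execute('DELETE FROM day')  # parent rows are still referenced by ip_view
except sqlite3.IntegrityError as err:
    deleting_day_first_failed = True
    print('deleting day first failed:', err)

conn.execute('DELETE FROM ip_view')  # delete the referencing (child) rows first...
conn.execute('DELETE FROM day')      # ...then the referenced (parent) rows
conn.commit()
print(conn.execute('SELECT COUNT(*) FROM day').fetchone()[0])  # 0
```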
Now we can go ahead and add our synthetic data:

```python
import datetime
from flaskapp import db
from flaskapp.models import Day, IpView

# (date, number of page views) for six days in March 2025
synthetic_days = [
    (datetime.date(2025, 3, 1), 10),
    (datetime.date(2025, 3, 5), 12),
    (datetime.date(2025, 3, 7), 13),
    (datetime.date(2025, 3, 14), 16),
    (datetime.date(2025, 3, 20), 17),
    (datetime.date(2025, 3, 24), 13),
]

# add one row per day to the day table
for date, views in synthetic_days:
    db.session.add(Day(id=date, views=views))

# add one (local) IP view per day to the ip_view table
for date, _ in synthetic_days:
    db.session.add(IpView(ip='127.0.0.1', date_id=date))

db.session.commit()
```

### 6. Routing (`@app.route('/dashboard_2')`)

Now, we will update our dashboard route again, to display the page load data as a (less ugly) plotly bar chart:

```python
# Route to the dashboard page
@app.route('/dashboard_2')
def dashboard_2():
    days = Day.query.all()
    df = pd.DataFrame([{'Date': day.id, 'Page views': day.views} for day in days])

    fig = px.bar(df, x='Date', y='Page views')

    graphJSON = json.dumps(fig, cls=plotly.utils.PlotlyJSONEncoder)
    return render_template('dashboard.html', title='Page views per day', graphJSON=graphJSON)
```

# Exercises <a class="anchor" id="exercises"></a>

### Exercise 1

Implement the page with the plotly bar chart from the [first example](#plotly).

### Exercise 2

Implement the dashboard for tracking page loads from the [second example](#dashboard).

### Exercise 3

Create another page for your website that gives you information about your blog posts in table format. Specifically, the table should include at least two columns: the date and the number of posts that were published on that date. 

Hint: You'll have to create another template file (e.g. `blog_dashboard.html`) and create an HTML table. You'll first have to research how to create tables in HTML. In your `routes.py` file, you'll need a new route which passes metadata about the blogs to the new template file.
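As a possible starting point for the route, one way to get "number of posts per date" is to count the dates with `collections.Counter`. The sketch below uses a hypothetical list of dates in place of your `BlogPost` query (use whatever your model's date column is called; a `datetime` would first need `.date()`):

```python
import datetime
from collections import Counter

# Hypothetical stand-in for the publication dates of all blog posts
post_dates = [
    datetime.date(2025, 3, 1),
    datetime.date(2025, 3, 1),
    datetime.date(2025, 3, 5),
]

posts_per_date = Counter(post_dates)   # maps each date to its number of posts
rows = sorted(posts_per_date.items())  # [(date, count), ...] ordered by date
for date, count in rows:
    print(date, count)
```

A list like `rows` can then be passed to your template and turned into one HTML table row per `(date, count)` pair.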