## Data Structures and Algorithms

- [LaTex](https://www.malinc.se/math/latex/basiccodeen.php)

- [Python Collection Module](https://docs.python.org/3/library/collections.html)

- [Python Collection's Counter](https://docs.python.org/3/library/collections.html#collections.Counter)

Before we start the course, we will discuss some of the basics:

- Introduction to Big-O complexity 
- Introduction to recursion 

Before we talk about Big-O, it is important that we first understand what exactly an "algorithm" is.

An algorithm can be seen as a recipe for a computer to follow. It's a set of instructions that a computer will follow step-by-step to solve a problem. An algorithm takes an input and produces an output.

For example, let's say you had a non-empty array of positive integers called `nums`, and you wanted to answer the question: "What is the largest number in `nums`?".

- To answer this question, you would write an algorithm that takes an array called `nums` as **input** and **outputs** the largest number in `nums`. Here is an example of such an algorithm:

1) Create a variable `maxNum` and initialize it to `0`.
2) Iterate over each element `num` in `nums`.
3) If `num` is greater than `maxNum`, update `maxNum = num`.
4) Output `maxNum`. 

Here, we have written down a set of instructions that when followed, will solve the problem. We can now implement these instructions in code so that a computer can quickly solve the problem. There are some important requirements for algorithms:

* Algorithms should be **deterministic**. Given the same input, the algorithm should **always** produce the same **output**. Basically, there shouldn't be any randomness.
* The algorithm should be correct for any arbitrary valid **input**. In our example, we said that `nums` is a non-empty array of positive integers. There are infinitely many such arrays, and our algorithm works for **all** of them. Note that if `nums` had negative numbers, the input would be invalid since we stated the integers are positive. In fact, our algorithm would actually break because we initialized `maxNum` to `0`: if every number in `nums` were negative, we would incorrectly output `0`.
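
The failure on negative input described above can be reproduced directly. This is a minimal sketch: `max_number_zero_init` mirrors the four steps listed earlier, while `max_number_first_init` is a hypothetical variant (not part of the original algorithm) that seeds the running maximum with the first element so that negative values are handled too.

```python
def max_number_zero_init(nums):
    # Mirrors the steps above: start from 0, keep the largest value seen.
    max_num = 0
    for num in nums:
        if num > max_num:
            max_num = num
    return max_num


def max_number_first_init(nums):
    # Hypothetical variant: seed with the first element instead of 0,
    # so the result is always an element of nums (nums must be non-empty).
    max_num = nums[0]
    for num in nums[1:]:
        if num > max_num:
            max_num = num
    return max_num


print(max_number_zero_init([3, 7, 2]))      # valid input: 7
print(max_number_zero_init([-3, -7, -2]))   # invalid input: 0, which is not in the array
print(max_number_first_init([-3, -7, -2]))  # -2
```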

<p>

---

**Big-O**

Big-O is a notation used to describe the computational complexity of an algorithm. The computational complexity is split into two parts: 

1) Time complexity - the amount of time the algorithm needs to run relative to the input size
2) Space complexity - the amount of memory allocated by the algorithm relative to the input size


**Typically, people care about the time complexity more than the space complexity, but both are important to know**.

<p>

There are some common assumptions that we make. When dealing with integers, the larger the integer, the more time operations like addition, multiplication, or printing will take. While this **is** relevant in theory, we typically ignore this fact because the difference is practically very small, and treat all integers the same. If you are given an array of integers as an input, the only variable you would use is _n_ to denote the length of the array. Technically, you could introduce another variable, let's say _k_, which denotes the average value of the integers in the array. However, nobody does this.

<p>

Here are some examples of complexities:

* $O(n)$
* $O(n^{2})$
* $O(2^{n})$
* $O(\log n)$
* $O(n \cdot m)$

<p>

You might be thinking, what is _m_ ? Remember: we define the variables. As these are simple examples with no associated problem, _m_ could denote any arbitrary variable. For example, we could have a problem where the input is two arrays. _n_ could denote the length of one while _m_ denotes the length of the other.

<p>

**Calculating complexity**

Using the above example (find the largest number in `nums`), we have a time complexity of **O(n)**. The algorithm involves iterating over each element in `nums`, so if we define _n_ as the length of `nums`, our algorithm uses approximately _n_ steps. If we pass an array with a length of `10`, it will perform approximately `10` steps. If we pass an array with a length of `10,000,000,000`, it will perform approximately `10,000,000,000` steps.

**NOTE:**
- Being able to analyze an algorithm and calculate its time and space complexity is a crucial skill. Interviewers will **almost always** ask you for your algorithm's complexity to check that you actually understand your algorithm and didn't just memorize/copy the code. Being able to analyze an algorithm also enables you to determine what parts of it can be improved.

---

**Rules**

There are a few rules when it comes to calculating complexity. First, **we ignore constants**. That means $O(9999999n) = O(8n) = O(n) = O(\frac{n}{500})$. Why do we do this? Imagine you had two algorithms. Algorithm A uses approximately _n_ operations and algorithm B uses approximately _5n_ operations. 

<p>

When _n_ = 100, algorithm A uses 100 operations and algorithm B uses 500 operations. What happens if we double _n_? Then algorithm A uses 200 operations and algorithm B uses 1000 operations. As you can see, when we double the value of _n_, both algorithms require double the amount of operations. If we were to _10x_ the value of _n_, then both algorithms would require _10x_ more operations.

<p>

Remember: the point of complexity is to analyze the algorithm **as the input changes**. We don't care that algorithm B is _5x_ slower than algorithm A. For both algorithms, as the input size increases, the number of operations required increases **linearly**. That's what we care about. Thus, both algorithms are **O(n)**.
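
To make the "ignore constants" rule concrete, here is a small sketch that counts operations for two stand-in algorithms. The functions `steps_a` and `steps_b` are illustrative assumptions: they simply perform _n_ and _5n_ units of work, matching algorithms A and B in the text.

```python
def steps_a(n):
    # Algorithm A: one unit of work per element.
    count = 0
    for _ in range(n):
        count += 1
    return count


def steps_b(n):
    # Algorithm B: five units of work per element.
    count = 0
    for _ in range(n):
        for _ in range(5):
            count += 1
    return count


for n in (100, 200, 1000):
    print(n, steps_a(n), steps_b(n))
# 100 100 500
# 200 200 1000
# 1000 1000 5000
```

Doubling `n` doubles both counts and multiplying `n` by 10 multiplies both by 10, which is exactly the linear growth that $O(n)$ describes.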

<p>

The second rule is that we consider the complexity as the variables **tend to infinity**. When we have addition or subtraction between terms of the **same variable, we ignore all terms except the most powerful one**.

For example, $O(2^{n} + n^{2} - 500) = O(2^{n})$. Why? Because as _n_ tends to infinity, $2^{n}$ becomes so large that the other two terms are effectively zero in comparison. 

Let's say that we had an algorithm that required _n_ + 500 operations. It has a time complexity of $O(n)$. When _n_ is small, let's say n = 5, the +500 term is very significant - but we don't care about that. We need to perform the analysis as if _n_ is tending toward infinity, and in that scenario, the 500 is nothing.

<p>

**NOTE:**
* The best complexity possible is **O(1)**, called "constant time" or "constant space". It means that the algorithm ALWAYS uses the same amount of resources, regardless of the input.

Note that a constant time complexity doesn't necessarily mean that an algorithm is fast $(O(5000000) = O(1))$; it just means that its runtime is independent of the input size.

<p>

When talking about complexity, there are normally three cases:

* Best case scenario 
* Average case 
* Worst case scenario 

<p>

In most algorithms, all three of these will be equal, but some algorithms will have them differ. If you have to choose only one to represent the algorithm's time or space complexity, never choose the best case scenario. It is most correct to use the worst case scenario, but you should be able to talk about the difference between the cases. 
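
As an illustration of cases that differ, consider a linear search. This example is not from the text above; it is a minimal sketch in which the hypothetical helper `linear_search_steps` also reports how many comparisons it made.

```python
def linear_search_steps(nums, target):
    # Returns (index, comparisons); index is -1 when target is absent.
    comparisons = 0
    for i, num in enumerate(nums):
        comparisons += 1
        if num == target:
            return i, comparisons
    return -1, comparisons


nums = [4, 8, 15, 16, 23, 42]
print(linear_search_steps(nums, 4))   # best case, target is first: (0, 1)
print(linear_search_steps(nums, 42))  # target is last: (5, 6)
print(linear_search_steps(nums, 99))  # worst case, target absent: (-1, 6)
```

The best case takes a single comparison, which is $O(1)$, while the worst case inspects all _n_ elements, which is $O(n)$. Quoting the worst case is the safer summary.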

<p>

---




In [7]:
def maximumNumber(nums):
    """
    : create a variable `maxNum` and initialize it to `0`
    : Iterate over each element `num` in `nums` 
    : If `num` is greater than `maxNum`, update `maxNum = num`
    : Output `maxNum`
    """
    maxNum = 0

    for num in nums:
        if num > maxNum:
            maxNum = num 

    return maxNum

In [8]:
#test 
nums = [2, 5, 20, 50, 200, 120, 180]

sol = maximumNumber(nums)
print(sol)


200


### Analyzing time complexity 

Let's look at some example algorithms in pseudo-code and talk about their time complexities.

```
// Given an integer array "arr" with length n,

for (int num: arr) {
    print(num)
}

```

This algorithm has a time complexity of $O(n)$. In each for loop iteration, we are performing a print, which costs $O(1)$. The for loop iterates $n$ times, which gives a time complexity of $O(1 \cdot n) = O(n)$.

```
// Given an integer array "arr" with length n,

for (int num: arr) {
    for (int i = 0; i < 500000; i++) {
        print(num)
    }
}
```
This algorithm has a time complexity of $O(n)$. In each inner for loop iteration, we are performing a print, which costs $O(1)$. This for loop iterates 500,000 times, which means each outer for loop iteration costs $O(500000) = O(1)$. The outer for loop iterates $n$ times, which gives a time complexity of $O(n)$.

<p>

Even though the first two algorithms technically have the same time complexity, in reality the second algorithm is **much** slower than the first one. It's correct to say that the time complexity is $O(n)$, but it's important to be able to discuss the difference between practicality and theory. 

```
// Given an integer array "arr" with length n,

for (int num: arr) {
    for (int num2: arr) {
        print(num * num2)
    }
} 
```
This algorithm has a time complexity of $O(n^{2})$. In each inner for loop iteration, we are performing a multiplication and a print, which both cost $O(1)$. The inner for loop runs $n$ times, which means each outer for loop iteration costs $O(n)$. The outer for loop runs $n$ times, which gives a time complexity of $O(n \cdot n) = O(n^{2})$.

<p>

```
// Given integer arrays "arr" with length n and "arr2" with length m,

for (int num: arr) {
    print(num)
}
for (int num: arr) {
    print(num)
}
for (int num: arr2) {
    print(num)
}
```

This algorithm has a time complexity of $O(n+m)$. The first two for loops both cost $O(n)$, whereas the final for loop costs $O(m)$. This gives a time complexity of $O(2n+m) = O(n+m)$.

```
// Given an integer array "arr" with length n,

for (int i = 0; i < arr.length; i++) {
    for (int j = i; j < arr.length; j++) {
        print(arr[i] + arr[j])
    }
}
```
This algorithm has a time complexity of $O(n^{2})$. The inner for loop is dependent on what iteration the outer for loop is currently on. The first time the inner for loop is run it runs $n$ times. The second time, it runs $n-1$ times, then $n-2$, $n-3$, and so on.

<p>

That means the total number of iterations is $1 + 2 + 3 + 4 + \dots + n$, which is the partial sum of [this series](https://en.wikipedia.org/wiki/1_%2B_2_%2B_3_%2B_4_%2B_%E2%8B%AF#Partial_sums), which is equal to $\frac{n \cdot (n+1)}{2} = \frac{n^{2}+n}{2}$. In big-O, this is $O(n^{2})$ because the addition term in the numerator and the constant term in the denominator are both ignored.
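
The closed form can be checked empirically. The following sketch translates the nested loop into Python and counts how often the inner body runs; `count_pair_iterations` is an illustrative name.

```python
def count_pair_iterations(n):
    # Counts how many times the inner loop body runs when j starts at i.
    count = 0
    for i in range(n):
        for j in range(i, n):
            count += 1
    return count


for n in (1, 5, 10, 100):
    # The count matches the closed form n * (n + 1) / 2.
    assert count_pair_iterations(n) == n * (n + 1) // 2
    print(n, count_pair_iterations(n))
# 1 1
# 5 15
# 10 55
# 100 5050
```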

<p>

### Logarithm time

A logarithm is the inverse operation to exponentiation. The time complexity $O(\log n)$ is called logarithmic time and is **extremely fast**. A common time complexity is $O(n \cdot \log n)$, which is reasonably fast for most problems and also the time complexity of efficient sorting algorithms.

Typically, the base of the logarithm will be `2`. This means that if your input is size $n$, then the algorithm will perform $x$ operations, where $2^{x} = n$. However, the base of the logarithm [doesn't actually matter](https://stackoverflow.com/questions/1569702/is-big-ologn-log-base-e/1569710#1569710) for big $O$, since all logarithms are related by a constant factor.

$O(\log n)$ means that somewhere in your algorithm, the input is being reduced by a percentage at every step. A good example of this is binary search, which is a searching algorithm that runs in $O(\log n)$ time (there is a chapter dedicated to binary search later on). With binary search, we initially consider the entire input ($n$ elements). After the first step, we only consider $n/2$ elements. After the second step, we only consider $n/4$ elements, and so on. At each step, we are reducing our search space by `50%`, which gives us a logarithmic time complexity.
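
As a rough preview of the halving behaviour, here is a minimal binary search sketch that also counts its steps. The function name and the step counter are illustrative additions, and the input is assumed to be sorted in ascending order.

```python
def binary_search_steps(nums, target):
    # nums must be sorted in ascending order.
    # Returns (index, steps); index is -1 when target is absent.
    left, right = 0, len(nums) - 1
    steps = 0
    while left <= right:
        steps += 1
        mid = (left + right) // 2
        if nums[mid] == target:
            return mid, steps
        if nums[mid] < target:
            left = mid + 1   # discard the left half
        else:
            right = mid - 1  # discard the right half
    return -1, steps


nums = list(range(1_000_000))
index, steps = binary_search_steps(nums, 999_999)
print(index, steps)  # index is 999999; steps is at most 20, since 2**20 > 1,000,000
```

Even with a million elements, the search finishes in about 20 steps, because each step halves the remaining search space.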

<p>

### Analyzing space complexity

When you initialize variables like arrays or strings, your algorithm is allocating memory. We never count the space used by the input (it is bad practice to modify the input), and usually don't count the space used by the output (the answer) unless an interviewer asks us to.

```
In the below examples, the code is only allocating memory so that we can analyze the space complexity, so we will consider everything we allocate as part of the space complexity (there is no "answer").
```

```
// Given an integer array "arr" with length n
for (int num: arr) {
    print(num)
}
```
This algorithm has a space complexity of $O(1)$. The only space allocated is an integer variable `num`, which is constant relative to $n$.

```
// Given an integer array "arr" with length n
array doubleNums = int[]
for (int num: arr) {
    doubleNums.add(num * 2)
}
```
This algorithm has a space complexity of $O(n)$. The array `doubleNums` stores $n$ integers at the end of the algorithm.

```
// Given an integer array "arr" with length n
array nums = int[]
int oneHundredth = n / 100

for (int i = 0; i < oneHundredth; i++) {
    nums.add(arr[i])
}
```
This algorithm has a space complexity of $O(n)$. The array `nums` stores the first 1% of numbers in `arr`. This gives a space complexity of $O(\frac{n}{100}) = O(n)$.

```
// Given integer arrays "arr" with length n and "arr2" with length m,
Array grid = int[n][m]
for (int i = 0; i < arr.length; i++) {
    for (int j = 0; j < arr2.length; j++) {
        grid[i][j] = arr[i] * arr2[j]
    }
}
```
This algorithm has a space complexity of $O(n \cdot m)$. We are creating a `grid` that has dimensions $n \times m$.
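
For readers following along in Python, two of the pseudocode examples above might look like the sketch below. The function names are illustrative, and the comments state the extra space each one allocates.

```python
def double_nums(arr):
    # O(n) extra space: the new list grows to n elements.
    doubled = []
    for num in arr:
        doubled.append(num * 2)
    return doubled


def product_grid(arr, arr2):
    # O(n * m) extra space: n rows, each holding m products.
    return [[a * b for b in arr2] for a in arr]


arr, arr2 = [1, 2, 3], [10, 20]
print(double_nums(arr))         # [2, 4, 6]
grid = product_grid(arr, arr2)
print(len(grid), len(grid[0]))  # 3 2
print(grid)                     # [[10, 20], [20, 40], [30, 60]]
```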

**NOTE:**
* In this course, we will talk extensively about time and space complexity. If it's a new concept to you, don't worry - with practice, you will become more and more comfortable with analyzing algorithms on your own. 

## Introduction to recursion 

* Recursion is a problem solving method. In code, recursion is implemented using a function that calls itself. 

The opposite of a recursive algorithm would be an **iterative algorithm**. There [is a branch](https://en.wikipedia.org/wiki/Computability_theory) of study that proves that any iterative algorithm can be written recursively. While iterative algorithms use `for` loops and `while` loops to simulate repetition, recursive algorithms use function calls to simulate the same logic.

Let's say we wanted to print the numbers from 1 to 10. Here's some pseudocode for an iterative algorithm:

```
for (int i = 1; i <= 10; i++) {
    print(i)
}
```

In [2]:
#print 1 to 10, starting from 1
for i in range(1, 11):
    print(i)

1
2
3
4
5
6
7
8
9
10


Here's some pseudocode for an equivalent recursive algorithm:

```
function fn(i):
    print(i)
    fn(i + 1)
    return 

fn(1)
```

**NOTE:** This function will run indefinitely as there is no base case to terminate the function.

* Each call to `fn` first prints `i` (which starts at 1), and then calls `fn` again but incrementing `i` (to print the next number).

`The first function call prints 1, then calls fn(2). In fn(2), we print 2, then call fn(3), and so on.`

<p>

However, this code is actually wrong. Do you see the problem? The function calls will never stop! Running this code would print natural numbers (positive integers) infinitely (or until the computer explodes). The `return` line never gets reached because `fn(i + 1)` comes before it.
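
In CPython, the calls do not literally run forever: the interpreter enforces a recursion limit (see `sys.getrecursionlimit()`) and raises `RecursionError` once the call stack gets too deep. The sketch below is a variant of `fn` that counts calls instead of printing; the names `count_calls` and `calls_before_error` are illustrative.

```python
import sys


def count_calls(i, counter):
    # Same shape as fn(i) above, but counts calls instead of printing.
    counter[0] += 1
    count_calls(i + 1, counter)  # no base case, so this never returns normally


def calls_before_error():
    counter = [0]
    try:
        count_calls(1, counter)
    except RecursionError:
        # Raised by the interpreter once the call stack exceeds its limit.
        pass
    return counter[0]


print(sys.getrecursionlimit())  # commonly 1000, but configurable
print(calls_before_error())     # typically a little under the limit
```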


In [4]:
def recursive_fun(i):
    print(i)
    recursive_fun(i + 1)
    return 

#recursive_fun(1)
    

To stop the function from running indefinitely and overflowing the call stack, we can make it return once it reaches a base case.

We need what is called a **base case** to make the recursion stop. Base cases are conditions at the start of recursive functions that terminate the calls.

In [8]:
def recursive_func(i):
    if i > 10:
        return 
    print(i)
    recursive_func(i + 1)
    return 

recursive_func(1)

1
2
3
4
5
6
7
8
9
10


After we call `recursive_func(10)`, we print `10` and call `recursive_func(11)`. In the `recursive_func(11)` call, we trigger the base case and return. So now we are back in the call to `recursive_func(10)` and move to the next line, which is the return statement. This makes us return back to the `recursive_func(9)` call and so on, until we eventually return from the `recursive_func(1)` call and the algorithm terminates.

An important thing to understand about recursion is the **order** in which the code runs - the order in which the computer executes instructions. With an iterative program, it's easy - start at the top, and go line by line. With recursion, it can get confusing because calls can cascade on top of each other. Let's print numbers again, but this time only up to 3. Let's also add another print statement after the recursive call:

In [11]:
def fn(i):
    if i > 3:
        return 
    
    print(i)
    fn(i + 1)
    print(f"End of call where i = {i}")
    return 

fn(1)

1
2
3
End of call where i = 3
End of call where i = 2
End of call where i = 1


As you can see, the lines that print text are executed in reverse order. The original call `fn(1)` first prints `1`, then calls `fn(2)`, which prints `2`, then calls `fn(3)`, which prints `3`, then calls `fn(4)`. **Now, this is the important part:** how recursion "moves" back "up". `fn(4)` triggers the base case and returns. We are now back in the function call where `i = 3`, and the recursive call `fn(i + 1)` has finished, so we move to the next line, which prints `End of call where i = 3`. Once that line runs, we move to the next line, which is a `return`. Now we are back in the function call where `i = 2`, and again the recursive call `fn(i + 1)` has finished, so we move to the next line and print `End of call where i = 2`. This repeats until the original function call to `fn(1)` returns.

<p>

Note that each function call also has its own scope. So in the example above, when we call `fn(3)`, there are 3 "versions" of `i` simultaneously. The first call has `i = 1`, the second call has `i = 2`, and the third call has `i = 3`. Let's say that we were to do `i += 1` in the `fn(3)` call. Then `i` becomes `4`, but **only** in the `fn(3)` call. The other 2 "versions" of `i` are unaffected because they are in different scopes.
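
The per-call scope can be demonstrated with a small variant of `fn`. This is a sketch: the extra `seen` list parameter is an illustrative addition used to record what each call observes after its recursive call returns.

```python
def fn(i, seen):
    if i > 3:
        return
    fn(i + 1, seen)
    seen.append(i)  # still the value this call was given
    i += 100        # rebinds only this call's local i


seen = []
fn(1, seen)
print(seen)  # [3, 2, 1]
```

Every call modifies its own `i` before returning, yet the calls higher up still see their original values, because each call has a separate local `i`.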

<p>

### Breaking problems down

This printing example is pretty pointless - it's easier to use a for loop if you just want to print numbers. Where recursion shines is when you use it to break down a problem into "subproblems", whose solutions can then be combined to solve the original problem. 

Let's look at the [Fibonacci numbers](https://en.wikipedia.org/wiki/Fibonacci_sequence). The Fibonacci numbers are a sequence of numbers starting with `0, 1`. Then, each number is defined as the sum of the previous two numbers. The first few Fibonacci numbers are `0, 1, 1, 2, 3, 5, 8`. More formally, we have

$F_{n} = F_{n-1} + F_{n-2}$

This is called a **recurrence relation** - it's an equation that connects the terms together.

Let's write a function `fib(n)` that returns the $n^{th}$ Fibonacci number (0-indexed). **Don't forget: we need base cases with any recursive function.** In this case, the base cases are explicitly defined: `F(0) = 0` and `F(1) = 1`.

In [17]:
def fib(n):
    if n <= 1:
        return n 
    
    oneback = fib(n - 1)
    twoback = fib(n - 2)

    return oneback + twoback

fib(3)

2

Let's say that we wanted to find `fib(3)`. Upon calling `fib(3)`, we would see the following flow, with each indentation level representing a function call's scope:

```
oneBack = fib(2)
    oneBack = fib(1)
        fib(1) = 1
    twoBack = fib(0)
        fib(0) = 0
    fib(2) = oneBack + twoBack = 1
twoBack = fib(1)
    fib(1) = 1
fib(3) = oneBack + twoBack = 2
```

* As you can see, we took the original problem `fib(3)`, and broke it down into two smaller subproblems - `fib(2)` and `fib(1)`. By combining the recurrence relation and base cases, we can solve the subproblems and use those solutions to solve the original problem.

This is the most common use of recursion - you have your recursive function **return the answer to the problem you're trying to solve for a given input**. In this example, the problem we're trying to solve for a given input is "What is the $n^{th}$ Fibonacci number ?" As such, we designed our function to return a Fibonacci number, according to the input $n$. By determining the base cases and a recurrence relation, we can easily implement the function. 

<p>

By following this idea, solving the subproblems is easy - if we wanted the 100th Fibonacci number, we know by definition that it is the sum of the 99th and 98th Fibonacci numbers. On the function call to `fib(100)`, we know that calling `fib(99)` and `fib(98)` will give us those numbers.
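
One caveat is that the plain recursive version recomputes the same subproblems many times. The sketch below counts the calls it makes; `fib_with_count` and its `counter` parameter are illustrative additions that use the same recursion as `fib`.

```python
def fib_with_count(n, counter):
    # Same recursion as fib above; counter[0] tracks how many calls are made.
    counter[0] += 1
    if n <= 1:
        return n
    return fib_with_count(n - 1, counter) + fib_with_count(n - 2, counter)


for n in (5, 10, 20):
    counter = [0]
    value = fib_with_count(n, counter)
    print(n, value, counter[0])
# 5 5 15
# 10 55 177
# 20 6765 21891
```

The call count grows very quickly with `n`, so computing the 100th number this way is impractical. Caching each result the first time it is computed (memoization) avoids the repeated work.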

In [22]:
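def fibonacci(n, memo=None):
    """
    :memoized Fibonacci function (caches results to avoid recomputation)
    """
    if memo is None:
        memo = {}  # fresh cache per top-level call, not a shared mutable default

    if n <= 1:
        return n
    
    if n not in memo:
        memo[n] = fibonacci(n - 1, memo) + fibonacci(n - 2, memo)

    return memo[n]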

# Test
fibonacci_number = fibonacci(100)
print(f"The 100th Fibonacci number is: {fibonacci_number}")

The 100th Fibonacci number is: 354224848179261915075


## Arrays and strings
In terms of algorithm problems, arrays (1D) and strings are very similar: they both represent an ordered group of elements. Most algorithm problems will include either an array or string as part of the input, so it's important to be comfortable with the basic operations and learn the most common patterns.

"Array" can mean something different between languages. For example, Python primarily uses "lists" instead of arrays, and lists are extremely lenient. Initialization is as easy as `arr = []`, and you don't need to worry about the type of data you store in the list or the size of the list. Other languages like C++ require you to specify the size and data type of the array during initialization, but also have support for lists (like `std::vector` in C++).

<p>

* Technically, an array can't be resized. A dynamic array, or list, can be. In the context of algorithm problems, usually when people talk about arrays, they are referring to dynamic arrays.

Similarly, strings are implemented differently between languages. In Python and Java, they are `immutable`. In C++, they are `mutable`.

- **Mutable:** a type of data that can be changed
- **Immutable:** a type of data that cannot be changed.
 
If you want to change something **immutable**, you will have to recreate the entire thing. 

Why should we care about something being mutable or immutable? If you have an array `arr = ["a", "b", "c"]` and an immutable string `s = "abc"`, but you want to instead represent `abd`, then with the array you can easily do:

In [35]:
arr = ["a", "b", "c"]
arr[2] = "d" # replace "c" with "d"
arr

['a', 'b', 'd']

But you cannot do:

In [28]:
s = 'abc'
s[2] = 'd'

TypeError: 'str' object does not support item assignment

In [44]:
print(dir(s))

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


In [45]:
help(s.endswith)


Help on built-in function endswith:

endswith(...) method of builtins.str instance
    S.endswith(suffix[, start[, end]]) -> bool
    
    Return True if S ends with the specified suffix, False otherwise.
    With optional start, test S beginning at that position.
    With optional end, stop comparing S at that position.
    suffix can also be a tuple of strings to try.



In [47]:
help(s.startswith)

Help on built-in function startswith:

startswith(...) method of builtins.str instance
    S.startswith(prefix[, start[, end]]) -> bool
    
    Return True if S starts with the specified prefix, False otherwise.
    With optional start, test S beginning at that position.
    With optional end, stop comparing S at that position.
    prefix can also be a tuple of strings to try.



In [49]:
methods = [method for method in dir(str) if callable(getattr(str, method))]
print(methods)


['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


In [39]:
arr.append('e')
print(dir(arr)) # see all available methods for a list


['__add__', '__class__', '__class_getitem__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']


In [48]:
help(arr.remove)

Help on built-in function remove:

remove(value, /) method of builtins.list instance
    Remove first occurrence of value.
    
    Raises ValueError if the value is not present.



Because strings are immutable, changing a single character means you would need to create `s` entirely from scratch. With such a small string, it's not a big deal. But sometimes you are dealing with strings with 100,000 characters, so creating a new version just to modify one character is very expensive: $O(n)$, where $n$ is the size of the string.
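
Two common ways to "recreate" a Python string with one character changed are sketched below. Both build a new string and leave the original untouched; the variable names are illustrative.

```python
s = "abc"

# Option 1: slice and concatenate; builds a brand-new string.
s_sliced = s[:2] + "d" + s[3:]

# Option 2: convert to a list (mutable), edit in place, then join back.
chars = list(s)
chars[2] = "d"
s_joined = "".join(chars)

print(s_sliced, s_joined)  # abd abd
print(s)                   # abc (the original string is unchanged)
```

When many characters need to change, the list approach is usually preferable: each edit on the list is $O(1)$, and the $O(n)$ cost of building the new string is paid only once in `join`.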

<p>

As mentioned earlier, a majority of algorithm problems will involve an array or string. They are extremely versatile data structures and it's impossible to list all the relevant problem-solving techniques in one article. In the next few articles, we will go over the most common techniques.

- **But first, let's take a quick look at the complexity of array and string operations.**

## Two Pointers 

Two pointers is an extremely common technique used to solve array and string problems. It involves having two integer variables that both move along an iterable. This means we have two integers, usually named something like `i` and `j` or `left` and `right` which each represent an `index` of the array or string.


<p>

There are several ways to implement two pointers. To start, let's look at the following method: 

```
Start the pointers at the edges of the input. Move them towards each other until they meet.
```

**Converting this idea into instructions:**
1) Start one pointer at the first index `0` and the other pointer at the last index `input.length - 1`.

2) Use a `while loop` until the pointers are equal to each other.

3) At each iteration of the loop, move the pointers towards each other. This means either increment the pointer that started at the first index, decrement the pointer that started at the last index, or both. Deciding which pointers to move will depend on the problem we are trying to solve.

**Here's some pseudocode illustrating the concept:**

```
function fn(arr):
    left = 0
    right = arr.length - 1

    while left < right:
        Do some logic here depending on the problem
        Do some more logic here to decide on one of the following:
        1. left++
        2. right--
        3. Both left++ and right--
```

- The strength of this technique is that we will never have more than $O(n)$ iterations for the while loop because the pointers start $n$ away from each other and move at least one step closer in every iteration. Therefore, if we can keep the work inside each iteration at $O(1)$, this technique will result in a linear runtime, which is usually the best possible runtime. Let's look at some examples:

<p>

### Example 1:
Given a string `s`, return `true` if it is a palindrome, `false` otherwise.

A string is a palindrome if it reads the same forward as backward. That means, after reversing it, it is still the same string. For example: "abcdcba" or "racecar".

After reversing a string, the first character becomes the last character. If a string is the same after being reversed, that means the first character is the same as the last character, the second character is the same as the second last character, and so on. We can use the two pointers technique here to check that all corresponding characters are equal. To start, we check the first and last characters using two separate pointers. To check the next pair of characters, we just need to move our pointers toward each other one position. We continue until the pointers meet each other or we find a mismatch.

- **NOTE:** We keep track of two indices: a left one, and a right one. In the beginning, the left index points to the first character, and the right index points to the last character. If these characters are not equal to each other, we know the string can't be a palindrome, so we return false. Otherwise, the string may be a palindrome; we need to check the next pair. To move on to the next pair, we move the left index forward by one, and the right index backward by one. Again, we check if the pair of characters are equal, and if they aren't, we return false.

- We continue this process until we either find a mismatch (in which case the string cannot be a palindrome, so we return false), or the pointers meet each other (which indicates we have gone through the entire string, checking all pairs). If we get through all pairs without a mismatch, we know the string is a palindrome, so we can return true.

- To run the algorithm until the pointers meet each other, we can use a while loop. Each iteration in the while loop checks one pair. If the check is successful, we increment `left` and decrement `right` to move to the next pair. If the check is unsuccessful, we return false.
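
The steps above can be sketched in Python as follows. The function name `check_if_palindrome` is illustrative; the logic follows the description directly.

```python
def check_if_palindrome(s):
    left = 0
    right = len(s) - 1

    while left < right:
        if s[left] != s[right]:
            return False  # mismatch: cannot be a palindrome
        left += 1
        right -= 1

    return True  # every pair matched


print(check_if_palindrome("racecar"))  # True
print(check_if_palindrome("abcdcba"))  # True
print(check_if_palindrome("abca"))     # False
```

The loop runs at most $n/2$ times with $O(1)$ work per iteration, so the time complexity is $O(n)$, and only two integer variables are allocated, so the space complexity is $O(1)$.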

### Example 2:
Given a **sorted** array of unique integers and a target integer, return `true` if there exists a pair of numbers that sum to target, `false` otherwise. This problem is similar to Two Sum (in Two Sum, the input is not sorted).

For example, given nums = [1, 2, 4, 6, 8, 9, 14, 15] and target = 13, return true because 4 + 9 = 13.

In [59]:
def two_sum(nums, target):
    # Create a dictionary to store the numbers we have seen and their indices
    num_to_index = {}
    
    for index, num in enumerate(nums):
        # Calculate the complement of the current number
        complement = target - num
        
        # Check if the complement is in the dictionary
        if complement in num_to_index:
            print(f'The sum of {num} and {complement} is: {num + complement}')
            return (num_to_index[complement], index)
        
        # Store the current number in the dictionary
        num_to_index[num] = index
    
    # If no such pair is found
    return None

# Test function 
nums = [1, 2, 4, 6, 8, 9, 14, 15]
result = two_sum(nums, target=13)
print(result)

The sum of 9 and 4 is: 13
(2, 5)


The brute force solution would be to iterate over all pairs of integers. Each number in the array can be paired with another number, so this would result in a **time complexity** of $O(n^{2})$. 
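For comparison, the brute force idea can be sketched as follows. The function name `has_pair_brute_force` is made up for illustration; it checks every pair once and does not rely on the array being sorted.

```python
def has_pair_brute_force(nums: list[int], target: int) -> bool:
    """Check every pair (i, j) with i < j: O(n^2) time, O(1) space."""
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return True
    return False


print(has_pair_brute_force([1, 2, 4, 6, 8, 9, 14, 15], 13))  # True (4 + 9)
print(has_pair_brute_force([1, 2, 4], 10))                   # False
```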

- Because the array is sorted, we can use two pointers to improve to an $O(n)$ time complexity. To implement this algorithm, we use a similar process as in the previous `palindrome` example. We use a while loop that runs until the pointers meet each other. If at any point the sum is equal to the `target`, we can return true. If the pointers meet each other, it means we went through the entire input without finding a pair that sums to `target`, so we return false.


**Convert idea to instructions**
- Assume the array `[1, 2, 4, 6, 8, 9, 14, 15]` with a target sum of `13`.
1) With the two pointers, we start by looking at the first and last numbers.
2) Their sum is `1 + 15 = 16`. Because `16 > target`, we need to make the current sum smaller, so we move the right pointer.
3) Now we have `1 + 14 = 15`. This is again greater than the target, so we move the right pointer.
4) Now we have `1 + 9 = 10`. Since this is too small, we need to move the left pointer.
5) We have `2 + 9 = 11`, which is still smaller than the target, so we move the left pointer again.
6) Finally, we have `4 + 9 = 13`, which is equal to the target.


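The algorithm works because the array is sorted: moving the left pointer forward can only increase the sum, and moving the right pointer backward can only decrease it. So when the sum is too large, the number at the right pointer cannot form a valid pair with any of the remaining numbers and can be discarded; the same holds for the number at the left pointer when the sum is too small.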

In [65]:
def check_for_target(num: list[int], target: int) -> bool:
    left = 0 
    right = len(num) - 1

    while left < right:
        curr = num[left] + num[right] 
        if curr == target:
            print(f'The sum of {num[left]} and {num[right]} is {curr}, which is equal to target')
            return True 
        
        if curr > target:
            right -= 1
        else:
            left += 1
    
    return False

# Test function 
nums = [1, 2, 4, 6, 8, 9, 14, 15]
result = check_for_target(nums, target=13)
print(result)

The sum of 4 and 9 is 13, which is equal to target
True


### Another way to use two pointers 
This method, where we start the pointers at the first and last indices and move them towards each other, is only one way to implement two pointers. Algorithms are beautiful because of how abstract they are - "two pointers" is just an idea, and it can be implemented in many different ways. Let's look at another method and some new examples. The following method is applicable when the problem has two iterables in the input, for example, two arrays.

_Move along both inputs simultaneously until all elements have been checked._

**Converting this idea into instructions:**
1) Create two pointers, one for each iterable. Each pointer should start at the first index.
2) Use a while loop until one of the pointers reaches the end of its iterable.
3) At each iteration of the loop, move the pointers forward. This means incrementing either one of the pointers or both of the pointers. Deciding which pointers to move will depend on the problem we are trying to solve. 
4) Because our while loop will stop when one of the pointers reaches the end, the other pointer will not be at the end of its respective iterable when the loop finishes. Sometimes, we need to iterate through all elements - if this is the case, you will need to write extra code here to make sure both iterables are exhausted. 

_Here's some pseudocode illustrating the concept:_

```
function fn(arr1, arr2):
    i = j = 0
    while i < arr1.length AND j < arr2.length:
        Do some logic here depending on the problem
        Do some more logic here to decide on one of the following:
            1. i++
            2. j++
            3. Both i++ and j++

    // Step 4: make sure both iterables are exhausted
    // Note that only one of these loops would run
    while i < arr1.length:
        Do some logic here depending on the problem
        i++

    while j < arr2.length:
        Do some logic here depending on the problem
        j++
```

Similar to the first method we looked at, this method will have a linear time complexity of $O(n + m)$ if the work inside the while loop is $O(1)$, where $n = arr1.length$ and $m = arr2.length$. This is because at every iteration, we move at least one pointer forward, and the pointers cannot be moved forward more than $n + m$ times in total without the arrays being exhausted. Let's look at some examples.


### Example 3: 

Given two sorted integer arrays `arr1` and `arr2`, return a new sorted array that contains all the elements of both.


* The trivial approach would be to first combine both input arrays and then perform a sort. If we have $n = arr1.length + arr2.length$, then this gives a time complexity of $O(n \cdot \log n)$ (the cost of sorting). This would be a good approach if the input arrays were not sorted, but because they are sorted, we can take advantage of the two pointers technique to improve to $O(n)$.

* In the explanation prior to this example, we declared $n = arr1.length$ and $m = arr2.length$. Here, we are saying $n = arr1.length + arr2.length$. Why? Remember that when it comes to big $O$, we are allowed to define the variables as we see fit. We could certainly stick to using $n, m$. In that case, the time complexity of the sorting approach would be $O((n+m) \cdot \log(n+m))$ and the time complexity of the approach we are about to cover would be $O(n+m)$. It doesn't really make a difference, but one justification we could give here is that since we are combining the arrays, the total length is a significant number, so it makes sense to represent it as $n$.

* We can build the answer array as $ans$ one element at a time. Start two pointers at the first index of each array, and compare their elements. At each iteration, we have 2 values. Whichever value is lower needs to come first in the answer, so add it to the answer and move the respective pointer. 
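As a baseline, the trivial approach described in the first bullet can be sketched in a couple of lines. The name `combine_by_sorting` is hypothetical; the point is only to have something to compare the two pointers version against.

```python
def combine_by_sorting(arr1: list[int], arr2: list[int]) -> list[int]:
    """Concatenate, then sort: O(n log n) where n = len(arr1) + len(arr2)."""
    return sorted(arr1 + arr2)


print(combine_by_sorting([1, 4, 7, 20], [3, 5, 6]))  # [1, 3, 4, 5, 6, 7, 20]
```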

**Explanation**
* Sorting an array of length $n$ costs $O(n \cdot \log n)$. We can improve the time complexity by a factor of $\log n$ by taking advantage of the input arrays already being sorted.

* If we start with the smallest number from each array, then whichever one is smaller must be before the other one - so we add it to the answer and move to the next number in that array. If the values are equal, it doesn't matter which one we choose - we can arbitrarily choose either. This process can be repeated until one of the arrays runs out of numbers.

* When this happens, we are still left with some numbers in the other array. These numbers are all larger than the largest number in the exhausted array. We should just append them to the answer.

In [None]:
def combine(arr1: list[int], arr2: list[int]) -> list[int]:
    ans = []
    i = j = 0

    while i < len(arr1) and j < len(arr2):
        if arr1[i] < arr2[j]:
            ans.append(arr1[i])
            i += 1
        else:
            ans.append(arr2[j])
            j += 1

    while i < len(arr1):
        ans.append(arr1[i])
        i += 1
    while j < len(arr2):
        ans.append(arr2[j])
        j += 1

    return ans

* Like in the previous two examples, this algorithm has a time complexity of $O(n)$ and uses $O(1)$ space (if we don't count the output as extra space, which we usually don't)
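As a side note, Python's standard library has `heapq.merge`, which lazily merges already-sorted inputs. The sketch below uses it only as a cross-check on made-up sample arrays; it is not the two pointers implementation itself.

```python
import heapq

arr1 = [1, 4, 7, 20]
arr2 = [3, 5, 6]

# heapq.merge returns an iterator over the merged, sorted values
merged = list(heapq.merge(arr1, arr2))
print(merged)  # [1, 3, 4, 5, 6, 7, 20]
```

Comparing this result with the output of a hand-written merge is a quick way to test it on a few inputs.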

### Example 4: Is Subsequence 
Given two strings `s` and `t`, return true if `s` is a subsequence of `t` or false otherwise.

A subsequence of a string is a sequence of characters that can be obtained by deleting some (or none) of the characters from the original string, while maintaining the relative order of the remaining characters. For example, "ace" is a subsequence of "abcde" while "aec" is not.

In this problem, we need to check if the characters of `s` appear in the same order in `t` with gaps allowed. For example, "ace" is a subsequence of "abcde" because "abcde" contains the letters "ace" in the same order - the fact that they aren't consecutive doesn't matter.

We can use two pointers to solve this in linear time. If we find that s[i] == t[j], that means we "found" the letter at position `i` for `s` and we can move on to the next one by incrementing `i`. We should increment `j` at each iteration no matter what (which means we could also implement this algorithm using a for loop). `s` is a subsequence of `t` if we can "find" all the letters of `s`, which means that `i == s.length` at the end of the algorithm. 

#### Further Explanation

For every character in `s`, we need to find a match in `t`. Let's say we have $s = "bc"$ and $t = "abcd"$. Using the two pointers technique, we start by looking at the first character in both strings.

We need to try and match the first character of $s$, which is $"b"$. The first character of $t$ is $"a"$, which is not a match. As such, we will move to the next character in $t$. We don't move forward in $s$ just yet, because we still need to match the $"b"$. The next character of $t$ is $"b"$, and we have found a match. Now, we can move on to the next character in $s$, which is the $"c"$. A character in $t$ can only be matched once, so we must also move forward in $t$. Now, we have another match since the next character in $t$ is also $"c"$.

We have managed to match all the characters in $s$, which means that $s$ is a subsequence of $t$.

As you can see, in both scenarios (match or mismatch), we move forward in $t$. In the match scenario, it's because we can't use a letter in $t$ multiple times. In the mismatch scenario, it's like we're discarding the character since it's not useful. We only move forward in $s$ when we find a match, since our task is to match all characters in $s$.

In [20]:
def is_subsequence(str1, str2):
    i = j = 0

    while i < len(str1) and j < len(str2):
        if str1[i] == str2[j]:
            print(f"string {str1[i]} is a match with string {str2[j]}")
            i += 1

        j += 1

    # str1 is a subsequence only if every one of its characters was matched
    return i == len(str1)


#Test Functions 
str1 = 'ace'
str2 = 'abcde'

func = is_subsequence(str1, str2)
print(func)


string a is a match with string a
string c is a match with string c
string e is a match with string e
True


* Just like all the prior examples, this solution uses $O(1)$ space. The time complexity is linear in the combined length of `str1` and `str2`.
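The text above notes that, because `j` moves forward on every iteration, the same idea could be written with a for loop. A minimal sketch of that variant follows; the name `is_subsequence_for` is made up here, and unlike the while-loop version it keeps scanning `t` after all of `s` has been matched.

```python
def is_subsequence_for(s: str, t: str) -> bool:
    """Return True if s is a subsequence of t, scanning t once."""
    i = 0  # index of the next character of s that still needs a match
    for ch in t:
        if i < len(s) and s[i] == ch:
            i += 1
    return i == len(s)


print(is_subsequence_for("ace", "abcde"))  # True
print(is_subsequence_for("aec", "abcde"))  # False
```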

In [18]:
methods = [method for method in dir(str) if callable(getattr(str, method))]
print(methods)

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


In [21]:
lst_methods = [method for method in dir(list) if callable(getattr(list, method))]

print(lst_methods)
help(lst_methods.append)
help(lst_methods.reverse)

['__add__', '__class__', '__class_getitem__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
Help on built-in function append:

append(object, /) method of builtins.list instance
    Append object to the end of the list.

Help on built-in function reverse:

reverse() method of builtins.list instance
    Reverse *IN PLACE*.



In [11]:
dict_methods = [method for method in dir(dict) if callable(getattr(dict, method))]
print(dict_methods)

['__class__', '__class_getitem__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__init__', '__init_subclass__', '__ior__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__or__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__ror__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']


## Sliding Window
Like **two pointers**, sliding windows work the same with arrays and strings - the important thing is that they're iterables with ordered elements. For the sake of brevity, the first part of this article up until the examples will be focusing on **arrays**. However, all the logic is identical for strings. 

Sliding window is another common approach to solving problems related to arrays. A sliding window is actually implemented using **two pointers**! Before we start, we need to talk about the concept of a **subarray**. 

#### Subarrays
Given an array, a **subarray** is a contiguous section of the array. All the elements must be adjacent to each other in the original array and in their original order. For example, with the array `[1, 2, 3, 4]`, the subarrays (grouped by length) are:

- [1], [2], [3], [4]
- [1, 2], [2, 3], [3, 4]
- [1, 2, 3], [2, 3, 4]
- [1, 2, 3, 4]

**A subarray can be defined by two indices, the start and end**. For example, with `[1, 2, 3, 4]`, the subarray `[2, 3]` has a starting index of `1` and an ending index of `2`. Let's call the starting index the **left bound** and the ending index the **right bound**. Another name for subarray in this context is `"window"`.
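To make the "two indices" view concrete, here is a small sketch that enumerates every `(left, right)` pair and groups the resulting subarrays by length. The function name `subarrays_by_length` is made up for illustration.

```python
def subarrays_by_length(arr: list[int]) -> dict[int, list[list[int]]]:
    """Group every contiguous subarray of arr by its length."""
    groups: dict[int, list[list[int]]] = {}
    for left in range(len(arr)):
        for right in range(left, len(arr)):
            window = arr[left:right + 1]  # left and right bounds are inclusive
            groups.setdefault(len(window), []).append(window)
    return groups


for length, windows in sorted(subarrays_by_length([1, 2, 3, 4]).items()):
    print(length, windows)
```

For `[1, 2, 3, 4]` this reproduces the four groups listed above.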

**When should we use sliding window?**

There is a very common group of problems involving subarrays that can be solved efficiently with sliding window. Let's talk about how to identify these problems.

- **First**, the problem will either explicitly or implicitly define criteria that make a subarray "valid". There are 2 components regarding what makes a subarray valid:

1) A constraint metric. This is some attribute of a subarray. It could be the sum, the number of unique elements, the frequency of a specific element, or any other attribute.


2) A numeric restriction on the constraint metric. This is what the constraint metric should be for a subarray to be considered valid.

**For example**, let's say a problem declares a subarray is valid if it has a sum less than or equal to `10`. The constraint metric here is the sum of the subarray, and the numeric restriction is `<= 10`. A subarray is considered valid if its constraint metric conforms to the numeric restriction, i.e. the sum is less than or equal to `10`.

- **Second**, the problem will ask you to find valid subarrays in some way.

1) The most common task you will see is finding the best valid subarray. The problem will define what makes a subarray better than another. For example, a problem might ask you to find the **longest** valid subarray. 


2) Another common task is finding the number of valid subarrays. We will take a look at this later in the article. 

```
Whenever a problem description talks about subarrays, you should figure out if sliding window is a good option by analyzing the problem description. If you can find the things mentioned above, then it's a good bet.
```
Here is a preview of some of the example problems that we will look at in this article, to help you better understand what sliding window problems look like:

* Find the longest subarray with a sum less than or equal to `k`
* Find the longest substring that has at most one `0`
* Find the number of subarrays that have a product less than `k`

#### The algorithm

The idea behind a sliding window is to consider **only** valid subarrays. Recall that a subarray can be defined by a left bound (the index of the first element) and a right bound (the index of the last element). In sliding window, we maintain two variables `left` and `right`, which at any given time represent the **current subarray** under consideration.

Initially, we have `left = right = 0`, which means that the first subarray we look at is just the first element of the array on its own. We want to expand the size of our `"window"`, and we do that by incrementing `right`. When we increment `right`, this is like "adding" a new element to our window.

But what if after adding a new element, the subarray becomes invalid? We need to "remove" some elements from our window until it becomes valid again. To "remove" elements, we can increment `left`, which shrinks our window.

As we add and remove elements, we are `"sliding"` our window along the input from left to right. The window's size is constantly changing - it grows as large as it can until it's invalid, and then it shrinks. However, it always slides along to the right, until we reach the end of the input.

To explain why this algorithm works, let's look at a specific example. Let's say that we are given a positive integer array `nums` and an integer `k`. We need to find the length of the longest subarray that has a sum less than or equal to `k`. For this example, let `nums = [3, 2, 1, 3, 1, 1]` and `k = 5`.

Initially, we have `left = right = 0`, so our window is only the first element: `[3]`. Now, let's expand to the right until the constraint is broken. This will occur when `left = 0`, `right = 2`, and our window is: `[3, 2, 1]`. The sum here is `6`, which is greater than `k`. We must now shrink the window from the left until the constraint is no longer broken. After removing one element, the window becomes valid again: `[2, 1]`.

Why is it correct to remove this `3` and forget about it for the rest of the algorithm? Because the input only has positive integers, extending a subarray always increases its sum. We know that `[3, 2, 1]` already results in a sum that is too large. There is no way for us to ever have a valid window again if we keep this `3` because if we were to add any more elements from the right, the sum would only get larger. That's why we can forget about the `3` for the rest of the algorithm.

#### Implementation 
Now that you have an idea of how sliding window works, let's talk about how to implement it. For this section, we will use the previous example (find the longest subarray with a sum less than or equal to `k`).

As described above, we need to identify a **constraint metric**. In our example, the constraint metric is the sum of the window. How do we keep track of the sum of the window as elements are added and removed? One way that we could do it is by keeping the window in a separate array. When we add elements from the right, we add them to our array. When we remove elements from the left, we remove the corresponding elements from the array. This way, we can always find the sum of our current window just by summing the elements in the separate array.

This is very inefficient as removing elements and finding the sum of the window will be $O(n)$ operations. How can we do better?

We don't actually need to store the window in a separate array. All we need is some variable, let's call it `curr`, that keeps track of the current sum. When we add a new element from the right, we just do `curr += nums[right]`. When we remove an element from the left, we just do `curr -= nums[left]`. This way, all operations are done in $O(1)$.
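The difference between the two bookkeeping strategies can be seen in a short sketch. It replays the first few steps of the earlier example (`nums = [3, 2, 1, 3, 1, 1]`), keeping both an explicit copy of the window and a running sum; the variable name `window` is an assumption made for this illustration.

```python
nums = [3, 2, 1, 3, 1, 1]

window = []  # explicit copy of the window (the approach to avoid)
curr = 0     # running sum of the window (the approach to prefer)

# Add the first three elements from the right.
for value in nums[:3]:
    window.append(value)
    curr += value

# Remove one element from the left.
removed = window.pop(0)  # O(n): every remaining element shifts over
curr -= removed          # O(1)

print(window, sum(window), curr)  # [2, 1] 3 3
```

Both approaches agree on the sum, but only the running sum avoids re-scanning or shifting the window.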

Next, how do we move the pointers `left` and `right`? Remember, we want to keep expanding our window, and the window always slides to the right - it just might shrink a few times in between. Because `right` is always moving forward, we can use a for loop to iterate `right` over the input. In each iteration of the for loop, we add the element `nums[right]` to our window.

What about `left`? When we move `left`, we are shrinking our window. We only shrink our window when it becomes invalid. By maintaining `curr`, we can easily tell if the current window is valid by checking the condition `curr <= k`. When we add a new element and the window becomes invalid, we may need to remove multiple elements from the left. For example, let's say `nums = [1, 1, 1, 3]` and `k = 3`. When we arrive at the `3` and add it to the window, the window becomes invalid. We need to remove three elements from the left before the window becomes valid again.

This suggests that we should use a while loop to perform the removals. The condition will be `while (curr > k)` (while the window is invalid). To perform the removals, we do `curr -= nums[left]` and then increment `left` in each iteration of the while loop.

Finally, how do we update the answer? In each iteration of the for loop, after the while loop, the current window is valid. We can write code here to update the answer. **The formula for the length of a window is `right - left + 1`**.


- Here's some pseudocode that puts it all together:

```
function fn(nums, k):
    left = 0
    curr = 0
    answer = 0
    for (int right = 0; right < nums.length; right++):
        curr += nums[right]
        while (curr > k):
            curr -= nums[left]
            left++

        answer = max(answer, right - left + 1)

    return answer
```
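To see the pseudocode above in motion, the following sketch records the window after each iteration of the for loop, using the earlier example `nums = [3, 2, 1, 3, 1, 1]` and `k = 5`. The helper name `trace_windows` is made up; it assumes positive integers, as in the example.

```python
def trace_windows(nums: list[int], k: int) -> list[tuple[int, int, int]]:
    """Return (left, right, window_sum) for each right, once the window is valid."""
    left = curr = 0
    steps = []
    for right in range(len(nums)):
        curr += nums[right]
        while curr > k:
            curr -= nums[left]
            left += 1
        steps.append((left, right, curr))
    return steps


for left, right, total in trace_windows([3, 2, 1, 3, 1, 1], 5):
    print(f"left={left} right={right} sum={total} length={right - left + 1}")
```

The longest window in this trace has length `3`, for example indices `2` to `4`, which is `[1, 3, 1]`.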

* Here's some pseudocode for a general template:

```
function fn(arr):
    left = 0
    for (int right = 0; right < arr.length; right++):
        Do some logic to "add" element at arr[right] to window

        while WINDOW_IS_INVALID:
            Do some logic to "remove" element at arr[left] from window
            left++

        Do some logic to update the answer
```
### Why is sliding window efficient?

For any array, how many subarrays are there? If the array has a length of `n`, there are `n` subarrays of length `1`. Then there are `n - 1` subarrays of length `2` (every index except the last one can be a starting index), `n - 2` subarrays of length `3` and so on until there is only `1` subarray of length `n`. This means there are $\sum_{k=1}^{n} k = \frac{n \cdot (n+1)}{2}$ subarrays (it's the partial sum of this [series](https://en.wikipedia.org/wiki/1_%2B_2_%2B_3_%2B_4_%2B_%E2%8B%AF#Partial_sums)). In terms of time complexity, any algorithm that looks at every subarray will be at least $O(n^2)$, which is usually too slow. A sliding window guarantees a maximum of $2n$ window iterations - the right pointer can move $n$ times and the left pointer can move $n$ times. This means if the logic done for each window is $O(1)$, sliding window algorithms run in $O(n)$, which is **much** faster.
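The subarray count can be checked numerically with a short sketch that counts `(left, right)` index pairs directly. The helper name `count_subarrays` is made up.

```python
def count_subarrays(n: int) -> int:
    """Count index pairs (left, right) with 0 <= left <= right < n."""
    return sum(1 for left in range(n) for right in range(left, n))


for n in range(1, 6):
    # direct count vs. the closed form n * (n + 1) / 2
    print(n, count_subarrays(n), n * (n + 1) // 2)
```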

* You may be thinking: there is a while loop inside of the for loop, isn't the time complexity $O(n^2)$? The reason it is still $O(n)$ is that the while loop can only iterate $n$ times in total for the entire algorithm (`left` starts at $0$, only increases, and never exceeds `n`). If the while loop were to run `n` times on one iteration of the for loop, that would mean it wouldn't run at all for all the other iterations of the for loop. This is what we refer to as [amortized analysis](https://en.wikipedia.org/wiki/Amortized_analysis) - even though the worst case for an iteration inside the for loop is $O(n)$, it averages out to $O(1)$ when you consider the entire runtime of the algorithm.
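One way to convince yourself of the amortized bound is to count pointer moves. The sketch below instruments the longest-subarray-sum algorithm with two counters; `count_pointer_moves` is a hypothetical helper and assumes positive integers in `nums`.

```python
def count_pointer_moves(nums: list[int], k: int) -> tuple[int, int]:
    """Return (right_moves, left_moves) for one run of the sliding window."""
    left = curr = 0
    right_moves = left_moves = 0
    for right in range(len(nums)):
        right_moves += 1
        curr += nums[right]
        while curr > k:
            curr -= nums[left]
            left += 1
            left_moves += 1
    return right_moves, left_moves


print(count_pointer_moves([1, 1, 1, 3], 3))  # (4, 3)
```

Here the while loop runs three times in the last for-loop iteration and not at all in the others, so the total stays within `2n`.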

- Now let's look at some sliding window examples.

#### Example 1:

Given an array of positive integers `nums` and an integer `k`, find the length of the longest subarray whose sum is less than or equal to `k`. This is the problem we have been talking about above. We will now formally solve it.

Let's use an integer `curr` that tracks the `sum` of the current window. Since the problem wants subarrays whose sum is less than or equal to `k`, we want to maintain `curr <= k`. Let's look at an example where `nums = [3, 1, 2, 7, 4, 2, 1, 1, 5]` and `k = 8`.

The window starts empty, but we can grow it to `[3, 1, 2]` while maintaining the constraint. However, after adding the `7`, the window's sum becomes too large. We need to tighten the window until the sum is at most `8` again, which doesn't happen until our window looks like `[7]`. When we try to add the next element, our window again becomes too large, and we need to remove the `7` which means we have `[4]`. We can now grow the window until it looks like `[4, 2, 1, 1]`, but adding the next element makes the sum too large. We remove elements from the left until it fits the constraint again, which happens at `[1, 1, 5]`. The longest subarray we found was `[4, 2, 1, 1]` which means the answer is `4`.

When we add an element to the window by moving the right bound, we just do `curr += value`. When we remove an element from the window by moving the left bound, we just do `curr -= value.` We should remove elements so long as `curr > k`.

**More detailed explanation**

To summarize what each variable does in the code:

* `left`: the leftmost index of our current window
* `right`: the rightmost index of our current window
* `curr`: the sum of our current window
* `ans`: the length of the longest valid window we have seen so far

Iterate `right` over the input to add elements to the window. Update `curr` by adding `nums[right]` to it. When the window becomes invalid `(curr > k)`, remove elements from the window by subtracting `nums[left]` from `curr`. Then increment `left`. We need to do this until the window becomes valid again, so we use a while loop.

The size of a window is `right - left + 1`. Update our answer only when the window becomes valid.


### Let's programme the sliding window example

In [None]:
def find_length(nums: list[int], k: int) -> int:
    # left is the left bound of the window, curr is the sum of the current window,
    # and ans is the longest valid window length seen so far; all start at zero
    left = curr = ans = 0
    # iterate over the input, using right as the right bound of the window
    for right in range(len(nums)):
        curr += nums[right]  # add the new element to the window's running sum
        # while the constraint is broken (the sum is greater than k), shrink the window
        while curr > k:
            curr -= nums[left]  # remove the leftmost value, then move left forward
            left += 1

        # the window is valid again here, so update the answer with its length
        ans = max(ans, right - left + 1)

    return ans


* Given a subarray **starting at left and ending at right, the length is `right - left + 1`**. As mentioned before, this algorithm has a time complexity of $O(n)$ since all work done inside the for loop is amortized $O(1)$, where `n` is the length of `nums`. The space complexity is constant because we are only using a few integer variables.
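A slower reference implementation is handy for checking the sliding window answer on small inputs. The sketch below tries every starting index; `find_length_brute_force` is a made-up name, and the early `break` relies on the numbers being positive.

```python
def find_length_brute_force(nums: list[int], k: int) -> int:
    """O(n^2) reference: longest subarray with sum <= k (positive integers)."""
    best = 0
    for left in range(len(nums)):
        total = 0
        for right in range(left, len(nums)):
            total += nums[right]
            if total > k:
                break  # extending further can only grow the sum
            best = max(best, right - left + 1)
    return best


print(find_length_brute_force([3, 1, 2, 7, 4, 2, 1, 1, 5], 8))  # 4
```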

#### Example 2
You are given a binary string `s` (a string containing only "0" and "1"). You may choose up to one "0" and flip it to a "1". What is the length of the longest substring achievable that contains only "1"?

For example, given `s = "1101100111"`, the answer is `5`. If you perform the flip at index `2`, the string becomes `1111100111`.

Because the string can only contain `"1"` and `"0"`, another way to look at this problem is "what is the longest substring that contains **at most one** `"0"`?". This makes it easy for us to solve with a sliding window where our condition is `window.count("0") <= 1`. We can use an integer `curr` that keeps track of how many `"0"` we currently have in our window.
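The reformulated condition can be translated almost literally into a slow reference check that tests `count("0") <= 1` on every substring. This is only a sketch for validating answers on short inputs; the name `longest_with_one_flip_naive` is made up.

```python
def longest_with_one_flip_naive(s: str) -> int:
    """Try every substring and keep the longest with at most one "0"."""
    best = 0
    for left in range(len(s)):
        for right in range(left, len(s)):
            if s[left:right + 1].count("0") <= 1:
                best = max(best, right - left + 1)
    return best


print(longest_with_one_flip_naive("1101100111"))  # 5
```

Slicing and counting inside two nested loops makes this cubic in the length of `s`, which is what the sliding window avoids.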

**Further explanation**

The input can only contain `"1"` or `"0"`. We want to find the max consecutive `"1"`. Because any element that isn't a `"1"` is a `"0"`, this problem is equivalent to "what is the longest substring with at most one `"0"`?", since we could just flip that `"0"` and it's guaranteed every other character in the substring would be a `"1"`.

Notice that the problem is asking for the length of a substring, and also has defined what makes a substring valid. The constraint metric is "how many `0s` are in the substring". The numeric restriction is `<= 1`. Therefore, if we use an integer `curr` to track the constraint metric, the condition to determine if a window is valid is `curr <= 1`.

We can use the exact same process as in the previous example now. We iterate over the elements with a pointer `right`. At each element, if `s[right]` is equal to `"1"`, we don't need to do anything. If it's equal to `"0"`, we increment `curr`.

Whenever the window becomes invalid `(curr > 1)`, we remove elements from the left. If `s[left] == "0"`, then we can decrement `curr`. We increment `left` to remove elements.

Again, the size of a window is `right - left + 1`. We update our answer with this value after the `while loop` because the window is guaranteed to be valid.

In [25]:
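def find_length(s: str) -> int:
    """
    Return the length of the longest substring of s containing at most one "0".
    """
    # left is the left bound of the window, curr counts the "0"s in the window
    left = curr = ans = 0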

    for right in range(len(s)):
        if s[right] == "0":
            curr += 1
        
        while curr > 1:
            if s[left] == "0":
                curr -= 1
            left += 1

        ans = max(ans, right - left + 1)

    return ans 


Like the previous example, this algorithm runs in $O(n)$ time, where $n$ is the length of `s`, as the work done in each loop iteration is **amortized** constant. Only a few integer variables are used as well, which means this algorithm uses $O(1)$ space.

#### Number of subarrays 
If a problem asks for the number of subarrays that fit some constraint, we can still use sliding window, but we need to use a neat math trick to calculate the number of subarrays.

Let's say that we are using the sliding window algorithm we have learned and currently have a window `(left, right)`. How many valid windows **end** at index `right`?

There's the current window `(left, right)`, then `(left + 1, right)`, `(left + 2, right)`, and so on until `(right, right)` (only the element at `right`).

You can fix the right bound and then choose any value between `left` and `right` inclusive for the left bound. Therefore, the number of valid windows **ending** at index `right` is equal to the size of the window, which we know is `right - left + 1`.
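A tiny sketch makes the counting argument concrete by listing every window that ends at `right` for a given `left`. The helper `windows_ending_at` and the sample bounds are assumptions chosen for illustration.

```python
def windows_ending_at(nums: list[int], left: int, right: int) -> list[list[int]]:
    """List the subarrays nums[start..right] for every start in [left, right]."""
    return [nums[start:right + 1] for start in range(left, right + 1)]


windows = windows_ending_at([10, 5, 2, 6], left=1, right=2)
print(windows)       # [[5, 2], [2]]
print(len(windows))  # 2, which equals right - left + 1
```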

### Example 3: Subarray Product Less Than K
Given an array of positive integers `nums` and an integer `k`, return the number of subarrays where the product of all the elements in the subarray is strictly less than `k`. 

For example, given the input `nums = [10, 5, 2, 6], k = 100`, the answer is `8`. The subarrays with products less than `k` are:

`[10], [5], [2], [6], [10, 5], [5, 2], [2, 6], [5, 2, 6]`

To demonstrate the property we have just learned, let's look at the example in the description. When we reach index `2`, the product becomes too large, so we need to remove the leftmost element `10`. Now, the window is valid, and it has a length of `2`. That means that there are `2` valid subarrays that end here (`[2]` and `[5, 2]`).

Recall that in the previous examples, we updated the answer (longest length) after the while loop, when the window must be valid. Here, we can add the current size of the window to our answer instead. The constraint that determines if a window is valid is that the product is less than `k`.

Additionally, note that if `k <= 1` we can never have any valid windows, so we can just return `0` immediately.

**Further Explanation**
The constraint metric is: product of the window. The numeric restriction is `< k`. If we use an integer `curr` to represent the current product of the window, the condition that makes a window invalid is `curr >= k`. 

Add elements to the window with `curr *= nums[right]`. Remove them with `curr //= nums[left]` (integer division is exact here, because `curr` is a product that includes `nums[left]`).

After the while loop, we know the window is valid. Add the window size `right - left + 1` to our answer. 

In [None]:
class Solution:
    def numSubarrayProductLessThanK(self, nums: list[int], k: int) -> int:
        if k <= 1:
            return  0
        
        ans = left = 0
        curr = 1 # since we are dealing with product, this can't be zero 

        for right in range(len(nums)):
            curr *= nums[right]
            while curr >= k:
                curr //= nums[left]
                left += 1

            ans += right - left + 1 #add to ans to get the number of subarrays with product less than K
        
        return ans  

Again, the work done in each loop iteration is amortized constant, so this algorithm has a runtime of $O(n)$, where $n$ is the length of `nums`, and $O(1)$ space.
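One way to sanity-check the sliding window answer is a brute-force count. The sketch below is not part of the original solution; `count_subarrays_brute_force` is a hypothetical helper that tries every start index and runs in $O(n^2)$ time. It relies on the elements being positive integers.

```python
def count_subarrays_brute_force(nums: list[int], k: int) -> int:
    """Count subarrays whose product is strictly less than k, in O(n^2) time."""
    count = 0
    for start in range(len(nums)):
        product = 1
        for end in range(start, len(nums)):
            product *= nums[end]
            if product >= k:
                # Elements are positive integers, so extending the
                # subarray can never make the product smaller again.
                break
            count += 1
    return count


print(count_subarrays_brute_force([10, 5, 2, 6], 100))  # 8
```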


### Fixed window size 
In the example we looked at above, our window size was dynamic. We tried to expand it to the right as much as we could while keeping the window within some constraint and removed elements from the left when the constraint was violated. Sometimes, a problem will specify a **fixed** length `k`.

These problems are easy because the difference between any two adjacent windows is only two elements (we add one element on the right and remove one element on the left to maintain the length).

Start by building the first window (from index `0` to `k - 1`). Once we have a window of size `k`, if we add an element at index `i`, we need to remove the element at index `i - k`. For example, say `k = 2` and the window currently holds the elements at indices `[0, 1]`. Now we add index `2`: `[0, 1, 2]`. To keep the window size at `k = 2`, we need to remove index `2 - k = 0`, leaving `[1, 2]`.
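The index bookkeeping can be sketched by tracking which indices are inside the window at each step. The helper `window_indices` is hypothetical and assumes `1 <= k <= n`.

```python
from collections import deque


def window_indices(n: int, k: int) -> list[list[int]]:
    """Return the indices covered by each window of size k over n elements."""
    window = deque(range(k))      # first window: indices 0 .. k - 1
    snapshots = [list(window)]
    for i in range(k, n):
        window.append(i)          # add index i on the right
        window.popleft()          # remove index i - k on the left
        snapshots.append(list(window))
    return snapshots


print(window_indices(4, 2))  # [[0, 1], [1, 2], [2, 3]]
```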

### Example 4: 
Given an integer array `nums` and integer `k`, find the sum of the subarray with the largest sum whose length is `k`.

As we mentioned before, we can build a window of length `k` and then slide it along the array. Add and remove one element at a time to make sure the window stays size `k`. If we are adding the value at `i`, then we need to remove the value at `i - k`.

After we build the first window we initialize our answer to `curr` to consider the first window's sum.

In [None]:
class Solution(object):
    def find_best_subarray(self, nums: list[int], k: int) -> int:
        curr = 0 
        for i in range(k):
            curr += nums[i]

        ans = curr 
        for i in range(k, len(nums)):
            curr += nums[i] - nums[i - k]
            ans = max(ans, curr)
        
        return ans 
    
# Test function
nums = [3, -1, 4, 12, -8, 5, 6]
k = 4
sol = Solution()
largest = sol.find_best_subarray(nums, k)
print(largest)

In [None]:
k = 4 
nums = [3, -1, 4, 12, -8, 5, 6]

for i in range(4, len(nums)):
    print(i)

The first loop runs $k$ times and the second runs $n - k$ times, for $n$ iterations in total, where $n$ is the length of `nums`. The work done in each iteration is constant, so this algorithm runs in $O(n)$ time and uses $O(1)$ space.

**Closing notes**

Sliding window is extremely common and versatile as a pattern. We only scratched the surface here because many sliding window problems will also need to use `hashmap`, which we will talk about in the hashing chapter. After learning about hashmaps, we'll look at some more sliding window problems. In the meantime, test your knowledge by solving upcoming practice problems. 

### Prefix Sum

Prefix sum is a technique that can be used on arrays (of numbers). The idea is to create an array `prefix` where `prefix[i]` is the sum of all elements up to the index `i` (inclusive). For example, given `nums = [5, 2, 1, 6, 3, 8]`, we would have `prefix = [5, 7, 8, 14, 17, 25]`.

```
When a subarray starts at index 0, it is considered a `prefix` of the array. A prefix sum array stores the sum of each prefix.
```

Prefix sums allow us to find the sum of any subarray in $O(1)$. If we want the sum of the subarray from `i` to `j` (inclusive), then the answer is `prefix[j] - prefix[i - 1]`, or `prefix[j] - prefix[i] + nums[i]` if you don't want to deal with the out of bounds case when `i = 0`.

This works because `prefix[i - 1]` is the sum of all elements before index `i`. When you subtract this from the sum of all elements up to index `j`, you are left with the sum of all elements starting at index `i` and ending at index `j`, which is exactly what we are looking for.
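A minimal sketch of the query formula, using `itertools.accumulate` from the standard library to build the prefix array. The helper name `range_sum` is an assumption for illustration.

```python
from itertools import accumulate

nums = [5, 2, 1, 6, 3, 8]
prefix = list(accumulate(nums))  # [5, 7, 8, 14, 17, 25]


def range_sum(i: int, j: int) -> int:
    """Sum of nums[i..j] inclusive, using the form that is safe when i == 0."""
    return prefix[j] - prefix[i] + nums[i]


print(range_sum(1, 3))  # 2 + 1 + 6 = 9
print(range_sum(0, 2))  # 5 + 2 + 1 = 8
# For i > 0 both forms of the formula agree:
print(range_sum(2, 4) == prefix[4] - prefix[1])  # True
```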

* Building a prefix sum is very simple:

Start with just the first element, then iterate with `i` starting from index `1`. At any given point, the last element of `prefix` is the sum of all the elements in the input up to, but not including, index `i`. So we append that value plus the current value `nums[i]` to the end of `prefix` and continue to the next element.

A prefix sum is a great tool whenever a problem involves sums of a subarray. It only costs $O(n)$ to build but allows all future subarray queries to be $O(1)$, so it can usually improve an algorithm's time complexity by a factor of $O(n)$, where $n$ is the length of the array.

Building a prefix sum is a form of pre-processing. Pre-processing is a useful strategy in a variety of problems where we store pre-computed data in a data structure before running the main logic of our algorithm. While it takes some time to pre-process, it's an investment that will save us a huge amount of time during the main parts of the algorithm. Let's look at some examples.

### Example 5: Prefix Sum
Given an integer array `nums`, an array `queries` where `queries[i] = [x, y]` and an integer `limit`, return a boolean array that represents the answer to each query. A query is `true` if the sum of the subarray from `x` to `y` is less than `limit`, or `false` otherwise.

- For example, given `nums = [1, 6, 3, 2, 7, 2]`, `queries = [[0, 3], [2, 5], [2, 4]]`, and `limit = 13`, the answer is `[true, false, true]`. For each query, the subarray sums are `[12, 14, 12]`.

**Let's build a prefix sum and then use the method described above to answer each query in $O(1)$**.

In [3]:
def answer_queries(nums, queries, limit):
    prefix = [nums[0]]

    for i in range(1, len(nums)):
        prefix.append(nums[i] + prefix[- 1])

    ans = []
    for x, y in queries:
        curr = prefix[y] - prefix[x] + nums[x]
        ans.append(curr < limit)
    
    return ans

Without the prefix sum, answering each query would be $O(n)$ in the worst case, where $n$ is the length of `nums`. If `m = queries.length`, that would give a time complexity of $O(n \cdot m)$. With the prefix sum, it costs $O(n)$ to build, but then answering each query is $O(1)$. This gives a much better time complexity of $O(n + m)$. We use $O(n)$ space to build the prefix sum.
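For contrast, here is a sketch of the version without a prefix sum, which re-sums the subarray for every query. The function name `answer_queries_brute_force` is hypothetical; it is useful mainly as a cross-check on small inputs.

```python
def answer_queries_brute_force(nums, queries, limit):
    """Answer each query by summing the slice directly: O(n) per query."""
    ans = []
    for x, y in queries:
        ans.append(sum(nums[x:y + 1]) < limit)
    return ans


nums = [1, 6, 3, 2, 7, 2]
queries = [[0, 3], [2, 5], [2, 4]]
print(answer_queries_brute_force(nums, queries, 13))  # [True, False, True]
```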

### Example 6: Prefix Sum: Number of ways to Split Array:
Given an integer array `nums`, find the number of ways to split the array into two parts so that the first section has a sum greater than or equal to the sum of the second section. Each section must have at least one number.


A brute force approach would be to iterate over each index `i` from `0` until `nums.length - 1`. For each index, iterate from `0` to `i` to find the sum of the left section, and then iterate from `i + 1` until the end of the array to find the sum of the right section. This algorithm would have a time complexity of $O(n^2)$.

If we build a prefix sum first, then iterate over each index, we can calculate the sums of the left and right sections in $O(1)$, which would improve the time complexity to $O(n)$.

**Further Explanation**
When we split the array into two parts, we are left with two adjacent subarrays. We need to find the sums of these subarrays and compare them.

There are $n - 1$ ways to split the array (the right section can't be empty). For each of these splits, it would cost $O(n)$ to iterate over the two subarrays and find their sums.

Instead, we can spend $O(n)$ once to build a prefix sum before trying any splits. Then we can use the prefix sum to perform each of the $n - 1$ splits in $O(1)$ time. As we know, with a prefix sum we can calculate the sum of any subarray in $O(1)$.

Let's say we are splitting at index `i`. The left section has all elements in the array up to index `i`, so it has a sum of `prefix[i]`. The right section begins at index `i + 1` and ends at the final index `n - 1`. This means it has a sum of `prefix[n - 1] - prefix[i]`.

In [2]:
class Solution:
    def waysToSplitArray(self, nums: list[int]) -> int:
        prefix  = [nums[0]]

        for i in range(1, len(nums)):
            prefix.append(nums[i] + prefix[-1])
        
        ans = 0 
        for i in range(len(nums) - 1):
            left_section = prefix[i]
            right_section = prefix[-1] - prefix[i]
            if left_section >= right_section:
                ans += 1

        return ans


**Improved Complexity: Do we need the array ?**

In the above problem, we access `prefix` incrementally: to find `left_section` we read `prefix[i]`, and `i` increments by `1` each iteration.

As such, to calculate `left_section` we don't actually need the array. We can just initialize `left_section = 0` and then calculate it on the fly by adding the current element to it at each iteration.

What about `right_section`? By definition, the right section contains all the numbers in the array that aren't in the left section. Therefore, we can pre-compute the sum of the entire input as `total`, then calculate `right_section` as `total - left_section`.

We are still using the concept of a prefix sum, as each value of `left_section` represents the sum of a prefix. **We have simply replicated the functionality using an integer instead of an array, so we have improved the space complexity to $O(1)$**.


In [7]:
class Solution:
    def waysToSplitArray(self, nums: list[int]) -> int:
        ans = left_section = 0
        total = sum(nums)

        for i in range(len(nums) - 1):
            left_section += nums[i]
            right_section = total - left_section
            if left_section >= right_section:
                ans += 1
        
        return ans 
    
#Test Function
nums = [10,4,-8,7]
sol = Solution()
tst = sol.waysToSplitArray(nums=nums)
tst

2

**Closing notes**
This is the last major pattern we will be looking at for arrays and strings. In the next article, we'll look at a few more common tricks and patterns, then close the chapter with a quiz before moving on. Before that, try applying the concepts learned here in the next problem. 

### Example 7: Running Sum of 1d Array

* Given an array `nums`, we define the running sum of the array as `runningSum[i] = sum(nums[0]...nums[i])`.

- Return the running sum of `nums`.

**Example**: 
```
Input: nums = [1,2,3,4]
Output: [1,3,6,10]
Explanation: Running sum is obtained as follows: [1, 1+2, 1+2+3, 1+2+3+4].
```

In [None]:
class Solution:
    def runningSum(self, nums: list[int]) -> list[int]:
        """
        We will use prefix sum to calculate the running sum
        """
        prefix = [nums[0]]

        for i in range(1, len(nums)):
            prefix.append(nums[i] + prefix[-1])
            
        return prefix
    

### Example 8: Minimum Value to Get Positive Step by Step Sum
Given an array of integers `nums`, you start with an initial positive value `startValue`.

In each iteration, you calculate the step by step sum of `startValue` plus elements in `nums` (from left to right).

Return the minimum **positive** value of `startValue` such that the step by step sum is never less than 1.

**Example 1:**
```
Input: nums = [-3,2,-3,4,2]
Output: 5
Explanation: If you choose startValue = 4, in the third iteration your step by step sum is less than 1.
step by step sum
startValue = 4 | startValue = 5 | nums
  (4 -3 ) = 1  | (5 -3 ) = 2    |  -3
  (1 +2 ) = 3  | (2 +2 ) = 4    |   2
  (3 -3 ) = 0  | (4 -3 ) = 1    |  -3
  (0 +4 ) = 4  | (1 +4 ) = 5    |   4
  (4 +2 ) = 6  | (5 +2 ) = 7    |   2
```

**Example 2:**
```
Input: nums = [1,2]
Output: 1
Explanation: Minimum start value should be positive.
```

**Example 3:**
```
Input: nums = [1,-2,-3]
Output: 5
```

In [None]:
class Solution:
    def minStartValue(self, nums: list[int]) -> int:
        min_cumulative_sum = float('inf')
        cumulative_sum = 0
        
        for num in nums:
            cumulative_sum += num
            min_cumulative_sum = min(min_cumulative_sum, cumulative_sum)
        
        # The starting value should be at least 1, so we need to compensate
        # for any negative minimum cumulative sum by adding the absolute value.
        start_value = 1 - min_cumulative_sum
        
        # If the minimum cumulative sum is positive, start_value would be 1.
        # Otherwise, it's adjusted by the negative cumulative sum.
        return max(start_value, 1)

# Example usage:
solution = Solution()
print(solution.minStartValue([-3, 2, -3, 4, 2]))  # Output should be 5


### Example 9: K Radius Subarray Averages
* You are given a **0-indexed** array `nums` of `n` integers, and an integer `k`. 

The **k-radius average** for a subarray of `nums` **centered** at some index `i` with the **radius** `k` is the average of **all** elements in `nums` between the indices `i - k` and `i + k` (**inclusive**). If there are less than `k` elements before **or** after the index `i`, then the **k-radius average** is `-1`.

Build and return an array `avgs` of length `n` where `avgs[i]` is the **k-radius average** for the subarray centered at index `i`.

The average of `x` elements is the sum of the `x` elements divided by `x`, using **integer division**. The integer division truncates toward zero, which means losing its fractional part.

- For example, the average of four elements `2`, `3`, `1`, and `5` is `(2 + 3 + 1 + 5) / 4 = 11 / 4 = 2.75`, which truncates to `2`.
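A note for Python users, as a hedged sketch: the `//` operator floors rather than truncates. The two agree for non-negative values (as in the examples here) but differ for negative ones. The helper `truncating_div` is hypothetical and assumes a positive divisor.

```python
def truncating_div(total: int, count: int) -> int:
    """Integer division that truncates toward zero (assumes count > 0)."""
    return total // count if total >= 0 else -(-total // count)


print(11 // 4)                 # 2
print(-11 // 4)                # -3 (floor division rounds down)
print(truncating_div(-11, 4))  # -2 (truncation toward zero)
print(truncating_div(11, 4))   # 2
```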

**Example 1:**

**K-Radius Subarray Average**

<img width="823" alt="K-Radius Average" src="https://github.com/user-attachments/assets/d9d87a45-5791-456d-ad00-f352ff536b1c">


```
Input: nums = [7,4,3,9,1,8,5,2,6], k = 3
Output: [-1,-1,-1,5,4,4,-1,-1,-1]
Explanation:
- avg[0], avg[1], and avg[2] are -1 because there are less than k elements before each index.
- The sum of the subarray centered at index 3 with radius 3 is: 7 + 4 + 3 + 9 + 1 + 8 + 5 = 37.
  Using integer division, avg[3] = 37 / 7 = 5.
- For the subarray centered at index 4, avg[4] = (4 + 3 + 9 + 1 + 8 + 5 + 2) / 7 = 4.
- For the subarray centered at index 5, avg[5] = (3 + 9 + 1 + 8 + 5 + 2 + 6) / 7 = 4.
- avg[6], avg[7], and avg[8] are -1 because there are less than k elements after each index.
```

**Example 2:**
```
Input: nums = [100000], k = 0
Output: [100000]
Explanation:
- The sum of the subarray centered at index 0 with radius 0 is: 100000.
  avg[0] = 100000 / 1 = 100000.
```

**Example 3:**
```
Input: nums = [8], k = 100000
Output: [-1]
Explanation: 
- avg[0] is -1 because there are less than k elements before and after index 0.
```

In [12]:
class Solution:
    def getAverages(self, nums: list[int], k: int) -> list[int]:
        """
        Approach: Sliding Window
        ========================
        Window Size: 
            :For a given index (i), the subarray would range from (i - k) to (i + k), making the total number of elements in the subarray equal to (2*k + 1)
        
        Out-Of-Bounds-Handling:
            :If (i - k < 0) or (i + k >= n) there aren't enough elements to form the subarray, and thus the average for that index should be -1
        
        Sliding Window Sum:
            :Instead of recalculating the sum for each window, we can use the sliding window technique. This allows us to update the sum by subtracting the element that slides out and adding the element that slides in.
        
        Result Array:
            :We'll initialize a result array (avg) filled with (-1). For each index (i) where a valid k-radius subarray exists, we'll compute and store the average.
        """
        n = len(nums)
        avgs = [-1] * n
        window_size = 2 * k + 1

        if window_size > n:
            return avgs
        
        current_sum = sum(nums[:window_size - 1]) # sum of the first window, excluding its last element

        for i in range(k, n - k):
            current_sum += nums[i + k] # Include the next element in the window
            avgs[i] = current_sum // window_size
            current_sum -= nums[i - k] # Exclude the element that is sliding out

        return avgs
        
# Test
nums = [7,4,3,9,1,8,5,2,6]
k = 3
sol =  Solution()
test1 = sol.getAverages(nums, k)
test1

[-1, -1, -1, 5, 4, 4, -1, -1, -1]

### Example 10: Get Subarray Averages
* Approach: you need to handle window of size (2 * k + 1) around each element
* For positions where the window cannot fully fit (i.e., near the beginning and the end of the list) return -1

In [16]:
class Solution:
    def getAverages(self, nums: list[int], k: int) -> list[int]:
        n = len(nums)

        # window size 
        window_size = 2 * k + 1 
        result = [-1] * n 

        if window_size > n:
            return result
        
        # Calculate the sum of the first window
        current_window_sum = sum(nums[:window_size])

        # Assign the average to the middle of the first window
        result[k] = current_window_sum // window_size

        # Slide the window across the array
        for i in range(k + 1, n - k):
            # Subtract the element sliding out (index i - k - 1) and
            # add the element sliding in (index i + k)
            current_window_sum -= nums[i - k - 1]
            current_window_sum += nums[i + k]

            # Assign the average to the current middle of the window
            result[i] = current_window_sum // window_size
        
        return result

# Test
nums = [7,4,3,9,1,8,5,2,6]
k = 3
sol =  Solution()
test1 = sol.getAverages(nums, k)
test1

[-1, -1, -1, 5, 4, 4, -1, -1, -1]

### More Common Patterns
In this article, we'll briefly talk about a few more patterns and some common tricks that can be used in algorithm problems regarding arrays and strings.

-  **$O(n)$ String Building**
We mentioned earlier that in most languages, strings are immutable. This means concatenating a single character to a string is an $O(n)$ operation. If you have a string that is 1 million characters long, and you want to add one more character, all 1 million characters need to be copied over to another string.

Many problems will ask you to return a string, and usually, this string will be built during the algorithm. Let's say the final string is of length `n` and we build it one character at a time with concatenation. What would the time complexity be? The operations needed at each step would be `1 + 2 + 3 + ... + n`. This is the partial sum of this series, which leads to $O(n^2)$ operations.
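As a worked version of this sum, assuming step $i$ copies $i$ characters:

$$
1 + 2 + 3 + \dots + n = \frac{n(n + 1)}{2} = \frac{n^2}{2} + \frac{n}{2} = O(n^2)
$$

For $n = 1000$, that is $\frac{1000 \cdot 1001}{2} = 500500$ character copies to build a string of only 1000 characters.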

**Simple concatenation will result in an $O(n^2)$ time complexity if you are using a language where strings are immutable**.

- There are better ways to build strings in just $O(n)$ time. This varies between languages. Here, we'll look at Python; if you're using another language, we recommend researching the best way to build strings in that language.

**Python**
1) Declare a list 
2) When building the string, add the characters to the list. This is $O(1)$ per operation. Across `n` operations, it will cost $O(n)$ in total 
3) Once finished, convert the list to a string using `"".join(list)`. This is $O(n)$.
4) In total, it cost us $O(n + n) = O(2n) = O(n)$.

In [17]:
def build_string(s):
    arr = []
    for char in s:
        arr.append(char)
    
    return "".join(arr)

### Subarrays/Substrings, Subsequences, and Subsets
* Let's quickly talk about the differences between these types and what to look out for when encountering them in problems.

**Subarrays/Substrings**
As a reminder, a subarray or substring is a contiguous section of an array or string.

**If a problem has explicit constraints such as:**
- Sum greater than or less than `k`
- Limits on what is contained, such as the maximum of `k` unique elements or no duplicates allowed

And/or asks for:
- Minimum or maximum length 
- Number of subarrays/substrings
- Max or minimum sum 

Think about a sliding window. Note that not all problems with these characteristics should be solved with a sliding window, and not all sliding window problems have these characteristics. These characteristics should only be used as a general guideline.

If a problem's input is an integer array and you find yourself needing to calculate multiple subarray sums, consider building a prefix sum.

The size of a subarray between `i` and `j` (inclusive) is `j - i + 1`. This is also the number of subarrays that end at `j`, starting from `i` or later.

**Subsequences**
A subsequence is a set of elements of an array/string that keeps the same relative order but doesn't need to be contiguous.

For example, subsequences of `[1, 2, 3, 4]` include: `[1, 3], [4], [], [2, 3]`, **but not** `[3, 2], [5], [4, 1]`.
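The definition can be checked with a short two-pointer sketch. The helper `is_subsequence` is hypothetical: it walks through the array once and advances a pointer into the candidate whenever the next needed element appears.

```python
def is_subsequence(candidate: list[int], arr: list[int]) -> bool:
    """Return True if candidate appears in arr in the same relative order."""
    i = 0  # index of the next element of candidate we still need to match
    for value in arr:
        if i < len(candidate) and candidate[i] == value:
            i += 1
    return i == len(candidate)


arr = [1, 2, 3, 4]
print([is_subsequence(c, arr) for c in ([1, 3], [4], [], [2, 3])])  # [True, True, True, True]
print([is_subsequence(c, arr) for c in ([3, 2], [5], [4, 1])])      # [False, False, False]
```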

Typically, subsequence problems are more difficult. Because this is only the first chapter, it is difficult to talk about subsequence patterns now. Subsequences will come up again later in the course - for example, dynamic programming is used to solve a lot of subsequence problems.

**From the patterns we have learned so far, the most common one associated with subsequences is two pointers** when two input arrays/strings are given (we did look at one problem in the two pointers articles involving subsequences). **Because prefix sums and sliding windows represent subarrays/substrings, they are not applicable here**.

**Subset**
A subset is any set of elements from the original array or string. The order doesn't matter and neither do the elements being beside each other. For example, given `[1, 2, 3, 4]`, all of these are subsets: `[3, 2], [4, 1, 2], [1]`. Note: subsets that contain the same elements are considered the same, so `[1, 2, 4]` is the same subset as `[4, 1, 2]`.
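In Python, this order-insensitivity can be sketched with the built-in `set` type. The helpers below are hypothetical and assume the array has no duplicate values (with duplicates, a `collections.Counter` comparison would be needed instead).

```python
def is_subset(candidate: list[int], arr: list[int]) -> bool:
    """Return True if every element of candidate exists in arr (order ignored)."""
    return set(candidate) <= set(arr)


def same_subset(a: list[int], b: list[int]) -> bool:
    """Two subsets are the same when they contain the same elements."""
    return set(a) == set(b)


arr = [1, 2, 3, 4]
print(is_subset([3, 2], arr), is_subset([4, 1, 2], arr), is_subset([5], arr))  # True True False
print(same_subset([1, 2, 4], [4, 1, 2]))  # True
```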

You may be thinking, what is the difference between subsequences and subsets if subsets with the same elements are considered the same? In subsequences, the order matters - let's say you had an array of integers and you needed to find a subsequence with 3 consecutive elements `(like 1, 2, 3)`. This would be harder than finding a subset with 3 consecutive elements because, with a subset, the 3 elements simply need to exist. In a subsequence, the elements need to exist in the correct relative order.

One thing to note is that if a problem involves subsequences, but the order of the subsequence doesn't actually matter (let's say it wants the sum of subsequences), then you can treat it the same as a subset. A useful thing that you can do when dealing with subsets that you can't do with subsequences is that you can sort the input, since the order doesn't matter.

**Closing Notes**
That's all for the arrays and strings chapter. Because of the simplicity of the topic, it is difficult to delve into deeper problems at the moment. However, the structure of this course involves building on knowledge incrementally. For example, there are dozens of sliding window problems on LeetCode that we couldn't talk about here because we haven't talked about hash maps yet. Because almost all non tree/graph/linked list problems have an array or string in the input, this will definitely not be the last we see of the patterns learned in the chapter.

Before moving on to the next topic, test your knowledge with the upcoming quiz.


### Array & String Quiz
1) Given `nums = [5, 2, 3, 1, 6]`, what would the prefix sum be?

In [21]:
class Solution:
    def prefixSum(self, nums: list[int]) -> list[int]:
        prefix = [nums[0]]

        for i in range(1, len(nums)):
            prefix.append(nums[i] + prefix[-1])

        return prefix 
    
#Test 
nums = [5, 2, 3, 1, 6]
sol = Solution()
ts = sol.prefixSum(nums)
ts 

[5, 7, 10, 11, 17]

### Time Complexity for Appending to the end of a Dynamic Array
The time complexity of appending to the end of a dynamic array is: **$O(1)$**

* Sometimes the operation will cost $O(n)$, but it doesn't happen often enough to make the average operation cost $O(n)$.
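To make the amortized argument concrete, here is a toy dynamic array that counts how many element copies its resizes cause. This is only a sketch: the class `ToyDynamicArray` and its capacity-doubling policy are assumptions for illustration, and real implementations (such as Python's `list`) use their own growth strategies.

```python
class ToyDynamicArray:
    """A toy dynamic array that doubles its capacity when full."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None] * self.capacity
        self.copies = 0  # total elements copied during resizes

    def append(self, value):
        if self.size == self.capacity:
            # The occasional O(n) step: allocate a bigger block and copy.
            self.capacity *= 2
            new_data = [None] * self.capacity
            for i in range(self.size):
                new_data[i] = self.data[i]
                self.copies += 1
            self.data = new_data
        self.data[self.size] = value
        self.size += 1


arr = ToyDynamicArray()
n = 1000
for x in range(n):
    arr.append(x)

print(arr.copies)          # 1023
print(arr.copies < 2 * n)  # True: fewer than 2 copies per append on average
```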

### Time complexity for adding characters in an array to the string 
* You have a **mutable** string and an array of characters with length n. You want to add all the characters in the array to the string one by one with string concatenation. 

- What will the time complexity be? -  **$O(n)$**

**NOTE:** If the string is mutable, then each concatenation is O(1), which is performed n times.

### Time complexity of a `while` loop inside a `for` loop

* Sliding window algorithms have while loops inside for loops. Why is the time complexity still O(n)?

- **Ans:** The while loop can only iterate n times in total, so we say the work inside the for loop is amortized O(1).
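This can be observed by instrumenting a sliding window with counters. The sketch below reuses the product-less-than-`k` window from earlier; the function `count_loop_iterations` is hypothetical and assumes positive integers and `k > 1`.

```python
def count_loop_iterations(nums: list[int], k: int) -> tuple[int, int]:
    """Return (for-loop iterations, total while-loop iterations)."""
    left = 0
    curr = 1
    outer = inner = 0
    for right in range(len(nums)):
        outer += 1
        curr *= nums[right]
        while curr >= k:
            curr //= nums[left]
            left += 1   # left only ever moves forward
            inner += 1
    return outer, inner


print(count_loop_iterations([10, 5, 2, 6], 100))  # (4, 1)
print(count_loop_iterations([5, 5, 5], 3))        # (3, 3)
```

Because `left` never moves backward and never passes `right + 1`, the inner counter can never exceed the outer one, no matter the input.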

### Calculating the length of a sliding window
* You have a subarray that starts at index left and ends at index right (inclusive). How many elements are in the subarray?

**Ans:** `right - left + 1`

- This is an important formula to remember for problems that ask for the number of subarrays that fit a constraint.

### Bonus Problems: Array & Strings
**Two pointers**


**Sliding Window**



**Prefix Sum**

### Two Pointer 

#### Bonus Problem 1:
* Given a string `s`, reverse the order of the characters in each word within a sentence while preserving whitespace and initial word order.

**Example 1:**
```
Input: s = "Let's take LeetCode contest"
Output: "s'teL ekat edoCteeL tsetnoc"
```

**Example 2:**
```
Input: s = "Mr Ding"
Output: "rM gniD"
```

### Option 1:

In [33]:
class Solution:
    def reverseWords(self, s: str) -> str:
        #Step 1: convert string to list of characters
        char_list = list(s)
        n = len(char_list)

        def reverse_segment(left: int, right: int):
            while left < right:
                char_list[left], char_list[right] = char_list[right], char_list[left]
                left += 1
                right -= 1

        start = 0 
        while start < n:
            if char_list[start] != ' ': # Find the start of a word
                end  =  start 

                while end < n and char_list[end] != ' ': # Find the end of the word 
                    end += 1
                
                reverse_segment(start, end - 1) # reverse the word in place 
                start = end #move to the next word
            else:
                start += 1 # Skip Spaces

        #Convert list of characters back to string 
        reversed_str = ''.join(char_list)

        return reversed_str
    
# Test
s = "Let's take LeetCode contest"
sol = Solution().reverseWords(s)
sol

"s'teL ekat edoCteeL tsetnoc"

### Option 2:

In [25]:
class Solution:
    def reverseString(self, s: str) -> str:
        # Step 1: Split the string on single spaces, which preserves the whitespace
        words = s.split(' ')

        # Step 2: Reverse each word in the list
        reversed_words = [word[::-1] for word in words]

        # Step 3: Join the reversed words back into a single string
        reversed_str = ' '.join(reversed_words)

        return reversed_str
    
#Test 
s = "Let's take LeetCode contest"
sol = Solution().reverseString(s)
sol

"s'teL ekat edoCteeL tsetnoc"

### Bonus Problem 2:
**Reverse the order of the words in a sentence. The words in the result are joined by single spaces.**

In [24]:
class Solution:
    def reverseWords(self, s: str) -> str:
        # Step 1: Split the string into words
        words = s.split()
        
        # Step 2: Reverse the list of words
        words.reverse()
        
        # Step 3: Join the words back into a single string with spaces in between
        reversed_str = ' '.join(words)
        
        return reversed_str

# Example usage:
solution = Solution()
print(solution.reverseWords("hello world"))  
# Output: "world hello"


world hello


### Bonus Problem 3: Reverse Only Letters
Given a string `s`, reverse the string according to the following rules:

- All the characters that are not English letters remain in the same position.
- All the English letters (lowercase or uppercase) should be reversed.

- Return `s` after reversing it.

**Example 1:**

```
Input: s = "ab-cd"
Output: "dc-ba"
```

**Example 2:**
```
Input: s = "a-bC-dEf-ghIj"
Output: "j-Ih-gfE-dCba"
```

**Example 3:**
```
Input: s = "Test1ng-Leet=code-Q!"
Output: "Qedo1ct-eeLg=ntse-T!"
```

In [35]:
class Solution:
    def reverseOnlyLetters(self, s: str) -> str:
        # Convert the string to a list of characters to modify it in place
        char_list = list(s)
        left, right = 0, len(char_list) - 1
        
        while left < right:
            # Move the left pointer to the next English letter
            while left < right and not char_list[left].isalpha():
                left += 1
            # Move the right pointer to the previous English letter
            while left < right and not char_list[right].isalpha():
                right -= 1
            
            # Swap the characters at left and right pointers
            if left < right:
                char_list[left], char_list[right] = char_list[right], char_list[left]
                left += 1
                right -= 1
        
        # Convert the list back to a string and return it
        return ''.join(char_list)

# Example usage
s = "a-bC-dEf-ghIj"
solution = Solution()
reversed_s = solution.reverseOnlyLetters(s)
print(reversed_s)  # Output: "j-Ih-gfE-dCba"


j-Ih-gfE-dCba


## Hashing 
Before we start this chapter, let's quickly talk about data structures.

In the most basic terms, a data structure is a format for organizing data in an efficient way. In practical terms, we can split data structures into two things: the interface and the implementation.

The interface is like a contract that specifies how we can interact with the data structure - what operations we can perform on it, what inputs it expects, and what outputs we can expect.

For example, consider a dynamic array. The interface would include operations like appending, insertion, removal, updating, and more. These operations are well-defined and have specific rules that we must follow when we use them. If we want to append an element, we use the built-in method like `.append()` or `.push()` while passing in the element we want to add as an argument. Typically this operation doesn't return anything.

Now, the implementation is the code that actually makes the data structure work. This is where the details of how the data is stored and how the operations are performed come into play. For example, the implementation of a dynamic array might involve allocating memory for the list, tracking the size, and rearranging the elements when an operation like remove is called.

For many data structures, the implementation can be quite complex, involving intricate algorithms and data manipulation. However we don't need to worry about those details - we only need to understand the interface and how to use it properly.

In this article and a few others in the course, we will talk about the underlying implementation details behind a data structure. While it does help to have a basic understanding, don't worry too much about memorizing these details. We have included them for completeness.

**What is the point of a hash function** ?

We know that arrays have $O(1)$ random access. Given an arbitrary index, we can access and update its value in the array in constant time. The main constraint with arrays is that they are a fixed size, and the indices have to be integers. Because hash functions can convert any input into an integer, we can effectively remove the constraint of indices needing to be integers. When a hash function is combined with an array, it creates a **hash map**, also known as a **hash table** or **dictionary**.
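To illustrate the idea of combining a hash function with an array, here is a deliberately tiny sketch. The class `ToyHashMap` is hypothetical; it uses Python's built-in `hash()` and, as an assumption not covered in the text above, stores colliding keys together in a small list per slot. It never resizes, so it is not how a production hash map works.

```python
class ToyHashMap:
    """A toy hash map: a fixed-size array of buckets indexed by hash(key)."""

    def __init__(self, num_buckets: int = 8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key) -> int:
        # The hash function turns any hashable key into an integer,
        # and the modulo maps that integer to a valid array index.
        return hash(key) % len(self.buckets)

    def put(self, key, value) -> None:
        bucket = self.buckets[self._index(key)]
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value  # key already present: update its value
                return
        bucket.append([key, value])

    def get(self, key, default=None):
        for stored_key, value in self.buckets[self._index(key)]:
            if stored_key == key:
                return value
        return default


table = ToyHashMap()
table.put("apple", 3)
table.put((2, 5), "a tuple key")
table.put("apple", 4)
print(table.get("apple"))    # 4
print(table.get((2, 5)))     # a tuple key
print(table.get("missing"))  # None
```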

With arrays, we **map** indices to values. With hash maps, we map **keys** to values, and a key can be almost anything. Typically, the only constraint on a hash map's key is that it has to be **immutable** (this is language dependent but generally a good rule of thumb). Values can be anything.

A hash map is probably the most important concept in all of algorithm interviewing. It is extremely powerful and allows you to reduce the time complexity of an algorithm by a factor of $O(n)$ for a huge number of problems. Every major language has a [built-in implementation of a hash map](https://en.wikipedia.org/wiki/Hash_function). For example, in Python they're called dictionaries, and declaring one is as simple as `dic = {}`. If you could only take one thing from this course, it should be to master the hash map interface for the programming language you use.

To summarize, a hash map is an unordered data structure that stores key-value pairs. A hash map can add and remove elements in $O(1)$, as well as update values associated with a key and check if a key exists, also in $O(1)$. You can iterate over both the keys and values of a **hash map**, but the iteration won't necessarily follow any order (there are many implementations and this is language dependent for the built-in types).
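The operations summarized above can be sketched with Python's built-in `dict`. The names `ages`, `has_bob`, `has_carol`, and `size` are made up for this illustration. Note that Python's `dict` happens to preserve insertion order (since Python 3.7), which is an example of the language-dependent behavior mentioned above.

```python
# Basic hash map operations using Python's built-in dict.
ages = {}                       # create an empty hash map

ages["alice"] = 30              # add a key-value pair
ages["bob"] = 25
ages["alice"] = 31              # update the value of an existing key

has_bob = "bob" in ages         # check whether a key exists
has_carol = "carol" in ages

del ages["bob"]                 # remove a key (raises KeyError if it is missing)

size = len(ages)                # number of key-value pairs

for name, age in ages.items():  # iterate over keys and values together
    print(name, age)
```

Each of the add, update, existence check, and delete operations above runs in $O(1)$ on average.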

**An ordered data structure is one where the insertion order is "remembered". An unordered data structure is one where the insertion order is not relevant.**

##### Comparison with arrays
In terms of time complexity, hash maps blow arrays out of the water. The following operations are all $O(1)$ for a hash map:
* Add an element and associate it with a value
* Delete an element if it exists
* Check if an element exists

A hash map also has many of the same useful properties as an array with the same time complexity:

* Find length/number of elements
* Updating values
* Iterate over elements

**Hash maps are also just easier/cleaner to work with. Even if your keys are integers and you could get away with using an array, if you don't know what the max size of your key is, then you don't know how large you should size your array. With hash maps, you don't need to worry about that, since the key will be converted to a new integer within the size limit anyways.**

However, from a practical perspective, there are some disadvantages to using hash maps, and it's important to know them as it is common in interviews to talk about tradeoffs.

The biggest disadvantage of hash maps is that for smaller input sizes, they can be slower due to overhead. Because big O ignores constants, the $O(1)$ time complexity can sometimes be deceiving - it's usually something more like $O(10)$ because every key needs to go through the hash function, and there can also be collisions, which we will talk about in the next section.

Hash tables can also take up more space. Dynamic arrays are actually fixed-size arrays that resize themselves when they go beyond their capacity. Hash tables are also implemented using a fixed size array - remember that the size is a limit set by the programmer. The problem is, resizing a hash table is much more expensive because every existing key needs to be re-hashed, and also a hash table may use an array that is significantly larger than the number of elements stored, resulting in a huge waste of space. Let's say you chose your limit as 10,000 items, but you only end up storing 10. Okay, you could argue that 10,000 is too large, but then what if your next test case ends up needing to store 100,000 elements? The point is, when you don't know how many elements you need to store, arrays are more flexible with resizing and not wasting space.

**Note: remember that time complexity functions only involve the variables you define. When we say that hash map operations are $O(1)$, the variable we are concerned with is usually $n$ which is the size of the hash map. However, this may be misleading. For example, hashing a string requires $O(m)$ time, where $m$ is the length of the string. The constant time operations are only constant relative to the size of the map.**


#### Collisions
When different keys convert to the same integer, it is called a collision. Without handling collisions, older keys will get overridden and data will be lost. There are [multiple ways](https://en.wikipedia.org/wiki/Hash_table#Collision_resolution) to handle collisions, but here we'll talk about a common one called **chaining**.

When using chaining, we store linked lists inside the hash map's array instead of the elements themselves. The linked list nodes store both the key and the value. If there are collisions, the collided key-value pairs are linked together in a linked list. Then, when trying to access one of these key-value pairs, we traverse through the linked list until the key matches.
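To make chaining concrete, here is a minimal sketch of a toy hash map. It is an illustration only, not how any particular language implements its built-in type. The class name `ChainedHashMap` and its methods are hypothetical, and for brevity each chain is a Python list of `(key, value)` pairs rather than a true linked list.

```python
class ChainedHashMap:
    """Toy hash map that resolves collisions by chaining (illustration only)."""

    def __init__(self, capacity=7):
        # One bucket per array slot; each bucket holds (key, value) pairs.
        self.buckets = [[] for _ in range(capacity)]

    def _index(self, key):
        # Convert the key to an integer, then wrap it into the array bounds.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # key already present: update it
                return
        bucket.append((key, value))        # new key: add it to the chain

    def get(self, key, default=None):
        # Walk the chain until the key matches.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = ChainedHashMap(capacity=7)
table.put(1, "one")
table.put(8, "eight")   # 1 % 7 == 8 % 7, so both keys land in the same bucket
table.put(1, "uno")     # updates the existing entry instead of adding a new one
```

Even though `1` and `8` collide, both pairs survive because they are stored side by side in the same chain.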

Collisions are problematic because they must be handled, and handling them takes time, which slows down the hash map. How can we design our hash map to minimize collisions? The most important thing is that the size of your hash table's array, which is also the [modulus, is a prime number](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus). Commonly used prime numbers near significant magnitudes are:

* 10,007
* 1,000,003
* 1,000,000,007
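As a rough sketch of how a prime modulus is used, the function below turns a string into an array index. The function name `bucket_index` and the multiplier `31` are assumptions chosen for illustration; only the prime modulus `10,007` comes from the list above.

```python
MOD = 10_007  # prime table size, taken from the list above


def bucket_index(key: str) -> int:
    """Map a string to an index in the range [0, MOD)."""
    h = 0
    for ch in key:
        # Mix in each character, keeping the running value inside the table bounds.
        h = (h * 31 + ord(ch)) % MOD
    return h
```

The loop touches every character once, so computing the index of a string of length $m$ costs $O(m)$.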

#### Sets
A set is another data structure that is very similar to a hash table. It uses the same mechanism for hashing keys into integers. The difference between a set and hash table is that sets do not map their keys to anything. Sets are more convenient to use when you only care about checking if elements exist. You can add, remove, and check if an element exists in a set all in $O(1)$.

An important thing to note about sets is that they don't track frequency. If you have a set and add the same element 100 times, the first operation adds it and the next 99 do nothing.
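A short sketch of this behavior with Python's built-in `set` (the variable names are only for illustration):

```python
seen = set()

for _ in range(100):
    seen.add("a")           # only the first call changes the set

seen.add("b")

count = len(seen)           # the set holds "a" and "b" once each
contains_a = "a" in seen    # existence check

seen.discard("a")           # remove an element (no error if it is absent)
```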

#### Arrays as keys?
We said that being immutable is usually a requirement for being a hash map key. Arrays are mutable, so how do we store an ordered collection of elements as a key? Depending on the language you're using, there are several ways to convert an array into a unique immutable key. In Python, tuples are immutable, so it's as easy as doing `tuple(arr)`. Another trick is to convert the array into a string, delimited by some character that is guaranteed to not show up in any element. For example, use a comma to separate integers. `[1, 51, 163] --> "1,51,163"`.
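Both tricks can be sketched in Python as follows. The dictionary `groups` and its values are placeholders made up for this example.

```python
arr = [1, 51, 163]
groups = {}

# Option 1: convert the list to a tuple, which is immutable and hashable.
groups[tuple(arr)] = "stored with a tuple key"

# Option 2: join the elements into a comma-delimited string.
string_key = ",".join(str(x) for x in arr)
groups[string_key] = "stored with a string key"

# Using the list itself as a key fails because lists are mutable.
try:
    groups[arr] = "never stored"
    list_key_allowed = True
except TypeError:
    list_key_allowed = False
```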


#### Checking for existence
One of the most common applications of a hash table or set is determining if an element exists in $O(1)$. Since an array needs $O(n)$ to do this, using a hash map or set can improve the time complexity of an algorithm greatly, usually from $O(n^2)$ to $O(n)$. Let's look at some example problems.

**Example 1** Two Sum
Given an array of integers `nums` and an integer `target`, return indices of two numbers such that they add up to `target`. You cannot use the same index twice.

The brute force solution would be to use a nested for loop to iterate over every pair of indices and check if the sum is equal to `target`. This will result in a time complexity of $O(n^2)$. In the brute force solution, the first for loop focuses on a number `num` and does a second for loop which looks for `target - num` in the array. With an array, looking for `target - num` is $O(n)$, but with a hash map, it is $O(1)$.
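For reference, the brute force approach described above might look like the sketch below. The function name `two_sum_brute_force` is hypothetical, and returning `[-1, -1]` when no pair exists is an assumption for this illustration.

```python
def two_sum_brute_force(nums, target):
    """Check every pair of indices: O(n^2) time, O(1) extra space."""
    n = len(nums)
    for i in range(n):
        # Start at i + 1 so the same index is never used twice.
        for j in range(i + 1, n):
            if nums[i] + nums[j] == target:
                return [i, j]
    return [-1, -1]
```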

We can build a hash map as we iterate along the array, mapping each value to its index. At each index `i`, where `num = nums[i]`, we can check our hash map for `target - num`. Adding key-value pairs and checking for `target - num` are both $O(1)$, so our time complexity will improve to $O(n)$.

**Detailed Explanation**
We are looking for two numbers that sum to `target`. We iterate over the input and for each element `num`, we see if this element can be paired with another number to form `target`.

If another element `target - num` exists, then their sum `num + target - num = target` is what we are looking for.

So as we iterate over the input, we put elements in a hash map. Then in the future, we can check if we've seen `target - num` for each `num` in $O(1)$. The problem wants us to return the indices instead of the numbers themselves, so we can associate each number with its index.

In [38]:
class Solution:
    def twoSum(self, nums: list[int], target: int) -> list[int]:
        dic = {}

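        for i, num in enumerate(nums):
            complement = target - num
            # If the complement was seen earlier, we have found the pair.
            if complement in dic:
                return [i, dic[complement]]

            # Remember the index of this value for later iterations.
            dic[num] = i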

        return [-1, -1]

**NOTE:** 
- If the question wanted us to return a boolean indicating if a pair exists or to return the numbers themselves, then we could just use a **set**. However, since it wants the indices of the numbers, we need to use a **hash map to "remember" what indices the numbers are at**.

**Time & Space Complexity**
- The time complexity is $O(n)$ as the hash map operations are $O(1)$. This solution also uses $O(n)$ space as the number of keys the hash map will store scales linearly with the input size.
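As mentioned in the note above, if we only needed to know whether such a pair exists, a set would be enough. A minimal sketch of that variant follows; the function name `has_pair_with_sum` is made up for this example.

```python
def has_pair_with_sum(nums, target):
    """Return True if two elements at different indices sum to target."""
    seen = set()
    for num in nums:
        # The complement must have appeared at an earlier index.
        if target - num in seen:
            return True
        seen.add(num)
    return False
```

The structure is identical to the hash map solution; only the stored information (values without their indices) changes.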

#### The First Letter to Appear Twice 

* Given a string `s` consisting of lowercase English letters, return the first letter to appear **twice**.

**Note:**
- A letter `a` appears twice before another letter `b` if the second occurrence of `a` is before the second occurrence of `b`.
- `s` will contain at least one letter that appears twice.

**The Brute Force Option**
The brute force solution would be to iterate along the string, and for each character `c`, iterate over all the characters before it to see if any of them match `c`.

In [None]:
class Solution:
    def repeatedCharacter(self, s: str) -> str:
        for i in range(len(s)):
            c = s[i]
            for j in range(i):
                if s[j] == c:
                    return c 
        
        return ""       

**Time Complexity**:
This is $O(n^2)$ due to the nested loop. The second loop is checking for the existence of `c`, which can be done in $O(1)$ using a **set**.

#### Optimal Option: Set

In [40]:
class Solution:
    def repeatedCharacter(self, s: str) -> str:
        seen = set()

        for c in s:
            if c in seen:
                return c
            seen.add(c)

        return ""

**Time Complexity:** This improves our time complexity to $O(n)$ as each for loop iteration now runs in constant time.

**Space Complexity**
The space complexity is a more interesting topic of discussion. Many people will argue that the space complexity is $O(1)$ because the input can only have characters from the **English alphabet**, which is bounded by a constant (26). This is very common with string problems and technically correct. In an interview setting, this is probably a safe answer, but you should also note that the space complexity could be $O(m)$, where `m` is the number of allowable characters in the input. This is a more general answer and also technically correct.

#### Unique Numbers 
* Given an integer array `nums`, find all the **unique** numbers `x` in `nums` that satisfy the following: `x + 1` is not in `nums`, and `x - 1` is not in `nums`.

We can solve this in a straightforward manner - just iterate through `nums` and check if `x + 1` or `x - 1` is in `nums`. By converting `nums` into a set beforehand, these checks will cost $O(1)$.

Converting the input into a set beforehand is another example of pre-processing.

In [None]:
class Solution:
    def uniqueNumbers(self, nums):
        ans  = []
        nums = set(nums)

        for num in nums:
            if (num + 1 not in nums) and (num - 1 not in nums):
                ans.append(num) 
            
        return ans

**Time and Space Complexity**
Because the checks are $O(1)$, the time complexity is $O(n)$ since each for loop iteration runs in constant time. The set will occupy $O(n)$ space.

**NOTE:**
Anytime you find your algorithm running `if ... in ...`, then consider using a **hash map or set** to store elements to have these operations run in $O(1)$. Try these upcoming practice problems with what was learned here.

#### Check if the Sentence Is Pangram
A **pangram** is a sentence where every letter of the English alphabet appears at least once.

* Given a string `sentence` containing only lowercase English letters, return `true` if `sentence` is a **pangram**, or `false` otherwise.

**Example 1:**
```
Input: sentence = "thequickbrownfoxjumpsoverthelazydog"
Output: true
Explanation: sentence contains at least one of every letter of the English alphabet.
```

**Example 2:**
```
Input: sentence = "leetcode"
Output: false
```

**Hint 1:**
- Iterate over the string and mark each character as found (using a boolean array, bitmask, or any other similar way).

**Hint 2:**
- Check if the number of found characters equals the alphabet length.

In [47]:
class Solution:
    def checkIfPangram(self, sentence: str) -> bool:
        seen = set()

        for c in sentence:
            seen.add(c)

            if len(seen) == 26:
                return True
            
        return len(seen) == 26
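The hints also mention a bitmask as an alternative to a set. A minimal sketch of that idea is shown below, assuming (as the problem states) that `sentence` contains only lowercase English letters. The function name `check_if_pangram_bitmask` is hypothetical.

```python
def check_if_pangram_bitmask(sentence: str) -> bool:
    """Mark each letter with one bit; a pangram sets all 26 bits."""
    found = 0
    for c in sentence:
        # Bit 0 stands for 'a', bit 1 for 'b', ..., bit 25 for 'z'.
        found |= 1 << (ord(c) - ord("a"))
    return found == (1 << 26) - 1
```

This stores the same information as the set, but in a single integer.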

#### Missing Number
Given an array `nums` containing `n` distinct numbers in the range `[0, n]`, return the only number in the range that is missing from the array.

**Example 1:**
```
Input: nums = [3,0,1]
Output: 2
Explanation: n = 3 since there are 3 numbers, so all numbers are in the range [0,3]. 2 is the missing number in the range since it does not appear in nums.
```

**Example 2:**
```
Input: nums = [0,1]
Output: 2
Explanation: n = 2 since there are 2 numbers, so all numbers are in the range [0,2]. 2 is the missing number in the range since it does not appear in nums.
```

**Example 3:**
```
Input: nums = [9,6,4,2,3,5,7,0,1]
Output: 8
Explanation: n = 9 since there are 9 numbers, so all numbers are in the range [0,9]. 8 is the missing number in the range since it does not appear in nums.
```

**NOTE:** Follow up: Could you implement a solution using only O(1) extra space complexity and O(n) runtime complexity?


**Implementation Strategy**
To find the missing number in the range `[0, n]`, we can use the formula for the sum of the first `n` natural numbers:
* The sum of the first `n` natural numbers is given by the formula:
$$\text{Sum} = \frac{n(n + 1)}{2}$$

* Calculate the actual sum of the elements in the array.

* The difference between the expected sum and the actual sum will give the missing number 
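As a quick check of these steps, take Example 1, where `nums = [3,0,1]` and therefore $n = 3$:

$$\text{expected} = \frac{3 \cdot (3 + 1)}{2} = 6, \qquad \text{actual} = 3 + 0 + 1 = 4, \qquad \text{missing} = 6 - 4 = 2$$

This matches the expected output of `2`.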

In [61]:
class Solution:
    def missingNumber(self, nums: list[int]) -> int:
        n = len(nums)
        expected_sum = n * (n + 1) // 2   # Sum of the first n natural numbers 
        actual_sum = sum(nums)            # Sum of elements in nums

        return expected_sum - actual_sum

**Time & Space Complexity**: This approach runs in $O(n)$ time, since it sums the array once, and uses $O(1)$ extra space.

**Approach 2: Missing Number**
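* We build a **set** of every number in the range `[0, n]` and subtract the set of numbers in `nums`. The single element left over is the missing number.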

In [62]:
class Solution:
    def missingNumber(self, nums: list[int]) -> int:
        n = len(nums)
        full_set = set(range(n + 1))
        nums_set = set(nums)

        missing_num = full_set -  nums_set

        return missing_num.pop()

**Time & Space Complexity:** The overall time complexity is $O(n)$ because creating the sets and computing their difference each take linear time. The space complexity is $O(n)$, since the two sets together store up to `2n + 1` elements.

- This approach is simple and leverages Python's set operations to solve the problem efficiently.

### Counting Elements

Given an array `arr`, count how many elements `x` there are, such that `x + 1` is also in `arr`. If there are duplicates in `arr`, count them separately.

**Example 1:**
```
Input: arr = [1,2,3]
Output: 2
Explanation: 1 and 2 are counted cause 2 and 3 are in arr.
```

**Example 2:**
```
Input: arr = [1,1,3,3,5,5,7,7]
Output: 0
Explanation: No numbers are counted, cause there is no 2, 4, 6, or 8 in arr.
```

In [77]:
from typing import List

class Solution:
    def countElements(self, arr: List[int]) -> int:
        count = 0
        nums_set = set(arr)  
        
        for x in arr:
            if x + 1 in nums_set:
                count += 1
                
        return count
        
    
#Test 
#arr = [1,1,3,3,5,5,7,7]
#arr = [1,2,3]
arr = [1,1,2,2]
sol = Solution().countElements(arr)
sol

2

#### Counting
**Counting is a very common pattern with hash maps.** By "counting", we are referring to tracking the frequency of things. This means our hash map will be mapping keys to integers. Anytime you need to count anything, think about using a hash map to do it.

Recall that when we were looking at sliding windows, some problems had their constraint as limiting the amount of a certain element in the window. For example, longest substring with at most `k` `0`s. In those problems, we could simply use an integer variable `curr` because we are only focused on one element (we only cared about `0`). A hash map opens the door to solving problems where the constraint involves multiple elements. Let's start by looking at a sliding window example that leverages a hash map.

**Example 1**

You are given a string `s` and an integer `k`. Find the length of the longest substring that contains **at most** `k` distinct characters.

For example, given `s = "eceba"` and `k = 2`, return `3`. The longest substring with at most `2` distinct characters is `"ece"`.


This problem deals with substrings and has a constraint on the substrings (at most `k` distinct characters). These characteristics let us know that we should consider sliding window. Remember, the idea of a sliding window is to add elements by sliding to the right until the window violates the constraint. Once it does, we shrink the window from the left until it no longer violates the constraint. In this problem, we are concerned with the number of distinct characters in the window. The brute force way to check for this constraint would be to check the entire window every time, which could take $O(n)$ time. Using a hash map, we can check the constraint in $O(1)$.

Let's use a hash map `counts` to keep count of the characters in the window. This means we will map letters to their frequency. The length (number of keys) in `counts` at any time is the number of distinct characters. When we remove from the left, we can decrement the frequency of the elements being removed. When the frequency becomes `0`, we know this character is no longer part of the window, and we can delete the key.

**Detailed Explanations**
In this problem, the constraint metric is "how many unique characters are in the window". The numeric restriction is `<= k`. We can use a hash map `counts` that keeps track of the frequency of each character in the window. The length of `counts` is the number of keys, which is also the constraint metric. Therefore the window is invalid when `counts.length > k`.

When we add a character `s[right]`, we increment its frequency in `counts` by one. If it doesn't exist in `counts`, we insert a new key value pair `s[right]: 1`.

When we remove a character `s[left]`, we decrement its frequency in `counts` by one. If the frequency becomes `0`, we know that this character no longer exists. Therefore we delete the key from the hash map, which also decreases the length of `counts`.

Recall that the length of a window is `right - left + 1`.

In Python, the [collections module](https://docs.python.org/3/library/collections.html) provides very useful data structures. We will be using a [defaultdict](https://docs.python.org/3/library/collections.html#collections.defaultdict) in the Python code. Functionality-wise, a defaultdict is the same as a **hash map**, but it is more pleasant to work with because accessing a missing key creates it with a default value instead of raising an error.
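To illustrate the difference, here is a small sketch that counts the characters of the example string `"eceba"` twice, once with a plain `dict` and once with a `defaultdict`. The variable names `text`, `plain`, and `counts` are only for illustration.

```python
from collections import defaultdict

text = "eceba"

# With a plain dict, a missing key has to be handled explicitly.
plain = {}
for ch in text:
    if ch not in plain:
        plain[ch] = 0
    plain[ch] += 1

# With defaultdict(int), a missing key starts at int() == 0 automatically.
counts = defaultdict(int)
for ch in text:
    counts[ch] += 1
```

Both loops produce the same frequencies; the `defaultdict` version simply avoids the existence check.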

In [79]:
from collections import defaultdict

class Solution:
    def find_longest_substring(self, s: str, k: int) -> int:
        counts = defaultdict(int)
        left = ans = 0

        for right in range(len(s)):
            counts[s[right]] += 1
            while len(counts) > k:
                counts[s[left]] -= 1
                if counts[s[left]] == 0:
                    del counts[s[left]]
                left += 1
            
            ans  = max(ans, right - left + 1)

        return ans 

As you can see, using a hash map to store the frequency of any key we want allows us to solve sliding window problems that put constraints on multiple elements. We know from earlier that the time complexity of sliding window problems is $O(n)$ if the work done inside each for loop iteration is amortized constant, which is the case here because hash map operations are $O(1)$. The hash map occupies $O(k)$ space, as the algorithm deletes elements from the hash map once it grows beyond `k` keys.

#### Intersection of Multiple Arrays
Given a 2D array `nums` that contains `n` arrays of distinct integers, return a sorted array containing all the numbers that appear in all `n` arrays.

For example, given `nums = [[3,1,2,4,5],[1,2,3,4],[3,4,5,6]]`, return `[3, 4]`. `3` and `4` are the only numbers that are in all arrays.


The problem states that each individual array contains **distinct** integers. This means that a number appears `n` times if and only if it appears in all arrays.

Let's use a hash map `counts` to count the frequency of elements. We iterate over each of the inner arrays and update `counts` with every element. After going through all the arrays, we can iterate over our hash map to see which numbers appear `n` times.

In [80]:
from collections import defaultdict
from typing import List 

class Solution:
    def intersection(self, nums: List[List[int]]) -> List[int]:
        counts = defaultdict(int)

        for arr in nums:
            for x in arr:
                counts[x] += 1
            
        n = len(nums)
        ans  = []

        for key in counts:
            if counts[key] == n:
                ans.append(key)

        return sorted(ans)

This problem is a good discussion point for why a hash map is convenient. You may be thinking, since our keys are integers, why can't we just use an array instead of a hash map? We could, but the problem is that the array needs to be large enough to have an index for the maximum element. What if we have a test case like `[1, 2, 3, 1000]`? We need to initialize an array of size `1001`, even though only a few of the indices will actually be used. Therefore, using an array could end up being a huge waste of space. Sure, sometimes it would be more efficient because of the overhead of a hash map, but overall, a hash map is much safer. Even if `99999999999` is in the input, it doesn't matter - the hash map handles it like any other element.

**Time & Space Complexity**
Let's say that there are $n$ lists and each list has an average of $m$ elements. To populate our hash map, it costs $O(n \cdot m)$ to iterate over all the elements. The next loop iterates over all unique elements that we encountered. If all elements are unique, this can cost up to $O(n \cdot m)$, although this won't affect our time complexity since the previous loop also cost $O(n \cdot m)$. Finally, there can be at most $m$ elements inside `ans` when we **perform the sort**, which means in the **worst case, the sort will cost** $O(m \cdot \log m)$. This gives us a time complexity of $O(n \cdot m + m \cdot \log m) = O(m \cdot (n + \log m))$. If every element in the input is unique, then the hash map will grow to a size of $n \cdot m$, which means the algorithm has a space complexity of $O(n \cdot m)$.

#### Check if All Characters Have Equal Number of Occurrences
Given a string `s`, determine if all characters have the same frequency.

For example, given `s = "abacbc"`, return `True`. All characters appear twice. Given `s = "aaabb"`, return `False`. `"a"` appears 3 times and `"b"` appears 2 times, and `3 != 2`.

Using our knowledge of hash maps and sets, this is a straightforward problem. Use a hash map `counts` to count all character frequencies. Iterate through `s` and get the frequency of every character. Check if all frequencies are the same.

**NOTE:** Because a set ignores duplicates, we can put all the frequencies in a set and check if the length is 1 to verify if the frequencies are all the same.

#### Further Explanations
Recall from the first article of the chapter that sets ignore frequency. If you add the same element to a set 100 times, the first operation will add it, then the next 99 will do nothing.

In this problem, we want to determine if there exists only one unique frequency. We can first find the frequencies by counting each character using a hash map. After counting, the values of the hash map are our frequencies.

If there is only one unique frequency, then after adding all the values to a set, the set will have a length of 1. If there are any characters with different frequencies, then the set would have a length greater than 1, as it would hold all unique frequencies.

In [81]:
from collections import defaultdict

class Solution:
    def areOccurrencesEqual(self, s: str) -> bool:
        counts = defaultdict(int)

        for c in s:
            counts[c] += 1

        frequencies = counts.values()

        return len(set(frequencies)) == 1

Given $n$ as the length of `s`, it costs $O(n)$ to populate the hash map, then $O(n)$ to convert the hash map's values to a set. This gives us a time complexity of $O(n)$. The space that the hash map and set would occupy is equal to the number of unique characters. **As previously discussed, some people would argue that this is $O(1)$ since the characters come from the English alphabet, which is bounded by a constant**. A more general answer would be to say that the space complexity is $O(k)$, where $k$ is the number of characters that could be in the input, which happens to be `26` in this problem.

**Bonus:** Python one-liner using the collections module's [Counter](https://docs.python.org/3/library/collections.html#collections.Counter)

In [82]:
from collections import Counter

class Solution:
    def areOccurrencesEqual(self, s: str) -> bool:
        return len(set(Counter(s).values())) == 1