# Example: Fun with Iteration Patterns for Arrays, Sets, and Dictionaries
Iteration is the repeated execution of a block of code, typically used to process elements in a collection or to perform repeated actions until a condition is met. 
> __Iteration__ is an _extremely common_ fundamental concept used in nearly all applications. It appears in tasks ranging from simple loops over arrays to complex algorithms in data processing, simulation, and machine learning, making it an essential tool in nearly every programming language and domain.

In this example, we'll explore the two iteration patterns that you'll likely encounter in your everyday life:
* __For loops__: A for loop is a control structure that repeatedly executes a code block for each element in a specified sequence or range. It is commonly used to iterate over collections like arrays or lists, or to perform a fixed number of repetitions.
* __While loops__: A while loop is a control structure that repeatedly executes a code block as long as a specified condition remains true. It is typically used when the number of iterations is not known in advance and depends on dynamic conditions during execution.

I'm ready to get going, so let's go!
___

## Setup, Data, and Prerequisites
We set up the computational environment by including the `Include.jl` file and loading any needed resources.

> __Include__: The [include command](https://docs.julialang.org/en/v1/base/base/#include) evaluates the contents of the input source file, `Include.jl`, in the notebook's global scope. The `Include.jl` file sets paths, loads required external packages, etc. 

Let's set up the computational environment.

In [3]:
include(joinpath(@__DIR__, "Include.jl"));

For additional information on Julia functions and types used in this material, see the [Julia programming language documentation](https://docs.julialang.org/en/v1/). In addition, we'll also use [the `VLDataScienceMachineLearningPackage.jl` package](https://github.com/varnerlab/VLDataScienceMachineLearningPackage.jl). Check out [the documentation](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/) for more information on its functions, types, and data. 
___

## Example 1: Basic for-loop iteration patterns for Ordered Collections
Let's start by looking at the basic structure of [a for-loop](https://docs.julialang.org/en/v1/base/base/#for). A [for-loop](https://docs.julialang.org/en/v1/base/base/#for) has a header and a body.
> __Loop header:__ The __header__ in a for-loop specifies how many times the loop will iterate. The loop `index` is passed into the loop's _body_, where you put your logic. The `index` is always a new variable, even if a variable of the same name exists in the enclosing scope.

In Julia, the [for-loop](https://docs.julialang.org/en/v1/base/base/#for) has its own _local scope_: the loop body can read and update variables that already exist in the enclosing scope, but variables first created inside the loop are not visible outside it. The [local scope of the for loop](https://docs.julialang.org/en/v1/manual/variables-and-scoping/#local-scope) ends with the `end` keyword.

> __Scope:__ In programming, __scope__ refers to the region of code where a variable or identifier is defined and accessible. It determines the visibility and lifetime of variables, helping to manage the namespace and prevent naming conflicts.
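
The scoping rules above can be checked with a small sketch. The function `scope_demo` and its variable names are illustrative assumptions, not part of the course materials; the code is wrapped in a function so that the behavior does not depend on whether it runs in a notebook or a script.

```julia
# A minimal sketch of for-loop scoping inside a function (names are hypothetical)
function scope_demo()
    i = 100      # an outer variable that happens to share the loop index name
    total = 0    # exists before the loop, so the loop body updates it

    for i ∈ 1:3              # this `i` is a new variable; the outer `i` is untouched
        total += i
        inner_value = i      # first created inside the loop: local to the loop body
    end

    # after the loop: outer i is unchanged, total was updated, inner_value is not visible
    return (i, total, @isdefined(inner_value))
end

scope_demo()  # (100, 6, false)
```

The outer `i` keeps its value because the loop index is always a fresh variable, while `total` changes because it already existed when the loop started.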

Let's do a simple example where we create an `Array{Float64,1}` holding random values, iterate over this array using a for loop, and print each random value using the [println function](https://docs.julialang.org/en/v1/base/io-network/#Base.println).

In [49]:
let
    
    number_of_elements = 5; # how many items do we want?
    random_vector_array = rand(number_of_elements); # create an array of random 64-bit floating-point values
    value = nothing # declare a variable outside the loop.
    
    for i ∈ 1:number_of_elements
        value = random_vector_array[i];
        println("The index i = $(i) and the value = $(random_vector_array[i])");
    end # end of for loop scope
    value # what gets printed?
end

The index i = 1 and the value = 0.5828712310310923
The index i = 2 and the value = 0.2911991113140032
The index i = 3 and the value = 0.5287458733437066
The index i = 4 and the value = 0.2340000988376163
The index i = 5 and the value = 0.8092032574708081


0.8092032574708081

### Eachindex
Another [for-loop](https://docs.julialang.org/en/v1/base/base/#for) pattern is the [eachindex pattern](https://docs.julialang.org/en/v1/base/arrays/#Base.eachindex). We use the [eachindex pattern](https://docs.julialang.org/en/v1/base/arrays/#Base.eachindex) when we don't explicitly know how many elements we have in an ordered collection, but we want all of the elements in order.

In [8]:
let
    number_of_elements = 5; # how many items do we want?
    random_vector_array = rand(number_of_elements); # create an array of random 64-bit floating-point values
    
    for i ∈ eachindex(random_vector_array)
        value = random_vector_array[i];
        println("The index i = $(i) and the value = $(value)"); # body
    end
end

The index i = 1 and the value = 0.8594647376719959
The index i = 2 and the value = 0.9777251074330687
The index i = 3 and the value = 0.6896687175037127
The index i = 4 and the value = 0.5834011671739497
The index i = 5 and the value = 0.9470612356223896
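
As a hedged sketch of why `eachindex` is convenient, the hypothetical function `running_total` below never asks for the length of its input in the loop header, so the same loop works for inputs of any size, including an empty array.

```julia
# Running (cumulative) total using eachindex; `running_total` is an illustrative name
function running_total(data::Vector{Float64})
    totals = zeros(Float64, length(data))  # one slot per input element
    accumulator = 0.0
    for i ∈ eachindex(data)                # visits every valid index, in order
        accumulator += data[i]
        totals[i] = accumulator
    end
    return totals
end

running_total([1.0, 2.0, 3.0])  # [1.0, 3.0, 6.0]
running_total(Float64[])        # empty input: the loop body never runs
```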


### Direct iteration
Suppose we don't care about the index $i$, but instead want only the values; we can iterate over the elements of a collection directly. For example, the code block below accesses the values of the `random_vector_array` directly, but __NOT__ their indices:

In [10]:
let
    number_of_elements = 5; # how many items do we want?
    random_vector_array = rand(number_of_elements); # create an array of random 64-bit floating-point values
    
    for value ∈ random_vector_array # for value in collection
        println("Only the value = $(value)"); # body
    end
end

Only the value = 0.46862760782059076
Only the value = 0.8573731880806637
Only the value = 0.605797824970177
Only the value = 0.07453727821876133
Only the value = 0.26425025896125054
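
When both the position and the value are needed, one option (not used in the original notebook) is Base's `enumerate`, which pairs a counter starting at 1 with each value. The sketch below is an assumption-laden illustration; `largest_with_position` is a hypothetical helper name.

```julia
# Find the largest value and where it occurs, using enumerate (illustrative sketch)
function largest_with_position(data::Vector{Float64})
    best_position, best_value = 0, -Inf
    for (position, value) ∈ enumerate(data)  # position counts 1, 2, 3, ...
        if value > best_value
            best_position, best_value = position, value
        end
    end
    return (best_position, best_value)
end

largest_with_position([0.2, 0.9, 0.5])  # (2, 0.9)
```

Note that `enumerate` yields a counter rather than a true index; for a standard 1-based `Vector` the two coincide.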


### Iterate over other things besides an array?
Sure! We can iterate over other ordered (and unordered) collections, not just arrays! For example, we can iterate over [tuples](https://docs.julialang.org/en/v1/manual/functions/#Tuples), fixed-length _immutable_ ordered containers that can hold any values. 

Let's build a tuple holding some `Int64` types and iterate over these using the [eachindex pattern](https://docs.julialang.org/en/v1/base/arrays/#Base.eachindex).

The previous examples used the [println function](https://docs.julialang.org/en/v1/base/io-network/#Base.println) to print output [to `stdout`](https://docs.julialang.org/en/v1/base/io-network/#Base.stdout); here, we show another approach that does the same thing, namely using the [@show macro](https://docs.julialang.org/en/v1/base/base/#Base.@show) which prints one or more expressions, and their results, [to `stdout`](https://docs.julialang.org/en/v1/base/io-network/#Base.stdout):

In [12]:
let
    example_tuple = (1,2,3,6,5,4,1,1,1,1)
    
    for i ∈ eachindex(example_tuple) # header: for index in index collection
        @show (i, example_tuple[i]) # body
    end
end

(i, example_tuple[i]) = (1, 1)
(i, example_tuple[i]) = (2, 2)
(i, example_tuple[i]) = (3, 3)
(i, example_tuple[i]) = (4, 6)
(i, example_tuple[i]) = (5, 5)
(i, example_tuple[i]) = (6, 4)
(i, example_tuple[i]) = (7, 1)
(i, example_tuple[i]) = (8, 1)
(i, example_tuple[i]) = (9, 1)
(i, example_tuple[i]) = (10, 1)
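
A tuple can also be iterated directly, and its immutability can be observed by attempting an element assignment. The following is a minimal sketch; `tuple_demo` is a hypothetical name introduced only for illustration.

```julia
# Direct iteration over a tuple, plus a check that tuples reject element assignment
function tuple_demo()
    example_tuple = (1, 2, 3)

    total = 0
    for value ∈ example_tuple   # direct iteration works for tuples too
        total += value
    end

    mutation_failed = try
        example_tuple[1] = 10   # tuples have no setindex! method
        false
    catch err
        err isa MethodError     # the assignment raises a MethodError
    end

    return (total, mutation_failed)
end

tuple_demo()  # (6, true)
```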


___

Let's build a `Set{Char}` of values, and then directly iterate over the set (where we show the value at each iteration using [the `@show` macro](https://docs.julialang.org/en/v1/base/base/#Base.@show)):

In [15]:
let
    example_set = Set{Char}(['A','B','C','D','R','U','S','T','A']);
    for value ∈ example_set
        @show value
    end
end

value = 'C'
value = 'U'
value = 'D'
value = 'A'
value = 'R'
value = 'S'
value = 'T'
value = 'B'
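
Notice that nine characters went into the set but only eight values were printed, and not in the order they were supplied: a set stores each distinct element once and makes no promise about iteration order. The sketch below (with the hypothetical name `set_demo`) checks both points, sorting the visited values so the result does not depend on the iteration order.

```julia
# Sets drop duplicates and have no guaranteed iteration order (illustrative sketch)
function set_demo()
    letters = ['A','B','C','D','R','U','S','T','A']  # nine characters, 'A' appears twice
    example_set = Set{Char}(letters)

    visited = Char[]
    for value ∈ example_set      # order of visits is not guaranteed
        push!(visited, value)
    end

    return (length(letters), length(example_set), sort(visited))
end

set_demo()  # (9, 8, ['A', 'B', 'C', 'D', 'R', 'S', 'T', 'U'])
```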


### Dictionary
[A Dictionary is an associative container](https://docs.julialang.org/en/v1/base/collections/#Base.Dict) that stores key–value pairs, allowing lookup, insertion, and deletion of values based on their unique keys. While significantly different from an array, we can still traverse a dictionary [using a for-loop](https://docs.julialang.org/en/v1/base/base/#for).

We gathered a daily open-high-low-close `dataset` for each firm in the [S&P500](https://en.wikipedia.org/wiki/S%26P_500) from `01-03-2014` until `12-31-2024`, along with data for a few exchange-traded funds and volatility products during that time. 

Let's load the `original_dataset` by calling the `MyMarketDataSet()` function and remove firms that do not have the maximum number of trading days. The cleaned dataset $\mathcal{D}$ will be stored in the `dataset` variable.

When we use a [for-loop](https://docs.julialang.org/en/v1/base/base/#for) with a [Dictionary](https://docs.julialang.org/en/v1/base/collections/#Dictionaries), which is an unordered collection of `key => value` pairs, each iteration yields one `key => value` pair (a `Pair`), which we can destructure into `(key, value)` in the loop header.
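
Here is a small, self-contained sketch of dictionary iteration. The tickers and prices are made up for illustration, and `dictionary_demo` is a hypothetical name; only Base functionality is used.

```julia
# Iterating a Dict: each element is a key => value Pair that we can destructure
function dictionary_demo()
    closing_price = Dict("AAA" => 10.0, "BBB" => 20.0, "CCC" => 30.0)  # made-up data

    total = 0.0
    seen = String[]
    for (ticker, price) ∈ closing_price   # destructure each pair in the header
        push!(seen, ticker)
        total += price
    end

    first_item = first(closing_price)     # some element of the dictionary
    return (sort(seen), total, first_item isa Pair)
end

dictionary_demo()  # (["AAA", "BBB", "CCC"], 60.0, true)
```

The tickers are sorted before being returned because the visiting order of a dictionary is not guaranteed.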

In [17]:
original_dataset = MyMarketDataSet() |> x-> x["dataset"];


In [18]:
original_dataset


In [19]:
original_dataset["AAPL"] # Hmmm. we access the elements of dictionary like an array?


#### Clean the data
Not all tickers in our dataset have the maximum number of trading days for various reasons, e.g., acquisition or de-listing events. Let's collect only those tickers with the maximum number of trading days.
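
The idea of the cleaning step can be sketched on toy data before applying it to the real dataset. In this sketch, plain `Vector{Float64}` values stand in for the per-ticker `DataFrame` objects, and the tickers, values, and the helper name `keep_complete_records` are all assumptions made for illustration.

```julia
# Keep only the entries whose series has the required number of records (toy sketch)
function keep_complete_records(records::Dict{String, Vector{Float64}}, required_length::Int)
    complete = Dict{String, Vector{Float64}}()
    for (ticker, series) ∈ records          # order of visits is not guaranteed
        if length(series) == required_length
            complete[ticker] = series       # keep tickers with a full record
        end
    end
    return complete
end

# made-up data: "BBB" is missing one record
toy_records = Dict("AAA" => [1.0, 2.0, 3.0], "BBB" => [1.0, 2.0], "CCC" => [4.0, 5.0, 6.0])
cleaned = keep_complete_records(toy_records, 3)
sort(collect(keys(cleaned)))  # ["AAA", "CCC"]
```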

First, let's compute the number of records for a firm that we know has the maximum number of trading days, e.g., `AAPL`, and save that value in the `maximum_number_trading_days::Int64` variable:

In [21]:
maximum_number_trading_days = original_dataset["AAPL"] |> nrow; # nrow? (check out: DataFrames.jl)


Now, let's iterate through our data and collect only tickers with `maximum_number_trading_days` records. Save that data in the `dataset::Dict{String,DataFrame}` variable:

In [23]:
dataset = let

    # initialize -
    dataset = Dict{String, DataFrame}();

    # iterate through the dictionary; we can't guarantee a particular order
    for (ticker, data) ∈ original_dataset  # we get each (K, V) pair!
        if (nrow(data) == maximum_number_trading_days) # what is this doing?
            dataset[ticker] = data;
        else
            println("Ticker $(ticker) is not full data")
        end
    end
    dataset; # return
end;


For how many firms do we have the full number of trading days? Let's use [the `length(...)` method](https://docs.julialang.org/en/v1/base/collections/#Base.length); notice that this works for dictionaries, in addition to arrays and sets!

In [25]:
length(dataset) # tells us how many keys are in the dictionary



## Example 3: While-loop iteration pattern
A for loop is used when the number of iterations is known in advance. In contrast, a while loop is used when the number of iterations depends on a condition that is evaluated before each iteration, making it better suited to dynamic repetition. A while loop also has a header and a body, but they differ from those of a for loop:
> __While loop__: In a while loop, the header contains the loop condition evaluated before each iteration to determine whether the loop should continue. The body is the block of code that executes repeatedly as long as the condition in the header remains true.
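
The header/body split can be seen in a minimal sketch. The function `count_doublings` is a hypothetical example, not taken from the course package: it doubles a value until it reaches a limit, so the number of iterations depends on the inputs.

```julia
# While loop sketch: the header holds the condition, the body does the work
function count_doublings(start::Int, limit::Int)
    value = start
    steps = 0
    while value < limit   # header: condition is checked before every iteration
        value *= 2        # body: runs only while the condition holds
        steps += 1
    end
    return (value, steps)
end

count_doublings(1, 100)    # (128, 7)
count_doublings(200, 100)  # (200, 0): condition is false at the start, so the body never runs
```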

Let's do a common while loop pattern, where we iterate over the items in a collection until we reach the end of the collection. In this case, we will use a `Set{String}` and iterate over it using a while loop.

Let's start by building a set of possible student first names using [the `MyCommonForenameDataset()`](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/data/#VLDataScienceMachineLearningPackage.MyCommonForenameDataset) function from [the `VLDataScienceMachineLearningPackage.jl` package](https://github.com/varnerlab/VLDataScienceMachineLearningPackage.jl). We'll store the forenames in the `set_of_firstnames::Set{String}` variable.

In [28]:
set_of_firstnames = let 
    
    # initialize -
    set_of_firstnames = Set{String}();
    forenames_df = MyCommonForenameDataset(); # first names dataset from the VLDataScienceMachineLearningPackage.jl package

    # build forenames set - 
    for name in eachrow(forenames_df)
        push!(set_of_firstnames, name["Romanized Name"]);
    end

    set_of_firstnames # return -
end;

Next, let's iterate over the `set_of_firstnames` using a while loop. We'll use the [pop!](https://docs.julialang.org/en/v1/base/collections/#Base.pop!) function to remove and return an arbitrary element from the set. We'll continue to iterate until the set is empty, or we hit the maximum number of iterations, which we set to `10` in this example.

In [30]:
let

    # initialize -
    should_stop_while_loop = false;
    max_names_to_pop = 10; # how many names do we want to pop?
    localset = copy(set_of_firstnames); # make a copy of the set

    while should_stop_while_loop == false
        first_name_value = pop!(localset); # remove and return an element from the set

        println("Popped element = $(first_name_value)");

        # check iteration condition -
        if (isempty(localset) == true) || (max_names_to_pop == 1) # what is an if statement?
            should_stop_while_loop = true;
        else
            max_names_to_pop -= 1; # decrement the counter
        end
    end
end

Popped element = Gustavs
Popped element = Anohito
Popped element = Marks
Popped element = Djamel
Popped element = Aleks
Popped element = Aleksandre
Popped element = David
Popped element = Ayşa
Popped element = Ya-ting
Popped element = Amina
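
The cell above calls `pop!` before it checks whether the set is empty, so it assumes the set starts with at least one element (`pop!` on an empty set throws an error). As an alternative sketch, both stopping conditions can be placed in the while-loop header. The helper name `pop_up_to` and the toy names below are assumptions for illustration only.

```julia
# While loop with both stopping conditions in the header (illustrative sketch)
function pop_up_to(names::Set{String}, max_names_to_pop::Int)
    localset = copy(names)   # work on a copy so the caller's set is unchanged
    popped = String[]
    while !isempty(localset) && length(popped) < max_names_to_pop
        push!(popped, pop!(localset))  # remove and record an arbitrary element
    end
    return popped
end

toy_names = Set(["Ada", "Grace", "Alan"])  # made-up names
pop_up_to(toy_names, 2)        # two names; which two is arbitrary
pop_up_to(toy_names, 10)       # all three names, then the set runs out
pop_up_to(Set{String}(), 5)    # empty result; pop! is never called
```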


__Ok, that was interesting, but when do we want to use a while loop?__ 

* A while loop is useful when the number of iterations is not known in advance and depends on dynamic conditions during execution. It allows for more flexible control over the loop's termination based on changing conditions, making it suitable for scenarios where the loop's exit criteria are not predetermined.
* While loops are often used for tasks like reading data until the end of a file, waiting for a specific condition to be met, or processing items in a collection until all items are processed. They are also useful for implementing algorithms that require repeated execution until a certain condition is satisfied, such as searching, sorting, or iterating through data structures.

There were also a few other things in our while loop implementation, such as the [if-else statement](https://docs.julialang.org/en/v1/base/control-flow/#if-elseif-else). This is an example of [conditional evaluation](https://docs.julialang.org/en/v1/manual/control-flow/#man-conditional-evaluation) that allows us to execute different blocks of code based on whether a condition is true or false.
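
As a minimal sketch of conditional evaluation, the hypothetical function `classify` below picks exactly one of three branches depending on its input.

```julia
# if / elseif / else: exactly one branch runs (illustrative sketch)
function classify(x::Real)
    if x < 0
        return "negative"
    elseif x == 0
        return "zero"
    else
        return "positive"
    end
end

classify(-2)   # "negative"
classify(0)    # "zero"
classify(3.5)  # "positive"
```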

___