# LoopedItemList


[TOC]


Let's first create a `LoopedItemList` instance, which we use in the rest of this section.

In [1]:
from tohu.looped_item_list import LoopedItemList
from tohu.tohu_items_class import make_tohu_items_class

In [2]:
Quux = make_tohu_items_class("Quux", ["xx", "aa", "bb", "cc"])

In [3]:
items = [
    Quux(1, '672EF2', 'Johnny', False),
    Quux(1, '250204', 'David', True),
    Quux(2, '679DAE', 'Angela', False),
    Quux(3, '91554C', 'Pamela', True),
    Quux(3, '8EA713', 'Blake', True),
]

In [4]:
num_items = len(items)
field_names = ["xx", "aa", "bb", "cc"]

def f_get_item_tuple_iterators(var_names=None):
    if var_names is None or var_names == []:
        yield from [({}, items)]
    elif var_names == ["xx"]:
        yield from (
            ({"xx": 1}, items[0:2]),
            ({"xx": 2}, items[2:3]),
            ({"xx": 3}, items[3:]),
#             (tuple([1]), items[0:2]),
#             (tuple([2]), items[2:3]),
#             (tuple([3]), items[3:]),
        )
    else:
        raise ValueError("Invalid value for argument `var_names`.")

In [5]:
looped_item_list = LoopedItemList(f_get_item_tuple_iterators, field_names=field_names, tohu_items_class_name="Quux")
looped_item_list

<LoopedItemList>

A `LoopedItemList` can be iterated over and converted to a regular list.

In [6]:
list(looped_item_list)

[Quux(xx=1, aa='672EF2', bb='Johnny', cc=False),
 Quux(xx=1, aa='250204', bb='David', cc=True),
 Quux(xx=2, aa='679DAE', bb='Angela', cc=False),
 Quux(xx=3, aa='91554C', bb='Pamela', cc=True),
 Quux(xx=3, aa='8EA713', bb='Blake', cc=True)]

## ~`.compute()` and caching of item list contents~

Unlike a regular Python list, however, a `LoopedItemList` is lazy by default: it only computes its items when they are actually needed, for example when they are exported to a CSV file or to a pandas dataframe. (**TODO:** explain why this is useful for lists with many items.)

However, when prototyping or working interactively with a `LoopedItemList`, it can be useful to compute the items once and keep them in memory for faster access. This small example won't show any speedup, but for larger lists it can make a difference.

In [7]:
looped_item_list.is_cached

False

In [8]:
looped_item_list.compute()

<LoopedItemList>

In [9]:
looped_item_list.is_cached

True

In [10]:
list(looped_item_list.iter_item_tuples())

[Quux(xx=1, aa='672EF2', bb='Johnny', cc=False),
 Quux(xx=1, aa='250204', bb='David', cc=True),
 Quux(xx=2, aa='679DAE', bb='Angela', cc=False),
 Quux(xx=3, aa='91554C', bb='Pamela', cc=True),
 Quux(xx=3, aa='8EA713', bb='Blake', cc=True)]
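
The difference between lazy and cached access can be illustrated without tohu. The following is a minimal sketch, not tohu's actual implementation: `LazySequence` and `make_items` are hypothetical names, and only the `is_cached` / `compute()` interface mirrors what is shown above.

```python
class LazySequence:
    """Toy stand-in (not tohu's implementation): items are produced on demand."""

    def __init__(self, make_items):
        # `make_items` is a zero-argument callable returning a fresh iterator.
        self._make_items = make_items
        self._cache = None

    @property
    def is_cached(self):
        return self._cache is not None

    def compute(self):
        # Materialise the items once and keep them in memory.
        if self._cache is None:
            self._cache = list(self._make_items())
        return self

    def __iter__(self):
        if self._cache is not None:
            return iter(self._cache)
        return iter(self._make_items())


calls = []


def make_items():
    calls.append("called")  # record every time the items are regenerated
    return (n * n for n in range(5))


seq = LazySequence(make_items)
first = list(seq)
second = list(seq)
calls_before = len(calls)  # items were regenerated for each iteration

seq.compute()
third = list(seq)
fourth = list(seq)
calls_after = len(calls)  # only compute() triggered one more regeneration
```

Before `compute()` every iteration regenerates the items; afterwards all iterations are served from memory.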

## Exporting a `LoopedItemList` to various formats

### `to_df()`

When calling `.to_df()` without any arguments, all fields from all items will be exported.

In [11]:
looped_item_list.to_df()

   xx      aa      bb     cc
0   1  672EF2  Johnny  False
1   1  250204   David   True
2   2  679DAE  Angela  False
3   3  91554C  Pamela   True
4   3  8EA713   Blake   True


By passing the `group_by` argument we can split the output into multiple dataframes, based on the values of the specified loop variable(s).

In [12]:
looped_item_list.to_df(group_by=["xx"])

[({'xx': 1},
     xx      aa      bb     cc
  0   1  672EF2  Johnny  False
  1   1  250204   David   True),
 ({'xx': 2},
     xx      aa      bb     cc
  0   2  679DAE  Angela  False),
 ({'xx': 3},
     xx      aa      bb    cc
  0   3  91554C  Pamela  True
  1   3  8EA713   Blake  True)]

If we're only grouping by a single variable we can also pass it directly as a string (as opposed to a list with a single element).

In [13]:
looped_item_list.to_df(group_by="xx")

[({'xx': 1},
     xx      aa      bb     cc
  0   1  672EF2  Johnny  False
  1   1  250204   David   True),
 ({'xx': 2},
     xx      aa      bb     cc
  0   2  679DAE  Angela  False),
 ({'xx': 3},
     xx      aa      bb    cc
  0   3  91554C  Pamela  True
  1   3  8EA713   Blake  True)]
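
The grouped result is a list of `(loop_variable_values, dataframe)` pairs. As a rough illustration of that shape, here is a sketch using only the standard library. `Row` and `group_rows` are hypothetical helpers, and plain lists stand in for the dataframes; this is not how tohu computes the groups.

```python
from collections import namedtuple
from itertools import groupby

Row = namedtuple("Row", ["xx", "aa", "bb", "cc"])  # hypothetical stand-in for Quux

rows = [
    Row(1, "672EF2", "Johnny", False),
    Row(1, "250204", "David", True),
    Row(2, "679DAE", "Angela", False),
    Row(3, "91554C", "Pamela", True),
    Row(3, "8EA713", "Blake", True),
]


def group_rows(rows, var_name):
    """Return [({var_name: value}, [rows...]), ...] for consecutive runs of equal values."""
    return [
        ({var_name: value}, list(group))
        for value, group in groupby(rows, key=lambda row: getattr(row, var_name))
    ]


groups = group_rows(rows, "xx")
for loop_vars, group in groups:
    print(loop_vars, [row.bb for row in group])
```

Note that `itertools.groupby` only merges consecutive rows, which is sufficient here because the example items are already ordered by `xx`.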

### `head()`

There is also a `.head()` method, which is analogous to the one for pandas dataframes.

In [14]:
looped_item_list.head(3)

   xx      aa      bb     cc
0   1  672EF2  Johnny  False
1   1  250204   David   True
2   2  679DAE  Angela  False


### `to_csv()`

In [15]:
print(looped_item_list.to_csv())

xx,aa,bb,cc
1,672EF2,Johnny,False
1,250204,David,True
2,679DAE,Angela,False
3,91554C,Pamela,True
3,8EA713,Blake,True



In [16]:
from tempfile import NamedTemporaryFile

with NamedTemporaryFile(suffix=".csv", mode="w") as f:
    output_filename = f.name
    #print(f"Writing CSV output to temporary file: {output_filename!r}", end="\n\n")

    looped_item_list.to_csv(filename=output_filename, fields=["cc", "aa", "bb"], column_names=["Name", "Age", "ID"])

    with open(output_filename, "r") as f2:
        print(f2.read())

Name,Age,ID
False,672EF2,Johnny
True,250204,David
False,679DAE,Angela
True,91554C,Pamela
True,8EA713,Blake



In [17]:
import os
from glob import glob
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmpdir:
    output_filename = os.path.join(tmpdir, "output_{xx}.csv")
    
    looped_item_list.to_csv(filename=output_filename, fields=["xx", "cc", "aa", "bb"], column_names=["LoopVar", "Name", "Age", "ID"])

    for filename in glob(os.path.join(tmpdir, "*.csv")):
        print(f"=== {os.path.basename(filename)} ===")
        with open(filename, "r") as f:
            print(f.read())

=== output_1.csv ===
LoopVar,Name,Age,ID
1,False,672EF2,Johnny
1,True,250204,David

=== output_2.csv ===
LoopVar,Name,Age,ID
2,False,679DAE,Angela

=== output_3.csv ===
LoopVar,Name,Age,ID
3,True,91554C,Pamela
3,True,8EA713,Blake
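
In the example above, the `{xx}` placeholder in the filename is replaced by the value of the loop variable, producing one file per value. How tohu does this internally is not shown here; a plausible sketch, assuming ordinary `str.format` substitution (`expand_filename` and `some_dir` are hypothetical), looks like this:

```python
import os


def expand_filename(template, loop_vars):
    """Fill `{name}` placeholders in `template` from the dict `loop_vars`."""
    return template.format(**loop_vars)


template = os.path.join("some_dir", "output_{xx}.csv")
filenames = [expand_filename(template, {"xx": value}) for value in (1, 2, 3)]
basenames = [os.path.basename(name) for name in filenames]
print(basenames)
```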



## Specifying which columns to export, and in which order

We can export only a subset of columns, and/or rearrange their order, by specifying a list of field names. In this example we export only the three columns `cc`, `aa`, `bb` (in this order) instead of the full set of columns `xx`, `aa`, `bb`, `cc`.

In [18]:
looped_item_list.to_df(fields=["cc", "aa", "bb"])

      cc      aa      bb
0  False  672EF2  Johnny
1   True  250204   David
2  False  679DAE  Angela
3   True  91554C  Pamela
4   True  8EA713   Blake


## Custom column names

If we want to rename the columns we can pass the `column_names` argument, for example:

In [19]:
looped_item_list.to_df(fields=["bb", "cc"], column_names=["First Name", "Likes Chocolate"])

   First Name  Likes Chocolate
0      Johnny            False
1       David             True
2      Angela            False
3      Pamela             True
4       Blake             True
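
To make the interplay between `fields` and `column_names` concrete, here is a small standard-library sketch of the same idea: select attributes by name, then write them under different headers. `Row` and `export_csv` are hypothetical, and the length check is an assumption about sensible behaviour rather than a statement about what tohu does.

```python
import csv
import io
from collections import namedtuple

Row = namedtuple("Row", ["xx", "aa", "bb", "cc"])  # hypothetical stand-in for Quux

rows = [
    Row(1, "672EF2", "Johnny", False),
    Row(1, "250204", "David", True),
]


def export_csv(rows, fields, column_names=None):
    """Write the selected `fields` of each row, using `column_names` as the header."""
    column_names = fields if column_names is None else column_names
    if len(column_names) != len(fields):
        raise ValueError("`fields` and `column_names` must have the same length.")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(column_names)
    for row in rows:
        writer.writerow([getattr(row, name) for name in fields])
    return buffer.getvalue()


csv_text = export_csv(rows, fields=["bb", "cc"], column_names=["First Name", "Likes Chocolate"])
print(csv_text)
```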


In [20]:
#item_list.to_df(fields=["cc", "aa"], column_names=["First Name", "Age", "ID"])

## Accessing nested field values

Let's create a second `LoopedItemList` instance which contains nested items.

In [21]:
from tohu.tohu_items_class import make_tohu_items_class

In [22]:
Quux = make_tohu_items_class("Quux", field_names=["aa", "bb", "cc", "dd"])
Foo = make_tohu_items_class("Foo", field_names=["xx", "yy"])
Bar = make_tohu_items_class("Bar", field_names=["rr", "ss"])

In [23]:
items_2 = [
    Quux(aa=30, bb=Foo(xx='672EF2', yy=Bar(rr=153, ss="Engineer")), cc='Johnny', dd=False),
    Quux(aa=32, bb=Foo(xx='250204', yy=Bar(rr=193, ss="Therapist")), cc='David', dd=True),
    Quux(aa=55, bb=Foo(xx='679DAE', yy=Bar(rr=101, ss="Author")), cc='Angela', dd=False),
    Quux(aa=43, bb=Foo(xx='91554C', yy=Bar(rr=138, ss="Scientist")), cc='Pamela', dd=True),
    Quux(aa=56, bb=Foo(xx='8EA713', yy=Bar(rr=147, ss="Consultant")), cc='Blake', dd=True),
]

#item_list_2 = ItemList(items_2, Quux)

In [24]:
# item_tuples_2 = [
#     (30, Foo(xx='672EF2', yy=Bar(rr=153, ss="Engineer")), 'Johnny', False),
#     (32, Foo(xx='250204', yy=Bar(rr=193, ss="Therapist")), 'David', True),
#     (55, Foo(xx='679DAE', yy=Bar(rr=101, ss="Author")), 'Angela', False),
#     (43, Foo(xx='91554C', yy=Bar(rr=138, ss="Scientist")), 'Pamela', True),
#     (56, Foo(xx='8EA713', yy=Bar(rr=147, ss="Consultant")), 'Blake', True),
# ]

num_items = len(items_2)
field_names = ["aa", "bb", "cc", "dd"]

def f_get_item_tuple_iterators(var_names=None):
    if var_names is None or var_names == []:
        yield from [({}, items_2)]
    elif var_names == ["zz"]:
        yield from (
            ({"zz": 1}, items_2[0:2]),
            ({"zz": 2}, items_2[2:3]),
            ({"zz": 3}, items_2[3:]),
#             (tuple([1]), items[0:2]),
#             (tuple([2]), items[2:3]),
#             (tuple([3]), items[3:]),
        )
    else:
        raise ValueError("Invalid value for argument `var_names`.")


looped_item_list_2 = LoopedItemList(
    f_get_item_tuple_iterators,
    field_names=field_names,
    tohu_items_class_name="Quux"
)
looped_item_list_2

<LoopedItemList>

In [25]:
list(looped_item_list_2)

[Quux(aa=30, bb=Foo(xx='672EF2', yy=Bar(rr=153, ss='Engineer')), cc='Johnny', dd=False),
 Quux(aa=32, bb=Foo(xx='250204', yy=Bar(rr=193, ss='Therapist')), cc='David', dd=True),
 Quux(aa=55, bb=Foo(xx='679DAE', yy=Bar(rr=101, ss='Author')), cc='Angela', dd=False),
 Quux(aa=43, bb=Foo(xx='91554C', yy=Bar(rr=138, ss='Scientist')), cc='Pamela', dd=True),
 Quux(aa=56, bb=Foo(xx='8EA713', yy=Bar(rr=147, ss='Consultant')), cc='Blake', dd=True)]

If we simply call `to_df()` without specifying the `fields` argument, the cells in column `bb` will contain the full nested values of the items of type `Foo`.

In [26]:
looped_item_list_2.to_df()

   aa                                                 bb      cc     dd
0  30    Foo(xx='672EF2', yy=Bar(rr=153, ss='Engineer'))  Johnny  False
1  32   Foo(xx='250204', yy=Bar(rr=193, ss='Therapist'))   David   True
2  55      Foo(xx='679DAE', yy=Bar(rr=101, ss='Author'))  Angela  False
3  43   Foo(xx='91554C', yy=Bar(rr=138, ss='Scientist'))  Pamela   True
4  56  Foo(xx='8EA713', yy=Bar(rr=147, ss='Consultant'))   Blake   True


If we are only interested in some of the individual values in the nested items `Foo` or `Bar`, it is possible to "reach into" them and extract those values by using the usual `.` notation for accessing attributes.

In [27]:
fields = ["cc", "bb.xx", "bb.yy.ss"]
column_names = ["Name", "ID", "Job description"]

looped_item_list_2.to_df(fields=fields, column_names=column_names)

     Name      ID  Job description
0  Johnny  672EF2         Engineer
1   David  250204        Therapist
2  Angela  679DAE           Author
3  Pamela  91554C        Scientist
4   Blake  8EA713       Consultant
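
Dotted field names behave like chained attribute access. The standard library offers the same mechanism via `operator.attrgetter`, which also accepts dotted names. The sketch below uses namedtuples as stand-ins for the tohu item classes; it illustrates the lookup semantics only and makes no claim about how tohu resolves the fields internally.

```python
from collections import namedtuple
from operator import attrgetter

# Namedtuple stand-ins for the tohu item classes used above.
Bar = namedtuple("Bar", ["rr", "ss"])
Foo = namedtuple("Foo", ["xx", "yy"])
Quux = namedtuple("Quux", ["aa", "bb", "cc", "dd"])

item = Quux(aa=30, bb=Foo(xx="672EF2", yy=Bar(rr=153, ss="Engineer")), cc="Johnny", dd=False)

fields = ["cc", "bb.xx", "bb.yy.ss"]
get_values = attrgetter(*fields)  # each dotted name is resolved one attribute at a time
values = get_values(item)
print(values)
```

A misspelled attribute at any level of the path raises an `AttributeError` at the point where the lookup fails.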


Let's verify that an error is raised if an attribute name specified in the `fields` argument doesn't exist at any level.

In [28]:
import pytest
from tohu.field_selector import InvalidFieldError

with pytest.raises(InvalidFieldError, match=r"Invalid fields: \['bbb'\]. Fields must be a subset of: \['aa', 'bb', 'cc', 'dd'\]"):
    looped_item_list_2.to_df(fields=["bbb.yy.ss"])

with pytest.raises(AttributeError, match="Foo' object has no attribute 'yyy'"):
    looped_item_list_2.to_df(fields=["bb.yyy.ss"])

with pytest.raises(AttributeError, match="Bar' object has no attribute 'sss'"):
    looped_item_list_2.to_df(fields=["bb.yy.sss"])
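
The `match` argument of `pytest.raises` is treated as a regular expression and checked with `re.search`, which is why the square brackets in the first pattern have to be escaped. As a sketch of an alternative to escaping by hand, `re.escape` can build the pattern from the literal text. The `message` below is a hypothetical error message reconstructed from the pattern above, not captured tohu output.

```python
import re

# Hypothetical literal message, reconstructed from the pattern used above.
message = "Invalid fields: ['bbb']. Fields must be a subset of: ['aa', 'bb', 'cc', 'dd']"

# Escaping turns every regex metacharacter into a literal character.
pattern = re.escape(message)
matched = re.search(pattern, message) is not None

# Unescaped, the square brackets form character classes, so the literal text does not match.
unescaped_matched = re.search(message, message) is not None

print(matched, unescaped_matched)
```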