# ItemList


[TOC]


Let's first create an `ItemList` instance (here a `LazyItemListNEW`), which we will use throughout the rest of this section.

In [1]:
from tohu.item_list_lazy_NEW import LazyItemListNEW
from tohu.tohu_items_class import make_tohu_items_class

In [2]:
Quux = make_tohu_items_class("Quux", ["aa", "bb", "cc", "dd"])

In [3]:
items = [
    Quux(30, '672EF2', 'Johnny', False),
    Quux(32, '250204', 'David', True),
    Quux(55, '679DAE', 'Angela', False),
    Quux(43, '91554C', 'Pamela', True),
    Quux(56, '8EA713', 'Blake', True),
]

In [4]:
num_items = len(items)
field_names = ["aa", "bb", "cc", "dd"]

def f_get_item_tuple_iterator():
    return items

In [5]:
item_list = LazyItemListNEW(
    f_get_item_tuple_iterator,
    num_items=num_items,
    field_names=field_names,
    tohu_items_class_name="Quux",
)
item_list

<LazyItemListNEW containing 5 items>

An `ItemList` can be iterated over and converted to a regular list.

In [6]:
list(item_list)

[Quux(aa=30, bb='672EF2', cc='Johnny', dd=False),
 Quux(aa=32, bb='250204', cc='David', dd=True),
 Quux(aa=55, bb='679DAE', cc='Angela', dd=False),
 Quux(aa=43, bb='91554C', cc='Pamela', dd=True),
 Quux(aa=56, bb='8EA713', cc='Blake', dd=True)]

## `.compute()` and caching of item list contents

Unlike a regular Python list, however, an `ItemList` is lazy by default. That is, it only calculates its items when they are actually needed, for example when they are exported to a CSV file or to a pandas dataframe. For lists with many items, this can avoid generating all of them and holding them in memory before they are used.

However, when prototyping or working interactively with `ItemList`s, it can be useful to calculate the items once and keep them in memory for faster access. In this small example we won't see any speedup, but for larger lists the difference can be noticeable.

In [7]:
item_list.is_cached

False

In [8]:
item_list.compute()

<LazyItemListNEW containing 5 items>

In [9]:
item_list.is_cached

True
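The interplay between laziness, `.compute()` and `.is_cached` can be pictured with a minimal sketch. The class `LazySequence` and the helper `produce_items` below are hypothetical stand-ins written for illustration only. They are not tohu's implementation, and the real `ItemList` may work differently internally.

```python
class LazySequence:
    """Hypothetical stand-in for a lazy item list (not tohu's implementation)."""

    def __init__(self, f_get_items, num_items):
        self._f_get_items = f_get_items  # callable that produces the items on demand
        self._num_items = num_items
        self._cache = None  # filled in by compute()

    @property
    def is_cached(self):
        return self._cache is not None

    def compute(self):
        # Materialise the items once and keep them in memory.
        if self._cache is None:
            self._cache = list(self._f_get_items())
        return self

    def __len__(self):
        return self._num_items

    def __iter__(self):
        # Serve from the cache if available, otherwise regenerate the items.
        if self._cache is not None:
            return iter(self._cache)
        return iter(self._f_get_items())


calls = []

def produce_items():
    calls.append("called")  # record every time the items are regenerated
    return (n * n for n in range(5))

seq = LazySequence(produce_items, num_items=5)

# Without a cache, each iteration regenerates the items.
first = list(seq)
second = list(seq)
calls_before_compute = len(calls)

# After compute(), further iterations are served from memory.
seq.compute()
third = list(seq)
fourth = list(seq)
calls_after_compute = len(calls)

print(calls_before_compute, calls_after_compute, seq.is_cached)
```

In this sketch the items are regenerated on every iteration until `compute()` is called, after which repeated access no longer triggers any recomputation.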

## Exporting an `ItemList` to various formats

### `.to_df()`

When calling `.to_df()` without any arguments, all fields from all items will be exported.

In [10]:
item_list.to_df()

|   | aa | bb | cc | dd |
|---|---|---|---|---|
| 0 | 30 | 672EF2 | Johnny | False |
| 1 | 32 | 250204 | David | True |
| 2 | 55 | 679DAE | Angela | False |
| 3 | 43 | 91554C | Pamela | True |
| 4 | 56 | 8EA713 | Blake | True |


### `.head()`

There is also a `.head()` method, which is analogous to the one for pandas dataframes.

In [11]:
item_list.head(3)

|   | aa | bb | cc | dd |
|---|---|---|---|---|
| 0 | 30 | 672EF2 | Johnny | False |
| 1 | 32 | 250204 | David | True |
| 2 | 55 | 679DAE | Angela | False |


### `.to_csv()`

In [12]:
print(item_list.to_csv())

aa,bb,cc,dd
30,672EF2,Johnny,False
32,250204,David,True
55,679DAE,Angela,False
43,91554C,Pamela,True
56,8EA713,Blake,True



In [13]:
output_filename = "./example_output.csv"

In [14]:
!rm -f $output_filename

In [15]:
item_list.to_csv(filename=output_filename, fields=["cc", "aa", "bb"], column_names=["Name", "Age", "ID"])

In [16]:
with open(output_filename, "r") as f:
    print(f.read())

Name,Age,ID
Johnny,30,672EF2
David,32,250204
Angela,55,679DAE
Pamela,43,91554C
Blake,56,8EA713
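To make the roles of `fields` and `column_names` in the CSV export concrete, here is a rough sketch using only the standard library. The function `to_csv_string` is a hypothetical helper, and a `namedtuple` stands in for the tohu items class. The length check is an assumption made for this sketch, since how tohu itself handles mismatched arguments is not shown here.

```python
import csv
import io
from collections import namedtuple

# Stand-in for the tohu items class used above (assumption for this sketch).
Quux = namedtuple("Quux", ["aa", "bb", "cc", "dd"])

items = [
    Quux(30, "672EF2", "Johnny", False),
    Quux(32, "250204", "David", True),
]

def to_csv_string(items, fields, column_names=None):
    """Hypothetical helper: write the selected fields as CSV text."""
    if column_names is None:
        column_names = fields  # fall back to the field names as header
    if len(column_names) != len(fields):
        # Assumed behaviour for this sketch only.
        raise ValueError("column_names must have the same length as fields")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(column_names)
    for item in items:
        writer.writerow([getattr(item, name) for name in fields])
    return buffer.getvalue()

csv_text = to_csv_string(
    items,
    fields=["cc", "aa", "bb"],
    column_names=["Name", "Age", "ID"],
)
print(csv_text)
```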



## Specifying which columns to export, and in which order

We can export only a subset of columns, and/or rearrange their order, by specifying a list of field names. In this example we export only the three columns `cc`, `aa`, `dd` (in this order) instead of the full set of columns `aa`, `bb`, `cc`, `dd`.

In [17]:
item_list.to_df(fields=["cc", "aa", "dd"])

|   | cc | aa | dd |
|---|---|---|---|
| 0 | Johnny | 30 | False |
| 1 | David | 32 | True |
| 2 | Angela | 55 | False |
| 3 | Pamela | 43 | True |
| 4 | Blake | 56 | True |
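Conceptually, selecting and reordering columns amounts to looking up the requested attributes on each item in the requested order. The following sketch illustrates the idea with a hypothetical helper `select_fields` and a `namedtuple` standing in for the tohu items class. It is not how tohu implements `fields`.

```python
from collections import namedtuple

# Stand-in for the tohu items class (assumption for this sketch).
Quux = namedtuple("Quux", ["aa", "bb", "cc", "dd"])

items = [
    Quux(30, "672EF2", "Johnny", False),
    Quux(32, "250204", "David", True),
]

def select_fields(items, fields):
    """Return one tuple per item with the requested fields, in the requested order."""
    return [tuple(getattr(item, name) for name in fields) for item in items]

rows = select_fields(items, ["cc", "aa", "dd"])
print(rows)
```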


## Custom column names

If we want to rename the columns, we can pass the `column_names` argument, giving one name per entry in `fields`. For example:

In [18]:
item_list.to_df(fields=["cc", "aa"], column_names=["First Name", "Age"])

|   | First Name | Age |
|---|---|---|
| 0 | Johnny | 30 |
| 1 | David | 32 |
| 2 | Angela | 55 |
| 3 | Pamela | 43 |
| 4 | Blake | 56 |


In [19]:
#item_list.to_df(fields=["cc", "aa"], column_names=["First Name", "Age", "ID"])

## Accessing nested field values

Let's create a second `ItemList` instance which contains nested items.

In [20]:
from tohu.tohu_items_class import make_tohu_items_class

In [21]:
Foo = make_tohu_items_class("Foo", field_names=["xx", "yy"])
Bar = make_tohu_items_class("Bar", field_names=["rr", "ss"])

In [22]:
items_2 = [
    Quux(aa=30, bb=Foo(xx='672EF2', yy=Bar(rr=153, ss="Engineer")), cc='Johnny', dd=False),
    Quux(aa=32, bb=Foo(xx='250204', yy=Bar(rr=193, ss="Therapist")), cc='David', dd=True),
    Quux(aa=55, bb=Foo(xx='679DAE', yy=Bar(rr=101, ss="Author")), cc='Angela', dd=False),
    Quux(aa=43, bb=Foo(xx='91554C', yy=Bar(rr=138, ss="Scientist")), cc='Pamela', dd=True),
    Quux(aa=56, bb=Foo(xx='8EA713', yy=Bar(rr=147, ss="Consultant")), cc='Blake', dd=True),
]

#item_list_2 = ItemList(items_2, Quux)

In [23]:
# item_tuples_2 = [
#     (30, Foo(xx='672EF2', yy=Bar(rr=153, ss="Engineer")), 'Johnny', False),
#     (32, Foo(xx='250204', yy=Bar(rr=193, ss="Therapist")), 'David', True),
#     (55, Foo(xx='679DAE', yy=Bar(rr=101, ss="Author")), 'Angela', False),
#     (43, Foo(xx='91554C', yy=Bar(rr=138, ss="Scientist")), 'Pamela', True),
#     (56, Foo(xx='8EA713', yy=Bar(rr=147, ss="Consultant")), 'Blake', True),
# ]

num_items = len(items_2)
field_names = ["aa", "bb", "cc", "dd"]

def f_get_item_tuple_iterator():
    return items_2

item_list_2 = LazyItemListNEW(
    f_get_item_tuple_iterator,
    num_items=num_items,
    field_names=field_names,
    tohu_items_class_name="Quux",
)
item_list_2

<LazyItemListNEW containing 5 items>

If we simply call `to_df()` without specifying the `fields` argument, the cells in column `bb` will contain the full nested values of the items of type `Foo`.

In [24]:
item_list_2.to_df()

|   | aa | bb | cc | dd |
|---|---|---|---|---|
| 0 | 30 | Foo(xx='672EF2', yy=Bar(rr=153, ss='Engineer')) | Johnny | False |
| 1 | 32 | Foo(xx='250204', yy=Bar(rr=193, ss='Therapist')) | David | True |
| 2 | 55 | Foo(xx='679DAE', yy=Bar(rr=101, ss='Author')) | Angela | False |
| 3 | 43 | Foo(xx='91554C', yy=Bar(rr=138, ss='Scientist')) | Pamela | True |
| 4 | 56 | Foo(xx='8EA713', yy=Bar(rr=147, ss='Consultant')) | Blake | True |


If we are only interested in some of the individual values inside the nested `Foo` or `Bar` items, we can "reach into" them and extract those values by using the usual `.` notation for attribute access within the field names (for example `bb.yy.ss`).

In [25]:
fields = ["cc", "bb.xx", "bb.yy.ss"]
column_names = ["Name", "ID", "Job description"]

item_list_2.to_df(fields=fields, column_names=column_names)

|   | Name | ID | Job description |
|---|---|---|---|
| 0 | Johnny | 672EF2 | Engineer |
| 1 | David | 250204 | Therapist |
| 2 | Angela | 679DAE | Author |
| 3 | Pamela | 91554C | Scientist |
| 4 | Blake | 8EA713 | Consultant |
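The dotted field names behave much like chained attribute access in plain Python. As an illustration (using `namedtuple` stand-ins for the tohu items classes, and not claiming this is what tohu does internally), the standard library's `operator.attrgetter` resolves such dotted paths directly:

```python
from collections import namedtuple
from operator import attrgetter

# Stand-ins for the tohu items classes (assumption for this sketch).
Bar = namedtuple("Bar", ["rr", "ss"])
Foo = namedtuple("Foo", ["xx", "yy"])
Quux = namedtuple("Quux", ["aa", "bb", "cc", "dd"])

item = Quux(
    aa=30,
    bb=Foo(xx="672EF2", yy=Bar(rr=153, ss="Engineer")),
    cc="Johnny",
    dd=False,
)

fields = ["cc", "bb.xx", "bb.yy.ss"]

# attrgetter("bb.yy.ss")(item) is equivalent to item.bb.yy.ss
getters = [attrgetter(path) for path in fields]
row = tuple(get(item) for get in getters)
print(row)  # ('Johnny', '672EF2', 'Engineer')
```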


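Let's verify that an error is raised if any attribute name in a dotted field path doesn't exist, regardless of the nesting level at which it occurs. An unknown top-level field raises `InvalidFieldError`, whereas an unknown nested attribute raises `AttributeError`.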

In [26]:
import pytest
from tohu.field_selector import InvalidFieldError

with pytest.raises(InvalidFieldError, match=r"Invalid fields: \['bbb'\]\. Fields must be a subset of: \['aa', 'bb', 'cc', 'dd'\]"):
    item_list_2.to_df(fields=["bbb.yy.ss"])

with pytest.raises(AttributeError, match="Foo' object has no attribute 'yyy'"):
    item_list_2.to_df(fields=["bb.yyy.ss"])

with pytest.raises(AttributeError, match="Bar' object has no attribute 'sss'"):
    item_list_2.to_df(fields=["bb.yy.sss"])
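One plausible way to picture why two different exception types appear: the top-level name is checked against the known field names up front, while deeper levels are only discovered to be missing during attribute lookup. The sketch below illustrates this under that assumption. The `InvalidFieldError` class and the `extract` helper defined here are local stand-ins, not tohu's own code.

```python
from collections import namedtuple
from operator import attrgetter

class InvalidFieldError(Exception):
    """Local stand-in; tohu defines its own exception with this name."""

# Stand-ins for the tohu items classes (assumption for this sketch).
Bar = namedtuple("Bar", ["rr", "ss"])
Foo = namedtuple("Foo", ["xx", "yy"])
Quux = namedtuple("Quux", ["aa", "bb", "cc", "dd"])

item = Quux(30, Foo("672EF2", Bar(153, "Engineer")), "Johnny", False)
field_names = ["aa", "bb", "cc", "dd"]

def extract(item, field_names, path):
    """Hypothetical helper: validate the top-level name, then follow the dotted path."""
    top_level = path.split(".")[0]
    if top_level not in field_names:
        raise InvalidFieldError(
            f"Invalid fields: {[top_level]}. Fields must be a subset of: {field_names}"
        )
    # Missing nested attributes surface as a plain AttributeError here.
    return attrgetter(path)(item)

def error_type(path):
    """Return the name of the exception raised for `path`, or None if it succeeds."""
    try:
        extract(item, field_names, path)
    except Exception as exc:
        return type(exc).__name__
    return None

paths = ["bbb.yy.ss", "bb.yyy.ss", "bb.yy.sss", "bb.yy.ss"]
results = {path: error_type(path) for path in paths}
print(results)
```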