# `FieldSelector`

The `FieldSelector` class is used internally to extract, re-order, and/or rename fields in the item stream produced by a custom generator when items are exported (e.g. to a CSV file or a SQL table).

In [1]:
from tohu.tohu_items_class import make_tohu_items_class
from tohu.field_selector import FieldSelectorNEW3b

To illustrate how the `FieldSelector` class works, let's manually define a list of tohu items to which we can apply different field selectors below.

In [2]:
Quux = make_tohu_items_class("Quux", ["aa", "bb", "cc"])

items = [
    Quux(111, "foo", "AAA"),
    Quux(222, "bar", "BBB"),
    Quux(333, "baz", "CCC"),
    Quux(444, "quux", "DDD"),
]

items

[Quux(aa=111, bb='foo', cc='AAA'),
 Quux(aa=222, bb='bar', cc='BBB'),
 Quux(aa=333, bb='baz', cc='CCC'),
 Quux(aa=444, bb='quux', cc='DDD')]

## No explicit field names: passing fields straight through without altering them

If `fields_to_extract` is `None`, the `FieldSelector` acts like the identity function: it passes items straight through without altering them. While this may not seem very useful, it keeps the export code free of special cases when the user simply wants to export all fields of a custom generator without modification.

~Note that the output values are ordinary dictionaries instead of tohu items. This is because tohu items are internally constructed as [attrs](https://www.attrs.org/en/stable/) classes but we want to allow renaming fields and the output names may not be valid attribute names (e.g. if they contain whitespace such as the name `"Column X"`).~

In [3]:
fs = FieldSelectorNEW3b(Quux, fields_to_extract=None, new_field_names=None)

list(fs(items))

[Quux(aa=111, bb='foo', cc='AAA'),
 Quux(aa=222, bb='bar', cc='BBB'),
 Quux(aa=333, bb='baz', cc='CCC'),
 Quux(aa=444, bb='quux', cc='DDD')]

## Passing a list of field names: extracting fields and/or changing their order

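If `fields_to_extract` is a list of field names, those fields are extracted from each item, in the order given. Fields that are not listed are dropped from the output.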

In [4]:
fs = FieldSelectorNEW3b(Quux, fields_to_extract=["cc", "aa"], new_field_names=None)

In [5]:
list(fs(items))

[Quux(cc='AAA', aa=111),
 Quux(cc='BBB', aa=222),
 Quux(cc='CCC', aa=333),
 Quux(cc='DDD', aa=444)]

## Setting different names for the output fields

In [6]:
fs = FieldSelectorNEW3b(Quux, fields_to_extract=["bb", "aa"], new_field_names=["col_1", "col_2"])

In [7]:
list(fs(items))

[Quux(col_1='foo', col_2=111),
 Quux(col_1='bar', col_2=222),
 Quux(col_1='baz', col_2=333),
 Quux(col_1='quux', col_2=444)]
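
The extraction and renaming behaviour shown above can be approximated with the standard library alone. The following is a minimal sketch under stated assumptions, not tohu's actual implementation: `make_selector` and `Row` are hypothetical names, and `collections.namedtuple` stands in for tohu item classes.

```python
from collections import namedtuple
from operator import attrgetter


def make_selector(fields_to_extract, new_field_names=None, class_name="Row"):
    """Hypothetical helper: build a function that extracts (and optionally renames) fields."""
    # Fall back to the original field names if no new names are given.
    names = list(new_field_names) if new_field_names is not None else list(fields_to_extract)
    if len(names) != len(fields_to_extract):
        raise ValueError("new_field_names must have the same length as fields_to_extract")

    # Output class whose attributes carry the (possibly renamed) field names.
    Row = namedtuple(class_name, names)
    getters = [attrgetter(field) for field in fields_to_extract]

    def select(items):
        # Lazily yield one output row per input item, in the requested field order.
        for item in items:
            yield Row(*(get(item) for get in getters))

    return select


# namedtuple is used here as a stand-in for a tohu items class (assumption).
Quux = namedtuple("Quux", ["aa", "bb", "cc"])
items = [Quux(111, "foo", "AAA"), Quux(222, "bar", "BBB")]

select = make_selector(["bb", "aa"], new_field_names=["col_1", "col_2"])
rows = list(select(items))
```

Because `namedtuple` requires valid Python identifiers, this sketch would reject an output name such as `"Column X"`; a dictionary-based variant would be needed for such names.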

## *TODO:* Error for non-existing fields

In [8]:
import pytest
from tohu.field_selector import InvalidFieldError

In [9]:
with pytest.raises(InvalidFieldError, match=r"Invalid fields: \['xx', 'yy'\]. Fields must be a subset of: \['aa', 'bb', 'cc'\]"):
    fs = FieldSelectorNEW3b(Quux, fields_to_extract=["aa", "xx", "bb", "yy"], new_field_names=None)
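
A check of this kind could be written roughly as follows. This is only a sketch: `check_fields` and `InvalidFieldErrorSketch` are hypothetical names, and the message format is copied from the pattern matched in the test above rather than from tohu's source.

```python
class InvalidFieldErrorSketch(Exception):
    """Hypothetical stand-in for tohu's InvalidFieldError."""


def check_fields(fields_to_extract, available_fields):
    """Raise if any requested field is not among the available fields."""
    # Preserve the order in which the invalid names were requested.
    invalid = [f for f in fields_to_extract if f not in available_fields]
    if invalid:
        raise InvalidFieldErrorSketch(
            f"Invalid fields: {invalid}. "
            f"Fields must be a subset of: {list(available_fields)}"
        )


try:
    check_fields(["aa", "xx", "bb", "yy"], ["aa", "bb", "cc"])
except InvalidFieldErrorSketch as exc:
    message = str(exc)
else:
    message = None
```

Reporting all invalid names at once, rather than failing on the first one, spares the user from fixing them one at a time.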

## Extracting nested fields

In [10]:
Quux = make_tohu_items_class("Quux", ["aa", "bb", "cc"])
Foobar = make_tohu_items_class("Foobar", ["xx", "yy"])
Barbaz = make_tohu_items_class("Barbaz", ["rr", "ss"])

items = [
    Quux(111, "foo", Foobar(Barbaz("AAA", True), 10)),
    Quux(222, "bar", Foobar(Barbaz("BBB", False), 20)),
    Quux(333, "baz", Foobar(Barbaz("CCC", True), 30)),
    Quux(444, "quux", Foobar(Barbaz("DDD", True), 40)),
]

items

[Quux(aa=111, bb='foo', cc=Foobar(xx=Barbaz(rr='AAA', ss=True), yy=10)),
 Quux(aa=222, bb='bar', cc=Foobar(xx=Barbaz(rr='BBB', ss=False), yy=20)),
 Quux(aa=333, bb='baz', cc=Foobar(xx=Barbaz(rr='CCC', ss=True), yy=30)),
 Quux(aa=444, bb='quux', cc=Foobar(xx=Barbaz(rr='DDD', ss=True), yy=40))]

*TODO:* Ensure we can deal with nested fields; also ensure we raise an error if nested fields have the wrong names!

In [11]:
#fs = FieldSelector({"Column X": "cc.xx", "Column Y": "cc.yy", "Column Z": "aa"})

In [12]:
fs = FieldSelectorNEW3b(Quux, fields_to_extract=["cc.xx.rr", "cc.yy", "aa"], new_field_names=["column_1", "column_2", "column_3"])

In [13]:
list(fs(items))

[Quux(column_1='AAA', column_2=10, column_3=111),
 Quux(column_1='BBB', column_2=20, column_3=222),
 Quux(column_1='CCC', column_2=30, column_3=333),
 Quux(column_1='DDD', column_2=40, column_3=444)]
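
Dotted paths such as `"cc.xx.rr"` map naturally onto `operator.attrgetter` from the standard library, which resolves each dot as a further attribute lookup. The sketch below illustrates that idea with `namedtuple` classes standing in for tohu items (an assumption); it does not claim to show how tohu resolves nested fields internally.

```python
from collections import namedtuple
from operator import attrgetter

# Stand-ins for the tohu item classes used above (assumption).
Quux = namedtuple("Quux", ["aa", "bb", "cc"])
Foobar = namedtuple("Foobar", ["xx", "yy"])
Barbaz = namedtuple("Barbaz", ["rr", "ss"])

item = Quux(111, "foo", Foobar(Barbaz("AAA", True), 10))

# A single dotted path returns a single value.
value = attrgetter("cc.xx.rr")(item)

# Several paths return a tuple, in the order the paths were given.
row = attrgetter("cc.xx.rr", "cc.yy", "aa")(item)

# A wrong nested name surfaces as an AttributeError at lookup time.
try:
    attrgetter("cc.zz")(item)
except AttributeError:
    nested_name_rejected = True
else:
    nested_name_rejected = False
```

Note that `attrgetter` only fails when it is applied to an item, so validating nested names at construction time, as the TODO above asks for, would need a separate check.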