# TidyData: Horizontal Conditions

When extracting data, some columns take values that depend on the values of other columns in the same row. In datachef we handle this scenario with what we call `HorizontalConditions`.

## Source Data

The data source we're using for these examples is shown below:

The [full data source can be viewed here](https://raw.githubusercontent.com/mikeAdamss/datachef/main/tests/fixtures/csv/bands-wide.csv).

from datachef import acquire, preview, CsvSelectable

table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/datachef/main/tests/fixtures/csv/bands-wide.csv")
preview(table)

0,1,2,3,4,5,6,7,8,9,10,11
,A,B,C,D,E,F,G,H,I,J,K
1.0,,,,,,,,,,,
2.0,,,Houses,Cars,Boats,,,,Houses,Cars,Boats
3.0,Beatles,,,,,,Rolling Stones,,,,
4.0,,John,1,5,9,,,Keith,2,6,10
5.0,,Paul,2,6,10,,,Mick,3,7,11
6.0,,George,2,7,11,,,Charlie,3,8,12
7.0,,Ringo,4,8,12,,,Ronnie,5,9,13
8.0,,,,,,,,,,,


## Syntax

The basic syntax for constructing a horizontal conditional is as follows:

```
Column.horizontal_condition(<name>, <callable>)
```

The `<callable>` is a Python function or lambda that takes a dictionary as its argument.

### The Horizontal Condition Dictionary

The dictionary in question consists of:

- keys: the names of the extracted columns
- values: the values extracted **against the same observation the horizontal condition is operating against**.

---

Example:

Let's imagine you run a transform that creates the following line of tidy data.

| Observation | Member | Assets | Band    |
| ------------ | ------ | ------ | ------- |
| 5            | John   | Cars   | Beatles |

If you were to add a horizontal condition to the `TidyData` constructor, then **for that specific observation** the dictionary accessible to the horizontal condition would be:

```
{
    "Member": "John",
    "Assets": "Cars",
    "Band": "Beatles"
}
```
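
To make the shape of that dictionary concrete, here is a minimal sketch that calls a condition function on it directly, outside of datachef. The function `describe_asset` is a hypothetical name used only for illustration; it is not part of the library.

```python
from typing import Dict

# The dictionary a horizontal condition would receive for the row above.
line: Dict[str, str] = {"Member": "John", "Assets": "Cars", "Band": "Beatles"}


def describe_asset(line: Dict[str, str]) -> str:
    """Hypothetical condition: build a value from other extracted columns."""
    return f'{line["Member"]} ({line["Band"]}): {line["Assets"]}'


result = describe_asset(line)
print(result)  # John (Beatles): Cars
```

The function only ever sees the values for one observation, which is all a horizontal condition has to work with.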

This will be shown in context in the examples below.

## Condition: Singer In The Beatles

For our first example, we'll create a horizontal condition column to identify which band members are singers in the Beatles.

You've seen this example before, so mainly focus on the new `Column.horizontal_condition()` syntax.

from typing import Dict
from datachef import acquire, preview, CsvSelectable, filters, TidyData, Column, right, below

table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/datachef/main/tests/fixtures/csv/bands-wide.csv")

def is_beatles_singer(line: Dict[str, str]) -> str:
    """
    Returns "True" as a string if the member is John or Paul and the band
    is the Beatles, else "False" as a string.
    """
    if line["Member"] in ["John", "Paul"] and line["Band"] == "Beatles":
        return "True"
    return "False"

observations = table.filter(filters.is_numeric).label_as("Observation")
bands = (table.excel_ref("A3") | table.excel_ref("G3")).label_as("Band")
assets = table.excel_ref('2').is_not_blank().label_as("Asset")
members = (table.excel_ref("B") | table.excel_ref("H")).is_not_blank().label_as("Member")
preview(observations, bands, assets, members)

tidy_data = TidyData(
    observations,
    Column(bands.finds_observations_closest(right)),
    Column(assets.finds_observations_directly(below)),
    Column(members.finds_observations_directly(right)),
    Column.horizontal_condition("Is a Beatles Singer", is_beatles_singer)
)
print(tidy_data)

0
Observation
Band
Asset
Member

0,1,2,3,4,5,6,7,8,9,10,11
,A,B,C,D,E,F,G,H,I,J,K
1.0,,,,,,,,,,,
2.0,,,Houses,Cars,Boats,,,,Houses,Cars,Boats
3.0,Beatles,,,,,,Rolling Stones,,,,
4.0,,John,1,5,9,,,Keith,2,6,10
5.0,,Paul,2,6,10,,,Mick,3,7,11
6.0,,George,2,7,11,,,Charlie,3,8,12
7.0,,Ringo,4,8,12,,,Ronnie,5,9,13
8.0,,,,,,,,,,,


Observation,Band,Asset,Member,Is a Beatles Singer
1,Beatles,Houses,John,True
5,Beatles,Cars,John,True
9,Beatles,Boats,John,True
2,Rolling Stones,Houses,Keith,False
6,Rolling Stones,Cars,Keith,False
10,Rolling Stones,Boats,Keith,False
2,Beatles,Houses,Paul,True
6,Beatles,Cars,Paul,True
10,Beatles,Boats,Paul,True
3,Rolling Stones,Houses,Mick,False
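
Since the callable can also be a lambda, the same check could be written inline. The sketch below is an illustration only: the name `is_beatles_singer_lambda` is hypothetical, and the sample dictionaries are hand-written stand-ins for what datachef would pass in.

```python
from typing import Callable, Dict

# Lambda form of the same check. str() of a bool gives "True" or "False",
# matching the strings returned by is_beatles_singer above.
is_beatles_singer_lambda: Callable[[Dict[str, str]], str] = lambda line: str(
    line["Member"] in ("John", "Paul") and line["Band"] == "Beatles"
)

# With datachef this would be passed in place of the named function:
#   Column.horizontal_condition("Is a Beatles Singer", is_beatles_singer_lambda)

# Hand-written stand-ins for two tidy rows.
john = {"Band": "Beatles", "Asset": "Houses", "Member": "John"}
keith = {"Band": "Rolling Stones", "Asset": "Houses", "Member": "Keith"}

print(is_beatles_singer_lambda(john))   # True
print(is_beatles_singer_lambda(keith))  # False
```

A named function is usually easier to read and document once the condition grows beyond a single expression.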





## A Note On Design

There is not (and never will be) any version of a "vertical condition" in datachef.

When working with datasets of unknown length, we try to always support (or at least leave the door open to) data streaming. A horizontal condition meets that criterion: it works within the context of a single "tidy row" of "tidy data", so rows can be iterated, streamed or potentially even distributed for processing relatively easily. None of that works if row 5 is informed by row 4, and so on.
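
As a rough illustration of why row-local conditions stream well, the sketch below applies a condition to rows one at a time with a generator. This is plain Python, not datachef's internal implementation; `add_condition` and the sample rows are assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator


def is_beatles_singer(line: Dict[str, str]) -> str:
    """Same check as the earlier example; it only looks at one row."""
    if line["Member"] in ["John", "Paul"] and line["Band"] == "Beatles":
        return "True"
    return "False"


def add_condition(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Hypothetical helper: yield each row with the condition column added.

    No row is held in memory after it has been yielded, because the
    condition never needs to see any other row.
    """
    for row in rows:
        yield {**row, "Is a Beatles Singer": is_beatles_singer(row)}


# Stand-in for a stream of tidy rows of unknown length.
rows = iter(
    [
        {"Band": "Beatles", "Asset": "Houses", "Member": "John"},
        {"Band": "Rolling Stones", "Asset": "Houses", "Member": "Keith"},
    ]
)

for tidy_row in add_condition(rows):
    print(tidy_row)
```

A condition that needed the previous row would force the helper to buffer or reorder rows, which is exactly what this design avoids.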

If you find yourself needing to look up vertically into your extracted columns:

- (a) Stop and think about it, is there another way to skin this cat?
- (b) Are you really trying to create tidy data or are you trying to create something else?
- (c) If you need to, then just shunt it into pandas for post-processing. You're working on a data series, and that's not a paradigm datachef was built to support.
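
For option (c), a minimal sketch of a vertical calculation done in pandas might look like the following. The DataFrame is built by hand from a few rows of the output shown earlier; how you get your tidy data into pandas is not covered here, and the "Running Total" column is purely an invented example.

```python
import pandas as pd

# Hand-built stand-in for extracted tidy data (values from the output above).
df = pd.DataFrame(
    {
        "Observation": [1, 5, 9, 2, 6, 10],
        "Band": ["Beatles"] * 3 + ["Rolling Stones"] * 3,
        "Asset": ["Houses", "Cars", "Boats"] * 2,
        "Member": ["John"] * 3 + ["Keith"] * 3,
    }
)

# A "vertical" calculation: each value depends on the rows before it
# within the same member, which a horizontal condition cannot express.
df["Running Total"] = df.groupby("Member")["Observation"].cumsum()

print(df)
```

Because the running total depends on earlier rows, it needs the whole series in memory, which is the trade-off the note above describes.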