# Students

This example uses fictional student data to showcase how cell formatting is often used to represent hierarchical relationships in tabulated data sources.

_Note - the data used here is fictional, but the structure (and formatting) is not: it was taken from a real UK government data source._

First - this is how the data looks.


In [1]:
from tidychef import acquire, preview

table = acquire.xlsx.local("/Users/michael.adams/Code/tidychef/tests/fixtures/xlsx/Students.xlsx")
preview(table)

0,1,2,3,4,5
,A,B,C,D,E
1.0,Student count by location,,,,
2.0,Note - data is entirely fictional for technical example,,,,
3.0,,May-25,Jun-25,Jul-25,
4.0,London,100,200,150,
5.0,Inner,80,130,120,
6.0,Camden,20,30,40,
7.0,Greenwitch,30,50,50,
8.0,Hackney,30,50,30,
9.0,Outer,20,70,30,


There is an obvious hierarchy here that is denoted only by the use of bold text and cell indentation.

# Requirements

To keep this simple, we're going to extract three levels of location:

- Area (London or Cardiff)
- Sub Area (Inner or Outer)
- Place - the actual location 
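
The three levels above can be told apart purely by formatting. The following is a minimal sketch of that decision rule in plain Python, not tidychef code: the `Cell` class, the `classify` helper and the formatting flags given to each sample cell are all assumptions made for illustration (the text preview does not show which cells are bold, indented or underlined).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    """Hypothetical stand-in for a column A cell and its formatting flags."""
    value: str
    bold: bool = False
    indented: bool = False
    underlined: bool = False


def classify(cell: Cell) -> Optional[str]:
    """Return the hierarchy level implied by a cell's formatting, if any."""
    if not cell.value:
        return None
    if cell.bold and not cell.indented and not cell.underlined:
        return "Area"
    if cell.bold and cell.indented:
        return "Sub Area"
    if not cell.bold and cell.indented:
        return "Place"
    return None


# Formatting flags here are assumed for illustration only.
cells = [
    Cell("Student count by location", bold=True, underlined=True),
    Cell("London", bold=True),
    Cell("Inner", bold=True, indented=True),
    Cell("Camden", indented=True),
]
levels = {cell.value: classify(cell) for cell in cells}
print(levels)
```

Under these assumed flags, a bold and underlined title cell falls outside all three levels, which is why the rule for Area excludes underlined cells.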

In [2]:
from tidychef import acquire, preview
from tidychef.direction import right, up, left, down
from tidychef.output import Column, TidyData

table = acquire.xlsx.local("/Users/michael.adams/Code/tidychef/tests/fixtures/xlsx/Students.xlsx")

# Area is any bold cell in column A that's neither indented nor underlined
area = table.excel_ref("A").is_bold().is_not_indented().is_not_underline().label_as("Area")

# Sub Area is any bold cell in column A that is indented, plus the Area cells
# themselves so that area-level rows are labelled with their own area
sub_area = (table.excel_ref("A").is_bold().is_indented() | area).label_as("Sub Area")

# Place is any non blank cell in column A that is indented but is NOT bold,
# plus the Sub Area cells so that higher-level rows are labelled with themselves
place = (table.excel_ref("A").is_not_blank().is_not_bold().is_indented() | sub_area).label_as("Place")

# Get the period with a simple string selection
period = table.cell_containing_string("May-25").expand(right).is_not_blank().label_as("Period")

# Values are numbers that are beneath periods
values = period.fill(down).is_not_blank().label_as("Values")

# Create selection preview
preview(area, sub_area, place, period, values)

# Now we define the visual relationships between our selections to create tidy data
tidy_data = TidyData(
    values,
    Column(area.attach_closest(down)),
    Column(sub_area.attach_closest(down)),
    Column(place.attach_closest(down)),
    Column(period.attach_directly(down))
)

tidy_data.to_csv("students.csv")

tidy_data

Selection key: Area, Sub Area, Place, Period, Values

Overlapping selections:

- Area + Place + Sub Area (2 cells)
- Place + Sub Area (4 cells)

0,1,2,3,4,5
,A,B,C,D,E
1.0,Student count by location,,,,
2.0,Note - data is entirely fictional for technical example,,,,
3.0,,May-25,Jun-25,Jul-25,
4.0,London,100,200,150,
5.0,Inner,80,130,120,
6.0,Camden,20,30,40,
7.0,Greenwitch,30,50,50,
8.0,Hackney,30,50,30,
9.0,Outer,20,70,30,


0,1,2,3,4
Values,Area,Sub Area,Place,Period
100,London,London,London,May-25
80,London,Inner,Inner,May-25
20,London,Inner,Camden,May-25
30,London,Inner,Greenwitch,May-25
30,London,Inner,Hackney,May-25
20,London,Outer,Outer,May-25
8,London,Outer,Brent,May-25
12,London,Outer,Bromley,May-25
130,Cardiff,Cardiff,Cardiff,May-25
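
The repeated labels in rows such as `London,London,London` follow from the selections: Sub Area includes the Area cells and Place includes the Sub Area cells, so each row picks up the nearest label at or above it in every column. The sketch below reproduces the first six output rows with plain Python. It is an illustration of a "closest label above" lookup, not tidychef's implementation; `closest_above` is a hypothetical helper, and the row numbers and labels are copied from the preview above.

```python
def closest_above(row, labelled_rows):
    """Return the label on the nearest labelled row at or above `row`."""
    candidates = [r for r in labelled_rows if r <= row]
    return labelled_rows[max(candidates)]


# Row number -> label, copied from column A of the preview.
area = {4: "London"}
# Sub Area also contains the Area cells.
sub_area = {4: "London", 5: "Inner", 9: "Outer"}
# Place also contains the Sub Area (and therefore Area) cells.
place = {
    4: "London", 5: "Inner", 6: "Camden",
    7: "Greenwitch", 8: "Hackney", 9: "Outer",
}

rows = [
    (closest_above(r, area), closest_above(r, sub_area), closest_above(r, place))
    for r in range(4, 10)
]
for row in rows:
    print(",".join(row))
```

If the hierarchy columns should stay empty on aggregate rows instead, one option would be to drop the `| area` and `| sub_area` unions from the selections, at the cost of those rows inheriting whichever label sits closest above them.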


