# Preview

For these examples we're going to [use table two of this sample xlsx file](https://github.com/mikeAdamss/datachef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx) as shown below, and use the `bounded=` keyword to keep the previews small.

| <span style="color:green">Note - `bounded=` is useful for presentational purposes such as this, but should generally be avoided when writing processing scripts, as it's possible to hide data that you might need to know about.</span>|
|-----------------------------------------|

In [1]:
from typing import List
from datachef import acquire, preview, XlsxSelectable

tables: List[XlsxSelectable] = acquire.xlsx.http("https://github.com/mikeAdamss/datachef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx")
preview(tables[3], bounded="A1:G11")

0,1,2,3,4,5,6,7
,A,B,C,D,E,F,G
1.0,"Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector",,,,,,
2.0,This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance.,,,,,,
3.0,Source: Construction Output and Employment from the Office for National Statistics,,,,,,
4.0,2019=100,,,,,,
5.0,Time period,Public new housing,Private new housing,Total new housing,Infrastructure new work,Public other new work,Private industrial new work
6.0,Dataset identifier code,MV36,MV37,MVL7,MV38,MV39,MV3A
7.0,1997,30.8,44.8,42.6,61.2,57.6,152.1
8.0,1998,24.9,45.3,42,59.5,60.7,155
9.0,1999,21.6,40.7,37.7,57.9,68.3,159.9


## Selection & Preview

We're going to make use of two selection methods now as follows.

* `.excel_ref()` - use Excel cell references to explicitly select a range of cells
* `.label_as()` - give a user-friendly label to a selection of cells

We're going to start by making and previewing some simple selections with `.excel_ref()`

In [2]:
from typing import List
from datachef import acquire, preview, XlsxSelectable

tables: List[XlsxSelectable] = acquire.xlsx.http("https://github.com/mikeAdamss/datachef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx")

# We're only working on 1 table here
table = tables[3]

# Create our selections
time = table.excel_ref("A7:A11")
housing = table.excel_ref("B5:G5")
data_identifier_code = table.excel_ref("B6:G6")

# Note on multiple selections
# - Any selections for previewing are just passed as positional arguments to preview().
# - You don't need to pass in a blank selection; that is only necessary where no selections have been made.
preview(time, housing, data_identifier_code, bounded="A1:G11")

0
Unnamed Selection: 0
Unnamed Selection: 1
Unnamed Selection: 2

0,1,2,3,4,5,6,7
,A,B,C,D,E,F,G
1.0,"Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector",,,,,,
2.0,This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance.,,,,,,
3.0,Source: Construction Output and Employment from the Office for National Statistics,,,,,,
4.0,2019=100,,,,,,
5.0,Time period,Public new housing,Private new housing,Total new housing,Infrastructure new work,Public other new work,Private industrial new work
6.0,Dataset identifier code,MV36,MV37,MVL7,MV38,MV39,MV3A
7.0,1997,30.8,44.8,42.6,61.2,57.6,152.1
8.0,1998,24.9,45.3,42,59.5,60.7,155
9.0,1999,21.6,40.7,37.7,57.9,68.3,159.9
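
As a rough mental model of what a range reference such as `A7:A11` or `B5:G5` covers, the following library-independent sketch expands a reference into its individual cells. The helper names (`expand_excel_ref`, `column_to_number`, `number_to_column`) are hypothetical and are not part of datachef; this is only an illustration of the notation, not a description of how `.excel_ref()` is implemented.

```python
import re
from typing import List


def column_to_number(letters: str) -> int:
    """Convert Excel column letters to a 1-based number (A -> 1, Z -> 26, AA -> 27)."""
    number = 0
    for char in letters:
        number = number * 26 + (ord(char) - ord("A") + 1)
    return number


def number_to_column(number: int) -> str:
    """Convert a 1-based column number back to Excel column letters."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def expand_excel_ref(ref: str) -> List[str]:
    """List every cell covered by a range reference, row by row."""
    match = re.fullmatch(r"([A-Z]+)(\d+):([A-Z]+)(\d+)", ref.upper())
    if match is None:
        raise ValueError(f"Not a range reference: {ref!r}")
    first_col, first_row, last_col, last_row = match.groups()
    cells = []
    for row in range(int(first_row), int(last_row) + 1):
        for col in range(column_to_number(first_col), column_to_number(last_col) + 1):
            cells.append(f"{number_to_column(col)}{row}")
    return cells


print(expand_excel_ref("A7:A11"))  # a single column of five cells
print(expand_excel_ref("B5:G5"))   # a single row of six cells
```

So the `time` selection above is a vertical run of cells in column A, while `housing` and `data_identifier_code` are horizontal runs across columns B to G.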


### ...and beware the gotcha!

This makes a good example of why you need to be careful with the `bounded=` keyword.

As shown below, it can hide information you may need to know about. To make this clear, let's extend the last preview by another two columns and three rows.

In [3]:
preview(time, housing, data_identifier_code, bounded="A1:I14")

0
Unnamed Selection: 0
Unnamed Selection: 1
Unnamed Selection: 2

0,1,2,3,4,5,6,7,8,9
,A,B,C,D,E,F,G,H,I
1.0,"Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector",,,,,,,,
2.0,This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance.,,,,,,,,
3.0,Source: Construction Output and Employment from the Office for National Statistics,,,,,,,,
4.0,2019=100,,,,,,,,
5.0,Time period,Public new housing,Private new housing,Total new housing,Infrastructure new work,Public other new work,Private industrial new work,Private commercial new work,All new work
6.0,Dataset identifier code,MV36,MV37,MVL7,MV38,MV39,MV3A,MV3B,MV3C
7.0,1997,30.8,44.8,42.6,61.2,57.6,152.1,84.3,63.5
8.0,1998,24.9,45.3,42,59.5,60.7,155,91.4,65.2
9.0,1999,21.6,40.7,37.7,57.9,68.3,159.9,102.3,67.2
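
The effect above can be reproduced without the library. The sketch below takes the header row from the wider preview and crops it to columns A to G, as the earlier bound did. `crop_columns` is a hypothetical helper written for this illustration only; it is not a datachef function and makes no claim about how `bounded=` works internally.

```python
from typing import List

# Row 5 of the sheet as shown in the wider preview (columns A to I).
header_row: List[str] = [
    "Time period",
    "Public new housing",
    "Private new housing",
    "Total new housing",
    "Infrastructure new work",
    "Public other new work",
    "Private industrial new work",
    "Private commercial new work",
    "All new work",
]


def crop_columns(row: List[str], first: int, last: int) -> List[str]:
    """Return the cells from zero-based column `first` to `last`, inclusive."""
    return row[first:last + 1]


# A bound ending at column G (zero-based index 6) shows seven of the nine cells.
visible = crop_columns(header_row, 0, 6)
hidden = header_row[len(visible):]

print("Last visible header:", visible[-1])
print("Hidden headers:", hidden)
```

A selection such as `B5:G5` looks complete inside the cropped view, yet it misses the two headers that sit in columns H and I.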


From here we're going to continue using `bounded=` for reasons of practicality; just be aware of this gotcha in your own scripts.

## Labelling Selections

Let's do a similar thing, but this time we'll use `.label_as()` to give our cell selections some semantic meaning.

In [4]:
from typing import List
from datachef import acquire, preview, XlsxSelectable

tables: List[XlsxSelectable] = acquire.xlsx.http("https://github.com/mikeAdamss/datachef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx")
table = tables[3]

# Create our selections
time = table.excel_ref("A7:A11").label_as("Time")
housing = table.excel_ref("B5:G5").label_as("Housing")
data_identifier_code = table.excel_ref("B6:G6").label_as("Data Identifier Code")

preview(time, housing, data_identifier_code, bounded="A1:G11")

0
Time
Housing
Data Identifier Code

0,1,2,3,4,5,6,7
,A,B,C,D,E,F,G
1.0,"Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector",,,,,,
2.0,This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance.,,,,,,
3.0,Source: Construction Output and Employment from the Office for National Statistics,,,,,,
4.0,2019=100,,,,,,
5.0,Time period,Public new housing,Private new housing,Total new housing,Infrastructure new work,Public other new work,Private industrial new work
6.0,Dataset identifier code,MV36,MV37,MVL7,MV38,MV39,MV3A
7.0,1997,30.8,44.8,42.6,61.2,57.6,152.1
8.0,1998,24.9,45.3,42,59.5,60.7,155
9.0,1999,21.6,40.7,37.7,57.9,68.3,159.9


# Excel notations

As previously mentioned, the default behaviour for `preview()` is to show Excel-style column and row notations.

This is nearly always the practical choice when processing, but it can lead to some confusion when previewing your work (especially when previewing a non-Excel format).

As a nod to these scenarios, you can use the `with_excel_notations=` keyword to change this behaviour, as per the following example.

In [5]:
preview(time, housing, data_identifier_code, bounded="A1:G14", with_excel_notations=False)

0
Time
Housing
Data Identifier Code

0,1,2,3,4,5,6
"Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector",,,,,,
This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance.,,,,,,
Source: Construction Output and Employment from the Office for National Statistics,,,,,,
2019=100,,,,,,
Time period,Public new housing,Private new housing,Total new housing,Infrastructure new work,Public other new work,Private industrial new work
Dataset identifier code,MV36,MV37,MVL7,MV38,MV39,MV3A
1997,30.8,44.8,42.6,61.2,57.6,152.1
1998,24.9,45.3,42,59.5,60.7,155
1999,21.6,40.7,37.7,57.9,68.3,159.9
2000,27.1,45.5,42.6,54.3,64.7,142.7


# Alternative Bounded Arguments

Passing in an Excel reference for `bounded=` somewhat defeats the idea of using `with_excel_notations=False`, so you can also pass in a dictionary of x/y positional offsets should you need to, as per the following example.

| <span style="color:green">To reiterate: the expected, encouraged and principally supported behaviour of datachef is to use Excel-style references for human readability. The following is again just an effort to support presentational edge cases, where referring to Excel while working with non-Excel data could confuse your audience.</span>|
|-----------------------------------------|

In [6]:
preview(time, housing, data_identifier_code, bounded={"start_xy": "0,0", "end_xy": "6,10"},
        with_excel_notations=False)

0
Time
Housing
Data Identifier Code

0,1,2,3,4,5,6
"Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector",,,,,,
This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance.,,,,,,
Source: Construction Output and Employment from the Office for National Statistics,,,,,,
2019=100,,,,,,
Time period,Public new housing,Private new housing,Total new housing,Infrastructure new work,Public other new work,Private industrial new work
Dataset identifier code,MV36,MV37,MVL7,MV38,MV39,MV3A
1997,30.8,44.8,42.6,61.2,57.6,152.1
1998,24.9,45.3,42,59.5,60.7,155
1999,21.6,40.7,37.7,57.9,68.3,159.9
2000,27.1,45.5,42.6,54.3,64.7,142.7
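
To relate the two forms of `bounded=` used above, here is a small sketch that converts an Excel range reference into a dictionary of the same shape as the one passed in the last cell. It assumes the offsets are zero-based and inclusive, with x as the column and y as the row, which is what the pairing of `A1` with `"0,0"` suggests; treat that as an assumption rather than documented behaviour. `excel_ref_to_xy_bounds` and `column_to_index` are hypothetical helpers, not part of datachef.

```python
import re
from typing import Dict


def column_to_index(letters: str) -> int:
    """Convert Excel column letters to a zero-based index (A -> 0, G -> 6, AA -> 26)."""
    number = 0
    for char in letters.upper():
        number = number * 26 + (ord(char) - ord("A") + 1)
    return number - 1


def excel_ref_to_xy_bounds(ref: str) -> Dict[str, str]:
    """Build a start_xy/end_xy dictionary from a range such as 'A1:G11'.

    Assumes zero-based, inclusive "x,y" offsets (x = column, y = row).
    """
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", ref)
    if match is None:
        raise ValueError(f"Not a range reference: {ref!r}")
    start_col, start_row, end_col, end_row = match.groups()
    return {
        "start_xy": f"{column_to_index(start_col)},{int(start_row) - 1}",
        "end_xy": f"{column_to_index(end_col)},{int(end_row) - 1}",
    }


print(excel_ref_to_xy_bounds("A1:G11"))
```

Under those assumptions, `A1:G11` corresponds to the `{"start_xy": "0,0", "end_xy": "6,10"}` dictionary used in the example above.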
