# Advanced Store Manipulations

In [None]:
import atoti as tt

session = tt.create_session()
store = session.read_csv("data/example.csv", keys=["ID"], store_name="MyStore")
cube = session.create_cube(store, "FirstCube")
cube.query()

## Store description

The length of a store is the number of rows it contains:

In [None]:
len(store)

The shape of the store gives its number of columns and rows:

In [None]:
store.shape

Displaying the store shows a description of its columns and their types:

In [None]:
store

## Sampling mode

When loading large datasets into a store, it is possible to sample the data. Three sampling modes are currently supported:

- `first_lines`: keeps only the given number of lines.
- `first_files`: keeps only the given number of files.
- `FULL`: loads everything.

Sampling the data keeps the project very responsive during the design phase. Once the design is finished, everything can be loaded by calling `session.load_all_data()`.

The sampling mode can be defined on the session for all the stores, or store by store.

In [None]:
from atoti.sampling import first_lines

sample_store = session.read_csv(
    "data/example.csv", keys=["ID"], store_name="sampled", sampling_mode=first_lines(5)
)
len(sample_store)
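
As a rough illustration of what keeping only the first lines means, independent of how atoti implements it, the sketch below reads at most a given number of data rows from a CSV source. The helper `sample_first_lines` and the inline data are hypothetical and exist only for this example.

```python
import csv
import io
from itertools import islice

# Hypothetical in-memory CSV standing in for a large file.
RAW_CSV = "ID,City,Quantity\n1,Paris,10\n2,London,20\n3,Berlin,30\n4,Rome,40\n"


def sample_first_lines(text, limit):
    """Return at most `limit` data rows; the header line is not counted."""
    reader = csv.DictReader(io.StringIO(text))
    # islice stops reading as soon as `limit` rows were produced;
    # a limit of None means no limit at all.
    return list(islice(reader, limit))


sampled_rows = sample_first_lines(RAW_CSV, 2)
all_rows = sample_first_lines(RAW_CSV, None)
print(len(sampled_rows), len(all_rows))
```

Because the reader is consumed lazily, the rows beyond the limit are never parsed, which is what makes this kind of sampling cheap on large inputs.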

In [None]:
# Do some design such as adding a cube, some measures...
cube = session.create_cube(sample_store, "Sampling")
cube.query()

In [None]:
session.load_all_data()
len(sample_store)

In [None]:
len(sample_store)
cube.query()

## Specify other types for the columns

atoti automatically detects the type of the columns. It is possible to bypass this behaviour and specify the types of some columns manually.

In [None]:
types = {
    "ID": tt.types.DOUBLE,
    "City": tt.types.STRING,
    "Quantity": tt.types.STRING,
    "Price": tt.types.FLOAT,
}

In [None]:
custom_store = session.read_csv(
    "data/example.csv", keys=["ID"], store_name="Custom", types=types
)
custom_store
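
The effect of overriding column types can be pictured with a plain-Python sketch: columns listed in the mapping are converted with the given type, and all other columns stay as text. This is only an analogy under that assumption, not atoti's type detection; `read_typed`, `COLUMN_TYPES`, and the sample data are hypothetical.

```python
import csv
import io

# Hypothetical data and type mapping; columns that are not listed stay strings.
RAW_CSV = "ID,City,Quantity,Price\n1,Paris,10,2.5\n2,London,20,3.0\n"
COLUMN_TYPES = {"ID": float, "Price": float}


def read_typed(text, column_types):
    """Parse CSV text, applying the converter given for each listed column."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append(
            {name: column_types.get(name, str)(value) for name, value in raw.items()}
        )
    return rows


typed_rows = read_typed(RAW_CSV, COLUMN_TYPES)
print(typed_rows[0])
```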

## Insert new rows

New records can be inserted into the store.
If a record has the same values in its key columns as an existing record, the existing record is overridden.

In [None]:
# New key
store.append((11, "2019-03-01", "Europe", "Germany", "Berlin", "yellow", 1000, 400))

# Existing key
store += (1, "2019-03-01", "Europe", "France", "Paris", "red", 2000, 600)

store.head()
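
The key-based override described above behaves like an upsert. A minimal sketch of that semantics, assuming a single key column stored as the first field of each row, could look like this. The `upsert` helper and the rows are hypothetical and do not use the atoti API.

```python
# Hypothetical miniature store: rows are tuples whose first field is the key.
rows_by_key = {}


def upsert(row):
    """Insert the row, replacing any existing row that has the same key."""
    rows_by_key[row[0]] = row


upsert((1, "France", "Paris", 1000))
upsert((11, "Germany", "Berlin", 400))  # new key: the row count grows
upsert((1, "France", "Paris", 2000))  # existing key: the row is replaced
print(len(rows_by_key), rows_by_key[1])
```

After the three calls the miniature store holds two rows, and the row with key `1` carries the values from the last call.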

## Append data to a store

A CSV file with the same structure as the initial source can be appended into an existing store:

In [None]:
store.load_csv("data/additional_example.csv")
store.head()

In [None]:
len(store)

## Join stores

Stores can be joined together using a mapping between columns. The mapping columns of the target store must be included in its key columns. If no mapping is specified, the columns with the same names are used.

In [None]:
capital_store = session.read_csv(
    "data/capitals.csv", keys=["Country name"], store_name="Capital"
)
capital_store.head()

In [None]:
store.join(capital_store, mapping={"Country": "Country name"})
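
One way to read the requirement that the mapped columns belong to the target store's keys is that each row of the base store then matches at most one row of the target store. The sketch below illustrates this lookup with plain dictionaries. The data and the `join_rows` helper are hypothetical, and leaving unmatched rows without extra columns is an assumption of this sketch rather than documented atoti behavior.

```python
# Hypothetical rows; "Country name" is the key column of the target store.
sales = [
    {"ID": 1, "Country": "France", "Quantity": 10},
    {"ID": 2, "Country": "Germany", "Quantity": 20},
    {"ID": 3, "Country": "Atlantis", "Quantity": 5},
]
capitals = {
    "France": {"Country name": "France", "Capital": "Paris"},
    "Germany": {"Country name": "Germany", "Capital": "Berlin"},
}
mapping = {"Country": "Country name"}


def join_rows(base_rows, target_by_key, mapping):
    """Enrich each base row with the target row found through the mapping."""
    # This sketch supports a single mapped column pair only.
    ((base_column, target_column),) = mapping.items()
    joined = []
    for row in base_rows:
        match = target_by_key.get(row[base_column], {})
        # Skip the target key column: it duplicates the base column.
        extra = {k: v for k, v in match.items() if k != target_column}
        joined.append({**row, **extra})
    return joined


joined_rows = join_rows(sales, capitals, mapping)
print(joined_rows[0])
```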

In [None]:
cube = session.create_cube(store, "Cube")

## See the store schema

It is possible to display the schema of the stores used by a cube, starting from its base store:

In [None]:
cube.schema

and the schema of all the stores of the session:

In [None]:
session.stores.schema

## Automatic scenario creation

If the data used to feed a store follows a specific directory structure, atoti can automatically split it into different source scenarios.
You can control this behavior by using the `load_csv` method on the store's `scenarios` attribute.

In [None]:
# Start by creating an empty store
scenario_store = session.create_store(
    {
        "ID": tt.types.INT,
        "Date": tt.types.LOCAL_DATE,
        "Continent": tt.types.STRING,
        "Country": tt.types.STRING,
        "Color": tt.types.STRING,
        "Quantity": tt.types.DOUBLE,
        "Price": tt.types.DOUBLE,
    },
    store_name="ScenarioStore",
)

scenario_store.scenarios.load_csv(
    "data/scenario_directory/", base_scenario_directory="base",
)

Because the provided directory has two subdirectories, `Base` and `Scenario1`, the `Base` scenario was fed with the data from the base subdirectory, and a new scenario was created from the data in the `Scenario1` subdirectory.

In [None]:
session.scenarios
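
The directory convention can be pictured as one scenario per subdirectory, with one subdirectory designated as the base. The sketch below only illustrates that idea with the standard library. `discover_scenarios`, the file names, and the rule that renames the base directory to `Base` are assumptions made for this example, not the behavior of `scenarios.load_csv`.

```python
import tempfile
from pathlib import Path


def discover_scenarios(root, base_directory):
    """Map scenario names to the CSV file names found in each subdirectory."""
    scenarios = {}
    for directory in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # The designated base directory feeds the scenario named "Base".
        name = "Base" if directory.name == base_directory else directory.name
        scenarios[name] = sorted(f.name for f in directory.glob("*.csv"))
    return scenarios


# Build a throwaway directory tree with hypothetical file names.
with tempfile.TemporaryDirectory() as root:
    for sub in ("base", "Scenario1"):
        (Path(root) / sub).mkdir()
        (Path(root) / sub / "data.csv").write_text("ID,Quantity\n1,10\n")
    found = discover_scenarios(root, base_directory="base")

print(found)
```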