# Preview of the data structures at each processing step in the creation of the synthetic standard survey school dashboard

Please note: The data structures will differ for the symbol survey and public dashboards.

For a **diagram and written summary** of the data processing for the synthetic dashboard, please see `data_guide.md`. This notebook accompanies that guide, providing a preview of the data columns and types at key points in the process.

In [None]:
# Import packages required to produce this notebook
import pandas as pd
from IPython.display import display

In [None]:
def describe_data(filepath):
    '''
    Describe the shape of the data, preview the first five rows, and print
    the name and type of every column

    Parameters:
    filepath : string
        Filepath of dataset to import and describe
    '''
    df = pd.read_csv(filepath)

    # Print shape of dataframe
    print(df.shape)

    # Preview first 5 rows of dataframe
    display(df.head())

    # Print the name and type of every column
    with pd.option_context('display.max_rows', None,
                           'display.max_columns', None):
        print(df.dtypes)

### Raw data

In [None]:
describe_data('data/survey_data/KailoBeeWellStandard_DATA_2023-11-06_1152.csv')

### Headings

The headings dataset contains only column headings; it has no rows of data.
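
As a minimal sketch of what this looks like in pandas, the example below reads a header-only CSV from memory. The column names are hypothetical and are not taken from the real `headings.csv`. A file like this loads as a dataframe with columns but zero rows, which is why the cell that follows prints the shape and lists the columns rather than previewing rows.

```python
import io

import pandas as pd

# Hypothetical header-only CSV held in memory (illustrative column names,
# not the real contents of headings.csv)
header_only = io.StringIO('pupil_id,school,year_group\n')
example = pd.read_csv(header_only)

# The dataframe has three columns but no rows
print(example.shape)

# The column names are still available
print(example.columns.tolist())
```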

In [None]:
head = pd.read_csv('data/survey_data/headings.csv')

# Print shape of dataframe
print(head.shape)

# Print all the columns in the dataframe
head.columns.tolist()

### Synthetic pupil dataset

In [None]:
describe_data('data/survey_data/synthetic_data_raw.csv')

### Aggregated dataset with scores and RAG ratings

In [None]:
describe_data('data/survey_data/aggregate_scores_rag.csv')

### Aggregated dataset with non-demographic question responses

In [None]:
describe_data('data/survey_data/aggregate_responses.csv')

### Aggregated dataset with overall counts

In [None]:
describe_data('data/survey_data/overall_counts.csv')

### Aggregated dataset with demographic question responses

In [None]:
describe_data('data/survey_data/aggregate_demographic.csv')