# Running Data Quality Checks with Great Expectations (GE) in a Notebook Environment (Databricks): A Summary

## What Is Great Expectations?

Great Expectations (GE) is an open-source Python library that helps assure and improve data quality through data validation, documentation, and profiling.


The following articles cover the basics of GE.

| #    | Article                                                      | Summary                       |
| ---- | ------------------------------------------------------------ | ----------------------------- |
| 1    | [Welcome](https://docs.greatexpectations.io/docs/)           | Overview                      |
| 2    | [Getting started with Great Expectations](https://docs.greatexpectations.io/docs/tutorials/getting_started/tutorial_overview) | Tutorial                      |
| 3    | [Glossary of Terms](https://docs.greatexpectations.io/docs/glossary) | Glossary                      |
| 4    | [Customize your deployment Great Expectations](https://docs.greatexpectations.io/docs/reference/customize_your_deployment) | Deployment considerations     |
| 5    | [Explore Expectations](https://greatexpectations.io/expectations/) | List of Expectations          |
| 6    | [Community Page • Great Expectations](https://greatexpectations.io/community) | Community resources           |
| 7    | [Case studies from Great Expectations](https://greatexpectations.io/case-studies/) | Case studies                  |



The basic workflow for GE is shown below. You define data quality conditions as Expectations, validate a data source against those definitions, and document the validation results.

1. Setup
2. Connect to data
3. Create Expectations
4. Validate data


The following data sources can be validated.

- Databases via SQLAlchemy
- Pandas DataFrame
- Spark DataFrame

The Expectations available differ by data source; the following document summarizes them.

- [Expectation implementations by backend | Great Expectations](https://docs.greatexpectations.io/docs/reference/expectations/implemented_expectations/)

GE can be used either through the CLI or from a notebook environment. When working in a notebook environment, the following documents are helpful.

-   [How to instantiate a Data Context without a yml file](https://docs.greatexpectations.io/docs/guides/setup/configuring_data_contexts/how_to_instantiate_a_data_context_without_a_yml_file/)
-   [How to quickly explore Expectations in a notebook | Great Expectations](https://docs.greatexpectations.io/docs/guides/miscellaneous/how_to_quickly_explore_expectations_in_a_notebook/)
-   [How to pass an in-memory DataFrame to a Checkpoint | Great Expectations](https://docs.greatexpectations.io/docs/guides/validation/checkpoints/how_to_pass_an_in_memory_dataframe_to_a_checkpoint/)


After validating data, you can not only build documentation but also configure [Actions](https://docs.greatexpectations.io/docs/terms/action) such as the following.

-   [How to trigger Email as a Validation Action](https://docs.greatexpectations.io/docs/guides/validation/validation_actions/how_to_trigger_email_as_a_validation_action)
-   [How to collect OpenLineage metadata using a Validation Action](https://docs.greatexpectations.io/docs/guides/validation/validation_actions/how_to_collect_openlineage_metadata_using_a_validation_action)
-   [How to trigger Opsgenie notifications as a Validation Action](https://docs.greatexpectations.io/docs/guides/validation/validation_actions/how_to_trigger_opsgenie_notifications_as_a_validation_action)
-   [How to trigger Slack notifications as a Validation Action](https://docs.greatexpectations.io/docs/guides/validation/validation_actions/how_to_trigger_slack_notifications_as_a_validation_action)
-   [How to update Data Docs after validating a Checkpoint](https://docs.greatexpectations.io/docs/guides/validation/validation_actions/how_to_update_data_docs_as_a_validation_action)

## Preparing to Use Great Expectations

### 1. Installing Great Expectations

In [0]:
%pip install great_expectations -q

### 2. Preparing the data (DataFrame)

In [0]:
schema = '''
`VendorID` INT,
`tpep_pickup_datetime` TIMESTAMP,
`tpep_dropoff_datetime` TIMESTAMP,
`passenger_count` INT,
`trip_distance` DOUBLE,
`RatecodeID` INT,
`store_and_fwd_flag` STRING,
`PULocationID` INT,
`DOLocationID` INT,
`payment_type` INT,
`fare_amount` DOUBLE,
`extra` DOUBLE,
`mta_tax` DOUBLE,
`tip_amount` DOUBLE,
`tolls_amount` DOUBLE,
`improvement_surcharge` DOUBLE,
`total_amount` DOUBLE,
`congestion_surcharge` DOUBLE
'''

src_files = [
    "/databricks-datasets/nyctaxi/tripdata/yellow/yellow_tripdata_2019-01.csv.gz",
#     "/databricks-datasets/nyctaxi/tripdata/yellow/yellow_tripdata_2019-02.csv.gz",
]

tgt_df = (
    spark
    .read
    .format("csv")
    .schema(schema)
    .option("header", "true")
    .option("inferSchema", "false")
    .load(src_files)
)

## Running a Basic Data Quality Validation

This section is based on the following article.

- [How to Use Great Expectations in Databricks | Great Expectations](https://docs.greatexpectations.io/docs/deployment_patterns/how_to_use_great_expectations_in_databricks/)

### Setting up Great Expectations

In [0]:
import datetime

from ruamel import yaml

import great_expectations as ge
from great_expectations.core.batch import RuntimeBatchRequest
from great_expectations.data_context import BaseDataContext
from great_expectations.data_context.types.base import (
    DataContextConfig,
    FilesystemStoreBackendDefaults,
)

In [0]:
# To persist validation results, place this directory under `/dbfs`
root_directory = "/tmp/great_expectations"
root_directory_in_spark_api = f"file:{root_directory}"
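
The comment in the cell above says that results must live under `/dbfs` to persist. As a minimal sketch, assuming the usual Databricks convention that the driver-local path `/dbfs/<path>` corresponds to `dbfs:/<path>` in Spark APIs and `dbutils`, a helper could derive both path forms from one setting. `resolve_root_directory` is a hypothetical name for illustration and is not part of Great Expectations or Databricks.

```python
from typing import Tuple


def resolve_root_directory(base: str, persist: bool) -> Tuple[str, str]:
    """Return (local_path, spark_api_path) for the GE root directory.

    Hypothetical helper for illustration only.
    """
    relative = base.strip("/")
    if persist:
        # Assumption: /dbfs/<path> on the driver maps to dbfs:/<path>.
        return f"/dbfs/{relative}", f"dbfs:/{relative}"
    # Driver-local (ephemeral) storage, as used in the cell above.
    return f"/{relative}", f"file:/{relative}"


ephemeral = resolve_root_directory("/tmp/great_expectations", persist=False)
persistent = resolve_root_directory("/tmp/great_expectations", persist=True)
print(ephemeral)
print(persistent)
```

With `persist=False` the helper reproduces the two values used in this notebook; with `persist=True` it yields a location that should survive cluster restarts under the stated assumption.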

In [0]:
# Initialize root_directory
dbutils.fs.rm(root_directory_in_spark_api, True)

try:
    # Check the directory
    display(dbutils.fs.ls(root_directory_in_spark_api))
except Exception:
    # dbutils.fs.ls raises when the path no longer exists
    print('Directory does not exist.')

In [0]:
# Define the Data Context, the entry point for using Great Expectations
# https://docs.greatexpectations.io/docs/terms/data_context/

# Define it without relying on great_expectations.yml
data_context_config = DataContextConfig(
    store_backend_defaults=FilesystemStoreBackendDefaults(
        root_directory=root_directory
    ),
)
context = BaseDataContext(project_config=data_context_config)

# Opt out of sharing anonymous usage statistics
# https://docs.greatexpectations.io/docs/reference/anonymous_usage_statistics/
context.anonymous_usage_statistics.enabled = False

In [0]:
# Check the directory
display(dbutils.fs.ls(root_directory_in_spark_api))

path,name,size
file:/tmp/great_expectations/uncommitted/,uncommitted/,4096
file:/tmp/great_expectations/profilers/,profilers/,4096
file:/tmp/great_expectations/expectations/,expectations/,4096
file:/tmp/great_expectations/checkpoints/,checkpoints/,4096


### Connecting to data

In [0]:
datasource_name = "taxi_datasource"
dataconnector_name = "databricks_df"
data_asset_name = "nyctaxi_tripdata_yellow_yellow_tripdata"
tgt_deploy_env = "prod"

In [0]:
datasource_config = {
    # Define the Datasource
    # https://docs.greatexpectations.io/docs/terms/datasource
    "name": datasource_name,
    "class_name": "Datasource",

    # Define the execution_engine
    # https://docs.greatexpectations.io/docs/terms/execution_engine/
    "execution_engine": {
        "module_name": "great_expectations.execution_engine",
        "class_name": "SparkDFExecutionEngine",
    },

    # Define the data connector
    # https://docs.greatexpectations.io/docs/terms/data_connector/
    "data_connectors": {
        dataconnector_name: {
            "module_name": "great_expectations.datasource.data_connector",
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": [
                "some_key_maybe_pipeline_stage",
                "some_other_key_maybe_run_id",
            ],
        }
    },
}
context.add_datasource(**datasource_config)

In [0]:
batch_request = RuntimeBatchRequest(
    datasource_name=datasource_name,
    data_connector_name=dataconnector_name,
    data_asset_name = data_asset_name,
    batch_identifiers={
        "some_key_maybe_pipeline_stage": tgt_deploy_env,
        "some_other_key_maybe_run_id": f"my_run_name_{datetime.date.today().strftime('%Y%m%d')}",
    },
    runtime_parameters={"batch_data": tgt_df},
)
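
The keys passed in `batch_identifiers` have to match the names listed under `batch_identifiers` in the data connector configuration. A minimal sketch of keeping that in one place is shown below; `build_batch_identifiers` is a hypothetical helper, and the fixed date is only there to make the example reproducible.

```python
import datetime


def build_batch_identifiers(deploy_env: str, run_date: datetime.date) -> dict:
    """Build the batch_identifiers dict used by the RuntimeBatchRequest.

    Hypothetical helper; the keys mirror the data connector configuration.
    """
    return {
        "some_key_maybe_pipeline_stage": deploy_env,
        "some_other_key_maybe_run_id": f"my_run_name_{run_date.strftime('%Y%m%d')}",
    }


identifiers = build_batch_identifiers("prod", datetime.date(2022, 8, 15))
print(identifiers)
```

In the notebook, `datetime.date.today()` would take the place of the fixed date.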

### Creating Expectations

In [0]:
expectation_suite_name = "nyctaxi_tripdata_yellow_yellow_tripdata"

In [0]:
context.create_expectation_suite(
    expectation_suite_name=expectation_suite_name,
    overwrite_existing=True,
)

In [0]:
# Confirm that a file was created under expectations
expectations_file_path = f'{root_directory_in_spark_api}/expectations/{expectation_suite_name}.json'
print(dbutils.fs.head(expectations_file_path))

In [0]:
# Validator
# https://docs.greatexpectations.io/docs/terms/validator
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name=expectation_suite_name,
)

# Do not run validation while adding expectations
validator.interactive_evaluation = False

In [0]:
# Add expectations
_ = validator.expect_column_values_to_not_be_null(
    column="passenger_count",
)

_ = validator.expect_column_values_to_be_between(
    column="congestion_surcharge",
    min_value=0,
    max_value=1000,
    meta={
        "notes": {
            "format": "markdown",
            "content": "Example notes about this expectation. **Markdown** `Supported`."
        }
    },
)

_ = validator.expect_column_values_to_be_between(
    column="passenger_count",
    min_value=0,
    max_value=1000,
)

In [0]:
# Save the expectations
validator.save_expectation_suite(discard_failed_expectations=False)

In [0]:
# Confirm that the expectations were written to the file
print(dbutils.fs.head(expectations_file_path))
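
The saved suite is a JSON document, so it can be inspected programmatically instead of being read by eye. The sketch below assumes the file contains an `expectations` list whose entries carry `expectation_type` and `kwargs`; the sample string is illustrative and not the actual file content, and `summarize_suite` is a hypothetical helper.

```python
import json

# Illustrative sample only; the real file may contain additional fields.
sample_suite_json = """
{
  "expectation_suite_name": "nyctaxi_tripdata_yellow_yellow_tripdata",
  "expectations": [
    {"expectation_type": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "passenger_count"}, "meta": {}},
    {"expectation_type": "expect_column_values_to_be_between",
     "kwargs": {"column": "congestion_surcharge", "min_value": 0, "max_value": 1000},
     "meta": {}}
  ]
}
"""


def summarize_suite(suite_json: str) -> list:
    """Return (expectation_type, column) pairs from a suite JSON string."""
    suite = json.loads(suite_json)
    return [
        (item["expectation_type"], item["kwargs"].get("column"))
        for item in suite.get("expectations", [])
    ]


summary = summarize_suite(sample_suite_json)
for expectation_type, column in summary:
    print(f"{column}: {expectation_type}")
```

In the notebook, the string returned by `dbutils.fs.head(expectations_file_path)` could be passed in, provided the file fits within the number of bytes that `head` returns.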

### Validating data

In [0]:
checkpoint_config_name = "nyctaxi_tripdata_yellow_yellow_tripdata__checkpoint"

In [0]:
# Define the checkpoint
checkpoint_config = {
    "name":checkpoint_config_name,
    "config_version": 1,
    "class_name": "SimpleCheckpoint",
    "expectation_suite_name": expectation_suite_name,
    "run_name_template": "%Y%m%d-%H%M%S-yctaxi_tripdata_yellow_yellow_tripdata",
}
context.add_checkpoint(**checkpoint_config)
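
The `run_name_template` value contains `strftime` directives. Assuming GE expands the template by calling `strftime` on the run time, which is consistent with the run names that appear in the outputs of this notebook, the resulting run name can be previewed with the standard library alone. The timestamp below is taken from the first run shown later in this notebook.

```python
from datetime import datetime, timezone

# Same template as in the checkpoint configuration above.
run_name_template = "%Y%m%d-%H%M%S-yctaxi_tripdata_yellow_yellow_tripdata"

# Run time taken from the first validation run in this notebook (UTC).
run_time = datetime(2022, 8, 15, 7, 18, 49, tzinfo=timezone.utc)

run_name = run_time.strftime(run_name_template)
print(run_name)
```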

In [0]:
# Confirm that a file was created under checkpoints
checkpoints_file_path = f'{root_directory_in_spark_api}/checkpoints/{checkpoint_config_name}.yml'
print(dbutils.fs.head(checkpoints_file_path))

In [0]:
checkpoint_result = context.run_checkpoint(
    checkpoint_name=checkpoint_config_name,
    validations=[
        {
            "batch_request": batch_request,
            "expectation_suite_name": expectation_suite_name,
        }
    ],
)

In [0]:
# Confirm that a directory was created under uncommitted/validations
checkpoints_file_path = f'{root_directory_in_spark_api}/uncommitted/validations/{expectation_suite_name}'
display(dbutils.fs.ls(checkpoints_file_path))

# Confirm that files and directories were created under uncommitted/data_docs/local_site
checkpoints_file_path = f'{root_directory_in_spark_api}/uncommitted/data_docs/local_site'
display(dbutils.fs.ls(checkpoints_file_path))

path,name,size
file:/tmp/great_expectations/uncommitted/validations/nyctaxi_tripdata_yellow_yellow_tripdata/20220815-071849-yctaxi_tripdata_yellow_yellow_tripdata/,20220815-071849-yctaxi_tripdata_yellow_yellow_tripdata/,4096


path,name,size
file:/tmp/great_expectations/uncommitted/data_docs/local_site/index.html,index.html,34479
file:/tmp/great_expectations/uncommitted/data_docs/local_site/expectations/,expectations/,4096
file:/tmp/great_expectations/uncommitted/data_docs/local_site/validations/,validations/,4096
file:/tmp/great_expectations/uncommitted/data_docs/local_site/static/,static/,4096


### Checking the validation results

In [0]:
# Show the quality check result
checkpoint_result["success"]

In [0]:
# Display the HTML file of the quality check results
first_validation_result_identifier = (
    checkpoint_result.list_validation_result_identifiers()[0]
)
first_run_result = checkpoint_result.run_results[first_validation_result_identifier]

docs_path = first_run_result['actions_results']['update_data_docs']['local_site']

html = dbutils.fs.head(docs_path)

displayHTML(html)

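| Item                      | Value |
| ------------------------- | ----- |
| Evaluated Expectations    | 3     |
| Successful Expectations   | 3     |
| Unsuccessful Expectations | 0     |
| Success Percent           | 100%  |

| Item                       | Value                                                  |
| -------------------------- | ------------------------------------------------------ |
| Great Expectations Version | 0.15.18                                                |
| Run Name                   | 20220815-071849-yctaxi_tripdata_yellow_yellow_tripdata |
| Run Time                   | 2022-08-15T07:18:49Z                                   |

| Item         | Value                   |
| ------------ | ----------------------- |
| ge_load_time | 20220815T071849.604061Z |

| Item            | Value                                   |
| --------------- | --------------------------------------- |
| batch_data      | SparkDataFrame                          |
| data_asset_name | nyctaxi_tripdata_yellow_yellow_tripdata |

Column `congestion_surcharge`:

| Status | Expectation                                                                | Observed Value |
| ------ | -------------------------------------------------------------------------- | -------------- |
|        | values must be greater than or equal to 0 and less than or equal to 1000. | 0% unexpected  |

Column `passenger_count`:

| Status | Expectation                                                                | Observed Value |
| ------ | -------------------------------------------------------------------------- | -------------- |
|        | values must never be null.                                                 | 100% not null  |
|        | values must be greater than or equal to 0 and less than or equal to 1000. | 0% unexpected  |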


## Verifying Behavior When There Are Quality Errors

In [0]:
# Add an expectation that will fail
validator.expect_column_values_to_not_be_null(
    column="congestion_surcharge",
)

validator.save_expectation_suite(discard_failed_expectations=False)

In [0]:
checkpoint_result = context.run_checkpoint(
    checkpoint_name=checkpoint_config_name,
    validations=[
        {
            "batch_request": batch_request,
            "expectation_suite_name": expectation_suite_name,
        }
    ],
)

In [0]:
# Show the quality check result
checkpoint_result["success"]
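
Reading `checkpoint_result["success"]` only returns a boolean; in a scheduled job you would typically want a failed validation to stop the run. A minimal sketch of such a gate follows. `DataQualityError` and `raise_on_failure` are hypothetical names, and plain dicts stand in for the checkpoint result so that the example runs without GE installed.

```python
class DataQualityError(Exception):
    """Hypothetical exception raised when a validation run fails."""


def raise_on_failure(result) -> None:
    """Raise DataQualityError unless result["success"] is truthy."""
    if not result["success"]:
        raise DataQualityError("Data quality validation failed.")


# Stand-ins for checkpoint_result (assumption: only "success" is read).
raise_on_failure({"success": True})  # passes silently

try:
    raise_on_failure({"success": False})
except DataQualityError as exc:
    caught_message = str(exc)
    print(caught_message)
```

In the notebook, calling `raise_on_failure(checkpoint_result)` after the checkpoint run would make the cell fail for the run in this section.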

In [0]:
# Display the HTML file of the quality check results
first_validation_result_identifier = (
    checkpoint_result.list_validation_result_identifiers()[0]
)
first_run_result = checkpoint_result.run_results[first_validation_result_identifier]

docs_path = first_run_result['actions_results']['update_data_docs']['local_site']

html = dbutils.fs.head(docs_path)

displayHTML(html)

| Item                      | Value |
| ------------------------- | ----- |
| Evaluated Expectations    | 4     |
| Successful Expectations   | 3     |
| Unsuccessful Expectations | 1     |
| Success Percent           | 75%   |

| Item                       | Value                                                  |
| -------------------------- | ------------------------------------------------------ |
| Great Expectations Version | 0.15.18                                                |
| Run Name                   | 20220815-071957-yctaxi_tripdata_yellow_yellow_tripdata |
| Run Time                   | 2022-08-15T07:19:57Z                                   |

| Item         | Value                   |
| ------------ | ----------------------- |
| ge_load_time | 20220815T071957.465970Z |

| Item            | Value                                   |
| --------------- | --------------------------------------- |
| batch_data      | SparkDataFrame                          |
| data_asset_name | nyctaxi_tripdata_yellow_yellow_tripdata |

Column `congestion_surcharge`:

| Status | Expectation                                                                | Observed Value   |
| ------ | -------------------------------------------------------------------------- | ---------------- |
|        | values must be greater than or equal to 0 and less than or equal to 1000. | 0% unexpected    |
|        | values must never be null. 4855978 unexpected values found. ≈63.33% of 7667792 total rows. Sampled unexpected values: null | ≈36.67% not null |

Column `passenger_count`:

| Status | Expectation                                                                | Observed Value |
| ------ | -------------------------------------------------------------------------- | -------------- |
|        | values must never be null.                                                 | 100% not null  |
|        | values must be greater than or equal to 0 and less than or equal to 1000. | 0% unexpected  |
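
The percentages reported for the failing not-null expectation follow directly from the two counts in the output. As a small worked check, using only the numbers shown above:

```python
# Counts copied from the validation output above.
total_rows = 7667792
unexpected_rows = 4855978  # rows where congestion_surcharge is null

unexpected_pct = round(100 * unexpected_rows / total_rows, 2)
not_null_pct = round(100 * (total_rows - unexpected_rows) / total_rows, 2)

print(f"unexpected: {unexpected_pct}%")
print(f"not null:   {not_null_pct}%")
```

This gives 63.33% unexpected and 36.67% not null, matching the rendered result.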


## Data Profiling

This section is based on the following article.

- [Great Expectation on Databricks. Run great_expectations on the hosted… | by Probhakar | Medium](https://probhakar-95.medium.com/great-expectation-on-databricks-8777042e00de)

In [0]:
from great_expectations.profile.basic_dataset_profiler import BasicDatasetProfiler
from great_expectations.dataset.sparkdf_dataset import  SparkDFDataset
from great_expectations.render.renderer import *
from great_expectations.render.view import DefaultJinjaPageView

In [0]:
basic_dataset_profiler = BasicDatasetProfiler()

In [0]:
# Create a GE wrapper around a Pandas DataFrame sampled from the Spark DataFrame
from great_expectations.dataset.pandas_dataset import PandasDataset
gdf = PandasDataset(
    tgt_df
    .limit(1000)
    .toPandas()
) 

The code below also works directly on a Spark DataFrame, but it has performance issues.

```python
from great_expectations.dataset.sparkdf_dataset import SparkDFDataset
gdf = SparkDFDataset(
    tgt_df
    .limit(1000)
) 

print(gdf.spark_df.count())
gdf.spark_df.display()
```

In [0]:
# Inspect the data
print(gdf.count())
gdf.head()

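|      | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge |
| ---- | -------- | -------------------- | --------------------- | --------------- | ------------- | ---------- | ------------------ | ------------ | ------------ | ------------ | ----------- | ----- | ------- | ---------- | ------------ | --------------------- | ------------ | -------------------- |
| 0    | 1        | 2019-01-01 00:46:40  | 2019-01-01 00:53:20   | 1               | 1.5           | 1          | N                  | 151          | 239          | 1            | 7.0         | 0.5   | 0.5     | 1.65       | 0.0          | 0.3                   | 9.95         |                      |
| 1    | 1        | 2019-01-01 00:59:47  | 2019-01-01 01:18:59   | 1               | 2.6           | 1          | N                  | 239          | 246          | 1            | 14.0        | 0.5   | 0.5     | 1.0        | 0.0          | 0.3                   | 16.3         |                      |
| 2    | 2        | 2018-12-21 13:48:30  | 2018-12-21 13:52:40   | 3               | 0.0           | 1          | N                  | 236          | 236          | 1            | 4.5         | 0.5   | 0.5     | 0.0        | 0.0          | 0.3                   | 5.8          |                      |
| 3    | 2        | 2018-11-28 15:52:25  | 2018-11-28 15:55:45   | 5               | 0.0           | 1          | N                  | 193          | 193          | 2            | 3.5         | 0.5   | 0.5     | 0.0        | 0.0          | 0.3                   | 7.55         |                      |
| 4    | 2        | 2018-11-28 15:56:57  | 2018-11-28 15:58:33   | 5               | 0.0           | 2          | N                  | 193          | 193          | 2            | 52.0        | 0.0   | 0.5     | 0.0        | 0.0          | 0.3                   | 55.55        |                      |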


In [0]:

from great_expectations.profile.basic_dataset_profiler import BasicDatasetProfiler

# Profile the data
expectation_suite, validation_result = gdf.profile(BasicDatasetProfiler)

In [0]:
from great_expectations.render.renderer import (
    ProfilingResultsPageRenderer,
    ExpectationSuitePageRenderer,
)
from great_expectations.render.view import DefaultJinjaPageView

profiling_result_document_content = ProfilingResultsPageRenderer().render(validation_result)
expectation_based_on_profiling_document_content = ExpectationSuitePageRenderer().render(expectation_suite)

In [0]:
profiling_result_document_content

In [0]:
# Generate HTML
profiling_result_HTML = DefaultJinjaPageView().render(profiling_result_document_content)  # returns an HTML string
expectation_based_on_profiling_HTML = DefaultJinjaPageView().render(expectation_based_on_profiling_document_content)

In [0]:
displayHTML(profiling_result_HTML)

| Item                                                             | Value |
| ---------------------------------------------------------------- | ----- |
| Number of variables                                              | 18    |
| Number of observations (`expect_table_row_count_to_be_between`)  | 1000  |
| Missing cells                                                    | 5.56% |

| Type     | Count |
| -------- | ----- |
| int      | 6     |
| float    | 9     |
| string   | 1     |
| datetime | 2     |
| bool     | 0     |
| unknown  | 0     |

Per-column results, listed in the order of the schema defined earlier. Distinct (n) comes from `expect_column_unique_value_count_to_be_between`, Distinct (%) from `expect_column_proportion_of_unique_values_to_be_between`, and Missing (n) and Missing (%) from `expect_column_values_to_not_be_null`.

| Column                | Distinct (n) | Distinct (%) | Missing (n) | Missing (%) |
| --------------------- | ------------ | ------------ | ----------- | ----------- |
| VendorID              | 2            | 0.2%         | 0           | 0.0%        |
| tpep_pickup_datetime  | 870          | 87.0%        | 0           | 0.0%        |
| tpep_dropoff_datetime | 896          | 89.6%        | 0           | 0.0%        |
| passenger_count       | 7            | 0.7%         | 0           | 0.0%        |
| trip_distance         | 365          | 36.5%        | 0           | 0.0%        |
| RatecodeID            | 3            | 0.3%         | 0           | 0.0%        |
| store_and_fwd_flag    | 2            | 0.2%         | 0           | 0.0%        |
| PULocationID          | 79           | 7.9%         | 0           | 0.0%        |
| DOLocationID          | 118          | 11.8%        | 0           | 0.0%        |
| payment_type          | 4            | 0.4%         | 0           | 0.0%        |
| fare_amount           | 84           | 8.4%         | 0           | 0.0%        |
| extra                 | 3            | 0.3%         | 0           | 0.0%        |
| mta_tax               | 2            | 0.2%         | 0           | 0.0%        |
| tip_amount            | 190          | 19.0%        | 0           | 0.0%        |
| tolls_amount          | 3            | 0.3%         | 0           | 0.0%        |
| improvement_surcharge | 2            | 0.2%         | 0           | 0.0%        |
| total_amount          | 300          | 30.0%        | 0           | 0.0%        |
| congestion_surcharge  | --           | --           | 1000        | 100.0%      |

The string column `store_and_fwd_flag` additionally reports Leading or trailing whitespace (n): 0. For the two datetime columns, Minimum (`expect_column_min_to_be_between`) and Maximum (`expect_column_max_to_be_between`) are rendered only as `.2f`.

Quantiles come from `expect_column_quantile_values_to_be_between` (the median also from `expect_column_median_to_be_between`), and Mean, Minimum, and Maximum from `expect_column_mean_to_be_between`, `expect_column_min_to_be_between`, and `expect_column_max_to_be_between`.

| Column        | 0.05 | Q1  | Median | Q3   | 0.95  | Mean  | Minimum | Maximum |
| ------------- | ---- | --- | ------ | ---- | ----- | ----- | ------- | ------- |
| trip_distance | 0.46 | 1.0 | 1.88   | 3.58 | 8.9   | 2.9   | 0.0     | 31.57   |
| PULocationID  | 48   | 113 | 161    | 236  | 262   | 163.7 | 4.0     | 264.0   |
| DOLocationID  | 41   | 100 | 162    | 236  | 263   | 161.6 | 4.0     | 265.0   |
| fare_amount   | 4.0  | 6.5 | 9.5    | 15.5 | 29.5  | 12.54 | -2.5    | 82.5    |
| tip_amount    | 0.0  | 0.0 | 1.26   | 2.46 | 5.35  | 1.71  | 0.0     | 25.0    |
| total_amount  | 6.3  | 8.8 | 12.36  | 19.1 | 37.55 | 15.7  | -3.8    | 83.8    |
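
The overall "Missing cells" figure of 5.56% can be reproduced from the profiling output: 18 variables, 1000 observations, and a single column reported as 100% missing (presumably `congestion_surcharge`, which is empty in the sampled rows). A short worked check, assuming every other column has no missing values as reported:

```python
# Figures copied from the profiling output.
n_columns = 18
n_rows = 1000
missing_cells = 1000  # one column with all 1000 values missing

total_cells = n_columns * n_rows
missing_pct = round(100 * missing_cells / total_cells, 2)

print(f"{missing_cells} / {total_cells} = {missing_pct}%")
```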


## Cleaning Up Resources

In [0]:
dbutils.fs.rm(root_directory_in_spark_api, True)