## Setting up Great Expectations

This initial section gets the context and then sets everything up so we can create validations for the data.

In [4]:
import great_expectations as gx

context = gx.get_context()


In [5]:
context.add_or_update_expectation_suite("hn_expectation_suite")


{
  "expectation_suite_name": "hn_expectation_suite",
  "ge_cloud_id": null,
  "expectations": [],
  "data_asset_type": null,
  "meta": {
    "great_expectations_version": "0.17.19"
  }
}

In [6]:
datasource = context.sources.add_or_update_pandas(name="hn_dataframe")


In [7]:
validator = context.sources.pandas_default.read_csv("stories.csv")


In [8]:
validator.expectation_suite_name = "hn_expectation_suite"


### Example: validate column names

Below is an example of how to set up a validation. Calling `validator.method` tells the validator to expect that condition. Run the cell and then move on!

In [9]:
columns = ["rank", "title", "site", "link", "score", "author", "comments_number"]
for col in columns:
    validator.expect_column_to_exist(col)


Calculating Metrics: 100%|██████████| 2/2 [00:00<00:00, 256.99it/s]
Calculating Metrics: 100%|██████████| 2/2 [00:00<00:00, 287.43it/s]
Calculating Metrics: 100%|██████████| 2/2 [00:00<00:00, 284.87it/s]
Calculating Metrics: 100%|██████████| 2/2 [00:00<00:00, 263.59it/s]
Calculating Metrics: 100%|██████████| 2/2 [00:00<00:00, 259.81it/s]
Calculating Metrics: 100%|██████████| 2/2 [00:00<00:00, 235.54it/s]
Calculating Metrics: 100%|██████████| 2/2 [00:00<00:00, 267.71it/s]
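To make the run-and-record idea concrete, here is a toy sketch in plain Python. `ToyValidator` is a hypothetical stand-in written for illustration, not the Great Expectations API; it assumes that each `expect_*` call both checks the data straight away and stores the expectation so it can be saved later.

```python
from dataclasses import dataclass, field


@dataclass
class ToyValidator:
    """Hypothetical stand-in for the run-and-record pattern (not library code)."""

    rows: list
    suite: list = field(default_factory=list)

    def expect_column_to_exist(self, column: str) -> dict:
        # Run the check immediately against the rows held by the validator...
        success = all(column in row for row in self.rows)
        # ...and record the expectation so it can be replayed or saved later.
        self.suite.append(
            {"expectation_type": "expect_column_to_exist", "kwargs": {"column": column}}
        )
        return {"success": success}


# Made-up single-row dataset, only for illustration.
toy = ToyValidator(rows=[{"rank": 1, "title": "Example story"}])
for col in ["rank", "title", "score"]:
    print(col, toy.expect_column_to_exist(col))
print(len(toy.suite), "expectations recorded")
```

The third call reports a failure because the made-up row has no `score` key, yet all three expectations end up in the toy suite.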


## Set up validations for each column

Now you need to set up the validations for all of the columns using the syntax


```python
validator.method
```

and finding the correct method to meet each requirement in the lists below!

### Validate rank

- Type of rank is int64
- Rank is unique
- Rank is not less than 1

In [10]:
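validator.expect_column_values_to_be_of_type("rank", "int64")
validator.expect_column_values_to_be_unique("rank")
# strict_min=True makes the lower bound exclusive (rank > 1), so a rank of exactly 1
# is reported as unexpected in the result below. Pass strict_min=False to allow rank 1.
validator.expect_column_values_to_be_between("rank", min_value=1, max_value=None, strict_min=True)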


Calculating Metrics: 100%|██████████| 1/1 [00:00<00:00, 217.76it/s]
Calculating Metrics: 100%|██████████| 8/8 [00:00<00:00, 188.16it/s] 
Calculating Metrics: 100%|██████████| 8/8 [00:00<00:00, 202.80it/s] 


{
  "success": false,
  "result": {
    "element_count": 29,
    "unexpected_count": 1,
    "unexpected_percent": 3.4482758620689653,
    "partial_unexpected_list": [
      1
    ],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 3.4482758620689653,
    "unexpected_percent_nonmissing": 3.4482758620689653
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
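The failed result above flags the value `1`. A plain-Python sketch of inclusive versus exclusive lower bounds shows why; `unexpected_values` is a hypothetical helper and the ranks are made up to match the 29 rows reported, so treat it as an illustration of the `strict_min` idea rather than library code.

```python
def unexpected_values(values, min_value, strict_min=False):
    """Return the values that violate the lower bound (hypothetical helper)."""
    if strict_min:
        # Exclusive bound: a value equal to min_value is a violation.
        return [v for v in values if not v > min_value]
    # Inclusive bound: a value equal to min_value is fine.
    return [v for v in values if not v >= min_value]


ranks = list(range(1, 30))  # made-up ranks 1..29

exclusive = unexpected_values(ranks, 1, strict_min=True)
inclusive = unexpected_values(ranks, 1, strict_min=False)

print(exclusive)  # → [1]
print(inclusive)  # → []
# Share of unexpected values in percent, comparable to "unexpected_percent".
print(100 * len(exclusive) / len(ranks))
```

With an exclusive bound, one value out of 29 is unexpected, which is roughly the 3.45% shown in the result.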

### Validate Title 

- Type of string
- Can't be null
- Lengths should be between 1 and 255

In [14]:
validator.expect_column_values_to_be_of_type("title", "str")
validator.expect_column_values_to_not_be_null("title")
validator.expect_column_value_lengths_to_be_between("title", 1, 255)



Calculating Metrics: 100%|██████████| 8/8 [00:00<00:00, 130.11it/s]

Calculating Metrics: 100%|██████████| 6/6 [00:00<00:00, 131.08it/s]

Calculating Metrics: 100%|██████████| 9/9 [00:00<00:00, 114.13it/s]


{
  "success": true,
  "result": {
    "element_count": 29,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

### Validate Site 

- Type of string
- Can't be null
- Should match the regex of a valid domain

In [15]:
validator.expect_column_values_to_be_of_type("site", "str")
validator.expect_column_values_to_not_be_null("site")
# Raw string so the backslashes reach the regex engine without invalid-escape warnings.
validator.expect_column_values_to_match_regex("site", r"^((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\.(xn--)?([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})$")



Calculating Metrics: 100%|██████████| 8/8 [00:00<00:00, 112.16it/s]

Calculating Metrics: 100%|██████████| 6/6 [00:00<00:00, 126.35it/s]

Calculating Metrics: 100%|██████████| 8/8 [00:00<00:00, 110.63it/s]


{
  "success": false,
  "result": {
    "element_count": 29,
    "unexpected_count": 3,
    "unexpected_percent": 10.344827586206897,
    "partial_unexpected_list": [
      "github.com/ggerganov",
      "github.com/gristlabs",
      "github.com/bensadeh"
    ],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 10.344827586206897,
    "unexpected_percent_nonmissing": 10.344827586206897
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
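The three unexpected values all carry a path after the domain. A quick check with Python's `re` module, using the same pattern, shows that the `/` is what trips the regex. This is a sketch that assumes the expectation applies the pattern much like `re.search`; `github.com` on its own is a made-up sample for comparison.

```python
import re

# Same pattern as in the cell above, split over two raw strings for readability.
DOMAIN_RE = re.compile(
    r"^((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\.(xn--)?"
    r"([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})$"
)

samples = ["github.com", "github.com/ggerganov", "github.com/gristlabs"]
for site in samples:
    # "/" is not in any character class of the pattern, so paths never match.
    print(site, bool(DOMAIN_RE.search(site)))
```

If sites with a path are valid for this dataset, the pattern (or the requirement) would need to allow for them.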

### Validate Link

- Type of string
- Can't be null
- Should match the regex of a valid link

In [16]:
validator.expect_column_values_to_be_of_type("link", "str")
validator.expect_column_values_to_not_be_null("link")
validator.expect_column_values_to_match_regex("link", r"^((http|https)://)[-a-zA-Z0-9@:%._\+~#?&//=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%._\+~#?&//=]*)$")



Calculating Metrics: 100%|██████████| 8/8 [00:00<00:00, 127.75it/s]

Calculating Metrics: 100%|██████████| 6/6 [00:00<00:00, 141.46it/s]

Calculating Metrics: 100%|██████████| 8/8 [00:00<00:00, 131.79it/s]


{
  "success": true,
  "result": {
    "element_count": 29,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
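To get a feel for what the link pattern accepts and rejects, it can be tried on a few strings with Python's `re` module. The sample URLs are made up for illustration, and the sketch again assumes the expectation behaves much like `re.search`.

```python
import re

# Same pattern as in the cell above, written as raw strings.
LINK_RE = re.compile(
    r"^((http|https)://)[-a-zA-Z0-9@:%._\+~#?&//=]{2,256}"
    r"\.[a-z]{2,6}\b([-a-zA-Z0-9@:%._\+~#?&//=]*)$"
)

samples = [
    "https://example.com/path?x=1",  # scheme, domain, path and query
    "http://example.com",            # plain http is allowed too
    "ftp://example.com",             # other schemes are rejected
    "example.com",                   # a missing scheme is rejected
]
for link in samples:
    print(link, bool(LINK_RE.search(link)))
```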

### Validate Score

- Type of int64
- Must be at least 0

In [17]:
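validator.expect_column_values_to_be_of_type("score", "int64")
# Inclusive lower bound (strict_min=False), so a score of exactly 0 is allowed.
validator.expect_column_values_to_be_between("score", min_value=0, max_value=None, strict_min=False)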



Calculating Metrics: 100%|██████████| 1/1 [00:00<00:00, 82.17it/s] 

Calculating Metrics: 100%|██████████| 8/8 [00:00<00:00, 136.74it/s]


{
  "success": true,
  "result": {
    "element_count": 29,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

### Validate Author

- Type of string
- Can't be null

In [18]:
validator.expect_column_values_to_be_of_type("author", "str")
validator.expect_column_values_to_not_be_null("author")



Calculating Metrics: 100%|██████████| 8/8 [00:00<00:00, 144.28it/s]

Calculating Metrics: 100%|██████████| 6/6 [00:00<00:00, 153.29it/s]


{
  "success": true,
  "result": {
    "element_count": 29,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": []
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

### Validate Comments Number

- Type of int64
- Must be zero or greater

In [19]:
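validator.expect_column_values_to_be_of_type("comments_number", "int64")
# strict_min=True makes the lower bound exclusive (> 0), so a story with 0 comments
# is reported as unexpected in the result below. Pass strict_min=False to allow zero.
validator.expect_column_values_to_be_between("comments_number", min_value=0, max_value=None, strict_min=True)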



Calculating Metrics: 100%|██████████| 1/1 [00:00<00:00, 66.52it/s]

Calculating Metrics: 100%|██████████| 8/8 [00:00<00:00, 114.46it/s]


{
  "success": false,
  "result": {
    "element_count": 29,
    "unexpected_count": 1,
    "unexpected_percent": 3.4482758620689653,
    "partial_unexpected_list": [
      0
    ],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 3.4482758620689653,
    "unexpected_percent_nonmissing": 3.4482758620689653
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

### 💾 Save the suite ready to be used in the pipeline

In [20]:
validator.save_expectation_suite(discard_failed_expectations=False)
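Several expectations above failed on this sample, so the `discard_failed_expectations=False` flag matters here. The toy sketch below shows the effect in plain Python, assuming the flag does what its name suggests: when it is `True`, expectations that failed in the last run are left out of the saved suite, and when it is `False` they are kept. `expectations_to_save` and the result summary are hypothetical and only for illustration.

```python
# Made-up summary of a last run: expectation label -> whether it passed.
last_run = {
    "expect_column_values_to_be_unique(rank)": True,
    "expect_column_values_to_match_regex(site)": False,
    "expect_column_values_to_not_be_null(author)": True,
}


def expectations_to_save(results, discard_failed_expectations=True):
    """Hypothetical helper mirroring the effect of the flag (not library code)."""
    if discard_failed_expectations:
        # Keep only the expectations that passed.
        return [name for name, passed in results.items() if passed]
    # Keep everything, including expectations that failed.
    return list(results)


print(expectations_to_save(last_run, discard_failed_expectations=True))
print(expectations_to_save(last_run, discard_failed_expectations=False))
```

Keeping the failed expectations means the pipeline will still check those rules against future data.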
