# Great Expectations Task

## 1. Install Great Expectations Library


In [51]:
%pip install great_expectations==0.17.14

Note: you may need to restart the kernel to use updated packages.


##2. Import Necessary Libraries

In [53]:
import pandas as pd
import great_expectations as gx

##3. Load Labels.csv

Download and upload the [Labels.csv](https://github.com/zubxxr/SOFE3980U-Lab5/blob/main/Labels.csv) into this notebook, and then load the file.

In [54]:


df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
                 names=["age", "workclass", "fnlwgt", "education", "education-num", "marital-status",
                        "occupation", "relationship", "race", "sex", "capital-gain", "capital-loss",
                        "hours-per-week", "native-country", "income"])

##4. Preview the Dataset

In [55]:
df.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,income
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K


##5. Set Up Great Expectations Context and Data Source

In [57]:
context = gx.get_context()

datasource = context.sources.add_pandas(name="my_pandas_datasource")

##6. Define and Create a Data Batch

In [58]:
asset = datasource.add_dataframe_asset(name="labels_asset_v2")
batch_request = asset.build_batch_request(dataframe=df)

validator = context.get_validator(batch_request=batch_request)


##7. Define Three Expectations for Column Values

Using this [link](https://greatexpectations.io/expectations/), choose three expectation functions and apply them to the labels dataset in a relevant manner.

You should replace the 'ExpectColumnValuesToBeBetween' function with other functions you select from the link.

You can also check the format/parameters required of each function when you click "See more" on the function.

In [59]:
expectation_1 = validator.expect_column_values_to_be_between(
    column="Car1_Location_X",
    min_value=0,
    max_value=500
)

expectation_2 = validator.expect_column_values_to_match_regex(
    column="Ground_Truth_View",
    regex=r".*\.png"
)

expectation_3 = validator.expect_column_values_to_be_unique(
    column="Timestamp"
)

Calculating Metrics:   0%|          | 0/8 [00:00<?, ?it/s]

Calculating Metrics:   0%|          | 0/8 [00:00<?, ?it/s]

Calculating Metrics:   0%|          | 0/8 [00:00<?, ?it/s]

### Expectation 1

In [61]:
expectation_1 = validator.expect_column_values_to_be_between(
    column="Car1_Location_X",
    min_value=0,
    max_value=500
)

Calculating Metrics:   0%|          | 0/8 [00:00<?, ?it/s]

### Validate Data Against Expectation 1

In [62]:
expectation_1

{
  "success": false,
  "result": {
    "element_count": 121,
    "unexpected_count": 121,
    "unexpected_percent": 100.0,
    "partial_unexpected_list": [
      -51.40297655,
      -53.81963722,
      -50.23914439,
      -53.70722021,
      -52.05372109,
      -53.93975603,
      -50.30258412,
      -53.17447194,
      -52.72667437,
      -50.18179353,
      -52.40699613,
      -52.38122971,
      -53.01906414,
      -50.85034015,
      -51.93070037,
      -50.75051989,
      -50.63015195,
      -50.69818291,
      -51.95966168,
      -50.88663347
    ],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 100.0,
    "unexpected_percent_nonmissing": 100.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

### Expectation 2

In [63]:
expectation_2 = validator.expect_column_values_to_match_regex(
    column="Ground_Truth_View",
    regex=r".*\.png"
)

Calculating Metrics:   0%|          | 0/8 [00:00<?, ?it/s]

### Validate Data Against Expectation 2

In [64]:
expectation_2

{
  "success": true,
  "result": {
    "element_count": 121,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

### Expectation 3

In [65]:
expectation_3 = validator.expect_column_values_to_be_unique(
    column="Timestamp"
)

Calculating Metrics:   0%|          | 0/8 [00:00<?, ?it/s]

### Validate Data Against Expectation 3

In [66]:
expectation_3

{
  "success": true,
  "result": {
    "element_count": 121,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}