🚀🚀 Congratulations! 🚀🚀 You have successfully navigated the entire workflow for using Great Expectations with Amazon Web Services S3 and Spark, from installing Great Expectations through Validating your Data.
Spark requires a few dependencies to be installed before it can be used with AWS. You will need to install the `aws-java-sdk-bundle` and `hadoop-aws` `.jar` files that correspond to your version of PySpark, and update your Spark configuration accordingly. You can find the `.jar` files in the following Maven repositories:

- [hadoop-aws jar that matches your Spark version](
- [aws-java-sdk-bundle jar that matches your Spark version](

Once the dependencies are installed, you will need to update your Spark configuration from within Python. First, import these necessary modules:

    import pyspark
    from pyspark import SparkContext

Next, update `pyspark.SparkConf` to reference the dependency packages you downloaded. This example uses version 3.3.1 of `hadoop-aws`; substitute the version that corresponds to your installed dependency.

    conf = pyspark.SparkConf()
    conf.set('spark.jars.packages', 'org.apache.hadoop:hadoop-aws:3.3.1')

Finally, you will need to add your AWS credentials to the `SparkContext`.

    sc = SparkContext(conf=conf)
    # Replace the placeholder strings with your own credentials.
    sc._jsc.hadoopConfiguration().set('fs.s3a.access.key', '<AWS ACCESS KEY>')
    sc._jsc.hadoopConfiguration().set('fs.s3a.secret.key', '<AWS SECRET KEY>')

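
As an alternative to typing the keys into your code, the credentials can be read from the environment. The following is a minimal sketch, assuming the keys are exported under the standard AWS variable names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. The helper names `s3a_settings_from_env` and `apply_s3a_settings` are hypothetical and are not part of Great Expectations or Spark.

```python
import os


def s3a_settings_from_env(env=None):
    """Build the fs.s3a.* settings from the standard AWS environment variables.

    Raises KeyError if either variable is missing.
    """
    env = os.environ if env is None else env
    return {
        "fs.s3a.access.key": env["AWS_ACCESS_KEY_ID"],
        "fs.s3a.secret.key": env["AWS_SECRET_ACCESS_KEY"],
    }


def apply_s3a_settings(sc, settings):
    """Copy each setting into the Hadoop configuration of a SparkContext."""
    hadoop_conf = sc._jsc.hadoopConfiguration()
    for key, value in settings.items():
        hadoop_conf.set(key, value)
```

With the `SparkContext` from the previous step, calling `apply_s3a_settings(sc, s3a_settings_from_env())` would take the place of the two explicit `set` calls.
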

Great Expectations can work within many frameworks. This guide shows a workflow for using Great Expectations with AWS and cloud storage. You will configure a local Great Expectations project to store Expectations, Validation Results, and Data Docs in Amazon S3 buckets. You will then configure Great Expectations to use Spark and access data stored in another Amazon S3 bucket.

This guide demonstrates each step, from installing a new instance of Great Expectations to Validating your data for the first time and viewing your Validation Results as Data Docs.

This guide assumes you have:


- Installed Python 3. (Great Expectations requires Python 3. For details on how to download and install Python on your platform, see [](
- Installed the AWS CLI. (For guidance on how to install this, please see [Amazon's documentation on how to install the AWS CLI](
- Configured your AWS credentials. (For guidance in doing this, please see [Amazon's documentation on configuring the AWS CLI](
- The ability to install Python packages ([`boto3`]( and `great_expectations`) with pip.
- Identified the S3 bucket and prefix where Expectations and Validation Results will be stored.


## Steps

## Part 1: Setup

### 1.1 Ensure that the AWS CLI is ready for use

#### 1.1.1 Verify that the AWS CLI is installed
<VerifyAwsInstalled />
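
If you prefer to run this check from Python, the sketch below looks for the `aws` executable on your `PATH` and asks it for its version. It uses only the standard library. The function names are hypothetical helpers for illustration.

```python
import shutil
import subprocess


def find_aws_cli():
    """Return the path of the `aws` executable on PATH, or None if it is absent."""
    return shutil.which("aws")


def cli_version(executable):
    """Run `<executable> --version` and return whatever it printed."""
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some older releases are reported to print the version on stderr,
    # so fall back to stderr when stdout is empty.
    return (result.stdout or result.stderr).strip()
```

A `None` result from `find_aws_cli()` means the AWS CLI is not installed or not on your `PATH`. Otherwise, pass the returned path to `cli_version` to see which release is installed.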

#### 1.1.2 Verify that your AWS credentials are properly configured
<VerifyAwsCredentials />

### 1.2 Prepare a local installation of Great Expectations and necessary dependencies

#### 1.2.1 Verify that your Python version meets requirements
<VerifyPythonVersion />

<WhereToGetPython />
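
The same check can be made from inside the interpreter. This is a minimal sketch: this guide only states that Python 3 is required, so the default `minimum` is `(3,)`. The `minimum` parameter is a hypothetical convenience in case your Great Expectations release documents a higher floor.

```python
import sys


def meets_requirement(version_info=None, minimum=(3,)):
    """Return True when the interpreter version is at least `minimum`."""
    current = tuple(sys.version_info if version_info is None else version_info)
    # Compare only as many components as `minimum` specifies.
    return current[: len(minimum)] >= tuple(minimum)
```

For example, `meets_requirement()` tests the running interpreter, and `meets_requirement(minimum=(3, 8))` tests it against a stricter floor.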

#### 1.2.2 Create a virtual environment for your Great Expectations project
<CreateVirtualEnvironment />

#### 1.2.3 Ensure you have the latest version of pip
<GetLatestPip />

#### 1.2.4 Install boto3
<InstallBoto3WithPip />
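
Once `boto3` is installed, you can confirm that it picks up the credentials you configured earlier. The sketch below calls the AWS STS `GetCallerIdentity` operation, which requires no IAM permissions. The helper name `caller_arn` and its optional `sts_client` parameter are assumptions made for illustration.

```python
def caller_arn(sts_client=None):
    """Return the ARN of the identity that the configured credentials resolve to."""
    if sts_client is None:
        # Imported lazily so the helper can be exercised with a stub client.
        import boto3

        sts_client = boto3.client("sts")
    response = sts_client.get_caller_identity()
    return response["Arn"]
```

If the credentials are missing or invalid, the call raises an exception from `botocore` instead of returning an ARN.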

#### 1.2.5 Install Spark dependencies for S3
<InstallSparkS3Dependencies />

#### 1.2.6 Install Great Expectations
<InstallGxWithPip />

#### 1.2.7 Verify that Great Expectations installed successfully
<VerifySuccessfulGxInstallation />
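
You can also confirm the installation from Python. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8 or later) and assuming the distribution is named `great_expectations`, as in the `pip` install step. The helper name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="great_expectations"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None
```

A `None` result indicates that the package is not installed in the active environment, which usually means the virtual environment was not activated before running `pip`.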

### 1.3 Create your Data Context
<CreateDataContextWithCli />

### 1.4 Configure your Expectations Store on Amazon S3

#### 1.4.1 Identify your Data Context Expectations Store
<IdentifyDataContextExpectationsStore />

#### 1.4.2 Update your configuration file to include a new Store for Expectations on Amazon S3
<AddS3ExpectationsStoreConfiguration />

#### 1.4.3 Verify that the new Amazon S3 Expectations Store has been added successfully
<VerifyS3ExpectationsStoreExists />

#### 1.4.4 (Optional) Copy existing Expectation JSON files to the Amazon S3 bucket
<OptionalCopyExistingExpectationsToS3 />
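
The copy can also be scripted with `boto3`. The sketch below is one possible approach, not the method this guide prescribes: it walks a local directory, uploads each `.json` file with the S3 client's `upload_file` method, and preserves the relative directory layout under the prefix. The function name, the `s3_client` parameter, and the directory you pass in are assumptions.

```python
from pathlib import Path


def upload_json_files(local_dir, bucket, prefix="", s3_client=None):
    """Upload every *.json file under local_dir and return the object keys used."""
    if s3_client is None:
        # Imported lazily so the helper can be exercised with a stub client.
        import boto3

        s3_client = boto3.client("s3")
    root = Path(local_dir)
    clean_prefix = prefix.strip("/")
    uploaded = []
    for path in sorted(root.rglob("*.json")):
        # Keep the layout below local_dir in the object key.
        relative = path.relative_to(root).as_posix()
        key = f"{clean_prefix}/{relative}" if clean_prefix else relative
        s3_client.upload_file(str(path), bucket, key)
        uploaded.append(key)
    return uploaded
```

Use the bucket and prefix that you configured for the Expectations Store so that the uploaded files land where Great Expectations will look for them.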

#### 1.4.5 (Optional) Verify that copied Expectations can be accessed from Amazon S3
<OptionalVerifyCopiedExpectationsAreAccessible />

### 1.5 Configure your Validation Results Store on Amazon S3

#### 1.5.1 Identify your Data Context's Validation Results Store
<IdentifyDataContextValidationResultsStore />

#### 1.5.2 Update your configuration file to include a new Store for Validation Results on Amazon S3
<AddS3ValidationResultsStoreConfiguration />

#### 1.5.3 Verify that the new Amazon S3 Validation Results Store has been added successfully
<VerifyS3ValidationResultsStoreExists />

#### 1.5.4 (Optional) Copy existing Validation Results to the Amazon S3 bucket
<OptionalCopyExistingValidationResultsToS3 />

### 1.6 Configure Data Docs for hosting and sharing from Amazon S3

#### 1.6.1 Create an Amazon S3 bucket for your Data Docs
<CreateAnS3BucketForDataDocs />

#### 1.6.2 Configure your bucket policy to enable appropriate access
<ConfigureYourBucketPolicyToEnableAppropriateAccess />

#### 1.6.3 Apply the access policy to your Data Docs' Amazon S3 bucket
<ApplyTheDataDocsAccessPolicy />
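
The two preceding steps can also be sketched in Python with `boto3`. The example assumes that "appropriate access" means read-only access limited to known IP ranges; adjust the statement if your requirements differ. The function names, the statement `Sid`, and the `s3_client` parameter are hypothetical. `put_bucket_policy` is the `boto3` S3 client method for attaching a policy document to a bucket.

```python
import json


def build_ip_restricted_policy(bucket, allowed_cidrs):
    """Return a bucket policy allowing s3:GetObject only from the given CIDR ranges."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowDataDocsReadFromKnownAddresses",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {"IpAddress": {"aws:SourceIp": list(allowed_cidrs)}},
            }
        ],
    }


def apply_policy(bucket, policy, s3_client=None):
    """Attach the policy to the bucket."""
    if s3_client is None:
        # Imported lazily so the helper can be exercised with a stub client.
        import boto3

        s3_client = boto3.client("s3")
    # The API expects the policy document as a JSON string.
    s3_client.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```

Review the generated document before applying it, because a policy with `"Principal": "*"` and no condition would make the Data Docs publicly readable.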

#### 1.6.4 Add a new Amazon S3 site to the `data_docs_sites` section of your `great_expectations.yml`
<AddANewS3SiteToTheDataDocsSitesSectionOfYourGreatExpectationsYml />

#### 1.6.5 Test that your Data Docs configuration is correct by building the site
<TestThatYourConfigurationIsCorrectByBuildingTheSite />

#### Additional notes on hosting Data Docs from an Amazon S3 bucket
<AdditionalDataDocsNotes />

## Part 2: Connect to data

### 2.1 Choose how to run the code for creating a new Datasource
<HowToRunDatasourceCode />

### 2.2 Instantiate your project's Data Context
<InstantiateDataContext />

### 2.3 Configure your Datasource
<ConfigureYourDatasource />

### 2.4 Save the Datasource configuration to your Data Context
<SaveDatasourceConfigurationToDataContext />

### 2.5 Test your new Datasource
<TestS3Datasource />

## Part 3: Create Expectations

### 3.1 Prepare a Batch Request, empty Expectation Suite, and Validator

<PrepareABatchRequestAndValidatorForCreatingExpectations />

### 3.2 Use a Validator to add Expectations to the Expectation Suite

<CreateExpectationsInteractively />

### 3.3 Save the Expectation Suite

<SaveTheExpectationSuite />

## Part 4: Validate Data

### 4.1 Create and run a Checkpoint

<CheckpointCreateAndRun />

#### 4.1.1 Create a Checkpoint

<CreateCheckpoint />

#### 4.1.2 Save the Checkpoint

<SaveCheckpoint />

#### 4.1.3 Run the Checkpoint

<RunCheckpoint />

### 4.2 Build and view Data Docs

<BuildAndViewDataDocs />

## Congratulations!

<Congratulations />

