Skip to content

Conversation

@kurotych
Copy link
Member

@kurotych kurotych commented May 15, 2025

AwsLocal is the code I use in 3 different projects to mock proto files in AWS.
So, I think it would be nice to have the core AwsLocal as workspace member in oracles repo to have capability to reuse it.

https://github.com/novalabsxyz/mobile-rewards-estimator/pull/20#issuecomment-2695429042

@kurotych kurotych requested a review from Copilot May 16, 2025 11:17
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces the aws_local crate for mocking S3 interactions via LocalStack, integrates it into the oracles workspace, and adds an initial dataset downloader with integration tests.

  • Adds aws_local helper crate and wires it into CI
  • Implements DataSetDownloaderDaemon with separate fetch_first_datasets and check_for_new_data_sets APIs
  • Adds integration tests for dataset downloading in mobile_verifier

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
mobile_verifier/tests/integrations/boosting_oracles.rs Adds helper functions and integration test for downloader
mobile_verifier/src/boosting_oracles/data_sets.rs Refactors run, exposes check_for_new_data_sets, and adds fetch_first_datasets
mobile_verifier/Cargo.toml Registers aws-local and tempfile
file_store/Cargo.toml Switches AWS crates to workspace references
aws_local/src/lib.rs Implements AwsLocal wrapper over LocalStack S3
aws_local/src/README.md Adds LocalStack usage note
aws_local/Cargo.toml Declares aws-local package
Cargo.toml Adds aws_local workspace member and patches S3 crates
.github/workflows/{DockerCI,CI}.yml Spins up LocalStack in CI
Comments suppressed due to low confidence (1)

mobile_verifier/tests/integrations/boosting_oracles.rs:160

  • [nitpick] Function name hex_assignment_file_exist is grammatically inconsistent; consider renaming to hex_assignment_file_exists for clarity.
pub async fn hex_assignment_file_exist(pool: &PgPool, filename: &str) -> bool {


pub async fn run(mut self) -> anyhow::Result<()> {
tracing::info!("Starting data set downloader task");
self.fetch_first_datasets().await?;
Copy link

Copilot AI May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The run method no longer calls check_for_new_data_sets, so it only fetches initial datasets but never polls for new updates. Consider invoking self.check_for_new_data_sets().await? (or a looping mechanism) within run to restore the intended behavior.

Suggested change
self.fetch_first_datasets().await?;
self.fetch_first_datasets().await?;
self.check_for_new_data_sets().await?;

Copilot uses AI. Check for mistakes.
Comment on lines +33 to +40
.unwrap(),
_ => None,
};
let region = Region::new(settings.region.clone());
let region_provider = RegionProviderChain::first_try(region).or_default_provider();

let mut config = aws_config::from_env().region(region_provider);
config = config.endpoint_resolver(endpoint.unwrap());
Copy link

Copilot AI May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling endpoint.unwrap() will panic if settings.endpoint is None. Consider handling the Option safely (e.g., using if let Some(ep) = endpoint or returning an error with a clear message).

Suggested change
.unwrap(),
_ => None,
};
let region = Region::new(settings.region.clone());
let region_provider = RegionProviderChain::first_try(region).or_default_provider();
let mut config = aws_config::from_env().region(region_provider);
config = config.endpoint_resolver(endpoint.unwrap());
.map_err(|e| anyhow!("Invalid endpoint URI: {}", e))?,
None => return Err(anyhow!("AWS endpoint is not set in the settings")),
};
let region = Region::new(settings.region.clone());
let region_provider = RegionProviderChain::first_try(region).or_default_provider();
let mut config = aws_config::from_env().region(region_provider);
if let Some(endpoint) = endpoint {
config = config.endpoint_resolver(endpoint);
}

Copilot uses AI. Check for mistakes.
// So we wait when dir will be empty.
// It means all files are uploaded to aws
loop {
if is_dir_has_files(&dir_path_clone) {
Copy link

Copilot AI May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop waits for directory emptiness but never breaks on timeout expiry, risking an infinite loop. Consider checking timeout and breaking or returning an error when elapsed.

Suggested change
if is_dir_has_files(&dir_path_clone) {
if is_dir_has_files(&dir_path_clone) {
if timeout <= std::time::Duration::ZERO {
eprintln!("Timeout expired while waiting for directory to become empty.");
break;
}

Copilot uses AI. Check for mistakes.
Copy link
Member Author

@kurotych kurotych May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeout -= dur;
Causes panic and breaks loop if timeout happens

Comment on lines +107 to +109
tokio::spawn(async move {
let uploaded_files = item_recv.await.unwrap().unwrap();
assert!(uploaded_files.len() == 1);
Copy link

Copilot AI May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using assert! inside a spawned task can cause panics that aren’t observed by the main test harness. Consider returning errors through a channel or awaiting the task’s result to propagate failures.

Suggested change
tokio::spawn(async move {
let uploaded_files = item_recv.await.unwrap().unwrap();
assert!(uploaded_files.len() == 1);
let (tx, rx) = tokio::sync::oneshot::channel();
tokio::spawn(async move {
let uploaded_files = item_recv.await.unwrap().unwrap();
if uploaded_files.len() != 1 {
let _ = tx.send(Err(anyhow!("Expected exactly one uploaded file, but found {}", uploaded_files.len())));
return;
}

Copilot uses AI. Check for mistakes.
Comment on lines +111 to +117
pub async fn create_data_set_downloader(
pool: PgPool,
file_paths: Vec<PathBuf>,
file_upload: FileUpload,
new_coverage_object_notification: NewCoverageObjectNotification,
tmp_dir: &TempDir,
) -> (DataSetDownloaderDaemon, PathBuf, String) {
Copy link

Copilot AI May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Returning a tuple (DataSetDownloaderDaemon, PathBuf, String) can be unclear to callers. Consider defining a small struct to name each field for better readability and maintainability.

Suggested change
pub async fn create_data_set_downloader(
pool: PgPool,
file_paths: Vec<PathBuf>,
file_upload: FileUpload,
new_coverage_object_notification: NewCoverageObjectNotification,
tmp_dir: &TempDir,
) -> (DataSetDownloaderDaemon, PathBuf, String) {
pub struct DataSetDownloaderResult {
pub downloader: DataSetDownloaderDaemon,
pub data_set_directory: PathBuf,
pub bucket_name: String,
}
pub async fn create_data_set_downloader(
pool: PgPool,
file_paths: Vec<PathBuf>,
file_upload: FileUpload,
new_coverage_object_notification: NewCoverageObjectNotification,
tmp_dir: &TempDir,
) -> DataSetDownloaderResult {

Copilot uses AI. Check for mistakes.
@kurotych kurotych changed the title Add aws_local and datasets test Add localstack integration and DataSetDownloaderDaemon test May 16, 2025
@kurotych kurotych marked this pull request as ready for review May 16, 2025 11:35
@kurotych kurotych requested a review from jeffgrunewald May 16, 2025 11:38
) -> Result<String> {
// Uuid uses as random to avoid colisions
let uuid: Uuid = Uuid::new_v4();
let dir_path = format!("/tmp/{}/{}", uuid, self.fs_settings.bucket);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use tempdir here as well?
We don't need to do any cleanup ourselves. All files are removed when the TempDir is dropped.

In the PR we first encountered AwsLocal (https://github.com/novalabsxyz/mobile-rewards-estimator/pull/20), this function was never used. Is it in fact not used in any of your projects and we can drop it?

Copy link
Member Author

@kurotych kurotych May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaeldjeffrey
This function is used at least in the CAS. It serves as the basis for functions that want to put a particular type of proto file.

https://github.com/novalabsxyz/mobile-coverage/blob/master/cas/tests/common/aws_mock.rs#L141

Can we use tempdir here as well?

Yes, changed

@kurotych kurotych requested a review from michaeldjeffrey May 28, 2025 14:20
@kurotych kurotych merged commit f5b7c46 into main Jun 3, 2025
27 checks passed
@kurotych kurotych deleted the dataset-tests branch June 3, 2025 13:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants