🧑‍🍳 Filecoin Extract & Transformation jobs

filecoin-project/filet


🧑‍🍳 Filet

Filet (Filecoin Extract Transform) makes it simple to get CSV data from Filecoin Archival Snapshots using Lily and lily-archiver.

🚀 Usage

The filet image is available on Google Artifact Registry. Alternatively, you can build it locally with make build.

The following command generates CSVs from a Filecoin Archival Snapshot:

docker run -it \
    -v $PWD:/tmp/data \
    europe-west1-docker.pkg.dev/protocol-labs-data/pl-data/filet:latest -- \
    /lily/export.sh archival_snapshot.car.zst .

⏰ Scheduling Jobs

You can use the send_export_jobs.sh script to schedule jobs on Google Cloud Batch. The script takes a file with a list of snapshots as input.

./scripts/send_export_jobs.sh SNAPSHOT_LIST_FILE [--dry-run]

For more details on the scheduled jobs configuration, you can check the gce_batch_job.json file.

SNAPSHOT_LIST_FILE should contain one snapshot per line. Each snapshot must be available in the fil-mainnet-archival-snapshots Google Cloud Storage bucket. To list every snapshot in that bucket:

gsutil ls gs://fil-mainnet-archival-snapshots/historical-exports/ | sort --version-sort > all_snapshots.txt
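The --version-sort flag matters because snapshot names embed block heights with differing digit counts. A small sketch of the difference, using made-up file names (the sample_names helper and the names it prints are hypothetical, only the ordering behaviour is the point):

```shell
# Hypothetical sample names; real snapshot names may follow a different layout.
sample_names() {
    printf '%s\n' \
        snapshot_10000_a.car.zst \
        snapshot_2000_b.car.zst \
        snapshot_300_c.car.zst
}

# Plain sort compares character by character: 10000, 2000, 300.
sample_names | sort

# Version sort compares digit runs as numbers: 300, 2000, 10000.
sample_names | sort --version-sort
```

With version sort the list is ordered by height, which makes it easier to cut into contiguous batches.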

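To select a batch, filter the list by snapshot height. grep has no numeric ranges (a bracket expression matches a single character), so the range 2226480 to 2232002 is spelled out with alternations:

grep -E '^(22264[89][0-9]|2226[5-9][0-9]{2}|222[7-9][0-9]{3}|223[01][0-9]{3}|223200[0-2])$'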
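The list produced by gsutil contains full gs:// paths, so a pattern anchored on a bare height will not match those lines directly. One possible alternative is to compare heights numerically with awk. The sketch below assumes file names of the form snapshot_<start>_<end>_<timestamp>.car.zst, which is an assumption about the bucket layout and not something stated here; filter_by_height is a hypothetical helper, not part of the repository's scripts.

```shell
# Hypothetical helper: keep lines whose start height lies in [min, max].
# ASSUMPTION: names look like snapshot_<start>_<end>_<timestamp>.car.zst.
filter_by_height() {
    awk -F/ -v min="$1" -v max="$2" '{
        # $NF is the file name, i.e. everything after the last slash.
        n = split($NF, parts, "_")
        # parts[2] is the start height under the assumed naming scheme.
        if (n >= 2 && parts[2] ~ /^[0-9]+$/ &&
            parts[2] + 0 >= min + 0 && parts[2] + 0 <= max + 0) {
            print
        }
    }'
}
```

It could then be used as `filter_by_height 2226480 2232002 < all_snapshots.txt > batch.txt`, after checking a few lines of all_snapshots.txt to confirm that the naming assumption holds.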

🔧 Managing Jobs

In case you need to retry a bunch of failed jobs, you can use the following commands:

# Get the list of failed jobs
gcloud alpha batch jobs list --format=json --filter="Status.state:FAILED" > failed_jobs.json

# Get the snapshot name from failed jobs
jq -r ".[].taskGroups[0].taskSpec.runnables[0].container.commands[0]" failed_jobs.json | cut -d '/' -f 2 | sort > failed_jobs.list

# Retry the failed jobs
./scripts/send_export_jobs.sh failed_jobs.list
