Download file from CSV file via http; Create training CSV file for AutoML and Sagemaker Ground Truth; Upload file to GCS and S3

evalphobia/cloud-label-uploader

cloud-label-uploader

License: MIT

cloud-label-uploader downloads files from the URLs listed in a CSV file and uploads them to a cloud bucket. It also creates labeled CSV files for machine learning with Google Cloud AutoML and AWS SageMaker.

Installation

Install cloud-label-uploader with the command below:

$ go get github.com/evalphobia/cloud-label-uploader

Usage

root command

$ cloud-label-uploader
Commands:

  help       show help
  download   Download files from --file csv
  list       Create list file from --input dir images
  upload     Upload files to Cloud Bucket(S3, GCS) from --input dir
  vott       Create object-detection list file from VoTT results

download command

download downloads files from the URLs listed in a CSV file.

$ cloud-label-uploader help download
Download files from --file csv

Options:

  -h, --help           display help information
  -i, --input         *input CSV file --input='/path/to/dir/input.csv'
  -n, --name          *column name for filename --name='name'
  -l, --label         *column name for label --label='group'
  -u, --url           *column name for URL --url='path'
  -m, --parallel[=2]   parallel number (multiple download) --parallel=2
  -o, --output         outout dir --output='/path/to/dir/'

# Save a CSV file with name, label, and URL columns.
$ cat my_file_list.csv

id,label,image_url
1,cat,http://example.com/foo.jpg
2,dog,http://example.com/bar.jpg
3,cat,https://example.com/foo2.JPG
4,human,https://example.com/baz.png?q=1
5,human,https://example.com/baz2.png


# Download files from URL in CSV.
$ cloud-label-uploader download -i ./my_file_list.csv -o ./save -n "id" -l "label" -u "image_url"


# Check the downloaded files.
$ tree ./save

./save
├── cat
│   ├── 1.jpg
│   └── 3.JPG
├── dog
│   └── 2.jpg
└── human
    ├── 4.png
    └── 5.png

3 directories, 5 files

list command

list creates a CSV file from image files. Each row contains a file's expected path on GCS/S3 and its label (for multi-label classification).

$ cloud-label-uploader help list
Create list file from --input dir images

Options:

  -h, --help                      display help information
  -i, --input                    *image dir path --input='/path/to/image_dir'
  -o, --output[=./output.csv]    *output CSV file path --output='./output.csv'
  -a, --all                       use all files
  -t, --type[=jpg,jpeg,png,gif]   comma separate file extensions --type='jpg,jpeg,png,gif'
  -f, --format[=csv]              set output format --format='[csv,sagemaker]'
  -p, --prefix                   *prefix for file path --prefix='gs://<your-bucket-name>'

# Create a file list from the given dir and save it to the output CSV file.
$ cloud-label-uploader list -i ./save -o result.csv -p "gs://my-bucket/test-project"


# Check saved CSV file.
$ cat result.csv

gs://my-bucket/test-project/cat/1.jpg,cat
gs://my-bucket/test-project/cat/3.JPG,cat
gs://my-bucket/test-project/dog/2.jpg,dog
gs://my-bucket/test-project/human/4.png,human
gs://my-bucket/test-project/human/5.png,human
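
How each row is derived can be sketched as below. This is a rough approximation based only on the example output: the label is assumed to be the name of the file's parent directory, and extension matching is assumed to be case-insensitive (since `3.JPG` appears in the result). `HasExt`, `Row`, and `ListRows` are hypothetical names, not the tool's API.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"path/filepath"
	"strings"
)

// HasExt reports whether name ends with one of exts (assumed case-insensitive).
func HasExt(name string, exts []string) bool {
	ext := strings.TrimPrefix(strings.ToLower(filepath.Ext(name)), ".")
	for _, e := range exts {
		if ext == strings.ToLower(e) {
			return true
		}
	}
	return false
}

// Row builds "<prefix>/<rel>,<label>" where rel is a slash-separated path
// such as "cat/1.jpg" and the label is its parent directory name.
func Row(prefix, rel string) string {
	label := path.Base(path.Dir(rel))
	return strings.TrimSuffix(prefix, "/") + "/" + rel + "," + label
}

// ListRows walks dir and returns one row per matching file.
// Files placed directly in dir are skipped because they have no label directory.
func ListRows(dir, prefix string, exts []string) ([]string, error) {
	var rows []string
	err := filepath.WalkDir(dir, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || !HasExt(d.Name(), exts) {
			return nil
		}
		rel, err := filepath.Rel(dir, p)
		if err != nil {
			return err
		}
		rel = filepath.ToSlash(rel)
		if !strings.Contains(rel, "/") {
			return nil
		}
		rows = append(rows, Row(prefix, rel))
		return nil
	})
	return rows, err
}

func main() {
	rows, err := ListRows("./save", "gs://my-bucket/test-project", []string{"jpg", "jpeg", "png", "gif"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range rows {
		fmt.Println(r)
	}
}
```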

upload command

upload uploads the image files in a directory to a GCS/S3 bucket.

$ cloud-label-uploader help upload
Upload files to Cloud Bucket(S3, GCS) from --input dir

Options:

  -h, --help                      display help information
  -i, --input                    *image dir path --input='/path/to/image_dir'
  -t, --type[=jpg,jpeg,png,gif]   comma separate file extensions --type='jpg,jpeg,png,gif'
  -a, --all                       use all files
  -l, --label                     label file for training (outputted CSV file) --label='/path/to/output.csv'
  -c, --provider                 *cloud provider name for the bucket --provider='[s3,gcs]'
  -b, --bucket                   *bucket name of S3/GCS --bucket='<your-bucket-name>'
  -p, --prefix                   *prefix for S3/GCS --prefix='foo/bar'
  -m, --parallel[=2]              parallel number (multiple upload) --parallel=2

# Before uploading, create a GCS bucket.
# $ gsutil mb gs://example-bucket

# Set GCS credentials, then upload the files in the given dir.
# $ export GOOGLE_API_GO_PRIVATEKEY=`cat /path/to/gcs.pem`
# $ export GOOGLE_API_GO_EMAIL=gcs@example.iam.gserviceaccount.com
$ export GOOGLE_APPLICATION_CREDENTIALS=/path/to/gcs.json
$ cloud-label-uploader upload -i ./save -b 'example-bucket' -t 'jpg,png' -p 'automl_model/20180401' -c 'gcs' -l './result.csv' -m 10

# upload files to gs://example-bucket/automl_model/20180401/ ...

vott command

vott creates a CSV file for AutoML Vision object detection from VoTT's tagging result JSON files.

$ cloud-label-uploader help vott
Create object-detection list file from VoTT results

Options:

  -h, --help                    display help information
  -i, --input                  *VoTT json results dir path --input='/path/to/vott_json_dir'
  -o, --output[=./output.csv]  *output CSV file path --output='./output.csv'
  -p, --prefix[=gs://]         *prefix for file path --prefix='gs://<your-bucket-name>'
  -r, --recursive[=false]       read files in sub directories

# Create an object-detection list from the VoTT results dir and save it to the output CSV file.
$ cloud-label-uploader vott -i ./vott/result -o result.csv -p "gs://my-bucket/test-project/"


# Check saved CSV file.
$ cat result.csv

UNASSIGNED,gs://my-bucket/test-project/cat/1.jpg,cat,0.17785499052004333,0.3945237235067437,0.28786057692307687,0.3945237235067437,0.28786057692307687,0.5815947133911368,0.17785499052004333,0.5815947133911368
UNASSIGNED,gs://my-bucket/test-project/cat/3.JPG,cat,0.7158391915641477,0.44460227272727265,0.8379421133567663,0.44460227272727265,0.8379421133567663,0.5285943675889327,0.7158391915641477,0.5285943675889327
UNASSIGNED,gs://my-bucket/test-project/dog/2.jpg,dog,0.6664113285482123,0.4608950407608696,0.7451203615926327,0.4608950407608696,0.7451203615926327,0.6013332201086957,0.6664113285482123,0.6013332201086957
UNASSIGNED,gs://my-bucket/test-project/human/4.png,human,0.730020144907909,0.4057476065751445,0.8625490926327194,0.4057476065751445,0.8625490926327194,0.5787572254335259,0.730020144907909,0.5787572254335259
UNASSIGNED,gs://my-bucket/test-project/human/5.png,human,0.6799583559046587,0.4373871026011561,0.7823545842361863,0.4373871026011561,0.7823545842361863,0.5737953847543352,0.6799583559046587,0.5737953847543352
