# Creating a data-only Docker image


## Setup 


In [1]:
set -x
mkdir -p data
tree

+ mkdir -p data
+ tree
.
├── Dockerfile
├── data
├── data_01
│   ├── a-z.01-1k.tsv
│   ├── a-z.combined.tsv
│   └── titanic.csv
├── data_01.tar
└── image.data-only.ipynb

2 directories, 6 files


## Get data


In [2]:
(
cd data
curl -L -s -o titanic.csv 'https://ddc-datascience.s3.amazonaws.com/Projects/Example/Data/Titanic.train.csv'
curl -L -s -o a-z.01-1k.tsv 'https://ddc-datascience.s3.amazonaws.com/a-z.business/2023-08-21/01.1k.txt'
curl -L -s -o a-z.combined.tsv 'https://ddc-datascience.s3.amazonaws.com/a-z.business/2023-08-21/combined.txt'
)

+ cd data
+ curl -L -s -o titanic.csv https://ddc-datascience.s3.amazonaws.com/Projects/Example/Data/Titanic.train.csv
+ curl -L -s -o a-z.01-1k.tsv https://ddc-datascience.s3.amazonaws.com/a-z.business/2023-08-21/01.1k.txt
+ curl -L -s -o a-z.combined.tsv https://ddc-datascience.s3.amazonaws.com/a-z.business/2023-08-21/combined.txt


In [3]:
tree

+ tree
.
├── Dockerfile
├── data
│   ├── a-z.01-1k.tsv
│   ├── a-z.combined.tsv
│   └── titanic.csv
├── data_01
│   ├── a-z.01-1k.tsv
│   ├── a-z.combined.tsv
│   └── titanic.csv
├── data_01.tar
└── image.data-only.ipynb

2 directories, 9 files


## From a folder tar file


### Create tar file from folder


In [4]:
tar -cvf data_01.tar data


+ tar -cvf data_01.tar data
data/
data/titanic.csv
data/a-z.01-1k.tsv
data/a-z.combined.tsv
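
`docker image import` treats the tar file as the root filesystem of the new image, so a member named `data/titanic.csv` ends up at `/data/titanic.csv`. The following is a minimal sketch of checking the archive layout before importing. The scratch directory, the placeholder file, and the `check_prefix` helper are hypothetical and not part of the notebook.

```shell
# Hypothetical helper: succeed only if every member of tar file $1
# starts with "<prefix>/", where <prefix> is $2 (used as a plain regex).
check_prefix() {
  # grep -v selects members that do NOT start with "<prefix>/".
  if tar -tf "$1" | grep -v -q "^$2/"; then
    return 1
  fi
  return 0
}

# Demonstration in a scratch directory with a placeholder file.
scratch=$(mktemp -d)
mkdir -p "$scratch/data"
printf 'a,b\n1,2\n' > "$scratch/data/sample.csv"
tar -cf "$scratch/data_01.tar" -C "$scratch" data

if check_prefix "$scratch/data_01.tar" data; then
  layout=ok
else
  layout=bad
fi
echo "layout: $layout"
```

Running such a check before `docker image import` catches archives that would scatter files outside `/data` in the resulting image.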


In [5]:
mv data data_01
tree


+ mv data data_01
+ tree
.
├── Dockerfile
├── data_01
│   ├── a-z.01-1k.tsv
│   ├── a-z.combined.tsv
│   ├── data
│   │   ├── a-z.01-1k.tsv
│   │   ├── a-z.combined.tsv
│   │   └── titanic.csv
│   └── titanic.csv
├── data_01.tar
└── image.data-only.ipynb

2 directories, 9 files
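
The tree above shows `data` nested inside `data_01`: when the destination of `mv` is an existing directory, the source is moved into it rather than renamed. The following is a minimal sketch of a guard against that, run in a scratch directory. The `rename_dir` helper and the file names are hypothetical.

```shell
# Hypothetical helper: rename directory $1 to $2, refusing to nest
# $1 inside $2 when $2 already exists.
rename_dir() {
  if [ -e "$2" ]; then
    echo "refusing to move: $2 already exists" >&2
    return 1
  fi
  mv "$1" "$2"
}

work=$(mktemp -d)
mkdir -p "$work/data" "$work/data_01"
touch "$work/data/titanic.csv"

# Plain mv would produce data_01/data here; the guard declines instead.
if rename_dir "$work/data" "$work/data_01"; then
  result=moved
else
  result=skipped
fi
echo "result: $result"

# After removing the stale directory, the rename goes through.
rm -rf "$work/data_01"
rename_dir "$work/data" "$work/data_01" && echo "renamed"
ls "$work/data_01"
```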


### Create image


In [6]:
docker image import data_01.tar data:01
docker image list -a | grep data


+ docker image import data_01.tar data:01
sha256:412389b22d70bc6c0f8b55c4bdb5cd0440517960db72f0ffced43974e224f8ba
+ docker image list -a
+ grep --color=auto data
data                    01        412389b22d70   Less than a second ago   39.2MB


### Create volume from a container instance of the image


In [7]:
docker container create --volume data_01:/data --name data_01 data:01 :
docker container list -a | grep data
docker volume list


+ docker container create --volume data_01:/data --name data_01 data:01 :
5e343f54899fb0b2ade30aa085a28247465ec7282db60a5e514d516f3d9d7dbb
+ grep --color=auto data
+ docker container list -a
5e343f54899f   data:01                 ":"                      4 seconds ago   Created                                  data_01
+ docker volume list
DRIVER    VOLUME NAME
local     data_01


### Show data in volume


In [8]:
docker container run --volume data_01:/data --rm -it ubuntu ls -lA / /data


+ docker container run --volume data_01:/data --rm -it ubuntu ls -lA / /data
/:
total 24
-rwxr-xr-x   1 root   root      0 Mar 20 14:11 .dockerenv
lrwxrwxrwx   1 root   root      7 Feb 27 15:59 bin -> usr/bin
drwxr-xr-x   1 root   root      0 Apr 18  2022 boot
drwxr-xr-x   1 root   root     80 Mar 20 14:11 data
drwxr-xr-x   5 root   root    360 Mar 20 14:11 dev
drwxr-xr-x   1 root   root     56 Mar 20 14:11 etc
drwxr-xr-x   1 root   root      0 Apr 18  2022 home
lrwxrwxrwx   1 root   root      7 Feb 27 15:59 lib -> usr/lib
lrwxrwxrwx   1 root   root      9 Feb 27 15:59 lib32 -> usr/lib32
lrwxrwxrwx   1 root   root      9 Feb 27 15:59 lib64 -> usr/lib64
lrwxrwxrwx   1 root   root     10 Feb 27 15:59 libx32 -> usr/libx32
drwxr-xr-x   1 root   root      0 Feb 27 15:59 media
drwxr-xr-x   1 root   root      0 Feb 27 15:59 mnt
drwxr-xr-x   1 root   root      0 Feb 27 15:59 opt
dr-xr-xr-x 201 nobody nogroup   0 Mar 20 14:11 proc
drwx------   1 root   root     30 Feb 27 16:02 root
drwxr-xr-x  

## From an instance tar file


### Create tar file from instance


In [9]:
docker container export data_01 > data_02.tar
tar -tvf data_02.tar
tar -tvf data_02.tar data/


+ docker container export data_01
+ tar -tvf data_02.tar
-rwxr-xr-x 0/0               0 2024-03-20 14:11 .dockerenv
drwxr-xr-x 0/0               0 2024-03-20 14:10 data/
-rw-r--r-- 0/0         1006550 2024-03-20 14:10 data/a-z.01-1k.tsv
-rw-r--r-- 0/0        38138507 2024-03-20 14:11 data/a-z.combined.tsv
-rw-r--r-- 0/0           61194 2024-03-20 14:10 data/titanic.csv
drwxr-xr-x 0/0               0 2024-03-20 14:11 dev/
-rwxr-xr-x 0/0               0 2024-03-20 14:11 dev/console
drwxr-xr-x 0/0               0 2024-03-20 14:11 dev/pts/
drwxr-xr-x 0/0               0 2024-03-20 14:11 dev/shm/
drwxr-xr-x 0/0               0 2024-03-20 14:11 etc/
-rwxr-xr-x 0/0               0 2024-03-20 14:11 etc/hostname
-rwxr-xr-x 0/0               0 2024-03-20 14:11 etc/hosts
lrwxrwxrwx 0/0               0 2024-03-20 14:11 etc/mtab -> /proc/mounts
-rwxr-xr-x 0/0               0 2024-03-20 14:11 etc/resolv.conf
drwxr-xr-x 0/0               0 2024-03-20 14:11 proc/
drwxr-xr-x 0/0               0 2024-03
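
The listing above shows that the export holds the container's whole filesystem (`.dockerenv`, `dev/`, `etc/`, and so on) alongside `data/`. If only the data is wanted, the `data/` subtree can be repacked on its own. The following is a minimal sketch that uses a small stand-in archive instead of a real export. All paths and file names in it are hypothetical.

```shell
# Stand-in for an exported container filesystem: data plus extra entries.
export_dir=$(mktemp -d)
mkdir -p "$export_dir/rootfs/data" "$export_dir/rootfs/etc"
printf 'x\ty\n' > "$export_dir/rootfs/data/sample.tsv"
: > "$export_dir/rootfs/etc/hostname"
tar -cf "$export_dir/full.tar" -C "$export_dir/rootfs" data etc

# Extract only the data/ subtree, then repack it as a smaller archive.
mkdir "$export_dir/only"
tar -xf "$export_dir/full.tar" -C "$export_dir/only" data
tar -cf "$export_dir/data_only.tar" -C "$export_dir/only" data

# List the members of the repacked archive.
members=$(tar -tf "$export_dir/data_only.tar")
echo "$members"
```

An archive trimmed this way imports to an image that contains nothing but `/data`.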

### Create image

In [10]:
docker image import data_02.tar data:02
docker image list -a | grep data


+ docker image import data_02.tar data:02
sha256:22a85a7ce44857c550408e22ba0d4cd2b855e59f6883783931390b8862858a67
+ grep --color=auto data
+ docker image list -a
data                    02        22a85a7ce448   Less than a second ago   39.2MB
data                    01        412389b22d70   10 seconds ago           39.2MB


### Create volume from a container instance of the image


In [11]:
docker container create --volume data_02:/data --name data_02 data:02 :
docker container list -a | grep data
docker volume list


+ docker container create --volume data_02:/data --name data_02 data:02 :
dd96c2430435bdb782bb6a583a3e316b7edb794d527c23f0c615775fb7a88e67
+ grep --color=auto data
+ docker container list -a
dd96c2430435   data:02                 ":"                      2 seconds ago    Created                                  data_02
5e343f54899f   data:01                 ":"                      12 seconds ago   Created                                  data_01
+ docker volume list
DRIVER    VOLUME NAME
local     data_01
local     data_02


### Show data in volume


In [12]:
docker container run --volume data_02:/data --rm -it ubuntu ls -lA / /data


+ docker container run --volume data_02:/data --rm -it ubuntu ls -lA / /data
/:
total 24
-rwxr-xr-x   1 root   root      0 Mar 20 14:11 .dockerenv
lrwxrwxrwx   1 root   root      7 Feb 27 15:59 bin -> usr/bin
drwxr-xr-x   1 root   root      0 Apr 18  2022 boot
drwxr-xr-x   1 root   root     80 Mar 20 14:11 data
drwxr-xr-x   5 root   root    360 Mar 20 14:11 dev
drwxr-xr-x   1 root   root     56 Mar 20 14:11 etc
drwxr-xr-x   1 root   root      0 Apr 18  2022 home
lrwxrwxrwx   1 root   root      7 Feb 27 15:59 lib -> usr/lib
lrwxrwxrwx   1 root   root      9 Feb 27 15:59 lib32 -> usr/lib32
lrwxrwxrwx   1 root   root      9 Feb 27 15:59 lib64 -> usr/lib64
lrwxrwxrwx   1 root   root     10 Feb 27 15:59 libx32 -> usr/libx32
drwxr-xr-x   1 root   root      0 Feb 27 15:59 media
drwxr-xr-x   1 root   root      0 Feb 27 15:59 mnt
drwxr-xr-x   1 root   root      0 Feb 27 15:59 opt
dr-xr-xr-x 201 nobody nogroup   0 Mar 20 14:11 proc
drwx------   1 root   root     30 Feb 27 16:02 root
drwxr-xr-x  

## From a Dockerfile

### Create image
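
The notebook never shows the Dockerfile itself; the build log further down only reveals that it is based on `ubuntu:22.04`. The following is a purely hypothetical sketch of a data-only Dockerfile, written out with a here-document. Everything after the `FROM` line is an assumption rather than the author's file, including the use of `ADD` with the Titanic URL from the download step and the scratch build context.

```shell
# Write a hypothetical Dockerfile into a scratch build context.
context=$(mktemp -d)
cat > "$context/Dockerfile" <<'EOF'
# Base image taken from the build log; the rest is an assumption.
FROM ubuntu:22.04
# ADD accepts a URL source and downloads it at build time.
ADD https://ddc-datascience.s3.amazonaws.com/Projects/Example/Data/Titanic.train.csv /data/titanic.csv
EOF

cat "$context/Dockerfile"
# To build it (requires Docker and network access):
#   docker image build --tag csv_datasets "$context"
```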

In [13]:
docker image build --tag csv_datasets ./.
docker image list -a | grep data


+ docker image build --tag csv_datasets ./.
[+] Building 0.0s (0/0)  docker:default
[+] Building 0.0s (0/1)                                          docker:default
[+] Building 0.1s (2/3)                                          docker:default
 => [internal] load build definition from Dockerfile                       0.1s
 => => transferring dockerfile: 731B                                       0.0s
 => [internal] load metadata for docker.io/library/ubuntu:22.04            0.0s
 => [internal] load .dockerignore                                          0.0s
 => => transferring context:                                               0.0s
[+] Building 0.2s (3/6)                                          docker:default
 => [internal] load build definition from Dockerfile                       0.1s
 => => transferring dockerfile: 731B

### Create volume from a container instance of the image


In [14]:
docker container create --volume data_03:/data --name data_03 csv_datasets :
docker container list -a | grep data
docker volume list


+ docker container create --volume data_03:/data --name data_03 csv_datasets :
3572673334e75b49125a90aa15b99f268156b4a0df5bfadb6cf97f132188f302
+ grep --color=auto data
+ docker container list -a
3572673334e7   csv_datasets            ":"                      1 second ago     Created                                  data_03
dd96c2430435   data:02                 ":"                      7 seconds ago    Created                                  data_02
5e343f54899f   data:01                 ":"                      17 seconds ago   Created                                  data_01
+ docker volume list
DRIVER    VOLUME NAME
local     data_01
local     data_02
local     data_03


### Show data in volume


In [15]:
docker container run --volume data_03:/data --rm -it ubuntu ls -lA / /data


+ docker container run --volume data_03:/data --rm -it ubuntu ls -lA / /data
/:
total 24
-rwxr-xr-x   1 root   root      0 Mar 20 14:11 .dockerenv
lrwxrwxrwx   1 root   root      7 Feb 27 15:59 bin -> usr/bin
drwxr-xr-x   1 root   root      0 Apr 18  2022 boot
drwxr-xr-x   1 root   root     80 Mar 20 14:11 data
drwxr-xr-x   5 root   root    360 Mar 20 14:11 dev
drwxr-xr-x   1 root   root     56 Mar 20 14:11 etc
drwxr-xr-x   1 root   root      0 Apr 18  2022 home
lrwxrwxrwx   1 root   root      7 Feb 27 15:59 lib -> usr/lib
lrwxrwxrwx   1 root   root      9 Feb 27 15:59 lib32 -> usr/lib32
lrwxrwxrwx   1 root   root      9 Feb 27 15:59 lib64 -> usr/lib64
lrwxrwxrwx   1 root   root     10 Feb 27 15:59 libx32 -> usr/libx32
drwxr-xr-x   1 root   root      0 Feb 27 15:59 media
drwxr-xr-x   1 root   root      0 Feb 27 15:59 mnt
drwxr-xr-x   1 root   root      0 Feb 27 15:59 opt
dr-xr-xr-x 202 nobody nogroup   0 Mar 20 14:11 proc
drwx------   1 root   root     30 Feb 27 16:02 root
drwxr-xr-x  

## Clean up


In [16]:
docker container rm data_01 data_02 data_03
docker volume rm data_01 data_02 data_03
docker image rm data:01 data:02 csv_datasets
rm -rf data*

+ docker container rm data_01 data_02 data_03
data_01
data_02
data_03
+ docker volume rm data_01 data_02 data_03
data_01
data_02
data_03
+ docker image rm data:01 data:02 csv_datasets
Untagged: data:01
Deleted: sha256:412389b22d70bc6c0f8b55c4bdb5cd0440517960db72f0ffced43974e224f8ba
Untagged: data:02
Deleted: sha256:22a85a7ce44857c550408e22ba0d4cd2b855e59f6883783931390b8862858a67
Deleted: sha256:32ae24d5dccc3ee34db3341a896cf769db1cb0eafb0e37fd1d36679f603ecc27
Untagged: csv_datasets:latest
Deleted: sha256:aacbedf068395216287006b4374abcb586752fdcffe9986bf2da53dc3b4f4a8e
+ rm -rf data_01 data_01.tar data_02.tar


### Verify clean up


In [17]:
docker image list -a
docker container list -a
docker volume list
ls -lA


+ docker image list -a
REPOSITORY              TAG       IMAGE ID       CREATED         SIZE
rwcitek/jupyter.light   latest    bb28c4444196   5 days ago      934MB
ubuntu                  22.04     ca2b0f26964c   3 weeks ago     77.9MB
ubuntu                  latest    ca2b0f26964c   3 weeks ago     77.9MB
rwcitek/ubuntu          22.04     ca2b0f26964c   3 weeks ago     77.9MB
rwcitek/barcode-gen     latest    1e4213eb03e2   16 months ago   532MB
+ docker container list -a
CONTAINER ID   IMAGE                   COMMAND                  CREATED        STATUS        PORTS                      NAMES
beaf18576a8f   rwcitek/jupyter.light   "jupyter lab --allow…"   10 hours ago   Up 10 hours   127.0.0.1:8888->8888/tcp   jupyter
+ docker volume list
DRIVER    VOLUME NAME
+ ls --color=auto -lA
total 16
drwxr-xr-x 1 root root  128 Mar 20 04:42 .ipynb_checkpoints
-rw-r--r-- 1 1000 1000  692 Mar 20 04:38 Dockerfile
-rw-r--r-- 1 1000 1000 8214 Mar 20 14:09 image.data-only.ipynb
