Prerequisites
---
Python support has to be installed into VS Code manually, and the machine bootstrapped with the commands below:
~~~
sudo apt update
sudo apt install python3-pip git
~~~

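Also SSH to git@github.com once first (for example `ssh -T git@github.com`), so that the SSH connection to GitHub is known to work before the notebook clones over it.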
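The SSH check can also be scripted. The following is a minimal sketch, not part of the original notebook: the helper names `github_ssh_authenticated` and `check_github_ssh` are made up for illustration, and the `StrictHostKeyChecking=accept-new` option assumes a reasonably recent OpenSSH client.

```python
import subprocess


def github_ssh_authenticated(ssh_output: str) -> bool:
    """Return True if the text printed by `ssh -T git@github.com` reports success."""
    return "successfully authenticated" in ssh_output


def check_github_ssh(timeout: int = 30) -> bool:
    """Run `ssh -T git@github.com` and report whether key authentication worked.

    GitHub never opens a shell, so ssh exits non-zero even on success;
    the greeting text is inspected instead of the return code.
    """
    result = subprocess.run(
        # accept-new trusts the host key on first contact (assumed OpenSSH option)
        ["ssh", "-T", "-o", "StrictHostKeyChecking=accept-new", "git@github.com"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return github_ssh_authenticated(result.stdout + result.stderr)
```

Calling `check_github_ssh()` before the clone cell gives an early, readable failure instead of a stalled `git clone`.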

In [1]:
import sys, os
from pathlib import Path

# On Lambda VMs, keep the caches on the persistent filesystem
mount_dir = Path('/home/ubuntu/niels-data')

cache_dir = mount_dir / 'cache'
hub_dir = cache_dir / 'huggingface/hub'
os.makedirs(hub_dir, exist_ok=True)
os.environ['HF_DATASETS_CACHE'] = str(cache_dir)
os.environ['HUGGINGFACE_HUB_CACHE'] = str(hub_dir)

# Make scripts that pip installs under ~/.local/bin reachable from the notebook
path = f'{os.environ["PATH"]}:{os.path.expanduser("~/.local/bin")}'
os.environ['PATH'] = path
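Re-running the cell above appends `~/.local/bin` to `PATH` again each time. A possible variant, sketched here with a hypothetical helper name `configure_hf_cache`, wraps the same setup in a function that is safe to call repeatedly. As far as I know, both libraries read these variables when they are imported, so the call should come before `import datasets` or `import huggingface_hub`.

```python
import os
from pathlib import Path


def configure_hf_cache(mount_dir: Path) -> dict:
    """Point the Hugging Face caches at mount_dir and return the values set."""
    cache_dir = mount_dir / "cache"
    hub_dir = cache_dir / "huggingface" / "hub"
    hub_dir.mkdir(parents=True, exist_ok=True)

    settings = {
        "HF_DATASETS_CACHE": str(cache_dir),
        "HUGGINGFACE_HUB_CACHE": str(hub_dir),
    }
    os.environ.update(settings)

    # Append ~/.local/bin only once, so re-running the cell does not grow PATH.
    local_bin = os.path.expanduser("~/.local/bin")
    entries = os.environ.get("PATH", "").split(os.pathsep)
    if local_bin not in entries:
        os.environ["PATH"] = os.pathsep.join([e for e in entries if e] + [local_bin])
    return settings
```

On the Lambda VM this would be called as `configure_hf_cache(Path('/home/ubuntu/niels-data'))`.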

Get dependencies for the project
---

In [2]:
%cd /home/ubuntu/niels-data
!git clone --recurse-submodules git@github.com:provos/stable-diffusion-finetuning.git
%cd stable-diffusion-finetuning
!pip install --upgrade pip
!pip install -r requirements.txt

!pip install --upgrade keras  # on Lambda Stack, keras needs to be upgraded

/home/provos/src
Cloning into 'stable-diffusion-finetuning'...
remote: Enumerating objects: 1672, done.
remote: Counting objects: 100% (762/762), done.
remote: Compressing objects: 100% (85/85), done.
remote: Total 1672 (delta 712), reused 695 (delta 677), pack-reused 910
Receiving objects: 100% (1672/1672), 73.86 MiB | 9.89 MiB/s, done.
Resolving deltas: 100% (1045/1045), done.
Submodule 'twitterdownloader' (git@github.com:provos/twitterdownloader.git) registered for path 'twitterdownloader'
Cloning into '/home/provos/src/stable-diffusion-finetuning/twitterdownloader'...
remote: Enumerating objects: 2868, done.
remote: Counting objects: 100% (253/253), done.
remote: Compressing objects: 100% (230/230), done.
remote: Total 2868 (delta 10), reused 230 (delta 10), pack-reused 2615
Receiving objects: 100% (2868/2868), 341.84 MiB | 9.90 MiB/s, done.
Resolving deltas: 100% (136/136), done.
Submodule path 'twitterdownloader': checked out '144da5792
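The clone cell fails on a second run because the target directory already exists. A rough sketch of a re-runnable version follows; the helpers `clone_commands` and `ensure_clone` are hypothetical, and updating only the submodules of an existing checkout is an assumption about what a re-run should do.

```python
import subprocess
from pathlib import Path

REPO_URL = "git@github.com:provos/stable-diffusion-finetuning.git"


def clone_commands(repo_url: str, dest: Path) -> list:
    """Return the git commands needed to get dest checked out with submodules."""
    if (dest / ".git").exists():
        # Already cloned: just make sure the submodules are present.
        return [["git", "-C", str(dest), "submodule", "update", "--init", "--recursive"]]
    return [["git", "clone", "--recurse-submodules", repo_url, str(dest)]]


def ensure_clone(repo_url: str, dest: Path) -> None:
    """Run the commands from clone_commands, raising if git fails."""
    for cmd in clone_commands(repo_url, dest):
        subprocess.run(cmd, check=True)
```

In the notebook this would be called as `ensure_clone(REPO_URL, Path('/home/ubuntu/niels-data/stable-diffusion-finetuning'))` in place of the `!git clone` line.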

Cache the big CLIP model that usually fails to download
---

In [3]:
from huggingface_hub import scan_cache_dir, hf_hub_download
from datasets import load_dataset

# cache big clip model
hf_cache_info = scan_cache_dir()
hf_hub_download(repo_id="openai/clip-vit-large-patch14", filename="pytorch_model.bin", resume_download=True)

# cache dataset
directory = './twitterdownloader/data/labeled/'
dataset = load_dataset("imagefolder", data_dir=directory, split="train")
dataset[0]["text"]

Resolving data files:   0%|          | 0/2196 [00:00<?, ?it/s]

Using custom data configuration default-c241fe195f21476b


Downloading and preparing dataset imagefolder/default to /home/ubuntu/niels-data/cache/imagefolder/default-c241fe195f21476b/0.0.0/0fc50c79b681877cc46b23245a6ef5333d036f48db40d53765a68034bc48faff...
                


Dataset imagefolder downloaded and prepared to /home/ubuntu/niels-data/cache/imagefolder/default-c241fe195f21476b/0.0.0/0fc50c79b681877cc46b23245a6ef5333d036f48db40d53765a68034bc48faff. Subsequent calls will reuse this data.


'a neon sign that says karaoke in a dark city at night time with neon lights'
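Since the CLIP download is described above as usually failing, one option is to wrap it in a small retry loop. This is only a sketch under assumptions: `retry` is a hypothetical helper, the exponential backoff values are arbitrary, and catching every `Exception` is a simplification that a real run might want to narrow to network errors.

```python
import time


def retry(fn, attempts: int = 5, base_delay: float = 2.0, sleep=time.sleep):
    """Call fn() until it succeeds, waiting base_delay * 2**i between tries."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** i)
```

With the imports from the cell above, the download would become `retry(lambda: hf_hub_download(repo_id="openai/clip-vit-large-patch14", filename="pytorch_model.bin"))`. Files that are already fully cached are served from the cache, so repeating the call is cheap once it has succeeded.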