## Part 1: Test current image

In [1]:
!sudo docker images

REPOSITORY                  TAG       IMAGE ID       CREATED         SIZE
pakkinlau/gpt4all-wrapper   latest    d02965c65fcb   6 hours ago     19.7GB
hello-world                 latest    d2c94e258dcb   10 months ago   13.3kB


`main.py` on the host is expected to change often, while the Docker image should change much less frequently. To run the host's `main.py` inside the container without rebuilding the image, we mount the file into the container.

So we have the following command:

`!sudo docker run -v "$(pwd)/main.py:/app/main.py" pakkinlau/gpt4all-wrapper python /app/main.py`

- `-v "$(pwd)/main.py:/app/main.py"`: This option mounts the `main.py` file from the current directory into the container at `/app/main.py`. `$(pwd)` is a shell command substitution that expands to the current directory. Remark: this argument must be quoted. The current directory here contains spaces, so an unquoted path is split into several arguments, and Docker then reports an uppercase / whitespace error while parsing what it takes to be the image name.

- `pakkinlau/gpt4all-wrapper` is the name of the Docker image we want to run.

- (Optional) `python /app/main.py`: the command to run inside the container once it has started with the file mounted. If it is omitted, the image's default command is used.

The overall effect of this `docker run` command is:
- The `main.py` in the image is shadowed by the host's `main.py` given in the `-v` argument. The image itself is not modified.
- The container runs `/app/main.py` with Python, using the mounted host file.
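
The same invocation can also be assembled from Python as an argument list, which avoids shell quoting problems when the working directory contains spaces. This is a minimal sketch; `build_run_command` is a hypothetical helper, not part of Docker or of this project.

```python
import os


def build_run_command(host_file, container_path, image, command):
    """Return the argv list for `docker run` with one bind-mounted file."""
    # Docker expects an absolute host path for a bind mount.
    host_path = os.path.abspath(host_file)
    # Each list element reaches docker as a single argument, so spaces in
    # the host path need no extra quoting.
    mount = f"{host_path}:{container_path}"
    return ["sudo", "docker", "run", "-v", mount, image, *command]


if __name__ == "__main__":
    argv = build_run_command(
        "main.py",
        "/app/main.py",
        "pakkinlau/gpt4all-wrapper",
        ["python", "/app/main.py"],
    )
    print(argv)
```

To actually start the container, the resulting list can be passed to `subprocess.run(argv, check=True)`, which requires Docker and sudo on the host.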

In [10]:
!sudo docker run -v "$(pwd)/main.py:/app/main.py" pakkinlau/gpt4all-wrapper python /app/main.py


== CUDA ==

CUDA Version 12.3.2

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

Traceback (most recent call last):
  File "/app/main.py", line 2, in <module>
    model = GPT4All("./models/nous-hermes-llama2-13b.Q4_0.gguf")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/gpt4all/gpt4all.py", line 101, in __init__
    self.config: ConfigType = self.retrieve_m

In [7]:
!pwd

/home/researcher/Desktop/Project folder/EuropeChinaBigData/LLM setup procedure/Project - Running GPT4ALL


Analyzing the error:

Print current directory tree:

In [1]:
import os

def print_tree(directory, indent=''):
    files = os.listdir(directory)
    files.sort()  # Sort files alphabetically

    tree_string = ""  # Variable to store the tree string

    for file in files:
        path = os.path.join(directory, file)
        if os.path.isfile(path):
            # Append file name with proper indentation to the tree string
            tree_string += f"{indent}|-- {file}\n"
        elif os.path.isdir(path):
            # Append directory name with proper indentation to the tree string
            tree_string += f"{indent}|-- {file}/\n"
            # Recursive call with increased indentation
            tree_string += print_tree(path, indent + "|   ")

    return tree_string

# Generate the tree string
root_dir = os.getcwd()
tree_string = print_tree(root_dir)

# Print the tree string to the console
print(tree_string)

|-- Dockerfile
|-- Part A - Write GPT4ALL-wrapper.ipynb
|-- Part B - Testing the image.ipynb
|-- check_internet.py
|-- controller.ipynb
|-- main.py
|-- models/
|   |-- nous-hermes-llama2-13b.Q4_0.gguf



To inspect the docker image: 

(Run this in terminal, due to its interactive nature)

- `-it`: this option tells Docker to allocate a pseudo-TTY and keep STDIN open.
- `--rm`: remove the container automatically when its shell exits.
- `--name temp-container`: this option assigns a name to the running container, making it easier to reference.
- `bash`: this is the command that will be run inside the new container. Here it starts a bash shell.

In [None]:
!sudo docker run -it --rm --name temp-container pakkinlau/gpt4all-wrapper bash

In [None]:
root@da26a9e2e8f2:/app# ls -la
total 28
drwxr-xr-x 1 root root 4096 Mar  5 07:40  .
drwxr-xr-x 1 root root 4096 Mar  5 14:24  ..
-rw-rw-r-- 1 root root 3366 Mar  5 07:40  Dockerfile
-rw-rw-r-- 1 root root 6356 Mar  5 07:13 'Run GPT4ALL.ipynb'
-rw-rw-r-- 1 root root  211 Mar  4 23:03  main.py
drwxrwxr-x 2 root root 4096 Mar  4 21:29  models
root@da26a9e2e8f2:/app# ls -la /app/models/
total 7193432
drwxrwxr-x 2 root root       4096 Mar  4 21:29 .
drwxr-xr-x 1 root root       4096 Mar  5 07:40 ..
-rw-rw-r-- 1 root root 7366062080 Mar  4 20:50 nous-hermes-llama2-13b.Q4_0.gguf
root@da26a9e2e8f2:/app# 

There are a few possible causes:

Internet access problem:
- We can check this by using the image to run another Python script that accesses google.com.

Permission problem:
- In bash, adjust the file permissions and ownership with `chmod` and `chown`.
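
For the internet access check, a script along the following lines could be mounted and run in the container the same way as `main.py`. The contents of the project's `check_internet.py` are not shown here, so this is only a sketch of what such a check might look like; `can_reach` and its parameters are assumptions.

```python
import urllib.error
import urllib.request


def can_reach(url="https://www.google.com", timeout=5.0, opener=urllib.request.urlopen):
    """Return True if an HTTP request to `url` succeeds within `timeout` seconds."""
    try:
        with opener(url, timeout=timeout) as response:
            # Treat any 2xx or 3xx status as "reachable".
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, or no route to host.
        return False
```

Calling `print(can_reach())` inside the container then prints `True` or `False`. The `opener` parameter exists only so the function can be exercised without a network connection.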

## Part 2: Updating current image

In [1]:
!sudo docker images

REPOSITORY                  TAG       IMAGE ID       CREATED         SIZE
pakkinlau/gpt4all-wrapper   latest    d02965c65fcb   7 hours ago     19.7GB
hello-world                 latest    d2c94e258dcb   10 months ago   13.3kB


After editing the project folder and the Dockerfile, run `docker build` to update the image:

In [None]:
!sudo docker build -t pakkinlau/gpt4all-wrapper .

## Part 3: New proposal - Running Jupyter notebooks inside the container and committing the changes to the image

When we use `docker commit` to create a new image, Docker uses a layered filesystem. The new image shares layers with the old image, so only the changes made in the container take up additional storage.

To see the layers of an image and the size each one adds, run `docker history image_name:tag`.

When an old image is deleted, Docker's layered filesystem ensures that only the layers unique to that image are removed.
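
To add up the per-layer sizes that `docker history` reports, the human-readable size strings first have to be converted to bytes. The sketch below assumes the decimal units Docker prints (as in `13.3kB` and `19.7GB` in the `docker images` output above); `parse_size` and `total_layer_bytes` are hypothetical helpers.

```python
import re

# Docker prints sizes with decimal (power-of-1000) units, e.g. "13.3kB".
_UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text):
    """Convert a Docker size string such as '19.7GB' into a number of bytes."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([kMGT]?B)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return round(float(number) * _UNITS[unit])


def total_layer_bytes(layer_sizes):
    """Sum the sizes of all layers, given one size string per layer."""
    return sum(parse_size(size) for size in layer_sizes)
```

The size strings could be collected with something like `docker history --format "{{.Size}}" image_name:tag`, one line per layer. Because the printed sizes are rounded, the total is only approximate.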

We will discuss how to commit image changes below: 

Say the container is still running. We need to do the following steps:

- 1. Find the container ID: `docker ps` or `docker ps -a`

- 2. Commit the container: `docker commit container_id_or_name new_image_name:tag`

- 3. Stop the container: `docker stop container_id_or_name`
- 4. Run a new container from the new image: `docker run -p 8888:8888 new_image_name:tag`
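
The four steps above can be sketched as a list of commands generated from Python. `commit_workflow` is a hypothetical helper, and the container and image names are placeholders to be replaced with real values.

```python
import shlex


def commit_workflow(container, new_image, port=8888):
    """Return the docker commands for the commit-and-restart steps, in order."""
    return [
        ["docker", "ps", "-a"],                      # 1. find the container
        ["docker", "commit", container, new_image],  # 2. snapshot it as an image
        ["docker", "stop", container],               # 3. stop the old container
        ["docker", "run", "-p", f"{port}:{port}", new_image],  # 4. run the new image
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for argv in commit_workflow("container_id_or_name", "new_image_name:tag"):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to check the names before running them. Each list can then be passed to `subprocess.run`, prefixed with `sudo` if the host requires it.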

## Part 4: Implementing the change

I have now added `controller.ipynb` to the project folder:

In [2]:
# Print the tree string to the console
print(tree_string)

|-- Dockerfile
|-- Part A - Write GPT4ALL-wrapper.ipynb
|-- Part B - Testing the image.ipynb
|-- check_internet.py
|-- controller.ipynb
|-- main.py
|-- models/
|   |-- nous-hermes-llama2-13b.Q4_0.gguf



## Part 5: Short term maintenance: `docker commit`

Implement the following changes on the Docker image:

- Install Python 3.11
- Use a virtual environment to keep track of installed packages
- Run JupyterLab at the end to provide a controller environment
- Potentially use `docker commit` to update the image
- Potentially use `pip freeze > requirements.txt` to make the changes more traceable

The following are the major steps of updating a docker image: 

In [None]:
# step 1: update requirements.txt (run inside the container)
!pip freeze > requirements.txt

In [None]:
# step 2: copy the new requirements.txt to the host machine.
!sudo docker cp my_container:/path/to/requirements.txt /path/on/host/machine/requirements.txt
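
Once the new `requirements.txt` is on the host, comparing it with the previous version shows exactly which packages changed. A minimal sketch, assuming both files contain plain `name==version` lines as written by `pip freeze`; `parse_requirements` and `diff_requirements` are hypothetical helpers.

```python
def parse_requirements(text):
    """Map package name -> pinned version for 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries that are not simple pins
        # (for example editable installs or direct URLs).
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def diff_requirements(old_text, new_text):
    """Return (added, removed, changed) between two `pip freeze` outputs."""
    old = parse_requirements(old_text)
    new = parse_requirements(new_text)
    added = {n: v for n, v in new.items() if n not in old}
    removed = {n: v for n, v in old.items() if n not in new}
    changed = {n: (old[n], new[n]) for n in old.keys() & new.keys() if old[n] != new[n]}
    return added, removed, changed
```

`changed` maps each package to an `(old_version, new_version)` pair, which can be pasted into the `docker commit -m` message to document what the commit contains.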

Finally, to commit the change:

- `abc123`: the container ID. Run `docker ps` while the container is running to find it.
- `my-updated-image`: the repository name of the new image. For this project, it is `pakkinlau/gpt4all-wrapper`.
- `:v20`: `v20` is a custom version tag. If no tag is given, Docker uses the default tag `latest`.

In [None]:
!sudo docker commit -m "Installed new_package" abc123 my-updated-image:v20
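
The tag rule can be illustrated with a small Python helper that splits an image reference into repository and tag, falling back to `latest` when no tag is given. `split_image_ref` is a hypothetical function written for illustration; it does not handle digest references such as `image@sha256:...`.

```python
def split_image_ref(reference):
    """Split 'repo[:tag]' into (repository, tag), defaulting the tag to 'latest'."""
    # A colon only introduces a tag if it comes after the last '/'.
    # Otherwise it belongs to a registry host such as 'localhost:5000'.
    last_slash = reference.rfind("/")
    colon = reference.rfind(":")
    if colon > last_slash:
        return reference[:colon], reference[colon + 1:]
    return reference, "latest"
```

For example, `split_image_ref("my-updated-image:v20")` yields the repository `my-updated-image` with tag `v20`, while `split_image_ref("pakkinlau/gpt4all-wrapper")` yields the tag `latest`.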

Inside the Dockerfile, the following lines copy `requirements.txt` into the image and install the packages it lists:

```Dockerfile
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt
```

## Part 6: Long term maintenance: `docker build`

To create or update the image, we use the following command:

In [3]:
!sudo docker build -t pakkinlau/gpt4all-wrapper .

[1A[1B[0G[?25l[+] Building 0.0s (0/1)                                          docker:default
[?25h[1A[0G[?25l[+] Building 0.2s (1/2)                                          docker:default
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 3.47kB                                     0.0s
[0m => [internal] load metadata for docker.io/nvidia/cuda:12.3.2-cudnn9-runt  0.2s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (1/2)                                          docker:default
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 3.47kB                                     0.0s
[0m => [internal] load metadata for docker.io/nvidia/cuda:12.3.2-cudnn9-runt  0.3s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.5s (1/2)                                          docker:default
[34m => [internal] load build definition from Dockerfile     

# Building an image (6 March 2024): Spent 4 minutes.