[feature request] History of prompts #171

Open
geroldmeisinger opened this issue Jun 5, 2024 · 2 comments

Comments

geroldmeisinger commented Jun 5, 2024

It doesn't have to be per image, but at least a global history of prompts would be nice, ideally one that also saves the settings.

geroldmeisinger commented Jun 8, 2024

I would strongly argue for a history.jsonl file in the image directory BY DEFAULT, as it gives any published image dataset additional information on how the prompts came to be and which specific settings were used. If taggui catches on and more and more image datasets are published with this file, we get better insights, just like ComfyUI includes the workflow in every image.
Users who don't like it can "opt out" by simply deleting the file at any time.
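
A minimal sketch of what writing such a file could look like, assuming one JSON object per line and a file named history.jsonl directly inside the image directory. The function names `append_history_entry` and `load_last_settings` are made up for illustration and are not part of taggui.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Optional


def append_history_entry(image_dir: Path, entry: dict) -> Path:
    """Append one captioning run as a single JSON line to history.jsonl."""
    history_path = image_dir / "history.jsonl"
    # Stamp the entry with the current time unless the caller supplied a date.
    now = datetime.now().isoformat(sep=" ", timespec="seconds")
    record = {"date": now, **entry}
    with history_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return history_path


def load_last_settings(image_dir: Path) -> Optional[dict]:
    """Return the most recently appended entry, or None if there is no history."""
    history_path = image_dir / "history.jsonl"
    if not history_path.exists():
        return None
    text = history_path.read_text(encoding="utf-8")
    lines = [line for line in text.splitlines() if line.strip()]
    return json.loads(lines[-1]) if lines else None
```

Because every run is one appended line, deleting the file really is a complete opt-out: the next run would simply start a fresh history.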

geroldmeisinger commented Jun 8, 2024

some proposals for discussion:

process-centric

Every time you press start caption, a new entry is added with all the captioning settings, parameters, metadata and the image selection:

[
{
    date: datetime # 2023-12-31 12:34:56
    # meta
    model_id: str # THUDM/cogvlm2-llama3-chat-19B-int4
    model_hash: str # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    taggui_rev: str # 8d43352fa0eab65586108c806fdd80e8da5012c5
    # captioning
    prompt: str # "Can you please describe this image in up to two paragraphs?"
    caption_start: str # "This image showcases"
    ...
    # parameters
    min_new_tokens: int
    max_new_tokens: int
    ...
    # selection
    images: list[str] # [ "00000/000000001.jpg", "00000/000000005.jpg", ... ]
    # or
    images: tree[str] # { "00000": [ "000000001.jpg", "000000005.jpg", ...], "00001": [...] }
},
{
    date: datetime # 2023-12-31 12:00:00
    ...
}
]
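
The two `images` variants above differ only in how paths are grouped. A small sketch of converting the flat list into the tree form, assuming forward-slash relative paths as in the example comments; `to_tree` is a hypothetical helper name.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def to_tree(images: list[str]) -> dict[str, list[str]]:
    """Group relative image paths by their parent directory."""
    tree: dict[str, list[str]] = defaultdict(list)
    for image in images:
        path = PurePosixPath(image)
        # Images without a subdirectory end up under the key ".".
        tree[str(path.parent)].append(path.name)
    return dict(tree)


flat = ["00000/000000001.jpg", "00000/000000005.jpg", "00001/000000002.jpg"]
tree = to_tree(flat)
```

The tree form stores each directory name once instead of once per image, which only pays off when directories hold many images.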

pros:

  • just one file
  • easy to append
  • easy to load last settings
  • easy to remove unwanted information (only use last one)

cons:

  • saving the image selection EVERY TIME might get huge (20 chars * 100k images ~ 2 MB array every time the full 100k image dataset is captioned)
  • has to parse all entries to find the ones that apply to the selected image(s)
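
To make the second con concrete, here is a rough sketch of the lookup a process-centric history would need. It assumes the flat `images` list variant and the date format from the example comments; both function names are hypothetical.

```python
from typing import Optional


def entries_for_image(history: list[dict], image: str) -> list[dict]:
    """Linear scan: keep every entry whose selection contains the image."""
    return [entry for entry in history if image in entry.get("images", [])]


def latest_entry_for_image(history: list[dict], image: str) -> Optional[dict]:
    """Return the newest matching entry, or None if the image was never captioned."""
    matches = entries_for_image(history, image)
    # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text.
    return max(matches, key=lambda entry: entry["date"]) if matches else None
```

Every lookup touches every entry, and each membership test walks that entry's image list, so the cost grows with both the number of runs and the size of the selections.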

selection-centric

Every time the image selection changes, a new entry is added. Each captioning run is then appended to that entry's nested captionings list:

[
{
  # selection
  images: tree[str] # { "00000": [ "000000001.jpg", "000000005.jpg", ...], "00001": [...] }
  # processes
  captionings:
    [
      {
        date: datetime # 2023-12-31 12:34:56
        # meta
        model_id: str # THUDM/cogvlm2-llama3-chat-19B-int4
        model_hash: str # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
        taggui_rev: str # 8d43352fa0eab65586108c806fdd80e8da5012c5
        # captioning
        prompt: str # "Can you please describe this image in up to two paragraphs?"
        caption_start: str # "This image showcases"
        ...
        # parameters
        min_new_tokens: int
        max_new_tokens: int
        ...
      },
      {
        date: datetime # 2023-12-31 12:00:00
        ...
      }
    ]
},
{ # add different image selection
...
}
]

pros:

  • just one file
  • easy to load last setting

cons:

  • weird
  • size should not be an issue
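
One possible reading of the selection-centric update rule, sketched under the assumption that "the selection changed" means it differs from the selection of the most recent entry. A flat `images` list is used here for brevity, and `record_captioning` is a hypothetical name.

```python
def record_captioning(history: list[dict], images: list[str], captioning: dict) -> None:
    """Nest the run under the latest entry if the selection is unchanged,
    otherwise start a new entry for the new selection."""
    if history and sorted(history[-1]["images"]) == sorted(images):
        history[-1]["captionings"].append(captioning)
    else:
        history.append({"images": list(images), "captionings": [captioning]})


history: list[dict] = []
record_captioning(history, ["a.jpg", "b.jpg"], {"prompt": "first try"})
record_captioning(history, ["b.jpg", "a.jpg"], {"prompt": "second try"})
record_captioning(history, ["a.jpg"], {"prompt": "different selection"})
```

The selection is stored once per entry rather than once per run, which is why size stops being a concern when many prompts are tried on the same images.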

image-centric

Just save a .json next to each caption .txt and load the last settings from the first selected image:

[
{
    date: datetime # 2023-12-31 12:34:56
    # meta
    model_id: str # THUDM/cogvlm2-llama3-chat-19B-int4
    model_hash: str # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    taggui_rev: str # 8d43352fa0eab65586108c806fdd80e8da5012c5
    # captioning
    prompt: str # "Can you please describe this image in up to two paragraphs?"
    caption_start: str # "This image showcases"
    ...
    # parameters
    min_new_tokens: int
    max_new_tokens: int
    ...
},
{
    date: datetime # 2023-12-31 12:00:00
    ...
}
]

pros:

  • easily transferable

cons:

  • lots of files
  • no overview: you have to look into every file to see whether a different setting has been used
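
A sketch of the image-centric variant, assuming the sidecar shares the image's file stem (so 000000001.jpg gets 000000001.json beside 000000001.txt) and that new entries go at the end of the list. Both the naming and the ordering are assumptions, and the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def sidecar_path(image_path: Path) -> Path:
    """Return the .json path that sits next to the image and its caption .txt."""
    return image_path.with_suffix(".json")


def append_sidecar_entry(image_path: Path, entry: dict) -> None:
    """Add one captioning run to the image's own history file."""
    path = sidecar_path(image_path)
    entries = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    entries.append(entry)
    path.write_text(json.dumps(entries, indent=2), encoding="utf-8")


def load_last_sidecar_settings(selection: list[Path]) -> Optional[dict]:
    """Load the last settings from the first selected image, if it has any."""
    if not selection:
        return None
    path = sidecar_path(selection[0])
    if not path.exists():
        return None
    entries = json.loads(path.read_text(encoding="utf-8"))
    return entries[-1] if entries else None
```

Unlike appending a line to a shared file, each run rewrites one small file per captioned image, which is where the "lots of files" con comes from.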

Typical Workflows?

a) First you make a small selection and try a lot of prompts. Once you have the ideal prompt, you select all images and caption everything once. => prefer process-centric
