[HF][5/n] Image2Text: Allow base64 inputs for images #856

Merged
merged 5 commits into from
Jan 10, 2024

Commits on Jan 10, 2024

  1. [HF][streaming][1/n] Text Summarization

    TSIA
    
    Adds streaming functionality to the text summarization model parser.
    
    ## Test Plan
    Rebase onto and test it with 11ace0a.
    
    Follow the README for the AIConfig Editor (https://github.com/lastmile-ai/aiconfig/tree/main/python/src/aiconfig/editor#dev), then run these commands:
    ```bash
    aiconfig_path=/Users/rossdancraig/Projects/aiconfig/cookbooks/Gradio/huggingface.aiconfig.json
    parsers_path=/Users/rossdancraig/Projects/aiconfig/cookbooks/Gradio/hf_model_parsers.py
    alias aiconfig="python3 -m 'aiconfig.scripts.aiconfig_cli'"
    aiconfig edit --aiconfig-path=$aiconfig_path --server-port=8080 --server-mode=debug_servers --parsers-module-path=$parsers_path
    ```
    
    Then run the prompt in the AIConfig Editor (output streams by default).
    
    
    https://github.com/lastmile-ai/aiconfig/assets/151060367/e91a1d8b-a3e9-459c-9eb1-2d8e5ec58e73
    Rossdan Craig rossdan@lastmileai.dev committed Jan 10, 2024
    074b768
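The streaming behaviour described in this commit can be pictured as a consumer loop over text chunks. The sketch below is an illustration under assumptions, not the parser's actual code: `consume_stream` and `on_chunk` are hypothetical names, and the chunk source is any iterable of strings (in Transformers, a `TextIteratorStreamer` is one such iterable).

```python
from typing import Callable, Iterable, List


def consume_stream(chunks: Iterable[str], on_chunk: Callable[[str, str], None]) -> str:
    """Forward each text chunk to a callback and return the full text.

    `on_chunk` receives (delta, accumulated_so_far) for every non-empty chunk.
    """
    accumulated = ""
    for delta in chunks:
        if not delta:
            # A chunk source may emit empty strings between tokens; skip them.
            continue
        accumulated += delta
        on_chunk(delta, accumulated)
    return accumulated


# Usage with a plain list standing in for a real streamer (hypothetical data).
seen: List[str] = []
result = consume_stream(
    ["The ", "", "quick ", "summary."],
    lambda delta, acc: seen.append(delta),
)
```

Passing the accumulated text alongside each delta lets a UI callback either append the delta or redraw the whole output, whichever is simpler for the caller.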
  2. [HF][streaming][2/n] Text Translation

    TSIA
    
    Adding streaming output support to the text translation model parser. I also fixed a bug where we didn't pass the `"translation"` key into the pipeline.
    
    ## Test Plan
    Rebase onto and test it: 5b74344.
    
    Follow the README for the AIConfig Editor (https://github.com/lastmile-ai/aiconfig/tree/main/python/src/aiconfig/editor#dev), then run these commands:
    ```bash
    aiconfig_path=/Users/rossdancraig/Projects/aiconfig/cookbooks/Gradio/huggingface.aiconfig.json
    parsers_path=/Users/rossdancraig/Projects/aiconfig/cookbooks/Gradio/hf_model_parsers.py
    alias aiconfig="python3 -m 'aiconfig.scripts.aiconfig_cli'"
    aiconfig edit --aiconfig-path=$aiconfig_path --server-port=8080 --server-mode=debug_servers --parsers-module-path=$parsers_path
    ```
    
    With Streaming
    
    https://github.com/lastmile-ai/aiconfig/assets/151060367/d7bc9df2-2993-4709-bf9b-c5b7979fb00f
    
    Without Streaming
    
    https://github.com/lastmile-ai/aiconfig/assets/151060367/71eb6ab3-5d6f-4c5d-8b82-f3daf4c5e610
    Rossdan Craig rossdan@lastmileai.dev committed Jan 10, 2024
    13a4c6e
  3. [HF][streaming][3/n] Text2Speech (no streaming, but updating docs on completion params)
    
    OK, this one is weird. Today, the Transformers library only supports streaming for text outputs. See `BaseStreamer` here: https://github.com/search?q=repo%3Ahuggingface%2Ftransformers%20BaseStreamer&type=code
    
    It may support other formats in the future, but not yet. OpenAI, for example, supports it: https://community.openai.com/t/streaming-from-text-to-speech-api/493784
    
    Anyway, all I did here was update the docs to clarify why the completion params are null. Jonathan and I synced about this briefly offline, but I forgot again, so I wanted to capture it here so no one forgets.
    Rossdan Craig rossdan@lastmileai.dev committed Jan 10, 2024
    1f161e5
  4. [HF][streaming][4/n] Image2Text (no streaming, but lots of fixing)

    This model parser does not support streaming (surprising!):
    
    ```
    TypeError: ImageToTextPipeline._sanitize_parameters() got an unexpected keyword argument 'streamer'
    ```
    
    Mostly I fixed things up to make sure this works as expected. Things I fixed:
    
    1. Now works for multiple images (it did before, but it didn't process the response for each image properly; it just output the entire response)
    2. Responses are now constructed as pure text output
    3. Specified the completion params that are supported (only 2: https://github.com/huggingface/transformers/blob/701298d2d3d5c7bde45e71cce12736098e3f05ef/src/transformers/pipelines/image_to_text.py#L97-L102C13)
    
    In the next diff I will add support for base64-encoded images. We need to convert these to a PIL image, see https://github.com/huggingface/transformers/blob/701298d2d3d5c7bde45e71cce12736098e3f05ef/src/transformers/pipelines/image_to_text.py#L83
    
    ## Test Plan
    Rebase onto and test it: 5f3b667.
    
    Follow the README for the AIConfig Editor (https://github.com/lastmile-ai/aiconfig/tree/main/python/src/aiconfig/editor#dev), then run these commands:
    ```bash
    aiconfig_path=/Users/rossdancraig/Projects/aiconfig/cookbooks/Gradio/huggingface.aiconfig.json
    parsers_path=/Users/rossdancraig/Projects/aiconfig/cookbooks/Gradio/hf_model_parsers.py
    alias aiconfig="python3 -m 'aiconfig.scripts.aiconfig_cli'"
    aiconfig edit --aiconfig-path=$aiconfig_path --server-port=8080 --server-mode=debug_servers --parsers-module-path=$parsers_path
    ```
    
    Then run the prompt in the AIConfig Editor (streaming is not supported for this model, so I just took screenshots).
    
    These are the images I tested
    ![fox_in_forest](https://github.com/lastmile-ai/aiconfig/assets/151060367/ca7d1723-9e12-4cc8-9d8d-41fa9f466919)
    ![trex](https://github.com/lastmile-ai/aiconfig/assets/151060367/2f556ead-a808-4aea-9378-a2537c715e1f)
    
    Before
    <img width="1268" alt="Screenshot 2024-01-10 at 04 00 22" src="https://github.com/lastmile-ai/aiconfig/assets/151060367/4426f2b9-0b83-48e2-8af1-865f157ae12c">
    
    After
    <img width="1277" alt="Screenshot 2024-01-10 at 04 02 01" src="https://github.com/lastmile-ai/aiconfig/assets/151060367/2ed172a8-ed26-4c1b-9a9e-5c240376a278">
    Rossdan Craig rossdan@lastmileai.dev committed Jan 10, 2024
    19d7844
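The per-image response handling described in fixes 1 and 2 of this commit could look roughly like the sketch below. It rests on assumptions: the pipeline response is taken to be either a list of `{"generated_text": ...}` dicts (one image) or a list of such lists (several images), and `extract_captions` is a hypothetical helper, not the parser's real function.

```python
from typing import Any, Dict, List, Union

# Assumed response shapes: one list of dicts, or one list of dicts per image.
Response = Union[List[Dict[str, Any]], List[List[Dict[str, Any]]]]


def extract_captions(response: Response) -> List[str]:
    """Return one plain-text output per top-level response item."""
    captions: List[str] = []
    for item in response:
        # Normalise: a single-image response yields dicts directly,
        # a multi-image response yields one list of dicts per image.
        candidates = item if isinstance(item, list) else [item]
        # Join several candidates for the same image into one text output.
        captions.append("\n".join(c["generated_text"].strip() for c in candidates))
    return captions


# Hypothetical responses for illustration only.
single = extract_captions([{"generated_text": "a fox in a forest "}])
multi = extract_captions(
    [[{"generated_text": "a fox in a forest"}], [{"generated_text": "a dinosaur toy"}]]
)
```

Returning a flat list of strings keeps each image's output separate, which is the behaviour the "Before" and "After" screenshots contrast.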
  5. [HF][5/n] Image2Text: Allow base64 inputs for images

    Previously we only allowed URIs (local paths, http, or https), not base64. Allowing base64 is useful because our text2Image model parser outputs base64, so this lets us chain model prompts!
    
    ## Test Plan
    
    Rebase and test on 0d7ae2b.
    
    Follow the README for the AIConfig Editor (https://github.com/lastmile-ai/aiconfig/tree/main/python/src/aiconfig/editor#dev), then run these commands:
    ```bash
    aiconfig_path=/Users/rossdancraig/Projects/aiconfig/cookbooks/Gradio/huggingface.aiconfig.json
    parsers_path=/Users/rossdancraig/Projects/aiconfig/cookbooks/Gradio/hf_model_parsers.py
    alias aiconfig="python3 -m 'aiconfig.scripts.aiconfig_cli'"
    aiconfig edit --aiconfig-path=$aiconfig_path --server-port=8080 --server-mode=debug_servers --parsers-module-path=$parsers_path
    ```
    
    Then run the prompt in the AIConfig Editor (streaming is not supported, so I just took screenshots).
    
    These are the images I tested (the bear is in base64 format):
    ![fox_in_forest](https://github.com/lastmile-ai/aiconfig/assets/151060367/ca7d1723-9e12-4cc8-9d8d-41fa9f466919)
    ![bear-eating-honey](https://github.com/lastmile-ai/aiconfig/assets/151060367/a947d89e-c02a-4c64-8183-ff1c85802859)
    
    <img width="1281" alt="Screenshot 2024-01-10 at 04 57 44" src="https://github.com/lastmile-ai/aiconfig/assets/151060367/ea60cbc5-e6ab-4bf2-82e7-17f3182fdc5c">
    Rossdan Craig rossdan@lastmileai.dev committed Jan 10, 2024
    fa9d88a
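One way to tell a base64 payload apart from a URI before handing it to the pipeline is sketched below. This is a minimal illustration under assumptions, not the code in this PR: `decode_if_base64` and `to_pil_image` are hypothetical helpers, the data-URI handling is an added convenience, and `to_pil_image` assumes Pillow is installed.

```python
import base64
import binascii
import io
import os
from typing import Optional
from urllib.parse import urlparse


def decode_if_base64(value: str) -> Optional[bytes]:
    """Return decoded bytes if `value` is valid base64, else None.

    http/https URIs and existing local paths return None so the caller
    can pass them to the pipeline unchanged.
    """
    if urlparse(value).scheme in ("http", "https") or os.path.exists(value):
        return None
    # Tolerate a data-URI prefix such as "data:image/png;base64,".
    if value.startswith("data:") and "," in value:
        value = value.split(",", 1)[1]
    try:
        return base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return None


def to_pil_image(raw: bytes):
    """Convert raw image bytes to a PIL image (requires Pillow)."""
    from PIL import Image  # imported lazily so the helper above stays stdlib-only

    return Image.open(io.BytesIO(raw))


# Hypothetical input: the 8-byte PNG signature, base64-encoded.
png_header = b"\x89PNG\r\n\x1a\n"
encoded = base64.b64encode(png_header).decode("ascii")
decoded = decode_if_base64(encoded)
```

Short strings made only of base64 characters are ambiguous (a nonexistent file name could decode successfully), so a real implementation may want an explicit input type rather than relying on detection alone.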