
API 404 (Not Found)? #3448

Closed
1 task done
N0THSA opened this issue Aug 4, 2023 · 17 comments
Labels: bug (Something isn't working), stale

Comments

N0THSA commented Aug 4, 2023

Describe the bug

Using the API Chat example and Text Generation examples (with correctly configured host/URI endpoints), I get no output and no generation at all. Worth noting that I am using Runpod for generation.


import json
import requests

# For local streaming, the websockets are hosted without ssl - http://
HOST = 'removed'
URI = f'http://{HOST}/api/v1/chat'

# For reverse-proxied streaming, the remote will likely host with ssl - https://
# URI = 'https://your-uri-here.trycloudflare.com/api/v1/chat'


def run(user_input, history):
    request = {
        'user_input': user_input,
        'max_new_tokens': 250,
        'auto_max_new_tokens': False,
        'history': history,
        'mode': 'chat',  # Valid options: 'chat', 'chat-instruct', 'instruct'
        'character': 'None',
        'instruction_template': 'Vicuna-v1.1',  # Will get autodetected if unset
        'your_name': 'You',
        # 'name1': 'name of user', # Optional
        # 'name2': 'name of character', # Optional
        # 'context': 'character context', # Optional
        # 'greeting': 'greeting', # Optional
        # 'name1_instruct': 'You', # Optional
        # 'name2_instruct': 'Assistant', # Optional
        # 'context_instruct': 'context_instruct', # Optional
        # 'turn_template': 'turn_template', # Optional
        'regenerate': False,
        '_continue': False,
        'stop_at_newline': False,
        'chat_generation_attempts': 1,
        'chat-instruct_command': 'Continue the chat dialogue below. Write a single reply for the character "<|character|>".\n\n<|prompt|>',

        # Generation params. If 'preset' is set to anything other than 'None',
        # the values in presets/preset-name.yaml are used instead of the individual numbers.
        'preset': 'None',
        'do_sample': True,
        'temperature': 0.7,
        'top_p': 0.1,
        'typical_p': 1,
        'epsilon_cutoff': 0,  # In units of 1e-4
        'eta_cutoff': 0,  # In units of 1e-4
        'tfs': 1,
        'top_a': 0,
        'repetition_penalty': 1.18,
        'repetition_penalty_range': 0,
        'top_k': 40,
        'min_length': 0,
        'no_repeat_ngram_size': 0,
        'num_beams': 1,
        'penalty_alpha': 0,
        'length_penalty': 1,
        'early_stopping': False,
        'mirostat_mode': 0,
        'mirostat_tau': 5,
        'mirostat_eta': 0.1,

        'seed': -1,
        'add_bos_token': True,
        'truncation_length': 2048,
        'ban_eos_token': False,
        'skip_special_tokens': True,
        'stopping_strings': []
    }

    response = requests.post(URI, json=request)

    if response.status_code == 200:
        result = response.json()['results'][0]['history']
        print(json.dumps(result, indent=4))
        print()
        print(result['visible'][-1][1])
    else:
        # Surface failures instead of silently printing nothing.
        print(f'Request failed: {response.status_code} {response.reason}')


if __name__ == '__main__':
    user_input = "Please give me a step-by-step guide on how to plant a tree in my backyard."

    # Basic example
    history = {'internal': [], 'visible': []}

    # "Continue" example. Make sure to set '_continue' to True above
    # arr = [user_input, 'Surely, here is']
    # history = {'internal': [arr], 'visible': [arr]}

    run(user_input, history)

HTTPS is not enabled on the server. Navigating to the endpoint returns a Not Found error.
[screenshot: the browser shows "Not Found" at the endpoint]

Any help is appreciated.

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

  1. Get the example Chat API Python 3 file
  2. Configure it to point to your endpoint
  3. Try to make a request
  4. Get no response

Screenshot

No response

Logs

2023-08-04T04:37:09.884895096-04:00 
2023-08-04T04:37:09.885169480-04:00 ==========
2023-08-04T04:37:09.885192488-04:00 == CUDA ==
2023-08-04T04:37:09.885371405-04:00 ==========
2023-08-04T04:37:09.890025166-04:00 
2023-08-04T04:37:09.892400796-04:00 
2023-08-04T04:37:09.892416713-04:00 Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2023-08-04T04:37:09.892421072-04:00 
2023-08-04T04:37:09.892423800-04:00 This container image and its contents are governed by the NVIDIA Deep Learning Container License.
2023-08-04T04:37:09.892426403-04:00 By pulling and using the container, you accept the terms and conditions of this license:
2023-08-04T04:37:09.892429052-04:00 https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
2023-08-04T04:37:09.892431782-04:00 
2023-08-04T04:37:09.892434433-04:00 A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
2023-08-04T04:37:09.903864299-04:00 
2023-08-04T04:37:09.905810941-04:00 TheBloke's Local LLMs: Pod started
2023-08-04T04:37:09.927989278-04:00  * Starting OpenBSD Secure Shell server sshd
2023-08-04T04:37:09.939610496-04:00    ...done.
2023-08-04T04:37:10.213090764-04:00 Already up to date.
2023-08-04T04:37:10.473857243-04:00 Already up to date.

(logs were removed and aren't recoverable)

System Info

Using Runpod. RTX 3080, 16 vCPUs, 125 GB RAM, 12 GB VRAM.
N0THSA added the bug label Aug 4, 2023
@Vincent-Stragier

Looks like a port issue. Did you configure anything to route the request to port 5000, where the API listens?

lanlanji commented Aug 4, 2023

I am also getting a 404 even though I modified the HOST variable to match the webui port. E.g., my webui host is localhost:7860, so in the example code I set HOST = 'localhost:7860'. Printing response.status_code gives 404. If I use port 5000 instead (HOST = 'localhost:5000'), I get a connection refused error.

@Vincent-Stragier

How are you starting the webui?

You have to explicitly start the API extension (see #3219 (comment)).
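
For reference, a quick way to check which service you are actually reaching (a minimal sketch; it assumes the legacy blocking API on its default port 5000 and its GET /api/v1/model route). A 404 on the Gradio port (7860) means the request reached the UI server rather than the API; a connection refused on port 5000 means the API extension was never started.

import requests

# Probe the blocking API's model endpoint (default port 5000 is an assumption).
try:
    r = requests.get('http://localhost:5000/api/v1/model', timeout=10)
    print(r.status_code, r.text)  # 200 with the model name means the API is up
except requests.exceptions.ConnectionError as exc:
    print('Nothing listening on port 5000:', exc)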

jllllll (Contributor) commented Aug 4, 2023

TheBloke has a runpod template specifically for using the API: TheBloke Local LLMs One-Click UI and API

@Vincent-Stragier

@jllllll here?

jllllll (Contributor) commented Aug 4, 2023

@jllllll here?

Yeah

N0THSA (Author) commented Aug 5, 2023

Looks like a port issue. Did you specify somehow to redirect the request to the port 5000 on the API?

Not a port issue. Confirmed.

I started the webui with the --api flag, of course, made sure nothing was being blocked, and made sure I can connect to /api/v1. /api didn't work.

@Vincent-Stragier

I don't know how Runpod works (I have a server with two RTX 4090s at work). Personally, I use the one-click installer and run it with the options below. --listen makes the server accept requests from external IPs (in my case it is not strictly needed, since I then use Ngrok to reverse-proxy the API endpoint, but it lets me access the UI and APIs from the local network). You could do something similar to test the API.

webui.py --extensions api --loader <the model loader> --model <the model you want to load> --verbose --listen &
# Add your AuthToken
ngrok config add-authtoken <your_auth_token>
ngrok http --domain=<my-ngrok-domain.ngrok-free.app> 5000

Note: it would be better to use screen and start each service in its own screen session.

I installed the Python Ngrok client with python3 -m pip install pyngrok (mainly because you don't need root access that way), configured an Ngrok account, generated the AuthToken, and added it (https://dashboard.ngrok.com/get-started/your-authtoken). For the domain, I do not remember how I generated it, but you will find yours at https://dashboard.ngrok.com/cloud-edge/domains. To discourage the creation of phishing portals, Ngrok requires you to add a header to your requests:

    # This is the code I use to do my API requests; it needs to be adapted before
    # being used in your test client. It assumes `import requests`,
    # `from requests.auth import HTTPBasicAuth`, and a REQUEST_TIMEOUT constant.
    def api_request(self, request: dict) -> requests.Response:
        """Send a request to OobaBooga.

        Args:
            request (dict): the request.

        Returns:
            requests.Response: the response.
        """

        request_params = {
            # url = "http://127.0.0.1:5000/api/v1/generate"
            # or url = "https://domain.com:443/api/v1/generate"
            "url": self.url,
            "json": request,
            "headers": {"ngrok-skip-browser-warning": "true"},
            "timeout": REQUEST_TIMEOUT,
        }
        
        # When starting Ngrok you can add basic auth with this flag:
        # --basic-auth 'username:password'
        if self.basic_auth:
            request_params.update(
                auth=HTTPBasicAuth(self.username, self.password)
            )

        return requests.post(**request_params)
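
For completeness, a self-contained variant of the same idea as a plain function (a sketch; the URL, timeout value, and credentials are placeholders, not part of the snippet above):

import requests
from requests.auth import HTTPBasicAuth

REQUEST_TIMEOUT = 30  # seconds; an assumed value

def api_request(url: str, request: dict,
                username: str = '', password: str = '') -> requests.Response:
    """Send a request to the webui API behind Ngrok."""
    params = {
        'url': url,
        'json': request,
        # This header skips Ngrok's anti-phishing interstitial page.
        'headers': {'ngrok-skip-browser-warning': 'true'},
        'timeout': REQUEST_TIMEOUT,
    }
    if username:
        # Matches starting Ngrok with --basic-auth 'username:password'
        params['auth'] = HTTPBasicAuth(username, password)
    return requests.post(**params)

resp = api_request('https://my-domain.ngrok-free.app/api/v1/generate',
                   {'prompt': 'Hello', 'max_new_tokens': 32})
print(resp.status_code, resp.text)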

That way, you can test the webui API endpoint without configuring any port forwarding. If you open the Ngrok URL in a browser, you will get a 404 error:

[screenshot: browser showing a 404 at the Ngrok URL]

You will not see anything in the browser, but the server will receive the requests (here I started the webui on my laptop under Windows, but the behaviour is the same on Linux):

[screenshot: server console logging the incoming API requests]

N0THSA (Author) commented Aug 6, 2023

I currently do not have any Runpod tokens, but I will buy some as soon as possible to test this. Honestly, I think it might be because I forgot the --listen parameter while trying to connect from an external machine.

nutheory commented Aug 6, 2023

I tried it with the listen parameter and many other variations. I think it might just be the Docker version, since that's what I'm using and I'm pretty sure that's what Runpod uses.

tjb4578 commented Aug 6, 2023

Running the API on localhost.

I get a response for the generate endpoint:

http://localhost:5000/api/v1/generate

{
    "prompt": "Hey can you hear me?",
    "max_new_tokens": "64",
    "auto_max_new_tokens": "False",
    "history": {
        "internal": [],
        "visible": []
    },
    "mode": "instruct",
    "character": "Example",
    "instruction_template": "Vicuna-v1.1",
    "your_name": "You",
    "regenerate": "False",
    "_continue": "False",
    "stop_at_newline": "False",
    "chat_generation_attempts": 1,
    "chat-instruct_command": "Continue the chat dialogue below. Write a single reply for the character '<|character|>'.\n\n<|prompt|>",
    "preset": "None",
    "do_sample": "True",
    "temperature": 0.7,
    "top_p": 0.1,
    "typical_p": 1,
    "epsilon_cutoff": 0, 
    "eta_cutoff": 0,  
    "tfs": 1,
    "top_a": 0,
    "repetition_penalty": 1.18,
    "repetition_penalty_range": 0,
    "top_k": 40,
    "min_length": 0,
    "no_repeat_ngram_size": 0,
    "num_beams": 1,
    "penalty_alpha": 0,
    "length_penalty": 1,
    "early_stopping": "False",
    "mirostat_mode": 0,
    "mirostat_tau": 5,
    "mirostat_eta": 0.1,
    "seed": -1,
    "add_bos_token": "True",
    "truncation_length": 2048,
    "ban_eos_token": "False",
    "skip_special_tokens": "True",
    "stopping_strings": []
}

This yields:

{
    "results": [
        {
            "text": "\nI'm in a quiet room with no background noise. I want to record myself speaking, but without any background noise interfering with the audio quality. Is there anyway for me to do this on my own computer or would it be better off doing it at a professional recording studio? Also, how can i"
        }
    ]
}

When I try the chat endpoint (also changing prompt to user_input), the response comes back instantly and is empty:

{
    "results": [
        {
            "history": {
                "internal": [],
                "visible": []
            }
        }
    ]
}

Any ideas why the chat endpoint isn't generating anything?

nutheory commented Aug 6, 2023

I have a manually installed ooba version on localhost (M2 MacBook Pro) that works perfectly fine. It's my Docker install on my Lambda Labs server that's broken... both were recently updated.

jllllll (Contributor) commented Aug 6, 2023

@tjb4578 Don't put quotes around True or False.
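
To spell that out: these flags are JSON booleans, so a quoted "False" arrives as a string, and a non-empty string is truthy once it reaches the Python side, which presumably flips the behaviour you intended. For example (field name taken from the request above):

request = {
    'auto_max_new_tokens': False,      # correct: a real boolean
    # 'auto_max_new_tokens': 'False',  # wrong: a non-empty string, which is truthy
}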

@Vincent-Stragier

Hi @tjb4578,

Personally, I use generate exclusively, since I handle the prompt and the history myself. Be careful with the parameters you are using, though: for example, setting character to Example will load the Example character, which will affect the generation (since its context is added to the prompt in chat mode, I believe).
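
For reference, a minimal generate call in that spirit (a sketch; it assumes the blocking API on its default port 5000, and that parameters left out fall back to server-side defaults):

import requests

response = requests.post(
    'http://localhost:5000/api/v1/generate',
    json={'prompt': 'Hey, can you hear me?', 'max_new_tokens': 64},
    timeout=60,
)
print(response.json()['results'][0]['text'])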

tjb4578 commented Aug 6, 2023

@tjb4578 Don't put quotes around True or False.

Thanks, this was my issue!

N0THSA (Author) commented Aug 7, 2023

I have a manually installed ooba version on localhost (M2 Macbookpro) that works perfectly fine, Its my docker install on my lambdalabs server thats broken... both were recently updated.

I've seen multiple people with the same issue (or who tested with my setup), and all of the broken ones are on Docker in particular, no matter the actual container image... weird.

github-actions bot added the stale label Sep 18, 2023
github-actions bot commented

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
