Author: [Zoumana KEITA](https://www.linkedin.com/in/zoumana-keita/)

In [1]:
!pip -q install vllm

In [1]:
from vllm import LLM, SamplingParams

In [2]:
prompts = ["Abidjan is located in",
           "A Data Scientist is a person who",
           "The future of agriculture in Africa is"]  # Sample prompts.

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=50)
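`SamplingParams` controls decoding. `temperature` rescales the model's logits before sampling. `top_p` restricts sampling to the smallest set of most-likely tokens whose cumulative probability reaches `top_p` (nucleus sampling). `max_tokens` caps the length of each completion. The pure-Python sketch below illustrates the first two on a made-up four-token logit vector. It is a toy model of the technique, not vLLM's actual sampler, which works on GPU tensors over the full vocabulary.

```python
import math
import random

def temperature_top_p(logits, temperature=0.8, top_p=0.95):
    """Return {token_index: probability} after temperature scaling and
    nucleus (top-p) filtering, renormalized over the surviving tokens."""
    # Temperature: divide logits before the softmax; values < 1.0 sharpen it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p: keep the most likely tokens until their mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

# Toy logits for a hypothetical 4-token vocabulary.
dist = temperature_top_p([2.0, 1.0, 0.5, -3.0])
# Token 3 (logit -3.0) falls outside the 0.95 nucleus and is dropped.
next_token = random.choices(list(dist), weights=list(dist.values()))[0]
```

Lowering `temperature` pushes more probability onto the top token, and lowering `top_p` shrinks the candidate set. Both make the generations above less varied.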

In [3]:
llm = LLM(model="facebook/opt-125m")

INFO 08-17 02:04:45 llm_engine.py:70] Initializing an LLM engine with config: model='facebook/opt-125m', tokenizer='facebook/opt-125m', tokenizer_mode=auto, trust_remote_code=False, dtype=torch.float16, use_dummy_weights=False, download_dir=None, use_np_weights=False, tensor_parallel_size=1, seed=0)
INFO 08-17 02:04:50 llm_engine.py:196] # GPU blocks: 22217, # CPU blocks: 7281
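The `# GPU blocks` line refers to vLLM's paged KV cache. Key/value tensors are stored in fixed-size blocks, and at startup the engine reports how many blocks fit in the GPU memory it reserves and in its CPU swap space. The back-of-the-envelope arithmetic below converts those counts into tokens and bytes. Only the block counts come from the log. The block size of 16 tokens (assumed to be vLLM's default; the log does not print it) and OPT-125m's 12 layers with hidden size 768 are assumptions. `dtype=torch.float16` in the log gives 2 bytes per value.

```python
# Block counts are taken from the log line above; the rest are assumptions.
GPU_BLOCKS = 22217
CPU_BLOCKS = 7281
BLOCK_SIZE = 16        # tokens per block (assumed vLLM default)
NUM_LAYERS = 12        # OPT-125m
HIDDEN_SIZE = 768      # OPT-125m
BYTES_PER_VALUE = 2    # float16

# Each cached token stores one key and one value vector in every layer.
bytes_per_token = 2 * NUM_LAYERS * HIDDEN_SIZE * BYTES_PER_VALUE
bytes_per_block = BLOCK_SIZE * bytes_per_token

gpu_tokens = GPU_BLOCKS * BLOCK_SIZE
gpu_gib = GPU_BLOCKS * bytes_per_block / 2**30
cpu_gib = CPU_BLOCKS * bytes_per_block / 2**30

print(f"{bytes_per_token} B/token, {gpu_tokens} tokens on GPU")
print(f"GPU cache ~{gpu_gib:.1f} GiB, CPU swap ~{cpu_gib:.1f} GiB")
# → 36864 B/token, 355472 tokens on GPU
# → GPU cache ~12.2 GiB, CPU swap ~4.0 GiB
```

Under these assumptions the GPU cache holds roughly 355k tokens. The 7281 CPU blocks come to about 4.0 GiB, which is consistent with what we understand to be vLLM's default `swap_space` of 4 GiB.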


In [4]:
outputs = llm.generate(prompts, sampling_params)  # Generate texts from the prompts.

Processed prompts: 100%|██████████| 3/3 [00:00<00:00,  4.92it/s]


In [5]:
import textwrap
wrapper = textwrap.TextWrapper(width=60)

In [6]:
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt}\nGenerated text: {wrapper.fill(generated_text)}")
    print('---'*5)

Prompt: Abidjan is located in
Generated text:  the north of the country on the banks of the Karaba river.
It is a UNESCO World Heritage site and is the most visited
tourist destination in the north of the country.  History
Abidjan’s origins  The
---------------
Prompt: A Data Scientist is a person who
Generated text:  is passionate about data science. I am a Data Scientist for
6 years now and in between working in data science and
programming, I am a Data Scientist.  It is the truth.  I am
in my first year in Data Science
---------------
Prompt: The future of agriculture in Africa is
Generated text:  changing, according to a new report.  A recent report by a
ZN World Affairs Council based on data from the African
Development Bank (ADB) shows that one in four producers of
agriculture globally will be changing their local and
international production patterns
---------------


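# To use vLLM for online serving, start its OpenAI API-compatible server

The command below launches the server in the background (`&`) on port 8000 and then uses localtunnel (`npx localtunnel --port 8000`) to expose that port at a public URL: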



In [None]:
!python -m vllm.entrypoints.openai.api_server --host 127.0.0.1 --port 8000 --model facebook/opt-125m & npx localtunnel --port 8000

[K[?25hnpx: installed 22 in 2.107s
your url is: https://tricky-taxis-smash.loca.lt
INFO 08-17 03:59:51 llm_engine.py:70] Initializing an LLM engine with config: model='facebook/opt-125m', tokenizer='facebook/opt-125m', tokenizer_mode=auto, trust_remote_code=False, dtype=torch.float16, use_dummy_weights=False, download_dir=None, use_np_weights=False, tensor_parallel_size=1, seed=0)
INFO 08-17 03:59:58 llm_engine.py:196] # GPU blocks: 22217, # CPU blocks: 7281
[32mINFO[0m:     Started server process [[36m103011[0m]
[32mINFO[0m:     Waiting for application startup.
[32mINFO[0m:     Application startup complete.
[32mINFO[0m:     Uvicorn running on [1mhttp://127.0.0.1:8000[0m (Press CTRL+C to quit)
INFO 08-17 04:00:11 async_llm_engine.py:117] Received request cmpl-fae223cf2fb54f63adfd194fc074a3d8: prompt: 'Abidjan is located in', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=0.8, top_p=0.95, top_k=-1, use_beam_search=F
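Because the trailing `&` sends the server to the background, the cell moves on to localtunnel immediately. Meanwhile, the log shows engine initialization alone taking several seconds (03:59:51 to 03:59:58) before Uvicorn starts. Requests sent before `Application startup complete.` appears cannot be served. The standard-library sketch below polls the port until something accepts connections. `wait_for_port` is a hypothetical helper, not part of vLLM. An open port only shows that a listener exists, so treat this as a coarse readiness check.

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For example, `wait_for_port("127.0.0.1", 8000)` returns `True` once the server accepts TCP connections, or `False` after two minutes.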

In [8]:
!curl https://tricky-taxis-smash.loca.lt/v1/completions \
-H "Content-Type: application/json" \
-d '{"model": "facebook/opt-125m","prompt": "Abidjan is located in", "max_tokens": 50, "temperature": 0.8}'

404

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   105  100     3  100   102      7    260 --:--:-- --:--:-- --:--:--   268
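The tunneled `curl` request above returned only `404` (3 bytes received, per curl's progress meter). To check whether the vLLM server itself answers, one option is to query it on the same machine at the address it is bound to, `127.0.0.1:8000`, which bypasses localtunnel. The sketch below sends the same JSON payload as the `curl` call using only the standard library. The helper names are hypothetical. It assumes the OpenAI-style completion response layout, where generated text sits under `choices[i]["text"]`.

```python
import json
import urllib.request

# Local address the server was started on; replace with the tunnel URL to go through localtunnel.
BASE_URL = "http://127.0.0.1:8000"

def build_completion_request(base_url, prompt, model="facebook/opt-125m",
                             max_tokens=50, temperature=0.8):
    """Build a POST request to /v1/completions with the same payload as the curl call."""
    payload = {"model": model, "prompt": prompt,
               "max_tokens": max_tokens, "temperature": temperature}
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def first_completion_text(response_body: bytes) -> str:
    """Extract the first completion from an OpenAI-style JSON response body."""
    return json.loads(response_body)["choices"][0]["text"]

# Usage (requires the server from the cell above to be running):
# req = build_completion_request(BASE_URL, "Abidjan is located in")
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(first_completion_text(resp.read()))
```

If this local request succeeds while the tunneled one still returns `404`, the problem likely lies in the tunnel rather than in vLLM.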
