## class LLMEngine

An LLM engine that receives requests and generates text.

This is the main class for the vLLM engine. It receives requests
from clients and generates text from the LLM. It includes a tokenizer, a
language model (possibly distributed across multiple GPUs), and GPU memory
space allocated for intermediate states (the KV cache). This class uses
iteration-level scheduling and efficient memory management to maximize
serving throughput.
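
To make "iteration-level scheduling" concrete, here is a minimal pure-Python sketch. It is a toy model, not vLLM's scheduler: `ToyEngine`, `ToyRequest`, `max_batch_size`, and `run_to_completion` are hypothetical names introduced only for illustration, and "tokens" are placeholder strings. The assumption it illustrates is that the batch is re-formed before every decoding iteration, so a finished request frees its slot immediately instead of waiting for the whole batch to complete.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyRequest:
    """Hypothetical request record; not a vLLM type."""
    request_id: str
    max_tokens: int
    generated: list = field(default_factory=list)

    @property
    def finished(self) -> bool:
        return len(self.generated) >= self.max_tokens


class ToyEngine:
    """Toy stand-in for an engine that schedules once per iteration."""

    def __init__(self, max_batch_size: int) -> None:
        self.max_batch_size = max_batch_size
        self.waiting = deque()  # requests not yet admitted to the batch
        self.running = []       # requests in the current batch

    def add_request(self, request_id: str, max_tokens: int) -> None:
        self.waiting.append(ToyRequest(request_id, max_tokens))

    def has_unfinished_requests(self) -> bool:
        return bool(self.waiting or self.running)

    def step(self) -> list:
        # Re-form the batch before every iteration: fill free slots
        # with waiting requests.
        while self.waiting and len(self.running) < self.max_batch_size:
            self.running.append(self.waiting.popleft())
        # One decoding iteration: each running request gets one fake token.
        for req in self.running:
            req.generated.append(f"tok{len(req.generated)}")
        # Finished requests leave the batch right away, freeing their slots.
        done = [r for r in self.running if r.finished]
        self.running = [r for r in self.running if not r.finished]
        return done


def run_to_completion(engine: ToyEngine) -> list:
    """Step until idle; return (request_id, iteration at which it finished)."""
    finish_order = []
    iteration = 0
    while engine.has_unfinished_requests():
        iteration += 1
        for req in engine.step():
            finish_order.append((req.request_id, iteration))
    return finish_order


if __name__ == "__main__":
    engine = ToyEngine(max_batch_size=2)
    engine.add_request("a", max_tokens=1)
    engine.add_request("b", max_tokens=3)
    engine.add_request("c", max_tokens=2)
    print(run_to_completion(engine))
    # → [('a', 1), ('b', 3), ('c', 3)]
```

In this toy run, request `c` takes over the slot freed by `a` at iteration 2 and finishes together with `b` at iteration 3. If the batch were fixed until all of its members finished, `c` could not start until iteration 4.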

The `LLM` class wraps this class for offline batched inference and the
`AsyncLLMEngine` class wraps this class for online serving.

NOTE: The config arguments are derived from the `EngineArgs` class. For the
comprehensive list of arguments, see `EngineArgs`.

### def from_engine_args()
    
Creates an LLM engine from the engine arguments.

- `engine_args: EngineArgs`
- `usage_context: UsageContext = UsageContext.ENGINE_CONTEXT`
        
1. Create the engine configs.
2. Initialize the cluster and specify the executor class.
3. Create the LLM engine.
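
The three steps above follow a common factory-method shape, sketched below with stand-in classes. Everything here is hypothetical and chosen for illustration: `ToyEngineArgs`, `create_configs`, `SingleDeviceExecutor`, `MultiDeviceExecutor`, and `ToyLLMEngine` are not vLLM names, the executor selection rule is an assumption, and cluster initialization is omitted entirely.

```python
from dataclasses import dataclass


@dataclass
class ToyEngineArgs:
    """Hypothetical stand-in for an engine-arguments object."""
    model: str
    tensor_parallel_size: int = 1

    def create_configs(self) -> dict:
        # Step 1 helper: turn flat arguments into a config mapping.
        return {
            "model": self.model,
            "tensor_parallel_size": self.tensor_parallel_size,
        }


class SingleDeviceExecutor:
    """Hypothetical executor for one device."""


class MultiDeviceExecutor:
    """Hypothetical executor for several devices."""


class ToyLLMEngine:
    def __init__(self, configs: dict, executor_class: type,
                 usage_context: str) -> None:
        self.configs = configs
        self.executor_class = executor_class
        self.usage_context = usage_context

    @classmethod
    def from_engine_args(cls, engine_args: ToyEngineArgs,
                         usage_context: str = "ENGINE_CONTEXT"):
        # 1. Create the engine configs.
        configs = engine_args.create_configs()
        # 2. Pick the executor class (assumed rule: more than one
        #    device means a multi-device executor). A real engine would
        #    also initialize the cluster here; this sketch does not.
        if configs["tensor_parallel_size"] > 1:
            executor_class = MultiDeviceExecutor
        else:
            executor_class = SingleDeviceExecutor
        # 3. Create the engine.
        return cls(configs, executor_class, usage_context)


if __name__ == "__main__":
    engine = ToyLLMEngine.from_engine_args(ToyEngineArgs(model="toy-model"))
    print(engine.executor_class.__name__)
    # → SingleDeviceExecutor
```

Keeping this logic in a classmethod means callers only have to build one arguments object, while `__init__` stays a plain constructor that accepts already-resolved configs.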

### def __init__()

- model_config: The configuration related to the LLM model.
- cache_config: The configuration related to the KV cache memory management.
- parallel_config: The configuration related to distributed execution.
- scheduler_config: The configuration related to the request scheduler.
- device_config: The configuration related to the device.
- lora_config (Optional): The configuration related to serving multi-LoRA.
- vision_language_config (Optional): The configuration related to vision language models.
- speculative_config (Optional): The configuration related to speculative decoding.
- executor_class: The model executor class for managing distributed execution.
- log_stats: Whether to log statistics.
- usage_context: The specified entry point, used for usage info collection.

The constructor parameters, with their types and defaults:

- `model_config: ModelConfig`
- `cache_config: CacheConfig`
- `parallel_config: ParallelConfig`
- `scheduler_config: SchedulerConfig`
- `device_config: DeviceConfig`
- `load_config: LoadConfig`
- `lora_config: Optional[LoRAConfig]`
- `vision_language_config: Optional[VisionLanguageConfig]`
- `speculative_config: Optional[SpeculativeConfig]`
- `decoding_config: Optional[DecodingConfig]`
- `executor_class: Type[ExecutorBase]`
- `log_stats: bool`
- `usage_context: UsageContext = UsageContext.ENGINE_CONTEXT`