# Qwen3-ASR-1.7B Simple Implementation

This notebook demonstrates how to use the Qwen3-ASR-1.7B model for automatic speech recognition (ASR). It covers installing the `qwen-asr` package, loading the model, transcribing a sample clip, and transcribing your own audio from Google Drive.

Developed by [Alibaba Qwen Team](https://huggingface.co/Qwen).

## 1. Setup Environment

First, install the `qwen-asr` package. If you have a compatible GPU, you can also install `flash-attn` for faster inference; this step is optional and is commented out below.

In [None]:
# Install the core package
!pip install -U qwen-asr

# Optional: install FlashAttention 2 for faster inference (requires GPU)
# !pip install -U flash-attn --no-build-isolation
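
After the install cell finishes, it can be useful to confirm which packages actually ended up in the environment. The following is a minimal sketch that uses only the standard library; `installed_version` is a helper name introduced here for illustration and is not part of `qwen-asr`.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report the core package and the optional FlashAttention package.
for name in ("qwen-asr", "flash-attn"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If `flash-attn` shows as not installed, the rest of the notebook still runs, since that package is optional.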

## 2. Load the Model

We will load the 1.7B model using the Transformers backend, which is the most straightforward way to run it in a notebook environment. The cell below uses `bfloat16` on a GPU and falls back to `float32` on CPU.

In [None]:
import torch
from qwen_asr import Qwen3ASRModel

# Check if GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

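model = Qwen3ASRModel.from_pretrained(
    "Qwen/Qwen3-ASR-1.7B",
    # bfloat16 halves memory use on GPU; CPU falls back to float32
    dtype=torch.bfloat16 if device == "cuda" else torch.float32,
    device_map=device,
    # Upper bound on the inference batch size; lower it if you run out of memory
    max_inference_batch_size=32,
    # Cap on generated tokens per input; long recordings may need a larger value
    max_new_tokens=256,
)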

print("Model loaded successfully!")

## 3. Transcribe Audio (Example)

Let's test the model with a sample audio file hosted by the Qwen team. Passing `language=None` lets the model detect the spoken language automatically.

In [None]:
sample_audio_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"

results = model.transcribe(
    audio=sample_audio_url,
    language=None,  # Automatically detect language
)

print(f"Detected Language: {results[0].language}")
print(f"Transcription: {results[0].text}")
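
The cell above reads only `results[0]`. As a hedged sketch, the helper below prints every entry of a results list, which could be handy if `transcribe` returns more than one item. It relies only on the `.language` and `.text` attributes used above. `format_results` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for the real result objects so that the example runs without loading the model.

```python
from types import SimpleNamespace


def format_results(results):
    """Build one line per result: index, detected language, and trimmed text."""
    lines = []
    for index, item in enumerate(results):
        lines.append(f"[{index}] ({item.language}) {item.text.strip()}")
    return "\n".join(lines)


# Stand-in data (not real model output), used only to show the format.
demo_results = [SimpleNamespace(language="English", text=" Hello world. ")]
print(format_results(demo_results))
```

With the real model, `print(format_results(results))` would replace the two `print` calls above.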

## 4. Transcribe Your Own Audio from Google Drive

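Mount your Google Drive and provide the path to your audio file. This section assumes the notebook is running in Google Colab, since it relies on the `google.colab` module.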

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import os

# Define the path to your audio file in Google Drive
audio_path = '/content/drive/MyDrive/Voice 251201_131937.m4a' #@param {type: "string"}

if os.path.isfile(audio_path):  # isfile() also rejects paths that point to a folder
    print(f'Transcribing file: {audio_path}')
    results = model.transcribe(
        audio=audio_path,
        language=None
    )
    print(f"\nDetected Language: {results[0].language}")
    print(f"Transcription Result:\n{results[0].text}")
else:
    print(f"File not found: {audio_path}")
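
To transcribe several recordings instead of one, a first step is to collect the audio files in a Drive folder. The sketch below uses only the standard library. `find_audio_files` and the extension set are assumptions made for illustration, not a list of formats that `qwen-asr` is documented to support.

```python
from pathlib import Path

# Assumed set of extensions to look for; adjust to the formats you actually use.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}


def find_audio_files(folder, extensions=AUDIO_EXTENSIONS):
    """Return sorted paths of audio files directly inside `folder`."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        str(path)
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )


# Example: list candidate files in the Drive root (empty if Drive is not mounted).
for path in find_audio_files("/content/drive/MyDrive"):
    print(path)
```

Each returned path can then be passed to `model.transcribe(audio=path, language=None)` in a loop, exactly as in the single-file cell above.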