Using your own Microsoft Copilot account. No API key, no credits, no paid plan: it turns the free chat at copilot.microsoft.com into an API you can call from code.
You can use it in two ways:
- π As a Python library: just call
client.chat("Hi"). Supports streaming and multi-turn conversations. - π As a local OpenAI-compatible API: runs a server at
http://localhost:8000/v1that speaks the OpenAI format, so the officialopenaiSDK (and any OpenAI-compatible app) works as a drop-in, withlocalhostin place of OpenAI.
You sign in once with your Microsoft account in a browser; your session is saved and refreshed automatically after that.
Unofficial project. Not affiliated with or endorsed by Microsoft. It automates the consumer Copilot web experience for personal use, so use it responsibly and within Microsoft's terms.
- Free: uses your normal signed-in Copilot, no API billing.
- Drop-in OpenAI replacement: point any OpenAI client at
localhostand it just works. - Works everywhere you're signed in: the signed-in path works even in regions where anonymous Copilot is blocked (e.g. India).
- Streaming + conversations: token-by-token output and multi-turn threads addressed by
conversation_id.
- Python 3.9+
- A Microsoft account (the free one you use for Copilot is fine)
- Works on Windows, macOS, and Linux
# 1. Clone the project
git clone <your-repo-url>
cd Windows-Copilot-API2. Create and activate a virtual environment
On macOS / Linux:
python3 -m venv venv
source venv/bin/activateOn Windows (PowerShell):
python -m venv venv
venv\Scripts\Activate.ps1On Windows you may need to allow script execution once:
Set-ExecutionPolicy -Scope CurrentUser RemoteSigned. Incmd.exeactivate withvenv\Scripts\activate.batinstead.
3. Install dependencies and sign in
# Install dependencies
pip install -r requirements.txt
# Install the browser Playwright needs (one-time)
playwright install chromium
# Sign in once: a browser opens, log into your Microsoft account
python -m copilot loginThat's it. Your session is saved under session/ (git-ignored, never shared) and reused on every run.
π‘ You can even skip step 4: the first time you call
chat()or start the server, it opens the sign-in browser for you automatically.
The simplest way if your code is already Python.
from copilot import CopilotClient
client = CopilotClient() # loads your signed-in session
# Get a full reply
reply = client.chat("Say hello in one short sentence.")
print(reply.text)
# Continue the SAME conversation β pass the id back
reply2 = client.chat("And now in French?", reply.conversation_id)
print(reply2.text)
# Stream the answer as it's typed
for chunk in client.stream("Tell me a short joke"):
print(chunk, end="", flush=True)chat() returns the full text plus a conversation_id; pass that id back to keep the thread going, or omit it to start fresh. stream() yields the reply piece by piece.
π More: examples/01_direct_chat.py, 02_direct_conversation.py, 03_direct_stream.py
Start a local server that speaks the OpenAI API, so existing OpenAI tools and SDKs work unchanged.
python app.py
# -> Copilot OpenAI-compatible API on http://127.0.0.1:8000Then point any OpenAI client at it (the API key is required by the SDK but ignored):
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
resp = client.chat.completions.create(
model="copilot",
messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)Or call it with plain HTTP / curl:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Hello!"}]}'Endpoints
| Method | Path | Description |
|---|---|---|
POST |
/v1/chat/completions |
Chat (supports "stream": true and an optional "conversation_id") |
GET |
/v1/models |
Lists the single copilot model |
Change the address with env vars:
HOST=0.0.0.0 PORT=8080 python app.py, or runuvicorn server.api:app --host 0.0.0.0 --port 8080.
π More: examples/04_server_http.py, 05_server_stream.py, 06_server_openai_sdk.py
python -m copilot login # sign in and save the session
python -m copilot ask "Hello!" # quick one-shot questionThe server bridges a single signed-in Copilot account, and Copilot's chat socket doesn't tolerate concurrent conversations from one process. So the server serializes upstream calls: parallel HTTP requests queue behind a lock and run one at a time (see server/api.py). This is intentional, and it means throughput is sequential, not parallel.
You can measure where it breaks with the included stress test, which fires a batch of simultaneous requests and doubles the batch size every successful round until the first error:
# Start the server in one terminal
python app.py
# Ramp concurrency in another (1 β 2 β 4 β 8 β β¦)
python tests/stress.py
python tests/stress.py --max 64 --timeout 120 --url http://localhost:8000Sample run (one signed-in account):
| Concurrency | Result | Wall time | Latency (min / median / max) |
|---|---|---|---|
| 1 | β all ok | 3.7s | 3.7 / 3.7 / 3.7s |
| 2 | β all ok | 4.6s | 3.4 / 4.6 / 4.6s |
| 4 | β all ok | 8.3s | 3.7 / 6.7 / 8.3s |
| 8 | β 1 failed (HTTP 502) |
13.3s | 3.5 / 9.7 / 13.3s |
Highest fully-successful concurrency: 4. Wall time roughly doubles each round
while minimum latency stays flat (~3.5s) β the signature of a serialized queue:
one request runs immediately, the rest wait their turn. The failure at 8 is an
upstream 502 (Copilot rejecting requests under load), not a server crash or
timeout β so the exact break point is flaky and may vary between runs.
Takeaway: keep concurrent in-flight requests low (β 1β4). This is a personal bridge, not a high-throughput gateway β and please don't hammer your account.
Concurrency (above) is how many at once; the rate limit is how many per
minute, sustained. Microsoft publishes none for consumer Copilot, so the bridge
enforces a self-imposed one with a token bucket: it caps
accepted requests per minute and returns a standard 429 + Retry-After when
you exceed it. Two env vars tune it:
| Env var | Default | Meaning |
|---|---|---|
RATE_LIMIT_RPM |
12 |
Requests/minute the bridge accepts. 0 disables the limit. |
RATE_LIMIT_BURST |
4 |
How many requests may go back-to-back before pacing kicks in. |
RATE_LIMIT_RPM=20 RATE_LIMIT_BURST=5 python app.py # raise it; 0 to disableThe default 12 rpm sits safely below the ~15 rpm where a single account starts
seeing upstream 502s. To find your ceiling, run the server with the limiter
off (RATE_LIMIT_RPM=0) and push the probe until failures appear:
python tests/ratelimit.py --rpm 20 --minutes 3On the client side, use exponential backoff. Both 429 (bridge limit) and
the occasional 502 (Copilot upstream hiccup) are transient β retry with
growing delays (e.g. 1s, 2s, 4s) and they almost always clear. The official
openai SDK does this automatically and honours Retry-After; with plain HTTP,
add a few retries yourself.
| Path | What it does |
|---|---|
| copilot/ | The core library: CopilotClient, auth, browser sign-in, HTTP driver |
| server/ | The FastAPI OpenAI-compatible server |
| examples/ | Runnable examples for every feature (examples/README.md) |
| tests/ | Test scripts, including the concurrency stress test (tests/stress.py) |
| app.py | Starts the server |
- Sign in once, then reuse. The cached token refreshes automatically; you only re-sign-in if the session fully expires.
- No daily limit, but be reasonable. Microsoft doesn't impose a daily chat cap, but please use it in moderation, and don't spam or hammer it with automated bulk requests.
- One model. Copilot has no model picker, so the server advertises a single model named
copilot. - Roughly GPT-4 class. On GPQA Diamond (198 graduate-level questions, closed-book) it scores 40.9%, which puts it in the GPT-4 family rather than the reasoning tier (o1/o3). Measured with tests/gpqa_bench.py.
- Your session is private. Everything in
session/(cookies + token) stays on your machine and is git-ignored.
RuntimeError: Copilot error: invalid-event (or the chat hangs) on a server/VPS.
On datacenter IPs Cloudflare withholds bot-clearance, so the chat socket stalls on an empty challenge sometimes. Fix it manually: on that machine, open copilot.microsoft.com in a browser and pass the "verify you're human" check once; that sets a cf_clearance cookie which the saved session reuses. Re-do it if it expires, or route the server's traffic through a residential connection (e.g. a home-PC exit node).
For personal and educational use. You are responsible for complying with Microsoft's terms of service.
