1) Install required libraries

pip install --upgrade openai python-dotenv
•	openai is the official Python client.  
•	python-dotenv loads OPENAI_API_KEY from a .env file (cleaner than hard-coding the key).

(Optional) pip install google-cloud-aiplatform installs the Google Vertex AI SDK; none of the cells below use it.

2) Set your API key (safely)
Create a .env file (same folder as your script):
OPENAI_API_KEY=sk-...your key...
Then load it in Python (next step). OpenAI recommends environment variables for key safety.  
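When you load the key, it helps to fail fast with a clear message if it is missing, instead of hitting an authentication error on the first request. A minimal sketch using only the standard library, assuming `load_dotenv()` (or your shell) has already put OPENAI_API_KEY into the environment; `get_api_key` and `mask_key` are hypothetical helper names, not part of the openai package.

```python
import os
from typing import Mapping, Optional


def get_api_key(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY from the environment, or raise a clear error.

    Hypothetical helper: assumes load_dotenv() or the shell has already
    populated the environment. Pass a dict for `environ` to test it.
    """
    env = os.environ if environ is None else environ
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Add it to .env and call load_dotenv(), "
            "or export it in your shell."
        )
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the key is safe to log."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Usage: `api_key = get_api_key()` then `print("Using key", mask_key(api_key))`; never print the raw key.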

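3) Initialize the client
Call load_dotenv() first, then create OpenAI(). With no arguments, the client reads OPENAI_API_KEY from the environment, so the key never appears in your code.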

4) Make your first chat call (GPT-5)

Use the model name your account is provisioned for (e.g., "gpt-5" or a specific snapshot you see in the dashboard).


In [10]:
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()  # pulls OPENAI_API_KEY from .env

client = OpenAI()

resp = client.responses.create(
    model="gpt-5",  # replace with your provisioned GPT-5 model
    input="You are a concise senior developer assistant. Summarize why teams adopt feature flags in CI/CD.",
)

print(resp.output_text)


Teams adopt feature flags in CI/CD to:

- Decouple deploy from release: ship code “dark,” turn it on later without redeploying.
- Enable progressive delivery: cohort/percentage/region rollouts, canaries, rings; instant kill switch/rollback.
- Support trunk-based development: hide incomplete work, reduce long-lived branches and merge risk.
- Validate safely in production: internal/beta access, dark launches, shadow traffic; gate risky DB/API migrations (e.g., dual-read/dual-write).
- Run experiments: A/B tests, personalization, entitlement-based access; measure impact before full rollout.
- Improve operability: toggle behavior during incidents, throttle features, control cost-heavy paths.
- Cut lead time and MTTR: smaller changes, fewer emergency deploys to undo mistakes.
- Coordinate multi-service changes: avoid lockstep releases across dependencies.
- Manage per-env/tenant/region differences: meet compliance or customer-specific needs.
- Enhance observability: correlate flags with met
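The cells in this section repeat the same call boilerplate. A minimal sketch of a reusable wrapper, assuming a client that behaves like `OpenAI()` here: it exposes `client.responses.create(...)` and returns an object with `.output_text`. `ask` and `DEFAULT_MODEL` are hypothetical names. Passing the persona through `instructions=` (as the pirate example below does) avoids gluing role and task into one input string, and taking the client as a parameter lets you test the helper with a stub.

```python
from typing import Any, Optional

DEFAULT_MODEL = "gpt-5"  # hypothetical default; use the model your account is provisioned for


def ask(client: Any, prompt: str, *, instructions: Optional[str] = None,
        model: str = DEFAULT_MODEL, **extra: Any) -> str:
    """Send one prompt through the Responses API and return the plain text.

    `client` should behave like openai.OpenAI(): client.responses.create(...)
    must return an object with .output_text. Extra keyword arguments
    (e.g. reasoning={"effort": "low"}) are passed through unchanged.
    """
    kwargs: dict = {"model": model, "input": prompt, **extra}
    if instructions is not None:
        # Keep the persona separate from the task text.
        kwargs["instructions"] = instructions
    response = client.responses.create(**kwargs)
    return response.output_text
```

Usage: `print(ask(client, "Summarize why teams adopt feature flags in CI/CD.", instructions="You are a concise senior developer assistant."))`.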

In [11]:
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()  # pulls OPENAI_API_KEY from .env

client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    input="Write a one-sentence bedtime story about a unicorn."
)

print(response.output_text)

Under a silver moon, a sleepy unicorn tiptoed through a whispering meadow, sprinkling stardust over every child’s dreams until the whole world sighed goodnight.


In [12]:
from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "low"},
    instructions="Talk like a pirate.",
    input="Are semicolons optional in JavaScript?",
)

print(response.output_text)

Aye, mostly optional, matey! JavaScript’s Automatic Semicolon Insertion (ASI) drops semicolons fer ye at many line breaks and before closing braces. But there be treacherous reefs where ye should plant a semicolon yerself, else ye’ll scuttle the ship:

- When puttin’ two statements on the same line.
- When a new line starts with tokens that can “attach” to the previous line:
  - ( or [  — could be seen as a call or indexing of the prior expression
  - ` (template literal) — might become a tagged template on the prior identifier
  - + or - — could be parsed as a unary operator on the prior expression
  - / — might be parsed as division instead of a regex literal
  - . or ?. — property access/optional chaining on the prior expression
- After return, throw, break, or continue if ye put a newline right after the keyword:
  - return
    { a: 1 } // returns undefined because ASI inserts a semicolon after return
- When concatenatin’ files, put a defensive leading semicolon at the start of a f
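The `reasoning={"effort": ...}` argument trades latency and cost for deeper reasoning. For GPT-5 the documented levels are "minimal", "low", "medium" (the default) and "high"; check the docs for your exact snapshot, since this list is an assumption that may change. A sketch that validates the level locally before building the request kwargs; `build_reasoning_request` is a hypothetical helper.

```python
from typing import Any, Dict, Optional

# Effort levels documented for GPT-5 reasoning models (verify for your snapshot).
ALLOWED_EFFORTS = ("minimal", "low", "medium", "high")


def build_reasoning_request(prompt: str, *, effort: str = "medium",
                            instructions: Optional[str] = None,
                            model: str = "gpt-5") -> Dict[str, Any]:
    """Return kwargs for client.responses.create(**kwargs).

    Hypothetical helper: rejects unknown effort levels before any network
    call instead of waiting for the API to return an error.
    """
    effort = effort.lower()
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"effort must be one of {ALLOWED_EFFORTS}, got {effort!r}")
    request: Dict[str, Any] = {
        "model": model,
        "reasoning": {"effort": effort},
        "input": prompt,
    }
    if instructions:
        request["instructions"] = instructions
    return request
```

Usage: `client.responses.create(**build_reasoning_request("Are semicolons optional in JavaScript?", effort="low", instructions="Talk like a pirate."))`.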

In [13]:
# pip install openai tiktoken numpy

#import os
import numpy as np
import tiktoken
from openai import OpenAI

#os.environ["OPENAI_API_KEY"] = "sk-..."  # or set in your shell

client = OpenAI()

# Choose models you have access to
CHAT_MODEL = "gpt-5-chat" # ← replace with your org’s GPT-5 chat model name

EMBED_MODEL = "text-embedding-3-large" # robust general-purpose embedding model

## A) Tokenization & token counting

Why: estimate cost/fit and prevent overflows before you call the model.

In [14]:
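# cl100k_base is the GPT-4 / GPT-3.5-turbo encoding; GPT-4o and later models use o200k_base,
# so treat this count as approximate for them (tiktoken.get_encoding("o200k_base") is closer)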

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

sample = "Build a weekly status report with risks, blockers, and next steps."

print("Tokens:", count_tokens(sample))


Tokens: 14
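With a token count in hand, the "fit" and "cost" checks are simple arithmetic. A sketch; the context window and per-million-token prices below are placeholders, not real limits or pricing, so substitute the figures published for your model. For reasoning models such as GPT-5, hidden reasoning tokens are billed as output tokens, so reserve extra output headroom. `fits_context` and `estimate_cost_usd` are hypothetical helpers.

```python
def fits_context(prompt_tokens: int, max_output_tokens: int,
                 context_window: int) -> bool:
    """True if the prompt plus the reserved output budget fits the window."""
    return prompt_tokens + max_output_tokens <= context_window


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost = tokens / 1,000,000 * price per million tokens, per direction."""
    return ((input_tokens / 1_000_000) * usd_per_m_input
            + (output_tokens / 1_000_000) * usd_per_m_output)


# Placeholder numbers for illustration only (not real limits or prices).
CONTEXT_WINDOW = 128_000
PRICE_IN, PRICE_OUT = 1.00, 4.00  # USD per 1M tokens, hypothetical

prompt_tokens = 14  # e.g. count_tokens(sample) from the cell above
print(fits_context(prompt_tokens, 1_000, CONTEXT_WINDOW))
print(round(estimate_cost_usd(prompt_tokens, 1_000, PRICE_IN, PRICE_OUT), 6))
```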


In [7]:
from openai import OpenAI

help(OpenAI)

Help on class OpenAI in module openai:

class OpenAI(openai._base_client.SyncAPIClient)
 |  OpenAI(*, api_key: 'str | None' = None, organization: 'str | None' = None, project: 'str | None' = None, base_url: 'str | httpx.URL | None' = None, websocket_base_url: 'str | httpx.URL | None' = None, timeout: 'Union[float, Timeout, None, NotGiven]' = NOT_GIVEN, max_retries: 'int' = 2, default_headers: 'Mapping[str, str] | None' = None, default_query: 'Mapping[str, object] | None' = None, http_client: 'httpx.Client | None' = None, _strict_response_validation: 'bool' = False) -> 'None'
 |  
 |  Method resolution order:
 |      OpenAI
 |      openai._base_client.SyncAPIClient
 |      openai._base_client.BaseClient
 |      typing.Generic
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, *, api_key: 'str | None' = None, organization: 'str | None' = None, project: 'str | None' = None, base_url: 'str | httpx.URL | None' = None, websocket_base_url: 'str | httpx.URL | None' 

In [15]:
import os
from openai import OpenAI

api_key = os.getenv("OPENAI_API_KEY")  # loaded from .env by load_dotenv() earlier
client = OpenAI(api_key=api_key)

In [16]:
query = '''What is 5*5? Also solve 5 + 10 + 20 + 30 =?'''

completion = client.chat.completions.create(
  model="gpt-4o",  # Specify the model you want to use
  messages=[
    {"role": "system", "content": "You are a helpful assistant. Help me with my math homework!"}, # <-- This is the system message that provides context to the model
    {"role": "user", "content": query},  # <-- This is the user message for which the model will generate a response
  ],
)

print(completion.choices[0].message.content)

Sure! 

First, let's calculate \(5 \times 5\):

\[ 5 \times 5 = 25 \]

Next, let's solve the addition problem \(5 + 10 + 20 + 30\):

\[ 5 + 10 = 15 \]
\[ 15 + 20 = 35 \]
\[ 35 + 30 = 65 \]

So, the sum of \(5 + 10 + 20 + 30\) is \(65\).

Therefore, \(5 \times 5 = 25\) and \(5 + 10 + 20 + 30 = 65\).


In [17]:
query = '''What is 5*5? Also solve 5 + 10 + 20 + 30 =?'''

completion = client.chat.completions.create(
  model="gpt-5",  # Specify the model you want to use
  messages=[
    {"role": "system", "content": "You are a helpful assistant. Help me with my math homework!"}, # <-- This is the system message that provides context to the model
    {"role": "user", "content": query},  # <-- This is the user message for which the model will generate a response
  ],
)

print(completion.choices[0].message.content)

5*5 = 25
5 + 10 + 20 + 30 = 65


In [26]:
resp = client.chat.completions.create(
    model="gpt-5",  # replace with your provisioned GPT-5 model
    messages=[
        {"role": "system", "content": "You are a concise senior developer assistant."},
        {"role": "user", "content": "Summarize why teams adopt feature flags in CI/CD."},
    ],
)
print(resp.choices[0].message.content)

- Decouple deploy from release: ship code continuously, expose features later via flags.
- Safer rollouts: canary/percentage rollouts to limit blast radius and validate in production.
- Instant rollback/kill switches: disable a bad feature without redeploying.
- Faster velocity: merge incomplete work behind flags, enabling trunk‑based development and fewer long‑lived branches.
- Test in production: enable for internal users or small cohorts to catch real‑world issues early.
- Targeted releases: turn features on per user, account, region, or environment.
- Experimentation: A/B and multivariate tests driven by flags with metrics.
- Operational resilience: degrade or disable noncritical paths when dependencies fail.
- Coordinated changes: manage multi‑service/multi‑step migrations and phased rollouts.
- Compliance and control: scheduled releases, approvals, and audit trails without code changes.
- Reduced downtime: avoid risky big‑bang releases and hotfix redeploys.
- Better observability

In [31]:
resp = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "You are terse."},
        {"role": "user", "content": "Explain circuit breakers in microservices."}
    ],
)  # no stream=True, so this returns the complete message at once

print(resp.choices[0].message.content)

Circuit breaker pattern: a guard around remote calls that fails fast when a dependency is unhealthy, preventing cascades and giving it time to recover.

Core state machine
- Closed: Calls flow normally; failures are counted.
- Open: Calls are short-circuited immediately (fast fail) for a cool-down period.
- Half-open: Allow a few probe calls; on success, close; on failure, open again.

Key triggers and settings
- Failure-rate threshold over a sliding window (e.g., >50% of last N calls fail).
- Slow-call rate threshold (treat very slow calls as failures).
- Minimum number of calls before tripping (avoid tripping on tiny samples).
- Open duration (cool-down), then transition to half-open.
- Permitted calls in half-open (to avoid a thundering herd).
- Exception types to record or ignore.
- Timeouts are essential; a call that never times out can’t be judged.

How to use it
- Wrap every outbound dependency call (per endpoint/operation) with its own breaker.
- Combine with:
  - Timeouts (sho
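The cell above waits for the whole completion. To print text as it arrives, Chat Completions accepts `stream=True`; the call then returns an iterator of chunks whose `choices[0].delta.content` holds the new fragment (it can be `None`, e.g. on the first and last chunks). A sketch of the accumulation step; `collect_chat_stream` is a hypothetical helper that accepts any iterable of chunk-like objects, so it can be tested without a network call.

```python
import sys
from typing import Any, Iterable, TextIO


def collect_chat_stream(chunks: Iterable[Any], out: TextIO = sys.stdout) -> str:
    """Echo streamed Chat Completions deltas and return the full text.

    Each chunk is expected to look like the objects yielded by
    client.chat.completions.create(..., stream=True).
    """
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # e.g. a usage-only chunk carries no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # skip None / empty deltas
            out.write(delta)
            out.flush()
            parts.append(delta)
    out.write("\n")
    return "".join(parts)
```

Usage: `stream = client.chat.completions.create(model="gpt-5", messages=[...], stream=True)` then `text = collect_chat_stream(stream)`.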

In [36]:
from openai import OpenAI
client = OpenAI()

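response = client.responses.create(
    model="gpt-5-mini",
    input=[
        {
            "role": "user",
            "content": "Say 'double bubble bath' ten times fast.",
        },
    ],
    stream=False,  # non-streaming: returns one complete Response object
)

# A Response is a pydantic model, so iterating over it yields (field, value)
# pairs, as printed below; these are not stream events.
# Use print(response.output_text) to see only the generated text.
for field in response:
    print(field)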

('id', 'resp_68bd656c68148195bca3fd496285614602c50422676c9ffd')
('created_at', 1757242732.0)
('error', None)
('incomplete_details', None)
('instructions', None)
('metadata', {})
('model', 'gpt-5-mini-2025-08-07')
('object', 'response')
('output', [ResponseReasoningItem(id='rs_68bd656d25cc819591b81eb139c71d9602c50422676c9ffd', summary=[], type='reasoning', encrypted_content=None, status=None), ResponseOutputMessage(id='msg_68bd657128dc81959caea0238b56a97c02c50422676c9ffd', content=[ResponseOutputText(annotations=[], text='double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath', type='output_text', logprobs=[])], role='assistant', status='completed', type='message')])
('parallel_tool_calls', True)
('temperature', 1.0)
('tool_choice', 'auto')
('tools', [])
('top_p', 1.0)
('background', False)
('max_output_tokens', None)
('previous_response_id', None)
('pro
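With the Responses API, `stream=True` yields typed events rather than a single object. Text arrives in events whose `type` is "response.output_text.delta", each carrying the new fragment in `.delta`; other events mark lifecycle steps such as "response.completed". A sketch that keeps only the text deltas; `collect_response_stream` is a hypothetical helper, tested here with stand-in event objects.

```python
import sys
from typing import Any, Iterable, TextIO

TEXT_DELTA = "response.output_text.delta"


def collect_response_stream(events: Iterable[Any], out: TextIO = sys.stdout) -> str:
    """Echo text deltas from a Responses API event stream; return the full text."""
    parts = []
    for event in events:
        # Ignore lifecycle events (created, in_progress, completed, ...).
        if getattr(event, "type", None) == TEXT_DELTA:
            out.write(event.delta)
            out.flush()
            parts.append(event.delta)
    out.write("\n")
    return "".join(parts)
```

Usage: `stream = client.responses.create(model="gpt-5-mini", input="Say 'double bubble bath' ten times fast.", stream=True)` then `text = collect_response_stream(stream)`.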

In [4]:
# Step 2: Load and interact with GPT (OpenAI)
import os
from openai import OpenAI

MODEL = "gpt-4o"

api_key = os.getenv("OPENAI_API_KEY")  # set via .env / load_dotenv() or your shell
client = OpenAI(api_key=api_key)

# Step 3: Use the OpenAI client to generate a response
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": "Write a short product description for a ₹999 kids backpack."}
    ],
)

print(response.choices[0].message.content)


Unlock a world of adventure for your little explorer with our colorful Kids Explorer Backpack, priced at just ₹999. Designed with both style and functionality in mind, this backpack features vibrant patterns and durable, lightweight materials perfect for school days and weekend adventures alike. With spacious compartments, it easily accommodates books, snacks, and treasures, while ergonomic straps ensure comfort for growing shoulders. Equipped with sturdy zippers and a water-resistant exterior, it promises reliability rain or shine. Ideal for children aged 5-10, this backpack is the ultimate blend of fun, practicality, and value. Let imagination soar with every new journey!


In [11]:
from openai import AzureOpenAI

#help(AzureOpenAI)

In [18]:
import tiktoken

# cl100k_base is the GPT-4 encoding; GPT-4o and later use o200k_base, so treat this count as approximate
enc = tiktoken.get_encoding("cl100k_base")

prompt = "Summarize the benefits of using vector embeddings for search."
tokens = enc.encode(prompt)
print("Token count:", len(tokens))
print("First 10 token IDs:", tokens[:10])

Token count: 12
First 10 token IDs: [9370, 5730, 553, 279, 7720, 315, 1701, 4724, 71647, 369]


In [34]:
#from openai import OpenAI
#client = OpenAI()

instructions = """
You are an expert in categorizing IT support tickets. Given the support
ticket below, categorize the request into one of "Hardware", "Software",
or "Other". Respond with only one of those words.
"""

ticket = "My monitor won't turn on - help!"

response = client.responses.create(
    model="gpt-4.1",
    input=[
        {"role": "developer", "content": instructions},
        {"role": "user", "content": ticket},
    ],
)

print(response.output_text)


Hardware
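Even with "Respond with only one of those words", a model can occasionally add punctuation, change case, or answer outside the label set. A sketch of a post-processing guard before the label reaches downstream code; `normalize_label` and the fallback to "Other" are hypothetical choices, not part of the API.

```python
import string
from typing import Iterable

LABELS = ("Hardware", "Software", "Other")


def normalize_label(raw: str, labels: Iterable[str] = LABELS,
                    fallback: str = "Other") -> str:
    """Map a reply like ' hardware.' onto a canonical label.

    Strips surrounding whitespace and punctuation and compares
    case-insensitively; anything unrecognized returns `fallback`.
    """
    cleaned = raw.strip(string.punctuation + string.whitespace).lower()
    for label in labels:
        if cleaned == label.lower():
            return label
    return fallback
```

Usage: `category = normalize_label(response.output_text)`.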


1) Instruction Q&A (system / user roles)

In [None]:
#GPT-4
resp = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.2,
    max_tokens=300,
    messages=[
        {"role": "system", "content": "You are a precise technical explainer."},
        {"role": "user", "content": "Explain vector databases for enterprise in simple terms."}
    ],
)
print(resp.choices[0].message.content)

Vector databases are specialized databases designed to handle and manage data that is represented as vectors. In the context of enterprises, these databases are particularly useful for dealing with complex data types such as images, audio, text, and other unstructured data that traditional databases struggle to process efficiently.

Here's a simple breakdown of how vector databases work and their benefits for enterprises:

1. **Data Representation**: In a vector database, data is represented as vectors, which are essentially arrays of numbers. These vectors capture the essential features of the data. For example, a vector might represent the characteristics of an image or the semantic meaning of a piece of text.

2. **Similarity Search**: One of the primary functions of vector databases is to perform similarity searches. This means they can quickly find and retrieve data that is similar to a given query vector. This is particularly useful for applications like recommendation systems, i

In [38]:
#GPT-5 (Responses API; use your current gpt-5 model name)
resp = client.responses.create(
    model="gpt-5",  # replace with the exact gpt-5* model on your account
    input=[
        {"role": "system", "content": "You are a precise technical explainer. Verbosity: medium."},
        {"role": "user", "content": "Explain vector databases for enterprise in simple terms."}
    ]
)
print(resp.output_text)


Short version: A vector database stores “meaning” as numbers so you can find similar things fast. It turns text, images, audio, or tables into vectors (lists of numbers). Similar items end up near each other in this high‑dimensional space. The database then finds nearest neighbors quickly, often in milliseconds, even among millions or billions of items.

Why enterprises care
- Make messy data searchable by meaning, not exact words (semantic search).
- Power RAG (retrieve-augment-generate) for more accurate LLM answers.
- Recommendations and personalization across products, content, or tickets.
- Detect duplicates, near-duplicates, anomalies, or fraud patterns.
- Cross‑modal search (e.g., search images with text).

How it works (simple flow)
1) Embed: An embedding model converts each item (document chunk, product, image) into a vector, typically 384–3072 dimensions.
2) Store: Save the vector plus metadata (title, permissions, timestamps) as a record.
3) Index: Build a specialized index 
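The "find nearest neighbors" idea in the answer above can be illustrated with a brute-force cosine-similarity search. A sketch in plain Python: the 3-dimensional vectors are made up for illustration (real embeddings from a model such as text-embedding-3-large have 3072 dimensions by default), and a production vector database would use an approximate index instead of scanning every row.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|); 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], docs: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours: score every document, keep the best k."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Made-up toy vectors standing in for real embeddings.
docs = {
    "reset-password": [0.9, 0.1, 0.0],
    "vpn-setup": [0.2, 0.8, 0.1],
    "printer-jam": [0.0, 0.2, 0.9],
}
print(top_k([0.85, 0.15, 0.05], docs, k=2))
```

With real data, the vectors would come from something like `[d.embedding for d in client.embeddings.create(model=EMBED_MODEL, input=texts).data]`, using the EMBED_MODEL defined earlier.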

Notes: Chat Completions uses messages=[...]; the Responses API accepts an input=[...] array with the same roles.
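The two APIs also return text in different places: `completion.choices[0].message.content` for Chat Completions and `resp.output_text` for the Responses API, as in the cells above. A sketch of a helper that accepts either; `response_text` is a hypothetical name and relies only on those two attributes.

```python
from typing import Any


def response_text(result: Any) -> str:
    """Return the generated text from a Responses or Chat Completions result.

    Responses API objects expose .output_text; Chat Completions objects
    expose .choices[0].message.content, which may be None.
    """
    text = getattr(result, "output_text", None)
    if text is not None:
        return text
    choices = getattr(result, "choices", None)
    if choices:
        return choices[0].message.content or ""
    raise TypeError(f"Unrecognized response object: {type(result).__name__}")
```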

2) Few-shot style transfer (show, then ask)

In [39]:
#GPT-4
messages = [
  {"role": "system", "content": "You are a writing coach."},
  {"role": "user", "content": "Rewrite in a friendlier tone: 'Submit the report by EOD.'"},
  {"role": "assistant", "content": "Could you please send the report by the end of the day? Thanks!"},
  {"role": "user", "content": "Rewrite in the same friendly tone: 'Fix the data pipeline now.'"}
]

resp = client.chat.completions.create(model="gpt-4o", temperature=0.7, messages=messages)

print(resp.choices[0].message.content)

Could you please take a moment to fix the data pipeline? Thanks!


In [40]:
#GPT-5
resp = client.responses.create(
  model="gpt-5",
  input=[
    {"role": "system", "content": "You are a writing coach. Match the tone of assistant examples. Verbosity: low."},
    {"role": "user", "content": "Rewrite in a friendlier tone: 'Submit the report by EOD.'"},
    {"role": "assistant", "content": "Could you please send the report by the end of the day? Thanks!"},
    {"role": "user", "content": "Rewrite in the same friendly tone: 'Fix the data pipeline now.'"}
  ]
)
print(resp.output_text)

Could you please fix the data pipeline as soon as you can? Thanks!
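The few-shot pattern above (one example rewrite, then the real request) generalizes to several examples. A sketch that builds the role/content list from (input, output) pairs; the same list works for `messages=` in Chat Completions and `input=` in the Responses API. `few_shot_messages` and its default prompt template are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def few_shot_messages(system: str, examples: Sequence[Tuple[str, str]],
                      query: str,
                      template: str = "Rewrite in a friendlier tone: '{}'"
                      ) -> List[Dict[str, str]]:
    """Build a system turn, alternating user/assistant examples, then the real query."""
    messages = [{"role": "system", "content": system}]
    for source, rewritten in examples:
        messages.append({"role": "user", "content": template.format(source)})
        messages.append({"role": "assistant", "content": rewritten})
    messages.append({"role": "user", "content": template.format(query)})
    return messages
```

Usage: `resp = client.responses.create(model="gpt-5", input=few_shot_messages("You are a writing coach.", pairs, "Fix the data pipeline now."))`, where `pairs` is your list of example rewrites.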


3) “Verbosity” / tone control (semantics, not just length)
You can guide verbosity and tone declaratively in your system instruction.

In [44]:
#GPT-4
resp = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {"role":"system","content":"Be concise (2–3 bullets). Tone: professional."},
    {"role":"user","content":"Summarize the pros and cons of Top-p vs Temperature."}
  ],
  max_tokens=200
)
print(resp.choices[0].message.content)

**Top-p Sampling (Nucleus Sampling):**
- **Pros:** Dynamic control by adjusting samples, offers fine-tuned creativity by selecting from the most probable results.
- **Cons:** May compromise coherence if 'p' value is too low, potentially omits diverse word options.

**Temperature Sampling:**
- **Pros:** Simple implementation; allows control over randomness by scaling probabilities, useful for consistent outputs.
- **Cons:** High value can lead to incoherent results, while low value limits creativity by focusing on high-probability words.


Tip: In GPT-5, treat “verbosity” and “tone” as semantic dials expressed in instructions. (They complement, rather than replace, hard caps like max_output_tokens.)

In [45]:
#GPT-5
resp = client.responses.create(
  model="gpt-5",
  input=[
    {"role":"system","content":"Verbosity: low. Tone: professional. Output as 2–3 bullets."},
    {"role":"user","content":"Summarize the pros and cons of Top-p vs Temperature."}
  ]
)
print(resp.output_text)

- Temperature: Pros—smooth, single knob; preserves token ranking; works consistently across contexts. Cons—non-adaptive; high values admit very low-probability tokens (instability); low values can make outputs overconfident/repetitive.
- Top-p (nucleus): Pros—adaptive to entropy; trims the unlikely tail, reducing bizarre tokens; maintains fluency in high-uncertainty settings. Cons—can cut out useful rare tokens; discontinuous control; in low-entropy prompts becomes near-greedy, reducing diversity.
- Guidance: Prefer temperature for graded control; use top-p to bound “surprise.” Common combo: temperature ~0.7–1.0 with top-p ~0.9–0.95; avoid extremes and tune per task.
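Besides wording in the instructions, GPT-5 also exposes verbosity as an API parameter: in the Responses API it goes under `text={"verbosity": ...}` with "low", "medium" or "high", while `max_output_tokens` stays the hard cap (for reasoning models it also covers hidden reasoning tokens). A sketch that builds those kwargs; `verbosity_request` is a hypothetical helper, and you should confirm the parameter is supported by the exact snapshot you use.

```python
from typing import Any, Dict, List, Optional

VERBOSITY_LEVELS = ("low", "medium", "high")


def verbosity_request(messages: List[Dict[str, str]], *, verbosity: str = "medium",
                      max_output_tokens: Optional[int] = None,
                      model: str = "gpt-5") -> Dict[str, Any]:
    """Return kwargs for client.responses.create(**kwargs).

    `verbosity` is the soft length/detail dial; `max_output_tokens` is the
    hard cap and is only sent when given.
    """
    if verbosity not in VERBOSITY_LEVELS:
        raise ValueError(f"verbosity must be one of {VERBOSITY_LEVELS}")
    request: Dict[str, Any] = {
        "model": model,
        "input": messages,
        "text": {"verbosity": verbosity},
    }
    if max_output_tokens is not None:
        request["max_output_tokens"] = max_output_tokens
    return request
```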


4) Light reasoning with short rationale (no hidden chain-of-thought)

In [46]:
#GPT-4
resp = client.chat.completions.create(
  model="gpt-4o",
  temperature=0.2,
  messages=[
    {"role":"system","content":"Answer with a brief rationale (2 sentences max)."},
    {"role":"user","content":"Which is better for deterministic outputs: temperature=0 or top_p=1?"}
  ]
)
print(resp.choices[0].message.content)

For deterministic outputs, setting temperature=0 is better because it removes randomness by always choosing the highest probability option. In contrast, top_p=1 allows for sampling from the entire probability distribution, which can introduce variability.


In [47]:
#GPT-5
resp = client.responses.create(
  model="gpt-5",
  input=[
    {"role":"system","content":"Provide the answer and a brief rationale (≤2 sentences). Verbosity: low."},
    {"role":"user","content":"Which is better for deterministic outputs: temperature=0 or top_p=1?"}
  ]
)
print(resp.output_text)

Temperature=0. It forces greedy/argmax decoding (no sampling), while top_p=1 only disables nucleus filtering and can still be stochastic; for strict determinism also avoid sampling and fix seeds/hardware settings.
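The answer above can be checked on a toy distribution. Temperature divides the logits before the softmax, so as it approaches 0 the probability mass collapses onto the argmax token; top_p=1 keeps every token (no nucleus filtering), so sampling can still pick any of them. A sketch with made-up logits. Note that, at the time of writing, the GPT-5 reasoning models accept only the default sampling settings, which is consistent with the GPT-5 cells here leaving temperature out.

```python
import math
from typing import Dict


def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """p_i = exp(z_i / T) / sum_j exp(z_j / T); as T -> 0 this approaches argmax."""
    if temperature <= 0:
        # Treat T = 0 as greedy decoding: all mass on the highest logit.
        best = max(logits, key=logits.get)
        return {tok: (1.0 if tok == best else 0.0) for tok in logits}
    scaled = {tok: z / temperature for tok, z in logits.items()}
    shift = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(z - shift) for tok, z in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


logits = {"blue": 2.0, "green": 1.0, "red": 0.5}  # made-up scores
print(softmax_with_temperature(logits, 0))  # greedy: all mass on "blue"
print(top_p_filter(softmax_with_temperature(logits, 1.0), 1.0))  # all three tokens kept
```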
