# OpenAI Responses API

## What is the OpenAI Responses API?

The Responses API is an OpenAI API released in March 2025. It combines capabilities of the 
Chat Completions API and the Assistants API, providing support for:

- **Traditional Chat Completions:** Facilitates seamless conversational AI experiences.
- **Web Search:** Enables real-time information retrieval from the internet.
- **File Search:** Allows searching within files for relevant data.

Accordingly, the Assistants API will be retired in 2026. 

> **For new users, OpenAI recommends using the Responses API instead of the Chat Completions API to leverage its expanded capabilities.**

For a comprehensive comparison between the Responses API and the Chat Completions API, refer to the official OpenAI documentation: 
[Responses vs. Chat Completions](https://platform.openai.com/docs/guides/responses-vs-chat-completions).

## Summary of This Notebook
This notebook provides a hands-on guide for using the **OpenAI Responses API** to analyze tweets. 
It covers essential techniques such as:

- **Creating a vector store** and uploading tweets for semantic search.
- **Using file search** to analyze private datasets.
- **Performing a web search** to retrieve the latest public information.
- **Utilizing stateful responses** to maintain conversation context.
- **Combining file and web search** to enhance retrieval-augmented generation (RAG) applications.

By the end of this notebook, users will be able to integrate OpenAI's Responses API for efficient data retrieval and analysis of structured and unstructured data.

## Install Required Libraries
To use the OpenAI Responses API, install the following library:

- **`openai`**: Provides access to OpenAI's APIs, including the Responses API

In [1]:
pip install openai -q

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sparkmagic 0.21.0 requires pandas<2.0.0,>=0.17.1, but you have pandas 2.3.3 which is incompatible.
Note: you may need to restart the kernel to use updated packages.


## Import Required Libraries

In [2]:
from IPython.display import Markdown, display
import boto3
from botocore.exceptions import ClientError
import json
import io

## Retrieve Secrets from AWS Secrets Manager

In [3]:
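def get_secret(secret_name, region_name="us-east-1"):
    """Fetch a secret from AWS Secrets Manager and parse its JSON payload."""
    # Create a Secrets Manager client
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )

    try:
        get_secret_value_response = client.get_secret_value(
            SecretId=secret_name
        )
    except ClientError:
        # Re-raise unchanged so the caller sees the original AWS error
        # (for example, a missing secret or denied access)
        raise

    # The secret is stored as a JSON string, e.g. {"api_key": "..."}
    secret = get_secret_value_response['SecretString']

    return json.loads(secret)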

## Initialize OpenAI Client

In [4]:
from openai import OpenAI
openai_api_key = get_secret('openai')['api_key']

client = OpenAI(api_key=openai_api_key)

## File Search API

### Introduction to File Search
The file search tool retrieves relevant information 
from uploaded files by indexing them in a vector store. This feature is particularly useful 
for searching large datasets, extracting insights, and improving retrieval-augmented generation (RAG) applications.

Unlike traditional keyword-based search, file search uses embeddings 
to identify semantically relevant content, making it suitable for analyzing structured 
and unstructured text data (OpenAI, 2025).

For more details, visit the official OpenAI documentation: 
[File Search in Responses API](https://platform.openai.com/docs/guides/tools-file-search).
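
To make the contrast with keyword search concrete, here is a minimal sketch of ranking by embedding similarity. The three-dimensional vectors and document titles are invented for illustration; they are not produced by OpenAI's embedding models, and the ranking method used inside file search may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; real embeddings have far more dimensions
documents = {
    "New video model released": [0.9, 0.1, 0.0],
    "Stock market closes higher": [0.0, 0.2, 0.9],
    "AI tool generates film clips": [0.8, 0.3, 0.1],
}
query_vector = [1.0, 0.2, 0.0]  # stands in for an embedded query

# Rank documents from most to least similar to the query
ranked = sorted(
    documents,
    key=lambda title: cosine_similarity(query_vector, documents[title]),
    reverse=True,
)
print(ranked)
```

The two AI-related titles rank above the finance title even though they share no keywords with each other, which is the behavior a keyword match would miss.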

### Create a Vector Store

In [5]:
vector_store = client.vector_stores.create(
    name="my_vector_store"
)
vector_store_id = vector_store.id
print(vector_store_id)

vs_691263fce7308191a5eda682ae982c96


### Upload Files

In [6]:
with open('tweet_text.json', 'rb') as f:
    file = client.files.create(
        file=f,            # file-like object
        purpose="assistants"
    )

file_id = file.id
print(file_id)

file-A8LN3MwsAMNztvMiy7Czd4


### Attach File to Vector Store

In [7]:
attach_status = client.vector_stores.files.create(
    vector_store_id=vector_store_id,
    file_id=file_id
)

print(attach_status.id)

file-A8LN3MwsAMNztvMiy7Czd4
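
A newly attached file may still be indexing when the next cell runs, in which case a search can return incomplete results. The sketch below polls the file's status before querying. The helper name `wait_until_indexed` and its timeout defaults are assumptions for illustration; it relies on `client.vector_stores.files.retrieve` returning an object with a `status` field.

```python
import time


def wait_until_indexed(client, vector_store_id, file_id, timeout_s=120.0, poll_s=2.0):
    """Poll a vector store file until it leaves the 'in_progress' state.

    Returns the final status string; raises TimeoutError if it never settles.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        vs_file = client.vector_stores.files.retrieve(
            file_id=file_id,
            vector_store_id=vector_store_id,
        )
        if vs_file.status != "in_progress":
            return vs_file.status  # e.g. "completed" or "failed"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{file_id} still in_progress after {timeout_s}s")
        time.sleep(poll_s)
```

In this notebook it could be called as `wait_until_indexed(client, vector_store_id, file_id)`, continuing only if the result is `"completed"`. The SDK also provides `client.vector_stores.files.create_and_poll`, which attaches and waits in one call.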


### Query the Vector Store

In [8]:
query = "the latest development in generativeAI"

In [9]:
search_results = client.vector_stores.search(
    vector_store_id=vector_store_id,
    query=query
)

for result in search_results.data[:5]:
    print(result.content[0].text[:100] + '\n Relevance score: ' + str(result.score))

They have been VERY clear that Chatgpt, these AI videos…"
  }
},
{
  "_id": {
    "$oid": "68e56b1e1
 Relevance score: 0.6760340526672118
They have been VERY clear that Chatgpt, these AI videos…"
  }
},
{
  "_id": {
    "$oid": "68e56b1f1
 Relevance score: 0.6010846063542131
Learning Plans. Use coupon code 𝐒𝐏𝐋𝟑𝟎 at checkout.🎯\n\nJoin Now 👉 https://t.co/LS2JuCrVmz\n\n#AI #Ce
 Relevance score: 0.5987929345462765
Create stunning, cinematic videos from a prompt, now with audio, physics, and cameos. One prompt = en
 Relevance score: 0.5911156596667607
They have been VERY clear that Chatgpt, these AI videos…"
  }
},
{
  "_id": {
    "$oid": "68e56b1d1
 Relevance score: 0.5630078302657743
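
Several of the hits above begin with the same tweet text. As a rough sketch, the results could be post-processed to keep one hit per distinct text prefix and to drop low-scoring matches. The function name, the prefix length, and the thresholds are illustrative assumptions; only the `.score` and `.content[0].text` attributes come from the cell above.

```python
def top_unique_results(results, min_score=0.0, limit=5, prefix_len=80):
    """Keep the highest-scoring hit for each distinct text prefix.

    `results` is any iterable of objects exposing `.score` and
    `.content[0].text`, as the search results above do.
    """
    seen = set()
    kept = []
    for result in sorted(results, key=lambda r: r.score, reverse=True):
        if result.score < min_score:
            break  # results are sorted, so the rest score lower still
        key = result.content[0].text[:prefix_len].strip().lower()
        if key in seen:
            continue  # near-duplicate of a higher-scoring hit
        seen.add(key)
        kept.append(result)
        if len(kept) == limit:
            break
    return kept
```

A call such as `top_unique_results(search_results.data, min_score=0.58)` would then return at most five distinct hits. Matching on a text prefix is a crude duplicate check; it treats chunks that merely start the same way as duplicates.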


## OpenAI Responses API

### Simple Response

In [10]:
simple_response = client.responses.create(
  model="gpt-4o",
  input=[
      {
          "role": "user",
          "content": query
      }
  ]
)

In [11]:
display(Markdown(simple_response.output_text))

As of the latest updates in 2023, several key developments in generative AI are worth noting:

1. **Multimodal Models**: There has been significant progress in multimodal AI systems that can understand and generate text, images, audio, and video simultaneously. These models are becoming more integrated, allowing for seamless transitions and understanding across different types of media.

2. **More Efficient Training**: Researchers have been focusing on reducing the resources required to train large AI models. This includes advancements in sparse models, where only parts of the network are activated at a time, reducing computation costs.

3. **Improved Text Understanding and Generation**: Language models, like GPT-4 and its successors, have become even more sophisticated in generating contextually relevant and coherent text. This includes better handling of nuanced tasks like complex reasoning and detailed summarization.

4. **Applications in Creative Fields**: Generative AI is being increasingly used in art, music, and design, with tools that assist artists in creating novel works or enhancing creativity. These applications allow for more interactive and dynamic collaboration between humans and AI.

5. **Ethical and Responsible AI**: There is a growing emphasis on developing AI responsibly, with more research into bias mitigation, transparency, and accountability in generative AI systems. Many organizations are publishing guidelines and frameworks to ensure ethical AI usage.

6. **Customized AI Solutions**: Tailoring AI models for specific industries or businesses is becoming more common. These specialized models are trained on domain-specific data, offering more relevant insights and solutions.

7. **Advancements in Large Language Models (LLMs)**: Larger and more capable LLMs are being released, with enhanced capabilities in understanding context and maintaining coherent long-form conversations.

These developments are rapidly evolving, and ongoing research continues to push the boundaries of what generative AI can achieve.

### File Search Response

In [12]:

file_search_response = client.responses.create(
    input=query,
    model="gpt-4o",
    temperature=0,
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store_id],
    }]
)

In [13]:
display(Markdown(file_search_response.output_text))


The latest developments in generative AI include several exciting advancements:

1. **OpenAI's Sora2**: This tool allows users to create cinematic videos from a prompt, now with added features like audio, physics, and cameos.

2. **Generative AI in Creative Industries**: AI is transforming creative fields by enabling the creation of content such as videos and music without traditional production resources.

3. **AI in Finance**: Generative AI is being used for chatbots, instant loan scoring, fraud detection, and robo-advisors, revolutionizing the finance sector.

4. **AI in Healthcare**: Generative AI is supporting doctors with data interpretation and preliminary analysis, enhancing patient care.

5. **AI in Marketing**: Generative AI is changing content marketing workflows, reducing costs, and improving results.

These developments highlight the diverse applications and transformative potential of generative AI across various industries.
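
The answer above does not show which chunks were retrieved. The sketch below builds the same kind of request with two extra options from the file search guide: `max_num_results` to cap how many chunks are retrieved, and `include=["file_search_call.results"]` to return those chunks with the response. The helper name and default values are assumptions; check the option names against the current documentation before relying on them.

```python
def build_file_search_request(query, vector_store_id, model="gpt-4o", max_num_results=5):
    """Return keyword arguments for client.responses.create with file search."""
    return {
        "model": model,
        "input": query,
        "tools": [
            {
                "type": "file_search",
                "vector_store_ids": [vector_store_id],
                "max_num_results": max_num_results,  # cap on retrieved chunks
            }
        ],
        # Ask for the retrieved chunks to be returned alongside the answer
        "include": ["file_search_call.results"],
    }
```

It would be used as `client.responses.create(**build_file_search_request(query, vector_store_id))`. Building the arguments as a plain dictionary keeps them easy to inspect or log before sending.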

## Web Search API

### Introduction to Web Search
The OpenAI Web Search tool allows models to retrieve real-time information from the internet. 
This capability is particularly useful for obtaining up-to-date data, fact-checking, and expanding knowledge 
without relying solely on pre-trained information. 

By leveraging OpenAI's web search functionality, the Responses API can fetch external data 
and provide accurate, relevant results in real time (OpenAI, 2025). 
This feature enhances applications that require the latest insights, such as news aggregation, research, 
or dynamic content generation.

For more details, visit the official OpenAI documentation: 
[Web Search in Responses API](https://platform.openai.com/docs/guides/tools-web-search).
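
The web search tool accepts optional settings in addition to its `type`. The sketch below assembles a tool definition with `search_context_size` and an approximate `user_location`. These option names are recalled from the web search guide and should be verified against the current documentation. The helper itself is hypothetical.

```python
ALLOWED_CONTEXT_SIZES = {"low", "medium", "high"}


def build_web_search_tool(context_size="medium", country=None, city=None):
    """Return a web search tool definition for the `tools` list."""
    if context_size not in ALLOWED_CONTEXT_SIZES:
        raise ValueError(f"context_size must be one of {sorted(ALLOWED_CONTEXT_SIZES)}")
    tool = {"type": "web_search", "search_context_size": context_size}
    # Only include location fields that were actually supplied
    location = {
        key: value
        for key, value in (("country", country), ("city", city))
        if value
    }
    if location:
        tool["user_location"] = {"type": "approximate", **location}
    return tool
```

For example, `tools=[build_web_search_tool("low", country="GB", city="London")]` would request a smaller search context localized to London.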

### Perform Web Search

In [14]:
web_search_response = client.responses.create(
    model="gpt-4o",  # or another supported model
    input=query,
    tools=[
        {
            "type": "web_search"
        }
    ]
)

In [16]:
display(Markdown(web_search_response.output_text))

Here’s a comprehensive, up-to-date overview of the **latest developments in generative AI as of November 10, 2025**, structured into key themes and innovations. This detailed summary exceeds 700 words and includes multiple citations per paragraph.

---

##  1. New Model Releases & Innovations

- **OpenAI’s GPT-5 (Launched August 7, 2025)**  
  OpenAI released GPT-5, a sophisticated multimodal foundation model, on August 7, 2025. It became accessible through ChatGPT, Microsoft Copilot, and the OpenAI API. The system features multiple internal models: fast (“main” and “main-mini”), reasoning (“thinking” and “thinking-mini”), and even a “thinking-nano” version for efficiency. A real-time router dynamically assigns tasks to the appropriate model, enhancing adaptability and performance. GPT-5 also brought agentic capabilities, such as autonomous desktop setups and web browsing to execute tasks.([en.wikipedia.org](https://en.wikipedia.org/wiki/GPT-5?utm_source=openai))

- **OpenAI’s o4-mini (Released April 16, 2025)**  
  Prior to GPT-5, OpenAI introduced o4-mini (and an enhanced “high” version) on April 16, 2025. These models support both text and image inputs, enabling analysis of whiteboard sketches and performing chain-of-thought reasoning. The high variant trades speed for accuracy. Applications span forecasting, healthcare diagnostics, and financial compliance.([en.wikipedia.org](https://en.wikipedia.org/wiki/OpenAI_o4-mini?utm_source=openai))

- **Open-Weight Models: gpt-oss-20b and gpt-oss-120b (Released August 2025)**  
  In a landmark move, OpenAI unveiled its first open-weight models since GPT-2: gpt-oss-20b and gpt-oss-120b. These text-based models support chain-of-thought reasoning and are compatible with local hardware. gpt-oss-20b can run on consumer PCs with Snapdragon processors or NVIDIA GPUs, barely needing powerful cloud infrastructure. The heavier gpt-oss-120b requires robust hardware (like RTX PRO). These models are accessible via platforms such as Hugging Face and Ollama.([windowscentral.com](https://www.windowscentral.com/artificial-intelligence/openai-chatgpt/openai-launches-two-gpt-models-theyre-not-gpt-5-but-they-run-locally-on-snapdragon-pcs-and-nvidia-rtx-gpus?utm_source=openai))

- **Alibaba’s Qwen Model Family (Latest: Qwen3-Next and Qwen3-Omni in September 2025; Thinking Mode in early November)**  
  Alibaba’s Qwen series has seen rapid expansion. In September 2025, the company released Qwen3-Next (augmented with hybrid attention, MoE architecture, and multi-token prediction) and Qwen3-Omni, a capable multimodal model handling text, images, audio, and video. Qwen3-Max preceded them (September 5), but “thinking mode” for deeper reasoning became publicly available in early November 2025.([en.wikipedia.org](https://en.wikipedia.org/wiki/Qwen?utm_source=openai))

- **Google DeepMind’s Nano Banana ("Gemini 2.5 Flash Image") (Released August 26, 2025)**  
  Google’s image generation tool, nickname “Nano Banana,” officially Gemini 2.5 Flash Image, launched August 26, 2025. Known for creating photorealistic “3D figurine” styles, it allows editing like hairstyle or backdrop changes with consistency and embedded SynthID watermarking. Integrated into Gemini app, Google AI Studio, and Vertex AI, it surged adoption, attracting over 10 million new users and handling more than 200 million edits in its initial weeks.([en.wikipedia.org](https://en.wikipedia.org/wiki/Nano_Banana?utm_source=openai))

---

##  2. Generative AI in Video & 3D Worlds

- **Runway’s Gen-4 (Released March 31, 2025)**  
  Runway Gen-4 is a text-to-video generative model, outputting 5–10 second video clips (720p, 24 fps) from text prompts and reference frames. It emphasizes character consistency, realistic motion, and camera simulation, though consistency across clips remains limited. It’s available via API and web with subscription-based access.([en.wikipedia.org](https://en.wikipedia.org/wiki/Gen-4_%28AI_image_and_video_model%29?utm_source=openai))

- **OpenAI’s Sora 2 and Sora App (Announced September 30, 2025)**  
  On September 30, 2025, OpenAI unveiled Sora 2, a video/audio generative model, and a social-style Sora app for sharing short AI-generated clips. The model supports synchronized dialogue, multi-shot consistency, and a feed-centric interface. The launch raised critical concerns around copyright, deepfake regulation, and content moderation.([champaignmagazine.com](https://champaignmagazine.com/2025/10/05/ai-by-ai-weekly-top-5-september-29-october-5-2025/?utm_source=openai))

- **DeepMind’s Genie 3 (Announced Summer 2025)**  
  DeepMind introduced Genie 3, capable of generating interactive, physics-aware 3D environments, and releasing them as explorable worlds at real-time frame rates. These environments retain object memory and consistency, acting more like dynamic simulations than static renders, a noteworthy advance toward AI in game development.([scrumlaunch.com](https://www.scrumlaunch.com/blog/top-5-breakthrough-ai-launches-summer-2025?utm_source=openai))

- **Meta’s AssetGen 2.0 (Announced mid-April 2025)**  
  Meta’s AssetGen 2.0 enhances text- and image-conditioned 3D asset generation via a single-stage 3D diffusion model and a TextureGen component for high-res textures. Initially used in-house, it will soon enable creators on Horizon to generate 3D scenes via prompts.([linkedin.com](https://www.linkedin.com/pulse/top-5-generative-ai-news-updates-from-week-20-2025-11th-shankar-u4wbc?utm_source=openai))

---

##  3. Agentic and Autonomous AI Systems

- **Rise of Fully Autonomous AI Agents**  
  In 2025, artificial agents evolved from rule-based systems into strategic operators, automating complex tasks like contract negotiation, marketing strategy planning, and even end-to-end product development. These agents adapt dynamically to their environment, fundamentally shifting human-AI collaboration in enterprises.([vocal.media](https://vocal.media/education/5-major-generative-ai-breakthroughs-that-are-shaping-2025?utm_source=openai))

- **Anthropic’s Claude 3.7 “Sonnet” with Hybrid Reasoning Modes**  
  Claude 3.7 “Sonnet,” released in early 2025, supports hybrid modes: near-instant responses or extended reasoning where the model self-reflects step-by-step. Users can control effort vs. speed, with extended reasoning handling up to 128K tokens.([ts2.tech](https://ts2.tech/en/generative-ai-revolution-2025-breakthroughs-industry-disruption-and-predictions-through-2035/?utm_source=openai))

---

##  4. Hardware & Efficiency Improvements

- **Model Efficiency, Cost Reduction, & Open-Source Accessibility**  
  Innovations in model architecture have made AI more efficient and accessible. Open-source models now challenge proprietary ones, requiring fewer compute resources and benefiting startups and researchers. Synthetic data is widely used for training, improving cost and privacy.([techinnoai.com](https://techinnoai.com/artificial-intelligence-machine-learning/generative-ai-in-2025-breakthroughs-real-world-applications/?utm_source=openai))

- **Edge AI and Privacy:**  
  Apple, among others, is prioritizing on-device generative AI, integrating features like Genmoji (custom emojis), Visual Intelligence (photo analysis), and enhanced writing tools across iOS/macOS, setting trends for privacy-focused AI deployment.([launchconsulting.com](https://www.launchconsulting.com/posts/may-2025-ai-breakthroughs-what-every-business-leader-needs-to-know?utm_source=openai))

- **Hardware for Generative AI (Upcoming)**  
  Qualcomm announced AI inference accelerators (AI200 and AI250), scheduled for release in 2026 and 2027. These Hexagon-based NPUs support advanced AI workloads with micro-tile inferencing and Gen AI model encryption, targeting improved efficiency and scale in data centers.([tomshardware.com](https://www.tomshardware.com/tech-industry/artificial-intelligence/qualcomm-unveils-ai200-and-ai250-ai-inference-accelerators-hexagon-takes-on-amd-and-nvidia-in-the-booming-data-center-realm?utm_source=openai))

---

##  5. Cross-Domain & Scientific Generative AI

- **Diffusion Models for Protein Design**  
  A recent research review highlights the role of diffusion models like RFDiffusion in de novo protein design, surpassing traditional methods in success rate and reducing experimental cost across over 25 tasks.([arxiv.org](https://arxiv.org/abs/2504.16479?utm_source=openai))

- **Generative AI in Multimedia & Semantic Communication**  
  Another study introduces a semantic information-theoretic framework for generative multimedia communication. Concepts like semantic entropy and semantic channel capacity were adapted to improve fidelity in human perception, charting future research directions.([arxiv.org](https://arxiv.org/abs/2508.17163?utm_source=openai))

- **Reframing Evolutionary Computation as Generative AI (NatGenAI)**  
  A new perspective treats evolutionary computation as a form of "Natural Generative AI," allowing exploration beyond training data via search-based, disruptive evolutionary mechanisms, fostering creative generation beyond statistical learning.([arxiv.org](https://arxiv.org/abs/2510.08590?utm_source=openai))

---

##  Summary of Key Themes

- The **landscape of generative AI** is evolving rapidly with new models like GPT-5, Qwen3-Next/Omni, Nano Banana, and svelte open-weight variants pushing the frontiers across multimodal, efficient, and localized use.
- **Video-, 3D-, and interactive content** generation is gaining momentum through innovations like Runway Gen-4, Sora 2, Genie 3, and AssetGen 2.0, extending AI from static content to immersive worlds.
- **Autonomous AI agents** are becoming strategically operative, as seen in Claude 3.7 and other agentic frameworks.
- **Hardware improvements** and **efficiency**, from edge AI on devices to high-performance data center cores, strengthen the infrastructure supporting generative systems.
- **Scientific and theoretical advances** expand generative AI’s reach into molecular biology, semantic communication, and algorithmic innovation.

---

In conclusion, **as of November 2025**, generative AI is experiencing an extraordinary surge in capacity, accessibility, and application range, from deep reasoning and creativity to real-time interaction and scientific discovery. Let me know if you'd like a deeper dive into any of these developments!
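
The links in the answer above are also available in structured form. As a sketch, the helper below walks a response's `output` items and gathers the `url_citation` annotations attached to the message text. The helper name is hypothetical, and the attribute layout (`output` → `content` → `annotations` with `title` and `url`) reflects the response shape as I understand it, so it is worth checking against a real response object.

```python
def collect_url_citations(response):
    """Return unique (title, url) pairs cited in a response's output text."""
    citations = []
    seen_urls = set()
    for item in response.output:
        if getattr(item, "type", None) != "message":
            continue  # skip tool-call items such as web_search_call
        for part in item.content:
            for annotation in getattr(part, "annotations", None) or []:
                if getattr(annotation, "type", None) != "url_citation":
                    continue
                if annotation.url not in seen_urls:
                    seen_urls.add(annotation.url)
                    citations.append((annotation.title, annotation.url))
    return citations
```

Calling `collect_url_citations(web_search_response)` would give a deduplicated source list that can be displayed or stored separately from the prose.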

### Stateful Response

The OpenAI Responses API includes a stateful feature that enables continuity in interactions. 
A stored response can be fetched again by its ID with `client.responses.retrieve`, and passing that ID 
as `previous_response_id` in a new request continues the conversation from that point, 
allowing users to refine or expand upon previous searches. This is particularly useful for iterative research, 
dynamic content generation, and applications that require follow-up queries based on prior responses.

In [17]:
fetched_response = client.responses.retrieve(response_id=web_search_response.id)
display(Markdown(fetched_response.output_text[:100]))

Here’s a comprehensive, up-to-date overview of the **latest developments in generative AI as of Nove

### Continue Query with Web Search

In [18]:
continue_query = 'find different news'

continue_search_response = client.responses.create(
    model="gpt-4o",  # or another supported model
    input=continue_query,
    previous_response_id=web_search_response.id,
    tools=[
        {
            "type": "web_search"
        }
    ]
)

In [19]:
display(Markdown(continue_search_response.output_text))

Here are several **recent and diverse developments in generative AI**, drawing on trusted news sources. Each paragraph includes at least two citations and is structured for clarity and depth.

---

Google’s push to support enterprise AI environments continues to deepen. Just a few days ago, Google announced enhancements to its **Vertex AI Agent Builder**, introducing new observability and deployment tools designed to help companies deploy AI agents more reliably and at scale. This move strengthens Google's competitive position against Microsoft and AWS in the enterprise AI market([infoworld.com](https://www.infoworld.com/article/4085736/google-boosts-vertex-ai-agent-builder-with-new-observability-and-deployment-tools.html?utm_source=openai)).

---

Meanwhile, the **semiconductor industry** is experiencing what many are calling a “silicon supercycle.” Driven by surging demand for generative AI workloads, data centers are driving unprecedented demand for cutting-edge chips. As a result, chipmakers find themselves at a pivotal growth inflection point reshaping the global technology and economic landscape([markets.financialcontent.com](https://markets.financialcontent.com/wral/article/tokenring-2025-11-10-the-silicon-supercycle-how-ai-data-centers-are-forging-a-new-era-for-semiconductors?utm_source=openai)).

---

In the pharmaceutical and biotech arena, **Insilico Medicine** unveiled a breakthrough in generative AI-powered drug discovery. On November 7, 2025, the company announced a portfolio of newly discovered cardiometabolic disease molecules developed using generative AI models, marking a significant stride in clinical-stage AI innovation([eurekalert.org](https://www.eurekalert.org/news-releases/1105110?utm_source=openai)).

---

Over in academia, **the University of Sydney’s Informatics Hub** is helping Australia's researchers adopt generative AI tools more effectively. This initiative provides guidance and frameworks for researchers navigating the expanding (but potentially risky) opportunities generative AI presents in academia([sydney.edu.au](https://www.sydney.edu.au/news-opinion/news/2025/11/04/sydney-informatics-hub-unlocks-generative-ai-for-research.html?utm_source=openai)).

---

Across the Atlantic, experts in the **UK are sounding an alarm** over a new societal side-effect: "AI-powered nimbyism." Advanced scanning tools are increasingly being used by citizens to identify grounds for objecting to planning applications, potentially undermining government housebuilding efforts([theguardian.com](https://www.theguardian.com/politics/2025/nov/09/ai-powered-nimbyism-could-grind-uk-planning-system-to-a-halt-experts-warn?utm_source=openai)).

---

These developments span enterprise deployment tools, semiconductor market dynamics, drug discovery, academic adoption, and societal implications of AI, reflecting the breadth and complexity of generative AI’s current impact. If you’d like a detailed deep dive into any of these themes, or to explore additional fronts like model launches, regulation, or global competition, just say the word!
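
The follow-up above chains one request to a previous response. The same pattern extends to a sequence of questions, as in the sketch below, where each request passes the ID of the response before it. The helper name `ask_in_sequence` is an assumption for illustration; `previous_response_id`, `id`, and `output_text` are the fields used in the cells above.

```python
def ask_in_sequence(client, questions, model="gpt-4o", tools=None):
    """Send questions one after another, linking each to the previous response."""
    previous_id = None
    answers = []
    for question in questions:
        kwargs = {"model": model, "input": question}
        if tools:
            kwargs["tools"] = tools
        if previous_id is not None:
            # Continue the conversation from the last response
            kwargs["previous_response_id"] = previous_id
        response = client.responses.create(**kwargs)
        previous_id = response.id
        answers.append(response.output_text)
    return answers
```

For instance, `ask_in_sequence(client, [query, "find different news"], tools=[{"type": "web_search"}])` would reproduce the two-step exchange shown in this section.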

### Combining File Search and Web Search

This example uses file search to analyze private data and web search to retrieve public, up-to-date information. 
The Responses API allows developers to integrate these tools to enhance retrieval-augmented generation (RAG) applications. 
By combining file search with web search, users can draw on internal knowledge while also retrieving real-time 
information from external sources, producing more comprehensive and current responses. 

In [20]:
combined_search_response = client.responses.create(
    model="gpt-4o",  # or another supported model
    input=query,
    temperature=0,
    instructions="Retrieve the results from the file search first, and use the web search tool to expand the results with news resources",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store_id],
    },
        {
            "type": "web_search"
        }
    ]
)
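
The `instructions` text asks the model to use both tools, but the model decides which tools it actually calls. One way to check, sketched below, is to count the item types in the response's `output` list. The helper names are hypothetical, and the type names `file_search_call` and `web_search_call` are the item types I expect for these tools, so confirm them against an actual response.

```python
from collections import Counter


def summarize_output_items(response):
    """Count output items by type, e.g. file_search_call, web_search_call, message."""
    return Counter(item.type for item in response.output)


def used_both_searches(response):
    """True if the response contains at least one file search and one web search call."""
    counts = summarize_output_items(response)
    return counts["file_search_call"] > 0 and counts["web_search_call"] > 0
```

Running `used_both_searches(combined_search_response)` after the request indicates whether the answer drew on both the private vector store and the web.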

In [21]:
display(Markdown(combined_search_response.output_text))

Here’s a comprehensive and up-to-date overview of the **latest developments in generative AI** as of **November 10, 2025**, with detailed citations and structured analysis:

---

##  Major Model Releases and Enhancements

- **OpenAI GPT-5**  
  Released on **August 7, 2025**, GPT-5 is a multimodal foundation model accessible via ChatGPT and Microsoft Copilot, offering state-of-the-art performance across text, image, and reasoning tasks ([en.wikipedia.org](https://en.wikipedia.org/wiki/GPT-5?utm_source=openai)).  
  It features a massive **256,000-token context window**, a “thinking mode” for planning and error correction, and seamless integration of text, image, and voice modalities. Early benchmarks show it outperforms GPT-4 by over 40% on complex logical reasoning tasks ([scrumlaunch.com](https://www.scrumlaunch.com/blog/top-5-breakthrough-ai-launches-summer-2025?utm_source=openai)).

- **OpenAI o4-mini**  
  Launched on **April 16, 2025**, this lightweight, multimodal reasoning model supports both text and image inputs, including whiteboard sketch analysis and chain-of-thought reasoning. A higher-accuracy variant, o4-mini-high, is available to paid users ([en.wikipedia.org](https://en.wikipedia.org/wiki/OpenAI_o4-mini?utm_source=openai)).

- **OpenAI GPT-4.1**  
  Released on **April 14, 2025**, GPT-4.1 (along with mini and nano variants) focuses on improved coding capabilities and is available via API and ChatGPT Plus/Pro tiers ([en.wikipedia.org](https://en.wikipedia.org/wiki/GPT-4.1?utm_source=openai)).

- **Anthropic Claude 3.7 Sonnet**  
  Introduced in **February 2025**, this hybrid reasoning model includes an “extended thinking mode” that enables self-reflection before responding, improving performance in math, physics, coding, and instruction-following tasks. It’s available across all Claude plans, with extended thinking reserved for paid tiers ([reuters.com](https://www.reuters.com/technology/artificial-intelligence/anthropic-launches-advanced-ai-hybrid-reasoning-model-2025-02-24/?utm_source=openai)).

- **OpenAI gpt-oss-120b and gpt-oss-20b**  
  Released in **August 2025**, these are OpenAI’s first open-weight models since GPT-2. They support chain-of-thought reasoning and can run locally: gpt-oss-20b on consumer hardware (e.g., PCs with 16 GB RAM or Snapdragon processors), and gpt-oss-120b on more powerful setups like NVIDIA RTX GPUs ([windowscentral.com](https://www.windowscentral.com/artificial-intelligence/openai-chatgpt/openai-launches-two-gpt-models-theyre-not-gpt-5-but-they-run-locally-on-snapdragon-pcs-and-nvidia-rtx-gpus?utm_source=openai)).  
  AWS has made these models available via Bedrock and SageMaker, offering significant cost-efficiency: three times cheaper than Gemini, five times cheaper than DeepSeek-R1, and twice as efficient as OpenAI’s o4 model ([timesofindia.indiatimes.com](https://timesofindia.indiatimes.com/technology/tech-news/amazon-announces-first-ever-availability-of-openai-models-for-its-cloud-customers-company-says-the-addition-of-/articleshow/123125170.cms?utm_source=openai)).

---

##  Google DeepMind and Gemini Ecosystem

- **Gemini 2.5 Pro & Flash**  
  Google’s most advanced models, Gemini 2.5 Pro and Flash, were released in early 2025. Gemini 2.5 Pro features enhanced reasoning, coding, and a “Deep Think” mode for complex tasks, while Flash offers faster responses. Both support native audio output and improved security ([en.wikipedia.org](https://en.wikipedia.org/wiki/Gemini_%28language_model%29?utm_source=openai)).

- **Nano Banana (Gemini 2.5 Flash Image)**  
  Launched on **August 26, 2025**, this image generation and editing model went viral for its photorealistic “3D figurine” outputs. It supports subject consistency, multi-image fusion, and includes SynthID watermarking for AI-generated content identification. It quickly attracted over 10 million new users and facilitated more than 200 million image edits ([en.wikipedia.org](https://en.wikipedia.org/wiki/Nano_Banana?utm_source=openai)).

---

##  Multimedia and Creative AI Innovations

- **Veo 3 (Google)**  
  Released in **June 2025**, Veo 3 is a text-to-video model that generates video with synchronized sound effects, ambient noise, and dialogue, eliminating a major post-production hurdle ([aiinsight.blog](https://aiinsight.blog/generative-ai-advancements-in-2025?utm_source=openai)).

- **OpenAI Sora**  
  Integrated into ChatGPT for Plus users in **May 2025**, Sora enables conversational video generation, allowing users to refine visual outputs within the chat interface ([aiinsight.blog](https://aiinsight.blog/generative-ai-advancements-in-2025?utm_source=openai)).

- **Midjourney V7**  
  Launched in **May 2025**, this update introduces “NeRF-like” 3D model generation and text-to-video capabilities, offering artists a unified workflow for still images, video, and 3D assets ([aiinsight.blog](https://aiinsight.blog/generative-ai-advancements-in-2025?utm_source=openai)).

- **Runway Gen-4**  
  Released in **April 2025**, Gen-4 includes a “Character Lock” feature that maintains a character’s appearance across multiple generated scenes, highly requested by creators ([aiinsight.blog](https://aiinsight.blog/generative-ai-advancements-in-2025?utm_source=openai)).

- **Stability AI Stable Audio 2.0**  
  Released in **March 2025**, this update enables generation of longer, more coherent musical pieces (up to five minutes), with advanced controls for structure, instrumentation, and emotional tone ([aiinsight.blog](https://aiinsight.blog/generative-ai-advancements-in-2025?utm_source=openai)).

---

##  Broader Trends and Infrastructure

- **Unified Multimodal Models**  
  2025 has seen a shift toward models that seamlessly handle text, image, video, and audio in a unified architecture, improving efficiency and context-awareness ([aiinsight.blog](https://aiinsight.blog/generative-ai-advancements-in-2025?utm_source=openai)).

- **Open-Source and Efficient Models**  
  Open-source models like Hugging Face’s SmolLM3 (3B parameters, 128K context window) and Liquid AI’s LFM2 series (350M–1.2B parameters) are gaining traction for their efficiency, accessibility, and strong performance ([linkedin.com](https://www.linkedin.com/pulse/top-generative-ai-updates-week-july-2-2025-kalyan-ks-oorrc?utm_source=openai)).

- **Agentic AI and Autonomous Agents**  
  Generative AI is evolving toward fully autonomous agents capable of strategic decision-making, adapting to feedback, and managing complex tasks independently, transforming how businesses operate ([vocal.media](https://vocal.media/education/5-major-generative-ai-breakthroughs-that-are-shaping-2025?utm_source=openai)).

- **Hardware and Infrastructure Advances**  
  Qualcomm recently announced its AI200 and AI250 inference accelerators, targeting data center workloads with features like micro-tile inferencing, Gen AI model encryption, and high memory capacity. These are slated for release in 2026 and 2027 ([tomshardware.com](https://www.tomshardware.com/tech-industry/artificial-intelligence/qualcomm-unveils-ai200-and-ai250-ai-inference-accelerators-hexagon-takes-on-amd-and-nvidia-in-the-booming-data-center-realm?utm_source=openai)).

---

##  Summary Table

| Category                  | Key Developments                                                                 |
|---------------------------|----------------------------------------------------------------------------------|
| **Large Language Models** | GPT-5, o4-mini, GPT-4.1, Claude 3.7 Sonnet, gpt-oss models                       |
| **Multimodal & Creative** | Gemini 2.5 Pro/Flash, Nano Banana, Veo 3, Sora, Midjourney V7, Runway Gen-4      |
| **Open-Source & Efficiency** | SmolLM3, LFM2 series, open-weight gpt-oss models                              |
| **Agentic AI**            | Autonomous agents with strategic capabilities                                   |
| **Infrastructure**        | Qualcomm AI200/AI250 accelerators                                               |

---

##  Final Thoughts

As of **November 10, 2025**, generative AI is advancing rapidly across multiple dimensions:

- **Multimodality** is becoming the norm, with models like GPT-5 and Gemini 2.5 handling text, image, audio, and video seamlessly.
- **Reasoning and autonomy** are central, with models like Claude 3.7 Sonnet and GPT-5's thinking mode enabling deeper, more reliable outputs.
- **Accessibility** is improving through open-weight models and efficient open-source alternatives.
- **Creative tools** are maturing, enabling video, 3D, and audio generation with unprecedented ease.
- **Hardware innovation** is keeping pace, ensuring that infrastructure can support these powerful models.

Let me know if you'd like a deeper dive into any specific model, company, or application area!

# 🧩 Try It Yourself: Two-Step RAG (Private Data + Combined Search)

## Step 1: Upload & Create Vector Store
1. Upload a short text file (e.g., `my_notes.txt`) to your notebook instance.  
2. Create a **vector store** and **ingest** your uploaded file.  
3. Run a simple test query to verify retrieval:  

In [28]:
vector_store = client.vector_stores.create(
    name="my_vector_store"
)
vector_store_id = vector_store.id
print(vector_store_id)

with open('Intel Report MLAI.txt', 'rb') as f:
    file = client.files.create(
        file=f,            # file-like object
        purpose="assistants"
    )

file_id = file.id
print(file_id)

attach_status = client.vector_stores.files.create(
    vector_store_id=vector_store_id,
    file_id=file_id
)

print(attach_status.id)


vs_69126be4ff48819187617bd0a994b745
file-DfRnmBBdrsw7Zwfz5LEQ83
file-DfRnmBBdrsw7Zwfz5LEQ83
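
Attaching a file starts ingestion (chunking and embedding) in the background, so a search issued immediately afterwards may return nothing. The sketch below shows one way to wait for ingestion before querying. The helper name `wait_until_ingested` and its parameters are hypothetical, and the status strings (`in_progress`, `completed`, `failed`, `cancelled`) are assumed to match what the vector store file object reports.

```python
import time

# Statuses after which polling should stop (assumed values, see lead-in).
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def wait_until_ingested(get_status, timeout_s=60.0, interval_s=1.0, sleep=time.sleep):
    """Call get_status() repeatedly until it returns a terminal status.

    Raises TimeoutError if no terminal status is seen within timeout_s.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
        waited += interval_s

# Offline demo with a fake status source, so no API call is made here.
fake_statuses = iter(["in_progress", "in_progress", "completed"])
print(wait_until_ingested(lambda: next(fake_statuses), sleep=lambda _: None))
```

In the notebook, `get_status` could be something like `lambda: client.vector_stores.files.retrieve(vector_store_id=vector_store_id, file_id=file_id).status`. The SDK also appears to offer a `create_and_poll` variant of the attach call, which may make a hand-written loop unnecessary.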


In [29]:
query = "what is the article about?"

In [30]:
search_results = client.vector_stores.search(
    vector_store_id=vector_store_id,
    query=query
)

for result in search_results.data[:5]:
    print(result.content[0].text[:100] + '\n Relevant score: ' + str(result.score))

Summary

Since the December 2024 collapse of the government of Bashar Al Asad, Syrians have pursue
 Relevant score: 0.4050559436347581
security support since 2015. The SDF remains the principal U.S. partner in the fight against the Isl
 Relevant score: 0.3629165098590976
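
The raw printout above breaks across lines because the chunk text contains newlines, and `result.content[0]` would raise an error for a hit with no content. A small formatting helper can make the test query easier to read. This is only a sketch: `format_hits` is a hypothetical name, and it assumes each hit exposes `score` and a list of `content` parts with a `text` attribute, as in the cell above. The demo uses stand-in objects rather than a live search.

```python
from types import SimpleNamespace

def format_hits(results, limit=5, width=100):
    """Return one 'score  snippet' line per search hit."""
    lines = []
    for hit in list(results)[:limit]:
        # A hit may carry several content parts, or none at all.
        text = " ".join(part.text for part in (hit.content or []))
        # Collapse newlines and repeated spaces, then truncate.
        snippet = " ".join(text.split())[:width] or "(no text)"
        lines.append(f"{hit.score:.3f}  {snippet}")
    return lines

# Stand-in hits shaped like the search results used above.
demo_hits = [
    SimpleNamespace(score=0.4050559,
                    content=[SimpleNamespace(text="Summary\n\nSince the December 2024 collapse")]),
    SimpleNamespace(score=0.3629165, content=[]),
]
for line in format_hits(demo_hits):
    print(line)
```

With the real results, the call would be `format_hits(search_results.data)`.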


## Step 2: Combine File Search with Web Search
1. Enable both **file_search** and **web_search** in the Responses API.  
2. Use a prompt that asks the model to merge insights from both sources.  
   > Example: "Using my uploaded notes and the latest web information, summarize the current trends on this topic."  
3. Review how the answer combines **context** from your file with **current info** from the web.

✅ You've created a RAG system that combines **private** and **public** data for comprehensive, up-to-date analysis.
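
The steps above come down to what goes into the `tools` list of the request. As a minimal sketch, a helper like the one below can switch each source on or off, which makes it easy to compare file-only, web-only, and combined answers. `build_tools` is a hypothetical name; the `file_search` and `web_search` entries mirror the tool dictionaries used in this notebook, and `max_num_results` is assumed to be an optional `file_search` setting.

```python
def build_tools(vector_store_ids=None, use_web=False, max_num_results=None):
    """Build the `tools` list for a Responses API request."""
    tools = []
    if vector_store_ids:
        file_tool = {"type": "file_search", "vector_store_ids": list(vector_store_ids)}
        if max_num_results is not None:
            # Assumed optional cap on retrieved chunks (see lead-in).
            file_tool["max_num_results"] = max_num_results
        tools.append(file_tool)
    if use_web:
        tools.append({"type": "web_search"})
    return tools

# Placeholder vector store ID for illustration only.
print(build_tools(["vs_example"], use_web=True))
```

The result can be passed directly as `tools=build_tools([vector_store_id], use_web=True)` in `client.responses.create`.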


In [32]:
simple_response = client.responses.create(
  model="gpt-4o",
  input=[
      {
          "role": "user",
          "content": query
      }
  ]
)

In [33]:
display(Markdown(simple_response.output_text))

Sure, I can help with that. Could you provide some details or key points from the article?

In [34]:
file_search_response = client.responses.create(
    input=query,
    model="gpt-4o",
    temperature=0,
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store_id],
    }]
)

In [35]:
display(Markdown(file_search_response.output_text))

The article discusses the situation in Syria following the collapse of Bashar Al Asad's government in December 2024. It covers the political and economic opportunities that have arisen, as well as the challenges posed by internal tensions and external pressures. The interim government, led by Ahmed Al Sharaa, is working on a transitional constitutional framework and planning elections. However, the government does not control all of Syria, with various regions under different authorities, including Kurdish-led forces and Druze communities. The article also touches on the involvement of neighboring countries like Turkey and Israel, and the role of the United States and European Union in supporting the interim government through sanctions relief and dialogue with Kurdish forces.

In [36]:
web_search_response = client.responses.create(
    model="gpt-4o",  # or another supported model
    input=query,
    tools=[
        {
            "type": "web_search"
        }
    ]
)

In [37]:
display(Markdown(web_search_response.output_text))

Could you please specify which article you're referring to or provide more details about it?

In [38]:
combined_search_response = client.responses.create(
    model="gpt-4o",  # or another supported model
    input=query,
    temperature=0,
    instructions="Using my uploaded notes and the latest web information, summarize the current trends on this topic.",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store_id],
    },
        {
            "type": "web_search"
        }
    ]
)

In [39]:
display(Markdown(combined_search_response.output_text))

The article discusses the situation in Syria following the collapse of Bashar Al Asad's government in December 2024. It highlights the political and economic opportunities that have emerged, as well as the challenges posed by internal tensions and external pressures. The interim government, led by Ahmed Al Sharaa, is navigating a complex transition with plans for elections and a new constitutional framework. However, the government struggles to maintain control over the entire country, with various regions under different authorities, including Kurdish-led forces and Druze communities. The involvement of foreign powers like Turkey and Israel further complicates the situation. The U.S. and EU have extended sanctions relief to support the interim government, but economic recovery is hindered by longstanding issues.
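
The combined answer reads much like the file-only answer, so it is worth checking which tools the model actually called and which sources it cited. The sketch below walks a response's `output` items to do that. `summarize_sources` is a hypothetical helper, and the item types (`file_search_call`, `web_search_call`, `message`) and annotation types (`file_citation`, `url_citation`) are assumptions about the Responses output format that should be verified against the SDK version in use. The demo runs on a stand-in object, not a real response.

```python
from collections import Counter
from types import SimpleNamespace

def summarize_sources(response):
    """Return (tool call counts, list of (kind, source)) for a response."""
    calls = Counter()
    citations = []
    for item in response.output:
        if item.type.endswith("_call"):
            calls[item.type] += 1  # e.g. file_search_call, web_search_call
        elif item.type == "message":
            for part in item.content:
                for ann in getattr(part, "annotations", None) or []:
                    if ann.type == "url_citation":
                        citations.append(("web", ann.url))
                    elif ann.type == "file_citation":
                        # Prefer the filename, fall back to the file ID.
                        name = getattr(ann, "filename", None) or getattr(ann, "file_id", None)
                        citations.append(("file", name))
    return dict(calls), citations

# Stand-in response shaped like the assumed output format.
fake_response = SimpleNamespace(output=[
    SimpleNamespace(type="file_search_call"),
    SimpleNamespace(type="message", content=[
        SimpleNamespace(type="output_text", annotations=[
            SimpleNamespace(type="file_citation", file_id="file-abc", filename="notes.txt"),
        ]),
    ]),
])
print(summarize_sources(fake_response))
```

Running `summarize_sources(combined_search_response)` in the notebook would show whether a `web_search_call` appears at all. If it does not, the model answered from the file alone, and a more explicit prompt may be needed.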