# Multimodal Capabilities of Gemini, Claude, and GPT-4

This notebook demonstrates capabilities of three AI model families, Gemini, Claude, and GPT-4, across text, images, audio, and video. It walks through ten tasks, one per domain. The video, audio, and multimodal reasoning cells are placeholders that send only text; they need real media inputs before they exercise those modalities.

## Setup

First, let's install the necessary libraries and set up API keys. Note: You'll need to replace the placeholder API keys with your actual keys.

In [None]:
!pip install google-generativeai openai anthropic

import os
import google.generativeai as genai
import openai
from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

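# Set up API keys. setdefault keeps any key already exported in the environment
# and only falls back to the placeholder (replace with your actual keys).
os.environ.setdefault('GOOGLE_API_KEY', 'your_google_api_key')
os.environ.setdefault('OPENAI_API_KEY', 'your_openai_api_key')
os.environ.setdefault('ANTHROPIC_API_KEY', 'your_anthropic_api_key')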

# Initialize clients
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
openai.api_key = os.environ['OPENAI_API_KEY']
anthropic = Anthropic(api_key=os.environ['ANTHROPIC_API_KEY'])
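
Hard-coded keys are easy to leak when a notebook is shared. A minimal sketch of an alternative, assuming the keys are already exported in the shell that launched the notebook. `require_key` is a hypothetical helper, not part of any of the three SDKs.

```python
import os


def require_key(name: str) -> str:
    """Return the value of environment variable `name`, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    # Treat unset, empty, and 'your_...' placeholder values as missing.
    if not value or value.startswith("your_"):
        raise RuntimeError(
            f"Set the {name} environment variable before running this notebook."
        )
    return value
```

It could then be used in place of the direct lookups, for example `genai.configure(api_key=require_key('GOOGLE_API_KEY'))`.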

## 1. Code Generation and Analysis (Gemini)

Gemini excels at understanding and generating code across multiple programming languages.

In [None]:
code_prompt = """
Generate a Python function that implements a basic neural network 
using NumPy. The function should take input data, hidden layer size, 
and output size as parameters.
"""

model = genai.GenerativeModel('gemini-pro')
response = model.generate_content(code_prompt)
print(response.text)

## 2. Image Analysis and Generation (GPT-4 and DALL-E 3)

GPT-4 with vision can analyze images and describe them in detail. Image generation goes through the separate Images API, here with DALL-E 3, from a text prompt.

In [None]:
# Image analysis (replace image_url with an actual image URL)
image_url = 'https://example.com/sample_image.jpg'
analysis_prompt = "Analyze this image and describe its contents in detail."

# openai>=1.0 replaced openai.ChatCompletion with openai.chat.completions
response = openai.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": analysis_prompt},
            # The image reference is an object with a "url" field
            {"type": "image_url", "image_url": {"url": image_url}},
        ]}
    ],
    max_tokens=300,  # cap the length of the description
)
print(response.choices[0].message.content)

# Image generation with DALL-E 3 through the Images API
generation_prompt = "Create an image of a futuristic city with flying cars and holographic billboards."
response = openai.images.generate(model="dall-e-3", prompt=generation_prompt, n=1, size="1024x1024")
print(f"Generated image URL: {response.data[0].url}")
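
The image part of a vision request is a small nested structure, and a local file can be sent inline as a base64 data URL instead of a hosted link. A minimal sketch of both cases, following the message shape documented for image inputs in OpenAI's chat completions API. `data_url` and `vision_message` are hypothetical helper names.

```python
import base64
import mimetypes


def data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"{path} does not look like an image file")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def vision_message(text: str, url: str) -> dict:
    """Build one user message holding a text part and an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": url}},
        ],
    }
```

The returned dictionary can be passed as the single element of `messages`. For a local file, `vision_message(prompt, data_url('photo.jpg'))` would build the same structure with the image embedded.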

## 3. Natural Language Processing (Claude)

Claude handles a range of NLP tasks. The cell below asks for sentiment analysis, named entity recognition, and key information extraction on a single sentence.

In [None]:
nlp_prompt = """
Perform the following NLP tasks on this text:
'Apple Inc. announced today that its new iPhone 15 Pro has revolutionized mobile photography with its advanced AI-powered camera system.'

1. Sentiment Analysis
2. Named Entity Recognition
3. Key Information Extraction
"""

response = anthropic.completions.create(
    model="claude-2",
    prompt=f"{HUMAN_PROMPT} {nlp_prompt}{AI_PROMPT}",
    max_tokens_to_sample=300
)
print(response.completion)
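
The legacy completions endpoint takes a single string in which turns are marked by the SDK constants `HUMAN_PROMPT` and `AI_PROMPT`. The sketch below reproduces that formatting without the SDK so the final string is visible. The two constants are redefined locally with the values the `anthropic` package exports, and `format_prompt` is a hypothetical helper.

```python
# Local copies of the turn markers exported by the anthropic package.
HUMAN_PROMPT = "\n\nHuman:"
AI_PROMPT = "\n\nAssistant:"


def format_prompt(user_text: str) -> str:
    """Wrap user text in the Human/Assistant turn markers."""
    # strip() drops the leading and trailing newlines of a triple-quoted prompt.
    return f"{HUMAN_PROMPT} {user_text.strip()}{AI_PROMPT}"


print(repr(format_prompt("Summarize this sentence.")))
```

Stripping the text is a small departure from the cell above, which passes the triple-quoted prompt through unchanged.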

## 4. Video Analysis (Gemini)

Gemini can analyze video content, extracting information about scenes, actions, and objects.

In [None]:
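# Note: This is a placeholder. No video is attached, so the model receives only the text below.
# A real run must pass the video itself; gemini-pro-vision expects media input and may reject a text-only request.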
video_prompt = """
Analyze the following video and provide a detailed description of its contents, 
including key scenes, actions, and any text or speech present.

[Video content would be provided here]
"""

model = genai.GenerativeModel('gemini-pro-vision')
response = model.generate_content(video_prompt)
print(response.text)

## 5. Audio Transcription and Analysis (GPT-4)

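GPT-4 can analyze transcribed audio. The chat endpoint used below accepts text only, so the audio must first be converted to text with a speech-to-text model such as OpenAI's Whisper.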

In [None]:
# Note: This is a placeholder. No audio is attached; a real run would transcribe
# the audio first and put the transcript into the prompt.
audio_prompt = """
Transcribe the following audio and analyze its content:
1. Identify the main topics discussed
2. Detect the speaker's emotion
3. Summarize key points

[Audio content would be provided here]
"""

# openai>=1.0 replaced openai.ChatCompletion with openai.chat.completions
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": audio_prompt}]
)
print(response.choices[0].message.content)
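
A real run would need two steps: speech-to-text, then analysis of the transcript. The sketch below shows that flow with the two model calls passed in as plain functions, so it runs without network access. `analyze_audio`, `transcribe`, and `chat` are hypothetical names; in practice the last two would wrap a transcription endpoint and a chat endpoint.

```python
from typing import Callable

ANALYSIS_TEMPLATE = """Analyze the following transcript:
1. Identify the main topics discussed
2. Detect the speaker's emotion
3. Summarize key points

Transcript:
{transcript}"""


def analyze_audio(
    audio_path: str,
    transcribe: Callable[[str], str],
    chat: Callable[[str], str],
) -> str:
    """Transcribe the audio file, then ask a chat model to analyze the transcript."""
    transcript = transcribe(audio_path).strip()
    if not transcript:
        raise ValueError("Transcription returned no text")
    return chat(ANALYSIS_TEMPLATE.format(transcript=transcript))


# Stand-in functions so the flow can be exercised offline.
def fake_transcribe(path: str) -> str:
    return "We shipped the release on Friday and everyone was relieved."


def fake_chat(prompt: str) -> str:
    return f"(model reply to a {len(prompt)}-character prompt)"


print(analyze_audio("meeting.wav", fake_transcribe, fake_chat))
```

Passing the model calls in as arguments keeps the pipeline testable and makes it easy to swap either step for a different provider.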

## 6. Data Visualization (Claude)

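Claude can write plotting code from a description of the data and explain how to read the resulting chart. Here it is asked for a matplotlib script that combines a line chart and a bar chart in one figure.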

In [None]:
viz_prompt = """
Create a Python script using matplotlib to visualize the following data:
Year: [2018, 2019, 2020, 2021, 2022]
Sales: [100, 120, 90, 150, 180]
Profit: [20, 25, 15, 30, 40]

Use a line chart for Sales and a bar chart for Profit in the same figure.
"""

response = anthropic.completions.create(
    model="claude-2",
    prompt=f"{HUMAN_PROMPT} {viz_prompt}{AI_PROMPT}",
    max_tokens_to_sample=500
)
print(response.completion)

## 7. Mathematical Problem Solving (Gemini)

Gemini can solve complex mathematical problems and provide step-by-step explanations.

In [None]:
math_prompt = """
Solve the following calculus problem and provide a step-by-step explanation:

Find the volume of the solid obtained by rotating the region bounded by 
y = x^2, y = 0, and x = 2 about the y-axis.
"""

model = genai.GenerativeModel('gemini-pro')
response = model.generate_content(math_prompt)
print(response.text)
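
For checking the model's answer, the expected result can be worked out by the method of cylindrical shells: a shell at position $x$ has radius $x$ and height $x^2$. This is a reference derivation, not output from Gemini.

$$
V = \int_0^2 2\pi x \cdot x^2 \, dx = 2\pi \int_0^2 x^3 \, dx = 2\pi \left[ \frac{x^4}{4} \right]_0^2 = 8\pi
$$

The washer method gives the same value, with outer radius $2$ and inner radius $\sqrt{y}$ for $0 \le y \le 4$:

$$
V = \pi \int_0^4 \left( 2^2 - (\sqrt{y})^2 \right) dy = \pi \int_0^4 (4 - y) \, dy = \pi (16 - 8) = 8\pi
$$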

## 8. Creative Writing (GPT-4)

GPT-4 excels in creative writing tasks, generating stories, poems, and scripts based on prompts.

In [None]:
writing_prompt = """
Write a short story (about 250 words) that combines elements of science fiction 
and romance. The story should be set on a space station orbiting a distant planet 
and involve two characters from different alien species.
"""

# openai>=1.0 replaced openai.ChatCompletion with openai.chat.completions
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": writing_prompt}]
)
print(response.choices[0].message.content)

## 9. Language Translation and Cultural Context (Claude)

Claude can perform nuanced language translation, considering cultural context and idiomatic expressions.

In [None]:
translation_prompt = """
Translate the following English text to French, Spanish, and Japanese. 
For each translation, provide a brief explanation of any cultural nuances or idiomatic expressions that required special consideration:

"It's raining cats and dogs out there! Let's just Netflix and chill instead of going to the party."
"""

response = anthropic.completions.create(
    model="claude-2",
    prompt=f"{HUMAN_PROMPT} {translation_prompt}{AI_PROMPT}",
    max_tokens_to_sample=1000
)
print(response.completion)

## 10. Multimodal Reasoning (Gemini)

Gemini can combine information from multiple modalities (text, image, video) to solve complex problems or answer queries.

In [None]:
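# Note: This is a placeholder. The image and video are only described in text, not attached.
# A real run must pass the media itself; gemini-pro-vision expects media input and may reject a text-only request.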
multimodal_prompt = """
Analyze the following information and answer the question:

1. Text: "The graph shows the population growth of two species in a controlled environment over 10 years."
2. Image: [A line graph showing two intersecting curves]
3. Video: [A 30-second clip explaining predator-prey relationships]

Question: Based on the provided information, what type of ecological relationship 
is likely represented, and what factors might be influencing the population dynamics?
"""

model = genai.GenerativeModel('gemini-pro-vision')
response = model.generate_content(multimodal_prompt)
print(response.text)

## Conclusion

This notebook walked through ten tasks with Gemini, Claude, and GPT-4: code generation, image analysis and generation, NLP, video analysis, audio analysis, data visualization, mathematical problem solving, creative writing, translation, and multimodal reasoning. The video, audio, and multimodal reasoning cells remain text-only placeholders until real media inputs are supplied.