# Multimodal Capabilities of Gemini, Claude, and GPT-4

This notebook explores three AI model families, Gemini, Claude, and GPT-4, across ten tasks spanning code, images, text, video, and audio. Each section sends one prompt to one model and records the response.

## Setup

First, let's install the necessary libraries and set up API keys.

In [46]:
!pip install google-generativeai openai anthropic

import os
import google.generativeai as genai
import openai
from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT


# Initialize clients
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
openai.api_key = os.environ['OPENAI_API_KEY']
anthropic = Anthropic(api_key=os.environ['ANTHROPIC_API_KEY'])



In [29]:
!pip install --upgrade openai




## 1. Code Generation and Analysis (Gemini)

Gemini excels at understanding and generating code across multiple programming languages.

In [2]:
code_prompt = """
Generate a Python function that implements a basic neural network
using NumPy. The function should take input data, hidden layer size,
and output size as parameters.
"""

model = genai.GenerativeModel('gemini-pro')
response = model.generate_content(code_prompt)
print(response.text)

```python
import numpy as np

def neural_network(input_data, hidden_layer_size, output_size):
  """
  Implements a basic neural network using NumPy.

  Args:
    input_data: Input data.
    hidden_layer_size: Size of the hidden layer.
    output_size: Size of the output layer.

  Returns:
    Output of the neural network.
  """

  # Initialize weights and biases.
  weights1 = np.random.randn(input_data.shape[1], hidden_layer_size)
  biases1 = np.zeros((hidden_layer_size,))
  weights2 = np.random.randn(hidden_layer_size, output_size)
  biases2 = np.zeros((output_size,))

  # Forward pass.
  hidden_layer = np.dot(input_data, weights1) + biases1
  hidden_layer = np.maximum(hidden_layer, 0)  # ReLU activation function.
  output_layer = np.dot(hidden_layer, weights2) + biases2
  output_layer = np.softmax(output_layer)  # Softmax activation function.

  return output_layer
```
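Note that NumPy has no `np.softmax` function, so the generated code above would raise an `AttributeError` at its final activation. The following is a minimal corrected sketch of the same forward pass with a hand-written, numerically stable softmax. The `softmax` helper and the `seed` parameter are additions for illustration and are not part of the model's output.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def neural_network(input_data, hidden_layer_size, output_size, seed=None):
    """Forward pass of a one-hidden-layer network with random weights."""
    rng = np.random.default_rng(seed)  # seed is an added, optional parameter
    weights1 = rng.standard_normal((input_data.shape[1], hidden_layer_size))
    biases1 = np.zeros(hidden_layer_size)
    weights2 = rng.standard_normal((hidden_layer_size, output_size))
    biases2 = np.zeros(output_size)

    hidden = np.maximum(input_data @ weights1 + biases1, 0)  # ReLU
    return softmax(hidden @ weights2 + biases2)

probs = neural_network(np.ones((4, 3)), hidden_layer_size=5, output_size=2, seed=0)
print(probs.sum(axis=1))  # each row sums to 1 up to floating-point error
```

As in the generated version, the weights are random and untrained, so the output is only a shape and sanity check, not a useful prediction.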


## 2. Image Analysis and Generation (DALL-E 3 via GPT-4)

The OpenAI Images API generates images from text prompts using DALL-E 3. The cell below covers image generation only; image analysis is not demonstrated here.

In [6]:
# Image generation
generation_prompt = "Create an image of a futuristic city with flying cars and holographic billboards."

response = openai.images.generate(
  model="dall-e-3",  # set explicitly; otherwise the endpoint uses its default model
  prompt=generation_prompt,
  n=1,
  size="1024x1024"
)
print(f"Generated image URL: {response.data[0].url}")

Generated image URL: https://oaidalleapiprodscus.blob.core.windows.net/private/org-9ocnp45utretxrig3vwbs6Fi/user-745SL8lDnLFvDTTmecLqoObC/img-44HXooyKV7wqZQhP5iwBDfoo.png?st=2024-09-03T05%3A42%3A29Z&se=2024-09-03T07%3A42%3A29Z&sp=r&sv=2024-08-04&sr=b&rscd=inline&rsct=image/png&skoid=d505667d-d6c1-4a0a-bac7-5c84a87759f8&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2024-09-02T23%3A53%3A00Z&ske=2024-09-03T23%3A53%3A00Z&sks=b&skv=2024-08-04&sig=s1lV7ppsOyCDaTk6L/eiDNVUW8jOjYmR%2B6HSV/R7E4E%3D
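The returned URL is time-limited (its `se` query parameter is an expiry timestamp), so the image should be saved if it is needed later. One option is to request the image bytes directly with `response_format="b64_json"` and write them to disk. This is a minimal sketch; `save_b64_image` and the output filename are hypothetical names, and the API call is shown only as a comment.

```python
import base64
from pathlib import Path

def save_b64_image(b64_data, out_path):
    """Decode a base64-encoded image payload and write it to `out_path`."""
    image_bytes = base64.b64decode(b64_data)
    path = Path(out_path)
    path.write_bytes(image_bytes)
    return path

# Usage with the Images API (requires OPENAI_API_KEY):
# response = openai.images.generate(
#     model="dall-e-3", prompt=generation_prompt, n=1,
#     size="1024x1024", response_format="b64_json",
# )
# save_b64_image(response.data[0].b64_json, "futuristic_city.png")
```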


## 3. Natural Language Processing (Claude)

Claude excels in various NLP tasks, including sentiment analysis, named entity recognition, and text summarization.

In [11]:
nlp_prompt = """
Perform the following NLP tasks on this text:
'Apple Inc. announced today that its new iPhone 15 Pro has revolutionized mobile photography with its advanced AI-powered camera system.'

1. Sentiment Analysis
2. Named Entity Recognition
3. Key Information Extraction
"""

response = anthropic.completions.create(
    model="claude-2",
    prompt=f"{HUMAN_PROMPT} {nlp_prompt}{AI_PROMPT}",
    max_tokens_to_sample=300
)
print(response.completion)

 Here are the results of performing those NLP tasks on the given text:

1. Sentiment Analysis:
The text has a positive sentiment. Keywords like "announced", "revolutionized", and "advanced" indicate excitement and positive sentiment around the new iPhone product.

2. Named Entity Recognition:
Entities found: 
- Apple Inc.: Organization
- iPhone 15 Pro: Product

3. Key Information Extraction:
- Company: Apple Inc.
- Product: iPhone 15 Pro 
- Key Features: 
    - Advanced AI-powered camera system
    - Revolutionized mobile photography

The key information extracted includes the company name, product name, and the key advanced feature being touted - the AI-powered camera system which has supposedly revolutionized mobile photography.
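The cell above uses Anthropic's legacy Text Completions endpoint with the `HUMAN_PROMPT` and `AI_PROMPT` markers. Newer Claude models are served through the Messages API instead. The following is a minimal sketch, assuming a recent `anthropic` SDK. The `ask_claude` helper is hypothetical, and the model name is left as a parameter because available model identifiers change over time.

```python
def ask_claude(client, model, prompt, max_tokens=300):
    """Send one user message through the Messages API and return the reply text.

    `client` is expected to be an `anthropic.Anthropic` instance.
    """
    response = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    )
    # The reply is a list of content blocks; a plain text answer is the first one.
    return response.content[0].text

# Usage (requires ANTHROPIC_API_KEY and a valid model identifier):
# from anthropic import Anthropic
# print(ask_claude(Anthropic(), "<model-id>", nlp_prompt))
```

Taking the client as an argument keeps the helper independent of how the API key is configured.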


## 4. Video Analysis (Gemini)

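Gemini models with video support can extract information about scenes, actions, and objects. The cell below, however, sends only the YouTube URL as text to `gemini-pro`, a text-only model that does not fetch links, so the scene description in the output is not derived from the video itself.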

In [14]:
video_url = "https://www.youtube.com/watch?v=8sQOxZpEszA"

video_prompt = f"""
Analyze the following video and provide a detailed description of its contents,
including key scenes, actions, and any text or speech present.

{video_url}
"""

response = model.generate_content(video_prompt)
print(response.text)


**Scene 1:**

* Black screen with the text "What is the importance of Education?" in white font.
* A montage of shots showing diverse students in classrooms and learning environments.

**Scene 2:**

* A young girl in a classroom asks her teacher, "Why do I have to learn math?"
* The teacher responds, "Math is a tool that helps you understand the world."
* The girl uses math to solve a problem in her everyday life.

**Scene 3:**

* A montage of shots showing people using math in their careers, such as engineers, scientists, and architects.
* Text on screen: "Math is essential for success in the 21st century."

**Scene 4:**

* A group of students collaborating on a project.
* Text on screen: "Education is more than just learning facts."
* The students learn from each other and develop critical thinking and problem-solving skills.

**Scene 5:**

* A teacher asking her students, "What are the most important things you've learned in school?"
* The students give answers such as:
    * "How t
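For the model to see the video itself, the file has to be uploaded and passed alongside the prompt rather than referenced by URL in the prompt text. The following is a minimal sketch using the File API in `google-generativeai`. It assumes a local copy of the video at a hypothetical `video_path`. The helper names are illustrative, and the model name `gemini-1.5-flash` is the one suggested by the deprecation message later in this notebook, so it may itself no longer be available.

```python
import time

def wait_for_processing(file, get_file, poll_seconds=2.0, timeout_seconds=300.0):
    """Poll an uploaded file until it leaves the PROCESSING state."""
    deadline = time.monotonic() + timeout_seconds
    while file.state.name == "PROCESSING":
        if time.monotonic() > deadline:
            raise TimeoutError(f"{file.name} is still processing")
        time.sleep(poll_seconds)
        file = get_file(file.name)  # re-fetch to read the current state
    if file.state.name != "ACTIVE":
        raise RuntimeError(f"{file.name} ended in state {file.state.name}")
    return file

def describe_video(video_path, prompt, model_name="gemini-1.5-flash"):
    """Upload a local video file and ask a video-capable model about it."""
    # Imported here so that wait_for_processing can be used without the SDK.
    import google.generativeai as genai

    uploaded = genai.upload_file(path=video_path)
    uploaded = wait_for_processing(uploaded, genai.get_file)
    model = genai.GenerativeModel(model_name)
    return model.generate_content([uploaded, prompt]).text

# Usage (requires GOOGLE_API_KEY and a local video file):
# print(describe_video("lesson.mp4", "Describe the key scenes in this video."))
```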

## 5. Audio Transcription and Analysis (GPT-4)

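The `gpt-4` chat model accepts only text, so placing a file path in the prompt does not give it access to the audio, as the response below shows. Transcription requires a speech-to-text model such as Whisper; the resulting transcript can then be analyzed as text.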

In [37]:
from openai import OpenAI
client = OpenAI()

audio_path = "/content/03-01-03-01-01-01-01.wav"
audio_prompt = f"""
Transcribe the following audio and analyze its content:
1. Identify the main topics discussed
2. Detect the speaker's emotion
3. Summarize key points

{audio_path}
"""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": audio_prompt}]
)
print(response.choices[0].message.content)

As an AI text-based model, I'm unable to process audio files or recognize human speech in them. I can only analyse, summarize and discuss text data. Please provide a text script to analyze.
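A two-step pipeline avoids this: transcribe the file with the Whisper endpoint, then pass the transcript text to the chat model. The following is a minimal sketch. The `transcribe_and_analyze` helper and `ANALYSIS_INSTRUCTIONS` constant are hypothetical names, and `client` is assumed to be an `openai.OpenAI` instance.

```python
ANALYSIS_INSTRUCTIONS = (
    "Analyze the following transcript:\n"
    "1. Identify the main topics discussed\n"
    "2. Detect the speaker's emotion\n"
    "3. Summarize key points"
)

def transcribe_and_analyze(client, audio_path, chat_model="gpt-4"):
    """Transcribe an audio file with Whisper, then analyze the transcript text."""
    # Step 1: speech-to-text. The file is sent as bytes, not as a path string.
    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )

    # Step 2: text analysis on the transcript.
    prompt = f"{ANALYSIS_INSTRUCTIONS}\n\nTranscript:\n{transcript.text}"
    response = client.chat.completions.create(
        model=chat_model,
        messages=[{"role": "user", "content": prompt}],
    )
    return transcript.text, response.choices[0].message.content

# Usage (requires OPENAI_API_KEY):
# from openai import OpenAI
# text, analysis = transcribe_and_analyze(OpenAI(), audio_path)
```

Because the chat model sees only the transcript, any emotion it reports reflects the wording rather than the speaker's tone of voice.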


## 6. Data Visualization (Claude)

Claude can provide guidance on creating effective data visualizations and explain complex charts.

In [18]:
viz_prompt = """
Create a Python script using matplotlib to visualize the following data:
Year: [2018, 2019, 2020, 2021, 2022]
Sales: [100, 120, 90, 150, 180]
Profit: [20, 25, 15, 30, 40]

Use a line chart for Sales and a bar chart for Profit in the same figure.
"""

response = anthropic.completions.create(
    model="claude-2",
    prompt=f"{HUMAN_PROMPT} {viz_prompt}{AI_PROMPT}",
    max_tokens_to_sample=500
)
print(response.completion)

 Here is the Python script to visualize the data as requested:

```python
import matplotlib.pyplot as plt

years = [2018, 2019, 2020, 2021, 2022]
sales = [100, 120, 90, 150, 180]
profit = [20, 25, 15, 30, 40]

fig, ax1 = plt.subplots()

ax2 = ax1.twinx()
ax1.plot(years, sales, color='blue')
ax2.bar(years, profit, color='green')

ax1.set_xlabel('Year')
ax1.set_ylabel('Sales', color='blue')
ax2.set_ylabel('Profit', color='green')

plt.title('Sales and Profit Over Years')

plt.show()
```

This scripts creates a figure with a line chart for Sales on the left y-axis and a bar chart for Profit on the right y-axis, with Years on the x-axis. The two charts share the same x-axis but have separate y-axes.


## 7. Mathematical Problem Solving (Gemini)

Gemini can solve complex mathematical problems and provide step-by-step explanations.

In [19]:
math_prompt = """
Solve the following calculus problem and provide a step-by-step explanation:

Find the volume of the solid obtained by rotating the region bounded by
y = x^2, y = 0, and x = 2 about the y-axis.
"""

model = genai.GenerativeModel('gemini-pro')
response = model.generate_content(math_prompt)
print(response.text)

**Step 1: Sketch the Region and the Solid**

Sketch the region and the solid generated by rotating it around the y-axis to visualize the problem.

**Step 2: Determine the Limits of Integration**

The region is defined by the inequalities: 
```
0 ≤ x ≤ 2 and 0 ≤ y ≤ x^2
```
So, the limits of integration are 0 and 2.

**Step 3: Use the Disk Method**

Since the solid is generated by rotating the region about the y-axis, we will use the Disk Method:

```
Volume = π∫[a,b] (radius)^2 dy
```

In this case, the radius of each disk is given by the distance from the y-axis to the edge of the region, which is simply `x`.

**Step 4: Substitute and Evaluate**

Substitute the limits of integration and the expression for the radius into the formula:

```
Volume = π∫[0,2] x^2 dy
```

**Step 5: Integrate**

Integrate with respect to `y`:

```
Volume = π[x^3/3] evaluated from 0 to 2
```

**Step 6: Simplify**

Evaluate the definite integral:

```
Volume = π[(2^3)/3 - 0]
```

**Step 7: Find the Final Answ
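The recorded response is cut off before the final answer, and its Step 4 integrates `x^2` with respect to `y` while using the `x`-limits 0 and 2, which would give 8π/3. As an independent check (not part of the model's output), the shell method gives V = 2π∫₀² x·x² dx = 8π, and the washer method gives V = π∫₀⁴ (2² − y) dy = 8π. The sketch below confirms both numerically with a simple midpoint rule; `midpoint_integral` is a helper written for this check.

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Shell method: cylindrical shells of radius x and height x^2, for x in [0, 2].
shell_volume = 2 * math.pi * midpoint_integral(lambda x: x * x**2, 0.0, 2.0)

# Washer method: outer radius 2, inner radius sqrt(y), for y in [0, 4].
washer_volume = math.pi * midpoint_integral(lambda y: 2**2 - y, 0.0, 4.0)

print(shell_volume, washer_volume, 8 * math.pi)
```

Both estimates agree with 8π ≈ 25.13, which is three times the value the truncated derivation was heading toward.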

## 8. Creative Writing (GPT-4)

GPT-4 excels in creative writing tasks, generating stories, poems, and scripts based on prompts.

In [34]:
from openai import OpenAI
client = OpenAI()

writing_prompt = """
Write a short story (about 250 words) that combines elements of science fiction
and romance. The story should be set on a space station orbiting a distant planet
and involve two characters from different alien species.
"""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": writing_prompt}]
)
print(response.choices[0].message.content)

In the deep vacuum of space, the Omega Station hovered silently in the orbit of a blue, luminescent planet, Hydra Seven. It was a multiversal gathering place for all known lifeforms. Aboard it, existed two beings unspeakably diverse, Nevah and Xeon.

Nevah, a radiant sylph from the luminous crystal fields of Lyra, dazzled with ethereal beauty, her translucent form refracting light to display a bright spectrum of colors. Xeon hailed from Galos, a planet thriving in the darkest corners of a black hole, his stout, shadowy figure ever cloaked in enigma.

Though lacking any similarity, Nevah and Xeon were inexplicably drawn towards each other. They spent infinite moments observing Hydra Seven and its scintillating dance with its vibrant triplets of moons. The differences that isolated them from their own species seemed to pull them closer.

Nevah indeed owned a light grander than any Xeon had ever witnessed, yet around him, her crystal form dimmed, casting a mesmerizing, warm glow. Xeon, th

## 9. Language Translation and Cultural Context (Claude)

Claude can perform nuanced language translation, considering cultural context and idiomatic expressions.

In [21]:
translation_prompt = """
Translate the following English text to French, Spanish, and Japanese.
For each translation, provide a brief explanation of any cultural nuances or idiomatic expressions that required special consideration:

"It's raining cats and dogs out there! Let's just Netflix and chill instead of going to the party."
"""

response = anthropic.completions.create(
    model="claude-2",
    prompt=f"{HUMAN_PROMPT} {translation_prompt}{AI_PROMPT}",
    max_tokens_to_sample=1000
)
print(response.completion)

 Here are the translations with explanations of cultural considerations:

French:
"Il pleut des cordes dehors ! Restons ici à regarder Netflix et à se détendre au lieu d'aller à la soirée."

Explanation: The expression "pleut des cordes" (raining ropes) is the equivalent of the English idiom "raining cats and dogs." The phrase "Netflix et chill" has become a cultural loan phrase in French to mean relaxing and watching Netflix.

Spanish:  
"¡Está lloviendo a cántaros ahí afuera! Mejor nos quedamos aquí viendo Netflix y relajándonos en lugar de ir a la fiesta."

Explanation: The expression "lloviendo a cántaros" (raining jugs) conveys the meaning of a heavy downpour like the English "raining cats and dogs." I kept the English phrase "Netflix and chill" since it's commonly used in Spanish when referring to staying in and watching Netflix.

Japanese:
"外は猫犬の如く雨が降っている! パーティーに行く代わりに、Netflixを見ながらちょっとくつろぐことにしましょう。"

Explanation: There isn't an exact idiom equivalent to "raining cats and dogs" i

## 10. Multimodal Reasoning (Gemini)

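Gemini models can accept text and images together in a single request. The cell below, however, passes the image path and video URL as plain text inside the prompt, and the call fails because the `gemini-pro-vision` model it targeted has been deprecated.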

In [45]:
# Prepare your multimodal data
text_description = "The graph shows the population growth of two species in a controlled environment over 10 years."
image_path = "/content/quokka.jpeg"
video_url = "https://www.youtube.com/watch?v=8sQOxZpEszA"

# Combine them into a structured prompt
multimodal_prompt = f"""
Analyze the following information and answer the question:

1. Text: "{text_description}"
2. Image: {image_path}
3. Video: {video_url}

Question: Based on the provided information, what type of ecological relationship
is likely represented between the content
"""

response = model.generate_content(multimodal_prompt)
print(response.text)




NotFound: 404 POST https://generativelanguage.googleapis.com/v1beta/models/gemini-pro-vision:generateContent?%24alt=json%3Benum-encoding%3Dint: Gemini 1.0 Pro Vision has been deprecated on July 12, 2024. Consider switching to different model, for example gemini-1.5-flash.
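Two changes would be needed for this cell to run as intended: switching to a model that is still served, and passing the image as an object rather than as a path inside the prompt string. The following is a minimal sketch, assuming the `google-generativeai` SDK and Pillow. The `ask_with_image` helper is hypothetical, and `gemini-1.5-flash` is taken from the error message above, so it may itself be outdated.

```python
def ask_with_image(model, question, image, context=""):
    """Send text and an image object together in one generate_content call.

    `model` is expected to be a `genai.GenerativeModel` and `image` a
    `PIL.Image.Image`.
    """
    parts = [part for part in (context, question) if part]
    parts.append(image)  # the image object itself, not its file path
    return model.generate_content(parts).text

# Usage (requires GOOGLE_API_KEY, Pillow, and a currently available model):
# import PIL.Image
# import google.generativeai as genai
# model = genai.GenerativeModel("gemini-1.5-flash")
# image = PIL.Image.open("/content/quokka.jpeg")
# print(ask_with_image(model, "What does this image show?", image, text_description))
```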

## Conclusion

This notebook ran ten tasks across Gemini, Claude, and GPT-4. The text-based tasks (code generation, NLP, visualization code, mathematical problem solving, creative writing, and translation) and image generation returned output, although several responses were truncated. The video, audio, and multimodal cells passed media as URLs or file paths inside a text prompt, so they did not exercise true multimodal input; doing so requires supplying the media itself to models that accept it.