Marvel and DC have long been rivals in comics, with their stories released as comic books and cartoons. However, Marvel films have generally been more popular than DC films and have received higher ratings. In particular, the president of Marvel Studios, Kevin Feige, has invested more time in developing scripts and the Marvel universe. As a result, Marvel films have well-structured storylines, which has led to more positive ratings from audiences.

In this analysis, I will try to answer the following questions:

1. Which entity (or universe) has received higher average IMDb ratings?
2. If a movie has a higher IMDb rating, does that mean it will have a higher gross?
3. What is the relationship between IMDb rating and Rotten Tomatoes rating?

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

Before the analysis, I load the dataset named 'mdc.csv' and store it in a DataFrame called 'marvel_vs_dc'.

In [None]:
marvel_vs_dc = pd.read_csv('/kaggle/input/marvel-vs-dc-imdb-rotten-tomatoes/mdc.csv')
marvel_vs_dc.head()

Then, I rename the 'Unnamed: 0' column to 'index'.

In [None]:
marvel_vs_dc.rename(columns = {'Unnamed: 0':'index'}, inplace = True)
marvel_vs_dc.head()

After renaming the column, I drop 'index' altogether, as I don't really need it.
Then, I check whether there are any missing values.

In [None]:
# 'columns=' already names the axis, so 'axis=1' is redundant here
marvel_vs_dc.drop(columns=['index'], inplace=True)
marvel_vs_dc.info()
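
The `info()` summary shows non-null counts per column. As a more explicit check, here is a minimal sketch that counts missing values directly. The `toy` DataFrame and its values are made up for illustration; only the column names mirror this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from mdc.csv); one rating is deliberately missing.
toy = pd.DataFrame({
    "entity": ["MARVEL", "DC", "MARVEL"],
    "imdb_rating": [7.9, np.nan, 8.4],
    "tomato_meter": [94, 55, 91],
})

# Count missing values per column; all zeros would mean nothing is missing.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# A single yes/no answer for the whole table.
has_missing = bool(toy.isna().any().any())
print("Any missing values?", has_missing)
```

On the real data, the same two calls could be applied to `marvel_vs_dc` in place of `toy`.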

After checking, I don't see any missing values. Next, I split the dataset into two datasets:
1. marvel_films
2. dc_films

In [None]:
marvel_films = marvel_vs_dc[marvel_vs_dc.entity != 'DC']
dc_films = marvel_vs_dc[marvel_vs_dc.entity != 'MARVEL']
marvel_films.head()

Then, I check dc_films.

In [None]:
dc_films.head()
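
The split above filters with `!=`, which only gives clean Marvel and DC subsets if the 'entity' column holds exactly those two labels. Here is a small sketch of how that assumption could be checked. The `toy` data, including the extra 'OTHER' label, is hypothetical and only there to show the difference.

```python
import pandas as pd

# Hypothetical toy data with an unexpected third label.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "entity": ["MARVEL", "DC", "MARVEL", "OTHER"],
})

# List every label and how often it occurs.
label_counts = toy["entity"].value_counts()
print(label_counts)

# Filtering with != keeps the unexpected row; filtering with == does not.
not_dc = toy[toy["entity"] != "DC"]
only_marvel = toy[toy["entity"] == "MARVEL"]
print(len(not_dc), len(only_marvel))
```

If `value_counts()` on the real 'entity' column lists only 'MARVEL' and 'DC', the two ways of filtering give the same result.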

Then, we compute the average IMDb rating for each entity to compare them.

In [None]:
marvel_avg_imdb_rating = marvel_films['imdb_rating'].mean()
dc_avg_imdb_rating = dc_films['imdb_rating'].mean()
print("Marvel's average rating is", marvel_avg_imdb_rating)
print("DC's average rating is", dc_avg_imdb_rating)

As the comparison shows, Marvel's average rating is higher than DC's.
Next, let's get the average gross for each entity.

In [None]:
marvel_avg_imdb_gross = marvel_films['imdb_gross'].mean()
dc_avg_imdb_gross = dc_films['imdb_gross'].mean()
print("Marvel's average gross is $", round(marvel_avg_imdb_gross, 2))
print("DC's average gross is $", round(dc_avg_imdb_gross, 2))
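
As an alternative to computing each average separately, the per-entity means can be produced in one step with `groupby`. This is a minimal sketch on made-up numbers; the `toy` values are hypothetical, and only the column names match this notebook.

```python
import pandas as pd

# Hypothetical toy data (not from mdc.csv).
toy = pd.DataFrame({
    "entity": ["MARVEL", "MARVEL", "DC", "DC"],
    "imdb_rating": [8.0, 7.0, 6.0, 7.0],
    "imdb_gross": [400.0, 300.0, 200.0, 100.0],
})

# One row per entity, one column per averaged metric.
averages = toy.groupby("entity")[["imdb_rating", "imdb_gross"]].mean().round(2)
print(averages)
```

This avoids splitting the data first and keeps both entities side by side in one table.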

So, Marvel's average gross is higher than DC's. But does that mean a higher IMDb rating leads to a higher gross? Let's find out!

In [None]:
import matplotlib.pyplot as plt

plt.scatter(data = marvel_films, x = 'imdb_rating', y = 'imdb_gross')
plt.scatter(data = dc_films, x = 'imdb_rating', y = 'imdb_gross')
plt.title('Marvel vs. DC in imdb ratings and gross')
plt.xlabel('IMDb Ratings')
plt.ylabel('IMDb Gross')
plt.legend(['Marvel', 'DC'])

As the plot shows, Marvel films are generally ahead of DC films when IMDb gross is set against IMDb rating. However, there is an outlier: the film that received the highest rating has a comparatively low IMDb gross.
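
To put a number on what the scatter plot suggests, one option is the Pearson correlation between rating and gross, overall and per entity. The sketch below uses made-up values (the `toy` DataFrame is hypothetical), with one high-rated, low-gross DC row standing in for an outlier. Note that a correlation only describes how the two columns move together; it does not show that a higher rating causes a higher gross.

```python
import pandas as pd

# Hypothetical toy data (not from mdc.csv); the last DC row plays the outlier.
toy = pd.DataFrame({
    "entity": ["MARVEL"] * 4 + ["DC"] * 4,
    "imdb_rating": [6.5, 7.0, 7.5, 8.0, 5.5, 6.0, 7.0, 9.0],
    "imdb_gross": [200, 300, 350, 500, 100, 150, 250, 120],
})

# Pearson correlation across all films.
overall_r = toy["imdb_rating"].corr(toy["imdb_gross"])

# Pearson correlation within each entity.
per_entity_r = {
    name: group["imdb_rating"].corr(group["imdb_gross"])
    for name, group in toy.groupby("entity")
}

print("Overall:", round(overall_r, 2))
for name, r in per_entity_r.items():
    print(name, round(r, 2))
```

A single outlier can pull the coefficient down noticeably in a small sample, which is why the per-entity values are worth reading alongside the plot.
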
What about the Rotten Tomatoes rating? Is its relationship with IMDb gross any different?

In [None]:
marvel_avg_RT_rating = marvel_films['tomato_meter'].mean()
dc_avg_RT_rating = dc_films['tomato_meter'].mean()
print("Marvel's average tomatometer is", round(marvel_avg_RT_rating, 2))
print("DC's average tomatometer is", round(dc_avg_RT_rating,2))

As the comparison shows, the Tomatometer averages appear to follow a similar pattern to the IMDb rating averages.
Let's see if the relationship between Tomatometer and IMDb gross is also similar to the one between IMDb rating and IMDb gross.

In [None]:
plt.scatter(data = marvel_films, x = 'tomato_meter', y = 'imdb_gross')
plt.scatter(data = dc_films, x = 'tomato_meter', y = 'imdb_gross')
plt.title('Marvel vs. DC in Tomatometer and gross')
plt.xlabel('Tomatometer Score')
plt.ylabel('IMDb Gross')
plt.legend(['Marvel', 'DC'])

The two plots differ slightly, but the overall pattern is similar.
Lastly, what is the underlying relationship between IMDb rating and tomatometer?
Let's find out!

This time, we use Seaborn and the original dataset, 'marvel_vs_dc'.

In [None]:
import seaborn as sns
imdb_vs_rt = sns.lmplot(data=marvel_vs_dc, x="imdb_rating", y="tomato_meter", hue="entity", height=5)
imdb_vs_rt.set_axis_labels("IMDb Ratings", "Tomatometer Score")

Across the graph, IMDb ratings and Tomatometer scores rise together for both entities.
This suggests that the two rating systems are broadly consistent with each other when it comes to rating these films.
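
The lines drawn by `lmplot` are linear regression fits, so the agreement between the two scores can also be summarized with a correlation coefficient and a fitted slope. This is a rough sketch on made-up numbers; the `toy` DataFrame is hypothetical and uses only this notebook's column names.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from mdc.csv).
toy = pd.DataFrame({
    "imdb_rating": [5.5, 6.0, 7.0, 7.5, 8.0, 8.5],
    "tomato_meter": [30, 45, 70, 80, 90, 93],
})

# Pearson correlation: how closely the two scores move together.
r = toy["imdb_rating"].corr(toy["tomato_meter"])

# Least-squares line: Tomatometer points gained per IMDb point.
slope, intercept = np.polyfit(toy["imdb_rating"], toy["tomato_meter"], deg=1)

print("Correlation:", round(r, 2))
print("Slope:", round(slope, 2), "Intercept:", round(intercept, 2))
```

On the real data, running this separately for 'MARVEL' and 'DC' rows would correspond to the two lines in the plot.
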
Overall, here is what I found:

1. Marvel has a higher average IMDb rating in general.
2. A higher IMDb rating generally goes along with a higher gross, but this is not always true, as the outlier shows.
3. IMDb rating and Tomatometer score tend to move together.

Thanks for checking out my analysis and please feel free to leave comments!