# Simple Profit Analysis Using IMDB Dataset

Imagine that you want to write and direct an independent film and make a small fortune in the process. To decide what type of movie to create, you first need to determine which movie genres make the largest profit. To better understand the market, you decide to analyze the profit of movies going back several years. To do that, we need to answer the following questions:

1. What are the genres of the 10 most profitable movies, and what year did they come out?
2. What are the genres of the 10 least profitable (negative profit) movies, and what year did they come out?

From here, the writer/director may be able to conclude which types of movies are the most profitable and choose a story accordingly.

***The movie_metadata.csv was found on kaggle.com under the user chuansun76***



In [94]:
# Import the csv module to read the data file
import csv

new_movie_list = list()
clean_movie_list = list()

# Load the movie database file we will be working with
with open('movie_metadata.csv', 'r', encoding='mac_roman', newline='') as f:
    movie_list = list(csv.reader(f))
#print(movie_list)

# The movie_list variable now holds every movie and its associated data. Let's clean up this data
# so we can start our analysis.


# Many of these columns are not needed, for example the color column, director_facebook_likes, etc. Let's trim
# the data down to the potentially useful columns so it is more manageable.
# After trimming, index 2 is gross, 3 is genres, 4 is the title, 6 is budget, and 7 is the release year.

movie_list = [(row[1], row[3], row[8], row[9], row[11], row[16], row[22], row[23]) for row in movie_list]

# Profit depends on both gross and budget, so skip any row where either value is blank

for movie in movie_list:
    if movie[2] != '' and movie[6] != '':
        new_movie_list.append(movie)
        
# Strip every character except letters, digits, hyphens, pipes, colons, and spaces
import re

def stripNonAlphaNum(text):
    return re.sub(r'[^A-Za-z0-9\-|: ]+', '', text)

for movie in new_movie_list:
    clean_movie_list.append([stripNonAlphaNum(string) for string in movie])
    
#print(clean_movie_list)
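
The positional indexes used in the cell above (`row[8]`, `row[22]`, and so on) are easy to get wrong. As a minimal sketch of an alternative, `csv.DictReader` can select columns by header name and skip rows with a blank gross or budget in the same pass. The column names (`gross`, `budget`, `genres`, `movie_title`, `title_year`) are assumed to match the header row of movie_metadata.csv, and the helper `load_profit_rows` and the sample rows are hypothetical.

```python
import csv
import io

# Assumed header names; check them against the first row of movie_metadata.csv.
WANTED = ("movie_title", "genres", "title_year", "gross", "budget")

def load_profit_rows(csv_file):
    """Return one dict per movie that has both a gross and a budget value."""
    rows = []
    for row in csv.DictReader(csv_file):
        gross = (row.get("gross") or "").strip()
        budget = (row.get("budget") or "").strip()
        if not gross or not budget:
            continue  # profit cannot be computed without both numbers
        kept = {name: (row.get(name) or "").strip() for name in WANTED}
        kept["profit"] = int(float(gross)) - int(float(budget))
        rows.append(kept)
    return rows

# Hypothetical sample data standing in for the real file.
sample = io.StringIO(
    "movie_title,genres,title_year,gross,budget\n"
    "Example A,Action|Adventure,2010,500,200\n"
    "Example B,Drama,2011,,100\n"
    "Example C,Comedy,2012,50,80\n"
)
rows = load_profit_rows(sample)
for r in rows:
    print(r["movie_title"], r["profit"])
```

With the real file, the same function could be called on `open('movie_metadata.csv', encoding='mac_roman', newline='')` instead of the in-memory sample.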



In [95]:
# Now that we have the data we want to analyze, we can look for patterns in the results.

# Top 10 most and least profitable films with their genre and year

#print(clean_movie_list[2])
new_clean_movie_list = clean_movie_list[1:]  # skip the header row
for movie in new_clean_movie_list:
    profit = int(movie[2]) - int(movie[6])  # profit = gross - budget
    movie.append(profit)

# Sort ascending by profit (index 8) to get the 10 least profitable movies
new_clean_movie_list.sort(key=lambda x: x[8])

bot_10_profit = [(movie[4], movie[8], movie[7], movie[3]) for movie in new_clean_movie_list[:10]]
#print(bot_10_profit)

# Sort descending by profit to get the 10 most profitable movies
new_clean_movie_list.sort(key=lambda x: x[8], reverse=True)

top_10_profit = [(movie[4], movie[8], movie[7], movie[3]) for movie in new_clean_movie_list[:10]]
#print(top_10_profit)

print("Top 10 most profitable movies:\n")
for top in top_10_profit:
    print("Movie: " + top[0] + ", Genre: " + top[3])
print("\n")
print("Top 10 least profitable movies:\n")    
for bot in bot_10_profit:
    print("Movie: " + bot[0] + ", Genre: " + bot[3])

Top 10 most profitable movies:

Movie: Avatar, Genre: Action|Adventure|Fantasy|Sci-Fi
Movie: Jurassic World, Genre: Action|Adventure|Sci-Fi|Thriller
Movie: Titanic, Genre: Drama|Romance
Movie: Star Wars: Episode IV - A New Hope, Genre: Action|Adventure|Fantasy|Sci-Fi
Movie: ET the Extra-Terrestrial, Genre: Family|Sci-Fi
Movie: The Avengers, Genre: Action|Adventure|Sci-Fi
Movie: The Avengers, Genre: Action|Adventure|Sci-Fi
Movie: The Lion King, Genre: Adventure|Animation|Drama|Family|Musical
Movie: Star Wars: Episode I - The Phantom Menace, Genre: Action|Adventure|Fantasy|Sci-Fi
Movie: The Dark Knight, Genre: Action|Crime|Drama|Thriller


Top 10 least profitable movies:

Movie: The Host, Genre: Comedy|Drama|Horror|Sci-Fi
Movie: Lady Vengeance, Genre: Crime|Drama
Movie: Fateless, Genre: Drama|Romance|War
Movie: Princess Mononoke, Genre: Adventure|Animation|Fantasy
Movie: Steamboy, Genre: Action|Adventure|Animation|Family|Sci-Fi|Thriller
Movie: Akira, Genre: Action|Animation|Sci-Fi
Movie:

Among the top 10 most profitable movies above, three genres stand out:

- Action is contained within 7 of the top 10 most profitable movies
- Adventure is contained within 7 of the top 10 most profitable movies
- Sci-Fi is contained within 7 of the top 10 most profitable movies

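Arguably, the most profitable movies tend to combine these genres. Note that The Avengers appears twice in the output, so its genres are counted twice in these totals.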
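
The counts above can be reproduced with a short tally. This is a minimal sketch that copies the genre strings from the printed top 10 output; the variable and function names are illustrative, and the second tally drops the repeated The Avengers row to show how much the duplicate affects the totals.

```python
from collections import Counter

# (title, genres) pairs copied from the printed top 10 output above.
top_10 = [
    ("Avatar", "Action|Adventure|Fantasy|Sci-Fi"),
    ("Jurassic World", "Action|Adventure|Sci-Fi|Thriller"),
    ("Titanic", "Drama|Romance"),
    ("Star Wars: Episode IV - A New Hope", "Action|Adventure|Fantasy|Sci-Fi"),
    ("ET the Extra-Terrestrial", "Family|Sci-Fi"),
    ("The Avengers", "Action|Adventure|Sci-Fi"),
    ("The Avengers", "Action|Adventure|Sci-Fi"),
    ("The Lion King", "Adventure|Animation|Drama|Family|Musical"),
    ("Star Wars: Episode I - The Phantom Menace", "Action|Adventure|Fantasy|Sci-Fi"),
    ("The Dark Knight", "Action|Crime|Drama|Thriller"),
]

def count_genres(movies):
    """Count how many movies each genre appears in."""
    counts = Counter()
    for _title, genres in movies:
        counts.update(genres.split("|"))
    return counts

with_duplicates = count_genres(top_10)
# dict.fromkeys keeps the first occurrence of each (title, genres) pair.
unique_movies = list(dict.fromkeys(top_10))
without_duplicates = count_genres(unique_movies)

for genre in ("Action", "Adventure", "Sci-Fi"):
    print(genre, with_duplicates[genre], without_duplicates[genre])
```

Each line prints a genre followed by its count with and without the duplicate row, so the effect of the repeated entry is visible side by side.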

Now, let's look at the 10 least profitable movies:

- Drama is contained within 7 of the 10 least profitable movies
- Action is contained within 4 of the 10 least profitable movies
- Sci-Fi is contained within 4 of the 10 least profitable movies

These are the three genres that appear most often among the least profitable movies, but there is certainly a greater variety of genres among the 10 least profitable movies.



# Conclusion

Based on this dataset and the limited conclusions it supports, it seems the director should focus on an Action and Adventure movie. Adding the science fiction genre appears to raise the risk that the movie is poorly received and flops, since Sci-Fi also shows up among the 10 least profitable movies.
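
Ten rows at each extreme are a small sample, so one possible follow-up is to summarize profit per genre across every movie. The sketch below assumes each movie is a `(genres, profit)` pair with pipe-separated genres, as in the output above; the function name `median_profit_by_genre` and the sample values are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import median

def median_profit_by_genre(movies):
    """Map each genre to the median profit of the movies tagged with it."""
    profits = defaultdict(list)
    for genres, profit in movies:
        for genre in genres.split("|"):
            profits[genre].append(profit)
    return {genre: median(values) for genre, values in profits.items()}

# Hypothetical (genres, profit) pairs, for illustration only.
sample = [
    ("Action|Adventure", 300),
    ("Action|Sci-Fi", -50),
    ("Drama", 20),
    ("Adventure|Drama", 100),
]
summary = median_profit_by_genre(sample)
# Print genres from highest to lowest median profit.
for genre, value in sorted(summary.items(), key=lambda item: item[1], reverse=True):
    print(genre, value)
```

The median is used here rather than the mean so that a single blockbuster or a single large loss does not dominate a genre's summary.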