In [276]:
import pandas as pd

df1 = pd.read_csv('./data/input/IMDB-Movie-Data.csv')


# 🐼 Challenge 1. Warm up

We want to create bins of movies according to the number of votes they've received. For that matter, we will create a new column named 'bin' which will tag every movie as follow:

From 0 to 999 ==> 1
From 1000 to 9999 ==> 2
From 10000 to 99999 ==> 3
From 100000 to 999999 ==> 4
More than 1000000 ==> 5


In [277]:
def bin_movies(votes):
    if votes < 1000:
        return 1
    elif votes >= 1000 and votes<10000:
        return 2
    elif votes>=10000 and votes<100000:
        return 3
    elif votes >=100000 and votes<1000000:
        return 4
    else:
        return 5


df1["bin"]=df1["Votes"].apply(bin_movies)


# 🐼 🐼 Challenge 2. Using axis concept

We want to know how much is the revenue per minute for every movie.

In [278]:
df1["Revenue per minute (Millions)"]=df1["Revenue (Millions)"]/df1["Runtime (Minutes)"]

# 🐼 🐼 🐼  Challenge 3. Using the lambda 

We want to create a new ranking where we add 1 point if the genre is thriller but subtract 1 point if the genre is comedy.

In [279]:
def new_rating(rating,genre):
    if "Thriller" in genre:
        return (min (10,rating +1))
    elif "Comedy" in genre:
        return (max(0,rating-1))
    else:
        return(rating)

df1["New Rank"]=df1.apply(lambda x: new_rating(x["Rating"],x["Genre"]),axis=1)


# 🐼 🐼 🐼 🐼 Challenge 4. Now the real stuff

We want to know if the sum of the ASCII value of every character of the movie title divided by the number of votes retrieve a prime number...remember, prime numbers are integers.

In [280]:
def ascii_value (movie, votes):
    ascii_value= [ord(letter) for letter in movie]
    sum_ascii_value=sum(ascii_value)
    votes=int(votes)
    value=int((sum_ascii_value/votes))
    not_prime=False

    if value<=1:
        not_prime=True

    elif value>1:
        for i in range(2,value):
            if (value%i)==0:
                not_prime=True
            break
    
    if not_prime==True:
        return ("Not prime")
    else:
        return ("Prime")

In [281]:
df1["Prime values"]=df1.apply(lambda y: ascii_value(y["Title"],y["Votes"]),axis=1)


# 🐼 🐼 🐼 🐼 🐼 Challenge 5. And finally some fantasy


Feel free to propose your own ranking based in aggregations of at least 3 columns of the dataset.

If it is before 2010, Revenue>200 Millions and the Rate is greater than 8.0 votes, it is a "Top classic movie"

In [282]:
def classic_movie(year,revenue ,rate):
    if  year<=2010 and revenue >=200 and rate>=8:
        return "Classic movie"
    else:
        return "Unusual movie"

df1["Type of Movie"]=df1.apply(lambda x: classic_movie(x["Year"],x["Revenue (Millions)"], x["Rating"]),axis=1)
df1.loc[df1['Type of Movie'] == "Classic movie"]

Unnamed: 0,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore,bin,Revenue per minute (Millions),New Rank,Prime values,Type of Movie
54,55,The Dark Knight,"Action,Crime,Drama",When the menace known as the Joker wreaks havo...,Christopher Nolan,"Christian Bale, Heath Ledger, Aaron Eckhart,Mi...",2008,152,9.0,1791916,533.32,82.0,5,3.508684,9.0,Not prime,Classic movie
80,81,Inception,"Action,Adventure,Sci-Fi","A thief, who steals corporate secrets through ...",Christopher Nolan,"Leonardo DiCaprio, Joseph Gordon-Levitt, Ellen...",2010,148,8.8,1583625,292.57,74.0,5,1.976824,8.8,Not prime,Classic movie
140,141,Star Trek,"Action,Adventure,Sci-Fi",The brash James T. Kirk tries to live up to hi...,J.J. Abrams,"Chris Pine, Zachary Quinto, Simon Pegg, Leonar...",2009,127,8.0,526324,257.7,82.0,4,2.029134,8.0,Not prime,Classic movie
427,428,The Bourne Ultimatum,"Action,Mystery,Thriller",Jason Bourne dodges a ruthless CIA official an...,Paul Greengrass,"Matt Damon, Edgar Ramírez, Joan Allen, Julia S...",2007,115,8.1,525700,227.14,85.0,4,1.97513,9.1,Not prime,Classic movie
489,490,Ratatouille,"Animation,Comedy,Family",A rat who can cook makes an unusual alliance w...,Brad Bird,"Brad Garrett, Lou Romano, Patton Oswalt,Ian Holm",2007,111,8.0,504039,206.44,96.0,4,1.85982,7.0,Not prime,Classic movie
499,500,Up,"Animation,Adventure,Comedy",Seventy-eight year old Carl Fredricksen travel...,Pete Docter,"Edward Asner, Jordan Nagai, John Ratzenberger,...",2009,96,8.3,722203,292.98,88.0,4,3.051875,7.3,Not prime,Classic movie
634,635,WALL·E,"Animation,Adventure,Family","In the distant future, a small waste-collectin...",Andrew Stanton,"Ben Burtt, Elissa Knight, Jeff Garlin, Fred Wi...",2008,98,8.4,776897,223.81,,4,2.283776,8.4,Not prime,Classic movie
688,689,Toy Story 3,"Animation,Adventure,Comedy",The toys are mistakenly delivered to a day-car...,Lee Unkrich,"Tom Hanks, Tim Allen, Joan Cusack, Ned Beatty",2010,103,8.3,586669,414.98,92.0,4,4.028932,7.3,Not prime,Classic movie
772,773,How to Train Your Dragon,"Animation,Action,Adventure",A hapless young Viking who aspires to hunt dra...,Dean DeBlois,"Jay Baruchel, Gerard Butler,Christopher Mintz-...",2010,98,8.1,523893,217.39,74.0,4,2.218265,8.1,Not prime,Classic movie


# 🐼 🐼 🐼 🐼 🐼 🐼 Challenge 6. Freaky bonus


We want to know which movies might have hidden paterns in their description. A way to know that is finding those movies which the sum of all numeric values of the string description hash (SHA256) are between their revenue and their number of votes.

In [294]:
import hashlib
def hash_description(description, revenue, votes):
    revenue=float(revenue)
    votes=int(votes)

    hash_object = hashlib.sha256(b'Hello World')
    hex_dig = hash_object.hexdigest()
    hash_total_numbers=0
    
    for digit in hex_dig:
        try:
            number=int(digit)
            hash_total_numbers+=number
        except:
            pass
        
    if hash_total_numbers>revenue and hash_total_numbers<votes:
        return("Description has a patern")
    else:
        return("Normal description")

In [297]:
df1["Hidden paterns"]=df1.apply(lambda x: hash_description(x["Description"],x["Revenue (Millions)"],x["Votes"]),axis=1)

df1.loc[df1['Hidden paterns'] == "Description has a patern"]

Unnamed: 0,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore,bin,Revenue per minute (Millions),New Rank,Prime values,Type of Movie,Hidden paterns
1,2,Prometheus,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0,4,1.019839,7.0,Not prime,Unusual movie,Description has a patern
2,3,Split,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,62.0,4,1.180513,8.3,Not prime,Unusual movie,Description has a patern
5,6,The Great Wall,"Action,Adventure,Fantasy",European mercenaries searching for black powde...,Yimou Zhang,"Matt Damon, Tian Jing, Willem Dafoe, Andy Lau",2016,103,6.1,56036,45.13,42.0,3,0.438155,6.1,Not prime,Unusual movie,Description has a patern
6,7,La La Land,"Comedy,Drama,Music",A jazz pianist falls for an aspiring actress i...,Damien Chazelle,"Ryan Gosling, Emma Stone, Rosemarie DeWitt, J....",2016,128,8.3,258682,151.06,93.0,4,1.180156,7.3,Not prime,Unusual movie,Description has a patern
8,9,The Lost City of Z,"Action,Adventure,Biography","A true-life drama, centering on British explor...",James Gray,"Charlie Hunnam, Robert Pattinson, Sienna Mille...",2016,141,7.1,7188,8.01,78.0,2,0.056809,7.1,Not prime,Unusual movie,Description has a patern
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
993,994,Resident Evil: Afterlife,"Action,Adventure,Horror",While still out to destroy the evil Umbrella C...,Paul W.S. Anderson,"Milla Jovovich, Ali Larter, Wentworth Miller,K...",2010,97,5.9,140900,60.13,37.0,4,0.619897,5.9,Not prime,Unusual movie,Description has a patern
994,995,Project X,Comedy,3 high school seniors throw a birthday party t...,Nima Nourizadeh,"Thomas Mann, Oliver Cooper, Jonathan Daniel Br...",2012,88,6.7,164088,54.72,48.0,4,0.621818,5.7,Not prime,Unusual movie,Description has a patern
996,997,Hostel: Part II,Horror,Three American college students studying abroa...,Eli Roth,"Lauren German, Heather Matarazzo, Bijou Philli...",2007,94,5.5,73152,17.54,46.0,3,0.186596,5.5,Not prime,Unusual movie,Description has a patern
997,998,Step Up 2: The Streets,"Drama,Music,Romance",Romantic sparks occur between two dance studen...,Jon M. Chu,"Robert Hoffman, Briana Evigan, Cassie Ventura,...",2008,98,6.2,70699,58.01,50.0,3,0.591939,6.2,Not prime,Unusual movie,Description has a patern
