# 1.1. What Does a Future Millionaire Need to Know?

## Where am I?

In this module you will:

- Try on the role of an analyst at a film distribution company.
- Carry out your first data analysis with Pandas.
- Take part in a competition.
- Keep building your portfolio on GitHub.

## What do I need to do?

1. Study the details of the task carefully.
2. Read our tips for completing the project.
3. Look through the additional material.
4. Download the dataset and the notebook template.
5. Answer all the questions about this dataset and score at least 80 points.
6. Submit your solution notebook to your mentor for review by uploading it to Git.
7. Get feedback from your mentor.


### How much time is allotted for this module?

On average, completing the project tasks takes about 5 hours.

## Working in a team

This time we again encourage you to work in teams and draw on the community; it will make your learning even more effective!

If you still haven't found a team for this module, say so in your group's chat. Someone will certainly help you.

If you're still not sure why you should work in a group, here is why:

- You will achieve better individual results: you'll write higher-quality code and score more points.
- You will achieve better team results: together you can reach a higher combined score.

## Recommendations

For complex queries, it is often easier to use existing libraries and functions than to reinvent the wheel.

### Example

We are given a collection of texts (for options 2 and 3, a pandas `Series` named `data`) and need to select the texts that contain the word "millionaire". There are several ways to solve this.

#### Option 1

Write a simple loop and collect the matching texts:

    new_text = []
    for text in data:
        if "millionaire" in text:
            new_text.append(text)

#### Option 2

Roughly the same thing, but more compact:

    data[data.map(lambda x: True if "millionaire" in x else False)]

#### Option 3

Use the method that pandas already provides:

    data[data.str.contains("millionaire")]

This option is preferable: it is more compact than the first and easier to read than the second. The `contains` method also has clear documentation, unlike our homemade version.
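To make the comparison concrete, here is a minimal sketch that runs all three approaches on a small made-up `Series`. The sample texts, the English search word, and the names `loop_result`, `map_result` and `contains_result` are assumptions for illustration only; they are not part of the course dataset.

```python
import pandas as pd

# Hypothetical sample data, not taken from the course dataset
data = pd.Series([
    "how to become a millionaire",
    "a cookbook for beginners",
    "the millionaire next door",
])

# Option 1: explicit loop that collects matching texts
loop_result = []
for text in data:
    if "millionaire" in text:
        loop_result.append(text)

# Option 2: boolean mask built with map and a lambda
map_result = data[data.map(lambda x: "millionaire" in x)]

# Option 3: built-in vectorized string method
# regex=False treats the pattern as a plain substring
contains_result = data[data.str.contains("millionaire", regex=False)]

# All three approaches should select the same texts
print(loop_result == map_result.tolist() == contains_result.tolist())
```

If the `Series` may contain missing values, passing `na=False` to `str.contains` keeps the mask strictly boolean.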

## Useful libraries

- itertools https://docs.python.org/3/library/itertools.html#module-itertools
- collections https://docs.python.org/3/library/collections.html

# 1.2. Ready for the Quiz?

## So what do I actually do?

You now have the background information, so it's time to get to work!

In this module you will carry out a preliminary data analysis of a dataset from the IMDb website. You can download it here: https://drive.google.com/open?id=1nokVzSNxkUPil3aYRBI_fgzx25gtSEta


## Important!

1. On the next page you will see a list of questions, each with 4 answer options.
2. **You have only one attempt at each question**.
3. Pay close attention to the wording of the tasks and to your solutions in the notebook.
4. Use the notebook template to solve the project tasks (https://drive.google.com/open?id=1nW9ZlvS2hWHOwkJVhrXB0qr4RaJnQUTl).
5. To pass the module you need at least 80 points. The maximum score is 100.


## If something isn't working

Remember that you can always ask your mentor in Slack or post in the #0-real_ds channel.

Ready? Move on to the next page!

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import os
from collections import Counter
print(os.listdir("../input"))

FileNotFoundError: [WinError 3] The system cannot find the path specified: '../input'
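The `FileNotFoundError` above comes from the hard-coded `../input` folder, which presumably exists in the environment the template was written for but not on this machine. A minimal sketch of a more defensive listing, assuming you only want to see the folder's contents when it is present (the helper name `list_input_dir` is hypothetical):

```python
from pathlib import Path


def list_input_dir(path="../input"):
    """Return the sorted file names in `path`, or an empty list if it is missing."""
    folder = Path(path)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.iterdir())


print(list_input_dir())
```

The notebook itself does not depend on this folder: the dataset is read from `data.csv` in the next cell.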

In [2]:
data = pd.read_csv('data.csv')
data.head(5)

Unnamed: 0,imdb_id,popularity,budget,revenue,original_title,cast,director,tagline,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year
0,tt0369610,32.985763,150000000,1513528810,Jurassic World,Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...,Colin Trevorrow,The park is open.,Twenty-two years after the events of Jurassic ...,124,Action|Adventure|Science Fiction|Thriller,Universal Studios|Amblin Entertainment|Legenda...,6/9/2015,5562,6.5,2015
1,tt1392190,28.419936,150000000,378436354,Mad Max: Fury Road,Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...,George Miller,What a Lovely Day.,An apocalyptic story set in the furthest reach...,120,Action|Adventure|Science Fiction|Thriller,Village Roadshow Pictures|Kennedy Miller Produ...,5/13/2015,6185,7.1,2015
2,tt2908446,13.112507,110000000,295238201,Insurgent,Shailene Woodley|Theo James|Kate Winslet|Ansel...,Robert Schwentke,One Choice Can Destroy You,Beatrice Prior must confront her inner demons ...,119,Adventure|Science Fiction|Thriller,Summit Entertainment|Mandeville Films|Red Wago...,3/18/2015,2480,6.3,2015
3,tt2488496,11.173104,200000000,2068178225,Star Wars: The Force Awakens,Harrison Ford|Mark Hamill|Carrie Fisher|Adam D...,J.J. Abrams,Every generation has a story.,Thirty years after defeating the Galactic Empi...,136,Action|Adventure|Science Fiction|Fantasy,Lucasfilm|Truenorth Productions|Bad Robot,12/15/2015,5292,7.5,2015
4,tt2820852,9.335014,190000000,1506249360,Furious 7,Vin Diesel|Paul Walker|Jason Statham|Michelle ...,James Wan,Vengeance Hits Home,Deckard Shaw seeks revenge against Dominic Tor...,137,Action|Crime|Thriller,Universal Pictures|Original Film|Media Rights ...,4/1/2015,2947,7.3,2015


In [3]:
len(data)

1890

# Dataset preprocessing

In [4]:
answer_ls = []  # list of answers; we append to it as we work through the quiz
# new dataset columns can be created here

# 1. Which film on the list has the largest budget?
Answer options:
1. The Dark Knight Rises (tt1345836)
2. Spider-Man 3 (tt0413300)
3. Avengers: Age of Ultron (tt2395427)
4. The Warrior's Way (tt1032751)
5. Pirates of the Caribbean: On Stranger Tides (tt1298650)

In [12]:
data.sort_values(by = 'budget', ascending = False).head(5)

Unnamed: 0,imdb_id,popularity,budget,revenue,original_title,cast,director,tagline,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year
491,tt1032751,0.25054,425000000,11087569,The Warrior's Way,Kate Bosworth|Jang Dong-gun|Geoffrey Rush|Dann...,Sngmoo Lee,Assassin. Hero. Legend.,An Asian assassin (Dong-gun Jang) is forced to...,100,Adventure|Fantasy|Action|Western|Thriller,Boram Entertainment Inc.,12/2/2010,74,6.4,2010
724,tt1298650,4.95513,380000000,1021683000,Pirates of the Caribbean: On Stranger Tides,Johnny Depp|Penélope Cruz|Geoffrey Rush|Ian M...,Rob Marshall,Live Forever Or Die Trying.,Captain Jack Sparrow crosses paths with a woma...,136,Adventure|Action|Fantasy,Walt Disney Pictures|Jerry Bruckheimer Films|M...,5/11/2011,3180,6.3,2011
1670,tt0449088,4.965391,300000000,961000000,Pirates of the Caribbean: At World's End,Johnny Depp|Orlando Bloom|Keira Knightley|Geof...,Gore Verbinski,"At the end of the world, the adventure begins.","Captain Barbossa, long believed to be dead, ha...",169,Adventure|Fantasy|Action,Walt Disney Pictures|Jerry Bruckheimer Films|S...,5/19/2007,2626,6.8,2007
14,tt2395427,5.944927,280000000,1405035767,Avengers: Age of Ultron,Robert Downey Jr.|Chris Hemsworth|Mark Ruffalo...,Joss Whedon,A New Age Has Come.,When Tony Stark tries to jumpstart a dormant p...,141,Action|Adventure|Science Fiction,Marvel Studios|Prime Focus|Revolution Sun Studios,4/22/2015,4304,7.4,2015
1015,tt0401729,1.588457,260000000,284139100,John Carter,Taylor Kitsch|Lynn Collins|Mark Strong|Willem ...,Andrew Stanton,Lost in Our World. Found in Another.,Civil War vet John Carter is transplanted to M...,132,Action|Adventure|Fantasy|Science Fiction,Walt Disney Pictures,3/7/2012,1479,6.0,2012


In [7]:
# enter your answer here and append it to the list of answers (the template used "1" as an example)
answer_ls.append(4)
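Sorting the whole frame works, but pandas also offers more direct ways to find the row with the largest value. A small sketch, assuming a frame with the same `original_title` and `budget` columns; the three rows are copied from the output above, and the names `films`, `top_row` and `top_two` are illustrative:

```python
import pandas as pd

# A few rows copied from the output above; the full dataset has 1890 rows
films = pd.DataFrame({
    "original_title": ["The Warrior's Way", "Avengers: Age of Ultron", "John Carter"],
    "budget": [425000000, 280000000, 260000000],
})

# idxmax returns the index label of the row with the largest budget
top_row = films.loc[films["budget"].idxmax()]
print(top_row["original_title"])

# nlargest returns the top-n rows, largest first
top_two = films.nlargest(2, "budget")
print(top_two["original_title"].tolist())
```

The same pattern with `idxmin` or `nsmallest` applies to "smallest value" questions such as the shortest runtime.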

# 2. Which film is the longest (in minutes)?
Answer options:
1. The Lord of the Rings: The Return of the King (tt0167260)
2. Gods and Generals (tt0279111)
3. King Kong (tt0360717)
4. Pearl Harbor (tt0213149)
5. Alexander (tt0346491)

In [9]:
data.sort_values(by = 'runtime', ascending = False).head(10)

Unnamed: 0,imdb_id,popularity,budget,revenue,original_title,cast,director,tagline,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year
1158,tt0279111,0.469518,56000000,12923936,Gods and Generals,Stephen Lang|Jeff Daniels|Robert Duvall|Kevin ...,Ronald F. Maxwell,The nations heart was touched by...,The film centers mostly around the personal an...,214,Drama|History|War,Turner Pictures|Antietam Filmworks,2/21/2003,23,5.8,2003
1082,tt0167260,7.122455,94000000,1118888979,The Lord of the Rings: The Return of the King,Elijah Wood|Ian McKellen|Viggo Mortensen|Liv T...,Peter Jackson,The eye of the enemy is moving.,Aragorn is revealed as the heir to the ancient...,201,Adventure|Fantasy|Action,WingNut Films|New Line Cinema,12/1/2003,5636,7.9,2003
1737,tt0462322,0.906938,67000000,25037897,Grindhouse,Kurt Russell|Zoë Bell|Rosario Dawson|Vanessa ...,Robert Rodriguez|Eli Roth|Quentin Tarantino|Ed...,A double feature that'll tear you in two!,Two full length feature horror movies written ...,191,Thriller|Action|Horror,Big Talk Productions|Yer Dead Productions|Wein...,4/6/2007,197,6.5,2007
1337,tt0360717,1.508329,207000000,550000000,King Kong,Naomi Watts|Jack Black|Adrien Brody|Thomas Kre...,Peter Jackson,The eighth wonder of the world.,"In 1933 New York, an overly ambitious movie pr...",187,Adventure|Drama|Action,WingNut Films|Universal Pictures|Big Primate P...,12/14/2005,1289,6.4,2005
505,tt0213149,2.478879,140000000,449220945,Pearl Harbor,Ben Affleck|Josh Hartnett|Kate Beckinsale|Cuba...,Michael Bay,"December 7, 1941 - A day that shall live in in...",The lifelong friendship between Rafe McCawley ...,183,History|Romance|War,Jerry Bruckheimer Films|Touchstone Pictures,5/21/2001,1044,6.6,2001
1184,tt0993846,4.877927,100000000,392000694,The Wolf of Wall Street,Leonardo DiCaprio|Jonah Hill|Margot Robbie|Kyl...,Martin Scorsese,EARN. SPEND. PARTY.,A New York stockbroker refuses to cooperate in...,180,Crime|Drama|Comedy,Paramount Pictures|Appian Way|EMJAG Production...,12/25/2013,4027,7.9,2013
864,tt0167261,8.095275,79000000,926287400,The Lord of the Rings: The Two Towers,Elijah Wood|Ian McKellen|Viggo Mortensen|Liv T...,Peter Jackson,A New Power Is Rising.,Frodo and Sam are trekking to Mordor to destro...,179,Adventure|Fantasy|Action,WingNut Films|New Line Cinema|The Saul Zaentz ...,12/18/2002,5114,7.8,2002
497,tt0120737,8.575419,93000000,871368364,The Lord of the Rings: The Fellowship of the Ring,Elijah Wood|Ian McKellen|Viggo Mortensen|Liv T...,Peter Jackson,One ring to rule them all,"Young hobbit Frodo Baggins, after inheriting a...",178,Adventure|Fantasy|Action,WingNut Films|New Line Cinema|The Saul Zaentz ...,12/18/2001,6079,7.8,2001
1602,tt0346491,1.319068,155000000,167298192,Alexander,Colin Farrell|Angelina Jolie|Val Kilmer|Jared ...,Oliver Stone,The greatest legend of all was real.,"Alexander, the King of Macedonia, leads his le...",175,War|History|Action|Adventure|Drama,France 3 Cinéma|Intermedia Films|Pathé Renn ...,11/21/2004,519,5.6,2004
994,tt1371111,2.478372,102000000,130482868,Cloud Atlas,Tom Hanks|Halle Berry|Jim Broadbent|Hugo Weavi...,Lilly Wachowski|Lana Wachowski|Tom Tykwer,Everything is Connected,A set of six nested stories spanning time betw...,172,Drama|Science Fiction,Anarchos Productions|X-Filme Creative Pool|Asc...,10/26/2012,2162,6.5,2012


In [10]:
answer_ls.append(2)

# 3. Which film is the shortest (in minutes)?
Answer options:

1. Home on the Range tt0299172
2. The Jungle Book 2 tt0283426
3. Winnie the Pooh tt1449283
4. Corpse Bride tt0121164
5. Hoodwinked! tt0443536

In [13]:
data.sort_values(by = 'runtime', ascending = True).head(5)

Unnamed: 0,imdb_id,popularity,budget,revenue,original_title,cast,director,tagline,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year
769,tt1449283,1.425344,30000000,14460000,Winnie the Pooh,Jim Cummings|Travis Oates|Jim Cummings|Bud Luc...,Stephen Anderson|Don Hall,Oh Pooh.,"During an ordinary day in Hundred Acre Wood, W...",63,Animation|Family,Walt Disney Pictures|Walt Disney Animation Stu...,4/13/2011,174,6.8,2011
931,tt0280030,0.678896,20000000,109862682,Return to Never Land,Harriet Owen|Blayne Weaver|Jeff Bennett|Kath S...,Robin Budd|Donovan Cook,The Classic Continues,The classic tale of 'Peter Pan' continues in D...,72,Adventure|Fantasy|Animation|Family,Walt Disney Pictures|Walt Disney Television An...,2/14/2002,174,6.2,2002
1098,tt0283426,1.781615,20000000,135680000,The Jungle Book 2,John Goodman|Haley Joel Osment|Mae Whitman|Phi...,Steve Trenbirth,Feel the jungle beat,"Mowgli, missing the jungle and his old friends...",72,Family|Animation|Adventure,Walt Disney Pictures|Walt Disney Television An...,2/7/2003,156,5.6,2003
1627,tt0299172,0.837906,110000000,103951461,Home on the Range,Randy Quaid|Steve Buscemi|Judi Dench|Cuba Good...,Will Finn|John Sanford,Bust a Moo.,The Little Piece of Heaven family farm is abou...,76,Western|Animation|Family|Comedy|Music,Walt Disney Pictures|Walt Disney Feature Anima...,4/2/2004,210,5.8,2004
1409,tt0361089,0.558258,35000000,19478106,Valiant,Ewan McGregor|Ricky Gervais|Tim Curry|Jim Broa...,Gary Chapman,"Some pigeons eat crumbs, others make history.",The animated comedy tells the story of a lowly...,76,Animation|Family|Adventure,Vanguard Films|Scanbox,3/25/2005,138,5.3,2005


In [14]:
answer_ls.append(3)

# 4. What is the mean film runtime?

Answer options:
1. 115
2. 110
3. 105
4. 120
5. 100

In [15]:
data.runtime.mean()

109.65343915343915

In [16]:
answer_ls.append(2)

# 5. What is the median film runtime?
Answer options:
1. 106
2. 112
3. 101
4. 120
5. 115

In [26]:
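sorted_data = data.sort_values(by = 'runtime')
lenght = len(sorted_data)
# Caution: runtime[i] selects by index label, not by position in the sorted
# order, so this cell does not compute the true median (use .iloc for that)
if lenght%2 == 0:
    median = (sorted_data.runtime[lenght//2] + sorted_data.runtime[lenght//2 - 1]) / 2
else:
    median = sorted_data.runtime[lenght//2]
print(median)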

1
115.5


In [28]:
len(sorted_data)/2

945.0

In [33]:
sorted_data.iloc[945].runtime

107

In [34]:
sorted_data.iloc[944].runtime

106

In [35]:
(107 + 106) / 2

106.5

In [81]:
sorted_data = data.sort_values(by = 'runtime')
lenght = len(sorted_data)
# .iloc selects by position in the sorted order, which is what a median needs
if lenght%2 == 0:
    median = (sorted_data.iloc[lenght//2].runtime + sorted_data.iloc[lenght//2 - 1].runtime) / 2
else:
    median = sorted_data.iloc[lenght//2].runtime
print(median)

106.5


In [76]:
sorted_data.iloc[lenght//2 - 1].runtime

106

In [80]:
sorted_data.runtime[lenght//2 - 1]

99

In [65]:
sorted_data.head(2)

Unnamed: 0,imdb_id,popularity,budget,revenue,original_title,cast,director,tagline,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year
769,tt1449283,1.425344,30000000,14460000,Winnie the Pooh,Jim Cummings|Travis Oates|Jim Cummings|Bud Luc...,Stephen Anderson|Don Hall,Oh Pooh.,"During an ordinary day in Hundred Acre Wood, W...",63,Animation|Family,Walt Disney Pictures|Walt Disney Animation Stu...,4/13/2011,174,6.8,2011
931,tt0280030,0.678896,20000000,109862682,Return to Never Land,Harriet Owen|Blayne Weaver|Jeff Bennett|Kath S...,Robin Budd|Donovan Cook,The Classic Continues,The classic tale of 'Peter Pan' continues in D...,72,Adventure|Fantasy|Animation|Family,Walt Disney Pictures|Walt Disney Television An...,2/14/2002,174,6.2,2002


In [66]:
sorted_data.runtime.head(10)

769     63
931     72
1098    72
1627    76
1409    76
1349    77
885     78
1808    78
252     79
1769    80
Name: runtime, dtype: int64

In [72]:
sorted_data.iloc[2].runtime

72

In [73]:
sorted_data.runtime[931]

72

In [63]:
sorted_data.runtime[0]

124

In [82]:
answer_ls.append(1)
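The experiments above come down to one distinction: after `sort_values`, each row keeps its original index label, so `sorted_data.runtime[0]` returns the row labelled 0 (124 minutes) rather than the first row in sorted order (63 minutes). A minimal sketch of the difference, using four runtimes and labels from the outputs above; the variable names are illustrative:

```python
import pandas as pd

# Runtimes and index labels taken from rows shown above
runtime = pd.Series([124, 63, 72, 76], index=[0, 769, 931, 1627], name="runtime")
sorted_runtime = runtime.sort_values()

# .loc looks up the row whose index label is 0, wherever it sits after sorting
by_label = sorted_runtime.loc[0]
# .iloc takes the first row in sorted order
by_position = sorted_runtime.iloc[0]
print(by_label, by_position)

# Positional median for an even number of rows
n = len(sorted_runtime)
manual_median = (sorted_runtime.iloc[n // 2] + sorted_runtime.iloc[n // 2 - 1]) / 2

# Built-in equivalent; no sorting or index arithmetic needed
builtin_median = runtime.median()
print(manual_median, builtin_median)
```

On the course dataset, `data.runtime.median()` should agree with the 106.5 obtained positionally above.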

# 6. Which film is the most profitable?

**Note:** Here and below, "profit" or "loss" means the difference between a film's revenue and its budget.

Answer options:
1. The Avengers tt0848228
2. Minions tt2293640
3. Star Wars: The Force Awakens tt2488496
4. Furious 7 tt2820852
5. Avatar tt0499549

In [85]:
# Work on a copy so that the originally loaded frame stays unchanged
data2 = data.copy()
data2['profit'] = data2.revenue - data2.budget
data2.head()

Unnamed: 0,imdb_id,popularity,budget,revenue,original_title,cast,director,tagline,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,profit
0,tt0369610,32.985763,150000000,1513528810,Jurassic World,Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...,Colin Trevorrow,The park is open.,Twenty-two years after the events of Jurassic ...,124,Action|Adventure|Science Fiction|Thriller,Universal Studios|Amblin Entertainment|Legenda...,6/9/2015,5562,6.5,2015,1363528810
1,tt1392190,28.419936,150000000,378436354,Mad Max: Fury Road,Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...,George Miller,What a Lovely Day.,An apocalyptic story set in the furthest reach...,120,Action|Adventure|Science Fiction|Thriller,Village Roadshow Pictures|Kennedy Miller Produ...,5/13/2015,6185,7.1,2015,228436354
2,tt2908446,13.112507,110000000,295238201,Insurgent,Shailene Woodley|Theo James|Kate Winslet|Ansel...,Robert Schwentke,One Choice Can Destroy You,Beatrice Prior must confront her inner demons ...,119,Adventure|Science Fiction|Thriller,Summit Entertainment|Mandeville Films|Red Wago...,3/18/2015,2480,6.3,2015,185238201
3,tt2488496,11.173104,200000000,2068178225,Star Wars: The Force Awakens,Harrison Ford|Mark Hamill|Carrie Fisher|Adam D...,J.J. Abrams,Every generation has a story.,Thirty years after defeating the Galactic Empi...,136,Action|Adventure|Science Fiction|Fantasy,Lucasfilm|Truenorth Productions|Bad Robot,12/15/2015,5292,7.5,2015,1868178225
4,tt2820852,9.335014,190000000,1506249360,Furious 7,Vin Diesel|Paul Walker|Jason Statham|Michelle ...,James Wan,Vengeance Hits Home,Deckard Shaw seeks revenge against Dominic Tor...,137,Action|Crime|Thriller,Universal Pictures|Original Film|Media Rights ...,4/1/2015,2947,7.3,2015,1316249360


Revenue is the total amount of income generated by the sale of goods or services related to the company's primary operations. Profit, typically called net profit or the bottom line, is the amount of income that remains after accounting for all expenses, debts, additional income streams and operating costs.

In [86]:
data2[data2.profit == data2.profit.max()]

Unnamed: 0,imdb_id,popularity,budget,revenue,original_title,cast,director,tagline,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,profit
239,tt0499549,9.432768,237000000,2781505847,Avatar,Sam Worthington|Zoe Saldana|Sigourney Weaver|S...,James Cameron,Enter the World of Pandora.,"In the 22nd century, a paraplegic Marine is di...",162,Action|Adventure|Fantasy|Science Fiction,Ingenious Film Partners|Twentieth Century Fox ...,12/10/2009,8458,7.1,2009,2544505847


In [88]:
answer_ls.append(5)
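A note on deriving a working frame such as `data2` from `data`: plain assignment only creates a second name for the same object, so a column added through one name appears under the other as well, whereas `.copy()` produces an independent frame. A small sketch with two rows from the outputs above (the names `alias`, `original` and `independent` are illustrative):

```python
import pandas as pd

# Budget and revenue of Jurassic World and Avatar, as shown above
data = pd.DataFrame({
    "budget": [150000000, 237000000],
    "revenue": [1513528810, 2781505847],
})

# Plain assignment: both names refer to the same DataFrame
alias = data
alias["profit"] = alias["revenue"] - alias["budget"]
print("profit" in data.columns)  # the original frame gained the column too

# .copy(): changes to the copy leave the source untouched
original = data.drop(columns="profit")
independent = original.copy()
independent["profit"] = independent["revenue"] - independent["budget"]
print("profit" in original.columns)
```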

# 7. Which film lost the most money?

Answer options:
1. Supernova tt0134983
2. The Warrior's Way tt1032751
3. Flushed Away tt0424095
4. The Adventures of Pluto Nash tt0180052
5. The Lone Ranger tt1210819

In [87]:
data2[data2.profit == data2.profit.min()]

Unnamed: 0,imdb_id,popularity,budget,revenue,original_title,cast,director,tagline,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,profit
491,tt1032751,0.25054,425000000,11087569,The Warrior's Way,Kate Bosworth|Jang Dong-gun|Geoffrey Rush|Dann...,Sngmoo Lee,Assassin. Hero. Legend.,An Asian assassin (Dong-gun Jang) is forced to...,100,Adventure|Fantasy|Action|Western|Thriller,Boram Entertainment Inc.,12/2/2010,74,6.4,2010,-413912431


In [89]:
answer_ls.append(2)

# 8. How many films made a profit?
Answer options:
1. 1478
2. 1520
3. 1241
4. 1135
5. 1398

In [91]:
data2[data2.profit > 0].profit.count()

1478

In [92]:
answer_ls.append(1)

# 9. Which film was the most profitable in 2008?
Answer options:
1. Madagascar: Escape 2 Africa tt0479952
2. Iron Man tt0371746
3. Kung Fu Panda tt0441773
4. The Dark Knight tt0468569
5. Mamma Mia! tt0795421

In [93]:
data2_2008 = data2[data2.release_year == 2008]
data2_2008.head()

Unnamed: 0,imdb_id,popularity,budget,revenue,original_title,cast,director,tagline,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,profit
600,tt0468569,8.466668,185000000,1001921825,The Dark Knight,Christian Bale|Michael Caine|Heath Ledger|Aaro...,Christopher Nolan,Why So Serious?,Batman raises the stakes in his war on crime. ...,152,Drama|Action|Crime|Thriller,DC Comics|Legendary Pictures|Warner Bros.|Syncopy,7/16/2008,8432,8.1,2008,816921825
601,tt0910970,5.678119,180000000,521311860,WALL·E,Ben Burtt|Elissa Knight|Jeff Garlin|Fred Willa...,Andrew Stanton,An adventure beyond the ordinar-E.,WALL·E is the last robot left on an Earth tha...,98,Animation|Family,Walt Disney Pictures|Pixar Animation Studios,6/22/2008,4209,7.6,2008,341311860
602,tt0371746,4.977955,140000000,585174222,Iron Man,Robert Downey Jr.|Terrence Howard|Jeff Bridges...,Jon Favreau,Heroes aren't born. They're built.,"After being held captive in an Afghan cave, bi...",126,Action|Science Fiction|Adventure,Marvel Studios,4/30/2008,6220,7.3,2008,445174222
603,tt0936501,3.647612,25000000,226830568,Taken,Liam Neeson|Famke Janssen|Maggie Grace|Katie C...,Pierre Morel,They took his daughter. He'll take their lives.,"While vacationing with a friend in Paris, an A...",93,Action|Thriller|Crime,Twentieth Century Fox Film Corporation|M6 Film...,2/18/2008,3075,7.2,2008,201830568
604,tt0367882,3.16167,185000000,786636033,Indiana Jones and the Kingdom of the Crystal S...,Harrison Ford|Cate Blanchett|Shia LaBeouf|Ray ...,Steven Spielberg,The adventure continues . . .,"Set during the Cold War, the Soviets – led b...",122,Adventure|Action,Lucasfilm|Paramount Pictures,5/21/2008,1537,5.6,2008,601636033


In [94]:
data2_2008[data2_2008.profit == data2_2008.profit.max()]

Unnamed: 0,imdb_id,popularity,budget,revenue,original_title,cast,director,tagline,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,profit
600,tt0468569,8.466668,185000000,1001921825,The Dark Knight,Christian Bale|Michael Caine|Heath Ledger|Aaro...,Christopher Nolan,Why So Serious?,Batman raises the stakes in his war on crime. ...,152,Drama|Action|Crime|Thriller,DC Comics|Legendary Pictures|Warner Bros.|Syncopy,7/16/2008,8432,8.1,2008,816921825


In [96]:
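# Comes back empty: data2.profit.max() is the maximum over all years (Avatar, 2009), so no 2008 row matches
data2[(data2.release_year == 2008) & (data2.profit == data2.profit.max())]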

Unnamed: 0,imdb_id,popularity,budget,revenue,original_title,cast,director,tagline,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,profit


In [97]:
answer_ls.append(4)
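The empty result in cell `In [96]` shows why the maximum has to be taken inside the filtered subset: the overall maximum profit belongs to a 2009 film, so no 2008 row can equal it. A minimal sketch with three rows from the outputs above (the names `films`, `empty`, `films_2008` and `best_2008` are illustrative):

```python
import pandas as pd

# Titles, years and profits copied from the outputs above
films = pd.DataFrame({
    "original_title": ["The Dark Knight", "Iron Man", "Avatar"],
    "release_year": [2008, 2008, 2009],
    "profit": [816921825, 445174222, 2544505847],
})

# Year mask combined with the global maximum: nothing matches
empty = films[(films.release_year == 2008) & (films.profit == films.profit.max())]
print(len(empty))

# Filter first, then look for the maximum within the subset
films_2008 = films[films.release_year == 2008]
best_2008 = films_2008.loc[films_2008.profit.idxmax(), "original_title"]
print(best_2008)
```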

# 10. Which film lost the most money between 2012 and 2014 (inclusive)?
Answer options:
1. Winter's Tale tt1837709
2. Stolen tt1656186
3. Broken City tt1235522
4. Upside Down tt1374992
5. The Lone Ranger tt1210819

In [106]:
data2_12_14 = data2[(data2.release_year >= 2012) & (data2.release_year <= 2014)]
data2_12_14.release_year.unique()


array([2014, 2012, 2013], dtype=int64)

In [107]:
data2_12_14[data2_12_14.profit == data2_12_14.profit.min()]

Unnamed: 0,imdb_id,popularity,budget,revenue,original_title,cast,director,tagline,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,profit
1246,tt1210819,1.21451,255000000,89289910,The Lone Ranger,Johnny Depp|Armie Hammer|William Fichtner|Hele...,Gore Verbinski,Never Take Off the Mask,The Texas Rangers chase down a gang of outlaws...,149,Action|Adventure|Western,Walt Disney Pictures|Jerry Bruckheimer Films|I...,7/3/2013,1607,6.0,2013,-165710090


In [109]:
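# Raises ValueError: Python's `in` is not element-wise on a Series; use .isin([2012, 2013, 2014]) instead
data2[data2.release_year in [2012,2013,2014]].profit.min()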

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

In [110]:
answer_ls.append(5)
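The `ValueError` in cell `In [109]` appears because Python's `in` operator tries to reduce the whole `Series` to a single truth value. pandas provides element-wise alternatives; the sketch below uses `isin` and `between` on four rows whose years and profits are taken or derived from the outputs above (the frame and variable names are illustrative):

```python
import pandas as pd

films = pd.DataFrame({
    "original_title": ["The Warrior's Way", "John Carter", "The Lone Ranger",
                       "Captain America: The Winter Soldier"],
    "release_year": [2010, 2012, 2013, 2014],
    "profit": [-413912431, 24139100, -165710090, 544766572],
})

# Element-wise membership test against an explicit list of years
mask_isin = films.release_year.isin([2012, 2013, 2014])

# Range test; both bounds are included by default
mask_between = films.release_year.between(2012, 2014)

# Largest loss inside the selected period
subset = films[mask_between]
worst = subset.loc[subset.profit.idxmin(), "original_title"]
print(worst)
```

Both masks select the same rows here; `between` is convenient for contiguous ranges, `isin` for arbitrary sets of values.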

# 11. Which genre has the most films?
Answer options:
1. Action
2. Adventure
3. Drama
4. Comedy
5. Thriller

In [118]:
cnt = Counter()
genres = data2.genres
for item in genres:
    temp = item.split('|') 
    for genre in temp:
        cnt[genre] += 1
cnt

Counter({'Action': 583,
         'Adventure': 416,
         'Science Fiction': 248,
         'Thriller': 597,
         'Fantasy': 223,
         'Crime': 315,
         'Western': 20,
         'Drama': 782,
         'Family': 260,
         'Animation': 139,
         'Comedy': 683,
         'Mystery': 168,
         'Romance': 308,
         'War': 58,
         'History': 62,
         'Music': 64,
         'Horror': 176,
         'Documentary': 8,
         'Foreign': 2})

In [125]:
cnt.most_common()[0]

('Drama', 782)

In [126]:
answer_ls.append(3)
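The `Counter` loop can also be expressed with pandas alone, which fits the earlier advice about preferring built-in methods. A minimal sketch, assuming pipe-separated genre strings like those in the `genres` column; the three sample strings are copied from rows shown above and the names `genres` and `counts` are illustrative:

```python
import pandas as pd

# Genre strings copied from three rows shown above
genres = pd.Series([
    "Action|Adventure|Science Fiction|Thriller",  # Jurassic World
    "Action|Crime|Thriller",                      # Furious 7
    "Animation|Family",                           # Winnie the Pooh
])

# Split each string into a list, give every genre its own row, then count
counts = genres.str.split("|").explode().value_counts()
print(counts)
```

On the full dataset the same chain applied to `data2.genres` should reproduce the totals in the `Counter` above.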

# 12. Which genre is most common among profitable films?
Answer options:
1. Drama
2. Comedy
3. Action
4. Thriller
5. Adventure

In [128]:
cnt = Counter()
for item in data2[data2.profit > 0].genres:
    temp = item.split('|') 
    for genre in temp:
        cnt[genre] += 1
cnt.most_common()

[('Drama', 560),
 ('Comedy', 551),
 ('Thriller', 446),
 ('Action', 444),
 ('Adventure', 337),
 ('Romance', 242),
 ('Crime', 231),
 ('Family', 226),
 ('Science Fiction', 195),
 ('Fantasy', 188),
 ('Horror', 150),
 ('Animation', 120),
 ('Mystery', 119),
 ('Music', 47),
 ('History', 46),
 ('War', 41),
 ('Western', 12),
 ('Documentary', 7)]

In [129]:
answer_ls.append(1)

# 13. Which director made the most films?
Answer options:
1. Steven Spielberg
2. Ridley Scott 
3. Steven Soderbergh
4. Christopher Nolan
5. Clint Eastwood

In [130]:
cnt = Counter()
for item in data2.director:
    # a film may list several directors separated by '|'
    for director in item.split('|'):
        cnt[director] += 1
cnt.most_common()

[('Steven Soderbergh', 13),
 ('Ridley Scott', 12),
 ('Clint Eastwood', 12),
 ('Robert Rodriguez', 11),
 ('Steven Spielberg', 10),
 ('Shawn Levy', 10),
 ('Peter Farrelly', 10),
 ('Bobby Farrelly', 9),
 ('Tim Burton', 9),
 ('Antoine Fuqua', 8),
 ('Ron Howard', 8),
 ('M. Night Shyamalan', 8),
 ('Christopher Nolan', 8),
 ('Peter Jackson', 8),
 ('Michael Bay', 8),
 ('Brett Ratner', 8),
 ('Todd Phillips', 8),
 ('Adam Shankman', 8),
 ('Gore Verbinski', 8),
 ('Quentin Tarantino', 7),
 ('Paul W.S. Anderson', 7),
 ('Lasse Hallström', 7),
 ('Dennis Dugan', 7),
 ('Marc Forster', 7),
 ('Francis Lawrence', 6),
 ('Adam McKay', 6),
 ('Robert Zemeckis', 6),
 ('Andy Fickman', 6),
 ('David Fincher', 6),
 ('Tim Story', 6),
 ('Zack Snyder', 6),
 ('Robert Luketic', 6),
 ('Steve Carr', 6),
 ('Sam Raimi', 6),
 ('Tony Scott', 6),
 ('Joel Coen', 6),
 ('Martin Scorsese', 6),
 ('Danny Boyle', 6),
 ('Louis Leterrier', 6),
 ('Brian Robbins', 6),
 ('Rob Cohen', 6),
 ('Peter Berg', 6),
 ('Peter Segal', 6),
 ('Raja G

In [131]:
answer_ls.append(3)

# 14. Which director made the largest number of profitable films?
Answer options:
1. Steven Soderbergh
2. Clint Eastwood
3. Steven Spielberg
4. Ridley Scott
5. Christopher Nolan

In [132]:
cnt = Counter()
for item in data2[data2.profit > 0].director:
    # a film may list several directors separated by '|'
    for director in item.split('|'):
        cnt[director] += 1
cnt.most_common()

[('Ridley Scott', 12),
 ('Steven Spielberg', 10),
 ('Clint Eastwood', 10),
 ('Steven Soderbergh', 10),
 ('Shawn Levy', 9),
 ('Tim Burton', 9),
 ('Antoine Fuqua', 8),
 ('Christopher Nolan', 8),
 ('Peter Jackson', 8),
 ('Michael Bay', 8),
 ('Brett Ratner', 8),
 ('Peter Farrelly', 8),
 ('Robert Rodriguez', 8),
 ('M. Night Shyamalan', 7),
 ('Bobby Farrelly', 7),
 ('Todd Phillips', 7),
 ('Adam Shankman', 7),
 ('Quentin Tarantino', 6),
 ('Francis Lawrence', 6),
 ('Adam McKay', 6),
 ('Robert Zemeckis', 6),
 ('Andy Fickman', 6),
 ('David Fincher', 6),
 ('Tim Story', 6),
 ('Zack Snyder', 6),
 ('Ron Howard', 6),
 ('Sam Raimi', 6),
 ('Joel Coen', 6),
 ('Louis Leterrier', 6),
 ('Dennis Dugan', 6),
 ('Paul W.S. Anderson', 6),
 ('Gore Verbinski', 6),
 ('Peter Segal', 6),
 ('Raja Gosnell', 6),
 ('J.J. Abrams', 5),
 ('Sam Mendes', 5),
 ('Jaume Collet-Serra', 5),
 ('F. Gary Gray', 5),
 ('Guy Ritchie', 5),
 ('Nancy Meyers', 5),
 ('Guillermo del Toro', 5),
 ('Bryan Singer', 5),
 ('Doug Liman', 5),
 ('Jon

In [133]:
answer_ls.append(4)

# 15. Which director brought in the most profit?
Answer options:
1. Steven Spielberg
2. Christopher Nolan
3. David Yates
4. James Cameron
5. Peter Jackson

In [166]:
cnt = Counter()
# Credit each film's full profit to every director listed for it
for directors, profit in zip(data2.director, data2.profit):
    for director in directors.split('|'):
        cnt[director] += profit
cnt.most_common()

[('Peter Jackson', 5202593685),
 ('David Yates', 3379295625),
 ('Christopher Nolan', 3162548502),
 ('J.J. Abrams', 2839169916),
 ('Michael Bay', 2760938960),
 ('James Cameron', 2548546718),
 ('Francis Lawrence', 2476979588),
 ('Pierre Coffin', 2452006832),
 ('Steven Spielberg', 2449700791),
 ('Joss Whedon', 2424593677),
 ('Gore Verbinski', 2271362290),
 ('Sam Raimi', 2254066354),
 ('Lee Unkrich', 2081614145),
 ('Ridley Scott', 2044035909),
 ('Andrew Adamson', 1957706346),
 ('Chris Columbus', 1906968952),
 ('Carlos Saldanha', 1882719105),
 ('Tim Burton', 1801663457),
 ('Eric Darnell', 1755054393),
 ('Sam Mendes', 1710352791),
 ('Tom McGrath', 1705388064),
 ('Pete Docter', 1682867609),
 ('Chris Renaud', 1648116186),
 ('James Wan', 1620441367),
 ('Brad Bird', 1581913958),
 ('Conrad Vernon', 1578269902),
 ('Shawn Levy', 1515787063),
 ('Justin Lin', 1472947074),
 ('Steven Soderbergh', 1450928588),
 ('Ron Howard', 1379257991),
 ('Bill Condon', 1376664544),
 ('Colin Trevorrow', 1363528810),
 

In [163]:
data2[data2.director.str.contains('|', regex=False, na = False)]

Unnamed: 0,imdb_id,popularity,budget,revenue,original_title,cast,director,tagline,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,profit
8,tt2293640,7.404165,74000000,1156730962,Minions,Sandra Bullock|Jon Hamm|Michael Keaton|Allison...,Kyle Balda|Pierre Coffin,"Before Gru, they had a history of bad bosses","Minions Stuart, Kevin and Bob are recruited by...",91,Family|Animation|Adventure|Comedy,Universal Pictures|Illumination Entertainment,6/17/2015,2893,6.5,2015,1082730962
11,tt1617661,6.189369,176000003,183987723,Jupiter Ascending,Mila Kunis|Channing Tatum|Sean Bean|Eddie Redm...,Lana Wachowski|Lilly Wachowski,Expand your universe.,In a universe where human genetic material is ...,124,Science Fiction|Fantasy|Action|Adventure,Village Roadshow Pictures|Dune Entertainment|A...,2/4/2015,1937,5.2,2015,7987720
57,tt2381941,2.395366,50100000,153962963,Focus,Will Smith|Margot Robbie|Rodrigo Santoro|Rober...,Glenn Ficarra|John Requa,Never Drop The Con.,"A veteran grifter takes a young, attractive wo...",105,Romance|Comedy|Crime|Drama,Kramer & Sigman Films|RatPac-Dune Entertainmen...,2/25/2015,1831,6.7,2015,103862963
72,tt1524930,2.000338,31000000,104384188,Vacation,Ed Helms|Christina Applegate|Skyler Gisondo|St...,John Francis Daley|Jonathan M. Goldstein,What could go wrong?,Hoping to bring his family closer together and...,99,Adventure|Comedy,New Line Cinema|BenderSpink|David Dobkin Produ...,7/28/2015,846,6.1,2015,73384188
120,tt1843866,12.971027,170000000,714766572,Captain America: The Winter Soldier,Chris Evans|Scarlett Johansson|Sebastian Stan|...,Joe Russo|Anthony Russo,In heroes we trust.,After the cataclysmic events in New York with ...,136,Action|Adventure|Science Fiction,Marvel Studios,3/20/2014,3848,7.6,2014,544766572
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1812,tt0120630,1.525559,45000000,224834564,Chicken Run,Mel Gibson|Miranda Richardson|Timothy Spall|Im...,Peter Lord|Nick Park,This ain't no chick flick. It's poultry in mot...,Having been hopelessly repressed and facing ev...,84,Animation|Comedy|Family,DreamWorks SKG|Aardman Animations|DreamWorks A...,6/21/2000,663,6.3,2000,179834564
1825,tt0138749,1.085408,95000000,76432727,The Road to El Dorado,Kenneth Branagh|Kevin Kline|Rosie Perez|Armand...,Don Michael Paul|Bibo Bergeron,They came for the gold... they stayed for the ...,"After a failed swindle, two con-men end up wit...",89,Adventure|Animation|Comedy|Family,DreamWorks Animation,3/31/2000,462,6.8,2000,-18567273
1842,tt0183505,0.783530,51000000,149270999,"Me, Myself & Irene",Jim Carrey|Renée Zellweger|Anthony Anderson|R...,Peter Farrelly|Bobby Farrelly,From gentle to mental.,Rhode Island State Trooper Charlie Baileygates...,116,Comedy,Twentieth Century Fox Film Corporation|Conundr...,6/22/2000,521,5.9,2000,98270999
1859,tt0120913,0.546560,75000000,36754634,Titan A.E.,Matt Damon|Bill Pullman|Drew Barrymore|John Le...,Don Bluth|Gary Goldman,"When Earth Ends, The Adventure Begins.",A young man finds out that he holds the key to...,94,Animation|Action|Science Fiction|Family|Adventure,Twentieth Century Fox Film Corporation|David K...,6/16/2000,184,6.1,2000,-38245366


In [167]:
answer_ls.append(5)

In [168]:
a = [1,2,3]
a

[1, 2, 3]

In [169]:
a.pop(-1)

3

In [170]:
a

[1, 2]

# 16. Which actor brought in the most profit?
Answer options:
1. Emma Watson
2. Johnny Depp
3. Michelle Rodriguez
4. Orlando Bloom
5. Rupert Grint

In [171]:
cnt = Counter()
# Add each film's profit to every actor listed in its cast.
for i in range(len(data2)):
    row = data2.iloc[i]
    for actor in row['cast'].split('|'):
        cnt[actor] += row.profit
cnt.most_common()

[('Emma Watson', 6666245597),
 ('Daniel Radcliffe', 6514990281),
 ('Rupert Grint', 6408638290),
 ('Ian McKellen', 6087375777),
 ('Robert Downey Jr.', 5316030161),
 ('Orlando Bloom', 5148578162),
 ('Johnny Depp', 4776417000),
 ('Ralph Fiennes', 4739260140),
 ('Michelle Rodriguez', 4608031235),
 ('Anne Hathaway', 4490351538),
 ('Scarlett Johansson', 4203244858),
 ('Ben Stiller', 4201954571),
 ('Cameron Diaz', 4146490085),
 ('Samuel L. Jackson', 4036974161),
 ('Vin Diesel', 3923216862),
 ('Dwayne Johnson', 3899824603),
 ('Tom Cruise', 3872196893),
 ('Chris Evans', 3793939244),
 ('Gary Oldman', 3708823251),
 ('Zoe Saldana', 3572392568),
 ('Michael Caine', 3481833508),
 ('Shia LaBeouf', 3393362177),
 ('Chris Hemsworth', 3347424695),
 ('Angelina Jolie', 3345409359),
 ('Kristen Stewart', 3324049211),
 ('Helena Bonham Carter', 3289902810),
 ('Brad Pitt', 3287408957),
 ('Paul Walker', 3137283755),
 ('Sam Worthington', 3135365646),
 ('Daniel Craig', 3133778323),
 ('Will Smith', 3107921771),
 ('T

In [172]:
answer_ls.append(1)
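
The row-by-row loop above can also be written without explicit iteration. The following is only a sketch, assuming pandas 0.25 or later (for `DataFrame.explode`). It uses a tiny made-up frame in place of `data2`, and the names `toy` and `profit_by_actor` are illustrative, not part of the notebook.

```python
import pandas as pd

# Made-up stand-in for data2: only the columns the calculation needs.
toy = pd.DataFrame({
    "cast": ["A|B", "B|C", "A"],
    "profit": [10, -4, 5],
})

# One row per (film, actor), then the total profit per actor.
profit_by_actor = (
    toy.assign(actor=toy["cast"].str.split("|"))
       .explode("actor")
       .groupby("actor")["profit"]
       .sum()
       .sort_values(ascending=False)
)
print(profit_by_actor)
```

As in the loop, each film's full profit is credited to every actor in its cast.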

# 17. Which actor brought in the least profit in 2012?
Answer options:
1. Nicolas Cage
2. Danny Huston
3. Kirsten Dunst
4. Jim Sturgess
5. Sami Gayle

In [251]:
cnt = Counter()
# Total 2012 profit per actor; the last entry of most_common() is the lowest.
data_17 = data2[data2.release_year == 2012]
for i in range(len(data_17)):
    for actor in data_17.iloc[i].cast.split('|'):
        cnt[actor] += data_17.iloc[i].profit
cnt.most_common()[-1]

('Kirsten Dunst', -68109207)

In [239]:
def to_dict_from_series(list_to_dict, values):
    dictionary = {}
    for i in range(0,len(list_to_dict)):
        for word in list_to_dict[i]:
            if word in dictionary:
                dictionary[word] += values[i]
            else:
                dictionary[word] = values[i]
    return dictionary

actors = to_dict_from_series(data_17.cast.str.split('|').tolist(), data_17.profit.tolist())
actors

{'Robert Downey Jr.': 1299557910,
 'Chris Evans': 1299557910,
 'Mark Ruffalo': 1299557910,
 'Chris Hemsworth': 1542450773,
 'Scarlett Johansson': 1299557910,
 'Kate Beckinsale': 174302074,
 'Stephen Rea': 62400000,
 'Michael Ealy': 146470507,
 'Theo James': 62400000,
 'India Eisley': 62400000,
 'Christian Bale': 831041287,
 'Michael Caine': 831041287,
 'Gary Oldman': 831041287,
 'Anne Hathaway': 1211851057,
 'Tom Hardy': 858717867,
 'Jamie Foxx': 325368238,
 'Christoph Waltz': 325368238,
 'Leonardo DiCaprio': 325368238,
 'Kerry Washington': 307412515,
 'Samuel L. Jackson': 325368238,
 'Daniel Craig': 908561013,
 'Judi Dench': 908561013,
 'Javier Bardem': 908561013,
 'Ralph Fiennes': 1059561013,
 'Naomie Harris': 908561013,
 'Jeremy Renner': 146572938,
 'Rachel Weisz': 146572938,
 'Edward Norton': 198836104,
 'Scott Glenn': 146572938,
 'Stacy Keach': 146572938,
 'Ian McKellen': 767003568,
 'Martin Freeman': 767003568,
 'Richard Armitage': 767003568,
 'Andy Serkis': 767003568,
 'Cate Bla

In [243]:
min(actors, key=actors.get)

'Kirsten Dunst'

In [247]:
data_17_1 = data2[(data2.release_year == 2012) & (data2.profit > 0)]
data_17_1[data_17_1.profit == data_17_1.profit.min()]

Unnamed: 0,imdb_id,popularity,budget,revenue,original_title,cast,director,tagline,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,profit
1045,tt1615065,0.998625,45000000,47000000,Savages,Blake Lively|Taylor Kitsch|Aaron Taylor-Johnso...,Oliver Stone,Young Beautiful Deadly.,Pot growers Ben and Chon face off against the ...,131,Crime|Drama|Thriller,Ixtlan|Relativity Media|Onda Entertainment,7/6/2012,516,6.2,2012,2000000


In [245]:
actors_1 = to_dict_from_series(data_17_1.cast.str.split('|').tolist(), data_17_1.profit.tolist())
min(actors_1, key=actors_1.get)

'Blake Lively'

In [246]:
actors_1['Blake Lively']

2000000

In [252]:
answer_ls.append(3)

# 18. Which actor appeared in the most high-budget films? (films with a budget above the mean for this sample)
Answer options:
1. Tom Cruise
2. Mark Wahlberg 
3. Matt Damon
4. Angelina Jolie
5. Adam Sandler

In [254]:
data_bb = data2[data2['budget']>data2['budget'].mean()]
pd.DataFrame(data_bb.cast.str.split('|').tolist()).stack().value_counts()

Matt Damon        18
Adam Sandler      17
Angelina Jolie    16
Eddie Murphy      15
Tom Cruise        15
                  ..
Vera Farmiga       1
Dakota Goyo        1
Will Forte         1
Diane Keaton       1
January Jones      1
Length: 1508, dtype: int64

In [255]:
answer_ls.append(3)
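
The same count can be obtained without the wide intermediate frame that `pd.DataFrame(...).stack()` builds. This is a sketch on made-up data, assuming `Series.explode` is available (pandas 0.25 or later). The names `toy`, `big_budget` and `appearances` are hypothetical.

```python
import pandas as pd

# Made-up stand-in for data2.
toy = pd.DataFrame({
    "cast": ["A|B", "A|C", "B", "A"],
    "budget": [100, 90, 10, 80],
})

# Keep films whose budget exceeds the sample mean (70 in this toy frame).
big_budget = toy[toy["budget"] > toy["budget"].mean()]

# Split the cast into one actor per row and count appearances.
appearances = big_budget["cast"].str.split("|").explode().value_counts()
print(appearances)
```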

# 19. In which genre has Nicolas Cage appeared most often?
Answer options:
1. Drama
2. Action
3. Thriller
4. Adventure
5. Crime

In [266]:
data_nc = data2[data2.cast.str.contains('Nicolas Cage')]
# Split each film's genres into separate values and count them.
pd.DataFrame(data_nc.genres.str.split('|').tolist()).stack().value_counts()

Action             17
Thriller           15
Drama              12
Crime              10
Fantasy             8
Adventure           7
Comedy              6
Science Fiction     4
Mystery             3
Animation           3
Family              3
History             2
War                 1
Horror              1
Romance             1
dtype: int64

In [267]:
answer_ls.append(2)
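
One detail worth keeping in mind is that `str.contains` treats its argument as a regular expression by default. A plain name such as 'Nicolas Cage' is unaffected, but passing `regex=False` makes the intent explicit. The sketch below uses made-up data, and the names `toy`, `mask` and `genre_counts` are hypothetical.

```python
import pandas as pd

# Made-up stand-in for data2.
toy = pd.DataFrame({
    "cast": ["Nicolas Cage|X", "Y|Z", "Nicolas Cage"],
    "genres": ["Action|Thriller", "Drama", "Action|Drama"],
})

# regex=False treats the name as plain text, not as a pattern.
mask = toy["cast"].str.contains("Nicolas Cage", regex=False)

# One genre per row for the matching films, then count them.
genre_counts = toy.loc[mask, "genres"].str.split("|").explode().value_counts()
print(genre_counts)
```

This is still a substring match, so it would also select any longer name that contains the same text.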

# 20. Which studio made the most films?
Answer options:
1. Universal Pictures (Universal)
2. Paramount Pictures
3. Columbia Pictures
4. Warner Bros
5. Twentieth Century Fox Film Corporation

In [268]:
pd.DataFrame(data2.production_companies.str.split('|').tolist()).stack().value_counts()

Universal Pictures                        173
Warner Bros.                              168
Paramount Pictures                        122
Columbia Pictures                         117
Twentieth Century Fox Film Corporation    109
                                         ... 
Necropia                                    1
Square USA                                  1
Corsan                                      1
HorrorFlix                                  1
Lions Gate Family Entertainment             1
Length: 1772, dtype: int64

In [269]:
answer_ls.append(1)

# 21. Which studio made the most films in 2015?
Answer options:
1. Universal Pictures
2. Paramount Pictures
3. Columbia Pictures
4. Warner Bros
5. Twentieth Century Fox Film Corporation

In [279]:
data_2015 = data2[data2.release_year == 2015]
pd.DataFrame(data_2015.production_companies.str.split('|').tolist()).stack().value_counts()

Warner Bros.                              12
Universal Pictures                        10
Twentieth Century Fox Film Corporation     8
Columbia Pictures                          7
Paramount Pictures                         7
                                          ..
Beagle Pug Films                           1
Clinica Estetico                           1
Ahimsa Films                               1
Thunder Road Pictures                      1
Gran Via Productions                       1
Length: 246, dtype: int64

In [280]:
answer_ls.append(4)

# 22. Which studio has earned the most money from comedies of all time?
Answer options:
1. Warner Bros
2. Universal Pictures (Universal)
3. Columbia Pictures
4. Paramount Pictures
5. Walt Disney

In [275]:
data_comedies = data2[data2.genres.str.contains('Comedy')]
cnt = Counter()
for i in range(len(data_comedies)):
    for company in data_comedies.iloc[i].production_companies.split('|'):
        cnt[company] += data_comedies.iloc[i].profit
cnt.most_common()

[('Universal Pictures', 8961545581),
 ('Walt Disney Pictures', 7669710326),
 ('Twentieth Century Fox Film Corporation', 5686960294),
 ('Columbia Pictures', 5646343696),
 ('DreamWorks Animation', 4789049764),
 ('Pixar Animation Studios', 4232507237),
 ('Warner Bros.', 3894922770),
 ('New Line Cinema', 3259242692),
 ('DreamWorks SKG', 3143226857),
 ('Paramount Pictures', 3055625722),
 ('Relativity Media', 3036733344),
 ('Twentieth Century Fox Animation', 3030806037),
 ('Blue Sky Studios', 3024335014),
 ('Village Roadshow Pictures', 2383194524),
 ('Happy Madison Productions', 2196445292),
 ('Illumination Entertainment', 1977492847),
 ('Columbia Pictures Corporation', 1817280373),
 ('Regency Enterprises', 1766412563),
 ('Dune Entertainment', 1703149736),
 ('Pacific Data Images (PDI)', 1699136912),
 ('Sony Pictures Animation', 1686288994),
 ('Fox 2000 Pictures', 1673413405),
 ('Apatow Productions', 1419417727),
 ('21 Laps Entertainment', 1262425642),
 ('Dimension Films', 1235565030),
 ('Wal

In [281]:
answer_ls.append(2)

# 23. Which studio earned the most money in 2012?
Answer options:
1. Universal Pictures (Universal)
2. Warner Bros
3. Columbia Pictures
4. Paramount Pictures
5. Lucasfilm

In [286]:
data_2012 = data2[data2.release_year == 2012]
cnt = Counter()
for i in range(len(data_2012)):
    for company in data_2012.iloc[i].production_companies.split('|'):
        cnt[company] += data_2012.iloc[i].profit
cnt.most_common()

[('Columbia Pictures', 2501406608),
 ('Universal Pictures', 1981011579),
 ('Marvel Studios', 1299557910),
 ('Warner Bros.', 1258020056),
 ('Relativity Media', 1032593938),
 ('New Line Cinema', 1028114941),
 ('Metro-Goldwyn-Mayer (MGM)', 1010869947),
 ('Legendary Pictures', 982041287),
 ('Summit Entertainment', 961582873),
 ('Dune Entertainment', 892186707),
 ('DC Entertainment', 831041287),
 ('Syncopy', 831041287),
 ('Blue Sky Studios', 782244782),
 ('Twentieth Century Fox Animation', 782244782),
 ('WingNut Films', 767003568),
 ('DreamWorks Animation', 763862944),
 ('Twentieth Century Fox Film Corporation', 726676825),
 ('Lionsgate', 709309876),
 ('Sunswept Entertainment', 709000000),
 ('Temple Hill Entertainment', 709000000),
 ('Color Force', 616210692),
 ('Fox 2000 Pictures', 544128741),
 ('Laura Ziskin Productions', 537215857),
 ('Marvel Entertainment', 537215857),
 ('The Weinstein Company', 534041592),
 ('Media Rights Capital', 499368315),
 ('Fuzzy Door Productions', 499368315),
 (

In [287]:
answer_ls.append(3)

# 24. The biggest money-losing film from Paramount Pictures
Answer options:

1. K-19: The Widowmaker tt0267626
2. Next tt0435705
3. Twisted tt0315297
4. The Love Guru tt0811138
5. The Fighter tt0964517

In [289]:
data_pp = data2[data2.production_companies.str.contains('Paramount Pictures')]
data_pp[data_pp.profit == data_pp.profit.min()]

Unnamed: 0,imdb_id,popularity,budget,revenue,original_title,cast,director,tagline,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,profit
926,tt0267626,0.72233,100000000,35168966,K-19: The Widowmaker,Harrison Ford|Liam Neeson|Peter Sarsgaard|Joss...,Kathryn Bigelow,Fate has found its hero.,When Russia's first nuclear submarine malfunct...,138,Thriller|Drama|History,Paramount Pictures|Intermedia Films|National G...,7/19/2002,146,6.0,2002,-64831034


In [290]:
answer_ls.append(1)

# 25. Which year was the most profitable (earned the most)?
Answer options:
1. 2014
2. 2008
3. 2012
4. 2002
5. 2015

In [299]:
data2.groupby('release_year').profit.sum().sort_values(ascending = False)

release_year
2015    18668572378
2014    16397812953
2012    16077001687
2013    15243179791
2011    14730241341
2009    13423744372
2010    13117292530
2008    11663881990
2007    11565911801
2004     9634180720
2003     9228823312
2002     9002361487
2005     8981925558
2006     8691077320
2001     7950614865
2000     6101399805
Name: profit, dtype: int64

In [300]:
answer_ls.append(5)

# 26. Which year was the most profitable for Warner Bros?
Answer options:
1. 2014
2. 2008
3. 2012
4. 2010
5. 2015

In [303]:
data_wb = data2[data2.production_companies.str.contains('Warner Bros')]
data_wb.groupby('release_year').profit.sum().sort_values(ascending = False)

release_year
2014    2295464519
2007    2201675217
2008    2134595031
2010    1974712985
2011    1871393682
2003    1855493377
2009    1822454136
2013    1636453400
2004    1631933725
2005    1551980298
2001    1343545668
2012    1258020056
2002    1022709901
2015     870368348
2006     620170743
2000     452631386
Name: profit, dtype: int64

In [304]:
answer_ls.append(1)

# 27. In which month, summed over all years, were the most films released?
Answer options:
1. January
2. June
3. December
4. September
5. May

In [360]:
# Take the month (the part before the first '/') from each release date; it stays a string.
month = data2.release_date.str.split('/').str[0]
# Note: data3 is the same object as data2, so the new column is added to data2 as well.
data3 = data2
data3['month'] = month
data3.groupby('month').count().genres.sort_values()

month
1     110
2     135
5     140
7     142
11    146
6     147
4     149
3     156
8     161
10    186
12    191
9     227
Name: genres, dtype: int64

In [361]:
answer_ls.append(4)
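
The month can also be taken from a parsed date instead of a string split. This is a sketch on made-up dates, assuming every value follows the month/day/year layout seen in `release_date`. The names `toy`, `month` and `films_per_month` are hypothetical.

```python
import pandas as pd

# Made-up stand-in for the release_date column.
toy = pd.DataFrame({
    "release_date": ["6/9/2015", "9/30/2014", "9/1/2000", "12/25/2001"],
})

# Parse month/day/year strings and take the month as an integer.
month = pd.to_datetime(toy["release_date"], format="%m/%d/%Y").dt.month

# Number of films per month, ordered by month number.
films_per_month = month.value_counts().sort_index()
print(films_per_month)
```

With integer months, filters would use numbers, for example `isin([6, 7, 8])` rather than `isin(['6', '7', '8'])`, and grouped output sorts in calendar order.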

# 28. How many films in total were released in summer? (June, July, August)
Answer options:
1. 345
2. 450
3. 478
4. 523
5. 381

In [375]:
data3[data3.month.isin(['6','7','8'])].groupby('month').count().genres.sum()

450

In [376]:
answer_ls.append(2)

# 29. Which director releases the most films in winter (summed over the years)?
Answer options:
1. Steven Soderbergh
2. Christopher Nolan
3. Clint Eastwood
4. Ridley Scott
5. Peter Jackson

In [380]:
data_winter = data3[data3.month.isin(['1','2','12'])]
pd.DataFrame(data_winter.director.str.split('|').tolist()).stack().value_counts()

Peter Jackson        7
Clint Eastwood       6
Steven Soderbergh    6
Adam Shankman        4
Shawn Levy           4
                    ..
Martin McDonagh      1
Scott Stewart        1
Kevin Allen          1
Ben Stiller          1
Cory Edwards         1
Length: 359, dtype: int64

In [381]:
answer_ls.append(5)

# 30. Which month is most often the most profitable across the years?
Answer options:
1. January
2. June
3. December
4. September
5. May

In [402]:
a = data3.groupby(['release_year','month']).profit.sum()
a

release_year  month
2000          1         -26344591
              10        399938512
              11        487934818
              12       1411487999
              2         310137593
                          ...    
2015          5        1103503482
              6        3757679861
              7        1552273188
              8         491699146
              9        1775686025
Name: profit, Length: 192, dtype: int64

In [415]:
b =a.to_frame()
b

Unnamed: 0_level_0,Unnamed: 1_level_0,profit
release_year,month,Unnamed: 2_level_1
2000,1,-26344591
2000,10,399938512
2000,11,487934818
2000,12,1411487999
2000,2,310137593
...,...,...
2015,5,1103503482
2015,6,3757679861
2015,7,1552273188
2015,8,491699146


In [417]:
b.groupby('month').profit.sum().sort_values(ascending = False)

month
6     27551262500
12    26563071474
5     25039516141
11    18654155473
7     18307895020
3     13720216540
10    12994907859
4     12876005537
9     10878921041
8      9640610446
2      8854574425
1      5396885454
Name: profit, dtype: int64

In [418]:
data3.groupby(['release_year','month']).profit.sum().to_frame().groupby('month').profit.sum().sort_values(ascending = False)

month
6     27551262500
12    26563071474
5     25039516141
11    18654155473
7     18307895020
3     13720216540
10    12994907859
4     12876005537
9     10878921041
8      9640610446
2      8854574425
1      5396885454
Name: profit, dtype: int64

In [419]:
answer_ls.append(2)
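
The cells above sum profit per month over all years. The question could also be read as "which month wins most often, year by year". The sketch below shows that alternative reading on made-up data and makes no claim about which reading the task intends. The names `toy`, `monthly`, `best_month` and `wins` are hypothetical.

```python
import pandas as pd

# Made-up stand-in for data3 with integer months.
toy = pd.DataFrame({
    "release_year": [2000, 2000, 2001, 2001, 2002, 2002],
    "month":        [6,    12,   6,    5,    12,   6],
    "profit":       [50,   20,   10,   40,   30,   70],
})

# Total profit for every (year, month) pair.
monthly = toy.groupby(["release_year", "month"])["profit"].sum()

# Years as rows, months as columns; idxmax picks the best month of each year.
best_month = monthly.unstack("month").idxmax(axis=1)

# How many years each month came out on top.
wins = best_month.value_counts()
print(wins)
```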

# 31. Which studio's film titles are the longest on average, by number of characters?
Answer options:
1. Universal Pictures (Universal)
2. Warner Bros
3. Jim Henson Company, The
4. Paramount Pictures
5. Four By Two Productions

In [422]:
data3.head(1)

Unnamed: 0,imdb_id,popularity,budget,revenue,original_title,cast,director,tagline,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,profit,month
0,tt0369610,32.985763,150000000,1513528810,Jurassic World,Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...,Colin Trevorrow,The park is open.,Twenty-two years after the events of Jurassic ...,124,Action|Adventure|Science Fiction|Thriller,Universal Studios|Amblin Entertainment|Legenda...,6/9/2015,5562,6.5,2015,1363528810,6


In [475]:
data4 = data3
data4['title_lenght'] = data3.original_title.str.len()
data4.head(1)

Unnamed: 0,imdb_id,popularity,budget,revenue,original_title,cast,director,tagline,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,profit,month,title_lenght,number_of_words
0,tt0369610,32.985763,150000000,1513528810,Jurassic World,Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...,Colin Trevorrow,The park is open.,Twenty-two years after the events of Jurassic ...,124,Action|Adventure|Science Fiction|Thriller,Universal Studios|Amblin Entertainment|Legenda...,6/9/2015,5562,6.5,2015,1363528810,6,14,2


In [476]:
cnt = Counter()
movies_number = Counter()
# Sum title lengths and count films per studio, then divide to get the mean.
for i in range(len(data4)):
    for company in data4.iloc[i].production_companies.split('|'):
        cnt[company] += data4.iloc[i].title_lenght
        movies_number[company] += 1
mean_films_title = {}
for item in movies_number:
    mean_films_title[item] = cnt[item] / movies_number[item]
mean_films_title

{'Universal Studios': 14.833333333333334,
 'Amblin Entertainment': 17.869565217391305,
 'Legendary Pictures': 13.411764705882353,
 'Fuji Television Network': 12.666666666666666,
 'Dentsu': 13.5,
 'Village Roadshow Pictures': 15.063492063492063,
 'Kennedy Miller Productions': 14.0,
 'Summit Entertainment': 14.804878048780488,
 'Mandeville Films': 10.833333333333334,
 'Red Wagon Entertainment': 9.833333333333334,
 'NeoReel': 14.0,
 'Lucasfilm': 41.5,
 'Truenorth Productions': 28.0,
 'Bad Robot': 18.77777777777778,
 'Universal Pictures': 14.716763005780347,
 'Original Film': 13.10344827586207,
 'Media Rights Capital': 9.142857142857142,
 'One Race Films': 15.0,
 'Regency Enterprises': 13.408163265306122,
 'Appian Way': 15.222222222222221,
 'CatchPlay': 12.0,
 'Anonymous Content': 13.454545454545455,
 'New Regency Pictures': 12.607142857142858,
 'Paramount Pictures': 17.0327868852459,
 'Skydance Productions': 19.444444444444443,
 'Twentieth Century Fox Film Corporation': 16.504587155963304

In [440]:
max(mean_films_title, key=mean_films_title.get)

'Four By Two Productions'

In [441]:
mean_films_title['Four By Two Productions']

83.0

In [442]:
cnt['Four By Two Productions']

83

In [443]:
movies_number['Four By Two Productions']

1

In [444]:
answer_ls.append(5)
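
The two counters and the division step can be folded into one grouped aggregation. This is a sketch on made-up data, assuming pandas 0.25 or later (for `explode`). The names `toy`, `title_len`, `company` and `mean_len` are hypothetical.

```python
import pandas as pd

# Made-up stand-in for data4.
toy = pd.DataFrame({
    "original_title": ["Up", "Titanic", "Her"],
    "production_companies": ["S1|S2", "S1", "S2"],
})

mean_len = (
    toy.assign(
        title_len=toy["original_title"].str.len(),
        company=toy["production_companies"].str.split("|"),
    )
    .explode("company")                 # one row per (film, studio)
    .groupby("company")["title_len"]
    .agg(["mean", "count"])             # average length and number of films
    .sort_values("mean", ascending=False)
)
print(mean_len)
```

Keeping `count` next to `mean` shows how many films each average rests on. In the output above, the top studio's average comes from a single film.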

# 32. Which studio's film titles are the longest on average, by number of words?
Answer options:
1. Universal Pictures (Universal)
2. Warner Bros
3. Jim Henson Company, The
4. Paramount Pictures
5. Four By Two Productions

In [452]:
words_in_title = data3.original_title.str.split(' ').tolist()
words_in_title

[['Jurassic', 'World'],
 ['Mad', 'Max:', 'Fury', 'Road'],
 ['Insurgent'],
 ['Star', 'Wars:', 'The', 'Force', 'Awakens'],
 ['Furious', '7'],
 ['The', 'Revenant'],
 ['Terminator', 'Genisys'],
 ['The', 'Martian'],
 ['Minions'],
 ['Inside', 'Out'],
 ['Spectre'],
 ['Jupiter', 'Ascending'],
 ['Ex', 'Machina'],
 ['Pixels'],
 ['Avengers:', 'Age', 'of', 'Ultron'],
 ['The', 'Hateful', 'Eight'],
 ['Taken', '3'],
 ['Ant-Man'],
 ['Cinderella'],
 ['The', 'Hunger', 'Games:', 'Mockingjay', '-', 'Part', '2'],
 ['Tomorrowland'],
 ['Southpaw'],
 ['San', 'Andreas'],
 ['Fifty', 'Shades', 'of', 'Grey'],
 ['The', 'Big', 'Short'],
 ['Mission:', 'Impossible', '-', 'Rogue', 'Nation'],
 ['Ted', '2'],
 ['Kingsman:', 'The', 'Secret', 'Service'],
 ['Spotlight'],
 ['Maze', 'Runner:', 'The', 'Scorch', 'Trials'],
 ['Chappie'],
 ['Pitch', 'Perfect', '2'],
 ['Bridge', 'of', 'Spies'],
 ['Goosebumps'],
 ['Room'],
 ['The', 'Good', 'Dinosaur'],
 ['Run', 'All', 'Night'],
 ['Brooklyn'],
 ['Straight', 'Outta', 'Compton'],
 ['T

In [473]:
number_of_words = []
for item in words_in_title:
    number_of_words.append(len(item))
len(number_of_words)

1890

In [477]:
data5 = data4
data5['number_of_words'] = number_of_words
data5.head(1)

Unnamed: 0,imdb_id,popularity,budget,revenue,original_title,cast,director,tagline,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,profit,month,title_lenght,number_of_words
0,tt0369610,32.985763,150000000,1513528810,Jurassic World,Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...,Colin Trevorrow,The park is open.,Twenty-two years after the events of Jurassic ...,124,Action|Adventure|Science Fiction|Thriller,Universal Studios|Amblin Entertainment|Legenda...,6/9/2015,5562,6.5,2015,1363528810,6,14,2


In [479]:
cnt = Counter()
movies_number = Counter()
# Sum word counts and count films per studio, then divide to get the mean.
for i in range(len(data5)):
    for company in data5.iloc[i].production_companies.split('|'):
        cnt[company] += data5.iloc[i].number_of_words
        movies_number[company] += 1
mean_words_in_title = {}
for item in movies_number:
    mean_words_in_title[item] = cnt[item] / movies_number[item]
mean_words_in_title

{'Universal Studios': 2.6666666666666665,
 'Amblin Entertainment': 3.0434782608695654,
 'Legendary Pictures': 2.4705882352941178,
 'Fuji Television Network': 1.6666666666666667,
 'Dentsu': 2.25,
 'Village Roadshow Pictures': 2.6825396825396823,
 'Kennedy Miller Productions': 3.0,
 'Summit Entertainment': 2.926829268292683,
 'Mandeville Films': 2.0,
 'Red Wagon Entertainment': 1.3333333333333333,
 'NeoReel': 2.6666666666666665,
 'Lucasfilm': 8.0,
 'Truenorth Productions': 5.0,
 'Bad Robot': 3.111111111111111,
 'Universal Pictures': 2.61271676300578,
 'Original Film': 2.586206896551724,
 'Media Rights Capital': 1.7857142857142858,
 'One Race Films': 3.0,
 'Regency Enterprises': 2.510204081632653,
 'Appian Way': 2.888888888888889,
 'CatchPlay': 2.0,
 'Anonymous Content': 2.5454545454545454,
 'New Regency Pictures': 2.3214285714285716,
 'Paramount Pictures': 2.8688524590163933,
 'Skydance Productions': 3.111111111111111,
 'Twentieth Century Fox Film Corporation': 3.0091743119266057,
 'Scot

In [481]:
cnt

Counter({'Universal Studios': 16,
         'Amblin Entertainment': 70,
         'Legendary Pictures': 84,
         'Fuji Television Network': 5,
         'Dentsu': 27,
         'Village Roadshow Pictures': 169,
         'Kennedy Miller Productions': 6,
         'Summit Entertainment': 120,
         'Mandeville Films': 12,
         'Red Wagon Entertainment': 8,
         'NeoReel': 8,
         'Lucasfilm': 32,
         'Truenorth Productions': 5,
         'Bad Robot': 28,
         'Universal Pictures': 452,
         'Original Film': 75,
         'Media Rights Capital': 25,
         'One Race Films': 6,
         'Regency Enterprises': 123,
         'Appian Way': 26,
         'CatchPlay': 2,
         'Anonymous Content': 28,
         'New Regency Pictures': 65,
         'Paramount Pictures': 350,
         'Skydance Productions': 28,
         'Twentieth Century Fox Film Corporation': 328,
         'Scott Free Productions': 65,
         'Mid Atlantic Films': 17,
         'International Trade

In [482]:
max(mean_words_in_title, key=mean_words_in_title.get)

'Four By Two Productions'

In [483]:
mean_words_in_title['Four By Two Productions']

12.0

In [487]:
data5[data5.production_companies.str.contains('Four By Two Productions')]['original_title']

1449    Borat: Cultural Learnings of America for Make ...
Name: original_title, dtype: object

In [488]:
answer_ls.append(5)
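
The word count per title does not need an intermediate list and a loop. This is a sketch on three titles taken from the output above; the variable `titles` is a hypothetical stand-in for `data3.original_title`.

```python
import pandas as pd

# Three titles from the dataset, used as a small stand-in column.
titles = pd.Series(["Jurassic World", "Insurgent", "Mad Max: Fury Road"])

# str.split() with no argument splits on any run of whitespace;
# str.len() then gives the length of each resulting list.
number_of_words = titles.str.split().str.len()
print(number_of_words.tolist())
```

Unlike `split(' ')`, splitting without an argument does not produce empty strings when a title contains consecutive spaces, so the two approaches can differ on such titles.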

# 33. How many different words are used in film titles? (case-insensitive)
Answer options:
1. 6540
2. 1002
3. 2461
4. 28304
5. 3432

In [465]:
words_in_titles = pd.DataFrame(data3.original_title.str.split(' ').tolist()).stack()
words_in_titles

0     0    Jurassic
      1       World
1     0         Mad
      1        Max:
      2        Fury
             ...   
1888  0     Hanging
      1          Up
1889  0         The
      1          In
      2       Crowd
Length: 5138, dtype: object

In [470]:
words = {}
for item in words_in_titles:
    word = item.lower()
    if word not in words:
        words[word] = 1
    else:
        words[word] += 1
words

{'jurassic': 2,
 'world': 9,
 'mad': 1,
 'max:': 1,
 'fury': 2,
 'road': 7,
 'insurgent': 1,
 'star': 8,
 'wars:': 3,
 'the': 599,
 'force': 1,
 'awakens': 1,
 'furious': 5,
 '7': 1,
 'revenant': 1,
 'terminator': 3,
 'genisys': 1,
 'martian': 1,
 'minions': 1,
 'inside': 2,
 'out': 8,
 'spectre': 1,
 'jupiter': 1,
 'ascending': 1,
 'ex': 1,
 'machina': 1,
 'pixels': 1,
 'avengers:': 1,
 'age': 5,
 'of': 185,
 'ultron': 1,
 'hateful': 1,
 'eight': 2,
 'taken': 3,
 '3': 16,
 'ant-man': 1,
 'cinderella': 3,
 'hunger': 4,
 'games:': 3,
 'mockingjay': 2,
 '-': 10,
 'part': 12,
 '2': 48,
 'tomorrowland': 1,
 'southpaw': 1,
 'san': 1,
 'andreas': 1,
 'fifty': 1,
 'shades': 1,
 'grey': 2,
 'big': 16,
 'short': 1,
 'mission:': 4,
 'impossible': 4,
 'rogue': 1,
 'nation': 2,
 'ted': 2,
 'kingsman:': 1,
 'secret': 7,
 'service': 1,
 'spotlight': 1,
 'maze': 2,
 'runner:': 1,
 'scorch': 1,
 'trials': 1,
 'chappie': 1,
 'pitch': 4,
 'perfect': 7,
 'bridge': 2,
 'spies': 1,
 'goosebumps': 1,
 'room

In [472]:
len(words)

2462
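
The same count can be sketched with pandas string methods alone. The data is made up and the names `titles`, `words` and `unique_count` are hypothetical.

```python
import pandas as pd

# Made-up stand-in for data3.original_title.
titles = pd.Series(["The Martian", "the Big Short", "Big Hero 6"])

# Lower-case, split on whitespace, one word per row.
words = titles.str.lower().str.split().explode()

# Number of distinct words.
unique_count = words.nunique()
print(unique_count)
```

As in the dictionary above, punctuation stays attached to words ('max:' and 'wars:' are separate keys), so the exact count depends on how titles are tokenized.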

# 34. Which films are in the top 1 percent by rating?
Answer options:
1. Inside Out, Gone Girl, 12 Years a Slave
2. BloodRayne, The Adventures of Rocky & Bullwinkle
3. The Lord of the Rings: The Return of the King
4. 300, Lucky Number Slevin

In [525]:
data5[data5.vote_average >= data5.vote_average.nlargest(len(data5)//100).min()].original_title.tolist()

['Inside Out',
 'Spotlight',
 'Room',
 'Interstellar',
 'Guardians of the Galaxy',
 'Big Hero 6',
 'The Imitation Game',
 'Gone Girl',
 'The Grand Budapest Hotel',
 'The Theory of Everything',
 'The Fault in Our Stars',
 'Mr. Nobody',
 '3 Idiots',
 'Inception',
 'The Lord of the Rings: The Fellowship of the Ring',
 'The Dark Knight',
 'The Lord of the Rings: The Two Towers',
 'The Pianist',
 'The Lord of the Rings: The Return of the King',
 'The Wolf of Wall Street',
 'Her',
 '12 Years a Slave',
 'Prisoners',
 'Dallas Buyers Club',
 'The Prestige',
 'Eternal Sunshine of the Spotless Mind',
 'There Will Be Blood',
 'Memento']

In [528]:
data5[data5['vote_average'] > data5.quantile(0.99, numeric_only=True)['vote_average']].original_title.tolist()

['Inside Out',
 'Room',
 'Interstellar',
 'Guardians of the Galaxy',
 'The Imitation Game',
 'Gone Girl',
 'The Grand Budapest Hotel',
 'Inception',
 'The Dark Knight',
 'The Pianist',
 'The Lord of the Rings: The Return of the King',
 'The Wolf of Wall Street',
 '12 Years a Slave',
 'Memento']

In [508]:
data5.iloc[6]

imdb_id                                                         tt1340138
popularity                                                        8.65436
budget                                                          155000000
revenue                                                         440603537
original_title                                         Terminator Genisys
cast                    Arnold Schwarzenegger|Jason Clarke|Emilia Clar...
director                                                      Alan Taylor
tagline                                                  Reset the future
overview                The year is 2029. John Connor, leader of the r...
runtime                                                               125
genres                          Science Fiction|Action|Thriller|Adventure
production_companies              Paramount Pictures|Skydance Productions
release_date                                                    6/23/2015
vote_count                            

In [529]:
answer_ls.append(1)
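
The two cells above return lists of different length (28 and 14 titles). The first uses `>=` against the lowest of the `nlargest` values, so every film tied at that rating is included. The second uses a strict `>` against the 0.99 quantile. Below is a small sketch of the quantile variant on made-up ratings; the names `toy`, `threshold` and `top` are hypothetical.

```python
import pandas as pd

# 200 made-up films with ratings 0.00, 0.05, ..., 9.95.
toy = pd.DataFrame({
    "original_title": [f"Film {i}" for i in range(200)],
    "vote_average": [i / 20 for i in range(200)],
})

# Rating below which 99% of the films fall (linear interpolation by default).
threshold = toy["vote_average"].quantile(0.99)

# Strictly above the threshold: the top 1%, here 2 films out of 200.
top = toy.loc[toy["vote_average"] > threshold, "original_title"].tolist()
print(threshold, top)
```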

# 35. Which actors appear together in the same film most often?
Answer options:
1. Johnny Depp & Helena Bonham Carter
2. Hugh Jackman & Ian McKellen
3. Vin Diesel & Paul Walker
4. Adam Sandler & Kevin James
5. Daniel Radcliffe & Rupert Grint

In [537]:
def pairs_of_items(string, pairs_list, separator):
    """Count every pair of items from a separator-joined string in pairs_list."""
    if separator not in string:
        return pairs_list
    temp = string.split(separator)
    for i in range(len(temp)):
        for j in range(i + 1, len(temp)):
            pair = temp[i] + ' & ' + temp[j]
            if pair in pairs_list:
                pairs_list[pair] += 1
            else:
                pairs_list[pair] = 1
    return pairs_list

actors = 'Arnold Schwarzenegger|Jason Clarke|Emilia Clar'
pairs = {}
pairs = pairs_of_items(actors, pairs, '|')
pairs

{'Arnold Schwarzenegger & Jason Clarke': 1,
 'Arnold Schwarzenegger & Emilia Clar': 1,
 'Jason Clarke & Emilia Clar': 1}

In [534]:
a = 'Arnold Schwarzenegger|Jason Clarke|Emilia Clar'.split('|')
a

['Arnold Schwarzenegger', 'Jason Clarke', 'Emilia Clar']

In [540]:
pairs = {}
for item in data5.cast:
    pairs = pairs_of_items(item, pairs, '|')
pairs

{'Chris Pratt & Bryce Dallas Howard': 1,
 'Chris Pratt & Irrfan Khan': 1,
 "Chris Pratt & Vincent D'Onofrio": 1,
 'Chris Pratt & Nick Robinson': 1,
 'Bryce Dallas Howard & Irrfan Khan': 1,
 "Bryce Dallas Howard & Vincent D'Onofrio": 1,
 'Bryce Dallas Howard & Nick Robinson': 1,
 "Irrfan Khan & Vincent D'Onofrio": 1,
 'Irrfan Khan & Nick Robinson': 1,
 "Vincent D'Onofrio & Nick Robinson": 1,
 'Tom Hardy & Charlize Theron': 1,
 'Tom Hardy & Hugh Keays-Byrne': 1,
 'Tom Hardy & Nicholas Hoult': 1,
 'Tom Hardy & Josh Helman': 1,
 'Charlize Theron & Hugh Keays-Byrne': 1,
 'Charlize Theron & Nicholas Hoult': 1,
 'Charlize Theron & Josh Helman': 1,
 'Hugh Keays-Byrne & Nicholas Hoult': 1,
 'Hugh Keays-Byrne & Josh Helman': 1,
 'Nicholas Hoult & Josh Helman': 1,
 'Shailene Woodley & Theo James': 2,
 'Shailene Woodley & Kate Winslet': 2,
 'Shailene Woodley & Ansel Elgort': 2,
 'Shailene Woodley & Miles Teller': 2,
 'Theo James & Kate Winslet': 2,
 'Theo James & Ansel Elgort': 1,
 'Theo James & M

In [544]:
max(pairs,key = pairs.get)

'Daniel Radcliffe & Rupert Grint'

In [545]:
pairs['Daniel Radcliffe & Rupert Grint']

8

In [546]:
answer_ls.append(5)
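
`pairs_of_items` builds each key in cast order, so the same two actors listed in the opposite order in another film would be counted under a different key. Below is a sketch of an order-independent count using only the standard library. The data is made up and the names `casts` and `pair_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Made-up cast strings in the same '|'-separated format.
casts = ["A|B|C", "B|A", "C"]

pair_counts = Counter()
for cast in casts:
    # Sorting makes ("A", "B") the key regardless of billing order.
    actors = sorted(cast.split("|"))
    # combinations yields every unordered pair exactly once.
    pair_counts.update(combinations(actors, 2))

print(pair_counts.most_common(1))
```

A cast with a single name yields no pairs, which matches the early return in `pairs_of_items`.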

# 36. Which director is most likely to release a profitable film? (5 points)
(Which director has the highest percentage of films whose revenue exceeds the budget?)

Answer options:
1. Quentin Tarantino
2. Steven Soderbergh
3. Robert Rodriguez
4. Christopher Nolan
5. Clint Eastwood

In [547]:
data6 = data5[data5.profit > 0]
data6.profit.min()

2000000

In [552]:
filmography = pd.DataFrame(data5.director.str.split('|').tolist()).stack().value_counts()
filmography

Steven Soderbergh    13
Ridley Scott         12
Clint Eastwood       12
Robert Rodriguez     11
Steven Spielberg     10
                     ..
Peter Sollett         1
Callan Brunker        1
Christian Duguay      1
R.J. Cutler           1
Mimi Leder            1
Length: 998, dtype: int64

In [553]:
filmography_profitable = pd.DataFrame(data6.director.str.split('|').tolist()).stack().value_counts()
filmography_profitable

Ridley Scott         12
Steven Soderbergh    10
Clint Eastwood       10
Steven Spielberg     10
Tim Burton            9
                     ..
Steve Purcell         1
Gavin O'Connor        1
Shane Black           1
Brandon Camp          1
James Mather          1
Length: 814, dtype: int64

In [561]:
# Share of profitable films for every director.
profitability = {}
for item in filmography.index:
    if item not in filmography_profitable.index:
        # The director has no profitable films at all.
        profitability[item] = 0
        continue
    profitability[item] = filmography_profitable[item] / filmography[item]
profitability

{'Steven Soderbergh': 0.7692307692307693,
 'Ridley Scott': 1.0,
 'Clint Eastwood': 0.8333333333333334,
 'Robert Rodriguez': 0.7272727272727273,
 'Steven Spielberg': 1.0,
 'Shawn Levy': 0.9,
 'Peter Farrelly': 0.8,
 'Tim Burton': 1.0,
 'Bobby Farrelly': 0.7777777777777778,
 'Ron Howard': 0.75,
 'Antoine Fuqua': 1.0,
 'Michael Bay': 1.0,
 'Peter Jackson': 1.0,
 'M. Night Shyamalan': 0.875,
 'Christopher Nolan': 1.0,
 'Adam Shankman': 0.875,
 'Brett Ratner': 1.0,
 'Gore Verbinski': 0.75,
 'Todd Phillips': 0.875,
 'Dennis Dugan': 0.8571428571428571,
 'Quentin Tarantino': 0.8571428571428571,
 'Lasse Hallström': 0.7142857142857143,
 'Marc Forster': 0.7142857142857143,
 'Paul W.S. Anderson': 0.8571428571428571,
 'Steve Carr': 0.8333333333333334,
 'Tim Story': 1.0,
 'Peter Segal': 1.0,
 'Peter Berg': 0.8333333333333334,
 'Louis Leterrier': 1.0,
 'Andy Fickman': 1.0,
 'Francis Lawrence': 1.0,
 'Raja Gosnell': 1.0,
 'Rob Cohen': 0.6666666666666666,
 'Danny Boyle': 0.8333333333333334,
 'Zack Sny

In [562]:
max(profitability,key = profitability.get)

'Ridley Scott'

In [563]:
profitability['Ridley Scott']

1.0

In [574]:
# Directors listed in the answer options.
directors = ['Quentin Tarantino', 'Steven Soderbergh', 'Robert Rodriguez', 'Christopher Nolan', 'Clint Eastwood']

In [581]:
for director, value in profitability.items():
    if director in directors:
        print(director, value)
        #print (filmography[director])
        #print (filmography_profitable[director])
        #print (filmography_profitable[director] / filmography[director])

Steven Soderbergh 0.7692307692307693
Clint Eastwood 0.8333333333333334
Robert Rodriguez 0.7272727272727273
Christopher Nolan 1.0
Quentin Tarantino 0.8571428571428571


In [582]:
answer_ls.append(4)

# Submission

In [583]:
len(answer_ls)

35

In [584]:
pd.DataFrame({'Id':range(1,len(answer_ls)+1), 'Answer':answer_ls}, columns=['Id', 'Answer'])

   Id  Answer
0   1       4
1   2       2
2   3       3
3   4       2
4   5       1
5   6       5
6   7       2
7   8       1
8   9       4
9  10       5
