# Final Project Report - Explore the Movie Sales
#### Name: Yutang Xiong, DUID: 873302263, Emails: yutang.xiong@du.edu / quciet@gmail.com

## Context

It's really hard to define a good movie, since everyone has their own taste, and which movies are the best of all time is a never-ending debate. This project does not try to answer that question. Instead, it aims to find insights about movie sales. As a side effect, it may also point to some "great" movies, where "great" means a box-office draw.

To explore the movie market, we first need to decide which markets are the most informative, rather than only looking at the overall situation. The United States, home of the famous Hollywood, is an obvious choice. China, the second-biggest economy in the world, is another strong candidate. To justify this choice, we need some background.

A quick search online turns up some useful context:
- China's movie sales grew from a total of about 1,000 million CNY in 2001 to 55,900 million CNY in 2017. That is explosive growth, and it made China the second-biggest movie market in the world. First place? North America!
- In the first quarter of 2018, China's box office surpassed North America's for the first time in history.

Like other booming markets, China's movie market has become a big pie that everyone wants a slice of.

With this context in mind, this project focuses on movie sales in these two markets, as well as on the worldwide box office.

## Literature Review

Many data scientists have done similar work on movie sales. However, most of that work focuses on the scraping itself, because its main purpose is to teach data-scraping skills to interested readers. Moreover, almost all of it considers only a domestic market, and sometimes not sales at all. Chinese data scientists have posted numerous blog posts on scraping movie ratings and comments, while data scientists in North America rarely look at the Chinese movie market.

For example, https://www.boxofficemojo.com/ carries rich information about movie sales worldwide, but it provides only box office figures. A good strategy could be to combine its box office data with the more detailed movie information on https://www.imdb.com.

The following Statista pages summarize box office by genre for North America and China:
- https://www.statista.com/statistics/188658/movie-genres-in-north-america-by-box-office-revenue-since-1995/
- https://www.statista.com/statistics/645169/china-box-office-revenue-genre/

However, the statistics for China are out of date, and, as described above, the Chinese movie market changes very rapidly. Still, these summaries are a useful starting point for what we can do with scraped movie sales data.

After scraping the data, this project first explores it to reveal less obvious patterns. It then compares box office between the two big markets, which may contribute to research in social science or business analysis.

## Data Source & Scraping

Many websites provide box office figures and other movie information. However, they use various methods to prevent scraping (a common scene for data scrapers). This project therefore uses Mtime, which is relatively easy to scrape compared with the alternatives. Example pages:
- Box office lists: http://movie.mtime.com/boxoffice/
- Detailed information for one movie: http://movie.mtime.com/229733/
- Full credits for the same movie: http://movie.mtime.com/229733/fullcredits.html

These links suggest a straightforward top-down strategy for scraping the data:
- Top layer: get the big picture, i.e. the movies in the top 100 box office lists for China, North America, and the world
- Second layer: the data from the top layer embeds movie ids that link to each movie's information page; visit those pages
- Third layer: from each movie page, collect additional information, such as awards and actors, and append it to the movie's record
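The three layers above can be sketched as a small crawler. This is a minimal sketch, not the notebook's actual code. It assumes the top-list JSON is an array of objects carrying a `movieId` field; that name is hypothetical, and Mtime's real field names may differ. The helpers `detail_urls` and `crawl` are also hypothetical. `crawl` takes a `fetch` function as a parameter so the traversal can be tried offline. In a live run, `fetch` could wrap `requests.get(url, timeout=10).text`, with a non-zero `delay` between requests.

```python
import json
import time

BASE_URL = "http://movie.mtime.com"

def detail_urls(movie_id):
    """Second and third layers: a movie's own page and its full-credits page."""
    return f"{BASE_URL}/{movie_id}/", f"{BASE_URL}/{movie_id}/fullcredits.html"

def crawl(top_list_json, fetch, delay=0.0):
    """Walk the three layers for every movie in one box office top list.

    top_list_json: JSON text of the top list (first layer); the "movieId"
        key is an assumed field name, not a documented Mtime schema.
    fetch: callable url -> HTML text, injected so this can run offline.
    delay: seconds to sleep between movies, to be polite to the server.
    """
    records = []
    for movie in json.loads(top_list_json):
        page_url, credits_url = detail_urls(movie["movieId"])
        record = dict(movie)
        record["page_html"] = fetch(page_url)        # second layer
        record["credits_html"] = fetch(credits_url)  # third layer
        records.append(record)
        time.sleep(delay)
    return records
```

Injecting `fetch` keeps the traversal logic separate from networking. The raw HTML can then be parsed with Beautiful Soup and regular expressions in a later pass.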

The first layer is the easiest part, since the website serves the box office lists as JSON. Using Requests and Beautiful Soup, we can simulate the call the page makes and get the JSON data easily (see the Python code in the scraping part of "Final_Project_Codes_Combined.ipynb"). The second and third layers are trickier because the website does not organize them well:
- Some titles and their contents are not separated into different HTML tags. For example, the title "Actor" and its values, such as "Vin Diesel", are written at the same level
- There are many missing values, since a movie may have no listed actors, awards, companies, and so on

Thus, regular expressions (Python's re module) help a lot in parsing the data into a cleaner format.
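The sketch below illustrates the regular-expression step. It splits a field whose title and values share one text string, such as the ProductionCompany and DistributorCompany values in the raw data (`制作公司 1. ... 2. ...`). The helper name `split_numbered_field` is hypothetical. The pattern assumes entries are numbered `1. `, `2. `, ... and that an optional trailing region tag looks like ` [中国]`. Other fields, such as the actor lists, would need their own patterns.

```python
import re

def split_numbered_field(text, title):
    """Split a string like '制作公司 1. A 2. B' into ['A', 'B'].

    Missing values (NaN, None) come back as an empty list.
    """
    if not isinstance(text, str):
        return []
    # drop the leading title, e.g. '制作公司' (production company)
    body = text.replace(title, "", 1).strip()
    # each entry is introduced by its number: '1. ', '2. ', ...
    entries = re.split(r"\s*\d+\.\s+", body)
    # remove a trailing region tag such as ' [中国]' and skip empty pieces
    return [re.sub(r"\s*\[[^\]]*\]$", "", e).strip() for e in entries if e.strip()]
```

For example, `split_numbered_field("制作公司 1. 北京登峰国际文化传播有限公司 2. 嘲风影业（北京）有限公司", "制作公司")` returns the two company names as a list.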

## Metadata

First, let's take a glance at the scraped raw data, which is stored and provided as "raw_final_data.csv".

```python
# import modules
import pandas as pd
import requests as rq
import numpy as np
from bs4 import BeautifulSoup as bsp
import time
import re
import ast
```

```python
# read the raw data and drop the saved index column
dt = pd.read_csv("data/raw_final_data.csv")
dt = dt.drop(dt.columns[0], axis=1)
dt.head(2)
```

| | Name | NameEng | MtimePage | ReleaseDate | BoxOffice | BoxOfficeDetail | BoxOfficeRankCN | BoxOfficeRankNA | BoxOfficeRankG | MtimeRating | NumberofReviewers | BoxOfficeGlobal | Runtime | Genre | FilmFormat | ActorAndStaff | AwardSummary | AwardDetail | ProductionCompany | DistributorCompany |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 战狼2 | Wolf Warriors Ⅱ | http://movie.mtime.com/229733 | 2017年07月27日 | 56.82亿CNY | 首日1.02亿\|首周9.97亿\|连冠4周 | 1.0 | | 61.0 | 7.4 | 17010 | 8.70亿USD | 123分钟 | 动作/战争 | 2D/3D/IMAX/中国巨幕 | {'演员 Actor': ['吴京 Jing Wu', '弗兰克·格里罗 Frank Gri... | ['本片共获香港金像奖等重要奖项1次，提名6次', ['香港电影金像奖 (2018；第37届... | {'香港电影金像奖': ['(2018；第37届) 提名：1', '提名 #最佳两岸华语电... | 制作公司 1. 北京登峰国际文化传播有限公司 2. 嘲风影业（北京）有限公司 ... | 发行公司 1. 北京京西文化旅游股份有限公司 2. 北京聚合影联文化传媒有限公... |
| 1 | 流浪地球 | The Wandering Earth | http://movie.mtime.com/218707 | 2019年02月05日 | 46.20亿CNY | 首日1.88亿\|首周19.98亿\|连冠2周 | 2.0 | | | 7.9 | 12103 | | 125分钟 | 冒险/科幻/剧情 | 2D/3D/IMAX3D/中国巨幕 | {'演员 Actor': ['吴京 Jing Wu', '屈楚萧 Chu Xiao Qu',... | | | 制作公司 1. 中国电影股份有限公司北京电影制片分公司 2. 霍尔果斯登峰国际... | 发行公司 1. 中国电影股份有限公司 [中国] 2. 北京京西文化旅游股份有限公... |


```python
dt.shape
```

```
(300, 20)
```

```python
# count missing values per column
dt.isna().sum()
```

```
Name                    0
NameEng                 1
MtimePage               0
ReleaseDate             0
BoxOffice               0
BoxOfficeDetail       124
BoxOfficeRankCN       160
BoxOfficeRankNA       110
BoxOfficeRankG        102
MtimeRating             0
NumberofReviewers       0
BoxOfficeGlobal       102
Runtime                 0
Genre                   0
FilmFormat             76
ActorAndStaff           0
AwardSummary           63
AwardDetail            63
ProductionCompany       0
DistributorCompany      6
dtype: int64
```

The raw input contains 300 rows and 20 columns. The cleaning process consists of the following steps (again, see the detailed Python code in "Final_Project_Codes_Combined.ipynb"):

- Make sure key information, such as movie names, release dates, box office, genres, and Mtime ratings, has been scraped correctly, ideally with no missing values or mis-scraped entries
    + The only movie missing its English name is "2012"
    + The movie "Meet the Focker2" had errors in its runtime and genre, which have been fixed manually
    + Missing values in the box office ranks have been replaced by 0s for ease of data handling
    + Missing values in the global box office have been filled with the corresponding domestic box office
- Text cleaning, information extraction, and date type conversion
    + Remove extra text from columns such as ReleaseDate and Runtime, and remove the units from box office figures
    + Convert CNY to USD, and convert the Chinese unit 亿 (0.1 billion) into 100 million, so that box office is expressed in millions of USD
    + Extract the year and month from the release date
    + Get actor names, director names...
- Create various subsets of the data for further analysis and visualization
    + Calculate genre frequencies and the average sales for each genre
    + Do the same for month, actors, ...
    + Actors were stored together with staff members, in separate lists inside a dictionary. An apply function is used to extract them and to build a separate box office dataset for actors
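The text-cleaning step can be sketched as follows. This is a minimal sketch, not the notebook's own code. It assumes raw box office strings look like `56.82亿CNY` or `8.70亿USD` (as in the raw rows above), with 亿 = 10^8 and 万 = 10^4. It also assumes a fixed rate of 0.15 USD per CNY. That rate is inferred from the data rather than taken from the code: 56.82亿 CNY is 5,682 million CNY, the cleaned value is 852.3 million USD, and 852.3 / 5,682 = 0.15. The function names `box_office_musd` and `clean_basic` are hypothetical.

```python
import re
import pandas as pd

# Chinese magnitude units expressed in millions: 亿 = 10^8, 万 = 10^4
UNIT_IN_MILLIONS = {"亿": 100.0, "万": 0.01}
# Assumed fixed exchange rate (USD per CNY), inferred from the cleaned data
CNY_TO_USD = 0.15

def box_office_musd(text):
    """'56.82亿CNY' -> 852.3, '8.70亿USD' -> 870.0 (millions of USD)."""
    if not isinstance(text, str):
        return float("nan")
    m = re.fullmatch(r"([\d.]+)(亿|万)(CNY|USD)", text.strip())
    if m is None:
        return float("nan")
    value, unit, currency = m.groups()
    millions = float(value) * UNIT_IN_MILLIONS[unit]
    if currency == "CNY":
        millions *= CNY_TO_USD
    return round(millions, 2)

def clean_basic(df):
    """Return a copy with numeric box office, parsed dates and runtimes."""
    out = df.copy()
    out["BoxOffice"] = out["BoxOffice"].map(box_office_musd)
    out["BoxOfficeGlobal"] = out["BoxOfficeGlobal"].map(box_office_musd)
    # missing global box office falls back to the domestic figure
    out["BoxOfficeGlobal"] = out["BoxOfficeGlobal"].fillna(out["BoxOffice"])
    # '2017年07月27日' -> '2017-07-27' -> datetime
    iso = out["ReleaseDate"].str.replace(
        r"(\d{4})年(\d{2})月(\d{2})日", r"\1-\2-\3", regex=True)
    out["ReleaseDate"] = pd.to_datetime(iso)
    out["Year"] = out["ReleaseDate"].dt.year
    out["Month"] = out["ReleaseDate"].dt.month
    # '123分钟' (123 minutes) -> 123
    out["Runtime"] = out["Runtime"].str.extract(r"(\d+)", expand=False).astype(int)
    return out
```

Rewriting the date into ISO form before calling `pd.to_datetime` avoids depending on how a format string handles the Chinese date characters.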

## Data Summary with Research Questions & Visualizations

### Question 1: Big hits and shining movie stars?

The top-selling movies are easy to spot directly in the data, so we can go into more detail, such as which actors bring in the biggest box office. Since the website already lists actors in decreasing order of importance, I take the first 4 actors as each movie's main cast. A quick glance at the first two rows of the sorted data below already tells us who is the most popular movie star in China right now.

```python
# read the actor-level box office data for China
dt_cn_actor = pd.read_csv("data/china_actor.csv")
dt_cn_actor = dt_cn_actor.drop(dt_cn_actor.columns[0], axis=1)
dt_cn_actor.head(2)
```

| | Name | NameEng | MtimePage | ReleaseDate | BoxOffice | Actor |
|---|---|---|---|---|---|---|
| 0 | 战狼2 | Wolf Warriors Ⅱ | http://movie.mtime.com/229733 | 2017-07-27 | 852.3 | 吴京 Jing Wu |
| 1 | 流浪地球 | The Wandering Earth | http://movie.mtime.com/218707 | 2019-02-05 | 693.0 | 吴京 Jing Wu |
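The file china_actor.csv already holds the actor-level data. The sketch below shows one way such a table can be derived from the raw ActorAndStaff column. The helper `main_cast` is hypothetical, and the notebook may have done this differently. It parses the stored dictionary string with `ast.literal_eval`, keeps the first `n` names under the `'演员 Actor'` key, and uses `DataFrame.explode` (pandas 0.25 or later) to produce one row per movie and actor.

```python
import ast
import pandas as pd

def main_cast(df, n=4):
    """One row per (movie, actor) for the first n credited actors.

    ActorAndStaff holds the str() of a dict such as
    {'演员 Actor': ['吴京 Jing Wu', ...], ...}; movies without a parsable
    dict simply produce no rows.
    """
    def first_actors(cell):
        if not isinstance(cell, str):
            return []
        return ast.literal_eval(cell).get("演员 Actor", [])[:n]

    long = df[["Name", "NameEng", "BoxOffice"]].copy()
    long["Actor"] = df["ActorAndStaff"].map(first_actors)
    # explode: one row per list element; empty lists become NaN and are dropped
    return long.explode("Actor").dropna(subset=["Actor"]).reset_index(drop=True)
```

Summing `BoxOffice` per actor on the result, as in the next cell, then credits each movie's full box office to each of its main cast members.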


Now we can see the top 10 actors in terms of total sales and number of appearances. Wow, Iron Man (Robert Downey Jr.) is really doing a great job here in China! Another clear pattern is that Jing Wu tops the sales ranking with only a few movies (he is not in the top 10 by count). Those few movies must be real record breakers in China!

```python
# total box office per actor, top 10
# (selecting BoxOffice keeps newer pandas from also summing text columns)
(dt_cn_actor.groupby("Actor")[["BoxOffice"]].sum()
            .sort_values(by="BoxOffice", ascending=False)
            .head(10))
```

| Actor | BoxOffice |
|---|---|
| 吴京 Jing Wu | 1545.3 |
| 徐峥 Zheng Xu | 1535.25 |
| 沈腾 Teng Shen | 1514.1 |
| 黄渤 Bo Huang | 1335.75 |
| 王宝强 Baoqiang Wang | 1140.0 |
| 白百何 Fay Bai | 1081.05 |
| 井柏然 Boran Jing | 1056.3 |
| 小罗伯特·唐尼 Robert Downey Jr. | 996.6 |
| 范·迪塞尔 Vin Diesel | 937.65 |
| 杰森·斯坦森 Jason Statham | 926.1 |


```python
# number of top-100 movies per actor, top 10
dt_cn_actor['Actor'].value_counts()[:10]
```

```
徐峥 Zheng Xu                  6
黄渤 Bo Huang                  6
沈腾 Teng Shen                 5
白百何 Fay Bai                  5
王宝强 Baoqiang Wang            5
小罗伯特·唐尼 Robert Downey Jr.    5
成龙 Jackie Chan               5
克里斯·埃文斯 Chris Evans          4
郭富城 Aaron Kwok               4
周润发 Yun-Fat Chow             4
Name: Actor, dtype: int64
```

We can do the same for the North American market. It is no surprise that Iron Man is so welcome worldwide. This also shows how successful Marvel Studios has been in recent years.

```python
# read the actor-level box office data for North America
dt_na_actor = pd.read_csv("data/na_actor.csv")
dt_na_actor = dt_na_actor.drop(dt_na_actor.columns[0], axis=1)
# total box office per actor, top 10
(dt_na_actor.groupby("Actor")[["BoxOffice"]].sum()
            .sort_values(by="BoxOffice", ascending=False)
            .head(10))
```

| Actor | BoxOffice |
|---|---|
| 小罗伯特·唐尼 Robert Downey Jr. | 3542.0 |
| 艾玛·沃森 Emma Watson | 2383.0 |
| 哈里森·福特 Harrison Ford | 2314.0 |
| 克里斯·埃文斯 Chris Evans | 2169.0 |
| 克里斯·海姆斯沃斯 Chris Hemsworth | 2076.0 |
| 斯嘉丽·约翰逊 Scarlett Johansson | 2072.0 |
| 丹尼尔·雷德克里夫 Daniel Radcliffe | 1879.0 |
| 鲁伯特·格林特 Rupert Grint | 1879.0 |
| 克里斯·帕拉特 Chris Pratt | 1793.0 |
| 奥斯卡·伊萨克 Oscar Isaac | 1557.0 |


```python
# number of top-100 movies per actor, top 10
dt_na_actor['Actor'].value_counts()[:10]
```

```
小罗伯特·唐尼 Robert Downey Jr.     8
艾玛·沃森 Emma Watson             7
丹尼尔·雷德克里夫 Daniel Radcliffe    6
鲁伯特·格林特 Rupert Grint          6
哈里森·福特 Harrison Ford          5
斯嘉丽·约翰逊 Scarlett Johansson    5
克里斯·海姆斯沃斯 Chris Hemsworth     4
约翰尼·德普 Johnny Depp            4
詹妮弗·劳伦斯 Jennifer Lawrence     4
布莱德利·库珀 Bradley Cooper        4
Name: Actor, dtype: int64
```

From these two lists, you might already sense that Chinese movies have not yet broken into the North American market, while Hollywood is famous all over the world. Comparing the top 10 sales in China and North America confirms this.

+ Of China's top 10, only one movie is in North America's top 100, and that one, Fast & Furious 7, is an American production.
+ In contrast, 5 of North America's top 10 are in China's top 100, and all of them are American productions.

```python
# this time, use the cleaned data
dt = pd.read_csv("data/Cleaned_data.csv")
dt = dt.drop(dt.columns[0], axis=1)
dt_cn = dt.iloc[0:100]    # China top 100
dt_na = dt.iloc[200:300]  # North America top 100
# China's top 10 that are in North America's top 100, and vice versa
print(set(dt_cn[:10].NameEng.values) & set(dt_na.NameEng.values))
print(set(dt_na[:10].NameEng.values) & set(dt_cn.NameEng.values))
```

```
{'Fast & Furious 7'}
{'Jurassic World', 'Avengers: Infinity War', 'Star Wars: The Force Awakens', 'Titanic', 'Avatar'}
```


### Question 2: Does release time matter?
- Another way to look at the difference between the markets
- Visualizations focusing on release date

Scatter plots of box office against release year (top rows: China; bottom rows: North America):
<img src='img/Scatter_year.png'>
- Scatter plots may not be very informative, but they are still worth a look
- The monthly data suggest that one or two quarters of the year are favored for releasing movies; we take a more detailed look below
- The yearly data make it obvious that North America and China went through the same growth in box office (from 300 million USD to 600 or 700 million USD) at very different speeds. North America took from 2000 to around 2015, while China took only from 2015 to now. This again shows what a monster China's movie market is
- Meanwhile, it also shows how long North America's movie market has been well developed: it already had movies grossing over 300 million USD as far back as 1980
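The year comparison can also be quantified directly instead of being read off the scatter plot. One option is to find the first release year in which any movie in a market reached a given box office level. This is a sketch. The `Year` column is assumed to be the year extracted during cleaning, and `first_year_reaching` is a hypothetical helper.

```python
import pandas as pd

def first_year_reaching(df, threshold):
    """Earliest release year with a movie grossing at least `threshold`.

    Expects a Year column and BoxOffice in millions of USD; returns None
    when no movie in df reaches the threshold.
    """
    years = df.loc[df["BoxOffice"] >= threshold, "Year"]
    return int(years.min()) if len(years) else None
```

For example, comparing `first_year_reaching(dt_na, 300)` with `first_year_reaching(dt_na, 600)`, and doing the same for `dt_cn`, gives the length of each market's climb.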

Bar plots of box office by release month (left columns: China; right columns: North America):
<img src='img/Bar_month.png'>
+ The frequency bar plot shows that movies released during the summer or winter breaks benefit the most, in both China and the US.
+ A cultural difference is that box office in China (total gross, not the number of releases) concentrates around February, even though relatively few movies are released then. This is because Chinese New Year follows the lunar calendar and usually falls in February. Releasing then is a trade-off: the market is extremely competitive, but if you are confident about your movie's quality, you have a chance to set a record.
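The monthly counts and totals behind these bar plots come from a groupby on the release month. A sketch, assuming ReleaseDate is in the cleaned `YYYY-MM-DD` form and BoxOffice is in millions of USD; `monthly_summary` is a hypothetical helper.

```python
import pandas as pd

def monthly_summary(df):
    """Number of releases and total box office per release month (1-12)."""
    months = pd.to_datetime(df["ReleaseDate"]).dt.month
    summary = df.groupby(months)["BoxOffice"].agg(["count", "sum"])
    # months with no releases show up as zeros instead of disappearing
    summary = summary.reindex(range(1, 13), fill_value=0)
    summary.index.name = "Month"
    return summary
```

Running it separately on `dt_cn` and `dt_na` gives the two sides of the comparison. The "count" column corresponds to the frequency plot, and "sum" to the total gross, where February stands out for China.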

### Question 3: Popular genres?
- Again, one more way to look at the difference between the markets
- Visualizations focusing on different movie categories

Let's look at the North American market first. Bar and box plots of sales by movie genre (North America):
<img src='img/na_genres.png'>

- Popular genres (in decreasing order): 冒险 Adventure, 动作 Action, 奇幻 Fantasy, 科幻 Science Fiction, 喜剧 Comedy
- The bar plot shows fairly evenly spread average sales. Combined with the box plots, however, a different picture emerges:
    + Adventure, Action, and Fantasy movies have the highest counts and total sales, but their totals are driven by a few outliers.
    + 喜剧 Comedy movies perform very steadily: high counts and good averages, but no standout hits.

This suggests that a good-quality comedy reliably sells well. An Action, Fantasy, or Adventure movie, on the other hand, has to be really good to earn a lot; otherwise its sales fall into a low tier. On top of that, an Action/Adventure/Fantasy movie needs a much bigger investment than a comedy.
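The per-genre counts, averages, and totals can be sketched as follows. The sketch assumes the Genre column holds '/'-separated labels such as 动作/冒险/奇幻, so a movie counts once toward each of its genres. `Series.str.get_dummies` produces one indicator column per genre, similar to the boolean genre columns (爱情, 喜剧, ...) used with the cleaned data below. The helper `genre_stats` is hypothetical and is not the notebook's exact code.

```python
import pandas as pd

def genre_stats(df):
    """Count, mean and total box office per genre, largest total first."""
    # one True/False column per genre label found in the '/'-separated strings
    dummies = df["Genre"].str.get_dummies(sep="/").astype(bool)
    rows = {}
    for genre in dummies.columns:
        sales = df.loc[dummies[genre], "BoxOffice"]
        rows[genre] = {"count": len(sales), "mean": sales.mean(), "total": sales.sum()}
    return pd.DataFrame.from_dict(rows, orient="index").sort_values("total", ascending=False)
```

Because a movie contributes to every genre it lists, the genre totals add up to more than the overall box office. This is worth keeping in mind when comparing the bar plots across genres.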

Now let's switch to the Chinese market. Bar and box plots of sales by movie genre (China):
<img src='img/cn_genres.png'>
- Popular genres are very consistent with those in North America
- Surprisingly, Chinese audiences are not so fond of 家庭 Family movies
- Instead, 爱情 Romance movies seem to have a bigger market in China than in North America
    + Blending Romance and Comedy may be hard to pull off, but it can be a good way to win the audience's favor. In fact, it is a very common combination: 7 movies in China's top 100 use it
- As in North America, Adventure, Action, and Fantasy movies have the highest counts and total sales, but their totals are driven by a few outliers

```python
# movies in China's top 100 tagged both Romance (爱情) and Comedy (喜剧)
dt_cn[(dt_cn["爱情"] == True) & (dt_cn["喜剧"] == True)].shape[0]
```

```
7
```

- Another "outlier" in the box plot is 战争 War
    + A closer look shows how much Jing Wu's Wolf Warriors II dominates the market. It even makes the global top 100 almost entirely on its domestic box office (852.3 of its 870.0 million USD)!

```python
# War (战争) movies in China's top 100, first 12 columns
dt_cn[(dt_cn["战争"] == True)].iloc[:, 0:12]
```

| | Name | NameEng | MtimePage | ReleaseDate | BoxOffice | BoxOfficeDetail | BoxOfficeRankCN | BoxOfficeRankNA | BoxOfficeRankG | MtimeRating | NumberofReviewers | BoxOfficeGlobal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 战狼2 | Wolf Warriors Ⅱ | http://movie.mtime.com/229733 | 2017-07-27 | 852.3 | 首日1.02亿\|首周9.97亿\|连冠4周 | 1 | 0 | 61 | 7.4 | 17010 | 870.0 |
| 29 | 芳华 | Youth | http://movie.mtime.com/236404 | 2017-12-15 | 213.45 | 首日7610.5万\|首周2.94亿\|连冠2周 | 30 | 0 | 0 | 7.7 | 7140 | 213.45 |
| 85 | 无问西东 | Forever Young | http://movie.mtime.com/154373 | 2018-01-12 | 113.1 | 首日3552.2万\|首周1.38亿 | 86 | 0 | 0 | 7.0 | 4178 | 113.1 |


### Question 4: Big hits = high ratings?
- To be more specific, we look at the global box office and the ratings on the Mtime website
- Keeping one row per unique movie name leaves 202 movies
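The de-duplication step can be sketched as follows, assuming the three top-100 lists are stacked in one DataFrame as in Cleaned_data.csv. By default the sketch keys on the movie name, as the text describes. Passing `key="MtimePage"` (one URL per movie) is an alternative that avoids clashes between movies with the same name. Keeping the first occurrence is an assumption of this sketch, and `unique_movies` is a hypothetical helper.

```python
import pandas as pd

def unique_movies(dt, key="Name"):
    """One row per movie, sorted by global box office (largest first).

    A movie that appears in several top-100 lists has several rows in dt;
    the list-specific domestic BoxOffice column is dropped, as in the
    notebook cell below.
    """
    return (dt.drop(columns=["BoxOffice"])
              .drop_duplicates(subset=key, keep="first")
              .sort_values("BoxOfficeGlobal", ascending=False)
              .reset_index(drop=True))
```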

```python
# one row per unique movie; the domestic BoxOffice column is dropped
dt_rate = pd.read_csv("data/global_unique.csv")
dt_rate = dt_rate.drop([dt_rate.columns[0], "BoxOffice"], axis=1)
```

```python
dt_rate.iloc[:10, :14]
```

| | Name | NameEng | MtimePage | ReleaseDate | BoxOfficeDetail | BoxOfficeRankCN | BoxOfficeRankNA | BoxOfficeRankG | MtimeRating | NumberofReviewers | BoxOfficeGlobal | Runtime | Genre | FilmFormat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 阿凡达 | Avatar | http://movie.mtime.com/45997 | 2010-01-04 | | 33 | 2 | 1 | 8.8 | 68056 | 2788.0 | 162 | 动作/冒险/奇幻 | 2D/3D/IMAX/IMAX3D |
| 1 | 泰坦尼克号 | Titanic | http://movie.mtime.com/11925 | 1998-04-03 | | 65 | 5 | 2 | 8.9 | 61658 | 2188.0 | 194 | 剧情/爱情 | |
| 2 | 星球大战：原力觉醒 | Star Wars: The Force Awakens | http://movie.mtime.com/192895 | 2016-01-09 | 首日1.96亿\|首周3.32亿\|连冠2周 | 75 | 1 | 3 | 7.6 | 9138 | 2068.0 | 135 | 动作/冒险/奇幻 | 2D/3D/IMAX3D/中国巨幕 |
| 3 | 复仇者联盟3：无限战争 | Avengers: Infinity War | http://movie.mtime.com/217497 | 2018-05-11 | 首日3.87亿\|首周12.13亿 | 11 | 4 | 4 | 8.1 | 8224 | 2047.0 | 150 | 动作/冒险/奇幻 | 3D/IMAX3D/中国巨幕 |
| 4 | 侏罗纪世界 | Jurassic World | http://movie.mtime.com/191813 | 2015-06-10 | 首日1.01亿\|首周5.98亿\|连冠2周 | 31 | 6 | 5 | 7.6 | 12063 | 1672.0 | 125 | 动作/冒险/科幻 | 2D/3D/IMAX3D/中国巨幕 |
| 5 | 复仇者联盟 | The Avengers | http://movie.mtime.com/83336 | 2012-05-04 | | 0 | 7 | 6 | 8.2 | 26125 | 1519.0 | 143 | 动作/冒险/科幻 | 2D/3D/IMAX3D/中国巨幕 |
| 6 | 速度与激情7 | Fast & Furious 7 | http://movie.mtime.com/196613 | 2015-04-12 | 首日3.46亿\|首周3.46亿\|连冠4周 | 10 | 46 | 7 | 8.3 | 21667 | 1516.0 | 137 | 动作/犯罪/惊悚 | 2D/3D/IMAX3D/中国巨幕 |
| 7 | 复仇者联盟2：奥创纪元 | Avengers: Age of Ultron | http://movie.mtime.com/173060 | 2015-05-12 | 首日1.86亿\|首周9.37亿 | 28 | 16 | 8 | 7.6 | 15266 | 1405.0 | 141 | 动作/冒险/科幻 | 2D/3D/IMAX3D/中国巨幕 |
| 8 | 黑豹 | Black Panther | http://movie.mtime.com/218085 | 2018-02-16 | | 0 | 3 | 9 | 7.0 | 4540 | 1347.0 | 134 | 动作/冒险/科幻 | 3D/IMAX3D/中国巨幕 |
| 9 | 哈利·波特与死亡圣器(下) | Harry Potter and the Deathly Hallows: Part 2 | http://movie.mtime.com/78592 | 2011-07-15 | | 0 | 35 | 10 | 8.5 | 20945 | 1342.0 | 130 | 冒险/剧情/奇幻 | 3D/IMAX |


Let's look at the summary statistics first, together with a boxen plot:

```python
# summary statistics for rating, reviewer count and global box office
dt_rate.iloc[:, 8:11].describe()
```

| | MtimeRating | NumberofReviewers | BoxOfficeGlobal |
|---|---|---|---|
| count | 202.0 | 202.0 | 202.0 |
| mean | 7.514356 | 13392.658416 | 614.350743 |
| std | 0.762127 | 15314.534284 | 461.519056 |
| min | 4.7 | 148.0 | 106.5 |
| 25% | 7.2 | 4587.75 | 196.65 |
| 50% | 7.6 | 9392.5 | 620.325 |
| 75% | 8.0 | 17155.5 | 895.5 |
| max | 9.1 | 118992.0 | 2788.0 |


<img src="img/boxen_rate.png">

- The three variables have very different scales. To show them in one boxen plot, each one has been Min-Max scaled (see the code file for details). Read the summary statistics together with the boxen plot
- Every movie has a reasonable number of reviewers (at least 148)
- Ratings cluster between 7 and 8 (the interquartile range is 7.2 to 8.0)
- There are indeed some movies with very low ratings (the minimum is 4.7), which we need to check
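Min-Max scaling maps each column onto [0, 1] via $x' = (x - \min x) / (\max x - \min x)$, so variables measured in ratings, reviewer counts, and millions of USD can share one axis. A minimal sketch of that formula with pandas follows; the helper name `min_max_scale` is hypothetical, and the notebook's own code may differ.

```python
import pandas as pd

def min_max_scale(df, columns):
    """Rescale each listed column to [0, 1] with (x - min) / (max - min).

    A constant column would divide by zero and produce NaN.
    """
    sub = df[columns]
    return (sub - sub.min()) / (sub.max() - sub.min())
```

For example, `sns.boxenplot(data=min_max_scale(dt_rate, ["MtimeRating", "NumberofReviewers", "BoxOfficeGlobal"]))` would draw one boxen per scaled column. Because the scaling is linear, each distribution keeps its shape. Only the units are lost, which is why the summary statistics are still needed to read actual values.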

<img src="img/scatter_rate_sale.png">

At this point, the scatter plot suggests that only movies with high box office get the chance to receive high ratings, which meets my expectation. What about the low-rated movies? How did they make it into the top sales lists (domestic or global)? By parsing the dictionary embedded in the ActorAndStaff column, we can take a peek at them. The rating threshold is set to 6.

```python
# movies rated below 6 on Mtime
dt_low_rate = dt_rate[dt_rate["MtimeRating"] < 6].copy()
# ActorAndStaff stores the str() of a dict; parse it back (NaN stays NaN)
dt_low_rate["ActorAndStaff"] = dt_low_rate["ActorAndStaff"].apply(
    lambda s: ast.literal_eval(s) if isinstance(s, str) else np.nan)
# first five listed actors, one column each
actor = dt_low_rate["ActorAndStaff"].apply(lambda d: pd.Series(d["演员 Actor"][:5]))
actor_low_rate = pd.merge(left=dt_low_rate[dt_low_rate.columns[0:14]], right=actor,
                          left_index=True, right_index=True)
actor_low_rate
```

| | Name | NameEng | MtimePage | ReleaseDate | BoxOfficeDetail | BoxOfficeRankCN | BoxOfficeRankNA | BoxOfficeRankG | MtimeRating | NumberofReviewers | BoxOfficeGlobal | Runtime | Genre | FilmFormat | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 163 | 澳门风云3 | From Vegas To Macau | http://movie.mtime.com/222326 | 2016-02-08 | 首日1.76亿\|首周7.87亿 | 52 | 0 | 0 | 4.7 | 5455 | 167.7 | 112 | 动作/冒险/喜剧 | 2D/3D | 周润发 Yun-Fat Chow | 刘德华 Andy Lau | 张家辉 Nick Cheung | 李宇春 Yuchun Li | 刘嘉玲 Carina Lau |
| 166 | 西游记之大闹天宫 | The Monkey King | http://movie.mtime.com/126817 | 2014-01-31 | 首日1.29亿 | 56 | 0 | 0 | 5.2 | 22779 | 157.5 | 120 | 奇幻/动作/冒险 | 2D/3D/IMAX3D/中国巨幕 | 甄子丹 Donnie Yen | 郭富城 Aaron Kwok | 周润发 Yun-Fat Chow | 何润东 Peter Ho | 海一天 Yitian Hai |
| 168 | 盗墓笔记 | Time Raiders | http://movie.mtime.com/203656 | 2016-08-05 | 首日1.61亿\|首周4.40亿 | 59 | 0 | 0 | 5.8 | 14601 | 150.75 | 123 | 奇幻/动作/冒险 | 3D/IMAX3D/中国巨幕 | 井柏然 Boran Jing | 鹿晗 Luhan | 马思纯 Sandra Ma | 王景春 Jingchun Wang | 张博宇 Boyu Zhang |
| 171 | 澳门风云2 | The Man From Macao II | http://movie.mtime.com/212471 | 2015-02-19 | 首日6462.9万\|首周2.73亿 | 63 | 0 | 0 | 5.6 | 6786 | 146.25 | 110 | 动作/喜剧 | 2D/3D | 周润发 Yun-Fat Chow | 张家辉 Nick Cheung | 刘嘉玲 Carina Lau | 余文乐 Shawn Yue | 王诗龄 Angela |
| 183 | 从你的全世界路过 | Belonged To You | http://movie.mtime.com/219178 | 2016-09-29 | 首日7409.1万\|首周3.10亿 | 77 | 0 | 0 | 5.5 | 4213 | 122.1 | 113 | 爱情/喜剧 | 2D | 邓超 Chao Deng | 白百何 Fay Bai | 杨洋 Yang Yang | 张天爱 Crystal Zhang | 岳云鹏 Yunpeng Yue |
| 188 | 大闹天竺 | Buddies In India | http://movie.mtime.com/225095 | 2017-01-28 | 首日1.89亿\|首周3.07亿 | 85 | 0 | 0 | 5.9 | 2290 | 113.7 | 100 | 喜剧/爱情 | 2D | 王宝强 Baoqiang Wang | 柳岩 Ada | 岳云鹏 Yunpeng Yue | 白客 Baike | 林永健 Yongjian Lin |
| 195 | 西游记女儿国 | The Monkey King 3 | http://movie.mtime.com/209205 | 2018-02-16 | 首日1.64亿\|首周3.25亿 | 93 | 0 | 0 | 5.4 | 2160 | 109.05 | 116 | 喜剧/爱情/动作 | 3D/中国巨幕 | 郭富城 Aaron Kwok | 冯绍峰 Shaofeng Feng | 赵丽颖 Liying Zhao | 小沈阳 Xiao Shenyang | 罗仲谦 Chung Him Law |


All of these are Chinese domestic movies with big names in the cast, which points to a celebrity effect in China's movie market. People sometimes go to theaters just for the movie stars, and only after the show do they realize how bad the movie was. Casting big stars in this way is a very common marketing strategy.

## Technical Support
- Displaying Chinese characters in matplotlib or seaborn:
    + Download the required font and install it on the computer; for this project, it is "SimHei"
    + Put the font into matplotlib's font folder, usually mpl-data/fonts/ttf, inside the same mpl-data directory as the default matplotlibrc file whose path matplotlib.matplotlib_fname() shows
    + Configure the font settings in Python when loading the modules (details in the code file)
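The configuration step can be sketched as follows, assuming matplotlib 3.2 or later (needed for `font_manager.fontManager.addfont`). The function name `use_chinese_font` and its optional `font_path` argument are illustrative, not the notebook's code. Setting `axes.unicode_minus` to False is a common companion setting, because many CJK fonts lack the Unicode minus glyph used in negative tick labels.

```python
import os
import matplotlib.pyplot as plt
from matplotlib import font_manager

def use_chinese_font(font_name="SimHei", font_path=None):
    """Make matplotlib (and seaborn, which draws with it) render Chinese text."""
    if font_path is not None and os.path.exists(font_path):
        # register a font file directly (matplotlib >= 3.2), an alternative
        # to copying it into matplotlib's own font folder
        font_manager.fontManager.addfont(font_path)
    # use the sans-serif family, with the CJK font first and not duplicated
    plt.rcParams["font.family"] = "sans-serif"
    others = [f for f in plt.rcParams["font.sans-serif"] if f != font_name]
    plt.rcParams["font.sans-serif"] = [font_name] + others
    # many CJK fonts lack the Unicode minus sign; use an ASCII hyphen instead
    plt.rcParams["axes.unicode_minus"] = False
```

Call it after any `sns.set()` or `sns.set_style()` call, because seaborn styles reset the sans-serif font list.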