# BoxOfficeMojo Pipeline

In [1]:
import pandas as pd
import boxOfficeMojoMethods as mojoMethods 
import datetime
import os

In [2]:
help(mojoMethods)

Help on module boxOfficeMojoMethods:

NAME
    boxOfficeMojoMethods - Jonathan L Chu, for Metis Data Science, 7 April 2020

DESCRIPTION
    Methods for scraping and downloading movie data from
    www.boxofficemojo.com. Actual pipeline can be found in the
    accompanying boxOfficeMojoPipeline.py file.

FUNCTIONS
    get_dataframe_from_year(year, num_releases=-1)
        Method to retrieve movie data from one year of
        www.boxofficemojo.com/year/
        
        Number of releases defaults to all [0:-1]
        
        Note that the execution time is equal to 
        2 * len(releases in the year) + 1 seconds
        
        Returns pandas dataframe
    
    get_movie_info_from_title(url)
        Parse the following data from a boxofficemojo.com Title url: 
        ['Movie_Title','Domestic_Distributor','Domestic_Total_Gross',
        'Runtime','Rating','Release_Date','Budget', 'Cast1','Cast2','Cast3','Cast4']
        
        Input: boxofficemojo.com url like:
        'https://

## Define range of years

In [3]:
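info = []  # one [year, elapsed time, dataframe shape] entry per scraped year
years = range(2013, 1990, -1)  # 2013 down to 1991 inclusive; the stop value 1990 is excluded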
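
The module's help text notes that one year takes roughly 2 * len(releases in the year) + 1 seconds, so the whole range can run for hours. Below is a minimal sketch of that arithmetic. The function name `estimate_scrape_time` and the figure of 600 releases per year are illustrative assumptions, not values taken from boxofficemojo.com.

```python
import datetime

def estimate_scrape_time(releases_per_year):
    """Sum the documented 2 * n + 1 seconds over each year's release count."""
    total_seconds = sum(2 * n + 1 for n in releases_per_year)
    return datetime.timedelta(seconds=total_seconds)

years = range(2013, 1990, -1)

# Hypothetical assumption: about 600 releases in every year of the range
assumed_releases = [600] * len(years)

estimate = estimate_scrape_time(assumed_releases)
print(len(years), 'years, estimated total time:', estimate)
```

Swapping in real per-year counts, once known, gives a tighter estimate. Passing a small `num_releases` to `get_dataframe_from_year` is a cheaper way to check the pipeline before committing to a full run.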

## Run the pipeline using a for loop

In [None]:
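os.makedirs('./data', exist_ok=True)  # to_pickle does not create missing directories

for year in years:
    start = datetime.datetime.now()
    # Scrape every release for the year; pass num_releases=5 for a quick test run
    df = mojoMethods.get_dataframe_from_year(year)

    # Save each year separately so a failed run does not lose earlier years
    df.to_pickle('./data/mojo_' + str(year) + '_movies.pkl')

    end = datetime.datetime.now()
    td = end - start
    # Record the timing only after td has been computed for this year
    info.append([year, td, df.shape])
    print('Elapsed time: ', td)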

200   https://www.boxofficemojo.com/year/2013
200   https://www.boxofficemojo.com/release/rl1532659201/?ref_=bo_yld_table_1
200   https://www.boxofficemojo.com/title/tt1300854/credits/?ref=bo_tt_tab
dataframe shape:  (1, 15)
200   https://www.boxofficemojo.com/release/rl2638775809/?ref_=bo_yld_table_2
200   https://www.boxofficemojo.com/title/tt1951264/credits/?ref=bo_tt_tab
dataframe shape:  (2, 15)
200   https://www.boxofficemojo.com/release/rl105874945/?ref_=bo_yld_table_3
200   https://www.boxofficemojo.com/title/tt1690953/credits/?ref=bo_tt_tab
dataframe shape:  (3, 15)
200   https://www.boxofficemojo.com/release/rl4034037249/?ref_=bo_yld_table_4
200   https://www.boxofficemojo.com/title/tt0770828/credits/?ref=bo_tt_tab
dataframe shape:  (4, 15)
200   https://www.boxofficemojo.com/release/rl1919256065/?ref_=bo_yld_table_5
200   https://www.boxofficemojo.com/title/tt1453405/credits/?ref=bo_tt_tab
dataframe shape:  (5, 15)
200   https://www.boxofficemojo.com/release/rl357926401/?ref

## Inform the user that the program is finished

In [None]:
# 'say' is a macOS command; on other systems this call returns a nonzero status without raising
os.system('say "your program has finished"')
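
Because `say` exists only on macOS, the notification above is silent elsewhere. The following is a minimal sketch of a fallback, assuming a printed message with a terminal bell is an acceptable substitute. The helper name `notify_finished` is hypothetical.

```python
import shutil
import subprocess

def notify_finished(message='your program has finished'):
    """Speak the message if the 'say' command is available, otherwise print it."""
    if shutil.which('say'):
        # Passing a list avoids shell quoting problems in the message
        subprocess.run(['say', message], check=False)
        return 'spoken'
    # '\a' is the ASCII bell character; many terminals beep or flash on it
    print('\a' + message)
    return 'printed'
```

Calling `notify_finished()` as the last cell of the notebook gives the same effect as the `os.system` call on macOS and a visible message on other systems.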


In [None]:
print(info)
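
Since the loop writes one pickle per year, a later step usually needs them combined. The sketch below assumes the `./data/mojo_<year>_movies.pkl` naming used above and skips years whose file is missing. The function name `load_yearly_pickles` is an illustrative choice, not part of `boxOfficeMojoMethods`.

```python
import os
import pandas as pd

def load_yearly_pickles(years, data_dir='./data'):
    """Read the per-year pickles that exist and stack them into one dataframe."""
    frames = []
    for year in years:
        path = os.path.join(data_dir, 'mojo_' + str(year) + '_movies.pkl')
        if os.path.exists(path):
            frames.append(pd.read_pickle(path))
    if not frames:
        # Nothing has been scraped yet, so return an empty dataframe
        return pd.DataFrame()
    # ignore_index=True renumbers rows so indices do not repeat across years
    return pd.concat(frames, ignore_index=True)

all_movies = load_yearly_pickles(range(2013, 1990, -1))
print(all_movies.shape)
```

Only unpickle files you created yourself, because `pd.read_pickle` can execute arbitrary code from an untrusted file.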