imdb_top250

In the 'imdb_top_chart.py' script, BeautifulSoup is used to scrape IMDb's website for its 250 highest-rated movies. These titles are then sent to themoviedb.org's API, which returns more information about the movies, their directors, and their lead actors. This data is cleaned and compiled into three separate CSV files for constructing the SQL database.
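
As a rough illustration of the lookup step, the sketch below builds a themoviedb.org search request for one scraped title and picks the top hit from a decoded response. It assumes the script uses TMDB's `/3/search/movie` endpoint with `api_key` and `query` parameters; the helper names `build_search_url` and `first_match` and the sample payload are hypothetical, and no network call is made here.

```python
from urllib.parse import urlencode

# Assumed endpoint: TMDB's v3 movie search. The actual script may use another route.
TMDB_SEARCH_URL = "https://api.themoviedb.org/3/search/movie"


def build_search_url(title, api_key):
    """Return a TMDB movie-search URL for one scraped title (hypothetical helper)."""
    return TMDB_SEARCH_URL + "?" + urlencode({"api_key": api_key, "query": title})


def first_match(payload):
    """Pick the top search hit from a decoded JSON response, or None if there is none."""
    results = payload.get("results") or []
    return results[0] if results else None


url = build_search_url("The Godfather", "apikey123")

# Made-up payload in the shape of a search response; the id is a placeholder.
sample = {"results": [{"id": 1, "title": "The Godfather"}]}
match = first_match(sample)
```

Taking only the first result is a simplification; titles that share a name may need a check on release year as well.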

In the 'database_construction.py' script, Pandas is used to convert the three CSV files into dataframes. 'mysql.connector' is then used to create three connected tables out of these dataframes and insert the data.
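
One way the insert step could look is sketched below. It assumes each dataframe has been turned into a list of row dictionaries (for example with `DataFrame.to_dict("records")`) and builds a parameterized `INSERT` with the `%s` placeholders that 'mysql.connector' uses. The helper names, the `movies` table, and its columns are hypothetical; the real script may build its statements differently.

```python
def insert_statement(table, columns):
    """Build a parameterized INSERT with one %s placeholder per column."""
    placeholders = ", ".join(["%s"] * len(columns))
    column_list = ", ".join(columns)
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"


def rows_to_params(records, columns):
    """Turn row dictionaries into tuples ordered like `columns`."""
    return [tuple(record[column] for column in columns) for record in records]


# Hypothetical table layout and placeholder rows, for illustration only.
columns = ["movie_id", "title"]
records = [{"movie_id": 1, "title": "Movie A"}, {"movie_id": 2, "title": "Movie B"}]

sql = insert_statement("movies", columns)
params = rows_to_params(records, columns)

# With an open mysql.connector connection, the rows would then be sent with:
#   cursor.executemany(sql, params)
#   connection.commit()
```

Passing values as parameters rather than formatting them into the SQL string leaves quoting and escaping to the driver.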

Lastly, the 'imdb_clone_views.sql' script creates and presents four different views from our joined tables.
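
To show what a view over joined tables looks like, here is a small self-contained sketch. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the MySQL server, and the table names, columns, rows, and the `movie_directors` view are all hypothetical; they are not taken from 'imdb_clone_views.sql'.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE artists (artist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cast_members (
        movie_id INTEGER REFERENCES movies(movie_id),
        artist_id INTEGER REFERENCES artists(artist_id),
        role TEXT
    );

    -- Placeholder rows, not real data.
    INSERT INTO movies VALUES (1, 'Movie A'), (2, 'Movie B');
    INSERT INTO artists VALUES (10, 'Person X'), (11, 'Person Y');
    INSERT INTO cast_members VALUES
        (1, 10, 'director'), (1, 11, 'actor'), (2, 11, 'director');

    -- A view that joins the three tables and keeps only directors.
    CREATE VIEW movie_directors AS
    SELECT m.title AS title, a.name AS director
    FROM movies AS m
    JOIN cast_members AS c ON c.movie_id = m.movie_id
    JOIN artists AS a ON a.artist_id = c.artist_id
    WHERE c.role = 'director';
    """
)

rows = conn.execute(
    "SELECT title, director FROM movie_directors ORDER BY title"
).fetchall()
conn.close()
```

Once a view is defined, it can be queried like a table, so the join logic does not have to be repeated in each query.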

To run this program, create a '.env' text file in the provided repository. In this '.env' file, add your themoviedb.org API key (free to get), your MySQL server host, your MySQL username, and your MySQL password as follows:

API_KEY = "apikey123"

HOST = "localhost"

USER = "root"

PASSWD = "yourpassword"
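
For reference, the sketch below reads settings in this format using only the standard library. The scripts may well load the file another way (the `python-dotenv` package is a common choice), so `parse_env` is a hypothetical stand-in rather than the repository's actual loader.

```python
def parse_env(text):
    """Parse KEY = "value" lines into a dict, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional single or double quotes.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


example = '''API_KEY = "apikey123"

HOST = "localhost"

USER = "root"

PASSWD = "yourpassword"
'''

settings = parse_env(example)
```

In practice the text would come from the file itself, for instance `parse_env(open(".env").read())`.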

lucasps100/imdb_top_250_scrape_etl
