This is where I will be uploading my Kaggle notebooks. I have abandoned the MySQL part of the project in favor of using pandas (or Polars in the future) for working with the IMDb datasets. pandas does have a to_sql() method for loading a DataFrame into an RDBMS such as MySQL, PostgreSQL, or Oracle later on.
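As a rough illustration of the to_sql() route mentioned above, here is a minimal sketch that writes a small DataFrame to a database and reads it back. It uses an in-memory SQLite database as a stand-in so it runs without a server; for MySQL, PostgreSQL, or Oracle, to_sql() expects a SQLAlchemy engine or connection instead. The table name `title_ratings`, the column names, and the sample values are assumptions made up for the example, not data from the real files.

```python
import sqlite3

import pandas as pd

# Made-up sample rows; the column names are assumptions for illustration.
ratings = pd.DataFrame(
    {
        "tconst": ["tt0000001", "tt0000002"],
        "averageRating": [5.7, 5.6],
        "numVotes": [100, 200],
    }
)

# In-memory SQLite stands in for a real RDBMS here.
conn = sqlite3.connect(":memory:")

# Write the DataFrame to a (hypothetical) table, replacing it if it exists.
ratings.to_sql("title_ratings", conn, if_exists="replace", index=False)

# Read the table back to confirm the round trip.
loaded = pd.read_sql_query("SELECT * FROM title_ratings ORDER BY tconst", conn)
row_count = conn.execute("SELECT COUNT(*) FROM title_ratings").fetchone()[0]
print(loaded)
```

Passing index=False keeps the DataFrame's integer index out of the table, which is usually what you want when the data already has its own identifier column.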
Subsets of IMDb data are available to customers for personal and non-commercial use. You can hold local copies of this data, and it is subject to IMDb's terms and conditions. Please refer to the Non-Commercial Licensing and copyright/license terms and verify compliance.
The dataset files can be accessed and downloaded from https://datasets.imdbws.com/. The data is refreshed daily.
Each dataset is contained in a gzipped, tab-separated-values (TSV) formatted file in the UTF-8 character set. The first line in each file contains headers that describe what is in each column. A ‘\N’ is used to denote that a particular field is missing or null for that title/name.
- 1.4G title.akas.tsv
- 676M title.basics.tsv
- 256M title.crew.tsv
- 150M title.episode.tsv
- 2.0G title.principals.tsv
- 20M title.ratings.tsv
- 640M name.basics.tsv
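Given the file format described above (gzipped TSV, a header row, and \N for missing values), a file can be loaded into pandas roughly as follows. This is only a sketch: the helper name `load_imdb_tsv`, the sample rows, and the column names in the sample are assumptions for illustration, and the example writes a tiny gzipped file to a temporary directory rather than downloading a real dataset.

```python
import csv
import gzip
import os
import tempfile

import pandas as pd

# Tiny made-up stand-in for one of the dataset files (column names assumed).
SAMPLE_TSV = (
    "tconst\tprimaryTitle\tstartYear\n"
    "tt0000001\tFirst Example\t1894\n"
    "tt0000002\tSecond Example\t\\N\n"
)


def load_imdb_tsv(path):
    """Load a gzipped, tab-separated file where \\N marks a missing value."""
    return pd.read_csv(
        path,
        sep="\t",
        compression="gzip",
        encoding="utf-8",
        na_values=["\\N"],       # treat the literal \N as missing
        keep_default_na=False,   # do not treat strings like "NA" as missing
        quoting=csv.QUOTE_NONE,  # do not interpret quote characters in titles
        dtype=str,               # read everything as text; convert later
    )


# Write the sample to a temporary gzipped file, then load it.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.tsv.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as fh:
    fh.write(SAMPLE_TSV)

df = load_imdb_tsv(sample_path)
print(df)
```

Reading every column as text first avoids surprises from mixed types; numeric columns can be converted afterwards with pd.to_numeric once the missing values are handled. For the larger files, the usecols or chunksize arguments of read_csv may help keep memory use down.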
This repository will contain my Python code for extracting and plotting data from the IMDb datasets. The code will be used in both Kaggle and JetBrains DataSpell.