Skip to content

jaketaylorpro/imdb-pg

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

imdb-pg

set to scripts to load the IMDB exported files into a postgres database. This is mostly an excersize to learn postgres and remove rust from my sql skills.

to reproduce

  1. download source files from imdb
  1. run the create-database.sql script and the two create-schema.sql scripts
  2. each tsv file has a corresponding sql file /table>
  • if there is a scratch table (ie ends with _s):
    • run that both create table scripts to create the real table and the scratch table
    • use a tool1 to import the data from the tsv into the scratch table
    • run the insert script and drop table script to remove the scratch table, after the data is transformed into the real table
  • otherwise:
    • run the create table script for the main table
    • use a tool1 to import the data from the tsv to the main table directly (no transformation will be needed)
  1. run the keys.sql script to add a number of constraints
  • more info here about the constraints

1 i used datagraip, and it worked pretty well

todo

  • need to find a tool that handles escaped double quotes in the TSV files properly.
    • currently we just ignore a few thousand records per table, and it isn't soo bad

references

https://www.postgresql.org/docs/10/index.html https://www.imdb.com/interfaces/

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published