I will store EU industry production data that I downloaded from the EU Open Data Portal in a PostgreSQL database. The data is the same as in the previous project. Let me perform the necessary steps to import the data into a pandas dataframe:

In [3]:
import pandas as pd  # Package for organizing datasets in dataframes

# URL of dataset (EU industry production data)
dataset_tsv_url = 'http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file=data/sts_inpr_m.tsv.gz'

# Read in dataset: from the compressed TSV file directly into a pandas dataframe
# The regex separator splits on tabs and commas; regex separators require engine='python'
df = pd.read_csv(dataset_tsv_url, compression='gzip', sep='\t|,', index_col=[0,1,2,3,4], engine='python')

df.info()  # info() prints its summary itself and returns None

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 19197 entries, (PROD, B, CA, I10, AT) to (PROD, MIG_NRG_X_E, SCA, PCH_PRE, UK)
Columns: 776 entries, 2017M08  to 1953M01
dtypes: object(776)
memory usage: 113.8+ MB


The next step is to connect to the PostgreSQL server with the (empty) "production" database that I previously set up with the tool pgAdmin 4:

In [1]:
import sqlalchemy  # Package for accessing SQL databases via Python

# Connect to database (Note: the package psycopg2 is required, since it is SQLAlchemy's default PostgreSQL driver)
engine = sqlalchemy.create_engine("postgresql://postgres:xfkLVeMj@localhost/production")
con = engine.connect()

# Verify that there are no existing tables
print(sqlalchemy.inspect(engine).get_table_names())  # engine.table_names() is deprecated since SQLAlchemy 1.4

[]
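Hardcoding the password in the connection string works for a local setup, but it ends up stored in the notebook. A minimal sketch of an alternative, assuming the password is kept in the `PGPASSWORD` environment variable (my choice of variable name; libpq happens to read it as well), builds the same URL with `sqlalchemy.engine.URL.create()`, which is available in SQLAlchemy 1.4 and later:

```python
import os

from sqlalchemy.engine import URL

# Assumption: the password is stored in the PGPASSWORD environment variable
url = URL.create(
    drivername="postgresql+psycopg2",  # explicit psycopg2 driver
    username="postgres",
    password=os.environ.get("PGPASSWORD"),
    host="localhost",
    database="production",
)

# Print the URL with the password masked
print(url.render_as_string(hide_password=True))

# The URL object can be passed directly to sqlalchemy.create_engine(url)
```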


Now it's time to ingest the dataset into the PostgreSQL database. Using pandas, this can be conveniently done with the "to_sql()" method. I only need to specify a name for the new table that will represent the dataframe and pass the SQLAlchemy connection object:

In [4]:
table_name = 'industry_production'
df.to_sql(table_name, con)
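By default, `to_sql()` also writes the dataframe index, so the five MultiIndex levels become ordinary columns next to the period columns. A small sketch, using an in-memory SQLite database as a stand-in for the PostgreSQL server and a made-up two-row dataframe (the index names are assumptions), shows the resulting table layout and two parameters worth knowing: `if_exists` (the default `'fail'` raises an error if the table already exists) and `chunksize`:

```python
import pandas as pd
import sqlalchemy

# Stand-in: an in-memory SQLite database instead of the PostgreSQL "production" database
engine = sqlalchemy.create_engine("sqlite://")

# Made-up miniature version of the production dataframe (MultiIndex + period columns)
index = pd.MultiIndex.from_tuples(
    [("PROD", "B", "CA", "I10", "AT"), ("PROD", "B", "CA", "I10", "BE")],
    names=["unit", "nace_r2", "s_adj", "indic_bt", "geo\\time"],
)
df_small = pd.DataFrame(
    {"2017M08 ": ["105.2 ", ": "], "2017M07 ": ["101.9 p", "98.4 "]}, index=index
)

# engine.begin() opens a connection and commits the transaction on exit
with engine.begin() as con:
    df_small.to_sql("industry_production", con, if_exists="replace", chunksize=1000)

# The index levels appear as regular columns in the new table
columns = [c["name"] for c in sqlalchemy.inspect(engine).get_columns("industry_production")]
print(columns)
```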

That was easy, eh? Let's see if the table creation was successful:

In [5]:
print(sqlalchemy.inspect(engine).get_table_names())

['industry_production']
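Listing the table names only confirms that the table exists. A slightly stronger check is to compare the number of rows in the table with the number of rows in the dataframe (19197 here). A sketch with a hypothetical helper `table_row_count()`, again demonstrated on an in-memory SQLite stand-in with a made-up dataframe:

```python
import pandas as pd
import sqlalchemy


def table_row_count(engine, table_name):
    """Hypothetical helper: return the number of rows stored in table_name."""
    # SELECT count(*) FROM <table_name>, built without string formatting
    query = sqlalchemy.select(sqlalchemy.func.count()).select_from(sqlalchemy.table(table_name))
    with engine.connect() as con:
        return con.execute(query).scalar_one()


# Usage on an in-memory SQLite stand-in
engine = sqlalchemy.create_engine("sqlite://")
df_small = pd.DataFrame({"value": [1.0, 2.0, 3.0]})
with engine.begin() as con:
    df_small.to_sql("industry_production", con)

print(table_row_count(engine, "industry_production") == len(df_small))  # True
```

Against the real database, the same call would compare `table_row_count(engine, table_name)` with `len(df)`.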


Finally, I should not forget to close the connection to the database:

In [6]:
con.close()
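An alternative that cannot forget the `close()` call is to open the connection in a `with` block, which closes it on exit even if an error occurs inside the block; `engine.dispose()` additionally releases the connections held in the engine's pool. A minimal sketch, again using an in-memory SQLite database as a stand-in for the PostgreSQL engine:

```python
import sqlalchemy

engine = sqlalchemy.create_engine("sqlite://")  # stand-in for the PostgreSQL engine

with engine.connect() as con:
    print(con.execute(sqlalchemy.text("SELECT 1")).scalar_one())  # prints 1

print(con.closed)  # True: leaving the with block closed the connection
engine.dispose()   # release any pooled connections held by the engine
```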