# Fastest Way to Load Data Into PostgreSQL Using Python
## From two minutes to less than half a second!

https://hakibenita.com/fast-load-data-python-postgresql#setup-a-beer-brewery

As glorified data plumbers, we are often tasked with loading data fetched from a remote source into our systems. If we are lucky, the data is serialized as JSON or YAML. When we are less fortunate, we get an Excel spreadsheet or a CSV file that is always broken in some way we can't explain.

Data from large companies or old systems is somehow always encoded in a weird way, and the sysadmins always think they are doing us a favour by zipping the files (please gzip) or breaking them into smaller files with random names.

Modern services might provide a decent API, but more often than not we need to fetch a file from an FTP, SFTP, S3 or some proprietary vault that works only on Windows.

In this article we explore the best way to import messy data from a remote source into PostgreSQL.

To provide a real-life, workable solution, we set the following ground rules:

* The data is fetched from a remote source.
* The data is dirty and needs to be transformed.
* The data is big.

Please use the sample data in 'dataframe.pkl'.

In [97]:
import pandas as pd
df_source = pd.read_pickle('source_ds3.pkl')

In [98]:
# delete unused columns
df_source = df_source.drop(columns=['description','name'])

# convert the rows to a list of lists, one [id, sentiment_category, score] list per row
df_list = df_source.values.tolist()

df_source.head()

|   | id | sentiment_category | score |
|---|---|---|---|
| 0 | 3739468523042327 | 1.0 | 0.996787 |
| 1 | 3715862161980765 | 0.0 | 0.99751 |
| 2 | 3696078640670790 | 0.0 | 0.974375 |
| 3 | 3650999871822519 | 0.0 | 0.992722 |
| 4 | 3617898908525994 | 0.0 | 0.999711 |


In [99]:
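# create database connection
import os
import psycopg2
import psycopg2.extras
from typing import Iterator, Dict, Any

# Read the host and password from the environment instead of hard-coding them.
# The variable names PGHOST and PGPASSWORD are an assumption; use your own.
connection = psycopg2.connect(
    host=os.environ["PGHOST"],
    port='5432',
    database="medols",
    user="postgres",
    password=os.environ["PGPASSWORD"],
)
connection.autocommit = True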

def insert_execute_batch(connection, ssql, rows) -> None:
    """Run ssql once per row in rows, batching the statements to cut server round trips."""
    with connection.cursor() as cursor:
        psycopg2.extras.execute_batch(cursor, ssql, rows)
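
`execute_batch` groups the statements into pages and sends each page to the server in a single round trip; its `page_size` argument defaults to 100. The grouping idea can be sketched in plain Python. `paginate` below is a hypothetical helper written for illustration, not psycopg2's own code.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List


def paginate(rows: Iterable[Any], page_size: int = 100) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most page_size items from rows."""
    it = iter(rows)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page


# 250 one-column rows are split into pages of 100, 100 and 50.
pages = list(paginate([(i,) for i in range(250)], page_size=100))
page_lengths = [len(page) for page in pages]
```

A larger `page_size` means fewer round trips but bigger statements per trip, so it is worth measuring a few values against your own data.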

In [112]:
# single test row: (facebook_id, sentiment_category, score)
vals = [(3739468523042327.0, 1.0, 0.996787428855896)]
ssql = """
INSERT INTO temp_sentiment_test (id, facebook_id, sentiment_category, score) VALUES (nextval('temp_sentiment_test_id_seq'::regclass), %s, %s, %s)
"""
# print(df_list)
insert_execute_batch(connection, ssql, vals)
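
The test row passes the id as a float, and rows taken from a DataFrame can hold `nan` where a value is missing. Below is a minimal sketch of a cleaning step, assuming the id should be stored as an integer and missing values as NULL (psycopg2 adapts Python `None` to SQL `NULL`). `clean_rows` and `_none_if_nan` are hypothetical helpers, not part of the original notebook.

```python
import math
from typing import Any, Iterable, Iterator, Optional, Sequence, Tuple

Row = Tuple[Optional[int], Optional[float], Optional[float]]


def _none_if_nan(value: Any) -> Any:
    """Return None for a float NaN, otherwise return the value unchanged."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def clean_rows(rows: Iterable[Sequence[Any]]) -> Iterator[Row]:
    """Yield (facebook_id, sentiment_category, score) tuples ready for insertion."""
    for facebook_id, sentiment_category, score in rows:
        facebook_id = _none_if_nan(facebook_id)
        yield (
            # Assumption: the id column is an integer type.
            int(facebook_id) if facebook_id is not None else None,
            _none_if_nan(sentiment_category),
            _none_if_nan(score),
        )


sample = [
    [3739468523042327.0, 1.0, 0.996787428855896],
    [3582084628669923.0, float("nan"), float("nan")],
]
cleaned = list(clean_rows(sample))
```

With this in place, the full data set could be loaded with `insert_execute_batch(connection, ssql, list(clean_rows(df_list)))`.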

In [111]:
print(df_list)

[[3739468523042327.0, 1.0, 0.996787428855896], [3715862161980765.0, 0.0, 0.9975104331970215], [3696078640670790.0, 0.0, 0.9743753671646118], [3650999871822519.0, 0.0, 0.9927219748497009], [3617898908525994.0, 0.0, 0.9997105002403259], [3582084628669923.0, nan, nan], [3526092584367824.0, 1.0, 0.8426055908203125], [3213216538985635.0, 0.0, 0.9997408986091614], [3046283552175114.0, 0.0, 0.9996476173400879], [2788956051276293.0, 1.0, 0.9995957016944885], [2686506118175609.0, 1.0, 0.9988743662834167], [2607335886106982.0, 1.0, 0.9995957016944885], [2523321651205055.0, -1.0, 0.601773738861084], [2408965882621654.0, 1.0, 0.9993153810501099], [2332905860250023.0, 1.0, 0.990151584148407], [2206845656152743.0, -1.0, 0.7203463315963745], [2162989240703598.0, nan, nan], [2124773084524561.0, 0.0, 0.9996918439865112], [2124345601279347.0, 1.0, 0.9996246099472046], [2093046471074395.0, 0.0, 0.9993404746055603], [2086079248414984.0, -1.0, 0.9998624324798584], [2083521412035153.0, -1.0, 0.9050205349922