In [22]:
import credentials,psycopg2,time,tqdm
from faker import Faker
fake = Faker()
__username = credentials.user
__password = credentials.passw

The function below is forked from [KhanShaheb34/Explore-PostgreSQL](https://nbviewer.jupyter.org/github/KhanShaheb34/Explore-PostgreSQL/blob/master/Stress-test/StressTest.ipynb)

In [15]:
def time_of(fn, args=None, verbose=True):
    start_time = time.time()
    if args is not None:  # also forward falsy arguments such as 0 or ""
        out = fn(args)
    else:
        out = fn()
    run_time = time.time() - start_time
    if verbose:
        print("Time: %ss" % run_time)
    return run_time, out

In [6]:
test1 = psycopg2.connect(database="testinsert1", user=__username)
cur = test1.cursor()

Let's create a table where I can insert data. This query is tested in the *insert testing* notebook in the same directory.

In [19]:
sql_table_schema = '''
CREATE TABLE IF NOT EXISTS funtable_onerow( 
id BIGSERIAL PRIMARY KEY,
username VARCHAR(50) NOT NULL,
password VARCHAR(80) NOT NULL,
email VARCHAR(200),
phone BIGINT
);
'''
from psycopg2 import errors  # InFailedSqlTransaction lives here (psycopg2 >= 2.8)

try:
    cur.execute(sql_table_schema)
except errors.InFailedSqlTransaction:
    print("InFailedSqlTransaction error occurred! Did one of your recent queries fail without a rollback?")
    print("Rolling back because of", "InFailedSqlTransaction")
    test1.rollback()  # reset the aborted transaction on the connection

When we insert rows generated by the faker library or read from a CSV file, fetching that data from its source and loading it into memory takes a little time. So, for the *first step*, we'll insert the same row a number of times.

In [4]:
onerow_data = f"('{fake.simple_profile()['username']}', '{fake.password()}', '{fake.email()}', '{fake.msisdn()}')"
onerow_data

"('dean51', '@Ry9DzF)Zl', 'ronaldhorton@hotmail.com', '7398670308885')"

In [10]:
tablename = "funtable_onerow"
col_header = "username, password, email, phone"
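The row above is spliced into the SQL text as a ready-made string, which works for this fake data but would break on a value containing a single quote. Below is a minimal sketch of the parameterized alternative, assuming psycopg2's `%s` placeholders. `build_insert` is a hypothetical helper, not part of this notebook, and the table and column names are still interpolated, so they must come from trusted code.

```python
def build_insert(table, columns):
    """Return an INSERT statement with one %s placeholder per column."""
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"

query = build_insert("funtable_onerow", ["username", "password", "email", "phone"])

# Values travel separately from the SQL text (sample row copied from the output above).
row = ("dean51", "@Ry9DzF)Zl", "ronaldhorton@hotmail.com", 7398670308885)

# With an open psycopg2 cursor, the driver quotes the values itself:
# cur.execute(query, row)
```

The query string is built once and can be reused for every row, so only the tuple changes between inserts.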

Let's start by making a function so that I can time it.

In [23]:
def one_row_insert():
    # Insert the same pre-built row once; execute() returns None, so nothing is kept.
    cur.execute(
        f"INSERT INTO {tablename} "
        f"({col_header}) VALUES " + onerow_data)
def insert_5k():
    for _ in tqdm.tqdm([0]*5000):
        one_row_insert()

Turns out I can time the functions with the `tqdm` library too!

In [24]:
time_of(insert_5k)

100%|██████████| 5000/5000 [01:35<00:00, 52.54it/s]

Time: 95.2841067314148s





(95.2841067314148, None)

I think I can start pushing more data now. **I'll increase the number of rows tenfold!**

In [25]:
def insert_50k():
    for _ in tqdm.tqdm([0]*50000):
        one_row_insert()

In [26]:
insert_50k()

100%|██████████| 50000/50000 [17:03<00:00, 48.86it/s]  


All right! Let's keep it going! Let's see what happens if I start putting in unique data using the `faker` library.

In [27]:
def insert_unique_5k():
    for _ in tqdm.tqdm([0]*5000):
        cur.execute(
            f"INSERT INTO {tablename} "
            f"({col_header}) VALUES "
            f"('{fake.simple_profile()['username']}', '{fake.password()}', '{fake.email()}', '{fake.msisdn()}')")

In [28]:
insert_unique_5k()

100%|██████████| 5000/5000 [01:31<00:00, 54.85it/s]


#### What just happened?!
Did faker take *less* time?!

In [29]:
insert_unique_5k()

100%|██████████| 5000/5000 [01:38<00:00, 50.97it/s]


Hmm, looks like it varies! Alright then, moving on.

In [30]:
def insert_unique_50k():
    for _ in tqdm.tqdm([0]*50000):
        cur.execute(
            f"INSERT INTO {tablename} "
            f"({col_header}) VALUES "
            f"('{fake.simple_profile()['username']}', '{fake.password()}', '{fake.email()}', '{fake.msisdn()}')")

The 50k unique inserts took about one minute longer than the 50k identical ones (18:02 vs. 17:03), which was expected. The 5k timings were close to each other.

In [31]:
insert_unique_50k()

100%|██████████| 50000/50000 [18:02<00:00, 46.20it/s]  
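To compare the runs on one scale, the timings can be converted to rows per second. This is a small sketch: `to_seconds` and `rows_per_second` are hypothetical helpers, and the figures are copied from the progress bars and the `time_of` output above.

```python
def to_seconds(mmss):
    """Convert a tqdm-style 'MM:SS' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def rows_per_second(rows, elapsed_seconds):
    return rows / elapsed_seconds

# (row count, elapsed seconds) for each run in this notebook
runs = {
    "same row, 5k": (5000, 95.2841067314148),
    "same row, 50k": (50000, to_seconds("17:03")),
    "unique rows, 5k (run 1)": (5000, to_seconds("01:31")),
    "unique rows, 5k (run 2)": (5000, to_seconds("01:38")),
    "unique rows, 50k": (50000, to_seconds("18:02")),
}

rates = {name: rows_per_second(rows, secs) for name, (rows, secs) in runs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} rows/s")
```

All five runs land between roughly 46 and 55 rows per second, which suggests that generating the fake data is not what dominates the insert time here.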


Up next, we'll insert from a CSV file.

In [32]:
cur.close()
test1.close()

In [8]:
# cur.execute("ROLLBACK")

Found this article in favor of `\copy` instead of `INSERT`: [bulk insert - copy vs insert](https://www.citusdata.com/blog/2017/11/08/faster-bulk-loading-in-postgresql-with-copy/)

I'll follow up on that later.
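As a rough preview of that follow-up, here is a minimal sketch of a `COPY`-based load from Python. It assumes psycopg2's `cursor.copy_expert`, which streams a file-like object to `COPY ... FROM STDIN` (the client-side counterpart of psql's `\copy`). `rows_to_csv_buffer` and `copy_rows` are hypothetical helpers, and this has not been timed against the inserts above.

```python
import csv
import io

def rows_to_csv_buffer(rows):
    """Serialize an iterable of row tuples into an in-memory CSV file."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    buf.seek(0)  # rewind so COPY reads from the start
    return buf

def copy_rows(cur, table, columns, rows):
    """Load all rows with a single COPY statement instead of one INSERT per row.

    `cur` is assumed to be an open psycopg2 cursor; table and column names
    are interpolated directly, so they must come from trusted code.
    """
    sql = f"COPY {table} ({', '.join(columns)}) FROM STDIN WITH (FORMAT csv)"
    cur.copy_expert(sql, rows_to_csv_buffer(rows))

# Example call, assuming an open connection and cursor:
# copy_rows(cur, "funtable_onerow",
#           ["username", "password", "email", "phone"],
#           [("dean51", "secret", "ronaldhorton@hotmail.com", 7398670308885)])
```

The CSV writer takes care of quoting values that contain commas or quotes, so the rows do not need to be escaped by hand.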