### Writing data to a SQL database

We initially considered using SQLite to store the data in a local file. However, to keep the database state up to date for both team members, we decided to move the database to MariaDB hosted on AWS.

In [1]:
# Using SQLite within Jupyter notebooks - option 1

# %%capture
# %load_ext sql
# %sql sqlite:///vivino.db
# %sql SELECT * FROM wine

In [2]:
# Using SQLite within Jupyter notebooks - option 2

import sqlite3
conn = sqlite3.connect('vivino.db')
test_query = 'SELECT * FROM wine LIMIT 5'
print(conn.execute(test_query).fetchall())
conn.close()

[(1105374, 'Alandra Tinto', 'alandra-tinto', 1, 0, 0, 1395, 3155, 3.1399224, None, 3.7395344, 1.8591489, 2.9908836, 540, 184, 2, 'Normal', 18112, 3.2, 128391, 51, 1), (1706071, 'Alentejano Monte das Ânforas Tinto', 'alentejano-monte-das-anforas-tinto', 1, 0, 0, 1394, 4107, 3.0573564, None, 3.7483184, 1.799455, 2.8970287, 171, 38, 2, 'Normal', 5176, 3.4, 30372, 38, 1), (4269600, 'Vinea Tinto', 'vinea-tinto', 1, 0, 0, 1394, 10992, 3.0659487, None, 3.2864575, 2.008701, 2.8928797, 435, 31, 2, 'Normal', 5827, 3.3, 37482, 39, 1), (1200770, 'Lisboa Tinto', 'lisboa-tinto', 1, 0, 0, 834, 26708, 3.042016, None, 3.4385235, 1.8823165, 2.7859948, 69, 59, 210, 'Normal', 2153, 3.4, 11457, 34, 1), (4269602, 'Vinea Branco', 'vinea-branco', 2, 0, 0, 1394, 10992, 2.8494768, None, 2.7141547, 1.6764449, None, 68, 9, 211, 'Normal', 1186, 3.3, 6375, 25, 1)]


In [47]:
import mariadb

In [48]:
import sys
sys.path.append('..')
import settings

In [49]:
def connect_to_vivino_db():
    """
    connect to vivino db and return a connection instance
    """
    try:
        conn =  mariadb.connect(
                user="admin",
                password=settings.db_pass,
                host=settings.db_url,
                port=3306,
                database="vivino")
    except mariadb.Error as e:
        print(f"Error connecting to MariaDB Platform: {e}")
        sys.exit(1)
    return conn

In [50]:
conn = connect_to_vivino_db()

The database schema was created separately, based on the data structure, and looks as follows:

![](vivino-schema.png)

Data must be inserted into the database in the following order, so that every row referenced by a foreign key already exists when the referencing row is written:
* wine type
* winery
* country
* region
* style
* food
* facts
* style_food
* grape
* style_grape
* country_grape
* wine
* price
* vintage
* toplist
* vintage_toplist
* keyword - **not loaded yet**
* wine_keyword - **not loaded yet**
* wine_flavor_group - **not loaded yet**
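
Why the order matters can be seen in a minimal sketch. It uses an in-memory SQLite database as a stand-in for MariaDB, and the two-table layout (`country`, `region`) is a simplified assumption rather than the actual schema shown above.

```python
import sqlite3

# In-memory SQLite stands in for MariaDB; the table layouts are simplified assumptions.
demo = sqlite3.connect(":memory:")
demo.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
demo.execute("CREATE TABLE country (code TEXT PRIMARY KEY)")
demo.execute(
    "CREATE TABLE region (id INTEGER PRIMARY KEY, "
    "country_code TEXT REFERENCES country(code))"
)

# Inserting the child row first violates the foreign key constraint.
try:
    demo.execute("INSERT INTO region VALUES (1, 'pt')")
    child_first_ok = True
except sqlite3.IntegrityError:
    child_first_ok = False

# Once the parent row exists, the same child insert succeeds.
demo.execute("INSERT INTO country VALUES ('pt')")
demo.execute("INSERT INTO region VALUES (1, 'pt')")
region_rows = demo.execute("SELECT COUNT(*) FROM region").fetchone()[0]
demo.close()

print(child_first_ok, region_rows)
```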

Once the database schema and the column names are fixed, we can run functions that insert data into each table by extracting information from the JSON file.

Since we need only specific data from the JSON, we need to specify the path to that data for each column in each table. For convenience, each path is written with forward slashes (for example, `vintage/wine/region/name`).

The function extracts the data found at a given path inside a given record (taking one step of the path at a time) and returns the resulting value. If any step of the path is missing for a given record, it returns `None`.

To help build the paths, here is the full list of columns in the dataframe, which mimics the structure of the JSON:

['vintage_grapes', 'vintage_has_valid_ratings', 'vintage_id',
       'vintage_image', 'vintage_name', 'vintage_seo_name',
       'vintage_statistics', 'vintage_wine', 'vintage_year',
       'vintage_top_list_rankings', 'status', 'ratings_count',
       'ratings_average', 'labels_count', 'id', 'name', 'seo_name', 'type_id',
       'vintage_type', 'is_natural', 'has_valid_ratings', 'region.id',
       'region.name', 'region.name_en', 'region.seo_name',
       'region.country.code', 'region.country.name',
       'region.country.native_name', 'region.country.seo_name',
       'region.country.currency.code', 'region.country.currency.name',
       'region.country.currency.prefix', 'region.country.currency.suffix',
       'region.country.regions_count', 'region.country.users_count',
       'region.country.wines_count', 'region.country.wineries_count',
       'region.country.most_used_grapes', 'region.background_image.location',
       'region.background_image.variations.large',
       'region.background_image.variations.medium', 'winery.id', 'winery.name',
       'winery.seo_name', 'winery.status', 'taste.structure.acidity',
       'taste.structure.fizziness', 'taste.structure.intensity',
       'taste.structure.sweetness', 'taste.structure.tannin', 
       'taste.structure.user_structure_count',
       'taste.structure.calculated_structure_count', 'taste.flavor',
       'statistics.status', 'statistics.ratings_count',
       'statistics.ratings_average', 'statistics.labels_count',
       'statistics.vintages_count', 'style.id', 'style.seo_name',
       'style.regional_name', 'style.varietal_name', 'style.name',
       'style.image', 'style.background_image.location',
       'style.background_image.variations.small', 'style.description',
       'style.blurb', 'style.interesting_facts', 'style.body',
       'style.body_description', 'style.acidity', 'style.acidity_description',
       'style.country.code', 'style.country.name', 'style.country.native_name',
       'style.country.seo_name', 'style.country.currency.code',
       'style.country.currency.name', 'style.country.currency.prefix',
       'style.country.currency.suffix', 'style.country.regions_count',
       'style.country.users_count', 'style.country.wines_count',
       'style.country.wineries_count', 'style.country.most_used_grapes',
       'style.wine_type_id', 'style.food', 'style.grapes', 'style.region',
       'style.region.id', 'style.region.name', 'style.region.name_en',
       'style.region.seo_name', 'style.region.country.code',
       'style.region.country.name', 'style.region.country.native_name',
       'style.region.country.seo_name', 'style.region.country.currency.code',
       'style.region.country.currency.name',
       'style.region.country.currency.prefix',
       'style.region.country.currency.suffix',
       'style.region.country.regions_count',
       'style.region.country.users_count', 'style.region.country.wines_count',
       'style.region.country.wineries_count',
       'style.region.country.most_used_grapes',
       'style.region.background_image.location',
       'style.region.background_image.variations.large',
       'style.region.background_image.variations.medium', 'region', 'winery',
       'style', 'taste.structure', 'region.background_image',
       'style.background_image', 'style.region.background_image']


In [236]:
def get_value(match_entry, path0):
    """
    Function that returns a value found at a given path inside a given JSON record 
    """
    if path0 is None:
        current_el = match_entry
    else:
        path = path0.split('/')
        current_el = match_entry
        for p in path:
            if current_el is None:
                break
            current_el = current_el.get(p)
    return current_el
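
A small usage sketch of the path lookup follows. The function body repeats the logic above so the example runs on its own; the `sample` record is hypothetical and heavily trimmed compared with a real entry.

```python
def get_value(match_entry, path0):
    # Same logic as the function above, repeated so the sketch runs on its own.
    if path0 is None:
        return match_entry
    current_el = match_entry
    for p in path0.split('/'):
        if current_el is None:
            break
        current_el = current_el.get(p)
    return current_el

# Hypothetical, heavily trimmed record; real records have many more fields.
sample = {"vintage": {"id": 1, "wine": {"name": "Vinea Tinto", "region": {"name": "Alentejano"}}}}

region_name = get_value(sample, "vintage/wine/region/name")  # every step exists
missing = get_value(sample, "vintage/wine/style/id")         # 'style' is absent
whole = get_value(sample, None)                              # no path: record itself

print(region_name, missing, whole is sample)
```

A path with a missing step evaluates to `None`, which is what later lets the insertion code skip records without a primary key.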

In [237]:
# load data from a backup file

import pickle

with open("backup_data/full_match_list", 'rb') as f:
    recovered_data = pickle.load(f)
    
def remove_wine_duplicates(json_data):
    distinct_dict = {entry['vintage']['id']: entry for entry in json_data}
    recovered_data_distinct = distinct_dict.values()
    return list(recovered_data_distinct)

recovered_data_distinct = remove_wine_duplicates(recovered_data)
len(recovered_data_distinct)

55819
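
The deduplication relies on dictionary keys being unique: when two entries share a vintage id, the later one overwrites the earlier one. A minimal sketch with hypothetical records (the `source` key exists only to tell the duplicates apart):

```python
def remove_wine_duplicates(json_data):
    # Same approach as above: later entries overwrite earlier ones with the same vintage id.
    distinct_dict = {entry['vintage']['id']: entry for entry in json_data}
    return list(distinct_dict.values())

# Hypothetical records, not taken from the real backup file.
records = [
    {"vintage": {"id": 10}, "source": "first"},
    {"vintage": {"id": 20}, "source": "second"},
    {"vintage": {"id": 10}, "source": "third"},
]
distinct = remove_wine_duplicates(records)

print(len(distinct), [entry["source"] for entry in distinct])
```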

In [238]:
# entry = (5, 7)
# print(len(entry))
# [entry[0]] + [entry[1] for i in range(3)]

In [239]:
# insert data to sql
import time

def extract_json_to_sql(conn, matches_list, table_name, paths, pk_sql=[], first_entry=True, from_json_list_with_id=False):
    """
    Function that accepts the following arguments:
    * conn: active connection to a database,
    * matches_list: JSON list with data,
    * table_name: name of the SQL table to insert data into,
    * paths: the paths leading to data in JSON corresponding to each column in the SQL table
      (primary key columns must come first),
    * pk_sql: the names of primary key columns in SQL (to check for uniqueness condition),
    * first_entry: boolean indicating whether it's the first time data is written to the table,
    * from_json_list_with_id: boolean indicating whether each entry is a tuple (id, relevant_data)
      produced from a list nested inside JSON
    
    Function inserts data and (for the first entry) checks whether the resulting number of unique records in SQL
    matches the number of unique records in JSON.
    
    """
    cur = conn.cursor()

    timepoint_1 = time.time()

    all_args = {}

    for entry in matches_list:
        if from_json_list_with_id:
            # if data is passed from json list, entry contains a tuple (id, relevant_data), which need to be flattened in a single arg list
            values_entry = [entry[0]] + [int_to_float(get_value(entry[1], path)) for path in paths]
            if paths == []:
                for record in entry[1:]: 
                    all_args[(entry[0], record)] = (entry[0], record)
        else:
            values_entry = [int_to_float(get_value(entry, path)) for path in paths]
        pk_values = values_entry[:len(pk_sql)]
#         if entry == matches_list[0]:
#             print(entry)
#             print(values_entry)
#             print(pk_values)
            
        if all(pk_value is not None for pk_value in pk_values) and paths != []:
            all_args[tuple(pk_values)] = tuple(values_entry)

#     print(all_args)
            
    # "ON DUPLICATE KEY UPDATE pk = pk" is a no-op update, so rows whose key already exists are left unchanged
    if pk_sql:
        if_duplicates_do_nothing = " ON DUPLICATE KEY UPDATE " + ", ".join(f"{pk} = {pk}" for pk in pk_sql)
    else:
        if_duplicates_do_nothing = ""

    fields_num = len(paths)
    if from_json_list_with_id and paths != []:
        fields_num += 1
    elif paths == []:
        fields_num = 2
    
    query = f"""
        INSERT INTO {table_name} VALUES ({', '.join('?' * fields_num)})
        {if_duplicates_do_nothing};
    """.strip()
    cur.executemany(query, list(all_args.values()))
    conn.commit()

    timepoint_2 = time.time()
    print('Insertion complete and took {} s.'.format(timepoint_2 - timepoint_1))

    if first_entry:
        cur = conn.cursor()
        cur.execute('SELECT COUNT(*) FROM {};'.format(table_name))
        unique_sql = cur.fetchall()[0][0]
        unique_python = len(list(all_args.values()))
        if unique_python == unique_sql:
            print('Number of unique records is accurate')
        else:
            print('Something went wrong')
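
To make the generated SQL easier to inspect, the statement assembly can be isolated in a small function. `build_insert_query` is a hypothetical helper that mirrors how the query string is put together above; it does not touch the database.

```python
def build_insert_query(table_name, fields_num, pk_sql):
    # Hypothetical helper mirroring how the INSERT statement is assembled above.
    placeholders = ', '.join('?' * fields_num)
    query = f"INSERT INTO {table_name} VALUES ({placeholders})"
    if pk_sql:
        # Assigning each key column to itself leaves an existing row unchanged.
        no_op = ', '.join(f"{pk} = {pk}" for pk in pk_sql)
        query += f" ON DUPLICATE KEY UPDATE {no_op}"
    return query

winery_query = build_insert_query('winery', 4, ['id'])
style_food_query = build_insert_query('style_food', 2, ['style_id', 'food_id'])
facts_query = build_insert_query('facts', 2, [])

print(winery_query)
print(style_food_query)
print(facts_query)
```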

In [240]:
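def extract_json_list_to_sql(conn, matches_list, table_name, path_to_list, paths_from_list, path_to_id_outside_list="",
                             pk_sql=[], first_entry=True):
    """
    Flattens the lists found at path_to_list inside each JSON record into one normal list
    (optionally pairing every element with the id found at path_to_id_outside_list)
    and inserts the result into the given SQL table
    """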
    results = [] 
    for entry in matches_list:
        if get_value(entry, path_to_list) is not None:
            for element in get_value(entry, path_to_list):
                if path_to_id_outside_list == "":
                    results.append(element)
                else:
                    results.append((get_value(entry, path_to_id_outside_list), element))
    if path_to_id_outside_list == "":
        from_json_list_with_id = False
    else: 
        from_json_list_with_id = True
    extract_json_to_sql(conn, results, table_name, paths_from_list, pk_sql, first_entry, from_json_list_with_id)
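
The flattening step can be illustrated without a database. `flatten_nested_list` below is a hypothetical simplification that uses top-level keys instead of slash-separated paths; the `styles` records are made up for the example.

```python
def flatten_nested_list(records, list_key, id_key=None):
    # Hypothetical simplification: top-level keys instead of slash-separated paths.
    results = []
    for record in records:
        # A missing or None list contributes nothing, as in the function above.
        for element in record.get(list_key) or []:
            if id_key is None:
                results.append(element)
            else:
                # Pair each element with the id of the record that contains it.
                results.append((record.get(id_key), element))
    return results

styles = [
    {"id": 1, "food": [{"id": 4, "name": "Beef"}, {"id": 11, "name": "Lamb"}]},
    {"id": 2, "food": None},
]

plain = flatten_nested_list(styles, "food")
paired = flatten_nested_list(styles, "food", "id")

print(plain)
print(paired)
```

The plain form suits entity tables such as `food`, while the paired form suits link tables such as `style_food`, where the parent id becomes the first column.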

In [241]:
def clean_sql_table(conn, table_name):
    """
    delete all records from a given table in a given database
    """
    cur = conn.cursor()
    cur.execute(f'DELETE FROM {table_name}')
    conn.commit()  # persist the deletion (needed when autocommit is off)

In [242]:
def count_unique_records_sql(conn, table_name):
    """
    checks the number of unique records in a given table
    """
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table_name};")
    print(cur.fetchall()[0][0])

In [243]:
def int_to_float(smth):
    """
    converts integers to floats
    """
    if isinstance(smth, int):
        return float(smth)
    elif smth == 'N.V.':
        return 0.0    # meaning, wine is of type 'non-vintage' and is made of grapes from more than one harvest
    else:
        return smth
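
A quick sketch of how the conversion behaves on typical values. The function is repeated so the example runs on its own, and the input values are illustrative. Note that Python booleans are instances of `int`, so they are converted as well.

```python
def int_to_float(smth):
    # Same logic as above, repeated so the sketch runs on its own.
    if isinstance(smth, int):
        return float(smth)
    elif smth == 'N.V.':
        return 0.0
    else:
        return smth

# Illustrative inputs: a year, a non-vintage marker, text, a missing value, a float, a boolean.
examples = [2015, 'N.V.', 'Normal', None, 3.2, True]
converted = [int_to_float(value) for value in examples]

print(converted)
```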

Before writing to MariaDB, we need to make sure that the target tables are empty.

In [244]:
# tables = ['wine_flavor_group', 'wine_keyword', 'keyword', 'vintage_toplist', 'toplist', 'vintage', 'wine', 'country_grape', 'style_grape', \
#           'style_food', 'grape', 'food', 'facts', 'style', 'region', 'country', 'price', 'winery', 'type']

# conn = connect_to_vivino_db()
# for table in tables: 
#     clean_sql_table(conn, table)
#     print(f"{table} cleaned")

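Make sure the database uses the correct encoding: character sets such as `utf8mb3` cannot store characters that take 4 bytes in UTF-8, so MariaDB may reject such text values unless `utf8mb4` is used.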

In [246]:
conn = connect_to_vivino_db()
try:
    conn.cursor().execute("ALTER DATABASE vivino CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;")
finally:
    conn.close()
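
How many bytes a character needs in UTF-8 can be checked directly in Python. The sample strings are illustrative; whether the scraped data actually contains 4-byte characters such as emoji is an assumption.

```python
# Illustrative samples: ASCII, an accented Latin letter, and an emoji.
samples = {"plain": "a", "accented": "Â", "emoji": "🍷"}

# Encode each string as UTF-8 and count the resulting bytes.
byte_lengths = {name: len(text.encode("utf-8")) for name, text in samples.items()}

print(byte_lengths)
```

Accented letters such as those in Portuguese wine names fit in 2 bytes, so only characters outside the Basic Multilingual Plane require `utf8mb4`.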

In [248]:
tables = ['wine_flavor_group', 'wine_keyword', 'keyword', 'vintage_toplist', 'toplist', 'vintage', 'wine', 'style_grape', \
          'style_food', 'grape', 'food', 'facts', 'price', 'winery', 'type', 'country', 'style', 'region', 'country_grape']

conn = connect_to_vivino_db()
cur = conn.cursor()

try:
    for table in tables:
        cur.execute(f"ALTER TABLE {table} CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;")
        print(f"encoding of table {table} converted")
finally: 
    conn.close()

encoding of table wine_flavor_group converted
encoding of table wine_keyword converted
encoding of table keyword converted
encoding of table vintage_toplist converted
encoding of table toplist converted
encoding of table vintage converted
encoding of table wine converted
encoding of table style_grape converted
encoding of table style_food converted
encoding of table grape converted
encoding of table food converted
encoding of table facts converted
encoding of table price converted
encoding of table winery converted
encoding of table type converted
encoding of table country converted
encoding of table style converted
encoding of table region converted
encoding of table country_grape converted


#### Insert wine types

In [249]:
def insert_to_wine_types(conn, table_name='type'):
    """
    manually inserts all wine types to SQL table
    """
    cur = conn.cursor()
    cur.execute(f"INSERT INTO {table_name} VALUES (1, 'Red'), (2, 'White'), (3, 'Sparkling'), (4, 'Rose'), (7, 'Dessert'), "
                "(24, 'Fortified'), (25, 'Other');")
    conn.commit()  # persist the rows (needed when autocommit is off)

In [253]:
# conn = connect_to_vivino_db()
# try:
#     clean_sql_table(conn, 'type')
#     insert_to_wine_types(conn)
# finally:
#     conn.close()

IntegrityError: Duplicate entry '1' for key 'PRIMARY'

In [254]:
conn = connect_to_vivino_db()
cur = conn.cursor()
cur.execute("SELECT * FROM type")
print(cur.fetchall())

[(1, 'Red'), (2, 'White'), (3, 'Sparkling'), (4, 'Rose'), (7, 'Dessert'), (24, 'Fortified'), (25, 'Other')]


#### Insert wineries

In [255]:
#insert wineries

def insert_to_wineries(conn, matches, first_entry=True):
    """
    inserts data to correct fields in the winery table
    """
    table = 'winery'
    main = 'vintage/wine/winery/'
    paths = [main + 'id', main + 'name', main + 'seo_name', main + 'status']
    pk_sql = ['id']
    extract_json_to_sql(conn, matches, table, paths, pk_sql, first_entry)

In [257]:
conn = connect_to_vivino_db()
try:
#     clean_sql_table(conn, 'winery')
    insert_to_wineries(conn, recovered_data_distinct, True)
finally:
    conn.close()

Insertion complete and took 1.2071480751037598 s.
Number of unique records is accurate


#### Insert countries

In [258]:
def insert_to_countries(conn, matches, first_entry=True):
    """
    inserts data to correct fields in the country table
    """
    table = 'country'
    main = 'vintage/wine/region/country/'
    paths = [main + 'code', main + 'name', main + 'native_name', main + 'seo_name', main + 'currency/code', main + 'regions_count',\
            main + 'users_count', main + 'wines_count', main + 'wineries_count']
    pk_sql = ['code']
    extract_json_to_sql(conn, matches, table, paths, pk_sql, first_entry)

In [259]:
conn = connect_to_vivino_db()
try:
#     clean_sql_table(conn, 'country')
    insert_to_countries(conn, recovered_data_distinct, True)
finally:
    conn.close()

Insertion complete and took 0.8709571361541748 s.
Number of unique records is accurate


#### Insert regions

In [260]:
def insert_to_regions(conn, matches, first_entry=True):
    """
    inserts data to correct fields in the region table
    """
    table = 'region'
    main = 'vintage/wine/region/'
    paths = [main + 'id', main + 'name', main + 'name_en', main + 'seo_name', main + 'country/code']
    pk_sql = ['id']
    extract_json_to_sql(conn, matches, table, paths, pk_sql, first_entry)

In [261]:
conn = connect_to_vivino_db()
try:
#     clean_sql_table(conn, 'region')
    insert_to_regions(conn, recovered_data_distinct, True)
finally:
    conn.close()

Insertion complete and took 0.9717741012573242 s.
Number of unique records is accurate


#### Insert style

In [262]:
def insert_to_style(conn, matches, first_entry=True):
    """
    inserts data to correct fields in the style table
    """
    table = 'style'
    main = 'vintage/wine/style/'
    paths = [main + 'id', main + 'seo_name', main + 'regional_name', main + 'varietal_name', main + 'name',\
            main + 'description', main + 'blurb', main + 'body', main + 'body_description', main + 'acidity',\
            main + 'acidity_description', main + 'country/code', main + 'wine_type_id']
    pk_sql = ['id']
    extract_json_to_sql(conn, matches, table, paths, pk_sql, first_entry)

In [263]:
conn = connect_to_vivino_db()
try:
#     clean_sql_table(conn, 'style')
    insert_to_style(conn, recovered_data_distinct, True)
finally:
    conn.close()

Insertion complete and took 1.4676568508148193 s.
Number of unique records is accurate


#### Insert food

In [264]:
def insert_to_food(conn, matches, first_entry=True):
    """
    inserts data to correct fields in the food table
    """
    table = 'food'
    path_to_list = 'vintage/wine/style/food'
    paths_from_list = ['id', 'name', 'seo_name']
    pk_sql = ['id']
    extract_json_list_to_sql(conn, matches, table, path_to_list, paths_from_list, pk_sql=pk_sql, first_entry=first_entry)

In [265]:
conn = connect_to_vivino_db()
try:
#     clean_sql_table(conn, 'food')
    insert_to_food(conn, recovered_data_distinct, True)
finally:
    conn.close()

Insertion complete and took 0.7559759616851807 s.
Number of unique records is accurate


In [266]:
conn = connect_to_vivino_db()
try: 
    count_unique_records_sql(conn, 'food')
finally:
    conn.close()

22


#### Insert facts

In [267]:
def insert_to_facts(conn, matches, first_entry=True):
    """
    inserts data to correct fields in the facts table
    """
    table = 'facts'
    path_to_list = 'vintage/wine/style/interesting_facts'
    paths_from_list = []
    path_to_id_outside_list = 'vintage/wine/style/id'
    pk_sql = []
    extract_json_list_to_sql(conn, matches, table, path_to_list, paths_from_list, path_to_id_outside_list, pk_sql, first_entry)

In [270]:
conn = connect_to_vivino_db()
try:
    clean_sql_table(conn, 'facts')
    insert_to_facts(conn, recovered_data_distinct, True)
finally:
    conn.close()

Insertion complete and took 0.627943754196167 s.
Number of unique records is accurate


In [271]:
conn = connect_to_vivino_db()
try: 
    count_unique_records_sql(conn, 'facts')
finally:
    conn.close()

502


#### Insert style_food

In [272]:
def insert_to_style_food(conn, matches, first_entry=True):
    """
    inserts data to correct fields in the style-food table
    """
    table = 'style_food'
    path_to_list = 'vintage/wine/style/food'
    paths_from_list = ['id']
    path_to_id_outside_list = 'vintage/wine/style/id'
    pk_sql = ['style_id', 'food_id']
    extract_json_list_to_sql(conn, matches, table, path_to_list, paths_from_list, path_to_id_outside_list, pk_sql, first_entry)

In [273]:
conn = connect_to_vivino_db()
try:
#     clean_sql_table(conn, 'style_food')
    insert_to_style_food(conn, recovered_data_distinct, True)
finally:
    conn.close()

Insertion complete and took 0.6490700244903564 s.
Number of unique records is accurate


In [274]:
conn = connect_to_vivino_db()
try: 
    count_unique_records_sql(conn, 'style_food')
finally:
    conn.close()

1063


#### Insert grapes

In [275]:
def insert_to_grape(conn, matches, first_entry=True):
    """
    inserts data to correct fields in the grape table
    """
    table = 'grape'
    path_to_list = 'vintage/wine/style/grapes'
    paths_from_list = ['id', 'name', 'seo_name', 'has_detailed_info', 'wines_count']
    path_to_id_outside_list = ''
    pk_sql = ['id']
    extract_json_list_to_sql(conn, matches, table, path_to_list, paths_from_list, path_to_id_outside_list, pk_sql, first_entry)

In [276]:
conn = connect_to_vivino_db()
try:
#     clean_sql_table(conn, 'grape')
    insert_to_grape(conn, recovered_data_distinct, True)
finally:
    conn.close()

Insertion complete and took 0.8053209781646729 s.
Number of unique records is accurate


In [277]:
conn = connect_to_vivino_db()
try: 
    count_unique_records_sql(conn, 'grape')
finally:
    conn.close()

128


#### Insert style_grape

In [278]:
def insert_to_style_grape(conn, matches, first_entry=True):
    """
    inserts data to correct fields in the style-grape table
    """
    table = 'style_grape'
    path_to_list = 'vintage/wine/style/grapes'
    paths_from_list = ['id']
    path_to_id_outside_list = 'vintage/wine/style/id'
    pk_sql = ['style_id', 'grape_id']
    extract_json_list_to_sql(conn, matches, table, path_to_list, paths_from_list, path_to_id_outside_list, pk_sql, first_entry)

In [279]:
conn = connect_to_vivino_db()
try:
#     clean_sql_table(conn, 'style_grape')
    insert_to_style_grape(conn, recovered_data_distinct, True)
finally:
    conn.close()

Insertion complete and took 0.5234789848327637 s.
Number of unique records is accurate


In [280]:
conn = connect_to_vivino_db()
try: 
    count_unique_records_sql(conn, 'style_grape')
finally:
    conn.close()

551


#### Insert country_grape

In [281]:
def insert_to_country_grape(conn, matches, first_entry=True):
    """
    inserts data to correct fields in the country-grape table
    """
    table = 'country_grape'
    path_to_list = 'vintage/wine/style/grapes'
    paths_from_list = ['id']
    path_to_id_outside_list = 'vintage/wine/style/country/code'
    pk_sql = ['country_code', 'grape_id']
    extract_json_list_to_sql(conn, matches, table, path_to_list, paths_from_list, path_to_id_outside_list, pk_sql, first_entry)

In [282]:
conn = connect_to_vivino_db()
try:
#     clean_sql_table(conn, 'country_grape')
    insert_to_country_grape(conn, recovered_data_distinct, True)
finally:
    conn.close()

Insertion complete and took 0.4409778118133545 s.
Number of unique records is accurate


In [283]:
conn = connect_to_vivino_db()
try: 
    count_unique_records_sql(conn, 'country_grape')
finally:
    conn.close()

283


#### Insert wine

In [284]:
def insert_to_wine(conn, matches, first_entry=True):
    """
    inserts data to correct fields in the wine table
    """
    table = 'wine'
    main = 'vintage/wine/'
    paths = [main + 'id', main + 'name', main + 'seo_name', main + 'type_id', main + 'vintage_type', main + 'is_natural',\
             main + 'region/id', main + 'winery/id', main + 'taste/structure/acidity', main + 'taste/structure/fizziness',\
             main + 'taste/structure/intensity', main + 'taste/structure/sweetness', main + 'taste/structure/tannin',\
             main + 'taste/structure/user_structure_count', main + 'taste/structure/calculated_structure_count', \
             main + 'style/id', main + 'statistics/status', main + 'statistics/ratings_count', main + 'statistics/ratings_average',\
             main + 'statistics/labels_count', main + 'statistics/vintages_count', main + 'has_valid_ratings']
    pk_sql = ['id']
    extract_json_to_sql(conn, matches, table, paths, pk_sql, first_entry, False)

In [285]:
conn = connect_to_vivino_db()
try:
#     clean_sql_table(conn, 'wine')
    insert_to_wine(conn, recovered_data_distinct, True)
finally:
    conn.close()

Insertion complete and took 2.957200050354004 s.
Number of unique records is accurate


In [286]:
conn = connect_to_vivino_db()
try: 
    count_unique_records_sql(conn, 'wine')
finally:
    conn.close()

29811


#### Insert price

In [287]:
def insert_to_price(conn, matches, first_entry=True):
    """
    inserts data to correct fields in the price table
    """
    table = 'price'
    main = 'price/'
    paths = [main + 'id', main + 'amount', main + 'discounted_from', main + 'type', main + 'visibility',\
            main + 'currency/code', main + 'bottle_type/name']
    pk_sql = ['id']
    extract_json_to_sql(conn, matches, table, paths, pk_sql, first_entry)

In [288]:
conn = connect_to_vivino_db()
try:
#     clean_sql_table(conn, 'price')
    insert_to_price(conn, recovered_data_distinct, True)
finally:
    conn.close()

Insertion complete and took 1.6994068622589111 s.
Number of unique records is accurate


In [289]:
conn = connect_to_vivino_db()
try: 
    count_unique_records_sql(conn, 'price')
finally:
    conn.close()

55819


#### Insert vintage

In [290]:
def insert_to_vintage(conn, matches, first_entry=True):
    """
    inserts data to correct fields in the vintage table
    """
    table = 'vintage'
    main = 'vintage/'
    paths = [main + 'id', main + 'seo_name', main + 'name', main + 'wine/id', main + 'year', main + 'has_valid_ratings',\
             main + 'statistics/status', main + 'statistics/ratings_count', main + 'statistics/ratings_average',\
             main + 'statistics/labels_count', 'price/id']
    pk_sql = ['id']
    extract_json_to_sql(conn, matches, table, paths, pk_sql, first_entry, False)

In [291]:
conn = connect_to_vivino_db()
try:
#     clean_sql_table(conn, 'vintage')
    insert_to_vintage(conn, recovered_data_distinct, True)
finally:
    conn.close()

Insertion complete and took 2.4885811805725098 s.
Number of unique records is accurate


In [292]:
conn = connect_to_vivino_db()
try: 
    count_unique_records_sql(conn, 'vintage')
finally:
    conn.close()

55819


#### Insert toplist

In [293]:
def insert_to_toplist(conn, matches, first_entry=True):
    """
    inserts data to correct fields in the toplist table
    """
    table = 'toplist'
    path_to_list = 'vintage/top_list_rankings'
    paths_from_list = ['top_list/id', 'top_list/location', 'top_list/name', 'top_list/seo_name', 'top_list/type', 'top_list/year']
    pk_sql = ['id']
    extract_json_list_to_sql(conn, matches, table, path_to_list, paths_from_list, pk_sql=pk_sql, first_entry=first_entry)

In [294]:
conn = connect_to_vivino_db()
try:
#     clean_sql_table(conn, 'toplist')
    insert_to_toplist(conn, recovered_data_distinct, True)
finally:
    conn.close()

Insertion complete and took 0.5344991683959961 s.
Number of unique records is accurate


In [295]:
conn = connect_to_vivino_db()
try: 
    count_unique_records_sql(conn, 'toplist')
finally:
    conn.close()

533


#### Insert vintage_toplist

In [296]:
def insert_to_vintage_toplist(conn, matches, first_entry=True):
    """
    inserts data to correct fields in the vintage-toplist table
    """
    table = 'vintage_toplist'
    path_to_list = 'vintage/top_list_rankings'
    paths_from_list = ['top_list/id', 'top_list/rank', 'top_list/previous_rank', 'top_list/description']
    path_to_id_outside_list = 'vintage/id'
    pk_sql = ['vintage_id', 'toplist_id']
    extract_json_list_to_sql(conn, matches, table, path_to_list, paths_from_list, path_to_id_outside_list, pk_sql, first_entry)

In [297]:
conn = connect_to_vivino_db()
try:
#     clean_sql_table(conn, 'vintage_toplist')
    insert_to_vintage_toplist(conn, recovered_data_distinct, True)
finally:
    conn.close()

Insertion complete and took 0.4511551856994629 s.
Number of unique records is accurate


In [298]:
conn = connect_to_vivino_db()
try: 
    count_unique_records_sql(conn, 'vintage_toplist')
finally:
    conn.close()

1900
