# Importing SnowFlake database server data into KatanaGraph using Python Pandas/Dask DataFrames

[![](./01_Images/SnowFlake_logo.png)](https://www.katanagraph.com)

**This section details how to import data from a SnowFlake database server into KatanaGraph using Python Pandas/Dask DataFrames.**
<br>
Presumably, SnowFlake is acting as the system of record for operational data, and you wish to import that data into KatanaGraph,
where you can perform graph queries, graph analytics, and graph machine learning and mining.

## Importing data into KatanaGraph:

All told, there are several ways to get data into KatanaGraph, as well as to change data that is already present:
importing, manipulating, and mutating.

- For a full treatment of importing, manipulating, and mutating data in KatanaGraph, see [Here](https://www/google.com).
- Here we detail importing data from SnowFlake into KatanaGraph using Python Pandas/Dask DataFrames.

To source data from SnowFlake, you can use SQLAlchemy ([Here](https://docs.snowflake.com/en/user-guide/sqlalchemy.html))
or the SnowFlake Connector for Python ([Here](https://docs.snowflake.com/en/user-guide/python-connector-pandas.html)).
This section uses the latter.


## In this section, the following assumptions are made:

-  You have a functional KatanaGraph cluster that you can authenticate against, with at least 3 worker nodes. See [Here](https://www/google.com).
-  You have a functional SnowFlake database server that you can authenticate against. If you are using the SnowFlake free/developer tier,
   you will need a username, a password, and your "account" value. One way to obtain your account value is to check your Welcome/Signup
   email from SnowFlake. In the image below, the value to copy is KR61275.us-central1.gcp
   
![Account Value](./01_Images/SnowFlake_creds.png)
   
-  For this example, we use the common SnowFlake demonstration database, snowflake_sample_data.tpch_sf1, and its tables
   Customer and Nation. Customer and Nation become our vertices/nodes. For edge/relationship records, we extract the join pairs between
   Customer and Nation from the Customer table.

![Data Model](./01_Images/models2.png) 
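
To make the data model concrete, the following minimal sketch derives edge pairs from a Customer table using Pandas alone. The rows and the `NATION_A`/`NATION_B` names are made up for illustration and are not taken from snowflake_sample_data; only the column names follow the tables used in this section.

```python
import pandas as pd

# Illustrative rows only; the real tables live in snowflake_sample_data.tpch_sf1.
customers = pd.DataFrame({
    "c_custkey":   [1, 2, 3],
    "c_name":      ["Customer#1", "Customer#2", "Customer#3"],
    "c_nationkey": [15, 13, 15],
})
nations = pd.DataFrame({
    "n_nationkey": [13, 15],
    "n_name":      ["NATION_A", "NATION_B"],
})

# Each Customer row already carries the key of its Nation, so the edge list is
# the (c_custkey, c_nationkey) pairs, with the second column renamed to match
# the id column of the Nation nodes.
edges = (
    customers[["c_custkey", "c_nationkey"]]
    .rename(columns={"c_nationkey": "n_nationkey"})
)

print(edges)
```

Every Customer row yields exactly one edge, so the edge DataFrame has as many rows as the Customer DataFrame.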



In [None]:
import pandas as pd
from pandas import DataFrame


#  From,
#     https://docs.snowflake.com/en/user-guide/python-connector-pandas.html
#
#  pip install "snowflake-connector-python[pandas]"

import snowflake.connector

print("--")


In [None]:
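#  Our credentials allowing connectivity to SnowFlake.
#  Replace the password placeholder with your own value, and avoid
#  saving a real password in a shared notebook.

l_connector_sf  = snowflake.connector.connect(
   user         = "farrell0",
   password     = "<your_password>",
   account      = "KR61275.us-central1.gcp"
   )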

print("--")
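
One way to keep credentials out of the notebook is to read them from environment variables. The sketch below is one possible approach, not something the SnowFlake connector requires; the helper name `snowflake_credentials` and the variable names `SNOWFLAKE_USER`, `SNOWFLAKE_PASSWORD`, and `SNOWFLAKE_ACCOUNT` are hypothetical and can be anything you choose.

```python
import os

def snowflake_credentials(env=os.environ):
    """Collect connection settings from environment variables.

    The variable names below are hypothetical; choose any names you like.
    """
    required = {
        "user":     "SNOWFLAKE_USER",
        "password": "SNOWFLAKE_PASSWORD",
        "account":  "SNOWFLAKE_ACCOUNT",
    }
    # Fail early, and name every missing variable, rather than failing at connect time.
    missing = [name for name in required.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in required.items()}

# Usage, assuming the variables are set in the notebook's environment:
# l_connector_sf = snowflake.connector.connect(**snowflake_credentials())
```

The returned dictionary uses the same keyword names (`user`, `password`, `account`) that the connect call in this section uses.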


In [None]:
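#  Verification: can we connect to SnowFlake correctly?
#  (Uncomment to run. Note that the final close() ends the session,
#  so reconnect before running the cells that follow.)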

# l_cursor = l_connector_sf.cursor()
# 
# try:
#    l_cursor.execute("SELECT current_version()")
#    one_row = l_cursor.fetchone()
#    print(one_row[0])
# 
# finally:
#    l_cursor.close()
#     
# l_connector_sf.close()
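
A variation on the check above closes only the cursor and leaves the connection open for the cells that follow. This is a minimal sketch: `fetch_scalar` is a hypothetical helper, and an in-memory SQLite database stands in for SnowFlake so that the example runs without an account.

```python
import sqlite3
from contextlib import closing

def fetch_scalar(connection, sql):
    """Run a query that returns a single value, always closing the cursor."""
    with closing(connection.cursor()) as cursor:
        cursor.execute(sql)
        row = cursor.fetchone()
    return row[0] if row is not None else None

# Stand-in connection so the sketch runs without a SnowFlake account.
demo_connection = sqlite3.connect(":memory:")
demo_version = fetch_scalar(demo_connection, "SELECT sqlite_version()")
print(demo_version)

# Against SnowFlake, the equivalent call would be:
# fetch_scalar(l_connector_sf, "SELECT current_version()")
```

Because the helper never closes the connection itself, the same `l_connector_sf` handle stays usable afterwards.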


In [None]:
#  Get SnowFlake: Customer

l_customer1  = DataFrame(list(l_connector_sf.cursor().execute("SELECT c_custkey, c_name, c_address, c_phone, c_comment, c_mktsegment, c_nationkey, c_acctbal FROM snowflake_sample_data.tpch_sf1.customer").fetchall()))
   #
l_customer2 = l_customer1.rename(columns={0: "c_custkey", 1: "c_name", 2: "c_address", 3: "c_phone", 4: "c_comment", 5: "c_mktsegment", 6: "c_nationkey", 7: "c_acctbal"})


display("Number of records: " + str(len(l_customer2.index)))
   #
display(l_customer2.head(5))


print("--")    
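
The fetch-then-rename pattern above repeats for each table, so it can be folded into a small helper. The sketch below is one possible shape, not part of either library: `query_to_dataframe` is a hypothetical name, and an in-memory SQLite table with two illustrative rows stands in for SnowFlake.

```python
import sqlite3
import pandas as pd

def query_to_dataframe(connection, table, columns):
    """Select the given columns from a table and return a labelled DataFrame.

    The SQL is assembled from the arguments, so pass only trusted table and
    column names.
    """
    sql = "SELECT " + ", ".join(columns) + " FROM " + table
    cursor = connection.cursor()
    try:
        rows = cursor.execute(sql).fetchall()
    finally:
        cursor.close()
    # Passing columns= labels the frame directly, so no rename step is needed.
    return pd.DataFrame(rows, columns=columns)

# Stand-in database so the sketch runs without a SnowFlake account.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE nation (n_nationkey INTEGER, n_name TEXT)")
demo.executemany("INSERT INTO nation VALUES (?, ?)", [(0, "ALGERIA"), (1, "ARGENTINA")])

demo_nation = query_to_dataframe(demo, "nation", ["n_nationkey", "n_name"])
print(demo_nation)
```

Against SnowFlake, the call would look like `query_to_dataframe(l_connector_sf, "snowflake_sample_data.tpch_sf1.nation", ["n_nationkey", "n_name", "n_comment", "n_regionkey"])`. The connector's pandas extra also provides `fetch_pandas_all()` on the cursor, which returns a DataFrame directly; its column names come from SnowFlake and may differ in case from the lower-case names used here.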
    

In [None]:
#  Get SnowFlake: Nation

l_nation1  = DataFrame(list(l_connector_sf.cursor().execute("SELECT n_nationkey, n_name, n_comment, n_regionkey FROM snowflake_sample_data.tpch_sf1.nation").fetchall()))
   #
l_nation2 = l_nation1.rename(columns={0: "n_nationkey", 1: "n_name", 2: "n_comment", 3: "n_regionkey"})


display("Number of records: " + str(len(l_nation2.index)))
   #
display(l_nation2.head(5))


print("--") 


In [None]:
#  Get SnowFlake: (Edge records)

l_InNation1  = DataFrame(list(l_connector_sf.cursor().execute("SELECT c_custkey, c_nationkey FROM snowflake_sample_data.tpch_sf1.customer").fetchall()))
   #
l_InNation2 = l_InNation1.rename(columns={0: "c_custkey", 1: "n_nationkey"})


display("Number of records: " + str(len(l_InNation2.index)))
   #
display(l_InNation2.head(5))


print("--") 
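
Before importing, it can help to confirm that the three DataFrames agree with each other. The checks below are an assumption about what is worth verifying, not documented requirements of the KatanaGraph importer: node ids are present and unique, and every edge endpoint refers to an existing node id. `find_import_problems` is a hypothetical helper and the rows are illustrative.

```python
import pandas as pd

def find_import_problems(src_nodes, src_id, dst_nodes, dst_id, edges, edge_src, edge_dst):
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    # Node ids should be present and unique within their id space.
    for label, frame, column in (("source", src_nodes, src_id),
                                 ("destination", dst_nodes, dst_id)):
        if frame[column].isna().any():
            problems.append(f"{label} nodes have null values in {column}")
        if frame[column].duplicated().any():
            problems.append(f"{label} nodes have duplicate values in {column}")
    # Every edge endpoint should refer to an existing node id.
    if not edges[edge_src].isin(src_nodes[src_id]).all():
        problems.append(f"edges reference unknown {edge_src} values")
    if not edges[edge_dst].isin(dst_nodes[dst_id]).all():
        problems.append(f"edges reference unknown {edge_dst} values")
    return problems

# Illustrative data: the last edge points at a Nation key that does not exist.
demo_customers = pd.DataFrame({"c_custkey": [1, 2, 3]})
demo_nations = pd.DataFrame({"n_nationkey": [13, 15]})
demo_edges = pd.DataFrame({"c_custkey": [1, 2, 3], "n_nationkey": [15, 13, 99]})

print(find_import_problems(demo_customers, "c_custkey", demo_nations, "n_nationkey",
                           demo_edges, "c_custkey", "n_nationkey"))
```

With the DataFrames from this section, the call would be `find_import_problems(l_customer2, "c_custkey", l_nation2, "n_nationkey", l_InNation2, "c_custkey", "n_nationkey")`.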

## Graph setup

In [None]:
#  Variable settings used later

NUM_PARTITIONS  = 5

DB_NAME         = "my_db"
GRAPH_NAME      = "my_graph"

print("--")


In [None]:
#  Get a KatanaGraph Connection handle 

import os

from katana import remote
from katana.remote import import_data


my_client = remote.Client()

print(my_client)


In [None]:
#  CREATE DATABASE

my_database = my_client.create_database(name=DB_NAME)

print(my_database.database_id)

In [None]:
#  CREATE A GRAPH

my_graph=my_client.get_database(name=DB_NAME).create_graph(name=GRAPH_NAME, num_partitions=NUM_PARTITIONS)

print(my_graph)

## Making the graph from the 3 previously imported DataFrames

In [None]:
# Import the 3 previously created Python DataFrames (Customer, Nation, and the edge pairs) into KatanaGraph

with import_data.DataFrameImporter(my_graph) as df_importer:   
    
   df_importer.nodes_dataframe(l_customer2,                    #  Customer set of Nodes
      id_column             = "c_custkey",
      id_space              = "Customer",  
      label                 = "Customer",  
      )
    
   df_importer.nodes_dataframe(l_nation2,                      #  Nation set of Nodes
      id_column             = "n_nationkey",
      id_space              = "Nation", 
      label                 = "Nation", 
      )
   
   df_importer.edges_dataframe(l_InNation2,                    #  Our Edge, specifying the relationship between Customer --> IN_NATION --> Nation
      source_id_space       = "Customer", 
      destination_id_space  = "Nation",   
      source_column         = "c_custkey",
      destination_column    = "n_nationkey",
      type                  = "IN_NATION"
      )

print("--")

## The result set should resemble

![Data Model](./01_Images/result_set3.png)  

In [None]:
#  Take a look at the graph ..

display(my_graph.num_nodes())
display(my_graph.num_edges())

l_result = my_graph.query("""

   MATCH (n)  - [ r ] ->  (m )
   RETURN n, m, r
   LIMIT 100
   
   """, contextualize=True)

l_result.view()
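
As a quick sanity check, the node and edge counts reported by the graph can be compared with the sizes of the source DataFrames. The sketch below assumes that the import creates one node per node row and one edge per edge row, and that the graph was empty beforehand; `expected_graph_counts` is a hypothetical helper and the rows are illustrative.

```python
import pandas as pd

def expected_graph_counts(node_frames, edge_frames):
    """Return (node_count, edge_count), assuming one node or edge per row."""
    node_count = sum(len(frame) for frame in node_frames)
    edge_count = sum(len(frame) for frame in edge_frames)
    return node_count, edge_count

# Illustrative data: 3 customers, 2 nations, and one edge per customer.
demo_customers = pd.DataFrame({"c_custkey": [1, 2, 3]})
demo_nations = pd.DataFrame({"n_nationkey": [13, 15]})
demo_edges = pd.DataFrame({"c_custkey": [1, 2, 3], "n_nationkey": [15, 13, 15]})

demo_counts = expected_graph_counts([demo_customers, demo_nations], [demo_edges])
print(demo_counts)
```

With the DataFrames from this section, `expected_graph_counts([l_customer2, l_nation2], [l_InNation2])` gives the pair to compare against `my_graph.num_nodes()` and `my_graph.num_edges()`.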