## Example: Connecting Hopsworks with AzureSQL

### Instructions

#### Retrieve the connection details from Azure Portal

You can connect to AzureSQL from Hopsworks over JDBC. The JDBC connection details are available in the Azure Portal:

<div>
<img src="images/AzureSQL_ConnDetails.png" width="800"/>
</div>

#### Open the Firewall (Optional)
Depending on your AzureSQL firewall configuration, you might need to add the Hopsworks IP addresses to the firewall's allow list.

#### Create the storage connector in Hopsworks

Set all the connection attributes shown in the screenshot above as `arguments` in a storage connector of type `JDBC` in Hopsworks.
Additionally, set an extra argument named `driver` with the value `com.microsoft.sqlserver.jdbc.SQLServerDriver`.

Relevant Documentation: https://docs.hopsworks.ai/latest/user_guides/fs/storage_connector/creation/jdbc/

<div>
<img src="images/Hopsworks_JDBC_SC.png" width="800"/>
</div>
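
As a rough illustration of this step, the sketch below splits a JDBC connection string into a base URL and a dictionary of key/value pairs that could be entered as connector `arguments`. The helper `split_jdbc_connection_string` is hypothetical and not part of the Hopsworks API. The server, database, and user names are placeholders. The sketch assumes the string copied from the Azure Portal is a base URL followed by `;`-separated `key=value` pairs.

```python
def split_jdbc_connection_string(conn_str):
    """Split a JDBC string into (base_url, arguments).

    Hypothetical helper for illustration only. It assumes no value
    contains a literal ';' (for example inside a braced password).
    """
    base_url, _, rest = conn_str.partition(";")
    arguments = {}
    for part in rest.split(";"):
        if not part.strip():
            continue  # skip the empty piece after a trailing ';'
        key, _, value = part.partition("=")
        arguments[key.strip()] = value.strip()
    return base_url, arguments


# Placeholder connection string; replace it with the one from your Azure Portal.
example = (
    "jdbc:sqlserver://myserver.database.windows.net:1433;"
    "database=mydb;user=myuser@myserver;password={your_password_here};"
    "encrypt=true;trustServerCertificate=false;"
    "hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
)

base_url, arguments = split_jdbc_connection_string(example)

# The extra argument required by this guide.
arguments["driver"] = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

print(base_url)
for name, value in arguments.items():
    print(f"{name} = {value}")
```

Each printed `name = value` pair corresponds to one argument row in the storage connector form.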

#### Download the JDBC Driver JAR and upload it in your project in Hopsworks

You can download the AzureSQL JDBC driver from https://learn.microsoft.com/en-us/sql/connect/jdbc/download-microsoft-jdbc-driver-for-sql-server
The zip file contains JARs for both Java 8 and Java 11. Upload and use the Java 8 JAR.
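
A minimal sketch for picking the Java 8 JAR out of the extracted file names, assuming the driver JARs carry a `.jre8.jar` / `.jre11.jar` suffix. The helper `pick_java8_jar` and the `X.Y.Z` version in the file names are hypothetical placeholders.

```python
def pick_java8_jar(filenames):
    """Return the first file name marked as the Java 8 build, or None."""
    for name in filenames:
        if name.lower().endswith(".jre8.jar"):
            return name
    return None


# Placeholder names; the real version number depends on the download.
candidates = [
    "mssql-jdbc-X.Y.Z.jre11.jar",
    "mssql-jdbc-X.Y.Z.jre8.jar",
]

print(pick_java8_jar(candidates))
```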

### Use the connector


In [2]:
import hopsworks
project = hopsworks.login()
fs = project.get_feature_store()

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/15480
Connected. Call `.close()` to terminate connection gracefully.

#### Retrieve the Storage Connector using the APIs

In [21]:
sc = fs.get_storage_connector("azure_sql")

#### Example 1: External Feature Groups 

Use the storage connector to create an external feature group in Hopsworks.
    
<div class="alert alert-info">
These APIs are only supported in a (Py)Spark execution engine.
</div>

Specify a query (e.g. `SELECT * FROM test`) that is executed whenever the feature data is needed, for example to create a new training dataset.
With external feature groups, the offline data stays in the external system, so the query is executed each time.

Relevant Documentation: https://docs.hopsworks.ai/latest/user_guides/fs/feature_group/create_external/

In [4]:
external_feature_group = fs.create_external_feature_group(
    name="profiles_upstream",
    version=1,
    storage_connector=sc,
    query="SELECT * FROM test",
    statistics_config={'histograms': True, 'correlations': True}
)

In [15]:
external_feature_group.save()

#### Example 2: Derived Feature Groups 

Use the previously created external feature group as a data source to create additional derived features:

In [5]:
external_feature_group = fs.get_external_feature_group(name="profiles_upstream", version=1)

In [7]:
profiles_df = external_feature_group.read()

In [8]:
profiles_df.show(5)

+-------------+---+--------------------+----------+------------+-------+----------------+
|         name|sex|                mail| birthdate|        city|country|          cc_num|
+-------------+---+--------------------+----------+------------+-------+----------------+
|Tonya Gregory|  F|sandratorres@hotm...|1976-01-16|Far Rockaway|     US|4796807885357879|
| Lisa Gilbert|  F| michael53@yahoo.com|1986-09-30|   Encinitas|     US|4529266636192966|
|Carolyn Meyer|  F| anthony47@yahoo.com|2001-07-13|      Canton|     US|4922690008243953|
|  Sara Morris|  F|  amylloyd@yahoo.com|1938-06-23|  Greenpoint|     US|4897369589533543|
|  Paul Ashley|  M|matthew97@hotmail...|1974-12-06|     Rutland|     US|4848518335893425|
+-------------+---+--------------------+----------+------------+-------+----------------+
only showing top 5 rows

In [11]:
from pyspark.sql import functions as F

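# Approximate age in years. In Spark date patterns `MM` is the month; `mm` would mean minutes.
derived_df = profiles_df.withColumn(
    "age",
    F.floor(F.datediff(F.current_date(), F.to_date(F.col("birthdate"), "yyyy-MM-dd")) / 365.25),
)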
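
The `age` column is the number of days since `birthdate` divided by 365.25 and rounded down. The plain-Python sketch below reproduces that arithmetic with a hypothetical helper, `approx_age_years`, to show that the result is an approximation and can be off by one on or near a birthday.

```python
import math
from datetime import date


def approx_age_years(birthdate, today):
    """Floor of (days elapsed / 365.25), mirroring the Spark expression."""
    days = (today - birthdate).days
    return math.floor(days / 365.25)


# 2000 is a leap year: 366 days elapsed, 366 / 365.25 > 1
print(approx_age_years(date(2000, 1, 1), date(2001, 1, 1)))  # 1

# 2001 is not a leap year: 365 days elapsed, 365 / 365.25 < 1
print(approx_age_years(date(2001, 1, 1), date(2002, 1, 1)))  # 0
```

If exact ages matter, compare the month and day of the two dates instead of dividing by an average year length.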

In [14]:
derived_fg = fs.get_or_create_feature_group(
    name="profiles_derived",
    version=1,
    primary_key=['mail'],
    online_enabled=True,
    parents=[external_feature_group],
    statistics_config={'histograms': True, 'correlations': True}
)

In [15]:
derived_fg.insert(derived_df)

Feature Group created successfully, explore it at 
https://snurran.hops.works/p/15480/fs/15428/fg/16402
(None, None)

#### Example 3: Create a Training Dataset

<div class="alert alert-info">
These APIs are also supported from a Python Engine:
    
    - https://docs.hopsworks.ai/feature-store-api/latest/generated/api/feature_view_api/#create_train_test_split
    
    - https://docs.hopsworks.ai/feature-store-api/latest/generated/api/feature_view_api/#create_training_data
    
    - https://docs.hopsworks.ai/feature-store-api/latest/generated/api/feature_view_api/#create_train_validation_test_split
</div>

In [17]:
fv = fs.create_feature_view(
    name="azure_sql_demo",
    version=1,
    query=external_feature_group.select_all(),
)

Feature view created successfully, explore it at 
https://snurran.hops.works/p/15480/fs/15428/fv/azure_sql_demo/version/1

In [18]:
fv.create_training_data()

(1, None)

#### Example 4: Use the Storage Connector to read the data in a Spark DataFrame

You can use this option to retrieve raw data for feature creation without having to create an external feature group.

In [22]:
df = sc.read(query="""
    SELECT *
    FROM test
    WHERE city = 'Canton'
""")

In [23]:
df.show(10)

+----------------+---+--------------------+----------+------+-------+----------------+
|            name|sex|                mail| birthdate|  City|Country|          cc_num|
+----------------+---+--------------------+----------+------+-------+----------------+
|   Carolyn Meyer|  F| anthony47@yahoo.com|2001-07-13|Canton|     US|4922690008243953|
|Brandon Mitchell|  M|  ajackson@yahoo.com|1967-03-25|Canton|     US|4928442302922211|
|     John Sutton|  M| qcalderon@gmail.com|1998-10-18|Canton|     US|4459273780148699|
|    Taylor Pitts|  F|cohenrussell@gmai...|1989-10-08|Canton|     US|4038150065544828|
|   Larry Andrews|  M| annette68@yahoo.com|1938-02-06|Canton|     US|4421833463642311|
|     Hector Cook|  M| utucker@hotmail.com|1963-10-04|Canton|     US|4069293169784098|
|  Abigail Murray|  F|rodriguezjulie@ho...|1971-02-01|Canton|     US|4839891313999949|
|Jessica Gonzales|  F|rphillips@hotmail...|1936-01-09|Canton|     US|4179358160776166|
|  Anthony Fowler|  M|   jason33@yahoo.com|