# Intro

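This notebook builds the `hubspot_associations` table, which links HubSpot contacts to HubSpot companies. It first snapshots the existing table to `previous_hubspot_associations`, then rebuilds the table and writes the rows that differ from the snapshot to `hubspot_associations_delta`.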


## Change History

<style>
  table {margin-left: 0 !important;}
</style>

| Date       | Author   | Description  |
| :--------- | :------- | :----------- |
| 2025-02-12 | Mclain R | Created Date |

# Code

## Imports

###### pyspark.sql
- **SparkSession**

In [1]:
%%pyspark

from pyspark.sql import SparkSession

In [2]:
# Initialize the Spark session with the specific configuration to improve join performance
spark = SparkSession.builder \
    .appName("Optimized Joins") \
    .config("spark.advise.nonEqJoinConvertRule.enable", "true") \
    .getOrCreate()

## Define Parameters
- none

Note: the following is a parameter cell and will be interpreted by Pipelines as such.

## Reused Functions
- none

## Define Fields

- **workspace_name**: name of workspace


## Process Data

Copy the current `hubspot_associations` table to `previous_hubspot_associations` before it is rebuilt, so the prior run is kept for historical tracking.

In [3]:
%%pyspark
df = spark.sql("SELECT * FROM smartsync_lakehouse.hubspot_associations")

In [4]:
%%pyspark
df.write.format("delta")\
    .mode("overwrite")\
    .option("overwriteSchema", "true")\
    .save('abfss://Mythic@onelake.dfs.fabric.microsoft.com/smartsync_lakehouse.Lakehouse/Tables/previous_hubspot_associations')

Temp view of Nucleus contacts that have a HubSpot contact ID, matched on email address.

In [5]:
%%sql
CREATE OR REPLACE TEMPORARY VIEW temp_contacts_1 AS

SELECT DISTINCT
    h.id as contact_id,
    contact.uniqueaccountid,
    contact.job_title as persona
FROM silver_lakehouse.nucleus__combined_contacts contact
INNER JOIN silver_lakehouse.hubspot__contact h
    ON contact.email = h.property_email

<Spark SQL result set with 0 rows and 0 fields>

In [6]:
%%sql
SELECT count(*) FROM temp_contacts_1

<Spark SQL result set with 1 rows and 1 fields>
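
The view keeps only Nucleus contacts whose email also exists in HubSpot, because an inner join drops unmatched rows on both sides, and `DISTINCT` collapses duplicate source rows. The following is a minimal sketch of that behaviour, using Python's built-in `sqlite3` in place of Spark SQL. The table names `nucleus_contacts` and `hubspot_contact` and every row are made-up assumptions for illustration, not the real lakehouse data.

```python
import sqlite3

# In-memory SQLite stands in for Spark SQL; tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nucleus_contacts (email TEXT, uniqueaccountid TEXT, job_title TEXT);
CREATE TABLE hubspot_contact (id INTEGER, property_email TEXT);

INSERT INTO nucleus_contacts VALUES
    ('ann@example.com', 'A1', 'sales_manager'),
    ('ann@example.com', 'A1', 'sales_manager'),   -- duplicate source row
    ('bob@example.com', 'B2', 'owner/principal'); -- no HubSpot match
INSERT INTO hubspot_contact VALUES
    (101, 'ann@example.com'),
    (102, 'cat@example.com');                     -- no Nucleus match
""")

# Same shape as the temp_contacts_1 query: inner join on email, then DISTINCT.
matched = conn.execute("""
    SELECT DISTINCT h.id AS contact_id, c.uniqueaccountid, c.job_title AS persona
    FROM nucleus_contacts c
    INNER JOIN hubspot_contact h ON c.email = h.property_email
""").fetchall()

print(matched)
```

Only the contact present on both sides survives, and it appears once despite the duplicated source row.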

Temp view of HubSpot companies that have a primary unique ID.

In [7]:
%%sql
CREATE OR REPLACE TEMPORARY VIEW temp_company_1 AS

SELECT DISTINCT
    id as company_id,
    property_icrm_account_id,
    property_dcrm_account_id
FROM silver_lakehouse.hubspot__company
WHERE property_primary_unique_id IS NOT NULL

<Spark SQL result set with 0 rows and 0 fields>

Temp view of Nucleus contacts that have a matching HubSpot company. Matches on the icrm account ID and on the dcrm account ID are combined with `UNION`, which also removes duplicate rows.

In [8]:
%%sql
CREATE OR REPLACE TEMPORARY VIEW temp_company_2 AS

SELECT DISTINCT
    a.contact_id,
    company.company_id,
    a.persona
FROM temp_contacts_1 a
INNER JOIN temp_company_1 company
    ON a.uniqueaccountid = company.property_icrm_account_id

UNION

SELECT DISTINCT
    a.contact_id,
    company.company_id,
    a.persona
FROM temp_contacts_1 a
INNER JOIN temp_company_1 company
    ON a.uniqueaccountid = company.property_dcrm_account_id

<Spark SQL result set with 0 rows and 0 fields>

In [9]:
%%sql
SELECT count(*) FROM temp_company_2

<Spark SQL result set with 1 rows and 1 fields>
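
To illustrate why `UNION` (rather than `UNION ALL`) matters for this step, here is a small sketch using Python's built-in `sqlite3` as a stand-in for Spark SQL. The tables `contacts` and `companies`, the shortened column names `icrm_id` and `dcrm_id`, and all rows are hypothetical. A contact whose account ID matches the same company through both columns should appear only once.

```python
import sqlite3

# In-memory SQLite stands in for Spark SQL; names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (contact_id INTEGER, uniqueaccountid TEXT, persona TEXT);
CREATE TABLE companies (company_id INTEGER, icrm_id TEXT, dcrm_id TEXT);

INSERT INTO contacts VALUES
    (1, 'X1', 'sales_manager'),
    (2, 'D9', 'finance_manager');
INSERT INTO companies VALUES
    (500, 'X1', 'X1'),   -- same account id in both columns
    (600, 'I7', 'D9');   -- matches contact 2 through the dcrm column only
""")

# One branch per account-id column; UNION removes the repeated (1, 500) match.
rows = sorted(conn.execute("""
    SELECT a.contact_id, c.company_id, a.persona
    FROM contacts a INNER JOIN companies c ON a.uniqueaccountid = c.icrm_id
    UNION
    SELECT a.contact_id, c.company_id, a.persona
    FROM contacts a INNER JOIN companies c ON a.uniqueaccountid = c.dcrm_id
""").fetchall())

print(rows)
```

Contact 1 matches company 500 in both branches but is returned once; contact 2 is picked up only by the dcrm branch.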

Temp view that keeps only the association labels that apply to contact-to-company associations, joined to each contact's persona.

In [10]:
%%sql
CREATE OR REPLACE TEMPORARY VIEW temp_associations AS

SELECT DISTINCT
    hub.from_object_type,
    hub.to_object_type,
    hub.name as hubspot_label,
    hub.id as hubspot_code,
    hub.category as hubspot_type,
    b.contact_id,
    b.company_id
FROM temp_company_2 b
INNER JOIN silver_lakehouse.hubspot__association_type hub
    ON b.persona = hub.label
WHERE hub.from_object_type = 'contact'
    AND hub.to_object_type = 'company'

<Spark SQL result set with 0 rows and 0 fields>

In [11]:
%%sql
SELECT count(*) FROM temp_associations

<Spark SQL result set with 1 rows and 1 fields>
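
The label step does two things at once: the join drops contacts whose persona has no matching association label, and the `WHERE` clause drops association types defined for other object pairs. A minimal sketch with Python's built-in `sqlite3` standing in for Spark SQL follows. The tables `matches` and `association_type`, the label text `'Sales Manager'`, and the codes 38 and 90 are assumptions made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for Spark SQL; names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE matches (contact_id INTEGER, company_id INTEGER, persona TEXT);
CREATE TABLE association_type (
    id INTEGER, name TEXT, label TEXT, category TEXT,
    from_object_type TEXT, to_object_type TEXT
);

INSERT INTO matches VALUES
    (1, 500, 'Sales Manager'),
    (2, 600, 'Intern');          -- persona with no association label
INSERT INTO association_type VALUES
    (37, 'sales_manager', 'Sales Manager', 'USER_DEFINED', 'contact', 'company'),
    (38, 'sales_manager', 'Sales Manager', 'USER_DEFINED', 'company', 'contact'),
    (90, 'sales_manager', 'Sales Manager', 'USER_DEFINED', 'contact', 'deal');
""")

# Join persona to label, then keep only contact -> company association types.
labelled = conn.execute("""
    SELECT DISTINCT hub.name AS hubspot_label, hub.id AS hubspot_code,
                    m.contact_id, m.company_id
    FROM matches m
    INNER JOIN association_type hub ON m.persona = hub.label
    WHERE hub.from_object_type = 'contact'
      AND hub.to_object_type = 'company'
""").fetchall()

print(labelled)
```

Without the `WHERE` clause, contact 1 would be returned three times, once per association type that shares the label.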

Create the associations table.

In [12]:
%%sql
CREATE OR REPLACE TABLE hubspot_associations
USING DELTA
LOCATION 'Tables/hubspot_associations' AS

SELECT DISTINCT
    from_object_type,
    contact_id as from_id,
    to_object_type,
    company_id as to_id,
    hubspot_label,
    hubspot_code,
    hubspot_type
FROM temp_associations

<Spark SQL result set with 0 rows and 0 fields>

In [13]:
%%sql
SELECT count(*) FROM hubspot_associations

<Spark SQL result set with 1 rows and 1 fields>

In [14]:
%%sql
SELECT * FROM hubspot_associations LIMIT 20

<Spark SQL result set with 20 rows and 7 fields>

Create the difference table: rows in the new associations table that are not in the previous one.

In [15]:
differences = spark.sql("""
SELECT *
FROM hubspot_associations
EXCEPT
SELECT *
FROM previous_hubspot_associations
""")

differences.count()

+----------------+-----------+--------------+-----------+---------------+------------+------------+
|from_object_type|    from_id|to_object_type|      to_id|  hubspot_label|hubspot_code|hubspot_type|
+----------------+-----------+--------------+-----------+---------------+------------+------------+
|         contact|16290176522|       company|26519562534|        unknown|          39|USER_DEFINED|
|         contact|16290176522|       company|26522671395|        unknown|          39|USER_DEFINED|
|         contact|77244886351|       company|26516415102|        unknown|          39|USER_DEFINED|
|         contact|     232450|       company|26516701848| never_targeted|          29|USER_DEFINED|
|         contact|     309933|       company|26531588160|owner/principal|          33|USER_DEFINED|
|         contact|     254205|       company|26532660653|  sales_manager|          37|USER_DEFINED|
|         contact|     168026|       company|29008408869|finance_manager|          25|USER_DEFINED|
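
The direction of `EXCEPT` determines what counts as a difference. The sketch below uses Python's built-in `sqlite3` as a stand-in for Spark SQL, with hypothetical tables `current_assoc` and `previous_assoc` reduced to three columns and made-up rows. It shows that "current EXCEPT previous" returns new rows and rows whose values changed, but not rows that disappeared.

```python
import sqlite3

# In-memory SQLite stands in for Spark SQL; names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE current_assoc (from_id INTEGER, to_id INTEGER, hubspot_code INTEGER);
CREATE TABLE previous_assoc (from_id INTEGER, to_id INTEGER, hubspot_code INTEGER);

INSERT INTO previous_assoc VALUES
    (1, 500, 37),   -- unchanged
    (2, 600, 25),   -- code changes in the current run
    (3, 700, 33);   -- removed in the current run
INSERT INTO current_assoc VALUES
    (1, 500, 37),
    (2, 600, 29),
    (4, 800, 39);   -- new in the current run
""")

# Rows in the current table that have no identical row in the previous one.
diff = sorted(conn.execute("""
    SELECT * FROM current_assoc
    EXCEPT
    SELECT * FROM previous_assoc
""").fetchall())

print(diff)
```

The removed row `(3, 700, 33)` is absent from the result. Capturing removals would need a second query with the two tables swapped.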


In [16]:
%%pyspark
differences.write.format("delta")\
    .mode("overwrite")\
    .option("overwriteSchema", "true")\
    .save('abfss://Mythic@onelake.dfs.fabric.microsoft.com/smartsync_lakehouse.Lakehouse/Tables/hubspot_associations_delta')
