## Some housekeeping before we start

In [None]:
USE ROLE ACCOUNTADMIN;
use schema ML_LINEAGE_DATABASE.ML_LINEAGE_SCHEMA;

In [None]:
import streamlit as st
st.image("Lineage_title.png")

**Agenda**
1. Intro to Lineage and Use Cases
2. Lineage Metadata
    - Lineage Views in ACCOUNT_USAGE
    - Lineage APIs
3. Create/Visualize Data Lineage Workflow
    - Stages, Tables, DTs, Views
4. Create/Visualize ML Lineage Workflow
    - Feature Views, Training Data, Models
5. Wrapup

### To access the notebooks used in this demo

### https://github.com/sfc-gh-tmanfredi/SnowflakeLineage

In [None]:
import streamlit as st
st.image("DataMLLineageonSnowflake.png")

In [None]:
import streamlit as st
st.image("LineageUseCases2.png")

In [None]:
import streamlit as st
st.image("LineageRelationships.png")

### 3 Important sources of lineage info: 

Query History
https://docs.snowflake.com/en/sql-reference/account-usage/query_history

Access History
https://docs.snowflake.com/en/sql-reference/account-usage/access_history

Object Dependencies
https://docs.snowflake.com/en/sql-reference/account-usage/object_dependencies


In [None]:
select * from SNOWFLAKE.ACCOUNT_USAGE.Query_History 
where   query_text like 'CREATE OR REPLACE VIEW%'
 and start_time::date > current_date()-7
limit 10;

In [None]:
select * from snowflake.account_usage.access_history 
where user_name = 'TMANFREDI' 
and query_id = '01bc0ae3-0708-325d-0027-57030cc72e0a'
      
limit 100;


In [None]:
select *
from snowflake.account_usage.object_dependencies 
where 
    referenced_object_domain = 'VIEW'
limit 10;

In [None]:
import streamlit as st
st.image("LineageArchitectureDetail.png")

In [None]:
import streamlit as st
st.image("LineageSupportedObjects.png")

In [None]:
import streamlit as st
st.image("LineageConsiderations.png")

In [None]:
import streamlit as st
st.image("DataLineageFlow.png")

In [None]:
create or replace table wine_attributes 
(c1 variant);

In [3]:
-- Load the raw wine attributes from an internal stage
COPY INTO wine_attributes
FROM @LINEAGE_STAGE
  FILE_FORMAT=(TYPE=parquet);

In [None]:
select * from wine_attributes;

In [None]:
create or replace table wine_names as
select c_custkey as wine_id, c_address as wine_name, c_name as wine_taster
from snowflake_sample_data.tpch_sf1.customer
where wine_id < 1600;

In [None]:
import streamlit as st
st.image("TagPropogation.png")

### The Wine Names table has a column called Wine Taster which is a person's name.  This may be considered PII so we will tag it and watch the tags propagate through the lineage

In [None]:
alter tag wine_taster_tag unset masking policy simple_mask_string;

create or replace tag wine_taster_tag propagate=ON_DEPENDENCY_AND_DATA_MOVEMENT;

alter table wine_names 
modify column wine_taster set tag wine_taster_tag = 'IS_PII';

### Next lets add a masking policy to protect the PII values

In [None]:
create or replace masking policy simple_mask_string as
  (val string) returns string ->
  case
    when current_role() in ('XACCOUNTADMINX') then val
      else '*** masked *****'
    end;

alter tag wine_taster_tag set masking policy simple_mask_string;

In [None]:
CREATE OR REPLACE DYNAMIC TABLE wine_attributes_dt
    TARGET_LAG = '1 minute'
    WAREHOUSE = COMPUTE_WH
    REFRESH_MODE = auto
    INITIALIZE = on_create
AS
   SELECT
        c1:_COL_0::number as wine_id, 
        case 
            when wine_id between 0 and 999 then 'Red' 
            when wine_id between 1000 and 1299 then 'White'
            when wine_id > 1299 then 'Rose'
        end as wine_color,  
        c1:_COL_2::number as CountryId,
        case 
            when CountryId = 1 then 'US'
            when CountryId = 2 then 'CHILE'
            when CountryId = 3 then 'ARGENTINA'
            when CountryId = 4 then 'AUSTRALIA'
            when CountryId = 5 then 'NEW ZEALAND'
            when CountryId = 6 then 'FRANCE'
            when CountryId = 7 then 'SPAIN'
            when CountryId = 8 then 'ITALY'
            when CountryId = 9 then 'GERMANY'
            when CountryId = 10 then 'GREECE'
        END AS Country,
        b.wine_name,
        b.wine_taster,
        c1:_COL_4::number as VintageYear
    from 
        wine_attributes a, 
        wine_names b
    where b.wine_id = wine_id;

In [None]:
CREATE OR REPLACE DYNAMIC TABLE wine_attributes_french_dt
    TARGET_LAG = '1 minute'
    WAREHOUSE = COMPUTE_WH
    REFRESH_MODE = auto
    INITIALIZE = on_create
AS
    SELECT  
        a.wine_id, 
        a.wine_name,
        a.wine_taster,
        a.wine_color,
        a.countryid,
        a.country,
        a.VintageYear,
        b.alcohol,
        b.chlorides,
        b.citric_acid,
        b.density,
        b.fixed_acidity,
        b.free_sulfur_dioxide,
        b.ph,
        b.quality,
        b.residual_sugar,
        b.sulphates,
        b.total_sulfur_dioxide,
        b.volatile_acidity, 
        b.fixed_acidity / b.ph as acidity_ph_ratio
    FROM wine_attributes_dt a,winedata b
    WHERE a.wine_id = b.wine_id
      AND Country = 'FRANCE';

In [None]:
CREATE OR REPLACE VIEW FrenchWines as 
SELECT * FROM wine_attributes_french_dt;

## Now lets take a look at the lineage that was created...

### First, from the SQL API, then from the Lineage UI in Snowsight

In [None]:
import streamlit as st
st.image("DataLineageFlow.png")

### Lineage to source - what are all the upstream objects that went into the FrenchWines view

In [None]:
SELECT
    DISTANCE,
    SOURCE_OBJECT_DOMAIN,
    SOURCE_OBJECT_DATABASE,
    SOURCE_OBJECT_SCHEMA,
    SOURCE_OBJECT_NAME,
    SOURCE_STATUS,
    TARGET_OBJECT_DOMAIN,
    TARGET_OBJECT_DATABASE,
    TARGET_OBJECT_SCHEMA,
    TARGET_OBJECT_NAME,
    TARGET_STATUS,
FROM TABLE (SNOWFLAKE.CORE.GET_LINEAGE('ML_LINEAGE_DATABASE.ML_LINEAGE_SCHEMA.FRENCHWINES', 'VIEW', 'UPSTREAM',4));

### Now lets get column-level lineage on SULPHATES which appears in the FrenchWines view

In [None]:
SELECT
    DISTANCE,
    SOURCE_OBJECT_DOMAIN,
    SOURCE_OBJECT_DATABASE,
    SOURCE_OBJECT_SCHEMA,
    SOURCE_OBJECT_NAME,
    SOURCE_STATUS,
    TARGET_OBJECT_DOMAIN,
    TARGET_OBJECT_DATABASE,
    TARGET_OBJECT_SCHEMA,
    TARGET_OBJECT_NAME,
    TARGET_STATUS,
FROM TABLE (SNOWFLAKE.CORE.GET_LINEAGE('ML_LINEAGE_DATABASE.ML_LINEAGE_SCHEMA.FRENCHWINES.SULPHATES', 'COLUMN', 'UPSTREAM', 3));

### Impact Analysis - what are all the downstream objects that are impacted by my source?

In [None]:
SELECT
    DISTANCE,
    SOURCE_OBJECT_DOMAIN,
    SOURCE_OBJECT_DATABASE,
    SOURCE_OBJECT_SCHEMA,
    SOURCE_OBJECT_NAME,
    SOURCE_STATUS,
    TARGET_OBJECT_DOMAIN,
    TARGET_OBJECT_DATABASE,
    TARGET_OBJECT_SCHEMA,
    TARGET_OBJECT_NAME,
    TARGET_STATUS,
FROM TABLE (SNOWFLAKE.CORE.GET_LINEAGE('ML_LINEAGE_DATABASE.ML_LINEAGE_SCHEMA.FEATURE_STORE_TEMP_STAGE', 'STAGE', 'DOWNSTREAM', 3));

## Now lets look at the Lineage UI for this same workflow
- Open another Snowsight window or tab
- Navigate to Data->Databases->ML_LINEAGE_DATABASE->ML_LINEAGE_SCHEMA->Views->FrenchWines
- Click the "Lineage" tab in the center of your screen
- Click the "+" signs on any object to navigate between source and target objects in your workflow
- Click any object to see the column list and other metadata for that object
- Notice that the column "WINE_TASTER" in WIN_ATTRIBUTES_DT has inhereted the object tag identify as PII
- "View lineage" for any column in this list
- Find column ACIDITY_PH_RATIO and click View Lineage.  Click "Upstream Lineage".  You will see the two columns that are part of the calculation of this ratio 
- Click "Downstream Lineage" and click the column namd ACIDITY_PH_RATIO. 
- Hover under the description and click "Generate with Cortex". 

In [None]:
-- show datasets;
drop dataset if exists my_dataset;
drop dataset if exists my_dataset_from_table;
drop table if exists MY_TABLE_DATASET_1;
drop DYNAMIC TABLE if exists IDENTIFIER('"ML_LINEAGE_DATABASE"."ML_LINEAGE_SCHEMA"."WINE_FEATURES$1.0"');
drop VIEW if exists IDENTIFIER('"ML_LINEAGE_DATABASE"."ML_LINEAGE_SCHEMA"."EXTRA_WINE_FEATURES$1.0"');
drop TAG if exists IDENTIFIER('"ML_LINEAGE_DATABASE"."ML_LINEAGE_SCHEMA"."SNOWML_FEATURE_STORE_ENTITY_WINE"');

drop model if exists MODEL_TRAINED_ON_DATASET;
drop model if exists MODEL_TRAINED_ON_TABLE;
drop model if exists MODEL_TRAINED_ON_PANDAS;

In [None]:
use schema ML_LINEAGE_DATABASE.ML_LINEAGE_SCHEMA;

## Now, lets expand this example to include components of an ML flow...

In [None]:
import streamlit as st
st.image("MLLineageFlow.png")

In [1]:
from snowflake.snowpark.context import get_active_session
session = get_active_session()

# Add a query tag to the session. This helps with debugging and performance monitoring.
session.query_tag = {"origin":"sf_sit-is", "name":"aiml_notebooks_lineage", "version":{"major":1, "minor":0}, "attributes":{"is_quickstart":1, "source":"notebook"}}

CURRENT_DB = session.get_current_database()
CURRENT_SCHEMA = session.get_current_schema()
print(CURRENT_DB,CURRENT_SCHEMA)

<snowflake.snowpark.session.Session: account="ax_test_qa3", role="ACCOUNTADMIN", database="LINEAGE_DEMO_DB", schema="PUBLIC", warehouse="AX_XL">


In [None]:
# creates 
# external stage: FEATURE_STORE_TEMP_STAGE
# table: winedata (populated w/1.6K rows) 
# file format: FEATURE_STORE_TEMP_FORMAT
from snowflake.ml.feature_store.examples.example_helper import ExampleHelper

example_helper = ExampleHelper(session, CURRENT_DB, CURRENT_SCHEMA)
source_tables = example_helper.load_example('wine_quality_features')

In [None]:
# display as Pandas DataFrame (Source is WineData)
for table in source_tables:
    print(f"{table}:")
    df = session.table(table).limit(5).to_pandas()
    df.style

Unnamed: 0,WINE_ID,FIXED_ACIDITY,VOLATILE_ACIDITY,CITRIC_ACID,RESIDUAL_SUGAR,CHLORIDES,FREE_SULFUR_DIOXIDE,TOTAL_SULFUR_DIOXIDE,DENSITY,PH,SULPHATES,ALCOHOL,QUALITY
0,1,7.4,0.7,0.0,1.9,0.076,11,34,0.9978,3.51,0.56,9.4,5
1,2,7.8,0.88,0.0,2.6,0.098,25,67,0.9968,3.2,0.68,9.8,5
2,3,7.8,0.76,0.04,2.3,0.092,15,54,0.997,3.26,0.65,9.8,5
3,4,11.2,0.28,0.56,1.9,0.075,17,60,0.998,3.16,0.58,9.8,6
4,5,7.4,0.7,0.0,1.9,0.076,11,34,0.9978,3.51,0.56,9.4,5


## 2. Feature View Lineage

We will start by creating a new feature store and register and entities and feature views. More details on feature store APIs can be found [here](https://docs.snowflake.com/en/developer-guide/snowpark-ml/feature-store/overview)). For the detailed workflow of feature store refer to the notebook [here](https://quickstarts.snowflake.com/guide/overview-of-feature-store-api/index.html?index=..%2F..index#0) 

In [5]:
from snowflake.ml.feature_store import (
    FeatureStore,
    FeatureView,
    Entity,
    CreationMode
)

# Create the feature store
fs = FeatureStore(
    session=session, 
    database=CURRENT_DB, 
    name=CURRENT_SCHEMA, 
    default_warehouse=session.get_current_warehouse(),
    creation_mode=CreationMode.CREATE_IF_NOT_EXIST,
)

all_entities = []
for e in example_helper.load_entities():
    entity = fs.register_entity(e)
    all_entities.append(entity)

# Create the feature views
all_feature_views = []
for fv in example_helper.load_draft_feature_views():
    rf = fs.register_feature_view(
        feature_view=fv,
        version='1.0', 
        overwrite=True
    )
    all_feature_views.append(rf)

fs.list_feature_views().select('name', 'version', 'desc', 'refresh_freq').show()

---------------------------------------------------------------------------------------------------------
|"NAME"               |"VERSION"  |"DESC"                                              |"REFRESH_FREQ"  |
---------------------------------------------------------------------------------------------------------
|WINE_FEATURES        |1.0        |Managed feature view about wine quality which r...  |1 day           |
|EXTRA_WINE_FEATURES  |1.0        |Static feature view about wine quality which ne...  |NULL            |
---------------------------------------------------------------------------------------------------------



In [6]:
# Query the upstream lineage of the feature views we just created. 
for fv in all_feature_views:
    print("Upstream Lineage of feature view '" + fv.name + "'")
    print(fv.lineage(direction='upstream'))

LineageNode.lineage() is in private preview since 1.5.3. Do not use it in production. 
Lineage.trace() is in private preview since 1.16.0. Do not use it in production. 


Upstream Lineage of feature view 'EXTRA_WINE_FEATURES'
[LineageNode(
  name='LINEAGE_DEMO_DB.FEATURE_STORE.WINEDATA',
  version='None',
  domain='table',
  status='ACTIVE',
  created_on='2024-08-01 22:44:14'
)]
Upstream Lineage of feature view 'WINE_FEATURES'
[LineageNode(
  name='LINEAGE_DEMO_DB.FEATURE_STORE.WINEDATA',
  version='None',
  domain='table',
  status='ACTIVE',
  created_on='2024-08-01 22:44:14'
)]


## 3. Training Data Lineage

Next step in ML workflows will be generating training data that is needed to train the model. There are 2 ways to generate training data. 
1. Using feature views.
2. Using source tables directly.  

### 3.1 Training Data from Feature views

Lets explore the workflow of creating training data sets using the feature views. 

In [None]:
label_cols = example_helper.get_label_cols()
timestamp_col = example_helper.get_training_data_timestamp_col()
excluded_cols = example_helper.get_excluded_cols()
join_keys = [key for entity in all_entities for key in entity.join_keys]
spine_table = example_helper.get_training_spine_table()
print(f'timestamp col: {timestamp_col}')
print(f'excluded cols: {excluded_cols}')
print(f'label cols: {label_cols}')
print(f'join keys: {join_keys}')
print(f'training spine table: {spine_table}')

In [None]:
sample_count = 512
source_df = session.sql(f"""
    select {','.join(label_cols)}, 
            {','.join(join_keys)} 
            {',' + timestamp_col if timestamp_col is not None else ''} 
    from {spine_table}
""")
spine_df = source_df.sample(n=sample_count)
# preview spine dataframe
spine_df.show()

Unnamed: 0,QUALITY,WINE_ID
0,6,544
1,5,978
2,5,679
3,5,1459
4,5,5
...,...,...
507,6,508
508,6,624
509,5,1132
510,5,511


#### 3.1.1 Dataset as training data



[Snowflake Dataset](https://docs.snowflake.com/en/developer-guide/snowpark-ml/dataset) generated from feature views created above. Dataset is a readonly objects helps in reproducability of the ML model. 

Use Snowflake Datasets in the following situations:

- You need to manage and version large datasets for reproducible machine learning model training and testing.
- You need fine-grained file-level access and/or data shuffling for distributed training or data streaming.
- You need to integrate with external machine learning frameworks and tools.
- You need to track the lineage used to create an ML model.

In [9]:
my_dataset = fs.generate_dataset(
    name="my_dataset",
    spine_df=spine_df, 
    features=all_feature_views,
    version="1.0",
    spine_timestamp_col=timestamp_col,
    spine_label_cols=label_cols,
    exclude_columns=excluded_cols,
    desc="This is the dataset joined spine dataframe with feature views",
)

In [10]:
# Query Upstream lineage of the dataset we just generated. This will include a table and 2 feature views
my_dataset.lineage(direction="upstream")

[LineageNode(
   name='LINEAGE_DEMO_DB.FEATURE_STORE.WINEDATA',
   version='None',
   domain='table',
   status='ACTIVE',
   created_on='2024-08-01 22:44:14'
 ),
 FeatureView(_name=EXTRA_WINE_FEATURES, _entities=[Entity(name=WINE, join_keys=['WINE_ID'], owner=None, desc=Wine ID column.)], _feature_df=<snowflake.snowpark.dataframe.DataFrame object at 0x1711bc3d0>, _timestamp_col=None, _desc=Static feature view about wine quality which never refresh., _infer_schema_df=<snowflake.snowpark.dataframe.DataFrame object at 0x1711fd910>, _query=SELECT "WINE_ID", "SULPHATES", "ALCOHOL" FROM "LINEAGE_DEMO_DB".FEATURE_STORE.winedata, _version=1.0, _status=FeatureViewStatus.STATIC, _feature_desc=OrderedDict([('SULPHATES', ''), ('ALCOHOL', '')]), _refresh_freq=None, _database=LINEAGE_DEMO_DB, _schema=FEATURE_STORE, _warehouse=None, _refresh_mode=None, _refresh_mode_reason=None, _owner=ACCOUNTADMIN, _lineage_node_name=LINEAGE_DEMO_DB.FEATURE_STORE.EXTRA_WINE_FEATURES, _lineage_node_domain=feature_vie

In [11]:
# Query Downstream lineage from the feature views.
for fv in all_feature_views:
    print("Downstream Lineage of feature view '" + fv.name + "'")
    print(fv.lineage(direction='downstream'))

Downstream Lineage of feature view 'EXTRA_WINE_FEATURES'
[Dataset(
  name='LINEAGE_DEMO_DB.FEATURE_STORE.MY_DATASET',
  version='4.0',
)]
Downstream Lineage of feature view 'WINE_FEATURES'
[Dataset(
  name='LINEAGE_DEMO_DB.FEATURE_STORE.MY_DATASET',
  version='4.0',
)]


## 4. Model Lineage

Now let's train a simple random forest model, and evaluate the prediction accuracy.

In [16]:
# Let's create a registry to save the trained models. 
# All models need to be logged into the registry for their lineage to be tracked.

from snowflake.ml.registry import Registry

registry = Registry(
    session=session, 
    database_name=CURRENT_DB, 
    schema_name=CURRENT_SCHEMA,
)

### 4.1 Model trained in snowflake ecosystem

In [17]:
# Lets define a training function that uses Random forest to build the model

from snowflake.ml.modeling.ensemble import RandomForestRegressor
from snowflake.ml.modeling import metrics as snowml_metrics
from snowflake.snowpark.functions import abs as sp_abs, mean, col

def train_model_using_snowpark_ml(training_data_df):
    train, test = training_data_df.random_split([0.8, 0.2], seed=42)
    feature_columns = list(set(training_data_df.columns) - set(label_cols) - set(join_keys) - set([timestamp_col]))
    print(f"feature cols: {feature_columns}")
    
    rf = RandomForestRegressor(
        input_cols=feature_columns, label_cols=label_cols, 
        max_depth=3, n_estimators=20, random_state=42
    )

    rf.fit(train)
    predictions = rf.predict(test)

    output_label_names = ['OUTPUT_' + col for col in label_cols]
    mse = snowml_metrics.mean_squared_error(
        df=predictions, 
        y_true_col_names=label_cols, 
        y_pred_col_names=output_label_names
    )

    accuracy = 100 - snowml_metrics.mean_absolute_percentage_error(
        df=predictions,
        y_true_col_names=label_cols,
        y_pred_col_names=output_label_names
    )

    print(f"MSE: {mse}, Accuracy: {accuracy}")
    return rf

#### 4.1.1 Model trained using Dataset

In [18]:
# Convert dataset to a snowpark dataframe and examine all the features in it.

training_data_df = my_dataset.read.to_snowpark_dataframe()
assert training_data_df.count() == sample_count
# drop rows that have any nulls in value. 
training_data_df = training_data_df.dropna(how='any')
training_data_df.to_pandas()

Unnamed: 0,QUALITY,SULPHATES,ALCOHOL,FIXED_ACIDITY,CITRIC_ACID,CHLORIDES,TOTAL_SULFUR_DIOXIDE,PH,MY_NEW_FEATURE
0,5,0.56,9.4,7.4,0.00,0.076,34,3.51,0.000
1,5,0.68,9.8,7.8,0.00,0.098,67,3.20,0.000
2,5,0.64,9.5,7.6,0.29,0.075,66,3.40,2.204
3,5,0.70,11.1,7.9,0.40,0.062,20,3.28,3.160
4,7,0.76,10.7,11.8,0.49,0.093,80,3.30,5.782
...,...,...,...,...,...,...,...,...,...
507,4,0.46,9.6,8.1,0.00,0.081,24,3.38,0.000
508,5,0.64,9.7,6.7,0.08,0.064,34,3.33,0.536
509,7,0.68,11.4,13.3,0.75,0.084,43,3.04,9.975
510,6,0.48,10.5,5.6,0.78,0.074,92,3.39,4.368


In [19]:
# Train the random forest model using Snowpark-ML and the dataset (2min runtime)
# then log the model in the registry.

random_forest_model = train_model_using_snowpark_ml(training_data_df) 

model_version = registry.log_model(
    model_name="MODEL_TRAINED_ON_DATASET",
    version_name="v1",
    model=random_forest_model,
    options={'relax_version': True},
    comment="Model trained with feature views, dataset"
)

feature cols: ['MY_NEW_FEATURE', 'PH', 'TOTAL_SULFUR_DIOXIDE', 'CITRIC_ACID', 'CHLORIDES', 'SULPHATES', 'FIXED_ACIDITY', 'ALCOHOL']


The version of package 'snowflake-snowpark-python' in the local environment is 1.20.2, which does not fit the criteria for the requirement 'snowflake-snowpark-python'. Your UDF might not work when the package version is different between the server and your local environment.
The version of package 'numpy' in the local environment is 1.24.4, which does not fit the criteria for the requirement 'numpy==1.24.3'. Your UDF might not work when the package version is different between the server and your local environment.
The version of package 'scikit-learn' in the local environment is 1.3.2, which does not fit the criteria for the requirement 'scikit-learn==1.3.0'. Your UDF might not work when the package version is different between the server and your local environment.
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
The version of package 'numpy' in the

MSE: 0.25267477340674793, Accuracy: 99.92349333548655


  return next(self.gen)


In [20]:
# Query lineage of new model
ds = model_version.lineage(direction="upstream")
ds

[Dataset(
   name='LINEAGE_DEMO_DB.FEATURE_STORE.MY_DATASET',
   version='4.0',
 )]

### 4.2 Model trained in non-snowflake ecosystem

For the workflows such as:
- A model trained using snowpark.ml but not a Snowpark DataFrame (like pandas).
- A model trained without using snowpark.ml or a Snowpark DataFrame.
- A model trained outside of Snowflake.


You can still associate the lineage between the source data object and the trained model by passing the snowpark dataframe backed by the source data object to model registry’s log_model API as sample_input_data. 


## 5. Visualization of lineage
 

(Show this in Snowsight starting with the Stage object)

## What's Next? 

- Support for Stored Procedures and Tasks in the Lineage Graph
- Support for Organizational Listings in the Lineage Graph
- Support for 3rd Party Tools in the Lineage Graph (dbt, PowerBI)
- Advanced filtering capabilities for deleted nodes
- Simplifying permission model to make onboarding even easier
- Integrate lineage data from OpenFlow to get complete source->sink view

## Partner Solutions
- Alation
- Collibra
- Atlan
- Informatica

# References

[Repo with this Demo Notebook](https://github.com/sfc-gh-tmanfredi/SnowflakeLineage)

Data Lineage Docs
[Snowsight Lineage](https://docs.snowflake.com/en/user-guide/ui-snowsight-lineage),
[Get_Lineage() Function](https://docs.snowflake.com/en/sql-reference/functions/get_lineage-snowflake-core),
[Access_History](https://docs.snowflake.com/en/user-guide/access-history),
[Object_Dependencies](https://docs.snowflake.com/en/user-guide/object-dependencies)

ML Feature Docs
[Snowflake Feature Store](https://docs.snowflake.com/en/developer-guide/snowpark-ml/feature-store/overview), 
[Dataset](https://docs.snowflake.com/en/developer-guide/snowpark-ml/dataset), 
[ML Lineage](https://docs.snowflake.com/en/developer-guide/snowflake-ml/ml-lineage), 
[Snowpark ML Modeling](https://docs.snowflake.com/en/developer-guide/snowpark-ml/modeling) and 
[Snowflake Model Registry](https://docs.snowflake.com/en/developer-guide/snowpark-ml/model-registry/overview)


Blogs
- https://medium.com/@nethaji.bhuma/snowflake-lineage-features-b35aae81893a
- https://medium.com/@pascalpfffle/snowflake-data-lineage-transparency-in-your-data-landscape-with-snowsight-597a5aba8010
- https://medium.com/@jfgiudicelli/advanced-data-lineage-in-snowflake-5bc693777887