1. Introduction to Feature Engineering

Feature engineering is the process of transforming raw data into meaningful features that help a machine learning model learn better.
Features are the input variables the model uses to make predictions.

Why it’s important:

It can improve model accuracy.
It helps extract hidden patterns in the data.
It reduces noise and irrelevant data.

Technique	Description	Example
Normalization	Scaling numeric values to between 0 and 1	Salary → 0.75
Encoding	Converting text or categorical data into numbers	Gender → Male=1, Female=0
Aggregation	Summarising data over time or a group	Average purchase amount per user
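
The three techniques in the table can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation: the helper names (`min_max_normalize`, `encode_categories`, `average_per_group`) and the sample values are hypothetical and chosen only to mirror the examples in the table.

```python
from collections import defaultdict


def min_max_normalize(values):
    """Scale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def encode_categories(values, mapping):
    """Replace each category label with its numeric code."""
    return [mapping[v] for v in values]


def average_per_group(rows):
    """Aggregate (key, amount) pairs into the mean amount per key."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for key, amount in rows:
        totals[key][0] += amount
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical sample data, for illustration only.
salaries = [30000, 60000, 75000, 90000]
genders = ["Male", "Female", "Female", "Male"]
purchases = [("u1", 20.0), ("u1", 40.0), ("u2", 10.0)]

normalized_salaries = min_max_normalize(salaries)
encoded_genders = encode_categories(genders, {"Male": 1, "Female": 0})
avg_purchase = average_per_group(purchases)

print(normalized_salaries)  # [0.0, 0.5, 0.75, 1.0]
print(encoded_genders)      # [1, 0, 0, 1]
print(avg_purchase)         # {'u1': 30.0, 'u2': 10.0}
```

With these sample values, a salary of 75000 maps to 0.75 because it lies three quarters of the way between the minimum (30000) and the maximum (90000).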


2. Using Snowflake for Data Storage & Processing

What is Snowflake?
Snowflake is a cloud-based data warehouse that stores both structured (tables, numbers, text) and semi-structured (JSON, XML) data.

Why use it?

It’s scalable, fast, and supports SQL.
It’s commonly used in companies for data pipelines.
It integrates well with machine learning tools.

Integration with ML Pipelines:
Snowflake lets you:
Use Snowpark (a DataFrame API available for Python, Java, and Scala) to run ML preprocessing inside Snowflake.
Connect directly to feature stores or model training environments.

Example:

In [None]:
-- Creating compute warehouse
CREATE OR REPLACE WAREHOUSE DEMO_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;

-- Creating the database and selecting a schema
-- (a new database already contains a PUBLIC schema, so it does not need to be recreated)
CREATE OR REPLACE DATABASE STUDENT_DB;
USE DATABASE STUDENT_DB;
USE SCHEMA PUBLIC;



EXTRACT

In [None]:
-- Creating a simple student table
CREATE OR REPLACE TABLE STUDENT_LOGS (
    student_id INT,
    student_name STRING,
    department STRING,
    gpa FLOAT,
    attendance FLOAT
);

-- Inserting sample data
INSERT INTO STUDENT_LOGS VALUES
(1, 'Alice', 'Computer Science', 8.5, 0.92),
(2, 'Bob', 'Electrical', 7.8, 0.85),
(3, 'Carol', 'Mechanical', 9.1, 0.95),
(4, 'David', 'Computer Science', 6.9, 0.80),
(5, 'Emma', 'Electrical', 8.2, 0.88);

-- Checking the inserted rows
SELECT * FROM STUDENT_LOGS;

3. Feature Store Concepts

Definition:
A feature store is a centralized system for storing, managing, and serving ML features for training and inference.

Why use a Feature Store?

It ensures the same feature definitions are used in training and in production.
It makes features reusable: the same feature can serve multiple ML models.
It helps reduce data leakage and improves governance.

Feature Store	Description	Example Usage
Amazon SageMaker Feature Store	Managed by AWS for storing ML features	Use with AWS ML services
Snowflake Feature Store	Integrated with the Snowflake data warehouse	Works natively with Snowpark ML
Databricks Feature Store	Works with Spark and MLflow	Used in big data pipelines
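
To make the idea of "one definition, served to both training and inference" concrete, here is a toy in-memory sketch in plain Python. It is not the API of any of the products listed above: the class `InMemoryFeatureStore`, its methods, and the feature names `gpa_scaled` and `high_attendance` are all hypothetical, and the scaling assumes a 10-point GPA scale.

```python
class InMemoryFeatureStore:
    """Toy feature store: each feature has one definition shared by all consumers."""

    def __init__(self):
        self._definitions = {}  # feature name -> function(raw_record) -> value
        self._values = {}       # entity id -> {feature name: value}

    def register(self, name, fn):
        """Register the single, shared definition of a feature."""
        self._definitions[name] = fn

    def ingest(self, entity_id, raw_record):
        """Compute every registered feature for one entity and store the result."""
        self._values[entity_id] = {
            name: fn(raw_record) for name, fn in self._definitions.items()
        }

    def get_features(self, entity_id, names):
        """Return stored feature values in the requested order."""
        row = self._values[entity_id]
        return [row[name] for name in names]


store = InMemoryFeatureStore()
# Assumption: GPA is on a 10-point scale, so dividing by 10 maps it into [0, 1].
store.register("gpa_scaled", lambda r: r["gpa"] / 10.0)
store.register("high_attendance", lambda r: int(r["attendance"] >= 0.9))

store.ingest(1, {"gpa": 8.5, "attendance": 0.92})
store.ingest(2, {"gpa": 7.8, "attendance": 0.85})

# Training and inference both read through the same call,
# so they cannot drift apart in how a feature is computed.
feature_names = ["gpa_scaled", "high_attendance"]
training_row = store.get_features(1, feature_names)
serving_row = store.get_features(1, feature_names)
print(training_row == serving_row)  # True
```

The design point is that feature logic lives in one place (`register`), while consumers only ever call `get_features`, which is what keeps training and production consistent.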

TRANSFORM

In [None]:
-- Creating a new table with engineered features
CREATE OR REPLACE TABLE STUDENT_FEATURES AS
SELECT
    student_id,
    student_name,
    department,
    gpa,
    attendance,
    CASE
        WHEN gpa >= 8.5 THEN 'High'
        WHEN gpa >= 7.0 THEN 'Medium'
        ELSE 'Low'
    END AS gpa_level,
    CASE
        WHEN attendance >= 0.9 THEN 'Good'
        WHEN attendance >= 0.8 THEN 'Average'
        ELSE 'Poor'
    END AS attendance_status
FROM STUDENT_LOGS;

-- Checking the engineered columns
SELECT * FROM STUDENT_FEATURES;
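
As a quick sanity check of the thresholds, the same bucketing rules can be mirrored in plain Python and applied to the five sample students. This is only a sketch for reasoning about the expected labels; the function names `gpa_level` and `attendance_status` are borrowed from the SQL column names, and nothing here talks to Snowflake.

```python
def gpa_level(gpa):
    """Mirror of the SQL CASE on gpa: the first matching branch wins."""
    if gpa >= 8.5:
        return "High"
    if gpa >= 7.0:
        return "Medium"
    return "Low"


def attendance_status(attendance):
    """Mirror of the SQL CASE on attendance."""
    if attendance >= 0.9:
        return "Good"
    if attendance >= 0.8:
        return "Average"
    return "Poor"


# Same sample rows as STUDENT_LOGS: (student_name, gpa, attendance).
students = [
    ("Alice", 8.5, 0.92),
    ("Bob", 7.8, 0.85),
    ("Carol", 9.1, 0.95),
    ("David", 6.9, 0.80),
    ("Emma", 8.2, 0.88),
]

labels = {
    name: (gpa_level(gpa), attendance_status(attendance))
    for name, gpa, attendance in students
}

for name, (level, status) in labels.items():
    print(name, level, status)
```

Because the comparisons use `>=`, boundary values land in the higher bucket: Alice's GPA of 8.5 is 'High' and David's attendance of 0.80 is 'Average'. One difference from the SQL version: a NULL gpa or attendance in SQL falls through to the ELSE branch ('Low' or 'Poor'), whereas this sketch assumes every value is present.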

LOAD

In [None]:
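-- Creating a dedicated schema for curated features.
-- Note: CREATE SCHEMA also makes FEATURE_STORE the current schema,
-- so the source table is referenced by its fully qualified name.
CREATE OR REPLACE SCHEMA FEATURE_STORE;

CREATE OR REPLACE TABLE FEATURE_STORE.STUDENT_FEATURES AS
SELECT * FROM STUDENT_DB.PUBLIC.STUDENT_FEATURES;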


In [None]:
SELECT * FROM FEATURE_STORE.STUDENT_FEATURES WHERE department = 'Electrical';


PYTHON SCRIPT

In [None]:
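from snowflake.snowpark import Session
from snowflake.snowpark.functions import when, lit, col

# Connection settings: replace each <placeholder> with your own values.
connection_parameters = {
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "DEMO_WH",
    "database": "STUDENT_DB",
    "schema": "PUBLIC",
}

session = Session.builder.configs(connection_parameters).create()
print("Connected to Snowflake")

# Creating student dataset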
data = [
    (1, 'Alice', 'Computer Science', 8.5, 0.92),
    (2, 'Bob', 'Electrical', 7.8, 0.85),
    (3, 'Carol', 'Mechanical', 9.1, 0.95),
    (4, 'David', 'Computer Science', 6.9, 0.80),
    (5, 'Emma', 'Electrical', 8.2, 0.88)
]

columns = ["student_id", "student_name", "department", "gpa", "attendance"]

# Creating Snowflake table from data
df = session.create_dataframe(data, schema=columns)
df.write.save_as_table("STUDENT_LOGS", mode="overwrite")
print("STUDENT_LOGS table created.")

# Adding feature columns
features_df = (
    df.with_column(
        "gpa_level",
        when(col("gpa") >= 8.5, lit("High"))
        .when(col("gpa") >= 7.0, lit("Medium"))
        .otherwise(lit("Low"))
    )
    .with_column(
        "attendance_status",
        when(col("attendance") >= 0.9, lit("Good"))
        .when(col("attendance") >= 0.8, lit("Average"))
        .otherwise(lit("Poor"))
    )
)

# Saving features into a new table
features_df.write.save_as_table("STUDENT_FEATURES", mode="overwrite")
print("STUDENT_FEATURES table created with engineered columns!")


session.table("STUDENT_FEATURES").show()


For model training, I used Jupyter Notebook.