# Notebook 03: Feature Engineering

Demonstrates the feature extraction pipeline from `src/features.py`.

In [None]:
import sys
sys.path.append('../src')  # make ../src importable so `features` resolves

import pandas as pd
from features import FeatureExtractor, extract_all_features

df = pd.read_csv('../data/processed_sample.csv')
print(f"Loaded {len(df)} samples")
df.head(3)

## Feature Groups

1. **TF-IDF**: Unigrams + Bigrams
2. **POS**: Noun/Verb/Adj/Adv/Pronoun percentages
3. **Negation**: Count + windowed negation-adjective
4. **Sentiment Lexicon**: Sum/Mean/Max/Min scores
5. **Stylistic**: Word/sentence length, punctuation, type-token ratio (TTR)
6. **Discourse**: First-sentence sentiment, discourse markers
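
To make two of these groups concrete, here is a minimal standard-library sketch of negation and stylistic features. It is not the code in `src/features.py`: the `NEGATIONS` set, the `targets` word set (a stand-in for POS-tagged adjectives), the window size, and all function names are assumptions made for illustration.

```python
import re

# Assumption: a small illustrative negation list, not the one used in src/features.py.
NEGATIONS = {"not", "no", "never", "cannot"}


def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping contractions whole."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def is_negation(token):
    """Treat listed negation words and any "n't" contraction as negations."""
    return token in NEGATIONS or token.endswith("n't")


def negation_count(tokens):
    """Count negation tokens in the sequence."""
    return sum(1 for t in tokens if is_negation(t))


def windowed_negation(tokens, targets, window=3):
    """Count target words that occur within `window` tokens after a negation.

    `targets` stands in for adjectives; a real pipeline would likely use a POS tagger.
    """
    hits = 0
    for i, token in enumerate(tokens):
        if is_negation(token):
            following = tokens[i + 1 : i + 1 + window]
            hits += sum(1 for w in following if w in targets)
    return hits


def type_token_ratio(tokens):
    """Unique tokens divided by total tokens (0.0 for empty input)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_word_length(tokens):
    """Average number of characters per token (0.0 for empty input)."""
    return sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0


tokens = tokenize("The plot was not very good, and the acting wasn't good either.")
print(negation_count(tokens))                 # 2
print(windowed_negation(tokens, {"good"}))    # 2
print(round(type_token_ratio(tokens), 3))     # 0.833
```

The window only looks forward from each negation, which is one simple choice; a symmetric window or a scope that stops at punctuation would be equally reasonable variants.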

In [None]:
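# Configure TF-IDF for unigrams and bigrams, capped at 5000 terms
extractor = FeatureExtractor(tfidf_max_features=5000, tfidf_ngram=(1, 2))
extractor.fit_tfidf(df['text_clean'])

# include_tfidf=False is passed here, so this call requests the features without TF-IDF
features = extract_all_features(df, extractor, include_tfidf=False)
print(f"Feature matrix shape: {features.shape}")
features.head()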
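
The `tfidf_ngram` setting above matches the "Unigrams + Bigrams" group. As a rough illustration of what that vocabulary contains, the sketch below enumerates n-grams in plain Python. It assumes the tuple is an inclusive `(min_n, max_n)` range, as in scikit-learn's `ngram_range` convention; the `ngrams` helper is hypothetical and not part of `src/features.py`.

```python
def ngrams(tokens, n_min=1, n_max=2):
    """Return all contiguous n-grams for n in [n_min, n_max], joined by spaces."""
    out = []
    for n in range(n_min, n_max + 1):
        # Slide a window of size n over the token list.
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i : i + n]))
    return out


tokens = "not a good film".split()
print(ngrams(tokens, 1, 2))
# ['not', 'a', 'good', 'film', 'not a', 'a good', 'good film']
```

Bigrams such as `not a` and `a good` preserve some local word order that unigrams alone discard, which is why they are commonly paired with negation features.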

In [None]:
print("Feature statistics:")
features.describe()

## Summary

✅ **Phase 3 Complete**

Feature extraction is implemented with six feature groups. Use `src/features.py` to generate features for the full dataset.