# Social Science Concept Integration Analysis
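This notebook evaluates several embedding models (two Sentence Transformers and two BERT variants) on concept integration. The task is to decide whether two social science terms, such as DONOR COUNTRIES and AID GIVING COUNTRIES, denote the same concept by thresholding the cosine similarity of their embeddings.
Data paths, the model list, batch size, thresholds and the random seed are read from `config.py`, and the helper functions come from `model_utils`.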
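`config.py` itself is not shown here. Below is a hypothetical version that matches how the notebook reads it (`SEED`, `DATA_PATHS`, `MODELS`, `BATCH_SIZE`, `THRESHOLDS`). The model keys, display names and types match the printed model list, and the paths come from the hardcoded version of the notebook further down. `SEED`, `BATCH_SIZE` and the threshold range are assumptions.

```python
# config.py: hypothetical sketch, the real file is not shown in this notebook.
# SEED, BATCH_SIZE and the threshold range are assumed values.

SEED = 42  # assumption

DATA_PATHS = {
    # Paths taken from the hardcoded version of the notebook
    "pos_pairs": "datasets/processed_datasets/train_positive_pairs.csv",
    "neg_pairs": "datasets/processed_datasets/train_negative_pairs.csv",
    "results_csv": "cv_results.csv",
}

MODELS = {
    "all-mpnet-base-v2": {
        "display_name": "All-MPNet-Base-v2",
        "type": "sentence_transformer",
    },
    "dwulff/mpnet-personality": {
        "display_name": "MPNet-Personality",
        "type": "sentence_transformer",
    },
    "allenai/scibert_scivocab_uncased": {
        "display_name": "SciBERT (SciVocab)",
        "type": "token_embedding_mean_pool",
    },
    "bert-base-uncased": {
        "display_name": "BERT Base",
        "type": "token_embedding_mean_pool",
    },
}

BATCH_SIZE = 32  # assumption

# Cosine-similarity cut-offs in steps of 0.01. The logged tables start at 0.10;
# the 0.90 upper bound is an assumption.
THRESHOLDS = [round(0.10 + 0.01 * i, 2) for i in range(81)]
```

With all settings in one module, the evaluation loop does not need to change when a model is added or the threshold grid is refined.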

In [1]:
import pandas as pd
import model_utils
import config

# Reproducibility Setup
SEED = model_utils.setup_reproducibility(config.SEED)

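`model_utils.setup_reproducibility` is not listed in this notebook. Here is a minimal sketch of what such a helper commonly does. It assumes the helper seeds Python's `random`, NumPy and, if installed, PyTorch, and then returns the seed. The default of 42 is an assumption; the older version below calls the helper with no argument, so some default exists.

```python
import random

import numpy as np


def setup_reproducibility(seed: int = 42) -> int:
    """Hypothetical sketch: seed the common RNGs and return the seed."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch  # optional: only seeded if PyTorch is installed

        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass
    return seed
```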


## 1. Load and Prepare Data
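Load the positive and negative pair datasets from the paths in `config.DATA_PATHS`. The negatives are downsampled to the number of positives, giving a balanced set of 7,372 rows (3,686 per class).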
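The log below shows 3,686 positives and 2,884,741 negatives, reduced to 3,686 sampled negatives. Here is a minimal sketch of that balancing step. It assumes `load_and_prepare_data` downsamples the negatives with the seed and shuffles the result. The name `balance_pairs` is hypothetical, and CSV reading is left out.

```python
import pandas as pd


def balance_pairs(pos_df: pd.DataFrame, neg_df: pd.DataFrame, seed: int) -> pd.DataFrame:
    """Hypothetical sketch: downsample negatives to the positive count and shuffle."""
    pos = pos_df.assign(label=1)
    # Sample exactly as many negatives as there are positives
    neg = neg_df.sample(n=len(pos), random_state=seed).assign(label=0)
    combined = pd.concat([pos, neg], ignore_index=True)
    # Shuffle rows so positive and negative pairs are interleaved
    return combined.sample(frac=1, random_state=seed).reset_index(drop=True)
```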

In [2]:
pos_path = config.DATA_PATHS['pos_pairs']
neg_path = config.DATA_PATHS['neg_pairs']

df = model_utils.load_and_prepare_data(pos_path, neg_path, SEED)
labels = df['label'].values
print(f"Total Rows: {len(df)}")
df.head()

Loading datasets...
Positive samples: 3686
Negative samples (total): 2884741
Negative samples (sampled): 3686
Total Rows: 7372


| Unnamed: 0 | term1 | term2 | concept | concept_uri | label | concept1_uri | concept2_uri |
|---|---|---|---|---|---|---|---|
| 0 | DONOR COUNTRIES | AID GIVING COUNTRIES | DONOR COUNTRIES | https://elsst.cessda.eu/id/5/eb6b58a5-4be4-49a... | 1 | | |
| 1 | HEALTH SERVICES | MEDICAL SERVICES | HEALTH SERVICES | https://elsst.cessda.eu/id/5/38854495-5f8e-4a5... | 1 | | |
| 2 | TEACHERS OF STUDENTS WITH SPECIAL EDUCATIONAL ... | SEN TEACHERS | TEACHERS OF STUDENTS WITH SPECIAL EDUCATIONAL ... | https://elsst.cessda.eu/id/5/10f297bb-8002-4e5... | 1 | | |
| 3 | NURSES | DANCE | | | 0 | https://elsst.cessda.eu/id/5/52f955ef-0871-437... | https://elsst.cessda.eu/id/5/57ee4ccc-d457-4b0... |
| 4 | CRIMES AGAINST HUMANITY | HUMANITARIAN CRIMES | CRIMES AGAINST HUMANITY | https://elsst.cessda.eu/id/5/436aaebf-5594-4bb... | 1 | | |


## 2. Models to Evaluate
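We iterate through the models defined in `config.MODELS`.
Each entry has a `type`. `sentence_transformer` models are loaded directly as SentenceTransformers. `token_embedding_mean_pool` models load a plain Transformer and average its token embeddings (mean pooling) into one sentence vector.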
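For the `token_embedding_mean_pool` type, a sentence vector is the average of the Transformer's token embeddings, with padding positions excluded through the attention mask. The sketch below shows masked mean pooling in NumPy on arrays shaped like a model's last hidden state. `mean_pool` is a hypothetical name, not a `model_utils` function, and the sketch assumes `model_utils` pools this way.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padding positions.

    token_embeddings: (batch, seq_len, hidden); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Clamp the token count so an all-padding row does not divide by zero
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts
```

The `sentence-transformers` library provides the same operation as a module (`models.Pooling` with mean pooling). Wrapping a plain Transformer that way is one option for giving both model types a common encoding interface.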

In [3]:
models_to_test = config.MODELS
print("Models to be evaluated:")
for key, conf in models_to_test.items():
    print(f"- {conf['display_name']} ({key}) -> Type: {conf['type']}")

Models to be evaluated:
- All-MPNet-Base-v2 (all-mpnet-base-v2) -> Type: sentence_transformer
- MPNet-Personality (dwulff/mpnet-personality) -> Type: sentence_transformer
- SciBERT (SciVocab) (allenai/scibert_scivocab_uncased) -> Type: token_embedding_mean_pool
- BERT Base (bert-base-uncased) -> Type: token_embedding_mean_pool


## 3. Analysis Loop
Iterate through each model:
1. Load the model according to its `type`.
2. Compute embeddings and cosine similarities.
3. Evaluate metrics at each threshold in `config.THRESHOLDS`.
4. Plot the model's metrics across thresholds.
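Each row of `df` is a single pair (`term1`, `term2`), so only one similarity per row is needed, not a full similarity matrix. Here is a minimal sketch of row-wise cosine similarity. It assumes the two term columns have already been encoded into arrays `a` and `b` of equal shape. `paired_cosine` is a hypothetical helper.

```python
import numpy as np


def paired_cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between row i of `a` and row i of `b` (one score per pair)."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    # Row-wise dot product of unit vectors; avoids building the full N x N matrix
    return np.einsum("ij,ij->i", a_norm, b_norm)
```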

In [4]:
all_summary_data = []

# Map raw names to display names for later
display_names_map = {k: v['display_name'] for k, v in config.MODELS.items()}

for model_name, model_conf in models_to_test.items():
    print(f"\n{'='*50}")
    print(f" Processing: {model_conf['display_name']}")
    print(f"{'='*50}")

    # Load with specific type (Sentence vs Token)
    model = model_utils.load_model(model_name, model_type=model_conf['type'])
    if model is None:
        continue

    # Compute similarity
    similarities = model_utils.compute_embeddings_and_similarity(
        model, df, batch_size=config.BATCH_SIZE
    )
    
    # Evaluate
    current_model_results, best_threshold = model_utils.evaluate_model(
        model_name, similarities, labels, thresholds=config.THRESHOLDS
    )
    all_summary_data.extend(current_model_results)
    
    # Plot Individual
    model_utils.plot_individual_performance(
        current_model_results, 
        model_name, 
        best_threshold,
        display_name=model_conf['display_name']
    )


 Processing: All-MPNet-Base-v2
Loading Model (all-mpnet-base-v2)...




Encoding sentences...
Computing Cosine Similarities...

--- Results per Threshold (all-mpnet-base-v2) ---
Threshold  | F1         | Precision  | Recall     | PosAcc     | NegAcc    
--------------------------------------------------------------------------------
0.10       | 0.7342     | 0.5809     | 0.9976     | 0.9976     | 0.2802
0.11       | 0.7439     | 0.5935     | 0.9965     | 0.9965     | 0.3174
0.12       | 0.7549     | 0.6082     | 0.9948     | 0.9948     | 0.3592
0.13       | 0.7678     | 0.6258     | 0.9932     | 0.9932     | 0.4061
0.14       | 0.7807     | 0.6433     | 0.9927     | 0.9927     | 0.4495
0.15       | 0.7934     | 0.6615     | 0.9910     | 0.9910     | 0.4929
0.16       | 0.8077     | 0.6823     | 0.9897     | 0.9897     | 0.5391
0.17       | 0.8184     | 0.6986     | 0.9878     | 0.9878     | 0.5738
0.18       | 0.8299     | 0.7165     | 0.9859     | 0.9859     | 0.6099
0.19       | 0.8423     | 0.7359     | 0.9848     | 0.9848     | 0.6465
0.20       | 0.85



Encoding sentences...
Computing Cosine Similarities...

--- Results per Threshold (dwulff/mpnet-personality) ---
Threshold  | F1         | Precision  | Recall     | PosAcc     | NegAcc    
--------------------------------------------------------------------------------
0.10       | 0.7683     | 0.6266     | 0.9929     | 0.9929     | 0.4083
0.11       | 0.7884     | 0.6543     | 0.9916     | 0.9916     | 0.4761
0.12       | 0.8070     | 0.6812     | 0.9900     | 0.9900     | 0.5366
0.13       | 0.8286     | 0.7138     | 0.9872     | 0.9872     | 0.6042
0.14       | 0.8480     | 0.7455     | 0.9832     | 0.9832     | 0.6644
0.15       | 0.8644     | 0.7744     | 0.9780     | 0.9780     | 0.7151
0.16       | 0.8757     | 0.7945     | 0.9753     | 0.9753     | 0.7477
0.17       | 0.8926     | 0.8236     | 0.9742     | 0.9742     | 0.7914
0.18       | 0.9030     | 0.8437     | 0.9712     | 0.9712     | 0.8201
0.19       | 0.9156     | 0.8682     | 0.9685     | 0.9685     | 0.8530
0.20      



Encoding sentences...
Computing Cosine Similarities...

--- Results per Threshold (bert-base-uncased) ---
Threshold  | F1         | Precision  | Recall     | PosAcc     | NegAcc    
--------------------------------------------------------------------------------
0.10       | 0.6667     | 0.5000     | 1.0000     | 1.0000     | 0.0000
0.11       | 0.6667     | 0.5000     | 1.0000     | 1.0000     | 0.0000
0.12       | 0.6667     | 0.5000     | 1.0000     | 1.0000     | 0.0000
0.13       | 0.6667     | 0.5000     | 1.0000     | 1.0000     | 0.0000
0.14       | 0.6667     | 0.5000     | 1.0000     | 1.0000     | 0.0000
0.15       | 0.6667     | 0.5000     | 1.0000     | 1.0000     | 0.0000
0.16       | 0.6667     | 0.5000     | 1.0000     | 1.0000     | 0.0000
0.17       | 0.6667     | 0.5000     | 1.0000     | 1.0000     | 0.0000
0.18       | 0.6667     | 0.5000     | 1.0000     | 1.0000     | 0.0000
0.19       | 0.6667     | 0.5000     | 1.0000     | 1.0000     | 0.0000
0.20       | 0.66
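The tables report F1, precision, recall, PosAcc and NegAcc for each threshold. The sketch below computes these for one threshold. It assumes a pair is predicted positive when its similarity is at least the threshold; whether `model_utils` uses `>=` or `>` is not shown. PosAcc (accuracy on positive pairs) is the same quantity as recall, which is why those two columns match. NegAcc is the specificity. The BERT Base rows follow from these definitions: precision 0.5000 with NegAcc 0.0000 means every pair was predicted positive, so on a balanced set F1 = 2 × 0.5 × 1 / (0.5 + 1) ≈ 0.6667.

```python
import numpy as np


def threshold_metrics(similarities: np.ndarray, labels: np.ndarray, threshold: float) -> dict:
    """Metrics for the rule 'predict same concept if similarity >= threshold' (assumed)."""
    preds = similarities >= threshold
    pos = labels == 1
    tp = int(np.sum(preds & pos))
    fp = int(np.sum(preds & ~pos))
    fn = int(np.sum(~preds & pos))
    tn = int(np.sum(~preds & ~pos))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "threshold": threshold,
        "f1": f1,
        "precision": precision,
        "recall": recall,
        "pos_acc": recall,  # accuracy on positive pairs equals recall
        "neg_acc": tn / (tn + fp) if tn + fp else 0.0,  # specificity
    }
```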

## 4. Aggregate Results & Comparison
Save the per-threshold results for all models to CSV, plot them together in one comparison chart, and print each model's best threshold.
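`print_best_settings` reports the threshold with the highest F1 for each model. Here is a minimal pandas equivalent. It assumes `summary_df` holds one row per (model, threshold) pair, with columns named `model` and `f1`. These column names and `best_per_model` are hypothetical.

```python
import pandas as pd


def best_per_model(summary_df: pd.DataFrame, metric: str = "f1") -> pd.DataFrame:
    """Return the row with the highest `metric` for each model."""
    best_idx = summary_df.groupby("model")[metric].idxmax()
    return summary_df.loc[best_idx].reset_index(drop=True)
```

`idxmax` keeps the first row on ties. If rows are sorted by threshold within each model, the lowest tied threshold wins.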

In [None]:
summary_df = pd.DataFrame(all_summary_data)
summary_df.to_csv(config.DATA_PATHS['results_csv'], index=False)
print(f"\nDetailed results saved to '{config.DATA_PATHS['results_csv']}'")

# Plot Comparison
model_utils.plot_comparison(summary_df, model_display_names=display_names_map)

# Print Text Summary
model_utils.print_best_settings(summary_df)

# Social Science Concept Integration Analysis
This notebook evaluates several sentence embedding models for concept integration.

In [5]:
import pandas as pd
import model_utils

# Reproducibility Setup
SEED = model_utils.setup_reproducibility()

## 1. Load and Prepare Data
Load the positive and negative pair datasets, downsample the negatives to balance the classes, and combine them into a single DataFrame.

In [2]:
pos_path = 'datasets/processed_datasets/train_positive_pairs.csv'
neg_path = 'datasets/processed_datasets/train_negative_pairs.csv'

df = model_utils.load_and_prepare_data(pos_path, neg_path, SEED)
labels = df['label'].values
df.head()

Loading datasets...
Positive samples: 3686
Negative samples (total): 2884741
Negative samples (sampled): 3686


| Unnamed: 0 | term1 | term2 | concept | concept_uri | label | concept1_uri | concept2_uri |
|---|---|---|---|---|---|---|---|
| 0 | DONOR COUNTRIES | AID GIVING COUNTRIES | DONOR COUNTRIES | https://elsst.cessda.eu/id/5/eb6b58a5-4be4-49a... | 1 | | |
| 1 | HEALTH SERVICES | MEDICAL SERVICES | HEALTH SERVICES | https://elsst.cessda.eu/id/5/38854495-5f8e-4a5... | 1 | | |
| 2 | TEACHERS OF STUDENTS WITH SPECIAL EDUCATIONAL ... | SEN TEACHERS | TEACHERS OF STUDENTS WITH SPECIAL EDUCATIONAL ... | https://elsst.cessda.eu/id/5/10f297bb-8002-4e5... | 1 | | |
| 3 | NURSES | DANCE | | | 0 | https://elsst.cessda.eu/id/5/52f955ef-0871-437... | https://elsst.cessda.eu/id/5/57ee4ccc-d457-4b0... |
| 4 | CRIMES AGAINST HUMANITY | HUMANITARIAN CRIMES | CRIMES AGAINST HUMANITY | https://elsst.cessda.eu/id/5/436aaebf-5594-4bb... | 1 | | |


## 2. Models to Evaluate
Define the models to test: two Sentence Transformer models and two BERT variants (SciBERT and BERT Base).

In [3]:
models_to_test = [
    'all-mpnet-base-v2', 
    'dwulff/mpnet-personality', 
    'allenai/scibert_scivocab_uncased', 
    'bert-base-uncased'
]

## 3. Analysis Loop
Iterate through each model:
1. Load the model.
2. Compute embeddings and cosine similarities.
3. Evaluate performance across various thresholds.
4. Plot individual model performance.

In [5]:
all_summary_data = []

for model_name in models_to_test:
    print(f"\n{'='*40}")
    print(f" Processing Model: {model_name}")
    print(f"{'='*40}")

    model = model_utils.load_model(model_name)
    if model is None:
        continue

    similarities = model_utils.compute_embeddings_and_similarity(model, df)
    
    current_model_results, best_threshold = model_utils.evaluate_model(model_name, similarities, labels)
    all_summary_data.extend(current_model_results)
    
    model_utils.plot_individual_performance(current_model_results, model_name, best_threshold)


 Processing Model: all-mpnet-base-v2
Loading Model (all-mpnet-base-v2)...
Encoding sentences...
Computing Cosine Similarities...

--- Results per Threshold (all-mpnet-base-v2) ---
Threshold  | F1         | Precision  | Recall     | PosAcc     | NegAcc    
--------------------------------------------------------------------------------
0.10       | 0.7342     | 0.5809     | 0.9976     | 0.9976     | 0.2802
0.11       | 0.7439     | 0.5935     | 0.9965     | 0.9965     | 0.3174
0.12       | 0.7549     | 0.6082     | 0.9948     | 0.9948     | 0.3592
0.13       | 0.7678     | 0.6258     | 0.9932     | 0.9932     | 0.4061
0.14       | 0.7807     | 0.6433     | 0.9927     | 0.9927     | 0.4495
0.15       | 0.7934     | 0.6615     | 0.9910     | 0.9910     | 0.4929
0.16       | 0.8077     | 0.6823     | 0.9897     | 0.9897     | 0.5391
0.17       | 0.8184     | 0.6986     | 0.9878     | 0.9878     | 0.5738
0.18       | 0.8299     | 0.7165     | 0.9859     | 0.9859     | 0.6099
0.19       | 0

## 4. Aggregate Results
Save the results to a CSV file, display the comparison plot, and print the best setting for each model.

In [6]:
summary_df = pd.DataFrame(all_summary_data)
summary_df.to_csv('cv_results.csv', index=False)
print("\nDetailed results saved to 'cv_results.csv'")

model_utils.plot_comparison(summary_df)
model_utils.print_best_settings(summary_df)


Detailed results saved to 'cv_results.csv'
Generating comparison plot...
Graph saved to 'performance_comparison.png'

 BEST SETTINGS PER MODEL 
Model: all-mpnet-base-v2
  Optimal Threshold: 0.35
  Max F1 Score:      0.9375
  Precision:         0.9469
  Recall:            0.9284
----------------------------------------
Model: dwulff/mpnet-personality
  Optimal Threshold: 0.24
  Max F1 Score:      0.9411
  Precision:         0.9366
  Recall:            0.9457
----------------------------------------
Model: allenai/scibert_scivocab_uncased
  Optimal Threshold: 0.75
  Max F1 Score:      0.8118
  Precision:         0.8338
  Recall:            0.7908
----------------------------------------
Model: bert-base-uncased
  Optimal Threshold: 0.69
  Max F1 Score:      0.7702
  Precision:         0.7834
  Recall:            0.7575
----------------------------------------
