# **Term Extraction Ensemble (BERT + spaCy + Dictionary)**

This notebook implements a complete post-processing pipeline for the ATE-IT Subtask A (Automatic Term Extraction).  
It takes the raw predictions from a fine-tuned **BERT token classification model** and combines them with **spaCy noun-chunk spans** and a **gold-derived domain vocabulary** to produce a higher-quality list of domain terms for each sentence.

### Pipeline Summary
1. **Load BERT and spaCy predictions**  
   - Import model outputs in ATE-IT JSON format.  
   - Map predictions to sentence identifiers for easy lookup.

2. **Normalize and clean BERT terms**  
   - Strip boundary punctuation, unify quotes, lowercase, collapse whitespace.  
   - Filter out spurious or generic one-word candidates.

3. **Build a domain vocabulary from the gold training set**  
   - Normalize gold terms.  
   - Optionally count term frequencies to separate strong (seen at least 3 times) from weak (seen once) terms; the merge step itself uses only the plain vocabulary.

4. **Merge BERT + spaCy + Dictionary knowledge**  
   - **Upgrade** short BERT terms to longer spaCy spans when they form a valid multi-word expression present in the gold vocabulary.  
   - **Add** additional spaCy multi-word spans only if they appear in the gold vocabulary.  
   - **Filter out** generic, meaningless, or uninformative unigrams.  
   - **Normalize and deduplicate** final terms.

5. **Generate final ensemble predictions**  
   - For each sentence, produce an improved term list combining all signals.  
   - Output saved in ATE-IT JSON format.
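
The record layout below is inferred from the fields this notebook reads (`document_id`, `paragraph_id`, `sentence_id`, `sentence_text`, `term_list`); the values are invented for illustration, and the official ATE-IT files may carry additional fields. A minimal sketch of how predictions are keyed by sentence:

```python
# Toy records in the layout this notebook expects; all values are invented.
pred_json = {
    "data": [
        {"document_id": "doc_example_01", "paragraph_id": 1, "sentence_id": 0,
         "sentence_text": "Il centro di raccolta è aperto.",
         "term_list": ["centro di raccolta"]},
        {"document_id": "doc_example_01", "paragraph_id": 1, "sentence_id": 1,
         "sentence_text": "Orari: 9-12.",
         "term_list": None},  # a missing or null list is treated as empty
    ]
}

# Key each sentence by (document_id, paragraph_id, sentence_id) for direct lookup.
term_map = {
    (e["document_id"], e["paragraph_id"], e["sentence_id"]): e.get("term_list") or []
    for e in pred_json["data"]
}

print(term_map[("doc_example_01", 1, 0)])
print(term_map[("doc_example_01", 1, 1)])
```

Keying on the full triple keeps the BERT and spaCy predictions aligned even if the two files list sentences in a different order.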

### Goal
The notebook aims to improve both recall and precision of automatic term extraction by combining:
- contextual predictions (BERT),
- linguistic structure (spaCy),
- and domain consistency (gold vocabulary).

The hybrid ensemble is expected to outperform each component on its own; the dev-set evaluation at the end of the notebook reports its micro-averaged and type-level scores.


In [29]:
import json, os
def load_json(path: str):
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
    

def save_json(obj, path: str):
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, ensure_ascii=False, indent=2)
    print(f"✓ Saved cleaned predictions to {path}")

In [30]:
DATA_DIR = "../data/"
PRED_DIR = "../src/predictions/"

TRAIN_FILE = os.path.join(DATA_DIR, "subtask_a_train.json")
DEV_FILE = os.path.join(DATA_DIR, "subtask_a_dev.json")

# BERT_DEV_PRED_FILE = os.path.join(
#     PRED_DIR, "subtask_a_dev_bert_token_classification_preds_clean.json"
# )
BERT_DEV_PRED_FILE = os.path.join(
    PRED_DIR, "subtask_a_dev_bert_preds_2e-5_changed_cleaned.json"
)
SPACY_DEV_PRED_FILE = os.path.join(
    PRED_DIR, "subtask_a_dev_spacy_trained_preds.json"
)

ENSEMBLE_OUT_FILE = os.path.join(
    PRED_DIR, "subtask_a_dev_ensemble_bert_2e-5_changed_spacy_dictfilter2.json"
)

os.makedirs(PRED_DIR, exist_ok=True)

In [31]:
print(BERT_DEV_PRED_FILE)
print(SPACY_DEV_PRED_FILE)

../src/predictions/subtask_a_dev_bert_preds_2e-5_changed_cleaned.json
../src/predictions/subtask_a_dev_spacy_trained_preds.json


In [32]:
import re
import unicodedata

def norm(t: str) -> str:
    """
    Canonical normalization used everywhere for:
      - train vocabulary
      - predictions
      - matching / dedup
    """
    if not t:
        return ""
    t = t.lower()
    t = unicodedata.normalize("NFKC", t)
    t = t.replace("’", "'").replace("`", "'")
    t = t.replace("“", '"').replace("”", '"')
    t = " ".join(t.split())
    # strip punctuation at boundaries
    t = t.strip(".,;:-'\"()[]{}")
    return t
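
A few sample inputs make the effect of `norm` concrete. The function is repeated here in condensed form only so that the snippet runs on its own; the sample strings are invented. Note that punctuation is stripped only at the boundaries of the string, so a comma in the middle survives.

```python
import unicodedata

def norm(t: str) -> str:
    # Condensed copy of the notebook's `norm`, repeated so this runs standalone.
    if not t:
        return ""
    t = unicodedata.normalize("NFKC", t.lower())
    t = t.replace("’", "'").replace("`", "'")
    t = t.replace("“", '"').replace("”", '"')
    t = " ".join(t.split())
    return t.strip(".,;:-'\"()[]{}")

examples = ["  Centro   di Raccolta. ", "“R.A.E.E.”", "l’umido", "raccolta, rifiuti", ""]
normalized = [norm(s) for s in examples]
for raw, out in zip(examples, normalized):
    print(repr(raw), "->", repr(out))
```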


### Train vocabulary from gold terms

In [33]:


def build_train_vocab(train_data: dict) -> set:
    """
    Build a normalized vocabulary of gold terms from the training set.
    Each term is normalized with `norm`.
    """
    vocab = set()
    for entry in train_data["data"]:
        for term in entry.get("term_list", []):
            n = norm(term)
            if n:
                vocab.add(n)
    return vocab


def build_term_map(pred_json: dict) -> dict:
    """
    Build a mapping:
        (document_id, paragraph_id, sentence_id) -> list of predicted terms
    from a prediction JSON in the ATE-IT format.
    """
    m = {}
    for e in pred_json["data"]:
        key = (e["document_id"], e["paragraph_id"], e["sentence_id"])
        m[key] = e.get("term_list", []) or []
    return m

In [34]:
from collections import Counter

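def build_train_vocab_with_freq(train_data):
    """
    Count normalized gold terms and split them by frequency:
      - strong: seen at least 3 times
      - weak:   seen exactly once
    Terms seen exactly twice fall into neither set.
    """
    freq = Counter()
    for e in train_data["data"]:
        for term in e.get("term_list", []):
            n = norm(term)  # a local named `norm` would shadow the function
            if n:
                freq[n] += 1
    strong = {t for t, c in freq.items() if c >= 3}
    weak   = {t for t, c in freq.items() if c == 1}
    return freq, strong, weak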
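
The thresholds above can be illustrated on toy data. This is a sketch with invented term lists that are assumed to be already normalized; it only demonstrates how the `>= 3` and `== 1` cut-offs partition the vocabulary, including the terms left in between.

```python
from collections import Counter

# Toy gold term lists, one per sentence (invented, assumed already normalized).
term_lists = [
    ["raccolta differenziata", "tari"],
    ["raccolta differenziata"],
    ["raccolta differenziata", "isola ecologica"],
    ["tari"],
]

freq = Counter(t for terms in term_lists for t in terms)
strong = {t for t, c in freq.items() if c >= 3}
weak = {t for t, c in freq.items() if c == 1}
middle = set(freq) - strong - weak  # count == 2: neither strong nor weak

print("strong:", strong)
print("weak:  ", weak)
print("middle:", middle)
```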


### Generic / acronyms handling

In [35]:
GENERIC_HEADS = {
    "rifiuti", "materiali", "utenti", "plastica", "carta",
    "residui", "tariffe", "gestore", "servizio", "modalità",
    "conferimento", "costi", "parte", "quota", "impianto"
}
GENERIC_BAD = {
    "parte", "gestione", "città", "territorio", "comune",
    "ore", "no", "si", "anno", "mese", "giorno"
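}


def looks_like_acronym(n: str) -> bool:
    """
    Heuristic for acronyms:
      - remove dots
      - length 2–6
      - alphanumeric with at least one letter
    E.g. 'tmb', 'raee', 'r.a.e.e', 'tari'.

    Because terms are lowercased before this check, any short alphanumeric
    word (e.g. 'carta', 'parte') also passes; the test is about shape, not case.
    """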
    n_clean = n.replace(".", "")
    if not (2 <= len(n_clean) <= 6):
        return False
    # at least one letter
    has_letter = any(ch.isalpha() for ch in n_clean)
    if not has_letter:
        return False
    # letters and digits are allowed (for things like R1)
    if not all(ch.isalnum() for ch in n_clean):
        return False
    return True
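
The heuristic is easiest to judge on concrete tokens. The sketch below repeats it in condensed form so that it runs standalone; the token list is invented. It shows that short ordinary words pass the check as well, which means that entries of `GENERIC_HEADS` with six characters or fewer are never dropped by the acronym exception in the filter that follows.

```python
def looks_like_acronym(n: str) -> bool:
    # Condensed copy of the notebook's heuristic.
    n_clean = n.replace(".", "")
    return (
        2 <= len(n_clean) <= 6
        and any(ch.isalpha() for ch in n_clean)
        and all(ch.isalnum() for ch in n_clean)
    )

tokens = ["raee", "r.a.e.e", "r1", "carta", "utenti", "conferimento", "2024"]
flags = {tok: looks_like_acronym(tok) for tok in tokens}
for tok, flag in flags.items():
    print(f"{tok:14s} {flag}")
```

If this turns out to hurt precision, one possible refinement (not part of this notebook) would be to run the check on the original casing before lowercasing.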

In [36]:
def filter_generic_unigrams(terms, train_vocab_norm):
    """
    Filter out generic one-word heads ONLY if:
      - they are in GENERIC_HEADS
      - they do NOT appear in the gold vocabulary
      - they do NOT look like acronyms
    All other unigrams are kept.
    """
    filtered = []
    for t in terms:
        n = norm(t)
        tokens = n.split()
        if len(tokens) == 1:
            if n in GENERIC_HEADS and n not in train_vocab_norm and not looks_like_acronym(n):
                # drop generic heads not validated by gold vocab
                continue
        filtered.append(t)
    return filtered

### Multiword upgrade logic

In [37]:
def contains_as_subspan(longer: str, shorter: str) -> bool:
    """
    Check if `shorter` (already normed) appears as a contiguous
    token subsequence inside `longer` (already normed).
    """
    long_tokens = longer.split()
    short_tokens = shorter.split()
    L, S = len(long_tokens), len(short_tokens)
    if S > L or S == 0:
        return False
    for i in range(L - S + 1):
        if long_tokens[i:i+S] == short_tokens:
            return True
    return False
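
A short usage sketch of the sub-span check, with the helper repeated in condensed form so the snippet is self-contained. The inputs are invented and assumed to be normalized already. The match is on whole tokens and must be contiguous, so neither a skipped word nor a partial word counts.

```python
def contains_as_subspan(longer: str, shorter: str) -> bool:
    # Condensed copy of the notebook helper: contiguous token-level containment.
    long_tokens, short_tokens = longer.split(), shorter.split()
    L, S = len(long_tokens), len(short_tokens)
    if S == 0 or S > L:
        return False
    return any(long_tokens[i:i + S] == short_tokens for i in range(L - S + 1))

span = "centri di raccolta comunali"
contiguous = contains_as_subspan(span, "centri di raccolta")   # tokens 0..2 match
with_gap = contains_as_subspan(span, "centri raccolta")        # "di" is skipped
partial_word = contains_as_subspan(span, "raccolt")            # substring, not a token

print(contiguous, with_gap, partial_word)
```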

In [38]:
def upgrade_with_longer_spacy(bert_terms, spacy_terms, train_vocab_norm):
    """
    Upgrade BERT terms to longer spaCy spans ONLY WHEN BENEFICIAL.

    For each BERT term:
      - search among spaCy spans:
          * multiword (len >= 2)
          * present in train_vocab_norm
          * whose tokens contain the BERT term tokens as a subspan
      - pick the longest such span if any, otherwise keep BERT term.

    This is what helps recover multiword false negatives such as
    'materiali ferrosi', 'modalità di conferimento', etc.
    """
    final = []
    seen = set()
    
    spacy_norm_map = {norm(t): t for t in (spacy_terms or [])}

    for b in bert_terms or []:
        b_norm = norm(b)
        if not b_norm or b_norm in GENERIC_BAD:
            continue

        best = None

        for s_norm, s in spacy_norm_map.items():
            # only multiword spaCy spans
            if len(s_norm.split()) < 2:
                continue
            # keep only spans that appear as gold terms
            if s_norm not in train_vocab_norm:
                continue
            # check if BERT term tokens appear inside the spaCy span
            if contains_as_subspan(s_norm, b_norm):
                if best is None or len(s_norm.split()) > len(norm(best).split()):
                    best = s

        chosen = best if best else b
        c_norm = norm(chosen)

        if c_norm not in seen and c_norm not in GENERIC_BAD:
            final.append(chosen)
            seen.add(c_norm)
    return final
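
The selection rule at the heart of the upgrade step can be sketched in isolation. `pick_upgrade` is a hypothetical helper introduced only for this illustration; it assumes all strings are already normalized and leaves out the `GENERIC_BAD` filtering and deduplication done by the full function. The vocabulary and spans are invented.

```python
def contains_as_subspan(longer: str, shorter: str) -> bool:
    # Token-level containment, as in the notebook helper of the same name.
    long_tokens, short_tokens = longer.split(), shorter.split()
    S = len(short_tokens)
    return S > 0 and any(
        long_tokens[i:i + S] == short_tokens
        for i in range(len(long_tokens) - S + 1)
    )

def pick_upgrade(bert_term, spacy_spans, gold_vocab):
    # Hypothetical helper: longest gold-validated multiword span containing the term.
    candidates = [
        s for s in spacy_spans
        if len(s.split()) >= 2 and s in gold_vocab and contains_as_subspan(s, bert_term)
    ]
    # Fall back to the BERT term when no span qualifies.
    return max(candidates, key=lambda s: len(s.split()), default=bert_term)

gold_vocab = {"materiali ferrosi", "modalità di conferimento"}
spacy_spans = ["materiali ferrosi", "materiali ferrosi e non"]

upgraded = pick_upgrade("materiali", spacy_spans, gold_vocab)
unchanged = pick_upgrade("conferimento", spacy_spans, gold_vocab)
print(upgraded)
print(unchanged)
```

The longer span `materiali ferrosi e non` is ignored because it is not a gold term, which is what keeps the upgrade from drifting toward over-long noun chunks.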



### Dictionary-based multiword mining (train vocab)


In [39]:

def find_dict_multiwords_in_sentence(sentence_text: str, train_vocab_norm: set):
    """
    Use the train gold vocabulary as a gazetteer:
    - normalize sentence_text
    - return all multiword terms from train_vocab_norm whose tokens
      appear as a contiguous subsequence in the sentence.

    Matching is on whitespace tokens, so a term that touches
    sentence-internal punctuation (e.g. "raccolta,") is not found.
    """
    sent_norm = norm(sentence_text)
    results = []

    for term in train_vocab_norm:
        # focus on multiword only
        if len(term.split()) < 2:
            continue
        if contains_as_subspan(sent_norm, term):
            results.append(term)

    # sort so the output order does not depend on set iteration order
    return sorted(results)
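
Because the gazetteer lookup splits on whitespace, punctuation attached to a word blocks a match. The sketch below contrasts that behaviour with a hypothetical alternative that tokenizes on word characters using `re.findall(r"\w+", ...)`. This alternative is not what the notebook does; `token_subspan`, the sentence, and the vocabulary are all invented for illustration.

```python
import re

def token_subspan(long_tokens, short_tokens):
    # Contiguous containment on pre-tokenized lists (hypothetical helper).
    S = len(short_tokens)
    return S > 0 and any(
        long_tokens[i:i + S] == short_tokens
        for i in range(len(long_tokens) - S + 1)
    )

# Lowercased sentence with boundary punctuation already stripped.
sentence = "la raccolta differenziata, il centro di raccolta e l'isola ecologica"
vocab = {"raccolta differenziata", "centro di raccolta", "isola ecologica"}

# Whitespace tokens keep attached punctuation: "differenziata," != "differenziata".
ws_tokens = sentence.split()
ws_hits = sorted(t for t in vocab if token_subspan(ws_tokens, t.split()))

# Alternative: keep only runs of word characters as tokens.
re_tokens = re.findall(r"\w+", sentence)
re_hits = sorted(t for t in vocab if token_subspan(re_tokens, t.split()))

print("whitespace:", ws_hits)
print("regex:     ", re_hits)
```

A regex tokenizer would also split vocabulary entries that contain punctuation themselves (for example `r.a.e.e`), so terms and sentences would need to go through the same tokenizer before comparison.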

### Merge BERT + spaCy + dictionary

In [40]:
def merge_bert_spacy_with_dict(bert_terms, spacy_terms, train_vocab_norm):
    """
    Ensemble strategy:
      1) Upgrade BERT spans to longer valid spaCy spans when possible.
      2) Add extra spaCy multiword spans if:
           - multiword
           - present in train gold vocabulary
           - not already included (by normalized form)
           - not in GENERIC_BAD
    """
    upgraded = upgrade_with_longer_spacy(
        bert_terms=bert_terms,
        spacy_terms=spacy_terms,
        train_vocab_norm=train_vocab_norm,
    )

    final = upgraded[:]
    seen = {norm(t) for t in upgraded}

    for s in spacy_terms or []:
        s_norm = norm(s)

        # only multiword additions here
        if len(s_norm.split()) < 2:
            continue
        if s_norm not in train_vocab_norm:
            continue
        if s_norm in seen:
            continue
        if s_norm in GENERIC_BAD:
            continue

        final.append(s)
        seen.add(s_norm)
    return final

In [41]:
def merge_sentence(bert_terms, spacy_terms, train_vocab_norm, sentence_text: str):
    """
    High-level per-sentence merge:
      - combine BERT and spaCy with vocab constraints
      - add dictionary multiwords that appear in the sentence
      - filter generic unigrams (but keep all terms seen in gold)
      - normalize and deduplicate.
    """
    # 1) merge BERT + spaCy
    merged = merge_bert_spacy_with_dict(
        bert_terms=bert_terms,
        spacy_terms=spacy_terms,
        train_vocab_norm=train_vocab_norm,
    )

    # 2) gazetteer: add multiword from train vocab that appear in the sentence
    dict_terms = find_dict_multiwords_in_sentence(sentence_text, train_vocab_norm)
    merged.extend(dict_terms)

    # 3) remove only truly generic unigrams
    merged = filter_generic_unigrams(merged, train_vocab_norm)

    # 4) dedupe by normalized form
    seen = set()
    final = []
    for t in merged:
        n = norm(t)
        if n not in seen:
            final.append(n)   # output normalized term
            seen.add(n)
    return final


In [42]:
                   
def micro_f1_score(gold_standard, system_output):
  """
  Evaluates a term extraction system's performance using Precision, Recall,
  and F1 score based on individual term matching (micro-average).

  Args:
    gold_standard: A list of lists, where each inner list contains the
        gold standard terms for an item.
    system_output: A list of lists, where each inner list contains the
                   terms extracted by the system for the corresponding item.

  Returns:
    A tuple containing the Precision, Recall, and F1 score.
  """
  total_true_positives = 0
  total_false_positives = 0
  total_false_negatives = 0

  # Iterate through each item's gold standard and system output terms
  for gold, system in zip(gold_standard, system_output):
    # Convert to sets for efficient comparison
    gold_set = set(gold)
    system_set = set(system)

    # Calculate True Positives, False Positives, and False Negatives for the current item
    true_positives = len(gold_set.intersection(system_set))
    false_positives = len(system_set - gold_set)
    false_negatives = len(gold_set - system_set)

    # Accumulate totals across all items
    total_true_positives += true_positives
    total_false_positives += false_positives
    total_false_negatives += false_negatives

  # Calculate Precision, Recall, and F1 score (micro-average)
  precision = total_true_positives / (total_true_positives + total_false_positives) if (total_true_positives + total_false_positives) > 0 else 0
  recall = total_true_positives / (total_true_positives + total_false_negatives) if (total_true_positives + total_false_negatives) > 0 else 0
  f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

  return precision, recall, f1

In [43]:
def type_f1_score(gold_standard, system_output):
  """
  Evaluates a term extraction system's performance using Type Precision,
  Type Recall, and Type F1 score based on the set of unique terms extracted
  at least once across the entire dataset.

  Args:
    gold_standard: A list of lists, where each inner list contains the
                   gold standard terms for an item.
    system_output: A list of lists, where each inner list contains the
                   terms extracted by the system for the corresponding item.

  Returns:
    A tuple containing the Type Precision, Type Recall, and Type F1 score.
  """

  # Get the set of all unique gold standard terms across the dataset
  all_gold_terms = set()
  for item_terms in gold_standard:
    all_gold_terms.update(item_terms)

  # Get the set of all unique system extracted terms across the dataset
  all_system_terms = set()
  for item_terms in system_output:
    all_system_terms.update(item_terms)

  # Calculate True Positives (terms present in both sets)
  type_true_positives = len(all_gold_terms.intersection(all_system_terms))

  # Calculate False Positives (terms in system output but not in gold standard)
  type_false_positives = len(all_system_terms - all_gold_terms)

  # Calculate False Negatives (terms in gold standard but not in system output)
  type_false_negatives = len(all_gold_terms - all_system_terms)

  # Calculate Type Precision, Type Recall, and Type F1 score
  type_precision = type_true_positives / (type_true_positives + type_false_positives) if (type_true_positives + type_false_positives) > 0 else 0
  type_recall = type_true_positives / (type_true_positives + type_false_negatives) if (type_true_positives + type_false_negatives) > 0 else 0
  type_f1 = 2 * (type_precision * type_recall) / (type_precision + type_recall) if (type_precision + type_recall) > 0 else 0

  return type_precision, type_recall, type_f1
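
The difference between the two metrics shows up clearly on a tiny example. The sketch below uses condensed re-implementations (`prf`, `micro_prf`, and `type_prf` are hypothetical names) and invented data: a term predicted in the wrong sentence is a false positive for the micro score but still counts as a correct type.

```python
def prf(tp, fp, fn):
    # Precision, recall, F1 from raw counts, with zero-division guards.
    p = tp / (tp + fp) if tp + fp else 0
    r = tp / (tp + fn) if tp + fn else 0
    f1 = 2 * p * r / (p + r) if p + r else 0
    return p, r, f1

def micro_prf(gold, pred):
    # Counts are accumulated sentence by sentence.
    tp = fp = fn = 0
    for g, s in zip(gold, pred):
        g, s = set(g), set(s)
        tp += len(g & s)
        fp += len(s - g)
        fn += len(g - s)
    return prf(tp, fp, fn)

def type_prf(gold, pred):
    # Counts are taken over the sets of unique terms in the whole dataset.
    g = set().union(*gold)
    s = set().union(*pred)
    return prf(len(g & s), len(s - g), len(g - s))

gold = [["tari", "raccolta differenziata"], ["tari"], ["isola ecologica"]]
pred = [["tari"], ["tari", "utenze"], ["tari"]]

micro = micro_prf(gold, pred)
by_type = type_prf(gold, pred)
print("micro:", micro)
print("type: ", by_type)
```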

### Build the BERT + spaCy ensemble with merge_sentence()

In [44]:
from tqdm import tqdm
import json

# ---- Load train data and build vocabulary ----
with open(TRAIN_FILE, "r", encoding="utf-8") as f:
    train_data = json.load(f)

train_vocab_norm = build_train_vocab(train_data)
print(f"# unique normalized terms from train gold: {len(train_vocab_norm)}")

# ---- Load dev gold (for evaluation) ----
with open(DEV_FILE, "r", encoding="utf-8") as f:
    dev_data = json.load(f)

# ---- Load BERT and spaCy predictions ----
with open(BERT_DEV_PRED_FILE, "r", encoding="utf-8") as f:
    bert_pred = json.load(f)

with open(SPACY_DEV_PRED_FILE, "r", encoding="utf-8") as f:
    spacy_pred = json.load(f)

# Convert JSON predictions → dict[(doc,par,sent)] → [terms...]
bert_map = build_term_map(bert_pred)
spacy_map = build_term_map(spacy_pred)

# ---- Build ensemble predictions using merge_sentence ----
ensemble_output = {"data": []}

print("Building improved BERT+spaCy ensemble ...")

for idx, row in enumerate(tqdm(dev_data["data"])):

    key = (row["document_id"], row["paragraph_id"], row["sentence_id"])

    bert_terms = bert_map.get(key, []) or []
    spacy_terms = spacy_map.get(key, []) or []

    merged_terms = merge_sentence(
        bert_terms=bert_terms,
        spacy_terms=spacy_terms,
        train_vocab_norm=train_vocab_norm,
        sentence_text=row["sentence_text"],
    )

    # Debug on first 3
    if idx < 3:
        print("\n---------------------------------------")
        print("Sentence", idx)
        print("TEXT:", row["sentence_text"])
        print("  BERT  :", bert_terms)
        print("  SPACY :", spacy_terms)
        print("  MERGED:", merged_terms)

    ensemble_output["data"].append({
        "document_id": row["document_id"],
        "paragraph_id": row["paragraph_id"],
        "sentence_id": row["sentence_id"],
        "term_list": merged_terms,
    })






# unique normalized terms from train gold: 710
Building improved BERT+spaCy ensemble ...


  0%|          | 0/577 [00:00<?, ?it/s]


---------------------------------------
Sentence 0
TEXT: Non Domestica; CAMPEGGI, DISTRIBUTORI CARBURANTI, PARCHEGGI; 1,22; 4,73 
  BERT  : []
  SPACY : []
  MERGED: []

---------------------------------------
Sentence 1
TEXT: Il presente disciplinare per la gestione dei centri di raccolta comunali è stato redatto ai sensi e per effetto del DM 13/05/2009, pubblicato sulla G.U. n. 165 del 18/07/2009, con il quale sono state apportate le modifiche sostanziali al DM 08/04/2008, Disciplina dei centri di raccolta dei rifiuti urbani raccolti in modo differenziato, come previsto dall'art. 183, comma 7, lettera cc) del Dlgs 3 aprile 2006, n. 152, e ss.mm.ii.
  BERT  : ['disciplinare per la gestione dei centri di raccolta comunali', 'centri di raccolta dei rifiuti urbani raccolti in']
  SPACY : ['gestione dei centri di raccolta comunali', 'centri di raccolta dei rifiuti urbani raccolti']
  MERGED: ['disciplinare per la gestione dei centri di raccolta comunali', 'centri di raccolta dei rifiuti 

100%|██████████| 577/577 [00:00<00:00, 941.62it/s]


#### Save predictions

In [45]:
# ---- Save final merged predictions ----
with open(ENSEMBLE_OUT_FILE, "w", encoding="utf-8") as f:
    json.dump(ensemble_output, f, ensure_ascii=False, indent=2)

print(f"\nEnsemble predictions saved to: {ENSEMBLE_OUT_FILE}")



Ensemble predictions saved to: ../src/predictions/subtask_a_dev_ensemble_bert_2e-5_changed_spacy_dictfilter2.json


In [46]:

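# Extract gold + predicted lists.
# Predictions were normalized in merge_sentence; gold terms are compared as-is,
# so a gold term that is not already in normalized form cannot match.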
dev_gold = [entry["term_list"] for entry in dev_data["data"]]
ensemble_preds = [entry["term_list"] for entry in ensemble_output["data"]]

precision, recall, f1 = micro_f1_score(dev_gold, ensemble_preds)
type_precision, type_recall, type_f1 = type_f1_score(dev_gold, ensemble_preds)

print("\n=====================================================")
print("    IMPROVED BERT + SPACY + DICTIONARY MERGE")
print("=====================================================")

print("\nMicro-averaged Metrics:")
print(f"  Precision: {precision:.4f}")
print(f"  Recall:    {recall:.4f}")
print(f"  F1 Score:  {f1:.4f}")

print("\nType-level Metrics:")
print(f"  Type Precision: {type_precision:.4f}")
print(f"  Type Recall:    {type_recall:.4f}")
print(f"  Type F1 Score:  {type_f1:.4f}")


    IMPROVED BERT + SPACY + DICTIONARY MERGE

Micro-averaged Metrics:
  Precision: 0.6875
  Recall:    0.7561
  F1 Score:  0.7202

Type-level Metrics:
  Type Precision: 0.6667
  Type Recall:    0.7025
  Type F1 Score:  0.6841
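
As a sanity check on the numbers above, F1 is the harmonic mean of precision and recall, so both reported F1 scores can be recomputed by hand from the printed precision and recall:

$$
F_1 = \frac{2PR}{P + R}
$$

$$
F_1^{\text{micro}} = \frac{2 \cdot 0.6875 \cdot 0.7561}{0.6875 + 0.7561} \approx 0.7202,
\qquad
F_1^{\text{type}} = \frac{2 \cdot 0.6667 \cdot 0.7025}{0.6667 + 0.7025} \approx 0.6841
$$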


In [47]:
import pandas as pd
def get_fp_fn_from_listformat(gold_entries, pred_entries):
    """
    gold_entries: list of rows from dev_data["data"]
    pred_entries: list of rows from ensemble_output["data"]
    
    Each entry has:
        - document_id
        - paragraph_id
        - sentence_id
        - term_list (list of terms)
    
    Returns DataFrames:
        fp_df (false positives)
        fn_df (false negatives)
    """

    gold_rows = []
    pred_rows = []

    # --- Expand GOLD ---
    for e in gold_entries:
        doc = e["document_id"]
        par = e["paragraph_id"]
        sid = e["sentence_id"]
        for t in e["term_list"]:
            t_norm = norm(t)
            if t_norm:
                gold_rows.append((doc, par, sid, t_norm))

    # --- Expand PRED ---
    for e in pred_entries:
        doc = e["document_id"]
        par = e["paragraph_id"]
        sid = e["sentence_id"]
        for t in e["term_list"]:
            t_norm = norm(t)
            if t_norm:
                pred_rows.append((doc, par, sid, t_norm))

    gold_set = set(gold_rows)
    pred_set = set(pred_rows)

    fp = pred_set - gold_set
    fn = gold_set - pred_set

    fp_df = pd.DataFrame(list(fp),
                         columns=["document_id", "paragraph_id", "sentence_id", "term"])
    fn_df = pd.DataFrame(list(fn),
                         columns=["document_id", "paragraph_id", "sentence_id", "term"])

    return fp_df, fn_df


In [48]:
gold_entries = dev_data["data"]      # gold JSON
pred_entries = ensemble_output["data"]  # merged predictions JSON

fp_df, fn_df = get_fp_fn_from_listformat(gold_entries, pred_entries)

print("False Positives:", len(fp_df))
print("False Negatives:", len(fn_df))

display(fp_df.tail(20))
display(fn_df.tail(20))


False Positives: 155
False Negatives: 110


| | document_id | paragraph_id | sentence_id | term |
|---|---|---|---|---|
| 135 | doc_battipaglia_13 | 20 | 0 | manutenzione verde pubblico |
| 136 | doc_salerno_06 | 64 | 1 | utenze |
| 137 | doc_santegidiodelmontealbino_03 | 12 | 2 | esporre |
| 138 | doc_salerno_05 | 7 | 6 | vanno conferiti |
| 139 | doc_praiano_07 | 21 | 1 | superficie ka |
| 140 | doc_sorrento_10 | 28 | 2 | parte variabile |
| 141 | doc_capaccio_28 | 14 | 5 | concorrente |
| 142 | doc_sorrento_05 | 3 | 0 | gestione dei rifiuti |
| 143 | doc_poggiomarino_12 | 17 | 5 | carta chimica |
| 144 | doc_salerno_06 | 27 | 1 | svuotamento |


| | document_id | paragraph_id | sentence_id | term |
|---|---|---|---|---|
| 90 | doc_caserta_02 | 64 | 9 | r.a.e.e |
| 91 | doc_nocerainferiore_06 | 4 | 0 | campana |
| 92 | doc_praiano_05 | 10 | 0 | raccolta/ritiro |
| 93 | doc_sorrento_10 | 60 | 1 | ka |
| 94 | doc_santegidiodelmontealbino_03 | 15 | 0 | allumino |
| 95 | doc_salerno_06 | 12 | 5 | interruzione programmata del servizio |
| 96 | doc_marigliano_01 | 9 | 1 | abbandono di rifiuti non pericolosi e non ingo... |
| 97 | doc_santegidiodelmontealbino_03 | 48 | 0 | conferire |
| 98 | doc_salerno_06 | 12 | 15 | autorizzazione unica ambientale |
| 99 | doc_auletta_13 | 36 | 1 | gestore dello spazzamento e lavaggio delle strade |


In [49]:
import pandas as pd

def build_sentence_error_table(dev_data, pred_data):
    """
    Build a table with one row per dev sentence that has at least one error
    (a gold term that was missed or a predicted term that is not in gold).
    Terms are compared after `norm()`; sentences are keyed by
    (document_id, paragraph_id, sentence_id).
    """

    # ---- Build gold map and sentence-text lookup ----
    gold_map = {}
    text_map = {}
    for e in dev_data["data"]:
        key = (e["document_id"], e["paragraph_id"], e["sentence_id"])
        gold_map[key] = set(norm(t) for t in e.get("term_list", []))
        text_map.setdefault(key, e["sentence_text"])

    # ---- Build pred map ----
    pred_map = {}
    for e in pred_data["data"]:
        key = (e["document_id"], e["paragraph_id"], e["sentence_id"])
        pred_map[key] = set(norm(t) for t in e.get("term_list", []))

    # ---- Build rows ----
    rows = []
    for key in gold_map:
        doc, par, sent = key
        gold_set = gold_map[key]
        pred_set = pred_map.get(key, set())

        missing = gold_set - pred_set
        extra   = pred_set - gold_set

        if missing or extra:
            rows.append({
                "document_id": doc,
                "paragraph_id": par,
                "sentence_id": sent,
                "n_gold": len(gold_set),
                "n_pred": len(pred_set),
                "n_missing": len(missing),
                "missing_terms": sorted(missing),
                "n_extra": len(extra),
                "extra_terms": sorted(extra),
                # direct lookup instead of rescanning dev_data for every row
                "sentence_text": text_map[key],
            })

    df = pd.DataFrame(rows)
    return df
errors_df = build_sentence_error_table(dev_data, ensemble_output)
print(f"Error table shape: {errors_df.shape}")
errors_df.head(20)


Error table shape: (132, 10)


| | document_id | paragraph_id | sentence_id | n_gold | n_pred | n_missing | missing_terms | n_extra | extra_terms | sentence_text |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | doc_caserta_06 | 3 | 1 | 2 | 8 | 1 | [disciplina dei centri di raccolta dei rifiuti... | 7 | [centri di raccolta, centri di raccolta comuna... | Il presente disciplinare per la gestione dei c... |
| 1 | doc_poggiomarino_01 | 6 | 1 | 1 | 0 | 1 | [raccolta] | 0 | [] | È un Servizio Supplementare di raccolta, rivol... |
| 2 | doc_nola_05 | 2 | 2 | 2 | 5 | 0 | [] | 3 | [raccolta dei rifiuti, servizio di raccolta, s... | ll servizio di raccolta dei rifiuti derivanti ... |
| 3 | doc_poggiomarino_12 | 17 | 4 | 0 | 1 | 0 | [] | 1 | [carta] | - giornali; - la carta per alimenti; |
| 4 | doc_capaccio_10 | 3 | 3 | 2 | 1 | 1 | [sacchetto trasparente] | 0 | [] | MULTIMATERIALE; Sacchetto blu trasparente; Lun... |
| 5 | doc_salerno_05 | 11 | 2 | 1 | 2 | 0 | [] | 1 | [tessuto] | Indumenti usati, accessori, lenzuola, coperte,... |
| 6 | doc_capaccio_15 | 7 | 1 | 1 | 2 | 0 | [] | 1 | [utenze turistiche] | UTENZE TURISTICHE NON DOMESTICHE |
| 7 | doc_caserta_06 | 6 | 2 | 1 | 1 | 1 | [gestione del centro di raccolta] | 1 | [centro di raccolta] | - alla vigilanza nel rispetto delle norme del ... |
| 8 | doc_poggiomarino_02 | 15 | 0 | 1 | 2 | 0 | [] | 1 | [secco residuale] | Secco residuale non riciclabile |
| 9 | doc_capaccio_15 | 5 | 12 | 1 | 3 | 0 | [] | 2 | [accumulatori al piombo, utenze domestiche] | PILE PORTATILI, BATTERIE E ACCUMULATORI AL PIO... |
