<a href="https://colab.research.google.com/github/fani-lab/OpeNTF/blob/main/ipynb/opentf.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Extension to New Domains and Datasets


## Structure and Inheritance

### Dataset Structure
<p align="center"><img src='https://github.com/fani-lab/OpeNTF/blob/main/ipynb/src/cmn/dataset_hierarchy.png?raw=1' width="500" ></p>

To integrate a new dataset into the baseline, subclass `Team` and follow its structure. As in the existing derived classes, a subclass can add its own fields. Ideally, only the `read_data()` function should be overridden.



In [None]:
import json
from cmn.member import Member
from cmn.team import Team

class Review(Team):
    def __init__(self, id, title, year, fos, reviewers):
        super().__init__(id, reviewers, fos, year)
        self.title = title

    @staticmethod
    def read_data(datapath, output, index, filter, settings):
        try:
            return super(Review, Review).load_data(output, index)
        except (FileNotFoundError, EOFError) as e:
            print(f"Pickles not found! Reading raw data from {datapath} (progress in bytes) ...")
            teams = {}; candidates = {}

            with open(datapath, "r", encoding='utf-8') as jf:
                for line in jf:
                    try:
                        if not line: break
                        jsonline = json.loads(line.lower().lstrip(","))
                        id = jsonline['id']
                        title = jsonline['title']
                        year = jsonline['year']

                        # a team must have skills and members
                        try: fos = jsonline['fos']
                        except KeyError: continue  # skip records without skills
                        try: reviewers = jsonline['reviewers']
                        except KeyError: continue  # skip records without members

                        members = []
                        for reviewer in reviewers:
                            member_id = reviewer['id']
                            member_name = reviewer['name'].replace(" ", "_")
                            if (idname := f'{member_id}_{member_name}') not in candidates:
                                candidates[idname] = Member(member_id, member_name)
                                candidates[idname].skills.update(set(reviewer['expertise']))
                            members.append(candidates[idname])

                        team = Review(id, title, year, fos, members)
                        teams[team.id] = team
                    except json.JSONDecodeError as e:  # ideally should happen only for the last line ']'
                        print(f'JSONDecodeError: There has been error in loading json line `{line}`!\n{e}')
                        continue
                    except Exception as e:
                        raise e
            return super(Review, Review).read_data(teams, output, filter, settings)
        except Exception as e: raise e
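
The reader above expects one JSON object per line, with the keys `id`, `title`, `year`, `fos`, and `reviewers`, where each reviewer carries `id`, `name`, and `expertise`. The following is a minimal, dependency-free sketch of the same parsing steps on made-up records. The sample values and the `parse_reviews` helper are hypothetical and stand in for `Review.read_data()`; plain dictionaries and strings replace the `Review` and `Member` objects.

```python
import json

# Hypothetical input: one JSON object per line, using only the keys read_data() accesses.
raw_lines = [
    '{"id": "r1", "title": "Sample Paper", "year": 2021, "fos": ["graphs", "ranking"], '
    '"reviewers": [{"id": "p1", "name": "Ada Lovelace", "expertise": ["graphs"]}, '
    '{"id": "p2", "name": "Alan Turing", "expertise": ["ranking"]}]}',
    # No 'fos' key: a team must have skills, so this record is skipped.
    '{"id": "r2", "title": "No Skills", "year": 2022, '
    '"reviewers": [{"id": "p1", "name": "Ada Lovelace", "expertise": ["graphs"]}]}',
    ']',  # trailing bracket of a JSON array dump; not valid JSON on its own
]

def parse_reviews(lines):
    """Hypothetical stand-in for the parsing loop in Review.read_data()."""
    teams, candidates = {}, {}
    for line in lines:
        try:
            record = json.loads(line.lower().lstrip(","))
        except json.JSONDecodeError:
            continue  # mirrors the tolerant handling of malformed lines
        if 'fos' not in record or 'reviewers' not in record:
            continue  # a team must have skills and members
        members = []
        for reviewer in record['reviewers']:
            idname = f"{reviewer['id']}_{reviewer['name'].replace(' ', '_')}"
            if idname not in candidates:
                # skills are recorded the first time a candidate is seen
                candidates[idname] = set(reviewer['expertise'])
            members.append(idname)
        teams[record['id']] = {'title': record['title'], 'year': record['year'],
                               'fos': record['fos'], 'members': members}
    return teams, candidates

teams, candidates = parse_reviews(raw_lines)
print(sorted(teams), sorted(candidates))
```

Because each line is lowercased before parsing, names and titles end up in lowercase, and each candidate is keyed by `id` and `name` joined with underscores.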

## Extension to New Models
![Class Diagram of the Model baseline.](https://github.com/fani-lab/OpeNTF/blob/main/ipynb/new-class-diagram.png?raw=1)

To integrate a new model into the baseline, subclass `Ntf`. Ideally, only the `learn()` method should be overridden, with `eval()` left unchanged so that models are compared fairly.

In [None]:
import numpy as np
import keras
import pandas as pd

from mdl.ntf import Ntf
from mdl.cds import TFDataset
from cmn.team import Team
from cmn.tools import merge_teams_by_skills
from mdl.cds import SuperlossDataset
from mdl.superloss import SuperLoss

class Random(Ntf):
    def __init__(self):
        super(Random, self).__init__()

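    def init(self, input_size, output_size):
        # Placeholder architecture; the sizes are accepted because learn() passes them in.
        self.model = keras.Sequential()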

    def learn(self, splits, indexes, vecs, params, prev_model, output):
        input_size = vecs['skill'].shape[1]
        output_size = len(indexes['i2c'])

        for foldidx in splits['folds'].keys():
            self.init(input_size=input_size, output_size=output_size)
            if prev_model: self.model = keras.saving.load_model(prev_model[foldidx])

            keras.saving.save_model(self.model, f"{output}/state_dict_model.f{foldidx}.pt")
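
To make the arguments of `learn()` more concrete, here is a small sketch of how the sizes and per-fold output paths are derived. The toy values, the `output` directory, and the `train`/`valid` keys inside each fold are assumptions for illustration; only `vecs['skill']`, `indexes['i2c']`, and `splits['folds']` come from the code above.

```python
import numpy as np

# Hypothetical toy inputs; only the keys that learn() reads are taken from the code above.
n_teams, n_skills, n_candidates = 6, 4, 3
vecs = {'skill': np.zeros((n_teams, n_skills))}  # one row per team, one column per skill
indexes = {'i2c': {i: f'candidate_{i}' for i in range(n_candidates)}}  # index -> candidate
splits = {'folds': {0: {'train': [0, 1, 2, 3], 'valid': [4, 5]},   # inner keys are assumed
                    1: {'train': [2, 3, 4, 5], 'valid': [0, 1]}}}
output = './output/toy'  # hypothetical output directory

# Same derivations as in learn(): the skill dimension is the input size,
# and the number of candidates is the output size.
input_size = vecs['skill'].shape[1]
output_size = len(indexes['i2c'])

# One saved model per fold, named by the fold index.
model_paths = [f"{output}/state_dict_model.f{foldidx}.pt" for foldidx in splits['folds'].keys()]
print(input_size, output_size, model_paths)
```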