# CHAOS: Learning Group Scenario

## Starting up

The following block is common boilerplate that ensures the modules can be loaded properly.
Execute the code cells step-by-step by clicking on the "play" button.

In [None]:
import os
import sys

root = os.path.abspath(os.path.join('..'))
if root not in sys.path:
    sys.path.append(root)
    print("Added chaos root to syspath")
    os.chdir(root)
    print("Changed working directory to project root")

import chaos

Here, we import the modules needed for this scenario and set up logging:

In [None]:
import logging.config
import math

import pandas as pd
import yaml

from chaos.fetch.local import CsvSource
from chaos.process.extract.graph import GraphEdgeMapper, GraphPopularityExtractor
from chaos.process.pipeline import SequentialDataPipeline
from chaos.recommend.evaluate.evaluator import LFMEvaluator
from chaos.recommend.predict.predictor import LFMPredictor
from chaos.recommend.predict.reciprocal import ReciprocalWrapper, ArithmeticStrategy
from chaos.recommend.translator import LFMTranslator
from chaos.shared.user import User

with open('data/logging.yml') as logging_cfg_file:
    logging_cfg = yaml.safe_load(logging_cfg_file)
    logging.config.dictConfig(logging_cfg)

logger = logging.getLogger(__name__)
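
The contents of `data/logging.yml` are not shown in this notebook. As a rough sketch, a configuration accepted by `logging.config.dictConfig` might look like the dictionary below. The formatter name, the handler name and the level chosen for the `chaos` logger are assumptions for illustration, not the project's actual settings.

```python
import logging
import logging.config

# Hypothetical configuration; the real data/logging.yml may look different.
EXAMPLE_LOGGING_CFG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "brief": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "brief",
            "level": "INFO",
        },
    },
    "loggers": {
        # Loggers below "chaos" (e.g. "chaos.example") inherit this level.
        "chaos": {"level": "INFO", "handlers": ["console"], "propagate": False},
    },
}

logging.config.dictConfig(EXAMPLE_LOGGING_CFG)
example_logger = logging.getLogger("chaos.example")
example_logger.info("Logging is configured")
```

A YAML file with the same keys, loaded via `yaml.safe_load`, yields an equivalent dictionary.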

## The workflow

Let's go through the following workflow:

![Workflow](img/workflow.png)

We skip the first task, since the interactions/features are already prepared for this simple "learning group" example.

First, let's source the data. For this scenario, we simply load a CSV that is located within the project's directory:

In [None]:
data = CsvSource(CsvSource.RES_ROOT / 'learning-group').source_data()
data.user_df # This will show the data frame here

Now that we see the profile data of the users, let's have a quick look at the interaction graph:

In [None]:
data.interaction_graph.draw()
data.describe()

## Process Data Model

As we can see in the graph, some interaction edges have a particularly high **accumulated interaction strength**.

This can skew the recommendations: only the nodes with a high incoming strength would end up being recommended.

Let's try to change that with the following pipeline:

In [None]:
pipeline = SequentialDataPipeline([
    GraphEdgeMapper(
        capacity=lambda e: e.strength,
        cost=lambda e: 1 / e.strength
    ),
    # Map other useful attributes for usage in algorithms:
    GraphPopularityExtractor('popularity', quantiles=3, metrics=('eigenvector', 'degree'),
                             labels=['low', 'medium', 'high'], add_as_node_attrib=True),
    # Discount edges towards overly popular nodes:
    GraphEdgeMapper(
        strength=lambda e: e.strength - (0.7 * e.strength * e.v.data['degree'])
    ),
    GraphEdgeMapper(
        strength=lambda e: math.log(e.strength)
    )
])

dm = pipeline.execute(data)
dm.interaction_graph.draw()

Looks better, right?
In particular, the connections to nodes with "high" popularity are now weaker.
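
To see what the last two mappers of the pipeline do numerically, here is a small stand-alone sketch. It assumes that `degree` is a normalized centrality value between 0 and 1; the notebook does not state this, and the edge values below are made up.

```python
import math


def discount_and_log(strength, degree, factor=0.7):
    """Mirror the last two mappers: discount by target popularity, then take the log."""
    discounted = strength - factor * strength * degree
    return math.log(discounted)


# Two edges with the same raw strength, one towards an unpopular node
# and one towards a popular node.
# Assumption: 'degree' is a normalized centrality in [0, 1].
to_unpopular = discount_and_log(strength=20.0, degree=0.1)
to_popular = discount_and_log(strength=20.0, degree=0.9)

print(round(to_unpopular, 3), round(to_popular, 3))
```

Under that assumption, the discounted strength stays positive for any positive input, so the logarithm is defined. It becomes negative, however, as soon as the discounted strength drops below 1.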

In [None]:
dm.user_df.loc[dm.user_df.popularity == 'high']
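
How `GraphPopularityExtractor` derives the labels is not shown here. The sketch below illustrates the general idea of quantile binning with three labels, using only the standard library. The function name and the scores are hypothetical, and the real extractor may bin differently.

```python
from bisect import bisect_left
from statistics import quantiles


def label_by_quantile(scores, labels=("low", "medium", "high")):
    """Assign each score the label of the quantile bin it falls into."""
    # n bins need n - 1 cut points.
    cuts = quantiles(scores.values(), n=len(labels))
    # bisect_left makes the bins right-closed: a score equal to a cut point
    # stays in the lower bin.
    return {name: labels[bisect_left(cuts, value)] for name, value in scores.items()}


# Hypothetical centrality scores, not taken from the learning-group data.
example_scores = {"A": 0.05, "B": 0.10, "C": 0.20, "D": 0.35, "E": 0.60, "F": 0.90}
example_labels = label_by_quantile(example_scores)
print(example_labels)
```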

As a small extra, let's build and test the standard reciprocal candidate generator for user "Kai":

In [None]:
from chaos.recommend.candidates import * 
cg = CandidateGeneratorBuilder.build_reciprocal_default(dm)
cg.retrieve_candidates('Kai')

Only these users are considered in the recommendation process for this user.
They are reciprocally compatible with user "Kai". The generator works as illustrated in the following sequence diagram:

![CG](img/cg.png)
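
The core idea of reciprocity can be sketched in a few lines: a candidate is kept only if both sides accept each other. The user names, preferences and the `accepts` predicate below are hypothetical and are not part of the `chaos` API.

```python
def reciprocal_candidates(user, users, accepts):
    """Return the users that `user` accepts and that accept `user` in return."""
    return [
        other for other in users
        if other != user and accepts(user, other) and accepts(other, user)
    ]


# Hypothetical data: every user has a course and a set of courses
# they would like their study partners to come from.
own_course = {"A": "ISE", "B": "MCD", "C": "MCD"}
wanted_courses = {"A": {"ISE", "MCD"}, "B": {"ISE"}, "C": {"MCD"}}


def accepts(a, b):
    """`a` accepts `b` if the course of `b` is among the wanted courses of `a`."""
    return own_course[b] in wanted_courses[a]


candidates_for_a = reciprocal_candidates("A", own_course, accepts)
candidates_for_c = reciprocal_candidates("C", own_course, accepts)
print(candidates_for_a, candidates_for_c)
```

User "C" ends up without candidates here: "C" would accept "B", but "B" only wants partners from ISE.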

Let's continue with the next tasks, which are closely related to each other.

## Translate, train and evaluate

In the following, we first construct a translator that can be used to retrieve the interaction matrix in a model-compatible format.

Then, we train different `LFMPredictor`s based on different parameters:
1. Hybrid model with course and indicator feature
2. Hybrid model with course, popularity (which we previously calculated) and indicator feature
3. A pure Collaborative Filtering model with no profile data feature embeddings.
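
The three variants differ only in which features describe a user. The following sketch assumes that `LFM` refers to a LightFM-style hybrid model in which the indicator feature is a one-hot column per user ID. The helper and the profiles are hypothetical and do not reflect how `LFMTranslator` is implemented.

```python
def build_feature_rows(users, feature_names, use_indicator=True):
    """Build one sparse feature row (a set of active feature keys) per user."""
    rows = {}
    for user_id, profile in users.items():
        active = {f"{name}={profile[name]}" for name in feature_names}
        if use_indicator:
            active.add(f"id={user_id}")  # one-hot identity feature
        rows[user_id] = active
    return rows


# Hypothetical profiles for illustration only.
example_users = {
    "u1": {"course": "ISE", "popularity": "high"},
    "u2": {"course": "MCD", "popularity": "low"},
}

cf_only = build_feature_rows(example_users, [])
hybrid_course = build_feature_rows(example_users, ["course"])
hybrid_both = build_feature_rows(example_users, ["course", "popularity"])
print(cf_only["u1"], hybrid_course["u1"], hybrid_both["u1"])
```

With an empty feature list, only the identity column remains, so every user gets their own embedding. Under the stated assumption, that corresponds to pure Collaborative Filtering.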

In [None]:
translator = LFMTranslator(dm, [], use_indicator=True)
hp = {'learning_rate': 0.003, 'no_components': 32}
evaluator = LFMEvaluator(
    {
        'Hybrid, course with ID': LFMPredictor(LFMTranslator(dm, ['course'], use_indicator=True), **hp),
        'Hybrid, course + popularity with ID': LFMPredictor(
            LFMTranslator(dm, ['course', 'popularity'], use_indicator=True), **hp
        ),
        'Collaborative Filtering only': LFMPredictor(translator, **hp),
    }, translator.interaction_matrix, test_split=0.3
)
evaluator.run_all(epochs=range(0, 144, 4))

Let's output the chart report to see which predictor performed best, and then call the `best_of_all` method to confirm our finding:

In [None]:
evaluator.create_report()

In [None]:
res = evaluator.best_of_all('f1')
print(f"Best predictor for F1: {res.predictor} @ epoch {res.epoch} with {res.value}")
hybrid = evaluator[res.predictor]
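
How `LFMEvaluator` computes its metrics is not shown in this notebook. As a reminder, F1 is the harmonic mean of precision and recall. The sketch below picks the best predictor and epoch from hypothetical precision/recall values, which are not real results.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_by_f1(history):
    """Pick the (predictor, epoch, f1) triple with the highest F1 score."""
    scored = [
        (name, epoch, f1_score(precision, recall))
        for name, runs in history.items()
        for epoch, (precision, recall) in runs.items()
    ]
    return max(scored, key=lambda entry: entry[2])


# Hypothetical (precision, recall) pairs per epoch.
example_history = {
    "model_a": {0: (0.10, 0.20), 4: (0.30, 0.40)},
    "model_b": {0: (0.20, 0.20), 4: (0.50, 0.25)},
}
best_name, best_epoch, best_f1 = best_by_f1(example_history)
print(best_name, best_epoch, round(best_f1, 3))
```

Note that `model_b` has the higher precision at epoch 4, yet `model_a` wins on F1 because its precision and recall are more balanced.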

## Recommend

Finally, let's sample some recommendations based on this evaluator.

But first, let's build a candidate generator that **filters out users that have already met for studying**:

In [None]:
cg = (CandidateGeneratorBuilder(DMCandidateRepo(data))
               .filter(PreferenceCG).cache().only_reciprocal()
               .filter(InteractionCG, include=True, include_new=True)
               .filter(StrategicCG, on_unknown_user=DMCandidateRepo(data))
               .build())
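
The intent of this filter can be illustrated without the library: drop every candidate who already shares an interaction edge with the user. The function below is a hypothetical stand-in and says nothing about how `InteractionCG` or its `include` and `include_new` parameters work internally.

```python
def exclude_already_met(user, candidates, interactions):
    """Drop candidates that share an interaction edge with `user` in either direction."""
    met = {b for a, b in interactions if a == user}
    met |= {a for a, b in interactions if b == user}
    return [candidate for candidate in candidates if candidate not in met]


# Hypothetical interaction edges (source, target).
example_interactions = [("u1", "u2"), ("u3", "u1"), ("u2", "u4")]
remaining = exclude_already_met("u1", ["u2", "u3", "u4", "u5"], example_interactions)
print(remaining)
```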


Let's re-create the "Hybrid, course with ID" predictor, this time trained on the full spectrum of interactions and using the newly created candidate generator:

In [None]:
predictor = LFMPredictor(LFMTranslator(data, ['course']), candidate_generator=cg)
predictor.train(10)

In [None]:
print("Recommended for user Kai", predictor.predict("Kai")) # known user from ISE
print("Recommended for user Stefan", predictor.predict("Stefan")) # known user from MCD
print("Recommended for user Ivan", predictor.predict("Ivan")) # known during training, but cold-start user
print("Recommended for unknown user from MCD", predictor.predict(User.from_data({'course': "MCD"}))) # unknown during training and cold-start user
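
Why can a hybrid model say anything about a user it has never seen? Assuming a LightFM-style model, a user is represented by the sum of the embeddings of their active features, so a known `course` is enough to obtain a vector. The embeddings and helper functions below are hypothetical and only illustrate this idea.

```python
def represent(active_features, embeddings):
    """Sum the embedding vectors of all active features that the model knows."""
    dims = len(next(iter(embeddings.values())))
    vector = [0.0] * dims
    for feature in active_features:
        # Features unseen during training contribute nothing.
        for i, weight in enumerate(embeddings.get(feature, [0.0] * dims)):
            vector[i] += weight
    return vector


def score(u, v):
    """Dot product of two representation vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical two-dimensional embeddings learned during training.
example_embeddings = {
    "course=MCD": [1.0, 0.0],
    "course=ISE": [0.0, 1.0],
    "id=u1": [0.5, 0.5],
}

known_user = represent({"course=MCD", "id=u1"}, example_embeddings)
unknown_user = represent({"course=MCD", "id=new"}, example_embeddings)
print(known_user, unknown_user)
```

The unknown user has no identity embedding, so only the course embedding remains. A pure Collaborative Filtering model would have nothing to work with in this case.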

## Done!

Feel free to tweak some of the code blocks or build an entirely new interaction graph.
The stage is yours!