# Life 101

In this notebook, we draw a simple analogy between the Wikispeedia game and "Life". We found that some of the advice that works in the game also applies in real life, and we can learn from it!

# Data Preparation

In [None]:
import src.utils as utils
import src.LLM as LLM

# Automatically reload the project modules when their source changes
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [None]:
finished_paths = utils.load_dataframe("./data/wikispeedia_paths-and-graph/paths_finished.tsv", skip_rows=16, columns=["hashedIpAddress", "timestamp", "durationInSec", "path", "rating"])

unfinished_paths = utils.load_dataframe("./data/wikispeedia_paths-and-graph/paths_unfinished.tsv", skip_rows=16, columns=["hashedIpAddress", "timestamp", "durationInSec", "path", "target", "type"])

categories = utils.load_dataframe("./data/wikispeedia_paths-and-graph/categories.tsv", skip_rows=12, columns=["page", "category"])

links_df = utils.load_dataframe("./data/wikispeedia_paths-and-graph/links.tsv", skip_rows=12, columns=["source", "target"])
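The implementation of `utils.load_dataframe` is not shown in this notebook. As a rough sketch of what such a loader might look like, the hypothetical `load_tsv_sketch` below reads a headerless TSV with pandas, skipping a fixed number of leading comment lines. The function name, the in-memory sample, and the assumption that missing ratings appear as `NULL` are ours, not taken from the project code.

```python
import io

import pandas as pd


def load_tsv_sketch(source, skip_rows, columns):
    """Hypothetical stand-in for utils.load_dataframe (not the project's code)."""
    return pd.read_csv(
        source,
        sep="\t",
        skiprows=skip_rows,  # skip the comment header at the top of the file
        header=None,         # the data rows carry no column names
        names=columns,       # so we supply them explicitly
    )


# Tiny in-memory sample shaped like paths_finished.tsv:
# two header lines followed by two data rows (made-up values).
sample = io.StringIO(
    "# header line 1\n"
    "# header line 2\n"
    "ip_a\t1297740409\t166\tAsteroid;Earth;Viking\t3\n"
    "ip_b\t1249525912\t88\tBrain;Telephone\tNULL\n"
)
columns = ["hashedIpAddress", "timestamp", "durationInSec", "path", "rating"]
df = load_tsv_sketch(sample, skip_rows=2, columns=columns)
print(df[["path", "rating"]])
```

With pandas' default settings, the string `NULL` is parsed as a missing value, which is why a later step has to fill the `rating` column.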

In [None]:
categories_dict = utils.manage_categories(categories.copy())
finished_paths_df = utils.manage_paths(finished_paths.copy(), categories_dict.copy())
unfinished_paths_df = utils.manage_paths(unfinished_paths.copy(), categories_dict.copy())
links_dict = utils.manage_links(links_df.copy())
graph = utils.create_graph(links_df.copy())
finished_paths_df.to_csv("./data/clean_data/clean_finished_paths.csv")
unfinished_paths_df.to_csv("./data/clean_data/clean_unfinished_paths.csv")


The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  paths_df["rating"].fillna(-1, inplace=True)
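The warning above points at a chained in-place `fillna` on a single column. One way to silence it, following the suggestion in the warning text itself, is to assign the filled column back to the DataFrame. The snippet below is a minimal sketch on a made-up frame; only the column name `rating` and the fill value `-1` come from the warning.

```python
import pandas as pd

# Made-up frame with one missing rating
paths_df = pd.DataFrame(
    {"path": ["Asteroid;Viking", "Brain;Telephone"], "rating": [3.0, float("nan")]}
)

# Pattern that triggers the warning (in-place call on a column selection):
#     paths_df["rating"].fillna(-1, inplace=True)

# Assigning the result back updates the original frame explicitly
paths_df["rating"] = paths_df["rating"].fillna(-1)
print(paths_df)
```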


# Don't trust LLMs blindly! (Not you Copilot, you know I love you)

We compare human players' performance with LLM performance by taking the paths most often played by people and running them on a 4-bit quantized version of Qwen-3b, a fairly small LLM that can run on our machines or on Colab in a reasonable time.

## Most taken paths by players

In [None]:
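# Count how often each (source, target) pair was played and keep the 10 most frequent.
# This assumes the frame exposes "source" and "target" columns.
top_paths = finished_paths.groupby(["source", "target"]).size().sort_values(ascending=False).head(10)

# Split the (source, target) MultiIndex into two parallel lists
sources = top_paths.index.get_level_values(0).tolist()
targets = top_paths.index.get_level_values(1).tolist()

# Ask the LLM to play the same source/target pairs on the link graph
paths = LLM.llm_paths(sources, targets, links_dict)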

(['Asteroid',
  'Brain',
  'Theatre',
  'Pyramid',
  'Batman',
  'Bird',
  'Batman',
  'Bird',
  'Beer',
  'Batman'],
 ['Viking',
  'Telephone',
  'Zebra',
  'Bean',
  'Wood',
  'Great_white_shark',
  'The_Holocaust',
  'Adolf_Hitler',
  'Sun',
  'Banana'])
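To make the ranking step concrete, here is a small sketch of how the most played (source, target) pairs could be counted directly from raw path strings using only the standard library. It assumes that a path is a `;`-separated list of articles, that `<` marks a back click, and that a finished path ends on its target article. The helpers `endpoints` and `top_pairs` and the sample paths are hypothetical and are not part of `src.utils`.

```python
from collections import Counter


def endpoints(path):
    """Return (first article, last article) of a ';'-separated path string."""
    clicks = path.split(";")
    return clicks[0], clicks[-1]


def top_pairs(paths, n=10):
    """Count (source, target) pairs and return the n most frequent ones."""
    return Counter(endpoints(p) for p in paths).most_common(n)


# Made-up sample paths
sample_paths = [
    "Asteroid;Earth;Europe;Viking",
    "Asteroid;Sun;<;Earth;Viking",  # "<" marks a back click
    "Brain;Computer;Telephone",
]
ranking = top_pairs(sample_paths, n=2)
print(ranking)
```

On the full dataset, the pandas `groupby(...).size()` call in the cell above does the same counting in one vectorized step.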