# Navigating Movie Scripts

This notebook explains how to do DEFT (DAG extraction from text),
using the following 3 Pixar/Disney movie scripts as a test case.

* [Toy Story](../m_scripts/toy-story.txt)
* [Up](../m_scripts/up.txt)
* [WALL-E](../m_scripts/wall-e.txt)

In [1]:
# Move up to the project folder so that imports and relative paths
# resolve from there. Run this cell only once per session: each run
# moves one directory further up.
import os
import sys
os.chdir('../')
sys.path.insert(0, os.getcwd())
print(os.getcwd())

C:\Users\rrtuc\Desktop\backed-up\python-projects\mappa_mundi


In [2]:
from utils import *
print_welcome_message()

Welcome Causal AI Navigator. We have been waiting for you for millenia. Where would you like us to go next?


## Global variables
All the global variables used by Mappa Mundi are defined in the file linked below.
Change them with caution.


[my_globals.py](../my_globals.py)

## Originals

Here are the movie scripts in their original form, as downloaded
from the IMSDb website.

* [Toy Story](../m_scripts/toy-story.txt)
* [Up](../m_scripts/up.txt)
* [WALL-E](../m_scripts/wall-e.txt)

## Cleaning

The results of this step can be found in the **m_scripts_clean** directory.
With `remove_dialog = True`, the output is written to `CLEAN_RD_DIR` instead of `CLEAN_DIR`.

* [Toy Story (clean)](../m_scripts_clean/toy-story.txt)
* [Up (clean)](../m_scripts_clean/up.txt)
* [WALL-E (clean)](../m_scripts_clean/wall-e.txt)

In [3]:
from cleaning import *

remove_dialog = False
# batch_file_names=my_listdir(M_SCRIPTS_DIR)
batch_file_names = ["toy-story.txt", "up.txt", "wall-e.txt"]
clean_batch_of_m_scripts(
    in_dir=M_SCRIPTS_DIR,
    out_dir=CLEAN_DIR if not remove_dialog else CLEAN_RD_DIR,
    batch_file_names=batch_file_names,
    remove_dialog=remove_dialog)

1046.
fetching toy-story.txt
indent prob dist = [(0, 1.0), (1, 0.0)]
dialog indents= [0]
narration indents= []
1072.
fetching up.txt
indent prob dist = [(0, 1.0)]
dialog indents= [0]
narration indents= []
1085.
fetching wall-e.txt
indent prob dist = [(0, 0.998), (1, 0.002)]
dialog indents= [0]
narration indents= []
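
The `indent prob dist` lines in the output above look like a list of (indent width, fraction of lines) pairs. As a rough illustration of that idea only, here is a minimal sketch of how such a distribution could be computed. The function `indent_prob_dist` and the sample lines are hypothetical and are not taken from the `cleaning` module, whose actual logic may differ.

```python
from collections import Counter


def indent_prob_dist(lines):
    """Return sorted (indent_width, probability) pairs over non-blank lines.

    Hypothetical helper: counts leading spaces only and ignores blank lines.
    """
    indents = [len(line) - len(line.lstrip(" "))
               for line in lines if line.strip()]
    counts = Counter(indents)
    total = sum(counts.values())
    return sorted((width, round(n / total, 3))
                  for width, n in counts.items())


# Made-up sample: two lines with no indent, two lines indented by 4 spaces.
sample = [
    "WOODY",
    "    Reach for the sky!",
    "",
    "Andy picks up the doll.",
    "    This town ain't big enough.",
]
print(indent_prob_dist(sample))
```

A distribution concentrated on a single width, as in the three outputs above, suggests that indentation alone cannot separate dialog from narration in these files.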


## Spell-checking

The results of this step can be found in the **m_scripts_spell** directory.

* [Toy Story (spell)](../m_scripts_spell/toy-story.txt)
* [Up (spell)](../m_scripts_spell/up.txt)
* [WALL-E (spell)](../m_scripts_spell/wall-e.txt)

In [4]:
from spell_checking import *
use_local_dict = True
error_type = "all"

print("use_local_dict=", use_local_dict)
print("error_type=", error_type)
print("SPELLING_CORRECTION_RISK=", SPELLING_CORRECTION_RISK)
print()

in_dir = "m_scripts_clean"
out_dir = "m_scripts_spell"
batch_file_names = my_listdir(in_dir)
correct_this_batch_of_files(in_dir,
                            out_dir,
                            batch_file_names,
                            error_type=error_type,
                            verbose=False,
                            use_local_dict=use_local_dict)

use_local_dict= True
error_type= all
SPELLING_CORRECTION_RISK= 1e-08

1.
toy-story.txt
all changes: [('puttin', 'putting'), ('whirrrr', 'whirrr'), ('cuttin', 'cutting'), ('lyin', 'lying'), ('fellahs', 'fellas')]
2.
up.txt
all changes: [('raaar', 'radar'), ('shoos', 'shows')]
3.
wall-e.txt
all changes: [('spork', 'spark'), ('difficul', 'difficult'), ('neee', 'need')]
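
Some of the corrections listed above may deserve a manual look, because a word that seems misspelled can be intentional in a script. One possible way to order the changes for review is to rank each (original, corrected) pair by string similarity, least similar first. This is a minimal sketch using the standard library's `difflib`; the helper `rank_changes` is hypothetical and not part of `spell_checking`.

```python
from difflib import SequenceMatcher


def rank_changes(changes):
    """Sort (original, corrected) pairs from least to most similar.

    Returns a list of (similarity, original, corrected) tuples, where
    similarity is difflib's ratio in the range [0, 1].
    """
    scored = [(SequenceMatcher(None, old, new).ratio(), old, new)
              for old, new in changes]
    return sorted(scored)


# Changes copied from the wall-e.txt output above.
changes = [("spork", "spark"), ("difficul", "difficult"), ("neee", "need")]
for score, old, new in rank_changes(changes):
    print(f"{old!r} -> {new!r}: similarity {score:.2f}")
```

A low similarity score does not prove that a correction is wrong; it only puts the largest rewrites at the top of the review list.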


## Simplifying

The results of this step can be found in the **m_scripts_simp** directory.

* [Toy Story (simp)](../m_scripts_simp/toy-story.txt)
* [Up (simp)](../m_scripts_simp/up.txt)
* [WALL-E (simp)](../m_scripts_simp/wall-e.txt)

In [5]:
from simplifying import *

in_dir = "m_scripts_spell"
out_dir = "m_scripts_simp"
batch_file_names = my_listdir(in_dir)[0:3]
simplify_batch_of_m_scripts(
    in_dir, out_dir,
    batch_file_names,
    verbose=False)

1. toy-story.txt
2. up.txt
3. wall-e.txt


## DAG Atlas creation


The results of this step can be found in the **m_scripts_dag_atlas** directory.
They are 3 pickled files, one for each of the 3 movie scripts. They will be opened in the next step.
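
For readers unfamiliar with pickled files, the following minimal sketch shows a save and load round trip with Python's standard `pickle` module. It uses a plain dictionary as a stand-in for a DAG object; the key names mirror the `m_title` and `arrow_to_reps` attributes used later in this notebook, while the `.pkl` file name, the node labels, and the helpers `save_obj` and `load_obj` are assumptions made for illustration.

```python
import os
import pickle
import tempfile


def save_obj(obj, path):
    """Serialize obj to path in binary pickle format."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_obj(path):
    """Read back an object written by save_obj."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a DAG: a title plus a map from arrow to repetition count.
toy_dag = {"m_title": "toy-story", "arrow_to_reps": {("a", "b"): 3}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy-story.pkl")
    save_obj(toy_dag, path)
    restored = load_obj(path)

print(restored["m_title"], restored["arrow_to_reps"])
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.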

In [6]:
from DagAtlas import *



## Visualizing

In this step, we draw a DAG for each of the 3 movie scripts, based on the
pickled files in the **m_scripts_dag_atlas** directory.
We do this for 2 arrow repetition thresholds: 3 and 4.
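
To make the role of an arrow repetition threshold concrete, here is a minimal sketch that keeps only the arrows whose repetition count reaches the threshold. It assumes an inclusive comparison (`>=`) and a dictionary mapping each arrow to its count; the function `high_reps_arrows` and the node names are hypothetical, and the project's `build_high_reps_arrows` method may apply a different rule.

```python
def high_reps_arrows(arrow_to_reps, reps_threshold):
    """Return the arrows repeated at least reps_threshold times.

    Assumption: the threshold is inclusive.
    """
    return [arrow for arrow, reps in arrow_to_reps.items()
            if reps >= reps_threshold]


# Made-up repetition counts for three arrows between nodes n1, n2, n3.
arrow_to_reps = {("n1", "n2"): 5, ("n2", "n3"): 3, ("n1", "n3"): 1}

for threshold in (3, 4):
    print(threshold, high_reps_arrows(arrow_to_reps, threshold))
```

Raising the threshold from 3 to 4 drops the arrow that was repeated exactly 3 times, so a higher threshold yields a sparser drawing.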

In [7]:
from Dag import *
import pickle as pik  # explicit, in case the star import does not expose `pik`

def visualize_all_dags(reps_threshold, draw):
    """Print, and optionally draw, the high-repetition arrows of each DAG."""
    dag_dir = "m_scripts_dag_atlas"
    simp_dir = "m_scripts_simp"
    clean_dir = "m_scripts_clean"
    # use only the first 3 pickled DAGs, one per movie script
    file_names = my_listdir(dag_dir)[0:3]
    dags = []
    for fname in file_names:
        path = dag_dir + "/" + fname
        with open(path, "rb") as f:
            dags.append(pik.load(f))
    for dag in dags:
        print("-------------------------")
        print(dag.m_title)
        # arrows selected by the repetition threshold
        hreps_arrows = dag.build_high_reps_arrows(
            reps_threshold)
        print({arrow_str(arrow): dag.arrow_to_reps[arrow]
               for arrow in hreps_arrows})
        print()
        if draw:
            dag.draw(reps_threshold, jupyter=True)
            dag.print_map_legend(clean_dir, simp_dir, reps_threshold)

In [8]:
# visualize_all_dags(reps_threshold=4, draw=True)

In [9]:
# visualize_all_dags(reps_threshold=3, draw=True)