# Lab 3

> **Time:** -

> **Report due:** -

> **Manuel Alejandro Ramos La Gambino** 


## Installation

To simplify the installation for this activity, we will work with a virtual machine that has _almost_ all of the required programs preinstalled.

We will use [VirtualBox](https://www.virtualbox.org/wiki/Downloads) as the virtual machine engine. From that link, download the version that best matches your operating system. Then download the preconfigured [virtual machine](http://niebla.ing.puc.cl/diplomadobigdata/Lab%202.ova) from the university servers.

Finally, import the downloaded machine into VirtualBox: open VirtualBox > File > Import Appliance, or press `Ctrl+I`.

**Note:** the password for the configured user is _ubuntu_.

Download this project, either with `git` or with the download button, and then run `jupyter notebook`.

```bash
$ git clone https://github.com/stgolarrain/recsys-labs.git
$ cd recsys-labs/assignment-3
$ jupyter notebook
```

If you already have the repository on the virtual machine, you can update it with the following commands.

```bash
$ cd recsys-labs
$ git pull origin master
$ cd assignment-3
```

For more details on how to use git, see the [git pull](https://git-scm.com/docs/git-pull) documentation.

Once the virtual machine has been downloaded and imported, you can install the dependencies by running the following cell.

In [1]:
# Installation setup
! pip3 install nltk
! pip3 install scikit-learn
! pip3 install gensim
! pip3 install pandas
! pip3 install numpy



# Instructions

This time you will experiment with the [gensim](https://radimrehurek.com/gensim/) library for modeling latent topics in text. Gensim is a library that implements topic models; specifically, you will work with [Latent Dirichlet Allocation (LDA)](https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation). The library can also transform unstructured text into different vector representations, such as [TF-IDF](https://es.wikipedia.org/wiki/Tf-idf), and search for similar documents using different distance metrics.

This lab is divided into the following sections:

1. Data Preprocessing: in this section you will download resources from the _Natural Language Toolkit_ (NLTK), which implements the basic functions needed to work with text (unstructured data) and transform it into a structured vector representation.
2. Recommendation Model: in the second section you will train a latent topic model (LDA).
3. Generating Recommendations: finally, you will use the topic model to generate content-based recommendations for a fictitious user who has consumed 3 documents.

# 1. Data Preprocessing

In [2]:
import os
import nltk
import sklearn
import gensim
import string
import pandas as pd
import numpy as np
from scipy.sparse import csr_matrix
from collections import Counter
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer 
from gensim import corpora, models, similarities
from sklearn.neighbors import NearestNeighbors

Before starting, download the [NLTK](https://www.nltk.org/) language resources by running the following cell:

In [3]:
# Download corpora
nltk.download()

showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml


True

To begin, we load the dataset into a _Pandas_ _dataframe_ and print the first 5 records to inspect the structure of the data.

In [4]:
corpus_df = pd.read_csv('./dataset/corpus1.csv', sep='\t', header=None, encoding='latin')
corpus_df.columns = ['id', 'title', 'abstract']
corpus_df = corpus_df[['id', 'title', 'abstract']]
corpus_df[:5]

Unnamed: 0,id,title,abstract
0,100002,Nonlinear Shape Statistics in Mumford{Shah Bas...,We present a variational integration of nonlin...
1,100007,On the Complexity of Equilibria,"We prove complexity, approximability, and inap..."
2,100008,On QoS-Aware Publish-Subscribe,This position paper addresses the issue of sup...
3,10001,Checking Mergeable Priority Queues,We present an efficient algorithm which can ch...
4,100012,Mobile Code Security by Java Bytecode Instrume...,Mobile code provides significant opportunities...


Next, we implement a function that transforms unstructured text into a list of processed _tokens_.

In [5]:
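stemm = False  # set to True to reduce tokens to their Porter stems
stemmer = PorterStemmer()

def get_tokens(text):
    # Lowercase the text so that "The" and "the" map to the same token
    lowers = text.lower()
    # Intended to strip punctuation. NOTE: in Python 3, str.translate expects a
    # translation table (e.g. from str.maketrans), so this call leaves the
    # punctuation in place, as the output below shows
    no_punctuation = lowers.translate(string.punctuation)
    # Split into word tokens (requires the NLTK tokenizer data)
    tokens = nltk.word_tokenize(no_punctuation)
    if stemm:
        # list() so that a list, not a lazy map object, is returned
        tokens = list(map(stemmer.stem, tokens))
    return tokens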

get_tokens("I'm a super student for recommender systems!")

['i', "'m", 'a', 'super', 'student', 'for', 'recommender', 'systems', '!']

**Question** Explain in your own words what the `get_tokens()` function does.

**Answer** 

- Converts the whole text to lowercase.
- Splits the text into a list of word tokens.
- If `stemm` is enabled, iterates over the tokens and reduces each one to its stem (common root).

Next, build a dictionary containing every word in the _corpus_. We recommend reading the gensim documentation on how to use dictionaries: [corpora.dictionary](https://radimrehurek.com/gensim/corpora/dictionary.html).

In [6]:
dic_file = './resources/dictionary-stemm.p' if stemm else './resources/dictionary.p'

# Tokenise first: the dictionary is built from this column when no cached file exists
corpus_df['tokenised_abstract'] = corpus_df.abstract.map(get_tokens)

if os.path.isfile(dic_file):
    dictionary = corpora.dictionary.Dictionary.load(dic_file)
else:
    dictionary = corpora.dictionary.Dictionary(documents=corpus_df.tokenised_abstract.tolist())
    dictionary.save(dic_file)

corpus_df[:5]

Unnamed: 0,id,title,abstract,tokenised_abstract
0,100002,Nonlinear Shape Statistics in Mumford{Shah Bas...,We present a variational integration of nonlin...,"[we, present, a, variational, integration, of,..."
1,100007,On the Complexity of Equilibria,"We prove complexity, approximability, and inap...","[we, prove, complexity, ,, approximability, ,,..."
2,100008,On QoS-Aware Publish-Subscribe,This position paper addresses the issue of sup...,"[this, position, paper, addresses, the, issue,..."
3,10001,Checking Mergeable Priority Queues,We present an efficient algorithm which can ch...,"[we, present, an, efficient, algorithm, which,..."
4,100012,Mobile Code Security by Java Bytecode Instrume...,Mobile code provides significant opportunities...,"[mobile, code, provides, significant, opportun..."


**Question** Explain what the tokenised_abstract column of the dataframe corresponds to.

**Answer** The tokenised_abstract column holds the values of the abstract column after tokenization: each abstract as a list of tokens produced by `get_tokens()`.

In [7]:
corpus_df['bow'] = corpus_df.tokenised_abstract.map(dictionary.doc2bow)
del corpus_df['tokenised_abstract']
corpus = corpus_df['bow'].tolist()
corpus_df[:5]

Unnamed: 0,id,title,abstract,bow
0,100002,Nonlinear Shape Statistics in Mumford{Shah Bas...,We present a variational integration of nonlin...,"[(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1..."
1,100007,On the Complexity of Equilibria,"We prove complexity, approximability, and inap...","[(4, 5), (7, 2), (8, 1), (10, 1), (15, 2), (30..."
2,100008,On QoS-Aware Publish-Subscribe,This position paper addresses the issue of sup...,"[(1, 1), (4, 6), (7, 1), (15, 3), (16, 1), (22..."
3,10001,Checking Mergeable Priority Queues,We present an efficient algorithm which can ch...,"[(4, 7), (7, 2), (8, 1), (10, 1), (15, 7), (16..."
4,100012,Mobile Code Security by Java Bytecode Instrume...,Mobile code provides significant opportunities...,"[(4, 4), (5, 1), (7, 1), (10, 2), (15, 6), (16..."


**Question** Explain what the _bow_ column corresponds to.

**Answer** It is the bag-of-words representation of the text: each document is represented as a sparse vector of `(token_id, count)` pairs, where the count is the number of times that dictionary term appears in the abstract.
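The bag-of-words idea can be sketched in a few lines of plain Python. This is only an illustration: `build_vocabulary` and `to_bow` are hypothetical helpers, and ids are assigned here in order of first appearance, which need not match the ids gensim assigns.

```python
from collections import Counter

def build_vocabulary(tokenised_docs):
    """Assign an integer id to each distinct token, in order of first appearance."""
    token2id = {}
    for doc in tokenised_docs:
        for token in doc:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id

def to_bow(tokens, token2id):
    """Return sorted (token_id, count) pairs for the tokens known to the vocabulary."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())

docs = [["we", "present", "a", "model"], ["a", "model", "of", "a", "network"]]
token2id = build_vocabulary(docs)
bows = [to_bow(doc, token2id) for doc in docs]
print(bows)
```

Word order is discarded: only which terms occur, and how often, is kept.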

# 2. Topic Model

In [8]:
tfidf_model_file = 'resources/tfidf_model-stemm.p' if stemm else 'resources/tfidf_model.p'
if os.path.isfile(tfidf_model_file):
    tfidf_model = models.tfidfmodel.TfidfModel.load(tfidf_model_file)
else:
    tfidf_model = models.tfidfmodel.TfidfModel(corpus, dictionary=dictionary)
    tfidf_model.save(tfidf_model_file)

# tfidf_model = models.tfidfmodel.TfidfModel(corpus, dictionary=dictionary)
corpus_df['tf_idf'] = tfidf_model[corpus_df.bow.tolist()]
corpus_df[:5]

Unnamed: 0,id,title,abstract,bow,tf_idf
0,100002,Nonlinear Shape Statistics in Mumford{Shah Bas...,We present a variational integration of nonlin...,"[(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1...","[(0, 0.1698699393434408), (1, 0.07433434938753..."
1,100007,On the Complexity of Equilibria,"We prove complexity, approximability, and inap...","[(4, 5), (7, 2), (8, 1), (10, 1), (15, 2), (30...","[(4, 0.0034049927904606456), (7, 0.02368271003..."
2,100008,On QoS-Aware Publish-Subscribe,This position paper addresses the issue of sup...,"[(1, 1), (4, 6), (7, 1), (15, 3), (16, 1), (22...","[(1, 0.06139095622165877), (4, 0.0048410584957..."
3,10001,Checking Mergeable Priority Queues,We present an efficient algorithm which can ch...,"[(4, 7), (7, 2), (8, 1), (10, 1), (15, 7), (16...","[(4, 0.0022335526594202118), (7, 0.01109643213..."
4,100012,Mobile Code Security by Java Bytecode Instrume...,Mobile code provides significant opportunities...,"[(4, 4), (5, 1), (7, 1), (10, 2), (15, 6), (16...","[(4, 0.0016825551149825637), (5, 0.03113607390..."


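**Question** Explain what the tf_idf column corresponds to and why it is useful in text processing. Name its 2 main components by explaining how the score is computed.

**Answer** The tf_idf column holds, for each document, the TF-IDF weight of each of its terms as `(token_id, weight)` pairs. The score has 2 main components: the term frequency (TF), which measures how often a term occurs in the document, and the inverse document frequency (IDF), which lowers the weight of terms that appear in many documents of the collection. It is useful because terms that are frequent in one document but rare across the collection receive the highest scores, while terms common to every document contribute little.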
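A worked sketch of how such a score can be computed from bag-of-words vectors follows. It assumes one common formulation (raw count times `log2(N / df)`, followed by L2 normalisation per document); the function `tfidf` is hypothetical and gensim's `TfidfModel` may differ in its details.

```python
import math

def tfidf(bows):
    """Weight each (token_id, count) pair by log2(N / df) and L2-normalise per document."""
    n_docs = len(bows)
    # Document frequency: in how many documents each token id occurs
    doc_freq = {}
    for bow in bows:
        for token_id, _ in bow:
            doc_freq[token_id] = doc_freq.get(token_id, 0) + 1
    weighted = []
    for bow in bows:
        vec = [(tid, count * math.log2(n_docs / doc_freq[tid])) for tid, count in bow]
        # Terms present in every document get idf = 0 and are dropped
        vec = [(tid, w) for tid, w in vec if w > 0.0]
        norm = math.sqrt(sum(w * w for _, w in vec))
        weighted.append([(tid, w / norm) for tid, w in vec] if norm > 0 else [])
    return weighted

# Token 0 appears in all three toy documents, so it carries no information
bows = [[(0, 1), (1, 2)], [(0, 3), (2, 1)], [(0, 1), (1, 1), (2, 1)]]
weights = tfidf(bows)
print(weights)
```

In this toy corpus token 0 disappears from every vector even though it has the highest raw counts, which is the down-weighting effect described in the answer.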

Next, we train an LDA model.

In [9]:
topic_number = 10

lda_model = models.LdaModel(corpus, num_topics=topic_number, id2word=dictionary, passes=5, iterations=200)
corpus_df['lda'] = lda_model[corpus_df.bow.tolist()]
corpus_df[:5]

Unnamed: 0,id,title,abstract,bow,tf_idf,lda
0,100002,Nonlinear Shape Statistics in Mumford{Shah Bas...,We present a variational integration of nonlin...,"[(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1...","[(0, 0.1698699393434408), (1, 0.07433434938753...","[(2, 0.8580955), (7, 0.12651847)]"
1,100007,On the Complexity of Equilibria,"We prove complexity, approximability, and inap...","[(4, 5), (7, 2), (8, 1), (10, 1), (15, 2), (30...","[(4, 0.0034049927904606456), (7, 0.02368271003...","[(1, 0.13715997), (2, 0.6316505), (4, 0.057881..."
2,100008,On QoS-Aware Publish-Subscribe,This position paper addresses the issue of sup...,"[(1, 1), (4, 6), (7, 1), (15, 3), (16, 1), (22...","[(1, 0.06139095622165877), (4, 0.0048410584957...","[(2, 0.17472342), (5, 0.58359134), (7, 0.23325..."
3,10001,Checking Mergeable Priority Queues,We present an efficient algorithm which can ch...,"[(4, 7), (7, 2), (8, 1), (10, 1), (15, 7), (16...","[(4, 0.0022335526594202118), (7, 0.01109643213...","[(1, 0.5959923), (7, 0.39924502)]"
4,100012,Mobile Code Security by Java Bytecode Instrume...,Mobile code provides significant opportunities...,"[(4, 4), (5, 1), (7, 1), (10, 2), (15, 6), (16...","[(4, 0.0016825551149825637), (5, 0.03113607390...","[(2, 0.15740693), (5, 0.20869948), (7, 0.57065..."


**Question** Explain what the lda column represents. What does each tuple of numbers mean?

**Answer** The lda column holds, for each document, the topic distribution estimated by the LDA model. Each tuple is `(topic_id, probability)`: the first number identifies a topic and the second is the probability (proportion) of that topic in the document.
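As a small illustration of reading these tuples, the sketch below picks the dominant topic of one document. `dominant_topic` is a hypothetical helper, and the values are copied from the first row of the table above.

```python
def dominant_topic(topic_distribution):
    """Return the (topic_id, probability) pair with the highest probability."""
    return max(topic_distribution, key=lambda pair: pair[1])

# (topic_id, probability) pairs for the first document in the table
doc_topics = [(2, 0.8580955), (7, 0.12651847)]
topic_id, prob = dominant_topic(doc_topics)
print(topic_id, prob)

# The listed probabilities sum to slightly less than 1; this is presumably
# because topics with very small probability are omitted from the output
total = sum(p for _, p in doc_topics)
print(total)
```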

The following cell shows 10 topics from the LDA model.

In [10]:
lda_model.print_topics(10)

[(0,
  '0.028*"," + 0.018*"[" + 0.018*"]" + 0.015*"(" + 0.014*")" + 0.011*"university" + 0.010*"science" + 0.007*"computer" + 0.006*"--" + 0.006*"department"'),
 (1,
  '0.111*"." + 0.039*"the" + 0.039*"," + 0.021*"of" + 0.018*"and" + 0.015*"a" + 0.014*"to" + 0.012*"in" + 0.011*"for" + 0.009*"is"'),
 (2,
  '0.060*"the" + 0.043*"of" + 0.038*"." + 0.035*"," + 0.030*"a" + 0.021*"and" + 0.020*"in" + 0.020*"to" + 0.017*"is" + 0.013*"for"'),
 (3,
  '0.324*":" + 0.012*"network" + 0.008*"the" + 0.008*"routing" + 0.008*"protocol" + 0.008*"," + 0.006*"a" + 0.006*"and" + 0.006*"protocols" + 0.006*"multicast"'),
 (4,
  '0.043*")" + 0.042*"(" + 0.034*"," + 0.020*"the" + 0.018*"of" + 0.016*"." + 0.016*"n" + 0.015*";" + 0.014*"a" + 0.013*"#"'),
 (5,
  '0.050*"the" + 0.046*"," + 0.039*"of" + 0.037*"and" + 0.034*"." + 0.024*"to" + 0.022*"a" + 0.019*"in" + 0.013*"for" + 0.011*"this"'),
 (6,
  '0.006*"broadcast" + 0.006*"causal" + 0.006*"networks" + 0.005*"network" + 0.005*"sites" + 0.005*"counting" + 0.0

**Question** What does the output of the previous cell represent?

**Answer** For each topic, it shows the highest-weighted words together with their weights, that is, the probability of each word within that topic.

**Question** In your opinion, are the topics found by the model good? How could they be improved?

**Answer** No. To improve the results, I recommend excluding punctuation marks and common words such as "and", "a", "of", "is", and "for".
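A minimal sketch of that suggestion, using only the standard library. The function `clean_tokens` and the tiny `STOPWORDS` set are hypothetical and for illustration only; NLTK's stopwords corpus, already imported in this notebook, would be the natural replacement for the hard-coded set.

```python
import string

# Hypothetical, deliberately tiny stopword list for illustration
STOPWORDS = {"a", "an", "and", "the", "of", "is", "for", "to", "in"}

# Translation table that deletes every ASCII punctuation character
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_tokens(text):
    """Lowercase, strip punctuation, split on whitespace, and drop stopwords."""
    no_punctuation = text.lower().translate(PUNCT_TABLE)
    return [tok for tok in no_punctuation.split() if tok not in STOPWORDS]

tokens = clean_tokens("I'm a super student for recommender systems!")
print(tokens)
```

Note that stripping the apostrophe turns "I'm" into "im"; whether that is acceptable depends on the corpus.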

# 3. Generating Recommendations

This section implements the functions needed to generate recommendations from what a user has consumed. We artificially sample 3 random documents to represent the target user (`samples`). You will then generate different recommendations and evaluate the results.

In [11]:
# Random user

samples = corpus_df.sample(3)

for n, (ix, paper) in enumerate(samples.iterrows()):
    idx, title, abstract, bow, tf_idf, lda = paper
    print('%d) %s' % (n+1, title))
    print('')
    print(abstract)
    print('\n' )

1) An Ecient Implementation of Multiple Return Values in Scheme

This paper describes an implementation of the new Scheme multiple values interface. The implementation handles multiple values efficiently, with no run-time overhead for normal calls and returns. Error checks are performed where necessary to insure that the expected number of values is returned in all situations. The implementation fits cleanly with our direct-style compiler and stack-based representation of control, but is equally well suited to continuation-passing style compilers and to heap-based run-time architectures.


2) Black-Box Analysis of the Block-Cipher-Based Hash-Function Constructions from PGV

Preneel, Govaerts, and Vandewalle considered the 64 most basic ways to construct a hash function $HColonits^* ightarrowits^n$ from a block cipher $EColon its^n imes its^n ightarrow its^n$. They regarded 12 of these 64 schemes as secure, though no proofs or formal claims were given. The remaining 52 schemes were show

In [12]:
# Recommendation functions

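N = len(dictionary)  # vocabulary size, used as the dense vector length

def to_sparse(matrix):
    # Expand each list of (id, value) pairs to a dense row of length N,
    # then store all rows as a SciPy CSR matrix
    return csr_matrix([gensim.matutils.sparse2full(row, length=N) for row in matrix]) 

def make_recommendations(model, metric, neighbors):
    # model: name of the corpus_df column to index ('tf_idf' or 'lda')
    X = to_sparse(corpus_df[model].tolist())
    # neighbors + 1 because each document's nearest neighbor is itself
    document_index = NearestNeighbors(n_neighbors=(neighbors + 1), algorithm='brute', metric=metric).fit(X)
    return document_index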

def print_recommendations(indexes, model):
    for n, (ix, paper) in enumerate(samples.iterrows()):
        dists, neighbors = indexes.kneighbors([gensim.matutils.sparse2full(paper[model],length=N)])
        print(paper['title'])
        print('')
        print('Documentos cercanos: ')
        i = 1
        for neighbour in neighbors[0]:
            if ix != neighbour:
                line = str(i) + ". " + corpus_df.iloc[neighbour]['title']
                print(line)
                i+=1
        print('\n')
    

Next, use the functions implemented above to generate new recommendations while varying the model parameters. Add new cells for each implementation and/or question.
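To make the search step concrete, here is a plain-Python sketch of a brute-force nearest-neighbor lookup. It is an illustration, not the scikit-learn implementation: the function `nearest_neighbors` and the toy vectors are hypothetical. Unlike `make_recommendations`, which requests `neighbors + 1` results and skips the query document when printing, this sketch excludes the query directly.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length dense vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_neighbors(query_idx, vectors, k, distance=euclidean):
    """Return the indexes of the k vectors closest to vectors[query_idx], excluding the query."""
    scored = [
        (distance(vectors[query_idx], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_idx
    ]
    scored.sort()
    return [i for _, i in scored[:k]]

vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
top2 = nearest_neighbors(0, vectors, k=2)
top3 = nearest_neighbors(0, vectors, k=3)
print(top2, top3)
```

Because results are sorted by distance, a larger `k` only appends more distant documents after the ones already returned for a smaller `k`.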

In [13]:
document_index = make_recommendations('tf_idf', 'euclidean', 5)
print_recommendations(document_index, 'tf_idf')

An Ecient Implementation of Multiple Return Values in Scheme

Documentos cercanos: 
1. Value Profiling and Optimization
2. Dictionary-free Overloading by Partial Evaluation
3. Speculative Execution based on Value Prediction
4. Encoding Types in ML-like Languages
5. Run-time Code Generation and Modal-ML


Black-Box Analysis of the Block-Cipher-Based Hash-Function Constructions from PGV

Documentos cercanos: 
1. New Public-Key Schemes Based on Elliptic Curves over the Ring Z_n
2. On Interactive Visualization of High-dimensional Data using the Hyperbolic Plane
3. A Secure Signature Scheme from Bilinear Maps
4. Universal One-Way Hash Functions and their Cryptographic Applications
5. How To Prove Yourself: Practical Solutions to Identification and Signature Problems


OdeView: The Graphical Interface to Ode

Documentos cercanos: 
1. ODE (Object Database and Environment): The Language and the Data Model
2. Towards Reusable Real-Time Objects
3. Rationale for the Design of Persistence and Quer

In [14]:
document_index = make_recommendations('tf_idf', 'euclidean', 10)
print_recommendations(document_index, 'tf_idf')

An Ecient Implementation of Multiple Return Values in Scheme

Documentos cercanos: 
1. Value Profiling and Optimization
2. Dictionary-free Overloading by Partial Evaluation
3. Speculative Execution based on Value Prediction
4. Encoding Types in ML-like Languages
5. Run-time Code Generation and Modal-ML
6. Implementation Approaches for Reconfigurable Logic Applications
7. OASIS: An Optimizing Action-based Compiler Generator
8. On Decomposition for Incomplete Data
9. Catching Accurate Profiles in Hardware
10. Just-in-Time Aspects: Efficient Dynamic Weaving for Java


Black-Box Analysis of the Block-Cipher-Based Hash-Function Constructions from PGV

Documentos cercanos: 
1. New Public-Key Schemes Based on Elliptic Curves over the Ring Z_n
2. On Interactive Visualization of High-dimensional Data using the Hyperbolic Plane
3. A Secure Signature Scheme from Bilinear Maps
4. Universal One-Way Hash Functions and their Cryptographic Applications
5. How To Prove Yourself: Practical Solutions to 

In [15]:
document_index = make_recommendations('tf_idf', 'euclidean', 20)
print_recommendations(document_index, 'tf_idf')

An Ecient Implementation of Multiple Return Values in Scheme

Documentos cercanos: 
1. Value Profiling and Optimization
2. Dictionary-free Overloading by Partial Evaluation
3. Speculative Execution based on Value Prediction
4. Encoding Types in ML-like Languages
5. Run-time Code Generation and Modal-ML
6. Implementation Approaches for Reconfigurable Logic Applications
7. OASIS: An Optimizing Action-based Compiler Generator
8. On Decomposition for Incomplete Data
9. Catching Accurate Profiles in Hardware
10. Just-in-Time Aspects: Efficient Dynamic Weaving for Java
11. Partial Evaluation for Dictionary-free Overloading
12. Optimizing ML with Run-Time Code Generation
13. Compiler and Run-Time Support for Irregular Computations
14. A Provably Correct Compiler Generator
15. Sideway Value Algebra for Object-Relational Databases
16. Program Integration for Languages with Procedure Calls
17. CCured in the Real World
18. Elimination of Redundant Array Subscript Range Checks
19. Value Profiling


**Question** Run the model using the `tf-idf` representation and the Euclidean distance metric. Set the nearest_neighbors parameter to [5, 10, 20]. What effect does this parameter have on the observed recommendations?

**Answer** Increasing the number of neighbors yields more recommendations, but the closest ones do not change: the first 5 titles for the first sample are identical in all three runs, and larger values only append more distant documents. The search cost also grows with the number of neighbors requested.

In [16]:
document_index = make_recommendations('tf_idf', 'cosine', 10)
print_recommendations(document_index, 'tf_idf')

An Ecient Implementation of Multiple Return Values in Scheme

Documentos cercanos: 
1. Value Profiling and Optimization
2. Dictionary-free Overloading by Partial Evaluation
3. Speculative Execution based on Value Prediction
4. Encoding Types in ML-like Languages
5. Run-time Code Generation and Modal-ML
6. Implementation Approaches for Reconfigurable Logic Applications
7. OASIS: An Optimizing Action-based Compiler Generator
8. On Decomposition for Incomplete Data
9. Catching Accurate Profiles in Hardware
10. Just-in-Time Aspects: Efficient Dynamic Weaving for Java


Black-Box Analysis of the Block-Cipher-Based Hash-Function Constructions from PGV

Documentos cercanos: 
1. New Public-Key Schemes Based on Elliptic Curves over the Ring Z_n
2. On Interactive Visualization of High-dimensional Data using the Hyperbolic Plane
3. A Secure Signature Scheme from Bilinear Maps
4. Universal One-Way Hash Functions and their Cryptographic Applications
5. How To Prove Yourself: Practical Solutions to 

**Question** Choosing a fixed value for nearest neighbors and using the `tf-idf` representation, run the model with the _cosine_ distance metric. What effect does the distance metric have on the observed recommendations?

**Answer**
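As background for this question: when vectors are normalised to unit length, squared Euclidean distance and cosine distance are related by $\lVert u - v \rVert^2 = 2\,(1 - \cos(u, v))$, so both metrics rank neighbors in the same order. Whether the TF-IDF vectors in this notebook are unit-length is an assumption here, not something verified against the notebook. The sketch below checks the relationship on hypothetical toy vectors.

```python
import math

def normalise(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

raw = [[3.0, 1.0, 0.0], [1.0, 2.0, 0.5], [0.0, 1.0, 4.0], [2.0, 2.0, 1.0]]
vectors = [normalise(v) for v in raw]
query = vectors[0]
others = range(1, len(vectors))

by_euclidean = sorted(others, key=lambda i: euclidean(query, vectors[i]))
by_cosine = sorted(others, key=lambda i: cosine_distance(query, vectors[i]))
print(by_euclidean, by_cosine)
```

On vectors that are not normalised the two rankings can differ, since Euclidean distance is also sensitive to vector length.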

In [17]:
document_index = make_recommendations('lda', 'euclidean', 10)
print_recommendations(document_index, 'lda')

An Ecient Implementation of Multiple Return Values in Scheme

Documentos cercanos: 
1. HOL-OCL: Experiences, Consequences and Design Choices
2. Towards a Declarative Query and Transformation Language for XML and Semistructured Data: Simulation Unification
3. Pegasus: An Efficient Intermediate Representation
4. Automatic Generation of Program Specifications
5. Composing Hidden Information Modules over Inclusive Institutions
6. Executable Temporal Logic for Nonmonotonic Reasoning
7. Defining the Java Virtual Machine as Platform for Provably Correct Java Compilation
8. From Operational Semantics to Denotational Semantics for Verilog
9. A Hybrid Approach To Representation In The Janus Natural Language Processor
10. Consistent Answers from Integrated Data Sources


Black-Box Analysis of the Block-Cipher-Based Hash-Function Constructions from PGV

Documentos cercanos: 
1. What Do Message Sequence Charts Mean?
2. Authenticated Data Structures for Graph and Geometric Searching
3. The Power of 

**Question** Choosing a fixed value for nearest_neighbors and the _lda_ model, what effect does using LDA versus TF-IDF have on the observed recommendations?

**Answer**

In [18]:
document_index = make_recommendations('lda', 'euclidean', 5)
print_recommendations(document_index, 'lda')

An Ecient Implementation of Multiple Return Values in Scheme

Documentos cercanos: 
1. HOL-OCL: Experiences, Consequences and Design Choices
2. Towards a Declarative Query and Transformation Language for XML and Semistructured Data: Simulation Unification
3. Pegasus: An Efficient Intermediate Representation
4. Composing Hidden Information Modules over Inclusive Institutions
5. Automatic Generation of Program Specifications


Black-Box Analysis of the Block-Cipher-Based Hash-Function Constructions from PGV

Documentos cercanos: 
1. What Do Message Sequence Charts Mean?
2. Authenticated Data Structures for Graph and Geometric Searching
3. The Power of Reflective Relational Machines
4. The Importance of Prior Probabilities for Entry Page Search Wessel Kraaij
5. Analysing Approximate Confinement under Uniform Attacks


OdeView: The Graphical Interface to Ode

Documentos cercanos: 
1. Abduction from Logic Programs: Semantics and Complexity
2. Fischer's Protocol Revisited: A Simple Proof Usi

**Question** Try LDA again using only 5 topics. What effect does the number of topics have on the observed recommendations?

**Answer**

In [19]:
# Recommendation example
# 
# doc_idx = make_recommendations('tf_idf', 'euclidean', 5)
# print_recommendations(doc_idx, 'tf_idf')

## Deliverable

Once you have completed the lab and answered the questions, export this file in `html` format and upload it to the _Moodle_ platform.

To export this file, go to `File > Download as > HTML (.html)`.

If you have any problems or questions, email **dparra [at] ing [dot] puc [dot] cl** or **slarrain [at] uc [dot] cl** with [Diplomado Big Data] at the start of the subject line.