In [1]:
MAIN_PATH = "/home/carlos/MasterDS/tfm"
JSON_DATA_PATH = '{}/data/json/'.format(MAIN_PATH)
CSV_DATA_PATH = '{}/data/csv/'.format(MAIN_PATH)

In [2]:
import sys
sys.path.insert(0, MAIN_PATH)

In [3]:
%load_ext autoreload
%autoreload 2
from scripts.text.article_text_processor import ArticleTextProcessor
from scripts.text.basic_text_processor import BasicTextProcessor
from scripts.extractive_summary.ltr.learn_to_rank import LearnToRank
from scripts.extractive_summary.ltr.ltr_features_targets import LTRFeaturesTargets 
from scripts.extractive_summary.ltr.ltr_features import LTRFeatures
from scripts.extractive_summary.ltr.ltr_targets import LTRTargets

from scripts.conf import TEAMS

from rouge import Rouge

%reload_ext autoreload

In [4]:
import pandas as pd

# Learn to Rank

The goal is to train an algorithm that can rank the events by the "probability" that they appear in the news article.
The proposal is as follows (based on this [paper](https://www.aclweb.org/anthology/P16-1129.pdf)):

- First, assign a score to each event that quantifies how likely the event is to carry information that appears in the article.
Several methods are proposed to obtain this score, all based on computing distances or shared words between each event and the
sentences of each article.

- Once the targets are built, the information in each event must be condensed into a numeric feature vector. The proposed features are
detailed in the Features section.

- Putting this together, each match yields feature vectors with their corresponding targets, so a supervised model can be trained
to predict the target. Since the target is numeric, a first approach is to train a regression model. Alternatively, fixing an
appears / does-not-appear threshold turns it into a classification problem, so that a classifier can estimate a probability.
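
The two framings in the last step can be sketched as follows. The scores, the threshold of 0.3 and the helper names are illustrative assumptions, not values taken from this notebook.

```python
# Hypothetical per-event target scores for one match (illustrative values only).
event_scores = {
    "goal": 0.43,
    "penalty": 0.36,
    "free kick": 0.10,
    "corner": 0.05,
}


def rank_events(scores):
    """Order events from most to least likely to appear in the article."""
    return sorted(scores, key=scores.get, reverse=True)


def binarize(scores, threshold):
    """Turn the numeric target into an appears (1) / does-not-appear (0) label."""
    return {event: int(score >= threshold) for event, score in scores.items()}


ranking = rank_events(event_scores)
labels = binarize(event_scores, threshold=0.3)  # assumed threshold
print(ranking)
print(labels)
```

A regression model would be trained on the raw scores, whereas a classifier would be trained on the binarized labels.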


## Targets

In [5]:
processor = ArticleTextProcessor()
text_proc = BasicTextProcessor()

In [6]:
all_files = processor.load_json()

In [7]:
season_file = 'premier_league_2019_2020.json'
league_season_teams = TEAMS[season_file.split('.')[0]]

In [29]:
all_files[season_file].keys()

dict_keys(['http://www.premierleague.com/match/38678', 'http://www.premierleague.com/match/38679', 'http://www.premierleague.com/match/38680', 'http://www.premierleague.com/match/38681', 'http://www.premierleague.com/match/38682', 'http://www.premierleague.com/match/38683', 'http://www.premierleague.com/match/38684', 'http://www.premierleague.com/match/38685', 'http://www.premierleague.com/match/38686', 'http://www.premierleague.com/match/38687', 'http://www.premierleague.com/match/38674', 'http://www.premierleague.com/match/38671', 'http://www.premierleague.com/match/38673', 'http://www.premierleague.com/match/38668', 'http://www.premierleague.com/match/38669', 'http://www.premierleague.com/match/38676', 'http://www.premierleague.com/match/38677', 'http://www.premierleague.com/match/38670', 'http://www.premierleague.com/match/38675', 'http://www.premierleague.com/match/38672', 'http://www.premierleague.com/match/38662', 'http://www.premierleague.com/match/38659', 'http://www.premierle

In [8]:
match_dict = all_files[season_file]["http://www.premierleague.com/match/46975"]
events = match_dict['events']

In [27]:
match_dict['article']

"Watford were relegated from the Premier League after losing 3-2 to Arsenal as Pierre-Emerick Aubameyang's brace was not enough in the race for the Golden Boot.\nAubameyang opened the scoring in the fifth minute from the penalty spot after a Video Assistant Referee review of Craig Dawson's foul on Alexandre Lacazette.\nArsenal doubled their advantage on 24 minutes through Kieran Tierney's first goal for the club.\xa0\nIt was 3-0 on 33 minutes, Aubameyang producing an overhead kick into the net.\nWatford pulled one back 10 minutes later, Troy Deeney converting a penalty after Danny Welbeck had been fouled by David Luiz.\nWelbeck fired in from Ismaila Sarr's cross to make it 3-2 in the 66th minute before Emiliano Martinez denied the former Arsenal forward an equaliser.\nArsenal rise two places to eighth on 56 points, while\xa0Watford go down in 19th with 34 points.\xa0\nSee: Arsenal report |\xa0Watford report"

In [28]:
events

['Penalty conceded by Craig Dawson (Watford) after a foul in the penalty area.',
 'Penalty Arsenal. Alexandre Lacazette draws a foul in the penalty area.',
 'VAR Decision: Penalty Arsenal.',
 'Goal!   Arsenal 1, Watford 0. Pierre-Emerick Aubameyang (Arsenal) converts the penalty with a right footed shot to the bottom left corner.',
 'Attempt missed. Danny Welbeck (Watford) left footed shot from the left side of the box is high and wide to the left. Assisted by Adam Masina.',
 'Attempt saved. Ismaila Sarr (Watford) right footed shot from the centre of the box is saved in the centre of the goal. Assisted by Abdoulaye Doucouré with a cross.',
 'Troy Deeney (Watford) wins a free kick in the defensive half.',
 'Foul by Dani Ceballos (Arsenal).',
 'Corner,  Watford. Conceded by Kieran Tierney.',
 'Granit Xhaka (Arsenal) wins a free kick in the defensive half.',
 'Foul by Christian Kabasele (Watford).',
 'Attempt blocked. Pierre-Emerick Aubameyang (Arsenal) right footed shot from outside the 

### ROUGE

We use this metric to assign a score to each (event, article sentence) pair and so build a target that indicates which events are
most likely to appear in the summary. A learning-to-rank model is then trained on this target, so that a summary can be built
from the most representative set of events of each match. Inspired by [this paper](https://www.aclweb.org/anthology/P16-1129.pdf).

We try the ROUGE variants available in the package: ROUGE-1, ROUGE-2 and ROUGE-L. We also try both the F1 score and recall as the score,
given what earlier experiments showed about the words used in the events.

This approach can have several problems:

- Since events are generally much longer than the summaries, the resulting information is likely to be redundant.
- The style and vocabulary differ considerably between the events (simpler, with a smaller vocabulary) and the articles (more
complex sentences and a wider vocabulary).
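
As a rough illustration of what the ROUGE-1 score measures, here is a hand-rolled sketch on two preprocessed strings that this notebook prints for the Tierney goal. It assumes unigrams are compared as sets and that the article sentence acts as the reference. Neither assumption is confirmed by the `LTRTargets` code, although the resulting recall of 3/7 coincides with the 0.4286 the notebook reports for this pair.

```python
def unigram_set(text):
    """Unique whitespace-separated tokens of an already preprocessed string."""
    return set(text.split())


def rouge1(candidate, reference):
    """Set-based ROUGE-1 recall, precision and F1 (a simplified sketch)."""
    cand, ref = unigram_set(candidate), unigram_set(reference)
    overlap = len(cand & ref)
    recall = overlap / len(ref) if ref else 0.0
    precision = overlap / len(cand) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"r": recall, "p": precision, "f": f1}


# Preprocessed event and article sentence, as printed elsewhere in this notebook.
event = "goal kieran tierney leave footed shot centre box corner assist pierre emerick aubameyang"
sentence = "double advantage minute kieran tierney goal club"

scores = rouge1(event, sentence)
print(scores)
```

Recall only looks at how much of the article sentence is covered, so long events are not penalised for their extra words. Precision and F1 do penalise them.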

__ROUGE-1__

In [14]:
metric_params = {'rouge_mode': 'rouge-1', 'rouge_metric': 'r'}

In [15]:
ltr_metrics = LTRTargets(metric='rouge', metric_params=metric_params, lemma=True, drop_teams=True)

Setting target metric to rouge


In [16]:
event_article_list = ltr_metrics.create_match_targets(match_dict, verbose=False, league_season_teams=league_season_teams)

This example shows one of the problems very clearly: many events are matched to the same article sentence!

In [17]:
ltr_metrics.print_scores_info(match_dict, event_article_list)

Score: 0.42857142857142855
Event: Goal.  Arsenal 3, Watford 0. Pierre-Emerick Aubameyang (Arsenal) right footed shot from very close range to the top left corner. Assisted by Kieran Tierney.
Nearest article sentence: Arsenal doubled their advantage on 24 minutes through Kieran Tierney's first goal for the club. 

Score: 0.42857142857142855
Event: Goal.  Arsenal 2, Watford 0. Kieran Tierney (Arsenal) left footed shot from the centre of the box to the bottom right corner. Assisted by Pierre-Emerick Aubameyang.
Nearest article sentence: Arsenal doubled their advantage on 24 minutes through Kieran Tierney's first goal for the club. 

Score: 0.36363636363636365
Event: Attempt saved. Danny Welbeck (Watford) right footed shot from the centre of the box is saved in the centre of the goal. Assisted by Troy Deeney.
Nearest article sentence: Watford pulled one back 10 minutes later, Troy Deeney converting a penalty after Danny Welbeck had been fouled by David Luiz.

Score: 0.36363636363636365
Eve

__ROUGE-2__

With this metric, the matches obtained do not make much sense...

In [20]:
metric_params = {'rouge_mode': 'rouge-2', 'rouge_metric': 'r'}

In [21]:
ltr_metrics = LTRTargets(metric='rouge', metric_params=metric_params, lemma=True, drop_teams=True)

Setting target metric to rouge


In [22]:
event_article_list = ltr_metrics.create_match_targets(match_dict, verbose=False, league_season_teams=league_season_teams)

With bigrams things get harder, and the matching appears to rely solely on the players' names.
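
A small sketch of why bigram overlap ends up keyed on names. The article sentence is the first preprocessed sentence printed in this notebook. The preprocessed event string is an assumption, reconstructed by analogy with the Tierney free-kick event shown in the TF-IDF section. Bigrams are compared as sets, which is also an assumption about the `rouge` package.

```python
def bigrams(text):
    """Set of adjacent token pairs in an already preprocessed string."""
    tokens = text.split()
    return set(zip(tokens, tokens[1:]))


def rouge2_recall(candidate, reference):
    """Share of the reference bigrams that also occur in the candidate."""
    ref = bigrams(reference)
    if not ref:
        return 0.0
    return len(bigrams(candidate) & ref) / len(ref)


sentence = "relegate premier league lose pierre emerick aubameyang brace race golden"
# Assumed preprocessing of "Pierre-Emerick Aubameyang (Arsenal) wins a free kick in the attacking half."
event = "pierre emerick aubameyang win free kick attacking half"

shared = bigrams(event) & bigrams(sentence)
score = rouge2_recall(event, sentence)
print(shared, score)
```

The only shared bigrams are the two that make up the player's name, and 2/9 agrees with the 0.2222 printed for this pair in the ROUGE-2 run.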

In [23]:
ltr_metrics.print_scores_info(match_dict, event_article_list)

Score: 0.3
Event: Goal!   Arsenal 3, Watford 1. Troy Deeney (Watford) converts the penalty with a right footed shot to the bottom right corner.
Nearest article sentence: Watford pulled one back 10 minutes later, Troy Deeney converting a penalty after Danny Welbeck had been fouled by David Luiz.

Score: 0.2222222222222222
Event: Attempt saved. Pierre-Emerick Aubameyang (Arsenal) right footed shot from the left side of the box is saved in the bottom left corner. Assisted by Eddie Nketiah.
Nearest article sentence: Watford were relegated from the Premier League after losing 3-2 to Arsenal as Pierre-Emerick Aubameyang's brace was not enough in the race for the Golden Boot.

Score: 0.2222222222222222
Event: Pierre-Emerick Aubameyang (Arsenal) wins a free kick in the attacking half.
Nearest article sentence: Watford were relegated from the Premier League after losing 3-2 to Arsenal as Pierre-Emerick Aubameyang's brace was not enough in the race for the Golden Boot.

Score: 0.2222222222222222

__ROUGE-L__

In [24]:
metric_params = {'rouge_mode': 'rouge-l', 'rouge_metric': 'r'}

In [25]:
ltr_metrics = LTRTargets(metric='rouge', metric_params=metric_params, lemma=True, drop_teams=True)

Setting target metric to rouge


In [26]:
event_article_list = ltr_metrics.create_match_targets(match_dict, verbose=False, league_season_teams=league_season_teams)

ROUGE-L produces a mix of the two previous behaviours.
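
ROUGE-L is based on the longest common subsequence (LCS), so it rewards shared words that appear in the same order without requiring them to be adjacent. A minimal sketch, assuming recall is taken with respect to the article sentence. The preprocessed event string is an assumption, reconstructed by analogy with the Lacazette penalty event printed in the cosine section.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """LCS length divided by the number of reference tokens."""
    cand, ref = candidate.split(), reference.split()
    return lcs_length(cand, ref) / len(ref) if ref else 0.0


sentence = "pull minute troy deeney convert penalty danny welbeck foul david luiz"
# Assumed preprocessing of "Penalty Watford. Danny Welbeck draws a foul in the penalty area."
event = "penalty danny welbeck draw foul penalty area"

lcs = lcs_length(event.split(), sentence.split())
recall = rouge_l_recall(event, sentence)
print(lcs, recall)
```

The subsequence `penalty danny welbeck foul` gives 4/11, in line with the 0.3636 the notebook reports for this pair.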

In [27]:
ltr_metrics.print_scores_info(match_dict, event_article_list)

Score: 0.36363636363636365
Event: Goal!   Arsenal 3, Watford 1. Troy Deeney (Watford) converts the penalty with a right footed shot to the bottom right corner.
Nearest article sentence: Watford pulled one back 10 minutes later, Troy Deeney converting a penalty after Danny Welbeck had been fouled by David Luiz.

Score: 0.36363636363636365
Event: Penalty Watford. Danny Welbeck draws a foul in the penalty area.
Nearest article sentence: Watford pulled one back 10 minutes later, Troy Deeney converting a penalty after Danny Welbeck had been fouled by David Luiz.

Score: 0.3333333333333333
Event: Pierre-Emerick Aubameyang (Arsenal) wins a free kick in the attacking half.
Nearest article sentence: It was 3-0 on 33 minutes, Aubameyang producing an overhead kick into the net.

Score: 0.3
Event: Attempt saved. Pierre-Emerick Aubameyang (Arsenal) right footed shot from the left side of the box is saved in the bottom left corner. Assisted by Eddie Nketiah.
Nearest article sentence: Watford were re

### Cosine distance

With this distance it becomes clear that many of the matches are due solely to the appearance of a player's name.
Again, many events are associated with the same (not very informative) article sentence only because the name matches exactly.
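
The effect can be reproduced with a dependency-free sketch of TF-IDF cosine similarity. It uses the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` and L2 normalisation, which are the scikit-learn defaults, with unigrams and bigrams. The whitespace tokenizer and all helper names are assumptions, and the numbers need not match what `LTRTargets` computes. The sentences and the event are preprocessed strings printed in this notebook.

```python
import math
from collections import Counter


def terms(text, ngram_range=(1, 2)):
    """Unigrams and bigrams of an already preprocessed string."""
    tokens = text.split()
    lo, hi = ngram_range
    return [" ".join(tokens[i:i + n])
            for n in range(lo, hi + 1)
            for i in range(len(tokens) - n + 1)]


def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(terms(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(doc, idf):
    """L2-normalised TF-IDF weights; out-of-vocabulary terms are dropped."""
    counts = Counter(term for term in terms(doc) if term in idf)
    vec = {term: count * idf[term] for term, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {term: w / norm for term, w in vec.items()} if norm else {}


def cosine(u, v):
    """Dot product of two already normalised sparse vectors."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())


article_sents = [
    "relegate premier league lose pierre emerick aubameyang brace race golden",
    "aubameyang open scoring fifth minute penalty spot video assistant referee review craig dawson foul alexandre",
    "double advantage minute kieran tierney goal club",
    "minute aubameyang produce overhead kick net",
    "pull minute troy deeney convert penalty danny welbeck foul david luiz",
    "welbeck fire sarr cross minute emiliano martinez deny forward equaliser",
    "rise eighth point point",
    "report report",
]
event = "kieran tierney win free kick defensive half"

idf = fit_idf(article_sents)  # vocabulary and IDF come from the article only
event_vec = tfidf_vector(event, idf)
sims = [cosine(event_vec, tfidf_vector(sent, idf)) for sent in article_sents]
best = max(range(len(sims)), key=sims.__getitem__)
shared = set(event_vec) & set(tfidf_vector(article_sents[best], idf))
print(best, round(sims[best], 3), shared)
```

A routine free kick is matched to the sentence describing Tierney's goal, and every shared term is part of the player's name.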

In [32]:
metric_params = {'ngram_range': (1, 2), 'strip_accents': 'unicode'}

In [33]:
ltr_metrics = LTRTargets(metric='cosine_tfidf', metric_params=metric_params, lemma=True, drop_teams=True)

Setting target metric to cosine_tfidf


In [34]:
event_article_list = ltr_metrics.create_match_targets(match_dict, verbose=True, league_season_teams=league_season_teams)

Event: Penalty conceded by Craig Dawson (Watford) after a foul in the penalty area.
Nearest article sentence: Aubameyang opened the scoring in the fifth minute from the penalty spot after a Video Assistant Referee review of Craig Dawson's foul on Alexandre Lacazette.
Processed event: penalty concede craig dawson foul penalty area
Processed article sentence: aubameyang open scoring fifth minute penalty spot video assistant referee review craig dawson foul alexandre

Event: Penalty Arsenal. Alexandre Lacazette draws a foul in the penalty area.
Nearest article sentence: Aubameyang opened the scoring in the fifth minute from the penalty spot after a Video Assistant Referee review of Craig Dawson's foul on Alexandre Lacazette.
Processed event: penalty alexandre draw foul penalty area
Processed article sentence: aubameyang open scoring fifth minute penalty spot video assistant referee review craig dawson foul alexandre

Event: VAR Decision: Penalty Arsenal.
Nearest article sentence: Watford 

In [35]:
ltr_metrics.print_scores_info(match_dict, event_article_list)

Score: 0.5477420284957557
Event: Goal!   Arsenal 3, Watford 1. Troy Deeney (Watford) converts the penalty with a right footed shot to the bottom right corner.
Nearest article sentence: Watford pulled one back 10 minutes later, Troy Deeney converting a penalty after Danny Welbeck had been fouled by David Luiz.

Score: 0.4996658472922424
Event: Attempt saved. Danny Welbeck (Watford) right footed shot from the centre of the box is saved in the centre of the goal. Assisted by Troy Deeney.
Nearest article sentence: Watford pulled one back 10 minutes later, Troy Deeney converting a penalty after Danny Welbeck had been fouled by David Luiz.

Score: 0.4948205264658473
Event: Attempt saved. Pierre-Emerick Aubameyang (Arsenal) right footed shot from the left side of the box is saved in the bottom left corner. Assisted by Eddie Nketiah.
Nearest article sentence: Watford were relegated from the Premier League after losing 3-2 to Arsenal as Pierre-Emerick Aubameyang's brace was not enough in the ra

__Examining TF-IDF__

In [38]:
from sklearn.feature_extraction.text import TfidfTransformer, CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import Pipeline
import pandas as pd

In [40]:
proc_events, proc_article_sents = ltr_metrics._process_events_article(match_dict)

In [41]:
[proc_events.index(event) for event in proc_events if 'tierney' in event]

[8, 24, 27, 29, 8, 88, 29]
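
Note that `list.index` returns the position of the first equal element, so events with identical text repeat the same index (8 and 29 each appear twice in the output). A sketch of the difference on a shortened, made-up list of preprocessed events:

```python
# Toy list of preprocessed events; the second string is hypothetical.
toy_events = [
    "corner concede kieran tierney",
    "foul dani ceballos",
    "corner concede kieran tierney",
]

# list.index always returns the first position of an equal string.
first_match = [toy_events.index(e) for e in toy_events if "tierney" in e]

# enumerate keeps the actual position of every occurrence.
positions = [i for i, e in enumerate(toy_events) if "tierney" in e]

print(first_match, positions)
```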

In [42]:
[event for event in proc_events if 'tierney' in event]

['corner concede kieran tierney',
 'goal kieran tierney leave footed shot centre box corner assist pierre emerick aubameyang',
 'goal pierre emerick aubameyang footed shot close range left corner assist kieran tierney',
 'kieran tierney win free kick defensive half',
 'corner concede kieran tierney',
 'attempt miss kieran tierney leave footed shot difficult angle left high wide left assist reiss nelson',
 'kieran tierney win free kick defensive half']

In [53]:
proc_article_sents

['relegate premier league lose pierre emerick aubameyang brace race golden',
 'aubameyang open scoring fifth minute penalty spot video assistant referee review craig dawson foul alexandre',
 'double advantage minute kieran tierney goal club',
 'minute aubameyang produce overhead kick net',
 'pull minute troy deeney convert penalty danny welbeck foul david luiz',
 'welbeck fire sarr cross minute emiliano martinez deny forward equaliser',
 'rise eighth point point',
 'report report']

In [54]:
len(proc_article_sents)

8

In [55]:
len(proc_events)

105

In [44]:
count_vec_kwargs = {'ngram_range': (1, 2), 'strip_accents': 'unicode'}

In [45]:
pipe = Pipeline([('count', CountVectorizer(**count_vec_kwargs)),
                         ('tfid', TfidfTransformer())])

In [46]:
X = pipe.fit_transform(proc_article_sents)

In [47]:
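# Note: get_feature_names() was removed in scikit-learn 1.2; newer versions use get_feature_names_out().
tfidf_df = pd.DataFrame(X.todense(), columns=pipe['count'].get_feature_names())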

In [49]:
article_sentences = ltr_metrics.text_proc.get_sentences(match_dict['article'])
article_sentences_text = [str(sent).replace('\n', '') for sent in article_sentences]

In [50]:
article_sentences_text[2]

"Arsenal doubled their advantage on 24 minutes through Kieran Tierney's first goal for the club.\xa0"

In [51]:
proc_article_sents[2]

'double advantage minute kieran tierney goal club'

In [52]:
pd_df_sent = tfidf_df.loc[2]

In [53]:
pd_df_sent[pd_df_sent>0].sort_values(ascending=False)

tierney goal        0.284726
tierney             0.284726
minute kieran       0.284726
kieran tierney      0.284726
kieran              0.284726
goal club           0.284726
goal                0.284726
double advantage    0.284726
double              0.284726
club                0.284726
advantage minute    0.284726
advantage           0.284726
minute              0.164852
Name: 2, dtype: float64

In [54]:
events[24]

'Goal.  Arsenal 2, Watford 0. Kieran Tierney (Arsenal) left footed shot from the centre of the box to the bottom right corner. Assisted by Pierre-Emerick Aubameyang.'

In [55]:
proc_events[24]

'goal kieran tierney leave footed shot centre box corner assist pierre emerick aubameyang'

In [56]:
X_events = pipe.transform(proc_events)

In [57]:
tfidf_events_df = pd.DataFrame(X_events.todense(), columns=pipe['count'].get_feature_names())

In [58]:
pd_df_sent_event = tfidf_events_df.loc[24]

In [59]:
pd_df_sent_event[pd_df_sent_event>0].sort_values(ascending=False)

tierney               0.342207
pierre emerick        0.342207
pierre                0.342207
kieran tierney        0.342207
kieran                0.342207
goal                  0.342207
emerick aubameyang    0.342207
emerick               0.342207
aubameyang            0.251306
Name: 24, dtype: float64

### WMD

WMD (Word Mover's Distance) is a distance based on representing words with word embeddings. The main advantage of using word embeddings
is that the distance can be small even when the texts share no words (e.g. synonyms).

Models available in gensim:

__RUN ON A NEW MATCH__
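
To make the idea concrete without downloading an embedding model, here is a sketch of the *relaxed* WMD, a cheaper lower bound in which every word simply travels to its nearest word in the other text. This is not the full optimal-transport computation that WMD requires. The 2-D vectors are invented by hand for illustration and all names are hypothetical.

```python
import math

# Hypothetical 2-D "embeddings": scoring-related words point one way,
# discipline-related words the other.
toy_vectors = {
    "goal": (1.0, 0.0),
    "score": (0.9, 0.1),
    "net": (0.8, 0.2),
    "foul": (0.0, 1.0),
    "card": (0.1, 0.9),
}


def one_sided(doc_a, doc_b, vectors):
    """Mean distance from each word in doc_a to its nearest word in doc_b."""
    return sum(
        min(math.dist(vectors[a], vectors[b]) for b in doc_b) for a in doc_a
    ) / len(doc_a)


def relaxed_wmd(doc_a, doc_b, vectors):
    """Relaxed WMD: the larger of the two one-sided relaxations."""
    return max(one_sided(doc_a, doc_b, vectors), one_sided(doc_b, doc_a, vectors))


close = relaxed_wmd(["goal", "net"], ["score"], toy_vectors)
far = relaxed_wmd(["goal", "net"], ["foul", "card"], toy_vectors)
print(close, far)
```

The first pair shares no words yet has a small distance because the vectors are close, which is exactly the property that ROUGE and TF-IDF cosine lack.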

In [9]:
import gensim.downloader as api

In [10]:
api.info()['models'].keys()

dict_keys(['fasttext-wiki-news-subwords-300', 'conceptnet-numberbatch-17-06-300', 'word2vec-ruscorpora-300', 'word2vec-google-news-300', 'glove-wiki-gigaword-50', 'glove-wiki-gigaword-100', 'glove-wiki-gigaword-200', 'glove-wiki-gigaword-300', 'glove-twitter-25', 'glove-twitter-50', 'glove-twitter-100', 'glove-twitter-200', '__testing_word2vec-matrix-synopsis'])

In [11]:
api.info()['models']['word2vec-google-news-300']

{'num_records': 3000000,
 'file_size': 1743563840,
 'base_dataset': 'Google News (about 100 billion words)',
 'reader_code': 'https://github.com/RaRe-Technologies/gensim-data/releases/download/word2vec-google-news-300/__init__.py',
 'license': 'not found',
 'parameters': {'dimension': 300},
 'description': "Pre-trained vectors trained on a part of the Google News dataset (about 100 billion words). The model contains 300-dimensional vectors for 3 million words and phrases. The phrases were obtained using a simple data-driven approach described in 'Distributed Representations of Words and Phrases and their Compositionality' (https://code.google.com/archive/p/word2vec/).",
 'read_more': ['https://code.google.com/archive/p/word2vec/',
  'https://arxiv.org/abs/1301.3781',
  'https://arxiv.org/abs/1310.4546',
  'https://www.microsoft.com/en-us/research/publication/linguistic-regularities-in-continuous-space-word-representations/?from=http%3A%2F%2Fresearch.microsoft.com%2Fpubs%2F189726%2Frvec

In [61]:
metric_params = {'norm': True}

In [63]:
ltr_metrics = LTRTargets(metric = 'wmd', metric_params=metric_params, lemma=True, drop_teams=True)

Setting target metric to wmd


In [None]:
# This sometimes crashes or takes a very long time
event_article_list = ltr_metrics.create_match_targets(match_dict, verbose=False, league_season_teams=league_season_teams)

In [135]:
ltr_metrics.print_scores_info(match_dict, event_article_list, reverse=False)

Score: 0.13826085867936916
Event: Substitution, Arsenal. Reiss Nelson replaces Nicolas Pépé.
Nearest article sentence: signing Nicholas Pepe failed to last the course, being substituted for Reiss Nelson after 72 minutes.

Score: 0.16723420581333923
Event: Goal!  Manchester United 1, Arsenal 1. Pierre-Emerick Aubameyang (Arsenal) left footed shot from the centre of the box to the centre of the goal. Assisted by Bukayo Saka with a through ball.Goal awarded following VAR Review.
Nearest article sentence: Goalkeeper Bernd Leno excelled for Arsenal with fine saves from Maguire and Marcus Rashford's late free-kick, while Bukayo Saka's goalbound shot crucially struck Victor Lindelof and flew over the top.

Score: 0.17164765828453452
Event: Attempt missed. Scott McTominay (Manchester United) header from very close range is just a bit too high. Assisted by Ashley Young with a cross following a corner.
Nearest article sentence: There was no shortage of effort but this was a scrappy mess of a gam

## Features

Having studied the different distances and metrics that can serve as the target, we move on to building the model's features.
Following the paper, the following can be included:

- Position of the event
- Length of the event (after removing stopwords)
- Number of stopwords
- Sum of the TF-IDF weights of the words in the event
- Similarity to neighbouring events
- Presence of words that signal important events: goals, cards, VAR...
- Score changes: 0/1 depending on whether the score changed (equivalent to containing the word "goal"...)
- Whether the change equalizes the match or puts one team ahead
- Part of the match in which the event occurs (may be equivalent to the event position...)
- Number of players mentioned in the event
- Identification of important players, who appear often in the match
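
The first three features in the list can be sketched with the standard library alone. The stopword set and the tokenizer are assumptions and will not match `LTRFeatures` exactly. The relative position `(i + 1) / n` is consistent with the `position` column of the features table in this notebook, where the first of 105 events has 0.009524 ≈ 1/105.

```python
import re

# Tiny hypothetical stopword list; the real processor presumably uses a full one.
STOPWORDS = {"a", "after", "by", "in", "the"}


def basic_features(events):
    """Relative position, length without stopwords, and stopword count per event."""
    n_events = len(events)
    rows = []
    for i, event in enumerate(events):
        tokens = re.findall(r"\w+", event.lower())
        n_stop = sum(token in STOPWORDS for token in tokens)
        rows.append({
            "position": (i + 1) / n_events,
            "length": len(tokens) - n_stop,
            "n_stop": n_stop,
        })
    return rows


# First three events of the example match.
sample_events = [
    "Penalty conceded by Craig Dawson (Watford) after a foul in the penalty area.",
    "Penalty Arsenal. Alexandre Lacazette draws a foul in the penalty area.",
    "VAR Decision: Penalty Arsenal.",
]

rows = basic_features(sample_events)
for row in rows:
    print(row)
```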

In [76]:
key_events = ['goal', 'red_card', 'penalty']

In [77]:
ltr_features = LTRFeatures(key_events)

In [78]:
ltr_features.processor.league_season_teams = league_season_teams

In [81]:
goal_event = [e for e in events if 'goal' in e.lower()][0]

In [82]:
goal_event

'Goal!   Arsenal 1, Watford 0. Pierre-Emerick Aubameyang (Arsenal) converts the penalty with a right footed shot to the bottom left corner.'

In [83]:
count_vec_kwargs = {'ngram_range': (1, 2), 'strip_accents': 'unicode'}

In [84]:
tfidf_dict = ltr_features._match_level_features(events, **count_vec_kwargs)

In [90]:
features_dict = ltr_features.create_features(match_dict, league_season_teams, **count_vec_kwargs)

In [91]:
len(features_dict['players_importance'])

105

In [92]:
features = ltr_features.get_features_pandas(match_dict, league_season_teams, **count_vec_kwargs)

In [93]:
features

| | length | n_stop | is_key_event | n_players | players_importance | advantage | equalize | position | tfidf_sum | sim_previous_1 | sim_previous_3 | sim_previous_5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | 5 | 0 | 1 | 0.038462 | 0 | 0 | 0.009524 | 3.569398 | 0.000000 | 0.000000 | 0.000000 |
| 1 | 7 | 3 | 0 | 1 | 0.023077 | 0 | 0 | 0.019048 | 3.313886 | 0.439070 | 0.000000 | 0.000000 |
| 2 | 4 | 0 | 0 | 0 | 0.000000 | 0 | 0 | 0.028571 | 2.557825 | 0.277217 | 0.000000 | 0.000000 |
| 3 | 14 | 6 | 1 | 1 | 0.030769 | 1 | 0 | 0.038095 | 4.965324 | 0.088368 | 0.096671 | 0.000000 |
| 4 | 16 | 10 | 0 | 2 | 0.123077 | 0 | 0 | 0.047619 | 5.107512 | 0.159605 | 0.000000 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 100 | 13 | 6 | 0 | 2 | 0.069231 | 0 | 0 | 0.961905 | 4.678774 | 0.000000 | 0.015249 | 0.000000 |
| 101 | 4 | 1 | 0 | 0 | 0.000000 | 0 | 0 | 0.971429 | 2.525825 | 0.016328 | 0.057778 | 0.021915 |
| 102 | 8 | 3 | 0 | 1 | 0.015385 | 0 | 0 | 0.980952 | 3.724435 | 0.000000 | 0.012434 | 0.260072 |
| 103 | 18 | 12 | 0 | 2 | 0.076923 | 0 | 0 | 0.990476 | 5.744472 | 0.007046 | 0.077170 | 0.011431 |


In [94]:
len(events)

105

## Putting it all together

In [14]:
key_events = ['goal', 'red_card', 'penalty']
lags = [1, 3, 5]
target_metric = 'rouge'
drop_teams = True
lemma = True
metric_params = {'rouge_mode': 'rouge-1', 'rouge_metric': 'r'}
#metric_params = {'ngram_range': (1, 2), 'strip_accents': 'unicode'}
count_vec_kwargs = {'ngram_range': (1, 2), 'strip_accents': 'unicode'}

train_perc = 0.7
val_perc = 0.2
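
One way `train_perc` and `val_perc` might be applied is sketched here, under two assumptions that the notebook does not confirm: the remaining 10% is used as a test set, and the split is done per match so that all events of a match stay in the same partition. The function name and the match identifiers are hypothetical.

```python
import random


def split_matches(match_ids, train_perc, val_perc, seed=0):
    """Shuffle match ids and cut them into train / validation / test lists."""
    ids = list(match_ids)
    random.Random(seed).shuffle(ids)  # fixed seed for a reproducible split
    n_train = round(len(ids) * train_perc)
    n_val = round(len(ids) * val_perc)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


match_ids = [f"match_{i}" for i in range(10)]  # placeholder identifiers
train, val, test = split_matches(match_ids, train_perc=0.7, val_perc=0.2)
print(len(train), len(val), len(test))
```

Splitting by match rather than by event avoids having near-duplicate events from the same match on both sides of the split.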

In [9]:
processor = ArticleTextProcessor(drop_teams=drop_teams, lemma=lemma)

#### Features

In [10]:
features = LTRFeatures(key_events=key_events, lags=lags, processor=processor,
                                    count_vec_kwargs=count_vec_kwargs)
targets = LTRTargets(metric=target_metric, metric_params=metric_params, processor=processor)

Setting target metric to rouge


In [29]:
features.run_all_matches()

  0%|          | 0/20 [00:00<?, ?it/s]

0 matches have already been processed
Updated all_files
Results path in /home/carlos/MasterDS/tfm/data/csv/summaries/ltr/features/b8bcd377c1/features.csv
Writing config in /home/carlos/MasterDS/tfm/data/csv/summaries/ltr/features/b8bcd377c1/config.pickle
premier_league_2018_2019.json
http://www.premierleague.com/match/38678
http://www.premierleague.com/match/38679


  0%|          | 0/20 [00:04<?, ?it/s]


KeyboardInterrupt: 

#### Targets

In [21]:
targets.run_all_matches()

  0%|          | 0/20 [00:00<?, ?it/s]

0 matches have already been processed
Updated all_files
Results path in /home/carlos/MasterDS/tfm/data/csv/summaries/ltr/c4f2c5790f/targets.csv
premier_league_2018_2019.json
http://www.premierleague.com/match/38678
Calculating targets...
http://www.premierleague.com/match/38679
Calculating targets...
http://www.premierleague.com/match/38680
Calculating targets...
http://www.premierleague.com/match/38681
Calculating targets...
http://www.premierleague.com/match/38682
Calculating targets...
http://www.premierleague.com/match/38683
Calculating targets...
http://www.premierleague.com/match/38684
Calculating targets...
http://www.premierleague.com/match/38685
Calculating targets...
http://www.premierleague.com/match/38686
Calculating targets...
http://www.premierleague.com/match/38687
Calculating targets...
http://www.premierleague.com/match/38674
Calculating targets...
http://www.premierleague.com/match/38671
Calculating targets...
http://www.premierleague.com/match/38673
Calculating targe

  5%|▌         | 1/20 [08:07<2:34:23, 487.54s/it]

mls_2015_2016.json
https://matchcenter.mlssoccer.com/matchcenter/2015-03-17-cs-herediano-vs-club-america/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2015-06-28-new-york-city-fc-vs-new-york-red-bulls/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2015-09-05-montreal-impact-vs-chicago-fire/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2015-10-03-portland-timbers-vs-sporting-kansas-city/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2015-08-29-colorado-rapids-vs-sporting-kansas-city/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2015-08-07-portland-timbers-vs-chicago-fire/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2015-05-23-real-salt-lake-vs-new-york-city-fc/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2015-06-03-dc-united-vs-chicago-fire/feed
Calculating targets...
https://matchcenter.mlssoc

 10%|█         | 2/20 [18:50<2:40:14, 534.16s/it]

mls_2019_2020.json
https://matchcenter.mlssoccer.com/matchcenter/2019-10-24-atlanta-united-fc-vs-philadelphia-union/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2019-07-27-chicago-fire-vs-dc-united/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2019-06-18-panama-vs-trinidad-and-tobago/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2019-07-10-cavalry-vs-vancouver-whitecaps-fc/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2019-04-14-sporting-kansas-city-vs-new-york-red-bulls/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2019-03-05-houston-dynamo-vs-tigres-uanl/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2019-05-08-columbus-crew-sc-vs-la-galaxy/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2019-03-02-vancouver-whitecaps-fc-vs-minnesota-united-fc/feed
Calculating targets...
https://matchcenter.ml

 15%|█▌        | 3/20 [33:48<3:02:16, 643.34s/it]

premier_league_2016_2017.json
http://www.premierleague.com/match/14410
Calculating targets...
http://www.premierleague.com/match/14411
Calculating targets...
http://www.premierleague.com/match/14412
Calculating targets...
http://www.premierleague.com/match/14413
Calculating targets...
http://www.premierleague.com/match/14414
Calculating targets...
http://www.premierleague.com/match/14415
Calculating targets...
http://www.premierleague.com/match/14416
Calculating targets...
http://www.premierleague.com/match/14417
Calculating targets...
http://www.premierleague.com/match/14418
Calculating targets...
http://www.premierleague.com/match/14419
Calculating targets...
http://www.premierleague.com/match/14375
Calculating targets...
http://www.premierleague.com/match/14319
Calculating targets...
http://www.premierleague.com/match/14370
Calculating targets...
http://www.premierleague.com/match/14377
Calculating targets...
http://www.premierleague.com/match/14313
Calculating targets...
http://www

 20%|██        | 4/20 [46:01<2:58:43, 670.20s/it]

italian_serie_a_2019_2020.json
https://www.bbc.com/sport/football/49871134
Calculating targets...
https://www.bbc.com/sport/football/49873763
Calculating targets...
https://www.bbc.com/sport/football/49865134
Calculating targets...
https://www.bbc.com/sport/football/49866723
Calculating targets...
https://www.bbc.com/sport/football/49832309
Calculating targets...
https://www.bbc.com/sport/football/49818211
Calculating targets...
https://www.bbc.com/sport/football/49783147
Calculating targets...
https://www.bbc.com/sport/football/49783140
Calculating targets...
https://www.bbc.com/sport/football/49710440
Calculating targets...
https://www.bbc.com/sport/football/49702257
Calculating targets...
https://www.bbc.com/sport/football/49545699
Calculating targets...
https://www.bbc.com/sport/football/49546433
Calculating targets...
https://www.bbc.com/sport/football/50243858
Calculating targets...
https://www.bbc.com/sport/football/50243942
Calculating targets...
https://www.bbc.com/sport/footb

 25%|██▌       | 5/20 [48:20<2:07:43, 510.91s/it]

spanish_la_liga_2019_2020.json
https://www.bbc.com/sport/football/50614421
Calculating targets...
https://www.bbc.com/sport/football/50530355
Calculating targets...
https://www.bbc.com/sport/football/50530376
Calculating targets...
https://www.bbc.com/sport/football/50352381
Calculating targets...
https://www.bbc.com/sport/football/50363584
Calculating targets...
https://www.bbc.com/sport/football/50281795
Calculating targets...
https://www.bbc.com/sport/football/50274870
Calculating targets...
https://www.bbc.com/sport/football/50274884
Calculating targets...
https://www.bbc.com/sport/football/51254950
Calculating targets...
https://www.bbc.com/sport/football/51249888
Calculating targets...
https://www.bbc.com/sport/football/51170440
Calculating targets...
https://www.bbc.com/sport/football/51163398
Calculating targets...
https://www.bbc.com/sport/football/50994528
Calculating targets...
https://www.bbc.com/sport/football/50994535
Calculating targets...
https://www.bbc.com/sport/footb

 30%|███       | 6/20 [50:13<1:31:19, 391.42s/it]

mls_2017_2018.json
https://matchcenter.mlssoccer.com/matchcenter/2017-05-14-portland-timbers-vs-atlanta-united-fc/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2017-06-10-portland-timbers-vs-fc-dallas/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2017-03-04-colorado-rapids-vs-new-england-revolution/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2017-05-03-sporting-kansas-city-vs-new-york-red-bulls/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2017-10-01-philadelphia-union-vs-seattle-sounders-fc/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2017-09-27-houston-dynamo-vs-la-galaxy/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2017-03-05-atlanta-united-fc-vs-new-york-red-bulls/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2017-06-14-fc-cincinnati-vs-columbus-crew-sc/feed
Calculating targets...
http

 35%|███▌      | 7/20 [1:07:48<2:07:56, 590.47s/it]

premier_league_2019_2020.json
http://www.premierleague.com/match/46975
Calculating targets...
http://www.premierleague.com/match/46976
Calculating targets...
http://www.premierleague.com/match/46977
Calculating targets...
http://www.premierleague.com/match/46978
Calculating targets...
http://www.premierleague.com/match/46979
Calculating targets...
http://www.premierleague.com/match/46980
Calculating targets...
http://www.premierleague.com/match/46981
Calculating targets...
http://www.premierleague.com/match/46982
Calculating targets...
http://www.premierleague.com/match/46983
Calculating targets...
http://www.premierleague.com/match/46984
Calculating targets...
http://www.premierleague.com/match/46969
Calculating targets...
http://www.premierleague.com/match/46968
Calculating targets...
http://www.premierleague.com/match/46973
Calculating targets...
http://www.premierleague.com/match/46965
Calculating targets...
http://www.premierleague.com/match/46967
Calculating targets...
http://www

 40%|████      | 8/20 [1:22:39<2:16:08, 680.73s/it]

mls_2018_2019.json
https://matchcenter.mlssoccer.com/matchcenter/2018-08-11-portland-timbers-vs-vancouver-whitecaps-fc/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2018-08-29-portland-timbers-vs-toronto-fc/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2018-06-30-seattle-sounders-fc-vs-portland-timbers/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2018-03-13-new-york-red-bulls-vs-club-tijuana/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2018-07-21-seattle-sounders-fc-vs-vancouver-whitecaps-fc/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2018-07-25-san-jose-earthquakes-vs-seattle-sounders-fc/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2018-03-24-fc-dallas-vs-portland-timbers/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2018-04-21-columbus-crew-sc-vs-new-england-revolution/feed
Calculating 

 45%|████▌     | 9/20 [1:43:28<2:36:02, 851.13s/it]

premier_league_2017_2018.json
http://www.premierleague.com/match/22712
Calculating targets...
http://www.premierleague.com/match/22713
Calculating targets...
http://www.premierleague.com/match/22714
Calculating targets...
http://www.premierleague.com/match/22715
Calculating targets...
http://www.premierleague.com/match/22716
Calculating targets...
http://www.premierleague.com/match/22717
Calculating targets...
http://www.premierleague.com/match/22718
Calculating targets...
http://www.premierleague.com/match/22719
Calculating targets...
http://www.premierleague.com/match/22720
Calculating targets...
http://www.premierleague.com/match/22721
Calculating targets...
http://www.premierleague.com/match/22651
Calculating targets...
http://www.premierleague.com/match/22685
Calculating targets...
http://www.premierleague.com/match/22645
Calculating targets...
http://www.premierleague.com/match/22647
Calculating targets...
http://www.premierleague.com/match/22650
Calculating targets...
http://www

 50%|█████     | 10/20 [2:00:32<2:30:29, 902.95s/it]

mls_2016_2017.json
https://matchcenter.mlssoccer.com/matchcenter/2016-09-24-new-york-red-bulls-vs-montreal-impact/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2016-07-02-houston-dynamo-vs-philadelphia-union/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2016-03-06-la-galaxy-vs-dc-united/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2016-08-16-canada-womens-national-team-vs-germany-womens-national-team/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2016-06-15-dc-united-vs-fort-lauderdale-strikers/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2016-07-20-new-england-revolution-vs-philadelphia-union/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2016-09-02-honduras-vs-canada/feed
Calculating targets...
https://matchcenter.mlssoccer.com/matchcenter/2016-06-06-argentina-vs-chile/feed
Calculating targets...
https://matchcenter.ml

 55%|█████▌    | 11/20 [2:24:39<2:39:56, 1066.29s/it]

italian_serie_a_2017_2018.json
https://www.espn.com/soccer/report?gameId=491592
Calculating targets...
https://www.espn.com/soccer/report?gameId=491503
Calculating targets...
https://www.espn.com/soccer/report?gameId=491636
Calculating targets...
https://www.espn.com/soccer/report?gameId=491542
Calculating targets...
https://www.espn.com/soccer/report?gameId=491513
Calculating targets...
https://www.espn.com/soccer/report?gameId=491535
Calculating targets...
https://www.espn.com/soccer/report?gameId=491550
Calculating targets...
https://www.espn.com/soccer/report?gameId=491602
Calculating targets...
https://www.espn.com/soccer/report?gameId=491565
Calculating targets...
https://www.espn.com/soccer/report?gameId=491639
Calculating targets...
https://www.espn.com/soccer/report?gameId=491681
Calculating targets...
https://www.espn.com/soccer/report?gameId=491497
Calculating targets...
https://www.espn.com/soccer/report?gameId=491547
Calculating targets...
https://www.espn.com/soccer/repor

 60%|██████    | 12/20 [2:27:48<1:47:04, 803.03s/it] 

french_ligue_one_2019_2020.json
https://www.bbc.com/sport/football/50869599
Calculating targets...
https://www.bbc.com/sport/football/50801837
Calculating targets...
https://www.bbc.com/sport/football/50693042
Calculating targets...
https://www.bbc.com/sport/football/50666098
Calculating targets...
https://www.bbc.com/sport/football/50653515
Calculating targets...
https://www.bbc.com/sport/football/50203363
Calculating targets...
https://www.bbc.com/sport/football/50104968
Calculating targets...
https://www.bbc.com/sport/football/49939532
Calculating targets...
https://www.bbc.com/sport/football/51254981
Calculating targets...
https://www.bbc.com/sport/football/51126242
Calculating targets...
https://www.bbc.com/sport/football/51084232
Calculating targets...
https://www.bbc.com/sport/football/51690379
Calculating targets...
https://www.bbc.com/sport/football/51563538
Calculating targets...
https://www.bbc.com/sport/football/51517956
Calculating targets...
https://www.bbc.com/sport/foot

 65%|██████▌   | 13/20 [2:29:06<1:08:18, 585.55s/it]

german_bundesliga_2017_2018.json
https://www.espn.com/soccer/report?gameId=487224
Calculating targets...
https://www.espn.com/soccer/report?gameId=487200
Calculating targets...
https://www.espn.com/soccer/report?gameId=487139
Calculating targets...
https://www.espn.com/soccer/report?gameId=487170
Calculating targets...
https://www.espn.com/soccer/report?gameId=487134
Calculating targets...
https://www.espn.com/soccer/report?gameId=487251
Calculating targets...
https://www.espn.com/soccer/report?gameId=487115
Calculating targets...
https://www.espn.com/soccer/report?gameId=487206
Calculating targets...
https://www.espn.com/soccer/report?gameId=487152
Calculating targets...
https://www.espn.com/soccer/report?gameId=487228
Calculating targets...
https://www.espn.com/soccer/report?gameId=487168
Calculating targets...
https://www.espn.com/soccer/report?gameId=487218
Calculating targets...
https://www.espn.com/soccer/report?gameId=487258
Calculating targets...
https://www.espn.com/soccer/rep

 70%|███████   | 14/20 [2:31:09<44:40, 446.78s/it]  

german_bundesliga_2019_2020.json
https://www.bbc.com/sport/football/49865155
Calculating targets...
https://www.bbc.com/sport/football/49865162
Calculating targets...
https://www.bbc.com/sport/football/49783189
Calculating targets...
https://www.bbc.com/sport/football/49783161
Calculating targets...
https://www.bbc.com/sport/football/49702326
Calculating targets...
https://www.bbc.com/sport/football/50869516
Calculating targets...
https://www.bbc.com/sport/football/50830900
Calculating targets...
https://www.bbc.com/sport/football/50844804
Calculating targets...
https://www.bbc.com/sport/football/50829306
Calculating targets...
https://www.bbc.com/sport/football/50796810
Calculating targets...
https://www.bbc.com/sport/football/50796927
Calculating targets...
https://www.bbc.com/sport/football/50692979
Calculating targets...
https://www.bbc.com/sport/football/50692986
Calculating targets...
https://www.bbc.com/sport/football/50623425
Calculating targets...
https://www.bbc.com/sport/foo

 75%|███████▌  | 15/20 [2:33:30<29:35, 355.06s/it]

german_bundesliga_2018_2019.json
https://www.espn.com/soccer/report?gameId=517749
Calculating targets...
https://www.espn.com/soccer/report?gameId=517766
Calculating targets...
https://www.espn.com/soccer/report?gameId=517839
Calculating targets...
https://www.espn.com/soccer/report?gameId=517822
Calculating targets...
https://www.espn.com/soccer/report?gameId=517779
Calculating targets...
https://www.espn.com/soccer/report?gameId=517831
Calculating targets...
https://www.espn.com/soccer/report?gameId=517888
Calculating targets...
https://www.espn.com/soccer/report?gameId=517787
Calculating targets...
https://www.espn.com/soccer/report?gameId=517755
Calculating targets...
https://www.espn.com/soccer/report?gameId=517740
Calculating targets...
https://www.espn.com/soccer/report?gameId=517816
Calculating targets...
https://www.espn.com/soccer/report?gameId=517785
Calculating targets...
https://www.espn.com/soccer/report?gameId=517770
Calculating targets...
https://www.espn.com/soccer/rep

 80%|████████  | 16/20 [2:35:22<18:48, 282.08s/it]

spanish_la_liga_2017_2018.json
https://www.espn.com/soccer/report?gameId=490672
Calculating targets...
https://www.espn.com/soccer/report?gameId=490590
Calculating targets...
https://www.espn.com/soccer/report?gameId=490643
Calculating targets...
https://www.espn.com/soccer/report?gameId=490566
Calculating targets...
https://www.espn.com/soccer/report?gameId=490576
Calculating targets...
https://www.espn.com/soccer/report?gameId=490615
Calculating targets...
https://www.espn.com/soccer/report?gameId=490682
Calculating targets...
https://www.espn.com/soccer/report?gameId=490685
Calculating targets...
https://www.espn.com/soccer/report?gameId=490654
Calculating targets...
https://www.espn.com/soccer/report?gameId=490627
Calculating targets...
https://www.espn.com/soccer/report?gameId=490705
Calculating targets...
https://www.espn.com/soccer/report?gameId=490686
Calculating targets...
https://www.espn.com/soccer/report?gameId=490674
Calculating targets...
https://www.espn.com/soccer/repor

 85%|████████▌ | 17/20 [2:39:15<13:22, 267.55s/it]

spanish_la_liga_2018_2019.json
https://www.espn.com/soccer/report?gameId=521899
Calculating targets...
https://www.espn.com/soccer/report?gameId=521739
Calculating targets...
https://www.espn.com/soccer/report?gameId=521800
Calculating targets...
https://www.espn.com/soccer/report?gameId=521827
Calculating targets...
https://www.espn.com/soccer/report?gameId=521779
Calculating targets...
https://www.espn.com/soccer/report?gameId=521905
Calculating targets...
https://www.espn.com/soccer/report?gameId=521902
Calculating targets...
https://www.espn.com/soccer/report?gameId=521767
Calculating targets...
https://www.espn.com/soccer/report?gameId=521870
Calculating targets...
https://www.espn.com/soccer/report?gameId=521781
Calculating targets...
https://www.espn.com/soccer/report?gameId=521758
Calculating targets...
https://www.espn.com/soccer/report?gameId=521841
Calculating targets...
https://www.espn.com/soccer/report?gameId=521761
Calculating targets...
https://www.espn.com/soccer/repor

 90%|█████████ | 18/20 [2:42:34<08:13, 246.75s/it]

champions_league_2019_2020.json
https://www.bbc.com/sport/football/49732379
Calculating targets...
https://www.bbc.com/sport/football/49732414
Calculating targets...
https://www.bbc.com/sport/football/49732393
Calculating targets...
https://www.bbc.com/sport/football/49732407
Calculating targets...
https://www.bbc.com/sport/football/49731932
Calculating targets...
https://www.bbc.com/sport/football/49719155
Calculating targets...
https://www.bbc.com/sport/football/49719169
Calculating targets...
https://www.bbc.com/sport/football/49718948
Calculating targets...
https://www.bbc.com/sport/football/49719162
Calculating targets...
https://www.bbc.com/sport/football/51633790
Calculating targets...
https://www.bbc.com/sport/football/51634157
Calculating targets...
https://www.bbc.com/sport/football/51623058
Calculating targets...
https://www.bbc.com/sport/football/51623065
Calculating targets...
https://www.bbc.com/sport/football/51533280
Calculating targets...
https://www.bbc.com/sport/foot

 95%|█████████▌| 19/20 [2:47:09<04:15, 255.39s/it]

italian_serie_a_2018_2019.json
https://www.espn.com/soccer/report?gameId=522676
Calculating targets...
https://www.espn.com/soccer/report?gameId=522673
Calculating targets...
https://www.espn.com/soccer/report?gameId=522725
Calculating targets...
https://www.espn.com/soccer/report?gameId=522685
Calculating targets...
https://www.espn.com/soccer/report?gameId=522796
Calculating targets...
https://www.espn.com/soccer/report?gameId=522731
Calculating targets...
https://www.espn.com/soccer/report?gameId=522677
Calculating targets...
https://www.espn.com/soccer/report?gameId=522772
Calculating targets...
https://www.espn.com/soccer/report?gameId=522760
Calculating targets...
https://www.espn.com/soccer/report?gameId=522643
Calculating targets...
https://www.espn.com/soccer/report?gameId=522704
Calculating targets...
https://www.espn.com/soccer/report?gameId=522710
Calculating targets...
https://www.espn.com/soccer/report?gameId=522661
Calculating targets...
https://www.espn.com/soccer/repor

100%|██████████| 20/20 [2:50:33<00:00, 511.68s/it]


#### All together

In [16]:
ltr = LTRFeaturesTargets(target_metric=target_metric, 
                        key_events=key_events,
                        lags=lags,
                        metric_params=metric_params,
                        count_vec_kwargs=count_vec_kwargs,
                        drop_teams=drop_teams,
                        lemma=lemma)

Setting target metric to rouge


In [17]:
ltr.run_target_features()

Reading targets from /home/carlos/MasterDS/tfm/data/csv/summaries/ltr/targets/c868aa4c6d/targets.csv
Reading features from /home/carlos/MasterDS/tfm/data/csv/summaries/ltr/features/b8bcd377c1/features.csv
Writing to /home/carlos/MasterDS/tfm/data/csv/summaries/ltr/features_targets/341d2aa93d/features_targets.csv


In [13]:
ltr.file_path

'/home/carlos/MasterDS/tfm/data/csv/summaries/ltr/features_targets/c4f2c5790f/features_targets.csv'
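
The output paths above end in short hexadecimal directory names (`341d2aa93d` in cell 17, `c4f2c5790f` in cell 13). These look like hashes of the experiment configuration, and cell 13 was presumably run earlier with different parameters. The sketch below shows one way such a scheme could work. It is an assumption for illustration, not the actual logic inside `LTRFeaturesTargets`: the helper `experiment_dir`, the choice of MD5, and the example parameter values are all hypothetical.

```python
import hashlib
import json
import os


def experiment_dir(base_path, params, length=10):
    """Hypothetical helper: map a parameter dict to a stable cache directory."""
    # Sort the keys so the same parameters always serialize to the same string,
    # regardless of the order in which they were passed.
    serialized = json.dumps(params, sort_keys=True, default=str)
    # Keep only the first `length` hex characters as a short identifier.
    digest = hashlib.md5(serialized.encode("utf-8")).hexdigest()[:length]
    return os.path.join(base_path, digest)


# Illustrative parameter values only; the notebook's real values are set elsewhere.
params_a = {"target_metric": "rouge", "lemma": True, "lags": [1, 3, 5]}
params_b = {"lags": [1, 3, 5], "lemma": True, "target_metric": "rouge"}

dir_a = experiment_dir("features_targets", params_a)
dir_b = experiment_dir("features_targets", params_b)
dir_c = experiment_dir("features_targets", dict(params_a, lemma=False))

print(dir_a == dir_b)  # same parameters in a different order
print(dir_a == dir_c)  # one parameter changed
```

Under a scheme like this, rerunning with the same configuration reuses the cached CSV, while changing any parameter writes to a new directory. That would explain why `ltr.file_path` can point to a different folder than the one just written when the cells are executed out of order.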