

# Project

### Team:

- Sebastian Avendaño
- Felipe Urrutia

- \<Codalab usernames\>

- \<Codalab team name\>

### GitHub repository link: https://github.com/furrutiav/lab-mds-2022



## 0. Libraries Used

In [1]:
# Data loading and preparation

import pickle
import pandas as pd
import numpy as np
!pip install pyarrow

# EDA

import plotly.express as px



---
## 2. Data Preparation

#### Data Loading and Preparation

- Load the data with Pandas and merge on `id`.
- Drop the columns `'poster_path'`, `'backdrop_path'`, `'recommendations'`.
- Filter out examples with `revenue` equal to 0.
- Filter out examples with null `release_date` or `runtime`.
- Convert the `release_date` values to datetimes with `pd.to_datetime`.
- Keep only the examples whose `status` is `"Released"`.
- Fill null categorical and text values with `''`.
- Discretize `vote_average` into the following bins and store the result in the `label` column: 
  - (0, 5]: `'Negative'`
  - (5, 6]: `'Mixed'`
  - (6, 7]: `'Mostly Positive'`
  - (7, 8]: `'Positive'`
  - (8, 10]: `'Very Positive'`
- Drop the `vote_average` and `id` columns.
- Rename the `revenue` column to `target`.

Load the data with Pandas

In [2]:
# data: train_numerical_features.parquet and train_text_features.parquet
train_numerical_features = pd.read_parquet('train_numerical_features.parquet').set_index("id")
train_text_features = pd.read_parquet('train_text_features.parquet').set_index("id")

Merge on id

In [3]:
df = pd.concat([train_numerical_features, train_text_features], axis=1)

Drop duplicated columns

In [4]:
df = df.T.drop_duplicates().T
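
Transposing a frame with mixed column types casts every value to `object`, so the round trip through `.T` above hands back columns without their numeric dtypes. The following is a minimal sketch of one alternative, on toy data with hypothetical values: it builds a boolean mask of first occurrences and selects columns with it, so the original dtypes survive. It is not the notebook's method, only an illustration of the difference.

```python
import pandas as pd

# Toy frames (hypothetical values) that share a duplicated "title" column.
left = pd.DataFrame({"title": ["A", "B"], "budget": [1.0, 2.0]}, index=[10, 11])
right = pd.DataFrame({"title": ["A", "B"], "overview": ["x", "y"]}, index=[10, 11])
toy = pd.concat([left, right], axis=1)

# Idiom used in the notebook: every column comes back with object dtype.
round_trip = toy.T.drop_duplicates().T

# Alternative: mark repeated columns, then keep the others by position.
keep = ~toy.T.duplicated().to_numpy()
deduped = toy.loc[:, keep]

print(round_trip.dtypes)
print(deduped.dtypes)
```

Both versions keep the same columns; they differ only in the resulting dtypes.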

Drop the columns 'poster_path', 'backdrop_path', 'recommendations'

In [5]:
df = df.drop(columns=['poster_path', 'backdrop_path', 'recommendations'])

Filter out examples with revenue equal to 0.

In [6]:
df = df[df["revenue"]>0]

Filter out examples with null release_date or runtime (both counts below are 0, so no rows need to be removed).

In [7]:
df["release_date"].isna().sum()

0

In [8]:
df["runtime"].isna().sum()

0

Convert the release_date values to datetimes with pd.to_datetime

In [9]:
df["release_date"] = pd.to_datetime(df["release_date"])

Keep only the examples whose status is "Released"

In [10]:
df = df[df["status"] == "Released"]

Fill null categorical and text values with ''

In [11]:
df[train_text_features.columns.tolist()] = df[train_text_features.columns.tolist()].fillna("")

Discretize vote_average into the following bins and store the result in the label column:

    (0, 5]: 'Negative'
    (5, 6]: 'Mixed'
    (6, 7]: 'Mostly Positive'
    (7, 8]: 'Positive'
    (8, 10]: 'Very Positive'


In [12]:
df["label"] = pd.cut(df['vote_average'], 
       bins=[0,5,6,7,8,10], 
       labels=["Negative", "Mixed", "Mostly Positive", "Positive", "Very Positive"])
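
The intervals are open on the left, which matters at the edges. A small sketch on made-up scores shows how `pd.cut` assigns boundary values with these bins: a score of exactly 5 lands in `'Negative'`, and a score of exactly 0 falls outside `(0, 5]` and becomes `NaN`.

```python
import pandas as pd

# Made-up scores chosen to sit on or near the bin edges.
votes = pd.Series([0.0, 5.0, 5.1, 7.0, 8.5, 10.0])

labels = pd.cut(
    votes,
    bins=[0, 5, 6, 7, 8, 10],
    labels=["Negative", "Mixed", "Mostly Positive", "Positive", "Very Positive"],
)

# Right-closed bins: 5.0 -> Negative, 7.0 -> Mostly Positive, 0.0 -> NaN.
print(pd.DataFrame({"vote_average": votes, "label": labels}))
```

If a `vote_average` of exactly 0 should count as `'Negative'`, `pd.cut` accepts `include_lowest=True`, which closes the first interval on the left.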

Drop the vote_average and id columns

In [13]:
df = df.drop(columns="vote_average")
df = df.reset_index(drop=True)

Rename the revenue column to target

In [14]:
df = df.rename(columns={"revenue": "target"})

Load the clusters

In [15]:
# Use a context manager so the file handle is closed after loading
with open("clusters_BERT.pickle", "rb") as f:
    clusters_BERT = pickle.load(f)

In [16]:
df = df.merge(clusters_BERT, how="left", on="title")
df

Unnamed: 0,title,budget,target,runtime,status,tagline,credits,genres,original_language,overview,production_companies,release_date,keywords,label,clusters_keywords,clusters_overview,clusters_tagline
0,Fantastic Beasts: The Secrets of Dumbledore,200000000.0,400000000.0,142.0,Released,Return to the magic.,Jude Law-Eddie Redmayne-Mads Mikkelsen-Ezra Mi...,Fantasy-Adventure-Action,en,Professor Albus Dumbledore knows the powerful ...,Warner Bros. Pictures-Heyday Films,2022-04-06,magic-curse-fantasy world-wizard-magical creat...,Mostly Positive,8,14,11
1,Sonic the Hedgehog 2,110000000.0,393000000.0,122.0,Released,Welcome to the next level.,James Marsden-Ben Schwartz-Tika Sumpter-Natash...,Action-Adventure-Family-Comedy,en,After settling in Green Hills Sonic is eager t...,Original Film-Blur Studio-Marza Animation Plan...,2022-03-30,sequel-based on video game-hedgehog-live actio...,Positive,7,14,11
2,The Lost City,74000000.0,164289828.0,112.0,Released,The adventure is real. The heroes are not.,Sandra Bullock-Channing Tatum-Daniel Radcliffe...,Action-Adventure-Comedy,en,A reclusive romance novelist was sure nothing ...,Paramount-Fortis Films-3dot Productions-Exhibi...,2022-03-24,duringcreditsstinger,Mostly Positive,10,13,13
3,Morbius,75000000.0,161000000.0,105.0,Released,A new Marvel legend arrives.,Jared Leto-Matt Smith-Adria Arjona-Jared Harri...,Action-Science Fiction-Fantasy,en,Dangerously ill with a rare blood disorder and...,Columbia Pictures-Avi Arad Productions-Matt To...,2022-03-30,vampire-based on comic,Mostly Positive,7,10,4
4,Uncharted,120000000.0,400780000.0,116.0,Released,Fortune favors the bold.,Tom Holland-Mark Wahlberg-Sophia Ali-Tati Gabr...,Action-Adventure,en,A young street-smart Nathan Drake and his wise...,Columbia Pictures-Atlas Entertainment-PlayStat...,2022-02-10,treasure-treasure hunt-based on video game-dlb,Positive,14,14,6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6446,Amistad,36000000.0,44229441.0,155.0,Released,Freedom is not given. It is our right at birth...,Morgan Freeman-Nigel Hawthorne-Anthony Hopkins...,Drama-History-Mystery,en,In 1839 the slave ship Amistad set sail from C...,DreamWorks Pictures,1997-12-10,cuba-mutiny-slavery-sentence-historical figure...,Mostly Positive,13,0,9
6447,Home Alone,18000000.0,476684675.0,103.0,Released,A family comedy without the family.,Macaulay Culkin-Joe Pesci-Daniel Stern-John He...,Comedy-Family,en,Eight-year-old Kevin McCallister makes the mos...,Hughes Entertainment-20th Century Fox,1990-11-16,holiday-burglar-slapstick-little boy-family re...,Positive,9,7,13
6448,Ip Man: The Final Fight,0.0,3967001.0,100.0,Released,,Anthony Wong-Anita Yuen-Gillian Chung-Jordan C...,Action-Drama,cn,In postwar Hong Kong legendary Wing Chun grand...,Emperor Motion Pictures-Cinemasia-National Art...,2013-03-22,biography,Mostly Positive,5,1,-1
6449,A Rainy Day in New York,25000000.0,23800000.0,92.0,Released,Love In Spring.,Timothée Chalamet-Elle Fanning-Selena Gomez-Ju...,Comedy-Romance,en,Two young people arrive in New York to spend a...,Gravier Productions-Perdido Productions-FilmNa...,2019-07-26,new york city,Mostly Positive,5,9,13
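
A left merge on `title` keeps the row count only if each title appears at most once on the right-hand side. Whether `clusters_BERT` satisfies that is not shown here, so the following is a sketch on hypothetical toy frames of how a repeated key multiplies rows, and how the `validate` argument of `merge` can surface the problem instead of silently duplicating data.

```python
import pandas as pd

# Hypothetical data: title "A" appears twice in the clusters table.
movies = pd.DataFrame({"title": ["A", "B"], "target": [1.0, 2.0]})
clusters = pd.DataFrame({"title": ["A", "A", "B"], "clusters_overview": [3, 4, 5]})

# Without validation the merge succeeds but returns more rows than `movies` has.
merged = movies.merge(clusters, how="left", on="title")
print(len(movies), len(merged))

# With validate="many_to_one", pandas raises if the right-hand keys are not unique.
try:
    movies.merge(clusters, how="left", on="title", validate="many_to_one")
    duplicates_detected = False
except pd.errors.MergeError:
    duplicates_detected = True
print(duplicates_detected)
```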


---

## 3. Preprocessing, Holdout and Feature Engineering

#### ColumnTransformer and Holdout

*This section consists of building the steps that prepare your data so that you can later create your model.*

Build a ColumnTransformer that:

- Preprocesses categorical and ordinal data.
- Scales/standardizes numerical data.
- Encodes text.

Then test the transformations using `fit_transform` and `get_feature_names_out`.

Next, run a holdout split that will later let you evaluate the models. **Remember to remove the targets and labels from the dataset before splitting it**.

In [17]:
## Holdout code
from sklearn.model_selection import train_test_split

In [18]:
X_clf, y_clf = df.drop(columns=["target", "label"]), df["label"]
X_reg, y_reg = df.drop(columns=["target", "label"]), df["target"]

In [19]:
X_clf_train, X_clf_test, y_clf_train, y_clf_test = train_test_split(X_clf, y_clf, test_size=0.20, random_state=23)
X_reg_train, X_reg_test, y_reg_train, y_reg_test = train_test_split(X_reg, y_reg, test_size=0.20, random_state=23)

In [20]:
import nltk
nltk.download('stopwords')
nltk.download('punkt')
from nltk.corpus import stopwords
from nltk import word_tokenize  
from nltk.stem import PorterStemmer

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Sebastian\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Sebastian\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [21]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, StandardScaler, OneHotEncoder, FunctionTransformer

In [22]:
## ColumnTransformer code
atributos_minmax = [
    "budget",
    "release_date_month",
    "release_date_day",
    "num_top_production_companies",
    "num_top_artists",
    "ratio(runtime, budget)",
    "num_artists",
    "num_genres",
    "num_production_companies",
    "prop(num_top_production_companies)",
    "prop(num_top_artists)"
]
atributos_st = [
    "runtime",
    "release_date_timestamp",
]

atributos_onehot = [
    "original_language",
    "clusters_keywords",
    "clusters_overview",
    "clusters_tagline"
]

stop_words = stopwords.words('english')

# Define a tokenizer with stemming
class StemmerTokenizer:
    def __init__(self):
        self.ps = PorterStemmer()
    def __call__(self, doc):
        doc_tok = word_tokenize(doc.lower())
        doc_tok = [t for t in doc_tok if t not in stop_words]
        return [self.ps.stem(t) for t in doc_tok]
    
class SplitTokenizer:
    def __init__(self, char="-", col=""):
        self.char = char
        self.col = col
    def __call__(self, doc):
        if self.col == "production_companies":
            return doc.replace(" ", "_").replace("Metro-Goldwyn-Mayer", "MGM").split(self.char)
        elif self.col == "credits":
            tokens = doc.split("-")
            real_tokens = []
            for i, tk in enumerate(tokens[:-1]):
                if len(tokens[i+1].split()) == 1: tk = f"{tk} {tokens[i+1]}"
                if len(tk.split())>1: real_tokens.append(tk)
            return real_tokens
        else:
            return doc.replace(" ", "_").split(self.char)

ct = ColumnTransformer([
    ("OneHot", 
     OneHotEncoder(handle_unknown="ignore"), 
     atributos_onehot
    ),
    ("MinMax", 
     MinMaxScaler(), 
     atributos_minmax
    ),
    ("Standard", 
     StandardScaler(), 
     atributos_st
    ),
    ("BOW1", 
     CountVectorizer(tokenizer= StemmerTokenizer()), 
     "overview"
    ),
    ("OneHot_split_genres", 
     CountVectorizer(tokenizer= SplitTokenizer()), 
     "genres"
    ),
    ("BOW2", 
     CountVectorizer(tokenizer= StemmerTokenizer()), 
     "tagline"
    ),
    ("OneHot_split_credits", 
     CountVectorizer(tokenizer= SplitTokenizer(col="credits")), 
     "credits"
    ),
    ("OneHot_split_production_companies", 
     CountVectorizer(tokenizer= SplitTokenizer(col="production_companies")), 
     "production_companies"
    ),
    ("OneHot_split_keywords", 
     CountVectorizer(tokenizer= SplitTokenizer()), 
     "keywords"
    ),
])

ct_wo_bow = ColumnTransformer([
    ("OneHot", 
     OneHotEncoder(handle_unknown="ignore"), 
     atributos_onehot
    ),
    ("MinMax", 
     MinMaxScaler(), 
     atributos_minmax
    ),
    ("Standard", 
     StandardScaler(), 
     atributos_st
    ),
    ("OneHot_split_genres", 
     CountVectorizer(tokenizer= SplitTokenizer()), 
     "genres"
    ),
    ("OneHot_split_credits", 
     CountVectorizer(tokenizer= SplitTokenizer(col="credits")), 
     "credits"
    ),
    ("OneHot_split_production_companies", 
     CountVectorizer(tokenizer= SplitTokenizer(col="production_companies")), 
     "production_companies"
    ),
    ("OneHot_split_keywords", 
     CountVectorizer(tokenizer= SplitTokenizer()), 
     "keywords"
    ),
])
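
The `credits` branch of `SplitTokenizer` is the least obvious part of the cell above, so here is a sketch that isolates it. `split_credits` copies that branch line for line; `split_credits_keep_last` is a hypothetical variant, not part of the notebook, that also emits the final name. The sample string is made up.

```python
def split_credits(doc):
    """Same logic as the `credits` branch of SplitTokenizer."""
    tokens = doc.split("-")
    real_tokens = []
    for i, tk in enumerate(tokens[:-1]):
        # A one-word next token is treated as the second half of a hyphenated surname.
        if len(tokens[i + 1].split()) == 1:
            tk = f"{tk} {tokens[i + 1]}"
        if len(tk.split()) > 1:
            real_tokens.append(tk)
    return real_tokens


def split_credits_keep_last(doc):
    """Hypothetical variant that also visits the last token."""
    tokens = doc.split("-")
    names = []
    for i, tk in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if len(nxt.split()) == 1:
            tk = f"{tk} {nxt}"
        if len(tk.split()) > 1:
            names.append(tk)
    return names


sample = "Catherine Zeta-Jones-Tom Hanks-Meg Ryan"
print(split_credits(sample))
print(split_credits_keep_last(sample))
```

On this sample the original logic rejoins the hyphenated surname but stops before the last entry, so the final name never becomes a token. Whether that is intended is not stated in the notebook; switching to the variant would change the fitted vocabulary and therefore the reported scores.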

#### Feature Engineering

Additionally, you can create a new transformation that generates new features and is applied before the ColumnTransformer inside the model pipeline. Look into [`FunctionTransformer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.FunctionTransformer.html) to see how to implement a transformation from a function that takes a dataframe and returns a different one.

- Cyclically encode the months/days of the release dates.
- Count how many times certain famous people appear in the movies.
- Indicate whether the movie comes from a famous production company.
- Group keywords into more general categories.
- Generate ratios from the numerical variables in the dataset (such as movie runtime / budget).
- Count the similar genres a movie has.
- Extract vectors from the movie overviews.
- Count the number of actors/production companies/genres.
- Etc... Use your creativity!

Again, remember not to use the targets or the labels to generate new features.

Note: This last step is not required, but it can catapult you to the top of the competition leaderboards.
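
The first suggestion in the list, cyclic encoding, can be sketched with a `FunctionTransformer` as follows. The function name, the new column names and the toy dates are assumptions made for illustration; the feature extractor defined in this notebook uses the raw month and day instead.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer


def add_cyclic_month(X):
    """Map the release month onto a circle so December sits next to January."""
    out = X.copy()
    angle = 2 * np.pi * (out["release_date"].dt.month - 1) / 12
    out["release_month_sin"] = np.sin(angle)
    out["release_month_cos"] = np.cos(angle)
    return out


cyclic_transformer = FunctionTransformer(add_cyclic_month)

# Toy dates: January, April and December.
toy = pd.DataFrame(
    {"release_date": pd.to_datetime(["2022-01-15", "2022-04-06", "2022-12-25"])}
)
encoded = cyclic_transformer.fit_transform(toy)
print(encoded)
```

With the raw month number, December (12) and January (1) are as far apart as possible; with the sine/cosine pair they are neighbours, which is the point of the encoding.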

In [23]:
from sklearn.preprocessing import FunctionTransformer

In [24]:
def feature_extractor(X):
    movies = X.copy()
    movies["release_date_timestamp"] = movies['release_date'].apply(lambda x: x.timestamp())
    movies["release_date_month"] = movies['release_date'].dt.month
    movies["release_date_day"] = movies['release_date'].dt.day
    
    top_companies = ['Warner Bros. Pictures', 'Universal Pictures', 'Columbia Pictures',
       'Paramount', '20th Century Fox', 'Canal+', 'New Line Cinema',
       'Metro-Goldwyn-Mayer', 'Lionsgate', 'Relativity Media',
       'StudioCanal', 'Touchstone Pictures', 'Walt Disney Pictures',
       'DreamWorks Pictures', 'Miramax']
    
    movies["num_top_production_companies"] = movies["production_companies"].apply(
        lambda x: 
        sum(
            [int(comp in str(x)) for comp in top_companies]
        )
    )
    
    top_artists = ['Samuel L. Jackson', 'Frank Welker', 'Nicolas Cage',
       'Bruce Willis', 'Robert De Niro', 'Matt Damon', 'Liam Neeson',
       'Willem Dafoe', 'Morgan Freeman', 'J.K. Simmons', 'Steve Buscemi',
       'Johnny Depp', 'John Goodman', 'Paul Giamatti', 'Stanley Tucci',
       'Woody Harrelson', 'Brad Pitt', 'Mickie McGowan', 'John Leguizamo',
       'Robin Williams', 'Sylvester Stallone', 'Tom Hanks',
       'Michael Papajohn', 'Nicole Kidman', 'Thomas Rosales Jr.',
       'James Franco', 'Harrison Ford', 'Ben Affleck', 'Owen Wilson',
       'Stephen Root', 'Julianne Moore', 'Ben Kingsley',
       'Antonio Banderas', 'Anthony Hopkins', 'Alec Baldwin',
       'Joe Chrest', 'Bill Hader', 'Richard Jenkins', 'John C. Reilly',
       'Bill Murray', 'John Hurt', 'Elizabeth Banks', 'Michael Caine',
       'Ewan McGregor', 'Keith David', 'Susan Sarandon',
       'Fred Tatasciore', 'Bob Bergen', 'Scarlett Johansson',
       'Keanu Reeves']
    movies["num_top_artists"] = movies["credits"].apply(
        lambda x: 
        sum(
            [int(art in str(x).replace("-", " ")) for art in top_artists]
        )
    )
    movies["ratio(runtime, budget)"] = (movies["runtime"]/(1+movies["budget"]))
    # Count the number of artists/production companies/genres.
    movies["num_artists"] = movies["credits"].apply(
        lambda x: len(str(x).split("-"))
    )
    movies["num_genres"] = movies["genres"].apply(
        lambda x: len(str(x).split("-"))
    )
    movies["num_production_companies"] = movies["production_companies"].apply(
        lambda x: len(str(x).split("-"))
    )
    movies["prop(num_top_production_companies)"] = movies["num_top_production_companies"]/(1+movies["num_production_companies"])
    movies["prop(num_top_artists)"] = movies["num_top_artists"]/(1+movies["num_artists"])
    return movies

feature_tranformer = FunctionTransformer(feature_extractor)
feature_tranformer

FunctionTransformer(func=<function feature_extractor at 0x000001D10F100B80>)

```
Comments
```

---

## 4. Regression

### 4.1 Dummy and Baseline

In [25]:
from sklearn.dummy import DummyRegressor

In [26]:
## Dummy code
dummy_reg = DummyRegressor(strategy="mean")
dummy_reg.fit(X_reg_train, y_reg_train)

y_pred_dummy = dummy_reg.predict(X_reg_test)

In [27]:
from sklearn import linear_model
from sklearn.pipeline import Pipeline

In [28]:
## Baseline regressor code: LASSO
pipe_reg_baseline = Pipeline(
    steps=[
        ("FunctionTransformer", feature_tranformer),
        ("ColumnTransformer", ct),
        ("reg", linear_model.Lasso(alpha=0.1))
    ]
)

In [29]:
%%time
pipe_reg_baseline.fit(X_reg_train, y_reg_train)

CPU times: total: 12.7 s
Wall time: 12.3 s


Pipeline(steps=[('FunctionTransformer',
                 FunctionTransformer(func=<function feature_extractor at 0x000001D10F100B80>)),
                ('ColumnTransformer',
                 ColumnTransformer(transformers=[('OneHot',
                                                  OneHotEncoder(handle_unknown='ignore'),
                                                  ['original_language',
                                                   'clusters_keywords',
                                                   'clusters_overview',
                                                   'clusters_tagline']),
                                                 ('MinMax', MinMaxScaler(),
                                                  ['budget',
                                                   'release_date_mon...
                                                  CountVectorizer(tokenizer=<__main__.SplitTokenizer object at 0x000001D10F106E20>),
                                             

In [30]:
%%time 
y_reg_baseline = pipe_reg_baseline.predict(X_reg_test)

CPU times: total: 2.22 s
Wall time: 1.35 s


In [31]:
from sklearn.metrics import r2_score

In [32]:
## Metric comparison code
print(
    "Regresor Dummy\n"+str(r2_score(y_reg_test, y_pred_dummy))
)

Regresor Dummy
-0.0003707986340903968


In [33]:
print(
    "Regresor baseline: Lasso\n"+str(r2_score(y_reg_test, y_reg_baseline))
)

Regresor baseline: Lasso
0.11560134261108834
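
The slightly negative score of the dummy regressor follows from the definition $R^2 = 1 - SS_{res}/SS_{tot}$: predicting the training mean gives exactly 0 only when the training and test means coincide, and something below 0 otherwise. A worked sketch on made-up numbers, where the two means differ on purpose:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import r2_score

# Made-up targets: the training mean is 2.5 and the test mean is 4.0.
y_train = np.array([1.0, 2.0, 3.0, 4.0])
y_test = np.array([2.0, 4.0, 6.0])

# The features are ignored by the dummy model, so zeros are enough.
dummy = DummyRegressor(strategy="mean").fit(np.zeros((4, 1)), y_train)
pred = dummy.predict(np.zeros((3, 1)))

ss_res = np.sum((y_test - pred) ** 2)           # 0.25 + 2.25 + 12.25 = 14.75
ss_tot = np.sum((y_test - y_test.mean()) ** 2)  # 4 + 0 + 4 = 8
manual_r2 = 1 - ss_res / ss_tot                 # 1 - 14.75 / 8 = -0.84375

score = r2_score(y_test, pred)
print(manual_r2, score)
```

In the notebook the training and test means are nearly equal, which is consistent with a dummy score just below zero.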


```
Justification
```

---

### 4.2 Search for the Best Regression Model


In [44]:
from sklearn import svm
from sklearn.ensemble import BaggingRegressor
from sklearn.feature_selection import SelectPercentile, f_classif, chi2, SelectKBest

In [36]:
def test_pipe_reg(model_class, params, params_selector):
    model = model_class(**params)
    pipe = Pipeline(
        steps=[
            ("FunctionTransformer", feature_tranformer),
            ("ColumnTransformer", ct),
            ("selector", SelectPercentile(f_classif, **params_selector)),
            ('reg', model)
        ]
     )
    pipe.fit(X_reg_train, y_reg_train)
    y_pred = pipe.predict(X_reg_test)
    print(
    f"Clasificador: {model_class.__name__}\n"+str(r2_score(y_reg_test, y_pred)))
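
Note that `f_classif` is the ANOVA F-test meant for classification targets: it groups the samples by each distinct value of `y`, which for a continuous revenue column means nearly one group per movie. The `f = msb / msw` fragment printed after the runs in this section appears to be the tail of a warning raised by that computation. For a continuous target, scikit-learn provides `f_regression`. The sketch below uses synthetic data and a smaller percentile than the notebook, purely to show the selector picking out the informative columns; it is not a re-run of the experiments here.

```python
import numpy as np
from sklearn.feature_selection import SelectPercentile, f_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

# Synthetic data: only the first two of twenty features drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

pipe = Pipeline(
    steps=[
        # f_regression scores each feature by its linear relation to y.
        ("selector", SelectPercentile(f_regression, percentile=10)),
        ("reg", Ridge(alpha=0.5)),
    ]
)
pipe.fit(X, y)

selected = pipe.named_steps["selector"].get_support(indices=True)
print(selected)
```

Swapping the score function in `test_pipe_reg` would change which features are kept, so the scores reported in this section would need to be recomputed.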

In [104]:
%%time
test_pipe_reg(
    linear_model.Lasso,
    params={
        "alpha":0.5
    },
    params_selector={'percentile': 95}
)

  f = msb / msw


Clasificador: Lasso
0.11957285913143234
CPU times: total: 47.3 s
Wall time: 45.8 s


In [103]:
%%time
test_pipe_reg(
    linear_model.Ridge,
    params={
        "alpha":0.5,
        'tol':1e-5
    },
    params_selector={'percentile': 95}
)

  f = msb / msw


Clasificador: Ridge
0.5371698205340906
CPU times: total: 26.3 s
Wall time: 23.9 s


In [None]:
%%time
# in the grid search, use this instead of the two above, with l1_ratio=1 for Lasso
# or l1_ratio=0 for Ridge
test_pipe_reg( 
    linear_model.ElasticNet,
    params={
        "alpha":0.5,
        'tol':1e-4,
        'l1_ratio':0
    },
    params_selector={'percentile': 95}
)

In [46]:
%%time
test_pipe_reg( 
    svm.SVR,
    params={
        'kernel':'rbf', #rbf(default), linear, poly
        'C':100,
        'gamma': 'auto',
        'degree': 3,
        'epsilon': 0.1,
        'coef0': 1
    },
    params_selector={'percentile': 95}
)

  f = msb / msw


Clasificador: SVR
-0.10362704355148145
CPU times: total: 42.2 s
Wall time: 42.2 s


In [47]:
%%time
test_pipe_reg( 
    BaggingRegressor,
    params={
        'base_estimator': svm.SVR(),
        'n_estimators': 10,
    },
    params_selector={'percentile': 95}
)

  f = msb / msw


Clasificador: BaggingRegressor
-0.10287810808425646
CPU times: total: 1min 53s
Wall time: 1min 53s


In [None]:
### GridSearch code
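
This cell is still empty, so what follows is only a sketch of the shape such a search could take, on synthetic data. Pipeline hyperparameters are addressed as `<step name>__<parameter>`; the step names match the ones used in `test_pipe_reg`, while the grid values, the choice of `Ridge` and the use of `f_regression` are assumptions for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectPercentile, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic regression problem standing in for the movie features.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=120)

pipe = Pipeline(
    steps=[
        ("selector", SelectPercentile(f_regression)),
        ("reg", Ridge()),
    ]
)

# Hypothetical grid: keys follow the "<step>__<parameter>" convention.
param_grid = {
    "selector__percentile": [50, 95],
    "reg__alpha": [0.1, 0.5, 1.0],
}

search = GridSearchCV(pipe, param_grid, scoring="r2", cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

For the real data the pipeline would also start with the `FunctionTransformer` and `ColumnTransformer` steps, and the search would be fitted on the training split only.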

In [None]:
### Competition data prediction code goes here

---

## 5. Regression Conclusions

Conclusions...



---

<br>

### Appendix: Generating the Competition Submission File

To upload your results to the CodaLab page, use the `generateFiles` function provided below. You must generate files that strictly follow the CodaLab format; otherwise the results will not show up on the competition page.

The results of your classification and regression models are saved in a zip file containing `predictions_clf.txt` for classification and `predictions_rgr.txt` for regression. As mentioned above, the results must be obtained from the `test.pickle` dataset, and each line must contain one prediction.

Example files:

- [ ] `predictions_clf.txt`

        Mostly Positive
        Mostly Positive
        Negative
        Positive
        Negative
        Positive
        ...

- [ ] `predictions_rgr.txt`

        16103.58
        16103.58
        16041.89
        9328.62
        107976.03
        194374.08
        ...



In [None]:
from zipfile import ZipFile
import os

def generateFiles(predict_data, clf_pipe, rgr_pipe):
    """Generate the files to upload to CodaLab.

    Input
    predict_data: DataFrame with the input data to predict
    clf_pipe: classification pipeline
    rgr_pipe: regression pipeline

    Output
    predictions.zip containing predictions_clf.txt and predictions_rgr.txt
    """
    y_pred_clf = clf_pipe.predict(predict_data)
    y_pred_rgr = rgr_pipe.predict(predict_data)
    
    with open('./predictions_clf.txt', 'w') as f:
        for item in y_pred_clf:
            f.write("%s\n" % item)

    with open('./predictions_rgr.txt', 'w') as f:
        for item in y_pred_rgr:
            f.write("%s\n" % item)

    with ZipFile('predictions.zip', 'w') as zipObj2:
        zipObj2.write('predictions_rgr.txt')
        zipObj2.write('predictions_clf.txt')

    os.remove("predictions_rgr.txt")
    os.remove("predictions_clf.txt")

In [None]:
# Run the function to generate the predictions file.
# predict_data must hold the data loaded from test.pickle,
# while clf_pipe and rgr_pipe are the classification and
# regression pipelines, respectively.
generateFiles(predict_data, clf_pipe, rgr_pipe)
