## 1. Problem statement

The City of Los Angeles faces a major hiring challenge: one third of its 50,000 workers are eligible to retire by July 2020. The city has partnered with Kaggle to create a competition to improve the job bulletins that will fill all those open positions.

The content, tone, and format of job bulletins can influence the quality of the applicant pool. Overly specific job requirements may discourage diversity. The Los Angeles Mayor’s Office wants to reimagine the city’s job bulletins by using text analysis to identify needed improvements.

The goal is to convert a folder full of plain-text job postings into a single structured CSV file and then to use this data to:
* (1) Identify language that can negatively bias the pool of applicants ;
* (2) Improve the diversity and quality of the applicant pool ; and/or
* (3) Make it easier to determine which promotions are available to employees in each job class.

## 2. Approach

In this kernel we focus on parts (2) and (3) of the problem statement. We present a keyword-based approach, which consists of two parts:
* **Part 1: A keyword-based recommender system**

We design a recommender system that finds related vacancies. This facilitates the search for promotions up the hierarchical path, as well as opportunities for horizontal movement or job rotation ;
* **Part 2: Automated keyword assignment**

As Part 1 illustrates the need for well-chosen keywords, we develop a model that automates keyword assignment, as a first step towards improving the visibility and searchability of the vacancies.

### References

[1] Es Shahul (2019) Discovering opportunities at LA (https://www.kaggle.com/shahules/discovering-opportunities-at-la)

* As we decided to focus on the solution itself, we made use of the code created by Shahul Es to read and preprocess the job vacancies into a single dataframe. Shahul Es has done a tremendous job in processing these files, so we are very grateful to be able to use this code.

[2] Jobscan (2018) Top 500 Resume Keywords: Examples for Your Job Search (https://www.jobscan.co/blog/top-resume-keywords-boost-resume/)

* We used a list of the 500 most frequently occurring job related keywords as a starting point to create the recommender system and train the keyword assignment model.



## 3. Part 1: A keyword-based recommender system

Importing packages and data

In [None]:
import numpy as np
import pandas as pd
import re
import os
import sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import chi2, SelectKBest
from sklearn.base import BaseEstimator, TransformerMixin
from datetime import datetime
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()
plt.style.use('ggplot')
import plotly.offline as py
import plotly.graph_objs as go
import plotly.tools as tls
py.init_notebook_mode(connected=True)
import copy
from scipy import sparse
from itertools import combinations
from warnings import warn

datadir=r"../input/data-science-for-good-city-of-los-angeles/cityofla/CityofLA"
keywords_file = r"../input/top-500-resume-keywords/resume_keywords_clean.txt"


Read and preprocess the vacancies (this code section is written by [1])

In [None]:
# import textstat
files = [dir for dir in os.walk(datadir)]
bulletins = os.listdir(datadir + "/Job Bulletins/")
additional = os.listdir(datadir + "/Additional data/")

# Collect the CSV files shipped with the additional data
csvfiles = []
for file in additional:
    if file.endswith('.csv'):
        csvfiles.append(datadir + "/Additional data/" + file)

job_title = pd.read_csv(csvfiles[0])
sample_job = pd.read_csv(csvfiles[1])
kaggle_data = pd.read_csv(csvfiles[2])
job_title.head()
print("There are %d rows and %d cols in job_title file" % (job_title.shape))
print("There are %d rows and %d cols in sample_job file" % (sample_job.shape))
print("There are %d rows and %d cols in kaggle_data file" % (kaggle_data.shape))
print("There are %d text files in bulletin directory" % len(bulletins))

def get_headings(bulletin):
    with open(datadir + "/Job Bulletins/" + bulletins[bulletin]) as f:  ##reading text files
        data = f.read().replace('\t', '').split('\n')
        data = [head for head in data if head.isupper()]
        return data

def clean_text(bulletin):
    with open(datadir + "/Job Bulletins/" + bulletins[bulletin]) as f:
        data = f.read().replace('\t', '').replace('\n', '')
        return data

get_headings(1)
get_headings(2)

def to_dataframe(num, df):
    opendate = re.compile(r'(Open [D,d]ate:)(\s+)(\d\d-\d\d-\d\d)')  # match open date
    salary = re.compile(r'\$(\d+,\d+)((\s(to|and)\s)(\$\d+,\d+))?')  # match salary
    requirements = re.compile(r'(REQUIREMENTS?/\s?MINIMUM QUALIFICATIONS?)(.*)(PROCESS NOTE)')  # match requirements

    for no in range(0, num):
        with open(datadir + "/Job Bulletins/" + bulletins[no],
                  encoding="ISO-8859-1") as f:  # reading files
            try:
                file = f.read().replace('\t', '')
                data = file.replace('\n', '')
                headings = [heading for heading in file.split('\n') if heading.isupper()]  ##getting heading from job bulletin

                sal = re.search(salary, data)
                date = datetime.strptime(re.search(opendate, data).group(3), '%m-%d-%y')
                try:
                    req = re.search(requirements, data).group(2)
                except Exception as e:
                    req = re.search('(.*)NOTES?', re.findall(r'(REQUIREMENTS?)(.*)(NOTES?)',
                                                             data)[0][1][:1200]).group(1)

                duties = re.search(r'(DUTIES)(.*)(REQ[A-Z])', data).group(2)
                try:
                    enddate = re.search(
                        r'(JANUARY|FEBRUARY|MARCH|APRIL|MAY|JUNE|JULY|AUGUST|SEPTEMBER|OCTOBER|NOVEMBER|DECEMBER)\s(\d{1,2},\s\d{4})'
                        , data).group()
                except Exception as e:
                    enddate = np.nan

                selection = [z[0] for z in re.findall(r'([A-Z][a-z]+)((\s\.\s)+)', data)]  ##match selection criteria (raw string avoids invalid escape warnings)

                # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
                row = {'File Name': bulletins[no], 'Position': headings[0].lower(), 'salary_start': sal.group(1),
                       'salary_end': sal.group(5), "opendate": date, "requirements": req, 'duties': duties,
                       'deadline': enddate, 'selection': selection}
                df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)

                reg = re.compile(
                    r'(One|Two|Three|Four|Five|Six|Seven|Eight|Nine|Ten|one|two|three|four)\s(years?)\s(of\sfull(-|\s)time)')
                df['EXPERIENCE_LENGTH'] = df['requirements'].apply(
                    lambda x: re.search(reg, x).group(1) if re.search(reg, x) is not None else np.nan)
                df['FULL_TIME_PART_TIME'] = df['EXPERIENCE_LENGTH'].apply(lambda x: 'FULL_TIME' if x is not np.nan else np.nan)

                reg = re.compile(r'(One|Two|Three|Four|Five|Six|Seven|Eight|Nine|Ten|one|two|three|four)(\s|-)(years?)\s(college)')
                df['EDUCATION_YEARS'] = df['requirements'].apply(
                    lambda x: re.search(reg, x).group(1) if re.search(reg, x) is not None else np.nan)
                df['SCHOOL_TYPE'] = df['EDUCATION_YEARS'].apply(lambda x: 'College or University' if x is not np.nan else np.nan)

            except Exception as e:
                # Skip bulletins that do not match the expected layout
                pass
                #print('Failed to read file ' + bulletins[no])

    return df

df = pd.DataFrame(
    columns=['File Name', 'Position', 'salary_start', 'salary_end', 'opendate', 'requirements', 'duties', 'deadline'])
df = to_dataframe(len(bulletins), df)
df.replace(to_replace=[None], value="N/A", inplace=True)
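
To make the patterns in `to_dataframe` easier to follow, here is a minimal sketch that applies the open-date and salary expressions to a made-up bulletin snippet. The sample strings are hypothetical and not taken from the dataset.

```python
import re
from datetime import datetime

# Same patterns as in to_dataframe above
opendate = re.compile(r'(Open [D,d]ate:)(\s+)(\d\d-\d\d-\d\d)')
salary = re.compile(r'\$(\d+,\d+)((\s(to|and)\s)(\$\d+,\d+))?')

# Hypothetical bulletin text, for illustration only
sample = "Open Date: 03-16-18 ANNUAL SALARY $49,903 to $72,996"

date = datetime.strptime(opendate.search(sample).group(3), '%m-%d-%y')
sal = salary.search(sample)

salary_start = sal.group(1)  # lower bound, without the dollar sign
salary_end = sal.group(5)    # upper bound, dollar sign included

# A flat-rate salary has no upper bound, so group 5 is None
flat = salary.search("ANNUAL SALARY $60,000")
```

Because group 5 is `None` for flat-rate salaries, the `df.replace(to_replace=[None], value="N/A", ...)` call above is what turns those missing upper bounds into "N/A".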

We start our analysis by processing the job duties using the TfidfVectorizer method, with the external keyword list of 500 items ([2]) as input dictionary. Basically this algorithm creates a big matrix which lists for every keyword how well it fits with the individual vacancies.

In [None]:
field="duties"
df_field=df[field].to_frame()

resume_keywords = list(pd.read_csv(keywords_file, header=None)[0].values)
vocab_keywords = dict(zip(resume_keywords, np.arange(len(resume_keywords))))

#tf-idf: count word frequencies
tfidf = TfidfVectorizer(vocabulary=vocab_keywords, ngram_range=(1, 4))  # ngram_range is passed as a tuple

# Apply fit_transform to document: csr_mat
csr_mat = tfidf.fit_transform(df_field[field])
words = tfidf.get_feature_names() #These are the words which the TfidfVectorizer detected, in our case this is simply the vocabulary we provided.
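
With a fixed vocabulary, only the listed keywords become columns of the matrix, and a multi-word keyword can only match if the n-gram range covers its length. The sketch below illustrates this with plain counts instead of TF-IDF weights. It is a simplified stand-in for `TfidfVectorizer`, not its implementation, and the keywords and duty text are made up.

```python
import re

def ngrams(tokens, max_n):
    """Return all word n-grams of length 1..max_n as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def count_keywords(text, vocabulary, max_n):
    """Count how often each vocabulary entry occurs among the n-grams of text."""
    # Tokens of two or more word characters, like scikit-learn's default token pattern
    tokens = re.findall(r"\b\w\w+\b", text.lower())
    grams = ngrams(tokens, max_n)
    return {keyword: grams.count(keyword) for keyword in vocabulary}

vocabulary = ["budget", "project management", "safety"]  # hypothetical keywords
duties = "Oversees project management and budget planning; reviews the budget."  # made-up text

unigrams_only = count_keywords(duties, vocabulary, max_n=1)
up_to_bigrams = count_keywords(duties, vocabulary, max_n=2)
```

With `max_n=1` the two-word keyword is never found, which is why the notebook uses an n-gram range wide enough for the longest keywords in the list.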

Let's find the top 5 related jobs for an example vacancy. As an example we list below:
* Job title
* Job duties
* Salary range

The salary range is a major criterion to find opportunities for promotions. However, we decided not to implement salary as a fixed criterion into the algorithm, since employees may also be interested in horizontal promotion/job rotation rather than moving upwards in the existing hierarchical path. Simply listing the salary range also provides a first indication.

In [None]:
job_nr=1 #An example job description

NMF (Non-negative Matrix Factorization) can be used to find the related vacancies. However matrix multiplication also does the job and runs a bit faster.

In [None]:
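# Faster alternative to NMF: TfidfVectorizer L2-normalises each row by default,
# so this product is the cosine similarity between every pair of vacancies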
QQ=csr_mat.dot(csr_mat.transpose()).todense()
recommendations_job_nr=np.flip(np.argsort(np.asarray(QQ)[job_nr,:]))

print("Original job description: " + df.loc[job_nr,"Position"] + " ($" + df.loc[job_nr,"salary_start"] + " - " + df.loc[job_nr,"salary_end"] + ")")
print(df.loc[job_nr,field])
print("")

for iRecommendation in range(5):
    print("Recommendation #" + str(iRecommendation+1) + ": " + df.loc[recommendations_job_nr[iRecommendation+1],"Position"] + " ($" + df.loc[recommendations_job_nr[iRecommendation+1],"salary_start"] + " - " + df.loc[recommendations_job_nr[iRecommendation+1],"salary_end"] + ")")  #"+1" removes the original job description itself
    print(df.loc[recommendations_job_nr[iRecommendation+1],field])
    print("")


# assign returns a new dataframe, so the result has to be stored
df = df.assign(Recommendation_1="").assign(Recommendation_2="").assign(Recommendation_3="").assign(Recommendation_4="").assign(Recommendation_5="")
for job_nr in range(len(df)):
    recommendations_job_nr=np.flip(np.argsort(np.asarray(QQ)[job_nr,:]))
    for iRecommendation in range(5):
        df.loc[job_nr,"Recommendation_" + str(iRecommendation+1)]=df.loc[recommendations_job_nr[iRecommendation+1],"Position"]
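
The matrix product used above works as a similarity measure because rows with unit length turn the dot product into a cosine similarity. The following is a minimal sketch on a hypothetical 4 x 3 weight matrix (not the real data). Unlike the notebook code, it excludes the vacancy itself explicitly instead of skipping the first ranked entry, since a vacancy without any keyword has similarity 0 to everything, including itself.

```python
import numpy as np

# Hypothetical keyword weights: 4 vacancies x 3 keywords
weights = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 0.9, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],  # vacancy without any matching keyword
])

# L2-normalise each row, leaving all-zero rows untouched
norms = np.linalg.norm(weights, axis=1, keepdims=True)
unit = np.divide(weights, norms, out=np.zeros_like(weights), where=norms > 0)

# Cosine similarity between every pair of vacancies
similarity = unit @ unit.T

def top_related(job, k=2):
    """Return the indices of the k most similar vacancies, excluding the job itself."""
    scores = similarity[job].copy()
    scores[job] = -np.inf  # never recommend the vacancy to itself
    order = np.argsort(-scores, kind="stable")
    return [int(i) for i in order[:k]]

related_to_first = top_related(0)
```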

After inspecting the related vacancies, it is clear that they are related to some degree. However, the results are not ideal:
* Several vacancies have little or no similar vacancies, making it impossible to find the "perfect" promotion opportunity ;
* The keywords file contains the 500 most often used keywords in the Jobscan.co database. The list can be expanded with more keywords specifically related to the City of Los Angeles. This one-time task can significantly improve the algorithm further.

However, we do not necessarily need "perfect" matches. On the contrary, the algorithm often suggests related vacancies in a different field. These are exactly the vacancies an employee is unlikely to be aware of, although some of them may be suitable positions as well!


But how useful is the list of keywords? We can have a look at the number of occurrences of each keyword in distinct vacancies.

In [None]:
occurrences_per_keyword = np.squeeze(np.asarray(sum(csr_mat.todense()!=0)))
data = [go.Histogram(x=occurrences_per_keyword, xbins=dict(start=0,end=140,size= 1),opacity=0.75)]
layout = go.Layout(title='Number of occurrences per keyword',
    xaxis=dict(title='Number of occurrences in different vacancies'),yaxis=dict(title='Number of keywords'),bargap=0.05)
py.iplot(go.Figure(data=data, layout=layout))

A lot of keywords do not occur in the vacancies at all, but quite a few occur in more than 20 distinct vacancies, with one keyword reaching even 132!

Let's plot the occurrence relative to the position of the keyword in the Jobscan.co ranking to investigate which keywords score high.

In [None]:
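# x is the keyword's position in the Jobscan.co list, so it must have one entry per keyword (not per vacancy)
data = go.Scatter(x = np.arange(len(resume_keywords)),y = occurrences_per_keyword, mode = 'markers', text=resume_keywords, opacity=0.75)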
layout = go.Layout(title='Jobscan.co keyword ranking vs. City of LA occurrences',
    xaxis=dict(title='Jobscan.co keyword ranking'),yaxis=dict(title='Number of occurrences in City of LA vacancies'))
py.iplot(go.Figure(data=[data], layout=layout))

Similarly we can plot a histogram of the number of keywords per vacancy, to detect vacancies which are vague.

In [None]:
keywords_per_vacancy = np.squeeze(np.asarray(sum(csr_mat.todense().transpose()!=0)))
data = [go.Histogram(x=keywords_per_vacancy, xbins=dict(start=0,end=99,size= 1),opacity=0.75)]
layout = go.Layout(title='Number of keywords per vacancy',
    xaxis=dict(title='Number of keywords'),yaxis=dict(title='Number of vacancies'),bargap=0.05)
py.iplot(go.Figure(data=data, layout=layout))
# Show five vacancies without any matching keyword
print(df.loc[keywords_per_vacancy==0,'Position'].head(5))
df['Matching_keywords']=keywords_per_vacancy

Notice that more than 100 vacancies have no matching keywords at all. Does this mean that the job duties description is too vague?
To investigate this, we have listed 5 vacancies which have no matching keywords. On inspection, their descriptions do seem to be quite accurate. This implies that the keyword list should be extended with keywords occurring in these vacancies. The list of vacancies with no matching keywords is therefore a good starting point for manually extending the keyword list.
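
The two counts used in this section are the column-wise and row-wise numbers of non-zero entries of the keyword matrix. A minimal sketch on a hypothetical dense matrix (the weights are invented for illustration) shows how unmatched vacancies and unused keywords fall out of the same boolean mask:

```python
import numpy as np

# Hypothetical TF-IDF weights: 3 vacancies (rows) x 4 keywords (columns)
weights = np.array([
    [0.7, 0.0, 0.7, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])

present = weights != 0
occurrences_per_keyword = present.sum(axis=0)  # in how many vacancies each keyword occurs
keywords_per_vacancy = present.sum(axis=1)     # how many keywords each vacancy matches

# Candidates for extending the keyword list, and keywords that never match
unmatched_vacancies = np.flatnonzero(keywords_per_vacancy == 0)
unused_keywords = np.flatnonzero(occurrences_per_keyword == 0)
```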

How well did the clustering algorithm of the vacancies itself perform? Let's investigate using a hierarchical cluster plot. 
The plot aims to group vacancies with similar keywords together by rearranging the sequence of the vacancies and the keywords.

In [None]:
df_cm=pd.DataFrame(csr_mat.todense())
df_cm.index.name = 'Vacancies'
cm=sns.clustermap(df_cm,figsize=(15, 15))
cm.fig.suptitle('Clustered vacancies based on keywords') 
cm.ax_heatmap.set_xlabel('Keywords')
cm.ax_heatmap.set_ylabel('Vacancies')

The plot above looks quite messy due to the large number of keywords and vacancies. However, it shows some clear insights:
* The black border on the right indicates that a lot of keywords do not occur in the vacancies at all, as we saw before ;
* The black border at the bottom indicates that several vacancies have no matches at all with any of the keywords, so there is room for improvement adding more job-specific keywords ;
* The colored vertical stripes indicate similar vacancies based on similar keywords. Quite a few clear clusters are visible!

We finish this part of the analysis with a list of the first 20 related vacancies. It's not perfect, but it seems to do its job quite well.

In [None]:
print(df.loc[cm.dendrogram_row.reordered_ind,'Position'].head(20))

## 4. Part 2: Automated keyword assignment

First we define a few manual functions below.

In [None]:
def remove_words(text, word_list):
    text = text.lower()
    for word in word_list:
        pattern = r"\b" + word.lower() + r"\b"
        text = re.sub(pattern, "", text)
    return text

# from DataCamp course "Machine learning with the experts"
def combine_text_columns(data_frame, to_keep):
    """ converts all text in each row of data_frame to single vector"""
    # Replace nans with blanks on a new object, so the original data_frame slice is not modified in place
    text_data = data_frame[to_keep].fillna("")

    # Join all text items in a row that have a space in between
    return text_data.apply(lambda x: " ".join(x), axis=1)

# from github pjbull/SparseInteractions.py
class SparseInteractions(BaseEstimator, TransformerMixin):
    def __init__(self, degree=2, feature_name_separator="_"):
        self.degree = degree
        self.feature_name_separator = feature_name_separator

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        if not sparse.isspmatrix_csc(X):
            X = sparse.csc_matrix(X)

        if hasattr(X, "columns"):
            self.orig_col_names = X.columns
        else:
            self.orig_col_names = np.array([str(i) for i in range(X.shape[1])])

        spi = self._create_sparse_interactions(X)
        return spi

    def get_feature_names(self):
        return self.feature_names

    def _create_sparse_interactions(self, X):
        out_mat = []
        self.feature_names = self.orig_col_names.tolist()

        for sub_degree in range(2, self.degree + 1):
            for col_ixs in combinations(range(X.shape[1]), sub_degree):
                # add name for new column
                name = self.feature_name_separator.join(self.orig_col_names[list(col_ixs)])
                self.feature_names.append(name)

                # get column multiplications value
                out = X[:, col_ixs[0]]
                for j in col_ixs[1:]:
                    out = out.multiply(X[:, j])

                out_mat.append(out)

        return sparse.hstack([X] + out_mat)

# from github drivendataorg/box-plots-sklearn
def multilabel_sample(y, size=1000, min_count=5, seed=None):
    """ Takes a matrix of binary labels `y` and returns
        the indices for a sample of size `size` if
        `size` > 1 or `size` * len(y) if size <= 1.
        The sample is guaranteed to have at least `min_count` of
        each label.
    """
    try:
        if (np.unique(y).astype(int) != np.array([0, 1])).any():
            raise ValueError()
    except (TypeError, ValueError):
        raise ValueError('multilabel_sample only works with binary indicator matrices')

    if (y.sum(axis=0) < min_count).any():
        raise ValueError('Some classes do not have enough examples. Change min_count if necessary.')

    if size <= 1:
        size = np.floor(y.shape[0] * size)

    if y.shape[1] * min_count > size:
        msg = "Size less than number of columns * min_count, returning {} items instead of {}."
        warn(msg.format(y.shape[1] * min_count, size))
        size = y.shape[1] * min_count

    rng = np.random.RandomState(seed if seed is not None else np.random.randint(1))

    if isinstance(y, pd.DataFrame):
        choices = y.index
        y = y.values
    else:
        choices = np.arange(y.shape[0])

    sample_idxs = np.array([], dtype=choices.dtype)

    # first, guarantee > min_count of each label
    for j in range(y.shape[1]):
        label_choices = choices[y[:, j] == 1]
        label_idxs_sampled = rng.choice(label_choices, size=min_count, replace=False)
        sample_idxs = np.concatenate([label_idxs_sampled, sample_idxs])

    sample_idxs = np.unique(sample_idxs)

    # now that we have at least min_count of each, we can just random sample
    sample_count = int(size - sample_idxs.shape[0])

    # get sample_count indices from remaining choices
    remaining_choices = np.setdiff1d(choices, sample_idxs)
    remaining_sampled = rng.choice(remaining_choices,
                                   size=sample_count,
                                   replace=False)

    return np.concatenate([sample_idxs, remaining_sampled])

def multilabel_train_test_split(X, Y, size, min_count=5, seed=None):
    """ Takes a features matrix `X` and a label matrix `Y` and
        returns (X_train, X_test, Y_train, Y_test) where all
        classes in Y are represented at least `min_count` times.
    """
    index = Y.index if isinstance(Y, pd.DataFrame) else np.arange(Y.shape[0])

    test_set_idxs = multilabel_sample(Y, size=size, min_count=min_count, seed=seed)
    train_set_idxs = np.setdiff1d(index, test_set_idxs)

    test_set_mask = index.isin(test_set_idxs)
    train_set_mask = ~test_set_mask

    return (X[train_set_mask], X[test_set_mask], Y[train_set_mask], Y[test_set_mask])
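
To illustrate what `SparseInteractions` adds to the feature matrix, the sketch below builds the degree-2 interaction terms for a single feature row in plain Python. The helper name `add_pairwise_interactions` and the feature names are hypothetical; the class above does the same column by column on a sparse matrix.

```python
from itertools import combinations

def add_pairwise_interactions(row, names):
    """Append the product of every pair of features to a single feature row."""
    values = list(row)
    labels = list(names)
    for i, j in combinations(range(len(row)), 2):
        values.append(row[i] * row[j])
        labels.append(names[i] + "_" + names[j])
    return values, labels

values, labels = add_pairwise_interactions([2.0, 0.0, 3.0], ["a", "b", "c"])
```

An interaction term is non-zero only when both features are present in the same vacancy. The number of columns grows from n to n + n(n-1)/2, so 100 selected features become 5,050 columns at degree 2.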


Data preprocessing: the text columns are combined into one text per vacancy; in the end only one text column (duties) was selected. The outcome data (the keywords to predict) is created by tokenizing the text data, using the resume keywords as vocabulary.

In [None]:
import warnings
warnings.filterwarnings("ignore")
# data
text_columns = ["duties"]
text_data = combine_text_columns(df, to_keep=text_columns)
text_data.head()

# Tfidf for y labels - with smaller ngram_range to reduce computation time
keywords = resume_keywords
vec_tf = TfidfVectorizer(vocabulary=keywords, ngram_range=(1, 3))
freq = vec_tf.fit_transform(text_data)
freq_df = pd.DataFrame(freq.toarray(), columns=keywords)

# List most frequent keywords
top_keywords = freq_df.sum().sort_values(ascending=False)[0:50]
not_occurring = freq_df.sum()[freq_df.sum() == 0]

A random forest model is trained using the tokenized text as input. Only keywords occurring at least 20 times in the text data were chosen as labels to predict, due to the limited number of vacancies and hence training data.

In [None]:
# model
# binary version of most frequent outcome (y) values = keywords
use_keywords = list(top_keywords[0:24].index)
freq_df_bin = freq_df.loc[:, freq_df.columns.isin(use_keywords)]
freq_df_bin[freq_df_bin>0]=1
freq_df_bin.shape
X_train, X_test, y_train, y_test = multilabel_train_test_split(text_data, freq_df_bin, size=0.25, min_count=3, seed=467)
pl = Pipeline([
        ('vectorizer', TfidfVectorizer(ngram_range=(1, 3))),  # no vocab here, want to use all the text
        ('feature_sel', SelectKBest(chi2, k=100)),  # keep the 100 features with the highest chi2 score
        ('int', SparseInteractions(degree=2)),
        ('clf', OneVsRestClassifier(RandomForestClassifier(n_estimators=100)))
    ])
pl.fit(X_train, y_train)
print('')

Model evaluation:

The following metrics were evaluated:

* proportion of samples with at least 90% correctly predicted labels
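* share of true positives among all sample-label pairs (heatmap value 2, corresponding to dark blue)
* share of true negatives among all sample-label pairs (heatmap value 0, corresponding to grey)
* share of false positives among all sample-label pairs (heatmap value -1, corresponding to red)
* share of false negatives among all sample-label pairs (heatmap value 1, corresponding to light blue)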

In [None]:
# predicted probabilities for all labels
y_pred = pl.predict_proba(X_test)
y_pred_df = pd.DataFrame(y_pred, columns=freq_df_bin.columns, index=y_test.index)
y_pred_df.head()
y_pred_df.sum()

# binary version
cutoff = 0.3
y_pred_bin = copy.deepcopy(y_pred_df)
y_pred_bin[y_pred_bin>=cutoff]=1
y_pred_bin[y_pred_bin<cutoff]=0
# compare y_test and y_pred_bin
sign = np.sign(y_test - y_pred_bin.values)
sign[sign==0] = 1
y_diff = (y_test + y_pred_bin.values)*sign
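
The sign trick above packs the four confusion outcomes into a single number per label. A minimal sketch with hypothetical labels and thresholded predictions shows the resulting codes:

```python
import numpy as np

y_true = np.array([[1, 0], [1, 0]])  # hypothetical test labels
y_hat = np.array([[1, 0], [0, 1]])   # hypothetical thresholded predictions

# The difference is 0 for agreement, +1 for a missed label, -1 for a spurious label
sign = np.sign(y_true - y_hat)
sign[sign == 0] = 1

# The sum is 2 (both 1), 0 (both 0) or 1 (disagreement); the sign separates the two error types
encoded = (y_true + y_hat) * sign
```

The codes are 2 for a true positive, 0 for a true negative, 1 for a false negative and -1 for a false positive, which matches the heatmap legend.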


In [None]:
plt.figure(figsize=(16,16))
sns.heatmap(y_diff, cmap=sns.diverging_palette(20,240, n=75), center=0)
print("True positive: label predicted and present in test set labels")
print("True negative: label not predicted and not present in test set labels")
print("False positive: label predicted but not present in test set")
print("False negative: label not predicted but present in test set labels")

In [None]:
# quantify the share of true positives (2), true negatives (0), false positives (-1) and false negatives (1)
flat = np.array(y_diff).flatten()
true_pos = np.mean(flat==2)
true_neg = np.mean(flat==0)
false_pos = np.mean(flat==-1)
false_neg = np.mean(flat==1)
# a label is correct when it is a true negative (0) or a true positive (2)
samples_90_correct = np.mean(y_diff.isin([0, 2]).mean(axis=1) >= 0.9)

print("Fraction 90% correct: " + str(samples_90_correct))
print("True positives: " + str(true_pos))
print("True negatives: " + str(true_neg))
print("False positives: " + str(false_pos))
print("False negatives: " + str(false_neg))
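
The per-sample metric counts a label as correctly predicted when its code is 0 (true negative) or 2 (true positive). The sketch below works through this on three hypothetical rows of codes; the function name `share_of_samples_correct` is an assumption made for illustration.

```python
def share_of_samples_correct(encoded_rows, threshold=0.9):
    """Share of samples whose fraction of correct labels (codes 0 or 2) reaches threshold."""
    hits = 0
    for row in encoded_rows:
        correct = sum(1 for value in row if value in (0, 2))
        if correct / len(row) >= threshold:
            hits += 1
    return hits / len(encoded_rows)

rows = [
    [2, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # all 10 labels correct
    [2, 0, 0, 0, 0, 0, 0, 0, 0, 1],   # one false negative: 90% correct
    [-1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # two errors: 80% correct
]
result = share_of_samples_correct(rows)
```

Two of the three samples reach the 90% threshold, so `result` is 2/3.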

## 5. Conclusions

To tackle items (2) and (3) of the problem statement, we designed a recommender system and a keyword-assignment model.

#### Part 1: A keyword-based recommender system
* Although **the external list of keywords** we used does not fully correspond to all provided vacancies of the City of Los Angeles, our analysis has shown that it is indeed a **good starting point**, where high ranked keywords generally occur more frequently in the vacancies ;
* For each vacancy we provided a list of similar vacancies based on the description of the duties. **The model effectively recognises similar vacancies**, although the fit is not perfect, both because the number of available vacancies is limited (some are simply not related to any other vacancy) and because the external list of frequently occurring keywords is not fully representative of the job vacancies of the City of Los Angeles ;
* Finally, details of the clustering analysis show **clear clusters of similar vacancies** based on keywords.

#### Part 2: Automated keyword assignment
* Predicted keywords can **facilitate job searching for candidates**
* Frequently occurring keywords can be **reasonably predicted** according to the model evaluation metrics
* Availability of the complete list of vacancies will result in a **larger training set and better model**


## 6. Recommendations

* During our analysis we observed that the overall language of the vacancies is quite good. We suggest focusing time and resources on improving the readability and searchability by adding keywords to each vacancy.
* The external list of keywords we used should be updated with additional keywords which are related to the job vacancies of the City of Los Angeles. As a good starting point we provided in Part 1 a list with the number of keywords currently matching each vacancy: vacancies with no or a low number of matching keywords can be dealt with first. The model we developed in Part 2 can be used to initialize the first few keywords to reduce the manual work.


In [None]:
df.to_csv('Keyword_approach_output.csv')