# Predicting Job Performance Using Text Data From Resumes

#### The purpose of this project is to see whether job performance can be predicted from the text in resumes.

The larger goal is to minimize the unconscious biases that plague recruitment and employee selection. These biases are especially pronounced in tech, given the dearth of women and non-white managers in the industry.

This project is an extension of my dissertation work. That work used the same data but analyzed word types/categories using the [Linguistic Inquiry and Word Count tool](http://liwc.wpengine.com/how-it-works/ "How it Works").

One challenge with using LIWC to analyze and parse resume text is that most word categories are used relatively infrequently, resulting in a [Zipfian distribution](http://nlp.stanford.edu/IR-book/html/htmledition/zipfs-law-modeling-the-distribution-of-terms-1.html). This is even more pronounced when words are pre-sorted into categories, as LIWC does.
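To make the Zipfian shape concrete, here is a minimal sketch that ranks words by frequency. The helper name `rank_frequency` and the sample sentence are hypothetical stand-ins, not part of the original analysis; in real resume text a handful of words dominate and most appear only once or twice.

```python
from collections import Counter
import re

def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by descending count, breaking ties alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count) for rank, (word, count) in enumerate(ranked, 1)]

# Hypothetical snippet standing in for resume text.
sample = ("managed a team of five and managed budgets for the team; "
          "led the team through an audit")
table = rank_frequency(sample)
```

Even in this tiny sample, ten of the thirteen distinct words occur exactly once, which is the long tail that makes rarely used categories hard to model.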

This project aims to address the limitation imposed by LIWC by applying TF-IDF with trigrams. (Note: trigrams were chosen based on prior text analytic work by the author.)
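As a rough illustration of what TF-IDF over trigrams means, the sketch below computes the textbook weight `tf * log(N / df)` for word trigrams in plain Python. The function names and the three example documents are hypothetical, and this is not the exact formula scikit-learn applies: `TfidfVectorizer` uses a smoothed idf and L2-normalizes each row by default.

```python
import math
from collections import Counter

def word_trigrams(text):
    """Split on whitespace and return every run of three consecutive words."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]

def tfidf(docs):
    """Return one {trigram: weight} dict per document, using tf * log(N / df)."""
    grams = [Counter(word_trigrams(d)) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents does each trigram appear?
    df = Counter(g for counts in grams for g in counts)
    return [{g: tf * math.log(float(n_docs) / df[g]) for g, tf in counts.items()}
            for counts in grams]

# Hypothetical resume fragments.
docs = ["strong written communication skills",
        "excellent written communication skills",
        "strong written communication abilities"]
weights = tfidf(docs)
```

A trigram that appears in only one document ("excellent written communication") gets a higher weight than one shared by two documents ("written communication skills"), which is the property that lets rare but distinctive phrases stand out.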

## Preparing the Text Data for Pre-Processing

#### Note: We need to make sure we start in the proper directory, so make sure this notebook is in the "resumes" directory. 


In [1]:
pwd

u'C:\\ds_sandbox\\project2\\resumes'

In [1]:
'''
Import all the packages we will need to work with resume files, text, and survey data. Note that we will need to manually
install some packages in Anaconda (further instructions are below). 
'''

import pandas as pd
import numpy as np
import scipy as sp
import os
import hashlib
#import the packages we need to convert PDFs to text
#install the packages first :) conda install -c https://conda.anaconda.org/pejo pdfminer
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from cStringIO import StringIO
import fnmatch, os, pythoncom, sys, win32com.client
from sklearn.cross_validation import train_test_split
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
#fyi need to specifically install textblob via anaconda: 
#conda install -c https://conda.anaconda.org/sursma textblob
#don't use the most recent version due to conflicts
from textblob import TextBlob, Word
import nltk
from nltk.stem.snowball import SnowballStemmer
%matplotlib inline 

In [None]:
#build and test a loop that will iterate through a directory and print out file names
for root, dirs, files in os.walk(".", topdown=True):
    for name in files:
        print(os.path.join(root, name))
    for name in dirs:
        print(os.path.join(root, name))

In [None]:
pwd

In [None]:
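#let's check to see how many .doc, .docx, and .pdf files we have 

#count each extension separately instead of only .pdf
counts = {'.doc': 0, '.docx': 0, '.pdf': 0}
for (dirname, dirs, files) in os.walk(".", topdown=True):
    for filename in files:
        ext = os.path.splitext(filename)[1].lower()
        if ext in counts:
            counts[ext] += 1
for ext in sorted(counts):
    print 'Files %s: %d' % (ext, counts[ext])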

'''
Number of .doc texts = 290 
Number of .docx texts = 483
Number of .pdf texts = 80
''' 

#### Note!: in order to upload the resumes on github, I had to convert them all to .txt files.

In [None]:
'''
convert all .doc files into .txt files
reference: https://www.safaribooksonline.com/library/view/python-cookbook-2nd/0596007973/ch02s28.html
1 file deleted due to security issues with the Word doc
1 file deleted because it was not in English
'''
wordapp = win32com.client.gencache.EnsureDispatch("Word.Application")
try:
    for path, dirs, files in os.walk(".", topdown=True):
        for filename in files:
            if not fnmatch.fnmatch(filename, '*.doc'): continue
            doc = os.path.abspath(os.path.join(path, filename))
            print "processing %s" % doc
            wordapp.Documents.Open(doc)
            docastxt = doc[:-3] + 'txt'
            wordapp.ActiveDocument.SaveAs(docastxt,
                FileFormat=win32com.client.constants.wdFormatText)
            wordapp.ActiveDocument.Close( )
finally:
    # ensure Word is properly shut down even if we get an exception
    wordapp.Quit( )

In [None]:

'''
convert all .docx files into .txt files, change the -3 to -4 so the file extension works
10 files had to be deleted because they were corrupted
'''
wordapp = win32com.client.gencache.EnsureDispatch("Word.Application")
try:
    for path, dirs, files in os.walk(".", topdown=True):
        for filename in files:
            if not fnmatch.fnmatch(filename, '*.docx'): continue
            doc = os.path.abspath(os.path.join(path, filename))
            print "processing %s" % doc
            wordapp.Documents.Open(doc)
            docastxt = doc[:-4] + 'txt'
            wordapp.ActiveDocument.SaveAs(docastxt,
                FileFormat=win32com.client.constants.wdFormatText)
            wordapp.ActiveDocument.Close( )
finally:
    # ensure Word is properly shut down even if we get an exception
    wordapp.Quit( )
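The two Word-conversion cells differ mainly in how many characters they slice off the file name (`-3` for `.doc`, `-4` for `.docx`). A small sketch of an alternative, assuming a hypothetical helper named `txt_path_for`: `os.path.splitext` splits off whatever the final extension is, so one helper could serve `.doc`, `.docx`, and `.pdf` alike.

```python
import os

def txt_path_for(path):
    """Return the path with its final extension replaced by .txt."""
    base, _ext = os.path.splitext(path)
    return base + ".txt"

# Hypothetical file names used only for illustration.
doc_target = txt_path_for("resumes/120685.doc")
docx_target = txt_path_for("resumes/120685.docx")
```

Because only the last extension is touched, a name such as `my.pdf.resume.pdf` becomes `my.pdf.resume.txt`, whereas a plain `str.replace('.pdf', '.txt')` would rewrite both occurrences.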

In [None]:
'''
remove any files that end in .doc or .docx since we created duplicate .txt
files 
'''
for root, dirs, files in os.walk(".", topdown=True):
    for currentFile in files:
        print "processing file: " + currentFile
        exts = ('.doc', '.docx')
        if any(currentFile.lower().endswith(ext) for ext in exts):
            os.remove(os.path.join(root, currentFile))   

Now we need to convert the 80 PDF files into text. 
PDF files are notoriously difficult to work with. Fortunately, since we only
need the text, we don't need to spend hours figuring out which pieces of the
PDF we need.
reference: http://stackoverflow.com/questions/5725278/python-help-using-pdfminer-as-a-library
reference: http://davidmburke.com/2014/02/04/python-convert-documents-doc-docx-odt-pdf-to-plain-text-without-libreoffice
- run this in terminal: conda install -c https://conda.anaconda.org/pejo pdfminer
- original module github: https://github.com/euske/pdfminer

In [None]:
#define a function to convert a pdf file into text
def convert_pdf_to_txt(path):
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    fp = file(path, 'rb')
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    password = ""
    maxpages = 0
    caching = True
    pagenos=set()
    for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages, password=password,caching=caching, check_extractable=True):
        interpreter.process_page(page)
    fp.close()
    device.close()
    # use a name that does not shadow the built-in str type
    txt = retstr.getvalue()
    retstr.close()
    return txt

In [None]:
'''
let's start out slow and test the concept of what we want to do: find a pdf, convert it to text, and write that text
out to a text file with the same base name (e.g. 120685.pdf becomes 120685.txt); this first test just writes to test.txt
#http://stackoverflow.com/questions/1900956/write-variable-to-file-including-name
#http://stackoverflow.com/questions/1684194/saving-output-of-a-for-loop-to-file
'''

path = 'C:\\ds_sandbox\\project2\\testingfolder\\120685.pdf'
txt = convert_pdf_to_txt(path)
print(txt)

with open('test.txt', 'w') as f:
    f.write(txt)

In [None]:
#let's define the loop 
for path, dirs, files in os.walk(".", topdown=True):
        for filename in files:
            if not fnmatch.fnmatch(filename, '*.pdf'): continue
            print "processing file: " + os.path.join(path, filename)            
            convert_pdf_to_txt(os.path.join(path, filename))
            

In [None]:
'''
now we integrate the prior 2 cells into a single loop to: iterate through all files and sub-directories in the "resumes"
directory and convert each pdf into a text object, and then write that object to a new .txt file, with the same name as 
the original pdf file. 
'''

for path, dirs, files in os.walk(".", topdown=True):
        for filename in files:
            if not fnmatch.fnmatch(filename, '*.pdf'): continue
            print "processing file: " + os.path.join(path, filename)            
            doc = convert_pdf_to_txt(os.path.join(path, filename))
            with open(os.path.join(path, filename.replace('.pdf', '.txt')), 'w') as f: 
                f.write(doc)

In [None]:
'''
remove any files that end in .pdf since we created duplicate .txt
files 
'''

for root, dirs, files in os.walk(".", topdown=True):
    for currentFile in files:
        print "processing file: " + currentFile
        exts = ('.pdf',)  # trailing comma makes this a tuple; ('.pdf') is just a string
        if any(currentFile.lower().endswith(ext) for ext in exts):
            os.remove(os.path.join(root, currentFile))  
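A note on the extension check used in the cleanup cells: parentheses alone do not create a tuple in Python, so `('.pdf')` is the plain string `'.pdf'`, and looping over it yields single characters. The sketch below shows the difference and a hypothetical helper, `has_extension`, that passes the extensions straight to `str.endswith`, which accepts either a string or a tuple of strings.

```python
single = ('.pdf')      # just the string '.pdf'; parentheses alone do not make a tuple
as_tuple = ('.pdf',)   # one-element tuple, note the trailing comma

def has_extension(filename, extensions):
    """True if filename ends with one of the given extensions (case-insensitive)."""
    return filename.lower().endswith(extensions)

# Iterating over a bare string yields '.', 'p', 'd', 'f', so a character-by-character
# check matches any file name that merely ends in one of those characters.
matches_by_char = any("staff".endswith(ch) for ch in single)
```

With the character-by-character check, a file named `staff` would be treated as a PDF and deleted; `has_extension("staff", as_tuple)` correctly returns `False`.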

#### We have converted all .doc, .docx, and .pdf files into .txt files. W00t!
#### Next, we need to convert all these .txt files (about 1,007 of them) into a data frame with an ID column holding the name of the file (e.g. 120685) and a second column containing all the text in that file.

In [2]:
text = []
for path, dirs, files in os.walk(".", topdown=True):
        for filename in files:
            if not fnmatch.fnmatch(filename, '*.txt'): continue
            print "processing file: " + os.path.join(path, filename)            
            with open (os.path.join(path, filename), "r") as f:
                text.append(f.read())
                
df = pd.DataFrame(text)

processing file: .\01.07.15 Resumes\45416565.txt
processing file: .\01.07.15 Resumes\45416848.txt
processing file: .\01.07.15 Resumes\45417076.txt
processing file: .\01.07.15 Resumes\45417498.txt
processing file: .\01.07.15 Resumes\45417994.txt
processing file: .\01.07.15 Resumes\45418178.txt
processing file: .\01.07.15 Resumes\45418240.txt
processing file: .\01.07.15 Resumes\45418879.txt
processing file: .\01.07.15 Resumes\45420257.txt
processing file: .\01.07.15 Resumes\45420733.txt
processing file: .\01.07.15 Resumes\45422051.txt
processing file: .\01.07.15 Resumes\45424160.txt
processing file: .\01.07.15 Resumes\45424611.txt
processing file: .\01.07.15 Resumes\45425243.txt
processing file: .\01.07.15 Resumes\45425422.txt
processing file: .\01.07.15 Resumes\45425696.txt
processing file: .\01.07.15 Resumes\45426273.txt
processing file: .\01.07.15 Resumes\45426607.txt
processing file: .\01.07.15 Resumes\45427103.txt
processing file: .\01.07.15 Resumes\45427104.txt
processing file: .\0

In [3]:
df.head(1)

Unnamed: 0,0
0,"\n\nProfessional Experience\nACS, Beattyville,..."


Create a list that has only the document names, with the extension (e.g. ".txt") removed.

In [4]:
rowid= []

for path, dirs, files in os.walk(".", topdown=True):
        for filename in files:
            if not fnmatch.fnmatch(filename, '*.txt'): continue
            print "processing file: " + os.path.join(path, filename)
            rowid.append(os.path.splitext(filename)[0])

processing file: .\01.07.15 Resumes\45416565.txt
processing file: .\01.07.15 Resumes\45416848.txt
processing file: .\01.07.15 Resumes\45417076.txt
processing file: .\01.07.15 Resumes\45417498.txt
processing file: .\01.07.15 Resumes\45417994.txt
processing file: .\01.07.15 Resumes\45418178.txt
processing file: .\01.07.15 Resumes\45418240.txt
processing file: .\01.07.15 Resumes\45418879.txt
processing file: .\01.07.15 Resumes\45420257.txt
processing file: .\01.07.15 Resumes\45420733.txt
processing file: .\01.07.15 Resumes\45422051.txt
processing file: .\01.07.15 Resumes\45424160.txt
processing file: .\01.07.15 Resumes\45424611.txt
processing file: .\01.07.15 Resumes\45425243.txt
processing file: .\01.07.15 Resumes\45425422.txt
processing file: .\01.07.15 Resumes\45425696.txt
processing file: .\01.07.15 Resumes\45426273.txt
processing file: .\01.07.15 Resumes\45426607.txt
processing file: .\01.07.15 Resumes\45427103.txt
processing file: .\01.07.15 Resumes\45427104.txt
processing file: .\0

In [5]:
print(rowid)

['45416565', '45416848', '45417076', '45417498', '45417994', '45418178', '45418240', '45418879', '45420257', '45420733', '45422051', '45424160', '45424611', '45425243', '45425422', '45425696', '45426273', '45426607', '45427103', '45427104', '45427271', '45427390', '45427407', '45427969', '45429229', '45431061', '4544536', '45445364-', '45445519', '45445522', '45445808', '45445987', '45447024', '45448558', '45448580', '45448667', '45449404', '45453795', '45455176', '45455423', '45461181', '45461248', 'Personal Attributes', '41194514', '41194519', '41194604', '41194663', '41194668', '41194701', '41194703', '41194755', '41194759', '41194805', '41194866', '41194876', '41194912', '41194945', '41194960', '41194974', '41195002', '41195044', '41195078', '41195097', '41195155', '41195168', '41195171', '41195194', '41195223', '41195247', '41195253', '41195304', '41195347', '41195365', '41195479', '41195483', '41195497', '41195504', '41195558', '41195641', '41195663', '41195754', '41195923', '411

In [6]:
rowid

['45416565',
 '45416848',
 '45417076',
 '45417498',
 '45417994',
 '45418178',
 '45418240',
 '45418879',
 '45420257',
 '45420733',
 '45422051',
 '45424160',
 '45424611',
 '45425243',
 '45425422',
 '45425696',
 '45426273',
 '45426607',
 '45427103',
 '45427104',
 '45427271',
 '45427390',
 '45427407',
 '45427969',
 '45429229',
 '45431061',
 '4544536',
 '45445364-',
 '45445519',
 '45445522',
 '45445808',
 '45445987',
 '45447024',
 '45448558',
 '45448580',
 '45448667',
 '45449404',
 '45453795',
 '45455176',
 '45455423',
 '45461181',
 '45461248',
 'Personal Attributes',
 '41194514',
 '41194519',
 '41194604',
 '41194663',
 '41194668',
 '41194701',
 '41194703',
 '41194755',
 '41194759',
 '41194805',
 '41194866',
 '41194876',
 '41194912',
 '41194945',
 '41194960',
 '41194974',
 '41195002',
 '41195044',
 '41195078',
 '41195097',
 '41195155',
 '41195168',
 '41195171',
 '41195194',
 '41195223',
 '41195247',
 '41195253',
 '41195304',
 '41195347',
 '41195365',
 '41195479',
 '41195483',
 '41195497',
 

#### Convert the list of file names into a dataframe with a column called "ID" that holds the file names

In [7]:
dfrowid = pd.DataFrame({'ID': rowid})

In [10]:
frames = [dfrowid, df]

In [11]:
id_resume = pd.concat(frames, axis=1)

In [34]:
id_resume.head(5)

Unnamed: 0,0,ID
0,"\n\nProfessional Experience\nACS, Beattyville,...",45416565
1,Name\nAddress\nPhone number; e-mail address\nO...,45416848
2,John A. Smith\r\r\rContact\rTel : 716-555-5555...,45417076
3,\n___\n\n\nEXPERIENCE\nDirector of Business Op...,45417498
4,\rPersonal Experience\r\rDedicated computer in...,45417994
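The ID and text columns above come from two separate directory walks that are then joined by position, which works as long as both walks visit the files in the same order. A minimal sketch of a single-pass alternative, assuming a hypothetical helper `collect_resumes`, keeps each ID next to its text by construction:

```python
import fnmatch
import os

def collect_resumes(root):
    """Return a list of (file_id, text) pairs for every .txt file under root."""
    records = []
    for path, dirs, files in os.walk(root):
        for filename in sorted(files):
            if not fnmatch.fnmatch(filename, '*.txt'):
                continue
            with open(os.path.join(path, filename), 'r') as f:
                # The ID is the file name without its extension.
                records.append((os.path.splitext(filename)[0], f.read()))
    return records

# Possible usage in this notebook (not run here):
# records = collect_resumes(".")
# id_resume = pd.DataFrame(records, columns=['ID', 'resume_text'])
```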


In [12]:
#change column names 
id_resume.columns = ['ID', 'resume_text']
id_resume.columns

Index([u'ID', u'resume_text'], dtype='object')

In [13]:
#check data types, ID should be numeric
id_resume.dtypes

#convert ID to a numeric type; values that cannot be parsed as numbers become NaN
resume = id_resume.convert_objects(convert_numeric=True)

#check to make sure the type converted to a numeric type, in this case a float
resume.dtypes



ID             float64
resume_text     object
dtype: object
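Since numeric coercion turns unparseable values into NaN, it can help to list which file names will lose their ID. The sketch below is a plain-Python check with a hypothetical helper `to_number`; the three names are taken from the file listing printed earlier.

```python
import math

def to_number(value):
    """Return float(value), or NaN when the string is not numeric."""
    try:
        return float(value)
    except ValueError:
        return float('nan')

# File names from the listing above, including two that are not plain numbers.
names = ['45416565', '45445364-', 'Personal Attributes']
unparseable = [n for n in names if math.isnan(to_number(n))]
```

Here `unparseable` holds `'45445364-'` and `'Personal Attributes'`, the kind of rows worth inspecting by hand before the join.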

In [14]:
#strip non-ASCII characters from the resume text
def clean_text(row):
    # return a list with every cell in the Series decoded and reduced to ASCII
    return [r.decode('unicode_escape').encode('ascii', 'ignore') for r in row]
resume['resume_text'] = df.apply(clean_text)

#check that non-ASCII characters have been removed
resume

Unnamed: 0,ID,resume_text
0,45416565.0,"\n\nProfessional Experience\nACS, Beattyville,..."
1,45416848.0,Name\nAddress\nPhone number; e-mail address\nO...
2,45417076.0,John A. Smith\r\r\rContact\rTel : 716-555-5555...
3,45417498.0,\n___\n\n\nEXPERIENCE\nDirector of Business Op...
4,45417994.0,\rPersonal Experience\r\rDedicated computer in...
5,45418178.0,* Involved with all phases of design and const...
6,45418240.0,\n
7,45418879.0,"SAURABH TYAGI\nH.No 192, Village Karnera, Ball..."
8,45420257.0,Summary\n_____________________________________...
9,45420733.0,RESUME\n\nEducational qualification: 11th pass...


#### Read in the CSV of the survey data
Note: I took a shortcut here and reused work I had already done in SPSS on this data for another project. The file that created the bulk of the survey data and the raw survey data are uploaded on GitHub.

In [16]:
#read in csv of survey data 
survey = pd.read_csv('C:/ds_sandbox/project2/data/survey_data_16May16a.csv', sep=',')

#check shape 
survey.shape

#notice that the data type for ID is a numeric type, which means we can join the survey data to the resume data
survey.dtypes

#check that ID column is in the survey dataframe 
"ID" in survey

True

In [17]:
#join the survey data to the text data
text = pd.merge(resume, survey, how='inner', left_on='ID', right_on='ID')

text.shape
text.columns
type(text.resume_text)

pandas.core.series.Series

The join worked. The row count ties out to my dissertation work within +/- 3 rows. 
Note that this analysis does not control for gender, although from prior work we know that
individuals identifying as female tend to report higher job performance behaviors. 
We are taking a purely text analytic approach. Also, I didn't have time to figure out how to
control for gender in Python :D
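To see exactly which rows account for a small discrepancy like the one mentioned above, one option is to compare the two ID sets directly. This is a sketch with a hypothetical helper `join_report` and made-up IDs; recent pandas versions also offer an `indicator=True` option on `pd.merge` that gives similar information.

```python
def join_report(left_ids, right_ids):
    """Summarize which IDs an inner join would keep and which it would drop."""
    left, right = set(left_ids), set(right_ids)
    return {
        'matched': sorted(left & right),      # rows kept by an inner join
        'left_only': sorted(left - right),    # resumes with no survey row
        'right_only': sorted(right - left),   # survey rows with no resume
    }

# Hypothetical IDs, not taken from the real data.
report = join_report([101.0, 102.0, 103.0], [102.0, 103.0, 104.0])
```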

Now we need to create our X and y 

In [21]:
#define X and y. We will use cross-validation here so no need to split into test-train-split
X = text.resume_text

y = text.task_performance_dichotomous3

#### Note: 
I used task_performance_dichotomous3 as the outcome. I tried several cut points for dichotomizing the outcome, looking for one that didn't result in an extremely unbalanced data set. 
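The cut-point search can be sketched as follows. The ratings, the candidate cut points, and the "at or above" rule are all assumptions made for illustration; the actual dichotomization was done earlier in SPSS and may have used a different rule.

```python
def dichotomize(scores, cut):
    """Label each score 1 if it is at or above the cut point, else 0."""
    return [1 if s >= cut else 0 for s in scores]

def positive_share(labels):
    """Fraction of labels equal to 1."""
    return float(sum(labels)) / len(labels)

# Hypothetical performance ratings on a 1-5 scale.
ratings = [2.5, 3.0, 3.5, 4.0, 4.0, 4.5, 4.5, 5.0]
shares = {cut: positive_share(dichotomize(ratings, cut)) for cut in (3.5, 4.0, 4.5)}
```

Comparing the shares side by side makes it easy to pick the cut point whose positive class is closest to half of the sample.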


In [22]:
#create 3 different vectorizers using English stop words and n-grams of size 1, 2, and 3
vectTFidf1 = TfidfVectorizer(analyzer='word', lowercase=True, min_df=3, 
                             stop_words='english',max_features=5000, ngram_range=(1, 1))

vectTFidf2 = TfidfVectorizer(analyzer='word', lowercase=True, min_df=3, 
                             stop_words='english',max_features=1000, ngram_range=(2, 2))

vectTFidf3 = TfidfVectorizer(analyzer='word', lowercase=True, min_df=3, 
                             stop_words='english',max_features=1000, ngram_range=(3, 3))
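One consequence of combining `stop_words='english'` with `ngram_range=(3, 3)`: scikit-learn drops stop words from the token stream before it builds n-grams, so a trigram can join words that were not adjacent in the resume. That is why features such as 'work independently team' appear in the trigram vocabulary printed further down. The sketch below imitates that order of operations in plain Python; the stop-word set is a small illustrative subset, not scikit-learn's full list.

```python
STOP_WORDS = {"a", "and", "as", "in", "of", "or", "the"}  # small illustrative subset

def ngrams_after_stopwords(text, n):
    """Remove stop words first, then join each run of n remaining tokens."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

trigrams = ngrams_after_stopwords("Work independently or in a team", 3)
```

Here `trigrams` is `['work independently team']`, a phrase that never occurs verbatim in the input sentence.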


In [23]:
#create 3 tf-idf dtms 
X_dtm1 = vectTFidf1.fit_transform(X)

X_dtm2 = vectTFidf2.fit_transform(X)

X_dtm3 = vectTFidf3.fit_transform(X)

In [24]:
from sklearn.svm import LinearSVC #nice
from sklearn import cross_validation
svm = LinearSVC(C=1, penalty='l2', loss='hinge')

In [29]:
#unigrams
scores = cross_validation.cross_val_score(svm, X_dtm1, y, scoring='recall', cv=10)
print(scores.mean())

0.824022108844
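For reference, `scoring='recall'` reports the fraction of actual positives that the classifier labels as positive. The sketch below computes recall and precision by hand on hypothetical labels; it shows that a classifier predicting the positive class for everyone reaches a recall of 1.0, so recall is easiest to interpret next to precision.

```python
def recall(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return float(tp) / (tp + fn)

def precision(y_true, y_pred):
    """Fraction of predicted positives that were actually positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return float(tp) / (tp + fp)

y_true = [1, 1, 1, 0, 0, 1]            # hypothetical labels
always_positive = [1] * len(y_true)    # a trivial "predict 1 for everyone" model
```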


In [31]:
print(vectTFidf1.get_feature_names()[-50:])

[u'workers', u'workflow', u'workflows', u'workforce', u'working', u'workload', u'workloads', u'workplace', u'works', u'workshop', u'workshops', u'workstations', u'world', u'worldwide', u'worth', u'wpm', u'wright', u'write', u'writer', u'writers', u'writing', u'written', u'wrote', u'www', u'xml', u'xp', u'xslt', u'xuan', u'xx', u'xxx', u'xxxx', u'xxxxx', u'xxxxxx', u'xxxxxxx', u'xxxxxxxx', u'xxxxxxxxx', u'xxxxxxxxxx', u'xxxxxxxxxxxxx', u'yahoo', u'year', u'yearly', u'years', u'york', u'young', u'youth', u'youtube', u'zealand', u'zero', u'zone', u'zoology']


In [32]:
#bigrams
scores = cross_validation.cross_val_score(svm, X_dtm2, y, scoring='recall', cv=10)
print(scores.mean())

0.782568027211


In [33]:
print(vectTFidf2.get_feature_names()[-50:])

[u'web services', u'website design', u'weekly basis', u'weekly monthly', u'west virginia', u'whilst working', u'wide range', u'wide variety', u'windows 2000', u'windows 98', u'windows linux', u'windows mac', u'windows server', u'windows xp', u'word excel', u'word microsoft', u'word powerpoint', u'word processing', u'words minute', u'work closely', u'work environment', u'work ethic', u'work experience', u'work history', u'work independently', u'work placement', u'work pressure', u'work team', u'worked closely', u'worked team', u'working closely', u'working experience', u'working independently', u'working knowledge', u'working relationships', u'working small', u'working team', u'writing listening', u'written communication', u'written spoken', u'written verbal', u'xp vista', u'xxx xxx', u'yahoo com', u'year passing', u'year year', u'years experience', u'years months', u'york city', u'york ny']


In [34]:
#trigrams
scores = cross_validation.cross_val_score(svm, X_dtm3, y, scoring='recall', cv=10)
print(scores.mean())

0.840476190476


In [35]:
print(vectTFidf3.get_feature_names()[-50:])

[u'word powerpoint excel', u'word powerpoint outlook', u'work experience 01', u'work experience 2013', u'work experience 2014', u'work experience april', u'work experience company', u'work experience date', u'work experience dates', u'work experience july', u'work experience october', u'work experience placement', u'work experience september', u'work experience university', u'work experience worked', u'work fast paced', u'work independently team', u'work key skills', u'work team independently', u'work tight deadlines', u'worked small team', u'working high pressure', u'working knowledge microsoft', u'working multinational companies', u'working small team', u'working team members', u'working team people', u'working variety different', u'world wide web', u'writing listening reading', u'written communication skills', u'written oral communication', u'written spoken english', u'written verbal communication', u'www linkedin com', u'xp vista linux', u'xxx xxx xxx', u'xxx xxx xxxx', u'xxxx gmai

In [37]:
from sklearn.linear_model import LogisticRegression
log = LogisticRegression()

scores = cross_validation.cross_val_score(log, X_dtm3, y, scoring='recall', cv=10)
print(scores.mean())

0.883928571429


In [39]:
print(vectTFidf3.get_feature_names()[-50:])

[u'word powerpoint excel', u'word powerpoint outlook', u'work experience 01', u'work experience 2013', u'work experience 2014', u'work experience april', u'work experience company', u'work experience date', u'work experience dates', u'work experience july', u'work experience october', u'work experience placement', u'work experience september', u'work experience university', u'work experience worked', u'work fast paced', u'work independently team', u'work key skills', u'work team independently', u'work tight deadlines', u'worked small team', u'working high pressure', u'working knowledge microsoft', u'working multinational companies', u'working small team', u'working team members', u'working team people', u'working variety different', u'world wide web', u'writing listening reading', u'written communication skills', u'written oral communication', u'written spoken english', u'written verbal communication', u'www linkedin com', u'xp vista linux', u'xxx xxx xxx', u'xxx xxx xxxx', u'xxxx gmai

In [18]:
#import a new set of modules
#this was shamelessly stolen from: http://scikit-learn.org/stable/_downloads/grid_search_text_feature_extraction.py

from __future__ import print_function

from pprint import pprint
from time import time
import logging

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.grid_search import GridSearchCV
from sklearn.pipeline import Pipeline

# Display progress logs on stdout
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')

In [None]:
######################################################################
# define a pipeline combining a text feature extractor with a simple
# classifier
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier()),
])

# uncommenting more parameters will give better exploring power but will
# increase processing time in a combinatorial way
parameters = {
    'vect__max_df': (0.5, 0.75, 1.0),
    'vect__max_features': (None, 5000, 10000, 50000),
    'vect__ngram_range': ((1, 1), (2, 2), (3, 3)),  # unigrams, bigrams, or trigrams
    'tfidf__use_idf': (True, False),
    'tfidf__norm': ('l1', 'l2'),
    'clf__alpha': (0.00001, 0.000001),
    'clf__penalty': ('l2', 'elasticnet'),
    'clf__n_iter': (10, 50, 80),
}

if __name__ == "__main__":
    # multiprocessing requires the fork to happen in a __main__ protected
    # block

    # find the best parameters for both the feature extraction and the
    # classifier
    grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1)

    print("Performing grid search...")
    print("pipeline:", [name for name, _ in pipeline.steps])
    print("parameters:")
    pprint(parameters)
    t0 = time()
    grid_search.fit(X, y)
    print("done in %0.3fs" % (time() - t0))
    print()

    print("Best score: %0.3f" % grid_search.best_score_)
    print("Best parameters set:")
    best_parameters = grid_search.best_estimator_.get_params()
    for param_name in sorted(parameters.keys()):
        print("\t%s: %r" % (param_name, best_parameters[param_name]))
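The comment about processing time growing "in a combinatorial way" can be made concrete by counting the candidate settings in the grid. This sketch copies the parameter values from the cell above and multiplies them out with `itertools.product`; each candidate is then fit once per cross-validation fold.

```python
from itertools import product

# Same parameter grid as in the grid-search cell.
grid = {
    'vect__max_df': (0.5, 0.75, 1.0),
    'vect__max_features': (None, 5000, 10000, 50000),
    'vect__ngram_range': ((1, 1), (2, 2), (3, 3)),
    'tfidf__use_idf': (True, False),
    'tfidf__norm': ('l1', 'l2'),
    'clf__alpha': (0.00001, 0.000001),
    'clf__penalty': ('l2', 'elasticnet'),
    'clf__n_iter': (10, 50, 80),
}

# One candidate per element of the Cartesian product of all value tuples.
n_candidates = len(list(product(*grid.values())))
```

With these values the grid holds 1,728 candidates, which explains why the search is slow on about a thousand resumes.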