# The Groceries List
---

This is a typical example where you clearly do not need a cluster to compute anything, but it is a useful way to understand the mechanisms of the tool before applying them to more complex examples. 

We propose here a notebook that will help you make your groceries list. Yes, yes, your groceries list. 

![image.png](attachment:image.png)

In [1]:
# Importing pandas and the modules from the pyHTC package

import pandas as pd  # imported explicitly, since pd is used below

from pyHTC.Study import *
import pyHTC.toolkit as toolkit

## A very very VERY simple example
---

In this first example, things are gonna be extremely simple. Our character, let's call him Guido, loves pandas DataFrames so much that he uses them to make his groceries list. 

Guido eats only vegetables with meat: those are our two parameters. He plans to go grocery shopping next week on Monday, Wednesday and Saturday. Here's what he wants to buy: 

- Monday : carrots and beef
- Wednesday : peas and chicken
- Saturday : eggplants and lamb 

Let's store those values in a DF. 

In [2]:
# DF creation 

myGroceries_week1 = pd.DataFrame(index = ['Monday', 'Wednesday', 'Saturday'], columns=['Vegetables', 'Meat'])

myGroceries_week1.loc['Monday'] = ['Carrots', 'Beef']
myGroceries_week1.loc['Wednesday'] = ['Peas', 'Chicken']
myGroceries_week1.loc['Saturday'] = ['Eggplants', 'Lamb']

# and print it

myGroceries_week1

Unnamed: 0,Vegetables,Meat
Monday,Carrots,Beef
Wednesday,Peas,Chicken
Saturday,Eggplants,Lamb


In [3]:
# Let's therefore define a first study 

myName = 'week1'
myPath = '/afs/cern.ch/user/a/apoyet/public/pyHTC/lets_go_groceries'
myExe = '/afs/cern.ch/user/a/apoyet/public/pyHTC/lets_go_groceries/groceries_list_generator.sh'
mySubFileName = 'mySubFile'

myStudy1 = StudyObj(myName, myPath, myExe, mySubFileName, arguments='$(input_file)', output_dir='output/', error_dir='error/',
                   log_dir = 'log/')


In [30]:
# Define the study

myStudyDF = myStudy1.define_study(myGroceries_week1)
myStudy1.parameters

Unnamed: 0,Vegetables,Meat
Monday,Carrots,Beef
Wednesday,Peas,Chicken
Saturday,Eggplants,Lamb


In [31]:
# Creation of the input

myTemplate = 'myTemplateList.py'
myMaskedParam = toolkit.getMaskedParameterList(myTemplate, tag='%MASKED_')
print(myMaskedParam)

['%MASKED_day' '%MASKED_meat' '%MASKED_vegetables']


In [32]:
for i in myGroceries_week1.index:
    # Map each placeholder of the template to its value for day i
    myParam = {
        '%MASKED_day' : i,
        '%MASKED_vegetables' : myGroceries_week1.loc[i, 'Vegetables'],
        '%MASKED_meat' : myGroceries_week1.loc[i, 'Meat']
    }
    # One input file per day, e.g. input/week1_Monday.in
    myInputFile = f'input/{myStudy1.name}_{i}.in'
    toolkit.unmask(myTemplate, myMaskedParam, myParam, myInputFile)
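
To make the masking mechanism concrete, here is a minimal sketch of how placeholder discovery and substitution could work on a template held in memory. This is not pyHTC's implementation: `get_masked_parameters`, `unmask_text` and the template content are hypothetical, since the real `myTemplateList.py` is not shown in this notebook.

```python
import re


def get_masked_parameters(template_text, tag="%MASKED_"):
    """Return the sorted, de-duplicated placeholders that start with `tag`."""
    pattern = re.escape(tag) + r"\w+"
    return sorted(set(re.findall(pattern, template_text)))


def unmask_text(template_text, values, tag="%MASKED_"):
    """Replace every placeholder by its value and fail loudly if one is left."""
    # Longest placeholders first, so a short name cannot clobber a longer one
    for placeholder in sorted(values, key=len, reverse=True):
        template_text = template_text.replace(placeholder, str(values[placeholder]))
    leftovers = get_masked_parameters(template_text, tag)
    if leftovers:
        raise KeyError(f"unfilled placeholders: {leftovers}")
    return template_text


# Hypothetical template content, for illustration only
template = "print('On %MASKED_day, buy %MASKED_vegetables and %MASKED_meat')\n"

placeholders = get_masked_parameters(template)
filled = unmask_text(template, {
    "%MASKED_day": "Monday",
    "%MASKED_vegetables": "Carrots",
    "%MASKED_meat": "Beef",
})
print(placeholders)
print(filled)
```

Raising on leftover placeholders is a design choice of this sketch: a job submitted with an unfilled `%MASKED_` token would otherwise fail only once it runs on the cluster.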

In [33]:
# Creating the submission file corresponding to the STUDY 
# NB : MULTIPLE JOBS SUBMISSION

myStudy1.submit2file(myStudy1.submit2str())

In [38]:
# One can display the submission file

myStudy1.display_subfile()

executable = /afs/cern.ch/user/a/apoyet/public/pyHTC/lets_go_groceries/groceries_list_generator.sh
arguments = $(input_file)
output = output/week1.$(ClusterId).$(ProcId).out
error = error/week1.$(ClusterId).$(ProcId).err
log = log/week1.$(ClusterId).log
universe = vanilla
queue input_file matching files /afs/cern.ch/user/a/apoyet/public/pyHTC/lets_go_groceries/input/week1_*.in
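
The submission file above is plain text, so its structure can be sketched with simple string formatting. The function below is a hypothetical helper, not the code behind `submit2str()`; it only reproduces the seven lines displayed above from the same inputs given to `StudyObj`.

```python
def build_submit_description(name, path, executable, arguments="$(input_file)",
                             output_dir="output/", error_dir="error/",
                             log_dir="log/"):
    """Return an HTCondor submit description queuing one job per input file."""
    lines = [
        f"executable = {executable}",
        f"arguments = {arguments}",
        # $(ClusterId) and $(ProcId) are expanded by HTCondor at submission time
        f"output = {output_dir}{name}.$(ClusterId).$(ProcId).out",
        f"error = {error_dir}{name}.$(ClusterId).$(ProcId).err",
        f"log = {log_dir}{name}.$(ClusterId).log",
        "universe = vanilla",
        # One job per file matching the pattern; the file name goes to $(input_file)
        f"queue input_file matching files {path}/input/{name}_*.in",
    ]
    return "\n".join(lines) + "\n"


base = "/afs/cern.ch/user/a/apoyet/public/pyHTC/lets_go_groceries"
submit_text = build_submit_description(
    "week1", base, f"{base}/groceries_list_generator.sh"
)
print(submit_text)
```

The `queue ... matching files` line is what turns a single submission file into a multiple-job submission: the number of jobs is the number of input files matching the pattern.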


In [39]:
# And...... SUBMISSION

myStudy1.submit2HTCondor()

Submitting job(s)...
3 job(s) submitted to cluster 3682722.
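
The cluster number printed here is what later ties the jobs to their output files. As an illustration, the message above can be parsed with a regular expression; `parse_submit_output` is a hypothetical helper and pyHTC may well obtain the cluster ID differently.

```python
import re


def parse_submit_output(text):
    """Extract (number of jobs, cluster ID) from the submission message."""
    match = re.search(r"(\d+) job\(s\) submitted to cluster (\d+)", text)
    if match is None:
        raise ValueError("no submission summary found")
    return int(match.group(1)), int(match.group(2))


message = "Submitting job(s)...\n3 job(s) submitted to cluster 3682722.\n"
n_jobs, cluster_id = parse_submit_output(message)
print(n_jobs, cluster_id)
```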



In [43]:
# Monitor

myStudy1.condor_q()



-- Schedd: bigbird16.cern.ch : <188.184.90.62:9618?... @ 07/31/19 11:13:29
OWNER  BATCH_NAME             SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
apoyet CMD: groceries_list   7/31 11:09      _      _      3      3 3682722.0-2

3 jobs; 0 completed, 0 removed, 3 idle, 0 running, 0 held, 0 suspended
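
If the queue has to be monitored from a script rather than by eye, the summary line is the easiest part of this output to parse. The sketch below assumes the summary keeps the format shown above; `parse_queue_summary` is a hypothetical helper, not part of pyHTC.

```python
import re


def parse_queue_summary(text):
    """Return (total, counts per state) from the condor_q summary line."""
    for line in text.splitlines():
        head = re.match(r"\s*(\d+) jobs?;(.*)", line)
        if head:
            # The rest of the line is a list of '<count> <state>' pairs
            counts = {state: int(n)
                      for n, state in re.findall(r"(\d+) (\w+)", head.group(2))}
            return int(head.group(1)), counts
    raise ValueError("no summary line found")


queue_output = (
    "apoyet CMD: groceries_list   7/31 11:09      _      _      3      3 3682722.0-2\n"
    "\n"
    "3 jobs; 0 completed, 0 removed, 3 idle, 0 running, 0 held, 0 suspended\n"
)
total, counts = parse_queue_summary(queue_output)
print(total, counts)
```

A study can then be considered out of the queue when `total` drops to 0, which is exactly what the monitoring output of the second study shows further down.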



In [42]:
# completing the DF with the file paths

myStudy1.complete_studyDF()
myStudy1.parameters

Unnamed: 0,Vegetables,Meat,ProcID,Input,Output,Error,Log,Status
Monday,Carrots,Beef,0,input/week1_Monday.in,output/week1.3682722.0.out,error/week1.3682722.0.err,log/week1.3682722.log,Running
Wednesday,Peas,Chicken,1,input/week1_Wednesday.in,output/week1.3682722.1.out,error/week1.3682722.1.err,log/week1.3682722.log,Running
Saturday,Eggplants,Lamb,2,input/week1_Saturday.in,output/week1.3682722.2.out,error/week1.3682722.2.err,log/week1.3682722.log,Running
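
The Output, Error and Log columns follow directly from the patterns in the submission file once the `$(ClusterId)` and `$(ProcId)` macros are replaced. A minimal sketch of that expansion, where `expand_macros` is a hypothetical helper and not the code behind `complete_studyDF()`:

```python
def expand_macros(pattern, cluster_id, proc_id):
    """Replace the $(ClusterId) and $(ProcId) macros in a path pattern."""
    return (pattern
            .replace("$(ClusterId)", str(cluster_id))
            .replace("$(ProcId)", str(proc_id)))


# Patterns copied from the week1 submission file
patterns = {
    "Output": "output/week1.$(ClusterId).$(ProcId).out",
    "Error": "error/week1.$(ClusterId).$(ProcId).err",
    "Log": "log/week1.$(ClusterId).log",
}

cluster_id = 3682722
job_files = [
    {column: expand_macros(pattern, cluster_id, proc_id)
     for column, pattern in patterns.items()}
    for proc_id in range(3)  # three jobs in the cluster: ProcId 0, 1 and 2
]
for row in job_files:
    print(row)
```

Note that the log pattern has no `$(ProcId)`, so all jobs of one cluster share a single log file, as the Log column shows.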


In [55]:
# check status (checks if the job failed or not)

myStudy1.check_jobs_status()
myStudy1.parameters

Unnamed: 0,Vegetables,Meat,ProcID,Input,Output,Error,Log,Status
Monday,Carrots,Beef,0,input/week1_Monday.in,output/week1.3682722.0.out,error/week1.3682722.0.err,log/week1.3682722.log,Complete
Wednesday,Peas,Chicken,1,input/week1_Wednesday.in,output/week1.3682722.1.out,error/week1.3682722.1.err,log/week1.3682722.log,Complete
Saturday,Eggplants,Lamb,2,input/week1_Saturday.in,output/week1.3682722.2.out,error/week1.3682722.2.err,log/week1.3682722.log,Complete


## Another week...

The next week, Guido was planning to do the groceries on the same days... But on Tuesday and Sunday, he has guests! This implies two new groceries lists.

In [45]:
# DF creation 

myGroceries_week2 = pd.DataFrame(index = ['Monday', 'Tuesday', 'Wednesday', 'Saturday', 'Sunday'], columns=['Vegetables', 'Meat'])

myGroceries_week2.loc['Monday'] = ['Potatos', 'Beef']
myGroceries_week2.loc['Tuesday'] = ['Carrots', 'Beef']
myGroceries_week2.loc['Wednesday'] = ['Peas', 'Chicken']
myGroceries_week2.loc['Saturday'] = ['Eggplants', 'Lamb']
myGroceries_week2.loc['Sunday'] = ['Zucchini', 'Duck']

# and print it

myGroceries_week2


Unnamed: 0,Vegetables,Meat
Monday,Potatos,Beef
Tuesday,Carrots,Beef
Wednesday,Peas,Chicken
Saturday,Eggplants,Lamb
Sunday,Zucchini,Duck


But obviously, some of the groceries are the same as in the previous week, so it is not necessary to redo all the lists. Thank God, this tool is smart enough to avoid that!

In [46]:
# Filter the jobs to be submitted

newDF, oldDF = toolkit.cross_studies(myGroceries_week2,myGroceries_week1)
newDF

Unnamed: 0,Vegetables,Meat
Monday,Potatos,Beef
Sunday,Zucchini,Duck


In [47]:
# The 'removed' jobs are also stored, with the index of the previous study for convenience
oldDF

Unnamed: 0,Vegetables,Meat,Old_Index
Tuesday,Carrots,Beef,Monday
Wednesday,Peas,Chicken,Wednesday
Saturday,Eggplants,Lamb,Saturday
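
Note that the comparison is made on the parameter values and not on the day: Tuesday of week 2 reuses the Monday list of week 1 because both are carrots and beef. The sketch below reproduces that behaviour with plain dictionaries instead of DataFrames, to stay dependency-free. `cross_studies_sketch` is a hypothetical stand-in and says nothing about how `toolkit.cross_studies` is actually written.

```python
def cross_studies_sketch(new_study, old_study):
    """Split new_study into rows to run and rows already present in old_study.

    Both arguments map an index (the day) to a tuple of parameter values.
    """
    # Remember which old index first carried each combination of parameters
    seen = {}
    for old_index, params in old_study.items():
        seen.setdefault(params, old_index)

    to_run, reusable = {}, {}
    for index, params in new_study.items():
        if params in seen:
            # Keep the old index so the existing results can be found later
            reusable[index] = params + (seen[params],)
        else:
            to_run[index] = params
    return to_run, reusable


week1 = {
    "Monday": ("Carrots", "Beef"),
    "Wednesday": ("Peas", "Chicken"),
    "Saturday": ("Eggplants", "Lamb"),
}
week2 = {
    "Monday": ("Potatos", "Beef"),
    "Tuesday": ("Carrots", "Beef"),
    "Wednesday": ("Peas", "Chicken"),
    "Saturday": ("Eggplants", "Lamb"),
    "Sunday": ("Zucchini", "Duck"),
}

to_run, reusable = cross_studies_sketch(week2, week1)
print(to_run)
print(reusable)
```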


In [48]:
# Let's define our second study

myName = 'week2'
myPath = '/afs/cern.ch/user/a/apoyet/public/pyHTC/lets_go_groceries'
myExe = '/afs/cern.ch/user/a/apoyet/public/pyHTC/lets_go_groceries/groceries_list_generator.sh'
mySubFileName = 'mySubFile'

myStudy2 = StudyObj(myName, myPath, myExe, mySubFileName, arguments='$(input_file)', output_dir='output/', error_dir='error/',
                   log_dir = 'log/')


In [49]:
# Define the study

myStudyDF = myStudy2.define_study(newDF)
myStudy2.parameters

Unnamed: 0,Vegetables,Meat
Monday,Potatos,Beef
Sunday,Zucchini,Duck


In [50]:
# Creation of the input

myTemplate = 'myTemplateList.py'
myMaskedParam = toolkit.getMaskedParameterList(myTemplate, tag='%MASKED_')
print(myMaskedParam)

for i in newDF.index:
    # Map each placeholder of the template to its value for day i
    myParam = {
        '%MASKED_day' : i,
        '%MASKED_vegetables' : newDF.loc[i, 'Vegetables'],
        '%MASKED_meat' : newDF.loc[i, 'Meat']
    }
    # One input file per new job, e.g. input/week2_Monday.in
    myInputFile = f'input/{myStudy2.name}_{i}.in'
    toolkit.unmask(myTemplate, myMaskedParam, myParam, myInputFile)

['%MASKED_day' '%MASKED_meat' '%MASKED_vegetables']


In [51]:
# Creating the submission file corresponding to the STUDY 
# NB : MULTIPLE JOBS SUBMISSION

myStudy2.submit2file(myStudy2.submit2str())

In [52]:
# One can display the submission file

myStudy2.display_subfile()

executable = /afs/cern.ch/user/a/apoyet/public/pyHTC/lets_go_groceries/groceries_list_generator.sh
arguments = $(input_file)
output = output/week2.$(ClusterId).$(ProcId).out
error = error/week2.$(ClusterId).$(ProcId).err
log = log/week2.$(ClusterId).log
universe = vanilla
queue input_file matching files /afs/cern.ch/user/a/apoyet/public/pyHTC/lets_go_groceries/input/week2_*.in


In [53]:
# And...... SUBMISSION

myStudy2.submit2HTCondor()

Submitting job(s)..
2 job(s) submitted to cluster 3682723.



In [59]:
# Monitor

myStudy2.condor_q()



-- Schedd: bigbird16.cern.ch : <188.184.90.62:9618?... @ 07/31/19 11:32:31
OWNER BATCH_NAME      SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS

0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended



In [56]:
# completing the DF with the file paths

myStudy2.complete_studyDF()
myStudy2.parameters

Unnamed: 0,Vegetables,Meat,ProcID,Input,Output,Error,Log,Status
Monday,Potatos,Beef,0,input/week2_Monday.in,output/week2.3682723.0.out,error/week2.3682723.0.err,log/week2.3682723.log,Running
Sunday,Zucchini,Duck,1,input/week2_Sunday.in,output/week2.3682723.1.out,error/week2.3682723.1.err,log/week2.3682723.log,Running


In [57]:
# retrieve all the results

myStudy2.retrieve_results(myStudy1, oldDF)
myStudy2.parameters

Unnamed: 0,Vegetables,Meat,ProcID,Input,Output,Error,Log,Status
Monday,Potatos,Beef,0,input/week2_Monday.in,output/week2.3682723.0.out,error/week2.3682723.0.err,log/week2.3682723.log,Running
Sunday,Zucchini,Duck,1,input/week2_Sunday.in,output/week2.3682723.1.out,error/week2.3682723.1.err,log/week2.3682723.log,Running
Tuesday,Carrots,Beef,0,input/week1_Monday.in,output/week1.3682722.0.out,error/week1.3682722.0.err,log/week1.3682722.log,Complete
Wednesday,Peas,Chicken,1,input/week1_Wednesday.in,output/week1.3682722.1.out,error/week1.3682722.1.err,log/week1.3682722.log,Complete
Saturday,Eggplants,Lamb,2,input/week1_Saturday.in,output/week1.3682722.2.out,error/week1.3682722.2.err,log/week1.3682722.log,Complete
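
The table above is the week 2 study extended with the week 1 rows that were reused, relabelled with their week 2 day. A minimal sketch of that merge, again with plain dictionaries and only two columns; `retrieve_results_sketch` is hypothetical and is not the code behind `retrieve_results()`.

```python
def retrieve_results_sketch(current_rows, previous_rows, reused):
    """Append the reused rows of a previous study to the current one.

    current_rows and previous_rows map an index to a dict of columns;
    reused maps each index of the current study to the index it reuses.
    """
    merged = {index: dict(row) for index, row in current_rows.items()}
    for new_index, old_index in reused.items():
        # Copy the old row as is: its files still belong to the old cluster
        merged[new_index] = dict(previous_rows[old_index])
    return merged


week1_rows = {
    "Monday": {"Output": "output/week1.3682722.0.out", "Status": "Complete"},
    "Wednesday": {"Output": "output/week1.3682722.1.out", "Status": "Complete"},
    "Saturday": {"Output": "output/week1.3682722.2.out", "Status": "Complete"},
}
week2_rows = {
    "Monday": {"Output": "output/week2.3682723.0.out", "Status": "Running"},
    "Sunday": {"Output": "output/week2.3682723.1.out", "Status": "Running"},
}
reused = {"Tuesday": "Monday", "Wednesday": "Wednesday", "Saturday": "Saturday"}

all_rows = retrieve_results_sketch(week2_rows, week1_rows, reused)
for day, row in all_rows.items():
    print(day, row)
```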


In [25]:
# check status (checks if the job failed or not)

myStudy2.check_jobs_status()
myStudy2.parameters

Unnamed: 0,Vegetables,Meat,ProcID,Input,Output,Error,Log,Status
Monday,Potatos,Beef,0,input/week2_Monday.in,output/week2.3682710.0.out,error/week2.3682710.0.err,log/week2.3682710.log,Complete
Sunday,Zucchini,Duck,1,input/week2_Sunday.in,output/week2.3682710.1.out,error/week2.3682710.1.err,log/week2.3682710.log,Complete
Tuesday,Carrots,Beef,0,input/week1_Monday.in,output/week1.3682709.0.out,error/week1.3682709.0.err,log/week1.3682709.log,Complete
Wednesday,Peas,Chicken,1,input/week1_Wednesday.in,output/week1.3682709.1.out,error/week1.3682709.1.err,log/week1.3682709.log,Complete
Saturday,Eggplants,Lamb,2,input/week1_Saturday.in,output/week1.3682709.2.out,error/week1.3682709.2.err,log/week1.3682709.log,Complete
