## General setup

To make sure things are working and `hepdata_lib` is available, run the following command:

In [1]:
import hepdata_lib

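
If the import fails with `ModuleNotFoundError`, `hepdata_lib` is not installed in the environment used by the notebook kernel. It can be installed from PyPI with `pip install hepdata_lib`. The sketch below checks for the package without raising an exception and uses only the standard library. `module_available` is a hypothetical helper name.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the top-level module `name` can be imported."""
    return importlib.util.find_spec(name) is not None


if module_available("hepdata_lib"):
    print("hepdata_lib is available")
else:
    # Install into the same environment that the notebook kernel uses.
    print("hepdata_lib not found, try: pip install hepdata_lib", file=sys.stderr)
```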

## Creating your HEPData submission

The `Submission` object represents the whole HEPData entry, so it carries the top-level metadata that applies equally to all the tables and variables you may want to enter. The object is also used to create the physical submission files that you will upload to the HEPData web interface.

When using `hepdata_lib` to make an entry, you always need to create a `Submission` object. Let's do that now, and then add data to it step by step:

In [None]:
from hepdata_lib import Submission
submission = Submission()

In general, a `Submission` should contain details on the analysis, such as its abstract, as well as links to the publication. The abstract should be provided as a plain text file. `inspire` records have a dedicated record ID (`add_record_id`), while links to `arXiv` etc. should be added as plain hyperlinks (`add_link`).

In [None]:
submission.read_abstract("abstract.txt")
#submission.add_link("Webpage with all figures and tables", "https://cms-results.web.cern.ch/cms-results/public-results/publications/B2G-16-029/")
#submission.add_link("arXiv", "http://arxiv.org/abs/arXiv:1802.09407")
#submission.add_record_id(1657397, "inspire")
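
In the generated `submission.yaml`, this metadata goes into the first YAML document. Assuming the standard HEPData submission format, the abstract is stored as `comment`, links are stored as `additional_resources` entries with `description` and `location`, and record IDs are stored as `record_ids` entries with `id` and `type`. The sketch below builds that structure as a plain dictionary. It illustrates the format only and is not hepdata_lib's internal code. `build_header` is a hypothetical helper, and the abstract string is a placeholder.

```python
import json


def build_header(abstract, links=(), record_ids=()):
    """Assemble a dict mirroring the first document of a HEPData submission.yaml.

    links: iterable of (description, url) pairs
    record_ids: iterable of (id, type) pairs, e.g. (1657397, "inspire")
    """
    return {
        "comment": abstract.strip(),
        "additional_resources": [
            {"description": desc, "location": url} for desc, url in links
        ],
        "record_ids": [{"id": int(rid), "type": rtype} for rid, rtype in record_ids],
    }


header = build_header(
    "Placeholder abstract text.\n",  # in the notebook this text comes from abstract.txt
    links=[("arXiv", "http://arxiv.org/abs/arXiv:1802.09407")],
    record_ids=[(1657397, "inspire")],
)
print(json.dumps(header, indent=2))
```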

## Adding CalcHEP model and LHE headers


## Adding a table/figure

In HEPData, figures and tables are both represented as `Table` objects. This example reads a plain text file that contains the signal efficiency times acceptance as a function of resonance mass for different signal models. The file has been uploaded to the `example_inputs` directory. For your own submission, create a new directory for your input files, e.g. named after the analysis identifier.

Let's have a look at the file:

In [None]:
!head example_inputs/effacc_signal.txt



The first column is the mass value, the other columns contain the efficiency times acceptance values.

Let's create the table/figure. First, we need to give it a name, which is usually just its identifier in the paper, here "Additional Figure 1". The table also needs a description, which is usually the caption. You also need to describe its location, i.e. where to find it in the publication:

In [None]:
from hepdata_lib import Table
table = Table("Additional Figure 1")
table.description = "Signal selection efficiency times acceptance as a function of resonance mass for a spin-2 bulk graviton decaying to WW and a spin-1 W' decaying to WZ."
table.location = "Data from additional Figure 1"


Now we need to provide more information on what is actually shown, which is done via `keywords`. The allowed values are listed in the HEPData submission documentation:
- [Observables](https://hepdata-submission.readthedocs.io/en/latest/keywords/observables.html)
- [Phrases](https://hepdata-submission.readthedocs.io/en/latest/keywords/phrases.html)
- [Particles](https://hepdata-submission.readthedocs.io/en/latest/keywords/partlist.html)

In [None]:
table.keywords["observables"] = ["ACC", "EFF"]
table.keywords["reactions"] = ["P P --> GRAVITON --> W+ W-", "P P --> WPRIME --> W+/W- Z0"]
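
`table.keywords` behaves like a dictionary that maps a keyword name to a list of values. In the table entry of `submission.yaml`, the HEPData format stores these as a list of `name`/`values` pairs. The following is a minimal sketch of that conversion, assuming this layout. `keywords_to_list` is a hypothetical helper and is not part of hepdata_lib.

```python
def keywords_to_list(keywords):
    """Convert a {name: [values]} mapping into HEPData's list-of-dicts layout."""
    return [{"name": name, "values": list(values)} for name, values in keywords.items()]


keywords = {
    "observables": ["ACC", "EFF"],
    "reactions": ["P P --> GRAVITON --> W+ W-", "P P --> WPRIME --> W+/W- Z0"],
}
for entry in keywords_to_list(keywords):
    print(entry)
```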

Let's read in the file. For this purpose, `numpy` is very handy. Since the first two rows are the header, we skip them:

In [None]:
import numpy as np
data = np.loadtxt("example_inputs/effacc_signal.txt", skiprows=2)
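
The same parsing can also be done with the standard library, if `numpy` is not available. The approach is to skip the two header rows, split each remaining line on whitespace, and convert each field with `float()`, which also accepts the string `NaN`. This sketch assumes whitespace-separated columns. `load_columns` is a hypothetical name, the header lines are made up, and the numbers are copied from the array printed further down.

```python
import io


def load_columns(stream, skiprows=2):
    """Parse whitespace-separated numeric rows, skipping `skiprows` header lines."""
    rows = []
    for lineno, line in enumerate(stream):
        if lineno < skiprows or not line.strip():
            continue  # header line or blank line
        rows.append([float(field) for field in line.split()])
    return rows


# Hypothetical header lines; the numbers are taken from the printed array.
sample = io.StringIO(
    "# mass  BulkG  Wprime\n"
    "# [GeV]\n"
    "1000 0.4651 0.45136\n"
    "4000 0.564538153 NaN\n"
)
data_rows = load_columns(sample)
masses = [row[0] for row in data_rows]  # first column, like data[:,0]
print(masses)
```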

`numpy` stores the content as a two-dimensional array. You can see that the entry labelled `NaN` in the file is correctly read in as `nan`:

In [None]:
from __future__ import print_function
print(data)

[[1.00000000e+03 4.65100000e-01 4.51360000e-01]
 [1.20000000e+03 5.03360000e-01 5.10900000e-01]
 [1.40000000e+03 5.12600000e-01 5.40160000e-01]
 [1.60000000e+03 5.24740000e-01 5.51300000e-01]
 [1.80000000e+03 5.31000000e-01 5.67240000e-01]
 [2.00000000e+03 5.39100000e-01 5.72800000e-01]
 [2.50000000e+03 5.49430894e-01 5.85602410e-01]
 [3.00000000e+03 5.53780000e-01 5.89520000e-01]
 [3.50000000e+03 5.62160000e-01 6.03240000e-01]
 [4.00000000e+03 5.64538153e-01            nan]
 [4.50000000e+03 5.66820000e-01 5.99780000e-01]]
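
Before passing the array on, it can help to locate the missing entries explicitly so that you can check them against the paper. With `numpy`, this can be done with `np.argwhere(np.isnan(data))`. For the full array above, that returns row 9, column 2. The sketch below does the same on plain lists, using three rows copied from the output above. `find_nan` is a hypothetical helper.

```python
import math


def find_nan(rows):
    """Return (row, column) indices of all NaN entries in a list of rows."""
    return [
        (i, j)
        for i, row in enumerate(rows)
        for j, value in enumerate(row)
        if math.isnan(value)
    ]


rows = [
    [3500.0, 0.56216, 0.60324],
    [4000.0, 0.564538153, float("nan")],
    [4500.0, 0.56682, 0.59978],
]
print(find_nan(rows))
```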


We will now use this for our `Variable` definitions. The x-axis is usually the independent variable (`is_independent=True`), whereas the others are dependent (i.e. functions of the former). You also need to declare whether each variable is binned and what its units are. As with the `keywords` above, it is important to provide additional context. Here this is done with qualifiers, using the observable and particle names from the documentation linked above. The values assigned are just column slices of the `data` array:

In [None]:
from hepdata_lib import Variable
d = Variable("Resonance mass", is_independent=True, is_binned=False, units="GeV")
d.values = data[:,0]

BulkG = Variable("Efficiency times acceptance", is_independent=False, is_binned=False, units="")
BulkG.values = data[:,1]
BulkG.add_qualifier("Efficiency times acceptance", "Bulk graviton --> WW")
BulkG.add_qualifier("SQRT(S)", 13, "TeV")

Wprime = Variable("Efficiency times acceptance", is_independent=False, is_binned=False, units="")
Wprime.values = data[:,2]
Wprime.add_qualifier("Efficiency times acceptance", "Wprime --> WZ")
Wprime.add_qualifier("SQRT(S)", 13, "TeV")

table.add_variable(d)
table.add_variable(BulkG)
table.add_variable(Wprime)
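
Each table is written to its own YAML data file. Assuming the standard HEPData data-file format, independent variables go under `independent_variables` and dependent ones under `dependent_variables`. Each variable has a `header` (name and units), optional `qualifiers`, and a list of `values`. The sketch below assembles that structure for the first two mass points as a plain dictionary. It illustrates the target format and is not how hepdata_lib builds the file internally. `variable_entry` is a hypothetical helper.

```python
def variable_entry(name, values, units="", qualifiers=()):
    """Build one HEPData-style variable: header, optional qualifiers, unbinned values."""
    entry = {"header": {"name": name}}
    if units:
        entry["header"]["units"] = units
    quals = []
    for q_name, q_value, q_units in qualifiers:  # each qualifier: (name, value, units)
        qual = {"name": q_name, "value": q_value}
        if q_units:
            qual["units"] = q_units
        quals.append(qual)
    if quals:
        entry["qualifiers"] = quals
    entry["values"] = [{"value": v} for v in values]
    return entry


table_file = {
    "independent_variables": [variable_entry("Resonance mass", [1000.0, 1200.0], units="GeV")],
    "dependent_variables": [
        variable_entry(
            "Efficiency times acceptance",
            [0.4651, 0.50336],
            qualifiers=[
                ("Efficiency times acceptance", "Bulk graviton --> WW", ""),
                ("SQRT(S)", 13, "TeV"),
            ],
        )
    ],
}
print(table_file["dependent_variables"][0]["qualifiers"])
```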

In case of a plot, you should also add the original figure itself. `hepdata_lib` will take care of creating the thumbnail as well. Just add the figure as below.

*WARNING*: This needs `ImageMagick` to be installed (this is the case when running on Binder and SWAN with LCG_94 or later). Executing the following line will fail if it is missing. In this case, comment out this line and restart from the top.

In [None]:
table.add_image("example_inputs/signalEffVsMass.pdf")
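
Because `add_image` fails without ImageMagick, it can be worth checking for it before running the cell. ImageMagick 7 installs a `magick` executable, while older versions provide `convert`. The sketch below uses `shutil.which` to look for either one on the `PATH`. This is only a rough pre-flight check, since the executable that hepdata_lib actually calls is not stated here. `find_imagemagick` is a hypothetical helper.

```python
import shutil


def find_imagemagick():
    """Return the path of an ImageMagick executable on PATH, or None if none is found."""
    # Caveat: on Windows, "convert" may be an unrelated system tool.
    for exe in ("magick", "convert"):
        path = shutil.which(exe)
        if path is not None:
            return path
    return None


tool = find_imagemagick()
print(tool if tool else "ImageMagick not found; skip add_image or install it first")
```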

This is all that's needed for the table/figure. We still need to add it to the submission:

In [None]:
submission.add_table(table)

Once you've added all tables/figures and the general submission details, you should add a few more keywords to all tables for better identification and searchability, e.g. the centre-of-mass energy:

In [None]:
for table in submission.tables:
    table.keywords["cmenergies"] = [13000]
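
Note that the `cmenergies` keyword is given in GeV, whereas the `SQRT(S)` qualifier above uses TeV. A minimal sketch of setting the keyword on every table from a value in TeV is shown below. Plain dictionaries stand in for `Table` objects here, and `add_cmenergy` is a hypothetical helper.

```python
def add_cmenergy(tables, sqrt_s_tev):
    """Set the cmenergies keyword (in GeV) on every table, keeping other keywords."""
    for tbl in tables:
        tbl.setdefault("keywords", {})["cmenergies"] = [sqrt_s_tev * 1000]


tables = [
    {"name": "Additional Figure 1", "keywords": {"observables": ["ACC", "EFF"]}},
    {"name": "Hypothetical second table"},
]
add_cmenergy(tables, 13)
print([tbl["keywords"]["cmenergies"] for tbl in tables])
```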

Now it's time to create the submission for the upload. Here, we choose `example_output` as output directory:

In [None]:
outdir = "example_output"
submission.create_files(outdir, remove_old=True)

In the working directory, you will now find a `submission.tar.gz` file, which you can use for uploading to your HEPData sandbox:

In [None]:
!ls submission.tar.gz

submission.tar.gz
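
Before uploading, you can list the archive contents with Python's `tarfile` module to confirm that the YAML files and images were packed. So that the sketch runs anywhere, it builds a throwaway archive of empty placeholder files in a temporary directory. For the real check, pass `"submission.tar.gz"` to `list_archive`, which is a hypothetical helper name.

```python
import os
import tarfile
import tempfile


def list_archive(path):
    """Return the sorted member names of a (possibly gzip-compressed) tar archive."""
    with tarfile.open(path, "r:*") as archive:
        return sorted(archive.getnames())


# Self-contained demo: pack two empty placeholder files into a .tar.gz and list them.
with tempfile.TemporaryDirectory() as tmp:
    names = ("submission.yaml", "additional_figure_1.yaml")
    for name in names:
        open(os.path.join(tmp, name), "w").close()
    archive_path = os.path.join(tmp, "demo.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in names:
            archive.add(os.path.join(tmp, name), arcname=name)
    members = list_archive(archive_path)

print(members)
```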


And the `example_output` directory will contain the generated `yaml` files:

In [None]:
!ls example_output

additional_figure_1.yaml  signalEffVsMass.png       submission.yaml           thumb_signalEffVsMass.png
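
The listing shows how names map to files. The table "Additional Figure 1" became `additional_figure_1.yaml`, and `signalEffVsMass.pdf` produced `signalEffVsMass.png` plus a `thumb_` version. The sketch below reproduces that mapping as a sanity check for an output directory. The rules are inferred from this single example, and hepdata_lib's actual handling of other characters may differ. `expected_outputs` is a hypothetical helper.

```python
import os


def expected_outputs(table_names, image_paths):
    """Guess the files create_files() writes, based on the naming seen in this example."""
    files = {"submission.yaml"}
    for name in table_names:
        files.add(name.lower().replace(" ", "_") + ".yaml")
    for path in image_paths:
        stem = os.path.splitext(os.path.basename(path))[0]
        files.add(stem + ".png")
        files.add("thumb_" + stem + ".png")
    return sorted(files)


print(expected_outputs(["Additional Figure 1"], ["example_inputs/signalEffVsMass.pdf"]))
```

To compare the guess against the real output, check it against `sorted(os.listdir("example_output"))`.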
