Merge a9a86fd into 9aa269a
Burak Han Alver committed Oct 4, 2016
2 parents 9aa269a + a9a86fd commit 0617266
Showing 1 changed file (README.md) with 31 additions and 60 deletions.

[![Code Quality](https://api.codacy.com/project/badge/Grade/a4d521b4dd9c49058304606714528538)](https://www.codacy.com/app/jeremy_7/Submit4DN)
[![PyPI version](https://badge.fury.io/py/Submit4DN.svg)](https://badge.fury.io/py/Submit4DN)

The Submit4DN package is written by the [4DN Data Coordination and Integration Center](http://dcic.4dnucleome.org/) for data submitters from the 4DN Network. Please [contact us](mailto:4DN.DCIC.support@hms-dbmi.atlassian.net) to get access to the system, or if you have any questions or suggestions.

## Installing the package

The Submit4DN package is registered with PyPI so installation is as simple as:

```
pip3 install submit4dn
```

Once installed, follow the directions below.


## Connection
To be able to use the provided tools, you need to have a secure key to access the REST application.
If you do not have a secure key, please contact [4DN Data Wranglers](mailto:4DN.DCIC.support@hms-dbmi.atlassian.net)
to get an account and to learn how to generate a key. Place your key in a json file in the following format.

```
{
  "default": {
    "key": "TheConnectionKey",
    "secret": "TheConnectionSecret",
    "server": "www.The4dnWebsite.com"
  }
}
```

The default path for your keyfile is `/Users/<user>/keypairs.json`.
If you prefer to use a different file location or a different key name (not "default"), you can specify your key with the `--keyfile` and `--key` parameters:

import_data --keyfile nameoffile.json --key NotDefault
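
For illustration, here is a minimal Python sketch of how a key entry could be looked up in such a file; it assumes the json layout shown above, and the `read_keypair` helper is made up for this example rather than part of the package:

```python
# Illustrative sketch only, assuming the keypairs.json layout shown above;
# not Submit4DN's actual code.
import json
import os

def read_keypair(keyfile=os.path.expanduser("~/keypairs.json"), key="default"):
    """Return the key, secret and server stored under the given key name."""
    with open(keyfile) as f:
        keypairs = json.load(f)
    entry = keypairs[key]
    return entry["key"], entry["secret"], entry["server"]

# Roughly what --keyfile nameoffile.json --key NotDefault asks the tools to do:
# key_id, secret, server = read_keypair("nameoffile.json", "NotDefault")
```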

## Generating data submission forms
To create the data submission xls forms, you can use the `wranglertools.get_field_info` method.
It will accept the following parameters:

--type use for each sheet that you want to add to the excel workbook
--descriptions adds the descriptions in the second line (by default True)
--comments adds the comments in the fourth line (by default False)
--outfile change the default file name "fields.xls" to a specified one
--order create an ordered and filtered version of the excel (by default True)

Complete list of sheets:
~~~~
get_field_info --type Publication --type Document --type Vendor --type Protocol --type BiosampleCellCulture --type Biosource --type Enzyme --type Construct --type TreatmentChemical --type TreatmentRnai --type Modification --type Biosample --type FileFastq --type FileSet --type IndividualHuman --type IndividualMouse --type ExperimentHiC --type ExperimentCaptureC --type Target --type GenomicRegion --type ExperimentSet --type Image --comments --outfile AllItems.xls
~~~~

Examples generating a single sheet:
```
get_field_info --type Biosample
get_field_info --type Biosample --comments
get_field_info --type Biosample --comments --outfile biosample.xls
```

## Specifications for fields.xls
In fields.xls, each excel sheet is named after an object type, like ExperimentHiC, Biosample, Construct, Protocol...

*Each sheet has at least 4 rows that begin with a #*
1) Field name
2) Field type
3) Field description
4) Choices for controlled vocabulary (some fields only accept a value from a list of selections, like experiment type). NOTE: if you add the optional --comments argument, the comments will also be included in this row.

The first entry will start from row 4, and column 2.

Each field can be of a certain type: string, number/integer, or array (list). If the type is integer, number, or array, it will be indicated with the field name: field:number, field:int, field:array. If the field is a string, you will only see the field name.
If the field is an array (field:array), you may enter a single item, or multiple items separated by commas.

field:array
item1,item2,item3,item4
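
As an illustration of the `field:type` naming convention, here is a reader's Python sketch of how a cell could be interpreted; it is not the package's actual parser, and the field names in the examples are made up:

```python
# Illustrative sketch of the "field:type" convention described above;
# not Submit4DN's actual parser.
def parse_cell(field_name, value):
    """Return (plain field name, converted value) based on the :type suffix."""
    if ":" not in field_name:
        return field_name, value                       # plain string field
    name, ftype = field_name.split(":", 1)
    if ftype in ("int", "integer"):
        return name, int(value)
    if ftype == "number":
        return name, float(value)
    if ftype == "array":
        # a single item or a comma-separated list, e.g. "item1,item2,item3,item4"
        return name, [item.strip() for item in str(value).split(",")]
    return name, value

# Hypothetical examples:
# parse_cell("read_length:int", "100")        -> ("read_length", 100)
# parse_cell("documents:array", "doc1,doc2")  -> ("documents", ["doc1", "doc2"])
```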

Some objects contain fields that are grouped together, called embedded sub-objects. For example, "experiment_relations" has 2 fields called "relationship_type" and "experiment". In the field names you will see:
* experiment_relations.relationship_type
* experiment_relations.experiment

If the embedded sub-object is a list, you can increase the number of items by creating new columns and appending numbers to the field names (see the sketch after this list):
* experiment_relations.relationship_type-1
* experiment_relations.experiment-1
* experiment_relations.relationship_type-2
* experiment_relations.experiment-2
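
Below is a reader's Python sketch of how such numbered, dotted column names could be grouped back into a list of sub-objects; the grouping function and the example values are made up for illustration and are not Submit4DN's own import logic.

```python
# Illustrative sketch of the embedded sub-object column convention described
# above; not Submit4DN's own import logic. Example values are hypothetical.
from collections import defaultdict

def group_subobjects(row):
    """row: {column name: cell value} -> {field: value or list of sub-objects}"""
    sub = defaultdict(dict)
    result = {}
    for column, value in row.items():
        if "." not in column:
            result[column] = value
            continue
        parent, child = column.split(".", 1)
        index = 0
        # an optional numeric suffix (-1, -2, ...) distinguishes items of the list
        if "-" in child and child.rsplit("-", 1)[1].isdigit():
            child, suffix = child.rsplit("-", 1)
            index = int(suffix)
        sub[parent].setdefault(index, {})[child] = value
    for parent, items in sub.items():
        result[parent] = [items[i] for i in sorted(items)]
    return result

# group_subobjects({
#     "experiment_relations.relationship_type-1": "controlled by",
#     "experiment_relations.experiment-1": "testlab:expHiC002",
#     "experiment_relations.relationship_type-2": "derived from",
#     "experiment_relations.experiment-2": "testlab:expHiC003",
# })
# -> {"experiment_relations": [
#        {"relationship_type": "controlled by", "experiment": "testlab:expHiC002"},
#        {"relationship_type": "derived from", "experiment": "testlab:expHiC003"}]}
```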


**Aliases**

When you create several new items at the same time, it is not possible to reference one item in another with an accession or uuid, since these are not assigned yet. For example, if you have a new experiment with a new biosample in the same excel workbook (different sheets), what would you put in the biosample field of the experiments sheet? To overcome this problem, a lab-specific identifier called an alias is used. The "aliases" field accepts multiple entries in the form "labname:refname,labname:refname2" (e.g. testlab:expHic001). If you add lab:biosample1 to the aliases field of a biosample, you can then use this value in the biosample field of an experiment.
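
For instance, shown here as Python dicts with hypothetical field values, the same alias written in the biosample sheet is what goes into the experiment sheet:

```python
# Hypothetical rows from two sheets of the same workbook, shown as dicts.
biosample_row = {
    "aliases": "lab:biosample1",        # the alias is defined in the Biosample sheet
}
experiment_row = {
    "aliases": "testlab:expHic001",
    "biosample": "lab:biosample1",      # ...and referenced in the ExperimentHiC sheet
}
```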

## Data submission
After you fill out the data submission forms, you can use the `wranglertools.import_data` method to submit the metadata. The method can be used both to create new metadata items and to patch fields of existing items.

When you import file data, the status has to be "uploading". If the item has some other status, like "uploaded", and you later patch the status to "uploading", you will not be able to upload the file, because the dedicated url for the aws upload is only created during the initial post when the status is "uploading".
import_data filename.xls

**Uploading vs Patching**

If there are uuid, alias, @id, or accession fields in the xls form that match existing entries in the database, you will be asked if you want to PATCH each object.
You can use the `--patchall` flag if you want to patch ALL objects in your document without being prompted for each one.

If no object identifiers are found in the document, you need to use `--update` for POSTing to occur.
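
As a summary of how identifiers and the `--patchall` / `--update` flags interact, here is a reader's Python sketch of the behaviour described above; the function and its arguments are made up for illustration and are not import_data's actual code.

```python
# A reader's sketch of the POST-vs-PATCH behaviour described above;
# not import_data's actual implementation.
def decide_action(item, exists_on_server, patchall=False, update=False):
    """item: a dict built from one row of the xls form."""
    has_identifier = any(k in item for k in ("uuid", "aliases", "@id", "accession"))
    if has_identifier and exists_on_server:
        if patchall:
            return "PATCH"                 # --patchall: patch without asking
        answer = input("Patch this existing object? [y/N] ")
        return "PATCH" if answer.lower() == "y" else "skip"
    if update:
        return "POST"                      # --update: allow new items to be posted
    return "skip"                          # without --update nothing new is posted
```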

To upload objects with attachments, use the column titled "attachment" containing the path to the file you wish to attach.
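
For example, a row with an attachment could look like this (a hypothetical row shown as a Python dict; the alias and path are made up):

```python
# Hypothetical row with an attachment column: the cell holds a local file path.
document_row = {
    "aliases": "lab:protocol_document1",
    "attachment": "/path/to/protocol_document.pdf",
}
```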

# Development
Note: if you are attempting to run the scripts in the wranglertools directory without installing the package, then in order to get the correct sys.path you need to run the scripts from the parent directory using the following command format:

python3 -m wranglertools.get_field_info --type Biosource
python3 -m wranglertools.import_data filename.xls
