Tasks:
- get the provenance of the existing overall_scores file
- upload the new overall_scores data to Agora's live folder
- get the provenance of the manifest file and update its overall_scores entry
- update the manifest file and its provenance

In [20]:
import synapseclient
from synapseclient import File, Activity
import pandas as pd

In [21]:
syn = synapseclient.Synapse()
syn.login(silent=True)

In [22]:
data_manifest = 'syn13363290'
agora_live_data = 'syn12177492'
overall_scores_file = '../output/overall_scores.json'
old_overall_scores = 'syn25741025'

Get provenance for the old overall_scores.json in Agora:

In [23]:
overall_scores_provenance = syn.getProvenance(old_overall_scores)['used']
overall_scores_provenance = [i['reference']['targetId'] for i in overall_scores_provenance]
print(overall_scores_provenance)

['syn25575153', 'syn22758536']
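
The list comprehension above assumes every entry under 'used' carries a 'reference' key. Provenance can also list URLs, which have no 'reference', so a slightly more defensive version is sketched below. The helper name `used_entity_ids`, the sample dictionary, the version number in it, and the example URL are all illustrative assumptions, not part of the original notebook.

```python
def used_entity_ids(provenance):
    """Return the Synapse IDs of the entities listed under 'used'.

    Entries without a 'reference' key (for example URLs) are skipped.
    """
    return [
        item["reference"]["targetId"]
        for item in provenance.get("used", [])
        if "reference" in item
    ]


# Hand-written stand-in for the dictionary returned by syn.getProvenance().
# The version number and the URL entry are hypothetical.
sample_provenance = {
    "used": [
        {"reference": {"targetId": "syn25575153", "targetVersionNumber": 1}},
        {"reference": {"targetId": "syn22758536"}},
        {"url": "https://example.org/script.py", "name": "script.py"},
    ]
}

print(used_entity_ids(sample_provenance))
```

In the notebook this would be called as `used_entity_ids(syn.getProvenance(old_overall_scores))`.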


Update overall_scores in Agora with the new file and the correct provenance:  

In [24]:
overall_scores = File(overall_scores_file, parent=agora_live_data)
overall_scores = syn.store(overall_scores, used=overall_scores_provenance)


##################################################
 Uploading file to Synapse storage 
##################################################



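It's useful to make sure the id and the versionNumber are correct: the id should still be that of the old overall_scores entity, since the upload should add a new version rather than create a new entity.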

In [None]:
print(overall_scores) # {'id' : 'syn25741025', 'versionNumber' : 2}
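
The same check can be made explicit instead of being read off the printed output. This is a minimal sketch: `check_upload` is a hypothetical helper, the dictionary stands in for the object returned by `syn.store`, and the previous version of 1 is an assumption.

```python
def check_upload(entity, expected_id, previous_version=None):
    """Raise ValueError if the stored entity is not the one we meant to update."""
    if entity["id"] != expected_id:
        raise ValueError(
            f"expected {expected_id}, got {entity['id']}: a new entity was created"
        )
    if previous_version is not None and entity["versionNumber"] <= previous_version:
        raise ValueError(f"version did not advance past {previous_version}")
    return entity["id"], entity["versionNumber"]


# Stand-in for the return value of syn.store(), using the values noted above.
stored = {"id": "syn25741025", "versionNumber": 2}
print(check_upload(stored, expected_id="syn25741025", previous_version=1))
```

In the notebook the call would be `check_upload(overall_scores, old_overall_scores)`.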

overall_scores now holds the return value of the upload, which contains the 'id' and the 'versionNumber'.  Here we get the provenance of the data_manifest and update it with the correct version of overall_scores:

In [31]:
manifest_provenance = syn.getProvenance(data_manifest)
manifest_provenance = [i['reference'] for i in manifest_provenance['used']]

# update manifest_provenance
for i in manifest_provenance:
    if i['targetId'] == overall_scores['id']:
        i['targetVersionNumber'] = overall_scores['versionNumber']
print(manifest_provenance)

[{'targetId': 'syn12177499', 'targetVersionNumber': 27}, {'targetId': 'syn12548902', 'targetVersionNumber': 29}, {'targetId': 'syn12616884', 'targetVersionNumber': 12}, {'targetId': 'syn12523173', 'targetVersionNumber': 18}, {'targetId': 'syn18693175', 'targetVersionNumber': 4}, {'targetId': 'syn19315964', 'targetVersionNumber': 3}, {'targetId': 'syn25741023', 'targetVersionNumber': 1}, {'targetId': 'syn25741024', 'targetVersionNumber': 1}, {'targetId': 'syn25741025', 'targetVersionNumber': 2}, {'targetId': 'syn25741026', 'targetVersionNumber': 1}, {'targetId': 'syn22167182', 'targetVersionNumber': 2}, {'targetId': 'syn25741027', 'targetVersionNumber': 3}, {'targetId': 'syn26274945', 'targetVersionNumber': 3}]
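
The loop above does nothing, and reports nothing, if the overall_scores id is missing from the manifest's provenance. A variant that fails loudly in that case is sketched here. `set_reference_version` is a hypothetical helper, and the starting version of 1 for overall_scores is an assumption.

```python
def set_reference_version(references, target_id, version):
    """Set targetVersionNumber on every reference to target_id.

    Returns the number of references changed and raises KeyError if there are none.
    """
    matches = [ref for ref in references if ref["targetId"] == target_id]
    if not matches:
        raise KeyError(f"{target_id} is not in the provenance")
    for ref in matches:
        ref["targetVersionNumber"] = version
    return len(matches)


# Two of the references printed above; the old version of overall_scores is assumed.
refs = [
    {"targetId": "syn25741024", "targetVersionNumber": 1},
    {"targetId": "syn25741025", "targetVersionNumber": 1},
]
set_reference_version(refs, "syn25741025", 2)
print(refs)
```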


Now that the provenance is correct, the manifest file can be updated with the correct version of overall_scores.  After this step, both the provenance and the contents of the file should be aligned:

In [47]:
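data_manifest = syn.get(data_manifest)  # download the manifest; data_manifest now holds the entity, not the ID
data_manifest = pd.read_csv(data_manifest['path'], index_col=0)
# point the manifest row for overall_scores at the newly stored version
data_manifest.loc[data_manifest['id'] == overall_scores['id'], 'version'] = overall_scores['versionNumber']

print(data_manifest) # make sure it's updated properly

# write the updated manifest locally without the DataFrame index
data_manifest_local_file = '../output/data_manifest.csv'
data_manifest.to_csv(data_manifest_local_file, index=False)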


             id  version
0   syn12177499       27
1   syn12548902       29
2   syn12616884       12
3   syn12523173       18
4   syn18693175        4
5   syn19315964        3
6   syn25741023        1
7   syn25741024        1
8   syn25741025        2
9   syn25741026        1
10  syn22167182        2
11  syn25741027        2
12  syn26274945        3
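
Whether the manifest and its provenance really are aligned can be checked programmatically rather than by eye. The sketch below compares the two and lists every id whose versions differ. `version_mismatches` is a hypothetical helper, and the sample data is three rows copied from the outputs above.

```python
def version_mismatches(manifest_rows, references):
    """Compare (id, version) rows from the manifest with provenance references.

    Returns a sorted list of (id, manifest_version, provenance_version) tuples
    for every id whose versions differ; None marks an id missing on one side.
    """
    manifest = {syn_id: int(version) for syn_id, version in manifest_rows}
    provenance = {
        ref["targetId"]: ref.get("targetVersionNumber") for ref in references
    }
    return [
        (syn_id, manifest.get(syn_id), provenance.get(syn_id))
        for syn_id in sorted(manifest.keys() | provenance.keys())
        if manifest.get(syn_id) != provenance.get(syn_id)
    ]


# Three rows copied from the manifest and provenance outputs above.
rows = [("syn25741025", 2), ("syn25741026", 1), ("syn25741027", 2)]
refs = [
    {"targetId": "syn25741025", "targetVersionNumber": 2},
    {"targetId": "syn25741026", "targetVersionNumber": 1},
    {"targetId": "syn25741027", "targetVersionNumber": 3},
]
print(version_mismatches(rows, refs))
```

For these rows it reports syn25741027, which is at version 2 in the printed manifest and version 3 in the printed provenance. In the notebook, the rows could be taken from the DataFrame with `data_manifest[['id', 'version']].itertuples(index=False, name=None)` and the references from `manifest_provenance`.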


Before uploading the updated data_manifest to Synapse, open it locally and make sure it does not end with a newline character; otherwise, the parser further down the pipeline will fail to parse the manifest.  Note also that manifest_provenance now contains the updated version of overall_scores:

In [49]:
updated_data_manifest = File(data_manifest_local_file, parent=agora_live_data)
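# pass each reference as 'synID.version' so the recorded provenance pins the versions in manifest_provenance
updated_data_manifest = syn.store(updated_data_manifest, used=[f"{i['targetId']}.{i['targetVersionNumber']}" for i in manifest_provenance])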
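
The trailing-newline check mentioned above can also be scripted rather than done by hand, which helps because pandas `to_csv` ends the last row with a line terminator. It would need to run on the local file before the `syn.store` call. This is a minimal sketch: `strip_trailing_newlines` is a hypothetical helper, and the demonstration uses a temporary file rather than the real manifest.

```python
import os
import tempfile


def strip_trailing_newlines(path):
    """Rewrite the file at path without trailing newline characters.

    Returns True if the file was changed.
    """
    with open(path, "rb") as handle:
        content = handle.read()
    stripped = content.rstrip(b"\r\n")
    if stripped == content:
        return False
    with open(path, "wb") as handle:
        handle.write(stripped)
    return True


# Demonstration on a temporary file with a single manifest row.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "data_manifest.csv")
    with open(demo_path, "wb") as handle:
        handle.write(b"id,version\nsyn25741025,2\n")
    changed = strip_trailing_newlines(demo_path)
    with open(demo_path, "rb") as handle:
        result = handle.read()

print(changed, result)
```

In the notebook the call would be `strip_trailing_newlines(data_manifest_local_file)`.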

Verify provenance and version of file:

In [50]:
print(syn.getProvenance(updated_data_manifest['id']))
print(updated_data_manifest['versionNumber']) 


  Executed:
  Used:
syn12177499.27
syn12548902.29
syn12616884.12
syn12523173.18
syn18693175.4
syn19315964.3
syn25741023.1
syn25741024.1
syn25741025.2
syn25741026.1
syn22167182.2
syn25741027.3
syn26274945.3
File: data_manifest.csv (syn13363290)
  md5=37d72a076da25cd36be4edea66c5c275
  fileSize=196
  contentType=text/csv
  externalURL=None
  cacheDir=../output
  files=['data_manifest.csv']
  path=../output/data_manifest.csv
  synapseStore=True
properties:
  concreteType=org.sagebionetworks.repo.model.FileEntity
  createdBy=3323072
  createdOn=2018-07-12T05:35:31.362Z
  dataFileHandleId=84829936
  etag=824e5c5a-0f2f-4bcb-877d-5daa51a25312
  id=syn13363290
  isLatestVersion=True
  modifiedBy=3419125
  modifiedOn=2021-11-19T05:22:31.441Z
  name=data_manifest.csv
  parentId=syn12177492
  versionLabel=33
  versionNumber=33
annotations:

