Merge pull request #380 from ror-community/api-v2-create
change zipfile name and update readme
lizkrznarich committed Mar 14, 2024
2 parents 4c33baf + b0dd7ef commit 080e398
Showing 4 changed files with 66 additions and 39 deletions.
23 changes: 17 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -169,7 +169,7 @@ Making a POST request `/organizations` performs the following actions:

2. Make a POST request to `/organizations` with the JSON file as the data payload. Credentials are required for POST requests.

curl -X POST -H "Route-User: [API USER]" -H "Token: [API TOKEN]" "http://api.dev.ror.org/v2/organizations" -d @[PATH TO JSON FILE].json -H "Content-Type: application/json"

3. The response is a schema-valid JSON object populated with the submitted metadata as well as a ROR ID and Geonames details retrieved from Geonames. Fields and values will be ordered as in the ROR API and optional fields will be populated with empty or null values. Redirect the response to a file for use in the ROR data deployment process. **The resulting record is NOT added to the ROR index.**

Expand All @@ -191,7 +191,7 @@ Making a PUT request `/organizations/[ROR ID]` performs the following actions:

2. Make a PUT request to `/organizations/[ROR ID]` with the JSON file as the data payload. Credentials are required for PUT requests. The ROR ID specified in the request path must match the ROR ID in the `id` field of the JSON data.

curl -X PUT -H "Route-User: [API USER]" -H "Token: [API TOKEN]" "http://api.dev.ror.org/v2/organizations/[ROR ID]" -d @[PATH TO JSON FILE].json -H "Content-Type: application/json"

3. The response is a schema-valid JSON object populated with the updates in the submitted metadata as well as updated Geonames details retrieved from Geonames. Fields and values will be ordered as in the ROR API and optional fields will be populated with empty or null values. Redirect the response to a file for use in the ROR data deployment process. **The resulting record is NOT updated in the ROR index.**

Expand All @@ -217,18 +217,26 @@ Making a POST request `/organizations/bulkupdate` performs the following actions

2. Make a POST request to `/bulkupdate` with the filepath specified in the `file` field of a multi-part form payload. Credentials are required for POST requests.

curl -X POST -H "Route-User: [API USER]" -H "Token: [API TOKEN]" 'https://api.dev.ror.org/v2/bulkupdate' --form 'file=@"[PATH TO CSV FILE].csv"'

3. The response is a summary with counts of records created/updated/skipped and a link to download the generated files from AWS S3.

{"file":"https://s3.eu-west-1.amazonaws.com/2024-03-09-15:56:26-ror-records.zip","rows processed":208,"created":207,"udpated":0,"skipped":1}
{"file":"https://s3.eu-west-1.amazonaws.com/2024-03-09_15_56_26-ror-records.zip","rows processed":208,"created":207,"updated":0,"skipped":1}
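A client can parse this summary and sanity-check it before downloading the generated files. A minimal sketch, using the sample response above (the summary key for updated records is assumed to be `updated`, matching the serializer keys in `csv_bulk.py`):

```python
import json

# Sample response from the /bulkupdate endpoint, as shown above.
response_text = (
    '{"file":"https://s3.eu-west-1.amazonaws.com/2024-03-09_15_56_26-ror-records.zip",'
    '"rows processed":208,"created":207,"updated":0,"skipped":1}'
)

summary = json.loads(response_text)

# The per-action counts should add up to the rows processed.
assert summary["created"] + summary["updated"] + summary["skipped"] == summary["rows processed"]

# The zip of generated record files can then be fetched from summary["file"].
print(summary["file"])
```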

The zipped file contains the following items:
- **input.csv:** Copy of the CSV submitted to the API
- **report.csv:** CSV with a row for each processed row in the input CSV, indicating whether it was created, updated, or skipped. If a record was created, its new ROR ID is listed in the `ror_id` column. If a record was skipped, the reason(s) are listed in the `errors` column.
- **new:** Directory containing JSON files for records that were successfully created (omitted if no records were created)
- **updates:** Directory containing JSON files for records that were successfully updated (omitted if no records were updated)
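The report can be post-processed to tally outcomes and collect skip reasons. A minimal sketch, assuming the column names `row`, `ror_id`, `action`, and `errors` used by `process_csv` in this commit (the sample rows are invented):

```python
import csv
import io
from collections import Counter

# Invented sample rows in the shape of report.csv.
report_text = """row,ror_id,action,errors
1,https://ror.org/01an7q238,created,
2,https://ror.org/02mhbdp94,updated,
3,,skipped,name is required
"""

rows = list(csv.DictReader(io.StringIO(report_text)))

# Tally how many rows were created/updated/skipped.
counts = Counter(row["action"] for row in rows)
print(counts)

# Skipped rows carry the reason(s) in the errors column.
skipped_reasons = {row["row"]: row["errors"] for row in rows if row["action"] == "skipped"}
print(skipped_reasons)
```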

#### Validate only
Use the `?validate` parameter to simulate running the bulkupdate request without actually generating files. The response is the same CSV report described above.

1. Make a POST request to `/bulkupdate?validate` with the filepath specified in the `file` field of a multi-part form payload. Credentials are required for POST requests. Make sure to redirect the output to a CSV file on your machine.

curl -X POST -H "Route-User: [API USER]" -H "Token: [API TOKEN]" 'https://api.dev.ror.org/v2/bulkupdate?validate' --form 'file=@"[PATH TO CSV FILE].csv"' > report.csv


### CSV formatting

#### Column headings & values
Expand Down Expand Up @@ -282,7 +290,10 @@ The zipped file contains the following items:
| delete | Remove all values from field (single or multi-item field) | All optional fields. Not allowed for required fields: locations, names.types.ror_display, status, types | |
| replace== | Replace all value(s) with specified value(s) (single or multi-item field) | All fields | replace== has special behavior for external_ids.[type].all and names fields - see below |
| no action (only value supplied) | Replace existing value or add value to currently empty field (single-item fields) | established, external_ids preferred, status, names.types.ror_display | Same action as replace |
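The action syntax in the table above can be interpreted with a small helper. This is a hypothetical sketch of client-side parsing, not part of the ROR codebase: a cell is either `delete`, `add==<value>`, `replace==<value>`, or a bare value (which behaves like replace):

```python
def parse_action(cell: str):
    """Split a CSV cell into (action, value) per the update-action syntax."""
    cell = cell.strip()
    if cell == "delete":
        return ("delete", None)
    for action in ("add", "replace"):
        prefix = action + "=="
        if cell.startswith(prefix):
            return (action, cell[len(prefix):])
    # A bare value is treated the same as replace.
    return ("replace", cell)

print(parse_action("add==U.S. Department of Energy"))
print(parse_action("delete"))
print(parse_action("1974"))
```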
#### External IDs
#### Fields with special behaviors
For some fields that contain a list of dictionaries as their value, update actions have special behaviors.

##### External IDs

| Action | external_ids.[TYPE].all | external.[TYPE].preferred |
| ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
Expand All @@ -292,7 +303,7 @@ The zipped file contains the following items:
| delete | Deletes all existing values from external_ids.[TYPE].all. Preferred ID is NOT automatically removed from external_ids.[TYPE].all - it must be explicitly deleted from external.[TYPE].all. After all changes to external_ids.[TYPE].all and external.[TYPE].preferred are calculated, if the result is that BOTH fields are empty the entire external_ids object is deleted. | Deletes any existing value in external.[TYPE].preferred. Value is NOT automatically removed from external_ids.[TYPE].all - it must be explicitly deleted from external.[TYPE].all |
| no action (only value supplied) | Same as replace== | Same as replace== |

#### Names
##### Names

| Action | names.[TYPE] |
| ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
Expand Down
62 changes: 36 additions & 26 deletions rorapi/common/csv_bulk.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ def save_record_file(ror_id, updated, json_obj, dir_name):
with open(full_path, "w") as outfile:
json.dump(json_obj, outfile, ensure_ascii=False, indent=2)

def save_report_file(report, report_fields, csv_file, dir_name):
def save_report_file(report, report_fields, csv_file, dir_name, validate_only):
dir_path = os.path.join(DATA['DIR'],dir_name)
if not os.path.exists(dir_path):
os.mkdir(dir_path)
Expand All @@ -34,16 +34,17 @@ def save_report_file(report, report_fields, csv_file, dir_name):
writer = csv.DictWriter(csvfile, fieldnames=report_fields)
writer.writeheader()
writer.writerows(report)
# save copy of input file
filepath = os.path.join(dir_path, 'input.csv')
csv_file.seek(0)
with open(filepath, 'wb+') as f:
for chunk in csv_file.chunks():
f.write(chunk)
if not validate_only:
# save copy of input file
filepath = os.path.join(dir_path, 'input.csv')
csv_file.seek(0)
with open(filepath, 'wb+') as f:
for chunk in csv_file.chunks():
f.write(chunk)

def process_csv(csv_file, version):
def process_csv(csv_file, version, validate_only):
print("Processing CSV")
dir_name = datetime.now().strftime("%Y-%m-%d-%H:%M:%S") + "-ror-records"
dir_name = datetime.now().strftime("%Y-%m-%d_%H_%M_%S") + "-ror-records"
success_msg = None
error = None
report = []
Expand Down Expand Up @@ -77,30 +78,39 @@ def process_csv(csv_file, version):
serializer = OrganizationSerializerV2(v2_record)
json_obj = json.loads(JSONRenderer().render(serializer.data))
print(json_obj)
#create file
file = save_record_file(ror_id, updated, json_obj, dir_name)
if not validate_only:
#create file
file = save_record_file(ror_id, updated, json_obj, dir_name)
else:
action = 'skipped'
skipped_count += 1
report.append({"row": row_num, "ror_id": ror_id if ror_id else '', "action": action, "errors": "; ".join(row_errors) if row_errors else ''})
row_num += 1
if new_count > 0 or updated_count > 0 or skipped_count > 0:
try:
#create report file
save_report_file(report, report_fields, csv_file, dir_name)
# create zip file
zipfile = shutil.make_archive(os.path.join(DATA['DIR'], dir_name), 'zip', DATA['DIR'], dir_name)
# upload to S3
try:
DATA['CLIENT'].upload_file(zipfile, DATA['PUBLIC_STORE'], dir_name + '.zip')
zipfile = f"https://s3.eu-west-1.amazonaws.com/{DATA['PUBLIC_STORE']}/{urllib.parse.quote(dir_name)}.zip"
except Exception as e:
error = f"Error uploading zipfile to S3: {e}"
if validate_only:
try:
save_report_file(report, report_fields, csv_file, dir_name, validate_only)
success_msg = os.path.join(DATA['DIR'], dir_name, 'report.csv')
except Exception as e:
error = f"Error creating validation report: {e}"
else:
#create report file
save_report_file(report, report_fields, csv_file, dir_name, validate_only)
# create zip file
zipfile = shutil.make_archive(os.path.join(DATA['DIR'], dir_name), 'zip', DATA['DIR'], dir_name)
# upload to S3
try:
DATA['CLIENT'].upload_file(zipfile, DATA['PUBLIC_STORE'], dir_name + '.zip')
zipfile = f"https://s3.eu-west-1.amazonaws.com/{DATA['PUBLIC_STORE']}/{urllib.parse.quote(dir_name)}.zip"
success_msg = {"file": zipfile,
"rows processed": new_count + updated_count + skipped_count,
"created": new_count,
"updated": updated_count,
"skipped": skipped_count}
except Exception as e:
error = f"Error uploading zipfile to S3: {e}"
except Exception as e:
error = f"Unexpected error generating records: {e}"
success_msg = {"file": zipfile,
"rows processed": new_count + updated_count + skipped_count,
"created": new_count,
"udpated": updated_count,
"skipped": skipped_count}

return error, success_msg
3 changes: 2 additions & 1 deletion rorapi/common/urls.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
from django.urls import path, re_path
from rest_framework.documentation import include_docs_urls
from . import views
from rorapi.common.views import HeartbeatView,GenerateAddress,GenerateId,IndexData,IndexDataDump,BulkUpdate
from rorapi.common.views import (
HeartbeatView,GenerateAddress,GenerateId,IndexData,IndexDataDump,BulkUpdate)

urlpatterns = [
# Health check
Expand Down
17 changes: 11 additions & 6 deletions rorapi/common/views.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
import csv
from rest_framework import viewsets, routers, status
from rest_framework.response import Response
from django.http import HttpResponse
Expand Down Expand Up @@ -125,7 +126,6 @@ def create(self, request, version=REST_FRAMEWORK["DEFAULT_VERSION"]):
else:
errors = Errors(["Version {} does not support creating records".format(version)])
if errors is not None:
print(errors)
return Response(
ErrorsSerializer(errors).data, status=status.HTTP_400_BAD_REQUEST
)
Expand Down Expand Up @@ -235,6 +235,7 @@ class BulkUpdate(APIView):
parser_classes = (MultiPartParser, FormParser)

def post(self, request, version=REST_FRAMEWORK["DEFAULT_VERSION"]):
validate_only = False
errors = None
if version == 'v2':
if request.data:
Expand All @@ -246,11 +247,10 @@ def post(self, request, version=REST_FRAMEWORK["DEFAULT_VERSION"]):
csv_validation_errors = validate_csv(file_object)
if len(csv_validation_errors) == 0:
file_object.seek(0)
process_csv_error, msg = process_csv(file_object, version)
print("views msg")
print(msg)
print("views type msg")
print(type(msg))
params = request.GET.dict()
if "validate" in params:
validate_only = True
process_csv_error, msg = process_csv(file_object, version, validate_only)
if process_csv_error:
errors = Errors([process_csv_error])
else:
Expand All @@ -266,6 +266,11 @@ def post(self, request, version=REST_FRAMEWORK["DEFAULT_VERSION"]):
return Response(
ErrorsSerializer(errors).data, status=status.HTTP_400_BAD_REQUEST
)
if validate_only:
with open(msg) as file:
response = HttpResponse(file, content_type='text/csv')
response['Content-Disposition'] = 'attachment; filename=reports.csv'
return response

return Response(
msg,
Expand Down
