Merge pull request #380 from ror-community/api-v2-create
change zipfile name and update readme
lizkrznarich committed Mar 14, 2024
2 parents 4c33baf + b0dd7ef commit 080e398
Showing 4 changed files with 66 additions and 39 deletions.
23 changes: 17 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -169,7 +169,7 @@ Making a POST request `/organizations` performs the following actions:

2. Make a POST request to `/organizations` with the JSON file as the data payload. Credentials are required for POST requests.

curl -X POST -H "Route-User: [API USER]" -H "Token: [API TOKEN]" "http://api.dev.ror.org/v2/organizations" -d @[PATH TO JSON FILE].json -H "Content-Type: application/json"

3. The response is a schema-valid JSON object populated with the submitted metadata as well as a ROR ID and Geonames details retrieved from Geonames. Fields and values will be ordered as in the ROR API and optional fields will be populated with empty or null values. Redirect the response to a file for use in the ROR data deployment process. **The resulting record is NOT added to the ROR index.**

Expand All @@ -191,7 +191,7 @@ Making a PUT request `/organizations/[ROR ID]` performs the following actions:

2. Make a PUT request to `/organizations/[ROR ID]` with the JSON file as the data payload. Credentials are required for PUT requests. The ROR ID specified in the request path must match the ROR ID in the `id` field of the JSON data.

curl -X PUT -H "Route-User: [API USER]" -H "Token: [API TOKEN]" "http://api.dev.ror.org/v2/organizations/[ROR ID]" -d @[PATH TO JSON FILE].json -H "Content-Type: application/json"

3. The response is a schema-valid JSON object populated with the updates in the submitted metadata as well as updated Geonames details retrieved from Geonames. Fields and values will be ordered as in the ROR API and optional fields will be populated with empty or null values. Redirect the response to a file for use in the ROR data deployment process. **The resulting record is NOT updated in the ROR index.**

Expand All @@ -217,18 +217,26 @@ Making a POST request `/organizations/bulkupdate` performs the following actions

2. Make a POST request to `/bulkupdate` with the filepath specified in the `file` field of a multi-part form payload. Credentials are required for POST requests.

curl -X POST -H "Route-User: [API USER]" -H "Token: [API TOKEN]" 'https://api.dev.ror.org/v2/bulkupdate' --form 'file=@"[PATH TO CSV FILE].csv"'

3. The response is a summary with counts of records created/updated/skipped and a link to download the generated files from AWS S3.

{"file":"https://s3.eu-west-1.amazonaws.com/2024-03-09-15:56:26-ror-records.zip","rows processed":208,"created":207,"udpated":0,"skipped":1}
{"file":"https://s3.eu-west-1.amazonaws.com/2024-03-09_15_56_26-ror-records.zip","rows processed":208,"created":207,"updated":0,"skipped":1}
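A client can parse this summary and sanity-check it before downloading the generated files. A minimal sketch, using the sample response above (the summary key for updated records is assumed to be `updated`, matching the serializer keys in `csv_bulk.py`):

```python
import json

# Sample response from the /bulkupdate endpoint, as shown above.
response_text = (
    '{"file":"https://s3.eu-west-1.amazonaws.com/2024-03-09_15_56_26-ror-records.zip",'
    '"rows processed":208,"created":207,"updated":0,"skipped":1}'
)

summary = json.loads(response_text)

# The per-action counts should add up to the rows processed.
assert summary["created"] + summary["updated"] + summary["skipped"] == summary["rows processed"]

# The zip of generated record files can then be fetched from summary["file"].
print(summary["file"])
```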

The zipped file contains the following items:
- **input.csv:** Copy of the CSV submitted to the API
- **report.csv:** CSV with a row for each processed row in the input CSV, indicating whether it was created, updated, or skipped. If a record was created, its new ROR ID is listed in the `ror_id` column. If a record was skipped, the reason(s) are listed in the `errors` column.
- **new:** Directory containing JSON files for records that were successfully created (omitted if no records were created)
- **updates:** Directory containing JSON files for records that were successfully updated (omitted if no records were updated)
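The report can be post-processed to tally outcomes and collect skip reasons. A minimal sketch, assuming the column names `row`, `ror_id`, `action`, and `errors` used by `process_csv` in this commit (the sample rows are invented):

```python
import csv
import io
from collections import Counter

# Invented sample rows in the shape of report.csv.
report_text = """row,ror_id,action,errors
1,https://ror.org/01an7q238,created,
2,https://ror.org/02mhbdp94,updated,
3,,skipped,name is required
"""

rows = list(csv.DictReader(io.StringIO(report_text)))

# Tally how many rows were created/updated/skipped.
counts = Counter(row["action"] for row in rows)
print(counts)

# Skipped rows carry the reason(s) in the errors column.
skipped_reasons = {row["row"]: row["errors"] for row in rows if row["action"] == "skipped"}
print(skipped_reasons)
```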

#### Validate only
Use the `?validate` parameter to simulate running the bulkupdate request without actually generating files. The response is the same CSV report described above.

1. Make a POST request to `/bulkupdate?validate` with the filepath specified in the `file` field of a multi-part form payload. Credentials are required for POST requests. Make sure to redirect the output to a CSV file on your machine.

curl -X POST -H "Route-User: [API USER]" -H "Token: [API TOKEN]" 'https://api.dev.ror.org/v2/bulkupdate?validate' --form 'file=@"[PATH TO CSV FILE].csv"' > report.csv


### CSV formatting

#### Column headings & values
Expand Down Expand Up @@ -282,7 +290,10 @@ The zipped file contains the following items:
| delete | Remove all values from field (single or multi-item field) | All optional fields. Not allowed for required fields: locations, names.types.ror_display, status, types | |
| replace== | Replace all value(s) with specified value(s) (single or multi-item field) | All fields | replace== has special behavior for external_ids.[type].all and names fields - see below |
| no action (only value supplied) | Replace existing value or add value to currently empty field (single-item fields) | established, external_ids preferred, status, names.types.ror_display | Same action as replace |
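The action syntax in the table above can be interpreted with a small helper. This is a hypothetical sketch of client-side parsing, not part of the ROR codebase: a cell is either `delete`, `add==<value>`, `replace==<value>`, or a bare value (which behaves like replace):

```python
def parse_action(cell: str):
    """Split a CSV cell into (action, value) per the update-action syntax."""
    cell = cell.strip()
    if cell == "delete":
        return ("delete", None)
    for action in ("add", "replace"):
        prefix = action + "=="
        if cell.startswith(prefix):
            return (action, cell[len(prefix):])
    # A bare value is treated the same as replace.
    return ("replace", cell)

print(parse_action("add==U.S. Department of Energy"))
print(parse_action("delete"))
print(parse_action("1974"))
```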
#### External IDs
#### Fields with special behaviors
For some fields that contain a list of dictionaries as their value, update actions have special behaviors.

##### External IDs

| Action | external_ids.[TYPE].all | external.[TYPE].preferred |
| ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
Expand All @@ -292,7 +303,7 @@ The zipped file contains the following items:
| delete | Deletes all existing values from external_ids.[TYPE].all. Preferred ID is NOT automatically removed from external_ids.[TYPE].all - it must be explicitly deleted from external.[TYPE].all. After all changes to external_ids.[TYPE].all and external.[TYPE].preferred are calculated, if the result is that BOTH fields are empty the entire external_ids object is deleted. | Deletes any existing value in external.[TYPE].preferred. Value is NOT automatically removed from external_ids.[TYPE].all - it must be explicitly deleted from external.[TYPE].all |
| no action (only value supplied) | Same as replace== | Same as replace== |

#### Names
##### Names

| Action | names.[TYPE] |
| ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
Expand Down
62 changes: 36 additions & 26 deletions rorapi/common/csv_bulk.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ def save_record_file(ror_id, updated, json_obj, dir_name):
with open(full_path, "w") as outfile:
json.dump(json_obj, outfile, ensure_ascii=False, indent=2)

def save_report_file(report, report_fields, csv_file, dir_name):
def save_report_file(report, report_fields, csv_file, dir_name, validate_only):
dir_path = os.path.join(DATA['DIR'],dir_name)
if not os.path.exists(dir_path):
os.mkdir(dir_path)
Expand All @@ -34,16 +34,17 @@ def save_report_file(report, report_fields, csv_file, dir_name):
writer = csv.DictWriter(csvfile, fieldnames=report_fields)
writer.writeheader()
writer.writerows(report)
# save copy of input file
filepath = os.path.join(dir_path, 'input.csv')
csv_file.seek(0)
with open(filepath, 'wb+') as f:
for chunk in csv_file.chunks():
f.write(chunk)
if not validate_only:
# save copy of input file
filepath = os.path.join(dir_path, 'input.csv')
csv_file.seek(0)
with open(filepath, 'wb+') as f:
for chunk in csv_file.chunks():
f.write(chunk)

def process_csv(csv_file, version):
def process_csv(csv_file, version, validate_only):
print("Processing CSV")
dir_name = datetime.now().strftime("%Y-%m-%d-%H:%M:%S") + "-ror-records"
dir_name = datetime.now().strftime("%Y-%m-%d_%H_%M_%S") + "-ror-records"
success_msg = None
error = None
report = []
Expand Down Expand Up @@ -77,30 +78,39 @@ def process_csv(csv_file, version):
serializer = OrganizationSerializerV2(v2_record)
json_obj = json.loads(JSONRenderer().render(serializer.data))
print(json_obj)
#create file
file = save_record_file(ror_id, updated, json_obj, dir_name)
if not validate_only:
#create file
file = save_record_file(ror_id, updated, json_obj, dir_name)
else:
action = 'skipped'
skipped_count += 1
report.append({"row": row_num, "ror_id": ror_id if ror_id else '', "action": action, "errors": "; ".join(row_errors) if row_errors else ''})
row_num += 1
if new_count > 0 or updated_count > 0 or skipped_count > 0:
try:
#create report file
save_report_file(report, report_fields, csv_file, dir_name)
# create zip file
zipfile = shutil.make_archive(os.path.join(DATA['DIR'], dir_name), 'zip', DATA['DIR'], dir_name)
# upload to S3
try:
DATA['CLIENT'].upload_file(zipfile, DATA['PUBLIC_STORE'], dir_name + '.zip')
zipfile = f"https://s3.eu-west-1.amazonaws.com/{DATA['PUBLIC_STORE']}/{urllib.parse.quote(dir_name)}.zip"
except Exception as e:
error = f"Error uploading zipfile to S3: {e}"
if validate_only:
try:
save_report_file(report, report_fields, csv_file, dir_name, validate_only)
success_msg = os.path.join(DATA['DIR'], dir_name, 'report.csv')
except Exception as e:
error = f"Error creating validation report: {e}"
else:
#create report file
save_report_file(report, report_fields, csv_file, dir_name, validate_only)
# create zip file
zipfile = shutil.make_archive(os.path.join(DATA['DIR'], dir_name), 'zip', DATA['DIR'], dir_name)
# upload to S3
try:
DATA['CLIENT'].upload_file(zipfile, DATA['PUBLIC_STORE'], dir_name + '.zip')
zipfile = f"https://s3.eu-west-1.amazonaws.com/{DATA['PUBLIC_STORE']}/{urllib.parse.quote(dir_name)}.zip"
success_msg = {"file": zipfile,
"rows processed": new_count + updated_count + skipped_count,
"created": new_count,
"updated": updated_count,
"skipped": skipped_count}
except Exception as e:
error = f"Error uploading zipfile to S3: {e}"
except Exception as e:
error = f"Unexpected error generating records: {e}"
success_msg = {"file": zipfile,
"rows processed": new_count + updated_count + skipped_count,
"created": new_count,
"udpated": updated_count,
"skipped": skipped_count}

return error, success_msg
3 changes: 2 additions & 1 deletion rorapi/common/urls.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
from django.urls import path, re_path
from rest_framework.documentation import include_docs_urls
from . import views
from rorapi.common.views import HeartbeatView,GenerateAddress,GenerateId,IndexData,IndexDataDump,BulkUpdate
from rorapi.common.views import (
HeartbeatView,GenerateAddress,GenerateId,IndexData,IndexDataDump,BulkUpdate)

urlpatterns = [
# Health check
Expand Down
17 changes: 11 additions & 6 deletions rorapi/common/views.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
import csv
from rest_framework import viewsets, routers, status
from rest_framework.response import Response
from django.http import HttpResponse
Expand Down Expand Up @@ -125,7 +126,6 @@ def create(self, request, version=REST_FRAMEWORK["DEFAULT_VERSION"]):
else:
errors = Errors(["Version {} does not support creating records".format(version)])
if errors is not None:
print(errors)
return Response(
ErrorsSerializer(errors).data, status=status.HTTP_400_BAD_REQUEST
)
Expand Down Expand Up @@ -235,6 +235,7 @@ class BulkUpdate(APIView):
parser_classes = (MultiPartParser, FormParser)

def post(self, request, version=REST_FRAMEWORK["DEFAULT_VERSION"]):
validate_only = False
errors = None
if version == 'v2':
if request.data:
Expand All @@ -246,11 +247,10 @@ def post(self, request, version=REST_FRAMEWORK["DEFAULT_VERSION"]):
csv_validation_errors = validate_csv(file_object)
if len(csv_validation_errors) == 0:
file_object.seek(0)
process_csv_error, msg = process_csv(file_object, version)
print("views msg")
print(msg)
print("views type msg")
print(type(msg))
params = request.GET.dict()
if "validate" in params:
validate_only = True
process_csv_error, msg = process_csv(file_object, version, validate_only)
if process_csv_error:
errors = Errors([process_csv_error])
else:
Expand All @@ -266,6 +266,11 @@ def post(self, request, version=REST_FRAMEWORK["DEFAULT_VERSION"]):
return Response(
ErrorsSerializer(errors).data, status=status.HTTP_400_BAD_REQUEST
)
if validate_only:
with open(msg) as file:
response = HttpResponse(file, content_type='text/csv')
response['Content-Disposition'] = 'attachment; filename=reports.csv'
return response

return Response(
msg,
Expand Down
