This notebook takes the raw Splunk data and aggregates it by conversation ID to produce, for each transfer, the list of messages exchanged during it.

PRMT-2038: Generate list of messages for each transfer using raw Spine data

We wish to add a new field to the parquet files that we currently use for analytics.
- Specifically, we want each transfer to carry the list of all its messages, in human-readable form and in the order in which they occurred.
- We want to do this in Jupyter notebooks (rather than in a branch of the pipeline), using the raw Spine data already in S3.


In [1]:
import pandas as pd
import time

In [2]:
# Define a list of files to be loaded in
folder="s3://prm-gp2gp-data-sandbox-dev/spine-gp2gp-data/"
files=["Sept-2020","Oct-2020","Nov-2020","Dec-2020","Jan-2021","Feb-2021","Mar-2021"]
full_filenames=[folder + file + ".csv.gz" for file in files]

In [3]:
# Rename message types to be human readable
interaction_name_mapping={"urn:nhs:names:services:gp2gp/RCMR_IN010000UK05":"request started",
"urn:nhs:names:services:gp2gp/RCMR_IN030000UK06":"request completed",
"urn:nhs:names:services:gp2gp/COPC_IN000001UK01":"common point to point",
"urn:nhs:names:services:gp2gp/MCCI_IN010000UK13":"application acknowledgement"}
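The mapping is applied with `Series.replace` in the next cell. A small sketch of why that choice matters: `replace` leaves any interaction ID that is not in the dictionary unchanged, whereas `Series.map` would turn it into `NaN`. The `UNKNOWN_ID` value below is hypothetical, used only to show an unmapped message type.

```python
import pandas as pd

# Two entries from the mapping above, enough for the example
interaction_name_mapping = {
    "urn:nhs:names:services:gp2gp/RCMR_IN010000UK05": "request started",
    "urn:nhs:names:services:gp2gp/MCCI_IN010000UK13": "application acknowledgement",
}

ids = pd.Series([
    "urn:nhs:names:services:gp2gp/RCMR_IN010000UK05",
    "urn:nhs:names:services:gp2gp/MCCI_IN010000UK13",
    "urn:nhs:names:services:gp2gp/UNKNOWN_ID",  # hypothetical ID, not in the mapping
])

# .replace keeps values that have no entry in the mapping
replaced = ids.replace(interaction_name_mapping)
# .map turns values with no entry into NaN
mapped = ids.map(interaction_name_mapping)

print(replaced.tolist())
print(mapped.isna().tolist())
```

With `replace`, an unexpected message type still shows up in the final lists under its raw ID, so it stays visible rather than silently becoming a missing value.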

In [4]:
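# Load one file and keep only the columns needed downstream. Trimming each file before concatenation keeps memory usage down.
def generate_single_frame(file):
    # on_bad_lines='skip' (pandas >= 1.3) replaces the deprecated error_bad_lines=False
    df = pd.read_csv(file, compression='gzip', on_bad_lines='skip')
    # Sort chronologically so each conversation's messages end up in the order they occurred
    df = df.sort_values(by='_time')
    # Translate interaction IDs to human-readable names; unmapped IDs are left unchanged
    df['interaction name'] = df['interactionID'].replace(interaction_name_mapping)
    df = df[['conversationID', 'interaction name']]
    return df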
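Memory could be reduced further by asking `read_csv` to parse only the three columns the function uses, via its `usecols` parameter. The sketch below is a hypothetical variant (`generate_single_frame_usecols` is not part of the original notebook); it assumes the raw files contain `_time`, `conversationID` and `interactionID` columns, as the code above implies. It also passes `kind="stable"` to `sort_values` so that messages with identical timestamps keep their original relative order. It is exercised on a tiny temporary gzip file rather than the real S3 data.

```python
import gzip
import os
import tempfile

import pandas as pd

interaction_name_mapping = {
    "urn:nhs:names:services:gp2gp/RCMR_IN010000UK05": "request started",
    "urn:nhs:names:services:gp2gp/RCMR_IN030000UK06": "request completed",
}

def generate_single_frame_usecols(file):
    """Hypothetical variant: parse only the columns that are actually needed."""
    df = pd.read_csv(
        file,
        compression="gzip",
        usecols=["_time", "conversationID", "interactionID"],
        on_bad_lines="skip",  # pandas >= 1.3
    )
    # Stable sort keeps rows with equal timestamps in their file order
    df = df.sort_values(by="_time", kind="stable")
    df["interaction name"] = df["interactionID"].replace(interaction_name_mapping)
    return df[["conversationID", "interaction name"]]

# Tiny sample file with an extra column (messageSize) that is never parsed
csv_text = (
    "_time,conversationID,interactionID,messageSize\n"
    "2020-09-01T10:05:00,conv-1,urn:nhs:names:services:gp2gp/RCMR_IN030000UK06,100\n"
    "2020-09-01T10:00:00,conv-1,urn:nhs:names:services:gp2gp/RCMR_IN010000UK05,200\n"
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv.gz")
    with gzip.open(path, "wt") as fh:
        fh.write(csv_text)
    result = generate_single_frame_usecols(path)

print(result)
```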

In [5]:
# Load and trim every monthly file; the cell output is the elapsed time in seconds
start = time.perf_counter()
field_data = [generate_single_frame(file) for file in full_filenames]
time.perf_counter() - start

286.74416992699935

In [6]:
# Stack all months, then collect each conversation's interaction names into an ordered list
field_data = pd.concat(field_data, axis=0)
field_data = field_data.groupby('conversationID')['interaction name'].apply(list)
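The lists come out in chronological order because pandas `groupby` preserves the original row order within each group: each file is sorted by `_time`, and the files are concatenated in month order. A sketch on hypothetical toy data, including a conversation that spans two monthly files:

```python
import pandas as pd

# Hypothetical rows, each frame already sorted by time as generate_single_frame does
frame_sept = pd.DataFrame({
    "conversationID": ["conv-A", "conv-B", "conv-A"],
    "interaction name": ["request started", "request started", "request completed"],
})
frame_oct = pd.DataFrame({
    "conversationID": ["conv-A", "conv-B"],
    "interaction name": ["application acknowledgement", "request completed"],
})

# Concatenate in month order, then group; row order inside each group is kept
combined = pd.concat([frame_sept, frame_oct], axis=0)
messages = combined.groupby("conversationID")["interaction name"].apply(list)

print(messages.to_dict())
```

This relies on `files` being listed chronologically; if the list were reordered, a conversation crossing a month boundary would get its messages out of order.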

In [7]:
# Save one row per conversation: the conversation ID (index) and its ordered list of messages
pd.DataFrame(field_data).to_parquet('s3://prm-gp2gp-data-sandbox-dev/extra-fields-data-from-splunk/Sept_20_Feb_21_conversations_interaction_messages.parquet')