# Large attachment failures in requesting TPP practices

Questions:

- Can we reliably identify GP2GP conversations that have failed due to TPP’s attachment size constraints?
- For such conversations, can we identify the size of the largest attachment?
- Given a proposed attachment size limit, can we estimate the number of transfer failures that will occur due to breaching this limit?
- Are there any trends in attachment size that may impact on our findings in the future? (e.g. is there a trend for attaching larger files to patient records?)

In [2]:
import paths
%load_ext autoreload
%autoreload 2

## Coverage of large message failures in the MI data

- We know that the MI data is not always reliable.
- We assume that the `spine2vfmmonitor` index in NMS is more complete, as NHSD has complete observability for it.
- By comparing large message error volumes in the MI data and in Spine, we should gain some insight into how useful the MI data is for answering these questions.

### According to Spine

NMS query to retrieve all large message errors from Spine for all suppliers, regardless of whether they were generated by the sender or the requestor:
```
index="spine2vfmmonitor" ackExceptionCode=30
| stats count, dc(conversationID) as distinct_conversations
```

Running this query for 1st - 29th Feb 2020 gave:

```
count: 1200
distinct_conversations: 1139
```

That is, there were `1200` Spine messages with `ackExceptionCode=30` (i.e. Application Acknowledgement messages reporting `large message general failures` according to the GP2GP spec), and these occurred across `1139` unique conversations.

Interestingly, one conversation (`440CAD30-5879-11EA-AAFF-6168D3AA4F1F`) was responsible for `54` of the messages with `ackExceptionCode=30`. The `fromPartyID` field for the requesting practice contained `YGA`, so we assume that this is a TPP practice.

| conversationID                       | number of code 30 messages |
|--------------------------------------|----------------------------|
| 440CAD30-5879-11EA-AAFF-6168D3AA4F1F | 54                         |
| 3FCF25F0-4CC5-11EA-9FFE-8500E7C8EA1C | 3                          |
| 44CD4320-48D8-11EA-9E3E-2961FF2D25D3 | 2                          |
| 4A61FD90-5179-11EA-9101-1FE05E86F508 | 2                          |
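
As a quick check on how much of the gap between the message count and the conversation count the table explains, the reported figures can be compared directly. This is a minimal sketch using only the numbers from this section; the variable names are illustrative.

```python
# Figures reported by the Spine query for 1st - 29th Feb 2020.
total_messages = 1200
distinct_conversations = 1139

# Conversations listed in the table above, with their code 30 message counts.
listed = {
    "440CAD30-5879-11EA-AAFF-6168D3AA4F1F": 54,
    "3FCF25F0-4CC5-11EA-9FFE-8500E7C8EA1C": 3,
    "44CD4320-48D8-11EA-9E3E-2961FF2D25D3": 2,
    "4A61FD90-5179-11EA-9101-1FE05E86F508": 2,
}

# Every message beyond the first in a conversation counts as a repeat.
repeats_total = total_messages - distinct_conversations
repeats_listed = sum(n - 1 for n in listed.values())
repeats_unlisted = repeats_total - repeats_listed

print(repeats_total, repeats_listed, repeats_unlisted)  # 61 57 4
```

The four listed conversations account for 57 of the 61 repeated messages, so a few repeats must come from conversations not shown in the table.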

TODO:

- Codes `29` and `31` are also associated with large message errors. Do we see any of these in practice, and are they associated with TPP?
- If we could distinguish between the sender and requestor practice in Spine, we could break the failures down by whether they occurred on the sender side or the requestor side; only the latter corresponds to TPP attachment failures.

### According to the MI data

For the MI data, we have both the 'sender' and 'requestor' views of each conversation. All the following queries were run for 1st - 29th Feb 2020, and only against data received from TPP.

#### From the sender perspective - looking at request acknowledgement errors

Note: To create a view with sender records (`mi_sr`) in AWS Athena see [Create SR view](/athena/create_sr_view.txt).
```
SELECT count(ConversationID) AS conversation_count,
       count(DISTINCT ConversationID) AS distinct_conversation_count
FROM mi_sr
WHERE RequestTime BETWEEN '2020-02-01' AND '2020-02-29'
    AND RequestAckCode = '30'
```

Results:
```
count: 944
distinct_count: 944
```
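
One thing worth checking is whether the `BETWEEN` filter captures all of 29th Feb. This section does not state the format of `RequestTime`; if it holds full timestamps stored as strings (an assumption), then any value later than midnight on the 29th compares greater than `'2020-02-29'` and would be excluded. The sketch below mimics that comparison in Python with hypothetical sample values, and shows a half-open range ending at `'2020-03-01'` as one way to avoid the problem.

```python
def in_between(value, start="2020-02-01", end="2020-02-29"):
    """Mimic SQL `value BETWEEN start AND end` on strings (inclusive)."""
    return start <= value <= end


def in_half_open(value, start="2020-02-01", end="2020-03-01"):
    """Half-open alternative: start <= value < end."""
    return start <= value < end


# Hypothetical RequestTime values; the real column format is an assumption.
date_only = "2020-02-29"
with_time = "2020-02-29 10:15:00"

print(in_between(date_only), in_between(with_time))      # True False
print(in_half_open(date_only), in_half_open(with_time))  # True True
```

The same consideration applies to the other queries in this section, since they all use the same date filter.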

#### From the requestor perspective - looking at request errors

Note: To create a view with requestor records (`mi_rr`) in AWS Athena, see [Create RR view](/athena/create_rr_view.txt).

```
SELECT count(ConversationID) AS conversation_count,
       count(DISTINCT ConversationID) AS distinct_conversation_count
FROM mi_rr
WHERE RequestTime BETWEEN '2020-02-01' AND '2020-02-29'
    AND RequestErrorCode = '30'
```

Results:

```
count: 0
```

#### From the requestor perspective - looking at extract acknowledgement errors

```
SELECT count(ConversationID) AS conversation_count,
       count(DISTINCT ConversationID) AS distinct_conversation_count
FROM mi_rr
WHERE RequestTime BETWEEN '2020-02-01' AND '2020-02-29'
    AND ExtractAckCode = '30'
```

Results:

```
count: 180
distinct_count: 180
```

#### Merging all sender and requestor large attachment failures

Create a view of sender large attachment failures:

```
CREATE OR REPLACE VIEW sr_large_attachment_failures AS
SELECT ConversationID
FROM mi_sr
WHERE RequestTime BETWEEN '2020-02-01' AND '2020-02-29'
    AND RequestAckCode = '30'
```

Create a view of requestor large attachment failures:

```
CREATE OR REPLACE VIEW rr_large_attachment_failures AS
SELECT ConversationID
FROM mi_rr
WHERE RequestTime BETWEEN '2020-02-01' AND '2020-02-29'
    AND (RequestErrorCode = '30' OR ExtractAckCode = '30')
```

Join the two views and count the sender and requestor large attachment failures:

```
SELECT COUNT(*)
FROM rr_large_attachment_failures AS a
FULL OUTER JOIN sr_large_attachment_failures AS b
    ON a.ConversationID = b.ConversationID;
```

Results:

```
distinct_conversations: 1124
```
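
The `FULL OUTER JOIN` above returns one row per conversation that appears in either view, provided each view holds a given conversation at most once (which the matching count and distinct count in the earlier results suggest). Under that assumption the joined count is the size of the set union, and inclusion-exclusion gives the overlap between the two views. A small sketch using the reported figures, followed by the same idea on hypothetical toy IDs:

```python
sender_failures = 944      # RequestAckCode = '30' in mi_sr
requestor_failures = 180   # ExtractAckCode = '30' in mi_rr (RequestErrorCode = '30' gave 0)
joined_count = 1124        # COUNT(*) over the FULL OUTER JOIN

# |A u B| = |A| + |B| - |A n B|, so the overlap between the views is:
overlap = sender_failures + requestor_failures - joined_count
print(overlap)  # 0

# The same identity on toy data (hypothetical IDs, not real conversations).
sr = {"conv-1", "conv-2", "conv-3"}
rr = {"conv-3", "conv-4"}
union_size = len(sr | rr)
assert union_size == len(sr) + len(rr) - len(sr & rr)
```

An overlap of 0 means that no conversation was reported as a code 30 failure in both the sender and the requestor views.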

### Conclusion

```
Spine large attachment failures for all suppliers: 1139
MI large attachment failures for TPP: 1124
```

The results from the MI data are very close to the Spine results: 1124 conversations against 1139, a difference of 15. The discrepancy could be explained by only TPP MI data being available at the time of running the above queries, whereas the Spine figure covers all suppliers.

**Thus, we can be confident in using the MI data to analyse 'error code 30' events, i.e. `large message general failures`**.
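
To put the gap in numbers, the two totals can be compared as below. This is only a sketch based on the figures reported above; because the Spine total covers all suppliers and the MI total covers TPP only, the ratio is indicative and not a strict coverage measure.

```python
spine_conversations = 1139  # all suppliers, from spine2vfmmonitor
mi_conversations = 1124     # TPP only, from the merged MI views

difference = spine_conversations - mi_conversations
ratio = mi_conversations / spine_conversations

print(difference)       # 15
print(f"{ratio:.1%}")   # 98.7%
```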