# Background
* Respiratory Medicine physicians see between 50 and more than 150 COPD patients each month, over 70% of whom are on inhaled maintenance therapy.
* The market is fragmented, with intense competition from newer treatment options.
* A product launch is impending for a triple combination therapy consisting of an inhaled corticosteroid (ICS), a long-acting beta-agonist (LABA) and a long-acting muscarinic antagonist (LAMA).

# Objectives
* Assess COPD drugs with respect to market changes.
    * Analyze drug awareness among physicians
    * Analyze drug satisfaction

# Data staging
## Survey Data
* Extract the survey data from the `Raw data` tab
* Clean up the column names so that they align with the product codes in the master data

In [1]:
import pandas as pd
import re
import plotly.graph_objs as go
import plotly.express as px

# IMPORT SURVEY DATA

# First run only: copy the `Raw data` tab to the clipboard, load it, and cache it as a pickle
# df = pd.read_clipboard()
# df.to_pickle('./survey.pkl')

df = pd.read_pickle('./survey.pkl')
df.head()

Unnamed: 0,RID,Specialty,B1,B1_1,B1_2,Grid_C1[{_3}].C1,Grid_C1[{_5}].C1,Grid_C1[{_7}].C1,Grid_C1[{_9}].C1,Grid_C1[{_10}].C1,Grid_C1[{_11}].C1,Grid_C1[{_12}].C1,Grid_C1[{_16}].C1
0,701,{_1},{_7},"{_5,_9,_10,_3,_11,_16}","{_12,_101,_102}",{_8},{_10},{_10},{_9},{_9},{_8},,{_8}
1,702,{_1},{_5},"{_9,_10,_3}","{_7,_11,_16}",{_7},{_10},{_7},{_10},{_7},{_7},,{_8}
2,703,{_1},{_5},"{_7,_9,_3}","{_12,_101,_102,_10,_11,_16}",{_7},{_8},{_9},{_8},{_7},{_6},,{_6}
3,704,{_3},{_7},{_9},"{_5,_3,_16}",{_6},{_10},{_9},{_9},,,,{_10}
4,705,{_1},{_9},"{_7,_5,_11}","{_101,_102,_10,_3,_16}",{_7},{_8},{_8},{_9},{_6},{_7},,{_8}


In [2]:
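# Clean up the columns of df: name the first five columns and reduce each 'Grid_C1[{_N}].C1' name to the integer product code N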
df.columns = ['RID', 'Specialty', 'Favourite', 'Unaided', 'Aided'] + [int(s.split('_')[2].split('}')[0]) for s in df.columns[5:]]
df.head()

Unnamed: 0,RID,Specialty,Favourite,Unaided,Aided,3,5,7,9,10,11,12,16
0,701,{_1},{_7},"{_5,_9,_10,_3,_11,_16}","{_12,_101,_102}",{_8},{_10},{_10},{_9},{_9},{_8},,{_8}
1,702,{_1},{_5},"{_9,_10,_3}","{_7,_11,_16}",{_7},{_10},{_7},{_10},{_7},{_7},,{_8}
2,703,{_1},{_5},"{_7,_9,_3}","{_12,_101,_102,_10,_11,_16}",{_7},{_8},{_9},{_8},{_7},{_6},,{_6}
3,704,{_3},{_7},{_9},"{_5,_3,_16}",{_6},{_10},{_9},{_9},,,,{_10}
4,705,{_1},{_9},"{_7,_5,_11}","{_101,_102,_10,_3,_16}",{_7},{_8},{_8},{_9},{_6},{_7},,{_8}
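
The chained `split` calls in the cell above depend on the exact position of the underscores in each column name. A minimal sketch of a regex-based alternative follows, assuming every rating column embeds its code as `{_N}`; the helper name `grid_code` is hypothetical and not part of the original notebook.

```python
import re

# Hypothetical helper (not in the original notebook): extract the product code
# that appears as '{_N}' inside a grid column name.
GRID_CODE = re.compile(r"\{_(\d+)\}")

def grid_code(column_name: str) -> int:
    """Return the integer code embedded in a name like 'Grid_C1[{_3}].C1'."""
    match = GRID_CODE.search(column_name)
    if match is None:
        raise ValueError(f"no product code found in {column_name!r}")
    return int(match.group(1))

# Column names copied from the raw survey header shown earlier.
raw_columns = ["Grid_C1[{_3}].C1", "Grid_C1[{_10}].C1", "Grid_C1[{_16}].C1"]
codes = [grid_code(name) for name in raw_columns]
print(codes)
```

Raising on a non-matching name makes an unexpected column fail loudly instead of producing a wrong code.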


In [3]:
# Strip the '{', '}' and '_' wrappers, keeping only digits and the commas that separate multi-select codes
df = df.fillna('').replace(r'[^0-9,]','', regex=True)

df = df.assign(Specialty=df.Specialty.astype('int'),
          Favourite=df.Favourite.astype('int'))

df.head()

Unnamed: 0,RID,Specialty,Favourite,Unaided,Aided,3,5,7,9,10,11,12,16
0,701,1,7,"5,9,10,3,11,16","12,101,102",8,10,10,9,9.0,8.0,,8
1,702,1,5,"9,10,3","7,11,16",7,10,7,10,7.0,7.0,,8
2,703,1,5,"7,9,3","12,101,102,10,11,16",7,8,9,8,7.0,6.0,,6
3,704,3,7,9,"5,3,16",6,10,9,9,,,,10
4,705,1,9,"7,5,11","101,102,10,3,16",7,8,8,9,6.0,7.0,,8
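
The clean-up cell keeps multi-select answers (`Unaided`, `Aided`) as comma-separated strings, which are split again later in the analysis. As a sketch of an alternative, the raw answer could be parsed once into a list of integer codes. The function name `parse_codes` is an assumption for illustration only.

```python
import re
from typing import List

def parse_codes(raw) -> List[int]:
    """Turn a raw answer such as '{_5,_9,_10}' into [5, 9, 10].

    Non-string input (for example NaN from an unanswered question)
    yields an empty list.
    """
    if not isinstance(raw, str):
        return []
    return [int(code) for code in re.findall(r"\d+", raw)]

# Raw values copied from the first survey row shown earlier.
print(parse_codes("{_5,_9,_10,_3,_11,_16}"))
print(parse_codes("{_12,_101,_102}"))
```

With pandas this could be applied column-wise, for example `df["Unaided"].apply(parse_codes)`, before any mapping to drug names.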


## Master data
* The master dataframe is taken from the `master_data` tab of the spreadsheet.
* The `master_data` tab is built manually from the `Brand List of Coding` sheet.
    - The only automation is in the `Drug` column, which uses the formula `=LEFT(E2, SEARCH(" (", E2,1))` to take the brand name that precedes the first ` (` in the label.
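
For reference, the spreadsheet formula can be mirrored in Python. This is a sketch under two assumptions: the hypothetical helper `drug_from_label` is not part of the notebook, and stripping is added because `SEARCH(" (", ...)` returns the position of the space itself, so `LEFT` as written appears to keep one trailing space. A label with no parenthesis, which would make the spreadsheet formula return an error, is returned unchanged here.

```python
def drug_from_label(label: str) -> str:
    """Return the brand name that precedes the first ' (' in a label.

    If the label contains no ' (', the whole label is returned (a design
    choice of this sketch; the spreadsheet formula would error instead).
    """
    name, _separator, _rest = label.partition(" (")
    return name.strip()

# Labels copied from the master data rows shown in this notebook.
print(drug_from_label("Spiriva (Tiotropium) + Onbrez (Indacaterol)"))
print(drug_from_label("Spiriva (Tiotropium) + ICS"))
```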


In [4]:
# IMPORT MASTER DATA

# First run only: copy the `master_data` tab to the clipboard, load it, and cache it as a pickle
# df_master = pd.read_clipboard()
# df_master.to_pickle('./master.pkl')

df_master = pd.read_pickle('./master.pkl')
df_master.head()

Unnamed: 0,Code,Type,SubType,Drug,Label,Added_on,Remarks
0,201,LAMA,Combi,Spiriva,Spiriva (Tiotropium) + ICS/ LABA FDC,Precodes,
1,202,LAMA,Combi,Spiriva,Spiriva (Tiotropium) + Onbrez (Indacaterol),Precodes,
2,203,LAMA,Combi,Spiriva,Spiriva (Tiotropium) + Striverdi (Olodaterol),Precodes,
3,204,LAMA,Combi,Spiriva,Spiriva (Tiotropium) + ICS,Wave 3,
4,205,LAMA,Combi,Spiriva,Spiriva (Tiotropium) + Ventolin (Albuterol),Wave 3,


In [5]:
# Build the code -> drug dictionary (keys are strings so they match the cleaned survey codes)
code_drug = {str(k): v for k, v in df_master.set_index('Code').Drug.to_dict().items()}

In [6]:
# Rename df columns to reflect drug names
df.columns = list(df.columns[:5])+[code_drug.get(str(s), str(s)) for s in df.columns[5:]]

# Map the Specialty and Favourite codes to readable labels
specialty = {1: 'Pulmo', 3: 'GP/IM'}
df = df.assign(Specialty=df.Specialty.astype('int').map(specialty),
          Favourite=df.Favourite.astype('str').map(code_drug))

df.head()

Unnamed: 0,RID,Specialty,Favourite,Unaided,Aided,Onbrez,Seretide,Spiriva,Symbicort,Ultibro,Combivent,Relvar,Ventolin
0,701,Pulmo,Spiriva,"5,9,10,3,11,16","12,101,102",8,10,10,9,9.0,8.0,,8
1,702,Pulmo,Seretide,"9,10,3","7,11,16",7,10,7,10,7.0,7.0,,8
2,703,Pulmo,Seretide,"7,9,3","12,101,102,10,11,16",7,8,9,8,7.0,6.0,,6
3,704,GP/IM,Spiriva,9,"5,3,16",6,10,9,9,,,,10
4,705,Pulmo,Symbicort,"7,5,11","101,102,10,3,16",7,8,8,9,6.0,7.0,,8


# Perspectives
## Drug awareness
* [x] How many `"inhaled maintenance therapies"` drugs are there in the marketplace?
    - [x] How many of them have been addressed by physicians in this sample?
* How are the Favourite drugs rated?
    - By all physicians
    - By physician types
* How are the drug-types (mono, combi) for Favourite rated?
    - By all physicians
    - By physician types

## Drug satisfaction
* What does the distribution of Favourites look like?
    - By physician type
    - By drug type
* Which drug is the clear winner?
* Which drug-type (mono or combi) is more satisfying?
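
One possible way to answer the "by all physicians" and "by physician types" questions is to reshape the rating columns into long format and aggregate. The sketch below uses only the first five respondents and two rating columns from the staged survey table. It assumes the ratings are stored as strings after the clean-up step, and the variable names are illustrative.

```python
import pandas as pd

# First five respondents from the staged survey table, two rating columns only.
# Ratings are written as strings on the assumption that the clean-up leaves them so.
ratings = pd.DataFrame({
    "RID": [701, 702, 703, 704, 705],
    "Specialty": ["Pulmo", "Pulmo", "Pulmo", "GP/IM", "Pulmo"],
    "Seretide": ["10", "10", "8", "10", "8"],
    "Spiriva": ["10", "7", "9", "9", "8"],
})

# One row per (respondent, drug) pair.
long_ratings = ratings.melt(
    id_vars=["RID", "Specialty"], var_name="Drug", value_name="Rating"
)
# Empty or non-numeric ratings become NaN and are skipped by mean().
long_ratings["Rating"] = pd.to_numeric(long_ratings["Rating"], errors="coerce")

mean_all = long_ratings.groupby("Drug")["Rating"].mean()
mean_by_specialty = (
    long_ratings.groupby(["Specialty", "Drug"])["Rating"].mean().unstack("Drug")
)
print(mean_all)
print(mean_by_specialty)
```

With only one GP/IM respondent in these five rows, the per-specialty means are purely illustrative; the same reshaping would apply to the full frame.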

# Analysis

## Drug market

In [7]:
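# The marketplace
# NOTE: this counts unique codes in the master list. Several codes can share one
# Drug name (e.g. the Spiriva combinations), so the figure is an upper bound on distinct drug names.
total_drugs = df_master.Code.unique()
print(f'There are {len(total_drugs)} drugs in the marketplace.\n')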

There are 58 drugs in the marketplace.
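
The count above is taken over `Code`, while the master table also carries a `Drug` name that several codes can share. A small sketch of the difference, using only the five master rows shown in this notebook (the frame `master_head` is an illustrative stand-in, not the full master data):

```python
import pandas as pd

# The five master rows displayed by df_master.head(): five codes, one drug name.
master_head = pd.DataFrame({
    "Code": [201, 202, 203, 204, 205],
    "Drug": ["Spiriva", "Spiriva", "Spiriva", "Spiriva", "Spiriva"],
})

n_codes = master_head["Code"].nunique()   # coded entries (labels / combinations)
n_drugs = master_head["Drug"].nunique()   # distinct drug names
print(n_codes, n_drugs)
```

Which of the two counts is the right denominator depends on whether a "drug" means a coded regimen or a brand name; the notebook does not say, so this is left as an open choice.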



In [8]:
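# Addressed drugs: map every code mentioned as Favourite, Unaided or Aided to a drug name
# NOTE: codes missing from code_drug stay as raw code strings, and an empty answer would contribute ''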
unaided = [j for i in df.Unaided.apply(lambda x: [code_drug.get(y, y) for y in x.split(',')]).to_list() for j in i]
aided = [j for i in df.Aided.apply(lambda x: [code_drug.get(y, y) for y in x.split(',')]).to_list() for j in i]

addressed_drugs = set(df.Favourite.to_list()+unaided+aided)

print(f'Out of {len(total_drugs)} drugs, {len(addressed_drugs)} have been addressed by physicians in this sample')

Out of 58 drugs, 13 have been addressed by physicians in this sample
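
Beyond the set of addressed drugs, awareness can also be expressed as the number of physicians who mention each drug. The following is a minimal sketch using the standard library. The function `mention_counts` is hypothetical, the answers are the `Unaided` values of the first five respondents, and the code-to-name pairs are read off the rating column headers before and after renaming.

```python
from collections import Counter

# Code -> drug pairs inferred from the rating column headers (3, 5, 7, ... -> names).
code_to_drug = {
    "3": "Onbrez", "5": "Seretide", "7": "Spiriva", "9": "Symbicort",
    "10": "Ultibro", "11": "Combivent", "12": "Relvar", "16": "Ventolin",
}

def mention_counts(answers, mapping):
    """Count how many respondents mention each drug at least once."""
    counts = Counter()
    for answer in answers:
        # A set per respondent: several codes mapping to one drug count once.
        # Empty fragments are skipped; unknown codes are kept as raw strings.
        names = {mapping.get(code, code) for code in answer.split(",") if code}
        counts.update(names)
    return counts

# Unaided answers of the first five respondents.
unaided_answers = ["5,9,10,3,11,16", "9,10,3", "7,9,3", "9", "7,5,11"]
unaided_counts = mention_counts(unaided_answers, code_to_drug)
print(unaided_counts.most_common(3))
```

Dividing each count by the number of respondents would give an awareness share per drug.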


In [85]:
df_master.head()

Unnamed: 0,Code,Type,SubType,Drug,Label,Added_on,Remarks
0,201,LAMA,Combi,Spiriva,Spiriva (Tiotropium) + ICS/ LABA FDC,Precodes,
1,202,LAMA,Combi,Spiriva,Spiriva (Tiotropium) + Onbrez (Indacaterol),Precodes,
2,203,LAMA,Combi,Spiriva,Spiriva (Tiotropium) + Striverdi (Olodaterol),Precodes,
3,204,LAMA,Combi,Spiriva,Spiriva (Tiotropium) + ICS,Wave 3,
4,205,LAMA,Combi,Spiriva,Spiriva (Tiotropium) + Ventolin (Albuterol),Wave 3,


In [86]:
s_type = df_master.Type.value_counts() # Type series
s_sub_type = df_master.SubType.value_counts() # SubType series
s_drugs = df_master.Drug.value_counts() # Drug series

### Plots in plotly