# Fix the Model ID Formatting in the Models Sheet
## Date: 2022-02-18
## Author: Jeffrey Grover
**Purpose:** Fix the contributor model IDs (`ContributorPDX.ID`) for BCM and HCI so they match the correct formatting.

### Load libraries

In [1]:
library(tidyverse)

── Attaching packages ───────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.5     ✔ purrr   0.3.4
✔ tibble  3.1.6     ✔ dplyr   1.0.7
✔ tidyr   1.1.4     ✔ stringr 1.4.0
✔ readr   2.1.1     ✔ forcats 0.5.1

── Conflicts ──────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

### Load the metadata

In [2]:
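# Read the models sheet exported on 2022-02-09; its first column has a blank header,
# which readr renames to `...1` on import
pdtc_models <- read_csv('2022-02-09_pdxnet_portal_pdtc_models.csv')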

head(pdtc_models)

New names:
* `` -> ...1

Rows: 334 Columns: 21

── Column specification ──────────────────────────────────────────────────────
Delimiter: ","
chr  (18): PDXSource, Contributor, ContributorPDX.ID, PDMR.Patient.ID, Gende...
dbl   (2): ...1, CTEP.SDCCode
dttm  (1): Submission

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

| ...1 | PDXSource | Contributor | ContributorPDX.ID | PDMR.Patient.ID | Gender | CTEP.SDCCode | CTEP.SDCDescription | DiagnosisSubtype | Disease.BodyLocation | ⋯ | Date.ofDiagnosis | Has.KnownMetastaticDisease | Grade.StageInformation | PatientNotes | Molecular.andIHC.Data | Has.Smoked100.Cigarettes | Race | Ethnicity | AdditionalMedicalHistory | Submission |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | ⋯ | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dttm>` |
| 1 | PDXNet Consortium Members | MDACC | B8174 | K42829 | Female | 10009951 | Adenocarcinoma - colon | adenocarcinoma of sigmoid colon | Digestive/Gastrointestinal | ⋯ | 42790 | Yes | Stage | Tumor Grade/Stage: Stage IV Location of known metastases: Pelvis, omentum, Chest wall (left), Lymph node (left axillary) | APC c.4037C>G p.S1346\*; FBXW7 c.832C>T p.R278\*; KRAS c.38G>A p.G13D; TP53 c.427G>A p.V143M | No | White | Not Hispanic or Latino | - | 2021-09-02 |
| 2 | PDXNet Consortium Members | MDACC | B8175 | K30337 | Female | 10009951 | Adenocarcinoma - colon | adenocarcinoma | Digestive/Gastrointestinal | ⋯ | 42500 | Yes | Stage | Tumor Grade/Stage: Stage IV, poorly differentiated Location of known metastases: Peritoneum, Liver | KRAS/NRAS WT,BRAF mutated, MSI Stable | No | White | Hispanic or Latino | Family h/o: 2nd degree relative, Uterine cancer | 2018-12-19 |
| 3 | PDXNet Consortium Members | MDACC | B8176 | K45526 | Female | 10009951 | Adenocarcinoma - colon | Lynch syndrome; mucinous and signet ring cell adenocarcinoma | Digestive/Gastrointestinal | ⋯ | 42132 | Not Reported | TNM (Pathological) | Tumor Grade/Stage: pT1bpN0pM0, poorly differentiated Location of known metastases: Liver, Abdomen | Germline heterozygous MSH2 c388_389del variant (deleterious) Lynch syndrome, MSI-high; KRAS-G12D; Loss of MSH2 and MSH6; | Yes | White | Hispanic or Latino | Family History: 1st degree relative, GI cancer involving esophagus, liver, bile duct, stomach, pancreas, colon, rectum | 2018-12-19 |
| 4 | PDXNet Consortium Members | MDACC | B8182 | K75566 | Female | 10009951 | Adenocarcinoma - colon | poorly differentiated mucinous and signet ring cell adenocarcinoma | Digestive/Gastrointestinal | ⋯ | 42644 | Yes | Stage, TNM | Tumor Grade/Stage: Stage III T4N2M0 Location of known metastases: Liver | - | No | White | Hispanic or Latino | - | 2021-09-02 |
| 5 | PDXNet Consortium Members | MDACC | B8183 | K83548 | Male | 10009951 | Adenocarcinoma - colon | adenocarcinoma | Digestive/Gastrointestinal | ⋯ | 42184 | Yes | Stage | Tumor Grade/Stage: Stage IV, moderately differentiated Location of known metastases: Liver, Duodenum, Pelvis | MSI-Stable; BRAF mutant; KRAS wild type | Yes | Not Provided | Not Provided | - | 2018-12-19 |
| 6 | PDXNet Consortium Members | MDACC | B8207 | K49395 | Male | 10009951 | Adenocarcinoma - colon | Sigmoid | Digestive/Gastrointestinal | ⋯ | 41537 | Yes | Stage, TNM | Tumor Grade/Stage: Stage IVB, TxNxM1b, Moderately differentiated Location of known metastases: Liver, Lung | Mutations present in: MDM4, NOTCH1, RB1, TP53, APC | Yes | White | Not Hispanic or Latino | No family history of cancer | 2021-09-02 |


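### Edit the HCI model IDs to match the correct formatting
The BCM IDs are already in the correct format, so only the HCI IDs need editing. For rows where `Contributor` is `HCI`, a leading `HCI` is replaced with `HCI-` unless the ID already contains `HCI-`; IDs from all other contributors are left unchanged.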
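To see what the regular expression in the next cell does, it can be run on a few IDs in isolation. This is an illustrative sketch only: the input vector is hypothetical (unhyphenated variants alongside IDs that appear in this notebook). stringr uses ICU regular expressions, which support the negative lookahead `(?!.*HCI-)`.

```r
library(stringr)

# Hypothetical IDs: two HCI IDs missing the hyphen, one already correct,
# and one ID from another contributor
ids <- c('HCI004', 'HCI-004', 'HCI031OV (305-X)', 'B8174')

# Same pattern as the notebook: "^" anchors at the start, the negative
# lookahead (?!.*HCI-) fails if "HCI-" appears anywhere in the ID, and only
# then is the leading "HCI" replaced with "HCI-"
fixed <- str_replace(ids, '^(?!.*HCI-)HCI', 'HCI-')
fixed
# returns "HCI-004" "HCI-004" "HCI-031OV (305-X)" "B8174"
```

In the notebook the replacement is additionally guarded by `Contributor == 'HCI'` inside `ifelse()`, so only HCI rows take the replaced value.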

In [5]:
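pdtc_models <- pdtc_models %>%
  mutate(ContributorPDX.ID = ifelse(
    Contributor == 'HCI',
    # Insert a hyphen after the leading "HCI"; the negative lookahead (?!.*HCI-)
    # skips IDs that already contain "HCI-"
    str_replace(ContributorPDX.ID, '^(?!.*HCI-)HCI', 'HCI-'),
    # Non-HCI IDs are kept as-is
    ContributorPDX.ID
  ))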

# Check the HCI ones for formatting
pdtc_models %>% filter(Contributor == 'HCI')

| ...1 | PDXSource | Contributor | ContributorPDX.ID | PDMR.Patient.ID | Gender | CTEP.SDCCode | CTEP.SDCDescription | DiagnosisSubtype | Disease.BodyLocation | ⋯ | Date.ofDiagnosis | Has.KnownMetastaticDisease | Grade.StageInformation | PatientNotes | Molecular.andIHC.Data | Has.Smoked100.Cigarettes | Race | Ethnicity | AdditionalMedicalHistory | Submission |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | ⋯ | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dttm>` |
| 68 | PDXNet Consortium Members | HCI | HCI-004 | K92478 | Female | 10006190 | Invasive breast carcinoma | TNBC | Breast | ⋯ | 39995 | Not Reported | None Provided | Tumor Grade/Stage: Stage IIA, Grade III, Poorly differentiated | ER-, PR-, Her2- | No | White | Hispanic or Latino | - | 2021-09-02 |
| 69 | PDXNet Consortium Members | HCI | HCI-008 | K88751 | Female | 10006190 | Invasive breast carcinoma | - | Breast | ⋯ | 39083 | Yes | Stage | Tumor Grade/Stage: Stage IV Location of known metastases: Pleura | ER-, PR-, Her2+ | No | White | Not Hispanic or Latino | - | 2021-09-02 |
| 70 | PDXNet Consortium Members | HCI | HCI-014 | K39284 | Female | 10006190 | Invasive breast carcinoma | TNBC, Infiltrating lobular adenocarcinoma | Breast | ⋯ | 39904 | Yes | Grade, Stage | Tumor Grade/Stage: Stage IV, Grade II Location of known metastases: Bone, Pleura | ER-, PR-, HER2-; Her2 IHC (2+); Her2 not amplified (FISH) | No | White | Not Provided | - | 2021-09-02 |
| 71 | PDXNet Consortium Members | HCI | HCI-017 | K18605 | Female | 10006190 | Invasive breast carcinoma | ductal | Breast | ⋯ | 41153 | Yes | None Provided | Location of known metastases: Brain, bone, lung | ER+, PR+, Her2- | No | White | Not Hispanic or Latino | - | 2021-09-02 |
| 72 | PDXNet Consortium Members | HCI | HCI-018 | K80651 | Female | 10006190 | Invasive breast carcinoma | Invasive lobular carcinoma with focal signet ring features | Breast | ⋯ | 37012 | Yes | Stage | Tumor Grade/Stage: Stage IV Location of known metastases: Brain, Bone, Ovary | ER+, PR-, Her2- | No | Not Provided | Not Provided | Disease Progression: Invasive breast carcinoma, Stage IIIA (2001); Invasive lobular carcinoma, Stage IV (2003) | 2021-09-02 |
| 73 | PDXNet Consortium Members | HCI | HCI-019 | K18234 | Female | 10006190 | Invasive breast carcinoma | TNBC, IDC with apocrine features | Breast | ⋯ | 41456 | Not Reported | None Provided | Tumor Stage/Grade: poorly differentiated | ER- (1%), PR- (1%), Her2- | Not Provided | Not Provided | Hispanic or Latino | - | 2019-05-07 |
| 74 | PDXNet Consortium Members | HCI | HCI-023 | K65121 | Female | 10006190 | Invasive breast carcinoma | TNBC | Breast | ⋯ | 41640 | Yes | None Provided | Tumor Grade/Stage: Stage IIA, Grade II, Poorly differentiated Location of known metastases: Brain, Lung, Bone | ER-, PR-, Her2- | No | White | Not Hispanic or Latino | - | 2021-09-02 |
| 75 | PDXNet Consortium Members | HCI | HCI-028LV | K71592 | Female | 10006190 | Invasive breast carcinoma | TNBC, adenocarcinoma | Breast | ⋯ | 40513 | Yes | Stage | Tumor Grade/Stage: Stage IV Location of known metastases: bone, pleura, brain | ER-, PR-, Her2- | Not Provided | White | Not Hispanic or Latino | History of: ER+ (75%), PR-, Her2- in primary tumor and ER-, PR-, Her2+ in bone metastases; reported history of ER+, PR+, Her2- breast cancer micropapillary type with apocrine features | 2021-09-02 |
| 76 | PDXNet Consortium Members | HCI | HCI-031 (153-M) | K61826 | Female | 10006190 | Invasive breast carcinoma | TNBC, lobular | Breast | ⋯ | 39845 | Yes | None Provided | Location of known metastases: bones, liver, ovary, fallopian tubes, pleural effusion, brain | ER-, PR-, Her2- | No | White | Not Hispanic or Latino | History of: ER+, PR+, Her2- lobular breast cancer | 2021-09-02 |
| 77 | PDXNet Consortium Members | HCI | HCI-031OV (305-X) | K61826 | Female | 10006190 | Invasive breast carcinoma | TNBC, lobular | Breast | ⋯ | 39845 | Yes | None Provided | Location of known metastases: bones, liver, ovary, fallopian tubes, pleural effusion, brain | ER-, PR-, Her2- | No | White | Not Hispanic or Latino | History of: ER+, PR+, Her2- lobular breast cancer | 2021-09-02 |
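Scanning the printed rows works, but the same check can also be made programmatically so that a badly formatted ID stops the notebook instead of slipping through. A minimal sketch, assuming a correctly formatted HCI ID is simply one that begins with `HCI-`; the small tibble `models` is a hypothetical stand-in for `pdtc_models` so the example runs on its own.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical stand-in for pdtc_models after the HCI fix
models <- tibble(
  Contributor = c('MDACC', 'HCI', 'HCI', 'HCI'),
  ContributorPDX.ID = c('B8174', 'HCI-004', 'HCI-028LV', 'HCI-031OV (305-X)')
)

# Keep only HCI rows whose ID does not start with "HCI-" (i.e. still badly formatted)
bad_hci <- models %>%
  filter(Contributor == 'HCI', !str_detect(ContributorPDX.ID, '^HCI-'))

# Fail loudly if any HCI ID is still missing the hyphen
stopifnot(nrow(bad_hci) == 0)
```

In the notebook itself, `models` would be replaced by `pdtc_models`.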


In [8]:
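# Export
# Restore the blank header on the first column, which read_csv() renamed to `...1`,
# so the exported file has the same layout as the input sheet
colnames(pdtc_models)[1] <- ''
write_csv(pdtc_models, '2022-02-18_pdxnet_portal_pdtc_models.csv')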
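The `colnames(pdtc_models)[1] <- ''` line exists because `read_csv()` renamed the blank first header to `...1` on import (see the `New names` message when the metadata was loaded). A round-trip sketch on a small hypothetical file in a temporary directory shows that blanking the name before `write_csv()` yields an export whose first header reads back exactly as the original's did; the file contents and paths here are illustrative assumptions.

```r
library(readr)

# Hypothetical three-line file mimicking the models sheet: the first header cell is blank
in_path <- tempfile(fileext = '.csv')
writeLines(c(',Contributor,ContributorPDX.ID', '1,HCI,HCI004', '2,MDACC,B8174'), in_path)

# readr repairs the blank name to `...1` (and prints a "New names" message)
models <- read_csv(in_path, show_col_types = FALSE)
names(models)[1]
# returns "...1"

# Blank the name again before export, as in the notebook
colnames(models)[1] <- ''
out_path <- tempfile(fileext = '.csv')
write_csv(models, out_path)

# Reading the export back repairs the blank name to `...1` again,
# i.e. the exported file has a blank first header like the input
reread <- read_csv(out_path, show_col_types = FALSE)
names(reread)[1]
# returns "...1"
```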