diff --git a/README.md b/README.md
index e8de5a3..2ce0c4a 100644
--- a/README.md
+++ b/README.md
@@ -3,6 +3,14 @@
 ARD reduction for HLA with Python
 
+`py-ard` works with Python 3.8 and higher.
+
+## Install from PyPI
+
+```shell
+pip install py-ard
+```
+
 ## Install from source
 
 ```shell
@@ -11,13 +19,6 @@ source venv/bin/activate
 python setup.py install
 ```
 
-
-## Install from PyPi
-
-```shell
-pip install py-ard
-```
-
 ## Testing
 
 To run behavior-driven development (BDD) tests locally via the behave framework, you'll need to set up a virtual
@@ -30,10 +31,29 @@ pip install -r test-requirements.txt
 
 # Running Behave and all BDD tests
 behave
+
+# Run unit tests
+python -m unittest tests.test_pyard
 ```
 
 ## Using `py-ard` from Python code
 
+`py-ard` can be used in a program to reduce/expand HLA GL String representations. If `py-ard` encounters an invalid allele, it raises an exception (e.g. `InvalidAlleleError`) instead of silently returning an empty result.
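+
+For example (a minimal sketch; it assumes the exception classes are importable
+from `pyard.exceptions` and uses a made-up invalid allele):
+
+```python
+import pyard
+from pyard.exceptions import InvalidAlleleError
+
+ard = pyard.ARD()
+try:
+    ard.redux("A*01:99:99:99", "lgx")  # hypothetical invalid allele
+except InvalidAlleleError as e:
+    print(f"Invalid allele: {e}")
+```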
+
 ### Initialize `py-ard`
 
 Import `pyard` package.
 
@@ -42,8 +48,7 @@
 import pyard
 ```
 
-The cache size of pre-computed reductions can be changed from the default of 1000
-
+The cache size of pre-computed reductions can be changed from the default of 1000 (_not working_; this will be fixed in a later release).
 ```python
 pyard.max_cache_size = 1_000_000
 ```
@@ -74,7 +79,7 @@ ard = pyard.ARD()
 
 ### Reduce Typings
 
-Reduce a single locus HLA Typing
+Reduce a single locus HLA Typing.
 
 ```python
 allele = "A*01:01:01"
@@ -107,13 +112,13 @@ ard.redux_gl('B14', 'lg')
 
 ## Valid Reduction Types
 
-|Reduction Type | Description |
-|-------------- |-------------|
-| `G` | Reduce to G Group Level |
-| `lg` | Reduce to 2 field ARD level (append `g`) |
-| `lgx` | Reduce to 2 field ARD level |
-| `W` | Reduce/Expand to 3 field WHO nomenclature level|
-| `exon` | Reduce/Expand to exon level |
+| Reduction Type | Description                                     |
+|----------------|-------------------------------------------------|
+| `G`            | Reduce to G Group Level                         |
+| `lg`           | Reduce to 2 field ARD level (append `g`)        |
+| `lgx`          | Reduce to 2 field ARD level                     |
+| `W`            | Reduce/Expand to 3 field WHO nomenclature level |
+| `exon`         | Reduce/Expand to exon level                     |
 
 # Command Line Tools
 
@@ -160,6 +165,12 @@ $ pyard-import --v2-to-v3-mapping map2to3.csv
 $ pyard-import --db-version 3450 --refresh-mac
 ```
 
+### Show the status of all `py-ard` databases
+
+```shell
+$ pyard-status
+```
+
 ### Reduce a GL String from command line
 
 ```shell
@@ -172,10 +183,6 @@ DRB1*08:01:01G/DRB1*08:02:01G/DRB1*08:03:02G/DRB1*08:04:01G/DRB1*08:05/ ...
 
 $ pyard -v 3290 --gl 'A1' -r lgx # For a particular version of DB
 A*01:01/A*01:02/A*01:03/A*01:06/A*01:07/A*01:08/A*01:09/A*01:10/A*01:12/ ...
 ```
+### Batch Reduce a CSV file
 
-### Show the status of all `py-ard` databases
-
-```shell
-$ pyard-status
-```
-
+`pyard-reduce-csv` can be used to batch process a CSV file with HLA typings. See the [documentation](extras/README.md) for instructions on how to configure and run it.
diff --git a/extras/README.md b/extras/README.md
index 3092e3f..25a2dfc 100644
--- a/extras/README.md
+++ b/extras/README.md
@@ -4,111 +4,144 @@
 
 **Example Scripts to batch reduce HLA typings from a CSV File**
 
-`pyard-reduce-csv` command can be used with a config file(that describes ways
-to reduce the file) can be used to take a CSV file with HLA typing data and
-reduce certain columns and produce a new CSV or an Excel file. 
-
-Install `py-ard` and use `pyard-reduce-csv` command specifying the changes in a JSON
-config file and running `pyard-reduce-csv -c ` will produce result based
-on the configuration in the config file.
+The `pyard-reduce-csv` command can be used with a config file (that describes how to reduce the file) to take a
+CSV file with HLA typing data, reduce certain columns, and produce a new CSV or Excel file.
+Install `py-ard` and run `pyard-reduce-csv -c <config-file>` to produce a result file based on the
+configuration in the config file.
 
 See [Example JSON config file](reduce_conf.json).
 
 ### Input CSV filename
+
 `in_csv_filename` Directory path and file name of the Input CSV file
 
 ### Output CSV filename
+
 `out_csv_filename` Directory path and file name of the Reduced Output CSV file
 
 ### CSV Columns to read
+
 `columns_from_csv` The column names to read from CSV file
 
 ```json
 [
-    "nmdp_id",
-    "r_a_typ1",
-    "r_a_typ2",
-    "r_b_typ1",
-    "r_b_typ2",
-    "r_c_typ1",
-    "r_c_typ2",
-    "r_drb1_typ1",
-    "r_drb1_typ2",
-    "r_dpb1_typ1",
-    "r_dpb1_typ2"
-  ]
+  "nmdp_id",
+  "r_a_typ1",
+  "r_a_typ2",
+  "r_b_typ1",
+  "r_b_typ2",
+  "r_c_typ1",
+  "r_c_typ2",
+  "r_drb1_typ1",
+  "r_drb1_typ2",
+  "r_dpb1_typ1",
+  "r_dpb1_typ2"
+]
 ```
 
 ### CSV Columns to reduce
+
 `columns_to_reduce_in_csv` List of columns which have typing information and need to be reduced.
 
-**NOTE**: The locus is the 2nd term in the column name
-E.g., for column `column R_DRB1_type1`, `DPB1` is the locus name
+**Important**: The locus is the 2nd term in the column name, separated by `_`. The program uses this to figure out the
+locus name for the typings in that column (see the sketch after the list below).
+
+E.g., for column `R_DRB1_type1`, `DRB1` is the locus name
 
 ```json
 [
-    "r_a_typ1",
-    "r_a_typ2",
-    "r_b_typ1",
-    "r_b_typ2",
-    "r_c_typ1",
-    "r_c_typ2",
-    "r_drb1_typ1",
-    "r_drb1_typ2",
-    "r_dpb1_typ1",
-    "r_dpb1_typ2"
-  ],
+  "r_a_typ1",
+  "r_a_typ2",
+  "r_b_typ1",
+  "r_b_typ2",
+  "r_c_typ1",
+  "r_c_typ2",
+  "r_drb1_typ1",
+  "r_drb1_typ2",
+  "r_dpb1_typ1",
+  "r_dpb1_typ2"
+]
 ```
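+
+For instance, this is roughly how a locus can be derived from a column name under this convention (an
+illustrative sketch, not the actual implementation):
+
+```python
+def locus_from_column(column_name: str) -> str:
+    # take the 2nd `_`-separated term and upper-case it: "r_drb1_typ1" -> "DRB1"
+    return column_name.split("_")[1].upper()
+
+assert locus_from_column("r_drb1_typ1") == "DRB1"
+```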
 
-
 ### Redux Options
 
-`redux_type` Reduction Type
-Valid Options: `G`, `lg` and `lgx`
+`redux_type` Reduction Type
+
+Valid Options are:
 
-### Compression Options
-`apply_compression` Compression to use for output file
-
-Valid options: `'gzip'`, `'zip'` or `null`
+| Reduction Type | Description                                     |
+|----------------|-------------------------------------------------|
+| `G`            | Reduce to G Group Level                         |
+| `lg`            | Reduce to 2 field ARD level (append `g`)       |
+| `lgx`          | Reduce to 2 field ARD level                     |
+| `W`            | Reduce/Expand to 3 field WHO nomenclature level |
+| `exon`         | Reduce/Expand to exon level                     |
 
-### Verbose log Options
-`log_comment` Show verbose log ?
-Valid options: `true` or `false`
+### Kinds of typings to reduce
 
-### Types of typings to reduce
 ```json
-    "verbose_log": true,
-    "reduce_serology": false,
-    "reduce_v2": true,
-    "reduce_3field": true,
-    "reduce_P": true,
-    "reduce_XX": false,
-    "reduce_MAC": true,
+"reduce_serology": false,
+"reduce_v2": true,
+"convert_v2_to_v3": false,
+"reduce_3field": true,
+"reduce_P": true,
+"reduce_XX": false,
+"reduce_MAC": true,
 ```
 
 Valid options: `true` or `false`
 
-
 ### Locus Name in Allele
-`locus_in_allele_name`
-Is locus name present in allele ? E.g. A*01:01 vs 01:01
+
+`locus_in_allele_name`
+Is the locus name present in the allele name? E.g., `A*01:01` vs `01:01`
 
 Valid options: `true` or `false`
 
 ### Output Format
+
 `output_file_format` Format of the output file
 
-Valid options: `csv` or `xlsx`
+Valid options: `csv` or `xlsx`
+
+For Excel output, the `openpyxl` library needs to be installed. Install it with:
+```shell
+pip install openpyxl
+```
 
-### Create New Column
-`new_column_for_redux` Add a separate column for processed column or replace
-the current column. Creates a `reduced_` version of the column.
+### Create New Column
+
+`new_column_for_redux` Whether to add a separate column for the reduced version or replace the current column.
+When `true`, a new `reduced_` version of the column is created; otherwise, the same column is replaced with the reduced version.
 
 Valid options: `true`, `false`
 
 ### Map to DRBX
-`map_drb345_to_drbx` Map to DRBX Typings based on DRB3, DRB4 and DRB5 typings.
+
+`map_drb345_to_drbx` Map to DRBX Typings based on DRB3, DRB4 and DRB5 typings using the [WMDA method](https://www.nature.com/articles/1705672).
 
 Valid options: `true` or `false`
+
+### Compression Options
+
+`apply_compression` Compression to use for the output file. Applies only to CSV files.
+
+Valid options: `'gzip'`, `'zip'` or `null`
+
+### Verbose log Options
+
+`verbose_log` Show verbose log?
+
+Valid options: `true` or `false`
\ No newline at end of file
diff --git a/extras/reduce_conf.json b/extras/reduce_conf.json
index 901f85a..42ffc28 100644
--- a/extras/reduce_conf.json
+++ b/extras/reduce_conf.json
@@ -27,7 +27,6 @@
     "r_dpb1_typ2"
   ],
   "redux_type": "lgx",
-  "apply_compression": "gzip",
   "reduce_serology": false,
   "reduce_v2": true,
   "convert_v2_to_v3": false,
@@ -40,5 +39,6 @@
   "output_file_format": "csv",
   "new_column_for_redux": false,
   "map_drb345_to_drbx": false,
+  "apply_compression": "gzip",
   "verbose_log": true
 }
\ No newline at end of file
diff --git a/pyard/data_repository.py b/pyard/data_repository.py
index 0a8aa99..46bfc69 100644
--- a/pyard/data_repository.py
+++ b/pyard/data_repository.py
@@ -387,15 +387,28 @@ def to_serological_name(locus_name: str):
 def generate_serology_mapping(db_connection: sqlite3.Connection, imgt_version):
     if not db.table_exists(db_connection, 'serology_mapping'):
-        # Load WMDA serology mapping data
+        # Read the `rel_dna_ser.txt` file that contains alleles and their
+        # serological equivalents.
+        #
+        # The fields of the Allele -> Serology mapping file are:
+        #   Locus  - HLA Locus
+        #   Allele - HLA Allele Name
+        #   USA    - Unambiguous Serological Antigen associated with allele
+        #   PSA    - Possible Serological Antigen associated with allele
+        #   ASA    - Assumed Serological Antigen associated with allele
+        #   EAE    - Expert Assigned Exceptions in search determinants of
+        #            some registries
+        #
+        # EAE is ignored when generating the serology map.
         rel_dna_ser_url = f'{IMGT_HLA_URL}{imgt_version}/wmda/rel_dna_ser.txt'
+        # Load WMDA serology mapping data from URL
         df_sero = pd.read_csv(rel_dna_ser_url, sep=';', skiprows=6,
-                              names=['Locus', 'Allele', 'USA', 'PSA', 'ASA'],
+                              names=['Locus', 'Allele', 'USA', 'PSA', 'ASA', 'EAE'],
                               index_col=False)
 
         # Remove 0 and ? from USA
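+
+    # Illustrative examples of the `sorted_unique_gl` helper below
+    # (not executed; alleles are made up):
+    #   sorted_unique_gl(["A*01:02", "A*01:01/A*01:02"], "/") => "A*01:01/A*01:02"
+    #   sorted_unique_gl(["B*07:02", "B*07:02"], "+") => "B*07:02+B*07:02"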
        df_sero = df_sero[(df_sero['USA'] != '0') & (df_sero['USA'] != '?')]
-        df_sero['Allele'] = df_sero['Locus'] + df_sero['Allele']
+        df_sero['Allele'] = df_sero.loc[:, 'Locus'] + df_sero.loc[:, 'Allele']
 
         usa = df_sero[['Locus', 'Allele', 'USA']].dropna()
         usa['Sero'] = usa['Locus'] + usa['USA']
diff --git a/pyard/pyard.py b/pyard/pyard.py
index 6c97eab..932fcb2 100644
--- a/pyard/pyard.py
+++ b/pyard/pyard.py
@@ -24,7 +24,7 @@ import functools
 import sys
 import re
-from typing import Iterable, Literal
+from typing import Iterable, Literal, List
 
 from . import db
 from . import data_repository as dr
@@ -217,6 +217,36 @@ def redux(self, allele: str, redux_type: VALID_REDUCTION_TYPES) -> str:
         else:
             raise InvalidAlleleError(f"{allele} is an invalid allele.")
+
+    def sorted_unique_gl(self, gls: List[str], delim: str) -> str:
+        """
+        Make a list of sorted, unique GL Strings separated by delim.
+        As the list may itself contain elements that are separated by the
+        delimiter, split the elements first and then make them unique.
+        :param gls: List of GL strings that need to be joined by delim
+        :param delim: Delimiter of concern
+        :return: a GL string, sorted and made of unique GL strings
+        """
+        if delim == '~':
+            # No need to sort
+            return delim.join(gls)
+
+        if delim == "+":
+            # No need to make unique, e.g. homozygous cases are valid for SLUGs
+            return delim.join(sorted(gls, key=functools.cmp_to_key(smart_sort_comparator)))
+
+        # Generate a unique list over a delimiter
+        # e.g. [A, A/B] => [ A, B ] for / delimiter
+        all_gls = []
+        for gl in gls:
+            all_gls += gl.split(delim)
+        unique_gls = set(all_gls)
+        return delim.join(sorted(unique_gls, key=functools.cmp_to_key(smart_sort_comparator)))
+
 
     @functools.lru_cache(maxsize=max_cache_size)
     def redux_gl(self, glstring: str, redux_type: VALID_REDUCTION_TYPES) -> str:
         """
@@ -236,23 +262,19 @@ def redux_gl(self, glstring: str, redux_type: VALID_REDUCTION_TYPES) -> str:
             raise InvalidTypingError(f"{glstring} is not a valid typing.")
 
         if re.search(r"\^", glstring):
-            return "^".join(sorted(set([self.redux_gl(a, redux_type) for a in glstring.split("^")]),
-                                   key=functools.cmp_to_key(smart_sort_comparator)))
+            return self.sorted_unique_gl([self.redux_gl(a, redux_type) for a in glstring.split("^")], "^")
 
         if re.search(r"\|", glstring):
-            return "|".join(sorted(set([self.redux_gl(a, redux_type) for a in glstring.split("|")]),
-                                   key=functools.cmp_to_key(smart_sort_comparator)))
+            return self.sorted_unique_gl([self.redux_gl(a, redux_type) for a in glstring.split("|")], "|")
 
         if re.search(r"\+", glstring):
-            return "+".join(sorted([self.redux_gl(a, redux_type) for a in glstring.split("+")],
-                                   key=functools.cmp_to_key(smart_sort_comparator)))
+            return self.sorted_unique_gl([self.redux_gl(a, redux_type) for a in glstring.split("+")], "+")
 
         if re.search("~", glstring):
-            return "~".join([self.redux_gl(a, redux_type) for a in glstring.split("~")])
+            return self.sorted_unique_gl([self.redux_gl(a, redux_type) for a in glstring.split("~")], "~")
 
         if re.search("/", glstring):
-            return "/".join(sorted(set([self.redux_gl(a, redux_type) for a in glstring.split("/")]),
-                                   key=functools.cmp_to_key(smart_sort_comparator)))
+            return self.sorted_unique_gl([self.redux_gl(a, redux_type) for a in glstring.split("/")], "/")
 
         # Handle V2 to V3 mapping
         if self.is_v2(glstring):
diff --git a/tests/test_pyard.py b/tests/test_pyard.py
index 7e124a7..5d43299 100644
--- a/tests/test_pyard.py
+++ b/tests/test_pyard.py
@@ -92,7 +92,7 @@ def test_mac_G(self):
 
     def test_xx_code(self):
         expanded_string = """
- 
B*40:01:01G/B*40:01:01G/B*40:01:03G/B*40:02:01G/B*40:03:01G/B*40:04:01G/B*40:05:01G/B*40:06:01G/B*40:07/B*40:08/B*40:09/B*40:10:01G/B*40:11:01G/B*40:12/B*40:13/B*40:14/B*40:15/B*40:16:01G/B*40:18/B*40:19/B*40:20:01G/B*40:21/B*40:22N/B*40:23/B*40:24/B*40:25/B*40:26/B*40:27/B*40:28/B*40:29/B*40:30/B*40:31/B*40:32/B*40:33/B*40:34/B*40:35/B*40:36/B*40:37/B*40:38/B*40:39/B*40:40:01G/B*40:42/B*40:43/B*40:44/B*40:45/B*40:46/B*40:47/B*40:48/B*40:49/B*40:50:01G/B*40:51/B*40:52/B*40:53/B*40:54/B*40:57/B*40:58/B*40:59/B*40:60/B*40:61/B*40:62/B*40:63/B*40:64:01G/B*40:65/B*40:66/B*40:67/B*40:68/B*40:69/B*40:70/B*40:71/B*40:72/B*40:73/B*40:74/B*40:75/B*40:76/B*40:77/B*40:78/B*40:79/B*40:80/B*40:81/B*40:82/B*40:83/B*40:84/B*40:85/B*40:86/B*40:87/B*40:88/B*40:89/B*40:90/B*40:91/B*40:92/B*40:93/B*40:94/B*40:95/B*40:96/B*40:98/B*40:99/B*40:100/B*40:101/B*40:102/B*40:103/B*40:104/B*40:105/B*40:106/B*40:107/B*40:108/B*40:109/B*40:110/B*40:111/B*40:112/B*40:113/B*40:114:01G/B*40:115/B*40:116/B*40:117/B*40:118N/B*40:119/B*40:120/B*40:121/B*40:122/B*40:123/B*40:124/B*40:125/B*40:126/B*40:127/B*40:128/B*40:129/B*40:130/B*40:131/B*40:132/B*40:133Q/B*40:134/B*40:135/B*40:136/B*40:137/B*40:138/B*40:139/B*40:140/B*40:142N/B*40:143/B*40:145/B*40:146/B*40:147/B*40:148/B*40:149/B*40:152/B*40:153/B*40:154/B*40:155:01G/B*40:156/B*40:157/B*40:158/B*40:159/B*40:160/B*40:161/B*40:162/B*40:163/B*40:164/B*40:165/B*40:166/B*40:167/B*40:168/B*40:169/B*40:170/B*40:171/B*40:172/B*40:173/B*40:174/B*40:175/B*40:177/B*40:178/B*40:180/B*40:181/B*40:182/B*40:183/B*40:184/B*40:185/B*40:186/B*40:187/B*40:188/B*40:189/B*40:190/B*40:191/B*40:192/B*40:193/B*40:194/B*40:195/B*40:196/B*40:197/B*40:198/B*40:199/B*40:200/B*40:201/B*40:202/B*40:203/B*40:204/B*40:205/B*40:206/B*40:207/B*40:208/B*40:209/B*40:210/B*40:211/B*40:212/B*40:213:01G/B*40:214/B*40:215/B*40:216N/B*40:217/B*40:218/B*40:219/B*40:220/B*40:222/B*40:223/B*40:224/B*40:225/B*40:226/B*40:227/B*40:228/B*40:230/B*40:231/B*40:232/B*40:233/B*40:234/B*40:235/B*40:237/B*40:238/B*40:239/B*40:240/B*40:242/B*40:243/B*40:244/B*40:245/B*40:246/B*40:248/B*40:249/B*40:250/B*40:251/B*40:252/B*40:253/B*40:254/B*40:255/B*40:256N/B*40:257/B*40:258/B*40:259/B*40:260/B*40:261/B*40:262/B*40:263N/B*40:265N/B*40:266/B*40:268/B*40:269/B*40:270/B*40:271/B*40:273/B*40:274/B*40:275/B*40:276/B*40:277/B*40:279/B*40:280/B*40:281/B*40:282/B*40:283/B*40:284/B*40:285/B*40:286N/B*40:287/B*40:288/B*40:289/B*40:290/B*40:291N/B*40:292/B*40:293/B*40:294/B*40:295/B*40:296/B*40:297/B*40:298/B*40:300/B*40:302/B*40:304/B*40:305/B*40:306/B*40:307/B*40:308/B*40:309/B*40:310/B*40:311/B*40:312/B*40:313/B*40:314/B*40:315/B*40:316/B*40:317/B*40:318/B*40:319/B*40:320/B*40:321/B*40:322/B*40:323/B*40:324/B*40:325/B*40:326/B*40:327/B*40:328/B*40:330/B*40:331/B*40:332/B*40:333/B*40:334/B*40:335/B*40:336/B*40:337N/B*40:339/B*40:340/B*40:341/B*40:342/B*40:343/B*40:344/B*40:345N/B*40:346/B*40:347/B*40:348/B*40:349/B*40:350/B*40:351/B*40:352/B*40:354/B*40:355/B*40:357/B*40:358/B*40:359/B*40:360/B*40:361N/B*40:362/B*40:363/B*40:364/B*40:365/B*40:366/B*40:367/B*40:368/B*40:369/B*40:370/B*40:371/B*40:372N/B*40:373/B*40:374/B*40:375/B*40:376/B*40:377/B*40:378/B*40:380/B*40:381/B*40:382/B*40:385/B*40:388/B*40:389/B*40:390/B*40:391/B*40:392/B*40:393/B*40:394/B*40:396/B*40:397/B*40:398/B*40:399N/B*40:400/B*40:401/B*40:402/B*40:403/B*40:404/B*40:407/B*40:408/B*40:409/B*40:410/B*40:411/B*40:412/B*40:413/B*40:414/B*40:415/B*40:420/B*40:421Q/B*40:422/B*40:423/B*40:424/B*40:426N/B*40:428N/B*40:429/B*40:430/B*40:432/B*40:433/B*40:434/B*40:436/B*40:4
37/B*40:438N/B*40:441/B*40:445/B*40:447/B*40:448/B*40:449/B*40:451/B*40:452/B*40:454/B*40:457/B*40:458/B*40:459/B*40:460/B*40:461/B*40:462/B*40:463/B*40:465/B*40:466/B*40:467/B*40:468/B*40:469/B*40:470/B*40:471/B*40:472/B*40:477/B*40:478/B*40:479/B*40:481N/B*40:482 + B*40:01:01G/B*40:01:03G/B*40:02:01G/B*40:03:01G/B*40:04:01G/B*40:05:01G/B*40:06:01G/B*40:07/B*40:08/B*40:09/B*40:10:01G/B*40:11:01G/B*40:12/B*40:13/B*40:14/B*40:15/B*40:16:01G/B*40:18/B*40:19/B*40:20:01G/B*40:21/B*40:22N/B*40:23/B*40:24/B*40:25/B*40:26/B*40:27/B*40:28/B*40:29/B*40:30/B*40:31/B*40:32/B*40:33/B*40:34/B*40:35/B*40:36/B*40:37/B*40:38/B*40:39/B*40:40:01G/B*40:42/B*40:43/B*40:44/B*40:45/B*40:46/B*40:47/B*40:48/B*40:49/B*40:50:01G/B*40:51/B*40:52/B*40:53/B*40:54/B*40:57/B*40:58/B*40:59/B*40:60/B*40:61/B*40:62/B*40:63/B*40:64:01G/B*40:65/B*40:66/B*40:67/B*40:68/B*40:69/B*40:70/B*40:71/B*40:72/B*40:73/B*40:74/B*40:75/B*40:76/B*40:77/B*40:78/B*40:79/B*40:80/B*40:81/B*40:82/B*40:83/B*40:84/B*40:85/B*40:86/B*40:87/B*40:88/B*40:89/B*40:90/B*40:91/B*40:92/B*40:93/B*40:94/B*40:95/B*40:96/B*40:98/B*40:99/B*40:100/B*40:101/B*40:102/B*40:103/B*40:104/B*40:105/B*40:106/B*40:107/B*40:108/B*40:109/B*40:110/B*40:111/B*40:112/B*40:113/B*40:114:01G/B*40:115/B*40:116/B*40:117/B*40:118N/B*40:119/B*40:120/B*40:121/B*40:122/B*40:123/B*40:124/B*40:125/B*40:126/B*40:127/B*40:128/B*40:129/B*40:130/B*40:131/B*40:132/B*40:133Q/B*40:134/B*40:135/B*40:136/B*40:137/B*40:138/B*40:139/B*40:140/B*40:142N/B*40:143/B*40:145/B*40:146/B*40:147/B*40:148/B*40:149/B*40:152/B*40:153/B*40:154/B*40:155:01G/B*40:156/B*40:157/B*40:158/B*40:159/B*40:160/B*40:161/B*40:162/B*40:163/B*40:164/B*40:165/B*40:166/B*40:167/B*40:168/B*40:169/B*40:170/B*40:171/B*40:172/B*40:173/B*40:174/B*40:175/B*40:177/B*40:178/B*40:180/B*40:181/B*40:182/B*40:183/B*40:184/B*40:185/B*40:186/B*40:187/B*40:188/B*40:189/B*40:190/B*40:191/B*40:192/B*40:193/B*40:194/B*40:195/B*40:196/B*40:197/B*40:198/B*40:199/B*40:200/B*40:201/B*40:202/B*40:203/B*40:204/B*40:205/B*40:206/B*40:207/B*40:208/B*40:209/B*40:210/B*40:211/B*40:212/B*40:213:01G/B*40:214/B*40:215/B*40:216N/B*40:217/B*40:218/B*40:219/B*40:220/B*40:222/B*40:223/B*40:224/B*40:225/B*40:226/B*40:227/B*40:228/B*40:230/B*40:231/B*40:232/B*40:233/B*40:234/B*40:235/B*40:237/B*40:238/B*40:239/B*40:240/B*40:242/B*40:243/B*40:244/B*40:245/B*40:246/B*40:248/B*40:249/B*40:250/B*40:251/B*40:252/B*40:253/B*40:254/B*40:255/B*40:256N/B*40:257/B*40:258/B*40:259/B*40:260/B*40:261/B*40:262/B*40:263N/B*40:265N/B*40:266/B*40:268/B*40:269/B*40:270/B*40:271/B*40:273/B*40:274/B*40:275/B*40:276/B*40:277/B*40:279/B*40:280/B*40:281/B*40:282/B*40:283/B*40:284/B*40:285/B*40:286N/B*40:287/B*40:288/B*40:289/B*40:290/B*40:291N/B*40:292/B*40:293/B*40:294/B*40:295/B*40:296/B*40:297/B*40:298/B*40:300/B*40:302/B*40:304/B*40:305/B*40:306/B*40:307/B*40:308/B*40:309/B*40:310/B*40:311/B*40:312/B*40:313/B*40:314/B*40:315/B*40:316/B*40:317/B*40:318/B*40:319/B*40:320/B*40:321/B*40:322/B*40:323/B*40:324/B*40:325/B*40:326/B*40:327/B*40:328/B*40:330/B*40:331/B*40:332/B*40:333/B*40:334/B*40:335/B*40:336/B*40:337N/B*40:339/B*40:340/B*40:341/B*40:342/B*40:343/B*40:344/B*40:345N/B*40:346/B*40:347/B*40:348/B*40:349/B*40:350/B*40:351/B*40:352/B*40:354/B*40:355/B*40:357/B*40:358/B*40:359/B*40:360/B*40:361N/B*40:362/B*40:363/B*40:364/B*40:365/B*40:366/B*40:367/B*40:368/B*40:369/B*40:370/B*40:371/B*40:372N/B*40:373/B*40:374/B*40:375/B*40:376/B*40:377/B*40:378/B*40:380/B*40:381/B*40:382/B*40:385/B*40:388/B*40:389/B*40:390/B*40:391/B*40:392/B*40:393/B*40:394/B*40:396/B*40:397/B*40:398/B*40:3
99N/B*40:400/B*40:401/B*40:402/B*40:403/B*40:404/B*40:407/B*40:408/B*40:409/B*40:410/B*40:411/B*40:412/B*40:413/B*40:414/B*40:415/B*40:420/B*40:421Q/B*40:422/B*40:423/B*40:424/B*40:426N/B*40:428N/B*40:429/B*40:430/B*40:432/B*40:433/B*40:434/B*40:436/B*40:437/B*40:438N/B*40:441/B*40:445/B*40:447/B*40:448/B*40:449/B*40:451/B*40:452/B*40:454/B*40:457/B*40:458/B*40:459/B*40:460/B*40:461/B*40:462/B*40:463/B*40:465/B*40:466/B*40:467/B*40:468/B*40:469/B*40:470/B*40:471/B*40:472/B*40:477/B*40:478/B*40:479/B*40:481N/B*40:482 """.strip() gl = self.ard.redux_gl('B*40:XX', 'G') self.assertEqual(gl, expanded_string) @@ -139,4 +139,11 @@ def test_invalid_serology(self): self.assertEqual(serology_a10.split('/')[0], 'A*25:01') # And A100 isn't a valid typing with self.assertRaises(InvalidTypingError): - self.ard.redux_gl('A100', 'lgx') \ No newline at end of file + self.ard.redux_gl('A100', 'lgx') + + def test_allele_duplicated(self): + # Make sure the reduced alleles are unique + # https://github.com/nmdp-bioinformatics/py-ard/issues/135 + allele_code = "C*02:ACMGS" + allele_code_rx = self.ard.redux_gl(allele_code, 'lgx') + self.assertEqual(allele_code_rx, 'C*02:02/C*02:10')