Bases: xml.etree.ElementTree.Element
This is a collection of methods that make ET.Element instances easier to handle. They are monkeypatched onto the class object itself.
Gets only the first subtag.
Gets only the first subtag.
ns_strip(ns='{http://uniprot.org/uniprot}')
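As a rough illustration of what stripping the UniProt namespace from a tag involves, the sketch below uses a standalone helper rather than a patched method. The name strip_namespace and the sample XML are hypothetical; only the default namespace string is taken from the signature above.

```python
import xml.etree.ElementTree as ET

# Default namespace copied from the ns_strip signature above.
UNIPROT_NS = '{http://uniprot.org/uniprot}'


def strip_namespace(tag: str, ns: str = UNIPROT_NS) -> str:
    """Return the tag without its leading namespace, if it has one."""
    # Hypothetical helper, not the library's own ns_strip.
    return tag[len(ns):] if tag.startswith(ns) else tag


# ElementTree stores namespaced tags as '{uri}localname'.
element = ET.fromstring(
    '<entry xmlns="http://uniprot.org/uniprot"><accession>X00001</accession></entry>'
)
short_tag = strip_namespace(element.tag)
child_tag = strip_namespace(element[0].tag)
```

Tags that do not carry the namespace are returned unchanged, so the helper is safe to call on any element.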
Bases: object
This is just a container.
Given the list of genes in seqdex.json, run a BLAST search against the pdbaa database from the NCBI FTP site. :return:
This function is not implemented yet. It is intended as a fix for ProtParam.ProteinAnalysis().protein_scale when used with the DIWV scale. Because DIWV requires knowledge of the preceding amino acid, the following call fails:
    p = ProtParam.ProteinAnalysis(sequence)
    p.protein_scale(ProtParamData.DIWV, window=9, edge=.4)
This function is meant to be the replacement. :param sequence: sequence to score :type sequence: str :return: DIWV score. :rtype: list[int]
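Since the function is not implemented, the following is only a sketch of how a pair-based scale could be scored over a sliding window. It assumes a nested table indexed as weights[previous][current], which is how a dipeptide scale differs from the single-residue scales that protein_scale expects. The table values are invented for illustration and are not the real DIWV weights, the name dipeptide_window_scores is hypothetical, and the edge weighting is left out.

```python
from typing import Dict, List

# Invented toy table, indexed as weights[previous_residue][current_residue].
# The real DIWV table covers every pair of the 20 amino acids.
TOY_WEIGHTS: Dict[str, Dict[str, float]] = {
    'A': {'A': 1.0, 'G': 2.0},
    'G': {'A': 3.0, 'G': 4.0},
}


def dipeptide_window_scores(sequence: str,
                            weights: Dict[str, Dict[str, float]],
                            window: int = 9) -> List[float]:
    """Sum the pair weights inside each sliding window of `window` residues."""
    # Score every consecutive residue pair once.
    pair_scores = [weights[a][b] for a, b in zip(sequence, sequence[1:])]
    # A window of n residues contains n - 1 consecutive pairs.
    pairs_per_window = window - 1
    return [
        sum(pair_scores[i:i + pairs_per_window])
        for i in range(len(pair_scores) - pairs_per_window + 1)
    ]


scores = dipeptide_window_scores('AAGG', TOY_WEIGHTS, window=3)
```

A sequence shorter than the window yields an empty list rather than raising an error.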
Entry point:
>>> gnomAD().split().write()
Bases: object
Instantiation initialises the settings, which can be changed afterwards. split splits the file into the self.data dict, which has the gene acc id as key and a list of gnomADVariant as value. The bound method write writes out each gnomADVariant as a regular dictionary.
class protein.generate.split_gnomAD.gnomADVariant(symbol, identifier, from_residue, residue_index, to_residue, impact, count, homozygous)
Bases: object
This behaves like the namedtuple but with extra functionality. The instance itself does not get written; the output of to_dict does.
Initialize self. See help(type(self)) for accurate signature.
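The split and write steps described above can be pictured roughly as follows. This is a sketch under assumptions, not the module's code: VariantSketch is a hypothetical stand-in whose fields are copied from the gnomADVariant signature, the grouping key (symbol) and the JSON output are guesses, and the sample values are made up.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass


@dataclass
class VariantSketch:
    """Hypothetical stand-in; field names follow the gnomADVariant signature."""
    symbol: str
    identifier: str
    from_residue: str
    residue_index: int
    to_residue: str
    impact: str
    count: int
    homozygous: int

    def to_dict(self) -> dict:
        # Only this plain dictionary is serialised, never the object itself.
        return asdict(self)


# Made-up variants, for illustration only.
variants = [
    VariantSketch('GENE1', 'var-1', 'A', 10, 'T', 'MODERATE', 3, 0),
    VariantSketch('GENE1', 'var-2', 'R', 42, 'Q', 'MODERATE', 1, 0),
    VariantSketch('GENE2', 'var-3', 'G', 5, 'S', 'LOW', 7, 1),
]

# "split": group the variants per gene (the key choice is an assumption).
data = defaultdict(list)
for variant in variants:
    data[variant.symbol].append(variant)

# "write": convert each variant to a regular dictionary before serialising.
serialisable = {key: [v.to_dict() for v in group] for key, group in data.items()}
text = json.dumps(serialisable)
```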
The old code was slow:
    def get_PTM(self):
        assert self.uniprot, 'Uniprot Acc. required. Kind of.'
        modified_residues = []
        for f in os.listdir(self.settings.reference_folder):
            if '_site_dataset' in f and '.gz' not in f:  # it's a Phosphosite plus entry.
                with open(os.path.join(self.settings.reference_folder, f)) as fh:
                    next(fh)  # date
                    next(fh)  # licence
                    next(fh)  # blankline
                    for row in csv.DictReader(fh, delimiter='\t'):
                        if row['ACC_ID'] == self.uniprot:  ## this will not pick up mice!
                            modified_residues.append(row["MOD_RSD"])
        self.features['PSP_modified_residues'] = modified_residues  ## list of str (e.g. 'K30-m2')
Unfortunately, you first have to agree to the CC-by licence on the PhosphoSitePlus download page at https://www.phosphosite.org/staticDownloads. Then manually download all the files whose names contain _site_dataset.gz. Afterwards, run the settings method retrieve_references.
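The old get_PTM rescans every dataset file for each protein, which is the likely cause of the slowness. The replacement is not shown here, so the following is only one possible approach: read each file once and index the modified residues by accession. The function name index_modified_residues, the three-line header and the tab delimiter are assumptions; only the column names ACC_ID and MOD_RSD come from the old code, and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict


def index_modified_residues(handle, header_lines: int = 3) -> dict:
    """Read one site dataset and map each accession to its modified residues."""
    # Skip the preamble (date, licence, blank line in the old code).
    for _ in range(header_lines):
        next(handle)
    index = defaultdict(list)
    # Assumes a tab-delimited file with ACC_ID and MOD_RSD columns.
    for row in csv.DictReader(handle, delimiter='\t'):
        index[row['ACC_ID']].append(row['MOD_RSD'])
    return index


# Made-up sample standing in for a downloaded, uncompressed dataset file.
sample = io.StringIO(
    "date line\n"
    "licence line\n"
    "\n"
    "ACC_ID\tMOD_RSD\n"
    "X00001\tK30-m2\n"
    "X00001\tS12-p\n"
    "X00002\tT7-p\n"
)
ptm_index = index_modified_residues(sample)
```

Each protein lookup then becomes a dictionary access, for example ptm_index.get(accession, []), instead of a pass over every file.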
Bases: object
Initialize self. See help(type(self)) for accurate signature.
This file parses the UniProt FTP file and can do various things, such as making a smaller file that contains only human entries. Its main purpose, however, is the UniprotMasterReader.convert('uniprot_sprot.xml') method, which generates the required JSON files. In future these will be databases… Be warned that ET.Element is a monkeypatched version.
class protein.generate.uniprot_master_parser.UniprotMasterReader(uniprot_master_file=None, first_n_protein=0, chosen_attribute='uniprot')
Bases: object
See the generator iter_human. NB: ET.Element has been expanded; see help(ElementalExpansion).
THIS IS FOR MICHELANGLO :param uniprot_master_file: :param first_n_protein: set to zero for all, or to an integer to get the first n. :return:
DO NOT USE!!! :param uniprot_master_file: :param first_n_protein: set to zero for all, or to an integer to get the first n. :return:
dataset = Swiss-Prot is better than TrEMBL. Iterates across a LARGE UniProt XML file and returns entries whether or not they are human. :return: ET.Element()
Iterates across a LARGE UniProt XML file and returns only the human entries. :return: ET.Element()
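Iterating over a large XML file without loading it whole is commonly done with ET.iterparse. The sketch below shows that pattern on a tiny in-memory sample; it is not the reader's own implementation. The function names, the sample entries and the choice of NCBI taxonomy id 9606 as the test for human entries are assumptions.

```python
import io
import xml.etree.ElementTree as ET

NS = '{http://uniprot.org/uniprot}'

# Made-up two-entry sample mimicking the layout of the UniProt XML.
SAMPLE = b'''<uniprot xmlns="http://uniprot.org/uniprot">
<entry dataset="Swiss-Prot"><accession>X00001</accession>
<organism><dbReference type="NCBI Taxonomy" id="9606"/></organism></entry>
<entry dataset="Swiss-Prot"><accession>X00002</accession>
<organism><dbReference type="NCBI Taxonomy" id="10090"/></organism></entry>
</uniprot>'''


def is_human(entry) -> bool:
    """Assumed test: the entry's organism has NCBI taxonomy id 9606."""
    for ref in entry.iterfind(f'{NS}organism/{NS}dbReference'):
        if ref.get('type') == 'NCBI Taxonomy' and ref.get('id') == '9606':
            return True
    return False


def iter_entries(source, human_only: bool = False):
    """Yield <entry> elements one at a time from a file-like object."""
    for _event, element in ET.iterparse(source, events=('end',)):
        if element.tag != NS + 'entry':
            continue
        if not human_only or is_human(element):
            # The caller must finish with the element before asking for the next.
            yield element
        # Free the entry's children so memory stays flat on large files.
        element.clear()


all_accessions = [e.findtext(NS + 'accession') for e in iter_entries(io.BytesIO(SAMPLE))]
human_accessions = [
    e.findtext(NS + 'accession')
    for e in iter_entries(io.BytesIO(SAMPLE), human_only=True)
]
```

Clearing each entry after it has been consumed is what keeps memory use bounded; the trade-off is that a yielded element cannot be kept for later use.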
Makes a smaller XML file containing only the human proteome. :param outfile: :return: