<h1 style="background-color:#DC143C; font-family:'Brush Script MT',cursive;color:white;font-size:200%; text-align:center;border-radius: 50% 20% / 10% 40%">Genetics of Amyotrophic Lateral Sclerosis</h1>


Authors: Mehdi Ghasemi and Robert H. Brown, Jr.

Cold Spring Harb Perspect Med. 2018 May; 8(5): a024125.
doi: 10.1101/cshperspect.a024125

Protein homeostasis is disturbed in ALS. Many ALS genes highlight loss of protein quality control as a central feature of the disease. Misfolding of proteins may reflect many factors including aberrant folding and quality control at the ER (disturbed ER-associated degradation or ERAD), inherent instability due to mutations, aggregation during residence in stress granules, or imperfect clearance via proteasomes or autophagy. Self-assembly of misfolded proteins may lead to propagated instability and prion-like spreading of pathology.

![](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5932579/bin/cshperspectmed-PRD-024125_F3.jpg)
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5932579/

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
df = pd.read_csv('../input/end-als/end-als/clinical-data/filtered-metadata/metadata/clinical/ALS_Gene_Mutations.csv', encoding='ISO-8859-2')
pd.set_option('display.max_columns', None)
df.head()

In [None]:
df.shape

In [None]:
df.isnull().sum()

<h1 style="background-color:#DC143C; font-family:'Brush Script MT',cursive;color:white;font-size:200%; text-align:center;border-radius: 50% 20% / 10% 40%">Gene mutations that cause ALS</h1>

GENES and FUNCTIONALITY

ANG (Angiogenin): RNA processing and tRNA modification; vascularization; RNase activity and assembly of stress granules; neurite outgrowth and pathfinding.

C9ORF72 (Chromosome 9 open reading frame 72): Transcription and pre-mRNA splicing regulation; membrane traffic via Rab GTPase family.

FUS (fused in sarcoma): Transcription and pre-mRNA splicing regulation; miRNA processing; mRNA transport and stabilization; maintenance of genomic integrity; regulating protein synthesis at synapse.

mutSOD1 (mutant superoxide dismutase 1)

SOD1 (Major cytosolic antioxidant): NCI; NII; DN; GCI; aggregates—p62, C9ORF72, ubiquilin 2, others; impaired axonal transport, mitochondrial function; disturbed dendritic arborization of neurons; oxidative stress-related neuronal toxicity.

TDP-43 (TAR DNA-binding protein 43): Transcription and pre-mRNA splicing regulation; miRNA biogenesis; RNA transport and stabilization; translational regulation of ApoE-II and CFTR.

VCP or p97: Protein degradation via UPS, autophagy, and the ER; membrane fusion.

VAPB: Regulation of ER–Golgi transport and secretion

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5932579/

# Proportion of ALS explained by the four most commonly mutated genes in Asian and European populations. Data adapted from Zou et al. (2017).

![](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6909825/bin/fnins-13-01310-g001.jpg)
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6909825/

In [None]:
# Let's first handle numerical features with NaN values
# (> 0, so that a column with a single missing value is not skipped)
numerical_nan = [feature for feature in df.columns if df[feature].isna().sum() > 0 and df[feature].dtype != 'O']
numerical_nan

In [None]:
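## Replacing the numerical missing values

for feature in numerical_nan:
    ## Replace with the median, which is less sensitive to outliers than the mean
    median_value = df[feature].median()

    # Assign the result back: fillna(..., inplace=True) on a selected column is a
    # chained assignment that warns in pandas 2.x and does not update df under copy-on-write
    df[feature] = df[feature].fillna(median_value)

df[numerical_nan].isnull().sum()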
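
The choice of the median in the cell above can be illustrated with a small toy column. This is only a sketch: the values below are hypothetical and do not come from `ALS_Gene_Mutations.csv`. They are chosen so that a single outlier pulls the mean far away from the typical value, while the median stays close to it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column: three typical values, one outlier, two gaps
toy = pd.Series([1.0, 2.0, 3.0, np.nan, 100.0, np.nan], name="toy_feature")

mean_value = toy.mean()      # NaN is skipped; the outlier 100.0 dominates the result
median_value = toy.median()  # NaN is skipped; middle of the observed values

# Impute the gaps both ways to compare the filled-in values
filled_mean = toy.fillna(mean_value)
filled_median = toy.fillna(median_value)

print(f"mean used for imputation:   {mean_value}")
print(f"median used for imputation: {median_value}")
print(filled_median.tolist())
```

With the mean, the two gaps would be filled with a value that none of the typical observations come near; with the median, they are filled with a value between the typical observations.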

# Angiogenin and Ribonuclease 4

"Angiogenin (ANG, or ribonuclease 5) is a 123-amino-acid protein with multiple functions including stimulation of angiogenesis, hydrolysis of nucleic acids, and context-dependent activation of proteases." 

"In conditions of cellular stress, ANG induces expression of a class of tRNAs (tiRNAs, or transfer RNA-derived, stress-induced, small RNAs) that suppress protein synthesis, presumably thereby conserving cellular energy reserves. Because it has neuroprotective properties, it is of considerable interest that rare variants in ANG are overrepresented in both ALS (sALS and in some small families) and Parkinson's disease."

"Ribonuclease 4 is a paralog of ANG that also has angiogenic and neuroprotective properties; one study, as yet unconfirmed, has described a hypofunctional missense mutation in this enzyme that is more abundant in ALS than controls."

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5932579/

In [None]:
# categorical features with missing values
categorical_nan = [feature for feature in df.columns if df[feature].isna().sum()>0 and df[feature].dtypes=='O']
print(categorical_nan)

<h1 style="background-color:#DC143C; font-family:'Brush Script MT',cursive;color:white;font-size:200%; text-align:center;border-radius: 50% 20% / 10% 40%">ALS Genetics, Mechanisms, and Therapeutics: Where Are We Now?</h1>

Authors: Rita Mejzini, Loren L. Flynn, Ianthe L. Pitout, Sue Fletcher, Steve D. Wilton, and P. Anthony Akkari.

Front Neurosci. 2019; 13: 1310.
Published online 2019 Dec 6. doi: 10.3389/fnins.2019.01310

"Although more than 50 potentially causative or disease-modifying genes have been identified, pathogenic variants in SOD1, C9ORF72, FUS, and TARDBP occur most frequently with disease causing variants in other genes being relatively uncommon."

"An increase in rare variants, many of unknown significance, has been found in the untranslated regions of known disease-causing genes including SOD1, TARDBP, FUS, VCP, OPTN and UBQLN2, highlighting the potential importance of regulatory gene regions when determining disease pathogenesis and making genetic diagnoses."

"C9ORF72 and ATXN2 variants for example show incomplete penetrance, with symptoms not always manifesting in mutation carriers."

"A large proportion of the genetic risk for sALS remains elusive; this has meant much research to date has focused on understanding how variations and differences in expression of known ALS-linked genes lead to disease. SOD1, TARDBP, FUS, and C9ORF72 have been most extensively characterized."

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6909825/

In [None]:
# replacing missing values in categorical features
for feature in categorical_nan:
    df[feature] = df[feature].fillna('None')

In [None]:
df[categorical_nan].isna().sum()

# SOD1 Disease Mechanisms

"Variations in SOD1 have been associated with a decrease in enzyme activity of 50–80%, leading to early propositions that disease was conferred through a loss of dismutase activity. However, a later study showed that dismutase activity did not correlate with disease severity, indicating that a toxic gain of function mechanism might be at play."

"Additionally, non-native formations of wild-type SOD1 have been detected in small granular SOD1-immunoreactive inclusions in the motor neurons of sALS patients without pathogenic SOD1 variants and in patients carrying the C9ORF72 repeat expansion and pathogenic variants in other ALS-associated genes."

"This suggests that misfolding of wild-type SOD1 may be deleterious or be part of a common downstream event in ALS progression."

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6909825/

In [None]:
!pip install dabl
import dabl

# TDP-43

"Histological examinations of spinal cord samples had revealed that neuronal cytoplasmic ubiquitinated inclusions were present in the majority of ALS patients. In 2006, a shift in the understanding of ALS pathogenesis occurred with the discovery that the main component of the ubiquitinated protein aggregates found in sALS patients was TAR DNA-binding protein 43 (TDP-43). Further histological studies have since confirmed that TDP-43 is present in the cytoplasmic aggregates of the majority of ALS patients including sporadic cases without pathogenic variants in the TARDBP gene, and in those with C9ORF72 hexanucleotide repeat expansions. The aggregation of TDP-43 in ubiquitin-positive cytoplasmic neuronal inclusions in the brain and spinal cord is now considered a pathological hallmark of ALS."

"TDP-43 is a DNA/RNA binding protein composed of 414 amino acids, encoded by the TARDBP gene. Although usually concentrated in the nucleus, TDP-43 contains both a nuclear localization signal and a nuclear export signal and shuttles back and forth between the nucleus and cytoplasm. TDP-43 functions as a regulator of gene expression and is involved in several RNA processing steps with roles in pre-mRNA splicing, regulation of mRNA stability, mRNA transport, translation, and the regulation of non-coding RNAs."
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6909825/

In [None]:
dabl.detect_types(df)

# C9orf72


In 2011, three landmark studies identified the most common cause of ALS and ALS-FTD: an expansion of a hexanucleotide intronic repeat (GGGGCC) within the first intron of a poorly understood gene, termed C9orf72 (chromosome 9 open reading frame 72).

Clinical diagnosis presently involves repeat primed polymerase chain reaction (PCR) coupled with Southern blotting to define the size of the expansion. Although the most common repeat size is a triplet of the hexanucleotides, controls may have up to 25 or 30 repeats.

By contrast, the disease alleles may have hundreds or even thousands of the hexanucleotide repeats. These C9orf72 intronic expansions underlie 45%–50% of fALS and 20%–25% of familial FTD cases in the United States. They also account for 5%–10% of sporadic ALS and FTLD. There is a tendency for bulbar ALS to be more common in the C9-expanded populations.

Little is known about the normal function of the C9orf72 gene.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5932579/
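
The repeat sizes described above can be made concrete with a toy sketch that counts back-to-back GGGGCC units in a sequence string and compares the count with a cutoff. Everything here is an assumption for illustration: the function names, the toy sequences, and the cutoff of 30 (the text only says controls may have "up to 25 or 30" repeats). As the text notes, clinical sizing relies on repeat-primed PCR and Southern blotting, not on string matching.

```python
import re

REPEAT_UNIT = "GGGGCC"
# Assumed cutoff for illustration only; not a diagnostic threshold
ASSUMED_CONTROL_MAX = 30


def longest_repeat_run(sequence: str, unit: str = REPEAT_UNIT) -> int:
    """Return the largest number of consecutive copies of `unit` in `sequence`."""
    # Each match is one uninterrupted run of the repeat unit
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


def classify_allele(repeat_count: int, control_max: int = ASSUMED_CONTROL_MAX) -> str:
    """Label a repeat count relative to the assumed control range."""
    return "expanded" if repeat_count > control_max else "within control range"


# Hypothetical toy sequences: short flanks around 3 and 400 repeat units
toy_control = "ACTA" + REPEAT_UNIT * 3 + "TTAGC"
toy_expanded = "ACTA" + REPEAT_UNIT * 400 + "TTAGC"

for name, seq in [("toy_control", toy_control), ("toy_expanded", toy_expanded)]:
    count = longest_repeat_run(seq)
    print(name, count, classify_allele(count))
```

The 3-unit and 400-unit toy alleles mirror the contrast drawn in the text between the most common repeat size and disease alleles carrying hundreds of repeats.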

In [None]:
dabl.plot(df, target_col="c9orf72")

# FUS

"In 2009, pathogenic variants in the gene encoding another RNA-binding protein, fused in sarcoma (FUS), were identified in a subset of ALS patients. FUS variants are associated with early onset and juvenile ALS. FUS-ALS is characterized by pathological FUS aggregation, generally reported to occur only in patients with pathogenic variants in the FUS gene. TDP-43 aggregation is not commonly seen in FUS-ALS patients suggesting that the FUS disease pathway is independent of TDP-43."

"FUS encodes a ubiquitously expressed 526 amino acid protein belonging to the FET family of RNA binding proteins. FUS is predominantly localized to the nucleus under normal physiological conditions but crosses over to the cytoplasm, functioning in nucleocytoplasmic transport. FUS shares many physiological roles with TDP-43; playing a role in several aspects of gene expression including transcription, pre-mRNA splicing, RNA transport and translation regulation. Although they share many similarities, TDP-43 and FUS regulate different RNA targets and show different sequence binding specificity. Additionally, FUS is involved in DNA repair mechanisms including both homologous recombination during DNA double-strand break repair and in non-homologous end joining. FUS also plays a role in the formation of paraspeckles providing cellular defense against various types of stress."

"Over 50 autosomal dominant FUS variants have now been identified in ALS patients."

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6909825/

In [None]:
dabl.plot(df, target_col="mutotsp")

In [None]:
dabl.plot(df, target_col="sod1muta")

<h1 style="background-color:#DC143C; font-family:'Brush Script MT',cursive;color:white;font-size:200%; text-align:center;border-radius: 50% 20% / 10% 40%">Molecular Approaches to Treat ALS</h1>

"RNA targeted therapeutics have entered a new phase of growth with the two main strategies investigated being short interfering RNA (siRNA) and antisense oligonucleotides (AOs). siRNA are double-stranded RNA molecules that can be used to downregulate the expression of target genes to which they are complementary through interactions with the RNA-induced silencing complex. Although there have been several preclinical investigations on the use of siRNA to target ALS genes, none have yet reached clinical trials. The review focuses on potential AO therapeutics for ALS. AOs are short, single-stranded nucleic acids that can bind to RNA through Watson-Crick base pairing and can alter gene expression through several different mechanisms. They can be used to restore or reduce protein expression or to modify protein isoform production through splice switching strategies."

"Therapeutic use of many RNA analog drugs has been slowed by inefficient and poorly targeted delivery. Unmodified single-stranded RNA is unable to cross the cell membrane efficiently unaided, due to its size and negative charge and is susceptible to rapid degradation by nucleases. A range of chemical modifications has helped to address some of these issues. Synthetic RNA-like drugs are commonly delivered to target cells using a nanoparticle delivery platform (usually a cationic polymer or lipid) or through conjugation to a bioactive ligand or cell penetrating peptide. Achieving effective concentrations of AOs in the organ or tissue of interest can be challenging, although treatment of neurodegenerative diseases with AOs may be advantageous in this regard as the AOs can be administered directly to the CNS via intrathecal administration. Once in the nervous system, AOs are readily taken up by neurons and glia. Furthermore, the blood-brain barrier prevents the dispersion into peripheral tissues and subsequent clearance by the kidney and liver, allowing clinically effective concentrations to be more easily reached. This means smaller doses can be used, minimizing any potential toxicity or systemic off-target effects. Several AO drugs have received FDA approval in recent years to treat a variety of conditions, including Eteplirsen (Exondys 51®) for Duchenne muscular dystrophy, Nusinersen (Spinraza®) for spinal muscular atrophy and Inotersen (Tegsedi®) for hereditary transthyretin-mediated amyloidosis (hATTR). Exploration of AO therapeutics has also begun for the treatment of ALS."

"Pre-clinical studies of AOs targeted to C9ORF72 transcripts have also begun. In the case of C9ORF72, researchers have been able to exploit the location of the pathogenic hexanucleotide expansion that occurs in an intron between two alternatively used first exons. AOs utilizing RNase H mediated degradation that targets only HRE containing transcripts or all C9ORF72 transcripts have been developed with both methods reported to reduce RNA foci in patient-derived fibroblasts and iPSCs. HRE targeting AOs have also been able to reduce pathological C9ORF72 RNA in transgenic mice."

"With the scope of RNA therapeutics rapidly expanding and the genetic basis of ALS continuing to be uncovered, AOs may be a promising area for future therapeutic developments for subsets of ALS patients."

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6909825/
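
The Watson-Crick base pairing that lets an AO bind its RNA target, as described in the excerpts above, can be sketched in a few lines. This is a toy illustration only: the function name `antisense_rna` and the target sequence (two GGGGCC units written as RNA) are assumptions, and real AO design involves chemistry, length, and specificity considerations that this sketch ignores.

```python
# Watson-Crick pairing rules for RNA: A pairs with U, G pairs with C
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_rna(target: str) -> str:
    """Return the antisense strand, written 5'->3', for a target RNA given 5'->3'.

    Raises KeyError if the target contains characters other than A, U, G, C.
    """
    # Strands pair antiparallel, so reverse the target before complementing it
    return "".join(RNA_COMPLEMENT[base] for base in reversed(target.upper()))


# Hypothetical toy target: two hexanucleotide repeat units
toy_target = "GGGGCCGGGGCC"
toy_antisense = antisense_rna(toy_target)

print("target   :", toy_target)
print("antisense:", toy_antisense)
```

Applying the function twice returns the original sequence, which is a quick sanity check that the reverse-complement logic is self-consistent.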