<h1 style="background-color:#DC143C; font-family:'Brush Script MT',cursive;color:white;font-size:200%; text-align:center;border-radius: 50% 20% / 10% 40%">Genetics of Amyotrophic Lateral Sclerosis</h1>

Citation: Ghasemi M, Brown RH Jr. Genetics of Amyotrophic Lateral Sclerosis. Cold Spring Harb Perspect Med. 2018;8(5):a024125. Published 2018 May 1. doi:10.1101/cshperspect.a024125

"Investigations of the 10% of ALS cases that are transmitted as dominant traits have revealed numerous gene mutations and variants that either cause these disorders or influence their clinical phenotype. The evolving understanding of the genetic architecture of ALS has illuminated broad themes in the molecular pathophysiology of both familial and sporadic ALS and FTD."

"These central themes encompass disturbances of protein homeostasis, alterations in the biology of RNA binding proteins, and defects in cytoskeletal dynamics, as well as numerous downstream pathophysiological events. Together, these findings from ALS genetics provide new insight into therapies that target genetically distinct subsets of ALS and FTD."

"At autopsy, degeneration of corticospinal (“upper”) and spinal and bulbar (“lower”) motor neurons is often associated with activation of neuroimmune cells within the central nervous system (CNS) (microglia, astrocytes, and oligodendroglia)"

**<span style="color:#DC143C;">Sporadic ALS (sALS) and familial ALS (fALS)</span>**

"Although most cases of ALS are sporadic (sALS), ∼10% are familial (fALS), with autosomal dominant transmission. Rarely, fALS may be transmitted as either X-linked or recessive traits. fALS and sALS share similar clinical presentations; either can develop concurrently with FTD."

"Efforts to identify Mendelian genes whose mutations cause ALS have identified more than 100 gene mutations that increase susceptibility to ALS or modify its phenotype. Multiple genetic variants may interact simultaneously to increase ALS susceptibility; such oligogenic cases of ALS may not appear to be familial in a conventional Mendelian manner yet may underlie the apparently sporadic form of the disease. Analysis of pathways implicated by mutant ALS genes has provided new insights into the pathogenesis of both fALS and sALS, although, regrettably, this has not yet yielded definitive treatments."

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5932579/

In [None]:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objs as go
import plotly.offline as py
import plotly.express as px

# List every file under the read-only input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


**<span style="color:#DC143C;">Column Names:</span>**

aliqcnt = Number of aliquots

aliqvol = Aliquot volume

aliqvolu = Units, mL(1), uL(2)

alqvolot = Volume of last aliquot

dnacnt = Number of DNA Tubes Collected

dnacol = Collection number

dnadn = Status, Collected(1), Not Collected (2)

dnadnsp = Reason not collected

dnadt = Date DNA Collected

resource = RESOURCE ID

smplnygc = Was sample sent to NYGC. Yes(1), No(2)
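Several of the columns above store numeric codes rather than readable values. The sketch below shows one way to attach labels to them. The mappings come from the descriptions in this list; the helper name `decode_columns`, the `_label` suffix, and the demo rows are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Code-to-label mappings taken from the column descriptions above.
CODE_LABELS = {
    "aliqvolu": {1: "mL", 2: "uL"},
    "dnadn": {1: "Collected", 2: "Not Collected"},
    "smplnygc": {1: "Yes", 2: "No"},
}


def decode_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a `<column>_label` column for each coded column present."""
    out = frame.copy()
    for column, labels in CODE_LABELS.items():
        if column in out.columns:
            # Codes that are not in the mapping become NaN instead of raising.
            out[f"{column}_label"] = out[column].map(labels)
    return out


# Hypothetical rows for illustration only; they are not taken from the real file.
demo_samples = pd.DataFrame(
    {"aliqvolu": [1, 2, 1], "dnadn": [1, 1, 2], "smplnygc": [2, 1, 1]}
)
decoded = decode_columns(demo_samples)
print(decoded)
```

Applied to the real table, the same call would be `decode_columns(df)`, and the label columns could then be passed to the plots as `hue` so that legends read "Collected" rather than `1`.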






In [None]:
df = pd.read_csv('../input/end-als/end-als/clinical-data/filtered-metadata/metadata/clinical/DNA_Sample_Collection.csv', encoding='ISO-8859-2')
pd.set_option('display.max_columns', None)
df.head()

# **<span style="color:#DC143C;">Phenotypic Spectrum of Amyotrophic Lateral Sclerosis Genetics</span>**

"The phenotypic spectrum of amyotrophic lateral sclerosis (ALS) genetics. (A) Forty-six ALS-related genes are arrayed along axes that depict two major phenotypic aspects: the extent to which corticospinal versus lower motor neurons are involved (y-axis) and the overlap with frontotemporal dementia (FTD) (x-axis). The diameters of each gene approximate their relative frequencies. (B) The phenotypic overlap of ALS genes with hereditary spastic paraplegia (HSP), frontotemporal dementia (FTD), mitochondrial disease, and lower motor neuropathies (LMN) is shown."

Citation: Ghasemi M, Brown RH Jr. Genetics of Amyotrophic Lateral Sclerosis. Cold Spring Harb Perspect Med. 2018;8(5):a024125. Published 2018 May 1. doi:10.1101/cshperspect.a024125

![](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5932579/bin/cshperspectmed-PRD-024125_F1.jpg)

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5932579/

In [None]:
df.isnull().sum()

In [None]:
sns.countplot(x=df['aliqvolu'], hue=df['dnadn'])

In [None]:
sns.scatterplot(data=df, x='dnadt', y='dnacnt', hue='dnadn', alpha=0.5)
plt.title('Date DNA Collected vs. Number of DNA Tubes Collected')

# **<span style="color:#DC143C;">Adult-Onset ALS Genes Transmitted as Mendelian Traits</span>**

**<span style="color:#DC143C;">Genes Implicated in Disturbances of Protein Homeostasis</span>**

Superoxide Dismutase

"The first ALS gene, cytosolic copper-zinc superoxide dismutase (SOD1), was identified in 1993. In most families harboring SOD1 gene mutations, disease penetrance is >90% by age 70. More than 170 mutations have now been detected in the SOD1 gene in fALS; together these account for ∼20% of fALS cases. Almost all of these mutations are missense changes, scattered across the coding sequence without focal mutation hotspots. A few mutations truncate the terminal segment of the SOD1 protein; none of the ALS-related mutations in SOD1 are predicted to eliminate production of the protein. This strongly suggests that the mutant protein must be present to initiate motor neuron death."

"Protein homeostasis is disturbed in ALS. Many ALS genes highlight loss of protein quality control as a central feature of the disease. Misfolding of proteins may reflect many factors including aberrant folding and quality control at the ER (disturbed ER-associated degradation or ERAD), inherent instability due to mutations, aggregation during residence in stress granules, or imperfect clearance via proteasomes or autophagy. Self-assembly of misfolded proteins may lead to propagated instability and prion-like spreading of pathology."

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5932579/

In [None]:
sns.boxplot(data=df, x='alqvolot', y='dnacnt')
plt.xticks(rotation=45)
plt.title('Volume of last aliquot vs. DNA Tubes Collected')

**<span style="color:#DC143C;">Proteins</span>**

Valosin-Containing Protein

"Most cases of MSP (multisystem proteinopathy) are caused by dominantly inherited mutations in the gene encoding valosin-containing protein (VCP)."

Ubiquilin-2

"UBQLN2-positive inclusions are detected in many cases of both sporadic and familial ALS and FTD."

Optineurin, TANK-Binding Kinase

"In patients with fALS, multiple mutations have been identified in the gene encoding optineurin (OPTN). Although some are transmitted as recessive traits (for example, in the initial families from Japan), it is now evident that others are dominantly inherited. According to one estimate, OPTN may account for ∼1% of fALS cases."

CHCHD10 (C22orf2)

"Within the last three years, several families have been described in which ALS, FTD, ataxia, and ragged red fibers in muscle are co-inherited with missense mutations in the gene CHCHD10. CHCHD10 mutations may account for >2% of fALS cases. This nuclear gene is a member of a family of proteins with coiled-coil helix domains (CHCHD); it encodes a mitochondrial protein within the intermembrane space that is thought to be important in maintaining the architecture of cristae and oxidative phosphorylation. This represents the first gene that directly implicates mitochondrial dysfunction as a cause for ALS."

**<span style="color:#DC143C;">Genes Implicated in Altered Function and Homeostasis of RNA</span>**

Transactive Response DNA Binding Protein

"TDP-43 mutations can cause familial forms of both ALS and ALS-FTD. TDP-43, a heterogeneous nuclear ribonucleoprotein or hnRNP, is predominantly nuclear, but shuttles into the cytoplasm and axons during cellular stress. Mutations in the TDP-43 gene account for ∼5% of fALS cases. In both ALS and FTD, TDP-43 accumulates in the cytoplasm, where it is cleaved and hyperphosphorylated; C-terminal fragments accumulate in insoluble cytoplasmic aggregates. These aggregates are present in both sALS and fALS. Many forms of fALS, regardless of the offending primary gene mutation, show TDP-43 pathology; exceptions are cases associated with SOD1 and FUS gene mutations." 

Fused in Sarcoma (a.k.a. TLS, translocated in liposarcoma)

"FUS is an ∼75-kDa protein that, like TDP-43, is predominantly nuclear and is also found in ALS cases to be mislocalized to aggregates within the cytoplasm. Like TDP-43, FUS serves multiple functions related to the generation and trafficking of diverse types of RNA. Childhood ALS caused by FUS mutations is associated with basophilic inclusions in the spinal cord."

hnRNPA2B1 and hnRNPA1

"The mutations in these genes augment the propensity for the low complexity domains to interact with other proteins and thus for the mutant proteins to induce propagated, prion-like misfolding. Both proteins also interact with TDP-43."

C9orf72 (chromosome 9 open reading frame 72)

"In 2011, three landmark studies identified the most common cause of ALS and ALS-FTD: an expansion of a hexanucleotide intronic repeat (GGGGCC) within the first intron of a poorly understood gene, termed C9orf72."
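To make the idea of a hexanucleotide repeat expansion concrete, here is a minimal sketch that counts the longest uninterrupted run of `GGGGCC` units in a DNA string. The function name and the toy sequence are invented for illustration. Clinical sizing of the expansion relies on laboratory assays rather than a plain string scan, so this only illustrates the concept.

```python
import re

REPEAT_UNIT = "GGGGCC"


def longest_repeat_run(sequence: str, unit: str = REPEAT_UNIT) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    # One or more consecutive copies of the unit, matched greedily.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Hypothetical toy sequence: the flanking bases are invented.
toy_sequence = "ATTC" + "GGGGCC" * 5 + "TAGC" + "GGGGCC" * 2
print(longest_repeat_run(toy_sequence))
```

The toy sequence contains one run of five units and one run of two, so the function reports the longer run.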

Angiogenin and Ribonuclease 4 (ANG, or ribonuclease 5)

"Because it has neuroprotective properties, it is of considerable interest that rare variants in ANG are overrepresented in both ALS (sALS and in some small families) and Parkinson's disease."

Matrin-3

"In 2014 it was reported that missense mutations in the gene encoding matrin-3 cause motor neuron disease with prominent myopathic features (a scenario not unlike the multisystem proteinopathy caused by VCP gene mutations)."

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5932579/

In [None]:
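g = sns.jointplot(data=df, x='aliqcnt', y='aliqvol', kind='kde', hue='smplnygc')
# jointplot builds its own figure, so the title is set on that figure instead of via plt.title
g.fig.suptitle('Number of Aliquots vs. Aliquot volume', y=1.02)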

<h1 style="background-color:#DC143C; font-family:'Brush Script MT',cursive;color:white;font-size:200%; text-align:center;border-radius: 50% 20% / 10% 40%">Genotyping of Amyotrophic Lateral Sclerosis Samples</h1>

In [None]:
nRowsRead = 1000 # set to None to read the whole file
df1 = pd.read_csv('../input/cusersmarildownloadsbramcsv/bram.csv', delimiter=';', encoding = "ISO-8859-2", nrows = nRowsRead)
df1.dataframeName = 'bram.csv'
nRow, nCol = df1.shape
print(f'There are {nRow} rows and {nCol} columns')
df1.head()

In [None]:
df1.isnull().sum()

<h1 style="background-color:#DC143C; font-family:'Brush Script MT',cursive;color:white;font-size:200%; text-align:center;border-radius: 50% 20% / 10% 40%">ALS Genetics: Gains, Losses, and Implications for Future Therapies</h1>

Authors: Garam Kim, Olivia Gautier, Eduardo Tassoni-Tsuchida, X. Rosa Ma, Aaron D. Gitler
Neuron - Volume 108, Issue 5, 9 December 2020, Pages 822-842 - https://doi.org/10.1016/j.neuron.2020.08.022

This review discusses the genetics of amyotrophic lateral sclerosis (ALS) with special emphasis on therapeutic implications. The field has seen significant progress in recent years, resulting in several antisense oligonucleotide (ASO)–based therapies in development. Most mutations cause disease through a gain-of-function (GOF) mechanism, although some less common variants also involve loss of function (LOF). ALS-causing mutations can also be located in non-coding portions of the genome. More common variants including SOD1, C9ORF72, and TDP-43, as well as less common forms, are discussed. The authors emphasize the need to consider both GOF and LOF mechanisms.
https://www.practiceupdate.com/content/implications-for-future-therapies-in-als/110822

https://www.sciencedirect.com/science/article/abs/pii/S089662732030653X

In [None]:
df1["Genotype"].value_counts().plot.bar(title='Coriell Diagnostic Description Genotype', color='red')

In [None]:
px.pie(df1, 'Genotype', title= 'ALS Genotype')

<h1 style="background-color:#DC143C; font-family:'Brush Script MT',cursive;color:white;font-size:200%; text-align:center;border-radius: 50% 20% / 10% 40%">C9orf72 and the Care of the Patient With ALS or FTD</h1>

Citation: C9orf72 and the Care of the Patient With ALS or FTD - Progress and Recommendations After 10 Years
Jennifer Roggenbuck
Neurol Genet Feb 2021, 7 (1) e542; DOI: 10.1212/NXG.0000000000000542

"Most patients with the C9orf72 HRE (hexanucleotide repeat expansion) present clinically with ALS or behavioral variant FTD, or a combination of both. The spectrum of phenotypes includes rapidly and slowly progressive disease, and clinical presentations may differ from classic ALS or FTD. In families, the C9orf72 HRE is transmitted in an autosomal dominant manner, with age-dependent and incomplete penetrance. Age at onset and presenting symptomatology is variable, even among members of the same family. Notably, some individuals with a HRE in the clearly pathogenic range remain symptom free, even in old age. The C9orf72 HRE is by far the most common known genetic cause of ALS and FTD, far exceeding the prevalence of pathogenic variants in any other gene, although frequencies vary greatly by geoancestry. The highest HRE frequencies are observed in Europe, reported in 40% of familial and 6% of sporadic cases of ALS and in 18% of familial and 6% of sporadic cases of FTD in European patient cohorts. The HRE is rarely found in Asia."
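As a rough worked example, the European ALS frequencies quoted here (40% of familial, 6% of sporadic cases) can be combined with the approximately 10% familial share quoted earlier from Ghasemi and Brown. Mixing figures from two different reviews and cohorts is an assumption made purely for illustration, and the function name is hypothetical, so the result is a back-of-the-envelope estimate rather than a published statistic.

```python
def overall_carrier_fraction(familial_share: float,
                             freq_familial: float,
                             freq_sporadic: float) -> float:
    """Weighted average of carrier frequency across familial and sporadic cases."""
    return familial_share * freq_familial + (1.0 - familial_share) * freq_sporadic


# Assumed inputs: ~10% of ALS is familial; HRE found in 40% of familial
# and 6% of sporadic ALS cases in European cohorts (both quoted in this notebook).
familial_share = 0.10
overall = overall_carrier_fraction(familial_share, 0.40, 0.06)

# Share of all HRE carriers who would present as apparently sporadic.
sporadic_share_of_carriers = (1.0 - familial_share) * 0.06 / overall

print(f"Estimated HRE carriers among all ALS cases: {overall:.1%}")
print(f"Of those carriers, apparently sporadic: {sporadic_share_of_carriers:.1%}")
```

Under these assumptions, about 9.4% of all cases would carry the expansion, and more than half of those carriers would have no recognized family history.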

**<span style="color:#DC143C;">Unanswered Questions Regarding Disease Expression</span>**

"When patients with C9orf72 HRE (hexanucleotide repeat expansion) present with symptoms of motor neuron disease, the phenotype is often indistinguishable from classic ALS. Although the most common cognitive phenotype is behavioral variant FTD, a range of cognitive-behavioral presentations are observed, including psychotic features such as delusions and hallucinations. Patients presenting with neuropsychiatric changes may not be recognized and diagnosed with C9orf72-related disease. Repeat number has not been robustly shown to correlate with age at onset, disease duration, or clinical presentation (ALS and/or FTD), although this remains an active area of investigation."

https://ng.neurology.org/content/7/1/e542

In [None]:
df1["Homozygous_Heterozygous"].value_counts().plot.bar(title='ALS Zygosity', color='b')

In [None]:
px.pie(df1, 'Homozygous_Heterozygous', title= 'ALS Zygosity')

# **<span style="color:#DC143C;">Practice Recommendations</span>**

All Persons With ALS and/or FTD of European Geoancestry Are Candidates for HRE Testing.

Amplicon Length Analysis With Bidirectional RP-PCR Is Suitable for Large Scale HRE Testing.

Genetic Counseling Should Be Provided to All Affected Persons.

Provide Genetic Risk Assessment Informed by Family History.

Empower Affected Persons to Make an Informed Decision About HRE Testing.

Prepare Unaffected Family Members Who Seek Presymptomatic Testing.

**<span style="color:#DC143C;">Prioritize C9orf72 as a Target for Clinical Trials</span>**

Engage Diverse Patients in Research

Develop Evidence-based Guidelines for ALS and FTD Genetic Testing and Counseling

https://ng.neurology.org/content/7/1/e542

In [None]:
df1["Minor_Alleles_Present"].value_counts().plot.bar(title='ALS Minor Alleles', color='b')

In [None]:
df1["Allele_one_RP_count"].value_counts().plot.bar(title='ALS Allele 1', color='r')

In [None]:
px.pie(df1, 'Allele_one_RP_count', title= 'ALS Allele 1')

In [None]:
df1["Allele_two_RP_count"].value_counts().plot.bar(title='ALS Allele 2', color='b')

In [None]:
px.pie(df1, 'Allele_two_RP_count', title= 'ALS Allele 2')
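Bar and pie charts of raw repeat counts can be hard to read when many distinct values occur. One alternative is to group counts relative to a cutoff, as sketched below. The default cutoff of 30 repeats is an assumption (a commonly cited convention, not a value stated in this notebook or confirmed for this dataset), and the function name and demo values are hypothetical.

```python
import pandas as pd


def classify_repeats(counts: pd.Series, expanded_cutoff: int = 30) -> pd.Series:
    """Label each repeat count relative to `expanded_cutoff`; unparseable values are 'unknown'."""
    # Non-numeric entries (text, blanks) are coerced to NaN.
    numeric = pd.to_numeric(counts, errors="coerce")
    labels = pd.Series("unknown", index=counts.index, dtype="object")
    labels[numeric < expanded_cutoff] = "below cutoff"
    labels[numeric >= expanded_cutoff] = "at or above cutoff"
    return labels


# Hypothetical values for illustration only.
demo_counts = pd.Series([2, 8, 5, 44, None, "n/a"])
repeat_classes = classify_repeats(demo_counts)
print(repeat_classes.value_counts())
```

On the real table this could be called as `classify_repeats(df1["Allele_two_RP_count"])`, adjusting `expanded_cutoff` to whatever threshold the genotyping laboratory documents.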

In [None]:
df1["3_Indels_Present\n0 = none\n1 = deletion\n2 = insertion/interruption\n3 = both"].value_counts().plot.bar(title='ALS Indels', color='red')

#Indels Present: 0 = none, 1 = deletion, 2 = insertion/interruption, 3 = both.
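Because the indel column name embeds its own legend (including line breaks), it is awkward to type and produces cluttered axis labels. A minimal sketch of deriving a short, labeled column follows. The code-to-label mapping is the one given above; the helper name, the `indel_label` column name, and the demo rows are assumptions for illustration.

```python
import pandas as pd

# Exact column name as it appears in the CSV header, newlines included.
INDEL_COLUMN = "3_Indels_Present\n0 = none\n1 = deletion\n2 = insertion/interruption\n3 = both"
INDEL_LABELS = {0: "none", 1: "deletion", 2: "insertion/interruption", 3: "both"}


def add_indel_label(frame: pd.DataFrame,
                    source: str = INDEL_COLUMN,
                    target: str = "indel_label") -> pd.DataFrame:
    """Return a copy of `frame` with a readable label column derived from the coded column."""
    out = frame.copy()
    codes = pd.to_numeric(out[source], errors="coerce")
    # Codes outside 0-3 (or missing values) become NaN.
    out[target] = codes.map(INDEL_LABELS)
    return out


# Hypothetical rows for illustration only.
demo_indels = pd.DataFrame({INDEL_COLUMN: [0, 1, 2, 3, 0]})
labeled = add_indel_label(demo_indels)
print(labeled["indel_label"].value_counts())
```

The original column is left untouched, so cells that refer to it by its full name keep working.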

In [None]:
px.pie(df1, '3_Indels_Present\n0 = none\n1 = deletion\n2 = insertion/interruption\n3 = both', title= 'ALS 3 Indels')

In [None]:
# Let's first handle numerical features with NaN values
numerical_nan = [feature for feature in df1.columns if df1[feature].isna().sum() > 0 and df1[feature].dtypes != 'O']
numerical_nan

In [None]:
## Replacing the numerical missing values

for feature in numerical_nan:
    ## Use the median because it is robust to outliers
    median_value = df1[feature].median()
    # Assign the result back: fillna(inplace=True) on a column selection may not update df1 in recent pandas
    df1[feature] = df1[feature].fillna(median_value)

df1[numerical_nan].isnull().sum()

In [None]:
# categorical features with missing values
categorical_nan = [feature for feature in df1.columns if df1[feature].isna().sum()>0 and df1[feature].dtypes=='O']
print(categorical_nan)

In [None]:
# replacing missing values in categorical features
for feature in categorical_nan:
    df1[feature] = df1[feature].fillna('None')
    
df1[categorical_nan].isna().sum()    

In [None]:
px.histogram(df1, x='3_Indels_Present\n0 = none\n1 = deletion\n2 = insertion/interruption\n3 = both', range_x=[-5, 50], color='Coriell_Diagnostic_Description', title='ALS Indels')

In [None]:
px.bar(df1, x = 'Genotype', y = 'Minor_Alleles_Present', color = 'Minor_Alleles_Present',orientation='h' , title='ALS Genotype',  height = 500 )

In [None]:
# Word cloud of the Coriell diagnostic descriptions
from wordcloud import WordCloud

text = " ".join(str(each) for each in df1.Coriell_Diagnostic_Description)
# Create and generate a word cloud image:
wordcloud = WordCloud(max_words=200, colormap='Set3', background_color="red").generate(text)
# Create a single figure before drawing
plt.figure(figsize=(15, 10))
# Display the generated image (matplotlib interpolation names are lowercase):
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

# **<span style="color:#DC143C;">Conclusion</span>**

"The rate of discovery of genetic variables that play a role in ALS pathogenesis will only increase. Full exome sequencing is now the standard in most gene searches. This permits several predictions. First, the remaining, rare causes of Mendelian ALS will likely all be defined. It will be feasible to study the role in ALS of epistasis, the interactions between multiple genes that each heighten disease susceptibility but cannot initiate pathogenesis alone."

"There are already several reports of oligogenic ALS cases, in which more than one ALS-related gene variant has been detected in the same individual. As a corollary, it seems likely that an increasing fraction of apparently sALS cases will prove to be a consequence of multiple gene variants that are not individually penetrant. It is already apparent that first-degree relatives of ALS patients have an increased incidence compared with unrelated ones. Finally, improved knowledge of genetic variants and their related pathways to neurodegeneration will improve our capacity to define interactions between genetic topology and the environment (e.g., environmental toxins or elements within the microbiome)."

"Perhaps most important, as the understanding of ALS genetics is refined, opportunities for developing meaningful therapies will improve. In part, this may be a direct consequence of improved methods to attenuate expression and toxicity of mutant ALS genes."

"Indirectly, one anticipates that it will be increasingly possible to stratify and personalize therapeutic approaches and trials based on molecular distinctions between subsets of ALS cases."

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5932579/