# ChIP-Seq

**Author**: Maria Antonia Madrid Restrepo

**Student number**: r0913112

**Student email**: mariaantonia.madridrestrepo@student.kuleuven.be


## Introduction

I will be performing a ChIP-seq analysis to investigate the target genes of the transcription factor MYB. According to the NIH, MYB (also known as c-MYB) is encoded by the proto-oncogene _myb_ (**Fig. 1**): "This gene encodes a protein with three HTH DNA-binding domains that functions as a transcription regulator. This protein plays an essential role in the regulation of hematopoiesis. This gene may be aberrantly expressed or rearranged or undergo translocation in leukemias and lymphomas, and is considered to be an oncogene." The data I analyzed come from the study "Chromatin occupancy and target genes of the haematopoietic master transcription factor MYB", published in 2021 by Lemma et al. The authors generated ChIP-seq data from K-562 leukemia cells with the goal of identifying the target genes and chromatin occupancy of MYB. All of the data are available [here](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE124541).

![41389_2021_309_Fig1_HTML.png.jpeg](attachment:41389_2021_309_Fig1_HTML.png.jpeg)
**Figure 1**. MYB DNA-binding domain (modified from Cicirò & Sala 2021). 

## Background
c-MYB is a transcription factor expressed in hematopoietic progenitor cells (HPCs), the progenitors of blood cells, where it regulates multiple processes; this is why it is commonly described as a master regulator. Overexpression of MYB in HPCs is thought to be associated with leukemia and a poor prognosis for the patient. In humans, the MYB protein has three HTH DNA-binding domains that allow it to function as a transcription factor (**Fig. 2**). It plays an essential role in the regulation of hematopoiesis (the formation of blood cells).

![AF-P10242-F1_model-1.png](attachment:AF-P10242-F1_model-1.png)
**Figure 2**. Three-dimensional structure of MYB predicted by AlphaFold (taken from GeneCards, https://www.genecards.org/cgi-bin/carddisp.pl?gene=MYB)

Leukemia is a broad term for any type of cancer that affects blood cells. There are many types of leukemia, and they are classified according to the type of blood cell that becomes cancerous. Leukemia is very common in children, and dysregulation of the MYB transcription factor has been shown to be associated with this disease (Li et al 2021). Moreover, mutations in the regulatory regions of MYB target genes can lead to the development of different types of leukemia, as has been shown for the oncogene TAL1: somatic mutations that create new MYB-binding sites upstream of TAL1 were detected in children with acute lymphoblastic leukemia (ALL) (Mansour et al 2014). With all of this in mind, it is important to consider all possible target genes of MYB and their potential effects on the development of cancer and other diseases, particularly in children and the elderly.

Previous studies have reported target genes of MYB. One study published in 2011 investigated MYB in K562 cells using siRNAs and ChIP-seq, and reported that genes such as MYADM, LMO2, GATA2, STAT5A, and IKZF1 are MYB targets (Lorenzo et al 2011). Although other authors have reported additional target genes in cancer cells (e.g. CXCR4; Luo et al 2008), there is still no definitive, consensus list of MYB target genes, and more data and analyses are needed to fully uncover the target genes and genomic binding regions of MYB.


## Motivation
Since I had already worked with cancer cells in the previous assignment (glioblastoma cells for bulk RNA-seq), I wanted to change my biological system slightly. I chose to keep working with human cells, since more information is available on human cancers and there is greater interest in them, but I switched to a different type of cancer to keep things new and interesting. I found this relatively recent research paper with good-quality data that were easy to access and to reproduce.

## Data
The GEO dataset (GSE124541) is available [here](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE124541). It contains ChIP-seq data for K-562 cells (_H. sapiens_) and for a control cell line containing an empty vector. Each cell line was analyzed in triplicate, together with corresponding input samples. For my analyses, I only considered the first K-562 replicate (MYB1) and its corresponding input (inp-MYB1).

## Main results and discussion
After an initial quality control of the data, the ChIP-seq reads were mapped to the human reference genome (hg19) using bowtie2, and peaks were called using MACS2. Peak calling identifies regions of the genome that are enriched in aligned ChIP reads relative to the background (here, the input sample). These regions, or "peaks", are candidate sites where the immunoprecipitated protein, in this case the transcription factor MYB, is bound to DNA, either directly or indirectly. After the peaks were called, a motif analysis was performed using RSAT and i-cisTarget, peaks were linked to nearby genes using GREAT, and functional associations were explored using the STRING database. Together, these steps yielded a list of candidate target genes for MYB.
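To make the idea of peak calling more concrete, here is a toy enrichment test in Python. It is a minimal sketch and not MACS2's implementation, and it rests on several simplifying assumptions: reads are reduced to their start positions on a single chromosome, the chromosome is split into fixed 200 bp bins, and each bin's ChIP count is tested against a Poisson background whose rate is the larger of the genome-wide ChIP average and the local input rate (scaled to the ChIP sequencing depth). The bin size, window size, p-value cutoff, the toy reads and the function names `call_peaks` and `poisson_sf` are illustrative choices. MACS2 itself additionally models the fragment size, estimates the local background from the input over several window sizes, and reports q-values.

```python
import math
from collections import Counter


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) derived from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def bin_counts(positions, bin_size):
    """Count read start positions falling into each fixed-width bin."""
    return Counter(pos // bin_size for pos in positions)


def call_peaks(chip, control, chrom_len, bin_size=200, local_bins=25, p_cutoff=1e-5):
    """Toy peak caller: report bins whose ChIP count is improbable under a Poisson background.

    The background rate for each bin is max(genome-wide ChIP rate, local input rate
    scaled to ChIP depth), loosely following the 'local lambda' idea (hypothetical parameters).
    """
    n_bins = chrom_len // bin_size + 1
    chip_counts = bin_counts(chip, bin_size)
    ctrl_counts = bin_counts(control, bin_size)
    scale = len(chip) / max(len(control), 1)  # normalise input to ChIP sequencing depth
    lam_bg = len(chip) / n_bins               # expected reads per bin if reads were uniform

    enriched = []
    for b in range(n_bins):
        lo, hi = max(0, b - local_bins), min(n_bins, b + local_bins + 1)
        local_ctrl = sum(ctrl_counts[i] for i in range(lo, hi)) / (hi - lo)
        lam = max(lam_bg, local_ctrl * scale)
        if poisson_sf(chip_counts[b], lam) < p_cutoff:
            enriched.append(b)

    # Merge runs of adjacent enriched bins into (start, end) intervals
    peaks = []
    for b in enriched:
        if peaks and peaks[-1][1] == b * bin_size:
            peaks[-1] = (peaks[-1][0], (b + 1) * bin_size)
        else:
            peaks.append((b * bin_size, (b + 1) * bin_size))
    return peaks


# Hypothetical toy data: a 100 kb chromosome with uniform background reads
background = list(range(0, 100_000, 1000))                 # 100 background ChIP reads
chip = background + [10_000 + 10 * i for i in range(20)]   # plus 20 reads piled up near 10 kb
control = list(range(0, 100_000, 500))                     # 200 uniform input reads
print(call_peaks(chip, control, chrom_len=100_000))        # [(10000, 10200)]
```

Only the bin where the extra reads pile up passes the cutoff; the scattered background reads do not. Taking the maximum of the two background estimates is a conservative choice: a region with high input signal needs correspondingly more ChIP reads before it is called as a peak.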

The final candidate target genes predicted for the transcription factor MYB are listed in **Table 1**. I will discuss each of these genes individually, describing how it is related to leukemia and how my results compare to the authors' results for the same dataset.

**Table 1.** Target genes predicted for MYB, obtained by peak calling by MACS2 and linking both direct and indirect peaks to genes using GREAT, cross-referenced with the STRING database.

| Possible target genes for MYB |
|-------------------------------|
| CEBPB                         |
| CREBBP                        |
| EP300                         |
| GATA1                         |
| GATA2                         |
| KMT2A                         |
| MYBBP1A                       |
| RUNX1                         |
| SPI1                          |
| TAL1                          |
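The table caption summarizes two post-processing steps: linking peaks to genes and keeping genes supported by STRING. As a rough illustration, the sketch below assigns each peak to the gene with the nearest transcription start site (TSS) within an assumed 1 Mb limit, then intersects the result with a set of STRING partners. This corresponds to the simpler "single nearest gene" association rule that GREAT offers as an option; GREAT's default "basal plus extension" rule is not reproduced here. All coordinates, the gene names (`GENE_A` etc.), the `string_partners` set and the function name are hypothetical placeholders, not values from this analysis.

```python
from bisect import bisect_left


def assign_peaks_to_genes(peaks, tss_by_chrom, max_dist=1_000_000):
    """Assign each (chrom, start, end) peak to the gene with the nearest TSS.

    tss_by_chrom maps a chromosome name to a list of (tss_position, gene_name).
    Peaks whose nearest TSS is farther than max_dist are left unassigned.
    """
    # Sort TSSs once per chromosome so each peak can be located by binary search
    index = {}
    for chrom, entries in tss_by_chrom.items():
        entries = sorted(entries)
        index[chrom] = ([pos for pos, _ in entries], [name for _, name in entries])

    genes = set()
    for chrom, start, end in peaks:
        if chrom not in index or not index[chrom][0]:
            continue  # no annotated TSS on this chromosome
        positions, names = index[chrom]
        center = (start + end) // 2
        i = bisect_left(positions, center)
        # The nearest TSS is either just before or just after the peak centre
        candidates = [j for j in (i - 1, i) if 0 <= j < len(positions)]
        j = min(candidates, key=lambda k: abs(positions[k] - center))
        if abs(positions[j] - center) <= max_dist:
            genes.add(names[j])
    return genes


# Hypothetical annotation and peaks, for illustration only
tss_by_chrom = {
    "chr1": [(50_000, "GENE_A"), (400_000, "GENE_B")],
    "chr2": [(2_000_000, "GENE_C")],
}
peaks = [
    ("chr1", 60_000, 60_400),    # nearest TSS: GENE_A (~10 kb away)
    ("chr1", 390_000, 390_200),  # nearest TSS: GENE_B (~10 kb away)
    ("chr2", 10_000, 10_200),    # GENE_C is ~2 Mb away, so the peak stays unassigned
]
assigned = assign_peaks_to_genes(peaks, tss_by_chrom)
string_partners = {"GENE_B", "GENE_C"}     # hypothetical STRING interaction partners
print(sorted(assigned))                    # ['GENE_A', 'GENE_B']
print(sorted(assigned & string_partners))  # ['GENE_B']
```

Using the peak centre and a hard distance cutoff keeps the example short; the choice of association rule (nearest gene, two nearest genes, or regulatory domains) can change which genes end up in a list such as Table 1.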

**CEBPB**: CCAAT enhancer binding protein beta (C/EBPβ) is encoded by an intronless gene and functions as a transcription factor. It regulates genes involved in the immune system and inflammation, and has an important paralog, CEBPA. C/EBPβ also plays a crucial role in hematopoiesis, so it is not surprising that it was predicted as one of the target genes of MYB. According to Kurata et al (2021), C/EBPβ induces B-cell ALL and reduces the number of mature B lymphocytes in the bone marrow. The authors suggest that this pathway could be a potential therapeutic target.

**CREBBP**: CREB binding protein, encoded by a gene known to play critical roles in embryonic development, growth control, and homeostasis. Mutations in CREBBP have been reported in relapsed ALL (Mullighan et al 2011). These researchers also found other genes whose mutations are associated with the disease, particularly in younger patients, and suggest that these mutations impair histone acetylation and confer resistance to cancer therapy.

**EP300**: E1A binding protein p300 functions as a histone acetyltransferase that regulates transcription and is important for cell proliferation and differentiation. Several authors have reported that genetic abnormalities of EP300 are associated with leukemia, particularly pediatric ALL, as well as NK-cell leukemia and chronic-phase chronic myeloid leukemia, and EP300 has been reported to have tumor-promoting roles (Zhu et al 2023).

**GATA1**: GATA binding protein 1 belongs to the GATA family of transcription factors and is important for the development of erythroid cells. GATA1 has also been associated with leukemia: mutations in GATA1 are associated with the acquisition of leukemia in individuals with Down syndrome, particularly in children younger than 4 years old (Hasle et al 2022). These researchers suggest that an extra copy of chromosome 21 and mutations in GATA1 cooperate in a way that can result in the development of leukemia.

**GATA2**: GATA binding protein 2 plays an essential role in regulating genes involved in the development of blood cells. GATA2 haploinsufficiency is more commonly associated with acute myeloid leukemia (AML), but aberrant GATA2 activation has also been reported in pediatric B-cell ALL, linked to another protein discussed further below, KMT2A (Wang et al 2022).

**KMT2A**: Lysine methyltransferase 2A plays an essential role in hematopoiesis. Chromosomal translocations involving this gene are a major cause of ALL and AML, especially in infants: KMT2A rearrangements are found in more than 70% of new ALL diagnoses in infants (Górecki et al 2023). These rearrangements are associated with hyperleukocytosis (a high leukocyte count), aggressive disease with early relapse, and poor prognosis. Younger patients with this phenotype have more aggressive cancers than other patients with ALL.

**MYBBP1A**: MYB binding protein 1a is encoded by a gene whose protein binds specifically to the transcription factor MYB. It acts as a tumor suppressor, and genetic variants of this gene have been associated with ALL, as well as with pancreatic cancer (Abaji et al 2022).

**RUNX1**: RUNX family transcription factor 1 is the alpha subunit of core-binding factor (CBF), which forms a complex with the cofactor CBFB, and is involved in hematopoiesis. RUNX1 is involved in translocations seen in ALL, AML, and other cancers. Mutations in this gene are commonly associated with these types of cancer; they can be inherited, but de novo mutations are also seen in patients with the disease (Sood et al 2017).

**SPI1**: Spi-1 proto-oncogene (also known as PU.1) is involved in the activation of genes during the development of B cells and myeloid tissue. Gene fusions involving SPI1 and other genes have been reported in some cases of T-cell ALL, representing a small subset of all cases. These fusions promote cell proliferation and the development of leukemia in pediatric patients (Seki et al 2017).

**TAL1**: TAL bHLH transcription factor 1, erythroid differentiation factor, regulates the expression of genes involved in processes such as myeloid cell and erythrocyte differentiation. This transcription factor is essential for normal hematopoiesis, but when it is misexpressed in immature thymocytes it is associated with T-cell ALL. Misexpression can arise through chromosomal translocation, intrachromosomal rearrangement, or mutations in the enhancer region (Sanda & Leong 2017).

Taken together, these results suggest that MYB is a key player in leukemia development, as several of its candidate target genes are directly implicated in this type of cancer. The original paper by Lemma et al. (2021) also highlights some of these genes, including GATA factors and KMT2 proteins. The orchestration of gene expression programs during blood cell development requires tight regulation, and any deviation from this program risks perturbing cellular states and, consequently, leading to disease. One such outcome is acute lymphoblastic leukemia, which underlines the importance of unraveling MYB's regulatory pathways and target genes. A comprehensive understanding of these regulatory networks is needed to decipher MYB's influence and could pave the way for new therapies targeting some of the genes reported here or other crucial elements of this pathway, offering new avenues for the treatment of blood cell cancers.

## References
Abaji R, Roux V, Yssaad IR, Kalegari P, Gagné V, Gioia R, Ferbeyre G, Beauséjour C, Krajinovic M. Characterization of the impact of the MYBBP1A gene and rs3809849 on asparaginase sensitivity and cellular functions. Pharmacogenomics. 2022 May;23(7):415-430. doi: 10.2217/pgs-2022-0010. Epub 2022 Apr 29. PMID: 35485735.

Cicirò Y, Sala A. MYB oncoproteins: emerging players and potential therapeutic targets in human cancer. Oncogenesis. 2021;10:19. doi: 10.1038/s41389-021-00309-y.

Hasle H, Kline RM, Kjeldsen E, Nik-Abdul-Rashid NF, Bhojwani D, Verboon JM, DiTroia SP, Chao KR, Raaschou-Jensen K, Palle J, Zwaan CM, Nyvold CG, Sankaran VG, Cantor AB. Germline GATA1s-generating mutations predispose to leukemia with acquired trisomy 21 and Down syndrome-like phenotype. Blood. 2022 May 26;139(21):3159-3165. doi: 10.1182/blood.2021011463. PMID: 34758059; PMCID: PMC9136882.

Kurata M, Onishi I, Takahara T, Yamazaki Y, Ishibashi S, Goitsuka R, Kitamura D, Takita J, Hayashi Y, Largaespada DA, Kitagawa M, Nakamura T. C/EBPβ induces B-cell acute lymphoblastic leukemia and cooperates with BLNK mutations. Cancer Sci. 2021 Dec;112(12):4920-4930. doi: 10.1111/cas.15164. Epub 2021 Oct 23. PMID: 34653294; PMCID: PMC8645713.

Lemma RB, Ledsaak M, Fuglerud BM, Sandve GK, Eskeland R, Gabrielsen OS. Chromatin occupancy and target genes of the haematopoietic master transcription factor MYB. Sci Rep. 2021 Apr 26;11(1):9008. doi: 10.1038/s41598-021-88516-w. PMID: 33903675; PMCID: PMC8076236.

Li M, Jiang P, Cheng K, Zhang Z, Lan S, Li X, Zhao L, Wang Y, Wang X, Chen J, Ji T, Han B, Zhang J. Regulation of MYB by distal enhancer elements in human myeloid leukemia. Cell Death Dis. 2021 Feb 26;12(2):223. doi: 10.1038/s41419-021-03515-z. PMID: 33637692; PMCID: PMC7910426.

Lorenzo PI, Brendeford EM, Gilfillan S, Gavrilov AA, Leedsak M, Razin SV, Eskeland R, Sæther T, Gabrielsen OS. Identification of c-Myb Target Genes in K562 Cells Reveals a Role for c-Myb as a Master Regulator. Genes Cancer. 2011 Aug;2(8):805-17. doi: 10.1177/1947601911428224. PMID: 22393465; PMCID: PMC3278898.

Luo B, Cheung HW, Subramanian A, Sharifnia T, Okamoto M, Yang X, Hinkle G, Boehm JS, Beroukhim R, Weir BA, Mermel C, Barbie DA, Awad T, Zhou X, Nguyen T, Piqani B, Li C, Golub TR, Meyerson M, Hacohen N, Hahn WC, Lander ES, Sabatini DM, Root DE. Highly parallel identification of essential genes in cancer cells. Proc Natl Acad Sci U S A. 2008 Dec 23;105(51):20380-5. doi: 10.1073/pnas.0810485105. Epub 2008 Dec 17. PMID: 19091943; PMCID: PMC2629277.

Mansour MR, Abraham BJ, Anders L, Berezovskaya A, Gutierrez A, Durbin AD, Etchin J, Lawton L, Sallan SE, Silverman LB, Loh ML, Hunger SP, Sanda T, Young RA, Look AT. An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element. Science. 2014;346(6215):1373-1377. doi: 10.1126/science.1259037.

Mullighan CG, Zhang J, Kasper LH, Lerach S, Payne-Turner D, Phillips LA, Heatley SL, Holmfeldt L, Collins-Underwood JR, Ma J, Buetow KH, Pui CH, Baker SD, Brindle PK, Downing JR. CREBBP mutations in relapsed acute lymphoblastic leukaemia. Nature. 2011 Mar 10;471(7337):235-9. doi: 10.1038/nature09727. PMID: 21390130; PMCID: PMC3076610.

Sanda T, Leong WZ. TAL1 as a master oncogenic transcription factor in T-cell acute lymphoblastic leukemia. Exp Hematol. 2017 Sep;53:7-15. doi: 10.1016/j.exphem.2017.06.001. Epub 2017 Jun 24. PMID: 28652130.

Seki M, Kimura S, Isobe T, Yoshida K, Ueno H, Nakajima-Takagi Y, Wang C, Lin L, Kon A, Suzuki H, Shiozawa Y, Kataoka K, Fujii Y, Shiraishi Y, Chiba K, Tanaka H, Shimamura T, Masuda K, Kawamoto H, Ohki K, Kato M, Arakawa Y, Koh K, Hanada R, Moritake H, Akiyama M, Kobayashi R, Deguchi T, Hashii Y, Imamura T, Sato A, Kiyokawa N, Oka A, Hayashi Y, Takagi M, Manabe A, Ohara A, Horibe K, Sanada M, Iwama A, Mano H, Miyano S, Ogawa S, Takita J. Recurrent SPI1 (PU.1) fusions in high-risk pediatric T cell acute lymphoblastic leukemia. Nat Genet. 2017 Aug;49(8):1274-1281. doi: 10.1038/ng.3900. Epub 2017 Jul 3. PMID: 28671687.

Sood R, Kamikubo Y, Liu P. Role of RUNX1 in hematological malignancies. Blood. 2017 Apr 13;129(15):2070-2082. doi: 10.1182/blood-2016-10-687830. Epub 2017 Feb 8. Erratum in: Blood. 2018 Jan 18;131(3):373. PMID: 28179279; PMCID: PMC5391618.

Wang H, Cui B, Sun H, Zhang F, Rao J, Wang R, Zhao S, Shen S, Liu Y. Aberrant GATA2 Activation in Pediatric B-Cell Acute Lymphoblastic Leukemia. Front Pediatr. 2022;9:795529. doi: 10.3389/fped.2021.795529.

Zhu Y, Wang Z, Li Y, Peng H, Liu J, Zhang J, Xiao X. The Role of CREBBP/EP300 and Its Therapeutic Implications in Hematological Malignancies. Cancers (Basel). 2023 Feb 14;15(4):1219. doi: 10.3390/cancers15041219. PMID: 36831561; PMCID: PMC9953837.